Bagging and Pasting Video Lecture Transcript

This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video we continue to talk about ensemble learning; in particular, we're gonna learn more about bagging and pasting. So we're gonna review the concept behind bagging and pasting. Bagging we actually talked about a little bit in the random forest video/notebook; we're gonna expand upon that idea and then get into more detail about what's going on.

So, the main difference between bagging and pasting: these are both methods for randomly sampling subsets of your training set to train an ensemble of the same type of classifier on. This is what we did for random forests. In random forests, we would take a random sample of the training set with replacement, do that a bunch of different times, and then train different decision trees on those samples. That's where we get the forest from: these randomly perturbed decision trees. Doing the sampling with replacement is called bootstrapping, and that's what bagging is: sampling with replacement.

So we've recalled what we did in the random forest notebook; let's be a little bit more clear. Bagging and pasting both refer to this process of producing a number of randomly selected subsets of the training data, which are in turn used to train the same type of model or algorithm. In a random forest, the base model was a decision tree. In more general bagging and pasting models, you can choose any type of base model. For instance, later in this notebook we'll make a bagging or a pasting model using k nearest neighbors as the base algorithm.

The difference between bagging and pasting specifically as a process is whether you do that random sampling of the training set with or without replacement. In bagging, the samples of the training set are drawn with replacement, meaning that each time you choose a point for your subset, it goes back into the pile to possibly be randomly selected again. Pasting is without replacement, meaning that once you choose a point in your random sample, it's no longer in the data set, so it won't be able to be chosen again.

A way to remember which one is which is to remember the origins of the term bagging. Bagging is short for Bootstrap AGGregatING, and I've made all of the letters that form "bagging" capital here. Why is it called bootstrap? Because sampling with replacement from a data set is called bootstrapping. And the aggregating is the part where you're aggregating a bunch of different models trained on these different training sets. When you want to use bagging in sklearn, you set the bootstrap argument to True. For instance, in a random forest or in extra trees, you may set bootstrap equal to True or bootstrap equal to False. When bootstrap is True, you're doing the random sampling with replacement, so bootstrap=True is bagging, because what does bagging stand for? Bootstrap aggregating. If bootstrap is True, you must be doing bagging. So if you want sampling with replacement, you want bootstrap=True; if you want sampling without replacement, you choose bootstrap=False, and that is pasting. If it's not bootstrap, it must be pasting. That's the way to remember it.
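Just to make that bootstrap argument concrete, here's a minimal sketch with a random forest; the variable names are mine. Note that in RandomForestClassifier specifically, bootstrap=False just means each tree is trained on the full training set with no resampling.

```python
from sklearn.ensemble import RandomForestClassifier

# bootstrap=True: each tree is trained on a sample drawn WITH replacement (bagging).
rf_bagged = RandomForestClassifier(n_estimators=100, bootstrap=True)

# bootstrap=False: in a random forest this means each tree simply sees the
# full training set, with no resampling.
rf_no_bootstrap = RandomForestClassifier(n_estimators=100, bootstrap=False)
```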
So, bagging and pasting: we saw bagging being applied to random forests with decision trees, but bagging and pasting can be applied to any kind of supervised learning algorithm. The training time of the algorithm does limit how many models you can realistically use in the ensemble, though. For instance, if a single one of your algorithms takes five minutes to train and you try to train 100 of them, it could take up to 500 minutes, unless you can find a way to train all of them in parallel, which may be possible for some people but not for others, depending on your hardware.

In sklearn there's a general bagging classifier. We're gonna work on classification here, but this can also be used for regression, just like random forests can be used for regression even though we demonstrated them as classifiers. In sklearn, you can use bagging or pasting with the BaggingClassifier for classification and the BaggingRegressor for regression. The BaggingClassifier is the base classifier object for both pasting and bagging models, the key difference being that pasting is when you set the bootstrap argument equal to False and bagging is when you set it equal to True. Here's a link to the documentation.

We're gonna demonstrate this with k nearest neighbors as the base model on this synthetic data set that we've used for a lot of our classification notebooks, where we have blue circles up above, which are zeros, and orange triangles down below, which are ones. So we're going to import our model objects here. We first import the base model, which is k nearest neighbors: from sklearn.neighbors we import KNeighborsClassifier. Then we also need to import BaggingClassifier from sklearn.ensemble.

OK, so now we're gonna show you how to set up a bagging classifier. You first call BaggingClassifier and then set the base estimator. If this were a random forest, this would be a decision tree classifier; in this example, it's a KNeighborsClassifier. Down here we're comparing it to a single k nearest neighbors with four neighbors, so we put four here. Now, this is gonna be bagging, so we wanna set bootstrap equal to True. The number of estimators, n_estimators, is how many models we fit, so why don't we set that equal to 100. max_samples is the number of training points sampled, and if we go up, we can see that we had 200, so why don't we do max_samples equal to 100. The largest this can be is 200, I believe, the size of the training set. That is definitely the case for pasting, right? Because if you're sampling without replacement, once you've sampled the entire training set, there's nothing left to sample.

Now for the paster, we're gonna copy and paste everything, and the only thing we have to change is bootstrap from True to False. If bootstrap is False, the sampling is done without replacement. So what this is gonna do is train 100 k nearest neighbors classifiers with four neighbors for each of these two, training them on different training sets, where each training set is a subsample of the original training set. In the bagging case, that sample is drawn with replacement, meaning there can be repeats, and in the pasting case it's drawn without replacement, meaning there are no repeats.
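Here's a minimal sketch of that setup, assuming a training set of roughly 200 points like the synthetic one in this notebook. The variable names are mine, and I pass the base model positionally because its keyword name differs between sklearn versions (estimator in newer releases, base_estimator in older ones).

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier

# Bagging: each of the 100 base models sees 100 points sampled WITH replacement.
bag_clf = BaggingClassifier(
    KNeighborsClassifier(n_neighbors=4),  # base model, passed positionally
    n_estimators=100,   # how many base models we fit
    max_samples=100,    # training points sampled for each base model
    bootstrap=True      # with replacement -> bagging
)

# Pasting: identical setup, but the sampling is done WITHOUT replacement.
paste_clf = BaggingClassifier(
    KNeighborsClassifier(n_neighbors=4),
    n_estimators=100,
    max_samples=100,
    bootstrap=False     # without replacement -> pasting
)
```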
Then, in the code chunk that's gonna run here, we go through, fit, and predict with all of them, print out the accuracies, and plot the decision boundaries for all three.

OK. So here is the accuracy for just a single k nearest neighbors, here's what it looks like for the bagger, and here's what it looks like for the paster. Now, you might be saying, well, hang on a second, Matt, the single one did perform better. I mean, that can happen: a single model can perform better than the ensembles. But we can also see that, for instance, both the bagger and the paster pick up on this one up here in the upper right-hand corner, which does not happen with the single k nearest neighbors. On the other side of the coin, both the bagger and the paster miss in the bottom left-hand corner, where the single k nearest neighbors does appear to pick up at least a couple of them.

We could alter the performance if we change the number of samples we consider. What if instead of 100 we did 80? Let's see what happens. Once we do 80, we see that it does affect the accuracy as well as the boundaries. Then we could even go up instead of down from 100 and do 125, and we can see that now they both perform worse than they did before. So let's go back to our 100. But this max_samples might be something that you try to optimize with cross-validation. The number of estimators could also impact it, so we could do 500 instead of 100 and see how that impacts things; it takes longer to train, right? Because we're training more instances of the nearest neighbors. All right, let's go back to the original code. I'm just demonstrating that these are different knobs that you have at your disposal for coming up with different models, and doing cross-validation over them can help.
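For instance, here's a hypothetical sketch of tuning max_samples and n_estimators by cross-validation with GridSearchCV; X_train and y_train are placeholders for the notebook's training features and labels, and the grid values are just the ones we tried by hand above.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical tuning sketch: X_train and y_train stand in for the training data.
param_grid = {
    "max_samples": [80, 100, 125],   # points sampled for each base model
    "n_estimators": [100, 500],      # number of base models in the ensemble
}
grid = GridSearchCV(
    BaggingClassifier(KNeighborsClassifier(n_neighbors=4), bootstrap=True),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```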
OK. So now you might be asking, why do we want to use bagging or pasting if, in this one specific example, it doesn't look like it does as well as the base model? Well, one, you're not always gonna end up with this example, right? There are gonna be examples where bagging and pasting does better. And why does it do better? It's because it can introduce bias into the model through this random selection. How does it introduce bias? When you take a random subsample of the training set, the outliers or the extremes of the data set are more likely to not be included in the subsample. Why? Because usually these extremes or outliers are rare occurrences, maybe a handful of points out of a very large data set. So when you take a random subsample, more so with replacement than without, you're less likely to include those points in the data that the base estimator gets trained on. That means the base models are less likely to overfit on those outliers, which is what we're gonna see in a second, than a single model on its own using all of the training data.

So here, I've kind of redone that data set where maybe there are a couple of outliers, or some measurement errors, where some of the blue zeros were sent over here and some of the orange ones were sent over here. And now hopefully, if the code runs the way I think it should, what we'll see is that the bagger and the paster are less thrown off by these points down here, while the regular single model fits to them. OK, we're just remaking fresh model objects and then doing the same thing.

So you can see, for instance, that we somehow still have slightly worse performance, I think because we're misclassifying a few more of the blues than the single k nearest neighbors does. But while these outliers over here are picked up on by the single k nearest neighbors, when we do the random sampling of subsets, they're less likely to be included in any one of our subsamples, meaning that more of the k nearest neighbors models are trained on data that don't include these blue points down here. And so the border is more or less correctly determined, or maybe it's better to say it's less likely to be overfit to the data. So again, introducing that bias, maybe in a problem that isn't this one, can improve the performance overall.

In general, as a rule of thumb, people tend to use bagging as a default, mainly because you do a better job of introducing that bias when you sample with replacement. When you sample with replacement, each time you pick a point for the training set of one of your base estimators, you have the original probability of picking that point at each step. Whereas without replacement, once you've removed enough points, you're more and more likely to pick up the outliers. So bagging introduces more bias than pasting. Another reason is that, in order to be effective, pasting needs a very large data set, which isn't always something you can get. With smaller data sets, the random samples tend to be nearly the same across your estimators with pasting, whereas that's not the case with bagging, because you are picking with replacement. So if you have a very large data set, or what you think is a large data set, you might wanna try pasting and then do a cross-validation to compare pasting and bagging and see which is better. But if you don't, you might want to just use bagging as your default.

And I did say at the beginning that, even though we showed this with classification, because I think it's easier to visualize and talk about, you can also do this with regression, where you choose a base regressor, and then the prediction for a particular value of X with either bagging or pasting will be the average of all of the base model predictions. So for instance, maybe you train 100 linear regressions; you get the prediction from all 100 of them, and then your prediction for the single bagging model will be the average value of all of the base regression models. Hopefully that made sense. If not, you can always try it out on your own with the BaggingRegressor and see how it works for you.
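To make that concrete, here's a minimal sketch of bagging linear regressions with BaggingRegressor; X_train, y_train, and X_new are placeholders for your own regression data, not something from this notebook.

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import LinearRegression

# Bag 100 linear regressions; X_train, y_train, and X_new are placeholders.
bag_reg = BaggingRegressor(
    LinearRegression(),   # base regressor
    n_estimators=100,
    bootstrap=True        # set bootstrap=False for pasting instead
)
bag_reg.fit(X_train, y_train)

# The ensemble prediction is the average of the 100 fitted regressors' predictions.
y_pred = bag_reg.predict(X_new)
```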
OK, so that's gonna be it for this video, and that's it for bagging and pasting, which is really what random forests are: just a very specialized version of this that people use a lot, and so it gets covered as its own model. But bagging and pasting are really just like a random forest, except instead of a decision tree we use some other kind of base model, like k nearest neighbors. So, I don't know, if k nearest neighbors had become the popular choice, maybe we would call it something like the k neighborhoods model or something, I don't know. Anyway, I hope you enjoyed this video. I enjoyed making this video and having you watch it, and I can't wait to see you next time. Have a great rest of your day. Bye.