Boosting Video Lecture Transcript

This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video we're going to continue learning about ensemble learning by giving a brief overview of what boosting algorithms are like and what the point of them is. In this video and notebook we'll show you another common ensemble learning approach, one that involves compiling a large collection, or ensemble, of weak learners. We'll introduce the concepts of weak learners and strong learners, and then I'll give a general description of what a boosting algorithm is.

Boosting is a very powerful technique. It's one of the approaches used in a number of winning data competition entries on sites like Kaggle. The theory behind boosting comes from a topic of study known as PAC learnability, which is a sub-field of statistical learning. Here the acronym PAC stands for "probably approximately correct," and at that link you can see a Wikipedia entry on what that means. We're only going to touch lightly on the theory, just to give you an overview of what's going on, and then we'll show you a couple of actual algorithms in later videos and notebooks.

So what is a weak learner? Recall that things like regression and decision trees are statistical learning algorithms. A weak learner is a statistical learning algorithm that does only slightly better than random guessing. For instance, in a binary classification problem with a roughly 50-50 class split, a weak learner might achieve 52 or 55% accuracy, whereas random guessing should achieve about 50% accuracy. That's the idea of a weak learner: it does slightly better than random guessing, in a sense made precise by some probability theory statements that we're not going to cover, but that you can see at that link. By contrast, a statistical learning algorithm is called a strong learner if it can be made arbitrarily close to the true relationship, under some conditions on the training sample size.

In general, making a weak learner is relatively easy. The hard part is making a strong learner. But there is a theorem showing that if a problem is weakly learnable, meaning you can prove the existence of a weak learner for that particular problem, then it must also be strongly learnable. So if we can go through this process and show, "hey, I can make something that does slightly better than random chance," that means there is some strong learner algorithm out there in the world; we just need to figure out what it is. The fact that this theorem exists led to the creation of an entire class of algorithms that try to build strong learners, and that is the idea behind boosting.

In practice, a common example of a weak learner algorithm is known as a decision stump, which is a decision tree with a single layer. Think of a decision tree that just splits the data space in half in some sense, making a single cut; that is a decision stump.
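To make that concrete, here is a minimal sketch, purely an illustration and not code from the course notebooks, of fitting a decision stump in scikit-learn. The synthetic dataset and parameter choices are assumptions made just for this example.

```python
# Minimal sketch (illustrative only): a decision stump as a weak learner.
# Assumes scikit-learn is installed; the synthetic dataset is just for demonstration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A roughly 50-50 binary classification problem
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A "decision stump" is just a decision tree limited to a single split
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X_train, y_train)

# A single split limits how well the stump can do; on a hard problem its accuracy
# may sit only slightly above the ~50% you would expect from random guessing
print("stump accuracy:", stump.score(X_test, y_test))
```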
So what is the idea behind boosting? Boosting takes an ensemble of weak learners. We could think of this as some arrangement of decision stumps, arranged differently than in a random forest, and then, by combining this ensemble of weak learners, it is somehow able to produce a strong learner. That's a good leaving-off point for this brief overview of the theory.

Like all of our ensemble learning techniques, boosting can be used for either regression or classification. So, for example, if we're going to make a booster out of decision stumps, we can have those decision stumps built for classification purposes or for regression purposes.

We're going to cover two specific boosting algorithms. The first is called adaptive boosting, which we'll see in a later notebook. The second is called gradient boosting, which we will see in two later notebooks and videos; gradient boosting has both an sklearn implementation and a nice package called XGBoost, which we will also cover. A short code sketch previewing both follows this transcript.

OK, so now you know what boosting is and you have a brief idea of the theory behind it. I can't wait to see you in the videos where we actually show you how to do it. I hope you enjoyed this video; I enjoyed having you here. Have a great rest of your day. Bye.
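As a preview of those two algorithms, here is a minimal sketch, again just an illustration under assumed data and settings rather than the notebooks' actual code, using scikit-learn's adaptive boosting and gradient boosting classifiers, both of which combine many small trees as weak learners.

```python
# Minimal preview sketch (illustrative only) of the two boosting algorithms
# discussed above, using scikit-learn's implementations.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Adaptive boosting: scikit-learn's default base estimator is a decision stump
# (a depth-1 tree), so this is literally an ensemble of decision stumps
ada = AdaBoostClassifier(n_estimators=200, random_state=0)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))

# Gradient boosting: shallow trees added one at a time, each one fit to the
# errors of the ensemble built so far
gbm = GradientBoostingClassifier(n_estimators=200, max_depth=1, random_state=0)
gbm.fit(X_train, y_train)
print("Gradient boosting accuracy:", gbm.score(X_test, y_test))

# The XGBoost package (covered in a later notebook) offers a similar
# scikit-learn-style interface via xgboost.XGBClassifier.
```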