k Nearest Neighbors Classifier Video Lecture Transcript

This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this notebook, we're gonna learn about our first classification algorithm, the k nearest neighbors classifier. Let me go ahead and share my Jupyter notebook and we'll get started.

So in this notebook, we're gonna introduce k nearest neighbors classification. We'll also introduce our first classification performance metric, accuracy, and then we'll see the iris data set for the first time.

So how does k nearest neighbors work? Well, k nearest neighbors is pretty straightforward. You put in a point that you would like to predict on, and we'll call this point X star. We then find the k closest points to X star in the training set; these k points are called X star's k nearest neighbors. The categories of each of the nearest neighbors are tabulated. So, for instance, it would count up the number of zeros and the number of ones, and if you have more than two categories, it would keep counting. Then the category that receives the most votes is what's predicted for X star. If there's a tie between two or more categories, the prediction is chosen at random from the tied classes. So, for instance, if you have two neighbors that are of category zero and two neighbors that are of category one, it would just randomly flip a coin between zero and one.

This may be easier to understand with a picture. In these pictures, the red circles are, say, class zero, the green triangles are class one, and the X denotes the point at which we're trying to predict. Here k is equal to four, to demonstrate the three possible outcomes: zero being the majority class, one being the majority class, or a tie. In this first picture, here's X star at this black X; its four nearest neighbors are the ones that are pointed to with the lines. Here its four nearest neighbors are all red circles, so we would say that this point is probably also a red circle. That's what k nearest neighbors would say. Again, here's another observation: X star is now here, and the points that are highlighted with the lines are the four closest neighbors in the training set. Here we've got three green triangles and one red circle, so green triangles are the majority class. Therefore, that's the one we would predict for X star; we would say X star must be a green triangle. Here is our final example in image form. We've got two red circles as our nearest neighbors and two green triangles as well. This is a tie, so the algorithm would flip a coin and randomly choose either red circle or green triangle.

I want to point out that in this example we were implicitly using Euclidean distance, so the standard distance formula we learn in high school. In principle, you can use any distance metric you would like; there are different options in sklearn that we're gonna learn about. You could also use weighted votes. Here we used equal weighting, so each point got 1/4 of a vote, but in practice you can also implement voting where you weight by distance, so the closer a point is to where you're trying to predict, the higher its weight would be.

So how do we implement k nearest neighbors in sklearn? Well, we're gonna demonstrate this with the iris data set.
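Before we jump into sklearn, here's a minimal by-hand sketch of the voting procedure just described. This is purely a hypothetical illustration in NumPy with made-up points and labels; none of it comes from the notebook itself.

import numpy as np

# A tiny made-up training set: 2D points with class labels 0 or 1.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]])
y_train = np.array([0, 0, 1, 1, 1, 0])

def knn_predict(x_star, X_train, y_train, k=4):
    # Euclidean distance from x_star to every training point.
    dists = np.sqrt(((X_train - x_star) ** 2).sum(axis=1))
    # Indices of the k closest training points: x_star's k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Tally the votes for each class among those neighbors.
    classes, counts = np.unique(y_train[nearest], return_counts=True)
    # Ties are broken by choosing at random among the tied classes.
    winners = classes[counts == counts.max()]
    return np.random.choice(winners)

print(knn_predict(np.array([3.0, 4.5]), X_train, y_train, k=4))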
The iris data set is one that you can find at the UC Irvine Machine Learning Repository. Each observation gives some measurements for an iris, which is a type of flower; one iris is pictured here. So each observation has sepal length, sepal width, petal length, and petal width, as well as what class it is. There are three possible classes here. In the earlier example we had two classes, but k nearest neighbors also works for three.

Here's the iris data from sklearn. So we've got sepal length, sepal width, petal length, petal width, and then the iris class. We're gonna go ahead and make our train test split now, and here are the first five observations of the training set.

So here's what the training data looks like. We have our sepal length on the vertical axis and our sepal width on the horizontal, and we've got different markers for the different types of irises: type zero are these blue circles, type one are these orange triangles, and type two are these green Xs. We could also plot petal width and petal length; so this is what it looks like with petal width and petal length. Let me go back to what I had before with the sepals. OK, so this is the data set that we're gonna use to implement k nearest neighbors. It's a very common data set that's used as a benchmark when you're comparing different classification algorithms, as well as a common learning data set.

The k nearest neighbors algorithm can be implemented in sklearn with KNeighborsClassifier. So from sklearn.neighbors we import KNeighborsClassifier, and then you can make the model object: we set knn equal to KNeighborsClassifier, and the number of neighbors is just the first number you input. As always, there are other arguments that we're not explicitly covering; for instance, there is an argument to change the weighting of the votes so they're not uniformly weighted, and the distance metric is also an argument you can set. Then we fit the model: we do iris_train with sepal length, sepal width, petal length, and petal width, and then, comma, iris_train dot species. And then we can predict on our training set as well; here I'm just gonna copy and paste to save myself the typing. Oh, it's not species, what is it? It's iris_class. OK. And so here is the prediction from this k nearest neighbors model.

One question now is: how do we know if this is any good, on the training set at least? There are a lot of different ways. The most common, or maybe the default, approach (I don't know if it's the most common in the real world, but it's one of the default approaches) is just to use accuracy. Accuracy is the proportion of all predictions that you got right. So if you made 100 predictions, it's the number of predictions out of 100 that you got right; it doesn't have to be 100, so essentially all you do is count up the ones that you got right and divide by the total number of predictions you made. Here I've coded it by hand, but you can also use accuracy_score from sklearn.metrics, which does the same thing, and you can import it without having to code it up every time. So the accuracy on the training set for this model is given as so: we call accuracy with the true values, which are iris_train dot iris_class dot values, and then the predicted values, which are given here. And actually, I'm just gonna store the predictions.
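Since the code being typed in the notebook can be hard to follow from the transcript alone, here's a rough sketch of the steps just described. The variable and column names (iris_train, iris_class, the renamed feature columns) and the random seed are my guesses at what's in the notebook, not a transcription of it.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris data as a DataFrame and rename the columns to match the lecture.
iris = load_iris(as_frame=True).frame
iris.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "iris_class"]

# Make the train test split, stratifying on the class.
iris_train, iris_test = train_test_split(iris, shuffle=True, random_state=614, stratify=iris.iris_class)

features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Make the model object; the number of neighbors is the first argument.
# Other arguments exist too, e.g. weights for non-uniform voting or metric for the distance.
knn = KNeighborsClassifier(5)

# Fit on the training features and the class column.
knn.fit(iris_train[features], iris_train.iris_class)

# Predict on the training set.
pred = knn.predict(iris_train[features])

# Accuracy "by hand": count the predictions you got right, divide by the total.
print((pred == iris_train.iris_class.values).mean())

# The same thing with sklearn's built-in accuracy_score(true, predicted).
print(accuracy_score(iris_train.iris_class.values, pred))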
And so here I have a training accuracy of 98.3 repeating percent. Now, here we're getting a hard classification, right? A hard classification is a 0, 1, or 2. In practice, it's often useful to have not just the hard classification; this algorithm can also give you the probability that an observation is a zero, a one, or a two. We can do that with predict_proba (there's a short sketch of this at the end of the transcript), and after I demonstrate it, I'll go over how it determines these probabilities. Again, I'm gonna copy and paste to save myself some time.

What's returned when you call predict_proba is a NumPy array of probabilities. The zero column represents the probability that the observation at whatever row you're at is of class zero, the one column gives the probability that it's of class one, and the two column (Python indexing) gives the probability that it's of class two. So here, for instance, k nearest neighbors gives the first two observations a probability of one of being a class one iris. The way k nearest neighbors assigns this probability is just by counting the votes. So the first two observations must have all five of their neighbors in the training set be of class one. We can see how this changes if we change the number of neighbors. If we go to 10, we can see that, for instance, the first observation changes from a probability of 1 to 80%, so eight of the 10 neighbors must be of class one and two of the 10 neighbors must be of class two. Now let's go back to five.

OK, so that's it for k nearest neighbors. Hopefully we covered everything you might want to know about the most basic aspects of it. Again, it works by looking at your k nearest neighbors in the training set, then takes a vote, with uniform weighting, of the different classes and chooses the one with the most votes. The one thing I'll point out is that a nice thing about k nearest neighbors is you don't actually have a training period per se. Remember, with linear regression we have to estimate the betas, a.k.a. find beta hat. K nearest neighbors just records the training set; it's not actually training any sort of parameters, and it's not trying to fit anything. Then for each new prediction it wants to make, it has to find those k nearest points. So for k nearest neighbors, the part of the algorithm that takes a long time is the prediction; it takes essentially zero time to train, other than however long it takes your computer to store the training set, which should be very short.

OK, so that's it for k nearest neighbors. I hope you enjoyed learning our first classification algorithm. I enjoyed having you watch this video, and I hope to see you in the next video. Have a great rest of your day. Bye.
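To round things out, here's a short sketch of the predict_proba step described above. It continues from the hypothetical knn, iris_train, and features variables in the previous sketch, so the same caveats about guessed names apply.

# Probability estimates: one row per observation, one column per class (0, 1, 2).
probs = knn.predict_proba(iris_train[features])
print(probs[:2])  # e.g. a row of [0., 1., 0.] means all 5 neighbors were class 1

# With 10 neighbors the vote fractions, and hence the probabilities, change.
knn10 = KNeighborsClassifier(10)
knn10.fit(iris_train[features], iris_train.iris_class)
print(knn10.predict_proba(iris_train[features])[:2])

# Votes can also be weighted by distance rather than uniformly.
knn_weighted = KNeighborsClassifier(5, weights="distance")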