GridSearchCV Video Lecture Transcript

This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video we're gonna build upon what we've learned in all of our supervised learning content and talk about GridSearchCV, which is a nice little function for performing hyperparameter tuning with cross-validation. So let me go ahead and share my Jupyter notebook and we'll get started.

In this notebook we're gonna learn about GridSearchCV. In this order, we'll set up a synthetic regression example as our motivating example, we'll refresh ourselves on what it means to tune hyperparameters using cross-validation, we'll demonstrate what we could do by typing up a K-fold for loop, and then we'll show you how that compares to doing the same thing with GridSearchCV.

Over the course of all of our supervised learning content, which includes regression, classification, and time series analysis, we would most often talk about hyperparameter tuning, and I would say, oh, well, if you want to find the value that works best for your problem, you would need to do some sort of tuning with a grid search with cross-validation. When we say hyperparameters here, think about the alpha term in ridge or lasso regression, or even the number K in K nearest neighbors. To tune these hyperparameters, you set up a grid of potential values that goes from some minimum, for example K = 1 in K nearest neighbors, up to some maximum, like K = 50. This is just an example. You would go up in steady increments, or scale in powers of 10, say, if you're doing hyperparameter tuning for ridge or lasso regression. The idea is that you run the model through the cross-validation splits with all of these different grid point values, record the performance on the holdout set, and then choose the one that has the best performance, which in the case of regression is the lowest MSE.

So we're gonna show you how to do this. We've talked about it before; we're gonna demonstrate how to do it first with our old school KFold and a for loop, and then we'll show you a nice quick and easy way to do it with GridSearchCV.

I wanna take a moment to pause and say thank you to a TA that we had from our spring 2022 boot camp who showed me this function and demonstrated how to use it. I had heard about it before but kept forgetting about it. So I wanna thank Gleb, a TA we had in the spring 2022 boot camp, who pointed this out and provided some of the code that was the baseline for this notebook.

We're gonna run a synthetic example, and what I mean by that is I'm going to generate some random data to set up a regression problem, and then we're going to use K-fold cross-validation to tune the hyperparameters of a K nearest neighbors regression model. Remember, this regression model takes the K nearest neighbors within the data space and then averages them to get the prediction. In sklearn that's KNeighborsRegressor.
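As a point of reference, here is a minimal sketch of what a synthetic regression setup like this could look like. The exact data-generating process isn't shown in the transcript, so the noisy sine curve, the sample size, and the variable names X and y below are illustrative assumptions, not the notebook's actual code.

```python
import numpy as np

# Illustrative synthetic regression data (the notebook's actual setup may differ):
# a single feature and a noisy sine-curve target.
rng = np.random.default_rng(216)

X = rng.uniform(0, 10, size=(300, 1))             # 300 observations, one feature
y = np.sin(X).ravel() + rng.normal(0, 0.3, 300)   # target = sin(x) + Gaussian noise
```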
For this we're gonna look at K, which in sklearn is called n_neighbors, and there's also an argument called weights. In K nearest neighbors, when we learned about it, we were using uniform weights, meaning each observation got the same weight in the average. There's also an option called distance, which weights the observations in the average by the inverse of their distance to the point, so the points closest to the point you're trying to predict on have a higher weight in the weighted average than those further away. So we're gonna set up a grid for the number of neighbors and also a grid for the value of weights, which can be either uniform or distance.

Here we just import the stuff we need. For the K-fold method we need first the regressor, then we need KFold, and then finally the function that lets us compute the mean squared error. For the number of neighbors, our grid is gonna go from 1 to 50 in steps of one, so we'll test K = 1, K = 2, 3, 4, 5, and so on. For weights, we're gonna look at the effect of using uniform weights for each value of n_neighbors or distance weights for each value of n_neighbors.

I'm not gonna make you sit through and watch me code up this for loop, but here it is. The possible values for our number of neighbors come from range(1, 51), and our weights come from a list of either uniform or distance. We make an array to hold the output MSEs: it has 5 in the first dimension because that's the number of cross-validation splits, then the length of n_neighbors, which is the number of neighbor values we test, and then the length of our weights, which is the number of weighting schemes we use. Then we make our KFold object and loop through the cross-validation splits. Each time through, we get our training set and our holdout set. Then we loop through all the possible values for the number of neighbors, and then through all the possible weighting schemes we're gonna consider for the K nearest neighbors regression. For each combination of the split, the number of neighbors, and the weighting scheme, we make a new KNeighborsRegressor object, fit it on the training data, get a prediction on the holdout data, and record the mean squared error on that holdout data.

So now we can see the array that resulted. It's a three-dimensional array where each of the splits has the MSE on the holdout set for each number of neighbors and each weighting. So for instance, this would be the MSE for the first holdout set with one neighbor and uniform weighting, and this would be for one neighbor and distance-based weighting. Then we want the average value across all of the splits, so you call np.mean(cv_mses, axis=0). This is what that looks like, and we can see that for both weights, as we increase the number of neighbors, performance starts to improve. Now we wanna find where the minimum occurs among all of these averages, so we call .argmin() and then np.unravel_index. I'll leave it to you to figure out what that does if you haven't seen it before.
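Here is a minimal sketch of what that hand-coded loop could look like. The variable names (cv_mses, X, y) and the shuffle/random_state choices are assumptions on my part; the notebook's actual loop may differ in the details, but the structure of the three nested for loops is what the transcript describes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

n_neighbors = range(1, 51)                 # grid for K
weight_options = ["uniform", "distance"]   # grid for the weighting scheme

n_splits = 5
# One MSE per (split, K, weighting) combination.
cv_mses = np.zeros((n_splits, len(n_neighbors), len(weight_options)))

kfold = KFold(n_splits=n_splits, shuffle=True, random_state=216)

for i, (train_index, holdout_index) in enumerate(kfold.split(X)):
    X_tt, X_ho = X[train_index], X[holdout_index]
    y_tt, y_ho = y[train_index], y[holdout_index]
    for j, k in enumerate(n_neighbors):
        for m, w in enumerate(weight_options):
            knn = KNeighborsRegressor(n_neighbors=k, weights=w)
            knn.fit(X_tt, y_tt)
            pred = knn.predict(X_ho)
            cv_mses[i, j, m] = mean_squared_error(y_ho, pred)

# Average across the splits, then locate the best (K, weighting) pair.
avg_mses = np.mean(cv_mses, axis=0)
best_j, best_m = np.unravel_index(avg_mses.argmin(), avg_mses.shape)
print(n_neighbors[best_j], weight_options[best_m], avg_mses[best_j, best_m])
```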
This is just gonna look at this vector, given this shape, and show you which entry the argmin gives you the index for. So the entry of this 2D array that has the minimum value is in the seventh row and the zeroth column. This is the minimum average cross-validation MSE, 1.8577. How many neighbors was that? Eight. And the weighting scheme that produced it was the uniform weighting scheme.

So this example was relatively straightforward if you've seen the other content in supervised learning, which I'm assuming you have if you're watching this video. It involved a two-dimensional grid search, where the two dimensions were the number of neighbors and the weighting scheme, and it wasn't too difficult to write up. It just involved three for loops in total: one for the cross-validation split, one for the number of neighbors, and one for the weighting. But what if we wanted a more complicated model? We actually could have tried more things for K nearest neighbors: we can see that there's an option called algorithm, options for leaf size, et cetera. And there are other model types we may have considered, like a random forest regressor, which takes even more hyperparameters. So at some point explicitly writing out a for loop becomes tedious just to type out, and very difficult to debug because of the number of levels of for loops you have to keep track of.

Luckily, sklearn provides a model selection object called GridSearchCV. GridSearchCV does exactly the thing that we just coded up by hand, but you have to code zero for loops. GridSearchCV is great for these sorts of hyperparameter tunings because it allows you to look at a wide array of different hyperparameter values without ever having to write a for loop to test the model on all of them. So we're gonna show you how to implement this in Python.

From sklearn.model_selection you import GridSearchCV. Then when you make a GridSearchCV object, you first call GridSearchCV, then you put in an empty model object, for us that's KNeighborsRegressor(). Note that you need the parentheses because it has to be an object, not the entire class. Next you put in an argument called param_grid, which is a dictionary where you put the string name of the input you'd like to change along with the grid you would like it to consider. So for instance we would do 'n_neighbors', and again it should be a string, 'n_neighbors': range(1, 51), because we want the number of neighbors to range from 1 to 50. Then we do 'weights', and we want it to go between either uniform or distance. After param_grid, the next argument is scoring, and this has to be a string. This is gonna look weird, but trust me, we're gonna explain it in a second: we put in 'neg_mean_squared_error', and we'll talk about why we use negative mean squared error in a little bit. The last argument you can put in is the argument for the cross-validation, so you can put in the number of splits you'd like here: cv=5 will do five-fold cross-validation. Five is the default number of splits, so by default you'll do five-fold.
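Putting that together, here is a short sketch of the GridSearchCV setup the transcript describes. The variable names grid_cv, X_train, and y_train are assumptions; the transcript only says the training set was passed in, not what it was called.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# The same two-dimensional grid as the hand-coded loop above.
param_grid = {
    "n_neighbors": range(1, 51),
    "weights": ["uniform", "distance"],
}

grid_cv = GridSearchCV(
    KNeighborsRegressor(),               # an instance, not the class itself
    param_grid=param_grid,
    scoring="neg_mean_squared_error",    # sklearn scorers follow "higher is better"
    cv=5,                                # five-fold cross-validation (the default)
)

grid_cv.fit(X_train, y_train)
```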
If you wanted to do 10, you would do cv=10; we're gonna stick with five. After you've defined this and stored it in a variable, you call fit in the exact same way that you did before, just like it's a model. When I call this, what's gonna happen is it's going to loop through all of the different hyperparameter pairings defined by this grid, fit a KNeighborsRegressor model on all of the different cross-validation splits, record the score for each of those splits, and then tell us which one performed best. And sorry about that earlier: the argument should be scoring, not score.

Once we've called fit, it's done the equivalent of that series of for loops we made above, and we can access the information we want from it in the following way. For instance, you can find the hyperparameter combination that gave you the best mean squared error like so: you call the variable name of the cross-validation search and you do .best_params_. For this model, the number of neighbors was 10 and the weights were uniform. And I'm preemptively guessing some of you are wondering why this is different from what I got above; we'll talk about that in a second. You can also get the negative mean squared error that goes along with this best parameter set with .best_score_. This is the average cross-validation MSE with a negative sign, so in this case the best one is the largest one: with negative numbers, the one with the smallest magnitude is the largest, right?

So that's how you get specific things. What if you want to look at all the results? A nice feature of the for loop approach, you might be thinking, is that we ended up with an array we could actually look at and plot things from. You can get the same thing here with the cv_results_ attribute. It tells you how long it took to fit, the standard deviation of that fit time for each parameter combination, the average time to get the score (meaning to compute the mean squared error), and the standard deviation of that distribution, all across the five cross-validation splits. Here are the parameter combinations that were looked at, here are the scores for each of those combinations on the first split, the second split, the third split, the fourth split, and the last split, and here is the mean test score along with the standard deviation of that test score. And then the scores are ranked for us from the worst performance to the best performance, so this one would reflect the performance we're looking at right here.

A nice thing is that by default, it will look at all of the models that you've considered and then fit the best one on the entire data set. What do I mean by the entire data set? Notice that we put in the training set when we called fit. So after it found which combination did best in cross-validation, it then refit the model on the training set, which is what we fed into it. You can get that refit model by calling .best_estimator_, and you can see that it's a KNeighborsRegressor model with 10 neighbors, which you can then use to make predictions in the exact same way, so best_estimator_.predict(X_train).
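Here is a short sketch of pulling those results out of a fitted search, under the same assumed grid_cv name. Wrapping cv_results_ in a pandas DataFrame is just one convenient way to inspect it, not necessarily what the notebook does.

```python
import pandas as pd

# Best hyperparameter combination found during cross-validation.
print(grid_cv.best_params_)      # e.g. {'n_neighbors': 10, 'weights': 'uniform'}

# Best (i.e. largest, least negative) mean cross-validation score.
print(grid_cv.best_score_)       # negative mean squared error

# Full table of per-split results, fit/score times, means, and rankings.
results = pd.DataFrame(grid_cv.cv_results_)
print(results[["params", "mean_test_score", "std_test_score", "rank_test_score"]].head())

# The best model, refit on the full training set that was passed to .fit().
best_knn = grid_cv.best_estimator_
predictions = best_knn.predict(X_train)
```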
So here are all the predictions from this model, which has already been trained on the training set.

Before we end this notebook, I'm assuming you have a couple of questions. The first one I'm thinking you have is: why did we use negative mean squared error instead of mean squared error? This is primarily because of the way sklearn was programmed. Its scorers are set up so that a higher score is always better, so it provides negative mean squared error and not plain mean squared error. This is just a choice the developers of sklearn made. You can see all the potential scoring options by running this code, and you can go through and see things like explained variance, the R squared score, which I'm sure some of you are familiar with, negative mean absolute error, negative mean absolute percentage error, and the one we used, negative mean squared error. You can go through and look at all of these; they have them for regression like above, and down here you can start to see things for classification like recall, precision, and so forth.

Another thing you may have noticed from above was that the best model was not the one we found earlier. The one we found earlier had eight neighbors and uniform weighting; here it's 10 neighbors. We can see, though, that this performance is pretty close to the one we got above (just imagine putting a negative sign on it), the one from the hand-coded for loop. The reason this is happening is probably that GridSearchCV did not use the same cross-validation splits as the for loop we wrote above. What we can do to fix that is change the argument we put into cv: instead of putting a 5, we can put in the exact cross-validation object we wanna use. So we'll copy that KFold object and paste it in, and now the search will use that exact KFold object instead of running its own in the background after you give it a number of splits (there's a short sketch of this after the sign-off below). Once this is done, we can see the best parameters, eight neighbors with uniform weighting, and we can also see the best score, which is the same exact score we had before, just with a negative out front.

This is a nice feature because by default, for a regression problem like this, GridSearchCV does not do stratified K-fold. So if you wanted a stratified K-fold split, you would put a StratifiedKFold object here. It also would not, by default, do a time series split, so if you want to do time series cross-validation, you can use TimeSeriesSplit here instead of just putting a number.

So you now have a good base understanding of GridSearchCV. This is a nice little feature of sklearn that should save you coding time, now that you have a good, strong, fundamental understanding of regular cross-validation from coding up all those for loops. If you'd like to learn more, I encourage you to check out the documentation, which I've linked to here. I hope you enjoyed learning about GridSearchCV. I enjoyed having you watch this video and I can't wait to see you in the next video. Have a great rest of your day. Bye.
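For reference, here is a sketch of those last two points: listing the available scoring strings and passing an explicit cross-validation object. The get_scorer_names helper is how recent scikit-learn versions expose the scoring list (older versions used a SCORERS dictionary instead), and the kfold object is assumed to be built the same way as the one in the hand-coded loop so the splits match.

```python
from sklearn.metrics import get_scorer_names
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor

# All the scoring strings GridSearchCV accepts ('neg_mean_squared_error', 'r2', ...).
print(get_scorer_names())

# Reuse the exact same KFold object as the hand-coded loop so the splits match.
kfold = KFold(n_splits=5, shuffle=True, random_state=216)

grid_cv = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": range(1, 51), "weights": ["uniform", "distance"]},
    scoring="neg_mean_squared_error",
    cv=kfold,  # any CV splitter works here, e.g. StratifiedKFold or TimeSeriesSplit
)
grid_cv.fit(X_train, y_train)
print(grid_cv.best_params_, grid_cv.best_score_)
```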