Multiple Linear Regression Video Lecture Transcript

This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video we're gonna continue our work on regression with an extension to simple linear regression called multiple linear regression. Let me go ahead and share my Jupyter notebook and we'll get started. So a lot of times you're gonna be dealing with problems where you want more than just a single feature to predict some output. That's the whole idea behind multiple linear regression. So we're going to introduce the theoretical model, we'll show you how to fit the model by hand with something called the normal equation, we'll use that equation to fit a sample model in NumPy, and then at the end we'll show you what you'll probably end up doing most of the time, which is fitting the model with scikit-learn.

So in the multiple linear regression model, we're going to suppose that there is some quantitative variable that we want to predict or model or explain, called y. There's gonna be a set now of m features, x_1, x_2, all the way up to x_m, and the multiple linear regression model regressing y on those features is y equals beta_0 plus beta_1 times x_1 plus beta_2 times x_2 plus dot dot dot plus beta_m times x_m plus some random noise epsilon. Now, we can write this a little bit more succinctly as y equals X times beta plus epsilon, where beta_0 through beta_m are real constants that we're gonna estimate, stored in an (m+1) by 1 vector. So it's a column vector of length m+1 with the beta_i listed in numerical order: beta_0 at the top, then beta_1, then beta_2, and so on. And capital X is a matrix with m+1 columns. The first column, or we can think of it as the zeroth column, is all ones, and the remaining columns are x_1 through x_m, again in numerical order. We assume epsilon is a normal random variable with mean zero and standard deviation sigma. This is an error term, and again we are assuming it's independent of X.

So now we have this model. How do we fit it in practice? We go out into the world and collect n observations of pairs x^i and y^i, where the i is in the superscript. In order to fit the model, we're again going to want to minimize the mean squared error, which, as a reminder, is one over n times the sum from i equals 1 to n of the actual value minus the predicted value y^i hat, squared. So that's the squared error part, and the one over n is the mean part. We can plug in what the prediction actually is: one over n times the sum of the actual value at i minus the predicted value at i, which is x^i times beta hat, the estimate of beta that we're trying to optimize, and then again squared. Using some linear algebra, you can rewrite this as the highlighted expression, which we can then differentiate with respect to beta hat. And again, it's OK if you don't know how to take the derivative of this expression with respect to beta hat; just know that that's what we do in order to find the beta that gives us the minimal MSE. We then set the derivative equal to zero and solve for beta hat. That gives us the equation I'm highlighting over here: beta hat is equal to the quantity X transpose X, inverse, times X transpose times y.
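Since the expressions highlighted on screen aren't visible in the transcript, here is a sketch of the model and the normal-equation derivation in standard notation. The matrix form of the MSE is the usual one implied by the discussion above, with superscript (i) indexing observations:

```latex
% Multiple linear regression model, as described in the lecture.
\[
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m + \varepsilon
  = X\beta + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2)
\]
% Mean squared error over n observations, rewritten in matrix form:
\[
\mathrm{MSE}(\hat{\beta})
  = \frac{1}{n}\sum_{i=1}^{n}\bigl(y^{(i)} - X^{(i)}\hat{\beta}\bigr)^2
  = \frac{1}{n}\,(y - X\hat{\beta})^{\top}(y - X\hat{\beta})
\]
% Setting the derivative with respect to beta hat equal to zero gives the normal equation:
\[
\frac{\partial}{\partial \hat{\beta}}\,\mathrm{MSE}
  = -\frac{2}{n}\,X^{\top}(y - X\hat{\beta}) = 0
\quad\Longrightarrow\quad
\hat{\beta} = \bigl(X^{\top}X\bigr)^{-1}X^{\top}y
\]
```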
So this approach is the same exact thing that we did for simple linear regression, but now with more variables. This is the ordinary least squares estimate of the coefficient vector beta, and the equation that we've highlighted here is also known as the normal equation. So let's get back to baseball and see how we can put this into action in practice. I'm loading in the data and making my train test split. We're gonna fit a multiple linear regression model using this data: W is equal to beta_0 plus beta_1 times R plus beta_2 times RA plus epsilon. If you don't remember from the last notebook, beta_0 is just our constant coefficient, R is the number of runs that a team scores in a baseball season, RA is the number of runs that a team allows in a baseball season, and W is the number of wins that the team gets in that given season.

So what we can do is make our X. This is going to be an n by 3 matrix. The first column of X will be all ones; that's why we use np.ones. And then the next two columns of X are going to be the runs and the runs allowed. Then we make a y_train to correspond with it, to keep our notation consistent. We can now use NumPy's linear algebra subpackage to calculate the normal equation by hand. So beta hat is gonna be equal to, well, we need np.linalg dot... inv, I think, let's check. OK, yep, inv. So np.linalg.inv in np.linalg is the matrix inverse, and then we will do X_train.transpose(). So we're using this normal equation here. Now we need to multiply it with X_train. To multiply matrices we use the dot function: X_train transpose dot X_train, then the inverse of that. We're gonna multiply that with just X_train transpose, and then we multiply that with y_train. Let's see here. Oh, I forgot the dot. Here we go. And so here are our estimates for beta_0 hat, beta_1 hat, and beta_2 hat. So basically what we're saying is that, on average, wins are about 84.08 plus 0.097 times runs minus 0.101 times runs allowed. We can now make predictions using this by just doing beta_0 hat plus beta_1 hat times runs plus beta_2 hat times runs allowed. And the MSE on the training set for this is 16.94543, et cetera, et cetera.

OK, so now we're gonna end this notebook by showing how simple this is compared to coding it up with the normal equation and NumPy sort of by hand. We can just use LinearRegression, the same model object that we used for simple linear regression. Why don't you pause the video and try to see if you can do the rest of the code on your own, or you can just watch me do it. So from sklearn.linear_model we import LinearRegression. We've made a linear regression model here, and I'll point out that I included the argument fit_intercept=False. The reason we put fit_intercept=False this time, where we haven't before, is that our training values for X now have a column of ones in them, and that column of ones automatically takes care of the intercept for us. So we don't need to additionally fit the intercept; that would mess up the model. So now we do reg.fit, and I'm gonna note that I do not have to use reshape here because X_train is a two dimensional array.
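For reference, here is a minimal sketch of the by-hand normal-equation computation described above. The file name, the column names R, RA, and W, and the train/test split settings are assumptions, since they aren't visible in the transcript; only the overall recipe follows the video.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data load and split; the actual file name, columns, and
# split parameters come from the course's baseball dataset and may differ.
baseball = pd.read_csv("baseball.csv")
train, test = train_test_split(baseball, test_size=0.2, random_state=440)

n = len(train)

# Design matrix: a column of ones for the intercept, then runs (R) and runs allowed (RA).
X_train = np.ones((n, 3))
X_train[:, 1] = train["R"].values
X_train[:, 2] = train["RA"].values
y_train = train["W"].values

# Normal equation: beta_hat = (X^T X)^(-1) X^T y
beta_hat = np.linalg.inv(X_train.transpose().dot(X_train)).dot(X_train.transpose()).dot(y_train)
print(beta_hat)  # roughly [84.08, 0.097, -0.101] on the training split used in the video

# Predictions and training MSE, computed by hand.
y_pred = X_train.dot(beta_hat)
mse = np.mean((y_train - y_pred) ** 2)  # about 16.945 in the video
```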
So we can take a second, actually, and look at the shape of X_train. This is a two dimensional array, so I don't have to do reshape. Remember, reshape is only needed if our features are one dimensional. So X_train, y_train, and then I can get rid of this code chunk. So here are the coefficients, and we can check: beta_0 hat is 84.08, 84.08. Beta_1 hat is 0.097, 0.097. And beta_2 hat is negative 0.101, negative 0.101. So we can make predictions like we've been doing all along on the training set with reg.predict(X_train), and then we can calculate the MSE. Remember, in the last video, if you watched that one, I used mean_squared_error; here I just did it by hand for some reason. But here's what the prediction looks like, and the MSE should be the same: 16.945, 16.945.

OK, so that's it for the multiple linear regression model. We'll see some more extensions of this that allow us to do things with categorical variables, et cetera. So this is the most basic introduction to multiple linear regression. It's basically the same exact model, but now with more input variables, and we use the same process to get the estimate. But now we have more features, so we have to use linear algebra as opposed to just using straight calculus like with simple linear regression. OK, so that's it for this video. I hope you enjoyed learning about multiple linear regression. I will see you in the next video, where we continue to learn about regression. All right. Bye. Have a great rest of your day.
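And, for reference, a minimal sketch of the scikit-learn version described at the end of the video, assuming the X_train and y_train arrays built in the earlier sketch; the printed values are the ones read out above.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# fit_intercept=False because X_train already contains a column of ones,
# so the intercept is handled by the first column of the design matrix.
reg = LinearRegression(fit_intercept=False)

# X_train is already two dimensional, so no reshape is needed.
reg.fit(X_train, y_train)

print(reg.coef_)  # roughly [84.08, 0.097, -0.101], matching the normal-equation fit

# Training predictions and MSE; the video computed the MSE by hand,
# but mean_squared_error gives the same number (about 16.945).
y_pred = reg.predict(X_train)
print(mean_squared_error(y_train, y_pred))
```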