Simple Linear Regression Video Lecture Transcript

This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video we start our regression content with the simplest model: simple linear regression. Let me go ahead and share my Jupyter notebook and we can get started. So simple linear regression is going to be our first predictive model, our first model introduced, and it is also going to be our simplest regression model. Remember, a regression problem is one in which we have a set of features, usually capital X (in this notebook it will be little x), and a set of outputs which are numeric and continuous. Again, not continuous in the mathematical sense of continuity, but a continuous set of numeric outputs. So in this notebook we will introduce the simple linear regression model, discuss and visualize the assumptions of that model, and demonstrate how to fit the model theoretically and practically. Along the way we will also define the mean squared error metric.

OK, so the model for simple linear regression is very straightforward. We assume that we have some variable y, an outcome that we want to predict, and in this setting we have only a single feature x. This is just to get us used to the idea of how a model works and how to fit it, and it serves as a framework for more complicated regression models. The form of f, remember from our supervised learning framework, for simple linear regression is as follows. In the supervised learning framework notebook we assumed that y = f(x) + epsilon, where epsilon is some random noise and f is the signal that x provides for y. In simple linear regression, f(x) = beta_0 + beta_1 x, so the equation for a line plus a random error term. Here we assume that beta_0 and beta_1 are real numbers, constants that we have to estimate, and that epsilon is Normal(0, sigma), that is, normally distributed with a mean of zero and a standard deviation of sigma. Importantly, the error term is independent of the value of x: no matter what x is, epsilon is independent of it.

We can visualize this model nicely because we only have a single input, so we have both the systematic and the error terms visualized here. The systematic part of the model is the blue line y = beta_0 + beta_1 x, and the error part of the model is these normal curves. The idea is to make them look like normal curves, but they are drawn by me, so they might not be exactly normal curves. The idea here is that for any observation (y represents some actual thing we care about) we go out into the world and draw an observation of these two variables, and it ends up on this plot somewhere. For a given value of x, we start at a baseline value on the blue line, and then some random error perturbs it off the baseline, either up or down, depending on the value of epsilon. So essentially, for every value of x, this model says that we go to the line, and the corresponding value of y is determined by taking a random noise term and adding it to the line.
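To make the generating process concrete, here is a minimal sketch of this model in NumPy. The specific values beta_0 = 1, beta_1 = 2, and sigma = 0.5 happen to match the demo data used later in this video, but the sample size and random seed are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen only for reproducibility

beta_0, beta_1, sigma = 1.0, 2.0, 0.5  # illustrative "true" parameters

x = rng.uniform(0, 1, size=100)           # a single feature
epsilon = rng.normal(0, sigma, size=100)  # noise, independent of x
y = beta_0 + beta_1 * x + epsilon         # systematic part plus random error
```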
So when these assumptions hold, there are a lot of nice features of the estimates and the predictions that we can make in the course of fitting this model. These may be touched upon in either a problem session or, this says homework, but it should say the corresponding practice problems notebook; I used to call them homework, and I've since decided to call them practice problems instead. OK. So the idea here is fitting this model. We have this nice model that, as I said, has some nice properties that we'll talk about in either a problem session or a practice problems notebook. So let's say we have n observations of pairs (x_i, y_i). How do we fit the model? How do we estimate the beta hats? Our goal, remember, is to find an estimate of the systematic part, which we're going to call f hat. So f hat is the estimate of f, and for simple linear regression that means we have to estimate beta_0 and beta_1; that is, we have to find our beta_0 hat and beta_1 hat. So how do we find the "best" estimate? The way we do that is by minimizing the mean squared error, also known as the MSE. To find our estimates for these coefficients, we minimize a loss function (or cost function), which for linear regression is the mean squared error: the squared difference between the actual value and the estimated value, summed over all observations and then averaged. In particular,

MSE = (1/n) * sum from i = 1 to n of (y_i - yhat_i)^2,

where y_i is the actual value and yhat_i is the estimated value. We square the differences because otherwise the positive errors and the negative errors would cancel each other out, and we don't use absolute values because the absolute value is not differentiable everywhere.

The MSE represents the average squared error of the estimate from the actual value. Because these units are squared, it won't be in the same units as y itself. One way to get a measurement on the same scale as y is to take the square root of the MSE, which is often done; this is known as the root mean squared error, or RMSE. Sometimes people are most interested in this because it's on the same scale, so you can think, "on average, my estimates are seven units away from the actual value." That's the idea behind the root mean squared error.

So, a little bit of calculus: we want to minimize this MSE, and minimization happens in calculus. You take the derivative of the MSE with respect to beta_0 hat and beta_1 hat, set those equal to zero, and solve for the corresponding coefficients. You find that the values minimizing the MSE are

beta_0 hat = ybar - beta_1 hat * xbar,

that is, the average value of y in your sample minus beta_1 hat times the average value of x in your sample, and

beta_1 hat = cov(x, y) / var(x),

the sample covariance between x and y divided by the sample variance of x. OK, so that's the formula. We could go out and type it up on our own, but we're going to see how to implement it in sklearn.
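As a quick illustration, here is a minimal NumPy sketch of these closed-form estimates, assuming the simulated x and y from the earlier sketch. This is just the formulas typed up by hand, not the sklearn approach we'll actually use.

```python
# closed-form least squares estimates for simple linear regression
beta_1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # sample cov / sample var
beta_0_hat = y.mean() - beta_1_hat * x.mean()

y_hat = beta_0_hat + beta_1_hat * x
mse = np.mean((y - y_hat) ** 2)  # mean squared error
rmse = np.sqrt(mse)              # root MSE, same units as y
```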
As a quick note, I did touch on this before, but I just want to clarify: mean squared error is used as the default loss function for simple linear regression. There are a lot of reasons for this that stem back to its roots as a statistical regression technique; this was the formulation when people were learning about this years and years ago, and it has followed through. But importantly, the MSE results in estimates of beta_0 hat and beta_1 hat that have a lot of nice statistical properties. The most important thing for our mathematical purposes is that the MSE is differentiable with respect to both of the beta hats, and it is a convex function. If you're unsure why that's important, check out the calculus notes on the calculus review sheet on the website. MSE is not the only loss function people may consider in these types of models; in the corresponding practice problems notebook we talk about the mean absolute error, which is another one people may use.

OK, so how do I implement simple linear regression in sklearn? As I said, you could use something like NumPy to calculate these on your own, but it will probably be faster, and good practice for us, to see how we can do it in sklearn. sklearn implements linear regression, which is an extension of simple linear regression, with the LinearRegression model object, and here is the link to its documentation. What we're going to see is that in sklearn there are things called model objects. A model object is essentially something you can think of as a black box: we know how it works, but in essence we treat it like a black box. We're going to have this object, then we give it some training data to fit things; it will calculate beta_0 hat and beta_1 hat using this training data. And once we have a fitted model object, one with an estimated beta_0 hat and beta_1 hat, we can then use that model object to quickly make predictions.

So here I'm going to make some random data. This is randomly generated data just to demonstrate how the model works; in the next notebook you'll see how to implement the predictive modeling things we've talked about (train test splits, validation sets, cross validation) with some real data. So here's my randomly generated data: x is taken uniformly at random from [0, 1], and then y is two times x, plus one, plus random normal noise with a standard deviation of 0.5.

So as I said, sklearn is the Python machine learning package. We're going to use it a lot in our notebooks, and a lot of these model objects will get used over and over again. So here we're going to learn the pattern you will use when making an sklearn model object, fitting it, and then using it to make predictions. The first thing you do is import the model class. As we saw from the documentation link, it's stored in the linear_model subpackage, so: from sklearn.linear_model import LinearRegression. This will now allow us to make a LinearRegression model object. So how do we do that? Well, we call slr (that's the variable I'm going to store it in) is equal to LinearRegression, and then I'm going to put in the argument copy_X=True.
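Following along, a minimal version of this step might look like the following, reusing the simulated x and y from the earlier sketch (which match this video's 2x + 1 setup with noise standard deviation 0.5):

```python
from sklearn.linear_model import LinearRegression

# make the model object; copy_X=True makes a hard copy of the features before fitting
slr = LinearRegression(copy_X=True)
```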
So what this argument does is make sure that we make a hard copy of the X array, the features, prior to fitting the model. This is just a safety precaution to make sure that Python doesn't do anything weird and overwrite the features that were there before through some sort of weirdness in the fitting process. I just like to do this to be safe. So this will make sure that whatever features go in are not going to get altered at all by the fitting process. So here's what it looks like: a LinearRegression object looks like this when you call it.

OK, once you have imported the model and made a model object stored in a variable, you can fit the model. To fit the model, you call the model variable, which we're calling slr, and then you call fit. We first put in our features, so x. Now note: for sklearn to work, your feature array has to be 2D; we can think of this as having to put in a column vector. Currently, if we look at x, we can see it's a 1D array, and if I look at the shape, we can confirm it's 1D. What we need to do is turn it into a 2D array, so we're going to use reshape. If you're unfamiliar with what reshape does, or you're confused about what we're doing here, check out the Python prep numpy notebook; I talk about reshape in there. But essentially, reshape(-1, 1) is going to take my 1D array, which we can think of as a row vector, and turn it into the corresponding 2D column vector that we would get if we did a transpose. So here is x, and here is x.reshape(-1, 1): the same entries, just arranged as a column, and so forth.

So the features, for LinearRegression and I believe for most of the model objects, always have to be a 2D array; that's why we put in this reshape. The output can be a 1D array; sklearn doesn't care. So we first put in the features, and then we put in the output. Now, if we see this and no errors, that means we have successfully fit our model object and we're ready to make predictions. We make predictions by calling our model, then predict, and then we just have to put in an array of input values. So we could do x again; this gives the predictions on x, and here are all of our predictions. So the first value of x is predicted to be 2.9. And again, the input has to be a 2D array, so we have to use that reshape for simple linear regression.

So these are the basic steps for any sklearn model that we're going to work with. We'll see some extra steps we might have to do for data preprocessing, but models in sklearn typically have this form: you import the model class, you make a model object, and you fit the model object, making sure your features are 2D arrays. Again, this is something we have to do here because x is 1D; in future notebooks we will be dealing with X that is already 2D, and we will not need reshape. I'll make sure to mention it at that time. And then you can predict. The nice thing about a lot of these models is that they also have built-in attributes that are unique to the model you're working with. So, for instance, now that we have fit the linear regression, we can call and look at the intercept, right?
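Putting those steps together, a minimal sketch of the fit and predict pattern, continuing with the slr, x, and y from above, might be:

```python
# sklearn expects a 2D feature array, so reshape the 1D x into a column vector
slr.fit(x.reshape(-1, 1), y)

# predictions on the training inputs; the same reshape is needed here
preds = slr.predict(x.reshape(-1, 1))
print(preds[:5])  # first few predicted values
```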
So this is my estimate for beta_0, beta_0 hat. I can look at the estimate for beta_1 hat by calling slr.coef_, and you'll notice that this is an array within an array because, as you may have guessed, in future notebooks we'll have models with more than just a single beta_1: beta_1, beta_2, and so on. And then we can also plot the line, so here we're going to take advantage of predict to plot the line from 0 to 1 for our model. OK, so these were the sample points, and the black line is the model line, our f hat. And if we wanted to, let's see, let me just take this down real quick. This was our... no, I didn't copy it, let me copy it. I just want to plot the actual line with it. There we go, and then I'll plot the actual line. We can do plt.plot and np.linspace, so an evenly spaced array from 0 to 1 with 100 equally spaced points, then two times np.linspace(0, 1, 100) plus one, and let's make this red and dotted, with the label f(x). OK, so here is our model, the black solid line, and the red dotted line is the actual f. We're pretty close; it doesn't do as well up here as it does down here.

All right, so you now know the basics of simple linear regression. You know the theoretical model, you know how to fit the model, you know the assumptions behind the model, and we learned about mean squared error. You also know how to fit the model with a LinearRegression model object in sklearn. So that's a lot of stuff for one video. In the next video, if you continue on in order, we're going to see how to implement simple linear regression as our first example of a predictive modeling problem. We'll see all that stuff about train test splits that we just spent a lot of time talking about; we'll see it in action in the next video. All right, I hope you enjoyed this video, I hope to see you in that next video, and I hope you have a great rest of your day. Bye.
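For reference, here is a minimal sketch of the coefficient inspection and plotting steps from this video, continuing with the fitted slr and the simulated data from the earlier sketches; the styling choices are mine.

```python
import matplotlib.pyplot as plt

print(slr.intercept_)  # beta_0 hat
print(slr.coef_)       # beta_1 hat, stored inside an array

xs = np.linspace(0, 1, 100)  # 100 evenly spaced points on [0, 1]
plt.scatter(x, y, label="sample points")
plt.plot(xs, slr.predict(xs.reshape(-1, 1)), "k-", label="model line (f hat)")
plt.plot(xs, 2 * xs + 1, "r--", label="actual line f(x)")
plt.legend()
plt.show()
```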