Basic Pipelines Video Lecture Transcript
This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video we're going to continue to learn about some cleaning steps with basic pipelines. So let me go ahead and share the Jupyter notebook and we'll get started.

In this notebook we're going to introduce the concept of a pipeline, review scikit-learn's Pipeline object, and then, as a step along the way, show you how to fit a polynomial regression using a PolynomialFeatures transformer as well as a pipeline.

So what is a pipeline? If you've watched all the videos in sequence up to this point in the regression content, we've done a little bit of data preprocessing, including scaling, which we learned about in the data cleaning and scaling video. We had to create some new features out of existing features in the polynomial regression video, where we made squares as well as interaction terms. And we've had to do one-hot encoding, which creates dummy variables out of categorical variables. The concept of a pipeline is a nice framework for combining all of those steps, as well as fitting a model, into a single container that we can call. Essentially the goal of the pipeline is to have something stored in a variable where we can say pipe.fit and pipe.predict. That makes our code easier to write and easier to read, both for someone who comes along later and for ourselves when we come back.

Here's a visualization of this. Essentially we take our data and it goes into one end of the pipeline. Maybe it gets scaled, maybe some of the columns are transformed, like making an x1 squared, an x1 times x2, and a square root of x2. Maybe we make some one-hot encodings, we do some other things, and then we have a model at the end that gets fit and used to predict. Out the other side comes the transformed data, which was used to fit a model, which we then use to make predictions. So predictions and a fit model come out on the other side.

We're going to implement this in Python using an example problem with some data that we create, and along the way we'll learn about PolynomialFeatures and the Pipeline object. We have a data set where I'm going to fit a polynomial: here's y on my vertical axis and x on my horizontal axis, and I'm going to show you how we can build a pipeline to fit a polynomial regression model to this data.

So first we're going to learn about polynomial features. If we're going to do polynomial regression, we're going to have to make some polynomial features; as we know from the polynomial regression video, that includes making squares or cubes as well as interactions. PolynomialFeatures is what is known as a scikit-learn transformer object, which is quite similar to the scaler objects that we talked about in the data scaling video. It takes in a two-dimensional numpy array and returns a numpy array with columns corresponding to all of the relevant polynomial transformations of a given degree. Essentially, if we were to say to PolynomialFeatures that we want all the transformations of degree two, and our array has a column that's x1 and a column that's x2, what we should get back is a column that's x1, a column that's x2, and then also columns for all the degree-two transformations: x1 squared, x1 times x2 (which is also degree two), and finally x2 squared. I want to point out that this is an illustration, and PolynomialFeatures itself may return these in a slightly different order, but that's the idea: in some order, x1, x2, x1 squared, x1 times x2, and x2 squared would be returned by PolynomialFeatures.
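As a quick illustration of that idea, here is a minimal sketch; the tiny two-column array is made up purely for demonstration and is not the notebook's data.

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    # A made-up two-column array: first column is x1, second column is x2.
    X_demo = np.array([[1.0, 2.0],
                       [3.0, 4.0]])

    # All polynomial terms up to degree 2, without the extra column of ones.
    poly_demo = PolynomialFeatures(degree=2, include_bias=False)
    poly_demo.fit(X_demo)

    # The feature names show the order the transformer actually uses.
    print(poly_demo.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
    print(poly_demo.transform(X_demo))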
PolynomialFeatures is also stored in the preprocessing subpackage of sklearn, and now we have it imported. So I'm going to go ahead and make my PolynomialFeatures object. We call PolynomialFeatures, and the first argument has to be the degree of the polynomial. What was our model above? We had x minus seven, times x plus two, times x, so we want a degree-three polynomial. Next is interaction_only, which tells the transformer whether we want only the interaction terms returned, which we do not, so we say interaction_only=False. What does this mean? If we had interaction_only=True in the earlier example, what we would get back is x1, x2, and then x1 times x2, or maybe just x1 times x2, I'd have to try it; but the interaction term there, remember, is x1 times x2. Finally we have include_bias, which determines whether PolynomialFeatures should also return a column of ones. Since we're going to use linear regression, and linear regression fits an intercept on its own, we don't need a column of ones, so we set include_bias=False.

OK. So now, before we can use this to transform any data, we have to fit it. You might be wondering why you have to fit a transformer. (Let me check, I think it's a lowercase x and y, and let me not forget my reshape.) The transformer needs to take in the data, see how many columns it has, and then determine what it needs to return. For instance, we're going to put in a single column, so it knows it only needs to return x, x squared, and x cubed, because the degree is three. But if we had two columns, as we did in the example above where the degree was two, it would take in that array and learn that it needs to return x1, x2, x1 squared, x1 times x2, and x2 squared. So the fit step for a PolynomialFeatures transformer tells the object what columns it needs to return once it has the inputs.

Now that we have it fit, we can transform. These transformer objects have the same methods as a scaler: a fit, a transform, and a fit_transform. Remember the rules: we first need to fit, then we can transform, and if we're doing a train test split, a validation set, or cross-validation, we only ever use transform on the test, validation, or holdout sets, never fit.

OK, so let me go ahead and add an extra step. Let's see what it looks like when we do poly.transform on x.reshape(-1, 1). If we show x, it starts off at negative three and continues all the way up to three. And what we see in the output is that the first row has negative three, then negative three squared, which is nine, and then negative three cubed, which is negative 27. So the columns returned are x, x squared, and x cubed.
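Putting those pieces together, here is a minimal sketch of this step; the x below is a made-up stand-in for the notebook's data, assuming only that it runs from -3 to 3 and gets reshaped to a single column.

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    # Stand-in for the notebook's x values, which run from -3 to 3.
    x = np.linspace(-3, 3, 100)

    # Degree-3 terms, not interaction-only, and no column of ones
    # (LinearRegression will handle the intercept itself).
    poly = PolynomialFeatures(degree=3, interaction_only=False, include_bias=False)

    # fit() inspects the input's columns so the transformer knows what to build.
    poly.fit(x.reshape(-1, 1))

    # transform() returns the columns x, x squared, x cubed.
    print(poly.transform(x.reshape(-1, 1))[:3])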
So now that we have polynomial features in mind and we know how they work, we can show how to implement this in a pipeline. Remember our steps: we're trying to fit a polynomial regression to this data, so first we apply PolynomialFeatures to x to get x, x squared, and x cubed, and then once we have those features we can fit a regression, saying regress y on those three variables.

Our pipeline is going to consist of two steps. As a quick aside, we import the Pipeline object from the pipeline subpackage of sklearn. Our pipeline wants to fit this model: beta_0 + beta_1*x + beta_2*x^2 + beta_3*x^3. Schematically, what our pipeline needs to do, remember, is take data in on the left, with fitted models and predictions coming out on the right. We need to input the PolynomialFeatures object that we just talked about and demonstrated above, and then we need to put in LinearRegression, because that's our model. If we go back up to the picture, remember we do all the data cleaning and transformation stuff first, and the model is the last step. So here our data cleaning is just PolynomialFeatures, and our model is LinearRegression.

Let me show you the syntax for making a pipeline. First you call Pipeline. Then you pass in a list, and within that list you have tuples which each store two things. Here's our first tuple: the first thing you put in is a name, so for the first step our name is going to be 'poly', because it's the polynomial features. Then you put a comma (I like to put a space), and then you put in what you want the step to be. Here we want the step to be PolynomialFeatures with degree three, interaction_only=False, and include_bias=False. That was our first step. Now for the second step, the next thing we need to do is a linear regression, so its name is just going to be 'reg', then a comma, then LinearRegression with copy_X=True. And let me just check to make sure that we imported LinearRegression before this. Yeah, we imported LinearRegression at the top.

All right. So now we can fit the pipeline. Actually, before we fit, let me make sure this is clear: you just call Pipeline on a list, and the items of your list are tuples where the first entry of the tuple is whatever you'd like to call that step, and the second entry is the actual scikit-learn object, which is the step itself.

OK, so now we can fit the pipe. We do pipe.fit, and we put in X first because that's the data. The fit method here really works a lot more like a model fit, so we're thinking of pipe now as the model: we do pipe.fit and put in our features first, followed by our output. And now we can also use this to predict. Remember, I said pipe is like our model, so we can do pipe.predict(x.reshape(-1, 1)), and here are our predictions.
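Here is a minimal sketch of that pipeline, continuing with made-up stand-in data; the y values below are simulated from the cubic mentioned earlier plus some noise, just so the example runs on its own and is not the notebook's actual data.

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import Pipeline

    # Made-up stand-in for the notebook's data: a noisy cubic on [-3, 3].
    rng = np.random.default_rng(216)
    x = np.linspace(-3, 3, 100)
    y = (x - 7) * (x + 2) * x + rng.normal(0, 3, size=x.shape)

    # Step 1: build x, x squared, x cubed.  Step 2: regress y on those columns.
    pipe = Pipeline([('poly', PolynomialFeatures(degree=3,
                                                 interaction_only=False,
                                                 include_bias=False)),
                     ('reg', LinearRegression(copy_X=True))])

    # The pipeline behaves like a model: fit on features and target, then predict.
    pipe.fit(x.reshape(-1, 1), y)
    predictions = pipe.predict(x.reshape(-1, 1))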
We could even go so far as to make a plot. I'm going to go ahead and add a plt.plot; let me change the opacity on the observations here, and then another plt.plot. We'll just put in a linspace from negative three to three with 100 steps, and then we'll do pipe.predict on np.linspace(-3, 3, 100), and then we need the reshape. Then let me make the color something we can actually see, and set a line width to make it a little bit wider. Oh, and I wanted that to go up to eight.

OK, so here are our observations, and here's the model that we fit with the pipe.

Here's a nice thing about pipelines. Remember we gave all these steps names: we had the polynomial features and we had the regression. Pipelines can access those steps just like a dictionary would, so we can think of the names as keys and the things within the pipeline as values. So from pipe we can access the PolynomialFeatures transformer object that made the polynomial features, and if we wanted to, we could then call transform on it with x.reshape(-1, 1). We can also access the regression and take a look at its coefficients. Anything within a pipeline, as long as you give it a name, you can access just like you would access something stored in a dictionary, with a key and a value.

OK. All right, so that's going to be it for pipelines. Remember the key to how they work: you keep the steps stored in a list of tuples, where the name comes first and the actual thing you want to do comes second. The steps always have to end with a model, and we can think of data going in on the left and predictions and a fitted model coming out on the right.

So that's it for this video. I hope you enjoyed learning about basic pipelines. I hope to see you in the next video, and I hope you have a great rest of your day. All right. Bye.
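For reference, here is a short sketch of the plotting and the dictionary-style step access described above, reusing np, x, y, and pipe from the pipeline sketch earlier; the step names 'poly' and 'reg' are the ones chosen when that pipeline was built.

    import matplotlib.pyplot as plt

    # Scatter the observations (slightly transparent) and overlay the fitted curve.
    xs = np.linspace(-3, 3, 100)
    plt.scatter(x, y, alpha=0.5)
    plt.plot(xs, pipe.predict(xs.reshape(-1, 1)), color='red', linewidth=8)
    plt.show()

    # Named steps can be pulled out like dictionary entries.
    print(pipe.named_steps['poly'].transform(x.reshape(-1, 1))[:3])
    print(pipe.named_steps['reg'].coef_)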