Tree-Based Forecasts Video Lecture Transcript
This transcript was automatically generated, so there may be discrepancies between the video and the text. 11:29:18 Hi, everybody! Welcome back. In this video we're going to learn about tree-based forecasting methods. 11:29:24 This is a video and a notebook that you should consume after learning about decision trees and random forests, because otherwise the models won't make sense if you haven't seen them before. 11:29:34 So we're going to go ahead and show you how you can use tree-based methods to make forecasts, and the way to do that is with autoregression. 11:29:45 So the idea of forecasting with decision trees and random forests, how can we use those for forecasting, 11:29:51 is: we're going to treat them like autoregression problems. 11:29:55 We learned about autoregression when we talked about ARIMA models, or AR models. 11:30:01 The AR stood for autoregression. Autoregression is the act of regressing a series of data points onto itself. 11:30:09 So, for instance, we would regress y_t onto y_{t-1}, y_{t-2}, and y_{t-3}. 11:30:14 In order to make an autoregression model using a decision tree or a random forest, we have to pre-process, slightly alter, our time series data. Typically in a lot of our time series applications, the data 11:30:29 we're getting is just a single series y. This can be adjusted if we also have a matrix of features at each time step, 11:30:37 but we're going to present it as if we only have a series y. So basically, if we want to fit a decision tree or a random forest as an autoregression model, we have to take that time series 11:30:50 y_t and turn it into a pairing of X and y. So basically, what we need to do is take our time series, 11:30:59 and here's an example where we have a time series of 10 steps as a training set. 11:31:03 This is our training set. What we would do to turn this into a data set that we can train a decision tree or a random forest on, in this particular example, 11:31:16 is move a window of size 3 along the time series. So we'll do a size-3 window, which would give us 1, 2, 3, and then the y observation that we would get for that is the fourth observation. 11:31:29 Then you move your window over one unit to go from 2 to 4, and then we're able to observe the fifth as our y_train. 11:31:37 So basically, the idea here is our features are the previous 3 observations, and the value that we want to predict is the next observation in the time series. 11:31:49 And so this allows us to take our time series, which is maybe just a y_t, and turn it into a problem with a features matrix and a y vector that we can use to train a decision tree or a random forest. Again, if you already had features corresponding to the time points, you could also 11:32:09 rearrange those to include them in your new X_train; for instance, maybe the fourth and fifth columns would be the first and second features of your original X_train.
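Before moving to pandas, here is a minimal sketch of that windowing idea in plain numpy; the toy 10-step series and the names `window`, `X_train`, and `y_train` are just for illustration, not the notebook's code.

```python
import numpy as np

# Toy stand-in for a 10-step training series: y_1, ..., y_10.
y = np.arange(1, 11)
window = 3

# Slide a size-3 window along the series: each window is a feature row,
# and the observation right after the window is the target.
X_train = np.array([y[i:i + window] for i in range(len(y) - window)])
y_train = y[window:]

print(X_train[0], "->", y_train[0])   # [1 2 3] -> 4
print(X_train[1], "->", y_train[1])   # [2 3 4] -> 5
```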
11:32:19 So let's go ahead and show you how you can do this sort of pre-processing with pandas. 11:32:26 We're going to use our Google training data, our Google stock data, which we've seen a lot before in these notebooks. 11:32:32 And the way to quickly do this is with the shift function for pandas DataFrames. 11:32:37 So shift will take in a positive integer n and move the values of the time series down that many units. 11:32:44 Here are the first few observations of our closing_price column. 11:32:50 And if we do goog_train.closing_price, and then we put in a .shift, and then we put in a 3, 11:33:01 what's going to happen is, and then let's do .head(6), 11:33:04 what's going to happen is we can notice the first 3 rows got shifted down 3 positions. Okay. 11:33:11 And so that's what shift does. We can combine shift with lists and other numpy functions to quickly get a feature array that has the correspondingly shifted values. The way we're going to do this is already coded out; 11:33:26 I'm just going to explain the code. So we want a window of size 3, 11:33:31 and so for every t that we're going to put into shift, from 1 to 3, 11:33:36 so that's what range(1, 4) is, we're going to do goog_train.closing_price.shift(t), okay, and then we're going to do a .values.reshape. The .values will take this pandas Series 11:33:49 and turn it into a numpy array, and then that numpy array will get reshaped into a column vector, a 2D 11:33:58 array, by the reshape. Putting it in this list comprehension gives us a list of 3 column vectors. 11:34:06 Okay, so this is the one where it was shifted one unit down, 11:34:10 this is the one where it was shifted 2 units down, and this is the one where it was shifted 3 units down. 11:34:16 Now you might be wondering why we want to do that. 11:34:18 That will become clear in the next step. So now we've got a list of arrays that are column vectors. 11:34:24 We can then concatenate this list along the columns to give us a numpy array that has 3 columns and however many rows. And so, maybe what I'll do is 11:34:39 look at the fourth row through the sixth row and all the columns. So we can see here, if we go all the way back to the top, 11:34:48 this was our original closing price series. We can see that we've now got it shifted 11:34:55 one unit down, two units down, three units down. And now we have the observations: 11:35:01 this is the first observation, the second, and the third, so it's slightly flipped. 11:35:06 If I was being smarter, maybe I would have flipped this so it went 11:35:08 3, 2, 1, then 4, 3, 2, and so forth. But here we've got observation 3, 11:35:14 observation 2, observation 1; then observation 4, observation 3, 11:35:18 observation 2; then observation 5, observation 4, observation 3. 11:35:24 So the columns are y_{t-1}, y_{t-2}, and y_{t-3}. 11:35:33 That's after we ditch, we're going to get rid of, these first 3 rows, because they have missing values. 11:35:37 Okay, so now we've made our new X_train, and I'm going to store it in a numpy array that I'm calling X_train. 11:35:46 And now to get the y_train, all I have to do is go from the fourth entry onward, which is index 3 in Python. 11:35:56 So now y_train has y_4 as the 0 row, 11:36:01 y_5 as the 1 row, and so forth. Now that we have X_train and y_train, we can train both a decision tree and a random forest. 11:36:10 So we're going to do from sklearn.tree 11:36:15 we'll import the decision tree 11:36:20 regressor, not classifier, regressor, so DecisionTreeRegressor, and then from sklearn.ensemble 11:36:29 we'll import RandomForestRegressor.
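For reference, here is roughly what that preprocessing looks like end to end. The random-walk stand-in for `goog_train` is made up so the sketch runs on its own; the `closing_price` column name and the shift/reshape/concatenate pattern follow the notebook as described.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the notebook's goog_train DataFrame (a random
# walk), just so this sketch is self-contained.
rng = np.random.default_rng(0)
goog_train = pd.DataFrame(
    {"closing_price": 100 + rng.normal(0, 1, 250).cumsum()})

window = 3

# shift(t) moves the series down t rows, so row i holds y_{i-t};
# .values.reshape(-1, 1) turns each shifted series into a 2D column.
cols = [goog_train.closing_price.shift(t).values.reshape(-1, 1)
        for t in range(1, window + 1)]

# Concatenate along the columns: column order is y_{t-1}, y_{t-2}, y_{t-3}.
# The first `window` rows hold NaNs from the shifting, so we drop them.
X_train = np.concatenate(cols, axis=1)[window:]
y_train = goog_train.closing_price.values[window:]
```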
11:36:36 So now I have to make my models. So the first is the decision 11:36:40 tree regressor, and just for demonstration I'm going to set them to have a maximum depth of 5. 11:36:47 Then I'm going to have a random forest regressor with the same maximum depth, 11:36:52 so max_depth equals max_depth. 11:36:56 Now that these are defined, I can fit them. So I just have to call 11:37:01 .fit(X_train, y_train) on each, and now I've got a tree that's been fit and a random forest that's been fit, and here I'm just going to plot what that looks like. So here are the fits; 11:37:19 these are the solid lines. You can see they fit the time series very closely, and then to get the predictions for the forecast, 11:37:27 you just take the prediction from the most recent observation and propagate it forward, 11:37:32 a lot like the naive baseline model. Okay? 11:37:36 And so that's how we get this: we take the last prediction we made, and then we propagate it forward. Why is that? 11:37:44 Well, that's because those are the last 3 observations that we have, right? 11:37:50 So the last 3 observations are y_N, y_{N-1}, 11:37:56 and y_{N-2}, and that's just what we have. Okay. 11:38:02 So as you can see, this might not be ideal when you have data with trends. So what we can do, basically, instead of modeling the time series directly, is use a decision tree or a random forest to model the first differences, and that will allow you to accommodate data with 11:38:23 trends. So here, remember, if you do .diff, that gives you the first differences. 11:38:28 And so this is just a modification of the previous code 11:38:32 we made to pre-process our data, but instead of making it (y_3, y_2, y_1) predicting y_4, 11:38:39 (y_4, y_3, y_2) predicting y_5, we're going to use the first differences at those time 11:38:43 steps instead. Then we're just going to fit the random forest here; 11:38:48 we could do a decision tree, but we'll just do the random forest. 11:38:52 So again, our training data has features which are made up of the first differences, and the thing we're predicting is also the first differences. 11:39:02 And so what we can do then to get the actual prediction for the time series, which is what we're interested in, is 11:39:09 rf_fit.predict, not rf_fit, rf, sorry about that, 11:39:16 rf.predict(X_train), and then plus goog_train.closing_price, 11:39:28 and then 11:39:32 we want it offset the window size forward, I believe. 11:39:39 Let's just double-check that I got that right. 11:39:45 Okay, I think we want to go one further, 11:39:50 because we did the first differences. That's right. Okay. 11:39:53 And so then, to get the forecast, what you would do is take the last predicted delta 11:40:03 times an arange of the length, so that's the length of the test set. 11:40:09 So, basically, think back to when we did the naive model: we're predicting what the coefficient should be. 11:40:18 This is the prediction of that coefficient, 11:40:21 this is propagating it forward, and then we're going to do plus goog_train.closing_price at negative one. 11:40:32 Okay, what did I do here? 11:40:37 So I need a .values, and then actually this should be np.arange 11:40:46 of 1 to the length plus 1. 11:40:50 There we go! 11:40:58 Alright!
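Here is a sketch of that first-differences version, reusing the `goog_train` stand-in and `window` from the earlier sketch. The forecast horizon `n_test` and the exact way the last window of differences is fed back into the model are my assumptions about what's on screen, not verbatim notebook code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# First differences: Delta_t = y_t - y_{t-1}, with a NaN in row 0.
diffs = goog_train.closing_price.diff()

# Same shift trick as before, but applied to the differences.
cols = [diffs.shift(t).values.reshape(-1, 1) for t in range(1, window + 1)]

# The diff costs one row and the shifts cost `window` more, so the first
# usable row is window + 1 (that's the "one further" from the video).
X_train_d = np.concatenate(cols, axis=1)[window + 1:]
y_train_d = diffs.values[window + 1:]

rf = RandomForestRegressor(max_depth=5).fit(X_train_d, y_train_d)

# Fitted values back on the price scale: the predicted Delta_t plus the
# actual previous price y_{t-1}.
fitted = rf.predict(X_train_d) + goog_train.closing_price.values[window:-1]

# Forecast: predict one last delta from the most recent window of
# differences (most recent first, matching the column order), then
# propagate it, adding it once per step to the last observed price.
n_test = 20  # assumed horizon; in the notebook this is the test length
last_delta = rf.predict(diffs.values[-window:][::-1].reshape(1, -1))[0]
forecast = (last_delta * np.arange(1, n_test + 1)
            + goog_train.closing_price.values[-1])
```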
11:41:01 So, as I said, you could include other regressors if you'd like. 11:41:06 That can be in addition, so in addition to the, you know, 3, 2, 1, or however many observations you use; we just used 3 to make it easy for me to draw the picture, but you can use any number, and you might want to fine-tune that with hyperparameter tuning, like 11:41:21 cross-validation or a validation set; I'll leave a small sketch of that below. But you could extend this to include, say, the first column of the original X, 11:41:29 the second column of the original X, and so forth. Okay, all right. 11:41:36 So that is going to be it for this notebook. I hope you enjoyed learning about how you can make forecasts using decision trees and random forests. 11:41:45 I enjoyed teaching you about it, and I hope to see you next time.
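As a follow-up to that closing remark, here is one minimal way to tune the window size on a chronological validation split, reusing the `goog_train` stand-in from the sketches above. The helper `make_lagged`, the candidate sizes, and the one-step-ahead scoring are all illustrative assumptions, not the notebook's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def make_lagged(series, window):
    """Turn a pandas Series into lagged features X and targets y."""
    cols = [series.shift(t).values.reshape(-1, 1)
            for t in range(1, window + 1)]
    return np.concatenate(cols, axis=1)[window:], series.values[window:]

# Chronological split: the validation block comes after the training block.
split = int(0.8 * len(goog_train))

for window in [1, 3, 5, 10]:
    # Train on lagged features built from the training block only.
    X_tr, y_tr = make_lagged(goog_train.closing_price[:split], window)
    rf = RandomForestRegressor(max_depth=5).fit(X_tr, y_tr)

    # Score one-step-ahead on the validation block; the lags are built
    # from the full series so the first validation rows have features.
    X_all, y_all = make_lagged(goog_train.closing_price, window)
    X_va, y_va = X_all[split - window:], y_all[split - window:]
    print(window, mean_squared_error(y_va, rf.predict(X_va)))
```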