Baseline Forecasts Video Lecture Transcript This transcript was automatically generated, so there may be discrepancies between the video and the text. 11:09:33 Hi, everybody! Welcome back. In this video we'll be looking at baseline forecasts for time series. 11:09:41 So remember, in our regression content we talked an awful lot about having a baseline. 11:09:45 You want to have a baseline to compare to, so you can tell if your model is any good. 11:09:49 So with time series forecasting, we're going to look at a series of baselines that you might want to use depending on the features of your time series. 11:09:57 So throughout the notebook, we're going to assume that y sub t is a time series, the training set has little n observations, and that we are interested in predicting observations at times that are past time step n. And we'll be working with 2 data sets, depending on the type 11:10:16 of time series we're interested in. The first is the Google parent company closing stock price for each trading day from August 19th, 2004 to March 25th, 2022. This data 11:10:29 set was found with Yahoo Finance. The second is weekly seasonal influenza case counts for the United States from 1931 to 1950, and this was found on Project Tycho. 11:10:40 So the first types of baseline forecasts we're going to look at are for time series that do not exhibit a trend and do not exhibit seasonality. 11:10:52 So for this we are going to use the Google price time series. 11:10:58 I will point out that this very clearly does exhibit a trend, but this is just so 11:11:00 we don't have to introduce a third time series data set for these lectures. 11:11:06 We're going to use this. It does have a trend, 11:11:08 so it's probably not going to be the best forecast, but we'll see how we do. So I'm going to make a train test split, 11:11:16 and for this my training set is going to be all the trading days 11:11:20 minus the last 14 days, so my test set will be those last 14 days.
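The train test split described above can be sketched as follows. The data here is synthetic; the real notebook loads the Yahoo Finance data, so the names `google` and `closing_price` are assumptions about that notebook, not its actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Yahoo Finance closing-price series.
google = pd.DataFrame({"closing_price": np.linspace(50.0, 2800.0, 100)})

# Training set: everything except the last 14 trading days.
# Test set: those final 14 days.
google_train = google.iloc[:-14].copy()
google_test = google.iloc[-14:].copy()
```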
11:11:25 The first baseline forecast is an average forecast. 11:11:28 So this basically just assumes that there is no pattern in the time series, that the next value is just going to be the average value plus some random noise. 11:11:42 So to estimate this, we would take the average across the observed training set values. 11:11:47 So you add up all the y's and divide by n, 11:11:51 and then this is your average. This is your average forecast. 11:11:55 Now, for this estimate to be any good, you would be assuming that the y sub t are independent and identically distributed, the estimate here being of the expected value. 11:12:05 So the statistical model that underlies this is known as a white noise model. 11:12:10 So white noise, basically what you're saying, is a completely random process where the previous time step has no impact 11:12:19 on the next time step, and it's just drawing from a random distribution. So let's go ahead and make this average prediction. 11:12:25 So the first thing we need to do is just get the average. 11:12:28 So this is done using the training set. We do google train dot closing price dot mean. 11:12:38 So now we have the average, and now, in order to get the fit and the forecast, you would just take this average and repeat it for the length of the test set. 11:12:47 So here, what we're going to do in this plot is we're going to plot the training set, plot 11:12:55 the test set, and then plot the prediction. And so actually, I do multiply this by a vector of ones 11:13:04 the length of the test set. Okay, so here you can see, because we very clearly have a trend that's increasing over time, 11:13:13 our average is not a very good forecast. 11:13:16 But here we've got our forecast. So the blue is the training data, 11:13:20 the red solid is the test data, and then our red dotted line is the average prediction. 11:13:24 The next baseline forecast for data without trend and without seasonality is known as the naive forecast.
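The average forecast above can be sketched like this. The toy series here is made up, standing in for `google_train.closing_price`:

```python
import numpy as np
import pandas as pd

# Toy training series (an assumption, not the real closing prices).
train = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0], name="closing_price")
n_test = 3

# Average forecast: the prediction for every future step is the training mean.
avg = train.mean()
avg_forecast = avg * np.ones(n_test)
```

Multiplying the scalar mean by a vector of ones, as in the lecture, just repeats the same prediction for every step of the test horizon.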
11:13:31 And so in this model, you just basically are going to predict the previous time step. 11:13:37 So the next observation is just going to be the previous observation plus some random noise. 11:13:43 The underlying statistical model here is known as a random walk. 11:13:47 So remember, in a random walk you have some starting position, where C is an element of the reals. 11:13:52 It could be different; it could be a complex random walk, 11:13:55 but for us we're in the real world. And so then the observation at the next time step is just the current time step plus a random variable. 11:14:05 So to make our naive prediction for the forecast, 11:14:07 we're just going to take the last observation of the closing price and propagate it. 11:14:14 So we're going to do google underscore 11:14:16 naive is equal to google train dot closing price dot values at negative 1, and then again, times a vector 11:14:25 of ones that is the length of the test set. 11:14:32 Okay. And so here we have our training data along with our test data. 11:14:36 This is a much better baseline for this particular data set than the average. 11:14:48 Okay, so what about something like this Google train set, which does have at least a long-term trend of increasing? 11:14:57 We have a time series with a trend. 11:15:00 So we're just going to do trend extensions of the previous models. 11:15:04 So the first is a trend forecast. This is essentially just doing a regression on time, and then plus some random noise. 11:15:13 And so here the underlying statistical model is very similar 11:15:18 to white noise, but unlike white noise, where we just assume that it's the expected value, 11:15:22 which is independent of time, here we're assuming our expected value is dependent on time and takes this linear form.
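A minimal sketch of the naive forecast described above, again using a made-up toy series in place of the real closing prices:

```python
import numpy as np
import pandas as pd

# Toy training series standing in for the closing prices.
train = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0])
n_test = 3

# Naive forecast: propagate the last observed value across the test horizon.
naive_forecast = train.values[-1] * np.ones(n_test)
```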
11:15:30 Now, if we wanted to get fancy, we could change this to be a non-linear model, like an exponential or a logarithmic or something like that. 11:15:38 But for us, in this very simple baseline, we're going to use the linear model beta 0 plus beta 1 t. To fit this, 11:15:44 we're just going to use our linear regression from sklearn. 11:15:48 So from sklearn dot linear model, we'll import LinearRegression. 11:15:56 Then we make our model object, so reg is equal to LinearRegression. 11:16:02 Then we're going to fit the model. So here, for your features, you're going to do 11:16:07 reg dot fit. Our features are going to be just the time steps, 11:16:12 so np dot arange from 1 to the length of train 11:16:20 plus 1. We'll need to reshape that. 11:16:26 Then we're going to do the training set closing prices, 11:16:30 so google train dot closing prices. 11:16:35 And then to get our prediction, we're going to do reg dot predict, 11:16:39 and then for our input here, I'm going to do a quick copy and paste and do some quick editing. 11:16:45 So we're going to start at the length of the training set 11:16:50 plus 1, because we want the next time step, and then here we're going to just add in the length of the test set. 11:17:00 And once again we are going to need the reshape negative 1, 1. 11:17:07 Okay. So now we can plot this prediction. You can see here that it doesn't do great. 11:17:14 Why? Well, we can once again look at the original time series. 11:17:18 This isn't quite a straight line. So for a long time the increase is very slow, but then it gets very rapid 11:17:24 in the past few years. So a different type of underlying trend model might be better, 11:17:30 but for this very simple baseline, you just use the linear regression. 11:17:34 The next trend baseline is an extension of the naive model. 11:17:38 It's called a random walk with drift. So this model takes the previous point and then adds a coefficient times
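The trend forecast steps spoken through above (fit a regression on the time steps 1 through n, then predict steps n+1 onward) can be sketched as follows, using a synthetic series with an exact linear trend rather than the real closing prices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic series with a perfect linear trend, y = 2t - 1 (an assumption
# for illustration; the real data is the Google closing price).
y_train = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
n_train, n_test = len(y_train), 3

# Features are the time steps 1, 2, ..., n, reshaped into a column.
t_train = np.arange(1, n_train + 1).reshape(-1, 1)
reg = LinearRegression()
reg.fit(t_train, y_train)

# Forecast the next n_test time steps: n+1, ..., n+n_test.
t_test = np.arange(n_train + 1, n_train + n_test + 1).reshape(-1, 1)
trend_forecast = reg.predict(t_test)
```

Because the toy data lies exactly on a line, the fitted line extrapolates it perfectly; on the real stock data, the lecture notes, a straight line fits poorly.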
11:17:48 the number of time steps since the last observation, plus random noise. 11:17:54 So this is just a random walk, but now you've added this drift term, which means that each time step is a little bit further away from the previous one. 11:18:03 So you're either increasing or decreasing. To get an estimate for this coefficient, you are just going to use the first differences. 11:18:10 First differences are calculated as the next time step minus the current time step, 11:18:16 so y2 minus y1, y3 minus y2, y4 11:18:21 minus y3, and so forth. And we can quickly calculate this using the diff function for a pandas data frame. So we can demonstrate this diff function. 11:18:34 So train dot closing price dot diff. Okay? 11:18:37 And so to see what this does, maybe it also helps to look at 11:18:40 just train dot closing price. And let's look at the first 5, and the same here. 11:18:50 So here we have the first 5 observations, and then we can see that for the diff data series, you're getting this by taking each observation minus the previous one. 11:19:03 That's what gives you the 3.97; then the next one is 54 minus 53.9, and so forth. 11:19:10 You'll notice that there is no 0 entry. Why is that? 11:19:13 Because nothing occurs before the 0 row. So how do we use this to get the estimate of beta? 11:19:20 You just take the average of the differences, because if we look at the model, the difference between consecutive time steps should just be beta plus some random noise. 11:19:29 So to estimate beta, we just take the mean. So we're going to do google train dot closing price dot diff dot mean. 11:19:42 And there we go. 11:19:46 So the next model types that we're going to look at are for data that has seasonality 11:19:53 but no trend, and for this we're going to use that seasonal influenza data.
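The drift estimate and forecast described above can be sketched like this, again on a toy series (the real notebook applies this to the Google closing prices):

```python
import numpy as np
import pandas as pd

# Toy training series (an assumption for illustration).
train = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0])
n_test = 3

# Drift estimate: the mean of the first differences.
# (.diff() leaves NaN in row 0, which .mean() skips.)
beta_hat = train.diff().mean()

# Random walk with drift: the last observed value plus beta_hat
# for each additional step ahead.
drift_forecast = train.values[-1] + beta_hat * np.arange(1, n_test + 1)
```

Note that the mean of the first differences telescopes to (last value minus first value) divided by n minus 1, so the drift line simply connects the first and last training observations and extends it forward.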
11:19:58 So seasonal influenza has a season which is about a year long, or 52 weeks. So this is weekly data; this is the weekly number of influenza cases throughout the entire United States from 1928 to 1948. 11:20:13 So for us, we're going to imagine that a period here is about a year, or 52 weeks. 11:20:17 So every 52 weeks the flu season starts over. This isn't 100% correct in actuality, 11:20:24 but we're going to use this as a simplifying assumption for these baseline forecasts. 11:20:28 Okay. So for us, what we're going to do is set aside this very last flu season as our test set, and then use the rest as our training set. I should point out that in practice, what you'd want to do, 11:20:43 and I think I forgot to mention this at the beginning, you usually aren't going to just look at the test set right away, right? 11:20:49 You're either going to use a validation set or cross-validation. 11:20:54 We're just looking at the training set and the test set just to save time. 11:20:59 So basically, what these 2 baselines for the seasonal data without a trend are going to be are just seasonal extensions of the original baselines. 11:21:15 So the first one is the seasonal average forecast. For the seasonal average forecast, you're going to assume that a 11:21:21 season has a period of m steps, so it takes m 11:21:26 steps to make up an entire season. So for us, that'd be 52 weeks. And for the seasonal average forecast, for the first step in the season, 11:21:37 you're going to assume it is the average of the previous first steps across all the seasons 11:21:43 you have data for. For the second step, you'd assume it's the average of all the previous second steps that you have data for, 11:21:48 and so forth. So basically you're breaking your time series into seasonal chunks, 11:21:53 and then for the forecast you're averaging based upon what step you are at in the season.
11:22:00 So the underlying statistical model here is basically, you're just assuming that each step in a season has its own random variable that it's pulling from. 11:22:09 So that's why it's an extension of the average model. 11:22:13 So to save time, I've already coded up here what we would do. So what we're going to do is, for each week of the flu season, 11:22:19 for our seasonal average forecast, we're going to add in the average of all the observations from the training set. 11:22:28 Okay, so basically, just to make sure it's clear, we loop through all of the weeks, 1 through 52. 11:22:36 Then we are going to go ahead and add in the average for each week, 11:22:40 so the average of week 1, the average of week 2, and so on. 11:22:45 And then that's how we get our seasonal average for each week. 11:22:48 And so here's what that looks like. We've got our blue solid line, which is the actual data, the red solid line, which is the test data, and the red dotted line is the prediction. 11:22:56 So the red dotted line, it doesn't look like it fits too well with this particular flu season, and part of the reason for that is this 11:23:05 flu season seems to have happened quite late in the year. Typically flu seasons start to take off in October and November, and then kind of peak around January, February. 11:23:16 Here it looks like the flu season didn't really take off until about, you know, 11:23:18 late January or February, so that's sort of why this prediction is bad. 11:23:24 But the average does typically hold for, you know, years that aren't anomalies. 11:23:31 Okay. So the next model that is an extension for the seasonal data is the seasonal naive model. 11:23:38 So this is exactly the same idea, and there's a lot of mathematical notation here, but 11:23:43 what we're going to say is basically, for each step in your period, 11:23:47 you're going to look at the most recent same step in the last period.
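A minimal sketch of the seasonal average forecast just described, using a tiny made-up series with a season of m = 4 steps instead of the real 52-week flu data:

```python
import pandas as pd

# Toy weekly series spanning 3 complete seasons of length m = 4
# (numbers are made up; the real data is the weekly flu counts).
m = 4
train = pd.Series([1.0, 5.0, 3.0, 1.0,
                   2.0, 6.0, 4.0, 2.0,
                   3.0, 7.0, 5.0, 3.0])

# Seasonal average forecast: for each step of the season, average all
# training observations that fall at that same step in previous seasons.
seasonal_avg = [train.values[step::m].mean() for step in range(m)]
```

The slice `train.values[step::m]` picks out every m-th observation starting at a given step of the season, which is the "seasonal chunks" idea from the lecture.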
11:23:51 So for each step in your season, you're going to look at the most recent same step. 11:23:56 So for the first step in the new season, you're going to predict the first step in the previous season. 11:24:01 For the second step in the new season, you're going to predict 11:24:05 the second step in the previous season. So we can sort of think of this as each step within the season following its own random walk model; that's sort of the idea here. 11:24:15 So for us, if we're following this, the forecast for the next year 11:24:23 would just be what we had in 1946, because for us the next year was 1947. 11:24:29 So that's all we have to do. And then here we can see the, you know, dotted line 11:24:36 fits this particular flu season a little bit better than the last one, the last model. 11:24:48 But this is the seasonal naive model. 11:24:53 So you know, these are just baselines. They're not going to be the best. 11:24:57 If they were the best models, you wouldn't need to build more complicated forecasts. 11:25:02 These are just baselines; some of them were better than others. 11:25:07 What else do I want to say about this? There's also an extension for data that is both seasonal and exhibits a trend, 11:25:14 but I believe we've left that for the practice problems notebook, 11:25:17 if you're interested in learning more. So now you have a good idea of 6 baseline models, and the potential to learn 2 additional baseline time series forecasts. 11:25:26 I hope you enjoyed learning about these, and in the next videos we'll start to build more complicated models for time series forecasting. Alright, I enjoyed having you here.
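The seasonal naive forecast above amounts to repeating the last full season. A minimal sketch, again with a toy season length of m = 4 rather than the real 52 weeks:

```python
import pandas as pd

m = 4  # season length (52 for the weekly flu data)

# Toy series covering 2 complete seasons (made-up numbers).
train = pd.Series([1.0, 5.0, 3.0, 1.0,
                   2.0, 6.0, 4.0, 2.0])

# Seasonal naive forecast: each step of the next season is predicted
# by the same step of the most recent season, i.e. the last m values.
seasonal_naive = train.values[-m:]
```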