Okay, so I'm going to go ahead and get started, and I'm about to hit record.

Alright, welcome back, everybody! Today we're going to start our time series content, so let me go ahead and share my screen and get situated.

Okay, so today we're starting time series. If you're trying to follow along, go to your lectures folder, then supervised learning, and then time series forecasting. We're going to try to go through notebooks 1 and 2, skip 3 (that one is really just for people who are new to Python and don't know about times and datetimes), then go to 4, and then try to get through 5. We'll see how far we get: 4 is a little bit of a longer notebook, so it takes a while, and we'll see how much of 5 we're able to cover.

We're going to start with notebook number one in the time series folder, "What are time series and forecasting?" We'll define what a time series is and what a forecasting task is, and then introduce two concepts related to time series called trend and seasonality.

So let's get started. A time series is a sequence of data points — note the term sequence here: (x_1, y_1), (x_2, y_2), and so on up to (x_T, y_T). Here x_t is a collection of m features, much like in regular regression where x held your features; you can think of x_t as a row with m columns. And y_t is a numeric variable of interest at time t — something we're trying to predict. Throughout all of this content we'll assume that our time steps are evenly spaced.

Now, we are going to look at an example that uses stock price data, and technically those aren't evenly spaced, because you don't trade on a holiday or a weekend. But we're going to make the assumption that the trading days occur sequentially, even though something may happen over the weekend that could impact the price on Monday. We'll also have to make some additional assumptions on x_t and y_t depending on the data set we have.

Technically, time series forecasting is a regression task in the machine learning / data science sense. That doesn't mean it's always a linear regression model task — sometimes it is — but regression is just the class of problems in data science and supervised learning where we have some features we'd like to use to predict a numeric, continuous variable y. So it is a regression task.
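As a compact way to write down the setup just described — this is a reconstruction of the spoken notation, not the notebook's exact slide:

```latex
% Time series notation, reconstructed from the spoken description
\[
  (x_1, y_1),\; (x_2, y_2),\; \dots,\; (x_T, y_T),
  \qquad x_t \in \mathbb{R}^{m}, \quad y_t \in \mathbb{R},
\]
\[
  \text{forecasting: predict } y_t \text{ for } t > T,
  \text{ assuming evenly spaced time steps.}
\]
```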
But there are some features of time series problems in particular that will require us to change some things relative to linear regression.

One of the slight departures is that a time series may or may not come with a set of features. In general you would set it up like (x_1, y_1), (x_2, y_2), and so on, but what you may end up getting with your problem is just a series of y values: y_1, y_2, all the way to y_T. We're going to focus on that situation in these notes: you have a time series, and you're only going to use the time series itself to predict its future values. Another thing we're going to see soon is that the sequential nature of the data impacts our predictive modeling approaches.

Some examples of time series data include the average global temperature over the past 200 years, the value of the S&P 500 stock index, the daily new cases of seasonal influenza in the United States since 1900, as well as yearly Boston Marathon times.

We're going to start by talking about two features of time series data that you tend to look for when you're doing forecasting. The first is a trend. We say a time series exhibits a trend if the values y_t tend to increase or decrease over time — "over time" meaning as t increases.

One example is the stock closing price of Google's parent company. The data file google_stock.csv contains the closing price of Google's parent company from the initial day it was trading to some day in March of 2022 — I forget exactly when it ends. We'll plot this and take a look. On the vertical axis we have the value of the closing price, and on the horizontal axis we have the dates of the trading days. The points are connected, so even though there are days with no trading, the line looks continuous — just keep that in mind. Here we have the date; in a time series setting we would treat these as trading day 1, trading day 2, and so forth. We would say that this time series has an increasing trend from the beginning of the series to the end.

But I want to note that sometimes you need to be careful when you're examining time series plots for trends, because the time window you're looking at can impact what you see. Here is a window where we consider the closing value of the stock from just before February 20, 2020 into March 2020, and here you would say the time series exhibits a downward trend. So being mindful of the windows and time spans you're considering matters: overall this time series exhibits an increasing trend, but as you look at various pieces of the plot, you can see there are time windows where the stock tends to go down.
So that's something to keep in mind.

Another feature that time series can have is called seasonality. A time series is said to exhibit seasonality if the value of the output variable y_t demonstrates a repeating pattern of some fixed length over time. One way to think of this is a sinusoidal wave — a sine curve — which repeats values over time; it's periodic. In a time series sense, if you saw something like that, you would say it has seasonality. That's not the only repeating pattern you'll see; there are lots of different patterns that can repeat over time.

One example of a seasonal data set is the number of cases of seasonal influenza in the United States. This is prior to COVID-19 — something changed with our behavior in the early years of the pandemic — but flu season follows a fairly regular pattern: cases start to slowly increase in late October / early November, increase more rapidly in December and January, peak around late January and February, and then decrease toward zero around April or so. Here is a data set on US seasonal influenza — confirmed cases, I believe, meaning cases confirmed through some sort of test at a hospital or doctor's office — from 1928 to 1948. This data comes from Project Tycho.

Okay, I think this next cell is maybe left over from my lecture copy last year. Just looking at the data set, we have the date, the year, the week, and the number of cases, and I think here I was demonstrating how to subset the data with a datetime. If that's something you're confused about, you can ask when I pause for questions in a little bit.

And here's what the data looks like. You can see the seasonal pattern, where every year you see basically the same kind of peak. Now, in this time series there are some years where the flu season peaks early, or peaks much higher than before, but roughly this data follows the same pattern we just discussed, with occasional years of early seasons. So this would be considered seasonal data.

Before we talk about what forecasting is, are there any questions about time series, trends, or seasonality?

Okay. Forecasting is just what we call the act of predicting future values of a time series. Remember our supervised learning framework: we have y = f(x) + ε, where f is the systematic part and ε is some random error that's typically assumed independent of x. When we do forecasting, we make some slight adjustments to that framework to take into account the temporal, or sequential, nature of the data.
So here, in the supervised learning framework for our forecasting setting, you assume that y at some time T is a function of the features at time T (which you may or may not have) and of the time itself, given all the previous observations — something like y_T = f(x_T, T | y_{T-1}, ..., y_1) + ε_T. And you can notice the random noise has a subscript T. This is to indicate that the random error, while still assumed independent of the features you have, is no longer considered independent of time: the random noise may depend on the value of the time step.

This slightly different framework is going to change some of the things we've done in regression, like data splits and cross-validation, but we'll learn about that in the very next notebook.

So before we continue on, are there any questions at all about notebook number one for time series?

"Yeah — can I ask a question? Are you saying this is kind of a conditional probability type expression? Is that what it becomes with a time series?"

Yeah, so you are conditioning on the previous observations. Right — trying to think — we haven't seen another example where we do something like this, but basically, yes.

"Okay. And the features are just the past performance, I guess? Those are kind of your features, then?"

So you may have situations where you have an actual matrix of features. For instance, with this flu example, disease is sometimes impacted by the weather, so maybe you have data about weather or climate variables at each time stamp as well — say the average temperature in whatever region you're looking at. That would be a feature, and then on top of that you also need to take into account the sequential nature of the data, which is why you're conditioning on these previous observations. There may also be something temporal going on just due to the fact that it's that time of year. So it's a function of the potential features you may or may not have, the time itself, and then you have to consider the previous observations.

"Thank you."

Yeah. And Erica asked: if you wanted to use features in addition to just the y values, are there models that use both, or would you somehow combine multiple different models? So, Erica, there are models that use both. For the sake of time we don't cover them in the lectures, but I believe in the practice problems
there's an example: essentially, the model covered there regresses y onto the features, and then you end up modeling the errors using the temporal structure.

Keira's asking whether ε_t would essentially increase with time. Not necessarily. It could increase with time; it could also be that the errors are seasonal. The way the errors depend on time — or whether they depend on time at all — differs depending on the time series you have. It may be independent of time; it just depends on the series.

And then there's another question: "Oh, so we're explicitly not factoring in the previous observations of the features?" I believe there are some models that could explicitly factor in previous observations of the x's as well, but the idea is that that information would be somewhat contained — not subconsciously, I'm trying to think of the word — implicitly.

Okay. So, like we said, this is notebook 2 in the time series folder. As we just said, the sequential nature of the data means we have to change some things, and in this notebook we're going to talk about which parts of our supervised learning workflow we have to change to account for it. Basically, the only thing we're going to change is the way we make data splits: that includes validation splits, train test splits, and cross-validation.

The basic idea is this. Remember, when we learned about all those different data splits in previous lectures, you made them uniformly at random, the idea being that the distribution of the split then reflects the underlying distribution of the population. But because you have data where your next observation is potentially dependent on all (or some) of your previous observations, you can't just uniformly at random select the train test split, the validation split, or the cross-validation splits. You have to split the data in a way that respects the fact that some observations occurred after other observations.

The way to keep this straight in your mind is that you can't use the future to predict the past — which is exactly what could happen if you did a regular random split: your training set might contain observations that you would then be using to predict earlier observations in your test set, validation set, or cross-validation holdout sets. So when you make these splits, you need to respect the fact that the later observations occur after the earlier ones.
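As a minimal sketch of what that kind of chronological split looks like in code — the series `y`, the horizon `h`, and the use of a pandas Series here are assumptions for illustration, not the notebook's exact code:

```python
import pandas as pd

# Hypothetical time series, already sorted in time order.
y = pd.Series(range(12))
h = 4                    # assumed forecast horizon

y_train = y.iloc[:-h]    # everything before the final h observations
y_test = y.iloc[-h:]     # the final h observations play the role of "the future"
```

The point is just that the test set is the final chunk of the series, never a random sample.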
So let's say, hypothetically, we have a full time series here, represented by this line of blue dots — I think it's 12 observations. When we make our train test split, you just set aside the last however-many observations as your test set. In forecasting, the number of time steps forward that you'd like to be able to forecast — let's call it h — is known as the forecast horizon. Typically you'd want to set aside maybe 1, 2, or 3 horizons as the test set, depending on how large your data set is. Here you could imagine we had a horizon of 2 and we're setting aside 2 horizons' worth of data for the test set, or maybe we have a horizon of h = 4, in which case we're setting aside one horizon of our data as the test set.

While it's currently being presented as a train test split, the same thing works for a validation set — it's just the mini-training-set version of it. So that's the idea: in order to respect the fact that you can't use the future to predict the past, you just clip off the last however-many observations of your data set. You don't need a special function for this — you don't need train_test_split. You can use train_test_split, but you don't need it: your data is typically stored in a numpy array or a pandas data frame, so you can just use regular indexing to set aside the last few observations as a test set.

So, are there questions about train test splits and validation splits?

There's a question in the chat: do we split train and test like this even if the time series data does not have a trend or seasonality — should any data we have with time be split like this? Yeah. If the data you're dealing with has a temporal element — it's a time series, observations at set time points — you want to respect the fact that you can't use the future to predict the past. So even if your time series doesn't have seasonality or a trend, you still need to respect the fact that it's a time series and split off the train test set like this.

I also noticed that somebody had their hand up, so if I didn't answer your question, feel free to ask.

"So my question is about the part just slightly off screen, in the cross-validation section. It seems like if we use the split there, the training data in CV split 1 would show up in all splits, while the training data in CV split 5 shows up only once — which to me suggests that any analysis will be weighted towards older data. Am I missing something, or is that a desired behavior?"

Yeah, so I would ask that we hold
that question until we actually cover that part of the lecture.

"Yeah, yeah." Nope, we didn't talk about it yet. "Okay — oh, sorry, I thought you had already said something about cross-validation with this. My bad!" Just the train test split part so far.

Okay, well, on that note, let's dive into the cross-validation part. With cross-validation it works the same way. Let's say I had a different time series — I know I use the same colors, so it can be confusing — a bigger time series, and this is what my resulting training set looks like after the train test split. I think here I have 18 observations (counting by threes: 6, 9, 12, 15, 18).

The way cross-validation works is essentially the same as the train test split: you go through and sequentially remove the last however-many observations. In this example we're doing a five-fold cross-validation split and setting h equal to 3, so I'm setting aside one horizon for each cross-validation split as a holdout set, and then — just like I believe Zach mentioned with his question — using all of the previous observations as training data.

So that's one way you can do it, and then, as Zach mentioned, you do get this phenomenon where these first 3 points are included in every single training set for the cross-validation splits, and the next 3 points are included in 4 of them. So you are getting more information from the past.

You can instead set it up — and the tool we're going to learn is called time series split — so that you have the same size training set for all of your cross-validation splits. For instance, if we wanted to make our training set size 3 as well, you could use only the 3 observations immediately before each holdout set. So here, if you did that, CV split 1 would be these 6 points you see, CV split 2 the next 6, CV split 3 the next 6, and so forth. You can do it either way — you can see which way works best for your particular model; you can try both and see whether it has an impact on the results, if you'd like.

Okay, so to do this in Python you can use a function — I guess it's technically an object — from sklearn called TimeSeriesSplit. So, from sklearn.model_selection —
it's still in model_selection, just like KFold — you're going to import TimeSeriesSplit (capital T, capital S, capital S).

What I'm doing right now is just randomly generating a time series using numpy — and maybe what I'll do is set np.random.seed, just to make sure it's always the same. So here's what my time series looks like. It's just randomly generated; it's not important, other than the fact that we're going to pretend it's a time series.

So how do I do the cross-validation with TimeSeriesSplit? Where you'd have had KFold, you now call TimeSeriesSplit. The number of splits is specified with n_splits — you can just put it first. test_size will control the size of the holdout set; I believe in my comments I said I'd use a test size of 14, and here I don't specify a training size. This is also a difference from KFold: with KFold we would typically specify a fraction, and with TimeSeriesSplit you can do a fraction, but if you want to be explicit about the exact number of observations being split off, you can just specify it. So here I've specified 14 — the only reason I chose 14 is that it's what I provided for myself in the comments.

In this example, it works the way it's pictured above: instead of 3, each holdout set has 14 observations, the training set is everything before it, and you sequentially prune off the holdout set each time. We can loop through and see how it looks — you can see how, each time, the previous test indices get added to the training indices as you go.

Alternatively, you can do it the way I described in answer to Zach's question. Let's say we want to use just 2 horizons' worth of data for each training set, so we set that equal to 28. I believe the argument is train_size — I knew something like this would happen; let me check the documentation. Max train size, that's what it is: not train_size but max_train_size. And now, if you loop through, you can see that the training set is never bigger than 28 observations — that might be hard to see with your eyes, but trust me, it's always 28 — and it moves along with the test set.
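Here is a hedged sketch of that TimeSeriesSplit usage. The seed value and the length of the fake series are made up for illustration; test_size and max_train_size are the arguments discussed above (test_size requires a reasonably recent version of scikit-learn).

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

np.random.seed(440)            # hypothetical seed, just for reproducibility
y = np.random.randn(100, 1)    # pretend this is a time series

# Growing training sets: each split holds out the next 14 observations.
ts_cv = TimeSeriesSplit(n_splits=5, test_size=14)
for train_index, test_index in ts_cv.split(y):
    print("train:", train_index[0], "-", train_index[-1],
          "| test:", test_index[0], "-", test_index[-1])

# Fixed-size training sets: cap the training window at 28 observations (2 horizons).
ts_cv_fixed = TimeSeriesSplit(n_splits=5, test_size=14, max_train_size=28)
```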
Okay — are there any questions about cross-validation with time series?

Brantly asked in the chat: if our time series exhibits seasonality, shouldn't we make sure that our validation sets together cover at least one full cycle of the season, so we get a good idea of how the model performs throughout the entire cycle? Yes, probably you would want to do that; sometimes you might not have enough data to ensure it, but in general you'd like to. One thing to keep in mind, though, is the horizon. You might have seasonal data but only be interested in predicting, say, 3 time steps out. Maybe your season is yearly and your data is weekly, so the period of the season is roughly 52 time steps; in that case you probably wouldn't want your holdout set to always be 52 — your holdout set should be about the size of your horizon, or a little larger, like 2 or 3 horizons. What you might want to ensure, especially if you're using the max_train_size setup we just looked at, is that you include at least one whole season's worth of data in your training set.

Are there any other questions?

Okay. So that's it for notebook number 2. In live lecture we're skipping notebook number 3, so if you're new to Python and you've never dealt with datetimes, I encourage you to check out that notebook on your own time. There's a pre-recorded lecture video on the website for it (not a live lecture), and it's just good to get some familiarity with datetimes, time steps, and how Python handles time in general.

Now we're going to dive into learning some very first baseline forecasts. I'll preface this: a lot of these aren't going to be the world's best forecasts. Sometimes they do a lot better than you'd expect, but it's important to always have a baseline for any modeling approach you're doing. With forecasting, these will be your go-to simple baselines, because if you have a really complicated model that takes a long time to train and it can't outperform your baseline, it's not worth keeping that model.

So we're going to learn these baseline forecasts. In particular, we're going to cover 6 baseline models, basically going through different cases of how your time series looks.
The first two are for time series without a trend and without seasonality. The next two will be for time series with a trend but no seasonality, and the last two will be for time series with seasonality but no trend. And I know what you're thinking: that doesn't cover all the potential options. I cut it off there because, if you know these last four, you can put together how to make the remaining two, and it is explicitly covered in the practice problems. So, for the sake of time, I focused on these 6 models and left the last 2 for you to look at on your own time.

Throughout the notebook, whenever I write y_t I'm referring to a time series. I'll always assume my training set has n observations, and our overall goal is to make predictions at times after what we have observed — so for t bigger than n, where n is the size of the training set.

Okay. We're going to stick to the two time series we looked at in the very first notebook today. The first is the Yahoo Finance time series: the closing price for Google's parent company's stock from August 19, 2004 to March 25, 2022. Remember, I said we want our data to be sequential; we're just going to assume that my time steps are trading days, which are not always sequential, but we'll make that assumption. And the second is the weekly seasonal influenza cases data set.

Okay. I also want to point out something that might be new for people who are new to pandas. These data sets — the Google stock one and the influenza one — have columns that contain dates. When you read in data (CSV files or anything else) with pandas and it has a column that is a date, you can get it read in as a date if you include the argument parse_dates. What you do is provide a list of the columns that you suspect have a date in them. If the date is well formatted — meaning it follows a standard date format; there are a couple of different standards, but if it follows one of them — pandas can usually figure out which one you're using.

So here are the first 5 entries of the data frame, and, just to follow up on that, you can see the 0 entry here is what's known as a pandas timestamp. If I get rid of the parse_dates argument — I'll just comment it out — you can see that it's read in as a string. So without parse_dates it's read in as a string, but with parse_dates it's read in as a pandas Timestamp.
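A small sketch of that read_csv call — the file name matches the one mentioned earlier, but the date column name ("date") is an assumption:

```python
import pandas as pd

# With parse_dates, the listed column comes in as pandas Timestamps...
goog = pd.read_csv("google_stock.csv", parse_dates=["date"])
print(type(goog["date"].iloc[0]))      # pandas Timestamp

# ...without it, the same column is read in as plain strings.
goog_str = pd.read_csv("google_stock.csv")
print(type(goog_str["date"].iloc[0]))  # str
```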
And then here's just a reminder of what the time series looks like.

The first two baselines are going to use this series. We know this is a time series with a trend — an increasing trend over the course of the entire series — but we're still going to use it instead of introducing yet another time series; I'm going to stick to just these two. So I'm going to use this series to build the baseline forecasts for "time series without a trend." I know it has a trend, but we'll pretend it doesn't, just for the sake of not having to introduce a third time series to keep track of. I'm going to set aside 14 trading days and use those as my test set. I don't really have a reason for using 14; that's just what I chose when I wrote the notebook.

Okay. The first forecast we're going to look at is called the average forecast. The average forecast just consists of predicting the historical average for every future time point. It's really similar to the baseline we used for regular regression problems, where you take the expected value of the output and use that as your prediction. To write it down as a formula: f(t) is the expected value of the time series plus some random noise ε for any time in the future, and just the observed value if t is one of the observations in the training set, where ε is an error term.

For this model to be any good, you'd need the assumption that the y_t are independent of one another and identically distributed. That's what's known as a white noise time series — white noise meaning it's a purely random process where each observation is identically distributed and independent of the others. That's usually not the case for the data you get, but it's a good baseline.

So how do we make the prediction? We take goog_train.closing_price.mean() and multiply it by a vector of ones whose length is the length of the test set. Just to be safe we'll use len(goog_test) — it's 14, but this way, if I go up and change it later, it will always be the length of the test set. What the next cell then does is plot my training data, my actual test data, and my prediction, as well as print out the MSE for us.
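Collected into one hedged snippet — the goog_train/goog_test names and the closing_price column are assumed from context, and the mean_squared_error call is just one reasonable way to print the error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Average forecast: predict the historical mean for every future time step.
avg_pred = goog_train["closing_price"].mean() * np.ones(len(goog_test))

print(mean_squared_error(goog_test["closing_price"], avg_pred))
```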
Now, a disclaimer — maybe I should have written this in the notebook. If we were actually going through the process of building the best model, we would want to use a validation set or cross-validation. Because I'm just defining the different types of models and showing you how to fit them, we're simply going to compare each model to the actual test set. That is not what you would want to do if you were actually trying to build a model, but for simplicity in the lecture, that's what we're doing.

Okay. So, as I said, not a great model. And why isn't it? You can see here that this dotted line is our prediction. It's not a good model because the average uses all of the historical data, and because we have such a huge increase from the first trading day to the last, our average is skewed way down.

Okay, so that's our first forecast. The next baseline forecast for data without a trend and without seasonality is known as the naive forecast. It's called naive because your prediction of what things will be like at the next time step is just what things are currently. If we write it down as a formula, f(t) is the last observation in the training set plus some random noise for times in the future, or the observed value if t is in the training set, where again ε is an error term.

The statistical model that underlies this is known as a random walk: you assume some initial value at time step 0, and each future time step is just the current value plus some random noise. So this is the naive model — naive because you're assuming things are going to be the same for every time step in the future, the same as the current state.

To make this prediction, you take goog_train.closing_price — I think it's easiest to do .values and take the very last observation — and then multiply it by a vector of ones that is the length of the test set. Again, if we were actually trying to compare models, we would do cross-validation; because we're just learning the models, it's the test set.

Okay, so this one actually works a lot better for this particular data set. That's typically because you can try to model the stock market with a random walk, and it's usually okay in the short term, because stock prices are pretty volatile. That's why this one looks a bit better than the average.
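The naive forecast in a hedged one-liner (same assumed variable and column names as above):

```python
import numpy as np

# Naive forecast: carry the last observed training value forward for every future step.
naive_pred = goog_train["closing_price"].values[-1] * np.ones(len(goog_test))
```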
Okay. But before we move on to data with a trend but no seasonality, are there any questions about the average or the naive forecast?

"I guess I just have a question: you're not really supposed to do this with stock price data, right? I thought I heard that you're not supposed to try to predict stock prices because it's not actually doable."

Yeah, so people do come up with models to try to predict — maybe not the absolute actual price; I don't actually know what people at quantitative trading firms try to predict. But the general wisdom is that you're never going to beat the market, and my understanding of what that means is over time: in the short term you might get lucky, or have a model that's good at predicting some short-term thing, but if you then try to consistently use the same approach over time, you're always going to be beaten by the market, which typically means an index fund. So if you don't have a lot of money you're willing to throw around and bet on your models — or somebody else's money to do it with — I would suggest not trying to predict the stock market. There are quantitative trading firms that do try to predict at least something, and some of them make a lot of money, but in the long term they would probably be outperformed by other trading approaches.

"Alright, thanks!"

Okay, so our next two baseline forecasts are for data with a trend but no seasonality — either increasing or decreasing over time. The first is just called the trend forecast. The trend forecast assumes that the time series is a linear function of time, so you'd write something like f(t) = β_0 + β_1 t + ε for t in the future, and the observed value otherwise. Here β_0 and β_1 are real numbers and ε is a random error term. The underlying statistical model is really just a version of the average model that is linear in time: in the average model we take the expected value, fixed for all time, whereas here we're saying the expected value — E[y_t] given t — can be modeled as a linear function of time.

So you do from sklearn.linear_model — which we've seen before — import LinearRegression, and then do exactly what we've done in the past few lectures and problem sessions: reg = LinearRegression(), then reg.fit, and for your features here you do something like np.
arange from one to the length of the training set, plus one, to account for the fact that we're not starting the time index at 0 — and then we need to reshape, because this is a one-dimensional array. For the target we just put in the closing price: goog_train.closing_price.values.

Then, to make the forecast, we do reg.predict, where the np.arange now runs from the end of the training set to the end of the training set plus the length of the test set — because, again, we're only looking at the test set — and then that would be our prediction. Oh — I forgot the .reshape(-1, 1) — and it turns out I didn't need the extra plus one on this one after all. There we go.

Okay, so this is what you get. Again, it doesn't look like it did much better than the average forecast — we could compare them if we wanted to, but that's not the point of this notebook — and that's probably just because we have such a long time series, and it does not look like a linear function of time.
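Here is the trend fit collected into one snippet with consistent time indices. This is a consolidation of the live-coded version, not the notebook verbatim; the goog_train/goog_test names and the closing_price column are assumed from context.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

n, m = len(goog_train), len(goog_test)

# Trend forecast: regress the closing price on the time index 1, 2, ..., n.
t_train = np.arange(1, n + 1).reshape(-1, 1)
reg = LinearRegression()
reg.fit(t_train, goog_train["closing_price"].values)

# Forecast the next m time steps, n + 1, ..., n + m.
t_test = np.arange(n + 1, n + m + 1).reshape(-1, 1)
trend_pred = reg.predict(t_test)
```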
Alright, so are there any questions about the trend forecast and how you fit it?

Okay. The last of the two models for data with a trend but no seasonality is called a random walk with drift. This is just a trend extension of the naive forecast: you take f(t) to be the last observation, plus some constant parameter β that you estimate, times the number of time steps into the future from your last observation, (t − n), plus ε — and if t is within your training set, you just take the observed value. Here β is a real number that you estimate. The underlying statistical model is called the same thing, a random walk with drift: you have a sequence of random variables y_1, y_2, ..., y_t, you assume a starting point y_0, and each subsequent point is the current one plus this drift coefficient plus noise. That's why it's (t − n): you're assuming you add the same amount at each time step.

The way you can estimate this drift coefficient β is with what's known as first differences. First differences are found by calculating y_t − y_{t−1} — the next time step minus the current one. You can do this really quickly with pandas' .diff function. Just to demonstrate: looking at the first 5 values of the closing price, we get this, and then with goog_train.closing_price.diff().head() you can compare and eyeball it. There's nothing to subtract for the 0 entry, because it's the first one; for the 1 entry it takes the 1 entry of goog_train and subtracts the 0 entry — and if you eyeball it, that's what you get — then for the next entry it does the 2 entry minus the 1 entry, which gives the 0.54, and so on. Then, to get the estimate of β, you just take the average of these: I'll copy-paste, get rid of the .head, and add in the .mean — and then, just to make sure I'm not including that NaN, I'll go from 1 onwards.

Okay. And so this is what our prediction looks like — a little bit of an increase; you can kind of see the slight slope here.

Zach is asking: is the random walk with drift different from the linear model, just with a different intercept? The linear model you get by regressing the y values onto the time steps 1, 2, 3, 4, and so on, so you're never going to get — well, I guess maybe not never, but it would be rare to get — this intercept: here you're basically fixing β_0 to be the last observation of your training set, if that makes sense.

Any other questions about these two?

"Yeah, just kind of a general question. Could you do something like transform your time variable, using a sine wave or something like that, so that it's not really a time series anymore, and then just do a normal regression?"

So, no matter what transformations you apply to it, it will always be a time series. The time series part of it isn't necessarily that it's seasonal or that it has a trend; the time series part is that, in the models we're going to learn in the live lectures, there's an assumed dependence structure from one observation to the previous one — you're assuming there is some sort of dependence between time steps. That's the idea behind a time series. Even if you took the sine of all the observations, or applied whatever transformations to them, there would still be some dependence between the observations, so you can't really take the time series and somehow find a way to decompose it so that it's not a time series anymore.
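Putting the drift estimate and forecast together in one hedged sketch (same assumed names as before):

```python
import numpy as np

# Estimate the drift beta_hat as the mean of the first differences;
# dropna() removes the NaN in the first row of .diff().
beta_hat = goog_train["closing_price"].diff().dropna().mean()

# Random walk with drift: last training value plus beta_hat for each step ahead.
last_value = goog_train["closing_price"].values[-1]
steps_ahead = np.arange(1, len(goog_test) + 1)
drift_pred = last_value + beta_hat * steps_ahead
```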
Clark's asking if we can show the beta-hat block again. We sure can. To get the beta-hat, remember: if y_{t+1} = y_t + β + ε, then y_{t+1} − y_t is a way to estimate β, and typically you assume ε has a mean of 0, in which case taking the average of all the y_{t+1} − y_t differences gives you an estimate of β. That's what we're doing here: we take the .diff, and then the average of all the diffs — skipping the 0 row, the very first row, because it has a missing value.

Any other questions?

Okay. The last two models we're going to look at are for data with seasonality but no trend. These are the last two in this notebook; whether we get to more models today depends on the time I have. So let's quickly remind ourselves what this data set looks like — and here we can correct a typo from earlier in the notebook: we're looking at 1928 to 1948 (I think I had 1930 to 1950 earlier). This is what it looks like.

Okay, so this is our time series, and we can see that this data exhibits a sort of yearly pattern: new cases tend to increase at the beginning of each year, peak in the first quarter or so, and then decline afterwards, and the cycle occurs on a yearly basis. Time series that exhibit this sort of behavior — behavior, not trend — are said to exhibit seasonality. I think this is just a repeat from before; some of these notebooks I wrote two or three years ago, so they don't always conform with one another, and I said this in the earlier notebook.

Okay. When we have this sort of belief — that our data is seasonal — we can update the baseline models, the average and the naive model, to account for the fact that the data is seasonal. We're going to use the last year of the data set, 1947, as a way to show you how you can make some seasonal baseline models and then compare them to what we observe in that last year. Again, if we were actually trying to build the best model for predicting on this data, we would do cross-validation or use a validation set; but just to introduce the models, we're going to focus on comparing to what we observe in the last year.

So the first seasonal-but-no-trend baseline is the seasonal version of the average forecast.
For each time step in the seasonal average forecast, basically what you're going to do is predict the average value of all the previous corresponding time steps throughout the series. A seasonal data set has a given season, and that season is, let's say, m time steps long — I think that's what I used here. The length of a season is called a period. So if the season is m time steps long, you go through and say: for each observation that falls on step 1 of a period, to predict a future step-1 observation I take the average over all of those; for a prediction that falls on step 2 of a period, I take the average over all the step-2 observations in the training set; for step 3, the average over all the step-3 observations; and so on. That's the idea — it's easier to say it in words than to read through the formula, which you can do on your own time if you'd like.

The underlying statistical model is just a seasonal extension of the average forecast, where you assume that each step in your season represents its own white noise sequence — for every step of the season you have a different random variable that you're drawing from its own distribution.

The nice thing is that I cleaned the data a little bit ahead of time, so that each year has 52 weeks. It isn't true that every year has exactly 52 weeks, but for this toy example each year does. So, in order to get the average model predictions, all I have to do is loop through the different weeks of the year, 1 through 52, and record the average for that week. That's what this code does: it loops through weeks 1 through 52 and records the average value of cases for that week across the training set. And then this is plotting that seasonal average forecast.

Okay, so the dotted line is the seasonal average forecast and the solid red line is the actual observations in 1947. One reason this doesn't perform so well is that 1947 was sort of an anomaly year: its flu season started late, actually within the calendar year, whereas other seasons tend to start in November or December and then peak in January and come down. So that's why you're seeing this disparity between the observed values and the average model.
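That week-by-week averaging, as a hedged sketch — flu_train and the "week"/"cases" column names are assumptions based on the columns described earlier (date, year, week, cases):

```python
import numpy as np

# Seasonal average forecast with period m = 52: for each week of the year,
# predict the average number of cases observed in that week across the training years.
week_avgs = np.array([flu_train.loc[flu_train["week"] == w, "cases"].mean()
                      for w in range(1, 53)])

# The prediction for 1947 is then just these 52 weekly averages, in week order.
seasonal_avg_pred = week_avgs
```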
00:54:53.000 --> 00:55:02.000 It's called a seasonal naive model. 00:55:02.000 --> 00:55:08.000 Again you're using this idea where you break up your seasons into their corresponding time steps. 00:55:08.000 --> 00:55:25.000 Let me start over: if I want to make a prediction in the future, you have to figure out what time step of the season that prediction corresponds to, find the most recent version of 00:55:25.000 --> 00:55:42.000 that time step in the training data, and then choose that. So in this example, for the year 1947, we're just going to take every corresponding week from 1946 and copy and paste it over. 00:55:42.000 --> 00:55:54.000 The underlying model for this is a seasonal extension of the random walk, where we think of each step in the season's cycle as its own random walk sequence. 00:55:54.000 --> 00:56:00.000 So let's go ahead and do this. Here we can use the fact that I have already cleaned up the years in 00:56:00.000 --> 00:56:07.000 this data, so to get my predictions for 1947, I just need to grab the corresponding weekly values 00:56:07.000 --> 00:56:11.000 from 1946. 00:56:11.000 --> 00:56:14.000 Okay. 00:56:14.000 --> 00:56:25.000 So that's what the seasonal naive model looks like on this particular data (a short sketch of both seasonal baselines appears after this passage). Okay, are there any questions about the 2 seasonal- 00:56:25.000 --> 00:56:44.000 but-no-trend models that we just went over? 00:56:44.000 --> 00:56:50.000 Alright, so like I said, there are 2 more baselines that you might want to look at. 00:56:50.000 --> 00:57:03.000 For instance, you can have a data set that has both seasonality and a trend. In that case, you're basically just doing extensions of the average and the naive model, 00:57:03.000 --> 00:57:06.000 but accounting for the fact that you have both a trend and seasonality. 00:57:06.000 --> 00:57:12.000 It gets cumbersome to write it all out and go over it in a lecture, 00:57:12.000 --> 00:57:26.000 but I did write it all out in the practice problems notebook for time series. So if you're interested in using these baselines for such data, check out the practice problems and you'll see how to do so. Okay? 00:57:26.000 --> 00:57:34.000 So the last notebook we're gonna dive into today is called averaging and smoothing, notebook number 5 in the time series 00:57:34.000 --> 00:57:46.000 folder. These are gonna start to introduce some beyond-baseline models. We're gonna see how much of this we can get through today, and then we'll finish it up tomorrow. 00:57:46.000 --> 00:57:49.000 So! 00:57:49.000 --> 00:58:01.000 What do we mean when we say an averaging or smoothing forecast? Averaging or smoothing forecasts are ones where you take an average of some collection of the previous values. 00:58:01.000 --> 00:58:10.000 So the ultimate version of this was that baseline model where you take all the previous observations in the training set and average them together.
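To make the two seasonal baselines above concrete, here is a rough sketch of the seasonal average and seasonal naive forecasts on synthetic stand-in data. The frame layout and the names year, week, and cases are assumptions; the lecture loops over the weeks explicitly, while this uses groupby as an equivalent shortcut.

import numpy as np
import pandas as pd

# synthetic stand-in for the cleaned flu data: 52 weeks per year, 1928-1947
rng = np.random.default_rng(0)
years = np.arange(1928, 1948)
flu = pd.DataFrame({
    "year": np.repeat(years, 52),
    "week": np.tile(np.arange(1, 53), len(years)),
})
flu["cases"] = (1000 * np.exp(-(((flu["week"] - 8) % 52) ** 2) / 50)
                + rng.normal(0, 20, len(flu))).clip(lower=0)

train = flu[flu["year"] < 1947]   # 1928-1946
test = flu[flu["year"] == 1947]   # the year we compare against

# seasonal average forecast: for each week 1-52, average that week across all training years
seasonal_avg = train.groupby("week")["cases"].mean()
avg_forecast = test["week"].map(seasonal_avg)

# seasonal naive forecast: copy the corresponding week from the most recent season (1946)
last_season = train[train["year"] == 1946].set_index("week")["cases"]
naive_forecast = test["week"].map(last_season)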
00:58:10.000 --> 00:58:29.000 But we're going to learn more nuanced or subtle averaging models in this notebook. The very first one is known as a moving average forecast. For one example, with a moving average of window size 00:58:29.000 --> 00:58:39.000 3, you would take the sum of the 3 most recent observations and then divide by 3; that is, you take the average of the 3 most recent observations. 00:58:39.000 --> 00:58:54.000 This is known as a moving average forecast because, as t increases, the window that you're considering shifts. 00:58:54.000 --> 00:58:57.000 More formally, you could write it out like this: 00:58:57.000 --> 00:59:12.000 the moving average forecast with window size k and equal weights takes the sum from i = 0 to k minus 1 of y sub t minus i, divides by k, and then adds epsilon sub t, which is random noise. 00:59:12.000 --> 00:59:20.000 So basically, for any observation within your training set, you're moving that window along, 00:59:20.000 --> 00:59:26.000 and then, once you get into the future, you're stuck with only the observations you have, so all predictions into the future are just the average of your last k observations. 00:59:26.000 --> 00:59:36.000 So going back to the Google stock, we're going to show you how to implement this model. 00:59:36.000 --> 00:59:46.000 pandas has this method called .rolling. Okay, now I know what I was trying to do. 00:59:55.000 --> 01:00:04.000 So now that I have the copy here, you're gonna do .rolling and then .mean, and then I'll show you what this looks like; let's do .head(10). 01:00:09.000 --> 01:00:18.000 Sorry about that. For the window I wanted to put, let's say, 3, and then there's also this other argument you're gonna want, called closed equals left, 01:00:18.000 --> 01:00:21.000 and I'll talk about what that does in a second. Okay? 01:00:21.000 --> 01:00:37.000 So what rolling does is it will go through whatever column you've provided and do this rolling-window type thing. Here we have a window of size 3, so it's gonna look at the 3 most recent observations, 01:00:37.000 --> 01:00:48.000 and then, whatever operation you ask for next, it will calculate that statistic using those observations. 01:00:48.000 --> 01:01:09.000 Here, because we put mean and our window size is 3, the value on row 3 is the mean of rows 0 through 2, the value on row 4 is the mean of rows 1 through 3, and the value on row 5 is the mean of rows 2 through 4. And this isn't something you 01:01:09.000 --> 01:01:18.000 can use only to calculate means; we could also do something like: what is the median of the last 3 rows? 01:01:18.000 --> 01:01:23.000 Let's see, I forget what standard deviation is... std. So you could even do something like: what's the standard deviation of the last 3 rows, etc. 01:01:23.000 --> 01:01:35.000 So as long as you put in something where you're calculating a statistic on a column, you can use it with rolling.
01:01:35.000 --> 01:01:46.000 So the first argument tells you what the window size is; I could change the window size to be, say, 5 or 2, and it changes the size of the window, 01:01:46.000 --> 01:01:53.000 meaning the number of observations in the window. I'm gonna go back to 3, because that's the model I specified earlier. 01:01:53.000 --> 01:02:13.000 Then I have this argument closed equals left. This is just specifying that I want the window to be the 3 most recent observations relative to where we are, as opposed to other setups: you could center the window, which would be something like y sub t minus 1, y sub t, y sub 01:02:13.000 --> 01:02:26.000 t plus 1, or look forward, which would be y sub t, y sub t plus 1, y sub t plus 2. Because we're doing time series forecasting, 01:02:26.000 --> 01:02:35.000 we can't use the future to make predictions, so we have to set it so that the side of the window that we use is the left-hand side. 01:02:35.000 --> 01:02:40.000 Okay, so that was a lot of information. Are there any questions about rolling 01:02:40.000 --> 01:03:00.000 and how it works, or what closed means? 01:03:00.000 --> 01:03:02.000 Okay. So then we... yeah? 01:03:02.000 --> 01:03:08.000 I still don't quite understand what the whole closed thing is. So closed equals left means 01:03:08.000 --> 01:03:18.000 we're saying that we're taking, let's say, for instance, in this table that we're looking at, 01:03:18.000 --> 01:03:23.000 we're taking the last 3 in... 01:03:23.000 --> 01:03:29.000 and then I guess I don't quite understand; it seems like a closed interval, right? But... 01:03:29.000 --> 01:03:33.000 So closed is just specifying where you start the window. 01:03:33.000 --> 01:03:49.000 When it's equal to left, I believe you're starting there and going backwards, and if it's equal to right you'd be going forward... well, apparently not. So let's see, we can always check. 01:03:49.000 --> 01:03:55.000 I'm guessing it means the last 3 including the current one, as opposed to excluding the current one. 01:03:55.000 --> 01:03:56.000 Yeah. So why don't we... 01:03:56.000 --> 01:04:02.000 So the average at time 3 factors in rows 0, 1, 2, versus 0, 1, 2 being the average at row 2. 01:04:02.000 --> 01:04:03.000 So let's go here, and we'll just read what the definition says. 01:04:03.000 --> 01:04:11.000 Okay, so closed: if 'right', the first point in the window is excluded from the calculations; 01:04:11.000 --> 01:04:16.000 if 'left', the last point in the window is excluded from calculations; 01:04:16.000 --> 01:04:39.000 if 'both', no points in the window are excluded; and if 'neither', the first and last points in the window are excluded from calculations. So if we don't provide anything, if we get rid of this argument altogether, we can see that the current row is included in the average. We don't 01:04:39.000 --> 01:04:41.000 want that: if we want to make predictions, we can't include the observation itself 01:04:41.000 --> 01:04:52.000 in making the predictions. So what we want, with the window looking at the start of it, rows 01:04:52.000 --> 01:04:58.000 0, 1, and 2, is to exclude the last point.
01:04:58.000 --> 01:05:04.000 So that's why we want closed equals left. 01:05:04.000 --> 01:05:12.000 And this was experimenting; this is what we want. When we have closed equals left, it means exclude the last point in the window. 01:05:12.000 --> 01:05:22.000 So these first 3 rows are our first window here, and we're saying: do not include the third entry of this window. That's how this value ends up being the average of the 3 previous rows, and the next one is the average of its 3 previous rows, and so forth. 01:05:22.000 --> 01:05:32.000 Does that make sense? 01:05:32.000 --> 01:05:34.000 Gotcha! 01:05:34.000 --> 01:05:37.000 Yeah, thanks for asking. 01:05:37.000 --> 01:05:45.000 Are there any other questions about this? It's a good thing we asked, 'cause it helped clear up some confusion. 01:05:45.000 --> 01:05:55.000 If we're getting an error from the closed equals left, should we just expect that that's because we're using an old version or something? 01:05:55.000 --> 01:05:59.000 Yes, that's probably what you're experiencing. So yeah, that would be my guess. 01:05:59.000 --> 01:06:05.000 Okay, cool. Sounds good. 01:06:05.000 --> 01:06:09.000 Awesome. Okay? So then we can use this to make predictions. 01:06:09.000 --> 01:06:14.000 Did I do that already? Yes, I've done it here. 01:06:14.000 --> 01:06:23.000 So these are the predictions. What we can see here is that the green dotted line is the fit on the training data; 01:06:23.000 --> 01:06:27.000 you can see it's delayed because of the rolling average. 01:06:27.000 --> 01:06:37.000 And then the red line with the points is the prediction, the forecast on the test set. You might think: oh, well, that's weird, there are gaps here. 01:06:37.000 --> 01:06:53.000 Remember that we don't trade on the weekend, so these gaps are the weekends; there are no trading days in the data there. Remember, on the horizontal axis we plotted the dates, not the time step, 01:06:53.000 --> 01:07:01.000 so not time step 1, time step 2, etc. 01:07:01.000 --> 01:07:20.000 So, in addition to making models, you can use the moving average to try and get a sense of the trend of your data or the seasonality of your data. What we're seeing here is the same exact data set minus the test set, and we're plotting the moving average 01:07:20.000 --> 01:07:25.000 on top of the data. Maybe it is slightly hard to see, 01:07:25.000 --> 01:07:31.000 so I apologize for that, but as you increase the window size, you can start to reveal some of the trends of the data. Here the window sizes may be too small, 01:07:31.000 --> 01:07:38.000 and you're basically just fitting the data exactly, but as you increase the window size you start to smooth out the bumpiness of the original time series, 01:07:38.000 --> 01:07:51.000 and you start to see things like this seemingly exponential increase in the trend, at least over this part of the time series. 01:07:51.000 --> 01:08:12.000 So one way to look for patterns in the data that may be hard to see in a plot of the raw series is to plot moving averages with differing window sizes, to see if you can get a sense of the trend or the patterns in the data.
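Here is a compact sketch of the moving average pieces just discussed: the rolling mean with closed='left' for the in-sample fit, a couple of other rolling statistics, and the flat forecast equal to the average of the last k training values. The series values are made up, and closed='left' for fixed-size windows needs a reasonably recent pandas (roughly 1.2 or later).

import numpy as np
import pandas as pd

# toy stand-in for the training closing-price column
closing_price = pd.Series([10.0, 11.0, 12.0, 11.5, 13.0, 14.0, 13.5, 15.0])

k = 3  # window size

# in-sample moving average: each row is the mean of the k rows strictly before it,
# which is what closed="left" gives us (the current observation is excluded)
fitted = closing_price.rolling(window=k, closed="left").mean()

# rolling works with other statistics too
rolling_median = closing_price.rolling(window=k, closed="left").median()
rolling_std = closing_price.rolling(window=k, closed="left").std()

# forecast: once we run out of observed data, every future step gets the same value,
# the average of the last k training observations
horizon = 5
forecast = closing_price.iloc[-k:].mean() * np.ones(horizon)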
01:08:12.000 --> 01:08:29.000 So in addition to moving averages, you have more general weighted average forecasts. A moving average puts an equal weight on every observation in the window, whereas in general the weights could be anything, as long as they add up to one. So maybe you have reason to believe that the most recent 01:08:29.000 --> 01:08:33.000 observation is more important than the previous 2 observations. 01:08:33.000 --> 01:08:47.000 This is an example of a weighted average where I'm giving a weight of 2/3 to the most recent observation and weights of 1/6 to the second and third 01:08:47.000 --> 01:08:51.000 most recent observations. So this would be a weighted average forecast, 01:08:51.000 --> 01:08:56.000 and I'm going to code this up as a function 01:08:56.000 --> 01:09:16.000 and remind myself what I was trying to do when I wrote this. 01:09:16.000 --> 01:09:24.000 Okay, I think I remember now. Alright, so what I was trying to do was: t is gonna be the position, 01:09:24.000 --> 01:09:43.000 so I'm gonna return 2/3 times goog_train.closing_price.values at t minus 1, 01:09:43.000 --> 01:09:52.000 plus 1/6 times 01:09:52.000 --> 01:10:04.000 goog_train.closing_price.values at t minus 2, and then plus the last one, 1/6 times 01:10:04.000 --> 01:10:15.000 goog_train.closing_price.values at t minus 3. Okay, so this is just taking this very particular weighted average 01:10:15.000 --> 01:10:27.000 and writing it as a function. 01:10:27.000 --> 01:10:28.000 You did dot value instead of dot values. 01:10:28.000 --> 01:10:31.000 Oh, jeez, what did I do? Yeah, yup, okay. 01:10:31.000 --> 01:10:35.000 There we go. And so this is what that particular forecast looks like. 01:10:35.000 --> 01:10:45.000 It doesn't look very different from the moving average one, in my opinion, but this is just introducing the idea that you don't need to have equal weights on your moving average: 01:10:45.000 --> 01:10:51.000 you could have a weighted average where you play around with the different weights. 01:10:51.000 --> 01:10:59.000 So the underlying statistical model for any of these models that we're looking at 01:10:59.000 --> 01:11:16.000 up to this point is known as an MA(q) model. If we let epsilon sub t be a white noise sequence, then a moving average stochastic process of order q, MA(q), is given by y sub t equals beta 0 epsilon sub t, plus beta 1 epsilon sub t 01:11:16.000 --> 01:11:23.000 minus 1, plus dot dot dot, plus beta q epsilon sub t minus q. This might seem weird, like, what am I talking about? 01:11:23.000 --> 01:11:37.000 This is gonna come back when we talk about what are known as ARIMA models and ARMA models tomorrow. So keep this under your cap until tomorrow; 01:11:37.000 --> 01:11:40.000 it will come back. 01:11:40.000 --> 01:11:45.000 So Lara is asking: how do you go about figuring out what weights to apply? 01:11:45.000 --> 01:11:46.000 In general you probably aren't gonna try fiddling around with figuring out
01:11:46.000 --> 01:12:04.000 the weights on your own. You would do something like what we're going to learn next, called smoothing, where you could set up a cross-validation to figure out the coefficients. If that doesn't make 01:12:04.000 --> 01:12:20.000 sense, we haven't covered smoothing yet, but in a few minutes we will. Or you could fit what's known as an MA(q) model, in which case there's an algorithm that runs in the 01:12:20.000 --> 01:12:27.000 background when you call the fit. We're not covering it in this notebook because we're gonna learn the more general thing tomorrow, but you can fit that, 01:12:27.000 --> 01:12:31.000 and then the fitting algorithm figures out the optimal weights for you, 01:12:31.000 --> 01:12:49.000 so you don't have to code it up and keep track of: okay, what's the weight on this one, the weight on that one, and the weight on the third one. 01:12:49.000 --> 01:12:58.000 Okay. So the last set of models that we're going to look at in this notebook are known as exponential smoothing models. There are going to be 3, 01:12:58.000 --> 01:13:10.000 and the names are going to be somewhat funny, because it's basically going to be regular exponential smoothing, double exponential smoothing, and triple exponential smoothing. 01:13:10.000 --> 01:13:22.000 It'll make sense what that means as we see the models, at least I hope it will. So we're gonna bring back the hat notation that we've seen in earlier notebooks, where y hat sub t is the prediction or forecast at time 01:13:22.000 --> 01:13:27.000 t. The first one we're looking at is simple exponential smoothing. 01:13:27.000 --> 01:13:40.000 For this, you say that the prediction at time t plus 1 is alpha times the observed value at time t, plus 1 minus alpha times 01:13:40.000 --> 01:13:50.000 the prediction at time t. That's if you're in the training set; when you're not in the training set, 01:13:50.000 --> 01:13:59.000 you do alpha times the last observed value plus 1 minus alpha times the prediction for that last observed value. 01:13:59.000 --> 01:14:08.000 Alpha here is another example of a hyperparameter. It's set between 0 and 1; you can select it by hand, 01:14:08.000 --> 01:14:16.000 or you can find it through some kind of algorithm. In terms of the way we're going to implement it here, 01:14:16.000 --> 01:14:31.000 there's a method called maximum likelihood estimation that can be run in the background to find a value for alpha, or you can do cross-validation: set up a grid of alpha values and find the one that gives you 01:14:31.000 --> 01:14:32.000 the best MSE. So there are 2 ways that I think are helpful to think about
01:14:32.000 --> 01:14:42.000 this model. The first is to see it as sort of an adjustment of a naive forecast. 01:14:42.000 --> 01:14:50.000 If you do a little rearranging, you can turn simple exponential smoothing into the following: the estimate for the next time 01:14:50.000 --> 01:14:55.000 step is equal to the estimate at the current time step plus alpha times 01:14:55.000 --> 01:15:09.000 y sub t minus y hat sub t. One way to think about this is: we've got the current estimate plus alpha times the residual, sort of an error term. 01:15:09.000 --> 01:15:13.000 If we squint, this e sub t is sort of an estimate of the epsilon from the original naive model, 01:15:13.000 --> 01:15:28.000 so this is sort of a play on the naive model. Another way we might want to think about it (I keep saying "sort of" a lot) is as a weighted average that we're going to optimize, 01:15:28.000 --> 01:15:39.000 and the optimization can happen either through the maximum likelihood that I talked about or through the cross-validation that I mentioned as well. 01:15:39.000 --> 01:15:48.000 If you rewrite all of this out, all the way from the beginning of the training set, the prediction at time 01:15:48.000 --> 01:16:01.000 t plus 1 is alpha times y sub t, plus alpha times (1 minus alpha) times y sub t minus 1, plus alpha times (1 minus alpha) squared times y sub t minus 2, and so forth. 01:16:01.000 --> 01:16:06.000 You can keep doing this; it is a weighted sum that includes all prior information. 01:16:06.000 --> 01:16:11.000 All of the previous observations are included in this sum, and then you can play around with the value of alpha. 01:16:11.000 --> 01:16:36.000 So with a lower value of alpha, you pay more attention to the most recent observations... or did I get that backwards? Let's see: if alpha is big, the current observation has a high weight. 01:16:36.000 --> 01:16:50.000 So the bigger alpha is, the more you pay attention to the most recent observation, and the smaller alpha is, the less you pay attention to the most recent. Okay, alright. 01:16:50.000 --> 01:16:54.000 So I see I have a question. 01:16:54.000 --> 01:16:59.000 Jonathan's asking: I've noticed these models tend to stop after n, just reporting 01:16:59.000 --> 01:17:19.000 the last result; is that normal, or would you let the model predict the later periods in the test window based on the predictions earlier in that window? So you do not use the predictions. These models always stop at the most recent observation; you don't use the predictions that were made. 01:17:19.000 --> 01:17:22.000 So, let's say again the training set's size is n: 01:17:22.000 --> 01:17:26.000 you wouldn't use the prediction at n plus 1 01:17:26.000 --> 01:17:36.000 to then predict n plus 2. With these models you're only ever going to use the last observation, 01:17:36.000 --> 01:17:37.000 only use the training set to make predictions on test sets or holdout sets. 01:17:37.000 --> 01:17:44.000 So you're never going to use previous predictions to then make future predictions, if that makes sense.
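To make the simple exponential smoothing recursion concrete, here is a small hand-rolled sketch of the fitted values and of why the out-of-sample forecast stays flat. It initializes the first fitted value to the first observation, which is one common convention; that initialization and the names are assumptions, and statsmodels handles all of this for us below.

import numpy as np

def simple_exp_smoothing(y, alpha, horizon):
    """Hand-rolled SES sketch: y is a 1-D array of training values, alpha is in (0, 1)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.empty(len(y))
    y_hat[0] = y[0]  # assumed initialization: first fitted value = first observation
    for t in range(len(y) - 1):
        # prediction for time t+1 mixes the observed value and the previous prediction
        y_hat[t + 1] = alpha * y[t] + (1 - alpha) * y_hat[t]
    # beyond the training set the forecast is flat:
    # alpha * (last observed value) + (1 - alpha) * (prediction for that last value)
    last = alpha * y[-1] + (1 - alpha) * y_hat[-1]
    return y_hat, np.full(horizon, last)

fitted, forecast = simple_exp_smoothing([100, 102, 101, 105, 107], alpha=0.7, horizon=3)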
01:17:44.000 --> 01:17:59.000 So one way that this gets adjusted for things like trends is they add a trend term, like we talked about, and the next exponential smoothing model is for data with a trend. 01:17:59.000 --> 01:18:13.000 And then, similarly, for data with seasonality there's also another extension. So that's how these models will work. 01:18:13.000 --> 01:18:21.000 Are there other questions? 01:18:21.000 --> 01:18:25.000 I had a question about the smoothing. 01:18:25.000 --> 01:18:39.000 So it seems like, if you scroll up a little bit... 01:18:39.000 --> 01:18:40.000 Yup! 01:18:40.000 --> 01:19:00.000 ...even on the data that we've collected, in some sense the prediction for y 2 might no longer be y 2, right? Is it that we're just kind of inducting, like: okay, for y 1 you're predicting y 1, and then, based on your parameter alpha, you're adjusting how much you predict for y 2, and so on? Or is it that you have some model that gives you predictions, and you're smoothing them out after you've developed some other model? 01:19:00.000 --> 01:19:01.000 So all of the predictions are made using this function here. 01:19:01.000 --> 01:19:15.000 y hat 1, I think, is just y 1; it's just the starting point. But then y hat 2 would be 01:19:15.000 --> 01:19:21.000 alpha times y 1 plus 1 minus alpha times 01:19:21.000 --> 01:19:26.000 y hat 1, which would be the... 01:19:26.000 --> 01:19:30.000 oh, I think I may have just forgotten to write it down. 01:19:30.000 --> 01:19:35.000 I think there's usually an assumption of a y 0 that you don't observe, 01:19:35.000 --> 01:19:40.000 if that makes sense. Yeah, but 01:19:40.000 --> 01:19:45.000 you're not using only the actual values in this particular model, whereas in the baseline models we just used the actual values. 01:19:45.000 --> 01:20:01.000 But remember, we don't care about how good it is at fitting the actual training set; what we care about is our ability to forecast. 01:20:01.000 --> 01:20:10.000 And then Zach's asking: why are stocks treated as having equal-length time steps when there are weekends and holidays without trading? 01:20:10.000 --> 01:20:12.000 Is this a simplification we make for illustrative purposes, 01:20:12.000 --> 01:20:21.000 or is it a common assumption? So in these notebooks I'm just making it because I needed a time series to demonstrate the methods, 01:20:21.000 --> 01:20:25.000 and this was one that I thought people might be interested in seeing. 01:20:25.000 --> 01:20:35.000 So that's why I use it. I've never worked on actually trying to predict stock markets, so I don't know if this is a common assumption, because I do know things could happen with the company over the weekend 01:20:35.000 --> 01:20:44.000 that maybe would impact the price starting on the next trading day. 01:20:44.000 --> 01:20:54.000 But in these sorts of models you could maybe assume that that's absorbed into the random noise, 01:20:54.000 --> 01:21:06.000 if that makes sense.
So I'm not sure what's done in practice, because I've never worked with these types of problems with stock markets; in these notebooks it's just an assumption made for the models, 01:21:06.000 --> 01:21:15.000 and it was just a data set I thought might be interesting to people. 01:21:15.000 --> 01:21:16.000 Okay, so how do we fit simple exponential smoothing in Python? 01:21:16.000 --> 01:21:23.000 We use the statsmodels package. 01:21:23.000 --> 01:21:29.000 Here's a link to their documentation. This may or may not be installed on your computer, 01:21:29.000 --> 01:21:34.000 so to check that you have it installed, you can import statsmodels; it's standard to import it as sm. 01:21:35.000 --> 01:21:48.000 Just like what happened earlier, when somebody asked a question: oh, this part of pandas isn't working, 01:21:48.000 --> 01:22:01.000 is it because my version is different? If what I code up doesn't work on yours, check the version that you have; it's probably because the versions are slightly different. 01:22:01.000 --> 01:22:02.000 That's usually gonna be the answer. 01:22:02.000 --> 01:22:19.000 Well, I guess the most common answer will be that there's a small typo in what you typed; the second most common answer for a difference between your code and my code is that our versions are different. So always consult the documentation for your particular version. What I'm writing I think should work, 01:22:19.000 --> 01:22:26.000 but sometimes statsmodels will make slight changes that change the way you fit the model. 01:22:26.000 --> 01:22:29.000 You can find instructions for installing statsmodels with pip and conda 01:22:29.000 --> 01:22:34.000 here, and I believe, after last week, my hope is that everybody has practice installing a Python package. 01:22:34.000 --> 01:22:43.000 If not, you can check out the instructions on our website; 01:22:43.000 --> 01:22:48.000 I believe it's somewhere under first steps on the data science website. 01:22:48.000 --> 01:22:57.000 So we're gonna import the model directly. There is a simple exponential smoothing model type in statsmodels 01:22:57.000 --> 01:22:59.000 that we can find here. So we would do: 01:22:59.000 --> 01:23:04.000 this is part of the tsa, which is time series analysis, 01:23:04.000 --> 01:23:15.000 API in statsmodels. So from statsmodels.tsa.api we're going to import SimpleExpSmoothing, with a lowercase x and p 01:23:15.000 --> 01:23:21.000 and Smoothing with a capital S. So then, how do we fit this type of model? 01:23:21.000 --> 01:23:28.000 It is slightly different from the model fitting process in sklearn, because it's a different package. 01:23:28.000 --> 01:23:37.000 The first thing we're gonna do is call SimpleExpSmoothing, 01:23:37.000 --> 01:23:50.000 and you're going to input the training data, so goog_train.closing_price. Then I'm going to call .fit, and inside fit 01:23:50.000 --> 01:23:59.000 you put the alpha, which in the statsmodels code is called smoothing_level. 01:23:59.000 --> 01:24:02.000 For illustrative purposes
I'm just gonna choose, 01:24:02.000 --> 01:24:06.000 let's say, point 5... actually, let's do point 7. No reason; 01:24:06.000 --> 01:24:15.000 I just decided point 7 on a whim. And then the next thing we're gonna do is put optimized equals False. 01:24:15.000 --> 01:24:23.000 When optimized is equal to True, it's not going to take your input for the smoothing level and just use it; 01:24:23.000 --> 01:24:34.000 it will use it as an initial guess at alpha and then find the one it wants to use for the final model, using the method of maximum likelihood, 01:24:34.000 --> 01:24:42.000 I believe. So if we want a set value of alpha, we have to set optimized equal to False. 01:24:42.000 --> 01:25:00.000 Okay. In our particular case of predictive modeling, instead of using maximum likelihood, which will best fit the training data, we would probably want to do a cross-validation to go through and try different values of alpha, probably evenly spaced on a 01:25:00.000 --> 01:25:02.000 grid. So now that you have the fitted model... 01:25:02.000 --> 01:25:14.000 we have a fitted model here. When we want to get the fitted values, and maybe I'll demonstrate this as its own code chunk, 01:25:14.000 --> 01:25:19.000 you would call the variable that you stored the model in, and then you can do 01:25:19.000 --> 01:25:22.000 dot fitted... 01:25:22.000 --> 01:25:31.000 no underscore: dot fittedvalues. And what these are 01:25:31.000 --> 01:25:55.000 are the fitted values for the training set. Then, to get the forecast, you take the variable that you stored the model in, call dot forecast, and input a number that is how far out you'd like to forecast. So if I just want to forecast 1 day, I put 1; if I want to forecast 2 days into 01:25:55.000 --> 01:26:02.000 the future, I would put 2. I want to forecast the length of the test set, which is what I'm going to do in this version, 01:26:02.000 --> 01:26:12.000 so I would do the length of goog_test. Okay? And you can notice the forecast is exactly the same no matter how far out in the future I go, 01:26:12.000 --> 01:26:17.000 and that should be expected based on the form of the model. 01:26:17.000 --> 01:26:28.000 Okay, so here's what this forecast looks like. The green dotted line is the fitted values, and the red line with the circles is the forecast. 01:26:28.000 --> 01:26:35.000 And this label shouldn't say weighted average; it should say 01:26:35.000 --> 01:26:43.000 simple exponential smoothing. Okay? And then, as I said, we could use cross-validation or a validation set to find the best alpha. 01:26:43.000 --> 01:27:05.000 So I'm gonna end it there for today, after answering any questions. Tomorrow we'll pick up with double exponential smoothing and triple exponential smoothing and then finish out as far as we can in the timeline. You may notice that I 01:27:05.000 --> 01:27:06.000 have sort of a next steps notebook, and then 2 additional notebooks 01:27:06.000 --> 01:27:14.000 after that. Notebooks 9 and 10 we're not gonna cover in live lecture. 01:27:14.000 --> 01:27:17.000 Notebook 9 builds on some stuff that we'll learn in classification, 01:27:17.000 --> 01:27:25.000 and notebook 10 is sort of a one-off thing for people interested in the Prophet model, which is popular in industry. 01:27:25.000 --> 01:27:26.000 Yeah.
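Putting the statsmodels steps just shown in one place, a sketch along these lines should reproduce the workflow. The goog_train / goog_test names and the toy series are assumptions, and exact defaults and warnings can shift between statsmodels versions.

import numpy as np
import pandas as pd
from statsmodels.tsa.api import SimpleExpSmoothing

# toy stand-in for goog_train.closing_price; a random-walk-like series
goog_train_closing = pd.Series(100 + np.cumsum(np.random.default_rng(0).normal(0, 1, 200)))
test_length = 14  # stand-in for len(goog_test)

# fix alpha at 0.7; optimized=False keeps statsmodels from re-estimating it via maximum likelihood
ses_fit = SimpleExpSmoothing(goog_train_closing).fit(smoothing_level=0.7, optimized=False)

fitted_values = ses_fit.fittedvalues      # in-sample fit (note: no underscore in the attribute name)
forecast = ses_fit.forecast(test_length)  # flat forecast, one value repeated test_length times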
But on that note, let's answer some questions and then sign off for today. 01:27:26.000 --> 01:27:30.000 So Jonathan's asking: will this class cover panel data models? 01:27:30.000 --> 01:27:42.000 So, no, and I don't know what panel data refers to; 01:27:42.000 --> 01:27:52.000 I've never heard the term before. So no, we're not going to cover that, unless maybe it does mean something that we are going to cover and I've just never heard the term. 01:27:52.000 --> 01:27:55.000 But I don't think we'll be covering panel data. 01:27:55.000 --> 01:28:03.000 Are there any other questions that people want to ask before I stop recording? 01:28:03.000 --> 01:28:18.000 Sorry, I think you might have answered this earlier, but if you have other features, you know, in addition to just the time, is that something that's gonna be covered? 01:28:18.000 --> 01:28:22.000 I think you mentioned it earlier with Erica's question, but... 01:28:22.000 --> 01:28:27.000 Yeah, so we're not gonna cover those particular models in the live lectures. 01:28:27.000 --> 01:28:33.000 I believe in the practice problems for time series I do cover how to fit such a model. 01:28:33.000 --> 01:28:57.000 It is in the statsmodels documentation, so you can see it there as well; there are models that do that, we're just not going to cover them in the live lectures. And I think notebook number 9 in here covers a model that can accommodate additional features as well. 01:28:57.000 --> 01:28:58.000 Yeah. 01:28:58.000 --> 01:29:03.000 Okay. Thank you. 01:29:03.000 --> 01:29:07.000 Any other questions? 01:29:07.000 --> 01:29:08.000 Yeah. 01:29:08.000 --> 01:29:14.000 Yes, I had a question. So in these examples, you're basically using a 14-day horizon. 01:29:14.000 --> 01:29:15.000 Yup! 01:29:15.000 --> 01:29:21.000 But you're doing np.ones, and you're kind of just taking the one value and multiplying it and applying it to all of the 14 values. 01:29:21.000 --> 01:29:34.000 So is there ever a time when you can actually have different values across that 14-day horizon? 01:29:34.000 --> 01:29:47.000 Yeah. So the only time you're gonna see different values across the forecast in these notebooks is gonna be models with trend or seasonal components. 01:29:47.000 --> 01:30:05.000 So the 2 trend models in the baselines and the 2 seasonal models have different values for each part of the forecast. In this next 01:30:05.000 --> 01:30:17.000 notebook, when we do double exponential and triple exponential smoothing, they allow for different values for each step of the forecast, and then we are going to learn a model called ARIMA, which will have different values depending on the options 01:30:17.000 --> 01:30:21.000 you set. But the ones that we've seen so far are flat, because we can only use the training data to make the 01:30:21.000 --> 01:30:33.000 forecast; they're fixed, using the last however many observations. 01:30:33.000 --> 01:30:34.000 Yeah. Thanks. 01:30:34.000 --> 01:30:36.000 Yup!
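For reference, the pattern being asked about, spreading one flat prediction across the 14-day horizon, looks roughly like this (the values and names are assumptions, not the notebook's exact code):

import numpy as np

# stand-in for the last k training observations, e.g. goog_train.closing_price.values[-k:]
last_k = np.array([132.5, 131.0, 134.2])
horizon = 14  # length of the test window

# baseline models without trend or seasonality repeat one number across the whole horizon
flat_forecast = last_k.mean() * np.ones(horizon)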
01:30:36.000 --> 01:30:38.000 Okay, so I'm going to go ahead and stop recording, 01:30:38.000 --> 01:30:40.000 but I will stick around for any extra questions. 01:30:40.000 --> 01:30:43.000 If you have to go, that's fine, and I'll see you tomorrow.