Okay, welcome back. This is lecture number 7, so this is our second day on time series, and that's what we're going to be wrapping up today. Last time we left off we were working on learning about averaging and smoothing, so that's where we're going to pick up today. Let me get my chat set up for questions.

Okay, so what we had gotten through was talking through moving average forecasts. We applied them to this Google training set, and along the way we learned things like .rolling, which takes a rolling average of a set window size. Then we applied these models to that data set. We also mentioned how you can use rolling averages to get a sense of features of the data set, like whether or not it has a trend. We talked about more general weighted average forecasts: the earlier moving average has equal weight on all the previous observations, whereas in general you can have weighted averages where you have different weights on each of the previous observations, and this was an example of that. And then finally, we introduced exponential smoothing, with just the first one, which was simple exponential smoothing.

Because for the rest of this notebook we're going to build upon simple exponential smoothing, I think it'd be a good idea to review what simple exponential smoothing is. We have a time series y sub t, and then for any observation that's in the training set, the forecast, the fitted value on that time series, is alpha times y sub t, so alpha times the current value, plus one minus alpha times the fitted or forecasted value at that time step. So this would be like one step into the future. If what you're trying to forecast is outside of the training set, you would do alpha times the last observation, plus one minus alpha times the fitted value for the last observation. Alpha here is a hyperparameter, from 0 to 1 inclusive, that you can either choose by hand or fit with some sort of algorithm. If you choose an algorithm, it could be the one that's implemented by statsmodels, or you could use a cross-validation process to find the one that has the best average MSE.

Okay, so we talked about how this is sort of like a slight adjustment on the naive forecast, where you're taking the current fitted value and then adding alpha times a random draw, where the random draw is being approximated by the error term. We were also thinking of it as a weighted average that we can more easily optimize than trying to find each individual weight, by setting it up as alpha times this geometric sum.
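(For reference, here is a minimal sketch of the statsmodels workflow that gets walked through next. The toy series `goog_train`, the choice alpha = 0.6, the explicit initialization method, and the 5-step forecast horizon are illustrative assumptions, not the notebook's actual objects or values.)

```python
# A minimal sketch of simple exponential smoothing with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import SimpleExpSmoothing

# Toy stand-in for a training series like the Google closing prices.
rng = np.random.default_rng(0)
goog_train = pd.Series(100 + np.cumsum(rng.normal(size=200)))

# Recursion being fit: y_hat_{t+1} = alpha * y_t + (1 - alpha) * y_hat_t,
# which unrolls into the geometric weighted average
# alpha * y_t + alpha*(1 - alpha) * y_{t-1} + alpha*(1 - alpha)^2 * y_{t-2} + ...
ses_fit = SimpleExpSmoothing(
    goog_train,
    initialization_method="heuristic",  # assumption: set the initial level from the data, since we skip optimization
).fit(
    smoothing_level=0.6,  # alpha, chosen by hand
    optimized=False,      # do not let statsmodels tune alpha on the training set
)

print(ses_fit.fittedvalues.tail())  # fit on the training data
print(ses_fit.forecast(5))          # forecasts 5 time steps past the training set
```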
So, to implement simple exponential smoothing in Python, we use statsmodels. First we talked about the package and checking to make sure you have it installed, but once you have it installed, you can do simple exponential smoothing with this model object called SimpleExpSmoothing. To fit such a model, you call the model object, input the training data, which is the time series, and then you call .fit. You can specify the smoothing level, which is what we called alpha, and unless you set optimized equal to False, the algorithm will go through a fitting process to find the optimal alpha, where in this case "optimal" means optimal for the training set, not optimal for predicting. So if you're going to use this for predictive models, like we're going to do, you're going to want to set optimized equal to False and then use something like cross-validation to find your value of alpha.

This produces a fitted model object in statsmodels. Once you have a fitted model object, you can call .fittedvalues, which will give you the fit on the training data, and then .forecast, where you input an integer. Here that's the length of the test set, but we could also do just 2; for any positive integer, it will give you that many time steps into the future. So to get the forecast on the test set, we would just put in the length of the test set.

Now I want to remind everybody that, just like with the other models in these notebooks, we're not doing cross-validation or a validation set; I'm just giving you a sense of what these look like against a set they weren't trained on, and because of that we're looking at the test set. In practice, when you're doing forecasting, you do not touch the test set until the very end, so you'd want to do something like a validation set or cross-validation. This is just for the simplicity of the notebook, to be able to introduce the model and then look at it on a data set it wasn't trained on. And so here's what that model looks like for this particular choice of alpha and this particular data set.

Okay, so are there any questions on any of the stuff that we talked about last time that you want to make sure we wrap up before we move on?

Okay, so after simple exponential smoothing: this is a model that you might want to use, and it works pretty well here, I would say, in comparison to some of the baselines we looked at. But in general, simple exponential smoothing is for data without a trend and data without seasonality, so both no trend and no seasonality. There are some extensions of exponential smoothing that account for this. One extension is for data that has a trend but no seasonality, and this is called double exponential smoothing. It's called this because you just do exponential smoothing twice. So this is going to look weird, the way it's set up.
You just kind of have to do it this way, because it's a slightly complicated thing to write down. The forecast here, for any value within the training set, is this thing S sub t minus one plus b sub t minus one — and I probably should have kept my notation consistent, but this is for predicting at time t, where we're using the previous observation. We'll talk about what the heck S and b are in just a second. Then, for observations outside of the training set, you do S sub n, the S term for the last observation of the time series, plus t minus n — that's just how many time steps into the future you are from your training set — times b sub n.

So now let's talk about what S and b are. We've got S, and here again I guess I'm really inconsistent with my notation, but the value of S at time t plus one is alpha times y sub t, plus one minus alpha times (S sub t minus one plus b sub t minus one); if you keep track of it, that last piece is the predicted value at time t. We're assuming S sub one is just equal to y sub one, the first observation. This part should look very familiar: it's just simple exponential smoothing, where now the smoothed value is being broken down into the components S and b. The b part is where the "double" of double exponential smoothing comes in. You're doing exponential smoothing again, but now for the trend component: b sub t plus one is equal to beta times (S sub t minus S sub t minus one) plus one minus beta times b sub t minus one. This right here might look familiar — it's sort of like a first differencing — so this is giving you an estimate of the trend, or of the drift, the difference that you might be expecting to get from one time step to the next. You're basically doing exponential smoothing on what, in the baseline models, we called the betas for the random walks with drift. And then to get your prediction, you're taking the last observation plus your exponentially smoothed trend component, and doing exponential smoothing on that as well. So, two exponential smoothings.

Before we pause for questions, I'll show you how to do this in statsmodels; it's the exact same process. And I should also mention that beta is also a hyperparameter from 0 to 1, just like alpha. So now you have two hyperparameters that you can adjust by hand, use statsmodels to get the best version on the training set (where what they mean by "best" depends on whatever algorithm they're using),
or you can do cross-validation with a grid of alpha values and a grid of beta values to find the one that gives you the best MSE.

Okay, so the way to fit double exponential smoothing is the same; it's just that the model object is different. Here the model object is called Holt, named after one of the researchers that developed the model — if you look at this, you can maybe guess the other one is Winters. So you call Holt, you put in the training set, you call .fit, and you put in smoothing_level, which is still alpha; for this initial version I just chose it to be 0.6, no particular reason. Then smoothing_trend is the value of beta, so here I've chosen both alpha and beta to be 0.6. And again I chose optimized to be False, because I don't want statsmodels to find the one that's best on the training data — that's not what I care about with predictive models.

So now it's been fit, and we're just looking at the difference. Here, the values I've chosen are picking up more on the very recent trend of a slight downturn in the stock price, and that's why you see it going down. Another thing you might be wondering is that this doesn't look like a constant decrease. Remember, the time steps we're treating as sequential here are trading days, and because there's a gap for the weekends, that's why it looks like it's been shifted in this piecewise way. If, instead of plotting the date on the horizontal axis, we plotted trading day 1, trading day 2, trading day 3, it would just look like a straight line.

Okay, alright. So before moving on to the last exponential smoothing model we'll consider, are there any questions about double exponential smoothing?

Okay, so the last model we're going to talk about is triple exponential smoothing. So — okay, yeah, we have a question. Yahweh is asking, why is the trend decreasing? The overall trend of the original data set is increasing, and I think we have it here, right? It's increasing. But if you notice, the more recent trend is a decreasing trend, and the values of alpha and beta that we chose for this particular model are putting a heavier focus on that more recent downward trend and predicting a continued downward trend. Now, this isn't the best choice for this particular option, so we would want to again do a cross-validation to find, on average, what seem to be the best values of alpha and beta.
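(A minimal sketch of the Holt fit just described, on the same kind of toy stand-in series as before; alpha = beta = 0.6 mirrors the arbitrary values chosen above, and the explicit initialization method is my assumption so the fixed parameters can be used without optimization.)

```python
# A minimal sketch of double exponential smoothing (Holt) with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import Holt

rng = np.random.default_rng(0)
goog_train = pd.Series(100 + np.cumsum(rng.normal(0.3, 1.0, size=200)))  # toy trending series

holt_fit = Holt(
    goog_train,
    initialization_method="heuristic",  # assumption: initialize level and trend from the data
).fit(
    smoothing_level=0.6,   # alpha, smoothing of the level S_t
    smoothing_trend=0.6,   # beta, smoothing of the trend term b_t
    optimized=False,       # keep the hand-picked alpha and beta
)

print(holt_fit.fittedvalues.tail())
print(holt_fit.forecast(5))  # out-of-sample forecasts follow S_n + h * b_n
```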
Tian and Wong is saying: obviously, we've said that these methods don't predict the stock market well — what kinds of scenarios might this be good for in practice? So, this model may or may not do well on predicting various stock market data. Just because I chose values of alpha and beta that didn't do well on this one test set doesn't mean that you couldn't get a good model with exponential smoothing. It's just a tool in your tool belt, and, like with regression, you'll try lots of different models on the time series that you're working on and then find which one tends to do best overall with something like cross-validation. So this could work well; it's just that I haven't gone through the process of trying to find the best values of alpha and beta.

Okay. So triple exponential smoothing is also sometimes known as the Holt-Winters forecast, again after the researchers that developed the model. This is three types of smoothing, and it's going to look really weird, because not only is there another type of smoothing, there are now two versions of the model — so we have two different models with three different forms of smoothing. The key thing to think of, if I ignore the particular models for a second and just give you a general breakdown, is this: you still have smoothing for just the time series, as if it didn't have a trend or seasonality; then you have the smoothing part for the trend, which is the S and b that we just saw in double exponential smoothing; and then the final portion is another smoothing, but this time for the seasonality. So this is for data sets that seem to exhibit seasonality — the third type of smoothing is for that.

The reason there are two types of the Holt-Winters model is because of the different types of seasonality. The first type of seasonality you might see is something called multiplicative seasonality. The idea behind multiplicative seasonality is that at each step in your season, your time series tends to be multiplied by a certain value. So, for instance, I think here I have an example trying to build on our infectious diseases example with the flu training set. In an infectious disease model — and I want to preface this by saying this isn't what infectious disease modelers actually use, it's just trying to give a rationale behind multiplicative seasonality — early in the season you have to consider this parameter called R naught. R naught is the number of susceptible individuals,
so people who are not immune to the disease, that an infected person is likely to infect during their lifetime, in a population where nobody else is sick. It's sort of a theoretical quantity that gets used all the time when trying to gauge how infectious a disease is. So early on in an infectious disease, you can basically think of it this way: if the time step is the lifespan of the disease, then you might expect the number of infected people to multiply by R naught from time step one to time step two. And basically, what we mean by multiplicative seasonality is that there's a multiplicative factor for each time step in the season, if that makes sense. So that's multiplicative seasonality.

All these formulas are setting up is the following. The first two are basically the same as the last ones: this one is the smoothing on just the regular time series, except your current value is being divided by the seasonal part, and then you have the trend part here; and the trend part is the exact same thing as before. Then there's this new seasonal part, and you can see it's multiplicative because, instead of adding, you have these constants being multiplied by different factors — these c sub t minus m terms are the current observation divided by some factor S sub t.

The next type of seasonality you might get is called additive seasonality. The idea behind additive seasonality is that for each time step in the cycle, some amount is added to the previous step. So, for instance, maybe you work for an ice cream shop or something like that, and from quarter one to quarter two you always tend to see a constant increase of 100 from whatever it was in quarter one. Maybe that's because quarter one is in a colder season of the year and quarter two is in a warmer season, so you always see a pretty standard bump in sales as the temperatures go up. That's called additive seasonality, where the current value of the time series is added to or subtracted from depending on where you are in the season. And so this is the triple exponential smoothing where the seasonal term is additive; that's the slight difference here, it's just accounting for the additivity.

Oh, I see we have a question here. Icons is asking — and again, I apologize if I'm saying your name incorrectly — why wouldn't we want maximum likelihood to find the best alphas and betas on the training set? Isn't that another way to find the best estimates for alpha and beta? So, we could do cross-validation, or we could try to find these estimates via maximum likelihood.
But finding the values of alpha and beta that provide the best fit on the training set isn't our goal for predictive modeling. Remember, in predictive modeling we could fit perfectly on the training set and it wouldn't do us any good if we weren't able to predict on things we haven't seen before. So with predictive modeling, we wouldn't want to use something like maximum likelihood to find the best values of alpha and beta on the training set and use those; we would want to use something like cross-validation to find the values of alpha and beta — and in this instance the third one, gamma, as well — that provide the lowest MSE, or the lowest mean absolute error, across the cross-validation. Now, and I think maybe this is what you're alluding to with your question, you could do something where your initial guess and your grid for alpha, beta, and gamma are determined by maximum likelihood, so by whatever statsmodels is doing in the background, and then you use that to set up a grid around those values. It may not necessarily be the case that that gives you the overall best values of alpha, beta, and gamma, but it is an approach you could take: use statsmodels to get the alpha, beta, and gamma that best fit the training data, and then — well, it might be more complicated to code up, but you could try to do that, if that makes sense.

Thank you. Alright, so how do we do this in Python? Now, instead of SimpleExpSmoothing or Holt, we just do ExponentialSmoothing. That's the name of the model object for the third one, and I think in general you can use ExponentialSmoothing to do either of the prior models as well, but I would have to check the documentation more closely. We're going to use this flu data set that we've looked at before, and then, just like before, I'm going to set aside the last year as my test set.

The way this works is exactly the same as before: we're going to call this, put in the training data, and then call .fit and input the parameters. So we're going to do ExponentialSmoothing and then flu_train.cases. Okay, so now that I'm reading my comments, there are two new things here that we have not had before. The first new thing is that we have to tell it what type of seasonality we want; for this example, we're going to use multiplicative seasonality, so seasonal is equal to 'mul'. And then you also have to set the number of periods — I guess not the number of periods, the number of time steps within a period. They call it seasonal_periods, so we would do seasonal_periods,
and for this data it's 52, because our data is weekly and it's a yearly period, or season — maybe I'm getting "period" wrong, I apologize if I am. Then we're going to call fit, and the three things we need to put in are our alpha, our beta, and our gamma. Again, I'm not going to go through a cross-validation, so I'm just going to pick values. For alpha, it's again called smoothing_level, and let's just go with 0.6 again; then for beta it's smoothing_trend, and this is also 0.6; and then finally we're doing smoothing_seasonal, which is our gamma, and this is 0.6. And then finally we're doing optimized equals False, because we don't care about the fit on the training set; we're just giving an example of fitting the model.

Okay, so now I have my fitted model. The blue line is the true training data, the green solid line is the fit of the Holt-Winters model, the red dotted line is the prediction, and the test data is the red solid line. And maybe to show off — sorry, there's a fly — to show off different values: we could set this one to maybe 0.1; there is no trend, so we could set that one to 0; and smoothing_level we'll keep at 0.6. And it basically looks like the same model, so there wasn't really much of a difference, but in general, changing these will change the model. Maybe if we change this to something like 0.9 we'd see a difference. Barely. Okay, but that's how fitting these models works, and that's exponential smoothing, which is just a class of time series forecasts that you can use in your time series tool chest.
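(Putting the pieces just walked through together: a minimal sketch of the Holt-Winters fit, with a toy weekly series standing in for flu_train.cases. The trend="add" argument and the explicit initialization method are my assumptions — the lecture only specifies the seasonal settings — and alpha = beta = gamma = 0.6 mirrors the arbitrary values chosen above.)

```python
# A minimal sketch of triple exponential smoothing (Holt-Winters) with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import ExponentialSmoothing

rng = np.random.default_rng(0)
weeks = np.arange(6 * 52)
# Positive, seasonal toy series (multiplicative seasonality needs positive values).
flu_train = pd.Series(60 + 40 * np.sin(2 * np.pi * weeks / 52) ** 2 + rng.normal(0, 3, weeks.size))

hw_fit = ExponentialSmoothing(
    flu_train,
    trend="add",                        # assumption: include an additive trend component
    seasonal="mul",                     # multiplicative seasonality, as above
    seasonal_periods=52,                # 52 weekly observations per yearly season
    initialization_method="heuristic",  # assumption: initialize the components from the data
).fit(
    smoothing_level=0.6,     # alpha
    smoothing_trend=0.6,     # beta
    smoothing_seasonal=0.6,  # gamma
    optimized=False,         # keep the hand-picked values
)

print(hw_fit.forecast(52).head())  # forecast one 52-week season ahead
```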
Are there any questions about exponential smoothing before we move on?

Yeah. Just to clarify: compared to the weighted smoothing, would you say we can think of exponential smoothing as an extension of a weighted smoothing approach?

Yeah, so simple exponential smoothing is a weighted average model where the weights are given by this alpha times one minus alpha to a power. So it is a weighted average, one that considers all of the previous observations, and then you can optimize it by finding the value of alpha that works best, through cross-validation or something like that. So that one is a weighted average; but because of the addition of the trend and seasonality components, the other two slightly get away from the weighted-average comparison.

Because I guess the part I was confused about is: where's the — I would expect to see some exponential somewhere. Is it just in the Taylor expansion that the exponential comes in?

Oh, so it isn't in the way the formulas are set up directly. You have to remember that this hat term contains alpha — it contains all the previous ones — so if you expand it all the way out, you'll get something like this. Does that make sense?

Okay. Alright!

Yeah, I think so. I've got to think more about this, but thanks.

Yup! Yahweh is asking whether cross-validation would help. It probably would help us get better models if we went through it; at the very least, we would be getting the values of alpha, beta, and gamma that have the lowest mean squared error. Whether or not that would give us a good fit for this particular problem is hard to tell. I think I said this yesterday, but it's worth repeating: 1947 is a sort of weird flu season compared to all the previous flu seasons. Other seasons start to take off a little bit in December, whereas 1947 doesn't really seem to start until probably February, or maybe March. So 1947 is not that close to the typical year, although there are some years in the past, like 1936, that are somewhat similar with a lower peak. Flu seasons are kind of difficult for these sorts of time series models, because there are a lot of different factors determining the shape of a flu season beyond just what the cases were like last week, if that makes sense.

Okay. So, we're still in the time series section. We just finished notebook number 5, and we're going to move on to notebook number 6, if you're trying to play along at home. We're going to take a brief aside from forecasts and models to dive a little bit deeper into the theory of time series, just because we need some of these terms to build our next forecast types. We're going to learn about something called stationarity and something called autocorrelation. If you did the problem session today, this term should look a little bit familiar, whereas this other term maybe isn't as familiar.

Stationarity is a statistical property of a time series; sometimes you want to assume that your time series is stationary before you use a particular forecast type. In general, we would say that a time series y sub t is strictly stationary
if the joint probability distribution of the sequence is the same as the sequence where the time points have been shifted by tau. Basically, what this is saying is that the joint distribution only depends upon the intervals between the observations — like the time between t1 and t2, and t3, and so forth. In particular, if the number of observations you have is just one, the expected value of y sub t is always mu, regardless of where in the time series you're looking, and the variance of y sub t is always equal to sigma squared for all observations y sub t. And if n were two, the joint distribution of y sub t1 and y sub t2 only depends upon the time distance between t2 and t1 (here I'm assuming t2 is later than t1). This is known as the lag between t2 and t1, and again, if you did the problem session, this term, lag, is familiar. We'll look a little bit more at lag later in the notebook.

Strict stationarity is usually pretty restrictive, and there are very few time series that are strictly stationary, so it's more useful to think of stationarity in a weaker sense. When we say a time series exhibits stationarity, or is stationary, that means that the expected value for any observation in the time series is equal to mu, and the covariance between any two time steps is just a function of the lag. Here the time steps are t and t plus tau, and the lag is just the tau. It might look weird to see it like this; you could think of it as: the covariance between y 2 and y 3 is some function of 1, the covariance between y 1 and y 3 is some function of 2, and when you have actual numbers in here, you would just have to calculate.

Some examples of stationary time series are white noise, which is basically just drawing from a random distribution; the first differences of a random walk, so if you have a random walk and you look at the next step minus the current step, those are the first differences; and a moving average process, which we looked at in the last notebook, is also stationary.

So basically, why are we talking about this? It's just that the next forecasting approach is going to need our time series data to be stationary before we can apply the forecast, and so I wanted to give an introduction to what that concept means here. Remember, stationary means this concept. That doesn't mean we're going to formally go through and check whether these things hold.
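(Written out, the weak stationarity conditions just described look roughly like this, in the notation from above.)

```latex
% Weak stationarity of a time series y_t: constant mean,
% and autocovariance that depends only on the lag tau.
\begin{aligned}
\mathbb{E}[y_t] &= \mu \quad \text{for all } t,\\
\operatorname{Cov}(y_t,\, y_{t+\tau}) &= \gamma(\tau) \quad \text{a function of the lag } \tau \text{ only}.
\end{aligned}
```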
But we are going to get a sense for the times when a time series is very clearly violating stationarity, and then we might want to make an adjustment. So how can we gauge whether or not a time series is violating it? We're going to look at something called the autocorrelation. The autocorrelation is just the correlation of a time series with its future observations placed at different lags. Think of having the time series in one column, and then in a different column you have the time series shifted by h values, or whatever, k values. If you were to calculate the correlation between the original time series and the lagged time series, that's the autocorrelation. This is the formula for that at a given lag, little k — I'm not going to say it out loud — but it's just the correlation formula, where instead of x and y it's y and the lagged version of y.

One thing you can do to check whether the stationarity assumption is reasonable is to look at what's known as a correlogram. I think you did this today, near the end of the problem session, if you did the problem session. This is called the autocorrelation plot, or correlogram. Basically, you produce these autocorrelations at different lags and then plot them: the autocorrelation on the vertical axis and the lag on the horizontal axis.

statsmodels has a function for doing this. We're going to import statsmodels as sm, and then we're going to generate a time series — my time series here is just random noise — and then we're going to go ahead and plot it. To make the plot, you do sm.graphics.tsa, and then the function is plot_acf. With that, you're going to put in the time series, which for us is called series; the number of lags you want plotted out to, and according to my comment, for this example I just want lags to be 30; and we're going to set alpha equal to None — I'll talk about that after we make the plot. I was reading my comment to see what I wanted; okay, I forgot that this is the last thing I need: ax equals ax. We'll talk about that too.

Okay, so here we go. This is the autocorrelation as a function of the lag. You're always going to have a line that goes straight up to one at lag 0, because everything has a correlation of one with itself.
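(A minimal sketch of the call just built up, on a white-noise series; the series name, the figure size, and the choice of lags = 30 with alpha = None mirror what's described above.)

```python
# A minimal sketch of a correlogram (autocorrelation plot) for white noise.
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
series = rng.normal(size=500)  # white noise: stationary by construction

fig, ax = plt.subplots(figsize=(10, 4))
sm.graphics.tsa.plot_acf(series, lags=30, alpha=None, ax=ax)  # alpha=None hides the confidence bands
plt.show()
```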
But then, if it's white noise, which is stationary, you'll want to see these autocorrelations that are pretty low, go between being negative and positive, and don't really show a clear pattern. That's what you're looking for when you're trying to judge if a time series seems stationary — sorry, I heard a noise in my apartment and wanted to make sure everything was okay — you're looking for this sort of thing, where you have relatively low autocorrelations and it kind of just bounces around between positive and negative with no clear pattern.

So let me go ahead and talk about the two things I said I was ignoring. The first argument is just saying whether or not you want confidence intervals to be plotted. For instance, if I get rid of this, you'll see the difference: there are these bars. I didn't want to talk about what the confidence intervals are — I'll leave it to you if you want to learn more about that — so in general I'm just going to turn them off. The other part is this ax equals ax. Here I make a matplotlib figure object and then a subplot, or an axis object, ax, and I just wanted to tell the function: hey, plot this thing here, so that I could control the size of it. That's what these two arguments are.

Okay. So before we go through more examples where it's obvious that stationarity is being violated, are there any questions about autocorrelation or this function that we just went over, while I take a drink of water?

Okay.

Can I get a physical intuition for stationarity?

Hmm! I don't know that I personally have an intuition for what you would see in a real-world time series, if that makes sense. I don't personally have a good intuition for a real-world example of "here is stationarity and what it looks like."

Yeah, sorry.

Could it be, for example, if we have stock prices that are kind of constant over time — the prices over time, the mean isn't changing drastically — could you say that's an example of a stationary time series?

So, I guess — I think in the real world stock prices probably aren't stationary, but in this particular hypothetical example, where the company seems to just always be at the same price for all time, that probably would be stationary, right? Because if it's hovering around the same price, then the expected value would probably be constant, and then the difference — sorry,
I guess what we're looking at here is the covariance between two time steps, which would probably either be a constant or some function of the lag, so that probably would work. But I don't know that there are many stock prices that are like that.

Thanks.

Yup. And then there's a question asking: does autocorrelation have the same information as finding the Fourier components? So you can do Fourier transform stuff to get information on, like, seasonality. I don't know — I would have to look up whether you can use it to tell whether the stationarity assumption is being violated. I think looking at the autocorrelation function is probably easier, because you just have to calculate some correlations and then look at the plot; it's probably easier computationally than doing the Fourier transform stuff. But if anyone is interested in learning about relationships between time series and how you can use these things called Fourier transforms, check out the practice problems notebook for time series — I go over how you can use them there.

So we're now going to go over two examples of instances where you have a data set that is violating stationarity. Two ways to violate stationarity are to have data with a trend, because there you're going to be violating the expected value being a constant, independent of time, and to have data with seasonality.

For the trend one, we'll just look at the Google stock data. And I guess I left this one for myself to do again: so, closing price, lags equals, let's say, I don't know, 100; alpha equals None; ax equals ax. Okay. So this Google stock data had a very positive trend, and you'll see that over time it's a very high positive correlation that starts to decrease as you go down. So time series with trends are not stationary, and if you see something like this, it indicates that your time series isn't stationary. We will see how you can take this and produce a stationary time series in just a little bit.

Another example is seasonal time series, so here we're going to look at that flu example. flu.cases was my time series; then we want, let's say, 52 — so let's do 154, maybe; I guess it should be 156 — and then alpha equals None, and then ax equals ax.
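(The two correlograms just described, sketched on toy stand-ins; in the notebook, the same calls are made on goog.closing_price with lags=100 and on flu.cases with lags around 156.)

```python
# Correlograms for a trending series and a seasonal series (toy stand-ins).
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
trending = 100 + np.cumsum(rng.normal(0.5, 1.0, size=500))                  # clear upward trend
t = np.arange(520)
seasonal = 50 + 40 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 3, t.size)  # period of 52

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
sm.graphics.tsa.plot_acf(trending, lags=100, alpha=None, ax=ax1)  # stays high, decays slowly: trend
sm.graphics.tsa.plot_acf(seasonal, lags=156, alpha=None, ax=ax2)  # oscillates with period ~52: seasonality
plt.tight_layout()
plt.show()
```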
Okay, so with seasonal data, you'll tend to see this kind of pattern where you go between periods of negative correlation and positive correlation, back and forth, and this sort of behavior is an indicator of the seasonality. You can use it to try to get a sense of how long the seasons are, based on gauging how long the curves are. But, going back to the earlier question, there are other ways to get a sense of what the seasonality is by using the Fourier transforms; this is just giving you an indicator that this is maybe seasonal data and that it's not stationary. So for the models that we're going to learn next, you're going to need to do some adjustments for it, and if you'd like to get a guess at what the seasonality is, there are other techniques that you can appeal to, which you can learn in the practice problems.

Yahoo is asking if stationarity implies predictability in some sense. I guess it's predictable in the sense that — it's as predictable as the variance of the distribution it's being pulled from, if that makes sense. So if you have a really high variance, you maybe know that on average it's always going to be one value, but any given draw may be vastly different from that value.

Pedro is asking: how would you translate from lag periodicity to actual data periodicity? Looking at the lag, you can kind of get a sense of how long the periods are. With seasonal data, you expect to see these autocorrelations that go negative, negative, negative, up to positive, positive, positive, and back down, sort of like a sine curve. So you can try to count out the curve — where, again, I would look at the practice problems to get a sense of maybe a better way to check it with the Fourier transforms. Like here, you can kind of see it's positive, goes down to negative, and back up to positive around 52. That would indicate — I only know it's 52 because of the way the data is, but it's somewhere between 40 and 60, so you might try different values there — and then it goes back down to negative, back to positive, and so forth.

So Whileilead is asking: what would the autocorrelation look like for a stationary series? It would look something like this, where the autocorrelations aren't too big, aren't too small, and they sort of just randomly ping back and forth between positive and negative.
The covariance is just a function of the lag, meaning that the correlation should also be — what was that, right? Yeah, so it should just kind of look like this, and I might need to sit down with a piece of paper if I wanted to make some other kind of statement beyond that.

So when you have these sorts of things — there are formal statistical tests, and I've linked to them here, they're Wikipedia links — there are formal statistical tests of whether or not a time series is stationary. I think most of the time you don't do this unless maybe you're trying to publish in an academic journal or something and you need to justify your assumption. In industry, you're probably not going to run these sorts of tests to fail to reject the null hypothesis that it is stationary. In general, you'll probably just make these sorts of plots, get a sense from that, and then see whether or not you have to apply what's known as differencing.

Oh, I see — Assad is asking: rather than small-magnitude periodicity in the correlation plot, as seen in that plot, I thought the seasonality was indicated by the autocorrelation spiking in a periodic way. Yeah, so the seasonality is indicated here because it's going back and forth in this almost sine-wave type of way; that's what indicates that there's seasonality in the data. It doesn't have to do with the fact that they're small magnitude here. Sometimes — like in the problem session, where the seasonality was really strong — you see really big spikes going from close to one to almost negative one, back and forth, as in that example from the problem session. The seasonality is just indicated by the fact that you're seeing a periodic pattern.

Okay. So when you have a time series that appears to break the stationarity assumption — or, since currently we don't have that assumption, that doesn't appear to be stationary — you can use something called differencing to create a time series that is stationary. When you have one with a trend, you'll do something called first or second differencing. There are extensions for seasonal data that I've left, again, to the practice problems for the sake of time, so we're going to focus on just data with a trend; if you're interested in seeing seasonal differencing, you can check out the practice problems.

Differencing is something we have seen before, with .diff from pandas. It's just producing a new time series of difference values, which I'm going to denote with this sort of upside-down triangle.
So basically, you have to start at the second observation and then move forward: you're going to do y 2 minus y 1, then the next observation is y 3 minus y 2, then the next observation is y 4 minus y 3, and you keep going — eventually it would be y t minus y t minus one. These are the first differences of the time series. So now you have a new time series whose indexing starts at 2, because there's nothing occurring before observation one. These are sort of analogous to a first derivative of a function. We saw this in earlier notebooks with pandas' .diff.

Going back to that Google example, we'll do goog.closing_price.diff(), and then from the first entry forward, and then we're going to do — I think we did 30, right? was that right, 30? — we're going to do a hundred, so lags equals 100, alpha equals None, and then, what was it, ax equals ax. Okay. And so now you can see that, through first differencing, we have produced a new time series that appears not to break this idea that it's stationary. We don't know for sure that it's stationary, but the autocorrelation plot doesn't appear to blatantly break that assumption.

Sometimes you'll do this and you'll still get one that is very clearly breaking the assumption, so you'll have to do differencing again. After first differencing comes second differencing, and that's just taking the differenced time series — this upside-down triangle y 2, upside-down triangle y 3, upside-down triangle y 4 — and performing first differencing on that. So for the entries of the second-differenced time series, you take the first-differenced time series and then subtract off the previous observation of that. If you wanted to put this in terms of the original time series, it would be y t minus 2 times y t minus one, plus y t minus 2. But in pandas we can just — so, not the training set, we're just looking at the regular time series here — do goog.closing_price.diff(), which was first differencing, and then second differencing would be goog.closing_price.diff().diff(). Okay. And so now you can see we're taking, you know, 0.54 and subtracting off 3.97, and that's how you get negative 3.4. You could do it again if you needed to.
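(A minimal sketch of first and second differencing with pandas, plus a quick correlogram of the first-differenced series; `goog` here is a toy DataFrame standing in for the notebook's Google stock data.)

```python
# First and second differencing with pandas .diff(), plus a correlogram check.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
goog = pd.DataFrame({"closing_price": 100 + np.cumsum(rng.normal(0.3, 1.0, size=500))})  # toy stand-in

first_diff = goog.closing_price.diff().dropna()          # nabla y_t = y_t - y_{t-1}
second_diff = goog.closing_price.diff().diff().dropna()  # nabla^2 y_t = y_t - 2*y_{t-1} + y_{t-2}

fig, ax = plt.subplots(figsize=(10, 4))
sm.graphics.tsa.plot_acf(first_diff, lags=100, alpha=None, ax=ax)  # should now look roughly stationary
plt.show()
```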
Sometimes you might need to do differencing three times. And you might be wondering: well, why do I need to do differencing, why do I need a time series that looks stationary? That will become apparent in the next notebook, when we learn about the ARIMA model.

Okay, are there any other questions about differencing before we move on?

Yeah. So, does the number of times you do the differencing depend on, for example, the length of the data? For example, if we have a longer range of data, and the stock market tends to fluctuate more, so we may need to look at the overall trend?

Yeah, so we could have a time series that has an increasing trend, but the rate of increase is itself increasing or decreasing. If that were the case, we would expect the first-differenced data to still appear to have a trend in the autocorrelation plot, and so then, if we wanted to turn that into a stationary time series, we would have to apply second differencing to the original data. Does that make sense?

Yes. Okay. Yeah. Cool, thanks.

Yup!

Awesome. Okay, so we just finished notebook number 6 in time series, and we're about to start notebook number 7. This is going to be the last forecasting model that we're going to learn in live lecture; there are two other notebooks if you're interested in learning additional forecast models, but I'll talk about those later.

Okay, so we've talked about some of this before. ARIMA is a really popular model type that tends to perform pretty well. ARIMA is broken down into three different components: the first is the autoregressive component, and then the part that we've already talked about is the moving average component. The AR comes from autoregressive, and the MA comes from moving average. We talked about moving averages in our previous notebook, so we're not going to touch on those here, but to help build up our knowledge before we go into the full ARIMA model, we are going to talk about autoregressive models first.

An autoregressive model is one where you regress onto previous observations. If you have a time series y sub t, the autoregressive model of order p is the one that regresses the value you'd like to predict, y sub t, onto a linear combination of the previous p observations.
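(Written out, the order-p autoregressive model just described looks roughly like this.)

```latex
% AR(p): regress y_t on its own p previous values, plus a noise term.
y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \epsilon_t
```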
00:54:33.000 --> 00:54:39.000 The alpha_i here, unlike with smoothing, 00:54:39.000 --> 00:54:46.000 are actual parameters, so you would find them with something like ordinary least squares; 00:54:46.000 --> 00:54:47.000 the alpha_i are fit, and then epsilon_t is random noise. 00:54:47.000 --> 00:54:55.000 So it's autoregressive: auto, I believe, means self, 00:54:55.000 --> 00:55:01.000 and then we have regression, so autoregressive. This is often denoted in the time series literature as an AR(p), so autoregressive of order 00:55:01.000 --> 00:55:08.000 p. So that means 00:55:08.000 --> 00:55:20.000 regressing on the p previous observations. Okay, so this is not something that you're going to have to set up on your own and then run through, say, scikit-learn's linear regression; 00:55:20.000 --> 00:55:25.000 statsmodels will do this for us. You're not going to have to prepare the data 00:55:25.000 --> 00:55:33.000 and then do a linear-regression-type thing; statsmodels has an autoregressive model. As an aside, in the case where you have just order one, 00:55:33.000 --> 00:55:44.000 this is known as a Markov process: y_t is equal to alpha times y_(t-1) plus epsilon_t. 00:55:44.000 --> 00:55:45.000 Like I said, statsmodels will handle it, 00:55:45.000 --> 00:55:53.000 so unlike with exponential smoothing, where we did simple, then double, then triple, 00:55:53.000 --> 00:55:55.000 we're not going to go through and fit an autoregressive model, then a moving average model, and so forth. 00:55:55.000 --> 00:56:03.000 The ARIMA model that we're going to fit later in the notebook 00:56:03.000 --> 00:56:08.000 can, as special cases, fit any individual model type, so it can fit an AR model, 00:56:08.000 --> 00:56:14.000 it can fit an MA model, so we're not going to cover each one individually. 00:56:14.000 --> 00:56:20.000 We're just going to eventually show you the full ARIMA model. 00:56:20.000 --> 00:56:26.000 But I thought it was important to look at this AR component before moving on. 00:56:26.000 --> 00:56:42.000 So are there questions about autoregressive models? 00:56:42.000 --> 00:56:45.000 Okay. 00:56:45.000 --> 00:56:52.000 So we have AR, and MA, which we talked about in a previous notebook, and the combination of the two is known as an ARMA model. 00:56:52.000 --> 00:56:58.000 This is sort of the real statistical basis of the ARIMA model. 00:56:58.000 --> 00:57:05.000 So we have the AR and the MA. Like I said, MA stands for moving average, and this is a moving average of order 00:57:05.000 --> 00:57:15.000 q. So AR typically uses p to denote the number of terms, and MA typically uses q to denote the number of terms. 00:57:15.000 --> 00:57:29.000 And the formal model for this is that y_t is equal to beta_0 times 00:57:29.000 --> 00:57:33.000 a random variable drawn at time t, plus beta_1 times a random variable 00:57:33.000 --> 00:57:35.000 drawn from the same distribution at time t minus 1, all the way up to beta_q 00:57:35.000 --> 00:57:41.000 times epsilon_(t-q). So here, instead of averaging 00:57:41.000 --> 00:57:50.000 previous observations, the formal statistical model is assuming that there's some underlying random process that's being drawn from at every time step.
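To make that notation concrete, here is the standard way these pieces are usually written; this is textbook notation rather than something copied from the notebook, using the same conventions as above (alphas for the AR weights, betas for the MA weights, and epsilon_t for mean-zero noise with a fixed variance):

    % AR(p): regress y_t on its own p previous values
    y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \epsilon_t

    % MA(q): y_t is a weighted sum of the current and q previous noise draws
    y_t = \beta_0 \epsilon_t + \beta_1 \epsilon_{t-1} + \cdots + \beta_q \epsilon_{t-q}

    % ARMA(p, q): both pieces together
    y_t = \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{j=0}^{q} \beta_j \epsilon_{t-j}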
00:57:50.000 --> 00:57:55.000 And then the moving average model is trying to estimate these betas. 00:57:55.000 --> 00:57:56.000 We're not going to independently talk about how this model is fit in the background, 00:57:56.000 --> 00:58:11.000 but this is the formal statistical model. So when you combine these two into an ARMA process, you have both the autoregressive component in the model, 00:58:11.000 --> 00:58:13.000 so these autoregressions happening, combined with the moving average component. So this is an ARMA(p, 00:58:13.000 --> 00:58:26.000 q). The p refers to the number of autoregressive terms, and the q refers to the number of moving average terms. 00:58:26.000 --> 00:58:46.000 So notice that, like I said earlier, if you set p equal to 0, you recover just a moving average process, and if you set q equal to 0, you recover the autoregressive process, which is why we don't have to individually show you how to fit each one of these. 00:58:46.000 --> 00:58:50.000 So Yahoo is asking, can p be more than one? Yup, p can be more than one. 00:58:50.000 --> 00:58:53.000 That was just a particular example that has a name in statistics. 00:58:53.000 --> 00:59:00.000 It's a Markov process, which has been studied extensively in the fields of probability theory and 00:59:00.000 --> 00:59:04.000 statistics, but p can be more than one. p could be 00:59:04.000 --> 00:59:10.000 3, 4, 5, etc. 00:59:10.000 --> 00:59:16.000 Okay, another question: is q the length of the window in the moving average? 00:59:16.000 --> 00:59:20.000 Yes, so q would be like earlier, where we had a window of size 3: 00:59:20.000 --> 00:59:32.000 q is just the number of terms that you're including in the window. 00:59:32.000 --> 00:59:39.000 Okay, so ARMA models have an explicit assumption that you have a stationary time series. 00:59:39.000 --> 00:59:42.000 This is why we talked about that in the last notebook. 00:59:42.000 --> 00:59:45.000 So if you have a time series that is not stationary, you're not going to have a good fit or forecast in the long run. 00:59:45.000 --> 00:59:54.000 And this brings us to the final component of the ARIMA model, which is the I. 00:59:54.000 --> 00:59:59.000 The I stands for integrated, 00:59:59.000 --> 01:00:02.000 and in practice it just refers to differencing. So the AR is the autoregressive part, 01:00:02.000 --> 01:00:09.000 the MA is the moving average, and the I stands for the differencing that happens. So remember, and I guess I closed it, but let's pull it back up: 01:00:09.000 --> 01:00:35.000 when we took the Google stock data and performed first differencing on it, we went from a data set that was very clearly not stationary to producing a time series that at least appears not to break the stationarity assumption. So that's the idea here: 01:00:35.000 --> 01:00:44.000 if we have a time series that is not stationary, we need to make it stationary, or at least produce a stationary-looking one. 01:00:44.000 --> 01:00:45.000 We're going to perform differencing and then fit the ARMA model on the differenced 01:00:45.000 --> 01:00:54.000 time series. So that's the idea here. This is known as an ARIMA(p, d, q). 01:00:54.000 --> 01:01:07.000 So p, again, is that AR part, how many autoregressive terms, and d is the number of times that differencing happens,
01:01:07.000 --> 01:01:34.000 and q is the number of terms in the moving average. So before we go ahead and show you how to do this in Python, are there any questions, conceptually, about the model? 01:01:34.000 --> 01:01:36.000 Okay, so there is an ARIMA model in statsmodels' time series module, 01:01:36.000 --> 01:01:45.000 but for the sake of you learning how to do other things later, in either the problem session or the practice problems, we're going to import the model called SARIMAX. 01:01:45.000 --> 01:01:51.000 The S there stands for seasonal; the version of ARIMA that we're learning is for data 01:01:51.000 --> 01:02:01.000 without seasonality, but there is an extension for seasonal data as well, which I think is touched upon either in the problem session or in the practice problems. 01:02:01.000 --> 01:02:11.000 So we're going to go ahead and say from statsmodels.tsa, 01:02:11.000 --> 01:02:21.000 and then I think I need .api, we're going to import SARIMAX. 01:02:21.000 --> 01:02:25.000 We're going to use this Google training data. 01:02:25.000 --> 01:02:29.000 So to make this model we call SARIMAX, 01:02:29.000 --> 01:02:38.000 and I put in the training data, so goog_train.closing_price. 01:02:38.000 --> 01:02:46.000 Then I give the order. This (p, d, q) triple is known as the order of the model. 01:02:46.000 --> 01:02:53.000 So I say order is equal to, and for this, again, you could tune it. 01:02:53.000 --> 01:03:04.000 I believe in the next problem session you'll implement cross-validation here, where you do a grid and then choose different values for p 01:03:04.000 --> 01:03:15.000 and for q. Here I set d equal to one because of what we discovered earlier, where if I do first differencing I get a time series that appears to be stationary. 01:03:15.000 --> 01:03:26.000 And I just chose 5 and 5, because again, if I wanted to find the best p and the best q, I would use cross-validation. 01:03:26.000 --> 01:03:33.000 Then I call .fit(), and here I'm going to increase the number of iterations to make sure the algorithm has enough iterations to fit, and I forget what the exact argument is, 01:03:33.000 --> 01:03:49.000 so I'm going to go to the documentation real quick. 01:03:49.000 --> 01:03:54.000 I'm just looking for what they called the max-iterations argument, and maybe it's not here. 01:03:54.000 --> 01:04:00.000 So I'm just going to try calling it maxiter and see if that breaks anything. 01:04:00.000 --> 01:04:06.000 So let's see. 01:04:06.000 --> 01:04:12.000 Okay, yup, so it's running. So it did not converge. 01:04:12.000 --> 01:04:18.000 Let's do maybe 10,000. 01:04:18.000 --> 01:04:24.000 So this is something you'll sometimes see. In scikit-learn, when a model fits, it just fits, and we don't see anything printed. 01:04:24.000 --> 01:04:32.000 Other packages tend to print things out. I think you can set it so that this goes to a log file, if you know how to set up a log file, 01:04:32.000 --> 01:04:37.000 but packages like statsmodels will print out the fitting process, 01:04:37.000 --> 01:04:41.000 so you can always come through and check, if you know what it's doing. 01:04:41.000 --> 01:04:50.000 Okay. So I'm going to go ahead and cheat, to make sure I actually called this thing what it was supposed to be.
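Here's a minimal sketch of the kind of call being set up in this cell, with a simulated series standing in for goog_train.closing_price; treat the data and variable names as illustrative placeholders rather than the notebook's actual objects.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import SARIMAX

    # Illustrative stand-in for goog_train.closing_price: a random walk with drift.
    rng = np.random.default_rng(1)
    y_train = pd.Series(100 + np.cumsum(rng.normal(0.5, 2.0, size=400)))

    # order=(p, d, q): 5 AR terms, one round of differencing, 5 MA terms.
    model = SARIMAX(y_train, order=(5, 1, 5))

    # maxiter raises the optimizer's iteration cap so the fit has room to converge;
    # disp=False silences the convergence printout if you don't want to see it.
    fit = model.fit(maxiter=10_000, disp=False)

    fitted_values = fit.fittedvalues   # in-sample fit on the training data
    forecast = fit.forecast(10)        # forecast 10 steps past the training data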
01:04:50.000 --> 01:04:55.000 Wrong one. Here we go. So I just want to double-check that I have the right name. Okay, 01:04:55.000 --> 01:05:03.000 so there's no underscore. So that's why. 01:05:03.000 --> 01:05:10.000 Okay. 01:05:10.000 --> 01:05:11.000 So we just have to wait for it to fit the model. 01:05:11.000 --> 01:05:17.000 This is finding the values for the betas and the alphas 01:05:17.000 --> 01:05:23.000 from the AR and MA processes. So while this, there we go, it looks like it's spinning, 01:05:23.000 --> 01:05:30.000 it is fitting, and then we can go ahead and plot the fit. 01:05:30.000 --> 01:05:34.000 Okay, so this is what the fit looks like with those values of p and q. 01:05:34.000 --> 01:05:35.000 And then we could go through and see how it changes. 01:05:35.000 --> 01:05:46.000 So if, instead of 5, 1, 5, we did maybe 2, 1, 2, 01:05:46.000 --> 01:05:51.000 we could see how the fit changes. Okay. 01:05:51.000 --> 01:05:56.000 Alright. So in the problem session we'll talk a little bit more, and you'll get practice with choosing a p and a q through cross-validation. 01:05:56.000 --> 01:06:06.000 But if you also want to see how you can extend the ARIMA model for seasonal data, the theoretical points of it, you can check out the practice problems 01:06:06.000 --> 01:06:34.000 notebook as well. So that's it for the ARIMA material, but before we close out this notebook and move on: 01:06:34.000 --> 01:06:35.000 there's a question asking, can you explain what p and q are again? 01:06:35.000 --> 01:06:44.000 Oh, I sure can. So whenever we specify an ARIMA model, we have to specify an order, 01:06:44.000 --> 01:06:51.000 which is this triple (p, d, q). The p specifies the number of autoregressive terms that are included in the model. 01:06:51.000 --> 01:07:00.000 So these are the terms from regressing y_t onto past observations of itself. 01:07:00.000 --> 01:07:06.000 If p were 3, it would be y_t regressing on y_(t-1), y_(t-2), and y_(t-3). 01:07:06.000 --> 01:07:16.000 The q then specifies the number of moving average terms. 01:07:16.000 --> 01:07:22.000 So if q were 2, you'd see an epsilon_t, 01:07:22.000 --> 01:07:28.000 and then an epsilon_(t-1), or rather, the beta_0 epsilon_t term, 01:07:28.000 --> 01:07:32.000 I think, is always there, so it would be beta_1 epsilon_(t-1) 01:07:32.000 --> 01:07:33.000 and beta_2 epsilon_(t-2) on top of that. 01:07:33.000 --> 01:07:40.000 So p and q just determine the number of autoregressive terms, 01:07:40.000 --> 01:07:44.000 that's the p, and then the q determines the number of moving average terms, 01:07:44.000 --> 01:07:50.000 that's the q. 01:07:50.000 --> 01:07:58.000 And then the d is the number of times you do differencing. 01:07:58.000 --> 01:08:04.000 Melanie is asking, how does the sum of beta-epsilon terms represent a moving average in this model? 01:08:04.000 --> 01:08:07.000 So this is the assumed statistical model of a moving average process. 01:08:07.000 --> 01:08:26.000 In a moving average process, you assume that you have a sequence of identically distributed random variables with mean 0 and a set variance that you're drawing from at each time step. 01:08:26.000 --> 01:08:32.000 And so you have this.
Now, in the previous notebook we just took averages of the previous values. 01:08:32.000 --> 01:08:43.000 So that's how our moving averages worked: we weren't using an algorithm to find the different beta_0's and beta_1's. 01:08:43.000 --> 01:08:46.000 So it's not a one-to-one correspondence with what we called moving averages, because here there is a process to find these estimates. 01:08:46.000 --> 01:08:59.000 We just didn't go over it. In that process, I believe they treat the first observation as given and then basically fit 01:08:59.000 --> 01:09:09.000 the rest of the betas using the differences from the next observations. I have some references in the next notebook that go through the actual fitting process, 01:09:09.000 --> 01:09:13.000 if you're interested, but the moving averages 01:09:13.000 --> 01:09:19.000 we looked at are a form of moving average model. 01:09:19.000 --> 01:09:20.000 Then Madhi is asking, p and q aren't supposed to be equal, right? 01:09:20.000 --> 01:09:27.000 So there's no reason that p and q have to be different; 01:09:27.000 --> 01:09:30.000 I don't believe there's a theoretical reason for p 01:09:30.000 --> 01:09:40.000 and q not to be equal, so they can be equal if you want them to be, or if that's what cross-validation finds to be the model that generalizes best. 01:09:40.000 --> 01:09:44.000 Another question: in general, do more p, q, and d 01:09:44.000 --> 01:09:55.000 give a better result? So d should be set to the number of differencings that need to be applied in order to get a time series that appears stationary. 01:09:55.000 --> 01:10:04.000 Here we went with one, because the data we were looking at, after first differencing, didn't appear to violate stationarity. 01:10:04.000 --> 01:10:12.000 So that's why d was one. And then it's not necessarily the case that just having a higher p and a higher q will give you better performance. 01:10:12.000 --> 01:10:20.000 You'd have to do something like cross-validation or a validation set to see which ones generalize well. 01:10:20.000 --> 01:10:24.000 Ramazan is saying, I just read in the documentation of SARIMAX that we can add other features into this model if we choose to do so; does it just add linear regression terms? 01:10:24.000 --> 01:10:34.000 Yes, I believe so in general; we can double-check. 01:10:34.000 --> 01:10:38.000 So the endog argument, that's what we've been providing, 01:10:38.000 --> 01:10:48.000 that's just the time series, I believe. And then exog, which I believe stands for exogenous, is for when you want to provide other features alongside the time series. 01:10:48.000 --> 01:10:49.000 So maybe in addition to the stock price, you have other variables at each time 01:10:49.000 --> 01:10:52.000 step that can be used to, you know, help predict stock prices. 01:10:52.000 --> 01:11:07.000 You could include those here, and I believe the way it works is they just include them linearly, so you'd have, what's another Greek letter, gamma_1 x_1, 01:11:07.000 --> 01:11:15.000 gamma_2 x_2, gamma_3 x_3, and then they would just fit those coefficients; 01:11:15.000 --> 01:11:20.000 they would include it in the model that way. 01:11:20.000 --> 01:11:21.000 Yeah, I avoided those just for clarity. 01:11:21.000 --> 01:11:25.000 I avoided this because it's hard to find data sets like that to use for examples.
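Since the question came up, here's a rough sketch of how that exog argument gets used. The extra feature columns here are invented purely for illustration, and forecasting with exogenous regressors also requires future values of those features, which this toy example simply fakes.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import SARIMAX

    rng = np.random.default_rng(2)
    n = 300
    # Made-up price series plus two hypothetical extra features.
    price = pd.Series(100 + np.cumsum(rng.normal(0.2, 1.5, size=n)))
    extra_features = pd.DataFrame({
        "volume": rng.normal(1_000, 100, size=n),   # hypothetical feature
        "sentiment": rng.normal(0.0, 1.0, size=n),  # hypothetical feature
    })

    # endog is the series being forecast; exog adds linear regression terms for the features.
    fit = SARIMAX(endog=price, exog=extra_features, order=(2, 1, 2)).fit(disp=False)

    # Out-of-sample forecasts need future exog values; here we just reuse the last
    # 10 rows as a stand-in, which you would not do with real data.
    future_exog = extra_features.tail(10).reset_index(drop=True)
    print(fit.forecast(10, exog=future_exog))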
01:11:25.000 --> 01:11:27.000 And then also, I wanted to focus mainly on the time 01:11:27.000 --> 01:11:38.000 series component, because that is probably what's new to most of you. 01:11:38.000 --> 01:11:50.000 Okay, are there any other questions? These are good questions. 01:11:50.000 --> 01:11:56.000 Okay. 01:11:56.000 --> 01:11:57.000 So that's going to be it for the live lecture coverage of the time series material. 01:11:57.000 --> 01:12:06.000 If you're interested in learning more, maybe your project is on time series, or you want to work on forecasting problems, 01:12:06.000 --> 01:12:14.000 I've provided some next steps that you can go through. You can obviously work through the problem sessions and the practice problem notebooks, 01:12:14.000 --> 01:12:18.000 which expand on the content that we've covered today. 01:12:18.000 --> 01:12:34.000 Here are some theoretical books that are useful. These first two are what I drew upon to lay out some of the material we've covered, and then this last one, I think, is nice. 01:12:34.000 --> 01:12:43.000 It's written by two professors from Australia, and in addition, I believe they have videos for all of their different lectures, or maybe not. 01:12:43.000 --> 01:12:49.000 Yeah, so they're working more in 01:12:49.000 --> 01:12:52.000 R; this is for if you want to do it in R. 01:12:52.000 --> 01:13:07.000 So the coding is based on R, but the concepts carry over; they still go over things like ARIMA, and it doesn't matter 01:13:07.000 --> 01:13:11.000 whether you're in R or in Python, the models are still the same, it's just a different coding language. 01:13:11.000 --> 01:13:12.000 So I thought this is a nice book, and at the end they go into some more advanced forecasting material that we haven't covered, which may be of interest. 01:13:12.000 --> 01:13:16.000 And from the people I've talked to who work in industry, I believe ARIMA still gets used a lot. 01:13:16.000 --> 01:13:30.000 So it's not like it's been left in the dust by neural networks or whatever other types of models; it still gets used as a decent model. 01:13:30.000 --> 01:13:39.000 There are also some other Python packages. We've used the tsa subpackage of statsmodels, 01:13:39.000 --> 01:13:46.000 but there's a nice list of time series packages that a GitHub user compiled that you can find here, 01:13:46.000 --> 01:13:50.000 assuming the link is still up from the last time I checked. 01:13:50.000 --> 01:13:54.000 So you can also look through that. 01:13:54.000 --> 01:14:12.000 There are two additional time series lectures that we're not going to cover live. The first you'll want to come back to after we go over decision trees and random forests in our classification and ensemble learning content, and then number 10 you could probably go to at any time. 01:14:12.000 --> 01:14:15.000 It talks about something called the Facebook Prophet module. 01:14:15.000 --> 01:14:22.000 I've talked to some friends of mine that work as data analysts and data scientists, and some of them have said this model still gets used quite a bit in industry.
01:14:22.000 --> 01:14:26.000 It is one of the models that I think gets blamed for that Zillow disaster that happened a couple of years ago, 01:14:26.000 --> 01:14:34.000 but there are some mitigating factors, so it's not necessarily the model's fault that that disaster happened. 01:14:34.000 --> 01:14:41.000 But you do want to be careful when you use it, because it is slightly different 01:14:41.000 --> 01:14:46.000 from the models we've looked at so far. Still, it's useful to know. 01:14:46.000 --> 01:14:50.000 So I go over the theory of it, and then how to implement it, in notebook number 10. 01:14:50.000 --> 01:14:55.000 Notebooks 9 and 10 both have pre-recorded lectures, 01:14:55.000 --> 01:15:00.000 if you want to follow along with a voice and a video. Okay? 01:15:00.000 --> 01:15:13.000 So that's time series. Let me go ahead and shut these notebooks down, and we will get started on classification. 01:15:13.000 --> 01:15:19.000 Okay, so just like we did with time series, we're going to start by talking about some adjustments 01:15:19.000 --> 01:15:26.000 we need to make for classification problems. And I think there's a chance that will take us until the end of our time 01:15:26.000 --> 01:15:30.000 for today, and we'll come back for more classification material tomorrow. 01:15:30.000 --> 01:15:41.000 So, once this kernel loads. It's been taking a while, I don't know why. 01:15:41.000 --> 01:15:46.000 Okay, so we're going to look at an illustrative example first. 01:15:46.000 --> 01:15:55.000 It's a tiny, silly little example that you will probably rarely encounter in the real world, but it illustrates the idea. 01:15:55.000 --> 01:16:03.000 So we have some sample output data. In classification, we're trying to predict a categorical or binary outcome. 01:16:03.000 --> 01:16:07.000 In this example, we're going to imagine we have a binary outcome y, made up of zeros and ones, that we'd like to predict. 01:16:07.000 --> 01:16:10.000 So here's some sample output data. y has all these observations, 01:16:10.000 --> 01:16:23.000 and two of them are ones, and we're going to imagine what happens if we make a train test split the way we've been doing, 01:16:23.000 --> 01:16:26.000 just using scikit-learn's train_test_split. 01:16:26.000 --> 01:16:31.000 So here I go through and I just make five different random splits, 01:16:31.000 --> 01:16:34.000 and then I look at the train set and the test set, just to illustrate what happens. 01:16:34.000 --> 01:16:42.000 So in, let's see, four of the five splits, 01:16:42.000 --> 01:16:48.000 none of the one examples show up in the test set, which is okay for the training, right? 01:16:48.000 --> 01:16:59.000 Because we're still getting the ones in the training set. But because the test set does not have any ones in it, it's hard for us to get a sense of what our test set performance is going to be 01:16:59.000 --> 01:17:00.000 if we don't have any of the ones in the test set. 01:17:00.000 --> 01:17:04.000 And ones are often used to encode the thing we're trying to predict.
01:17:04.000 --> 01:17:25.000 So maybe the thing we're trying to predict is whether or not someone is committing credit card fraud, or whether or not a person has a certain deadly disease, so it's important to be able to gauge how good we are at predicting ones, because that's typically the 01:17:25.000 --> 01:17:36.000 thing we care about most. Here is another situation where we have a split where all of the ones ended up in the test set, and none of the ones were in the training set. 01:17:36.000 --> 01:17:39.000 If something like this happens, it's impossible for us to train a model to predict what a one looks like, because we don't have any examples to show our model. 01:17:39.000 --> 01:17:54.000 So with classification data, doing the same train test split that we've done for regression is not going to cut it; 01:17:54.000 --> 01:18:01.000 there are some issues that can arise. So what's going on here is, whenever we make a train test split, 01:18:01.000 --> 01:18:16.000 there's this underlying assumption for all predictive modeling that the data we're training on has the same underlying distribution as the data it's drawn from. 01:18:16.000 --> 01:18:17.000 So out in the world, maybe we have this sample that is split like this between zeros and ones, 01:18:17.000 --> 01:18:25.000 so roughly a sixth of the data is a one. 01:18:25.000 --> 01:18:32.000 And when we make the train test split like we've been doing so far, we could hypothetically end up with a situation where the training set is about a third ones and the test set is about a twelfth ones. 01:18:32.000 --> 01:18:36.000 Neither of these is reflective of the distribution they were originally drawn from. 01:18:36.000 --> 01:18:52.000 So what we're trying to do is make these train test splits in a way that the training set and the test set, and similarly any cross-validation folds and validation sets, 01:18:52.000 --> 01:18:59.000 are all roughly similar to the sample, which is hopefully similar to the underlying distribution. 01:18:59.000 --> 01:19:02.000 So this is what we're hoping to have with any type of data split 01:19:02.000 --> 01:19:03.000 we're making: that the distribution of zeros and ones, 01:19:03.000 --> 01:19:17.000 or, if we have a multi-class problem, the distribution of the possible values of y, is the same, or pretty close to the same, across all the different data splits we're making. 01:19:17.000 --> 01:19:21.000 So how can we do that? With what's called a stratified split. 01:19:21.000 --> 01:19:26.000 In theory, what we do is we first take our data 01:19:26.000 --> 01:19:41.000 and stratify it into the different classes. So if we have binary data, we put all the zeros over here and all the ones over here, and then, on each of the possible classes for y, we do a random train test split. 01:19:41.000 --> 01:19:42.000 So in this example we'd have a training set of zeros, a test set of zeros, a training set of ones, and a test set of ones. 01:19:42.000 --> 01:19:54.000 And then, after the splits are done on each individual class 01:19:54.000 --> 01:19:58.000 for y, we recombine them into an overall training set and test set. 01:19:58.000 --> 01:20:08.000 So all the training ones and all the training zeros get recombined, and all the test zeros and all the test ones get recombined, to make a training set and a test set. 01:20:08.000 --> 01:20:09.000 In scikit-learn you can do this by just providing a stratify argument.
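Here's a small sketch of that stratify argument in action. The DataFrame and column names mirror the beer example that comes next, but the data itself is made up, so treat the numbers as illustrative.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Made-up stand-in for the beer data: a two-class 'beer_type' column.
    beer = pd.DataFrame({
        "ibu": range(100),  # hypothetical feature column
        "beer_type": ["IPA"] * 56 + ["Stout"] * 44,
    })

    beer_train, beer_test = train_test_split(
        beer.copy(),
        shuffle=True,
        random_state=222,            # arbitrary seed
        test_size=0.2,
        stratify=beer["beer_type"],  # keep the class proportions in both splits
    )

    # Both splits keep roughly the 56/44 IPA-to-stout ratio of the full sample.
    print(beer_train["beer_type"].value_counts(normalize=True))
    print(beer_test["beer_type"].value_counts(normalize=True))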
01:20:09.000 --> 01:20:19.000 So here's that beer data set we looked at before, where we had both stouts and IPAs. 01:20:19.000 --> 01:20:27.000 The split there was roughly 56% IPA, 44% stout. 01:20:27.000 --> 01:20:33.000 So the way we do this is you call train_test_split, 01:20:33.000 --> 01:20:36.000 you input the data frame, so beer.copy(). 01:20:36.000 --> 01:20:42.000 If you had numpy arrays, you might do X comma y. 01:20:42.000 --> 01:20:48.000 We're still going to do shuffle equals True, we'll put in a random state of, 01:20:48.000 --> 01:20:59.000 I don't know, 1, 2, 2, 2, 2, 2, and then, well, I guess the second-to-last thing we need to put in is the test size. 01:20:59.000 --> 01:21:00.000 Let's do, I don't know, 0.2. 01:21:00.000 --> 01:21:04.000 So the last thing we need to put in is this stratify argument. 01:21:04.000 --> 01:21:10.000 So we say stratify, and then you need to input the variable that you're stratifying on. 01:21:10.000 --> 01:21:14.000 For us that's going to be beer at beer 01:21:14.000 --> 01:21:19.000 underscore type, so beer['beer_type'], 01:21:19.000 --> 01:21:26.000 the variable we are stratifying on. 01:21:26.000 --> 01:21:27.000 So now we can see that in the training set it's roughly the same split. 01:21:27.000 --> 01:21:33.000 It's not exactly the same, but it's pretty close. 01:21:33.000 --> 01:21:35.000 And the test set is the same way. Again, it's not going to be exactly the same, right, 01:21:35.000 --> 01:21:45.000 because we're dealing with discrete counts, so it's sometimes impossible to get the exact same split. 01:21:45.000 --> 01:21:53.000 But these are, for all intents and purposes, the same split of roughly 56/44. 01:21:53.000 --> 01:21:54.000 So that's how you do a stratified train test split. 01:21:54.000 --> 01:21:58.000 Are there any questions on anything we've talked about so far in this notebook? 01:21:58.000 --> 01:22:05.000 Can you only stratify on a single variable? 01:22:05.000 --> 01:22:12.000 So you can stratify on multiple variables. I'm not entirely sure how; I'd have to double-check the documentation to see how to do it with train_test_split. 01:22:12.000 --> 01:22:19.000 But in practice you can. In a lot of clinical trials, right, 01:22:19.000 --> 01:22:27.000 you might stratify your patients based on things like gender or other features. 01:22:27.000 --> 01:22:31.000 So, what's an example? 01:22:31.000 --> 01:22:37.000 For part of my PhD research, we did a study trying to get people to get flu shots, 01:22:37.000 --> 01:22:41.000 and two of the things we stratified on were self-reported gender 01:22:41.000 --> 01:22:46.000 and whether or not the person had a requirement from their job to get a flu shot. 01:22:46.000 --> 01:22:49.000 So we stratified on two things there. So it 01:22:49.000 --> 01:22:59.000 can be done. The thing to remember is that the more things you include in the stratification, the harder it gets; if you try to stratify on too many things, it may be impossible to actually get observations in each of your boxes, 01:22:59.000 --> 01:23:09.000 because you're just trying to split the data up too much. 01:23:09.000 --> 01:23:10.000 So Pedro is asking, how can you stratify 01:23:10.000 --> 01:23:15.000 non-binary features and labels? It works exactly the same way.
01:23:15.000 --> 01:23:26.000 So let's say, instead of two beer types, I had IPAs, stouts, and lagers, which is another type of beer. If I had that, I would still just put this in here, and the splitting would happen exactly the same way. 01:23:26.000 --> 01:23:40.000 So instead of having zeros and ones here, imagine you had zeros, ones, and twos; the exact same thing happens, where you also do a random split on the twos and then recombine after that. 01:23:40.000 --> 01:23:42.000 So it's always going to be the same process of randomly splitting the individual possible categories, 01:23:42.000 --> 01:23:53.000 then recombining them. 01:23:53.000 --> 01:23:54.000 Yeah, so you can use it, and people do use it, for regression as well. 01:23:54.000 --> 01:24:11.000 So let's say you have a categorical variable that you want to use in regression, and you want to ensure that there's equal representation of the categories in your train test splits or cross-validation; you could do a stratified train test 01:24:11.000 --> 01:24:24.000 split there as well. Another thing I've seen people do, at least in textbooks, is, maybe you have sort of a weird distribution where you have a few really high-value observations. 01:24:24.000 --> 01:24:35.000 You can bin the y data and then do the stratified split on the bins, and then unbin the data when you're actually doing the predictions and so on. 01:24:35.000 --> 01:24:41.000 So you can do it for that as well. 01:24:41.000 --> 01:24:49.000 Any other questions? 01:24:49.000 --> 01:24:52.000 Okay, so, just like there was a TimeSeriesSplit for time 01:24:52.000 --> 01:24:55.000 series cross-validation, there is a StratifiedKFold object for stratified cross-validation. 01:24:55.000 --> 01:25:04.000 So from sklearn, 01:25:04.000 --> 01:25:08.000 this is also stored in model_selection, 01:25:08.000 --> 01:25:15.000 we're going to import StratifiedKFold. 01:25:15.000 --> 01:25:29.000 And then, just like before, you specify the splitter object, so StratifiedKFold, and we're going to input 5, and then shuffle equals 01:25:29.000 --> 01:25:38.000 True, random state equals, I don't know, something. Okay. 01:25:38.000 --> 01:25:40.000 And now you might be wondering, wait a minute, 01:25:40.000 --> 01:25:45.000 where do you specify the stratified part? That comes when you call split. 01:25:45.000 --> 01:25:58.000 So when I call split, unlike with regular cross-validation, where for a data frame I could input 01:25:58.000 --> 01:26:05.000 the data frame without an X and a y, for stratified cross-validation to work 01:26:05.000 --> 01:26:14.000 you need both the X and the y, or your features and then whatever column you're using to stratify. 01:26:14.000 --> 01:26:24.000 So here it's beer_type. But, I think it was Joe's question earlier, if you had more than one, you'd probably need to specify that here. 01:26:24.000 --> 01:26:30.000 So we're going to run through this, and then, oh no, 01:26:30.000 --> 01:26:36.000 what happened? Oh, by randomly pounding my keyboard, I made a random state number 01:26:36.000 --> 01:26:39.000 that's too big. So let's fix that. There we go. 01:26:39.000 --> 01:26:59.000 Okay. So now we can look at the folds.
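Here's a minimal sketch of that StratifiedKFold setup, reusing the same made-up beer DataFrame as the earlier sketch; the names and seed are illustrative.

    import pandas as pd
    from sklearn.model_selection import StratifiedKFold

    # Same made-up stand-in for the beer data as before.
    beer = pd.DataFrame({
        "ibu": range(100),  # hypothetical feature column
        "beer_type": ["IPA"] * 56 + ["Stout"] * 44,
    })

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=222)

    # split() needs both the features and the labels you're stratifying on.
    for train_idx, holdout_idx in skf.split(beer, beer["beer_type"]):
        train_props = beer.iloc[train_idx]["beer_type"].value_counts(normalize=True)
        holdout_props = beer.iloc[holdout_idx]["beer_type"].value_counts(normalize=True)
        # Each fold keeps roughly the same 56/44 class balance.
        print(train_props.round(2).to_dict(), holdout_props.round(2).to_dict())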
As you go through, again, it's not going to be perfectly the same for each one, but it'll be roughly the same across all of our splits, and this is just showing you that the training split is roughly the same as the holdout set split. 01:26:59.000 --> 01:27:09.000 Alright, are there any questions about StratifiedKFold? 01:27:09.000 --> 01:27:10.000 Awesome. Alright, so I will stick around if you guys have any questions. 01:27:10.000 --> 01:27:22.000 Tomorrow, the next lecture will continue on with our classification material, and we'll actually learn some models and then some different metrics. 01:27:22.000 --> 01:27:28.000 In problem session number 7, you'll finish up the time series content. 01:27:28.000 --> 01:27:30.000 And yeah, then we're just moving right along. So I will stop recording, 01:27:30.000 --> 01:27:35.000 And then I will