Okay, welcome back. This is lecture number 7, so this is our second day on time series, and that's what we're going to be wrapping up today. Last time we left off we were working on learning about averaging and smoothing, so that's where we're going to pick up today. Let me get my chat set up for questions.

Okay, so what we had gotten through was talking through moving average forecasts. We applied them to this Google training set, and along the way we learned things like .rolling, which takes a rolling average of a set window size. Then we applied these models to that data set. We also mentioned how you can use rolling averages to get a sense of features of the data set, like whether or not it has a trend. We talked about more general weighted average forecasts: the earlier moving average has equal weight on all the previous observations, whereas in general you can have weighted averages where you have different weights on each of the previous observations, and this was an example of that. And then finally, we introduced exponential smoothing, with just the first one, which was simple exponential smoothing.

Because for the rest of this notebook we're going to build upon simple exponential smoothing, I think it'd be a good idea to review what simple exponential smoothing is. We have a time series y sub t, and then for any observation that's in the training set, the forecast, the fitted value on that time series, is alpha times y sub t, so alpha times the current value, plus one minus alpha times the fitted or forecasted value at that time step. So this would be like one step into the future. If what you're trying to forecast is outside of the training set, you would do alpha times the last observation, plus one minus alpha times the fitted value for the last observation. Alpha here is a hyperparameter, from 0 to 1 inclusive, that you can either choose by hand or fit with some sort of algorithm. If you choose an algorithm, it could be the one that's implemented by statsmodels, or you could use a cross-validation process to find the one that has the best average MSE.

Okay, so we talked about how this is sort of like a slight adjustment on the naive forecast, where you're taking the current fitted value and then adding alpha times a random draw, where the random draw is being approximated by the error term. We were also thinking of it as a weighted average that we can more easily optimize than trying to find each individual weight, by setting it up as alpha times this geometric sum.
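(For reference, here is a minimal sketch of the statsmodels workflow that gets walked through next. The toy series `goog_train`, the choice alpha = 0.6, the explicit initialization method, and the 5-step forecast horizon are illustrative assumptions, not the notebook's actual objects or values.)

```python
# A minimal sketch of simple exponential smoothing with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import SimpleExpSmoothing

# Toy stand-in for a training series like the Google closing prices.
rng = np.random.default_rng(0)
goog_train = pd.Series(100 + np.cumsum(rng.normal(size=200)))

# Recursion being fit: y_hat_{t+1} = alpha * y_t + (1 - alpha) * y_hat_t,
# which unrolls into the geometric weighted average
# alpha * y_t + alpha*(1 - alpha) * y_{t-1} + alpha*(1 - alpha)^2 * y_{t-2} + ...
ses_fit = SimpleExpSmoothing(
    goog_train,
    initialization_method="heuristic",  # assumption: set the initial level from the data, since we skip optimization
).fit(
    smoothing_level=0.6,  # alpha, chosen by hand
    optimized=False,      # do not let statsmodels tune alpha on the training set
)

print(ses_fit.fittedvalues.tail())  # fit on the training data
print(ses_fit.forecast(5))          # forecasts 5 time steps past the training set
```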
So, to implement simple exponential smoothing in Python, we use statsmodels. First we talked about the package and checking to make sure you have it installed, but once you have it installed, you can do simple exponential smoothing with this model object called SimpleExpSmoothing. To fit such a model, you call the model object, input the training data, which is the time series, and then you call .fit. You can specify the smoothing level, which is what we called alpha, and unless you set optimized equal to False, the algorithm will go through a fitting process to find the optimal alpha, where in this case "optimal" means optimal for the training set, not optimal for predicting. So if you're going to use this for predictive models, like we're going to do, you're going to want to set optimized equal to False and then use something like cross-validation to find your value of alpha.

This produces a fitted model object in statsmodels. Once you have a fitted model object, you can call .fittedvalues, which will give you the fit on the training data, and then .forecast, where you input an integer. Here that's the length of the test set, but we could also do just 2; for any positive integer, it will give you that many time steps into the future. So to get the forecast on the test set, we would just put in the length of the test set.

Now I want to remind everybody that, just like with the other models in these notebooks, we're not doing cross-validation or a validation set; I'm just giving you a sense of what these look like against a set they weren't trained on, and because of that we're looking at the test set. In practice, when you're doing forecasting, you do not touch the test set until the very end, so you'd want to do something like a validation set or cross-validation. This is just for the simplicity of the notebook, to be able to introduce the model and then look at it on a data set it wasn't trained on. And so here's what that model looks like for this particular choice of alpha and this particular data set.

Okay, so are there any questions on any of the stuff that we talked about last time that you want to make sure we wrap up before we move on?

Okay, so after simple exponential smoothing: this is a model that you might want to use, and it works pretty well here, I would say, in comparison to some of the baselines we looked at. But in general, simple exponential smoothing is for data without a trend and data without seasonality, so both no trend and no seasonality. There are some extensions of exponential smoothing that account for this. One extension is for data that has a trend but no seasonality, and this is called double exponential smoothing. It's called this because you just do exponential smoothing twice. So this is going to look weird, the way it's set up.
You just kind of have to do it this way, because it's a slightly complicated thing to write down. The forecast here, for any value within the training set, is this thing S sub t minus one plus b sub t minus one — and I probably should have kept my notation consistent, but this is for predicting at time t, where we're using the previous observation. We'll talk about what the heck S and b are in just a second. Then, for observations outside of the training set, you do S sub n, the S term for the last observation of the time series, plus t minus n — that's just how many time steps into the future you are from your training set — times b sub n.

So now let's talk about what S and b are. We've got S, and here again I guess I'm really inconsistent with my notation, but the value of S at time t plus one is alpha times y sub t, plus one minus alpha times (S sub t minus one plus b sub t minus one); if you keep track of it, that last piece is the predicted value at time t. We're assuming S sub one is just equal to y sub one, the first observation. This part should look very familiar: it's just simple exponential smoothing, where now the smoothed value is being broken down into the components S and b. The b part is where the "double" of double exponential smoothing comes in. You're doing exponential smoothing again, but now for the trend component: b sub t plus one is equal to beta times (S sub t minus S sub t minus one) plus one minus beta times b sub t minus one. This right here might look familiar — it's sort of like a first differencing — so this is giving you an estimate of the trend, or of the drift, the difference that you might be expecting to get from one time step to the next. You're basically doing exponential smoothing on what, in the baseline models, we called the betas for the random walks with drift. And then to get your prediction, you're taking the last observation plus your exponentially smoothed trend component, and doing exponential smoothing on that as well. So, two exponential smoothings.

Before we pause for questions, I'll show you how to do this in statsmodels; it's the exact same process. And I should also mention that beta is also a hyperparameter from 0 to 1, just like alpha. So now you have two hyperparameters that you can adjust by hand, use statsmodels to get the best version on the training set (where what they mean by "best" depends on whatever algorithm they're using),
or you can do cross-validation with a grid of alpha values and a grid of beta values to find the one that gives you the best MSE.

Okay, so the way to fit double exponential smoothing is the same; it's just that the model object is different. Here the model object is called Holt, named after one of the researchers that developed the model — if you look at this, you can maybe guess the other one is Winters. So you call Holt, you put in the training set, you call .fit, and you put in smoothing_level, which is still alpha; for this initial version I just chose it to be 0.6, no particular reason. Then smoothing_trend is the value of beta, so here I've chosen both alpha and beta to be 0.6. And again I chose optimized to be False, because I don't want statsmodels to find the one that's best on the training data — that's not what I care about with predictive models.

So now it's been fit, and we're just looking at the difference. Here, the values I've chosen are picking up more on the very recent trend of a slight downturn in the stock price, and that's why you see it going down. Another thing you might be wondering is that this doesn't look like a constant decrease. Remember, the time steps we're treating as sequential here are trading days, and because there's a gap for the weekends, that's why it looks like it's been shifted in this piecewise way. If, instead of plotting the date on the horizontal axis, we plotted trading day 1, trading day 2, trading day 3, it would just look like a straight line.

Okay, alright. So before moving on to the last exponential smoothing model we'll consider, are there any questions about double exponential smoothing?

Okay, so the last model we're going to talk about is triple exponential smoothing. So — okay, yeah, we have a question. Yahweh is asking, why is the trend decreasing? The overall trend of the original data set is increasing, and I think we have it here, right? It's increasing. But if you notice, the more recent trend is a decreasing trend, and the values of alpha and beta that we chose for this particular model are putting a heavier focus on that more recent downward trend and predicting a continued downward trend. Now, this isn't the best choice for this particular option, so we would want to again do a cross-validation to find, on average, what seem to be the best values of alpha and beta.
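(A minimal sketch of the Holt fit just described, on the same kind of toy stand-in series as before; alpha = beta = 0.6 mirrors the arbitrary values chosen above, and the explicit initialization method is my assumption so the fixed parameters can be used without optimization.)

```python
# A minimal sketch of double exponential smoothing (Holt) with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import Holt

rng = np.random.default_rng(0)
goog_train = pd.Series(100 + np.cumsum(rng.normal(0.3, 1.0, size=200)))  # toy trending series

holt_fit = Holt(
    goog_train,
    initialization_method="heuristic",  # assumption: initialize level and trend from the data
).fit(
    smoothing_level=0.6,   # alpha, smoothing of the level S_t
    smoothing_trend=0.6,   # beta, smoothing of the trend term b_t
    optimized=False,       # keep the hand-picked alpha and beta
)

print(holt_fit.fittedvalues.tail())
print(holt_fit.forecast(5))  # out-of-sample forecasts follow S_n + h * b_n
```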
Tian and Wong is saying: obviously, we've said that these methods don't predict the stock market well — what kinds of scenarios might this be good for in practice? So, this model may or may not do well on predicting various stock market data. Just because I chose values of alpha and beta that didn't do well on this one test set doesn't mean that you couldn't get a good model with exponential smoothing. It's just a tool in your tool belt, and, like with regression, you'll try lots of different models on the time series that you're working on and then find which one tends to do best overall with something like cross-validation. So this could work well; it's just that I haven't gone through the process of trying to find the best values of alpha and beta.

Okay. So triple exponential smoothing is also sometimes known as the Holt-Winters forecast, again after the researchers that developed the model. This is three types of smoothing, and it's going to look really weird, because not only is there another type of smoothing, there are now two versions of the model — so we have two different models with three different forms of smoothing. The key thing to think of, if I ignore the particular models for a second and just give you a general breakdown, is this: you still have smoothing for just the time series, as if it didn't have a trend or seasonality; then you have the smoothing part for the trend, which is the S and b that we just saw in double exponential smoothing; and then the final portion is another smoothing, but this time for the seasonality. So this is for data sets that seem to exhibit seasonality — the third type of smoothing is for that.

The reason there are two types of the Holt-Winters model is because of the different types of seasonality. The first type of seasonality you might see is something called multiplicative seasonality. The idea behind multiplicative seasonality is that at each step in your season, your time series tends to be multiplied by a certain value. So, for instance, I think here I have an example trying to build on our infectious diseases example with the flu training set. In an infectious disease model — and I want to preface this by saying this isn't what infectious disease modelers actually use, it's just trying to give a rationale behind multiplicative seasonality — early in the season you have to consider this parameter called R naught. R naught is the number of susceptible individuals,
so people who are not immune to the disease, that an infected person is likely to infect during their lifetime, in a population where nobody else is sick. It's sort of a theoretical quantity that gets used all the time when trying to gauge how infectious a disease is. So early on in an infectious disease, you can basically think of it this way: if the time step is the lifespan of the disease, then you might expect the number of infected people to multiply by R naught from time step one to time step two. And basically, what we mean by multiplicative seasonality is that there's a multiplicative factor for each time step in the season, if that makes sense. So that's multiplicative seasonality.

All these formulas are setting up is the following. The first two are basically the same as the last ones: this one is the smoothing on just the regular time series, except your current value is being divided by the seasonal part, and then you have the trend part here; and the trend part is the exact same thing as before. Then there's this new seasonal part, and you can see it's multiplicative because, instead of adding, you have these constants being multiplied by different factors — these c sub t minus m terms are the current observation divided by some factor S sub t.

The next type of seasonality you might get is called additive seasonality. The idea behind additive seasonality is that for each time step in the cycle, some amount is added to the previous step. So, for instance, maybe you work for an ice cream shop or something like that, and from quarter one to quarter two you always tend to see a constant increase of 100 from whatever it was in quarter one. Maybe that's because quarter one is in a colder season of the year and quarter two is in a warmer season, so you always see a pretty standard bump in sales as the temperatures go up. That's called additive seasonality, where the current value of the time series is added to or subtracted from depending on where you are in the season. And so this is the triple exponential smoothing where the seasonal term is additive; that's the slight difference here, it's just accounting for the additivity.

Oh, I see we have a question here. Icons is asking — and again, I apologize if I'm saying your name incorrectly — why wouldn't we want maximum likelihood to find the best alphas and betas on the training set? Isn't that another way to find the best estimates for alpha and beta? So, we could do cross-validation, or we could try to find these estimates via maximum likelihood.
But finding the values of alpha and beta that provide the best fit on the training set isn't our goal for predictive modeling. Remember, in predictive modeling we could fit perfectly on the training set and it wouldn't do us any good if we weren't able to predict on things we haven't seen before. So with predictive modeling, we wouldn't want to use something like maximum likelihood to find the best values of alpha and beta on the training set and use those; we would want to use something like cross-validation to find the values of alpha and beta — and in this instance the third one, gamma, as well — that provide the lowest MSE, or the lowest mean absolute error, across the cross-validation. Now, and I think maybe this is what you're alluding to with your question, you could do something where your initial guess and your grid for alpha, beta, and gamma are determined by maximum likelihood, so by whatever statsmodels is doing in the background, and then you use that to set up a grid around those values. It may not necessarily be the case that that gives you the overall best values of alpha, beta, and gamma, but it is an approach you could take: use statsmodels to get the alpha, beta, and gamma that best fit the training data, and then — well, it might be more complicated to code up, but you could try to do that, if that makes sense.

Thank you. Alright, so how do we do this in Python? Now, instead of SimpleExpSmoothing or Holt, we just do ExponentialSmoothing. That's the name of the model object for the third one, and I think in general you can use ExponentialSmoothing to do either of the prior models as well, but I would have to check the documentation more closely. We're going to use this flu data set that we've looked at before, and then, just like before, I'm going to set aside the last year as my test set.

The way this works is exactly the same as before: we're going to call this, put in the training data, and then call .fit and input the parameters. So we're going to do ExponentialSmoothing and then flu_train.cases. Okay, so now that I'm reading my comments, there are two new things here that we have not had before. The first new thing is that we have to tell it what type of seasonality we want; for this example, we're going to use multiplicative seasonality, so seasonal is equal to 'mul'. And then you also have to set the number of periods — I guess not the number of periods, the number of time steps within a period. They call it seasonal_periods, so we would do seasonal_periods,
and for this data it's 52, because our data is weekly and it's a yearly period, or season — maybe I'm getting "period" wrong, I apologize if I am. Then we're going to call fit, and the three things we need to put in are our alpha, our beta, and our gamma. Again, I'm not going to go through a cross-validation, so I'm just going to pick values. For alpha, it's again called smoothing_level, and let's just go with 0.6 again; then for beta it's smoothing_trend, and this is also 0.6; and then finally we're doing smoothing_seasonal, which is our gamma, and this is 0.6. And then finally we're doing optimized equals False, because we don't care about the fit on the training set; we're just giving an example of fitting the model.

Okay, so now I have my fitted model. The blue line is the true training data, the green solid line is the fit of the Holt-Winters model, the red dotted line is the prediction, and the test data is the red solid line. And maybe to show off — sorry, there's a fly — to show off different values: we could set this one to maybe 0.1; there is no trend, so we could set that one to 0; and smoothing_level we'll keep at 0.6. And it basically looks like the same model, so there wasn't really much of a difference, but in general, changing these will change the model. Maybe if we change this to something like 0.9 we'd see a difference. Barely. Okay, but that's how fitting these models works, and that's exponential smoothing, which is just a class of time series forecasts that you can use in your time series tool chest.
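(Putting the pieces just walked through together: a minimal sketch of the Holt-Winters fit, with a toy weekly series standing in for flu_train.cases. The trend="add" argument and the explicit initialization method are my assumptions — the lecture only specifies the seasonal settings — and alpha = beta = gamma = 0.6 mirrors the arbitrary values chosen above.)

```python
# A minimal sketch of triple exponential smoothing (Holt-Winters) with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import ExponentialSmoothing

rng = np.random.default_rng(0)
weeks = np.arange(6 * 52)
# Positive, seasonal toy series (multiplicative seasonality needs positive values).
flu_train = pd.Series(60 + 40 * np.sin(2 * np.pi * weeks / 52) ** 2 + rng.normal(0, 3, weeks.size))

hw_fit = ExponentialSmoothing(
    flu_train,
    trend="add",                        # assumption: include an additive trend component
    seasonal="mul",                     # multiplicative seasonality, as above
    seasonal_periods=52,                # 52 weekly observations per yearly season
    initialization_method="heuristic",  # assumption: initialize the components from the data
).fit(
    smoothing_level=0.6,     # alpha
    smoothing_trend=0.6,     # beta
    smoothing_seasonal=0.6,  # gamma
    optimized=False,         # keep the hand-picked values
)

print(hw_fit.forecast(52).head())  # forecast one 52-week season ahead
```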
Are there any questions about exponential smoothing before we move on?

Yeah. Just to clarify: compared to the weighted smoothing, would you say we can think of exponential smoothing as an extension of a weighted smoothing approach?

Yeah, so simple exponential smoothing is a weighted average model where the weights are given by this alpha times one minus alpha to a power. So it is a weighted average, one that considers all of the previous observations, and then you can optimize it by finding the value of alpha that works best, through cross-validation or something like that. So that one is a weighted average; but because of the addition of the trend and seasonality components, the other two slightly get away from the weighted-average comparison.

Because I guess the part I was confused about is: where's the — I would expect to see some exponential somewhere. Is it just in the Taylor expansion that the exponential comes in?

Oh, so it isn't in the way the formulas are set up directly. You have to remember that this hat term contains alpha — it contains all the previous ones — so if you expand it all the way out, you'll get something like this. Does that make sense?

Okay. Alright!

Yeah, I think so. I've got to think more about this, but thanks.

Yup! Yahweh is asking whether cross-validation would help. It probably would help us get better models if we went through it; at the very least, we would be getting the values of alpha, beta, and gamma that have the lowest mean squared error. Whether or not that would give us a good fit for this particular problem is hard to tell. I think I said this yesterday, but it's worth repeating: 1947 is a sort of weird flu season compared to all the previous flu seasons. Other seasons start to take off a little bit in December, whereas 1947 doesn't really seem to start until probably February, or maybe March. So 1947 is not that close to the typical year, although there are some years in the past, like 1936, that are somewhat similar with a lower peak. Flu seasons are kind of difficult for these sorts of time series models, because there are a lot of different factors determining the shape of a flu season beyond just what the cases were like last week, if that makes sense.

Okay. So, we're still in the time series section. We just finished notebook number 5, and we're going to move on to notebook number 6, if you're trying to play along at home. We're going to take a brief aside from forecasts and models to dive a little bit deeper into the theory of time series, just because we need some of these terms to build our next forecast types. We're going to learn about something called stationarity and something called autocorrelation. If you did the problem session today, this term should look a little bit familiar, whereas this other term maybe isn't as familiar.

Stationarity is a statistical property of a time series; sometimes you want to assume that your time series is stationary before you use a particular forecast type. In general, we would say that a time series y sub t is strictly stationary
if the joint probability distribution of the sequence is the same as the sequence where the time points have been shifted by tau. Basically, what this is saying is that the joint distribution only depends upon the intervals between the observations — like the time between t1 and t2, and t3, and so forth. In particular, if the number of observations you have is just one, the expected value of y sub t is always mu, regardless of where in the time series you're looking, and the variance of y sub t is always equal to sigma squared for all observations y sub t. And if n were two, the joint distribution of y sub t1 and y sub t2 only depends upon the time distance between t2 and t1 (here I'm assuming t2 is later than t1). This is known as the lag between t2 and t1, and again, if you did the problem session, this term, lag, is familiar. We'll look a little bit more at lag later in the notebook.

Strict stationarity is usually pretty restrictive, and there are very few time series that are strictly stationary, so it's more useful to think of stationarity in a weaker sense. When we say a time series exhibits stationarity, or is stationary, that means that the expected value for any observation in the time series is equal to mu, and the covariance between any two time steps is just a function of the lag. Here the time steps are t and t plus tau, and the lag is just the tau. It might look weird to see it like this; you could think of it as: the covariance between y 2 and y 3 is some function of 1, the covariance between y 1 and y 3 is some function of 2, and when you have actual numbers in here, you would just have to calculate.

Some examples of stationary time series are white noise, which is basically just drawing from a random distribution; the first differences of a random walk, so if you have a random walk and you look at the next step minus the current step, those are the first differences; and a moving average process, which we looked at in the last notebook, is also stationary.

So basically, why are we talking about this? It's just that the next forecasting approach is going to need our time series data to be stationary before we can apply the forecast, and so I wanted to give an introduction to what that concept means here. Remember, stationary means this concept. That doesn't mean we're going to formally go through and check whether these things hold.
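(Written out, the weak stationarity conditions just described look roughly like this, in the notation from above.)

```latex
% Weak stationarity of a time series y_t: constant mean,
% and autocovariance that depends only on the lag tau.
\begin{aligned}
\mathbb{E}[y_t] &= \mu \quad \text{for all } t,\\
\operatorname{Cov}(y_t,\, y_{t+\tau}) &= \gamma(\tau) \quad \text{a function of the lag } \tau \text{ only}.
\end{aligned}
```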
But we are going to get a sense for the times when a time series is very clearly violating stationarity, and then we might want to make an adjustment. So how can we gauge whether or not a time series is violating it? We're going to look at something called the autocorrelation. The autocorrelation is just the correlation of a time series with its future observations placed at different lags. Think of having the time series in one column, and then in a different column you have the time series shifted by h values, or whatever, k values. If you were to calculate the correlation between the original time series and the lagged time series, that's the autocorrelation. This is the formula for that at a given lag, little k — I'm not going to say it out loud — but it's just the correlation formula, where instead of x and y it's y and the lagged version of y.

One thing you can do to check whether the stationarity assumption is reasonable is to look at what's known as a correlogram. I think you did this today, near the end of the problem session, if you did the problem session. This is called the autocorrelation plot, or correlogram. Basically, you produce these autocorrelations at different lags and then plot them: the autocorrelation on the vertical axis and the lag on the horizontal axis.

statsmodels has a function for doing this. We're going to import statsmodels as sm, and then we're going to generate a time series — my time series here is just random noise — and then we're going to go ahead and plot it. To make the plot, you do sm.graphics.tsa, and then the function is plot_acf. With that, you're going to put in the time series, which for us is called series; the number of lags you want plotted out to, and according to my comment, for this example I just want lags to be 30; and we're going to set alpha equal to None — I'll talk about that after we make the plot. I was reading my comment to see what I wanted; okay, I forgot that this is the last thing I need: ax equals ax. We'll talk about that too.

Okay, so here we go. This is the autocorrelation as a function of the lag. You're always going to have a line that goes straight up to one at lag 0, because everything has a correlation of one with itself.
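(A minimal sketch of the call just built up, on a white-noise series; the series name, the figure size, and the choice of lags = 30 with alpha = None mirror what's described above.)

```python
# A minimal sketch of a correlogram (autocorrelation plot) for white noise.
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
series = rng.normal(size=500)  # white noise: stationary by construction

fig, ax = plt.subplots(figsize=(10, 4))
sm.graphics.tsa.plot_acf(series, lags=30, alpha=None, ax=ax)  # alpha=None hides the confidence bands
plt.show()
```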
But then, if it's white noise, which is stationary, you'll want to see these autocorrelations that are pretty low, go between being negative and positive, and don't really show a clear pattern. That's what you're looking for when you're trying to judge if a time series seems stationary — sorry, I heard a noise in my apartment and wanted to make sure everything was okay — you're looking for this sort of thing, where you have relatively low autocorrelations and it kind of just bounces around between positive and negative with no clear pattern.

So let me go ahead and talk about the two things I said I was ignoring. The first argument is just saying whether or not you want confidence intervals to be plotted. For instance, if I get rid of this, you'll see the difference: there are these bars. I didn't want to talk about what the confidence intervals are — I'll leave it to you if you want to learn more about that — so in general I'm just going to turn them off. The other part is this ax equals ax. Here I make a matplotlib figure object and then a subplot, or an axis object, ax, and I just wanted to tell the function: hey, plot this thing here, so that I could control the size of it. That's what these two arguments are.

Okay. So before we go through more examples where it's obvious that stationarity is being violated, are there any questions about autocorrelation or this function that we just went over, while I take a drink of water?

Okay.

Can I get a physical intuition for stationarity?

Hmm! I don't know that I personally have an intuition for what you would see in a real-world time series, if that makes sense. I don't personally have a good intuition for a real-world example of "here is stationarity and what it looks like."

Yeah, sorry.

Could it be, for example, if we have stock prices that are kind of constant over time — the prices over time, the mean isn't changing drastically — could you say that's an example of a stationary time series?

So, I guess — I think in the real world stock prices probably aren't stationary, but in this particular hypothetical example, where the company seems to just always be at the same price for all time, that probably would be stationary, right? Because if it's hovering around the same price, then the expected value would probably be constant, and then the difference — sorry,
I guess what we're looking at here is the covariance between two time steps, which would probably either be a constant or some function of the lag, so that probably would work. But I don't know that there are many stock prices that are like that.

Thanks.

Yup. And then there's a question asking: does autocorrelation have the same information as finding the Fourier components? So you can do Fourier transform stuff to get information on, like, seasonality. I don't know — I would have to look up whether you can use it to tell whether the stationarity assumption is being violated. I think looking at the autocorrelation function is probably easier, because you just have to calculate some correlations and then look at the plot; it's probably easier computationally than doing the Fourier transform stuff. But if anyone is interested in learning about relationships between time series and how you can use these things called Fourier transforms, check out the practice problems notebook for time series — I go over how you can use them there.

So we're now going to go over two examples of instances where you have a data set that is violating stationarity. Two ways to violate stationarity are to have data with a trend, because there you're going to be violating the expected value being a constant, independent of time, and to have data with seasonality.

For the trend one, we'll just look at the Google stock data. And I guess I left this one for myself to do again: so, closing price, lags equals, let's say, I don't know, 100; alpha equals None; ax equals ax. Okay. So this Google stock data had a very positive trend, and you'll see that over time it's a very high positive correlation that starts to decrease as you go down. So time series with trends are not stationary, and if you see something like this, it indicates that your time series isn't stationary. We will see how you can take this and produce a stationary time series in just a little bit.

Another example is seasonal time series, so here we're going to look at that flu example. flu.cases was my time series; then we want, let's say, 52 — so let's do 154, maybe; I guess it should be 156 — and then alpha equals None, and then ax equals ax.
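(The two correlograms just described, sketched on toy stand-ins; in the notebook, the same calls are made on goog.closing_price with lags=100 and on flu.cases with lags around 156.)

```python
# Correlograms for a trending series and a seasonal series (toy stand-ins).
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
trending = 100 + np.cumsum(rng.normal(0.5, 1.0, size=500))                  # clear upward trend
t = np.arange(520)
seasonal = 50 + 40 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 3, t.size)  # period of 52

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
sm.graphics.tsa.plot_acf(trending, lags=100, alpha=None, ax=ax1)  # stays high, decays slowly: trend
sm.graphics.tsa.plot_acf(seasonal, lags=156, alpha=None, ax=ax2)  # oscillates with period ~52: seasonality
plt.tight_layout()
plt.show()
```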
Okay, so with seasonal data, you'll tend to see this kind of pattern where you go between periods of negative correlation and positive correlation, back and forth, and this sort of behavior is an indicator of the seasonality. You can use it to try to get a sense of how long the seasons are, based on gauging how long the curves are. But, going back to the earlier question, there are other ways to get a sense of what the seasonality is by using the Fourier transforms; this is just giving you an indicator that this is maybe seasonal data and that it's not stationary. So for the models that we're going to learn next, you're going to need to do some adjustments for it, and if you'd like to get a guess at what the seasonality is, there are other techniques that you can appeal to, which you can learn in the practice problems.

Yahoo is asking if stationarity implies predictability in some sense. I guess it's predictable in the sense that — it's as predictable as the variance of the distribution it's being pulled from, if that makes sense. So if you have a really high variance, you maybe know that on average it's always going to be one value, but any given draw may be vastly different from that value.

Pedro is asking: how would you translate from lag periodicity to actual data periodicity? Looking at the lag, you can kind of get a sense of how long the periods are. With seasonal data, you expect to see these autocorrelations that go negative, negative, negative, up to positive, positive, positive, and back down, sort of like a sine curve. So you can try to count out the curve — where, again, I would look at the practice problems to get a sense of maybe a better way to check it with the Fourier transforms. Like here, you can kind of see it's positive, goes down to negative, and back up to positive around 52. That would indicate — I only know it's 52 because of the way the data is, but it's somewhere between 40 and 60, so you might try different values there — and then it goes back down to negative, back to positive, and so forth.

So Whileilead is asking: what would the autocorrelation look like for a stationary series? It would look something like this, where the autocorrelations aren't too big, aren't too small, and they sort of just randomly ping back and forth between positive and negative.
The covariance is just a function of the lag, meaning that the correlation should also be — what was that, right? Yeah, so it should just kind of look like this, and I might need to sit down with a piece of paper if I wanted to make some other kind of statement beyond that.

So when you have these sorts of things — there are formal statistical tests, and I've linked to them here, they're Wikipedia links — there are formal statistical tests of whether or not a time series is stationary. I think most of the time you don't do this unless maybe you're trying to publish in an academic journal or something and you need to justify your assumption. In industry, you're probably not going to run these sorts of tests to fail to reject the null hypothesis that it is stationary. In general, you'll probably just make these sorts of plots, get a sense from that, and then see whether or not you have to apply what's known as differencing.

Oh, I see — Assad is asking: rather than small-magnitude periodicity in the correlation plot, as seen in that plot, I thought the seasonality was indicated by the autocorrelation spiking in a periodic way. Yeah, so the seasonality is indicated here because it's going back and forth in this almost sine-wave type of way; that's what indicates that there's seasonality in the data. It doesn't have to do with the fact that they're small magnitude here. Sometimes — like in the problem session, where the seasonality was really strong — you see really big spikes going from close to one to almost negative one, back and forth, as in that example from the problem session. The seasonality is just indicated by the fact that you're seeing a periodic pattern.

Okay. So when you have a time series that appears to break the stationarity assumption — or, since currently we don't have that assumption, that doesn't appear to be stationary — you can use something called differencing to create a time series that is stationary. When you have one with a trend, you'll do something called first or second differencing. There are extensions for seasonal data that I've left, again, to the practice problems for the sake of time, so we're going to focus on just data with a trend; if you're interested in seeing seasonal differencing, you can check out the practice problems.

Differencing is something we have seen before, with .diff from pandas. It's just producing a new time series of difference values, which I'm going to denote with this sort of upside-down triangle.
So basically, you have to start at the second observation and then move forward: you're going to do y 2 minus y 1, then the next observation is y 3 minus y 2, then the next observation is y 4 minus y 3, and you keep going — eventually it would be y t minus y t minus one. These are the first differences of the time series. So now you have a new time series whose indexing starts at 2, because there's nothing occurring before observation one. These are sort of analogous to a first derivative of a function. We saw this in earlier notebooks with pandas' .diff.

Going back to that Google example, we'll do goog.closing_price.diff(), and then from the first entry forward, and then we're going to do — I think we did 30, right? was that right, 30? — we're going to do a hundred, so lags equals 100, alpha equals None, and then, what was it, ax equals ax. Okay. And so now you can see that, through first differencing, we have produced a new time series that appears not to break this idea that it's stationary. We don't know for sure that it's stationary, but the autocorrelation plot doesn't appear to blatantly break that assumption.

Sometimes you'll do this and you'll still get one that is very clearly breaking the assumption, so you'll have to do differencing again. After first differencing comes second differencing, and that's just taking the differenced time series — this upside-down triangle y 2, upside-down triangle y 3, upside-down triangle y 4 — and performing first differencing on that. So for the entries of the second-differenced time series, you take the first-differenced time series and then subtract off the previous observation of that. If you wanted to put this in terms of the original time series, it would be y t minus 2 times y t minus one, plus y t minus 2. But in pandas we can just — so, not the training set, we're just looking at the regular time series here — do goog.closing_price.diff(), which was first differencing, and then second differencing would be goog.closing_price.diff().diff(). Okay. And so now you can see we're taking, you know, 0.54 and subtracting off 3.97, and that's how you get negative 3.4. You could do it again if you needed to.
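(A minimal sketch of first and second differencing with pandas, plus a quick correlogram of the first-differenced series; `goog` here is a toy DataFrame standing in for the notebook's Google stock data.)

```python
# First and second differencing with pandas .diff(), plus a correlogram check.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
goog = pd.DataFrame({"closing_price": 100 + np.cumsum(rng.normal(0.3, 1.0, size=500))})  # toy stand-in

first_diff = goog.closing_price.diff().dropna()          # nabla y_t = y_t - y_{t-1}
second_diff = goog.closing_price.diff().diff().dropna()  # nabla^2 y_t = y_t - 2*y_{t-1} + y_{t-2}

fig, ax = plt.subplots(figsize=(10, 4))
sm.graphics.tsa.plot_acf(first_diff, lags=100, alpha=None, ax=ax)  # should now look roughly stationary
plt.show()
```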
Sometimes you might need to do differencing three times. And you might be wondering: well, why do I need to do differencing, why do I need a time series that looks stationary? That will become apparent in the next notebook, when we learn about the ARIMA model.

Okay, are there any other questions about differencing before we move on?

Yeah. So, does the number of times you do the differencing depend on, for example, the length of the data? For example, if we have a longer range of data, and the stock market tends to fluctuate more, so we may need to look at the overall trend?

Yeah, so we could have a time series that has an increasing trend, but the rate of increase is itself increasing or decreasing. If that were the case, we would expect the first-differenced data to still appear to have a trend in the autocorrelation plot, and so then, if we wanted to turn that into a stationary time series, we would have to apply second differencing to the original data. Does that make sense?

Yes. Okay. Yeah. Cool, thanks.

Yup!

Awesome. Okay, so we just finished notebook number 6 in time series, and we're about to start notebook number 7. This is going to be the last forecasting model that we're going to learn in live lecture; there are two other notebooks if you're interested in learning additional forecast models, but I'll talk about those later.

Okay, so we've talked about some of this before. ARIMA is a really popular model type that tends to perform pretty well. ARIMA is broken down into three different components: the first is the autoregressive component, and then the part that we've already talked about is the moving average component. The AR comes from autoregressive, and the MA comes from moving average. We talked about moving averages in our previous notebook, so we're not going to touch on those here, but to help build up our knowledge before we go into the full ARIMA model, we are going to talk about autoregressive models first.

An autoregressive model is one where you regress onto previous observations. If you have a time series y sub t, the autoregressive model of order p is the one that regresses the value you'd like to predict, y sub t, onto a linear combination of the previous p observations.
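(Written out, the order-p autoregressive model just described looks roughly like this.)

```latex
% AR(p): regress y_t on its own p previous values, plus a noise term.
y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \epsilon_t
```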
00:54:33.000 --> 00:54:39.000 The alpha_i here, unlike with smoothing, 00:54:39.000 --> 00:54:46.000 are actual parameters, so you would find them with something like ordinary least squares; 00:54:46.000 --> 00:54:47.000 the alpha_i are fit, and then epsilon_t is random noise. 00:54:47.000 --> 00:54:55.000 So it's autoregressive: auto, I believe, means self, 00:54:55.000 --> 00:55:01.000 and then we have regression, so autoregressive. This is often denoted in the time series literature as an AR(p), so autoregressive of order 00:55:01.000 --> 00:55:08.000 p. So that means 00:55:08.000 --> 00:55:20.000 regressing on the p previous observations. Okay, so this is not something that you're going to have to set up on your own and then run through, say, scikit-learn's linear regression; 00:55:20.000 --> 00:55:25.000 statsmodels will do this for us. You're not going to have to prepare the data 00:55:25.000 --> 00:55:33.000 and then do a linear-regression-type thing; statsmodels has an autoregressive model. As an aside, in the case where you have just order one, 00:55:33.000 --> 00:55:44.000 this is known as a Markov process: y_t is equal to alpha times y_(t-1) plus epsilon_t. 00:55:44.000 --> 00:55:45.000 Like I said, statsmodels will handle it, 00:55:45.000 --> 00:55:53.000 so unlike with exponential smoothing, where we did simple, then double, then triple, 00:55:53.000 --> 00:55:55.000 we're not going to go through and fit an autoregressive model, then a moving average model, and so forth. 00:55:55.000 --> 00:56:03.000 The ARIMA model that we're going to fit later in the notebook 00:56:03.000 --> 00:56:08.000 can, as special cases, fit any individual model type, so it can fit an AR model, 00:56:08.000 --> 00:56:14.000 it can fit an MA model, so we're not going to cover each one individually. 00:56:14.000 --> 00:56:20.000 We're just going to eventually show you the full ARIMA model. 00:56:20.000 --> 00:56:26.000 But I thought it was important to look at this AR component before moving on. 00:56:26.000 --> 00:56:42.000 So are there questions about autoregressive models? 00:56:42.000 --> 00:56:45.000 Okay. 00:56:45.000 --> 00:56:52.000 So we have AR, and MA, which we talked about in a previous notebook, and the combination of the two is known as an ARMA model. 00:56:52.000 --> 00:56:58.000 This is sort of the real statistical basis of the ARIMA model. 00:56:58.000 --> 00:57:05.000 So we have the AR and the MA. Like I said, MA stands for moving average, and this is a moving average of order 00:57:05.000 --> 00:57:15.000 q. So AR typically uses p to denote the number of terms, and MA typically uses q to denote the number of terms. 00:57:15.000 --> 00:57:29.000 And the formal model for this is that y_t is equal to beta_0 times 00:57:29.000 --> 00:57:33.000 a random variable drawn at time t, plus beta_1 times a random variable 00:57:33.000 --> 00:57:35.000 drawn from the same distribution at time t minus 1, all the way up to beta_q 00:57:35.000 --> 00:57:41.000 times epsilon_(t-q). So here, instead of averaging 00:57:41.000 --> 00:57:50.000 previous observations, the formal statistical model is assuming that there's some underlying random process that's being drawn from at every time step.
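To make that notation concrete, here is the standard way these pieces are usually written; this is textbook notation rather than something copied from the notebook, using the same conventions as above (alphas for the AR weights, betas for the MA weights, and epsilon_t for mean-zero noise with a fixed variance):

    % AR(p): regress y_t on its own p previous values
    y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \epsilon_t

    % MA(q): y_t is a weighted sum of the current and q previous noise draws
    y_t = \beta_0 \epsilon_t + \beta_1 \epsilon_{t-1} + \cdots + \beta_q \epsilon_{t-q}

    % ARMA(p, q): both pieces together
    y_t = \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{j=0}^{q} \beta_j \epsilon_{t-j}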
00:57:50.000 --> 00:57:55.000 And then the moving average model is trying to estimate these betas. 00:57:55.000 --> 00:57:56.000 We're not going to independently talk about how this model is fit in the background, 00:57:56.000 --> 00:58:11.000 but this is the formal statistical model. So when you combine these two into an ARMA process, you have both the autoregressive component in the model, 00:58:11.000 --> 00:58:13.000 so these autoregressions happening, combined with the moving average component. So this is an ARMA(p, 00:58:13.000 --> 00:58:26.000 q). The p refers to the number of autoregressive terms, and the q refers to the number of moving average terms. 00:58:26.000 --> 00:58:46.000 So notice that, like I said earlier, if you set p equal to 0, you recover just a moving average process, and if you set q equal to 0, you recover the autoregressive process, which is why we don't have to individually show you how to fit each one of these. 00:58:46.000 --> 00:58:50.000 So Yahoo is asking, can p be more than one? Yup, p can be more than one. 00:58:50.000 --> 00:58:53.000 That was just a particular example that has a name in statistics. 00:58:53.000 --> 00:59:00.000 It's a Markov process, which has been studied extensively in the fields of probability theory and 00:59:00.000 --> 00:59:04.000 statistics, but p can be more than one. p could be 00:59:04.000 --> 00:59:10.000 3, 4, 5, etc. 00:59:10.000 --> 00:59:16.000 Okay, another question: is q the length of the window in the moving average? 00:59:16.000 --> 00:59:20.000 Yes, so q would be like earlier, where we had a window of size 3: 00:59:20.000 --> 00:59:32.000 q is just the number of terms that you're including in the window. 00:59:32.000 --> 00:59:39.000 Okay, so ARMA models have an explicit assumption that you have a stationary time series. 00:59:39.000 --> 00:59:42.000 This is why we talked about that in the last notebook. 00:59:42.000 --> 00:59:45.000 So if you have a time series that is not stationary, you're not going to have a good fit or forecast in the long run. 00:59:45.000 --> 00:59:54.000 And this brings us to the final component of the ARIMA model, which is the I. 00:59:54.000 --> 00:59:59.000 The I stands for integrated, 00:59:59.000 --> 01:00:02.000 and in practice it just refers to differencing. So the AR is the autoregressive part, 01:00:02.000 --> 01:00:09.000 the MA is the moving average, and the I stands for the differencing that happens. So remember, and I guess I closed it, but let's pull it back up: 01:00:09.000 --> 01:00:35.000 when we took the Google stock data and performed first differencing on it, we went from a data set that was very clearly not stationary to producing a time series that at least appears not to break the stationarity assumption. So that's the idea here: 01:00:35.000 --> 01:00:44.000 if we have a time series that is not stationary, we need to make it stationary, or at least produce a stationary-looking one. 01:00:44.000 --> 01:00:45.000 We're going to perform differencing and then fit the ARMA model on the differenced 01:00:45.000 --> 01:00:54.000 time series. So that's the idea here. This is known as an ARIMA(p, d, q). 01:00:54.000 --> 01:01:07.000 So p, again, is that AR part, how many autoregressive terms, and d is the number of times that differencing happens,
01:01:07.000 --> 01:01:34.000 and q is the number of terms in the moving average. So before we go ahead and show you how to do this in Python, are there any questions, conceptually, about the model? 01:01:34.000 --> 01:01:36.000 Okay, so there is an ARIMA model in statsmodels' time series module, 01:01:36.000 --> 01:01:45.000 but for the sake of you learning how to do other things later, in either the problem session or the practice problems, we're going to import the model called SARIMAX. 01:01:45.000 --> 01:01:51.000 The S there stands for seasonal; the version of ARIMA that we're learning is for data 01:01:51.000 --> 01:02:01.000 without seasonality, but there is an extension for seasonal data as well, which I think is touched upon either in the problem session or in the practice problems. 01:02:01.000 --> 01:02:11.000 So we're going to go ahead and say from statsmodels.tsa, 01:02:11.000 --> 01:02:21.000 and then I think I need .api, we're going to import SARIMAX. 01:02:21.000 --> 01:02:25.000 We're going to use this Google training data. 01:02:25.000 --> 01:02:29.000 So to make this model we call SARIMAX, 01:02:29.000 --> 01:02:38.000 and I put in the training data, so goog_train.closing_price. 01:02:38.000 --> 01:02:46.000 Then I give the order. This (p, d, q) triple is known as the order of the model. 01:02:46.000 --> 01:02:53.000 So I say order is equal to, and for this, again, you could tune it. 01:02:53.000 --> 01:03:04.000 I believe in the next problem session you'll implement cross-validation here, where you do a grid and then choose different values for p 01:03:04.000 --> 01:03:15.000 and for q. Here I set d equal to one because of what we discovered earlier, where if I do first differencing I get a time series that appears to be stationary. 01:03:15.000 --> 01:03:26.000 And I just chose 5 and 5, because again, if I wanted to find the best p and the best q, I would use cross-validation. 01:03:26.000 --> 01:03:33.000 Then I call .fit(), and here I'm going to increase the number of iterations to make sure the algorithm has enough iterations to fit, and I forget what the exact argument is, 01:03:33.000 --> 01:03:49.000 so I'm going to go to the documentation real quick. 01:03:49.000 --> 01:03:54.000 I'm just looking for what they called the max-iterations argument, and maybe it's not here. 01:03:54.000 --> 01:04:00.000 So I'm just going to try calling it maxiter and see if that breaks anything. 01:04:00.000 --> 01:04:06.000 So let's see. 01:04:06.000 --> 01:04:12.000 Okay, yup, so it's running. So it did not converge. 01:04:12.000 --> 01:04:18.000 Let's do maybe 10,000. 01:04:18.000 --> 01:04:24.000 So this is something you'll sometimes see. In scikit-learn, when a model fits, it just fits, and we don't see anything printed. 01:04:24.000 --> 01:04:32.000 Other packages tend to print things out. I think you can set it so that this goes to a log file, if you know how to set up a log file, 01:04:32.000 --> 01:04:37.000 but packages like statsmodels will print out the fitting process, 01:04:37.000 --> 01:04:41.000 so you can always come through and check, if you know what it's doing. 01:04:41.000 --> 01:04:50.000 Okay. So I'm going to go ahead and cheat, to make sure I actually called this thing what it was supposed to be.
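Here's a minimal sketch of the kind of call being set up in this cell, with a simulated series standing in for goog_train.closing_price; treat the data and variable names as illustrative placeholders rather than the notebook's actual objects.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import SARIMAX

    # Illustrative stand-in for goog_train.closing_price: a random walk with drift.
    rng = np.random.default_rng(1)
    y_train = pd.Series(100 + np.cumsum(rng.normal(0.5, 2.0, size=400)))

    # order=(p, d, q): 5 AR terms, one round of differencing, 5 MA terms.
    model = SARIMAX(y_train, order=(5, 1, 5))

    # maxiter raises the optimizer's iteration cap so the fit has room to converge;
    # disp=False silences the convergence printout if you don't want to see it.
    fit = model.fit(maxiter=10_000, disp=False)

    fitted_values = fit.fittedvalues   # in-sample fit on the training data
    forecast = fit.forecast(10)        # forecast 10 steps past the training data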
01:04:50.000 --> 01:04:55.000 Wrong one. Here we go. So I just want to double-check that I have the right name. Okay, 01:04:55.000 --> 01:05:03.000 so there's no underscore. So that's why. 01:05:03.000 --> 01:05:10.000 Okay. 01:05:10.000 --> 01:05:11.000 So we just have to wait for it to fit the model. 01:05:11.000 --> 01:05:17.000 This is finding the values for the betas and the alphas 01:05:17.000 --> 01:05:23.000 from the AR and MA processes. So while this, there we go, it looks like it's spinning, 01:05:23.000 --> 01:05:30.000 it is fitting, and then we can go ahead and plot the fit. 01:05:30.000 --> 01:05:34.000 Okay, so this is what the fit looks like with those values of p and q. 01:05:34.000 --> 01:05:35.000 And then we could go through and see how it changes. 01:05:35.000 --> 01:05:46.000 So if, instead of 5, 1, 5, we did maybe 2, 1, 2, 01:05:46.000 --> 01:05:51.000 we could see how the fit changes. Okay. 01:05:51.000 --> 01:05:56.000 Alright. So in the problem session we'll talk a little bit more, and you'll get practice with choosing a p and a q through cross-validation. 01:05:56.000 --> 01:06:06.000 But if you also want to see how you can extend the ARIMA model for seasonal data, the theoretical points of it, you can check out the practice problems 01:06:06.000 --> 01:06:34.000 notebook as well. So that's it for the ARIMA material, but before we close out this notebook and move on: 01:06:34.000 --> 01:06:35.000 there's a question asking, can you explain what p and q are again? 01:06:35.000 --> 01:06:44.000 Oh, I sure can. So whenever we specify an ARIMA model, we have to specify an order, 01:06:44.000 --> 01:06:51.000 which is this triple (p, d, q). The p specifies the number of autoregressive terms that are included in the model. 01:06:51.000 --> 01:07:00.000 So these are the terms from regressing y_t onto past observations of itself. 01:07:00.000 --> 01:07:06.000 If p were 3, it would be y_t regressing on y_(t-1), y_(t-2), and y_(t-3). 01:07:06.000 --> 01:07:16.000 The q then specifies the number of moving average terms. 01:07:16.000 --> 01:07:22.000 So if q were 2, you'd see an epsilon_t, 01:07:22.000 --> 01:07:28.000 and then an epsilon_(t-1), or rather, the beta_0 epsilon_t term, 01:07:28.000 --> 01:07:32.000 I think, is always there, so it would be beta_1 epsilon_(t-1) 01:07:32.000 --> 01:07:33.000 and beta_2 epsilon_(t-2) on top of that. 01:07:33.000 --> 01:07:40.000 So p and q just determine the number of autoregressive terms, 01:07:40.000 --> 01:07:44.000 that's the p, and then the q determines the number of moving average terms, 01:07:44.000 --> 01:07:50.000 that's the q. 01:07:50.000 --> 01:07:58.000 And then the d is the number of times you do differencing. 01:07:58.000 --> 01:08:04.000 Melanie is asking, how does the sum of beta-epsilon terms represent a moving average in this model? 01:08:04.000 --> 01:08:07.000 So this is the assumed statistical model of a moving average process. 01:08:07.000 --> 01:08:26.000 In a moving average process, you assume that you have a sequence of identically distributed random variables with mean 0 and a set variance that you're drawing from at each time step. 01:08:26.000 --> 01:08:32.000 And so you have this.
Now, in the previous notebook we just took averages of the previous values. 01:08:32.000 --> 01:08:43.000 So that's how our moving averages worked: we weren't using an algorithm to find the different beta_0's and beta_1's. 01:08:43.000 --> 01:08:46.000 So it's not a one-to-one correspondence with what we called moving averages, because here there is a process to find these estimates. 01:08:46.000 --> 01:08:59.000 We just didn't go over it. In that process, I believe they treat the first observation as given and then basically fit 01:08:59.000 --> 01:09:09.000 the rest of the betas using the differences from the next observations. I have some references in the next notebook that go through the actual fitting process, 01:09:09.000 --> 01:09:13.000 if you're interested, but the moving averages 01:09:13.000 --> 01:09:19.000 we looked at are a form of moving average model. 01:09:19.000 --> 01:09:20.000 Then Madhi is asking, p and q aren't supposed to be equal, right? 01:09:20.000 --> 01:09:27.000 So there's no reason that p and q have to be different; 01:09:27.000 --> 01:09:30.000 I don't believe there's a theoretical reason for p 01:09:30.000 --> 01:09:40.000 and q not to be equal, so they can be equal if you want them to be, or if that's what cross-validation finds to be the model that generalizes best. 01:09:40.000 --> 01:09:44.000 Another question: in general, do more p, q, and d 01:09:44.000 --> 01:09:55.000 give a better result? So d should be set to the number of differencings that need to be applied in order to get a time series that appears stationary. 01:09:55.000 --> 01:10:04.000 Here we went with one, because the data we were looking at, after first differencing, didn't appear to violate stationarity. 01:10:04.000 --> 01:10:12.000 So that's why d was one. And then it's not necessarily the case that just having a higher p and a higher q will give you better performance. 01:10:12.000 --> 01:10:20.000 You'd have to do something like cross-validation or a validation set to see which ones generalize well. 01:10:20.000 --> 01:10:24.000 Ramazan is saying, I just read in the documentation of SARIMAX that we can add other features into this model if we choose to do so; does it just add linear regression terms? 01:10:24.000 --> 01:10:34.000 Yes, I believe so in general; we can double-check. 01:10:34.000 --> 01:10:38.000 So the endog argument, that's what we've been providing, 01:10:38.000 --> 01:10:48.000 that's just the time series, I believe. And then exog, which I believe stands for exogenous, is for when you want to provide other features alongside the time series. 01:10:48.000 --> 01:10:49.000 So maybe in addition to the stock price, you have other variables at each time 01:10:49.000 --> 01:10:52.000 step that can be used to, you know, help predict stock prices. 01:10:52.000 --> 01:11:07.000 You could include those here, and I believe the way it works is they just include them linearly, so you'd have, what's another Greek letter, gamma_1 x_1, 01:11:07.000 --> 01:11:15.000 gamma_2 x_2, gamma_3 x_3, and then they would just fit those coefficients; 01:11:15.000 --> 01:11:20.000 they would include it in the model that way. 01:11:20.000 --> 01:11:21.000 Yeah, I avoided those just for clarity. 01:11:21.000 --> 01:11:25.000 I avoided this because it's hard to find data sets like that to use for examples.
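Since the question came up, here's a rough sketch of how that exog argument gets used. The extra feature columns here are invented purely for illustration, and forecasting with exogenous regressors also requires future values of those features, which this toy example simply fakes.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import SARIMAX

    rng = np.random.default_rng(2)
    n = 300
    # Made-up price series plus two hypothetical extra features.
    price = pd.Series(100 + np.cumsum(rng.normal(0.2, 1.5, size=n)))
    extra_features = pd.DataFrame({
        "volume": rng.normal(1_000, 100, size=n),   # hypothetical feature
        "sentiment": rng.normal(0.0, 1.0, size=n),  # hypothetical feature
    })

    # endog is the series being forecast; exog adds linear regression terms for the features.
    fit = SARIMAX(endog=price, exog=extra_features, order=(2, 1, 2)).fit(disp=False)

    # Out-of-sample forecasts need future exog values; here we just reuse the last
    # 10 rows as a stand-in, which you would not do with real data.
    future_exog = extra_features.tail(10).reset_index(drop=True)
    print(fit.forecast(10, exog=future_exog))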
01:11:25.000 --> 01:11:27.000 And then also, I wanted to focus mainly on the time 01:11:27.000 --> 01:11:38.000 series component, because that is probably what's new to most of you. 01:11:38.000 --> 01:11:50.000 Okay, are there any other questions? These are good questions. 01:11:50.000 --> 01:11:56.000 Okay. 01:11:56.000 --> 01:11:57.000 So that's going to be it for the live lecture coverage of the time series material. 01:11:57.000 --> 01:12:06.000 If you're interested in learning more, maybe your project is on time series, or you want to work on forecasting problems, 01:12:06.000 --> 01:12:14.000 I've provided some next steps that you can go through. You can obviously work through the problem sessions and the practice problem notebooks, 01:12:14.000 --> 01:12:18.000 which expand on the content that we've covered today. 01:12:18.000 --> 01:12:34.000 Here are some theoretical books that are useful. These first two are what I drew upon to lay out some of the material we've covered, and then this last one, I think, is nice. 01:12:34.000 --> 01:12:43.000 It's written by two professors from Australia, and in addition, I believe they have videos for all of their different lectures, or maybe not. 01:12:43.000 --> 01:12:49.000 Yeah, so they're working more in 01:12:49.000 --> 01:12:52.000 R; this is for if you want to do it in R. 01:12:52.000 --> 01:13:07.000 So the coding is based on R, but the concepts carry over; they still go over things like ARIMA, and it doesn't matter 01:13:07.000 --> 01:13:11.000 whether you're in R or in Python, the models are still the same, it's just a different coding language. 01:13:11.000 --> 01:13:12.000 So I thought this is a nice book, and at the end they go into some more advanced forecasting material that we haven't covered, which may be of interest. 01:13:12.000 --> 01:13:16.000 And from the people I've talked to who work in industry, I believe ARIMA still gets used a lot. 01:13:16.000 --> 01:13:30.000 So it's not like it's been left in the dust by neural networks or whatever other types of models; it still gets used as a decent model. 01:13:30.000 --> 01:13:39.000 There are also some other Python packages. We've used the tsa subpackage of statsmodels, 01:13:39.000 --> 01:13:46.000 but there's a nice list of time series packages that a GitHub user compiled that you can find here, 01:13:46.000 --> 01:13:50.000 assuming the link is still up from the last time I checked. 01:13:50.000 --> 01:13:54.000 So you can also look through that. 01:13:54.000 --> 01:14:12.000 There are two additional time series lectures that we're not going to cover live. The first you'll want to come back to after we go over decision trees and random forests in our classification and ensemble learning content, and then number 10 you could probably go to at any time. 01:14:12.000 --> 01:14:15.000 It talks about something called the Facebook Prophet module. 01:14:15.000 --> 01:14:22.000 I've talked to some friends of mine that work as data analysts and data scientists, and some of them have said this model still gets used quite a bit in industry.
01:14:22.000 --> 01:14:26.000 It is one of the models that I think gets blamed for that Zillow disaster that happened a couple of years ago, 01:14:26.000 --> 01:14:34.000 but there are some mitigating factors, so it's not necessarily the model's fault that that disaster happened. 01:14:34.000 --> 01:14:41.000 But you do want to be careful when you use it, because it is slightly different 01:14:41.000 --> 01:14:46.000 from the models we've looked at so far. Still, it's useful to know. 01:14:46.000 --> 01:14:50.000 So I go over the theory of it, and then how to implement it, in notebook number 10. 01:14:50.000 --> 01:14:55.000 Notebooks 9 and 10 both have pre-recorded lectures, 01:14:55.000 --> 01:15:00.000 if you want to follow along with a voice and a video. Okay? 01:15:00.000 --> 01:15:13.000 So that's time series. Let me go ahead and shut these notebooks down, and we will get started on classification. 01:15:13.000 --> 01:15:19.000 Okay, so just like we did with time series, we're going to start by talking about some adjustments 01:15:19.000 --> 01:15:26.000 we need to make for classification problems. And I think there's a chance that will take us until the end of our time 01:15:26.000 --> 01:15:30.000 for today, and we'll come back for more classification material tomorrow. 01:15:30.000 --> 01:15:41.000 So, once this kernel loads. It's been taking a while, I don't know why. 01:15:41.000 --> 01:15:46.000 Okay, so we're going to look at an illustrative example first. 01:15:46.000 --> 01:15:55.000 It's a tiny, silly little example that you will probably rarely encounter in the real world, but it illustrates the idea. 01:15:55.000 --> 01:16:03.000 So we have some sample output data. In classification, we're trying to predict a categorical or binary outcome. 01:16:03.000 --> 01:16:07.000 In this example, we're going to imagine we have a binary outcome y, made up of zeros and ones, that we'd like to predict. 01:16:07.000 --> 01:16:10.000 So here's some sample output data. y has all these observations, 01:16:10.000 --> 01:16:23.000 and two of them are ones, and we're going to imagine what happens if we make a train test split the way we've been doing, 01:16:23.000 --> 01:16:26.000 just using scikit-learn's train_test_split. 01:16:26.000 --> 01:16:31.000 So here I go through and I just make five different random splits, 01:16:31.000 --> 01:16:34.000 and then I look at the train set and the test set, just to illustrate what happens. 01:16:34.000 --> 01:16:42.000 So in, let's see, four of the five splits, 01:16:42.000 --> 01:16:48.000 none of the one examples show up in the test set, which is okay for the training, right? 01:16:48.000 --> 01:16:59.000 Because we're still getting the ones in the training set. But because the test set does not have any ones in it, it's hard for us to get a sense of what our test set performance is going to be 01:16:59.000 --> 01:17:00.000 if we don't have any of the ones in the test set. 01:17:00.000 --> 01:17:04.000 And ones are often used to encode the thing we're trying to predict.
01:17:04.000 --> 01:17:25.000 So maybe the thing we're trying to predict is whether or not someone is committing credit card fraud, or whether or not a person has a certain deadly disease, so it's important to be able to gauge how good we are at predicting ones, because that's typically the 01:17:25.000 --> 01:17:36.000 thing we care about most. Here is another situation where we have a split where all of the ones ended up in the test set, and none of the ones were in the training set. 01:17:36.000 --> 01:17:39.000 If something like this happens, it's impossible for us to train a model to predict what a one looks like, because we don't have any examples to show our model. 01:17:39.000 --> 01:17:54.000 So with classification data, doing the same train test split that we've done for regression is not going to cut it; 01:17:54.000 --> 01:18:01.000 there are some issues that can arise. So what's going on here is, whenever we make a train test split, 01:18:01.000 --> 01:18:16.000 there's this underlying assumption for all predictive modeling that the data we're training on has the same underlying distribution as the data it's drawn from. 01:18:16.000 --> 01:18:17.000 So out in the world, maybe we have this sample that is split like this between zeros and ones, 01:18:17.000 --> 01:18:25.000 so roughly a sixth of the data is a one. 01:18:25.000 --> 01:18:32.000 And when we make the train test split like we've been doing so far, we could hypothetically end up with a situation where the training set is about a third ones and the test set is about a twelfth ones. 01:18:32.000 --> 01:18:36.000 Neither of these is reflective of the distribution they were originally drawn from. 01:18:36.000 --> 01:18:52.000 So what we're trying to do is make these train test splits in a way that the training set and the test set, and similarly any cross-validation folds and validation sets, 01:18:52.000 --> 01:18:59.000 are all roughly similar to the sample, which is hopefully similar to the underlying distribution. 01:18:59.000 --> 01:19:02.000 So this is what we're hoping to have with any type of data split 01:19:02.000 --> 01:19:03.000 we're making: that the distribution of zeros and ones, 01:19:03.000 --> 01:19:17.000 or, if we have a multi-class problem, the distribution of the possible values of y, is the same, or pretty close to the same, across all the different data splits we're making. 01:19:17.000 --> 01:19:21.000 So how can we do that? With what's called a stratified split. 01:19:21.000 --> 01:19:26.000 In theory, what we do is we first take our data 01:19:26.000 --> 01:19:41.000 and stratify it into the different classes. So if we have binary data, we put all the zeros over here and all the ones over here, and then, on each of the possible classes for y, we do a random train test split. 01:19:41.000 --> 01:19:42.000 So in this example we'd have a training set of zeros, a test set of zeros, a training set of ones, and a test set of ones. 01:19:42.000 --> 01:19:54.000 And then, after the splits are done on each individual class 01:19:54.000 --> 01:19:58.000 for y, we recombine them into an overall training set and test set. 01:19:58.000 --> 01:20:08.000 So all the training ones and all the training zeros get recombined, and all the test zeros and all the test ones get recombined, to make a training set and a test set. 01:20:08.000 --> 01:20:09.000 In scikit-learn you can do this by just providing a stratify argument.
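Here's a small sketch of that stratify argument in action. The DataFrame and column names mirror the beer example that comes next, but the data itself is made up, so treat the numbers as illustrative.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Made-up stand-in for the beer data: a two-class 'beer_type' column.
    beer = pd.DataFrame({
        "ibu": range(100),  # hypothetical feature column
        "beer_type": ["IPA"] * 56 + ["Stout"] * 44,
    })

    beer_train, beer_test = train_test_split(
        beer.copy(),
        shuffle=True,
        random_state=222,            # arbitrary seed
        test_size=0.2,
        stratify=beer["beer_type"],  # keep the class proportions in both splits
    )

    # Both splits keep roughly the 56/44 IPA-to-stout ratio of the full sample.
    print(beer_train["beer_type"].value_counts(normalize=True))
    print(beer_test["beer_type"].value_counts(normalize=True))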
01:20:09.000 --> 01:20:19.000 So here's that beer data set we looked at before, where we had both stouts and IPAs. 01:20:19.000 --> 01:20:27.000 The split there was roughly 56% IPA, 44% stout. 01:20:27.000 --> 01:20:33.000 So the way we do this is you call train_test_split, 01:20:33.000 --> 01:20:36.000 you input the data frame, so beer.copy(). 01:20:36.000 --> 01:20:42.000 If you had numpy arrays, you might do X comma y. 01:20:42.000 --> 01:20:48.000 We're still going to do shuffle equals True, we'll put in a random state of, 01:20:48.000 --> 01:20:59.000 I don't know, 1, 2, 2, 2, 2, 2, and then, well, I guess the second-to-last thing we need to put in is the test size. 01:20:59.000 --> 01:21:00.000 Let's do, I don't know, 0.2. 01:21:00.000 --> 01:21:04.000 So the last thing we need to put in is this stratify argument. 01:21:04.000 --> 01:21:10.000 So we say stratify, and then you need to input the variable that you're stratifying on. 01:21:10.000 --> 01:21:14.000 For us that's going to be beer at beer 01:21:14.000 --> 01:21:19.000 underscore type, so beer['beer_type'], 01:21:19.000 --> 01:21:26.000 the variable we are stratifying on. 01:21:26.000 --> 01:21:27.000 So now we can see that in the training set it's roughly the same split. 01:21:27.000 --> 01:21:33.000 It's not exactly the same, but it's pretty close. 01:21:33.000 --> 01:21:35.000 And the test set is the same way. Again, it's not going to be exactly the same, right, 01:21:35.000 --> 01:21:45.000 because we're dealing with discrete counts, so it's sometimes impossible to get the exact same split. 01:21:45.000 --> 01:21:53.000 But these are, for all intents and purposes, the same split of roughly 56/44. 01:21:53.000 --> 01:21:54.000 So that's how you do a stratified train test split. 01:21:54.000 --> 01:21:58.000 Are there any questions on anything we've talked about so far in this notebook? 01:21:58.000 --> 01:22:05.000 Can you only stratify on a single variable? 01:22:05.000 --> 01:22:12.000 So you can stratify on multiple variables. I'm not entirely sure how; I'd have to double-check the documentation to see how to do it with train_test_split. 01:22:12.000 --> 01:22:19.000 But in practice you can. In a lot of clinical trials, right, 01:22:19.000 --> 01:22:27.000 you might stratify your patients based on things like gender or other features. 01:22:27.000 --> 01:22:31.000 So, what's an example? 01:22:31.000 --> 01:22:37.000 For part of my PhD research, we did a study trying to get people to get flu shots, 01:22:37.000 --> 01:22:41.000 and two of the things we stratified on were self-reported gender 01:22:41.000 --> 01:22:46.000 and whether or not the person had a requirement from their job to get a flu shot. 01:22:46.000 --> 01:22:49.000 So we stratified on two things there. So it 01:22:49.000 --> 01:22:59.000 can be done. The thing to remember is that the more things you include in the stratification, the harder it gets; if you try to stratify on too many things, it may be impossible to actually get observations in each of your boxes, 01:22:59.000 --> 01:23:09.000 because you're just trying to split the data up too much. 01:23:09.000 --> 01:23:10.000 So Pedro is asking, how can you stratify 01:23:10.000 --> 01:23:15.000 non-binary features and labels? It works exactly the same way.
01:23:15.000 --> 01:23:26.000 So let's say, instead of two beer types, I had IPAs, stouts, and lagers, which is another type of beer. If I had that, I would still just put this in here, and the splitting would happen exactly the same way. 01:23:26.000 --> 01:23:40.000 So instead of having zeros and ones here, imagine you had zeros, ones, and twos; the exact same thing happens, where you also do a random split on the twos and then recombine after that. 01:23:40.000 --> 01:23:42.000 So it's always going to be the same process of randomly splitting the individual possible categories, 01:23:42.000 --> 01:23:53.000 then recombining them. 01:23:53.000 --> 01:23:54.000 Yeah, so you can use it, and people do use it, for regression as well. 01:23:54.000 --> 01:24:11.000 So let's say you have a categorical variable that you want to use in regression, and you want to ensure that there's equal representation of the categories in your train test splits or cross-validation; you could do a stratified train test 01:24:11.000 --> 01:24:24.000 split there as well. Another thing I've seen people do, at least in textbooks, is, maybe you have sort of a weird distribution where you have a few really high-value observations. 01:24:24.000 --> 01:24:35.000 You can bin the y data and then do the stratified split on the bins, and then unbin the data when you're actually doing the predictions and so on. 01:24:35.000 --> 01:24:41.000 So you can do it for that as well. 01:24:41.000 --> 01:24:49.000 Any other questions? 01:24:49.000 --> 01:24:52.000 Okay, so, just like there was a TimeSeriesSplit for time 01:24:52.000 --> 01:24:55.000 series cross-validation, there is a StratifiedKFold object for stratified cross-validation. 01:24:55.000 --> 01:25:04.000 So from sklearn, 01:25:04.000 --> 01:25:08.000 this is also stored in model_selection, 01:25:08.000 --> 01:25:15.000 we're going to import StratifiedKFold. 01:25:15.000 --> 01:25:29.000 And then, just like before, you specify the splitter object, so StratifiedKFold, and we're going to input 5, and then shuffle equals 01:25:29.000 --> 01:25:38.000 True, random state equals, I don't know, something. Okay. 01:25:38.000 --> 01:25:40.000 And now you might be wondering, wait a minute, 01:25:40.000 --> 01:25:45.000 where do you specify the stratified part? That comes when you call split. 01:25:45.000 --> 01:25:58.000 So when I call split, unlike with regular cross-validation, where for a data frame I could input 01:25:58.000 --> 01:26:05.000 the data frame without an X and a y, for stratified cross-validation to work 01:26:05.000 --> 01:26:14.000 you need both the X and the y, or your features and then whatever column you're using to stratify. 01:26:14.000 --> 01:26:24.000 So here it's beer_type. But, I think it was Joe's question earlier, if you had more than one, you'd probably need to specify that here. 01:26:24.000 --> 01:26:30.000 So we're going to run through this, and then, oh no, 01:26:30.000 --> 01:26:36.000 what happened? Oh, by randomly pounding my keyboard, I made a random state number 01:26:36.000 --> 01:26:39.000 that's too big. So let's fix that. There we go. 01:26:39.000 --> 01:26:59.000 Okay. So now we can look at the folds.
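Here's a minimal sketch of that StratifiedKFold setup, reusing the same made-up beer DataFrame as the earlier sketch; the names and seed are illustrative.

    import pandas as pd
    from sklearn.model_selection import StratifiedKFold

    # Same made-up stand-in for the beer data as before.
    beer = pd.DataFrame({
        "ibu": range(100),  # hypothetical feature column
        "beer_type": ["IPA"] * 56 + ["Stout"] * 44,
    })

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=222)

    # split() needs both the features and the labels you're stratifying on.
    for train_idx, holdout_idx in skf.split(beer, beer["beer_type"]):
        train_props = beer.iloc[train_idx]["beer_type"].value_counts(normalize=True)
        holdout_props = beer.iloc[holdout_idx]["beer_type"].value_counts(normalize=True)
        # Each fold keeps roughly the same 56/44 class balance.
        print(train_props.round(2).to_dict(), holdout_props.round(2).to_dict())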
As you go through, again, it's not going to be perfectly the same for each one, but it'll be roughly the same across all of our splits, and this is just showing you that the training split is roughly the same as the holdout set split. 01:26:59.000 --> 01:27:09.000 Alright, are there any questions about StratifiedKFold? 01:27:09.000 --> 01:27:10.000 Awesome. Alright, so I will stick around if you guys have any questions. 01:27:10.000 --> 01:27:22.000 Tomorrow, the next lecture will continue on with our classification material, and we'll actually learn some models and then some different metrics. 01:27:22.000 --> 01:27:28.000 In problem session number 7, you'll finish up the time series content. 01:27:28.000 --> 01:27:30.000 And yeah, then we're just moving right along. So I will stop recording, 01:27:30.000 --> 01:27:35.000 And then I will