Okay, so I'm going to go ahead and get started, and I'm about to hit record.

Alright, welcome back, everybody! Today we're going to start our time series content, so let me go ahead and share my screen and get situated.

Okay, so today we're starting time series. If you're trying to follow along, go to your lectures folder, then supervised learning, and then time series forecasting. We're going to try to go through notebooks 1 and 2, skip 3 (that one is really just for people who are new to Python and don't know about times and datetimes), then go to 4, and then try to get through 5. We'll see how far we get: 4 is a little bit of a longer notebook, so it takes a while, and we'll see how much of 5 we're able to cover.

We're going to start with notebook number one in the time series folder, "What are time series and forecasting?" We'll define what a time series is and what a forecasting task is, and then introduce two concepts related to time series called trend and seasonality.

So let's get started. A time series is a sequence of data points — note the term sequence here: (x_1, y_1), (x_2, y_2), and so on up to (x_T, y_T). Here x_t is a collection of m features, much like in regular regression where x held your features; you can think of x_t as a row with m columns. And y_t is a numeric variable of interest at time t — something we're trying to predict. Throughout all of this content we'll assume that our time steps are evenly spaced.

Now, we are going to look at an example that uses stock price data, and technically those aren't evenly spaced, because you don't trade on a holiday or a weekend. But we're going to make the assumption that the trading days occur sequentially, even though something may happen over the weekend that could impact the price on Monday. We'll also have to make some additional assumptions on x_t and y_t depending on the data set we have.

Technically, time series forecasting is a regression task in the machine learning / data science sense. That doesn't mean it's always a linear regression model task — sometimes it is — but regression is just the class of problems in data science and supervised learning where we have some features we'd like to use to predict a numeric, continuous variable y. So it is a regression task.
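As a compact way to write down the setup just described — this is a reconstruction of the spoken notation, not the notebook's exact slide:

```latex
% Time series notation, reconstructed from the spoken description
\[
  (x_1, y_1),\; (x_2, y_2),\; \dots,\; (x_T, y_T),
  \qquad x_t \in \mathbb{R}^{m}, \quad y_t \in \mathbb{R},
\]
\[
  \text{forecasting: predict } y_t \text{ for } t > T,
  \text{ assuming evenly spaced time steps.}
\]
```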
But there are some features of time series problems in particular that will require us to change some things relative to linear regression.

One of the slight departures is that a time series may or may not come with a set of features. In general you would set it up like (x_1, y_1), (x_2, y_2), and so on, but what you may end up getting with your problem is just a series of y values: y_1, y_2, all the way to y_T. We're going to focus on that situation in these notes: you have a time series, and you're only going to use the time series itself to predict its future values. Another thing we're going to see soon is that the sequential nature of the data impacts our predictive modeling approaches.

Some examples of time series data include the average global temperature over the past 200 years, the value of the S&P 500 stock index, the daily new cases of seasonal influenza in the United States since 1900, as well as yearly Boston Marathon times.

We're going to start by talking about two features of time series data that you tend to look for when you're doing forecasting. The first is a trend. We say a time series exhibits a trend if the values y_t tend to increase or decrease over time — "over time" meaning as t increases.

One example is the stock closing price of Google's parent company. The data file google_stock.csv contains the closing price of Google's parent company from the initial day it was trading to some day in March of 2022 — I forget exactly when it ends. We'll plot this and take a look. On the vertical axis we have the value of the closing price, and on the horizontal axis we have the dates of the trading days. The points are connected, so even though there are days with no trading, the line looks continuous — just keep that in mind. Here we have the date; in a time series setting we would treat these as trading day 1, trading day 2, and so forth. We would say that this time series has an increasing trend from the beginning of the series to the end.

But I want to note that sometimes you need to be careful when you're examining time series plots for trends, because the time window you're looking at can impact what you see. Here is a window where we consider the closing value of the stock from just before February 20, 2020 into March 2020, and here you would say the time series exhibits a downward trend. So being mindful of the windows and time spans you're considering matters: overall this time series exhibits an increasing trend, but as you look at various pieces of the plot, you can see there are time windows where the stock tends to go down.
So that's something to keep in mind.

Another feature that time series can have is called seasonality. A time series is said to exhibit seasonality if the value of the output variable y_t demonstrates a repeating pattern of some fixed length over time. One way to think of this is a sinusoidal wave — a sine curve — which repeats values over time; it's periodic. In a time series sense, if you saw something like that, you would say it has seasonality. That's not the only repeating pattern you'll see; there are lots of different patterns that can repeat over time.

One example of a seasonal data set is the number of cases of seasonal influenza in the United States. This is prior to COVID-19 — something changed with our behavior in the early years of the pandemic — but flu season follows a fairly regular pattern: cases start to slowly increase in late October / early November, increase more rapidly in December and January, peak around late January and February, and then decrease toward zero around April or so. Here is a data set on US seasonal influenza — confirmed cases, I believe, meaning cases confirmed through some sort of test at a hospital or doctor's office — from 1928 to 1948. This data comes from Project Tycho.

Okay, I think this next cell is maybe left over from my lecture copy last year. Just looking at the data set, we have the date, the year, the week, and the number of cases, and I think here I was demonstrating how to subset the data with a datetime. If that's something you're confused about, you can ask when I pause for questions in a little bit.

And here's what the data looks like. You can see the seasonal pattern, where every year you see basically the same kind of peak. Now, in this time series there are some years where the flu season peaks early, or peaks much higher than before, but roughly this data follows the same pattern we just discussed, with occasional years of early seasons. So this would be considered seasonal data.

Before we talk about what forecasting is, are there any questions about time series, trends, or seasonality?

Okay. Forecasting is just what we call the act of predicting future values of a time series. Remember our supervised learning framework: we have y = f(x) + ε, where f is the systematic part and ε is some random error that's typically assumed independent of x. When we do forecasting, we make some slight adjustments to that framework to take into account the temporal, or sequential, nature of the data.
So here, in the supervised learning framework for our forecasting setting, you assume that y at some time T is a function of the features at time T (which you may or may not have) and of the time itself, given all the previous observations — something like y_T = f(x_T, T | y_{T-1}, ..., y_1) + ε_T. And you can notice the random noise has a subscript T. This is to indicate that the random error, while still assumed independent of the features you have, is no longer considered independent of time: the random noise may depend on the value of the time step.

This slightly different framework is going to change some of the things we've done in regression, like data splits and cross-validation, but we'll learn about that in the very next notebook.

So before we continue on, are there any questions at all about notebook number one for time series?

"Yeah — can I ask a question? Are you saying this is kind of a conditional probability type expression? Is that what it becomes with a time series?"

Yeah, so you are conditioning on the previous observations. Right — trying to think — we haven't seen another example where we do something like this, but basically, yes.

"Okay. And the features are just the past performance, I guess? Those are kind of your features, then?"

So you may have situations where you have an actual matrix of features. For instance, with this flu example, disease is sometimes impacted by the weather, so maybe you have data about weather or climate variables at each time stamp as well — say the average temperature in whatever region you're looking at. That would be a feature, and then on top of that you also need to take into account the sequential nature of the data, which is why you're conditioning on these previous observations. There may also be something temporal going on just due to the fact that it's that time of year. So it's a function of the potential features you may or may not have, the time itself, and then you have to consider the previous observations.

"Thank you."

Yeah. And Erica asked: if you wanted to use features in addition to just the y values, are there models that use both, or would you somehow combine multiple different models? So, Erica, there are models that use both. For the sake of time we don't cover them in the lectures, but I believe in the practice problems
there's an example: essentially, the model covered there regresses y onto the features, and then you end up modeling the errors using the temporal structure.

Keira's asking whether ε_t would essentially increase with time. Not necessarily. It could increase with time; it could also be that the errors are seasonal. The way the errors depend on time — or whether they depend on time at all — differs depending on the time series you have. It may be independent of time; it just depends on the series.

And then there's another question: "Oh, so we're explicitly not factoring in the previous observations of the features?" I believe there are some models that could explicitly factor in previous observations of the x's as well, but the idea is that that information would be somewhat contained — not subconsciously, I'm trying to think of the word — implicitly.

Okay. So, like we said, this is notebook 2 in the time series folder. As we just said, the sequential nature of the data means we have to change some things, and in this notebook we're going to talk about which parts of our supervised learning workflow we have to change to account for it. Basically, the only thing we're going to change is the way we make data splits: that includes validation splits, train test splits, and cross-validation.

The basic idea is this. Remember, when we learned about all those different data splits in previous lectures, you made them uniformly at random, the idea being that the distribution of the split then reflects the underlying distribution of the population. But because you have data where your next observation is potentially dependent on all (or some) of your previous observations, you can't just uniformly at random select the train test split, the validation split, or the cross-validation splits. You have to split the data in a way that respects the fact that some observations occurred after other observations.

The way to keep this straight in your mind is that you can't use the future to predict the past — which is exactly what could happen if you did a regular random split: your training set might contain observations that you would then be using to predict earlier observations in your test set, validation set, or cross-validation holdout sets. So when you make these splits, you need to respect the fact that the later observations occur after the earlier ones.
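As a minimal sketch of what that kind of chronological split looks like in code — the series `y`, the horizon `h`, and the use of a pandas Series here are assumptions for illustration, not the notebook's exact code:

```python
import pandas as pd

# Hypothetical time series, already sorted in time order.
y = pd.Series(range(12))
h = 4                    # assumed forecast horizon

y_train = y.iloc[:-h]    # everything before the final h observations
y_test = y.iloc[-h:]     # the final h observations play the role of "the future"
```

The point is just that the test set is the final chunk of the series, never a random sample.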
So let's say, hypothetically, we have a full time series here, represented by this line of blue dots — I think it's 12 observations. When we make our train test split, you just set aside the last however-many observations as your test set. In forecasting, the number of time steps forward that you'd like to be able to forecast — let's call it h — is known as the forecast horizon. Typically you'd want to set aside maybe 1, 2, or 3 horizons as the test set, depending on how large your data set is. Here you could imagine we had a horizon of 2 and we're setting aside 2 horizons' worth of data for the test set, or maybe we have a horizon of h = 4, in which case we're setting aside one horizon of our data as the test set.

While it's currently being presented as a train test split, the same thing works for a validation set — it's just the mini-training-set version of it. So that's the idea: in order to respect the fact that you can't use the future to predict the past, you just clip off the last however-many observations of your data set. You don't need a special function for this — you don't need train_test_split. You can use train_test_split, but you don't need it: your data is typically stored in a numpy array or a pandas data frame, so you can just use regular indexing to set aside the last few observations as a test set.

So, are there questions about train test splits and validation splits?

There's a question in the chat: do we split train and test like this even if the time series data does not have a trend or seasonality — should any data we have with time be split like this? Yeah. If the data you're dealing with has a temporal element — it's a time series, observations at set time points — you want to respect the fact that you can't use the future to predict the past. So even if your time series doesn't have seasonality or a trend, you still need to respect the fact that it's a time series and split off the train test set like this.

I also noticed that somebody had their hand up, so if I didn't answer your question, feel free to ask.

"So my question is about the part just slightly off screen, in the cross-validation section. It seems like if we use the split there, the training data in CV split 1 would show up in all splits, while the training data in CV split 5 shows up only once — which to me suggests that any analysis will be weighted towards older data. Am I missing something, or is that a desired behavior?"

Yeah, so I would ask that we hold
that question until we actually cover that part of the lecture.

"Yeah, yeah." Nope, we didn't talk about it yet. "Okay — oh, sorry, I thought you had already said something about cross-validation with this. My bad!" Just the train test split part so far.

Okay, well, on that note, let's dive into the cross-validation part. With cross-validation it works the same way. Let's say I had a different time series — I know I use the same colors, so it can be confusing — a bigger time series, and this is what my resulting training set looks like after the train test split. I think here I have 18 observations (counting by threes: 6, 9, 12, 15, 18).

The way cross-validation works is essentially the same as the train test split: you go through and sequentially remove the last however-many observations. In this example we're doing a five-fold cross-validation split and setting h equal to 3, so I'm setting aside one horizon for each cross-validation split as a holdout set, and then — just like I believe Zach mentioned with his question — using all of the previous observations as training data.

So that's one way you can do it, and then, as Zach mentioned, you do get this phenomenon where these first 3 points are included in every single training set for the cross-validation splits, and the next 3 points are included in 4 of them. So you are getting more information from the past.

You can instead set it up — and the tool we're going to learn is called time series split — so that you have the same size training set for all of your cross-validation splits. For instance, if we wanted to make our training set size 3 as well, you could use only the 3 observations immediately before each holdout set. So here, if you did that, CV split 1 would be these 6 points you see, CV split 2 the next 6, CV split 3 the next 6, and so forth. You can do it either way — you can see which way works best for your particular model; you can try both and see whether it has an impact on the results, if you'd like.

Okay, so to do this in Python you can use a function — I guess it's technically an object — from sklearn called TimeSeriesSplit. So, from sklearn.model_selection —
it's still in model_selection, just like KFold — you're going to import TimeSeriesSplit (capital T, capital S, capital S).

What I'm doing right now is just randomly generating a time series using numpy — and maybe what I'll do is set np.random.seed, just to make sure it's always the same. So here's what my time series looks like. It's just randomly generated; it's not important, other than the fact that we're going to pretend it's a time series.

So how do I do the cross-validation with TimeSeriesSplit? Where you'd have had KFold, you now call TimeSeriesSplit. The number of splits is specified with n_splits — you can just put it first. test_size will control the size of the holdout set; I believe in my comments I said I'd use a test size of 14, and here I don't specify a training size. This is also a difference from KFold: with KFold we would typically specify a fraction, and with TimeSeriesSplit you can do a fraction, but if you want to be explicit about the exact number of observations being split off, you can just specify it. So here I've specified 14 — the only reason I chose 14 is that it's what I provided for myself in the comments.

In this example, it works the way it's pictured above: instead of 3, each holdout set has 14 observations, the training set is everything before it, and you sequentially prune off the holdout set each time. We can loop through and see how it looks — you can see how, each time, the previous test indices get added to the training indices as you go.

Alternatively, you can do it the way I described in answer to Zach's question. Let's say we want to use just 2 horizons' worth of data for each training set, so we set that equal to 28. I believe the argument is train_size — I knew something like this would happen; let me check the documentation. Max train size, that's what it is: not train_size but max_train_size. And now, if you loop through, you can see that the training set is never bigger than 28 observations — that might be hard to see with your eyes, but trust me, it's always 28 — and it moves along with the test set.
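Here is a hedged sketch of that TimeSeriesSplit usage. The seed value and the length of the fake series are made up for illustration; test_size and max_train_size are the arguments discussed above (test_size requires a reasonably recent version of scikit-learn).

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

np.random.seed(440)            # hypothetical seed, just for reproducibility
y = np.random.randn(100, 1)    # pretend this is a time series

# Growing training sets: each split holds out the next 14 observations.
ts_cv = TimeSeriesSplit(n_splits=5, test_size=14)
for train_index, test_index in ts_cv.split(y):
    print("train:", train_index[0], "-", train_index[-1],
          "| test:", test_index[0], "-", test_index[-1])

# Fixed-size training sets: cap the training window at 28 observations (2 horizons).
ts_cv_fixed = TimeSeriesSplit(n_splits=5, test_size=14, max_train_size=28)
```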
Okay — are there any questions about cross-validation with time series?

Brantly asked in the chat: if our time series exhibits seasonality, shouldn't we make sure that our validation sets together cover at least one full cycle of the season, so we get a good idea of how the model performs throughout the entire cycle? Yes, probably you would want to do that; sometimes you might not have enough data to ensure it, but in general you'd like to. One thing to keep in mind, though, is the horizon. You might have seasonal data but only be interested in predicting, say, 3 time steps out. Maybe your season is yearly and your data is weekly, so the period of the season is roughly 52 time steps; in that case you probably wouldn't want your holdout set to always be 52 — your holdout set should be about the size of your horizon, or a little larger, like 2 or 3 horizons. What you might want to ensure, especially if you're using the max_train_size setup we just looked at, is that you include at least one whole season's worth of data in your training set.

Are there any other questions?

Okay. So that's it for notebook number 2. In live lecture we're skipping notebook number 3, so if you're new to Python and you've never dealt with datetimes, I encourage you to check out that notebook on your own time. There's a pre-recorded lecture video on the website for it (not a live lecture), and it's just good to get some familiarity with datetimes, time steps, and how Python handles time in general.

Now we're going to dive into learning some very first baseline forecasts. I'll preface this: a lot of these aren't going to be the world's best forecasts. Sometimes they do a lot better than you'd expect, but it's important to always have a baseline for any modeling approach you're doing. With forecasting, these will be your go-to simple baselines, because if you have a really complicated model that takes a long time to train and it can't outperform your baseline, it's not worth keeping that model.

So we're going to learn these baseline forecasts. In particular, we're going to cover 6 baseline models, basically going through different cases of how your time series looks.
The first two are for time series without a trend and without seasonality. The next two will be for time series with a trend but no seasonality, and the last two will be for time series with seasonality but no trend. And I know what you're thinking: that doesn't cover all the potential options. I cut it off there because, if you know these last four, you can put together how to make the remaining two, and it is explicitly covered in the practice problems. So, for the sake of time, I focused on these 6 models and left the last 2 for you to look at on your own time.

Throughout the notebook, whenever I write y_t I'm referring to a time series. I'll always assume my training set has n observations, and our overall goal is to make predictions at times after what we have observed — so for t bigger than n, where n is the size of the training set.

Okay. We're going to stick to the two time series we looked at in the very first notebook today. The first is the Yahoo Finance time series: the closing price for Google's parent company's stock from August 19, 2004 to March 25, 2022. Remember, I said we want our data to be sequential; we're just going to assume that my time steps are trading days, which are not always sequential, but we'll make that assumption. And the second is the weekly seasonal influenza cases data set.

Okay. I also want to point out something that might be new for people who are new to pandas. These data sets — the Google stock one and the influenza one — have columns that contain dates. When you read in data (CSV files or anything else) with pandas and it has a column that is a date, you can get it read in as a date if you include the argument parse_dates. What you do is provide a list of the columns that you suspect have a date in them. If the date is well formatted — meaning it follows a standard date format; there are a couple of different standards, but if it follows one of them — pandas can usually figure out which one you're using.

So here are the first 5 entries of the data frame, and, just to follow up on that, you can see the 0 entry here is what's known as a pandas timestamp. If I get rid of the parse_dates argument — I'll just comment it out — you can see that it's read in as a string. So without parse_dates it's read in as a string, but with parse_dates it's read in as a pandas Timestamp.
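A small sketch of that read_csv call — the file name matches the one mentioned earlier, but the date column name ("date") is an assumption:

```python
import pandas as pd

# With parse_dates, the listed column comes in as pandas Timestamps...
goog = pd.read_csv("google_stock.csv", parse_dates=["date"])
print(type(goog["date"].iloc[0]))      # pandas Timestamp

# ...without it, the same column is read in as plain strings.
goog_str = pd.read_csv("google_stock.csv")
print(type(goog_str["date"].iloc[0]))  # str
```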
And then here's just a reminder of what the time series looks like.

The first two baselines are going to use this series. We know this is a time series with a trend — an increasing trend over the course of the entire series — but we're still going to use it instead of introducing yet another time series; I'm going to stick to just these two. So I'm going to use this series to build the baseline forecasts for "time series without a trend." I know it has a trend, but we'll pretend it doesn't, just for the sake of not having to introduce a third time series to keep track of. I'm going to set aside 14 trading days and use those as my test set. I don't really have a reason for using 14; that's just what I chose when I wrote the notebook.

Okay. The first forecast we're going to look at is called the average forecast. The average forecast just consists of predicting the historical average for every future time point. It's really similar to the baseline we used for regular regression problems, where you take the expected value of the output and use that as your prediction. To write it down as a formula: f(t) is the expected value of the time series plus some random noise ε for any time in the future, and just the observed value if t is one of the observations in the training set, where ε is an error term.

For this model to be any good, you'd need the assumption that the y_t are independent of one another and identically distributed. That's what's known as a white noise time series — white noise meaning it's a purely random process where each observation is identically distributed and independent of the others. That's usually not the case for the data you get, but it's a good baseline.

So how do we make the prediction? We take goog_train.closing_price.mean() and multiply it by a vector of ones whose length is the length of the test set. Just to be safe we'll use len(goog_test) — it's 14, but this way, if I go up and change it later, it will always be the length of the test set. What the next cell then does is plot my training data, my actual test data, and my prediction, as well as print out the MSE for us.
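Collected into one hedged snippet — the goog_train/goog_test names and the closing_price column are assumed from context, and the mean_squared_error call is just one reasonable way to print the error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Average forecast: predict the historical mean for every future time step.
avg_pred = goog_train["closing_price"].mean() * np.ones(len(goog_test))

print(mean_squared_error(goog_test["closing_price"], avg_pred))
```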
Now, a disclaimer — maybe I should have written this in the notebook. If we were actually going through the process of building the best model, we would want to use a validation set or cross-validation. Because I'm just defining the different types of models and showing you how to fit them, we're simply going to compare each model to the actual test set. That is not what you would want to do if you were actually trying to build a model, but for simplicity in the lecture, that's what we're doing.

Okay. So, as I said, not a great model. And why isn't it? You can see here that this dotted line is our prediction. It's not a good model because the average uses all of the historical data, and because we have such a huge increase from the first trading day to the last, our average is skewed way down.

Okay, so that's our first forecast. The next baseline forecast for data without a trend and without seasonality is known as the naive forecast. It's called naive because your prediction of what things will be like at the next time step is just what things are currently. If we write it down as a formula, f(t) is the last observation in the training set plus some random noise for times in the future, or the observed value if t is in the training set, where again ε is an error term.

The statistical model that underlies this is known as a random walk: you assume some initial value at time step 0, and each future time step is just the current value plus some random noise. So this is the naive model — naive because you're assuming things are going to be the same for every time step in the future, the same as the current state.

To make this prediction, you take goog_train.closing_price — I think it's easiest to do .values and take the very last observation — and then multiply it by a vector of ones that is the length of the test set. Again, if we were actually trying to compare models, we would do cross-validation; because we're just learning the models, it's the test set.

Okay, so this one actually works a lot better for this particular data set. That's typically because you can try to model the stock market with a random walk, and it's usually okay in the short term, because stock prices are pretty volatile. That's why this one looks a bit better than the average.
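The naive forecast in a hedged one-liner (same assumed variable and column names as above):

```python
import numpy as np

# Naive forecast: carry the last observed training value forward for every future step.
naive_pred = goog_train["closing_price"].values[-1] * np.ones(len(goog_test))
```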
Okay. But before we move on to data with a trend but no seasonality, are there any questions about the average or the naive forecast?

"I guess I just have a question: you're not really supposed to do this with stock price data, right? I thought I heard that you're not supposed to try to predict stock prices because it's not actually doable."

Yeah, so people do come up with models to try to predict — maybe not the absolute actual price; I don't actually know what people at quantitative trading firms try to predict. But the general wisdom is that you're never going to beat the market, and my understanding of what that means is over time: in the short term you might get lucky, or have a model that's good at predicting some short-term thing, but if you then try to consistently use the same approach over time, you're always going to be beaten by the market, which typically means an index fund. So if you don't have a lot of money you're willing to throw around and bet on your models — or somebody else's money to do it with — I would suggest not trying to predict the stock market. There are quantitative trading firms that do try to predict at least something, and some of them make a lot of money, but in the long term they would probably be outperformed by other trading approaches.

"Alright, thanks!"

Okay, so our next two baseline forecasts are for data with a trend but no seasonality — either increasing or decreasing over time. The first is just called the trend forecast. The trend forecast assumes that the time series is a linear function of time, so you'd write something like f(t) = β_0 + β_1 t + ε for t in the future, and the observed value otherwise. Here β_0 and β_1 are real numbers and ε is a random error term. The underlying statistical model is really just a version of the average model that is linear in time: in the average model we take the expected value, fixed for all time, whereas here we're saying the expected value — E[y_t] given t — can be modeled as a linear function of time.

So you do from sklearn.linear_model — which we've seen before — import LinearRegression, and then do exactly what we've done in the past few lectures and problem sessions: reg = LinearRegression(), then reg.fit, and for your features here you do something like np.
arange from one to the length of the training set, plus one, to account for the fact that we're not starting the time index at 0 — and then we need to reshape, because this is a one-dimensional array. For the target we just put in the closing price: goog_train.closing_price.values.

Then, to make the forecast, we do reg.predict, where the np.arange now runs from the end of the training set to the end of the training set plus the length of the test set — because, again, we're only looking at the test set — and then that would be our prediction. Oh — I forgot the .reshape(-1, 1) — and it turns out I didn't need the extra plus one on this one after all. There we go.

Okay, so this is what you get. Again, it doesn't look like it did much better than the average forecast — we could compare them if we wanted to, but that's not the point of this notebook — and that's probably just because we have such a long time series, and it does not look like a linear function of time.
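Here is the trend fit collected into one snippet with consistent time indices. This is a consolidation of the live-coded version, not the notebook verbatim; the goog_train/goog_test names and the closing_price column are assumed from context.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

n, m = len(goog_train), len(goog_test)

# Trend forecast: regress the closing price on the time index 1, 2, ..., n.
t_train = np.arange(1, n + 1).reshape(-1, 1)
reg = LinearRegression()
reg.fit(t_train, goog_train["closing_price"].values)

# Forecast the next m time steps, n + 1, ..., n + m.
t_test = np.arange(n + 1, n + m + 1).reshape(-1, 1)
trend_pred = reg.predict(t_test)
```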
Alright, so are there any questions about the trend forecast and how you fit it?

Okay. The last of the two models for data with a trend but no seasonality is called a random walk with drift. This is just a trend extension of the naive forecast: you take f(t) to be the last observation, plus some constant parameter β that you estimate, times the number of time steps into the future from your last observation, (t − n), plus ε — and if t is within your training set, you just take the observed value. Here β is a real number that you estimate. The underlying statistical model is called the same thing, a random walk with drift: you have a sequence of random variables y_1, y_2, ..., y_t, you assume a starting point y_0, and each subsequent point is the current one plus this drift coefficient plus noise. That's why it's (t − n): you're assuming you add the same amount at each time step.

The way you can estimate this drift coefficient β is with what's known as first differences. First differences are found by calculating y_t − y_{t−1} — the next time step minus the current one. You can do this really quickly with pandas' .diff function. Just to demonstrate: looking at the first 5 values of the closing price, we get this, and then with goog_train.closing_price.diff().head() you can compare and eyeball it. There's nothing to subtract for the 0 entry, because it's the first one; for the 1 entry it takes the 1 entry of goog_train and subtracts the 0 entry — and if you eyeball it, that's what you get — then for the next entry it does the 2 entry minus the 1 entry, which gives the 0.54, and so on. Then, to get the estimate of β, you just take the average of these: I'll copy-paste, get rid of the .head, and add in the .mean — and then, just to make sure I'm not including that NaN, I'll go from 1 onwards.

Okay. And so this is what our prediction looks like — a little bit of an increase; you can kind of see the slight slope here.

Zach is asking: is the random walk with drift different from the linear model, just with a different intercept? The linear model you get by regressing the y values onto the time steps 1, 2, 3, 4, and so on, so you're never going to get — well, I guess maybe not never, but it would be rare to get — this intercept: here you're basically fixing β_0 to be the last observation of your training set, if that makes sense.

Any other questions about these two?

"Yeah, just kind of a general question. Could you do something like transform your time variable, using a sine wave or something like that, so that it's not really a time series anymore, and then just do a normal regression?"

So, no matter what transformations you apply to it, it will always be a time series. The time series part of it isn't necessarily that it's seasonal or that it has a trend; the time series part is that, in the models we're going to learn in the live lectures, there's an assumed dependence structure from one observation to the previous one — you're assuming there is some sort of dependence between time steps. That's the idea behind a time series. Even if you took the sine of all the observations, or applied whatever transformations to them, there would still be some dependence between the observations, so you can't really take the time series and somehow find a way to decompose it so that it's not a time series anymore.
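Putting the drift estimate and forecast together in one hedged sketch (same assumed names as before):

```python
import numpy as np

# Estimate the drift beta_hat as the mean of the first differences;
# dropna() removes the NaN in the first row of .diff().
beta_hat = goog_train["closing_price"].diff().dropna().mean()

# Random walk with drift: last training value plus beta_hat for each step ahead.
last_value = goog_train["closing_price"].values[-1]
steps_ahead = np.arange(1, len(goog_test) + 1)
drift_pred = last_value + beta_hat * steps_ahead
```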
Clark's asking if we can show the beta-hat block again. We sure can. To get the beta-hat, remember: if y_{t+1} = y_t + β + ε, then y_{t+1} − y_t is a way to estimate β, and typically you assume ε has a mean of 0, in which case taking the average of all the y_{t+1} − y_t differences gives you an estimate of β. That's what we're doing here: we take the .diff, and then the average of all the diffs — skipping the 0 row, the very first row, because it has a missing value.

Any other questions?

Okay. The last two models we're going to look at are for data with seasonality but no trend. These are the last two in this notebook; whether we get to more models today depends on the time I have. So let's quickly remind ourselves what this data set looks like — and here we can correct a typo from earlier in the notebook: we're looking at 1928 to 1948 (I think I had 1930 to 1950 earlier). This is what it looks like.

Okay, so this is our time series, and we can see that this data exhibits a sort of yearly pattern: new cases tend to increase at the beginning of each year, peak in the first quarter or so, and then decline afterwards, and the cycle occurs on a yearly basis. Time series that exhibit this sort of behavior — behavior, not trend — are said to exhibit seasonality. I think this is just a repeat from before; some of these notebooks I wrote two or three years ago, so they don't always conform with one another, and I said this in the earlier notebook.

Okay. When we have this sort of belief — that our data is seasonal — we can update the baseline models, the average and the naive model, to account for the fact that the data is seasonal. We're going to use the last year of the data set, 1947, as a way to show you how you can make some seasonal baseline models and then compare them to what we observe in that last year. Again, if we were actually trying to build the best model for predicting on this data, we would do cross-validation or use a validation set; but just to introduce the models, we're going to focus on comparing to what we observe in the last year.

So the first seasonal-but-no-trend baseline is the seasonal version of the average forecast.
For each time step in the seasonal average forecast, basically what you're going to do is predict the average value of all the previous corresponding time steps throughout the series. A seasonal data set has a given season, and that season is, let's say, m time steps long — I think that's what I used here. The length of a season is called a period. So if the season is m time steps long, you go through and say: for each observation that falls on step 1 of a period, to predict a future step-1 observation I take the average over all of those; for a prediction that falls on step 2 of a period, I take the average over all the step-2 observations in the training set; for step 3, the average over all the step-3 observations; and so on. That's the idea — it's easier to say it in words than to read through the formula, which you can do on your own time if you'd like.

The underlying statistical model is just a seasonal extension of the average forecast, where you assume that each step in your season represents its own white noise sequence — for every step of the season you have a different random variable that you're drawing from its own distribution.

The nice thing is that I cleaned the data a little bit ahead of time, so that each year has 52 weeks. It isn't true that every year has exactly 52 weeks, but for this toy example each year does. So, in order to get the average model predictions, all I have to do is loop through the different weeks of the year, 1 through 52, and record the average for that week. That's what this code does: it loops through weeks 1 through 52 and records the average value of cases for that week across the training set. And then this is plotting that seasonal average forecast.

Okay, so the dotted line is the seasonal average forecast and the solid red line is the actual observations in 1947. One reason this doesn't perform so well is that 1947 was sort of an anomaly year: its flu season started late, actually within the calendar year, whereas other seasons tend to start in November or December and then peak in January and come down. So that's why you're seeing this disparity between the observed values and the average model.
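That week-by-week averaging, as a hedged sketch — flu_train and the "week"/"cases" column names are assumptions based on the columns described earlier (date, year, week, cases):

```python
import numpy as np

# Seasonal average forecast with period m = 52: for each week of the year,
# predict the average number of cases observed in that week across the training years.
week_avgs = np.array([flu_train.loc[flu_train["week"] == w, "cases"].mean()
                      for w in range(1, 53)])

# The prediction for 1947 is then just these 52 weekly averages, in week order.
seasonal_avg_pred = week_avgs
```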
00:54:53.000 --> 00:55:02.000 It's called a seasonal naive model. 00:55:02.000 --> 00:55:08.000 Again you're using this idea where you break up your seasons into their corresponding time steps. 00:55:08.000 --> 00:55:25.000 Let me start over: if I want to make a prediction in the future, you have to figure out what time step of the season that prediction corresponds to, find the most recent version of 00:55:25.000 --> 00:55:42.000 that time step in the training data, and then choose that. So in this example, for the year 1947, we're just going to take every corresponding week from 1946 and copy and paste it over. 00:55:42.000 --> 00:55:54.000 The underlying model for this is a seasonal extension of the random walk, where we think of each step in the season's cycle as its own random walk sequence. 00:55:54.000 --> 00:56:00.000 So let's go ahead and do this. Here we can use the fact that I have already cleaned up the years in 00:56:00.000 --> 00:56:07.000 this data, so to get my predictions for 1947, I just need to grab the corresponding weekly values 00:56:07.000 --> 00:56:11.000 from 1946. 00:56:11.000 --> 00:56:14.000 Okay. 00:56:14.000 --> 00:56:25.000 So that's what the seasonal naive model looks like on this particular data (a short sketch of both seasonal baselines appears after this passage). Okay, are there any questions about the 2 seasonal- 00:56:25.000 --> 00:56:44.000 but-no-trend models that we just went over? 00:56:44.000 --> 00:56:50.000 Alright, so like I said, there are 2 more baselines that you might want to look at. 00:56:50.000 --> 00:57:03.000 For instance, you can have a data set that has both seasonality and a trend. In that case, you're basically just doing extensions of the average and the naive model, 00:57:03.000 --> 00:57:06.000 but accounting for the fact that you have both a trend and seasonality. 00:57:06.000 --> 00:57:12.000 It gets cumbersome to write it all out and go over it in a lecture, 00:57:12.000 --> 00:57:26.000 but I did write it all out in the practice problems notebook for time series. So if you're interested in using these baselines for such data, check out the practice problems and you'll see how to do so. Okay? 00:57:26.000 --> 00:57:34.000 So the last notebook we're gonna dive into today is called averaging and smoothing, notebook number 5 in the time series 00:57:34.000 --> 00:57:46.000 folder. These are gonna start to introduce some beyond-baseline models. We're gonna see how much of this we can get through today, and then we'll finish it up tomorrow. 00:57:46.000 --> 00:57:49.000 So! 00:57:49.000 --> 00:58:01.000 What do we mean when we say an averaging or smoothing forecast? Averaging or smoothing forecasts are ones where you take an average of some collection of the previous values. 00:58:01.000 --> 00:58:10.000 So the ultimate version of this was that baseline model where you take all the previous observations in the training set and average them together.
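To make the two seasonal baselines above concrete, here is a rough sketch of the seasonal average and seasonal naive forecasts on synthetic stand-in data. The frame layout and the names year, week, and cases are assumptions; the lecture loops over the weeks explicitly, while this uses groupby as an equivalent shortcut.

import numpy as np
import pandas as pd

# synthetic stand-in for the cleaned flu data: 52 weeks per year, 1928-1947
rng = np.random.default_rng(0)
years = np.arange(1928, 1948)
flu = pd.DataFrame({
    "year": np.repeat(years, 52),
    "week": np.tile(np.arange(1, 53), len(years)),
})
flu["cases"] = (1000 * np.exp(-(((flu["week"] - 8) % 52) ** 2) / 50)
                + rng.normal(0, 20, len(flu))).clip(lower=0)

train = flu[flu["year"] < 1947]   # 1928-1946
test = flu[flu["year"] == 1947]   # the year we compare against

# seasonal average forecast: for each week 1-52, average that week across all training years
seasonal_avg = train.groupby("week")["cases"].mean()
avg_forecast = test["week"].map(seasonal_avg)

# seasonal naive forecast: copy the corresponding week from the most recent season (1946)
last_season = train[train["year"] == 1946].set_index("week")["cases"]
naive_forecast = test["week"].map(last_season)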
00:58:10.000 --> 00:58:29.000 But we're going to learn more nuanced or subtle averaging models in this notebook. The very first one is known as a moving average forecast. For one example, with a moving average of window size 00:58:29.000 --> 00:58:39.000 3, you would take the sum of the 3 most recent observations and then divide by 3; that is, you take the average of the 3 most recent observations. 00:58:39.000 --> 00:58:54.000 This is known as a moving average forecast because, as t increases, the window that you're considering shifts. 00:58:54.000 --> 00:58:57.000 More formally, you could write it out like this: 00:58:57.000 --> 00:59:12.000 the moving average forecast with window size k and equal weights takes the sum from i = 0 to k minus 1 of y sub t minus i, divides by k, and then adds epsilon sub t, which is random noise. 00:59:12.000 --> 00:59:20.000 So basically, for any observation within your training set, you're moving that window along, 00:59:20.000 --> 00:59:26.000 and then, once you get into the future, you're stuck with only the observations you have, so all predictions into the future are just the average of your last k observations. 00:59:26.000 --> 00:59:36.000 So going back to the Google stock, we're going to show you how to implement this model. 00:59:36.000 --> 00:59:46.000 pandas has this method called .rolling. Okay, now I know what I was trying to do. 00:59:55.000 --> 01:00:04.000 So now that I have the copy here, you're gonna do .rolling and then .mean, and then I'll show you what this looks like; let's do .head(10). 01:00:09.000 --> 01:00:18.000 Sorry about that. For the window I wanted to put, let's say, 3, and then there's also this other argument you're gonna want, called closed equals left, 01:00:18.000 --> 01:00:21.000 and I'll talk about what that does in a second. Okay? 01:00:21.000 --> 01:00:37.000 So what rolling does is it will go through whatever column you've provided and do this rolling-window type thing. Here we have a window of size 3, so it's gonna look at the 3 most recent observations, 01:00:37.000 --> 01:00:48.000 and then, whatever operation you ask for next, it will calculate that statistic using those observations. 01:00:48.000 --> 01:01:09.000 Here, because we put mean and our window size is 3, the value on row 3 is the mean of rows 0 through 2, the value on row 4 is the mean of rows 1 through 3, and the value on row 5 is the mean of rows 2 through 4. And this isn't something you 01:01:09.000 --> 01:01:18.000 can use only to calculate means; we could also do something like: what is the median of the last 3 rows? 01:01:18.000 --> 01:01:23.000 Let's see, I forget what standard deviation is... std. So you could even do something like: what's the standard deviation of the last 3 rows, etc. 01:01:23.000 --> 01:01:35.000 So as long as you put in something where you're calculating a statistic on a column, you can use it with rolling.
01:01:35.000 --> 01:01:46.000 So the first argument tells you what the window size is; I could change the window size to be, say, 5 or 2, and it changes the size of the window, 01:01:46.000 --> 01:01:53.000 meaning the number of observations in the window. I'm gonna go back to 3, because that's the model I specified earlier. 01:01:53.000 --> 01:02:13.000 Then I have this argument closed equals left. This is just specifying that I want the window to be the 3 most recent observations relative to where we are, as opposed to other setups: you could center the window, which would be something like y sub t minus 1, y sub t, y sub 01:02:13.000 --> 01:02:26.000 t plus 1, or look forward, which would be y sub t, y sub t plus 1, y sub t plus 2. Because we're doing time series forecasting, 01:02:26.000 --> 01:02:35.000 we can't use the future to make predictions, so we have to set it so that the side of the window that we use is the left-hand side. 01:02:35.000 --> 01:02:40.000 Okay, so that was a lot of information. Are there any questions about rolling 01:02:40.000 --> 01:03:00.000 and how it works, or what closed means? 01:03:00.000 --> 01:03:02.000 Okay. So then we... yeah? 01:03:02.000 --> 01:03:08.000 I still don't quite understand what the whole closed thing is. So closed equals left means 01:03:08.000 --> 01:03:18.000 we're saying that we're taking, let's say, for instance, in this table that we're looking at, 01:03:18.000 --> 01:03:23.000 we're taking the last 3 in... 01:03:23.000 --> 01:03:29.000 and then I guess I don't quite understand; it seems like a closed interval, right? But... 01:03:29.000 --> 01:03:33.000 So closed is just specifying where you start the window. 01:03:33.000 --> 01:03:49.000 When it's equal to left, I believe you're starting there and going backwards, and if it's equal to right you'd be going forward... well, apparently not. So let's see, we can always check. 01:03:49.000 --> 01:03:55.000 I'm guessing it means the last 3 including the current one, as opposed to excluding the current one. 01:03:55.000 --> 01:03:56.000 Yeah. So why don't we... 01:03:56.000 --> 01:04:02.000 So the average at time 3 factors in rows 0, 1, 2, versus 0, 1, 2 being the average at row 2. 01:04:02.000 --> 01:04:03.000 So let's go here, and we'll just read what the definition says. 01:04:03.000 --> 01:04:11.000 Okay, so closed: if 'right', the first point in the window is excluded from the calculations; 01:04:11.000 --> 01:04:16.000 if 'left', the last point in the window is excluded from calculations; 01:04:16.000 --> 01:04:39.000 if 'both', no points in the window are excluded; and if 'neither', the first and last points in the window are excluded from calculations. So if we don't provide anything, if we get rid of this argument altogether, we can see that the current row is included in the average. We don't 01:04:39.000 --> 01:04:41.000 want that: if we want to make predictions, we can't include the observation itself 01:04:41.000 --> 01:04:52.000 in making the predictions. So what we want, with the window looking at the start of it, rows 01:04:52.000 --> 01:04:58.000 0, 1, and 2, is to exclude the last point.
01:04:58.000 --> 01:05:04.000 So that's why we want closed equals left. 01:05:04.000 --> 01:05:12.000 And this was experimenting; this is what we want. When we have closed equals left, it means exclude the last point in the window. 01:05:12.000 --> 01:05:22.000 So these first 3 rows are our first window here, and we're saying: do not include the third entry of this window. That's how this value ends up being the average of the 3 previous rows, and the next one is the average of its 3 previous rows, and so forth. 01:05:22.000 --> 01:05:32.000 Does that make sense? 01:05:32.000 --> 01:05:34.000 Gotcha! 01:05:34.000 --> 01:05:37.000 Yeah, thanks for asking. 01:05:37.000 --> 01:05:45.000 Are there any other questions about this? It's a good thing we asked, 'cause it helped clear up some confusion. 01:05:45.000 --> 01:05:55.000 If we're getting an error from the closed equals left, should we just expect that that's because we're using an old version or something? 01:05:55.000 --> 01:05:59.000 Yes, that's probably what you're experiencing. So yeah, that would be my guess. 01:05:59.000 --> 01:06:05.000 Okay, cool. Sounds good. 01:06:05.000 --> 01:06:09.000 Awesome. Okay? So then we can use this to make predictions. 01:06:09.000 --> 01:06:14.000 Did I do that already? Yes, I've done it here. 01:06:14.000 --> 01:06:23.000 So these are the predictions. What we can see here is that the green dotted line is the fit on the training data; 01:06:23.000 --> 01:06:27.000 you can see it's delayed because of the rolling average. 01:06:27.000 --> 01:06:37.000 And then the red line with the points is the prediction, the forecast on the test set. You might think: oh, well, that's weird, there are gaps here. 01:06:37.000 --> 01:06:53.000 Remember that we don't trade on the weekend, so these gaps are the weekends; there are no trading days in the data there. Remember, on the horizontal axis we plotted the dates, not the time step, 01:06:53.000 --> 01:07:01.000 so not time step 1, time step 2, etc. 01:07:01.000 --> 01:07:20.000 So, in addition to making models, you can use the moving average to try and get a sense of the trend of your data or the seasonality of your data. What we're seeing here is the same exact data set minus the test set, and we're plotting the moving average 01:07:20.000 --> 01:07:25.000 on top of the data. Maybe it is slightly hard to see, 01:07:25.000 --> 01:07:31.000 so I apologize for that, but as you increase the window size, you can start to reveal some of the trends of the data. Here the window sizes may be too small, 01:07:31.000 --> 01:07:38.000 and you're basically just fitting the data exactly, but as you increase the window size you start to smooth out the bumpiness of the original time series, 01:07:38.000 --> 01:07:51.000 and you start to see things like this seemingly exponential increase in the trend, at least over this part of the time series. 01:07:51.000 --> 01:08:12.000 So one way to look for patterns in the data that may be hard to see in a plot of the raw series is to plot moving averages with differing window sizes, to see if you can get a sense of the trend or the patterns in the data.
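Here is a compact sketch of the moving average pieces just discussed: the rolling mean with closed='left' for the in-sample fit, a couple of other rolling statistics, and the flat forecast equal to the average of the last k training values. The series values are made up, and closed='left' for fixed-size windows needs a reasonably recent pandas (roughly 1.2 or later).

import numpy as np
import pandas as pd

# toy stand-in for the training closing-price column
closing_price = pd.Series([10.0, 11.0, 12.0, 11.5, 13.0, 14.0, 13.5, 15.0])

k = 3  # window size

# in-sample moving average: each row is the mean of the k rows strictly before it,
# which is what closed="left" gives us (the current observation is excluded)
fitted = closing_price.rolling(window=k, closed="left").mean()

# rolling works with other statistics too
rolling_median = closing_price.rolling(window=k, closed="left").median()
rolling_std = closing_price.rolling(window=k, closed="left").std()

# forecast: once we run out of observed data, every future step gets the same value,
# the average of the last k training observations
horizon = 5
forecast = closing_price.iloc[-k:].mean() * np.ones(horizon)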
01:08:12.000 --> 01:08:29.000 So in addition to moving averages, you have more general weighted average forecasts. A moving average puts an equal weight on every observation in the window, whereas in general the weights could be anything, as long as they add up to one. So maybe you have reason to believe that the most recent 01:08:29.000 --> 01:08:33.000 observation is more important than the previous 2 observations. 01:08:33.000 --> 01:08:47.000 This is an example of a weighted average where I'm giving a weight of 2/3 to the most recent observation and weights of 1/6 to the second and third 01:08:47.000 --> 01:08:51.000 most recent observations. So this would be a weighted average forecast, 01:08:51.000 --> 01:08:56.000 and I'm going to code this up as a function 01:08:56.000 --> 01:09:16.000 and remind myself what I was trying to do when I wrote this. 01:09:16.000 --> 01:09:24.000 Okay, I think I remember now. Alright, so what I was trying to do was: t is gonna be the position, 01:09:24.000 --> 01:09:43.000 so I'm gonna return 2/3 times goog_train.closing_price.values at t minus 1, 01:09:43.000 --> 01:09:52.000 plus 1/6 times 01:09:52.000 --> 01:10:04.000 goog_train.closing_price.values at t minus 2, and then plus the last one, 1/6 times 01:10:04.000 --> 01:10:15.000 goog_train.closing_price.values at t minus 3. Okay, so this is just taking this very particular weighted average 01:10:15.000 --> 01:10:27.000 and writing it as a function. 01:10:27.000 --> 01:10:28.000 You did dot value instead of dot values. 01:10:28.000 --> 01:10:31.000 Oh, jeez, what did I do? Yeah, yup, okay. 01:10:31.000 --> 01:10:35.000 There we go. And so this is what that particular forecast looks like. 01:10:35.000 --> 01:10:45.000 It doesn't look very different from the moving average one, in my opinion, but this is just introducing the idea that you don't need to have equal weights on your moving average: 01:10:45.000 --> 01:10:51.000 you could have a weighted average where you play around with the different weights. 01:10:51.000 --> 01:10:59.000 So the underlying statistical model for any of these models that we're looking at 01:10:59.000 --> 01:11:16.000 up to this point is known as an MA(q) model. If we let epsilon sub t be a white noise sequence, then a moving average stochastic process of order q, MA(q), is given by y sub t equals beta 0 epsilon sub t, plus beta 1 epsilon sub t 01:11:16.000 --> 01:11:23.000 minus 1, plus dot dot dot, plus beta q epsilon sub t minus q. This might seem weird, like, what am I talking about? 01:11:23.000 --> 01:11:37.000 This is gonna come back when we talk about what are known as ARIMA models and ARMA models tomorrow. So keep this under your cap until tomorrow; 01:11:37.000 --> 01:11:40.000 it will come back. 01:11:40.000 --> 01:11:45.000 So Lara is asking: how do you go about figuring out what weights to apply? 01:11:45.000 --> 01:11:46.000 In general you probably aren't gonna try fiddling around with figuring out
01:11:46.000 --> 01:12:04.000 the weights on your own. You would do something like what we're going to learn next, called smoothing, where you could set up a cross-validation to figure out the coefficients. If that doesn't make 01:12:04.000 --> 01:12:20.000 sense, we haven't covered smoothing yet, but in a few minutes we will. Or you could fit what's known as an MA(q) model, in which case there's an algorithm that runs in the 01:12:20.000 --> 01:12:27.000 background when you call the fit. We're not covering it in this notebook because we're gonna learn the more general thing tomorrow, but you can fit that, 01:12:27.000 --> 01:12:31.000 and then the fitting algorithm figures out the optimal weights for you, 01:12:31.000 --> 01:12:49.000 so you don't have to code it up and keep track of: okay, what's the weight on this one, the weight on that one, and the weight on the third one. 01:12:49.000 --> 01:12:58.000 Okay. So the last set of models that we're going to look at in this notebook are known as exponential smoothing models. There are going to be 3, 01:12:58.000 --> 01:13:10.000 and the names are going to be somewhat funny, because it's basically going to be regular exponential smoothing, double exponential smoothing, and triple exponential smoothing. 01:13:10.000 --> 01:13:22.000 It'll make sense what that means as we see the models, at least I hope it will. So we're gonna bring back the hat notation that we've seen in earlier notebooks, where y hat sub t is the prediction or forecast at time 01:13:22.000 --> 01:13:27.000 t. The first one we're looking at is simple exponential smoothing. 01:13:27.000 --> 01:13:40.000 For this, you say that the prediction at time t plus 1 is alpha times the observed value at time t, plus 1 minus alpha times 01:13:40.000 --> 01:13:50.000 the prediction at time t. That's if you're in the training set; when you're not in the training set, 01:13:50.000 --> 01:13:59.000 you do alpha times the last observed value plus 1 minus alpha times the prediction for that last observed value. 01:13:59.000 --> 01:14:08.000 Alpha here is another example of a hyperparameter. It's set between 0 and 1; you can select it by hand, 01:14:08.000 --> 01:14:16.000 or you can find it through some kind of algorithm. In terms of the way we're going to implement it here, 01:14:16.000 --> 01:14:31.000 there's a method called maximum likelihood estimation that can be run in the background to find a value for alpha, or you can do cross-validation: set up a grid of alpha values and find the one that gives you 01:14:31.000 --> 01:14:32.000 the best MSE. So there are 2 ways that I think are helpful to think about
01:14:32.000 --> 01:14:42.000 this model. The first is to see it as sort of an adjustment of a naive forecast. 01:14:42.000 --> 01:14:50.000 If you do a little rearranging, you can turn simple exponential smoothing into the following: the estimate for the next time 01:14:50.000 --> 01:14:55.000 step is equal to the estimate at the current time step plus alpha times 01:14:55.000 --> 01:15:09.000 y sub t minus y hat sub t. One way to think about this is: we've got the current estimate plus alpha times the residual, sort of an error term. 01:15:09.000 --> 01:15:13.000 If we squint, this e sub t is sort of an estimate of the epsilon from the original naive model, 01:15:13.000 --> 01:15:28.000 so this is sort of a play on the naive model. Another way we might want to think about it (I keep saying "sort of" a lot) is as a weighted average that we're going to optimize, 01:15:28.000 --> 01:15:39.000 and the optimization can happen either through the maximum likelihood that I talked about or through the cross-validation that I mentioned as well. 01:15:39.000 --> 01:15:48.000 If you rewrite all of this out, all the way from the beginning of the training set, the prediction at time 01:15:48.000 --> 01:16:01.000 t plus 1 is alpha times y sub t, plus alpha times (1 minus alpha) times y sub t minus 1, plus alpha times (1 minus alpha) squared times y sub t minus 2, and so forth. 01:16:01.000 --> 01:16:06.000 You can keep doing this; it is a weighted sum that includes all prior information. 01:16:06.000 --> 01:16:11.000 All of the previous observations are included in this sum, and then you can play around with the value of alpha. 01:16:11.000 --> 01:16:36.000 So with a lower value of alpha, you pay more attention to the most recent observations... or did I get that backwards? Let's see: if alpha is big, the current observation has a high weight. 01:16:36.000 --> 01:16:50.000 So the bigger alpha is, the more you pay attention to the most recent observation, and the smaller alpha is, the less you pay attention to the most recent. Okay, alright. 01:16:50.000 --> 01:16:54.000 So I see I have a question. 01:16:54.000 --> 01:16:59.000 Jonathan's asking: I've noticed these models tend to stop after n, just reporting 01:16:59.000 --> 01:17:19.000 the last result; is that normal, or would you let the model predict the later periods in the test window based on the predictions earlier in that window? So you do not use the predictions. These models always stop at the most recent observation; you don't use the predictions that were made. 01:17:19.000 --> 01:17:22.000 So, let's say again the training set's size is n: 01:17:22.000 --> 01:17:26.000 you wouldn't use the prediction at n plus 1 01:17:26.000 --> 01:17:36.000 to then predict n plus 2. With these models you're only ever going to use the last observation, 01:17:36.000 --> 01:17:37.000 only use the training set to make predictions on test sets or holdout sets. 01:17:37.000 --> 01:17:44.000 So you're never going to use previous predictions to then make future predictions, if that makes sense.
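To make the simple exponential smoothing recursion concrete, here is a small hand-rolled sketch of the fitted values and of why the out-of-sample forecast stays flat. It initializes the first fitted value to the first observation, which is one common convention; that initialization and the names are assumptions, and statsmodels handles all of this for us below.

import numpy as np

def simple_exp_smoothing(y, alpha, horizon):
    """Hand-rolled SES sketch: y is a 1-D array of training values, alpha is in (0, 1)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.empty(len(y))
    y_hat[0] = y[0]  # assumed initialization: first fitted value = first observation
    for t in range(len(y) - 1):
        # prediction for time t+1 mixes the observed value and the previous prediction
        y_hat[t + 1] = alpha * y[t] + (1 - alpha) * y_hat[t]
    # beyond the training set the forecast is flat:
    # alpha * (last observed value) + (1 - alpha) * (prediction for that last value)
    last = alpha * y[-1] + (1 - alpha) * y_hat[-1]
    return y_hat, np.full(horizon, last)

fitted, forecast = simple_exp_smoothing([100, 102, 101, 105, 107], alpha=0.7, horizon=3)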
01:17:44.000 --> 01:17:59.000 So one way that this gets adjusted for things like trends is they add a trend term, like we talked about, and the next exponential smoothing model is for data with a trend. 01:17:59.000 --> 01:18:13.000 And then, similarly, for data with seasonality there's also another extension. So that's how these models will work. 01:18:13.000 --> 01:18:21.000 Are there other questions? 01:18:21.000 --> 01:18:25.000 I had a question about the smoothing. 01:18:25.000 --> 01:18:39.000 So it seems like, if you scroll up a little bit... 01:18:39.000 --> 01:18:40.000 Yup! 01:18:40.000 --> 01:19:00.000 ...even on the data that we've collected, in some sense the prediction for y 2 might no longer be y 2, right? Is it that we're just kind of inducting, like: okay, for y 1 you're predicting y 1, and then, based on your parameter alpha, you're adjusting how much you predict for y 2, and so on? Or is it that you have some model that gives you predictions, and you're smoothing them out after you've developed some other model? 01:19:00.000 --> 01:19:01.000 So all of the predictions are made using this function here. 01:19:01.000 --> 01:19:15.000 y hat 1, I think, is just y 1; it's just the starting point. But then y hat 2 would be 01:19:15.000 --> 01:19:21.000 alpha times y 1 plus 1 minus alpha times 01:19:21.000 --> 01:19:26.000 y hat 1, which would be the... 01:19:26.000 --> 01:19:30.000 oh, I think I may have just forgotten to write it down. 01:19:30.000 --> 01:19:35.000 I think there's usually an assumption of a y 0 that you don't observe, 01:19:35.000 --> 01:19:40.000 if that makes sense. Yeah, but 01:19:40.000 --> 01:19:45.000 you're not using only the actual values in this particular model, whereas in the baseline models we just used the actual values. 01:19:45.000 --> 01:20:01.000 But remember, we don't care about how good it is at fitting the actual training set; what we care about is our ability to forecast. 01:20:01.000 --> 01:20:10.000 And then Zach's asking: why are stocks treated as having equal-length time steps when there are weekends and holidays without trading? 01:20:10.000 --> 01:20:12.000 Is this a simplification we make for illustrative purposes, 01:20:12.000 --> 01:20:21.000 or is it a common assumption? So in these notebooks I'm just making it because I needed a time series to demonstrate the methods, 01:20:21.000 --> 01:20:25.000 and this was one that I thought people might be interested in seeing. 01:20:25.000 --> 01:20:35.000 So that's why I use it. I've never worked on actually trying to predict stock markets, so I don't know if this is a common assumption, because I do know things could happen with the company over the weekend 01:20:35.000 --> 01:20:44.000 that maybe would impact the price starting on the next trading day. 01:20:44.000 --> 01:20:54.000 But in these sorts of models you could maybe assume that that's absorbed into the random noise, 01:20:54.000 --> 01:21:06.000 if that makes sense.
So I'm not sure what's done in practice, because I've never worked with these types of problems with stock markets; in these notebooks it's just an assumption made for the models, 01:21:06.000 --> 01:21:15.000 and it was just a data set I thought might be interesting to people. 01:21:15.000 --> 01:21:16.000 Okay, so how do we fit simple exponential smoothing in Python? 01:21:16.000 --> 01:21:23.000 We use the statsmodels package. 01:21:23.000 --> 01:21:29.000 Here's a link to their documentation. This may or may not be installed on your computer, 01:21:29.000 --> 01:21:34.000 so to check that you have it installed, you can import statsmodels; it's standard to import it as sm. 01:21:35.000 --> 01:21:48.000 Just like what happened earlier, when somebody asked a question: oh, this part of pandas isn't working, 01:21:48.000 --> 01:22:01.000 is it because my version is different? If what I code up doesn't work on yours, check the version that you have; it's probably because the versions are slightly different. 01:22:01.000 --> 01:22:02.000 That's usually gonna be the answer. 01:22:02.000 --> 01:22:19.000 Well, I guess the most common answer will be that there's a small typo in what you typed; the second most common answer for a difference between your code and my code is that our versions are different. So always consult the documentation for your particular version. What I'm writing I think should work, 01:22:19.000 --> 01:22:26.000 but sometimes statsmodels will make slight changes that change the way you fit the model. 01:22:26.000 --> 01:22:29.000 You can find instructions for installing statsmodels with pip and conda 01:22:29.000 --> 01:22:34.000 here, and I believe, after last week, my hope is that everybody has practice installing a Python package. 01:22:34.000 --> 01:22:43.000 If not, you can check out the instructions on our website; 01:22:43.000 --> 01:22:48.000 I believe it's somewhere under first steps on the data science website. 01:22:48.000 --> 01:22:57.000 So we're gonna import the model directly. There is a simple exponential smoothing model type in statsmodels 01:22:57.000 --> 01:22:59.000 that we can find here. So we would do: 01:22:59.000 --> 01:23:04.000 this is part of the tsa, which is time series analysis, 01:23:04.000 --> 01:23:15.000 API in statsmodels. So from statsmodels.tsa.api we're going to import SimpleExpSmoothing, with a lowercase x and p 01:23:15.000 --> 01:23:21.000 and Smoothing with a capital S. So then, how do we fit this type of model? 01:23:21.000 --> 01:23:28.000 It is slightly different from the model fitting process in sklearn, because it's a different package. 01:23:28.000 --> 01:23:37.000 The first thing we're gonna do is call SimpleExpSmoothing, 01:23:37.000 --> 01:23:50.000 and you're going to input the training data, so goog_train.closing_price. Then I'm going to call .fit, and inside fit 01:23:50.000 --> 01:23:59.000 you put the alpha, which in the statsmodels code is called smoothing_level. 01:23:59.000 --> 01:24:02.000 For illustrative purposes
I'm just gonna choose, 01:24:02.000 --> 01:24:06.000 let's say, point 5... actually, let's do point 7. No reason; 01:24:06.000 --> 01:24:15.000 I just decided point 7 on a whim. And then the next thing we're gonna do is put optimized equals False. 01:24:15.000 --> 01:24:23.000 When optimized is equal to True, it's not going to take your input for the smoothing level and just use it; 01:24:23.000 --> 01:24:34.000 it will use it as an initial guess at alpha and then find the one it wants to use for the final model, using the method of maximum likelihood, 01:24:34.000 --> 01:24:42.000 I believe. So if we want a set value of alpha, we have to set optimized equal to False. 01:24:42.000 --> 01:25:00.000 Okay. In our particular case of predictive modeling, instead of using maximum likelihood, which will best fit the training data, we would probably want to do a cross-validation to go through and try different values of alpha, probably evenly spaced on a 01:25:00.000 --> 01:25:02.000 grid. So now that you have the fitted model... 01:25:02.000 --> 01:25:14.000 we have a fitted model here. When we want to get the fitted values, and maybe I'll demonstrate this as its own code chunk, 01:25:14.000 --> 01:25:19.000 you would call the variable that you stored the model in, and then you can do 01:25:19.000 --> 01:25:22.000 dot fitted... 01:25:22.000 --> 01:25:31.000 no underscore: dot fittedvalues. And what these are 01:25:31.000 --> 01:25:55.000 are the fitted values for the training set. Then, to get the forecast, you take the variable that you stored the model in, call dot forecast, and input a number that is how far out you'd like to forecast. So if I just want to forecast 1 day, I put 1; if I want to forecast 2 days into 01:25:55.000 --> 01:26:02.000 the future, I would put 2. I want to forecast the length of the test set, which is what I'm going to do in this version, 01:26:02.000 --> 01:26:12.000 so I would do the length of goog_test. Okay? And you can notice the forecast is exactly the same no matter how far out in the future I go, 01:26:12.000 --> 01:26:17.000 and that should be expected based on the form of the model. 01:26:17.000 --> 01:26:28.000 Okay, so here's what this forecast looks like. The green dotted line is the fitted values, and the red line with the circles is the forecast. 01:26:28.000 --> 01:26:35.000 And this label shouldn't say weighted average; it should say 01:26:35.000 --> 01:26:43.000 simple exponential smoothing. Okay? And then, as I said, we could use cross-validation or a validation set to find the best alpha. 01:26:43.000 --> 01:27:05.000 So I'm gonna end it there for today, after answering any questions. Tomorrow we'll pick up with double exponential smoothing and triple exponential smoothing and then finish out as far as we can in the timeline. You may notice that I 01:27:05.000 --> 01:27:06.000 have sort of a next steps notebook, and then 2 additional notebooks 01:27:06.000 --> 01:27:14.000 after that. Notebooks 9 and 10 we're not gonna cover in live lecture. 01:27:14.000 --> 01:27:17.000 Notebook 9 builds on some stuff that we'll learn in classification, 01:27:17.000 --> 01:27:25.000 and notebook 10 is sort of a one-off thing for people interested in the Prophet model, which is popular in industry. 01:27:25.000 --> 01:27:26.000 Yeah.
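Putting the statsmodels steps just shown in one place, a sketch along these lines should reproduce the workflow. The goog_train / goog_test names and the toy series are assumptions, and exact defaults and warnings can shift between statsmodels versions.

import numpy as np
import pandas as pd
from statsmodels.tsa.api import SimpleExpSmoothing

# toy stand-in for goog_train.closing_price; a random-walk-like series
goog_train_closing = pd.Series(100 + np.cumsum(np.random.default_rng(0).normal(0, 1, 200)))
test_length = 14  # stand-in for len(goog_test)

# fix alpha at 0.7; optimized=False keeps statsmodels from re-estimating it via maximum likelihood
ses_fit = SimpleExpSmoothing(goog_train_closing).fit(smoothing_level=0.7, optimized=False)

fitted_values = ses_fit.fittedvalues      # in-sample fit (note: no underscore in the attribute name)
forecast = ses_fit.forecast(test_length)  # flat forecast, one value repeated test_length times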
But on that note, let's answer some questions and then sign off for today. 01:27:26.000 --> 01:27:30.000 So Jonathan's asking: will this class cover panel data models? 01:27:30.000 --> 01:27:42.000 So, no, and I don't know what panel data refers to; 01:27:42.000 --> 01:27:52.000 I've never heard the term before. So no, we're not going to cover that, unless maybe it does mean something that we are going to cover and I've just never heard the term. 01:27:52.000 --> 01:27:55.000 But I don't think we'll be covering panel data. 01:27:55.000 --> 01:28:03.000 Are there any other questions that people want to ask before I stop recording? 01:28:03.000 --> 01:28:18.000 Sorry, I think you might have answered this earlier, but if you have other features, you know, in addition to just the time, is that something that's gonna be covered? 01:28:18.000 --> 01:28:22.000 I think you mentioned it earlier with Erica's question, but... 01:28:22.000 --> 01:28:27.000 Yeah, so we're not gonna cover those particular models in the live lectures. 01:28:27.000 --> 01:28:33.000 I believe in the practice problems for time series I do cover how to fit such a model. 01:28:33.000 --> 01:28:57.000 It is in the statsmodels documentation, so you can see it there as well; there are models that do that, we're just not going to cover them in the live lectures. And I think notebook number 9 in here covers a model that can accommodate additional features as well. 01:28:57.000 --> 01:28:58.000 Yeah. 01:28:58.000 --> 01:29:03.000 Okay. Thank you. 01:29:03.000 --> 01:29:07.000 Any other questions? 01:29:07.000 --> 01:29:08.000 Yeah. 01:29:08.000 --> 01:29:14.000 Yes, I had a question. So in these examples, you're basically using a 14-day horizon. 01:29:14.000 --> 01:29:15.000 Yup! 01:29:15.000 --> 01:29:21.000 But you're doing np.ones, and you're kind of just taking the one value and multiplying it and applying it to all of the 14 values. 01:29:21.000 --> 01:29:34.000 So is there ever a time when you can actually have different values across that 14-day horizon? 01:29:34.000 --> 01:29:47.000 Yeah. So the only time you're gonna see different values across the forecast in these notebooks is gonna be models with trend or seasonal components. 01:29:47.000 --> 01:30:05.000 So the 2 trend models in the baselines and the 2 seasonal models have different values for each part of the forecast. In this next 01:30:05.000 --> 01:30:17.000 notebook, when we do double exponential and triple exponential smoothing, they allow for different values for each step of the forecast, and then we are going to learn a model called ARIMA, which will have different values depending on the options 01:30:17.000 --> 01:30:21.000 you set. But the ones that we've seen so far are flat, because we can only use the training data to make the 01:30:21.000 --> 01:30:33.000 forecast; they're fixed, using the last however many observations. 01:30:33.000 --> 01:30:34.000 Yeah. Thanks. 01:30:34.000 --> 01:30:36.000 Yup!
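For reference, the pattern being asked about, spreading one flat prediction across the 14-day horizon, looks roughly like this (the values and names are assumptions, not the notebook's exact code):

import numpy as np

# stand-in for the last k training observations, e.g. goog_train.closing_price.values[-k:]
last_k = np.array([132.5, 131.0, 134.2])
horizon = 14  # length of the test window

# baseline models without trend or seasonality repeat one number across the whole horizon
flat_forecast = last_k.mean() * np.ones(horizon)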
01:30:36.000 --> 01:30:38.000 Okay, so I'm going to go ahead and stop recording, 01:30:38.000 --> 01:30:40.000 but I will stick around for any extra questions. 01:30:40.000 --> 01:30:43.000 If you have to go, that's fine, and I'll see you tomorrow.