Stationarity and Autocorrelation Video Lecture Transcript

This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video, we're going to talk about stationarity and autocorrelation. We're taking a brief aside from building forecasts and models because these are important concepts for the next forecast type that we're going to learn. So in this notebook we'll define the concepts of both strict and weak stationarity (and weak stationarity we're just going to call stationarity). We'll learn about autocorrelation, we'll plot the autocorrelation at different lags using statsmodels, and then at the end we'll learn about differencing, which will play an important role in the next forecast type as well.

So strict stationarity, or stationarity in general, is a statistical property of a time series that you may need in order for a forecasting method to work. We say that a time series Y_t is strictly stationary if the joint probability distribution of Y_{t_1}, Y_{t_2}, ..., Y_{t_n} is equal to that of Y_{t_1 + tau}, Y_{t_2 + tau}, ..., Y_{t_n + tau}. Importantly, this has to hold for any values of t_1 through t_n, for any value of tau, and for any value of n. The times t_1 through t_n are in increasing order, but they don't have to be consecutive; it doesn't have to be t_1 equal to 1, t_2 equal to 2, and so forth. Importantly, this implies that the joint distribution can only depend upon the intervals between the t's.

In particular, if n is 1, this means that the expected value of any random variable in the series has to be equal to some mu, independent of time, and they all have to have the same variance, with neither the mean nor the variance being a function of t. If n is equal to 2, this is saying that the joint distribution of the two random variables only depends upon the difference in time between t_2 and t_1. This difference is known as the lag between t_1 and t_2. So, for instance, the lag from time 1 to time 2 is 1, the lag from time 1 to time 3 is 2, and so forth.

This notion of strict stationarity is usually just a theoretical notion in practice, because it is a very restrictive quality for a time series to have. So oftentimes we'll define a weaker sense, which in general is known as weak stationarity. For us, we're just going to call it stationarity, because it's the one we're going to mean when we say the word stationarity. We say that a time series is stationary if the expected value of any random variable Y_t in the series is equal to mu and the covariance between Y_t and Y_{t + tau} is only a function of the lag tau. These are the two things you need in order for a time series to be stationary. This is not always easy to verify; it's kind of hard to show that the covariance is just a function of tau. But we'll see some ways that in practice you can get a sense for whether or not something is stationary enough for you to use a method.

So, some examples of stationary (again, weakly stationary) time series. A white noise series is stationary: going through the definition, we assumed that the mean is 0 and that the covariance between distinct time points is also 0, because we're assuming the observations are independent and identically distributed.
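To make that informal check concrete, here is a minimal sketch, not taken from the lecture notebook, that simulates a white noise series and a random walk with numpy and pandas and compares their rolling means and variances; the seed, series names, and window length are arbitrary illustrative choices. A series that is stationary enough should show roughly constant rolling statistics, while the random walk's wander.

```python
# A minimal sketch (assumed names, not from the lecture notebook): informally
# eyeballing weak stationarity with rolling means and variances.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=216)   # arbitrary seed
n = 500

# White noise: iid normal draws, so the mean and covariance structure do not change with t.
white_noise = pd.Series(rng.normal(loc=0, scale=1, size=n))

# Random walk: cumulative sum of the white noise, so its variance grows with t (not stationary).
random_walk = white_noise.cumsum()

window = 50  # arbitrary window length for the rolling summaries
summary = pd.DataFrame({
    "wn_rolling_mean": white_noise.rolling(window).mean(),
    "wn_rolling_var": white_noise.rolling(window).var(),
    "rw_rolling_mean": random_walk.rolling(window).mean(),
    "rw_rolling_var": random_walk.rolling(window).var(),
})

# The white noise columns hover near 0 and 1; the random walk columns drift widely.
print(summary.dropna().describe().round(2))
```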
The first differences of a random walk are also stationary. The first difference is taking the observation at time t + 1 and subtracting off the observation at time t, and this is assuming the random walk model, where Y_{t+1} is Y_t plus some random noise. A moving average process, which we talked about in the last notebook, is also stationary.

In the next forecasting notebook, the one we're going to do after this one in the time series series of notebooks, it is important for your time series data to be stationary. So how can we determine that? One way is a visual guide: we can plot what is known as the autocorrelation. The autocorrelation of a time series is essentially the correlation we know from regression, but for a time series; auto means self, so autocorrelation is the correlation of the series with itself. Importantly, this is a function of the lag: you look at the correlation of the time series with its lagged self. So for different lags k, the autocorrelation at lag k, denoted r_k, is given by the sum of the time series values minus the average, times the time series values k points in the future minus the average, divided by the sum of squared deviations of the time series from its average. Here N is the last observation that you have in your set, and the overline denotes the arithmetic mean.

Since this is a function of the lag, we can plot it as we change the lag: on the vertical axis we plot the autocorrelation, and on the horizontal axis we plot the lag. When we do this, the plot is called the correlogram. (I always have a hard time saying this word.) This is one way to visually probe whether our data series clearly violates stationarity. It isn't a proof that your data series is stationary, but you can look at your correlogram to see if it is violating stationarity very clearly.

So here is an example with white noise. What do I mean by an example? We're going to use statsmodels' plot_acf; ACF stands for autocorrelation function. We'll use statsmodels' plot autocorrelation function to make some plots, which gives you a sense of what you should be looking for, along with examples of things that violate stationarity. So here is an example that we know from above is stationary, so I'm just going to make a white noise series.
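As a sketch of that step, the snippet below computes the lag-k autocorrelation by hand, following the formula described above, and then draws the correlogram with statsmodels' plot_acf; the helper name acf_at_lag, the seed, and the number of lags shown are illustrative assumptions rather than the lecture's exact code.

```python
# A rough sketch of the correlogram step, assuming numpy, matplotlib, and statsmodels
# are installed; the helper name acf_at_lag and the lag count are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(seed=216)
white_noise = rng.normal(size=500)

def acf_at_lag(y, k):
    """Sample autocorrelation at lag k:
    r_k = sum_{t=1}^{N-k} (y_t - ybar)(y_{t+k} - ybar) / sum_{t=1}^{N} (y_t - ybar)^2
    """
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    num = np.sum((y[:-k] - ybar) * (y[k:] - ybar)) if k > 0 else np.sum((y - ybar) ** 2)
    den = np.sum((y - ybar) ** 2)
    return num / den

# Autocorrelations at the first few lags; for white noise these should all be close to 0.
print([round(acf_at_lag(white_noise, k), 3) for k in range(1, 6)])

# plot_acf draws the same quantities as a correlogram, plus an approximate confidence band.
fig, ax = plt.subplots(figsize=(8, 4))
plot_acf(white_noise, lags=20, ax=ax)
plt.show()
```

For a white noise series like this one, almost all of the bars past lag 0 should fall inside the confidence band that plot_acf shades in, which is the kind of picture we expect from a stationary series.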