Facebook Prophet Model Video Lecture Transcript This transcript was automatically generated, so there may be discrepancies between the video and the text. 11:44:37 Hi, everybody! Welcome back. In this video we're going to learn about the Facebook Prophet model. 11:44:42 So let's go ahead and move on over to that 11:44:47 Jupyter notebook. Okay, so the Facebook Prophet model is 11:44:53 a forecasting model 11:44:57 by Facebook, originally published in 2018. Here's a link to the paper, which is in the repository. While it has become, 11:45:03 well, 11:45:07 associated with a very prominent Zillow news story a couple of years ago, maybe 2022 or 2021, 11:45:09 it is still pretty commonly used for forecasting in industry settings, so I thought it was important that you have a chance to be exposed to the model, 11:45:17 get the ideas of how to use it with Python, and then also learn about some of the shortcomings. 11:45:23 So the Facebook Prophet model takes the following additive form. We have our output that we're trying to predict, y of t, and it takes this additive form of g of t plus s of t plus h of t, plus an error epsilon t. So g of t is a function 11:45:42 that looks to capture the trend component of the time series, s of t is a function that looks to capture the seasonal component or components of the time series, and we'll touch on that later. 11:45:54 And h of t is a function of t that looks to capture any holiday effects. 11:45:59 And so we haven't talked about holiday effects, but sometimes a time series can be impacted by holidays, whether it be actual holidays like Christmas or Ramadan, or holidays in the sense of an event that is going on. So, for instance, the Super Bowl or other 11:46:18 sporting events or concerts that are very large can have impacts on your time series as well. 11:46:23 So these things are usually known.
So this is in contrast to something like a global pandemic happening, which isn't known ahead of time. 11:46:29 But these holidays are events in time that are known, that you can program into your model. 11:46:35 So that way it can make adjustments based on the fact that it knows that there's a holiday. 11:46:40 So this formulation of a model is very similar to a class of models known as generalized additive models, or GAMs, as they're sometimes called. Here's a link to the Wikipedia entry on those 11:46:53 if you're interested in learning a little bit more. We're not going to dive into the details of how this particular model is fit, but you can always go to the article that is in the repository to read more about how they're fitting it. 11:47:05 So we're just going to give you a quick rundown on the different aspects of the 3 different functions, 11:47:10 so you get a sense of what's going on with them. 11:47:13 Then we're going to show you how to fit it with the prophet package in Python, and then we're going to tell you a little bit about the pitfalls. 11:47:22 So g of t is the part that allows for the fitting of a trend. 11:47:28 This allows for 2 types of trend function. The first is a saturating growth. 11:47:33 So they're thinking of this in the sense of Facebook maybe trying to model the number of users that are using it. 11:47:39 There is typically a limit on the number of users. The hard upper limit is the number of people on the planet, 11:47:45 but there are things like the number of people who have access to the Internet as well. 11:47:50 So this is a classic saturating growth function, where C gives the limit, the saturated value, 11:47:58 and it is divided by one plus e to the negative k times t minus m. The other one that you can use is a linear trend, which is just the line k t plus m.
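As a rough illustration of the two trend shapes just described, here is a minimal sketch in plain Python. The function names and parameter values are made up for illustration; this is not the prophet package's internal code, and it leaves out change points entirely.

```python
import math


def logistic_trend(t, C, k, m):
    """Saturating growth: C / (1 + exp(-k * (t - m)))."""
    return C / (1.0 + math.exp(-k * (t - m)))


def linear_trend(t, k, m):
    """Linear trend: k * t + m."""
    return k * t + m


# Hypothetical values: capacity C = 100, growth rate k = 0.5, midpoint m = 10.
for t in (0, 10, 20, 40):
    saturating = logistic_trend(t, C=100, k=0.5, m=10)
    straight = linear_trend(t, k=2, m=1)
    print(t, round(saturating, 2), straight)
```

In this sketch the saturating curve passes through half of C at t = m and levels off toward C, while the linear trend keeps growing at the constant rate k.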
11:48:10 So the main modification that the Facebook Prophet model makes is it allows these functions to take in what are known as change points. 11:48:19 So change points are points in time at which you expect the growth rate might change. 11:48:23 So, instead of having a very constant linear trend or a very constant growth rate, it is possible that there are outside factors. For instance, with Facebook, maybe it's introducing a new product, or it knows of an entire region of the planet getting wide access to the website; maybe it was banned in a certain country, 11:48:41 but now that ban is being lifted. So this would indicate that, in the case of saturating growth, both the C and the rate itself might be changing, and then in linear growth 11:48:55 maybe the rate is changing. So these change points allow for different rates of growth. 11:49:00 In the linear case, what you're essentially saying is that now, instead of a linear function, you have what's known as a piecewise linear function. 11:49:07 So at different points in time, the rate and the intercept will be changing. 11:49:13 So you can provide what these change points are to the model by hand, or you can allow the fitting algorithm to try and detect when these are on its own. 11:49:25 Hopefully you'll at least get pointed to the documentation, 11:49:28 but I believe we at least tell you what argument to use when defining the model to define those change points. 11:49:38 The s of t is the seasonal aspect of the model, so these are fit with Fourier series that look like this. 11:49:47 You have control over N as well as P. So P is the length of the period. 11:49:54 So these are, for the Prophet model, measured in days. If your period takes place over the course of a week, your P 11:50:01 would be 7. If it takes place over the course of the year, the P
11:50:04 would be 365.25. You can choose N with a hyperparameter tuning process, or you can let the Prophet model choose N for you. The Prophet model also allows for multi-seasonality. 11:50:19 So this is why earlier I said seasonal component or components, with a parenthetical s. 11:50:24 So a lot of time series in the real world do exhibit 11:50:28 not just a single season, but multiple types of season. So this could mean that, 11:50:34 if you have an hourly time series, say the volume of calls to a call center on every hour for every day 11:50:45 for the past however many years, you may see seasonality like this: 11:50:50 maybe you have a very common daily pattern that seems to repeat; on top of that, maybe you have a weekly pattern that seems to repeat; and then, on top of that, maybe you also have a yearly pattern that seems to repeat. 11:51:01 So that is multi-seasonality. The Prophet model is built so that out of the box it has built-in daily, weekly, and yearly seasonality that you can control with different inputs. 11:51:14 Holidays are modeled just by using a series of indicator functions. 11:51:18 So for any date time that you give, it will check whether or not that date time occurs on a holiday. It has preprogrammed holidays, but you can also add new ones if you'd like. 11:51:31 So it will check, is it on this holiday? There's also the ability to put in a range of dates, because sometimes people will change their behavior 11:51:39 leading up to a holiday and following a holiday, and that might be reflected in the time series. So that's the basic rundown of how the model works in theory. 11:51:50 So how do we do it in practice? We're going to be fitting the Prophet model in Python using the prophet package. 11:51:57 So it's very likely that you do not already have this package installed.
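Before getting to the package itself, here is a rough plain-Python sketch of the seasonal and holiday pieces described above. It assumes the standard Fourier form, a sum over n from 1 to N of a_n cos(2 pi n t / P) + b_n sin(2 pi n t / P), since the formula shown in the video is not in this transcript. The function names, coefficients, and dates are hypothetical, and this is not how the prophet package is implemented internally.

```python
import math
from datetime import date


def fourier_seasonality(t, period, a, b):
    """Evaluate sum_n a[n] * cos(2*pi*n*t/P) + b[n] * sin(2*pi*n*t/P).

    t and period are in days; a and b hold the N cosine and sine coefficients.
    """
    total = 0.0
    for n, (a_n, b_n) in enumerate(zip(a, b), start=1):
        angle = 2.0 * math.pi * n * t / period
        total += a_n * math.cos(angle) + b_n * math.sin(angle)
    return total


def holiday_indicator(day, holiday_dates):
    """Return 1 if day is one of the given holiday dates, else 0."""
    return 1 if day in holiday_dates else 0


# Hypothetical coefficients for a weekly pattern (P = 7) with N = 2 terms.
a = [1.0, 0.3]
b = [0.5, -0.2]
print(fourier_seasonality(3, period=7, a=a, b=b))
print(fourier_seasonality(10, period=7, a=a, b=b))  # one full period later

# Hypothetical holiday set with a single date.
holidays = {date(2023, 12, 25)}
print(holiday_indicator(date(2023, 12, 25), holidays))
print(holiday_indicator(date(2023, 12, 26), holidays))
```

A larger N lets the seasonal curve wiggle more within each period, which is why it is treated as something to tune.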
11:52:00 So if you need to, pause the video and try to install it, following their installation instructions as well as the Python installation guide at the Erdos Institute website, and see if you can figure it out. 11:52:14 Once you think you've got it figured out, try and run this code chunk. 11:52:18 It should run. So I had version one when I was running this notebook. 11:52:22 You may have a different version. So we're going to use the Prophet model to try and predict our old standby, 11:52:29 the flu data set. So there's one thing that you have to change. 11:52:34 In the past, when we had time series, we would just have time steps, and then whatever the name of the column was, was fine. With the Prophet 11:52:42 model, you need to have very specific names. You need a ds column, which has the date times of the time steps, and you need the column with what you're trying to predict 11:52:53 as the y column. So we're going to have to do some preprocessing, because if we remember, 11:53:02 this is what our flu time series looks like. We have a date, a year, a week, and cases. 11:53:08 So we need date to be renamed as ds, we need cases to be renamed as y, and we need to get rid of year and week. 11:53:16 So we're going to do flu train, select date and cases, and then rename, setting columns equal to the following. 11:53:28 So 11:53:30 we'll double check if this is the correct order. So ds is date. 11:53:35 Actually, I think it is the other way around. So date is ds, and then 11:53:43 cases will be renamed to y. Then dot copy, and then, before we do it for the test set, let's just check that 11:53:53 we did it right. Flu train dot head. 11:53:58 Okay, great. So now we're going to do the same for the test set. 11:54:01 So control C, paste, and then just rename this to test. 11:54:11 So I'll have to go back. Here we go. So let's see how to fit the out-of-the-box 11:54:17 model, and then we'll talk a little bit about arguments that you can give to make it a slightly not-out-of-the-box model.
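The renaming step described above can be sketched as follows. The variable names flu_train_raw and flu_train and the three rows of data are stand-ins made up for illustration; only the column names date, year, week, cases, ds, and y come from the lecture.

```python
import pandas as pd

# Hypothetical stand-in for the flu training data described above.
flu_train_raw = pd.DataFrame(
    {
        "date": pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18"]),
        "year": [2015, 2015, 2015],
        "week": [1, 2, 3],
        "cases": [120, 150, 90],
    }
)

# Keep only the two columns Prophet needs and rename them to ds and y.
# The mapping goes old name -> new name, so date becomes ds and cases becomes y.
flu_train = (
    flu_train_raw[["date", "cases"]]
    .rename(columns={"date": "ds", "cases": "y"})
    .copy()
)

print(flu_train.head())
```

The same two lines of selection and renaming would be repeated for the test set.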
11:54:26 So we're going to do from lowercase prophet, 11:54:30 we're going to import capital-P Prophet. 11:54:34 And then it's a lot like sklearn. So you make your model object, 11:54:39 and because we're doing the default, it's just Prophet. 11:54:43 And then you do dot fit. 11:54:49 A slight difference: we don't do X comma y, we just fit with the training set. 11:54:51 Okay. So you'll see these 2 things. So these are 11:54:59 just warnings. The first is an info message. It detected that it didn't have data to do daily seasonality, 11:55:04 so it just turned that off. And then here, this is a warning that they're doing something with pandas in the background 11:55:12 that is 11:55:21 related to the version of pandas that I have. So that's what this is. 11:55:22 This is also just telling you that the model has been fit, so we can get the forecast with m 11:55:26 dot predict. Notice that we don't have to put in any input 11:55:32 here. So this is going to give us the fit on the training set. 11:55:33 And then we can look at what that fit produced: a time series data frame with the ds column, and then it gives you the trend. 11:55:42 It gives you what it says the lower and upper bounds on the forecast are. 11:55:50 You've got trend lower and trend upper, so these are the lower and upper bounds on the trend, and we don't cover 11:55:56 how those are gotten. Again, check out the paper. 11:56:00 They talk about how you get those bounds. These are the weekly part of the seasonality and the yearly part of the seasonality. 11:56:09 And then, finally, the forecast that is made. So we could use this data frame to plot the forecast by hand. 11:56:19 Okay. So here the blue is the training data, while the red dotted line is the Prophet fit, and you can see it seems to get the timing quite correct, but it sort of misses out on 11:56:33 accurately capturing the size of the flu season. 11:56:40 Another thing that's nice is that it
has built-in plotting functions. 11:56:44 So you can do m dot plot, and then put in your forecast, and then it will plot it for you. 11:56:52 So these black dots are the actual time series. 11:56:58 The blue solid line is the forecast, and then these blue shaded regions give you the lower and upper bounds of the forecast. 11:57:08 Okay. Another nice feature of this is it has this m dot plot underscore components. 11:57:16 Again you put in your forecast, and then what this does is it will plot the different components. 11:57:20 So you've got your trend, that's g of t. 11:57:24 Then you've got your weekly seasonality, and you can see it's plotted by day of the week, and then you've got your yearly seasonality, which I think, if we look at this and think about the flu season, seems about right. You can then also make a forecast into the future 11:57:42 by providing predict with a ds column of dates. 11:57:46 So for us we would do flu test, and notice that I have to use 2 square brackets because it needs to get a pandas data frame. 11:57:56 A single bracket will give you a pandas Series. 11:57:58 It needs it as a data frame. Okay, so here is the prediction, the forecast on our test set. 11:58:05 Okay, so in general, when you're doing a Prophet-type model for a forecast, you typically want to try your best to cater it to your time series by hand, using the knowledge that you have. You don't want to just follow along with what Prophet is saying is best. So a couple of things we can 11:58:26 do: we can turn off daily seasonality, and we also know there's no weekly seasonality with this data set, 11:58:31 so we can turn that off as well. To do that you just set the daily underscore seasonality argument to False and the weekly underscore seasonality 11:58:40 argument to False. We also know that this data set doesn't really have any trend.
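The fit, predict, and plot steps walked through above can be sketched roughly as follows. This is a sketch under assumptions, not the notebook's code: the helper names make_weekly_frame and fit_and_forecast and the toy data are made up, and the prophet import sits inside the function so the data-preparation part runs even where prophet is not installed. The Prophet calls used (Prophet(...), fit, predict, plot, plot_components) are the ones named in the lecture.

```python
import pandas as pd


def make_weekly_frame(start, periods):
    """Build a toy weekly data frame with the ds and y columns Prophet expects."""
    ds = pd.date_range(start=start, periods=periods, freq="W")
    # Made-up values with a rough yearly bump; not the lecture's flu data.
    y = [150 if (i % 52) < 12 else 100 for i in range(periods)]
    return pd.DataFrame({"ds": ds, "y": y})


def fit_and_forecast(train, test, **prophet_kwargs):
    """Fit Prophet on train; return the model plus training and test forecasts."""
    from prophet import Prophet  # imported lazily; requires the prophet package

    m = Prophet(**prophet_kwargs)
    m.fit(train)                              # one data frame with ds and y
    train_forecast = m.predict()              # no argument: fit on the training dates
    test_forecast = m.predict(test[["ds"]])   # double brackets keep a data frame
    return m, train_forecast, test_forecast


toy = make_weekly_frame("2015-01-04", 208)
train, test = toy.iloc[:156], toy.iloc[156:]

# Usage, assuming prophet is installed:
# m, train_forecast, test_forecast = fit_and_forecast(
#     train, test, daily_seasonality=False, weekly_seasonality=False
# )
# m.plot(test_forecast)
# m.plot_components(train_forecast)
```

Calling fit_and_forecast(train, test) with no extra arguments corresponds to the out-of-the-box model; the keyword arguments in the commented usage correspond to turning off daily and weekly seasonality.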
11:58:46 Certainly not one where there are change points, so we can turn off the change points by setting n underscore changepoints 11:58:53 equal to 0. So here we do that, and then we refit it, and here's what the forecast looks like. 11:59:01 Oh, you know what, I guess I need to include 11:59:07 forecast is now equal to m dot predict. So I forgot that step. 11:59:16 So here's what the new forecast looks like. It's not very different from the original forecast, but that's what this looks like. 11:59:23 So again, what else you can do is include arguments like changepoints, which I believe allows you to give an array or a series of change points, changepoint underscore range, and holidays, which I believe allows you to give a data frame of holidays. Again, I'm not an expert on all the different little details 11:59:44 of the documentation, but here is a link to the documentation where you can try and learn more about the prophet package. 11:59:52 So before wrapping up this notebook, I want to give some information about the pitfalls. 11:59:57 So the authors claim that this model works best for time series that have strong seasonal effects and several seasons of historical data. 12:00:06 So this is a large class of time series data, but not every time series falls under this umbrella. 12:00:13 So it's really important to keep in mind, you know, 12:00:14 does this model work best for the type of time series I'm working with? 12:00:19 If it's a time series without seasonality, then you might not want to use the Prophet model. 12:00:24 And if it's a time series that doesn't have a ton of observations, you may not want to use the Prophet model. 12:00:30 So this Prophet model was associated with a really big forecasting failure for the website 12:00:35 Zillow, which you can read a little bit about here. To be fair to the data scientists that worked 12:00:40 at Zillow at the time:
The problem they were working on had very little room for error, and so I personally think a lot of the models you might use would be subject to the same sort of failures as the Prophet model. 12:00:52 But it just highlights that you really need to understand the limitations of your model and have that in mind when you're working. 12:01:00 Another thing is a really big difference between Prophet and the other forecasting models 12:01:05 we've talked about in the time series section of our notes. Basically, what Prophet is doing is curve fitting, which is different from building some sort of structural dependence upon previous observations. 12:01:19 So it's sort of ignoring the dependence structure and just saying, I'm going to fit a curve to these points 12:01:25 as closely as I can, whereas the other models that we've learned are really inherently, you know, using the statistical structure of the dependence across time. 12:01:38 So that's sort of a different approach. But that means it's not going to handle things like deviations from the curve 12:01:46 it's fit very well. But, you know, a nice thought on this is you could use the Prophet model as sort of a check on whether or not your time series is behaving well, 12:01:56 meaning, is it deviating a lot from what it's done in the past? So that is a nice feature of the Prophet model, which you can find 12:02:03 elaborated upon a little bit at this blog post, 12:02:07 the post that I've linked to. So hey, it's just another thought. 12:02:12 So it's used a lot in industry. It's good to have a knowledge of it, but it's also good to know that, 12:02:16 I think, it's maybe not the best name for a model. 12:02:19 It is not a prophet that will be able to answer every question you want. It's just a tool in your tool chest that you might want to use sometimes and might not want to use other times. 12:02:29 Okay, so that's it for this notebook.
I hope you enjoyed learning about the Facebook Prophet model, and I hope to see you next time.
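For reference, the customization arguments mentioned in this notebook could be collected along these lines. This is only a sketch: the event name big_game, the dates, the window sizes, and the helper build_model are hypothetical, and whether any of these settings suit a given series is a modeling judgment. The holidays data frame uses the holiday and ds columns, with optional lower_window and upper_window columns for the days around each event.

```python
import pandas as pd

# Hypothetical event dates; replace with events relevant to your own series.
holidays = pd.DataFrame(
    {
        "holiday": ["big_game", "big_game"],
        "ds": pd.to_datetime(["2016-02-07", "2017-02-05"]),
        "lower_window": [-1, -1],  # also flag the day before each event
        "upper_window": [1, 1],    # and the day after
    }
)

# Settings discussed in the lecture: no daily or weekly seasonality,
# and no change points in the trend.
prophet_kwargs = {
    "daily_seasonality": False,
    "weekly_seasonality": False,
    "n_changepoints": 0,
    "holidays": holidays,
}


def build_model(kwargs):
    """Create an unfitted Prophet model from keyword arguments."""
    from prophet import Prophet  # imported lazily; requires the prophet package

    return Prophet(**kwargs)


# Usage, assuming prophet is installed and train has ds and y columns:
# m = build_model(prophet_kwargs)
# m.fit(train)
# forecast = m.predict()
```

Remember to call predict again after refitting, which is the step that was briefly forgotten in the video.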