Is Facebook’s “Prophet” the Time-Series Messiah, or Just a Very Naughty Boy?

A debate rages on page one of Hacker News about the merits of the world’s most downloaded time-series library. Facebook’s Prophet package aims to provide a simple, automated approach to the prediction of a large number of different time series. The package employs an easily interpreted, three-component additive model whose Bayesian posterior is sampled using STAN. In contrast to some other approaches, the user of Prophet might hope for good performance without tweaking a lot of parameters. Instead, hyper-parameters control how likely those parameters are a priori, and the Bayesian sampling tries to sort things out when data arrives.

The funny thing is though, that if you poke around a little you’ll quickly come to the conclusion that few people who have taken the trouble to assess Prophet’s accuracy are gushing about its performance. The article by Hideaki Hayashi is somewhat typical, insofar as it tries to say nice things but struggles. Yahashi notes that out-of-the-box, “Prophet is showing a reasonable seasonal trend unlike auto.arima, even though the absolute values are kind of off from the actual 2007 data.” However, in the same breath, the author observes that telling ARIMA to include a yearly cycle turns the tables. With that hint, ARIMA easily beats prophet in accuracy — at least on the one example he looked at.

I began writing this post because I was working on integrating Prophet into a Python package I call time machines, which is my attempt to remove some ceremony from the use of forecasting packages. These power some bots that the prediction network (explained at www.microprediction.com if you are interested). How could I not include the most popular time series package?

  • We call m.fit(df) after each and every data point arrives, where m is a previously instantiated Prophet model. There is no alternative, as there is no notion of “advancing” a Prophet model without refit.
  • We make a “future dataframe” called forecast say, that has k extra rows, holding the times when we want predictions to be made and also known-in-advance exogenous variables.
  • We call m.predict(forecast) to populate the term structure of predictions and confidence intervals.
  • We call m.plot(forecast) and voila!

Perhaps we start by looking at some of the more bold Prophet predictions.

Now, having shown you in-sample data, let’s look at some examples with the truth revealed. You’ll see that some of those wagers made by Prophet do pay out. For example, here’s Prophet predicting the daily cycle of activity in bike sharing stations close to New York City hospitals. It does a nice job of anticipating the dropoff, don’t you think?

  1. Construct an upper bound by adding m standard deviations to the highest data point, plus a constant. Similarly for a lower bound.
  2. If Prophet’s prediction is outside these bounds, use an average of the last three data points instead.

I have begun a more systematic assessment of Prophet, as well as tweaks to the same. As with this post, I’m using a number of different real world time series and analyzing different forecast horizons. The Elo ratings seem to be indicative of Prophet’s poor performance — though I’ll give them more time to bake. However, unless things change my conclusions are:

  • In keeping with some of the cited work, I find that Prophet is beaten by exponential moving averages at every horizon thus far (ranging from 1 step ahead to 34 steps ahead when trained on 400 historical data points). More worrying, the moving average models don’t calibrate. I simply hard wired two choices of parameter.

Source Prolead brokers usa