The Anvil

June 28, 2021

How good will your forecast be?

Hi All,

It’s been a long time - I’ve been busy with client work and other non-Forecast Forge projects. This email contains two things:

  1. A case study on forecasting for PR and PPC with BJ Enoch
  2. How to estimate how good your forecast is going to be; this is very important when managing expectations for your bosses and clients.

PR and PPC forecasting with BJ Enoch

You can watch a video of my conversation with BJ at https://www.youtube.com/watch?v=fNKUqX4BSqs

We are within 1.5% of the forecasted values for the month so far. And that gives us a high degree of confidence to use Forecast Forge as our primary method of evaluating spend and conversions and revenue.
— BJ Enoch

BJ Enoch is the director of digital marketing at The CE Shop, which provides career education for realtors. He has also used Forecast Forge in his previous roles as a Director of SEO and as a Digital Director at an agency.

He has been a Forecast Forge customer for a little under six months now and has been using it for a variety of things, from forecasting PR metrics to PPC and everything in between!

We knew that we would have an article featuring us going out in ESPN on [e.g.] Tuesday and we had enough historical data that we were able to forecast how much traffic and how many leads we would get from it. And that would tell us how many staff we'd need to handle the demand.
— BJ Enoch

BJ’s first use for Forecast Forge was for a sports marketing group. He was able to forecast the traffic and lead flow that would be generated from their PR activity.

The client had a long history of getting themselves mentioned in places like ESPN so BJ was able to use this historical data to predict the impact upcoming articles would have. This was important to the client because it enabled them to better prepare staffing levels to handle the increased demand.

To do this for yourself you will need a database of historical press mentions for your client or business so that you can match the dates up with business data like traffic, conversions or revenue. Not all mentions are created equal so some kind of estimate of the “impact” of a press piece (e.g. reach, audience etc.) will help to make your forecast better.
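
For anyone who prefers to see the shape of that data work in code, here is a minimal sketch in Python/pandas of building such a regressor column; the file names, column names and the choice of publication reach as the impact score are illustrative assumptions, not details from BJ’s setup.

    import pandas as pd

    # Hypothetical inputs: one file of daily sessions and one file of press mentions
    # with an estimated "impact" score (here, publication reach in millions).
    sessions = pd.read_csv("daily_sessions.csv", parse_dates=["date"])   # columns: date, sessions
    mentions = pd.read_csv("press_mentions.csv", parse_dates=["date"])   # columns: date, publication, reach_millions

    # Sum the impact of all mentions on each day so the regressor is one number per date
    impact = mentions.groupby("date")["reach_millions"].sum().rename("press_impact")

    # Join onto the full date range and treat days with no coverage as zero impact
    regressor = (
        sessions.set_index("date")
        .join(impact)
        .fillna({"press_impact": 0})
        .reset_index()
    )

    # One row per day with columns date, sessions, press_impact; the press_impact
    # column is what you would paste in as a regressor alongside your dates.
    regressor.to_csv("press_regressor.csv", index=False)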

BJ has also used Forecast Forge to predict the return from PPC campaigns. He started off with a simple forecasting model using the date and amount of media spend to predict the conversion value. The results were not great.

But one of the advantages of using Forecast Forge is that if the forecast looks wrong or seems to be missing something then you can dive right in and start making improvements. BJ started to add more regressors to the forecast and things improved. Things improved a lot.

We had to get a little creative with how we come up with those regressor values. Now our forecasted values are within 1.5% of the actuals.
— BJ Enoch
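
Forecast Forge does all of this inside Google Sheets, but as a rough sketch of the same idea in code, this is what a spend regressor looks like using the open-source Prophet library as a stand-in for the forecasting engine; the file, column names and 90-day horizon are made up for illustration.

    import pandas as pd
    from prophet import Prophet

    # Hypothetical daily history: date, media spend and conversion value
    df = pd.read_csv("ppc_history.csv", parse_dates=["date"])
    df = df.rename(columns={"date": "ds", "conversion_value": "y"})

    # A model that only knows the date...
    m_simple = Prophet()
    m_simple.fit(df[["ds", "y"]])

    # ...versus one that is also given media spend as an extra regressor
    m_spend = Prophet()
    m_spend.add_regressor("spend")
    m_spend.fit(df[["ds", "y", "spend"]])

    # To forecast the future you must also supply spend values for the future dates;
    # here the historical mean is used as a crude placeholder for planned spend
    future = m_spend.make_future_dataframe(periods=90)
    future = future.merge(df[["ds", "spend"]], on="ds", how="left")
    future["spend"] = future["spend"].fillna(df["spend"].mean())

    forecast = m_spend.predict(future)
    print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())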

The regressors BJ used were, for the most part, very specific to the business sector he was forecasting for. One regressor with more general application helped the forecast deal with events that usually happen every year but didn’t in 2020 because of Covid.

I have written previously about forecasting the effect of lockdowns but BJ’s situation was significantly more challenging because different US states have had different lockdown rules at different times.

To solve this problem BJ collated lockdown dates from multiple sources so that he could use a regressor column for lockdown rules in each state. This is a great example of the kind of data work that will add value for years to come and that, once it has been done, can be used in many different places.
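
A minimal sketch of what that collation might produce, assuming you have a table of lockdown start and end dates per state and a rough estimate of each state’s share of the audience (the file, shares and date range below are illustrative, not BJ’s actual data):

    import pandas as pd

    # Hypothetical table with one row per state lockdown period: state, start, end
    lockdowns = pd.read_csv("state_lockdowns.csv", parse_dates=["start", "end"])

    # Rough share of the business's audience in each state (illustrative numbers)
    audience_share = {"CA": 0.20, "TX": 0.15, "NY": 0.12, "FL": 0.10}

    dates = pd.date_range("2020-01-01", "2021-06-30", freq="D")
    regressor = pd.Series(0.0, index=dates, name="share_under_lockdown")

    # For each lockdown period, add that state's audience share to every day it covers
    for row in lockdowns.itertuples():
        regressor.loc[row.start : row.end] += audience_share.get(row.state, 0.0)

    # One value per day: the estimated share of the audience under lockdown rules,
    # ready to paste in as a regressor column next to your dates
    regressor.to_csv("lockdown_regressor.csv", header=True)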

How good is my forecast?

One of the challenging things about using a machine learning system like Forecast Forge is learning how much you can trust the results. Obviously the system isn’t infallible, so it is important to have an idea of where it can go wrong, and how badly, when you are talking through a forecast with your boss or clients.

One way that people familiarise themselves with Forecast Forge is to run a few backtests. A backtest is where you make a forecast for a period that has already happened so that you can compare the forecast against the actual values.

For example you might make a forecast for the first six months of 2021 using data from 2020 and earlier. Then you can compare what the forecast said would happen against the real data.

         Use data from this period      To predict here
  ________________________________________~~~~~~~~~~~~~
  |------------|------------|------------|------------|--------????|?????????
2016         2017         2018         2019         2020       Then use the same
                                                               methodology here
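
If you wanted to run the same kind of backtest in code rather than in the sheet, a minimal sketch using the open-source Prophet library as a stand-in might look like this; the data file is hypothetical and MAPE is just one possible error metric.

    import pandas as pd
    from prophet import Prophet

    # Hypothetical daily history with columns ds (date) and y (e.g. sessions)
    df = pd.read_csv("daily_sessions.csv", parse_dates=["ds"])

    # Pretend it is 1 January 2021: train on everything up to the end of 2020
    train = df[df["ds"] <= "2020-12-31"]
    actuals = df[(df["ds"] > "2020-12-31") & (df["ds"] <= "2021-06-30")]

    model = Prophet()
    model.fit(train)

    # Forecast the first six months of 2021 and compare against what really happened
    future = model.make_future_dataframe(periods=len(actuals))
    forecast = model.predict(future)

    comparison = forecast[["ds", "yhat"]].merge(actuals, on="ds")
    mape = (abs(comparison["y"] - comparison["yhat"]) / comparison["y"]).mean() * 100
    print(f"Backtest MAPE: {mape:.1f}%")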

Sometimes when you do this you will see the forecast do something dumb and you will think “stupid forecast. That obviously can’t be right because we launched the new range then” or something like that - the reason will probably be very specific to your industry or website. If you can convert your reasoning here into a regressor column then you can use this to make the forecast better.

[Image: backtesting several different forecasting methods with different transforms and regressors]

Once you start doing this there is a temptation to keep adding and adjusting regressor columns to make your backtest fit as well as possible.

Figuring these things out and seeing your error metrics improve is one of the most fun things about data science (and have no doubt that, even if you’re doing it in a spreadsheet, this is data science) but it also opens the door to the bad practice of overfitting. If you overfit then the forecast will perform a lot better on a backtest than it will in the future.

An extreme example of this is if you were trying to make a revenue forecast and, for your backtest, used the actual revenue as a regressor column. In this case Forecast Forge would learn to use that column to make an extremely good prediction! But when it came to making a real forecast you would have to use a different method to fill in the future revenue values for the regressor column, and then either your real forecast would be rubbish or you would have figured out a way to forecast revenue without using Forecast Forge - good for you!

There are a few different techniques you can use to help avoid this trap and to estimate how good your forecast will be in the real world outside of a backtest.

1. Separate test and validation

The idea here is that you test out different regressors and transforms to see which performs well on your test data. Then, once you have picked the best, you run a final test on validation data to decide whether or not to proceed with using the forecast.

For example, if you want to make a 6 month forecast you could use data up until July 2020 to make a forecast through to December 2020. July to December 2020 is your test data, where you can play around with different regressors and transforms to see what gives you the best result.

Then use the same method with data through to December 2020 to make a forecast for January 2021 to June 2021. This is your validation data, where you check that your forecasting method gives good enough results on out of sample data; it is also your “best guess” estimate of how well the forecast will perform on unseen data. For this reason you must not peek at the validation data before the final test. If you do, then it isn’t really validation but just a poorly used form of test data.
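
Here is a minimal sketch of that discipline in code, with a naive placeholder standing in for whatever combination of regressors and transforms you are actually evaluating; the file name is hypothetical and the dates follow the example above.

    import numpy as np
    import pandas as pd

    def fit_and_forecast(train, horizon_days):
        # Stand-in for whatever method you are evaluating; here it is just a naive
        # forecast that repeats the last 365 days of history. Your chosen regressors
        # and transforms would replace this.
        last_year = train["y"].iloc[-365:].to_numpy()
        reps = int(np.ceil(horizon_days / len(last_year)))
        return np.tile(last_year, reps)[:horizon_days]

    def mape(actual, predicted):
        return float(np.mean(np.abs(actual - predicted) / actual) * 100)

    df = pd.read_csv("daily_revenue.csv", parse_dates=["ds"])  # hypothetical: columns ds, y

    # Test period: iterate on regressors and transforms against July-December 2020
    train = df[df["ds"] <= "2020-06-30"]
    test = df[(df["ds"] > "2020-06-30") & (df["ds"] <= "2020-12-31")]
    print("Test MAPE:", mape(test["y"].to_numpy(), fit_and_forecast(train, len(test))))

    # Validation period: look at this ONCE, after the method is frozen, to get an
    # honest estimate of how the forecast will do on unseen data
    train_full = df[df["ds"] <= "2020-12-31"]
    valid = df[(df["ds"] > "2020-12-31") & (df["ds"] <= "2021-06-30")]
    print("Validation MAPE:", mape(valid["y"].to_numpy(), fit_and_forecast(train_full, len(valid))))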

This is a good method but it does have two main flaws:

  1. For longer forecasts you end up needing a lot of training data. For example, if you want to make a 12 month forecast then you’d use June 2020 to June 2021 as your validation data, June 2019 to June 2020 as your test data, and then you’d still need at least a couple of years’ training data before that to use when figuring out the best model. So, in this example, you’d need good quality data going back to at least June 2017, and June 2016 would be even better. This is a long time to have a consistent, non-broken analytics implementation; according to surveys done by Dipesh Shah, less than 20% of businesses have this data (see a case study on how he uses Forecast Forge with one of his ecommerce clients).
  2. At the moment people care a lot about how well their forecasts model things like the effect of the covid-19 pandemic. But it is likely that all or most of the pandemic will occur only within your testing and validation periods so the model will perform very poorly during testing because there is no pandemic data during the training period. There is no way around this because the validation period has to be after the test period and the test period has to come after the training data.

2. Make models for similar things at the same time

If you have to make forecasts for a variety of similar things (e.g. similar brands in one market or the same brand in different countries) then you can use some of them for testing and improving your methodology and some of them for validating how well you expect it to perform on unseen data.

For example, if you are forecasting for five similar brands (e.g. they are different faces of the same conglomerate) you can find the forecasting method that works best with three of them and then save the final two for validation, checking how well your method is likely to perform on unseen data.
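
In code the discipline looks something like the sketch below; the brand names and error numbers are invented, and in practice each number would come from a real backtest like the one earlier in this email.

    # Hypothetical backtest errors (MAPE, %) for each candidate method on the three
    # test brands; in practice these numbers come from running real backtests.
    backtest_errors = {
        "baseline":                {"brand_a": 12.4, "brand_b": 9.8, "brand_c": 14.1},
        "with_spend_regressor":    {"brand_a": 6.2,  "brand_b": 5.9, "brand_c": 7.4},
        "with_lockdown_regressor": {"brand_a": 8.1,  "brand_b": 7.3, "brand_c": 9.0},
    }

    # Pick the method with the lowest average error across the test brands
    best_method = min(
        backtest_errors,
        key=lambda m: sum(backtest_errors[m].values()) / len(backtest_errors[m]),
    )

    # Only now run the chosen method on the two held-back validation brands, once,
    # and report that error as your estimate of real-world performance
    validation_brands = ["brand_d", "brand_e"]
    print(f"Evaluate {best_method!r} on {validation_brands} for the final check")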

This method only works well if the things you are forecasting are similar enough that the same method will work well for all of them. For example if you try this with brands that are completely different then the performance of a forecast for brand A will not tell you very much about how it will perform for brand C.

Whether things are “similar enough” for this technique to work is quite a tough question to answer. In many ways the typical use case for Forecast Forge makes this even harder: if you were trying to forecast 1000 metrics then you’d code the model in Python or R, and it would be obvious that you wouldn’t have time to customise each model, so you’d just use whichever method performed best on average. But part of the point of Forecast Forge is to make it easier for people to customise their forecasts, and Forecast Forge users tend to be making fewer than ten forecasts at once, so the temptation to customise and overfit forecasts is a lot harder to avoid.

Both of these techniques rely on the patterns and features that the Forecast Forge algorithm learned from the past continuing into the future. As we’ve all seen in 2020, sometimes crazy things happen, and then your forecast will be rubbish unless you knew about the craziness in advance.
