Verifying forecasts 2

As I said last week, verifying predictions is difficult, and I was prompted into looking again at the matter by someone doing it wrong.  Of course the standard of 'wrongness' involved is mine.  Forecast verification is something of an art as well as mathematics and science.  But there are some points on which I think I'll get little argument from Allan Murphy* and his intellectual colleagues and descendants:
  • You have to be clear what you're forecasting
    • what variable
    • at what time (or time span)
    • for what place or area
  • You have to be clear how the forecast is going to be evaluated
  • You should evaluate all forecasts
  • Forecasts must be public
  • Forecasts must be verifiable
That last might seem a little strange.  I hope not.  Suppose I said next July 20th at 3:34 PM at Washington National Airport the official temperature would be hot.  Very specific about what I'm forecasting and what it will be evaluated against.  But what is 'hot'?  To me, anything over 80 F (27 C).  As such, it's a near certainty that my forecast will be correct.  It's also awfully easy for me, on July 21st, to say, regardless of the temperature, that it was 'hot'.  This is one reason that we prefer numbers in science.  You can, and we do, work with qualitative predictions.  But it takes more work, as you have to find some way of making 'hot' objective, so that we can all agree that such a forecast was correct or not.
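One way to make a qualitative forecast like 'hot' verifiable is to fix the cutoff before the event, so nobody can redefine it afterwards.  A minimal Python sketch, assuming the 80 F cutoff mentioned above; the function name and the sample temperatures are invented for illustration and are not real observations:

```python
# Threshold agreed on in advance (the personal 'hot' cutoff from the text).
HOT_THRESHOLD_F = 80.0

def verify_hot_forecast(observed_temp_f, threshold_f=HOT_THRESHOLD_F):
    """Return True if a forecast of 'hot' verifies, i.e. the observation is over the threshold."""
    return observed_temp_f > threshold_f

# Hypothetical observations, only to show the check in use.
for temp in (95.0, 80.0, 72.0):
    print(temp, verify_hot_forecast(temp))
```

The point is only that the threshold is written down before July 20th rather than chosen on July 21st.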

In general, if not as universally, we add a couple more items, at least desirable if not mandatory:

  • Forecasts should specify their degree/nature of confidence
  • It's a good idea to compare the quality of your prediction against a null forecast method (not a personal comment; it means any method that doesn't know any of the science -- like straight line regression, or persistence; this also goes by the principle of 'check how wrong you could be', which I'll illustrate later this week).
  • A trivial matter (except that it comes up in Watts' Nov 23 2010 response to greenman3610) is that all predictions depend on what really happens.  Of course if it's colder, there'll be more ice, and if it's warmer there'll be less.  That's what you're supposed to be predicting!
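A minimal sketch of a null-forecast comparison, using the JAXA minimum-day figures quoted further down in this post.  Treating 'same as last year' (persistence) as the null is an assumption made for illustration; a different null method, such as a straight line trend, would give a different comparison.

```python
def absolute_error(forecast, observed):
    """Size of the miss, in the same units as the inputs."""
    return abs(forecast - observed)

# JAXA minimum-day extents in million km^2, as quoted later in this post.
observed_2010 = 4.81
persistence_null = 5.25   # 2009 minimum day; assumed choice of null forecast
forecast_june_23 = 5.5    # the June 23rd forecast

null_error = absolute_error(persistence_null, observed_2010)
forecast_error = absolute_error(forecast_june_23, observed_2010)

print(f"null (persistence) error: {null_error:.2f} million km^2")
print(f"forecast error:           {forecast_error:.2f} million km^2")
print("forecast beats null:", forecast_error < null_error)
```

A forecast method only demonstrates skill if its errors are smaller than those of the method that knows no science.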
Now, what prompted this was a video by greenman3610 and the response from Watts up With That.  greenman observed a very bad forecast coming from WUWT; Watts said it was really rather good.  Figuring out good vs. bad isn't really a scientific question, and those aren't really the words used by either, so to be fair to both:  those are my words, but I think they capture fairly the sense of their respective comments.

This is all regarding sea ice.  You can check my original comments from June on my May estimates -- that they were for September average, Arctic, sea ice extent, as measured by NSIDC.  Further, at least in what we submitted to the sea ice outlook, we mentioned what the standard errors in the predictions were.  Don't want it said that I have higher standards for others than I live by myself.

So what was Goddard's prediction?  That turns out to be hard to track down. Tamino and Neven have also looked into the matter, Neven getting back to February (from his June check).  My selection:

1) On June 6th it is that "Conclusion : Based on current ice thickness, we should expect September extent/area to come in near the top of the JAXA rankings (near 2003 and 2006.) However, unusual weather conditions like those from the summer of 2007 could dramatically change this. There is no guarantee, because weather is very variable."
-- this does tell us what the verification data source is supposed to be, but not whether it is monthly average or daily minimum.  Fairly clearly it is September.  September's minimum and average for 2003, from JAXA, were (6.03, and 6.13) million square km.  September 2006 showed (5.78, 5.91).  The nearest to both would be their average, giving his June 6, 2010 forecast(s) as 5.905, 6.02 million square km for minimum day and monthly average, respectively.

It is not until comments in September at his personal blog (separate from WUWT) that it becomes clear to me that Goddard means the minimum day, not the monthly average.  JAXA's minimum September 2010 day is 4.81 million square km.  So Goddard's June 6 forecast is off by over 1 million km^2.  He gave no sense of variability at this point, but I'll observe my own prior estimate of 0.5 million km^2 for natural variability.  So the error is about 2 standard deviations.  (Aside: that others were off by as much or more does not affect our evaluation of Goddard's predictions.  n.b., I was not one of those others.)
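The arithmetic behind the June 6 numbers can be sketched in a few lines of Python.  Taking the midpoint of 2003 and 2006 as 'near 2003 and 2006' is the interpretation used above, not something Goddard stated, and the variable names are illustrative.

```python
# JAXA September extents in million km^2: (minimum day, monthly average).
jaxa_2003 = (6.03, 6.13)
jaxa_2006 = (5.78, 5.91)
observed_min_day_2010 = 4.81
natural_variability = 0.5   # the prior estimate quoted above, taken as one standard deviation

# Midpoint of the two reference years, for each of the two measures.
implied_forecast = tuple((a + b) / 2 for a, b in zip(jaxa_2003, jaxa_2006))

# Error of the implied minimum-day forecast, in km^2 and in standard deviations.
error_min_day = implied_forecast[0] - observed_min_day_2010
error_in_sigma = error_min_day / natural_variability

print(f"implied forecast (min day, monthly avg): {implied_forecast[0]:.3f}, {implied_forecast[1]:.2f}")
print(f"minimum-day error: {error_min_day:.3f} million km^2")
print(f"error in standard deviations: {error_in_sigma:.2f}")
```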

2) On June 14th, the forecast has changed to "Conclusion : 2010 minimum extent is on track to come in just below 2006. With the cold temperatures the Arctic is experiencing, the likelihood of a big melt is diminishing."
Ok, what does 'just below' mean?  About the same as my 'hot', perhaps.  2006's minimum day at JAXA was 5.78 million km^2, September average of 5.91.  Observed 2010 was (4.81, 5.10).  I'm hard-pressed to call errors of (+0.97, +0.81) million square km 'just below', but Goddard never defined the term.  (Hence that guide on verifying forecasts!)

3) On June 23rd the forecast becomes:
"I’m forecasting a summer minimum of 5.5 million km², based on JAXA. i.e. higher than 2009, lower than 2006."
This is the first time he directly names a specific number for the ice (well, one assumes extent, but he doesn't say here whether it's extent or area he means; nor whether it's minimum day or monthly average).  2009's JAXA numbers are (5.25, 5.38) for minimum day and monthly average extent, respectively.  2006 are (5.78, 5.91).  5.5 lies between the 2009 and 2006 values for both the minimum day and the monthly average, so this didn't help clarify which he meant.  His September comments did (minimum day), and this comment is also more clearly consistent with minimum day (0.25 above 2009, 0.28 below 2006, versus being much closer to the 2009 monthly average than the 2006 monthly average).  This also gives us a sense of his level of uncertainty -- 0.25 million km^2.  If he were more uncertain than that, he would give a wider range of extents.  Whether that's one or two 'sigma' is also not clear, and, again, points to why we like these things specified.
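The consistency check can be written out as a short Python sketch: for each candidate measure, how far does 5.5 sit above 2009 and below 2006?  The dictionary layout and function name are just illustrative choices.

```python
forecast = 5.5  # million km^2, the June 23rd number

# JAXA extents in million km^2 for the two reference years, by measure.
jaxa = {
    "minimum day": {"2009": 5.25, "2006": 5.78},
    "monthly average": {"2009": 5.38, "2006": 5.91},
}

def gaps(forecast, lower_year, upper_year):
    """Return (distance above the lower year, distance below the upper year)."""
    return forecast - lower_year, upper_year - forecast

results = {}
for measure, values in jaxa.items():
    above_2009, below_2006 = gaps(forecast, values["2009"], values["2006"])
    results[measure] = (above_2009, below_2006)
    print(f"{measure}: {above_2009:.2f} above 2009, {below_2006:.2f} below 2006")
```

Under minimum day the forecast sits nearly midway between the two years; under monthly average it sits much closer to 2009, which is why minimum day is the more natural reading.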


I'll note that in following this up, I read every one of the WUWT 'sea ice news' posts from #2 to #30, as well as an August midweek update, and all 'verification' posts at Goddard's.  Plus some, but not all, comments in August's posts.  This matters some for what follows.

From July 4th through a comment of his on his own WUWT post August 24th, Goddard continues with 5.5 million square km being his prediction.   Quoting his comment (with date and time so you can find it; I've never figured how to link straight to comments):
"
stevengoddard says:
August 24, 2010 at 9:46 am
Scott,
Remember that NSIDC took a mulligan, changing their forecast in July. They started at 5.5 million.
I haven’t taken my mulligan yet ;^)
"

So at least as late as the 24th, 5.5 is his prediction and he's taking pride in having not changed his forecast, when talking to WUWT readers.  That's odd, because in the August Sea Ice Outlook, whose due date for submission was mid-month (I did submit to it myself, on time), his prediction was 5.1 million km^2 for September monthly average at NSIDC.  As I mentioned before, JAXA runs about 0.2 million above NSIDC, so a 0.4 million square km drop doesn't make sense.  On top of which, the Outlook figure is a monthly average (which, at JAXA, runs about 0.15 million km^2 above minimum day, and more between NSIDC's monthly average and minimum day -- about 0.30 million km^2 this year).

To back that out:  If the prediction was 5.5 for minimum day according to JAXA, subtract 0.2 to get NSIDC's minimum day, and then add 0.3 to get the September (NSIDC) average extent.  If his prediction hadn't fundamentally changed, the SEARCH submission should have been 5.6 million km^2.  Since it was 5.1 instead, there's a rather large change.  Surely worthy of a post of its own at either WUWT or his own blog.  In any case, given his August 24th comment, it had to be between then and the 31st.  (At least if it's going to be called an August prediction, which he does.)  That, or he was telling WUWT readers different things on the 24th than he was telling the Sea Ice Outlook.  Or Outlook let him submit late, or ... -- the point being, this shows why it is we want our forecasts to be clearly public.
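That conversion is simple enough to sketch in Python.  The two offsets are the approximate figures quoted above (and the 0.3 applies to this year only); the constant and function names are illustrative.

```python
# Approximate offsets in million km^2, as quoted in the text.
JAXA_MINUS_NSIDC = 0.2          # JAXA runs about 0.2 above NSIDC
NSIDC_AVG_MINUS_MIN_DAY = 0.3   # NSIDC September average minus minimum day, this year

def jaxa_min_day_to_nsidc_september_average(jaxa_min_day):
    """Convert a JAXA minimum-day extent to the equivalent NSIDC September average."""
    nsidc_min_day = jaxa_min_day - JAXA_MINUS_NSIDC
    return nsidc_min_day + NSIDC_AVG_MINUS_MIN_DAY

implied_submission = jaxa_min_day_to_nsidc_september_average(5.5)
actual_submission = 5.1
change = implied_submission - actual_submission

print(f"implied Outlook submission: {implied_submission:.1f} million km^2")
print(f"actual Outlook submission:  {actual_submission:.1f} million km^2")
print(f"unexplained change:         {change:.1f} million km^2")
```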

August 29th Goddard is still referring only to his 'June' forecast of 5.5 million km^2.  No mention of an August prediction.

August 30th, Steven Goddard started blogging regularly at his own blog rather than WUWT.

On August 31st he seems to still like his 'June' forecast (actually at least the 3rd forecast from June, the one on June 23rd) as he says:
"The video below shows current ice (thin red line) my June forecast (dashed line) and NSIDC’s forecast summer minimum (red horizontal line.)  Who do you think is going to be closest?"
-- and there is no mention of an August prediction.  Note, too, he doesn't mention that NSIDC is predicting a different thing than he is.

The first I see of Goddard directly referring to his 'August forecast' is September 7th.  It is mentioned at WUWT the day before by Watts.  Can you really call something that doesn't deserve a main-post mention until the end of the first week of September a prediction of September?  Ok, maybe I missed the post in which it showed up.  But clearly, given his August 24th and 29th comments, his prediction of 5.1 million square km doesn't surface publicly until after the morning of the 29th.

JAXA's observed ice cover on August 23rd was 5.60 million km^2 (last observation he'd have been able to look at in commenting on the 24th).  24th was 5.55.  August 31st was 5.33 (already 'busting' all of his 'June' forecasts).  The Sea Ice Outlook was released September 1st, so in the last week of August, apparently, after the June forecasts were busted, Goddard made a revised forecast.  (See point of 'how wrong could you be' above; I'll make it its own note later this week.  The answer is, for JAXA, not very if you get to predict the seasonal minimum day from August 23rd.)

It is also with the post of September 7th that I (finally) can be positive that Goddard means to verify minimum day's ice extent as computed by JAXA:
"My June forecast of 5.5 million km² (JAXA) is currently off by 7%."
-- you can't make that statement if you mean monthly average. Who knows what's going to happen the rest of the month?



So at last, I'll return to greenman3610 and Watts' comments on Goddard's prediction(s) made at WUWT.  One part of it is that, fundamentally, greenman3610 is not focused on predictions as such.  It is, instead, the months of Goddard talking of sea ice being in recovery, with that belief in recovery driving his predictions of ice extent.  But, fundamentally we're looking at at least 6 months of 'sea ice is recovering' posts from Goddard, with numbers or references that compute to numbers from 5.5 million km^2 to over 6 for the extent based on that belief.  Then, in the last 2 days of August, he enters a forecast of 5.1 million km^2, which is less than 2009's 5.25.  It's a recovery, there's just less ice?  I don't follow that reasoning.

As to predictions as such, only the June predictions (between 5.78 and 6.03 June 6th, 'just below' 5.78 June 14th, 5.5 June 23rd; in the first two he was referencing years, I filled in the values for minimum day from JAXA for those years) seem to have been made in notes of their own at WUWT.  It's correct to refer to those as his WUWT forecasts, lacking any sighting of a post with the 5.1 there before September, and given his clear comment on the 24th of August of 5.5 (still) being his prediction with no others (no 'mulligan' as he called it) existing.  Further, after going through all the posts I could find on the topic, it's clear (see above) that he meant to forecast the minimum day's ice cover as computed by JAXA.   That figure, this year, was 4.81 million km^2. So his last June forecast (which turned out to be his best, and the one he consistently referred to as his forecast from June 23rd to at least August 24th) was off by almost 0.7 million km^2.

On the other hand, Watts points, in November, to only the final prediction from Goddard, which had to have been made in the last two days of August, that was 5.1 million km^2 for September monthly average computed by NSIDC.  (You can tell by noting the horizontal line in the verification figure from SEARCH that Watts shows is at 4.9 million km^2, vs. JAXA's minimum day of 4.81, or NSIDC's minimum day of 4.60, or JAXA's monthly average of 5.10.)  The 5.1 is not so far off from 4.9.  At least SEARCH is more or less clear (not clear enough, I think, that'll be a different email) that this is what they're looking for.  Not clear at all to me that either Watts or Goddard realize the different quantities being forecast or verified with.  Pointing to only one of multiple forecasts violates one of the forecast verification principles I mentioned above. 

Was Goddard's forecast pretty good?  Pretty bad?  Off by 0.2, 0.3, 0.7, 1.1 million km^2?  Who knows.  We can get all those results and more by varying what we take to be his forecast and what we choose to verify against.  That's why it's such a central point that you say just what you're predicting (guessing, estimating, ...) and how it's to be validated.
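A short Python sketch makes the spread of possible 'errors' concrete by pairing each candidate forecast with each candidate verification value.  The figures are the ones quoted in this post; which pairing produces which of the numbers above is a guess on the part of this sketch, and the labels are illustrative.

```python
# Candidate forecasts in million km^2, as reconstructed in this post.
forecasts = {
    "June 6 (2003/2006 midpoint)": 5.905,
    "June 23": 5.5,
    "late August (Outlook)": 5.1,
}

# Candidate verification values in million km^2.
verifications = {
    "SEARCH figure (NSIDC September average)": 4.9,
    "JAXA minimum day": 4.81,
}

# Error for every forecast/verification pairing.
errors = {
    (f_name, v_name): f_value - v_value
    for f_name, f_value in forecasts.items()
    for v_name, v_value in verifications.items()
}

for (f_name, v_name), error in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{error:5.2f}  {f_name} vs. {v_name}")
```

Six pairings give six different answers, from roughly 0.2 to roughly 1.1 million km^2, without any of the underlying numbers changing.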


Ok, so all good fun in seeing why we want to do forecast verification in the direction that I like rather than waiting until afterwards to figure out what number will be compared against what other number.  There was one other point of contention between Watts and greenman3610 -- the business of Goddard talking of recovery or not.  From August 9th, for example, we see Goddard (he, or Watts, italicized it, so I'll follow suit) saying:
Can we find another year with similar ice distribution as 2010? I can see Russian ice in my Windows. Note in the graph below that 2010 is very similar to 2006. 2006 had the highest minimum (and smallest maximum) in the DMI record. Like 2010, the ice was compressed and thick in 2006. Conclusion : Should we expect a nice recovery this summer due to the thicker ice? You bet ya..

-- The DMI record is even shorter than JAXA, starting in 2005, vs. 2002.  Given that we want 30 years for climate purposes, either is too short for much use, except to cherry pick for 'highest in the record'.  Kind of like being the tallest person in my house.  Sure, I am.  But there aren't many of us here.  The satellite period as a whole only begins in October 1978, so even taking the whole period is pretty short.

Anyhow, there's a kerfuffle between greenman3610 and Watts regarding whether there was a 'guarantee' of recovery in Goddard's comment.  You decide (go read the whole note, of course, if you're going to).

I'm hard pressed, to return to science, to see 2010 extent being below 2009 and all years 2006 and before that we have data for, as 'recovery'.  And that takes us back to what constitutes a forecast and how you'd verify it.


* ok, going way back for those who remembered that asterisk.  Allan Murphy was one of the major figures in meteorological forecast verification.  One of the people I discussed verification with fairly often had learned a fair part of what he knew from Murphy.  For the major 'small world' effect, I have a couple of textbooks that Murphy used himself.  (One is 'Strength of Materials', so I guess that he started out in engineering, as I had.)  Anyhow, if you were to read all his papers on verification, you'd be quite knowledgeable indeed.

Note:
I try to tell people when I write about their work.  But I could not find any email contact for Goddard at his blog.  I sent a note (available on request) via Watts' contact page 9:45PM Eastern time 24 Nov asking a couple questions and notifying him of this post's scheduled Monday appearance, and to greenman3610 about the same time.  Watts couldn't answer my questions, but did forward (he said evening of the 24th) my note to Goddard.  (No surprise that he couldn't -- they were about Goddard's actions and knowledge.)  Watts, in his response to me mentioned a prediction by Bastardi at WUWT.  It illustrates another bunch of violations of the principles I mention above, so gives another chance to discuss how to do forecast verification.  That'll appear later this week, as well as a consideration of how wrong could you be if you wait until the last week of August to make your prediction of the seasonal minimum.

Copyright 2011 Grumbine Science is proudly powered by blogger.com