The observed ice extent for September 2010, monthly average, from the National Snow and Ice Data Center was 4.90 million km^2. Part of making predictions and deciding how well you did is that you also have to be specific about what you're going to compare against. You'll find somewhat different figures if you look at other sources.
If you were dishonest, or just not careful, you might select whichever observation was closest to your prediction. The problem with that is that it then becomes easy to claim an accurate prediction -- with little regard for the quality of the prediction itself. Just select the most favorable observation, or process the data yourself in your own way. (By changing how you do your land masking, you can change your ice areas or extents by upwards of 1 million square km. ... he said with no tinge of annoying experience.)
It turns out that my May predictions did pretty well.
But first, some other types of prediction.
One was the poll I put up for readers here (into which I also entered my two predictions). Two votes went to 5-5.25 million km^2, so were a bit high. Three went to 4.75-5.0, which was the correct bin. Three more took 4.50-4.75, a bit low. Four each took 4-4.5 and under 4 million, so were fairly to very low. This is a better showing than last year, when everybody (except William Connolley, who didn't enter the poll but did make a bet with me) was too low, some by quite a lot.
I also mentioned some simple predictors -- climatology and simple trends (sketched in code after the list). They did very poorly:
- Climatology 1979-2000: 7.03 million km^2
- Climatology 1979-2008: 6.67 million km^2
- Linear Trend 1979-2009: 5.37 million km^2
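For the curious, here is a minimal sketch of how those two baselines can be computed from a yearly series of September mean extents. The language (Python), the function names, and the calling pattern are my own illustration, not what was actually run:

```python
import numpy as np

def climatology(years, extents, start, end):
    """Mean September extent over the years start..end inclusive -- the 'climatology' baseline."""
    mask = (years >= start) & (years <= end)
    return extents[mask].mean()

def linear_trend_forecast(years, extents, target_year):
    """Fit a straight line to extent vs. year and extrapolate it to target_year."""
    slope, intercept = np.polyfit(years, extents, 1)
    return slope * target_year + intercept

# Given arrays of years and September mean extents (million km^2, e.g. from NSIDC):
#   climatology(years, extents, 1979, 2000)   -> the 7.03 figure above
#   climatology(years, extents, 1979, 2008)   -> the 6.67 figure
#   linear_trend_forecast(years[years <= 2009],
#                         extents[years <= 2009], 2010)  -> the 5.37 figure
```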
Now for mine (ours -- I was working with Xingren Wu):
- Wu and Grumbine modeling: 5.13 million km^2
- Grumbine and Wu statistical ensemble: 4.78 million km^2
- Grumbine and Wu best fit statistical: 4.59 million km^2
The best fit statistical one did an excellent job of predicting the September minimum, which was 4.60. The problem being, it wasn't trying to do that. This is another reason you have to be specific about just what you're predicting. As it was last year, the best fit statistical was too low. Two is a rather small sample, but being wrong in the same direction both times does point to something for us to keep in mind when working on next year's prediction: finding some way of getting a predictor that's too high as often, and by as much, as it's too low.
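In practice, that means keeping an eye on the signed errors: an unbiased predictor should have a mean error near zero and overshoot about as often as it undershoots. A rough sketch of that bookkeeping (my illustration, not our working code):

```python
import numpy as np

def bias_summary(predicted, observed):
    """Signed errors: positive means the prediction came in too high."""
    errors = np.asarray(predicted) - np.asarray(observed)
    return {
        "mean_error": float(errors.mean()),   # near zero for an unbiased predictor
        "n_too_high": int((errors > 0).sum()),
        "n_too_low":  int((errors < 0).sum()),
    }
```

With only two Septembers in hand, such a summary is mostly a reminder of what to watch as the sample grows.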
The ensemble statistical predictor did pretty well -- off by 0.12 million square km. I wouldn't want to mow that large an area! But compared to the natural variability of about 0.5, it's a pretty good result. So I think the ensemble approach helped us represent reality a little better.
The model (actually an ensemble of model runs) was only off by 0.23 million square km. Again, pretty good compared to the natural variability. More about how it got there in a moment.
If you average our two predictions, you get 4.95 million square km, off by only 0.05 -- extremely close. While the Sea Ice Outlook did that averaging, I won't take credit for that accuracy (same as I wouldn't have taken any blame for errors). Again, we submitted two separate predictions precisely because we did not consider averaging the two to be meaningful.
With the model, we learned something useful. Namely, the prediction we entered was not exactly what came out of the model; our adjustment method is what produced the good prediction. The issue was that the model, we knew, was biased toward too much ice extent and area, and toward making the ice too thick. A model that is consistently biased can still be very useful, once you figure out how to correct for its biases. A well-known weather model in the 1970s and 1980s was useful in exactly this way: its rain predictions were always off by a factor of 2. Once you divided (or multiplied, I forget which) the model's prediction by 2, you had very good guidance.
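The general recipe is plain bias correction: estimate a systematic relationship between model and observation from past cases, then apply it to the new forecast. A toy sketch of the constant-factor case (purely illustrative; which way the correction goes depends on which way the model's bias runs):

```python
import numpy as np

def fit_scale_factor(model_values, observed_values):
    """Estimate a single multiplicative correction from past (model, observed) pairs."""
    return float(np.mean(np.asarray(observed_values) / np.asarray(model_values)))

def bias_correct(model_value, scale_factor):
    """Apply the learned factor to a new model forecast."""
    return model_value * scale_factor
```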
Our hope was that we could figure out a way of using the model to get a better estimate than the model itself gave. Then, if it worked, our method would shed light on how to improve the model itself. There's a fair chance that we have exactly that. Our method was to use the area of ice that was thicker than 60 cm (or so), rather than ice thicker than zero (using the model straight). That such a correction worked tells us we might be much better off restarting the model with all the ice, at least in the Arctic, 60 cm thinner. If everything else in the model system is correct, that restart might cure all the problems in the ice model. (We're probably not going to be that lucky, but it's a direction of hope!) Then we'd have accurate predictions without a need for bias correction.
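In code terms, the adjustment amounts to counting a grid cell toward the extent only where the modeled ice exceeds a thickness cutoff, instead of anywhere the model has any ice at all. A minimal sketch, assuming you have gridded thickness and cell areas (the variable names and the 0.60 m default are my own framing of the idea above):

```python
import numpy as np

def model_ice_extent(thickness_m, cell_area_km2, min_thickness_m=0.60):
    """Total area of grid cells whose modeled ice is thicker than the cutoff.

    min_thickness_m = 0.0 reproduces the raw model extent (biased high here);
    ~0.60 m is the adjustment described in the text.
    """
    thick_enough = thickness_m > min_thickness_m
    return float(np.sum(cell_area_km2 * thick_enough))
```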
It is these sorts of things -- finding good ideas on how to restart the model, looking to see how much variability is natural, seeing what kinds of statistical methods are useful, ... -- that I find useful in the outlook process. Something to help focus our thinking. We could do such things anyhow, but it's also sometimes an interesting plus to work on the same problem as other people. If nothing else, you have something to talk about in the hallway.