Econometrics provides many useful tools for evaluating models, including climate models. I plan to do a few projects on this topic in the years ahead.
THE GLOBAL WARMING HIATUS, aka DISCREPANCY
I have a column today (June 17, 2014) in the Financial Post on the widening discrepancy between models and observations. The talk of the "pause" in global warming is somewhat misplaced, since a pause is not out of place amidst a long-term upward trend. What is out of place is an extended pause just where models predict a sharp rise. That's the issue that merits attention, both for the scientific questions it raises and for its potential policy implications. NOTE: the line shades were mislabeled in the article--black should be gray and vice-versa. The above graph has the correct shading. R code to draw that graph is here.
TROPOSPHERIC TRENDS: MODELS vs OBSERVATIONS ROUND II: In fall 2010 I published a paper with Steve McIntyre and Chad Herman (MMH) comparing climate model-generated predictions to observations from satellites and weather balloons in the lower- and mid-troposphere over the tropics, a key region for assessing climate model validity. That paper applied two methods: the panel model, which is a fairly well-known econometric method, and the Vogelsang-Franses (VF) multivariate trend estimation method, a less well-known but superior alternative which adapts the general HAC (heteroskedasticity and autocorrelation consistent) method to the estimation of robust confidence intervals for linear trends. The data set used in MMH spanned 1979 to 2009. I have since extended it to include weather balloon data back to 1958, for the purpose of comparing observed lower- and mid-troposphere trends in the tropics to climate model predictions. A challenge in this case is that the 1977-78 Pacific Climate Shift introduces a step-like change in the mean of the data, which causes a spurious increase in the estimated trend if it is ignored. Controlling for the step-change removes that problem, but it also changes the VF critical values. Tim Vogelsang has extended the theory behind the VF method to yield robust trend variances in the presence of autocorrelation of unknown form when a step-change occurs at a known point in the sample. In our new paper, just released as a Discussion Paper and en route to a journal, Tim and I present a detailed explanation of the HAC approach to trend comparisons, including the relevant asymptotics and a bootstrap method for generating empirical critical values. We then apply the method to the Hadley and RICH balloon data for the tropical troposphere. Controlling for the 1977 Pacific Climate Shift, we find the trends are insignificant from 1958-2010 and the discrepancy with climate models is highly significant.
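To illustrate why the step-change matters, here is a minimal sketch on synthetic data. It is not the VF estimator from the paper: it uses ordinary Newey-West (Bartlett kernel) HAC standard errors with a fixed lag length, a simpler member of the same family, and it does not reproduce the non-standard critical values. The series, the step size of 0.3, the AR(1) noise and the function name `ols_hac` are all assumptions made for the example.

```python
import numpy as np

def ols_hac(y, X, lags):
    """OLS coefficients with Newey-West (Bartlett kernel) HAC standard errors.

    Illustrative stand-in only; this is NOT the Vogelsang-Franses statistic.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    Xu = X * resid[:, None]              # rows are x_t * u_t
    S = Xu.T @ Xu                        # lag-0 term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)       # Bartlett weight
        G = Xu[j:].T @ Xu[:-j]           # lag-j cross products
        S += w * (G + G.T)
    cov = XtX_inv @ S @ XtX_inv          # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

# Synthetic annual series: NO underlying trend, a step in the mean at a
# known date, and AR(1) noise. All numbers here are made up.
rng = np.random.default_rng(0)
years = np.arange(1958, 2011)
n = len(years)
step = (years >= 1978).astype(float)
shocks = rng.normal(0.0, 0.05, n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.5 * noise[i - 1] + shocks[i]
y = 0.3 * step + noise

t = years - years.mean()                 # centered time index
X_trend = np.column_stack([np.ones(n), t])
X_step = np.column_stack([np.ones(n), t, step])

b_trend, se_trend = ols_hac(y, X_trend, lags=4)
b_step, se_step = ols_hac(y, X_step, lags=4)

print("trend ignoring the step:   ", b_trend[1], "+/-", se_trend[1])
print("trend controlling for step:", b_step[1], "+/-", se_step[1])
```

In the first regression the trend coefficient absorbs part of the step, so a series with no trend at all appears to have one. Adding the step dummy removes that, at the cost of a wider standard error on the trend.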
PAPER IN CLIMATE DYNAMICS TESTING CLIMATE MODEL VALIDITY (2012): Lise Tole and I published a paper in Climate Dynamics testing the ability of climate models to reproduce the spatial pattern of temperature trends over land. This builds on previous work of mine looking at the correlation between indicators of industrial development over land and the spatial pattern of warming trends, a relationship that is not predicted by models and is supposed to have been filtered out of the surface climate record. The paper is
McKitrick, Ross R. and Lise Tole (2012) “Evaluating Explanatory Models of the Spatial Pattern of Surface Climate Trends using Model Selection and Bayesian Averaging Methods” Climate Dynamics, DOI: 10.1007/s00382-012-1418-9
Preprint here; data and code archive here; university press release here. We apply classical and Bayesian methods to look at how well 3 different types of variables can explain the spatial pattern of temperature trends over 1979-2002. One type is the output of a collection of 22 General Circulation Models (GCMs) used by the IPCC in the Fourth Assessment Report. Another is a collection of measures of socioeconomic development over land. The third is a collection of geographic indicators including latitude, coastline proximity and tropospheric temperature trends. The question is whether one can justify an extreme position that rules out one or more categories of data, or whether some combination of the three types is necessary. I would describe the IPCC position as extreme since they dismiss the role of socioeconomic factors in their assessments.
In the classical tests, we look at whether any combination of one or two types can "encompass" the third, and whether non-nested tests combining pairs of groups reject either 0% or 100% weighting on either. ("Encompass" means provide sufficient explanatory power not only to fit the data but also to account for the apparent explanatory power of the rival model.) In all cases we strongly reject leaving out the socioeconomic data. In only 3 of 22 cases do we reject leaving out the climate model data, but in one of those cases the correlation is negative, so only 2 count--that is, in 20 of 22 cases we find the climate models are either no better than or worse than random numbers.
We then apply Bayesian Model Averaging to search over the space of 537 million possible combinations of explanatory variables and generate coefficients and standard errors robust to model selection (aka cherry-picking). In addition to the geographic data (which we include by assumption) we identify 3 socioeconomic variables and 3 climate models as the ones that belong in the optimal explanatory model, a combination that encompasses all remaining data.
So our conclusion is that a valid explanatory model of the pattern of climate change over land requires use of both socioeconomic indicators and GCM processes. The failure to include the socioeconomic factors in empirical work may be biasing analysis of the magnitude and causes of observed climate trends since 1979. I have written a pair of op-eds to explain the work. The first part appeared in the Financial Post on June 21. A version with the citations provided is here. Part II is here online, and the version with citations is here.
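The Bayesian Model Averaging step described above can be sketched on a toy problem. This is a minimal illustration, not the procedure used in the paper: it enumerates every subset of a handful of candidate regressors, weights each model by the BIC approximation to its posterior probability (an assumption made here, standing in for whatever priors the paper uses), and reports posterior inclusion probabilities and model-averaged coefficients. The data are synthetic, and the names `bma_bic` and `x0`..`x3` are hypothetical.

```python
import itertools
import numpy as np

def bma_bic(y, X, names):
    """Exhaustive model averaging with BIC-based weights (illustrative only).

    Returns posterior inclusion probabilities and model-averaged
    coefficients, each as a dict keyed by regressor name.
    """
    n, k = X.shape
    fits = []
    for r in range(k + 1):
        for subset in itertools.combinations(range(k), r):
            # Every model includes an intercept plus the chosen columns.
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = float(np.sum((y - Z @ beta) ** 2))
            bic = n * np.log(rss / n) + Z.shape[1] * np.log(n)
            fits.append((subset, beta[1:], bic))

    bics = np.array([f[2] for f in fits])
    weights = np.exp(-0.5 * (bics - bics.min()))   # approx. posterior odds
    weights /= weights.sum()

    pip = np.zeros(k)    # posterior inclusion probability per regressor
    avg = np.zeros(k)    # model-averaged coefficient (0 when excluded)
    for (subset, coefs, _), w in zip(fits, weights):
        for j, c in zip(subset, coefs):
            pip[j] += w
            avg[j] += w * c
    return dict(zip(names, pip)), dict(zip(names, avg))

# Synthetic example: only x0 and x2 actually drive y.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
y = 1.0 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(0.0, 0.5, n)

names = ["x0", "x1", "x2", "x3"]
pip, avg = bma_bic(y, X, names)
for name in names:
    print(name, "inclusion prob:", round(pip[name], 3),
          "avg coef:", round(avg[name], 3))
```

With k optional regressors there are 2^k candidate models, so full enumeration like this is only practical for small k. A model space in the hundreds of millions calls for a more efficient search than the brute-force loop shown here.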
MODEL-DATA TREND COMPARISONS (2010): My first foray into this topic looks at how to compare model-generated trends to observations. There have been some rather simplistic methods used before now, based on t-stats with "effective degrees of freedom" adjustments and the like. The following paper explains more accurate testing methods using panel regression and multivariate trend estimation that have higher power and greater robustness to complex autocorrelation patterns. The application is to the tropical troposphere, an important region for testing models' ability to quantify the atmospheric response to greenhouse gases. A few recent studies differed on whether models significantly overstate the warming or not. We find that up to 1999 there was only weak evidence for this, but on updated data the models appear to significantly overpredict warming.
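For readers unfamiliar with the "effective degrees of freedom" adjustment mentioned above, here is a minimal sketch of the common lag-1 version, in which the sample size n is replaced by n(1 - r1)/(1 + r1), with r1 the lag-1 autocorrelation of the trend residuals. This shows the simpler approach, not the panel or multivariate methods of the paper. The synthetic monthly series and the function name `trend_neff` are assumptions made for the example.

```python
import numpy as np

def trend_neff(y):
    """Linear trend with a lag-1 'effective sample size' standard error.

    Illustrates the simple AR(1)-style adjustment only; it assumes the
    residuals follow an AR(1) process, which is exactly the kind of
    restriction that HAC-type methods avoid.
    """
    n = len(y)
    t = np.arange(n) - (n - 1) / 2.0                 # centered time index
    slope = float(t @ (y - y.mean()) / (t @ t))
    resid = y - y.mean() - slope * t
    r1 = float(resid[1:] @ resid[:-1] / (resid @ resid))
    n_eff = n * (1.0 - r1) / (1.0 + r1)              # effective sample size
    rss = float(resid @ resid)
    se_naive = np.sqrt(rss / (n - 2) / (t @ t))      # assumes white noise
    se_adj = np.sqrt(rss / (n_eff - 2) / (t @ t))    # adjusted for r1
    return slope, se_naive, se_adj, r1, n_eff

# Synthetic monthly series: small linear trend plus AR(1) noise.
rng = np.random.default_rng(2)
n = 372
shocks = rng.normal(0.0, 0.1, n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + shocks[i]
y = 0.0015 * np.arange(n) + noise

slope, se_naive, se_adj, r1, n_eff = trend_neff(y)
print("slope:", slope)
print("naive s.e.:", se_naive, " adjusted s.e.:", se_adj)
print("lag-1 autocorrelation:", r1, " effective n:", n_eff)
```

When the residuals are positively autocorrelated the effective sample size is well below n and the adjusted standard error is correspondingly larger. The adjustment relies on a single autocorrelation coefficient, so it can be unreliable when the dependence structure is more complex.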
CORRECTION to MMH10: In 2010 Steve, Chad and I published a paper that applied panel and multivariate (VF) methods to test the significance of trends and of model-obs differences in the tropical troposphere. There were a couple of typos, and also Chad discovered an error in the GISS data as archived at the PCMDI (not a huge one, just an error splicing pre- and post-2000 runs together). We re-did our analyses and used the updated versions of the observational data for the purpose. The correction has been published:
The GISS correction and data revisions strengthen all our original findings, reducing the observational trends and raising (slightly) the model trends. (a) The combined MSU trends have a p-value just over 0.05; still significant but "marginal". (b) The HadAT 1979-2009 trend in the LT drops from significance to marginal. (c) The average 1979-2009 MT trend across all observational series drops to insignificance. (d) The RICH 1979-2009 MT trend drops to insignificance. (e) The RSS 1979-2009 MT series is now significantly different from models in the panel regression test. For the 1979-2009 interval, all observational series individually and jointly are significantly below models at both the LT and MT layers. (f) Over the 1979-1999 interval the model-obs differences are still marginally significant, but in the MT layer the difference is now at about the 6% level, so it is only nearly significant.