For those too busy or too lazy to read this entire post, here is the main conclusion: comparing raw versus homogenised temperature records from KNMI’s main station de Bilt shows that the applied adjustments cool the past. More precisely, summer temperatures of 1900-1950 have been cooled to such an extent that certain claims (like the summer of 2003 being the hottest on record) are only supported by the adjusted data set, and not by the raw data set. Furthermore, comparing raw versus homogenised data reveals that the correction for the urban heat island (UHI) effect is no longer applied after 2006, and no reason for cancelling this correction is given.
First two small disclaimers: first of all, I am a biomedical scientist, not a climatologist. However, I am confident that my scientific skills allow me to correctly reconstruct changes between raw and adjusted temperature records. Here I only describe what has been changed in an adapted temperature data set, not why and how it has been changed, so in this post I will not really enter the realm of climatology (I will leave that for future posts…). Secondly, I am a scientist, not a journalist. So I analyse data freely supplied to me, but I do not necessarily correspond with the data source to learn their view on my analysis and interpretations.
Triggered by an alarming report of French mathematicians on flaws in methodology of climatological research, I decided to have a closer look at data from the Dutch meteorological institute (KNMI), whose main station is located at de Bilt. For de Bilt, raw daily weather records are freely accessible as well as homogenised monthly data. Funnily, the second line of raw data export files from their database contains a disclaimer, stating that this data set should not be used for climate change studies and trend analyses, exclamation mark included. KNMI, you do know how to fuel my curiosity.
So what is homogenisation, and when should it (not) be applied?
In science, every measurement is an estimate of reality, not an absolute truth. Therefore it is critical to be confident a priori that your methods are optimal so your raw measurements will deviate from reality as little as possible. In scientific Valhalla, checking and possibly correcting your raw data is not needed, because equipment never breaks down, is never adjusted or replaced, operators never make mistakes, and all other influences affecting measurements are foreseen and eliminated. In the real world however, shit happens, so a log file of your measurements (= metadata) to track this shit is essential. Based on metadata, it may be necessary to reconsider validity of some raw data points. If that validity is unequivocally jeopardized it may even be necessary to exclude those data points, and under very rare circumstances correction is indicated or defendable.
Correcting data is dangerous and should only be performed when a plausible cause for data corruption is known, when no alternatives for data correction are at hand, when for analysis a corrected data set is preferred over an incomplete data set, and when one is confident that the corrected data provide a better estimate of reality than the raw data. In a future post I will explore the assumptions underlying the correcting (or ‘homogenisation’) of climatological data records; for now, I won’t elaborate on my view on the ‘why’ or ‘how’ and the pros and cons of homogenisation. I will focus just on the results of the temperature data adjustments.
Reconstructing the correction of the de Bilt temperature series
To find out what corrections have been applied to the homogenised temperature data set of de Bilt, I simply compared the temperatures of the raw data set to the homogenised data set. No rocket science here: if raw data of a particular month have not been processed further, the arithmetic mean of all raw daily average temperatures in that month should match the corresponding monthly average temperature value from the homogenised data set. This will not hold true for most months, as the accompanying info of the homogenised data file lists 5 corrections: a relocation and replacement of the weather screen in September 1950, a relocation of the screen in August 1951, a lowering of the screen height in June 1961, replacement of the Stevenson screen by a round-plated screen in June 1993, and a correction for the urban heat island (UHI) effect of 0.11°C per century.
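The comparison described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the data layout (daily records as `(date, temperature)` pairs, homogenised values keyed by `(year, month)`), the function names, and the sign convention (homogenised minus raw, so that cooling shows up as a negative number) are mine, and the sample values are made up rather than taken from the KNMI files.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def monthly_means(daily):
    """Arithmetic mean of daily average temperatures per (year, month)."""
    buckets = defaultdict(list)
    for day, temp in daily:
        buckets[(day.year, day.month)].append(temp)
    return {key: fmean(temps) for key, temps in buckets.items()}


def monthly_adjustments(raw_daily, homogenised_monthly):
    """Homogenised minus raw monthly mean; negative means the month was cooled."""
    raw_monthly = monthly_means(raw_daily)
    return {
        key: homogenised_monthly[key] - raw_monthly[key]
        for key in sorted(raw_monthly)
        if key in homogenised_monthly  # skip months missing from either set
    }


# Made-up illustrative data for a single month (not real KNMI values):
# 31 daily means that average out to 19.1, and a homogenised monthly value.
raw_daily = [(date(1947, 7, d), 19.0 + 0.1 * (d % 3)) for d in range(1, 32)]
homogenised_monthly = {(1947, 7): 18.7}

adjustments = monthly_adjustments(raw_daily, homogenised_monthly)
```

A month that was left untouched would give an adjustment of (almost exactly) zero here; any other value points to a correction.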
First I calculated and plotted the monthly temperature difference between raw and homogenised data. In figure 1, the grey lines show the monthly differences, while the black line shows the yearly average of the monthly differences. The blue vertical lines indicate the mentioned time points of weather screen changes. These time points coincide with abrupt changes in the pattern of monthly differences, which reassures me that my calculations are correct. Moreover, the monthly differences also nicely follow the green line that represents the UHI correction of 0.11°C per century.
Incomplete application of the UHI correction
There are two striking things in figure 1. First of all, the grey and black lines start to diverge from the green UHI correction line after approximately 2006 (red vertical line). They become perfectly flat, indicating that the UHI correction is no longer applied after 2006. If there is a valid reason for not applying the UHI correction to data recorded after 2006, the KNMI should have stated this as a sixth correction (or as an amendment to the fifth correction) in their homogenised data file. To me it is unclear whether withholding this information is a result of sloppiness or concealment, but both are blameworthy. Cancelling the UHI correction over a period of approximately 9 years does not have serious consequences yet (0.11°C per century over 9 years amounts to a tiny difference of ~0.01°C), but obviously the effect will grow with every further decade in which the correction is left out.
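The size of this effect is simple arithmetic, sketched below. The rate of 0.11°C per century comes from the KNMI file description. The assumption that the correction grows linearly in time, and the function name, are mine.

```python
UHI_RATE_PER_CENTURY = 0.11  # °C per century, as listed in the homogenised data file


def uhi_shortfall(years_without_correction, rate_per_century=UHI_RATE_PER_CENTURY):
    """Correction (°C) that is missing after a number of uncorrected years,
    assuming the UHI correction increases linearly with time."""
    return rate_per_century * years_without_correction / 100.0


# Shortfall after roughly 9 years, and for longer hypothetical periods.
for years in (9, 20, 50):
    print(f"{years:>3} years without UHI correction: {uhi_shortfall(years):.4f} °C")
```

For 9 years this gives 0.0099°C, the ~0.01°C mentioned above.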
The second noteworthy thing in figure 1 is the large variation of the monthly differences (grey lines), especially before 1950, roughly ranging from +0.2°C to -0.4°C. The highly regular saw-toothed appearance of the grey lines points to a season-specific correction pattern, so I decided to have a closer look at the monthly corrections for the time periods separated by the breaks (the blue lines).
Seasonal homogenisation effects: cooling the 1900-1950 summers
To determine the average monthly correction for the four time periods separated by the breaks, I calculated for each of these time periods the arithmetic mean of the correction for each of the twelve calendar months separately, resulting in the graph shown in figure 2. The dots show the actual corrections per month for each time period, while the smoothing lines are just plotted to get a general impression of a seasonal pattern in these monthly corrections. While the average monthly corrections are relatively small after 1950 (red, green and blue dots), these corrections are substantial before 1950 (black dots) and range roughly from +0.2°C for October to values lower than -0.3°C for all summer months. I am quite certain my estimates of these monthly adjustments are correct, since this graph very closely resembles a graph published by the KNMI itself (see this report, figure 9, orange dots). Now we’re talking: no meagre sub-sub-degree data brushing, but cooling pre-1950 summer temperatures by a generous 0.3-0.4°C on average.
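Grouping the monthly adjustments by period and calendar month could look like the sketch below. The break list is an assumption: the text does not spell out which of the listed screen changes delimit the four periods, so here I simply use three of them. A month that coincides with a break is counted towards the later period, which is also my choice. The adjustment values are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import fmean


def mean_adjustment_by_period_and_month(adjustments, breaks):
    """Average the adjustments per (period index, calendar month).

    `adjustments` maps (year, month) to an adjustment in °C;
    `breaks` is a sorted list of (year, month) break points.
    Period 0 is everything before the first break.
    """
    buckets = defaultdict(list)
    for (year, month), delta in adjustments.items():
        period = bisect_right(breaks, (year, month))
        buckets[(period, month)].append(delta)
    return {key: fmean(deltas) for key, deltas in sorted(buckets.items())}


# Assumed break points (three of the screen changes listed earlier).
breaks = [(1950, 9), (1961, 6), (1993, 6)]

# Made-up adjustments, not real KNMI values.
adjustments = {
    (1948, 7): -0.4,
    (1949, 7): -0.3,
    (1949, 10): 0.2,
    (1970, 7): -0.05,
}

averages = mean_adjustment_by_period_and_month(adjustments, breaks)
```

Each key of `averages` corresponds to one dot in a graph like figure 2: one period, one calendar month.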
When plotting the summer temperature correction per year, an even more pronounced cooling of the ‘30s and ‘40s summers is visible since the UHI correction also starts to kick in (see figure 3; green line). To stress the selectivity of cooling the summer temperatures before 1950, as a reference I also plotted the yearly corrections of winter temperatures (blue line). Note: for these winter temperature calculations, December of each year was regarded as a winter month of the following year (which is common practice in meteorology).
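The December convention is easy to get wrong, so here is a minimal sketch of how seasonal averages per year can be built from monthly values. The function names and the equal weighting of the three months are my assumptions, and the sample values are made up.

```python
from collections import defaultdict
from statistics import fmean

SUMMER = (6, 7, 8)
WINTER = (12, 1, 2)


def season_year(year, month):
    """Year a month is counted towards: December belongs to next year's winter."""
    return year + 1 if month == 12 else year


def seasonal_means(monthly, months):
    """Mean of the monthly values per season year, for the given season months.

    Months are weighted equally; incomplete seasons are dropped.
    """
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        if month in months:
            buckets[season_year(year, month)].append(value)
    return {
        year: fmean(values)
        for year, values in sorted(buckets.items())
        if len(values) == len(months)
    }


# Made-up monthly adjustments (°C), not real KNMI values.
monthly = {
    (1946, 12): 0.1, (1947, 1): 0.0, (1947, 2): -0.1,
    (1947, 6): -0.3, (1947, 7): -0.4, (1947, 8): -0.5,
    (1947, 12): 0.1,  # belongs to winter 1948, which is incomplete here
}

winter = seasonal_means(monthly, WINTER)
summer = seasonal_means(monthly, SUMMER)
```

In this example December 1946 ends up in winter 1947, while December 1947 is dropped because the rest of winter 1948 is missing.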
But does this 0.3-0.4°C cooling of pre-1950 summer temperatures really change the big picture of summer temperature fluctuations during the last century, knowing that Dutch summer temperature averages easily vary between 15°C and 18°C? To see the effect of that correction, I calculated and plotted the average summer daily temperatures (June 1st to August 31st) for each year on record for both the raw data set and the homogenised data set (see figure 4).
The change may not be dramatic, but plotting these two graphs (thin lines) and their less noisy 10-year smoothing averages (thick lines) certainly gives a different picture of how summer temperatures have been evolving (see figure 4). Whereas the red lines (homogenised data) show only a modest warming during the ‘30s and ‘40s, the black lines (raw data) now show a more impressive warming, even to the extent that our hottest summer on record (2003) is surpassed by the summer of 1947 (see the black lines: 18.6°C for 2003 versus 18.7°C for 1947)! Again, as a reference, I include a comparison of raw versus homogenised winter temperatures; the black lines (raw) and red lines (homogenised) hardly differ (figure 5).
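For completeness, a sketch of the smoothing and of the 'hottest summer' comparison. The text does not say how the 10-year averages are aligned, so the trailing window below is an assumption, as are the function names. The two raw summer values are the ones quoted above; no homogenised values are included because I do not quote them in this post.

```python
from statistics import fmean


def moving_average(values, window=10):
    """Trailing moving average: each output covers `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        fmean(values[i - window + 1:i + 1])
        for i in range(window - 1, len(values))
    ]


def hottest_year(summer_means):
    """Year with the highest summer mean in a {year: temperature} mapping."""
    return max(summer_means, key=summer_means.get)


# Raw summer means (°C) for the two years compared in the text.
raw_summers = {1947: 18.7, 2003: 18.6}
record_year = hottest_year(raw_summers)

# The smoothing on a simple made-up series, just to show the mechanics.
smoothed = moving_average(list(range(1, 13)), window=10)
```

With only these two raw values, `record_year` comes out as 1947, in line with the black lines in figure 4.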
Small changes, and not so small consequences
In summary, the comparison of raw versus homogenised temperature records for KNMI main station de Bilt demonstrates that although most adjustments only result in small differences between the data sets, one change is rather pronounced: the cooling of pre-1950 summers in the homogenised data set. As I have stated earlier, in this post I am simply describing what has been changed, not why. I promise I will closely study the justifications and assumptions underlying these temperature adjustments, and share my views as a non-meteorologist on that in a later post.