Improving the prediction of an atmospheric chemistry transport model using gradient boosted regression trees

Research output: Contribution to journal (Article)

Publication details

Journal: Atmospheric Chemistry and Physics Discussions
Date: Accepted/In press - 24 Sep 2019
Date: Published (current) - 2 Oct 2019
Volume: 2019
Pages (from-to): 1-33
Original language: English

Abstract

Predictions from process-based models of environmental systems are biased, due to uncertainties in their inputs and parameterisations, reducing their utility. We develop a predictor for the bias in tropospheric ozone (a key pollutant) calculated by an atmospheric chemistry transport model (GEOS-Chem), based on outputs from the model and observations of ozone from both the surface (EPA, EMEP and GAW) and the ozone-sonde networks. We train a gradient-boosted decision tree algorithm (XGBoost) to predict model bias, with model and observational data for 2010–2015, and then test the approach using the years 2016–2017. We show that the bias-corrected model performs significantly better than the uncorrected model. The root mean square error (RMSE) is reduced from 16.21 ppb to 7.48 ppb, the normalised mean bias (NMB) is reduced from 0.28 to −0.04, and the Pearson's R is increased from 0.479 to 0.841. Comparisons with observations from the NASA ATom flights (which were not included in the training) also show improvements, but to a smaller extent: the RMSE is reduced from 12.11 ppb to 10.50 ppb, the NMB is reduced from 0.08 to 0.06, and the Pearson's R is increased from 0.761 to 0.792. We attribute the smaller improvements to the lack of routine observational constraints of the remote troposphere. We explore the choice of predictor (bias prediction versus direct prediction) and conclude both may have utility. We show that the method is robust to variations in the volume of training data, with approximately a year of data needed to produce useful performance. Data denial experiments (removing observational sites from the algorithm training) show that information from one location (for example Europe) can reduce the model bias over other locations (for example North America), which might provide insights into the processes controlling the model bias.
We conclude that combining machine learning approaches with process-based models may provide a useful tool for improving performance of air quality forecasts or to provide enhanced assessments of the impact of pollutants on human and ecosystem health, and may have utility in other environmental applications.
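
For readers unfamiliar with the three evaluation statistics quoted in the abstract, the following is a minimal Python sketch of how they are commonly computed and how a predicted bias might be applied. It is not the authors' code. It assumes the usual definitions (NMB as the sum of model-minus-observation differences divided by the sum of observations) and assumes bias is defined as model minus observation, so that the corrected value is the model value minus the predicted bias. The function names and the toy numbers are hypothetical, and the predicted bias is supplied by hand rather than by a trained XGBoost model.

```python
import math


def rmse(model, obs):
    """Root mean square error between modelled and observed values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def nmb(model, obs):
    """Normalised mean bias: sum(model - obs) / sum(obs) (assumed definition)."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)


def pearson_r(model, obs):
    """Pearson correlation coefficient between modelled and observed values."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


def apply_bias_correction(model, predicted_bias):
    """Subtract a predicted bias (assumed to be model minus observation)."""
    return [m - b for m, b in zip(model, predicted_bias)]


# Hypothetical toy ozone values in ppb, for illustration only.
obs = [30.0, 40.0, 50.0, 60.0]
model = [45.0, 50.0, 58.0, 75.0]
# Stand-in for the output of a trained bias predictor.
predicted_bias = [13.0, 11.0, 9.0, 14.0]

corrected = apply_bias_correction(model, predicted_bias)

for label, values in (("uncorrected", model), ("corrected", corrected)):
    print(label, rmse(values, obs), nmb(values, obs), pearson_r(values, obs))
```

In the direct-prediction alternative mentioned in the abstract, the learned model would output the ozone concentration itself instead of a bias to subtract; the same three statistics would apply unchanged.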

Bibliographical note

© Author(s) 2019.
