Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10

Research output: Contribution to journal (Article)

Standard

Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10. / Keller, Christoph A.; Evans, M. J.

In: Geoscientific Model Development Discussions, Vol. 12, No. 3, 29.03.2019, p. 1209-1225.

Harvard

Keller, CA & Evans, MJ 2019, 'Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10', Geoscientific Model Development Discussions, vol. 12, no. 3, pp. 1209-1225. https://doi.org/10.5194/gmd-2018-229

APA

Keller, C. A., & Evans, M. J. (2019). Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10. Geoscientific Model Development Discussions, 12(3), 1209-1225. https://doi.org/10.5194/gmd-2018-229

Vancouver

Keller CA, Evans MJ. Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10. Geoscientific Model Development Discussions. 2019 Mar 29;12(3):1209-1225. https://doi.org/10.5194/gmd-2018-229

Author

Keller, Christoph A. ; Evans, M. J. / Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10. In: Geoscientific Model Development Discussions. 2019 ; Vol. 12, No. 3. pp. 1209-1225.

BibTeX

@article{33182cee61d94765ab9248bc8d3e79df,
title = "Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10",
abstract = "Atmospheric chemistry models are a central tool to study the impact of chemical constituents on the environment, vegetation and human health. These models are numerically intense, and previous attempts to reduce the numerical cost of chemistry solvers have not delivered transformative change. We show here the potential of a machine learning (in this case random forest regression) replacement for the gas-phase chemistry in atmospheric chemistry models. Our training data consists of one month (July 2013) of output of chemical conditions together with the model physical state, produced from the GEOS-Chem chemistry model (v10). From this data set we train random forest regression models to predict the concentration of each transported species after the integrator, based on the physical and chemical conditions before the integrator. The choice of prediction type has a strong impact on the skill of the regression model. We find best results from predicting the change in concentration for long-lived species and the absolute concentration for short-lived species. We also find improvements from a simple implementation of chemical families (NOx = NO + NO2). We then implement the trained random forest predictors back into GEOS-Chem to replace the numerical integrator. The machine-learning-driven GEOS-Chem model compares well to the standard simulation. For O3, errors from using the random forests grow slowly, and after 5 days the normalised mean bias (NMB), root mean square error (RMSE) and R2 are 4.2%, 35% and 0.9, respectively; after 30 days these values are 13%, 67% and 0.75. The biases become largest in remote areas such as the tropical Pacific where errors in the chemistry can accumulate with little balancing influence from emissions or deposition. Over polluted regions the model error is less than 10%, and the model follows the time series of the full model with significant fidelity.
Modelled NOx shows similar features, with the most significant errors occurring in remote locations far from recent emissions. For other species, such as inorganic bromine species and short-lived nitrogen species, errors become large, with NMB, RMSE and R2 reaching >2100%, >400% and <0.1, respectively. This proof-of-concept implementation is 85% slower than the direct integration of the differential equations, but optimisation and software engineering would allow substantial increases in speed. We discuss potential improvements in the implementation, some of its advantages from both a software and hardware perspective, its limitations and its applicability to operational air quality activities.",
author = "Keller, {Christoph A.} and Evans, {M. J.}",
note = "{\textcopyright} Author(s) 2019.",
year = "2019",
month = mar,
day = "29",
doi = "10.5194/gmd-2018-229",
language = "English",
volume = "12",
pages = "1209--1225",
journal = "Geoscientific Model Development Discussions",
number = "3",

}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10

AU - Keller, Christoph A.

AU - Evans, M. J.

N1 - © Author(s) 2019.

PY - 2019/3/29

Y1 - 2019/3/29

AB - Atmospheric chemistry models are a central tool to study the impact of chemical constituents on the environment, vegetation and human health. These models are numerically intense, and previous attempts to reduce the numerical cost of chemistry solvers have not delivered transformative change. We show here the potential of a machine learning (in this case random forest regression) replacement for the gas-phase chemistry in atmospheric chemistry models. Our training data consists of one month (July 2013) of output of chemical conditions together with the model physical state, produced from the GEOS-Chem chemistry model (v10). From this data set we train random forest regression models to predict the concentration of each transported species after the integrator, based on the physical and chemical conditions before the integrator. The choice of prediction type has a strong impact on the skill of the regression model. We find best results from predicting the change in concentration for long-lived species and the absolute concentration for short-lived species. We also find improvements from a simple implementation of chemical families (NOx = NO + NO2). We then implement the trained random forest predictors back into GEOS-Chem to replace the numerical integrator. The machine-learning-driven GEOS-Chem model compares well to the standard simulation. For O3, errors from using the random forests grow slowly, and after 5 days the normalised mean bias (NMB), root mean square error (RMSE) and R2 are 4.2%, 35% and 0.9, respectively; after 30 days these values are 13%, 67% and 0.75. The biases become largest in remote areas such as the tropical Pacific where errors in the chemistry can accumulate with little balancing influence from emissions or deposition. Over polluted regions the model error is less than 10%, and the model follows the time series of the full model with significant fidelity.
Modelled NOx shows similar features, with the most significant errors occurring in remote locations far from recent emissions. For other species, such as inorganic bromine species and short-lived nitrogen species, errors become large, with NMB, RMSE and R2 reaching >2100%, >400% and <0.1, respectively. This proof-of-concept implementation is 85% slower than the direct integration of the differential equations, but optimisation and software engineering would allow substantial increases in speed. We discuss potential improvements in the implementation, some of its advantages from both a software and hardware perspective, its limitations and its applicability to operational air quality activities.

U2 - 10.5194/gmd-2018-229

DO - 10.5194/gmd-2018-229

M3 - Article

VL - 12

SP - 1209

EP - 1225

JO - Geoscientific Model Development Discussions

JF - Geoscientific Model Development Discussions

IS - 3

ER -