Healthcare Cost Regressions: Going Beyond the Mean to Estimate the Full Distribution

Research output: Contribution to journal, Article, peer-review


Understanding the data generating process behind healthcare costs remains a key empirical issue. Although much research to date has focused on the prediction of the conditional mean cost, this can potentially miss important features of the full distribution such as tail probabilities. We conduct a quasi-Monte Carlo experiment using the English National Health Service inpatient data to compare 14 approaches in modelling the distribution of healthcare costs: nine of which are parametric and have commonly been used to fit healthcare costs, and five others are designed specifically to construct a counterfactual distribution. Our results indicate that no one method is clearly dominant and that there is a trade-off between bias and precision of tail probability forecasts. We find that distributional methods demonstrate significant potential, particularly with larger sample sizes where the variability of predictions is reduced. Parametric distributions such as log-normal, generalised gamma and generalised beta of the second kind are found to estimate tail probabilities with high precision but with varying bias depending upon the cost threshold being considered. Copyright © 2015 John Wiley & Sons, Ltd.
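To make the notion of a tail probability concrete, the following is a minimal sketch of how one of the parametric candidates named above, the log-normal, might be used to estimate the probability that a cost exceeds a threshold. It is not the authors' procedure: the data are synthetic draws (not NHS inpatient records), no covariates are included, and the function names, the threshold of 10,000 and the simulation parameters are assumptions chosen purely for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def lognormal_tail_prob(costs, threshold):
    """Estimate P(cost > threshold) from a log-normal fitted by maximum likelihood."""
    logs = [math.log(c) for c in costs]
    mu = fmean(logs)      # MLE of the mean of log-costs
    sigma = pstdev(logs)  # MLE of the standard deviation of log-costs
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(threshold))


def empirical_tail_prob(costs, threshold):
    """Share of observed costs above the threshold (non-parametric benchmark)."""
    return sum(c > threshold for c in costs) / len(costs)


# Synthetic costs for illustration only (assumed log-scale mean 7, sd 1).
rng = random.Random(0)
costs = [rng.lognormvariate(7.0, 1.0) for _ in range(5000)]

threshold = 10_000.0  # hypothetical cost threshold
parametric = lognormal_tail_prob(costs, threshold)
empirical = empirical_tail_prob(costs, threshold)
print(f"log-normal estimate: {parametric:.4f}, empirical share: {empirical:.4f}")
```

Because the synthetic data here really are log-normal, the parametric estimate should sit close to the empirical share. With real cost data the fitted distribution may be misspecified, which is one way the bias at particular thresholds described in the abstract can arise.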

Original language: English
Pages (from-to): 1192-1212
Number of pages: 21
Journal: Health Economics
Issue number: 9
Early online date: 30 Apr 2015
Publication status: Published - 4 Aug 2015

Bibliographical note

Copyright © 2015 John Wiley & Sons, Ltd. This is an author produced version of a paper published in Health Economics. Uploaded in accordance with the publisher's self-archiving policy.
