Identifying depression with the PHQ-2: A diagnostic meta-analysis

Laura Manea, Simon Gilbody, Catherine Hewitt, Alice North, Faye Plummer, Rachel Richardson, Brett D Thombs, Bethany Williams, Dean McMillan

Research output: Contribution to journal › Article › peer-review

Abstract

BACKGROUND: There is interest in the use of very brief instruments to identify depression because of the advantages they offer in busy clinical settings. The PHQ-2, consisting of two questions relating to core symptoms of depression (low mood and loss of interest or pleasure), is one such instrument.

METHOD: A systematic review was conducted to identify studies that had assessed the diagnostic performance of the PHQ-2 to detect major depression. Embase, MEDLINE, PsycINFO and grey literature databases were searched. Reference lists of included studies and previous relevant reviews were also examined. Studies were included if they used the standard scoring system of the PHQ-2, assessed its performance against a gold-standard diagnostic interview and reported data on its performance at the recommended (≥3) or an alternative (≥2) cut-off point. After heterogeneity had been assessed, data from studies were combined, where appropriate, using bivariate diagnostic meta-analysis to derive sensitivity, specificity, likelihood ratios and diagnostic odds ratios.

RESULTS: 21 studies met the inclusion criteria, totalling N=11,175 people, of whom 1529 had major depressive disorder according to a gold standard. 19 of the 21 included studies reported data for a cut-off point of ≥3. Pooled sensitivity was 0.76 (95% CI=0.68-0.82) and pooled specificity was 0.87 (95% CI=0.82-0.90). However, there was substantial heterogeneity at this cut-off (I²=81.8%). 17 studies reported data on the performance of the measure at a cut-off point of ≥2. Heterogeneity was I²=43.2%; pooled sensitivity at this cut-off point was 0.91 (95% CI=0.85-0.94), and pooled specificity was 0.70 (95% CI=0.64-0.76).
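As a rough illustration of how likelihood ratios and the diagnostic odds ratio relate to sensitivity and specificity, the sketch below applies the standard textbook formulas to the pooled point estimates quoted above. The function name and the dictionary of cut-offs are illustrative assumptions. The resulting figures are simple back-calculations and are not the values reported by the authors, whose bivariate model estimates these quantities jointly.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from a sensitivity/specificity pair.

    Standard definitions:
      LR+ = sensitivity / (1 - specificity)
      LR- = (1 - sensitivity) / specificity
      DOR = LR+ / LR-
    """
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor


# Pooled point estimates taken from the RESULTS paragraph
# (confidence intervals are ignored in this sketch).
POOLED = {
    ">=3": (0.76, 0.87),
    ">=2": (0.91, 0.70),
}

for cutoff, (sens, spec) in POOLED.items():
    lr_pos, lr_neg, dor = likelihood_ratios(sens, spec)
    print(f"cut-off {cutoff}: LR+={lr_pos:.2f}, LR-={lr_neg:.2f}, DOR={dor:.1f}")
```

Under these assumptions, the lower cut-off gives a smaller negative likelihood ratio (a negative screen is more reassuring), while the higher cut-off gives a larger positive likelihood ratio.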

CONCLUSION: The generally lower sensitivity of the PHQ-2 at cut-off ≥3 than in the original validation study (0.83) suggests that ≥2 may be preferable if clinicians want to ensure that few cases of depression are missed. However, in situations in which the prevalence of depression is low, this may result in an unacceptably high false-positive rate because of the associated modest specificity. These results, however, need to be interpreted with caution given the possibility of selectively reported cut-offs.
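The dependence of false positives on prevalence can be made concrete with Bayes' theorem. The sketch below computes the positive predictive value (the share of positive screens that are true cases) from the pooled point estimates at each cut-off. The prevalence values of 5% and 20% are hypothetical and chosen only for illustration. The third value is the crude proportion of cases across the included studies (1529 of 11,175), which need not match the prevalence in any particular clinical setting.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' theorem: P(depressed | positive screen)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Pooled point estimates from the abstract (confidence intervals ignored).
CUTOFFS = {">=3": (0.76, 0.87), ">=2": (0.91, 0.70)}

# 0.05 and 0.20 are hypothetical prevalences; the last value is the crude
# proportion of cases among all participants in the included studies.
PREVALENCES = [0.05, 0.20, 1529 / 11175]

for prevalence in PREVALENCES:
    for cutoff, (sens, spec) in CUTOFFS.items():
        ppv = positive_predictive_value(sens, spec, prevalence)
        print(f"prevalence {prevalence:.3f}, cut-off {cutoff}: PPV={ppv:.2f}")
```

At the hypothetical 5% prevalence, most positive screens at cut-off ≥2 would be false positives under these assumptions, which is the trade-off the conclusion describes.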

Original language: English
Pages (from-to): 382-395
Number of pages: 14
Journal: Journal of affective disorders
Volume: 203
Publication status: Published - 6 Jun 2016

Bibliographical note

Funding Information:
We would like to thank the authors of both the included and excluded studies for their help in answering our questions about their studies. Dr Manea was supported by an NIHR Lectureship award. There was no specific funding for this study, and no funders had any role in the study design, in the collection, analysis or interpretation of data, in the writing of the manuscript or in the decision to submit the manuscript for publication.

Publisher Copyright:
© 2016 Published by Elsevier B.V.

Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.

Keywords

  • Diagnostic accuracy
  • Diagnostic meta-analysis
  • Major depression
  • PHQ-2
  • Screening
  • Ultra-brief screening instruments
  • Depression/diagnosis
  • Psychiatric Status Rating Scales
  • Humans
  • Sensitivity and Specificity
  • Depressive Disorder, Major/diagnosis
