TY - JOUR
T1 - Cost-effectiveness of Microsoft Academic Graph with machine learning for automated study identification in a living map of coronavirus disease 2019 (COVID-19) research
AU - Shemilt, Ian
AU - Arno, Anneliese
AU - Thomas, James
AU - Lorenc, Theo
AU - Khouja, Claire
AU - Raine, Gary
AU - Sutcliffe, Katy
AU - D'Souza, Preethy
AU - Kwan, Irene
AU - Wright, Kath
AU - Sowden, Amanda
N1 - Publisher Copyright: © 2024 Shemilt I et al.
PY - 2024/3/26
Y1 - 2024/3/26
N2 - Background: Identifying new, eligible studies for integration into living systematic reviews and maps usually relies on conventional Boolean updating searches of multiple databases and manual processing of the updated results. Automated searches of one, comprehensive, continuously updated source, with adjunctive machine learning, could enable more efficient searching, selection and prioritisation workflows for updating (living) reviews and maps, though research is needed to establish this. Microsoft Academic Graph (MAG) is a potentially comprehensive single source which also contains metadata that can be used in machine learning to help efficiently identify eligible studies. This study sought to establish whether: (a) MAG was a sufficiently sensitive single source to maintain our living map of COVID-19 research; and (b) eligible records could be identified with an acceptably high level of specificity. Methods: We conducted an eight-arm cost-effectiveness analysis to assess the costs, recall and precision of semi-automated workflows, incorporating MAG with adjunctive machine learning, for continually updating our living map. Resource use data (time use) were collected from information specialists and other researchers involved in map production. Our systematic review software, EPPI-Reviewer, was adapted to incorporate MAG and associated machine learning workflows, and also used to collect data on recall, precision, and manual screening workload. Results: The semi-automated MAG-enabled workflow dominated conventional workflows in both the base case and sensitivity analyses. At one month our MAG-enabled workflow with machine learning, active learning and fixed screening targets identified 469 additional, eligible articles for inclusion in our living map, and cost £3,179 GBP per week less, compared with conventional methods relying on Boolean searches of Medline and Embase. Conclusions: We were able to increase recall and coverage of a large living map, whilst reducing its production costs. This finding is likely to be transferrable to OpenAlex, MAG’s successor database platform.
KW - Automation
KW - evidence synthesis
KW - machine learning
KW - systematic map
KW - systematic review
UR - http://www.scopus.com/inward/record.url?scp=85192191622&partnerID=8YFLogxK
U2 - 10.12688/wellcomeopenres.17141.2
DO - 10.12688/wellcomeopenres.17141.2
M3 - Article
AN - SCOPUS:85192191622
SN - 2398-502X
VL - 6
JO - Wellcome Open Research
JF - Wellcome Open Research
M1 - 210
ER -