NLP Analysis of COVID-19 Radiology Reports in Indonesian using IndoBERT

Nunung Nurul Qomariyah, Tianda Sun, Dimitar Lubomirov Kazakov*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


The presence of COVID-19, a respiratory disease, can be detected through medical imaging such as Chest X-Ray (CXR) and Computed Tomography (CT) scans. These radiology images can also show how a patient's condition progresses. Radiologists need to provide a written report for each image so that other clinicians can use it in their decision making. In this study, we applied IndoBERT, a Natural Language Processing (NLP) model, to analyze radiology reports of COVID-19 patients written in Indonesian. We performed two tasks: clustering, to group reports by meaning and understand their content, and text classification, to predict one of five possible outcomes for each patient. We show the most frequent topics in the radiology reports and the word scores in each topic. In an attempt to improve the IndoBERT model further, we fine-tuned it on a medical text, ‘Kamus Kedokteran Dorland’ (Dorland's Medical Dictionary). This proved unnecessary: the fine-tuning brought no additional benefit, and the standard model alone achieved a satisfactory classification accuracy of over 90%.
Original language: English
Title of host publication: Proceedings of the 4th International Conference on Biomedical Engineering (IBIOMED)
Place of Publication: Yogyakarta, Indonesia
Number of pages: 6
ISBN (Electronic): 9781665460798
Publication status: Published - 18 Oct 2022
Event: 4th International Conference on Biomedical Engineering - Yogyakarta, Indonesia
Duration: 18 Oct 2022 to 19 Oct 2022


Conference: 4th International Conference on Biomedical Engineering
Abbreviated title: IBIOMED


  • Natural Language Processing
  • IndoBERT
  • COVID-19
  • Radiology
