OCR post-correction for detecting adversarial text images

Research output: Contribution to journal › Article › peer-review

Publication details

Journal: Journal of Information Security and Applications
Date: Accepted/In press - 25 Mar 2022
Date: E-pub ahead of print - 2 Apr 2022
Date: Published (current) - 2 Apr 2022
Volume: 66
Number of pages: 12
Early online date: 2 Apr 2022
Original language: English

Abstract

The number of images with embedded text shared on Online Social Networks (OSNs), such as Twitter or Facebook, has been growing in recent years. It is becoming important to analyse the images uploaded to these platforms, as adversaries may spread images with toxic content or misinformation (i.e., spam). Optical character recognition (OCR) systems have been used to detect images with malicious content: the embedded text is extracted and then classified using machine learning algorithms. However, most existing OCR-based systems are adversary-agnostic models, in which the text extracted from an image is not checked by humans before classification. Consequently, these fully automated models are vulnerable to minor modifications of image pixels or textual content (e.g., character-level perturbations), which do not affect human understanding but can cause OCR systems to misrecognise the embedded text. In this paper, we propose an OCR post-correction algorithm to improve the robustness of OCR-based systems against images with perturbed embedded text. Experimental results showed that the proposed algorithm improves the robustness of three state-of-the-art OCR models by at least 10% against adversarial text images, and that it outperforms five spellcheckers in correcting adversarial text. We also evaluated the perceptibility of our adversarial images with human participants, 91% of whom were able to correctly recognise the adversarial text images. Additionally, we developed an adversary-aware OCR-based system for detecting adversarial text images using the proposed algorithm, and our evaluation results showed a considerable improvement in the performance of an OCR-based system.
