Optimising Unicode Regular Expression Evaluation with Previews

Research output: Contribution to journal › Article

Publication details

Journal: Software: Practice and Experience
Date: Accepted/In press - 28 Jul 2016
Date: Published (current) - 15 Sep 2016
Number of pages: 20
Original language: English

Abstract

The jsre regular expression library was designed to provide fast matching of complex expressions over large input streams using user-selectable character encodings. An established design approach was used: a simulated non-deterministic finite automaton (NFA) implemented as a virtual machine, which avoids exponential cost in either space or time. A deterministic finite automaton (DFA) was chosen as a general dispatching mechanism for Unicode character classes, and this also provided the opportunity to use compact DFAs in various optimization strategies. The result was the development of a regular expression Preview, which summarizes all the matches possible from a given point in a regular expression in a form that can be implemented as a compact DFA and used to further improve the performance of the standard NFA simulation algorithm. This paper formally defines a preview and describes and evaluates several optimizations that use this construct. They provide significant speed improvements through fast scanning of anchor positions, avoiding the retesting of repeated strings in unanchored searches, and efficient searching of multiple alternate expressions; in the case of keyword searching, the last of these has a time complexity that is logarithmic in the number of words to be searched.
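The abstract describes a preview only informally, as a summary of the matches possible from a point in an expression. As an illustration, the sketch below assumes a simplified reading: a depth-k preview is a list of character sets S_0, ..., S_{k-1}, where S_i holds every character that can occur at offset i of some match, together with the shortest match length. It uses that preview to skip start positions where no match can begin (the idea behind fast scanning of anchor positions) and passes the surviving positions to Python's `re` module, which stands in here for the NFA simulation. The AST classes and the functions `word`, `truncated`, `preview`, `candidate_positions`, `to_pattern` and `search` are hypothetical. This is not jsre's API or the paper's formal definition, and it stores plain sets rather than compact DFAs.

```python
import re
from dataclasses import dataclass

# Hypothetical regex AST, used only for this sketch (not jsre's representation).
@dataclass(frozen=True)
class Lit:
    ch: str  # a single character

@dataclass(frozen=True)
class Cls:
    chars: frozenset  # non-empty set of single characters

@dataclass(frozen=True)
class Cat:
    left: object
    right: object

@dataclass(frozen=True)
class Alt:
    left: object
    right: object

@dataclass(frozen=True)
class Star:
    body: object

def word(s):
    """Concatenate the characters of a non-empty string (hypothetical helper)."""
    node = Lit(s[0])
    for c in s[1:]:
        node = Cat(node, Lit(c))
    return node

def truncated(node, k):
    """Every match of `node`, cut to at most k characters (assumes k >= 1)."""
    if isinstance(node, Lit):
        return {node.ch}
    if isinstance(node, Cls):
        return set(node.chars)
    if isinstance(node, Cat):
        rights = truncated(node.right, k)
        return {(x + y)[:k] for x in truncated(node.left, k) for y in rights}
    if isinstance(node, Alt):
        return truncated(node.left, k) | truncated(node.right, k)
    if isinstance(node, Star):
        body, result = truncated(node.body, k), {""}
        while True:  # fixpoint over strings of length <= k, so it terminates
            grown = result | {(x + y)[:k] for x in result for y in body}
            if grown == result:
                return result
            result = grown
    raise TypeError(f"unknown node: {node!r}")

def preview(node, k):
    """Return (sets, min_len): sets[i] holds each character that can occur at
    offset i of some match; min_len is the shortest match length, capped at k."""
    words = truncated(node, k)
    sets = [frozenset(w[i] for w in words if len(w) > i) for i in range(k)]
    return sets, min(len(w) for w in words)

def candidate_positions(node, k, text):
    """Yield start positions the preview cannot rule out. Every match is at
    least min_len long, so only offsets below min_len are checked."""
    sets, min_len = preview(node, k)
    for p in range(len(text) - min_len + 1):
        if all(text[p + i] in sets[i] for i in range(min_len)):
            yield p

def to_pattern(node):
    """Translate the AST into an equivalent Python `re` pattern."""
    if isinstance(node, Lit):
        return re.escape(node.ch)
    if isinstance(node, Cls):
        return "[" + "".join(re.escape(c) for c in sorted(node.chars)) + "]"
    if isinstance(node, Cat):
        return f"(?:{to_pattern(node.left)})(?:{to_pattern(node.right)})"
    if isinstance(node, Alt):
        return f"(?:{to_pattern(node.left)}|{to_pattern(node.right)})"
    if isinstance(node, Star):
        return f"(?:{to_pattern(node.body)})*"
    raise TypeError(f"unknown node: {node!r}")

def search(node, k, text):
    """Leftmost match, running the full matcher (Python's `re`, standing in
    for the NFA simulation) only at preview-approved start positions."""
    compiled = re.compile(to_pattern(node))
    for p in candidate_positions(node, k, text):
        m = compiled.match(text, p)
        if m:
            return m.start(), m.group()
    return None

keywords = Alt(Alt(word("cat"), word("car")), word("dog"))
print(list(candidate_positions(keywords, 3, "a cart and a dog")))  # → [2, 13]
print(search(keywords, 3, "a cart and a dog"))  # → (2, 'car')
```

Because the filter only discards positions where no match can start, `search` returns the same leftmost match as `re.search` on the translated pattern; the saving comes from never invoking the matcher at rejected positions. Matches shorter than k weaken the filter, since only the first `min_len` offsets can be checked soundly.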

Bibliographical note

© 2016 John Wiley & Sons, Ltd. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.
