Part-of-Speech Tagging using Progol

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A system for "tagging" words with their part-of-speech (POS) tags is constructed. The system has two components: a lexicon containing the set of possible POS tags for a given word, and rules which use a word's context to eliminate possible tags for that word. The Inductive Logic Programming (ILP) system Progol is used to induce these rules in the form of definite clauses. The final theory contained 885 clauses. For background knowledge, Progol uses a simple grammar, where the tags are terminals and predicates such as nounp (noun phrase) are nonterminals. Progol was altered to allow the caching of information about clauses generated during the induction process, which greatly increased efficiency. The system achieved a per-word accuracy of 96.4% on known words drawn from sentences without quotation marks. This is on a par with other tagging systems induced from the same data (Daelemans et al., 1996; Brill, 1994; Cutting et al., 1992), which all have accuracies in the range 96-97%. The per-sentence accuracy was 49.5%.
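The two-component design described in the abstract (a lexicon of candidate tags plus context rules that eliminate candidates) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the actual rules were induced by Progol as definite clauses, whereas the lexicon entries, the Penn-Treebank-style tag names, and the single hand-written rule below are all hypothetical and chosen only to show the elimination idea.

```python
# Hypothetical toy lexicon: each known word maps to its set of possible tags.
LEXICON = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "rusts": {"VBZ", "NNS"},
}


def no_verb_after_determiner(left, tag, right):
    """Hypothetical elimination rule (not one of the induced clauses):
    drop verb/modal readings directly after an unambiguous determiner."""
    return bool(left) and left[-1] == {"DT"} and tag in {"VB", "MD"}


RULES = [no_verb_after_determiner]


def tag_candidates(words, lexicon=LEXICON, rules=RULES):
    """Look up candidate tags, then apply elimination rules until nothing changes."""
    candidates = [set(lexicon[w]) for w in words]  # known words only
    changed = True
    while changed:
        changed = False
        for i in range(len(candidates)):
            for tag in sorted(candidates[i]):
                if len(candidates[i]) == 1:
                    break  # never eliminate the last remaining tag
                left, right = candidates[:i], candidates[i + 1:]
                if any(rule(left, tag, right) for rule in rules):
                    candidates[i].discard(tag)
                    changed = True
    return candidates


result = tag_candidates(["the", "can", "rusts"])
```

Here the determiner context removes the modal and verb readings of "can", while "rusts" stays ambiguous because no rule in this toy set applies to it. In the system described above, many induced rules would play the role of the single rule shown here.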
Original language: English
Title of host publication: Inductive Logic Programming: Proceedings of the 7th International Workshop (ILP-97). LNAI 1297
Publisher: Springer
Pages: 93-108
Number of pages: 16
Publication status: Published - 1997
