Abstract
A system for 'tagging' words with their part-of-speech (POS) tags is constructed. The system has two components: a lexicon containing the set of possible POS tags for a given word, and rules which use a word's context to *eliminate* possible tags for that word. The Inductive Logic Programming (ILP) system Progol is used to induce these rules in the form of definite clauses. The final theory contained 885 clauses. For background knowledge, Progol uses a simple grammar, where the tags are terminals and predicates such as `nounp` (noun phrase) are nonterminals. Progol was altered to allow the caching of information about clauses generated during the induction process, which greatly increased efficiency. The system achieved a per-word accuracy of 96.4% on known words drawn from sentences without quotation marks. This is on a par with other tagging systems induced from the same data [DaeZavBerGil96-WVLC96, Bri94-AAAI94, CutKupPedSib92-ANLP92], which all have accuracies in the range 96-97%. The per-sentence accuracy was 49.5%.
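The two-component design described in the abstract (a lexicon of candidate tags, plus context rules that remove candidates) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the lexicon entries, the tag names, the two hand-written rules, and the choice never to remove a word's last remaining tag. The actual system induces its rules with Progol as definite clauses, and none of them are reproduced here.

```python
from typing import Callable, Dict, List, Set

# A rule looks at the candidate tag sets to the left and right of a word
# and returns True if the given tag should be eliminated for that word.
Rule = Callable[[List[Set[str]], str, List[Set[str]]], bool]

# Hypothetical lexicon: each known word maps to its set of possible POS tags.
LEXICON: Dict[str, Set[str]] = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "rusts": {"NNS", "VBZ"},
}


def modal_after_determiner(left, tag, right) -> bool:
    """Toy rule: eliminate a modal reading directly after an unambiguous determiner."""
    return tag == "MD" and len(left) > 0 and left[-1] == {"DT"}


def base_verb_after_determiner(left, tag, right) -> bool:
    """Toy rule: eliminate a base-form verb reading directly after an unambiguous determiner."""
    return tag == "VB" and len(left) > 0 and left[-1] == {"DT"}


# Hand-written stand-ins for the induced rule set.
RULES: List[Rule] = [modal_after_determiner, base_verb_after_determiner]


def eliminate_tags(words: List[str],
                   lexicon: Dict[str, Set[str]] = LEXICON,
                   rules: List[Rule] = RULES) -> List[Set[str]]:
    """Look up candidate tags for each word, then let the rules prune them."""
    # Copy the lexicon entries so the lexicon itself is never modified.
    candidates = [set(lexicon[w]) for w in words]
    for i in range(len(candidates)):
        left, right = candidates[:i], candidates[i + 1:]
        # Iterate over a sorted snapshot so the set can be modified safely.
        for tag in sorted(candidates[i]):
            # Assumption of this sketch: always keep at least one tag per word.
            if len(candidates[i]) > 1 and any(r(left, tag, right) for r in rules):
                candidates[i].discard(tag)
    return candidates


if __name__ == "__main__":
    print(eliminate_tags(["the", "can", "rusts"]))
```

In this toy run the word "can" is reduced to a single noun reading, while "rusts" stays ambiguous because no rule applies to it. How remaining ambiguity is resolved is not covered by the sketch.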
Original language | English |
---|---|
Title of host publication | Inductive Logic Programming: Proceedings of the 7th International Workshop (ILP-97). LNAI 1297 |
Publisher | Springer |
Pages | 93-108 |
Number of pages | 16 |
Publication status | Published - 1997 |