Learning Word Segmentation Rules for Tag Prediction

Research output: Contribution to conference › Paper › peer-review


In our previous work we introduced a hybrid GA&ILP-based approach to learning stem-suffix segmentation rules from an unmarked list of words. Evaluating the method was made difficult by the lack of word corpora annotated with morphological segmentation. Here the hybrid approach is evaluated indirectly, on the task of tag prediction. A pair of stem-tag and suffix-tag lexicons is obtained by applying that approach to an annotated lexicon of word-tag pairs. The two lexicons are then used to predict the tags of unseen words in two ways: (1) using only the stem and suffix generated by the segmentation rules, and (2) using all matching combinations of stem and suffix present in the lexicons. The results show a high correlation between the constituents generated by the segmentation rules and the tags of the words in which they appear, thereby demonstrating the linguistic relevance of the segmentations produced by the hybrid approach.
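The two prediction strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy data, the coarse tags `V` and `N`, the function names, the `toy_segment` stand-in for the learned segmentation rules, and the choice to combine stem and suffix evidence by set intersection are all assumptions made for illustration.

```python
from collections import defaultdict


def build_lexicons(segmented):
    """Build stem-tag and suffix-tag lexicons from (stem, suffix, tag) triples."""
    stem_tags = defaultdict(set)
    suffix_tags = defaultdict(set)
    for stem, suffix, tag in segmented:
        stem_tags[stem].add(tag)
        suffix_tags[suffix].add(tag)
    return dict(stem_tags), dict(suffix_tags)


def toy_segment(word, suffix_tags):
    """Hypothetical stand-in for the learned rules: strip the longest known suffix."""
    for i in range(1, len(word)):
        if word[i:] in suffix_tags:
            return word[:i], word[i:]
    return word, ""


def predict_with_rule(word, segment, stem_tags, suffix_tags):
    """Strategy (1): use only the single segmentation produced by the rules."""
    stem, suffix = segment(word)
    # Assumed combination rule: keep tags supported by both constituents.
    return stem_tags.get(stem, set()) & suffix_tags.get(suffix, set())


def predict_all_splits(word, stem_tags, suffix_tags):
    """Strategy (2): try every split whose stem and suffix are both in the lexicons."""
    tags = set()
    for i in range(1, len(word) + 1):
        stem, suffix = word[:i], word[i:]
        if stem in stem_tags and suffix in suffix_tags:
            tags |= stem_tags[stem] & suffix_tags[suffix]
    return tags


# Toy annotated lexicon, already segmented (invented data, not from the paper).
segmented = [
    ("walk", "ed", "V"),
    ("walk", "s", "V"),
    ("talk", "ing", "V"),
    ("dog", "s", "N"),
    ("cat", "", "N"),
]
stem_tags, suffix_tags = build_lexicons(segmented)

# "talked" and "cats" do not occur in the toy lexicon.
rule_prediction = predict_with_rule(
    "talked", lambda w: toy_segment(w, suffix_tags), stem_tags, suffix_tags
)
split_prediction = predict_all_splits("cats", stem_tags, suffix_tags)
```

In this toy setting the suffix `s` alone is ambiguous between `V` and `N`, while the stem `cat` narrows the choice, which is the kind of constituent-tag correlation the evaluation relies on.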
Original language: Undefined/Unknown
Publication status: Published - 1999
