Using document dimensions for enhanced information retrieval

T Jayasooriya, S Manandhar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Conventional document search techniques are constrained by attempting to match individual keywords or phrases against source documents. As a result, they miss documents that contain semantically similar terms and therefore achieve relatively low recall. At the same time, processing capabilities and tools for the syntactic and semantic analysis of language have advanced to the point where index-time linguistic analysis of source documents is both feasible and realistic. In this paper, we introduce document dimensions, a means of classifying or grouping the terms discovered in documents. Using an enhanced version of Jakarta Lucene[1], we demonstrate that supplementing keyword analysis with some syntactic and semantic information can indeed improve the quality of information retrieval results.
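To make the idea of document dimensions concrete, here is a minimal sketch, not the authors' Lucene-based implementation. It assumes two hypothetical dimensions: a "keyword" dimension that holds the literal terms, and a "semantic" dimension populated at index time from a small hand-written synonym table (`SYNONYMS`, a stand-in for whatever lexical resource a real system would use). A query restricted to the keyword dimension behaves like conventional matching. Adding the semantic dimension also retrieves documents that use a related term, which is the recall gain the abstract describes. All names here (`DimensionIndex`, `SYNONYMS`, the dimension labels) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical synonym table standing in for a real semantic resource;
# the paper does not say which resource its semantic analysis uses.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile"},
}

PUNCT = ".,;:!?"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


class DimensionIndex:
    """Inverted index keeping a separate posting map per (hypothetical) dimension."""

    def __init__(self):
        # dimension name -> term -> set of document ids
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, text):
        """Index-time analysis: record each token and its synonyms in separate dimensions."""
        for tok in tokenize(text):
            self.postings["keyword"][tok].add(doc_id)
            for syn in SYNONYMS.get(tok, ()):
                self.postings["semantic"][syn].add(doc_id)

    def search(self, query, dimensions=("keyword",)):
        """Rank documents by how many (query term, dimension) pairs they match."""
        scores = defaultdict(int)
        for tok in tokenize(query):
            for dim in dimensions:
                for doc_id in self.postings[dim].get(tok, ()):
                    scores[doc_id] += 1
        # Higher score first; ties broken by document id for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    idx = DimensionIndex()
    idx.add(1, "The car was parked outside.")
    idx.add(2, "An automobile dealer opened downtown.")
    idx.add(3, "Fresh bread from the bakery.")
    print(idx.search("car"))                                       # [1]
    print(idx.search("car", dimensions=("keyword", "semantic")))   # [1, 2]
```

Keeping each dimension in its own posting map, loosely analogous to separate fields in a Lucene index, lets a query choose which dimensions to consult and lets a ranking function weight them differently; the simple match count used here is only a placeholder for a real scoring scheme.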

Original language: English
Title of host publication: Applied Computing, Proceedings
Editors: S Manandhar, J Austin, U Desai, Y Oyanagi, A Talukder
Place of publication: Berlin
Publisher: Springer
Pages: 145-152
Number of pages: 8
ISBN (Print): 3-540-23659-7
Publication status: Published - 2004
Event: 2nd Asian Applied Computing Conference - Kathmandu
Duration: 29 Oct 2004 to 31 Oct 2004

Conference

Conference: 2nd Asian Applied Computing Conference
City: Kathmandu
Period: 29/10/04 to 31/10/04
