Integrating Information Retrieval & Neural Networks

Research output: Thesis (Doctoral Thesis)

Abstract

Due to the proliferation of information in databases and on the Internet, users are overwhelmed, leading to Information Overload. It is impossible for humans to index and search such a wealth of information by hand, so automated indexing and searching techniques are required. In this dissertation, we explore current Information Retrieval (IR) techniques and their shortcomings, and we consider how more sophisticated approaches can be developed to aid retrieval.

Current techniques can be slow because of the sheer size of the search space, although faster ones are being developed. Matching is often poor: retrieving a large number of documents does not necessarily mean that the retrievals are of high quality. Many current approaches simply return the documents containing the greatest number of 'query words'. A methodology is desired to: process documents without supervision; generate an index using a data structure that is memory efficient, fast, incremental and scalable; identify spelling mistakes in the query and suggest alternative spellings; handle paraphrasing of documents and synonyms for both indexing and searching; focus retrieval by minimising the search space; and, finally, calculate the query-document similarity from statistics autonomously derived from the text corpus.

We describe our IR system, named MinerTaur, developed using both the AURA modular neural system and a hierarchical, growing, self-organising neural technique based on Growing Cell Structures, which we call TreeGCS. We integrate three modules in MinerTaur: a spell checker; a hierarchical thesaurus generated from corpus statistics inferred by the system; and a word-document matrix that efficiently stores the associations between the documents and their constituent words. We describe each module individually and evaluate each against alternative data structures and benchmark implementations, identifying improvements in memory usage, spelling recall accuracy, cluster quality, and training and recall times. Finally, we compare MinerTaur against a benchmark IR system, SMART, developed at Cornell University, and show that MinerTaur achieves superior recall and precision.
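The abstract notes that many systems simply rank documents by how many query words they contain, and that MinerTaur stores word-document associations in a matrix built with AURA. The following is a minimal, hypothetical sketch of both ideas using a binary word-document matrix in the spirit of a correlation matrix memory: indexing sets one bit per (word, document) pair, and retrieval sums the rows of the query words and keeps the documents that reach the highest count. It assumes whitespace tokenisation and uses Python ints as bit vectors; the class `WordDocumentMatrix` and its methods are invented for illustration and are not AURA's API or MinerTaur's actual implementation.

```python
from collections import defaultdict


class WordDocumentMatrix:
    """Hypothetical binary word-document matrix (illustration only).

    Each word's row is a Python int used as a bit vector:
    bit d is set if document d contains the word.
    """

    def __init__(self):
        self.rows = defaultdict(int)  # word -> bit vector over documents
        self.num_docs = 0

    def add_document(self, text):
        """Index one document incrementally and return its integer id."""
        doc_id = self.num_docs
        self.num_docs += 1
        for word in set(text.lower().split()):  # assumed: whitespace tokens
            self.rows[word] |= 1 << doc_id  # set the (word, doc) bit
        return doc_id

    def match_counts(self, query):
        """Sum the query words' rows: number of distinct query words per document."""
        counts = [0] * self.num_docs
        for word in set(query.lower().split()):
            row = self.rows.get(word, 0)  # .get() does not add unknown words
            for doc_id in range(self.num_docs):
                if (row >> doc_id) & 1:
                    counts[doc_id] += 1
        return counts

    def best_matches(self, query):
        """Threshold at the maximum count to keep the best-matching documents."""
        counts = self.match_counts(query)
        top = max(counts, default=0)
        if top == 0:
            return []  # no document contains any query word
        return [d for d, c in enumerate(counts) if c == top]


m = WordDocumentMatrix()
m.add_document("neural networks for retrieval")
m.add_document("information retrieval systems")
m.add_document("growing cell structures")
print(m.match_counts("neural retrieval"))
print(m.best_matches("neural retrieval"))
```

Keeping only the documents at the top count reproduces the "greatest number of query words" behaviour the abstract criticises; the spell checker, thesaurus and corpus-derived similarity statistics described above are exactly what this bare sketch leaves out.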
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution: University of York
Edition: YCST-2002-04
Publication status: Published, 1 Dec 2001
