Bilingual lexicon extraction from comparable corpora using in-domain terms

Azniah Ismail, Suresh Manandhar

Research output: Contribution to conference › Paper › peer-review


Many existing methods for bilingual lexicon learning from comparable corpora are based on the similarity of context vectors. These methods suffer from noisy vectors that greatly affect their accuracy. We introduce a method for filtering this noise, allowing highly accurate learning of bilingual lexicons. Our method is based on the notion of in-domain terms, which can be thought of as the most important contextually relevant words. We provide a method for identifying such terms. Our evaluation shows that the proposed method can learn highly accurate bilingual lexicons without using orthographic features or a large initial seed dictionary. In addition, we introduce a method for measuring the similarity between two words in different languages without requiring any initial dictionary.

Original language: Undefined/Unknown
Publication status: Published - 2010