Building Dialectal Arabic Corpora

Hani Abdalla Muftah Elgabou, Dimitar Lubomirov Kazakov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The aim of this research is to identify local Arabic dialects in texts from social media (Twitter) and link them to specific geographic areas. Dialect identification is studied as a subset of the task of language identification. The proposed method is based on unsupervised learning that uses lexical and geographic distance simultaneously. While this study focusses on Libyan dialects, the approach is general and could produce resources to support human translators and interpreters when dealing with vernaculars rather than standard Arabic.
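The abstract does not spell out how lexical and geographic distance are combined, so the following is only a minimal sketch of one way such a joint distance could drive unsupervised grouping. Everything in it is an assumption made for illustration, not the authors' method: the Jaccard measure for lexical distance, the haversine formula for geographic distance, the weight `alpha`, the normalising constant `max_km`, the `threshold`, the single-linkage merging, the placeholder tokens (`w1`, `w2`, ...) and the approximate city coordinates.

```python
import math

def jaccard_distance(a, b):
    """Lexical distance between two token sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def combined_distance(x, y, alpha=0.5, max_km=1000.0):
    """Weighted mix of lexical and geographic distance, both scaled to [0, 1].

    alpha and max_km are hypothetical tuning parameters.
    """
    lexical = jaccard_distance(x["tokens"], y["tokens"])
    geographic = min(haversine_km(x["loc"], y["loc"]) / max_km, 1.0)
    return alpha * lexical + (1 - alpha) * geographic

def cluster(items, threshold=0.4, **kwargs):
    """Single-linkage grouping: join any two items closer than threshold."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if combined_distance(items[i], items[j], **kwargs) < threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(items))]

# Hypothetical users: placeholder vocabulary plus approximate coordinates
# near Tripoli (first two) and Benghazi (last two).
users = [
    {"tokens": {"w1", "w2", "w3"}, "loc": (32.89, 13.19)},
    {"tokens": {"w1", "w2", "w4"}, "loc": (32.90, 13.20)},
    {"tokens": {"w5", "w6", "w7"}, "loc": (32.12, 20.09)},
    {"tokens": {"w5", "w6", "w8"}, "loc": (32.10, 20.10)},
]
labels = cluster(users)
```

In this toy data the first two users share vocabulary and location, as do the last two, so `labels` assigns them to two separate groups. Raising `alpha` makes vocabulary dominate the grouping, while lowering it makes location dominate.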
Original language: English
Title of host publication: The First Workshop on Human-Informed Translation and Interpreting Technology (HiT-IT)
Pages: 52-57
Number of pages: 6
Publication status: Published - 7 Sept 2017
Event: First Workshop on Human-Informed Translation and Interpreting Technology - Varna, Bulgaria
Duration: 7 Sept 2017 to 7 Sept 2017
http://rgcl.wlv.ac.uk/hit-it/

Conference

Conference: First Workshop on Human-Informed Translation and Interpreting Technology
Abbreviated title: HiT-IT
Country/Territory: Bulgaria
City: Varna
Period: 7/09/17 to 7/09/17