Testing semantic dominance in Mian gender: Three machine learning models

Marc Allassonnière-Tang, Dunstan Brown, Sebastian Fedden

Research output: Contribution to journal › Article › peer-review

Abstract

The Trans New Guinea language Mian has a four-valued gender system that has been analyzed in detail as semantic. This means that the principles of gender assignment are based on the meaning of the noun. Languages with purely semantic systems are at one end of a spectrum of possible assignment types, while others are assumed to have both semantic and formal (i.e. phonology- or morphology-based) assignment. Given the possibility of gender assignment by both semantic and formal principles, it is worthwhile testing the empirical validity of the categorization of the Mian system as predominantly semantic. Here we apply three machine-learning models to determine independently what role semantics and phonology play in predicting Mian gender. Information about the formal and semantic features of nouns is extracted automatically from a dictionary. Different types of computational classifiers are trained to predict the grammatical gender of nouns, and their performance is used to assess the relevance of form and semantics to gender prediction. The results show that semantics is dominant in predicting the gender of nouns in Mian. While this validates the original analysis of the Mian system, it also confirms that form-based and semantic features do not contribute equally in all languages with gender. More generally, our work demonstrates the value of computational methods for validating analyses of gender systems.
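The comparison logic described in the abstract (train a classifier on one feature type at a time, then compare predictive accuracy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the records are synthetic and are not Mian data, the labels `g1` to `g4` and the feature values are placeholders, and the classifier is a simple majority-label lookup evaluated by leave-one-out, not the decision trees, random forests, or neural networks used in the article.

```python
from collections import Counter, defaultdict

# Entirely synthetic toy records, NOT Mian data (hypothetical values):
# (semantic_class, final_sound, gender)
TOY_NOUNS = [
    ("male", "n", "g1"), ("male", "a", "g1"), ("male", "o", "g1"),
    ("female", "n", "g2"), ("female", "a", "g2"), ("female", "o", "g2"),
    ("object", "n", "g3"), ("object", "a", "g3"), ("object", "o", "g3"),
    ("place", "n", "g4"), ("place", "a", "g4"), ("place", "o", "g4"),
]


def loo_accuracy(records, feature_index):
    """Leave-one-out accuracy of a majority-label lookup on a single feature."""
    correct = 0
    for i, held_out in enumerate(records):
        train = records[:i] + records[i + 1:]
        by_value = defaultdict(Counter)  # feature value -> gender counts
        overall = Counter()              # fallback for unseen feature values
        for rec in train:
            by_value[rec[feature_index]][rec[-1]] += 1
            overall[rec[-1]] += 1
        counts = by_value.get(held_out[feature_index]) or overall
        prediction = counts.most_common(1)[0][0]
        correct += prediction == held_out[-1]
    return correct / len(records)


semantic_acc = loo_accuracy(TOY_NOUNS, 0)  # semantic feature only
formal_acc = loo_accuracy(TOY_NOUNS, 1)    # formal (phonological) feature only
print(f"semantic: {semantic_acc:.2f}  formal: {formal_acc:.2f}")
```

The toy set is deliberately constructed so that the semantic feature determines the label and the formal feature does not; it shows only how the two accuracies would be compared, and nothing about Mian follows from it.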
Original language: English
Pages (from-to): 302-334
Number of pages: 33
Journal: Oceanic Linguistics
Volume: 60
Issue number: 2
Early online date: 11 Oct 2021
Publication status: Published - 1 Dec 2021

Bibliographical note

© 2021 University of Hawai'i Press. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.

Keywords

  • gender
  • gender agreement
  • machine learning
  • neural networks
  • decision trees
  • random forests
  • Trans New Guinea languages
