Understanding vocal tract shape variation: lessons from face recognition

Amelia Gully*, Nick Pears

*Corresponding author for this work

Research output: Contribution to conference (Abstract)

Abstract

Human face analysis, for applications such as face recognition, is a mature field with a rich history of technological development. Approaches range from early hand-crafted models, through current deep learning methods trained on large datasets, to more recent hybrid approaches that use deep networks to parameterise explicitly defined, semantically meaningful models. The study of individual vocal tract shape, a critical component in understanding the sources of individual variation in the speech signal, is a newer field. Detailed study of individual shape variation in 3D has only recently become possible, owing to the increased availability of medical imaging data for a range of individuals. The two domains differ in important ways; in particular, vocal tract shape is usually not observed directly during speech. They are nevertheless similar enough that face analysis is instructive when developing models of vocal tract shape variation. In particular, concepts such as disentanglement (i.e. separating facial identity from facial expression) and the modelling of individual movement patterns offer a framework for developing a vocal tract model that describes both between- and within-subject shape variation. This study considers the lessons that the development of face analysis technology offers for understanding vocal tract shape variation, adding a critical degree of explainability to vocal identification methods based on the acoustic signal alone.
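To make the idea of separating between-subject from within-subject variation concrete, here is a minimal sketch of the simplest additive version of such a disentanglement. It is not the authors' model. Its assumptions are that each vocal tract shape has already been reduced to a flattened vector of corresponding, aligned landmark coordinates, and that the names `decompose` and `variance_split` and the toy data are hypothetical. Each shape is written as grand mean + subject (identity) offset + articulation residual, and the total variance splits exactly into a between-subject part and a within-subject part.

```python
from statistics import fmean

# Hypothetical representation: a shape is a flattened list of landmark
# coordinates, assumed to be in correspondence and already aligned.
Shape = list[float]


def mean_shape(shapes: list[Shape]) -> Shape:
    """Coordinate-wise mean of a set of shapes."""
    return [fmean(coords) for coords in zip(*shapes)]


def sub(a: Shape, b: Shape) -> Shape:
    return [x - y for x, y in zip(a, b)]


def sq_norm(v: Shape) -> float:
    return sum(x * x for x in v)


def decompose(data: dict[str, list[Shape]]):
    """Split each shape into grand mean + identity offset + articulation residual.

    data maps a subject id to that subject's shapes (e.g. one per articulation).
    """
    all_shapes = [s for shapes in data.values() for s in shapes]
    grand = mean_shape(all_shapes)
    identity: dict[str, Shape] = {}            # between-subject component
    articulation: dict[str, list[Shape]] = {}  # within-subject component
    for subject, shapes in data.items():
        subject_mean = mean_shape(shapes)
        identity[subject] = sub(subject_mean, grand)
        articulation[subject] = [sub(s, subject_mean) for s in shapes]
    return grand, identity, articulation


def variance_split(data: dict[str, list[Shape]]) -> tuple[float, float, float]:
    """Return (total, between, within) mean squared deviation; total = between + within."""
    grand, identity, articulation = decompose(data)
    n = sum(len(shapes) for shapes in data.values())
    total = sum(sq_norm(sub(s, grand)) for shapes in data.values() for s in shapes) / n
    # Each subject's identity offset is weighted by how many shapes that subject has.
    between = sum(len(data[k]) * sq_norm(identity[k]) for k in data) / n
    within = sum(sq_norm(r) for residuals in articulation.values() for r in residuals) / n
    return total, between, within
```

For example, with two toy subjects, each recorded in two articulatory postures, `variance_split({"s1": [[1.0, 2.0], [3.0, 2.0]], "s2": [[5.0, 6.0], [7.0, 6.0]]})` gives a total of 9.0, of which 8.0 is between-subject and 1.0 is within-subject. Face models typically go further and fit separate low-dimensional bases (for example by PCA) to the identity offsets and to the residuals. A purely additive split like this one also cannot represent subject-specific movement patterns, the "individual movement patterns" mentioned above, which would need an interaction between the two components.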
Original language: English
Publication status: Unpublished - 29 Aug 2024
Event: VoiceID - Marburg, Germany
Duration: 28 Aug 2024 to 30 Aug 2024

Conference

Conference: VoiceID
Country/Territory: Germany
City: Marburg
Period: 28/08/24 to 30/08/24
