Abstract
The field of human face analysis, for applications such as face recognition, is mature, with a rich history of technological development. Approaches range from early hand-crafted models, through current deep learning methods driven by big data, to more recent hybrid approaches that use deep networks to parameterise explicitly defined, semantically meaningful models. The study of individual vocal tract shape, a critical component in understanding the sources of individual variation in the speech signal, is a newer field: detailed study of individual shape variation in 3D has only recently become possible, owing to the increased availability of medical imaging data for a range of individuals. Although there are important differences between the two domains (in particular, vocal tract shape is usually not observed directly during speech), there are also enough similarities that face analysis is instructive when developing models of vocal tract shape variation. In particular, concepts such as disentanglement (i.e. separating facial identity from facial expression) and the modelling of individual movement patterns offer a framework for developing a vocal tract model that describes both between- and within-subject shape variation. This study considers the lessons that the development of face analysis technology offers for understanding vocal tract shape variation, with the aim of adding a critical degree of explainability to vocal identification methods based on the acoustic signal alone.
Original language | English |
---|---|
Publication status | Unpublished - 29 Aug 2024 |
Event | VoiceID, Marburg, Germany, 28 Aug 2024 → 30 Aug 2024 |
Conference
Conference | VoiceID |
---|---|
Country/Territory | Germany |
City | Marburg |
Period | 28/08/24 → 30/08/24 |