Diphthong Synthesis Using the Dynamic 3D Digital Waveguide Mesh

Research output: Contribution to journal › Article › peer-review

Abstract

Articulatory speech synthesis has the potential to offer more natural-sounding synthetic speech than established concatenative or parametric synthesis methods. Time-domain acoustic models are particularly suited to the dynamic nature of the speech signal, and recent work has demonstrated the potential of dynamic vocal tract models that accurately reproduce the vocal tract geometry. This paper presents a dynamic 3D digital waveguide mesh (DWM) vocal tract model, capable of movement to produce diphthongs. The technique is compared to existing dynamic 2D and static 3D DWM models, for both monophthongs and diphthongs. The results indicate that the proposed model provides improved formant accuracy over existing DWM vocal tract models. Furthermore, the computational requirements of the proposed method are significantly lower than those of comparable dynamic simulation techniques. This paper represents another step toward a fully functional articulatory vocal tract model which will lead to more natural speech synthesis systems for use across society.

Original language: English
Article number: 8114217
Pages (from-to): 243-255
Number of pages: 13
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 26
Issue number: 2
Early online date: 17 Nov 2017
Publication status: Published - Feb 2018

Bibliographical note

© 2017 IEEE. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.

Keywords

  • Speech synthesis
  • digital waveguide mesh
  • diphthongs
  • numerical acoustic modeling
