Articulatory speech synthesis has the potential to offer more natural-sounding synthetic speech than established concatenative or parametric synthesis methods. Time-domain acoustic models are particularly suited to the dynamic nature of the speech signal, and recent work has demonstrated the potential of dynamic vocal tract models that accurately reproduce the vocal tract geometry. This paper presents a dynamic 3D digital waveguide mesh (DWM) vocal tract model, capable of movement to produce diphthongs. The technique is compared to existing dynamic 2D and static 3D DWM models, for both monophthongs and diphthongs. The results indicate that the proposed model provides improved formant accuracy over existing DWM vocal tract models. Furthermore, the computational requirements of the proposed method are significantly lower than those of comparable dynamic simulation techniques. This paper represents another step toward a fully functional articulatory vocal tract model, which will lead to more natural speech synthesis systems for use across society.
| Number of pages | 13 |
| Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
| Early online date | 17 Nov 2017 |
| Publication status | Published - Feb 2018 |
Bibliographical note: © 2017 IEEE. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.
- Speech synthesis
- digital waveguide mesh
- numerical acoustic modeling