Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Publication details

Title of host publication: Interspeech 2017
Date: Published - 2017
Original language: English

Publication series

ISSN (Electronic): 1990-9772


Following recent advances in direct modeling of the speech waveform using a deep neural network, we propose a novel method that directly estimates a physical model of the vocal tract from the speech waveform, rather than magnetic resonance imaging data. This provides a clear relationship between the model and the size and shape of the vocal tract, offering considerable flexibility in terms of speech characteristics such as age and gender. Initial tests indicate that despite a highly simplified physical model, intelligible synthesized speech is obtained. This illustrates the potential of the combined technique for the control of physical models in general, and hence the generation of more natural-sounding synthetic speech.

Bibliographical note

© 2017 ISCA. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.

Research areas

  • speech synthesis, digital waveguide mesh, deep neural network

