Text-to-speech synthesis in computer-assisted language learning

Research output: Chapter in Book/Report/Conference proceeding > Entry for encyclopedia/dictionary


In very simple terms, speech synthesis is the process of making the computer talk, and text-to-speech (TTS) synthesis is a specific type which takes raw text as input and aims to mimic the human process of reading. TTS is the technology which allows dictation software, such as IBM ViaVoice, to read your dictated texts back to you. It was also the technology behind Texas Instruments' Speak & Spell spelling bee program launched in the 1980s. While interest in the use of TTS in computer-assisted language learning (CALL) only really began to take hold around the year 2000, some of the potential benefits of the use of TTS in CALL had already been identified around the time that Speak & Spell was launched. At this time, Bruce Sherwood (1981) observed that typing/editing text is easier than (re)recording voice, that navigating through a textual database is easier than retrieving recorded samples from an audiotape, and that TTS synthesis has the capacity to generate speech models on demand. Researchers continue to see these benefits in the use of TTS synthesis in CALL (Dutoit, 1997; Keller and Zellner-Keller, 2000; Handley and Hamel, 2005; Kang, Kashiwagi, Treviranus, & Kaburagi, 2008). This entry discusses the way in which these benefits have been exploited to provide learners with tools to support them in their language-learning activities as well as tutorial software focusing on the acquisition of specific areas of linguistic knowledge and language skills.
Original language: English
Title of host publication: The Encyclopedia of Applied Linguistics
Editors: Carol Chapelle
ISBN (Electronic): 978-1-4051-9843-1
ISBN (Print): 978-1-4051-9473-0
Publication status: Published - Nov 2012

Bibliographical note

© 2013 Blackwell Publishing Ltd.
