Development of Indonesian Text-to-Audiovisual Synthesis System Using Syllable Concatenation Approach to Support Indonesian Learning

Arifin, Surya Sumpeno, Mochamad Hariadi, Arry Maulana Syarif


This study develops an Indonesian text-to-audiovisual synthesis system that uses a syllable concatenation approach to support Indonesian language learning. The system visualizes syllable pronunciation in synchrony with the speech signal, giving a realistic illustration of articulator movement as each phoneme is pronounced. The syllable concatenation approach achieves this realism by assembling articulation and coarticulation at the syllable level. To build the system, we recorded a speech database in syllable form, following the syllable patterns of Indonesian. The approach concatenates the viseme of each phoneme to form the visualization of each syllable's pronunciation, which is then synchronized with the corresponding speech from the database. The system was evaluated through "lip-reading" of 10 Indonesian sentences entered into the system. Respondents rated the degree of correspondence between the visualized syllable pronunciation and the speech produced, and their assessments were aggregated as a Mean Opinion Score (MOS). The results indicate that the Indonesian text-to-audiovisual system produces a more realistic and smoother pronunciation visualization.
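
The MOS aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional 1 to 5 opinion scale, which the abstract does not state, and the ratings below are made-up placeholders, not data from the study. The function name `mean_opinion_score` and the sentence keys are hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of opinion ratings.

    Assumes a 1-5 scale (an assumption; the abstract does not give the scale).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from four respondents for two sentences.
# These numbers are illustrative only and are NOT results from the paper.
example_ratings = {
    "sentence_01": [4, 5, 4, 3],
    "sentence_02": [5, 4, 4, 4],
}

# MOS per sentence, then one overall MOS pooled across all ratings.
per_sentence = {s: mean_opinion_score(r) for s, r in example_ratings.items()}
overall = mean_opinion_score(
    [r for rs in example_ratings.values() for r in rs]
)
```

Pooling all ratings for the overall score weights every individual judgment equally; averaging the per-sentence scores instead would give the same result only when each sentence has the same number of ratings.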


Indonesian Text-To-Audiovisual synthesis system; Indonesian texts; Syllable concatenation approach; viseme

Copyright (c) 2017 Arifin, Surya Sumpeno, Mochamad Hariadi, Arry Maulana Syarif

International Journal of Emerging Technologies in Learning (iJET) – eISSN: 1863-0383
Creative Commons License