Improving Prosody Through Analysis by Synthesis

Kevin A. Lenzo
PhD Thesis, Tech. Report, CMU-RI-TR-17-11, Robotics Institute, Carnegie Mellon University, March 2017

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.


Prosody and prosodic modeling in trainable speech synthesis systems are often based on large corpora of automatically annotated training data; however, these annotations are often incorrect. In practice, the problem has either been addressed through labor-intensive manual annotation or simply ignored. To overcome this problem and improve prosodic realization, an iterative, model-based method is proposed for improving the linguistic structure, segmentation, and prosodic annotations so that they correspond to the delivery of each utterance, as regularized across the data. In each iteration, the training utterances are resynthesized according to the existing symbolic annotation. Values of various features and subgraph structures are then “twiddled”: each is perturbed based on the features and constraints of the model. The twiddled utterances are evaluated using an objective function appropriate to the type of perturbation and compared with the unmodified, resynthesized utterance. The instance with the least error is assigned as the current annotation, and the entire process is repeated. At each iteration, the model is re-estimated, and the distributions and annotations regularize across the corpus. As a result, the annotations have more accurate and effective distributions, which leads to improved control and expressiveness given the features of the model.
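The resynthesize, twiddle, evaluate and re-estimate loop described in the abstract can be illustrated with a deliberately tiny sketch. It assumes, hypothetically, that each syllable carries one categorical prosodic label from an invented inventory (`LABELS`), that the "model" is simply the mean F0 per label, that "resynthesis" predicts each syllable's F0 from that mean, and that the objective is the squared F0 error against the natural recording. The names `estimate_model`, `resynthesize`, `error`, `twiddle` and `refine` are illustrative only and do not come from the thesis, whose models, perturbations of subgraph structures and objective functions are considerably richer.

```python
from statistics import mean

# Hypothetical label inventory (not from the thesis).
LABELS = ("none", "accent", "high")

def estimate_model(corpus, prior):
    """Re-estimate the toy model: mean observed F0 (Hz) per label.
    Labels absent from the corpus keep their value from `prior`."""
    by_label = {}
    for labels, f0 in corpus:
        for lab, hz in zip(labels, f0):
            by_label.setdefault(lab, []).append(hz)
    return {**prior, **{lab: mean(vals) for lab, vals in by_label.items()}}

def resynthesize(model, labels):
    """Toy 'synthesis': predict one F0 value per syllable from its label."""
    return [model.get(lab, 0.0) for lab in labels]

def error(predicted, observed):
    """Objective: sum of squared F0 errors against the recording."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def twiddle(labels):
    """Yield every annotation that differs from `labels` in one syllable."""
    for i, current in enumerate(labels):
        for alt in LABELS:
            if alt != current:
                yield labels[:i] + (alt,) + labels[i + 1:]

def refine(corpus, iterations=5):
    """Iteratively replace each annotation by its least-error perturbation
    (or keep it), then re-estimate the model from the new annotations."""
    model = estimate_model(corpus, {})
    for _ in range(iterations):
        new_corpus = []
        for labels, f0 in corpus:
            # Error of the unmodified, resynthesized utterance.
            best, best_err = labels, error(resynthesize(model, labels), f0)
            for candidate in twiddle(labels):
                e = error(resynthesize(model, candidate), f0)
                if e < best_err:  # only strictly better candidates win
                    best, best_err = candidate, e
            new_corpus.append((best, f0))
        corpus = new_corpus
        model = estimate_model(corpus, model)
    return corpus, model

# Toy corpus: (labels, observed F0 per syllable). The middle syllable of
# the second utterance is mislabelled "none" although its F0 is accent-like.
corpus = [
    (("none", "accent", "none"), [100, 150, 100]),
    (("none", "none", "high"), [100, 150, 200]),
    (("accent", "accent", "high"), [150, 150, 200]),
]
refined, model = refine(corpus, iterations=3)
for labels, _ in refined:
    print(labels)
print(model)
```

In this toy corpus the first iteration relabels the suspicious syllable as `accent`, the re-estimated `none` mean drops from 112.5 Hz to 100 Hz, and later iterations leave the annotation unchanged. Because a perturbation is adopted only when it strictly lowers the error, the loop resembles a hard-assignment (k-means-like) procedure and settles once no single twiddle improves the fit.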

BibTeX Reference
author = {Kevin A. Lenzo},
title = {Improving Prosody Through Analysis by Synthesis},
year = {2017},
month = {March},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-17-11},