Article
Non-Parallel Training for Voice Conversion by Maximum Likelihood Constrained Adaptation
Departmental Papers (ESE)
  • Athanasios Mouchtaris, University of Pennsylvania
  • Jan Van der Spiegel, University of Pennsylvania
  • Paul Mueller, Corticon, Inc.
Document Type
Conference Paper
Date of this Version
5-17-2004
Comments
Copyright 2004 IEEE. Reprinted from Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing 2004 (ICASSP 2004) Volume 1, pages I-1 - I-4.
Publisher URL: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isNumber=29343

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
Abstract

The objective of voice conversion methods is to modify the speech characteristics of a particular speaker so that the speech sounds as if it were spoken by a different target speaker. Current voice conversion algorithms derive a conversion function by estimating its parameters from a corpus that contains the same utterances spoken by both speakers. Such a corpus, usually referred to as a parallel corpus, has the disadvantage that it is often difficult or even impossible to collect. Here, we propose a voice conversion method that does not require a parallel corpus for training, i.e., the utterances spoken by the two speakers need not be the same. The method employs speaker adaptation techniques to adapt the conversion parameters derived from a different pair of speakers to a particular pair of source and target speakers. We show that adaptation reduces the error obtained when simply applying the conversion parameters of one pair of speakers to another by a factor that can reach 30% in many cases, with performance comparable to the ideal case in which a parallel corpus is available.

Keywords
  • voice conversion
  • Gaussian mixture model
  • text-to-speech synthesis
  • speaker adaptation
Citation Information
Athanasios Mouchtaris, Jan Van der Spiegel and Paul Mueller. "Non-Parallel Training for Voice Conversion by Maximum Likelihood Constrained Adaptation" (2004)
Available at: http://works.bepress.com/jan_vanderspiegel/8/