Voice-Modeling based on a given F0-track

Publication by Stefan Densow, Thorsten Herfet, Eric Haschke, Dietrich Klakow

Abstract:

Separating a mixture of speech signals requires knowledge of orthogonalities between the speakers. In this work, orthogonal fundamental frequencies are assumed, which enables the separation of speech signals based on a time-frequency representation. Each speaker is assigned a set of time-frequency points, which represent the energy at the harmonics. The task of this work is to develop a method that reconstructs the original speech signal from knowledge of the orthogonally sparse energy tracks and the fundamental frequency track. We first consider the calculation of the harmonic energy tracks. We then discuss the process of speech production and derive a theoretical model that serves as the basis for a reconstruction method in the time domain.
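
To illustrate what a harmonic energy track can look like, here is a minimal sketch in Python. It is not the method developed in this work, only one plausible reading of the abstract: for every frame of a short-time Fourier transform, it reads the energy at the frequency bin nearest to each harmonic k * F0. The function name harmonic_energy_tracks, all of its parameters, the Hann window and the synthetic test signal are assumptions made for this example.

```python
import numpy as np

def harmonic_energy_tracks(signal, f0_track, sample_rate,
                           frame_len=1024, hop=256, n_harmonics=3):
    """Hypothetical helper: energy at the STFT bin nearest to k * F0.

    Assumes f0_track holds one F0 value in Hz per frame (0 = unvoiced).
    Returns an array of shape (n_frames, n_harmonics).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.zeros((n_frames, n_harmonics))
    for t in range(n_frames):
        frame = signal[t * hop:t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        if f0_track[t] <= 0:
            continue  # unvoiced frame: leave its energies at zero
        for k in range(1, n_harmonics + 1):
            # Map the harmonic frequency k * F0 to the nearest FFT bin.
            bin_index = int(round(k * f0_track[t] * frame_len / sample_rate))
            if bin_index < len(power):
                energies[t, k - 1] = power[bin_index]
    return energies

# Synthetic "voice": three harmonics of a constant 250 Hz F0 (assumed values).
sample_rate = 16000
time = np.arange(sample_rate) / sample_rate
amplitudes = [1.0, 0.5, 0.25]
signal = sum(a * np.sin(2 * np.pi * 250.0 * (k + 1) * time)
             for k, a in enumerate(amplitudes))

n_frames = 1 + (len(signal) - 1024) // 256
f0_track = np.full(n_frames, 250.0)
energies = harmonic_energy_tracks(signal, f0_track, sample_rate)
```

In this toy case the energy of each track scales with the squared amplitude of its harmonic, so the first track carries roughly four times the energy of the second. Real speech has a time-varying F0, so the bin picked for each harmonic would change from frame to frame.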