A Grouping Method based on Energy Clustering for Reverberant Speech

Publication by Zhao Li, Thorsten Herfet
Published in iberSPEECH 2014, 2014

Abstract:

The aim of this paper is to group speech units in the time-frequency (TF) domain based on a general monaural cue; applications include speaker localization, speech enhancement, and speech separation. Based on the observation that, in the TF representation produced by a bank of gammatone filters, the signal energy from each sound source tends to form a cluster, we use the energy distribution as a monaural cue to group the TF units. The grouping is similar to the classic watershed algorithm but is controlled by cluster shape and curve fitting. Experimental results show that the proposed energy clustering achieves high grouping accuracy (over 93%) and excellent robustness against reverberation. It is also shown that energy clustering improves a purely localization-cue-based separation system by 15-30 percent in terms of the hit minus false alarm rate.
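
The hit minus false alarm rate named above can be illustrated with a small sketch. The abstract does not say how the authors compute it, so the code assumes the definition commonly used for binary TF masks: the hit rate is the fraction of target-dominant units (according to a reference mask) that the estimated mask also labels as target, and the false alarm rate is the fraction of interference-dominant units wrongly labeled as target. The function name `hit_minus_fa` and the toy masks are hypothetical and are not taken from the paper.

```python
def hit_minus_fa(estimated, ideal):
    """Return (hit, fa, hit - fa) for two binary TF masks of equal shape.

    Masks are nested lists (channels x frames) of 0/1 values, where 1 marks
    a target-dominant TF unit. This follows the commonly used definition,
    assumed here; it is not taken from the paper itself.
    """
    hits = misses = false_alarms = correct_rejections = 0
    for est_row, ideal_row in zip(estimated, ideal):
        for est, ref in zip(est_row, ideal_row):
            if ref == 1 and est == 1:
                hits += 1
            elif ref == 1:
                misses += 1
            elif est == 1:
                false_alarms += 1
            else:
                correct_rejections += 1

    target_units = hits + misses
    interference_units = false_alarms + correct_rejections
    # Guard against masks that contain only one class of unit.
    hit = hits / target_units if target_units else 0.0
    fa = false_alarms / interference_units if interference_units else 0.0
    return hit, fa, hit - fa


# Toy example: 2 frequency channels x 4 time frames (made-up values).
ideal = [[1, 1, 0, 0],
         [1, 0, 0, 0]]
estimated = [[1, 0, 0, 1],
             [1, 0, 0, 0]]

hit, fa, score = hit_minus_fa(estimated, ideal)
print(f"HIT = {hit:.3f}, FA = {fa:.3f}, HIT-FA = {score:.3f}")
```

In this toy case, two of the three target units are recovered and one of the five interference units is mislabeled, so the score is 2/3 - 1/5. Subtracting the false alarm rate penalizes a mask that raises its hit rate simply by labeling more units as target.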