Progress in the AMIDA speaker diarization system for meeting data
conference paper
In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription evaluation 2007 for conference room data. This is done in the context of the history of this system and other speaker diarization systems. One of the goals of our system is to have as few tunable parameters as possible while maintaining performance. The system consists of a BIC segmentation/clustering initialization, followed by a combined re-segmentation and cluster-merging algorithm. The Diarization Error Rate (DER) of our best system is 17.0%, accounting for overlapping speech. However, we find that slightly altering the Speech Activity Detection models has a large impact on the speaker DER. © 2008 Springer-Verlag Berlin Heidelberg.
TNO Identifier
240920
ISSN
0302-9743
Source title
2nd Annual Classification of Events, Activities and Relationships, CLEAR 2007 and Rich Transcription, RT 2007, 8 May 2007 through 11 May 2007, Baltimore, MD, Conference code: 72688
Pages
475-483