Title
Evaluating spoken dialogue systems according to de-facto standards: A case study
Author
Möller, S.
Smeele, P.
Boland, H.
Krebber, J.
TNO Industrie en Techniek
Publication year
2007
Abstract
In the present paper, we investigate the validity and reliability of de-facto evaluation standards defined for measuring or predicting the quality of interaction with spoken dialogue systems. Two experiments were carried out with a dialogue system for controlling domestic devices. During these experiments, subjective quality judgments were collected with two questionnaire methods (ITU-T Rec. P.851 and SASSI), and parameters describing the interaction were logged and annotated. Both kinds of data, the judgments and the interaction parameters, were used to derive prediction models according to the PARADISE approach. Although the limited database allows only tentative conclusions to be drawn, the results suggest that both questionnaire methods provide valid measurements of a large number of different quality aspects; most of the perceptive dimensions underlying the subjective judgments can also be measured with high reliability. The extracted parameters mainly describe quality aspects that are directly linked to the system, environmental and task characteristics. Used as input to prediction models, the parameters provide helpful information for system design and optimization, but not general predictions of system usability and acceptability. © 2005 Elsevier Ltd. All rights reserved.
Subject
Perceptive dimensions
Prediction models
Questionnaire methods
Spoken dialogue systems
Database systems
Number theory
Optimization
Reliability
Usability engineering
Speech recognition
To reference this document use:
http://resolver.tudelft.nl/uuid:092836fd-1f78-41bd-8bba-af785e55a327
DOI
https://doi.org/10.1016/j.csl.2005.11.003
TNO identifier
239811
ISSN
0885-2308
Source
Computer Speech and Language, 21 (1), 26-53
Document type
article