Improving user-friendliness by using visually supported speech recognition
conference paper
While speech recognition may in principle be one of the most natural interfaces, in practice it is not, because it lacks user-friendliness: words are regularly misrecognized, and users tend to articulate in an exaggerated manner. We explored the potential of visually supported error correction (speech recognition integrated into a graphical interface) to improve the user-friendliness of speaker-independent speech recognition. We tested five schemes (Words only, Words via menu, Number-correction, Background color-correction, and Number + color-correction) in four noise environments (Office, Radio, Cafeteria, and Car noise) with eight subjects. Background color-correction fits the design trend toward 'low-attention interfaces' because it does not require the user to fixate the display accurately. The results show that visual support significantly improves the speech recognizer's recognition rate, by 10% on average in all four noise environments, and that every subject benefits. In the car noise condition, task success is highest with Color + number support; in the other three noise environments, Color support works best. Most subjects tend to articulate less carefully (i.e., more naturally) in the conditions that include a correction step. A correction step in which the user pronounces the background color therefore makes the system both more effective and more natural to use.
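The abstract does not describe how the correction step is implemented. As a rough illustration only, the sketch below assumes that the recognizer returns a short list of alternative words for an utterance, that each alternative is displayed with a number and a distinct background color, and that the user corrects a misrecognition by saying that color (or number). The color palette, the number words, and the names `Candidate`, `label_candidates`, and `resolve_correction` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical palette: the abstract does not say which colors were shown.
COLORS = ("red", "green", "blue", "yellow")

# Hypothetical spoken number labels for Number-correction.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}


@dataclass(frozen=True)
class Candidate:
    word: str    # a recognizer alternative for the utterance
    number: int  # label displayed for Number-correction
    color: str   # background displayed for Color-correction


def label_candidates(alternatives):
    """Attach a number and a distinct background color to each alternative."""
    if len(alternatives) > len(COLORS):
        raise ValueError("more alternatives than distinct colors")
    return [
        Candidate(word, i + 1, color)
        for i, (word, color) in enumerate(zip(alternatives, COLORS))
    ]


def resolve_correction(candidates, spoken):
    """Map a spoken correction token (a color name or a number word) to a word.

    Returns None if the token matches no displayed label, so the caller
    can ask the user to repeat the correction.
    """
    token = spoken.strip().lower()
    for c in candidates:
        if token == c.color or NUMBER_WORDS.get(token) == c.number:
            return c.word
    return None


cands = label_candidates(["bay", "day", "pay"])
print(resolve_correction(cands, "Green"))   # day
print(resolve_correction(cands, "three"))   # pay
print(resolve_correction(cands, "purple"))  # None
```

Accepting either a color name or a number word mirrors, under the same assumptions, how a combined Number + color scheme might let the user choose whichever cue is easier to perceive. A color word can be spoken after only a glance at the display, which fits the abstract's point that color-correction does not require accurate fixation.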
TNO Identifier
11781
Source title
Poster presented at the 15th Annual ACM Symposium on User Interface Software and Technology (UIST '02), Paris, 27-30 October 2002
Pages
15-16
Files
To receive the publication files, please send an e-mail request to TNO Repository.