Evaluating spoken dialogue systems according to de-facto standards: A case study