Annotating URLs with query terms: What factors predict reliable annotations?
conference paper
A number of recent studies have investigated the relation between URLs and associated query terms from search engine log files. In [5], the query terms associated with the domain of a URL were used as features for a URL classification task.
The idea is that query terms that lead to successful classification of a URL are reliable semantic descriptors of the URL content. We follow up on this work by investigating which properties of a URL and its associated query terms predict the classification success. We construct a number of URL and query properties as predictors and proceed to analyze these in depth. We conclude that the classification success, and thus the reliability of the query terms as URL descriptors, cannot easily be predicted from properties of the URL and the queries.
TNO Identifier
445980
Source title
Proceedings of the Understanding the User (UIIR) workshop at SIGIR 2009