The importance of prior probabilities for entry page search

conference paper
An important class of searches on the World Wide Web aims to find the entry page (homepage) of an organisation. Entry page search is quite different from ad hoc search; indeed, a plain ad hoc system performs disappointingly. We explored three non-content features of web pages: page length, number of incoming links and URL form. The URL form in particular proved to be a good predictor. Using URL form priors we found over 70% of all entry pages at rank 1, and up to 89% in the top 10. Non-content features can easily be embedded in a language model framework as prior probabilities.
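To illustrate how a non-content feature can act as a prior in a language model, the sketch below ranks pages by log P(D) + sum over query terms of log P(q|D), which follows from P(D|Q) being proportional to P(D) P(Q|D). It is a minimal illustration and not the paper's implementation: the four URL categories, the prior values in `URL_FORM_PRIOR`, the linear-interpolation smoothing with weight `lam`, and the function names are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical prior P(entry page | URL form); the values are illustrative only.
URL_FORM_PRIOR = {"root": 0.6, "subroot": 0.25, "path": 0.1, "file": 0.05}


def url_form(url):
    """Classify a URL by its shape (an assumed four-way scheme)."""
    rest = url.split("://", 1)[-1]                      # drop the scheme
    path = rest.split("/", 1)[1] if "/" in rest else ""  # drop the host
    if path in ("", "index.html"):
        return "root"      # e.g. http://example.org/
    if path.endswith("/"):
        # one directory level is "subroot", deeper directories are "path"
        return "subroot" if path.count("/") == 1 else "path"
    return "file"          # anything that ends in a file name


def log_score(query_terms, doc_terms, coll_counts, coll_len, url, lam=0.5):
    """Return log P(D) + sum of log P(q|D) with linearly smoothed term models."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = math.log(URL_FORM_PRIOR[url_form(url)])  # document prior
    for term in query_terms:
        p_doc = tf[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts.get(term, 0) / coll_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score


def rank(query_terms, docs, lam=0.5):
    """Rank a {url: list of terms} collection, best first."""
    coll_counts = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(coll_counts.values())
    return sorted(
        docs,
        key=lambda u: log_score(query_terms, docs[u], coll_counts, coll_len, u, lam),
        reverse=True,
    )
```

With two pages of identical content, the content model cannot separate them, so the prior alone decides the order: under the assumed values a root URL outranks a deep file URL.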
TNO Identifier
236826
ISSN
0163-5840
Source title
Proceedings of the Twenty-Fifth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 11 August 2002 through 15 August 2002, Tampere
Editor(s)
Beaulieu M.
Baeza-Yates R.
Myaeng S.H.
Jarvelin K.
Pages
27-34