E-mail categorization using partially related training examples

conference paper
Automatic e-mail categorization with traditional classification methods requires labelling of training data. In a real-life setting, this labelling disrupts the user's workflow. We argue that it might be helpful to use documents, which are generally well-structured in directories on the file system, as training data for supervised e-mail categorization, thereby reducing the labelling effort required from users. Previous work demonstrated that the characteristics of documents and e-mail messages are too different to use organized documents as training examples for e-mail categorization with traditional supervised classification methods. In this paper we present a novel network-based algorithm that is capable of taking these differences between documents and e-mails into account. With the network algorithm, it is possible to use documents as training material for e-mail categorization without user intervention. This way, the effort users spend labelling training examples is reduced, while the organization of their information flow is still improved. The accuracy of the algorithm on categorizing e-mail messages was evaluated using a set of e-mail correspondence related to the documents. The proposed network method was significantly better than traditional text classification in this setting. © 2014 ACM.
Elias Network; et al.; Spinque; The Assoc. Comput. Machinery, Special Interest Group for Information Retrieval (ACM SIGIR); The European Science Foundation
TNO Identifier
516480
Publisher
Association for Computing Machinery
Source title
5th Information Interaction in Context Symposium, IIiX 2014, 26-30 August 2014, Regensburg, Germany
Place of publication
New York
Pages
86-95
Files
To receive the publication files, please send an e-mail request to TNO Repository.