Online Semi-Supervised Learning: algorithm and application in metagenomics
article
As the amount of metagenomic data grows rapidly, online statistical learning algorithms are poised to play a key role in metagenome analysis tasks. Frequently, data are only partially labeled; that is, the dataset contains only partial information about the problem of interest. This work presents an algorithm and a learning framework that are naturally suited to the analysis of large-scale, partially labeled metagenome datasets. We propose an online multi-output algorithm that learns by sequentially co-regularizing prediction functions on unlabeled data points and provides improved performance in comparison to several supervised methods. We evaluate the predictive performance of the proposed methods on the NIH Human Microbiome Project dataset. In particular, we address the task of predicting the relative abundance of Porphyromonas species in the oral cavity. In our empirical evaluation, the proposed method outperforms several supervised regression techniques and also yields notable computational benefits when training the predictive model.
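The abstract does not state the update rules, so the following is only a rough illustration of the general idea of online co-regularization, not the authors' algorithm. It assumes a generic two-view, multi-output least-squares scheme: the features are split into two hypothetical "views", each with its own linear predictor. A labeled example takes a gradient step on each view's squared error, while an unlabeled example takes a gradient step that reduces the disagreement between the two views' outputs. The class name `OnlineCoRegularizedRegressor`, the two-view split, and the parameters `lr` (step size), `lam` (ridge weight) and `mu` (co-regularization weight) are all assumptions made for this sketch.

```python
import numpy as np


class OnlineCoRegularizedRegressor:
    """Hypothetical sketch of online co-regularized multi-output regression.

    Two linear predictors, one per feature view, are updated one example at
    a time with stochastic gradient steps:
      * labeled point:   squared loss of each view's output against target y
      * unlabeled point: squared disagreement between the two views' outputs
    Every step also applies L2 (ridge) shrinkage to both weight matrices.
    """

    def __init__(self, dim1, dim2, n_outputs, lr=0.01, lam=1e-3, mu=0.1):
        self.W1 = np.zeros((dim1, n_outputs))  # weights for view 1
        self.W2 = np.zeros((dim2, n_outputs))  # weights for view 2
        self.lr, self.lam, self.mu = lr, lam, mu

    def predict(self, x1, x2):
        # Final multi-output prediction: average of the two views' outputs.
        return 0.5 * (x1 @ self.W1 + x2 @ self.W2)

    def partial_fit(self, x1, x2, y=None):
        """Process one example; y=None marks an unlabeled point."""
        f1, f2 = x1 @ self.W1, x2 @ self.W2
        if y is not None:
            # Gradients of 0.5*||f1 - y||^2 and 0.5*||f2 - y||^2.
            g1 = np.outer(x1, f1 - y)
            g2 = np.outer(x2, f2 - y)
        else:
            # Gradient of (mu/2)*||f1 - f2||^2: pushes the views to agree.
            diff = f1 - f2
            g1 = self.mu * np.outer(x1, diff)
            g2 = -self.mu * np.outer(x2, diff)
        self.W1 -= self.lr * (g1 + self.lam * self.W1)
        self.W2 -= self.lr * (g2 + self.lam * self.W2)
```

In use, examples arrive as a stream and each one is passed to `partial_fit`, with `y=None` for unlabeled points, so the unlabeled data shape the model without any labels being required. Because each update touches only one example, memory cost stays constant regardless of stream length, which is the general appeal of online learning for large datasets.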
TNO Identifier
486919
Source
IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013, 18-21 December 2013, Shanghai, China, pp. 521-525.
Article nr.
6732550
Collation
5 p.
Pages
521-525