Counting in visual question answering: A concept detector based approach
conference paper
Visual Question Answering (VQA) is a field that combines computer vision and natural language processing techniques. One of the most challenging question types in this field is counting, for example "How many sheep are in this picture?". In this paper, we focus on counting questions and improve upon the state-of-the-art method DPPnet. We train concept detectors on the MSCOCO dataset and use their outputs in addition to the pre-final layer of the original visual network. Additionally, we use a postprocessing technique to ensure that the output is the right type of answer for each type of question. Both the concept detectors and the postprocessing slightly improve performance, and both can be applied to current state-of-the-art methods.
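The abstract does not spell out how the postprocessing works, so the following is only a minimal sketch of one plausible reading: given a model's scored candidate answers, keep the highest-scoring answer whose type matches the question type (a number for "how many" questions, yes/no for polar questions). The function names, the question-type rules, and the score dictionary are all hypothetical and are not taken from the paper or from DPPnet.

```python
# Hypothetical sketch of answer-type postprocessing; not the authors' code.

def question_type(question: str) -> str:
    """Assign a coarse question type from the first words (assumed rules)."""
    q = question.strip().lower()
    if q.startswith("how many"):
        return "count"
    if q.split(" ", 1)[0] in {"is", "are", "does", "do", "was", "can"}:
        return "yes_no"
    return "other"


def is_valid(answer: str, qtype: str) -> bool:
    """Check whether an answer string has the type the question expects."""
    if qtype == "count":
        return answer.isdigit()
    if qtype == "yes_no":
        return answer in {"yes", "no"}
    return True


def postprocess_answer(question: str, scored_answers: dict) -> str:
    """Return the best-scoring answer of the expected type.

    scored_answers maps candidate answer strings to model scores and is
    assumed to be non-empty. If no candidate has the expected type, the
    overall top-scoring answer is returned unchanged.
    """
    qtype = question_type(question)
    ranked = sorted(scored_answers.items(), key=lambda kv: kv[1], reverse=True)
    for answer, _ in ranked:
        if is_valid(answer, qtype):
            return answer
    return ranked[0][0]
```

Under these assumptions, a counting question whose top-scoring candidate is a colour word would instead receive the best-scoring numeric candidate, which matches the stated goal of returning the right type of answer for each question type.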
TNO Identifier
745669
Publisher
ACM
Source title
Proceedings of the 2016 ACM International Conference on Multimedia Retrieval (ICMR’16), June 6–9, 2016, New York, NY, USA
Pages
1-4
Files
To receive the publication files, please send an e-mail request to TNO Repository.