Title: Coarse-to-Fine Visual Question Answering by Iterative, Conditional Refinement
Authors: Burghouts, G.J.; Huizinga, W.
Publication year: 2022

Abstract:
Visual Question Answering (VQA) is a very interesting technique for answering natural language questions about an image. Recent methods have focused on incorporating knowledge into an improved VQA model, by augmenting the training set, representing scene graphs, or including reasoning. We also leverage knowledge to make VQA more robust, but we take a different route: we take the VQA model as-is and extend it with a novel algorithm, called Guided-VQA, that guides the questioning by leveraging knowledge to obtain better answers. This enables knowledge-extended VQA without having to retrain the VQA model, which is beneficial when computing resources and/or the time available to adapt to new knowledge are limited. We start from the observation that VQA has difficulty answering compositional and fine-grained questions. We propose to solve this with a coarse-to-fine scheme for posing questions. The proposed Guided-VQA algorithm is an iterative, conditional refinement that decomposes a compositional, fine-grained question into a sequence of coarse-to-fine questions by leveraging taxonomic knowledge about the involved objects. On Visual Genome, we show that it improves the answers significantly over standard VQA. This is relevant for robust deployment of VQA where resources or adaptation time are limited.
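The coarse-to-fine questioning described in the abstract can be illustrated with a small sketch. It is not the authors' Guided-VQA implementation: the taxonomy, the yes/no question templates, the greedy "first yes wins" refinement rule, and the names coarse_to_fine, guided_vqa and toy_vqa are all hypothetical. The VQA model is treated as an unmodified black box mapping (image, question) to an answer string, mirroring the idea of using the model as-is without retraining; toy_vqa is a stand-in that reads answers from a dictionary instead of looking at pixels.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical taxonomy: each category maps to its finer sub-categories.
Taxonomy = Dict[str, List[str]]
# The VQA model is an untouched black box: (image, question) -> answer string.
VQAModel = Callable[[Any, str], str]


def _is_yes(answer: str) -> bool:
    return answer.strip().lower() == "yes"


def coarse_to_fine(image: Any, root: str, taxonomy: Taxonomy, vqa: VQAModel) -> List[str]:
    """Return the chain of confirmed categories, from coarsest to finest."""
    # Ask the coarsest question first; if it fails, nothing finer is asked.
    if not _is_yes(vqa(image, f"Is there a {root} in the image?")):
        return []
    path = [root]
    current = root
    while taxonomy.get(current):
        confirmed: Optional[str] = None
        # Conditional refinement: only children of the confirmed category are queried.
        for child in taxonomy[current]:
            if _is_yes(vqa(image, f"Is the {current} a {child}?")):
                confirmed = child
                break  # greedy: take the first sub-category answered "yes"
        if confirmed is None:
            break  # no finer category confirmed; keep the coarser one
        path.append(confirmed)
        current = confirmed
    return path


def guided_vqa(image: Any, root: str, template: str,
               taxonomy: Taxonomy, vqa: VQAModel) -> Optional[str]:
    """Ask the final fine-grained question about the finest confirmed category."""
    path = coarse_to_fine(image, root, taxonomy, vqa)
    if not path:
        return None
    return vqa(image, template.format(obj=path[-1]))


# --- Toy stand-ins for illustration only (hypothetical data and model) ---
example_taxonomy: Taxonomy = {
    "vehicle": ["truck", "car"],
    "car": ["sedan", "sports car"],
}
example_image = {"objects": {"vehicle", "car", "sports car"}, "color": "red"}


def toy_vqa(image: Dict[str, Any], question: str) -> str:
    """Answers from a dict; the category is the text after the last ' a '."""
    if question.startswith("What color"):
        return image["color"]
    category = question.rsplit(" a ", 1)[1].rstrip("?").replace(" in the image", "")
    return "yes" if category in image["objects"] else "no"


if __name__ == "__main__":
    print(coarse_to_fine(example_image, "vehicle", example_taxonomy, toy_vqa))
    # ['vehicle', 'car', 'sports car']
    print(guided_vqa(example_image, "vehicle", "What color is the {obj}?",
                     example_taxonomy, toy_vqa))
    # red
```

Because each refinement question is only asked about children of an already confirmed category, the sketch poses at most one question per child of each category along the confirmed path, rather than one per category in the whole taxonomy. A practical system would also have to cope with uncertain or wrong intermediate answers, for example by using answer confidences instead of accepting the first "yes".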
Subject: External knowledge; Image analysis; Iterative refinement; Visual Question Answering; Knowledge management; Visual languages; Coarse to fine; Natural language questions; Novel algorithm; Question Answering; Scene-graphs; Training sets; Iterative methods
To reference this document use: http://resolver.tudelft.nl/uuid:49075619-5020-4428-a871-26a8c2aa44d7
TNO identifier: 970942
Publisher: Springer Science and Business Media Deutschland GmbH
ISBN: 9783031064296
ISSN: 0302-9743
Source: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 21st International Conference on Image Analysis and Processing, ICIAP 2022, 23 May 2022 through 27 May 2022, pp. 418-428
Document type: conference paper
Files: To receive the publication files, please send an e-mail request to TNO Library.