Visual Prompt Tuning and Ensemble Undersampling for One-Shot Vehicle Classification

Conference paper
Vision-language foundation models for image classification, such as CLIP, perform poorly on images of objects that are dissimilar to their training data. Military vehicle classification is a relevant example of such a mismatch. In this work, we investigate techniques to extend the capabilities of CLIP for this application. Our contribution is twofold: (a) we study various techniques to extend CLIP with knowledge of military vehicles, and (b) we propose a two-stage approach to classify novel vehicles based on only one example image. Our dataset consists of 13 military vehicle classes, with 50 images per class. The techniques studied for extending CLIP include context optimization (CoOp), vision-language prompting (VLP), and visual prompt tuning (VPT), of which VPT was selected. Next, we studied one-shot learning approaches that allow the extended CLIP to classify novel vehicle classes based on only one image. The resulting two-stage ensemble approach was evaluated in a number of leave-one-group-out experiments. Results show that, by default, CLIP has a zero-shot classification performance of 48% for military vehicles. This can be improved to >80% by fine-tuning with example data, at the cost of losing the ability to classify novel (previously unseen) military vehicle types. A naive one-shot approach results in a classification performance of 19%, whereas our proposed one-shot approach achieves 70% for novel military vehicle classes. In conclusion, our proposed two-stage approach can extend CLIP for military vehicle classification. In the first stage, CLIP is provided with knowledge of military vehicles using domain adaptation with VPT. In the second stage, this knowledge is leveraged to classify previously unseen military vehicle classes in a one-shot setting.
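The leave-one-group-out protocol mentioned in the abstract can be illustrated with a minimal sketch. It only shows how classes might be partitioned so that each group is held out in turn as "novel" while the remaining classes are available for domain adaptation. The abstract does not state the class names, the number of groups, or the group composition, so the names `vehicle_00` to `vehicle_12`, the three groups, and the helper `leave_one_group_out` are all assumptions made for illustration, not the authors' setup.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_group_out(
    groups: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_group, seen_classes, novel_classes) once per group.

    seen_classes: classes available for adaptation (stage one).
    novel_classes: held-out classes, to be classified one-shot (stage two).
    """
    for held_out, novel in groups.items():
        seen = [
            cls
            for name, members in groups.items()
            if name != held_out
            for cls in members
        ]
        yield held_out, seen, list(novel)


# Hypothetical class names: 13 classes, matching the count in the abstract.
classes = [f"vehicle_{i:02d}" for i in range(13)]

# Hypothetical grouping, chosen purely for illustration.
groups = {
    "group_a": classes[0:4],
    "group_b": classes[4:8],
    "group_c": classes[8:13],
}

splits = list(leave_one_group_out(groups))
for held_out, seen, novel in splits:
    print(
        f"{held_out}: adapt on {len(seen)} classes, "
        f"one-shot test on {len(novel)} novel classes"
    )
```

Each split keeps the seen and novel classes disjoint, which is what allows performance on the held-out group to be read as performance on previously unseen vehicle types.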
TNO Identifier
997306
Publisher
SPIE
Source title
Proceedings vol. 13206, Security + Defence, 16-20 September 2024, Edinburgh