Zero-shot neuro-symbolic parsing of body keypoints

conference paper
A new approach for distinguishing neutral (e.g. walking) from threatening (e.g. aiming a handgun) poses, without training, is presented. Various AI-based models can classify human poses, but they often do not generalize to defence scenarios. The scarcity of data containing threatening poses makes it hard to train new models. Our approach avoids retraining: it is a zero-shot, rule-based classification method for threatening poses. We combine a pretrained body part keypoint detection model with the neuro-symbolic framework Scallop. We compare the pretrained models MMPose and YOLOv8x-pose for keypoint detection. We use images from the YouTube Gun Detection Dataset containing persons holding a weapon and label them manually as having a ‘neutral’ or ‘aiming’ pose; the latter is further subdivided into ‘aiming a handgun’ and ‘aiming a rifle’. Scallop is used to define logic-based rules for classification, using the keypoints as input: e.g. the rule ‘aiming a handgun’ includes ‘hands at shoulder height’ and ‘hands far away from the body’. Recall and precision results for aiming are 0.75/0.81 and 0.83/0.73, for MMPose and YOLOv8x-pose, respectively. Average recall and average precision for ‘aiming a handgun’ and ‘aiming a rifle’ are 0.78/0.36 and 0.76/0.43, for MMPose and YOLOv8x-pose, respectively. Combining neuro-symbolic AI with pretrained pose estimation techniques shows promising results for detecting threatening human poses. Performance of neutral-versus-aiming classification is similar for both approaches; however, MMPose performs better for multi-class classification. In future research, we will focus on improving rules, identifying more poses, and using videos to obtain sequences of poses or activities.
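As an illustration only, the handgun rule quoted in the abstract (‘hands at shoulder height’ and ‘hands far away from the body’) could be sketched in plain Python as below. This is not the authors' Scallop program. It assumes the COCO 17-keypoint ordering (which YOLOv8 pose models emit), image coordinates with y increasing downwards, and a roughly side-on view so that horizontal wrist offset can stand in for ‘far away from the body’. The function name `classify_pose`, the thresholds `HEIGHT_TOL` and `REACH_MIN`, and the choice to fire when at least one wrist satisfies both conditions are all hypothetical.

```python
from math import hypot

# COCO 17-keypoint indices (assumed layout).
L_SHOULDER, R_SHOULDER = 5, 6
L_WRIST, R_WRIST = 9, 10
L_HIP, R_HIP = 11, 12

# Hypothetical thresholds, in units of torso length (not from the paper).
HEIGHT_TOL = 0.35  # max vertical wrist-to-shoulder offset
REACH_MIN = 0.6    # min horizontal wrist-to-shoulder offset


def midpoint(a, b):
    """Midpoint of two (x, y) points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def classify_pose(kps):
    """Return 'aiming' or 'neutral' for one person's list of (x, y) keypoints."""
    shoulder_mid = midpoint(kps[L_SHOULDER], kps[R_SHOULDER])
    hip_mid = midpoint(kps[L_HIP], kps[R_HIP])
    # Torso length normalises distances so the rule is scale-independent.
    torso = hypot(shoulder_mid[0] - hip_mid[0], shoulder_mid[1] - hip_mid[1])
    if torso == 0:
        return "neutral"  # degenerate detection; no rule can fire
    for wrist in (kps[L_WRIST], kps[R_WRIST]):
        at_shoulder_height = abs(wrist[1] - shoulder_mid[1]) <= HEIGHT_TOL * torso
        far_from_body = abs(wrist[0] - shoulder_mid[0]) >= REACH_MIN * torso
        if at_shoulder_height and far_from_body:
            return "aiming"
    return "neutral"
```

In a neuro-symbolic framework such as Scallop, conditions like these would be expressed as logic rules over keypoint facts instead of hard-coded comparisons; the sketch only shows the geometric intent of the rule.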
TNO Identifier
997307
Publisher
SPIE
Source title
SPIE SECURITY + DEFENCE Electro-Optical and Infrared Systems: Technology and Applications XXI, 16-20 September 2024, Edinburgh, United Kingdom