InteBOMB: Integrating Generic Object Tracking and Segmentation with Pose Estimation for Animal Behavioral Analysis
With the comparative advantage of generalization, our proposed InteBOMB workflow integrates generic object tracking into top-down approaches, not needing to know the animal as a priori.
Hao Zhai*, Haiyang Yan*, Jingyuan Zhou, Jing Liu, Qiwei Xie, Lijun Shen, Xi Chen, Hua Han
Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences
School of Future Technology, School of Artificial Intelligence, University of Chinese Academy of Sciences
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
Research Base of Beijing Modern Manufacturing Development, Beijing University of Technology
Abstract
Over the past years, numerous animal behavior quantification methods have opened up a new field of computational ethology and will make it possible to establish general-purpose methods for fully automated behavior analysis. Existing state-of-the-art multi-animal pose estimation workflows adopt tracking-by-detection for both bottom-up and top-down approaches, which have to be trained for diverse animal appearances. With the comparative advantage of generalization, our proposed InteBOMB workflow integrates generic object tracking into top-down approaches, not needing to know the animal as a priori. Specifically, our main contribution includes two strategies for tracking and segmentation in laboratory scenes and two techniques for pose estimation in natural scenes. The “background enhancement” strategy calculates foreground-background contrastive and triplet losses, building more discriminative correlation maps via few-shot learning. The “online proofreading” strategy stores human-in-the-loop long-term memory and dynamic short-term memory with a score-based update mechanism via active learning. The “automated labeling suggestion” technique reuses the visual features saved during tracking, selecting representative frames to label the training sets. The “joint behavior analysis” technique also brings these features to combine with other modalities, extending the latent space for behavior classification and clustering. Notably, we collected six mouse and six non-human primate datasets to benchmark laboratory and natural scenes, which evaluated our zero-shot generic trackers and the high-performance joint latent space.