Video-text compliance: Activity verification based on natural language instructions

Mayoore Jaiswal; Frank Liu; Anupama Jagannathan; Anne Gattiker; Inseok Hwang; Jinho Lee; Matthew Tong; Sahil Dureja; Soham Shah; Peter Hofstee; Valerie Chen; Suvadip Paul; Rogerio Feris

doi:10.1109/ICCVW.2019.00188

ICCVW 2019

Conference paper

01 Oct 2019

Abstract

We define a new multi-modal compliance problem: determining whether the human activity in a given video complies with an associated text instruction. Solutions to this problem could enable automatic compliance checking and efficient feedback in many real-world settings. To this end, we introduce the Video-Text Compliance (VTC) dataset, which contains videos of atomic activities along with text instructions and compliance labels. The VTC dataset is constructed using an auto-augmentation technique, preserves privacy, and contains over 1.2 million frames. Finally, we present ComplianceNet, a novel end-to-end trainable compliance network that improves on the baseline accuracy by 27.5% on average when trained on the VTC dataset. We plan to release the VTC dataset to the community for future research.
