Publication
ICCVW 2019
Conference paper
Video-text compliance: Activity verification based on natural language instructions
Abstract
We define a new multi-modal compliance problem: determining whether the human activity in a given video complies with an associated text instruction. Solutions to this problem could enable automatic compliance checking and efficient feedback in many real-world settings. To this end, we introduce the Video-Text Compliance (VTC) dataset, which contains videos of atomic activities along with text instructions and compliance labels. The VTC dataset is constructed with an auto-augmentation technique, preserves privacy, and contains over 1.2 million frames. Finally, we present ComplianceNet, a novel end-to-end trainable compliance network that improves the baseline accuracy by 27.5% on average when trained on the VTC dataset. We plan to release the VTC dataset to the community for future research.