Bringing Image Structure to Video via Frame-Clip Consistency of Object Tokens. Elad Ben-Avraham, Roei Herzig, et al. NeurIPS 2022.
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data. Roei Herzig, Ofir Abramovich, et al. WACV 2024.