Aligning Large Multimodal Models with Factually Augmented RLHFZhiqing SunSheng Shenet al.2024ACL 2024
Bringing Image Structure to Video via Frame-Clip Consistency of Object TokensElad Ben-AvrahamRoei Herziget al.2022NeurIPS 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene DataRoei HerzigOfir Abramovichet al.2024WACV 2024