Learning Situation Hyper-Graphs for Video Question Answering. Aisha Urooj Khan, Hilde Kuehne, et al. CVPR 2023.
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers. Walid Bousselham, Felix Petersen, et al. CVPR 2024.