MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding. Revanth Gangi Reddy, Xilin Rui, et al. AAAI 2022.
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions. Brian Chen, Nina Shvetsova, et al. CVPR 2024.