Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010
Automated climate analysis is critical for real-world applications such as disaster monitoring, infrastructure risk assessment, urban resilience planning, and policy support. Visual Question Answering (VQA) models enable scalable interpretation of satellite imagery without requiring experts: users query the visual data in natural language and receive grounded responses. However, current VQA models often lack the reasoning capabilities needed for complex geospatial questions that require multi-step inference. We propose a VQA framework that integrates chain-of-thought (CoT) reasoning to improve interpretability and robustness when answering questions over multispectral satellite imagery. Our work focuses on geospatial reasoning with vision-language models tailored to remote sensing data. By generating intermediate rationales, the model is better equipped to handle tasks that require object detection, classification, spatial-relationship reasoning, and comparative analysis, all of which are critical for meaningful decision support in high-stakes domains. Our approach uses CoT fine-tuning to train models to generate coherent reasoning steps before arriving at a final answer. To further improve reasoning fidelity, we apply Direct Preference Optimization (DPO), a preference-based fine-tuning method that aligns a model with preferred outputs without an explicit reinforcement learning loop, so that reasoning quality is tied to answer accuracy. Our experiments show that CoT supervision improves performance by 33% over direct-answer baselines. The resulting system enables VQA models to reason over richer, multi-channel Earth observation data and address complex environmental challenges with improved accuracy, interpretability, and real-world utility.
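For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not the authors' training code. It assumes that each pair consists of a "chosen" and a "rejected" response to the same image-and-question prompt (for example, a CoT rationale that reaches a correct answer versus one that does not; how the pairs are actually built is not stated in the abstract). It also assumes that each input is the summed log-probability of the whole response under either the model being trained or a frozen reference model. The function names and the default `beta=0.1` are illustrative choices.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)), avoiding overflow in exp()."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is a summed sequence log-probability log p(response | prompt).
    `beta` (an assumed default) scales how strongly the policy may deviate
    from the frozen reference model.
    """
    # Implicit "reward" of each response: policy log-prob minus reference log-prob.
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    # Loss = -log sigmoid(beta * (chosen margin - rejected margin)).
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


def dpo_batch_loss(pairs, beta: float = 0.1) -> float:
    """Mean DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    losses = [dpo_loss(*p, beta=beta) for p in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy raised the chosen response's likelihood: loss drops below log(2).
    print(dpo_loss(-8.0, -12.0, -10.0, -12.0))
```

When the trained model still matches the reference, both margins are zero and the loss is log 2. Raising the likelihood of the chosen rationale relative to the rejected one pushes the loss toward zero. The stable log-sigmoid matters in practice because sequence log-probabilities can differ by large amounts, and a naive `math.log(1 / (1 + math.exp(-z)))` can overflow.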
Chen-chia Chang, Wan-hsuan Lin, et al.
ICML 2025
Daniel Karl I. Weidele, Hendrik Strobelt, et al.
SysML 2019
Gang Liu, Michael Sun, et al.
ICLR 2025