Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study. Shawn Tan, Songlin Yang, et al. ICLR 2025.
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning. Gang Liu, Michael Sun, et al. ICLR 2025.
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts. Junmo Kang, Leonid Karlinsky, et al. ICLR 2025.
Shedding Light on Time Series Classification using Interpretability Gated Networks. Yunshi Wen, Tengfei Ma, et al. ICLR 2025.
Mind the GAP: Glimpse-based Active Perception Improves Generalization and Sample Efficiency of Visual Reasoning. Oleh Kolner, Thomas Bohnstingl, et al. ICLR 2025.