A transfer learning framework for weak to strong generalization. Seamus Somerstep, Felipe Maia Polo, et al. 2025. ICLR 2025.
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study. Shawn Tan, Songlin Yang, et al. 2025. ICLR 2025.
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning. Gang Liu, Michael Sun, et al. 2025. ICLR 2025.
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts. Junmo Kang, Leonid Karlinsky, et al. 2025. ICLR 2025.