Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors. Ido Amos, Jonathan Berant, et al. 2024. ICLR 2024.
On the Parameterization and Initialization of Diagonal State Space Models. Albert Gu, Ankit Gupta, et al. 2022. NeurIPS 2022.
Diagonal State Spaces are as Effective as Structured State Spaces. Ankit Gupta, Albert Gu, et al. 2022. NeurIPS 2022.
Exploring the limits of decoder-only models trained on public speech recognition corpora. Ankit Gupta, George Saon, et al. 2024. INTERSPEECH 2024.
Diagonal State Space Augmented Transformers for Speech Recognition. George Saon, Ankit Gupta, et al. 2023. ICASSP 2023.