Improving Hugging Face training efficiency through packing with flash attention
Technical note
Rhui Dih Lee, Arthur Zucker, Achintya Kundu, Laura Wynter, Raghu Ganti, and Mayank Mishra