Publication
CVPR 2024
Conference paper

Resource-Efficient Transformer Pruning for Finetuning of Large Models

Abstract

With the recent advances in vision transformers and large language models (LLMs), finetuning costly large models on downstream learning tasks poses significant challenges under limited computational resources. This paper presents a REsource and ComputAtion-efficient Pruning framework (RECAP) for the finetuning of transformer-based large models. RECAP by design bridges the gap between efficiency and performance through an iterative process cycling between pruning, finetuning, and updating stages to explore different chunks of the given large-scale model. At each iteration, we first prune the model with Taylor-approximation-based importance estimation and then only update a subset of the pruned model weights based on the Fisher-information criterion. In this way, RECAP achieves two synergistic yet conflicting goals: reducing the GPU memory footprint while maintaining model performance, unlike most existing pruning methods that require the model to be finetuned beforehand for better preservation of model performance. We perform extensive experiments with a wide range of large transformer-based architectures on various computer vision and natural language understanding tasks. Compared to recent pruning techniques, we demonstrate that RECAP offers significant improvements in GPU memory efficiency, capable of reducing the footprint by up to 65%.
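
The abstract describes an iterative cycle that prunes with first-order Taylor importance and then updates only a Fisher-selected subset of the surviving weights. As a rough illustration only, the PyTorch sketch below applies a per-weight version of those two criteria: Taylor scores |w * dL/dw| for pruning, and squared gradients as a diagonal-Fisher approximation for choosing which weights to finetune. The toy model, batch, prune/update ratios, and all names here are assumptions made for the sketch; the paper's actual structured pruning, chunk scheduling, and finetuning loop are specified in the full text.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of one RECAP-style iteration; cycling over
# different chunks of the model across iterations is omitted for brevity.

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(32, 16)            # toy batch standing in for task data
y = torch.randint(0, 4, (32,))

def loss_grads():
    """Forward pass on the toy batch; return parameters and their gradients."""
    loss = criterion(model(x), y)
    params = list(model.parameters())
    grads = torch.autograd.grad(loss, params)
    return params, grads

# Pruning stage: first-order Taylor importance, |w * dL/dw| per weight.
params, grads = loss_grads()
scores = [(p.detach() * g).abs() for p, g in zip(params, grads)]
threshold = torch.quantile(torch.cat([s.flatten() for s in scores]), 0.3)
prune_masks = [(s > threshold).float() for s in scores]
with torch.no_grad():
    for p, m in zip(params, prune_masks):
        p.mul_(m)                  # zero out the least important 30% of weights

# Update stage: diagonal Fisher information, approximated by squared
# gradients; only the highest-Fisher surviving weights receive updates.
params, grads = loss_grads()
fisher = [g.pow(2) for g in grads]
cutoff = torch.quantile(torch.cat([f.flatten() for f in fisher]), 0.8)
update_masks = [(f >= cutoff).float() * m for f, m in zip(fisher, prune_masks)]

optimizer.zero_grad()
criterion(model(x), y).backward()
with torch.no_grad():
    for p, m in zip(params, update_masks):
        p.grad.mul_(m)             # freeze every weight outside the chosen subset
optimizer.step()
```

Masking gradients before the optimizer step is one simple way to restrict updates to a weight subset; it is used here only to make the selective-update idea concrete, not as a claim about how the paper implements it.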
