Publication
ACL 2023
Conference paper

Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models

Abstract

The wide applicability and adaptability of large language models (LLMs) have enabled their rapid adoption. While pre-trained models can perform many tasks, they are often fine-tuned to improve their performance. However, this raises issues of model license violations, model theft, and copyright infringement. Moreover, recent advances show that generative technology is capable of producing harmful content, which exacerbates the problem of accountability within model supply chains. Thus, we need a method to investigate how a given model was trained or how a given piece of text was generated, and what its source pre-trained model was. In this paper, we take a first step toward addressing this open problem by tracing a given fine-tuned LLM back to its corresponding pre-trained base model. We consider different knowledge levels and attribution strategies, and find that we can trace a fine-tuned model back to its original base model with an AUC of 0.804.
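For illustration only, the sketch below shows one generic likelihood-based attribution strategy of the kind the abstract alludes to: text sampled from the fine-tuned model under test is scored under each candidate pre-trained base model, and the candidate assigning the lowest perplexity is taken as the likely origin. The model names, the `perplexity` and `attribute` helpers, and the candidate pool are all hypothetical assumptions, not the paper's exact method or its evaluation protocol (which reports the 0.804 AUC).

```python
# Minimal sketch of a perplexity-based attribution strategy (assumed, not the
# paper's exact method): rank candidate base models by how well they predict
# text produced by the fine-tuned model under investigation.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_name: str, text: str) -> float:
    """Perplexity of `text` under the candidate base model `model_name`."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        # Causal LM loss is the mean negative log-likelihood per token.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

def attribute(candidates: list[str], generated_text: str) -> str:
    """Return the candidate base model that assigns the lowest perplexity."""
    scores = {name: perplexity(name, generated_text) for name in candidates}
    return min(scores, key=scores.get)

# Example usage (candidate pool and text are placeholders):
# base = attribute(
#     ["gpt2", "gpt2-medium", "EleutherAI/gpt-neo-125m"],
#     "Text sampled from the fine-tuned model under test.",
# )
```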

Date

09 Jul 2023
