Conference paper

Efficient Translation of Long Code Blocks using Large Language Models

Abstract

Recent advances in large language models (LLMs) have led to substantial improvements in automatic code translation across programming languages. However, translating large or complex legacy code, such as monolithic COBOL programs, remains a challenge because transformer-based models have a limited context window and a diminished ability to preserve long-range dependencies. In this work, we take COBOL-to-Java translation as a case study and propose a hierarchical chunking strategy that decomposes long code segments into smaller, semantically coherent chunks, each of which satisfies a predefined Lines of Code (LoC) constraint. We formalize the problem of minimizing the number of chunks under the LoC constraint and present both a greedy heuristic and a dynamic-programming-based lower bound. Experiments on real-world COBOL applications demonstrate that our approach significantly improves translation quality compared to non-chunked baselines.
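
To make the chunking objective concrete, the following is a minimal sketch of a greedy packing step. It is not the paper's algorithm. It assumes the program has already been split into an ordered list of indivisible units (for example, COBOL paragraphs), each with a known LoC count, and that a chunk must consist of consecutive units. The function name `greedy_chunks` and its parameters are hypothetical.

```python
from typing import List


def greedy_chunks(unit_locs: List[int], max_loc: int) -> List[List[int]]:
    """Group consecutive units into chunks whose total LoC is at most max_loc.

    unit_locs[i] is the LoC count of the i-th unit (an assumed input format).
    Returns a list of chunks, each a list of unit indices.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    current_loc = 0
    for index, loc in enumerate(unit_locs):
        if loc > max_loc:
            # A single unit larger than the limit cannot be placed as-is;
            # a hierarchical scheme would split it further at a finer level.
            raise ValueError(f"unit {index} has {loc} LoC, above the limit {max_loc}")
        if current and current_loc + loc > max_loc:
            # Adding this unit would exceed the limit, so close the chunk.
            chunks.append(current)
            current, current_loc = [], 0
        current.append(index)
        current_loc += loc
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Five units with the given LoC counts and a limit of 100 LoC per chunk.
    print(greedy_chunks([40, 30, 50, 20, 60], 100))
```

Under these simplifying assumptions (a flat, ordered sequence of units), the sketch fills each chunk as far as the limit allows before opening a new one. The hierarchical setting described above is more general, since oversized units would need to be decomposed further rather than rejected.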