
AI for Code: Theory and Practice


Important information

Location: IBM Research – Israel
Haifa site, University of Haifa Campus, Mt. Carmel
Date: June 11, 2025
Time: 14:00-18:00

AI for Enterprise Code
Rami Katan, Chief Scientist of Mainframe Technologies, IBM Research


AI code assistants shine with programming languages and environments for which rich open code and documentation are available for training. Enterprise legacy code, traditionally kept private by organizations, is very challenging for LLMs to handle due to the lack of large-scale training data and its inherent complexity. In this talk I will cover the vast need for AI for enterprise code, specifically for mainframe applications and languages, with the unique challenges of coping with scarce code and technical documentation for training and evaluation, and with the size and complexity of legacy systems. I will share takeaways from IBM's practice of training its Granite models and developing watsonx Code Assistant for Z, as well as the agentic use of static code analysis to cover the application context of millions of lines of code.
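
To make the "agentic use of static code analysis" concrete, the minimal Python sketch below shows how an agent step might query a static-analysis index for a program's call-graph neighborhood and fold that source context into an LLM prompt. The StaticIndex class, the llm_complete callable, and all names here are illustrative assumptions for this page, not the actual Granite or watsonx Code Assistant for Z implementation.

# Hypothetical sketch: one "agent" step that uses a static-analysis index
# to gather application context before asking an LLM about a program.
# StaticIndex, llm_complete, and every name below are illustrative only.

from typing import Dict, List


class StaticIndex:
    """Toy stand-in for a static-analysis index over a large codebase."""

    def __init__(self, call_graph: Dict[str, List[str]], sources: Dict[str, str]):
        self.call_graph = call_graph   # program -> programs it calls
        self.sources = sources         # program -> source text

    def callees(self, program: str) -> List[str]:
        return self.call_graph.get(program, [])

    def source(self, program: str) -> str:
        return self.sources.get(program, "")


def build_context(index: StaticIndex, program: str, depth: int = 1) -> str:
    """Collect a program plus its call-graph neighborhood as prompt context."""
    seen, frontier, parts = {program}, [program], []
    for _ in range(depth + 1):
        next_frontier = []
        for prog in frontier:
            parts.append(f"---- {prog} ----\n{index.source(prog)}")
            for callee in index.callees(prog):
                if callee not in seen:
                    seen.add(callee)
                    next_frontier.append(callee)
        frontier = next_frontier
    return "\n\n".join(parts)


def explain_program(index: StaticIndex, program: str, llm_complete) -> str:
    """One agent step: gather context via static analysis, then ask the LLM."""
    context = build_context(index, program)
    prompt = ("Explain what the following mainframe program does, using the "
              "related programs for context:\n\n" + context)
    return llm_complete(prompt)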

 

Early Fault-Detection in the Development of Exceedingly Complex Reactive Systems
David Harel and Assaf Marron, Weizmann Institute of Science


Finding hidden faults in reactive systems early in planning and development is critical for human safety, the environment, society, and the economy. However, the ever-growing complexity of reactive systems and their interactions, combined with the absence of adequate technical details in early development stages, poses a great obstacle. The problem is exacerbated by the constant evolution of systems, their extensive and growing autonomy, and their interwovenness with other systems and the physical world. Appropriately, such systems may be termed super-reactive. We propose an architecture for models and tools that help overcome such barriers and enable simulation, systematic analysis, and fault detection and handling early in the development of super-reactive systems. The main innovations are: (i) allowing natural-language (NL) specifications in elements of otherwise standard models and specification formalisms, while deferring the interpretation of such NL elements to simulation and validation time; and (ii) a focus on early formalization of tacit assumptions and interdependencies. The approach is facilitated by combining newly specialized tools with standard development and verification facilities, and with the inference and abstraction capabilities of large language models (LLMs) and associated AI techniques. An important ingredient in the approach is the domain knowledge embedded in LLMs. Special methodological measures are proposed to mitigate well-known limitations of LLMs. The talk summarizes a paper presented at and published in the Modelsward 2025 conference.
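
As a toy illustration of innovation (i), the hedged Python sketch below embeds a natural-language guard in an otherwise ordinary state-machine transition and defers its interpretation to simulation time, when an LLM (stubbed here) is asked whether the condition holds in the current state. The class and function names are invented for this illustration and are not the authors' tooling.

# Toy illustration (not the authors' tooling): a transition guard written in
# natural language, interpreted only at simulation time via an LLM call.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Transition:
    source: str
    target: str
    nl_guard: str   # natural-language condition, uninterpreted until simulation


def nl_guard_holds(guard: str, state: Dict[str, float],
                   ask_llm: Callable[[str], str]) -> bool:
    """Defer interpretation of the NL guard to simulation/validation time."""
    prompt = (f"System state: {state}\n"
              f"Condition: {guard}\n"
              "Answer strictly 'yes' or 'no': does the condition hold?")
    return ask_llm(prompt).strip().lower().startswith("yes")


def step(current: str, transitions: List[Transition],
         state: Dict[str, float], ask_llm: Callable[[str], str]) -> str:
    """Take the first enabled transition, evaluating NL guards on demand."""
    for t in transitions:
        if t.source == current and nl_guard_holds(t.nl_guard, state, ask_llm):
            return t.target
    return current


# Runnable demo with a stubbed "LLM" that knows only one rule.
if __name__ == "__main__":
    transitions = [Transition("cruising", "braking",
                              "the vehicle is closer than 30 meters to the obstacle")]
    fake_llm = lambda prompt: "yes" if "'distance': 12" in prompt else "no"
    print(step("cruising", transitions, {"distance": 12}, fake_llm))  # -> braking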

 

Growing Up: Using LLMs to Solve Messy Real-World Problems
Danny Yellin, Reichman University


The rapid maturation of Large Language Models (LLMs) for code generation has widely influenced software development, with LLMs now routinely used to synthesize code of increasing complexity, where complexity is usually measured by the algorithmic sophistication required to solve the problem. In the real world, however, much of software engineering deals with complexity of a different nature, namely the integration of different software components and technologies and the distributed nature of modern systems. In this talk, we analyze the ability of LLMs to synthesize applications made up of multiple micro-services (MSs). To successfully generate each MS, the generated code must integrate multiple frameworks, convert between data types, import the correct packages, return the correct response codes, and invoke other MSs and external services when required. We compare the ability of different GPT models to generate correct code and show that while the newer model produces much better results, it still struggles with several issues.
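
As a hedged illustration of the integration concerns each generated micro-service must get right, the short Flask sketch below shows a single endpoint that converts between data types, returns appropriate HTTP response codes, and invokes another (hypothetical) micro-service. The routes, URLs, and field names are invented for this example and are not taken from the study.

# Illustrative only: one micro-service endpoint showing the integration
# concerns discussed in the talk (frameworks, type conversion, status codes,
# and calls to other services). URLs and fields are hypothetical.

import requests
from flask import Flask, jsonify

app = Flask(__name__)

INVENTORY_SERVICE = "http://inventory-service:8080"   # hypothetical downstream MS


@app.route("/orders/<int:order_id>/status", methods=["GET"])
def order_status(order_id: int):
    # Invoke another micro-service and handle its failure modes explicitly.
    try:
        resp = requests.get(f"{INVENTORY_SERVICE}/items/{order_id}", timeout=2)
    except requests.RequestException:
        return jsonify({"error": "inventory service unavailable"}), 503

    if resp.status_code == 404:
        return jsonify({"error": f"order {order_id} not found"}), 404
    if resp.status_code != 200:
        return jsonify({"error": "unexpected upstream response"}), 502

    # Convert between data types: JSON payload -> typed fields -> response body.
    item = resp.json()
    in_stock = int(item.get("quantity", 0)) > 0
    return jsonify({"order_id": order_id, "in_stock": in_stock}), 200


if __name__ == "__main__":
    app.run(port=5000)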

 

Code Understanding with LLMs: Beyond the Hype
Shai Rubin, Co-Founder, Strudel


The current LLM landscape feels like a frenzy—every week brings a new tool or model promising massive productivity gains. As engineers, how do we evaluate these tools beyond the hype and anecdotes like “Wow, Cursor saved my team 90% of their time”? Benchmarks exist, but much like electric car stats, they rarely reflect real-world ROI.

In this talk, I’ll share results from applying 15 popular language models to summarize hundreds of real-world code files. Using practical metrics—verbosity, latency, cost, accuracy (from a human perspective), and information gain—we’ll assess how these models actually perform and what that says about their ROI. The goal: cut through the noise and guide better, data-driven choices.
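
To make the metrics concrete, here is a minimal, hypothetical harness showing how verbosity, latency, and cost might be tallied per model over a batch of files; the summarize callable, model name, and per-token price are placeholders rather than the setup used for the results in the talk.

# Hypothetical harness: collect simple per-model metrics (verbosity, latency,
# cost) when summarizing a batch of code files. The summarize callable,
# model names, and per-token prices are placeholders, not the talk's setup.

import time
from statistics import mean


def evaluate_model(model: str, files: dict, summarize, price_per_1k_tokens: float):
    """Summarize every file with summarize(model, code) and record metrics."""
    verbosity, latencies, cost = [], [], 0.0
    for path, code in files.items():
        start = time.perf_counter()
        summary = summarize(model, code)           # placeholder LLM call
        latencies.append(time.perf_counter() - start)
        tokens = len(summary.split())              # crude token proxy
        verbosity.append(tokens)
        cost += tokens / 1000 * price_per_1k_tokens
    return {
        "model": model,
        "mean_summary_tokens": mean(verbosity),
        "mean_latency_s": mean(latencies),
        "total_cost_usd": round(cost, 4),
    }


# Example with a stub "model" so the harness runs end to end.
if __name__ == "__main__":
    files = {"util.py": "def add(a, b):\n    return a + b\n"}
    stub = lambda model, code: "Adds two numbers and returns the result."
    print(evaluate_model("stub-model", files, stub, price_per_1k_tokens=0.002))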

 

Trust and Verification in Vibe Coding
Hila Peleg, Technion - Israel Institute of Technology


Code generation with an LLM (colloquially, "vibe coding") is, implicitly, an interaction model. In this interaction, a prompt explaining intent is given to the model, the model returns code, and then, as a crucial step, the user verifies the code before either using it or prompting the model for changes. While the limitations of the generation step are often discussed, in this talk we'll look instead at what research in human-computer interaction tells us about the verification step: what do we know about the human ability to verify code? What causes over- or under-utilization of AI code generation tools? And can better design lead to safer and better resulting code?

 

AI for Code Through the Ages
Yishai Feldman, IBM Research


Nowadays, the term "AI for Code" is usually understood to refer to LLM-based programming assistants. These show tantalizing promise in some areas, but come with inherent risks, especially for software development, where safety and accuracy are paramount.

Research on the application of AI techniques to software development has been going on for decades, and these more "traditional" analytical methods should be combined with the power of LLMs in a way that exploits the best of both and mitigates their weaknesses.

I have spent the past 40 years developing intelligent tools for software and systems engineering, for tasks such as program understanding, refactoring, and transformation; on formalisms such as contracts and statecharts; and on languages including assembly language, COBOL, and Java.

Drawing on my experience from a career in AI techniques for code, I will present several ideas on how to use LLMs in software development to complement analytical methods more effectively and safely.

  • Registration
  • Contact the Organizers
