
Towards a generative future for computing

IBM is exploring a future where generative AI isn’t limited to what’s in the prompt window.

The AI industry has undergone a massive transformation in a comparatively tiny amount of time. A decade ago, the most popular AI models available could take your photo and make it look like a van Gogh painting, or pull data from a chart that you'd then have to crunch yourself. The last couple of years of development have made that seem quaint. Generative models can now pen Shakespearean sonnets to your coworkers, conjure up corporate mascot lore, and transform meeting notes into haiku, limericks, or whatever poetic forms the day demands.

But for all these advances in generative AI, the way we interact with these models remains clunky and ill-suited to scaling up to solve some of the biggest problems facing businesses.

Right now, when we need something done, we prompt a model with a series of statements written in full English sentences. This has led to an entire field called prompt engineering, which largely amounts to fiddling with the wording of a prompt until it works better. You can prompt a model with the same basic request written in two slightly different ways and get two different outputs. Even more frustrating, a newer version of a model that has worked well for you in the past might respond completely differently to the exact same prompt. This kind of unpredictability isn't sustainable for mission-critical or repeatable enterprise workflows.

It's why, at this year's Think conference, IBM researchers unveiled what they see as a new way forward: generative computing. It's a new computing concept, one that moves beyond ad-hoc prompting toward structured, programmable interaction with models. In this view, LLMs are new computing elements that require programming structures and development tools, just like any other piece of software.

We've already seen green shoots of this concept in the wild with context engineering, where AI developers have started taking a systematic approach to working with LLMs. Generative programs have the potential to be a more structured way to do context engineering efficiently and effectively.

When you interact with an LLM today, an API is the intermediary that interacts with the model via tokens, the smallest units of information it can understand. With generative computing, the idea is to replace the API with a runtime equipped with programming abstractions.  

The team is working on abstractions that can remove much of the brittleness of using today's LLMs. These include ways to specify a set of instructions for a model to follow that will produce the same results regardless of which model is being used, sampling strategies to control the randomness of model outputs, and intrinsic safety guardrails and constraints on how you expect the model to behave in its outputs. Instead of imploring a model in English sentences and hoping you'll get the answer you're after, the team is working to make this as programmatic as any other form of software.
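To make that idea concrete, here is a minimal sketch of what treating a model call as a program object, rather than a prose prompt, could look like. The names (GenerativeCall, generate, validate) are hypothetical and invented purely for illustration; this is not IBM's runtime API, just a way to picture instructions, checkable constraints, and a sampling strategy expressed programmatically.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: these names are hypothetical, not IBM's runtime.
@dataclass
class GenerativeCall:
    """A model request expressed as a program object rather than a prose prompt."""
    instruction: str                                         # what the model should do
    requirements: list[str] = field(default_factory=list)    # checkable constraints on the output
    temperature: float = 0.0                                  # sampling strategy: 0.0 for repeatable outputs
    max_retries: int = 3                                      # re-sample until requirements pass, up to a limit

def run(call: GenerativeCall, generate, validate) -> str:
    """Run a call against any backend `generate(prompt, temperature)` and
    re-sample until `validate(output, requirements)` passes."""
    for _ in range(call.max_retries):
        output = generate(call.instruction, call.temperature)
        if validate(output, call.requirements):
            return output
    raise RuntimeError("No output satisfied the stated requirements")
```

The point of the sketch is that the constraints and sampling behavior live in code, where they can be versioned, tested, and reused across models, instead of being buried in a block of prose.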

One way to implement some of these abstractions is with activated low-rank adapters (or aLoRAs). These give foundation models specialized capabilities for specific tasks they need to carry out at inference time, without adding delay. They can give models the power to rewrite a user's query for better retrieval, assess the relevance of retrieved documents, determine if a question is answerable given the context, estimate uncertainty in their answers, detect hallucinations, and even generate sentence-level citations. These new tools are now available to developers in the places they convene, including Hugging Face and vLLM.
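For a flavor of how a task-specific adapter attaches to a base model at inference time, here is a hedged sketch using the standard Hugging Face transformers and peft libraries. The model and adapter repository names below are placeholders, and IBM's activated-LoRA tooling may expose a different interface; this only illustrates the general pattern of loading an adapter for one narrow capability, such as query rewriting.

```python
# Hedged sketch: attaching a task-specific adapter to a base model with
# Hugging Face transformers + peft. Repo names are placeholders, and the
# dedicated activated-LoRA tooling may differ from this generic flow.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "ibm-granite/granite-3.3-8b-instruct"   # placeholder base model id
ADAPTER = "your-org/query-rewrite-adapter"     # placeholder adapter id

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, ADAPTER)  # load the adapter for this one task

prompt = "Rewrite the user's question so it retrieves better documents: ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```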

But this is just the start of what the team, led by IBM Research’s VP of AI models, David Cox, sees as a fundamental shift in the way we build for generative AI. In a new blog series expanding upon the work his team is doing, Cox explores what he sees as the general shift that computing has undergone, moving from imperative computing, where someone explicitly lays out instructions for a program to follow, to inductive computing, where the program learns from examples.  

“We believe that generative computing demands new programming models for using LLMs, new fundamental low-level operations performed by LLMs, and new ways of building LLMs themselves,” Cox said in his post.  

This led Cox’s team to create Mellea, a library for writing generative programs. Mellea includes tools developers can use to replace inconsistent agents and brittle prompts with what the team calls “structured, maintainable, robust, and efficient AI workflows.” 

The goal of Mellea is to allow AI builders to ditch the large, unwieldy prompts they've relied on to date and turn them into structured, maintainable Mellea programs. Mellea is open source and available now on GitHub, and is compatible with many inference services and model families.
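As a rough idea of what a small generative program looks like in practice, here is a hedged sketch written in the style of the project's quick-start examples. The exact entry points and return types (for example, mellea.start_session and the instruct method's requirements parameter are assumed here) should be checked against the GitHub README, which documents the current API and supported backends.

```python
# Hedged sketch of a small Mellea program; entry points assumed from the
# project's quick-start style. Check the GitHub README for the exact API.
import mellea

m = mellea.start_session()  # defaults to whatever backend your setup provides

summary = m.instruct(
    "Summarize the attached meeting notes for an executive audience.",
    requirements=[
        "Keep it under 100 words",
        "Use a neutral, professional tone",
    ],
)
print(summary)
```

The appeal of this style is that the requirements travel with the request as data, so the same program can be run, checked, and maintained like any other piece of software rather than as a wall of prompt text.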

This is just a first foray into the burgeoning world of generative computing. To learn more about what’s in store, read Cox’s deep dive into where IBM Research’s generative computing efforts will be focused. 
