09 May 2022
4 minute read

What are foundation models?

The future of AI is flexible, reusable AI models that can be applied to just about any domain or industry task.

Over the last decade, we’ve seen an explosion of applications for artificial intelligence. In that time, we’ve seen AI go from a purely academic endeavor to a force powering actions across myriad industries and affecting the lives of millions each day.

In recent years, we’ve managed to build AI systems that can learn from thousands, or millions, of examples to help us better understand our world, or find new solutions to difficult problems. These large-scale models have led to systems that can understand when we talk or write, such as the natural-language processing and understanding programs we use every day, from digital assistants to speech-to-text programs. Other systems, trained on things like the entire work of famous artists, or every chemistry textbook in existence, have allowed us to build generative models that can create new works of art based on those styles, or new compound ideas based on the history of chemical research.

While many new AI systems are helping solve all sorts of real-world problems, creating and deploying each new system often requires a considerable amount of time and resources. For each new application, you need to ensure that there’s a large, well-labelled dataset for the specific task you want to tackle. If a dataset doesn’t exist, you have to have people spend hundreds or thousands of hours finding and labelling appropriate images, text, or graphs for it. Only once the AI model has learned to recognize everything in the dataset can it be applied to your use case, from recognizing language to generating new molecules for drug discovery. And training one large natural-language processing model, for example, has roughly the same carbon footprint as running five cars over their lifetime.

Building a foundation for AI models

The next wave in AI looks to replace the task-specific models that have dominated the AI landscape to date. The future is models that are trained on a broad set of unlabeled data that can be used for different tasks, with minimal fine-tuning. These are called foundation models, a term first popularized by the Stanford Institute for Human-Centered Artificial Intelligence. We’ve seen the first glimmers of the potential of foundation models in the worlds of imagery and language. Early examples of models, like GPT-3, BERT, or DALL-E 2, have shown what’s possible. Input a short prompt, and the system generates an entire essay, or a complex image, based on your parameters, even if it wasn’t specifically trained on how to execute that exact argument or generate an image in that way.

What makes these new systems foundation models is that they, as the name suggests, can be the foundation for many applications of the AI model. They rely on self-supervised learning, where the model learns from unlabeled data, and transfer learning, where the model can apply information it’s learnt about one situation to another. While the amount of data is considerably more than the average person needs to transfer understanding from one task to another, the end result is relatively similar: You learn to drive on one car, for example, and without too much effort, you can drive most other cars, or even a truck or a bus.
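To make self-supervised learning a little more concrete, here is a minimal sketch of one common form of it, masked-word prediction, the kind of objective used by BERT-style models. The function name `masked_examples` and the sample sentence are illustrative assumptions, not anything from IBM's or Stanford's tooling. The point is only that the training targets come from the text itself, so nobody has to label anything by hand.

```python
def masked_examples(sentence, mask="[MASK]"):
    """Turn one unlabeled sentence into (input, target) training pairs.

    Each pair hides a single word; the hidden word is the label.
    This is a toy illustration, not a production tokenizer.
    """
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples


# Hypothetical sample sentence: six words yield six training pairs.
pairs = masked_examples("foundation models learn from unlabeled text")
for model_input, target in pairs:
    print(f"{model_input!r} -> {target!r}")
```

A real model would be trained to predict each hidden word from its context, and the representations it learns along the way are what later get transferred to other tasks.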

We’ve seen what almost seems like inherent creativity in some of the early foundation models, with AI able to string together coherent arguments, or create entirely original pieces of art. But the value in foundation models can theoretically extend into any domain. At IBM Research, we’ve spent years studying how to make AI’s applicability broader and more flexible, and since Stanford’s first paper on the topic in 2021, we’ve been trying to bring that flexibility to the world of industry.

Let’s take an example in the world of natural-language processing, one of the areas where foundation models are already quite well established. With the previous generation of AI techniques, if you wanted to build an AI model that could summarize bodies of text for you, you’d need tens of thousands of labeled examples just for the summarization use case. With a pre-trained foundation model, we can reduce labeled data requirements dramatically. First, we could fine-tune it on a domain-specific unlabeled corpus to create a domain-specific foundation model. Then, using a much smaller amount of labeled data, potentially just a thousand labeled examples, we can train a model for summarization. The domain-specific foundation model can be used for many tasks, as opposed to the previous technologies that required building models from scratch for each use case.
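The two-stage idea above (learn from unlabeled text first, then adapt with very few labels) can be sketched with a deliberately tiny stand-in. Everything here is an assumption for illustration: the corpus, the function names, the use of word co-occurrence counts in place of a neural foundation model, and a nearest-prototype classifier in place of real fine-tuning. Sentiment classification is used instead of summarization because it fits in a few lines. It is a toy, but it shows why the unlabeled step reduces how many labels are needed.

```python
from collections import Counter
from math import sqrt

# Hypothetical unlabeled domain corpus: no sentiment labels anywhere.
UNLABELED = [
    "the service was great and excellent",
    "excellent and great support",
    "the product was terrible and awful",
    "awful and terrible delays",
]


def pretrain(corpus):
    """Unlabeled step: represent each word by the words it co-occurs with."""
    vectors = {}
    for sentence in corpus:
        words = sentence.split()
        for i, word in enumerate(words):
            context = words[:i] + words[i + 1:]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def embed(text, vectors):
    """Sum the pretrained vectors of the known words in a text."""
    total = Counter()
    for word in text.split():
        total.update(vectors.get(word, {}))
    return total


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fine_tune(labeled, vectors):
    """Labeled step: build one prototype vector per label from a few examples."""
    prototypes = {}
    for text, label in labeled:
        prototypes.setdefault(label, Counter()).update(embed(text, vectors))
    return prototypes


def classify(text, prototypes, vectors):
    query = embed(text, vectors)
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


vectors = pretrain(UNLABELED)
# Only two labeled examples in total.
prototypes = fine_tune([("great", "positive"), ("terrible", "negative")], vectors)

print(classify("excellent", prototypes, vectors))  # positive
print(classify("awful", prototypes, vectors))      # negative
```

Neither "excellent" nor "awful" ever appears with a label. They are classified correctly only because the unlabeled corpus already placed them near "great" and "terrible", which is the same intuition, at a vastly smaller scale, behind reusing one pre-trained model for many tasks.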

We’ve started to sow the seeds of foundation models across much of our AI research. We’re looking into how CodeNet, our massive dataset of many of the most popular coding languages from the past and present, can be leveraged into a model that would be foundational to automating and modernizing countless business processes. Imagine legacy systems with the power to utilize the best parts of the modern web, or programs that can code and update themselves, with little need for human oversight.

Similarly, late last year, we launched a version of our open-source CodeFlare tool that drastically reduces the amount of time it takes to set up, run, and scale machine learning workloads for future foundation models. It’s the sort of work that needs to be done to ensure that we have the processes in place for our partners to work with us, or on their own, to create foundation models that will solve a host of problems they have. For example, a financial-services company could customize an existing language foundation model just for sentiment analysis.

IBM has also seen the value of foundation models: We have already implemented foundation models across our Watson portfolio and have seen that their accuracy surpasses the previous generation of models by a large margin, while still being cost-effective. With pre-trained foundation models, Watson NLP could train sentiment analysis on a new language using only a few thousand sentences, which is 100 times fewer annotations than previous models required. In its first seven years, Watson covered 12 languages. Using foundation models, it jumped to cover 25 languages in about a year.

We believe that foundation models will dramatically accelerate AI adoption in enterprise. Reducing labeling requirements will make it much easier for businesses to dive in, and the highly accurate, efficient AI-driven automation they enable will mean that far more companies will be able to deploy AI in a wider range of mission-critical situations. Our goal is to bring the power of foundation models to every enterprise in a frictionless hybrid-cloud environment.

It’s an exciting time in artificial intelligence research, and to learn more about the potential of foundation models in enterprise, watch this video by our partners at Red Hat.

