IBM will join as a premier member, with a seat on the foundation’s governing board.
The last few years have been transformative in AI. What modern AI systems are capable of would’ve been unimaginable a generation ago. Much of that revolution has been fueled by the cloud. Being able to train massive models wherever your data lives, and call on AI systems wherever and whenever you need them, has infused AI into nearly every aspect of our lives and work.
But no single company or organization has driven this revolution. It has required research from around the world, and systems built on open-source software, which spares researchers from reinventing the wheel every time they want to get their work done. To further the potential of AI research and open-source collaboration, IBM has joined the PyTorch Foundation as a premier member. The foundation, a project of the Linux Foundation, serves as a neutral space for the deep-learning community to collaborate on the open-source PyTorch framework and ecosystem.
“When it comes to AI for business, a key term is scalability — without the ability to successfully scale AI, businesses will not get the most out of this technology,” said Raghu Ganti, Principal Research Scientist at IBM Research. “This continues to be a massive challenge facing enterprise AI today. One way we’re achieving this is an open-source collaboration with the PyTorch Foundation.”
By joining the foundation as a premier member, IBM receives a seat on the governing board. The board sets policy through the foundation’s bylaws and its mission and vision statements, defining the scope of the foundation’s initiatives. Ganti, who co-leads IBM Research’s AI foundation model training and validation platform, will fill that seat. His team primarily contributes to PyTorch’s training components, with the goal of democratizing the training and validation of foundation models.
“IBM’s commitment to open source helps democratize access to AI tools and technologies, making them more accessible to researchers and developers,” said Ibrahim Haddad, executive director of the PyTorch Foundation. “Their collaboration with the community will benefit the broader AI community.”
IBM and PyTorch have already collaborated on two major projects. The first enables massive foundation models with billions of parameters to be trained efficiently on standard cloud networking infrastructure, such as Ethernet, rather than on costlier interconnects like InfiniBand. IBM and PyTorch have also worked on ways to drastically reduce the cost of checkpointing during AI training. Together, the teams amended PyTorch to move the checkpointing workflow from a shared file system to object storage. This slashed costs while requiring a change to only a single line of code.
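The actual PyTorch change isn’t reproduced here, but the underlying idea — abstract the checkpoint destination behind a common write interface, so that pointing the workflow at object storage instead of a shared file system is a one-line swap at the call site — can be sketched as follows. All class and function names below are illustrative, not PyTorch’s real API:

```python
import io
import pickle

# Hypothetical storage backends. Both expose the same write() interface,
# so swapping one for the other is a single-line change where the
# checkpointing workflow is configured.
class SharedFileSystemStore:
    """Writes checkpoints to a path on a shared (e.g. NFS) file system."""
    def __init__(self, root):
        self.root = root
        self.blobs = {}  # stand-in for real file I/O in this sketch

    def write(self, name, data):
        self.blobs[f"{self.root}/{name}"] = data

class ObjectStore:
    """Writes checkpoints as objects (e.g. to an S3-compatible bucket)."""
    def __init__(self, bucket):
        self.bucket = bucket
        self.objects = {}  # stand-in for a real object-store client

    def write(self, name, data):
        self.objects[f"{self.bucket}/{name}"] = data

def save_checkpoint(state, store, step):
    """Serialize training state and hand it to whichever store is passed in."""
    buf = io.BytesIO()
    pickle.dump(state, buf)
    store.write(f"ckpt-{step}.pt", buf.getvalue())

state = {"weights": [0.1, 0.2], "step": 100}

# The one-line change: point the workflow at object storage instead of a
# shared file system.
store = ObjectStore("s3://training-bucket")  # was: SharedFileSystemStore("/mnt/shared")
save_checkpoint(state, store, step=100)
```

Because the rest of the training loop only ever calls `store.write()`, no other code has to change when the backend does — which is the design property the single-line claim describes.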
“IBM and PyTorch are getting further down the path to where training and running AI models is quicker, more cost-effective, and able to scale to even larger models — all on IBM’s hybrid cloud platform,” Ganti said.
IBM’s new watsonx platform, launched in July, leverages this collaboration with PyTorch. It offers a production-ready, enterprise-grade software stack for end-to-end training, fine-tuning, and inference for AI foundation models. IBM has used PyTorch to create cutting-edge foundation models for language, code, geospatial data, and IT data that will live on the watsonx platform.
“By joining the PyTorch Foundation, we aim to contribute our expertise and resources to further advance PyTorch’s capabilities and make AI more accessible in hybrid cloud environments with flexible hardware options,” said Priya Nagpurkar, vice president of hybrid cloud platform and developer productivity at IBM Research. “Our collaboration with PyTorch will also enable IBM to bring the power of foundation models and generative AI to enterprises using the watsonx platform to drive business transformation.”