IBM at Interspeech 2025
- Rotterdam, Netherlands
The Linux Foundation's AI_dev is a nexus for developers delving into the intricate realm of open source generative AI and machine learning. At the heart of this event is the belief that open source is the engine of innovation in AI. By uniting the brightest developers from around the world, we aim to ignite discussions, foster collaborations, and shape the trajectory of open source AI.
In this talk, we will introduce two open-source projects, vLLM and KServe, and explain how integrating them improves the performance and scalability of LLMs in production. The session will include a demo showcasing their integration. vLLM is a high-performance library specifically designed for LLM inference and serving, offering cutting-edge throughput and efficiency through techniques such as PagedAttention, continuous batching, and optimized CUDA kernels, making it ideal for production environments that demand fast, large-scale LLM serving. KServe is a Kubernetes-based platform designed for scalable model deployment. It provides robust features for managing AI models in production, including autoscaling, monitoring, and model versioning. By combining vLLM's inference optimizations with KServe's scalability, organizations can deploy LLMs effectively in production, ensuring low-latency inference and seamless scaling across cloud platforms.
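As a sketch of what this integration looks like, a vLLM-backed model can be deployed on KServe through its Hugging Face serving runtime, which uses vLLM as its default backend. The service name, model id, and resource values below are illustrative, not taken from the talk:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo                # illustrative service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface       # KServe's Hugging Face runtime, vLLM-backed by default
      args:
        - --model_name=llm-demo
        - --model_id=meta-llama/Meta-Llama-3-8B-Instruct  # assumed model; any HF model id works
      resources:
        limits:
          nvidia.com/gpu: "1"   # vLLM requires a GPU for its optimized CUDA kernels
```

Once applied with kubectl, KServe exposes an OpenAI-compatible inference endpoint served by vLLM and scales replicas with request load.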
Speaker: Rafael Vasquez
Docling, an open source package, is rapidly becoming the de facto standard for document parsing and export in the Python community. Having earned close to 30,000 GitHub stars in less than a year and joined the LF AI & Data Foundation, Docling is redefining document AI with its ease of use and speed. In this session, we’ll introduce Docling and its features, including:
- Support for a wide array of formats, such as PDFs, DOCX, PPTX, HTML, images, and Markdown, with easy conversion to structured Markdown or JSON.
- Advanced document understanding that captures intricate page layouts, reading order, and table structures, ideal for complex analysis.
- Integration of the DoclingDocument format with popular AI frameworks, such as LlamaIndex, LangChain, and LlamaStack, for retrieval-augmented generation (RAG) and QA applications.
- Optical character recognition (OCR) support for scanned documents.
- Support for visual language models like SmolDocling, created in collaboration with Hugging Face.
- A user-friendly command line interface (CLI) and MCP connectors for developers.
- How to use it as a service and at scale by deploying your own docling-serve.
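A minimal usage sketch of the conversion flow described above, assuming Docling is installed via pip; the input path is illustrative:

```python
from docling.document_converter import DocumentConverter

# Convert a local or remote document; Docling detects the input format automatically.
converter = DocumentConverter()
result = converter.convert("report.pdf")  # illustrative path; URLs also work

# Export the unified DoclingDocument representation to Markdown
# (JSON export is also available for structured downstream use).
print(result.document.export_to_markdown())
```

The same conversion is available from the command line via the `docling` CLI.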
Speakers: Michele Dolfi & Peter Staar