About
Neural Information Processing Systems (NeurIPS) is a leading machine learning and computational neuroscience conference. IBM Research is excited to sponsor NeurIPS again this year as a Platinum sponsor.
We invite all attendees to visit us during the event at booth number 243, from Tuesday, December 10 through Thursday, December 12.
We look forward to meeting you and telling you more about our latest work and career opportunities at IBM Research. At our booth we’ll be demoing projects on a broad range of AI topics such as foundation models, trustworthy AI, natural language processing and understanding, knowledge and reasoning, AI automation, human-centered AI, and federated learning.
Presentation times for conference workshops, demos, papers, and tutorials can be found in the agenda section at the bottom of this page. Note: All times are displayed in your local time.
Career opportunities
Visit the IBM booth to meet IBM researchers and recruiters and discuss future job opportunities or 2025 summer internships.
- Current IBM Research open roles
- Sign up to be notified of future openings by joining our Talent Network.
Keep up with emerging research and scientific developments from IBM Research. Subscribe to the Future Forward Newsletter.
Agenda
Visit us at the IBM booth in the exhibit hall to talk to our researchers and recruiters. We'll also be doing demos of our work. View our booth demo schedule and list of available IBM Research staff here.
EXPO | Atin Sood
Large Language Models for Code (or code LLMs) are gaining popularity and capability, offering a wide array of application modernization use cases such as code explanation, test generation, code repair, refactoring, translation, code generation, code completion, and more. To leverage code LLMs to their full potential, developers must provide code-specific contextual information to the models. We will demonstrate generic pipelines we built that incorporate static analysis to guide LLMs in generating code explanations at various levels (application, method, class) and automated test generation to produce compilable, high-coverage, and natural-looking test cases. We will also demonstrate how these pipelines can be built using “codellm-devkit”, an open-source library that significantly simplifies program analysis at various levels of granularity, making it easier to integrate detailed, code-specific insights that enhance the operational efficiency and effectiveness of LLMs in coding tasks, and how these use cases can be extended to different programming languages, specifically Java and Python.
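As a rough illustration of the first step of such a pipeline, the sketch below uses codellm-devkit (Python package cldk) to collect class-level context from a Java project and hand it to an LLM for explanation. The CLDK entry point follows the project's README, but the accessor name (get_classes) and the llm callable are our assumptions, not the library's confirmed API.

```python
# A minimal sketch, assuming the codellm-devkit (cldk) entry point from its
# README; get_classes() and the llm callable are illustrative assumptions.
from cldk import CLDK

def explain_classes(project_path: str, llm) -> dict[str, str]:
    """Build class-level context via static analysis, then ask the LLM."""
    analysis = CLDK(language="java").analysis(project_path=project_path)
    explanations = {}
    for qualified_name, cls in analysis.get_classes().items():  # assumed accessor
        prompt = (
            "Explain what this Java class does, method by method.\n"
            f"Class: {qualified_name}\n"
            f"Declaration details:\n{cls}\n"
        )
        explanations[qualified_name] = llm(prompt)
    return explanations
```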
Analog in-memory computing (AIMC) using resistive memory devices has the potential to increase the energy efficiency of deep neural network inference by multiple orders of magnitude. This is enabled by performing matrix vector multiplications – one of the key operations in deep neural network inference – directly within the memory, avoiding expensive weight fetching from external memory such as DRAM. The IBM HERMES Project Chip is a state-of-the-art, 64-core mixed-signal AIMC chip based on Phase Change Memory that makes this concept a reality. Using this chip, we demonstrate automatic deployment and inference of a Transformer model capable of predicting chemical compounds that are formed in a chemical reaction.EXPO | Luis Lastras
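To make the in-memory matrix-vector multiply concrete, here is a minimal sketch using IBM's open-source Analog Hardware Acceleration Kit (aihwkit). Note this is a software simulation of analog crossbar inference, not the HERMES chip demo code, and the layer sizes are arbitrary.

```python
# A minimal sketch using IBM's open-source aihwkit simulator; this models
# analog crossbar inference in software and is not the HERMES chip itself.
import torch
from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import InferenceRPUConfig

# Weights are mapped onto simulated resistive devices; the forward pass
# performs the matrix-vector multiplication "in memory", with device
# non-idealities modeled by the RPU configuration.
layer = AnalogLinear(512, 256, bias=False, rpu_config=InferenceRPUConfig())
x = torch.randn(1, 512)
y = layer(x)
print(y.shape)  # torch.Size([1, 256])
```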
EXPO | Luis Lastras
We aim to reframe how developers create LLM applications. Instead of iterating on verbose, complex prompts to achieve a desired complex behavior, we break down complex tasks into a series of standard computing elements that a developer can call in a programmatic way. In this demonstration we will explore how leveraging an LLM trained with key intrinsic functions, such as hallucination detection, uncertainty quantification, and topic scoping, could unlock a new way of building and working with LLMs.
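A sketch of what composing such intrinsics programmatically might look like is below. The function names (generate, detect_hallucination, quantify_uncertainty, check_topic) and thresholds are invented for illustration; they are not a published IBM API.

```python
# Hypothetical sketch of "LLM intrinsics as computing elements": instead of
# one monolithic prompt, each capability is a separate programmatic call.
# All method names on `llm` are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class GuardedAnswer:
    answer: str
    hallucination_risk: float  # 0.0 (grounded) .. 1.0 (likely fabricated)
    certainty: float           # calibrated model confidence
    on_topic: bool

def answer_with_guardrails(llm, question: str, documents: list[str],
                           topic: str) -> GuardedAnswer | None:
    draft = llm.generate(question, context=documents)      # standard generation
    risk = llm.detect_hallucination(draft, documents)      # intrinsic 1
    certainty = llm.quantify_uncertainty(question, draft)  # intrinsic 2
    on_topic = llm.check_topic(draft, topic)               # intrinsic 3
    if risk > 0.5 or not on_topic:
        return None  # route to retrieval, a retry policy, or escalation
    return GuardedAnswer(draft, risk, certainty, on_topic)
```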
EXPO | Rohan Arora
IT failures are increasingly costly, with even brief outages leading to millions in losses as more business moves online. Incident management has become more complex than ever due to a combination of technological advancements, infrastructure heterogeneity, and evolving business needs. Resolving IT incidents is comparable to, if not more complex than, fixing software bugs, and it is a tedious and expensive task. Several advancements have been made, including IBM's Intelligent Incident Remediation, which uses LLMs and generative AI to streamline incident resolution by identifying probable causes and applying AI-guided remediation steps. In this demo, we describe how we are advancing the state of the art in incident remediation using agentic generative AI approaches. We demonstrate SRE-Agent-101, a ReAct-style LLM-based agent, along with a benchmark to standardize the evaluation of analytical solutions for incident management. SRE-Agent-101 uses several custom-built tools, namely anomaly detection, causal topology extraction, NL2Traces, NL2Metrics, NL2Logs, NL2TopologyTraversal, and NL2Kubectl. These tools take natural language as input and fetch target data gathered by the observability stack. Given the verbosity of such data, even powerful models can quickly exhaust their context length, so we have implemented a methodology to dynamically discover the more specific context using domain knowledge. The target context is then analyzed by the underlying LLM to infer the root-cause entity and fault and to perform actions; this process continues iteratively until the incident is resolved.
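To show how a ReAct-style agent could orchestrate such natural-language tools, here is a generic loop under stated assumptions: the tool names come from the description above, but the stub implementations, the "Action: tool | input" step format, and the llm interface are our illustration, not IBM's code.

```python
# A generic ReAct-style loop, illustrating (not reproducing) how
# SRE-Agent-101's natural-language tools could be orchestrated.
from typing import Callable

def nl2logs(q: str) -> str:     return "stub: ERROR rate spike on checkout-svc"
def nl2metrics(q: str) -> str:  return "stub: p99 latency 4.2s on checkout-svc"
def nl2kubectl(q: str) -> str:  return "stub: pod checkout-svc-7d OOMKilled"

TOOLS: dict[str, Callable[[str], str]] = {
    "NL2Logs": nl2logs, "NL2Metrics": nl2metrics, "NL2Kubectl": nl2kubectl,
}

def react_loop(llm, incident: str, max_steps: int = 10) -> str:
    """Thought -> Action -> Observation until the model emits a final answer."""
    scratchpad = f"Incident: {incident}\n"
    for _ in range(max_steps):
        step = llm(scratchpad)  # expected to emit "Action: <tool> | <input>"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        tool, _, arg = step.removeprefix("Action:").strip().partition(" | ")
        # Truncate observations before re-entering the context window: raw
        # observability data would quickly exhaust the model's context.
        scratchpad += f"{step}\nObservation: {TOOLS[tool](arg)[:500]}\n"
    return "escalate: no root cause found within step budget"
```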
EXPO | Werner Geyer
Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and time-consuming given the large amounts of data. LLMs are increasingly used as evaluators to filter training data, evaluate model performance, detect harms and risks, or assist human evaluators with detailed assessments. To support this process, effective front-end tools are critical. EvalAssist abstracts the LLM-as-a-judge evaluation process into a library of parameterizable evaluators (the criterion being the parameter), allowing the user to focus on criteria definition. EvalAssist consists of a web-based user experience, an API, and a Python toolkit, and is based on the UNITXT open-source library. The user interface provides users with a convenient way of iteratively testing and refining LLM-as-a-judge criteria, and supports both direct (rubric-based) and pairwise assessment, the two most prevalent forms of LLM-as-a-judge evaluation. In our demo, we will showcase different types of evaluator LLMs for general-purpose evaluation, as well as the latest Granite Guardian model (released October 2024) for evaluating harms and risks.
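The sketch below shows the "criterion as the parameter" idea behind a direct (rubric-based) evaluator. The Criterion structure, prompt template, and judge interface are our illustration in the spirit of EvalAssist, not its actual API.

```python
# A minimal sketch of a parameterizable direct-assessment evaluator; the
# criterion definition is the only thing the user writes. Illustrative only.
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    definition: str
    options: dict[str, str]   # rating label -> description

CONCISENESS = Criterion(
    name="conciseness",
    definition="Is the response free of unnecessary repetition and filler?",
    options={"Yes": "No redundant content.",
             "No": "Contains filler or repetition."},
)

def judge(llm, criterion: Criterion, question: str, response: str) -> str:
    """Render the criterion into a rubric prompt and return the judge's label."""
    rubric = "\n".join(f"- {k}: {v}" for k, v in criterion.options.items())
    prompt = (
        f"You are an evaluator. Criterion: {criterion.definition}\n"
        f"Options:\n{rubric}\n"
        f"Question: {question}\nResponse: {response}\n"
        "Answer with exactly one option label."
    )
    return llm(prompt).strip()  # e.g. "Yes" or "No"
```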
EXPO | Leonid Karlinsky
Enterprise applications present unique challenges for vision and language foundation models, as they frequently involve visual data that diverges significantly from the typical distribution of web images and require understanding of nuanced details such as small text in scanned documents or tiny defects in industrial equipment images. Motivated by these challenges, we will showcase our IBM Granite Vision model, a foundation model with state-of-the-art performance in document image understanding tasks, such as the analysis of charts, plots, infographics, tables, flow diagrams, and more. We will provide a detailed overview of our methodology and present a live demonstration of our model's capabilities, illustrating its key features and applications. Our model will be open-sourced, allowing the community to access and contribute to its development.
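Once the model is released, a document-understanding query should look roughly like any Hugging Face vision-language model call. The sketch below is our assumption of that usage: the model ID is a placeholder (the repository name is not announced on this page), and the chat-template calls follow standard transformers conventions rather than a confirmed Granite Vision API.

```python
# Hypothetical usage sketch: the model id is a placeholder, and the API
# follows standard Hugging Face transformers conventions for VLMs.
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ibm-granite/<granite-vision-model>"  # placeholder, not a real id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

conversation = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "quarterly_report_chart.png"},
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]
inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True,
    tokenize=True, return_dict=True, return_tensors="pt",
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```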