In both figures, the x-axis measures the inference-time compute used to generate results, roughly corresponding to the number of particles in the particle-filtering technique and the number of samples in the majority-voting technique. Following standard convention, the metric on the y-axis is pass@1, an estimate of the probability that the model produces the correct answer in a single attempt. For the reference numbers, we report only the average pass@1, since the point of our analysis is to contrast scaled inference on smaller models with a single inference call to more expensive models.
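For readers who want to reproduce this bookkeeping, the sketch below shows the standard unbiased pass@k estimator from the code-generation literature, which for k = 1 reduces to the fraction of correct samples per problem. The function name and the counts in the usage example are illustrative placeholders, not values from our experiments.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for a single problem.

    n: number of completions sampled for the problem
    c: number of those completions that are correct
    k: number of attempts the metric allows (k = 1 here)

    Returns the probability that at least one of k completions drawn
    without replacement from the n samples is correct.  For k == 1
    this reduces to c / n.
    """
    if n - c < k:  # every possible draw of k samples contains a correct one
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Benchmark-level pass@1 is the average of the per-problem estimates.
# The counts below are made-up placeholders, not results from the figures.
per_problem_correct = [12, 16, 0, 7]   # correct completions per problem
n_samples = 16                         # completions sampled per problem
avg_pass_at_1 = sum(pass_at_k(n_samples, c, 1) for c in per_problem_correct) / len(per_problem_correct)
print(f"average pass@1 = {avg_pass_at_1:.3f}")
```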
As both plots show, Granite 3.2 is able to take advantage of inference-scaling techniques to dramatically boost performance on both MATH500 and AIME2024. For instance, with the particle-filtering approach, Granite 3.2’s score on MATH500 jumps by over 60%, and its score on AIME2024 grows by a factor of 5. Even simple techniques like majority voting yield dramatic improvements, because they exercise the model’s native ability to generate chains of thought for math. We also see noticeable further gains with majority voting when we “prime” Granite 3.2 to generate longer chains of thought: in both graphs, the primed Granite 3.2 model outperforms the unprimed one.
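To make the majority-voting baseline concrete, here is a minimal sketch of the idea, often called self-consistency: sample several independent chains of thought, extract each final answer, and return the most common one. The `sample_chain_of_thought` and `extract_final_answer` callables are hypothetical stand-ins for the model call and the answer parser; they are not part of any Granite API, and the toy usage at the bottom stubs them out only to show the plumbing.

```python
from collections import Counter
from typing import Callable

def majority_vote(
    prompt: str,
    n_samples: int,
    sample_chain_of_thought: Callable[[str], str],  # hypothetical: returns one sampled CoT per call
    extract_final_answer: Callable[[str], str],     # hypothetical: pulls the final answer from a CoT
) -> str:
    """Self-consistency / majority voting over n independently sampled solutions."""
    answers = []
    for _ in range(n_samples):
        chain = sample_chain_of_thought(prompt)  # sample with non-zero temperature for diversity
        answers.append(extract_final_answer(chain))
    # The most frequent final answer wins; ties fall to the first answer seen.
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    # Toy usage with stubbed callables, purely illustrative.
    import random
    fake_model = lambda p: f"... so the answer is {random.choice(['42', '42', '7'])}"
    fake_parser = lambda text: text.rsplit(" ", 1)[-1]
    print(majority_vote("What is 6 * 7?", n_samples=8,
                        sample_chain_of_thought=fake_model,
                        extract_final_answer=fake_parser))
```

Increasing `n_samples` is exactly the x-axis movement in the majority-voting curves: more sampled chains per question, at proportionally higher inference cost.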
Finally, and most significantly, with these inference-scaling techniques our 8B-parameter Granite 3.2 model even exceeds the performance of much larger models such as GPT-4o-0513 and Claude3.5-Sonnet-1022 on both benchmarks.
Inference-time scaling has emerged as a powerful technique for improving model performance, sometimes quite significantly. We have shown early innovations from our labs that use inference-scaling techniques to give our 8B Granite 3.2 model state-of-the-art math reasoning capabilities, even surpassing those of other well-known proprietary frontier models. This remains an active area of research and development for us as we continue to enhance our Granite models to deliver state-of-the-art capabilities at the most effective cost point.