Generative AI for Sports and Entertainment

Using large language models to support some of the world’s most prestigious sports and entertainment events

Overview

Narration is an essential part of sports coverage. However, for large-scale events such as the Wimbledon tennis tournament, where around 250 singles matches are played across 19 courts over 13 days and produce hundreds of hours of video footage, it is impractical for human commentators to narrate every match in a timely manner. To address this challenge, we worked closely with IBM Consulting to create a novel system that produces automatic commentary for tennis matches using generative AI. Our system consists of the following stages:

We first extract play-by-play metadata using a computer vision module that understands every detail of the game: court and net detection, player and ball tracking, player pose, fine-grained shot classification (backhand, forehand, volley, …), and shot direction. This metadata is combined with additional information from other modalities, such as audio-based crowd cheering measurement, match scoring data, radar-measured ball speed, and more.
The rich metadata described above is then fed as input to a large language model, which is fine-tuned to produce natural-language commentary as output. The large language model is a 3B-parameter encoder-decoder model pre-trained at IBM on trillions of tokens. We fine-tuned the model for commentary generation using a novel layered LoRA architecture (see the publication from CVPR 2023 below).
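The hand-off between the two stages can be sketched as follows. This is only an illustrative sketch, not the production system: the `ShotEvent` and `PointMetadata` schemas, their field names, the `[0, 1]` scaling of the crowd signal, and the key-value prompt layout are all assumptions made for the example. It shows one plausible way to flatten the multimodal metadata for a single point into text that a fine-tuned language model could take as input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShotEvent:
    """One shot in a rally. Hypothetical schema, not the real one."""
    player: str
    shot_type: str                          # e.g. "forehand", "backhand", "volley"
    direction: str                          # e.g. "cross-court", "down the line"
    ball_speed_kmh: Optional[float] = None  # radar-measured, when available


@dataclass
class PointMetadata:
    """Multimodal metadata for one point. Hypothetical schema."""
    score_before: str          # from match scoring data, e.g. "30-15"
    winner: str
    shots: List[ShotEvent]
    crowd_cheer_level: float   # audio-based measurement, assumed scaled to [0, 1]


def describe_shot(shot: ShotEvent) -> str:
    """Render one shot as a short phrase, adding speed only if it was measured."""
    text = f"{shot.player} {shot.shot_type} {shot.direction}"
    if shot.ball_speed_kmh is not None:
        text += f" at {shot.ball_speed_kmh:.0f} km/h"
    return text


def build_prompt(point: PointMetadata) -> str:
    """Serialize a point into a key-value text prompt (assumed layout)."""
    lines = [
        f"score: {point.score_before}",
        f"rally_length: {len(point.shots)}",
    ]
    lines += [
        f"shot {i}: {describe_shot(shot)}"
        for i, shot in enumerate(point.shots, start=1)
    ]
    lines.append(f"crowd: {point.crowd_cheer_level:.2f}")
    lines.append(f"point_winner: {point.winner}")
    lines.append("commentary:")  # the model would continue from here
    return "\n".join(lines)
```

Under these assumptions, calling `build_prompt` on a `PointMetadata` instance yields a compact text block ending in `commentary:`, which would be passed to the fine-tuned model to generate the narration for that point.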

Our system was showcased to clients and fans around the world as part of the 2023 Wimbledon and US Open tennis tournaments.

Publications

LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
- Jehanzeb Mirza, Leonid Karlinsky, et al.
- NeurIPS 2023

Going Beyond Nouns With Vision & Language Models Using Synthetic Data
- Paola Cascante-bonilla, Khaled Shehada, et al.
- ICCV 2023

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
- Wei Lin, Leonid Karlinsky, et al.
- ICCV 2023

Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
- Yuan Gong, Sameer Khurana, et al.
- INTERSPEECH 2023

ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
- James Smith, Paola Cascante-bonilla, et al.
- CVPR 2023

Teaching Structured Vision & Language Concepts to Vision & Language Models
- Sivan Doveh, Assaf Arbelle, et al.
- CVPR 2023

Resources

News

Wimbledon to introduce AI-powered commentary to coverage this year

IBM Brings Generative AI Commentary and AI Draw Analysis to the Wimbledon Digital Experience

IBM and Wimbledon, a partnership of innovation

Contributors

Rogerio Feris