Vision & Learning Technologies

We’re teaching computers to understand documents, images, and video using AI.

Our current work focuses on document analysis using deep learning, machine learning, and computer vision. Specific tasks include improving optical character recognition (OCR) with computer vision and language models, detecting and recognizing text in natural scenes (natural scene text recognition), document synthesis, and document enhancement.

Our team also has research experience in a variety of vision- and multimedia-focused tasks, drawing on deep learning, computer vision, machine learning, and video processing.

Udi Barzelay, Manager, Vision & Learning Technologies, IBM Research - Haifa

Natural Scene Text Recognition

Detecting and recognizing nonstandard text in photographs


Document Analysis

Indexing, structuring, and extracting important information from photographed documents


Self-Supervised Learning for Computer Vision Tasks

Tackling difficult vision-based tasks without annotated examples


Past Activities

Video Enrichment / Retrieval / Summarization

Using cognitive computing to discover insights from videos


Video Scene Detection

A fundamental step in video processing that divides a video into its constituent temporal scenes


Video Object Detection

Detecting objects in video frames, a building block for video analysis tasks such as indexing and surveillance


Few-Shot Action Recognition

Detecting and localizing actions in videos from only a few annotated examples