Vision & Learning Technologies

We’re teaching computers to understand documents, images, and video using AI.

Our current work focuses on document analysis using deep learning, machine learning, and computer vision. This work includes improving optical character recognition (OCR) with computer vision and language models, detecting and recognizing text in natural scenes (natural scene text recognition), document synthesis, and document enhancement, among other tasks.

Our team also has research experience in a variety of vision- and multimedia-focused tasks using deep learning, computer vision, machine learning, and video processing.



Udi Barzelay,
Manager, Vision & Learning Technologies,
IBM Research - Haifa


Natural Scene Text Recognition

Detecting and recognizing nonstandard text in photographs


Document Analysis

Indexing, structuring, and extracting important information from photographed documents


Self-Supervised Learning for Computer Vision Tasks

Tackling difficult vision-based tasks without annotated examples


Past Activities

Video Enrichment / Retrieval / Summarization

Using cognitive computing to discover insights from videos


Video Scene Detection

A fundamental step in video processing that divides a video into its constituent temporal scenes


Video Object Detection

Detecting objects in video to support analysis tasks such as indexing and surveillance


Few-Shot Action Recognition

Detecting and localizing actions in videos given limited annotated examples



Publications

Shashank Mujumdar, Nithya Rajamani, L. Venkata Subramaniam, Dror Porat. Efficient Multi-stage Image Classification for Mobile Sensing in Urban Environments. IEEE International Symposium on Multimedia, 2013.
Flora Gilboa-Solomon, Gal Ashour, Ophir Azulai. Efficient storage and retrieval of geo-referenced video from moving sensors. International Conference on Advances in Geographic Information Systems, 2013.
Shashank Mujumdar, Dror Porat, Nithya Rajamani, L. Venkata Subramaniam. A Multi-Stage Framework for Classification of Unconstrained Image Data from Mobile Phones. International Journal of Multimedia Data Engineering and Management, 2014.
Chung-Ching Lin, Sharath Pankanti, Gal Ashour, Dror Porat, John R. Smith. Moving camera analytics: Emerging scenarios, challenges, and applications. IBM Journal of Research and Development, 2015.
Daniel Rotman, Dror Porat, Gal Ashour. Robust and Efficient Video Scene Detection Using Optimal Sequential Grouping. IEEE International Symposium on Multimedia, 2016.
Daniel Rotman, Dror Porat, Gal Ashour. Optimal Sequential Grouping for Robust Video Scene Detection Using Multiple Modalities. International Journal of Semantic Computing, 2017.
Daniel Rotman, Dror Porat, Gal Ashour. Robust Video Scene Detection Using Multimodal Fusion of Optimally Grouped Features. IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), 2017.
Daniel Rotman, Dror Porat, Gal Ashour, Udi Barzelay. Optimally Grouped Deep Features Using Normalized Cost for Video Scene Detection. ACM International Conference on Multimedia Retrieval (ICMR), 2018.
Daniel Rotman, Dror Porat, Yevgeny Burshtein, Udi Barzelay. Temporal Video Analyzer (TVAN): Efficient Temporal Video Analysis for Robust Video Description and Search. AAAI Conference on Artificial Intelligence, 2019.
Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein. Learning to Detect and Retrieve Objects From Unlabeled Videos. IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019.
Elad Amrani, Rami Ben-Ari, Inbar Shapira, Tal Hakim, Alex Bronstein. Self-Supervised Object Detection and Retrieval Using Unlabeled Videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.
Yair Shemer, Daniel Rotman, Nahum Shimkin. ILS-SUMM: Iterated Local Search for Unsupervised Video Summarization. International Conference on Pattern Recognition (ICPR), 2020.
Daniel Rotman, Yevgeny Yaroker, Elad Amrani, Udi Barzelay, Rami Ben-Ari. Learnable Optimal Sequential Grouping for Video Scene Detection. ACM Multimedia, 2020.
Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein. Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning. AAAI Conference on Artificial Intelligence, 2020.
Rami Ben-Ari, Mor Shpigel, Ophir Azulai, Udi Barzelay, Daniel Rotman. TAEN: Temporal Aware Embedding Network for Few-Shot Action Recognition. TBD, 2020.


Patents and Patent Applications

Ashour, Gal, Dror Porat, and Daniel N. Rotman. Non-greedy hierarchical segmentation of serial data. US 10,210,908, February 19, 2019.
Ashour, Gal, Yevgeny Burshtein, Tal Hakim, Dror Porat, and Daniel Nechemia Rotman. Object recognition in video. US 10,417,501, September 17, 2019.
Drory, Tal, Dror Porat, and Daniel N. Rotman. Generating a graphical user interface to navigate video content. US 15/918,099, September 12, 2019.
Porat, Dror, Daniel N. Rotman, and Gal Ashour. Query-based granularity selection for partitioning recordings. US 15/696,535, March 7, 2019.
Ashour, Gal, Ophir Azulai, and Roy Levin. Multimedia analytics in Spark using Docker. US 10,417,273, September 17, 2019.
Azulai, Ophir, Udi Barzelay, Mattias Marder, Dror Porat, and Slava Shechtman. Real-time system for determining current video scale. US 9,892,335, February 13, 2018.
Barzelay, Udi, Ophir Azulai, and Yochay Tzur. Identifying temporal changes of industrial objects by matching images. US 10,628,703, April 21, 2020.
Azulai, Ophir. Low delay content disarm and reconstruction (CDR) of live streaming video. US 16/355,775, September 17, 2020.
Barzelay, Udi, and Yevgeny Yaroker. Sparse labeled video annotation. US 16/236,712, July 2, 2020.
Hakim, Tal, and Dror Porat. Selecting object detections or recognitions using correctness mappings. US 16/221,524, June 18, 2020.
Porat, Dror, and Tal Hakim. Detection of visual tracker divergence. US 16/225,078, June 25, 2020.