Improving Watson NLP performance in IBM products through Intel optimizations
For many years, IBM Research has been developing AI capabilities with a focus on embedding this innovation in IBM software. One of the core IBM Research assets, included in more than 20 IBM products, is the IBM Watson Natural Language Processing (NLP) Library. The capabilities in IBM Watson NLP have been incubated and delivered into IBM products for decades. To deliver the best possible performance for IBM products, the Watson NLP team within IBM Research partnered with Intel to improve Watson NLP performance with Intel oneDNN and TensorFlow, powered by oneAPI, and demonstrated gains of up to 35% in function throughput for key NLP tasks.
Intel optimizations powered by oneAPI
Intel optimizes deep learning frameworks, including TensorFlow and PyTorch, with the oneAPI Deep Neural Network Library (oneDNN). oneDNN is an open-source, cross-platform performance library of basic building blocks for deep learning applications. It takes advantage of new hardware features and accelerators on Intel Xeon-based infrastructure. Its optimizations target key performance-intensive operations such as convolution, matrix multiplication, and batch normalization. oneDNN also leverages graph-mode computation, fusing compute-bound and memory-bound ops to further speed up computation. oneDNN optimizations are available by default in official TensorFlow releases starting with version 2.9.
Below is the list of key building blocks that oneDNN optimizes:
- Matrix multiplication
- Batch normalization
- Activation functions
- Recurrent neural network (RNN) cells
- Long short-term memory (LSTM) cells
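As a small illustration of the default behavior described above, TensorFlow reads the TF_ENABLE_ONEDNN_OPTS environment variable at import time to decide whether the oneDNN optimizations are used. The sketch below shows one way to set it explicitly from Python. The helper name set_onednn_opts is an assumption made for this example and is not part of TensorFlow or Watson NLP.

```python
import os


def set_onednn_opts(enabled: bool) -> str:
    """Set TF_ENABLE_ONEDNN_OPTS (hypothetical helper, not a TensorFlow API).

    The variable is read when TensorFlow is imported, so this must be
    called before the first `import tensorflow`.
    """
    value = "1" if enabled else "0"
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = value
    return value


# Turn the oneDNN optimizations on explicitly.
set_onednn_opts(True)

# Import TensorFlow only after the variable has been set, for example:
# import tensorflow as tf
```

Setting the variable to "0" instead is a convenient way to produce a baseline run when comparing performance with and without the optimizations.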
IBM Watson NLP tasks
Watson NLP is organized around “NLP tasks,” which group algorithms by the schema of the responses they produce. An example is the “classification” task, which may be implemented by a wide variety of algorithms. All algorithm implementations for an NLP task adhere to a base set of APIs, so users can switch between algorithms without changing the consuming code.
Similarly, an algorithm can depend on another NLP task. For example, it may require structured input produced by another task rather than raw text. With this structure, algorithm implementations can easily call on any of the available options, because they all follow the base API and use data types that describe each task in an algorithm-agnostic way.
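The task structure described above can be sketched in a few lines of Python. This is an illustration only: every class and function name here (SyntaxResult, SyntaxBase, ClassificationBase, and so on) is hypothetical and does not reflect the actual Watson NLP API. It shows a classification task whose algorithms share a base API and depend on the structured output of a second task instead of raw text.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class SyntaxResult:
    """Algorithm-agnostic output of a hypothetical syntax task."""
    tokens: List[str]


@dataclass
class ClassificationPrediction:
    """Algorithm-agnostic output of a hypothetical classification task."""
    label: str
    confidence: float


class SyntaxBase(ABC):
    """Base API shared by every syntax algorithm."""

    @abstractmethod
    def run(self, text: str) -> SyntaxResult:
        ...


class ClassificationBase(ABC):
    """Base API shared by every classification algorithm.

    It consumes the structured output of the syntax task, not raw text.
    """

    @abstractmethod
    def run(self, syntax: SyntaxResult) -> List[ClassificationPrediction]:
        ...


class WhitespaceSyntax(SyntaxBase):
    """Toy syntax algorithm: lowercase the text and split on whitespace."""

    def run(self, text: str) -> SyntaxResult:
        return SyntaxResult(tokens=text.lower().split())


class KeywordClassifier(ClassificationBase):
    """Toy classification algorithm: looks for a single keyword token."""

    def __init__(self, keyword: str, label: str) -> None:
        self.keyword = keyword
        self.label = label

    def run(self, syntax: SyntaxResult) -> List[ClassificationPrediction]:
        hit = 1.0 if self.keyword in syntax.tokens else 0.0
        return [
            ClassificationPrediction(self.label, hit),
            ClassificationPrediction("other", 1.0 - hit),
        ]


def classify(text: str, syntax: SyntaxBase, model: ClassificationBase) -> str:
    """Consuming code depends only on the base APIs, never on an algorithm."""
    predictions = model.run(syntax.run(text))
    return max(predictions, key=lambda p: p.confidence).label
```

Because classify depends only on the two base classes, swapping WhitespaceSyntax or KeywordClassifier for a different implementation would not require any change to the calling code.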
When testing IBM Watson NLP with oneDNN optimizations, among a wide variety of NLP tasks, text and sentiment classification tasks showed the greatest improvements in both duration and function throughput.
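To make the two metrics concrete, the sketch below shows one simple way duration and function throughput could be measured and compared between a baseline run and an optimized run. It is not the benchmark harness used by the Watson NLP team, and the function names measure_throughput and relative_gain_percent are assumptions made for this example.

```python
import time
from typing import Any, Callable, Sequence


def measure_throughput(fn: Callable[[Any], Any],
                       inputs: Sequence[Any],
                       repeats: int = 3) -> float:
    """Return function calls per second, using the fastest of several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for item in inputs:
            fn(item)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero duration on very coarse timers.
    return len(inputs) / max(best, 1e-12)


def relative_gain_percent(baseline: float, optimized: float) -> float:
    """Percentage change in throughput relative to the baseline run."""
    return (optimized - baseline) / baseline * 100.0
```

In practice, fn would be the model's prediction call, and the two throughput values would come from separate processes run with the optimizations disabled and enabled.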
The classification workflow is composed of multiple classifiers, each using different features to independently classify text into two or more classes. The individual classifier predictions are combined through weighted averaging to form the final prediction. This ensemble approach is generally more robust to variation in dataset characteristics, such as average text length or the number of classes being considered. The workflow is common in use cases such as spam detection in email and automatic grouping of news articles by domain.
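The weighted averaging step can be sketched as follows. This is a minimal illustration, assuming each classifier returns a dictionary that maps class labels to scores. The function names and the dictionary representation are assumptions made for this example, not the Watson NLP implementation.

```python
from typing import Dict, Sequence


def weighted_average(predictions: Sequence[Dict[str, float]],
                     weights: Sequence[float]) -> Dict[str, float]:
    """Combine per-classifier label scores into one weighted-average score per label."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Collect every label seen by any classifier; a missing label scores 0.
    labels = sorted(set().union(*predictions))
    return {
        label: sum(w * p.get(label, 0.0) for p, w in zip(predictions, weights)) / total
        for label in labels
    }


def predict_label(predictions: Sequence[Dict[str, float]],
                  weights: Sequence[float]) -> str:
    """Return the label with the highest combined score."""
    combined = weighted_average(predictions, weights)
    return max(combined, key=combined.get)
```

For instance, predict_label could be called with the outputs of two classifiers and the weights [3.0, 1.0] so that the first classifier contributes three times as much to the final decision as the second.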
“When using Intel oneDNN TensorFlow optimizations, IBM Watson NLP exhibited an increase of up to 35% in function throughput for NLP tasks including text and sentiment classification, and embeddings,” said Laura Chiticariu, distinguished engineer and chief architect of IBM NLP. “This improvement will result in better performance for IBM products such as IBM Watson Natural Language Understanding, IBM Watson Discovery and IBM Watson Studio.”
Watson NLP is embedded in several IBM product offerings like IBM Watson Natural Language Understanding, IBM Watson Discovery and IBM Watson Studio, among others, and IBM clients will benefit from the increased performance with Intel optimizations.