Foundation Models for Conversation


Grounded Dialog Response Generation

A key problem with deep learning systems built on pre-trained large language models (LLMs) is hallucination: they can produce factually incorrect or inconsistent information. To overcome this problem, we are exploring ways to ground dialog response generation in trustworthy enterprise content. In addition to using decision trees, flow charts, and structured data for grounding dialog responses, we are investigating how LLMs can generate responses from approved unstructured documents.

A major concern when using LLMs as conversational agents is ensuring that generated responses are faithful to the contents of the documents supplied in the prompt. We are exploring new paradigms for using LLMs in which we first predict the evidence necessary to generate a response and then check whether the generated response is faithful to that evidence, as illustrated in the accompanying figure.
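The evidence-then-verify flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in our work: the function names (`select_evidence`, `faithfulness`, `grounded_reply`), the word-overlap scoring, and the `threshold` value are all hypothetical stand-ins for what would in practice be LLM-based or learned components. The `generate` argument is any callable that maps a question and its evidence to a response.

```python
import re

def tokenize(text):
    """Lowercase alphanumeric tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def select_evidence(question, passages):
    """Step 1: predict the evidence. Here we simply pick the passage sharing
    the most words with the question (assumption: a real system would use an
    LLM or a trained retriever)."""
    q = set(tokenize(question))
    return max(passages, key=lambda p: len(q & set(tokenize(p))))

def faithfulness(response, evidence):
    """Step 3: fraction of response tokens that also occur in the evidence.
    Assumption: a toy proxy for a learned faithfulness checker."""
    r = tokenize(response)
    if not r:
        return 0.0
    e = set(tokenize(evidence))
    return sum(t in e for t in r) / len(r)

def grounded_reply(question, passages, generate, threshold=0.8):
    """Predict evidence, generate a response from it, and withhold the
    response if it is not sufficiently supported by the evidence."""
    evidence = select_evidence(question, passages)
    response = generate(question, evidence)  # Step 2: generation
    score = faithfulness(response, evidence)
    if score < threshold:
        response = None  # caller can fall back, e.g. to a human agent
    return {"response": response, "evidence": evidence, "score": score}
```

Separating evidence prediction from generation means the faithfulness check has a small, explicit target to compare against, rather than the full set of documents in the prompt.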

Foundation Model for Digital Interaction Data (Structured/Unstructured)

Most customer pain points in the digital interaction space fall into two main buckets:

1. New insights are expensive to obtain

Answering a new question or supporting a new use case takes significant effort.

i. Current models struggle on unseen use cases.

ii. Some use cases are beyond current models altogether (for example, there is no way to communicate, in human-readable natural language, what actually happened with a user based on digital interaction data).

2. Efficiency

Creating a new model requires a large amount of readily available training data, maintaining many models is expensive, and incorporating a new model takes effort.

We hypothesize that both problems can be addressed if the large volume of data exhaust generated by digital interactions is used to its full potential. To that end, we are exploring ways to build foundation models for digital interaction data by solving the necessary research challenges, including modelling interaction data, identifying the right pre-training objectives, and ensuring cost effectiveness and efficiency. This is a very new research direction in the world of digital interactions, with the ambitious goal of supporting multiple downstream tasks, often new and unseen ones, with the same foundation model and minimal tuning. Examples of downstream tasks include predicting a user’s next action, analyzing a user group’s activity footprint to match a product with its target user group, and initiating a personalized dialog flow with a customer.
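To make the first downstream task concrete, the sketch below frames next-action prediction over interaction sessions as a sequence problem. It is deliberately not a foundation model: it is a first-order count baseline, shown only to illustrate the input and output shape of the task. The function names and the event names in the usage example (`login`, `search`, and so on) are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Count how often each action is immediately followed by each other
    action across all sessions (a first-order transition table)."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return counts

def predict_next(counts, action):
    """Return the most frequent follower of `action`, or None if the
    action was never seen with a successor."""
    followers = counts.get(action)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical sessions, each an ordered list of user actions.
sessions = [
    ["login", "search", "view_item", "add_to_cart"],
    ["login", "search", "view_item", "logout"],
    ["login", "view_item", "add_to_cart", "checkout"],
]
counts = fit_transitions(sessions)
print(predict_next(counts, "view_item"))
```

A baseline like this needs a separate table per task and cannot generalize to unseen actions, which is the kind of limitation a single pre-trained model with minimal tuning is meant to address.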