Technical note

AI is making it easier than ever to extract key information from reports

Reams of valuable data shared within companies around the world are locked up in PDF documents. Reports often contain tables, charts, and infographics that hold key information. These are easy for people to read, but it is difficult to copy information out of them and use it in another context. A new AI tool designed by IBM Research aims to make it far simpler to glean insights from reports.

Deep Search is a technology that uses AI to collect the data in large documents and convert it into something searchable, and it is now being integrated with IBM’s enterprise studio for AI builders to train, validate, tune and deploy AI models at scale. A new feature of Deep Search, called Document Question Answering (DocQA), was presented at the 2024 AAAI conference this week. The system lets users upload their own documents and query them through a conversational assistant.

One of the most promising applications of Deep Search's technology is in the domain of environmental, social, and governance (ESG) reports. These reports have become increasingly important in recent years, being one of the tools companies use to highlight their efforts in climate sustainability and social responsibility. They are essential for investors, regulators, and other stakeholders to assess a company's impact. However, extracting accurate and relevant information from these reports can be a time-consuming and labor-intensive process. 

Searching through an ESG report with a conversational assistant.

Deep Search's new feature, DocQA, is designed to automate this process, allowing users to ask questions about the content of ESG reports and receive accurate answers. The system uses retrieval-augmented generation, which combines information retrieval with natural language generation and grounds each answer in the exact paragraph or table from which it was generated. Crucially, this approach helps the system limit hallucinations and provide answers based on the actual content of the document.
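
To make the retrieval-augmented generation idea concrete, here is a minimal Python sketch. It is not the DocQA implementation: the passages, the source locators, the word-overlap scoring, and all function names are assumptions made for illustration, and a real system would use a stronger retriever and pass the prompt to a language model. The sketch only shows the grounding step, in which the answer is tied to a specific paragraph or table.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical locator, e.g. "page 9, table 1"
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: Passage) -> int:
    """Toy relevance score: number of tokens shared by query and passage."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage.text))
    return sum(min(q[t], p[t]) for t in q)


def retrieve(query: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    """Return up to k passages with a non-zero score, best first."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked[:k] if score(query, p) > 0]


def build_prompt(query: str, hits: list[Passage]) -> str:
    """Build a prompt that limits the generator to the retrieved passages."""
    context = "\n".join(
        f"[{i + 1}] ({p.source}) {p.text}" for i, p in enumerate(hits)
    )
    return (
        "Answer using only the numbered passages below "
        "and cite the passage number.\n"
        f"{context}\n"
        f"Question: {query}"
    )


# Invented example content, not taken from any real report.
passages = [
    Passage("page 4, paragraph 2",
            "The company reduced scope 1 emissions by 12 percent in 2023."),
    Passage("page 9, table 1",
            "Water usage 2023: 4.1 million cubic meters."),
    Passage("page 15, paragraph 1",
            "The board has nine members, four of whom are women."),
]

question = "How much did scope 1 emissions fall?"
hits = retrieve(question, passages, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Because every retrieved passage carries its source locator, the final answer can point back to the paragraph or table it came from, which is what makes it checkable.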

There aren’t many ways to accurately extract information from tables at scale right now, yet doing so is crucial for many kinds of documents, including financial reports, annual reports, and ESG reports. Deep Search's DocQA system addresses this gap by using a state-of-the-art multimodal AI model to convert and understand PDF documents. This enables the system to extract information from tables as well as text, giving users a more complete picture of the document's content.

Deep Search's library currently consists of over 17,000 ESG reports, making it a valuable resource for the ESG community. Users can explore this library to gain insights into various industries and companies, helping them make informed decisions based on accurate and relevant data. The system's ability to extract information from both text and tables also makes it useful for researchers, analysts, and investors who need to analyze large amounts of data quickly and efficiently.
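
As a rough illustration of why extracted tables matter for question answering, the Python sketch below turns the rows of an already-extracted table into standalone text passages that a search index could retrieve. It does not show how Deep Search converts PDFs, which relies on a multimodal model. The function name, the table title, and every value in the table are invented for this example.

```python
def table_to_passages(title: str, header: list[str],
                      rows: list[list[str]]) -> list[str]:
    """Turn each table row into one 'column: value' passage.

    Repeating the table title and the column names in every passage keeps
    each row understandable when it is retrieved on its own.
    """
    passages = []
    for row in rows:
        cells = ", ".join(f"{col}: {val}" for col, val in zip(header, row))
        passages.append(f"{title}. {cells}")
    return passages


# Invented table, standing in for the output of a PDF conversion step.
header = ["Metric", "2022", "2023"]
rows = [
    ["Scope 1 emissions (tCO2e)", "1,200", "1,050"],
    ["Water use (m3)", "8,400", "7,900"],
]

table_passages = table_to_passages("Environmental indicators", header, rows)
for passage in table_passages:
    print(passage)
```

Each row becomes a self-describing sentence, so a question about a single metric can be matched to the row that holds the answer.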

Deep Search's work in the field of ESG reports has been recognized by the scientific community. At AAAI this year, the public will be able to try out the system live to see just how simple it can be to find the information they’re after.