1000x faster ingestion*
Deep Search: Connecting unstructured data
*Deep Search can ingest 20 pages per second, whereas a typical human expert takes 1–2 minutes per page just to read.
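The headline figure can be checked with a quick calculation. The sketch below uses only the two rates quoted in the footnote; the function name `speedup` is illustrative and not part of Deep Search.

```python
def speedup(machine_pages_per_second, human_seconds_per_page):
    """Pages the machine ingests in the time a human needs to read one page."""
    return machine_pages_per_second * human_seconds_per_page

# Rates quoted in the footnote: 20 pages per second versus 1-2 minutes per page.
low = speedup(20, 60)    # human reads one page per minute
high = speedup(20, 120)  # human reads one page every two minutes
print(f"{low}x to {high}x")
```

Both ends of the range sit above the rounded 1000x headline.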
Deep Search uses natural language processing to ingest and analyze massive amounts of data—structured and unstructured. Researchers can then extract, explore, and make connections faster than ever.
Solutions that once took months to find are now being discovered in a matter of days.
How it works
Deep Search imports and analyzes data from public, private, structured, and unstructured sources. AI then breaks down the data and classifies it into fundamental parts.
The Deep Search process starts with unstructured data such as journal articles, patents, or technical reports. Whether this data comes from public or proprietary sources, businesses can use both securely through our hybrid cloud.
After reviewing unstructured data, the user annotates a few documents to create an AI model. The model then classifies all documents into their fundamental parts. By using this AI model and NLP (Natural Language Processing), Deep Search is able to ingest and understand large collections of documents and unstructured data at scale, automatically extracting semantic units and their relationships.
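The idea of learning from a few annotated examples and then labeling everything else can be illustrated with a toy classifier. This is a minimal sketch only: the two features (relative font size and vertical position on the page), the labels, and the nearest-centroid method are all assumptions chosen for illustration, and they do not describe the model Deep Search actually trains.

```python
from collections import defaultdict
from math import dist

# Hypothetical hand-annotated segments:
# (relative font size, vertical position from 0 = top to 1 = bottom), label
annotated = [
    ((2.0, 0.05), "title"),
    ((1.9, 0.08), "title"),
    ((1.0, 0.40), "paragraph"),
    ((1.0, 0.65), "paragraph"),
    ((0.8, 0.90), "caption"),
    ((0.8, 0.85), "caption"),
]

def train(examples):
    """Average the feature vectors of each label into one centroid per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(axis) / len(axis) for axis in zip(*vectors))
        for label, vectors in grouped.items()
    }

def classify(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: dist(model[label], features))

model = train(annotated)

# Segments the user never annotated are labeled automatically.
unlabeled = [(1.8, 0.10), (1.0, 0.50), (0.8, 0.95)]
predictions = [classify(model, segment) for segment in unlabeled]
print(predictions)
```

In this sketch, correcting a wrong prediction and adding it to `annotated` before retraining plays the role of the refinement step.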
Once the data has been consolidated and extracted, Deep Search organizes and structures it into a searchable knowledge graph—enabling users to robustly explore information extracted from tens of thousands of documents without having to read a single paper.
Figure D1.
Ingesting the information
01
Unstructured documents like PDFs of articles and patents are uploaded. Text, bitmap images and line paths are then parsed.
02
If there is no model available, a new one can easily be created and trained. To train a model, documents are categorized by layout.
Next, sections from a sample of unique pages are annotated with semantic labels. The model is then applied to the rest of the document, automatically annotating each page.
03
The predictive annotations are inspected and corrected to improve the model's performance. Once refined, the model is ready to be applied to other documents.
The remaining documents are parsed, labeled, and assembled into a JSON file that contains both the content and structure of the originals.
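To make the final step concrete, here is a minimal sketch of what a JSON file holding both content and structure might look like. The field names (`name`, `segments`, `index`, `page`, `label`, `text`) and the sample text are hypothetical; the actual Deep Search schema is not described here.

```python
import json

def assemble_document(name, labeled_segments):
    """Build a JSON-serializable record that keeps reading order, labels, and text."""
    return {
        "name": name,
        "segments": [
            {"index": i, "page": page, "label": label, "text": text}
            for i, (page, label, text) in enumerate(labeled_segments)
        ],
    }

# Hypothetical output of the parsing and labeling steps for one document.
segments = [
    (1, "title", "A study of example alloys"),
    (1, "paragraph", "We report measurements on several samples."),
    (2, "caption", "Figure 1. Measured hardness."),
]

record = assemble_document("example.pdf", segments)
as_json = json.dumps(record, indent=2)
print(as_json)
```

Keeping the label and page number next to each piece of text is what lets later steps tell a caption from a paragraph without going back to the PDF.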
Figure D2.
Constructing the knowledge graph
01
NLP components are run on JSON files, linking entities and extracting relationships.
Information is no longer contained within a document, but part of a larger ecosystem created from many documents.
02
Extracted information is combined with other sources such as private or public databases to form a searchable knowledge graph.
03
When queried, the system is able to form links between different nodes within the knowledge graph.
It understands that the same material may be written in different ways, as abbreviations or formulas for example, and makes accurate connections.
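The three steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the Deep Search implementation: the `KnowledgeGraph` class, the alias table, the relation names, and the sample triples are all hypothetical, and a real system would learn or look up aliases rather than hard-code them.

```python
from collections import defaultdict

# Hypothetical alias table: names, abbreviations, and formulas map to one node.
ALIASES = {
    "polyethylene terephthalate": "PET",
    "pet": "PET",
    "(c10h8o4)n": "PET",
    "titanium dioxide": "TiO2",
    "tio2": "TiO2",
    "titania": "TiO2",
}

def canonical(name):
    """Map a surface form to its canonical node name, or keep it unchanged."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)

class KnowledgeGraph:
    def __init__(self):
        # node -> set of (relation, other node, source)
        self.edges = defaultdict(set)

    def add(self, subject, relation, obj, source):
        """Store a link in both directions so either node can start a query."""
        s, o = canonical(subject), canonical(obj)
        self.edges[s].add((relation, o, source))
        self.edges[o].add((relation, s, source))

    def neighbors(self, name):
        """Return every link touching the node that `name` resolves to."""
        return sorted(self.edges.get(canonical(name), set()))

graph = KnowledgeGraph()
# Relationships as they might be extracted from two JSON files.
graph.add("Titanium dioxide", "used_as", "photocatalyst", source="doc-1.json")
graph.add("TiO2", "coated_on", "polyethylene terephthalate", source="doc-2.json")
# A record merged in from a hypothetical external database.
graph.add("titania", "has_property", "band gap", source="database")

for relation, other, source in graph.neighbors("TiO2"):
    print(relation, other, source)
```

Because every name is normalized before it is stored or looked up, querying for "titania", "TiO2", or "Titanium dioxide" returns the same three links, drawn from two documents and one database.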
By accelerating the intake, organization, and understanding of massive amounts of data, Deep Search empowers researchers to grasp previously daunting bodies of information in a fraction of the time.
And, with knowledge graphs, they’re finding new ways to explore constellations of information—making the connections clearer and easier to identify.