
GTE — Global Table Extraction

Overview

Documents are often the format of choice for knowledge sharing and preservation in business and science, and much of their critical data is captured in tables. Unfortunately, most documents are stored and distributed as PDFs or scanned images, which fail to preserve table formatting. GTE is a state-of-the-art framework for extracting table borders and table structure by integrating domain knowledge of tables with a deep learning architecture.

To see an example of GTE in action, check out how our table extraction system is helping researchers discover more insights from the coronavirus literature in the CORD-19 dataset.

Architecture

gte-arch.png

Sample Results

Table Border Detection

gte-2a1.png
gte-2a2.png
gte-2a3.png
gte-2a4.png

Table Structure Generation

gte-2b.png