Abstract
Densely packed but structured scientific data are typically presented in the form of tables, which often appear as raster images. To interpret data from scanned tables, understanding their hierarchical structure is vital. To address the vast variability of table representations, we propose a fully automatic methodology based on bottom-up reasoning that is independent of the presence of representation features such as lines. We evaluate our approach on the ICDAR 2013 dataset and demonstrate its effectiveness in detecting table cells and their content and in classifying header and data cells. For detecting the cell hierarchy, we demonstrate results on synthetic data owing to the lack of ground truth.
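
To give a rough sense of what line-independent, bottom-up reasoning can look like, the following minimal sketch assigns text bounding boxes to rows and columns purely from their coordinate overlap, without relying on ruling lines. It is an illustrative assumption, not the methodology proposed here: the function names `group_by_overlap` and `infer_grid`, the `(x0, y0, x1, y1)` box format, and the overlap rule are all hypothetical, and the boxes are assumed to come from some earlier text-detection step.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in image coordinates.
Box = Tuple[float, float, float, float]


def group_by_overlap(boxes: List[Box], axis: int) -> List[List[int]]:
    """Group box indices whose extents overlap along one axis.

    axis=0 compares x-extents (yields columns); axis=1 compares
    y-extents (yields rows). Boxes are swept in order of their start
    coordinate and merged while they overlap the running group.
    """
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][axis])
    groups: List[List[int]] = []
    current_end = 0.0
    for i in order:
        start, end = boxes[i][axis], boxes[i][axis + 2]
        if groups and start < current_end:
            # Overlaps the running group: extend it.
            groups[-1].append(i)
            current_end = max(current_end, end)
        else:
            # Gap found: start a new row or column.
            groups.append([i])
            current_end = end
    return groups


def infer_grid(boxes: List[Box]) -> Dict[int, Tuple[int, int]]:
    """Map each box index to a (row, column) position."""
    rows = group_by_overlap(boxes, axis=1)
    cols = group_by_overlap(boxes, axis=0)
    row_of = {i: r for r, group in enumerate(rows) for i in group}
    col_of = {i: c for c, group in enumerate(cols) for i in group}
    return {i: (row_of[i], col_of[i]) for i in range(len(boxes))}


# Four text boxes laid out as a 2 x 2 table with no ruling lines.
boxes = [(0, 0, 40, 10), (60, 0, 100, 10), (0, 20, 30, 30), (60, 20, 90, 30)]
grid = infer_grid(boxes)
```

A sketch this simple would merge columns whenever a header cell spans several of them, which is one reason recovering the cell hierarchy calls for more careful reasoning than plain overlap grouping.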