ACS Spring 2024

Integrating ADME-Tox properties of PFAS described in public and private knowledge bases using Wikidata


The Knowledge Integration Framework (KIF) constructs a unified view of, and a common query interface over, heterogeneous data from public and private knowledge bases. These include, for instance, Wikidata, PubChem, IBM CIRCA, and CSV files produced by predictive AI methods. Wikidata is a public repository of general knowledge (the structured counterpart of Wikipedia), PubChem is one of the largest collections of chemical data on the Web, and IBM CIRCA is a research platform that aggregates chemically annotated data from patents and other sources. We propose using KIF to integrate the ADME-Tox (absorption, distribution, metabolism, excretion, and toxicity) properties of per- and polyfluoroalkyl substances (PFAS) described in these heterogeneous knowledge bases. Wikidata and PubChem are accessed through their SPARQL interfaces, while IBM CIRCA and the CSV files are accessed through SQL interfaces. A distinguishing feature of KIF is that it adopts Wikidata's data model and vocabulary as its lingua franca. At query time, user-provided mappings translate the data model and vocabulary of each underlying base into those of Wikidata. The integrated result can therefore be accessed as if it were Wikidata itself, i.e., a single knowledge base that "speaks" the RDF dialect of Wikidata. In the translation process, we use the Wikidata data model's support for qualifiers to attach contextual information to statements. For example, an LD50 value is accompanied not only by its unit (mg/kg) but also by a description of the animal tested, the route of administration, etc. We likewise use the data model's support for references to track the provenance (origin) of each statement produced by the integration layer. We expect KIF to enable the construction of an integrated knowledge base of ADME-Tox properties of PFAS that adheres more closely to the FAIR (findable, accessible, interoperable, reusable) principles. In the talk, we discuss KIF, this particular instantiation, its applications, and future plans.
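
To make the mapping idea concrete, the following is a minimal Python sketch of how a row from a CSV source might be translated into a Wikidata-style statement carrying qualifiers and a reference. It is not KIF's actual API: the `Statement` class, the `MAPPING` dictionary, the column names, and the sample row are all hypothetical, the numeric value is a placeholder rather than a measured result, and plain-text labels stand in for Wikidata property identifiers.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical, simplified stand-in for a Wikidata-style statement."""
    subject: str
    property: str
    value: float
    unit: str
    qualifiers: dict = field(default_factory=dict)  # contextual information
    references: dict = field(default_factory=dict)  # provenance


# Hypothetical user-provided mapping: which source column fills which
# part of the statement. Column names are assumptions for illustration.
MAPPING = {
    "subject": "compound",
    "value": "ld50",
    "unit": "ld50_unit",
    "qualifiers": {
        "applies to taxon": "species",
        "route of administration": "route",
    },
}

# Placeholder data, not a real measurement.
SAMPLE_CSV = """compound,ld50,ld50_unit,species,route
example-compound,100,mg/kg,rat,oral
"""


def row_to_statement(row, source_name):
    """Translate one source row into a statement using MAPPING."""
    return Statement(
        subject=row[MAPPING["subject"]],
        property="median lethal dose (LD50)",
        value=float(row[MAPPING["value"]]),
        unit=row[MAPPING["unit"]],
        qualifiers={
            label: row[column]
            for label, column in MAPPING["qualifiers"].items()
        },
        # The reference records where the statement came from.
        references={"stated in": source_name},
    )


def load_statements(text, source_name):
    """Parse CSV text and translate every row into a statement."""
    reader = csv.DictReader(io.StringIO(text))
    return [row_to_statement(row, source_name) for row in reader]


if __name__ == "__main__":
    for statement in load_statements(SAMPLE_CSV, "predictions.csv"):
        print(statement)
```

In this sketch the qualifiers hold the experimental context (animal tested, route of administration) and the reference holds the source name, so a consumer of the integrated view could filter or audit statements by origin without knowing the layout of the underlying source.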