Materials science literature-Patent relevance search: A heterogeneous network analysis approach
In recent decades, the volume of materials science literature and patents has grown exponentially. This growth has made it increasingly difficult to determine whether the literature is current, because there can be a gap between when a patent is filed and when it is approved. Moreover, the variety and volume of materials science data make it difficult to ensure that a patent cites the appropriate prior art, especially when that data resides in two separate sources with different curation mechanisms and purposes: publications and patents. The relational database schemas generally used to store publications also present challenges, because their strict tabular structure may not be appropriate for organizing and querying the highly interconnected information about materials in these publications and patents. For example, elements are chemically combined to form a compound, which can then be converted to other compounds via chemical reactions. Furthermore, relational databases are not designed to combine data from multiple sources in various formats, which makes discovering relevance between publications and patents difficult. To explore an alternative approach that represents materials data and combines data from multiple sources in the same repository, in this work we integrate data from the Open Quantum Materials Database (OQMD) and patent data from the United States Patent and Trademark Office (USPTO) database into a single network, which we name the heterogeneous materials information network (HMIN). We generalize prior work based on meta path-based topological features for exploring the network, and we propose features to identify network noise and to investigate relatedness between objects of different types, as our application requires. Using these features, we built several machine learning models to explore relevance between materials science publications and patents.
Experimental results show that HMIN can help researchers effectively discover related publications and patents originally kept in different sources. Our work offers the materials community a new way to represent materials data appropriately and to discover connections between data from multiple sources.
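To make the idea of meta path-based topological features concrete, the following is a minimal sketch assuming a toy network with three object types (Publication, Material, Patent) and a single meta path Publication-Material-Patent (P-M-T). It computes two common meta-path features for a (publication, patent) pair: the path count and a random-walk probability along the meta path. The data, names (`PUB_MATERIALS`, `PATENT_MATERIALS`, `features`, etc.), and the choice of these two features are hypothetical illustrations, not the exact features, schema, or data used in HMIN.

```python
from collections import defaultdict

# Hypothetical toy data (not from OQMD or USPTO): the materials each document mentions.
PUB_MATERIALS = {
    "pub1": {"Fe2O3", "TiO2"},
    "pub2": {"LiCoO2"},
    "pub3": {"TiO2", "ZnO"},
}
PATENT_MATERIALS = {
    "pat1": {"TiO2"},
    "pat2": {"LiCoO2", "ZnO"},
    "pat3": {"TiO2"},
}

# Inverse index: material -> patents that mention it (the Material -> Patent edges).
MATERIAL_PATENTS = defaultdict(set)
for patent, materials in PATENT_MATERIALS.items():
    for m in materials:
        MATERIAL_PATENTS[m].add(patent)


def path_count(pub: str, patent: str) -> int:
    """Number of Publication-Material-Patent path instances from pub to patent."""
    return len(PUB_MATERIALS.get(pub, set()) & PATENT_MATERIALS.get(patent, set()))


def random_walk_prob(pub: str, patent: str) -> float:
    """Probability that a uniform random walk along P-M-T starting at pub ends at patent."""
    materials = PUB_MATERIALS.get(pub, set())
    if not materials:
        return 0.0
    prob = 0.0
    for m in materials:
        patents = MATERIAL_PATENTS.get(m, set())
        if patent in patents:
            # Step 1: pick m uniformly among pub's materials;
            # step 2: pick a patent uniformly among the patents mentioning m.
            prob += (1 / len(materials)) * (1 / len(patents))
    return prob


def features(pub: str, patent: str) -> tuple:
    """Hypothetical feature vector for one (publication, patent) pair."""
    return (path_count(pub, patent), random_walk_prob(pub, patent))


if __name__ == "__main__":
    for patent in sorted(PATENT_MATERIALS):
        print("pub3", patent, features("pub3", patent))
```

For `pub3`, all three patents tie on path count (one shared material each), but the random-walk feature ranks `pat2` higher (0.5 versus 0.25), because `ZnO` is mentioned by only one patent while `TiO2` splits its probability between `pat1` and `pat3`. Vectors like these could then be fed to any standard classifier to predict whether a publication-patent pair is relevant.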