Defining and capturing the competitor relationship across financial datasets

Min Li; Rajasekar Krishnamurthy; Douglas Burdick; Lucian Popa

doi:10.1145/3220547.3220556

DSMM 2018

Conference paper

15 Jun 2018

Abstract

The 2018 FEIII Data Challenge aims to enhance a given knowledge graph by validating and enriching the set of competitor edges in the graph using multiple datasets. Upon investigating the data, we find that some of the competitor edges given as training data are inconsistent (e.g., they conflict with other relationships such as parent/subsidiary). Rather than using a machine learning approach that would have to address such difficulties and other ambiguities in the training data, we start by formulating two natural, semantic definitions of a competitor relationship. The first is a weak definition that is independent of the training data: it identifies two entities as potential competitors whenever they are in the same industry and geographical location, provided that there is no negative evidence (such as the two entities being in the same family of companies). The second is a strong definition that intersects the pairs of entities obtained from the weak definition with the competitors given in the training dataset. Together, these two definitions provide a framework that can be extended to use other attributes or additional information when available. One such extension that we can implement right away with the available data is to lift the competitor relationships from subsidiaries to their respective parent companies. We use a high-level language (HIL) for entity linking to express and implement our two semantic definitions as well as the parent lifting extension. The resulting HIL algorithms are readable and can be easily extended or modified by a domain expert. We show that our submission achieves 19.6% precision, 40.3% recall, and a 26.4% F1 score, and we make the case that these results can be further improved with the availability of more data and more analytics.
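
The weak definition, the strong definition, and the parent lifting extension described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' HIL implementation. The `Company` record, its field names, and the helper functions are all hypothetical, and two choices are assumptions rather than facts from the paper: "same family" is taken to mean sharing the same top-level parent, and lifting adds the parent-level pair to the set while keeping the original pair.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set

Pair = FrozenSet[str]  # unordered pair of company names


@dataclass(frozen=True)
class Company:
    # Hypothetical record; the real datasets carry many more attributes.
    name: str
    industry: str
    location: str
    parent: Optional[str] = None  # name of the direct parent, if any


def family_root(name: str, companies: Dict[str, Company]) -> str:
    """Follow parent links upward to the top-level company (assumed family id)."""
    seen = {name}
    parent = companies[name].parent
    while parent is not None and parent in companies and parent not in seen:
        name = parent
        seen.add(name)
        parent = companies[name].parent
    return name


def weak_competitors(companies: Dict[str, Company]) -> Set[Pair]:
    """Same industry and location, with no negative evidence (same family)."""
    pairs: Set[Pair] = set()
    for a, b in combinations(companies.values(), 2):
        same_market = a.industry == b.industry and a.location == b.location
        same_family = family_root(a.name, companies) == family_root(b.name, companies)
        if same_market and not same_family:
            pairs.add(frozenset({a.name, b.name}))
    return pairs


def strong_competitors(weak: Set[Pair], training: Set[Pair]) -> Set[Pair]:
    """Weak pairs that also appear as competitors in the training data."""
    return weak & training


def lift_to_parents(pairs: Set[Pair], companies: Dict[str, Company]) -> Set[Pair]:
    """Add, for each pair, the pair formed by the two top-level parents."""
    lifted = set(pairs)
    for pair in pairs:
        roots = frozenset(family_root(n, companies) for n in pair)
        if len(roots) == 2:  # skip pairs that collapse into one family
            lifted.add(roots)
    return lifted
```

In this sketch, a training edge between a parent and its own subsidiary never survives the strong definition, because the weak definition has already excluded it as negative evidence. That mirrors how the abstract handles inconsistent training edges without learning from them.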
