Distributed Training of Knowledge Graph Embedding Models using Ray

Nasrullah Sheikh; Xiao Qin; Yaniv Gur; Berthold Reinwald

EDBT 2022

Conference paper

29 Mar 2022

Abstract

Knowledge graphs are at the core of numerous consumer and enterprise applications, where learned graph embeddings are used to derive insights for the users of these applications. Because knowledge graphs can be very large, learning embeddings is time- and resource-intensive and needs to be done in a distributed manner to leverage the compute resources of multiple machines. These applications therefore demand performance and scalability at both the development and deployment stages, and require the models to be developed and deployed in frameworks that address these requirements. Ray (https://docs.ray.io/en/master/index.html) is an example of such a framework: it offers ease of both development and deployment, and enables running tasks in a distributed manner using simple APIs. In this work, we use Ray to build an end-to-end system for data preprocessing and distributed training of graph neural network based knowledge graph embedding models. We apply our system to the link prediction task, i.e., using knowledge graph embeddings to discover links between nodes in graphs. We evaluate our system on a real-world industrial dataset and demonstrate significant speedups in both distributed data preprocessing and distributed model training. Compared to non-distributed learning, we achieved a training speedup of 12× with 4 Ray workers without any deterioration in the evaluation metrics.
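To make the link prediction task concrete, the following is a minimal sketch of how embeddings can be used to rank candidate links. It is an illustration only: the abstract does not state which scoring function the system uses, so the DistMult-style score here is an assumption, and the entity names, relation name, and embedding values are hypothetical, hand-picked numbers rather than learned ones.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def distmult_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """DistMult-style score (assumed here): sum over i of h_i * r_i * t_i."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def rank_tails(
    head: str,
    relation: str,
    entity_emb: Dict[str, Vector],
    relation_emb: Dict[str, Vector],
) -> List[Tuple[str, float]]:
    """Score every candidate tail for (head, relation, ?) and sort best-first."""
    h = entity_emb[head]
    r = relation_emb[relation]
    scored = [
        (name, distmult_score(h, r, t))
        for name, t in entity_emb.items()
        if name != head  # skip the trivial self-link
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-dimensional embeddings, chosen by hand for illustration.
entity_emb = {
    "alice": [1.0, 0.0],
    "acme": [0.9, 0.1],
    "paris": [0.0, 1.0],
}
relation_emb = {"works_at": [1.0, 0.2]}

ranking = rank_tails("alice", "works_at", entity_emb, relation_emb)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy setup the highest-scoring candidate for ("alice", "works_at", ?) is "acme". In a real system the embeddings would come from the trained model, and the ranking would be evaluated with link prediction metrics over held-out edges.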