Publication
AAAI-SS 2017
Conference paper

Machine representation of data analyses: Towards a platform for collaborative data science

Abstract

Artificial intelligence and data science play an increasingly important role in solving today's scientific and social challenges. To be successful, the data-driven approach to social good requires effective collaboration between data scientists, subject-matter experts, policymakers, and other stakeholders. We envision a cloud platform for data science that would facilitate collaboration between stakeholders and possess AI capabilities for discovering, benchmarking, and organizing data analyses. Here we present a foundational technology motivated by this vision. Our system automatically extracts a high-level dataflow graph from a data analysis. The graph describes how data flows through an analysis pipeline, including which statistical methods are used and how they fit together. The system requires no special annotations from the data analyst and consumes analyses written in Python using standard tools, such as Scikit-learn and StatsModels. In this paper, we explain how our system works and how it fits into our larger vision for a collaborative data science platform.
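
To make the idea of a dataflow graph concrete, here is a minimal sketch, not the system described in the paper, whose extraction method the abstract does not specify. It assumes a purely syntactic approach using Python's standard `ast` module (Python 3.9+ for `ast.unparse`) and handles only top-level assignments of the form `target = call(...)`. The names `extract_dataflow`, `SCRIPT`, and `STEPS` are hypothetical, as is the file name `iris.csv`.

```python
import ast

# A small analysis script, held as text. It is only parsed, never executed,
# so pandas and scikit-learn do not need to be installed for this sketch.
SCRIPT = """
import pandas as pd
from sklearn.cluster import KMeans

data = pd.read_csv("iris.csv")
model = KMeans(n_clusters=3)
labels = model.fit_predict(data)
"""


def extract_dataflow(source):
    """Return a list of (inputs, operation, outputs) steps.

    Hypothetical helper: a crude static approximation that only looks at
    top-level `target = call(...)` statements.
    """
    steps = []
    defined = set()  # variables produced by earlier steps
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)):
            continue
        call = node.value
        # Names read anywhere in the call, restricted to earlier outputs,
        # so imported modules such as `pd` are not treated as data.
        read = {n.id for n in ast.walk(call) if isinstance(n, ast.Name)}
        inputs = tuple(sorted(read & defined))
        outputs = tuple(
            n.id
            for target in node.targets
            for n in ast.walk(target)
            if isinstance(n, ast.Name)
        )
        steps.append((inputs, ast.unparse(call.func), outputs))
        defined.update(outputs)
    return steps


STEPS = extract_dataflow(SCRIPT)
for inputs, operation, outputs in STEPS:
    print(inputs, "->", operation, "->", outputs)
```

Each step records which earlier variables feed an operation and which variables it produces, which is the skeleton of a dataflow graph. A system like the one described would also need to recognize what each call means statistically (for example, that `KMeans` is a clustering method), which this sketch does not attempt.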
