What Makes a Well-Documented Notebook? A Case Study of Data Scientists’ Documentation Practices in Kaggle

April Yi Wang; Dakuo Wang; Jaimie Drozdal; Xuye Liu; Soya Park; Steve Oney; Christopher Brooks

doi:10.1145/3411763.3451617

CHI EA 2021

Conference paper

08 May 2021

Abstract

Many data scientists use computational notebooks to test and present their work, because a notebook can weave code and documentation together (a computational narrative) and supports rapid iteration on code experiments. However, writing good documentation in a data science notebook is not easy, partly because there is no corpus of well-documented notebooks that data scientists can follow as exemplars. To address this challenge, this work examines Kaggle, a large online community where data scientists host and participate in machine learning competitions, and treats highly voted Kaggle notebooks as a proxy for well-documented notebooks. Through a qualitative analysis at both the notebook level and the markdown-cell level, we find that these notebooks are indeed well documented with reference to previous literature. Our analysis also reveals nine categories of content that data scientists write in their documentation cells, and shows that these documentation cells often relate to different stages of the data science lifecycle. We conclude the paper with design implications and future research directions.
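The abstract distinguishes between analysis at the notebook level and at the markdown-cell level. As a rough illustration of what separating documentation from code looks like in practice, the sketch below reads a notebook in the Jupyter nbformat v4 JSON layout (a top-level `cells` list whose entries carry a `cell_type` and a `source`) and splits it into markdown and code cells. It also computes a simple markdown-cell share per notebook. The helper names `split_cells` and `markdown_ratio` are hypothetical, and this is not the authors' analysis procedure, which the abstract describes as qualitative.

```python
import json


def split_cells(notebook_json: str) -> tuple[list[str], list[str]]:
    """Split an nbformat v4 notebook (given as a JSON string) into
    markdown cell sources and code cell sources (hypothetical helper)."""
    nb = json.loads(notebook_json)
    markdown: list[str] = []
    code: list[str] = []
    for cell in nb.get("cells", []):
        # nbformat allows "source" to be a single string or a list of lines.
        src = cell.get("source", "")
        if isinstance(src, list):
            src = "".join(src)
        kind = cell.get("cell_type")
        if kind == "markdown":
            markdown.append(src)
        elif kind == "code":
            code.append(src)
        # Other cell types (e.g. "raw") are ignored in this sketch.
    return markdown, code


def markdown_ratio(notebook_json: str) -> float:
    """Fraction of markdown cells among markdown + code cells (0.0 if none)."""
    markdown, code = split_cells(notebook_json)
    total = len(markdown) + len(code)
    return len(markdown) / total if total else 0.0


if __name__ == "__main__":
    example = json.dumps({
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Intro"]},
            {"cell_type": "code", "metadata": {}, "source": "x = 1",
             "outputs": [], "execution_count": None},
            {"cell_type": "markdown", "metadata": {}, "source": "Notes"},
        ],
    })
    md, code = split_cells(example)
    print(len(md), len(code), round(markdown_ratio(example), 2))
```

A cell-count share like this is only a coarse structural signal. Judging what a markdown cell actually says, such as which of the nine content categories it belongs to, still requires reading it, which is the kind of qualitative analysis the abstract describes.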
