Loops: Leveraging Provenance and Visualization to Support Exploratory Data Analysis in Notebooks

Abstract

Exploratory data science work is often described as an iterative process with cycles of obtaining, cleaning, profiling, analyzing, and interpreting data. These cycles create challenges within the linear structure of computational notebooks, leading to code quality, recall, and reproducibility issues. We present Loops, a set of visual support techniques for iterative and exploratory data analysis in computational notebooks. Loops leverages provenance information to provide direct feedback on the impact of changes made within the notebook. Through compact visual representations, we trace the evolution of the notebook over time, highlighting differences between versions. Detail views allow users to compare the cell content and output. Loops is compatible with various types of content present in notebooks, such as code, markdown, data, visualizations, or images. Loops not only improves the reproducibility of notebooks, but also supports analysts during their data science work by showing the effects resulting from changes and facilitating the comparison of multiple versions. We demonstrate our approach's utility and potential impact through two use cases and feedback from notebook users spanning various backgrounds.


Citation

BibTeX

@article{2024_loops,
    title = {Loops: Leveraging Provenance and Visualization to Support Exploratory Data Analysis in Notebooks},
    author = {Klaus Eckelt and Kiran Gadhave and Alexander Lex and Marc Streit},
    journal = {OSF Preprint},
    doi = {10.31219/osf.io/79eyn},
    url = {https://doi.org/10.31219/osf.io/79eyn},
    year = {2023}
}