Notably Inaccessible – Data Driven Understanding of Data Science Notebook (In)Accessibility

Potluri, Venkatesh; Singanamalla, Sudheesh; Tieanklin, Nussara; Mankoff, Jennifer

doi:10.5281/zenodo.8185050

Published July 25, 2023 | Version 1.0.0

Dataset Open

1. University of Washington

Contributors

Project leaders:

Project member:

Tieanklin, Nussara¹

Supervisor:

Mankoff, Jennifer¹

1. University of Washington

Overview

This dataset artifact contains the intermediate datasets from pipeline executions necessary to reproduce the results of the paper.
We share this artifact in hopes of providing a starting point for other researchers to extend the analysis on notebooks, discover more about their accessibility, and offer solutions to make data science more accessible. The scripts needed to generate these datasets and analyse them are shared in the GitHub repository for this work.

The dataset contains large files (approximately 60 GB when extracted), so please exercise caution when extracting the data from the compressed files.

Some files in the dataset take a significant amount of script run time to generate or reproduce.
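Given the size warning above, it may help to check free disk space before extracting an archive. The following is a minimal sketch, not part of the authors' scripts: the function names `uncompressed_size` and `extract_if_space` and the `headroom` factor are hypothetical and introduced only for illustration.

```python
import shutil
import zipfile
from pathlib import Path


def uncompressed_size(zip_path):
    """Return the total uncompressed size, in bytes, of all members of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def extract_if_space(zip_path, dest_dir, headroom=1.1):
    """Extract zip_path into dest_dir only if enough free space is available.

    `headroom` is an assumed safety factor applied to the uncompressed size.
    Returns True if the archive was extracted, False if extraction was skipped.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    needed = int(uncompressed_size(zip_path) * headroom)
    free = shutil.disk_usage(dest).free
    if free < needed:
        print(f"Skipping: need about {needed / 1e9:.1f} GB, only {free / 1e9:.1f} GB free")
        return False
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return True
```

For example, `extract_if_space("a11y-scan-dataset.zip", "data/")` would extract the archive only when the target volume has room for its uncompressed contents; the `data/` destination is an assumption.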

Dataset Contents

We briefly summarize the included files in our dataset. Please refer to the documentation for specific information about the structure of the data in these files, the scripts to generate them, and runtimes for various parts of our data processing pipeline.

epoch_9_loss_0.04706_testAcc_0.96867_X_resnext101_docSeg.pth: We share this model file, originally provided by Jobin et al., to enable the classification of figures found in our dataset. Please place this into the `model/` directory.
model-results.csv: This file contains results from the classification performed on the figures found in the notebooks in our dataset.

Performing this classification may take up to a day.
a11y-scan-dataset.zip: This archive contains two files and expands to approximately 60 GB when extracted. Please ensure that you have sufficient disk space to uncompress this zip archive. The archive contains:
- a11y/a11y-detailed-result.csv: This dataset contains the accessibility scan results from the scans run on the 100k notebooks across themes.
  
  The detailed result file is very large (> 60 GB) and can be time-consuming to construct.
- a11y/a11y-aggregate-scan.csv: This file is an aggregate of the detailed result that contains the number of each type of error found in each notebook.
  
  This file is also shared outside the compressed directory.
errors-different-counts-a11y-analyze-errors-summary.csv: This file contains the counts of errors that occur in notebooks across different themes.
nb_processed_cell_html.csv: This file contains metadata corresponding to each cell extracted from the HTML exports of our notebooks.
nb_first_interactive_cell.csv: This file contains the necessary metadata to compute the first interactive element, as defined in our paper, in each notebook.
nb_processed.csv: This file contains the data obtained after processing the notebooks to extract the number of images, imports, languages, and cell-level information.
processed_function_calls.csv: This file contains information about the notebooks and the imports and function calls used within them.
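Because the detailed scan result exceeds 60 GB, loading it into memory in one piece is unlikely to work on most machines. The sketch below streams a CSV row by row and counts errors per notebook and error type, which is similar in spirit to how the aggregate file relates to the detailed one. It is not the authors' pipeline, and the column names `notebook` and `error_type` are hypothetical; please check the documentation for the actual schema.

```python
import csv
from collections import Counter


def count_errors(csv_path, notebook_col="notebook", error_col="error_type"):
    """Stream a CSV and count rows per (notebook, error type) pair.

    The default column names are assumptions for illustration only.
    Only one row is held in memory at a time, plus the running counts.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[(row[notebook_col], row[error_col])] += 1
    return counts
```

Passing the real column names through `notebook_col` and `error_col` lets the same function run against `a11y/a11y-detailed-result.csv` without code changes.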

Files

Notably Inaccessible.zip

Files (1.8 GB)

Name	Size
Notably Inaccessible.zip (md5:6c74dc3b43226ffae8007b9fd2760d3a)	1.8 GB
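The MD5 checksum listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `verify_download` are hypothetical, and the file is read in chunks so the 1.8 GB archive does not need to fit in memory.

```python
import hashlib

# Checksum listed on this page for Notably Inaccessible.zip
EXPECTED_MD5 = "6c74dc3b43226ffae8007b9fd2760d3a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("Notably Inaccessible.zip")` should return `True` for an intact copy of the archive, assuming it was saved under that name in the current directory.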

Additional details

Related works

Is part of: Conference paper: 10.1145/3597638.3608417 (DOI)

Funding

U.S. National Science Foundation
Using Passive Sensing to Assess the Impact of Real-Time Discrimination against Women and Underrepresented Minorities in Engineering 2009977

References

Jobin, K.V., Mondal, A. and Jawahar, C.V., 2019, September. Docfigure: A dataset for scientific document figure classification. In 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) (Vol. 1, pp. 74-79). IEEE.

