This dataset is a collection of all notes taken for the article:

David Broneske, Sebastian Kittan, and Jacob Krüger: Sharing Software-Evolution Datasets: Practices, Challenges, and Recommendations. Proc. ACM Softw. Eng. 1, FSE, 2024. https://doi.org/10.1145/3660798

- The folder "datasets_notes" comprises Word documents with notes on all papers that we gathered through our manual search in dblp (including the two manually added GHTorrent papers).
- The folder "datasets_problems" contains txt documents that summarize, for each paper, the problems mentioned by the papers citing it.
- The folder "uses_problems" comprises Word documents with our notes on the problems mentioned in each snowballed paper.
- The "usecasemapping.csv" contains our notes on mapping the intended use cases ("use case") and involved data ("type") of a dataset ("title") to the use case ("used for") of the dataset-using papers ("used by").
- The "main_notes.xlsx" is the spreadsheet with all of our data and notes. The individual sheets are:
  - "notes" -- a collection of notes and synthesized tables for exploring and visualizing the collected data.
  - "Main" -- our main collection of data on the dataset papers we collected manually.
    - Bibliographic information: author, year, title, record
    - category: new dataset, modified dataset, use dataset, or follow-up paper
    - notes: any additional notes
    - citations: number of citations
    - Data: reference to the Word documents with the collected data
    - problem collection: problems reported for the dataset
    - source of data: from where the data was collected
    - download date: when the data was collected
    - data collection: process of data collection
    - dataset available: whether the dataset is available
    - available at: link to the dataset
    - hosted at: platform/website on which the dataset is hosted
    - dataset last updated: when the dataset was last updated
    - data format: format of the dataset
    - quantity: notes on the number of entries in a dataset
    - data type: notes on the types of data involved
    - use case: notes on intended use cases
    - limitations of dataset: notes on limitations mentioned
    - query example: example of a query on the dataset as shown in the paper
    - content to anonymize: potentially privacy-critical data
    - license: license under which the dataset is published
    - anonymization employed: what data was anonymized in what way
    - sheet: sheet in the spreadsheet with the detailed notes
    - used and problems: how often used by another paper, but with problems
    - used but no problems: how often used by another paper, without problems
    - used: how often actually used in citing paper
    - not used: how often not used in citing paper
    - control: control column for the use counts
    - addition to used: any additional notes on the use of the datasets
  - "Table_cited" -- working notes on the individual datasets and the papers that use them
  - "SELECTION" -- overview of the papers at MSR and their selection
  - "use_out_of_selection" -- papers out of "SELECTION" that use another dataset (i.e., mining challenge solutions)
  - remaining sheets: named after each dataset, notes and references on the papers that cite the respective dataset
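
As a rough illustration of how "usecasemapping.csv" might be explored, the following Python sketch groups the intended use case and the actual use per dataset. It assumes that the file is comma-separated and that its header uses exactly the column names quoted above ("title", "use case", "used for"); neither has been verified against the file itself. The function name and the sample rows are hypothetical and only serve the example.

```python
import csv
import io
from collections import defaultdict

# Column names as quoted in the description above. That they match the
# CSV header exactly is an assumption.
TITLE, USE_CASE, USED_FOR = "title", "use case", "used for"


def group_uses_by_dataset(csv_file):
    """Map each dataset title to a list of (intended use, actual use) pairs."""
    uses = defaultdict(list)
    for row in csv.DictReader(csv_file):
        uses[row[TITLE]].append((row[USE_CASE], row[USED_FOR]))
    return dict(uses)


# Hypothetical rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    "title,type,use case,used by,used for\n"
    "Dataset A,commits,defect prediction,Paper X,defect prediction\n"
    "Dataset A,commits,defect prediction,Paper Y,effort estimation\n"
)
mapping = group_uses_by_dataset(sample)
```

To run the same function on the real file, one could pass a handle opened with `open("usecasemapping.csv", newline="", encoding="utf-8")`; the encoding and the delimiter may need adjusting.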