For full information on data collection, processing and analysis, please see the Methods section of the associated publication.

### DATA FILES ###

"Setting_level_folktunes_data.csv" contains data for 12422 tune settings (versions). This dataset includes only tune settings that have been added to at least one tune set and have complete data for all melodic complexity measures.

The dataset contains the following columns:

'setting_id': unique numeric tune setting identifier
'tune_id': unique numeric tune identifier
'name': tune name
'type': tune type (barndance, hornpipe, jig, march, mazurka, polka, reel, slide, slip jig, strathspey, three-two or waltz)
'date': date when the setting was uploaded to The Session website
'bar_count': number of bars in the setting
'notecount': number of notes in the setting
'compltrans': measure of melodic complexity based on tone transitions; values are integers (inverse of averaged probabilities), scaled between 0 and 10 (higher = more complex)
'complebm': measure of melodic complexity based on melodic expectations; values are integers calibrated in relation to the Essen Folk Song Collection, scaled between 0 and 10 (higher = more complex)
'entropypc1': first-order entropy of pitch class distributions, measured as relative entropy, which falls between 0 and 1
'entropypc2': second-order entropy of pitch class distributions (adjacent pitch pairs), measured as relative entropy, which falls between 0 and 1
'novelty1': measure of melodic complexity based on self-similarity, in arbitrary units scaled between 0 and 1 (1 = maximal novelty)
'set_adds': number of sets a setting has been added to

"Tune_level_folktunes_data.csv" contains data for 9378 tunes (based on setting-level data aggregated to the tune level). This dataset contains only tunes that have been added to at least 10 tunebooks, with complete data for all melodic complexity measures.

The dataset contains the following columns:

'tune_id': unique numeric tune identifier
'name': tune name
'date': date when the first setting of the tune was uploaded to The Session website
'bar_count': median number of bars across settings of the tune
'notecount': median number of notes across settings of the tune
'compltrans': median measure of melodic complexity based on tone transitions across settings of the tune; values are integers (inverse of averaged probabilities), scaled between 0 and 10 (higher = more complex)
'complebm': median measure of melodic complexity based on melodic expectations across settings of the tune; values are integers calibrated in relation to the Essen Folk Song Collection, scaled between 0 and 10 (higher = more complex)
'entropypc1': median first-order entropy of pitch class distributions across settings of the tune, measured as relative entropy, which falls between 0 and 1
'entropypc2': median second-order entropy of pitch class distributions (adjacent pitch pairs) across settings of the tune, measured as relative entropy, which falls between 0 and 1
'novelty1': median measure of melodic complexity based on self-similarity across settings of the tune, in arbitrary units scaled between 0 and 1 (1 = maximal novelty)
'bar_count_IQR': variation (inter-quartile range) of number of bars across settings of the tune
'notecount_IQR': variation (inter-quartile range) of number of notes across settings of the tune
'compltrans_IQR': variation (inter-quartile range) of measure of melodic complexity based on tone transitions across settings of the tune
'complebm_IQR': variation (inter-quartile range) of measure of melodic complexity based on melodic expectations across settings of the tune
'entropypc1_IQR': variation (inter-quartile range) of first-order entropy of pitch class distributions across settings of the tune
'entropypc2_IQR': variation (inter-quartile range) of second-order entropy of pitch class distributions (adjacent pitch pairs) across settings of the tune
'novelty1_IQR': variation (inter-quartile range) of measure of melodic complexity based on self-similarity across settings of the tune
'tunebooks': number of tunebooks the tune has been added to
'settings': number of tune settings

### SCRIPT FILES ###

"Folktunes_analyses.R" contains the R code for the empirical analyses reported in the main text.
"Folktunes_SI_analyses.R" contains the R code for the empirical analyses reported in the Supplementary Information.
"Folktunes_simulation.R" contains the R code for the simulation analyses.
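
### EXAMPLE (ILLUSTRATIVE) ###

The relationship between the two data files (setting-level rows summarised per tune as a median and an inter-quartile range) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used to produce the files; the authors' scripts are in R. The function names `iqr` and `aggregate_to_tune_level`, the 'inclusive' quartile rule, the value 0 for tunes with a single setting, and the sample rows are all assumptions made for the example. The sample rows are synthetic and are not values from the dataset.

```python
import csv
import io
import statistics
from collections import defaultdict

# Columns summarised per tune; names taken from the column lists above.
MEASURES = ["bar_count", "notecount", "compltrans", "complebm",
            "entropypc1", "entropypc2", "novelty1"]


def iqr(values):
    """Inter-quartile range; 0.0 when a tune has a single setting (assumption)."""
    if len(values) < 2:
        return 0.0
    # Assumption: 'inclusive' quartiles; the original scripts may use another rule.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def aggregate_to_tune_level(rows):
    """Collapse setting-level rows (dicts) into one summary dict per tune_id."""
    by_tune = defaultdict(list)
    for row in rows:
        by_tune[row["tune_id"]].append(row)

    tunes = []
    for tune_id, settings in by_tune.items():
        summary = {
            "tune_id": tune_id,
            "name": settings[0]["name"],
            "settings": len(settings),
        }
        for col in MEASURES:
            values = [float(s[col]) for s in settings]
            summary[col] = statistics.median(values)
            summary[col + "_IQR"] = iqr(values)
        tunes.append(summary)
    return tunes


# Synthetic rows for illustration only; these are not values from the dataset.
SAMPLE_CSV = """setting_id,tune_id,name,bar_count,notecount,compltrans,complebm,entropypc1,entropypc2,novelty1
1,10,Example Reel,32,200,5,4,0.80,0.70,0.30
2,10,Example Reel,32,210,6,5,0.82,0.72,0.34
3,10,Example Reel,16,190,7,5,0.84,0.74,0.38
4,11,Example Jig,16,96,4,4,0.75,0.65,0.25
"""

tune_level = aggregate_to_tune_level(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for tune in tune_level:
    print(tune["tune_id"], tune["settings"], tune["notecount"], tune["notecount_IQR"])
```

To run the sketch on the real file, pass `csv.DictReader(open("Setting_level_folktunes_data.csv", newline=""))` in place of the in-memory sample. The result is not expected to reproduce "Tune_level_folktunes_data.csv" exactly: the 'tunebooks' column and the tunebook-based inclusion filter rely on information that is not in the setting-level file, and the two files apply different inclusion criteria.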