%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% README for data.csv (Tabi et al. 2023)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

We analyzed 479 sampled communities from 299 sites worldwide in the Reef Life Survey (RLS) database (Edgar et al., 2020). Together they comprise population data for more than 1,500 non-benthic marine species with individual body size information. Body size was measured as biomass, and data were aggregated by year. We included in our analysis only sampling sites that were surveyed more than once per year. This decision is based on a prior rarefaction analysis that we conducted to assess the impact of RLS sampling effort, because species richness estimated from a single sampling event is noisy.
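The site filter described above can be sketched in Python as follows. This is a minimal sketch, not the authors' pipeline. It assumes that the per-site-year `sampling_effort` column (defined in the variable list) records how many times a site was surveyed in a given year. The function name, site codes and `ID` values in the example are hypothetical.

```python
import csv
import io


def keep_multi_survey_site_years(rows):
    """Keep only site-year rows with more than one sampling event.

    Assumes each row is a dict whose 'sampling_effort' field holds the
    number of samples taken at that site in that year.
    """
    return [row for row in rows if int(row["sampling_effort"]) > 1]


# Hypothetical rows in the results.csv column layout (not real RLS data).
example = io.StringIO(
    "site_code,year,ID,sampling_effort\n"
    "SITE1,2010,SITE1_2010,1\n"
    "SITE1,2011,SITE1_2011,3\n"
    "SITE2,2010,SITE2_2010,2\n"
)
kept = keep_multi_survey_site_years(csv.DictReader(example))
print([row["ID"] for row in kept])  # ['SITE1_2011', 'SITE2_2010']
```

The filter works at the site-year level, so a site can contribute some years and not others.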
We collected weekly sea surface temperature (SST) data from the remote sensing database of NOAA (National Oceanic and Atmospheric Administration). In our analysis, we used the sum of thermal stress anomalies (TSA). This was calculated as the number of events between 1982 and 2019 in which the average difference between weekly SST and the maximum weekly climatological SST exceeded 1 °C. The distribution of warm-water coral reefs was obtained from the UNEP-WCMC World Fish Centre database, and the average trophic level of each species from the FishBase database. Information on marine protected areas was obtained from the UNEP-WCMC and IUCN Protected Planet database. Human population density was obtained from the Gridded Population of the World and quantified as humans/km² within a 25-km radius around each sampling site. Finally, we used the regression coefficient between log biomass and log average body size as a measure of community structure.
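The TSA event count described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing that produced `SumTSAfull`. The sketch takes a single maximum weekly climatological SST per site and computes weekly TSA as weekly SST minus that maximum. It then treats every week with TSA above 1 °C as one event. The original analysis may instead group consecutive weeks into a single event, and any spatial or temporal averaging step is omitted here. The SST values and function names are hypothetical.

```python
def thermal_stress_anomalies(weekly_sst, clim_max_sst):
    """Weekly TSA in deg C: weekly SST minus the maximum weekly climatological SST."""
    return [sst - clim_max_sst for sst in weekly_sst]


def count_tsa_events(weekly_sst, clim_max_sst, threshold=1.0):
    """Count weeks whose TSA exceeds `threshold` deg C.

    Assumption: each week above the threshold counts as one event.
    """
    anomalies = thermal_stress_anomalies(weekly_sst, clim_max_sst)
    return sum(1 for tsa in anomalies if tsa > threshold)


# Hypothetical six weeks of SST at one site; climatological maximum of 28.0 deg C.
weekly_sst = [27.5, 28.9, 29.3, 28.4, 29.1, 27.0]
n_events = count_tsa_events(weekly_sst, clim_max_sst=28.0)
print(n_events)  # 2 (the weeks at 29.3 and 29.1 deg C)
```

Over the full 1982 to 2019 record, the same count would be applied to every weekly value at a site.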
The "data.csv" file contains all raw data used to calculate the final results in the "results.csv" file. The variables in "results.csv" and their definitions are as follows:

site_code --> site code (character string)
year --> sampling year
ID --> combination of site code and sampling year
latitude --> latitude of site
longitude --> longitude of site
site_name --> name of site
ecoregion --> ecoregion of site
realm --> realm of site
area --> area name of site
sr --> species richness (number of species)
sampling_effort --> number of samples in the respective year
Reg_BS_BM_coef --> Linear regression coefficient between log average body size and log species biomass 
Reg_BS_BM_pvalue --> The p-value of the linear regression coefficient between log average body size and log species biomass 
rho_bs_tl --> Spearman's correlation coefficient between log average body size and average trophic level
rho_bs_tl_pvalue --> The p-value of Spearman's correlation coefficient between log average body size and average trophic level
cwm_size --> Community-weighted mean of average body sizes (weighted by abundance)
HumanPopDensity --> human population density (humans/km²) within a 25-km radius around the sampling site
SumTSAfull --> The sum of TSA between 1982 and the end of 2019, i.e. the number of events in which TSA was above 1 °C
Buffer10Km --> binary variable, 1: coral reef within 10 km of the site, 0: otherwise
STATUS_YR --> the year in which the MPA was created
IUCN_CAT --> IUCN protected-area management category, indicating the level of protection. Allowed values: Ia, Ib, II, III, IV, V, VI, Not Applicable, Not Assigned, Not Reported
MPA --> binary variable, 1: the sampling site falls inside an MPA in the year of sampling, 0: otherwise
IUCN_Ia --> binary variable, 1: IUCN category Ia, 0: otherwise
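The community metrics above (Reg_BS_BM_coef, rho_bs_tl, cwm_size) can be sketched for a single site-year with the standard library. This is a hedged illustration, not the authors' code. The sketch assumes that log species biomass is the response and log average body size is the predictor, and it uses natural logs. The toy community is hypothetical, and its species biomass is assumed to equal average body size times abundance. The p-value columns are not reproduced because the standard library has no t-distribution; `scipy.stats.linregress` and `scipy.stats.spearmanr` both return p-values.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def community_weighted_mean(sizes, abundances):
    """Abundance-weighted mean of species' average body sizes."""
    return sum(s * a for s, a in zip(sizes, abundances)) / sum(abundances)


# Hypothetical community of four species (not taken from data.csv).
avg_size = [2.0, 10.0, 50.0, 200.0]   # average body mass per individual
abundance = [100, 40, 10, 2]          # individuals counted
trophic_level = [2.1, 3.4, 3.0, 4.2]
# Assumption: species biomass = average body size times abundance.
biomass = [s * a for s, a in zip(avg_size, abundance)]

log_size = [math.log(s) for s in avg_size]
reg_bs_bm_coef = ols_slope(log_size, [math.log(b) for b in biomass])
rho_bs_tl = spearman_rho(log_size, trophic_level)
cwm_size = community_weighted_mean(avg_size, abundance)
print(reg_bs_bm_coef, rho_bs_tl, cwm_size)
```

The log base does not affect the regression slope, provided both variables use the same base. Spearman's rho is rank-based, so taking the log of body size leaves rho_bs_tl unchanged.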

