This README.txt file was generated on 2023-03-23 by Daniel Suárez


GENERAL INFORMATION

1. Title of Dataset: Dispersal ability and niche breadth influence interspecific variation in spider abundance and occupancy

2. Author Information

    Corresponding Investigator
        Name: Dr Brent Charles Emerson
        Institution: IPNA-CSIC, San Cristóbal de La Laguna, Canary Islands, Spain
        Email: bemerson@ipna.csic.es

    Co-investigator 1
        Name: Daniel Suárez
        Institution: IPNA-CSIC, San Cristóbal de La Laguna, Canary Islands, Spain

    Co-investigator 2
        Name: Paula Arribas
        Institution: IPNA-CSIC, San Cristóbal de La Laguna, Canary Islands, Spain

    Co-investigator 3
        Name: Nuria Macías
        Institution: Departamento de Biología Animal, Edafología y Geología, Universidad de La Laguna, Tenerife, Spain

3. Date of data collection: 2012-2021

4. Geographic location of data collection: Canary Islands, Spain

5. Recommended citation for this dataset: Suárez et al. (2022). Data from: Dispersal ability and niche breadth influence interspecific variation in spider abundance and occupancy


DATA & FILE OVERVIEW

1. Description of dataset

This dataset was generated to investigate the role of dispersal ability and niche breadth in the relationship between abundance and occupancy in a community of spiders. Genetic data include sequences of the cytochrome oxidase I (COI) and 28S ribosomal RNA (28S) genes obtained with Sanger sequencing.

2. File List:

    File 1 Name: Alignment_spiders_COI.fasta
    File Description: fasta file comprising 123 COI spider sequences.

    File 2 Name: Alignment_spiders_28S.fasta
    File Description: fasta file comprising 68 28S spider sequences.

    File 3 Name: Input_mean_abundance.csv
    File Description: database for abundance-occupancy analyses of 123 spiders considering mean abundance.

    File 4 Name: Input_site_abundance.csv
    File Description: database for abundance-occupancy analyses of 123 spiders considering site-level abundance.
METHODOLOGICAL INFORMATION

The 5' region (658 bp) of the mtDNA COI gene was amplified using the primers LCO1490 and HCO2198 in order to identify presumed biological species (PBS). A custom R script was used to produce an unweighted pair group method with arithmetic mean (UPGMA) tree from pairwise Kimura 2-parameter (K2P) distances. PBS were then taxonomically assigned to species or genus level based on GenBank and BOLD Systems search results.

PBS (hereafter species) were categorised for dispersal ability, considering the potential for juvenile stages to be passively dispersed by air currents while suspended from silk threads. This behaviour, henceforth referred to as 'ballooning', has been broadly used as a proxy for dispersal ability.

A two-step process was used to categorise species as either 'non-specialist' or 'specialist', based on their affinity to the laurel forest habitat, i.e., whether they are present in a broad range of habitats or restricted mainly to laurel forest. As a first step, the species list was assessed by five Canary Island spider specialists, who categorised species as either laurel forest specialists or non-specialists based on their knowledge of the biology of the species. In a second step, species association with the laurel forest was quantified using distribution records within the Biodiversity Data Bank of the Canary Islands (https://www.biodiversidadcanarias.es/biota/). The total number of 500 x 500 m cells occupied by a species in the archipelago was quantified, and the percentage of those cells corresponding to laurel forest habitat was then calculated.
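The second step, together with the 50% quantile threshold described in the following paragraph, can be sketched as below. This is only an illustration in Python, not the authors' script: the function names, the species codes and the percentages are hypothetical, and treating a value exactly equal to the threshold as 'non-specialist' is an assumption, since the text only describes values above and below it.

```python
from statistics import median


def percent_laurel(cells):
    """Percentage of occupied 500 x 500 m cells that fall in laurel forest.

    `cells` holds one boolean per occupied cell (True = laurel forest).
    """
    return 100.0 * sum(cells) / len(cells)


def classify(pct_by_species, expert_labels):
    """Classify species left uncharacterised by the expert panel.

    The threshold is the 50% quantile (median) of percent laurel forest
    occupancy among species the experts labelled 'specialist'.
    Expert labels are kept as they are; only species with a None label
    are classified by the threshold.
    """
    threshold = median(
        pct_by_species[sp]
        for sp, label in expert_labels.items()
        if label == "specialist"
    )
    labels = {}
    for sp, label in expert_labels.items():
        if label is not None:
            labels[sp] = label
        elif pct_by_species[sp] > threshold:
            labels[sp] = "specialist"
        else:
            # Assumption: ties with the threshold count as non-specialist.
            labels[sp] = "non-specialist"
    return threshold, labels


# Hypothetical values, for illustration only.
pct = {"PBS_A": 90.0, "PBS_B": 70.0, "PBS_C": 80.0, "PBS_D": 85.0, "PBS_E": 20.0}
expert = {
    "PBS_A": "specialist",
    "PBS_B": "specialist",
    "PBS_C": "specialist",
    "PBS_D": None,  # not characterised by the panel
    "PBS_E": None,  # not characterised by the panel
}
threshold, labels = classify(pct, expert)
print(threshold, labels)
```

Replacing the median with another quantile of the same expert-classified subset would give the alternative categorisations used to check robustness.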
For the subset of species that were categorised by the panel of experts as laurel forest specialists, species were ranked from highest to lowest percent laurel forest occupancy, and the percentage corresponding to the 50% quantile was used to categorise the species not characterised in the first step (i.e., species above this percentage were considered 'specialists', whereas species below it were categorised as 'non-specialists'). Additionally, the 10%, 25% and 75% quantiles were also used for niche breadth categorisation to explore the robustness of further inferences.

For each species, local abundance was calculated as i) the mean site abundance (i.e., the sum of all individuals divided by the number of occupied sites at both island and archipelago scales, so with species as the replicate unit), and ii) the individual site abundance (i.e., the sum of all individuals within each site, so with site as the replicate unit). Occupancy was calculated both across islands (i.e., presence or absence of each species within each of the islands) and across sites (i.e., presence or absence of each species within each site). The latter was again calculated at both the archipelago scale (across all sites) and the island scale (across all sites within a given island). Abundance-occupancy relationships (AORs) were expressed as a logistic regression between the occupancy and the log of the abundance per species (mean site abundance per species).


DATA-SPECIFIC INFORMATION FOR: Alignment_spiders_COI.fasta

1. Number of cases/rows: 123

2. Sequence names: Sequences are named after PBS code, the family and the species (Genus_sp).


DATA-SPECIFIC INFORMATION FOR: Alignment_spiders_28S.fasta

1. Number of cases/rows: 68

2. Sequence names: Sequences are named after PBS code, the family and the species (Genus_sp).


DATA-SPECIFIC INFORMATION FOR: Input_mean_abundance.csv

1. Number of variables: 30

2. Number of cases/rows: 124

3. Variables List:

    PBS_code: code of each PBS
    Dispersal: dispersal ability (ballooning or non-ballooning) of each PBS
    Family: taxonomic family of each PBS
    HS_exp: niche breadth of each PBS based only on the expert panel classification ('¿?' stands for unknown classification)
    HS_Q50: niche breadth of each PBS based on the 50% quantile ('¿?' stands for unknown classification)
    HS_Q10: niche breadth of each PBS based on the 10% quantile ('¿?' stands for unknown classification)
    HS_Q25: niche breadth of each PBS based on the 25% quantile ('¿?' stands for unknown classification)
    HS_Q75: niche breadth of each PBS based on the 75% quantile ('¿?' stands for unknown classification)
    Total_N_AR: total number of individuals collected at the archipelago scale
    Abundance_AR: mean abundance per PBS
    Islands_Successes: number of islands where a given PBS was found
    Islands_Failures: number of islands where a given PBS was not found
    Successes_AR: number of sites where a given PBS was found
    Failures_AR: number of sites where a given PBS was not found
    Total_N_TF: total number of individuals collected on Tenerife ('NA' stands for not applicable, i.e., species not present on this island)
    Abundance_TF: mean abundance on Tenerife ('NA' stands for not applicable, i.e., species not present on this island)
    Successes_TF: number of sites on Tenerife where a given PBS was found ('NA' stands for not applicable, i.e., species not present on this island)
    Failures_TF: number of sites on Tenerife where a given PBS was not found ('NA' stands for not applicable, i.e., species not present on this island)
    Total_N_LG: total number of individuals collected on La Gomera ('NA' stands for not applicable, i.e., species not present on this island)
    Abundance_LG: mean abundance on La Gomera ('NA' stands for not applicable, i.e., species not present on this island)
    Successes_LG: number of sites on La Gomera where a given PBS was found ('NA' stands for not applicable, i.e., species not present on this island)
    Failures_LG: number of sites on La Gomera where a given PBS was not found ('NA' stands for not applicable, i.e., species not present on this island)
    Total_N_LP: total number of individuals collected on La Palma ('NA' stands for not applicable, i.e., species not present on this island)
    Abundance_LP: mean abundance on La Palma ('NA' stands for not applicable, i.e., species not present on this island)
    Successes_LP: number of sites on La Palma where a given PBS was found ('NA' stands for not applicable, i.e., species not present on this island)
    Failures_LP: number of sites on La Palma where a given PBS was not found ('NA' stands for not applicable, i.e., species not present on this island)
    Total_N_EH: total number of individuals collected on El Hierro ('NA' stands for not applicable, i.e., species not present on this island)
    Abundance_EH: mean abundance on El Hierro ('NA' stands for not applicable, i.e., species not present on this island)
    Successes_EH: number of sites on El Hierro where a given PBS was found ('NA' stands for not applicable, i.e., species not present on this island)
    Failures_EH: number of sites on El Hierro where a given PBS was not found ('NA' stands for not applicable, i.e., species not present on this island)


DATA-SPECIFIC INFORMATION FOR: Input_site_abundance.csv

1. Number of variables: 24

2. Number of cases/rows: 998

3. Variables List:

    PBS_code: code of each PBS
    Dispersal: dispersal ability (ballooning or non-ballooning) of each PBS
    Family: taxonomic family of each PBS
    HS_exp: niche breadth of each PBS based only on the expert panel classification ('¿?' stands for unknown classification)
    HS_Q50: niche breadth of each PBS based on the 50% quantile ('¿?' stands for unknown classification)
    HS_Q10: niche breadth of each PBS based on the 10% quantile ('¿?' stands for unknown classification)
    HS_Q25: niche breadth of each PBS based on the 25% quantile ('¿?' stands for unknown classification)
    HS_Q75: niche breadth of each PBS based on the 75% quantile ('¿?' stands for unknown classification)
    Abundance: number of individuals of a given PBS collected at a given site
    Site: code of each site
    Island: code of each island (TF - Tenerife; LG - La Gomera; LP - La Palma; EH - El Hierro)
    Island_code: numeric code of each island (1 - Tenerife; 2 - La Gomera; 3 - La Palma; 4 - El Hierro)
    Islands_Successes: number of islands where a given PBS was found
    Islands_Failures: number of islands where a given PBS was not found
    Successes_AR: number of sites where a given PBS was found
    Failures_AR: number of sites where a given PBS was not found
    Successes_TF: number of sites on Tenerife where a given PBS was found ('NA' stands for not applicable, i.e., species not present on this island)
    Failures_TF: number of sites on Tenerife where a given PBS was not found ('NA' stands for not applicable, i.e., species not present on this island)
    Successes_LG: number of sites on La Gomera where a given PBS was found ('NA' stands for not applicable, i.e., species not present on this island)
    Failures_LG: number of sites on La Gomera where a given PBS was not found ('NA' stands for not applicable, i.e., species not present on this island)
    Successes_LP: number of sites on La Palma where a given PBS was found ('NA' stands for not applicable, i.e., species not present on this island)
    Failures_LP: number of sites on La Palma where a given PBS was not found ('NA' stands for not applicable, i.e., species not present on this island)
    Successes_EH: number of sites on El Hierro where a given PBS was found ('NA' stands for not applicable, i.e., species not present on this island)
    Failures_EH: number of sites on El Hierro where a given PBS was not found ('NA' stands for not applicable, i.e., species not present on this island)
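
As a rough illustration of how the site-level table relates to the archipelago-scale summary columns (total individuals, mean site abundance, Successes_AR, Failures_AR), the Python sketch below derives them from the PBS_code, Site and Abundance columns. It is not the authors' code. The sample rows and site codes are hypothetical, the total number of sampled sites is passed in by hand, and it is an assumption that the site-level file lists only sites where a PBS was recorded.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows mimicking three columns of Input_site_abundance.csv.
SAMPLE = """PBS_code,Site,Abundance
PBS001,TF01,4
PBS001,TF02,2
PBS001,LG01,6
PBS002,TF01,1
"""


def summarise(rows, n_sites_total):
    """Per-PBS totals, mean site abundance and site occupancy counts.

    Mean site abundance = sum of all individuals / number of occupied sites.
    Successes = occupied sites; Failures = sampled sites without the PBS.
    """
    totals = defaultdict(int)
    occupied = defaultdict(set)
    for row in rows:
        n = int(row["Abundance"])
        if n > 0:  # count a site as occupied only if individuals were found
            totals[row["PBS_code"]] += n
            occupied[row["PBS_code"]].add(row["Site"])
    summary = {}
    for pbs, total in totals.items():
        successes = len(occupied[pbs])
        summary[pbs] = {
            "Total_N": total,
            "Abundance": total / successes,
            "Successes": successes,
            "Failures": n_sites_total - successes,
        }
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Assumption: three sites were sampled in total in this toy example.
summary = summarise(rows, n_sites_total=3)
for pbs, values in summary.items():
    print(pbs, values)
```

Filtering the rows by the Island column before calling the function would give the island-scale equivalents (the _TF, _LG, _LP and _EH columns), with the number of sites sampled on that island as the total.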