# Sexual dimorphism in pollen foraging and sensory traits in Heliconius butterflies
Data & Analysis Repository (Zenodo)
This repository contains datasets and analysis scripts used in a study of sexual dimorphism in pollen foraging and sensory traits in Heliconius butterflies, including brain morphology, eye morphology, wing reflectance spectrometry, pollen metabarcoding, and microhabitat capture–recapture analyses.
DOI: 10.5281/zenodo.17777739
## Version history
- v1.1 (2026-02-08): Added raw (untransformed) brain volume data, expanded variable-level documentation in the README, and removed absolute file paths from R scripts.
## Abstract
This repository contains the data and analysis scripts supporting the study “Sexual dimorphism in pollen foraging and sensory traits in Heliconius butterflies.” The study integrates brain neuroanatomy, eye morphology, wing reflectance spectrometry, pollen DNA metabarcoding, and microhabitat recapture data to test sex-specific differences in sensory investment and foraging ecology in Heliconius himera. Statistical analyses were conducted in R using mixed-effects models, allometric scaling, and multivariate community analyses. The datasets provided here allow reproduction of the key analyses and figures reported in the manuscript.
Sexual dimorphism in foraging behaviour is widespread in insects and may arise from differences in nutritional demands,
sensory systems, or cognition. In Heliconius, pollen-feeding is an evolutionary innovation among butterflies that
supports extended lifespans and sustained reproduction. However, how foraging behaviour varies between the sexes and
how it relates to sexually dimorphic traits remain poorly understood. We investigated sex-specific foraging strategies
in wild Heliconius himera, a highland specialist from southern Ecuador, using field surveys and DNA metabarcoding.
Females carried more pollen than males, consistent with higher nutritional demands. Yet both sexes used similar plant
richness and composition, indicating that differences reflect foraging effort rather than shifts in plant choice.
Gut samples revealed greater pollen diversity and a more consistent community profile than proboscis samples,
suggesting they better capture cumulative foraging history. To place these behavioural differences in a sensory context,
we quantified sexually dimorphic sensory traits. Males had larger eyes and more ommatidia, whereas females had larger
mushroom bodies, brain regions that support associative learning and memory. These patterns are consistent with sexual
dimorphism reported across Heliconius and may contribute to the observed sex differences in pollen foraging. Our findings
highlight how sex-specific foraging differences may arise from effort on shared floral resources and co-occur
with divergent sensory and neural investment, offering insights into the ecological basis of intraspecific variation
in pollen use.
## Contents
This Zenodo record contains:
- Data files: 9 (CSV/TXT) + 1 Excel workbook
- R scripts: 6
- README: this file
File names listed below match those deposited on Zenodo.
## Data and file overview
### 1) Brain morphology and sexual dimorphism
Data:
- Cyr_him_combined_brain_volumes.csv
Log10-transformed neuropil volumes and grouping variables for Heliconius (H. e. cyrbia and H. himera).
- Cyr_him_combined_brain_volumes_raw.csv
Raw (untransformed) neuropil volumes corresponding to the log-transformed file above.
- himera_lrt_table.csv
Likelihood-ratio test (LRT) results from mixed-effects models of neuropil volumes.
- smatr_table_himera.csv
Summary of SMATR outputs (slope, elevation, and major axis shift tests).
- TableS1_Hhim-brain-dimorphism.xlsx
Curated summary table used in the manuscript for brain dimorphism results.
- Hhimera_eyemorph.csv
Eye and body size measurements for Heliconius himera (described in the data dictionaries below).
Scripts:
- Brains_sex_him.r
Main pipeline for modelling sexual dimorphism in brain structure (mixed models, LRT tables, figures).
- smatr_eyes_himera.R
SMATR major axis analyses of scaling relationships (eye/brain scaling as used in the study).
### 2) Reflectance spectrometry and age prediction
Data:
- reflectance_age.csv
Raw reflectance spectra used for age prediction modelling (long format; wavelength-level reflectance).
- Vilcabamba_samples_age.csv
Individual-level table containing known ages (when available), predicted ages, and associated metadata.
Scripts:
- reflectance_age_modeling_full.R
End-to-end reflectance-based age modelling script (model fitting, validation, diagnostics).
### 3) Pollen metabarcoding and foraging ecology
Data:
- asv_vilcabamba_table.merge.txt
Merged ITS2 ASV table from proboscis and gut samples.
- samples_metadata_vilcabamba.csv
Metadata for sequenced samples and linked individual-level predicted age groupings.
Scripts:
- metabarcoding_tools_0-1a.R
Helper functions for data cleaning, ASV processing, and taxonomic lookup.
- R_ITS2_vilcabamba.R
ITS2 metabarcoding pipeline including filtering, taxonomic assignment, and diversity analyses.
- Pollen_loads.R
Analyses of pollen load categories, taxonomic richness, Shannon diversity, and related models.
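As a rough illustration of the richness and Shannon diversity measures mentioned for Pollen_loads.R, the sketch below computes both in base R on an invented count matrix. It is not the study's code: the function name `shannon`, the sample and ASV labels, the read counts, and the samples-as-rows orientation are all assumptions made for the example.

```r
# Shannon diversity H' = -sum(p * log(p)) over taxa with non-zero counts
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Toy ASV-by-sample count matrix (invented read counts; samples as rows)
asv <- matrix(c(50, 50, 0,
                90,  5, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("gut_1", "proboscis_1"),
                              c("ASV1", "ASV2", "ASV3")))

richness <- rowSums(asv > 0)        # number of ASVs detected per sample
H        <- apply(asv, 1, shannon)  # Shannon index per sample

print(data.frame(richness, H))
```

In this toy case the first sample has two equally abundant ASVs, so its Shannon index equals log(2).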
### 4) Microhabitat and capture–recapture data
Data:
- Vilcambamba_recapture_data.csv
Capture–recapture dataset including pollen categories and environmental/logistical fields.
- recaptures_logger_1h.csv
High-frequency temperature and humidity logger data linked to capture/recapture events.
## Variable descriptions (data dictionaries)
This section describes variables (columns), their data types, and units/formats where applicable.
### Cyr_him_combined_brain_volumes.csv
Neuropil volume measurements and grouping variables for Heliconius (H. e. cyrbia and H. himera) used in brain dimorphism and scaling analyses.
Important note: All neuropil volume variables in this file are log10-transformed and stored with the prefix `log_` (i.e., log10(volume in µm³)).
Variables:
- ID (factor): individual identifier (e.g., `1_wild`, `10_wild`)
- Species (factor): species identity (e.g., `H. e. cyrbia`, `H. himera`)
- Sex (factor): `female`, `male`
- Location (factor): collection / rearing location (e.g., `Balsas`, `Cambridge`, etc.)
- Type (factor): `reared`, `wild`
Log10-transformed neuropil volumes (numeric; log10(µm³)):
- log_ME: medulla
- log_LAM: lamina
- log_LOB: lobula
- log_LOP: lobula plate
- log_aME: accessory medulla
- log_vLOB: ventral lobula
- log_rCB: remainder of central brain
- log_AL: antennal lobe
- log_AOTU: anterior optic tubercle
- log_POTU: posterior optic tubercle
- log_MBCA: mushroom body calyx
- log_MBPED: mushroom body peduncle / lobes (as defined in the study)
- log_OL: optic lobe (aggregate)
- log_CBR: central brain region (as defined in the study)
Raw (untransformed) neuropil volumes are provided in Cyr_him_combined_brain_volumes_raw.csv.
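The link between the raw and log10-transformed files can be sketched in base R as follows. This is a minimal illustration on made-up values, not the study's script: the helper name `add_log10_columns` and the toy volumes are assumptions, and only the `log_` prefix convention and the column names are taken from this README.

```r
# Hypothetical helper: log10-transform raw neuropil columns and prefix the
# new columns with "log_", mirroring the naming convention described here.
add_log10_columns <- function(raw, volume_cols) {
  logged <- log10(as.matrix(raw[, volume_cols, drop = FALSE]))
  colnames(logged) <- paste0("log_", volume_cols)
  cbind(raw[, setdiff(names(raw), volume_cols), drop = FALSE],
        as.data.frame(logged))
}

# Toy data (invented volumes, in cubic micrometres)
raw <- data.frame(
  ID   = c("1_wild", "10_wild"),
  Sex  = c("female", "male"),
  ME   = c(1e7, 1e8),
  MBCA = c(1e6, 1e5)
)

logged <- add_log10_columns(raw, c("ME", "MBCA"))
print(logged)
```

Back-transforming with `10^logged$log_ME` recovers the raw volumes, which is a quick way to check that the two files agree.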
### Cyr_him_combined_brain_volumes_raw.csv
Raw (untransformed) neuropil volumes corresponding to the same individuals and structures as Cyr_him_combined_brain_volumes.csv.
Volumes are expressed in cubic micrometers (µm³). Values were log10-transformed for analysis.
Variables:
- ID (factor): individual identifier
- Species (factor): species identity
- Sex (factor): `female`, `male`
- Location (factor): collection / rearing location
- Type (factor): `reared`, `wild`
Raw neuropil volumes (numeric; µm³):
- ME: medulla
- LAM: lamina
- LOB: lobula
- LOP: lobula plate
- aME: accessory medulla
- vLOB: ventral lobula
- rCB: remainder of central brain
- AL: antennal lobe
- AOTU: anterior optic tubercle
- POTU: posterior optic tubercle
- MBCA: mushroom body calyx
- MBPED: mushroom body peduncle / lobes (as defined in the study)
- OL: optic lobe (aggregate)
- CBR: central brain region (as defined in the study)
### Hhimera_eyemorph.csv
Eye and body size measurements for Heliconius himera.
Identifiers and grouping:
- ID (factor): individual identifier (e.g., `H543`)
- collection (factor): collection code (e.g., `ERC_ECU`)
- sex (factor): `female`, `male`
- type (factor): taxon label (here `H_himera`)
- brood (character): brood / stock identifier (may be NA)
- wild_insectary (factor): `insectary`, `wild`
- location (factor): sampling location (`IKIAM`, `Vilcabamba`), may be NA
- latitude, longitude (numeric): coordinates in decimal degrees, may be NA
- elevation (character): elevation label / notes, may be NA
- observer (character): measurer identity
Body size:
- tibia_length (numeric): tibia length (units as measured in the study, typically mm)
- abdomen_length (numeric): abdomen length (same unit convention)
- total_body_length (numeric): total body length (same unit convention; may be NA)
- inter_eye_width (numeric): inter-eye width (same unit convention; may be NA)
Eye area and facet counts:
- L_area, R_area (numeric): left and right eye area (units as measured; typically mm²)
- L_whole_count, R_whole_count (numeric/integer): facet (ommatidia) counts per eye
Image bookkeeping and notes:
- wing_image (logical): whether a wing image exists/was used
- box_and_location (character): storage / box reference
- notes (character): notes on reanalysis / issues
- clade (factor): genetic clade label (e.g., `EAST_1`), may be NA
- gps_location (character): optional GPS location field, may be NA
- gps_latitude, gps_longitude (numeric): optional GPS coordinates, may be NA
- wing_id (character): wing image identifier (may be NA)
### himera_lrt_table.csv
Likelihood-ratio test (LRT) results from mixed-effects models of neuropil volumes.
Variables:
- Neuropil (character): neuropil name (e.g., `AL`)
- Predictor (character): model term tested (e.g., `Location`, `sex`, `rCB`, `SexLocation`)
- LogLik_reduced (numeric): log-likelihood of reduced model
- LogLik_full (numeric): log-likelihood of full model
- CHISQ (numeric): likelihood-ratio chi-square statistic
- p_value (numeric): p-value for the LRT
- Adjusted_p_value (numeric): multiple-testing adjusted p-value
- Significance (character): significance code used in the manuscript tables
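The columns above are related through the standard likelihood-ratio statistic, chi-square = 2 × (log-likelihood of the full model minus log-likelihood of the reduced model). The sketch below applies that definition to invented numbers. The degrees of freedom are not stored in the table, so `df = 1` is an assumption, as are the helper name `lrt_from_loglik` and the use of the Benjamini-Hochberg method for the adjusted p-values.

```r
# Standard likelihood-ratio statistic from two nested model log-likelihoods
lrt_from_loglik <- function(loglik_reduced, loglik_full, df = 1) {
  chisq <- 2 * (loglik_full - loglik_reduced)
  p     <- pchisq(chisq, df = df, lower.tail = FALSE)
  data.frame(CHISQ = chisq, p_value = p)
}

# Invented log-likelihoods for three hypothetical tests
toy <- lrt_from_loglik(
  loglik_reduced = c(10.0, 12.5, 8.0),
  loglik_full    = c(12.0, 12.6, 11.0)
)

# Benjamini-Hochberg adjustment as one possible multiple-testing correction
toy$Adjusted_p_value <- p.adjust(toy$p_value, method = "BH")
print(toy)
```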
### smatr_table_himera.csv
Summary of SMATR outputs for sex differences in scaling relationships.
Variables:
- Neuropil (character): neuropil name
Slope tests:
- Slope_LR_statistic (numeric): likelihood-ratio statistic for slope difference
- Slope_P_value (numeric): slope test p-value
- Slope_FDR (numeric): FDR-adjusted slope p-value
- Slope_r (numeric): correlation coefficient used by SMATR
- Slope_DI (logical): direction indicator (if computed; NA if not used)
Elevation tests:
- Elevation_Wald_statistic (numeric): Wald statistic for elevation shift
- Elevation_P_value (numeric): elevation test p-value
- Elevation_FDR (numeric): FDR-adjusted elevation p-value
- Elevation_r (numeric): correlation coefficient
- Elevation_DI (character): direction indicator (e.g., `male`, `female`)
Major axis shift tests:
- Major_Axis_Wald_statistic (numeric)
- Major_Axis_P_value (numeric)
- Major_Axis_FDR (numeric)
- Major_Axis_r (numeric)
- Major_Axis_DI (character): direction indicator
### recaptures_logger_1h.csv
High-frequency logger data linked to capture / recapture events (time series of temperature and relative humidity).
Indexing:
- X (integer): row index
- ID (factor): butterfly identifier
- File_name (character): logger file identifier
Grouping:
- Species (ordered factor): species category (ordered levels in file)
- Sex (factor): `f`, `m`
- Type (factor): `reared`, `wild`
- Location (factor): site (e.g., `Balsas`, `Hybridzone`, etc.)
- Observer (factor): observer identity
Capture context:
- Date_time_capture (POSIXct): capture timestamp
- Recapture (ordered factor): capture status (e.g., `no`, `release`, etc.)
- Pollen (ordered factor): ordinal pollen load category (`no`, `small`, `large`), may be NA
- Pollen_binom (ordered factor): binary pollen category, may be NA
- Pollen_binary (numeric): numeric coding of pollen presence, may be NA
- Bodylength (numeric): body length (units as measured), may be NA
- Notes (character): notes
Logger readings:
- Logger (factor): logger identifier (e.g., `RH1`)
- Date_time (POSIXct): timestamp of logger observation
- Temperature (numeric): temperature (°C)
- RH (numeric): relative humidity (%)
- Date_Time_Back (POSIXct): timestamp of return/back-reading (if used)
- Date_read (POSIXct/Date): date read from logger
Coordinates:
- Latitude, Longitude (numeric): degrees
- geometry (logical): placeholder column (not used for analyses)
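One way to work with the paired timestamps in this file is to summarise logger readings taken close to each capture. The sketch below is only an illustration on invented rows: the 60-minute window is an assumed reading of the `1h` in the file name, and the time zone (`UTC`), IDs, and temperatures are made up.

```r
# Toy logger rows (invented values) using column names from this README
logger <- data.frame(
  ID = c("b1", "b1", "b1", "b2", "b2"),
  Date_time_capture = as.POSIXct(
    c("2023-03-14 10:00:00", "2023-03-14 10:00:00", "2023-03-14 10:00:00",
      "2023-03-15 12:00:00", "2023-03-15 12:00:00"), tz = "UTC"),
  Date_time = as.POSIXct(
    c("2023-03-14 09:30:00", "2023-03-14 10:15:00", "2023-03-14 11:30:00",
      "2023-03-15 11:50:00", "2023-03-15 12:40:00"), tz = "UTC"),
  Temperature = c(20, 22, 30, 18, 20)
)

# Minutes between each logger reading and the capture event
logger$lag_min <- as.numeric(
  difftime(logger$Date_time, logger$Date_time_capture, units = "mins"))

# Keep readings within an assumed 60-minute window around capture
within_1h <- logger[abs(logger$lag_min) <= 60, ]

# Mean temperature per butterfly inside that window
mean_temp <- aggregate(Temperature ~ ID, data = within_1h, FUN = mean)
print(mean_temp)
```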
### reflectance_age.csv
Raw reflectance spectra used for age prediction model training and validation (long format).
Variables:
- wl (numeric): wavelength (nm)
- wing (character): wing region (e.g., `fw`)
- color (character): color patch label (e.g., `black`)
- species (character): species code (e.g., `ccw`)
- sex (character): `female`, `male`
- individual (character): individual identifier (e.g., `LMU_05951`)
- measurement (numeric): replicate measurement index
- reflectance (numeric): reflectance value (unitless; as recorded by the spectrometer workflow)
- source (character): dataset source (e.g., `Colombia`)
- Age_days (numeric): known age in days (when available)
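Because the spectra are stored in long format with replicate measurements, a common first step is to average replicates and, if needed, pivot to one row per individual. The base R sketch below shows this on invented values. It is not taken from reflectance_age_modeling_full.R, and the individual labels and reflectance numbers are assumptions.

```r
# Toy long-format spectra: two replicate measurements per individual
spectra <- data.frame(
  wl          = rep(c(400, 500), times = 4),
  individual  = rep(c("ind_A", "ind_B"), each = 4),
  measurement = rep(rep(1:2, each = 2), times = 2),
  reflectance = c(10, 20, 12, 22, 30, 40, 34, 44)
)

# Mean reflectance per individual and wavelength across replicates
mean_spectra <- aggregate(reflectance ~ individual + wl,
                          data = spectra, FUN = mean)

# Wide layout: one row per individual, one column per wavelength
wide <- reshape(mean_spectra, idvar = "individual", timevar = "wl",
                direction = "wide")
print(wide)
```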
### samples_metadata_vilcabamba.csv
Metadata table linking metabarcoding samples to individuals and associated traits.
Identifiers:
- Org_ID (character): original individual identifier
- LMU_ID (character): LMU individual identifier
- ABD_ID (character): abdomen/sample identifier
- Insectary_ID (character): insectary/cage identifier
Sample and biology:
- Type (character): sampled tissue (e.g., `Proboscis`; other values possible depending on processing)
- Species (character): species label (e.g., `himera`)
- Sex (character): `f`, `m`
- Origin (character): origin category (e.g., `wild`)
Timing and place:
- Recapture_date (character): recapture date string (YYYY_MM_DD)
- Recapture_coordinates (numeric): coordinates field as stored in this file
Derived age fields:
- Predicted_age (numeric): predicted age in days from reflectance model
- Relative_age (numeric): scaled relative age (unitless, 0–1)
- Group (character): combined grouping label used in analyses (e.g., `f_Proboscis`)
- Age_group (factor): categorical age class (`Young`, `Middle`, `Old`)
Other processing/bookkeeping fields:
Several columns are placeholders or processing flags and may contain NA values depending on workflow stage:
- Location, Date_sampled, DNA, Wings, Brain_fixed, Phero_control, Phero_andro, RNA_antennae,
Pollen_proboscis, Abdomen_sequencing, Proboscis_Sequencing, Logger_nr, Observer, Notes, Age_days
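The underscore-delimited date strings in this table need an explicit format when converted to dates in R. The sketch below shows the conversion on invented rows. It also rebuilds a `Group`-style label by pasting `Sex` and `Type`, which is an assumption inferred from the `f_Proboscis` example rather than a documented rule.

```r
# Toy metadata rows (invented dates) using column names from this README
meta <- data.frame(
  Sex            = c("f", "m"),
  Type           = c("Proboscis", "Proboscis"),
  Recapture_date = c("2023_03_14", "2023_04_02"),
  stringsAsFactors = FALSE
)

# Convert YYYY_MM_DD strings to Date objects
meta$Recapture_date_parsed <- as.Date(meta$Recapture_date, format = "%Y_%m_%d")

# Assumed construction of the combined label: sex, underscore, sample type
meta$Group <- paste(meta$Sex, meta$Type, sep = "_")
print(meta)
```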
### Vilcabamba_samples_age.csv
Known-age and predicted-age table for Vilcabamba individuals used for age validation and downstream models.
Variables (selected):
- LMU_ID (character): individual identifier
- Insectary_ID (character): insectary/cage identifier
- Species (character): species code (e.g., `him`)
- Sex (character): `f`, `m`
- Age_days (numeric): known age in days (may be NA)
- Predicted_age (numeric): predicted age in days
- Relative_age (numeric): scaled relative age (unitless, 0–1)
- Brood (character): brood identifier
- Date_eclosed (Date): eclosion date
- Release_date, Recapture_date (character): date strings (YYYY_MM_DD)
- Release_coordinates, Recapture_coordinates (character): coordinate strings
- Location (character): site label
- Date_sampled (Date): sampling date
- DNA, Wings, Brain_fixed, RNA_antennae, Pollen_proboscis (character): Y/N processing flags
- Logger_nr (character): logger identifier
- Eyes_intact (character): Y/N field
- Observer (character): observer identity
- Location_body (character): where body was processed/stored
- Notes (character): free text
### Vilcambamba_recapture_data.csv
Capture–recapture dataset including pollen categories and associated environmental / logistical fields.
Variables:
- ID (factor): butterfly identifier
- Species (factor): species category
- Sex (factor): sex (`f`, `m`, `unknown`, or empty)
- Type (factor): `reared`, `wild`
- Date_time_capture (POSIXct): capture timestamp (may be NA for some rows)
- Pollen (ordered factor): ordinal pollen load category (`no`, `small`, `large`)
- Pollen_binom (ordered factor): binary pollen category (`yes`, `no`)
- Pollen_binary (numeric): numeric coding of pollen presence (0/1)
- Recapture (ordered factor): capture status
- Bodylength (numeric): body length (units as measured)
- Logger (factor): associated logger identifier (if applicable)
- Date_Time_Back (POSIXct): return/back time (if applicable)
- Location (factor): site label
- Latitude, Longitude (numeric): degrees
- geometry (logical): placeholder column (not used for analyses)
- Observer (factor): observer identity
- Notes (character): notes
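The three pollen columns encode the same observation at different resolutions. A minimal sketch of how they could relate is given below, assuming that both `small` and `large` loads count as pollen presence. That mapping is an assumption for illustration and should be checked against the data before reuse.

```r
# Invented pollen load scores, including a missing value
pollen_raw <- c("no", "small", "large", NA, "small")

# Ordered factor with the three load categories listed above
Pollen <- factor(pollen_raw, levels = c("no", "small", "large"),
                 ordered = TRUE)

# Assumed collapse: any load above "no" counts as presence (1), else 0
Pollen_binary <- as.integer(Pollen > "no")

# Matching yes/no label; NA stays NA
Pollen_binom <- factor(ifelse(Pollen_binary == 1L, "yes", "no"),
                       levels = c("no", "yes"), ordered = TRUE)

print(data.frame(Pollen, Pollen_binom, Pollen_binary))
```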
## Reproducibility notes
- Analyses were performed in R (see “R Environment” section below).
- Scripts are designed to run using relative paths when the data files are located in the same directory as the scripts.
- If you adapt the workflow, set your working directory to the repository folder and avoid absolute paths.
## R Environment (for reproducibility)
Analyses were conducted in R on macOS with the following environment:
- R version: 4.5.1 (2025-06-13)
- Platform: aarch64-apple-darwin20
- OS: macOS Tahoe 26.2
- Time zone: Europe/Berlin
Attached packages (version numbers) include:
viridis (0.6.5); viridisLite (0.4.2); sp (2.2-0); sf (1.0-21); scales (1.4.0);
rnaturalearth (1.1.0); RColorBrewer (1.1-3); patchwork (1.3.2); ordinal (2023.12-4.1);
nnet (7.3-20); leaflet (2.2.3); lattice (0.22-7); htmlwidgets (1.6.4);
glmmTMB (1.1.12); ggmap (4.0.2); geosphere (1.5-20); DHARMa (0.4.7);
ggsci (3.2.0); speedyseq (0.5.3.9021); reshape2 (1.4.4); bipartite (2.22);
vegan (2.7-2); permute (0.9-8); sna (2.8); network (1.19.0);
statnet.common (4.12.0); phyloseq (1.41.1); randomForest (4.7-1.2); mgcv (1.9-3);
nlme (3.1-168); future (1.67.0); ggpmisc (0.6.2); ggpp (0.5.9); zoo (1.8-14);
ggspectra (0.3.16); photobiology (0.13.2); SunCalcMeeus (0.1.2); pavo (2.9.0);
janitor (2.2.1); broom (1.0.10); writexl (1.5.4); multcompView (0.1-10);
emmeans (1.11.2-8); pbkrtest (0.5.5); MASS (7.3-65); lme4 (1.1-37);
Matrix (1.7-4); gridExtra (2.3); effects (4.2-4); car (3.1-3); carData (3.0-5);
ggpubr (0.6.1); cowplot (1.2.0); smatr (3.4-8); readxl (1.4.5); lubridate (1.9.4);
forcats (1.0.0); stringr (1.6.0); dplyr (1.1.4); purrr (1.2.0); readr (2.1.5);
tidyr (1.3.1); tibble (3.3.0); ggplot2 (4.0.1); tidyverse (2.0.0)
(Full package namespace list is available via sessionInfo().)
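To compare a local installation against the versions listed above, something like the following base R sketch could be used. The helper names `installed_version` and `check_versions` are hypothetical, and only three of the listed packages are checked here as an example.

```r
# Return the installed version of a package, or NULL if it is not installed
installed_version <- function(pkg) {
  tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
}

# Compare installed versions against a named character vector of expectations
check_versions <- function(expected) {
  rows <- lapply(names(expected), function(pkg) {
    v <- installed_version(pkg)
    data.frame(
      package   = pkg,
      expected  = expected[[pkg]],
      installed = if (is.null(v)) NA_character_ else as.character(v),
      match     = !is.null(v) && v == package_version(expected[[pkg]])
    )
  })
  do.call(rbind, rows)
}

# Versions copied from the list above (subset only)
expected <- c(lme4 = "1.1-37", smatr = "3.4-8", vegan = "2.7-2")
print(check_versions(expected))
```

Versions are compared as version objects rather than strings, because R prints `1.1-37` as `1.1.37`.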
## How to run the analyses
A typical workflow is:
1. Download and unzip the Zenodo record to a local folder.
2. Open R/RStudio and set your working directory to the folder containing the data and scripts.
3. Run scripts in the order relevant to each analysis component (brain, eye morphology, reflectance-age, metabarcoding, recapture).
The scripts use relative paths and assume that the data files sit in the same folder as the scripts. If you adapt the workflow, avoid absolute file paths.
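The relative-path convention can be sketched as follows. The helper `read_repo_csv` is hypothetical and not part of the deposited scripts. The demonstration writes a small invented table to a temporary folder that stands in for the unzipped record, so the example runs without the real data.

```r
# Hypothetical helper: read a CSV located relative to a data directory,
# failing early with a clear message if the file is missing.
read_repo_csv <- function(file_name, data_dir = ".") {
  path <- file.path(data_dir, file_name)
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory: ", getwd(), ")")
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Temporary folder standing in for the unzipped Zenodo record
demo_dir <- tempfile("zenodo_demo_")
dir.create(demo_dir)
write.csv(data.frame(ID = "H543", sex = "female"),
          file.path(demo_dir, "toy_table.csv"), row.names = FALSE)

toy <- read_repo_csv("toy_table.csv", data_dir = demo_dir)
print(toy)
```

With the working directory set to the record folder, a call such as `read_repo_csv("Hhimera_eyemorph.csv")` would follow the same pattern.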
## Contributors and contact
Creators:
- José Borrero, Department of Evolutionary Biology, LMU Munich
ORCID: 0000-0003-0164-496X
Email: jose.borrero@lmu.de
Contributors:
- David F. Rivas-Sánchez
- Daniel Shane Wright
- Stephen H. Montgomery
- Alexander Keller
- Richard M. Merrill
PI contact:
- Richard M. Merrill, Department of Evolutionary Biology, LMU Munich
ORCID: 0000-0003-4527-9298
Email: merrill@bio.lmu.de
## Funding
This research was funded by a European Research Council (ERC) Starting Grant (851040) to R.M.M.
## Dates and locations of data collection
- Field sampling and capture–recapture data were collected during 2023 in southern Ecuador (Vilcabamba region and associated study sites).
- Laboratory measurements and analyses were conducted at LMU Munich.
## License
This Zenodo record is released under the license selected on Zenodo for the deposition.
If you reuse these data, please cite the Zenodo DOI above and the associated publication.
END OF README
Additional details
Related works
- Is supplement to: Preprint, DOI 10.1101/2025.09.18.676771
Software
- Programming language: R