Published February 8, 2026 | Version 1.1.0
Dataset Open

Sexual dimorphism in pollen foraging and sensory traits in Heliconius butterflies

  • 1. Ludwig-Maximilians-Universität München

Description

# Sexual dimorphism in pollen foraging and sensory traits in Heliconius butterflies
Data & Analysis Repository (Zenodo)

This repository contains datasets and analysis scripts used in a study of sexual dimorphism in pollen foraging and sensory traits in Heliconius butterflies, including brain morphology, eye morphology, wing reflectance spectrometry, pollen metabarcoding, and microhabitat capture–recapture analyses.

DOI: 10.5281/zenodo.17777739

## Version history

- v1.1 (2026-02-08): Added raw (untransformed) brain volume data, expanded variable-level documentation in the README, and removed absolute file paths from R scripts.


## Abstract

This repository contains the data and analysis scripts supporting the study “Sexual dimorphism in pollen foraging and sensory traits in Heliconius butterflies.” The study integrates brain neuroanatomy, eye morphology, wing reflectance spectrometry, pollen DNA metabarcoding, and microhabitat recapture data to test sex-specific differences in sensory investment and foraging ecology in Heliconius himera. Statistical analyses were conducted in R using mixed-effects models, allometric scaling, and multivariate community analyses. The datasets provided here allow reproduction of the key analyses and figures reported in the manuscript.


Sexual dimorphism in foraging behaviour is widespread in insects and may arise from differences in nutritional demands,
sensory systems, or cognition. In Heliconius, pollen-feeding is an evolutionary innovation among butterflies that 
supports extended lifespans and sustained reproduction. However, how foraging behaviour varies between the sexes and 
how it relates to sexually dimorphic traits remains poorly understood. We investigated sex-specific foraging strategies 
in wild Heliconius himera, a highland specialist from southern Ecuador, using field surveys and DNA metabarcoding. 
Females carried more pollen than males, consistent with higher nutritional demands. Yet both sexes used similar plant 
richness and composition, indicating that differences reflect foraging effort rather than shifts in plant choice. 
Gut samples revealed greater pollen diversity and a more consistent community profile than proboscis samples, 
suggesting they better capture cumulative foraging history. To place these behavioural differences in a sensory context, 
we quantified sexually dimorphic sensory traits. Males had larger eyes and more ommatidia, whereas females had larger 
mushroom bodies, brain regions that support associative learning and memory. These patterns are consistent with sexual 
dimorphism reported across Heliconius and may contribute to the observed sex differences in pollen foraging. Our findings 
highlight how sex-specific foraging differences may arise from effort on shared floral resources and co-occur 
with divergent sensory and neural investment, offering insights into the ecological basis of intraspecific variation 
in pollen use.


## Contents

This Zenodo record contains:
- Data files: 11 (CSV/TXT) + 1 Excel workbook
- R scripts: 6
- README: this file

File names listed below match those deposited on Zenodo.


## Data and file overview

### 1) Brain morphology and sexual dimorphism

Data:
- Cyr_him_combined_brain_volumes.csv  
  Log10-transformed neuropil volumes and grouping variables for Heliconius (H. e. cyrbia and H. himera).
- Cyr_him_combined_brain_volumes_raw.csv  
  Raw (untransformed) neuropil volumes corresponding to the log-transformed file above.
- himera_lrt_table.csv  
  Likelihood-ratio test (LRT) results from mixed-effects models of neuropil volumes.
- smatr_table_himera.csv  
  Summary of SMATR outputs (slope, elevation, and major axis shift tests).
- TableS1_Hhim-brain-dimorphism.xlsx  
  Curated summary table used in the manuscript for brain dimorphism results.

Scripts:
- Brains_sex_him.r  
  Main pipeline for modelling sexual dimorphism in brain structure (mixed models, LRT tables, figures).
- smatr_eyes_himera.R  
  SMATR major axis analyses of scaling relationships (eye/brain scaling as used in the study).


### 2) Reflectance spectrometry and age prediction

Data:
- reflectance_age.csv  
  Raw reflectance spectra used for age prediction modelling (long format; wavelength-level reflectance).
- Vilcabamba_samples_age.csv  
  Individual-level table containing known ages (when available), predicted ages, and associated metadata.

Scripts:
- reflectance_age_modeling_full.R  
  End-to-end reflectance-based age modelling script (model fitting, validation, diagnostics).


### 3) Pollen metabarcoding and foraging ecology

Data:
- asv_vilcabamba_table.merge.txt  
  Merged ITS2 ASV table from proboscis and gut samples.
- samples_metadata_vilcabamba.csv  
  Metadata for sequenced samples and linked individual-level predicted age groupings.

Scripts:
- metabarcoding_tools_0-1a.R  
  Helper functions for data cleaning, ASV processing, and taxonomic lookup.
- R_ITS2_vilcabamba.R  
  ITS2 metabarcoding pipeline including filtering, taxonomic assignment, and diversity analyses.
- Pollen_loads.R  
  Analyses of pollen load categories, taxonomic richness, Shannon diversity, and related models.


### 4) Microhabitat and capture–recapture data

Data:
- Vilcambamba_recapture_data.csv  
  Capture–recapture dataset including pollen categories and environmental/logistical fields.
- recaptures_logger_1h.csv  
  High-frequency temperature and humidity logger data linked to capture/recapture events.


## Variable descriptions (data dictionaries)

This section describes variables (columns), their data types, and units/formats where applicable.


### Cyr_him_combined_brain_volumes.csv

Neuropil volume measurements and grouping variables for Heliconius (H. e. cyrbia and H. himera) used in brain dimorphism and scaling analyses.

Important note: All neuropil volume variables in this file are log10-transformed and stored with the prefix `log_` (i.e., log10(volume in µm³)).

Variables:
- ID (factor): individual identifier (e.g., `1_wild`, `10_wild`)
- Species (factor): species identity (e.g., `H. e. cyrbia`, `H. himera`)
- Sex (factor): `female`, `male`
- Location (factor): collection / rearing location (e.g., `Balsas`, `Cambridge`, etc.)
- Type (factor): `reared`, `wild`

Log10-transformed neuropil volumes (numeric; log10(µm³)):
- log_ME: medulla
- log_LAM: lamina
- log_LOB: lobula
- log_LOP: lobula plate
- log_aME: accessory medulla
- log_vLOB: ventral lobula
- log_rCB: remainder of central brain
- log_AL: antennal lobe
- log_AOTU: anterior optic tubercle
- log_POTU: posterior optic tubercle
- log_MBCA: mushroom body calyx
- log_MBPED: mushroom body peduncle / lobes (as defined in the study)
- log_OL: optic lobe (aggregate)
- log_CBR: central brain region (as defined in the study)

Raw (untransformed) neuropil volumes are provided in Cyr_him_combined_brain_volumes_raw.csv.


### Cyr_him_combined_brain_volumes_raw.csv

Raw (untransformed) neuropil volumes corresponding to the same individuals and structures as Cyr_him_combined_brain_volumes.csv.

Volumes are expressed in cubic micrometers (µm³). Values were log10-transformed for analysis. 

Variables:
- ID (factor): individual identifier
- Species (factor): species identity
- Sex (factor): `female`, `male`
- Location (factor): collection / rearing location
- Type (factor): `reared`, `wild`

Raw neuropil volumes (numeric; µm³):
- ME: medulla
- LAM: lamina
- LOB: lobula
- LOP: lobula plate
- aME: accessory medulla
- vLOB: ventral lobula
- rCB: remainder of central brain
- AL: antennal lobe
- AOTU: anterior optic tubercle
- POTU: posterior optic tubercle
- MBCA: mushroom body calyx
- MBPED: mushroom body peduncle / lobes (as defined in the study)
- OL: optic lobe (aggregate)
- CBR: central brain region (as defined in the study)
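
The relationship between the raw and log-transformed files can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the authors' script: the data frame below is a toy stand-in with invented volumes, and only the `log_` naming convention and the log10 transformation come from the descriptions above.

```r
# Toy stand-in for two volume columns of the raw file.
# The values are invented for illustration (units: µm³).
raw <- data.frame(
  ID = c("1_wild", "10_wild"),
  ME = c(1e8, 2.5e8),
  AL = c(4e6, 5e6)
)

# Volume columns: everything except the identifier.
vol_cols <- setdiff(names(raw), "ID")

# Apply log10 and add the `log_` prefix used in the transformed file.
logged <- raw["ID"]
logged[paste0("log_", vol_cols)] <- lapply(raw[vol_cols], log10)

print(logged)
```

With the real files, the same transformation applied to the raw volume columns should reproduce the `log_` columns up to rounding.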


### Hhimera_eyemorph.csv

Eye and body size measurements for Heliconius himera.

Identifiers and grouping:
- ID (factor): individual identifier (e.g., `H543`)
- collection (factor): collection code (e.g., `ERC_ECU`)
- sex (factor): `female`, `male`
- type (factor): taxon label (here `H_himera`)
- brood (character): brood / stock identifier (may be NA)
- wild_insectary (factor): `insectary`, `wild`
- location (factor): sampling location (`IKIAM`, `Vilcabamba`), may be NA
- latitude, longitude (numeric): coordinates in decimal degrees, may be NA
- elevation (character): elevation label / notes, may be NA
- observer (character): measurer identity

Body size:
- tibia_length (numeric): tibia length (units as measured in the study, typically mm)
- abdomen_length (numeric): abdomen length (same unit convention)
- total_body_length (numeric): total body length (same unit convention; may be NA)
- inter_eye_width (numeric): inter-eye width (same unit convention; may be NA)

Eye area and facet counts:
- L_area, R_area (numeric): left and right eye area (units as measured; typically mm²)
- L_whole_count, R_whole_count (numeric/integer): facet (ommatidia) counts per eye

Image bookkeeping and notes:
- wing_image (logical): whether a wing image exists/was used
- box_and_location (character): storage / box reference
- notes (character): notes on reanalysis / issues
- clade (factor): genetic clade label (e.g., `EAST_1`), may be NA
- gps_location (character): optional GPS location field, may be NA
- gps_latitude, gps_longitude (numeric): optional GPS coordinates, may be NA
- wing_id (character): wing image identifier (may be NA)


### himera_lrt_table.csv

Likelihood-ratio test (LRT) results from mixed-effects models of neuropil volumes.

Variables:
- Neuropil (character): neuropil name (e.g., `AL`)
- Predictor (character): model term tested (e.g., `Location`, `sex`, `rCB`, `SexLocation`)
- LogLik_reduced (numeric): log-likelihood of reduced model
- LogLik_full (numeric): log-likelihood of full model
- CHISQ (numeric): likelihood-ratio chi-square statistic
- p_value (numeric): p-value for the LRT
- Adjusted_p_value (numeric): multiple-testing adjusted p-value
- Significance (character): significance code used in the manuscript tables
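
As a reading aid, the columns above relate to one another in the usual way for a likelihood-ratio test. The sketch below uses invented log-likelihoods and assumes the two models differ by a single parameter (`df_diff` is a hypothetical name, and the degrees of freedom are not stored in the table); it is not taken from the analysis scripts.

```r
# Hypothetical log-likelihoods for one neuropil/predictor pair.
loglik_reduced <- -12.4
loglik_full <- -9.1
df_diff <- 1  # assumption: the full model has one extra parameter

# Likelihood-ratio statistic: twice the gain in log-likelihood.
chisq <- 2 * (loglik_full - loglik_reduced)

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chisq, df = df_diff, lower.tail = FALSE)

cat("CHISQ =", chisq, " p =", signif(p_value, 3), "\n")
```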


### smatr_table_himera.csv

Summary of SMATR outputs for sex differences in scaling relationships.

Variables:
- Neuropil (character): neuropil name

Slope tests:
- Slope_LR_statistic (numeric): likelihood-ratio statistic for slope difference
- Slope_P_value (numeric): slope test p-value
- Slope_FDR (numeric): FDR-adjusted slope p-value
- Slope_r (numeric): correlation coefficient used by SMATR
- Slope_DI (logical): direction indicator (if computed; NA if not used)

Elevation tests:
- Elevation_Wald_statistic (numeric): Wald statistic for elevation shift
- Elevation_P_value (numeric): elevation test p-value
- Elevation_FDR (numeric): FDR-adjusted elevation p-value
- Elevation_r (numeric): correlation coefficient
- Elevation_DI (character): direction indicator (e.g., `male`, `female`)

Major axis shift tests:
- Major_Axis_Wald_statistic (numeric)
- Major_Axis_P_value (numeric)
- Major_Axis_FDR (numeric)
- Major_Axis_r (numeric)
- Major_Axis_DI (character): direction indicator
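
The `*_FDR` columns hold p-values adjusted for multiple testing across neuropils. A minimal sketch of such an adjustment is shown below, assuming the Benjamini-Hochberg procedure (the README says only "FDR-adjusted", so the exact method is an assumption) and using invented p-values.

```r
# Invented raw p-values for four neuropils.
p_raw <- c(ME = 0.001, LAM = 0.02, AL = 0.03, MBCA = 0.5)

# Benjamini-Hochberg adjustment; method = "fdr" is an alias for "BH".
p_fdr <- p.adjust(p_raw, method = "BH")

print(data.frame(Neuropil = names(p_raw), P_value = p_raw, FDR = p_fdr,
                 row.names = NULL))
```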


### recaptures_logger_1h.csv

High-frequency logger data linked to capture / recapture events (time series of temperature and relative humidity).

Indexing:
- X (integer): row index
- ID (factor): butterfly identifier
- File_name (character): logger file identifier

Grouping:
- Species (ordered factor): species category (ordered levels in file)
- Sex (factor): `f`, `m`
- Type (factor): `reared`, `wild`
- Location (factor): site (e.g., `Balsas`, `Hybridzone`, etc.)
- Observer (factor): observer identity

Capture context:
- Date_time_capture (POSIXct): capture timestamp
- Recapture (ordered factor): capture status (e.g., `no`, `release`, etc.)
- Pollen (ordered factor): ordinal pollen load category (`no`, `small`, `large`), may be NA
- Pollen_binom (ordered factor): binary pollen category, may be NA
- Pollen_binary (numeric): numeric coding of pollen presence, may be NA
- Bodylength (numeric): body length (units as measured), may be NA
- Notes (character): notes

Logger readings:
- Logger (factor): logger identifier (e.g., `RH1`)
- Date_time (POSIXct): timestamp of logger observation
- Temperature (numeric): temperature (°C)
- RH (numeric): relative humidity (%)
- Date_Time_Back (POSIXct): timestamp of return/back-reading (if used)
- Date_read (POSIXct/Date): date read from logger

Coordinates:
- Latitude, Longitude (numeric): degrees
- geometry (logical): placeholder column (not used for analyses)


### reflectance_age.csv

Raw reflectance spectra used for age prediction model training and validation (long format).

Variables:
- wl (numeric): wavelength (nm)
- wing (character): wing region (e.g., `fw`)
- color (character): color patch label (e.g., `black`)
- species (character): species code (e.g., `ccw`)
- sex (character): `female`, `male`
- individual (character): individual identifier (e.g., `LMU_05951`)
- measurement (numeric): replicate measurement index
- reflectance (numeric): reflectance value (unitless; as recorded by the spectrometer workflow)
- source (character): dataset source (e.g., `Colombia`)
- Age_days (numeric): known age in days (when available)
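
Because the spectra are stored in long format with replicate measurements, a common first step is to average replicates per individual and wavelength. The sketch below shows one way to do this in base R on a toy data frame with invented values; it is not necessarily how the modelling script handles replicates.

```r
# Toy long-format spectra: 2 individuals x 2 wavelengths x 2 replicates.
spec <- data.frame(
  individual = rep(c("ind_A", "ind_B"), each = 4),
  wl = rep(c(400, 401), times = 4),
  measurement = rep(rep(1:2, each = 2), times = 2),
  reflectance = c(10, 12, 14, 16, 20, 22, 24, 26)
)

# Mean reflectance per individual and wavelength, across replicates.
mean_spec <- aggregate(reflectance ~ individual + wl, data = spec, FUN = mean)

print(mean_spec)
```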


### samples_metadata_vilcabamba.csv

Metadata table linking metabarcoding samples to individuals and associated traits.

Identifiers:
- Org_ID (character): original individual identifier
- LMU_ID (character): LMU individual identifier
- ABD_ID (character): abdomen/sample identifier
- Insectary_ID (character): insectary/cage identifier

Sample and biology:
- Type (character): sampled tissue (e.g., `Proboscis`; other values possible depending on processing)
- Species (character): species label (e.g., `himera`)
- Sex (character): `f`, `m`
- Origin (character): origin category (e.g., `wild`)

Timing and place:
- Recapture_date (character): recapture date string (YYYY_MM_DD)
- Recapture_coordinates (numeric): coordinates field as stored in this file

Derived age fields:
- Predicted_age (numeric): predicted age in days from reflectance model
- Relative_age (numeric): scaled relative age (unitless, 0–1)
- Group (character): combined grouping label used in analyses (e.g., `f_Proboscis`)
- Age_group (factor): categorical age class (`Young`, `Middle`, `Old`)

Other processing/bookkeeping fields:
Several columns are placeholders or processing flags and may contain NA values depending on workflow stage:
- Location, Date_sampled, DNA, Wings, Brain_fixed, Phero_control, Phero_andro, RNA_antennae,
  Pollen_proboscis, Abdomen_sequencing, Proboscis_Sequencing, Logger_nr, Observer, Notes, Age_days


### Vilcabamba_samples_age.csv

Known-age and predicted-age table for Vilcabamba individuals used for age validation and downstream models.

Variables (selected):
- LMU_ID (character): individual identifier
- Insectary_ID (character): insectary/cage identifier
- Species (character): species code (e.g., `him`)
- Sex (character): `f`, `m`
- Age_days (numeric): known age in days (may be NA)
- Predicted_age (numeric): predicted age in days
- Relative_age (numeric): scaled relative age (unitless, 0–1)
- Brood (character): brood identifier
- Date_eclosed (Date): eclosion date
- Release_date, Recapture_date (character): date strings (YYYY_MM_DD)
- Release_coordinates, Recapture_coordinates (character): coordinate strings
- Location (character): site label
- Date_sampled (Date): sampling date
- DNA, Wings, Brain_fixed, RNA_antennae, Pollen_proboscis (character): Y/N processing flags
- Logger_nr (character): logger identifier
- Eyes_intact (character): Y/N field
- Observer (character): observer identity
- Location_body (character): where body was processed/stored
- Notes (character): free text
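
Date fields stored as `YYYY_MM_DD` strings need an explicit format to be parsed as dates. A minimal sketch with invented example values:

```r
# Invented date strings in the YYYY_MM_DD layout described above.
date_strings <- c("2023_05_14", "2023_06_02", NA)

# Parse with an explicit format; missing values stay NA.
dates <- as.Date(date_strings, format = "%Y_%m_%d")

# Days elapsed between the first two dates.
days_between <- as.numeric(diff(dates[1:2]))

print(dates)
print(days_between)
```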


### Vilcambamba_recapture_data.csv

Capture–recapture dataset including pollen categories and associated environmental / logistical fields.

Variables:
- ID (factor): butterfly identifier
- Species (factor): species category
- Sex (factor): sex (`f`, `m`, `unknown`, or empty)
- Type (factor): `reared`, `wild`
- Date_time_capture (POSIXct): capture timestamp (may be NA for some rows)
- Pollen (ordered factor): ordinal pollen load category (`no`, `small`, `large`)
- Pollen_binom (ordered factor): binary pollen category (`yes`, `no`)
- Pollen_binary (numeric): numeric coding of pollen presence (0/1)
- Recapture (ordered factor): capture status
- Bodylength (numeric): body length (units as measured)
- Logger (factor): associated logger identifier (if applicable)
- Date_Time_Back (POSIXct): return/back time (if applicable)
- Location (factor): site label
- Latitude, Longitude (numeric): degrees
- geometry (logical): placeholder column (not used for analyses)
- Observer (factor): observer identity
- Notes (character): notes
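
When the CSV is read back into R, the ordering of the pollen categories has to be restored by hand, since plain text does not carry factor levels. The sketch below uses invented values and assumes that both `small` and `large` count as pollen present when deriving a 0/1 code; that mapping is an assumption, not a documented rule.

```r
# Invented pollen load categories as they would be read from a CSV.
pollen_raw <- c("small", "no", "large", NA, "no")

# Restore the ordinal scale: no < small < large.
pollen <- factor(pollen_raw, levels = c("no", "small", "large"), ordered = TRUE)

# Assumed binary coding: any load above "no" counts as presence (1).
pollen_binary <- as.integer(pollen > "no")

print(table(pollen, useNA = "ifany"))
print(pollen_binary)
```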


## Reproducibility notes

- Analyses were performed in R (see “R Environment” section below).
- Scripts are designed to run using relative paths when the data files are located in the same directory as the scripts.
- If you adapt the workflow, set your working directory to the repository folder and avoid absolute paths.


## R Environment (for reproducibility)

Analyses were conducted in R on macOS with the following environment:

- R version: 4.5.1 (2025-06-13)
- Platform: aarch64-apple-darwin20
- OS: macOS Tahoe 26.2
- Time zone: Europe/Berlin

Attached packages (version numbers) include:
viridis (0.6.5); viridisLite (0.4.2); sp (2.2-0); sf (1.0-21); scales (1.4.0);
rnaturalearth (1.1.0); RColorBrewer (1.1-3); patchwork (1.3.2); ordinal (2023.12-4.1);
nnet (7.3-20); leaflet (2.2.3); lattice (0.22-7); htmlwidgets (1.6.4);
glmmTMB (1.1.12); ggmap (4.0.2); geosphere (1.5-20); DHARMa (0.4.7);
ggsci (3.2.0); speedyseq (0.5.3.9021); reshape2 (1.4.4); bipartite (2.22);
vegan (2.7-2); permute (0.9-8); sna (2.8); network (1.19.0);
statnet.common (4.12.0); phyloseq (1.41.1); randomForest (4.7-1.2); mgcv (1.9-3);
nlme (3.1-168); future (1.67.0); ggpmisc (0.6.2); ggpp (0.5.9); zoo (1.8-14);
ggspectra (0.3.16); photobiology (0.13.2); SunCalcMeeus (0.1.2); pavo (2.9.0);
janitor (2.2.1); broom (1.0.10); writexl (1.5.4); multcompView (0.1-10);
emmeans (1.11.2-8); pbkrtest (0.5.5); MASS (7.3-65); lme4 (1.1-37);
Matrix (1.7-4); gridExtra (2.3); effects (4.2-4); car (3.1-3); carData (3.0-5);
ggpubr (0.6.1); cowplot (1.2.0); smatr (3.4-8); readxl (1.4.5); lubridate (1.9.4);
forcats (1.0.0); stringr (1.6.0); dplyr (1.1.4); purrr (1.2.0); readr (2.1.5);
tidyr (1.3.1); tibble (3.3.0); ggplot2 (4.0.1); tidyverse (2.0.0)

(Full package namespace list is available via sessionInfo().)
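
To see how a local installation compares with the versions recorded above, something like the following sketch can help. The three packages are picked from the list purely as examples, and the helper logic is illustrative rather than part of the deposited scripts.

```r
# A few recorded versions, copied from the list above.
recorded <- c(lme4 = "1.1-37", smatr = "3.4-8", vegan = "2.7-2")

# Compare each recorded version with what is installed locally.
status <- vapply(names(recorded), function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return("not installed")
  }
  installed <- utils::packageVersion(pkg)
  if (installed == package_version(recorded[[pkg]])) {
    "match"
  } else {
    paste("installed:", format(installed))
  }
}, character(1))

print(status)
```

Versions are compared as version objects rather than strings because R normalises the `-` separator when it reports a package version.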


## How to run the analyses

A typical workflow is:
1. Download and unzip the Zenodo record to a local folder.
2. Open R/RStudio and set your working directory to the folder containing the data and scripts.
3. Run scripts in the order relevant to each analysis component (brain, eye morphology, reflectance-age, metabarcoding, recapture).

Scripts are intended to use local, relative paths (i.e., they assume files are in the same folder). If you adapt the workflow, avoid absolute file paths.
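
Before running a script, it can be useful to confirm that the files it is likely to need sit in the working directory. The sketch below is an illustration only: `missing_files()` is a hypothetical helper, and the grouping of files follows the brain morphology section of this README rather than an inspection of the script itself.

```r
# Hypothetical helper: return the files that are not found in `dir`.
missing_files <- function(files, dir = ".") {
  files[!file.exists(file.path(dir, files))]
}

# Files listed under "Brain morphology and sexual dimorphism".
brain_files <- c(
  "Cyr_him_combined_brain_volumes.csv",
  "himera_lrt_table.csv",
  "Brains_sex_him.r"
)

absent <- missing_files(brain_files)
if (length(absent) > 0) {
  message("Missing from working directory: ", paste(absent, collapse = ", "))
} else {
  message("All files found; the script can be run with source(\"Brains_sex_him.r\").")
}
```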

 

## Contributors and contact

Creators:
- José Borrero, Department of Evolutionary Biology, LMU Munich
  ORCID: 0000-0003-0164-496X
  Email: jose.borrero@lmu.de

Contributors:
- David F. Rivas-Sánchez
- Daniel Shane Wright
- Stephen H. Montgomery
- Alexander Keller
- Richard M. Merrill

PI contact:
- Richard M. Merrill, Department of Evolutionary Biology, LMU Munich
  ORCID: 0000-0003-4527-9298
  Email: merrill@bio.lmu.de


## Funding

This research was funded by a European Research Council (ERC) Starting Grant (851040) to R.M.M.


## Dates and locations of data collection

- Field sampling and recapture datasets were collected during 2023 in southern Ecuador (Vilcabamba region) and associated sites used in the study.
- Laboratory measurements and analyses were conducted at LMU Munich.


## License

This Zenodo record is released under the license selected on Zenodo for the deposition.
If you reuse these data, please cite the Zenodo DOI above and the associated publication.

END OF README

Files

Files (65.3 MB)

MD5 checksum and size, in listing order:
md5:5f42075c4974bcac06c1666185d11137  623.4 kB
md5:af346202394da7238411d2c7b3a6a9f3  272.7 kB
md5:170b033fad04ecaa74f11bc5e93af2bc  21.9 kB
md5:c5e2746956feb5171eef26b151dde33b  23.6 kB
md5:d4bcea2f5680b4662eaa81c997450375  11.6 kB
md5:78cc12ab6c57927dd8a5c66a94085d84  2.8 kB
md5:983882cd708221748d41d9b2a49d8bef  12.8 kB
md5:a6ccc6cf0384f48660fc8789f7434420  159.5 kB
md5:143f3041ef95cb93eaff7782e57c5235  14.2 kB
md5:ef3d1889d808cbfbc933daed6465f9b9  19.8 kB
md5:876b2ccc3069db44085afb60691593b9  2.4 MB
md5:50f1e1c143b5dd4a7340ad5c73d62ca0  61.5 MB
md5:a3123bcc5f1a35ddd9f332f74015d31b  21.8 kB
md5:e40eedd99a7b21566b265c080903fb28  11.7 kB
md5:402750431c76442bd0025a2d854b4e41  18.5 kB
md5:9c32057225a81b28e235065fd6474e8f  3.0 kB
md5:db818596f53bf56c0dde08716877cff1  15.2 kB
md5:d764b87a41f7f6f95e53ef8267813f79  63.0 kB
md5:e82d9fa564e8fef3d5b31a382d7fe5d9  123.8 kB
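
A downloaded file can be checked against its MD5 checksum from within R. The sketch below is a minimal illustration: `verify_md5()` is a hypothetical helper, and the demonstration uses an empty temporary file (whose MD5 is the well-known value shown) rather than any file from this record.

```r
# Hypothetical helper: TRUE if the file's MD5 matches the expected value.
# Accepts checksums with or without the "md5:" prefix.
verify_md5 <- function(path, expected) {
  expected <- tolower(sub("^md5:", "", expected))
  unname(tools::md5sum(path)) == expected
}

# Demonstration on an empty temporary file.
empty_file <- tempfile()
invisible(file.create(empty_file))
print(verify_md5(empty_file, "md5:d41d8cd98f00b204e9800998ecf8427e"))
```

For a file from this record, pass its local path together with the matching checksum.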

Additional details

Related works

Is supplement to
Preprint: 10.1101/2025.09.18.676771 (DOI)

Funding

European Commission
SpeciationBehaviour - The genetic and neural basis of reproductive isolation 851040

Software

Programming language
R