Supplementary Data for Coale et al. 2025

Coale, Tyler

doi:10.5281/zenodo.18718641

Published December 8, 2025 | Version v3

Dataset | Open access


Coale, Tyler (Researcher)¹

1. University of California, Santa Cruz

Supplementary Data 1-16 for Coale et al. 2025.

01: P. calceolata physiology data from the Fe/light co-limitation experiment, including cell concentrations and cellular contents of chlorophyll a, C, N, Fe, Cu, and protein.

02: P. calceolata transcriptomic data: read counts and counts per million (CPM) for P. calceolata transcripts from the Fe/light co-limitation experiment.

03: Annotations and abbreviations for P. calceolata gene models.

04: Fasta file of P. calceolata gene model coding sequences.

05: Fasta file of P. calceolata gene model amino acid sequences.

06: edgeR comparisons made using transcriptomic data.

07: Gene membership and GO enrichment in WGCNA modules.

08: Proteomics intensities (normalized and imputed).

09: Proteomics DE analysis via limma.

10: Results of Fe x Light interaction test - physiological parameters.

11: Results of Fe x Light interaction test - transcriptomics.

12: Results of Fe x Light interaction test - proteomics.

13: P. calceolata dyneins - class, Fe sensitivity, and protein sequence.

14: Environmental data from the NCOG project.

15: Transcriptomic response types determined with edgeR.

16: Results of imputation analysis of proteomics data.

Source data for Figures: SourceData.xlsx

Code

01: R script for edgeR analysis of transcriptomes.

02: R script for limma analysis of proteomes.

README for code:

README – RNA-seq and Proteomics Differential Expression Pipelines
================================================================

- Code_01_Transcriptomics_edgeR.R reads `SupplementaryData_02_transcriptomics.xlsx`.
- Code_02_Proteomics_limma.R reads `SupplementaryData_08_proteomics.csv`.

----------------------------------------------------------------
1. Contents
----------------------------------------------------------------

This Zenodo deposit (10.5281/zenodo.17859961) includes:

- Scripts
  - Code_01_Transcriptomics_edgeR.R: RNA-seq differential expression and Fe × Light interaction (edgeR)
  - Code_02_Proteomics_limma.R: proteomics differential abundance analyses (limma)

----------------------------------------------------------------
2. Software Requirements
----------------------------------------------------------------

- R 4.3.x
- CRAN packages
  - readxl ≥ 1.4
  - readr ≥ 2.1
  - dplyr ≥ 1.1
  - tidyr ≥ 1.3
  - stringr ≥ 1.5
  - matrixStats ≥ 1.3
  - ggplot2 ≥ 3.5
  - patchwork ≥ 1.2
  - BiocManager ≥ 1.30
- Bioconductor packages (installed via BiocManager)
  - edgeR ≥ 3.42
  - limma ≥ 3.56
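The following is a minimal sketch of a pre-flight check that the packages listed above are installed at the stated minimum versions. The helper name `version_ok` and the interactive-only install step are illustrative assumptions. They are not part of the deposited scripts.

```r
# Minimum versions as listed in this README (section 2)
min_versions <- c(
  readxl = "1.4", readr = "2.1", dplyr = "1.1", tidyr = "1.3",
  stringr = "1.5", matrixStats = "1.3", ggplot2 = "3.5",
  patchwork = "1.2", BiocManager = "1.30",
  edgeR = "3.42", limma = "3.56"
)
bioc_pkgs <- c("edgeR", "limma")  # Bioconductor, not CRAN

# Hypothetical helper: TRUE if `pkg` is installed and at least version `min`
version_ok <- function(pkg, min) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min
}

ok <- mapply(version_ok, names(min_versions), min_versions)
print(ok)

# Illustrative install step; only runs in an interactive session
if (interactive() && !all(ok)) {
  need <- names(ok)[!ok]
  cran_need <- setdiff(need, bioc_pkgs)
  if (length(cran_need) > 0) install.packages(cran_need)
  bioc_need <- intersect(bioc_pkgs, need)
  if (length(bioc_need) > 0) BiocManager::install(bioc_need)
}
```

Guarding the install behind `interactive()` keeps the check safe to run from `Rscript`.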

----------------------------------------------------------------
3. RNA-seq Differential Expression (edgeR)
----------------------------------------------------------------

3.1 Input
--------

- File: `SupplementaryData_02_transcriptomics.xlsx`
- Sheet: `mapped_read_counts`
- Column 1: gene IDs
- Remaining columns: raw integer read counts for each RNA-seq sample.
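The sketch below shows how a sheet with this layout can be turned into a numeric count matrix, and how plain counts per million are derived from it. The data frame stands in for the real sheet, so its sample names and values are made up. It also does not reproduce edgeR's TMM step.

```r
# Hypothetical stand-in for the `mapped_read_counts` sheet: gene IDs in
# column 1, counts in the rest (one column stored as text, as can happen
# when reading from Excel). Sample names are made up.
counts_df <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  sampleA = c("10", "0", "90"),
  sampleB = c(5, 15, 80)
)

# Convert every count column to numeric; keep gene IDs as rownames
counts <- sapply(counts_df[-1], as.numeric)
rownames(counts) <- counts_df$gene_id

# Plain CPM: counts / library size * 1e6. edgeR's cpm() on a DGEList
# instead divides by the effective library size (library size x TMM
# normalization factor); the factors are left at 1 here.
lib_size <- colSums(counts)
norm_factors <- rep(1, ncol(counts))  # placeholder for TMM factors
cpm_mat <- sweep(counts, 2, lib_size * norm_factors, "/") * 1e6
print(cpm_mat)
```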

The script:

1. Reads the Excel sheet using `readxl::read_excel`.
2. Converts all count columns to numeric.
3. Uses a manually defined `group` vector that encodes each sample as a treatment combination (e.g., `HLpDay`, `LLmNight`, `HLd00`, `HLr03`). Each label combines light level (HL/LL), Fe status (p = +Fe, m = −Fe, d = DFOB, r = resupply), and time of day (Day/Night or a diel time code such as 00 or 03).
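The group labels can be split back into their factors with a regular expression. The sketch below assumes every label follows the `<Light><Fe><Time>` pattern of the examples above. The helper `parse_group` is hypothetical. It applies the Day/Night rule described in section 3.3, under which coded times count as Day unless the label says Night.

```r
# Hypothetical helper: split labels like "HLpDay" or "HLd00" into fields
parse_group <- function(group) {
  m <- regmatches(group, regexec("^(HL|LL)([pmdr])(.+)$", group))
  ok <- lengths(m) == 4
  if (!all(ok)) {
    stop("Unrecognized group label(s): ", paste(group[!ok], collapse = ", "))
  }
  parts <- do.call(rbind, m)
  time <- parts[, 4]
  data.frame(
    group = group,
    Light = parts[, 2],
    Fe    = parts[, 3],
    Time  = time,
    # Coded times (00, 03, 23, ...) count as Day unless labeled Night
    TimeOfDay = ifelse(time == "Night", "Night", "Day")
  )
}

meta <- parse_group(c("HLpDay", "LLmNight", "HLd00", "HLr03"))
print(meta)
```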

3.2 edgeR GLM contrasts
-----------------------

The first part of the script:

1. Constructs a `DGEList` and performs TMM normalization.
2. Fits a negative binomial GLM with design `~ 0 + group`.
3. Defines a panel of pairwise contrasts using `makeContrasts`, including:
   - HL vs LL within a given Fe status and time (e.g., `HLpDayvsLLpDay`, `HLmNightvsLLmNight`)
   - +Fe vs −Fe within HL or LL (e.g., `HLpDayvsHLmDay`, `LLpNightvsLLmNight`)
   - Day vs Night within each Fe × Light combination (e.g., `HLmDayvsHLmNight`, `LLpDayvsLLpNight`)
4. For each contrast, runs `glmLRT` and extracts the full ranked gene table. The table holds the log2 fold change (logFC), logCPM, likelihood ratio (LR), PValue, and FDR.
5. Combines all contrasts into a single data frame and writes it to disk.

Output:

- `all_contrasts_with_FDR.csv`
  - A wide table with one row per gene ID. Each contrast contributes five columns (`logFC_*`, `logCPM_*`, `LR_*`, `PValue_*`, `FDR_*`).
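The wide layout can be assembled by suffixing each per-contrast table's columns with the contrast name and joining on gene ID. This is a minimal sketch, not the deposited code. It assumes each per-contrast table carries gene IDs as rownames, as an edgeR `topTags(...)$table` typically does. The helper `widen_contrasts` and the toy values are hypothetical.

```r
# Hypothetical helper: list of per-contrast tables -> one wide table
widen_contrasts <- function(tables) {
  stats <- c("logFC", "logCPM", "LR", "PValue", "FDR")
  pieces <- lapply(names(tables), function(nm) {
    tab <- tables[[nm]][, stats, drop = FALSE]
    colnames(tab) <- paste0(stats, "_", nm)  # e.g. FDR_HLpDayvsLLpDay
    data.frame(gene_id = rownames(tab), tab,
               row.names = NULL, check.names = FALSE)
  })
  # Full outer join on gene_id so each gene appears exactly once
  Reduce(function(a, b) merge(a, b, by = "gene_id", all = TRUE), pieces)
}

# Toy per-contrast results (made-up numbers)
tab1 <- data.frame(logFC = c(1.2, -0.4), logCPM = c(5, 3), LR = c(10, 1),
                   PValue = c(0.001, 0.3), FDR = c(0.002, 0.3),
                   row.names = c("g1", "g2"))
tab2 <- tab1
tab2$logFC <- -tab2$logFC

wide <- widen_contrasts(list(HLpDayvsLLpDay = tab1, HLpDayvsHLmDay = tab2))
print(wide)  # 2 genes x (1 ID column + 2 contrasts x 5 statistics)
```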

3.3 Fe × Light interaction (RNA-seq)
------------------------------------

The second part of the script focuses on the Fe × Light interaction, restricting to +Fe (p) and −Fe (m) samples:

1. Builds a `meta` data frame from the column names and the `group` vector, with:
   - `Light` (LL / HL)
   - `Fe` (p / m / d / r)
   - `TimeOfDay` (Day / Night). Coded times such as 00, 23, and 03 are treated as Day unless explicitly labeled Night.
2. Keeps only +Fe and −Fe samples (Fe ∈ {p, m}), dropping the resupply (r) and DFOB (d) conditions.
3. Runs an edgeR quasi-likelihood pipeline with design:

design_int <- model.matrix(~ Fe * Light + TimeOfDay, data = meta_int)

which includes main effects for Fe and Light and their interaction term, plus TimeOfDay as a covariate.
4. Identifies the Fe:Light interaction coefficient and runs `glmQLFTest` on that term.
5. Outputs a full ranked table of genes with interaction statistics.

Outputs (in directory `interaction_rna/`):

- `rna_FeXLight_interaction_edgeR.csv`
  - Columns: `gene_id`, `logFC`, `logCPM`, `F`, `PValue`, `FDR`
  - `logFC` is the estimated Fe × Light interaction coefficient (log2 scale), i.e., how much the Fe effect on expression differs between HL and LL.
- `rna_FeXLight_interaction_summary.csv`
  - Summary counts of genes tested and of genes passing FDR < 0.05 and FDR < 0.10.
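To see which column of the design above carries the interaction, and what that coefficient measures, the sketch below builds the same formula on a small hypothetical +Fe/−Fe × LL/HL × Day/Night layout. The reference levels chosen here (+Fe, LL, Day) are assumptions, and the deposited script may set them differently. The edgeR calls appear only as a comment. The `lm` fit on a noise-free toy response shows that the coefficient equals the difference between the Fe effects under HL and LL.

```r
# Hypothetical balanced layout: one sample per Fe x Light x TimeOfDay cell
meta_int <- data.frame(
  Fe        = factor(rep(c("p", "m"), each = 4), levels = c("p", "m")),
  Light     = factor(rep(c("LL", "HL"), times = 4), levels = c("LL", "HL")),
  TimeOfDay = factor(rep(c("Day", "Day", "Night", "Night"), times = 2),
                     levels = c("Day", "Night"))
)

design_int <- model.matrix(~ Fe * Light + TimeOfDay, data = meta_int)
print(colnames(design_int))
# "(Intercept)" "Fem" "LightHL" "TimeOfDayNight" "Fem:LightHL"

# Locate the interaction column by its ":" separator
int_coef <- grep(":", colnames(design_int), fixed = TRUE)
# In edgeR this index would then be tested, e.g.:
#   fit <- glmQLFit(y, design_int); res <- glmQLFTest(fit, coef = int_coef)

# Toy check: noise-free response whose -Fe effect is 2 under LL
# and 2 + 4 = 6 under HL, so the interaction should be 4
meta_int$y <- with(meta_int,
  1 + 2 * (Fe == "m") + 3 * (Light == "HL") +
    0.5 * (TimeOfDay == "Night") + 4 * (Fe == "m" & Light == "HL"))
fit <- lm(y ~ Fe * Light + TimeOfDay, data = meta_int)
print(coef(fit)[int_coef])  # approximately 4
```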

----------------------------------------------------------------
4. Proteomics Differential Abundance (limma)
----------------------------------------------------------------

4.1 Input
--------

- File: `SupplementaryData_08_proteomics.csv`
- Contains normalized, imputed protein intensities.
- The script:
  - Skips lines 2–3 using `read_csv_drop_lines()` (to remove extra annotation rows).
  - Expects one column named exactly `id` for protein IDs.
  - Treats all remaining columns that match the pattern `^(HL|LL)([pmr])(00|11|23)([A-Z])$` as sample columns, where:
    - `HL` / `LL` = light treatment
    - `p` / `m` / `r` = +Fe, −Fe, or resupply
    - `00`, `11`, `23` = sampling time codes
    - `A`, `B`, `C` = biological replicates
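The column-name pattern above can be applied directly with base R regular expressions to select sample columns and split them into Light, Fe, time code, and replicate. This is a minimal sketch. The header vector is hypothetical, including the extra `Description` column, which shows a non-sample column being ignored. The combined `cond` label follows the `Light.Fe.Tcode` form used by the script.

```r
# Hypothetical CSV header: "id", sample columns, one non-sample column
cols <- c("id", "HLp00A", "HLp00B", "LLm11C", "HLr23A", "Description")

pat <- "^(HL|LL)([pmr])(00|11|23)([A-Z])$"
sample_cols <- grep(pat, cols, value = TRUE)

# Capture groups: 2 = light, 3 = Fe status, 4 = time code, 5 = replicate
parts <- do.call(rbind, regmatches(sample_cols, regexec(pat, sample_cols)))

md <- data.frame(
  sample = sample_cols,
  Light  = factor(parts[, 2], levels = c("LL", "HL")),
  Fe     = factor(parts[, 3], levels = c("p", "m", "r")),
  Tcode  = factor(parts[, 4], levels = c("11", "00", "23")),
  Rep    = parts[, 5]
)
# Combined condition label such as "HL.p.00"; unobserved levels dropped
md$cond <- droplevels(interaction(md$Light, md$Fe, md$Tcode, sep = "."))
print(md)
```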

4.2 Preprocessing
-----------------

1. Builds a sample metadata table (`md`) with:
   - `Light` ∈ {LL, HL}
   - `Fe` ∈ {p, m, r}
   - `Tcode` ∈ {11, 00, 23}
   - `cond` = the combined Light × Fe × Tcode condition, labeled `Light.Fe.Tcode` (e.g., `HL.m.23`).
2. Extracts the expression matrix `X`, converts it to numeric, and sets its rownames to the protein IDs.
3. Applies a log2 transform with a pseudocount of 1 (log2(x + 1)), then quantile normalization with `normalizeBetweenArrays`.
4. Removes proteins with zero variance across samples, identified with `rowSds`.
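The three preprocessing steps (log2 with a pseudocount, quantile normalization, zero-variance filtering) can be sketched in base R. Here `quantile_normalize` is a hypothetical, simplified stand-in for limma's `normalizeBetweenArrays`. It breaks ties by order, so results can differ slightly from limma. `apply(..., sd)` plays the role of `matrixStats::rowSds`. The toy matrix is made up, and protein P3 is zero in every sample so that it becomes constant after normalization.

```r
# Simplified quantile normalization (hypothetical helper): each value is
# replaced by the mean, across samples, of the values at the same rank
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  ref <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy intensities: 4 proteins x 3 samples
X <- matrix(c(100, 50, 0, 400,
               50, 80, 0, 300,
              150, 60, 0,  20),
            nrow = 4,
            dimnames = list(paste0("P", 1:4), c("HLp00A", "HLp00B", "HLp00C")))

X_log  <- log2(X + 1)                # log2 with pseudocount 1
X_norm <- quantile_normalize(X_log)  # identical distribution per sample

# Drop proteins with zero variance across samples
row_sd <- apply(X_norm, 1, sd)
X_filt <- X_norm[row_sd > 0, , drop = FALSE]
print(rownames(X_filt))
```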

4.3 Design and contrasts
------------------------

1. Builds a design matrix with one coefficient per observed `cond`:

design <- model.matrix(~ 0 + cond, data = md)

2. Fits a linear model with `lmFit` and `eBayes` (trend = TRUE, robust = TRUE).
3. Programmatically constructs a series of numeric contrast vectors (stored in list `C`), including:
   - Fe effects within light at “day” sampling:
     - (−Fe “day”) − (+Fe “day”) for each light level (LL, HL), pooling the Day00/Day23 timepoints where appropriate.
   - Resupply vs +Fe within light at day:
     - Resupply “day” vs +Fe “day” for LL and HL.
   - Day vs Night within each Fe × Light combination:
     - (Fe “day”) − (Fe “night”) for p, m, and r in LL and HL.
   - HL vs LL within a given Fe status at day:
     - (HL, Fe, day) − (LL, Fe, day), with each side using its own appropriate day timepoints.

Each contrast is built only if the required timepoints/conditions exist in the design. Missing combinations are skipped rather than causing an error.
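The sketch below shows one way to build such contrast vectors over the `~ 0 + cond` design: a pooled side gets equal weights summing to ±1, and a contrast is skipped when a side has no observed condition. The helper `make_contrast`, the toy conditions, and the choice of 00/23 as the pooled day timepoints are illustrative assumptions. It is not the deposited implementation.

```r
# Toy design: one column per observed condition (cond prefix stripped)
md <- data.frame(cond = factor(c("HL.p.00", "HL.p.23", "HL.m.00",
                                 "HL.m.23", "LL.p.00", "LL.p.11")))
design <- model.matrix(~ 0 + cond, data = md)
colnames(design) <- sub("^cond", "", colnames(design))

# Hypothetical helper: +1/k over `pos`, -1/k over `neg`, present ones only
make_contrast <- function(design_cols, pos, neg) {
  pos <- intersect(pos, design_cols)
  neg <- intersect(neg, design_cols)
  if (length(pos) == 0 || length(neg) == 0) return(NULL)  # skip missing
  w <- setNames(numeric(length(design_cols)), design_cols)
  w[pos] <- 1 / length(pos)
  w[neg] <- -1 / length(neg)
  w
}

dc <- colnames(design)
C <- list()
C$HL_Day_m_vs_p <- make_contrast(dc, c("HL.m.00", "HL.m.23"), c("HL.p.00", "HL.p.23"))
C$LL_Day_m_vs_p <- make_contrast(dc, c("LL.m.00", "LL.m.23"), c("LL.p.00", "LL.p.23"))  # NULL: skipped
C$HLvsLL_p_Day  <- make_contrast(dc, c("HL.p.00", "HL.p.23"), c("LL.p.00", "LL.p.23"))

cm <- do.call(cbind, C)  # coefficients x contrasts
# limma::contrasts.fit(fit, contrasts = cm) would then apply these weights
print(C$HLvsLL_p_Day[C$HLvsLL_p_Day != 0])  # non-zero weights only
```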

4.4 Outputs
-----------

All results are written to the folder:

- `limma_results/`

Key files:

- Per-contrast differential abundance tables
  - For each contrast name (e.g., `HL_Day_r_vs_p`, `LL_Day_m_vs_p`, `HLvsLL_p_Day`), the script writes `limma_results/<contrast>.csv` with columns:
    - `ProteinID`
    - `logFC`
    - `AveExpr`
    - `t`
    - `P.Value`
    - `adj.P.Val`
    - `B`
- Summaries
  - `limma_results/summary_counts.csv`: basic counts of significant proteins per contrast.
  - `limma_results/summary_counts_with_pct.csv`: adds `n_tested`, `pct_FDR_5`, and counts of up- and down-regulated proteins at FDR ≤ 0.05.
- Design and contrast metadata
  - `limma_results/design_columns.csv`: lists the design matrix coefficients.
  - `limma_results/<contrast>_weights.csv` (one per contrast): records the non-zero coefficient weights used to build that contrast.
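The per-contrast summary columns can be computed from a limma-style results table as sketched below. Only `n_tested` and `pct_FDR_5` are names taken from this README. `n_FDR_5`, `n_up`, `n_down`, and the helper `summarize_contrast` are hypothetical. The toy table uses Benjamini-Hochberg adjustment, which is limma's `topTable` default.

```r
# Hypothetical helper: counts at FDR <= alpha for one contrast table
summarize_contrast <- function(tab, alpha = 0.05) {
  sig <- !is.na(tab$adj.P.Val) & tab$adj.P.Val <= alpha
  data.frame(
    n_tested  = nrow(tab),
    n_FDR_5   = sum(sig),
    pct_FDR_5 = 100 * sum(sig) / nrow(tab),
    n_up      = sum(sig & tab$logFC > 0),
    n_down    = sum(sig & tab$logFC < 0)
  )
}

# Toy results table (made-up values)
tab <- data.frame(ProteinID = paste0("P", 1:5),
                  logFC = c(2.1, -1.5, 0.3, -0.2, 1.0),
                  P.Value = c(0.0001, 0.001, 0.2, 0.8, 0.01))
tab$adj.P.Val <- p.adjust(tab$P.Value, method = "BH")

print(summarize_contrast(tab))
```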

----------------------------------------------------------------
5. Running the Scripts
----------------------------------------------------------------

From R (or RStudio), set the working directory to the folder containing the scripts and supplemental data, for example:

setwd("path/to/unzipped_zenodo_archive")

Then:

- RNA-seq differential expression + Fe × Light interaction

source("Code_01_Transcriptomics_edgeR.R")

- Proteomics limma analyses

source("Code_02_Proteomics_limma.R")

Provided the paths at the top of each script (`infile`, `outdir`, and any hard-coded paths) point to the included supplementary data files, the scripts will reproduce the RNA and protein differential expression results used in the manuscript and supplementary figures.
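Before sourcing the scripts, you can confirm that the inputs are present and intact by comparing them with the MD5 checksums listed in this record's file table. This is a minimal sketch using `tools::md5sum`. The helper `check_inputs` is hypothetical, and only a subset of the files is checked.

```r
# Checksums copied from the file listing of this record
expected_md5 <- c(
  "Code_01_Transcriptomics_edgeR.R"           = "9b448399b0882e48ac1b3543b7c99419",
  "Code_02_Proteomics_limma.R"                = "3009093e9f4767bd9621b48d8e2cb063",
  "SupplementaryData_02_transcriptomics.xlsx" = "4490c2de9abb3aef6f821ea72101fed6",
  "SupplementaryData_08_proteomics.csv"       = "719358a9be91e4257dc5443b413e2133"
)

# Hypothetical helper: presence and checksum status for each file in `dir`
check_inputs <- function(expected, dir = ".") {
  paths <- file.path(dir, names(expected))
  present <- file.exists(paths)
  observed <- rep(NA_character_, length(paths))
  if (any(present)) observed[present] <- unname(tools::md5sum(paths[present]))
  data.frame(file = names(expected), present = present,
             md5_ok = present & observed == expected, row.names = NULL)
}

print(check_inputs(expected_md5))
```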


Files (77.5 MB)

Name | Size | MD5
Code_01_Transcriptomics_edgeR.R | 6.1 kB | 9b448399b0882e48ac1b3543b7c99419
Code_02_Proteomics_limma.R | 11.3 kB | 3009093e9f4767bd9621b48d8e2cb063
SourceData.xlsx | 4.0 MB | de33b16e94543709de77e0ec0f47705d
SupplementaryData_01_physiology.csv | 10.1 kB | e60e517ecb84c4b349cbed65d422685e
SupplementaryData_02_transcriptomics.xlsx | 13.1 MB | 4490c2de9abb3aef6f821ea72101fed6
SupplementaryData_03_annotations.csv | 12.4 MB | bec35c4eedbff45cc08901c5f294ad9d
SupplementaryData_04_gene_models_cds.fasta | 21.0 MB | 5468aab063a1d9179a980bf26569e370
SupplementaryData_05_gene_models_pep.fasta | 7.6 MB | 189a39fc796e1945f62d15d3506ffd0e
SupplementaryData_06_transcriptome_edgeR.csv | 9.7 MB | f37460d1c962cab793b1e4544bb64daf
SupplementaryData_07_WGCNA_modules.xlsx | 3.8 MB | da72a51dbf2b081198458682282a19d9
SupplementaryData_08_proteomics.csv | 1.3 MB | 719358a9be91e4257dc5443b413e2133
SupplementaryData_09_proteomics_limma.csv | 2.4 MB | b4a0e71440d205add3c2162309b2e07e
SupplementaryData_10_physiology_interaction.csv | 3.8 kB | e80eff4479e85c5a4b9438941453dca3
SupplementaryData_11_RNA_interaction.csv | 1.1 MB | 28649509d08a84df3ab6d036054692d0
SupplementaryData_12_proteomics_interaction.csv | 336.6 kB | b0f4045545604b2526f4dfa5455735fd
SupplementaryData_13_Pcalceolata_dyneins.csv | 84.0 kB | f8f4b12b7c4db2a2be7ecbe52f9750ce
SupplementaryData_14_NCOG.csv | 148.0 kB | ad2bc918bb508621567983a491bf80ef
SupplementaryData_15_response_types.xlsx | 308.1 kB | 88021bfed5dee6706c17600b4f5df51a
SupplementaryData_16_imputation.csv | 137.5 kB | c7946d746d7eacedd408a80e1a752009

