github.com/PlasmoGenEpi/plasmodiumdrugres-wdl/plasmodiumdrugres

PlasmoGenEpi

doi:10.5281/zenodo.20709638

Published June 16, 2026 | Version v0.1.0

PlasmodiumDrugRes WDL interface (parity with Nextflow)

This document defines the user-facing inputs and outputs for the WDL implementation of the plasmodiumdrugres pipeline, and how these map to the current Nextflow pipeline in ~/Documents/git_projects/plasmodiumdrugres/.

Source of truth (Nextflow):

Parameter schema: ~/Documents/git_projects/plasmodiumdrugres/nextflow_schema.json
Defaults: ~/Documents/git_projects/plasmodiumdrugres/nextflow.config
Workflow wiring / branching: ~/Documents/git_projects/plasmodiumdrugres/workflows/plasmodiumdrugres.nf
Input validation and PMO/population-field normalization: ~/Documents/git_projects/plasmodiumdrugres/subworkflows/local/utils_nfcore_plasmodiumdrugres_pipeline/main.nf

Inputs

Required: choose exactly one input mode

Provide exactly one of:

pmo (File): PMO JSON file.
allele_table (File): TSV/CSV containing microhaplotypes. When using this mode, panel_info_bed is also required.

Required files

loci_of_interest_bed (File): BED of loci of interest (single-locus estimates are computed at these loci).
loci_groups (File): TSV/CSV defining multi-locus groups (multi-locus estimates are computed for these groups).

Required iff using `allele_table`

panel_info_bed (File): BED defining panel target coordinates.
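The input-mode rules above can be sketched as a small pre-flight check. This is only an illustration in Python, not the validation code used by the WDL or Nextflow pipelines; the function name validate_input_mode is hypothetical, and inputs are modeled as optional path strings.

```python
from typing import Optional


def validate_input_mode(
    pmo: Optional[str] = None,
    allele_table: Optional[str] = None,
    panel_info_bed: Optional[str] = None,
) -> str:
    """Return the selected input mode, or raise ValueError if the combination is invalid.

    Hypothetical helper: mirrors the documented rules only.
    """
    # Exactly one of pmo / allele_table must be provided.
    if (pmo is None) == (allele_table is None):
        raise ValueError("Provide exactly one of pmo or allele_table.")
    # allele_table mode additionally requires panel_info_bed.
    if allele_table is not None and panel_info_bed is None:
        raise ValueError("panel_info_bed is required when allele_table is used.")
    # Assumption: a panel_info_bed supplied alongside pmo is tolerated here;
    # the document does not say how that case is handled.
    return "pmo" if pmo is not None else "allele_table"
```

For example, validate_input_mode(allele_table="alleles.tsv") raises ValueError because panel_info_bed is missing.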

Optional grouping / population splitting

You can run either:

Single population (default): no splitting is performed; results are labeled using population_label (default pop1).
Per-population: split input tables by population and compute outputs for each population.

Inputs controlling this:

population_assignment (File?): TSV/CSV mapping specimen_name → population.
pmo_population_fields (String?, default null): comma-separated list of PMO specimen metadata fields; used only when pmo is provided and population_assignment is not provided.
pmo_population_separator (String, default _): join string used when building the population label from pmo_population_fields.
population_label (String, default pop1): used only when no population assignment is available.

Branching rule (parity target):

has_population_assignment = (population_assignment is provided) OR (pmo is provided AND pmo_population_fields is provided)
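As a rough sketch, the branching rule and the label construction from pmo_population_fields could look like this in Python. The helper names, the example metadata fields (country, year), and the whitespace stripping around field names are assumptions made for illustration; the actual normalization lives in the Nextflow subworkflow listed under "Source of truth".

```python
from typing import Mapping, Optional


def has_population_assignment(
    population_assignment: Optional[str],
    pmo: Optional[str],
    pmo_population_fields: Optional[str],
) -> bool:
    """Evaluate the documented branching rule; "provided" is modeled as "not None"."""
    return population_assignment is not None or (
        pmo is not None and pmo_population_fields is not None
    )


def build_population_label(
    specimen_meta: Mapping[str, str],
    pmo_population_fields: str,
    separator: str = "_",
) -> str:
    """Join the selected specimen metadata values into one population label."""
    # Assumption: surrounding whitespace in the comma-separated list is ignored.
    fields = [f.strip() for f in pmo_population_fields.split(",") if f.strip()]
    return separator.join(str(specimen_meta[f]) for f in fields)
```

With the default separator, a specimen whose hypothetical metadata is {"country": "Kenya", "year": "2022"} and pmo_population_fields set to country,year would be labeled Kenya_2022.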

Optional references (PMO mode only)

These are used when generating a panel BED from PMO and adding reference sequences to it:

targeted_reference (File?, default null): FASTA containing only the targets.
genome_reference (File?, default null): FASTA containing the full genome.

Behavior (parity target):

If both are provided, targeted_reference is used; Nextflow emits a warning in this case.
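The preference between the two references can be illustrated with a short Python sketch. The function name choose_reference and the warning text are hypothetical; only the "prefer targeted_reference and warn" behavior comes from the description above.

```python
import warnings
from typing import Optional


def choose_reference(
    targeted_reference: Optional[str],
    genome_reference: Optional[str],
) -> Optional[str]:
    """Pick the reference FASTA to use, preferring targeted_reference."""
    if targeted_reference is not None and genome_reference is not None:
        # Both supplied: warn and fall through to the targeted reference.
        warnings.warn("Both references provided; using targeted_reference.")
    if targeted_reference is not None:
        return targeted_reference
    # May be None when neither reference was supplied (both are optional).
    return genome_reference
```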

Method selection (defaults from Nextflow)

mlaf_method (String, default naive): one of naive, MLBM, FEM.
- naive_mlaf_method (String, default wsaf_prop): passed to the naive multi-locus method.
slaf_method (String, default naive): one of naive, IDM, mhaps_freq.
- naive_slaf_method (String, default read_count_prop): passed to the naive single-locus method.
- mhaps_freq uses DCIFER in the current Nextflow pipeline.
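A minimal check of the method names listed above might look like the following. It assumes the comparison is case-sensitive, which the document does not state, and validate_methods is a hypothetical helper rather than part of the pipeline.

```python
from typing import Tuple

# Allowed values as listed in this document.
MLAF_METHODS = ("naive", "MLBM", "FEM")
SLAF_METHODS = ("naive", "IDM", "mhaps_freq")


def validate_methods(
    mlaf_method: str = "naive",
    slaf_method: str = "naive",
) -> Tuple[str, str]:
    """Return the (mlaf_method, slaf_method) pair or raise ValueError."""
    # Assumption: matching is exact and case-sensitive.
    if mlaf_method not in MLAF_METHODS:
        raise ValueError(f"mlaf_method must be one of {MLAF_METHODS}, got {mlaf_method!r}")
    if slaf_method not in SLAF_METHODS:
        raise ValueError(f"slaf_method must be one of {SLAF_METHODS}, got {slaf_method!r}")
    return mlaf_method, slaf_method
```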

Optional tuning parameters (passed through to scripts)

translate_loci_extra_args (String, default "")
mlbm_wrapper_aa_specimen_occurence_cut_off (Int?, default null)
naive_multilocus_wsaf_cut_off (Float?, default null)
dcifer_slaf_wrapper_coi_lrank (Int?, default null)
dcifer_slaf_wrapper_qstart (Float?, default null)
dcifer_slaf_wrapper_tol (Float?, default null)

Output directory convention

To mimic Nextflow's outdir organization (even though Terra does not require explicit staging), the WDL workflow will write deliverables under:

outdir (String, default output)

Outputs

On Terra, the workflow exposes exactly seven outputs as String URIs from the staging step (same idea as mad4hatter-wdl move_outputs). When Cromwell sees a gs://fc-…/… path for the merged ml_summary, files are copied with gcloud to gs://fc-…/<outdir>/<timestamp>/ using their original basenames. On local Cromwell (paths like /Users/… or tests/input/…), the same task uses cp into <execution_dir>/<outdir>/<timestamp>/ and outputs absolute local paths instead.

Optional input workspace_bucket (String?): the workspace bucket id (e.g. fc-15e572f9-33a3-4a1e-8534-099df773bfbf, without the gs:// prefix). Set it if your backend localizes files before WDL evaluates paths, which causes automatic gs://fc-… detection to fail; GCS staging is then forced.
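The choice between GCS and local staging described above can be sketched as follows. This is an assumption-laden illustration, not the staging task itself: staging_destination is a hypothetical name, the timestamp is passed in because its format is not specified here, and the bucket is taken to be the first path component after gs://.

```python
import posixpath
from typing import Optional


def staging_destination(
    ml_summary_path: str,
    outdir: str,
    timestamp: str,
    execution_dir: str,
    workspace_bucket: Optional[str] = None,
) -> str:
    """Return the directory that the seven deliverables would be copied into."""
    if workspace_bucket is not None:
        # Explicit bucket id (no gs:// prefix) forces GCS staging.
        bucket = workspace_bucket
    elif ml_summary_path.startswith("gs://fc-"):
        # Automatic detection: reuse the bucket of the merged ml_summary.
        bucket = ml_summary_path[len("gs://"):].split("/", 1)[0]
    else:
        # Local Cromwell: stage under the execution directory.
        return posixpath.join(execution_dir, outdir, timestamp) + "/"
    return f"gs://{bucket}/{outdir}/{timestamp}/"
```

Passing workspace_bucket covers the case where the task only sees a localized path for ml_summary and therefore cannot detect the bucket on its own.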

outdir must be alphanumeric plus _ or - only (validated at workflow start).
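The outdir constraint amounts to a simple pattern match. The sketch below shows one way to express it in Python; validate_outdir is a hypothetical helper, and rejecting the empty string is an assumption, since the document only names the allowed characters.

```python
import re

# Letters, digits, underscore, and hyphen only; at least one character (assumed).
_OUTDIR_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def validate_outdir(outdir: str = "output") -> str:
    """Return outdir unchanged if it is valid, otherwise raise ValueError."""
    if not _OUTDIR_PATTERN.fullmatch(outdir):
        raise ValueError(
            f"outdir must contain only letters, digits, '_' or '-', got {outdir!r}"
        )
    return outdir
```

Values such as results/run1 or my dir are rejected because they contain a slash or a space.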

Workflow output names (each value is a gs://… path to the file on Terra, or an absolute local path on local Cromwell):

ml_summary → ml_summary.tsv
sl_summary → sl_summary.tsv
sl_from_ml_summary → sl_from_ml_summary.tsv
amino_acid_calls → amino_acid_calls.tsv.gz
collapsed_amino_acid_calls → collapsed_amino_acid_calls.tsv.gz
loci_covered_by_target_samples_info → loci_covered_by_target_samples_info.tsv
loci_of_interest_for_target_for_microhap → loci_of_interest_for_target_for_microhap.tsv.gz

Per-population merge artifacts and intermediate translated_loci/ paths are still computed inside the run but are not listed as workflow outputs; use the staged URIs above for downloads and downstream tooling.
