Published June 16, 2026 | Version v0.1.0

github.com/PlasmoGenEpi/plasmodiumdrugres-wdl/plasmodiumdrugres

Description

PlasmodiumDrugRes WDL interface (parity with Nextflow)

This document defines the user-facing inputs and outputs for the WDL implementation of the plasmodiumdrugres pipeline, and how these map to the current Nextflow pipeline in ~/Documents/git_projects/plasmodiumdrugres/.

Source of truth (Nextflow):

  • Parameter schema: ~/Documents/git_projects/plasmodiumdrugres/nextflow_schema.json
  • Defaults: ~/Documents/git_projects/plasmodiumdrugres/nextflow.config
  • Workflow wiring / branching: ~/Documents/git_projects/plasmodiumdrugres/workflows/plasmodiumdrugres.nf
  • Input validation and PMO/population-field normalization: ~/Documents/git_projects/plasmodiumdrugres/subworkflows/local/utils_nfcore_plasmodiumdrugres_pipeline/main.nf

Inputs

Required: choose exactly one input mode

Provide exactly one of:

  • pmo (File): PMO JSON file.
  • allele_table (File): TSV/CSV containing microhaplotypes. When using this mode, panel_info_bed is also required.

Required files

  • loci_of_interest_bed (File): BED of loci of interest (single-locus estimates are computed at these loci).
  • loci_groups (File): TSV/CSV defining multi-locus groups (multi-locus estimates are computed for these groups).

Required iff using allele_table

  • panel_info_bed (File): BED defining panel target coordinates.
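The input-mode rules above (exactly one of pmo or allele_table, plus panel_info_bed in allele_table mode) can be checked before any work starts. The following is a minimal Python sketch, not the WDL's actual validation. The function name validate_input_mode is hypothetical, and unset optional inputs are modeled as None.

```python
from typing import Optional


def validate_input_mode(
    pmo: Optional[str],
    allele_table: Optional[str],
    panel_info_bed: Optional[str],
) -> str:
    """Return "pmo" or "allele_table", or raise ValueError on an invalid combination."""
    # Exactly one input mode: both set, or both unset, is an error.
    if (pmo is None) == (allele_table is None):
        raise ValueError("Provide exactly one of pmo or allele_table")
    if allele_table is not None:
        # In PMO mode the panel BED is generated from the PMO, so only
        # allele_table mode needs panel_info_bed as an explicit input.
        if panel_info_bed is None:
            raise ValueError("panel_info_bed is required when allele_table is used")
        return "allele_table"
    # PMO mode: panel_info_bed is not required (how a supplied one is treated
    # is not specified here).
    return "pmo"
```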

Optional grouping / population splitting

You can run either:

  • Single population (default): no splitting is performed; results are labeled using population_label (default pop1).
  • Per-population: split input tables by population and compute outputs for each population.

Inputs controlling this:

  • population_assignment (File?): TSV/CSV mapping specimen_name → population.
  • pmo_population_fields (String?, default null): comma-separated list of PMO specimen metadata fields; used only when pmo is provided and population_assignment is not provided.
  • pmo_population_separator (String, default _): join string used when building the population label from pmo_population_fields.
  • population_label (String, default pop1): used only when no population assignment is available.
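To illustrate how pmo_population_fields and pmo_population_separator combine into a population label, here is a minimal Python sketch. It assumes, hypothetically, that PMO specimen metadata has already been flattened into one dict per specimen with a specimen_name key. The real PMO layout and the pipeline's handling of missing fields may differ, and build_population_labels is not a name from the pipeline.

```python
from typing import Dict, List


def build_population_labels(
    specimens: List[Dict[str, str]],
    pmo_population_fields: str,
    pmo_population_separator: str = "_",
) -> Dict[str, str]:
    """Map specimen_name -> population label built from PMO metadata fields."""
    # "country, site" -> ["country", "site"]; blank entries are dropped.
    fields = [f.strip() for f in pmo_population_fields.split(",") if f.strip()]
    labels: Dict[str, str] = {}
    for specimen in specimens:
        # Join the field values in the order the fields were listed.
        # A missing field raises KeyError here; the real pipeline may behave differently.
        labels[specimen["specimen_name"]] = pmo_population_separator.join(
            str(specimen[field]) for field in fields
        )
    return labels
```

For example, a specimen with country Mali and site Bamako gets the label Mali_Bamako when pmo_population_fields is "country,site" and the default separator is used.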

Branching rule (parity target):

  • has_population_assignment = (population_assignment is provided) OR (pmo is provided AND pmo_population_fields is provided)
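The branching rule can be written as a simple predicate. In this hypothetical Python sketch, unset inputs are modeled as None, pmo_population_fields has no effect unless pmo is also provided, and the single-population fallback carries population_label.

```python
from typing import Optional, Tuple


def has_population_assignment(
    population_assignment: Optional[str],
    pmo: Optional[str],
    pmo_population_fields: Optional[str],
) -> bool:
    # Parity-target rule: an explicit mapping file, or PMO input plus metadata fields.
    return population_assignment is not None or (
        pmo is not None and pmo_population_fields is not None
    )


def population_mode(
    population_assignment: Optional[str],
    pmo: Optional[str],
    pmo_population_fields: Optional[str],
    population_label: str = "pop1",
) -> Tuple[str, Optional[str]]:
    """Return ("per_population", None) or ("single", population_label)."""
    if has_population_assignment(population_assignment, pmo, pmo_population_fields):
        return ("per_population", None)
    # No assignment available: every specimen is labeled with population_label.
    return ("single", population_label)
```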

Optional references (PMO mode only)

These are used when generating a panel BED from PMO and adding reference sequences to it:

  • targeted_reference (File?, default null): FASTA containing only the targets.
  • genome_reference (File?, default null): FASTA containing the full genome.

Behavior (parity target):

  • If both are provided, targeted_reference is used and a warning is emitted, matching the Nextflow behavior.
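A sketch of the reference-selection behavior, assuming unset inputs are None and using Python's warnings module in place of the Nextflow log warning. select_reference is a hypothetical name. What happens when neither reference is given is not specified above, so the sketch simply returns None in that case.

```python
import warnings
from typing import Optional


def select_reference(
    targeted_reference: Optional[str],
    genome_reference: Optional[str],
) -> Optional[str]:
    """Pick the FASTA used to add reference sequences to the PMO-derived panel BED."""
    if targeted_reference is not None and genome_reference is not None:
        # Both given: warn and prefer the targets-only FASTA.
        warnings.warn(
            "Both targeted_reference and genome_reference were provided; "
            "using targeted_reference"
        )
        return targeted_reference
    # At most one is set; return it (or None if neither is set).
    return targeted_reference if targeted_reference is not None else genome_reference
```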

Method selection (defaults from Nextflow)

  • mlaf_method (String, default naive): one of naive, MLBM, FEM.
    • naive_mlaf_method (String, default wsaf_prop): passed to the naive multi-locus method.
  • slaf_method (String, default naive): one of naive, IDM, mhaps_freq.
    • naive_slaf_method (String, default read_count_prop): passed to the naive single-locus method.
    • mhaps_freq uses DCIFER in the current Nextflow pipeline.
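The method options and their defaults can be validated as below. This is a hypothetical Python sketch: it assumes method names are matched case-sensitively and that the naive_* sub-options are only relevant when the corresponding naive method is selected, which the descriptions above imply but do not state outright.

```python
from typing import Dict, Optional

MLAF_METHODS = ("naive", "MLBM", "FEM")
SLAF_METHODS = ("naive", "IDM", "mhaps_freq")


def resolve_methods(
    mlaf_method: str = "naive",
    slaf_method: str = "naive",
    naive_mlaf_method: str = "wsaf_prop",
    naive_slaf_method: str = "read_count_prop",
) -> Dict[str, Optional[str]]:
    # Case-sensitive membership checks against the documented choices.
    if mlaf_method not in MLAF_METHODS:
        raise ValueError(f"mlaf_method must be one of {MLAF_METHODS}, got {mlaf_method!r}")
    if slaf_method not in SLAF_METHODS:
        raise ValueError(f"slaf_method must be one of {SLAF_METHODS}, got {slaf_method!r}")
    return {
        "mlaf_method": mlaf_method,
        # Sub-options are kept only when the matching naive method runs.
        "naive_mlaf_method": naive_mlaf_method if mlaf_method == "naive" else None,
        "slaf_method": slaf_method,
        "naive_slaf_method": naive_slaf_method if slaf_method == "naive" else None,
    }
```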

Optional tuning parameters (passed through to scripts)

  • translate_loci_extra_args (String, default "")
  • mlbm_wrapper_aa_specimen_occurence_cut_off (Int?, default null)
  • naive_multilocus_wsaf_cut_off (Float?, default null)
  • dcifer_slaf_wrapper_coi_lrank (Int?, default null)
  • dcifer_slaf_wrapper_qstart (Float?, default null)
  • dcifer_slaf_wrapper_tol (Float?, default null)

Output directory convention

To mimic Nextflow's outdir organization (even though Terra does not require explicit staging), the WDL workflow will write deliverables under:

  • outdir (String, default output)

Outputs

On Terra, the workflow exposes exactly seven outputs as String URIs produced by the staging step (the same approach as move_outputs in mad4hatter-wdl). When the merged ml_summary resolves to a gs://fc-…/… path, the staging task copies the files with gcloud to gs://fc-…/<outdir>/<timestamp>/, keeping their original basenames. On local Cromwell (paths such as /Users/… or tests/input/…), the same task copies them with cp into <execution_dir>/<outdir>/<timestamp>/ and outputs absolute local paths instead.

Optional input workspace_bucket (String?): the workspace bucket id without the gs:// prefix (e.g. fc-15e572f9-33a3-4a1e-8534-099df773bfbf). Set it if your backend localizes files before the WDL evaluates paths, which makes automatic gs://fc-… detection fail; when set, GCS staging is forced.
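The staging-destination logic described above can be sketched as a pure path computation. Assumptions: the destination bucket is the one parsed from the ml_summary path (or workspace_bucket when set), the timestamp format is left to the caller, and the actual copy (gcloud or cp) is not shown. staging_dir and staged_path are hypothetical names.

```python
import os
from typing import Optional


def staging_dir(
    ml_summary_path: str,
    outdir: str,
    timestamp: str,
    execution_dir: str,
    workspace_bucket: Optional[str] = None,
) -> str:
    """Return the directory (with trailing slash) that staged outputs are copied into."""
    bucket: Optional[str] = None
    if workspace_bucket:
        # Explicit override: force GCS staging into the given workspace bucket.
        bucket = workspace_bucket
    elif ml_summary_path.startswith("gs://fc-"):
        # "gs://fc-abc/some/path/ml_summary.tsv" -> "fc-abc"
        bucket = ml_summary_path[len("gs://"):].split("/", 1)[0]
    if bucket is not None:
        return f"gs://{bucket}/{outdir}/{timestamp}/"
    # Local Cromwell: stage under the execution directory as an absolute path.
    return os.path.join(os.path.abspath(execution_dir), outdir, timestamp) + os.sep


def staged_path(src: str, dest_dir: str) -> str:
    # Staged files keep their original basenames.
    return dest_dir + src.rstrip("/").rsplit("/", 1)[-1]
```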

outdir may contain only alphanumeric characters, _ and - (validated at workflow start).
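A sketch of the outdir check, assuming "alphanumeric" means ASCII letters and digits and that an empty value is rejected. The WDL may implement the check differently, and validate_outdir is a hypothetical name.

```python
import re

# One or more ASCII letters, digits, underscores or hyphens; nothing else.
_OUTDIR_RE = re.compile(r"[A-Za-z0-9_-]+")


def validate_outdir(outdir: str) -> str:
    """Return outdir unchanged, or raise ValueError if it has disallowed characters."""
    if not _OUTDIR_RE.fullmatch(outdir):
        raise ValueError(f"outdir must match [A-Za-z0-9_-]+, got {outdir!r}")
    return outdir
```

Rejecting "/" and "." this way also keeps outdir from escaping the <timestamp> staging layout (for example via "../").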

Workflow output names and the files they point to (each value is a gs://… URI on Terra, or an absolute local path on local Cromwell):

  • ml_summary → ml_summary.tsv
  • sl_summary → sl_summary.tsv
  • sl_from_ml_summary → sl_from_ml_summary.tsv
  • amino_acid_calls → amino_acid_calls.tsv.gz
  • collapsed_amino_acid_calls → collapsed_amino_acid_calls.tsv.gz
  • loci_covered_by_target_samples_info → loci_covered_by_target_samples_info.tsv
  • loci_of_interest_for_target_for_microhap → loci_of_interest_for_target_for_microhap.tsv.gz

Per-population merge artifacts and intermediate translated_loci/ paths are still computed inside the run but are not listed as workflow outputs; use the staged URIs above for downloads and downstream tooling.
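If downstream tooling only knows the staging directory (<outdir>/<timestamp>/), the seven output paths can be rebuilt from their fixed basenames. This is a hypothetical helper, assuming the staged files keep the basenames listed above; it works the same for gs:// URIs and local paths.

```python
from typing import Dict

# Workflow output name -> staged file basename.
STAGED_OUTPUTS: Dict[str, str] = {
    "ml_summary": "ml_summary.tsv",
    "sl_summary": "sl_summary.tsv",
    "sl_from_ml_summary": "sl_from_ml_summary.tsv",
    "amino_acid_calls": "amino_acid_calls.tsv.gz",
    "collapsed_amino_acid_calls": "collapsed_amino_acid_calls.tsv.gz",
    "loci_covered_by_target_samples_info": "loci_covered_by_target_samples_info.tsv",
    "loci_of_interest_for_target_for_microhap": "loci_of_interest_for_target_for_microhap.tsv.gz",
}


def expected_staged_paths(staging_dir: str) -> Dict[str, str]:
    """Rebuild the seven output paths from a staging directory."""
    base = staging_dir.rstrip("/")
    return {name: f"{base}/{basename}" for name, basename in STAGED_OUTPUTS.items()}
```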

Files

github.com-PlasmoGenEpi-plasmodiumdrugres-wdl-plasmodiumdrugres_v0.1.0.zip
