Published November 19, 2025 | Version v2
Data paper Open

PathoFact 2.0 Datasets

  • 1. University of Luxembourg

Description

The PathoFact2_datasets.tar.gz archive contains the datasets used for training, validation, and benchmarking of PathoFact 2.0. The archive is organised into the following folders:

1. BenchMarking/

Contains test datasets used to benchmark the performance of PathoFact 2.0 against other prediction tools.

  • Toxin_module/
    • 1000_TEST_non-toxin_ToxinPred2BenchMarking.faa

Non-toxin test dataset used when benchmarking PathoFact 2.0 against ToxinPred2.

    • 1000_TEST_toxin_ToxinPred2BenchMarking.faa

Toxin test dataset used when benchmarking PathoFact 2.0 against ToxinPred2.

  • VF_module/
    • VF_Test_dataset_VirulentHunterBenchmarking.faa

Virulence factor test dataset used when benchmarking PathoFact 2.0 against VirulentHunter.

    • non-VF_Test_dataset_VirulentHunterBenchmarking.faa

Non-virulence-factor test dataset used when benchmarking PathoFact 2.0 against VirulentHunter.
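One way to use the paired positive and negative test files is to label each sequence by the file it came from (1 for toxin or VF, 0 otherwise) and score a tool's predictions against those labels. The sketch below is an illustration under that assumption, not the authors' benchmarking code; `confusion_counts` and `mcc` are hypothetical helper names, and the labels and predictions are toy values rather than results from any tool.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy values only: three sequences from a positive file, three from a negative file.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]  # hypothetical predictions from some tool

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
      f"MCC={mcc(tp, tn, fp, fn):.3f}")
```

The record does not state which metrics were used for the published comparison; sensitivity, specificity, and MCC are shown here only as common choices for balanced binary test sets.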

 

2. Toxin_module/

Contains datasets used for training and validating the PathoFact 2.0 Toxin prediction module.

  • Toxin-related.faa
  • non-toxin.faa
  • splits/ — datasets divided into 80% training and 20% test sets:
    • TRAIN_Positive_TOX.faa
    • TRAIN_Negative_TOX.faa
    • Test_Positive_TOX.faa
    • Test_Negative_TOX.faa

 

3. VF_module/

Contains datasets used for training and validating the PathoFact 2.0 Virulence Factor prediction module.

  • VF_dataset.faa
  • non-VF.faa
  • splits/ — datasets divided into 80% training and 20% test sets:
    • TRAIN_Positive_VF.faa
    • TRAIN_Negative_VF.faa
    • Test_Positive_VF.faa
    • Test_Negative_VF.faa
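The stated 80%/20% division can be checked after extraction by counting FASTA header lines in each pair of split files. The following is a minimal sketch; the extraction directory `PathoFact2_datasets` is an assumption (the top-level folder name inside the archive is not given in this record), and `count_fasta_records` and `train_fraction` are hypothetical helper names.

```python
from pathlib import Path

def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines starting with '>'."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))

def train_fraction(train_path, test_path):
    """Fraction of all records (train + test) that are in the training file."""
    n_train = count_fasta_records(train_path)
    n_test = count_fasta_records(test_path)
    total = n_train + n_test
    if total == 0:
        raise ValueError("both files contain no FASTA records")
    return n_train / total

# Assumed location of the extracted archive; adjust to the real path.
splits = Path("PathoFact2_datasets") / "VF_module" / "splits"
train_file = splits / "TRAIN_Positive_VF.faa"
test_file = splits / "Test_Positive_VF.faa"
if train_file.exists() and test_file.exists():
    print(f"training fraction: {train_fraction(train_file, test_file):.3f}")
```

The same check applies to the Toxin_module/splits/ files by swapping in the `_TOX` file names.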

 

The DOME-ML_PathoFact2.json file is the JSON report created for PathoFact 2.0 following DOME, the community standard for transparent machine learning.
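Because the JSON file's internal layout is not described in this record, a cautious first step is to load it and list its top-level keys before relying on any particular field. This is a minimal sketch; `top_level_keys` is a hypothetical helper name, and no specific key names are assumed.

```python
import json
import os

def top_level_keys(path):
    """Return the sorted top-level keys of a JSON file, or [] if the root is not an object."""
    with open(path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    return sorted(data) if isinstance(data, dict) else []

# Runs only if the file has been downloaded into the working directory.
dome_path = "DOME-ML_PathoFact2.json"
if os.path.exists(dome_path):
    for key in top_level_keys(dome_path):
        print(key)
```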

Files (220.2 MB)

  • DOME-ML_PathoFact2.json (10.0 kB)
    md5:e97f325d96c1c42742605b81b528f9b8
  • PathoFact2_datasets.tar.gz (220.2 MB)
    md5:cf5ad7e8a9c89d58f507cb5c32e46024
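The md5 checksums listed above can be used to confirm that a download is intact. The sketch below streams each file through `hashlib.md5` and compares the result with the listed value. The pairing of checksums to file names is inferred from the listed sizes (10.0 kB for the JSON file, 220.2 MB for the archive) and should be treated as an assumption; `md5sum` and `verify` are hypothetical helper names.

```python
import hashlib
import os

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5sum(path) == expected_md5.lower()

# Assumed pairing of file names to checksums, inferred from the listed file sizes.
EXPECTED = {
    "DOME-ML_PathoFact2.json": "e97f325d96c1c42742605b81b528f9b8",
    "PathoFact2_datasets.tar.gz": "cf5ad7e8a9c89d58f507cb5c32e46024",
}

for name, checksum in EXPECTED.items():
    if os.path.exists(name):  # only check files that have been downloaded
        status = "OK" if verify(name, checksum) else "MISMATCH"
        print(f"{name}: {status}")
```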

Additional details

Dates

Created
2025-11-12