Published July 28, 2025 | Version v1
Dataset | Open

Dataset and Code for PIDE Binary Classification Model

Creators

Description

  • phage.fasta: 263,843 phage protein sequences (from UniProt).

  • bacteria.fasta: 263,843 bacterial protein sequences (from UniProt).

  • sequence_*.txt: Precomputed embeddings of the input sequences for the train, validation, and test sets.

  • label_*.txt: Binary labels for the corresponding embedding files (0: bacteria, 1: phage).

  • *.py files: Training and evaluation scripts that take the embedding files as input.
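
After downloading, the counts and label coding described above can be checked against the files themselves. The following is a minimal sketch and not part of the deposited scripts. The function names are hypothetical. It assumes the FASTA files use standard ">" header lines and that each label_*.txt file stores one integer label per line; the record does not state the label file layout.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records; each record begins with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def label_counts(lines: Iterable[str]) -> Counter:
    """Tally labels, ASSUMING one integer label (0 or 1) per non-empty line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        label = int(line)
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


def check_dataset(data_dir: str = ".") -> None:
    """Print record and label counts for the files found in data_dir."""
    root = Path(data_dir)
    for name in ("phage.fasta", "bacteria.fasta"):
        with open(root / name, encoding="utf-8") as handle:
            print(name, count_fasta_records(handle))
    # Matches whichever label_*.txt files are present in the directory.
    for path in sorted(root.glob("label_*.txt")):
        with open(path, encoding="utf-8") as handle:
            print(path.name, dict(label_counts(handle)))
```

Calling check_dataset() in the download directory should report 263,843 records for each FASTA file if the files match the description. A label file that does not follow the assumed one-label-per-line layout will raise a ValueError instead of being miscounted silently.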

Files (1.3 GB)

Checksum | Size
md5:7458115aa1324f763a383f64cc09a74a | 100.4 MB
md5:32c1abb39149e51d4f467b5a52227ba1 | 4.1 kB
md5:e7d5064347046215aada65614d05afbd | 4.3 kB
md5:648021233ea4eb0a8740d70fe306308e | 158.3 kB
md5:3d20f31053d19db6b3f270ed01feda37 | 738.8 kB
md5:9a6f6be411e6b6e54f47773ffe8615ae | 158.3 kB
md5:7c1e19dabce54ea915d6f85a54a99aa5 | 78.3 MB
md5:36720f0a627e7373ce7f3cd424291e0e | 171.5 MB
md5:fe9eb180afd42c0835deee2d8a93645b | 800.2 MB
md5:75456564e793d18d1f9260ff1bccaa24 | 171.5 MB
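
The MD5 values listed above can be used to confirm that a download completed without corruption. The sketch below is one way to do this with Python's standard hashlib module. The function names are hypothetical, and which local file corresponds to which checksum has to be taken from the record page, since the listing here does not pair file names with checksums.

```python
import hashlib
from typing import Iterable, Iterator


def file_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's bytes in chunks so large files are never fully in memory."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal MD5 digest of the concatenated chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a local file against a listed value such as 'md5:7458...'."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_hex(file_chunks(path)) == expected
```

For example, matches_listing("some_local_file", "md5:7458115aa1324f763a383f64cc09a74a") returns True only if the local file is byte-identical to the 100.4 MB entry; the path shown is a placeholder. Reading in 1 MiB chunks keeps memory use flat even for the 800.2 MB file. The sketch requires Python 3.9 or later because of str.removeprefix.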