Haplotype-Resolved Chromosome-scale Assembly of the Bighead Catfish (Clarias macrocephalus) Genome
Authors/Creators
Description
Haplotype-Resolved Chromosome-Scale Genome Assembly of the Thai Bighead Catfish (Clarias macrocephalus)
This study presents the first high-quality, chromosome-scale, haplotype-resolved genome assembly of the Bighead catfish (Clarias macrocephalus), a freshwater species native to Thailand and the Mekong River basin. As a species of economic and ecological importance, C. macrocephalus plays a key role in Southeast Asian aquaculture and conservation efforts.
The assembly was generated by combining long-read sequencing (PacBio HiFi and Oxford Nanopore, ONT) with Hi-C and Illumina paired-end sequencing. The resulting haplotype-resolved diploid genome spans 880 Mb across 27 pseudo-chromosomes and shows high contiguity (N50 = 35.4 Mb), completeness (BUSCO = 95.5%; Merqury k-mer completeness at k = 21 = 96.6%), and base-level accuracy (QV50, corresponding to 99.999% correctness). The assembly was scaffolded with Hi-C chromatin conformation capture data and manually curated, providing a comprehensive reference for future research.
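The contiguity and accuracy figures quoted above follow from two standard definitions, sketched below in Python. This is an illustrative sketch only, not the workflow used for this record: the function names are hypothetical, and the toy scaffold lengths are invented for the example rather than taken from the assembly.

```python
def phred_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value (QV) to per-base accuracy.

    Standard Phred definition: error rate = 10 ** (-QV / 10).
    """
    return 1.0 - 10.0 ** (-qv / 10.0)


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (in Mb), invented for illustration only.
toy_lengths = [40, 35, 30, 20, 10, 5]
print("N50 of toy lengths:", n50(toy_lengths))

# QV50 means one expected error per 100,000 bases.
print("Accuracy at QV50 (%):", round(phred_to_accuracy(50) * 100, 3))
```

With QV = 50 the expected error rate is 10^-5, which is where the 99.999% figure comes from.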
This assembly fills a critical gap in genomic resources for the Clarias genus and supports the study of structural variation, genetic diversity, and the effects of selective breeding in C. macrocephalus. The dataset supports applications in comparative genomics, conservation, aquaculture breeding programs, and pan-genome graph construction. It also enables research into adaptive traits, such as the species’ benthic lifestyle and facultative air-breathing capability, which allow survival in low-oxygen environments.
Aligned with the United Nations’ Sustainable Development Goal (SDG) 2 (Zero Hunger), this genomic resource contributes to sustainable aquaculture and biodiversity conservation. All sequencing data, genome assemblies, and computational workflows are publicly available under NCBI BioProject number PRJNA1121957, supporting further research in fish genomics, hybridization studies, hybrid genome analysis, and genome evolution.
📂 Data Records
🐟 Genome Assembly of Thai Bighead Catfish (isolate: CMAM) – Bighead catfish (TaxID: 35657)
📜 Raw Sequenced Reads (NCBI SRA)
🔬 Nanopore (20% err.): 🔗 SRR29723575
🧪 HiFi: 🔗 SRR29723576
🖥️ Illumina 150PE: 🔗 SRR29723578
🧲 Hi-C 150PE: 🔗 SRR29723577
🗂️ The assembled genome is deposited as a whole-genome shotgun (WGS) diploid assembly.
🐠 Haplotype 1 | 🐟 Haplotype 2.
🧬 GenBank accession numbers: 🔗 JBLWMO000000000 (Haplotype 1) | 🔗 JBLWMP000000000 (Haplotype 2).
DATA DESCRIPTION (Final Assemblies, usable):
| Step | Description | Tool | Library Type | Assembly | File Name (Output) | File Name Suffix (Output) |
|---|---|---|---|---|---|---|
| FINAL AND LATEST (NCBI-Submitted) | 🐠 Haplotype 1 | Hifiasm + GreenHill + JBAT + TGS-GapCloser + Polishing + Manual Curation | HiC + UL + HiFi + PE150 | Fully phased manually reviewed haplotype 1 | (NCBI name: Bighead_catfish_fClaMac_hap1_MT.fasta) | 🔗 JBLWMO000000000 |
| FINAL AND LATEST (NCBI-Submitted) | 🐟 Haplotype 2 | Hifiasm + GreenHill + JBAT + TGS-GapCloser + Polishing + Manual Curation | HiC + UL + HiFi + PE150 | Fully phased manually reviewed haplotype 2 | (NCBI name: Bighead_catfish_fClaMac_hap2.fasta) | 🔗 JBLWMP000000000 |
| FINAL AND LATEST | 🐠🐟 Collapsed Assembly (Mixed) | Flye | HiFi | Collapsed diploid assembly | CMAM_FLYE_assembly.fasta | .assembly.fa |
📌 Data records are hosted under NCBI BioProject numbers: 🔗 PRJNA1132508 (WGS), PRJNA1159889 (Hap1), PRJNA1159890 (Hap2)
📌 Bighead Catfish BioSample accession number: 🔗 SAMN42347118
Other assemblies (Intermediate Files):
| Step | Description | Tool | Library Type | Assembly | File Name (Output) | File Name Suffix (Output) |
|---|---|---|---|---|---|---|
| Primary Initial Assemblies | | | | | | |
| 1 | Haplotype 1 | Hifiasm | HiC + UL + HiFi | Fully phased haplotype 1 | CMA.asm.hic.hap1.p_ctg.fa | .hic.hap1.p_ctg.fa |
| 1 | Haplotype 2 | Hifiasm | HiC + UL + HiFi | Fully phased haplotype 2 | CMA.asm.hic.hap2.p_ctg.fa | .hic.hap2.p_ctg.fa |
| Scaffolding and Intermediate Assemblies (Hifiasm and GreenHill) | | | | | | |
| 1 | Scaffolds | Hifiasm | HiC + UL + HiFi | Primary scaffolding | CMA.asm.hic.p_ctg.fa | .hic.p_ctg.fa |
| 1 | Scaffolds | Hifiasm | HiC + UL + HiFi | Processed unitigs | CMA.asm.hic.p_utg.fa | .hic.p_utg.fa |
| 1 | Scaffolds | Hifiasm | HiC + UL + HiFi | Raw unitigs | CMA.asm.hic.r_utg.fa | .hic.r_utg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi L0 | Phased haplotype 1 | CMA_HIC_UL_l0.asm.hic.hap1.p_ctg.fa | .hic.hap1.p_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi L0 | Phased haplotype 2 | CMA_HIC_UL_l0.asm.hic.hap2.p_ctg.fa | .hic.hap2.p_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi L0 | Primary contigs | CMA_HIC_UL_l0.asm.hic.p_ctg.fa | .hic.p_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi L0 | Alternate contigs | CMA_HIC_UL_l0.asm.hic.a_ctg.fa | .hic.a_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi L0 | Raw unitigs | CMA_HIC_UL_l0.asm.hic.r_utg.fa | .hic.r_utg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi L0 | Processed unitigs | CMA_HIC_UL_l0.asm.hic.p_utg.fa | .hic.p_utg.fa |
| 1 | Scaffolds | Hifiasm | HiFi + UL | Primary contigs | CMA_HIFI.asm.p_ctg.fa | .p_ctg.fa |
| 1 | Scaffolds | Hifiasm | HiFi + UL | Alternate contigs | CMA_HIFI.asm.a_ctg.fa | .a_ctg.fa |
| 1 | Scaffolds | Hifiasm | HiFi + UL | Raw unitigs | CMA_HIFI.asm.r_utg.fa | .r_utg.fa |
| 1 | Scaffolds | Hifiasm | HiFi + UL | Processed unitigs | CMA_HIFI.asm.p_utg.fa | .p_utg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiFi | Primary contigs L0 | CMA_HIFI_l0.asm.p_ctg.fa | .p_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiFi | Alternate contigs L0 | CMA_HIFI_l0.asm.a_ctg.fa | .a_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiFi | Raw unitigs | CMA_HIFI_l0.asm.r_utg.fa | .r_utg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiFi | Polished unitigs containing Hap1 and Hap2 | CMA_HIFI_l0.asm.p_utg.fa | .p_utg.fa |
| 2 | Scaffolds | GreenHill | Hap1 | Hifiasm hap1 phased & scaffolded with GreenHill | 02-CMA_HAP1.greenhill.fa | NA |
| 2 | Scaffolds | GreenHill | Hap2 | Hifiasm hap2 phased & scaffolded with GreenHill | 02-CMA_HAP2.greenhill.fa | NA |
| Failed Assemblies (Wtdbg2 - Not Used) | | | | | | |
| 1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi raw | Consensus contigs | CM_M_dbg.hifi.raw.fa | .raw.fa |
| 1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi ONT raw | Consensus contigs | CM_M_dbg.cb.raw.fa | .raw.fa |
| 1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi cns | Consensus contigs | CM_M_dbg.hifi_cns.fa | .cns.fa |
| 1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi ONT cns | Consensus contigs | CM_M_dbg.cb_cns.fa | .cns.fa |
| 1 | Consensus Assembly | Wtdbg2 (failed low QV) | HiFi | Polished consensus | CM_M_dbg.hifi.srp.fa | .srp.fa |
| 1 | Consensus Assembly | Wtdbg2 (failed low QV) | HiFi ONT | Polished consensus | CM_M_dbg.cb.srp.fa | .srp.fa |
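The Hifiasm output names listed above encode the assembly type in their suffix. The sketch below shows one way to recover that label from a file name. The mapping and the function name are hypothetical helpers written for this illustration; the labels follow the wording used in the table, and files from other tools (GreenHill, Wtdbg2, Flye) are deliberately reported as unrecognized.

```python
# Suffix -> label, following the wording of the table in this record.
# This mapping is an illustrative assumption, not part of the deposited workflow.
SUFFIX_LABELS = {
    ".hap1.p_ctg.fa": "phased haplotype 1 contigs",
    ".hap2.p_ctg.fa": "phased haplotype 2 contigs",
    ".p_ctg.fa": "primary contigs",
    ".a_ctg.fa": "alternate contigs",
    ".r_utg.fa": "raw unitigs",
    ".p_utg.fa": "processed unitigs",
}


def describe_assembly_file(file_name: str) -> str:
    """Return a label for a Hifiasm-style FASTA name, or 'unrecognized'."""
    # Try longer suffixes first so ".hap1.p_ctg.fa" wins over ".p_ctg.fa".
    for suffix in sorted(SUFFIX_LABELS, key=len, reverse=True):
        if file_name.endswith(suffix):
            return SUFFIX_LABELS[suffix]
    return "unrecognized"


for name in ("CMA.asm.hic.hap1.p_ctg.fa",
             "CMA_HIFI.asm.r_utg.fa",
             "02-CMA_HAP1.greenhill.fa"):
    print(name, "->", describe_assembly_file(name))
```

Matching the longest suffix first matters because every haplotype-specific name also ends in `.p_ctg.fa`.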
Technical validation (To be done.):
| Step | Description | Tool | Library Type | Assembly | File Name (Output) | File Name Suffix (Output) |
|---|---|---|---|---|---|---|
| . |
Knowledge Dissemination:
| Object | Description | Link / File |
|---|---|---|
| Manuscript | Presentation and Interpretation of Results. (Version 1.0). | Bighead_catfish_C_macrocephalus_MS_draft_ver_1.pdf |
| Figure 1 | Sequencing Data Summary for C. macrocephalus Genome Experiment. | Figure_1_SEQUENCING_READS_AND_GENOMESCOPE2.0.png |
| Figure 2 | Comprehensive Haplotype-Resolved Genome Assembly and Scaffolding Workflow. | Figure_2_GENOME_ASSEMBLY_WORKFLOW.png |
| Figure 3 | Hi-C Contact Matrix Heat Maps of Individual Pseudo-chromosome in Haplotype 1. | Figure_3_SEPARATE_HIC_MAPS_HAPLOTYPE_1_all.pdf |
| Figure 4 | Hi-C Contact Matrix Heat Maps of Individual Pseudo-chromosome in Haplotype 2. | Figure_4_SEPARATE_HIC_MAPS_HAPLOTYPE_2_all.pdf |
| Figure 5 | Hi-C map of Hi-C Scaffolds - Bighead Catfish. | Figure_5_GENOME_WIDE_HIC_MAPS_HAPLOTYPE_1_AND_2.png |
| Figure 6 | Assembly Status Displaying Gaps and Telomeres, January 2024 - November 2025. | Figure_6_BIGHEAD_CATFISH_MANUAL_CURATION_PROGRESS_HAP1_HAP2.png |
| Figure 7 | Visual Genome Quality, Merqury Spectra and BUSCO Scores. | |
| Figure 8 | Synteny Analysis of Linkage Groups for Various Catfish Assemblies. | Figure_8_SYNTENIC_RELATIONSHIPS_TILAPIA_BIGHEAD_ZEBRAFISH.png |
| Table 1 | Summary Statistics of the Genome Assembly and Transposable Element Content. | Table_1_GENOME_SURVEY_GENOME_SUMMARY_AND_TRANSPOSABLB_ELEMENT_CONTENT.xlsx |
| Table 2 | Summary of Individual Scaffold Metrics in the Haplotype-resolved Assembly. | Table_2_BIGHEAD_CATFISH_SUMMARY_STATISTICS_PER_SCAFFOLD_QV_S-AQI-PCT.xlsx |
| Table S1 | Additional Statistics for the Other Assemblies | Table_S1_SUMMARY_STATISTICS_OF_BIGHEAD_CATFISH_ASSEMBLIES_BUSCO_CRAQ_MERQURY.xlsx |
| Table S2 | List of Software and Their Versions | Table_S2_LIST_OF_TOOLS_USED_FOR_BIGHEAD_CATFISH_ASSEMBLY.xlsx |
| Figure S1 | Assembly Graph (GFA), Hifiasm Primary Phased Contigs, Visualized in Bandage. | Supplementary_Figure_1_BANDAGE_ASSEMBLY_GRAPH_HIFIASM_HiC_UL_P_CTGS_HAP1_HAP2.png |
| Figure S2 | mtDNA Alignments in 210 Siluriformes Species Including Bighead Catfish. | Supplementary_Figure_2_mt_DNA_210_SPECIES_COMPARISON.png |
| Overleaf Project | A .zip Containing The Manuscript and all Figures and Tables, Including Technical Validation Files. | Bighead_catfish_C_macrocephalus_MS_draft_ver.1.zip |
Files
(41.6 GB)
| MD5 checksum | Size |
|---|---|
| md5:f113ff9ec654c073036709bda504c7b9 | 925.4 MB |
| md5:af79bae25e6460570d54b7cccf09cfc7 | 925.4 MB |
| md5:99528d8faa2f97d2aa2606f892d0584b | 865.6 MB |
| md5:54236cb6cce26a86a66600b106dd7076 | 881.8 kB |
| md5:56b7c6e3214adcbe089c768d9a13d00b | 117.3 MB |
| md5:19580ced3bbab45561e3ad3fb3edd0ac | 47.1 MB |
| md5:bd9f6388f23cf1e3d7b259af915d5938 | 887.2 MB |
| md5:f04d4af6c93b61f24693018f1f39d4b1 | 1.4 kB |
| md5:76b1dfb3b409b6419eeca661ed4c01f2 | 893.4 MB |
| md5:400d07980ad494cd4d045e66076d0ff9 | 1.4 kB |
| md5:b404213f9104b81439f499f234f8161b | 861.2 MB |
| md5:abbe6522bc1d3f4c7d7845335a94575e | 848.4 MB |
| md5:8da40c8007c3efb69e924601804c0ac3 | 859.3 MB |
| md5:af257c52b9c9614496417aa518f44bf5 | 861.8 MB |
| md5:2a39ff0bc81a0324e27368186db7dd2a | 841.8 MB |
| md5:3ca46ec3b85b534c65c18c06f8bbacdb | 852.8 MB |
| md5:7ab65736db7cc08e5a3f8205e068a59f | 881.6 MB |
| md5:ca6ed776557a890b020f08f0056de1d0 | 860.8 MB |
| md5:7b2d604e7e8701f37cd74612ccced110 | 1.1 GB |
| md5:440cd2db09462d85678fd1a686e86d0f | 116.7 MB |
| md5:76d899d44adccaf89e00539c9156926d | 1.4 GB |
| md5:b4c09c3ba9cee135fb8af4facfdda645 | 1.4 GB |
| md5:8aa0d3a6816da3705b17984c65e06bba | 1.5 GB |
| md5:193b9ce24d12380015a9456d4aeaf64b | 1.6 GB |
| md5:a58c2a77a34589e2c78240da50b9daad | 323.3 MB |
| md5:1df61fccf99ac5177c278e834d0bdadc | 1.2 GB |
| md5:27fb3c86b3c95541f75ba45653f24edf | 1.6 GB |
| md5:4087078087c128cf68a3fd3471eff714 | 52.8 MB |
| md5:2517825166e075ac9f0c1f351217f158 | 1.5 GB |
| md5:0822b77d7fca930f10aa7f1c6143908a | 1.6 GB |
| md5:fecda8ffb2c0ed8f6ba6f0464d215c63 | 1.6 GB |
| md5:d4cfa8b9c477157a7f701c41f67e5bb8 | 440.7 MB |
| md5:1ad92f39a70ff9bd51930c960b16cf10 | 1.1 GB |
| md5:800138599677ded51423b911ce82a229 | 1.6 GB |
| md5:b281bebe2a8d049a859203f62dd4bca7 | 91.1 MB |
| md5:0cb130fd94ec5ec7fe1952d7f38a8a90 | 1.5 GB |
| md5:193b9ce24d12380015a9456d4aeaf64b | 1.6 GB |
| md5:7a44665ee8da0e22cb00860de4dc685e | 1.7 GB |
| md5:7ab65736db7cc08e5a3f8205e068a59f | 881.6 MB |
| md5:ca6ed776557a890b020f08f0056de1d0 | 860.8 MB |
| md5:800138599677ded51423b911ce82a229 | 1.6 GB |
| md5:237ec56a79bf1b586599e296c9852650 | 1.5 GB |
| md5:dc0de3b2eae26e58159c99ca5fbeb4e2 | 22 Bytes |
| md5:9d8942d68a2ec5c395e88e4ca781cd2b | 782.0 kB |
| md5:4111c3052521784bf73b1597e8c2fa09 | 138.8 kB |
| md5:cdb4ca22621110064f2072ac439c2968 | 997.7 kB |
| md5:ab47c2a2d7eae26661cd8e0f90834dc6 | 142.5 kB |
| md5:f3d607f3e3eb0124989786798ea140fd | 165.1 kB |
| md5:93ccd976900ebfb3312b751179ba82e4 | 41 Bytes |
| md5:0e3fde5e30fb39fdd7c352ca14cd83c5 | 19 Bytes |
| md5:64b0468f469af4bf2fbda6034b018fba | 48 Bytes |
| md5:d4f988f0b532cfb046932c24f6c5ef19 | 106.0 kB |
| md5:09c7f2925cfcaca3008ea52e6daf23ad | 1.2 MB |
| md5:d7cfae96be7fffe230f00cbe2e1204a9 | 107.4 kB |
| md5:74205215256ffc23f32f4bdce3fb1618 | 110.5 kB |
| md5:f090bdd6b52013c4512fbfa30aae834e | 109.7 MB |
| md5:95bb2ba7db7b45abd420ddd3fda87bf0 | 79.4 MB |
| md5:ce2f354954a533378e5795bf8e27653a | 906.8 MB |
| md5:bbac3db52d3e77313b8d4bbf725057a2 | 887.3 MB |
| md5:6ac77a5d8fe2d98fd64429415b7f9d4c | 8.0 kB |
| md5:565cea2c58045f4772227a7fd7c5d021 | 161.3 kB |
| md5:39a5f29f31301168039ef6a1a38b79bd | 175.1 kB |
| md5:0948441bd91e28df145da1594acf053f | 762.4 kB |
| md5:af545496247cca1910cc56bac343e312 | 91.7 kB |
| md5:16b29d929e083759ce41405d49bad5a0 | 4.4 MB |
| md5:795339dd60a6cb5c8d43ed108de1085e | 18.1 MB |
| md5:dd5ea6eebd824c33e55195f487f572df | 16.3 MB |
| md5:569bd522466dc920046d13ce9bfb3694 | 7.1 MB |
| md5:152ad833753abca814c521d837ab5ae7 | 10.2 MB |
| md5:8e1ddff87a84da56b3a85d6511eca380 | 2.2 MB |
| md5:7113412339d2547103b45da39066eba5 | 533.9 kB |
| md5:4fb3fb1acbd7c3c6c94477b1a61c586e | 494.8 kB |
| md5:8d1f8fc1b041f5c093e7111d275a2fb6 | 1.0 MB |
| md5:50e9ea7f113046e35e6c78a6179a2f76 | 511.8 kB |
| md5:7966247282bf2ee5d198d19556139ed0 | 2.5 MB |
| md5:107873620a2c230008260e01d044b764 | 15.5 kB |
| md5:bcb72e84fb7054c8871c3047d200155b | 112.8 kB |
| md5:c462332a8cf105f8185364953756ce6b | 15.4 kB |
| md5:f6986d0363d9d72333ca762dbe369a50 | 112.8 kB |
| md5:b673694c90fb99dd21ec10e0be206f83 | 17.5 MB |
| md5:fe82726a28b8ac69c5181671255367ec | 19.4 MB |
| md5:b3430bb9f7af9a85d1705c922812cdbd | 1.3 MB |
| md5:8bf983d495b74d4ca12a2e1fd65ad6c5 | 435.0 kB |
| md5:194a4eeab741b215e1a0412ae4fdd1c0 | 10.9 kB |
| md5:c03c242a564e40c8105b40889a2e7e6d | 13.0 kB |
| md5:36d802cc1650b8b5eb81493af5985b98 | 34.2 kB |
| md5:98249e8d167075662b3524d5442a2dd3 | 21.6 kB |
| md5:740798c2a873d33c5744fcbdf1f9d78b | 8.9 MB |
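Each file in this record is listed with an `md5:` checksum, which can be used to confirm that a download is complete and uncorrupted. A minimal sketch of such a check in Python follows. The function names are hypothetical, and because the file names are not shown next to the checksums here, the pairing of a local path with a listed checksum is left to the user.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read in chunks so multi-gigabyte FASTA files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a local file against a listed value such as 'md5:abc123...'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing("local_copy.fa", "md5:...")` returns `True` only when the local file is byte-identical to the deposited one; the path and checksum in that call are placeholders.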
Additional details
Software
- Repository URL: https://github.com/Isoris/fClaMac