Published February 18, 2025 | Version 1.0
Journal article Open

Haplotype-Resolved Chromosome-scale Assembly of the Bighead Catfish (Clarias macrocephalus) Genome

Description

Haplotype-Resolved Chromosome-Scale Genome Assembly of the Thai Bighead Catfish (Clarias macrocephalus)

This study presents the first high-quality, chromosome-scale, haplotype-resolved genome assembly of the Bighead catfish (Clarias macrocephalus), a freshwater species native to Thailand and the Mekong River basin. As a species of economic and ecological importance, C. macrocephalus plays a key role in Southeast Asian aquaculture and conservation efforts.

The assembly was generated using a combination of long-read, short-read, and chromatin-conformation sequencing technologies: PacBio HiFi, Oxford Nanopore (ONT), Hi-C, and Illumina paired-end sequencing. The resulting haplotype-resolved diploid genome spans 880 Mb across 27 pseudo-chromosomes, exhibiting high contiguity (N50 = 35.4 Mb), completeness (BUSCO = 95.5%; Merqury k-mer completeness at k = 21 = 96.6%), and base-level accuracy (QV 50, corresponding to 99.999% correctness). The genome was scaffolded using Hi-C chromatin conformation capture data and manually curated, providing a comprehensive reference for future research.
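The contiguity and accuracy figures above follow standard definitions, which the short Python sketch below illustrates. It is not the authors' pipeline: the function names and the toy contig lengths are assumptions made for illustration only. N50 is the length at which sequences of that length or longer cover at least half of the assembly, and a Phred-scaled QV maps to per-base accuracy as 1 - 10^(-QV/10).

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def qv_to_accuracy(qv):
    """Convert a Phred-scaled consensus quality value to per-base accuracy."""
    return 1.0 - 10 ** (-qv / 10.0)


# Toy lengths (illustrative only, not taken from the assembly).
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))             # 40
print(f"{qv_to_accuracy(50):.3%}")  # 99.999%
```

With QV 50 the expected error rate is one base in 100,000, which is where the 99.999% figure comes from.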

This assembly fills a critical gap in genomic resources for the Clarias genus, offering valuable insights into structural variations, genetic diversity, and the effects of selective breeding of C. macrocephalus. The dataset supports applications in comparative genomics, conservation, aquaculture breeding programs, and pan-genome graph construction. Furthermore, it enables research into adaptive traits, such as the species’ benthic lifestyle and facultative air-breathing capability, which allow survival in low-oxygen environments.

Aligned with the United Nations’ Sustainable Development Goal (SDG) 2 (Zero Hunger), this genomic resource contributes to sustainable aquaculture and biodiversity conservation. All sequencing data, genome assemblies, and computational workflows are publicly available under NCBI BioProject number PRJNA1121957, supporting further research in fish genomics, hybridization studies, and genome evolution.

 

📂 Data Records

🐟 Genome Assembly of Thai Bighead Catfish (isolate: CMAM) – Bighead catfish (TaxID: 35657)

📜 Raw Sequenced Reads (NCBI SRA)
🔬 Nanopore (20% error): 🔗 SRR29723575
🧪 HiFi: 🔗 SRR29723576
🖥️ Illumina 150PE: 🔗 SRR29723578
🧲 Hi-C 150PE: 🔗 SRR29723577

🗂️ The assembled genome, deposited as a whole-genome sequence (WGS) diploid assembly

🐠 Haplotype 1 | 🐟 Haplotype 2.

🧬 GenBank accession numbers: 🔗 JBLWMO000000000 | 🔗 JBLWMP000000000.

Data Description (Final Assemblies, usable):

| Step | Description | Tool | Library | Type | Assembly File Name (Output) | File Name Suffix (Output) |
| --- | --- | --- | --- | --- | --- | --- |
| FINAL AND LATEST (NCBI-Submitted) | 🐠 Haplotype 1 | Hifiasm + GreenHill + JBAT + TGS-GapCloser + Polishing + Manual Curation | HiC + UL + HiFi + PE150 | Fully phased, manually reviewed haplotype 1 | fClaMac_1_1.0.fa (NCBI name: Bighead_catfish_fClaMac_hap1_MT.fasta) | 🔗 JBLWMO000000000 |
| FINAL AND LATEST (NCBI-Submitted) | 🐟 Haplotype 2 | Hifiasm + GreenHill + JBAT + TGS-GapCloser + Polishing + Manual Curation | HiC + UL + HiFi + PE150 | Fully phased, manually reviewed haplotype 2 | fClaMac_2_1.0.fa (NCBI name: Bighead_catfish_fClaMac_hap2.fasta) | 🔗 JBLWMP000000000 |
| FINAL AND LATEST | 🐠🐟 Collapsed Assembly (Mixed) | Flye | HiFi | Collapsed diploid assembly | CMAM_FLYE_assembly.fasta | .assembly.fa |

📌 Data records are hosted under NCBI BioProject numbers: 🔗 PRJNA1132508 (WGS), PRJNA1159889 (Hap1), PRJNA1159890 (Hap2)
📌 Bighead Catfish BioSample accession number: 🔗 SAMN42347118
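A downloaded assembly can be checked against the reported sequence count and total size by summing sequence lengths in the FASTA file. The sketch below is a minimal illustration, not part of the authors' workflow: `fasta_lengths` is a hypothetical helper, and the in-memory toy record stands in for a real file such as `fClaMac_1_1.0.fa` (whose local path on your system is an assumption).

```python
import io


def fasta_lengths(handle):
    """Return {sequence_name: length} for a FASTA stream.

    The name is the first whitespace-delimited token after '>'.
    """
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy input (illustrative only); for a real file use: open(path) as handle.
toy = io.StringIO(">chr1 demo\nACGT\nAC\n>chr2\nGGG\n")
lengths = fasta_lengths(toy)
print(len(lengths), sum(lengths.values()))  # 2 9
```

For a real haplotype file, the number of long sequences and the summed length can then be compared with the 27 pseudo-chromosomes and the assembly size stated in the description.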

 

Other assemblies (Intermediate Files):

| Step | Description | Tool | Library | Type | Assembly File Name (Output) | File Name Suffix (Output) |
| --- | --- | --- | --- | --- | --- | --- |
| Primary Initial Assemblies | | | | | | |
| 1 | Haplotype 1 | Hifiasm | HiC + UL + HiFi | Fully phased haplotype 1 | CMA.asm.hic.hap1.p_ctg.fa | .hic.hap1.p_ctg.fa |
| 1 | Haplotype 2 | Hifiasm | HiC + UL + HiFi | Fully phased haplotype 2 | CMA.asm.hic.hap2.p_ctg.fa | .hic.hap2.p_ctg.fa |
| Scaffolding and Intermediate Assemblies (Hifiasm and GreenHill) | | | | | | |
| 1 | Scaffolds | Hifiasm | HiC + UL + HiFi | Primary scaffolding | CMA.asm.hic.p_ctg.fa | .hic.p_ctg.fa |
| 1 | Scaffolds | Hifiasm | HiC + UL + HiFi | Processed unitigs | CMA.asm.hic.p_utg.fa | .hic.p_utg.fa |
| 1 | Scaffolds | Hifiasm | HiC + UL + HiFi | Raw unitigs | CMA.asm.hic.r_utg.fa | .hic.r_utg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Phased haplotype 1 | CMA_HIC_UL_l0.asm.hic.hap1.p_ctg.fa | .hic.hap1.p_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Phased haplotype 2 | CMA_HIC_UL_l0.asm.hic.hap2.p_ctg.fa | .hic.hap2.p_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Primary contigs | CMA_HIC_UL_l0.asm.hic.p_ctg.fa | .hic.p_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Alternate contigs | CMA_HIC_UL_l0.asm.hic.a_ctg.fa | .hic.a_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Raw unitigs | CMA_HIC_UL_l0.asm.hic.r_utg.fa | .hic.r_utg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Processed unitigs | CMA_HIC_UL_l0.asm.hic.p_utg.fa | .hic.p_utg.fa |
| 1 | Scaffolds | Hifiasm | HiFi + UL | Primary contigs | CMA_HIFI.asm.p_ctg.fa | .p_ctg.fa |
| 1 | Scaffolds | Hifiasm | HiFi + UL | Alternate contigs | CMA_HIFI.asm.a_ctg.fa | .a_ctg.fa |
| 1 | Scaffolds | Hifiasm | HiFi + UL | Raw unitigs | CMA_HIFI.asm.r_utg.fa | .r_utg.fa |
| 1 | Scaffolds | Hifiasm | HiFi + UL | Processed unitigs | CMA_HIFI.asm.p_utg.fa | .p_utg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiFi | Primary contigs L0 | CMA_HIFI_l0.asm.p_ctg.fa | .p_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiFi | Alternate contigs L0 | CMA_HIFI_l0.asm.a_ctg.fa | .a_ctg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiFi | Raw unitigs | CMA_HIFI_l0.asm.r_utg.fa | .r_utg.fa |
| 1 | Scaffolds L0 | Hifiasm | HiFi | Polished unitigs containing Hap1 and Hap2 | CMA_HIFI_l0.asm.p_utg.fa | .p_utg.fa |
| 2 | Scaffolds | GreenHill | Hap1 | Hifiasm hap1 phased & scaffolded with GreenHill | 02-CMA_HAP1.greenhill.fa | NA |
| 2 | Scaffolds | GreenHill | Hap2 | Hifiasm hap2 phased & scaffolded with GreenHill | 02-CMA_HAP2.greenhill.fa | NA |
| Failed Assemblies (Wtdbg2 - Not Used) | | | | | | |
| 1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi | raw Consensus contigs | CM_M_dbg.hifi.raw.fa | .raw.fa |
| 1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi ONT | raw Consensus contigs | CM_M_dbg.cb.raw.fa | .raw.fa |
| 1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi | cns Consensus contigs | CM_M_dbg.hifi_cns.fa | .cns.fa |
| 1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi ONT | cns Consensus contigs | CM_M_dbg.cb_cns.fa | .cns.fa |
| 1 | Consensus Assembly | Wtdbg2 (failed low QV) | HiFi | Polished consensus | CM_M_dbg.hifi.srp.fa | .srp.fa |
| 1 | Consensus Assembly | Wtdbg2 (failed low QV) | HiFi ONT | Polished consensus | CM_M_dbg.cb.srp.fa | .srp.fa |
* L0 means that no purging of false duplications was performed, so these assemblies are expected to be larger than their purged counterparts.
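The file names in the intermediate-assembly table follow a regular suffix pattern, so they can be sorted into categories programmatically. The sketch below is an illustrative assumption rather than part of the published workflow: the suffix-to-label mapping is taken from the table rows above, while the `describe` helper and the `_l0.` test for unpurged assemblies are hypothetical conveniences based on the file names listed here.

```python
# Suffix -> label, following the "Type" column of the table above.
# More specific suffixes come first so that ".hap1.p_ctg.fa" is not
# mistaken for a plain ".p_ctg.fa".
SUFFIX_LABELS = [
    (".hap1.p_ctg.fa", "phased haplotype 1 contigs"),
    (".hap2.p_ctg.fa", "phased haplotype 2 contigs"),
    (".p_ctg.fa", "primary contigs"),
    (".a_ctg.fa", "alternate contigs"),
    (".r_utg.fa", "raw unitigs"),
    (".p_utg.fa", "processed unitigs"),
]


def describe(filename):
    """Return (label, is_l0) for a Hifiasm output name, or None if unrecognized.

    is_l0 is True when the name carries the '_l0.' tag used in this record
    for assemblies built without purging of false duplications.
    """
    for suffix, label in SUFFIX_LABELS:
        if filename.endswith(suffix):
            return label, "_l0." in filename
    return None


for name in (
    "CMA_HIC_UL_l0.asm.hic.hap1.p_ctg.fa",
    "CMA_HIFI.asm.a_ctg.fa",
    "02-CMA_HAP1.greenhill.fa",
):
    print(name, "->", describe(name))
```

GreenHill and Wtdbg2 outputs do not use these suffixes, so the helper returns `None` for them.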
 

Technical validation (to be done):

| Step | Description | Tool | Library | Type | Assembly File Name (Output) | File Name Suffix (Output) |
| --- | --- | --- | --- | --- | --- | --- |

 

Knowledge Dissemination:

| Object | Description | Link / File |
| --- | --- | --- |
| Manuscript | Presentation and Interpretation of Results (Version 1.0). | Bighead_catfish_C_macrocephalus_MS_draft_ver_1.pdf |
| Figure 1 | Sequencing Data Summary for the C. macrocephalus Genome Experiment. | Figure_1_SEQUENCING_READS_AND_GENOMESCOPE2.0.png |
| Figure 2 | Comprehensive Haplotype-Resolved Genome Assembly and Scaffolding Workflow. | Figure_2_GENOME_ASSEMBLY_WORKFLOW.png |
| Figure 3 | Hi-C Contact Matrix Heat Maps of Individual Pseudo-chromosomes in Haplotype 1. | Figure_3_SEPARATE_HIC_MAPS_HAPLOTYPE_1_all.pdf |
| Figure 4 | Hi-C Contact Matrix Heat Maps of Individual Pseudo-chromosomes in Haplotype 2. | Figure_4_SEPARATE_HIC_MAPS_HAPLOTYPE_2_all.pdf |
| Figure 5 | Hi-C Map of Hi-C Scaffolds, Bighead Catfish. | Figure_5_GENOME_WIDE_HIC_MAPS_HAPLOTYPE_1_AND_2.png |
| Figure 6 | Assembly Status Displaying Gaps and Telomeres, January 2024 - November 2025. | Figure_6_BIGHEAD_CATFISH_MANUAL_CURATION_PROGRESS_HAP1_HAP2.png |
| Figure 7 | Visual Genome Quality, Merqury Spectra and BUSCO Scores. | |
| Figure 8 | Synteny Analysis of Linkage Groups for Various Catfish Assemblies. | Figure_8_SYNTENIC_RELATIONSHIPS_TILAPIA_BIGHEAD_ZEBRAFISH.png |
| Table 1 | Summary Statistics of the Genome Assembly and Transposable Element Content. | Table_1_GENOME_SURVEY_GENOME_SUMMARY_AND_TRANSPOSABLB_ELEMENT_CONTENT.xlsx |
| Table 2 | Summary of Individual Scaffold Metrics in the Haplotype-resolved Assembly. | Table_2_BIGHEAD_CATFISH_SUMMARY_STATISTICS_PER_SCAFFOLD_QV_S-AQI-PCT.xlsx |
| Table S1 | Summary Statistics of Additional Assemblies. | Table_S1_SUMMARY_STATISTICS_OF_BIGHEAD_CATFISH_ASSEMBLIES_BUSCO_CRAQ_MERQURY.xlsx |
| Table S2 | List of Software and Their Versions. | Table_S2_LIST_OF_TOOLS_USED_FOR_BIGHEAD_CATFISH_ASSEMBLY.xlsx |
| Figure S1 | Assembly Graph (GFA), Hifiasm Primary Phased Contigs, Visualized in Bandage. | Supplementary_Figure_1_BANDAGE_ASSEMBLY_GRAPH_HIFIASM_HiC_UL_P_CTGS_HAP1_HAP2.png |
| Figure S2 | mtDNA Alignments in 210 Siluriformes Species Including Bighead Catfish. | Supplementary_Figure_2_mt_DNA_210_SPECIES_COMPARISON.png |
| Overleaf Project | A .zip Archive Containing the Manuscript and All Figures and Tables, Including Technical Validation Files. | Bighead_catfish_C_macrocephalus_MS_draft_ver.1.zip |

 

Files

Bighead_catfish_C_macrocephalus_MS_draft_ver_1.pdf

Files (41.6 GB)

| Checksum | Size |
| --- | --- |
| md5:f113ff9ec654c073036709bda504c7b9 | 925.4 MB |
| md5:af79bae25e6460570d54b7cccf09cfc7 | 925.4 MB |
| md5:99528d8faa2f97d2aa2606f892d0584b | 865.6 MB |
| md5:54236cb6cce26a86a66600b106dd7076 | 881.8 kB |
| md5:56b7c6e3214adcbe089c768d9a13d00b | 117.3 MB |
| md5:19580ced3bbab45561e3ad3fb3edd0ac | 47.1 MB |
| md5:bd9f6388f23cf1e3d7b259af915d5938 | 887.2 MB |
| md5:f04d4af6c93b61f24693018f1f39d4b1 | 1.4 kB |
| md5:76b1dfb3b409b6419eeca661ed4c01f2 | 893.4 MB |
| md5:400d07980ad494cd4d045e66076d0ff9 | 1.4 kB |
| md5:b404213f9104b81439f499f234f8161b | 861.2 MB |
| md5:abbe6522bc1d3f4c7d7845335a94575e | 848.4 MB |
| md5:8da40c8007c3efb69e924601804c0ac3 | 859.3 MB |
| md5:af257c52b9c9614496417aa518f44bf5 | 861.8 MB |
| md5:2a39ff0bc81a0324e27368186db7dd2a | 841.8 MB |
| md5:3ca46ec3b85b534c65c18c06f8bbacdb | 852.8 MB |
| md5:7ab65736db7cc08e5a3f8205e068a59f | 881.6 MB |
| md5:ca6ed776557a890b020f08f0056de1d0 | 860.8 MB |
| md5:7b2d604e7e8701f37cd74612ccced110 | 1.1 GB |
| md5:440cd2db09462d85678fd1a686e86d0f | 116.7 MB |
| md5:76d899d44adccaf89e00539c9156926d | 1.4 GB |
| md5:b4c09c3ba9cee135fb8af4facfdda645 | 1.4 GB |
| md5:8aa0d3a6816da3705b17984c65e06bba | 1.5 GB |
| md5:193b9ce24d12380015a9456d4aeaf64b | 1.6 GB |
| md5:a58c2a77a34589e2c78240da50b9daad | 323.3 MB |
| md5:1df61fccf99ac5177c278e834d0bdadc | 1.2 GB |
| md5:27fb3c86b3c95541f75ba45653f24edf | 1.6 GB |
| md5:4087078087c128cf68a3fd3471eff714 | 52.8 MB |
| md5:2517825166e075ac9f0c1f351217f158 | 1.5 GB |
| md5:0822b77d7fca930f10aa7f1c6143908a | 1.6 GB |
| md5:fecda8ffb2c0ed8f6ba6f0464d215c63 | 1.6 GB |
| md5:d4cfa8b9c477157a7f701c41f67e5bb8 | 440.7 MB |
| md5:1ad92f39a70ff9bd51930c960b16cf10 | 1.1 GB |
| md5:800138599677ded51423b911ce82a229 | 1.6 GB |
| md5:b281bebe2a8d049a859203f62dd4bca7 | 91.1 MB |
| md5:0cb130fd94ec5ec7fe1952d7f38a8a90 | 1.5 GB |
| md5:193b9ce24d12380015a9456d4aeaf64b | 1.6 GB |
| md5:7a44665ee8da0e22cb00860de4dc685e | 1.7 GB |
| md5:7ab65736db7cc08e5a3f8205e068a59f | 881.6 MB |
| md5:ca6ed776557a890b020f08f0056de1d0 | 860.8 MB |
| md5:800138599677ded51423b911ce82a229 | 1.6 GB |
| md5:237ec56a79bf1b586599e296c9852650 | 1.5 GB |
| md5:dc0de3b2eae26e58159c99ca5fbeb4e2 | 22 Bytes |
| md5:9d8942d68a2ec5c395e88e4ca781cd2b | 782.0 kB |
| md5:4111c3052521784bf73b1597e8c2fa09 | 138.8 kB |
| md5:cdb4ca22621110064f2072ac439c2968 | 997.7 kB |
| md5:ab47c2a2d7eae26661cd8e0f90834dc6 | 142.5 kB |
| md5:f3d607f3e3eb0124989786798ea140fd | 165.1 kB |
| md5:93ccd976900ebfb3312b751179ba82e4 | 41 Bytes |
| md5:0e3fde5e30fb39fdd7c352ca14cd83c5 | 19 Bytes |
| md5:64b0468f469af4bf2fbda6034b018fba | 48 Bytes |
| md5:d4f988f0b532cfb046932c24f6c5ef19 | 106.0 kB |
| md5:09c7f2925cfcaca3008ea52e6daf23ad | 1.2 MB |
| md5:d7cfae96be7fffe230f00cbe2e1204a9 | 107.4 kB |
| md5:74205215256ffc23f32f4bdce3fb1618 | 110.5 kB |
| md5:f090bdd6b52013c4512fbfa30aae834e | 109.7 MB |
| md5:95bb2ba7db7b45abd420ddd3fda87bf0 | 79.4 MB |
| md5:ce2f354954a533378e5795bf8e27653a | 906.8 MB |
| md5:bbac3db52d3e77313b8d4bbf725057a2 | 887.3 MB |
| md5:6ac77a5d8fe2d98fd64429415b7f9d4c | 8.0 kB |
| md5:565cea2c58045f4772227a7fd7c5d021 | 161.3 kB |
| md5:39a5f29f31301168039ef6a1a38b79bd | 175.1 kB |
| md5:0948441bd91e28df145da1594acf053f | 762.4 kB |
| md5:af545496247cca1910cc56bac343e312 | 91.7 kB |
| md5:16b29d929e083759ce41405d49bad5a0 | 4.4 MB |
| md5:795339dd60a6cb5c8d43ed108de1085e | 18.1 MB |
| md5:dd5ea6eebd824c33e55195f487f572df | 16.3 MB |
| md5:569bd522466dc920046d13ce9bfb3694 | 7.1 MB |
| md5:152ad833753abca814c521d837ab5ae7 | 10.2 MB |
| md5:8e1ddff87a84da56b3a85d6511eca380 | 2.2 MB |
| md5:7113412339d2547103b45da39066eba5 | 533.9 kB |
| md5:4fb3fb1acbd7c3c6c94477b1a61c586e | 494.8 kB |
| md5:8d1f8fc1b041f5c093e7111d275a2fb6 | 1.0 MB |
| md5:50e9ea7f113046e35e6c78a6179a2f76 | 511.8 kB |
| md5:7966247282bf2ee5d198d19556139ed0 | 2.5 MB |
| md5:107873620a2c230008260e01d044b764 | 15.5 kB |
| md5:bcb72e84fb7054c8871c3047d200155b | 112.8 kB |
| md5:c462332a8cf105f8185364953756ce6b | 15.4 kB |
| md5:f6986d0363d9d72333ca762dbe369a50 | 112.8 kB |
| md5:b673694c90fb99dd21ec10e0be206f83 | 17.5 MB |
| md5:fe82726a28b8ac69c5181671255367ec | 19.4 MB |
| md5:b3430bb9f7af9a85d1705c922812cdbd | 1.3 MB |
| md5:8bf983d495b74d4ca12a2e1fd65ad6c5 | 435.0 kB |
| md5:194a4eeab741b215e1a0412ae4fdd1c0 | 10.9 kB |
| md5:c03c242a564e40c8105b40889a2e7e6d | 13.0 kB |
| md5:36d802cc1650b8b5eb81493af5985b98 | 34.2 kB |
| md5:98249e8d167075662b3524d5442a2dd3 | 21.6 kB |
| md5:740798c2a873d33c5744fcbdf1f9d78b | 8.9 MB |
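
Because several files in this record exceed 1 GB, it is worth confirming that a download is intact by comparing its MD5 digest with the `md5:` value listed above. The sketch below is one possible way to do this with Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical, and the demo runs on a throwaway temporary file rather than on a file from this record.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5 in 1 MiB chunks so large files are never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file with a listed value of the form 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Self-contained demo on a temporary file (hypothetical content, not a record file).
payload = b">demo\nACGT\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

listed_value = "md5:" + hashlib.md5(payload).hexdigest()
print(matches_listing(demo_path, listed_value))  # True
os.remove(demo_path)
```

For a real download, pass the local file path and the corresponding `md5:` string from the listing to `matches_listing`.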

Additional details

Software