Published June 23, 2025 | Version 1.0.0
Dataset Open

parsomics Reference Dataset

Description

parsomics is a local-first data management tool designed to efficiently organize large volumes of data from prokaryotic metagenomics workflows. parsomics automatically parses the diverse file formats produced throughout typical metagenomics workflows and integrates them into the parsomics Local Relational Database (pLRDB), a standardized, cohesive, queryable representation of the full workflow. parsomics was developed by researchers from the Brazilian National Biorenewables Laboratory, within the Brazilian Center for Research in Energy and Materials (CNPEM).

This reference dataset was built from public data from a metagenomics study conducted at CNPEM, which analyzed soil samples covered with sugarcane bagasse that has been maintained for decades in a biorefinery in the state of São Paulo, Brazil (doi:10.1038/s41586-024-08553-z). The dataset contains files produced by multiple bioinformatics tools whose output formats are supported by parsomics. The project's documentation includes a detailed specification of the supported file formats, as well as a general usage guide.

Files

Files (38.1 MB)

Name: parsomics-reference-data.zip
Size: 38.1 MB
md5: 8865a40bde5db0bebf366aef846e62ae
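
The md5 checksum listed for the archive can be used to confirm that a download completed intact. The following is a minimal sketch using only Python's standard library; it assumes the archive was saved to the current working directory under its published name, and the helper names (`md5_of_file`, `verify_download`) are illustrative and not part of parsomics.

```python
import hashlib
from pathlib import Path

# Checksum as published on this record page.
EXPECTED_MD5 = "8865a40bde5db0bebf366aef846e62ae"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path: the archive saved under its published file name.
    archive = Path("parsomics-reference-data.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates a truncated or corrupted download, in which case the archive should be fetched again before it is unpacked.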

Additional details

Related works

Has part
Publication: 10.1038/s41586-024-08553-z (DOI)

Funding

Fundação de Amparo à Pesquisa do Estado de São Paulo
G.F.P. 22/03059-5

Software

Repository URL
https://gitlab.com/parsomics/
Programming language
Python
Development Status
Active