Code for SAMap comparison of S. canicula developing telencephalon to mouse and salamander

Papadopoulos, Nikolaos

doi:10.5281/zenodo.15233988

Published April 17, 2025 | Version v0.1.0

Computational notebook | Embargoed

Papadopoulos, Nikolaos¹

1. University of Vienna

Contributors

Contact person:

Quintana Urzainqui, Idoia²

Researcher:

Papadopoulos, Nikolaos¹

1. University of Vienna
2. EMBL

scan-scrnaseq

Contributed code and analysis for the S. canicula snRNA-seq project. Presented in rough order of use.

1. `scripts/`

Scripts meant to run non-interactively. Mostly concerned with pre-processing and mapping of the single-nuclei datasets from the developing shark telencephalon. Presented here in order of execution.

convert_gtf.sh: convert the S. canicula GFF genome annotation to GTF format for ease of use.
filter_gtf.py: manual editing of the S. canicula GTF file (genome annotation) to add gene IDs to the gene definitions. This was done to facilitate the generation of a 10X mapping index.
mkref_10x.sh: create a mapping reference for CellRanger.
count_SN036.sh, count_SN052.sh, count_SN053.sh: map and demultiplex single-nuclei data against the S. canicula reference using the CellRanger pipeline.
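
The kind of edit that filter_gtf.py is described as making can be sketched as follows. This is a minimal illustration and not the repository's script: it assumes that gene records carry their identifier in an `ID "..."` attribute and lack `gene_id`, and the function name `add_gene_id` and the `source_key` parameter are hypothetical.

```python
import re


def add_gene_id(gtf_line, source_key="ID"):
    """Add a gene_id attribute to a GTF gene record that lacks one.

    Assumption (not taken from the repository): the identifier to reuse is
    stored in the attribute named by `source_key`.
    """
    if gtf_line.startswith("#"):
        return gtf_line  # header or comment lines pass through unchanged
    ending = "\n" if gtf_line.endswith("\n") else ""
    fields = gtf_line.rstrip("\n").split("\t")
    # GTF records have nine tab-separated columns; column 3 is the feature type.
    if len(fields) != 9 or fields[2] != "gene":
        return gtf_line
    attributes = fields[8]
    if re.search(r'\bgene_id "', attributes):
        return gtf_line  # already has a gene_id, nothing to do
    match = re.search(r"\b" + re.escape(source_key) + r' "([^"]+)"', attributes)
    if match is None:
        return gtf_line  # no usable identifier, leave the record untouched
    fields[8] = f'gene_id "{match.group(1)}"; {attributes}'
    return "\t".join(fields) + ending
```

Applied line by line to the converted GTF file, such a function leaves transcript and exon records untouched and only rewrites gene records.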

2. `samap/`

Code for cross-species comparison between the developing S. canicula telencephalon (this study), M. musculus cerebral cortex (Moreau et al. 2021, Development), and P. waltl telencephalon (Woych et al. 2022, Science).

2.1 `prepare/`

Before the SAMap comparison can be performed, the single-cell objects need to be converted to the .h5ad (AnnData) format. This is achieved with sceasy (v0.0.7; see its documentation online). This conversion has to be done manually.

After the conversion, we edit the AnnData objects so that they connect cleanly to the BLAST graph for SAMap: we add the abbreviated species identifier (mmus, pwal, or scan) to the cell barcodes and feature (gene) IDs, replace dashes with underscores, add EggNOG-mapper annotations for the genes, and connect the gene IDs to protein IDs where needed. This step can be performed automatically with make prepare from the base directory.
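
The renaming part of this step can be illustrated with a small sketch. It is not the repository's code: the helper name `tag_ids` and the underscore separator between species identifier and ID are assumptions made for illustration.

```python
def tag_ids(ids, species):
    """Prefix each ID with the species identifier and replace dashes.

    Assumption: the species identifier is joined to the ID with an
    underscore; the separator actually used in the repository may differ.
    """
    return [f"{species}_{identifier.replace('-', '_')}" for identifier in ids]


# With an AnnData object named `adata` (hypothetical variable), the result
# would be assigned back to the barcode and gene indices, for example:
#   adata.obs_names = tag_ids(adata.obs_names, "scan")
#   adata.var_names = tag_ids(adata.var_names, "scan")
```

Using the same helper for barcodes and gene IDs keeps the naming identical on both axes, which is what lets the gene IDs be matched against the alignment results later.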

2.2 `align/`

SAMap relies on a so-called "BLAST graph" that connects genes across species in the single-cell objects via their sequence similarity scores. Any sequence aligner can be used instead of BLAST, provided its output has the same tabular format (same columns). We used MMseqs2 to calculate pairwise alignments of the proteomes of the three species against each other, using otherwise default parameters. This step can be performed automatically with make align from the base directory.
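
To make "same columns" concrete, the sketch below parses one line of the 12-column BLAST tabular layout. That this is the layout of the alignment files is an assumption here; the column labels and the function name `parse_hit` are chosen for illustration only.

```python
# Column order of the 12-column BLAST tabular layout (labels are illustrative).
BLAST_TAB_COLUMNS = (
    "query", "target", "identity", "aln_length", "mismatches", "gap_openings",
    "q_start", "q_end", "t_start", "t_end", "evalue", "bitscore",
)
INT_COLUMNS = ("aln_length", "mismatches", "gap_openings",
               "q_start", "q_end", "t_start", "t_end")
FLOAT_COLUMNS = ("identity", "evalue", "bitscore")


def parse_hit(line):
    """Turn one tab-separated alignment line into a dictionary of typed values."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(BLAST_TAB_COLUMNS):
        raise ValueError(f"expected {len(BLAST_TAB_COLUMNS)} columns, got {len(values)}")
    hit = dict(zip(BLAST_TAB_COLUMNS, values))
    for key in INT_COLUMNS:
        hit[key] = int(hit[key])
    for key in FLOAT_COLUMNS:
        hit[key] = float(hit[key])
    return hit
```

A check like this is a cheap way to confirm that the output of an alternative aligner has the expected shape before it is handed to SAMap.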

align.sh is the worker script that will run mmseqs easy-search.
align_all.py is the manager script that will go over the working directory and identify all peptide or transcript files that need to be aligned, then formulate and run the jobs on the command line.
adjust.py goes over the alignment results and fixes formatting problems that are unique to each species' gene annotation. This script also ensures that the IDs in the alignment result files match those in the AnnData objects.
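
The manager/worker split described above can be sketched as a function that formulates one `mmseqs easy-search` call per ordered species pair. This is an illustration under stated assumptions, not the logic of align_all.py: the file names, the output naming scheme, the directory names, and the choice to align every pair in both directions are all hypothetical.

```python
from itertools import permutations


def alignment_jobs(proteomes, out_dir="maps", tmp_dir="tmp"):
    """Build one `mmseqs easy-search` command per ordered species pair.

    `proteomes` maps a species identifier to its peptide FASTA file.
    The output naming scheme is an assumption made for this sketch.
    """
    jobs = []
    for (query, query_file), (target, target_file) in permutations(proteomes.items(), 2):
        result = f"{out_dir}/{query}_to_{target}.txt"
        # Positional arguments: query, target, result file, temporary directory.
        jobs.append(["mmseqs", "easy-search", query_file, target_file, result, tmp_dir])
    return jobs
```

Each returned command is a list of arguments that could be passed to `subprocess.run(job, check=True)`; three species yield six jobs under these assumptions.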

2.3 `samap/`

Once the AnnData objects and the alignments are in place, the SAMap calculations can be performed. This step can be performed automatically with make compare from the base directory.

pairwise.py is the worker script that will perform the pairwise comparison between two objects.
compare_all.py is the manager script that will formulate and run the individual jobs on the command line.

After the calculations are completed, the final analysis and plotting are performed in the notebook collate.ipynb. Here, the SAMap similarity matrices for the shark-to-mouse and shark-to-salamander comparisons (direction matters) are collated, and the resulting similarity matrix is visualised. This step has to be completed manually.
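
What collating the two comparisons means can be sketched with plain dictionaries. This is not the code of collate.ipynb: it assumes each similarity matrix is available as a mapping from shark cluster to a mapping of target clusters and scores, and that target cluster names carry their species identifier so they do not collide. The function name `collate` is hypothetical.

```python
def collate(shark_to_mouse, shark_to_salamander):
    """Join two similarity tables on their shared shark cluster rows.

    Each input maps a shark cluster name to {target cluster name: score}.
    Assumption: target names are unique across the two tables.
    """
    collated = {}
    for cluster in sorted(set(shark_to_mouse) | set(shark_to_salamander)):
        row = {}
        row.update(shark_to_mouse.get(cluster, {}))
        row.update(shark_to_salamander.get(cluster, {}))
        collated[cluster] = row
    return collated
```

The result has one row per shark cluster and one column per mouse or salamander cluster, which is the shape a heatmap of the collated similarity matrix would need.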

Files

Embargoed

The files will be made publicly available on December 30, 2026.

Reason: until manuscript submission

Additional details

Is supplement to: Dataset: 10.5281/zenodo.15234060 (DOI)

Repository URL: http://git.embl.de/npapadop/scan-scrnaseq
Programming language: Python, Shell, Jupyter Notebook, Markdown
Development Status: Inactive
