Code for SAMap comparison of S. canicula developing telencephalon to mouse and salamander
Description
scan-scrnaseq
Contributed code and analysis for the S. canicula snRNA-seq project. Presented in rough order of use.
1. scripts/
Scripts meant to run non-interactively. Mostly concerned with pre-processing and mapping of the single-nuclei datasets from the developing shark telencephalon. Presented here in order of execution.
- convert_gtf.sh: convert the S. canicula GFF genome annotation to GTF format for ease of use.
- filter_gtf.py: manual editing of the S. canicula GTF file (genome annotation) to add gene IDs to the gene definitions. This was done to facilitate the generation of a 10X mapping index.
- mkref_10x.sh: create a mapping reference for CellRanger.
- count_SN036.sh, count_SN052.sh, count_SN053.sh: map and demultiplex single-nuclei data against the S. canicula reference using the CellRanger pipeline.
2. samap/
Code for cross-species comparison between the developing S. canicula telencephalon (this study), M. musculus cerebral cortex (Moreau et al. 2021, Development), and P. waltl telencephalon (Woych et al. 2022, Science).
2.1 prepare/
Before the SAMap comparison can be performed, objects need to be converted to .h5ad. This is achieved with sceasy (v0.0.7; see documentation online). This conversion has to be done manually.
After the conversion, we edit the AnnData objects to include the abbreviated species identifier (mmus, pwal, and scan, respectively) in the cell barcodes and feature (gene) IDs, replace dashes with underscores, add EggNOG-mapper annotations for the genes, and connect the gene IDs to protein IDs where needed, to facilitate connection to the BLAST graph for SAMap. This step can be performed automatically with make prepare from the base directory.
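The ID clean-up described above can be sketched in a few lines. This is a minimal illustration, not the code in prepare/: the helper name `tag_ids`, the underscore separator between species tag and ID, and the example barcodes and gene IDs are all assumptions made for the example.

```python
def tag_ids(ids, species):
    """Replace dashes with underscores and prefix each ID with a species tag.

    The "<species>_" prefix format is an assumption for this sketch.
    """
    prefix = f"{species}_"
    tagged = []
    for raw in ids:
        clean = raw.replace("-", "_")
        # Skip IDs that already carry the tag, so the function can be re-run safely.
        if not clean.startswith(prefix):
            clean = prefix + clean
        tagged.append(clean)
    return tagged


# Made-up barcodes and gene IDs, for illustration only.
barcodes = ["AAACCCAAGT-1", "AAACGGGTCA-1"]
genes = ["LOC000001", "gene-abc"]

print(tag_ids(barcodes, "scan"))
print(tag_ids(genes, "scan"))
```

With an AnnData object, the returned lists would be assigned to `adata.obs_names` (cell barcodes) and `adata.var_names` (gene IDs).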
2.2 align/
SAMap relies on a so-called "BLAST graph" that connects genes across species in the single-cell objects via their sequence similarity scores. Any sequence aligner can be used instead of BLAST, provided it writes its results in the same tabular format (same columns in the same order). We used MMseqs2 to calculate pairwise alignments of the proteomes of the three species against each other, using otherwise default parameters. This step can be performed automatically with make align from the base directory.
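To make the "same columns" requirement concrete, here is a small sketch that reads an alignment table and rejects rows that do not have the 12 tab-separated fields of BLAST tabular output, the layout that `mmseqs easy-search` also writes by default. The column names, the function name `read_alignment_table`, and the example hit are assumptions for illustration; they are not taken from the repository.

```python
import csv
import io

# Labels for the 12 fields of BLAST-style tabular output (names chosen for this sketch).
BLAST_TAB_COLUMNS = [
    "query", "target", "identity", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bitscore",
]


def read_alignment_table(handle):
    """Parse tab-separated hits and fail loudly on rows with the wrong column count."""
    rows = []
    for lineno, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != len(BLAST_TAB_COLUMNS):
            raise ValueError(
                f"line {lineno}: expected {len(BLAST_TAB_COLUMNS)} columns, got {len(fields)}"
            )
        row = dict(zip(BLAST_TAB_COLUMNS, fields))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        rows.append(row)
    return rows


# A single made-up hit, for illustration only.
example = io.StringIO(
    "scan_p1\tmmus_p7\t0.612\t230\t85\t3\t1\t228\t5\t231\t1.2e-80\t251\n"
)
hits = read_alignment_table(example)
print(hits[0]["query"], hits[0]["target"], hits[0]["bitscore"])
```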
- align.sh is the worker script that will run mmseqs easy-search.
- align_all.py is the manager script that will go over the working directory and identify all peptide or transcript files that need to be aligned, then formulate and run the jobs on the command line.
- adjust.py goes over the alignment results and fixes formatting problems that are unique to each species' gene annotation. This script also ensures that the IDs in the alignment result files match those in the AnnData objects.
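The manager/worker split can be illustrated with a sketch that only formulates the jobs. It assumes one proteome FASTA per species and that every ordered species pair is aligned; the file names, the `maps/` output directory, and the helper `build_search_commands` are hypothetical and do not reflect the layout used by align_all.py. The commands follow the `mmseqs easy-search <query> <target> <result> <tmpDir>` usage and are printed rather than executed.

```python
from itertools import permutations


def build_search_commands(proteomes, out_dir="maps", tmp_dir="tmp"):
    """Return one `mmseqs easy-search` command per ordered species pair."""
    commands = []
    for query, target in permutations(sorted(proteomes), 2):
        # Hypothetical naming scheme for the result files.
        result = f"{out_dir}/{query}_to_{target}.tsv"
        commands.append(
            ["mmseqs", "easy-search", proteomes[query], proteomes[target], result, tmp_dir]
        )
    return commands


# Hypothetical proteome file names keyed by the species identifiers.
proteomes = {"mmus": "mmus.pep.fa", "pwal": "pwal.pep.fa", "scan": "scan.pep.fa"}

for cmd in build_search_commands(proteomes):
    print(" ".join(cmd))
```

Each command list could then be handed to `subprocess.run(cmd, check=True)` to launch the job.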
2.3 samap/
Once the objects are prepared and the alignments are available, the SAMap calculations can be performed. This step can be performed automatically with make compare from the base directory.
- pairwise.py is the worker script that will perform the pairwise comparison between two objects.
- compare_all.py is the master script that will formulate and run the individual jobs on the command line.
After the calculations are completed, the final analysis and plotting are performed in the notebook collate.ipynb. Here, the SAMap similarity matrices for the shark-to-mouse and shark-to-salamander comparisons (direction matters) are collated, and the resulting similarity matrix is visualised. This step has to be completed manually.
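As a rough idea of what collating the two comparisons can look like, the sketch below joins two score tables on their shared shark clusters. It is not the code in collate.ipynb: the nested-dictionary representation, the function name `collate`, the cluster names, and the scores are all made up for the example.

```python
def collate(shark_to_mouse, shark_to_salamander):
    """Join two {shark_cluster: {target_cluster: score}} tables on the shark clusters."""
    combined = {}
    for species, table in (("mmus", shark_to_mouse), ("pwal", shark_to_salamander)):
        for shark_cluster, scores in table.items():
            row = combined.setdefault(shark_cluster, {})
            for target_cluster, score in scores.items():
                # Prefix target clusters with the species tag to keep columns unique.
                row[f"{species}_{target_cluster}"] = score
    return combined


# Made-up cluster names and scores, for illustration only.
to_mouse = {"scan_c1": {"RG": 0.8, "IP": 0.1}, "scan_c2": {"RG": 0.2, "IP": 0.6}}
to_salamander = {"scan_c1": {"EG": 0.7}, "scan_c2": {"EG": 0.3}}

for cluster, row in collate(to_mouse, to_salamander).items():
    print(cluster, row)
```

The combined table has one row per shark cluster and one column per mouse or salamander cluster, which is a convenient shape for a heatmap.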
Additional details
Related works
- Is supplement to: Dataset, 10.5281/zenodo.15234060 (DOI)
Software
- Repository URL: http://git.embl.de/npapadop/scan-scrnaseq
- Programming language: Python, Shell, Jupyter Notebook, Markdown
- Development status: Inactive