Published September 11, 2021 | Version 1.0.2
Journal article | Open access

Collection of codes and annotated matrix for the paper "A cell atlas of human thymic development defines T cell repertoire formation"

  • 1. Sanger Institute
  • 2. Newcastle University
  • 3. Ghent University


This is the collection of code and the annotated matrix described in the paper “A cell atlas of human thymic development defines T cell repertoire formation”.


This repository contains:

  • 'scjp' package to assist with single-cell data analysis
  • Jupyter notebooks showing the analysis behind all figures
  • annotated, normalised matrix in h5ad format
  • csv files containing the metadata
  • raw count matrix in h5ad format
  • VDJ files (Cell Ranger output)


Each item is described below:

"sample_metadata_fix.xlsx" is:

  • metadata for all samples generated in the study
  • the file keys for matching gene expression and VDJ data (errors present in previous versions are fixed here)

"" contains:

  • *.ipynb: Jupyter notebooks describing the analysis
  • contains global variables shared across multiple notebooks
  • scjp: Python package supporting the single-cell analysis (not intended for distribution: some dependency issues still need to be fixed, and a final version is in preparation)

"" contains:

  • *.csv: per-cell metadata including annotations (the .obs table of the scanpy AnnData object)
  • *.h5ad: AnnData objects containing the normalised read-count matrix and per-cell metadata including annotations (use the Python scanpy package to explore them; see 'Data_navigator.ipynb' for a tutorial)
  • Data_navigator.ipynb: Jupyter notebook describing each dataset
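The notebooks read these files with scanpy. When only the per-cell metadata is needed, the *.csv files can also be read with the Python standard library. The following is a minimal sketch, assuming the first column holds the cell (observation) names, as it does when pandas writes an .obs table with DataFrame.to_csv(). The function names are hypothetical, and the actual column names in the CSV files are not specified in this record.

```python
import csv
from typing import Dict, Iterable


def parse_cell_metadata(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Parse per-cell metadata CSV lines into {cell_name: {column: value}}.

    Assumption: the first column holds the observation (cell) names, as written
    by pandas when an AnnData .obs table is saved with DataFrame.to_csv().
    """
    reader = csv.reader(lines)
    header = next(reader)
    columns = header[1:]  # header[0] is the (often empty) index label
    cells: Dict[str, Dict[str, str]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        cells[row[0]] = dict(zip(columns, row[1:]))
    return cells


def read_cell_metadata(path: str) -> Dict[str, Dict[str, str]]:
    """Read one of the per-cell metadata *.csv files from disk (hypothetical helper)."""
    with open(path, newline="") as handle:
        return parse_cell_metadata(handle)
```

With scanpy installed, `scanpy.read_h5ad(path).obs` should give the same per-cell table as a pandas DataFrame.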

"HTA07.A01.v02.entire_data_raw_count.h5ad" is:

  • raw count matrix for all human data generated in the study
  • can be matched to the annotated datasets (the *.h5ad files described above) by observation name
  • 'adata.obs_names' follows the format '{filename}-{cellbarcode}'
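Because both files share observation names, matching reduces to a lookup by name. The following is a minimal sketch with hypothetical function names. It assumes the {filename} part contains no '-', so everything after the first '-' is treated as the cell barcode (which may itself contain '-', for example a Cell Ranger suffix such as '-1').

```python
from typing import Dict, List, Tuple


def split_obs_name(obs_name: str) -> Tuple[str, str]:
    """Split '{filename}-{cellbarcode}' into (filename, cellbarcode).

    Assumption: the filename contains no '-'; the barcode keeps any further '-'.
    """
    filename, sep, barcode = obs_name.partition("-")
    if not sep or not filename or not barcode:
        raise ValueError(f"unexpected observation name: {obs_name!r}")
    return filename, barcode


def raw_rows_for_annotated(raw_names: List[str], annotated_names: List[str]) -> List[int]:
    """For each annotated cell, return the row index of the same cell in the raw matrix.

    Raises KeyError if an annotated cell is missing from the raw matrix.
    """
    position: Dict[str, int] = {name: i for i, name in enumerate(raw_names)}
    return [position[name] for name in annotated_names]
```

With both files loaded as AnnData objects, the equivalent selection should be `raw[annotated.obs_names].copy()`, which subsets the raw matrix to the annotated cells in the annotated order.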

"" contains:

  • Cell Ranger output *.csv files for VDJ analysis
  • VDJ files can be matched to gene expression files based on the information in 'sample_metadata.xlsx'
    • matched files should share the same cell barcodes
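Combining the file key from the sample metadata with the shared barcode reproduces the gene expression observation name for each VDJ contig. The following is a minimal sketch under assumptions not confirmed by this record: the VDJ CSV has a 'barcode' column (as Cell Ranger contig annotation files such as filtered_contig_annotations.csv do), the barcode is written in exactly the same form as in the gene expression obs_names, and `gex_file_key` (a hypothetical parameter) is the gene expression file name looked up in the metadata spreadsheet for this VDJ library.

```python
import csv
from typing import Dict, Iterable, List


def vdj_by_obs_name(contig_lines: Iterable[str], gex_file_key: str) -> Dict[str, List[Dict[str, str]]]:
    """Group VDJ contig rows by the matching gene-expression obs_name.

    Assumptions: a 'barcode' column exists, and obs_names are built as
    '{gex_file_key}-{barcode}' with the barcode used verbatim.
    """
    grouped: Dict[str, List[Dict[str, str]]] = {}
    for row in csv.DictReader(contig_lines):
        obs_name = f"{gex_file_key}-{row['barcode']}"
        grouped.setdefault(obs_name, []).append(row)  # one cell may have several contigs
    return grouped
```

A file on disk can be passed directly, for example `vdj_by_obs_name(open(path, newline=""), key)`; the resulting keys can then be intersected with the annotated obs_names to keep only cells present in the gene expression data.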

Please also check for future updates.

This GitHub repository will be used to share any additional materials not covered here.

Please contact: or for any questions.


Files (4.7 GB)

  • 1.6 GB
  • 24.0 kB
  • 2.5 GB
  • 137.1 MB
  • 493.3 MB