Read spatial omics
[1]:
import SOAPy_st as sp
import pandas as pd
Read Visium
Load the data from the h5 file.
The raw data can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM5924030
[2]:
adata_visium = sp.pp.read_visium2adata(
    path=r'/csb2/project/SpatialPackage_whq/Tutorial/data/KIRC_5/',
    count_file='filtered_feature_bc_matrix.h5'
)
/home/wangheqi/anaconda3/envs/SpatialOmics/lib/python3.9/site-packages/anndata/_core/anndata.py:1832: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
utils.warn_names_duplicates("var")
[3]:
adata_visium
[3]:
AnnData object with n_obs × n_vars = 1949 × 36601
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial'
obsm: 'spatial'
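The warning printed while loading says that some variable (gene) names occur more than once. As a minimal sketch, the duplicated names can be listed with pandas before deciding how to handle them. The gene symbols below are made up for illustration; on the real object the index would be adata_visium.var_names.

```python
import pandas as pd

# Toy stand-in for adata_visium.var_names (gene symbols are illustrative only).
var_names = pd.Index(["GeneA", "GeneB", "GeneA", "GeneC", "GeneB"])

# duplicated() flags every occurrence after the first one.
dup_mask = var_names.duplicated()
duplicated_genes = sorted(set(var_names[dup_mask]))
print(duplicated_genes)

# On the real AnnData object, the fix suggested by the warning itself is:
# adata_visium.var_names_make_unique()
```

Calling .var_names_make_unique() is the remedy named in the warning text; the check above only shows which names are affected.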
Read GeoMx DSP
Read spatial transcriptomics data from NanoString GeoMx DSP. Mouse embryonic development samples are used as an example.
The data can be downloaded from https://nanostring.com/products/geomx-digital-spatial-profiler/spatial-organ-atlas/mouse-development/
[4]:
adata_dsp = sp.pp.read_dsp2adata(
    xml_file={
        # The xml files of two samples are used as an example.
        # Add more key-value pairs to the dictionary if you need
        # sampling-point information for more samples.
        'mu_dev_E13_006': '/csb2/project/SpatialPackage_whq/Tutorial/data/nanostring_growth/mu_dev_E13_006.ome.xml',
        'mu_dev_E13_011': '/csb2/project/SpatialPackage_whq/Tutorial/data/nanostring_growth/mu_dev_E13_011.ome.xml'
    },
    information_file='/csb2/project/SpatialPackage_whq/Tutorial/data/nanostring_growth/Export4_NormalizationQ3.xlsx',
)
[5]:
adata_dsp.obs.head()
[5]:
| SegmentDisplayName | SlideName | ScanLabel | ROILabel | SegmentLabel | QCFlags | AOISurfaceArea | AOINucleiCount | ROICoordinateX | ROICoordinateY | RawReads | ... | Timepoint | ROIID | SegmentID | ScanWidth | ScanHeight | ScanOffsetX | ScanOffsetY | LOQ (Mouse NGS Whole Transcriptome Atlas RNA) | NormalizationFactor | ExpressionFilteringThreshold (Mouse NGS Whole Transcriptome Atlas RNA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mu_dev_E9_001 \| 001 \| Full ROI | mu_dev_E9_001 | mu_dev_E9_001 | 1 | Full ROI | Low Negative Probe Count for Probe Kit Mouse N... | 47287.021916 | 392 | 16573 | 18896 | 4259786 | ... | E9 | c73163bc-f107-498f-bd40-bbcab9a48993 | f057dc6e-68ce-441d-a816-58802fc38258 | 16904.210938 | 20578.818359 | 7932 | 6094 | 16.252453 | 0.536152 | 16.252453 |
| mu_dev_E9_001 \| 002 \| Full ROI | mu_dev_E9_001 | mu_dev_E9_001 | 2 | Full ROI | Low Negative Probe Count for Probe Kit Mouse N... | 41175.373907 | 340 | 16485 | 19752 | 4725639 | ... | E9 | be667b65-38c0-49c4-af51-845ffd8a7a85 | 09985ba0-449c-4b1a-9c8f-9327991df8fa | 16904.210938 | 20578.818359 | 7932 | 6094 | 17.745085 | 0.496225 | 17.745085 |
| mu_dev_E9_001 \| 003 \| Full ROI | mu_dev_E9_001 | mu_dev_E9_001 | 3 | Full ROI | Low Negative Probe Count for Probe Kit Mouse N... | 43198.870210 | 403 | 15756 | 18824 | 5958816 | ... | E9 | ba522e1c-7e21-4cc6-b529-118603949d5a | 2ac08d0d-c65d-4ab9-b834-5ef7ebbad4cd | 16904.210938 | 20578.818359 | 7932 | 6094 | 18.109046 | 0.395298 | 18.109046 |
| mu_dev_E9_001 \| 004 \| Full ROI | mu_dev_E9_001 | mu_dev_E9_001 | 4 | Full ROI | Low Negative Probe Count for Probe Kit Mouse N... | 44444.810459 | 368 | 15722 | 19675 | 3703922 | ... | E9 | 52d9a6b1-934d-4f42-a80e-a4a78b7ede43 | aeed549d-8b7a-4fa4-b22a-c54059e83066 | 16904.210938 | 20578.818359 | 7932 | 6094 | 14.509348 | 0.605782 | 14.509348 |
| mu_dev_E9_001 \| 005 \| Full ROI | mu_dev_E9_001 | mu_dev_E9_001 | 5 | Full ROI | Low Negative Probe Count for Probe Kit Mouse N... | 31889.529594 | 279 | 15064 | 18429 | 3069897 | ... | E9 | a9e0bca3-59a4-4131-90b7-c787ca400759 | c53d1b52-712e-4a3d-9af4-5bdb55365eef | 16904.210938 | 20578.818359 | 7932 | 6094 | 12.118616 | 0.717618 | 12.118616 |
5 rows × 35 columns
The sampling points corresponding to each ROI are stored in adata_dsp.uns['point'].
[6]:
adata_dsp.uns['point']
[6]:
| | slide | roi | x | y |
|---|---|---|---|---|
| 0 | mu_dev_E13_006 | 1 | 13011.793535 | 10484.417086 |
| 1 | mu_dev_E13_006 | 1 | 13109.708338 | 10499.139178 |
| 2 | mu_dev_E13_006 | 1 | 13184.708338 | 10547.123928 |
| 3 | mu_dev_E13_006 | 1 | 13242.708338 | 10642.093747 |
| 4 | mu_dev_E13_006 | 1 | 13261.840745 | 10730.656697 |
| ... | ... | ... | ... | ... |
| 4171 | mu_dev_E13_011 | 58 | 4548.893838 | 10387.159712 |
| 4172 | mu_dev_E13_011 | 58 | 4477.882657 | 10362.799483 |
| 4173 | mu_dev_E13_011 | 58 | 4437.660936 | 10371.299021 |
| 4174 | mu_dev_E13_011 | 58 | 4487.202775 | 10425.060315 |
| 4175 | mu_dev_E13_011 | 58 | 4636.188087 | 10423.920083 |
4176 rows × 4 columns
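Because the point table has one row per sampling point, a per-ROI summary needs a group-by over slide and roi. The following is a minimal sketch on a small synthetic table with the same four columns; the values are invented, and the mean of the points is only a rough position estimate, not a true polygon centroid.

```python
import pandas as pd

# Synthetic stand-in for adata_dsp.uns['point'] (same columns, invented values).
points = pd.DataFrame({
    "slide": ["s1"] * 4 + ["s2"] * 3,
    "roi": [1, 1, 1, 1, 2, 2, 2],
    "x": [0.0, 2.0, 2.0, 0.0, 1.0, 3.0, 2.0],
    "y": [0.0, 0.0, 2.0, 2.0, 1.0, 1.0, 4.0],
})

# One row per (slide, roi): number of sampling points and their mean position.
roi_summary = (
    points.groupby(["slide", "roi"])
    .agg(n_points=("x", "size"), x_mean=("x", "mean"), y_mean=("y", "mean"))
    .reset_index()
)
print(roi_summary)
```

On the real data, replacing points with adata_dsp.uns['point'] should give one summary row per ROI on each slide.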
Read other barcode-based data
In most cases, the raw data of a barcode-based spatial omics technology can be expressed as two tables: the coordinates of each cell (spot) and the expression of each cell (spot). Given these two tables, sp.pp.read_csv2adata() builds an AnnData object.
Here we use Slide-seqV2 data from the mouse olfactory bulb as a demonstration. The data can be downloaded from https://singlecell.broadinstitute.org/single_cell/study/SCP815.
[7]:
express = pd.read_csv('/csb2/project/SpatialPackage_whq/Tutorial/data/Slide_seqV2/Puck_200127_15.digital_expression.txt', index_col=0, header=0, sep='\t')
location = pd.read_csv('/csb2/project/SpatialPackage_whq/Tutorial/data/Slide_seqV2/Puck_200127_15_bead_locations.csv', index_col=0, header=0)
[8]:
adata_csv = sp.pp.read_csv2adata(express.T, spatial=location)
[9]:
adata_csv
[9]:
AnnData object with n_obs × n_vars = 21724 × 21220
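The two tables only describe the same cells if their barcodes match. Whether sp.pp.read_csv2adata() aligns them internally is not stated here, so the sketch below, on invented toy tables, shows one way to transpose the expression table to cells × genes and restrict both tables to the shared barcodes in the same order before combining them.

```python
import pandas as pd

# Toy expression table, genes x beads (as in the digital expression file layout above).
express = pd.DataFrame(
    {"bead_1": [5, 0], "bead_2": [1, 3], "bead_3": [0, 2]},
    index=["GeneA", "GeneB"],
)
# Toy coordinate table, indexed by bead barcode, in a different order.
location = pd.DataFrame(
    {"x": [10.0, 20.0, 30.0], "y": [1.0, 2.0, 3.0]},
    index=["bead_2", "bead_1", "bead_4"],
)

# Transpose so that rows are beads, then keep beads present in both tables.
cells_by_genes = express.T
shared = cells_by_genes.index[cells_by_genes.index.isin(location.index)]
cells_by_genes = cells_by_genes.loc[shared]
location_aligned = location.loc[shared]

print(list(shared))
print(location_aligned)
```

After this step, row i of cells_by_genes and row i of location_aligned refer to the same bead.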
Read image-based data
Image-based spatial omics data must first go through cell segmentation; the segmentation result is then used to quantify each cell and build the AnnData object. Users must provide the cell segmentation mask and the stained image for each marker.
Here we use one sample (sample 4) from the breast cancer MIBI-TOF datasets as an example. The data can be downloaded from https://mibi-share.ionpath.com.
[10]:
import tifffile as tiff
import matplotlib.pyplot as plt
image = tiff.imread('/csb2/project/SpatialPackage_whq/Tutorial/data/mibi_tof/TA459_multipleCores2_Run-4_Point4.tiff')
mask = tiff.imread('/csb2/project/SpatialPackage_whq/Tutorial/data/mibi_tof/Point4_binarymask.tiff')
[11]:
plt.imshow(mask, cmap='gray')
plt.show()
[12]:
plt.imshow(image[8, :, :], cmap='gray', vmax=5)
plt.show()
Specify the name of each channel and the channels to remove, then build the quantified AnnData object.
[13]:
channel_names = [
    'Au','Background','Beta_catenin','Ca','CD11b','CD11c','CD138','CD16','CD20','CD209','CD3',
    'CD31','CD4','CD45','CD45RO','CD56','CD63','CD68','CD8','dsDNA','EGFR','Fe','FoxP3','H3K27me3',
    'H3K9ac','HLA-DR','HLA-I','IDO','CK17','CK6','Ki67','Lag3','MPO','Na','P','p53','PanCK','PD-L1',
    'PD-1','pS6','Si','SMA','Ta','Vimentin'
]
exp_removed = [0,1,3,19,21,23,33,34,40,42]
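Since the channels to remove are given as integers, it is easy to drop the wrong one. A quick check, assuming the integers are 0-based positions in channel_names (an assumption, not confirmed by the text above), is to print the names they point to:

```python
channel_names = [
    'Au','Background','Beta_catenin','Ca','CD11b','CD11c','CD138','CD16','CD20','CD209','CD3',
    'CD31','CD4','CD45','CD45RO','CD56','CD63','CD68','CD8','dsDNA','EGFR','Fe','FoxP3','H3K27me3',
    'H3K9ac','HLA-DR','HLA-I','IDO','CK17','CK6','Ki67','Lag3','MPO','Na','P','p53','PanCK','PD-L1',
    'PD-1','pS6','Si','SMA','Ta','Vimentin'
]
exp_removed = [0, 1, 3, 19, 21, 23, 33, 34, 40, 42]

# Assumption: indices are 0-based positions in channel_names.
removed_names = [channel_names[i] for i in exp_removed]
removed_set = set(exp_removed)
kept_names = [name for i, name in enumerate(channel_names) if i not in removed_set]

print(removed_names)
print(len(kept_names))
```

Under that assumption, the removed channels are mostly elemental and background signals, and the remaining names are the markers that would be quantified.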
[ ]:
adata_img = sp.pp.read_mult_image2adata(
    image=image,
    mask=mask,
    channel_names=channel_names,
    remove_channels=exp_removed
)
[ ]:
adata_img
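To make the idea of quantifying cells from a segmentation result concrete, here is a minimal NumPy sketch that computes the mean intensity of each channel within each cell. It is not necessarily how sp.pp.read_mult_image2adata() works internally. It assumes a channels × height × width image and a mask in which each cell carries its own positive integer label and 0 marks background; a purely binary mask would first need connected-component labeling.

```python
import numpy as np

# Toy image: 2 channels, 4 x 4 pixels (channels-first, like image[8, :, :] above).
image = np.zeros((2, 4, 4))
image[0, :2, :2] = 4.0  # channel 0 is bright in the top-left cell
image[1, 2:, 2:] = 2.0  # channel 1 is bright in the bottom-right cell

# Assumed labeled mask: 0 = background, 1 and 2 = two cells.
mask = np.zeros((4, 4), dtype=int)
mask[:2, :2] = 1
mask[2:, 2:] = 2

labels = np.unique(mask)
labels = labels[labels != 0]           # drop background
flat_mask = mask.ravel()
counts = np.bincount(flat_mask)[labels]  # pixels per cell

# Sum each channel per label with bincount, then divide by the cell area.
cell_means = np.stack(
    [np.bincount(flat_mask, weights=ch.ravel())[labels] / counts for ch in image],
    axis=1,
)  # shape: cells x channels
print(cell_means)
```

The resulting cells × channels matrix has the same layout as the expression matrix of an AnnData object, with cells as observations and markers as variables.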