This CODEX cell type labels readme.txt file was generated on 2021-07-29 by John Hickey


GENERAL INFORMATION

1. Title of Dataset: Cell Type Labels for all Clustering and Normalization Combinations Compared for CODEX Multiplexed Imaging

2. Author Information
	A. Principal Investigator Contact Information
		Name: John Hickey
		Institution: Stanford University
		Email: jwhickey@stanford.edu


3. Date of data collection (single date, range, approximate date): 2020-2021 

4. Geographic location of data collection: Stanford University, Stanford, CA, USA 

5. Information about funding sources that supported the collection of the data: This work was supported by the U.S. National Institutes of Health (2U19AI057229-16, 5P01HL10879707, 5R01GM10983604, 5R33CA18365403, 5U01AI101984-07, 5UH2AR06767604, 5R01CA19665703, 5U54CA20997103, 5F99CA212231-02, 1F32CA233203-01, 5U01AI140498-02, 1U54HG010426-01, 5U19AI100627-07, 1R01HL120724-01A1, R33CA183692, R01HL128173-04, 5P01AI131374-02, 5UG3DK114937-02, 1U19AI135976-01, IDIQ17X149, 1U2CCA233238-01, 1U2CCA233195-01); the U.S. Department of Defense (W81XWH-14-1-0180, W81XWH-12-1-0591); the U.S. Food and Drug Administration (HHSF223201610018C, DSTL/AGR/00980/01); Cancer Research UK (C27165/A29073); the Bill and Melinda Gates Foundation (OPP1113682); the Cancer Research Institute; the Parker Institute for Cancer Immunotherapy; the Kenneth Rainin Foundation (2018-575); the Silicon Valley Community Foundation (2017-175329 and 2017-177799-5022); the Beckman Center for Molecular and Genetic Medicine; Juno Therapeutics, Inc. (122401); Pfizer, Inc. (123214); Celgene, Inc. (133826, 134073); Vaxart, Inc. (137364); and the Rachford & Carlotta A. Harris Endowed Chair to G.P.N. J.W.H. was supported by an NIH T32 Fellowship (T32CA196585) and an American Cancer Society - Roaring Fork Valley Postdoctoral Fellowship (PF-20-032-01-CSM). 


SHARING/ACCESS INFORMATION

1. Links to publications that cite or use the data: Frontiers in Immunology - Hickey et al. 2021, in press.

2. Links to other publicly accessible locations of the data: Raw and processed data used within the manuscript for the four areas of the colon can be accessed from the HuBMAP portal through Globus with the following dataset IDs: HBM977.PCGP.852, HBM575.THQM.284, HBM462.JKCN.863, HBM334.QWFV.953, HBM938.KMNW.825. 

3. Recommended citation for this dataset: Cite the Frontiers in Immunology paper (Hickey et al. 2021).


DATA & FILE OVERVIEW

1. File List: 
cell_1_annot.csv
cell_2_annot.csv
cell_3_annot.csv
cell_4_annot.csv

2. Relationship between files, if important: 
Each CSV file holds the cell type annotations for every comparison. The number in the file name indicates the level of cell type granularity, where 1 is the most granular and 4 is the least granular (i.e., 7 cell types).
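
As a rough illustration of how the four files relate, the sketch below maps a granularity level (1-4) to its file name and reads the rows with Python's standard csv module. The helper names (ANNOTATION_FILES, load_annotations, count_labels) are hypothetical and not part of the dataset, and the data directory is whatever location the files were downloaded to.

```python
import csv
from pathlib import Path

# File names follow the File List above; keys are the granularity levels
# (1 = most granular, 4 = least granular).
ANNOTATION_FILES = {level: f"cell_{level}_annot.csv" for level in range(1, 5)}


def load_annotations(data_dir, level):
    """Return the rows of one annotation file as a list of dicts.

    Every value is kept as a string; convert X and Y to numbers as needed.
    """
    if level not in ANNOTATION_FILES:
        raise ValueError(f"granularity level must be 1-4, got {level!r}")
    path = Path(data_dir) / ANNOTATION_FILES[level]
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


def count_labels(rows, column):
    """Count how many cells carry each label in the given column."""
    counts = {}
    for row in rows:
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts
```

For example, count_labels(load_annotations("path/to/data", 4), "z_kmeans") would tally the least granular labels for one normalization and clustering combination.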


METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data: 
We performed CODEX (co-detection by indexing) multiplexed imaging on four sections of the human colon (ascending, transverse, descending, and sigmoid) using a panel of 47 oligonucleotide-barcoded antibodies. The images then underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of best focal plane) and single-cell segmentation. The output of this process was a dataframe of nearly 130,000 cells with fluorescence values quantified for each marker. We used this dataframe as input to each of 5 normalization conditions: z, double-log(z), min/max, and arcsinh normalizations, which we compared to the original unmodified dataset. We used the resulting dataframes as inputs for 4 unsupervised clustering algorithms: k-means, Leiden, X-shift Euclidean, and X-shift angular.
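
To make the normalization names above concrete, here is a minimal sketch of three of them (z, min/max, and arcsinh) applied to the values of a single marker across cells. It shows the textbook definitions only and is not the manuscript's code. Applying the normalization per marker, using the population standard deviation, and the cofactor parameter are all assumptions. Double-log(z) is left out because its exact definition is not given in this readme.

```python
import math
import statistics


def z_normalize(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant marker carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def arcsinh_normalize(values, cofactor=1.0):
    """Inverse hyperbolic sine transform.

    `cofactor` is a hypothetical parameter; cofactor=1 gives plain arcsinh.
    """
    return [math.asinh(v / cofactor) for v in values]
```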

2. Methods for processing the data: 
From the clustering outputs, we then labeled the resulting clusters according to the cell types observed in the data, producing 20 unique cell type labels. We also labeled cell types by hierarchical hand-gating of the data within CellEngine (cellengine.com). We created another gold standard for comparison by overclustering unnormalized data with X-shift angular clustering. Finally, we created one last label as the major cell type call for each cell across all 21 cell type labels in the dataset.
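
One plausible reading of the "major cell type call" is a per-cell vote across the label columns, where the most frequent label wins. The sketch below illustrates that reading only. It is an assumption about the procedure rather than the manuscript's code, and the function name major_label is hypothetical.

```python
from collections import Counter


def major_label(labels):
    """Return the most frequent label for one cell.

    Ties are resolved in favor of the label encountered first, which is
    the documented ordering of Counter.most_common for equal counts.
    """
    if not labels:
        raise ValueError("at least one label is required")
    return Counter(labels).most_common(1)[0][0]
```

Given one row of the annotation table, the input would be the list of that row's values across the label columns being compared.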

3. Instrument- or software-specific information needed to interpret the data: 
Any data analysis software that can read CSV files, such as Python, R, or Excel.

4. People involved with sample collection, processing, analysis and/or submission: Garry Nolan, Yuqi Tan, Yury Goltsev 


DATA-SPECIFIC INFORMATION

1. Number of variables: 27

2. Number of cases/rows: 124,780

3. Variable List: X, Y, Region, cell, ori_gating, a_ang, a_euc, a_kmeans, log_ang, log_euc, log_kmeans, min_ang, min_euc, min_kmeans, ori_ang, ori_euc, ori_kmeans, z_ang, z_euc, z_kmeans, a_leid, log_leid, min_leid, ori_leid, z_leid, over_cell, maj_cell

4. Variable Description: X and Y give the position, in pixels, of each cell in the overall montage image for its individual region. Region indicates which of the 4 regions the data came from. The remaining columns are labels generated by the clustering and normalization techniques that were used in the manuscript and compared to each other. Each clustering-combination variable is named by the normalization technique applied to the dataset, followed by an underscore and the clustering algorithm used to assign the cell types; for example, z_kmeans holds the labels from k-means clustering of z-normalized data.
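
The naming convention described above can be sketched as a small lookup. The abbreviation tables below are inferred from the variable list and the methods description (for example, "a" read as arcsinh and "leid" as Leiden), so treat them as assumptions. The names NORMALIZATIONS, CLUSTERINGS, and describe_label_column are hypothetical helpers, not part of the dataset.

```python
from itertools import product

# Assumed meaning of the prefix (normalization) in each label column name.
NORMALIZATIONS = {
    "a": "arcsinh",
    "log": "double-log(z)",
    "min": "min/max",
    "ori": "original (unnormalized)",
    "z": "z",
}

# Assumed meaning of the suffix (clustering algorithm).
CLUSTERINGS = {
    "ang": "X-shift angular",
    "euc": "X-shift Euclidean",
    "kmeans": "k-means",
    "leid": "Leiden",
}


def describe_label_column(name):
    """Split a column name into (normalization, clustering).

    Returns None for columns that do not follow the convention, such as
    X, Y, Region, cell, ori_gating, over_cell, and maj_cell.
    """
    prefix, _, suffix = name.partition("_")
    if prefix in NORMALIZATIONS and suffix in CLUSTERINGS:
        return NORMALIZATIONS[prefix], CLUSTERINGS[suffix]
    return None


# The 5 normalizations x 4 clustering algorithms give the 20 combination columns.
ALL_COMBINATIONS = [f"{n}_{c}" for n, c in product(NORMALIZATIONS, CLUSTERINGS)]
```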

5. Missing data codes: N/A
