There is a newer version of the record available.

Published January 14, 2022 | Version 2.0.0
Dataset Open

Breast tumour microenvironment structures are associated with genomic features and clinical outcome

  • 1. CRUK Cambridge Institute, University of Cambridge, Cambridge, UK
  • 2. Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
  • 3. Department of Histopathology, Addenbrookes Hospital, Cambridge, UK
  • 4. MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
  • 5. Department of Pathology, University of Nottingham, Nottingham, UK
  • 6. British Columbia Cancer Agency, University of British Columbia, Vancouver, Canada

Description

Data and code are provided in a single directory. This annotation document is divided into two parts: notes for those who wish to reuse the data, and notes for those who wish to rerun the analysis code.

Data for reuse:

The data comprise three types:

  1. Full stack tiff images that contain the multiplexed imaging mass cytometry (IMC) data.
  2. Image masks that identify image regions associated with cells, epithelium and vessels.
  3. Processed data that contain measurements derived from the associated images.

 

All full stacks and masks are tiff images. Each IMC acquisition (image) is associated with six images in total: the full stack image itself and five image masks (whole cell, nucleus, cytoplasm, tumour and vessel). The naming convention for these images is MB####_###_ImageType.tiff, where:

  • MB#### is the METABRIC identifier. This can be used to link the data to other METABRIC data in the public domain.
  • ### is the ImageNumber. This links the image to columns in the processed data files. It is a sequential integer between one and three digits long. Note that this number is assigned based on file order and is not the same across studies, so it cannot be used to link images to other METABRIC data sets (e.g. Ali et al., Nat Cancer 2020). Each image corresponds to a core from a tissue microarray (TMA) slide.
  • ImageType. A descriptive label that identifies the type of image.
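The naming convention above can be split into its three parts with a regular expression. The following is a minimal sketch, not part of the provided code: it assumes that MB#### means exactly four digits, and the file name used in the example is made up, since the exact ImageType labels are not listed here.

```python
import re
from typing import NamedTuple

# Pattern assumed from the convention MB####_###_ImageType.tiff:
# "MB" plus four digits, an ImageNumber of one to three digits, then a label.
NAME_PATTERN = re.compile(r"^(MB\d{4})_(\d{1,3})_(.+)\.tiff$")


class ImageName(NamedTuple):
    metabric_id: str
    image_number: int
    image_type: str


def parse_image_name(filename: str) -> ImageName:
    """Split a file name into METABRIC identifier, ImageNumber and ImageType."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename}")
    metabric_id, image_number, image_type = match.groups()
    return ImageName(metabric_id, int(image_number), image_type)


# Hypothetical file name; the real ImageType labels may differ.
example = parse_image_name("MB0000_1_FullStack.tiff")
print(example)
```

Converting ImageNumber to an integer makes it directly comparable with the ImageNumber column in the processed data files.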

Notes:

The order of image layers in full stack images corresponds to the markerStackOrder.csv file, which identifies each image layer with its corresponding isotope and epitope.

Masks are grayscale images where each discrete region is identified by a set of contiguous pixels that share a single integer value. These values tend to be sequential from the top to the bottom of the image (this is why a mask appears as a gradation of gray and white when opened in an image viewer). The ‘ObjectNumber’ column of the processed single cell data corresponds to the whole cell masks: the integer value of each cell in the mask maps to its ‘ObjectNumber’, allowing marker values and other features to be mapped onto images.
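To illustrate how mask values relate to ‘ObjectNumber’, the sketch below paints a per-cell measurement onto a small toy mask. It is an illustration only: the mask and values are invented, the function name is hypothetical, and it assumes that a pixel value of 0 denotes background, which is not stated above. In practice the values would first be restricted to the rows of a single ImageNumber.

```python
from typing import Dict, List


def paint_mask(
    mask: List[List[int]],
    values: Dict[int, float],
    background: float = 0.0,
) -> List[List[float]]:
    """Replace each mask label (ObjectNumber) with that cell's value.

    Labels without an entry in `values` (assumed here to include the
    background label 0) are set to `background`.
    """
    return [[values.get(label, background) for label in row] for row in mask]


# Toy whole cell mask with two cells, labelled 1 and 2 (invented data).
toy_mask = [
    [0, 1, 1],
    [2, 2, 0],
]

# Toy per-cell measurement keyed by ObjectNumber (invented data).
toy_values = {1: 0.5, 2: 3.0}

painted = paint_mask(toy_mask, toy_values)
print(painted)
```

For full-size images, the same lookup is usually done with an array library rather than nested lists.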

 

Two processed data files:

SingleCells.csv: each row represents a cell, and the columns are data associated with that cell. Each observation is uniquely identified by the combination of ImageNumber and ObjectNumber. These data have already been spillover corrected.

CellNeighbours.csv: each row represents a cell-cell interaction. The data are in graph format, with columns labelled ‘from’ and ‘to’, meaning from an index cell to a neighbouring cell (despite this convention, the data are undirected). The integers within these columns map to ObjectNumber in SingleCells.csv.
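Because the interactions are undirected, it can help to read them into a symmetric adjacency structure. The sketch below is one possible approach and is not taken from the provided code. It uses invented toy rows, and it assumes that CellNeighbours.csv also carries an ImageNumber column, since ObjectNumber alone does not uniquely identify a cell; check CellNeighboursAnnotation.xlsx for the actual column names.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO, Tuple

Cell = Tuple[int, int]  # (ImageNumber, ObjectNumber)

# Invented toy rows; the ImageNumber column is an assumption.
toy_csv = io.StringIO(
    "ImageNumber,from,to\n"
    "1,1,2\n"
    "1,2,1\n"
    "1,2,3\n"
    "2,1,2\n"
)


def build_adjacency(handle: TextIO) -> Dict[Cell, Set[Cell]]:
    """Read 'from'/'to' rows and store every edge in both directions."""
    adjacency: Dict[Cell, Set[Cell]] = defaultdict(set)
    for row in csv.DictReader(handle):
        image = int(row["ImageNumber"])
        index_cell = (image, int(row["from"]))
        neighbour = (image, int(row["to"]))
        adjacency[index_cell].add(neighbour)
        adjacency[neighbour].add(index_cell)
    return adjacency


adjacency = build_adjacency(toy_csv)

# Each undirected edge is stored twice, once from each endpoint.
n_edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
print(n_edges)
```

Using sets means that an interaction listed in both directions is counted only once.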

Note: The convention for `is_` variables in processed data files is that 0 is FALSE and 1 TRUE.
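As a sketch of how these conventions might be applied when loading the single cell data, the example below keys each row by (ImageNumber, ObjectNumber) and converts `is_` columns to booleans. The toy rows are invented and `is_example` is a hypothetical column name used only for illustration; the real column names are listed in SingleCellsAnnotation.xlsx.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# Invented toy rows; "is_example" is a hypothetical column name.
toy_csv = io.StringIO(
    "ImageNumber,ObjectNumber,is_example\n"
    "1,1,0\n"
    "1,2,1\n"
    "2,1,1\n"
)


def load_cells(handle: TextIO) -> Dict[Tuple[int, int], dict]:
    """Key each row by (ImageNumber, ObjectNumber); decode is_ flags."""
    cells: Dict[Tuple[int, int], dict] = {}
    for row in csv.DictReader(handle):
        key = (int(row["ImageNumber"]), int(row["ObjectNumber"]))
        if key in cells:
            raise ValueError(f"Duplicate cell identifier: {key}")
        record = {}
        for name, value in row.items():
            if name.startswith("is_"):
                # Convention: 0 is FALSE and 1 is TRUE.
                record[name] = float(value) == 1.0
            else:
                record[name] = value
        cells[key] = record
    return cells


cells = load_cells(toy_csv)
print(cells[(1, 2)]["is_example"])
```

The duplicate check guards the assumption that the two identifier columns together are unique.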

 

Two column annotation files:

Two corresponding annotation files describe the content of each column in the processed tabular files: SingleCellsAnnotation.xlsx and CellNeighboursAnnotation.xlsx.

Other files are annotation and processed data files required by the code in the Code directory; they can be ignored unless you plan to rerun analyses.

Code and reproducibility

Analysis code and corresponding processed data are also provided in the directory. The code was run within a conda environment, details of which are provided in the file CondaEnv.yml. Processed metadata from the METABRIC study are among the files provided. It is, however, recommended that additional analyses relying on METABRIC metadata use data downloaded from the original publications or a public repository, as these data are subject to updates and the user may wish to process them differently.

Code is organised by figure. It must be run in the order in which the figures appear in the paper, as later code relies on derived files created by earlier code. It must also be run from within the directory, as relative paths rely on the directory structure.

Files

MBTMEStrIMCPublic.zip (6.7 GB)
md5:992d04caf3cefcca7bfc5bb64813297f