There is a newer version of the record available.

Published January 14, 2022 | Version 1.0.0
Dataset Open

Breast tumour microenvironment structures are associated with genomic features and clinical outcome

  • 1. CRUK Cambridge Institute, University of Cambridge, Cambridge, UK
  • 2. Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
  • 3. Department of Histopathology, Addenbrookes Hospital, Cambridge, UK
  • 4. MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
  • 5. Department of Pathology, University of Nottingham, Nottingham, UK
  • 6. British Columbia Cancer Agency, University of British Columbia, Vancouver, Canada

Description

The data comprise three types:

  1. Full stack tiff files that contain multiplexed imaging mass cytometry (IMC) images.
  2. Image masks that identify image regions associated with cells, epithelium and vessels.
  3. Processed data that contain measurements derived from the associated images.

All full stacks and masks are tiff images. Each IMC acquisition (image) is associated with six images in total: the full stack image itself and five image masks (whole cell, nucleus, cytoplasm, tumour and vessel). The naming convention for these images is MB####_###_ImageType.tiff, where:

  • MB#### is the METABRIC identifier. This can be used to link the data to other METABRIC data in the public domain.
  • ### is the ImageNumber. This links the image to columns in processed data files. It is a sequential integer between one and three digits long. Note that this number is assigned based on file order and is not the same across studies, so it cannot be used to link images to other METABRIC data sets, e.g. Ali et al., Nat Cancer 2020. Each image corresponds to a core from a tissue microarray (TMA) slide.
  • ImageType. A descriptive label that identifies the type of image.
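
The naming convention above can be split into its three parts with a regular expression. This is a minimal sketch, not part of the dataset: it assumes MB#### means "MB" followed by exactly four digits and that the ImageNumber is written with one to three digits, and the filename in the usage example is hypothetical.

```python
import re
from typing import NamedTuple


class ImageName(NamedTuple):
    metabric_id: str   # e.g. "MB0000" (links to other METABRIC data)
    image_number: int  # links to ImageNumber in the processed data files
    image_type: str    # descriptive label, e.g. "CellMask"


# Assumed literal reading of MB####_###_ImageType.tiff
_NAME_PATTERN = re.compile(r"^(MB\d{4})_(\d{1,3})_([^.]+)\.tiff$")


def parse_image_name(filename: str) -> ImageName:
    """Split a file name into METABRIC identifier, ImageNumber and ImageType."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    return ImageName(match.group(1), int(match.group(2)), match.group(3))


# Hypothetical file name, used only to illustrate the convention
parsed = parse_image_name("MB0000_1_CellMask.tiff")
```

Grouping files by `image_number` then collects the six images that belong to one acquisition.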

Notes:

The order of image layers in full stack images corresponds to the markerStackOrder.csv file, which identifies each image layer with its corresponding isotope and epitope.
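
As a rough illustration of this correspondence, the rows of markerStackOrder.csv can be numbered in file order to give the layer index of each marker. The sketch below assumes the file has a header row and one row per layer; the column names and values in the sample text are placeholders, not the real file contents, and the index is zero-based.

```python
import csv
import io


def layer_table(csv_text):
    """Pair each row of the marker file with its zero-based layer index."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(index, row) for index, row in enumerate(reader)]


# Placeholder content: the real column names and values may differ
sample_text = "Isotope,Epitope\nisotope_a,epitope_a\nisotope_b,epitope_b\n"
layers = layer_table(sample_text)
```

With the real file, `layers[k]` would describe layer `k` of a full stack image, provided the assumptions above hold.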

Masks are grayscale images in which each discrete region is a set of contiguous pixels that share a single integer value. These values tend to increase sequentially from the top to the bottom of the image, which is why a mask appears as a gradation from gray to white when opened in an image viewer. The ‘ObjectNumber’ column of the processed single cell data corresponds to the whole cell masks: the integer value of each cell in the mask maps to its ‘ObjectNumber’, allowing marker values and other features to be mapped onto images.
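
The mapping between mask values and ‘ObjectNumber’ can be sketched as follows. A small nested list stands in for a whole cell mask that would normally be loaded with an image library, and the value 0 is assumed to mark background pixels, which the description does not state. The function names and feature values are hypothetical.

```python
from collections import defaultdict


def pixels_by_object(mask):
    """Group (row, column) pixel coordinates by their integer mask value."""
    regions = defaultdict(list)
    for row_idx, row in enumerate(mask):
        for col_idx, label in enumerate(row):
            if label != 0:  # assumption: 0 is background
                regions[label].append((row_idx, col_idx))
    return dict(regions)


def paint_feature(mask, values, background=None):
    """Replace each mask value with the feature value of that ObjectNumber."""
    return [[values.get(label, background) for label in row] for row in mask]


# Toy 2 x 3 mask containing two cells (ObjectNumber 1 and 2)
toy_mask = [[0, 1, 1],
            [2, 2, 0]]
regions = pixels_by_object(toy_mask)

# Hypothetical per-cell measurements keyed by ObjectNumber
feature_image = paint_feature(toy_mask, {1: 0.5, 2: 2.0})
```

Because ObjectNumber is only unique within an image, the `values` dictionary should be built from the rows of a single ImageNumber.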

Two processed data files:

SingleCells.csv where each row represents a cell, and columns are data associated with each cell. Each observation is uniquely identified by the combination of ImageNumber and ObjectNumber. These data have already been spillover corrected. Single cell measurements correspond to 'CellMask' image masks.

CellNeighbours.csv where each row represents a cell-cell interaction. The data are in graph format, with columns labelled ‘from’ and ‘to’ meaning from an index cell to a neighbouring cell (despite this convention, the data are undirected); the integers within these columns map to ObjectNumber in SingleCells.csv.
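
One way to work with these interactions is to build an undirected adjacency map. The sketch below assumes that CellNeighbours.csv also carries an ImageNumber column, which is needed because ObjectNumber is only unique within an image; that column name is an assumption, whereas ‘from’ and ‘to’ are taken from the description. The sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def build_adjacency(rows):
    """Map each (ImageNumber, ObjectNumber) cell to the set of its neighbours."""
    adjacency = defaultdict(set)
    for row in rows:
        image = int(row["ImageNumber"])  # assumed column name
        index_cell = (image, int(row["from"]))
        neighbour = (image, int(row["to"]))
        # The data are undirected, so record the edge in both directions
        adjacency[index_cell].add(neighbour)
        adjacency[neighbour].add(index_cell)
    return dict(adjacency)


# Invented rows standing in for the real file
sample_text = "ImageNumber,from,to\n1,1,2\n1,2,3\n2,1,2\n"
adjacency = build_adjacency(csv.DictReader(io.StringIO(sample_text)))
```

Storing each edge in both directions means a lookup by either cell returns the same interaction, regardless of which column the cell appeared in.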

Note: The convention for `is_` variables in processed data files is that 0 is FALSE and 1 is TRUE.

Two column annotation files:

Two corresponding annotation files are also provided, which describe the content of each column in the processed tabular files: SingleCellsAnnotation.xlsx and CellNeighboursAnnotation.xlsx.

Files

METABRICBreastCancerIMCTME.zip

Files (6.2 GB)

md5:6bc319787a09df8deacdc6da46620159