How to Build an Image-Processing Pipeline for Automating Multiparameter Histocytometry Analysis

Until relatively recently, analysis of imaging data has been primarily qualitative and limited to 3-4 markers. Advances in various technologies have overcome this marker limitation and made it possible to analyze multiparameter imaging data down to the single-cell level, an approach termed histocytometry. Currently, most published end-to-end histocytometric analysis of imaging data is performed using expensive commercial programs or freely available analysis packages that require significant knowledge of programming languages. Here we present a protocol that performs cell segmentation, phenotyping and spatial analysis using software with easy-to-use GUIs (graphical user interfaces). These protocols allow the user to derive spatial and phenotypic data from multiparameter microscopic images acquired on most imaging platforms, at low cost. © 2022 Wiley Periodicals LLC.


INTRODUCTION
Analysis of multiparameter microscopic images in modern research often employs histocytometry to derive statistical analyses of single-cell data in a spatial context. The benefits of this technique over manual annotation and characterization of cells are a high degree of automation and throughput, significantly decreased user bias, and increased reproducibility. Histocytometric analysis of images requires the following: segmentation of cells within the image, phenotyping of these segmented cells, and analysis of the parameters of the phenotypes (including spatial relationships). This basic protocol is ideal for users with images of non-complex samples without tightly clustered cells, such as Cytospin preparations or sections from tissues with a relatively simple cellular composition (e.g., lung or muscle tissue, as opposed to spleen or retinal tissue).

STRATEGIC PLANNING
This protocol is broken up into three broad components: cell segmentation, phenotyping, and spatial analyses.
Of note, large image files take longer to process, so using a sufficiently powerful computer is highly recommended to avoid crashes. More RAM and higher-frequency CPUs improve processing speed, and multicore CPUs additionally allow several images to be processed simultaneously, reducing the overall processing time for an image dataset.

Images
Faithful segmentation of cells in this protocol depends primarily on thresholding the nuclei channel signal into a representative cell mask. Images need to be of high resolution with a sufficient signal-to-noise ratio to accurately define true marker expression; further details are given in the Critical Parameters section of this protocol.
A list of commonly used microscope image formats that are compatible with histocytometric analysis is given in Table 1.
Because CellProfiler uses Bio-Formats (Linkert et al., 2010), most formats are supported. Pyramidal image formats such as .svs and .vsi are not directly compatible but could potentially be converted into TIFFs (standard .tiff for images <4 GB, or BigTIFF .btf for larger images), although the .btf format has not been tested with this pipeline.
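A quick way to decide between the two TIFF variants is to estimate the uncompressed pixel-data size. The sketch below is an approximation under stated assumptions: it ignores compression, metadata and pyramid levels, and uses the 4 GiB (2^32-byte) ceiling imposed by the 32-bit file offsets of classic TIFF. The function names are hypothetical.

```python
# Classic TIFF stores file offsets as 32-bit unsigned integers, so a file
# cannot exceed 2**32 bytes (4 GiB); BigTIFF (.btf) uses 64-bit offsets.
CLASSIC_TIFF_LIMIT_BYTES = 2**32


def estimate_uncompressed_bytes(width, height, channels, bits_per_sample, z_slices=1):
    """Rough size of the raw pixel data, ignoring compression and metadata."""
    bytes_per_sample = (bits_per_sample + 7) // 8  # round up to whole bytes
    return width * height * channels * z_slices * bytes_per_sample


def suggested_tiff_extension(width, height, channels, bits_per_sample, z_slices=1):
    """Hypothetical helper: '.tiff' if the data fit in classic TIFF, else '.btf'."""
    size = estimate_uncompressed_bytes(width, height, channels, bits_per_sample, z_slices)
    return ".tiff" if size < CLASSIC_TIFF_LIMIT_BYTES else ".btf"


# A 1024 x 1024, 6-channel, 16-bit stack is about 12.6 MB of pixel data.
print(suggested_tiff_extension(1024, 1024, 6, 16))  # .tiff
```

Because metadata and any retained pyramid levels add to the real file size, stacks that land close to the limit are safer treated as too large for classic TIFF.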
For selection of positive marker signal cutoffs, in both the cell mask and cell phenotyping stages, we recommend the use of the following controls: unstained, single-stained and, where appropriate, fluorescence-minus-one controls. For further details on these types of controls, see Roederer (2001).
Munoz-Erazo et al., Current Protocols
Regarding image filenames and file paths, it is important to avoid special characters (e.g., #, -, $, %) and to use underscores instead of spaces. This also applies to image and object names used within the modules of the programs in this protocol. Failure to do so can result in program errors.
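Renaming can be scripted when a dataset contains many files. The following sketch is one possible approach, assuming that letters, digits, underscores and dots are the only characters worth keeping; the helper names are hypothetical, and it defaults to a dry run so that the planned renames can be reviewed first.

```python
import re
from pathlib import Path

# Characters kept as-is: letters, digits, underscore and dot. Everything else
# (including '#', '$', '%', '-' and spaces) is replaced with an underscore.
_UNSAFE = re.compile(r"[^A-Za-z0-9_.]")


def safe_name(name):
    """Replace spaces and special characters with underscores."""
    cleaned = _UNSAFE.sub("_", name)
    return re.sub(r"_+", "_", cleaned)  # collapse runs of underscores


def rename_unsafe_files(folder, dry_run=True):
    """Rename files in `folder` whose names contain unsafe characters.

    With dry_run=True (the default) nothing is renamed; the planned
    (old, new) name pairs are only returned for review.
    """
    planned = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        new_name = safe_name(path.stem) + path.suffix  # keep the extension
        if new_name != path.name:
            planned.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return planned
```

Check the returned list for collisions (two different names that sanitize to the same result) before running with dry_run=False.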

Software
The Basic Protocols make use of the following three software programs, each undertaking a key step of the histocytometry pipeline:
• Cell segmentation: CellProfiler (McQuin et al., 2018) was used for cell segmentation and .csv file generation because it is a flexible toolbox. Workflows are built as pipelines composed of modules; each module is an image-processing step dedicated to a specific role, and each subsequent module in a pipeline processes the output of the previous one to produce the desired output. Detailed protocols for using CellProfiler can be found in Dobson et al. (2021).
• Phenotyping: a program capable of analyzing .fcs files, usually one designed for analyzing flow cytometry data. Various suitable commercial options are available, such as FCS Express and Cytobank. For the purposes of this protocol, we demonstrate this section with FlowJo (Becton Dickinson).
• Spatial analyses: CytoMAP (Stoltzfus et al., 2020) for performing various spatial analyses; this protocol employs its cell-to-cell distance function.
Both CellProfiler and CytoMAP are freeware; however, for ease of demonstrating this workflow, a commercial .fcs analysis program is used. If the user cannot access a commercial .fcs analysis program, freeware capable of processing and analyzing .fcs files is suitable, although slight adjustments to the phenotyping stages may be needed. HistoCAT (Schapiro et al., 2017), a program designed for phenotyping and spatial analysis of imaging mass cytometry data, is a potential alternative, although a protocol incorporating that software differs significantly from the one presented here.
All programs used are compatible with the Microsoft Windows and Apple Mac operating systems (Table 2); however, CytoMAP is only available for Windows if using the standalone version. If the user has access to MATLAB software on a Mac computer, they can run CytoMAP through that.

CELL SEGMENTATION AND GENERATION OF HISTOCYTOMETRIC .CSV FILE
This section deals with processing the raw multichannel image files, segmenting the nuclei, deriving the cell mask, and generating the histocytometric data file, which contains the normalized intensity values of all markers for every successfully segmented cell.

Materials
In order to demonstrate this workflow and provide benchmarks for readers, we have provided 6-channel microscopy images of murine lung sections. These are located in the folder "Lung_images_fullstain" within the zipped folder "Basic_Protocol_Files" at https://doi.org/10.5281/zenodo.5781462. Staining of the sections followed a previously described protocol (Schmidt et al., 2019); additional details of the sample preparation and image acquisition can be found in the "Methods and Materials" Word document at the same Zenodo link. Download and install the latest version of CellProfiler compatible with your system from https://cellprofiler.org/releases/. This protocol uses CellProfiler 4.0.3, but most of the steps should be applicable to previous versions of CellProfiler.
2. Ensure that the "Images" module is selected in the pipeline window (Fig. 2), and drag the image files or folders into the relevant region of the workspace (Fig. 2, red outline). "Show files excluded by filters" should be ticked by default; tick the box if it is not.
3. Next, we will go straight to the "Names and Types" module. Even if the image stacks are greyscale images, the "Image type" selection needs to be set to [Color image], as seen in Figure 3, for the subsequent module to work. We will assign {Images} as the image name. Click the <Update> button to ensure the images are compatible with the module settings.

Figure 2
The CellProfiler workspace. The pipeline window is circled in green, the action buttons in blue, and the workspace in red. As you go through the modules in the pipeline, the options and settings of each module are displayed in the workspace. The action buttons are used to add modules to and remove them from the pipeline, as well as to rearrange their order. To add modules, the <+> button in the action button window brings up a pop-up submenu (highlighted in yellow). Modules can be added by selecting the relevant module and clicking the <+ Add to Pipeline> button.

Figure 3
The "Names and Types" module once all the images have been loaded as described in step 3 of Basic Protocol 1.

Figure 4
The "ColorToGray" module, with all the channels used in this protocol and their assigned names.
4. To split the multichannel images into their respective channels, add the "ColorToGray" module using the <+> button next to "Adjust modules" in the action button window (Fig. 2, blue outline). In the pop-up submenu, this module is found under the "Image Processing" module category; alternatively, use the search function in the submenu. Select the module and click <+ Add to Pipeline> to add it to the pipeline.
5. In "Select input images", choose the {Images} designation we previously assigned. Select [Split] and [Channels] for "Conversion method" and "Image type", respectively. The example image stacks we are processing have 6 channels; however, the last one is not a marker channel (it is a differential interference contrast image).
In the relevant sections, input the channel numbers and assign the names as seen in Figure 4.
6. Next, add the "IdentifyPrimaryObjects" module (under the "Object Processing" module category). In this protocol, advanced settings will not be used (Fig. 5). Select {Nuclei_DRAQ}, which corresponds to the nuclei marker channel, as the input image, and name the primary objects to be identified {Nuclei}. The ideal [Min,Max] settings for "Typical diameters of objects" depend on the size of your nuclei in pixels. To assess input values, select <Start Test Mode> in the action button window (Fig. 2, blue outline), then select the <Run> button. A pop-up window will be generated if the eye icon next to the module is open. You will not need to exit test mode while the pipeline is still being constructed and refined.
7. In the "IdentifyPrimaryObjects" pop-up window (Fig. 6), click the magnifying glass icon to magnify the image as required. Using the measurement tool (to the right of the magnifying glass icon), determine the [Min,Max] settings for the nuclei by clicking and holding on one side of a nucleus and dragging across to the other side. In Figure 6, the diameter of the measured cell (thin dark blue line in the bottom left image) is more than 11 pixels (highlighted in red). With the example lung images, a range of 6-15 pixels effectively captures most of the nuclei, so we will continue with those settings in the "IdentifyPrimaryObjects" module.
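If the pixel size of the images is known (e.g., from the image metadata), an expected physical nucleus size can be turned into a starting [Min,Max] range before it is refined with the measurement tool. The helpers below are hypothetical, and the example values (nuclei of 3-7.5 µm imaged at 0.5 µm per pixel) are illustrative only, not measured properties of the provided lung images. The second helper gives the diameter of a circle with the same area as an object, which can be used to sanity-check object areas after segmentation.

```python
import math


def diameter_range_px(min_um, max_um, um_per_px):
    """Convert an expected nucleus diameter range (micrometres) into an
    integer [Min,Max] range in pixels.

    The lower bound is rounded down and the upper bound rounded up, so the
    pixel range never excludes the micrometre range.
    """
    if um_per_px <= 0:
        raise ValueError("pixel size must be positive")
    return math.floor(min_um / um_per_px), math.ceil(max_um / um_per_px)


def equivalent_diameter_px(area_px):
    """Diameter (pixels) of a circle with the same area as an object."""
    return 2.0 * math.sqrt(area_px / math.pi)


# Illustrative values only: 3-7.5 um nuclei at 0.5 um per pixel.
print(diameter_range_px(3.0, 7.5, 0.5))  # (6, 15)
```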
8. The {Nuclei} objects generated in the previous step will be used as seeds to generate the cell masks with the "IdentifySecondaryObjects" module (also found under the "Object Processing" module category). As our images do not have a cell marker that covers all the cells of interest, we will build the cell mask through an iterative process.
To do this, we will start by selecting {CD3_AF488} as the input image, as seen in Figure 7. Select {Nuclei} as the input objects and name the objects to be identified {CD3_Objects}. Keep the rest of the settings as they are.
9. As with "IdentifyPrimaryObjects", the success of each "IdentifySecondaryObjects" module can be assessed by observing the object outlines on the input image in the module's pop-up window (Fig. 8). With this module, however, nuclei belonging to cells with marker signal are expanded from the original nuclei mask (green) to the new cell mask (magenta), whereas nuclei with no signal are not expanded (magenta outlines with no inner green outline). By hovering the cursor over "Input image, cycle #1" in the pop-up window, you can read the intensity values and decide on the most suitable "Lower bound on threshold" (Fig. 7, red outline) for that marker channel. After assessment, the lower bound on threshold of the "IdentifySecondaryObjects" module pertinent to the CD3 channel is changed from 0.00 to 0.02.

Figure 6
In this pop-up window generated by the "IdentifyPrimaryObjects" module, the input image is shown at the top left and the generated object masks at the top right. The outlines of these object masks are overlaid on the original input image at the bottom right. The pop-up window also contains a table of details, including the number of successfully segmented objects.

Figure 7
Screenshot of the first "IdentifySecondaryObjects" module, with the primary object {Nuclei} selected as the input object. Highlighted in red is the lower bound on threshold, which has been modified from the default value of 0.
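CellProfiler works with intensities rescaled to a 0-1 range, so a lower bound such as 0.02 is not a raw detector value. The sketch below converts between the two scales under the assumption that the rescaling divides by the largest value of the image's bit depth (2^bits - 1). How CellProfiler determines the intensity range can depend on the file metadata and its image-loading settings, so confirm the divisor for your own data, for example by comparing one pixel's value in CellProfiler with the same pixel in another viewer. The function names are hypothetical.

```python
def raw_to_scaled(raw_value, bit_depth=16):
    """Map a raw integer intensity onto a 0-1 scale by dividing by the
    largest value representable at the given bit depth (an assumption)."""
    return raw_value / (2**bit_depth - 1)


def scaled_to_raw(scaled_value, bit_depth=16):
    """Inverse of raw_to_scaled: a 0-1 value back on the raw intensity scale."""
    return scaled_value * (2**bit_depth - 1)


# The 0.02 lower bound used for the CD3 channel, expressed for 8-, 12- and 16-bit data.
for bits in (8, 12, 16):
    print(bits, round(scaled_to_raw(0.02, bits), 1))
```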
10. After generating secondary objects from the CD3 marker channel, additional secondary objects are identified by adding further "IdentifySecondaryObjects" modules, each using the previous secondary objects as seeds and the marker with the next highest coverage of the cells of interest. For the B220_BV480 secondary objects, the object seeds are {CD3_Objects}. This process is continued with the minimum number of markers required to cover all the cells of interest. To avoid confusion, it is a good idea to append the new cell marker name to the previous secondary object name at each iteration; for the second "IdentifySecondaryObjects" module, this means that "Name the objects to be identified" is {CD3_B220_objects}. Continue this iterative process for the CD64_PECF594 and SiglecF_BV421 channels/images. The final cumulative secondary object will be called {Proto_cellmask}, as seen in Figure 9.
For the subsequent stages of the protocol, we proceeded with the lower bounds listed in Table 3 for the "IdentifySecondaryObjects" modules. Enter these into the respective "Lower bound on threshold" fields, as highlighted in red in Figure 9.

Figure 8
In this example of the third "IdentifySecondaryObjects" pop-up window, the original primary object outline of a CD64+ cell is shown in green and has been expanded (in magenta) to accommodate the cell membrane. Other, CD64-negative cells keep their previous object shape (in magenta).

Figure 9
Screenshot of the final "IdentifySecondaryObjects" module, with the penultimate secondary object selected as the input object.

Figure 10
The "MeasureObjectIntensity" module, with {Proto_cellmask} selected as the object to measure and the marker channels of interest that will be measured within the {Proto_cellmask} objects.
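The iterative logic of step 10 can be illustrated with a deliberately simplified stand-in for "IdentifySecondaryObjects": each labelled seed grows, by multi-source breadth-first search, into 4-connected pixels whose marker intensity is at or above the lower bound, and the output labels become the seeds for the next marker. This is not CellProfiler's propagation algorithm (which also weighs intensity gradients and applies its own thresholding method); the function names, the toy 2D lists and the thresholds are hypothetical.

```python
from collections import deque


def expand_seeds(labels, marker, lower_bound):
    """Grow each labelled seed into 4-connected pixels whose marker intensity
    is >= lower_bound. Unlabelled pixels are 0. Returns a new label grid.

    Multi-source BFS: all seeds advance one pixel per round, so a contested
    pixel goes to whichever seed reaches it first (ties follow queue order).
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]  # copy; seeds keep their own pixels
    queue = deque((r, c) for r in range(rows) for c in range(cols) if labels[r][c])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == 0
                    and marker[nr][nc] >= lower_bound):
                out[nr][nc] = out[r][c]
                queue.append((nr, nc))
    return out


def iterative_cell_mask(nuclei, markers_in_order):
    """Apply expand_seeds once per (marker_image, lower_bound) pair, feeding
    each output in as the seeds for the next marker (cf. CD3 -> B220 -> ...)."""
    current = nuclei
    for marker, lower_bound in markers_in_order:
        current = expand_seeds(current, marker, lower_bound)
    return current
```

Nuclei without signal in any marker keep only their seed pixels, mirroring the unexpanded outlines described in step 9.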
11. Next, we need to generate the measurements for the cell masks that will be used to produce the final output file (.csv), using two modules that are both found under the "Measurement" module category. The first module we will add is "MeasureObjectIntensity". As seen in Figure 10, you will be asked which images (i.e., marker channels) to measure intensities in. Although you may not be interested in the intensities of certain markers (e.g., the nuclei marker), it is a good idea to include them anyway, as they may come in handy later in the analysis workflow. Ensure {Images} is not selected. For the option "Select objects to measure", select [Proto_cellmask].
12. Add the "MeasureObjectSizeShape" module, select only {Proto_cellmask} to be measured, and select [No] for calculating Zernike and advanced features.
13. Finally, we will generate the .csv spreadsheet using the "ExportToSpreadsheet" module, found in the "Data Tools" module category. By default, all the different types of object intensity and shape measurements will be exported. This is an overwhelming amount of data, so it is better to select only the specific measurements you are interested in. Keep all module options as they are (refer to Fig. 11) unless specifically stated below:
• First, ensure that "Output file location" is set to the location you want your spreadsheet files exported to.
• Select [No] for adding a filename prefix.
• For the option "Select the measurements to export", select [Yes]; a new button labelled <Press button to select measurements> will appear (Fig. 11). Clicking it opens a pop-up window called "Select Measurements" that shows the objects the pipeline has generated.
• Ensure no other objects are selected, and go into the {Proto_cellmask} submenu. For this pipeline, select the following options:
  Area shape: Area, Center, Eccentricity, MajorAxisLength, MinorAxisLength, Perimeter
  Intensity: Median intensity
  Number: Object
  Once done, click <Ok> in the "Select measurements" sub-window.
• In the main pipeline window, select [No] for the option "Export all measurement types". In the newly generated option [Data to export], choose {Proto_cellmask}.
• Ensure [No] is selected for "Use the object name for the file name?". Instead, assign the custom file name {Lung_ROI_spreadsheet.csv} in the "File name" field, as seen in Figure 11.

Figure 11
The "ExportToSpreadsheet" module, displayed with the pop-up selection window generated when the user chooses which measurements will be exported.
14. The pipeline is now set up, and all that remains is to generate the .csv file. First, exit test mode by clicking <Exit Test Mode> in the action button window (Fig. 2, blue outline), then click the <Analyze Images> button.
15. You can save the pipeline for later use by selecting File>Save Project As with a custom name such as "Basic_Protocol". To use this pipeline in the future, open CellProfiler and drag the project file (which has the filename extension .cpproj) into the pipeline window (Fig. 2).

PHENOTYPING OF CELL POPULATIONS
This section deals with the phenotyping aspect of the pipeline using flow cytometry analysis software. The user needs to ensure that the chosen program can work with .csv files (either directly or after conversion to .fcs format) and, ideally, can annotate the generated gates hierarchically. Users without access to commercially available software can potentially use freeware alternatives.

Materials
.csv file (see Basic Protocol 1)
1. Open the .csv file generated in Basic Protocol 1 with the flow cytometry data analysis software. Certain programs (such as FlowJo) will automatically convert the .csv file into an .fcs file; if your program cannot open .csv files, we suggest a .csv-to-.fcs conversion option such as flowCore (Hahne et al., 2009). Although a single .csv file was generated from all of the images, the data can be gated into the individual images using the parameter "ImageNumber". For the purposes of setting the phenotyping gates, it is best to first use the amalgamated dataset to establish the phenotypes, then gate out the individual images before applying the established phenotype gates to the image gates.
The following analyses will use scatterplots unless otherwise stated.
2. If using FlowJo, drag the {Lung_ROI_spreadsheet.csv} into a new workspace. An .fcs version of the .csv file will be autogenerated in the same file location as the .csv and this is what the program will use.
3. The next step is to ensure the ranges of the gates are set correctly. As all of the images used in this pipeline are 1024 × 1024 pixels, the lower and upper limits of {Lung_ROI_spreadsheet.fcs} need to be set to 0 and 1024, respectively, for both [AreaShape_Center_X] and [AreaShape_Center_Y] if they are not already. Ensure the scaling is also set to "Linear" (as opposed to other scaling options such as "Log").
4. As previously mentioned, the .fcs/.csv file is an amalgamation of all of the images. The images need to be separated out and this can be done by gating; the exact method of doing this will depend on the flow cytometry software you are using. In FlowJo, for {Lung_ROI_spreadsheet.fcs}, gate on each individual image by using a histogram display of the [ImageNumber] parameter on the x-axis. This is easily achieved using a 'range'-type of gating tool as in Figure 12.
By selecting the ROI1 gate and setting [AreaShape_Center_Y] as the y-axis parameter and [AreaShape_Center_X] as the x-axis parameter, a scatterplot version of the nuclei channel can be observed. This can be compared with the actual nuclei channel of the respective image to assess the fidelity of the fcs image gate to the image it was derived from, as shown in Figure 13. Note that the fcs image will be inverted along the y-axis compared with the original image, because CellProfiler's (0,0) x, y origin point differs from that of the flow cytometry software. Beyond the visual dissonance between the fcs x, y scatterplots and their respective images, this inversion is largely inconsequential to downstream analyses. However, if the correct orientation is critical to the user, the fcs files can be inverted relative to the original file; see the detailed steps in Support Protocol 2.
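Outside of a flow cytometry program, the same per-image split, and optionally a y-axis correction, can be sketched with Python's csv module. The column names follow those referenced in this protocol (ImageNumber, AreaShape_Center_Y); check them against the header of your own export, since CellProfiler versions and settings can change them. The 1024-pixel height matches the example images. The flip maps a row coordinate y to (height - 1) - y; it is offered as an alternative to, not a reproduction of, Support Protocol 2, and the helper names are hypothetical.

```python
import csv
from collections import defaultdict


def load_rows(csv_path):
    """Read a CellProfiler ExportToSpreadsheet file into a list of dicts."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def split_by_image(rows):
    """Group exported rows (dicts) by their ImageNumber value."""
    per_image = defaultdict(list)
    for row in rows:
        per_image[int(row["ImageNumber"])].append(row)
    return dict(per_image)


def flip_y(rows, image_height=1024, column="AreaShape_Center_Y"):
    """Return copies of rows with y mirrored top-to-bottom, so that y = 0
    (the top row in CellProfiler) becomes image_height - 1."""
    flipped = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row[column] = str((image_height - 1) - float(row[column]))
        flipped.append(new_row)
    return flipped


# Example (file name from Basic Protocol 1):
# images = split_by_image(flip_y(load_rows("Lung_ROI_spreadsheet.csv")))
```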
5. Because CellProfiler generates marker intensity values normalized to a range of 0-1, the only practical scaling option for the marker channels (Parameters 10-14) is the log scale. When log scaling is set, the scatterplot of intensity values may appear blank; this is because the default min,max intensity limits are above 1. In the generated .csv/.fcs file, the lower and upper limits are ∼0.000001 and 1, respectively; however, some flow cytometry software may not be capable of scaling down to such a low limit. As long as the scaling allows clear gating of positive versus background staining, this is not an issue. The data as presented have been set with marker min and max intensity values of 0.0001 and 0.1, respectively (due to decimal limits in FlowJo).

Figure 12
Histogram of the parameter "ImageNumber", which is useful for gating on and identifying the individual images from the .csv file generated by CellProfiler.
6. If required, your initial gates will include forms of data clean-up, such as spatial gates to remove unsuitable sections of the image or removal of poorly segmented cells using the morphological value channels. With the provided lung files, no such clean-up is required and instead the gating strategy to generate the following cell phenotypes (in bold below) is based on (
7. Using the rectangle gating tool, apply the gating procedure shown in Figure 14 to generate the aforementioned phenotypes.
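Conceptually, each rectangle gate keeps the cells whose values on two parameters fall inside a box, and a hierarchical strategy applies each child gate only to the cells that passed its parent. The sketch below shows that logic only; the gate names, parameter names and limits in the test are placeholders, not the gating strategy of Figure 14.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectGate:
    """A two-parameter rectangle gate with inclusive limits."""
    name: str
    x_param: str
    x_range: tuple  # (low, high)
    y_param: str
    y_range: tuple  # (low, high)

    def contains(self, cell):
        x, y = cell[self.x_param], cell[self.y_param]
        return (self.x_range[0] <= x <= self.x_range[1]
                and self.y_range[0] <= y <= self.y_range[1])


def apply_hierarchy(cells, path):
    """Apply gates in order (parent first); each gate sees only the cells
    that passed every gate before it. Returns the surviving cells."""
    current = list(cells)
    for gate in path:
        current = [cell for cell in current if gate.contains(cell)]
    return current
```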
8. Now that the phenotypes have been identified and gated on in the collective data, we need to apply the phenotype gates onto the individual images. If your flow cytometry software supports gate copying/pasting, do so; otherwise you will need to refer to the gating strategy to gate on each individual image gate. You can check how well the gating strategy has worked by comparing the channel marker image with the respective phenotype gate primarily defined by that marker (as seen in Fig. 15, with respective flipped images as described in Support Protocol 2).

9. Finally, to proceed to spatial analyses of the images, you will need to export the gates as .csv files. The exact method again depends on the program you are using. Ensure that the phenotype gating and naming structure is consistent between the samples you will compare; this is important for the subsequent program (CytoMAP) to recognize that the same population exists across all of the input samples. In FlowJo, the best approach is to first export the individual ROIs as .fcs files as follows:
i. Select all of the ROI gates (ROI1-10), right-click and select the option "Export/Concatenate Populations".
ii. Ensure that the <Export> button is selected, the "Format" selected is [FCS3], and the "Destination" is the folder containing {Lung_ROI_spreadsheet.fcs}.
iii. Under "Advanced Options", remove any characters in the "Prefix" field. Select [Custom] for "Body" and click <Edit>. A pop-up window (Custom File Naming) will be generated; in the "Selected Keywords" section, ensure only {FJ_LAST_UNIQUE_POP_NAME} is present, then click <OK>, which will close the pop-up.
iv. Click <Export>, which will generate another pop-up (Exporting-Please Wait
ix. This time, ensure the "Selected Keywords" are {$FIL} and {FJ_LAST_UNIQUE_POP_NAME}. Click <OK> in the pop-up, and click <Export> in the main Export/Concatenate window.
Table 4.
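Because consistent population names across samples matter for CytoMAP, a quick check before loading can catch naming slips (e.g., "T_cells" in one ROI and "Tcells" in another). The sketch below works on a mapping from sample name to population names, for example collected from the exported file names; how that mapping is built depends on your export naming scheme, so it is left to the caller, and the function name is hypothetical.

```python
def inconsistent_populations(populations_by_sample):
    """Report, per sample, populations missing relative to the union of all samples.

    populations_by_sample: dict mapping sample name -> iterable of population names.
    Returns a dict of sample -> sorted list of missing names (empty if consistent).
    """
    all_names = set()
    for names in populations_by_sample.values():
        all_names.update(names)
    missing = {}
    for sample, names in populations_by_sample.items():
        gap = sorted(all_names - set(names))
        if gap:
            missing[sample] = gap
    return missing
```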

SPATIAL RELATIONSHIP ANALYSES OF PHENOTYPED POPULATIONS
This part of the protocol describes a method for analyzing the distances of {B cells}, {T cells} and {Other Macrophages/Monocytes} to {Alveolar Macrophages}, as well as a method for determining which phenotype is more spatially associated with Alveolar Macrophages overall.

Materials
Download and install the latest version of CytoMAP (the files that can be executed via MATLAB, or the stand-alone version, can be downloaded under the 'Install' subsection of the following link: https://gitlab.com/gernerlab/cytomap/-/blob/master/README.md)
.csv phenotype files generated in Basic Protocol 2
1. Open CytoMAP. Once the program is running, you should get a menu window like the one shown in Figure 16.
2. The first thing to do will be to upload the .csv file gates generated with the flow cytometry software in the previous procedure. You need to collectively upload all the gates for each ROI as defined by Table 4. To do this, select from the main CytoMAP window: File>Load Table of Cells. Once you do, you will get the popup window seen in Figure 17.
3. Here you will select the channels in your image that correspond to the respective x, y, and z channels. As our images do not have a z-axis (they are 2D images), there is an option to select for creating a pseudo z-axis channel, which is the default selection. The x channel in our workflow will be labelled "Ch_AreaShape_Center_X" and the y channel is "Ch_AreaShape_Center_Y". Afterwards, you should get another popup window as seen in Figure 18.
4. As all our images are from the same experimental condition, we will not need to do anything with the "sample annotation". Ensure consecutive numbering for the "sample number" and that the sample name is "ROI_x", where 'x' corresponds to the number of the ROI image. Once ready, press <Load> to load your data.
To check that this has worked, select the <New Figure> button in the main window. You should get a window like Figure 19; to view the table seen in that figure, click the <Show table> button within this RSZ window.
As you upload the gates for each image, the RSZ figure window will not update with the most current data, so you will need to open a new figure once all the image gates have been uploaded to see them all.
As you can see on the right of Figure 19, the cells of the selected ROI are displayed as a scatterplot.
5. Next, we will generate the distance measurements from all cells to Alveolar Macrophages. To do this, in the main window (Fig. 16), under the title "Cells", select the <Calculate Distances> button. You will then get a pop-up window that allows you to select the phenotypes you would like to measure distances to (Fig. 20). We will select only "All/Eosinophils" and all the samples, and then select the <Ok> button.
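For intuition about what a cell-to-cell distance measurement reports, the sketch below computes, for every cell, the Euclidean distance (in pixels) to the nearest other cell of a chosen target phenotype. It assumes 2D centroid coordinates and that the quantity of interest is this nearest-neighbour distance; CytoMAP's own implementation and options may differ. The brute-force search is only practical for modest cell counts, and the function name is hypothetical.

```python
import math


def nearest_target_distance(cells, target_phenotype):
    """For each cell, distance to the nearest *other* cell of target_phenotype.

    cells: list of (x, y, phenotype) tuples, coordinates in pixels.
    Returns a list of floats (math.inf if no other target cell exists).
    """
    targets = [(i, x, y) for i, (x, y, ph) in enumerate(cells) if ph == target_phenotype]
    distances = []
    for i, (x, y, _) in enumerate(cells):
        best = math.inf
        for j, tx, ty in targets:
            if j != i:  # a target cell is not its own nearest neighbour
                best = min(best, math.hypot(x - tx, y - ty))
        distances.append(best)
    return distances
```

Multiplying the result by the pixel size converts the distances to micrometres.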

Figure 16
Main window of the CytoMAP program.
6. Once this procedure has been performed, you will need to open a new RSZ figure to observe the changes. In the new window, select "Dist to all Eosinophils" in the [C-axis] dropdown menu to see a color-coded display of the distances in the scatterplot (Fig. 21, left-hand side). To visually compare the distance to Eosinophils between phenotypes, generate a boxplot by selecting Extension>Bar_and_Violin_plots.m from the main window menu, and select the phenotypes and samples/ROIs of interest. You will get a display like the one seen in Figure 21 (right-hand side).

Figure 17
Pop-up window for defining spatial channels in uploaded .csv files.

Figure 18
The window for uploading image gate .csv files in CytoMAP.
7. If you want to perform statistics on the distances, you first need to export the data.
To do this, select File>Export Data Table for Prism. You will get a window like the one seen in Figure 22. Select all of the relevant phenotypes and all of the samples, and click <Ok>. The data will be exported as .csv files, which can then be statistically analyzed with the relevant software.
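As one example of what can be done with the exported distances, the sketch below summarizes groups by their medians and computes the Mann-Whitney U statistic for two groups by direct pair counting (each cross-group pair in which the first value is larger counts 1, ties count one half). It is an illustration only: it gives no p-value, and the choice of test should follow your experimental design, ideally using a dedicated statistics package. The function names are hypothetical.

```python
from statistics import median


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a versus group_b by direct pair counting (O(n*m))."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def summarize(distances_by_phenotype):
    """Median distance per phenotype."""
    return {name: median(values) for name, values in distances_by_phenotype.items()}
```

The two possible U values for a pair of groups always add up to the number of cross-group pairs (len(group_a) * len(group_b)), which is a useful sanity check.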


Figure 19
The RSZ Figure window display generated by CytoMAP once the files have been uploaded.

Figure 20
Window for calculating cell-to-cell distances.

NUCLEI SEGMENTATION ACCURACY TEST
Cell segmentation is the most critical aspect of histocytometry; it is the first step in the image analysis pipeline and provides the backbone of the subsequent steps. The following support protocol provides an analysis pipeline to evaluate the accuracy of the segmentation performed.
This protocol uses CellProfiler and adds four modules to the Basic Protocol 1 pipeline to evaluate the accuracy of the settings used for nuclei segmentation in the "IdentifyPrimaryObjects" module. We use the Rand index (McQuin et al., 2018), a measure of the similarity between two clusterings (here, two segmentations of the same image). Its values range from 0 to 1, where 0 signifies no similarity and 1 a perfect match. For materials, see Basic Protocol 1.
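To make the metric concrete: given two labellings of the same pixels (for example, ground-truth objects and segmented nuclei, with 0 for background), the Rand index is the fraction of all pixel pairs on which the two labellings agree, i.e., both place the pair in the same object or both place it in different objects. The pure-Python sketch below computes it from a contingency table of label pairs rather than looping over every pair. CellProfiler's "MeasureObjectOverlap" reports its own value, which may differ in details such as background handling, so treat this as an illustration; the function name is hypothetical.

```python
from collections import Counter
from math import comb


def rand_index(labels_a, labels_b):
    """Rand index between two flat labellings of the same pixels.

    Agreeing pairs are counted from the contingency table n_ij (pixels
    labelled i in A and j in B) with row totals a_i and column totals b_j:
      agree = C(n,2) + 2*sum C(n_ij,2) - sum C(a_i,2) - sum C(b_j,2)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labellings must cover the same pixels")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    agree = total_pairs + 2 * same_both - same_a - same_b
    return agree / total_pairs
```

A 2D label image can be flattened first, e.g. `[v for row in image for v in row]`. Relabelling objects (1 and 2 becoming 7 and 9) does not change the index, because only co-membership of pixel pairs matters.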
1. Complete Basic Protocol 1 up to step 7.
2. Add a "Threshold" module (found under the "Image Processing" module category) with the following options:
• "Select input image": {Nuclei_DRAQ}
• "Name the output image": {Threshold}
• "Threshold strategy": [Global]
• "Threshold method": [Minimum Cross-Entropy]
• "Lower and upper bounds on threshold": 0.0 and 1.0, respectively
Keep the rest of the settings as default. This module generates a 'Ground Truth' output image that will serve as the benchmark for all nuclei-marker-positive signal present in the input image.
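The "Minimum Cross-Entropy" option refers to Li's minimum cross-entropy thresholding. As a sketch of how such a global threshold is found, assuming the Li and Tam iterative update used by scikit-image's threshold_li (which CellProfiler is understood to build on; verify against your version), the threshold is repeatedly set to (mu_b - mu_f) / (ln mu_b - ln mu_f), where mu_b and mu_f are the mean intensities at or below and above the current threshold, until it stops changing. This pure-Python version assumes strictly positive intensities and omits the refinements of the library implementations; the function name is hypothetical.

```python
import math


def li_threshold(values, tolerance=1e-6, max_iter=1000):
    """Iterative minimum cross-entropy (Li) threshold for strictly positive intensities."""
    if not values or min(values) <= 0:
        raise ValueError("values must be non-empty and strictly positive")
    t_next = sum(values) / len(values)  # start from the mean intensity
    t_curr = t_next + 2 * tolerance     # force at least one iteration
    iterations = 0
    while abs(t_next - t_curr) > tolerance and iterations < max_iter:
        t_curr = t_next
        fore = [v for v in values if v > t_curr]
        back = [v for v in values if v <= t_curr]
        if not fore or not back:  # everything on one side: nothing to split
            break
        mean_f = sum(fore) / len(fore)
        mean_b = sum(back) / len(back)
        t_next = (mean_b - mean_f) / (math.log(mean_b) - math.log(mean_f))
        iterations += 1
    return t_next
```

For a two-level image with intensities 0.1 and 0.9, the update settles after one step at 0.8 / ln 9 (about 0.364), between the two levels.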
3. Add the "ConvertImageToObjects" module (found under the "Object Processing" module category) with the following options:
• "Input image": {Threshold}
• "Name the output object": [GroundTruth]
• "Convert to Boolean image": <No>
• "Preserve original labels": <Yes>
4. Add the "MeasureObjectOverlap" module (found under the "Measurement" module category) with the following options:
• "Select the objects to be used as ground truth…": {GroundTruth}
• "Select the objects to be tested for overlap…": {Nuclei}
• "Calculate earth mover's distance": <No>
A Rand index of 0.90 is the minimum desired value, marking the lower end of the range for reasonable segmentation.
5. Add the "ExportToSpreadsheet" module (under "Data Tools"). For "Output file location", select [Elsewhere] and specify an appropriate output folder and sub-folder. Make sure the file path is one from which you can easily retrieve the information.
NOTE: Ideally, your nuclei segmentation should yield a Rand index between 0.90 and 0.95, providing confidence in the nuclei segmentation parameters of the "IdentifyPrimaryObjects" module. Values above 0.95 tend to indicate segmentation that includes an excessive number of non-specific objects (e.g., non-nuclei objects), which will affect the quality of the results further down the pipeline.
A visual demonstration of the segmented nuclei overlaid on the Ground Truth image can be seen in Figure 23.

CORRECTING Y-AXIS INVERSION OF HISTOCYTOMETRY DATA RELATIVE TO ORIGINAL IMAGE FILE
As previously mentioned, the Basic Protocols generate histocytometry data that are inverted along the y-axis relative to the original image. Apart from the visual mismatch, this inversion is inconsequential to downstream analyses and does not affect cell distribution or the fluorescence intensity of cell types.
However, correct orientation of a generated histocytometry file may be important, especially when cross-referencing cell phenotypes identified by histocytometry with specific structures present in the original image (e.g., bronchioles or blood vessels).
This protocol complements Basic Protocol 1 and produces histocytometry data whose y-axis orientation matches the original image. For materials, see Basic Protocol 1.
1. Complete Basic Protocol 1 up to step 13.

2. Add as many "FlipAndRotate" modules (found under the "Image Processing" module category) as there are channels of interest (five in the case of the lung sample images), placing them after the "ColorToGray" module in the pipeline.
3. Each "FlipAndRotate" module will be dedicated to a specific channel. Starting with the module that will process the nuclei/DRAQ channel, select {Nuclei_DRAQ} as the input image. In the "Name the output image" field, enter: Nuclei_DRAQ_FP. For "Select method to flip", select [Top to bottom].
5. Using the naming schema shown in the previous step, derive inverted images for the rest of the channels, and update the input image field in their respective pre-existing "IdentifySecondaryObjects" modules.
6. Update the "MeasureObjectIntensity" module by deselecting the original "images to measure" and selecting their {_FP} versions.
7. Similarly, go to the pre-existing "ExportToSpreadsheet" module and update the "Measurements to export" field. Specifically, click <Press button to select measurements> and, under 'Protocell_mask > Intensity > MedianIntensity', replace the original images with their {_FP} counterparts. Click the <Analyze Images> button.
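If histocytometry data have already been exported, a post hoc alternative (not part of this protocol) is to mirror the centroid y-coordinates in the .csv file instead of rerunning the pipeline with flipped images. The Python sketch below assumes 0-based pixel coordinates with the origin at the top of the image, so a top-to-bottom flip maps row y to (H - 1) - y for an image of height H. It also assumes a centroid column named `Location_Center_Y`; the column name and the file paths in the usage comment are hypothetical and should be adjusted to your own export.

```python
import csv


def flip_y(rows, image_height, y_key="Location_Center_Y"):
    """Return copies of the rows with the y-coordinate mirrored top-to-bottom.

    Assumes 0-based pixel coordinates, so pixel row y maps to
    (image_height - 1) - y; applying the flip twice restores the input.
    """
    flipped = []
    for row in rows:
        new_row = dict(row)
        new_row[y_key] = (image_height - 1) - float(row[y_key])
        flipped.append(new_row)
    return flipped


def flip_csv(in_path, out_path, image_height, y_key="Location_Center_Y"):
    """Apply flip_y to every row of an exported histocytometry .csv file."""
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = flip_y(reader, image_height, y_key)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical usage on an image 2048 pixels tall:
# flip_csv("cells.csv", "cells_yflipped.csv", image_height=2048)
print(flip_y([{"Location_Center_Y": "0"}, {"Location_Center_Y": "2047"}], 2048))
# [{'Location_Center_Y': 2047.0}, {'Location_Center_Y': 0.0}]
```

Only the y-column changes, so marker intensities and other columns are carried over unchanged, consistent with the inversion having no effect on intensity data.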

Background Information
Initially, histocytometry pipelines were created by individuals or research groups using a variety of software packages that were not originally designed with histocytometry in mind, including Imaris, Excel, and FlowJo (Gerner, Kastenmuller, Ifrim, Kabat, & Germain, 2012). As use of the technique increased, both freeware and commercial programs were developed with varying levels of cost, capability, and user-friendliness. The most complete noncommercial histocytometry pipelines are either operated primarily through a programming language such as R or Python (Ali et al., 2020; Schapiro et al., 2021), or presented only as a summarized description in the Materials and Methods section of a journal article (Gerner et al., 2012; Tan-Garcia et al., 2020).
While histocytometry extends analysis beyond what conventional flow cytometry can offer (e.g., spatial analysis), acquiring and analyzing histocytometric data takes considerably longer than for an equivalent flow cytometry sample, so significant investment in capable acquisition platform(s) and processing hardware/software is needed. This is especially true when trying to match the number of cells/events detected per sample, which can require equipment such as light-sheet microscopes imaging whole tissue samples over several days. Alternatively, depending on the experimental parameter of interest, focusing on specific regions of the tissue can generate sufficient data without excessively long imaging times.
This protocol paper provides a detailed step-by-step guide on how to quantitatively analyze microscopic images at the single-cell level, define multiple cell phenotypes, and perform spatial analyses between these phenotypes. However, because antibody marker staining patterns, antibody concentration, and tissue architecture can dramatically affect marker intensity, the intensity min/max cutoffs (at both the cell mask and phenotyping steps) must be determined for each experiment. Because the marker intensity signal is normalized to a range between 0 and 1, and some flow cytometry programs cannot adequately display this range, data produced by this protocol have two limitations. First, it is difficult to phenotype cells based on relative expression of a marker (e.g., high vs. intermediate expression); second, dimensionality-reduction approaches (e.g., t-SNE) may not produce valid results because lower-intensity signals are "compressed".
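The effect of the min/max cutoffs on the 0 to 1 normalization can be illustrated with a short Python sketch. It assumes a simple linear rescaling with clipping (values at or below the minimum cutoff map to 0, values at or above the maximum map to 1). The function name and the example intensities are hypothetical, and this is not presented as the exact transform used by any particular program in the protocol.

```python
def rescale_intensity(values, lower, upper):
    """Linearly map intensities to [0, 1] using min/max cutoffs, with clipping."""
    if upper <= lower:
        raise ValueError("upper cutoff must exceed lower cutoff")
    span = upper - lower
    return [min(max((v - lower) / span, 0.0), 1.0) for v in values]


# Three dim-to-intermediate cells and one very bright cell (16-bit counts).
cells = [100, 200, 400, 60000]

# Cutoffs set to the full 16-bit range: the three dimmer cells are
# "compressed" into a sliver of the scale near 0.
print([round(x, 4) for x in rescale_intensity(cells, 0, 65535)])
# [0.0015, 0.0031, 0.0061, 0.9155]

# Cutoffs chosen for this marker: the dimmer cells are spread apart,
# and the bright cell saturates at 1.0.
print(rescale_intensity(cells, 0, 1000))
# [0.1, 0.2, 0.4, 1.0]
```

In the first case, a cell four times brighter than another (400 vs. 100) differs from it by less than 0.005 on the rescaled axis, which is the kind of compression that makes relative-expression gating and dimensionality reduction unreliable.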
Regarding the generation of a cell mask, our protocol can be adjusted to report intensity separately for different cell compartments (e.g., cytoplasm and nuclei). However, the resulting histocytometry .csv file would require significant post-processing, likely with software such as R, to generate a usable .fcs file.


Critical Parameters
Cell segmentation
• For the best segmentation results, your imaging data should be derived from samples with ideal staining and acquisition parameters. Images should have sufficient resolution both for segmentation to be computationally feasible and for its results to be visually confirmed or refined. This resolution requirement depends on the cell density of the sample, the signal-to-noise ratio of the nuclei marker, and the magnification at which the image was taken. A further consideration for minimum image resolution is the ability to identify cell membrane/cytoplasm borders.
Ideally, the resulting images satisfy the Nyquist sampling criterion during digitization of the acquired signal and fulfil the Rose criterion for signal-to-noise ratio in all marker channels (Pawley, 2006).
• Cell nuclei staining ideally needs to be distinct and discrete. This can be challenging, especially in immune infiltrates of tissue or overconfluent cell cultures. Taking images at a higher magnification and/or using the z-stack function of a confocal microscope may improve this situation.
• The dynamic range of the nuclei stain in such samples may exceed the capability of the imaging platform, resulting in overexposure or underexposure of certain populations. If possible, additional nuclear-specific stains could be used (e.g., staining for histone proteins). Understanding the biology of the cells can also help with devising solutions. For example, replicating cells can have larger, dimmer nuclei with diffuse DNA staining compared with their non-replicating counterparts, but they will stain strongly for replication-associated proteins such as Ki67. As such, a merged channel image of Ki67 and DAPI staining, for example, can give better nuclei segmentation than DAPI alone.
• Creating an accurate cell mask is a critical step, especially if cytoplasmic markers will be used to phenotype cells based on relative expression. However, membrane markers or distinct cytoplasmic stains may not always be available for the sample of interest. In these cases, expanding the nuclei mask can be a way to create a cell mask. The expansion distance must balance overlapping the target cell body against minimizing the chance of overlapping neighboring cell bodies. This is of particular concern with images that contain cells with filamentous cell bodies or dendritic protrusions.
• Support Protocol 1 can be used in two ways: as a standalone protocol or incorporated into the Basic Protocol 1 pipeline. If using the latter, to save computational time, we recommend evaluating the Rand index values in CellProfiler's Test Mode and then disabling (unticking) the Support Protocol 1-specific modules before executing the pipeline in "Analyze Images" mode.
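The trade-off in choosing an expansion distance can be explored with a simplified Python sketch of nuclei-mask expansion. It grows each labeled nucleus outward pixel by pixel (4-connected steps, breadth-first) for up to a chosen number of steps and never lets one mask overwrite another. This is a stand-in under stated assumptions, not CellProfiler's "IdentifySecondaryObjects" implementation: distances here are counted in grid steps rather than Euclidean pixels, and ties between equidistant nuclei are broken by scan order rather than split evenly.

```python
from collections import deque


def expand_labels(label_grid, distance):
    """Grow every non-zero label by up to `distance` 4-connected steps.

    label_grid: list of lists, 0 = background, positive ints = nucleus labels.
    Background pixels are claimed by whichever nucleus reaches them first in a
    breadth-first search, so neighboring cell masks never overlap.
    """
    rows, cols = len(label_grid), len(label_grid[0])
    out = [list(row) for row in label_grid]
    # Seed the search with every nucleus pixel at distance 0.
    queue = deque((r, c, 0) for r in range(rows) for c in range(cols) if out[r][c])
    while queue:
        r, c, d = queue.popleft()
        if d == distance:
            continue  # this front has reached the maximum expansion
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == 0:
                out[nr][nc] = out[r][c]  # claim the background pixel
                queue.append((nr, nc, d + 1))
    return out


# Two nuclei (labels 1 and 2) separated by five background pixels.
row = [[1, 0, 0, 0, 0, 0, 2]]
print(expand_labels(row, 2))  # [[1, 1, 1, 0, 2, 2, 2]]
print(expand_labels(row, 3))  # [[1, 1, 1, 1, 2, 2, 2]]
```

With a distance of 3, the middle pixel goes to nucleus 1 only because it was scanned first, which illustrates how an overly generous expansion can assign pixels, and therefore marker signal, to the wrong cell.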

Cell phenotyping
• As with flow cytometry, cell phenotyping requires a strong understanding of the biology of your sample for accurate classification. This consideration also extends to the selection of the antibody panel and/or markers of interest, which should be as specific and accurate as possible.
• The possibility of bias must be kept in mind to avoid discarding novel and unexpected results. Conversely, unusual findings may result from imperfect sample preparation and/or poor cell segmentation. This can potentially be identified by checking the spatial distribution of the cells in question: are they located in a region where various phenotypes mingle, or in a region with a mainly homogeneous phenotype composition? If the former, the cell masks of the cells in question may be covering more than one cell.
• Given the diversity of imaging methods and marker staining procedures, it is also critical to understand their respective intricacies in order to interpret and troubleshoot issues with your results. For example, if you derive images from samples prepared with cyclic IHC staining, imperfect stripping of an antibody against CD3 prior to a CD20 staining step can result in false double-positive staining.

Spatial analysis
• As with all other types of analyses, meaningful results from spatial analysis depend entirely on all of the previous steps required to generate the histocytometry data (the computer science adage "Garbage In, Garbage Out" applies here). Poorly processed samples, bad panel selection, and inappropriate segmentation and phenotyping settings can all produce false spatial relationships, as well as obscure real ones.
• Especially pertinent to histocytometric data derived from tissue/organ sections is the orientation of the section plane relative to regions of interest, which is highly critical: the compartments where the cells of interest may lie need to be in the same plane as the sectioning plane.
• Sample preparation is important, because techniques that affect morphology will affect spatial distances. This is especially true in multi-tissue sections, where different tissues have different mechanical properties; preparations that are optimal for one type of tissue may cause structural artifacts such as squashing or cracking in another. This in turn will artifactually increase or decrease spatial distances. Table 5 shows causes of, and suggested solutions to, problems that are typically encountered.

Understanding Results
The CellProfiler pipeline file (.cpproj), FlowJo workspace file (.wsp), CytoMAP MATLAB worksheet file (.mat), and associated .csv/.fcs files generated from the supplied lung images by following Basic Protocols 1-3 are available at https://doi.org/10.5281/zenodo.5781462. As previously mentioned, these images are provided for evaluating the protocol and are derived from a single murine lung specimen. Consequently, no statistically meaningful relationship can be derived from this dataset beyond this individual mouse.
With that in mind, the marker cutoffs for both the cell segmentation and phenotyping stages were selected based on a combination of the single-stain controls and unstained controls (provided in the folder "Lung_images_singlestains"). Unlike flow cytometry, where processing yields a cell suspension, imaging retains all manner of non-cellular components such as collagen fibers and lipid accumulations. These can be significant sources of both autofluorescence and non-specific staining, complicating the selection of appropriate positive marker signal intensity cutoffs. In the provided images, this is especially notable in the CD3-AF488 channel. As a result, the performance of marker cutoffs must be visually confirmed against the original images, in addition to the traditional histogram/scatterplot comparison, to ensure that the most appropriate intensity cutoffs have been selected for the dataset.
This can be seen in the zebra plot examining CD64+ cells (Figure 14), where there appears to be a small CD64+ population that was incorrectly excluded. In reality, including this population introduces more false-positive events than real events, as determined by visual confirmation against the original image.
While we have only showcased the CytoMAP function "Distance to other cells", CytoMAP offers many other spatial analysis options. However, it is important to have a specific type of spatial analysis in mind before commencing an experiment; testing various spatial analyses until a statistically significant result is obtained inflates the rate of type I (false-positive) errors. Furthermore, spatial relationships require biological significance to be relevant: because this dataset comes from the lungs of a naïve mouse, we cannot infer whether the higher spatial association of alveolar macrophages with B cells, relative to monocytes/interstitial macrophages, has any biological relevance.
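To show the kind of quantity involved, the following Python sketch computes, for each cell of one phenotype, the Euclidean distance to the nearest cell of another phenotype from centroid coordinates. We assume this is broadly comparable to CytoMAP's "Distance to other cells" output, but the function name, inputs, and units (whatever units the coordinates are in, e.g., pixels or µm) are our own, and CytoMAP's actual calculation may differ.

```python
import math
from statistics import median


def nearest_distances(source_xy, target_xy):
    """For each source centroid, the distance to the closest target centroid.

    source_xy, target_xy: sequences of (x, y) tuples in the same units.
    Brute force, O(len(source) * len(target)); if source and target are the
    same population, each cell's zero distance to itself is not excluded.
    """
    if not target_xy:
        raise ValueError("target population is empty")
    return [min(math.dist(s, t) for t in target_xy) for s in source_xy]


# Hypothetical centroids for two phenotypes.
alveolar_macrophages = [(0.0, 0.0), (10.0, 0.0)]
b_cells = [(3.0, 4.0), (10.0, 5.0)]
d = nearest_distances(alveolar_macrophages, b_cells)
print(d, median(d))  # [5.0, 5.0] 5.0
```

For whole-tissue datasets with many thousands of cells, a spatial index such as SciPy's `cKDTree` (via its `query` method) avoids the quadratic cost of this brute-force loop. Comparing such distance distributions between phenotypes still calls for a pre-specified hypothesis, for the reasons given above.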

Basic Protocol
The analysis can take between 1 and 4 hr, depending on the nature of the tissue, the complexity of the panel, and the number of images to be analyzed.

Support protocols
Segmentation test: Can be completed in 0.5 to 1.5 hr depending on nature of the tissue and the number of images to be analyzed.
Visualization correction protocol: Can be completed in 0.5 hr, either as a separate analysis pipeline or incorporated into the Basic Protocol analysis pipeline.