ilastik: interactive machine learning for (bio)image analysis

We present ilastik, an easy-to-use interactive tool that brings machine-learning-based (bio)image analysis to end users without substantial computational expertise. It contains pre-defined workflows for image segmentation, object classification, counting and tracking. Users adapt the workflows to the problem at hand by interactively providing sparse training annotations for a nonlinear classifier. ilastik can process data in up to five dimensions (3D, time and number of channels). Its computational back end runs operations on-demand wherever possible, allowing for interactive prediction on data larger than RAM. Once the classifiers are trained, ilastik workflows can be applied to new data from the command line without further user interaction. We describe all ilastik workflows in detail, including three case studies and a discussion on the expected performance. ilastik is a user-friendly interactive tool for machine-learning-based image segmentation, object classification, counting and tracking.

Rapid development of imaging technology is bringing more and more life scientists to experimental pipelines where the success of the entire undertaking hinges on the analysis of images. Image segmentation, object tracking and counting are time-consuming, tedious and error-prone processes when performed manually. Moreover, manual annotation is hard to scale for biological images, since expert annotators are typically required for correct image interpretation (although recent citizen science efforts show that engaged and motivated non-experts can make a substantial impact for selected applications 1,2 ). To meet this challenge, hundreds of methods for automatic and semi-automatic image analysis have been proposed in recent years. The complexity of automatic solutions covers a broad range from simple thresholding to probabilistic graphical models. Correct parametrization and application of these methods pose a new challenge to life science researchers without substantial computer science expertise.
The ilastik toolkit, introduced briefly in ref. 3, aims to address both the data and the methods/parameters deluge by formulating several popular image analysis problems in the paradigm of interactively supervised machine learning. It is free and open source, and installers for Linux, MacOS and Windows can be found at www.ilastik.org (Box 1).
In ilastik, generic properties ('features') of pixels or objects are computed and passed on to a powerful nonlinear algorithm ('the classifier'), which operates in the feature space. Based on examples of correct class assignment provided by the user, it builds a decision surface in feature space and projects the class assignment back to pixels and objects. In other words, users can parametrize such workflows just by providing the training data for algorithm supervision. Freed from the necessity to understand intricate algorithm details, users can thus steer the analysis by their domain expertise.
Algorithm parametrization through user supervision ('learning from training data') is the defining feature of supervised machine learning. Within this paradigm further subdivisions can be made, most notably between methods based on deep learning and on other classifiers; see refs. 4-6 for a description of basic principles and modern applications. From the user perspective, the most important difference is that deep learning methods-for image analysis, usually convolutional neural networks-operate directly on the input images, and pixel features are learned implicitly inside the network. This deep learning approach is extremely powerful, as witnessed by its recent success in a wide range of image analysis tasks. However, as it needs training data not only to find the decision surface, but also to build a meaningful high-dimensional feature space, very large amounts of training data have to be provided. Our aim in the design of ilastik has been to reach a compromise between the simplicity and speed of training and prediction accuracy. Consequently, ilastik limits the feature space to a set of pre-defined features and only uses the training data to find the decision surface. ilastik can thus be trained from very sparse interactively provided user annotations, and on commonly used PCs.
ilastik provides a convenient user interface and a highly optimized implementation of pixel features that enable fast feedback in training. Users can introduce annotations, labels or training examples very sparsely by simple clicks or brush strokes, exactly at the positions where the classifier is wrong or uncertain. The classifier is then re-trained on a larger training set, including old and new user labels. The results are immediately presented to the user for additional correction. This targeted refinement lets the classifier improve rapidly, at a fraction of the time needed for dense ground-truth labeling.
ilastik contains workflows for image segmentation, object classification, counting and tracking. All workflows, along with the corresponding annotation modes, are summarized in Table 1 and Fig. 1. In the following section, each workflow is discussed in greater detail, with case studies demonstrating its use for real-life biological experiments.
ilastik can handle data in up to five dimensions (3D, time and channels), limiting the interactive action to the necessary image context. The computational back-end estimates on the fly which region of the raw data needs to be processed at a given moment. For the pixel classification workflow in particular, only the user field of view has to be classified during interactive training, making the workflow applicable to datasets significantly larger than RAM. Once a sufficiently good classifier has been trained, it can be applied to new data without user supervision (in the so-called batch mode).

ilastik workflows
The ilastik workflows encapsulate well-established machine-learning-based image processing tasks. The underlying idea is to allow for a wide range of applicability and ease of use: no prior knowledge of machine learning is needed to apply the workflows.
Pixel classification. Pixel classification-the most popular workflow in ilastik-produces semantic segmentations of images, that is, it attaches a user-defined class label to each pixel of the image. To configure this workflow, the user needs to define the classes, such as 'nucleus', 'cytoplasm' or 'background', and provide examples for each class by painting brushstrokes of different colors directly on the input data (Fig. 1a). For every pixel of the image, ilastik then estimates the probability that the pixel belongs to each of the semantic classes. The resulting probability maps can be used directly for quantitative analysis, or serve as input data for other ilastik workflows.
More formally, the workflow classifies pixels using the output of image filters as features and a Random Forest 7 as the classifier. Filters include descriptors of pixel color and intensity, edge-ness and texture, in 2D and 3D, and at different scales. Estimators of feature importance help users remove irrelevant features if computation time becomes an issue.
By default, a Random Forest classifier with 100 trees is used. We prefer Random Forest over other nonlinear classifiers because it has very few parameters and has been shown to be robust to their choice. This property and, most importantly, its good generalization performance 8 make Random Forest particularly well suited for training by non-experts. A detailed description of the inner workings of a Random Forest is outside the scope of this paper; Geurts et al. 5 provide an excellent starting point for readers interested in technical details. With Random Forest as default, we still provide access to all the classifiers from the scikit-learn Python library 9 and an API (application programming interface) for implementing new ones. In our experience, increasing the number of trees does not bring a performance boost with ilastik features, while decreasing it worsens the classifier generalization ability.
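As a minimal sketch of this recipe, the following example trains scikit-learn's Random Forest (which ilastik also exposes) on sparse "brush stroke" labels over a small bank of smoothing and edge filters. The toy image, label positions and filter choices are illustrative assumptions, not ilastik's exact feature set:

```python
import numpy as np
from scipy import ndimage as ndi
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy image: one bright blob (class 1) on a dark, noisy background (class 0).
image = np.zeros((64, 64))
image[20:30, 20:30] = 1.0
image += 0.1 * rng.standard_normal(image.shape)

def pixel_features(img, scales=(1.0, 2.5, 5.0)):
    # Generic per-pixel features: intensity smoothed at several scales,
    # plus a gradient-magnitude ("edge-ness") filter per scale.
    feats = [ndi.gaussian_filter(img, s) for s in scales]
    feats += [ndi.gaussian_gradient_magnitude(img, s) for s in scales]
    return np.stack([f.ravel() for f in feats], axis=1)

X = pixel_features(image)

# Sparse labels standing in for brush strokes: -1 means "unlabeled".
labels = np.full(image.shape, -1)
labels[22:28, 22:28] = 1   # foreground strokes
labels[0:5, 0:5] = 0       # background strokes
y = labels.ravel()
mask = y >= 0

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[mask], y[mask])

# Per-pixel foreground probability, reshaped back to image space.
proba = clf.predict_proba(X)[:, 1].reshape(image.shape)
```

Only the labeled pixels enter training; the probability map is then predicted for every pixel, mirroring how sparse annotations generalize to the whole image.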
Note that the pixel classification workflow performs semantic rather than instance segmentation. In other words, it separates the image into semantic classes (for example, 'foreground versus background'), but not into individual objects. Connected components analysis has to be applied on top of pixel classification results to obtain individual objects by finding connected areas of the foreground classes. Case study 1 (Fig. 2) provides an illustration of these steps. ilastik workflows for object classification and tracking (released after the case study 10 was published) can compute connected components from pixel prediction maps. More powerful post-processing has to be used when the data contains strongly overlapping objects of the same semantic class. Fiji 11 includes multiple watershed-based plugins for this task, which can be applied to ilastik results through the ilastik Fiji plugin.
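The connected components step can be sketched, for example, with SciPy; the toy probability map and the 0.5 threshold are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage as ndi

# Toy probability map with two separate foreground blobs.
proba = np.zeros((32, 32))
proba[5:10, 5:10] = 0.9
proba[20:25, 20:25] = 0.8

# Threshold the semantic segmentation, then label connected regions:
# each object receives its own integer id, background stays 0.
foreground = proba > 0.5
instances, n_objects = ndi.label(foreground)
```

This turns the semantic map (foreground versus background) into an instance map, which is the input expected by the object classification and tracking workflows.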

Case study 1: Spatial constraints control cell proliferation in tissues.
Streichan et al. 10 investigate the connection between cell proliferation and spatial constraints of the cells. The imaging side of the experiment was performed in vivo, using epithelial model tissue and a confocal spinning disk microscope. Cells at different stages of the cell cycle were detected by nuclei of different color (green for S-G2-M phase, red for G0-G1) produced by a fluorescent ubiquitination-based cell cycle indicator. The pixel classification workflow of ilastik was used to segment red and green nuclei over the course of multiple experiments, as shown in Fig. 2. Outside of ilastik, segmentation of nuclei was expanded into cell segmentation by Voronoi tessellation of the images. Dynamics of the cell area and other cell morphology features were used to test various hypotheses on the nature of cell proliferation control.
Autocontext. This workflow is closely related to pixel classification. It builds on the cascaded classification idea introduced in ref. 12 , and simply performs pixel classification twice. The input to the second stage is formed by attaching the results of the first round as additional channels to the raw data. The features of the second round are thus computed not only on the raw data, but also on the first-round predictions. These features provide spatial semantic context that, at the cost of more computation time and higher RAM consumption, makes the predictions of the second round less noisy, smoother and more consistent.
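The two-stage idea can be sketched as follows: train a first classifier on raw-image features, attach its predictions as an extra channel, and re-train. The toy image, label positions and the reduced filter bank are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage as ndi
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
image = np.zeros((48, 48))
image[10:20, 10:20] = 1.0
image += 0.2 * rng.standard_normal(image.shape)

def features(channels, scales=(1.0, 3.0)):
    # Smoothed intensities per channel and scale, flattened to (n_pixels, n_feats).
    feats = [ndi.gaussian_filter(c, s) for c in channels for s in scales]
    return np.stack([f.ravel() for f in feats], axis=1)

# Sparse labels shared by both stages (-1 means "unlabeled").
labels = np.full(image.shape, -1)
labels[12:18, 12:18] = 1
labels[0:6, 0:6] = 0
y = labels.ravel()
mask = y >= 0

# Stage 1: classify on raw-image features only.
clf1 = RandomForestClassifier(n_estimators=50, random_state=0)
clf1.fit(features([image])[mask], y[mask])
stage1 = clf1.predict_proba(features([image]))[:, 1].reshape(image.shape)

# Stage 2: attach the stage-1 prediction as an additional channel,
# so second-round features also see the spatial semantic context.
X2 = features([image, stage1])
clf2 = RandomForestClassifier(n_estimators=50, random_state=0)
clf2.fit(X2[mask], y[mask])
stage2 = clf2.predict_proba(X2)[:, 1].reshape(image.shape)
```

Because the second stage computes features over the first-round probability map as well, its output tends to be smoother and less noisy, at the cost of roughly doubling computation and memory.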
Object classification. Since pixel-level features are computed from a spherical neighborhood of a pixel, they fail to take into account object-level characteristics, such as shape. Consequently, the pixel classification workflow cannot distinguish locally similar objects. In ilastik, this task is delegated to the object classification workflow. First, objects are extracted by smoothing and thresholding the probability maps produced by pixel classification. Segmentations obtained outside of ilastik can also be introduced at this stage. Features are then computed for each segmented object, including intensity statistics within the object and its neighborhood, as well as convex-hull- and skeleton-based shape descriptors. Advanced users can implement their own feature plugins from a Python template. In addition to their direct use for classification, the per-object descriptors can also be exported for follow-up analysis of morphology. Training of the object-level classifier is achieved by simply clicking on the objects of different classes (Fig. 1d).
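A toy version of this pipeline, using scikit-image's regionprops as a stand-in for ilastik's object feature plugins (the two shape features, the label maps and the class assignments are illustrative assumptions):

```python
import numpy as np
from skimage.measure import regionprops
from sklearn.ensemble import RandomForestClassifier

# Toy instance segmentation: one compact object and one elongated object.
seg = np.zeros((40, 40), dtype=int)
seg[5:13, 5:13] = 1      # roughly round
seg[25:27, 2:34] = 2     # elongated

def object_features(label_img):
    # Simple per-object shape descriptors: area and elongation
    # (ratio of major to minor axis of the fitted ellipse).
    props = regionprops(label_img)
    return np.array(
        [[p.area, p.major_axis_length / max(p.minor_axis_length, 1e-6)]
         for p in props]
    )

X = object_features(seg)
# User clicks assign class 0 ("round") and class 1 ("elongated").
y = np.array([0, 1])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# A new elongated object is classified by its shape, not its intensity.
new_seg = np.zeros((40, 40), dtype=int)
new_seg[10:12, 5:35] = 1
pred = clf.predict(object_features(new_seg))
```

The classifier operates purely in object-feature space, which is why locally similar but differently shaped objects become separable here even though pixel classification cannot tell them apart.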

Nature Methods
While this workflow is RAM-limited in training, batch processing of very large images can be performed block-wise.
Case study 2: TANGO1 builds a machine for collagen export by recruiting and spatially organizing COPII, tethers and membranes. Raote et al. 13 investigate the function of the TANGO-1 protein involved in collagen export. This study examines spatial organization and interactions of TANGO-1 family proteins, aiming to elucidate the mechanism of collagen export out of the ER. Various protein aggregations were imaged by stimulated emission depletion (STED) microscopy. Figure 3a shows some examples of the resulting structures that had to be analyzed. Note that locally these structures are very similar: the main difference between them comes from shape rather than intensities of individual components. This problem is an exemplary use case for the object classification workflow. Pixel classification is first applied to segment all complexes (Fig. 3b, left). The second step is object classification with morphological object features, which separates the segmented protein complexes into rings, incomplete rings and ring aggregates (Fig. 3b, right). Once the classifiers are trained, they can be applied to unseen data in batch mode (Fig. 3c).
Carving workflow. The carving workflow allows for semi-automatic segmentation of objects based on their boundary information; algorithmic details can be found in refs. 14,15. Briefly, we start by finding approximate object boundaries by running an edge detector over all pixels of the image volume. The volume is then segmented into supervoxels by the watershed algorithm, with seeds at the local minima of the edge map. Supervoxels are grouped into a region adjacency graph. The weights of the graph edges connecting adjacent supervoxels are computed from the boundary prediction in between the supervoxels. To segment an object, the user provides brush stroke seeds (Fig. 1c), while ilastik runs watershed with a background bias on the supervoxels.

Fig. 1 | a, Pixel classification. Brush stroke labels are used to predict which class a pixel belongs to for all pixels (magenta, mitochondria; blue, membranes; black, cytoplasm; red, microtubules). b, Multicut. Click labels on edges between superpixels (green, false edges; red, true edges) are used to find a non-contradicting set of true edges and the corresponding segmentation. c, Carving. Object (blue) and background (magenta) brush stroke labels are used to segment one object in 3D. d, Object classification. Click labels are used to predict which class an object belongs to (blue or magenta). e, Counting. Clicks for objects and brush strokes for background (magenta) are used to predict how many objects can be found in user-defined rectangular regions and the whole image. f, Tracking. Clicks for dividing (cyan) and non-dividing (magenta) objects, and clicks for merged (yellow) and single (blue) objects, are used to track dividing objects through time (objects of the same lineage are shown in the same color). Data from the FlyEM team (a-c), the Daniel Gerlich Lab (d), the Broad Bioimage Benchmark Collection 32 (e) and the Mitocheck project 39 (f).
Detailed video tutorials can be found on our YouTube channel (https://www.youtube.com/playlist?list=PL1RliBnTmcHzQTGogF9fw59rbf1c7hFse).
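The seeded watershed at the heart of carving can be sketched with scikit-image; this is a 2D toy stand-in for ilastik's supervoxel-based 3D implementation, and the image, edge detector and seed positions are illustrative assumptions:

```python
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

# Toy image: one bright object with a clear boundary.
image = np.zeros((40, 40))
image[10:30, 10:30] = 1.0

# Edge magnitude acts as the boundary indicator the watershed floods.
edges = sobel(image)

# Seeds play the role of the user's brush strokes:
# label 1 inside the object, label 2 in the background.
markers = np.zeros_like(image, dtype=int)
markers[20, 20] = 1
markers[2, 2] = 2

# Watershed grows both seeds until they meet at the boundary ridge.
seg = watershed(edges, markers)
```

In ilastik the same flooding runs on the precomputed supervoxel graph rather than on raw pixels, which is what keeps the interaction fast on large 3D volumes.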

For images with clear boundaries, the boundary estimate can be computed directly from the raw image. In less obvious cases, the pixel classification workflow can be run first to detect boundaries, as shown in case study 3. 3D data from electron microscopy presents the ideal use case for this workflow 16,17, but other modalities can profit from it as well, as long as the boundaries of the objects of interest are stained 18 . Along with pixel-wise object maps, meshes of objects can be exported for follow-up processing by 3D analysis tools such as NeuroMorph 19 .
Case study 3: Increased spatiotemporal resolution reveals highly dynamic dense tubular matrices in the peripheral ER. Nixon-Abell et al. 20 investigate the morphology of the peripheral endoplasmic reticulum (ER) by five different super-resolution techniques and focused ion beam-scanning electron microscopy (FIB-SEM) imaging. ilastik was used for the challenging task of ER segmentation in the FIB-SEM volume. Heavy-metal staining for electron microscopy gives contrast to all membranes; an additional complication for ER segmentation comes from its propensity to contact other cell organelles. Locally, the ER is not sufficiently different from other ultrastructures to be segmented by pixel classification directly (Fig. 4a). Nixon-Abell et al. chose the semi-automatic approach of carving the ER out of the volume based on boundary information. First, the pixel classification workflow was applied to detect the membranes. The membrane prediction served as boundary indicator for the carving workflow, which was run blockwise to improve interactivity. Some of the carving annotations are shown in Fig. 4b. Carving results over multiple blocks were merged and the remaining errors in the complete block were fixed by proof-reading in the Amira software 21 , as it provides an efficient way to inspect large 3D objects. The final 3D reconstruction for the area in Fig. 4a,b is shown in Fig. 4c.
Boundary-based segmentation with multicut. Similar to the carving workflow, the multicut workflow targets the use case of segmenting objects separated by boundaries. However, unlike carving, this workflow segments all objects simultaneously without user-provided seeds or information on the number of objects to segment. Instead of seeds, users provide labels for edges in the initial oversegmentation of the data into superpixels, as shown in Fig. 1b. The superpixel edges are labeled as 'true' when the superpixels belong to different underlying objects and should be kept separate, and 'false' when they belong to the same object and should be merged. Based on these labels, a Random Forest classifier is trained to predict how likely an edge is to be present in the final segmentation. The segmentation problem can then be formulated as partitioning of the superpixel graph into an unknown number of segments (the multicut problem 22 ). In general, finding a provably optimal solution is infeasible for problems of biologically relevant size. Luckily, fast approximate solvers exist and, in our experience, provide good solutions 23 .
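The edge-classification half of this workflow can be sketched as follows; the four-superpixel graph, the single boundary-strength feature and the user edge labels are illustrative assumptions, and the final graph partitioning (the multicut itself) is left to a dedicated solver:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy region adjacency graph: 4 superpixels, where 0,1 belong to one
# object and 2,3 to another.
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]

# One feature per edge: mean boundary strength between the two superpixels.
boundary_strength = np.array([[0.1], [0.2], [0.9], [0.8]])

# User edge labels: 0 = 'false' edge (merge), 1 = 'true' edge (keep separate).
y = np.array([0, 0, 1, 1])

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(boundary_strength, y)

# Probability that each edge is a true boundary; a multicut solver would
# turn these into a globally consistent partition of the superpixel graph.
p_cut = clf.predict_proba(boundary_strength)[:, 1]
cut_probability = dict(zip(edges, p_cut))
```

The key point is that the classifier only scores edges locally; global consistency (no "kept" edge inside a merged region) comes from solving the multicut problem on these scores.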
This workflow was originally developed for neuron segmentation in electron microscopy image stacks. A detailed description of the algorithms behind it can be found in ref. 24, along with application examples for three popular electron microscopy (EM) segmentation challenges. Potential applications of this workflow are, however, not limited to EM and extend to boundary-based segmentation in any imaging modality.
Counting workflow. The counting workflow addresses the common task of counting overlapping objects. Counting is performed by density rather than by detection, allowing it to accurately count objects that overlap too much to be segmented. The underlying algorithm has been described in ref. 25 . Briefly, user annotations of background (brush strokes) and individual objects (single clicks in object centers) serve as input to a regression Random Forest that estimates the object density in every pixel of the image (Fig. 1e). The resulting density estimate can be integrated over the whole image or rectangular regions of interest to obtain the total number of objects. The counting workflow can only be run in 2D.
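A toy sketch of density counting: a regression Random Forest is trained to map pixel features to a target density built from the user's object clicks, and the count is recovered by integrating the predicted density. The toy image, the Gaussian density target and the feature scales are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage as ndi
from sklearn.ensemble import RandomForestRegressor

# Toy image with two bright, partially diffuse dots (the objects to count).
image = np.zeros((32, 32))
image[8, 8] = image[24, 24] = 1.0
image = ndi.gaussian_filter(image, 1.5)

# Target density: a Gaussian of unit mass placed at each clicked center,
# so the density integrates to the object count.
density = np.zeros_like(image)
density[8, 8] = density[24, 24] = 1.0
density = ndi.gaussian_filter(density, 2.0)

# Pixel features: intensity smoothed at two scales.
X = np.stack([ndi.gaussian_filter(image, s).ravel() for s in (1.0, 3.0)],
             axis=1)

reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X, density.ravel())

# Integrating the predicted density over the image estimates the count;
# summing over a rectangular region would count objects in that region.
count = reg.predict(X).sum()
```

Because the count is read off the integrated density rather than from detected instances, heavily overlapping objects still contribute the right total mass.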
Tracking workflow. This workflow performs automatic tracking-by-assignment, that is, it tracks multiple pre-detected, potentially dividing, objects through time, in 2D and 3D. The algorithm is based on conservation tracking 26 , where a probabilistic graphical model is constructed for all detected objects at all time points simultaneously. The model takes the following factors into account for each object: how likely it is to be dividing, how likely it is to be a false detection or a merge of several objects, and how well it matches the neighbors in subsequent frames. Following the general ilastik approach, the users provide this information by training one classifier to recognize dividing objects (Fig. 1f, cyan and magenta labels) and another one to find false detections and object merges (Fig. 1f, yellow and blue labels) 27 . Weighted classifier predictions are jointly considered in a global probabilistic graphical model. We provide sensible defaults for the weights, but they can also be learned from data if the user annotates a few short tracklets 28 in the 'tracking with learning' workflow. The maximum a posteriori probability state of the model then represents the best overall assignment of objects to tracks, as found by an integer linear program solver 29 . The resulting assignment and division detections can be exported to multiple formats for post-processing and correction in external tools, such as MaMuT 30 . For long videos, tracking can be performed in parallel using image sequences that overlap in time.
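A drastically simplified sketch of the assignment idea, matching detections across two frames by distance with the Hungarian algorithm; the real model additionally scores divisions, false detections and mergers over all time points at once and is solved as an integer linear program. The centroids and the squared-distance cost are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy detections (object centroids) in two consecutive frames.
frame0 = np.array([[10.0, 10.0], [30.0, 30.0]])
frame1 = np.array([[31.0, 29.0], [11.0, 12.0]])

# Cost of linking each frame-0 object to each frame-1 object:
# squared distance between centroids.
cost = ((frame0[:, None, :] - frame1[None, :, :]) ** 2).sum(axis=2)

# Globally optimal one-to-one assignment minimizing total cost.
rows, cols = linear_sum_assignment(cost)
links = list(zip(rows.tolist(), cols.tolist()))
```

Here each object finds its nearest plausible successor even though the detection order differs between frames; the full conservation-tracking model replaces the distance cost with the weighted classifier predictions described above.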

When it works and when it does not
The fundamental assumption of supervised machine learning is that the training data with ground-truth annotations represents the overall variability of the data sufficiently well. Changes in imaging conditions, intensity shifts or previously unseen image artefacts can substantially degrade classifier performance, even in cases where a human expert would have no problem continuing the manual analysis. It is thus strongly recommended both to optimize the image acquisition process to make the images appear as homogeneous as possible and to validate the trained algorithm in different parts of the data (for example, in different time steps or different slices of a stack). The paramount importance of this validation step motivated us to develop the lazy computation back-end of ilastik, which allows users to explore larger-than-RAM datasets interactively. Since the prediction is limited to the user field of view, users can easily train the algorithm in one area of the data and then pan or scroll to another area and verify how well the classifier generalizes. If needed, additional labels can then be provided to improve performance in the new areas. The appropriate amount of training data depends on the difficulty of the classification problem and the heterogeneity of the input data projected into feature space. Since both of these factors are difficult to estimate formally, we usually employ the simple heuristic of adding labels until the classifier predictions stop changing.
Conversely, if the classifier predictions keep changing wildly after a significant number of labels has been added, ilastik features are probably not a good fit for the problem at hand and a more specialized solution needs to be found. Note that, unlike convolutional neural networks, ilastik does not benefit from large amounts of densely labeled training data. A much better strategy is to exploit the interactive nature of ilastik and provide new labels by correcting classifier mistakes. Training applets in all workflows provide a pixel- or object-wise estimate of classifier uncertainty. While pixels next to a label transition area will likely remain uncertain, a well-trained classifier should not exhibit high uncertainty in more homogeneous parts of the image. Along with classifier mistakes, such areas of high uncertainty are a good target for adding more labels. Finally, it is also important to place labels precisely where they need to be by choosing the appropriate brush width.
Formally, the accuracy of a classifier must be measured on parts of the dataset not seen in training. If additional parameters need to be tuned (such as segmentation thresholds and tracking settings), the tuning needs to be performed on parts of the data that were not used for classifier training. The overall validation should then happen on the data not seen in either step. Since ilastik labels are usually very sparse, classifier performance can be assessed by the visual inspection of its output on unlabeled pixels. For quantitative evaluation, previously unseen part(s) of the data need to be annotated manually and then compared to algorithm results.
To set realistic performance expectations, remember that the algorithm bases its decisions on the information it sees through the image features. For the pixel classification workflow and the generic features available in ilastik, the context a classifier can consider is limited to a spherical neighborhood around each pixel. The radii of the spheres range from 1 to 35 pixels; even larger radii can be defined by users, although this can make the computation considerably slower. To check whether the context is sufficient for the task at hand, zoom into the image until the field of view is limited to 70 pixels. If the class of the central pixel is still identifiable, ilastik will likely be able to handle it.
Similarly, the object classification workflow is limited to features computed from the object and its immediate vicinity. Hand-crafted features must be introduced if higher-level context or top-down biological priors are needed for correct classification (see, for example, spatial correspondence features often used in medical image analysis 37 ). The same consideration is true for the speed of computation: a well-implemented and parameterized pipeline specific for the application at hand will be faster than the generic approach of ilastik.
As for any automated analysis method, the underlying research question itself should not be over-sensitive to algorithm mistakes. For non-trivial image analysis problems, human parity has so far been reached for a few selected benchmarks, with careful training and post-processing by teams of computer vision experts. It is to be expected that, for a difficult problem, a classifier trained in ilastik will make more errors than a human. However, as long as the training data is representative, it will likely be more consistent. For example, it might be harder for the classifier to segment ambiguous areas in the data, but the difficulty will not depend on the classifier's caffeination level or last night's sleep quality. Finally, in cases where the algorithm error rate is too high for its output to be used directly, it often turns out that proof-reading automatic results is faster and less prone to attention errors than running the full analysis manually.

Combining ilastik with other (bio)image analysis tools
The core functionality of ilastik is restricted to interactive machine learning. Multiple other parts of image analysis pipelines have to be configured and executed elsewhere-a non-trivial step for many ilastik users who do not possess the programming expertise to connect the individual tools by scripts. To address this problem, we have developed an ilastik ImageJ plugin, which allows users to import and export data in the ilastik HDF5 format and to run pre-trained ilastik workflows directly from Fiji 11 . We have also made this functionality accessible as KNIME nodes 31 and as a CellProfiler 32 plugin 'predict'. ilastik project files are simply HDF5 files and can be manipulated directly from Python code.
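As a minimal sketch of such HDF5 interoperability with h5py: the dataset name "data" and the chunk shape below are illustrative assumptions (a common convention for volumes exchanged with ilastik, not the internal layout of a project file):

```python
import numpy as np
import h5py

# Write an image volume as chunked, compressed HDF5. Chunked storage is
# what lets ilastik's lazy back end load only the blocks it needs.
volume = np.random.rand(16, 64, 64).astype(np.float32)
with h5py.File("volume.h5", "w") as f:
    f.create_dataset("data", data=volume,
                     chunks=(16, 32, 32), compression="gzip")

# Read back a single sub-block without touching the rest of the file.
with h5py.File("volume.h5", "r") as f:
    block = f["data"][0, :32, :32]
```

The same pattern works for probability maps exported from ilastik, which can then be consumed by downstream Python, Fiji, KNIME or CellProfiler steps.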

other machine-learning-based tools
The wide applicability and excellent generalization properties of machine learning algorithms have been exploited in software other than ilastik. The closest to ilastik is perhaps the Fiji Weka segmentation plugin 33 , which allows for interactive, though RAM-limited, segmentation in 2D and 3D. CellCognition and its Explorer extension 34 use machine learning for phenotyping in high-throughput imaging setups. SuRVoS 35 performs interactive segmentation on superpixels targeting challenging low-contrast and low-signal-to-noise images. FastER 36 proposes very fast feature computation for single cell segmentation. Microscopy Image Browser 37 offers multiple pre-processing and region selection options, along with superpixel-based segmentation. Cytomine 38 allows for large-scale web-based collaborative image processing.

Conclusions
Machine learning has been the driving force behind the computer vision revolution of recent years. Besides the raw performance, one of the big advantages of this approach is that the customization of algorithms to a particular dataset happens by providing training data. Unlike the changes to the algorithm implementation or parametrization, training can (and should) be performed by application domain experts. For this, ilastik provides all the necessary ingredients: fast generic features, powerful non-linear classifiers, probabilistic graphical models and solvers, all wrapped into workflows with a convenient user interface for fast interactive training and post-processing of segmentation, tracking and counting algorithms.
In its current version (1.3.3), ilastik does not include an option to train deep convolutional neural networks (CNNs). The main reason for this limitation is that the currently available segmentation networks cannot be trained from scratch using very sparse training annotations, as we do with the 'shallow' classifiers. Reducing the training data requirements is a very active topic in CNN research. We believe that such methods will become available soon and that the modular architecture of ilastik will allow us to incorporate them without delay.
Motivated by the success stories of our users, we remain committed to further development of ilastik (Box 2). We envision closer integration with other popular image processing tools, better support of outside developers and, on the methodological side, a user-friendly mix of deep and shallow machine learning.

Box 2 | Contributing to ilastik
The ilastik team is always happy to receive feedback and contributions from outside. ilastik is written in Python with a few performance-critical components in C++; the GUI is based on PyQt. Over the years, the codebase of ilastik has been expanded by a wide range of developers, from temporary student assistants to professional software engineers. At any level of coding expertise, there are ways for you to make ilastik better for everyone:
• Share your experience with us and with others, by posting on the forum (https://forum.image.sc/tags/ilastik) or writing directly to the team at team@ilastik.org.
• Submit an issue to our issue tracker if ilastik behaves in an unexpected way or if you find important functionality is missing: https://github.com/ilastik/ilastik/issues.
• Contribute to the documentation by submitting a pull request to our documentation repository: https://github.com/ilastik/ilastik.github.io. If you documented your steps with ilastik on your own website, blog or protocol paper, send us a link and we will be happy to point to it from the main page.
• Contribute to the overall ilastik development at https://github.com/ilastik. We provide conda packages for all our dependencies. The software architecture is described in the developer documentation at https://www.ilastik.org/development.html. The main issue tracker of ilastik (https://github.com/ilastik/ilastik/issues) contains a special tag for good first issues to tackle. To get your code included into ilastik, submit a pull request on the corresponding repositories.
Do not hesitate to start the pull request before the code is finalized to receive feedback early and to let us help you with the integration into the rest of the system.
