BAR-Seq clonal tracking of gene-edited cells

Gene editing by engineered nucleases has revolutionized the field of gene therapy by enabling targeted and precise modification of the genome. However, the limited availability of methods for clonal tracking of edited cells has resulted in a paucity of information on the diversity, abundance and behavior of engineered clones. Here we detail the wet laboratory and bioinformatic BAR-Seq pipeline, a strategy for clonal tracking of cells harboring homology-directed targeted integration of a barcoding cassette. We present the BAR-Seq web application, an online, freely available and easy-to-use tool that allows users to perform clonal tracking analyses on raw sequencing data without local computational resources or advanced bioinformatic skills. BAR-Seq can be applied to most editing strategies, and we describe its use to investigate the clonal dynamics of human edited hematopoietic stem/progenitor cells in xenotransplanted hosts. Notably, BAR-Seq may be applied in both basic and translational research contexts to investigate the biology of edited cells and stringently compare editing protocols at a clonal level. Our BAR-Seq pipeline allows library preparation and validation in a few days and clonal analyses of edited cell populations in 1 week. In this protocol, barcodes are introduced into cells via homology-directed targeted integration, and clones are tracked in xenotransplanted hosts by high-throughput sequencing. The results can be analyzed using a freely available online program.


Introduction
Viral vectors are widely exploited to transfer genetic information into cells of interest. Gene therapy takes advantage of viral vectors to introduce therapeutic transgenes into patients' cells and, therefore, holds great promise for the treatment of several diseases 1 . The semi-random genomic integration of some viral vectors, such as gamma-retroviral and lentiviral vectors (LVs), enables unambiguous and permanent marking of genetically modified cells and their progeny, thus identifying cell clones that can be tracked over time and through space by means of vector integration sites 2 . Clonal tracking of genetically modified cells in preclinical and clinical studies has expanded knowledge on the safety and effectiveness of gene therapy and has provided remarkable insights into target cell biology and differentiation 3-7 . In recent years, gene editing has broadened the scope and means of genetic manipulation by allowing precise integration into pre-selected safe genomic loci or in situ functional correction of a mutant gene 8 . Engineered nucleases, such as CRISPR/Cas, are used to induce a DNA double-strand break (DSB) at the locus of interest, which can then be repaired by either the non-homologous end joining (NHEJ) or the homology-directed repair (HDR) cellular machinery (Fig. 1a). The NHEJ pathway rejoins the free DNA ends of the break while often introducing small insertions or deletions (indels), thus leaving a permanent genetic scar at the edited locus. Conversely, the high-fidelity HDR pathway can repair the DSB using a DNA template bearing homology to the target site.
Clonal tracking of edited cells can provide relevant information on the complexity and dynamics of edited cell clones, thus expanding the characterization of the engineered cell product and guiding its development and optimization in view of prospective clinical translation. Genomic scars introduced by NHEJ might provide a surrogate clonal tracker in applications aiming at gene disruption, although their resolution is limited by the recurrent generation of a few dominant indels. Conversely, gene correction or targeted integration strategies based on HDR lack such a possibility, because all editing events have the same sequence. Recently, we developed a barcoding-based strategy (BAR-Seq) that allows clonal tracking of edited cells by means of unique molecular identifiers (barcodes, BARs) embedded in the DNA template for HDR 9 . Molecular barcoding is a well-established and successful method for lineage and cell tracking 7,10,11 . BAR-Seq is a versatile and portable three-step clonal tracking pipeline (Fig. 1b) based on: (1) generation of barcoded HDR template libraries; (2) editing of the locus of interest in the selected target cell population (here, hematopoietic stem/progenitor cells, HSPCs) using the editing nuclease of choice and the barcoded template library; and (3) deep sequencing of BAR-Seq amplicons from the edited cells or their progeny at different times after treatment, followed by bioinformatic analyses to retrieve BAR sequences and their abundances.
Here we describe the BAR-Seq workflow applied to human HSPC gene editing and report a robust experimental and bioinformatic pipeline to assess the clonal composition of edited cells. BAR-Seq enables (i) characterization of the in vivo repopulation capacity of gene-edited human HSPCs transplanted in murine recipients, (ii) validation of improved editing protocols and (iii) identification of experimental conditions preserving a broader clonal repertoire of edited cells in recipient hematopoiesis.
In addition, we provide a flexible, freely available and user-friendly web application (http://www.bioinfotiget.it/barseq), which eases and speeds up clonal tracking analyses of gene-edited cells from raw fastq sequencing data. Of note, this web application and its bioinformatic pipeline are also suitable for clonal tracking analyses based on viral barcoded vectors, thus providing a useful tool for viral or gene-editing clonal tracking by researchers with limited programming experience.
Applications of the method
BAR-Seq can conceivably be applied to any cell type of interest from any eukaryotic species. Templated sequence editing is more efficient in actively cycling cells, because HDR is restricted to the S/G2 phases of the cell cycle 12 . Quiescent and slowly cycling cells, such as HSPCs, are less permissive to HDR 13 . We did not observe any detrimental consequence of introducing BARs into the HDR template, either on HDR efficiency or on the cellular response to the editing procedure 9 .
BAR-Seq can be applied to bulk ex vivo cultured cells and used to track engraftment and lineage output upon transplantation. The high sensitivity of the barcoding-based platform allows assessment of clonal abundance and distribution even within rare sorted cell subpopulations, such as hematopoietic progenitors.
[Fig. 1 caption] In stage 1, the BAR-Seq plasmid library is generated as a suitable template for HDR. The BAR sequence can be embedded downstream of a corrective cassette in the HDR template. In this specific scheme, the BAR is included downstream of a reporter cassette (brown arrow: promoter; green box: transgene cassette). The BAR-Seq plasmid library can be used as a transfer vector for viral library preparation. In stage 2, target cells are edited by delivering the nuclease and the BAR-Seq HDR template library developed in stage 1. In stage 3, BARs are retrieved from edited cells (either cultured, from peripheral blood, or from organs) by deep sequencing of the edited alleles. Clonal tracking analyses are performed by applying the BAR-Seq bioinformatic pipeline. AAV, adeno-associated vector; IDLV, integrase-defective lentiviral vector.

NATURE PROTOCOLS
BAR-Seq is expected to be compatible with any engineered nuclease, including zinc finger nucleases (ZFNs), transcription activator-like effector nucleases and CRISPR/Cas variants. In our studies, we successfully used ZFNs (S.F. and D.C., unpublished observations) and CRISPR/Cas9 9 as nucleases.
Barcodes can be embedded in HDR templates designed to target virtually any genomic locus, on both autosomes 9 and sex chromosomes (S.F. and D.C., unpublished data), although some considerations apply to the former case (see 'Limitations'). BAR-Seq can be adapted to different delivery vehicles for HDR templates; therefore, plasmid, double-stranded (ds) DNA, single-stranded (ss) DNA, integrase-defective LV (IDLV) and adeno-associated vector (AAV) can all be used for BAR-Seq experiments. In our studies, we successfully used AAV serotype 6 (AAV6) 9 , which has been shown to efficiently transduce hematopoietic cells 14 , and VSV-G-pseudotyped IDLV (S.F. and D.C., unpublished data). Barcodes can conceivably be included in any part of the HDR template, in either transcribed or non-transcribed regions. The use of transcribed barcodes (eBAR-Seq) would allow a powerful combination of single-cell transcriptomic studies with lineage tracking information, as previously reported in other studies based on barcoded LVs 15 .
Finally, the BAR-Seq bioinformatic pipeline also identifies the most abundant ('dominant') cell clones in the HDR-edited population, thus focusing the analyses on clones that are shared across samples and contribute most robustly to in vitro outgrowth or host repopulation. In our experimental settings, this approach allowed us to study the clonal behavior of long-term repopulating HSPC clones and removed the background signal derived from short-lived cells with limited output. Notably, the frequency of dominant HSPC clones long term after transplantation in murine recipients, as calculated by BAR-Seq, agreed well with estimates from limiting-dilution repopulation assays 16,17 , thus validating our clonal tracking pipeline. Clonal analysis of the most abundant clones may also find application beyond hematopoiesis, e.g., to study the dynamics and biology of tissue regeneration (e.g., skin, liver and brain) upon in vivo gene editing or transplantation of ex vivo gene-edited stem/progenitor cells, or to track the clonal expansion of barcoded gene-edited cancer cells.

Comparison with other methods
In the gene-editing field, fluorescent reporter genes (e.g., GFP) have been extensively used as a surrogate, easy readout to track and quantify bona fide HDR-edited cells and their progeny 14,18,19 . However, this approach does not discriminate among different edited clones. BAR-Seq enables clonal tracking of HDR-edited cells at single-cell resolution, even in the absence of reporter-expressing cassettes, making it possible to investigate proliferation, differentiation, self-renewal and long-term maintenance of HDR-edited cell clones, as well as to stringently compare different editing protocols and reagents.
Genomic scars (i.e., indels) introduced by NHEJ at the nuclease target site can be exploited as markers of clonality and for lineage tracking 20,21 by measuring indel diversity. However, clonal diversity might be underestimated because of biased insertion/deletion of nucleotides 22 or restoration of the wild-type sequence. Furthermore, 'large' deletions generated by NHEJ or microhomology-mediated repair might result in dropout of cell clones because of the loss of primer binding sites in the target region 23 . Finally, this approach cannot interrogate HDR-edited cells.

Experimental design
Design and production of the barcoded plasmid library
The conventional (non-barcoded) template for HDR is composed of a transgene or therapeutic cassette framed by two sequences bearing homology to the intended target site (Fig. 2a). The maximal length of the HDR template depends on the cargo capacity of the vector used for its delivery, which is ~4.7 kb for AAV and ~9 kb for IDLV. Protocols describing the optimal configuration of a conventional HDR template have been previously published [24][25][26] . The non-barcoded template for HDR can be designed and purchased from gene synthesis services and then subcloned into the appropriate backbone (any plasmid suitable for transfection, e.g., pAAV-MCS or pCCL-LV transfer plasmids), depending on the vehicle chosen for template delivery. Here we highlight the key points for the generation of plasmid libraries carrying the BAR-Seq HDR template.
First, we advise generating the conventional HDR template in a backbone suitable for the final purpose. For AAV, the whole HDR template must be cloned between inverted terminal repeats (ITRs), conventionally derived from AAV2. Alternatively, the HDR template can be cloned into a third-generation self-inactivating transfer construct suitable for IDLV production or into any plasmid DNA suitable for transfection. In any case, two unique restriction sites generating incompatible ends (such as SphI and Bsu36I) spaced by >10 nt should be included at the intended site of BAR cloning, to avoid BAR concatemers and facilitate the generation of the BAR-Seq HDR template library. The BAR must be cloned either inside (transcribed eBAR) or outside (genomic BAR) an expressed cassette, and always between the homology arms to ensure its incorporation into the genome upon HDR (Fig. 2a). BAR length and consensus sequence must be carefully evaluated to reach an adequate library complexity (i.e., number of unique molecular identifiers) and to minimize biases in BAR structure. BAR-Seq requires forward and reverse primers binding upstream and downstream of the BAR cloning site. To exclusively amplify on-target BARs, one of the two primers must bind the genomic region outside the homology arm ('In-Out' PCR approach) (Fig. 2b).
The maximal theoretical complexity of a library of BARs of length $x$ is $4^x$, assuming that every position can contain any nucleotide. More generally, the complexity is $C = \prod_{i=1}^{x} D_i$, where $D_i$ is the number of allowed nucleotides at position $i$, and the effective BAR length is $\log_4(C)$. The choice of BAR length strictly depends on the expected clonality of the cell population of interest: the longer the BAR, the more complex the library and the lower the probability of tagging more than one HDR-edited cell with the same BAR 2,27 . To minimize the chance of two individual cell clones being tagged with the same BAR, we recommend an effective BAR length >12 nt and a library complexity >$10^3$-fold higher than the expected complexity of the HDR-edited cell population. The risk of two cells sharing the same barcode (cell collision) follows a binomial distribution. When the numbers of cells ($n$) and BARs ($b$) are sufficiently high, as in this protocol and other barcoding techniques, this distribution can be approximated by a Poisson distribution with $\lambda = n/b$; with the threshold above, $\lambda$ is its inverse ($10^{-3}$). Therefore, the probability that a cell is uniquely barcoded is $P_{\text{Poisson}}(0; \lambda) = e^{-\lambda}$, where $P_{\text{Poisson}}$ is the probability mass function of the Poisson distribution; this probability is >99% already when $\lambda = 10^{-2}$, corresponding to a 1:100 proportion between the number of cells and BARs (Supplementary Table 1). Previous works indicate that a 1:100 proportion 28 or even a 1:50 proportion 11 is sufficient to minimize cell-collision events. Because the final library complexity is necessarily lower than the theoretical one owing to bottlenecks in library preparation, we propose a more stringent proportion between the number of cells and BARs ($\lambda = 10^{-3}$ instead of $\lambda = 10^{-2}$).
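As a numerical check of the complexity formula and of the Poisson approximation above, the following minimal Python sketch (not part of the BAR-Seq pipeline; all function names are hypothetical) computes $C = \prod_i D_i$ for an IUPAC consensus, the effective BAR length $\log_4(C)$ and $P_{\text{Poisson}}(0; \lambda) = e^{-\lambda}$. Applied to the 22-nt consensus described in the next paragraph (5′-NNYNNNTNTNNNNRTNDNNNHH-3′), it gives $C = 28{,}991{,}029{,}248 \approx 2.899 \times 10^{10}$, i.e., an effective length of ~17.4 nt, compared with $4^{22} \approx 1.759 \times 10^{13}$ for an all-N BAR of the same length.

```python
import math

# Number of allowed nucleotides (D_i) for each IUPAC code.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def library_complexity(consensus: str) -> int:
    """Theoretical complexity C = prod_i D_i of a BAR consensus."""
    return math.prod(IUPAC_SIZE[base] for base in consensus.upper())


def effective_length(complexity: int) -> float:
    """Effective BAR length log4(C), in nucleotides."""
    return math.log(complexity, 4)


def p_unique(n_cells: float, n_bars: float) -> float:
    """Poisson approximation of the probability that a cell is uniquely barcoded.

    lambda = n/b; P_Poisson(0; lambda) = exp(-lambda).
    """
    lam = n_cells / n_bars
    return math.exp(-lam)


if __name__ == "__main__":
    consensus = "NNYNNNTNTNNNNRTNDNNNHH"
    c = library_complexity(consensus)
    print(f"C = {c:.3e}; effective length = {effective_length(c):.2f} nt")
    print(f"all-N complexity = {4 ** len(consensus):.3e}")
    for lam in (1e-2, 1e-3):  # lambda = n/b, expressed here as cells per BAR
        print(f"lambda = {lam:g}: P(unique) = {p_unique(lam, 1.0):.4f}")
```

Because $e^{-\lambda} \approx 1 - \lambda$ for small $\lambda$, moving from $\lambda = 10^{-2}$ to $10^{-3}$ reduces the expected fraction of cells sharing a BAR roughly tenfold.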
In the BAR structure, some positions may be fixed or limited to a few bases to avoid generating BARs that carry the cloning restriction sites, which would be preferentially ligated, and thus overrepresented, in the library because of their shorter length 27 . To account for these requirements, we designed the BAR consensus sequence as 5′-NNYNNNTNTNNNNRTNDNNNHH-3′, which has a maximal theoretical complexity of ~$2.899 \times 10^{10}$ (instead of ~$1.759 \times 10^{13}$ with Ns only). Of note, the real complexity of the library will be lower than the theoretical one because of bottlenecks in cloning and plasmid/viral preparation. With our protocol, we can easily obtain a final library complexity of $3 \times 10^5$ to $8 \times 10^5$ unique BARs with homogeneous representation. Processing of BARs involves collapsing sequences closer than a given edit distance, to recover BARs generated by PCR and sequencing errors and thus prevent them from being recognized as independent clones. However, this process may erroneously collapse two BARs identifying different cells (BAR collision). The library complexity can help to identify an upper bound on the number of cells needed to generate collisions. The distribution of edit distances in a random set of sequences is modeled with a Gumbel distribution, which is more appropriate for modeling differences in sequencing data when the Levenshtein distance is applied 29,30 (whereas a binomial distribution is appropriate when the Hamming distance is adopted). We estimated the Gumbel distribution parameters $\mu$ and $\beta$ by simulating a pool of $10^4$ random sequences with lengths ranging from 5 to 32 nt and derived the probability of collisions from the cumulative distribution function at $d \le 3$. The number of sequences, and hence the number of edited cells, needed to generate a collision is estimated as in the 'birthday attack' problem 31 by the following equation: where $C$ is the library complexity and $p$ is the computed probability.
Values are tabulated in Supplementary Table 2 for d∈{1, 2, 3}.
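The edit-distance model can be illustrated with a small simulation. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it draws random BARs of a given length, computes all pairwise Levenshtein distances, fits a right-skewed (maximum) Gumbel distribution by the method of moments (an assumed fitting method; the text does not state how $\mu$ and $\beta$ were estimated) and evaluates the fitted CDF at $d = 3$. It uses far fewer sequences than the $10^4$ simulated in the text so that the all-pairs computation stays fast, and it does not implement the birthday-attack step. Function names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def fit_gumbel_moments(values):
    """Method-of-moments fit of a right-skewed (maximum) Gumbel; returns (mu, beta)."""
    beta = statistics.stdev(values) * math.sqrt(6) / math.pi
    mu = statistics.mean(values) - EULER_GAMMA * beta
    return mu, beta


def gumbel_cdf(x, mu, beta):
    z = (x - mu) / beta
    if z < -50:  # CDF is effectively 0 here; also avoids OverflowError in exp(-z)
        return 0.0
    return math.exp(-math.exp(-z))


def collision_probability(length, n_seqs=60, d=3, seed=0):
    """Fitted P(edit distance <= d) between two random BARs of a given length."""
    rng = random.Random(seed)
    seqs = ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n_seqs)]
    dists = [levenshtein(seqs[i], seqs[j])
             for i in range(n_seqs) for j in range(i + 1, n_seqs)]
    mu, beta = fit_gumbel_moments(dists)
    return gumbel_cdf(d, mu, beta)


if __name__ == "__main__":
    for length in (8, 12, 16, 22):
        print(length, f"{collision_probability(length):.2e}")
```

As expected, the fitted probability of two random BARs falling within the collapsing distance drops steeply as BAR length increases.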
To prepare the BAR-Seq plasmid library, a synthesized ss oligonucleotide (ssODN) library embedding the BAR sequence is amplified by a few PCR cycles to generate the complementary strand, digested with the appropriate restriction enzymes and subcloned as an insert into the non-barcoded plasmid (Fig. 2c). In the ssODN, we suggest including stuffer sequences flanking the restriction sites to verify successful digestion of the amplified product. Plasmid amplification is performed by ultraefficient chemical transformation of recombinase-negative Escherichia coli, which are plated on LB agar plates. In parallel, an equimolar amount of dephosphorylated digested backbone ligated in the absence of the barcoded insert should be transformed as a control. The number of colonies on the control plate should be $10^4$-fold lower than that on the library plates, to ensure that <1 out of $10^4$ HDR-edited clones is untraceable in the cell population. An estimate of the total number of colonies provides a useful indication of the library complexity. Colonies are collected and mixed by scraping, and bacteria are grown in LB at 30 °C to minimize recombinogenic events. The BAR-Seq plasmid library is purified from the bacterial outgrowth. Of note, several parameters of BAR-Seq library cloning might affect the number of unique BARs and can therefore be modified to obtain final libraries with higher complexity, depending on experimental needs. In particular, the number of PCR cycles on the ssODN library can be decreased to improve library diversity. The amount of ligated plasmid, the number of ligation reactions and the number of transformed E. coli can be scaled up to achieve BAR-Seq libraries of higher complexity. Electrocompetent rather than chemically competent recombinase-negative cells can be used to further increase the final library complexity.
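The colony-count checks described above can be scripted. A minimal sketch, assuming that colony counts from each plate are scaled by the fraction of the transformation that was plated; the dataclass, the function name and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LigationQC:
    est_total_colonies: float   # rough indication of library complexity
    background_fraction: float  # no-insert control colonies / library colonies
    passes: bool                # control at least 10^4-fold lower than library


def ligation_qc(library_colonies: int, library_fraction: float,
                control_colonies: int, control_fraction: float,
                max_background: float = 1e-4) -> LigationQC:
    """Compare library and no-insert control transformations.

    *_fraction is the fraction of each transformation plated on the counted plate.
    """
    if library_colonies <= 0:
        raise ValueError("library plate has no colonies")
    library_total = library_colonies / library_fraction
    control_total = control_colonies / control_fraction
    background = control_total / library_total
    return LigationQC(library_total, background, background <= max_background)


# Illustrative counts: 500 colonies from 1/1,000 of the library transformation,
# 20 colonies from the whole control transformation.
print(ligation_qc(500, 1e-3, 20, 1.0))
```

Plating a larger fraction of the control than of the library makes it possible to resolve background levels as low as the $10^{-4}$ threshold.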
High-throughput sequencing of the BAR-Seq plasmid library is highly recommended to assess its diversity (i.e., high complexity and equal representation of the different BARs) before moving to the next steps of the protocol (see also Box 1 and 'BAR-Seq bioinformatic analysis').
Alternatively, generation of the BAR-Seq plasmid library could also be achieved by Gibson assembly 32 , although inserts >100 bp should be used to maximize its efficiency. Other detailed protocols have been proposed for the generation of barcoded plasmid libraries 2 .

Barcoded viral library production, purification and titration
The choice of the delivery vehicle for the HDR template is pivotal to maximize the efficiency and tolerability of gene editing, and strictly depends on the target cell type and the specific application. Non-viral delivery methods, such as injection, lipofection or nucleofection of plasmids, ssDNA or dsDNA 33,34 , and viral vector transduction, such as with IDLVs 35 or AAVs 26 , can be used to deliver HDR templates into mammalian cells. In our studies, highly efficient gene editing in human HSPCs was achieved with AAV6, which is the most efficient AAV serotype for HSPC transduction 14 . BAR-Seq libraries are also compatible with other serotypes, depending on the cell types of interest. Briefly, AAVs are produced in adherent HEK293 cells by co-transfection of two or three plasmids containing (i) the AAV genes (i.e., rep and cap), (ii) the essential adenoviral genes VA, E2A and E4, and (iii) the AAV genome, with a maximal size of 4.7 kb, framed by the ITRs required for genome replication and encapsidation 36 . Plasmids required for AAV production are commercially available (e.g., the AAV genome by Agilent Technologies (cat. no. 240071) and the pDGM plasmid for rep-cap and helper expression by Addgene/Russell's laboratory (Addgene number 110660 for AAV2/6)). A detailed protocol for AAV production, purification and titration and further indications about AAV6 production and use in ex vivo gene-editing experiments have been previously published 26,37,38 . Alternatively, custom AAVs can be produced by specialized companies. In general, the experimental workflow and the reagents for production and purification of viral libraries are identical to those for conventional viral vectors.
The number of transfected cells during viral library preparation is the most critical parameter to avoid significant loss of library complexity compared to the plasmid library; as a rule of thumb, transfection of $1.1 \times 10^9$ HEK293 cells suffices for the production of an AAV library starting from a plasmid library with $10^5$–$10^6$ unique BARs 38 . Scaling up of AAV production may be necessary for more complex libraries. In any case, diversity assessment of the BAR-Seq viral library by transduction of highly permissive cell lines is highly recommended before moving on to gene-editing experiments (see also Box 2 and 'BAR-Seq bioinformatic analysis').

Gene-editing procedure
The gene-editing protocol varies according to the application and the target cell type. Although BAR-Seq could also be applied to in vivo gene editing, its most straightforward application is in the context of ex vivo gene editing for clonal tracking of HDR-edited cells capable of host engraftment. A protocol for the design of CRISPR/Cas gene-editing strategies has been previously published 39 . Here we briefly report our optimized gene-editing procedure for human HSPCs as described in ref. 9 and suitable for BAR-Seq analyses. HSPCs can be collected from different donor sources, such as cord blood (CB), mobilized peripheral blood or bone marrow, upon informed consent and in compliance with protocols approved by the relevant institutional review boards. Human HSPCs from these sources can also be purchased from different vendors (e.g., Lonza and STEMCELL Technologies). To favor HDR and maintain long-term repopulating potential, HSPCs are stimulated in culture with early-acting cytokines (SCF, FLT3L, TPO and IL6) in the presence of the stem cell-preserving compounds StemRegenin 1 (SR1) 40 and UM171 41,42 . SR1 and UM171 allow more robust hematopoietic output from edited CB HSPCs in hematochimeric mouse models and moderately increase the clonality of short-term engrafting progenitors 9 . After 3 d of stimulation, CRISPR/Cas nucleases are delivered by nucleofection as ribonucleoproteins (RNPs) composed of the purified Cas protein and a single guide RNA synthesized with chemical modifications that stabilize its structure and avoid innate cellular responses affecting cell biology 9,19 . Alternatively, ZFNs, transcription activator-like effector nucleases and CRISPR/Cas can be delivered as HPLC-purified mRNA 19,[42][43][44] . Nucleofection can also be exploited to co-deliver barcoded HDR templates, such as dsDNA 45 or ssDNA 46 , or editing enhancers 9 . When AAV6 is used to deliver a barcoded HDR template, HSPCs are transduced immediately after nucleofection 14 . Alternative protocols for AAV6- or IDLV-based gene editing in HSPCs were previously described 9,26,47 . Human edited HSPCs can then be transplanted by tail-vein injection into immunodeficient mice (NSG or NSGW41) to evaluate their long-term repopulating and self-renewal potential. We suggest transplanting the same number of culture-initiating HSPCs per mouse (i.e., the outgrowth of the same number of starting cells at the beginning of the culture) across experimental conditions, rather than the same number of HSPCs after editing. This procedure allows stringent comparison of the impact of different editing treatments on HSPC repopulation capacity. Of note, transplantation of a high number of culture-initiating HSPCs per mouse may saturate the hematopoietic niche 9 and may mask differences in HSPC reconstitution capacity across experimental conditions. We suggest transplanting $1.0$–$1.5 \times 10^5$ and <$5 \times 10^5$ culture-initiating cells per mouse when editing CB and mobilized peripheral blood HSPCs, respectively.

Box 2
! CAUTION The cell lines used in your research should be regularly checked to ensure that they are authentic and are not infected with mycoplasma.
Procedure ● Timing 4–5 d
! CAUTION This procedure must be performed in a sterile hood.
▲ CRITICAL Cell lines highly permissive to transduction with the viral vector of interest (e.g., K-562) should be used to assess the diversity of the BAR-Seq library.
1 Transduce $1$–$10 \times 10^6$ K-562 cells with an appropriate volume of the BAR-Seq viral library from Step 48. We advise transducing cells at a high multiplicity of infection (MOI of 100 transducing units per cell for IDLV; MOI of $10^4$ for AAV). Incubate transduced cells for 24 h at 37 °C in a humidified atmosphere with 5% CO2 and 20% O2.
▲ CRITICAL STEP The number of transduced cells and the MOI are critical parameters to exhaustively sequence the BAR-Seq viral library and may vary according to the expected library complexity and diversity. To be conservative, the values given above are suitable for libraries with expected complexity <$10^6$.
2 Collect transduced cells in a suitable sterile Falcon tube, add 10 volumes of DPBS and pellet them using a 5810R centrifuge (1,100 rpm for 10 min at room temperature).
■ PAUSE POINT The cell pellet can be stored at −80 °C for >1 year.
3 Extract the genomic DNA using the DNeasy blood and tissue kit according to the manufacturer's instructions. Use a NanoDrop spectrophotometer to quantify DNA concentration.
4 (Optional) Set up digestion reactions of the DNA from the previous step with the methylation-sensitive restriction enzyme DpnI to cleave and eliminate residual plasmid contaminants.

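For the K-562 transduction step, the 'appropriate volume' of viral library follows from the standard relation $V = \text{MOI} \times n_{\text{cells}} / \text{titer}$, provided that MOI and titer are expressed in the same unit (e.g., transducing units for IDLV, vector genomes for AAV). A minimal sketch; the function names and the titers in the example are hypothetical and must be replaced with the values measured for your own library.

```python
def vector_volume_ml(n_cells: float, moi: float, titer_per_ml: float) -> float:
    """Volume of viral library delivering `moi` units per cell: V = MOI * n / titer.

    `moi` and `titer_per_ml` must use the same unit (e.g., TU for IDLV, vg for AAV).
    """
    return moi * n_cells / titer_per_ml


def cells_per_bar(n_cells: float, expected_complexity: float) -> float:
    """Transduced cells available per expected unique BAR (library coverage)."""
    return n_cells / expected_complexity


# Hypothetical titers, for illustration only.
aav_ul = 1000 * vector_volume_ml(n_cells=1e6, moi=1e4, titer_per_ml=1e13)
idlv_ml = vector_volume_ml(n_cells=1e6, moi=100, titer_per_ml=1e9)
print(f"AAV: {aav_ul:.1f} ul; IDLV: {idlv_ml:.2f} ml; "
      f"cells per BAR: {cells_per_bar(1e6, 1e6):.1f}")
```

The `cells_per_bar` helper makes explicit why $1$–$10 \times 10^6$ transduced cells are considered conservative for libraries with expected complexity <$10^6$: each expected BAR is then covered by at least one transduced cell on average.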
Quantification of the editing efficiency and phenotypic characterization of the edited cell population (either ex vivo or in vivo) might be highly relevant to complement and interpret BAR-Seq data. If reporter genes (e.g., NGFR and GFP) are embedded in the barcoded HDR template, the percentage of reporter-expressing cells can be used as a readout of the fraction of cells harboring the integration. For more reliable quantification, it is advisable to measure reporter expression in the treated cells after several days of culture, when multiple rounds of proliferation have diluted any residual episomal HDR template, which may otherwise contribute to reporter expression and confound the assessment of integrated copies. To measure HDR efficiency at the molecular level, we perform droplet digital PCR (ddPCR)-based assays that quantify the copies of edited alleles and those of a reference unedited gene 42,45 . We advise designing the ddPCR amplicon following an 'In-Out' PCR approach to specifically amplify the donor–genome junction (either 5′ or 3′) upon on-target HDR-mediated integration (Fig. 2b). Although these assays must be designed and optimized for each editing strategy, their high sensitivity and precision allow reliable quantification of HDR editing events. Notably, the ddPCR amplicon can be designed to overlap with the BAR-Seq amplicon whenever possible. If clonal tracking studies are also extended to NHEJ-edited cells, accurate quantification of the indel frequency is directly provided by targeted next-generation sequencing of the nuclease target site 48 . Alternatively, a mismatch-sensitive endonuclease assay or Sanger sequencing followed by deconvolution analysis (e.g., tracking of indels by decomposition; TIDE) 49 may be performed to assess the overall nuclease cutting efficiency.
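The ddPCR readout described above reduces to a ratio of copy numbers. A minimal sketch, assuming that both assays are read from the same reaction volume, that the reference gene is present at two copies per genome and that the target locus is biallelic (set `target_alleles_per_genome=1` for, e.g., an X-linked locus in male cells). The function name and the input values are hypothetical.

```python
def hdr_allele_percent(edited_copies: float, reference_copies: float,
                       target_alleles_per_genome: int = 2,
                       reference_copies_per_genome: int = 2) -> float:
    """Percentage of target alleles carrying the on-target HDR junction.

    Copies are concentrations (e.g., copies/ul) measured in the same ddPCR reaction.
    """
    genomes = reference_copies / reference_copies_per_genome
    return 100.0 * edited_copies / (genomes * target_alleles_per_genome)


# Illustrative values: 150 junction copies/ul and 1,000 reference copies/ul.
print(hdr_allele_percent(150, 1000))                               # 15.0 (biallelic locus)
print(hdr_allele_percent(150, 1000, target_alleles_per_genome=1))  # 30.0 (single-copy locus)
```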
Independently of the method used for the assessment of editing efficiency, we recommend including adequate controls in the experimental design (i.e., HDR donor only and untreated cells) and performing HDR-editing analyses >3 d after the editing procedure, if applicable, to minimize any confounding effect due to the presence of an episomal HDR template (e.g., PCR jumping).

BAR-Seq amplicon design and library preparation
BARs are retrieved from the BAR-Seq plasmid/viral library, when assessing its complexity and diversity, or from genomic DNA (gDNA) of edited cells.
In the first case, BAR sequences can be extracted by PCR amplification using primers flanking the BAR. To ensure adequate coverage of the original library, the number of sequencing reads should be ≥10-fold higher than the expected library complexity, as estimated from the total number of bacterial colonies counted. Amplicons can be sequenced on single-end Illumina MiSeq (MiSeq Reagent Kit v3), NextSeq or HiSeq platforms, depending on the required number of reads. Consecutive sequencing rounds of the same library can be performed to increase sequencing depth when an insufficient number of reads has been obtained.
In the second case, the number of cells used for the analysis depends on the expected complexity of the population of interest and the percentage of HDR-edited cells: the higher the expected population complexity, the higher the number of HDR-edited cells that must be harvested and analyzed by BAR-Seq to exhaustively investigate it. From another perspective, the number of HDR-edited cells analyzed determines the abundance of the rarest BAR (i.e., cell clone) that can be identified. As a rule of thumb, when HDR-edited cells represent 10% of the bulk population, clones representing 0.01% of the HDR-edited fraction can be recovered only if ≥100,000 bulk cells are analyzed, because such a clone accounts for 0.01% × 10% = 1 in 100,000 bulk cells. As also indicated in other clonal tracking pipelines 2 , we suggest performing BAR-Seq analysis on a 10-fold higher number of bulk cells to ensure better results. Importantly, sorting of reporter-expressing cells to obtain bona fide HDR-edited cells, if applicable, is not required, because the BAR-Seq amplicon design allows extraction of BARs from a bulk population.
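The rule of thumb above can be written as bulk cells ≥ safety factor / (clone fraction × HDR fraction). A minimal sketch with hypothetical function names; decimal inputs are passed as strings to `Fraction` so that the arithmetic stays exact.

```python
import math
from fractions import Fraction


def min_bulk_cells(clone_fraction, hdr_fraction, safety_factor=1) -> int:
    """Bulk cells needed to expect `safety_factor` cells of a clone.

    clone_fraction: abundance of the clone within the HDR-edited fraction.
    hdr_fraction: proportion of HDR-edited cells in the bulk population.
    Pass decimals as strings (e.g., "0.0001") so that Fraction keeps them exact.
    """
    return math.ceil(safety_factor / (Fraction(clone_fraction) * Fraction(hdr_fraction)))


def rarest_detectable_fraction(bulk_cells, hdr_fraction) -> Fraction:
    """Clone abundance (within HDR-edited cells) expected to yield one sampled cell."""
    return 1 / (Fraction(bulk_cells) * Fraction(hdr_fraction))


print(min_bulk_cells("0.0001", "0.1"))                    # rule of thumb in the text
print(min_bulk_cells("0.0001", "0.1", safety_factor=10))  # with the suggested 10-fold margin
print(float(rarest_detectable_fraction(10_000, "0.01")))  # 10,000 cells at 1% HDR
```

Because sampling is roughly Poisson, a clone expected to contribute a single cell is captured only ~63% of the time ($1 - e^{-1}$), whereas an expected 10 cells gives >99.99% ($1 - e^{-10}$); this is one way to see why the 10-fold margin is advisable. The last line also reproduces the threshold discussed next: 10,000 bulk cells at 1% HDR editing contain ~100 HDR-edited cells, so clones below ~1% abundance may be missed.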
In our work, we performed BAR-Seq analyses on in vitro cultured samples of edited HSPCs, on whole blood samples collected at different times after transplant and on sorted human cell lineages (B cells, myeloid cells, T cells and HSPCs) from hematopoietic organs of reconstituted NSG mice (either primary or secondary recipients) 9 . When analyzing blood samples or sorted cell lineages, it should be noted that the number of HDR-edited cells within the harvested bulk population might sometimes be low because of poor human cell engraftment, limited lineage output or low availability of biological material. The BAR-Seq wet laboratory procedure and bioinformatic analysis successfully extracted BARs from a bulk population comprising as few as 100 HDR-edited cells. However, such a low number of cells may call for caution when interpreting these data because rare clones with <1% abundance may be undetectable. On the basis of our findings on the frequency of repopulating cells 9 , a collection of >10,000 bulk cells from sorted cell lineages should be set as a threshold to obtain robust results even with as low as 1% HDR editing in the human graft.
In any case, equivalent amounts of gDNA should be used among different samples for library preparation to avoid biasing BAR-Seq analysis. BAR sequences can be extracted by PCR amplification. To minimize sequencing errors in the BAR region, we suggest designing asymmetric amplicons with the forward primer (Read 1, R1) binding close to the BAR sequence. Amplicon length can vary on the basis of the position of the BAR in the HDR template. We usually design amplicons spanning a region of 300–400 bp to minimize carryover of primers and primer dimers during amplicon purification. BAR-Seq amplicon preparation is based on two PCR rounds of at most 15–20 cycles each. In the first round, the BAR-containing region is amplified with a pair of 'PCR1' primers designed to bind the target site (Supplementary Table 3). Of note, PCR1 primers and amplification conditions must be optimized for each target site of interest, which may introduce variability in amplification efficiency or sensitivity across different loci. In the second round, R1/R2 primer complementary sequences, i5/i7 Illumina indexes and P5/P7 adapters are added to the amplicon by nested PCR with the 'PCR2' primers listed in Supplementary Table 3. Single-round PCR with only 'PCR2' primers may be considered as an option for BAR extraction, although amplification efficiency may be lower when performing BAR-Seq on few edited cells. We have multiplexed up to 49 independent samples in the same Illumina sequencing run (for MiSeq: Reagent Kit v3). Higher or lower levels of multiplexing are possible depending on the desired sequencing depth for each sample. During amplicon preparation and sequencing, we suggest including one sample from which no BAR should be retrieved by the BAR-Seq bioinformatic analysis. This additional control may be helpful to evaluate the background sequencing noise, the presence of cross-contamination and the extent of index switching.
Sequencing read length may vary according to the position of the BAR within the amplicon. Although in our key reference paper 9 we performed paired-end sequencing (and discarded the R2 file because only the R1 reads contained the amplicon), single-end Illumina MiSeq, NextSeq or HiSeq sequencing is sufficient because the bioinformatic pipeline works with only one fastq file for each sample. The only requirement is that the BAR-containing amplicons are fully contained in the sequencing reads provided to the pipeline. If the amplicon is too long to be fully contained in a single Illumina read, paired-end sequencing can be performed so that both reads cover the BAR sequence. In this case, we suggest merging the read pairs with FLASh 50 , a software tool specifically designed to merge pairs of reads when the original DNA fragments are shorter than twice the read length. The resulting longer reads can then be provided to the BAR-Seq pipeline.
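The Python sketch below illustrates what read merging does. It reverse-complements R2 and searches for the longest overlap with the 3′ end of R1 that stays within a mismatch tolerance. This is a simplified illustration, not FLASh: it ignores base qualities and keeps the R1 base in the overlap. It also assumes the fragment is at least as long as each read, so there is no read-through into adapters. The defaults for `min_overlap` and `max_mismatch_frac` are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge a read pair whose fragment is shorter than len(r1) + len(r2).

    R2 is reverse-complemented so that both reads are on the same strand.
    Overlaps are tried from longest to shortest; the first one within the
    mismatch tolerance is used, keeping R1 bases in the overlap.
    Returns None if no acceptable overlap is found.
    """
    r2rc = revcomp(r2)
    max_ov = min(len(r1), len(r2rc))
    for ov in range(max_ov, min_overlap - 1, -1):
        tail_r1 = r1[len(r1) - ov:]
        head_r2 = r2rc[:ov]
        mismatches = sum(a != b for a, b in zip(tail_r1, head_r2))
        if mismatches <= max_mismatch_frac * ov:
            return r1 + r2rc[ov:]
    return None


if __name__ == "__main__":
    fragment = "A" * 10 + "C" * 10 + "G" * 10  # toy 30-bp fragment
    r1, r2 = fragment[:20], revcomp(fragment[10:])  # two 20-bp reads, 10-bp overlap
    print(merge_pair(r1, r2) == fragment)
```

Preferring the longest acceptable overlap mirrors the intuition that short overlaps are more likely to arise by chance. Real merging tools additionally weigh base qualities when resolving disagreements.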
Importantly, sequencing depth strictly depends on the number of edited cells analyzed. In agreement with other protocols 2 , we advise sequencing ~100 reads for each HDR-edited cell. We recommend avoiding massive over- or under-sequencing of the samples, which may, respectively, increase the background noise or miss some clones. When applied to clonal tracking of gene-edited HSPCs in vivo, between 50,000 and 500,000 reads for each sample are sufficient (considering the average HDR editing efficiency and the advised number of cells to collect). Replicates of sample library preparation and sequencing are typically not necessary but may be relevant when analyzing samples with low input gDNA to minimize sampling issues.
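The ~100 reads per HDR-edited cell rule translates into a per-sample read target and a check against run capacity, as in the Python sketch below. The sample plan is hypothetical. The 25,000,000-read run output is an assumed value; check the specification of the sequencing kit in use.

```python
from typing import Sequence

READS_PER_EDITED_CELL = 100  # ~100 reads per HDR-edited cell, as advised in the text


def reads_needed(n_cells: int, hdr_fraction: float,
                 reads_per_cell: int = READS_PER_EDITED_CELL) -> int:
    """Target read count for one sample from collected cells and HDR fraction."""
    n_edited = round(n_cells * hdr_fraction)  # expected HDR-edited cells
    return n_edited * reads_per_cell


def fits_in_run(per_sample_reads: Sequence[int], run_output_reads: int) -> bool:
    """True if the summed per-sample targets do not exceed the run output."""
    return sum(per_sample_reads) <= run_output_reads


if __name__ == "__main__":
    # Hypothetical plan: 49 samples of 50,000 cells each at 10% HDR editing.
    targets = [reads_needed(50_000, 0.10)] * 49
    run_reads = 25_000_000  # assumed usable reads per run
    print(targets[0], sum(targets), fits_in_run(targets, run_reads))
```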

BAR-Seq bioinformatic analysis
The BAR-Seq bioinformatic pipeline is freely available at https://bitbucket.org/bereste/bar-seq. After download and installation of the required software and packages, the pipeline can be executed locally (Fig. 3).
As in most analyses of next-generation sequencing reads, the first bioinformatic operations to be performed are quality check, quality filtering and, if needed, trimming of input sequences. The BAR-Seq bioinformatic pipeline then extracts BARs from sequencing reads by exploiting the amplicon structure (i.e., the known conserved sequences flanking the BAR upstream and downstream). BAR-Seq uses TagDust2 51 to process the input reads and extract BAR sequences. Its Hidden Markov Model for complex sequence structures, which include gaps and partial blocks, provides good flexibility in detecting the anchor sequences (provided as input) used to identify the BAR, even when sequencing errors affect the known amplicon sequences adjacent to the BAR. To balance this flexibility, we imposed an additional filter based on barcode structure and length (see 'Design and production of the barcoded plasmid library') to discard all extracted BARs not satisfying these constraints. Although our pipeline relies on TagDust2 by default, any bioinformatic tool able to identify unknown substrings of variable length from the known adjacent ones, which may contain mismatches or small insertions/deletions, can be used for this task. A notable example is the R package genBaRcode, which allows processing of sequencing reads with many different error-correction approaches and visualization routines 52 . The result of this step is a preliminary set of BARs extracted from the input sequences, whose abundance is computed by counting the occurrences of each unique BAR.
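The anchor-based extraction and counting described above can be illustrated with the Python sketch below. It is a deliberately simplified stand-in, not TagDust2. Anchors are located with a Hamming-distance scan that tolerates substitutions only, whereas TagDust2's Hidden Markov Model also handles insertions and deletions. The anchor sequences used in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def find_anchor(read: str, anchor: str, start: int = 0, max_mismatches: int = 1) -> int:
    """First position >= start where anchor matches with <= max_mismatches
    substitutions (Hamming distance), or -1 if there is none."""
    for pos in range(start, len(read) - len(anchor) + 1):
        window = read[pos:pos + len(anchor)]
        if sum(a != b for a, b in zip(window, anchor)) <= max_mismatches:
            return pos
    return -1


def extract_bar(read: str, up: str, down: str, max_mismatches: int = 1) -> Optional[str]:
    """Return the (variable-length) substring between the upstream and downstream anchors."""
    i = find_anchor(read, up, 0, max_mismatches)
    if i < 0:
        return None
    bar_start = i + len(up)
    j = find_anchor(read, down, bar_start, max_mismatches)
    if j < 0:
        return None
    return read[bar_start:j]


def count_bars(reads: Iterable[str], up: str, down: str, max_mismatches: int = 1) -> Counter:
    """Count occurrences of each unique extracted BAR (empty extractions are skipped)."""
    counts = Counter()
    for read in reads:
        bar = extract_bar(read, up, down, max_mismatches)
        if bar:
            counts[bar] += 1
    return counts


if __name__ == "__main__":
    up, down = "GATCCA", "TTGACC"  # hypothetical anchor sequences
    reads = ["ACGATCCAACGTAGCTTTGACCGG",   # exact anchors
             "ACGATGCAACGTAGCTTTGACCGG",   # one substitution in the upstream anchor
             "AAAAAAAAAAAA"]               # no anchors
    print(count_bars(reads, up, down))
```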
Because BARs extracted with TagDust2 may have different lengths due to the model's flexibility, a preliminary filter based on the sequence length distribution estimates the most frequent BAR length. It keeps only the BARs of that length and discards those that are too long or too short. Moreover, the BAR-Seq pipeline can optionally apply additional structural filters. When these are enabled, the user can filter out: (i) sequences having, in at least one position, a nucleotide with frequency <1%, assuming that such a low-abundance nucleotide is artifactual, or (ii) BARs not respecting the predetermined nucleotide constraints of the structure, which can be specified in the DNA International Union of Pure and Applied Chemistry (IUPAC) notation 53 .
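The three filters described above can be sketched in Python as follows. This is an illustration, not the pipeline's code. Two choices are assumptions made here: the length mode is weighted by read counts, and per-position nucleotide frequencies are computed over reads rather than over unique BARs. The IUPAC patterns in the example are hypothetical.

```python
import re
from collections import Counter

# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def length_mode_filter(counts: Counter) -> Counter:
    """Keep BARs whose length equals the most frequent length (read-weighted)."""
    reads_by_length = Counter()
    for bar, n in counts.items():
        reads_by_length[len(bar)] += n
    mode_length = reads_by_length.most_common(1)[0][0]
    return Counter({bar: n for bar, n in counts.items() if len(bar) == mode_length})


def rare_nucleotide_filter(counts: Counter, min_freq: float = 0.01) -> Counter:
    """Drop BARs carrying, at any position, a nucleotide seen in < min_freq of reads.
    Expects equal-length BARs (apply length_mode_filter first)."""
    if not counts:
        return Counter()
    total = sum(counts.values())
    length = len(next(iter(counts)))
    per_position = [Counter() for _ in range(length)]
    for bar, n in counts.items():
        for i, nt in enumerate(bar):
            per_position[i][nt] += n

    def passes(bar: str) -> bool:
        return all(per_position[i][nt] / total >= min_freq for i, nt in enumerate(bar))

    return Counter({bar: n for bar, n in counts.items() if passes(bar)})


def iupac_filter(counts: Counter, pattern: str) -> Counter:
    """Keep BARs that fully match an IUPAC pattern (e.g., 'NNWSNN')."""
    regex = re.compile("".join(f"[{IUPAC[c]}]" for c in pattern.upper()))
    return Counter({bar: n for bar, n in counts.items() if regex.fullmatch(bar)})


if __name__ == "__main__":
    raw = Counter({"ACGT": 500, "ACGA": 400, "ACG": 50, "TCGT": 3})
    clean = rare_nucleotide_filter(length_mode_filter(raw))
    print(clean, iupac_filter(clean, "ACGK"))
```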
However, the number and the counts of this preliminary set of BARs could be influenced by errors occurring during the sequencing process. Indeed, sequencing errors could produce low-count spurious BARs bearing sequence similarity with the much more abundant ones. To account for this, we developed a graph-based method that identifies and then merges 'ego-networks' for each  Chao1 abundance-based index) by clicking on the 'Check Diversity/Richness' button in the 'Results' page. Alternatively, the same operation can be done manually by downloading the text file from the 'Results' page and uploading it in the 'Check Diversity/Richness' tab. Of note, when analyzing the original plasmid/viral library, the latter BAR-Seq web application function allows estimating the number of uniquely labeled cells that can be tracked with a certain confidence level by using the given library (as shown in the resulting table). Two more tabs are available on the website: 'Help' for more details concerning the use of the different web tools with related examples to test their functionality and 'Contacts' for reporting any issues related to the use of the web application.
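The graph-based error correction mentioned above is described here only in outline, so the following Python sketch is an illustrative approximation rather than the BAR-Seq implementation. In this sketch, BARs are nodes and BARs within one edit of each other are linked. Each BAR, visited in decreasing order of abundance, absorbs the not-yet-assigned members of its radius-1 ego network. Levenshtein distance is coded directly to keep the example dependency-free, although the pipeline's prerequisites include the editdistance package and a graph package.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def merge_ego_networks(counts: Counter, max_dist: int = 1) -> Counter:
    """Greedily merge low-count BARs into more abundant neighbours within max_dist edits."""
    order = sorted(counts, key=lambda bar: (-counts[bar], bar))  # most abundant first
    assigned = set()
    merged = Counter()
    for hub in order:
        if hub in assigned:
            continue
        assigned.add(hub)
        merged[hub] = counts[hub]
        # Every BAR still unassigned at this point is at most as abundant as the hub.
        for other in order:
            if other not in assigned and levenshtein(hub, other) <= max_dist:
                merged[hub] += counts[other]
                assigned.add(other)
    return merged


if __name__ == "__main__":
    raw = Counter({"ACGTACGT": 1000, "ACGTACGA": 5,
                   "TTTTGGGG": 800, "TTTTGGGC": 2, "CCCCAAAA": 1})
    print(merge_ego_networks(raw))
```

A single-edit threshold will also merge genuine BARs that differ by one base. For the exact merging criteria, the published BAR-Seq method should be consulted.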

Limitations
The efficiency of HDR may limit BAR-Seq application, particularly in slowly cycling or quiescent cells. HDR editing also requires extensive manipulation (DNA DSB and simultaneous delivery of the DNA template), which might cumulatively affect cell survival and proliferation. These limitations, however, pertain to the biology of target cells and their suitability for templated editing and not to the tracking technology per se. Several strategies to enhance HDR editing have been proposed so far 13 . In our study, we found that transient hyperactivation of the E2F pathway and simultaneous dampening of the editing-induced p53-mediated response increase the permissiveness to HDR in long-term repopulating HSPCs and improve the tolerability of the editing procedure 9,45 .
Although the frequency of biallelic HDR targeting is generally low, a fraction of edited cell clones may carry two BARs when an autosomal locus is targeted. Such multiple BAR integrations might have minimal influence on data interpretation when interrogating cell fate and clonal composition 27 . Editing of sex-linked chromosomes in male cells allows more accurate quantification of the clonal composition.
BAR-Seq does not provide information on the dynamics and clonality of unedited or NHEJ-edited cells, which may be present in different proportions in a cell population treated for HDR editing. In our study, we combined the BAR-Seq and CRISPResso2 48 pipelines to comprehensively analyze the clonal behavior of HDR- and NHEJ-edited cells 9 . Finally, detection of rare quiescent or short-lived cells that provide very limited cell output in transplantation experiments might be challenging with the BAR-Seq pipeline, because their low abundance is close to the sequencing background noise.
Low sequencing depth may result in under-sampling of the overall population of cells (and, consequently, of BARs) in the sample, limiting the sensitivity of the BAR-Seq procedure. In this case, detection of low-abundance BARs might be challenging, especially when analyzing highly polyclonal populations.

Reagents
Barcoded HDR plasmid cloning • HPLC-purified ssODN containing the degenerate BAR sequence, the stuffer sequences and the restriction site as outlined in Fig. 2c (Sigma-Aldrich, or another vendor); an example of the BAR ssODN is provided in Supplementary

NATURE PROTOCOLS
Computational part
• The pre-processing of sequences can be performed on a computer with a Linux terminal (also called 'command line interface' or 'shell') and common utilities for read quality control and trimming installed. A common approach is to use FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for read quality control and Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic) for read filtering and trimming. Details about the installation and requirements of these applications on different operating systems are available in their reference manuals. The required computational resources depend on the number of sequencing reads, although a commodity laptop is usually sufficient to perform all the computation.
• The BAR-Seq pipeline can be run locally on a Linux computer with Python3 and the following prerequisite packages installed: numpy, editdistance, networkx, pandas, matplotlib, logomaker and scipy (as reported in the Bitbucket repository https://bitbucket.org/bereste/bar-seq). Moreover, the pipeline requires the TagDust2 software (http://tagdust.sourceforge.net). Details about their installation and requirements are available in their reference manuals, and a commodity laptop is usually sufficient here as well.
• Alternatively, the BAR-Seq pipeline can be run online (www.bioinfotiget.it/barseq) on a common laptop with internet access and no other specific software installed. A detailed description of how to use the web application is available on the website.
• Example datasets are available as Supplementary Software.
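Before a local run, a short Python check such as the one below can confirm that the prerequisite packages are importable and that TagDust2 is on the PATH. Two names in it are assumptions: the executable name `tagdust` reflects a typical installation, and `networkx` is taken to be the graph package the pipeline uses. Adjust both to your setup.

```python
import importlib.util
import shutil

# Python packages listed as prerequisites (networkx assumed as the graph package).
REQUIRED_MODULES = ["numpy", "editdistance", "networkx", "pandas",
                    "matplotlib", "logomaker", "scipy"]
# Assumed executable name for TagDust2; adjust to your installation.
REQUIRED_EXECUTABLES = ["tagdust"]


def missing_requirements(modules=REQUIRED_MODULES, executables=REQUIRED_EXECUTABLES):
    """Return the names of modules that cannot be found and executables not on PATH."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    print("All requirements found." if not missing else "Missing: " + ", ".join(missing))
```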
Cas9 RNP preparation ! CAUTION This reagent must be prepared in a sterile hood. To obtain the required amount of gRNA, mix custom Alt-R CRISPR-Cas9 crRNA and tracrRNA in a 1:1 molar ratio, incubate at 95°C for 5 min and cool at room temperature for 10 min. If a single gRNA (purchased from Synthego or another vendor) is used, there is no need for the annealing step. To prepare 25 pmol of RNP complex, add to a new sterile Eppendorf tube 1 µl of DPBS, 0.41 µl of Cas9 protein and then gRNA to a predefined gRNA:Cas9 molar ratio (typically ≥1.5). Incubate at room temperature for 15 min to allow complexing. Add 1 µl of Alt-R Cas9 electroporation enhancer (0.1 nmol). If a single gRNA is used, there is no need to add the electroporation enhancer. We advise preparing the Cas9 RNP complex fresh.
CRITICAL To maximize gene-editing efficiency, the RNP complex dose and the Cas9:gRNA molar ratio should be optimized by the user in preliminary ad hoc experiments.
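The pipetting volumes follow from 1 µM = 1 pmol/µl, as in the Python sketch below. The 61 µM Cas9 stock is an assumption, chosen because it is consistent with the 0.41 µl per 25 pmol given above. The 50 µM gRNA stock is hypothetical.

```python
from typing import Tuple


def rnp_volumes(rnp_pmol: float, cas9_stock_uM: float, grna_stock_uM: float,
                grna_to_cas9: float = 1.5) -> Tuple[float, float]:
    """Return (Cas9 volume, gRNA volume) in µl, with Cas9 as the limiting component.
    Uses 1 µM = 1 pmol/µl."""
    cas9_ul = rnp_pmol / cas9_stock_uM
    grna_ul = rnp_pmol * grna_to_cas9 / grna_stock_uM
    return cas9_ul, grna_ul


if __name__ == "__main__":
    # Assumed 61 µM Cas9 stock; hypothetical 50 µM gRNA stock.
    cas9_ul, grna_ul = rnp_volumes(25, cas9_stock_uM=61, grna_stock_uM=50)
    print(f"Cas9: {cas9_ul:.2f} µl, gRNA: {grna_ul:.2f} µl")
```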
P3 mixture supplementation ! CAUTION This reagent must be prepared in a sterile hood. Add one vial of supplement 1 solution (Lonza) to one vial of P3 Primary Cell Nucleofector solution (Lonza). Briefly mix by vortexing. The supplemented solution can be stored at 4°C for a maximum of 3 months.
To prepare the antibody cocktail for peripheral blood and spleen phenotyping, pre-mix the indicated antibody volumes and then add the total mixture volume to the sample in a final volume of 50 or 200 μl, respectively (see Procedure). If more than one sample must be stained, we suggest preparing the antibody cocktail by multiplying the indicated antibody volumes by the number of samples + 2. We recommend preparing the antibody cocktail fresh.
To prepare the antibody cocktail for bone marrow phenotyping, pre-mix the indicated antibody volumes and then add the total mixture volume to the sample in a final volume of 200 μl (see Procedure). If more than one sample must be stained, we suggest preparing the antibody cocktail by multiplying the indicated antibody volumes by the number of samples + 2. We advise preparing the antibody cocktail fresh.
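The "number of samples + 2" scaling rule can be written as a one-line helper, sketched in Python below. The antibody names and per-sample volumes are hypothetical placeholders; use the volumes indicated in the Reagent setup tables.

```python
from typing import Dict


def cocktail_volumes(per_sample_ul: Dict[str, float], n_samples: int,
                     extra_samples: int = 2) -> Dict[str, float]:
    """Scale per-sample antibody volumes (µl) by (n_samples + extra_samples)."""
    factor = n_samples + extra_samples
    return {antibody: volume * factor for antibody, volume in per_sample_ul.items()}


if __name__ == "__main__":
    # Hypothetical panel and per-sample volumes.
    panel = {"anti-hCD45": 2.0, "anti-hCD19": 1.5, "anti-hCD33": 1.0}
    mix = cocktail_volumes(panel, n_samples=10)
    print(mix, "total:", sum(mix.values()), "µl")
```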

BD FACSAria Fusion
For cell sorting, BD FACSAria Fusion should be equipped with four lasers: blue (488 nm), yellow/green (561 nm), red (640 nm) and violet (405 nm). We advise using an 85-μm nozzle and setting the sheath fluid pressure at 45 psi. We recommend a high-purity sorting mode (four-way purity sorting). The drop delay should be determined with BD FACS Accudrop beads before sorting.
Briefly mix and spin, and then amplify on a thermocycler machine with the following settings.
5 Incubate at 37°C for 60 min.
6 Purify the digestion products with the MinElute PCR purification kit according to the manufacturer's instructions (elution volume = 11 μl).
7 Equilibrate the High Sensitivity D1000 reagents and ScreenTape at room temperature for 20 min.
8 Vortex High Sensitivity D1000 reagents for 30 s.
9 Separately prepare 1:10 and 1:100 dilutions of the digested and undigested products and of the negative control from Step 1 in molecular-grade water; mix by briefly vortexing.
10 To verify successful digestion and specific amplification, load the dilutions on an Agilent 4200 TapeStation following the manufacturer's instructions. In our experience, partial digestion may occur without a major impact on the next steps.
? TROUBLESHOOTING
11 Use a NanoDrop spectrophotometer to quantify the insert template concentration (expected concentration >50 ng/μl).
12 Set up digestion reactions of the non-barcoded plasmid backbone with restriction enzymes Bsu36I and SphI. Incubate at 37°C for 2 h.
The 0× ligation reaction is performed as a negative control (no insert).
20 Gently mix by pipetting, briefly spin and incubate the reactions at 25°C for ≥60 min.
21 Pre-warm SOC medium, LB medium and 150-mm LB agar plates supplemented with the appropriate antibiotic to 30°C.
22 Chill two vials of XL10-Gold ultracompetent cells on ice for 5-10 min. Split each vial into two separate pre-chilled Eppendorf tubes (four tubes total).
23 Add 2 μl of β-mercaptoethanol to each tube containing ultracompetent cells and gently swirl the tubes.
CRITICAL STEP Avoid vortexing XL10-Gold ultracompetent cells.
24 Incubate on ice for 10 min.
25 Add 5 μl of the ligation products to each tube containing ultracompetent cells and gently swirl.
CRITICAL STEP Avoid vortexing XL10-Gold ultracompetent cells.
26 Incubate on ice for 30 min.
27 Heat-pulse the tubes at 42°C for 30 s and then incubate on ice for 2 min.
CRITICAL STEP The duration of the heat pulse is critical to achieve optimal transformation efficiency. We recommend neither extending nor shortening this time.
28 Add 450 μl of prewarmed SOC medium to each tube and incubate for 60 min at 37°C with shaking at 300 rpm.
29 For each tube containing the bacterial outgrowth from Step 28, prepare 1:1, 1:10 and 1:50 dilutions in prewarmed LB medium (final volume = 300 μl) and plate them on pre-warmed 150-mm LB agar plates.
30 Incubate the plates at 30°C overnight.
31 Estimate the total number of colonies per plate.
This step allows identification of: (i) the plasmid/insert molar ratio that ensures the highest transformation efficiency; (ii) the dilution factor (DF) of the bacterial outgrowth that avoids overcrowding of colonies on the plate; (iii) the maximum estimated theoretical complexity of the barcoded library when transforming the whole product of one ligation reaction ('no. of colonies/plate' × 'DF' × 4), and therefore the number of ligation reactions ('M') required to reach the target library complexity; and (iv) the number of XL10-Gold vials ('V') and 150-mm LB agar plates ('P') needed for library production.
CRITICAL STEP We advise counting the number of colonies in a subarea of the plate and then multiplying this number by the area/subarea ratio. The number of colonies may be uncountable at 1:1 and 1:10 dilutions. Furthermore, the estimated number of colonies in the control condition should be ≥10⁴-fold lower than in the other plates to minimize the chance of obtaining clones carrying the non-barcoded template.
? TROUBLESHOOTING
32 Having identified the optimal plasmid/insert molar ratio for ligation, set up 'M' ligation reactions by scaling up the calculations from Step 18.
CRITICAL STEP Assess the diversity of the BAR-Seq plasmid library by deep-sequencing the BAR region (Box 1). Proceed to the next steps only if the library complexity and diversity are sufficient to univocally tag cell clones in the target population of interest (as discussed in Experimental design).
? TROUBLESHOOTING
48 (Optional) The BAR-Seq plasmid library is ready to use in gene-editing experiments. Alternatively, the BAR-Seq plasmid library may serve as a transfer plasmid for BAR-Seq viral library production 26,37,38 .
CRITICAL STEP Assess the diversity of the BAR-Seq viral library before moving on to gene-editing experiments by following the steps provided in Box 2.

? TROUBLESHOOTING
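The colony-based estimates in points (ii) and (iii) of Step 31 can be computed as in the Python sketch below. The colony count, subarea fraction and target complexity are hypothetical. The factor of 4 is taken from the formula in the text. 'V' and 'P' are not modelled because they depend on the transformation layout.

```python
import math


def colonies_per_plate(subarea_count: int, area_to_subarea: float) -> float:
    """Extrapolate a colony count from a subarea to the whole plate."""
    return subarea_count * area_to_subarea


def complexity_per_ligation(colonies: float, dilution_factor: float,
                            factor: int = 4) -> float:
    """'no. of colonies/plate' x 'DF' x 4, as in the formula in the text."""
    return colonies * dilution_factor * factor


def ligations_needed(target_complexity: float, per_ligation: float) -> int:
    """Number of ligation reactions ('M') needed to reach the target complexity."""
    return math.ceil(target_complexity / per_ligation)


if __name__ == "__main__":
    # Hypothetical: 150 colonies counted in 1/8 of a plate plated at 1:50.
    per_plate = colonies_per_plate(150, 8)
    per_ligation = complexity_per_ligation(per_plate, 50)
    print(per_plate, per_ligation, ligations_needed(1_000_000, per_ligation))
```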
Thawing of CB CD34+ HSPCs • Timing day '0' of the editing procedure: 30 min
! CAUTION Cell culture and the gene-editing procedure must be performed in a sterile hood.
CRITICAL CB CD34+ HSPCs are purchased frozen from Lonza and contain at least 1 × 10⁶ total cells per vial. HSPCs must be stored in liquid nitrogen.
49 Determine the number of culture-initiating CB HSPCs required for the experiment.
CRITICAL STEP Transplantation of a high number of culture-initiating HSPCs in sublethally irradiated NSG mice results in saturation of the hematopoietic niche. Therefore, the number of transplanted culture-initiating HSPCs is critical to uncover potential differences in the clonality or repopulation capacity of edited cells. In our experience, <1.5 × 10⁵ and >3 × 10⁵ culture-initiating CB HSPCs/mouse are the non-saturating and saturating numbers of cells, respectively.
CRITICAL STEP If more than one vial is required, we strongly suggest pooling cells from different donors to reduce inherent variability.
50 Pre-warm the supplemented StemSpan medium (see 'Reagent setup') at 37°C.
51 Thaw CB CD34+ cells by immersing the vial in a 37°C water bath for 5 min.
52 Transfer the cell suspension to a 50-ml sterile Falcon tube. Add 10 volumes of DMEM drop by drop and pellet cells using a 5810R centrifuge (1,100 rpm at room temperature for 10 min).
53 Carefully aspirate and discard the supernatant. Quickly and gently resuspend the cell pellet in the pre-warmed StemSpan medium. Seed cells at 5 × 10⁵ cells/ml.
CRITICAL STEP This cell concentration favors cell cycling during pre-stimulation while promoting cell maintenance.
54 Add dmPGE2 to the culture medium (final concentration = 10 μM) and mix well by pipetting.
CRITICAL STEP dmPGE2 protects CB HSPCs from thawing toxicity and preserves their stemness properties 18 . dmPGE2 should not be further supplemented in the medium after the editing procedure.
55 Incubate cells for 3 d at 37°C in a 5% CO2 and 20% O2 humidified atmosphere.
CRITICAL STEP In our experience, 3-d expansion of HSPCs before the gene-editing procedure maximizes HDR efficiency in the long-term repopulating HSPC compartment (ref. 18 and unpublished data).
Gene editing of cultured HSPCs • Timing day '+3' of the editing procedure: 1-2 h
56 Pre-warm the supplemented StemSpan medium at 37°C (see 'Reagent setup').
57 Count the viable cultured HSPCs with the TC20 automated cell counter by loading <10 μl of a 1:1 (vol/vol) mixture of cell suspension and trypan blue. Collect 1-5 × 10⁵ cells in a 1.5-ml sterile Eppendorf tube, add 10 volumes of DPBS and pellet them using a 5430 centrifuge (2,250 rpm at room temperature for 10 min).
58 Carefully aspirate and discard the supernatant. Resuspend the cell pellet in P3 primary solution mixture (see 'Reagent setup') and add 25-50 pmol of the RNP complex to reach a final volume of 20 µl/sample.
CRITICAL STEP The electroporation mixture may be supplemented with mRNA(s) to overexpress proteins of interest, as previously described for editing enhancers 9,45 . For optimal electroporation efficiency and lower cytotoxicity, the volume of the P3 primary solution mixture should not be <70% of the total electroporation volume. The final electroporation volume may be increased up to 25 µl/sample, if required. Be aware that increasing the electroporation volume above 25 µl/sample may cause electroporation failure.
59 Transfer the cell solution to an individual 4D-Nucleofector strip for each condition/mouse and proceed immediately to electroporation using the manufacturer's pre-recorded human CD34+ EO-100 program.
60 Wait 1 min and then add 180 µl of pre-warmed StemSpan medium. Transfer cells to an appropriately sized well plate to reach a final concentration of 1 × 10⁶ cells/ml.
CRITICAL STEP Cells belonging to the same experimental condition can be pooled.
61 Incubate cells for 15 min at 37°C in a 5% CO2 and 20% O2 humidified atmosphere.
62 Transduce electroporated cells with the BAR-Seq AAV6 library (from Step 48) at a multiplicity of infection (MOI) of 20,000 vector genomes (vg)/cell and gently mix by pipetting.
CRITICAL STEP Keep the virus stock on ice. Do not thaw the same AAV aliquot more than three times, to ensure virus stability and reproducibility of results.
63 Incubate cells for 24 h at 37°C in a 5% CO2 and 20% O2 humidified atmosphere.
CRITICAL STEP In our experience, 24 h is the minimum recovery time to avoid loss of cell engraftment after transplantation in NSG mice.
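Two volume constraints in Steps 58 and 62 lend themselves to a quick calculation, sketched in Python below. The first is the ≥70% P3 rule, which caps the combined volume of RNP and any mRNA. The second is the AAV volume needed to reach the target MOI. The 3.5-µl RNP volume and the 1 × 10¹² vg/ml titer are hypothetical; use the titer of your own AAV6 library lot.

```python
def max_additive_ul(total_ul: float = 20.0, min_p3_fraction: float = 0.70) -> float:
    """Largest combined RNP (+ mRNA) volume that keeps P3 at >= min_p3_fraction."""
    return total_ul * (1.0 - min_p3_fraction)


def p3_fraction_ok(additive_ul: float, total_ul: float = 20.0,
                   min_p3_fraction: float = 0.70) -> bool:
    """True if the P3 mixture still makes up >= min_p3_fraction of the total volume."""
    p3_ul = total_ul - additive_ul
    return p3_ul / total_ul >= min_p3_fraction - 1e-9  # tolerance for float rounding


def aav_volume_ul(n_cells: float, moi_vg_per_cell: float,
                  titer_vg_per_ml: float) -> float:
    """Volume of AAV stock (µl) needed to reach the target MOI."""
    return n_cells * moi_vg_per_cell / titer_vg_per_ml * 1000.0


if __name__ == "__main__":
    print(f"RNP + mRNA must stay <= {max_additive_ul():.1f} µl of 20 µl")
    print(p3_fraction_ok(3.5))  # hypothetical 3.5 µl of RNP complex
    # Hypothetical titer of 1e12 vg/ml.
    print(f"AAV6: {aav_volume_ul(5e5, 20_000, 1e12):.1f} µl")
```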
Transplantation of edited HSPCs in immunodeficient NSG mice • Timing day '+4' of the editing procedure: 2 h
64 Irradiate NSG female mice at 190 rad ≥4 h before cell transplantation.
CRITICAL STEP We advise including at least five mice per experimental group, and the experiment should ideally be repeated twice to allow reliable statistical analysis. Mice should be randomly distributed across the experimental groups.
65 Count the viable edited HSPCs, collect all cells in a new 15-ml sterile Falcon tube, add 10 volumes of DPBS and pellet them using a 5810R centrifuge (1,100 rpm at room temperature for 10 min).
66 Carefully aspirate and discard the supernatant. Resuspend the cell pellet in DPBS to a final volume of 200 µl × (no. of mice to be transplanted/group) + 50 µl excess.
67 Place 200-µl aliquots in individual sterile 1.5-ml Eppendorf tubes (one for each mouse) and keep them on ice until transplantation. Cells should not be kept on ice for >1 h.
68 Immediately proceed to transplant irradiated NSG mice. Perform intravenous tail-vein injection of the previously prepared 200 µl of DPBS containing edited HSPCs using a sterile 0.5-ml insulin syringe with a 29-gauge × 12.7-mm needle.
CRITICAL STEP Do not keep HSPCs on ice for too long, to avoid cytotoxicity. The mouse's tail may be pre-warmed with a red-light lamp to ease injection. In case of a potentially incomplete or failed injection, the mouse should be univocally marked and must be excluded from the experiment if engraftment failure is confirmed by flow cytometry analyses.
! CAUTION Carefully handle syringes to avoid punctures.
69 Collect the remaining 50 µl of DPBS with edited HSPCs and add supplemented StemSpan medium to reach a final concentration of 5 × 10⁵ cells/ml. To avoid excessive dilution of the supplemented StemSpan in DPBS, we suggest pelleting the cells using a 5430 centrifuge (2,250 rpm at room temperature for 10 min) and carefully removing the supernatant before adding the medium. This is particularly recommended if the total volume needed to reach the final cell concentration of 5 × 10⁵ cells/ml is <500 µl.
70 Culture HSPCs at 37°C in a 5% CO2 and 20% O2 humidified atmosphere for an additional 7 d to proceed with in vitro clonal analysis.
CRITICAL STEP In vitro clonal analysis can be performed after 7 d of culture. We recommend caution when shortening this culture time, because dilution of the barcoded template may be incomplete and PCR jumping may therefore occur between on-target integrated barcodes and episomal barcoded AAVs. In vitro clonal analysis may also be performed on sorted cell subpopulations.
71 (Optional) Perform further characterization of edited HSPCs in vitro, such as in vitro clonogenic capacity (Box 3), cell phenotype (Box 4) and HDR/NHEJ editing efficiency (Boxes 5 and 6), if relevant for the interpretation of clonality data.

? TROUBLESHOOTING
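The volumes in Steps 66 and 69 depend only on the cell count and the number of mice, as sketched in Python below. The sketch returns three values: the cells delivered per mouse, the cells left in the 50-µl excess, and the final culture volume at 5 × 10⁵ cells/ml. The last value indicates whether pelleting before adding medium is advised (final volume <500 µl). The cell count and number of mice in the example are hypothetical.

```python
def resuspension_volume_ul(n_mice: int, per_mouse_ul: float = 200.0,
                           excess_ul: float = 50.0) -> float:
    """Step 66: 200 µl x no. of mice + 50 µl excess."""
    return per_mouse_ul * n_mice + excess_ul


def cells_per_mouse(total_cells: float, n_mice: int, per_mouse_ul: float = 200.0,
                    excess_ul: float = 50.0) -> float:
    """Cells delivered in each 200-µl injection."""
    return total_cells * per_mouse_ul / resuspension_volume_ul(n_mice, per_mouse_ul, excess_ul)


def leftover_culture(total_cells: float, n_mice: int, target_per_ml: float = 5e5,
                     per_mouse_ul: float = 200.0, excess_ul: float = 50.0):
    """Step 69: (cells in the excess, final volume in µl at target_per_ml,
    whether pelleting is advised because the final volume is < 500 µl)."""
    volume = resuspension_volume_ul(n_mice, per_mouse_ul, excess_ul)
    leftover = total_cells * excess_ul / volume
    final_ul = leftover * 1000.0 / target_per_ml
    return leftover, final_ul, final_ul < 500.0


if __name__ == "__main__":
    # Hypothetical: 1.05 x 10^6 edited cells for a group of 5 mice.
    print(resuspension_volume_ul(5), cells_per_mouse(1_050_000, 5))
    print(leftover_culture(1_050_000, 5))
```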
Sample preparation for BAR-Seq analysis of cultured HSPCs • Timing day '+10' of the editing procedure: 2 h 72 Collect >200,000 cultured HSPCs for each experimental condition, add 10 volumes of DPBS and pellet them using a 5430 centrifuge (2,250 rpm at room temperature for 10 min). 73 Aspirate and discard the supernatant.
PAUSE POINT The cell pellet may be frozen at −80°C for >1 year.
74 Extract gDNA using the QIAamp DNA micro kit according to the manufacturer's instructions. Use a NanoDrop spectrophotometer to quantify DNA concentration. gDNA samples are now ready for BAR-Seq clonal analysis following Step 104 to the end.
PAUSE POINT gDNA can be kept at 4°C or frozen at −20°C for short- (weeks to a few months) or long-term (>1 year) storage, respectively.
Box 3 | Clonogenic in vitro assay of edited HSPCs
Extra reagents • MethoCult H4434 Classic (STEMCELL Technologies, cat. no. 04434)
Procedure • Timing day '+4' of the editing protocol: 1 h
! CAUTION This procedure must be performed in a sterile hood.
1 Count the viable HSPCs.
2 For each experimental condition, prepare 6 ml of methylcellulose-based medium supplemented with 100 IU/ml penicillin, 100 µg/ml streptomycin and 2% (vol/vol) glutamine in a 15-ml Falcon tube.
3 For each experimental condition, collect 2,400 cells and add them to the supplemented medium from Step 2. Thoroughly vortex and incubate for 10 min at room temperature.
4 For each experimental condition, seed three wells (triplicate) of a 6-well plate with 1.5 ml of the medium from Step 3 (~600 cells per well).
CRITICAL STEP The reverse pipetting technique is recommended because of the viscosity of methylcellulose media and to avoid formation of bubbles. We also advise dispensing the 1.5 ml as two 750-µl aliquots using the L-1000XLS+ pipette.
5 Incubate for 14 d at 37°C in a 5% CO2 and 20% O2 humidified atmosphere.
6 Count the colonies in each well.
PAUSE POINT The stained sample may be kept at 4°C for ≤1-2 h.

Phenotypic analysis of peripheral blood samples from transplanted NSG mice
5 Add 1 µl of 7-AAD to each sample for live/dead staining and briefly mix by vortexing.
6 Incubate for 10 min and then perform flow cytometry analysis with BD FACS Canto II. The full gating strategy is provided in Supplementary Fig. 1.

87 Extract gDNA using the QIAamp DNA micro kit according to the manufacturer's instructions. Use a NanoDrop spectrophotometer to quantify DNA concentration. gDNA samples are now ready for BAR-Seq clonal analysis following Step 104 to the end.
! CAUTION This procedure must be performed in a DNA/RNA-free hood.
5 Proceed to droplet formation according to the manufacturer's instructions (Bio-Rad).
6 Seal the plate with a pierceable foil and load the PCR reactions on a thermocycler machine using the following PCR program.
Bio-Rad).
CRITICAL STEP The percentage of alleles carrying HDR-mediated integration can be calculated as follows: (no. of targeted-locus-positive droplets / no. of TTC5-positive droplets) × 100. If a sex chromosome is targeted in male cells, the result of the formula must be multiplied by 2 to match the TTC5 reference gene used for normalization, which is located on an autosome. In this case, the percentage of alleles carrying HDR-mediated integration corresponds to the percentage of HDR-edited cells.
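The ddPCR normalization above can be written as a small Python function. The droplet counts in the example are hypothetical.

```python
def hdr_allele_percent(target_positive: int, ttc5_positive: int,
                       sex_linked_in_male: bool = False) -> float:
    """(targeted-locus-positive / TTC5-positive droplets) x 100.

    For a single-copy sex-linked target in male cells, the result is doubled
    to match the two-copy autosomal TTC5 reference; it then equals the
    percentage of HDR-edited cells.
    """
    if ttc5_positive <= 0:
        raise ValueError("TTC5-positive droplet count must be > 0")
    percent = target_positive / ttc5_positive * 100.0
    return percent * 2 if sex_linked_in_male else percent


if __name__ == "__main__":
    # Hypothetical droplet counts.
    print(hdr_allele_percent(300, 1200))                           # autosomal target
    print(hdr_allele_percent(300, 1200, sex_linked_in_male=True))  # X-linked, male cells
```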
Box 6 | Quantification of the NHEJ editing efficiency in in vitro edited HSPCs
Extra reagents • T7 endonuclease I (New England Biolabs, cat. no. M0302L) • Custom primers specific for the locus of interest (Sigma-Aldrich, Metabion or another vendor; primers used in ref. 9 for the AAVS1 locus are listed below)
NHEJ AAVS1 FW: GCCCTGGCCATTGTCACTTT
NHEJ AAVS1 RV: GGACTAGAAAGGTGAAGAGCC
Procedure • Timing day '+7' of the editing protocol: 5 h
CRITICAL Primers must be specific for the locus of interest and amplify a sequence between 500 and 1,000 bp in length.
1 Set up a PCR mixture as follows:

Component: amount (each reaction)
gDNA (from step 3 of Box 5): 50-100 ng
5× GoTaq reaction buffer: 5 μl
MgCl2 (25 mM): 4 μl
dNTPs (10 mM
3 Immediately run an annealing program as follows. The annealing step is crucial to generate sequence mismatches.
PAUSE POINT gDNA can be kept at 4°C or frozen at −20°C for short- (weeks to a few months) or long-term (>1 year) storage, respectively.
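Box 6 does not spell out how the T7 endonuclease I readout is converted into an NHEJ frequency. A widely used formula estimates the indel percentage from band intensities as 100 × (1 − √(1 − f_cut)), where f_cut is the fraction of cleaved product. The square root reflects the assumption that edited and unedited strands reanneal at random, so that heteroduplexes (the cleavable species) form at the frequency 1 − (1 − indel fraction)². The Python sketch below applies this formula. Treat it as an assumption rather than the authors' stated quantification method; the band intensities in the example are hypothetical.

```python
import math
from typing import Iterable


def t7e1_indel_percent(uncut_intensity: float, cut_intensities: Iterable[float]) -> float:
    """Estimate % indels from T7E1 band intensities (commonly used formula)."""
    cut = sum(cut_intensities)
    total = uncut_intensity + cut
    if total <= 0:
        raise ValueError("total band intensity must be > 0")
    fraction_cut = cut / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cut))


if __name__ == "__main__":
    # Hypothetical intensities: uncut band 50, two cleavage products 30 and 20.
    print(f"{t7e1_indel_percent(50, [30, 20]):.1f}% indels")
```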
Sample preparation for cell sorting from hematopoietic organs of transplanted NSG mice • Timing 3 h per 10 animals
88 Eighteen to twenty weeks after transplant, NSG mice may be euthanized according to the approved institutional protocol.
89 Collect cells from the spleen by crushing and from the bone marrow of the posterior legs by flushing.

Spleen
CRITICAL To minimize formation of clumps, we strongly recommend working on ice and using cold reagents when processing the spleen.
90 Crush the spleen and filter cells through a 40-µm cell strainer with 10-15 ml of cold MACS buffer.
91 Centrifuge the homogenate (1,300 rpm at 4°C for 10 min) and then discard the supernatant.
92 Add 1 ml of cold ACK lysis buffer to the cell pellet, thoroughly vortex and incubate for 5 min at room temperature.
93 Wash with cold MACS buffer by filling the tube to the top, centrifuge (1,300 rpm at 4°C for 10 min) and then discard the supernatant.
94 Resuspend the cell pellet in 5 ml of cold MACS buffer and then filter the cell suspension through a 40-µm cell strainer. Centrifuge (1,300 rpm at 4°C for 10 min) and then discard the supernatant. Proceed to Step 97.
Bone marrow
95 Flush bone marrow from the posterior legs using 10-15 ml of cold MACS buffer and a 10-ml syringe with a 1-ml needle.
96 Filter the cells through a 40-µm cell strainer, centrifuge (1,300 rpm at 4°C for 10 min) and then discard the supernatant. Proceed to Step 97.
Staining, cell sorting from hematopoietic organs of transplanted NSG mice and sample preparation for BAR-Seq clonal analysis • Timing 5 h per 10 animals
97 Resuspend the cell pellet in 200 µl of cold MACS buffer, add anti-mouse (1 µl/sample) and anti-human (2 µl/sample) Fc block and incubate for 10 min at room temperature.
98 Add the dedicated anti-human antibody cocktail for cell lineage characterization (see 'Reagent setup') and incubate for 15 min at 4°C.
99 Wash the cells by filling the tube with DPBS + 2% (vol/vol) FBS and pellet cells using a 5810R centrifuge (1,100 rpm at room temperature for 10 min). Discard the supernatant.
PAUSE POINT The stained sample may be kept at 4°C for ≤1-2 h.
100 Resuspend the cells in the desired volume for cell sorting. Add 2 µl of 7-AAD to each sample for live/dead staining and briefly mix by vortexing.
CRITICAL STEP To facilitate high-speed sorting and to prevent clogging of the nozzle, filter the samples through a 35-μm filter immediately before sorting and dilute them such that, at a flow rate of 2.0, an event rate of 5,000/10,000 events/s is not exceeded.
101 Sort cell subpopulations of interest with BD FACSAria Fusion and collect samples in 1.5-ml Eppendorf tubes containing 500 µl of DPBS. The full gating strategy is provided in Supplementary Fig. 1.
CRITICAL STEP We recommend using unstained and single-stained controls to set up compensation. Rainbow beads (SPHERO rainbow calibration particles) should be included to standardize the experiments and must be run before each acquisition.
Box 6 | (continued)
Seal the plate with a pierceable foil, briefly mix and spin. Load the PCR reactions on a thermocycler machine and amplify using the following PCR program.
In a vertical laminar-flow hood, carefully add 0.5 μl of primer 'PCR2_Fw#' and 0.5 μl of primer 'PCR2_Rv#' to each well. On a dedicated bench, carefully add 5 μl of PCR product from Step 104 by piercing the foil with the tips loaded on a multichannel pipette. Seal the plate with a new pierceable foil, briefly mix and spin. Load the PCR reactions on a thermocycler machine and amplify using the following PCR program.

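Because every PCR2 well receives 0.5 µl of each indexed primer and 5 µl of PCR1 product, it can help to generate the plate layout and the total volumes programmatically. The Python sketch below assumes that the '#' in 'PCR2_Fw#'/'PCR2_Rv#' denotes a primer index number, that 8 forward and 12 reverse primers are available, and that each sample receives a unique Fw/Rv combination on a 96-well plate. The indexing scheme and the function names (`plate_map`, `volumes_needed`) are hypothetical and are not part of the BAR-Seq protocol.

```python
from collections import Counter
from itertools import product
from string import ascii_uppercase

# Per-well volumes (µl) taken from the PCR2 step
FW_PRIMER_UL = 0.5
RV_PRIMER_UL = 0.5
PCR1_PRODUCT_UL = 5.0

# HYPOTHETICAL layout: 8 forward primers (rows A-H) x 12 reverse primers
# (columns 1-12), giving 96 unique Fw/Rv combinations.
N_FW, N_RV = 8, 12
ROWS = ascii_uppercase[:N_FW]


def plate_map(samples):
    """Assign each sample a well and a unique (PCR2_Fw#, PCR2_Rv#) pair."""
    if len(samples) > N_FW * N_RV:
        raise ValueError("more samples than unique primer combinations")
    layout = []
    # product() walks the plate row by row: A1, A2, ..., A12, B1, ...
    for sample, (r, c) in zip(samples, product(range(N_FW), range(N_RV))):
        layout.append({
            "sample": sample,
            "well": f"{ROWS[r]}{c + 1}",
            "fw": f"PCR2_Fw{r + 1}",
            "rv": f"PCR2_Rv{c + 1}",
        })
    return layout


def volumes_needed(layout):
    """Total µl of each primer and of PCR1 product required for the plate."""
    fw, rv = Counter(), Counter()
    for well in layout:
        fw[well["fw"]] += FW_PRIMER_UL
        rv[well["rv"]] += RV_PRIMER_UL
    return fw, rv, PCR1_PRODUCT_UL * len(layout)


layout = plate_map([f"S{i}" for i in range(1, 50)])  # e.g., 49 samples
fw_totals, rv_totals, pcr1_total = volumes_needed(layout)
print(layout[12])  # {'sample': 'S13', 'well': 'B1', 'fw': 'PCR2_Fw2', 'rv': 'PCR2_Rv1'}
print(fw_totals["PCR2_Fw1"], rv_totals["PCR2_Rv1"], pcr1_total)  # 6.0 2.5 245.0
```

Tying the forward index to the plate row and the reverse index to the column keeps every Fw/Rv pair unique without a lookup table; any other scheme that avoids duplicate pairs would work equally well.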

BAR-Seq bioinformatic analyses • Timing 8 h per 49 samples
113 Process reads locally (option A) or online (option B). Option A is preferable for users with advanced bioinformatic skills and a Linux computer with Python 3 and the prerequisite packages specified in the 'Equipment' section. Option B is preferable for users without specific bioinformatic skills and/or using a regular laptop with internet access.
(A) Local execution
(i) (Optional) Pre-process the input reads (usually in FASTQ format) to check their quality and, if required, filter or trim them to remove low-quality portions or sequencing adapters (e.g., using FastQC and Trimmomatic).
(ii) Extract the barcode sequences from the input reads, on the basis of the amplicon structure, using TagDust2 (Fig. 3).
CRITICAL STEP The percentage of reads from which a valid BAR is extracted should be >80%.
TROUBLESHOOTING
(iii) Filter the extracted reads on the basis of their length and structure
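TagDust2 extracts BARs with a hidden Markov model that tolerates sequencing errors in the read architecture. As a simplified illustration of substeps (ii) and (iii) and of the >80% check, the Python sketch below uses an exact-match regular expression instead. The flanking sequences and the 16-nt BAR length are placeholders, not the real BAR-Seq amplicon structure (shown in Fig. 3), and the sketch is not a substitute for TagDust2.

```python
import re
from collections import Counter

# PLACEHOLDER amplicon structure: replace with the flanking sequences and
# BAR length of your own construct.
LEFT_FLANK = "GACTCAGTAC"
RIGHT_FLANK = "TGATCGGAAG"
BAR_LENGTH = 16
MIN_VALID_PCT = 80.0  # threshold from the CRITICAL STEP

# Capture the shortest stretch between the two flanks (exact matches only)
BAR_PATTERN = re.compile(LEFT_FLANK + r"([ACGTN]*?)" + RIGHT_FLANK)


def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def extract_bar(seq):
    """Substep (ii): return the stretch between the flanks, or None."""
    match = BAR_PATTERN.search(seq)
    return match.group(1) if match else None


def is_valid_bar(bar):
    """Substep (iii): keep BARs of the expected length without ambiguous bases."""
    return bar is not None and len(bar) == BAR_LENGTH and "N" not in bar


def count_bars(sequences):
    """Count valid BARs and return the percentage of reads yielding one."""
    counts, total = Counter(), 0
    for seq in sequences:
        total += 1
        bar = extract_bar(seq)
        if is_valid_bar(bar):
            counts[bar] += 1
    pct_valid = 100.0 * sum(counts.values()) / total if total else 0.0
    return counts, pct_valid


bar = "ACGTACGTACGTACGT"
fastq = [
    "@r1", LEFT_FLANK + bar + RIGHT_FLANK, "+", "I" * 36,
    "@r2", "TT" + LEFT_FLANK + bar + RIGHT_FLANK + "AA", "+", "I" * 40,
    "@r3", LEFT_FLANK + "ACGT" + RIGHT_FLANK, "+", "I" * 24,  # BAR too short
    "@r4", "TTTTTTTTTT", "+", "IIIIIIIIII",                     # no flanks
]
counts, pct_valid = count_bars(fastq_sequences(fastq))
print(counts, pct_valid)  # Counter({'ACGTACGTACGTACGT': 2}) 50.0
if pct_valid < MIN_VALID_PCT:
    print("Fewer than 80% of reads yield a valid BAR: see TROUBLESHOOTING")
```

Exact matching discards reads with any error in the flanks, so it will usually report a lower valid-BAR percentage than an error-tolerant tool would on the same data.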

Troubleshooting
Troubleshooting advice can be found in Table 1.

Anticipated results
The BAR-Seq pipeline is designed to perform clonal tracking analyses of edited cells both in vitro and in vivo, without limitations due to the locus, donor template or nuclease platform. We readily obtain BAR-Seq plasmid libraries with 3-8 × 10⁵ unique BARs, and we observe only minimal skewing in BAR abundance from plasmid to viral library 9 . As discussed in 'Experimental design', the required library complexity and diversity depend on the expected complexity of the cell population of interest and on the desired number of traceable cells. The BAR-Seq web application provides a function that estimates whether the sequenced plasmid/viral library is suitable for clonal tracking of a desired number of cells. Software outputs for good (highly diverse) and bad (poorly diverse) libraries are shown in Fig. 5a.

Fig. 5 | a, … with the considered library. The scatterplots show the frequencies (y-axis) of BAR counts (x-axis) in the library (i.e., the number of BARs found at count 1, 2, …, n). Data are plotted on a log-log scale for improved readability. In a good library preparation, the curve approaches a line (power law), with a steep slope (small variability among BAR counts) and with little or no dispersion in the right tail (absence of predominant BARs). b, Agilent 4200 traces of two amplicon library preparations.
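The count-frequency plot of Fig. 5a and a rough library suitability check can be sketched in Python. This is not the algorithm used by the BAR-Seq web application, which is not detailed here. The hypothetical `loglog_slope` fits a least-squares line to log10(number of BARs) versus log10(count), and the hypothetical `expected_unique_fraction` assumes that each of n cells independently receives a BAR with probability p_i proportional to its library count, returning the expected fraction of cells whose BAR is not shared with any other cell, Σ p_i (1 − p_i)^(n−1).

```python
import math
from collections import Counter


def count_frequency(bar_counts):
    """Map each read count k to the number of BARs observed exactly k times."""
    return Counter(bar_counts.values())


def loglog_slope(freq):
    """Least-squares slope of log10(number of BARs) versus log10(count)."""
    xs = [math.log10(k) for k in freq]
    ys = [math.log10(freq[k]) for k in freq]
    if len(xs) < 2:
        raise ValueError("need at least two distinct counts")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx  # negative; a steeper (more negative) slope = fewer high-count BARs


def expected_unique_fraction(bar_counts, n_cells):
    """Expected fraction of n_cells whose BAR is not drawn by any other cell.

    ASSUMPTION: cells draw BARs independently, with probability proportional
    to the BAR's count in the sequenced library.
    """
    total = sum(bar_counts.values())
    fraction = 0.0
    for count in bar_counts.values():
        p = count / total
        # P(a cell gets this BAR) * P(none of the other n-1 cells gets it)
        fraction += p * (1.0 - p) ** (n_cells - 1)
    return fraction


library = {"BAR1": 1, "BAR2": 1, "BAR3": 2, "BAR4": 4}
freq = count_frequency(library)  # {1: 2, 2: 1, 4: 1}
print(round(loglog_slope(freq), 3))  # -0.5

uniform = {f"BAR{i}": 1 for i in range(100_000)}
print(round(expected_unique_fraction(uniform, 1_000), 4))  # 0.9901
```

Under these assumptions, a perfectly uniform library of 10⁵ BARs would leave about 99% of 1,000 marked cells with a private BAR; skewed libraries, with a few predominant BARs, lower this fraction.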
The experimental procedure described in this protocol allows efficient BAR extraction from gDNA of edited cells. An example of the amplicon library profile is shown in Fig. 5b.
When applied to clonal tracking of edited HSPCs, BAR-Seq uncovered the multilineage and self-renewing capacity of engrafting HDR-edited HSPCs in human hematochimeric mice, with most clones shared among cell lineages long term after transplantation. Moreover, the human HDR-edited cell graft was composed of a few highly abundant clones. A more complex clonal repertoire can be achieved by dampening the editing-induced p53-dependent DNA-damage response and enhancing editing efficiency 9 .
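Clone sharing among lineages and the dominance of a few abundant clones can be quantified from per-lineage BAR counts. The Python sketch below uses generic metrics (Shannon diversity, the fraction of clones of one lineage also detected in another, and the share of reads held by the top k clones) purely as an illustration; they are not necessarily the statistics used in ref. 9, the function names are hypothetical, and the example counts are invented.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over clones with nonzero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def shared_fraction(lineage_a, lineage_b):
    """Fraction of clones detected in lineage_a that are also detected in lineage_b."""
    clones_a = {bar for bar, c in lineage_a.items() if c > 0}
    clones_b = {bar for bar, c in lineage_b.items() if c > 0}
    return len(clones_a & clones_b) / len(clones_a) if clones_a else 0.0


def top_clone_share(counts, k=5):
    """Fraction of all reads contributed by the k most abundant clones."""
    total = sum(counts.values())
    return sum(sorted(counts.values(), reverse=True)[:k]) / total


# Invented BAR counts for two sorted lineages from one mouse
myeloid = {"BAR_A": 10, "BAR_B": 5, "BAR_C": 1}
lymphoid = {"BAR_A": 3, "BAR_B": 2, "BAR_D": 7}
print(round(shared_fraction(myeloid, lymphoid), 3))  # 0.667
print(top_clone_share(myeloid, k=1))  # 0.625
print(round(shannon_diversity({"x": 1, "y": 1, "z": 1, "w": 1}), 4))  # 1.3863 (ln 4)
```

A high shared fraction between lineages is consistent with multilineage clones, whereas a large top-k share together with a low Shannon index indicates a graft dominated by a few clones.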

Data availability
The BAR-Seq software with some example datasets is provided as a zip file (Supplementary Software) in the Supplementary Information. These datasets, which are part of the study originally described in ref. 9 and available in Gene Expression Omnibus with the accession code GSE144340, have been analyzed with the BAR-Seq computational pipeline to generate the example results presented in Fig. 4.

Code availability
The scripts for BAR-Seq analysis are freely available at https://bitbucket.org/bereste/bar-seq under the terms of the GNU General Public License version 3 (GPLv3). The BAR-Seq webtool is freely available at http://www.bioinfotiget.it/barseq.