Genome architecture and global gene regulation in bacteria: making progress towards a unified model?

The bacterial nucleoid was first described more than 50 years ago, but the recent application of new imaging technologies and physical analytical methods has brought fresh insights to the structure of the DNA within the nucleoid. Here, Charles Dorman discusses these insights and argues that, in addition to DNA topology and nucleoid-associated proteins, gene regulation is an important organizing principle of nucleoid architecture. Data obtained with advanced imaging techniques, chromosome conformation capture methods, bioinformatics and molecular genetics, together with insights from polymer physics and mechanobiology, are helping to refine our understanding of the spatiotemporal organization of the bacterial nucleoid and its gene expression programmes. Here, I discuss the proposal that, in addition to DNA topology and nucleoid-associated proteins, gene regulation is an important organizing principle of nucleoid architecture.

studied initially from the standpoint of gene regulation, and the impact of DNA supercoiling on transcription has been recognized for decades [14][15][16] . There is an intuitive appeal to the use of nucleoid-structuring features such as NAPs and DNA topology to influence gene expression, because this would integrate the process of gene expression with the very cellular structure within which it takes place. In this Opinion article, I discuss how our appreciation of the relationship between the nucleoid and gene expression has been deepened by recent findings, which suggest that the processes involved in efficient gene regulation themselves represent a nucleoid-structuring principle.

Nucleoid structure and superstructure
The chromosome of the model Gram-negative bacterium Escherichia coli is organized at a nanometre-scale structural level and at a micrometre-scale superstructural level 17. When considering nucleoid organization, it is helpful to consider the dimensions of the container in which the nucleoid is found. During exponential growth, the E. coli cell measures approximately 2 μm in length by 1 μm in diameter. To put this in context, the lac operon, consisting of the three structural genes lacZ, lacY and lacA (encoding β-galactosidase, lactose permease and galactoside O-acetyltransferase, respectively), measures about 1.7 μm in length based on the number of nucleotides. The single, circular 4,639 kb chromosome of E. coli K-12, which was originally estimated to contain 4,288 protein-coding genes 18, is itself about 1.5 mm in circumference and, if opened fully, would have a diameter of 0.5 mm, which is 500 times the diameter of an E. coli cell. These dimensions illustrate the extent of the packaging problem associated with the bacterial nucleoid, a problem that is compounded by the need to arrange the DNA in ways that make it not just compact but also readily available for replication, segregation, gene expression and gene regulation.
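The dimensions quoted above can be checked with simple arithmetic. The sketch below assumes a rise of 0.34 nm per base pair for B-form DNA and a lac operon length of roughly 5 kb; neither value is stated in the text, and the function names are illustrative only.

```python
import math

RISE_NM_PER_BP = 0.34  # assumption: B-form DNA rise per base pair (not given in the text)

def contour_length_um(n_bp: float) -> float:
    """Contour length of a DNA molecule in micrometres."""
    return n_bp * RISE_NM_PER_BP / 1000.0

def open_circle_diameter_um(n_bp: float) -> float:
    """Diameter of a circular DNA molecule opened into a perfect circle."""
    return contour_length_um(n_bp) / math.pi

chromosome_bp = 4_639_000   # 4,639 kb chromosome of E. coli K-12 (from the text)
lac_operon_bp = 5_000       # assumption: approximate length of lacZYA
cell_diameter_um = 1.0      # from the text

lac_length_um = contour_length_um(lac_operon_bp)
circumference_mm = contour_length_um(chromosome_bp) / 1000.0
diameter_um = open_circle_diameter_um(chromosome_bp)
fold_over_cell = diameter_um / cell_diameter_um

print(f"lac operon contour length: {lac_length_um:.2f} um")
print(f"chromosome circumference:  {circumference_mm:.2f} mm")
print(f"open-circle diameter:      {diameter_um:.0f} um")
print(f"ratio to cell diameter:    {fold_over_cell:.0f}x")
```

Under these assumptions the results land close to the rounded figures in the text: a contour length of about 1.7 μm for the lac operon, a circumference of roughly 1.5 to 1.6 mm for the chromosome, and an open-circle diameter of about 0.5 mm.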
Landmark genetic experiments published in 2004 by Boccard and colleagues revealed that the E. coli chromosome is composed of six distinguishable zones: four macrodomains (Ori, Left, Right and Ter) and two additional non-structured regions (NS-left and NS-right) 19 (FIG. 1a). This organization imposes certain restrictions on the permitted rearrangements to the linear-order sequence of the chromosome, perhaps even limiting its potential for further evolution in some respects 20 .
The Ori macrodomain consists of the region around the origin of bidirectional chromosomal replication, oriC, and the Ter macrodomain is located at the opposite pole of the chromosome and contains the terminus of DNA replication 19,21. The Ter macrodomain also contains the dif site, which is crucial for the resolution of chromosome dimers by the site-specific tyrosine recombinases XerC and XerD 22. The four other regions make up the bulk of the left and right replichores, the two 'arms' of the chromosome along which DNA polymerase moves during bidirectional chromosome replication 19 (FIG. 1a). In E. coli, oriC is positioned at mid-cell, with the replichores to the left and right of this point and aligned with the long axis of the bacterium. This arrangement is dependent on MukBEF, which fulfils the structural maintenance of chromosomes (SMC) condensin function in E. coli 23. The Ter macrodomain moves from cell pole to mid-cell in newborn cells and is maintained there through an interaction between the Ter-binding protein MatP and the ZapB component of the cell division apparatus 24.
The DNA-binding protein MatP has a special relationship with the Ter macrodomain. In contrast to the widespread distribution of SeqA (a negative regulator of replication initiation) owing to the presence of binding sites around much of the chromosome, MatP binds to a sequence motif (matS) that is found uniquely within Ter, a macrodomain from which SeqA is excluded 24 . MatP has an important role in timing the separation of daughter chromosomes during cell division: it can hold two copies of the Ter macrodomain together, preventing premature chromosome segregation 25 . MatP also plays a part in the formation of loops in the Ter macrodomain, by holding copies of the matS sequence together and thus providing a degree of compaction to this macrodomain 26,27 .
There is an interesting distribution of important cell cycle-associated DNA-binding proteins among the macrodomains, including SeqA, the nucleoid occlusion protein SlmA and MatP, and these proteins might contribute to macrodomain organization 21 . The intensively studied protein SeqA is involved in the timing of chromosome replication initiation and binds to hemimethylated 5′-GATC-3′ sites [28][29][30] . These sites are abundant at oriC but are also found elsewhere on the chromosome, although not in Ter 21,31,32 .
SlmA has an important role in coordinating chromosome positioning within the dividing cell; it too has a distribution that corresponds with the macrodomain structure of the chromosome. It binds to a specific DNA sequence that is absent from the Ter macrodomain and infrequently found in the Left and Right domains. Thus, SlmA is mainly located in the Ori macrodomain and the flanking unstructured domains 21,33 . MatP, SeqA and SlmA differ from the NAPs in that they bind either exclusively inside (MatP) or exclusively outside (SeqA and SlmA) the Ter macrodomain; most NAPs do not display such strong macrodomain specificity in their binding preferences 13 . This might imply that the NAPs are involved in chromosome organization at a different level to the macrodomain-specific proteins. This conjecture is supported by several lines of evidence that link at least some NAPs to the maintenance of nucleoid structure at the microdomain level. Also, MatP and SlmA have not (so far) been reported to influence global gene expression, whereas many NAPs have.
Superimposed on its macrodomain structure is the organization of the chromosome at the level of looped DNA microdomains 34. In the earliest studies of nucleoid structure, the number and size of the looped domains were under- and overestimated, respectively 35,36. Much more accurate estimates were subsequently obtained using genetic methods that allowed the impact of looped-domain breakage on DNA supercoiling to be measured around the chromosome 37. These data were combined with assessments of the influence of topological barriers at microdomain boundaries on site-specific recombination efficiency 38 and also a systematic examination of looped domains in electron microscopy images of chromosomes released from lysed cells 37. It now seems that the chromosome of E. coli is divided into approximately 400 looped microdomains, each with an average circumference of 10-12 kb 37,38 (FIG. 1b). Microdomains might be both transient and predominantly a feature of chromosomes in exponentially growing bacteria 38. Nucleoid structure seems to be more diffuse in slow-growing bacteria 10.
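As a rough consistency check on these estimates, the chromosome size can be divided by the number of microdomains. This is a minimal sketch that treats all loops as equal in size, which is a simplification, and the function names are illustrative.

```python
GENOME_KB = 4639.0  # E. coli K-12 chromosome size quoted in the text

def mean_domain_size_kb(n_domains: float) -> float:
    """Average loop size if the genome is split evenly into n_domains loops."""
    return GENOME_KB / n_domains

def domain_count(domain_size_kb: float) -> float:
    """Number of equal-sized loops of a given size needed to cover the genome."""
    return GENOME_KB / domain_size_kb

mean_size = mean_domain_size_kb(400)                 # average size for ~400 loops
count_range = (domain_count(12), domain_count(10))   # loop counts for 12 kb and 10 kb loops

print(f"mean loop size for 400 loops: {mean_size:.1f} kb")
print(f"loops needed at 12 kb and 10 kb: {count_range[0]:.0f} to {count_range[1]:.0f}")
```

With equal-sized loops, 400 microdomains correspond to an average of a little under 12 kb each, which falls within the 10-12 kb range given above.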
Several lines of evidence from physical and genetic studies indicate that NAPs, especially H-NS (histone-like, nucleoid-structuring protein) and Fis (factor-for-inversion stimulation), play a part in forming the boundaries of microdomains, where they act as insulators, or domainins 32 . Further support for H-NS as a domainin has come from super-resolution imaging combined

Glossary
Chromatin immunoprecipitation followed by microarray (ChIP-chip). A method that allows the binding sites for a specific protein to be identified throughout a genome in vivo. The protein of interest is crosslinked to DNA in living bacteria with formaldehyde, and the genomic DNA is extracted and then sonicated to achieve a desired average DNA fragment length. An antibody specific for the protein of interest (or for an epitope tag that has been attached to the protein by genetic engineering) is used to precipitate the protein-DNA complex. The crosslinks are then reversed, the released DNA is fluorescently tagged, and its genomic location is identified using a DNA microarray.
Chromosome conformational capture (3C). A technique that identifies physical interactions between parts of the genome (specifically, interactions that would not be predictable from a survey of the DNA sequence alone). Macromolecules are chemically crosslinked in living cells, and then the DNA is extracted, digested with a restriction enzyme and subjected to intramolecular ligation. PCR is used to detect novel junctions in the ligated DNA, which are predicted to arise from the close proximity of the now-joined sequences in the folded nucleoid. A chromatin immunoprecipitation step can be added to study novel interactions that depend on a specific protein, such as a nucleoid-associated protein.
Chromosome conformational capture carbon copy (5C). A chromosome conformational capture (3C) library is first constructed, and then multiplex primers with universal primer extensions are annealed to the novel junctions in the library and ligated together. The 3C junctions serve as templates to guide the perfect ligation of the primers. These can then be used in microarrays or subjected to high-throughput sequencing to identify the DNA forming the junction.

Dps
(DNA protection during starvation). A nucleoid-associated protein that is expressed in stationary phase cultures (or in cultures experiencing oxidative stress) and is thought to protect the DNA from damage.

Fis
(Factor-for-inversion stimulation). A nucleoid-associated protein that is expressed in early exponential phase cultures, organizes the local DNA topology and modulates transcription.

H-NS
(Histone-like, nucleoid-structuring protein). A nucleoid-associated protein with a preference for binding to AT-rich DNA. H-NS is expressed at all stages of growth, silences the transcription of hundreds of genes and organizes nucleoid structure.

HU
A nucleoid-associated protein with a general DNA-binding and DNA-compacting activity.

IHF
(Integration host factor). A paralogue of HU with site-specific DNA-binding and DNA-bending activity.

Macrodomains
Genetically defined large-scale chromosomal segments that are unlikely to undergo recombination with each other because the resulting rearrangements are detrimental to the survival of the bacterium. The Escherichia coli chromosome has four macrodomains (Ori, Ter, Left and Right) and two so-called non-structured regions (NS-left and NS-right).

Microdomain
A topologically independent 10-12 kb loop that coexists with other microdomains within the macrodomain superstructure of the Escherichia coli genome. There are around 400 microdomain loops in the genome.

Nucleoid-associated proteins
(NAPs). Low-molecular-mass, abundant DNA-binding proteins that are thought to act as architectural components within the nucleoid and to modulate gene expression. Escherichia coli has at least 12 distinct NAPs.

Replichores
The two arms of the circular chromosome along which bidirectional DNA replication occurs. The right (or clockwise) replichore and the left (or anticlockwise) replichore each extend, in opposite directions, from the origin of chromosome replication (oriC in Escherichia coli) within the Ori macrodomain to the terminus of replication within the Ter macrodomain.

Topoisomerase
An enzyme that alters the linking number of the DNA by cutting, strand passage and religation.

Figure 1 | … 22. The directions of DNA replication are shown by the black arrows, and these arrows constitute the right (clockwise) and left (anticlockwise) replichores. The positions of key genes that encode proteins named in the main text are indicated. b | The chromosome is shown as a writhed structure (top), reflecting imaging data which suggest that it adopts a conformation of this type 2,10,64, at least in rapidly growing bacteria 10. The thickness of this writhed DNA is indicative of the underlying layers of structure, as indicated below. A portion of the left replichore is illustrated as a solenoid and as a plectoneme, both of periodicity 117 kb. The DNA is next compacted by introducing 10-12 microdomains into each of its 117 kb units. These microdomain circles (each of 10-12 kb) have a diameter of approximately 1.3 μm, giving the nucleoid a cross-section of about 2.6 μm. Supercoiling these small circles compacts them approximately twofold 12. The nucleoid-associated protein H-NS (histone-like, nucleoid-structuring protein) 4,13,32 is thought to have a core role within the two replichores, as it holds together the ends of the microdomain loops. crp, cyclic AMP receptor protein gene; dps, DNA protection during starvation; fis, factor-for-inversion stimulation; hup, HU subunit gene; ihf, integration host factor subunit; gyr, DNA gyrase subunit; topA, topoisomerase I.

The imaging data reveal that H-NS forms, on average, two prominent foci within the nucleoid that are consistent with the clustering of many microdomain boundaries 4 (FIG. 1b). The most straightforward interpretation of the imaging data is that H-NS is involved in organizing the DNA in each of the two replichores, which align with the long axis of the E. coli cell, like the H-NS foci (FIG. 1b) 39,40. It is interesting to note that StpA, a closely related paralogue of H-NS, does not form these large foci but is instead scattered throughout the nucleoid 4. The significance of these differences in distribution is unknown, but they might reflect a preference on the part of StpA for binding to RNA and/or its preferential interaction with a portion of the H-NS population that is not involved in focus formation. The negatively supercoiled nature of the chromosome itself represents an organizing influence, not least because it contributes to DNA compaction 12,41 (FIG. 1b).
The bacterial type II topoisomerase DNA gyrase introduces negative supercoiling using the energy of ATP to drive the reaction [41][42][43][44] . In addition, the movement of the polymerases involved in transcription and DNA replication creates local domains of negatively supercoiled and relaxed DNA as the polymerases unwind the duplex 41,45 . DNA supercoiling affects transcription efficiency on several levels and so serves as a further integrating factor influencing gene expression and nucleoid structure [41][42][43][44][45][46][47] . Transcription has been proposed as a nucleoid-structuring principle in its own right, on the basis of observations that genes subject to high rates of transcription gather together to form foci 48 . Among the chromosomal genes reported to form foci are the ribosomal RNA (rrn) operons, and the promoters of these operons are controlled by NAPs, by negative supercoiling of the DNA and by guanosine tetraphosphate, the signalling molecule involved in the stringent response 16,48 . However, how these rrn operons are organized in the absence of high transcription rates is unclear, so it might be premature to conclude that high rates of transcription per se create rrn foci. It has also been reported that plasmids with a constitutively active promoter gather at the cell pole, but in the absence of transcription, these plasmids remain randomly distributed 49 . Although it must be noted that plasmids lie outside the nucleoid, this work does suggest that transcription can lead to gene repositioning and has the potential to influence nucleoid structure. Perhaps it would not be surprising if the need to regulate transcription had the potential to influence nucleoid structure.

Gene regulation then and now
Early gene regulation studies involving model bacterial systems (for example, the lac operon in E. coli) strongly informed opinion about the likely mechanisms used to regulate the other genes in the cell 50 . Although a variety of mechanisms became apparent reasonably quickly, the scene was dominated by the concepts of trans-acting protein-mediated repression or activation of transcription initiation, and the fact that regulatory genes were often located adjacent to the promoters that their protein products controlled. Placing two or more structural genes in an operon facilitates the regulation of these neighbouring genes by a single regulatory protein 51 . Work with the catabolite repressor cyclic AMP receptor protein (Crp) showed that one regulatory protein can affect the expression of a multitude of genes or operons and that these genes or operons can be located at many different chromosomal positions 52,53 (FIG. 2). Insights of this type were central to our understanding of the molecular processes underlying the concept of the 'regulon' , a collective of (usually geographically dispersed) genes under the control of a common regulatory factor 51 . Crp is an abundant protein that binds to a vast number of potential sites in the chromosome, far more than can be accounted for by binding to Crp-dependent promoters alone 54 . This has led to the interesting suggestion that Crp has at least as much in common with NAPs as it does with conventional transcription factors 55 . However, it is also possible that this widespread binding of the abundant Crp protein reflects the need for high-level expression of those proteins that have geographically dispersed targets in the folded chromosome in the nucleoid. Fluorescence in situ hybridization microscopy has been used to monitor the diffusion of labelled mRNA molecules expressed by the groESL operon and the crescentin (creS) gene in Caulobacter crescentus and by the lacZ gene in E. coli 3 . 
This analysis showed that mRNA translation occurs close to the DNA template. This discovery of de facto compartmentalization in bacteria has important consequences for our view of the efficacy of trans-acting factors in gene regulation, not least because it raises questions about the ability of geographically dispersed genes to communicate with each other. Upregulating groESL expression by heat shocking C. crescentus cells increases the dispersion of groESL mRNA, showing that transcripts from highly expressed genes can migrate from the site of their synthesis. In this context, it is interesting to note that a correlation exists between the level of expression of a regulatory gene and the number of genes that it controls 56 . Thus, one principle driving gene co-location in the nucleoid might be the need to bring regulatory genes within an effective range of their targets for physiologically meaningful regulation to occur 57 . (Perhaps small regulatory RNAs are more diffusible and thus provide a means of 'regulation-at-a-distance' that is more effective than protein-based mechanisms.)

It has been suggested that the periodic arrangement of genes along a solenoidal (that is, helically wound) chromosome provides a means of facilitating communication between genes and/or their products 58,59 (FIG. 2). A bioinformatic analysis of more than 100 bacterial genomes has identified statistically correlated gene pairs that tend to both co-occur and co-locate 60. In E. coli, the genes in each gene pair are separated along the chromosome by multiples of 117 kb, leading to the suggestion that much of the chromosome is arranged in a helix-like structure with a 117 kb periodicity that facilitates the close alignment of these genes. Furthermore, these paired genes are associated with the most heavily transcribed regions of the genome 57. Helical phasing of genes within each replichore would facilitate communication between genes and their products, and the need to accommodate this spatial co-location is likely to represent a strong organizing principle within the architecture of the nucleoid (FIG. 2). Such phasing could be achieved equally well by a solenoid-like structure or by a plectonemic (that is, braid-like) arrangement of the chromosomal DNA 7,61 (FIGS 1b,2).
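The idea that paired genes sit at multiples of 117 kb from each other can be illustrated with a short calculation. The sketch below is not the method used in the cited analyses: the phase-coherence statistic (a mean resultant length) is a stand-in chosen for illustration, the gene positions are invented, and all function names are hypothetical.

```python
import math

PERIOD_KB = 117.0  # proposed helical periodicity from the text

def phase_offset_kb(separation_kb: float, period_kb: float = PERIOD_KB) -> float:
    """Distance from a separation to the nearest whole multiple of the period."""
    remainder = separation_kb % period_kb
    return min(remainder, period_kb - remainder)

def phase_coherence(separations_kb, period_kb: float = PERIOD_KB) -> float:
    """Mean resultant length in [0, 1]; values near 1 mean the separations
    share the same phase with respect to the period."""
    angles = [2.0 * math.pi * s / period_kb for s in separations_kb]
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(mean_cos, mean_sin)

# Hypothetical positions (kb) of gene pairs within one replichore,
# invented purely for illustration.
pairs = [(100.0, 334.0), (500.0, 1085.0), (2000.0, 2117.0), (1200.0, 1389.0)]
separations = [abs(a - b) for a, b in pairs]

for sep in separations:
    print(f"separation {sep:6.1f} kb, offset from nearest multiple: {phase_offset_kb(sep):5.1f} kb")
print(f"phase coherence at {PERIOD_KB:.0f} kb: {phase_coherence(separations):.2f}")
```

Pairs whose separation is an exact multiple of the period have an offset of zero and would lie on the same face of a helix with that pitch, whereas a pair separated by 189 kb is 45 kb out of phase. Any real analysis would also need a null model for gene spacing, which this sketch omits.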
Signal-processing analysis has been used on a transcriptomic scale to examine the co-expression of clusters of genes along the E. coli chromosome 62,63. On the basis of data from one study, it has been proposed that there are three levels of transcriptional spatial organization: short range (up to 16 kb), medium range (100-125 kb) and long range (600-800 kb) 62. It is interesting to note that these correspond in scale to the proposed sizes of, respectively, a chromosomal microdomain (10-12 kb), a helix with 117 kb periodicity and a macrodomain 37,38,[58][59][60]62. By contrast, another study detected a periodicity of 33 kb in addition to the 117 kb value 63. These differing findings underline the need to test experimentally whether this periodicity matters in vivo. Data obtained from imaging or from 3C or chromosome conformation capture carbon copy (5C) experiments have indicated that the arms of the nucleoid are interwound in Bacillus subtilis 64, C. crescentus 2 and E. coli 10. Nucleoidal writhing can be expected to influence gene-gene proximity at a level above that of the periodic solenoidal structure (FIG. 1b). In C. crescentus, relocating parS (a sequence that is required for chromosome segregation in this species but is lacking in E. coli) changes the large-scale folding of the nucleoid without noticeably changing gene expression 2. However, such alterations to the gross folding of the nucleoid might not have an impact at the small and intermediate scales, where gene-gene communication might influence nucleoid architecture and vice versa.
The order of the genes along the E. coli chromosome is remarkably similar to that in Salmonella enterica, even though these two species separated from their common ancestor about 100 million years ago 65 . Such conservation is indicative of an underlying structure-function imperative, and this conserved gene order has been considered in the context of the macrodomain structure of the E. coli nucleoid 5 . This analysis found that genes encoding trans-acting transcription factors are typically found in the same replichore as their targets, an observation that is in keeping with the need for co-location of regulatory genes and their regulatory subjects. By contrast, genes contributing to the same process (for example, ribosome production) are distributed in both replichores, but at comparable distances from oriC 5 . This might indicate a need to place genes at corresponding points along a putative DNA topological gradient in each replichore.
Chromatin immunoprecipitation followed by microarray (ChIP-chip) data revealed that gyrase-binding sites are more abundant at the Ori pole than at the Ter pole of the chromosome and suggest that this results in a gradient of negative supercoiling extending from Ori to Ter 5,62. However, it might simply reflect a need to maintain supercoiling at set points in zones of the chromosome that have different levels of transcriptionally induced supercoiling, as the average superhelical density around the chromosome is similar, at least under some growth conditions. On the one hand, the suggestion that the Ter region, which lies at the periphery of the nucleoid in bacteria growing in M9 growth medium 66, is in a more relaxed state is consistent with data from modelling, which indicate that Ter has a low level of topological complexity 7,57. On the other hand, the condition of the Ter macrodomain might be a product of the growth conditions used in the experiments in which this macrodomain was analysed: rapidly growing bacteria have a more sharply delineated nucleoid structure than bacteria in stationary phase 10. A systematic study of nucleoid architecture in the context of growth phase would be helpful in resolving this issue.

The fourth dimension
Bacterial physiology changes as the organism passes through successive growth stages. Following its introduction into fresh liquid medium in batch culture, the bacterial population spends a period of time in lag phase adjusting to its new environment. The population then enters the log phase of exponential growth, expanding at its maximal rate in the new environment until some vital component becomes limiting, causing a transition to stationary phase, when rapid growth ceases (FIG. 3). The cell composition and nucleoid architecture change throughout these growth stages. Patterns of gene regulation are dynamic, resulting in sequential changes to global gene expression.
Figure 3 | Growth phase and elements that affect nucleoid structure. A typical growth curve for Escherichia coli growing in batch culture begins with a lag phase (while cells acclimatize), followed by the log phase of exponential growth and, finally, stationary phase (when the cells stop growing, usually owing to nutrient limitation). Important nucleoid-associated proteins are expressed at different times during the growth curve, as indicated, and the balance between the two main RNA polymerase σ-factors, RpoS and RpoD, also varies with growth stage. In addition, there are significant changes in DNA topology: DNA is negatively supercoiled (SC) in log phase cells, whereas it is more relaxed (R) in lag phase and stationary phase cells. The figure gives a purely qualitative impression of these events. Dps, DNA protection during starvation; Fis, factor-for-inversion stimulation; H-NS, histone-like, nucleoid-structuring protein; IHF, integration host factor.

In lag phase and stationary phase bacteria, the DNA has a lower superhelical density than it does in log phase bacteria 47, reflecting shifts in the ATP/ADP concentration ratio and the impact of this ratio on the ATP-dependent DNA-supercoiling activity of DNA gyrase 42,44. Increased compaction of the nucleoid in stationary phase bacteria could be achieved by binding of NAPs, especially Dps (DNA protection during starvation) 67,68 (FIG. 3). The differential sensitivities of the two principal σ-factors of RNA polymerase to DNA supercoiling impose a temporal control on their activities: RpoD (also known as σ70) activity correlates with periods of relatively high chromosomal supercoiling, whereas RpoS (also known as σS) becomes dominant as the DNA relaxes 46 (FIG. 3).
The NAP population also shows a dynamic expression pattern 69 that is a function of the growth cycle 13,70 (FIG. 3). Fis and the HU α-subunit are expressed early; HU β-subunit and the two subunits of IHF (integration host factor) appear in exponential phase, with IHF peaking at the exponential-to-stationary phase transition; Lrp also peaks during this transition; Dps is maximally expressed in stationary phase; and H-NS is expressed at a constant ratio to chromosome copy number throughout growth 13 (FIG. 3). There is a rough correspondence between the proximity of NAP-encoding genes to oriC and the period at which they are expressed during growth 5. As these NAPs influence the expression of many other genes, these determinants of nucleoid structure have a profound and widespread impact on the global gene expression profile of the cell. This is an integrating principle that links environment, gene regulation and nucleoid structure.

Implications of a unified model
The structure of the bacterial nucleoid is subjected to constraints at a number of levels because the chromosome has to be configured for optimal rates of accurate replication and segregation while accommodating the complex gene expression programmes that support the life of the cell, and because any chromosomal configuration must be compatible with the volume available to house it in the bacterium. Using DNA topology and NAPs simultaneously to modulate both gene expression and nucleoid architecture allows these two factors to be integrated. However, the associated folding of the genetic material in the nucleoid constrains the free movement of gene products, creating the need to co-locate certain genes to facilitate communication. Co-location can be achieved linearly or by exploiting the periodicity in nucleoid architecture to ensure that specific genes remain within regulatory range of each other. The levels and timing of gene expression, themselves subjected to regulation, can overcome the compartmentalization problem to some extent, allowing a given regulatory gene to exert influence at a distance.
To what extent is this apparently well-integrated nucleoid structure capable of further evolution? Bacterial chromosomes can undergo substantial rearrangement without incurring lethality 2,71, and the horizontal transfer and integration of novel genes is a routine event in many bacterial populations 72. This suggests that nucleoid architecture has the scope to adapt to modifications, with the final arbiter of success being the manner in which the new form affects the competitive fitness of the bacterium.