Rational design of allosteric modulators: Challenges and successes

Recent advances in structural biology and computational techniques have revealed allosteric mechanisms for an abundance of targets leading to the establishment of rational design of allosteric modulators as a new avenue for drug discovery. Considering that allostery is an intrinsic property of the protein conformational ensemble, allosteric drug design has the potential to develop into an innovative approach to modulate the dysregulation of therapeutic targets that are considered to be undruggable at their orthosteric site, explore strategic design opportunities to tackle new chemical space, or develop mutant‐specific therapies to target mutations occurring far from the enzyme active site. Traditionally, allosteric drug discovery has been performed through high‐throughput screening or through serendipitous discoveries; however, recent developments in structure‐based and ligand‐based methods have led to exciting advancements of designing bioactive allosteric ligands rationally. In this review article, we highlight the advantages and disadvantages of allosteric modulators and present structure‐based and ligand‐based drug design methodologies for the identification of allosteric binding sites and allosteric modulators. We also illustrate representative studies for the design allosteric modulators for proteins belonging to a wide range of protein families, also considering irreversible binding with covalent allosteric modulators. Additionally, we analyze challenges and successes in the rational design of allosteric inhibitors and activators. Finally, we present the future of rational allosteric ligand design with newly built computational tools that we expect to be applied in future studies, concluding to theoretical and practical guidelines for allosteric ligand design strategies and identify knowledge gaps that need to be addressed to improve efficiency in allosteric drug design.


| INTRODUCTION
Allostery can be defined as a functional change of the orthosteric site generated by a perturbation produced from the binding of an effector to a distant site (allosteric site). The effect at the allosteric site is linked to the active site by conformational changes and dynamics that transmit the allosteric effect in a wave-like manner along pathways of amino acids (allosteric pathways) within the protein regulating its activity. 1 This biological phenomenon was first introduced by Cristian Bohr over a century ago, who observed positive cooperativity of oxygen binding to the hemoglobin protein. 2 Initially, numerous formalisms emerged to explain this phenomenon [3][4][5][6] until the derivation of the Monod-Wyman-Changeux (MWC) concerned model, 7 which used for the first time the term "allosterism" to describe this cooperativity, along with the Koshland-Némethy-Filmer (KNF) sequential model 8 ; these remained for years the two leading allosteric models. The current approach to describe allostery is that a perturbation at an allosteric site of a protein structure leads to a shift in equilibrium populations, suggesting that allostery is a property of the conformational ensemble. [9][10][11] Based on this redistribution of the conformational ensemble, it is surmised that allostery is not only associated with the cooperativity between enzyme subunits, but it is also an intrinsic property of all dynamic proteins. 12,13 Therefore, in the protein conformational ensemble different allosteric mechanisms pre-exist; thus, the binding of an allosteric modulator stabilizes the active form of the allosteric domain, which through the preexisting allosteric pathways modulates the activity (positively or negatively) of the functional domain, remodeling the free energy landscape. 11,14 However, this notion has been challenged by other studies showing that allostery can be observed in the absence of a structural pathway, 15 suggesting the conformational ensemble as a framework to explain allostery through the statistical nature of the interactions responsible for the transmission of information. 10,16 Since the first discovery of allosteric modulators, it has been shown that most protein receptors can accommodate allosteric sites in addition to their orthosteric (active) sites. 13,17,18 Because of the various diverse possible allosteric effects, allosteric modulators can be classified into different subtypes. Positive allosteric modulators (PAMs) enhance the potency of the endogenous agonist inducing relaxed states for both the allosteric and orthosteric sites, whereas negative allosteric modulators (NAMs) reduce the effect of the orthosteric ligand/substrate inducing a tense state for the orthosteric site ( Figure 1). 19 Allosteric modulators, which have a high affinity for an allosteric binding site, yet with no effect on protein activity are called neutral or silent allosteric modulators (SAMs). SAMs may also act antagonistically F I G U R E 1 Schematic representation of positive allosteric modulator (PAM), negative allosteric modulator (NAM), or silent allosteric modulator (SAM) binding on allosteric sites on a receptor and the effect on the protein orthosteric site. All binding sites can adopt two states, the relaxed (R), where the substrate/co-factor can bind with high affinity, and the tense (T), where the substrate/cofactor cannot bind or binds with low affinity. All possible combinations are expressed in a two letter code, for example, RT, where each letter denotes the state of the allosteric and the orthosteric sites, respectively. In the unbound conformational ensemble there are four possible combinations of protein states with different probabilities based on the free energy landscape. Binding of an allosteric modulator shifts the conformational ensemble towards the relaxed form (in case of a PAM) or the tense form (in case of a NAM) for the orthosteric site through preexisting allosteric pathways. Binding of a SAM in the allosteric site does not affect the orthosteric site, which can exist in either of the relaxed or the tense states. However, a SAM binds with high affinity acting as an antagonist to other allosteric modulators, blocking the allosteric regulation to other allosteric ligands that bind in the same allosteric site (Figure 1). 20 In addition to PAMs, NAMs, and SAMs, allosteric partial or full agonists, antagonists, and inverse agonists also exist, capable of modulating the protein activity by binding to the allosteric site, irrespective of the absence or presence of an orthosteric ligand. 21 Furthermore, allosteric modulators may act simultaneously as modulators for endogenous ligands and as allosteric agonists, which are called ago-allosteric modulators, increasing the efficacy and modulating (positively, negatively, or neutrally) the potency of the endogenous ligand. 22,23 In kinases and phospholipases, allosteric ligands bind to remote cavities, distant from the orthosteric site, and stabilize unique, inactive conformations. 24,25 Over the past years, rational drug design has been mainly facilitated in orthosteric sites as allosteric sites and the effect induced by allosteric ligands on protein function is often unknown. Additionally, structural information of allosteric modulators bound to proteins is limited compared to active site ligand-protein complexes. At the same time, many protein therapeutic targets have been considered undruggable or difficult to target in the orthosteric site for a variety of reasons, for example, extremely tight binding of the substrate, absence of a known orthosteric site, or selectivity issues arising from a highly conserved active site across proteins of the same family, such as kinases or G-protein-coupled receptors (GPCRs). Targeting undruggable protein targets in combination with the recent patent cliff has generated an increased interest in allosteric drug design over the past few years. This growing interest has motivated the creation of structural databases of allosteric ligands bound to proteins and raised interest in rational, experimental, and computational methods predicting allosteric sites and allosteric pathways, enabling the rational design of bioactive allosteric ligands. [26][27][28][29][30][31][32][33] Despite these efforts, most allosteric bioactive ligands hitherto have been discovered by high-throughput screening (HTS) [34][35][36][37] often followed by structure-activity relationship (SAR) optimizations [38][39][40][41][42] with limited applications of computer-aided drug design. Going forward, new structural information coming to light 42,43 in combination with novel methodologies and approaches can bolster allosteric rational drug design.
In this review, we focus on rational, computer-aided allosteric drug design methods and applications and less in medicinal chemistry optimizations, which are countless (for a recent review see Ref. 44). First, we discuss the advantages and challenges in allosteric ligand discovery and proceed with summarizing methods for discovering allosteric pathways and predicting allosteric sites and cryptic pockets. We present structure-based and ligand-based studies that have been employed to successfully design allosteric modulators for several protein families and address the obstacles that must be overcome such as the identification of cryptic allosteric pockets not detectable in crystal structures and the need for conformational ensemble generation to account for protein flexibility. The variety of different structural and computational approaches that are presented herein may offer key principles and strategies for the design of allosteric modulators and serve as practical considerations for computer-aided allosteric drug discovery.

| ALLOSTERIC MODULATORS: PROS AND CONS
Allosteric modulators may demonstrate improved properties in terms of subtype selectivity and specificity, possibly reducing side effects [45][46][47][48] compared to orthosteric ligands because allosteric sites are less evolutionary conserved than orthosteric sites. 49,50 Thus, they may target proteins that have been considered undruggable in the absence of an orthosteric site or very tight binding of the substrate/native ligand, and they may also selectively target and inhibit mutant proteins in cancer, while displaying limited binding to the wild type (WT). 19,51,52 The importance of selectively targeting mutant proteins is demonstrated by the fact that treatment with allosteric inhibitors in cancer may overcome drug resistant mutations as demonstrated in a recent study of coadministering nilotinib and the allosteric inhibitor ABL001 to the oncogene BCR-ABL1, which showed that this combination can completely overcome drug resistance without recurrence after the end of the treatment, verifying the hypothesis that the dual inhibition of oncogenes using distinct targeting mechanisms might improve treatment outcomes by providing enhanced target coverage. 53 Moreover, allosteric modulators stabilize a unique conformation of the protein ensemble, providing a new pharmacology for the receptor (e.g., ligand-biased signaling). [54][55][56] Another effect of allosteric modulators is that they bind noncompetitively, allowing the endogenous ligand to bind to the orthosteric site, enabling modulation of the receptor rather than switching the activity on or off entirely such as orthosteric ligands. 45 Additionally, PAMs act synergistically with orthosteric ligands, increasing their affinity, thus decreasing the orthosteric ligand dosage. 21 Moreover, allosteric effectors are saturable, meaning that once allosteric sites are occupied, no additional effects can be observed. 24 Finally, probe dependence is one of the unique features of allostery, which allows a protein to respond differently to the same allosteric modulator when different orthosteric or endogenous ligands are bound. [57][58][59] Intriguingly, this feature reflects the bidirectional regulation between allosteric and orthosteric sites. 60 Therefore, signaling bias and probe dependence are allosteric features that have the potential to lead to unambiguous and precise allosteric ligand design. 61 On the other hand, the discovery of allosteric bioactive ligands presents challenges beyond those encountered in orthosteric drug discovery. Allosteric sites are often shallower and narrower than orthosteric sites hampering the discovery of allosteric drugs with sufficient binding affinity. 46,62 Additionally, because of the variety of different mechanisms of modulation that were mentioned before, along with the binding affinity the allosteric efficacy must also be taken into consideration, 29,63 because minor differences in allosteric modulators can lead to opposite effects in the protein function. 24,64,65 Furthermore, although orthosteric sites are straightforward to discover due to the known location of substrate binding, allosteric sites are often unknown or difficult to discover, as they are only accessible in specific protein conformations that may not have an associated resolved 3D structure. 31,66 It turns out that is also difficult to predict allosteric sites using classical molecular dynamics (MD) simulations as allosteric effects often occur through conformational changes in the microsecond, [67][68][69] or millisecond time scale. [70][71][72][73] Finally, allosteric sites are less evolutionarily conserved across protein families compared to orthosteric sites, but there are also less evolutionarily conserved among subspecies complicating testing in different species homologs, that is, human versus rodent receptors. 29,74,75 Despite the above-mentioned disadvantages, the advantages outweigh the challenges, and the design of allosteric modulators holds a significant promise for drugging proteins considered to be undruggable, but also for modulating the activity of orthosteric ligands providing better selectivity with fewer side effects, smaller dosage, reduced toxicity, modulating the protein activity instead of switching it on or off, inducing signaling bias, and overcoming drug resistance.

| METHODS
Structure-based allosteric drug design has proven challenging owing to the limited resolved 3D protein structures bound to allosteric modulators. Allosteric sites are often cryptic remaining undiscovered in protein conformations. Cryptic pockets are most often absent in the unbound structure, and only in the presence of the ligand the cryptic site relaxed state becomes dominant in the conformational ensemble. 31 Thus, in order to unveil these cryptic sites, the conformational space of the apo and/or holo protein structures must be examined; molecular simulations can be an efficient way to generate an ensemble of conformations and explore these sites. Molecular simulation techniques for this purpose are molecular mechanics calculations such as MD or Monte Carlo (MC) simulations complemented by techniques that produce conformational ensembles such as Markov state models (MSMs), or enhanced sampling techniques such as weighted ensemble MD, metadynamics, coarse-grained simulations, or combinations of methods. [76][77][78][79][80][81] Site detection algorithms may then be utilized on protein conformers to discover new putative allosteric sites. [82][83][84][85][86][87] An important consideration after identifying the new cavities is whether these sites can communicate with functional domains of the protein such as the active site, thus being able to act allosterically and modulate protein function upon ligand binding. Several techniques have been developed for this purpose, which we describe below.
Allosteric sites can be targeted with small molecules aiming to shift the conformational landscape towards a specific state (active, inactive, or intermediate). The subsequent step is structure-based design with various methodologies such as virtual screening or de novo design in a variety of protein conformers of the conformational ensemble to implicitly consider protein flexibility. 88,89 In many cases presented herein, homology modeling was facilitated in the absence of a crystal structure. When homology modeling is necessary, an efficient practice is to produce different homology models and screen known allosteric and orthosteric inhibitors along with inactive compounds and decoy molecules. The homology model, which ranks known allosteric and orthosteric inhibitors on top, can be considered the best structural conformation to be employed in further studies. 90,91 In cases where the 3D protein structure or the required conformational state is not available, structure-based design might not be feasible. Allosteric ligand-based drug design does not rely on the 3D structure of the protein but on known active and inactive ligands that bind to the desired binding site. Typically, when only a few known allosteric binders are available, similarity search methods are applied, [91][92][93] while in the case that several active ligands exist, pharmacophore models that describe binding interactions and ligand physicochemical and structural properties can be generated. 92,94 Pharmacophore models are routinely subjected to virtual screening, filtering chemical libraries in search for compounds that encompass the same pharmacophore features, or for designing de novo allosteric ligands. Allosteric pharmacophore modeling is not solely a ligand-based approach, but could also involve information about the binding site and residues that are important for binding. [95][96][97][98] In the last decade, numerous methodologies and applications have been developed to explore allostery in proteins and predict putative allosteric sites or allosteric pathways. These can be broadly categorized in sequence-based and structure-based techniques. Sequence-based methods are based on the principle that allosteric pathways are evolutionarily conserved and that protein residues coevolve in a correlated manner. 99,100 To quantify how residues coevolve and identify patterns of correlated mutations, the statistical coupling analysis (SCA) method is performed to measure covariation between pairs of amino acids in a protein multiple sequence alignment (MSA) of an ensemble of homologous sequences, revealing allosteric pathways and decomposing the protein into sectors. 101,102 This methodology has been facilitated in the pySCA Python package 103 for the REACH evolution-based strategy that rationally engineers allosteric control in proteins. 104 The drawbacks of the evolution-based approach are twofold. First, its success is dependent on the number of available homologous sequences and the need of a 3D structure, which is not always available. Second, and most important, residue coevolution might not necessarily be related to allostery as it is discussed in Refs. 30, 105. Structure-based methods are predominantly based on residue movement cross-correlation within the protein. [106][107][108][109][110] Motional correlations can be obtained from MD simulations or from conformational ensembles produced from MD, [111][112][113][114][115][116]117,118 using principal component analysis (PCA) or from normal mode analysis (NMA), elastic network models (ENMs) such as the Gaussian network model (GNM), 118,119 the anisotropic network model, 120 or the torsional network model, 121,122 which uses torsion angles as degrees of freedom. Graph theory has been also employed, which provides a way of representing proteins in a reduced form, identifying allosteric pathways from the dynamic couplings between the residues of the orthosteric and the allosteric sites. In graph theory, proteins are represented by an atomistic graph network model, where nodes are atoms and weighted edges are the covalent and noncovalent bonds, 123 or by a coarse-grained network model, which is the most usual case, where nodes are residue Cα atoms, or residue center of mass, connected with edges based on a distance cutoff maintained in the MD trajectory at a certain percent (e.g., 75% of the MD trajectory) 108,124 or based on other inter-residue properties. 125,126 The edge weights are usually based on the covariance matrix of the displacement of Cα atoms, or from mutual information analysis from MD simulations, and the allosteric hot spots and communities are identified using centrality measures, for example, betweenness centrality, eigenvector centrality, and other metrics. 124,[127][128][129][130][131][132] The allosteric pathway is then detected using the Dijkstra algorithm, 133 or the Bellman-Ford algorithm, 134 which finds the shortest path between two nodes in a graph. A plethora of tools integrating graph theory and network models for detecting protein dynamics and allosteric communications pathways have been recently reviewed in Ref. 135. It must be noted that calculating cross-correlations using the covariance matrix may miss nonlinear correlations and ignore orthogonal vibrations; for this purpose, approaches based on mutual information have been developed, which enable quantifying all correlations, including nonlinear and higher-order correlations [111][112][113]115,116,136,137 or from the singular value decomposition. 120 Additionally, by using the covariance matrix, the time evolution information and dynamic properties of the system are lost. 9,30,60 Another challenge in network-based methodologies using equilibrium sampling is that allosteric conformational changes may occur in time scales inaccessible to plain MD. 30 Enhanced sampling simulations may then be employed. 30,105 Available tools for the prediction of allosteric pathways are summarized in Table 1.
Path-based methods such as accelerated molecular dynamics (aMD), 24 and metadynamics 25 were developed to aid in sampling complex free energy states and landscapes. These calculations are used when systems are at equilibrium; when studying nonequilibrium properties, the Jarzynski method can describe the free energy between two states. 26 Nonequilibrium methods can be used to understand ligand unbinding pathways. 28 Early work was performed with metadynamics for docking ligands on flexible receptors in water, mimicking the real dynamics of a ligand binding/unbinding to a target. From these calculations it is possible to reconstruct the free energy surface and thereby compute a binding free energy. This approach was first applied to four protein-ligand pairs (β-trypsin/benzamidine, β-trypsin/ chlorobenzamidine, immunoglobulin McPC-603/phosphocholine, and CDK2/staurosporine), whereby the docked geometry and the free energy of docking in all cases were correctly predicted. 138 More recently, a more generally applicable metadynamics scheme was developed for predicting free energy profiles and binding energies. For example, studies on GPCRs using a single collective variable (CV) together with well-tempered multiple-walker metadynamics with a funnel-like boundary were used to predict binding free energies of 12 diverse ligands in five receptors to a root-meansquare error less than 1 kcal/mol. 139 In recent years, an application of metadynamics, binding pose metadynamics, has become increasingly popular as a method that allows for an efficient scoring of ligand poses in a protein binding pocket. Ligand poses can be evaluated in terms of their stability under the bias of the metadynamics simulation; the ones that are unstable are expected to be infrequently occupied in the free energy landscape, thus making minimal contributions to the binding affinity. Recent results using crystal structures with ligands known to be incorrectly modeled, as well as T A B L E 1 Computational tools and databases for the prediction of allosteric sites, allosteric pathways, and cryptic sites  63 structurally diverse crystal structures with ligand fit to electron density from the Twilight database, show that binding pose metadynamics can successfully differentiate compounds whose binding pose is not supported by the electron density from those with well-defined electron density. 140 A novel protocol based on clustering of multiple walker metadynamics simulations allowed identifying the preferential binding mode from conformational ensembles for three GPCR structures and reproduced the correct allosteric binding mode known from the literature. 141 In another study, MD and bias-exchange metadynamics simulations were utilized to elucidate the mechanism of how the natural substrate of κ-casein acts as an allosteric activator and to compute the free energy surface for the process. 142 These examples illustrate the potential and exciting opportunities of enhanced sampling simulations for the investigation of allosteric ligands, yet these methods have not been fully exploited. An extensive review highlights recent advances and discusses challenges in predicting protein-ligand binding using metadynamics-based approaches, with an emphasis on structure-based design. 143 A plethora of computational methodologies and tools have recently utilized the wealth of available structural data and the predictive power of machine learning, often in combination with the aforementioned techniques, for predicting putative allosteric sites, allosteric signaling pathways, allosteric hotspots, and cryptic sites. [144][145][146][147][148][149][150][151][152][153][154][155][156] The allosteric database ASD v3.0 28 and the allosteric benchmark ASBench 157 have been extensively used for the training and testing of many developed tools for allosteric pocket detection. Tools such as Allosite 145 or Cryptosite 148 utilize pocket detection methods in conjunction with ASD to train machine learning algorithms, for example, SVD, 145,148 or Random Forests 147,155,156 to predict allosteric sites. Moreover, to improve the predictive performance in identifying allosteric pockets in algorithms such as AlloPred, 146 machine learning can be combined with NMA (or ENMs), 146,149 node-weighted residue interaction networks, 151 and MD simulations. [153][154][155] Recently, methodologies combining allosteric communication and dynamics were introduced providing novel perturbation approaches to predict allosteric sites. Tee et al. exploited the bidirectional nature of allosteric communication to create a reverse perturbation approach, using a structure-based statistical mechanical model, 158 identifying allosteric sites by perturbing the orthosteric site. 159 Yao et al. introduced a method termed comparative perturbed-ensembles analysis 160 to study protein functional dynamics using conformational ensembles generated from MD simulations of the unperturbed (ligand free or WT) and perturbed (ligand-bound or point mutated) protein, instead of the conformational ensemble of a long MD simulation bound to a specific substrate. 160 The amino acid network approach called dCNA (difference contact network analysis) 161 utilizes graph theory using two or more conformational ensembles as input (i.e., ensemble of holo and ensemble of apo, or ensemble of WT and ensemble of mutant protein), and partitions the protein in communities. This novel correlation-based analysis is able to distinguish positive versus negative allosteric responses during PAM and NAM binding. 160 Significant efforts towards the direction of discovery and investigation of cryptic or transient allosteric binding pockets have also appeared in the last decade. Most of the developed methodologies utilize mixed-solvent MD simulations using probes, which open and stabilize cryptic pockets, [162][163][164][165][166] or enhanced sampling methods, 66,[167][168][169][170][171][172][173] which effectively sample the opening of cryptic pockets. Other computational methodologies revealing cryptic binding pockets use MSMs, 174-176 machine learning, 148 and postprocessing methods for MD trajectory and pocket flexibility analysis. 177,178 Many of the abovementioned cryptic site discovery methods have been implemented in software programs and web-servers summarized in Table 1, and are discussed in detail in recent reviews. 179, 180 We will not extend further into the aforementioned computational techniques and tools, as they have been extensively described and discussed in numerous recent reviews, also listing the accessible available web-servers and stand-alone programs implementing the abovementioned methodologies. 30,32,33,60,105,110,129,135,[180][181][182][183] Here, we review several successful examples of rational design of allosteric modulators categorized by protein families. The protein targets, the methods and tools utilized, and the results are summarized in Table 2. A list of abbreviations can be found in Table 3.

| GPCRs
GPCRs are the largest family of integral membrane proteins comprising seven conserved transmembrane helices, three extracellular loops, and three intracellular loops. GPCRs are involved in many diseases and disorders [204][205][206][207][208] and are sought-after drug targets. Approximately 34% of all US Food and Drug Administration (FDA) approved drugs target GPCRs. 209 GPCRs are grouped in six classes (classes A-F) based on their sequence and functional similarity. Drug design in the orthosteric sites of GPCRs can be challenging because these sites are highly conserved across GPCR subfamilies causing side effects due to poor selectivity. [210][211][212] For this reason, allosteric drug design is frequently employed for GPCRs. Crystallization of GPCRs is complex due to their inherent flexibility and instability outside of the membrane, and thus most GPCR-targeting drugs were discovered by identifying lead compounds from HTS, followed by iterative trial and error optimizations. However, technological improvements have recently led to the high throughput X-ray determination of GPCR structures in complex with allosteric modulators, holding promise for the rational allosteric drug design in these targets. 42,43,[213][214][215] Below, we summarize examples of identifying allosteric mechanisms and allosteric ligand design on GPCRs using computational techniques.

| M2 muscarinic acetylcholine receptor
Muscarinic acetylcholine receptors (mAChR) belong to class A GPCRs and consist of five subclasses (M1-5). M2 is important for modulating the cardiac function and the cognitive functions of learning and memory. 216 Dror et al. studied the structural basis of allosteric ligand binding in M2 elucidating its allosteric regulatory mechanism. 67 In their study, unbiased all-atom MD simulations of the human M2 receptor (PDB: 3UON) 217 were performed using the parallel supercomputer Anton 218 and the CHARMM force field. 219,220 In these simulations, the cocrystallized antagonist QNB was removed or replaced by NMS and known M2 allosteric modulators. Structurally diverse relevant drugs were also simulated independently with initial conformations more than 20 Å away from the extracellular vestibule center of M2. Results showed that all ligands bind to the same allosteric site at the extracellular vestibule, and that PAMs and NAMs act via two different allosteric mechanisms although they bind to the same allosteric binding site. The computational binding modes were tested experimentally providing insights into the chemical modifications that can alter the effect of allosteric modulators from PAMs to NAMs and vice versa.
In another computer-aided drug design approach, aMD simulations 221 were applied to discover PAMs and NAMs for the M2 receptor. 222 In this study, simulations used the X-ray structures of the inactive QNB-bound (PDB: 3UON) 217 and active IXO-nanobody-bound M2 receptor (PDB: 4MQS) 212 structures and trajectories were clustered to produce a conformational ensemble of M2. The extracellular vestibule allosteric site was then defined based on the binding location of the LY2119620 allosteric modulator (PDB: 4MQT, Figure 2). 212 Subsequently, ensemble docking was performed using the National Cancer Institute (NCI) compound library, 223 and high-throughput virtual screening followed by the induced-fit docking 224-227 method on the top 100 compounds docked and scored by Glide. 228-231 A total of 38 compounds were tested experimentally to measure the binding affinity and the efficacy (PAMS or NAMs) resulting in 12 of them being M2 allosteric modulators with binding affinity ≤30 μM.
Using a different concept, several studies for the M2 receptor explored bitopic (or dualsteric) ligands as potential bioactive ligands that target both orthosteric and allosteric sites. 232 Valant et al. generated a homology model of M2 based on the crystal structure of the 2-adrenergic receptor and generated the lowest energy model using biased probability Monte Carlo (BPMC) simulations. 233 Then, the mAChR known agonist McN-A-343 was docked onto the homology model; results showed that McN-A-343 is not an orthosteric ligand but rather a bitopic ligand consisting of an orthosteric and an allosteric moiety. Even more intriguing is the fact that the allosteric moiety of McN-A-343 is a PAM for orthosteric inhibitors, but a NAM for orthosteric agonists confirming the above-mentioned allosteric property termed probe dependence. Overall, these results can lead the way toward an alternative structure-based bitopic binding T A B L E 2 Proteins for which allosteric modulators have been rationally designed, methods and tools utilized, and synopsis of the results procedure targeting both allosteric and orthosteric sites, in the case where these sites are adjacent to each other, providing even better subtype and functional selectivity. 234,235

| Complement component 5a receptor
The complement component 5a receptor (C5aR) is a class A GPCR involved mainly in inflammatory responses, but also autoimmune, neuropathic pain, and neurological disorders. [236][237][238][239] Based on a study of the neutral allosteric inhibitor reparixin binding to the minor pocket of the transmembrane region of CXCR1 and CXCR2, 240 a hypothesis was made that this minor pocket is conserved across this GPCR family. Based on this assumption, a C5aR homology model was built using the crystal structure of C-C chemokine receptor type 5 (CCR5) bound to the HIV FDA-approved allosteric drug maraviroc (PDB: 4MBS) 241 as a template. 205 Then, MD simulations were performed in the C5aR model for 1 ns using the Discovery Studio 242 software and the CHARMM force field, 219,220 and an average structure from the last 100 ps was built. Subsequently, a site-targeting library was designed based on structural characteristics of the minor pocket to identify new allosteric C5aR inhibitors. Molecule DF2593A was selected as the most promising lead and was docked using LiGenDock 243,244 to identify the binding poses with C5aR. In vitro and in vivo tests followed, identifying DF2593A as a potent and orally active C5a noncompetitive allosteric antagonist effective in rodent models for inflammatory and neuropathic pain. DF2593A is being developed by the Center for Research on Inflammatory Diseases in partnership with Dompé Pharma although currently there is no data on clinical development. 245

| Ovarian cancer G-protein coupled receptor 1
The ovarian cancer G-protein coupled receptor 1 (OGR1) is a class A GPCR encoded by the GPR68 gene, belonging to the GPCR proton-sensing family. 246 Proton-sensing GPCRs are activated when the extracellular pH is below 7, and their expression is shown to be significant in acid-induced pain sensation. 247 In a combined experimental-computational study, first it was shown experimentally that the benzodiazepine drug lorazepam is a PAM of OGR1 and then the OGR1-lorazepam complex was modeled computationally. 90 To model the OGR1 structure, 407 OGR1 homology 3D models were created using MODELLER 248 based on the CXCR4 structure (PDB: 3ODU), 249 which were then expanded using ENM with the 3K-ENM program 250 by additionally generating 2900 models. For each one of the total 3307 models, virtual screening of active benzodiazepines, inactive compounds from the National Clinical Collection library, and property-matched decoy molecules using the flexible ligand sampling algorithm in DOCK 251,252 was performed in five candidate allosteric sites. Iterative cycles of modeling and optimization resulted in the OGR1-lorazepam complex structure, where lorazepam was ranked first among the all the other decoy compounds, providing the correct binding pose of lorazepam and its allosteric binding site. Then, in order to discover more active PAMs than lorazepam, virtual screening of more than 3 million compounds against the lorazepam allosteric site of OGR1 was performed; this exercise resulted in the calculation and scoring of more than 3 trillion complexes. Seventeen compounds were selected for in vitro testing based on their ranking score, their interactions with OGR1, their chemical diversity, and also the score of their analogs. Four of them showed increased activity, but none of them was more active than lorazepam. Subsequently, hundreds of analogs of these early hits were docked and scored, and 25 of them were selected for testing; 13 of them were more active than lorazepam. The compound with the highest allosteric effect based on the standard allosteric operational model 253 was named ogerin. Optimization of ogerin continued by producing a virtual library of more than 600 ogerin analogs, docking into the OGR1 model, synthesis and assaying of 13 analogs, out of which three analogs were shown to be more active than ogerin. Additionally, ogerin showed high specificity as an OGR1 PAM over other proton-sensing GPCRs. Utilizing the Similarity Ensemble Approach (SEA) 254 program against a panel of 2800 offtargets, ogerin and its analogs revealed association with adenosine receptors which was confirmed with physical profiling. Based on these results, again using SEA for computational screening, it was revealed that there is a cross-talk between OGR1, adenosine, and γ-aminobutyric acid receptors allosteric modulators, which should be considered in future drug design in these receptors. The same procedure was followed for the psychosine receptor (GPR65), a receptor belonging to the same proton-sensing GPCR family, resulting in the discovery of allosteric agonists and NAMs, proving the general applicability of this approach.

| Dopamine receptors
Dopamine receptors belong to class A GPCRs and are neurologic drug targets because abnormal dopamine receptor function is implicated in many neuropsychiatric disorders, drug addiction, and Parkinson's disease. 255,256 Lane et al. studied computationally the dopamine receptor D 3 (D3R) by initially creating a benchmark set of 28 known D3R selective and potent antagonists from the ChEMBL database, 257 then generating conformers based on the protein crystal structure (PDB: 3PBL) 258 using NMA, followed by energy-based side chain sampling using an antagonist of the initial benchmark set. 259 These protein conformers were then evaluated by virtual screening of the antagonist benchmark set and decoy compounds, and were optimized using the iterative ligand-guided receptor optimization (LiBERO) algorithm. 260 The best D3R (apo) model was selected, and the endogenous ligand dopamine was docked. For the identification of new potential allosteric bioactive ligands, a library of more than four million compounds was prepared using the Molsoft ScreenPub database, which was virtually screened in an allosteric pocket adjacent to the orthosteric site using the ICM-Pro 261,262 software. Subsequently, the top 300 compounds were selected and clustered (Tanimoto similarity), and the top 25 compounds were chosen for experimental tests to measure binding affinity, functional potency and selectivity, eventually identifying two potent NAMs. The NAM with the highest potency in D3R was also selective for the dopamine receptor D 2 (D2R). Based on this finding, a recent study was conducted using SAR optimizations on this NAM. A library of analogs was constructed accounting not only for the binding affinity but also for their efficacy to restore dopamine function. 263 The D3R NAM and its analogs were found to be also potent D2R NAMs, and three of the analogs were shown to be many-fold more selective for D2R over D3R. To model the allosteric binding of the above-mentioned NAMs, the cocrystallized structure of the antipsychotic drug risperidone bound to the D2R (PDB: 6CM4) 264 was used. Risperidone was replaced with raclopride (which was used in the binding experiments) using docking with the ICM-Pro 261,262 software and, the D2R/raclopride complex was optimized using BPMC. The D3R NAM along with its 33 analogs were docked in the D2R complex again using ICM-Pro, providing insights into the interactions of these NAMs with the orthosteric ligand and the D2R. It is worth noting that given the structural morphology of the orthosteric and allosteric pockets, the rational design of bitopic drugs holds a promising alternative for the D2R and D3R proteins. 265-268

| Glucagon receptor
Rational allosteric drug design of class B GPCRs has been even more laborious than class A because of the very few structural models available. Glucagon receptor (GCGR or GLR) is a class B GPCR, which regulates blood glucose levels and glucose homeostasis. 269 In a study performed before GCGR was crystallized, 270 a GCGR homology model was designed based on the corticotropin-releasing factor type 1 receptor (CRFR1) template, which in turn was modeled as prototypical class B representative from noncompetitive (allosteric) antagonists, extensive SAR and site-directed mutagenesis data available at that time. 91 Subsequently, MD simulations of the CRFR1 model in a hydrated phospholipid bilayer environment were performed. Virtual screening of known CRFR1 antagonists along with chemically similar decoys using the Gold software 271 on the last snapshot of the MD trajectory showed that the final model was accurate, distinguishing true actives from decoys. Then, the GCGR model was constructed based on a) the CRFR1 template, b) SAR data of known noncompetitive antagonists, and c) MD refinement, similar to the CRFR1 case. A second virtual screening of an in-house database of 2 million compounds with various physicochemical properties was carried out using Pipeline Pilot 272 against existing negatively ionizable (Set A) and neutral (Set B) noncompetitive GCGR antagonists. Results were further filtered with a ligand-based similarity search to one of the reference neutral noncompetitive GCGR antagonists using the ROCS software. 273,274 Selected compounds were then docked and scored in an allosteric hydrophobic pocket between transmembrane helices 3, 4, and 5 of the GCGR model using Gold. 271 Compounds with a high score were preprocessed and filtered based on protein-ligand interaction fingerprints (Tanimoto similarity) with the IFP 275 program, yielding a total of 228 compounds for Set A and 125 for Set B, which were clustered giving a total of 23 compounds. These compounds were then assayed along with the reference allosteric antagonist L-168049 in a functional assay, and two inhibited glucagon activity; one showed the same potency as L-168049. Intriguingly, one of the inactive virtual hits of the GCGR was later identified to be a PAM of the glucagon-like peptide-1 receptor (GLP-1R). 91

| Purinergic receptors
Purinergic receptors are classified in two GPCR classes (P1 and P2Y receptors) and a ligand-gated ion channel class (P2X receptors). P1 receptors, also called adenosine receptors, consist of four types, A1, A2A, A2B, and A3, which bind to adenosine and exert various biological effects in regulating normal cell physiology. 276 The P2Y class consists of eight receptor types, P2Y1, P2Y2, P2Y4, P2Y6, P2Y11, P2Y12, P2Y13, and P2Y14, which bind to nucleotides and are expressed in almost all human cells; these are implicated in cardiovascular, central nervous system (CNS), and other diseases. 277 Recent advances in X-ray crystallography have enabled the structure-based drug design of orthosteric, allosteric, and bitopic ligands in P1 and P2Y receptors, 213 predicting a putative allosteric site in the adenosine A3 receptor (A 3 AR) using supervised MD simulations, 278 and rationalizing SAR data of NAMs with docking studies in the P2Y 1 R. 279 F I G U R E 2 The crystal structure of M2 muscarinic acetylcholine receptor bound to the orthosteric agonist iperoxo and the positive allosteric modulator LY2119620 in an adjacent allosteric site above the orthosteric site (PDB: 4MQT) is visualized on the left. On the right, the 2D representation of LY2119620 (top) and iperoxo (bottom) are shown

| 15-Lipoxygenase
Lipoxygenases are iron-containing enzymes, which catalyze the deoxygenation of polyunsaturated fatty acids such as arachidonic and linoleic acids, and are involved in inflammatory response and cancer. 280,281 15-Lipoxygenase (ALOX15) is one of the six human lipoxygenases implicated in anti-inflammation as activation of ALOX15 may alleviate inflammatory responses. 282 In a study focusing on designing allosteric activators, 283 the human ALOX15 was modeled based on the rabbit 15S-lipoxygenase crystal structure (PDB: 2P0M) 284 using PRIME. [285][286][287] The modeled structure was minimized and equilibrated using Desmond 288,289 with the OPLS-AA/SPC force field 290-292 followed by six independent MD simulations of 10 ns each, which were then combined and clustered yielding 30 cluster representatives. The CAVITY 86,87 program was used for the prediction of binding pockets, and a mutual information approach with the MutInf 111 program was facilitated for the identification of clusters of residues with statistically significant correlations. Six highly populated protein cluster representatives with an open putative allosteric pocket conformation having the best CAVITY score, and which motions were correlated with the motions of the orthosteric site according to MutInf, were selected for virtual screening of the SPECS 293 database initially with DOCK, 294,295 and then with AutoDock. 296,297 After manually evaluating the top 500 compounds for each of the six protein conformers, a total of 175 compounds were selected for assaying resulting in one compound that increased the ALOX15 activity thereby improving the control of inflammation. Subsequently, the top 3000 compounds generated from virtual screening were rescored using Glide [228][229][230][231] in order to discover binding modes similar to this activator. From this procedure, 54 compounds were selected for further in vitro testing resulting in the identification of another two allosteric activators, albeit not as effective and potent as the first one.

| Human immunodeficiency virus type 1 integrase
Human immunodeficiency virus type 1 (HIV-1) is one of the two HIV types that are responsible for the acquired immune deficiency syndrome (AIDS). Millions of people have passed away from this virus, and still, millions are currently infected highlighting the continuous need for drug development against this virus. 298 HIV-1 integrase inserts the HIV genetic material in the DNA of the infected human cell. In a rational drug design study implemented 20 years ago, the cocrystallized structure of the 5CITEP inhibitor bound to HIV-1 integrase (PDB: 1QS4) 299 was simulated with MD simulations for 2 ns. 300 Flexible ligand docking of 5CITEP in protein conformations retrieved from the simulation using AutoDock 296,297 revealed that 5CITEP docked in a cryptic allosteric site adjacent to the orthosteric site that was not visible in the crystal structure. 301 Ten "butterfly" compounds able to bind in both sites were then designed based on the structure of 5CITEP and docked in 10 of the protein conformers extracted from the MD simulations. It is important to note that the discovery of this allosteric site led to the development of the FDA-approved drug raltegravir by Merck & Co. 302

| HIV-1 protease
The Carlson group has thoroughly investigated the HIV-1 protease (HIVp) computationally, an enzyme essential for the life-cycle of HIV, 303 using a dynamic receptor-based pharmacophore model called multiple protein structures (MPS), which they created in an HIVp allosteric site named "eye site". 95,[304][305][306] They applied the MPS method on MD simulations of three unbound semi-open structures of HIVp (1HHP, 307 3HVP, 308 and 3PHV) 309 simulated for 3 ns each to generate a receptor-based pharmacophore model. 95 The MPS resulted in a seven-site pharmacophore model of the eye site, which was then virtually screened with the University of Michigan's Center for Chemical Genomics compound database using MOE. 310,311 Eleven compounds were identified as possible allosteric inhibitors, and a representative was chosen for simulations to test its binding stability. Five independent Langevin dynamics (LD) simulations for 5 ns each and 10 MD simulations for 10 ns each revealed that this reference compound was relatively stable in the allosteric binding site. Follow-up experimental work showed that this compound exhibited HIVp inhibition of IC 50 = 18 ± 3 μM.
A Markush chemical-similarity search was performed on this reference compound using the UNITY 312,313 module of the SYBYL 314 program against four compound libraries matching 7230 compounds. 92 These were scored and ranked based on shape and charge distribution similarity using ROCS 273,274 and EON, 315 and the first 200 compounds of each library were manually examined resulting in 48 compounds that were subjected to experimental testing. These assays revealed that a compound is an allosteric inhibitor, equipotent in the WT and the multidrug-resistant HIV-1p proteins. Further examination with five independent MD simulations and five independent LD simulations of 20 ns ensued, where the compound was docked in the HIVp allosteric eye site using the Glide 228-231 module. MD simulations showed a stable binding of this compound, while LD simulations showed reversible binding and unbinding. An analysis of protein collective motions from the MD trajectories on the complex and apo HIVp structures showed that the compound induces significant localized changes in the protein structures.

| HIV-1 reverse transcriptase
HIV-1 reverse transcriptase (HIV-RT) is the enzyme responsible for converting the HIV RNA genome into DNA, an essential step in retroviral replication. 316 HIV-RT possesses a unique and highly selective hydrophobic allosteric pocket in which many diverse allosteric inhibitors called non-nucleoside reverse transcriptase inhibitors (NNRTIs) have been designed, six of them now being FDA-approved. [317][318][319][320] In a computer-aided NNRTI design journey that nearly lasted 20 years, 321 lead generation using de novo design with the BOMB 322 software, virtual screening with Glide, 228-231 and lead optimization with free energy perturbation calculations combined with MC with the MCPRO 323 program led to the discovery of NNRTIs and their optimization from micromolar to picomolar concentrations, the most potent NNRTIs reported to date. [324][325][326][327][328][329][330] This computational procedure of lead generation and optimization also led to the discovery and crystallization of picomolar activity NNRTIs selective for the Y181C and Y181C/K103N HIV-RT mutant proteins. 331-333

| Flavivirus NS2B-NS3 protease
Flavivirus is a genus composed of more than 70 viruses causing severe and deadly diseases. The Flavivirus NS2B-NS3 protease complex is critical for the Flavivirus replication and is emerging as an essential drug target. 334 An inspection of the crystal structures of the holo (PDB: 3U1I) 335 and apo (PDB: 2FOM) 336 forms of the Dengue virus DENV protease complexes revealed an allosteric pocket opposite to the active site apparent only in the inactive apo state of the NS2B-NS3 protease complex. 337 This allosteric site was virtually screened with the NCI diversity set II compound library using the AutoDock Vina 338 suite; the top 29 compounds were then assayed as potential allosteric DENV protease inhibitors. Three of them inhibited the NS2B-NS3 protease complex activity, and one showed high potency and effective inhibition of the Dengue virus 2, Zika virus, West Nile virus, and Yellow fever virus. Moreover, MSA revealed that the NS3 residues forming this allosteric pocket are highly conserved among flaviviruses, indicating that this site can be a potential target for drug development for other flaviviruses, too. 339

| HCV nonstructural protein 5B
Hepatitis C virus (HCV) is the cause of hepatitis C, which is an infectious disease leading to liver cirrhosis, failure, and cancer, with millions of affected people worldwide. [340][341][342] One of the nonstructural proteins of HCV, the RNA-dependent RNA polymerase (RdRp-NS5B), essential for replicating the HCV viral RNA, contains five allosteric sites. 343 Pharmacophore-based studies followed by virtual screening and hit optimizations on these allosteric sites yielded potent ligands able to inhibit the function of RdRp-NS5B allosterically. 344,345 We refer the readers to a recent review that summarizes the structure-based and ligand-based drug development against RdRp-NS5B along with other nonstructural proteins of HCV. 346

| Hsp90
Heat shock proteins (Hsps) are responsible for protecting cells when they are stressed by high temperatures but they can also act as chaperones assisting the folding of other proteins. [347][348][349] Hsp90 is emerging as an important cancer target. MD simulations of 70 ns were performed in the ATP bound, the ADP bound, and the apo Hsp90 using the crystal structure (PDB: 2CG9), 350 the GROMACS 351 software, and the GROMOS 352 force field. 353 Subsequently, correlation analysis on the MD trajectories, essential dynamics, and GNM were applied as a signal propagation analysis procedure showing the existence of a connection between the C-terminal domain (CTD) and the ATP site. The five most representative conformations of the CTD were extracted by clustering the MD trajectories, which along with the initial crystal structure were used for the identification of possible allosteric pockets using the pocketFinder module of the ICM 261,262 software. One of the predicted allosteric sites was visible in all conformers and was employed to build a six-feature pharmacophore model for virtual screening. 354 The NCI compound database was filtered applying the allosteric pharmacophore model, and 36 putative lead compounds were selected; 14 were assayed generating biased allosteric inhibitors disrupting the communication with various client proteins. The most potent compound was then rationally optimized 355,356 (summarized in detail in a recent review). 357

| Hsp70
Hsp70 plays a critical role in oncogenesis, and similarly to Hsp90 it is a key cancer target. Hsp70 was modeled using as templates the N-terminal crystal structure of the human Hsp70 protein (PDB: 1S3X), 358 the Escherichia coli Hsp70 substrate binding domain structure (PDB ID: 2KHO), 359 and the Caenorhabditis elegans Hsp70 C-terminus structure (PDB: 2P32) 360 with the PRIME module. 361 After alignment and refinement of the modeled human Hsp70, the SiteMap 362-364 module was utilized for the discovery of putative allosteric sites. Manual inspection of the resulting cavities showed that the most druggable site is located in the N-terminal domain, which was chosen for the design of allosteric inhibitors. Several scaffolds were designed and patented taking into advantage a cysteine residue in the allosteric site for covalent binding. The lead compound was then docked computationally to examine interactions in the allosteric site and was further assayed in vitro proving that this compound is an Hsp70 modulator. Subsequently, SAR studies were performed providing novel anticancer allosteric inhibitors with competent potencies that selectively interact with the novel allosteric pocket of Hsp70. 357,365-367

| 3-Phosphoinositide-dependent protein kinase 1
Protein kinases regulate the activity of other proteins usually by phosphorylating certain amino acids or lipids using ATP . Dysregulation of protein kinases is involved in many diseases and cancer. [368][369][370] The 3-phosphoinositidedependent protein kinase 1 (PDK1 or PDPK1) is a member of the AGC kinase subfamily of protein kinases, which possesses a regulatory site separate from the ATP and the substrate binding sites, termed PDK1-interacting fragment (PIF) pocket. 371 In a study targeting PDK1, the active structure of PKA (PDB: 1ATP), 372 which is a member of the same AGC kinase subfamily containing this allosteric site, was used to define a pharmacophore model 98 instead of the available PDK1 crystal structure, which is in a semi-active state (PDB: 1H1W). 373 A virtual screening exercise of the Maybridge compound database ensued, and a total of 220 visually evaluated hit compounds were tested in vitro. Two of the compounds showed increase of PDK1 activity, and revealed that targeting this allosteric site leads to the transition between the inactive and active states of PDK1.
In a more recent study targeting the same allosteric site of PDK1, 374 a structural ensemble of six PDK1 conformations was produced: the first structure was derived from the cocrystallized structure of PDK1 bound to the allosteric activator PS48 (PDB: 3HRF), 375 where the PIF site is found in a closed conformation, the second from the cocrystallized structure of PDK1 bound to the covalent allosteric modulator 1F8 (PDB: 3ORX), 64 where the PIF site is in an open conformation, which was then replaced in the first model, the third by producing a semi-open PIF conformation using the MODELLER 248 program, and the other three structures were produced by optimizing the position of the Arg131 side chain using PLOP [285][286][287] in the first three PIF conformations. Next, a compound library was built from the ZINC 376,377 database using the DUD-E 378 procedure containing 6300 ligands that were structurally dissimilar from the 112 known binders of the PIF site, but having a similar physicochemical profile with them. These ligands were docked in all six conformers in the PIF site using the DOCK 251,252 program, and three compounds were selected for assaying based on docking score rankings, geometry filtering, and visual inspection. Two out of three compounds bound in the PIF pocket and modulated PDK1 activity. From these two compounds, 518 analogs were extracted from the ZINC 376,377 database, and were docked to the six conformers resulting in 15 compounds that scored equally or better than the two parent compounds. One displayed stronger inhibition than the parent compounds in in vitro assays, and was cocrystallized with ATP bound to PDK1 (PDB: 4XX9, Figure 3).

| Epidermal growth factor receptor
Epidermal growth factor receptor (EGFR) is a transmembrane kinase protein of the receptor tyrosine kinase (RTK) superfamily. Its mutations are implicated in lung cancer, and ligand design is pursued to selectively target mutant forms of EGFR but not the WT protein. 379 HTS of 2.5 million compounds on the L858R/T790M EGFR mutant vs. the WT EGFR unveiled a noncompetitive inhibitor that is highly selective for the mutant protein. 52 The X-ray cocrystal structure of the mutant protein with this inhibitor validated binding in an allosteric site adjacent to the active site of the inactive state (PDB: 5D41). Medicinal chemistry optimizations led to the rational discovery of an allosteric inhibitor being 1000-fold more selective for the L858R/T790M EGFR mutant as well as nonselective for 250 other kinases. Lastly, in vivo studies of the optimized allosteric inhibitor in combination with the dimer-disrupting antibody cetuximab resulted in an effective treatment for mutant-driven lung cancers in mice. Recently, the crystal structure of EGFR T790M/C797S/V948R in complex with this allosteric inhibitor was reported (PDB: 5ZWJ) showing a noncompetitive reversible binding with high specificity and potency for the T790M mutant EGFR. 380

| Tropomyosin receptor kinase A
aTropomyosin receptor kinase A (TrkA) is another member of the RTK family associated with inflammatory pain. 381 In an HTS study allosteric inhibitors of TrkA were discovered and patented. 382 The most selective allosteric inhibitor of TrkA over TrkB was crystallized (PDB: 6D1Y) 383 validating its allosteric nature. This hit molecule was optimized to a lead molecule (PDB: 6D1Z), 383 and a lead optimization virtual chemical library of 11,000 compounds was created based on analysis of water molecule positions and energetics in the allosteric binding site using the WaterMap 384-387 module. 383 This compound library was prepared using LigPrep 388 and was virtually docked using Glide [228][229][230][231] in the allosteric site of the crystal structure mentioned above; the top-scoring 100 compounds were examined in vitro. One of the screened compounds showed significant improvement in potency and selectivity over TrkB/C and was further optimized resulting in an orally F I G U R E 3 The crystal structure of PDK1 with the orthosteric (blue) and allosteric PDK1-interacting fragment-pocket (red) binding sites in surface representation is visualized on the left (PDB: 4XX9). On the right, the 2D representation of the allosteric compound RF4 is depicted in red nontoxic highly selective and potent allosteric inhibitor (PDB: 6D20). 383 In the end, three different allosteric inhibitors were crystallized revealing similar binding patterns (Figure 4, PDB IDs: 6D1Y, 6D1Z, 6D20). 383 4.6 | Phosphatases 4.6.1 | Tyrosine-protein phosphatase non-receptor type 11 Phosphatases are enzymes that regulate the activity of proteins by removing phosphate groups through hydrolysis, usually with an opposite effect than kinases. Phosphatases together with kinases modulate protein activity in cells constituting significant components of the cell regulatory network. Protein tyrosine phosphatases (PTPs) regulate the dephosphorylation of tyrosine residues participating in cell proliferation, differentiation, and survival. 389,390 Tyrosineprotein phosphatase non-receptor type 11 (PTPN11 or SHP2) is a member of PTPs, which mutants are implicated in several pathophysiological diseases such as the Noonan syndrome and juvenile myelomonocytic leukemia. 391,392 In a scaffold-based drug design approach combining pharmacophore modeling, ligand docking, and MD simulations, the crystal structure of SHP2 in complex with the allosteric inhibitor SHP099 (PDB: 5EHR) 393,394 was used to produce ten receptor-ligand pharmacophore models with Discovery Studio software 242 . 96 Using selectivity scores and receiver operating characteristic analysis, the best pharmacophore model was chosen and was further validated by re-docking SHP099 in the crystal structure, which resulted in a similar binding conformation. Subsequently, the allosteric inhibitor SHP099 was divided into three scaffolds, each of which was replaced using the ZINC 376,377 database providing a library of nearly 5 million compounds. ADMET (absorption, distribution, metabolism, excretion, and toxicity) filtering reduced this library to 100,000 compounds, which was then subject to a two-way virtual screening. First, pharmacophore screening using Discovery Studio was performed; compounds scoring higher than SHP099 were then virtually docked using the CDOCKER utility of Discovery Studio, yielding eight compounds with better in vitro binding than SHP099. Subsequently, MD simulations were carried out on the apo structure of SHP2, the SHP2 in complex with SHP099, and the SHP2 in complex with the top-scoring compound for 20 ns each using GROMACS. 351 The top-scoring compound showed the same inhibitory activity as SHP099 albeit with a more favorable binding conformation.
In another study targeting the T253M/Q257L SHP2 double mutant, two additional putative allosteric sites different than the SHP099 allosteric binding site were predicted using SiteMap. [362][363][364]395 After screening a library of approximately 1.5 million compounds experimentally, compounds that were inactive in the full conformation of SHP2, active in the phosphatase domain of SHP2 (active site), and inactive in the T253M/Q257L double mutant SHP2 were discarded. SHP099 is inactive in the T253M/Q257L double mutant of SHP2 and as a result compounds targeting the F I G U R E 4 Superimposed crystal structures of Tropomyosin receptor kinase A bound to three different allosteric inhibitors (PDBs: 6D1Y, 6D1Z, and 6D20). N-terminal residues 481-483 were truncated for clarity SHP099 allosteric binding site are also predicted to be inactive. Subsequently, one weak inhibitor with equipotent activity in the WT and the mutant SHP2 was cocrystallized with SHP2 (PDB: 6BMR 395 ), revealing that this inhibitor binds in a new allosteric site. Structure-based design optimizations taking into account this new allosteric binding site led to the identification of another two inhibitors, which were also crystallized bound to SHP2 (PDBs: 6BMX, 6BMV). 395 Visualization of these three crystal structures revealed that the SHP099 allosteric binding site remained unperturbed offering the possibility of dual inhibition. Dual inhibition was later confirmed with biochemical and crystallographic studies for SHP099 and the three new allosteric inhibitors resulted in biochemical enhancement of IC 50 along with dual binding in both allosteric sites via crystal structures (PDBs: 6BMU, 6BMY, 6BMW). Further SAR optimization studies taking into account the crystal structures of SHP2 cocrystalized with the allosteric inhibitor SHP099 (PDB: 5EHR) and with two HTS allosteric hits (PDBs: 6MD9, 6MDA) produced a pharmacophore model accounting for the protein-ligand interactions of the SHP099 allosteric pocket. 396,397 Many new orally bioavailable allosteric inhibitors demonstrating high potency and selectivity were discovered with some of them being crystalized with SHP2 (PDBs: 6MDB, 6MDC, 6MDD, 6MD7), confirming that they bind to the same allosteric site that stabilizes an auto-inhibited conformation.

| γ-Aminobutyric acid type A receptor
Ligand-gated ion channels are transmembrane proteins, which allow ions to pass the cell membrane after the binding of a neurotransmitter. The γ-aminobutyric acid type A receptor (GABA A R) is a ligand-gated ion channel implicated in the CNS and is an essential target for antiseizure drug development. 398,399 In a study performed before GABA A R crystallization, 37 homology models of GABA A R were derived from cysteine-loop receptor family members, and the allosteric modulator diazepam along with other benzodiazepine ligands were docked using the FlexX 400 and FlexE 401 software packages, in order to discard those with poor lipophilic interactions producing six final models. 402 Then, energy minimization was performed generating thousands of poses, which were clustered to identify similar binding modes, leading to three clusters, in which homogenous subsets of poses were extracted (common binding modes). These were evaluated based on ligand interactions and experimental evidence, and validated by virtual screening of benzodiazepinebinding-site ligands and decoy molecules generated from the DUD 378 database, using pharmacophore modeling with the LigandScout 403 software. The pharmacophore model of the most promising binding mode was used for the discovery of new ligands by screening almost 100,000 compounds from the DUD 378 database. In the top 0.5% ranked compounds, ligands showing anticonvulsive activity were tracked in the literature, and commercially available representatives were assayed leading to the discovery of two hits inhibiting the benzodiazepine-binding site of GABA A Rs. Additionally, docking-based virtual screening was performed on the GABA A R models using the same binding mode employed to generate the pharmacophore model. An in-house library of 1010 chemically diverse fragmentlike molecules was virtually screened using the PLANTS 404 docking method with rigid and flexible docking. From the top 30 ranked compounds, 10 were selected for experimental validation, with two of them inhibiting the benzodiazepine-binding site.
In a similar structure-based allosteric ligand discovery study, targeting pLGICs, and specifically, Gloeobacter violaceus ligand-gated ion channel (GLIC), a close homolog of GABA A R, led to the discovery of allosteric modulators. 405 The crystal structure of GLIC bound to the general anesthetic propofol (PDB: 3P50), 406 which acts as a PAM in GABA A R, was embedded in a dioleoylphosphatidylcholine lipid membrane and was equilibrated for 25 ns using MD simulations with the GROMACS 351 software. Propofol along with 100 decoy molecules selected according to the DUD-E 378 procedure were docked in 302 structures retrieved from the simulation using the DOCK 251,252 program. The structure with the best ranking for propofol was virtually screened using the DOCK program and more than 150,000 molecules from the ZINC 376,377 database. Inspection and filtering of the 700 top-ranked compounds led to 22 compounds selected for assaying. More than half of them were identified as GLIC hits with three of them producing greater modulation than propofol. In the end, these hits were tested on GABA A R, with more than half modulating GABA A R, although none of them having better potency than propofol.
In a ligand-based allosteric drug design study, a pharmacophore model was derived based on two different efficacious allosteric modulators, which putatively bind to the same GABA A R allosteric site, using the LigandScout 403 software. 94 A conformational ensemble was generated for each ligand, and the best matching pose between them was calculated. A shared pharmacophore model was derived from this conformational pair, and it was used as a virtual screening filter of several compound libraries. Various modifications in the selected top hit compound led to the discovery of 39 structural analogs, which were tested in vitro. Two of them were tested in vivo displaying low-dose anticonvulsant activity and GABA A R selectivity.

| Glycine receptors
Glycine receptors (GlyRs) are pentameric glycine-gated chloride ion channels with a critical role in CNS that mediate inhibitory neurotransmission in the spinal cord and brain stem and retina. 407 In an ensemble-based virtual screening study, MD simulations of the transmembrane domain (TMD) of the NMR structure of the human GlyR-a1 (PDB: 2M6I) 408 embedded in pure 1-palmitoyl-2-oleoylphosphatidylcholine (POPC) and POPC/cholesterol lipid bilayer systems were performed using the GROMACS 351 software and the Amber03 409 force field. 410 For each system, three parallel 50 ns replicate simulations were performed followed by RMSD clustering, yielding 180 protein conformational representatives. Then, the DrugBank 411 database of 1549 FDA-approved compounds was virtually screened in the THC binding site of all GlyR-a1 TMD representatives twice, with and without lipid molecules using the AutoDock Vina 338 program. Virtual screening resulted in 14 compounds in the top 25 of both virtual screens, and 11 of them along with another two compounds with a better score than THC were further experimentally tested. Twelve compounds were shown to potentiate GlyR-α1, providing an overall hit rate of 92% with seven compounds being more potent than THC and also being selective GlyR-a1 modulators with respect to other members of the Cys-loop receptor superfamily. Eventually, this led to identifying FDA-approved drugs with analgesic effect.
In a hit-to-lead optimization study performed by Amgen Inc., an HTS hit compound with modest potency was improved using SAR and medicinal chemistry optimizations. This study led to a highly potent potentiator (AM-3607), 412 which was cocrystallized with human GlyR-α3, bound to a novel allosteric site, adjacent to the orthosteric site where glycine binds. 413 Aiming in the discovery of new classes of GlyR potentiators, a virtual library of hundreds of commercially available sulfonamides was designed and evaluated based on the structure and property similarity with the HTS hit compound; this effort led to the property-guided parallel synthetic discovery of two compounds with improved properties. 414 Hit-to-lead and SAR optimizations in one of them resulted in a lead compound with retained potency for GlyRs and improved lipophilic efficiency and selectivity against other Cys-loop receptors. Moreover, a pharmacophore model was built based on the presumed bioactive conformation of the HTS hit compound and an internal Amgen library of more than half a million compounds was virtually screened with MOE, 310 resulting in 4267 hit compounds. These compounds were screened with functional assays and 17 hit compounds were discovered, among which a novel and structurally different class of sulfone modulators.
Recently, a web database named GRALL was designed containing all allosteric ligands that modulate the GlyR-α1 and GlyR-α3 function, providing information on their putative binding sites, the direction of modulation, the potency, and others. 198 This database is expected to be a significant asset in the structure-based allosteric drug discovery of GlyRs.

| Acid-sensing ion channels
Acid-sensing ion channels are also expressed in CNS and are implicated in various neurological diseases such as multiple sclerosis, Parkinson's disease, and many others. 415 In a recent study focusing on identifying novel acid-sensing ion channel 3 (ASIC3) modulators, a ligand-based in silico screening of the FDA-approved drug library eDrug3D using the PAM 2-guanidine-4-methylquinazoline (GMQ) as a query structure was performed. 416 Using ROCS 273,274 in combination with manual inspection of the top 150 ranked hits, identified 5 drugs as potential hits, which were validated experimentally; one drug named GBZ was confirmed as an ASIC3 activator. To assess whether GBZ acts allosterically, three homology models of ASIC3 based on the closed, open, and desensitized states of ASIC1 were used for five independent blind docking runs with GMQ and GBZ in the entire trimer structures using AutoDock. 296,297 Both drugs preferentially docked to a pocket located in the palm domain. Furthermore, a GBZ derivative, called sephin1, which was in clinical trials and was not included in the eDrug3D database, exhibited higher similarity with GMQ than the aforementioned drugs, bound to the same non-proton ligand-sensing domain site, and it was shown experimentally to activate the ASIC3.

| Nicotinic acetylcholine receptors
Nicotinic acetylcholine receptors are members of the Cys-loop family of ligand-gated ion channels and are implicated in neurological and neuromuscular disorders. 418 In a similar study of virtual screening already FDA-approved drugs on the α7 nicotinic acetylcholine receptor (α7 nAChR), 25 known α7 nAChR PAMs were docked in a transmembrane site of revised structural models of the α7 nAChR TMD 419 in the open and closed conformations. 420 These PAMs were then clustered based on RMSD, and six pharmacophore models were created for the largest clusters in the open and the closed conformations using ROCS. 273,274 A set of 42 known α7 nAChR PAMs along with decoy molecules generated using DecoyFinder 421 and the ZINC 376,377 database were virtually screened to validate the pharmacophore models. The best pharmacophore model for the open and the closed conformations was chosen for virtual screening of the FDAapproved drug DrugBank 411 database. Filtering of the top 100 hits from each model resulted in 81 compounds, and four of the highest-ranked members of the four classes of compounds that were represented in the 25 top-ranked hits were experimentally tested, resulting in an inhibitor of Na-K-Cl cotransporter protein, called furosemide, acting as a PAM, and a carbonic anhydrase II inhibitor, a cyclin-dependent kinase 2 inhibitor, and a DNA gyrase antibiotic, acting as NAMs.

| Cysteine-aspartic proteases 3 and 7
Cysteine-aspartic proteases (caspases) are a family of enzymes implicated mainly in cell apoptosis. Caspase deficiency is connected with tumor development, while its upregulation increases cell death. 422,423 Caspase-3 is a significant target as it is a primary effector in the execution-phase of cell apoptosis. 424 In a site-directed fragment-based ligand discovery study searching for chemical fragments that bound near existing surface cysteines, an allosteric site of caspase-3 was discovered, and then two allosteric inhibitors named DICA and FICA were identified and crystallized with caspase-7 (PDB: 1SHJ and 1SHL respectively), 425 as caspase-3 and -7 display high sequence identity and structural similarity. 426,427 In a recent computational caspase-3/7 allosteric drug design study, the caspase-7/DICA complex crystal structure was employed to virtually screen a library of FDA-approved drugs from the ZINC 376,377 database using DOCK 251,252 after the breaking of the disulfide bond of Cys290/DICA for the identification of reversible binding allosteric bioactive ligands. 428 The exact same procedure was implemented for the orthosteric site to distinguish compound features depending on their preferred binding site. Comparing the top 25 allosteric hits with the top 25 orthosteric hits, differences were observed in their average net charge and polar/apolar desolvation. Taking into account these results, only 5 compounds that scored in the top 100 in the allosteric site and did not score in the top 100 in the orthosteric site were further tested experimentally. Four compounds showed significant inhibitory activity in caspase-3, concluding that these already FDA-approved drugs can inhibit the caspase-3 and -7 allosterically.
4.9 | Oxidoreductases 4.9.1 | D-3-phosphoglycerate dehydrogenase D-3-phosphoglycerate dehydrogenase (PHGDH) is a tetramer enzyme that belongs to the family of oxidoreductases, which catalyzes chemical reactions utilizing NAD+ as a cofactor. 429 It is the first enzyme in the pathway of serine biosynthesis and can also be allosterically inhibited by serine, thus regulating the serine production. In a computational study using PHGDH from E. coli, the coarse-grained two-state G o model 430 was used to identify novel allosteric sites by generating an ensemble of active and inactive states. 431 Using the unbound (PDB: 1YBA) 432 and the bound (PDB: 1PSD) 433 crystal structures, LD simulations were performed using the Cafemol 434 software, parameterized so that the unbound state was dominant in the ensemble. Moreover, the CAVITY 86,87 program was implemented to predict new pockets, and subsequently, perturbations were induced in these pockets in another coarse-grained simulation. A site was considered as allosteric in the event of a population distribution change. Two novel allosteric sites were predicted in the PHGDH enzyme, and virtual screening was performed in one of them using DOCK 294,295 with the SPECS 293 compound library and then using the AutoDock Vina 338 program to screen the top 10,000 compounds. The top-ranked 1000 compounds were manually inspected and 178 were tested in vitro. Three novel allosteric inhibitors were discovered, revealing a low μM inhibitor and validating the predicted site as allosteric.
The same virtual screening procedure was applied on the other predicted allosteric site, where 170 compounds were tested in vitro resulting in significant inhibition of the PGDH activity from three compounds. 435 Intriguingly, two of those exhibited a rise in the activity of PGDH in low concentrations and inhibition in high concentrations. SAR studies were conducted on these two compounds using the PHASE 436 engine and compounds with high similarity to the parent compounds were virtual docked using the Glide 228-231 module. Twenty five compound analogs from each of the two parent compounds were selected for in vitro experiments resulting in seven allosteric inhibitors in total, with one of them having a better inhibitory activity than the parent compounds.

| Glutathione peroxidase 4
Glutathione peroxidase 4 (GPX4) is one of eight members of the glutathione peroxidase family, which is a subfamily of oxidoreductases that protect cells against oxidative damage. Downregulation of GPX4 is associated with inflammation and cell ferroptosis. 437,438 In a recent study for rationally designing allosteric activators, the pocket detection CAV-ITY 86,87 program was utilized on the human U46C mutant GPX4 crystal structure (PDB: 2OBI), 439 and the predicted sites were analyzed with the motion correlation CorrSite 109 program, identifying one of the predicted sites as allosteric. 440 This site was also further validated as allosteric by analyzing its dynamic changes and the correlated motions with the substrate site using MD simulations. Subsequently, the SPECS 293 library was screened using the Glide [228][229][230][231] module with the SP scoring function, and then the top 10,000 compounds were virtually screened with the more accurate XP scoring function. The top 2000 compounds were inspected manually, and 251 of them were tested experimentally resulting that the top compound selectively increases the maximum GPX4 activity by 263%. SAR studies of the top compound identified eight new GPX4 activators with some being more potent in suppressing ferroptosis and inflammation than known activators. Sirtuin 6 (SIRT6) is a protein with multiple functions including modulation of cellular senescence and apoptosis, DNA repair, tumor necrosis factor protein production regulation, and others. [441][442][443] The Zhang group used their allosteric pocket detection tool, Allosite, 145,149 to identify putative allosteric sites on the SIRT6 protein (PDB: 3ZG6). 444,445 Allosite predicted a previously-discovered deep hydrophobic allosteric site, 446 and subsequently seven compound libraries consisting a total of more than five million compounds were virtually docked using Glide. [228][229][230][231] The top 20 ranked compounds were experimentally tested, with two demonstrating μΜ activation. The common scaffold of these two compounds was used as a starting point for medicinal chemistry optimizations, leading to two compounds with an order of magnitude lower EC 50 , and strong selectivity in the sirtuin protein family. Binding of these compounds in the allosteric site was verified experimentally with mutagenesis and crystallography (PDB: 5Y2F), delivering promising lead compounds for non-small cell lung cancer. 447

| High-temperature requirement A protein
Helicobacter pylori is a gram-negative bacterium found in the stomach linked with gastritis, ulceration and stomach cancer. [448][449][450] The high-temperature requirement A protein (HtrA) is responsible for the migration of this bacterium across the epithelial barrier and is a therapeutic target for the abovementioned diseases. 451 In a receptor-based virtual screening approach for the identification of allosteric inhibitors of HtrA, 93 a homology model from various HtrA protein sequences followed by sequence alignment using the BLAST 452 tool showed that the 3D structure with the highest sequence identity was the 3MH6 453 PDB. Then, the 3MH6 sequence was aligned with the other sequence homologs using the ClustalW 454 program, and the HtrA comparative model was built using MODELLER 248 . A pocket distal to the orthosteric site was discovered using the PocketPicker 455 tool. Subsequently, the PoLiMorph 456-458 receptor-based screening module, which evaluates complementarity of pockets and ligands, was used to calculate graph descriptions of the distal site and ligands collected from the SPECS 293 Natural Products and Screening Collection libraries. From a chosen hit, two analogs were derived from a ligand-based similarity search protocol using PoLiMorph and one of them displayed in vitro allosteric inhibitory action on HtrA.
4.12 | Beta-lactamases 4.12.1 | TEM β-lactamase TEM β-lactamases are enzymes produced by gram-negative bacteria responsible for drug resistance of β-lactam antibiotics. 459 In the study of Bowman et al., 1000 MD simulations were performed in the apo TEM β-lactamase structure (PDB: 1JWP) 460 as well as 400 MD simulations including a known allosteric inhibitor (PDB: 1PZO) 461 using GROMACS 351 and the Amber03 force field 409 for a total aggregated simulation time of 100 μs in order to explore the conformational space. 174 Then, MSMs were employed utilizing MSMBuilder 462 to capture the protein states in the energy landscape minima. Putative binding pockets were identified using LIGSITE 463 in representative structures from each MSM. To distinguish which pockets were cryptic, pockets identified in the apo structure were discarded, and the rest were clustered, successfully identifying a known cryptic allosteric site along with another two novel cryptic pockets, which were verified experimentally. 175 Subsequently, virtual screening was applied to the known cryptic allosteric site against a library of 12,695 compounds with the Surflex-dock program 464 using PDB ID: 1PZO. The top-40 scoring compounds were tested experimentally resulting in a hit compound acting as PAM. 465 The same compound library was then docked in 15 structures of the most populated protein states of the MSM containing this cryptic site using a Boltzmannweighted docking approach that takes into account the population information from the MSM. 466  Signal transducer and activator of transcription 3 (STAT3) is a transcription factor implicated in cancer and inflammatory and cardiovascular diseases. 467,468 Allosteric drug discovery on STAT3 has been performed using Allofinder, a platform consisting of a combination of software tools for drug discovery. 185 Allosite was first employed to predict five putative allosteric sites in STAT3 (PDB: 3CWG). 469 One of the sites is located in the CCD domain implicated in STAT3 regulation, far from the STAT3-DNA interface. Virtual screening was performed in this site with AutoDock Vina 338 and the SPECS 293 compound library. Compounds were ranked by a binding scoring method for affinity prediction of allosteric ligand-protein interactions, Alloscore. 187 The top 15 hits were assayed, with one compound inhibiting STAT3. Site-directed mutagenesis confirmed the binding of this compound in the predicted allosteric pocket.
4.14 | Peptidase C1 proteins 4.14.1 | Cathepsin K Cathepsin K is a protease responsible for bone resorption, and is a prime therapeutic target for osteoporosis. 470,471 The SCA method was performed in the MSA of an ensemble of 1239 catalytic domains from the papain-like cysteine peptidases family followed by hierarchical clustering, revealing a protein sector of coevolving residues with a continuous residue network. 472 Alanine-scanning of 15 single-substitution mutants verified this residue network as allosteric. Using AutoLigand 473 on Cathepsin K (PDB: 1ATK) 474 predicted seven putative allosteric pockets in contact with sector residues. Compound libraries from the ZINC 376,377 database were virtual screened in all seven predicted binding sites using DOCK, 294,295 and the top-scoring 10% compounds were subsequently virtually screened with AutoDock. 296,297 More than 200 compounds were assayed and, 15 showed potency and one was crystallized identifying one of the predicted sites as a novel allosteric site (PDB: 5J94). Further experiments showed that this compound is a potent and selective Cathepsin K allosteric inhibitor.

| K-Ras
Ras proteins are key regulators of cell growth signaling pathways that are switched on/off in a GTPase cycle between the active GTP-bound and the inactive GDP-bound states. Dysregulation of Ras proteins occurs in 25% of human cancers. 475,476 The K-Ras protein, which is a member of the Ras family and one of the most common human oncogenes, was computationally studied by performing Mg 2+ /GTP bound and Mg 2+ /GDP bound K-Ras MD simulations using the AMBER 477,478 software and force field. 479 The K-Ras bound states were modeled using the human K-Ras crystal structure in complex with a GTP analog (PDB: 3GFT). Subsequently, a protein conformational ensemble was produced from Ras crystal structures and MD snapshots to account for K-Ras flexibility using PCA and RMSD clustering with the Bio3D 480 R package. The FTMap 83,85 program and the AutoLigand 473 module were used in this conformational ensemble to identify novel binding pockets. In addition, blind docking was conducted with known Ras inhibitors derived from HTS using AutoDock 296,297 and a grid covering the whole crystal structure. Five nonsubstrate pockets were identified, named p1, p2, p3, p3b, and p4, and a virtual screening exercise was performed in all of them and all conformers using the NCI and ZINC 376,377 compound libraries and Glide [228][229][230][231] . The top compounds were rescored with eight other scoring functions and were then filtered with Lipinski's rule of 5. 481 19 top-scoring compounds targeting the p3 site, which was the only one present in all three pocket discovery methods applied in this study, were assayed; hits displaying moderate affinity and selectivity serving as starting points for lead optimization were discovered.
The same research group also performed a large scale MD simulation (1.76 μs in total) of GTP-bound Q61H mutant K-Ras (PDB: 3GFT) using NAMD 482 and the CHARMM 219,220 force field. 483 A conformational ensemble of 75 protein structures were extracted based on RMSD clustering, and andrographolide and derivatives were blind-docked identifying three putative binding sites. 484 The binding stability of a chosen ligand was tested in two of these pockets with five independent MD simulations for each pocket resulting in a preferred binding pocket. In vitro studies of these compounds established K-Ras inhibition for two compounds. Then, authors focused on G12D K-Ras and performed 300 ns MD simulations (PDB: 4DSO) 485 to generate an ensemble of conformations with an open allosteric site p1. 486 Six million compounds were virtually screened in the conformation with the most open p1 site, using AutoDock 296,297 and the ZINC 376,377 database, and top-scoring compounds were re-scored with AutoDock Vina. 338 The top 500 hits from each screen were filtered based on their interactions with the protein, and 11 compounds were assayed. Two were found to inhibit K-Ras by disrupting the K-Ras-Raf interaction.
Concurrently, MD simulations were performed in the WT K-Ras, and in the four major mutants G12D (PDB: 4DSO), 485 G12V, G13D, and Q61H (PDB: 3GFT) K-Ras for 200-400 ns each. 487 The G12V and G13D K-Ras mutants were created from PDB ID 4DSO. 8-12 representative conformations were selected for each of the p1, p2, p3, p4 allosteric pockets and each of the WT and mutant proteins, based on RMSD clustering and pocket analysis with MDPocket, 488 resulting in 162 final structures. 23 million compounds from the ZINC 376,377 database were filtered based on drug-like properties and many other features, resulting in 350,000-700,000 ligands per mutant per pocket, which were virtually screened (76 million docking runs) using Glide. [228][229][230][231] For each pocket, docking results were combined and the topscoring 10% compounds were clustered and post processed with the DataWarrior 489 program resulting in 785 hits, with 87% populating the p1 and p2 allosteric sites. Although clustering of the 785 hits displayed high diversity, scaffold diversity analysis revealed that more than half of them contain similar scaffolds, including pyrazole, indole, and thiazole. Ninety hits from the p1 and p2 allosteric sites were assayed with NMR, resulting in nine compounds (10% success rate) causing NMR amide chemical shift perturbations, and one was shown to bind to p1 with low μM affinity.
In another study targeting the G12D K-Ras mutant, virtual screening of 1.36 million compounds from the Chemdiv 490 library resulted in a hit compound, 0375-0604, which was selected for further study. 491 The G12D K-Ras mutant (PDB: 4DSU) 485 was used to create the G12C and Q61H K-Ras mutants, and 0375-0604 was docked in a hydrophobic binding pocket adjacent to the switch I and switch II regions in all mutant structures using Glide. [228][229][230][231] In vitro experiments showed binding affinity to K-Ras, selectivity for the mutant forms of K-Ras over the WT, and the potential to block the binding of GTP.
Moreover, the G12C mutation of K-Ras provides with the opportunity of covalent allosteric ligand design. The advantages of this irreversible binding, as well as, the progress in designing covalent allosteric drugs rationally for K-Ras are described below.

| Covalent allosteric modulators
Small molecule ligands can be grouped into two major categories based on their interactions with the biological target. Noncovalent ligands interact with the binding site in a reversible manner, while covalent ligands bind irreversibly with protein amino acids containing nucleophiles such as hydroxyl or sulfhydryl groups found in serine, cysteine, threonine, or tyrosine. This irreversible binding is transcribed in lower administered dosage as covalent ligands do not detach from their targets. Subsequently, this leads to higher potency and high energy barriers between active and inactive conformations, 492 which together with the aforementioned advantages of allosteric modulators such as the high specificity and the noncompetitive inhibition imply that covalent allosteric ligands are worthy of exploration.
Examples of rational covalent allosteric modulator design have been reviewed in Ref. 48. An intriguing case is the oncogenic G12C mutation of GTPase K-Ras4B, for which SAR and crystallographic studies identified covalent allosteric inhibitors forming a disulfide bond with the mutant residue G12C (PDBs: 4LYJ, 4M22, 4M1Y, 4M1W, 4M1T, 4M1S, 4M1O, 4LYH, 4LYF, 4LV6, 4LUC); these inhibitors displayed significant in vitro potency. 493 Further iterative structurebased design in one of these inhibitors led to the discovery of a very potent and efficacious covalent allosteric inhibitor, named ARS-853, which stabilizes the inactive state of GDP-bound G12C mutant K-Ras4B (PDB: 5F2E). 494,495 Further scaffold and SAR optimizations led to the development of ARS-1620, which was also cocrystallized with K-Ras G12C (PDB: 5V9U). 496 ARS-1620 exhibited in vivo potency in K-Ras tumors. 496 By exploiting the H95/Y96/Q99 cryptic pocket, 497 ARS-1620 was further optimized with iterative substitutions, SAR optimizations, and crystallography, leading to AMG 510 (PDB 6OIM, see Figure 5), 498 which significantly inhibited tumor growth in vivo, and is the first G12C K-Ras inhibitor to enter clinical trials by Amgen Inc. 498,499 AMG 510 (sotorasib) was granted accelerated approval (Lumakras™, Amgen, Inc.) by the FDA on May 28 2021, for adult patients with KRAS G12C mutated locally advanced or metastatic non-small cell lung cancer, who have received at least one prior systemic therapy. This application was granted priority review, fast-track, breakthrough therapy and orphan drug designation. Information about the clinical trials of sotorasib can be accessed at https://clinicaltrials.gov under the identifier NCT03600883.
Another covalent allosteric inhibitor case is the rational design and synthesis of novel allosteric inhibitors that covalently modify the protein kinase Akt. 500 Co-crystal structures of these inhibitors with Akt can be found in PDB: 6HHF 501 as well as another four structure-based designed and synthesized covalent allosteric inhibitors (PDBs: 6HHJ, 6HHG, 6HHI, 6HHH), identified as potent candidates for in vivo studies and further lead optimization. 502 In an allosteric communication computational study based on evolutionary data performed by Tang et al. 503 for the Myosin-2 motor protein, a pathway connecting the ATP site with the lever arm of the protein was discovered using MSA and conservation weights. To predict new putative allosteric binding sites near this pathway, we used the fragment-based computational mapping server FTMap, which predicts binding hot spots of macromolecules, 83,504 in the F I G U R E 5 On the left panel the crystal structure of K-Ras G12C mutant cocrystallized with allosteric covalent inhibitor ARS-1620 and GDP (PDB ID: 5V9U) is depicted. On the right panel, the crystal structure of K-Ras G12C mutant cocrystallized with allosteric covalent inhibitor AMG 510 (sotorasib) and GDP (PDB ID: 6OIM) is seen. Binding of AMG 510 opens up the cryptic pocket enclosed by H95/Y96/ Q99, which are depicted in red crystallographic structure PDB: 2XO8 505 ( Figure 6). Indeed, we identify a pocket near the allosteric pathway, and specifically next to Met486; this finding could open up the possibility of allosterically, and possibly covalently, targeting Myosin-2 by modulating the ATP site through this allosteric pathway. 506 Several other covalent allosteric inhibitors have been co-crystallized with their protein targets have been crystallized such as the ubiquitin-like modifier SUMO E1 in complex with a covalent allosteric inhibitor targeting a cryptic pocket with high specificity (PDB: 6CWY) 507 and the natural product novolactone bound covalently and allosterically to the Hsp70 (PDB: 4WV7) 508 paving the way for further structure-based allosteric design.

| DISCUSSION AND CONCLUSIONS
Rational design of allosteric modulators is gaining increasing attention in the last decade. Traditionally, targeting orthosteric sites has been the primary approach for drugging proteins; although orthosteric drug discovery always remains an attractive practice, allosteric inhibition now offers a new paradigm to modulate receptor function. Allosteric or noncompetitive inhibition allows for modulation rather than complete inhibition, and unique allosteric sites among protein family members for higher selectivity, lower toxicity, and fewer side effects. The advantages of allosteric drug design outmatch the hurdles, and a continuous stream of studies has emerged for the development of allosteric modulators in the last decade. Although to date most allosteric bioactive ligands and drugs have been discovered through HTS, technological improvements and advances in understanding allosteric mechanisms have led to new structural allosteric data and rational procedures for targeting biological targets allosterically and to the emergence of new tools specifically for allosteric ligand design.
A general process for allosteric ligand design cannot be standardized as demonstrated in the different approaches in case studies presented herein. Depending on the available data of the biological target, e.g. availability of 3D structure, known allosteric site, known allosteric modulators, different strategies can be adopted. When the protein 3D structure is not known, homology modeling can be utilized to create a model (or several models) of the protein. 90,91,93,205,361 Usually, homology models are then validated for their proximity to the real structure by successfully docking known active ligands with better predictive score than inactive ligands and decoys, 90,91 or other techniques such as constructing the average structure from MD simulations 205 and evaluating it based on experimental data. 91 When the 3D protein structure is unknown but allosteric modulators are known, that is, from HTS, or when the allosteric site is known, ligandbased and receptor-based pharmacophore modeling is another option. 91-96, 98, 344, 345, 354 The basic difference between allosteric and orthosteric ligand design is that in most cases the allosteric site is undetermined. When the allosteric site F I G U R E 6 The crystal structure of Myosin-2 (PDB: 2XO8) and the calculated allosteric pathway shown in CPK representation. With green we show the fragments produced by FTMap. In the inset, a close-up of the predicted binding site next to Met486 (the sulfur atom of Met486 that could facilitate allosteric binding is shown in yellow) location is known from literature or crystallography, as in several studies discussed herein, 95,96,98,222,344,345,374,383,428 one can proceed with a standard rational drug design procedure including simulations, virtual screening, ensemble docking, and lead optimization. When the protein target contains multiple pockets on its surface, most commonly it is not known whether small molecule ligands in these sites could modulate the activity of the distal active site or whether they reorder the protein conformational landscape. Various methodologies and tools specific to allosteric site prediction have been designed and applied based on residue coevolution, 472 correlated residue motions, 283,353,440 perturbations, 431 and machine learning, 185,445 with many of them being combined with MD, MC, or NMA, as discussed in Section 3 and in other reviews. 30,32,33,60,105,110,129,135,[180][181][182][183] Although these tools can correctly predict allosteric sites, they have been utilized only in a few allosteric modulator design cases and mostly by their authors. 185,283,440,445,472 Other methods for detecting allosteric sites are blind docking, 416,479,484 where the whole protein surface is docked by known allosteric modulators, and spontaneous binding, 67 where the ligand locates the allosteric binding site by exploring the receptor surface in MD simulations, although these approaches are computationally expensive.
Considering that allostery is a property of the protein conformational landscape, the protein conformational space of the apo and the holo structures can be explored using simulations in order to generate an ensemble of conformations. 174,222,353,431,479 By examining this conformational ensemble it is possible to discover cryptic sites not visible in the original crystal structure. Targeting such cryptic sites may result in allosterically modifying the protein function. 174,301,465,507 Although a plethora of methodologies and software programs have been designed for the discovery F I G U R E 7 Possible mechanisms describing the binding of a ligand or a probe in a cryptic site. (a) In the conformational selection mechanism the protein must adopt a conformation in which the cryptic pocket is open in order for the ligand to bind. (b) In the induced-fit mechanism the cryptic binding pocket is closed and conformational changes are only observed when the ligand encounters the pocket. (c) A mixed approach combines both mechanisms; first the cryptic pocket adopts a semi-open form with the conformational selection mechanism, and then, the ligand selects these conformations and binds with the induced-fit mechanism, fully opening the cryptic binding site 66,513 and druggability assessment of cryptic pockets (Table 1), the investigation of whether these cryptic pockets act allosterically is not explored, leaving room for improvement in the cryptic allosteric binding site prediction and drug design. Furthermore, it was recently demonstrated that cryptic sites lie in flexible regions and can be open in apo structures. Using a conformational selection mechanism (Figure 7a), the protein adopts a conformation in which the cryptic pocket is favorable for ligand binding. With the induced-fit mechanism (Figure 7b), which was until recently considered as the most probable mechanism of cryptic site detection, the cryptic binding pocket opens only when the ligand encounters the pocket. 513 Previously, it was shown that an interplay between these mechanisms may take place for the formation of cryptic sites (Figure 7c). 66 It is possible that in the apo structures cryptic sites may acquire a semi-open form or in rare events an open form but further research is needed.
Moreover, in a few cases presented herein, an allosteric pocket was discovered adjacent to the orthosteric pocket. 233,259,301 The proximity of such sites provides the opportunity and the possibility to create bitopic ligands, consisting of a moiety interacting with the orthosteric site and an allosteric moiety interacting with the extended allosteric part of the pocket. 232 A few covalent allosteric drugs have also been discovered. In the putative allosteric site, detection of a druggable cysteine, threonine, serine, or tyrosine offers the opportunity of covalent allosteric modulator design, 119,194,500 these may offer additional advantages such as smaller drug dosages and more efficacy.
After the identification of putative allosteric sites, the drug design process of allosteric bioactive ligands is fairly traditional. Although allosteric modulators have different biological response than orthosteric ligands, 62,187,514 standard procedures of structure-based or ligand-based virtual screening using docking or ensemble docking for the most representative protein conformers or pharmacophore modeling of large databases counting millions of compounds, followed F I G U R E 8 Proposed workflow for computational allosteric ligand design by filtering and clustering, are usually followed as evident from the case studies discussed herein. The general chemical characteristics that allosteric modulators have, for example low molecular weight, being hydrophobic, only a few rotatable bonds, can be taken into account for manual inspection of the top scoring compounds that will be selected for assaying. A simplified workflow for rational allosteric ligand design is presented in Figure 8, and a variety of available tools predicting allosteric sites and pathways is presented in Table 1. One major point that is often overlooked in allosteric ligand design is the communication of the discovered pocket with the active/orthosteric site or other functionally important regions of the protein. It is not enough to identify and verify the location of a new protein binding pocket; the association of these putative allosteric binding sites with the orthosteric site should then be examined. 51 The conformation and sequenced-based tools, outlined in Section 3, can identify binding sites as well as predict their association with the orthosteric site. Although conformation and sequenced-based approaches are seemingly not connected, they could actually be two sides of the same coin. The ensemble view of allostery supports both the population shift through change in conformational dynamics but also matches evolution. This relationship was recently discussed in Refs. 515, 516. Rational allosteric drug design has clearly advanced in the last decade and at the same time has generated significant interest and areas for potential improvements. The case studies presented herein demonstrate successful rational design approaches and principles for the discovery of potent allosteric modulators even for targets that were previously thought to be undruggable, for example, K-Ras. Moreover, the diversity of the proteins that were subjected to rational design of allosteric modulators supports the fact that allostery is a property of the conformational ensemble, and thus all proteins may be considered having an allosteric nature. Progresses in computational power, and structural and computational/methodological advancements are expected to lead the rational design of allosteric modulators to a more mature stage, holding an auspicious future in drug design.

CONFLICT OF INTEREST
The authors have declared no conflicts of interest for this article. DATA AVAILABILITY STATEMENT Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
ORCID Zoe Cournia https://orcid.org/0000-0001-9287-364X RELATED WIREs ARTICLES Computing protein dynamics from protein structure with elastic network models Computational structure-based drug design: Predicting target flexibility Cooperative dynamics of proteins unraveled by network models