In silico homology modelling and identification of Tousled-like kinase 1 inhibitors for glioblastoma therapy via high throughput virtual screening protein- ligand docking 1,2Kamariah Ibrahim, 3Abubakar Danjuma Abdullahi, 1Nor Azian Abdul Murad, 1,4Roslan Harun, 1Rahman Jamal 1 UKM Medical Molecular Biology Institute, Universiti Kebangsaan Malaysia, Malaysia; 2Biomedical Science Department, Faculty of Medicine, University of Malaya,Kuala Lumpur; 3Pengiran Anak Puteri Rashidah SaÕadatul Bolkiah Institute of Health Sciences, University of Brunei Darussalam; 4KPJ Ampang Puteri Specialist Hospital, Ampang, Selangor, Malaysia Received on 16/01/2018 / Accepted on 25/06/2018 Abstract Glioblastoma multiforme (GBM) is a high-grade brain tumor of which the survival patients remain poor. Tousled-like kinase 1 (TLK1), a serine-threonine kinase, was identified to be overexpressed in cancers such as GBM. TLK1 plays an important role in controlling survival pathways. To date, there is no structure available for TLK1 as well as its inhibitors. We aimed to create a homology model of TLK1 and to identify suitable molecular inhibitors that are likely to bind and inhibit TLK1 activity via in silico high-throughput virtual screening (HTVS) protein-ligand docking. The 3D homology models of TLK1 were derived from various servers. All models were evaluated using Swiss Model QMEAN server. Validation was performed using multiple tools. Energy minimization was performed using YASARA. Subsequently, HTVS was performed using Molegro Virtual Docker 6.0 and ligands derived from ligand.info database. Drug-like molecules were filtered using ADME-Tox filtering program. Best homology model was obtained from the Aurora B kinase (PDB ID:4B8M) derived from Xenopus levias structure that share sequence similarity with human TLK1. Two compounds were identified from HTVS to be the potential inhibitors as it did not violate the Lipinski rule of five and the CNS-based filter as a potential drug-like molecule for GBM. Introduction Glioblastoma multiforme (GBM) is the most common primary brain tumour in adults. It is also classified as grade IV glioma which arises from the lineage of star- shaped glial cells known as astrocytes. The survival rate is very poor where only 15% of patients survived more than 24 months due to disease aggressiveness and heterogeneity of the disease [1, 2]. Although several molecular inhibitors have been developed to target aberrantly expressed enzymes and proteins, the results have been very frustrating [3, 4]. Factors contributing to resistant of GBM cells include deregulation of key signalling pathways, namely PTEN, TP53, RB and PI3K-Akt [5, 6], increased in the expression of anti-apoptotic proteins BCL2 and survivin [7, 8], iterative perivascular growth within the highly vascularized brain [9], and presence of 30-65% constitutively active EGFRvIII mutant in GBM which secretes higher levels of invasion-promoting proteins [10](Sangar et al., 2014). Studies have revealed that Tousled-like kinase 1 (TLK1) is overexpressed in breast cancer [11], prostate cancer [12], and cholangiocarcinoma [13]. In our unpublished study, we proved that TLK1 is overexpressed in GBM and silencing of TLK1 results in a significant decrease in invasion, migration and GBM cells survival [14]. Human TLK1 contains 766 amino acids and is one of the members of the Tousled-like kinase family consisting of TLK1 and TLK2 [15]. The gene is mapped on chromosome 2q31.1 and encoded by 25 exons. TLK1 share 85% sequence identity to TLK2, and both share ~50% sequence identity with Arabidopsis thaliana where Tousled-like kinase family was initially identified [16]. This serine- threonine kinase is an important signalling regulator mainly involved in the cell cycle regulation, cellular mitosis, cell survival, and proliferation [17]. In general, the N-terminal domain of Tousled-like kinase is well conserved to include three potential nuclear localization sequences and three putative coiled-coil regions, while the C-terminus region contains the catalytic ATP-binding domain at the region that consists of 456 to 734 amino acid residues. The active binding site is located within the protein kinase domain sequence [18]. This ~90 kDa kinase is activated by the CHK1/ATM DNA damage pathway [19]. TLK1 interacts with its substrates, namely Asf1, histone H3 [20], and Rad9 [17] to activate DNA damage and DNA repair activity [21]. It was suggested that when overexpressed, TLK1 is involved in radioprotection and chemo-resistance of cancer cells [22]. Unfortunately, the structure of TLK1 has not been elucidated and this hinders the full understanding of TLK1 biological processes. Nonetheless, the X-ray diffraction data for the kinase domain of human TLK1 family member TLK2 have been recently reported which may shed a light on structural understanding of human Tousled-like kinase [23]. No structure is yet available for both TLK1 and TLK2, hence, we perform a homology modelling study of TLK1 structure to understand its function in orchestrating cellular functions particularly in cancer pathways. In this study, we present a structural homology model of the TLK1 catalytic binding domain which may serve as a potential target for molecular inhibitors. We then used the proposed structure to identify potential inhibitors for TLK1 by utilising in silico ligand- docking with high throughput virtual screening (HTVS) targeting more than 16,000 candidate compounds. Materials and methods Template identification and homology modelling The amino acid sequence of human TLK1 was retrieved from UniProt with the accession number: Q9UKI8 (http://www.uniprot.org/). The TLK1 FASTA format amino acid sequence was downloaded into the BLASTP and PSI-BLAST search (http://blast.ncbi.nlm.nih.gov/) in order to identify homologous proteins. The appropriate template for TLK1 was blasted against pdb database. Query identified suitable template based on the E-value of 2e-26 to 2e-22 and sequence identity ranging from 30% to 33% at the protein kinase catalytic domain indicating similarity of structure and function. The template and the target sequences were later aligned using the Clustal Omega program (http://www.ebi.ac.uk/Tools/msa/clustalo/). Subsequently, homology modelling was carried out against the chosen template using HOmology ModellER [24], I-Tasser [25], and PsiPred [26]. Analysis were performed in October 2013 until January 2014. Homology models quality estimation We used 20 pdb files created and generated from two homology modelling servers; Homology modeller (HOMER) and i-Tasser to further estimate and ensure appropriate model selected are in good quality. The model quality estimation was performed using the Swiss-Model Qualitative Model Energy Analysis (QMEAN) Server (https://swissmodel.expasy.org/qmean/), of which the composite scoring function, derives a quality estimation on the basis of the geometrical analysis of single models [27]. It also describes the major geometrical aspects of the protein structures. Five different structural descriptors were used. The local geometry was analyzed using the torsion angle potential function over three consecutive amino acids. A secondary structure-specific distance-dependent pairwise residue-level potential was used to assess long-range interactions. A solvation potential describes the burial status of the residues. Two simple terms describing the agreement of predicted and calculated secondary structure and solvent accessibility, were also included. In comparison with other protein structure evaluation servers, the QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models [28]. 3D structure was then visualized using PyMol software (The PyMOL Molecular Graphics System, Version 1.5.0.4 Schršdinger, LLC). Validation of modelled structure The best homology model created was used for further investigation. We used the latest version of PDBsum (http://www.ebi.ac.uk/thornton- srv/databases/pdbsum/) which provides further information on protein function prediction, structural topology, PROCHECK and cleft analysis. We also used ProSA which displays scores and energy plots that highlight potential problems spotted in protein structures [29]. Prediction of the protein structure function was performed using proteo-genomic analysis software 3d2go (http://www.sbg.bio.ic.ac.uk/phyre/pfd/html/help.htm l). This allowed full structural scan of the protein structure made against the Structural Classification of Proteins (SCOP) database using a modified version of BLAST [30]. Energy minimization was performed on YASARA server (http://www.yasara.org/minimizationserver.php). High throughput in silico ligand-docking analysis In silico ligand-docking analysis was performed using Molegro Virtual Docker (MVD version 2013.6.0) to predict protein-ligand interactions. The potential binding sites of selected proteins and candidate small molecules were characterized by the molecular docking algorithm called MolDock which was derived from ŅPiecewise Linear Potential [31]. The MolDock score refers to the approximate binding energies between protein and ligand which is usually expressed in kcal/mol. This software handles all aspects of the docking process from the preparation of the molecules to determine the potential binding site of the target protein, and the predicted binding modes of the ligand. Interestingly, MVD has been shown to provide higher accuracy compared with the other commercially available docking softwares e.g. Glide, Surflex and FlexX [32]. Docking requires five steps; importing molecules, importing ligands, molecular preparation, creating template and docking. Candidate ligands for ligand-docking screening were downloaded from Ligand.Info (http://ligand.info/) which compiles various publicly available databases of small molecules and compounds from ChemBank, KEGG, ChemPDB, Drug-likeness NCI subset and non-annotated NCI subset [33]. We downloaded a total of 16,358 sdf. format small molecules from KEGG ligands (10,005), ChemBank (2,344) and ChemPDB (4,009) for high throughput screening of potential inhibitors for TLK1. Due to the large number of candidate KEGG ligands, we filtered out some of these compounds based on the relevancy to the present TLK1 3D model using Findsite server [34] as a pre- molecular docking step. After filtering these ligands, only 1,386 KEGG ligands were selected for further investigation. Most of the ligands in the database as well as the homology model or molecule did not have correct bond orders and bond angles. Hence, full optimization of molecules and ligand preparation was performed using Molegro Virtual Docking software default setting whereby appropriate missing hydrogen atoms were added, missing bonds were assigned, partial charges were added if necessary and flexible torsions in ligands detected. Docking study was performed at the catalytic domain of TLK1. Simulation on the modelled protein identified five cavities as potential binding sites. However, only one cavity was used for the ligand- docking study i.e. the cavity with the largest surface area and volume of 214.528 arbitary unit within the catalytic domain sites of TLK1. The predicted sites had a grid resolution of 0.3 and a binding site of 15 radius from the template. The Moldock optimizer was used as a search algorithm and the number of runs was set to 10 with a maximum iteration of 1000, scaling factor of 0.50, 0.90 cross over and a population size of 50. The maximum number of poses generated was 5. Potential ligands were selected based on the best MolDock score value that is less than -170. Visualization of ligand-protein interaction The three-dimensional and two-dimensional visualisation of ligand-protein interaction were performed using the Maestro software package (Maestro, version 10.4, Schršdinger, LLC, New York, NY, 2015). In silico bioavailability study Lead molecules identified from the high throughput ligand-docking screening were subjected to further in silico filtering to identify those with the best values in terms of their absorption, distribution, metabolism, excretion and toxicity (ADME-Tox). This was done using the FAF-Drugs3 (November 2014 edition) which is a free ADME-Tox [35] filtering tool. This step will ensure the suitability of lead molecules based on toxicity for future in vivo applications. We applied LipinskiÕs Rule of Five [36] to remove some reactive groups and compounds. We have also included the Central Nervous System (CNS) drugs physicochemical criteria [37, 38], which includes (1) molecular mass less than 450 Da, (2) partition coefficient (logP) of 0.2 -6.0, (3) hydrogen bond donors not less than three, (4) hydrogen bond acceptors not less than five and (5) topological surface area (tPSA) within 3-118. Results Homology modelling of TLK1 serine/threonine kinase The PSI-BLAST results of TLK1 sequence Q9UKI8 were analysed and we selected the protein hits based on query coverage, similarity and identity. The model structure which was selected showed sequence identity and similarity that ranged from 27% to 37% and a query coverage E-value that ranged from 4e-29 to 9e-15 and covered only the protein kinase domain site (450-756). The homology model was created based on the TLK1 protein kinase catalytic domain sequence. We selected 40 protein sequence templates for homology modelling using various softwares. However, only 18 models were successfully created using HOmology ModellER and i-tasser. We evaluated all the 18 models using QMEAN Server and identified the Aurora B kinase structure from African clawed frog Xenopus levias (PDB ID: 4B8M) as the best template structure for TLK1 producing a total QMEAN score of 0.68 out of 1.0 required for an excellent homology model (Table 1). The Aurora B kinase that in complex with inner centromere protein A (VX-680) was determined to 1.85  resolution (PDB ID: 4B8M). Pro-Motif analysis showed that the modelled TLK1 structure, with 270 amino acids, contains 4 beta-hairpins, 6-beta bulges, 10 strands, 14 helices, 15 helix-helix interactions, 16 beta-turns and 3 gamma turns (figures 1A and 1B). The homology model of TLK1 was also assessed using ProSA Z-score. The overall Z-score quality was -4.92 suggesting a good quality model compared with the available structure from NMR and X-ray (Figures 2A and 2B). Ramachandran plot obtained from PROCHECK analysis achieved a good quality model assessment of 90.1% in the favoured region (Figure 2C). The plot represents the psi and the phi angles of the amino acid residues. Details of the analysis plot can be referred to Table 2. Analysis from the 3D structural superposition (3d-ss) web server [39] showed the root mean square deviation (RMSD) between template structure and the 3D homology model structure to be 0.543  (Figure 2D). ERRAT overall quality factor is 53.696% and at least more than 80% of the amino acids have scores more than or equal to 0.2 in the 3D/1D profile. The YASARA public server for energy minimization provided a value of 16140271100.5 kJ/mol to 143790.2 kJ/mol with a score of -1.53 to -0.95. Functional analysis of the TLK1 modelled structure performed using 3d2go web server identified the following activities with the highest confidence value of 1.0: phosphotransferase activity alcohol group as the acceptor, protein amino acid phosphorylation, protein serine/threonine kinase activity and nucleotide binding. Nucleus and protein binding functions were predicted with a confidence value of 0.89. Functional prediction in cell cycle, mitosis, phosphoinositide- mediated signalling (confidence value of 0.86), centrosome, spindle organization, regulation of protein stability, ubiquitin protein ligase binding (confidence value of 0.85) were all in concordance with experimental data [40, 41]. These findings were predicted to be similar with the function of human Aurora kinase2 (PDB ID: 2J4Z). Interestingly, with a confidence value of 0.79, the modelled TLK1 structure is also predicted to be involved in insulin receptor signalling pathway and actin cytoskeleton organization which is similar to the human PDK1 (PDB ID:1UU3). This indicates that TLK1 could be involved in the regulation of actin filament organization particularly in controlling cancer cell motility. High throughput virtual ligand-docking screening The cut-off point of the MolDock docking scoring was set at less than -170 to select ligands that predicted to have high binding affinity to TLK1. We identified 192 lead molecules, and ATP was the top scoring molecule in the docking procedure with a MolDock score of - 193.654. The amino acid residues that found to involve in the protein-ligand interactions were GLY463, ARG464, GLY465, GLY466, PHE467, SER468, GLU469, VAL470 and LYS485. The compounds that utilized in the screening were initially not known until we have completed the identification procedure. The results showed that ATP docked accurately within the cavity, suggesting the robustness of the in silico experiment. In silico pharmacokinetic analysis The 192 compounds with the best MolDock scores were submitted to the Free ADME-Tox filtering tool 3 (November 2014 edition) for pharmacokinetic analysis. Analysis were subjected to the LipinskiÕs Rule of Five (RO5) [36] and filters for CNS drugs [37, 38], to ensure the efficacy and safety of the candidate compounds. The final filtering process revealed that only two compounds passed this assessment without violating the general LipinskiÕs RO5 and the CNS rule. These compounds were identified as ID352 and ID1652 from the ChemBank database (Tables 3 and 4). Their chemical structures, IUPAC names, the radar plot of physicochemical analysis, oral absorption estimation data and the Pfizer 3/75 Rule Positioning plot, which estimated drug-like molecules that are likely to cause toxicity and experimental promiscuity, are presented in Figures 3A-H. ID1652 is known as beraprost which is a prostacyclin analogue used in the treatment of arterial hypertension [42]. It has a better docking score, with no violation of LipinkiÕs rule of five and a low promiscuous toxicity as compared to ID352 or bepridil which is a calcium channel blocker for anti-angina [43]. Beraprost also has a better hydrogen bonding score from the ligand- docking simulation. Results from receptor-ligand interactions (Figure 4) revealed a common cavity for ATP, ID352 and ID1652 binding. The residues that are involved in the interactions include GLY465, GLY466, PHE467, SER468, VAL470, and LYS485. These suggested that both of the two compounds bind to catalytic site of TLK1 ATP binding pocket. Discussion GBM remains as the solid tumour with the poorest survival in adults since the past few decades. The search for the right molecular target is still ongoing and one of the many approaches is by using computer- aided drug discovery tools. Our recent in vitro study identified TLK1 as a potential target for glioblastoma multiforme. We found TLK1 to be overexpressed and the knockdown of TLK1 reduced cellular proliferation and invasion [14]. An auto-phosphorylated chemical inhibition screen on recombinant TLK1B, which is a known splice variant, has been performed by Ronald et al, using more than 6,000 compounds. This study identified four inhibitors belonging to the class of phenothiazine antipsychotics that are structurally and chemically similar. The same study also showed that thioridazine was able to sensitize prostate cancer cells when used with doxorubixin [44]. Although chemical library screening for drug discovery seems promising, it is very expensive and time consuming. A study using the ChemBL database and Kinase SaRfari application identified 74 ŅhitsÓ compounds that can potentially bind to TLK1 [45]. However, no details were reported on the specific biding sites and the specific TLK1 structure that were used for the screen. In this study we used a computational approach to identify suitable TLK1 inhibitors based on a homology model that has been created. The 3D structure of TLK1 is currently not available for drug design strategy, hence we used 18 PDB templates that shared 30% to 33% sequence identity, to create homology models of TLK1. Thus, Aurora B kinase (PDB ID: 4B8M) was identified as the most suitable homology template by the HOmology modellER server. This model allows us to perform ligand-docking analysis to identify potential inhibitors for TLK1. One of the major challenges for optimal therapeutic intervention for glioblastoma and other types of brain tumor is to achieve maximal penetration across the blood brain barrier (BBB). The BBB is a structure composed of endothelial cells which is associated with perivascular neurons, pericytes and astrocytic end-feet processes. The endothelial cells connected by tight junctions form an almost impenetrable barrier to all compounds except highly lipidized small molecules of less than 400 Da [46]. Although many studies have identified drug-like molecules from high throughput virtual screening, most only follow the LipinskiÕs rule of five and have neglected the probability calculations for the molecules to cross the BBB. This eventually led to dismal results in in vivo studies [46Š48]. We used the recent version of the free ADME-TOX software and utilized the CNS filter to identify drug- like molecules that are able to cross the BBB. With this approach we identified bepridil and beraprost as the two compounds which may bind specifically at the catalytic site of TLK1 receptor protein and also fulfilled the CNS drugs selection criteria [37, 48]. We observed that more than 80% of the interactions involved between ligands and receptor are hydrophobic. We have also identified other lead compounds for TLK1 such as the imidazole-pyrrole polyamide derivatives with better binding affinity (with Moldock Score of -208.44 to -209.34) compared to bepridil and beraprost. Unfortunately, these compounds violated the LipinskiÕs Rule of Five and have molecular masses of more than 450 Da which are not suitable to cross the blood brain barrier. Beraprost, an analogue to prostacyclin or PGI2, is commonly used for arterial pulmonary hypertension and has multiple physiological effects such as endothelial vasodilation, inhibition of platelet aggregation, leukocyte adhesion, and vascular smooth muscle cell proliferation [49]. Activation of the PGI2 signalling pathway by beraprost sodium suppressed lung cancer metastases by preventing maturation of angiogenesis [50]. It was also reported to enhance permeability and retention (EPR) of solid tumors by decreasing tumor blood flow by 70%, hence inhibiting tumor growth. Furthermore, it did not affect normal cells and systemic blood flow [51]. Since this compound mimics structurally related lipid soluble hormone PGI2, it was predicted that the efficacy of the compound will be high as it will be able to cross the BBB [52]. Bepridil is a known sodium-calcium channel blocker that is use for anti-arrhythmias. An earlier study reported that bepridil caused tumor growth inhibition in neuroblastoma and astrocytoma cells by causing a prolonged increase in free intracellular calcium concentration when cells were co-treated with anti- estrogens [53]. Bepridil has been experimentally found to bind to the N-domain pocket of cardiac troponin C but with negative cooperativity [54]. Even though, theoretically, bepridil can cross blood brain barrier effectively [55], our findings showed that it may have non-specific binding properties towards TLK1. Hence, it will be an added value if some chemical modification can be made to increase its selectivity towards TLK1. It is worth to note that S- bepridil was found to have a higher binding affinity towards the p53 binding domain in MDM2 [56]. In order to enhance binding affinity between TLK1 receptor and these two identified ligands, as well as preventing cross binding towards other types of receptors, modification of current ligand structure by QSAR fragment based on pharmacophore analysis is warranted for future study. This study has identified potential inhibitors that binds at the catalytic site of TLK1. However, identification of inhibitors that can bind to the non-catalytic component of a particular kinase would also be useful as they would also play significant roles in the regulation of cellular functions [57]. Further studies of TLK-ligand complex structure will allow identification of allosteric inhibition sites to provide much specific TLK1 regulatory inhibitory effects. Conclusion We have successfully created a 3D structure for the catalytic domain for TLK1 which was predicted to be a potential molecular target for GBM. We have performed vigorous analysis to determine the suitability and stability of the modelled structure through various quality control platforms. We identified beraprost and bepridil as the two candidate compounds that will bind to TLK1. These two drugs are commonly used for cardiovascular diseases. Further in vitro and in vivo studies need to be performed to validate the therapeutic value of these compounds for GBM. Acknowledgement We gratefully acknowledge Dr Ng Chyan Leong from the Institute of Systems Biology, National University of Malaysia, Bangi, Selangor who contributed to the preparation of this manuscript. This study was funded by a Higher Institution Centre of Excellence (HICoE) research grant (JJ-015-2011), Ministry of Higher Education, Malaysia.