Exploring three-dimensional quantitative structural activity relationship (3D-QSAR) analysis of SCH 66336 (Sarasar) analogues of farnesyltransferase inhibitors
Abstract
3D-QSAR analysis of a set of 37 analogues of SCH 66336 (Sarasar) was performed using molecular field analysis (MFA), a widely used computational tool, to investigate the substitutional requirements for favorable receptor-drug interaction and to derive a predictive model that may be used for designing novel farnesyltransferase inhibitors (FTIs). Regression analysis was carried out using the genetic partial least squares (G/PLS) method, which generated a highly predictive and statistically significant model. The MFA model demonstrated a good fit, with an r2 value of 0.967 and a cross-validated r2 value of 0.921. Its predictive ability was assessed using a test set of six compounds (r2 as high as 0.791).
Keywords: QSAR; Farnesyltransferase inhibitors; Molecular field analysis; Anti-cancer; Genetic partial least squares (G/PLS) method
1. Introduction
Farnesyltransferase (FTase) is a compelling therapeutic target for the treatment of a broad spectrum of cancers. It has been shown that farnesyltransferase inhibitors (FTIs) can stop protein farnesylation and suppress the growth of Ras-dependent tumor cells. Therefore, the search for inhibitors of FTase for the treatment of cancer has generated considerable recent interest [1,2]. The tricyclic drug SCH 66336 (Sarasar, Fig. 1) was the first FTase inhibitor to enter the clinic, resulting in favorable outcomes in a number of solid tumor types and hematological malignancies [3–7]. FTase is a zinc metalloenzyme which catalyzes the reaction between farnesyl diphosphate (FPP) and the cysteine residue found in the tetrapeptide sequence CAAX (C = Cys, A = an aliphatic amino acid, X is typically Met) in the carboxyl terminal of a group of membrane-bound small G-proteins such as Ras, RhoB, RhoE, lamin A and B, and transducin [8]. FTIs are promising agents in cancer therapy due to their excellent efficacy and low systemic toxicity in pre-clinical animal models.
The official birth date for quantitative structure-activity relationships (QSAR) is considered to be 1962, when Hansch et al. developed quantitative relationships between chemical structures and a wide variety of physicochemical properties [9]. Once a QSAR model is established, it can be used to predict the properties of compounds that are as yet unmeasured or even unknown. The Hansch approach, however, may give limited insight into structure-activity relationships when diverse molecular features are present in the data set. In cases where specific three-dimensional characteristics such as stereochemistry significantly affect the biological activity of the drug molecule, 3D-QSAR has an advantage over classical QSAR analysis.
To explore the substitutional requirements for SCH 66336 analogues as FTIs and to obtain a highly predictive model, 3D-QSAR analysis was performed using molecular field analysis (MFA), a widely used computational tool, by considering the steric and electrostatic influences [10]. The generated model may guide the rational synthesis of novel compounds. Thus, our main objective is to design highly selective SCH 66336 analogues as FTIs in the hope that these molecules may be further explored as novel FTase inhibitors. The derived model gives insight into the influence of various interactive fields on the activity and thus aids in designing novel molecules and forecasting their FTase inhibitory activity.
2. Materials and methods
2.1. Selection of molecules
A data set of 37 analogues of SCH 66336 (Sarasar) was selected, and FTase inhibitory activity data (IC50) were collected from the published literature [11]. The biological activities were converted into the corresponding pIC50 values (−log IC50), where the IC50 value represents the molar concentration of the drug that causes 50% inhibition of FTase. All the IC50 values were obtained using the same assay method [12]. The pIC50 values of the molecules under study spanned a wide range, from 5 to 9. Thirty-one compounds were selected as the training set and the remaining six compounds were included in the test set. The test set was selected based on the suggestions given by Oprea et al. [13]. The structures and biological activity data of the training and test set molecules are given in Table 1.
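The IC50 to pIC50 conversion described above can be illustrated with a minimal sketch. The function names and the example IC50 values are hypothetical and are not taken from Table 1; the only assumption carried over from the text is that IC50 is expressed in mol/L before the logarithm is taken.

```python
import math

def pic50_from_ic50(ic50_molar: float) -> float:
    """Return pIC50 = -log10(IC50), with IC50 given in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive molar concentration")
    return -math.log10(ic50_molar)

def pic50_from_nanomolar(ic50_nm: float) -> float:
    """Convenience wrapper for IC50 values reported in nM."""
    return pic50_from_ic50(ic50_nm * 1e-9)

# Hypothetical IC50 values (nM), chosen only to span the pIC50 range 5 to 9.
for ic50_nm in (1.0, 100.0, 10000.0):
    print(f"IC50 = {ic50_nm:g} nM -> pIC50 = {pic50_from_nanomolar(ic50_nm):.2f}")
```

Working on the logarithmic scale keeps potencies that differ by several orders of magnitude on a comparable footing for regression.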
2.2. Molecular modeling
All molecular modeling studies were carried out using Cerius2 version 4.10 (Accelrys; San Diego, CA) molecular modeling software running on a Silicon Graphics workstation [14]. The structures of the compounds were built using the molecular sketcher facilities provided in the modeling environment of Cerius2. All molecules were initially energy minimized with the Smart Minimizer. Geometric optimization was carried out using the DREIDING force field [15]. Partial atomic charges were calculated using the Gasteiger method [16]. Multiple conformations of each molecule were generated using Boltzmann Jump as the conformational search method. Further geometric optimization of each molecule was carried out with the MOPAC 6 package using the semi-empirical AM1 (Austin Model 1) Hamiltonian [17].
2.3. Molecular alignment
A proper alignment of the structures is critical for obtaining valid 3D-QSAR models. Furthermore, it is vital that all compounds are aligned in a pharmacologically active orientation, since the 3D-QSAR model assumes that each structure exhibits activity at the same binding site of the receptor. To obtain a consistent alignment, all the molecules were superimposed onto the shape reference compounds, which were selected as conformers of the most active molecules. The alignment was performed with the maximum common subgroup (MCSG) method [14]. This method looks at molecules as points and lines and uses the techniques of graph theory to identify patterns. It finds the largest subset of atoms in the shape reference compound that is shared by all the structures in the study table and uses this subset for alignment. A rigid fit of atom pairings was performed to superimpose each structure so that it overlays the shape reference compound. The bold-faced portion of the most active molecule, 21, shown in Fig. 2, was used as the template for the superimposition. A stereoview of the aligned molecules is shown in Fig. 3.
2.4. Molecular field analysis
All the studies were performed with the QSAR module of Cerius2. An MFA model can be sufficiently predictive and reliable to guide the chemist in the design of novel compounds. This approach is effective for the analysis of data sets where activity information is available but the structure of the receptor site is unknown. It attempts to postulate and represent the essential features of a receptor site from the aligned common features of the molecules that bind to it. MFA calculates probe interaction energies on a rectangular grid around a bundle of active molecules. The atomic coordinates of the contributing models are used to compute field values at each point of a 3D grid. The fields of the molecules are thus represented on grids, and the energy associated with each grid point can serve as input for the calculation of a QSAR. These energies were added to the study table as new columns headed according to the probe type. The molecular field was created using proton and methyl groups as probes, which represent the electrostatic and steric fields, respectively. Only the 10% of descriptors with the highest variance were considered for further analysis.
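The variance-based descriptor reduction mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the Cerius2 implementation: the function name, the toy study table, and the rule of keeping at least one column are hypothetical.

```python
from statistics import pvariance

def top_variance_columns(table, fraction=0.10):
    """Return the indices of the highest-variance columns (descriptors).

    table: list of rows (one per molecule), each a list of grid-point energies.
    fraction: share of columns to keep; at least one column is always kept.
    """
    n_cols = len(table[0])
    variances = [pvariance([row[j] for row in table]) for j in range(n_cols)]
    n_keep = max(1, round(n_cols * fraction))
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_keep])

# Hypothetical study table: 3 molecules x 10 field descriptors (kcal/mol).
toy_table = [
    [1.0, 2.0, 0.0, -10.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [1.1, 2.0, 0.0, 10.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0.9, 2.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
]
kept = top_variance_columns(toy_table, fraction=0.10)
print("columns kept:", kept)
```

Grid points whose energy barely changes across the aligned molecules cannot explain differences in activity, which is the rationale for discarding low-variance columns before regression.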
The major steps of molecular field analysis were (1) generating conformers and energy minimization; (2) matching atoms using maximum common substructure (MCS) search and aligning molecules using the default option; (3) setting MFA preferences (rectangular grid with 2 Å step size, charges by the Gasteiger algorithm, H+ and CH3 as probes); (4) creating the field; and (5) regression analysis by the G/PLS algorithm.
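Steps (3) and (4) can be sketched for a single molecule as follows. This is a rough illustration only: the Coulomb and 12-6 Lennard-Jones functional forms, the Lennard-Jones parameters, the grid bounds, the toy atoms, and the descriptor numbering are all assumptions and do not reproduce the energy functions used by Cerius2. The 2 Å step and the truncation to ±30 kcal/mol follow the settings reported in this study.

```python
import itertools
import math

COULOMB = 332.0637         # kcal*A/(mol*e^2): converts q1*q2/r into kcal/mol
CUTOFF = 30.0              # energies are truncated to [-30, +30] kcal/mol
LJ_EPS, LJ_R0 = 0.1, 3.8   # hypothetical well depth (kcal/mol) and minimum (A)

def grid_points(lo, hi, step=2.0):
    """Return the points of a rectangular grid spanning lo..hi (in A)."""
    axes = [[a + i * step for i in range(int((b - a) // step) + 1)]
            for a, b in zip(lo, hi)]
    return list(itertools.product(*axes))

def truncate(energy):
    """Clamp an energy to the allowed cutoff window."""
    return max(-CUTOFF, min(CUTOFF, energy))

def proton_energy(point, atoms):
    """Coulomb energy of a +1 probe with the partial charges in `atoms`."""
    total = sum(COULOMB * q / max(math.dist(point, xyz), 1e-3)
                for xyz, q in atoms)
    return truncate(total)

def methyl_energy(point, atoms):
    """12-6 Lennard-Jones energy of a neutral probe with every atom."""
    total = 0.0
    for xyz, _ in atoms:
        ratio6 = (LJ_R0 / max(math.dist(point, xyz), 1e-3)) ** 6
        total += LJ_EPS * (ratio6 * ratio6 - 2.0 * ratio6)
    return truncate(total)

def molecular_field(atoms, lo, hi, step=2.0):
    """Map descriptor names such as 'H+/63' to probe energies (kcal/mol)."""
    field = {}
    for k, point in enumerate(grid_points(lo, hi, step), start=1):
        field[f"H+/{k}"] = proton_energy(point, atoms)
        field[f"CH3/{k}"] = methyl_energy(point, atoms)
    return field

# Two hypothetical atoms: ((x, y, z) in A, partial charge in e).
toy_atoms = [((0.0, 0.0, 0.0), 0.25), ((1.5, 0.0, 0.0), -0.25)]
field = molecular_field(toy_atoms, lo=(-4.0, -4.0, -4.0), hi=(4.0, 4.0, 4.0))
print(len(field), "descriptors for one molecule")
```

Each molecule contributes one row of such values to the study table, so a descriptor name identifies the probe type and the grid point at which its energy was evaluated.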
2.5. Genetic partial least squares (G/PLS)
Regression analysis of the data was performed using the G/PLS technique available in the QSAR+ environment of the Cerius2 software. This algorithm may be used as an alternative to a genetic function approximation (GFA) calculation. G/PLS is derived from two QSAR calculation methods: GFA and partial least squares (PLS). Both GFA and PLS have been shown to be valuable tools in cases where the data set has more descriptors than samples. The G/PLS algorithm uses GFA to select appropriate basis functions to be used in a model of the data and PLS regression as the fitting technique to weigh the basis functions' relative contributions in the final model. PLS is a generalization of regression that can handle data with strongly correlated, noisy, or numerous X variables [18]. It gives a reduced solution, which is statistically more robust than multiple linear regression (MLR). The linear PLS model finds "new variables" (latent variables or X scores), which are linear combinations of the original variables. To avoid overfitting, each consecutive PLS component must pass a strict significance test, and extraction stops when the components become nonsignificant. Cross-validation is a practical and reliable method of testing this significance [19]. Application of G/PLS thus allows the construction of larger QSAR equations while still avoiding overfitting and eliminating most variables. The best model was selected based on statistical measures such as data points (n), correlation coefficient (r), square correlation coefficient (r2), cross-validated correlation coefficient (r2 ), predicted.
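The PLS fitting and cross-validation ideas described above can be sketched in plain Python. This is a minimal illustration assuming a single response (PLS1) fitted with the NIPALS algorithm and leave-one-out cross-validation. The genetic selection of basis functions is deliberately omitted, the function names and toy data are hypothetical, and nothing here reproduces the Cerius2 G/PLS implementation.

```python
def _mean(values):
    return sum(values) / len(values)

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; X is a list of rows, y a list of responses."""
    n, m = len(X), len(X[0])
    x_mean = [_mean([row[j] for row in X]) for j in range(m)]
    y_mean = _mean(y)
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                  # centred y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # no covariance left between X and y
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # X scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on it."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

def r_squared(y, y_hat):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    y_bar = _mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def loo_q_squared(X, y, n_components):
    """Leave-one-out cross-validated r2: refit without each sample in turn."""
    preds = []
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        preds.append(pls1_predict(model, X[i]))
    return r_squared(y, preds)

# Hypothetical data: 6 molecules, 3 descriptors (columns 1 and 3 correlated).
X_toy = [[1.0, 2.0, 1.1], [2.0, 1.0, 2.1], [3.0, 5.0, 2.9],
         [4.0, 3.0, 4.2], [5.0, 8.0, 5.0], [6.0, 4.0, 5.9]]
y_toy = [5.1, 7.9, 6.2, 9.8, 7.1, 13.0]
model = pls1_fit(X_toy, y_toy, n_components=2)
fitted = [pls1_predict(model, row) for row in X_toy]
print("r2 =", round(r_squared(y_toy, fitted), 3))
print("cross-validated r2 =", round(loo_q_squared(X_toy, y_toy, 2), 3))
```

Comparing the fitted r2 with the cross-validated value for an increasing number of components is one practical way to decide when further latent variables stop being significant.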
3. Results and discussion
The best model for the 31 training set molecules was developed (Eq. (1)). In this equation, the steric (CH3) and electrostatic (H+) descriptors specify the regions where variations in the structural features (steric or electrostatic) of different compounds in the training set lead to increased or decreased activities. The number accompanying each descriptor represents its position in the three-dimensional MFA grid. G/PLS was carried out over 100,000 generations with a population size of 100. The optimal number of components was set to 5. An energy cutoff of −30 to +30 kcal/mol was set for both steric and electrostatic contributions. The smoothing parameter, d, was set to 1.0 to control the bias in the scoring factors between value of 0.966 is the average squared correlation coefficient calculated during the validation procedure. An 2 is computed from the subset of variables used one at a time for the validation procedure. The activity of the training set molecules was predicted using this equation and is given in Table 1, and a graph of true versus predicted activity is shown in Fig. 4A.
The numbers associated with each descriptor specify its location in the 3D grid around the most active molecule, as shown in Fig. 5. The presence of a steric descriptor (CH3/622) close to the R2 position indicates the importance of steric interactions, and the presence of two electrostatic descriptors near the same position, namely (H+/645) with a negative coefficient and (H+/523) with a positive coefficient in the final QSAR (Eq. (1)), describes a subtle balance of electrostatic parameters required at this position. The need for moderately electron-withdrawing groups with appropriate steric parameters is evident. Electron-withdrawing character dominates because descriptor (H+/523) has a higher coefficient value than (H+/645). Hence, the activity of molecule 12 is higher than that of molecules 7–11, 13, and 14. A similar trend can be seen for molecule 17, which showed higher activity than molecules 15 and 16. The appearance of (H+/908) and (H+/785) with positive coefficients at the R1 position indicates that a moderately electron-withdrawing group increases the activity of the molecules. Hence, molecules 26, 27, and 29–31 have higher activity than molecules 25 and 28. A similar trend can be seen for molecules 32–34, which showed higher activity than molecule 35.
4. Conclusions
A 3D-QSAR model of FTase inhibitory activity has been developed based on steric and electrostatic descriptors to investigate the substitutional requirements for favorable receptor-drug interaction and to predict the relative inhibitory activities of 37 analogues of SCH 66336. This study yielded a stable and statistically significant model with a high correlation coefficient. Three-dimensional features, electrostatic and steric, can be easily identified from the map developed for the best model. The significant predictive ability of the model observed for the external test set molecules indicates that the derived model can be used for designing novel inhibitors. Overall, the present 3D-QSAR study identifies the indispensable structural features that can be exploited for modifications of SCH 66336 (Sarasar, lonafarnib) analogues in order to achieve improved FTase inhibitory activity.