Anupama Mittal, Mukta Sharma* and Aarti Singh
Department of Pharmacy, Banasthali University, Banasthali, Rajasthan, India
Received date: July 06, 2016; Accepted date: September 06, 2016; Published date: September 12, 2016
Citation: Mittal A, Sharma M, Singh A (2016) QSAR Modelling of PDE5 Inhibitory Activity of Tetracyclic Guanine Derivatives as Antihypertensive Agents. Int J Drug Dev & Res 8:043-051
A linear and non-linear quantitative structure-activity relationship (QSAR) study is presented for modelling and predicting PDE5 inhibitory activity. A data set consisting of 32 tetracyclic guanine derivatives was used in this study. Multiple Linear Regression (MLR), Partial Least-Squares (PLS) regression and a Neural Network (NN) were used to calibrate and validate the QSAR model. The leave-one-out method gave a stable MLR-QSAR model with high predictivity (r=0.92, r2=0.85, r2cv=0.75), and PLS gave a comparable cross-validated correlation coefficient (r2cv=0.78), supporting the robustness of the model. The results obtained with the feed-forward neural network explained the effect of electronic, hydrophobic and topological descriptors on the biological activity.
PDE5; cGMP; MLR; QSAR; Regression analysis
Two important second messengers regulate many physiological processes: 3′,5′-cyclic adenosine monophosphate (cAMP) and 3′,5′-cyclic guanosine monophosphate (cGMP). Extracellular stimulation causes a rapid change in the level of these cyclic nucleotides, producing the physiological responses elicited by the stimulus. The intracellular levels of the cyclic nucleotides are regulated by the cyclases that synthesize them and the phosphodiesterases (PDEs) that degrade them into inactive metabolites. According to their specificity toward hydrolysis of cAMP or cGMP, PDEs can be grouped into 11 families. PDE5, a cGMP-specific PDE, is distributed in various tissues such as lung, kidney, spleen, endothelial cells, heart and smooth muscle cells, and plays a very important role in regulating the cellular level of cGMP. PDE5 inhibitors elevate the level of cGMP, which mediates various vascular functions, so PDE5 is today an attractive target for the treatment of hypertension. The binding site of cGMP on PDE5 is present in the N-terminal regulatory GAF-A domain of the enzyme. Either protein kinase A (PKA) or protein kinase G (PKG) can phosphorylate PDE5, which causes a significant increase in PDE5 activity. PDE5 inhibitors compete with the substrate cGMP for binding to the protein at the catalytic site. Although cGMP binding to the catalytic site stimulates cyclic-nucleotide binding to the allosteric sites, inhibitors do not elicit the same effect. Currently three phosphodiesterase type-5 inhibitors (PDE5-i), sildenafil, vardenafil and tadalafil, are used in the treatment of male erectile dysfunction (ED). Recently the FDA approved sildenafil for the treatment of pulmonary hypertension because of its vasodilatory and antiproliferative effects on the pulmonary vasculature. QSAR (quantitative structure-activity relationship) analysis provides guidance on the important properties of chemical compounds so that drugs of higher potency can be obtained.
A QSAR model for the tetracyclic guanines reported as PDE5 inhibitors was generated to understand the relationship between chemical structure and biological activity. A comprehensive literature review reveals that very few attempts have been made to build QSAR models for PDE5 inhibitors as antihypertensive agents. In the present study, a QSAR model was developed using linear and non-linear methodology, which may be helpful in the development of potent antihypertensive agents.
QSAR and statistical analyses were carried out using TSAR 3.3, and chemical structures were sketched using ChemDraw Ultra 8.0.
In the present study, a data set of 38 tetracyclic guanine derivatives (Table 1) was taken from the literature for QSAR studies. The reported IC50 values (nM) for PDE-5 were converted to a logarithmic scale for the QSAR study (reported as -log IC50 in Tables 5 and 6).
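The log transform of the activity data can be sketched as follows. This is a minimal illustration, not the TSAR procedure: it assumes the -log IC50 (nM) scale that appears in the headers of Tables 5 and 6, and the IC50 values are hypothetical rather than taken from Table 1.

```python
import math

def neg_log_ic50(ic50_nm: float) -> float:
    """Return -log10 of an IC50 value expressed in nM."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm)

# Hypothetical IC50 values in nM, for illustration only (not from Table 1).
example_ic50_nm = [1.0, 10.0, 250.0]
activities = [neg_log_ic50(v) for v in example_ic50_nm]

for ic50, act in zip(example_ic50_nm, activities):
    print(f"IC50 = {ic50:7.1f} nM  ->  -log IC50 = {act:.3f}")
```

On this scale a lower IC50 (a more potent inhibitor) maps to a larger activity value, which is why positive regression coefficients correspond to descriptors that favour inhibition.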
Structure preparation and descriptor calculation
The 2D structures were drawn using ChemDraw Ultra 8.0 and imported into the TSAR window. The structures were cleaned up and subjected to charge calculation. The Charge-2 CORINA 3D package in TSAR 3.3 was used to calculate partial charges, and the geometries were optimized using its Cosmic module. Molecular descriptors were calculated for the whole molecule and for the substituents, which vary at common points of the generic structure. Topological, steric, electronic, connectivity, shape-index and hydrophobic descriptors were generated to describe the physicochemical properties; altogether more than 250 descriptors (independent variables) were calculated. Because such a large pool of descriptors increases the risk of overfitting, the data were pruned to reduce redundancy. Descriptors with zero values for all compounds were discarded. To reduce the pool further, a correlation matrix was generated to study the data pattern. The data were reduced by pairwise correlation, analysing the correlation between each descriptor and biological activity and between intercorrelated descriptors. Within each pair of highly intercorrelated descriptors, the one with the higher correlation with biological activity was retained for QSAR model development and the other was discarded. Through this data reduction, 22 descriptors that were not correlated with each other were eventually chosen. The process was repeated a number of times to obtain descriptors that were highly correlated with biological activity. The final QSAR model consisted of four descriptors without any intercorrelation.
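The pruning steps described above can be sketched as follows. This is one possible reading of the procedure, not the TSAR implementation: the intercorrelation cut-off of 0.9, the function names and the toy data are all assumptions introduced for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def prune_descriptors(descriptors, activity, threshold=0.9):
    """Drop all-zero descriptors, then resolve highly intercorrelated ones.

    Descriptors are visited in order of decreasing |r| with the activity;
    one is kept only if its |r| with every descriptor already kept is below
    the (assumed) threshold, so the better activity correlate survives.
    """
    kept = {name: col for name, col in descriptors.items()
            if any(v != 0 for v in col)}
    ranked = sorted(kept, key=lambda n: abs(pearson(kept[n], activity)),
                    reverse=True)
    selected = []
    for name in ranked:
        if all(abs(pearson(kept[name], kept[s])) < threshold for s in selected):
            selected.append(name)
    return selected

# Hypothetical toy data: 6 compounds, 4 descriptors (names are illustrative).
activity = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]
descriptors = {
    "d_zero": [0, 0, 0, 0, 0, 0],               # all zeros, discarded
    "d_a":    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],   # strongly tied to activity
    "d_a2":   [2.1, 4.0, 6.2, 8.1, 9.9, 12.0],  # near-duplicate of d_a
    "d_b":    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],   # largely independent
}
selected = prune_descriptors(descriptors, activity)
print(selected)
```

Only one of the two near-duplicate columns survives, together with the largely independent descriptor; the all-zero column is removed before any correlation is computed.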
Model development and statistical analysis
Multiple linear regression analysis was performed to quantify the relationship between physicochemical parameters and biological activity. Stepwise MLR analysis with leave-one-out (LOO) cross-validation was used for the development of the QSAR model. The dataset was randomly split into a training set of 26 compounds and a test set of 6 compounds, divided in such a way that both contained compounds with diverse chemical structures and biological activities. The generated model was assessed by statistical parameters such as the squared correlation coefficient (r2), the cross-validated coefficient (r2cv, the key measure of the predictive power of the model), a low s value (s signifies the standard error of the regression model) and a high F value. The predictive capability of the 2D-QSAR models was determined with the test set of 6 compounds that were not included in model development. Structure generation, optimization, charge derivation and all other steps were carried out for the test set in the same way as for the training set, and the activities of the test compounds were calculated using the model generated from the training set. MLR produced the best model with four descriptors. Partial least squares analysis was also carried out to obtain stable, correct and highly predictive models even for correlated descriptors. This regression technique is especially useful when the number of descriptors (independent variables) is comparable to or greater than the number of compounds (data points).
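The statistics named above can be illustrated with a small ordinary least-squares fit and a leave-one-out loop. This is a sketch under stated assumptions, not the TSAR routine: the data are synthetic, the helper names are hypothetical, and r2cv is taken as 1 - PRESS/TSS, s as sqrt(RSS/(n-k-1)) and F as (r2/k)/((1-r2)/(n-k-1)).

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(X, y):
    """Least squares with intercept via normal equations; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def mlr_statistics(X, y):
    n, k = len(y), len(X[0])
    coef = fit(X, y)
    mean_y = sum(y) / n
    tss = sum((t - mean_y) ** 2 for t in y)
    rss = sum((t - predict(coef, x)) ** 2 for x, t in zip(X, y))
    # Leave-one-out: refit without compound i, then predict compound i.
    press = 0.0
    for i in range(n):
        c = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(c, X[i])) ** 2
    r2 = 1 - rss / tss
    return {"r2": r2, "r2cv": 1 - press / tss,
            "s": math.sqrt(rss / (n - k - 1)),
            "F": (r2 / k) / ((1 - r2) / (n - k - 1))}

# Synthetic training set: 26 compounds, 4 descriptors (illustrative values only).
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(26)]
true_coef = [-0.35, -0.10, 2.9, -1.5]
y = [-5.9 + sum(c * v for c, v in zip(true_coef, x)) + random.gauss(0, 0.4)
     for x in X]
stats = mlr_statistics(X, y)
print({name: round(value, 3) for name, value in stats.items()})
```

With n=26 compounds, k=4 descriptors and the rounded r2=0.85 reported for the final model, the F formula used here gives about 29.8, close to the reported 31.94; the gap is plausibly due to rounding of r2.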
Neural network analysis
The neural network used here is a feed-forward neural network (FFNN). It is a computer-based program in which a number of processing elements, also called neurons, units or nodes, are interconnected by links in a net-like structure forming "layers". A variable value is assigned to every neuron. The input neurons take their values from the independent variables and form the input layer. The hidden neurons collect values from other neurons and give a result that is passed to a successor neuron. The output neurons receive values from other units and correspond to the dependent variables, forming the output layer. The whole neural network is represented as I-H-O, where I, H and O are the numbers of neurons in the input, hidden and output layers, respectively. FFNN analysis was performed using the same descriptors that were used to develop the MLR and PLS regression models, to assess the relative predictivity of the linear and non-linear methods. The results of the FFNN were visualized on 2D plots of the output node against each input (dependency graphs).
QSAR analysis is one of the most useful techniques for optimizing lead compounds and designing new drugs. The best equation derived from a QSAR model should use the minimum number of descriptors to obtain the best fit. The molecular structures of 38 compounds were taken from the published article. The structures and biological activities of the tetracyclic guanine inhibitors are shown in Table 1. The inhibitory activity of the PDE5 inhibitors is expressed as IC50 (nM), the concentration required to inhibit the PDE-5 enzyme by 50%. From the data set of 38 molecules, 26 compounds formed the training set and 6 compounds formed the test set. In this series, the nucleus shown in Figure 1 was excluded because of its structural diversity, to avoid interfering with the development of the model.
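The I-H-O layout can be made concrete with a single forward pass through a 4-2-1 network. This is a minimal sketch: the weights, the input vector, the sigmoid hidden activation and the linear output node are all assumptions chosen for illustration, and it does not reproduce the network trained in TSAR.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of an I-H-O network: sigmoid hidden layer, linear output."""
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return output, hidden

# Hypothetical weights for a 4-2-1 network (I=4, H=2, O=1).
w_hidden = [[0.5, -0.2, 0.1, 0.3],    # weights into hidden neuron 1
            [-0.4, 0.1, 0.6, -0.7]]   # weights into hidden neuron 2
b_hidden = [0.0, 0.1]
w_out = [1.2, -0.8]
b_out = 0.05

x = [0.2, -1.0, 0.5, 1.0]  # illustrative values of the four input descriptors
output, hidden = forward(x, w_hidden, b_hidden, w_out, b_out)
print(f"hidden = {[round(h, 3) for h in hidden]}, output = {output:.3f}")
```

A dependency graph of the kind mentioned above could be produced from such a function by varying one input over a range while holding the others fixed and plotting the output.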
|Descriptor||Total dipole (whole molecule)||Lipole Z Component (whole molecule)||Ipso atom E-state index (subst. 3)||Number of H-bond Acceptors||Biological Activity|
|Total dipole (whole molecule)||1||0.19172||0.21754||-0.042868||-3.0855|
|Lipole Z Component (whole molecule)||0.19172||1||0.054791||0.068609||-0.45335|
|Ipso atom E-state index (subst. 3)||0.21754||0.054791||1||0.017916||0.24714|
|Number of H-bond Acceptors||-0.042868||0.068609||0.017916||1||-0.70999|
Table 2: Correlation matrix to analyse the correlation between molecular descriptors and biological activity.
Linear regression analysis
The multivariate analysis was first performed using the leave-one-out row method on the whole data set, which gave insignificant predictive power (r2cv MLR=0.278; r2cv PLS=0.578). The statistical significance of the model was improved by reducing the pool of descriptors as described in the section above. Finally, four descriptors were selected for the development of the model, which were independent of each other and well correlated with biological activity. The relationship between the generated molecular descriptors and biological activity is represented by the correlation matrix (Table 2).
In equations 1, 2 and 3 (Table 3) the standard error was quite high and the cross-validation and F-test values were quite low. To improve the predictivity and statistical significance of the model, outliers were deleted. Outliers are data points that lie far from the linear model and may act through a different mode of binding. In this study, 6 outliers (compounds 12, 22, 39, 52, 55 and 8) were detected in the training set and deleted one by one, as shown in Table 3. After removing the outliers and reducing the number of descriptors, the MLR model showed improved statistical values and predictivity, as shown in equation 1 (model 7 in Table 3).
|MLR Model||r||r2||r2 cv||s-value||F-value||Outlier|
|1) Y=-0.214 × X1-0.121 × X2+3.299 × X3-1.316 × X4-0.001 × X5-7.132||0.78||0.61||0.44||0.66||8.47||ND|
|2) Y=-0.217 × X1-0.114 × X2+3.238 × X3-1.267 × X4-0.0009 × X5-7.0713||0.79||0.63||0.43||0.62||8.53||12|
|3) Y=-0.199 × X1-0.120 × X2+3.078 × X3-1.223 × X4-0.0005 × X5-6.963||0.80||0.65||0.45||0.59||9.14||12, 22|
|4) Y=-0.235 × X1-0.113 × X2+3.122 × X3-1.343 × X4-0.0004 × X5-6.893||0.82||0.67||0.47||0.58||9.74||12, 22, 39|
|5) Y=-0.419 × X1-0.098 × X2+3.180 × X3-1.452 × X4-6.190||0.89||0.80||0.68||0.45||23.10||12, 22, 39, 52|
|6) Y=-0.342 × X1-0.105 × X2+2.976 × X3-1.488 × X4-6.119||0.91||0.83||0.73||0.40||28.38||12, 22, 39, 52, 55|
|7) Y=-0.354 × X1-0.0987 × X2+2.919 × X3-1.527 × X4-5.933||0.92||0.85||0.75||0.38||31.94||12, 22, 39, 52, 55, 8|
Table 3: MLR analysis of training set which generated various QSAR equations.
Y = -0.35434085 × X1 - 0.098792322 × X2 + 2.9191632 × X3 - 1.5278097 × X4 - 5.9336643 (1)
X1=Total dipole moment (whole molecule), X2=Lipole Z component (whole molecule), X3=Ipso atom E-state index (subst. 3), X4=Number of H-bond acceptors (subst. 2)
s=0.38, f=31.94, r=0.92, r2=0.85, r2cv=0.75
• r2 cv>0.6: The model is fairly good.
• 0.4<r2cv<0.6: The model is questionable.
• r2 cv<0.4: The model is poor.
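The three bands above can be applied directly to the cross-validated values reported in this study. In this small sketch the function name is hypothetical, and because the list leaves the boundary values 0.4 and 0.6 unassigned, placing them in the lower band is an assumption.

```python
def rate_model(r2cv: float) -> str:
    """Classify a model by its cross-validated r2 using the bands listed above."""
    if r2cv > 0.6:
        return "fairly good"
    if r2cv > 0.4:          # assumption: exactly 0.6 is treated as questionable
        return "questionable"
    return "poor"           # assumption: exactly 0.4 is treated as poor

# Cross-validated values quoted in the text.
reported = {
    "MLR, whole data set": 0.278,
    "PLS, whole data set": 0.578,
    "MLR, final model": 0.75,
    "PLS, final model": 0.78,
}
ratings = {label: rate_model(value) for label, value in reported.items()}
for label, rating in ratings.items():
    print(f"{label}: r2cv = {reported[label]} -> {rating}")
```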
The above equation shows a good correlation coefficient (r) of 0.92 between the descriptors and PDE-5 inhibitory activity. The squared correlation coefficient (r2) of 0.85 explains 85% of the variance in biological activity. The cross-validated squared correlation coefficient of this model was 0.75, which shows its good internal predictive power. The low standard error (s=0.38) indicates the accuracy of the statistical fit. To assess the significance of the individual descriptors, a t-test was performed. The number of hydrogen bond acceptors (subst. 2) has the largest t-value in magnitude (-8.5829), showing the significance of this descriptor in the developed model. The calculated t-values for all descriptors are given in Table 4. The predictive ability of the model was further validated using the test set of 6 compounds, which was not included during the development of the model. The r2 value for the test set was 0.66, which supports the robustness of the model. A plot of actual against predicted biological activity for the training and test sets of the MLR model is depicted in Figure 1.
|Descriptor Name||Jackknife SE||Covariance SE||t-value|
|Total dipole (whole molecule)||0.16423||0.087308||-4.0585|
|Lipole Z Component (whole molecule)||0.024258||0.023143||-4.2688|
|Ipso atom E-state index (subst. 3)||0.39903||0.691||4.2246|
|Number of H-bond Acceptors||0.25861||0.17801||-8.5829|
Table 4: The t-test values, Jackknife SE and Covariance SE values of the descriptors used for regression analysis.
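The text does not state how the t-values were obtained. As a hedged check, dividing each coefficient of equation 1 by its Covariance SE from Table 4 reproduces the tabulated t-values to about three decimal places, which suggests (but does not prove) that this ratio is the definition used. The variable names below are illustrative.

```python
# Coefficients from equation 1 and Covariance SE / t-values from Table 4.
rows = {
    "Total dipole (whole molecule)":       (-0.35434085, 0.087308, -4.0585),
    "Lipole Z component (whole molecule)": (-0.098792322, 0.023143, -4.2688),
    "Ipso atom E-state index (subst. 3)":  (2.9191632, 0.691, 4.2246),
    "Number of H-bond acceptors":          (-1.5278097, 0.17801, -8.5829),
}

# Assumed definition: t = coefficient / standard error of the coefficient.
t_values = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, (coef, se, reported_t) in rows.items():
    print(f"{name}: computed t = {t_values[name]:.4f}, reported t = {reported_t}")
```

The same ratio with the Jackknife SE gives different values, so the tabulated t-values appear to be based on the Covariance SE column.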
The best linear model developed by MLR was subjected to PLS analysis to aid model interpretability. The r2 values for the test set with MLR and PLS are comparable (0.66 and 0.65, respectively), which reflects the predictive ability of the model. A plot of actual against predicted values for the training and test sets of the PLS model is shown in Figure 2. The equation generated by PLS analysis is shown in equation 2.
Y = -0.31858131 × X1 - 0.10244234 × X2 + 2.9151349 × X3 - 1.5390098 × X4 - 6.0701838 (2)
Where X1=Total dipole moment (whole molecule), X2=Lipole Z component (whole molecule), X3=Ipso atom E-state index (subst. 3), X4=Number of H-bond acceptors (subst. 2)
Statistical significance=1.1644, residual sum of squares=3.5602, predictive sum of squares=5.4425, r2cv=0.78, r2=0.85
We employed the feed-forward neural network (FFNN) functionality, which undergoes supervised training by back-propagation of error. The four descriptors were used as inputs to the neural network, and the output was the log 1/IC50 value. The data were divided into training and test patterns (in this case 50% was used for training and 50% for testing). The statistics obtained for the PDE-5 inhibitor data were: net configuration 4-2-1 (4 input nodes, 2 processing nodes, 1 output node), with r2=0.89 for the training set and r2=0.63 for the test set. A plot of log 1/IC50 against the predicted values for the training and test sets of the ANN is shown in Figure 3. The dependency plots of the ANN revealed that total dipole moment (whole molecule) and lipole Z component (whole molecule) correlated negatively with biological activity, Ipso atom E-state index (subst. 3) correlated positively, and number of H-bond acceptors (subst. 2) correlated negatively, as shown in Figure 4. The differences between actual and predicted values for the training and test sets of MLR, PLS and ANN are shown in Tables 5 and 6.
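The PLS statistics above can be cross-checked against each other. This sketch assumes the usual definitions r2 = 1 - RSS/TSS and r2cv = 1 - PRESS/TSS (TSAR may define them slightly differently) and asks whether the reported sums of squares imply the same total sum of squares (TSS); the function name is hypothetical.

```python
# Values reported with equation 2 (PLS model).
rss, press = 3.5602, 5.4425   # residual and predictive sums of squares
r2, r2cv = 0.85, 0.78         # reported to two decimal places

def implied_tss(error_sum: float, coefficient: float) -> float:
    """Total sum of squares implied by coefficient = 1 - error_sum / TSS."""
    return error_sum / (1.0 - coefficient)

tss_from_r2 = implied_tss(rss, r2)
tss_from_r2cv = implied_tss(press, r2cv)
relative_gap = abs(tss_from_r2 - tss_from_r2cv) / tss_from_r2cv

print(f"TSS implied by r2:   {tss_from_r2:.2f}")
print(f"TSS implied by r2cv: {tss_from_r2cv:.2f}")
print(f"relative gap:        {relative_gap:.1%}")
```

The two estimates agree to within a few percent, which is about what rounding r2 and r2cv to two decimals would allow, so the reported values are mutually consistent under these definitions.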
|S. No.||Actual activity, -log IC50 (nM)|
Table 5: Actual and predicted IC50 values of training dataset of MLR, PLS and ANN.
|Compound Name||Actual activity, -log IC50 (nM)|
Table 6: Actual and predicted IC50 values of test set of MLR, PLS and ANN.
Figure 4: Dependency graph of Neural analysis illustrating correlation between descriptor and actual activity. (a) total dipole moment (whole molecule) used to train the neural network architecture versus the actual activity data. (b) lipole Z component (whole molecule) used to train the neural network architecture versus the actual activity data. (c) Ipso atom E-state index (subst. 3) used to train the neural network architecture versus the actual activity data. (d) number of hydrogen bond acceptors (subst. 2) used to train the neural network architecture versus the actual activity data.
This study revealed that the descriptors total dipole moment (whole molecule), lipole z component (whole molecule) and hydrogen bond acceptor (subst. 2) are negatively correlated with biological activity, whereas Ipso atom E-state index (subst. 3) is positively correlated with biological activity (Table 7).
|Name of Compound||Biological activity (IC50 (µM))||Total dipole moment (whole molecule)||Lipole z component (whole molecule)||Ipso atom E-state (subst. 3)||Hydrogen bond acceptor (subst. 2)|
Table 7: Correlation of biological activity of active and inactive molecules with all four descriptors.
Total dipole moment (whole molecule)
Total dipole moment describes favorable electronic interactions between the drug molecule and the active site. Biological activity increases as the total dipole moment of the whole molecule decreases. This is illustrated by the most active compound (50) and the least active compound (47): the total dipole moment is lower in compound 50 (1.80) than in compound 47 (2.13).
Lipole z component (whole molecule)
Lipole z component measures the lipophilic distribution along the axis defined by the bond to the point of attachment. Biological activity increases as the lipophilic distribution of the whole molecule decreases. A small lipole z value indicates a lower distribution of lipophilic groups distant from the point of attachment. In the present study, the most active compound (50) showed the minimum value of the lipole z component (-4.17), compared with 1.28 for the least active compound (47).
Ipso atom E-state (subst. 3)
Ipso atom E-state is a topological descriptor that explains the steric property of the molecule and non-covalent intermolecular interactions. Biological activity increases as the Ipso atom E-state (subst. 3) increases. The value for the most active compound (50) is higher (1.67) than that for the least active compound (47) (1.63).
Hydrogen bond acceptor (subst. 2)
Hydrogen bond acceptor (subst. 2) is negatively correlated with biological activity, so increasing the number of hydrogen bond acceptor groups on subst. 2 decreases the PDE-5 inhibitory activity.
The QSAR methods MLR, PLS and ANN were employed to study the PDE-5 inhibitory activity of tetracyclic guanine derivatives. The aim of the present study was to build QSAR models that provide a good correlation between biological activity and physicochemical parameters with good predictive ability. The linear regression model was found to be statistically valid, and the PLS technique was used to investigate the effect of each descriptor in the model. The non-linear ANN models confirmed the importance of the descriptors. The model indicates that overall PDE-5 inhibition can be improved by decreasing the total dipole moment of the whole molecule, the lipophilicity of the whole molecule and the number of hydrogen bond acceptors at substitution 2. Better PDE-5 inhibition can also be achieved by increasing the steric property of substitution 3, which may enhance the binding affinity of PDE-5 inhibitors for their receptor. These results may be helpful in the design of potent and selective PDE-5 inhibitors of clinical utility. In summary, the findings of the present study may be helpful in identifying druggable, potent PDE-5 inhibitors that could be developed into potential antihypertensive drugs.
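As a worked illustration of these trends, the coefficients of equation 1 can be combined with the descriptor values quoted above for the most active compound (50) and the least active compound (47). The hydrogen bond acceptor counts of the two compounds are not given in the text, so that term is left out here; this is an assumption, and the result is only the partial difference in predicted activity from the other three descriptors.

```python
# Coefficients of equation 1 for the three descriptors with quoted values.
coefficients = {
    "total dipole moment": -0.35434085,
    "lipole z component": -0.098792322,
    "ipso atom E-state (subst. 3)": 2.9191632,
}

# Descriptor values quoted in the text for compounds 50 and 47.
compound_50 = {"total dipole moment": 1.80, "lipole z component": -4.17,
               "ipso atom E-state (subst. 3)": 1.67}
compound_47 = {"total dipole moment": 2.13, "lipole z component": 1.28,
               "ipso atom E-state (subst. 3)": 1.63}

# Contribution of each descriptor to Y(50) - Y(47); the H-bond acceptor term
# is omitted because its values are not reported in the text.
contributions = {
    name: coef * (compound_50[name] - compound_47[name])
    for name, coef in coefficients.items()
}
partial_difference = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
print(f"partial difference in predicted activity: {partial_difference:+.3f}")
```

All three contributions are positive, in line with compound 50 being the more active, and the lipole z term contributes most of the partial difference of about 0.77 log units.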
We are grateful to Prof. Aditya Shastri, Vice Chancellor, Banasthali University, India for providing necessary computational facilities for the study.