Results 1 - 20 of 30
1.
Support Care Cancer ; 31(3): 178, 2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36809570

ABSTRACT

INTRODUCTION: Using GWAS data derived from a large collaborative trial (ECOG-5103), we identified a cluster of 267 SNPs which predicted CIPN in treatment-naive patients, as reported in Part 1 of this study. To assess the functional and pathological implications of this set, we identified collective gene signatures and evaluated the informational value of those signatures in defining CIPN's pathogenesis. METHODS: In Part 1, we analyzed GWAS data derived from ECOG-5103, first identifying those SNPs that were most strongly associated with CIPN using Fisher's ratio. After identifying those SNPs which differentiated CIPN-positive from CIPN-negative phenotypes, we ranked them in order of their discriminatory power to produce a cluster of SNPs which provided the highest predictive accuracy using leave-one-out cross-validation (LOOCV). An uncertainty analysis was included. Using the best predictive SNP cluster, we performed gene attribution for each SNP using the NCBI Phenotype-Genotype Integrator and then assessed functionality by applying GeneAnalytics, Gene Set Enrichment Analysis, and PCViz. RESULTS: Using aggregate data derived from the GWAS, we identified a 267-SNP cluster which was associated with a CIPN+ phenotype with an accuracy of 96.1%. We could attribute 173 genes to the 267-SNP cluster. Six long intergenic non-protein-coding genes were excluded. Ultimately, the functional analysis was based on 138 genes. Of the 17 pathways identified by the GeneAnalytics (GA) software, the irinotecan pharmacokinetic pathway had the highest score. Highly matching gene ontology attributions included flavone metabolic process, flavonoid glucuronidation, xenobiotic glucuronidation, nervous system development, UDP glycosyltransferase activity, retinoic acid binding, protein kinase C binding, and glucuronosyltransferase activity. Gene Set Enrichment Analysis (GSEA) GO terms identified neuron-associated genes as most significant (p = 5.45e-10). Consistent with the GA output, flavone- and flavonoid-associated terms and glucuronidation were noted, as were GO terms associated with neurogenesis. CONCLUSION: The application of functional analyses to phenotype-associated SNP clusters provides an independent validation step in assessing the clinical meaningfulness of GWAS-derived data. Functional analyses following gene attribution of a CIPN-predictive SNP cluster identified pathways, gene ontology terms, and a network which were consistent with a neuropathic phenotype.


Subject(s)
Neoplasms , Peripheral Nervous System Diseases , Humans , Polymorphism, Single Nucleotide , Genome-Wide Association Study , Taxoids/adverse effects , Peripheral Nervous System Diseases/chemically induced , Neoplasms/drug therapy
2.
Support Care Cancer ; 31(2): 139, 2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36707490

ABSTRACT

BACKGROUND: Chemotherapy-induced peripheral neuropathy (CIPN) is a common toxicity of taxanes for which there is no effective intervention. Genomic CIPN risk determination has yielded promising but inconsistent results. The present study assessed the utility of a collective SNP cluster identified using novel analytics to describe taxane-associated CIPN risk. METHODS: We analyzed GWAS data derived from ECOG-5103, first identifying SNPs that were most strongly associated with CIPN using Fisher's ratio (FR). We then rank-ordered those SNPs which discriminated CIPN-positive (CIPN+) from CIPN-negative phenotypes based on their discriminatory power and developed the cluster of SNPs which provided the highest predictive accuracy using leave-one-out cross-validation (LOOCV). RESULTS: Using aggregated genotype data obtained from the previously reported ECOG-5103 clinical trial (in which two different arrays were used, HumanOmniExpress (727,227 SNPs) and HumanOmni1-Quad1 (1,131,857 SNPs)), we identified a 267-SNP cluster which was associated with a CIPN+ phenotype with an accuracy of 96.1%. CONCLUSIONS: A cluster of SNPs was identified which prospectively discriminated patients most likely to develop symptomatic CIPN following taxane exposure as part of a breast cancer chemotherapy regimen. Validation using an independent patient cohort should be performed.
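The selection procedure in the METHODS has three steps: rank SNPs by Fisher's ratio, grow the top-ranked cluster one SNP at a time, and keep the cluster size with the best LOOCV accuracy. The Python sketch below illustrates those steps. It is not the authors' pipeline and rests on several assumptions. The abstract does not name the classifier used inside LOOCV, so a nearest-centroid rule is assumed. Genotypes are assumed to be coded 0/1/2. The toy matrix and the helper names (`fisher_ratio`, `rank_snps`, `best_cluster`, etc.) are hypothetical. As described, the ranking is computed once on all samples. Re-ranking inside each LOOCV fold would give a less optimistic accuracy estimate.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Fisher's ratio (mu_pos - mu_neg)^2 / (var_pos + var_neg) for one SNP."""
    pos = [v for v, t in zip(values, labels) if t == 1]
    neg = [v for v, t in zip(values, labels) if t == 0]
    num = (mean(pos) - mean(neg)) ** 2
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        # Constant within each class: perfectly separating if the means differ.
        return float("inf") if num > 0 else 0.0
    return num / denom


def rank_snps(X, y):
    """Return SNP column indices sorted by decreasing Fisher's ratio."""
    scores = [fisher_ratio([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def nearest_centroid(train_X, train_y, x, snps):
    """Assumed classifier: assign x to the class whose centroid is closest."""
    def sq_dist(label):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        return sum((x[j] - mean(r[j] for r in rows)) ** 2 for j in snps)
    return min((0, 1), key=sq_dist)


def loocv_accuracy(X, y, snps):
    """Leave-one-out cross-validation accuracy using only the given SNPs."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nearest_centroid(train_X, train_y, X[i], snps) == y[i]
    return hits / len(X)


def best_cluster(X, y, max_size):
    """Grow the top-ranked cluster one SNP at a time; keep the most accurate size."""
    order = rank_snps(X, y)
    best_k, best_acc = 1, -1.0
    for k in range(1, max_size + 1):
        acc = loocv_accuracy(X, y, order[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return order[:best_k], best_acc


# Hypothetical toy data: 8 patients x 4 SNPs (0/1/2 genotypes), 1 = CIPN+.
X = [[2, 0, 1, 1], [2, 1, 0, 2], [1, 0, 2, 0], [2, 0, 1, 1],
     [0, 1, 1, 0], [0, 2, 0, 2], [1, 2, 2, 1], [0, 1, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
snps, acc = best_cluster(X, y, max_size=4)
print(snps, acc)  # [0, 1] 1.0
```

In this toy case the top-ranked SNP alone reaches 0.75 LOOCV accuracy. Adding the second SNP reaches 1.0, so the search stops growing the cluster there.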


Subject(s)
Antineoplastic Agents , Breast Neoplasms , Peripheral Nervous System Diseases , Taxoids , Humans , Antineoplastic Agents/adverse effects , Genome-Wide Association Study , Peripheral Nervous System Diseases/chemically induced , Peripheral Nervous System Diseases/genetics , Polymorphism, Single Nucleotide , Taxoids/adverse effects , Clinical Trials as Topic , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Female
3.
Int J Mol Sci ; 23(21)2022 Oct 26.
Article in English | MEDLINE | ID: mdl-36361765

ABSTRACT

Noise is a basic ingredient in data, since observed data are always contaminated by unwanted deviations (i.e., noise), which, in the case of overdetermined systems (with more data than model parameters), cause the corresponding linear system of equations to have an imperfect solution. In addition, in the case of highly underdetermined parameterization, noise can be absorbed by the model, generating spurious solutions. This is a very undesirable situation that might lead to incorrect conclusions. We present a mathematical formalism based on inverse problem theory combined with artificial intelligence methodologies to perform an enhanced sampling of noisy biomedical data and improve the finding of meaningful solutions. Random sampling methods fail for high-dimensional biomedical problems; sampling approaches based on smart model parameterizations, forward surrogates, and parallel computing are better suited to such problems. We applied these methods to several important biomedical problems, such as phenotype prediction and predicting the effects of protein mutations, i.e., whether a given single-residue mutation is neutral or deleterious, causing a disease. We also applied these methods to de novo drug discovery and drug repositioning (repurposing) through the enhanced exploration of huge chemical space. The purpose of these novel methods, which address the problem of noise and uncertainty in biomedical data, is to find new therapeutic solutions, perform drug repurposing, and accelerate and optimize drug discovery, thus reestablishing homeostasis. Finding the right target, the right compound, and the right patient are the three bottlenecks to running successful clinical trials based on the correct analysis of preclinical models. Artificial intelligence can provide a solution to these problems, bearing in mind that, as in any modeling procedure in data analysis, the character of the data restricts the quality of the prediction. The use of simple and plain methodologies is crucial to tackling these important and challenging problems, particularly drug repositioning/repurposing in rare diseases.


Subject(s)
Artificial Intelligence , Drug Repositioning , Uncertainty , Drug Repositioning/methods , Drug Discovery/methods , Phenotype
4.
Comput Biol Med ; 149: 106029, 2022 10.
Article in English | MEDLINE | ID: mdl-36067633

ABSTRACT

BACKGROUND: Understanding the transcriptomic response to SARS-CoV-2 infection is of the utmost importance for designing diagnostic tools that predict the severity of the infection. METHODS: We have performed a deep sampling analysis of the viral transcriptomic data oriented towards drug repositioning. The methodology uses different samplers and rests on the principle of biological invariance: the pathways altered by the disease should be independent of the algorithm used to unravel them. RESULTS: The transcriptomic analysis of the altered pathways reveals a distinctive inflammatory response and potential side effects of infection. Virus replication causes, in some cases, acute respiratory distress syndrome in the lungs, and affects other organs such as the heart, brain, and kidneys. Therefore, drugs repositioned to fight COVID-19 should target not only the interferon signalling pathway and the control of inflammation, but also the altered genetic pathways related to the side effects of infection. We also show via Principal Component Analysis that the transcriptome signatures differ from those of influenza and RSV. The gene COL1A1, which controls collagen production, seems to play a key role in the regulation of the immune system. Additionally, other small-scale signature genes appear to be involved in the development of other COVID-19 comorbidities. CONCLUSIONS: Transcriptome-based drug repositioning offers a possible fast-track antiviral therapy for COVID-19 patients. It calls for additional clinical studies using FDA-approved drugs for patients with increased susceptibility to infection and with serious medical complications.


Subject(s)
COVID-19 Drug Treatment , COVID-19 , SARS-CoV-2 , Antiviral Agents/pharmacology , Antiviral Agents/therapeutic use , COVID-19/genetics , Drug Repositioning , Humans , Interferons , Transcriptome/genetics
5.
Int J Mol Sci ; 23(9)2022 Apr 22.
Article in English | MEDLINE | ID: mdl-35563034

ABSTRACT

Big data in health care is a fast-growing field and a new paradigm that is transforming case-based studies into large-scale, data-driven research. As big data is dependent on the advancement of new data standards, technology, and relevant research, the future development of big data applications holds foreseeable promise in the modern-day health care revolution. Enormously large, rapidly growing collections of biomedical omics data (genomics, proteomics, transcriptomics, metabolomics, glycomics, etc.) and clinical data create major challenges and opportunities for their analysis and interpretation and open new computational gateways to address these issues. The design of new robust algorithms suited to properly analyzing this big data while taking into account individual variability in genes has enabled the creation of precision (personalized) medicine. We reviewed and highlighted the significance of big data analytics for personalized medicine and health care, focusing mostly on machine learning perspectives on personalized medicine, genomic data models with respect to personalized medicine, the application of data mining algorithms for personalized medicine, and the challenges we currently face in big data analytics.


Subject(s)
Data Science , Precision Medicine , Big Data , Delivery of Health Care , Genomics , Precision Medicine/methods
6.
Comput Math Methods Med ; 2021: 5556433, 2021.
Article in English | MEDLINE | ID: mdl-34422090

ABSTRACT

The prediction of the dynamics of the COVID-19 outbreak and of the corresponding needs of the health care system (COVID-19 patients' admissions, the number of critically ill patients, the need for intensive care units, etc.) is based on the combination of a limited growth model (Verhulst model) and a short-term predictive model that allows predictions to be made for the following day. In both cases, an uncertainty analysis of the prediction is performed by identifying the set of equivalent models that fit the historical data with the same accuracy. This set of models provides the posterior distribution of the parameters of the predictive model that fits the historical series. It can be extrapolated to the same analyzed time series (e.g., the number of infected individuals per day) or to another correlated time series of interest, e.g., to predict the number of patients admitted to urgent care units, the number of critically ill patients, or the total number of admissions, which are directly related to health needs. These models can be regionalized; that is, predictions can be made at the local level if the data are disaggregated. We show that the Verhulst and the Gompertz models provide similar results and can also be used to monitor and predict new outbreaks. However, the Verhulst model seems to be easier to interpret and to use.
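A small Python sketch can illustrate the idea of sampling equivalent models. It is only an illustration under assumptions, not the authors' implementation. The abstract does not say how the equivalent models were sampled, so a plain grid search is used. A model counts as "equivalent" if its root-mean-square misfit lies within a fixed absolute tolerance of the best misfit. The data are synthetic counts generated from a logistic curve. The Verhulst model is written in its usual logistic form N(t) = K / (1 + exp(-r (t - t0))), with carrying capacity K, growth rate r, and midpoint t0. The spread of next-day predictions across the equivalent models then serves as a rough measure of prediction uncertainty.

```python
import math
from itertools import product
from statistics import median


def verhulst(t, K, r, t0):
    """Logistic (Verhulst) cumulative curve: capacity K, rate r, midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def rmse(params, days, observed):
    """Root-mean-square misfit between the model and the observed series."""
    return math.sqrt(sum((verhulst(t, *params) - obs) ** 2
                         for t, obs in zip(days, observed)) / len(days))


def equivalent_models(days, observed, grid, tol):
    """Keep every parameter set whose misfit is within tol of the best one (assumed rule)."""
    scored = [(rmse(p, days, observed), p) for p in grid]
    best = min(err for err, _ in scored)
    return [p for err, p in scored if err <= best + tol]


def next_day_range(days, models):
    """Minimum, median, and maximum next-day prediction over the equivalent models."""
    preds = sorted(verhulst(days[-1] + 1, *p) for p in models)
    return preds[0], median(preds), preds[-1]


# Synthetic cumulative cases (not real data), generated with K=1000, r=0.3, t0=12.
days = list(range(21))
observed = [round(verhulst(t, 1000, 0.3, 12)) for t in days]
grid = list(product(range(800, 1201, 50), [0.2, 0.25, 0.3, 0.35, 0.4], range(8, 17)))
models = equivalent_models(days, observed, grid, tol=5.0)
low, mid, high = next_day_range(days, models)
print(f"{len(models)} equivalent models; day-21 prediction {low:.0f} to {high:.0f} (median {mid:.0f})")
```

The same set of parameter vectors could be pushed through a correlated series (e.g., admissions) instead of the fitted one, in line with the extrapolation described above. That mapping is not specified in the abstract.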


Subject(s)
COVID-19/epidemiology , Models, Biological , Pandemics , SARS-CoV-2 , COVID-19/transmission , Computational Biology , Health Services Needs and Demand , Humans , Mathematical Concepts , Models, Statistical , Pandemics/statistics & numerical data , Spain/epidemiology , Time Factors
7.
Int J Hyg Environ Health ; 234: 113723, 2021 05.
Article in English | MEDLINE | ID: mdl-33690094

ABSTRACT

An outbreak of COVID-19 occurred from February 2020 onwards in almost all European countries, including Spain. This study covers the correlation found between weather variables (maximum temperature, minimum temperature, mean temperature, atmospheric pressure, daily rainfall, and daily sun hours) and coronavirus propagation in Spain. A strong relationship is found when correlating the virus spread with the mean temperature, minimum temperature, and atmospheric pressure in different Spanish provinces. In this analysis, we used the ratio of PCR-confirmed COVID-19 positives to population size. A linear regression model using the mean temperature is implemented. Moreover, an analysis of variance is used to confirm the influence of mean temperature on the spread of the virus. As a second measure of the COVID-19 outbreak, we used the results of the antibody tests carried out in Spain, which provide an estimate of the herd immunity achieved. Based on this analysis, the size of the asymptomatic population is estimated. All these results exhibit significant correlation with weather variables. The most affected provinces were Soria, Segovia, and Ciudad Real, which are the coldest. On the opposite side, places such as Southern Spain and the Balearic and Canary Islands showed a lower rate of spread. This might be related to the warmer climate and the insularity of these islands. Besides, coastal influence and daily sun hours might also contribute to the lower rates in the eastern and western regions of Spain. This analysis provides deeper insight into the influence of weather variables on the spread of COVID-19 in Spain.
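The Python sketch below shows a least-squares fit of the positive ratio against mean temperature, together with Pearson's correlation coefficient. The province values are invented for illustration and do not come from the study. The abstract does not specify the software or any covariates, so this shows only a simple one-predictor ordinary least-squares line.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical provinces: mean temperature (deg C) vs PCR positives per 1,000 inhabitants.
temp = [4.1, 6.3, 9.8, 12.5, 15.2, 18.0]
positives = [9.5, 8.1, 6.0, 4.9, 3.2, 2.4]
slope, intercept = fit_line(temp, positives)
print(f"positives = {slope:.3f} * temp + {intercept:.3f}, r = {pearson_r(temp, positives):.3f}")
```

A negative slope and r here reflect only the invented numbers. Correlation alone does not separate temperature from other regional factors, which is why the study adds further analyses such as ANOVA.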


Subject(s)
COVID-19/epidemiology , Climate , Disease Outbreaks/statistics & numerical data , Analysis of Variance , Humans , Linear Models , SARS-CoV-2 , Spain/epidemiology , Temperature , Weather
8.
Cancers (Basel) ; 13(1)2020 Dec 23.
Article in English | MEDLINE | ID: mdl-33374500

ABSTRACT

Artificial intelligence methods may help in unveiling information that is hidden in high-dimensional oncological data. Flow cytometry studies of haematological malignancies provide quantitative data with the potential to be used for the construction of response biomarkers. Many computational methods from the bioinformatics toolbox can be applied to these data, but they have not been exploited to their full potential in leukaemias, specifically in childhood B-cell Acute Lymphoblastic Leukaemia. In this paper, we analysed flow cytometry data obtained at diagnosis from 56 paediatric B-cell Acute Lymphoblastic Leukaemia patients from two local institutions. Our aim was to assess the prognostic potential of immunophenotypical marker expression intensity. We constructed classifiers based on Fisher's ratio to quantify differences between patients with relapsing and non-relapsing disease. We also correlated this with genetic information. The main result arising from the data was the association between underexpression of the marker CD38 and the probability of relapse.

9.
Int J Mol Sci ; 21(10)2020 May 19.
Article in English | MEDLINE | ID: mdl-32438758

ABSTRACT

We present the analysis of the defective genetic pathways of Late-Onset Alzheimer's Disease (LOAD) compared to Mild Cognitive Impairment (MCI) and Healthy Controls (HC) using different sampling methodologies. These algorithms sample the uncertainty space that is intrinsic to any kind of highly underdetermined phenotype prediction problem by looking for the minimum-scale signatures (header genes) corresponding to different random holdouts. The biological pathways can be identified by performing a posterior analysis of these signatures, established via cross-validation holdouts, and plugging the set of most frequently sampled genes into different ontological platforms. That way, the effect of helper genes, whose presence might be due to the high degree of underdeterminacy of these experiments and to data noise, is reduced. Our results suggest that common pathways for Alzheimer's disease and MCI are mainly related to viral mRNA translation, influenza viral RNA transcription and replication, gene expression, mitochondrial translation, and metabolism, with these results being highly consistent regardless of the comparative method. The cross-validated predictive accuracies achieved for the LOAD and MCI discriminations were 84% and 81.5%, respectively. The difference between LOAD and MCI could not be clearly established (74% accuracy). The most discriminatory genes of the LOAD-MCI discrimination are associated with proteasome-mediated degradation and G-protein signaling. Based on these findings, we also performed drug repositioning using the Dr. Insight package, proposing the following drug typologies: isoquinoline alkaloids, antitumor antibiotics, phosphoinositide 3-kinase (PI3K) and autophagy inhibitors, antagonists of the muscarinic acetylcholine receptor, and histone deacetylase inhibitors. We believe that the potential clinical relevance of these findings should be further investigated and confirmed with other independent studies.


Subject(s)
Alzheimer Disease/drug therapy , Alzheimer Disease/genetics , Drug Repositioning , Signal Transduction , Age of Onset , Case-Control Studies , Cognitive Dysfunction/genetics , Gene Regulatory Networks , Humans , Linear Models , Machine Learning , Phenotype
10.
Molecules ; 25(11)2020 May 26.
Article in English | MEDLINE | ID: mdl-32466409

ABSTRACT

We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) in protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper belongs to the category of template-based modeling. The algorithm performs a preselection of protein templates before constructing a lower-dimensional subspace via a regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive-regressive PSO (RR-PSO). The obtained structure is then projected onto a reduced space via singular value decomposition and further optimized via RR-PSO to carry out a structure refinement. The final structures are similar to those predicted by the best structure prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction problems related to high-dimensional optimization. It is also capable of sampling a wide range of conformational space due to the application of a regularized linear discriminant analysis, which allows us to expand the differences over a reduced basis set.
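The sampling step in this approach relies on particle swarm optimization. The Python sketch below shows a standard global-best PSO with an inertia weight. It minimizes a hypothetical stand-in objective in a 3-dimensional box that plays the role of the reduced coordinate space. It is not RR-PSO: the regressive-regressive variant uses its own parameter settings, which are not reproduced here. The inertia and acceleration values (w = 0.7, c1 = c2 = 1.5) are common textbook choices, assumed for illustration.

```python
import random


def pso_minimize(f, bounds, n_particles=20, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: each particle is pulled toward its own best and the swarm's best."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_energy(x):
    """Hypothetical stand-in for an energy function; minimum 0 at (1, 1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


best, value = pso_minimize(toy_energy, bounds=[(-5.0, 5.0)] * 3)
print([round(v, 3) for v in best], value)
```

Working in a low-dimensional reduced space is what keeps a swarm of a few dozen particles practical. The same loop in the full coordinate space of a protein would need far more evaluations.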


Subject(s)
Proteins/chemistry , Algorithms , Discriminant Analysis , Protein Folding , Protein Structure, Tertiary
11.
Pharmgenomics Pers Med ; 13: 105-119, 2020.
Article in English | MEDLINE | ID: mdl-32256101

ABSTRACT

The complexity of orphan diseases, which are those that do not have an effective treatment, together with the high dimensionality of the genetic data used for their analysis and the high degree of uncertainty in the understanding of the mechanisms and genetic pathways involved in their development, motivates the use of advanced artificial intelligence techniques and in-depth knowledge of molecular biology, which are crucial for finding plausible solutions in drug design, including drug repositioning. In particular, we show that the use of robust deep sampling methodologies of the altered genetics serves to obtain meaningful results and dramatically decreases the cost of research and development in drug design, with a very positive influence on the use of precision medicine and on patient outcomes. The target-centric approach and the use of strong prior hypotheses that are not matched against reality (disease genetic data) are undoubtedly the cause of the high number of drug design failures and attrition rates. Sampling and prediction under uncertain conditions cannot be avoided in the development of precision medicine.

12.
BMC Bioinformatics ; 21(Suppl 2): 89, 2020 Mar 11.
Article in English | MEDLINE | ID: mdl-32164540

ABSTRACT

BACKGROUND: Phenotype prediction problems are usually considered ill-posed, as the number of samples is very limited with respect to the number of scrutinized genetic probes. This fact complicates the sampling of the defective genetic pathways due to the high number of possible discriminatory genetic networks involved. In this research, we outline three novel sampling algorithms (the Fisher's ratio sampler, the Holdout sampler, and the Random sampler) used to identify, classify, and characterize the defective pathways in phenotype prediction problems, and apply each one to the analysis of genetic pathways involved in tumor behavior and outcomes of triple-negative breast cancers (TNBC). Altered biological pathways are identified using the most frequently sampled genes and are compared to those obtained via Bayesian Networks (BNs). RESULTS: The Random, Fisher's ratio, and Holdout samplers were more accurate and robust than BNs, while providing comparable insights about disease genomics. CONCLUSIONS: The three samplers tested are good alternatives to Bayesian Networks since they are less computationally demanding algorithms. Importantly, this analysis confirms the concept of "biological invariance", since the altered pathways should be independent of the sampling methodology and the classifier used for their inference. Nevertheless, Bayesian networks still need some modifications to sample the uncertainty space of phenotype prediction problems correctly, since the probabilistic parameterization of the uncertainty space is not unique and the use of the optimum network might falsify the pathway analysis.
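The Python sketch below shows a simplified version of the holdout idea: genes are tallied across many random splits, and the most frequently sampled ones are the candidates passed on to pathway analysis. It is a hypothetical illustration, not the published sampler. Each split simply keeps a fixed number (`top_k`) of genes with the highest Fisher's ratio on the training part. The samplers described in these abstracts instead search for minimum-scale signatures per holdout. The gene names and expression values are invented.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """(mu_1 - mu_0)^2 / (var_1 + var_0) for one gene and binary labels."""
    pos = [v for v, t in zip(values, labels) if t == 1]
    neg = [v for v, t in zip(values, labels) if t == 0]
    num = (mean(pos) - mean(neg)) ** 2
    denom = pvariance(pos) + pvariance(neg)
    return num / denom if denom > 0 else (float("inf") if num > 0 else 0.0)


def holdout_frequencies(X, y, genes, n_holdouts=100, train_fraction=0.75, top_k=2, seed=0):
    """Fraction of random splits in which each gene enters the top-k signature."""
    rng = random.Random(seed)
    pos_idx = [i for i, t in enumerate(y) if t == 1]
    neg_idx = [i for i, t in enumerate(y) if t == 0]
    counts = Counter()
    for _ in range(n_holdouts):
        # Stratified random subsample; in this simplified version the
        # held-out samples are not used to score the signature.
        train = (rng.sample(pos_idx, max(2, int(train_fraction * len(pos_idx))))
                 + rng.sample(neg_idx, max(2, int(train_fraction * len(neg_idx)))))
        train_y = [y[i] for i in train]
        scores = {g: fisher_ratio([X[i][j] for i in train], train_y)
                  for j, g in enumerate(genes)}
        counts.update(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return {g: counts[g] / n_holdouts for g in genes}


# Hypothetical expression matrix: 8 samples x 4 genes; 1 = case, 0 = control.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
X = [[5.0, 3.0, 1, 5], [5.2, 3.5, 3, 1], [4.9, 2.8, 2, 3], [5.1, 3.2, 4, 2],
     [1.0, 2.0, 2, 2], [1.1, 2.4, 4, 5], [0.9, 1.9, 1, 1], [1.2, 2.2, 3, 4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
freq = holdout_frequencies(X, y, genes)
print(freq)
```

Genes that are selected in nearly every split are the stable part of the signature. Genes that appear only occasionally behave like the "helper genes" that the sampling is meant to filter out.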


Subject(s)
Algorithms , Triple Negative Breast Neoplasms/pathology , Bayes Theorem , Databases, Genetic , Female , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Neoplasm Metastasis , Phenotype , Survival Analysis , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/mortality
13.
Int J Mol Sci ; 20(19)2019 Sep 21.
Article in English | MEDLINE | ID: mdl-31546608

ABSTRACT

We present the analysis of defective pathways in multiple myeloma (MM) using two recently developed sampling algorithms of the biological pathways: the Fisher's ratio sampler and the holdout sampler. We performed retrospective analyses of different gene expression datasets concerning different aspects of the disease, such as the differences between bone marrow stromal cells in MM and healthy controls (HC), the gene expression profiling of CD34+ cells in MM and HC, the difference between hyperdiploid and non-hyperdiploid myelomas, and the prediction of the chromosome 13 deletion, to provide a deeper insight into the molecular mechanisms involved in the disease. Our analysis has shown the importance of different altered pathways related to glycosylation, infectious disease, immune system response, different aspects of metabolism, DNA repair, protein recycling, and regulation of the transcription of genes involved in the differentiation of myeloid cells. The main differences in genetic pathways between hyperdiploid and non-hyperdiploid myelomas are related to infectious disease, immune system response, and protein recycling. Our work provides new insights into the genetic pathways involved in this complex disease and proposes novel targets for future therapies.


Subject(s)
Bone Marrow Cells/metabolism , Chromosomes, Human, Pair 13/genetics , Hematopoietic Stem Cells/metabolism , Multiple Myeloma/metabolism , Algorithms , Aneuploidy , Antigens, CD34/immunology , Chromosomes, Human, Pair 13/metabolism , Gene Expression Profiling , Hematopoietic Stem Cells/immunology , Humans , Multiple Myeloma/genetics , Multiple Myeloma/immunology , Retrospective Studies , Signal Transduction , Stromal Cells/metabolism
14.
Mech Ageing Dev ; 182: 111129, 2019 09.
Article in English | MEDLINE | ID: mdl-31445068

ABSTRACT

Sarcopenia is an age-related multifactorial process that involves several biological mechanisms, whose specific contribution and interplay are still unknown. The present study proposes prognostic networks based on machine learning approaches to unravel the interplay among the biological mechanisms mainly involved in the development of Sarcopenia. After analyzing 114 biological and clinical variables in adults older than 70 years, and using all the biological prognostic networks detected by machine learning with accuracy higher than 82%, we designed a consensus classifier based on majority vote that improves the predictive accuracy for Sarcopenia to 91%. Additionally, we applied logistic regression analysis to propose the interplay among the most discriminative biological variables of Sarcopenia: anthropometry, body composition, functional performance of the lower limbs, systemic oxidative stress, presence of depression, and medication for the digestive system based on proton-pump inhibitors. Our data also demonstrate that, besides a loss of muscle mass, impairments in the functional performance of the lower limbs are more relevant to the development of Sarcopenia than those affecting muscle strength.
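The consensus step, a majority vote over several classifiers, is simple enough to show directly. In this hypothetical Python sketch the labels are invented. It only illustrates how three classifiers that each make a different error can be corrected by voting, and it says nothing about how the prognostic networks in the study were built. With an odd number of voters and two classes, ties cannot occur.

```python
from collections import Counter


def majority_vote(predictions):
    """Consensus label per subject; ties go to the label encountered first
    (Counter.most_common orders equal counts by first occurrence)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(pred, truth):
    """Fraction of subjects whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical outputs of three prognostic networks for five subjects (1 = sarcopenic).
truth = [1, 0, 1, 1, 0]
networks = [
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
consensus = majority_vote(networks)
print([accuracy(n, truth) for n in networks], accuracy(consensus, truth))
# [0.8, 0.8, 0.8] 1.0
```

Voting helps only to the extent that the individual classifiers make different mistakes. If all networks fail on the same subjects, the consensus fails there too.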


Subject(s)
Machine Learning , Sarcopenia , Aged , Aged, 80 and over , Female , Humans , Male , Prognosis , Sarcopenia/diagnosis , Sarcopenia/metabolism , Sarcopenia/pathology
15.
Expert Opin Drug Discov ; 14(8): 769-777, 2019 08.
Article in English | MEDLINE | ID: mdl-31140873

ABSTRACT

Introduction: Drug discovery is the process through which potential new compounds are identified by means of biology, chemistry, and pharmacology. Due to the high complexity of genomic data, AI techniques are increasingly needed to help reduce this complexity and aid the adoption of optimal decisions. Phenotypic prediction, in which sets of genes that predict a given phenotype are determined, is of particular use to drug discovery and precision medicine. Phenotypic prediction is an underdetermined problem, given that the number of monitored genetic probes markedly exceeds the number of collected samples (from patients). This imbalance creates ambiguity in the characterization of the biological pathways that are responsible for disease development. Areas covered: In this paper, the authors present AI methodologies that perform a robust deep sampling of altered genetic pathways to locate new therapeutic targets, assist in drug repurposing, and speed up and optimize the drug selection process. Expert opinion: AI is a potential solution to a number of drug discovery problems, though one should bear in mind that, as in any modeling task in data science, the quality of the data limits the overall quality of the prediction. The use of transparent methodologies is crucial, particularly in drug repositioning/repurposing in rare diseases.


Subject(s)
Artificial Intelligence , Drug Discovery/methods , Drug Repositioning , Humans , Phenotype , Precision Medicine/methods
16.
J Mol Model ; 25(3): 79, 2019 Feb 27.
Article in English | MEDLINE | ID: mdl-30810816

ABSTRACT

We discuss the relationship between the problem of predicting protein tertiary structure from the amino acid sequence and uncertainty analysis. The algorithm presented in this paper belongs to the category of decoy-based modeling, where different known protein models are used to establish a low-dimensional space via principal component analysis. The low-dimensional space is used to perform an energy optimization via a family of very explorative particle swarm optimizers to find the global minimum. The aim of this procedure is to obtain a representative sample of the nonlinear equivalent region, that is, protein models whose energy is lower than a certain energy bound. The posterior analysis of this family provides very valuable information about the backbone structure of the native conformation and its possible alternate states. This methodology has the advantage of being simple and fast and can help refine the tertiary protein structure. We comprehensively illustrate the performance of our algorithm on one protein from the CASP-9 protein structure prediction experiment. We also provide a theoretical analysis of the energy landscape found in the tertiary structure protein inverse problem, explaining why model reduction techniques (principal component analysis in this case) serve to alleviate the ill-posed character of this high-dimensional optimization problem. In addition, we expand the computational benchmark with a summary of other CASP-9 proteins in the Appendix.


Subject(s)
Caspase 9/chemistry , Computational Biology/methods , Algorithms , Amino Acid Sequence , Computer Simulation , Models, Molecular , Principal Component Analysis , Protein Folding , Protein Structure, Tertiary , Proteins/chemistry , Uncertainty
17.
Biomolecules ; 10(1)2019 12 31.
Article in English | MEDLINE | ID: mdl-31906171

ABSTRACT

Accurate prediction of protein stability changes resulting from amino acid substitutions is of utmost importance in medicine to better understand which mutations are deleterious, leading to diseases, and which are neutral. Because wet-lab experiments on protein mutations are costly and time-consuming, and because the number of possible mutations is huge, computational methods that can accurately predict the effects of amino acid mutations are of the greatest importance. In this research, we present a robust methodology to predict the energy changes of proteins upon mutation. The proposed prediction scheme is based on a two-step algorithm: a Holdout Random Sampler followed by a neural network model for regression. The Holdout Random Sampler is used to analyze the energy change and its uncertainty, and to obtain a set of admissible energy changes, expressed as a cumulative distribution function. These values are then used to train a simple neural network model that can predict the energy changes. Results were blindly tested (validated) against experimental energy changes, giving Pearson correlation coefficients of 0.66 for Single Point Mutations and 0.77 for Multiple Point Mutations. These results confirm the success of our method, which outperforms the majority of previous studies in this field.


Subject(s)
Neural Networks, Computer , Protein Stability , Proteins/genetics , Amino Acids/chemistry , Amino Acids/genetics , Databases, Protein , Machine Learning , Point Mutation/genetics , Proteins/chemistry , Thermodynamics
18.
J Pain Res ; 11: 2981-2990, 2018.
Article in English | MEDLINE | ID: mdl-30538537

ABSTRACT

OBJECTIVES: Fibromyalgia syndrome (FMS) is a chronic and often debilitating condition that is characterized by persistent fatigue, pain, bowel abnormalities, and sleep disturbances. Currently, there are no definitive prognostic or diagnostic biomarkers for FMS. This study attempted to use a novel predictive algorithm to identify a group of genes whose differential expression discriminated individuals with an FMS diagnosis from healthy controls. METHODS: Secondary analysis of gene expression data from 28 women with FMS and 19 age- and race-matched healthy women. Discriminatory genes were identified using fold-change differential expression and Fisher's ratio (FR). The discriminatory accuracy of the differential expression of these genes was determined using leave-one-out cross-validation. Functional networks of the discriminating genes were described from the Ingenuity Knowledge Base. RESULTS: The small-scale signature contained 57 genes whose expression was highly discriminatory of the FMS diagnosis. The combination of these highly discriminatory genes with FR higher than 1.45 provided a leave-one-out cross-validation accuracy for the FMS diagnosis of 85.11%. The discriminatory genes were associated with 3 canonical pathways: hepatic stellate cell activation, oxidative phosphorylation, and airway pathology related to COPD. CONCLUSION: The discriminating genes, especially the 2 with the highest accuracy, are associated with mitochondrial function or oxidative phosphorylation and glutamate signaling. Further validation of the clinical utility of this finding is warranted.

19.
Transl Psychiatry ; 8(1): 110, 2018 05 30.
Article in English | MEDLINE | ID: mdl-29849049

ABSTRACT

Cancer-related fatigue (CRF) is a common burden in cancer patients, and little is known about its underlying mechanism. The primary aim of this study was to identify gene signatures predictive of post-radiotherapy fatigue in prostate cancer patients. We employed Fisher Linear Discriminant Analysis (LDA) to identify predictive genes using whole-genome microarray data from 36 men with prostate cancer. Ingenuity Pathway Analysis was used to determine functional networks of the predictive genes. Functional validation was performed using a T lymphocyte cell line, Jurkat E6.1. Cells were pretreated with a metabotropic glutamate receptor 5 (mGluR5) agonist (DHPG), antagonist (MPEP), or control (PBS) for 20 min before irradiation at 8 Gy in a Mark-1 γ-irradiator. NF-κB activation was assessed using an NF-κB/Jurkat/GFP Transcriptional Reporter Cell Line. LDA achieved 83.3% accuracy in predicting post-radiotherapy fatigue. "Glutamate receptor signaling" was the most significant (p = 0.0002) pathway among the predictive genes. Functional validation using Jurkat cells revealed clustering of mGluR5 receptors, as well as increased post-irradiation production of RANTES (regulated on activation, normal T cell expressed and secreted) in cells pretreated with DHPG, whereas inhibition of mGluR5 activity with MPEP decreased RANTES concentration after irradiation. DHPG pretreatment amplified irradiation-induced NF-κB activation, suggesting a role for mGluR5 in modulating T cell activation after irradiation. These results suggest that mGluR5 signaling in T cells may play a key role in the development of chronic inflammation resulting in fatigue and may contribute to individual differences in immune responses to radiation. Moreover, modulating mGluR5 provides a novel therapeutic option for treating CRF.


Subject(s)
Fatigue/etiology , NF-kappa B/metabolism , Prostatic Neoplasms/radiotherapy , Radiotherapy/adverse effects , Receptor, Metabotropic Glutamate 5/agonists , Receptor, Metabotropic Glutamate 5/antagonists & inhibitors , Aged , Genome-Wide Association Study , Humans , Jurkat Cells , Machine Learning , Male , Methoxyhydroxyphenylglycol/analogs & derivatives , Methoxyhydroxyphenylglycol/pharmacology , Middle Aged , Pyridines/pharmacology , Radiotherapy Dosage , T-Lymphocytes/metabolism , Transcriptome
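For the two-class case, Fisher LDA projects each sample onto w = S_w⁻¹(μ1 − μ0), where S_w is the pooled within-class scatter matrix, and classifies by comparing the projection with a threshold. The sketch below illustrates that computation with leave-one-out scoring. It assumes a midpoint threshold and adds a small ridge term to S_w (an assumption, since whole-genome microarray data have far more genes than the 36 samples, which makes S_w singular). It is not the study's pipeline, and the function names are hypothetical.

```python
import numpy as np

def fisher_lda_fit(X, y, ridge=1e-2):
    """Two-class Fisher LDA: w = (S_w + ridge*I)^-1 (mu1 - mu0).

    The ridge term is an assumption that keeps S_w invertible when
    features (genes) outnumber samples.
    """
    X1, X0 = X[y == 1], X[y == 0]
    mu1, mu0 = X1.mean(0), X0.mean(0)
    Sw = (X1 - mu1).T @ (X1 - mu1) + (X0 - mu0).T @ (X0 - mu0)
    w = np.linalg.solve(Sw + ridge * np.eye(X.shape[1]), mu1 - mu0)
    threshold = w @ (mu1 + mu0) / 2          # midpoint of the projected class means
    return w, threshold

def fisher_lda_predict(w, threshold, X):
    """Label 1 when the projection exceeds the threshold, else 0."""
    return (X @ w > threshold).astype(int)

def lda_loocv_accuracy(X, y, ridge=1e-2):
    """Leave-one-out accuracy: refit LDA without sample i, then classify sample i."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w, t = fisher_lda_fit(X[mask], y[mask], ridge)
        hits += int(fisher_lda_predict(w, t, X[i:i + 1])[0] == y[i])
    return hits / len(y)
```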
20.
J Bioinform Comput Biol ; 16(2): 1850005, 2018 04.
Article in English | MEDLINE | ID: mdl-29566640

ABSTRACT

We discuss the applicability of principal component analysis (PCA) for protein tertiary structure prediction from amino acid sequence. The algorithm presented in this paper belongs to the category of protein refinement models and involves establishing a low-dimensional space in which sampling (and optimization) is carried out via a particle swarm optimizer (PSO). The reduced space is found via PCA performed on a set of low-energy protein models previously found using different optimization techniques. A high-frequency term is added to this expansion by projecting the best decoy onto the PCA basis set and calculating the residual model. This term is aimed at providing high-frequency details in the energy optimization. The goal of this research is to analyze how the dimensionality reduction affects the prediction capability of the PSO procedure. For that purpose, different proteins from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments were modeled. In all cases, both the energy of the best decoy and the distance to the native structure decreased. Our analysis also shows how the predicted backbone structures of the native conformation and of alternative low-energy states vary with the PCA dimensionality. Generally speaking, the reconstruction can be successfully achieved with 10 principal components and the high-frequency term. We also provide a computational analysis of the protein energy landscape for the inverse problem of reconstructing the structure from a reduced number of principal components, showing that dimensionality reduction alleviates the ill-posed character of this high-dimensional energy optimization problem. The procedure explained in this paper is very fast and allows different PCA expansions to be tested. Our results show that PSO improves the energy of the best decoy used in the PCA when an adequate number of PCA terms is considered.


Subject(s)
Computational Biology/methods , Models, Molecular , Principal Component Analysis , Protein Structure, Tertiary , Proteins/chemistry , Proteins/metabolism , Uracil-DNA Glycosidase/chemistry , Uracil-DNA Glycosidase/metabolism
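The refinement loop described in the abstract above (PCA of low-energy decoys, a high-frequency residual from the best decoy, and PSO over the reduced coefficients) can be sketched as below. This is a toy illustration under loud assumptions: structures are flattened coordinate vectors, a hypothetical quadratic energy (squared distance to a known target) stands in for a real protein energy function, and the PSO inertia and acceleration constants are common textbook defaults rather than the paper's settings. Because the residual term is kept fixed, coefficients equal to the best decoy's projection reconstruct that decoy exactly, so seeding one particle there means the swarm can only match or improve its energy.

```python
import numpy as np

def pca_basis(decoys, k):
    """decoys: (n_models, n_coords). Returns the mean model and top-k principal directions."""
    mean = decoys.mean(0)
    _, _, Vt = np.linalg.svd(decoys - mean, full_matrices=False)
    return mean, Vt[:k]                       # rows of Vt[:k] are orthonormal

def high_freq_term(best, mean, V):
    """Residual of the best decoy after projection onto the PCA basis, plus its coefficients."""
    coeffs = V @ (best - mean)
    return best - mean - V.T @ coeffs, coeffs

def reconstruct(a, mean, V, residual):
    """Model = mean + PCA expansion with coefficients a + fixed high-frequency term."""
    return mean + V.T @ a + residual

def pso(energy, x0, n_particles=30, iters=200, spread=1.0, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimizer over the reduced coefficients."""
    rng = np.random.default_rng(seed)
    d = len(x0)
    x = x0 + spread * rng.standard_normal((n_particles, d))
    x[0] = x0                                 # seed one particle at the best decoy
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([energy(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        f = np.array([energy(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()
```

Keeping the residual fixed is what lets a low-dimensional search (for example the 10 components mentioned in the abstract) still carry the fine detail of the best decoy.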