ABSTRACT
Background: Single Amino Acid Polymorphisms (SAPs), or nonsynonymous Single Nucleotide Variants (nsSNVs), are the most common genetic variations. They result from missense mutations, in which a single base-pair substitution changes the genetic code so that the triplet of bases (codon) at a given position codes for a different amino acid. Since genetic mutations sometimes cause genetic diseases, it is important to understand and predict which variations are harmful and which are neutral (not causing changes in the phenotype). This can be posed as a classification problem. Methods: Computational methods using machine intelligence are gradually replacing repetitive and exceedingly expensive mutagenesis tests. In general, the uneven quality, deficiencies, and irregularities of nsSNV datasets reduce the usefulness of artificial intelligence-based methods. Consequently, more robust and accurate approaches are needed to address these problems. In this paper, we present a consensus classifier built on the holdout sampler, which appears robust and precise and outperforms all other popular methods. Results: We generated 100 holdouts to test the structures and classification variables of different classifiers during the training phase. The best-performing holdouts were chosen to build a consensus classifier, which was tested using k-fold (1 ≤ k ≤ 5) cross-validation. We also examined which protein properties have the greatest impact on the accurate prediction of the effects of nsSNVs. Conclusion: Our Consensus Holdout Sampler outperforms other popular algorithms and gives excellent results, with high accuracy and low standard deviation. The advantage of our method comes from using a tree of holdouts, in which different ML/AI-based programs are sampled in different ways.
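The consensus idea can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' implementation: a simple nearest-centroid rule stands in for the diverse classifiers mentioned above, the two features per variant are made-up numbers, and the names (`consensus_holdout_classifier`, `top`, `test_frac`) are hypothetical. Each random holdout trains one model, the models with the best holdout accuracy are kept, and they vote.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def consensus_holdout_classifier(X, y, n_holdouts=100, test_frac=0.25, top=10, seed=0):
    """Fit one model per random holdout split, keep the `top` models with the best
    holdout accuracy, and return a majority-vote predictor built from them."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, int(test_frac * len(idx)))
    scored = []
    for _ in range(n_holdouts):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        acc = sum(predict_centroid(model, X[i]) == y[i] for i in test) / n_test
        scored.append((acc, model))
    # Sort on accuracy only; the models themselves (dicts) are not comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best_models = [model for _, model in scored[:top]]

    def predict(x):
        votes = Counter(predict_centroid(m, x) for m in best_models)
        return votes.most_common(1)[0][0]

    return predict


# Toy usage: two made-up protein-property features per variant.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [2.0, 2.1], [2.2, 1.9], [1.9, 2.2], [2.1, 2.0]]
y = ["neutral"] * 4 + ["deleterious"] * 4
predict = consensus_holdout_classifier(X, y, n_holdouts=20, top=5)
print(predict([0.1, 0.1]), predict([2.0, 2.0]))
```

Keeping only the best holdouts and letting them vote makes the predictor less sensitive to any single unlucky split; a k-fold evaluation would then be run on top of this predictor.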
ABSTRACT
Noise is a basic ingredient of data: observed data are always contaminated by unwanted deviations, i.e., noise, which, in the case of overdetermined systems (with more data than model parameters), causes the corresponding linear system of equations to have an imperfect solution. In addition, in the case of highly underdetermined parameterizations, noise can be absorbed by the model, generating spurious solutions. This is a very undesirable situation that might lead to incorrect conclusions. We present a mathematical formalism based on inverse problem theory combined with artificial intelligence methodologies to perform an enhanced sampling of noisy biomedical data and improve the finding of meaningful solutions. Random sampling methods fail for high-dimensional biomedical problems; sampling methods such as smart model parameterizations, forward surrogates, and parallel computing are better suited to such problems. We applied these methods to several important biomedical problems, such as phenotype prediction and the prediction of the effects of protein mutations, i.e., whether a given single-residue mutation is neutral or deleterious (disease-causing). We also applied these methods to de novo drug discovery and drug repositioning (repurposing) through the enhanced exploration of the huge chemical space. The purpose of these novel methods, which address the problem of noise and uncertainty in biomedical data, is to find new therapeutic solutions, perform drug repurposing, and accelerate and optimize drug discovery, thus reestablishing homeostasis. Finding the right target, the right compound, and the right patient are the three bottlenecks to running successful clinical trials from the correct analysis of preclinical models. Artificial intelligence can provide a solution to these problems, bearing in mind that, as in any modeling procedure in data analysis, the nature of the data limits the quality of the prediction.
The use of simple and plain methodologies is crucial to tackling these important and challenging problems, particularly drug repositioning/repurposing in rare diseases.
Subjects
Artificial Intelligence, Drug Repositioning, Uncertainty, Drug Repositioning/methods, Drug Discovery/methods, Phenotype
ABSTRACT
Big data in health care is a fast-growing field and a new paradigm that is transforming case-based studies into large-scale, data-driven research. As big data depends on the advancement of new data standards, technology, and relevant research, the future development of big data applications holds foreseeable promise in the modern-day health care revolution. Enormous, rapidly growing collections of biomedical omics data (genomics, proteomics, transcriptomics, metabolomics, glycomics, etc.) and clinical data create major challenges and opportunities for their analysis and interpretation, and open new computational gateways to address these issues. The design of new robust algorithms suited to properly analyzing these big data while taking into account individual variability in genes has enabled the creation of precision (personalized) medicine. We reviewed and highlighted the significance of big data analytics for personalized medicine and health care, focusing mostly on machine learning perspectives on personalized medicine, genomic data models with respect to personalized medicine, the application of data mining algorithms for personalized medicine, and the challenges currently faced in big data analytics.
Subjects
Data Science, Precision Medicine, Big Data, Delivery of Health Care, Genomics, Precision Medicine/methods
ABSTRACT
BACKGROUND: Phenotype prediction problems are usually considered ill-posed, as the number of samples is very limited with respect to the number of scrutinized genetic probes. This complicates the sampling of the defective genetic pathways, owing to the high number of possible discriminatory genetic networks involved. In this research, we outline three novel sampling algorithms used to identify, classify, and characterize the defective pathways in phenotype prediction problems, namely the Fisher's ratio sampler, the Holdout sampler, and the Random sampler, and apply each one to the analysis of genetic pathways involved in tumor behavior and outcomes of triple-negative breast cancers (TNBC). Altered biological pathways are identified using the most frequently sampled genes and are compared with those obtained via Bayesian Networks (BNs). RESULTS: The Random, Fisher's ratio, and Holdout samplers were more accurate and robust than BNs, while providing comparable insights into disease genomics. CONCLUSIONS: The three samplers tested are good alternatives to Bayesian Networks, since they are less computationally demanding. Importantly, this analysis confirms the concept of "biological invariance", since the altered pathways should be independent of the sampling methodology and the classifier used for their inference. Nevertheless, some modifications to Bayesian networks are still needed to correctly sample the uncertainty space in phenotype prediction problems, since the probabilistic parameterization of the uncertainty space is not unique and the use of the optimum network might falsify the pathway analysis.
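A minimal sketch of the gene-frequency bookkeeping behind a holdout-style sampler, under stated assumptions: on the training part of each random holdout, genes are ranked by Fisher's ratio, FR = (mean_A - mean_B)^2 / (var_A + var_B), and we count how often each gene lands in the top-k list. The function names, the top-k cut-off, and the tiny expression matrix are hypothetical; the published samplers also score classifiers on the held-out part, which this sketch omits.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_ratio(a, b):
    """Fisher's ratio of one gene: (mean_a - mean_b)^2 / (var_a + var_b)."""
    num = (mean(a) - mean(b)) ** 2
    den = pvariance(a) + pvariance(b)
    if den == 0:
        return float("inf") if num > 0 else 0.0
    return num / den


def rank_genes(X, y, positive):
    """Gene indices sorted from most to least discriminatory by Fisher's ratio."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, t in zip(X, y) if t == positive]
        b = [row[g] for row, t in zip(X, y) if t != positive]
        scores.append(fisher_ratio(a, b))
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def holdout_sampler(X, y, positive, n_holdouts=200, train_frac=0.75, top_k=2, seed=1):
    """Count how often each gene enters the top-k list across random holdouts."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_train = int(train_frac * len(idx))
    counts = Counter()
    for _ in range(n_holdouts):
        rng.shuffle(idx)
        train = idx[:n_train]
        ranked = rank_genes([X[i] for i in train], [y[i] for i in train], positive)
        counts.update(ranked[:top_k])
    return counts


# Made-up expression matrix: rows are samples, columns are genes 0..3.
X = [[5.0, 0.3, 3.1, 1.2], [5.2, 1.1, 3.0, 0.4], [4.9, 0.7, 3.3, 0.9],
     [5.1, 0.2, 3.2, 1.5], [5.3, 0.9, 2.9, 0.6], [4.8, 0.5, 3.1, 1.0],
     [1.0, 0.8, 0.9, 1.1], [1.2, 0.4, 1.1, 0.5], [0.9, 1.0, 1.0, 1.4],
     [1.1, 0.2, 0.8, 0.7], [0.8, 0.6, 1.2, 0.9], [1.0, 0.9, 1.0, 1.3]]
y = ["tumor"] * 6 + ["normal"] * 6
print(holdout_sampler(X, y, "tumor").most_common())
```

In this toy matrix genes 0 and 2 separate the classes and are sampled in every holdout, while the noisy genes 1 and 3 never reach the top-2 list; the most frequently sampled genes are the ones that would feed the pathway analysis.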
Subjects
Algorithms, Triple-Negative Breast Neoplasms/pathology, Bayes Theorem, Genetic Databases, Female, Neoplastic Gene Expression Regulation, Gene Regulatory Networks, Humans, Neoplasm Metastasis, Phenotype, Survival Analysis, Triple-Negative Breast Neoplasms/genetics, Triple-Negative Breast Neoplasms/mortality
ABSTRACT
AIMS: It is known that matrix metalloproteinase (MMP)-11 has a role in tumour development and progression, and also that immune cells can influence cancer cells to increase their proliferative and invasive properties. The aim of the present study was to propose the evaluation of MMP11 expression by intratumoral mononuclear inflammatory cells (MICs) as a useful biological marker for breast cancer prognosis. METHODS AND RESULTS: This study comprised 246 women with invasive breast carcinoma and a long follow-up period. Patients were stratified by nodal status and by the development of metastatic disease. The median follow-up period was 146 months in patients without metastasis and 31 months in patients with metastatic disease. MMP11 was determined by immunohistochemistry. For the relapse-free survival (RFS) and overall survival (OS) analyses, we used Cox univariate analysis. Cox's regression model was used to examine the interactions between different prognostic factors in a multivariate analysis. CONCLUSIONS: Our results showed that MMP11 expression by stromal cells was significantly associated with prognosis. MMP11 expression by cancer-associated fibroblasts (CAFs) was associated with both shortened RFS and OS, but MMP11 expression by MICs showed a stronger association with both shortened RFS and OS, making it the most potent and independent factor for predicting RFS and OS.
Subjects
Breast Neoplasms/diagnosis, Neoplastic Gene Expression Regulation, Matrix Metalloproteinase 11/metabolism, Breast/pathology, Breast Neoplasms/pathology, Cancer-Associated Fibroblasts/pathology, Disease-Free Survival, Female, Humans, Immunohistochemistry, Inflammation/pathology, Kaplan-Meier Estimate, Middle Aged, Multivariate Analysis, Neoplasm Metastasis, Prognosis, Stromal Cells/pathology
ABSTRACT
BACKGROUND: B-cell chronic lymphocytic leukemia (CLL) is a heterogeneous disease and the most common adult leukemia in Western countries. IgVH mutational status distinguishes two major types of CLL, each associated with a different prognosis and survival. Sequencing identified NOTCH1 and SF3B1 as the two main recurrent mutations. We describe a novel method to clarify how these mutations affect gene expression by finding small-scale signatures that predict the IgVH, NOTCH1, and SF3B1 mutations. We subsequently defined the biological pathways and correlation networks involved in disease development, with the potential goal of identifying new druggable targets. METHODS: We modeled a microarray dataset consisting of 48807 probes derived from 163 samples. The use of Fisher's ratio and fold change combined with feature elimination allowed us to identify the minimum number of genes with the highest power to predict the mutations; we then applied network and pathway analyses to these genes to identify their biological roles. RESULTS: The mutational status of the patients was accurately predicted (94-99%) using small-scale gene signatures: 13 genes for IgVH, 60 for NOTCH1, and 22 for SF3B1. LPL plays an important role in the case of the IgVH mutation, whereas MSI2, LTK, TFEC, and CNTAP2 are involved in the NOTCH1 mutation, and RPL32 and PLAGL1 in the SF3B1 mutation. Four highly discriminatory genes (IGHG1, MYBL1, NRIP1, and RGS1) are common to these three mutations. The IL-4-mediated signaling events pathway appears to be involved as a common mechanism, suggesting an important role of immune response mechanisms and antigen presentation. CONCLUSIONS: This retrospective analysis provided a deeper understanding of the effects of the different mutations on CLL disease progression, with the expectation that these findings will be applied clinically in the near future to the development of new drugs.
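The selection pipeline described in METHODS (fold change, Fisher's ratio, feature elimination) can be sketched as follows. Assumptions: the expression values are log2-transformed (so the log2 fold change is a difference of class means), a nearest-centroid classifier scored by leave-one-out cross-validation (LOOCV) stands in for the authors' classifier, and the cut-off `min_log2_fc`, the helper names, and the data are made up. For brevity the genes are ranked once on all samples, which makes the LOOCV estimate optimistic; a stricter version would re-rank inside each fold.

```python
from statistics import mean, pvariance


def fisher_ratio(a, b):
    """(mean_a - mean_b)^2 / (var_a + var_b) for one gene."""
    num = (mean(a) - mean(b)) ** 2
    den = pvariance(a) + pvariance(b)
    if den == 0:
        return float("inf") if num > 0 else 0.0
    return num / den


def by_class(X, y, g, positive):
    """Split the values of gene g into the positive class and the rest."""
    a = [row[g] for row, t in zip(X, y) if t == positive]
    b = [row[g] for row, t in zip(X, y) if t != positive]
    return a, b


def loocv_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        centroids = {
            label: {g: mean(r[g] for r, t in zip(X_tr, y_tr) if t == label) for g in genes}
            for label in set(y_tr)
        }
        pred = min(centroids, key=lambda c: sum((X[i][g] - centroids[c][g]) ** 2 for g in genes))
        hits += pred == y[i]
    return hits / len(X)


def minimal_signature(X, y, positive, min_log2_fc=1.0):
    """Fold-change filter, Fisher's ratio ranking, then backward elimination.
    Returns the shortest gene list that reaches the best LOOCV accuracy."""
    def log2_fc(g):
        a, b = by_class(X, y, g, positive)
        return mean(a) - mean(b)

    # 1) keep genes whose |log2 fold change| passes the cut-off
    kept = [g for g in range(len(X[0])) if abs(log2_fc(g)) >= min_log2_fc]
    if not kept:
        return [], 0.0
    # 2) order the survivors by decreasing Fisher's ratio
    kept.sort(key=lambda g: fisher_ratio(*by_class(X, y, g, positive)), reverse=True)
    # 3) drop the lowest-ranked gene repeatedly; on ties keep the shorter list (parsimony)
    best_genes, best_acc = kept, loocv_accuracy(X, y, kept)
    for size in range(len(kept) - 1, 0, -1):
        acc = loocv_accuracy(X, y, kept[:size])
        if acc >= best_acc:
            best_genes, best_acc = kept[:size], acc
    return best_genes, best_acc


# Made-up log2 expression: rows are samples, columns are genes 0..3.
X = [[5.0, 0.3, 3.1, 1.2], [5.2, 1.1, 3.0, 0.4], [4.9, 0.7, 3.3, 0.9],
     [5.1, 0.2, 3.2, 1.5], [5.3, 0.9, 2.9, 0.6], [4.8, 0.5, 3.1, 1.0],
     [1.0, 0.8, 0.9, 1.1], [1.2, 0.4, 1.1, 0.5], [0.9, 1.0, 1.0, 1.4],
     [1.1, 0.2, 0.8, 0.7], [0.8, 0.6, 1.2, 0.9], [1.0, 0.9, 1.0, 1.3]]
y = ["mutated"] * 6 + ["unmutated"] * 6
print(minimal_signature(X, y, "mutated"))
```

The tie rule (`>=`) is what encodes parsimony: when a shorter list predicts as well as a longer one, the shorter list wins.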
Subjects
Genomics, B-Cell Chronic Lymphocytic Leukemia/genetics, Tumor Biomarkers, Computational Biology/methods, Nucleic Acid Databases, Gene Regulatory Networks, Genetic Association Studies, Genetic Predisposition to Disease, Genomics/methods, Humans, Immunoglobulin Heavy Chains/genetics, B-Cell Chronic Lymphocytic Leukemia/diagnosis, B-Cell Chronic Lymphocytic Leukemia/metabolism, B-Cell Chronic Lymphocytic Leukemia/mortality, Biological Models, Mutation, Oligonucleotide Array Sequence Analysis, Phosphoproteins/genetics, Principal Component Analysis, Prognosis, RNA Splicing Factors/genetics, Notch Receptors/genetics, Reproducibility of Results, Retrospective Studies, Signal Transduction
ABSTRACT
INTRODUCTION: It has become clear that noise generated during the assay and analytical processes can disrupt the accurate interpretation of genomic studies. Not only does such noise affect the scientific validity and costs of studies, but, when assessed in the context of clinically translatable indications such as phenotype prediction, it can lead to inaccurate conclusions that could ultimately affect patients. We applied a sequence of ranking methods to damp the noise associated with microarray outputs, and then tested the utility of the approach in three disease indications using publicly available datasets. MATERIALS AND METHODS: This study was performed in three phases. First, we theoretically analyzed the effect of noise in phenotype prediction problems, showing that it can be expressed as a modeling error that partially falsifies the pathways. Second, via synthetic modeling, we analyzed the sensitivity of the main gene ranking methods to different types of noise. Finally, we studied the predictive accuracy of the gene lists provided by these ranking methods in synthetic data and in three different datasets related to cancer, rare, and neurodegenerative diseases, to better understand the translational aspects of our findings. RESULTS AND DISCUSSION: In the case of synthetic modeling, we showed that Fisher's Ratio (FR) was the most robust gene ranking method in terms of precision for all types of noise at different levels. Significance Analysis of Microarrays (SAM) provided slightly lower performance, and the rest of the methods (fold change, entropy, and maximum percentile distance) were much less precise and accurate. The predictive accuracy of the smallest set of highly discriminatory probes was similar for all the methods in the case of Gaussian and log-Gaussian noise. In the case of class assignment noise, the predictive accuracy of SAM and FR is higher.
Finally, for real datasets (Chronic Lymphocytic Leukemia, Inclusion Body Myositis, and Amyotrophic Lateral Sclerosis), we found that FR and SAM provided the highest predictive accuracies with the smallest number of genes. Biological pathways were found with an expanded list of genes whose discriminatory power had been established via FR. CONCLUSIONS: We have shown that noise in expression data and class assignment partially falsifies the sets of discriminatory probes in phenotype prediction problems. FR and SAM better exploit the principle of parsimony and are able to find subsets with fewer highly discriminatory genes. Predictive accuracy and precision are two different metrics for selecting the important genes, since in the presence of noise the most predictive genes do not completely coincide with those that are related to the phenotype. Based on the synthetic results, FR and SAM are recommended for unraveling the biological pathways involved in disease development.
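How the precision of a gene ranking can be measured on synthetic data with class-assignment noise is sketched below. Assumptions: a toy generator with unit-variance Gaussian expression in which the first `n_de` genes are shifted in one class, a fraction of labels then flipped at random, and precision defined as the fraction of the top-`n_de` ranked genes that are truly differentially expressed. Only FR and absolute fold change are compared; SAM, entropy, and maximum percentile distance are not implemented, and this simple setup is not expected to reproduce the published comparison. All names are hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_ratio(a, b):
    """(mean_a - mean_b)^2 / (var_a + var_b) for one gene."""
    num = (mean(a) - mean(b)) ** 2
    den = pvariance(a) + pvariance(b)
    if den == 0:
        return float("inf") if num > 0 else 0.0
    return num / den


def abs_log2_fold_change(a, b):
    """Absolute mean difference; equals |log2 FC| when the data are log2-transformed."""
    return abs(mean(a) - mean(b))


def synthetic_microarray(n_genes=50, n_de=5, n_per_class=25, shift=4.0, flip_frac=0.0, seed=0):
    """Genes 0..n_de-1 are shifted by `shift` in class 1; then a fraction of labels is flipped."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label if g < n_de else 0.0, 1.0) for g in range(n_genes)])
            y.append(label)
    # Class-assignment noise: flip the labels of a random subset of samples.
    for i in rng.sample(range(len(y)), int(flip_frac * len(y))):
        y[i] = 1 - y[i]
    return X, y


def ranking_precision(X, y, n_de, score):
    """Fraction of the top-n_de ranked genes that are truly differentially expressed."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, t in zip(X, y) if t == 1]
        b = [row[g] for row, t in zip(X, y) if t == 0]
        scores.append(score(a, b))
    top = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:n_de]
    return sum(g < n_de for g in top) / n_de


for flip in (0.0, 0.2, 0.4):
    X, y = synthetic_microarray(flip_frac=flip)
    print(flip, ranking_precision(X, y, 5, fisher_ratio),
          ranking_precision(X, y, 5, abs_log2_fold_change))
```

Because the true differentially expressed genes are known by construction, precision can be tracked directly as the noise level grows, which is the kind of sensitivity analysis the abstract describes.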
Subjects
Genotype, Neoplasms/genetics, Oligonucleotide Array Sequence Analysis, Phenotype, Artificial Intelligence, Gene Expression Profiling, Genetic Techniques, Humans, Software
ABSTRACT
INTRODUCTION: Chronic Lymphocytic Leukemia (CLL) is a disease with a highly heterogeneous clinical course. A key goal is to identify patients at high risk of disease progression, who could benefit from earlier or more intense treatment. In this work, we introduce a simple methodology based on machine learning methods to help physicians in their decision making in different problems related to CLL. MATERIAL AND METHODS: The clinical data belong to a retrospective study of a cohort of 265 Caucasians who were diagnosed with CLL between 1997 and 2007 at Hospital Cabueñes (Asturias, Spain). Different machine learning methods were applied to find the shortest list of the most discriminatory prognostic variables for predicting the need for chemotherapy treatment and the development of an autoimmune disease. RESULTS: Autoimmune disease occurrence was predicted with very high accuracy (>90%). Autoimmune disease development is currently an unpredictable, severe complication of CLL. The need for chemotherapy treatment was predicted with lower accuracy (80%). Risk analysis showed that the numbers of false positives and false negatives are well balanced. CONCLUSIONS: For autoimmune disease development, our study highlights the importance of prognostic variables associated with the characteristics of platelets, reticulocytes, and natural killer cells, which are the main targets of autoimmune haemolytic anemia and immune thrombocytopenia. For predicting the need for chemotherapy, it also highlights the relevance of some clinical variables related to the immune characteristics of CLL patients that are not taken into account by current prognostic markers. Because of its simplicity, this methodology could be implemented in spreadsheets.
Subjects
Computer-Assisted Diagnosis/methods, B-Cell Chronic Lymphocytic Leukemia/diagnosis, Medical Informatics/methods, Adult, Aged, Aged 80 and over, Algorithms, Antineoplastic Agents/therapeutic use, Autoimmune Diseases/diagnosis, Decision Making, Disease Progression, False Positive Reactions, Female, Humans, Machine Learning, Male, Middle Aged, Statistical Models, Probability, Prognosis, ROC Curve, Retrospective Studies, Risk Assessment, Software, Time-to-Treatment
ABSTRACT
BACKGROUND: Understanding the transcriptomic response to SARS-CoV-2 infection is of the utmost importance for designing diagnostic tools that predict the severity of the infection. METHODS: We performed a deep sampling analysis of the viral transcriptomic data oriented towards drug repositioning, using different samplers. The basic principle of this methodology is biological invariance, which means that the pathways altered by the disease should be independent of the algorithm used to unravel them. RESULTS: The transcriptomic analysis of the altered pathways reveals a distinctive inflammatory response and potential side effects of infection. Virus replication causes, in some cases, acute respiratory distress syndrome in the lungs, and affects other organs such as the heart, brain, and kidneys. Therefore, the drugs repositioned to fight COVID-19 should target not only the interferon signalling pathway and the control of inflammation, but also the altered genetic pathways related to the side effects of infection. We also show via Principal Component Analysis that the transcriptome signatures are different from those of influenza and RSV. The gene COL1A1, which controls collagen production, seems to play a key role in the regulation of the immune system. Additionally, other small-scale signature genes appear to be involved in the development of other COVID-19 comorbidities. CONCLUSIONS: Transcriptome-based drug repositioning offers a possible fast-track antiviral therapy for COVID-19 patients. It calls for additional clinical studies using FDA-approved drugs for patients with increased susceptibility to infection and with serious medical complications.
Subjects
COVID-19 Drug Treatment, COVID-19, SARS-CoV-2, Antiviral Agents/pharmacology, Antiviral Agents/therapeutic use, COVID-19/genetics, Drug Repositioning, Humans, Interferons, Transcriptome/genetics
ABSTRACT
The complexity of orphan diseases, which are those that do not have an effective treatment, together with the high dimensionality of the genetic data used for their analysis and the high degree of uncertainty in the understanding of the mechanisms and genetic pathways involved in their development, motivates the use of advanced artificial intelligence techniques and in-depth knowledge of molecular biology, which are crucial to finding plausible solutions in drug design, including drug repositioning. In particular, we show that the use of robust deep sampling methodologies of the altered genetics serves to obtain meaningful results and dramatically decreases the cost of research and development in drug design, with a very positive influence on the use of precision medicine and on patient outcomes. The target-centric approach and the use of strong prior hypotheses that are not matched against reality (disease genetic data) are undoubtedly the cause of the high number of drug design failures and attrition rates. Sampling and prediction under uncertain conditions cannot be avoided in the development of precision medicine.
ABSTRACT
BACKGROUND: Although some studies show that there could be a genetic predisposition to develop Multiple Sclerosis (MS), attempts to find genetic signatures related to MS diagnosis and development are extremely rare. METHOD: We carried out a retrospective analysis of two different microarray datasets, using machine learning techniques to understand the defective pathways involved in this disease. We modeled two publicly accessible datasets: the first was used to establish the list of the most discriminatory genes, whereas the second was used for validation purposes. RESULTS: The analysis provided a list of highly discriminatory genes with a predictive cross-validation accuracy higher than 95%, both in learning and in blind validation. The results were confirmed via the holdout sampler. The most discriminatory genes were related to the production of hemoglobin. The biological processes involved were related to T-cell Receptor Signaling and co-stimulation, Interferon-Gamma Signaling, and Antigen Processing and Presentation. Drug repositioning via CMAP methodologies highlighted the importance of Trichostatin A and other HDAC inhibitors. CONCLUSIONS: The defective pathways suggest viral or bacterial infections as plausible mechanisms involved in MS development. The pathway analysis also confirmed coincidences with Epstein-Barr virus, Influenza A, Toxoplasmosis, Tuberculosis, and Staphylococcus aureus infections. Th17 cell differentiation and CD28 co-stimulation seemed to be crucial in the development of this disease. Furthermore, the additional knowledge provided by this analysis helps to identify new therapeutic targets.
Subjects
Nucleic Acid Databases, Drug Repositioning, Machine Learning, Metabolic Networks and Pathways, Immunological Models, Multiple Sclerosis, Female, Gene Expression Regulation/drug effects, Gene Expression Regulation/immunology, Humans, Male, Metabolic Networks and Pathways/drug effects, Metabolic Networks and Pathways/immunology, Multiple Sclerosis/drug therapy, Multiple Sclerosis/immunology, Oligonucleotide Array Sequence Analysis, Retrospective Studies, Signal Transduction/drug effects, Signal Transduction/immunology
ABSTRACT
OBJECTIVES: Fibromyalgia syndrome (FMS) is a chronic and often debilitating condition characterized by persistent fatigue, pain, bowel abnormalities, and sleep disturbances. Currently, there are no definitive prognostic or diagnostic biomarkers for FMS. This study attempted to use a novel predictive algorithm to identify a group of genes whose differential expression discriminated individuals with an FMS diagnosis from healthy controls. METHODS: Secondary analysis of gene expression data from 28 women with FMS and 19 age- and race-matched healthy women. Discriminatory genes were identified using fold-change differential expression and Fisher's ratio (FR). The discriminatory accuracy of the differential expression of these genes was determined using leave-one-out cross-validation. Functional networks of the discriminating genes were described using the Ingenuity Knowledge Base. RESULTS: The small-scale signature contained 57 genes whose expression was highly discriminatory of the FMS diagnosis. The combination of these highly discriminatory genes with FR higher than 1.45 provided a leave-one-out cross-validation accuracy for the FMS diagnosis of 85.11%. The discriminatory genes were associated with 3 canonical pathways: hepatic stellate cell activation, oxidative phosphorylation, and airway pathology related to COPD. CONCLUSION: The discriminating genes, especially the 2 with the highest accuracy, are associated with mitochondrial function or oxidative phosphorylation and glutamate signaling. Further validation of the clinical utility of this finding is warranted.
ABSTRACT
Cancer-related fatigue (CRF) is a common burden in cancer patients, and little is known about its underlying mechanism. The primary aim of this study was to identify gene signatures predictive of post-radiotherapy fatigue in prostate cancer patients. We employed Fisher Linear Discriminant Analysis (LDA) to identify predictive genes using whole-genome microarray data from 36 men with prostate cancer. Ingenuity Pathway Analysis was used to determine functional networks of the predictive genes. Functional validation was performed using a T lymphocyte cell line, Jurkat E6.1. Cells were pretreated with a metabotropic glutamate receptor 5 (mGluR5) agonist (DHPG), an antagonist (MPEP), or control (PBS) for 20 min before irradiation at 8 Gy in a Mark-1 γ-irradiator. NF-κB activation was assessed using an NF-κB/Jurkat/GFP Transcriptional Reporter Cell Line. LDA achieved 83.3% accuracy in predicting post-radiotherapy fatigue. "Glutamate receptor signaling" was the most significant pathway (p = 0.0002) among the predictive genes. Functional validation using Jurkat cells revealed clustering of mGluR5 receptors, as well as increased production of RANTES (regulated on activation, normal T cell expressed and secreted) after irradiation in cells pretreated with DHPG, whereas inhibition of mGluR5 activity with MPEP decreased the RANTES concentration after irradiation. DHPG pretreatment amplified irradiation-induced NF-κB activation, suggesting a role of mGluR5 in modulating T cell activation after irradiation. These results suggest that mGluR5 signaling in T cells may play a key role in the development of chronic inflammation resulting in fatigue and may contribute to individual differences in immune responses to radiation. Moreover, modulating mGluR5 provides a novel therapeutic option to treat CRF.
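Two-class Fisher LDA projects each sample onto the direction w = S_w^-1 (m_1 - m_0), where m_0 and m_1 are the class means and S_w is the pooled within-class scatter matrix, and classifies by thresholding the projection halfway between the projected class means. The sketch below is illustrative only: it assumes two made-up genes, adds a small ridge term to S_w to keep it invertible, and uses hypothetical names; the whole-genome setting of the study would need a prior gene reduction step.

```python
from statistics import mean


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_fisher_lda(X, y, ridge=1e-6):
    """Two-class Fisher LDA: w = S_w^-1 (m1 - m0), threshold halfway between projected means."""
    c0 = [x for x, t in zip(X, y) if t == 0]
    c1 = [x for x, t in zip(X, y) if t == 1]
    m0 = [mean(col) for col in zip(*c0)]
    m1 = [mean(col) for col in zip(*c1)]
    d = len(m0)
    # Pooled within-class scatter; the small ridge keeps it invertible when genes outnumber samples.
    Sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, m in ((c0, m0), (c1, m1)):
        for x in rows:
            dev = [a - b for a, b in zip(x, m)]
            for i in range(d):
                for j in range(d):
                    Sw[i][j] += dev[i] * dev[j]
    w = solve(Sw, [a - b for a, b in zip(m1, m0)])

    def project(v):
        return sum(wi * vi for wi, vi in zip(w, v))

    threshold = (project(m0) + project(m1)) / 2

    def predict(x):
        return 1 if project(x) > threshold else 0

    return predict


# Made-up baseline expression of two genes; 1 = high fatigue, 0 = low fatigue.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1],
     [3.0, 0.9], [3.2, 1.1], [2.9, 1.0], [3.1, 0.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
predict = fit_fisher_lda(X, y)
print([predict(x) for x in X])
```

Because S_w is positive definite, w points from the class-0 mean towards the class-1 mean in the projected space, so samples projecting above the threshold are assigned to class 1.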
Subjects
Fatigue/etiology, NF-kappa B/metabolism, Prostatic Neoplasms/radiotherapy, Radiotherapy/adverse effects, Metabotropic Glutamate Receptor 5/agonists, Metabotropic Glutamate Receptor 5/antagonists & inhibitors, Aged, Genome-Wide Association Study, Humans, Jurkat Cells, Machine Learning, Male, Methoxyhydroxyphenylglycol/analogs & derivatives, Methoxyhydroxyphenylglycol/pharmacology, Middle Aged, Pyridines/pharmacology, Radiotherapy Dosage, T-Lymphocytes/metabolism, Transcriptome
ABSTRACT
Tumor cell plasticity is a major obstacle to the cure of malignancies, as it makes tumor cells highly adaptable to microenvironmental changes, enables them to switch phenotype among different forms, and favors the generation of prometastatic tumor cell subsets. Phenotype switching toward more aggressive forms involves different functional, phenotypic, and morphologic changes, which are often related to the process known as epithelial-mesenchymal transition (EMT). In this study, we report that natural killer (NK) cells may increase the malignancy of melanoma cells by inducing changes relevant to EMT and, more broadly, to phenotype switching from proliferative to invasive forms. In coculture, NK cells induced effects on tumor cells similar to those induced by EMT-promoting cytokines, including upregulation of stemness and EMT markers, morphologic transition, inhibition of proliferation, and increased capacity for Matrigel invasion. Most changes were dependent on the engagement of NKp30 or NKG2D and the release of cytokines including IFNγ and TNFα. Moreover, EMT induction also favored escape from NK-cell attack. Melanoma cells undergoing EMT either increased NK-protective HLA-I expression on their surface or downregulated several tumor-recognizing activating receptors on NK cells. Mass spectrometry-based proteomic analysis of two different melanoma cell lines revealed a partial overlap between the proteomic profiles induced by NK cells and those induced by EMT cytokines, indicating that various processes or pathways related to tumor progression are induced by exposure to NK cells. Significance: NK cells can induce prometastatic properties on melanoma cells that escape from killing, providing important clues to improve the efficacy of NK cells in innovative antitumor therapies. Cancer Res; 78(14); 3913-25. ©2018 AACR.
Subjects
Epithelial-Mesenchymal Transition/immunology, Natural Killer Cells/immunology, Melanoma/immunology, Proteome/immunology, Tumor Cell Line, Cell Proliferation/physiology, Coculture Techniques/methods, Cytokines/immunology, Histocompatibility Antigens Class I/immunology, Humans, Interferon-gamma/immunology, NK Cell Lectin-Like Receptor Subfamily K/immunology, Natural Cytotoxicity Triggering Receptor 3/immunology, Phenotype, Proteomics/methods, Up-Regulation/immunology
ABSTRACT
Genomics has been used with varying degrees of success in the context of drug discovery and in defining mechanisms of action for diseases like cancer and neurodegenerative and rare diseases in the quest for orphan drugs. To improve its utility, accuracy, and cost-effectiveness, the optimization of analytical methods, especially those that translate to clinically relevant outcomes, is critical. Here we define a novel tool for genomic analysis, termed a biomedical robot, to improve phenotype prediction, identify disease pathogenesis, and define therapeutic targets. Biomedical robot analytics differ from historical methods in that they are based on melding feature selection methods and ensemble learning techniques. The biomedical robot mathematically exploits the structure of the uncertainty space of any classification problem conceived as an ill-posed optimization problem: given a classifier, there exist different equivalent small-scale genetic signatures that provide similar predictive accuracies. We analyze the sensitivity of the biomedical robot concept to noise using synthetic microarrays perturbed by different kinds of noise in expression and class assignment. Finally, we show the application of this concept to the analysis of different diseases, inferring the pathways and the correlation networks. The final aim of a biomedical robot is to improve knowledge discovery and provide decision systems to optimize diagnosis, treatment, and prognosis. This analysis shows that biomedical robots are robust against different kinds of noise, particularly a wrong class assignment of the samples. Assessing the uncertainty that is inherent to any phenotype prediction problem is the right way to address this kind of problem.
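The idea that several equivalent small-scale signatures can be combined into one decision system can be sketched as follows. Assumptions: the candidate signatures are supplied by hand (in practice they would come from feature selection methods), each signature drives a nearest-centroid classifier, every signature whose leave-one-out accuracy is within `tol` of the best one joins the ensemble, and the members vote. `build_robot`, its helpers, and the data are hypothetical and do not reproduce the published biomedical robot.

```python
from collections import Counter
from statistics import mean


def fit(X, y, genes):
    """Nearest-centroid model restricted to one signature (a list of gene indices)."""
    return {label: {g: mean(r[g] for r, t in zip(X, y) if t == label) for g in genes}
            for label in set(y)}


def classify(model, x):
    """Class whose centroid is closest to x on the signature genes."""
    return min(model, key=lambda c: sum((x[g] - mu) ** 2 for g, mu in model[c].items()))


def loocv(X, y, genes):
    """Leave-one-out accuracy of the signature `genes`."""
    hits = sum(classify(fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], genes), X[i]) == y[i]
               for i in range(len(X)))
    return hits / len(X)


def build_robot(X, y, signatures, tol=0.05):
    """Keep every signature whose LOOCV accuracy is within `tol` of the best one
    and let the resulting ensemble vote."""
    scored = [(loocv(X, y, s), s) for s in signatures]
    best = max(acc for acc, _ in scored)
    members = [fit(X, y, s) for acc, s in scored if acc >= best - tol]

    def predict(x):
        return Counter(classify(m, x) for m in members).most_common(1)[0][0]

    return predict, len(members)


# Made-up expression matrix: rows are samples, columns are genes 0..3.
X = [[5.0, 0.3, 3.1, 1.2], [5.2, 1.1, 3.0, 0.4], [4.9, 0.7, 3.3, 0.9],
     [5.1, 0.2, 3.2, 1.5], [5.3, 0.9, 2.9, 0.6], [4.8, 0.5, 3.1, 1.0],
     [1.0, 0.8, 0.9, 1.1], [1.2, 0.4, 1.1, 0.5], [0.9, 1.0, 1.0, 1.4],
     [1.1, 0.2, 0.8, 0.7], [0.8, 0.6, 1.2, 0.9], [1.0, 0.9, 1.0, 1.3]]
y = ["tumor"] * 6 + ["normal"] * 6
predict, n_members = build_robot(X, y, [[0], [2], [0, 2], [1], [3]])
print(n_members, predict([5.0, 0.5, 3.0, 1.0]), predict([1.0, 0.5, 1.0, 1.0]))
```

Here the two informative genes, alone or together, form three equivalent signatures, while the two noisy genes fall outside the tolerance and are left out of the vote.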
Subjects
Biomarkers/analysis, Drug Discovery, Robotics/methods, Amyotrophic Lateral Sclerosis/drug therapy, Amyotrophic Lateral Sclerosis/genetics, Amyotrophic Lateral Sclerosis/pathology, Artificial Intelligence, Humans, B-Cell Chronic Lymphocytic Leukemia/drug therapy, B-Cell Chronic Lymphocytic Leukemia/genetics, B-Cell Chronic Lymphocytic Leukemia/pathology, Mutation, Inclusion Body Myositis/drug therapy, Inclusion Body Myositis/genetics, Inclusion Body Myositis/pathology, Phenotype, Prognosis, Robotics/instrumentation, DNA Sequence Analysis/methods, Software
ABSTRACT
To better understand the impact of microarray preprocessing normalization techniques on the analysis of biological pathways in the prediction of chronic fatigue (CF) following radiation therapy, this study compared the lists of predictive genes found using Robust Multiarray Averaging (RMA) and the Affymetrix MAS5 method with the list obtained working with raw data (without any preprocessing). First, we modeled the spiked-in dataset, in which the differentially expressed genes were known and spiked in at different known concentrations, showing that the precision established by the different gene ranking methods was higher than when working with raw data. The results obtained from the spiked-in experiment were extrapolated to the CF dataset to run learning and blind validation. RMA and MAS5 provided different sets of discriminatory genes that had a higher predictive accuracy in the learning phase but a lower predictive accuracy in the blind validation phase, suggesting that the genetic signatures generated using both preprocessing techniques are not generalizable. The pathways found using the raw dataset better described what is known a priori about CF. In addition, RMA produced more reliable pathways than MAS5. Understanding the strengths of these two preprocessing techniques in phenotype prediction is critical for precision medicine. In particular, this article concludes that biological pathways might be better unraveled working with raw expression data. Moreover, the predictive gene profiles generated by RMA and MAS5 should be interpreted with caution. This is an important conclusion with a high translational impact that should be confirmed in other disease datasets.
Subjects
Computational Biology/methods, Chronic Fatigue Syndrome/diagnosis, Gene Expression Profiling/methods, Genetic Markers, Oligonucleotide Array Sequence Analysis/methods, Prostatic Neoplasms/radiotherapy, Chronic Fatigue Syndrome/epidemiology, Chronic Fatigue Syndrome/genetics, Gene Regulatory Networks, Genetic Variation, Genomics, Humans, Male, Genetic Models, Radiotherapy/adverse effects, Signal Transduction
ABSTRACT
BACKGROUND: Fatigue is a common side effect of cancer (CA) treatment. We used a novel analytical method to identify and validate a specific gene cluster that is predictive of fatigue risk in prostate cancer patients (PCP) treated with radiotherapy (RT). METHODS: A total of 44 PCP were categorized into high-fatigue (HF) and low-fatigue (LF) cohorts based on the change in fatigue score from baseline to RT completion. Fold-change differential analysis and Fisher's linear discriminant analysis (LDA) of 27 subjects with gene expression data at baseline and RT completion generated a reduced base of the most discriminatory genes (learning phase). A k-nearest-neighbor (k-NN) risk prediction model was developed based on small-scale prognostic signatures. The validity of the predictive model was tested in another 17 subjects using baseline gene expression data (validation phase). RESULT: The model generated in the learning phase predicted HF classification at RT completion in the validation phase with 76.5% accuracy. CONCLUSION: Given these results and the limited amount of data at our disposal, a novel analytical algorithm that incorporates fold-change differential analysis, LDA, and k-NN may be applicable to predicting regimen-related toxicity in cancer patients with high reliability. The accuracy is expected to improve with more samples in the learning phase.
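The learning/validation split with a k-NN risk model can be sketched as follows. Assumptions: the signature has already been reduced to two made-up gene values per subject, Euclidean distance and k = 3 are used, and the helper names are hypothetical; the abstract does not state the distance, the value of k, or the genes used in the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training subjects closest to x (Euclidean distance)."""
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def validation_accuracy(train_X, train_y, val_X, val_y, k=3):
    """Blind validation: predict each held-out subject from the learning set only."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(val_X, val_y))
    return hits / len(val_y)


# Made-up values of two signature genes per subject (learning phase).
learn_X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.0, 0.3],
           [1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [1.1, 1.2]]
learn_y = ["LF"] * 4 + ["HF"] * 4
# Made-up validation subjects, never used to build the model.
val_X = [[0.2, 0.2], [1.0, 1.0], [0.1, 0.3], [1.1, 1.0]]
val_y = ["LF", "HF", "LF", "HF"]
print(validation_accuracy(learn_X, learn_y, val_X, val_y))
```

Keeping the validation subjects out of the learning set is what makes the reported accuracy a blind estimate; with real data it would be expected to fall below the learning-phase accuracy.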