ABSTRACT
Around 50 years ago, molecular biology opened the path to understanding changes in form, adaptations, complexity, and the basis of human diseases through myriad reports on gene birth, gene duplication, gene expression regulation, and splicing regulation, among other relevant mechanisms behind gene function. Here, with the advent of big data and artificial intelligence (AI), we focus on an elusive and intriguing mechanism of gene function regulation, RNA editing, in which a single nucleotide in an RNA molecule is changed, with a remarkable impact on the complexity of the transcriptome and proteome. We present a new-generation approach to assess the functional conservation of the RNA-editing targeting mechanism using two AI learning algorithms: random forest (RF) and bidirectional long short-term memory (biLSTM) neural networks with an attention layer. These algorithms, combined with RNA-editing data from databases and from variant calling on same-individual RNA-seq and DNA-seq experiments in different species, allowed us to predict RNA-editing events using both primary sequence and secondary structure. We then devised a method for assessing conservation or divergence in the molecular mechanisms of editing entirely in silico: the cross-testing analysis. This novel method not only helps to understand the conservation of the editing mechanism through evolution but could also lay the groundwork for a better understanding of the adenosine-targeting mechanism in other fields.
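The cross-testing idea can be illustrated with a minimal Python sketch. It assumes that each candidate site is a fixed-length RNA window centred on the target adenosine and one-hot encoded; to stay self-contained, a simple nearest-centroid classifier is swapped in for the RF and biLSTM models, and secondary structure is not modelled. The names `one_hot`, `train_centroids` and `cross_test`, and the toy windows, are hypothetical and not the authors' code. A model trained on one species is scored on every species, so diagonal entries are within-species accuracies and off-diagonal entries indicate how well the targeting signal transfers.

```python
from typing import Dict, List, Tuple

BASES = "ACGU"
# (RNA window centred on a candidate adenosine, label: 1 = edited, 0 = not edited)
Example = Tuple[str, int]


def one_hot(window: str) -> List[float]:
    """Flattened one-hot encoding of an RNA window; a DNA 'T' is read as 'U'."""
    vec: List[float] = []
    for base in window.upper().replace("T", "U"):
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def train_centroids(examples: List[Example]) -> Dict[int, List[float]]:
    """Hypothetical stand-in for RF/biLSTM: the mean one-hot vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for window, label in examples:
        x = one_hot(window)
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(centroids: Dict[int, List[float]], window: str) -> int:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    x = one_hot(window)
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def accuracy(centroids: Dict[int, List[float]], examples: List[Example]) -> float:
    hits = sum(predict(centroids, w) == y for w, y in examples)
    return hits / len(examples)


def cross_test(data: Dict[str, List[Example]]) -> Dict[Tuple[str, str], float]:
    """Train on each species and test on every species (diagonal = within-species)."""
    models = {sp: train_centroids(ex) for sp, ex in data.items()}
    return {(tr, te): accuracy(models[tr], data[te]) for tr in data for te in data}


# Invented toy windows for two hypothetical species.
toy = {
    "species_a": [("UAG", 1), ("CAG", 1), ("GAC", 0), ("AAC", 0)],
    "species_b": [("UAG", 1), ("GAC", 0)],
}
print(cross_test(toy))
```

Under this sketch, a marked drop from diagonal to off-diagonal accuracy would be read as divergence in the editing-targeting signal, while comparable values would suggest conservation.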
Subjects
Machine Learning, RNA Editing, Humans, Algorithms, Computer Simulation, Computational Biology/methods, Neural Networks (Computer), RNA/genetics, RNA/metabolism
ABSTRACT
BACKGROUND: Support vector machines (SVM) are a powerful tool for analyzing data in which the number of predictors is approximately equal to or larger than the number of observations. Originally, however, the application of SVM to biomedical data was limited because SVM was not designed to evaluate the importance of predictor variables. Creating predictive models based on only the most relevant variables is essential in biomedical research. Substantial work has since been done to allow assessment of variable importance in SVM models, but this work has focused on SVM with linear kernels. The power of SVM as a prediction model, however, comes from the flexibility provided by non-linear kernels. Moreover, SVM has been extended to model survival outcomes. This paper extends the Recursive Feature Elimination (RFE) algorithm by proposing three approaches to rank variables based on non-linear SVM and on SVM for survival analysis. RESULTS: The proposed algorithms allow visualization of each of the RFE iterations and, hence, identification of the most relevant predictors of the response variable. Using simulation studies based on time-to-event outcomes and three real datasets, we evaluate the three methods, based on pseudo-samples and kernel principal component analysis, and compare them with the original SVM-RFE algorithm for non-linear kernels. When the truly most relevant variables were compared with the variable ranks produced by each algorithm in the simulation studies, the three proposed algorithms generally performed better than the gold-standard RFE for non-linear kernels. Generally, RFE-pseudo-samples outperformed the other three methods in all tested scenarios, even when the variables were assumed to be correlated. CONCLUSIONS: The proposed approaches can be used to accurately select variables and assess the direction and strength of associations in the analysis of biomedical data using SVM for categorical or time-to-event responses.
Conducting variable selection and interpreting the direction and strength of associations between predictors and outcomes with the proposed approaches, particularly the RFE-pseudo-samples approach, can be done accurately when analyzing biomedical data. These approaches perform better than Guyon's classical RFE in realistic scenarios regarding the structure of biomedical data.
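The RFE-pseudo-samples loop can be sketched in Python under several assumptions: pseudo-samples for a variable are built by sweeping it over an evenly spaced grid spanning its observed range while all other variables are held at their medians, and a variable's importance is taken as the range of the model's decision values over that sweep. At each iteration the model is refitted on the surviving variables and the least important one is removed. The trainer `fit` is hypothetical (in practice a non-linear or survival SVM), and the exact pseudo-sample construction and importance measure in the paper may differ.

```python
import statistics
from typing import Callable, List, Sequence

# A fitted model is represented only by its decision function.
DecisionFn = Callable[[Sequence[float]], float]
# Hypothetical trainer: (X restricted to the kept columns, y, original column ids) -> decision function.
Trainer = Callable[[List[List[float]], Sequence[float], List[int]], DecisionFn]


def pseudo_sample_importance(f: DecisionFn, X: List[List[float]], col: int, n_points: int = 10) -> float:
    """Sweep column `col` over an evenly spaced grid spanning its observed range while every
    other column is held at its median; return the range of the decision values."""
    medians = [statistics.median(row[j] for row in X) for j in range(len(X[0]))]
    lo = min(row[col] for row in X)
    hi = max(row[col] for row in X)
    values = []
    for k in range(n_points):
        pseudo = list(medians)
        pseudo[col] = lo + (hi - lo) * k / (n_points - 1)
        values.append(f(pseudo))
    return max(values) - min(values)


def rfe_pseudo_samples(X: List[List[float]], y: Sequence[float], fit: Trainer) -> List[int]:
    """Return original column indices in elimination order (least relevant first)."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while remaining:
        Xs = [[row[j] for j in remaining] for row in X]
        f = fit(Xs, y, remaining)  # refit on the surviving variables
        scores = [pseudo_sample_importance(f, Xs, c) for c in range(len(remaining))]
        worst = min(range(len(remaining)), key=lambda c: scores[c])
        eliminated.append(remaining.pop(worst))
    return eliminated


# Toy usage with a hypothetical "trainer" whose decision function is a fixed
# non-linear function of the kept variables (a real run would fit a kernel SVM).
WEIGHTS = {0: 0.1, 1: 3.0, 2: 1.0}


def toy_fit(Xs, y, features):
    return lambda x: sum(WEIGHTS[f] * x[i] ** 2 for i, f in enumerate(features))


X_toy = [[-1.0, -1.0, -1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, -0.5, 0.2]]
print(rfe_pseudo_samples(X_toy, [0, 1, 0, 1], toy_fit))
```

Because the importance is read from the decision function rather than from linear weights, the same loop applies unchanged to non-linear kernels; reversing the elimination order gives the final ranking.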
Subjects
Algorithms, Tumor Biomarkers/genetics, Computer Graphics, Biliary Liver Cirrhosis/mortality, Lung Neoplasms/mortality, Diffuse Large B-Cell Lymphoma/mortality, Support Vector Machine, Humans, Biliary Liver Cirrhosis/genetics, Lung Neoplasms/genetics, Diffuse Large B-Cell Lymphoma/genetics, Survival Rate
ABSTRACT
BACKGROUND: Pathway expression is multivariate in nature. Thus, from a statistical perspective, detecting differentially expressed pathways between two conditions requires methods that infer differences between mean vectors. Maximum mean discrepancy (MMD) is a statistical test of whether two samples come from the same distribution, and its implementation is greatly simplified by the kernel method. RESULTS: An MMD-based test successfully detected differential expression between two conditions, specifically in the expression of a set of genes involved in certain fatty acid metabolic pathways. Furthermore, we exploited the ability of the kernel method to integrate data and successfully added hepatic fatty acid levels to the test procedure. CONCLUSION: MMD is a non-parametric test that gains several advantages when combined with the kernelization of data: 1) the number of variables can be greater than the sample size; 2) omics data can be integrated; 3) it can be applied not only to vectors but also to strings, sequences, and other common structured data types arising in molecular biology.
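The MMD statistic compares the mean embeddings of two samples in a kernel feature space. The Python sketch below computes the standard unbiased estimate of MMD² and obtains a p-value by permuting the condition labels. The Gaussian kernel, the bandwidth sigma = 1, the 200 permutations and the toy profiles are assumptions, not choices reported in the abstract; integrating a second data type, such as hepatic fatty acid levels, could be done for example by summing kernels, which is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))


def mmd2_unbiased(X: List[Vector], Y: List[Vector], sigma: float = 1.0) -> float:
    """Unbiased estimate of squared MMD (within-sample diagonal terms excluded)."""
    m, n = len(X), len(Y)
    kxx = sum(gaussian_kernel(X[i], X[j], sigma) for i in range(m) for j in range(m) if i != j)
    kyy = sum(gaussian_kernel(Y[i], Y[j], sigma) for i in range(n) for j in range(n) if i != j)
    kxy = sum(gaussian_kernel(x, y, sigma) for x in X for y in Y)
    return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - 2 * kxy / (m * n)


def mmd_permutation_test(X: List[Vector], Y: List[Vector], sigma: float = 1.0,
                         n_perm: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Return (observed MMD^2, permutation p-value) by reshuffling group labels."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(X, Y, sigma)
    pooled = list(X) + list(Y)
    m = len(X)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:m], pooled[m:], sigma) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy expression profiles of two genes under two conditions (invented numbers).
control = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]]
treated = [[a + 5.0, b + 5.0] for a, b in control]
print(mmd_permutation_test(control, treated))
```

Nothing in the test requires the number of genes to be smaller than the sample size, which reflects the first advantage listed in the conclusion.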
Subjects
Algorithms, Computational Biology/methods, Gene Expression, Animals, Diet, Fatty Acids/metabolism, Genomics, Liver/metabolism, Metabolomics, Mice, Knockout Mice, Plant Oils/chemistry, Plant Oils/metabolism, Sunflower Oil
ABSTRACT
OBJECTIVES: The aim of this study is to determine whether the LIN28B gene is differentially distributed in the Mediterranean region by analyzing the allele distribution of three single nucleotide polymorphisms (SNPs), namely rs7759938, rs314277, and rs221639, in 24 populations. These SNPs have recently been associated with age at menarche, pubertal height growth, peripubertal body mass index, levels of prenatal testosterone exposure, and cancer survival. METHODS: A total of 1,197 DNA samples were genotyped. Allele frequencies were used to determine the relationships between populations, with data from the 1000 Genomes Project used for external comparisons. Genotype distributions and population structure were determined between populations and between groups of populations. RESULTS: The results indicate a significant degree of variation among populations (FST = 0.043, P < 0.0001), and allele frequencies differ significantly among populations. A hierarchical analysis of variance is consistent with a primary differentiation between populations on the northern and southern coasts of the Mediterranean. This difference is especially evident in the unexpected distribution of SNP rs221639, which shows one of the highest FST values (11.5%, P < 0.0001) described in the Mediterranean region thus far. CONCLUSION: The population differentiation and the structuring of the genetic variance, in agreement with previous studies, indicate that these SNPs are good tools for the study of human populations, even at a microgeographic level. Am. J. Hum. Biol. 28:905-912, 2016. © 2016 Wiley Periodicals, Inc.
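To make the FST values concrete, the sketch below computes a simple Nei-style FST = (H_T - H_S) / H_T for one biallelic SNP from per-population allele frequencies. This is an illustrative estimator under stated simplifications (equal population weights, no sample-size correction); the study's hierarchical analysis of variance is a different, more complete procedure, and the frequencies in the usage line are invented.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    `freqs` holds the frequency of the same allele in each population. Populations
    are weighted equally and sample-size corrections are ignored (simplifications).
    """
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-population heterozygosity
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Illustrative allele frequencies for three populations (not data from the study).
print(fst_biallelic([0.35, 0.10, 0.55]))
```

Identical frequencies in all populations give FST = 0, and the value grows as frequencies diverge, which is the sense in which an FST of 11.5% signals strong differentiation for rs221639.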
Subjects
Gene Frequency, Single Nucleotide Polymorphism, RNA-Binding Proteins/genetics, North Africa, Humans, Mediterranean Region
ABSTRACT
High-throughput technologies have generated vast amounts of omics data. There is consensus that integrating diverse omics sources improves predictive models and biomarker discovery. However, managing multiple omics data poses challenges such as data heterogeneity, noise, high dimensionality, and missing data, especially in block-wise patterns. This study addresses high dimensionality and block-wise missing data through a regularization- and constraint-based approach. The methodology is implemented in the R package bwm for binary and continuous response variables and is applied to breast cancer and exposome multi-omics datasets, achieving strong performance even in scenarios with missing data present in all omics. In the binary classification task, the proposed model achieves accuracy in the range of 86% to 92% and F1 in the range of 68% to 79%. In the regression task, the correlation between true and predicted responses is in the range of 72% to 76%. However, performance metrics decline slightly as the percentage of missing data increases. In scenarios where block-wise missing data affect multiple omics, model performance actually surpasses that of scenarios where missing data are present in only one omics. One possible explanation is that scenarios with missing data in multiple omics introduce a greater diversity of observation profiles, leading to a more robust model. Depending on the specific omics being studied, feature selection is more consistent when comparing block-wise missing data scenarios.
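Block-wise missingness means that entire omics blocks are absent for some samples, so samples fall into a small number of observation profiles. The Python sketch below groups samples by profile, assuming a missing block is stored as None; it does not reproduce the bwm package's API, and the cohort is hypothetical. Counting distinct profiles gives a concrete handle on the "diversity of observation profiles" mentioned in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# One sample = one entry per omics block; None marks a block that is entirely missing.
Sample = Sequence[Optional[Sequence[float]]]


def observation_profiles(samples: List[Sample]) -> Dict[Tuple[bool, ...], List[int]]:
    """Group sample indices by which omics blocks are observed (True = present)."""
    groups: Dict[Tuple[bool, ...], List[int]] = defaultdict(list)
    for idx, blocks in enumerate(samples):
        profile = tuple(block is not None for block in blocks)
        groups[profile].append(idx)
    return dict(groups)


# Hypothetical cohort with two blocks (e.g. transcriptome, methylome).
cohort = [
    [[1.0, 0.2], [0.7]],
    [[0.3, 0.9], None],
    [None, [0.1]],
    [[0.5, 0.5], [0.4]],
]
for profile, members in observation_profiles(cohort).items():
    print(profile, members)
```

When missingness affects only one block, all samples share at most two profiles; spreading it over several blocks produces more profiles, which is the situation the abstract associates with a more robust model.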
Subjects
Breast Neoplasms, Humans, Breast Neoplasms/genetics, Breast Neoplasms/metabolism, Genomics/methods, Algorithms, Female, Multiomics
ABSTRACT
SCOPE: Evidence on the Mediterranean diet (MD) and age-related cognitive decline (CD) is still inconclusive, partly because of self-reported dietary assessment. The aim of the current study is to develop an MD metabolomic score (MDMS) and investigate its association with CD in community-dwelling older adults. METHODS AND RESULTS: This study includes participants from the Bordeaux (n = 418) and Dijon (n = 422) cohorts of the Three-City Study who are free of dementia at baseline. Repeated measures of cognition are collected over 12 years. The MDMS is designed from serum biomarkers related to key MD food groups, measured with a targeted metabolomics platform. Associations with CD are investigated through conditional logistic regression (matched on age, sex, and education level) in both sample sets. The MDMS is inversely associated with CD (odds ratio [OR] [95% confidence interval (CI)] = 0.90 [0.80-1.00]; p = 0.048) in the Bordeaux (discovery) cohort. Results are comparable in the Dijon (validation) cohort, with a trend toward significance (OR [95% CI] = 0.91 [0.83-1.01]; p = 0.084). CONCLUSIONS: Greater adherence to the MD, assessed here by a serum MDMS, is associated with lower odds of CD in older adults.
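One common way to turn several serum biomarkers into a single diet score is to standardize each biomarker and add the z-scores with a sign reflecting the expected direction of association with adherence. The sketch below implements that generic construction; the sign scheme, equal weights and biomarker names are assumptions and do not reproduce how the MDMS was actually built. The resulting score would then enter the matched conditional logistic regression, which is not shown.

```python
import statistics
from typing import Dict, List


def metabolomic_score(levels: Dict[str, List[float]], signs: Dict[str, int]) -> List[float]:
    """Sum of signed z-scores across biomarkers, giving one score per participant.

    levels[name]: serum level of one biomarker for each participant.
    signs[name]: +1 if higher levels are assumed to reflect greater MD adherence, else -1.
    """
    n = len(next(iter(levels.values())))
    score = [0.0] * n
    for name, values in levels.items():
        mu = statistics.mean(values)
        sd = statistics.stdev(values)
        for i, v in enumerate(values):
            score[i] += signs[name] * (v - mu) / sd
    return score


# Hypothetical biomarkers and values for three participants.
levels = {"marker_fish": [1.0, 2.0, 3.0], "marker_meat": [3.0, 2.0, 1.0]}
signs = {"marker_fish": 1, "marker_meat": -1}
print(metabolomic_score(levels, signs))
```

Standardizing first keeps biomarkers measured on very different concentration scales from dominating the sum.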
ABSTRACT
The use of mouse models has revolutionized the field of Down syndrome (DS) research, increasing our knowledge of neuropathology and helping to propose new therapies for cognitive impairment. However, concerns about the reproducibility of results in mice and their translatability to humans have become a major issue, and controlling for moderators of behavior is essential. Social and environmental factors, the experience of the researcher, and the sex and strain of the animals can all affect behavior, but their impact on DS mouse models has not been explored. Here we analyzed the influence of a number of social and environmental factors, usually not taken into consideration, on the behavior of male and female wild-type and trisomic mice (the Ts65Dn model) in one of the most widely used tests for assessing drug effects on memory, the novel object recognition (NOR) test. Using principal component analysis and correlation matrices, we show that the ratio of trisomic mice in the cage, the experience of the experimenter, and the timing of the test have a differential impact on male versus female and on wild-type versus trisomic behavior. We conclude that although the NOR test is quite robust and less susceptible to environmental influences than expected, obtaining useful results requires contrasting phenotype expression against the influences of social and environmental factors.
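The abstract does not state which NOR outcome was analyzed; a widely used summary is the discrimination index DI = (t_novel - t_familiar) / (t_novel + t_familiar), where 0 means no preference. The sketch below computes DI per mouse and a Pearson correlation between DI and a cage-level factor, the kind of pairwise entry that fills a correlation matrix. The choice of DI and all values are illustrative assumptions, not data or methods from the study.

```python
import math
from typing import Sequence


def discrimination_index(t_novel: float, t_familiar: float) -> float:
    """(novel - familiar) / total exploration time; 0 means no preference."""
    total = t_novel + t_familiar
    if total == 0:
        raise ValueError("no exploration recorded")
    return (t_novel - t_familiar) / total


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only: exploration times (s) per mouse and the fraction of
# trisomic mice in its cage.
di = [discrimination_index(n, f) for n, f in [(30, 10), (22, 18), (25, 15), (20, 20)]]
trisomic_ratio = [0.0, 0.5, 0.25, 0.75]
print(di, pearson(di, trisomic_ratio))
```

Computing the same correlation separately for each sex and genotype is one way to expose the differential impact described in the abstract.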
ABSTRACT
Polyphenols have great potential for regulating intestinal health and ameliorating pathological conditions related to increased intestinal permeability (IP). However, the efficacy of dietary interventions with these phytochemicals may be significantly influenced by interindividual variability factors that affect their bioavailability and consequent biological activity. In the present study, urine samples collected from older subjects undergoing a crossover intervention trial with polyphenol-rich foods were subjected to metabolomics analysis to investigate the impact of increased IP on the bioavailability of polyphenols. Interestingly, urinary levels of phase II and microbiota-derived metabolites differed significantly between subjects with healthier intestinal barrier integrity and those with greater IP disruption. Our results support the idea that this IP-dependent impairment of polyphenol bioavailability could be attributed to disturbances in gut microbial metabolism and phase II methylation processes. Furthermore, we observed that microbiota-derived metabolites could be largely responsible for the biological activity elicited by dietary polyphenols against age-related disruption of IP.
Subjects
Aging/metabolism, Intestinal Mucosa/metabolism, Polyphenols/metabolism, Aged, Aged 80 and over, Biological Availability, Diet, Female, Gastrointestinal Microbiome, Humans, Male, Middle Aged, Permeability
ABSTRACT
Neuroprotection by erythropoietin (EPO) following long-term administration is hampered by its undesirable effects on hematopoiesis and body weight. For this reason, we tested carbamylated EPO (CEPO), which has no effect on erythropoiesis, and compared it with EPO in the AβPP/PS1 mouse model of familial Alzheimer's disease. Groups of 5-month-old wild-type (WT) and transgenic mice received chronic treatment with CEPO (2,500 or 5,000 IU/kg) or EPO (2,500 IU/kg) 3 days/week for 4 weeks. Memory at the end of treatment was assessed with the object recognition test. Microarray analysis and quantitative PCR were used for gene expression studies. No alterations in erythropoiesis were observed in CEPO-treated WT and AβPP/PS1 transgenic mice. EPO and CEPO improved memory in AβPP/PS1 animals. However, only EPO decreased amyloid-β (Aβ) plaque burden and soluble Aβ40. Microarray analysis of gene expression revealed a limited number of common genes modulated by EPO and CEPO. CEPO, but not EPO, significantly increased gene expression of dopamine receptors 1 and 2 and adenosine receptor 2a, and significantly down-regulated adrenergic receptor 1D and gastrin-releasing peptide. CEPO treatment resulted in higher protein levels of dopamine receptors 1 and 2 in WT and AβPP/PS1 animals, whereas adenosine receptor 2a was reduced in WT animals. These results suggest that the improved behavior observed in AβPP/PS1 transgenic mice after CEPO treatment may be mediated, at least in part, by the observed modulation of the expression of molecules involved in neurotransmission.
Subjects
Alzheimer Disease/complications, Erythropoietin/analogs & derivatives, Gene Expression Regulation/drug effects, Memory Disorders/drug therapy, Memory Disorders/etiology, Synapses/metabolism, Alzheimer Disease/genetics, Amyloid beta-Peptides/metabolism, Amyloid beta-Protein Precursor/genetics, Animals, Body Weight/drug effects, Body Weight/genetics, Animal Disease Models, Erythropoietin/therapeutic use, Gastrin-Releasing Peptide/metabolism, Gene Expression Regulation/genetics, Humans, Male, Mice, Inbred C57BL Mice, Transgenic Mice, Mutation/genetics, Peptide Fragments/metabolism, Presenilin-1/genetics, Catecholamine Receptors/metabolism, Synapses/genetics, Time Factors
ABSTRACT
BACKGROUND: Combining different sources of information to improve the available biological knowledge is a current challenge in bioinformatics. Kernel-based methods are among the most powerful approaches for integrating heterogeneous data types. Kernel-based data integration consists of two basic steps: first, the right kernel is chosen for each dataset; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. RESULTS: We analyze the integration of data from several sources of information using kernel PCA, from the point of view of dimensionality reduction. Moreover, we improve the interpretability of kernel PCA by adding to the plot a representation of the input variables from any of the datasets. In particular, for each input variable or linear combination of input variables, we can represent the local direction of maximum growth, which allows us to identify the samples with higher or lower values of the variables analyzed. CONCLUSIONS: Integrating different datasets and representing samples and variables simultaneously gives us a better understanding of the biological knowledge.
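The two integration steps can be sketched in Python: one kernel matrix per dataset, a weighted sum to combine them, then kernel PCA on the combined matrix. Linear kernels, equal weights, power iteration for the first component only, and the two toy data sources are simplifying assumptions made here; the function names are hypothetical and this is not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def linear_kernel(X: Matrix) -> Matrix:
    """Gram matrix of inner products between samples (rows of X)."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def combine(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of kernel matrices computed on the same samples."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)] for i in range(n)]


def center(K: Matrix) -> Matrix:
    """Double-centre a symmetric kernel matrix (centres the samples in feature space)."""
    n = len(K)
    means = [sum(row) / n for row in K]
    grand = sum(means) / n
    return [[K[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def first_component_scores(K: Matrix, iters: int = 500) -> List[float]:
    """Sample scores on the first kernel principal component, found by power iteration
    on the centred kernel matrix; the sign of the component is arbitrary."""
    Kc = center(K)
    n = len(Kc)
    v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-constant starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))  # converges to the top eigenvalue
        if lam == 0.0:
            return [0.0] * n
        v = [x / lam for x in w]
    # With unit-norm eigenvector v, the score of sample i is sqrt(lambda) * v[i].
    return [math.sqrt(lam) * x for x in v]


# Two hypothetical data sources measured on the same four samples.
expression = [[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
metabolites = [[0.5], [-0.5], [1.0], [-1.0]]
K = combine([linear_kernel(expression), linear_kernel(metabolites)], [1.0, 1.0])
print(first_component_scores(K))
```

Because each source enters only through its kernel matrix, sources with different numbers or types of variables can be combined as long as they describe the same samples; non-linear kernels can replace `linear_kernel` without changing the rest of the pipeline.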
Subjects
Computational Biology/methods, Nutrigenomics, Principal Component Analysis, Statistics as Topic
ABSTRACT
The exact function of interleukin-19 (IL-19) in the immune response is poorly understood. In mice, IL-19 up-regulates TNFα and IL-6 expression, and its deficiency increases susceptibility to DSS-induced colitis. In humans, IL-19 favors a Th2 response and is elevated in several diseases. Here we investigate the expression and effects of IL-19 on cells from patients with active Crohn's disease (CD). Twenty-three active CD patients and 20 healthy controls (HC) were included. IL-19 mRNA and protein levels were analyzed in monocytes. The effects of IL-19 on T cell phenotype and on cytokine production by immune cells were determined in vitro. Unstimulated and TLR-activated monocytes from active CD patients expressed significantly less IL-19 mRNA than those from HC (logFC = -1.97 unstimulated; -1.88 with Pam3CSK4; and -1.91 with FSL-1; p<0.001). These results were confirmed at the protein level. Exogenous IL-19 had an anti-inflammatory effect on cells from HC but not from CD patients. IL-19 decreased TNFα production in PBMC (850.7 ± 75.29 pg/ml vs 2626.0 ± 350 pg/ml; p<0.01) and increased CTLA4 expression (22.04 ± 1.55% vs 13.98 ± 2.05%; p<0.05) and IL-4 production (32.5 ± 8.9 pg/ml vs 13.5 ± 2.9 pg/ml; p<0.05) in T cells from HC. IL-10 regulated IL-19 production in both active CD patients and HC. Three of the miRNAs that can modulate IL-19 mRNA expression were up-regulated in monocytes from active CD patients. These results suggest that IL-19 had an anti-inflammatory role in this study. Defects in IL-19 expression and the lack of response to this cytokine could contribute to inflammatory mechanisms in active CD patients.
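To read the reported logFC values as fold changes, the short sketch below assumes they are base-2 logarithms, the usual convention in microarray and RNA-seq pipelines, although the abstract does not state the base; `fold_change_from_log2` is a hypothetical helper.

```python
def fold_change_from_log2(log_fc: float) -> float:
    """Linear fold change from a log fold change, assuming base-2 logarithms."""
    return 2.0 ** log_fc


# logFC values reported for monocytes (CD vs HC).
for label, lfc in [("unstimulated", -1.97), ("Pam3CSK4", -1.88), ("FSL-1", -1.91)]:
    ratio = fold_change_from_log2(lfc)
    print(f"{label}: CD/HC ratio = {ratio:.2f}, i.e. about {1 / ratio:.1f}-fold lower")
```

Under that assumption, the three values correspond to roughly 3.7- to 3.9-fold lower IL-19 mRNA in monocytes from active CD patients than in those from HC.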
Subjects
Crohn Disease/metabolism, Interleukins/deficiency, Monocytes/metabolism, Adult, Aged, CTLA-4 Antigen/biosynthesis, CTLA-4 Antigen/genetics, Cultured Cells, Crohn Disease/immunology, Female, Gene Expression Profiling, Gene Expression Regulation, Humans, Interleukin-10/physiology, Interleukins/biosynthesis, Interleukins/blood, Interleukins/genetics, Interleukins/pharmacology, Mononuclear Leukocytes/metabolism, Lymphocyte Activation/drug effects, Male, MicroRNAs/genetics, Middle Aged, Recombinant Proteins/pharmacology, Th2 Cells/immunology, Toll-Like Receptors/genetics, Toll-Like Receptors/physiology, Tumor Necrosis Factor-alpha/metabolism
ABSTRACT
Myxovirus resistance protein A (MxA), an antiviral protein encoded by the MX1 gene, has proven to be a sensitive measure of IFNβ bioactivity in multiple sclerosis (MS). However, the use of MxA as a biomarker of IFNβ bioactivity has been criticized for the lack of evidence of its role in disease pathogenesis and in the clinical response to IFNβ. Here, we aimed to identify specific biomarkers of IFNβ bioactivity, to compare their induction of gene expression by type I IFNs with that of MxA, and to investigate their potential role in MS pathogenesis. Gene expression microarrays were performed on PBMC from MS patients who developed neutralizing antibodies (NAB) to IFNβ at 12 and/or 24 months of treatment and from patients who remained NAB negative. Nine genes whose expression patterns over time were similar to that of MX1, which was considered the gold-standard gene, were selected for further experiments: IFI6, IFI27, IFI44L, IFIT1, HERC5, LY6E, RSAD2, SIGLEC1, and USP18. In vitro experiments on PBMC from healthy controls revealed specific induction of the selected biomarkers by IFNβ but not IFNγ, and several markers, in particular USP18 and HERC5, were significantly induced at lower IFNβ concentrations and were more selective than MX1 as biomarkers of IFNβ bioactivity. In addition, USP18 expression was deficient in MS patients compared with healthy controls (p = 0.0004). We propose specific biomarkers that may be considered in addition to MxA to evaluate IFNβ bioactivity and to further explore their implication in MS pathogenesis.
Subjects
Interferon-beta/metabolism, Multiple Sclerosis/metabolism, Adult, Neutralizing Antibodies/immunology, Biomarkers/metabolism, Case-Control Studies, Endopeptidases/genetics, Endopeptidases/metabolism, Female, GTP-Binding Proteins/genetics, GTP-Binding Proteins/metabolism, Gene Expression Regulation, Humans, Multiple Sclerosis/genetics, Multiple Sclerosis/therapy, Myxovirus Resistance Proteins, Oligonucleotide Array Sequence Analysis, Time Factors, Ubiquitin Thiolesterase
ABSTRACT
The detection of genes that show similar profiles under different experimental conditions is often an initial step in inferring the biological significance of such genes. Visualization tools are used to identify genes with similar profiles in microarray studies. Given the large number of genes recorded in microarray experiments, gene expression data are generally displayed on a low-dimensional plot based on linear methods. However, microarray data show nonlinearity due to high-order interaction terms between genes, so alternative approaches, such as kernel methods, may be more appropriate. We introduce a technique that combines kernel principal component analysis (KPCA) and the biplot to visualize gene expression profiles. Our approach relies on the singular value decomposition of the input matrix and incorporates an additional step that involves KPCA. The main properties of our method are the extraction of nonlinear features and the preservation of the input variables (genes) in the output display. We apply this algorithm to colon tumor, leukemia, and lymphoma datasets. Our approach reveals the underlying structure of the gene expression profiles and provides a more intuitive understanding of the association between genes and samples.
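One way to draw an input variable (for example a gene) in a KPCA display is to ask how a sample's scores on the first two components change when that variable increases slightly. The derivation below is a sketch under stated assumptions: a Gaussian kernel with bandwidth σ, component coefficients α_{ir} obtained from the centred kernel matrix, and a two-component display; it is not necessarily the construction used by the authors. Because eigenvectors of the centred kernel matrix with non-zero eigenvalue are orthogonal to the vector of ones, the coefficients of each component sum to zero, so the z-dependent centring term drops out of the gradient.

```latex
k(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{z} \rVert^{2}}{2\sigma^{2}}\right),
\qquad
p_{r}(\mathbf{z}) = \sum_{i=1}^{n} \alpha_{ir}\,\tilde{k}(\mathbf{x}_{i}, \mathbf{z})

\tilde{k}(\mathbf{x}_{i}, \mathbf{z}) = k(\mathbf{x}_{i}, \mathbf{z})
  - \frac{1}{n}\sum_{m=1}^{n} k(\mathbf{x}_{m}, \mathbf{z})
  - \frac{1}{n}\sum_{m=1}^{n} k(\mathbf{x}_{i}, \mathbf{x}_{m})
  + \frac{1}{n^{2}}\sum_{m=1}^{n}\sum_{l=1}^{n} k(\mathbf{x}_{m}, \mathbf{x}_{l})

\sum_{i=1}^{n} \alpha_{ir} = 0
\;\Rightarrow\;
\frac{\partial p_{r}(\mathbf{z})}{\partial z_{j}}
  = \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \alpha_{ir}\,(x_{ij} - z_{j})\,k(\mathbf{x}_{i}, \mathbf{z})

\mathbf{d}_{j}(\mathbf{z}) = \left(\frac{\partial p_{1}(\mathbf{z})}{\partial z_{j}},\;
  \frac{\partial p_{2}(\mathbf{z})}{\partial z_{j}}\right)
```

Plotting d_j(z) at each sample z gives a local arrow for variable j; under this sketch, moving along the arrow corresponds locally to higher values of that variable, which is how the input variables can be preserved in a nonlinear display.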