Results 1 - 9 of 9
1.
Aging (Albany NY) ; 13(3): 3313-3341, 2021 02 11.
Article in English | MEDLINE | ID: mdl-33611312

ABSTRACT

By combining transcriptomic data with other data sources, inferences can be made about functional changes during ageing. Thus, we conducted a meta-analysis on 127 publicly available microarray and RNA-Seq datasets from mice, rats and humans, identifying a transcriptomic signature of ageing across species and tissues. Analyses on subsets of these datasets produced transcriptomic signatures of ageing for brain, heart and muscle. We then applied enrichment analysis and machine learning to functionally describe these signatures, revealing overexpression of immune and stress response genes and underexpression of metabolic and developmental genes. Further analyses revealed little overlap between genes differentially expressed with age in different tissues, despite ageing differentially expressed genes typically being widely expressed across tissues. Additionally, we show that the ageing gene expression signatures (particularly the overexpressed signatures) of the whole meta-analysis, brain and muscle tend to include genes that are central in protein-protein interaction networks. We also show that genes underexpressed with age in the brain are highly central in a co-expression network, suggesting that underexpression of these genes may have broad phenotypic consequences. In sum, we show numerous functional similarities between the ageing transcriptomes of these important tissues, along with unique network properties of genes differentially expressed with age in both protein-protein interaction and co-expression networks.


Subject(s)
Aging/genetics , Genomics/methods , Organ Specificity/genetics , Transcriptome/genetics , Animals , Humans , Machine Learning , Mice , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Rats
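
The idea that signature genes can be "central" in a protein-protein interaction network may be easier to picture with a small sketch. It assumes degree centrality as the measure (the abstract does not say which centrality measure was used) and a made-up four-protein network; the protein names and the function `degree_centrality` are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected, unweighted network.

    Degree centrality is an assumed choice of measure, used only for illustration.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical toy network: P1 interacts with every other protein.
toy_ppi = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
centrality = degree_centrality(toy_ppi)

# Rank proteins from most to least central.
ranked = sorted(centrality, key=centrality.get, reverse=True)
print(ranked[0], centrality[ranked[0]])
```

In a real analysis the edge list would come from an interaction database, and a gene set would be called central only after comparison against a suitable background set of genes.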
2.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2230-2238, 2021.
Article in English | MEDLINE | ID: mdl-32324561

ABSTRACT

Understanding the ageing process is a very challenging problem for biologists. To help in this task, there has been a growing use of classification methods (from machine learning) to learn models that predict whether a gene influences the process of ageing or promotes longevity. One type of predictive feature often used for learning such classification models is Protein-Protein Interaction (PPI) features. One important property of PPI features is their uncertainty, i.e., a given feature (PPI annotation) is often associated with a confidence score, which is usually ignored by conventional classification methods. Hence, we propose the Lazy Feature Selection for Uncertain Features (LFSUF) method, which is tailored for coping with the uncertainty in PPI confidence scores. In addition, following the lazy learning paradigm, LFSUF selects features for each instance to be classified, making the feature selection process more flexible. We show that our LFSUF method achieves better predictive accuracy when compared to other feature selection methods that either do not explicitly take PPI confidence scores into account or deal with uncertainty globally rather than using a per-instance approach. Also, we interpret the results of the classification process using the features selected by LFSUF, showing that the number of selected features is significantly reduced, assisting the interpretability of the results. The datasets used in the experiments and the program code of the LFSUF method are freely available on the web at http://github.com/pablonsilva/FSforUncertainFeatureSpaces.


Subject(s)
Aging/genetics , Computational Biology/methods , Machine Learning , Algorithms , Animals , Drosophila melanogaster/genetics , Genome, Human/genetics , Humans , Mice , Protein Interaction Maps/genetics , Uncertainty , Yeasts/genetics
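
The contrast between global and per-instance ("lazy") feature selection over uncertain features can be sketched as follows. This is not the LFSUF algorithm, whose selection rule the abstract does not describe; it is a deliberately simple stand-in that keeps, for each gene to be classified, only the PPI features whose confidence score for that gene meets a threshold. The function name, the threshold of 0.7, and the gene and feature names are all assumptions made for illustration.

```python
def select_features_for_instance(instance_confidences, min_confidence=0.7):
    """Keep the features whose confidence for THIS instance meets the threshold.

    instance_confidences maps a feature name to its confidence score in [0, 1].
    The thresholding rule is an assumed simplification, not the published method.
    """
    return sorted(
        feature
        for feature, confidence in instance_confidences.items()
        if confidence >= min_confidence
    )


# Hypothetical PPI annotations with confidence scores for two genes.
gene_a = {"interacts_with_X": 0.95, "interacts_with_Y": 0.40, "interacts_with_Z": 0.80}
gene_b = {"interacts_with_X": 0.30, "interacts_with_Y": 0.90}

selected_a = select_features_for_instance(gene_a)
selected_b = select_features_for_instance(gene_b)

# The selected subsets differ per gene, which a single global selection cannot do.
print(selected_a)
print(selected_b)
```

The point of the sketch is only that the feature subset is chosen at classification time, separately for each instance, using the confidence attached to that instance's annotations.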
3.
Brief Bioinform ; 21(3): 803-814, 2020 05 21.
Article in English | MEDLINE | ID: mdl-30895300

ABSTRACT

Biologists very often use enrichment methods based on statistical hypothesis tests to identify gene properties that are significantly over-represented in a given set of genes of interest, by comparison with a 'background' set of genes. These enrichment methods, although based on rigorous statistical foundations, are not always the best single option to identify patterns in biological data. In many cases, one can also use classification algorithms from the machine-learning field. Unlike enrichment methods, classification algorithms are designed to maximize measures of predictive performance and are capable of analysing combinations of gene properties, instead of one property at a time. In practice, however, the majority of studies use either enrichment or classification methods (rather than both), and there is a lack of literature discussing the pros and cons of both types of method. The goal of this paper is to compare and contrast enrichment and classification methods, offering two contributions. First, we discuss the (to some extent complementary) advantages and disadvantages of both types of methods for identifying gene properties that discriminate between gene classes. Second, we provide a set of high-level recommendations for using enrichment and classification methods. Overall, by highlighting the strengths and the weaknesses of both types of methods we argue that both should be used in bioinformatics analyses.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Machine Learning , Algorithms
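
As a concrete instance of an enrichment method of the kind described above, the following minimal sketch computes a one-sided hypergeometric over-representation p-value for a single gene property. The abstract does not name a specific test, so the hypergeometric test is an assumption here, as are the function name `enrichment_p_value` and the example counts.

```python
from math import comb


def enrichment_p_value(background_size, background_hits, set_size, set_hits):
    """One-sided hypergeometric p-value P(X >= set_hits).

    background_size: number of genes in the background set
    background_hits: background genes that have the property
    set_size:        number of genes in the set of interest
    set_hits:        genes of interest that have the property
    """
    if not 0 <= background_hits <= background_size:
        raise ValueError("background_hits must lie in [0, background_size]")
    if not 0 <= set_size <= background_size:
        raise ValueError("set_size must lie in [0, background_size]")
    first = max(set_hits, 0)
    last = min(background_hits, set_size)
    total = comb(background_size, set_size)
    # Sum the probability of observing k hits for every k at least as extreme.
    tail = sum(
        comb(background_hits, k) * comb(background_size - background_hits, set_size - k)
        for k in range(first, last + 1)
    )
    return tail / total


# Hypothetical counts: 200 of 20000 background genes carry the property,
# and 8 of the 100 genes of interest carry it.
p = enrichment_p_value(20000, 200, 100, 8)
print(f"p = {p:.3g}")
```

Such a test examines one property at a time, which is the limitation the abstract contrasts with classification algorithms; when many properties are tested, a multiple-testing correction would also be needed.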
4.
Bioinformatics ; 36(7): 2202-2208, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31845988

ABSTRACT

MOTIVATION: One way to identify genes possibly associated with ageing is to build a classification model (from the machine learning field) capable of classifying genes as associated with multiple age-related diseases. To build this model, we use a pre-compiled list of human genes associated with age-related diseases and apply a novel Deep Neural Network (DNN) method to find associations between gene descriptors (e.g. Gene Ontology terms, protein-protein interaction data and biological pathway information) and age-related diseases. RESULTS: The novelty of our DNN method is its modular architecture, which has the capability of combining several sources of biological data to predict which ageing-related diseases a gene is associated with (if any). Our DNN method achieves better predictive performance than standard DNN approaches, a Gradient Boosted Tree classifier (a strong baseline method) and a Logistic Regression classifier. Given the DNN model produced by our method, we use two approaches to identify human genes that are not known to be associated with age-related diseases according to our dataset. First, we investigate genes that are close to other disease-associated genes in a complex multi-dimensional feature space learned by the DNN algorithm. Second, using the class label probabilities output by our DNN approach, we identify genes with a high probability of being associated with age-related diseases according to the model. We provide evidence of these putative associations retrieved from the DNN model with literature support. AVAILABILITY AND IMPLEMENTATION: The source code and datasets can be found at: https://github.com/fabiofabris/Bioinfo2019. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning , Machine Learning , Aging , Gene Ontology , Humans , Neural Networks, Computer
5.
Genome Biol ; 20(1): 244, 2019 11 19.
Article in English | MEDLINE | ID: mdl-31744546

ABSTRACT

BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.


Subject(s)
Molecular Sequence Annotation/trends , Animals , Biofilms , Candida albicans/genetics , Drosophila melanogaster/genetics , Genome, Bacterial , Genome, Fungal , Humans , Locomotion , Memory, Long-Term , Molecular Sequence Annotation/methods , Pseudomonas aeruginosa/genetics
6.
Bioinformatics ; 34(14): 2449-2456, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29462247

ABSTRACT

Motivation: This work uses the Random Forest (RF) classification algorithm to predict whether a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model. Results: The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure. Availability and implementation: The dataset and source code used in this paper are available as 'Supplementary Material' and the description of the data can be found at: https://fabiofabris.github.io/bioinfo2018/web/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Aging/genetics , Brain/metabolism , Computational Biology/methods , Gene Expression Regulation , Software , Animals , Gene Ontology , Humans , Machine Learning
7.
Biogerontology ; 18(2): 171-188, 2017 04.
Article in English | MEDLINE | ID: mdl-28265788

ABSTRACT

Broadly speaking, supervised machine learning is the computational task of learning correlations between variables in annotated data (the training set), and using this information to create a predictive model capable of inferring annotations for new data, whose annotations are not known. Ageing is a complex process that affects nearly all animal species. This process can be studied at several levels of abstraction, in different organisms and with different objectives in mind. Not surprisingly, the diversity of the supervised machine learning algorithms applied to answer biological questions reflects the complexities of the underlying ageing processes being studied. Many works using supervised machine learning to study the ageing process have been recently published, so it is timely to review these works and to discuss their main findings and weaknesses. In summary, the main findings of the reviewed papers are: the link between specific types of DNA repair and ageing; ageing-related proteins tend to be highly connected and seem to play a central role in molecular pathways; ageing/longevity is linked with autophagy and apoptosis, nutrient receptor genes, and copper and iron ion transport. Additionally, several biomarkers of ageing were found by machine learning. Despite some interesting machine learning results, we also identified a weakness of current works on this topic: only one of the reviewed papers has corroborated the computational results of machine learning algorithms through wet-lab experiments. In conclusion, supervised machine learning has contributed to advancing our knowledge and has provided novel insights into ageing, yet future work should place greater emphasis on validating the predictions.


Subject(s)
Aging/physiology , Computational Biology/methods , Models, Biological , Research Design , Supervised Machine Learning , Animals , Computer Simulation , Humans
8.
Bioinformatics ; 32(19): 2988-95, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27318209

ABSTRACT

MOTIVATION: The incidence of ageing-related diseases has been constantly increasing in the last decades, raising the need for creating effective methods to analyze ageing-related protein data. These methods should have high predictive accuracy and be easily interpretable by ageing experts. To enable this, one needs interpretable classification models (supervised machine learning) and features with rich biological meaning. In this paper we propose two interpretable feature types based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and compare them with traditional feature types in hierarchical classification (a more challenging classification task regarding predictive performance) and binary classification (a classification task producing easier to interpret classification models). As far as we know, this work is the first to: (i) explore the potential of the KEGG pathway data in the hierarchical classification setting, (ii) use the graph structure of KEGG pathways to create a feature type that quantifies the influence of a current protein on another specific protein within a KEGG pathway graph and (iii) propose a method for interpreting the classification models induced using KEGG features. RESULTS: We performed tests measuring predictive accuracy considering hierarchical and binary class labels extracted from the Mouse Phenotype Ontology. One of the KEGG feature types leads to the highest predictive accuracy among five individual feature types across three hierarchical classification algorithms. Additionally, the combination of the two KEGG feature types proposed in this work results in one of the best predictive accuracies when using the binary class version of our datasets, at the same time enabling the extraction of knowledge from ageing-related data using quantitative influence information. AVAILABILITY AND IMPLEMENTATION: The datasets created in this paper will be freely available after publication.
CONTACT: ff79@kent.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Aging , Genome , Proteins , Algorithms , Animals , Mice , Phenotype
9.
IEEE/ACM Trans Comput Biol Bioinform ; 13(6): 1045-1058, 2016.
Article in English | MEDLINE | ID: mdl-26661786

ABSTRACT

This study comprehensively evaluates the performance of five types of probabilistic hierarchical classification methods used for predicting Gene Ontology (GO) terms related to ageing. Of those tested, a new hybrid of a Local Hierarchical Classifier (LHC) and the Predictive Clustering Tree algorithm (LHC-PCT) had the best predictive accuracy results. We also tested the impact of two types of variations in most hierarchical classification algorithms, namely: (a) changing the base algorithm (we tested Naive Bayes and Support Vector Machines), and (b) using or not using the Correlation based Feature Selection (CFS) algorithm in a pre-processing step. In total, we evaluated the predictive performance of 17 variations of hierarchical classifiers across 15 datasets of ageing and longevity-related genes. We conclude that the LHC-PCT algorithm ranks better across several tests (seven out of 12). In addition, we interpreted the models generated by the PCT algorithm to show how hierarchical classification algorithms can be used to extract biological insights out of the ageing-related datasets that we compiled.


Subject(s)
Aging/genetics , Gene Expression Profiling/methods , Models, Genetic , Models, Statistical , Pattern Recognition, Automated/methods , Proteome/genetics , Algorithms , Computer Simulation , Data Mining/methods , Databases, Genetic , Humans , Machine Learning