1.
Bioinformatics ; 35(14): i154-i163, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510704

ABSTRACT

MOTIVATION: Predictive models are a powerful tool for solving complex problems in computational biology. They are typically designed to predict or classify data coming from the same unknown distribution as the training data. In many real-world settings, however, uncontrolled biological or technical factors can lead to a distribution mismatch between datasets acquired at different times, causing model performance to deteriorate on new data. A common additional obstacle in computational biology is scarce data with many more features than samples. To address these problems, we propose a method for unsupervised domain adaptation that is based on a weighted elastic net. The key idea of our approach is to compare dependencies between inputs in training and test data and to increase the cost of differently behaving features in the elastic net regularization term. In doing so, we encourage the model to assign a higher importance to features that are robust and behave similarly across domains. RESULTS: We evaluate our method both on simulated data with varying degrees of distribution mismatch and on real data, considering the problem of age prediction based on DNA methylation data across multiple tissues. Compared with a non-adaptive standard model, our approach substantially reduces errors on samples with a mismatched distribution. On real data, we achieve far lower errors on cerebellum samples, a tissue which is not part of the training data and poorly predicted by standard models. Our results demonstrate that unsupervised domain adaptation is possible for applications in computational biology, even with many more features than samples. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/PfeiferLabTue/wenda. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
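The weighted elastic net idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the wenda implementation: the function names are hypothetical, the feature weights come from a simple correlation comparison used here as a stand-in for the paper's comparison of input dependencies, and the solver is plain coordinate descent on the objective (1/2n)·||y − Xb||² + lam·Σ_j w_j·(alpha·|b_j| + (1 − alpha)/2·b_j²).

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def dependency_shift_weights(X_src, X_tgt, k=5.0):
    """Assumed proxy: a feature whose correlations with the other features differ
    between source (training) and target (test) data gets a larger penalty weight.
    Needs at least two features; no target labels are used."""
    cols_s, cols_t = list(zip(*X_src)), list(zip(*X_tgt))
    p = len(cols_s)
    weights = []
    for j in range(p):
        shift = sum(
            abs(pearson(cols_s[j], cols_s[m]) - pearson(cols_t[j], cols_t[m]))
            for m in range(p) if m != j
        ) / (p - 1)
        weights.append(1.0 + k * shift)
    return weights


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_elastic_net(X, y, weights, lam=0.1, alpha=0.5, n_sweeps=200):
    """Coordinate descent for an elastic net with a per-feature penalty weight.
    Assumes alpha < 1 or non-constant features, so the denominator is positive."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha * weights[j]) / (
                col_sq[j] + lam * (1.0 - alpha) * weights[j]
            )
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

A typical call would be `weighted_elastic_net(X_train, y_train, dependency_shift_weights(X_train, X_test))`. Features with a large weight are shrunk harder, so the fitted model leans on features that behave similarly in both domains.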


Subject(s)
Computational Biology , DNA Methylation , Software
2.
BMC Bioinformatics ; 18(1): 370, 2017 Aug 16.
Article in English | MEDLINE | ID: mdl-28814324

ABSTRACT

BACKGROUND: Discriminating driver mutations from those that play no role in cancer is a severe bottleneck in elucidating the molecular mechanisms underlying cancer development. Since protein domains represent functional regions within proteins, mutations in them may disrupt protein function. Therefore, studying mutations at the domain level may point researchers to a more accurate assessment of the functional impact of the mutations. RESULTS: This article presents a comprehensive study to map mutations from 29 cancer types to both sequence- and structure-based domains. Statistical analysis was performed to identify candidate domains in which mutations occur with high statistical significance. For each cancer type, the corresponding type-specific domains were distinguished among all candidate domains. Subsequently, cancer type-specific domains facilitated the identification of specific proteins for each cancer type. In addition, interactome analysis of the specific proteins of each cancer type showed high levels of interconnectivity among them, which implies a functional relationship. To evaluate the role of mitochondrial genes, stem cell-specific genes and DNA repair genes in cancer development, their mutation frequency was determined via further analysis. CONCLUSIONS: This study has provided researchers with a publicly available data repository for studying both CATH and Pfam domain regions on protein-coding genes. Moreover, the associations between different groups of genes/domains and various cancer types have been clarified. The work is available at http://www.cancerouspdomains.ir .
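The abstract does not state which statistical test was used to find domains with significantly many mutations. As one plausible illustration (an assumption, not the authors' method), a one-sided binomial test can ask whether a domain carries more mutations than expected if mutations fell uniformly along the protein. The function names and the uniform null model are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def domain_enrichment_p(mutations_in_domain, total_mutations,
                        domain_length, protein_length):
    """One-sided p-value for mutation enrichment in a domain.

    Assumed null model: each mutation lands in the domain with probability
    equal to the fraction of the protein that the domain covers.
    """
    p_domain = domain_length / protein_length
    return binomial_tail(mutations_in_domain, total_mutations, p_domain)
```

For example, `domain_enrichment_p(8, 10, 100, 500)` tests 8 of 10 mutations falling in a domain that covers one fifth of the protein. Across many domains, the resulting p-values would still need a multiple-testing correction.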


Subject(s)
Neoplasms/genetics , Proteins/genetics , DNA Repair/genetics , Databases, Genetic , Humans , Internet , Mitochondria/genetics , Mutation , Neoplasms/metabolism , Neoplasms/pathology , Neoplastic Stem Cells/metabolism , Protein Interaction Maps/genetics , Proteins/metabolism , User-Computer Interface
3.
BMC Genomics ; 17: 501, 2016 07 19.
Article in English | MEDLINE | ID: mdl-27435615

ABSTRACT

BACKGROUND: Molecular measurements from cancer patients such as gene expression and DNA methylation can be influenced by several external factors. This makes it harder to reproduce the exact values of measurements coming from different laboratories. Furthermore, some cancer types are very heterogeneous, meaning that there might be different underlying causes for the same type of cancer among different individuals. If a model does not take potential biases in the data into account, this can lead to problems when trying to predict the stage of a certain cancer type. This is especially true when these biases differ between the training and test set. RESULTS: We introduce a method that can estimate this bias on a per-feature level and incorporate calculated feature confidences into a weighted combination of classifiers with disjoint feature sets. In this way, the method provides a prediction that is adjusted for the potential biases on a per-patient basis, providing a personalized prediction for each test patient. The new method achieves state-of-the-art performance on many different cancer data sets with measured DNA methylation or gene expression. Moreover, we show how to visualize the learned classifiers to display interesting associations with the target label. Applied to a leukemia data set, our method finds several ribosomal proteins associated with the risk group, which might be interesting targets for follow-up studies. This discovery supports the hypothesis that the ribosomes are a new frontier in gene regulation. CONCLUSION: We introduce a new method for robust prediction of phenotypes from molecular measurements such as DNA methylation or gene expression. Furthermore, the visualization capabilities enable exploratory analysis on the learnt dependencies and pave the way for a personalized prediction of phenotypes. The software is available under GPL2+ from https://github.com/adrinjalali/Network-Classifier/tree/v1.0 .
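The per-patient weighting described above can be illustrated with a toy ensemble. This sketch rests on several assumptions that are not taken from the paper: each classifier is a nearest-centroid model on its own disjoint feature group, and a feature's confidence is a Gaussian-shaped score of how typical the patient's value is relative to the training data. All function names are hypothetical.

```python
import math


def fit_group(X, y, features):
    """Fit a nearest-centroid model on one feature group and store the
    training mean and standard deviation of each feature in the group."""
    model = {"features": list(features), "centroids": {}, "stats": {}}
    for label in set(y):
        rows = [x for x, target in zip(X, y) if target == label]
        model["centroids"][label] = [
            sum(r[f] for r in rows) / len(rows) for f in features
        ]
    for f in features:
        col = [x[f] for x in X]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        model["stats"][f] = (mean, std)
    return model


def confidence(model, x):
    """Mean per-feature confidence for one patient: close to 1 when the values
    look like training data, close to 0 when they are far outside it."""
    scores = []
    for f in model["features"]:
        mean, std = model["stats"][f]
        z = (x[f] - mean) / std
        scores.append(math.exp(-0.5 * z * z))
    return sum(scores) / len(scores)


def predict_group(model, x):
    """Label of the nearest class centroid within this feature group."""
    def squared_distance(label):
        centroid = model["centroids"][label]
        return sum((x[f] - c) ** 2 for f, c in zip(model["features"], centroid))
    return min(model["centroids"], key=squared_distance)


def predict(models, x):
    """Confidence-weighted vote over the group classifiers for one patient."""
    votes = {}
    for model in models:
        label = predict_group(model, x)
        votes[label] = votes.get(label, 0.0) + confidence(model, x)
    return max(votes, key=votes.get)
```

Because the weights are recomputed for every test patient, a classifier whose features look biased for that patient contributes little to the vote, while the same classifier can dominate for another patient.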


Subject(s)
Computational Biology/methods , Genetic Association Studies/methods , Neoplasms/diagnosis , Neoplasms/genetics , Software , Algorithms , Biomarkers, Tumor , DNA Methylation , Databases, Nucleic Acid , Gene Expression Regulation, Neoplastic , Humans , Reproducibility of Results
4.
Bioinformatics ; 31(8): 1337-9, 2015 Apr 15.
Article in English | MEDLINE | ID: mdl-25481008

ABSTRACT

MOTIVATION: Finding one or more cell populations of interest, such as those correlating to a specific disease, is critical when analysing flow cytometry data. However, labelling of cell populations is not well defined, making it difficult to integrate the output of algorithms to external knowledge sources. RESULTS: We developed flowCL, a software package that performs semantic labelling of cell populations based on their surface markers and applied it to labelling of the Federation of Clinical Immunology Societies Human Immunology Project Consortium lyoplate populations as a use case. CONCLUSION: By providing automated labelling of cell populations based on their immunophenotype, flowCL allows for unambiguous and reproducible identification of standardized cell types. AVAILABILITY AND IMPLEMENTATION: Code, R script and documentation are available under the Artistic 2.0 license through Bioconductor (http://www.bioconductor.org/packages/devel/bioc/html/flowCL.html). CONTACT: rbrinkman@bccrc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
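To make the idea of labelling a population from its surface markers concrete, here is a small Python sketch. It is not flowCL's interface (flowCL is an R/Bioconductor package). The lookup table `TOY_DEFINITIONS` is a hypothetical, hand-written stand-in for an ontology, and the function names are illustrative.

```python
import re

# Hypothetical toy definitions: label -> required marker states.
TOY_DEFINITIONS = {
    "T cell": {"CD3": "+"},
    "CD4-positive T cell": {"CD3": "+", "CD4": "+", "CD8": "-"},
    "CD8-positive T cell": {"CD3": "+", "CD4": "-", "CD8": "+"},
    "B cell": {"CD3": "-", "CD19": "+"},
}


def parse_phenotype(phenotype):
    """Split a string such as 'CD3+CD4+CD8-' into {'CD3': '+', 'CD4': '+', 'CD8': '-'}.
    Marker names are assumed to be alphanumeric (no hyphens)."""
    return dict(re.findall(r"([A-Za-z0-9]+)([+-])", phenotype))


def label_population(phenotype, definitions=TOY_DEFINITIONS):
    """Return every label whose required markers all match, most specific first."""
    observed = parse_phenotype(phenotype)
    matches = [
        name for name, required in definitions.items()
        if all(observed.get(marker) == state for marker, state in required.items())
    ]
    return sorted(matches, key=lambda name: len(definitions[name]), reverse=True)
```

Calling `label_population("CD3+CD4+CD8-")` returns the matching labels ordered from most to least specific, which mirrors the idea of mapping an immunophenotype onto standardized cell type names.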


Subject(s)
Algorithms , Cell Physiological Phenomena , Flow Cytometry/methods , Gene Ontology , Immunophenotyping/methods , Software , Humans , Leukocyte Common Antigens/analysis , Receptors, CCR7/analysis
5.
Bioinformatics ; 30(9): 1329-30, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24407226

ABSTRACT

We present a significantly improved version of the flowType and RchyOptimyx BioConductor-based pipeline that is both 14 times faster and can accommodate multiple levels of biomarker expression for up to 96 markers. With these improvements, the pipeline is positioned to be an integral part of data analysis for high-throughput experiments on high-dimensional single-cell assay platforms, including flow cytometry, mass cytometry and single-cell RT-qPCR.


Subject(s)
Flow Cytometry/methods , Antigens, CD/analysis , Biomarkers/analysis , Software
6.
Bioinformatics ; 28(7): 1009-16, 2012 Apr 01.
Article in English | MEDLINE | ID: mdl-22383736

ABSTRACT

MOTIVATION: Polychromatic flow cytometry (PFC) has enormous power as a tool to dissect complex immune responses (such as those observed in HIV disease) at a single-cell level. However, analysis tools are severely lacking. Although high-throughput systems allow rapid data collection from large cohorts, manual data analysis can take months. Moreover, identification of cell populations can be subjective and analysts rarely examine the entirety of the multidimensional dataset (focusing instead on a limited number of subsets, the biology of which has usually already been well-described). Thus, the value of PFC as a discovery tool is largely wasted. RESULTS: To address this problem, we developed a computational approach that automatically reveals all possible cell subsets. From tens of thousands of subsets, those that correlate strongly with clinical outcome are selected and grouped. Within each group, markers that have minimal relevance to the biological outcome are removed, thereby distilling the complex dataset into the simplest, most clinically relevant subsets. This allows complex information from PFC studies to be translated into clinical or resource-poor settings, where multiparametric analysis is less feasible. We demonstrate the utility of this approach in a large (n=466), retrospective, 14-parameter PFC study of early HIV infection, where we identify three T-cell subsets that strongly predict progression to AIDS (only one of which was identified by an initial manual analysis). AVAILABILITY: The 'flowType: Phenotyping Multivariate PFC Assays' package is available through Bioconductor. Additional documentation and examples are available at: www.terryfoxlab.ca/flowsite/flowType/ SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: rbrinkman@bccrc.ca.
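The phrase "all possible cell subsets" can be made concrete with a short enumeration. The sketch below is not the flowType API. It assumes each marker is either positive, negative, or ignored, so k markers give 3^k phenotypes, and it assumes one fixed positivity threshold per marker. The later step of correlating each subset with clinical outcome is left out.

```python
from itertools import product


def enumerate_phenotypes(markers):
    """Yield every phenotype as a dict: marker -> '+', '-', or None (marker ignored)."""
    for states in product(("+", "-", None), repeat=len(markers)):
        yield dict(zip(markers, states))


def phenotype_name(phenotype):
    """Readable name such as 'CD4+CD8-'; the all-ignored phenotype is 'all cells'."""
    name = "".join(f"{m}{s}" for m, s in phenotype.items() if s is not None)
    return name or "all cells"


def count_cells(cells, thresholds, phenotype):
    """Count the cells whose marker intensities match the phenotype."""
    def matches(cell):
        for marker, state in phenotype.items():
            if state is None:
                continue
            is_positive = cell[marker] > thresholds[marker]
            if is_positive != (state == "+"):
                return False
        return True
    return sum(1 for cell in cells if matches(cell))


def subset_proportions(cells, thresholds):
    """Proportion of cells in each of the 3**k phenotypes for k markers."""
    total = len(cells)
    return {
        phenotype_name(ph): count_cells(cells, thresholds, ph) / total
        for ph in enumerate_phenotypes(list(thresholds))
    }
```

With 14 markers this enumeration already yields 3^14 (about 4.8 million) candidate phenotypes, which is why automated selection of the outcome-associated subsets matters.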


Subject(s)
Computational Biology/methods , Flow Cytometry , HIV Infections/immunology , T-Lymphocyte Subsets/immunology , Biomarkers/analysis , Humans , Immunophenotyping/methods , Predictive Value of Tests , Proportional Hazards Models , Retrospective Studies , T-Lymphocyte Subsets/cytology
7.
Cytometry A ; 81(12): 1022-30, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23044634

ABSTRACT

Analysis of high-dimensional flow cytometry datasets can reveal novel cell populations with poorly understood biology. Following discovery, characterization of these populations in terms of the critical markers involved is an important step, as this can help to both better understand the biology of these populations and aid in designing simpler marker panels to identify them on simpler instruments and with fewer reagents (i.e., in resource-poor or highly regulated clinical settings). However, current tools to design panels based on the biological characteristics of the target cell populations work exclusively based on technical parameters (e.g., instrument configurations, spectral overlap, and reagent availability). To address this shortcoming, we developed RchyOptimyx (cellular hieraRCHY OPTIMization), a computational tool that constructs cellular hierarchies by combining automated gating with dynamic programming and graph theory to provide the best gating strategies to identify a target population to a desired level of purity or correlation with a clinical outcome, using the simplest possible marker panels. RchyOptimyx can assess and graphically present the trade-offs between marker choice and population specificity in high-dimensional flow or mass cytometry datasets. We present three proof-of-concept use cases for RchyOptimyx that involve 1) designing a panel of surface markers for identification of rare populations that are primarily characterized using their intracellular signature; 2) simplifying the gating strategy for identification of a target cell population; and 3) identification of a non-redundant marker set to identify a target cell population.
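The combination of a marker hierarchy with dynamic programming can be sketched as a search over marker subsets. This is an illustrative toy, not the RchyOptimyx algorithm: it assumes a user-supplied `score` function (for example, a hypothetical association of each intermediate population with outcome) and defines the best gating order as the one that maximizes the summed score of the populations visited along the way.

```python
from functools import lru_cache


def best_gating_order(markers, score):
    """Return (total_score, order) for the best way to add markers one at a time.

    `score` maps a frozenset of markers to a number. The path criterion
    (sum of scores of all intermediate populations) is an assumption.
    """
    @lru_cache(maxsize=None)
    def best(subset):
        if not subset:
            return 0.0, ()
        candidates = []
        for last in subset:
            # Best path to the parent population, then gate on `last`.
            parent_total, parent_order = best(subset - {last})
            candidates.append((parent_total + score(subset), parent_order + (last,)))
        return max(candidates)

    return best(frozenset(markers))
```

Memoizing on the subset means each of the 2^k marker subsets is solved once, instead of scoring all k! gating orders separately. Truncating the returned order after a few markers gives a simpler panel, at the cost of whatever score the remaining markers would have added.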


Subject(s)
Bone Marrow Cells/cytology , Flow Cytometry/methods , Software , Algorithms , Antigens, CD/analysis , Antigens, CD/immunology , Biomarkers/analysis , Bone Marrow Cells/immunology , Computational Biology/methods , HIV Infections/immunology , Humans , Immunophenotyping/methods , Interleukin-7/immunology , Lipopolysaccharides/immunology , Phenotype , Staining and Labeling , T-Lymphocytes/cytology , T-Lymphocytes/immunology