Results 1 - 14 of 14
1.
Nucleic Acids Res ; 51(7): e38, 2023 04 24.
Article in English | MEDLINE | ID: mdl-36762475

ABSTRACT

Inference of global gene regulatory networks from omics data is a long-term goal of systems biology. Most methods developed for inferring transcription factor (TF)-gene interactions either relied on a small dataset or used snapshot data, which are not suitable for inferring a process that is inherently temporal. Here, we developed a new computational method that combines neural networks and multi-task learning to predict RNA velocity rather than gene expression values. This allows our method to overcome many of the problems faced by prior methods, leading to a more accurate and more comprehensive set of identified regulatory interactions. Application of our method to atlas-scale single-cell data from 6 HuBMAP tissues led to several validated and novel predictions and greatly improved on prior methods proposed for this task.
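For readers unfamiliar with the prediction target named above, RNA velocity is commonly estimated from unspliced and spliced transcript counts. The following is a minimal sketch of the widely used steady-state formulation, not the method of this paper. It assumes the splicing rate is normalized to 1 and, for simplicity, fits the degradation ratio gamma on all cells; the function names and toy values are hypothetical.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin for u ~ gamma * s (one gene, many cells)."""
    numerator = sum(u * s for u, s in zip(unspliced, spliced))
    denominator = sum(s * s for s in spliced)
    if denominator == 0:
        raise ValueError("spliced counts are all zero; gamma is undefined")
    return numerator / denominator


def rna_velocity(unspliced, spliced, gamma):
    """Per-cell velocity ds/dt = u - gamma * s (splicing rate assumed to be 1)."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Toy data for one gene (hypothetical values): three cells at steady state.
gamma = fit_gamma([0.5, 1.0, 1.5], [1.0, 2.0, 3.0])
# A cell with excess unspliced RNA gets a positive velocity (expression increasing).
velocities = rna_velocity([2.0], [2.0], gamma)
```

A positive velocity indicates that a gene is being up-regulated in that cell, which is what makes velocity a temporal signal rather than a snapshot.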


Subject(s)
Computational Biology , Algorithms , Gene Regulatory Networks , Systems Biology , Single-Cell Analysis , Atlases as Topic
2.
Bioinformatics ; 39(39 Suppl 1): i140-i148, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387167

ABSTRACT

MOTIVATION: Spatial proteomics data have been used to map cell states and improve our understanding of tissue organization. More recently, these methods have been extended to study the impact of such organization on disease progression and patient survival. However, to date, the majority of supervised learning methods utilizing these data types did not take full advantage of the spatial information, impacting their performance and utilization. RESULTS: Taking inspiration from ecology and epidemiology, we developed novel spatial feature extraction methods for use with spatial proteomics data. We used these features to learn prediction models for cancer patient survival. As we show, using the spatial features led to consistent improvement over prior methods that used the spatial proteomics data for the same task. In addition, feature importance analysis revealed new insights about the cell interactions that contribute to patient survival. AVAILABILITY AND IMPLEMENTATION: The code for this work can be found at gitlab.com/enable-medicine-public/spatsurv.


Subject(s)
Neoplasms , Proteomics , Humans , Neoplasms/diagnostic imaging , Cell Communication , Disease Progression , Survival Analysis
3.
PLoS Comput Biol ; 18(9): e1010468, 2022 09.
Article in English | MEDLINE | ID: mdl-36095011

ABSTRACT

Studies comparing single cell RNA-Seq (scRNA-Seq) data between conditions mainly focus on differences in the proportion of cell types or on differentially expressed genes. In many cases these differences are driven by changes in cell interactions which are challenging to infer without spatial information. To determine cell-cell interactions that differ between conditions we developed the Cell Interaction Network Inference (CINS) pipeline. CINS combines Bayesian network analysis with regression-based modeling to identify differential cell type interactions and the proteins that underlie them. We tested CINS on a disease case control and on an aging mouse dataset. In both cases CINS correctly identifies cell type interactions and the ligands involved in these interactions improving on prior methods suggested for cell interaction predictions. We performed additional mouse aging scRNA-Seq experiments which further support the interactions identified by CINS.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Animals , Bayes Theorem , Cell Communication , Gene Expression Profiling/methods , Ligands , Mice , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
4.
PLoS Comput Biol ; 15(2): e1006730, 2019 02.
Article in English | MEDLINE | ID: mdl-30742607

ABSTRACT

Prediction of response to specific cancer treatments is complicated by significant heterogeneity between tumors in terms of mutational profiles, gene expression, and clinical measures. Here we focus on the response of Estrogen Receptor (ER)+ post-menopausal breast cancer tumors to aromatase inhibitors (AI). We use a network smoothing algorithm to learn novel features that integrate several types of high throughput data and new cell line experiments. These features greatly improve the ability to predict response to AI when compared to prior methods. For a subset of the patients, for which we obtained more detailed clinical information, we can further predict response to a specific AI drug.


Subject(s)
Computational Biology/methods , Genetic Testing/methods , Algorithms , Aromatase Inhibitors/pharmacology , Breast Neoplasms/genetics , Cell Line, Tumor , Drug Resistance, Neoplasm/drug effects , Female , Gene Expression Regulation, Neoplastic/drug effects , Humans , Neural Networks, Computer , Receptors, Estrogen/genetics
5.
Genes Chromosomes Cancer ; 58(1): 34-42, 2019 01.
Article in English | MEDLINE | ID: mdl-30285311

ABSTRACT

In the tumor microenvironment, immune cells have emerged as key regulators of cancer progression. While much work has focused on characterizing tumor-related immune cells through gene expression profiling, microRNAs (miRNAs) have also been reported to regulate immune cells in the tumor microenvironment. Using regression-based computational methods, we have constructed, for the first time, immune cell signatures based on miRNA expression from The Cancer Genome Atlas breast and ovarian cancer datasets. Combined with existing mRNA immune cell signatures, the integrated mRNA-miRNA leukocyte signatures are better able to delineate prognostic immune cell subsets within both cancers compared to the mRNA or miRNA signatures alone. Moreover, using the miRNA signatures, the anti-inflammatory M2 macrophages emerged as the most significantly prognostic cell type in the breast cancer data (HR [hazard ratio]: 12.9; CI [confidence interval]: 3.09-52.9; P = 4.22E-4), whereas the pro-inflammatory M1 macrophages emerged as the most prognostic immune cell type in the ovarian cancer data (HR: 0.2; CI: 0.04-0.56; P = 5.02E-3). These results suggest that our integrated miRNA and mRNA leukocyte signatures could be used to better delineate prognostic leukocyte subsets within cancers, whereas continued investigation may further support the regulatory relationships predicted between the miRNAs and immune cells found within our signature matrices.


Subject(s)
Breast Neoplasms/immunology , MicroRNAs/immunology , Ovarian Neoplasms/immunology , RNA, Messenger/immunology , Biomarkers, Tumor/immunology , Breast/immunology , Breast/pathology , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Computational Biology , Female , Gene Expression Regulation, Neoplastic/immunology , Humans , Killer Cells, Natural/immunology , MicroRNAs/genetics , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Prognosis , RNA, Messenger/genetics , T-Lymphocytes/immunology , Tumor Microenvironment/immunology
6.
BMC Cancer ; 19(1): 370, 2019 Apr 23.
Article in English | MEDLINE | ID: mdl-31014259

ABSTRACT

BACKGROUND: Most methods that integrate network and mutation data to study cancer focus on the effects of genes/proteins, quantifying the effect of mutations or differential expression of a gene and its neighbors, or identifying groups of genes that are significantly up- or down-regulated. However, several mutations are known to disrupt specific protein-protein interactions, and network dynamics are often ignored by such methods. Here we introduce a method that allows for predicting the disruption of specific interactions in cancer patients using somatic mutation data and protein interaction networks. METHODS: We extend standard network smoothing techniques to assign scores to the edges in a protein interaction network in addition to nodes. We use somatic mutations as input to our modified network smoothing method, producing scores that quantify the proximity of each edge to somatic mutations in individual samples. RESULTS: Using breast cancer mutation data, we show that predicted edges are significantly associated with patient survival and known ligand binding site mutations. In-silico analysis of protein binding further supports the ability of the method to infer novel disrupted interactions and provides a mechanistic explanation for the impact of mutations on key pathways. CONCLUSIONS: Our results show the utility of our method both in identifying disruptions of protein interactions from known ligand binding site mutations, and in selecting novel clinically significant interactions. Supporting website with software and data: https://www.cs.cmu.edu/~mruffalo/mut-edge-disrupt/.


Subject(s)
Algorithms , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Breast Neoplasms/pathology , Computational Biology/methods , Mutation , Protein Interaction Maps , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Gene Regulatory Networks , Humans , Prognosis , Signal Transduction , Software
7.
Bioinformatics ; 32(17): i746-i754, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587697

ABSTRACT

MOTIVATION: Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on trying to experimentally and computationally determine the set of transcription factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes including development and disease. The main challenge for predicting such interactions is the very small positive training set currently available. Another challenge is the fact that a large fraction of miRNAs are encoded within genes, making it hard to determine the specific way in which they are regulated. RESULTS: To enable genome-wide predictions of TF-miRNA interactions, we extended semi-supervised machine-learning approaches to integrate a large set of different types of data including sequence, expression, ChIP-seq and epigenetic data. As we show, the methods we develop achieve good performance both on a labeled test set and when analyzing general co-expression networks. We next analyze mRNA and miRNA cancer expression data, demonstrating the advantage of using the predicted set of interactions for identifying more coherent and relevant modules, genes, and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that combines miRNAs, genes, and TFs. AVAILABILITY AND IMPLEMENTATION: Code and full set of predictions are available from the supporting website: http://cs.cmu.edu/~mruffalo/tf-mirna/. CONTACT: zivbj@cs.cmu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Regulatory Networks , Machine Learning , MicroRNAs , Transcription Factors , Gene Expression Profiling , Humans , RNA, Messenger
8.
PLoS Comput Biol ; 11(12): e1004595, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26683094

ABSTRACT

Development of high-throughput monitoring technologies enables interrogation of cancer samples at various levels of cellular activity. Capitalizing on these developments, various public efforts such as The Cancer Genome Atlas (TCGA) generate disparate omic data for large patient cohorts. As demonstrated by recent studies, these heterogeneous data sources provide the opportunity to gain insights into the molecular changes that drive cancer pathogenesis and progression. However, these insights are limited by the vast search space and, as a result, low statistical power to make new discoveries. In this paper, we propose methods for integrating disparate omic data using molecular interaction networks, with a view to gaining mechanistic insights into the relationship between molecular changes at different levels of cellular activity. Namely, we hypothesize that genes that play a role in cancer development and progression may be implicated by neither frequent mutation nor differential expression, and that network-based integration of mutation and differential expression data can reveal these "silent players". For this purpose, we utilize network-propagation algorithms to simulate the information flow in the cell at a sample-specific resolution. We then use the propagated mutation and expression signals to identify genes that are not necessarily mutated or differentially expressed but have an essential role in tumor development and patient outcome. We test the proposed method on breast cancer and glioblastoma multiforme data obtained from TCGA. Our results show that the proposed method can identify important proteins that are not readily revealed by molecular data, providing insights beyond what can be gleaned by analyzing different types of molecular data in isolation.
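The abstract above does not state which network-propagation algorithm is used. As an illustration only, the sketch below implements random walk with restart, one common form of network propagation, on an undirected interaction graph. The function name, the restart weighting, and the three-protein toy network are all hypothetical and are not taken from the paper.

```python
def propagate(adjacency, seed_scores, alpha=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart: F <- alpha * W * F + (1 - alpha) * F0.

    adjacency: dict mapping each node to a list of its neighbors (undirected,
    so every edge must appear in both directions).
    seed_scores: dict of initial per-sample signal, e.g. 1.0 for a mutated gene.
    """
    nodes = list(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    initial = {n: seed_scores.get(n, 0.0) for n in nodes}
    scores = dict(initial)
    for _ in range(max_iter):
        updated = {}
        for n in nodes:
            # Each neighbor m splits its current score evenly among its own neighbors.
            inflow = sum(scores[m] / degree[m] for m in adjacency[n])
            updated[n] = alpha * inflow + (1 - alpha) * initial[n]
        delta = max(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if delta < tol:
            break
    return scores


# Toy path network A - B - C where only A carries a mutation in this sample.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
smoothed = propagate(network, {"A": 1.0})
```

In this toy case the unmutated protein B ends up with a nonzero score because it sits next to the mutated protein A, which is the kind of "silent player" signal the abstract describes.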


Subject(s)
Gene Expression Profiling/methods , Genes, Neoplasm/genetics , Genomics/methods , Neoplasm Proteins/genetics , Neoplasms/genetics , Silent Mutation/genetics , Algorithms , Chromosome Mapping/methods , Data Mining/methods , Databases, Genetic , Genetic Association Studies/methods , Genetic Markers/genetics , Humans , Signal Transduction/genetics
9.
J Biomed Inform ; 58: 104-113, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26453823

ABSTRACT

PURPOSE: To date, the standard nosology and prognostic schemes for myeloid neoplasms have been based on morphologic and cytogenetic criteria. We sought to test the hypothesis that a comprehensive, unbiased analysis of somatic mutations may allow for an improved classification of these diseases to predict outcome (overall survival). EXPERIMENTAL DESIGN: We performed whole-exome sequencing (WES) of 274 myeloid neoplasms, including myelodysplastic syndrome (MDS, N=75), myelodysplastic/myeloproliferative neoplasia (MDS/MPN, N=33), and acute myeloid leukemia (AML, N=22), augmenting the resulting mutational data with public WES results from AML (N=144). We fit random survival forests (RSFs) to the patient survival and clinical/cytogenetic data, with and without gene mutation information, to build prognostic classifiers. A targeted sequencing assay was used to sequence predictor genes in an independent cohort of 507 patients, whose accompanying data were used to evaluate performance of the risk classifiers. RESULTS: We show that gene mutations modify the impact of standard clinical variables on patient outcome, and therefore their incorporation hones the accuracy of prediction. The mutation-based classification scheme robustly predicted patient outcome in the validation set (log rank P=6.77 × 10^-21; poor prognosis vs. good prognosis categories HR 10.4, 95% CI 3.21-33.6). The RSF-based approach also compares favorably with recently published efforts to incorporate mutational information for MDS prognosis. CONCLUSION: The results presented here support the inclusion of mutational information in prognostic classification of myeloid malignancies. Our classification scheme is implemented in a publicly available web-based tool (http://myeloid-risk.case.edu/).


Subject(s)
Bone Marrow Neoplasms/genetics , Exome , Bone Marrow Neoplasms/classification , Bone Marrow Neoplasms/physiopathology , Cohort Studies , Prognosis
10.
bioRxiv ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38826261

ABSTRACT

The Human BioMolecular Atlas Program (HuBMAP) aims to construct a reference 3D structural, cellular, and molecular atlas of the healthy adult human body. The HuBMAP Data Portal (https://portal.hubmapconsortium.org) serves experimental datasets and supports data processing, search, filtering, and visualization. The Human Reference Atlas (HRA) Portal (https://humanatlas.io) provides open access to atlas data, code, procedures, and instructional materials. Experts from more than 20 consortia are collaborating to construct the HRA's Common Coordinate Framework (CCF), knowledge graphs, and tools that describe the multiscale structure of the human body (from organs and tissues down to cells, genes, and biomarkers) and to use the HRA to understand changes that occur at each of these levels with aging, disease, and other perturbations. The 6th release of the HRA v2.0 covers 36 organs with 4,499 unique anatomical structures, 1,195 cell types, and 2,089 biomarkers (e.g., genes, proteins, lipids) linked to ontologies. In addition, three workflows were developed to map new experimental data into the HRA's CCF. This paper describes the HRA user stories, terminology, data formats, ontology validation, unified analysis workflows, user interfaces, instructional materials, application programming interfaces (APIs), and flexible hybrid cloud infrastructure, and demonstrates first atlas usage applications and previews.

11.
Bioinformatics ; 28(18): i349-i355, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22962451

ABSTRACT

MOTIVATION: Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment; in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy and the qualities of many mappings are underestimated, encouraging researchers to discard correct mappings. Further, these low-quality mappings tend to correlate with variations in the genome (both single nucleotide and structural), and such mappings are important in accurately identifying genomic variants. APPROACH: We develop a machine learning tool, LoQuM (LOgistic regression tool for calibrating the Quality of short read mappings), to assign reliable mapping quality scores to mappings of Illumina reads returned by any alignment tool. LoQuM uses statistics on the read (base quality scores reported by the sequencer) and the alignment (number of matches, mismatches and deletions, mapping quality score returned by the alignment tool, if available, and number of mappings) as features for classification and uses simulated reads to learn a logistic regression model that relates these features to actual mapping quality. RESULTS: We test the predictions of LoQuM on an independent dataset generated by the ART short read simulation software and observe that LoQuM can 'resurrect' many mappings that are assigned zero quality scores by the alignment tools and are therefore likely to be discarded by researchers. We also observe that the recalibration of mapping quality scores greatly enhances the precision of called single nucleotide polymorphisms. AVAILABILITY: LoQuM is available as open source at http://compbio.case.edu/loqum/. CONTACT: matthew.ruffalo@case.edu.
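The recalibration idea described above can be sketched as a logistic regression that maps alignment features to the probability that a mapping is correct, followed by conversion to a Phred-scaled mapping quality. This is a minimal illustration, not LoQuM's implementation: the two toy features (mismatch count and number of candidate mappings), the training values, the learning-rate settings, and the function names are all hypothetical, and the cap of 60 is an arbitrary choice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def probability_correct(weights, bias, x):
    """Predicted probability that a mapping with features x is correct."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def phred_mapq(p_correct, cap=60):
    """Phred-scaled quality: -10 * log10(P(mapping is wrong)), capped."""
    p_wrong = max(1.0 - p_correct, 10 ** (-cap / 10))
    return min(cap, round(-10 * math.log10(p_wrong)))


# Simulated reads (hypothetical): [mismatches, candidate mappings] -> correct (1) or not (0).
toy_features = [[0, 1], [1, 1], [0, 2], [4, 5], [3, 4], [5, 3]]
toy_labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(toy_features, toy_labels)
recalibrated = phred_mapq(probability_correct(weights, bias, [0, 1]))
```

Because the labels come from simulated reads whose true origin is known, the fitted probabilities can be compared directly against observed accuracy, which is what makes the recalibrated score more reliable than an aligner's self-reported value.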


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Software , Artificial Intelligence , Chromosome Mapping , Genome, Human , Humans , Logistic Models , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
12.
Bioinformatics ; 27(20): 2790-6, 2011 Oct 15.
Article in English | MEDLINE | ID: mdl-21856737

ABSTRACT

MOTIVATION: The advent of next-generation sequencing (NGS) techniques presents many novel opportunities for many applications in life sciences. The vast number of short reads produced by these techniques, however, poses significant computational challenges. The first step in many types of genomic analysis is the mapping of short reads to a reference genome, and several groups have developed dedicated algorithms and software packages to perform this function. As the developers of these packages optimize their algorithms with respect to various considerations, the relative merits of different software packages remain unclear. However, for scientists who generate and use NGS data for their specific research projects, an important consideration is choosing the software that is most suitable for their application. RESULTS: With a view to comparing existing short read alignment software, we develop a simulation and evaluation suite, Seal, which simulates NGS runs for different configurations of various factors, including sequencing error, indels and coverage. We also develop criteria to compare the performances of software with disparate output structure (e.g. some packages return a single alignment while some return multiple possible alignments). Using these criteria, we comprehensively evaluate the performances of Bowtie, BWA, mr- and mrsFAST, Novoalign, SHRiMP and SOAPv2, with regard to accuracy and runtime. CONCLUSION: We expect that the results presented here will be useful to investigators in choosing the alignment software that is most suitable for their specific research aims. Our results also provide insights into the factors that should be considered to use alignment results effectively. Seal can also be used to evaluate the performance of algorithms that use deep sequencing data for various purposes (e.g. identification of genomic variants). AVAILABILITY: Seal is available as open source at http://compbio.case.edu/seal/. CONTACT: matthew.ruffalo@case.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Sequence Alignment/methods , Genome, Human , Genomics/methods , Humans , INDEL Mutation , Software
13.
Nat Commun ; 9(1): 4768, 2018 11 13.
Article in English | MEDLINE | ID: mdl-30425249

ABSTRACT

Single cell RNA-Seq (scRNA-seq) studies profile thousands of cells in heterogeneous environments. Current methods for characterizing cells perform unsupervised analysis followed by assignment using a small set of known marker genes. Such approaches are limited to a few, well characterized cell types. We developed an automated pipeline to download, process, and annotate publicly available scRNA-seq datasets to enable large scale supervised characterization. We extend supervised neural networks to obtain efficient and accurate representations for scRNA-seq data. We apply our pipeline to analyze data from over 500 different studies with over 300 unique cell types and show that supervised methods outperform unsupervised methods for cell type identification. A case study highlights the usefulness of these methods for comparing cell type distributions in healthy and diseased mice. Finally, we present scQuery, a web server which uses our neural networks and fast matching methods to determine cell types, key genes, and more.


Subject(s)
RNA, Small Cytoplasmic/analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Animals , Brain , Computational Biology/methods , Databases, Genetic , Gene Expression Regulation , Genetic Markers , Internet , Macrophages , Mice , Neural Networks, Computer , Protein Interaction Mapping
14.
BMC Syst Biol ; 11(1): 96, 2017 Oct 10.
Article in English | MEDLINE | ID: mdl-29017547

ABSTRACT

BACKGROUND: Translating in vitro results to clinical tests is a major challenge in systems biology. Here we present a new Multi-Task learning framework which integrates thousands of cell line expression experiments to reconstruct drug specific response networks in cancer. RESULTS: The reconstructed networks correctly identify several shared key proteins and pathways while simultaneously highlighting many cell type specific proteins. We used top proteins from each drug network to predict survival for patients prescribed the drug. CONCLUSIONS: Predictions based on proteins from the in-vitro derived networks significantly outperformed predictions based on known cancer genes indicating that Multi-Task learning can indeed identify accurate drug response networks.


Subject(s)
Antineoplastic Agents/pharmacology , Computational Biology/methods , Machine Learning , Neoplasms/drug therapy , Antineoplastic Agents/therapeutic use , Neoplasms/genetics , Survival Analysis , Treatment Outcome