Results 1 - 6 of 6
1.
bioRxiv ; 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37131792

ABSTRACT

Gene regulatory networks play a critical role in understanding cell states, gene expression, and biological processes. Here, we investigated the utility of transcription factors (TFs) and microRNAs (miRNAs) in creating a low-dimensional representation of cell states and predicting gene expression across 31 cancer types. We identified 28 clusters of miRNAs and 28 clusters of TFs, demonstrating that they can differentiate tissue of origin. Using a simple SVM classifier, we achieved an average accuracy of 92.8% in tissue classification. We also predicted the entire transcriptome using Tissue-Agnostic and Tissue-Aware models, with average R² values of 0.45 and 0.70, respectively. Our Tissue-Aware model, using 56 selected features, showed comparable predictive power to the widely used L1000 genes. However, the model's transportability was impacted by covariate shift, particularly inconsistent microRNA expression across datasets.

2.
Comput Syst Oncol ; 2(2)2022 Jun.
Article in English | MEDLINE | ID: mdl-35966389

ABSTRACT

Cancer progression, including the development of intratumor heterogeneity, is inherently a spatial process. Mathematical models of tumor evolution may be a useful starting point for understanding the patterns of heterogeneity that can emerge in the presence of spatial growth. A commonly studied spatial growth model assumes that tumor cells occupy sites on a lattice and replicate into neighboring sites. Our R package SITH provides a convenient interface for exploring this model. Our efficient simulation algorithm allows users to generate 3D tumors with millions of cells in under a minute. For visualizing the distribution of mutations throughout the tumor, SITH provides interactive graphics and summary plots. Additionally, SITH can produce synthetic bulk and single-cell DNA-seq datasets by sampling from the simulated tumor. A streamlined API makes SITH a useful tool for investigating the relationship between spatial growth and intratumor heterogeneity. SITH is available on CRAN and can be installed by running install.packages("SITH") from the R console. See https://CRAN.R-project.org/package=SITH for the user manual and package vignette.
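
The lattice growth model described in the abstract above can be illustrated with a minimal Python sketch. This is not SITH's algorithm or API: the function names, the uniform choice of parent cell, and the `mutation_prob` parameter are all assumptions made for illustration. Each cell sits on a site of a 3D cubic lattice, replicates into an empty neighboring site, and the daughter cell occasionally gains a new mutation.

```python
import random

# The six axis-aligned neighbors of a site on a 3D cubic lattice.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def grow_tumor(n_cells, mutation_prob=0.1, seed=0):
    """Grow a toy tumor on a 3D lattice until it holds n_cells cells.

    Returns a dict mapping each occupied site (x, y, z) to the frozenset of
    mutation IDs carried by the cell at that site. Hypothetical helper, not
    part of SITH.
    """
    rng = random.Random(seed)
    origin = (0, 0, 0)
    tumor = {origin: frozenset()}   # the founding cell carries no mutations
    sites = [origin]                # list form allows uniform random choice
    next_mutation_id = 0

    while len(tumor) < n_cells:
        parent = rng.choice(sites)
        dx, dy, dz = rng.choice(NEIGHBORS)
        child = (parent[0] + dx, parent[1] + dy, parent[2] + dz)
        if child in tumor:
            continue                # neighboring site is occupied; try again
        genotype = tumor[parent]    # daughter inherits the parent's mutations
        if rng.random() < mutation_prob:
            genotype = genotype | {next_mutation_id}
            next_mutation_id += 1
        tumor[child] = genotype
        sites.append(child)
    return tumor


def mutation_frequencies(tumor):
    """Fraction of cells carrying each mutation (a crude bulk-style summary)."""
    counts = {}
    for genotype in tumor.values():
        for mutation in genotype:
            counts[mutation] = counts.get(mutation, 0) + 1
    return {m: c / len(tumor) for m, c in counts.items()}
```

Calling `grow_tumor(500, seed=1)` and passing the result to `mutation_frequencies` gives a rough picture of how early mutations spread to large fractions of the tumor while late ones stay confined to small regions. The rejection loop is simple but slows down as the interior fills up; a package that simulates millions of cells presumably uses a more efficient scheme.
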

3.
J Comput Biol ; 27(7): 1157-1170, 2020 07.
Article in English | MEDLINE | ID: mdl-31794247

ABSTRACT

The transcriptome of a tumor contains detailed information about the disease. Although advances in sequencing technologies have generated larger data sets, there are still many questions about exactly how the transcriptome is regulated. One class of regulatory elements consists of microRNAs (or miRs), many of which are known to be associated with cancer. To better understand the relationships between miRs and cancers, we analyzed ∼9000 samples from 32 cancer types studied in The Cancer Genome Atlas. Our feature reduction algorithm found evidence for 21 biologically interpretable clusters of miRs, many of which were statistically associated with a specific type of cancer. Moreover, the clusters contain sufficient information to distinguish between most types of cancer. We then used linear models to measure, genome-wide, how much variation in gene expression could be explained by the 21 average expression values ("scores") of the clusters. Based on the ∼20,000 per-gene R² values, we found that (1) mean differences between tissues of origin explain about 36% of variation; (2) the 21 miR cluster scores explain about 30% of the variation; and (3) combining tissue type with the miR scores explained about 56% of the total genome-wide variation in gene expression. Our analysis of poorly explained genes shows that they are enriched for olfactory receptor processes, sensory perception, and nervous system processing, which are necessary to receive and interpret signals from outside the organism. Therefore, it is reasonable for those genes to be always active and not get downregulated by miRs. In contrast, highly explained genes are characterized by genes translating to proteins necessary for transport, plasma membrane, or metabolic processes that are heavily regulated processes inside the cell. Other genetic regulatory elements such as transcription factors and methylation might help explain some of the remaining variation in gene expression.


Subjects
Gene Expression Regulation, Neoplastic , MicroRNAs/genetics , Neoplasms/genetics , Female , Humans , Machine Learning , Multigene Family
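
To make the per-gene R² in record 3 concrete, here is a small Python sketch of the simplest of the three models in that abstract: predicting each sample's expression by the mean of its tissue of origin. The function names and the six-sample data set are hypothetical and are not taken from the paper; the sketch only shows how "fraction of variation explained" is computed for one gene.

```python
from collections import defaultdict


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total.

    Assumes the observed values are not all identical (SS_total > 0).
    """
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


def tissue_mean_r_squared(expression, tissues):
    """R^2 of a model that predicts each sample by its tissue's mean expression."""
    groups = defaultdict(list)
    for value, tissue in zip(expression, tissues):
        groups[tissue].append(value)
    means = {t: sum(v) / len(v) for t, v in groups.items()}
    predicted = [means[t] for t in tissues]
    return r_squared(expression, predicted)


# Hypothetical expression values for one gene across six samples.
expression = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
tissues = ["lung", "lung", "lung", "liver", "liver", "liver"]
print(tissue_mean_r_squared(expression, tissues))
```

Repeating this calculation for every gene and averaging the results would give a genome-wide summary of the kind the abstract reports. The models that include the 21 miR cluster scores would replace the tissue means with fitted values from a multiple linear regression, with `r_squared` unchanged.
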
4.
Bull Math Biol ; 81(7): 2052-2073, 2019 07.
Article in English | MEDLINE | ID: mdl-31069599

ABSTRACT

We describe a recent framework for statistical shape analysis of curves and show its applicability to various biological datasets. The presented methods are based on a functional representation of shape called the square-root velocity function and a closely related elastic metric. The main benefit of this approach is its invariance to reparameterization (in addition to the standard shape-preserving transformations of translation, rotation and scale), and ability to compute optimal registrations (point correspondences) across objects. Building upon the defined distance between shapes, we additionally describe tools for computing sample statistics including the mean and covariance. Based on the covariance structure, one can also explore variability in shape samples via principal component analysis. Finally, the estimated mean and covariance can be used to define Wrapped Gaussian models on the shape space, which are easy to sample from. We present multiple case studies on various biological datasets including (1) leaf outlines, (2) internal carotid arteries, (3) Diffusion Tensor Magnetic Resonance Imaging fiber tracts, (4) Glioblastoma Multiforme tumors, and (5) vertebrae in mice. We additionally provide a MATLAB package that can be used to produce the results given in this manuscript.


Subjects
Models, Biological , Models, Statistical , Animals , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/pathology , Carotid Artery, Internal/anatomy & histology , Carotid Artery, Internal/diagnostic imaging , Computer Simulation , Databases, Factual/statistics & numerical data , Diffusion Tensor Imaging/statistics & numerical data , Elasticity , Glioblastoma/diagnostic imaging , Glioblastoma/pathology , Humans , Image Interpretation, Computer-Assisted , Mathematical Concepts , Mice , Models, Anatomic , Normal Distribution , Pattern Recognition, Automated , Plant Leaves/anatomy & histology , Principal Component Analysis , Software , Spine/anatomy & histology
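
The square-root velocity function named in record 4 is commonly defined, for a curve f, as q(t) = f'(t) / sqrt(|f'(t)|), with q(t) = 0 where f'(t) = 0. The Python sketch below discretizes that definition with forward differences. It is an illustration under those assumptions, not the authors' MATLAB package; the function names are hypothetical.

```python
import math


def srvf(points, dt):
    """Discrete square-root velocity function of a sampled curve.

    points: list of coordinate tuples sampled at uniform spacing dt.
    Returns one q-vector per segment, using forward differences for velocity.
    """
    q = []
    for a, b in zip(points, points[1:]):
        velocity = [(bi - ai) / dt for ai, bi in zip(a, b)]
        speed = math.sqrt(sum(v * v for v in velocity))
        if speed == 0.0:
            q.append([0.0] * len(velocity))   # q is defined as 0 where f' = 0
        else:
            q.append([v / math.sqrt(speed) for v in velocity])
    return q


def squared_l2_norm(q, dt):
    """Approximate the integral of |q(t)|^2 dt, which equals the curve length."""
    return sum(sum(c * c for c in vec) for vec in q) * dt


# Straight segment from (0, 0) to (3, 4), sampled at 11 points on [0, 1].
n = 10
points = [(3.0 * i / n, 4.0 * i / n) for i in range(n + 1)]
q = srvf(points, dt=1.0 / n)
length = squared_l2_norm(q, dt=1.0 / n)
```

Because q depends only on the derivative, translating the curve leaves q unchanged, and the squared L² norm of q recovers the curve's length. Dividing q by the square root of that length is one way to remove scale. Rotation and reparameterization are handled by optimization over those groups, which this sketch does not attempt.
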
5.
J Biomed Semantics ; 6: 31, 2015.
Article in English | MEDLINE | ID: mdl-26185615

ABSTRACT

BACKGROUND: Large quantities of biomedical data are being produced at a rapid pace for a variety of organisms. With ontologies proliferating, data is increasingly being stored using the RDF data model and queried using RDF based querying languages. While existing systems facilitate the querying in various ways, the scientist must map the question in his or her mind to the interface used by the systems. The field of natural language processing has long investigated the challenges of designing natural language based retrieval systems. Recent efforts seek to bring the ability to pose natural language questions to RDF data querying systems while leveraging the associated ontologies. These analyze the input question and extract triples (subject, relationship, object), if possible, mapping them to RDF triples in the data. However, in the biomedical context, relationships between entities are not always explicit in the question and these are often complex involving many intermediate concepts. RESULTS: We present a new framework, OntoNLQA, for querying RDF data annotated using ontologies which allows posing questions in natural language. OntoNLQA offers five steps in order to answer natural language questions. In comparison to previous systems, OntoNLQA differs in how some of the methods are realized. In particular, it introduces a novel approach for discovering the sophisticated semantic associations that may exist between the key terms of a natural language question, in order to build an intuitive query and retrieve precise answers. We apply this framework to the context of parasite immunology data, leading to a system called AskCuebee that allows parasitologists to pose genomic, proteomic and pathway questions in natural language related to the parasite, Trypanosoma cruzi. We separately evaluate the accuracy of each component of OntoNLQA as implemented in AskCuebee and the accuracy of the whole system. 
AskCuebee answers 68% of the questions in a corpus of 125 questions, and 60% of the questions in a new previously unseen corpus. If we allow simple corrections by the scientists, this proportion increases to 92%. CONCLUSIONS: We introduce a novel framework for question answering and apply it to parasite immunology data. Evaluations of translating the questions to RDF triple queries by combining machine learning, lexical similarity matching with ontology classes, properties and instances for specificity, and discovering associations between them demonstrate that the approach performs well and improves on previous systems. Subsequently, OntoNLQA offers a viable framework for building question answering systems in other biomedical domains.

6.
PLoS Negl Trop Dis ; 6(1): e1458, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22272365

ABSTRACT

BACKGROUND: Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that may provide biologists with the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge. METHODOLOGY/PRINCIPAL FINDINGS: We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which has the ability to query across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answers to these queries involve looking up multiple sources of data, linking them together and presenting the results. 
CONCLUSION/SIGNIFICANCE: The SPSE facilitates parasitologists in leveraging the growing, but disparate, parasite data resources by offering an integrative platform that utilizes Semantic Web techniques, while keeping their workload increase minimal.


Subjects
Classification/methods , Databases, Protein , Protozoan Proteins/metabolism , Semantics , Terminology as Topic , Trypanosoma cruzi/metabolism , Database Management Systems , Gene Expression Regulation , Parasitology , Protozoan Proteins/genetics , Systems Integration