Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 5 de 5
Filtrar
Mais filtros








Base de dados
Intervalo de ano de publicação
1.
Nat Cancer ; 5(2): 299-314, 2024 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-38253803

RESUMO

Contemporary analyses focused on a limited number of clinical and molecular biomarkers have been unable to accurately predict clinical outcomes in pancreatic ductal adenocarcinoma. Here we describe a precision medicine platform known as the Molecular Twin consisting of advanced machine-learning models and use it to analyze a dataset of 6,363 clinical and multi-omic molecular features from patients with resected pancreatic ductal adenocarcinoma to accurately predict disease survival (DS). We show that a full multi-omic model predicts DS with the highest accuracy and that plasma protein is the top single-omic predictor of DS. A parsimonious model learning only 589 multi-omic features demonstrated similar predictive performance as the full multi-omic model. Our platform enables discovery of parsimonious biomarker panels and performance assessment of outcome prediction models learning from resource-intensive panels. This approach has considerable potential to impact clinical care and democratize precision cancer medicine worldwide.


Assuntos
Adenocarcinoma , Carcinoma Ductal Pancreático , Neoplasias Pancreáticas , Humanos , Adenocarcinoma/genética , Adenocarcinoma/cirurgia , Neoplasias Pancreáticas/genética , Neoplasias Pancreáticas/cirurgia , Multiômica , Inteligência Artificial , Carcinoma Ductal Pancreático/genética , Carcinoma Ductal Pancreático/cirurgia , Inteligência
2.
J Comput Biol ; 24(6): 547-557, 2017 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-27828710

RESUMO

Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR). We show that, because k-mers are derived from sequencing reads, the information about k-mer overlap in the original sequence can be used to reduce the FPR up to 30 × with little or no additional memory and with set containment queries that are only 1.3 - 1.6 times slower. Alternatively, we can leverage k-mer overlap information to store k-mer sets in about half the space while maintaining the original FPR. We consider several variants of such k-mer Bloom filters (kBFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.


Assuntos
Algoritmos , Biologia Computacional/métodos , Análise de Sequência de DNA/métodos , Simulação por Computador , Humanos , Probabilidade , Software
3.
Algorithms Mol Biol ; 9: 14, 2014.
Artigo em Inglês | MEDLINE | ID: mdl-24868242

RESUMO

Chromosome conformation capture experiments have led to the discovery of dense, contiguous, megabase-sized topological domains that are similar across cell types and conserved across species. These domains are strongly correlated with a number of chromatin markers and have since been included in a number of analyses. However, functionally-relevant domains may exist at multiple length scales. We introduce a new and efficient algorithm that is able to capture persistent domains across various resolutions by adjusting a single scale parameter. The ensemble of domains we identify allows us to quantify the degree to which the domain structure is hierarchical as opposed to overlapping, and our analysis reveals a pronounced hierarchical structure in which larger stable domains tend to completely contain smaller domains. The identified novel domains are substantially different from domains reported previously and are highly enriched for insulating factor CTCF binding and histone marks at the boundaries.

4.
Algorithms Mol Biol ; 8(1): 8, 2013 Mar 09.
Artigo em Inglês | MEDLINE | ID: mdl-23497444

RESUMO

BACKGROUND: Chromosome structure is closely related to its function and Chromosome Conformation Capture (3C) is a widely used technique for exploring spatial properties of chromosomes. 3C interaction frequencies are usually associated with spatial distances. However, the raw data from 3C experiments is an aggregation of interactions from many cells, and the spatial distances of any given interaction are uncertain. RESULTS: We introduce a new method for filtering 3C interactions that selects subsets of interactions that obey metric constraints of various strictness. We demonstrate that, although the problem is computationally hard, near-optimal results are often attainable in practice using well-designed heuristics and approximation algorithms. Further, we show that, compared with a standard technique, this metric filtering approach leads to (a) subgraphs with higher statistical significance, (b) lower embedding error, (c) lower sensitivity to initial conditions of the embedding algorithm, and (d) structures with better agreement with light microscopy measurements. Our filtering scheme is applicable for a strict frequency-to-distance mapping and a more relaxed mapping from frequency to a range of distances. CONCLUSIONS: Our filtering method for 3C data considers both metric consistency and statistical confidence simultaneously resulting in lower-error embeddings that are biologically more plausible.

5.
BMC Bioinformatics ; 13: 276, 2012 Oct 29.
Artigo em Inglês | MEDLINE | ID: mdl-23102108

RESUMO

BACKGROUND: Clustering has become a standard analysis for many types of biological data (e.g interaction networks, gene expression, metagenomic abundance). In practice, it is possible to obtain a large number of contradictory clusterings by varying which clustering algorithm is used, which data attributes are considered, how algorithmic parameters are set, and which near-optimal clusterings are chosen. It is a difficult task to sift though such a large collection of varied clusterings to determine which clustering features are affected by parameter settings or are artifacts of particular algorithms and which represent meaningful patterns. Knowing which items are often clustered together helps to improve our understanding of the underlying data and to increase our confidence about generated modules. RESULTS: We present Coral, an application for interactive exploration of large ensembles of clusterings. Coral makes all-to-all clustering comparison easy, supports exploration of individual clusterings, allows tracking modules across clusterings, and supports identification of core and peripheral items in modules. We discuss how each visual component in Coral tackles a specific question related to clustering comparison and provide examples of their use. We also show how Coral could be used to visually and quantitatively compare clusterings with a ground truth clustering. CONCLUSION: As a case study, we compare clusterings of a recently published protein interaction network of Arabidopsis thaliana. We use several popular algorithms to generate the network's clusterings. We find that the clusterings vary significantly and that few proteins are consistently co-clustered in all clusterings. This is evidence that several clusterings should typically be considered when evaluating modules of genes, proteins, or sequences, and Coral can be used to perform a comprehensive analysis of these clustering ensembles.


Assuntos
Análise por Conglomerados , Gráficos por Computador , Software , Algoritmos , Arabidopsis/metabolismo , Expressão Gênica , Redes Reguladoras de Genes , Mapeamento de Interação de Proteínas/estatística & dados numéricos , Proteínas/metabolismo
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA