Results 1-7 of 7
1.
Nature ; 630(8015): 181-188, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38778098

ABSTRACT

Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles[1-3]. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context[4]. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet[5] method to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data[6]. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath on vision-language pretraining for pathology[7,8] by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.


Subjects
Datasets as Topic , Image Processing, Computer-Assisted , Machine Learning , Pathology, Clinical , Humans , Benchmarking , Image Processing, Computer-Assisted/methods , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/pathology , Pathology, Clinical/methods , Male , Female
2.
Patterns (N Y) ; 4(4): 100726, 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37123439

ABSTRACT

Most detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence generation. We propose leveraging patient-level supervision from medical registries, which are often readily available and capture key patient information, for general RWD applications. We conduct an extensive study on 135,107 patients from the cancer registry of a large integrated delivery network (IDN) comprising healthcare systems in five western US states. Our deep-learning methods attain test area under the receiver operating characteristic curve (AUROC) values of 94%-99% for key tumor attributes and comparable performance on held-out data from separate health systems and states. Ablation results demonstrate the superiority of these advanced deep-learning methods. Error analysis shows that our NLP system sometimes even corrects errors in registrar labels.

3.
PLoS Comput Biol ; 15(6): e1006758, 2019 06.
Article in English | MEDLINE | ID: mdl-31246951

ABSTRACT

Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge that arises in these studies is to explain how genes identified as relevant in the given experiment are organized into a subnetwork that accounts for the response of interest. The task of inferring a subnetwork is typically dependent on the information available in publicly available, structured databases, which suffer from incompleteness. However, a wealth of potentially relevant information resides in the scientific literature, such as information about genes associated with certain concepts of interest, as well as interactions that occur among various biological entities. We contend that by exploiting this information, we can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that we can use literature-extracted information to (i) augment the set of entities identified as being relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways that are regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks. 
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference.


Subjects
Computational Biology/methods , Data Mining/methods , Gene Regulatory Networks/genetics , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Databases, Genetic , HIV , HIV Infections/genetics , HIV Infections/virology , Humans
4.
Nat Genet ; 49(9): 1319-1325, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28783162

ABSTRACT

In this study, we used insurance claims for over one-third of the entire US population to create a subset of 128,989 families (481,657 unique individuals). We then used these data to (i) estimate the heritability and familial environmental patterns of 149 diseases and (ii) infer the genetic and environmental correlations for disease pairs from a set of 29 complex diseases. The majority (52 of 65) of our study's heritability estimates matched earlier reports, and 84 of our estimates appear to have been obtained for the first time. We used correlation matrices to compute environmental and genetic disease classifications and corresponding reliability measures. Among unexpected observations, we found that migraine, typically classified as a disease of the central nervous system, appeared to be most genetically similar to irritable bowel syndrome and most environmentally similar to cystitis and urethritis, all of which are inflammatory diseases.


Subjects
Disease/genetics , Environment , Genetic Predisposition to Disease/genetics , Insurance Claim Reporting/statistics & numerical data , Cystitis/classification , Cystitis/genetics , Disease/classification , Female , Humans , Inflammation/classification , Inflammation/genetics , Inheritance Patterns/genetics , Irritable Bowel Syndrome/classification , Irritable Bowel Syndrome/genetics , Linear Models , Male , Migraine Disorders/classification , Migraine Disorders/genetics , Multivariate Analysis , Pedigree , Risk Factors , United States , Urethritis/classification , Urethritis/genetics
5.
Proc Natl Acad Sci U S A ; 114(36): E7554-E7563, 2017 09 05.
Article in English | MEDLINE | ID: mdl-28784769

ABSTRACT

Translating the genetic and epigenetic heterogeneity underlying human cancers into therapeutic strategies is an ongoing challenge. Large-scale sequencing efforts have uncovered a spectrum of mutations in many hematologic malignancies, including acute myeloid leukemia (AML), suggesting that combinations of agents will be required to treat these diseases effectively. Combinatorial approaches will also be critical for combating the emergence of genetically heterogeneous subclones, rescue signals in the microenvironment, and tumor-intrinsic feedback pathways that all contribute to disease relapse. To identify novel and effective drug combinations, we performed ex vivo sensitivity profiling of 122 primary patient samples from a variety of hematologic malignancies against a panel of 48 drug combinations. The combinations were designed as drug pairs that target nonoverlapping biological pathways and comprise drugs from different classes, preferably with Food and Drug Administration approval. A combination ratio (CR) was derived for each drug pair, and CRs were evaluated with respect to diagnostic categories as well as against genetic, cytogenetic, and cellular phenotypes of specimens from the two largest disease categories: AML and chronic lymphocytic leukemia (CLL). Nearly all tested combinations involving a BCL2 inhibitor showed additional benefit in patients with myeloid malignancies, whereas select combinations involving PI3K, CSF1R, or bromodomain inhibitors showed preferential benefit in lymphoid malignancies. Expanded analyses of patients with AML and CLL revealed specific patterns of ex vivo drug combination efficacy that were associated with select genetic, cytogenetic, and phenotypic disease subsets, warranting further evaluation. These findings highlight the heuristic value of an integrated functional genomic approach to the identification of novel treatment strategies for hematologic malignancies.


Subjects
Antineoplastic Agents/therapeutic use , Hematologic Neoplasms/drug therapy , Leukemia, Lymphocytic, Chronic, B-Cell/drug therapy , Leukemia, Myeloid, Acute/drug therapy , Drug Combinations , Hematologic Neoplasms/metabolism , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/metabolism , Leukemia, Myeloid, Acute/metabolism , Mutation/drug effects , Phosphatidylinositol 3-Kinases/metabolism , Proto-Oncogene Proteins c-bcl-2/metabolism , Receptors, Granulocyte-Macrophage Colony-Stimulating Factor/metabolism
6.
Pac Symp Biocomput ; : 120-31, 2015.
Article in English | MEDLINE | ID: mdl-25592574

ABSTRACT

Biological pathways are central to understanding complex diseases such as cancer. The majority of this knowledge is scattered in the vast and rapidly growing research literature. To automate knowledge extraction, machine learning approaches typically require annotated examples, which are expensive and time-consuming to acquire. Recently, there has been increasing interest in leveraging databases for distant supervision in knowledge extraction, but existing applications focus almost exclusively on newswire domains. In this paper, we present the first attempt to formulate the distant supervision problem for pathway extraction and apply a state-of-the-art method to extracting pathway interactions from PubMed abstracts. Experiments show that distant supervision can effectively compensate for the lack of annotation, attaining an accuracy approaching supervised results. From 22 million PubMed abstracts, we extracted 1.5 million pathway interactions at a precision of 25%. More than 10% of interactions are mentioned in the context of one or more cancer types, analysis of which yields interesting insights.


Subjects
Data Mining/methods , Neoplasms/genetics , Neoplasms/metabolism , Computational Biology , Databases, Genetic , Humans , Knowledge Bases , Metabolic Networks and Pathways/genetics , Mutation , Oncogenes , PubMed , Supervised Machine Learning
7.
Sci Rep ; 3: 1099, 2013.
Article in English | MEDLINE | ID: mdl-23346356

ABSTRACT

We present an approach for genome-wide association analysis with improved power on the Wellcome Trust data consisting of seven common phenotypes and shared controls. We achieved improved power by expanding the control set to include other disease cohorts, multiple races, and closely related individuals. Within this setting, we conducted exhaustive univariate and epistatic interaction association analyses. Use of the expanded control set identified more known associations with Crohn's disease and potential new biology, including several plausible epistatic interactions in several diseases. Our work suggests that carefully combining data from large repositories could reveal many new biological insights through increased power. As a community resource, all results have been made available through an interactive web server.


Subjects
Epistasis, Genetic/genetics , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Cohort Studies , Crohn Disease/genetics , Data Interpretation, Statistical , Genome-Wide Association Study/methods , Humans , Phenotype