Results 1 - 20 of 52
1.
Cell ; 155(1): 70-80, 2013 Sep 26.
Article in English | MEDLINE | ID: mdl-24074861

ABSTRACT

Although countless highly penetrant variants have been associated with Mendelian disorders, the genetic etiologies underlying complex diseases remain largely unresolved. By mining the medical records of over 110 million patients, we examine the extent to which Mendelian variation contributes to complex disease risk. We detect thousands of associations between Mendelian and complex diseases, revealing a nondegenerate, phenotypic code that links each complex disorder to a unique collection of Mendelian loci. Using genome-wide association results, we demonstrate that common variants associated with complex diseases are enriched in the genes indicated by this "Mendelian code." Finally, we detect hundreds of comorbidity associations among Mendelian disorders, and we use probabilistic genetic modeling to demonstrate that Mendelian variants likely contribute nonadditively to the risk for a subset of complex diseases. Overall, this study illustrates a complementary approach for mapping complex disease loci and provides unique predictions concerning the etiologies of specific diseases.


Subjects
Disease/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Genetic Models , Personal Health Records , Humans , Penetrance , Single Nucleotide Polymorphism
2.
PLoS Comput Biol ; 19(3): e1010944, 2023 03.
Article in English | MEDLINE | ID: mdl-36913405

ABSTRACT

We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third party controlled vocabularies. In general, each data element in the data dictionary is associated with a third party controlled vocabulary to make it easier for applications to harmonize two or more PFB files. We also introduce an open source software development kit (SDK) called PyPFB for creating, exploring and modifying PFB files. We describe experimental studies showing the performance improvements when importing and exporting bulk biomedical data in the PFB format versus using JSON and SQL formats.


Subjects
Software , Controlled Vocabulary , Records
3.
Trends Genet ; 35(3): 223-234, 2019 03.
Article in English | MEDLINE | ID: mdl-30691868

ABSTRACT

Data commons collate data with cloud computing infrastructure and commonly used software services, tools, and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize, and share large-scale genomics datasets. Data ecosystems can be built by interoperating multiple data commons. It can be quite labor intensive to curate, import, and analyze the data in a data commons. Data lakes provide an alternative to data commons and simply provide access to data, with the data curation and analysis deferred until later and delegated to those that access the data. We review software platforms for managing, analyzing, and sharing genomic data, with an emphasis on data commons, but also cover data ecosystems and data lakes.


Subjects
Cloud Computing/trends , Genomics/methods , Information Dissemination/methods , Software , Big Data , Biomedical Research/trends , Computational Biology/trends , Humans
4.
Genome Res ; 27(10): 1743-1751, 2017 10.
Article in English | MEDLINE | ID: mdl-28847918

ABSTRACT

Obtaining accurate drug response data in large cohorts of cancer patients is very challenging; thus, most cancer pharmacogenomics discovery is conducted in preclinical studies, typically using cell lines and mouse models. However, these platforms suffer from serious limitations, including small sample sizes. Here, we have developed a novel computational method that allows us to impute drug response in very large clinical cancer genomics data sets, such as The Cancer Genome Atlas (TCGA). The approach works by creating statistical models relating gene expression to drug response in large panels of cancer cell lines and applying these models to tumor gene expression data in the clinical data sets (e.g., TCGA). This yields an imputed drug response for every drug in each patient. These imputed drug response data are then associated with somatic genetic variants measured in the clinical cohort, such as copy number changes or mutations in protein coding genes. These analyses recapitulated drug associations for known clinically actionable somatic genetic alterations and identified new predictive biomarkers for existing drugs.


Subjects
Antineoplastic Agents/pharmacology , Tumor Biomarkers/genetics , Human Genome , Genomics/methods , Neoplasms , Pharmacogenomic Testing/methods , Female , Humans , Male , Neoplasms/drug therapy , Neoplasms/genetics
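
The imputation idea in the abstract above (fit a model of drug response on cell-line expression, then apply it to tumor expression) can be illustrated with a deliberately simplified sketch. It assumes a single expression feature and ordinary least squares, which is not necessarily the model the authors used; the function names `fit_line` and `impute_response` and all numbers are hypothetical toy values.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def impute_response(cell_expr, cell_response, tumor_expr):
    """Train on cell lines, then predict a drug response for each tumor sample."""
    slope, intercept = fit_line(cell_expr, cell_response)
    return [slope * e + intercept for e in tumor_expr]


# Toy data (hypothetical): expression of one gene and measured drug response
# in four cell lines, plus expression of the same gene in three tumors.
cell_expr = [1.0, 2.0, 3.0, 4.0]
cell_response = [5.0, 7.0, 9.0, 11.0]
tumor_expr = [0.0, 2.5, 5.0]

imputed = impute_response(cell_expr, cell_response, tumor_expr)
print(imputed)
```

In practice one model would be fitted per drug over many genes, and the imputed values would then be tested for association with somatic variants, as the abstract describes.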
5.
Blood ; 130(4): 453-459, 2017 07 27.
Article in English | MEDLINE | ID: mdl-28600341

ABSTRACT

The National Cancer Institute Genomic Data Commons (GDC) is an information system for storing, analyzing, and sharing genomic and clinical data from patients with cancer. The recent high-throughput sequencing of cancer genomes and transcriptomes has produced a big data problem that precludes many cancer biologists and oncologists from gleaning knowledge from these data regarding the nature of malignant processes and the relationship between tumor genomic profiles and treatment response. The GDC aims to democratize access to cancer genomic data and to foster the sharing of these data to promote precision medicine approaches to the diagnosis and treatment of cancer.


Subjects
Genetic Databases , Neoplasms/genetics , Precision Medicine , Software , Humans , National Cancer Institute (U.S.) , United States
6.
Genome Res ; 24(7): 1224-35, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24985916

ABSTRACT

Annotation of regulatory elements and identification of the transcription-related factors (TRFs) targeting these elements are key steps in understanding how cells interpret their genetic blueprint and their environment during development, and how that process goes awry in the case of disease. One goal of the modENCODE (model organism ENCyclopedia of DNA Elements) Project is to survey a diverse sampling of TRFs, both DNA-binding and non-DNA-binding factors, to provide a framework for the subsequent study of the mechanisms by which transcriptional regulators target the genome. Here we provide an updated map of the Drosophila melanogaster regulatory genome based on the location of 84 TRFs at various stages of development. This regulatory map reveals a variety of genomic targeting patterns, including factors with strong preferences toward proximal promoter binding, factors that target intergenic and intronic DNA, and factors with distinct chromatin state preferences. The data also highlight the stringency of the Polycomb regulatory network, and show association of the Trithorax-like (Trl) protein with hotspots of DNA binding throughout development. Furthermore, the data identify more than 5800 instances in which TRFs target DNA regions with demonstrated enhancer activity. Regions of high TRF co-occupancy are more likely to be associated with open enhancers used across cell types, while lower TRF occupancy regions are associated with complex enhancers that are also regulated at the epigenetic level. Together these data serve as a resource for the research community in the continued effort to dissect transcriptional regulatory mechanisms directing Drosophila development.


Subjects
Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Gene Expression Regulation , Insect Genome , Transcription Factors , Genetic Transcription , Animals , Base Sequence , Binding Sites , Chromatin/genetics , Chromatin/metabolism , Cluster Analysis , Computational Biology/methods , Genetic Enhancer Elements , Gene Expression Profiling , Genomics/methods , Nucleotide Motifs , Protein Binding , Nucleic Acid Regulatory Sequences , Transcription Factors/metabolism
7.
Nature ; 471(7339): 527-31, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21430782

ABSTRACT

Systematic annotation of gene regulatory elements is a major challenge in genome science. Direct mapping of chromatin modification marks and transcriptional factor binding sites genome-wide has successfully identified specific subtypes of regulatory elements. In Drosophila several pioneering studies have provided genome-wide identification of Polycomb response elements, chromatin states, transcription factor binding sites, RNA polymerase II regulation and insulator elements; however, comprehensive annotation of the regulatory genome remains a significant challenge. Here we describe results from the modENCODE cis-regulatory annotation project. We produced a map of the Drosophila melanogaster regulatory genome on the basis of more than 300 chromatin immunoprecipitation data sets for eight chromatin features, five histone deacetylases and thirty-eight site-specific transcription factors at different stages of development. Using these data we inferred more than 20,000 candidate regulatory elements and validated a subset of predictions for promoters, enhancers and insulators in vivo. We identified also nearly 2,000 genomic regions of dense transcription factor binding associated with chromatin activity and accessibility. We discovered hundreds of new transcription factor co-binding relationships and defined a transcription factor network with over 800 potential regulatory relationships.


Subjects
Drosophila melanogaster/genetics , Insect Genome/genetics , Molecular Sequence Annotation , Nucleic Acid Regulatory Sequences/genetics , Animals , Chromatin/metabolism , Chromatin Assembly and Disassembly , Chromatin Immunoprecipitation , Genetic Enhancer Elements/genetics , Histone Deacetylases/metabolism , Insulator Elements/genetics , Genetic Promoter Regions/genetics , Reproducibility of Results , Transcriptional Silencer Elements/genetics , Transcription Factors/metabolism
8.
Comput Sci Eng ; 18(5): 10-20, 2016.
Article in English | MEDLINE | ID: mdl-29033693

ABSTRACT

Data commons collocate data, storage, and computing infrastructure with core services and commonly used tools and applications for managing, analyzing, and sharing data to create an interoperable resource for the research community. An architecture for data commons is described, as well as some lessons learned from operating several large-scale data commons.

9.
Blood ; 121(6): 975-83, 2013 Feb 07.
Article in English | MEDLINE | ID: mdl-23212519

ABSTRACT

Loss of chromosome 7 and del(7q) [-7/del(7q)] are recurring cytogenetic abnormalities in hematologic malignancies, including acute myeloid leukemia and therapy-related myeloid neoplasms, and associated with an adverse prognosis. Despite intensive effort by many laboratories, the putative myeloid tumor suppressor(s) on chromosome 7 has not yet been identified. We performed transcriptome sequencing and SNP array analysis on de novo and therapy-related myeloid neoplasms, half with -7/del(7q). We identified a 2.17-Mb commonly deleted segment on chromosome band 7q22.1 containing CUX1, a gene encoding a homeodomain-containing transcription factor. In 1 case, CUX1 was disrupted by a translocation, resulting in a loss-of-function RNA fusion transcript. CUX1 was the most significantly differentially expressed gene within the commonly deleted segment and was expressed at haploinsufficient levels in -7/del(7q) leukemias. Haploinsufficiency of the highly conserved ortholog, cut, led to hemocyte overgrowth and tumor formation in Drosophila melanogaster. Similarly, haploinsufficiency of CUX1 gave human hematopoietic cells a significant engraftment advantage on transplantation into immunodeficient mice. Within the RNA-sequencing data, we identified a CUX1-associated cell cycle transcriptional gene signature, suggesting that CUX1 exerts tumor suppressor activity by regulating proliferative genes. These data identify CUX1 as a conserved, haploinsufficient tumor suppressor frequently deleted in myeloid neoplasms.


Subjects
Chromosome Deletion , Human Chromosome Pair 7/genetics , Homeodomain Proteins/genetics , Myeloid Leukemia/genetics , Nuclear Proteins/genetics , Repressor Proteins/genetics , Acute Disease , Animals , Western Blotting , Tumor Cell Line , Drosophila melanogaster/genetics , Gene Expression Profiling , Haploinsufficiency , HeLa Cells , Homeodomain Proteins/metabolism , Humans , Interleukin Receptor Common gamma Subunit/deficiency , Interleukin Receptor Common gamma Subunit/genetics , K562 Cells , Myeloid Leukemia/metabolism , Myeloid Leukemia/pathology , Mice , Inbred NOD Mice , Knockout Mice , SCID Mice , Nuclear Proteins/metabolism , RNA Interference , Repressor Proteins/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Transcription Factors , Genetic Translocation , Tumor Suppressor Proteins/genetics , Tumor Suppressor Proteins/metabolism , U937 Cells , Xenograft Model Antitumor Assays
11.
JAMIA Open ; 7(1): ooae004, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38304249

ABSTRACT

Objective: The Pediatric Cancer Data Commons (PCDC), a project of Data for the Common Good, houses clinical pediatric oncology data and utilizes the open-source Gen3 platform. To meet the needs of end users, the PCDC development team expanded the out-of-box functionality and developed additional custom features that should be useful to any group developing similar data commons. Materials and Methods: Modifications of the PCDC data portal software were implemented to facilitate desired functionality. Results: Newly developed functionality includes updates to authorization methods, expansion of filtering capabilities, and addition of data analysis functions. Discussion: We describe the process by which custom functionalities were developed. Features are open source and available to be implemented and adapted to suit needs of data portals that utilize the Gen3 platform. Conclusion: Data portals are indispensable tools for facilitating data sharing. Open-source infrastructure facilitates a modular and collaborative approach for meeting needs of end users and stakeholders.

12.
JAMIA Open ; 7(2): ooae025, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38617994

ABSTRACT

Objectives: A data commons is a software platform for managing, curating, analyzing, and sharing data with a community. The Pandemic Response Commons (PRC) is a data commons designed to provide a data platform for researchers studying an epidemic or pandemic. Methods: The PRC was developed using the open source Gen3 data platform and is based upon consortium, data, and platform agreements developed by the not-for-profit Open Commons Consortium. A formal consortium of Chicagoland area organizations was formed to develop and operate the PRC. Results: The consortium developed a general PRC and an instance of it for the Chicagoland region called the Chicagoland COVID-19 Commons. A Gen3 data platform was set up and operated with policies, procedures, and controls for a NIST SP 800-53 revision 4 Moderate system. A consensus data model for the commons was developed, and a variety of datasets were curated, harmonized and ingested, including statistical summary data about COVID cases, patient level clinical data, and SARS-CoV-2 viral variant data. Discussion and conclusions: Given the various legal and data agreements required to operate a data commons, a PRC is designed to be in place and operating at a low level prior to the occurrence of an epidemic, with the activities increasing as required during an epidemic. A regional instance of a PRC can also be part of a broader data ecosystem or data mesh consisting of multiple regional commons supporting pandemic response through sharing regional data.

13.
Cancer Res ; 84(9): 1384-1387, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488505

ABSTRACT

The NCI Cancer Research Data Commons (CRDC) is a collection of data commons, analysis platforms, and tools that make existing cancer data more findable and accessible by the cancer research community. In practice, the two biggest hurdles to finding and using data for discovery are the wide variety of models and ontologies used to describe data, and the dispersed storage of that data. Here, we outline core CRDC services to aggregate descriptive information from multiple studies for findability via a single interface and to provide a single access method that spans multiple data commons. See related articles by Wang et al., p. 1388, Pot et al., p. 1396, and Kim et al., p. 1404.


Subjects
National Cancer Institute (U.S.) , Neoplasms , Humans , United States , Neoplasms/therapy , Biomedical Research/standards , Factual Databases
14.
Stud Health Technol Inform ; 310: 735-739, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269906

ABSTRACT

High-resolution whole slide image scans of histopathology slides have been widely used in recent years for prediction in cancer. However, in some cases, clinical informatics practitioners may only have access to low-resolution snapshots of histopathology slides, not high-resolution scans. We evaluated strategies for training neural network prognostic models in non-small cell lung cancer (NSCLC) based on low-resolution snapshots, using data from the Veterans Affairs Precision Oncology Data Repository. We compared strategies without transfer learning, with transfer learning from general domain images, and with transfer learning from publicly available high-resolution histopathology scans. We found transfer learning from high-resolution scans achieved significantly better performance than other strategies. Our contribution provides a foundation for future development of prognostic models in NSCLC that incorporate data from low-resolution pathology slide snapshots alongside known clinical predictors.


Subjects
Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Medical Informatics , Humans , Non-Small-Cell Lung Carcinoma/diagnostic imaging , Lung Neoplasms/diagnostic imaging , Precision Medicine , Machine Learning
15.
Cancer Res ; 84(9): 1388-1395, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488507

ABSTRACT

Since 2014, the NCI has launched a series of data commons as part of the Cancer Research Data Commons (CRDC) ecosystem housing genomic, proteomic, imaging, and clinical data to support cancer research and promote data sharing of NCI-funded studies. This review describes each data commons (Genomic Data Commons, Proteomic Data Commons, Integrated Canine Data Commons, Cancer Data Service, Imaging Data Commons, and Clinical and Translational Data Commons), including their unique and shared features, accomplishments, and challenges. Also discussed is how the CRDC data commons implement Findable, Accessible, Interoperable, Reusable (FAIR) principles and promote data sharing in support of the new NIH Data Management and Sharing Policy. See related articles by Brady et al., p. 1384, Pot et al., p. 1396, and Kim et al., p. 1404.


Subjects
Information Dissemination , National Cancer Institute (U.S.) , Neoplasms , Humans , United States , Neoplasms/metabolism , Information Dissemination/methods , Biomedical Research , Genomics/methods , Animals , Proteomics/methods
16.
Appl Environ Microbiol ; 79(5): 1757-9, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23315735

ABSTRACT

"Candidatus Portiera aleyrodidarum" is the primary endosymbiont of whiteflies. We report two complete genome sequences of this bacterium from the worldwide invasive B and Q biotypes of the whitefly Bemisia tabaci. Differences in the two genome sequences may add insights into the complex differences in the biology of both biotypes.


Subjects
Halomonadaceae/isolation & purification , Halomonadaceae/physiology , Hemiptera/microbiology , Symbiosis , Animals , Bacterial DNA/chemistry , Bacterial DNA/genetics , Bacterial Genome , Halomonadaceae/classification , Halomonadaceae/genetics , Molecular Sequence Data
17.
bioRxiv ; 2023 Dec 10.
Article in English | MEDLINE | ID: mdl-38106010

ABSTRACT

Spatial transcriptomics (ST) has enhanced RNA analysis in tissue biopsies, but interpreting these data is challenging without expert input. We present Automated Tissue Alignment and Traversal (ATAT), a novel computational framework designed to enhance ST analysis in the context of multiple and complex tissue architectures and morphologies, such as those found in biopsies of the gastrointestinal tract. ATAT utilizes self-supervised contrastive learning on hematoxylin and eosin (H&E) stained images to automate the alignment and traversal of ST data. This approach addresses a critical gap in current ST analysis methodologies, which rely heavily on manual annotation and pathologist expertise to delineate regions of interest for accurate gene expression modeling. Our framework not only streamlines the alignment of multiple ST samples, but also demonstrates robustness in modeling gene expression transitions across specific regions. Additionally, we highlight the ability of ATAT to traverse complex tissue topologies in real-world cases from various individuals and conditions. Our method successfully elucidates differences in immune infiltration patterns across the intestinal wall, enabling the modeling of transcriptional changes across histological layers. We show that ATAT achieves comparable performance to the state-of-the-art method, while alleviating the burden of manual annotation and enabling alignment of tissue samples with complex morphologies.

18.
Article in English | MEDLINE | ID: mdl-38050021

ABSTRACT

Veterans are at an increased risk for prostate cancer, a disease with extraordinary clinical and molecular heterogeneity, compared with the general population. However, little is known about the underlying molecular heterogeneity within the veteran population and its impact on patient management and treatment. Using clinical and targeted tumor sequencing data from the National Veterans Affairs health system, we conducted a retrospective cohort study on 45 patients with advanced prostate cancer in the Veterans Precision Oncology Data Commons (VPODC), most of whom were metastatic castration-resistant. We characterized the mutational burden in this cohort and conducted unsupervised clustering analysis to stratify patients by molecular alterations. Veterans with prostate cancer exhibited a mutational landscape broadly similar to prior studies, including KMT2A and NOTCH1 mutations associated with neuroendocrine prostate cancer phenotype, previously reported to be enriched in veterans. We also identified several potential novel mutations in PTEN, MSH6, VHL, SMO, and ABL1. Hierarchical clustering analysis revealed two subgroups containing therapeutically targetable molecular features with novel mutational signatures distinct from those reported in the Catalogue of Somatic Mutations in Cancer database. The clustering approach presented in this study can potentially be used to clinically stratify patients based on their distinct mutational profiles and identify actionable somatic mutations for precision oncology.


Subjects
Prostatic Neoplasms , Veterans , Male , Humans , Retrospective Studies , Precision Medicine , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Medical Oncology , Mutation
19.
J Mol Diagn ; 25(3): 143-155, 2023 03.
Article in English | MEDLINE | ID: mdl-36828596

ABSTRACT

The Blood Profiling Atlas in Cancer (BLOODPAC) Consortium is a collaborative effort involving stakeholders from the public, industry, academia, and regulatory agencies focused on developing shared best practices on liquid biopsy. This report describes the results from the JFDI (Just Freaking Do It) study, a BLOODPAC initiative to develop standards on the use of contrived materials mimicking cell-free circulating tumor DNA, to comparatively evaluate clinical laboratory testing procedures. Nine independent laboratories tested the concordance, sensitivity, and specificity of commercially available contrived materials with known variant-allele frequencies (VAFs) ranging from 0.1% to 5.0%. Each participating laboratory utilized its own proprietary evaluation procedures. The results demonstrated high levels of concordance and sensitivity at VAFs of >0.1%, but reduced concordance and sensitivity at a VAF of 0.1%; these findings were similar to those from previous studies, suggesting that commercially available contrived materials can support the evaluation of testing procedures across multiple technologies. Such materials may enable more objective comparisons of results on materials formulated in-house at each center in multicenter trials. A unique goal of the collaborative effort was to develop a data resource, the BLOODPAC Data Commons, now available to the liquid-biopsy community for further study. This resource can be used to support independent evaluations of results, data extension through data integration and new studies, and retrospective evaluation of data collection.


Subjects
Circulating Tumor DNA , Hematologic Neoplasms , Neoplasms , Humans , Retrospective Studies , Neoplasms/genetics , Liquid Biopsy/methods
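
The sensitivity, specificity, and concordance measures named in the abstract above can be made concrete with a small sketch. It assumes each laboratory reports a present/absent call per expected variant in a contrived sample; the helper names and the toy calls are hypothetical and are not taken from the BLOODPAC study.

```python
def sensitivity_specificity(expected, detected):
    """Compare a lab's calls with the known content of a contrived sample."""
    pairs = list(zip(expected, detected))
    tp = sum(1 for e, d in pairs if e and d)          # variant present, called
    fn = sum(1 for e, d in pairs if e and not d)      # variant present, missed
    tn = sum(1 for e, d in pairs if not e and not d)  # variant absent, not called
    fp = sum(1 for e, d in pairs if not e and d)      # variant absent, called
    return tp / (tp + fn), tn / (tn + fp)


def concordance(calls_a, calls_b):
    """Fraction of positions on which two laboratories make the same call."""
    agree = sum(1 for a, b in zip(calls_a, calls_b) if a == b)
    return agree / len(calls_a)


# Toy data (hypothetical): four spiked-in variants and two negative controls.
expected = [True, True, True, True, False, False]
lab_a = [True, True, True, False, False, True]
lab_b = [True, True, False, False, False, True]

sens_a, spec_a = sensitivity_specificity(expected, lab_a)
agreement = concordance(lab_a, lab_b)
print(sens_a, spec_a, agreement)
```

Repeating the calculation separately at each variant-allele frequency is one way to see the drop in sensitivity near 0.1% that the abstract reports.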
20.
Cancer Res ; 83(8): 1175-1182, 2023 04 14.
Article in English | MEDLINE | ID: mdl-36625843

ABSTRACT

Big data in healthcare can enable unprecedented understanding of diseases and their treatment, particularly in oncology. These data may include electronic health records, medical imaging, genomic sequencing, payor records, and data from pharmaceutical research, wearables, and medical devices. The ability to combine datasets and use data across many analyses is critical to the successful use of big data and is a concern for those who generate and use the data. Interoperability and data quality continue to be major challenges when working with different healthcare datasets. Mapping terminology across datasets, missing and incorrect data, and varying data structures make combining data an onerous and largely manual undertaking. Data privacy is another concern addressed by the Health Insurance Portability and Accountability Act, the Common Rule, and the General Data Protection Regulation. The use of big data is now included in the planning and activities of the FDA and the European Medicines Agency. The willingness of organizations to share data in a precompetitive fashion, agreements on data quality standards, and institution of universal and practical tenets on data privacy will be crucial to fully realizing the potential for big data in medicine.


Subjects
Big Data , Neoplasms , Humans , Neoplasms/diagnosis , Neoplasms/therapy , Precision Medicine , Information Storage and Retrieval