ABSTRACT
The Data Coordinating Center (DCC) of the Human Tumor Atlas Network (HTAN) has played a crucial role in enabling the broad sharing and effective utilization of HTAN data within the scientific community. Data from the first phase of HTAN are now available publicly. We describe the diverse datasets and modalities shared, multiple access routes to HTAN assay data and metadata, data standards, technical infrastructure and governance approaches, as well as our approach to sustained community engagement. HTAN data can be accessed via the HTAN Portal, explored in visualization tools (including CellxGene, Minerva, and cBioPortal), and analyzed in the cloud through the NCI Cancer Research Data Commons nodes. We have developed a streamlined infrastructure to ingest and disseminate data by leveraging the Synapse platform. Taken together, the HTAN DCC's approach demonstrates a successful model for coordinating, standardizing, and disseminating complex cancer research data via multiple resources in the cancer data ecosystem, offering valuable insights for similar consortia and researchers looking to leverage HTAN data.
ABSTRACT
International cancer registries make real-world genomic and clinical data available, but their joint analysis remains a challenge. AACR Project GENIE, an international cancer registry collecting data from 19 cancer centers, makes data from >130,000 patients publicly available through the cBioPortal for Cancer Genomics (https://genie.cbioportal.org). For 25,000 patients, additional real-world longitudinal clinical data, including treatment and outcome data, are being collected by the AACR Project GENIE Biopharma Collaborative using the PRISSMM data curation model. Several thousand of these cases are now also available in cBioPortal. We have significantly enhanced the functionalities of cBioPortal to support the visualization and analysis of this rich clinico-genomic linked dataset, as well as datasets generated by other centers and consortia. Examples of these enhancements include (i) visualization of the longitudinal clinical and genomic data at the patient level, including timelines for diagnoses, treatments, and outcomes; (ii) the ability to select samples based on treatment status, facilitating a comparison of molecular and clinical attributes between samples before and after a specific treatment; and (iii) survival analysis estimates based on individual treatment regimens received. Together, these features provide cBioPortal users with a toolkit to interactively investigate complex clinico-genomic data to generate hypotheses and make discoveries about the impact of specific genomic variants on prognosis and therapeutic sensitivities in cancer. SIGNIFICANCE: Enhanced cBioPortal features allow clinicians and researchers to effectively investigate longitudinal clinico-genomic data from patients with cancer, which will improve exploration of data from the AACR Project GENIE Biopharma Collaborative and similar datasets.
Subjects
Genomics, Neoplasms, Humans, Neoplasms/genetics, Neoplasms/therapy, Precision Medicine
ABSTRACT
PURPOSE: Interpretation of genomic variants in tumor samples still presents a challenge in research and the clinical setting. A major issue is that information for variant interpretation is fragmented across disparate databases, and aggregation of information from these requires building extensive infrastructure. To this end, we have developed Genome Nexus, a one-stop shop for variant annotation with a user-friendly interface for cancer researchers and clinicians. METHODS: Genome Nexus (1) aggregates variant information from sources that are relevant to cancer research and clinical applications, (2) allows high-performance programmatic access to the aggregated data via a unified application programming interface, (3) provides a reference page for individual cancer variants, (4) provides user-friendly tools for annotating variants in patients, and (5) is freely available under an open source license and can be installed in a private cloud or local environment and integrated with local institutional resources. RESULTS: Genome Nexus is available at https://www.genomenexus.org. It displays annotations from more than a dozen resources including those that provide variant effect information (Variant Effect Predictor), protein sequence annotation (UniProt, Pfam, and dbPTM), functional consequence prediction (PolyPhen-2, Mutation Assessor, and SIFT), population prevalences (gnomAD, dbSNP, and ExAC), cancer population prevalences (Cancer Hotspots and SignalDB), and clinical actionability (OncoKB, CIViC, and ClinVar). We describe several use cases that demonstrate the utility of Genome Nexus to clinicians, researchers, and bioinformaticians. We cover single-variant annotation, cohort analysis, and programmatic use of the application programming interface. Genome Nexus is unique in providing a user-friendly interface specific to cancer that allows high-performance annotation of any variant including unknown ones.
CONCLUSION: Interpretation of cancer genomic variants is improved tremendously by having an integrated resource for annotations. Genome Nexus is freely available under an open source license.
Subjects
Neoplasms, Software, Genomics, Humans, Molecular Sequence Annotation, Mutation, Neoplasms/genetics
ABSTRACT
Human cancers arise from environmental, heritable and somatic factors, but how these mechanisms interact in tumorigenesis is poorly understood. Studying 17,152 prospectively sequenced patients with cancer, we identified pathogenic germline variants in cancer predisposition genes, and assessed their zygosity and co-occurring somatic alterations in the concomitant tumors. Two major routes to tumorigenesis were apparent. In carriers of pathogenic germline variants in high-penetrance genes (5.1% overall), lineage-dependent patterns of biallelic inactivation led to tumors exhibiting mechanism-specific somatic phenotypes and fewer additional somatic oncogenic drivers. Nevertheless, 27% of cancers in these patients, and most tumors in patients with pathogenic germline variants in lower-penetrance genes, lacked particular hallmarks of tumorigenesis associated with the germline allele. The dependence of tumors on pathogenic germline variants is variable and often dictated by both penetrance and lineage, a finding with implications for clinical management.
Subjects
Germ-Line Mutation, Neoplasms/genetics, Carcinogenesis/genetics, DNA Copy Number Variations, DNA Mismatch Repair/genetics, Genetic Predisposition to Disease, Heterozygote, Humans, Phenotype
ABSTRACT
Most mutations in cancer are rare, which complicates the identification of therapeutically significant mutations and thus limits the clinical impact of genomic profiling in patients with cancer. Here, we analyzed 24,592 cancers including 10,336 prospectively sequenced patients with advanced disease to identify mutant residues arising more frequently than expected in the absence of selection. We identified 1,165 statistically significant hotspot mutations of which 80% arose in 1 in 1,000 or fewer patients. Of 55 recurrent in-frame indels, we validated that novel AKT1 duplications induced pathway hyperactivation and conferred AKT inhibitor sensitivity. Cancer genes exhibit different rates of hotspot discovery with increasing sample size, with few approaching saturation. Consequently, 26% of all hotspots in therapeutically actionable oncogenes were novel. Upon matching a subset of affected patients directly to molecularly targeted therapy, we observed radiographic and clinical responses. Population-scale mutant allele discovery illustrates how the identification of driver mutations in cancer is far from complete. Significance: Our systematic computational, experimental, and clinical analysis of hotspot mutations in approximately 25,000 human cancers demonstrates that the long right tail of biologically and therapeutically significant mutant alleles is still incompletely characterized. Sharing prospective genomic data will accelerate hotspot identification, thereby expanding the reach of precision oncology in patients with cancer. Cancer Discov; 8(2); 174-83. ©2017 AACR. This article is highlighted in the In This Issue feature, p. 127.
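To make the phrase "more frequently than expected in the absence of selection" concrete, the following minimal sketch computes a one-sided binomial tail probability for a single residue. It is a simplified stand-in and not the statistical model used in the study: the function names `binomial_tail` and `hotspot_p_value`, the uniform per-residue background rate, and the example counts are all assumptions made for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the upper tail."""
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hotspot_p_value(residue_count: int, gene_mutation_count: int,
                    protein_length: int) -> float:
    """Probability of seeing at least `residue_count` mutations at one residue.

    Assumption: without selection, each of the gene's mutations lands on any
    residue with equal probability 1 / protein_length. Real analyses use
    richer background models (for example, context-specific mutability).
    """
    background_rate = 1.0 / protein_length
    return binomial_tail(residue_count, gene_mutation_count, background_rate)


# Hypothetical gene: 500 residues, 50 observed mutations in the cohort.
recurrent = hotspot_p_value(residue_count=12, gene_mutation_count=50, protein_length=500)
singleton = hotspot_p_value(residue_count=1, gene_mutation_count=50, protein_length=500)
print(f"12 hits at one residue: p = {recurrent:.3g}")
print(f"1 hit at one residue:   p = {singleton:.3g}")
```

In practice the per-residue p-values would also need correction for multiple testing across all residues and genes, which this sketch omits.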
Subjects
Alleles, Tumor Biomarkers, Genetic Association Studies, Genetic Predisposition to Disease, Mutation, Neoplasms/genetics, Codon, Genetic Association Studies/methods, Humans, INDEL Mutation
ABSTRACT
Many mutations in cancer are of unknown functional significance. Standard methods use statistically significant recurrence of mutations in tumor samples as an indicator of functional impact. We extend such analyses into the long tail of rare mutations by considering recurrence of mutations in clusters of spatially close residues in protein structures. Analyzing 10,000 tumor exomes, we identify more than 3000 rarely mutated residues in proteins as potentially functional and experimentally validate several in RAC1 and MAP2K1. These potential driver mutations (web resources: 3dhotspots.org and cBioPortal.org) can extend the scope of genomically informed clinical trials and of personalized choice of therapy.
Subjects
DNA Mutational Analysis/methods, Genomics/methods, Mutation, Neoplasm Proteins/genetics, Neoplasms/genetics, Exome, Humans, MAP Kinase Kinase 1/chemistry, MAP Kinase Kinase 1/genetics, MAP Kinase Kinase 1/metabolism, Neoplasm Proteins/chemistry, Neoplasm Proteins/metabolism, Neoplasms/metabolism, Tertiary Protein Structure, rac1 GTP-Binding Protein/chemistry, rac1 GTP-Binding Protein/genetics, rac1 GTP-Binding Protein/metabolism
ABSTRACT
Protein expression and post-translational modification levels are tightly regulated in neoplastic cells to maintain cellular processes known as 'cancer hallmarks'. The first Pan-Cancer initiative of The Cancer Genome Atlas (TCGA) Research Network has aggregated protein expression profiles for 3,467 patient samples from 11 tumor types using the antibody-based reverse phase protein array (RPPA) technology. The resultant proteomic data can be utilized to computationally infer protein-protein interaction (PPI) networks and to study the commonalities and differences across tumor types. In this study, we compare the performance of 13 established network inference methods in their capacity to retrieve the curated Pathway Commons interactions from RPPA data. We observe that no single method has the best performance in all tumor types, but a group of six methods, including diverse techniques such as correlation, mutual information, and regression, consistently ranks highly among the tested methods. We utilize the high-performing methods to obtain a consensus network and identify four robust and densely connected modules that reveal biological processes as well as suggest antibody-related technical biases. Mapping the consensus network interactions to Reactome gene lists confirms the pan-cancer importance of signal transduction pathways, innate and adaptive immune signaling, cell cycle, metabolism, and DNA repair, and also suggests several biological processes that may be specific to a subset of tumor types. Our results illustrate the utility of the RPPA platform as a tool to study proteomic networks in cancer.
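The abstract does not say how the consensus network was assembled. One common way to combine several inference methods is to average the rank each method assigns to an edge, and the sketch below illustrates that idea only. The function name `consensus_edges`, the method labels, and every score are hypothetical; this should not be read as the study's actual procedure.

```python
from __future__ import annotations

from collections import defaultdict


def consensus_edges(method_scores: dict[str, dict[tuple[str, str], float]],
                    top_k: int) -> list[tuple[str, str]]:
    """Rank undirected edges by their mean rank across methods; return the best top_k.

    Each method maps an edge (pair of protein names) to a score, where a higher
    score means stronger support. Edges a method did not score receive the worst
    possible rank for that method.
    """
    # Normalise edge orientation so (A, B) and (B, A) are the same edge.
    normalised = [
        {tuple(sorted(edge)): score for edge, score in scores.items()}
        for scores in method_scores.values()
    ]
    all_edges = set().union(*normalised) if normalised else set()
    worst_rank = len(all_edges)

    rank_sums: dict[tuple[str, str], float] = defaultdict(float)
    for scores in normalised:
        ordered = sorted(scores, key=lambda e: (-scores[e], e))
        ranks = {edge: i + 1 for i, edge in enumerate(ordered)}
        for edge in all_edges:
            rank_sums[edge] += ranks.get(edge, worst_rank)

    n_methods = len(normalised)
    ranked = sorted(all_edges, key=lambda e: (rank_sums[e] / n_methods, e))
    return ranked[:top_k]


# Made-up scores from three hypothetical inference methods.
methods = {
    "correlation": {("AKT1", "MTOR"): 0.9, ("EGFR", "ERBB2"): 0.7, ("AKT1", "EGFR"): 0.2},
    "mutual_information": {("MTOR", "AKT1"): 0.8, ("EGFR", "ERBB2"): 0.85, ("AKT1", "EGFR"): 0.1},
    "regression": {("AKT1", "MTOR"): 0.6, ("ERBB2", "EGFR"): 0.5, ("AKT1", "EGFR"): 0.4},
}
consensus = consensus_edges(methods, top_k=2)
print(consensus)
```

Rank averaging is used here because scores from correlation, mutual information, and regression are not on a common scale, whereas ranks are directly comparable.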
Subjects
Neoplasm Proteins/metabolism, Neoplasms/metabolism, Protein Interaction Maps/physiology, Proteomics/methods, Software, Cluster Analysis, Protein Databases, Gene Expression Profiling, Humans, Neoplasm Proteins/analysis, Neoplasm Proteins/genetics, Neoplasms/genetics, Principal Component Analysis
ABSTRACT
The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in more than 5000 cancer patient samples (as of mid-2015) across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community by providing detailed clues for the functional importance of particular mutations, as well as for the design of functional genomics experiments and for decision support in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects.
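The idea of aggregating mutations at homologous positions can be sketched as follows: each gene's residue numbers are mapped to columns of a domain alignment, and mutation counts are pooled per column. This is a toy illustration, not MutationAligner's implementation; the gene names, aligned sequences, mutation positions, and the helper names `residue_to_column` and `aggregate_by_column` are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def residue_to_column(aligned_seq: str) -> dict[int, int]:
    """Map 1-based residue positions to 0-based alignment columns, skipping gaps ('-')."""
    mapping: dict[int, int] = {}
    residue = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":
            residue += 1
            mapping[residue] = column
    return mapping


def aggregate_by_column(alignment: dict[str, str],
                        mutations: list[tuple[str, int]]) -> Counter:
    """Count mutations per alignment column, pooling across all aligned genes.

    `mutations` holds (gene, 1-based residue position within the aligned domain).
    Positions that fall outside the aligned sequence are ignored.
    """
    column_maps = {gene: residue_to_column(seq) for gene, seq in alignment.items()}
    counts: Counter = Counter()
    for gene, position in mutations:
        column = column_maps[gene].get(position)
        if column is not None:
            counts[column] += 1
    return counts


# Toy alignment of one domain in two hypothetical paralogs.
alignment = {
    "GENE_A": "MK-LRV",
    "GENE_B": "MKTL-V",
}
# Individually rare mutations: residue 3 of GENE_A and residue 4 of GENE_B
# both fall in alignment column 3, so their counts are pooled.
mutations = [("GENE_A", 3), ("GENE_B", 4), ("GENE_B", 4), ("GENE_A", 5)]
counts = aggregate_by_column(alignment, mutations)
print(counts.most_common())
```

Pooling by column is what gives the approach its added statistical power: a position mutated only once or twice in each paralog can still stand out once the counts are combined.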
Subjects
Genetic Databases, Mutation, Neoplasms/genetics, Tertiary Protein Structure/genetics, Genomics, Humans, Sequence Alignment, Software
ABSTRACT
BACKGROUND: Information about cellular processes and pathways is becoming increasingly available in detailed, computable standard formats such as BioPAX and SBGN. Effective visualization of this information is a key recurring requirement for biological data analysis, especially for -omic data. Biological data analysis is rapidly migrating to web-based platforms; thus there is a substantial need for sophisticated web-based pathway viewers that support these platforms and other use cases. RESULTS: Towards this goal, we developed a web-based viewer named SBGNViz for process description maps in SBGN (SBGN-PD). SBGNViz can visualize both BioPAX and SBGN formats. Unique features of SBGNViz include the ability to nest nodes to arbitrary depths to represent molecular complexes and cellular locations, automatic pathway layout, editing and highlighting facilities to enable focus on sub-maps, and the ability to inspect pathway members for detailed information from EntrezGene. SBGNViz can be used within a web browser without any installation and can be readily embedded into web pages. SBGNViz has two editions built with ActionScript and JavaScript. The JavaScript edition, which also works on touch-enabled devices, introduces novel methods for managing and reducing complexity of large SBGN-PD maps for more effective analysis. CONCLUSION: SBGNViz fills an important gap by making the large and fast-growing corpus of rich pathway information accessible to web-based platforms. SBGNViz can be used in a variety of contexts and in multiple scenarios ranging from visualization of the results of a single study in a web page to building data analysis platforms.
Subjects
Signal Transduction/physiology, Statistics as Topic/methods, Technology/methods, Access to Information, Computer Graphics, Internet, Software, Systems Biology/methods, Web Browser
ABSTRACT
MOTIVATION: BioPAX is a standard language for representing complex cellular processes, including metabolic networks, signal transduction and gene regulation. Owing to the inherent complexity of a BioPAX model, searching for a specific type of subnetwork can be non-trivial and difficult. RESULTS: We developed an open source and extensible framework for defining and searching graph patterns in BioPAX models. We demonstrate its use with a sample pattern that captures directed signaling relations between proteins. We provide search results for the pattern obtained from the Pathway Commons database and compare these results with the current data in signaling databases SPIKE and SignaLink. Results show that a pattern search in public pathway data can identify a substantial number of signaling relations that do not exist in signaling databases. AVAILABILITY: BioPAX-pattern software was developed in Java. Source code and documentation are freely available at http://code.google.com/p/biopax-pattern under the GNU Lesser General Public License.
Subjects
Programming Languages, Cell Physiological Phenomena, Factual Databases, Metabolic Networks and Pathways, Biological Models, Phosphorylation
ABSTRACT
The cBio Cancer Genomics Portal (http://cbioportal.org) is an open-access resource for interactive exploration of multidimensional cancer genomics data sets, currently providing access to data from more than 5,000 tumor samples from 20 cancer studies. The cBio Cancer Genomics Portal significantly lowers the barriers between complex genomic data and cancer researchers who want rapid, intuitive, and high-quality access to molecular profiles and clinical attributes from large-scale cancer genomics projects and empowers researchers to translate these rich data sets into biologic insights and clinical applications.