ABSTRACT
The Drug-Gene Interaction Database (DGIdb, https://dgidb.org) is a publicly accessible resource that aggregates genes or gene products, drugs and drug-gene interaction records to drive hypothesis generation and discovery for clinicians and researchers. DGIdb 5.0 is the latest release and includes substantial architectural and functional updates to support integration into clinical and drug discovery pipelines. The DGIdb service architecture has been split into separate client and server applications, enabling consistent data access for users of both the application programming interface (API) and web interface. The new interface was developed in ReactJS, and includes dynamic visualizations and consistency in the display of user interface elements. A GraphQL API has been added to support customizable queries for all drugs, genes, annotations and associated data. Updated documentation provides users with example queries and detailed usage instructions for these new features. In addition, six sources have been added and many existing sources have been updated. Newly added sources include ChemIDplus, HemOnc, NCIt (National Cancer Institute Thesaurus), Drugs@FDA, HGNC (HUGO Gene Nomenclature Committee) and RxNorm. These new sources have been incorporated into DGIdb to provide additional records and enhance annotations of regulatory approval status for therapeutics. Methods for grouping drugs and genes have been expanded upon and developed as independent modular normalizers during import. The updates to these sources and grouping methods have resulted in an improvement in FAIR (findability, accessibility, interoperability and reusability) data representation in DGIdb.
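As a rough illustration of the "customizable queries" the GraphQL API is described as supporting, the sketch below builds a request that asks for the drugs interacting with a list of genes. The endpoint path and every field name in the query (`genes`, `names`, `nodes`, `interactions`, `drug`, `conceptId`, `interactionScore`) are assumptions made for illustration, not taken from the abstract; they should be checked against the DGIdb documentation before use.

```python
import json
import urllib.request

# Assumed endpoint path (not stated in the abstract); verify against the DGIdb docs.
DGIDB_GRAPHQL_URL = "https://dgidb.org/api/graphql"

# Assumed schema: field names below are illustrative and may differ from the real API.
QUERY_TEMPLATE = """
{
  genes(names: __NAMES__) {
    nodes {
      name
      interactions {
        drug { name conceptId }
        interactionScore
      }
    }
  }
}
"""


def build_payload(gene_names):
    """Build the JSON body of a GraphQL POST request for the given gene symbols."""
    # A JSON list of strings is also a valid GraphQL list-of-strings literal.
    names_literal = json.dumps(list(gene_names))
    return {"query": QUERY_TEMPLATE.replace("__NAMES__", names_literal)}


def run_query(payload, url=DGIDB_GRAPHQL_URL, timeout=30):
    """POST the payload to the GraphQL endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires network access):
#   result = run_query(build_payload(["BRAF"]))
```

Keeping payload construction separate from the network call means the query text can be inspected or logged before anything is sent.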
Subjects
Precision Medicine , Humans , Databases, Pharmaceutical , Drug Discovery , Internet , User-Computer Interface , Vocabulary, Controlled
ABSTRACT
The complexity of diagnostic (surgical) pathology has increased substantially over the last decades with respect to histomorphological and molecular profiling. Pathology has steadily expanded its role in tumor diagnostics and beyond from disease entity identification via prognosis estimation to precision therapy prediction. It is therefore not surprising that pathology is among the disciplines in medicine with high expectations in the application of artificial intelligence (AI) or machine learning approaches given their capabilities to analyze complex data in a quantitative and standardized manner to further enhance scope and precision of diagnostics. While an obvious application is the analysis of histological images, recent applications for the analysis of molecular profiling data from different sources and clinical data support the notion that AI will enhance both histopathology and molecular pathology in the future. At the same time, current literature should not be misunderstood in a way that pathologists will likely be replaced by AI applications in the foreseeable future. Although AI will transform pathology in the coming years, recent studies reporting AI algorithms to diagnose cancer or predict certain molecular properties deal with relatively simple diagnostic problems that fall short of the diagnostic complexity pathologists face in clinical routine. Here, we review the pertinent literature of AI methods and their applications to pathology, and put the current achievements and what can be expected in the future in the context of the requirements for research and routine diagnostics.
Subjects
Artificial Intelligence , Neoplasms , Humans , Machine Learning , Neoplasms/diagnosis , Neoplasms/genetics , Prognosis
ABSTRACT
MOTIVATION: Despite the increasing evidence of utility of genomic medicine in clinical practice, systematically integrating genomic medicine information and knowledge into clinical systems with a high level of consistency, scalability and computability remains challenging. A comprehensive terminology is required for relevant concepts and the associated knowledge model for representing relationships. In this study, we leveraged PharmGKB, a comprehensive pharmacogenomics (PGx) knowledgebase, to formulate a terminology for drug response phenotypes that can represent relationships between genetic variants and treatments. We evaluated coverage of the terminology through manual review of a randomly selected subset of 200 sentences extracted from genetic reports that contained concepts for 'Genes and Gene Products' and 'Treatments'. RESULTS: Results showed that our proposed drug response phenotype terminology could cover 96% of the drug response phenotypes in genetic reports. Among 18 653 sentences that contained both 'Genes and Gene Products' and 'Treatments', 3011 sentences could be mapped to a drug response phenotype in our proposed terminology, among which the most discussed drug response phenotypes were response (994), sensitivity (829) and survival (332). In addition, we were able to re-analyze genetic report context incorporating the proposed terminology and enrich our previously proposed PGx knowledge model to reveal relationships between genetic variants and treatments. In conclusion, we proposed a drug response phenotype terminology that enhanced structured knowledge representation of genomic medicine. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genomic Medicine , Pharmacogenetics , Pharmacogenetics/methods , Knowledge Bases , Phenotype
ABSTRACT
The Drug-Gene Interaction Database (DGIdb, www.dgidb.org) is a web resource that provides information on drug-gene interactions and druggable genes from publications, databases, and other web-based sources. Drug, gene, and interaction data are normalized and merged into conceptual groups. The information contained in this resource is available to users through a straightforward search interface, an application programming interface (API), and TSV data downloads. DGIdb 4.0 is the latest major version release of this database. A primary focus of this update was integration with crowdsourced efforts, leveraging the Drug Target Commons for community-contributed interaction data, Wikidata to facilitate term normalization, and export to NDEx for drug-gene interaction network representations. Seven new sources have been added since the last major version release, bringing the total number of sources included to 41. Of the previously aggregated sources, 15 have been updated. DGIdb 4.0 also includes improvements to the process of drug normalization and grouping of imported sources. Other notable updates include the introduction of a more sophisticated Query Score for interaction search results, an updated Interaction Score, the inclusion of interaction directionality, and several additional improvements to search features, data releases, licensing documentation and the application framework.
Subjects
Crowdsourcing , Databases, Factual , Databases, Genetic , Drugs, Investigational/pharmacology , Genome, Human/drug effects , Prescription Drugs/pharmacology , Databases, Chemical , Drugs, Investigational/chemistry , Genotype , Humans , Internet , Knowledge Bases , Phenotype , Prescription Drugs/chemistry , Software
ABSTRACT
BACKGROUND: Cholangiocarcinoma (CCA) is a primary malignancy of the biliary tract with a dismal prognosis. Recently, several actionable genetic aberrations were identified with significant enrichment in intrahepatic CCA, including FGFR2 gene fusions with a prevalence of 10-15%. Recent clinical data demonstrate that these fusions are druggable in a second-line setting in advanced/metastatic disease and the efficacy in earlier lines of therapy is being evaluated in ongoing clinical trials. This scenario warrants standardised molecular profiling of these tumours. METHODS: A detailed analysis of the original genetic data from the FIGHT-202 trial, on which the approval of Pemigatinib was based, was conducted. RESULTS: Comparing different detection approaches and displaying representative cases, we described the genetic landscape and architecture of FGFR2 fusions in iCCA and show biological and technical aspects to be considered for their detection. We elaborated parameters, including a suggestion for annotation, that should be stated in a molecular diagnostic FGFR2 report to allow a complete understanding of the analysis performed and the information provided. CONCLUSION: This study provides a detailed presentation and dissection of the technical and biological aspects regarding FGFR2 fusion detection, which aims to support molecular pathologists, pathologists and clinicians in diagnostics, reporting of the results and decision-making.
Subjects
Bile Duct Neoplasms , Cholangiocarcinoma , Bile Duct Neoplasms/drug therapy , Bile Ducts, Intrahepatic/pathology , Cholangiocarcinoma/drug therapy , Genomics , Humans , Molecular Diagnostic Techniques , Receptor, Fibroblast Growth Factor, Type 2/genetics
ABSTRACT
PURPOSE: Several professional societies have published guidelines for the clinical interpretation of somatic variants, which specifically address diagnostic, prognostic, and therapeutic implications. Although these guidelines for the clinical interpretation of variants include data types that may be used to determine the oncogenicity of a variant (eg, population frequency, functional, and in silico data or somatic frequency), they do not provide a direct, systematic, and comprehensive set of standards and rules to classify the oncogenicity of a somatic variant. This insufficient guidance leads to inconsistent classification of rare somatic variants in cancer, generates variability in their clinical interpretation, and, importantly, affects patient care. Therefore, it is essential to address this unmet need. METHODS: Clinical Genome Resource (ClinGen) Somatic Cancer Clinical Domain Working Group and ClinGen Germline/Somatic Variant Subcommittee, the Cancer Genomics Consortium, and the Variant Interpretation for Cancer Consortium used a consensus approach to develop a standard operating procedure (SOP) for the classification of oncogenicity of somatic variants. RESULTS: This comprehensive SOP has been developed to improve consistency in somatic variant classification and has been validated on 94 somatic variants in 10 common cancer-related genes. CONCLUSION: The comprehensive SOP is now available for classification of oncogenicity of somatic variants.
Subjects
Genome, Human , Neoplasms , Genetic Testing/methods , Genetic Variation/genetics , Genome, Human/genetics , Genomics/methods , Humans , Neoplasms/genetics , Virulence
ABSTRACT
BACKGROUND: Pediatric cancers typically have a distinct genomic landscape when compared to adult cancers and frequently carry somatic gene fusion events that alter gene expression and drive tumorigenesis. Sensitive and specific detection of gene fusions through the analysis of next-generation-based RNA sequencing (RNA-Seq) data is computationally challenging and may be confounded by low tumor cellularity or underlying genomic complexity. Furthermore, numerous computational tools are available to identify fusions from supporting RNA-Seq reads, yet each algorithm demonstrates unique variability in sensitivity and precision, and no clearly superior approach currently exists. To overcome these challenges, we have developed an ensemble fusion calling approach to increase the accuracy of identifying fusions. RESULTS: Our Ensemble Fusion (EnFusion) approach utilizes seven fusion calling algorithms: Arriba, CICERO, FusionMap, FusionCatcher, JAFFA, MapSplice, and STAR-Fusion, which are packaged as a fully automated pipeline using Docker and Amazon Web Services (AWS) serverless technology. This method uses paired end RNA-Seq sequence reads as input, and the output from each algorithm is examined to identify fusions detected by a consensus of at least three algorithms. These consensus fusion results are filtered by comparison to an internal database to remove likely artifactual fusions occurring at high frequencies in our internal cohort, while a "known fusion list" prevents failure to report known pathogenic events. We have employed the EnFusion pipeline on RNA-Seq data from 229 patients with pediatric cancer or blood disorders studied under an IRB-approved protocol. The samples consist of 138 central nervous system tumors, 73 solid tumors, and 18 hematologic malignancies or disorders. 
The combination of an ensemble fusion-calling pipeline and a knowledge-based filtering strategy identified 67 clinically relevant fusions among our cohort (diagnostic yield of 29.3%), including RBPMS-MET, BCAN-NTRK1, and TRIM22-BRAF fusions. Following clinical confirmation and reporting in the patient's medical record, both known and novel fusions provided medically meaningful information. CONCLUSIONS: The EnFusion pipeline offers a streamlined approach to discover fusions in cancer, at higher levels of sensitivity and accuracy than single algorithm methods. Furthermore, this method accurately identifies driver fusions in pediatric cancer, providing clinical impact by contributing evidence to diagnosis and, when appropriate, indicating targeted therapies.
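The consensus-and-filter logic described in this abstract can be sketched as follows. This is a minimal illustration, not the EnFusion implementation: the function name, the data structures and the toy fusion names are hypothetical, and it assumes that the "known fusion list" exempts a fusion from the artifact filter only (the abstract does not say whether it also bypasses the three-caller threshold).

```python
from collections import Counter


def consensus_fusions(calls_by_caller, min_callers=3,
                      artifact_list=frozenset(), known_list=frozenset()):
    """Return {fusion: number of supporting callers} for fusions that pass filtering.

    calls_by_caller maps a caller name to the fusions it reported.
    """
    # Count each fusion at most once per caller.
    support = Counter()
    for fusions in calls_by_caller.values():
        support.update(set(fusions))

    kept = {}
    for fusion, n_callers in support.items():
        if n_callers < min_callers:
            continue  # not a consensus call
        # Assumption: known fusions are rescued from the artifact filter only.
        if fusion in artifact_list and fusion not in known_list:
            continue  # likely recurrent artifact in the internal cohort
        kept[fusion] = n_callers
    return kept


# Toy input with hypothetical fusion names; the caller names come from the abstract.
calls = {
    "Arriba": {"GENEA-GENEB", "GENEC-GENED", "GENEE-GENEF"},
    "STAR-Fusion": {"GENEA-GENEB", "GENEC-GENED", "GENEE-GENEF"},
    "FusionCatcher": {"GENEA-GENEB", "GENEC-GENED", "GENEE-GENEF"},
    "JAFFA": {"GENEG-GENEH"},
}
artifacts = {"GENEC-GENED", "GENEE-GENEF"}
known = {"GENEE-GENEF"}

result = consensus_fusions(calls, min_callers=3,
                           artifact_list=artifacts, known_list=known)
print(sorted(result.items()))
```

In this toy run, GENEG-GENEH is dropped for lack of consensus, GENEC-GENED is dropped as a recurrent artifact, and GENEE-GENEF survives the artifact filter because it is on the known list.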
Subjects
Genome , Neoplasms , Child , Genomics , Humans , Neoplasms/genetics , Sequence Analysis, DNA , Sequence Analysis, RNA
ABSTRACT
The drug-gene interaction database (DGIdb, www.dgidb.org) consolidates, organizes and presents drug-gene interactions and gene druggability information from papers, databases and web resources. DGIdb normalizes content from 30 disparate sources and allows for user-friendly advanced browsing, searching and filtering for ease of access through an intuitive web user interface, application programming interface (API) and public cloud-based server image. DGIdb v3.0 represents a major update of the database. Nine of the previously included 24 sources were updated. Six new resources were added, bringing the total number of sources to 30. These updates and additions of sources have cumulatively resulted in 56 309 interaction claims. This has also substantially expanded the comprehensive catalogue of druggable genes and anti-neoplastic drug-gene interactions included in the DGIdb. Along with these content updates, v3.0 has received a major overhaul of its codebase, including an updated user interface, preset interaction search filters, consolidation of interaction information into interaction groups, greatly improved search response times and upgrading the underlying web application framework. In addition, the expanded API features new endpoints which allow users to extract more detailed information about queried drugs, genes and drug-gene interactions, including listings of PubMed IDs, interaction type and other interaction metadata.
Subjects
Databases, Pharmaceutical , Genes/drug effects , Antineoplastic Agents , User-Computer Interface
ABSTRACT
PURPOSE: Following automated variant calling, manual review of aligned read sequences is required to identify a high-quality list of somatic variants. Despite widespread use in analyzing sequence data, methods to standardize manual review have not been described, resulting in high inter- and intralab variability. METHODS: This manual review standard operating procedure (SOP) consists of methods to annotate variants with four different calls and 19 tags. The calls indicate a reviewer's confidence in each variant and the tags indicate commonly observed sequencing patterns and artifacts that inform the manual review call. Four individuals were asked to classify variants prior to, and after, reading the SOP and accuracy was assessed by comparing reviewer calls with orthogonal validation sequencing. RESULTS: After reading the SOP, average accuracy in somatic variant identification increased by 16.7% (p value = 0.0298) and average interreviewer agreement increased by 12.7% (p value < 0.001). Manual review conducted after reading the SOP did not significantly increase reviewer time. CONCLUSION: This SOP supports and enhances manual somatic variant detection by improving reviewer accuracy while reducing the interreviewer variability for variant calling and annotation.
Subjects
High-Throughput Nucleotide Sequencing/standards , Mutation/genetics , Neoplasms/genetics , Software , Algorithms , Humans , Neoplasms/pathology , Polymorphism, Single Nucleotide/genetics , Sequence Alignment
ABSTRACT
Harmonization of cancer variant representation, efficient communication, and free distribution of clinical variant-associated knowledge are central problems that arise with increased usage of clinical next-generation sequencing. The Clinical Genome Resource (ClinGen) Somatic Working Group (WG) developed a minimal variant level data (MVLD) representation of cancer variants, and has an ongoing collaboration with Clinical Interpretations of Variants in Cancer (CIViC), an open-source platform supporting crowdsourced and expert-moderated cancer variant curation. Harmonization between MVLD and CIViC variant formats was assessed by formal field-by-field analysis. Adjustments to the CIViC format were made to harmonize with MVLD and support ClinGen Somatic WG curation activities, including four new features in CIViC: (1) introduction of an assertions feature for clinical variant assessment following the Association for Molecular Pathology (AMP) guidelines, (2) group-level curation tracking for organizations, enabling member transparency and curation effort summaries, (3) introduction of ClinGen Allele Registry IDs to CIViC, and (4) mapping of CIViC assertions into ClinVar submission with automated submissions. A generalizable workflow utilizing MVLD and new CIViC features is outlined for use by ClinGen Somatic WG task teams for curation and submission to ClinVar, and provides a model for promoting harmonization of cancer variant representation and efficient distribution of this information.
Subjects
Genome, Human/genetics , Neoplasms/genetics , Databases, Genetic , Genetic Testing , Genetic Variation/genetics , Genomics , High-Throughput Nucleotide Sequencing , Humans , Software
ABSTRACT
The Drug-Gene Interaction Database (DGIdb, www.dgidb.org) is a web resource that consolidates disparate data sources describing drug-gene interactions and gene druggability. It provides an intuitive graphical user interface and a documented application programming interface (API) for querying these data. DGIdb was assembled through an extensive manual curation effort, reflecting the combined information of twenty-seven sources. For DGIdb 2.0, substantial updates have been made to increase content and improve its usefulness as a resource for mining clinically actionable drug targets. Specifically, nine new sources of drug-gene interactions have been added, including seven resources specifically focused on interactions linked to clinical trials. These additions have more than doubled the overall count of drug-gene interactions. The total number of druggable gene claims has also increased by 30%. Importantly, a majority of the unrestricted, publicly-accessible sources used in DGIdb are now automatically updated on a weekly basis, providing the most current information for these sources. Finally, a new web view and API have been developed to allow searching for interactions by drug identifiers to complement existing gene-based search functionality. With these updates, DGIdb represents a comprehensive and user friendly tool for mining the druggable genome for precision medicine hypothesis generation.
Subjects
Databases, Pharmaceutical , Drug Discovery , Genes/drug effects , Data Mining , Ligands
ABSTRACT
UNLABELLED: Visualizing and summarizing data from genomic studies continues to be a challenge. Here, we introduce the GenVisR package to address this challenge by providing highly customizable, publication-quality graphics focused on cohort level genome analyses. GenVisR provides a rapid and easy-to-use suite of genomic visualization tools, while maintaining a high degree of flexibility by leveraging the abilities of ggplot2 and Bioconductor. AVAILABILITY AND IMPLEMENTATION: GenVisR is an R package available via Bioconductor (https://bioconductor.org/packages/GenVisR) under GPLv3. Support is available via GitHub (https://github.com/griffithlab/GenVisR/issues) and the Bioconductor support website. CONTACTS: obigriffith@wustl.edu or mgriffit@wustl.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genomics , Software , Genome
ABSTRACT
Suicidal behavior imposes a tremendous cost, with current US estimates reporting approximately 1.3 million suicide attempts and more than 40,000 suicide deaths each year. Several recent research efforts have identified an association between suicidal behavior and the expression level of the spermidine/spermine N1-acetyltransferase 1 (SAT1) gene. To date, several SAT1 genetic variants have been inconsistently associated with altered gene expression and/or directly with suicidal behavior. To clarify the role SAT1 genetic variation plays in suicidal behavior risk, we present a whole-gene sequencing effort of SAT1 in 476 bipolar disorder subjects with a history of suicide attempt and 473 subjects with bipolar disorder but no suicide attempts. Agilent SureSelect target enrichment was used to sequence all exons, introns, promoter regions, and putative regulatory regions identified from the ENCODE project within 10 kb of SAT1. Individual variant, haplotype, and collapsing variant tests were performed. Our results identified no variant or assessed region of SAT1 that showed a significant association with attempted suicide, nor did any assessment show evidence for replication of previously reported associations. Overall, no evidence for SAT1 sequence variation contributing to the risk for attempted suicide could be identified. It is possible that past associations of SAT1 expression with suicidal behavior arise from variation not captured in this study, or that causal variants in the region are too rare to be detected within our sample. Larger sample sizes and broader sequencing efforts will likely be required to identify the source of SAT1 expression level associations with suicidal behavior. © 2016 Wiley Periodicals, Inc.
Subjects
Acetyltransferases/genetics , Suicide, Attempted/psychology , Acetyltransferases/metabolism , Acetyltransferases/physiology , Adult , Bipolar Disorder/genetics , Female , Gene Expression Regulation , Genetic Predisposition to Disease , Genetic Variation/genetics , Haplotypes/genetics , Humans , Male , Risk Factors , Sequence Analysis, DNA , Suicidal Ideation , Suicide/psychology
ABSTRACT
Mutations in ABCA4 cause Stargardt disease and other blinding autosomal recessive retinal disorders. However, sequencing of the complete coding sequence in patients with clinical features of Stargardt disease sometimes fails to detect one or both mutations. For example, among 208 individuals with clear clinical evidence of ABCA4 disease ascertained at a single institution, 28 had only one disease-causing allele identified in the exons and splice junctions of the primary retinal transcript of the gene. Haplotype analysis of these 28 probands revealed 3 haplotypes shared among ten families, suggesting that 18 of the 28 missing alleles were rare enough to be present only once in the cohort. We hypothesized that mutations near rare alternate splice junctions in ABCA4 might cause disease by increasing the probability of mis-splicing at these sites. Next-generation sequencing of RNA extracted from human donor eyes revealed more than a dozen alternate exons that are occasionally incorporated into the ABCA4 transcript in normal human retina. We sequenced the genomic DNA containing 15 of these minor exons in the 28 one-allele subjects and observed five instances of two different variations in the splice signals of exon 36.1 that were not present in normal individuals (P < 10^-6). Analysis of RNA obtained from the keratinocytes of patients with these mutations revealed the predicted alternate transcript. This study illustrates the utility of RNA sequence analysis of human donor tissue and patient-derived cell lines to identify mutations that would be undetectable by exome sequencing.
Subjects
ATP-Binding Cassette Transporters/genetics , Alternative Splicing/genetics , Retina/pathology , Adult , Aged, 80 and over , Alleles , Exome/genetics , Exons/genetics , Female , Haplotypes , Humans , Macular Degeneration/genetics , Macular Degeneration/physiopathology , Male , Mutation , Pedigree , RNA Splice Sites/genetics , Stargardt Disease
ABSTRACT
Proper spatial differentiation of retinal cell types is necessary for normal human vision. Many retinal diseases, such as Best disease and male germ cell associated kinase (MAK)-associated retinitis pigmentosa, preferentially affect distinct topographic regions of the retina. While much is known about the distribution of cell types in the retina, the distribution of molecular components across the posterior pole of the eye has not been well-studied. To investigate regional difference in molecular composition of ocular tissues, we assessed differential gene expression across the temporal, macular, and nasal retina and retinal pigment epithelium (RPE)/choroid of human eyes using RNA-Seq. RNA from temporal, macular, and nasal retina and RPE/choroid from four human donor eyes was extracted, poly-A selected, fragmented, and sequenced as 100 bp read pairs. Digital read files were mapped to the human genome and analyzed for differential expression using the Tuxedo software suite. Retina and RPE/choroid samples were clearly distinguishable at the transcriptome level. Numerous transcription factors were differentially expressed between regions of the retina and RPE/choroid. Photoreceptor-specific genes were enriched in the peripheral samples, while ganglion cell and amacrine cell genes were enriched in the macula. Within the RPE/choroid, RPE-specific genes were upregulated at the periphery while endothelium associated genes were upregulated in the macula. Consistent with previous studies, BEST1 expression was lower in macular than extramacular regions. The MAK gene was expressed at lower levels in macula than in extramacular regions, but did not exhibit a significant difference between nasal and temporal retina. The regional molecular distinction is greatest between macula and periphery and decreases between different peripheral regions within a tissue. 
Datasets such as these can be used to prioritize candidate genes for possible involvement in retinal diseases with regional phenotypes.
Subjects
Gene Expression Profiling , Macula Lutea/metabolism , Pigment Epithelium of Eye/metabolism , RNA, Messenger/genetics , Retinal Diseases/genetics , Aged , Aged, 80 and over , Choroid , Female , Humans , Macula Lutea/pathology , Male , Pigment Epithelium of Eye/pathology , Retinal Diseases/metabolism , Retinal Diseases/pathology
ABSTRACT
Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.
Subjects
Metadata , Research Design , Reproducibility of Results
ABSTRACT
The large-scale experimental measures of variant functional assays submitted to MaveDB have the potential to provide key information for resolving variants of uncertain significance, but the reporting of results relative to assayed sequence hinders their downstream utility. The Atlas of Variant Effects Alliance mapped multiplexed assays of variant effect data to human reference sequences, creating a robust set of machine-readable homology mappings. This method processed approximately 2.5 million protein and genomic variants in MaveDB, successfully mapping 98.61% of examined variants and disseminating data to resources such as the UCSC Genome Browser and Ensembl Variant Effect Predictor.
ABSTRACT
The discovery of novel disease-associated variations in genes is often a daunting task in highly heterogeneous disease classes. We seek a generalizable algorithm that integrates multiple publicly available genomic data sources in a machine-learning model for the prioritization of candidates identified in patients with retinal disease. To approach this problem, we generate a set of feature vectors from publicly available microarray, RNA-seq, and ChIP-seq datasets of biological relevance to retinal disease, to observe patterns in gene expression specificity among tissues of the body and the eye, in addition to photoreceptor-specific signals by the CRX transcription factor. Using these features, we describe a novel algorithm, positive and unlabeled learning for prioritization (PULP). This article compares several popular supervised learning techniques as the regression function for PULP. The results demonstrate a highly significant enrichment for previously characterized disease genes using a logistic regression method. Finally, a comparison of PULP with the popular gene prioritization tool ENDEAVOUR shows superior prioritization of retinal disease genes from previous studies. The java source code, compiled binary, assembled feature vectors, and instructions are available online at https://github.com/ahwagner/PULP.
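The general idea of positive and unlabeled learning with a logistic regression scorer can be sketched as below. This is a generic illustration under stated assumptions, not the PULP algorithm itself (whose Java source is at the repository above): known disease genes are treated as positives, all other genes as provisional negatives, and the unlabeled genes are then ranked by predicted score. The gene names, the two-dimensional feature vectors and the training settings are invented toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_unlabeled(positives, unlabeled):
    """Train positives (label 1) against unlabeled (label 0), then rank the unlabeled."""
    X = list(positives.values()) + list(unlabeled.values())
    y = [1] * len(positives) + [0] * len(unlabeled)
    w, b = fit_logistic(X, y)
    scores = {
        gene: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        for gene, x in unlabeled.items()
    }
    # Highest score first: unlabeled genes that most resemble the known positives.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: hypothetical genes with two made-up expression-specificity features.
positives = {"known_1": [0.9, 0.8], "known_2": [0.8, 0.9], "known_3": [0.85, 0.85]}
unlabeled = {
    "candidate_1": [0.88, 0.88],  # resembles the known disease genes
    "candidate_2": [0.1, 0.2],
    "candidate_3": [0.2, 0.1],
    "candidate_4": [0.15, 0.15],
    "candidate_5": [0.3, 0.3],
}

ranked = rank_unlabeled(positives, unlabeled)
print(ranked[0][0])
```

Treating unlabeled genes as negatives biases the absolute scores downward, but the ordering is what matters for prioritization.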