1.
Nucleic Acids Res ; 52(D1): D938-D949, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38000386

ABSTRACT

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.


Subjects
Databases, Factual; Disease; Genes; Phenotype; Humans; Internet; Databases, Factual/standards; Software; Genes/genetics; Disease/genetics
2.
Hum Genomics ; 18(1): 44, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38685113

ABSTRACT

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.


Subjects
Rare Diseases; Humans; Rare Diseases/genetics; Rare Diseases/diagnosis; Genome, Human/genetics; Genetic Variation/genetics; Computational Biology/methods; Phenotype
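
The second metric named in the abstract above, the maximum F-measure over all EPCR values, can be illustrated with a short sketch. This is not the challenge's scoring code: the function name, the `(epcr, is_causal)` input layout, and the toy values are assumptions made for illustration, and the F-measure is taken here to be the usual F1 (harmonic mean of precision and recall).

```python
def max_f_measure(predictions, n_causal):
    """Return the best F1 over all EPCR cutoffs.

    predictions: list of (epcr, is_causal) pairs (hypothetical layout).
    n_causal: total number of true causal variants, including any
              that were never submitted by the model.
    """
    best = 0.0
    # Every distinct EPCR value is tried as a cutoff.
    for cutoff in sorted({epcr for epcr, _ in predictions}):
        called = [is_causal for epcr, is_causal in predictions if epcr >= cutoff]
        true_positives = sum(called)
        if true_positives == 0:
            continue  # F1 is zero at this cutoff
        precision = true_positives / len(called)
        recall = true_positives / n_causal
        f1 = 2 * precision * recall / (precision + recall)
        best = max(best, f1)
    return best


# Toy submission: four variants, two of which are truly causal.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.2, False)]
print(max_f_measure(toy, n_causal=2))
```

Sweeping the cutoff rather than fixing one means a model is not penalized for how its probabilities are calibrated, only for how well they order causal above non-causal variants.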
3.
Am J Hum Genet ; 108(9): 1564-1577, 2021 09 02.
Article in English | MEDLINE | ID: mdl-34289339

ABSTRACT

A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state-of-the-art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings.


Subjects
Algorithms; Data Curation/methods; Genetic Diseases, Inborn/genetics; RNA Splice Sites; RNA Splicing; Software; Base Sequence; Computational Biology/methods; Exome; Exons; Genetic Diseases, Inborn/diagnosis; Genetic Diseases, Inborn/pathology; High-Throughput Nucleotide Sequencing; Humans; Introns; Mutation; Exome Sequencing
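
The abstract above describes features built from the information content of wild-type and variant splice-site sequences. A minimal sketch of one common definition follows: the per-sequence ("individual") information, computed as the sum of 2 + log2 f(base, position) over the site. Whether SQUIRLS uses exactly this form is not stated in the abstract; the 3-nt frequency matrix, the function names, and the example sequences are all hypothetical.

```python
import math

# Hypothetical per-position base frequencies for a 3-nt toy site.
# Real donor/acceptor matrices are longer and estimated from annotated sites.
TOY_FREQS = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
]


def individual_information(seq, freqs):
    """Sum of 2 + log2(f(base, position)) over the site, in bits."""
    if len(seq) != len(freqs):
        raise ValueError("sequence and matrix lengths differ")
    return sum(2.0 + math.log2(col[base]) for base, col in zip(seq, freqs))


def delta_information(ref_seq, alt_seq, freqs):
    """Change in information content caused by a variant (alt minus ref)."""
    return individual_information(alt_seq, freqs) - individual_information(ref_seq, freqs)


# A G>A change at the strongly conserved middle position lowers the score.
print(individual_information("AGT", TOY_FREQS))
print(delta_information("AGT", "AAT", TOY_FREQS))
```

A strongly negative change at a canonical site, or a positive change at a nearby cryptic site, is the kind of interpretable signal such features are meant to expose to a classifier.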
4.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35595299

ABSTRACT

Yuan et al. recently described an independent evaluation of several phenotype-driven gene prioritization methods for Mendelian disease on two separate, clinical datasets. Although they attempted to use default settings for each tool, we describe three key differences from those we currently recommend for our Exomiser and PhenIX tools. These influence how variant frequency, quality and predicted pathogenicity are used for filtering and prioritization. We propose that these differences account for much of the discrepancy in performance between that reported by them (15-26% diagnoses ranked top by Exomiser) and previously published reports by us and others (72-77%). On a set of 161 singleton samples, we show using these settings increases performance from 34% to 72% and suggest a reassessment of Exomiser and PhenIX on their datasets using these would show a similar uplift.


Subjects
Genetic Diseases, Inborn; Phenotype; Computational Biology; Humans
5.
Prenat Diagn ; 44(4): 454-464, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38242839

ABSTRACT

Advances in sequencing and imaging technologies enable enhanced assessment in the prenatal space, with a goal to diagnose and predict the natural history of disease, to direct targeted therapies, and to implement clinical management, including transfer of care, election of supportive care, and selection of surgical interventions. The current lack of standardization and aggregation stymies variant interpretation and gene discovery, which hinders the provision of prenatal precision medicine, leaving clinicians and patients without an accurate diagnosis. With large amounts of data generated, it is imperative to establish standards for data collection, processing, and aggregation. Aggregated and homogeneously processed genetic and phenotypic data permits dissection of the genomic architecture of prenatal presentations of disease and provides a dataset on which data analysis algorithms can be tuned to the prenatal space. Here we discuss the importance of generating aggregate data sets and how the prenatal space is driving the development of interoperable standards and phenotype-driven tools.


Subjects
Precision Medicine; Prenatal Diagnosis; Pregnancy; Female; Humans; Phenotype; Genomics; Algorithms
6.
Am J Hum Genet ; 107(3): 403-417, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32755546

ABSTRACT

Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%-50% or less in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide an estimate of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.


Subjects
Computational Biology; Databases, Genetic; Genomics; Rare Diseases/diagnosis; Algorithms; Exome/genetics; Humans; Phenotype; Rare Diseases/genetics; Software
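
The likelihood ratio framework mentioned in the abstract above rests on a standard identity: posttest odds equal pretest odds multiplied by the likelihood ratios of the observed findings. The sketch below shows only that identity and is not LIRICAL's implementation. It assumes the findings are conditionally independent, and the function name and toy numbers are illustrative.

```python
import math


def posttest_probability(pretest_prob, likelihood_ratios):
    """Combine a pretest probability with independent, positive likelihood ratios."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must be strictly between 0 and 1")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    # Work in log space so that many small or large LRs do not under/overflow.
    log_odds = math.log(pretest_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    # Numerically stable conversion from log-odds back to a probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Toy values: a rare candidate disease, two phenotype LRs and one genotype LR.
print(posttest_probability(0.001, [50.0, 20.0, 10.0]))
```

Because each finding contributes its own factor, the individual ratios show how much any single phenotype or genotype moved the result, which is the interpretability the abstract emphasizes.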
7.
Hum Mutat ; 43(8): 1071-1081, 2022 08.
Article in English | MEDLINE | ID: mdl-35391505

ABSTRACT

Rare disease diagnostics and disease gene discovery have been revolutionized by whole-exome and genome sequencing but identifying the causative variant(s) from the millions in each individual remains challenging. The use of deep phenotyping of patients and reference genotype-phenotype knowledge, alongside variant data such as allele frequency, segregation, and predicted pathogenicity, has proved an effective strategy to tackle this issue. Here we review the numerous tools that have been developed to automate this approach and demonstrate the power of such an approach on several thousand diagnosed cases from the 100,000 Genomes Project. Finally, we discuss the challenges that need to be overcome if we are going to improve detection rates and help the majority of patients that still remain without a molecular diagnosis after state-of-the-art genomic interpretation.


Subjects
Exome; Rare Diseases; Exome/genetics; Genomics; Humans; Phenotype; Rare Diseases/diagnosis; Rare Diseases/genetics; Exome Sequencing
8.
Hum Mutat ; 43(6): 717-733, 2022 06.
Article in English | MEDLINE | ID: mdl-35178824

ABSTRACT

Rare disease patients are more likely to receive a rapid molecular diagnosis nowadays thanks to the wide adoption of next-generation sequencing. However, many cases remain undiagnosed even after exome or genome analysis, because the methods used missed the molecular cause in a known gene, or a novel causative gene could not be identified and/or confirmed. To address these challenges, the RD-Connect Genome-Phenome Analysis Platform (GPAP) facilitates the collation, discovery, sharing, and analysis of standardized genome-phenome data within a collaborative environment. Authorized clinicians and researchers submit pseudonymised phenotypic profiles encoded using the Human Phenotype Ontology, and raw genomic data which is processed through a standardized pipeline. After an optional embargo period, the data are shared with other platform users, with the objective that similar cases in the system and queries from peers may help diagnose the case. Additionally, the platform enables bidirectional discovery of similar cases in other databases from the Matchmaker Exchange network. To facilitate genome-phenome analysis and interpretation by clinical researchers, the RD-Connect GPAP provides a powerful user-friendly interface and leverages tens of information sources. As a result, the resource has already helped diagnose hundreds of rare disease patients and discover new disease causing genes.


Subjects
Genomics; Rare Diseases; Exome; Genetic Association Studies; Genomics/methods; Humans; Phenotype; Rare Diseases/diagnosis; Rare Diseases/genetics
9.
Genet Med ; 24(7): 1512-1522, 2022 07.
Article in English | MEDLINE | ID: mdl-35442193

ABSTRACT

PURPOSE: Genomic test results, regardless of laboratory variant classification, require clinical practitioners to judge the applicability of a variant for medical decisions. Teaching and standardizing clinical interpretation of genomic variation calls for a methodology or tool. METHODS: To generate such a tool, we distilled the Clinical Genome Resource framework of causality and the American College of Medical Genetics/Association of Molecular Pathology and Quest Diagnostic Laboratory scoring of variant deleteriousness into the Clinical Variant Analysis Tool (CVAT). Applying this to 289 clinical exome reports, we compared the performance of junior practitioners with that of experienced medical geneticists and assessed the utility of reported variants. RESULTS: CVAT enabled performance comparable to that of experienced medical geneticists. In total, 124 of 289 (42.9%) exome reports and 146 of 382 (38.2%) reported variants supported a diagnosis. Overall, 10.5% (1 pathogenic [P] or likely pathogenic [LP] variant and 39 variants of uncertain significance [VUS]) of variants were reported in genes without established disease association; 20.2% (23 P/LP and 54 VUS) were in genes without sufficient phenotypic concordance; 7.3% (15 P/LP and 13 VUS) conflicted with the known molecular disease mechanism; and 24% (91 VUS) had insufficient evidence for deleteriousness. CONCLUSION: Implementation of CVAT standardized clinical interpretation of genomic variation and emphasized the need for collaborative and transparent reporting of genomic variation.


Subjects
Genetic Testing; Genetic Variation; Exome; Genetic Testing/methods; Genetic Variation/genetics; Genomics/methods; Humans; Exome Sequencing
10.
Nucleic Acids Res ; 48(D1): D704-D715, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31701156

ABSTRACT

In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regard to data (new organisms, more sources, better modeling); new API and standards; ontologies (new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.


Subjects
Computational Biology/methods; Genotype; Phenotype; Algorithms; Animals; Biological Ontologies; Databases, Genetic; Exome; Genetic Association Studies; Genetic Variation; Genomics; Humans; Internet; Software; Translational Research, Biomedical; User-Computer Interface
11.
Nucleic Acids Res ; 47(D1): D1018-D1027, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30476213

ABSTRACT

The Human Phenotype Ontology (HPO), a standardized vocabulary of phenotypic abnormalities associated with 7000+ diseases, is used by thousands of researchers, clinicians, informaticians and electronic health record systems around the world. Its detailed descriptions of clinical abnormalities and computable disease definitions have made HPO the de facto standard for deep phenotyping in the field of rare disease. The HPO's interoperability with other ontologies has enabled it to be used to improve diagnostic accuracy by incorporating model organism data. It also plays a key role in the popular Exomiser tool, which identifies potential disease-causing variants from whole-exome or whole-genome sequencing data. Since the HPO was first introduced in 2008, its users have become both more numerous and more diverse. To meet these emerging needs, the project has added new content, language translations, mappings and computational tooling, as well as integrations with external community data. The HPO continues to collaborate with clinical adopters to improve specific areas of the ontology and extend standardized disease descriptions. The newly redesigned HPO website (www.human-phenotype-ontology.org) simplifies browsing terms and exploring clinical features, diseases, and human genes.


Subjects
Biological Ontologies; Computational Biology/methods; Congenital Abnormalities/genetics; Genetic Predisposition to Disease/genetics; Knowledge Bases; Rare Diseases/genetics; Congenital Abnormalities/diagnosis; Databases, Genetic; Genetic Variation; Humans; Internet; Phenotype; Rare Diseases/diagnosis; Whole Genome Sequencing/methods
13.
Am J Hum Genet ; 99(3): 595-606, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27569544

ABSTRACT

The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants with specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated with specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease.


Subjects
Algorithms; Genetic Diseases, Inborn/genetics; Genome, Human/genetics; Mutation/genetics; Gene Frequency; Genome-Wide Association Study; Humans; Machine Learning; Open Reading Frames/genetics; Phenotype; Point Mutation/genetics
14.
Nucleic Acids Res ; 45(D1): D712-D722, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899636

ABSTRACT

The correlation of phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede our progress: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype-phenotype associations. Non-human organisms have proven instrumental in revealing biological mechanisms. Advanced informatics tools can identify phenotypically relevant disease models in research and diagnostic contexts. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype-phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species.


Subjects
Databases, Genetic; Genetic Association Studies/methods; Genotype; Phenotype; Animals; Biological Evolution; Computational Biology/methods; Data Curation; Humans; Search Engine; Software; Species Specificity; User-Computer Interface; Web Browser
15.
Hum Mutat ; 39(12): 1827-1834, 2018 12.
Article in English | MEDLINE | ID: mdl-30240502

ABSTRACT

Rare disease investigators constantly face challenges in identifying additional cases to build evidence for gene-disease causality. The Matchmaker Exchange (MME) addresses this limitation by providing a mechanism for matching patients across genomic centers via a federated network. The MME has revolutionized searching for additional cases by making it possible to query across institutional boundaries, so that what was once a laborious and manual process of contacting researchers is now automated and computable. However, while the MME network is beginning to scale, the growth of additional nodes is limited by the lack of easy-to-use solutions that can be implemented by any rare disease database owner, even one without significant software engineering resources. Here, we describe matchbox, which is an open-source, platform-independent, portable bridge between any given rare disease genomic center and the MME network, which has already led to novel gene discoveries. We also describe how matchbox greatly reduces the barrier to participation by overcoming challenges for new databases to join the MME.


Subjects
Information Storage and Retrieval/methods; Patient Selection; Rare Diseases/genetics; Access to Information; Genetic Association Studies; Genetic Predisposition to Disease; Humans; Information Dissemination/methods; Phenotype; Software; Web Browser
16.
Bioinformatics ; 33(15): 2421-2423, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28334266

ABSTRACT

SUMMARY: Phenopolis is an open-source web server providing an intuitive interface to genetic and phenotypic databases. It integrates analysis tools such as variant filtering and gene prioritization based on phenotype. The Phenopolis platform will accelerate clinical diagnosis and gene discovery, and encourage wider adoption of the Human Phenotype Ontology in the study of rare genetic diseases. AVAILABILITY AND IMPLEMENTATION: A demo of the website is available at https://phenopolis.github.io. To install a local copy, source code and installation instructions are available at https://github.com/phenopolis. The software is implemented using Python, MongoDB, HTML/JavaScript and various bash shell scripts. CONTACT: n.pontikos@ucl.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods; Genetic Diseases, Inborn/genetics; Phenotype; Software; Databases, Factual; Genetic Diseases, Inborn/diagnosis; Genetic Diseases, Inborn/pathology; Humans; Rare Diseases/diagnosis; Rare Diseases/genetics; Rare Diseases/pathology
17.
Nucleic Acids Res ; 42(Database issue): D485-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24319146

ABSTRACT

Understanding which are the catalytic residues in an enzyme and what function they perform is crucial to many biology studies, particularly those leading to new therapeutics and enzyme design. The original version of the Catalytic Site Atlas (CSA) (http://www.ebi.ac.uk/thornton-srv/databases/CSA) published in 2004, which catalogs the residues involved in enzyme catalysis in experimentally determined protein structures, had only 177 curated entries and employed a simplistic approach to expanding these annotations to homologous enzyme structures. Here we present a new version of the CSA (CSA 2.0), which greatly expands the number of both curated (968) and automatically annotated catalytic sites in enzyme structures, utilizing a new method for annotation transfer. The curated entries are used, along with the variation in residue type from the sequence comparison, to generate 3D templates of the catalytic sites, which in turn can be used to find catalytic sites in new structures. To ease the transfer of CSA annotations to other resources a new ontology has been developed: the Enzyme Mechanism Ontology, which has permitted the transfer of annotations to Mechanism, Annotation and Classification in Enzymes (MACiE) and UniProt Knowledge Base (UniProtKB) resources. The CSA database schema has been re-designed and both the CSA data and search capabilities are presented in a new modern web interface.


Subjects
Catalytic Domain; Databases, Protein; Enzymes/chemistry; Biological Ontologies; Internet; Sequence Analysis, Protein
18.
Mamm Genome ; 26(9-10): 413-21, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26314589

ABSTRACT

The International Mouse Phenotyping Consortium (IMPC) is providing the world's first functional catalogue of a mammalian genome by characterising a knockout mouse strain for every gene. A robust and highly structured informatics platform has been developed to systematically collate, analyse and disseminate the data produced by the IMPC. As the first phase of the project, in which 5000 new knockout strains are being broadly phenotyped, nears completion, the informatics platform is extending and adapting to support the increasing volume and complexity of the data produced as well as addressing a large volume of users and emerging user groups. An intuitive interface helps researchers explore IMPC data by giving overviews and the ability to find and visualise data that support a phenotype assertion. Dedicated disease pages allow researchers to find new mouse models of human diseases, and novel viewers provide high-resolution images of embryonic and adult dysmorphologies. With each monthly release, the informatics platform will continue to evolve to support the increased data volume and to maintain its position as the primary route of access to IMPC data and as an invaluable resource for clinical and non-clinical researchers.


Subjects
Computational Biology; Genome; Mice, Inbred Strains/genetics; Mice, Knockout/genetics; Animals; Humans; Mice; Phenotype
19.
medRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38854034

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.

20.
PLoS One ; 18(5): e0285433, 2023.
Article in English | MEDLINE | ID: mdl-37196000

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, comprehensive user guide and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.


Subjects
Neoplasms; Software; Humans; Genomics; Databases, Factual; Gene Library
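
To make the data model in the abstract above concrete, the following sketch assembles a minimal phenopacket as JSON using plain Python. It does not use phenopacket-tools, which is a Java library and command-line application. The field names follow the JSON form of version 2 of the GA4GH Phenopacket Schema as best understood here; the identifiers, timestamp, and ontology version are placeholders, and `missing_resources` is a hypothetical helper written in the spirit of a metadata check, not the library's validator.

```python
import json


def build_phenopacket(packet_id, subject_id, hpo_terms):
    """Assemble a minimal phenopacket-like dict from (HPO id, label) pairs."""
    return {
        "id": packet_id,
        "subject": {"id": subject_id},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}} for term_id, label in hpo_terms
        ],
        "metaData": {
            "created": "2024-01-01T00:00:00Z",  # placeholder timestamp
            "createdBy": "example-curator",      # placeholder author
            "resources": [
                {
                    "id": "hp",
                    "name": "human phenotype ontology",
                    "url": "http://purl.obolibrary.org/obo/hp.owl",
                    "version": "PLACEHOLDER",    # set to the HPO release actually used
                    "namespacePrefix": "HP",
                    "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
                }
            ],
            "phenopacketSchemaVersion": "2.0",
        },
    }


def missing_resources(packet):
    """Return ontology prefixes used in features but not declared in metaData."""
    used = {f["type"]["id"].split(":")[0] for f in packet["phenotypicFeatures"]}
    declared = {r["namespacePrefix"] for r in packet["metaData"]["resources"]}
    return used - declared


packet = build_phenopacket("example-1", "patient-1", [("HP:0001250", "Seizure")])
print(json.dumps(packet, indent=2))
print(missing_resources(packet))
```

For real data, the builders and validators shipped with phenopacket-tools are the better route, since they check the syntax and semantics described in the abstract rather than this single rule.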