Results 1 - 20 of 58
1.
Lancet Glob Health ; 12(7): e1192-e1199, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38876765

ABSTRACT

Rare diseases affect over 300 million people worldwide and are gaining recognition as a global health priority. Their inclusion in the UN Sustainable Development Goals, the UN Resolution on Addressing the Challenges of Persons Living with a Rare Disease, and the anticipated WHO Global Network for Rare Diseases and WHO Resolution on Rare Diseases, which is yet to be announced, emphasise their significance. People with rare diseases often face unmet health needs, including access to screening, diagnosis, therapy, and comprehensive health care. These challenges highlight the need for awareness and targeted interventions, including comprehensive education, especially in primary care. The majority of rare disease research, clinical services, and health systems are addressed with specialist care. WHO Member States have committed to focusing on primary health care in both universal health coverage and health-related Sustainable Development Goals. Recognising this opportunity, the International Rare Diseases Research Consortium (IRDiRC) assembled a global, multistakeholder task force to identify key barriers and opportunities for empowering primary health-care providers in addressing rare disease challenges.


Subject(s)
Global Health , Primary Health Care , Rare Diseases , Humans , Health Services Accessibility , Primary Health Care/organization & administration , Rare Diseases/therapy , Rare Diseases/epidemiology , World Health Organization , Health Policy
2.
medRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38854034

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
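To make the idea of a phenopacket as an input file concrete, here is a minimal sketch in Python of a phenopacket-like JSON document. The top-level field names follow the published GA4GH Phenopacket Schema v2 as far as we know them; the helper `minimal_phenopacket`, the identifiers `example-1` and `patient-1`, the curator name, and the timestamp are all hypothetical placeholders, and the record is not taken from phenopacket-store.

```python
import json


def minimal_phenopacket(packet_id, subject_id, features):
    """Build a minimal phenopacket-like dict.

    `features` is a list of (ontology_id, label) tuples.
    Hypothetical helper for illustration; not part of any GA4GH library.
    """
    return {
        "id": packet_id,
        "subject": {"id": subject_id},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}}
            for term_id, label in features
        ],
        "metaData": {
            # Placeholder provenance values, assumed for the example.
            "created": "2024-01-01T00:00:00Z",
            "createdBy": "example-curator",
            "phenopacketSchemaVersion": "2.0",
        },
    }


packet = minimal_phenopacket(
    "example-1", "patient-1", [("HP:0001250", "Seizure")]
)
print(json.dumps(packet, indent=2))
```

A real phenopacket would normally also list the ontology resources it uses in `metaData` and would be checked with a schema validator before being shared.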

3.
Bioinformatics ; 40(7), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38913850

ABSTRACT

MOTIVATION: Human Phenotype Ontology (HPO)-based phenotype concept recognition (CR) underpins a faster and more effective mechanism to create patient phenotype profiles or to document novel phenotype-centred knowledge statements. While the increasing adoption of large language models (LLMs) for natural language understanding has led to several LLM-based solutions, we argue that their intrinsic resource-intensive nature is not suitable for realistic management of the phenotype CR lifecycle. Consequently, we propose to go back to the basics and adopt a dictionary-based approach that enables both an immediate refresh of the ontological concepts as well as efficient re-analysis of past data. RESULTS: We developed a dictionary-based approach using a pre-built large collection of clusters of morphologically equivalent tokens to address lexical variability, and a more effective CR step that restricts entity boundary detection to candidates consisting of tokens belonging to ontology concepts. Our method achieves state-of-the-art results (0.76 F1 on the GSC+ corpus) and a processing efficiency of 10 000 publication abstracts in 5 s. AVAILABILITY AND IMPLEMENTATION: FastHPOCR is available as a Python package installable via pip. The source code is available at https://github.com/tudorgroza/fast_hpo_cr. A Java implementation of FastHPOCR will be made available as part of the Fenominal Java library available at https://github.com/monarch-initiative/fenominal. The up-to-date GCS-2024 corpus is available at https://github.com/tudorgroza/code-for-papers/tree/main/gsc-2024.
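The general idea of dictionary-based concept recognition can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the FastHPOCR implementation or its API: the three-term `LEXICON` is a hand-picked sample, and the crude plural stripping in `normalize` merely stands in for the pre-built clusters of morphologically equivalent tokens described in the abstract. All function names are hypothetical.

```python
import re

# Tiny illustrative lexicon (term ID -> label); a real system would load
# the full ontology.
LEXICON = {
    "HP:0001250": "Seizure",
    "HP:0004322": "Short stature",
    "HP:0000175": "Cleft palate",
}


def normalize(token):
    """Lowercase and strip a trailing plural 's' (crude stand-in for
    morphological token clusters)."""
    token = token.lower()
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def tokenize(text):
    """Split text into normalized alphabetic tokens."""
    return [normalize(t) for t in re.findall(r"[A-Za-z]+", text)]


def build_index(lexicon):
    """Map each label's normalized token tuple to its term ID."""
    return {tuple(tokenize(label)): term_id for term_id, label in lexicon.items()}


def recognize(text, index):
    """Greedy longest-match scan over token windows found in the index."""
    tokens = tokenize(text)
    max_len = max(len(key) for key in index)
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                hits.append(index[key])
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return hits


index = build_index(LEXICON)
hits = recognize("The patient has seizures and a cleft palate.", index)
print(hits)
```

Because the index is just a dictionary built from the labels, refreshing it after an ontology release amounts to rebuilding the dictionary, which is the property the abstract emphasises.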


Subject(s)
Biological Ontologies , Phenotype , Humans , Natural Language Processing , Software , Algorithms
4.
BMC Med Inform Decis Mak ; 24(1): 30, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38297371

ABSTRACT

OBJECTIVE: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. RESULTS: The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best in class tool. Without in-context learning, however, performance is significantly below the existing approaches. CONCLUSION: Our experiments show that gpt-4.0 surpasses the state of the art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.


Subject(s)
Knowledge , Language , Humans , Machine Learning , Phenotype , Rare Diseases
5.
medRxiv ; 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-37503093

ABSTRACT

Objective: Large Language Models such as GPT-4 previously have been applied to differential diagnostic challenges based on published case reports. Published case reports have a sophisticated narrative style that is not readily available from typical electronic health records (EHR). Furthermore, even if such a narrative were available in EHRs, privacy requirements would preclude sending it outside the hospital firewall. We therefore tested a method for parsing clinical texts to extract ontology terms and programmatically generating prompts that by design are free of protected health information. Materials and Methods: We investigated different methods to prepare prompts from 75 recently published case reports. We transformed the original narratives by extracting structured terms representing phenotypic abnormalities, comorbidities, treatments, and laboratory tests and creating prompts programmatically. Results: Performance of all of these approaches was modest, with the correct diagnosis ranked first in only 5.3-17.6% of cases. The performance of the prompts created from structured data was substantially worse than that of the original narrative texts, even if additional information was added following manual review of term extraction. Moreover, different versions of GPT-4 demonstrated substantially different performance on this task. Discussion: The sensitivity of the performance to the form of the prompt and the instability of results over two GPT-4 versions represent important current limitations to the use of GPT-4 to support diagnosis in real-life clinical settings. Conclusion: Research is needed to identify the best methods for creating prompts from typically available clinical data to support differential diagnostics.
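As a rough illustration of generating a prompt programmatically from structured terms rather than from narrative text, consider the Python sketch below. The function `build_prompt`, its parameters, and the prompt wording are assumptions made for this example; the abstract does not give the authors' actual templates.

```python
def build_prompt(observed, excluded=(), age=None, sex=None):
    """Assemble a prompt from ontology term labels only (no free text).

    Hypothetical template for illustration.
    """
    parts = []
    demographics = " ".join(str(x) for x in (age, sex) if x)
    if demographics:
        parts.append(f"Patient: {demographics}.")
    parts.append("Observed findings: " + "; ".join(observed) + ".")
    if excluded:
        parts.append("Excluded findings: " + "; ".join(excluded) + ".")
    parts.append(
        "List the most likely diagnoses, ranked from most to least likely."
    )
    return "\n".join(parts)


prompt = build_prompt(
    ["Seizure", "Short stature"],
    excluded=["Cleft palate"],
    age="4-year-old",
    sex="female",
)
print(prompt)
```

Since the prompt is built only from term labels and coarse demographics, no sentence of the original clinical note is passed through, which is the design goal described in the abstract.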

6.
Front Pediatr ; 11: 1283880, 2023.
Article in English | MEDLINE | ID: mdl-38027298

ABSTRACT

The diagnostic odyssey for people living with rare diseases (PLWRD) is often prolonged for myriad reasons including an initial failure to consider rare disease and challenges to systemically and systematically identifying and tracking undiagnosed diseases across the diagnostic journey. This often results in isolation, uncertainty, a delay to targeted treatments and increase in risk of complications with significant consequences for patient and family wellbeing. This article aims to highlight key time points to consider a rare disease diagnosis along with elements to consider in the potential operational classification for undiagnosed rare diseases during the diagnostic odyssey. We discuss the need to create a coding framework that traverses all stages of the diagnostic odyssey for PLWRD along with the potential benefits this will have to PLWRD and the wider community.

7.
Bioinformatics ; 39(12), 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38001031

ABSTRACT

MOTIVATION: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show a 10% increase in recall on scientific publications and a 20% increase in recall on EHRs (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION: Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
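The screening step, which looks for candidate matches by shared k-mer counts, can be illustrated with a short Python sketch. It covers only a simplified version of the screening idea and is not TBLAT or the Fenominal API: the scoring by learned spelling-error patterns is omitted, and the function names, `k=3`, and the 0.5 threshold are assumptions chosen for the example.

```python
from collections import Counter


def kmers(text, k=3):
    """Count overlapping character k-mers in lowercased text."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmer_fraction(candidate, label, k=3):
    """Fraction of the label's k-mers that also occur in the candidate."""
    cand, lab = kmers(candidate, k), kmers(label, k)
    total = sum(lab.values())
    if total == 0:
        return 0.0
    shared = sum((cand & lab).values())  # multiset intersection
    return shared / total


def screen(candidate, labels, k=3, threshold=0.5):
    """Keep labels sharing enough k-mers with the candidate, best first."""
    scored = [(shared_kmer_fraction(candidate, lab, k), lab) for lab in labels]
    return sorted([(s, lab) for s, lab in scored if s >= threshold], reverse=True)


# "seizurs" is a misspelling that an exact dictionary lookup would miss.
result = screen("seizurs", ["Seizure", "Short stature"])
print(result)
```

Labels that pass the screen would then be handed to a more careful alignment and scoring step, in the same way that BLAST extends seed matches.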


Subject(s)
Algorithms , Language , Humans , Sequence Alignment , Electronic Health Records , Publications
8.
Mamm Genome ; 34(3): 379-388, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37154937

ABSTRACT

Experiments in which data are collected by multiple independent resources, including multicentre data, different laboratories within the same centre or with different operators, are challenging in design, data collection and interpretation. Indeed, inconsistent results across the resources are possible. In this paper, we propose a statistical solution for the problem of multi-resource consensus inferences when statistical results from different resources show variation in magnitude, directionality, and significance. Our proposed method allows combining the corrected p-values, effect sizes and the total number of centres into a global consensus score. We apply this method to obtain a consensus score for data collected by the International Mouse Phenotyping Consortium (IMPC) across 11 centres. We show the application of this method to detect sexual dimorphism in haematological data and discuss the suitability of the methodology.


Subject(s)
Consensus , Mice , Animals , Data Collection/methods
9.
PLoS One ; 18(5): e0285433, 2023.
Article in English | MEDLINE | ID: mdl-37196000

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, comprehensive user guide and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.


Subject(s)
Neoplasms , Software , Humans , Genomics , Databases, Factual , Gene Library
10.
Adv Genet (Hoboken) ; 4(1): 2200016, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36910590

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.

11.
Nucleic Acids Res ; 51(D1): D1360-D1366, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36399494

ABSTRACT

PDCM Finder (www.cancermodels.org) is a cancer research platform that aggregates clinical, genomic and functional data from patient-derived xenografts, organoids and cell lines. It was launched in April 2022 as a successor of the PDX Finder portal, which focused solely on patient-derived xenograft models. Currently the portal has over 6200 models across 13 cancer types, including rare paediatric models (17%) and models from minority ethnic backgrounds (33%), making it the largest free to consumer and open access resource of this kind. The PDCM Finder standardises, harmonises and integrates the complex and diverse data associated with PDCMs for the cancer community and displays over 90 million data points across a variety of data types (clinical metadata, molecular and treatment-based). PDCM data is FAIR and underpins the generation and testing of new hypotheses in cancer mechanisms and personalised medicine development.


Subject(s)
Neoplasms , Humans , Child , Neoplasms/genetics , Neoplasms/therapy , Organoids , Xenograft Model Antitumor Assays
12.
Nucleic Acids Res ; 51(D1): D1038-D1045, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36305825

ABSTRACT

The International Mouse Phenotyping Consortium (IMPC; https://www.mousephenotype.org/) web portal makes available curated, integrated and analysed knockout mouse phenotyping data generated by the IMPC project consisting of 85M data points and over 95,000 statistically significant phenotype hits mapped to human diseases. The IMPC portal delivers a substantial reference dataset that supports the enrichment of various domain-specific projects and databases, as well as the wider research and clinical community, where the IMPC genotype-phenotype knowledge contributes to the molecular diagnosis of patients affected by rare disorders. Data from 9,000 mouse lines and 750 000 images provides vital resources enabling the interpretation of the ignorome, and advancing our knowledge on mammalian gene function and the mechanisms underlying phenotypes associated with human diseases. The resource is widely integrated and the lines have been used in over 4,600 publications indicating the value of the data and the materials.


Subject(s)
Databases, Factual , Disease Models, Animal , Mice, Knockout , Animals , Humans , Mice , Phenotype
13.
Nucleic Acids Res ; 51(D1): D977-D985, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36350656

ABSTRACT

The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.


Subject(s)
Genome-Wide Association Study , Knowledge Bases , Animals , Humans , Mice , DNA Copy Number Variations , National Human Genome Research Institute (U.S.) , Phenotype , Polymorphism, Single Nucleotide , Software , United States
15.
NPJ Genom Med ; 5(1): 54, 2020 Dec 10.
Article in English | MEDLINE | ID: mdl-33303739

ABSTRACT

Exome sequencing has enabled molecular diagnoses for rare disease patients but often with initial diagnostic rates of ~25-30%. Here we develop a robust computational pipeline to rank variants for reassessment of unsolved rare disease patients. A comprehensive web-based patient report is generated in which all deleterious variants can be filtered by gene, variant characteristics, OMIM disease and Phenolyzer scores, and all are annotated with an ACMG classification and links to ClinVar. The pipeline ranked 21/34 previously diagnosed variants as top, with 26 in total ranked ≤7th, 3 ranked ≥13th; 5 failed the pipeline filters. Pathogenic/likely pathogenic variants by ACMG criteria were identified for 22/145 unsolved cases, and a previously undefined candidate disease variant for 27/145. This open access pipeline supports the partnership between clinical and research laboratories to improve the diagnosis of unsolved exomes. It provides a flexible framework for iterative developments to further improve diagnosis.

16.
F1000Res ; 9: 136, 2020.
Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.


Subject(s)
Biological Science Disciplines , Computational Biology , Semantic Web , Data Mining , Metadata , Reproducibility of Results
18.
J Med Genet ; 57(7): 479-486, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31980565

ABSTRACT

BACKGROUND: This study provides an integrated assessment of the economic and social impacts of genomic sequencing for the detection of monogenic disorders resulting in intellectual disability (ID). METHODS: Multiple knowledge bases were cross-referenced and analysed to compile a reference list of monogenic disorders associated with ID. Multiple literature searches were used to quantify the health and social costs for the care of people with ID. Health and social expenditures and the current cost of whole-exome sequencing and whole-genome sequencing were quantified in relation to the more common causes of ID and their impact on lifespan. RESULTS: On average, individuals with ID incur annual costs in terms of health costs, disability support, lost income and other social costs of US$172 000, accumulating to many millions of dollars over a lifetime. CONCLUSION: The diagnosis of monogenic disorders through genomic testing provides the opportunity to improve the diagnosis and management, and to reduce the costs of ID through informed reproductive decisions, reductions in unproductive diagnostic tests and increasingly targeted therapies.


Subject(s)
/economics , Genomics/economics , Intellectual Disability/economics , Intellectual Disability/genetics , Health Care Costs/statistics & numerical data , Humans , Intellectual Disability/diagnosis , Intellectual Disability/epidemiology
19.
Curr Protoc Hum Genet ; 103(1): e92, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31479590

ABSTRACT

The Human Phenotype Ontology (HPO) is a standardized set of phenotypic terms that are organized in a hierarchical fashion. It is a widely used resource for capturing human disease phenotypes for computational analysis to support differential diagnostics. The HPO is frequently used to create a set of terms that accurately describe the observed clinical abnormalities of an individual being evaluated for suspected rare genetic disease. This profile is compared with computational disease profiles in the HPO database with the aim of identifying genetic diseases with comparable phenotypic profiles. The computational analysis can be coupled with the analysis of whole-exome or whole-genome sequencing data through applications such as Exomiser. This article explains how to choose an optimal set of HPO terms for these cases and enter them with software, such as PhenoTips and PatientArchive, and demonstrates how to use Phenomizer and Exomiser to generate a computational differential diagnosis. © 2019 by John Wiley & Sons, Inc.
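The comparison of a patient's term profile against disease profiles can be sketched as a simple ranking in Python. This is a deliberately simplified stand-in: tools such as Phenomizer use ontology-aware semantic similarity, whereas the sketch below uses plain Jaccard overlap of term sets. The disease names `disease_A` and `disease_B` and their term sets are hypothetical toy profiles, not real annotations.

```python
def jaccard(a, b):
    """Jaccard overlap of two term sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diseases(patient_terms, disease_profiles):
    """Return (disease, score) pairs sorted from best to worst match."""
    scores = {
        name: jaccard(patient_terms, terms)
        for name, terms in disease_profiles.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy profiles for illustration only.
profiles = {
    "disease_A": {"HP:0001250", "HP:0004322"},
    "disease_B": {"HP:0000175"},
}
patient = {"HP:0001250", "HP:0004322", "HP:0000175"}
ranking = rank_diseases(patient, profiles)
print(ranking)
```

Plain set overlap ignores the ontology hierarchy, so a patient annotated with a more specific child term would score zero against a profile listing its parent; handling that case is one reason hierarchy-aware similarity measures are used in practice.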


Subject(s)
Biological Ontologies , Computational Biology , Databases, Genetic , Genetic Diseases, Inborn/diagnosis , Software , Diagnosis, Differential , Exome/genetics , Genetic Diseases, Inborn/genetics , Humans , Phenotype , Whole Genome Sequencing
20.
Front Genet ; 10: 611, 2019.
Article in English | MEDLINE | ID: mdl-31417602

ABSTRACT

The clinical utility of computational phenotyping for both genetic and rare diseases is increasingly appreciated; however, its true potential is yet to be fully realized. Alongside the growing clinical and research availability of sequencing technologies, precise deep and scalable phenotyping is required to serve unmet need in genetic and rare diseases. To improve the lives of individuals affected with rare diseases through deep phenotyping, global big data interrogation is necessary to aid our understanding of disease biology, assist diagnosis, and develop targeted treatment strategies. This includes the application of cutting-edge machine learning methods to image data. As with most digital tools employed in health care, there are ethical and data governance challenges associated with using identifiable personal image data. There are also risks with failing to deliver on the patient benefits of these new technologies, the biggest of which is posed by data siloing. The Minerva Initiative has been designed to enable the public good of deep phenotyping while mitigating these ethical risks. Its open structure, enabling collaboration and data sharing between individuals, clinicians, researchers and private enterprise, is key for delivering precision public health.
