1.
Genet Med ; : 101292, 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39396132

ABSTRACT

PURPOSE: Clinical intuition is commonly incorporated into the differential diagnosis as an assessment of the likelihood of candidate diagnoses based either on the patient population being seen in a specific clinic or on the signs and symptoms of the initial presentation. Algorithms to support diagnostic sequencing in individuals with a suspected rare genetic disease do not yet incorporate intuition and instead assume that each Mendelian disease has an equal pretest probability. METHODS: The LIRICAL algorithm calculates the likelihood ratio of clinical manifestations represented by Human Phenotype Ontology (HPO) terms to rank candidate diagnoses. The initial version of LIRICAL assumed an equal pretest probability for each disease in its calculation of the posttest probability (where the test is diagnostic exome or genome sequencing). We introduce Clinical Intuition for Likelihood Ratios (ClintLR), an extension of the LIRICAL algorithm that boosts the pretest probability of groups of related diseases deemed to be more likely. RESULTS: The average rank of the correct diagnosis in simulations using ClintLR showed a statistically significant improvement over a range of adjustment factors. CONCLUSION: ClintLR successfully encodes clinical intuition to improve ranking of rare diseases in diagnostic sequencing. ClintLR is freely available at https://github.com/TheJacksonLaboratory/ClintLR.
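
The relation between pretest probability, likelihood ratio, and posttest probability described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say exactly how ClintLR applies its adjustment factors, so the renormalising `boost_pretest` helper, the disease names, and the factor of 2 are hypothetical.

```python
def boost_pretest(diseases, boosted, factor):
    """Start from equal weights, multiply the weights of diseases in
    `boosted` by `factor`, and renormalise so probabilities sum to 1.
    (Hypothetical scheme; not taken from the ClintLR source code.)"""
    weights = {d: (factor if d in boosted else 1.0) for d in diseases}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def posttest_probability(pretest, likelihood_ratio):
    """Standard Bayesian update in odds form:
    posttest odds = pretest odds * likelihood ratio."""
    odds = pretest / (1.0 - pretest) * likelihood_ratio
    return odds / (1.0 + odds)


# Four candidate diseases; clinical intuition favours disease "A".
pretest = boost_pretest(["A", "B", "C", "D"], boosted={"A"}, factor=2.0)
posttest_a = posttest_probability(pretest["A"], likelihood_ratio=3.0)
```

With equal pretest probabilities each disease would start at 0.25; after the boost, "A" starts at 0.4 and the others at 0.2, so the same likelihood ratio yields a higher posttest probability for "A".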

2.
HGG Adv ; 6(1): 100371, 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39394689

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
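
To make the idea of a phenopacket as an input file more concrete, the sketch below builds a minimal phenopacket-like record as a Python dictionary and extracts the observed HPO terms. The field names follow our reading of version 2 of the GA4GH Phenopacket Schema and should be checked against the official schema; the identifiers, curator name, and ontology version string are placeholders, and the record is not taken from Phenopacket Store.

```python
import json

# Minimal phenopacket-like record (illustrative values only).
phenopacket = {
    "id": "example-phenopacket-1",
    "subject": {"id": "patient-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
        # An explicitly excluded (ruled-out) feature.
        {"type": {"id": "HP:0000252", "label": "Microcephaly"},
         "excluded": True},
    ],
    "metaData": {
        "created": "2024-01-01T00:00:00Z",
        "createdBy": "example-curator",
        "resources": [{
            "id": "hp",
            "name": "human phenotype ontology",
            "url": "http://purl.obolibrary.org/obo/hp.owl",
            "version": "placeholder-version",
            "namespacePrefix": "HP",
            "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
        }],
        "phenopacketSchemaVersion": "2.0",
    },
}


def observed_hpo_ids(packet):
    """Return the HPO ids of features that are not marked as excluded."""
    return [
        feature["type"]["id"]
        for feature in packet.get("phenotypicFeatures", [])
        if not feature.get("excluded", False)
    ]


serialized = json.dumps(phenopacket, indent=2)
observed = observed_hpo_ids(json.loads(serialized))
```

A phenotype-driven prioritisation tool would typically read such a file, collect the observed and excluded terms, and compare them with disease annotations.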

3.
Orphanet J Rare Dis ; 19(1): 357, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39334316

ABSTRACT

Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adaption. We provide expert recommendations outlining the stepwise application of these innovative technologies in the diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice. 
Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare disease.


Subject(s)
Rare Diseases , Humans , Rare Diseases/diagnosis , Rare Diseases/genetics , Genomics , Genetic Testing/methods

4.
medRxiv ; 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39228707

ABSTRACT

Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for the clinical management of patients. Currently, manual biocuration is used to assign MAxO terms to rare diseases, enabling clinical management of rare diseases to be described computationally for use in clinical decision support and mechanism discovery. However, it is challenging to scale manual curation to comprehensively capture information about medical actions for the more than 10,000 rare diseases. We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration for rare diseases. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, the Human Phenotype Ontology (HPO), and the MONDO disease ontology via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval. We used this approach to process 4,918 unique medical abstracts and identify annotations for 21 rare genetic diseases. We extracted 18,631 candidate disease-treatment curations, 538 of which were confirmed and transferred to the MAxO annotation dataset. The results of this project underscore the potential of generative AI to accelerate precision medicine by enabling a robust and comprehensive curation of the primary literature to represent information about diseases and procedures in a structured fashion. Although we focused on MAxO in this project, similar approaches could be taken for other biomedical curation tasks.

5.
bioRxiv ; 2024 Sep 22.
Article in English | MEDLINE | ID: mdl-39345458

ABSTRACT

Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses. A major impediment in phenomics is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are collected and curated using free text, single terms or combinations of terms, using multiple vocabularies, terminologies, or ontologies. Integrating these heterogeneous and often siloed data enables the application of biological knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic knowledge representation that captures the full range of cross-species phenomics data is much needed. We have developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies, as a single, unified, logical representation. uPheno comprises (1) a system for consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. This harmonized representation supports use cases such as cross-species integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.

6.
Orphanet J Rare Dis ; 19(1): 334, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39261914

ABSTRACT

Improving health and social equity for persons living with a rare disease (PLWRD) is increasingly recognized as a global policy priority. However, there is currently no international alignment on how to define and describe rare diseases. A global reference is needed to establish a mutual understanding that can inform the actions of a wide range of stakeholders. A multi-stakeholder, global panel of rare disease experts came together and developed an Operational Description of Rare Diseases. This reference describes which diseases are considered rare, how many persons are affected, and why the rare disease population demands specific attention. The Operational Description of Rare Diseases is framed in two parts: a core definition of rare diseases, complemented by a descriptive framework of rare diseases. The core definition includes parameters that permit the identification of which diseases are considered rare and how many persons are affected. The descriptive framework elaborates on the impact and burden of rare diseases on patients, their caregivers and families, healthcare systems, and society overall. The Operational Description of Rare Diseases establishes a common point of reference for decision-makers across the world who strive to understand and address the unmet needs of persons living with a rare disease. Adoption of this reference is essential to improving the visibility of rare conditions in health systems across the world. Greater recognition of the burden of rare diseases will motivate new actions and policies to address the unmet needs of the rare disease community.


Subject(s)
Rare Diseases , Rare Diseases/diagnosis , Humans

7.
medRxiv ; 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39108510

ABSTRACT

Large language models (LLMs) have shown great promise in supporting differential diagnosis, but the 23 available published studies on their diagnostic accuracy evaluated small cohorts (30-422 cases, mean 104) and assessed LLM responses subjectively by manual curation (23/23 studies). The performance of LLMs for rare disease diagnosis has not been evaluated systematically. Here, we perform a rigorous and large-scale analysis of the performance of GPT-4 in prioritizing candidate diagnoses, using the largest-ever cohort of rare disease patients. Our computational study used 5267 computational case reports from previously published data. Each case was formatted as a Global Alliance for Genomics and Health (GA4GH) phenopacket, in which clinical anomalies were represented as Human Phenotype Ontology (HPO) terms. We developed software to generate prompts from each phenopacket. Prompts were sent to Generative Pre-trained Transformer 4 (GPT-4), and the rank of the correct diagnosis, if present in the response, was recorded. The mean reciprocal rank (MRR) of the correct diagnosis was 0.24 (with the reciprocal of the MRR corresponding to a rank of 4.2), and the correct diagnosis was placed in rank 1 in 19.2% of the cases, in the first 3 ranks in 28.6%, and in the first 10 ranks in 32.5%. Our study is the largest to be reported to date and provides a realistic estimate of the performance of GPT-4 in rare disease medicine.
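
The ranking metrics reported in this abstract can be computed as in the following sketch. It assumes that each case contributes either the rank of the correct diagnosis or `None` when the correct diagnosis is absent from the response, and that absent diagnoses contribute a reciprocal rank of 0; the abstract does not state how the study handled missing diagnoses, and the example ranks are invented for illustration.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all cases; a missing diagnosis (None)
    contributes 0 (an assumption, not confirmed by the abstract)."""
    return sum(1.0 / r if r is not None else 0.0 for r in ranks) / len(ranks)


def top_k_rate(ranks, k):
    """Fraction of cases whose correct diagnosis is within the first k ranks."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Toy example: four cases, one without the correct diagnosis in the response.
example_ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(example_ranks)   # (1 + 0.5 + 0 + 0.25) / 4
top1 = top_k_rate(example_ranks, 1)
top3 = top_k_rate(example_ranks, 3)
```

Taking the reciprocal of an MRR of 0.24 gives 1 / 0.24 ≈ 4.2, which is the rank quoted in the abstract.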

8.
Sci Data ; 11(1): 906, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39174566

ABSTRACT

The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to each patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are constantly produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the "RNA world" is still lacking. We propose RNA-KG, a knowledge graph (KG) encompassing biological knowledge about RNAs gathered from more than 60 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts. To develop RNA-KG, we first identified, pre-processed, and characterized each data source; next, we built a meta-graph that provides an ontological description of the KG by representing all the bio-molecular entities and medical concepts of interest in this domain, as well as the types of interactions connecting them. Finally, we leveraged an instance-based semantically abstracted knowledge model to specify the ontological alignment according to which RNA-KG was generated. RNA-KG can be downloaded in different formats and also queried by a SPARQL endpoint. A thorough topological analysis of the resulting heterogeneous graph provides further insights into the characteristics of the "RNA world". RNA-KG can be both directly explored and visualized, and/or analyzed by applying computational methods to infer bio-medical knowledge from its heterogeneous nodes and edges. The resource can be easily updated with new experimental data, and specific views of the overall KG can be extracted according to the bio-medical problem to be studied.


Subject(s)
RNA , RNA/genetics , Humans , Biological Ontologies

9.
Cancer Res ; 84(13): 2060-2072, 2024 07 02.
Article in English | MEDLINE | ID: mdl-39082680

ABSTRACT

Patient-derived xenografts (PDX) model human intra- and intertumoral heterogeneity in the context of the intact tissue of immunocompromised mice. Histologic imaging via hematoxylin and eosin (H&E) staining is routinely performed on PDX samples, which could be harnessed for computational analysis. Prior studies of large clinical H&E image repositories have shown that deep learning analysis can identify intercellular and morphologic signals correlated with disease phenotype and therapeutic response. In this study, we developed an extensive, pan-cancer repository of >1,000 PDX and paired parental tumor H&E images. These images, curated from the PDX Development and Trial Centers Research Network Consortium, had a range of associated genomic and transcriptomic data, clinical metadata, pathologic assessments of cell composition, and, in several cases, detailed pathologic annotations of neoplastic, stromal, and necrotic regions. The amenability of these images to deep learning was highlighted through three applications: (i) development of a classifier for neoplastic, stromal, and necrotic regions; (ii) development of a predictor of xenograft-transplant lymphoproliferative disorder; and (iii) application of a published predictor of microsatellite instability. Together, this PDX Development and Trial Centers Research Network image repository provides a valuable resource for controlled digital pathology analysis, both for the evaluation of technical issues and for the development of computational image-based methods that make clinical predictions based on PDX treatment studies. Significance: A pan-cancer repository of >1,000 patient-derived xenograft hematoxylin and eosin-stained images will facilitate cancer biology investigations through histopathologic analysis and contributes important model system data that expand existing human histology repositories.


Subject(s)
Deep Learning , Neoplasms , Humans , Animals , Mice , Neoplasms/genetics , Neoplasms/pathology , Neoplasms/diagnostic imaging , Genomics/methods , Heterografts , Xenograft Model Antitumor Assays , Lymphoproliferative Disorders/genetics , Lymphoproliferative Disorders/pathology , Image Processing, Computer-Assisted/methods

10.
Front Cell Dev Biol ; 12: 1240384, 2024.
Article in English | MEDLINE | ID: mdl-38989060

ABSTRACT

Cell level functions underlie tissue and organ physiology. Gene expression patterns offer extensive views of the pathways and processes within and between cells. Single cell transcriptomics provides detailed information on gene expression within cells, cell types, subtypes and their relative proportions in organs. Functional pathways can be scalably connected to physiological functions at the cell and organ levels. Integrating experimentally obtained gene expression patterns with prior knowledge of pathway interactions enables identification of networks underlying whole cell functions such as growth, contractility, and secretion. These pathways can be computationally modeled using differential equations to simulate cell and organ physiological dynamics regulated by gene expression changes. Such computational systems can be thought of as parts of digital twins of organs. Digital twins, at the core, need computational models that represent in detail and simulate how dynamics of pathways and networks give rise to whole cell level physiological functions. Integration of transcriptomic responses and numerical simulations could simulate and predict whole cell functional outputs from transcriptomic data. We developed a computational pipeline that integrates gene expression timelines and systems of coupled differential equations to generate cell-type selective dynamical models. We tested our integrative algorithm on the eicosanoid biosynthesis network in macrophages. Converting transcriptomic changes to a dynamical model allowed us to predict dynamics of prostaglandin and thromboxane synthesis and secretion by macrophages that matched published lipidomics data obtained in the same experiments. Integration of cell-level system biology simulations with genomic and clinical data using a knowledge graph framework will allow us to create explicit predictive models that mechanistically link genomic determinants to organ function. 
Such integration requires a multi-domain ontological framework to connect genomic determinants to gene expression and cell pathways and functions to organ level phenotypes in healthy and diseased states. These integrated scalable models of tissues and organs as accurate digital twins predict health and disease states for precision medicine.

11.
Cureus ; 16(6): e61601, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38962621

ABSTRACT

Longitudinally extensive transverse myelitis (LETM) is traditionally classified as an inflammatory disorder of the spinal cord spanning three or more vertebral segments. The differential diagnosis for transverse myelitis (TM) is vast and includes infectious and nutritional causes; some reported cases are idiopathic. Autoimmune etiologies such as systemic lupus erythematosus (SLE) can also, in rare cases, present with neurological manifestations such as LETM. In this case report, we present a 33-year-old female with a prior history of SLE who developed LETM in the setting of possible provoking factors such as nutritional deficiencies and a recent viral illness. We highlight her clinical course, recovery, and working differential diagnosis after laboratory testing and neurological imaging. Finally, we discuss the different treatments that ultimately led to her successful recovery after her prolonged clinical course.

12.
Animals (Basel) ; 14(13)2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38998017

ABSTRACT

Eighty-four autumn (ACS, n = 45)- and spring (SCS, n = 39)-calved multiparous early lactation Holstein cows were assigned to groups of either: (a) grazing + mixed ration (MR) during partial confinement in outdoor soil-bedded pens with shade (OD-GRZ); (b) grazing + MR during partial confinement in a compost-bedded pack barn with cooling (CB-GRZ); or (c) total confinement fed a total mixed ration (CB-TMR) in a compost-bedded pack barn. Data were analyzed using the SAS MIXED procedure with significance at p ≤ 0.05. In both seasons, despite behavioral differences (p < 0.05) between the OD-GRZ and CB-GRZ groups (i.e., standing, first grazing meal length, bite rate), the milk and component yields, DM intake, microbial CP output (MCP) and NE efficiency were unaffected by the housing conditions, possibly due to mild weather conditions. The milk yield was substantially higher in the CB-TMR group versus the OD-GRZ and CB-GRZ groups (p < 0.01) in both ACS (~35%) and SCS (~20%) despite there being no intake differences, without any impact on milk component levels. In ACS, this was associated with a higher MCP, likely due to the higher nutritional value of TMR compared to pasture, which was not the case in SCS. In conclusion, the OD-GRZ group achieved the same milk production as the CB-GRZ group through behavior adaptation, under mild weather conditions, in both calving seasons. The CB-TMR group outperformed the grazing systems in both calving seasons, regardless of the MCP.

13.
bioRxiv ; 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-39005436

ABSTRACT

Objectives: Concept embeddings are low-dimensional vector representations of concepts such as MeSH:D009203 (Myocardial Infarction), whose similarity in the embedded vector space reflects their semantic similarity. Here, we test the hypothesis that non-biomedical concept synonym replacement can improve the quality of biomedical concept embeddings. Materials and methods: We developed an approach that leverages WordNet to replace sets of synonyms with the most common representative of the synonym set. Results: We tested our approach on 1055 concept sets and found that, on average, the mean intra-cluster distance was reduced by 8% in the vector space. Assuming that homophily of related concepts in the vector space is desirable, our approach tends to improve the quality of embeddings. Discussion and Conclusion: This pilot study shows that non-biomedical synonym replacement tends to improve the quality of embeddings of biomedical concepts using the Word2Vec algorithm. We have implemented our approach in a freely available Python package at https://github.com/TheJacksonLaboratory/wn2vec.
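
The synonym-replacement step can be illustrated with a small, self-contained sketch. It is not the wn2vec implementation: the synonym sets are supplied by hand here instead of being looked up in WordNet, and the helper names `build_replacement_map` and `replace_synonyms` are hypothetical.

```python
from collections import Counter


def build_replacement_map(tokens, synonym_sets):
    """Map every word in a synonym set to the member that occurs most
    often in the corpus (ties broken alphabetically)."""
    counts = Counter(tokens)
    mapping = {}
    for synset in synonym_sets:
        # sorted() makes tie-breaking deterministic; max() keeps the
        # first element with the highest count.
        representative = max(sorted(synset), key=lambda word: counts[word])
        for word in synset:
            mapping[word] = representative
    return mapping


def replace_synonyms(tokens, mapping):
    """Rewrite the token stream, leaving unmapped words unchanged."""
    return [mapping.get(token, token) for token in tokens]


corpus = "the big dog saw a large cat and a big bird".split()
mapping = build_replacement_map(corpus, [{"big", "large"}])
normalised = replace_synonyms(corpus, mapping)
```

After this normalisation, "large" is rewritten to the more frequent "big", so an embedding algorithm such as Word2Vec sees a single token where it previously saw two.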

15.
bioRxiv ; 2024 Jun 16.
Article in English | MEDLINE | ID: mdl-38915571

ABSTRACT

Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and openly available patient data to assess the efficacy of VGPAs - ultimately hindering the development of effective prioritisation tools. Results: In this paper, we present our benchmarking tool, PhEval, which aims to provide a standardised and empirical framework to evaluate phenotype-driven VGPAs. The inclusion of standardised test corpora and test corpus generation tools in the PhEval suite of tools allows open benchmarking and comparison of methods on standardised data sets. Conclusions: PhEval and the standardised test corpora solve the issues of patient data availability and experimental tooling configuration when benchmarking and comparing rare disease VGPAs. By providing standardised data on patient cohorts from real-world case-reports and controlling the configuration of evaluated VGPAs, PhEval enables transparent, portable, comparable and reproducible benchmarking of VGPAs. As these tools are often a key component of many rare disease diagnostic pipelines, a thorough and standardised method of assessment is essential for improving patient diagnosis and care.

16.
Transl Psychiatry ; 14(1): 246, 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38851761

ABSTRACT

Acute COVID-19 infection can be followed by diverse clinical manifestations referred to as Post Acute Sequelae of SARS-CoV-2 Infection (PASC). Studies have shown an increased risk of being diagnosed with new-onset psychiatric disease following a diagnosis of acute COVID-19. However, it was unclear whether non-psychiatric PASC-associated manifestations (PASC-AMs) are associated with an increased risk of new-onset psychiatric disease following COVID-19. A retrospective electronic health record (EHR) cohort study of 2,391,006 individuals with acute COVID-19 was performed to evaluate whether non-psychiatric PASC-AMs are associated with new-onset psychiatric disease. Data were obtained from the National COVID Cohort Collaborative (N3C), which has EHR data from 76 clinical organizations. EHR codes were mapped to 151 non-psychiatric PASC-AMs recorded 28-120 days following SARS-CoV-2 diagnosis and before diagnosis of new-onset psychiatric disease. Association of newly diagnosed psychiatric disease with age, sex, race, pre-existing comorbidities, and PASC-AMs in seven categories was assessed by logistic regression. There were significant associations between a diagnosis of any psychiatric disease and five categories of PASC-AMs, with the highest odds ratios for neurological, cardiovascular, and constitutional PASC-AMs (1.31, 1.29, and 1.23, respectively). Secondary analysis revealed that the proportions of 50 individual clinical features significantly differed between patients diagnosed with different psychiatric diseases. Our study provides evidence for association between non-psychiatric PASC-AMs and the incidence of newly diagnosed psychiatric disease. Significant associations were found for features related to multiple organ systems. This information could prove useful in understanding risk stratification for new-onset psychiatric disease following COVID-19. Prospective studies are needed to corroborate these findings.


Subject(s)
COVID-19 , Mental Disorders , SARS-CoV-2 , Humans , COVID-19/psychology , COVID-19/complications , COVID-19/epidemiology , Male , Female , Mental Disorders/epidemiology , Middle Aged , Adult , Retrospective Studies , Aged , Phenotype , Post-Acute COVID-19 Syndrome , Comorbidity , Electronic Health Records , Young Adult , Risk Factors , Adolescent

17.
medRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38854034

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.

18.
Bioinformatics ; 40(7)2024 07 01.
Article in English | MEDLINE | ID: mdl-38913850

ABSTRACT

MOTIVATION: Human Phenotype Ontology (HPO)-based phenotype concept recognition (CR) underpins a faster and more effective mechanism to create patient phenotype profiles or to document novel phenotype-centred knowledge statements. While the increasing adoption of large language models (LLMs) for natural language understanding has led to several LLM-based solutions, we argue that their intrinsic resource-intensive nature is not suitable for realistic management of the phenotype CR lifecycle. Consequently, we propose to go back to the basics and adopt a dictionary-based approach that enables both an immediate refresh of the ontological concepts as well as efficient re-analysis of past data. RESULTS: We developed a dictionary-based approach using a pre-built large collection of clusters of morphologically equivalent tokens to address lexical variability and a more effective CR step by reducing the entity boundary detection strictly to candidates consisting of tokens belonging to ontology concepts. Our method achieves state-of-the-art results (0.76 F1 on the GSC+ corpus) and a processing efficiency of 10 000 publication abstracts in 5 s. AVAILABILITY AND IMPLEMENTATION: FastHPOCR is available as a Python package installable via pip. The source code is available at https://github.com/tudorgroza/fast_hpo_cr. A Java implementation of FastHPOCR will be made available as part of the Fenominal Java library available at https://github.com/monarch-initiative/fenominal. The up-to-date GSC-2024 corpus is available at https://github.com/tudorgroza/code-for-papers/tree/main/gsc-2024.
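
The general idea of dictionary-based concept recognition can be sketched as below. This is a deliberately simplified illustration, not the FastHPOCR algorithm: it uses plain lower-casing and exact longest-match lookup in place of clusters of morphologically equivalent tokens, and the function names and the three-term dictionary are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(labels):
    """Index each ontology label by its token tuple."""
    return {tuple(tokenize(label)): hpo_id for label, hpo_id in labels.items()}


def recognize(text, index):
    """Greedy longest-match scan: at each position, try the longest
    candidate span first and emit the first span found in the index."""
    tokens = tokenize(text)
    max_len = max(len(key) for key in index)
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                hits.append((index[key], " ".join(key)))
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return hits


hpo_labels = {
    "Seizure": "HP:0001250",
    "Global developmental delay": "HP:0001263",
    "Microcephaly": "HP:0000252",
}
index = build_index(hpo_labels)
hits = recognize(
    "The patient had microcephaly, global developmental delay and a seizure.",
    index,
)
```

Because the index is an ordinary dictionary, refreshing it for a new ontology release only requires rebuilding it from the updated labels, which is the kind of lightweight lifecycle the abstract argues for.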


Subject(s)
Biological Ontologies , Phenotype , Humans , Natural Language Processing , Software , Algorithms

20.
Front Robot AI ; 11: 1362735, 2024.
Article in English | MEDLINE | ID: mdl-38694882

ABSTRACT

We introduce a novel approach to training data augmentation in brain-computer interfaces (BCIs) using neural field theory (NFT) applied to EEG data from motor imagery tasks. BCIs often suffer from limited accuracy due to a limited amount of training data. To address this, we leveraged a corticothalamic NFT model to generate artificial EEG time series as supplemental training data. We employed the BCI competition IV '2a' dataset to evaluate this augmentation technique. For each individual, we fitted the model to common spatial patterns of each motor imagery class, jittered the fitted parameters, and generated time series for data augmentation. Our method led to significant accuracy improvements of over 2% in classifying the "total power" feature, but not in the case of the "Higuchi fractal dimension" feature. This suggests that the fit NFT model may more favorably represent one feature than the other. These findings pave the way for further exploration of NFT-based data augmentation, highlighting the benefits of biophysically accurate artificial data.
