Results 1 - 20 of 569
1.
Transl Psychiatry ; 14(1): 246, 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38851761

ABSTRACT

Acute COVID-19 infection can be followed by diverse clinical manifestations referred to as Post-Acute Sequelae of SARS-CoV-2 Infection (PASC). Studies have shown an increased risk of being diagnosed with new-onset psychiatric disease following a diagnosis of acute COVID-19. However, it was unclear whether non-psychiatric PASC-associated manifestations (PASC-AMs) are associated with an increased risk of new-onset psychiatric disease following COVID-19. A retrospective electronic health record (EHR) cohort study of 2,391,006 individuals with acute COVID-19 was performed to evaluate whether non-psychiatric PASC-AMs are associated with new-onset psychiatric disease. Data were obtained from the National COVID Cohort Collaborative (N3C), which has EHR data from 76 clinical organizations. EHR codes were mapped to 151 non-psychiatric PASC-AMs recorded 28-120 days following SARS-CoV-2 diagnosis and before diagnosis of new-onset psychiatric disease. Association of newly diagnosed psychiatric disease with age, sex, race, pre-existing comorbidities, and PASC-AMs in seven categories was assessed by logistic regression. There were significant associations between a diagnosis of any psychiatric disease and five categories of PASC-AMs, with the highest odds ratios for neurological (1.31), cardiovascular (1.29), and constitutional (1.23) PASC-AMs. Secondary analysis revealed that the proportions of 50 individual clinical features significantly differed between patients diagnosed with different psychiatric diseases. Our study provides evidence for an association between non-psychiatric PASC-AMs and the incidence of newly diagnosed psychiatric disease. Significant associations were found for features related to multiple organ systems. This information could prove useful in understanding risk stratification for new-onset psychiatric disease following COVID-19. Prospective studies are needed to corroborate these findings.


Subject(s)
COVID-19 , Mental Disorders , SARS-CoV-2 , Humans , COVID-19/psychology , COVID-19/complications , COVID-19/epidemiology , Male , Female , Mental Disorders/epidemiology , Middle Aged , Adult , Retrospective Studies , Aged , Phenotype , Post-Acute COVID-19 Syndrome , Comorbidity , Electronic Health Records , Young Adult , Risk Factors , Adolescent
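Record 1 reports odds ratios estimated by logistic regression. As a minimal sketch (the coefficient and standard error below are hypothetical, not values from the study), an odds ratio is the exponential of a fitted log-odds coefficient, and a Wald 95% confidence interval exponentiates β ± 1.96·SE:

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Convert a logistic-regression coefficient (log-odds scale) and its
    standard error into an odds ratio with a Wald confidence interval."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient chosen so that exp(beta) equals 1.31;
# the standard error of 0.05 is also hypothetical.
beta = math.log(1.31)
or_, lo, hi = odds_ratio_with_ci(beta, se=0.05)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Read this way, an odds ratio of 1.31 means 31% higher odds of the outcome for patients with the feature, holding the other covariates in the model fixed.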
2.
Front Robot AI ; 11: 1362735, 2024.
Article in English | MEDLINE | ID: mdl-38694882

ABSTRACT

We introduce a novel approach to training data augmentation in brain-computer interfaces (BCIs) using neural field theory (NFT) applied to EEG data from motor imagery tasks. BCIs often suffer from limited accuracy due to scarce training data. To address this, we leveraged a corticothalamic NFT model to generate artificial EEG time series as supplemental training data. We employed the BCI competition IV '2a' dataset to evaluate this augmentation technique. For each individual, we fitted the model to common spatial patterns of each motor imagery class, jittered the fitted parameters, and generated time series for data augmentation. Our method led to significant accuracy improvements of over 2% in classifying the "total power" feature, but not in the case of the "Higuchi fractal dimension" feature. This suggests that the fitted NFT model may represent one feature more favorably than the other. These findings pave the way for further exploration of NFT-based data augmentation, highlighting the benefits of biophysically accurate artificial data.
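Record 2 augments training data by jittering fitted model parameters. The sketch below covers only that jittering step, assuming multiplicative Gaussian noise; the parameter names and values are hypothetical placeholders, and simulating EEG time series from each jittered parameter set is not shown:

```python
import random


def jitter_parameters(
    fitted: dict[str, float], rel_sd: float, n: int, seed: int = 0
) -> list[dict[str, float]]:
    """Return n perturbed copies of a fitted parameter set.

    Each parameter is multiplied by (1 + e), with e drawn from N(0, rel_sd),
    so the size of the jitter scales with the magnitude of the fitted value."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    return [
        {name: value * (1.0 + rng.gauss(0.0, rel_sd)) for name, value in fitted.items()}
        for _ in range(n)
    ]


# Hypothetical fitted values for three model parameters (illustrative only).
fitted = {"G_ee": 5.4, "G_ei": -7.0, "t0_ms": 80.0}
variants = jitter_parameters(fitted, rel_sd=0.05, n=10)
# Each variant would then be passed to a simulator (not shown) to produce
# one artificial EEG trial for the matching motor imagery class.
```

Multiplicative noise keeps each parameter's sign and relative scale, which is one reasonable choice when parameters differ by orders of magnitude; additive noise per parameter would be an alternative.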

3.
BMC Med Res Methodol ; 24(1): 120, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802749

ABSTRACT

BACKGROUND: To describe the methodology for conducting the CalScope study, a remote, population-based survey launched by the California Department of Public Health (CDPH) to estimate SARS-CoV-2 seroprevalence and understand COVID-19 disease burden in California. METHODS: Between April 2021 and August 2022, 666,857 randomly selected households were invited by mail to complete an online survey and at-home test kit for up to one adult and one child. A gift card was given for each completed survey and test kit. Multiple customized REDCap databases were used to create a data system which provided task automation and scalable data management through API integrations. Support infrastructure was developed to manage follow-up for participant questions and a communications plan was used for outreach through local partners. RESULTS: Across 3 waves, 32,671 out of 666,857 (4.9%) households registered, 6.3% by phone using an interactive voice response (IVR) system and 95.7% in English. Overall, 25,488 (78.0%) households completed surveys, while 23,396 (71.6%) households returned blood samples for testing. Support requests (n = 5,807) received through the web-based form (36.3%), by email (34.1%), and voicemail (29.7%) were mostly concerned with the test kit (31.6%), test result (26.8%), and gift card (21.3%). CONCLUSIONS: Ensuring a well-integrated and scalable data system, responsive support infrastructure for participant follow-up, and appropriate academic and local health department partnerships for study management and communication allowed for successful rollout of a large population-based survey. Remote data collection utilizing online surveys and at-home test kits can complement routine surveillance data for a state health department.


Subject(s)
COVID-19 , Dried Blood Spot Testing , SARS-CoV-2 , Humans , COVID-19/epidemiology , COVID-19/diagnosis , Seroepidemiologic Studies , California/epidemiology , SARS-CoV-2/immunology , Dried Blood Spot Testing/methods , Dried Blood Spot Testing/statistics & numerical data , Adult , Surveys and Questionnaires , Male , Female , Child , Middle Aged , Adolescent
4.
bioRxiv ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38712026

ABSTRACT

P21-activated kinase 2 (PAK2) is a serine/threonine kinase essential for a variety of cellular processes including signal transduction, cellular survival, proliferation, and migration. A recent report proposed monoallelic PAK2 variants cause Knobloch syndrome type 2 (KNO2)-a developmental disorder primarily characterized by ocular anomalies. Here, we identified a novel de novo heterozygous missense variant in PAK2, NM_002577.4:c.1273G>A, p.(D425N), by whole genome sequencing in an individual with features consistent with KNO2. Notable clinical phenotypes include global developmental delay, congenital retinal detachment, mild cerebral ventriculomegaly, hypotonia, failure to thrive (FTT), pyloric stenosis, feeding intolerance, patent ductus arteriosus, and mild facial dysmorphism. The p.(D425N) variant lies within the protein kinase domain and is predicted to be functionally damaging by in silico analysis. Previous clinical genetic testing did not report this variant due to unknown relevance of PAK2 variants at the time of testing, highlighting the importance of reanalysis. Our findings also substantiate the candidacy of PAK2 variants in KNO2 and expand the KNO2 clinical spectrum.

5.
EBioMedicine ; 104: 105144, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38723553

ABSTRACT

BACKGROUND: Two or more autoantibodies against either insulin (IAA), glutamic acid decarboxylase (GADA), islet antigen-2 (IA-2A) or zinc transporter 8 (ZnT8A) denote stage 1 (normoglycemia) or stage 2 (dysglycemia) type 1 diabetes prior to stage 3 type 1 diabetes. Automated multiplex Antibody Detection by Agglutination-PCR (ADAP) assays in two laboratories were compared to singleplex radiobinding assays (RBA) to define threshold levels for diagnostic specificity and sensitivity. METHODS: IAA, GADA, IA-2A and ZnT8A were analysed in 1504 (54% females) population-based controls (PBC), 456 (55% females) doctor's office controls (DOC) and 535 (41% females) blood donor controls (BDC) as well as in 2300 (48% females) patients newly diagnosed (1-10 years of age) with stage 3 type 1 diabetes. The thresholds for autoantibody positivity were computed in 100 10-fold cross-validations to separate patients from controls either by maximizing the χ2-statistics (chisq) or using the 98th percentile of specificity (Spec98). Mean and 95% CI for threshold, sensitivity and specificity are presented. FINDINGS: The ADAP ROC curves of the four autoantibodies showed comparable AUC in the two ADAP laboratories and were higher than RBA. Detection of two or more autoantibodies using chisq showed 0.97 (0.95, 0.99) sensitivity and 0.94 (0.91, 0.97) specificity in ADAP compared to 0.90 (0.88, 0.95) sensitivity and 0.97 (0.94, 0.98) specificity in RBA. Using Spec98, ADAP showed 0.92 (0.89, 0.95) sensitivity and 0.99 (0.98, 1.00) specificity compared to 0.89 (0.77, 0.86) sensitivity and 1.00 (0.99, 1.00) specificity in the RBA. The diagnostic sensitivity and specificity were higher in PBC compared to DOC and BDC. INTERPRETATION: ADAP was comparable in two laboratories, both comparable to or better than RBA, to define threshold levels for two or more autoantibodies to stage type 1 diabetes. FUNDING: Supported by The Leona M. and Harry B. Helmsley Charitable Trust (grant number 2009-04078), the Swedish Foundation for Strategic Research (Dnr IRC15-0067) and the Swedish Research Council, Strategic Research Area (Dnr 2009-1039). AL was supported by the DiaUnion collaborative study, co-financed by EU Interreg ÖKS, Capital Region of Denmark, Region Skåne and the Novo Nordisk Foundation.

6.
Bioinform Adv ; 4(1): vbae036, 2024.
Article in English | MEDLINE | ID: mdl-38577542

ABSTRACT

Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations of nodes and other graph elements called embeddings. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes. For this reason, classification procedures are forced to assume that the vast majority of unlabeled edges are negative. Existing approaches to sampling negative edges for training and evaluating classifiers do so by uniformly sampling pairs of nodes. Results: We show here that this sampling strategy typically leads to sets of positive and negative examples with imbalanced node degree distributions. Using a representative heterogeneous biomedical knowledge graph and random walk-based graph machine learning, we show that this strategy substantially impacts classification performance. If users of graph machine-learning models apply the models to prioritize examples that are drawn from approximately the same distribution as the positive examples, then performance of models as estimated in the validation phase may be artificially inflated. We present a degree-aware node sampling approach that mitigates this effect and is simple to implement. Availability and implementation: Our code and data are publicly available at https://github.com/monarch-initiative/negativeExampleSelection.
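Record 6 contrasts uniform negative-edge sampling with degree-aware sampling. The sketch below shows one simple way to make negatives degree-aware, drawing each endpoint with probability proportional to its degree among the positive edges; it illustrates the idea under that assumption and is not necessarily the exact procedure in the authors' repository. The toy graph is hypothetical:

```python
import random
from collections import Counter


def degree_aware_negatives(
    positives: list[tuple[str, str]], k: int, seed: int = 0
) -> set[tuple[str, str]]:
    """Sample k undirected non-edges whose endpoints are drawn with probability
    proportional to their degree in the positive edge set."""
    rng = random.Random(seed)
    pos = {tuple(sorted(e)) for e in positives}  # treat edges as undirected
    degree = Counter()
    for u, v in pos:
        degree[u] += 1
        degree[v] += 1
    nodes = list(degree)
    weights = [degree[n] for n in nodes]
    max_negatives = len(nodes) * (len(nodes) - 1) // 2 - len(pos)
    if k > max_negatives:
        raise ValueError(f"only {max_negatives} non-edges are available")
    negatives: set[tuple[str, str]] = set()
    while len(negatives) < k:
        u, v = rng.choices(nodes, weights=weights, k=2)
        edge = tuple(sorted((u, v)))
        # Reject self-loops, known positive edges, and duplicates.
        if u == v or edge in pos or edge in negatives:
            continue
        negatives.add(edge)
    return negatives


# Hypothetical toy graph: hub "H" linked to four nodes, plus one extra edge.
positives = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"), ("a", "b")]
print(sorted(degree_aware_negatives(positives, k=3)))
```

An endpoint of a randomly chosen positive edge is itself selected with probability proportional to its degree, so weighting negative endpoints the same way pushes the two sets toward matching degree distributions, which is the imbalance the abstract attributes to uniform sampling.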

7.
Heart ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38649264

ABSTRACT

Valvular heart disease, including calcific or degenerative aortic stenosis (AS), is increasingly prevalent among the older adult population. Over the last few decades, treatment of severe AS has been revolutionised following the development of transcatheter aortic valve replacement (TAVR). Despite improvements in outcomes, older adults with competing comorbidities and geriatric syndromes have suboptimal quality of life outcomes, highlighting the cumulative vulnerability that persists despite valve replacement. Sarcopenia, characterised by loss of muscle strength, mass and function, affects 21%-70% of older adults with AS. Sarcopenia is an independent predictor of short-term and long-term outcomes after TAVR and should be incorporated as a prognostic marker in preprocedural planning. Early diagnosis and treatment of sarcopenia may reduce morbidity and mortality and improve quality of life following TAVR. The adverse effects of sarcopenia can be mitigated through resistance training and optimisation of nutritional status. This is most efficacious when administered before sarcopenia has progressed to advanced stages. Management should be individualised based on the patient's wishes/preferences, care goals and physical capability. Exercise during the preoperative waiting period may be safe and effective in most patients with severe AS. However, future studies are needed to establish the benefits of prehabilitation in improving quality of life outcomes after TAVR procedures.

8.
bioRxiv ; 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38617362

ABSTRACT

Many data resources generate, process, store, or provide kidney-related molecular, pathological, and clinical data. Reference ontologies offer an opportunity to support knowledge and data integration. The Kidney Precision Medicine Project (KPMP) team contributed to the representation and addition of 329 kidney phenotype terms to the Human Phenotype Ontology (HPO), and identified many subcategories of acute kidney injury (AKI) or chronic kidney disease (CKD). The Kidney Tissue Atlas Ontology (KTAO) imports and integrates kidney-related terms from existing ontologies (e.g., HPO, CL, and Uberon) and represents 259 kidney-related biomarkers. We also developed a precision medicine metadata ontology (PMMO) to integrate 50 variables from KPMP and CZ CellxGene data resources and applied PMMO for integrative kidney data analysis. The gene expression profiles of kidney gene biomarkers were specifically analyzed under healthy control or AKI/CKD disease statuses. This work demonstrates how ontology-based approaches support multi-domain data and knowledge integration in precision medicine.

9.
Risk Anal ; 2024 Apr 28.
Article in English | MEDLINE | ID: mdl-38679462

ABSTRACT

To improve preparedness for natural disasters, it is imperative to understand the factors that enable individual risk-reduction actions. This study offers such insights using innovative real-time (N = 871) and repeated (N = 255) surveys of a sample of coastal residents in Florida regarding flood preparations and their drivers during an imminent threat posed by Hurricane Dorian and its aftermath. Compared with commonly employed cross-sectional surveys, our methodology better represents relationships between preparedness actions undertaken during the disaster threat and their drivers derived from an extended version of Protection Motivation Theory (PMT). The repeated survey allows for examining temporal dynamics in these drivers. Our results confirm the importance of coping appraisals and show that risk perceptions relate more strongly to emergency protection decisions made during the period of the disaster threat than to decisions made well before. Moreover, we find that several personal characteristics that we add to the standard PMT framework significantly relate to undertaking preparedness actions, especially locus of control and social norms. Significant changes in key explanatory variables occur following the disaster threat, including a decline in risk perception, a potential learning effect in coping appraisals, and a decline in risk aversion. Our results confirm the advantage of the real-time and repeated survey approach in understanding both short- and long-term disaster preparedness actions.

10.
Int J Med Inform ; 187: 105461, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38643701

ABSTRACT

OBJECTIVE: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (for example, endometriosis, ovarian cyst, and uterine fibroids). MATERIALS AND METHODS: We harmonized survey data from the Personalized Environment and Genes Study (PEGS) on internal and external environmental exposures and health conditions with biomedical ontology content. We merged the harmonized data and ontologies with supplemental nutrient and agricultural chemical data to create a KG. We analyzed the KG by embedding edges and applying a random forest for edge prediction to identify variables potentially associated with FRDs. We also conducted logistic regression analysis for comparison. RESULTS: Across 9765 PEGS respondents, the KG analysis resulted in 8535 significant or suggestive predicted links between FRDs and chemicals, phenotypes, and diseases. Amongst these links, 32 were exact matches when compared with the logistic regression results, including comorbidities, medications, foods, and occupational exposures. DISCUSSION: Mechanistic underpinnings of predicted links documented in the literature may support some of our findings. Our KG methods are useful for predicting possible associations in large, survey-based datasets with added information on directionality and magnitude of effect from logistic regression. These results should not be construed as causal but can support hypothesis generation. CONCLUSION: This investigation enabled the generation of hypotheses on a variety of potential links between FRDs and exposures. Future investigations should prospectively evaluate the variables hypothesized to impact FRDs.


Subject(s)
Environmental Exposure , Humans , Female , Environmental Exposure/adverse effects , Genital Diseases, Female , Logistic Models , Nutritional Status , Diet , Adult , Random Forest
11.
Sci Data ; 11(1): 363, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.


Subject(s)
Biological Science Disciplines , Knowledge Bases , Pattern Recognition, Automated , Algorithms , Translational Research, Biomedical
12.
Genet Med ; 26(7): 101141, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38629401

ABSTRACT

PURPOSE: Existing resources that characterize the essentiality status of genes are based on either proliferation assessment in human cell lines, viability evaluation in mouse knockouts, or constraint metrics derived from human population sequencing studies. Several repositories document phenotypic annotations for rare disorders; however, there is a lack of comprehensive reporting on lethal phenotypes. METHODS: We queried Online Mendelian Inheritance in Man for terms related to lethality and classified all Mendelian genes according to the earliest age of death recorded for the associated disorders, from prenatal death to no reports of premature death. We characterized the genes across these lethality categories, examined the evidence on viability from mouse models and explored how this information could be used for novel gene discovery. RESULTS: We developed the Lethal Phenotypes Portal to showcase this curated catalog of human essential genes. Differences in the mode of inheritance, physiological systems affected, and disease class were found for genes in different lethality categories, as well as discrepancies between the lethal phenotypes observed in mouse and human. CONCLUSION: We anticipate that this resource will aid clinicians in the diagnosis of early lethal conditions and assist researchers in investigating the properties that make these genes essential for human development.

13.
Article in English | MEDLINE | ID: mdl-38513838

ABSTRACT

BACKGROUND: Millions of people are exposed to landscape fire smoke (LFS) globally, and inhalation of LFS particulate matter (PM) is associated with poor respiratory and cardiovascular outcomes. However, how LFS affects respiratory and cardiovascular function is less well understood. OBJECTIVE: We aimed to characterize the pathophysiologic effects of representative LFS airway exposure on respiratory and cardiac function and on asthma outcomes. METHODS: LFS was generated using a customized combustion chamber. In 8-week-old female BALB/c mice, low (25 µg/m3, 24-hour equivalent) or moderate (100 µg/m3, 24-hour equivalent) concentrations of LFS PM (10 µm and below [PM10]) were administered daily for 3 (short-term) and 14 (long-term) days in the presence and absence of experimental asthma. Lung inflammation, gene expression, structural changes, and lung function were assessed. In 8-week-old male C57BL/6 mice, low concentrations of LFS PM10 were administered for 3 days. Cardiac function and gene expression were assessed. RESULTS: Short- and long-term LFS PM10 airway exposure increased airway hyperresponsiveness and induced steroid insensitivity in experimental asthma, independent of significant changes in airway inflammation. Long-term LFS PM10 airway exposure also decreased gas diffusion. Short-term LFS PM10 airway exposure decreased cardiac function and expression of gene changes relating to oxidative stress and cardiovascular pathologies. CONCLUSIONS: We characterized significant detrimental effects of physiologically relevant concentrations and durations of LFS PM10 airway exposure on lung and heart function. Our study provides a platform for assessment of mechanisms that underpin LFS PM10 airway exposure on respiratory and cardiovascular disease outcomes.

14.
Front Endocrinol (Lausanne) ; 15: 1340436, 2024.
Article in English | MEDLINE | ID: mdl-38390205

ABSTRACT

Introduction: Achieving early diagnosis of pre-symptomatic type 1 diabetes is critical to reduce potentially life-threatening diabetic ketoacidosis (DKA) at symptom onset, link patients to FDA-approved therapeutics that can delay disease progression, and support the development of novel interventional drugs. The presence of two or more islet autoantibodies in pre-symptomatic type 1 diabetes patients indicates a high risk of progression to clinical manifestation. Method: Herein, we characterized the capability of the multiplex ADAP assay to predict type 1 diabetes progression. We obtained retrospective coded sera from a cohort of 48 progressors and 44 non-progressors from the NIDDK DPT-1 study. Result: The multiplex ADAP assay and radiobinding assays (RBA) had positive predictive value (PPV)/negative predictive value (NPV) of 68%/92% and 67%/66%, respectively. The improved NPV stemmed from 12 progressors who tested positive for multiple islet autoantibodies by multiplex ADAP assay but not by RBA. Furthermore, 6 of these 12 patients tested positive for multiple islet autoantibodies by RBA in subsequent sampling events, with a median delay of 2.8 years compared to the multiplex ADAP assay. Discussion: In summary, the multiplex ADAP assay could be an ideal tool for type 1 diabetes risk testing due to its sample-sparing nature (4 µL), non-radioactive format, compatibility with widely available real-time qPCR instruments, and favorable risk prediction capability.


Subject(s)
Diabetes Mellitus, Type 1 , Diabetic Ketoacidosis , Humans , Retrospective Studies , Autoantibodies , Agglutination , Polymerase Chain Reaction
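Record 14 compares assays by PPV and NPV. A minimal worked sketch with hypothetical counts (not the study's data) shows the two definitions and why moving 12 progressors from test-negative to test-positive raises NPV:

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, NPV) from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # share of test-positives who went on to progress
    npv = tn / (tn + fn)  # share of test-negatives who did not progress
    return ppv, npv


# Hypothetical counts for a cohort of progressors and non-progressors.
before = predictive_values(tp=30, fp=10, tn=40, fn=20)
# Same cohort, but 12 progressors move from false negative to true positive.
after = predictive_values(tp=42, fp=10, tn=40, fn=8)
print(before, after)
```

Because NPV has false negatives in its denominator, every missed progressor that a more sensitive assay detects removes one false negative, so NPV rises even though the non-progressor counts are unchanged.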
15.
Int J Biometeorol ; 68(5): 899-908, 2024 May.
Article in English | MEDLINE | ID: mdl-38308729

ABSTRACT

Heat stress (HS) during the dry period of dairy cows in hot and dry conditions compromises the physiological status and mammary gland development of dairy cows, thereby negatively affecting milk component yield in the subsequent lactation. Our objective was to evaluate the effects of cooling Holstein cows under moderate or higher HS conditions (i.e., ambient temperature higher than 30 °C, with a temperature-humidity index of 78.2 units) during the dry period on prepartum physiological status, postpartum productivity, and calf growth. Twenty-four multiparous Holstein cows were divided into two groups: one with a cooling system based on spray and fans under a pen shade (CL, n = 12) and the other not cooled (NC, n = 12). The cooling system operated 10 h/d (09:00-19:00 h) for 60 d prepartum. During the morning, rectal temperature and respiration frequency were lower in CL cows, but not in the afternoon, which was attributed to higher (P < 0.01) dry matter intake by CL cows. Total serum protein was higher (P < 0.01) in CL cows, but hemoglobin was higher in NC cows (P < 0.01), with no differences in other electrolytes, hormones, hematological components, and metabolites. Milk fat, fat-corrected milk, and fat-protein-corrected milk were higher (P < 0.05) in CL cows. Female and birth weight tended (P = 0.08) to be higher in CL cows. Cooling cows during the dry period had a limited effect on physiology prepartum but increased postpartum productivity of Holstein cows under hot and dry conditions.


Subject(s)
Milk , Postpartum Period , Animals , Cattle/physiology , Female , Milk/metabolism , Postpartum Period/physiology , Pregnancy , Seasons , Lactation/physiology , Body Temperature
16.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38383067

ABSTRACT

MOTIVATION: Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open-source OntoGPT package: https://github.com/monarch-initiative/ontogpt.


Subject(s)
Knowledge Bases , Semantics , Databases, Factual
17.
BMC Med Inform Decis Mak ; 24(1): 30, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38297371

ABSTRACT

OBJECTIVE: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. RESULTS: The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best-in-class tool. Without in-context learning, however, performance is significantly below the existing approaches. CONCLUSION: Our experiments show that gpt-4.0 surpasses the state-of-the-art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.


Subject(s)
Knowledge , Language , Humans , Machine Learning , Phenotype , Rare Diseases
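Record 17 scores phenotype recognition with a document-level F1. One common reading of that metric, assumed here, compares the set of ontology term IDs predicted for each document with the gold set and micro-averages the counts across documents; the paper's exact aggregation may differ, and the document and term IDs below are placeholders:

```python
def document_level_f1(
    gold: dict[str, set[str]], predicted: dict[str, set[str]]
) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-document term sets.

    Documents missing from `predicted` count as having no predictions;
    predictions for documents absent from `gold` are ignored."""
    tp = fp = fn = 0
    for doc_id, gold_terms in gold.items():
        pred_terms = predicted.get(doc_id, set())
        tp += len(gold_terms & pred_terms)  # terms found in both sets
        fp += len(pred_terms - gold_terms)  # predicted but not in gold
        fn += len(gold_terms - pred_terms)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Placeholder documents annotated with HPO-style term IDs.
gold = {"doc1": {"HP:0001250", "HP:0001263", "HP:0001252"}, "doc2": {"HP:0000486"}}
predicted = {"doc1": {"HP:0001250", "HP:0001263", "HP:0000365"}, "doc2": {"HP:0000486"}}
print(document_level_f1(gold, predicted))
```

A document-level score ignores where and how often a term is mentioned, which is why it can differ from the mention-level F1 the abstract also reports.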
18.
Prenat Diagn ; 44(4): 454-464, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38242839

ABSTRACT

Advances in sequencing and imaging technologies enable enhanced assessment in the prenatal space, with a goal to diagnose and predict the natural history of disease, to direct targeted therapies, and to implement clinical management, including transfer of care, election of supportive care, and selection of surgical interventions. The current lack of standardization and aggregation stymies variant interpretation and gene discovery, which hinders the provision of prenatal precision medicine, leaving clinicians and patients without an accurate diagnosis. With large amounts of data generated, it is imperative to establish standards for data collection, processing, and aggregation. Aggregated and homogeneously processed genetic and phenotypic data permits dissection of the genomic architecture of prenatal presentations of disease and provides a dataset on which data analysis algorithms can be tuned to the prenatal space. Here we discuss the importance of generating aggregate data sets and how the prenatal space is driving the development of interoperable standards and phenotype-driven tools.


Subject(s)
Precision Medicine , Prenatal Diagnosis , Pregnancy , Female , Humans , Phenotype , Genomics , Algorithms
19.
medRxiv ; 2024 Jan 13.
Article in English | MEDLINE | ID: mdl-38260283

ABSTRACT

Essential genes are those whose function is required for cell proliferation and/or organism survival. A gene's intolerance to loss-of-function can be placed on a spectrum, as opposed to being considered a binary feature, since its function might be essential only at particular developmental stages, in particular genetic backgrounds, or in other contexts. Existing resources that collect and characterise the essentiality status of genes are based on proliferation assessment in human cell lines, embryonic and postnatal viability evaluation in different model organisms, or gene metrics such as intolerance to variation scores derived from human population sequencing studies. There are also several repositories available that document phenotypic annotations for rare disorders in humans such as the Online Mendelian Inheritance in Man (OMIM) and the Human Phenotype Ontology (HPO) knowledgebases. This raises the prospect of being able to use clinical data, including lethality as the most severe phenotypic manifestation, to further our characterisation of gene essentiality. Here we queried OMIM for terms related to lethality and classified all Mendelian genes into categories, according to the earliest age of death recorded for the associated disorders, from prenatal death to no reports of premature death. To showcase this curated catalogue of human essential genes, we developed the Lethal Phenotypes Portal (https://lethalphenotypes.research.its.qmul.ac.uk), where we also explore the relationships between these lethality categories, constraint metrics and viability in cell lines and mouse. Further analysis of the genes in these categories reveals differences in the mode of inheritance of the associated disorders, physiological systems affected and disease class. We highlight how the phenotypic similarity between genes in the same lethality category combined with gene family/group information can be used for novel disease gene discovery. Finally, we explore the overlaps and discrepancies between the lethal phenotypes observed in mouse and human and discuss potential explanations that include differences in transcriptional regulation, functional compensation and molecular disease mechanisms. We anticipate that this resource will aid clinicians in the diagnosis of early lethal conditions and assist researchers in investigating the properties that make these genes essential for human development.

20.
medRxiv ; 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-37503093

ABSTRACT

Objective: Large Language Models such as GPT-4 have previously been applied to differential diagnostic challenges based on published case reports. Published case reports have a sophisticated narrative style that is not readily available from typical electronic health records (EHR). Furthermore, even if such a narrative were available in EHRs, privacy requirements would preclude sending it outside the hospital firewall. We therefore tested a method for parsing clinical texts to extract ontology terms and programmatically generating prompts that by design are free of protected health information. Materials and Methods: We investigated different methods to prepare prompts from 75 recently published case reports. We transformed the original narratives by extracting structured terms representing phenotypic abnormalities, comorbidities, treatments, and laboratory tests, and created prompts programmatically. Results: Performance of all of these approaches was modest, with the correct diagnosis ranked first in only 5.3-17.6% of cases. The performance of the prompts created from structured data was substantially worse than that of the original narrative texts, even if additional information was added following manual review of term extraction. Moreover, different versions of GPT-4 demonstrated substantially different performance on this task. Discussion: The sensitivity of the performance to the form of the prompt and the instability of results across two GPT-4 versions represent important current limitations to the use of GPT-4 to support diagnosis in real-life clinical settings. Conclusion: Research is needed to identify the best methods for creating prompts from typically available clinical data to support differential diagnostics.
