Results 1 - 20 of 578
1.
Cureus ; 16(6): e61601, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38962621

ABSTRACT

Longitudinally extensive transverse myelitis (LETM) is traditionally classified as an inflammatory disorder of the spinal cord spanning three or more vertebral segments. The differential diagnosis for transverse myelitis (TM) is vast and includes infectious and nutritional causes; some reported cases are idiopathic. Autoimmune conditions such as systemic lupus erythematosus (SLE) can also, rarely, present with neurological manifestations such as LETM. In this case report, we present a 33-year-old female with a prior history of SLE who developed LETM in the setting of possible provoking factors such as nutritional deficiencies and a recent viral illness. We highlight her clinical course, recovery, and working differential diagnosis after laboratory testing and neurological imaging. Finally, we discuss the treatments that ultimately led to her successful recovery after a prolonged clinical course.

2.
Animals (Basel) ; 14(13)2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38998017

ABSTRACT

Eighty-four autumn (ACS, n = 45)- and spring (SCS, n = 39)-calved multiparous early lactation Holstein cows were assigned to one of three groups: (a) grazing + mixed ration (MR) during partial confinement in outdoor soil-bedded pens with shade (OD-GRZ); (b) grazing + MR during partial confinement in a compost-bedded pack barn with cooling (CB-GRZ); or (c) total confinement fed a total mixed ration (CB-TMR) in a compost-bedded pack barn. Data were analyzed using the SAS MIXED procedure with significance at p ≤ 0.05. In both seasons, despite behavioral differences (p < 0.05) between the OD-GRZ and CB-GRZ groups (i.e., standing, first grazing meal length, bite rate), the milk and component yields, DM intake, microbial CP output (MCP) and NE efficiency were unaffected by the housing conditions, possibly due to mild weather conditions. The milk yield was substantially higher in the CB-TMR group than in the OD-GRZ and CB-GRZ groups (p < 0.01) in both ACS (~35%) and SCS (~20%), despite there being no intake differences and without any impact on milk component levels. In ACS, this was associated with a higher MCP, likely due to the higher nutritional value of TMR compared to pasture, which was not the case in SCS. In conclusion, the OD-GRZ group achieved the same milk production as the CB-GRZ group through behavior adaptation, under mild weather conditions, in both calving seasons. The CB-TMR group outperformed the grazing systems in both calving seasons, regardless of the MCP.

3.
Front Cell Dev Biol ; 12: 1240384, 2024.
Article in English | MEDLINE | ID: mdl-38989060

ABSTRACT

Cell level functions underlie tissue and organ physiology. Gene expression patterns offer extensive views of the pathways and processes within and between cells. Single cell transcriptomics provides detailed information on gene expression within cells, cell types, subtypes and their relative proportions in organs. Functional pathways can be scalably connected to physiological functions at the cell and organ levels. Integrating experimentally obtained gene expression patterns with prior knowledge of pathway interactions enables identification of networks underlying whole cell functions such as growth, contractility, and secretion. These pathways can be computationally modeled using differential equations to simulate cell and organ physiological dynamics regulated by gene expression changes. Such computational systems can be thought of as parts of digital twins of organs. Digital twins, at the core, need computational models that represent in detail and simulate how dynamics of pathways and networks give rise to whole cell level physiological functions. Integration of transcriptomic responses and numerical simulations could simulate and predict whole cell functional outputs from transcriptomic data. We developed a computational pipeline that integrates gene expression timelines and systems of coupled differential equations to generate cell-type selective dynamical models. We tested our integrative algorithm on the eicosanoid biosynthesis network in macrophages. Converting transcriptomic changes to a dynamical model allowed us to predict dynamics of prostaglandin and thromboxane synthesis and secretion by macrophages that matched published lipidomics data obtained in the same experiments. Integration of cell-level system biology simulations with genomic and clinical data using a knowledge graph framework will allow us to create explicit predictive models that mechanistically link genomic determinants to organ function. 
Such integration requires a multi-domain ontological framework to connect genomic determinants to gene expression and cell pathways and functions to organ level phenotypes in healthy and diseased states. These integrated scalable models of tissues and organs as accurate digital twins predict health and disease states for precision medicine.
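
The coupling of an expression timeline to a differential equation, as described above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the single enzyme-catalysed step, the rate constants, and the function name `simulate_product` are all hypothetical, and forward Euler is used only for brevity.

```python
def simulate_product(expression, k_cat=0.5, k_deg=0.1, substrate=1.0,
                     dt=0.01, t_end=10.0):
    """Forward-Euler integration of dP/dt = k_cat * E(t) * S - k_deg * P.

    `expression` is a function E(t) returning the relative enzyme
    expression level at time t (a stand-in for a transcriptomic timeline).
    All rate constants are illustrative, not fitted values.
    """
    product, t = 0.0, 0.0
    trajectory = [product]
    for _ in range(int(round(t_end / dt))):
        d_product = k_cat * expression(t) * substrate - k_deg * product
        product += d_product * dt
        t += dt
        trajectory.append(product)
    return trajectory


# Constant expression versus expression that ramps up threefold over 5 time units.
baseline = simulate_product(lambda t: 1.0)
induced = simulate_product(lambda t: 1.0 + 2.0 * min(t / 5.0, 1.0))
```

Comparing `baseline[-1]` with `induced[-1]` shows how a change in expression alone shifts the simulated product level, which is the kind of prediction the abstract describes at much larger scale.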

5.
bioRxiv ; 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-39005436

ABSTRACT

Objectives: Concept embeddings are low-dimensional vector representations of concepts such as MeSH:D009203 (Myocardial Infarction), whose similarity in the embedded vector space reflects their semantic similarity. Here, we test the hypothesis that non-biomedical concept synonym replacement can improve the quality of biomedical concept embeddings. Materials and methods: We developed an approach that leverages WordNet to replace sets of synonyms with the most common representative of the synonym set. Results: We tested our approach on 1055 concept sets and found that, on average, the mean intra-cluster distance was reduced by 8% in the vector space. Assuming that homophily of related concepts in the vector space is desirable, our approach tends to improve the quality of embeddings. Discussion and Conclusion: This pilot study shows that non-biomedical synonym replacement tends to improve the quality of embeddings of biomedical concepts using the Word2Vec algorithm. We have implemented our approach in a freely available Python package at https://github.com/TheJacksonLaboratory/wn2vec.
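
The synonym-replacement step can be illustrated with a small sketch. It does not use WordNet or the wn2vec package: the synonym sets are hard-coded, and "most common representative" is interpreted here as the member that occurs most often in the corpus, which is an assumption. The function names are hypothetical.

```python
from collections import Counter


def build_replacement_map(synonym_sets, corpus_tokens):
    """Map every word in a synonym set to that set's most frequent member."""
    counts = Counter(corpus_tokens)
    mapping = {}
    for synset in synonym_sets:
        # Highest corpus count wins; ties are broken alphabetically
        # so the result is deterministic.
        representative = min(synset, key=lambda w: (-counts[w], w))
        for word in synset:
            mapping[word] = representative
    return mapping


def replace_synonyms(tokens, mapping):
    """Rewrite a token stream, leaving unmapped tokens unchanged."""
    return [mapping.get(tok, tok) for tok in tokens]


tokens = ("a large study found a big effect and a large fast response "
          "after quick onset and fast decay").split()
synsets = [{"large", "big"}, {"fast", "quick"}]  # toy stand-ins for WordNet synsets
mapping = build_replacement_map(synsets, tokens)
normalized = replace_synonyms(tokens, mapping)
```

The normalized token stream would then be passed to an embedding algorithm such as Word2Vec in place of the raw text.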

6.
Bioinformatics ; 40(7)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38913850

ABSTRACT

MOTIVATION: Human Phenotype Ontology (HPO)-based phenotype concept recognition (CR) underpins a faster and more effective mechanism to create patient phenotype profiles or to document novel phenotype-centred knowledge statements. While the increasing adoption of large language models (LLMs) for natural language understanding has led to several LLM-based solutions, we argue that their intrinsic resource-intensive nature is not suitable for realistic management of the phenotype CR lifecycle. Consequently, we propose to go back to the basics and adopt a dictionary-based approach that enables both an immediate refresh of the ontological concepts and efficient re-analysis of past data. RESULTS: We developed a dictionary-based approach that uses a pre-built large collection of clusters of morphologically equivalent tokens to address lexical variability, and a more effective CR step that restricts entity boundary detection strictly to candidates consisting of tokens belonging to ontology concepts. Our method achieves state-of-the-art results (0.76 F1 on the GSC+ corpus) and a processing efficiency of 10 000 publication abstracts in 5 s. AVAILABILITY AND IMPLEMENTATION: FastHPOCR is available as a Python package installable via pip. The source code is available at https://github.com/tudorgroza/fast_hpo_cr. A Java implementation of FastHPOCR will be made available as part of the Fenominal Java library available at https://github.com/monarch-initiative/fenominal. The up-to-date GSC-2024 corpus is available at https://github.com/tudorgroza/code-for-papers/tree/main/gsc-2024.
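
The two ideas in the RESULTS section, token clusters for lexical variability and candidate spans restricted to ontology tokens, can be sketched as follows. This is not the FastHPOCR API: the three-entry dictionary, the hand-written cluster map, and the `recognize` function are hypothetical stand-ins chosen to keep the example self-contained.

```python
import re

# Toy cluster map: morphological variants point to one canonical token.
CLUSTERS = {"seizures": "seizure", "delays": "delay", "delayed": "delay"}

# Toy dictionary: canonical token sequence -> HPO identifier.
CONCEPTS = {
    ("seizure",): "HP:0001250",
    ("hypotonia",): "HP:0001252",
    ("global", "developmental", "delay"): "HP:0001263",
}
VOCAB = {tok for label in CONCEPTS for tok in label}
MAX_LEN = max(len(label) for label in CONCEPTS)


def normalize(token):
    token = token.lower()
    return CLUSTERS.get(token, token)


def recognize(text):
    """Return (matched label, HPO id) pairs, longest match first."""
    tokens = [normalize(t) for t in re.findall(r"[A-Za-z]+", text)]
    hits, i = [], 0
    while i < len(tokens):
        # Only tokens that occur in some ontology label can start a candidate.
        if tokens[i] not in VOCAB:
            i += 1
            continue
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in CONCEPTS:
                hits.append((" ".join(span), CONCEPTS[span]))
                i += n
                break
        else:
            i += 1  # in-vocabulary token, but no label starts here
    return hits


hits = recognize("The patient had seizures, hypotonia and global developmental delays.")
```

Because recognition is a dictionary lookup, refreshing the ontology only means rebuilding `CONCEPTS`, which is the property the abstract emphasises over LLM-based approaches.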


Subject(s)
Biological Ontologies , Phenotype , Humans , Natural Language Processing , Software , Algorithms
7.
medRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38854034

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
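
For orientation, a phenopacket is typically exchanged as JSON. The sketch below builds a small example as a Python dictionary; the field names follow the GA4GH Phenopacket Schema version 2 as commonly documented, but the record is not validated against the official schema, and all identifiers, dates, and the helper `observed_terms` are hypothetical.

```python
import json

phenopacket = {
    "id": "example-phenopacket-1",  # hypothetical identifier
    "subject": {"id": "proband-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
        # "excluded" marks a feature that was looked for and not observed.
        {"type": {"id": "HP:0001252", "label": "Hypotonia"}, "excluded": True},
    ],
    "metaData": {
        "created": "2024-01-01T00:00:00Z",  # placeholder timestamp
        "createdBy": "example-curator",     # placeholder curator
        "resources": [{
            "id": "hp",
            "name": "human phenotype ontology",
            "url": "http://purl.obolibrary.org/obo/hp.owl",
            "version": "unspecified",       # placeholder, not a real release
            "namespacePrefix": "HP",
            "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
        }],
        "phenopacketSchemaVersion": "2.0",
    },
}


def observed_terms(packet):
    """Return the HPO ids of features that were observed (not excluded)."""
    return [
        feature["type"]["id"]
        for feature in packet["phenotypicFeatures"]
        if not feature.get("excluded", False)
    ]


serialized = json.dumps(phenopacket, indent=2)
```

A collection such as phenopacket-store would hold many records of this general shape, which downstream tools can load with `json.loads` and filter with helpers like the one above.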

8.
bioRxiv ; 2024 Jun 16.
Article in English | MEDLINE | ID: mdl-38915571

ABSTRACT

Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and openly available patient data to assess the efficacy of VGPAs - ultimately hindering the development of effective prioritisation tools. Results: In this paper, we present our benchmarking tool, PhEval, which aims to provide a standardised and empirical framework to evaluate phenotype-driven VGPAs. The inclusion of standardised test corpora and test corpus generation tools in the PhEval suite of tools allows open benchmarking and comparison of methods on standardised data sets. Conclusions: PhEval and the standardised test corpora solve the issues of patient data availability and experimental tooling configuration when benchmarking and comparing rare disease VGPAs. By providing standardised data on patient cohorts from real-world case-reports and controlling the configuration of evaluated VGPAs, PhEval enables transparent, portable, comparable and reproducible benchmarking of VGPAs. As these tools are often a key component of many rare disease diagnostic pipelines, a thorough and standardised method of assessment is essential for improving patient diagnosis and care.

10.
Transl Psychiatry ; 14(1): 246, 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38851761

ABSTRACT

Acute COVID-19 infection can be followed by diverse clinical manifestations referred to as Post-Acute Sequelae of SARS-CoV-2 Infection (PASC). Studies have shown an increased risk of being diagnosed with new-onset psychiatric disease following a diagnosis of acute COVID-19. However, it was unclear whether non-psychiatric PASC-associated manifestations (PASC-AMs) are associated with an increased risk of new-onset psychiatric disease following COVID-19. A retrospective electronic health record (EHR) cohort study of 2,391,006 individuals with acute COVID-19 was performed to evaluate whether non-psychiatric PASC-AMs are associated with new-onset psychiatric disease. Data were obtained from the National COVID Cohort Collaborative (N3C), which has EHR data from 76 clinical organizations. EHR codes were mapped to 151 non-psychiatric PASC-AMs recorded 28-120 days following SARS-CoV-2 diagnosis and before diagnosis of new-onset psychiatric disease. Association of newly diagnosed psychiatric disease with age, sex, race, pre-existing comorbidities, and PASC-AMs in seven categories was assessed by logistic regression. There were significant associations between a diagnosis of any psychiatric disease and five categories of PASC-AMs; odds ratios were highest for neurological, cardiovascular, and constitutional PASC-AMs (1.31, 1.29, and 1.23, respectively). Secondary analysis revealed that the proportions of 50 individual clinical features significantly differed between patients diagnosed with different psychiatric diseases. Our study provides evidence for association between non-psychiatric PASC-AMs and the incidence of newly diagnosed psychiatric disease. Significant associations were found for features related to multiple organ systems. This information could prove useful in understanding risk stratification for new-onset psychiatric disease following COVID-19. Prospective studies are needed to corroborate these findings.
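
To make the reported odds ratios concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. It is a simplification: the study used logistic regression with covariates, whereas this example ignores adjustment, and the counts are invented so that the estimate lands on 1.31.

```python
import math


def odds_ratio(a, b, c, d):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    estimate = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * standard_error)
    upper = math.exp(math.log(estimate) + 1.96 * standard_error)
    return estimate, lower, upper


# Hypothetical counts, not data from the study.
estimate, lower, upper = odds_ratio(131, 1000, 1000, 10000)
```

An odds ratio of 1.31 means the odds of the outcome are 31% higher in the exposed group; in the study the corresponding estimates are additionally adjusted for age, sex, race, and comorbidities.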


Subject(s)
COVID-19 , Mental Disorders , SARS-CoV-2 , Humans , COVID-19/psychology , COVID-19/complications , COVID-19/epidemiology , Male , Female , Mental Disorders/epidemiology , Middle Aged , Adult , Retrospective Studies , Aged , Phenotype , Post-Acute COVID-19 Syndrome , Comorbidity , Electronic Health Records , Young Adult , Risk Factors , Adolescent
11.
EBioMedicine ; 104: 105144, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38723553

ABSTRACT

BACKGROUND: Two or more autoantibodies against either insulin (IAA), glutamic acid decarboxylase (GADA), islet antigen-2 (IA-2A) or zinc transporter 8 (ZnT8A) denote stage 1 (normoglycemia) or stage 2 (dysglycemia) type 1 diabetes prior to stage 3 type 1 diabetes. Automated multiplex Antibody Detection by Agglutination-PCR (ADAP) assays in two laboratories were compared to singleplex radiobinding assays (RBA) to define threshold levels for diagnostic specificity and sensitivity. METHODS: IAA, GADA, IA-2A and ZnT8A were analysed in 1504 (54% females) population-based controls (PBC), 456 (55% females) doctor's office controls (DOC) and 535 (41% females) blood donor controls (BDC) as well as in 2300 (48% females) patients newly diagnosed (1-10 years of age) with stage 3 type 1 diabetes. The thresholds for autoantibody positivity were computed in 100 10-fold cross-validations to separate patients from controls either by maximizing the χ² statistic (chisq) or using the 98th percentile of specificity (Spec98). Mean and 95% CI for threshold, sensitivity and specificity are presented. FINDINGS: The ADAP ROC curves of the four autoantibodies showed comparable AUC in the two ADAP laboratories, and the AUC was higher than for RBA. Detection of two or more autoantibodies using chisq showed 0.97 (0.95, 0.99) sensitivity and 0.94 (0.91, 0.97) specificity in ADAP compared to 0.90 (0.88, 0.95) sensitivity and 0.97 (0.94, 0.98) specificity in RBA. Using Spec98, ADAP showed 0.92 (0.89, 0.95) sensitivity and 0.99 (0.98, 1.00) specificity compared to 0.89 (0.77, 0.86) sensitivity and 1.00 (0.99, 1.00) specificity in the RBA. The diagnostic sensitivity and specificity were higher in PBC compared to DOC and BDC. INTERPRETATION: ADAP performed comparably in the two laboratories, and in both it was comparable to or better than RBA for defining threshold levels for two or more autoantibodies to stage type 1 diabetes. FUNDING: Supported by The Leona M. and Harry B. Helmsley Charitable Trust (grant number 2009-04078), the Swedish Foundation for Strategic Research (Dnr IRC15-0067) and the Swedish Research Council, Strategic Research Area (Dnr 2009-1039). AL was supported by the DiaUnion collaborative study, co-financed by EU Interreg ÖKS, Capital Region of Denmark, Region Skåne and the Novo Nordisk Foundation.
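
The Spec98 idea, placing the positivity threshold at the 98th percentile of the control distribution and then reading off sensitivity and specificity, can be sketched as below. This is an illustration only: it uses a nearest-rank percentile on a single split rather than repeated cross-validation, and the integer "assay signals" are invented.

```python
def spec_threshold(control_values, target_percent=98):
    """Nearest-rank percentile of the control values (integer arithmetic)."""
    ordered = sorted(control_values)
    rank = (target_percent * len(ordered) + 99) // 100  # ceiling division
    return ordered[rank - 1]


def sensitivity_specificity(patients, controls, threshold):
    """A sample counts as positive when its value exceeds the threshold."""
    true_positives = sum(value > threshold for value in patients)
    true_negatives = sum(value <= threshold for value in controls)
    return true_positives / len(patients), true_negatives / len(controls)


# Hypothetical signals: controls 0..99, patients 90..139.
controls = list(range(100))
patients = [90 + i for i in range(50)]

threshold = spec_threshold(controls)
sensitivity, specificity = sensitivity_specificity(patients, controls, threshold)
```

Fixing specificity first and accepting whatever sensitivity results is the trade-off that distinguishes the Spec98 thresholds from the chisq thresholds reported in the abstract.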


Subject(s)
Autoantibodies , Diabetes Mellitus, Type 1 , Humans , Diabetes Mellitus, Type 1/immunology , Diabetes Mellitus, Type 1/diagnosis , Diabetes Mellitus, Type 1/blood , Autoantibodies/blood , Autoantibodies/immunology , Female , Male , Child , Child, Preschool , Infant , Zinc Transporter 8/immunology , Sensitivity and Specificity , Receptor-Like Protein Tyrosine Phosphatases, Class 8/immunology , Glutamate Decarboxylase/immunology , ROC Curve , Mass Screening/methods
12.
Front Robot AI ; 11: 1362735, 2024.
Article in English | MEDLINE | ID: mdl-38694882

ABSTRACT

We introduce a novel approach to training data augmentation in brain-computer interfaces (BCIs) using neural field theory (NFT) applied to EEG data from motor imagery tasks. BCIs often suffer from limited accuracy due to a limited amount of training data. To address this, we leveraged a corticothalamic NFT model to generate artificial EEG time series as supplemental training data. We employed the BCI competition IV '2a' dataset to evaluate this augmentation technique. For each individual, we fitted the model to common spatial patterns of each motor imagery class, jittered the fitted parameters, and generated time series for data augmentation. Our method led to significant accuracy improvements of over 2% in classifying the "total power" feature, but not in the case of the "Higuchi fractal dimension" feature. This suggests that the fit NFT model may more favorably represent one feature than the other. These findings pave the way for further exploration of NFT-based data augmentation, highlighting the benefits of biophysically accurate artificial data.

13.
BMC Med Res Methodol ; 24(1): 120, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802749

ABSTRACT

BACKGROUND: To describe the methodology for conducting the CalScope study, a remote, population-based survey launched by the California Department of Public Health (CDPH) to estimate SARS-CoV-2 seroprevalence and understand COVID-19 disease burden in California. METHODS: Between April 2021 and August 2022, 666,857 randomly selected households were invited by mail to complete an online survey and at-home test kit for up to one adult and one child. A gift card was given for each completed survey and test kit. Multiple customized REDCap databases were used to create a data system which provided task automation and scalable data management through API integrations. Support infrastructure was developed to manage follow-up for participant questions and a communications plan was used for outreach through local partners. RESULTS: Across 3 waves, 32,671 out of 666,857 (4.9%) households registered, 6.3% by phone using an interactive voice response (IVR) system and 95.7% in English. Overall, 25,488 (78.0%) households completed surveys, while 23,396 (71.6%) households returned blood samples for testing. Support requests (n = 5,807) received through the web-based form (36.3%), by email (34.1%), and voicemail (29.7%) were mostly concerned with the test kit (31.6%), test result (26.8%), and gift card (21.3%). CONCLUSIONS: Ensuring a well-integrated and scalable data system, responsive support infrastructure for participant follow-up, and appropriate academic and local health department partnerships for study management and communication allowed for successful rollout of a large population-based survey. Remote data collection utilizing online surveys and at-home test kits can complement routine surveillance data for a state health department.


Subject(s)
COVID-19 , Dried Blood Spot Testing , SARS-CoV-2 , Humans , COVID-19/epidemiology , COVID-19/diagnosis , Seroepidemiologic Studies , California/epidemiology , SARS-CoV-2/immunology , Dried Blood Spot Testing/methods , Dried Blood Spot Testing/statistics & numerical data , Adult , Surveys and Questionnaires , Male , Female , Child , Middle Aged , Adolescent
14.
bioRxiv ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38712026

ABSTRACT

P21-activated kinase 2 (PAK2) is a serine/threonine kinase essential for a variety of cellular processes including signal transduction, cellular survival, proliferation, and migration. A recent report proposed that monoallelic PAK2 variants cause Knobloch syndrome type 2 (KNO2), a developmental disorder primarily characterized by ocular anomalies. Here, we identified a novel de novo heterozygous missense variant in PAK2, NM_002577.4:c.1273G>A, p.(D425N), by whole genome sequencing in an individual with features consistent with KNO2. Notable clinical phenotypes include global developmental delay, congenital retinal detachment, mild cerebral ventriculomegaly, hypotonia, failure to thrive (FTT), pyloric stenosis, feeding intolerance, patent ductus arteriosus, and mild facial dysmorphism. The p.(D425N) variant lies within the protein kinase domain and is predicted to be functionally damaging by in silico analysis. Previous clinical genetic testing did not report this variant because the relevance of PAK2 variants was unknown at the time of testing, highlighting the importance of reanalysis. Our findings also substantiate the candidacy of PAK2 variants in KNO2 and expand the KNO2 clinical spectrum.

15.
Sci Data ; 11(1): 363, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.


Subject(s)
Biological Science Disciplines , Knowledge Bases , Pattern Recognition, Automated , Algorithms , Translational Research, Biomedical
16.
Int J Med Inform ; 187: 105461, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38643701

ABSTRACT

OBJECTIVE: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (for example, endometriosis, ovarian cyst, and uterine fibroids). MATERIALS AND METHODS: We harmonized survey data from the Personalized Environment and Genes Study (PEGS) on internal and external environmental exposures and health conditions with biomedical ontology content. We merged the harmonized data and ontologies with supplemental nutrient and agricultural chemical data to create a KG. We analyzed the KG by embedding edges and applying a random forest for edge prediction to identify variables potentially associated with FRDs. We also conducted logistic regression analysis for comparison. RESULTS: Across 9765 PEGS respondents, the KG analysis resulted in 8535 significant or suggestive predicted links between FRDs and chemicals, phenotypes, and diseases. Amongst these links, 32 were exact matches when compared with the logistic regression results, including comorbidities, medications, foods, and occupational exposures. DISCUSSION: Mechanistic underpinnings of predicted links documented in the literature may support some of our findings. Our KG methods are useful for predicting possible associations in large, survey-based datasets with added information on directionality and magnitude of effect from logistic regression. These results should not be construed as causal but can support hypothesis generation. CONCLUSION: This investigation enabled the generation of hypotheses on a variety of potential links between FRDs and exposures. Future investigations should prospectively evaluate the variables hypothesized to impact FRDs.


Subject(s)
Environmental Exposure , Humans , Female , Environmental Exposure/adverse effects , Genital Diseases, Female , Logistic Models , Nutritional Status , Diet , Adult , Random Forest
17.
Heart ; 110(15): 974-979, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38649264

ABSTRACT

Valvular heart disease, including calcific or degenerative aortic stenosis (AS), is increasingly prevalent among the older adult population. Over the last few decades, treatment of severe AS has been revolutionised following the development of transcatheter aortic valve replacement (TAVR). Despite improvements in outcomes, older adults with competing comorbidities and geriatric syndromes have suboptimal quality of life outcomes, highlighting the cumulative vulnerability that persists despite valve replacement. Sarcopenia, characterised by loss of muscle strength, mass and function, affects 21%-70% of older adults with AS. Sarcopenia is an independent predictor of short-term and long-term outcomes after TAVR and should be incorporated as a prognostic marker in preprocedural planning. Early diagnosis and treatment of sarcopenia may reduce morbidity and mortality and improve quality of life following TAVR. The adverse effects of sarcopenia can be mitigated through resistance training and optimisation of nutritional status. This is most efficacious when administered before sarcopenia has progressed to advanced stages. Management should be individualised based on the patient's wishes/preferences, care goals and physical capability. Exercise during the preoperative waiting period may be safe and effective in most patients with severe AS. However, future studies are needed to establish the benefits of prehabilitation in improving quality of life outcomes after TAVR procedures.


Subject(s)
Sarcopenia , Transcatheter Aortic Valve Replacement , Humans , Sarcopenia/diagnosis , Sarcopenia/therapy , Sarcopenia/physiopathology , Sarcopenia/etiology , Transcatheter Aortic Valve Replacement/adverse effects , Transcatheter Aortic Valve Replacement/methods , Quality of Life , Aortic Valve Stenosis/surgery , Aortic Valve Stenosis/diagnosis , Aortic Valve Stenosis/physiopathology , Aortic Valve/surgery , Risk Factors , Aged , Aortic Valve Disease/surgery , Aortic Valve Disease/therapy , Treatment Outcome
18.
Genet Med ; 26(7): 101141, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38629401

ABSTRACT

PURPOSE: Existing resources that characterize the essentiality status of genes are based on either proliferation assessment in human cell lines, viability evaluation in mouse knockouts, or constraint metrics derived from human population sequencing studies. Several repositories document phenotypic annotations for rare disorders; however, there is a lack of comprehensive reporting on lethal phenotypes. METHODS: We queried Online Mendelian Inheritance in Man for terms related to lethality and classified all Mendelian genes according to the earliest age of death recorded for the associated disorders, from prenatal death to no reports of premature death. We characterized the genes across these lethality categories, examined the evidence on viability from mouse models and explored how this information could be used for novel gene discovery. RESULTS: We developed the Lethal Phenotypes Portal to showcase this curated catalog of human essential genes. Differences in the mode of inheritance, physiological systems affected, and disease class were found for genes in different lethality categories, as well as discrepancies between the lethal phenotypes observed in mouse and human. CONCLUSION: We anticipate that this resource will aid clinicians in the diagnosis of early lethal conditions and assist researchers in investigating the properties that make these genes essential for human development.


Subject(s)
Genes, Lethal , Genetic Diseases, Inborn , Phenotype , Humans , Animals , Mice , Genetic Diseases, Inborn/genetics , Databases, Genetic , Disease Models, Animal , Genes, Essential/genetics
19.
Bioinform Adv ; 4(1): vbae036, 2024.
Article in English | MEDLINE | ID: mdl-38577542

ABSTRACT

Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations of nodes and other graph elements called embeddings. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes. For this reason, classification procedures are forced to assume that the vast majority of unlabeled edges are negative. Existing approaches to sampling negative edges for training and evaluating classifiers do so by uniformly sampling pairs of nodes. Results: We show here that this sampling strategy typically leads to sets of positive and negative examples with imbalanced node degree distributions. Using a representative heterogeneous biomedical knowledge graph and random walk-based graph machine learning, we show that this strategy substantially impacts classification performance. If users of graph machine-learning models apply the models to prioritize examples that are drawn from approximately the same distribution as the positive examples, then the performance of models as estimated in the validation phase may be artificially inflated. We present a degree-aware node sampling approach that mitigates this effect and is simple to implement. Availability and implementation: Our code and data are publicly available at https://github.com/monarch-initiative/negativeExampleSelection.
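
One simple way to make negative sampling degree-aware is to draw each endpoint with probability proportional to its degree instead of uniformly. The sketch below shows that idea on a toy graph; it is an assumption about what such a scheme could look like, not the procedure implemented in the negativeExampleSelection repository, and the function name and attempt cap are hypothetical.

```python
import random


def degree_aware_negatives(edges, n_samples, seed=0, max_attempts=10000):
    """Sample non-edges whose endpoints are drawn proportionally to degree."""
    rng = random.Random(seed)
    positive = {frozenset(edge) for edge in edges}
    # Each node appears once per incident edge, so a uniform draw from this
    # list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    negatives = set()
    attempts = 0
    while len(negatives) < n_samples and attempts < max_attempts:
        attempts += 1
        u, v = rng.choice(endpoints), rng.choice(endpoints)
        pair = frozenset((u, v))
        if u != v and pair not in positive:
            negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)


# Toy graph: a hub connected to four nodes plus one extra edge.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
negatives = degree_aware_negatives(edges, n_samples=3)
```

With uniform node sampling, low-degree nodes dominate the negative set; drawing from the endpoint list instead pulls the degree distribution of negatives toward that of the positive edges.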

20.
bioRxiv ; 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38617362

ABSTRACT

Many data resources generate, process, store, or provide kidney related molecular, pathological, and clinical data. Reference ontologies offer an opportunity to support knowledge and data integration. The Kidney Precision Medicine Project (KPMP) team contributed to the representation and addition of 329 kidney phenotype terms to the Human Phenotype Ontology (HPO), and identified many subcategories of acute kidney injury (AKI) or chronic kidney disease (CKD). The Kidney Tissue Atlas Ontology (KTAO) imports and integrates kidney-related terms from existing ontologies (e.g., HPO, CL, and Uberon) and represents 259 kidney-related biomarkers. We also developed a precision medicine metadata ontology (PMMO) to integrate 50 variables from KPMP and CZ CellxGene data resources and applied PMMO for integrative kidney data analysis. The gene expression profiles of kidney gene biomarkers were specifically analyzed under healthy control or AKI/CKD disease statuses. This work demonstrates how ontology-based approaches support multi-domain data and knowledge integration in precision medicine.
