ABSTRACT
PURPOSE: Rare diseases pose significant challenges in diagnosis and treatment due to their low prevalence and heterogeneous clinical presentations. Unstructured clinical notes contain valuable information for identifying rare diseases, but manual curation is time-consuming and prone to subjectivity. This study aims to develop a hybrid approach combining dictionary-based natural language processing (NLP) tools with large language models (LLMs) to improve rare disease identification from unstructured clinical reports. METHODS: We propose a novel hybrid framework that integrates the Orphanet Rare Disease Ontology (ORDO) and the Unified Medical Language System (UMLS) to create a comprehensive rare disease vocabulary. SemEHR, a dictionary-based NLP tool, is employed to extract rare disease mentions from clinical notes. To refine the results and improve accuracy, we leverage various LLMs, including LLaMA3, Phi3-mini, and domain-specific models like OpenBioLLM and BioMistral. Different prompting strategies, such as zero-shot, few-shot, and knowledge-augmented generation, are explored to optimize the LLMs' performance. RESULTS: The proposed hybrid approach demonstrates superior performance compared to traditional NLP systems and standalone LLMs. LLaMA3 and Phi3-mini achieve the highest F1 scores in rare disease identification. Few-shot prompting with 1-3 examples yields the best results, while knowledge-augmented generation shows limited improvement. Notably, the approach uncovers a significant number of potential rare disease cases not documented in structured diagnostic records, highlighting its ability to identify previously unrecognized patients. CONCLUSION: The hybrid approach combining dictionary-based NLP tools with LLMs shows great promise for improving rare disease identification from unstructured clinical reports. By leveraging the strengths of both techniques, the method demonstrates superior performance and the potential to uncover hidden rare disease cases. 
Further research is needed to address limitations related to ontology mapping and overlapping case identification, and to integrate the approach into clinical practice for early diagnosis and improved patient outcomes.
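The two-stage idea described above (dictionary matching followed by LLM verification with few-shot prompting) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the vocabulary, the concept identifiers, the prompt wording, and the function names `find_mentions` and `build_few_shot_prompt` are all hypothetical, and no actual LLM is called.

```python
import re

# Hypothetical mini-vocabulary with placeholder identifiers; a real one
# would be derived from ORDO and UMLS as the abstract describes.
RARE_DISEASE_TERMS = {
    "cystic fibrosis": "CONCEPT-0001",
    "huntington disease": "CONCEPT-0002",
}


def find_mentions(note, vocabulary):
    """Dictionary step: return (term, concept_id, start, end) for each match."""
    mentions = []
    for term, concept_id in vocabulary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            mentions.append((term, concept_id, match.start(), match.end()))
    # Sort by position in the note so downstream steps see mentions in order.
    return sorted(mentions, key=lambda m: m[2])


def build_few_shot_prompt(note, term, examples):
    """LLM step (prompt only): assemble a few-shot verification prompt.

    `examples` is a list of (text, term, label) tuples; the wording of the
    instruction is an assumption made for illustration.
    """
    lines = [
        "Decide whether the term is a true rare disease mention for this patient.",
        "Reply with yes or no.",
        "",
    ]
    for ex_text, ex_term, ex_label in examples:
        lines += [f"Text: {ex_text}", f"Term: {ex_term}", f"Answer: {ex_label}", ""]
    lines += [f"Text: {note}", f"Term: {term}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    note = "Family history of cystic fibrosis; patient has Huntington disease."
    examples = [
        ("Mother had cystic fibrosis.", "cystic fibrosis", "no"),
        ("Diagnosed with Huntington disease in 2019.", "huntington disease", "yes"),
    ]
    for term, concept_id, start, end in find_mentions(note, RARE_DISEASE_TERMS):
        print(concept_id, note[start:end])
        print(build_few_shot_prompt(note, term, examples))
```

The dictionary step favours recall, while the prompt gives the model the context it needs to reject mentions such as family history, which is the kind of refinement the abstract attributes to the LLM stage.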
Subjects
Natural Language Processing , Rare Diseases , Unified Medical Language System , Rare Diseases/diagnosis , Humans , Phenotype , Electronic Health Records , Biological Ontologies
ABSTRACT
Background: Hepatocellular carcinoma (HCC) is a major health problem with more than 850,000 cases per year worldwide. It is now the third leading cause of cancer-related deaths worldwide, and the number is rising. Cancer cells develop anoikis resistance, which is a vital step during cancer progression and metastatic colonization. However, little research specifically addresses the role of anoikis in HCC, especially in terms of prognosis. Methods: This study obtained gene expression data and clinical information from 371 HCC patients through The Cancer Genome Atlas (TCGA) Program and the Gene Expression Omnibus (GEO) databases. A total of 516 anoikis-related genes (ANRGs) were retrieved from the GeneCards database and the Harmonizome portal. Differential expression analysis identified 219 differentially expressed genes (DEGs), and univariate Cox regression analysis was used to select 99 ANRGs associated with the prognosis of HCC patients. A seven-gene risk scoring model was established using least absolute shrinkage and selection operator (LASSO) regression, and the model was internally validated. Results: The 99 identified ANRGs are closely associated with the prognosis of HCC patients. The risk scoring model based on seven characteristic genes demonstrates excellent predictive performance, further validated by receiver operating characteristic (ROC) curves and Kaplan-Meier survival curves. The study reveals significant differences in immune cell infiltration, gene expression, and survival status among different risk groups. Conclusions: The prognosis of HCC patients can be predicted using a unique prognostic model built on ANRGs in HCC.
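A gene-based risk score of the kind described above is commonly computed as a weighted sum of expression values, with patients then split into risk groups. The sketch below assumes that form. The gene names, coefficients, and the median cutoff are hypothetical, since the abstract does not list the seven genes, their weights, or the grouping rule.

```python
from statistics import median

# Hypothetical genes and LASSO-style coefficients, for illustration only.
COEFFICIENTS = {"GENE_A": 0.30, "GENE_B": -0.20, "GENE_C": 0.10}


def risk_score(expression, coefficients):
    """Weighted sum of expression values over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def assign_risk_groups(patients, coefficients):
    """Label each patient 'high' or 'low' relative to the median score.

    The median cutoff is an assumption; other cutoffs are also used in practice.
    """
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


if __name__ == "__main__":
    patients = {
        "p1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0},
        "p2": {"GENE_A": 0.0, "GENE_B": 1.0, "GENE_C": 1.0},
        "p3": {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 0.0},
        "p4": {"GENE_A": 3.0, "GENE_B": 0.0, "GENE_C": 1.0},
    }
    print(assign_risk_groups(patients, COEFFICIENTS))
```

The resulting groups are what survival curves and ROC analyses would then compare.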
ABSTRACT
Gastric cancer (GC) is one of the most common malignant tumors worldwide and the fourth leading cause of cancer-related deaths, with a relatively high incidence among the elderly population. Surgical resection is the mainstay treatment for GC and is currently the only cure. However, the incidence of postoperative intraabdominal infections remains high and seriously affects the prognosis. This study aimed to explore the risk factors for intraabdominal infections after radical gastrectomy in elderly patients and to establish and validate a risk prediction model. We collected the clinical data of 322 GC patients who underwent radical gastrectomy at the General Surgery Department of China Medical University Dandong Central Hospital from January 2016 to January 2023. The patients were divided into an infected group (n = 27) and a noninfected group (n = 295) according to whether intraabdominal infections occurred postoperatively. A nomogram risk prediction model for the occurrence of postoperative intraabdominal infections was developed. All patients were randomized into a training set (n = 225) and a validation set (n = 97) in a 7:3 ratio, and the model was internally validated. Of the 322 patients, 27 (8.3%) experienced postoperative intraabdominal infections. Single-factor analysis revealed associations of intraabdominal infection with body mass index, glucose, hemoglobin, albumin, and other factors. The multifactorial analysis confirmed that body mass index, glucose, hemoglobin, albumin, surgical duration, and bleeding volume were independent risk factors for intraabdominal infections. The nomogram constructed based on these factors demonstrated excellent performance in both the training and validation sets. A nomogram model was developed and validated to predict the risk of intraabdominal infection after radical gastrectomy. The model has good predictive performance, which could help clinicians prevent the occurrence of intraabdominal infections after radical gastrectomy in elderly patients.
Subjects
Intraabdominal Infections , Stomach Neoplasms , Aged , Humans , Albumins , Gastrectomy/adverse effects , Glucose , Hemoglobins , Intraabdominal Infections/etiology , Intraabdominal Infections/complications , Nomograms , Retrospective Studies , Stomach Neoplasms/pathology
ABSTRACT
Nonalcoholic fatty liver disease (NAFLD) is a spectrum of chronic liver disease. The condition ranges from isolated excessive hepatocyte triglyceride accumulation and steatosis (nonalcoholic fatty liver, NAFL), to hepatic triglyceride accumulation plus inflammation and hepatocyte injury (nonalcoholic steatohepatitis, NASH), and finally to hepatic fibrosis and cirrhosis and/or hepatocellular carcinoma (HCC). However, the mechanism driving this process is not yet clear. We obtained sample microarray data from the GEO database and extracted 6 healthy liver samples, 74 nonalcoholic hepatitis samples, 8 liver cirrhosis samples, and 53 liver cancer samples from the GSE164760 dataset. We used the GEO2R tool to analyze differentially expressed genes (DEGs) across disease progression (nonalcoholic hepatitis versus healthy, cirrhosis versus nonalcoholic hepatitis, and liver cancer versus cirrhosis) and a necroptosis gene set. Gene set variation analysis (GSVA) was used to evaluate the association between biological pathways and gene features. The STRING database and Cytoscape software were used to establish and visualize protein-protein interaction (PPI) networks and to identify the key functional modules of DEGs, and a transcription factor-target gene regulatory network was drawn. Gene Ontology (GO) and KEGG pathway enrichment analyses of DEGs were also performed. Additionally, immune infiltration patterns were analyzed using CIBERSORT, and the correlation between immune cell-type abundance and DEG expression was investigated. We screened and obtained a total of 152 intersecting DEGs from the three groups, and 23 key genes were obtained through the MCODE plugin. Transcription factors (TFs) regulating the common DEGs were obtained from the hTFtarget database, and a TF-target network diagram was drawn. The PPI network contains 118 nodes, 251 edges, and 4 clusters. The key genes of the four modules include METAP2, RPL14, SERBP1, EEF2; HR4A1; CANX; ARID1A, UBE2K. METAP2, RPL14, SERBP1, and EEF2 were identified as the key hub genes. CREB1 was identified as the hub TF interacting with those genes by taking the intersection of potential TFs. The key gene alterations were genetic mutations, with an incidence of 1.7% in EEF2, 0.8% in METAP2, and 0.3% in RPL14. Finally, we found that the immune infiltrating cells with the most significant expression differences among the three groups were Tregs and M2 and M0 macrophages. We identified four hub genes, METAP2, RPL14, SERBP1, and EEF2, as most closely associated with the progression from NASH to cirrhosis to HCC. Examining the interaction between hub DEGs and potential regulatory molecules helps in understanding this process. This knowledge may provide a novel theoretical foundation for the development of diagnostic biomarkers and gene-related therapy targets.
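The step of taking intersecting DEGs across the three stage comparisons amounts to a set intersection. The sketch below illustrates only that operation; the gene lists are hypothetical placeholders and the function name `intersecting_degs` is not from the study.

```python
def intersecting_degs(*deg_lists):
    """Return the genes that appear in every supplied DEG list."""
    if not deg_lists:
        return set()
    return set.intersection(*(set(genes) for genes in deg_lists))


if __name__ == "__main__":
    # Hypothetical DEG lists for the three stage comparisons.
    nash_vs_healthy = ["GENE_1", "GENE_2", "GENE_3", "GENE_4"]
    cirrhosis_vs_nash = ["GENE_2", "GENE_3", "GENE_5"]
    hcc_vs_cirrhosis = ["GENE_3", "GENE_2", "GENE_6"]
    common = intersecting_degs(nash_vs_healthy, cirrhosis_vs_nash, hcc_vs_cirrhosis)
    print(sorted(common))
```

Genes surviving the intersection are differentially expressed at every transition, which is why they are natural inputs for the downstream PPI and hub-gene analysis.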
ABSTRACT
Much of the knowledge and information needed for enabling high-quality clinical research is stored in free-text format. Natural language processing (NLP) has been used to extract information from these sources at scale for several decades. This paper aims to present a comprehensive review of clinical NLP for the past 15 years in the UK to identify the community, depict its evolution, analyse methodologies and applications, and identify the main barriers. We collect a dataset of clinical NLP projects (n = 94; £ = 41.97 m) funded by UK funders or the European Union's funding programmes. Additionally, we extract details on 9 funders, 137 organisations, 139 persons and 431 research papers. Networks are created from timestamped data interlinking all entities, and network analysis is subsequently applied to generate insights. 431 publications are identified as part of a literature review, of which 107 are eligible for final analysis. Results show, not surprisingly, that clinical NLP in the UK has increased substantially in the last 15 years: the total budget in the period of 2019-2022 was 80 times that of 2007-2010. However, effort is still required to deepen areas such as disease (sub-)phenotyping and to broaden application domains. There is also a need to improve links between academia and industry and enable deployments in real-world settings for the realisation of clinical NLP's great potential in care delivery. The major barriers include research and development access to hospital data, lack of capable computational resources in the right places, the scarcity of labelled data and barriers to sharing of pretrained models.
ABSTRACT
This paper reports on the performance of Edinburgh_UCL_Health's models in the Social Media Mining for Health (SMM4H) 2022 shared tasks. Our team participated in the tasks related to the identification of adverse drug events (ADEs), the classification of change in medication (change-med) and the classification of self-report of vaccination (self-vaccine). Our best performing models are based on DeepADEMiner (with respective F1 = 0.64, 0.62 and 0.39 for ADE identification), on a GloVe model trained on Twitter (with F1 = 0.11 for change-med) and finally on a stack embedding including a layer of GloVe embedding and two layers of Flair embedding (with F1 = 0.77 for self-vaccine).
ABSTRACT
NHC-alcohol adduct-mediated deoxygenation of alcohols under photocatalytic conditions is described. This process provides various alkyl radicals, which can react with 2-isocyanobiaryls to afford 6-substituted phenanthridine derivatives in moderate to good yields. This method offered the first example of directly using alcohols as radical sources for 6-phenanthridine synthesis.
Subjects
Ethanol , Phenanthridines
ABSTRACT
Compared to the Haber-Bosch process, the electrochemical nitrogen reduction reaction (NRR) can convert N2 into NH3 under ambient conditions, and thus has attracted considerable attention in recent years. However, it remains a challenge to fabricate NRR catalysts with high faradaic efficiency and yield rate. In this work, by systematic first-principles calculations, we investigate the structure, stability and catalytic performance of single metal atoms anchored on porous monolayer C9N4 (M@C9N4) for the electrochemical NRR. A total of 25 transition metals (Sc-Zn, Zr-Mo, Ru-Ag, Hf-Au) were explored, and we screened out four promising systems, i.e., Nb, Ta, Re and W@C9N4, which not only exhibit high catalytic activity with low limiting potentials of -0.3, -0.42, -0.49 and -0.25 V, respectively, but also have superior selectivity that suppresses the competitive hydrogen evolution reaction. The physical origin lies in the coupling between the d orbitals of the transition metals and the 2π* orbital of N2, which activates the N2 molecule and facilitates the reduction process. Our proposed systems are kinetically and thermodynamically stable, which may shed light on future design and fabrication of high-efficiency single atom catalysts for various technologically important chemical reactions.
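Limiting potentials such as those quoted above are conventionally derived from the free-energy change of each elementary electrochemical step, with U_L = -max(ΔG)/e. The sketch below assumes that convention (the computational hydrogen electrode approach, which the abstract does not name explicitly) and uses hypothetical ΔG values that are not taken from the paper.

```python
def limiting_potential(delta_g_steps_ev):
    """Return U_L in volts from per-step free-energy changes given in eV.

    Assumes one electron transferred per step, so U_L = -max(dG) / e and the
    numerical value in volts equals the negative of the largest dG in eV.
    """
    if not delta_g_steps_ev:
        raise ValueError("at least one reaction step is required")
    return -max(delta_g_steps_ev)


def potential_determining_step(delta_g_steps_ev):
    """Return the zero-based index of the step with the largest free-energy change."""
    if not delta_g_steps_ev:
        raise ValueError("at least one reaction step is required")
    return max(range(len(delta_g_steps_ev)), key=lambda i: delta_g_steps_ev[i])


if __name__ == "__main__":
    # Hypothetical free-energy changes (eV) for six proton-electron transfer steps.
    steps = [0.10, -0.35, 0.40, -0.60, 0.05, -0.20]
    print(limiting_potential(steps))
    print(potential_determining_step(steps))
```

A less negative U_L means a smaller applied bias is needed to make every step exergonic, which is why low limiting potentials are used as the activity criterion when screening candidates.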
ABSTRACT
Two-dimensional (2D) ReSe2 has attracted considerable interest due to its unique anisotropic mechanical, optical, and excitonic characteristics. Recent transient absorption experiments demonstrated a prolonged lifetime of photoexcited charge carriers by stacking ReSe2 with MoS2, but the underlying mechanism remains elusive. Here, by combining time-domain density functional theory with nonadiabatic molecular dynamics, we investigate the electronic properties and charge carrier dynamics of the 2D ReSe2/MoS2 van der Waals (vdW) heterostructure. ReSe2/MoS2 has a type II band alignment that exhibits spatially distinguished conduction and valence band edges, and a built-in electric field is formed due to interface charge transfer. Remarkably, in spite of the decreased band gap and increased decoherence time, we demonstrate that the photocarrier lifetime of ReSe2/MoS2 is ~5 times longer than that of ReSe2, which originates from the greatly reduced nonadiabatic coupling that suppresses electron-hole recombination, perfectly explaining the experimental results. These findings not only provide physical insights into experiments but also shed light on future design and fabrication of functional optoelectronic devices based on 2D vdW heterostructures.
ABSTRACT
As novel states of quantum matter, quantum spin Hall (QSH) and quantum anomalous Hall (QAH) states have attracted considerable interest in condensed matter and materials science communities. Recently, a monolayer of the naturally occurring mineral jacutingaite (Pt2HgSe3) was theoretically proposed to be a large-gap QSH insulator and experimentally confirmed. Here, based on first-principles calculations and tight-binding modeling, we demonstrate a QSH to QAH phase transition in jacutingaite by chemical functionalization with chalcogen. We show that two-dimensional (2D) chalcogenated jacutingaite, Pt2HgSe3-X (X = S, Se, Te), is ferromagnetic with Curie temperature up to 316 K, and it exhibits the QAH effect with chiral edge states inside a sizeable topological gap. The physical mechanism lies in the adsorption-induced transformation of the original Kane-Mele model into an effective four-band model involving (px, py) orbitals on a hexagonal lattice, so that the topological gap size can be controlled by the spin-orbit coupling strength of the chalcogen (0.28 eV for Pt2HgSe3-Te). These results not only show the promise of functionalization in orbital-engineering of 2D functional structures, but also provide an ideal and practical platform for achieving exotic topological phases for dissipationless transport and quantum computing.
ABSTRACT
Recent years have witnessed a surge of research in two-dimensional (2D) ferroelectric structures that may circumvent the depolarization effect in conventional perovskite oxide films. Herein, by first-principles calculations, we predict that an orthorhombic phase of lead(II) oxide, PbO, serves as a promising candidate for 2D ferroelectrics with good stability. With a semiconducting nature, 2D ferroelectric PbO exhibits intrinsic valley polarization, which leads to robust ferroelectricity with an in-plane spontaneous polarization of 2.4 × 10⁻¹⁰ C/m and a Curie temperature of 455 K. Remarkably, we reveal that the ferroelectricity is strain-tunable, and ferroelasticity coexists in the PbO film, implying the realization of 2D multiferroics. The underlying physical mechanism is generally applicable and can be extended to other oxide films such as ferroelectric SnO and GeO, thus paving an avenue for future design and fabrication of functional ultrathin devices that are compatible with Si-based technology.
ABSTRACT
Construction of tunable and robust two-dimensional (2D) molecular arrays with desirable lattices and functionalities over a macroscopic scale relies on spontaneous and reversible noncovalent interactions between suitable molecules as building blocks. Halogen bonding, with active tunability of direction, strength, and length, is ideal for tailoring supramolecular structures. Herein, by combining low-temperature scanning tunneling microscopy and systematic first-principles calculations, we demonstrate novel halogen bonding involving single halogen atoms and phase engineering in 2D molecular self-assembly. On the Au(111) surface, we observed catalyzed dehalogenation of hexabromobenzene (HBB) molecules, during which negatively charged bromine adatoms (Brδ-) were generated and participated in assembly via unique C-Brδ+···Brδ- interaction, drastically different from HBB assembly on a chemically inert graphene substrate. We successfully mapped out different phases of the assembled superstructure, including densely packed hexagonal, tetragonal, dimer chain, and expanded hexagonal lattices at room temperature, 60 °C, 90 °C, and 110 °C, respectively, and the critical role of Brδ- in regulating lattice characteristics was highlighted. Our results show promise for manipulating the interplay between noncovalent interactions and catalytic reactions for future development of molecular nanoelectronics and 2D crystal engineering.