ABSTRACT
Embedded in the nuclear envelope, nuclear pore complexes (NPCs) not only regulate nuclear transport but also interface with transcriptionally active euchromatin, largely silenced heterochromatin, as well as the boundaries between these regions. It is unclear what functional role NPCs play in establishing or maintaining these distinct chromatin domains. We report that the yeast NPC protein Nup170p interacts with regions of the genome that contain ribosomal protein and subtelomeric genes, where it functions in nucleosome positioning and as a repressor of transcription. We show that the role of Nup170p in subtelomeric gene silencing is linked to its association with the RSC chromatin-remodeling complex and the silencing factor Sir4p, and that the binding of Nup170p and Sir4p to subtelomeric chromatin is cooperative and necessary for the association of telomeres with the nuclear envelope. Our results establish the NPC as an active participant in silencing and the formation of peripheral heterochromatin.
Subjects
Gene Silencing , Nuclear Pore Complex Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Chromatin/chemistry , Chromatin/metabolism , DNA/metabolism , DNA-Binding Proteins/metabolism , Nucleosomes/metabolism , Ribosomal Proteins/genetics , Silent Information Regulator Proteins, Saccharomyces cerevisiae/metabolism , Telomere/metabolism , Transcription Factors/metabolism
ABSTRACT
Several studies to date have proposed different types of interpreters for measuring the degree of pathogenicity of variants. However, in predicting the disease type and disease-gene associations, scholars face two essential challenges, namely the vast number of existing variants and the existence of variants that are classified as variants of uncertain significance (VUS). To tackle these challenges, we propose algorithms that assign a significance score to each gene rather than to each variant, describing its degree of pathogenicity. Since the interpreters identified most of the variants as VUS, most of the gene scores were also of uncertain significance. To predict the uncertain significance scores, we design two matrix factorization-based models: the common latent space model uses genomics variant data as well as heterogeneous clinical data, while the single-matrix factorization model can be used when heterogeneous clinical data are unavailable. We show that the models predict the uncertain significance scores with low error and high accuracy. Moreover, to evaluate the effectiveness of our novel input features, we train five different multi-label classifiers, including a feedforward neural network, with the same feature set and show that they all achieve high accuracy, as the main impact of our approach comes from the features. Availability: The source code is freely available at https://github.com/sabdollahi/CoLaSpSMFM.
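As an illustration of the single-matrix factorization idea mentioned in this abstract, the following minimal sketch fills in unknown entries of a patient-by-gene score matrix from a low-rank factorization fitted to the known entries. It is a generic textbook formulation, not the authors' CoLaSpSMFM code; the function name `complete_scores`, the hyperparameters, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def complete_scores(scores, observed, rank=2, lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Approximate `scores` as P @ Q.T using only entries where `observed` is True.

    scores   : 2-D float array (hypothetical patient x gene scores; hidden entries may be NaN)
    observed : boolean array of the same shape marking the known scores
    Returns the dense reconstruction, which also provides values for hidden entries.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = scores.shape
    P = 0.1 * rng.standard_normal((n_rows, rank))  # row (patient) factors
    Q = 0.1 * rng.standard_normal((n_cols, rank))  # column (gene) factors
    for _ in range(epochs):
        # Residual on observed entries only; hidden entries contribute zero.
        err = np.where(observed, scores - P @ Q.T, 0.0)
        grad_P = err @ Q - reg * P
        grad_Q = err.T @ P - reg * Q
        P += lr * grad_P
        Q += lr * grad_Q
    return P @ Q.T

# Toy example: a rank-1 matrix with two hidden ("uncertain") entries.
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0])
observed = np.ones_like(truth, dtype=bool)
observed[0, 0] = False
observed[3, 2] = False
pred = complete_scores(np.where(observed, truth, np.nan), observed, rank=1)
print(np.round(pred, 2))
```

The common latent space model described in the abstract would additionally tie these factors to clinical data matrices; that coupling is not shown here.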
Subjects
Genetic Variation , Genomics , Models, Genetic , Neural Networks, Computer , Software , Humans
ABSTRACT
BACKGROUND: Hyperkalemia is a common complication of chronic kidney disease (CKD). Hyperkalemia is associated with mortality, CKD progression, hospitalization, and high healthcare costs in patients with CKD. We developed a machine learning model to predict hyperkalemia in patients with advanced CKD at an outpatient clinic. METHODS: This retrospective study included 1,965 advanced CKD patients between January 1, 2010, and December 31, 2020 in Taiwan. We randomly divided all patients into the training (75%) and testing (25%) datasets. The primary outcome was hyperkalemia (K+ > 5.5 mEq/L) at the next clinic visit. Two nephrologists were enrolled in a human-machine competition. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy were used to compare the performance of XGBoost and conventional logistic regression models with that of these physicians. RESULTS: In a human-machine competition of hyperkalemia prediction, the AUC, positive predictive value (PPV), and accuracy of the XGBoost model were 0.867 (95% confidence interval: 0.840-0.894), 0.700, and 0.933, respectively, which were significantly better than those of our clinicians. Four variables were chosen as high-ranking variables in the XGBoost and logistic regression models: hemoglobin, the serum potassium level in the previous visit, angiotensin receptor blocker use, and calcium polystyrene sulfonate use. CONCLUSIONS: The XGBoost model provided better predictive performance for hyperkalemia than physicians at the outpatient clinic.
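The threshold-based outcome and the reported metrics can be made concrete with a small sketch. It labels a visit as hyperkalemic when K+ > 5.5 mEq/L, as in the abstract, and computes sensitivity, specificity, PPV, and accuracy from binary predictions. The function names and the toy data are hypothetical and not taken from the study.

```python
def is_hyperkalemia(potassium_meq_per_l, threshold=5.5):
    """Outcome definition from the abstract: serum K+ strictly above 5.5 mEq/L."""
    return potassium_meq_per_l > threshold

def _ratio(numerator, denominator):
    # Return NaN rather than raising when a metric is undefined.
    return numerator / denominator if denominator else float("nan")

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and accuracy for boolean labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }

# Toy example: next-visit potassium values and a model's binary predictions.
next_visit_k = [5.8, 6.1, 4.9, 5.5, 5.0]
labels = [is_hyperkalemia(k) for k in next_visit_k]
predictions = [True, False, False, False, True]
metrics = binary_metrics(labels, predictions)
print(metrics)
```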
Subjects
Hyperkalemia , Renal Insufficiency, Chronic , Humans , Retrospective Studies , Kidney , Ambulatory Care Facilities
ABSTRACT
BACKGROUND: Functional disruptions by large germline genomic structural variants in susceptible genes are known risks for cancer. We used deletion structural variants (DSVs) generated from germline whole-genome sequencing (WGS) and DSV immune-related association tumor microenvironment (TME) to predict cancer risk and prognosis. METHODS: We investigated the contribution of germline DSVs to cancer susceptibility and prognosis by silicon and causal inference models. DSVs in germline WGS data were generated from the blood samples of 192 cancer and 499 non-cancer subjects. Clinical information, including family cancer history (FCH), was obtained from the National Cheng Kung University Hospital and Taiwan Biobank. Ninety-nine colorectal cancer (CRC) patients had immune response gene expression data. We used joint calling tools and an attention-weighted model to build the cancer risk predictive model and identify DSVs in familial cancer. The survival support vector machine (survival-SVM) was used to select prognostic DSVs. RESULTS: We identified 671 DSVs that could predict cancer risk. The area under the curve (AUC) of the receiver operating characteristic curve (ROC) of the attention-weighted model was 0.71. The 3 most frequent DSV genes observed in cancer patients were identified as ADCY9, AURKAPS1, and RAB3GAP2 (p < 0.05). The DSVs in SGSM2 and LHFPL3 were relevant to colorectal cancer. We found a higher incidence of FCH in cancer patients than in non-cancer subjects (p < 0.05). SMYD3 and NKD2 DSV genes were associated with cancer patients with FCH (p < 0.05). We identified 65 immune-associated DSV markers for assessing cancer prognosis (p < 0.05). The functional protein of MUC4 DSV gene interacted with MAGE1 expression, according to the STRING database. The causal inference model showed that deleting the CEP72 DSV gene affected the recurrence-free survival (RFS) of IFIT1 expression.
CONCLUSIONS: We established an explainable attention-weighted model for cancer risk prediction and used the survival-SVM for prognostic stratification by using germline DSVs and immune gene expression datasets. Comprehensive assessments of germline DSVs can predict the cancer risk and clinical outcome of colon cancer patients.
Subjects
Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Microtubule-Associated Proteins/genetics , Mucin-4/genetics , Adult , Aged , Colorectal Neoplasms/immunology , Colorectal Neoplasms/pathology , Female , Gene Expression Regulation, Neoplastic , Germ-Line Mutation/genetics , Humans , Immunity/genetics , Immunity/immunology , Male , Middle Aged , Sequence Deletion/genetics , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology
ABSTRACT
BACKGROUND: Automated interpretation of echocardiography by deep neural networks could support clinical reporting and improve efficiency. Whereas previous studies have evaluated spatial relationships using still frame images, we aimed to train and test a deep neural network for video analysis by combining spatial and temporal information, to automate the recognition of left ventricular regional wall motion abnormalities. METHODS: We collected a series of transthoracic echocardiography examinations performed between July 2017 and April 2018 in 2 tertiary care hospitals. Regional wall abnormalities were defined by experienced physiologists and confirmed by trained cardiologists. First, we developed a 3-dimensional convolutional neural network model for view selection ensuring stringent image quality control. Second, a U-net model segmented images to annotate the location of each left ventricular wall. Third, a final 3-dimensional convolutional neural network model evaluated echocardiographic videos from 4 standard views, before and after segmentation, and calculated a wall motion abnormality confidence level (0-1) for each segment. To evaluate model stability, we performed 5-fold cross-validation and external validation. RESULTS: In a series of 10 638 echocardiograms, our view selection model identified 6454 (61%) examinations with sufficient image quality in all standard views. In this training set, 2740 frames were annotated to develop the segmentation model, which achieved a Dice similarity coefficient of 0.756. External validation was performed in 1756 examinations from an independent hospital. A regional wall motion abnormality was observed in 8.9% and 4.9% in the training and external validation datasets, respectively. 
The final model recognized regional wall motion abnormalities in the cross-validation and external validation datasets with an area under the receiver operating characteristic curve of 0.912 (95% CI, 0.896-0.928) and 0.891 (95% CI, 0.834-0.948), respectively. In the external validation dataset, the sensitivity was 81.8% (95% CI, 73.8%-88.2%), and specificity was 81.6% (95% CI, 80.4%-82.8%). CONCLUSIONS: In echocardiographic examinations of sufficient image quality, it is feasible for deep neural networks to automate the recognition of regional wall motion abnormalities using temporal and spatial information from moving images. Further investigation is required to optimize model performance and evaluate clinical applications.
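The Dice similarity coefficient reported for the segmentation model measures the overlap between a predicted mask and a reference mask as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for binary masks with NumPy; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not details from the study.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, true).sum()
    return float(2.0 * intersection / total)

# Toy example: the prediction covers two pixels, the reference covers one of them.
pred_mask = [[1, 1], [0, 0]]
true_mask = [[1, 0], [0, 0]]
print(dice_coefficient(pred_mask, true_mask))
```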
Subjects
Echocardiography/methods , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Humans , Infant , Infant, Newborn , Middle Aged , Neural Networks, Computer , Young Adult
ABSTRACT
BACKGROUND: Chest computed tomography (CT) is crucial for the detection of lung cancer, and many automated CT evaluation methods have been proposed. Due to the divergent software dependencies of the reported approaches, the developed methods are rarely compared or reproduced. OBJECTIVE: The goal of the research was to generate reproducible machine learning modules for lung cancer detection and compare the approaches and performances of the award-winning algorithms developed in the Kaggle Data Science Bowl. METHODS: We obtained the source codes of all award-winning solutions of the Kaggle Data Science Bowl Challenge, where participants developed automated CT evaluation methods to detect lung cancer (training set n=1397, public test set n=198, final test set n=506). The performance of the algorithms was evaluated by the log-loss function, and the Spearman correlation coefficient of the performance in the public and final test sets was computed. RESULTS: Most solutions implemented distinct image preprocessing, segmentation, and classification modules. Variants of U-Net, VGGNet, and residual net were commonly used in nodule segmentation, and transfer learning was used in most of the classification algorithms. Substantial performance variations in the public and final test sets were observed (Spearman correlation coefficient = .39 among the top 10 teams). To ensure the reproducibility of results, we generated a Docker container for each of the top solutions. CONCLUSIONS: We compared the award-winning algorithms for lung cancer detection and generated reproducible Docker images for the top solutions. Although convolutional neural networks achieved decent accuracy, there is plenty of room for improvement regarding model generalizability.
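The two evaluation quantities named in this abstract, the log-loss of predicted cancer probabilities and the Spearman correlation between public and final leaderboard performance, can be sketched in plain Python. The clipping constant, the function names, and the toy values are assumptions; the Spearman formula used here is the simple version that assumes no tied values.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def _ranks(values):
    # Rank 1 is given to the smallest value; assumes all values are distinct.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def spearman(x, y):
    """Spearman rank correlation for two equally long sequences without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(_ranks(x), _ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Toy example: hypothetical log-loss scores of four teams on two test sets.
public_scores = [0.41, 0.43, 0.45, 0.47]
final_scores = [0.40, 0.46, 0.44, 0.48]
print(log_loss([1, 0], [0.9, 0.1]))
print(spearman(public_scores, final_scores))
```

A low rank correlation between the two test sets, as reported in the abstract, indicates that the public leaderboard ordering was a weak guide to final performance.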
Subjects
Lung Neoplasms/diagnostic imaging , Lung Neoplasms/diagnosis , Machine Learning/standards , Tomography, X-Ray Computed/methods , Algorithms , Humans , Reproducibility of Results
ABSTRACT
BACKGROUND: As the COVID-19 epidemic increases in severity, the burden of quarantine stations outside emergency departments (EDs) at hospitals is increasing daily. To address the high screening workload at quarantine stations, all staff members with medical licenses are required to work shifts in these stations. Therefore, it is necessary to simplify the workflow and decision-making process for physicians and surgeons from all subspecialties. OBJECTIVE: The aim of this paper is to demonstrate how the National Cheng Kung University Hospital artificial intelligence (AI) trilogy of diversion to a smart quarantine station, AI-assisted image interpretation, and a built-in clinical decision-making algorithm improves medical care and reduces quarantine processing times. METHODS: This observational study on the emerging COVID-19 pandemic included 643 patients. An "AI trilogy" of diversion to a smart quarantine station, AI-assisted image interpretation, and a built-in clinical decision-making algorithm on a tablet computer was applied to shorten the quarantine survey process and reduce processing time during the COVID-19 pandemic. RESULTS: The use of the AI trilogy facilitated the processing of suspected cases of COVID-19 with or without symptoms; also, travel, occupation, contact, and clustering histories were obtained with the tablet computer device. A separate AI-mode function that could quickly recognize pulmonary infiltrates on chest x-rays was merged into the smart clinical assisting system (SCAS), and this model was subsequently trained with COVID-19 pneumonia cases from the GitHub open source data set. The detection rates for posteroanterior and anteroposterior chest x-rays were 55/59 (93%) and 5/11 (45%), respectively. The SCAS algorithm was continuously adjusted based on updates to the Taiwan Centers for Disease Control public safety guidelines for faster clinical decision making. 
Our ex vivo study demonstrated the efficiency of disinfecting the tablet computer surface by wiping it twice with 75% alcohol sanitizer. To further analyze the impact of the AI application in the quarantine station, we subdivided the station group into groups with or without AI. Compared with the conventional ED (n=281), the survey time at the quarantine station (n=1520) was significantly shortened; the median survey time at the ED was 153 minutes (95% CI 108.5-205.0), vs 35 minutes at the quarantine station (95% CI 24-56; P<.001). Furthermore, the use of the AI application in the quarantine station reduced the survey time in the quarantine station; the median survey time without AI was 101 minutes (95% CI 40-153), vs 34 minutes (95% CI 24-53) with AI in the quarantine station (P<.001). CONCLUSIONS: The AI trilogy improved our medical care workflow by shortening the quarantine survey process and reducing the processing time, which is especially important during an emerging infectious disease epidemic.
Subjects
Artificial Intelligence , Betacoronavirus , Quarantine , Adult , COVID-19 , Coronavirus Infections , Female , Hospitals, Isolation , Humans , Middle Aged , Pandemics , Pneumonia, Viral , Quarantine/methods , SARS-CoV-2 , Surveys and Questionnaires , Taiwan/epidemiology
ABSTRACT
Drug development is an expensive and time-consuming process; these costs could be reduced if existing resources could be used to identify candidates for drug repurposing. This study sought to do this by text mining a large-scale literature repository to curate repurposed drug lists for different cancers. We devised a pattern-based relationship extraction method to extract disease-gene and gene-drug direct relationships from the literature. These direct relationships are used to infer indirect relationships using the ABC model. A gene-shared ranking method based on drug target similarity was then proposed to prioritize the indirect relationships. Our method of assessing drug target similarity correlated with existing anatomical therapeutic chemical code-based methods, with a Pearson correlation coefficient of 0.9311. The indirect relationship ranking method achieved a significant mean average precision score for the top 100 most common diseases. We also confirmed the suitability of candidates identified for repurposing as anticancer drugs by conducting a manual review of the literature and the clinical trials. Finally, to visualize and enrich the large amount of repurposed drug information, we used a chord diagram, which rapidly identified two novel indications for further biological evaluation.
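The ABC model referred to in this abstract infers an indirect A-C link (disease-drug) whenever A and C are both directly linked to a shared B (gene). The sketch below illustrates that inference on toy data and ranks candidates simply by the number of shared genes. This ranking is a simplified stand-in, not the drug target similarity method of the study, and all names and pairs are hypothetical.

```python
from collections import defaultdict

def infer_indirect(disease_gene, gene_drug, known_disease_drug=frozenset()):
    """Infer disease-drug candidates that share at least one gene (ABC model).

    disease_gene       : iterable of (disease, gene) direct relationships (A-B)
    gene_drug          : iterable of (gene, drug) direct relationships (B-C)
    known_disease_drug : set of (disease, drug) pairs to exclude as already known
    Returns (disease, drug, shared_gene_count) tuples, most shared genes first.
    """
    drugs_by_gene = defaultdict(set)
    for gene, drug in gene_drug:
        drugs_by_gene[gene].add(drug)

    shared_genes = defaultdict(set)
    for disease, gene in disease_gene:
        for drug in drugs_by_gene.get(gene, ()):
            if (disease, drug) not in known_disease_drug:
                shared_genes[(disease, drug)].add(gene)

    candidates = [(d, c, len(genes)) for (d, c), genes in shared_genes.items()]
    # Sort by descending shared-gene count, then alphabetically for stable output.
    return sorted(candidates, key=lambda t: (-t[2], t[0], t[1]))

# Toy example with hypothetical identifiers.
disease_gene = [("D1", "G1"), ("D1", "G2"), ("D2", "G2")]
gene_drug = [("G1", "X"), ("G2", "X"), ("G2", "Y")]
ranked = infer_indirect(disease_gene, gene_drug, known_disease_drug={("D1", "Y")})
print(ranked)
```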
Subjects
Drug Repositioning , Data Mining , Humans
ABSTRACT
BACKGROUND: Adverse drug reactions (ADRs) are common and are the underlying cause of over a million serious injuries and deaths each year. The most familiar method of detecting ADRs relies on spontaneous reports. Unfortunately, the low reporting rate of spontaneous reports is a serious limitation of pharmacovigilance. OBJECTIVE: The objective of this study was to identify a method to detect potential ADRs of drugs automatically using a deep neural network (DNN). METHODS: We designed a DNN model that utilizes the chemical, biological, and biomedical information of drugs to detect ADRs. This model aimed to fulfill two main purposes: identifying the potential ADRs of drugs and predicting the possible ADRs of a new drug. To improve the detection performance, we distributed representations of the target drugs in a vector space to capture the drug relationships, using the word-embedding approach to process substantial biomedical literature. Moreover, we built a mapping function to address new drugs that do not appear in the dataset. RESULTS: Using the drug information and the ADRs reported up to 2009, we predicted the ADRs of drugs recorded up to 2012. There were 746 drugs and 232 new drugs, which were only recorded in 2012 with 1325 ADRs. The experimental results showed that our model achieved a mean average precision at top-10 of 0.523 and an area under the receiver operating characteristic curve (AUC) of 0.844 for ADR prediction on the dataset. CONCLUSIONS: Our model is effective in identifying the potential ADRs of a drug and the possible ADRs of a new drug. Most importantly, it can detect potential ADRs irrespective of whether they have been reported in the past.
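Mean average precision at top-10 summarizes how well the true ADRs of each drug are placed near the top of the model's ranked list. The sketch below uses one common definition, normalizing by min(k, number of relevant items); the abstract does not state which normalization the authors used, so this choice, the function names, and the toy data are assumptions.

```python
def average_precision_at_k(ranked, relevant, k=10):
    """Average precision over the top-k ranked items (assumes no duplicates in `ranked`)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position  # precision at this cut-off
    return precision_sum / min(k, len(relevant))

def mean_average_precision_at_k(rankings, relevants, k=10):
    """Mean of average_precision_at_k over several drugs."""
    pairs = list(zip(rankings, relevants))
    return sum(average_precision_at_k(r, rel, k) for r, rel in pairs) / len(pairs)

# Toy example: ranked ADR predictions for two hypothetical drugs.
rankings = [["nausea", "rash", "headache", "fever"], ["dizziness", "cough"]]
relevants = [{"nausea", "headache"}, {"cough"}]
print(mean_average_precision_at_k(rankings, relevants, k=10))
```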
Subjects
Adverse Drug Reaction Reporting Systems/standards , Neural Networks, Computer , Humans , Prohibitins
ABSTRACT
There are more than 3.7 million published articles on the biological functions or disease implications of proteins, constituting an important resource of proteomics knowledge. However, it is difficult to summarize the millions of proteomics findings in the literature manually and quantify their relevance to the biology and diseases of interest. We developed a fully automated bioinformatics framework to identify and prioritize proteins associated with any biological entity. We used the 22 targeted areas of the Biology/Disease-driven (B/D)-Human Proteome Project (HPP) as examples, prioritized the relevant proteins through their Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores, validated the relevance of the score by comparing the protein prioritization results with a curated database, computed the scores of proteins across the topics of B/D-HPP, and characterized the top proteins in the common model organisms. We further extended the bioinformatics workflow to identify the relevant proteins in all organ systems and human diseases and deployed a cloud-based tool to prioritize proteins related to any custom search terms in real time. Our tool can facilitate the prioritization of proteins for any organ system or disease of interest and can contribute to the development of targeted proteomic studies for precision medicine.
Subjects
Computational Biology/methods , Proteomics/methods , Animals , Human Genome Project , Humans , Precision Medicine/methods , Research , Search Engine
ABSTRACT
Targeted metabolomics and biochemical studies complement the ongoing investigations led by the Human Proteome Organization (HUPO) Biology/Disease-Driven Human Proteome Project (B/D-HPP). However, it is challenging to identify and prioritize metabolite and chemical targets. Literature-mining-based approaches have been proposed for target proteomics studies, but text mining methods for metabolite and chemical prioritization are hindered by a large number of synonyms and nonstandardized names of each entity. In this study, we developed a cloud-based literature mining and summarization platform that maps metabolites and chemicals in the literature to unique identifiers and summarizes the copublication trends of metabolites/chemicals and B/D-HPP topics using Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores. We successfully prioritized metabolites and chemicals associated with the B/D-HPP targeted fields and validated the results by checking against expert-curated associations and enrichment analyses. Compared with existing algorithms, our system achieved better precision and recall in retrieving chemicals related to B/D-HPP focused areas. Our cloud-based platform enables queries on all biological terms in multiple species, which will contribute to B/D-HPP and targeted metabolomics/chemical studies.
Subjects
Cloud Computing , Metabolomics , Proteome , Algorithms , Data Mining/methods , Humans , Search Engine
ABSTRACT
Systems-scale models provide the foundation for an effective iterative cycle between hypothesis generation, experiment and model refinement. Such models also enable predictions facilitating the understanding of biological complexity and the control of biological systems. Here, we demonstrate the reconstruction of a globally predictive gene regulatory model from public data: a model that can drive rational experiment design and reveal new regulatory mechanisms underlying responses to novel environments. Specifically, using ~1500 publicly available genome-wide transcriptome data sets from Saccharomyces cerevisiae, we have reconstructed an environment and gene regulatory influence network that accurately predicts regulatory mechanisms and gene expression changes on exposure of cells to completely novel environments. Focusing on transcriptional networks that induce peroxisome biogenesis, the model-guided experiments allow us to expand a core regulatory network to include novel transcriptional influences and linkage across signaling and transcription. Thus, the approach and model provide a multi-scalar picture of gene dynamics and are powerful resources for exploiting extant data to rationally guide experimentation. The techniques outlined here are generally applicable to any biological system, which is especially important when experimental systems are challenging and samples are difficult and expensive to obtain, a common problem in laboratory animal and human studies.
Subjects
Gene Regulatory Networks , Systems Biology/methods , Gene Expression Profiling , Gene Expression Regulation, Fungal , Saccharomyces cerevisiae/genetics
ABSTRACT
MOTIVATION: With the spreading technique of mass sequencing, nucleosome positions and scores for their intensity have become available through several previous studies in yeast, but relatively few studies have specifically aimed to determine the score of nucleosome stability. Based on mass sequencing data, we proposed a nucleosome center score (NCS) for quantifying nucleosome stability by measuring shifts of the nucleosome center, and then mapping NCS scores to nucleosome positions in Brogaard et al.'s study. RESULTS: We demonstrated the efficiency of NCS by known preference of A/T-based tracts for nucleosome formation, and showed that central nucleosomal DNA is more sensitive to A/T-based tracts than outer regions, which corresponds to the central histone tetramer-dominated region. We also found significant flanking preference around nucleosomal DNA for A/T-based dinucleotides, suggesting that neighboring sequences could affect nucleosome stability. Finally, the difference between results of NCS and Brogaard et al.'s scores was addressed and discussed. CONTACTS: jchiang@mail.ncku.edu.tw SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Nucleosomes/genetics , Saccharomyces cerevisiae/genetics , DNA, Fungal/genetics , High-Throughput Nucleotide Sequencing , Poly A/genetics , Poly T/genetics , Saccharomyces cerevisiae/metabolism
ABSTRACT
MOTIVATION: Protein phosphorylation is critical for regulating cellular activities by controlling protein activities, localization and turnover, and by transmitting information within cells through signaling networks. However, predictions of protein phosphorylation and signaling networks remain a significant challenge, lagging behind predictions of transcriptional regulatory networks into which they often feed. RESULTS: We developed PhosphoChain to predict kinases, phosphatases and chains of phosphorylation events in signaling networks by combining mRNA expression levels of regulators and targets with a motif detection algorithm and optional prior information. PhosphoChain correctly reconstructed ~78% of the yeast mitogen-activated protein kinase pathway from publicly available data. When tested on yeast phosphoproteomic data from large-scale mass spectrometry experiments, PhosphoChain correctly identified ~27% more phosphorylation sites than existing motif detection tools (NetPhosYeast and GPS2.0), and predictions of kinase-phosphatase interactions overlapped with ~59% of known interactions present in yeast databases. PhosphoChain provides a valuable framework for predicting condition-specific phosphorylation events from high-throughput data. AVAILABILITY: PhosphoChain is implemented in Java and available at http://virgo.csie.ncku.edu.tw/PhosphoChain/ or http://aitchisonlab.com/PhosphoChain
Subjects
Algorithms , MAP Kinase Signaling System , Phosphoprotein Phosphatases/metabolism , Protein Kinases/metabolism , Amino Acid Sequence , Gene Expression Regulation, Enzymologic , Genome , Molecular Sequence Data , Phosphoprotein Phosphatases/chemistry , Phosphoprotein Phosphatases/genetics , Phosphorylation , Protein Kinases/chemistry , Protein Kinases/genetics , RNA, Messenger/biosynthesis , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
ABSTRACT
Although teams participating in BioCreative VIII Track 01 have employed various techniques to achieve high accuracy on biomedical relation extraction tasks, overall performance in this area still has substantial room for improvement. Large language models bring a new opportunity to improve the performance of existing techniques in natural language processing tasks. This paper presents our improved method for relation extraction, which involves integrating two renowned large language models: Gemini and GPT-4. Our new approach utilizes GPT-4 to generate augmented data for training, followed by an ensemble learning technique to combine the outputs of diverse models to create a more precise prediction. We then employ a method using Gemini responses as input to fine-tune the BioNLP-PubMed-Bert classification model, which leads to improved performance as measured by precision, recall, and F1 scores on the same test dataset used in the challenge evaluation. Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-viii/track-1/.
Subjects
Natural Language Processing , Data Mining/methods , Databases, Factual , Computational Biology/methods , Machine Learning
ABSTRACT
By using omics, we can now examine all components of biological systems simultaneously. Deep learning-based drug prediction methods have shown promise by integrating cancer-related multi-omics data. However, the complex interaction between genes poses challenges in accurately projecting multi-omics data. In this research, we present a predictive model for drug response that incorporates diverse types of omics data, comprising genetic mutation, copy number variation, methylation, and gene expression data. This study proposes latent alignment to address information mismatch during integration, which is achieved through an attention module capturing interactions among diverse types of omics data. The latent alignment and attention modules significantly improve predictions, outperforming the baseline model, with MSE = 1.1333, F1-score = 0.5342, and AUROC = 0.5776. High accuracy was achieved in predicting drug responses for piplartine and tenovin-6, while the accuracy was comparatively lower for mitomycin-C and obatoclax. The latent alignment module alone outperforms the baseline model, improving the MSE by 0.2375, the F1-score by 4.84%, and the AUROC by 6.1%. Similarly, the attention module alone improves these metrics by 0.1899, 2.88%, and 2.84%, respectively. In the interpretability case study, panobinostat exhibited the most effective predicted response, with a value of -4.895. We provide reliable insights for drug selection in personalized medicine by identifying crucial genetic factors influencing drug response.
ABSTRACT
Background: Intradialytic hypotension (IDH) is a common hemodialysis complication causing adverse outcomes. Despite the well-documented associations of ambient temperatures with fluid removal and pre-dialysis blood pressure (BP), the relationship between ambient temperature and IDH has not been adequately studied. Methods: We conducted a cohort study at a tertiary hospital in southern Taiwan between 1 January 2016 and 31 October 2021. The 24-h pre-hemodialysis mean ambient temperature was determined using hourly readings from the weather station closest to each patient's residence. IDH was defined using Fall40 [systolic BP (SBP) drop of ≥40 mmHg] or Nadir90/100 (SBP <100 mmHg if pre-dialysis SBP was ≥160 mmHg, or SBP <90 mmHg otherwise). Multivariate logistic regression with generalized estimating equations and mediation analysis were utilized. Results: The study examined 110 400 hemodialysis sessions from 182 patients, finding an IDH prevalence of 11.8% and 10.4% as per the Fall40 and Nadir90/100 criteria, respectively. It revealed a reverse J-shaped relationship between ambient temperature and IDH, with a turning point around 27°C. For temperatures under 27°C, a 4°C drop significantly increased the odds ratio of IDH to 1.292 [95% confidence interval (CI) 1.228 to 1.358] and 1.207 (95% CI 1.149 to 1.268) under the Fall40 and Nadir90/100 definitions, respectively. Lower ambient temperatures correlated with higher ultrafiltration, accounting for about 23% of the increased IDH risk. Stratified seasonal analysis indicated that this relationship was consistent in spring, autumn and winter. Conclusion: Lower ambient temperature is significantly associated with an increased risk of IDH below the threshold of 27°C, irrespective of the IDH definition. This study provides further insight into environmental risk factors for IDH in patients undergoing hemodialysis.
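The two IDH definitions in this abstract are simple threshold rules and can be written down directly. The sketch below assumes that the Fall40 drop is measured from the pre-dialysis SBP to the lowest intradialytic SBP, which the abstract does not state explicitly; the function names are hypothetical.

```python
def idh_fall40(pre_sbp, nadir_sbp):
    """Fall40: systolic BP drop of at least 40 mmHg.

    Assumption: the drop is pre-dialysis SBP minus the lowest intradialytic SBP.
    """
    return (pre_sbp - nadir_sbp) >= 40

def idh_nadir90_100(pre_sbp, nadir_sbp):
    """Nadir90/100: nadir SBP < 100 mmHg if pre-dialysis SBP >= 160 mmHg, else nadir SBP < 90 mmHg."""
    threshold = 100 if pre_sbp >= 160 else 90
    return nadir_sbp < threshold

# Toy sessions as (pre-dialysis SBP, nadir SBP) in mmHg.
sessions = [(150, 105), (150, 111), (165, 95), (150, 95), (150, 89)]
for pre, nadir in sessions:
    print(pre, nadir, idh_fall40(pre, nadir), idh_nadir90_100(pre, nadir))
```

The two rules can disagree on the same session, which is why the study reports prevalence and odds ratios separately for each definition.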
ABSTRACT
The BioRED track at BioCreative VIII calls for a community effort to identify, semantically categorize, and highlight the novelty factor of the relationships between biomedical entities in unstructured text. Relation extraction is crucial for many biomedical natural language processing (NLP) applications, from drug discovery to custom medical solutions. The BioRED track simulates a real-world application of biomedical relationship extraction, and as such, considers multiple biomedical entity types, normalized to their specific corresponding database identifiers, as well as defines relationships between them in the documents. The challenge consisted of two subtasks: (i) in Subtask 1, participants were given the article text and human expert annotated entities, and were asked to extract the relation pairs, identify their semantic type and the novelty factor, and (ii) in Subtask 2, participants were given only the article text, and were asked to build an end-to-end system that could identify and categorize the relationships and their novelty. We received a total of 94 submissions from 14 teams worldwide. The highest F-score performances achieved for Subtask 1 were: 77.17% for relation pair identification, 58.95% for relation type identification, 59.22% for novelty identification, and 44.55% when evaluating all of the above aspects of the comprehensive relation extraction. The highest F-score performances achieved for Subtask 2 were: 55.84% for relation pair, 43.03% for relation type, 42.74% for novelty, and 32.75% for comprehensive relation extraction. The entire BioRED track dataset and other challenge materials are available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/ and https://codalab.lisn.upsaclay.fr/competitions/13377 and https://codalab.lisn.upsaclay.fr/competitions/13378. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/ ; https://codalab.lisn.upsaclay.fr/competitions/13377 ; https://codalab.lisn.upsaclay.fr/competitions/13378.
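The gap between the pair-level and comprehensive F-scores reported here follows from how strictly a prediction must match the gold annotation. The sketch below computes precision, recall, and F1 over sets of relation tuples at two levels of strictness. The tuple layout (document, entity A, entity B, relation type, novelty) and the example values are hypothetical and are not the official BioRED evaluation script.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall, and F1 for two collections of hashable items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1

def pair_level(relations):
    """Keep only (document, entity A, entity B), ignoring type and novelty."""
    return {relation[:3] for relation in relations}

# Hypothetical gold and predicted relations: (doc, entity A, entity B, type, novelty).
gold = {
    ("doc1", "GeneA", "DiseaseB", "Association", "Novel"),
    ("doc1", "ChemC", "GeneD", "Positive_Correlation", "No"),
}
predicted = {
    ("doc1", "GeneA", "DiseaseB", "Association", "No"),  # right pair and type, wrong novelty
    ("doc1", "ChemC", "GeneD", "Positive_Correlation", "No"),
}
print(precision_recall_f1(pair_level(gold), pair_level(predicted)))
print(precision_recall_f1(gold, predicted))
```

In this toy case both pairs are found, but only one prediction also has the correct type and novelty, so the comprehensive score is lower than the pair-level score.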
Subjects
Data Mining; Natural Language Processing; Humans; Data Mining/methods; Databases, Factual; Semantics
ABSTRACT
MOTIVATION: Determining the binding affinity of a protein-ligand complex is important for quantifying whether a particular small molecule will bind to the target protein. In addition, comprehensive datasets of protein-ligand complexes and their corresponding binding affinities are crucial for developing accurate scoring functions that predict the binding affinities of previously unknown protein-ligand complexes. In the past decades, several databases of protein-ligand binding affinities have been created via visual extraction from the literature. However, such approaches are time-consuming, and most of these databases are updated only a few times per year. Hence, there is an immediate demand for a high-precision automatic extraction method for collecting binding affinities. RESULTS: We have created a new database of protein-ligand binding affinity data, AutoBind, based on automatic information retrieval. We first compiled a collection of 1586 articles in which the binding affinities had been marked manually. Based on this annotated collection, we designed four sentence patterns that are used to scan full-text articles, as well as a scoring function to rank the sentences that match our patterns. The proposed sentence patterns can effectively identify the binding affinities in full-text articles. Our assessment shows that AutoBind achieved 84.22% precision and 79.07% recall on the testing corpus. Currently, 13,616 protein-ligand complexes and the corresponding binding affinities have been deposited in AutoBind from 17,221 articles. AVAILABILITY: AutoBind is automatically updated on a monthly basis, and it is freely available at http://autobind.csie.ncku.edu.tw/ and http://autobind.mc.ntu.edu.tw/. All of the deposited binding affinities have been refined and approved manually before being released.
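The general idea of scanning sentences with a pattern and then ranking the matches can be sketched in Python as follows. The abstract does not give AutoBind's four sentence patterns or its scoring function, so the regular expression, the function names, and the toy ranking rule below are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: the actual AutoBind sentence patterns are not given in the abstract.
# It matches mentions such as "Kd of 3.5 nM" or "IC50 = 120 nM".
AFFINITY_PATTERN = re.compile(
    r"\b(?P<kind>Ki|Kd|IC50|EC50)\s*(?:=|of|was|is)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[pnuµm]M)\b"
)

def find_affinities(sentence):
    """Return (kind, value, unit) for every affinity mention in the sentence."""
    return [
        (m.group("kind"), float(m.group("value")), m.group("unit"))
        for m in AFFINITY_PATTERN.finditer(sentence)
    ]

def score_sentence(sentence):
    """Toy ranking score (an assumption): the number of affinity mentions found."""
    return len(find_affinities(sentence))

sentences = [
    "The structure was solved at 2.1 A resolution.",
    "The compound bound the kinase with a Kd of 3.5 nM and an IC50 = 120 nM.",
    "The inhibitor showed a Ki of 40 µM against the protease.",
]

# Rank candidate sentences so that the most affinity-rich ones come first.
ranked = sorted(sentences, key=score_sentence, reverse=True)
for s in ranked:
    print(score_sentence(s), find_affinities(s))
```

A real system would need many more surface forms (ranges, scientific notation, values reported in tables), which is presumably why the released entries are also refined and approved manually.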
Subjects
Databases, Factual; Information Storage and Retrieval/methods; Ligands; Protein Binding; Software; Algorithms; Computational Biology/methods
ABSTRACT
BACKGROUND AND PURPOSE: Radiation-induced hypothyroidism (RIHT) is a common but underestimated late effect in head and neck cancers. However, no consensus exists regarding risk prediction or dose constraints in RIHT. We aimed to develop a machine learning model for the accurate risk prediction of RIHT based on clinical and dose-volume features and to evaluate its performance internally and externally. MATERIALS AND METHODS: We retrospectively searched two institutions for patients aged >20 years treated with definitive radiotherapy for nasopharyngeal or oropharyngeal cancer, and extracted their clinical information and dose-volume features. One institution was designated the developmental cohort and the other the external validation cohort. We compared the performances of machine learning models with those of published normal tissue complication probability (NTCP) models. RESULTS: The developmental and external validation cohorts consisted of 378 and 49 patients, respectively. The estimated cumulative incidence rates of grade ≥1 hypothyroidism were 53.5% and 61.3% in the developmental and external validation cohorts, respectively. Machine learning models outperformed traditional NTCP models by having lower Brier scores at every time point and a lower integrated Brier score, while demonstrating a comparable calibration index and mean area under the curve. Even simplified machine learning models using only thyroid features performed better than traditional NTCP algorithms. The machine learning models showed consistent performance between folds. The performance in a previously unseen external validation cohort was comparable to that of the cross-validation. CONCLUSIONS: Our model outperformed traditional NTCP models, with the additional capability of predicting the RIHT risk at individual time points. A simplified model using only thyroid dose-volume features still outperforms traditional NTCP models and can be incorporated into future treatment planning systems for biological optimization.
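For readers unfamiliar with the Brier score used in this comparison, the following minimal Python sketch shows the basic binary form: the mean squared difference between predicted probabilities and observed outcomes, where lower is better. This is a simplification and not the study's evaluation code. Time-dependent Brier scores for censored follow-up data typically add inverse-probability-of-censoring weights, which are omitted here, and the example numbers are made up.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Simplified binary form: no censoring weights, a single time point.
    """
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)

# Made-up example: predicted risk of hypothyroidism at one time point for three patients.
outcomes = [1, 0, 1]                 # 1 = event observed, 0 = no event
model_a = [0.9, 0.2, 0.6]            # a reasonably informative model
uninformative = [0.5, 0.5, 0.5]      # always predicts 50%

print(brier_score(model_a, outcomes))
print(brier_score(uninformative, outcomes))
```

A model that always predicts 50% scores 0.25 in this binary form, so values well below 0.25 indicate predictions that carry useful information. An integrated Brier score summarizes such values across a range of time points.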