Results 1 - 20 of 4,859
1.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39358034

ABSTRACT

We sought to develop and validate a machine learning (ML) model for predicting multidimensional frailty based on clinical and laboratory data. Moreover, an explainable ML model utilizing SHapley Additive exPlanations (SHAP) was constructed. This study enrolled 622 patients hospitalized due to decompensating episodes at a tertiary hospital. The cohort data were randomly divided into training and test sets. External validation was carried out using 131 patients from other tertiary hospitals. The frail phenotype was defined according to a self-reported questionnaire (Frailty Index). The area under the receiver operating characteristic curve was adopted to compare the performance of five ML models. The importance of the features and interpretation of the ML models were determined using the SHAP method. The proportions of cirrhotic patients with nonfrail and frail phenotypes in the combined training and test sets were 87.8% and 12.2%, respectively, while they were 88.5% and 11.5% in the external validation dataset. Five ML algorithms were used, and the random forest (RF) model exhibited substantial predictive performance. In the external validation, the RF algorithm outperformed the other ML models. Moreover, the SHAP method demonstrated that neutrophil-to-lymphocyte ratio, age, lymphocyte-to-monocyte ratio, ascites, and albumin served as the most important predictors of frailty. At the patient level, the SHAP force plot and decision plot provided a clinically meaningful explanation of the RF algorithm. We constructed an ML model (RF) providing accurate prediction of the frail phenotype in decompensated cirrhosis. Its explainability and generalizability may help clinicians understand contributors to this physiologically vulnerable situation and tailor interventions.


Subjects
Frailty , Hospitalization , Liver Cirrhosis , Machine Learning , Humans , Liver Cirrhosis/complications , Female , Male , Middle Aged , Aged , Algorithms , ROC Curve
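
The abstract above compares models by the area under the receiver operating characteristic curve. As a rough illustration of what that number measures, the sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = frail, 0 = nonfrail.
labels = [0, 0, 1, 0, 1, 1]
scores_a = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]   # an informative model
scores_b = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]    # a model that cannot discriminate

print(roc_auc(labels, scores_a))
print(roc_auc(labels, scores_b))
```

The pairwise form is quadratic in the number of cases, which is fine for a small example; a sort-based rank computation would be the usual choice for larger cohorts.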
2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39038932

ABSTRACT

MOTIVATION: Drug repositioning, the identification of new therapeutic uses for existing drugs, is crucial for accelerating drug discovery and reducing development costs. Some methods rely on heterogeneous networks, which may not fully capture the complex relationships between drugs and diseases. However, integrating diverse biological data sources offers promise for discovering new drug-disease associations (DDAs). Previous evidence indicates that the combination of information would be conducive to the discovery of new DDAs. However, the challenge lies in effectively integrating different biological data sources to identify the most effective drugs for a certain disease based on drug-disease coupled mechanisms. RESULTS: In response to this challenge, we present MiRAGE, a novel computational method for drug repositioning. MiRAGE leverages a three-step framework, comprising negative sampling using hard negative mining, classification employing random forest models, and feature selection based on feature importance. We evaluate MiRAGE on multiple benchmark datasets, demonstrating its superiority over state-of-the-art algorithms across various metrics. Notably, MiRAGE consistently outperforms other methods in uncovering novel DDAs. Case studies focusing on Parkinson's disease and schizophrenia showcase MiRAGE's ability to identify top candidate drugs supported by previous studies. Overall, our study underscores MiRAGE's efficacy and versatility as a computational tool for drug repositioning, offering valuable insights for therapeutic discoveries and addressing unmet medical needs.


Subjects
Algorithms , Data Mining , Drug Repositioning , Drug Repositioning/methods , Data Mining/methods , Humans , Computational Biology/methods , Schizophrenia/drug therapy , Parkinson Disease/drug therapy , Drug Discovery/methods
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38622357

ABSTRACT

Pseudouridine is an RNA modification that is widely distributed in both prokaryotes and eukaryotes, and plays a critical role in numerous biological activities. Despite its importance, the precise identification of pseudouridine sites through experimental approaches poses significant challenges, requiring substantial time and resources. Therefore, there is a growing need for computational techniques that can reliably and quickly identify pseudouridine sites from vast amounts of RNA sequencing data. In this study, we propose fuzzy kernel evidence Random Forest (FKeERF) to identify pseudouridine sites. This method is called PseU-FKeERF, which demonstrates high accuracy in identifying pseudouridine sites from RNA sequencing data. The PseU-FKeERF model selected four RNA feature coding schemes with relatively good performance for feature combination, and then input them into the newly proposed FKeERF method for category prediction. FKeERF not only uses fuzzy logic to expand the original feature space, but also combines kernel methods that are easy to interpret in general for category prediction. Both cross-validation tests and independent tests on benchmark datasets have shown that PseU-FKeERF has better predictive performance than several state-of-the-art methods. This new method not only improves the accuracy of pseudouridine site identification, but also provides a certain reference for disease control and related drug development in the future.


Subjects
Pseudouridine , Random Forest , Pseudouridine/genetics , RNA/genetics , Base Sequence
4.
J Neurosci ; 44(39)2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39187379

ABSTRACT

Recording and analysis of neural activity are often biased toward detecting sparse subsets of highly active neurons, masking important signals carried in low-magnitude and variable responses. To investigate the contribution of seemingly noisy activity to odor encoding, we used mesoscale calcium imaging from mice of both sexes to record odor responses from the dorsal surface of bilateral olfactory bulbs (OBs). The outer layer of the mouse OB is comprised of dendrites organized into discrete "glomeruli," which are defined by odor receptor-specific sensory neuron input. We extracted activity from a large population of glomeruli and used logistic regression to classify odors from individual trials with high accuracy. We then used add-in and dropout analyses to determine subsets of glomeruli necessary and sufficient for odor classification. Classifiers successfully predicted odor identity even after excluding sparse, highly active glomeruli, indicating that odor information is redundantly represented across a large population of glomeruli. Additionally, we found that random forest (RF) feature selection informed by Gini inequality (RF Gini impurity, RFGI) reliably ranked glomeruli by their contribution to overall odor classification. RFGI provided a measure of "feature importance" for each glomerulus that correlated with intuitive features like response magnitude. Finally, in agreement with previous work, we found that odor information persists in glomerular activity after the odor offset. Together, our findings support a model of OB odor coding where sparse activity is sufficient for odor identification, but information is widely, redundantly available across a large population of glomeruli, with each glomerulus representing information about more than one odor.


Subjects
Mice, Inbred C57BL , Odorants , Olfactory Bulb , Wakefulness , Animals , Olfactory Bulb/physiology , Mice , Male , Female , Wakefulness/physiology , Smell/physiology , Olfactory Receptor Neurons/physiology
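
The abstract above ranks glomeruli by a random forest importance measure based on Gini impurity. The sketch below shows only the underlying quantity: the Gini impurity of a set of labels and the impurity decrease achieved by one threshold split on one feature. A random forest accumulates such decreases over many splits and trees; that aggregation is not shown here. The helper names, the toy odor labels, and the response values are all assumptions for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(values, labels, threshold):
    """Drop in Gini impurity from splitting the samples at values <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Hypothetical trials: two odors, responses of two glomeruli on each trial.
odors = ["A", "A", "B", "B"]
glom_1 = [0.9, 0.8, 0.1, 0.2]   # separates the odors cleanly
glom_2 = [0.1, 0.9, 0.2, 0.8]   # carries no odor information at this threshold

print(impurity_decrease(glom_1, odors, 0.5))
print(impurity_decrease(glom_2, odors, 0.5))
```

A feature whose splits remove more impurity, as `glom_1` does here, would receive a higher importance score.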
5.
Am J Hum Genet ; 109(2): 195-209, 2022 02 03.
Article in English | MEDLINE | ID: mdl-35032432

ABSTRACT

Whole-genome sequencing resolves many clinical cases where standard diagnostic methods have failed. However, at least half of these cases remain unresolved after whole-genome sequencing. Structural variants (SVs; genomic variants larger than 50 base pairs) of uncertain significance are the genetic cause of a portion of these unresolved cases. As sequencing methods using long or linked reads become more accessible and SV detection algorithms improve, clinicians and researchers are gaining access to thousands of reliable SVs of unknown disease relevance. Methods to predict the pathogenicity of these SVs are required to realize the full diagnostic potential of long-read sequencing. To address this emerging need, we developed StrVCTVRE to distinguish pathogenic SVs from benign SVs that overlap exons. In a random forest classifier, we integrated features that capture gene importance, coding region, conservation, expression, and exon structure. We found that features such as expression and conservation are important but are absent from SV classification guidelines. We leveraged multiple resources to construct a size-matched training set of rare, putatively benign and pathogenic SVs. StrVCTVRE performs accurately across a wide SV size range on independent test sets, which will allow clinicians and researchers to eliminate about half of SVs from consideration while retaining a 90% sensitivity. We anticipate clinicians and researchers will use StrVCTVRE to prioritize SVs in probands where no SV is immediately compelling, empowering deeper investigation into novel SVs to resolve cases and understand new mechanisms of disease. StrVCTVRE runs rapidly and is publicly available.


Subjects
Algorithms , Genome, Human , Genomic Structural Variation , Software , Supervised Machine Learning , Datasets as Topic , Exons , Genomics/methods , Humans , ROC Curve , Whole Genome Sequencing/statistics & numerical data
6.
Biostatistics ; 25(4): 933-946, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-38332633

ABSTRACT

Clinicians and patients must make treatment decisions at a series of key decision points throughout disease progression. A dynamic treatment regime is a set of sequential decision rules that return treatment decisions based on accumulating patient information, like that commonly found in electronic medical record (EMR) data. When applied to a patient population, an optimal treatment regime leads to the most favorable outcome on average. Identifying optimal treatment regimes that maximize residual life is especially desirable for patients with life-threatening diseases such as sepsis, a complex medical condition that involves severe infections with organ dysfunction. We introduce the residual life value estimator (ReLiVE), an estimator for the expected value of cumulative restricted residual life under a fixed treatment regime. Building on ReLiVE, we present a method for estimating an optimal treatment regime that maximizes expected cumulative restricted residual life. Our proposed method, ReLiVE-Q, conducts estimation via the backward induction algorithm Q-learning. We illustrate the utility of ReLiVE-Q in simulation studies, and we apply ReLiVE-Q to estimate an optimal treatment regime for septic patients in the intensive care unit using EMR data from the Multiparameter Intelligent Monitoring Intensive Care database. Ultimately, we demonstrate that ReLiVE-Q leverages accumulating patient information to estimate personalized treatment regimes that optimize a clinically meaningful function of residual life.


Subjects
Electronic Health Records , Humans , Sepsis/therapy , Models, Statistical
7.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36528388

ABSTRACT

Membrane-based cells are the fundamental structural and functional units of organisms, while evidence demonstrates that liquid-liquid phase separation (LLPS) is associated with the formation of membraneless organelles, such as P-bodies, nucleoli and stress granules. Many studies have been undertaken to explore the functions of protein phase separation (PS), but these studies lacked an effective tool to identify the sequence segments that are critical for LLPS. In this study, we presented a novel software tool called dSCOPE (http://dscope.omicsbio.info) to predict the PS-driving regions. To develop the predictor, we curated experimentally identified sequence segments that can drive LLPS from published literature. Then sliding sequence window based physiological, biochemical, structural and coding features were integrated by a random forest algorithm to perform prediction. Through rigorous evaluation, dSCOPE was demonstrated to achieve satisfactory performance. Furthermore, large-scale analysis of the human proteome based on dSCOPE showed that the predicted PS-driving regions were enriched in various protein post-translational modifications and cancer mutations, and the proteins that contain predicted PS-driving regions were enriched in critical cellular signaling pathways. Taken together, dSCOPE precisely predicted the protein sequence segments critical for LLPS, with various helpful information visualized in the webserver to facilitate LLPS-related research.


Subjects
Proteins , Software , Humans , Proteins/chemistry
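
The abstract above mentions features computed over a sliding sequence window. The sketch below shows the general windowing idea on a protein sequence, using one deliberately simple per-window feature (the fraction of charged residues). The window width, the step, the feature, and the example sequence are all assumptions chosen for illustration; the abstract does not specify the actual dSCOPE features or window size.

```python
def sliding_windows(sequence, width, step=1):
    """Yield (start, segment) for every full-width window over the sequence."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    for start in range(0, len(sequence) - width + 1, step):
        yield start, sequence[start:start + width]


# One-letter codes of residues treated as charged here: Asp, Glu, Lys, Arg.
CHARGED = set("DEKR")


def charged_fraction(segment):
    """Illustrative per-window feature: share of charged residues."""
    return sum(1 for aa in segment if aa in CHARGED) / len(segment)


# Hypothetical 10-residue sequence, scanned with a 5-residue window.
seq = "MKDEAGGSRK"
features = [(start, charged_fraction(seg))
            for start, seg in sliding_windows(seq, width=5)]

for start, value in features:
    print(start, value)
```

Each window yields one feature vector (here a single number), and a classifier can then score every window position along the sequence.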
8.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37141141

ABSTRACT

Microbiome-based diagnosis of cancer is an increasingly important supplement to the genomics approach in cancer diagnosis, yet current models for microbiome-based diagnosis of cancer face difficulties in generality: not only can diagnosis models not be adapted from one cancer to another, but models built on microbes from tissues cannot be adapted for diagnosis based on microbes from blood. Therefore, a microbiome-based model suitable for a broad spectrum of cancer types is urgently needed. Here we introduce DeepMicroCancer, a diagnosis model using artificial intelligence techniques for a broad spectrum of cancer types. Built on random forest models, it achieved superior performance on tissue samples from more than twenty types of cancer. By using transfer learning techniques, improved accuracies could be obtained, especially for cancer types with only a few samples, which could satisfy the requirement in clinical scenarios. Moreover, transfer learning techniques enabled high diagnosis accuracy for blood samples as well. These results indicated that certain sets of microbes could, if excavated using advanced artificial intelligence techniques, reveal the intricate differences among cancers and healthy individuals. Collectively, DeepMicroCancer has provided a new avenue for accurate diagnosis of cancer based on tissue and blood materials, which could potentially be used in clinics.


Subjects
Body Fluids , Microbiota , Neoplasms , Humans , Artificial Intelligence , Neoplasms/diagnosis , Genomics
9.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37406190

ABSTRACT

Studies have confirmed that the occurrence of many complex diseases in the human body is closely related to the microbial community, and microbes can affect tumorigenesis and metastasis by regulating the tumor microenvironment. However, there are still large gaps in the clinical observation of the microbiota in disease. Although biological experiments are accurate in identifying disease-associated microbes, they are also time-consuming and expensive. Computational models for effective identification of disease-related microbes can shorten this process and reduce capital and time costs. On this basis, this paper presents a model named DSAE_RF to predict latent microbe-disease associations by combining multi-source features and deep learning. DSAE_RF calculates four similarities between microbes and diseases, which are then used as feature vectors for the disease-microbe pairs. Reliable negative samples are then screened by k-means clustering, and a deep sparse autoencoder neural network is further used to extract effective features of the disease-microbe pairs. On this foundation, a random forest classifier is used to predict the associations between microbes and diseases. To assess the performance of the model, 10-fold cross-validation is implemented on the same dataset. As a result, the AUC and AUPR of the model are 0.9448 and 0.9431, respectively. Furthermore, we also conduct a variety of experiments, including comparison of negative sample selection methods, comparison with different models and classifiers, Kolmogorov-Smirnov test and t-test, ablation experiments, robustness analysis, and case studies on Covid-19 and colorectal cancer. The results fully demonstrate the reliability and availability of our model.


Subjects
COVID-19 , Deep Learning , Microbiota , Humans , Reproducibility of Results , Algorithms , Computational Biology/methods
10.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36988160

ABSTRACT

Small open reading frames (smORFs) encoding proteins less than 100 amino acids (aa) are known to be important regulators of key cellular processes. However, their computational identification remains a challenge. Based on a comprehensive analysis of known prokaryotic small ORFs, we have developed the ProsmORF-pred resource which uses a machine learning (ML)-based method for prediction of smORFs in the prokaryotic genome sequences. ProsmORF-pred consists of two ML models, one for initiation site recognition in nucleic acid sequences upstream of putative start codons and the other uses translated amino acid sequences to decipher functional protein like sequences. The nucleotide sequence-based initiation site recognition model has been trained using longer ORFs (>100 aa) in the same genome while the ML model for identification of protein like sequences has been trained using annotated smORFs from Escherichia coli. Comprehensive benchmarking of ProsmORF-pred reveals that its performance is comparable to other state-of-the-art approaches on the annotated smORF set derived from 32 prokaryotic genomes. Its performance is distinctly superior to other tools like PRODIGAL and RANSEPS for prediction of newly identified smORFs which have a length range of 10-30 aa, where prediction of smORFs has been a major challenge. Apart from identification of smORFs in genomic sequences, ProsmORF-pred can also aid in functional annotation of the predicted smORFs based on sequence similarity and genomic neighbourhood similarity searches in ProsmORFDB, a well-curated database of known smORFs. ProsmORF-pred along with its backend database ProsmORFDB is available as a user-friendly web server (http://www.nii.ac.in/prosmorfpred.html).


Subjects
Genome , Proteins , Open Reading Frames , Proteins/genetics , Genomics , Amino Acid Sequence
11.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36715277

ABSTRACT

N6-methyladenosine (m6A) modification is the most abundant co-transcriptional modification in eukaryotic RNA and plays important roles in cellular regulation. Traditional high-throughput sequencing experiments used to explore functional mechanisms are time-consuming and labor-intensive, and most of the proposed methods focused on limited species types. To further understand the relevant biological mechanisms among different species with the same RNA modification, it is necessary to develop a computational scheme that can be applied to different species. To achieve this, we proposed an attention-based deep learning method, adaptive-m6A, which consists of convolutional neural network, bi-directional long short-term memory and an attention mechanism, to identify m6A sites in multiple species. In addition, three conventional machine learning (ML) methods, including support vector machine, random forest and logistic regression classifiers, were considered in this work. In addition to the performance of ML methods for multi-species prediction, the optimal performance of adaptive-m6A yielded an accuracy of 0.9832 and the area under the receiver operating characteristic curve of 0.98. Moreover, the motif analysis and cross-validation among different species were conducted to test the robustness of one model towards multiple species, which helped improve our understanding about the sequence characteristics and biological functions of RNA modifications in different species.


Subjects
Machine Learning , RNA , Base Sequence , RNA/genetics , Neural Networks, Computer
12.
FASEB J ; 38(1): e23370, 2024 01.
Article in English | MEDLINE | ID: mdl-38168496

ABSTRACT

Aging is acknowledged as the most significant risk factor for cardiovascular disease (CVD). This study sought to identify and validate potential aging-related genes associated with CVD by using bioinformatics. The confluence of the limma test, weighted correlation network analysis (WGCNA), and 2129 aging and senescence-associated genes led to the identification of aging-related differential expression genes (ARDEGs). By using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), potential biological roles and pathways of ARDEGs were identified. To find the significantly different functions between CVD and non-cardiovascular disease (nCVD) and to calculate the process scores, enrichment analysis of all genes was carried out using gene set enrichment analysis (GSEA) and gene set variation analysis (GSVA). To evaluate the immune cell composition of the immune microenvironment, we performed an immune infiltration analysis on the dataset from the training group. We were able to acquire four ARDEGs (PTGS2, MMP9, HBEGF, and FN1). Aging, cellular senescence, and nitric oxide signal transduction were selected for biological function analysis. The diagnostic value of the four ARDEGs in distinguishing CVD from nCVD samples was deemed to be favorable. This research identified four ARDEGs that are associated with CVD. This study provides insight into prospective novel biomarkers for aging-related CVD diagnosis and progression monitoring.


Subjects
Cardiovascular Diseases , Cardiovascular System , Humans , Cardiovascular Diseases/genetics , Prospective Studies , Cellular Senescence , Computational Biology
13.
Methods ; 231: 26-36, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39270885

ABSTRACT

Interactions of biological molecules in organisms are considered to be primary factors for the lifecycle of that organism. Various important biological functions are dependent on such interactions and among different kinds of interactions, the protein DNA interactions are very important for the processes of transcription, regulation of gene expression, DNA repairing and packaging. Thus, keeping the knowledge of such interactions and the sites of those interactions is necessary to study the mechanism of various biological processes. As experimental identification through biological assays is quite resource-demanding, costly and error-prone, scientists opt for the computational methods for efficient and accurate identification of such DNA-protein interaction sites. Thus, herein, we propose a novel and accurate method namely DeepDBS for the identification of DNA-binding sites in proteins, using primary amino acid sequences of proteins under study. From protein sequences, deep representations were computed through a one-dimensional convolution neural network (1D-CNN), recurrent neural network (RNN) and long short-term memory (LSTM) network and were further used to train a Random Forest classifier. Random Forest with LSTM-based features outperformed the other models, as well as the existing state-of-the-art methods with an accuracy score of 0.99 for self-consistency test, 10-fold cross-validation, 5-fold cross-validation, and jackknife validation while 0.92 for independent dataset testing. It is concluded based on results that the DeepDBS can help accurate and efficient identification of DNA binding sites (DBS) in proteins.

14.
Methods ; 223: 56-64, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38237792

ABSTRACT

DNA-binding proteins are a class of proteins that can interact with DNA molecules through physical and chemical interactions. Their main functions include regulating gene expression, maintaining chromosome structure and stability, and more. DNA-binding proteins play a crucial role in cellular and molecular biology, as they are essential for maintaining normal cellular physiological functions and adapting to environmental changes. The prediction of DNA-binding proteins has been a hot topic in the field of bioinformatics. The key to accurately classifying DNA-binding proteins is to find suitable feature sources and explore the information they contain. Although there are already many models for predicting DNA-binding proteins, there is still room for improvement in mining feature source information and calculation methods. In this study, we created a model called DBPboost to better identify DNA-binding proteins. The innovation of this study lies in the use of eight feature extraction methods, the improvement of the feature selection step, which involves selecting some features first and then performing feature selection again after feature fusion, and the optimization of the differential evolution algorithm in feature fusion, which improves the performance of feature fusion. The experimental results show that the prediction accuracy of the model on the UniSwiss dataset is 89.32%, and the sensitivity is 89.01%, which is better than most existing models.


Subjects
DNA-Binding Proteins , Support Vector Machine , DNA-Binding Proteins/chemistry , Algorithms , DNA/chemistry , Computational Biology/methods
15.
Cereb Cortex ; 34(9)2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39329355

ABSTRACT

The diagnosis of Parkinson's Disease (PD) presents ongoing challenges. Advances in imaging techniques like 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET) have highlighted metabolic alterations in PD, yet the dynamic network interactions within the metabolic connectome remain elusive. To this end, we examined a dataset comprising 49 PD patients and 49 healthy controls. By employing a personalized metabolic connectome approach, we assessed both within- and between-network connectivities using Standard Uptake Value (SUV) and Jensen-Shannon Divergence Similarity Estimation (JSSE). A random forest algorithm was utilized to pinpoint key neuroimaging features differentiating PD from healthy states. The results revealed heightened internetwork connectivity in PD, specifically within the somatomotor (SMN) and frontoparietal (FPN) networks, which persisted after multiple comparison corrections (P < 0.05, Bonferroni adjusted for 10% and 20% sparsity). This altered connectivity effectively distinguished PD patients from healthy individuals. Notably, this study utilizes 18F-FDG PET imaging to map individual metabolic networks, revealing enhanced connectivity in the SMN and FPN among PD patients. This enhanced connectivity may serve as a promising imaging biomarker, offering a valuable asset for early PD detection.


Subjects
Brain , Connectome , Fluorodeoxyglucose F18 , Parkinson Disease , Positron-Emission Tomography , Humans , Parkinson Disease/diagnostic imaging , Parkinson Disease/metabolism , Parkinson Disease/physiopathology , Female , Male , Positron-Emission Tomography/methods , Middle Aged , Aged , Connectome/methods , Brain/diagnostic imaging , Brain/metabolism , Biomarkers , Metabolic Networks and Pathways/physiology , Nerve Net/diagnostic imaging , Nerve Net/metabolism , Magnetic Resonance Imaging/methods , Neural Pathways/diagnostic imaging , Neural Pathways/physiopathology
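
The abstract above estimates connectivity with a Jensen-Shannon divergence based similarity. The sketch below computes the Jensen-Shannon divergence (base 2, so it lies between 0 and 1) between two discrete distributions and turns it into a similarity. The conversion `1 - sqrt(JSD)` and the example histograms are assumptions for illustration; the abstract does not state the exact JSSE formula or how the SUV distributions were estimated.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_similarity(p, q):
    """Assumed similarity: 1 minus the Jensen-Shannon distance (sqrt of JSD)."""
    return 1.0 - math.sqrt(max(0.0, js_divergence(p, q)))


# Hypothetical normalized uptake histograms for two brain regions.
region_a = [0.1, 0.4, 0.4, 0.1]
region_b = [0.2, 0.3, 0.3, 0.2]

print(js_divergence(region_a, region_b))
print(js_similarity(region_a, region_b))
```

Because the mixture `m` is positive wherever `p` or `q` is, the divergence stays finite even when one histogram has empty bins, which is one practical reason to prefer it over plain KL divergence.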
16.
Eur Heart J ; 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39217456

ABSTRACT

BACKGROUND AND AIMS: Cardiogenic shock (CS) remains the primary cause of in-hospital death after acute coronary syndromes (ACS), with its plateauing mortality rates approaching 50%. To test novel interventions, personalized risk prediction is essential. The ORBI (Observatoire Régional Breton sur l'Infarctus) score represents the first-of-its-kind risk score to predict in-hospital CS in ACS patients undergoing percutaneous coronary intervention (PCI). However, its sex-specific performance remains unknown, and refined risk prediction strategies are warranted. METHODS: This multinational study included a total of 53 537 ACS patients without CS on admission undergoing PCI. Following sex-specific evaluation of ORBI, regression and machine-learning models were used for variable selection and risk prediction. By combining best-performing models with highest-ranked predictors, SEX-SHOCK was developed, and internally and externally validated. RESULTS: The ORBI score showed lower discriminative performance for the prediction of CS in females than males in Swiss (AUC [95% CI]: 0.78 [0.76-0.81] vs. 0.81 [0.79-0.83]; p=0.048) and French ACS patients (0.77 [0.74-0.81] vs. 0.84 [0.81-0.86]; p=0.002). The newly developed SEX-SHOCK score, now incorporating ST-segment elevation, creatinine, C-reactive protein, and left ventricular ejection fraction, outperformed ORBI in both sexes (females: 0.81 [0.78-0.83]; males: 0.83 [0.82-0.85]; p<0.001), which prevailed following internal and external validation in RICO (females: 0.82 [0.79-0.85]; males: 0.88 [0.86-0.89]; p<0.001) and SPUM-ACS (females: 0.83 [0.77-0.90], p=0.004; males: 0.83 [0.80-0.87], p=0.001). CONCLUSIONS: The ORBI score showed modest sex-specific performance. The novel SEX-SHOCK score provides superior performance in females and males across the entire spectrum of ACS, thus providing a basis for future interventional trials and contemporary ACS management.

17.
BMC Bioinformatics ; 25(1): 50, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291384

ABSTRACT

BACKGROUND: Enzymes play an irreplaceable role in maintaining the lives of living organisms. The Enzyme Commission (EC) number of an enzyme indicates its essential functions. Correct identification of the first digit (family class) of the EC number for a given enzyme has been a hot topic over the past twenty years. Several previous methods adopted functional domain composition to represent enzymes. However, this leads to the curse of dimensionality, reducing the efficiency of these methods. Moreover, most previous methods can only deal with enzymes belonging to one family class, whereas several enzymes in fact belong to two or more family classes. RESULTS: In this study, a fast and efficient multi-label classifier, named PredictEFC, was designed. To construct this classifier, a novel feature extraction scheme was designed for processing the functional domain information of enzymes; it counts the distribution of each functional domain entry across the seven family classes in the training dataset. Based on this scheme, each training or test enzyme was encoded into a 7-dimensional vector by fusing its functional domain information with the above statistical results. Random k-labelsets (RAKEL) was adopted to build the classifier, with random forest selected as the base classification algorithm. Two tenfold cross-validation runs on the training dataset showed that the accuracy of PredictEFC reached 0.8493 and 0.8370. Independent tests on two datasets yielded accuracy values of 0.9118 and 0.8777. CONCLUSION: The performance of PredictEFC was slightly lower than that of the classifier directly using functional domain composition. However, its efficiency was sharply improved: the running time was less than one-tenth of that of the classifier directly using functional domain composition. In addition, the utility of PredictEFC was superior to that of classifiers using traditional dimensionality reduction methods and of some previous methods, and this classifier can be transferred to predict enzyme family classes in other species. Finally, a web server available at http://124.221.158.221/ was set up for easy usage.
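
The feature extraction scheme described in this abstract can be sketched as follows. This is a minimal Python illustration and not the authors' implementation. The abstract does not say how an enzyme's domain information is fused with the per-domain class statistics, so the averaging step is an assumption, and the names `domain_class_distribution` and `encode` and the domain identifiers are hypothetical.

```python
from collections import defaultdict

NUM_CLASSES = 7  # the seven EC family classes


def domain_class_distribution(training_set):
    """For each domain entry, compute its normalized distribution over classes.

    training_set: iterable of (domains, classes) pairs, where classes are
    0-based class indices; an enzyme may carry several classes (multi-label).
    """
    counts = defaultdict(lambda: [0] * NUM_CLASSES)
    for domains, classes in training_set:
        for d in set(domains):
            for c in classes:
                counts[d][c] += 1
    dist = {}
    for d, row in counts.items():
        total = sum(row)
        dist[d] = [v / total for v in row]
    return dist


def encode(domains, dist):
    """Encode an enzyme as a 7-dimensional vector.

    Assumed fusion rule: average the class distributions of the enzyme's
    known domains; return all zeros if no domain was seen in training.
    """
    known = [dist[d] for d in set(domains) if d in dist]
    if not known:
        return [0.0] * NUM_CLASSES
    return [sum(col) / len(known) for col in zip(*known)]


# Toy training data with made-up domain identifiers.
train = [
    (["domA", "domB"], [0]),
    (["domA"], [0, 2]),
    (["domC"], [5]),
]
dist = domain_class_distribution(train)
vec = encode(["domA", "domB"], dist)
print(vec)
```

Whatever the exact fusion rule, the point made in the abstract is that the vector length stays at seven regardless of how many distinct domain entries exist, which is what avoids the high dimensionality of direct domain composition.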


Subjects
Algorithms , Enzymes , Enzymes/classification
18.
BMC Bioinformatics ; 25(1): 78, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38378437

ABSTRACT

BACKGROUND: In recent years, the extensive use of drugs and antibiotics has led to increasing microbial resistance. It has therefore become crucial to explore deep connections between drugs and microbes. However, traditional biological experiments are very expensive and time-consuming, so it is worthwhile to develop efficient computational models to forecast potential microbe-drug associations. RESULTS: In this manuscript, we propose a novel prediction model called GARFMDA, which combines graph attention networks and a bilayer random forest to infer probable microbe-drug correlations. In GARFMDA, we first constructed two different microbe-drug networks by integrating different microbe-drug-disease correlation indices. Then, based on multiple measures of similarity, we constructed a unique feature matrix for drugs and for microbes, respectively. Next, we fed these newly obtained microbe-drug networks, together with the feature matrices, into the graph attention network to extract low-dimensional feature representations for drugs and microbes separately. Thereafter, these low-dimensional feature representations, along with the feature matrices, were fed into the first layer of the bilayer random forest model to obtain the contribution values of all features. Finally, after features with low contribution values were removed, these contribution values were fed into the second layer of the bilayer random forest to detect potential links between microbes and drugs. CONCLUSIONS: Experimental results and case studies show that GARFMDA achieves better prediction performance than state-of-the-art approaches, which suggests that GARFMDA may be a useful tool for microbe-drug association prediction in the future. The source code of GARFMDA is available at https://github.com/KuangHaiYue/GARFMDA.git.
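
The graph attention step in this abstract can be illustrated with a single-head attention layer in the style of the standard graph attention network formulation. The sketch below is a pure-Python illustration under that assumption. The abstract does not specify GARFMDA's actual architecture, and the function names, weights, and toy graph here are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer.

    features:  list of node feature vectors
    neighbors: dict mapping node index -> list of neighbor indices (incl. self)
    W:         projection matrix (out_dim x in_dim)
    a:         attention vector of length 2 * out_dim
    Returns (new node embeddings, attention weights per node).
    """
    z = [matvec(W, h) for h in features]
    embeddings, attention = [], []
    for i in range(len(features)):
        nbrs = neighbors[i]
        # Unnormalized score for each neighbor j: LeakyReLU(a . [z_i || z_j])
        logits = [
            leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood
        m = max(logits)
        exps = [math.exp(e - m) for e in logits]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the projected neighbor features
        embeddings.append(
            [sum(al * z[j][k] for al, j in zip(alphas, nbrs))
             for k in range(len(z[i]))]
        )
        attention.append(alphas)
    return embeddings, attention


# Toy graph: three nodes (for example, microbes and drugs) with 2-d features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [1, 0, 2], 2: [2, 1]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.0]
embeddings, attention = gat_layer(features, neighbors, W, a)
print(embeddings)
```

In a trained model `W` and `a` are learned; they are fixed here only to keep the example self-contained. The resulting embeddings play the role of the low-dimensional representations that the abstract passes on to the random forest stage.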


Subjects
Anti-Bacterial Agents , Random Forest , Probability , Software
19.
BMC Bioinformatics ; 25(1): 18, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38212697

ABSTRACT

BACKGROUND: Metabolic syndrome (MetS) is a cluster of metabolic abnormalities (including obesity, insulin resistance, hypertension, and dyslipidemia) that can be used to identify populations at risk for diabetes and cardiovascular diseases, the main causes of morbidity and mortality worldwide. A simple approach for diagnosing MetS without biochemical tests would be highly valuable. The present study aimed to predict MetS from non-invasive features using a random forest learning algorithm. In addition, to deal with the data imbalance that naturally exists in this type of data, the effect of two data balancing approaches on model performance was investigated: the Synthetic Minority Over-sampling Technique (SMOTE) and Random Splitting data balancing (SplitBal). RESULTS: The most important determinant for MetS prediction was waist circumference. When a random forest learning algorithm was applied to the imbalanced data, the trained models reached accuracies of 86.9% and 79.4% and sensitivities of 37.1% and 38.2% in men and women, respectively. The best results were obtained with the SplitBal data balancing technique: although the accuracy of the trained models decreased by 7.8% and 11.3%, their sensitivity improved significantly, to 82.3% and 73.7% in men and women, respectively. CONCLUSIONS: The random forest learning method, combined with data balancing techniques, especially SplitBal, can produce MetS prediction models with promising results that can be applied as a useful prognostic tool in health screening programs.
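
The splitting idea behind SplitBal can be sketched as follows, assuming the general scheme in which the majority class is randomly partitioned into bins of roughly minority-class size and each bin is paired with the full minority class to form a balanced training subset. The abstract does not give the study's exact procedure, so the function name `split_balance` and the toy data are hypothetical.

```python
import random


def split_balance(majority, minority, seed=0):
    """Return balanced subsets: (majority bin, full minority class) pairs.

    The number of bins is the imbalance ratio rounded to the nearest
    integer, so each bin is roughly the size of the minority class.
    """
    if not majority or not minority:
        raise ValueError("both classes must be non-empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    n_bins = max(1, round(len(majority) / len(minority)))
    bins = [shuffled[i::n_bins] for i in range(n_bins)]
    return [(b, list(minority)) for b in bins]


# Toy example: 30 subjects without MetS (majority), 10 with MetS (minority).
majority_ids = list(range(100, 130))
minority_ids = list(range(10))
subsets = split_balance(majority_ids, minority_ids)
for maj_bin, mino in subsets:
    print(len(maj_bin), len(mino))
```

One classifier would then be trained per balanced subset and their predictions combined, for example by voting or averaging; that ensemble step is omitted here. Because every subset contains all minority cases, no MetS case is discarded, which is consistent with the gain in sensitivity the abstract reports.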


Subjects
Insulin Resistance , Metabolic Syndrome , Male , Humans , Female , Metabolic Syndrome/diagnosis , Random Forest , Risk Factors , Obesity
20.
BMC Bioinformatics ; 25(1): 108, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38475723

ABSTRACT

RNA-protein interaction (RPI) is crucial to the life processes of diverse organisms. Researchers have traditionally identified RPIs through time-consuming and costly biological experiments. Although numerous machine learning and deep learning-based methods for predicting RPI currently exist, their robustness and generalizability have significant room for improvement. To address these issues, this study proposes LPI-MFF, an RPI prediction model based on multi-source information fusion. LPI-MFF employed protein-protein interaction features, sequence features, secondary structure features, and physical and chemical properties as information sources, each with a corresponding coding scheme, followed by the random forest algorithm for feature screening. Finally, all information was combined and a classification method based on convolutional neural networks was used. Fivefold cross-validation showed that the accuracy of LPI-MFF on RPI1807 and NPInter was 97.60% and 97.67%, respectively. In addition, the accuracy on the independent test set RPI1168 was 84.9%, and the accuracy on the Mus musculus dataset was 90.91%. Accordingly, LPI-MFF demonstrated greater robustness and generalization than other prevalent RPI prediction methods.
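
The fivefold cross-validation mentioned in this abstract can be sketched as an index-splitting routine. This is a generic Python illustration and not the authors' evaluation code; the function name `k_fold_indices` and the sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and return k (train, test) index splits.

    Each sample appears in exactly one test fold, and fold sizes differ
    by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


# Toy example: 23 RNA-protein pairs split into five folds.
splits = k_fold_indices(23, k=5)
for train, test in splits:
    print(len(train), len(test))
```

A model is trained on each `train` split and scored on the matching `test` split, and the reported accuracy is typically the mean over the five folds. The independent test sets named in the abstract are held out of this procedure entirely.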


Subjects
Deep Learning , RNA, Long Noncoding , Animals , Mice , RNA, Long Noncoding/chemistry , Random Forest , Neural Networks, Computer , Machine Learning , Computational Biology/methods