Results 1 - 20 of 208
1.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38366802

ABSTRACT

Anti-coronavirus peptides (ACVPs) represent a relatively novel approach to inhibiting the adsorption and fusion of the virus with human cells. Several peptide-based inhibitors have shown promise as potential therapeutic drug candidates. However, identifying such peptides in laboratory experiments is both costly and time-consuming. Therefore, there is growing interest in using computational methods to predict ACVPs. Here, we describe a model for the prediction of ACVPs that is based on the combination of feature engineering (FE) optimization and deep representation learning. FEOpti-ACVP was pre-trained using two feature extraction frameworks. In the next step, several machine learning approaches were tested to construct the final algorithm. The final version of FEOpti-ACVP outperformed existing methods used for ACVP prediction and has the potential to become a valuable tool in ACVP drug design. A user-friendly webserver of FEOpti-ACVP can be accessed at http://servers.aibiochem.net/soft/FEOpti-ACVP/.


Subjects
Algorithms , Peptides , Humans , Amino Acid Sequence , Peptides/pharmacology , Machine Learning
2.
Breast Cancer Res ; 26(1): 12, 2024 01 18.
Article in English | MEDLINE | ID: mdl-38238771

ABSTRACT

BACKGROUND: Pathological complete response (pCR) is associated with favorable prognosis in patients with triple-negative breast cancer (TNBC). However, only 30-40% of TNBC patients treated with neoadjuvant chemotherapy (NAC) show pCR, while the remaining 60-70% show residual disease (RD). The role of the tumor microenvironment in NAC response in patients with TNBC remains unclear. In this study, we developed a machine learning-based two-step pipeline to distinguish between various histological components in hematoxylin and eosin (H&E)-stained whole slide images (WSIs) of TNBC tissue biopsies and to identify histological features that can predict NAC response. METHODS: H&E-stained WSIs of treatment-naïve biopsies from 85 patients (51 with pCR and 34 with RD) of the model development cohort and 79 patients (41 with pCR and 38 with RD) of the validation cohort were separated through a stratified eightfold cross-validation strategy for the first step and leave-one-out cross-validation strategy for the second step. A tile-level histology label prediction pipeline and four machine-learning classifiers were used to analyze 468,043 tiles of WSIs. The best-trained classifier used 55 texture features from each tile to produce a probability profile during testing. The predicted histology classes were used to generate a histology classification map of the spatial distributions of different tissue regions. A patient-level NAC response prediction pipeline was trained with features derived from paired histology classification maps. The top graph-based features capturing the relevant spatial information across the different histological classes were provided to the radial basis function kernel support vector machine (rbfSVM) classifier for NAC treatment response prediction. RESULTS: The tile-level prediction pipeline achieved 86.72% accuracy for histology class classification, while the patient-level pipeline achieved 83.53% NAC response (pCR vs. RD) prediction accuracy of the model development cohort. The model was validated with an independent cohort with tile histology validation accuracy of 83.59% and NAC prediction accuracy of 81.01%. The histological class pairs with the strongest NAC response predictive ability were tumor and tumor tumor-infiltrating lymphocytes for pCR and microvessel density and polyploid giant cancer cells for RD. CONCLUSION: Our machine learning pipeline can robustly identify clinically relevant histological classes that predict NAC response in TNBC patients and may help guide patient selection for NAC treatment.


Subjects
Breast Neoplasms , Triple Negative Breast Neoplasms , Humans , Female , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/pathology , Neoadjuvant Therapy/methods , Prognosis , Machine Learning , Tumor Microenvironment
3.
Brief Bioinform ; 23(2), 2022 03 10.
Article in English | MEDLINE | ID: mdl-35176756

ABSTRACT

Protein secretion has a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. Among these, the non-classical secretion pathway has recently received increasing attention, but its exact mechanism remains unclear. Non-classical secreted proteins (NCSPs) are a class of secreted proteins lacking signal peptides and motifs. Several NCSP predictors have been proposed to identify NCSPs, and most of them employed the whole amino acid sequence of NCSPs to construct the model. However, the sequence length of different proteins varies greatly. In addition, not all regions of the protein are equally important and some local regions are not relevant to the secretion. The functional regions of the protein, particularly in the N- and C-terminal regions, contain important determinants for secretion. In this study, we propose a new hybrid deep learning-based framework, referred to as ASPIRER, which improves the prediction of NCSPs from amino acid sequences. More specifically, it combines a whole sequence-based XGBoost model and an N-terminal sequence-based convolutional neural network model; 5-fold cross-validation and independent tests demonstrate that ASPIRER achieves superior performance compared with existing state-of-the-art approaches. The source code and curated datasets of ASPIRER are publicly available at https://github.com/yanwu20/ASPIRER/. ASPIRER is anticipated to be a useful tool for improved prediction of novel putative NCSPs from sequence information and prioritization of candidate proteins for follow-up experimental validation.


Subjects
Deep Learning , Amino Acid Sequence , Computational Biology , Neural Networks, Computer , Proteins/chemistry , Software
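
The ASPIRER abstract above describes one model that sees the whole sequence and another that sees only the N-terminal region. The following is a minimal sketch of how the two inputs could be prepared. It is not taken from the ASPIRER source code: the use of amino acid composition as the whole-sequence descriptor, the function names, and the default window length of 50 residues are all assumptions made for illustration.

```python
# Hypothetical input preparation for a hybrid whole-sequence / N-terminal model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fixed-length whole-sequence descriptor: fraction of each residue.

    Assumption: composition is used here only as a simple example of a
    length-independent feature vector.
    """
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def one_hot_n_terminal(seq, length=50):
    """One-hot encode the first `length` residues, zero-padding short sequences.

    Returns a list of `length` rows, each with 20 entries, which is the
    kind of fixed-size matrix a convolutional network can consume.
    """
    window = seq.upper()[:length]
    rows = []
    for i in range(length):
        row = [0] * len(AMINO_ACIDS)
        if i < len(window):
            idx = AMINO_ACIDS.find(window[i])
            if idx >= 0:  # non-standard residues stay all-zero
                row[idx] = 1
        rows.append(row)
    return rows
```

Both functions return fixed-size outputs regardless of protein length, which addresses the length-variation problem the abstract mentions.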
4.
Biotechnol Bioeng ; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39044472

ABSTRACT

In the burgeoning field of proteins, the effective analysis of intricate protein data remains a formidable challenge, necessitating advanced computational tools for data processing, feature extraction, and interpretation. This study introduces ProteinFlow, an innovative framework designed to revolutionize feature engineering in protein data analysis. ProteinFlow stands out by offering enhanced efficiency in data collection and preprocessing, along with advanced capabilities in feature extraction, directly addressing the complexities inherent in multidimensional protein data sets. Through a comparative analysis, ProteinFlow demonstrated a significant improvement over traditional methods, notably reducing data preprocessing time and expanding the scope of biologically significant features identified. The framework's parallel data processing strategy and advanced algorithms ensure not only rapid data handling but also the extraction of comprehensive, meaningful insights from protein sequences, structures, and interactions. Furthermore, ProteinFlow exhibits remarkable scalability, adeptly managing large-scale data sets without compromising performance, a crucial attribute in the era of big data.

5.
J Chem Inf Model ; 64(5): 1456-1472, 2024 03 11.
Article in English | MEDLINE | ID: mdl-38385768

ABSTRACT

Developing new drugs is expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods that use sequence and structure to study protein-ligand interactions are examined. The paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are discussed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.


Subjects
Machine Learning , Proteins , Ligands , Proteins/chemistry
6.
Crit Care ; 28(1): 180, 2024 05 28.
Article in English | MEDLINE | ID: mdl-38802973

ABSTRACT

BACKGROUND: Sepsis, an acute and potentially fatal systemic response to infection, significantly impacts global health by affecting millions annually. Prompt identification of sepsis is vital, as treatment delays lead to increased fatalities through progressive organ dysfunction. While recent studies have delved into leveraging Machine Learning (ML) for predicting sepsis, focusing on aspects such as prognosis, diagnosis, and clinical application, there remains a notable deficiency in the discourse regarding feature engineering. Specifically, the role of feature selection and extraction in enhancing model accuracy has been underexplored. OBJECTIVES: This scoping review aims to fulfill two primary objectives: to identify pivotal features for predicting sepsis across a variety of ML models, providing valuable insights for future model development, and to assess model efficacy through performance metrics including AUROC, sensitivity, and specificity. RESULTS: The analysis included 29 studies across diverse clinical settings such as Intensive Care Units (ICU), Emergency Departments, and others, encompassing 1,147,202 patients. The review highlighted the diversity in prediction strategies and timeframes. It was found that feature extraction techniques notably outperformed others in terms of sensitivity and AUROC values, thus indicating their critical role in improving sepsis prediction models. CONCLUSION: Key dynamic indicators, including vital signs and critical laboratory values, are instrumental in the early detection of sepsis. Applying feature selection methods significantly boosts model precision, with models like Random Forest and XGBoost showing promising results. Furthermore, Deep Learning (DL) models reveal unique insights, spotlighting the pivotal role of feature engineering in sepsis prediction, which could greatly benefit clinical practice.


Subjects
Machine Learning , Sepsis , Humans , Sepsis/diagnosis , Sepsis/therapy , Machine Learning/trends , Machine Learning/standards
7.
BMC Med Inform Decis Mak ; 24(1): 152, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38831432

ABSTRACT

BACKGROUND: Machine learning (ML) has emerged as the predominant computational paradigm for analyzing large-scale datasets across diverse domains. The assessment of dataset quality stands as a pivotal precursor to the successful deployment of ML models. In this study, we introduce DREAMER (Data REAdiness for MachinE learning Research), an algorithmic framework leveraging supervised and unsupervised machine learning techniques to autonomously evaluate the suitability of tabular datasets for ML model development. DREAMER is openly accessible as a tool on GitHub and Docker, facilitating its adoption and further refinement within the research community. RESULTS: The proposed model in this study was applied to three distinct tabular datasets, resulting in notable enhancements in their quality with respect to readiness for ML tasks, as assessed through established data quality metrics. Our findings demonstrate the efficacy of the framework in substantially augmenting the original dataset quality, achieved through the elimination of extraneous features and rows. This refinement yielded improved accuracy across both supervised and unsupervised learning methodologies. CONCLUSION: Our software presents an automated framework for data readiness, aimed at enhancing the integrity of raw datasets to facilitate robust utilization within ML pipelines. Through our proposed framework, we streamline the original dataset, resulting in enhanced accuracy and efficiency within the associated ML algorithms.


Subjects
Machine Learning , Humans , Datasets as Topic , Unsupervised Machine Learning , Algorithms , Supervised Machine Learning , Software
8.
Article in English | MEDLINE | ID: mdl-39082872

ABSTRACT

Explorative data analysis (EDA) is a critical step in scientific projects, aiming to uncover valuable insights and patterns within data. Traditionally, EDA involves manual inspection, visualization, and various statistical methods. The advent of artificial intelligence (AI) and machine learning (ML) has the potential to improve EDA, offering more sophisticated approaches that enhance its efficacy. This review explores how AI and ML algorithms can improve feature engineering and selection during EDA, leading to more robust predictive models and data-driven decisions. Tree-based models, regularized regression, and clustering algorithms were identified as key techniques. These methods automate feature importance ranking, handle complex interactions, perform feature selection, reveal hidden groupings, and detect anomalies. Real-world applications include risk prediction in total hip arthroplasty and subgroup identification in scoliosis patients. Recent advances in explainable AI and EDA automation show potential for further improvement. The integration of AI and ML into EDA accelerates tasks and uncovers sophisticated insights. However, effective utilization requires a deep understanding of the algorithms, their assumptions, and limitations, along with domain knowledge for proper interpretation. As data continues to grow, AI will play an increasingly pivotal role in EDA when combined with human expertise, driving more informed, data-driven decision-making across various scientific domains. Level of Evidence: Level V - Expert opinion.

9.
Sensors (Basel) ; 24(4), 2024 Feb 18.
Article in English | MEDLINE | ID: mdl-38400466

ABSTRACT

Research in field sports often involves analysis of running performance profiles of players during competitive games with individual, per-position, and time-related descriptive statistics. Data are acquired through wearable technologies, which generally capture simple data points, which in the case of many team-based sports are times, latitudes, and longitudes. While the data capture is simple and in relatively high volumes, the raw data are unsuited to any form of analysis or machine learning functions. The main goal of this research is to develop a multistep feature engineering framework that delivers the transformation of sequential data into feature sets more suited to machine learning applications.


Subjects
Running , Wearable Electronic Devices , Movement , Team Sports , Machine Learning
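
The abstract above notes that raw wearable data consist of times, latitudes, and longitudes, which are unsuited to direct analysis. As a hedged illustration of a first feature engineering step (not the authors' framework), the sketch below turns such samples into distance and speed features using the haversine great-circle formula. The function names, the tuple layout `(t_seconds, lat, lon)`, and the mean Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_features(samples):
    """Summarise a time-sorted list of (t_seconds, lat, lon) samples.

    Returns total distance plus mean and maximum speed over consecutive
    sample pairs; pairs with a non-positive time gap are skipped for speed.
    """
    total = 0.0
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(samples, samples[1:]):
        d = haversine_m(la0, lo0, la1, lo1)
        total += d
        dt = t1 - t0
        if dt > 0:
            speeds.append(d / dt)
    return {
        "total_distance_m": total,
        "mean_speed_mps": sum(speeds) / len(speeds) if speeds else 0.0,
        "max_speed_mps": max(speeds) if speeds else 0.0,
    }
```

Per-player or per-position statistics of the kind the abstract mentions could then be computed by grouping these per-segment features.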
10.
Sensors (Basel) ; 24(7), 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38610506

ABSTRACT

Anonymous networks, which aim primarily to protect user identities, have gained prominence as tools for enhancing network security and anonymity. Nonetheless, these networks have become a platform for adversarial affairs and sources of suspicious attack traffic. To defend against unpredictable adversaries on the Internet, detecting anonymous network traffic has emerged as a necessity. Many supervised approaches to identify anonymous traffic have harnessed machine learning strategies. However, many require access to engineered datasets and complex architectures to extract the desired information. Due to the resistance of anonymous network traffic to traffic analysis and the scarcity of publicly available datasets, those approaches may need to improve their training efficiency and achieve a higher performance when it comes to anonymous traffic detection. This study utilizes feature engineering techniques to extract pattern information and rank the feature importance of the static traces of anonymous traffic. To leverage these pattern attributes effectively, we developed a reinforcement learning framework that encompasses four key components: states, actions, rewards, and state transitions. A lightweight system is devised to classify anonymous and non-anonymous network traffic. Subsequently, two fine-tuned thresholds are proposed to substitute the traditional labels in a binary classification system. The system will identify anonymous network traffic without reliance on labeled data. The experimental results underscore that the system can identify anonymous traffic with an accuracy rate exceeding 80% (when based on pattern information).

11.
Sensors (Basel) ; 24(11), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38894140

ABSTRACT

Nocturnal enuresis (NE) is involuntary bedwetting during sleep, typically appearing in young children. Despite the potential benefits of the long-term home monitoring of NE patients for research and treatment enhancement, this area remains underexplored. To address this, we propose NEcare, an in-home monitoring system that utilizes wearable devices and machine learning techniques. NEcare collects sensor data from an electrocardiogram, body impedance (BI), a three-axis accelerometer, and a three-axis gyroscope to examine bladder volume (BV), heart rate (HR), and periodic limb movements in sleep (PLMS). Additionally, it analyzes the collected NE patient data and supports NE moment estimation using heuristic rules and deep learning techniques. To demonstrate the feasibility of in-home monitoring for NE patients using our wearable system, we used our datasets from 30 in-hospital patients and 4 in-home patients. The results show that NEcare captures expected trends associated with NE occurrences, including BV increase, HR increase, and PLMS appearance. In addition, we studied the machine learning-based NE moment estimation, which could help relieve the burdens of NE patients and their families. Finally, we address the limitations and outline future research directions for the development of wearable systems for NE patients.


Subjects
Nocturnal Enuresis , Wearable Electronic Devices , Humans , Nocturnal Enuresis/physiopathology , Monitoring, Physiologic/instrumentation , Monitoring, Physiologic/methods , Child , Heart Rate/physiology , Machine Learning , Male , Female , Electrocardiography/methods , Sleep/physiology , Monitoring, Ambulatory/instrumentation , Monitoring, Ambulatory/methods
12.
Sensors (Basel) ; 24(15), 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39123855

ABSTRACT

The detection performance of radar is significantly impaired by active jamming and mutual interference from other radars. This paper proposes a radio signal modulation recognition method to accurately recognize these signals, which helps in jamming cancellation decisions. Based on the ensemble learning stacking algorithm improved by meta-feature enhancement, the proposed method adopts random forests, K-nearest neighbors, and Gaussian naive Bayes as the base-learners, with logistic regression serving as the meta-learner. It takes the multi-domain features of signals as input: time-domain features, including fuzzy entropy, slope entropy, and Hjorth parameters; frequency-domain features, including spectral entropy; and fractal-domain features, including fractal dimension. A simulation experiment covering seven common signal types of radar and active jamming was performed for effectiveness validation and performance evaluation. Results showed the proposed method's performance superiority over other classification methods, as well as its ability to meet the requirements of low signal-to-noise ratio and few-shot learning.
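
Among the time-domain features listed in the abstract above, the Hjorth parameters have a compact standard definition: activity is the signal variance, mobility is the square root of the ratio of the variance of the first difference to the variance of the signal, and complexity is the mobility of the first difference divided by the mobility of the signal. The sketch below follows that textbook definition using discrete differences. It is an illustration only, not the paper's implementation, and the function name is hypothetical.

```python
import math
from statistics import pvariance


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) for a 1-D sequence.

    Assumes at least three samples and a non-constant signal, so that the
    variances used as denominators are non-zero.
    """
    d1 = [b - a for a, b in zip(signal, signal[1:])]  # first difference
    d2 = [b - a for a, b in zip(d1, d1[1:])]          # second difference
    var0 = pvariance(signal)
    var1 = pvariance(d1)
    var2 = pvariance(d2)
    activity = var0
    mobility = math.sqrt(var1 / var0)
    complexity = math.sqrt(var2 / var1) / mobility
    return activity, mobility, complexity
```

For a pure sinusoid the complexity is close to 1, and it grows as the waveform departs from a single sine, which is why the parameter is useful for separating signal types.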

13.
J Stroke Cerebrovasc Dis ; 33(6): 107714, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38636829

ABSTRACT

OBJECTIVES: We set out to develop a machine learning model capable of distinguishing patients presenting with ischemic stroke from a healthy cohort of subjects. The model relies on a 3-min resting electroencephalogram (EEG) recording from which features can be computed. MATERIALS AND METHODS: Using a large-scale, retrospective database of EEG recordings and matching clinical reports, we were able to construct a dataset of 1385 healthy subjects and 374 stroke patients. With subjects often producing more than one recording per session, the final dataset consisted of 2401 EEG recordings (63% healthy, 37% stroke). RESULTS: Using a rich set of features encompassing both the spectral and temporal domains, our model yielded an AUC of 0.95, with a sensitivity and specificity of 93% and 86%, respectively. Allowing for multiple recordings per subject in the training set boosted sensitivity by 7%, attributable to a more balanced dataset. CONCLUSIONS: Our work demonstrates strong potential for the use of EEG in conjunction with machine learning methods to distinguish stroke patients from healthy subjects. Our approach provides a solution that is not only timely (3-minute recording time) but also highly precise and accurate (AUC: 0.95).


Subjects
Brain Waves , Databases, Factual , Electroencephalography , Ischemic Stroke , Machine Learning , Predictive Value of Tests , Humans , Retrospective Studies , Male , Female , Middle Aged , Aged , Ischemic Stroke/diagnosis , Ischemic Stroke/physiopathology , Case-Control Studies , Adult , Brain/physiopathology , Signal Processing, Computer-Assisted , Reproducibility of Results , Aged, 80 and over , Diagnosis, Differential , Diagnosis, Computer-Assisted , Time Factors
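
The stroke-detection abstract above reports AUC, sensitivity, and specificity. For readers less familiar with these metrics, the sketch below shows their standard definitions in plain Python. The function names and the convention that label 1 marks the positive (stroke) class are assumptions for illustration, and the pairwise AUC is the simple quadratic-time form, adequate only for small examples.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Labels are 1 (positive) or 0 (negative); both classes must be present.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive receives a higher score than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Sensitivity and specificity depend on the chosen decision threshold, whereas AUC summarises ranking quality across all thresholds, which is why abstracts typically report all three.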
14.
Hum Brain Mapp ; 44(2): 779-789, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36206321

ABSTRACT

Although a large number of case-control statistical and machine learning studies have been conducted to investigate structural brain changes in schizophrenia, how best to measure and characterize structural abnormalities for use in classification algorithms remains an open question. In the current study, a convolutional 3D autoencoder specifically designed for discretized volumes was constructed and trained with segmented brains from 477 healthy individuals. A cohort containing 158 first-episode schizophrenia patients and 166 matched controls was fed into the trained autoencoder to generate auto-encoded morphological patterns. A classifier discriminating schizophrenia patients from healthy controls was built using 80% of the samples in this cohort by automated machine learning and validated on the remaining 20% of the samples, and this classifier was further validated on another independent cohort containing 77 first-episode schizophrenia patients and 58 matched controls acquired at a different resolution. This specially designed autoencoder allowed a satisfactory recovery of the input. With the same feature dimension, the classifier trained with autoencoded features outperformed the classifier trained with conventional morphological features by about 10 percentage points, achieving 73.44% accuracy and 0.8 AUC on the internal validation set and 71.85% accuracy and 0.77 AUC on the external validation set. The use of features automatically learned from the segmented brain can better distinguish schizophrenia patients from healthy controls, but there is still a need for further improvements to establish a clinical diagnostic marker. Nevertheless, even with a limited sample size, the method proposed in the current study sheds light on the application of deep learning in psychiatric disorders.


Subjects
Schizophrenia , Humans , Schizophrenia/diagnostic imaging , Brain/diagnostic imaging , Algorithms , Machine Learning , Magnetic Resonance Imaging/methods
15.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34058752

ABSTRACT

Understanding how a mutation might affect protein stability is of significant importance to protein engineering and for understanding protein evolution and genetic diseases. While a number of computational tools have been developed to predict the effect of missense mutations on protein stability, they are known to exhibit large biases imparted in part by the data used to train and evaluate them. Here, we provide a comprehensive overview of predictive tools, which has provided an evolving insight into the importance and relevance of features that can discern the effects of mutations on protein stability. A diverse selection of these freely available tools was benchmarked using a large mutation-level blind dataset of 1342 experimentally characterised mutations across 130 proteins from ThermoMutDB, a second test dataset encompassing 630 experimentally characterised mutations across 39 proteins from iStable2.0 and a third blind test dataset consisting of 268 mutations in 27 proteins from the newly published ProThermDB. The performance of the methods was further evaluated with respect to the site of mutation, type of mutant residue and by varying the pH and temperature. Additionally, the classification performance was evaluated by classifying the mutations as stabilizing (∆∆G ≥ 0) or destabilizing (∆∆G < 0). The results reveal that the performance of the predictors is affected by the site of mutation and the type of mutant residue. Further, the results show very low performance for pH values 6-8 and temperature higher than 65 for all predictors except iStable2.0 on the S630 dataset. To illustrate how stability and structure change upon single point mutation, we considered four stabilizing, two destabilizing and two stabilizing mutations from two proteins, namely the toxin protein and bovine liver cytochrome. Overall, the results on S268, S630 and S1342 datasets show that the performance of the integrated predictors is better than the mechanistic or individual machine learning predictors. We expect that this paper will provide useful guidance for the design and development of next-generation bioinformatic tools for predicting protein stability changes upon mutations.


Subjects
Computational Biology/methods , Mutation, Missense , Protein Stability , Proteins/chemistry , Proteins/genetics , Software , Algorithms , Databases, Protein , Evolution, Molecular , Machine Learning , Models, Molecular , Protein Conformation , Proteins/metabolism , Reproducibility of Results , Structure-Activity Relationship
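
The benchmark abstract above evaluates classification performance by labelling mutations as stabilizing (∆∆G ≥ 0) or destabilizing (∆∆G < 0). The sketch below applies that stated sign convention and computes how often a predictor's class agrees with the experimental class. The function names and example values are hypothetical; only the threshold at zero comes from the abstract.

```python
def classify_ddg(ddg):
    """Label a mutation using the abstract's convention:
    stabilizing if ddG >= 0, destabilizing if ddG < 0."""
    return "stabilizing" if ddg >= 0 else "destabilizing"


def class_agreement(experimental, predicted):
    """Fraction of mutations whose predicted class matches the
    experimentally derived class (simple classification accuracy)."""
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have equal length")
    matches = sum(
        1
        for e, p in zip(experimental, predicted)
        if classify_ddg(e) == classify_ddg(p)
    )
    return matches / len(experimental)
```

Note that sign conventions for ∆∆G differ between databases, so values pooled from several sources may need their signs harmonised before such a comparison is meaningful.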
16.
Brief Bioinform ; 22(5), 2021 09 02.
Article in English | MEDLINE | ID: mdl-33774670

ABSTRACT

Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years due to antimicrobial resistance, which is becoming an emerging global concern. Systematic experimental identification of AMPs faces many difficulties due to the limitations of current methods. Given its significance, more than 30 computational methods have been developed for accurate prediction of AMPs. These approaches show high diversity in their data set size, data quality, core algorithms, feature extraction, feature selection techniques and evaluation strategies. Here, we provide a comprehensive survey on a variety of current approaches for AMP identification and point at the differences between these methods. In addition, we evaluate the predictive performance of the surveyed tools based on an independent test data set containing 1536 AMPs and 1536 non-AMPs. Furthermore, we construct six validation data sets based on six different common AMP databases and compare different computational methods based on these data sets. The results indicate that amPEPpy achieves the best predictive performance and outperforms the other compared methods. As the predictive performances are affected by the different data sets used by different methods, we additionally perform the 5-fold cross-validation test to benchmark different traditional machine learning methods on the same data set. These cross-validation results indicate that random forest, support vector machine and eXtreme Gradient Boosting achieve comparatively better performances than other machine learning methods and are often the algorithms of choice of multiple AMP prediction tools.


Subjects
Algorithms , Computational Biology/methods , Machine Learning , Pore Forming Cytotoxic Proteins/pharmacology , Bacteria/classification , Bacteria/drug effects , Biofilms/drug effects , Biofilms/growth & development , Databases, Factual , Fungi/classification , Fungi/drug effects , Pore Forming Cytotoxic Proteins/classification , Pore Forming Cytotoxic Proteins/metabolism , Support Vector Machine , Viruses/drug effects
17.
J Biomed Inform ; 144: 104445, 2023 08.
Article in English | MEDLINE | ID: mdl-37467835

ABSTRACT

In biomedical literature, cross-sentence texts can usually express rich knowledge, and extracting the interaction relation between entities from cross-sentence texts is of great significance to biomedical research. However, compared with a single sentence, cross-sentence text has a longer sequence length, so research on cross-sentence text information extraction should focus more on learning the context dependency structural information. It is still a challenge to handle global dependencies and structural information of long sequences effectively, and graph-oriented modeling methods have received more and more attention recently. In this paper, we propose a new graph attention network guided by syntactic dependency relationship (SR-GAT) for extracting biomedical relations from cross-sentence text. It allows each node to pay attention to other nodes in its neighborhood, regardless of the sequence length. The attention weight between nodes is given by a syntactic relation graph probability network (SR-GPR), which encodes the syntactic dependency between nodes and guides the graph attention mechanism to learn information about the dependency structure. The learned feature representation retains information about the node-to-node syntactic dependency, and can further discover global dependencies effectively. Experimental results on a publicly available biomedical dataset demonstrate that our method achieves state-of-the-art performance while requiring significantly less computational resources. Specifically, in the "drug-mutation" relation extraction task, our method achieves an accuracy of 93.78% for binary classification and 92.14% for multi-classification. In the "drug-gene-mutation" relation extraction task, our method achieves an accuracy of 93.22% for binary classification and 92.28% for multi-classification. Across all relation extraction tasks, our method improves accuracy by an average of 0.49% compared to the existing best model. Furthermore, our method achieved an accuracy of 69.5% in text classification, surpassing most existing models, demonstrating its robustness in generalization across different domains without additional fine-tuning.


Subjects
Biomedical Research , Language , Information Storage and Retrieval
18.
Appl Microbiol Biotechnol ; 107(17): 5351-5365, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37421474

ABSTRACT

Ectoine is generally produced by the fermentation process of Halomonas elongata DSM 2581 T, which is one of the primary industrial ectoine production techniques. To effectively monitor and control the fermentation process, the important parameters require accurate real-time measurement. However, for ectoine fermentation, three critical parameters (cell optical density, glucose, and product concentration) cannot be measured conveniently in real-time due to time variation, strong coupling, and other constraints. As a result, our work effectively created a series of hybrid models to predict the values of these three parameters incorporating both fermentation kinetics and machine learning approaches. Compared with the traditional machine learning models, our models solve the problem of insufficient data which is common in fermentation. In addition, a simple kinetic modeling is only applicable to specific physical conditions, so different physical conditions require refitting the function, which is tedious to operate. However, our models also overcome this limitation. In this work, we compared different hybrid models based on 5 feature engineering methods, 11 machine-learning approaches, and 2 kinetic models. The best models for predicting three key parameters, respectively, are as follows: CORR-Ensemble (R2: 0.983 ± 0.0, RMSE: 0.086 ± 0.0, MAE: 0.07 ± 0.0), SBE-Ensemble (R2: 0.972 ± 0.0, RMSE: 0.127 ± 0.0, MAE: 0.078 ± 0.0), and SBE-Ensemble (R2:0.98 ± 0.0, RMSE: 0.023 ± 0.001, MAE: 0.018 ± 0.001). To verify the universality and stability of constructed models, we have done an experimental verification, and its results showed that our proposed models have excellent performance. KEY POINTS: • Using the kinetic models for producing simulated data • Through different feature engineering methods for dimension reduction • Creating a series of hybrid models to predict the values of three parameters in the fermentation process of Halomonas elongata DSM 2581 T.


Subjects
Amino Acids, Diamino , Halomonas , Halomonas/genetics , Halomonas/metabolism , Fermentation
19.
BMC Med Inform Decis Mak ; 23(1): 179, 2023 09 11.
Article in English | MEDLINE | ID: mdl-37697312

ABSTRACT

To address the complexity, cost, and adherence problems of current methods for detecting forward head posture (FHP), we conducted an epidemiologic investigation in China that included a comprehensive posture screening of each participant. We introduce a machine learning-based, non-contact method for detecting FHP. The approach uses body landmarks identified in images of participants, then applies a genetic algorithm for feature identification and posture estimation. In our observational data, the Extra Tree Classifier performed best at FHP detection, with an accuracy of 82.4%, a specificity of 85.5%, and a positive predictive value of 90.2%. The model offers a rapid, effective way to identify FHP and shows the potential of combining feature point recognition with genetic algorithms for non-contact posture detection.
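
The accuracy, specificity, and positive predictive value quoted above are all derived from the four cells of a binary confusion matrix. The sketch below shows those standard definitions; the function name `binary_metrics` and the example counts are hypothetical, and the study's actual confusion matrix is not given in the abstract.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, specificity, and PPV from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. Hypothetical helper for illustration only.
    """
    total = tp + fp + tn + fn
    if total == 0 or (tn + fp) == 0 or (tp + fp) == 0:
        raise ValueError("counts do not allow all three metrics to be computed")
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "specificity": tn / (tn + fp),   # share of true negatives among actual negatives
        "ppv": tp / (tp + fp),           # share of true positives among predicted positives
    }


if __name__ == "__main__":
    # Made-up counts, not the study's data.
    print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
```

Note that none of these three metrics is sensitivity (tp / (tp + fn)), which the abstract does not report.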


Subjects
Anatomic Landmarks , East Asian People , Posture , Adolescent , Humans , Asian People , Machine Learning , Posture/physiology , Algorithms
20.
Sensors (Basel) ; 23(11)2023 May 25.
Article in English | MEDLINE | ID: mdl-37299786

ABSTRACT

Network traffic anomaly detection is a key step in identifying and preventing network security threats. This study constructs a new deep-learning-based traffic anomaly detection model, built on new feature-engineering methods, with the goal of significantly improving the efficiency and accuracy of network traffic anomaly detection. The work has two main parts.
1. To construct a more comprehensive dataset, we start from the raw data of the classic traffic anomaly detection dataset UNSW-NB15 and combine the feature extraction standards and feature calculation methods of other classic detection datasets to re-extract and design a feature description set that describes the network traffic status accurately and completely. Using this feature-processing method, we reconstructed the dataset as DNTAD and ran evaluation experiments on it. Experiments with classic machine learning algorithms, such as XGBoost, show that the method does not reduce the training performance of the algorithm and also improves its operational efficiency.
2. We propose a detection model based on LSTM and a recurrent neural network self-attention mechanism to exploit the important time-series information contained in abnormal traffic datasets. The memory mechanism of the LSTM learns the time dependence of traffic features. A self-attention mechanism added on top of the LSTM weights the features at different positions in the sequence, enabling the model to better learn the direct relationships between traffic features. A series of ablation experiments demonstrates the effectiveness of each component of the model.
The experimental results show that the proposed model outperforms the comparison models on the constructed dataset.
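
To illustrate what it means for self-attention to weight the features at different positions in a sequence, here is a minimal sketch of scaled dot-product self-attention over a list of vectors, which could stand in for per-time-step LSTM outputs. It is an assumption-laden simplification: it has no learned query, key, or value projections, and the abstract does not specify the attention variant the authors used. The names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention without learned projections.

    seq: list of T vectors (lists of floats) of equal dimension d,
         e.g. one hidden-state vector per time step (an assumption).
    Returns (outputs, weights): outputs[t] is a weighted average of all
    vectors in seq, and weights[t][j] is how much position t attends to j.
    """
    d = len(seq[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for query in seq:
        # Similarity of this position to every position in the sequence.
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in seq]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all position vectors.
        outputs.append(
            [sum(w[j] * seq[j][i] for j in range(len(seq))) for i in range(d)]
        )
    return outputs, weights


if __name__ == "__main__":
    # Made-up 3-step sequence of 2-dimensional feature vectors.
    outs, wts = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    for row in wts:
        print([round(x, 3) for x in row])
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the inputs; a trained model would additionally learn projection matrices that decide which positions should attend to which.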


Subjects
Algorithms , Engineering , Machine Learning , Neural Networks, Computer , Records