Results 1 - 14 of 14
1.
Int J Biol Macromol ; : 134317, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39094861

ABSTRACT

Plant vacuoles play a crucial role in maintaining cellular stability, adapting to environmental changes, and responding to external pressures. The accurate identification of plant vacuolar proteins (PVPs) is crucial for understanding the biosynthetic mechanisms of intracellular vacuoles and the adaptive mechanisms of plants. To identify vacuole proteins more accurately, this study developed a new predictive model, PEL-PVP, based on ESM-2. The study demonstrates the feasibility and effectiveness of using advanced pre-trained models and fine-tuning techniques for bioinformatics tasks, providing new methods and ideas for plant vacuolar protein research. In addition, previous datasets for vacuolar proteins were balanced, whereas imbalance is closer to the actual situation. Therefore, this study constructed an imbalanced dataset, UB-PVP, from the UniProt database, helping the model better adapt to the complexity and uncertainty of real environments and thereby improving its generalization ability and practicality. The experimental results show that, compared with existing recognition techniques, PEL-PVP achieves significant improvements in multiple indicators, with gains of 6.08 %, 13.51 %, 11.9 %, and 5 % in ACC, SP, MCC, and AUC, respectively. The accuracy reaches 94.59 %, significantly higher than that of the previous best model, GraphIdn. This provides an efficient and precise tool for the study of plant vacuole proteins.
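
The abstract reports ACC, SP, and MCC on an imbalanced dataset. As a minimal sketch of how these three indicators follow from a binary confusion matrix, the Python below uses the standard definitions; the function name and the counts are hypothetical illustrations and are not taken from the paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, specificity, MCC) for a binary confusion matrix."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    # Specificity: fraction of true negatives among all actual negatives.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sp, mcc

# Hypothetical imbalanced test set: 10 positives, 90 negatives.
acc, sp, mcc = binary_metrics(tp=8, fp=5, tn=85, fn=2)

# A classifier that always predicts "negative" on the same data.
base_acc, base_sp, base_mcc = binary_metrics(tp=0, fp=0, tn=90, fn=10)
```

The always-negative baseline still reaches 90 % accuracy but has an MCC of 0, which is one reason imbalanced-data studies tend to report MCC alongside accuracy.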

2.
BMC Med Inform Decis Mak ; 24(1): 172, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898499

ABSTRACT

Hematoma expansion (HE) is a high-risk symptom with a high rate of occurrence in patients who have suffered spontaneous intracerebral hemorrhage (ICH) after a major accident or illness. Correctly predicting HE in advance is critical for helping doctors determine the next step of medical treatment. Most existing studies focus only on HE occurring within 6 h after ICH, while in reality a considerable number of patients develop HE after the first 6 h but within 24 h. In this study, following the recommendation of medical doctors, we focus on predicting the occurrence of HE within 24 h, as well as at every 6 h within 24 h. Based on demographics and information extracted from computed tomography (CT) images, we used the XGBoost method to predict the occurrence of HE within 24 h. To address the highly imbalanced data set, a frequent case in medical data analysis, we used the SMOTE algorithm for data augmentation. To evaluate our method, we used a data set consisting of 582 patient records and compared the results of the proposed method with those of several other machine learning methods. Our experiments show that XGBoost achieved the best prediction performance on the balanced dataset processed by the SMOTE algorithm, with an accuracy of 0.82 and an F1-score of 0.82. Moreover, our proposed method predicts the occurrence of HE within 6, 12, 18 and 24 h with accuracies of 0.89, 0.82, 0.87 and 0.94, respectively, indicating that HE occurrence within 24 h can be predicted accurately by the proposed method.
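
The core idea of SMOTE, as used above, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea under simplifying assumptions (Euclidean distance, small in-memory data); the function name, parameters, and example points are hypothetical, and this is not the implementation used in the study.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical 2-D minority class (illustrative values only).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.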


Subjects
Algorithms; Cerebral Hemorrhage; Hematoma; Humans; Cerebral Hemorrhage/diagnostic imaging; Hematoma/diagnostic imaging; Tomography, X-Ray Computed; Male; Machine Learning; Aged; Middle Aged; Female
3.
Bioresour Technol ; 402: 130776, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38701979

ABSTRACT

Insights into the key properties of biochar with a fast adsorption rate and high adsorption capacity are urgently needed to design biochar as an adsorbent for pollution emergency treatment. Machine learning (ML) incorporating classical theoretical adsorption models was applied to build prediction models for the adsorption kinetics rate (i.e., K) and maximum adsorption capacity (i.e., Qm) of emerging contaminants (ECs) on biochar. Results demonstrated that the prediction performance of the adaptive boosting algorithm improved significantly after data preprocessing (i.e., log-transformation) in the small unbalanced datasets, with R² of 0.865 and 0.874 for K and Qm, respectively. The surface chemistry, primarily driven by the ash content of biochar, significantly influenced K, while the surface porous structure of biochar played a dominant role in predicting Qm. An interactive platform was deployed for relevant scientists to predict K and Qm of new biochar for ECs. The research provides practical references for the future design of engineered biochar for EC removal.


Subjects
Charcoal; Machine Learning; Charcoal/chemistry; Adsorption; Kinetics; Models, Theoretical; Water Pollutants, Chemical
4.
Comput Biol Med ; 175: 108472, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38663349

ABSTRACT

With the rapid development of artificial intelligence, automated endoscopy-assisted diagnostic systems have become an effective tool for reducing diagnostic costs and shortening the treatment cycle of patients. Typically, the performance of these systems depends on deep learning models that are pre-trained with large-scale labeled data, for example, for early gastric cancer based on endoscopic images. However, expensive annotation and the subjectivity of annotators lead to insufficient and class-imbalanced endoscopic image datasets, which are detrimental to the training of deep learning models. Therefore, we proposed a Swin Transformer encoder-based StyleGAN (STE-StyleGAN) for unbalanced endoscopic image enhancement, which is composed of an adversarial learning encoder and generator. Firstly, a pre-trained Swin Transformer is introduced into the encoder to extract multi-scale features layer by layer from endoscopic images. The features are subsequently fed into a mapping block for aggregation and recombination. Secondly, a self-attention mechanism is applied to the generator, which adds detailed image information layer by layer through recoded features, enabling the generator to autonomously learn the coupling between different image regions. Finally, we conducted extensive experiments on a private intestinal metaplasia grading dataset from a Grade-A tertiary hospital. The experimental results show that the images generated by STE-StyleGAN are closer to the initial image distribution, achieving a Fréchet Inception Distance (FID) value of 100.4. These generated images were then used to augment the initial dataset and improve the robustness of the classification model, which achieved a top accuracy of 86 %.


Subjects
Deep Learning; Humans; Stomach Neoplasms/diagnostic imaging; Stomach Neoplasms/pathology; Image Enhancement/methods; Endoscopy/methods; Image Interpretation, Computer-Assisted/methods; Image Processing, Computer-Assisted/methods
5.
Stud Health Technol Inform ; 310: 604-608, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269880

ABSTRACT

With the growing use of machine learning (ML)-enabled medical devices by clinicians and consumers, safety events involving these systems are emerging. Current analysis of safety events relies heavily on retrospective review by experts, which is time-consuming and not cost-effective. This study develops automated text classifiers and evaluates their potential to identify rare ML safety events from the US FDA's MAUDE. Four stratified classifiers were evaluated using a real-world data distribution with different feature sets: report text; text and device brand name; text and generic device type; and all information combined. We found that stratified classifiers using the generic type of devices were the most effective technique when tested on both stratified (F1-score=85%) and external datasets (precision=100%). All true positives on the external dataset were consistently identified by the three stratified classifiers, indicating that their ensemble results can be used directly to monitor ML events reported to MAUDE.


Subjects
Drugs, Generic; Machine Learning
6.
J Mol Graph Model ; 126: 108627, 2024 01.
Article in English | MEDLINE | ID: mdl-37801808

ABSTRACT

This research investigates the application of Graph Neural Networks (GNNs) to enhance the cost-effectiveness of drug development, addressing the limitations of cost and time. Class imbalances within classification datasets, such as the discrepancy between active and inactive compounds, give rise to difficulties that can be addressed through strategies such as oversampling, undersampling, and manipulation of the loss function. A comparison is conducted between three distinct datasets using three different GNN architectures. This benchmarking research can steer future investigations and enhance the efficacy of GNNs in drug discovery and design. Three hundred models for each combination of architecture and dataset were trained using hyperparameter tuning techniques and evaluated using a range of metrics. Notably, the oversampling technique performs best in eight experiments, showcasing its potential. While balancing techniques improve models trained on imbalanced datasets, their efficacy depends on dataset specifics and problem type. Although oversampling aids molecular graph datasets, more research is needed to optimize its usage and explore other class imbalance solutions.
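
One common form of "manipulation of the loss function" is to weight the minority-class term of the cross-entropy more heavily. The abstract does not say which weighting scheme the authors used, so the sketch below is only an assumption-laden illustration: the function name, the inverse-frequency weight, and the toy predictions are all hypothetical.

```python
import math

def weighted_bce(y_true, p_pred, pos_weight):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical batch: 1 active compound, 9 inactive compounds.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p_pred = [0.2] + [0.1] * 9  # the model under-predicts the single active

# Assumed weighting: ratio of negatives to positives.
pos_weight = y_true.count(0) / y_true.count(1)

plain = weighted_bce(y_true, p_pred, pos_weight=1.0)
weighted = weighted_bce(y_true, p_pred, pos_weight=pos_weight)
```

With the weight applied, the missed active compound dominates the loss, so gradient updates are pushed toward the minority class rather than toward the easy majority.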


Subjects
Drug Development; Drug Discovery; Hydrolases; Neural Networks, Computer
7.
Artif Intell Med ; 136: 102477, 2023 02.
Article in English | MEDLINE | ID: mdl-36710064

ABSTRACT

Anemia is a condition in which the oxygen-carrying capacity of red blood cells is insufficient to meet the body's physiological needs. It affects billions of people worldwide. An early diagnosis of this disease could prevent the advancement of other disorders. Traditional methods used to detect anemia consist of venipuncture, which requires a patient to frequently undergo laboratory tests. Therefore, anemia diagnosis using noninvasive and cost-effective methods is an open challenge. The pallor of the fingertips, palms, nail beds, and eye conjunctiva can be observed to establish whether a patient suffers from anemia. This article addresses the above challenges by presenting a novel intelligent system, based on machine learning, that supports the automated diagnosis of anemia. This system is innovative from different points of view. Specifically, it has been trained on a dataset that contains eye conjunctiva photos of Indian and Italian patients. This dataset, which was created using a very strict experimental set, is now made available to the Scientific Community. Moreover, compared to previous systems in the literature, the proposed system uses a low-cost device, which makes it suitable for widespread use. The performance of the learning algorithms utilizing two different areas of the mucous membrane of the eye is discussed. In particular, the RUSBoost algorithm, when appropriately trained on palpebral conjunctiva images, shows good performance in classifying anemic and nonanemic patients. The results are very robust, even when considering different ethnicities.
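
RUSBoost combines boosting with random under-sampling of the majority class. The sketch below shows only the under-sampling step, in plain Python; the function name, the labels, and the class sizes are hypothetical, and the boosting loop that would wrap this step is omitted.

```python
import random

def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class shrinks to the size of the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = min(len(items) for items in by_class.values())
    balanced = []
    for y, items in by_class.items():
        # Sample without replacement from each class.
        for x in rng.sample(items, target):
            balanced.append((x, y))
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 4 anemic and 16 nonanemic image identifiers.
samples = list(range(20))
labels = ["anemic"] * 4 + ["nonanemic"] * 16
balanced = random_undersample(samples, labels)
```

Drawing a fresh under-sample for each weak learner lets an ensemble see different parts of the majority class over time, which limits the information lost by any single draw.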


Subjects
Anemia; Humans; Anemia/diagnosis; Conjunctiva; Pallor/diagnosis; Algorithms
8.
Int J Inf Technol ; 15(1): 325-333, 2023.
Article in English | MEDLINE | ID: mdl-35757149

ABSTRACT

Credit card fraud is a growing problem, and it escalated during COVID-19 because authorities in many countries required people to use cashless transactions. Every year, billions of euros are lost to fraudulent credit card transactions; fraud detection systems are therefore essential for financial institutions. Because the classes are not equally represented in the credit card dataset, machine learning algorithms fit the model to the majority class, which leads to inaccurate fraud predictions. In this research, we therefore focus on processing unbalanced data with an under-sampling technique to obtain more accurate results with different machine learning algorithms. We propose a framework that is based on clustering the dataset using fuzzy C-means and selecting similar fraud and normal instances that have the same features, which guarantees the integrity between the data features.
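
Fuzzy C-means assigns each instance a degree of membership in every cluster rather than a single hard label. As a rough sketch of the standard membership formula, u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)), the Python below computes memberships for one point given fixed cluster centers; the function name, centers, and point are hypothetical, and the center-update step and the paper's instance-selection rule are not shown.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-means membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) ** 0.5 for c in centers]
    # A point that coincides with a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical cluster centers and transaction feature vector.
centers = [[0.0, 0.0], [4.0, 0.0]]
u = fcm_memberships([1.0, 0.0], centers)
```

The memberships always sum to 1, so they can be used to pick majority-class instances that sit close to the same cluster as the fraud instances when under-sampling.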

9.
Front Pharmacol ; 13: 786710, 2022.
Article in English | MEDLINE | ID: mdl-35401179

ABSTRACT

A timely diagnosis is a key challenge for many rare diseases. As an expanding group of rare and severe monogenic disorders with a broad spectrum of clinical manifestations, ciliopathies, notably renal ciliopathies, suffer from significant underdiagnosis. Our objective is to develop an approach for screening large-scale clinical data warehouses and detecting patients with clinical manifestations similar to those of diagnosed ciliopathy patients. We expect that the top-ranked similar patients will benefit from genetic testing for an early diagnosis. The dependence and relatedness between phenotypes were taken into account in our similarity model through medical concept embedding. The relevance of each phenotype to each patient was also considered by adjusted aggregation of phenotype similarity into patient similarity. A ranking model based on the best-subtype-average similarity was proposed to address the phenotypic overlap and heterogeneity of ciliopathies. Our results showed that, using less than one-tenth of the learning sources, our language- and center-specific embedding provided comparable or better performance than other existing medical concept embeddings. Combined with the best-subtype-average ranking model, our patient-patient similarity-based screening approach was shown to be effective in two large-scale unbalanced datasets containing approximately 10,000 and 60,000 controls with kidney manifestations in the clinical data warehouse (prevalence of about 2% and 0.4%, respectively). Our approach will offer the opportunity to identify candidate patients who could undergo genetic testing for ciliopathy. Earlier diagnosis, before irreversible end-stage kidney disease, will enable these patients to benefit from appropriate follow-up and novel treatments that could alleviate kidney dysfunction.

10.
Sensors (Basel) ; 22(5)2022 Feb 26.
Article in English | MEDLINE | ID: mdl-35270994

ABSTRACT

In this paper, we addressed the problem of dataset scarcity for the task of network intrusion detection. Our main contribution was to develop a framework that provides a complete process for generating network traffic datasets based on the aggregation of real network traces. In addition, we proposed a set of tools for attribute extraction and labeling of traffic sessions. A new dataset with botnet network traffic was generated by the framework to assess our proposed method with machine learning algorithms suitable for unbalanced data. The performance of the classifiers was evaluated in terms of the macro-averaged F1-score (0.97) and the Matthews Correlation Coefficient (0.94), showing good overall performance.
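
A macro-averaged F1-score gives every class the same weight regardless of how many samples it has, which is why it suits unbalanced traffic data. The sketch below computes it from scratch in Python; the function name, class labels, and predictions are hypothetical and unrelated to the dataset in the abstract.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); 0 when the class is never seen or predicted.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical sessions: 8 normal, 2 botnet; one botnet session is missed.
y_true = ["normal"] * 8 + ["botnet"] * 2
y_pred = ["normal"] * 8 + ["normal", "botnet"]

score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case accuracy is 0.9 while the macro F1 is lower, because the missed botnet session halves the recall of the rare class.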


Subjects
Algorithms; Machine Learning; Research Design
11.
Sensors (Basel) ; 21(16)2021 Aug 15.
Article in English | MEDLINE | ID: mdl-34450936

ABSTRACT

Rolling mill multi-row bearings are subjected to axial loads, which damage the rolling elements and cages, so the axial vibration signal contains rich fault characteristic information. The vertical shock caused by the failure is weakened because multiple rows of bearings share the radial forces. Considering the special characteristics of rolling mill bearing vibration signals, a fault diagnosis method combining Adaptive Multivariate Variational Mode Decomposition (AMVMD) and a Multi-channel One-dimensional Convolution Neural Network (MC1DCNN) is proposed to improve diagnosis accuracy. Additionally, a Deep Convolutional Generative Adversarial Network (DCGAN) is embedded in the model to address the scarcity of fault data. The DCGAN generates AMVMD reconstruction data to supplement the unbalanced dataset, and the MC1DCNN model is trained on this dataset to diagnose the real data. The proposed method is compared with a variety of diagnostic models, and the experimental results show that it effectively improves the diagnosis accuracy for rolling mill multi-row bearings under unbalanced dataset conditions. It offers important guidance for the current problems of insufficient data and low diagnosis accuracy in the fault diagnosis of multi-row bearings such as those in rolling mills.

12.
Front Cell Dev Biol ; 8: 591487, 2020.
Article in English | MEDLINE | ID: mdl-33195258

ABSTRACT

Excessive oxidative stress responses can threaten our health, and thus it is essential to produce antioxidant proteins to regulate the body's oxidative responses. The low number of antioxidant proteins makes it difficult to extract their representative features. Our method did not use structural information but instead studied antioxidant proteins from a sequence perspective while focusing on the impact of data imbalance on sensitivity, thus greatly improving the model's sensitivity for antioxidant protein recognition. We developed a method based on the Composition of k-spaced Amino Acid Pairs (CKSAAP) and the Conjoint Triad (CT) features derived from the amino acid composition and protein-protein interactions. SMOTE and the Max-Relevance-Max-Distance algorithm (MRMD) were used to balance the training data and select the optimal feature subset, respectively. Evaluation used 10-fold cross-validation and a random forest algorithm for classification according to the selected feature subset. The sensitivity was 0.792, the specificity was 0.808, and the average accuracy was 0.8.
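
CKSAAP encodes a protein as the frequencies of ordered amino acid pairs separated by k intervening residues, giving 400 features per value of k. The sketch below shows one plausible way to compute it for a single k in Python; the function name and the example sequence are hypothetical, and the normalization by the number of windows is a common convention assumed here rather than a detail confirmed by the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def cksaap(sequence, k):
    """Frequency of each ordered amino acid pair separated by k residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    n_windows = len(sequence) - k - 1
    for i in range(n_windows):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[p] / n_windows if n_windows > 0 else 0.0 for p in pairs]

# Hypothetical 10-residue fragment, pairs with one residue in between (k = 1).
features = cksaap("MKVLAAGKVL", k=1)
```

Concatenating the vectors for several values of k (for example k = 0 to 5) yields the full CKSAAP descriptor, which is then a natural input for feature selection.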

13.
Front Genet ; 11: 820, 2020.
Article in English | MEDLINE | ID: mdl-33133122

ABSTRACT

Orphan genes are associated with regulatory patterns, but experimental methods for identifying orphan genes are both time-consuming and expensive. Designing an accurate and robust classification model to detect orphan and non-orphan genes in datasets with unbalanced distributions poses a particularly great challenge. Synthetic minority over-sampling (SMOTE) algorithms were selected in a preliminary step to deal with unbalanced gene datasets. To identify orphan genes in balanced and unbalanced Arabidopsis thaliana gene datasets, SMOTE algorithms were then combined with traditional and advanced ensemble classification algorithms, respectively, using Support Vector Machine, Random Forest (RF), AdaBoost (adaptive boosting), GBDT (gradient boosting decision tree), and XGBoost (extreme gradient boosting). After comparing the performance of these ensemble models, SMOTE algorithms with XGBoost achieved an F1 score of 0.94 with the balanced A. thaliana gene datasets, but a lower score with the unbalanced datasets. The proposed ensemble method combines different data-balancing algorithms, including Borderline SMOTE (BSMOTE), Adaptive Synthetic Sampling (ADASYN), SMOTE-Tomek, and SMOTE-ENN, with the XGBoost model separately. The SMOTE-ENN-XGBoost model, which combines over-sampling and under-sampling algorithms with XGBoost, achieved higher predictive accuracy than the other balancing algorithms combined with XGBoost. Thus, SMOTE-ENN-XGBoost provides a theoretical basis for developing evaluation criteria for identifying orphan genes in unbalanced biological datasets.

14.
J Digit Imaging ; 33(3): 685-696, 2020 06.
Article in English | MEDLINE | ID: mdl-32144499

ABSTRACT

This study explores an automatic diagnosis method to predict unnecessary nodule biopsy from a small, unbalanced, and pathologically proven database. The automatic diagnosis method is based on a convolutional neural network (CNN) model. Because of the small and unbalanced samples, the presented method aims to improve the transfer learning capability via the VGG16 architecture and to optimize the related transfer learning parameters. For comparison, a traditional machine learning method is implemented, which extracts texture features and classifies them with a support vector machine (SVM). The database includes 68 biopsied nodules, of which 16 are pathologically proven benign and the remaining 52 are malignant. To take the volumetric data into account in the CNN model, one image slice is selected randomly from each nodule volume, and the selection is repeated until all image slices of each nodule have been used. Leave-one-out and 10-fold cross-validation are applied to train and test on the 68 randomly selected image slices (one image slice per nodule) in each experiment. The averages over all experimental outcomes are the final results. The experiments revealed that features from both medical and natural images share the similarity of focusing on simpler and less abstract objects, leading to the conclusion that transferring more convolutional layers does not necessarily yield better classification results. Transfer learning from other, larger datasets can supply additional information to small and unbalanced datasets to improve classification performance. The presented method has shown the potential to adapt a CNN architecture to improve the prediction of unnecessary nodule biopsy from a small, unbalanced, and pathologically proven volumetric dataset.
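
Leave-one-out cross-validation, mentioned above, holds out a single sample for testing and trains on all the others, repeating this once per sample. The Python sketch below generates those splits for 68 items, matching the nodule count in the abstract; the function name and the ordering of the label list are hypothetical, and no model training is shown.

```python
def leave_one_out(n_samples):
    """Yield (train_indices, test_index) pairs, holding out one sample at a time."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out

# 68 nodules with the 16 benign / 52 malignant split stated in the abstract
# (0 = benign, 1 = malignant; the ordering is arbitrary).
labels = [0] * 16 + [1] * 52
splits = list(leave_one_out(len(labels)))
```

With so few samples, leave-one-out uses nearly all the data for training in each round, at the cost of fitting one model per sample.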


Subjects
Lung Neoplasms; Solitary Pulmonary Nodule; Biopsy; Humans; Machine Learning; Tomography, X-Ray Computed