Results 1 - 20 of 325
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38797968

ABSTRACT

A major challenge of precision oncology is the identification and prioritization of suitable treatment options based on molecular biomarkers of the considered tumor. In pursuit of this goal, large cancer cell line panels have successfully been studied to elucidate the relationship between cellular features and treatment response. Due to the high dimensionality of these datasets, machine learning (ML) is commonly used for their analysis. However, choosing a suitable algorithm and set of input features can be challenging. We performed a comprehensive benchmarking of ML methods and dimension reduction (DR) techniques for predicting drug response metrics. Using the Genomics of Drug Sensitivity in Cancer cell line panel, we trained random forests, neural networks, boosting trees and elastic nets for 179 anti-cancer compounds with feature sets derived from nine DR approaches. We compare the results regarding statistical performance, runtime and interpretability. Additionally, we provide strategies for assessing model performance compared with a simple baseline model and measuring the trade-off between models of different complexity. Lastly, we show that complex ML models benefit from using an optimized DR strategy, and that standard models, even when using considerably fewer features, can still be superior in performance.


Subject(s)
Algorithms , Antineoplastic Agents , Benchmarking , Machine Learning , Humans , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Neoplasms/genetics , Neural Networks, Computer , Cell Line, Tumor
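The comparison against a simple baseline mentioned in record 1 can be illustrated with a minimal sketch. It assumes the baseline is a predictor that always outputs the mean training response; the abstract does not say which baseline the authors used, and the function names `mse` and `baseline_skill` are hypothetical.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def baseline_skill(y_train, y_test, y_pred):
    """Return 1 - MSE(model) / MSE(mean baseline).

    Assumption: the baseline always predicts the mean of the training
    responses. A value of 0 means the model is no better than that
    baseline; 1 means perfect predictions.
    """
    baseline_value = mean(y_train)
    baseline_mse = mse(y_test, [baseline_value] * len(y_test))
    return 1.0 - mse(y_test, y_pred) / baseline_mse


# Illustrative numbers only, not taken from the study.
print(baseline_skill([1, 2, 3], [1, 3], [1.5, 2.5]))
```

A score near zero for a complex model would suggest that its extra complexity is not paying off on that compound.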
2.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385873

ABSTRACT

Lysine lactylation (Kla) is a newly discovered posttranslational modification that is involved in important life activities, such as glycolysis-related cell function, macrophage polarization and nervous system regulation, and has received widespread attention due to the Warburg effect in tumor cells. In this work, we first design a natural language processing method to automatically extract the 3D structural features of Kla sites, avoiding potential biases caused by manually designed structural features. Then, we establish two Kla prediction frameworks, the Attention-based feature fusion Kla model (ABFF-Kla) and the Embedding-based feature fusion Kla model (EBFF-Kla), to integrate the sequence features and the structure features based on the attention layer and embedding layer, respectively. The results indicate that ABFF-Kla and EBFF-Kla, which fuse features from protein sequences and spatial structures, have better predictive performance than models that use only sequence features. Our work provides an approach for the automatic extraction of protein structural features, as well as a flexible framework for Kla prediction. The source code and the training data of ABFF-Kla and EBFF-Kla are publicly deposited at: https://github.com/ispotato/Lactylation_model.


Subject(s)
Lysine , Natural Language Processing , Amino Acid Sequence , Protein Domains , Protein Processing, Post-Translational
3.
Methods ; 226: 127-132, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38604414

ABSTRACT

Protein lysine methylation is a particular type of post-translational modification that plays an important role in both histone and non-histone function regulation in proteins. Deregulation caused by lysine methyltransferases has been identified as the cause of several diseases including cancer as well as both mental and developmental disorders. Identifying lysine methylation sites is a critical step in both early diagnosis and drug design. This study proposes a new machine learning method called CNN-Meth for predicting lysine methylation sites using a convolutional neural network (CNN). Our model is trained using evolutionary, structural, and physicochemical-based representations along with binary encoding. Unlike previous studies, instead of extracting handcrafted features, we use a CNN to automatically extract features from different representations of amino acids to avoid information loss. Automated feature extraction from these representations of amino acids as well as a CNN as a classifier have never been used for this problem. Our results demonstrate that CNN-Meth can significantly outperform previous methods for predicting methylation sites. It achieves 96.0%, 85.1%, 96.4%, and 0.65 in terms of Accuracy, Sensitivity, Specificity, and Matthews Correlation Coefficient (MCC), respectively. CNN-Meth and its source code are publicly available at https://github.com/MLBC-lab/CNN-Meth.


Subject(s)
Lysine , Neural Networks, Computer , Lysine/metabolism , Lysine/chemistry , Methylation , Protein Processing, Post-Translational , Machine Learning , Humans , Histone-Lysine N-Methyltransferase/metabolism , Histone-Lysine N-Methyltransferase/genetics , Histone-Lysine N-Methyltransferase/chemistry , Computational Biology/methods
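The four metrics reported in record 3 all derive from a binary confusion matrix. The sketch below shows the standard definitions; the counts in the usage line are made up for illustration and are not taken from the CNN-Meth evaluation, and the function name `classification_metrics` is hypothetical.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        # MCC is conventionally set to 0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=50, fp=10, tn=30, fn=10))
```

Because MCC uses all four cells, it can stay moderate on imbalanced data even when accuracy is high, which is why it is often reported alongside the other three.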
4.
BMC Bioinformatics ; 25(1): 61, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38321434

ABSTRACT

BACKGROUND: The rapid advancement of next-generation sequencing (NGS) machines in terms of speed and affordability has led to the generation of a massive amount of biological data at the expense of data quality as errors become more prevalent. This introduces the need to utilize different approaches to detect and filter errors, and data quality assurance is moved from the hardware space to the software preprocessing stages. RESULTS: We introduce MAC-ErrorReads, a novel Machine learning-Assisted Classifier designed for filtering Erroneous NGS Reads. MAC-ErrorReads transforms the erroneous NGS read filtration process into a robust binary classification task, employing five supervised machine learning algorithms. These models are trained on features extracted through the computation of Term Frequency-Inverse Document Frequency (TF-IDF) values from various datasets such as E. coli, GAGE S. aureus, H. Chr14, Arabidopsis thaliana Chr1 and Metriaclima zebra. Notably, Naive Bayes demonstrated robust performance across various datasets, displaying high accuracy, precision, recall, F1-score, MCC, and ROC values. The MAC-ErrorReads NB model accurately classified S. aureus reads, surpassing most error correction tools with a 38.69% alignment rate. For H. Chr14, tools like Lighter, Karect, CARE, Pollux, and MAC-ErrorReads showed rates above 99%. BFC and RECKONER exceeded 98%, while Fiona had 95.78%. For the Arabidopsis thaliana Chr1, Pollux, Karect, RECKONER, and MAC-ErrorReads demonstrated good alignment rates of 92.62%, 91.80%, 91.78%, and 90.87%, respectively. For the Metriaclima zebra, Pollux achieved a high alignment rate of 91.23%, despite having the lowest number of mapped reads. MAC-ErrorReads, Karect, and RECKONER demonstrated good alignment rates of 83.76%, 83.71%, and 83.67%, respectively, while also producing reasonable numbers of mapped reads to the reference genome.
CONCLUSIONS: This study demonstrates that machine learning approaches for filtering NGS reads effectively identify and retain the most accurate reads, significantly enhancing assembly quality and genomic coverage. The integration of genomics and artificial intelligence through machine learning algorithms holds promise for enhancing NGS data quality, advancing downstream data analysis accuracy, and opening new opportunities in genetics, genomics, and personalized medicine research.


Subject(s)
Arabidopsis , Artificial Intelligence , Bayes Theorem , Escherichia coli , Staphylococcus aureus , Software , Algorithms , High-Throughput Nucleotide Sequencing , Machine Learning , Sequence Analysis, DNA
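Record 4 describes computing TF-IDF features from sequencing reads. A minimal sketch of one way to do this follows, assuming each read is treated as a document and its overlapping k-mers as terms. The abstract does not state the k-mer size or the exact TF-IDF variant, so `k=3` and the unsmoothed `log(N / df)` weighting are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmer_tfidf(reads, k=3):
    """Return one {k-mer: tf-idf weight} dict per read.

    tf  = k-mer count / number of k-mers in the read
    idf = log(number of reads / number of reads containing the k-mer)
    """
    docs = [Counter(kmers(r, k)) for r in reads]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            km: (count / total) * math.log(n_docs / doc_freq[km])
            for km, count in doc.items()
        })
    return vectors


# Toy reads for illustration; the rare k-mer "AAT" gets the only non-zero weight.
print(kmer_tfidf(["AAAA", "AAAT"], k=3))
```

The resulting sparse vectors could then be passed to a classifier such as Naive Bayes, as the abstract describes.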
5.
Neuroimage ; 287: 120522, 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38253216

ABSTRACT

Designing a comprehensive four-dimensional resting-state functional magnetic resonance imaging (4D Rs-fMRI) based default mode network (DMN) modeling methodology to reveal the spatio-temporal patterns of individual DMN, is crucial for understanding the cognitive mechanisms of the brain and the pathogenesis of psychiatric disorders. However, there are still two limitations of existing approaches for DMN modeling. The approaches either (1) simply split the spatio-temporal components and ignore the overall character of the spatio-temporal patterns or (2) are biased in the process of feature extraction for DMN modeling, and their spatio-temporal accuracy is thus not warranted. To this end, we propose a novel Spatio-Temporal Brain Attention Skip Network (STBAS-Net) to model the personalized spatio-temporal patterns of the DMN. STBAS-Net consists of spatial and temporal components, where the multi-head attention skip connection block in the spatial component achieves detailed feature extraction and enhancement in the shallow stage. Under the guidance of spatial information, we technically fuse multiple spatio-temporal information in the temporal component, which dexterously exploits the overall spatio-temporal features and achieves mutual constraints of spatio-temporal patterns to characterize the spatio-temporal patterns of the DMN. We verify the proposed STBAS-Net on a publicly released 4D Rs-fMRI dataset and an EMCI dataset. The experimental results show that compared with existing advanced methods, the proposed network can more accurately model the personalized spatio-temporal patterns of the human brain DMN and successfully identify abnormal spatio-temporal patterns in EMCI patients. This study provides a potential tool for revealing the spatio-temporal patterns of the human brain DMN and is expected to provide an effective methodological framework for future exploration of abnormal brain spatio-temporal patterns and modeling of other functional brain networks.


Subject(s)
Brain Mapping , Default Mode Network , Humans , Brain Mapping/methods , Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Attention , Nerve Net/diagnostic imaging
6.
Funct Integr Genomics ; 24(5): 139, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39158621

ABSTRACT

Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the bulk and density of data. The NGS high-dimensional data, characterized by a large number of genomics, transcriptomics, proteomics, and metagenomics features relative to the number of biological samples, presents significant challenges for reducing feature dimensionality. The high dimensionality of NGS data poses significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Feature selection and feature extraction can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of various statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored for the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. Various techniques, including deep learning architectures, machine learning algorithms, and statistical methods, have been explored for the microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets surveyed here. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data.
This review provides better insights for readers to apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.


Subject(s)
High-Throughput Nucleotide Sequencing , Machine Learning , Humans , High-Throughput Nucleotide Sequencing/methods , Deep Learning
7.
BMC Plant Biol ; 24(1): 136, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38408925

ABSTRACT

Subsistence farmers and global food security depend on sufficient food production, which aligns with the UN's "Zero Hunger," "Climate Action," and "Responsible Consumption and Production" sustainable development goals. In addition to already available methods for early disease detection and classification facing overfitting and fine feature extraction complexities during the training process, how early signs of green attacks can be identified or classified remains uncertain. Most pests and disease symptoms are seen in plant leaves and fruits, yet their diagnosis by experts in the laboratory is expensive, tedious, labor-intensive, and time-consuming. Notably, how plant pests and diseases can be appropriately detected and prevented in time, a hot topic in smart, sustainable agriculture, remains unknown. In recent years, deep transfer learning has demonstrated tremendous advances in the recognition accuracy of object detection and image classification systems since these frameworks utilize previously acquired knowledge to solve similar problems more effectively and quickly. Therefore, in this research, we introduce two plant disease detection (PDDNet) models of early fusion (AE) and the lead voting ensemble (LVE) integrated with nine pre-trained convolutional neural networks (CNNs) and fine-tuned by deep feature extraction for efficient plant disease identification and classification. The experiments were carried out on 15 classes of the popular PlantVillage dataset, which has 54,305 image samples of different plant disease species in 38 categories. Hyperparameter fine-tuning was done with popular pre-trained models, including DenseNet201, ResNet101, ResNet50, GoogleNet, AlexNet, ResNet18, EfficientNetB7, NASNetMobile, and ConvNeXtSmall. We test these CNNs on the stated plant disease detection and classification problem, both independently and as part of an ensemble.
In the final phase, a logistic regression (LR) classifier is utilized to determine the performance of various CNN model combinations. A comparative analysis was also performed on classifiers, deep learning, the proposed model, and similar state-of-the-art studies. The experiments demonstrated that PDDNet-AE and PDDNet-LVE achieved 96.74% and 97.79%, respectively, compared to current CNNs when tested on several plant diseases, depicting its exceptional robustness and generalization capabilities and mitigating current concerns in plant disease detection and classification.


Subject(s)
Neural Networks, Computer , Plant Diseases , Fruit , Machine Learning
8.
Cytometry A ; 105(7): 536-546, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38420862

ABSTRACT

The gold standard of leukocyte differentiation is a manual examination of blood smears, which is not only time and labor intensive but also susceptible to human error. As to automatic classification, there is still no comparative study of cell segmentation, feature extraction, and cell classification, where a variety of machine and deep learning models are compared with home-developed approaches. In this study, both traditional machine learning of K-means clustering versus deep learning of U-Net, U-Net + ResNet18, and U-Net + ResNet34 were used for cell segmentation, producing segmentation accuracies of 94.36% versus 99.17% for the dataset of CellaVision and 93.20% versus 98.75% for the dataset of BCCD, confirming that deep learning produces higher performance than traditional machine learning in leukocyte classification. In addition, a series of deep-learning approaches, including AlexNet, VGG16, and ResNet18, was adopted to conduct feature extraction and cell classification of leukocytes, producing classification accuracies of 91.31%, 97.83%, and 100% of CellaVision as well as 81.18%, 91.64% and 97.82% of BCCD, confirming the capability of the increased depth of neural networks in leukocyte classification. As a demonstration, this study further conducted cell-type classification of ALL-IDB2 and PCB-HBC datasets, producing high accuracies of 100% and 98.49% among all literature, validating the deep learning model used in this study.


Subject(s)
Deep Learning , Leukocytes , Neural Networks, Computer , Humans , Leukocytes/cytology , Leukocytes/classification , Machine Learning , Image Processing, Computer-Assisted/methods , Algorithms
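Record 8 uses K-means clustering as the traditional machine-learning baseline for cell segmentation. The sketch below shows the idea on a flat list of grayscale pixel intensities, assuming two clusters (background versus cell) and evenly spaced initial centroids. The study's actual feature space, cluster count and initialization are not given in the abstract, and `kmeans_1d` is a hypothetical name.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster scalar intensities with Lloyd's algorithm (requires k >= 2).

    Returns (labels, centroids). Initial centroids are spread evenly
    between the minimum and maximum value, which is an assumption made
    here to keep the example deterministic.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every pixel.
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Toy intensities: three dark background pixels, three bright cell pixels.
labels, centroids = kmeans_1d([10, 12, 11, 200, 210, 205])
print(labels, centroids)
```

Reshaping the label list back to the image dimensions would give a binary segmentation mask.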
9.
Anal Biochem ; 687: 115460, 2024 04.
Article in English | MEDLINE | ID: mdl-38191118

ABSTRACT

SUMOylation is a protein post-translational modification that plays an essential role in cellular functions. For predicting SUMO sites, numerous researchers have proposed advanced methods based on ordinary machine learning algorithms. These reported methods have shown excellent predictive performance, but there is room for improvement. In this study, we constructed a novel deep neural network Residual Pyramid Network (RsFPN), and developed an ensemble deep learning predictor called iSUMO-RsFPN. Initially, three feature extraction methods were employed to extract features from samples. Following this, weak classifiers were trained based on RsFPN for each feature type. Ultimately, the weak classifiers were integrated to construct the final classifier. Moreover, the predictor underwent systematic testing on an independent test dataset, where the results demonstrated a significant improvement over the existing state-of-the-art predictors. The code of iSUMO-RsFPN is free and available at https://github.com/454170054/iSUMO-RsFPN.


Subject(s)
Lysine , Sumoylation , Neural Networks, Computer , Machine Learning , Algorithms
10.
Biotechnol Bioeng ; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39044472

ABSTRACT

In the burgeoning field of proteins, the effective analysis of intricate protein data remains a formidable challenge, necessitating advanced computational tools for data processing, feature extraction, and interpretation. This study introduces ProteinFlow, an innovative framework designed to revolutionize feature engineering in protein data analysis. ProteinFlow stands out by offering enhanced efficiency in data collection and preprocessing, along with advanced capabilities in feature extraction, directly addressing the complexities inherent in multidimensional protein data sets. Through a comparative analysis, ProteinFlow demonstrated a significant improvement over traditional methods, notably reducing data preprocessing time and expanding the scope of biologically significant features identified. The framework's parallel data processing strategy and advanced algorithms ensure not only rapid data handling but also the extraction of comprehensive, meaningful insights from protein sequences, structures, and interactions. Furthermore, ProteinFlow exhibits remarkable scalability, adeptly managing large-scale data sets without compromising performance, a crucial attribute in the era of big data.

11.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38888097

ABSTRACT

Convolutional neural networks (CNNs) provide flexible function approximations for a wide variety of applications when the input variables are in the form of images or spatial data. Although CNNs often outperform traditional statistical models in prediction accuracy, statistical inference, such as estimating the effects of covariates and quantifying the prediction uncertainty, is not trivial due to the highly complicated model structure and overparameterization. To address this challenge, we propose a new Bayesian approach by embedding CNNs within the generalized linear models (GLMs) framework. We use extracted nodes from the last hidden layer of CNN with Monte Carlo (MC) dropout as informative covariates in GLM. This improves accuracy in prediction and regression coefficient inference, allowing for the interpretation of coefficients and uncertainty quantification. By fitting ensemble GLMs across multiple realizations from MC dropout, we can account for uncertainties in extracting the features. We apply our methods to biological and epidemiological problems, which have both high-dimensional correlated inputs and vector covariates. Specifically, we consider malaria incidence data, brain tumor image data, and fMRI data. By extracting information from correlated inputs, the proposed method can provide an interpretable Bayesian analysis. The algorithm can be broadly applicable to image regressions or correlated data analysis by enabling accurate Bayesian inference quickly.


Subject(s)
Bayes Theorem , Brain Neoplasms , Magnetic Resonance Imaging , Monte Carlo Method , Neural Networks, Computer , Humans , Linear Models , Magnetic Resonance Imaging/statistics & numerical data , Magnetic Resonance Imaging/methods , Malaria/epidemiology , Algorithms
12.
Stat Med ; 43(5): 1019-1047, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38155152

ABSTRACT

Birth defects and their associated deaths, high health and financial costs of maternal care and associated morbidity are major contributors to infant mortality. If permitted by law, prenatal diagnosis allows for intrauterine care, more complicated hospital deliveries, and termination of pregnancy. During pregnancy, a set of measurements is commonly used to monitor fetal health, including fetal head circumference, crown-rump length, abdominal circumference, and femur length. Because of the intricate interactions between the ultrasound (US) waves and the biological tissues of mother and fetus, analyzing fetal US images from a specialized perspective is difficult. Artifacts include acoustic shadows, speckle noise, motion blur, and missing borders. The fetus moves quickly, body structures are close together, and the weeks of pregnancy vary greatly. In this work, we propose a fetal growth analysis through US images of head circumference biometry using optimal segmentation and a hybrid classifier. First, we introduce a hybrid whale with oppositional fruit fly optimization (WOFF) algorithm for optimal segmentation of the fetal head, which improves the detection accuracy. Next, an improved U-Net design is utilized for hidden feature (head circumference biometry) extraction, which extracts features from the segmented region. Then, we design a modified Boosting arithmetic optimization (MBAO) algorithm for feature optimization, which selects the best features among multiple features to reduce data dimensionality issues. Furthermore, a hybrid deep learning technique called bi-directional LSTM with convolutional neural network (B-LSTM-CNN) is used for fetal growth analysis to compute fetal growth and health. Finally, we validate our proposed method on the open benchmark datasets HC18 (ultrasound images) and Oxford University Research Archive (ORA-data) (ultrasound video frames).
We compared the simulation results of our proposed algorithm with the existing state-of-art techniques in terms of various metrics.


Subject(s)
Fetal Development , Ultrasonography, Prenatal , Pregnancy , Female , Humans , Ultrasonography, Prenatal/methods , Biometry , Algorithms , Neural Networks, Computer
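Record 12 centres on head circumference biometry. One common way to obtain a circumference from a segmented head is to fit an ellipse to the outline and compute its perimeter. The sketch below assumes the ellipse semi-axes are already known (in millimetres) and uses Ramanujan's approximation. Whether the authors compute head circumference this way is not stated in the abstract, and `ellipse_circumference` is a hypothetical name.

```python
import math


def ellipse_circumference(a, b):
    """Approximate perimeter of an ellipse with semi-axes a and b.

    Uses Ramanujan's first approximation:
        pi * (3(a + b) - sqrt((3a + b)(a + 3b)))
    which is exact for a circle (a == b).
    """
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


# Hypothetical semi-axes in millimetres, for illustration only.
print(round(ellipse_circumference(45.0, 38.0), 1))
```

If the semi-axes are measured in pixels, they need to be multiplied by the pixel spacing of the scan before applying the formula.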
13.
RNA Biol ; 21(1): 1-12, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38528797

ABSTRACT

The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed for distinguishing ncRNA, these often necessitate extensive feature engineering. Recently, deep learning algorithms have provided advancements in ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework integrating convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. This framework employs a combination of k-mer one-hot, k-mer dictionary, and feature extraction techniques for input representation. Extracted features, when embedded into the deep network, enable optimal utilization of spatial and sequential nuances of ncRNA sequences. Using benchmark datasets and real-world RNA samples from bacterial organisms, we evaluated the performance of BioDeepFuse. Results exhibited high accuracy in ncRNA classification, underscoring the robustness of our tool in addressing complex ncRNA sequence data challenges. The effective melding of CNN or BiLSTM with external features heralds promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into ncRNAs in cellular processes and disease manifestations. In addition to its original application in the context of bacterial organisms, the methodologies and techniques integrated into our framework can potentially render BioDeepFuse effective in various and broader domains.


Subject(s)
Deep Learning , RNA, Untranslated/genetics , Algorithms , RNA , Neural Networks, Computer
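Record 13 lists a k-mer one-hot representation among its input encodings. A minimal sketch of such an encoding follows, assuming an RNA alphabet `ACGU`, overlapping k-mers, and an all-zero row for any k-mer containing an ambiguous base. These choices and the name `kmer_one_hot` are assumptions, since the abstract does not specify the BioDeepFuse implementation.

```python
from itertools import product


def kmer_one_hot(seq, k=2, alphabet="ACGU"):
    """Encode a sequence as one row per overlapping k-mer.

    Each row has len(alphabet) ** k entries with a single 1 at the index
    of the k-mer in lexicographic order. K-mers containing symbols
    outside the alphabet (for example N) yield an all-zero row.
    """
    vocab = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    rows = []
    for i in range(len(seq) - k + 1):
        row = [0] * len(vocab)
        index = vocab.get(seq[i:i + k])
        if index is not None:
            row[index] = 1
        rows.append(row)
    return rows


# "ACG" contains the 2-mers AC (index 1) and CG (index 6).
encoded = kmer_one_hot("ACG", k=2)
print([row.index(1) for row in encoded])  # prints [1, 6]
```

The resulting matrix has a fixed width of 4^k columns, so sequences only need padding along the row axis before being fed to a CNN or BiLSTM.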
14.
J Gastroenterol Hepatol ; 39(7): 1343-1351, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38414305

ABSTRACT

BACKGROUND AND AIM: Early whitish gastric neoplasms can be easily misdiagnosed; differential diagnosis of gastric whitish lesions remains a challenge. We aim to build a deep learning (DL) model to diagnose whitish gastric neoplasms and explore the effect of adding domain knowledge in model construction. METHODS: We collected 4558 images from two institutions to train and test models. We first developed two sole DL models (1 and 2) using supervised and semi-supervised algorithms. Then we selected diagnosis-related features through literature research and developed feature-extraction models to determine features including boundary, surface, roundness, depression, and location. Then predictions of the five feature-extraction models and sole DL model were combined and inputted into seven machine-learning (ML) based fitting-diagnosis models. The optimal model was selected as ENDOANGEL-WD (whitish-diagnosis) and compared with endoscopists. RESULTS: Sole DL 2 had higher sensitivity (83.12% vs 68.67%, Bonferroni adjusted P = 0.024) than sole DL 1. Adding domain knowledge, the decision tree performed best among the seven ML models, achieving higher specificity than DL 1 (84.38% vs 72.27%, Bonferroni adjusted P < 0.05) and higher accuracy than DL 2 (80.47%, Bonferroni adjusted P < 0.001) and was selected as ENDOANGEL-WD. ENDOANGEL-WD showed better accuracy compared with 10 endoscopists (75.70%, P < 0.001). CONCLUSIONS: We developed a novel system ENDOANGEL-WD combining domain knowledge and traditional DL to detect gastric whitish neoplasms. Adding domain knowledge improved the performance of traditional DL, which provided a novel solution for establishing diagnostic models for other rare diseases potentially.


Subject(s)
Deep Learning , Stomach Neoplasms , Humans , Stomach Neoplasms/diagnosis , Retrospective Studies , Diagnosis, Differential , Sensitivity and Specificity , Algorithms
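Record 14 reports Bonferroni adjusted P values. The standard adjustment multiplies each raw P value by the number of comparisons and caps the result at 1. The sketch below shows that rule; the raw values in the usage line are invented for illustration and do not come from the study, and `bonferroni_adjust` is a hypothetical name.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted P values: p * m, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values from three pairwise comparisons.
print(bonferroni_adjust([0.008, 0.02, 0.5]))
```

An adjusted value is then compared against the usual significance threshold, which is equivalent to comparing the raw value against the threshold divided by the number of comparisons.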
15.
Network ; : 1-30, 2024 May 29.
Article in English | MEDLINE | ID: mdl-38808648

ABSTRACT

Sentiment Analysis (SA) is a technique for categorizing texts based on the sentimental polarity of people's opinions. This paper introduces an SA model with text and emojis. Two preprocessed datasets are used: text with emojis and text without emojis. Feature extraction covers text features and text-with-emoji features. The text features include N-grams, modified Term Frequency-Inverse Document Frequency (TF-IDF), and Bag-of-Words (BoW) features extracted from the text. In classification, a Convolutional Neural Network (CNN) and a Multi-Layer Perceptron (MLP) perform the emoji- and text-based SA. The CNN weight is optimized by a new Electric fish Customized Shark Smell Optimization (ECSSO) Algorithm. Similarly, the text-based SA is carried out by hybrid Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN) classifiers. The bagged data are given as input to the classification process via RNN and LSTM. Here, the weight of LSTM is optimized by the suggested ECSSO algorithm. Then, the mean of LSTM and RNN determines the final output. The specificity of the developed scheme is 29.01%, 42.75%, 23.88%, 22.07%, 25.31%, 18.42%, 5.68%, 10.34%, 6.20%, 6.64%, and 6.84% better for 70% than other models. The efficiency of the proposed scheme is computed and evaluated.

16.
Network ; 35(2): 154-189, 2024 May.
Article in English | MEDLINE | ID: mdl-38155542

ABSTRACT

The remarkable development in technology has led to the increase of massive big data. Machine learning processes provide a way for investigators to examine and particularly classify big data. Besides, several machine learning models rely on powerful feature extraction and feature selection techniques for their success. In this paper, a big data classification approach is developed using an optimized deep learning classifier integrated with hybrid feature extraction and feature selection approaches. The proposed technique uses local linear embedding-based kernel principal component analysis and perturbation theory, respectively, to extract more representative data and select the appropriate features from the big data environment. In addition, the feature selection task is fine-tuned by using perturbation theory through heuristic search based on their output accuracy. This feature selection heuristic search method is analysed with five recent heuristic optimization algorithms for deciding the final feature subset. Finally, the data are categorized through an attention-based bidirectional long short-term memory classifier that is optimized with a golden eagle-inspired algorithm. The performance of the proposed model is experimentally verified on publicly accessible datasets. From the experimental outcomes, it is demonstrated that the proposed framework is capable of classifying large datasets with more than 90% accuracy.


Subject(s)
Algorithms , Big Data , Propylamines , Sulfides , Machine Learning
17.
Network ; 35(3): 249-277, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38224325

ABSTRACT

This research introduces a revolutionary machine learning algorithm-based quality estimation and grading system. The suggested work is divided into four main parts: pre-processing, neutrosophic model transformation, feature extraction, and grading. The raw images are first pre-processed in five major stages: read, resize, noise removal, contrast enhancement via CLAHE, and smoothing via filtering. The pre-processed images are then converted into a neutrosophic domain for more effective mango grading. The image is processed under a new geometric mean based neutrosophic approach to transform it into the neutrosophic domain. Finally, the prediction of TSS for the different chilling conditions is done by an Improved Deep Belief Network (IDBN) and, based on this, the grading of mango is done automatically as the model is already trained with it. Here, the prediction of TSS is carried out under the consideration of SSC, firmness, and TAC. A comparison between the proposed and traditional methods is carried out to confirm the efficacy of various metrics.


Subject(s)
Mangifera , Algorithms , Neural Networks, Computer , Humans , Deep Learning , Image Processing, Computer-Assisted/methods , Machine Learning
18.
Network ; : 1-31, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38708841

ABSTRACT

In contemporary times, content-based image retrieval (CBIR) techniques have gained widespread acceptance as a means for end-users to discern and extract specific image content from vast repositories. However, a substantial majority of CBIR studies continue to rely on linear methodologies such as gradient-based and derivative-based edge detection techniques. This research explores the integration of bioinspired Spiking Neural Network (SNN) based edge detection within CBIR. We introduce an innovative, computationally efficient SNN-based approach designed explicitly for CBIR applications, outperforming existing SNN models by reducing computational overhead by 2.5 times. The proposed SNN-based edge detection approach is incorporated into three distinct CBIR techniques, each employing conventional edge detection methodologies including Sobel, Canny, and image derivatives. Experiments and evaluations are carried out on the Corel-10k dataset, a widely recognized and frequently adopted benchmark dataset in image analysis, and on a crop weed dataset. Our findings underscore the enhanced performance of CBIR methodologies integrating the proposed SNN-based edge detection approach, with an average increase in mean precision values exceeding 3%. This study demonstrates the utility of our proposed methodology in optimizing feature extraction, thereby establishing its role in advancing edge-centric CBIR approaches.
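For orientation, the conventional gradient-based edge detection that the abstract names as a baseline (Sobel) can be sketched as follows. This is a minimal pure-Python illustration, not the SNN-based method proposed in the paper; the function name `sobel_magnitude` and the toy image are assumptions made for the example.

```python
# Illustrative Sobel gradient magnitude on a tiny grayscale image (pure Python).
# This is the conventional baseline named in the abstract, NOT the SNN method.
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel


def sobel_magnitude(image):
    """Return gradient magnitudes for interior pixels; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            # Correlate the 3x3 neighbourhood with both kernels.
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# Hypothetical 5x5 image: dark on the left, bright on the right (one vertical edge).
toy_image = [[0, 0, 0, 255, 255] for _ in range(5)]
edges = sobel_magnitude(toy_image)
```

In this toy case the response is zero in the flat dark region and large at the columns adjacent to the intensity step, which is the kind of edge map a CBIR pipeline would then turn into features.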

19.
Dig Dis Sci ; 69(8): 2985-2995, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38837111

ABSTRACT

BACKGROUND: Colorectal cancer (CRC) is a malignant tumor within the digestive tract with both a high incidence rate and mortality. Early detection and intervention could improve patient clinical outcomes and survival. METHODS: This study computationally investigates a set of prognostic tissue and cell features from diagnostic tissue slides. With the combination of clinical prognostic variables, the pathological image features could predict the prognosis in CRC patients. Our CRC prognosis prediction pipeline sequentially consisted of three modules: (1) A MultiTissue Net to delineate outlines of different tissue types within the WSI of CRC for further ROI selection by pathologists. (2) Development of three-level quantitative image metrics related to tissue compositions, cell shape, and hidden features from a deep network. (3) Fusion of multi-level features to build a prognostic CRC model for predicting survival for CRC. RESULTS: Experimental results suggest that each group of features has a particular relationship with the prognosis of patients in the independent test set. In the fusion features combination experiment, the accuracy rate of predicting patients' prognosis and survival status is 81.52%, and the AUC value is 0.77. CONCLUSION: This paper constructs a model that can predict the postoperative survival of patients by using image features and clinical information. Some features were found to be associated with the prognosis and survival of patients.
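The two metrics reported in the RESULTS (accuracy and AUC) can be computed as in the minimal sketch below. The labels, scores, and the 0.5 decision threshold are hypothetical values chosen for illustration and are unrelated to the study's data; the helper names are likewise assumptions.

```python
# Minimal sketch of accuracy and ROC AUC for a binary prognosis classifier.
# All data below are hypothetical and NOT taken from the study.

def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


toy_labels = [1, 0, 1, 0]               # hypothetical survival status
toy_scores = [0.9, 0.4, 0.35, 0.2]      # hypothetical model scores
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

toy_accuracy = accuracy(toy_labels, toy_predictions)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures in an abstract need not move together.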


Subject(s)
Colorectal Neoplasms , Humans , Colorectal Neoplasms/pathology , Colorectal Neoplasms/mortality , Prognosis , Male , Female , Image Interpretation, Computer-Assisted , Predictive Value of Tests
20.
BMC Med Imaging ; 24(1): 176, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39030496

ABSTRACT

Medical imaging stands as a critical component in diagnosing various diseases, where traditional methods often rely on manual interpretation and conventional machine learning techniques. These approaches, while effective, come with inherent limitations such as subjectivity in interpretation and constraints in handling complex image features. This research paper proposes an integrated deep learning approach utilizing pre-trained models-VGG16, ResNet50, and InceptionV3-combined within a unified framework to improve diagnostic accuracy in medical imaging. The method focuses on lung cancer detection using images resized and converted to a uniform format to optimize performance and ensure consistency across datasets. Our proposed model leverages the strengths of each pre-trained network, achieving a high degree of feature extraction and robustness by freezing the early convolutional layers and fine-tuning the deeper layers. Additionally, techniques like SMOTE and Gaussian Blur are applied to address class imbalance, enhancing model training on underrepresented classes. The model's performance was validated on the IQ-OTH/NCCD lung cancer dataset, which was collected from the Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases over a period of three months in fall 2019. The proposed model achieved an accuracy of 98.18%, with precision and recall rates notably high across all classes. This improvement highlights the potential of integrated deep learning systems in medical diagnostics, providing a more accurate, reliable, and efficient means of disease detection.
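The class-imbalance step can be illustrated with the core idea behind SMOTE: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified pure-Python version under stated assumptions (the function name `smote_like_samples`, the value of `k`, the seed, and the toy points are all hypothetical); it is not the authors' pipeline, which operates on image data.

```python
# Simplified SMOTE-style oversampling (illustrative only, not the paper's code).
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours by squared Euclidean distance, excluding base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D feature vectors for an underrepresented class.
toy_minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_samples(toy_minority, n_new=5)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by that class rather than duplicating existing samples.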


Subject(s)
Deep Learning , Lung Neoplasms , Humans , Lung Neoplasms/diagnostic imaging , Tomography, X-Ray Computed/methods , Neural Networks, Computer