Results 1 - 20 of 38
1.
J Med Syst ; 48(1): 10, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38193948

ABSTRACT

Gene expression datasets offer a wide range of information about various biological processes. However, it is difficult to find the important genes among high-dimensional biological data because of the existence of redundant and unimportant ones. Numerous Feature Selection (FS) techniques have been created to overcome this obstacle. Improving the efficacy and precision of FS methodologies is crucial in order to identify significant genes amongst complex biological data. In this work, we present a novel approach to gene selection called the Sine Cosine and Cuckoo Search Algorithm (SCACSA). This hybrid method is designed to work with the well-known machine learning classifier Support Vector Machine (SVM). Using a breast cancer dataset, the hybrid gene selection algorithm's performance is carefully assessed and compared to other feature selection methods. To improve the quality of the feature set, we use minimum Redundancy Maximum Relevance (mRMR) as a filtering strategy in the first step. The hybrid SCACSA method is then used to enhance and optimize the gene selection procedure. Lastly, we classify the dataset according to the chosen genes by using the SVM classifier. Given the pivotal role gene selection plays in unraveling complex biological datasets, SCACSA stands out as an invaluable tool for the classification of cancer datasets. The findings help medical practitioners make well-informed decisions about cancer diagnosis and provide them with a valuable tool for navigating the complex world of gene expression data.


Subjects
Algorithms , Breast Neoplasms , Humans , Female , Breast Neoplasms/genetics , Health Personnel , Machine Learning , Support Vector Machine
2.
BMC Bioinformatics ; 24(1): 479, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38102551

ABSTRACT

Cancer prediction in the early stage is a topic of major interest in medicine since it allows accurate and efficient actions for successful medical treatment of cancer. Most cancer datasets contain many gene expression levels as features but few samples, so there is first a need to eliminate similar features to permit a faster convergence rate of classification algorithms. These features (genes) enable us to identify cancer disease, choose the best prescription to prevent cancer and discover deviations among different techniques. To resolve this problem, we proposed a novel hybrid CSSMO-based gene selection technique for cancer classification. First, we altered the fitness of spider monkey optimization (SMO) with the cuckoo search algorithm (CSA), yielding CSSMO for feature selection, which combines the benefits of both metaheuristic algorithms to discover a subset of genes that helps to predict a cancer disease in the early stage. Further, to enhance the accuracy of the CSSMO algorithm, we chose a cleaning process, minimum redundancy maximum relevance (mRMR), to reduce the number of genes in the cancer datasets. Next, these subsets of genes are classified using deep learning (DL) to identify different groups or classes related to a particular cancer disease. Eight different benchmark microarray gene expression datasets of cancer have been utilized to analyze the performance of the proposed approach with different evaluation metrics such as recall, precision, F1-score, and confusion matrix. The proposed gene selection method with DL achieves much better classification accuracy than other existing DL and machine learning classification models on all the large gene expression datasets of cancer.


Subjects
Algorithms , Neoplasms , Humans , Microarray Analysis , Neoplasms/genetics , Genetic Techniques , Machine Learning
3.
Sensors (Basel) ; 23(15)2023 Jul 28.
Article in English | MEDLINE | ID: mdl-37571552

ABSTRACT

Good feature engineering is a prerequisite for accurate classification, especially in challenging scenarios such as detecting the breathing of living persons trapped under building rubble using bioradar. Unlike monitoring patients' breathing through the air, the measuring conditions of a rescue bioradar are very complex. The ultimate goal of search and rescue is to determine the presence of a living person, which requires extracting representative features that can distinguish measurements with a person present from those without. To address this challenge, we conducted a bioradar test scenario under laboratory conditions and decomposed the radar signal into different range intervals to derive multiple virtual scenes from the real one. We then extracted physical and statistical quantitative features that represent a measurement, aiming to find those features that are robust to the complexity of rescue-radar measuring conditions, including different rubble sites, breathing rates, signal strengths, and short-duration disturbances. To this end, we utilized two methods, Analysis of Variance (ANOVA) and Minimum Redundancy Maximum Relevance (MRMR), to analyze the significance of the extracted features. We then trained the classification model using a linear kernel support vector machine (SVM). As the main result of this work, we identified an optimal feature set of four features based on the feature ranking and the improvement in the classification accuracy of the SVM model. These four features are related to four different physical quantities and are independent of the rubble site.


Subjects
Radar , Respiratory Rate , Humans , Support Vector Machine
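The ANOVA ranking step in this abstract can be illustrated with a minimal sketch of the one-way ANOVA F-statistic, computed per feature across the classes (person present vs. absent). The function names `anova_f` and `rank_features` and the toy data are hypothetical; the abstract does not say whether the authors ranked by F-value or p-value, or which software they used.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-group sum of squares: spread of class means around the grand mean.
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        # Noise-free feature: infinitely separable if class means differ, else useless.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(X, labels):
    """Return feature (column) indices of X sorted by decreasing F-statistic."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): column 0 separates the classes, column 1 does not.
X = [[1, 5], [2, 3], [3, 4], [7, 4], [8, 5], [9, 3]]
labels = [0, 0, 0, 1, 1, 1]
print(rank_features(X, labels))  # [0, 1]
```

A larger F means the class means differ more relative to the within-class spread, so features are kept in decreasing order of F.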
4.
Int J Mol Sci ; 23(8)2022 Apr 17.
Article in English | MEDLINE | ID: mdl-35457243

ABSTRACT

Trans-acting splicing factors play a pivotal role in modulating alternative splicing by specifically binding to cis-elements in pre-mRNAs. There are approximately 1500 RNA-binding proteins (RBPs) in the human genome, but the activities of these RBPs in alternative splicing are unknown. Since determining RBP activities through experimental methods is expensive and time consuming, the development of an efficient computational method for predicting the activities of RBPs in alternative splicing from their sequences is of great practical importance. Recently, a machine learning model for predicting the activities of splicing factors was built based on features of single and dual amino acid compositions. Here, we explored the role of physicochemical and structural properties in predicting their activities in alternative splicing using machine learning approaches and found that the prediction performance is significantly improved by including these properties. By combining the minimum redundancy-maximum relevance (mRMR) method and forward feature searching strategy, a promising feature subset with 24 features was obtained to predict the activities of RBPs. The feature subset consists of 16 dual amino acid compositions, 5 physicochemical features, and 3 structural features. The physicochemical and structural properties were as important as the sequence composition features for an accurate prediction of the activities of splicing factors. The hydrophobicity and distribution of coil are suggested to be the key physicochemical and structural features, respectively.


Subjects
Computational Biology , Trans-Activators , Algorithms , Amino Acids , Computational Biology/methods , Humans , Machine Learning , RNA Splicing Factors , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics
5.
Sensors (Basel) ; 20(12)2020 Jun 17.
Article in English | MEDLINE | ID: mdl-32560493

ABSTRACT

The bearing is one of the key components of a rotating machine. Hence, monitoring the health condition of the bearing is of paramount importance. This paper develops a novel particle swarm optimization (PSO)-least squares wavelet support vector machine (PSO-LSWSVM) classifier, which is designed based on a combination of PSO, a least squares procedure, and a new wavelet kernel function-based support vector machine (SVM), for bearing fault diagnosis. In this work, bearing fault classification is transformed into a pattern recognition problem, which consists of three stages of data processing. Firstly, an information-rich dataset is built by extracting features from the signals, which are decomposed by nonlocal means (NLM) and empirical mode decomposition (EMD). Secondly, a minimum-redundancy maximum-relevance (mRMR) method is employed to determine a subset of features that can provide optimal performance. Thirdly, a novel classifier, namely LSWSVM, is proposed with the aid of PSO to provide higher classification accuracy. The key innovation of this work is to propose a new classifier with the aid of a new wavelet kernel type to increase the classification precision of bearing fault diagnosis. The merits of the proposed approach are demonstrated on a benchmark bearing dataset through a comprehensive comparison procedure.
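The PSO component can be sketched as a standard global-best particle swarm with an inertia weight. This is a generic illustration under stated assumptions, not the authors' PSO-LSWSVM: the objective passed in (here a toy quadratic) stands in for whatever LSWSVM parameters and fitness the paper optimizes, and `pso_minimize`, `w`, `c1`, `c2` and their default values are hypothetical choices.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization over a box-constrained space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the personal best + pull toward the global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move, then clamp the particle back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy objective (hypothetical): a 2-D quadratic with its minimum at the origin.
best, best_val = pso_minimize(lambda p: sum(x * x for x in p), [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

In a classifier-tuning setting, the objective would typically be a cross-validated error evaluated at the candidate kernel and regularization parameters; that wiring is an assumption and is not shown here.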

6.
Mol Genet Genomics ; 293(1): 137-149, 2018 Feb.
Article in English | MEDLINE | ID: mdl-28913654

ABSTRACT

As non-coding RNAs, circular RNAs (cirRNAs) and long non-coding RNAs (lncRNAs) have attracted an increasing amount of attention. They have been confirmed to participate in many biological processes, including transcriptional regulation, regulation of protein-coding genes, and binding to RNA-associated proteins. Until now, the differences between these two types of non-coding RNAs have not been fully uncovered, and it is still quite difficult to distinguish cirRNAs from other lncRNAs using simple techniques. In this study, we investigated these two types of non-coding RNAs using several computational methods. The purpose was to extract important factors that could distinguish cirRNAs from other lncRNAs and to build an effective classification model to distinguish them. First, we collected cirRNAs, lncRNAs and their representations from a previous study, in which each cirRNA or lncRNA was represented by 188 features derived from its graph representation, sequence and conservation properties. Second, these features were analyzed by the minimum redundancy maximum relevance (mRMR) method. The obtained mRMR feature list, the incremental feature selection method and the hierarchical extreme learning machine algorithm were employed to build an optimal classification model with a sensitivity of 0.703, specificity of 0.850, accuracy of 0.789 and Matthews correlation coefficient of 0.561. Finally, we analyzed the 16 most important features. Of them, the sequences and structures of the RNA molecule ranked highest, implying they can be potential indicators of differences between cirRNAs and other lncRNAs. Meanwhile, other features of evolutionary conservation and sequence consecution were also important.


Subjects
Computational Biology/methods , RNA, Long Noncoding/isolation & purification , RNA/isolation & purification , Algorithms , Cell-Free Nucleic Acids/genetics , Machine Learning , RNA/genetics , RNA, Circular , RNA, Long Noncoding/genetics
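The four figures reported for the cirRNA/lncRNA classifier (sensitivity, specificity, accuracy and Matthews correlation coefficient) all derive from a binary confusion matrix. The sketch below shows the standard formulas; the helper name `binary_metrics` and the counts in the usage line are hypothetical and are not the study's data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any margin of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts, not taken from the study.
print(binary_metrics(tp=70, fn=30, tn=85, fp=15))
```

MCC uses all four cells of the matrix, which is why it is often preferred over accuracy when positive and negative classes are imbalanced.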
7.
Biochim Biophys Acta ; 1860(11 Pt B): 2725-34, 2016 11.
Article in English | MEDLINE | ID: mdl-26801878

ABSTRACT

BACKGROUND: Oncogenes are genes that have the potential to cause cancer. Most normal cells undergo programmed cell death, namely apoptosis, but activated oncogenes can help cells avoid apoptosis and survive. Thus, studying oncogenes is helpful for obtaining a good understanding of the formation and development of various types of cancers. METHODS: In this study, we proposed a computational method, called OPM, for investigating oncogenes from the view of Gene Ontology (GO) and biological pathways. All investigated genes, including validated oncogenes retrieved from some public databases and other genes that have not been reported to be oncogenes thus far, were encoded into numeric vectors according to the enrichment theory of GO terms and KEGG pathways. Two popular feature selection methods, minimum redundancy maximum relevance and incremental feature selection, and an advanced machine learning algorithm, random forest, were adopted to analyze the numeric vectors to extract key GO terms and KEGG pathways. RESULTS: Along with the oncogenes, GO terms and KEGG pathways were discussed in terms of their relevance in this study. Some important GO terms and KEGG pathways were extracted using feature selection methods and were confirmed to be highly related to oncogenes. Additionally, the importance of these terms and pathways in predicting oncogenes was further demonstrated by finding new putative oncogenes based on them. CONCLUSIONS: This study investigated oncogenes based on GO terms and KEGG pathways. Some important GO terms and KEGG pathways were confirmed to be highly related to oncogenes. We hope that these GO terms and KEGG pathways can provide new insight for the study of oncogenes, particularly for building more effective prediction models to identify novel oncogenes. The program is available upon request. GENERAL SIGNIFICANCE: We hope that the new findings listed in this study may provide new insight into the investigation of oncogenes.
This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.


Subjects
Neoplasms/genetics , Oncogenes/genetics , Signal Transduction/genetics , Algorithms , Computational Biology/methods , Databases, Genetic , Gene Ontology , Humans
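The abstract states that genes were encoded via "the enrichment theory of GO terms and KEGG pathways" without giving the formula. One common way to score such enrichment, assumed here, is the negative log10 of a hypergeometric upper-tail p-value for the overlap between a gene's set of related genes (for example, its interaction partners) and the genes annotated to a term. All names below are hypothetical, and the exact encoding used in the study may differ.

```python
import math


def hypergeom_sf(k, N, M, n):
    """P(X >= k) for X ~ Hypergeometric(population N, M marked items, n draws)."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(math.comb(M, i) * math.comb(N - M, n - i) for i in range(k, min(M, n) + 1))
    return tail / math.comb(N, n)


def enrichment_score(related_genes, term_genes, n_background):
    """-log10 hypergeometric p-value of the overlap between two gene sets."""
    overlap = len(related_genes & term_genes)
    p = hypergeom_sf(overlap, n_background, len(term_genes), len(related_genes))
    return -math.log10(p)


# Hypothetical gene sets drawn from a background of 10 genes.
related = {"G1", "G2", "G3", "G4"}
term = {"G1", "G2", "G3"}
print(enrichment_score(related, term, n_background=10))  # about 1.477 (p = 1/30)
```

Computing one such score per (gene, term) pair yields the numeric vector for each gene; a larger score indicates a less likely overlap under random sampling.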
8.
Biochim Biophys Acta ; 1860(11 Pt B): 2619-26, 2016 11.
Article in English | MEDLINE | ID: mdl-27208425

ABSTRACT

BACKGROUND: Chemical toxicity is one of the major barriers for designing and detecting new chemical entities during drug discovery. Unexpected toxicity of an approved drug may lead to its withdrawal from the market and significant financial losses. Better understanding of the mechanisms underlying various toxicity effects can help eliminate unqualified candidate drugs in early stages, allowing researchers to focus their attention on other more viable candidates. METHODS: In this study, we aimed to understand the mechanisms underlying several toxicity effects using Gene Ontology (GO) terms and KEGG pathways. GO term and KEGG pathway enrichment theories were adopted to encode each chemical, and the minimum redundancy maximum relevance (mRMR) method was used to analyze the GO terms and the KEGG pathways. Based on the feature list obtained by the mRMR method, the most related GO terms and KEGG pathways were extracted. RESULTS: Some important GO terms and KEGG pathways were uncovered, which were concluded to be significant for determining chemical toxicity effects. CONCLUSIONS: Several GO terms and KEGG pathways are highly related to all investigated toxicity effects, while some are specific to a certain toxicity effect. GENERAL SIGNIFICANCE: The findings in this study have the potential to further our understanding of different chemical toxicity mechanisms and to assist scientists in developing new chemical toxicity prediction algorithms. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.


Subjects
Pharmaceutical Preparations/chemistry , Algorithms , Computational Biology/methods , Databases, Genetic , Drug Discovery , Gene Ontology
9.
Gynecol Obstet Invest ; 82(4): 361-370, 2017.
Article in English | MEDLINE | ID: mdl-27846619

ABSTRACT

BACKGROUND: The overall survival rate of ovarian cancer patients is still poor because of the difficulties encountered in detection, diagnosis and treatment. Here, we aim to systematically identify the genetic factors causing ovarian cancer and to find accurate diagnostic and therapeutic targets for ovarian cancer. METHODS: We collected known ovarian cancer-related genes archived in databases as the investigated targets and employed minimum redundancy maximum relevance and random forest classification to identify novel ovarian cancer-related genes in addition to the known ones. We further identified candidate markers for the detection of ovarian cancer based on gene expression data and then confirmed them by quantitative real-time PCR. RESULTS: We identified genetic terms that help interpret the mechanism of ovarian cancer. Based on those terms, we predicted 860 novel related genes as candidates. These candidates can act as expression biomarkers for clinical detection, and they achieved 100% accuracy. We verified 10 of them as the optimal biomarkers for detection in the expression data. CONCLUSION: We employed the features of archived ovarian cancer-related genes to identify 860 novel ovarian cancer genes. We further validated 10 genes as biomarkers for detection of ovarian cancer.


Subjects
Biomarkers, Tumor/genetics , Gene Expression Profiling/methods , Ovarian Neoplasms/genetics , Transcriptome/genetics , Female , Humans , Real-Time Polymerase Chain Reaction
10.
Mol Genet Genomics ; 291(6): 2065-2079, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27530612

ABSTRACT

Compound-protein interactions play important roles in every cell via the recognition and regulation of specific functional proteins. The correct identification of compound-protein interactions can lead to a good comprehension of this complicated system and provide useful input for the investigation of various attributes of compounds and proteins. In this study, we attempted to understand this system by extracting properties from both proteins and compounds, in which proteins were represented by gene ontology and KEGG pathway enrichment scores and compounds were represented by molecular fragments. Advanced feature selection methods, including minimum redundancy maximum relevance and incremental feature selection, and the basic machine learning algorithm random forest were used to analyze these properties and extract core factors for the determination of actual compound-protein interactions. Compound-protein interactions reported in The Binding Database were used as positive samples. To improve the reliability of the results, the analytic procedure was executed five times using different negative samples. Simultaneously, five optimal prediction methods based on random forest and yielding maximum MCCs of approximately 77.55% were constructed; these may be useful tools for the prediction of compound-protein interactions. This work provides new clues to understanding the system of compound-protein interactions by analyzing extracted core features. Our results indicate that compound-protein interactions are related to biological processes involving immune, developmental and hormone-associated pathways.


Subjects
Computational Biology/methods , Proteins/metabolism , Small Molecule Libraries/pharmacology , Algorithms , Databases, Genetic , Gene Ontology , Proteins/chemistry
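Several of these studies pair an mRMR ranking with incremental feature selection (IFS): a classifier is evaluated on the top-1, top-2, ... features of the ranked list, and the prefix with the best score is kept. A minimal sketch follows; `evaluate` is a hypothetical callback standing in for, e.g., a cross-validated random forest scored by MCC, which the code does not implement.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked: Sequence[int], evaluate: Callable[[List[int]], float]
) -> Tuple[List[int], float, List[float]]:
    """Score nested prefixes of a ranked feature list and keep the best prefix."""
    best_k, best_score, curve = 0, float("-inf"), []
    for k in range(1, len(ranked) + 1):
        score = evaluate(list(ranked[:k]))
        curve.append(score)
        # Strict '>' keeps the smallest prefix among tied scores.
        if score > best_score:
            best_k, best_score = k, score
    return list(ranked[:best_k]), best_score, curve


# Hypothetical mRMR ranking and a stand-in scorer that depends only on subset size.
ranked = [3, 1, 4, 0, 2]
toy_scores = {1: 0.60, 2: 0.72, 3: 0.78, 4: 0.78, 5: 0.75}
subset, score, curve = incremental_feature_selection(ranked, lambda s: toy_scores[len(s)])
print(subset, score)  # [3, 1, 4] 0.78
```

The returned `curve` corresponds to the IFS curve usually plotted in such papers (score against number of features), which helps show where adding features stops paying off.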
11.
Heliyon ; 10(11): e31882, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38841483

ABSTRACT

Background: TNFRSF4 plays a significant role in cancer progression, especially in hepatocellular carcinoma (HCC). This study aims to investigate the prognostic value of TNFRSF4 expression in patients with HCC and to develop a predictive pathomics model for its expression. Methods: A cohort of patients with HCC retrieved from the TCGA database was analyzed using RNA-seq analysis to determine TNFRSF4 expression and its impact on overall survival (OS). Additionally, hematoxylin-eosin staining analysis was performed to construct a pathomics model for predicting TNFRSF4 expression. Then, pathway enrichment analysis was conducted, immune checkpoint markers were investigated, and immune cell infiltration was examined to explore the underlying biological mechanism of the pathomics score. Results: TNFRSF4 expression was significantly higher in tumor tissues than in normal tissues. TNFRSF4 expression also exhibited significant correlations with various clinical variables, including pathologic stage III/IV and R1/R2/RX residual tumor. Furthermore, elevated TNFRSF4 expression was associated with unfavorable OS. Interestingly, in the subgroup analysis, elevated TNFRSF4 expression was identified as a significant risk factor for OS in male patients. The newly developed pathomics model successfully predicted TNFRSF4 expression with good performance and revealed a significant association between high pathomics scores and worse OS. In male patients, high pathomics scores were also associated with a higher risk of mortality. Moreover, pathomics scores were also involved in specific hallmarks, immune-related characteristics, and apoptosis-related genes in HCC, such as epithelial-mesenchymal transition, Tregs, and BAX expression. Conclusions: Our findings suggest that TNFRSF4 expression and the newly devised pathomics scores hold potential as prognostic markers for OS in patients with HCC. Additionally, gender influenced the association between these markers and patient outcomes.

12.
J Clin Med ; 11(14)2022 Jul 11.
Article in English | MEDLINE | ID: mdl-35887768

ABSTRACT

Heart rate is quite regular during sinus (normal) rhythm (SR) originating from the sinus node. In contrast, heart rate is usually irregular during atrial fibrillation (AF). Complete atrioventricular block with an escape rhythm, ventricular pacing, or ventricular tachycardia are the most common exceptions when heart rate may be regular in AF. Heart rate variability (HRV) is the variation in the duration of consecutive cardiac cycles (RR intervals). We investigated the utility of HRV parameters for automated detection of AF with machine learning (ML) classifiers. The minimum redundancy maximum relevance (MRMR) algorithm, one of the most effective algorithms for feature selection, helped select the HRV parameters (including five original), best suited for distinguishing AF from SR in a database of over 53,000 separate 60 s electrocardiogram (ECG) segments cut from longer (up to 24 h) ECG recordings. HRV parameters entered the ML-based classifiers as features. Seven different, commonly used classifiers were trained with one to six HRV-based features with the highest scores resulting from the MRMR algorithm and tested using 5-fold cross-validation and blindfold validation. The best ML classifier in the blindfold validation achieved an accuracy of 97.2% and a diagnostic odds ratio of 1566. From all studied HRV features, the top three HRV parameters distinguishing AF from SR were: the percentage of successive RR intervals differing by at least 50 ms (pRR50), the ratio of standard deviations of points along and across the identity line of the Poincare plots, respectively (SD2/SD1), and the coefficient of variation, i.e., the standard deviation of RR intervals divided by their mean duration (CV). The proposed methodology and the presented results of the selection of HRV parameters have the potential to develop practical solutions and devices for automatic AF detection with minimal sets of simple HRV parameters.
Using straightforward ML classifiers and the extremely small sets of simple HRV features, always with pRR50 included, the differentiation of AF from sinus rhythms in the 60 s ECGs is very effective.
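The three top-ranked HRV parameters can be computed directly from a series of RR intervals, following the definitions given in the abstract. In this sketch, RR intervals are in milliseconds, pRR50 counts successive differences of at least 50 ms as the abstract describes, SD1 and SD2 are the population standard deviations of the Poincare points rotated onto the axes across and along the identity line, and CV uses the population standard deviation. The authors' exact conventions (sample vs. population statistics, artifact filtering) and the function name `hrv_features` are assumptions.

```python
import math
import statistics


def hrv_features(rr_ms):
    """pRR50 (%), SD2/SD1 and CV from a list of RR intervals in milliseconds."""
    pairs = list(zip(rr_ms, rr_ms[1:]))  # successive (RR_n, RR_n+1) Poincare points
    diffs = [b - a for a, b in pairs]
    # Percentage of successive differences of at least 50 ms.
    prr50 = 100.0 * sum(abs(d) >= 50 for d in diffs) / len(diffs)
    # Rotate the Poincare plot by 45 degrees: SD1 across, SD2 along the identity line.
    sd1 = statistics.pstdev([(b - a) / math.sqrt(2) for a, b in pairs])
    sd2 = statistics.pstdev([(b + a) / math.sqrt(2) for a, b in pairs])
    # Coefficient of variation: SD of the RR intervals divided by their mean.
    cv = statistics.pstdev(rr_ms) / statistics.mean(rr_ms)
    return {"pRR50": prr50, "SD2/SD1": sd2 / sd1 if sd1 else float("inf"), "CV": cv}


# Hypothetical RR series (ms), not data from the study.
print(hrv_features([800, 820, 900, 880, 790, 860]))
```

During AF the successive differences grow and become erratic, which tends to raise pRR50 and SD1 (and so lower SD2/SD1); this is the intuition behind using such small feature sets.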

13.
Micromachines (Basel) ; 13(10)2022 Oct 18.
Article in English | MEDLINE | ID: mdl-36296118

ABSTRACT

Small target features are difficult to distinguish and identify in environments with complex backgrounds. The identification and extraction of multi-dimensional features have been realized owing to the rapid development of deep learning, but redundant relationships between features remain, reducing feature recognition accuracy. In this paper, the YOLOv5 neural network is used to achieve preliminary feature extraction, and the minimum redundancy maximum relevance algorithm is applied to the 512 candidate features extracted in the fully connected layer to remove redundancy among highly correlated features, reducing the dimension of the feature set and making small target feature recognition a reality. In addition, pre-processing the image can significantly improve the feature recognition accuracy. The experimental results demonstrate that the minimum redundancy maximum relevance algorithm can effectively reduce the feature dimension and identify small target features.

14.
J King Saud Univ Comput Inf Sci ; 34(6): 3226-3235, 2022 Jun.
Article in English | MEDLINE | ID: mdl-38620614

ABSTRACT

Chest X-ray images contain sufficient information to find widespread application in the diagnosis of diverse diseases and in decision making to assist medical experts. This paper proposes an intelligent approach to detect Covid-19 from chest X-ray images using a hybrid of deep convolutional neural network (CNN) and discrete wavelet transform (DWT) features. First, the X-ray image is enhanced and segmented through preprocessing tasks, and then deep CNN and DWT features are extracted. The optimal features are selected from these hybridized features through minimum redundancy and maximum relevance (mRMR) along with recursive feature elimination (RFE). Finally, a random forest-based bagging approach is used for the detection task. An extensive experiment is performed, and the results confirm that our approach gives satisfactory performance compared to existing methods, with an overall accuracy of more than 98.5%.
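The abstract does not name the wavelet family or decomposition level used for the DWT features. As an assumption, the sketch below applies one level of the orthonormal Haar transform to a small image, rows first and then columns, yielding four sub-bands whose coefficients (or summary statistics of them) could serve as features. The function names are hypothetical, and sub-band naming (LH vs. HL) varies between libraries.

```python
import math


def haar_1d(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_2d(img):
    """One level of the 2-D Haar DWT (rows, then columns) -> LL, LH, HL, HH."""
    row_lo, row_hi = [], []
    for row in img:
        a, d = haar_1d(row)
        row_lo.append(a)
        row_hi.append(d)

    def split_columns(rows):
        # Transform each column, then transpose back to [row][col] layout.
        lo_cols, hi_cols = zip(*(haar_1d(list(col)) for col in zip(*rows)))
        return [list(r) for r in zip(*lo_cols)], [list(r) for r in zip(*hi_cols)]

    ll, lh = split_columns(row_lo)  # low-pass rows: approximation, column detail
    hl, hh = split_columns(row_hi)  # high-pass rows: column approximation, diagonal detail
    return ll, lh, hl, hh


# Hypothetical 4x4 grayscale patch; each sub-band is 2x2.
img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ll, lh, hl, hh = haar_2d(img)
print(ll)
```

Because the transform is orthonormal, the total energy (sum of squared values) of the four sub-bands equals that of the input image, which makes the sub-band energies a natural feature choice.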

15.
Ophthalmol Sci ; 2(4): 100171, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36531588

ABSTRACT

Purpose: No established biomarkers currently exist for therapeutic efficacy and durability of anti-VEGF therapy in neovascular age-related macular degeneration (nAMD). This study evaluated radiomic-based quantitative OCT biomarkers that may be predictive of anti-VEGF treatment response and durability. Design: Assessment of baseline biomarkers using machine learning (ML) classifiers to predict tolerance to anti-VEGF therapy. Participants: Eighty-one participants with treatment-naïve nAMD from the OSPREY study, including 15 super responders (patients who achieved and maintained retinal fluid resolution) and 66 non-super responders (patients who did not achieve or maintain retinal fluid resolution). Methods: A total of 962 texture-based radiomic features were extracted from fluid, subretinal hyperreflective material (SHRM), and different retinal tissue compartments of OCT scans. The top 8 features, chosen by the minimum redundancy maximum relevance feature selection method, were evaluated using 4 ML classifiers in a cross-validated approach to distinguish between the 2 patient groups. Longitudinal assessment of changes in different texture-based radiomic descriptors (delta-texture features) between baseline and month 3 also was performed to evaluate their association with treatment response. Additionally, 8 baseline clinical parameters and a combination of baseline OCT, delta-texture features, and the clinical parameters were evaluated in a cross-validated approach in terms of association with therapeutic response. Main Outcome Measures: The cross-validated area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were calculated to validate the classifier performance. Results: The cross-validated AUC by the quadratic discriminant analysis classifier was 0.75 ± 0.09 using texture-based baseline OCT features. 
The delta-texture features within different OCT compartments between baseline and month 3 yielded an AUC of 0.78 ± 0.08. The baseline clinical parameters sub-retinal pigment epithelium volume and intraretinal fluid volume yielded an AUC of 0.62 ± 0.07. When all the baseline, delta, and clinical features were combined, a statistically significant improvement in the classifier performance (AUC, 0.81 ± 0.07) was obtained. Conclusions: Radiomic-based quantitative assessment of OCT images was shown to distinguish between super responders and non-super responders to anti-VEGF therapy in nAMD. The baseline fluid and SHRM delta-texture features were found to be most discriminating across groups.
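The AUC values above can be read through the rank (Mann-Whitney) interpretation: the AUC equals the probability that a randomly chosen positive case (here, a super responder) receives a higher classifier score than a randomly chosen negative case, counting ties as one half. The sketch computes this for a single validation fold; the function name and scores are hypothetical, and the reported mean ± SD values would come from repeating such a computation across cross-validation runs (an assumption about the exact protocol).

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability a positive outscores a negative (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for one validation fold.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # 0.8333333333333334
```

The pairwise loop is O(n * m), which is negligible for a cohort of 81 participants; a sort-based rank formula gives the same value faster on large datasets.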

16.
Comput Struct Biotechnol J ; 20: 3783-3795, 2022.
Article in English | MEDLINE | ID: mdl-35891786

ABSTRACT

In transcriptomics, differentially expressed genes (DEGs) provide fine-grained phenotypic resolution for comparisons between groups and insights into molecular mechanisms underlying the pathogenesis of complex diseases or phenotypes. The robust detection of DEGs from large datasets is well-established. However, owing to various limitations (e.g., the low availability of samples for some diseases or limited research funding), small sample sizes are frequently used in experiments. Therefore, methods to screen reliable and stable features are urgently needed for analyses with limited sample size. In this study, MSPJ, a new machine learning approach for identifying DEGs, was proposed to mitigate the reduced power and improve the stability of DEG identification in small gene expression datasets. This ensemble learning-based method consists of three algorithms: an improved multiple random sampling with meta-analysis, SVM-RFE (support vector machines-recursive feature elimination), and a permutation test. MSPJ was compared with ten classical methods on 94 simulated datasets and in large-scale benchmarking with 165 real datasets. The results showed that, among these methods, MSPJ had the best performance on most small gene expression datasets, especially those with sample sizes below 30. In summary, the MSPJ method enables effective feature selection for robust DEG identification in small transcriptome datasets and is expected to expand research on the molecular mechanisms underlying complex diseases or phenotypes.
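MSPJ's exact combination of resampling, SVM-RFE and permutation testing is not described in the abstract. As a hedged illustration of the permutation-test component alone, the sketch below estimates a two-sided p-value for the difference in mean expression of one gene between two groups by repeatedly shuffling group labels. The statistic (absolute mean difference), the add-one correction and all names are assumptions.

```python
import random
import statistics


def permutation_pvalue(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        stat = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if stat >= observed:
            extreme += 1
    # Add-one correction keeps the Monte Carlo estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical expression values of one gene in two small groups.
cases = [10.1, 10.3, 9.8, 10.0, 10.2]
controls = [5.0, 5.2, 4.9, 5.1, 5.3]
print(permutation_pvalue(cases, controls, n_perm=2000))
```

Permutation tests make no normality assumption, which is one reason they suit the small-sample setting this abstract targets; with 5 vs. 5 samples there are only 252 distinct splits, so an exact enumeration would also be feasible.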

17.
Life (Basel) ; 11(6)2021 Jun 03.
Article in English | MEDLINE | ID: mdl-34204983

ABSTRACT

Antifreeze protein (AFP) is a proteinaceous compound with improved antifreeze ability and binding ability to ice to prevent its growth. As a surface-active material, a small number of AFPs have a tremendous influence on the growth of ice. Therefore, identifying novel AFPs is important to understand protein-ice interactions and create novel ice-binding domains. To date, predicting AFPs is difficult due to their low sequence similarity for the ice-binding domain and the lack of common features among different AFPs. Here, a computational engine was developed to predict the features of AFPs and reveal the most important 39 features for AFP identification, such as antifreeze-like/N-acetylneuraminic acid synthase C-terminal, insect AFP motif, C-type lectin-like, and EGF-like domain. With this newly presented computational method, a group of previously confirmed functional AFP motifs was screened out. This study has identified some potential new AFP motifs and contributes to understanding biological antifreeze mechanisms.

18.
BMC Med Genomics ; 14(1): 285, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34852799

ABSTRACT

BACKGROUND: We previously identified differentially expressed genes on the basis of false discovery rate adjusted P values using empirical Bayes moderated tests. However, that approach yielded a subset of differentially expressed genes without accounting for redundancy between the selected genes. METHODS: This study is a secondary analysis of a case-control study of the effect of antiretroviral therapy on apoptosis pathway genes, comprising 16 cases (HIV infected with mitochondrial toxicity) and 16 controls (uninfected). We applied the maximum relevance minimum redundancy (mRMR) algorithm to the genes that were differentially expressed between the cases and controls. The mRMR algorithm iteratively selects features (genes) that are maximally relevant for class prediction and minimally redundant. We implemented several machine learning classifiers and tested the prediction accuracy of the two mRMR genes. We next used network analysis to estimate and visualize the association among the differentially expressed genes. We employed Markov Random Field (undirected network) models to identify gene networks related to mitochondrial toxicity. The Spinglass model was used to identify clusters of gene communities. RESULTS: The mRMR algorithm ranked DFFA and TNFRSF1A, two of the upregulated proapoptotic genes, at the top. The overall prediction accuracy was 86%; the two mRMR genes correctly classified 86% of the participants into their respective groups. The estimated network models showed different patterns of gene networks. In the network of the cases, FASLG was the most central gene. However, instead of FASLG, ABL1 and LTBR had the highest centrality in controls. CONCLUSION: The mRMR algorithm and network analysis revealed a new correlation of genes associated with mitochondrial toxicity.


Subjects
HIV Infections , Leukocytes, Mononuclear , Algorithms , Apoptosis , Bayes Theorem , Case-Control Studies , HIV Infections/drug therapy , HIV Infections/genetics , Humans
19.
Biomed Tech (Berl) ; 66(5): 489-501, 2021 Oct 26.
Article in English | MEDLINE | ID: mdl-33939896

ABSTRACT

Myocardial infarction (MI) occurs when blood flow to a particular segment of the heart stops, damaging the heart muscle. Vectorcardiography (VCG) is a technique for recording the direction and magnitude of the electrical signals produced by the heart in a three-lead representation. In this work, we present a technique for detecting MI in the inferior portion of the heart using short-duration VCG signals. The raw signal was pre-processed using median and Savitzky-Golay (SG) filters. The Stationary Wavelet Transform (SWT) was used for time-invariant decomposition of the signal, followed by feature extraction. Features selected with the minimum-redundancy-maximum-relevance (mRMR) feature selection method were fed to supervised classification methods. The efficacy of the proposed method was assessed under both a class-oriented approach and a more realistic subject-oriented approach, achieving accuracies of 99.14% and 89.37%, respectively. The proposed technique outperforms existing state-of-the-art methods while using a shorter VCG segment. A shorter segment combined with high accuracy can help automate the timely and reliable detection of MI. The satisfactory performance achieved in the subject-oriented approach demonstrates the reliability and applicability of the proposed technique.
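The pre-processing and decomposition stages named in this abstract can be sketched in plain Python. The abstract does not state the filter window lengths, the wavelet, or the decomposition level, so the choices below (a 3-sample median filter, the classic 5-point quadratic SG smoothing coefficients, and a single-level Haar SWT with periodic extension) are assumptions for illustration only, and the function names are hypothetical.

```python
import math


def median_filter(signal, window=3):
    """Sliding median with edge replication (assumed window length); window must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    out = []
    for i in range(len(signal)):
        neighborhood = sorted(padded[i:i + window])
        out.append(neighborhood[half])
    return out


# 5-point quadratic Savitzky-Golay smoothing coefficients, normalized by 35.
SG5_COEFFS = (-3, 12, 17, 12, -3)


def savgol5(signal):
    """5-point quadratic SG smoothing; the two samples at each edge are left unchanged."""
    out = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5_COEFFS, window)) / 35
    return out


def haar_swt_level1(signal):
    """One level of an undecimated (stationary) Haar transform with periodic extension.

    There is no downsampling: both outputs have the same length as the input,
    so a circular shift of the input shifts the outputs by the same amount.
    """
    n = len(signal)
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[(i + 1) % n]) for i in range(n)]
    detail = [s * (signal[i] - signal[(i + 1) % n]) for i in range(n)]
    return approx, detail


if __name__ == "__main__":
    raw = [0.0, 0.1, 5.0, 0.2, 0.3, 0.2, 0.1, 0.0]  # toy signal with one spike
    smoothed = savgol5(median_filter(raw))
    approx, detail = haar_swt_level1(smoothed)
    print(approx)
    print(detail)
```

The median filter suppresses isolated spikes, the SG filter smooths while exactly preserving locally quadratic shapes, and dropping the downsampling step is what makes the SWT shift-invariant, unlike the decimated discrete wavelet transform.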


Subjects
Inferior Wall Myocardial Infarction , Myocardial Infarction , Electrocardiography , Heart , Humans , Myocardial Infarction/diagnosis , Reproducibility of Results , Vectorcardiography
20.
Comput Struct Biotechnol J ; 19: 5008-5018, 2021.
Article in English | MEDLINE | ID: mdl-34589181

ABSTRACT

Because metastasis is the primary cause of cancer-related deaths, research has been directed towards unraveling the complex cellular processes that drive it. Technological advances, particularly the advent of high-throughput sequencing, have provided insight into these processes. This knowledge has led to therapeutic and clinical applications and is now being used to predict the onset of metastasis in order to improve diagnostics and therapies. In this regard, predicting metastasis onset has also been explored using artificial intelligence approaches based on machine learning and, more recently, deep learning. This review summarizes the machine learning and deep learning-based metastasis prediction methods developed to date. We also detail the types of molecular data used to build the models and the critical signatures derived from the different methods. Finally, we highlight the challenges associated with using machine learning and deep learning methods and provide suggestions to improve their predictive performance.
