ABSTRACT
Protein crystallization is crucial for biology, but the steps involved are complex and sensitive to both external factors and internal structure. To save experimental cost and time, the tendency of proteins to crystallize can first be screened by computational modeling. This study therefore created a new pipeline that uses protein sequence to predict crystallization propensity at the protein material production, purification, and crystal production stages. The pipeline introduces a new feature selection method that combines Chi-square (${\chi }^{2}$) and recursive feature elimination, together with the 12 selected features, followed by linear discriminant analysis for dimensionality reduction; finally, a support vector machine with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline was tested on three different datasets, and its accuracy rates were higher than those of existing pipelines. In conclusion, our model provides a new solution for predicting multistage protein crystallization propensity, which is a major challenge in computational biology.
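The Chi-square filtering step mentioned in this abstract can be illustrated with a minimal sketch. It assumes binarized features and binary class labels and scores each feature with the Pearson chi-square statistic of its 2x2 contingency table; the function names `chi_square_2x2` and `rank_features` are hypothetical, and the recursive feature elimination, LDA, and SVM stages of the published pipeline are not reproduced here.

```python
def chi_square_2x2(feature, labels):
    """Pearson chi-square statistic for a binary feature vs. a binary label."""
    # table[f][y] holds the observed count of samples with feature f and label y.
    table = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        table[f][y] += 1
    n = len(labels)
    row_totals = [sum(table[0]), sum(table[1])]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_2x2(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    columns = {
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],        # independent of the label
        "informative": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    }
    print(rank_features(columns, labels, k=1))
```

A filter of this kind only ranks features one at a time; a wrapper method such as recursive feature elimination would then be needed to account for interactions between the retained features.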
Subjects
Algorithms; Machine Learning; Crystallization; Amino Acid Sequence; Computational Biology
ABSTRACT
Anticancer peptides (ACPs) are peptides that have been demonstrated to have anticancer activity. Using ACPs to prevent cancer could be a viable alternative to conventional cancer treatments because they are safer and display higher selectivity. Because experimental ACP identification is lab-limited, expensive, and lengthy, this study proposes a computational method to predict ACPs from sequence information. The process includes the input of peptide sequences, feature extraction using ordinal encoding with positional information and handcrafted features, and finally feature selection. The model comprises two modules: a deep learning module and a machine learning module. The deep learning module contains two channels: bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN). Light Gradient Boosting Machine (LightGBM) is used in the machine learning module. Finally, the classification results of the three models are combined by voting in a model ensemble layer. This study provides insights into ACP prediction using a novel method and shows promising performance. It used a benchmark dataset for further exploration and for comparison with previous studies. Our final model has an accuracy of 0.7895, sensitivity of 0.8153, and specificity of 0.7676, an improvement of at least 2% over state-of-the-art studies in all metrics. Hence, this paper presents a novel method that can potentially predict ACPs more effectively and efficiently. The work and source code are available to the community of researchers and developers at https://github.com/khanhlee/acp-ope/.
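The ensemble step described in this abstract can be sketched as a simple vote over the three models' outputs. The abstract does not say whether the vote is hard (on labels) or soft (on probabilities); the sketch below assumes hard majority voting, and the function name `majority_vote` and the example predictions are illustrative only.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Hard voting: for each sample, return the label chosen by most models."""
    ensemble = []
    for votes in zip(*model_predictions):
        # most_common(1) returns [(label, count)] for the most frequent label.
        ensemble.append(Counter(votes).most_common(1)[0][0])
    return ensemble


if __name__ == "__main__":
    # Hypothetical per-sample labels from the three paths (1 = ACP, 0 = non-ACP).
    bilstm = [1, 0, 1, 0]
    cnn = [1, 1, 0, 0]
    lightgbm = [0, 1, 1, 0]
    print(majority_vote(bilstm, cnn, lightgbm))
```

With three binary classifiers a tie cannot occur, which is one practical reason to vote over an odd number of models.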
Subjects
Deep Learning; Peptides/therapeutic use; Machine Learning; Algorithms; Neural Networks, Computer
ABSTRACT
In the past decade, convolutional neural networks (CNNs) have been used by scientists as powerful tools for visual data tasks. However, many attempts to apply CNNs to protein function prediction and to extracting useful information from protein sequences have had limitations. In this research, we propose a new method that addresses the weaknesses of the previous method. mCNN-ETC is a deep learning model that transforms protein evolutionary information into image-like data composed of 20 channels, which correspond to the 20 amino acids in the protein sequence. We constructed CNN layers with different scanning windows in parallel to enhance the model's ability to detect useful patterns. We then filtered specific patterns through a 1-max pooling layer before passing them to the prediction layer. This research addresses a basic problem in biology: predicting electron transporters and classifying their corresponding complexes. The model reached an accuracy of 97.41%, nearly 6% higher than its predecessor. We have also published a web server at http://bio219.bioinfo.yzu.edu.tw, which can be used free of charge for research purposes.
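The idea of parallel scanning windows followed by 1-max pooling can be shown on a toy scale. The sketch below is a deliberate simplification: it uses a single-channel 1D signal and fixed kernels rather than the 20-channel evolutionary profiles and learned filters of mCNN-ETC, and all function names are hypothetical.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding ('valid' convolution)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def one_max_pool(feature_map):
    """1-max pooling keeps only the strongest response of a feature map."""
    return max(feature_map)


def multi_window_features(signal, kernels):
    """Apply kernels of different widths in parallel and pool each to one value."""
    return [one_max_pool(conv1d_valid(signal, kernel)) for kernel in kernels]


if __name__ == "__main__":
    signal = [0, 1, 3, 1, 0, 2]
    kernels = [[1, 1], [1, 1, 1]]  # window sizes 2 and 3
    print(multi_window_features(signal, kernels))
```

Because each window size is pooled to a single value, the output length depends only on the number of kernels, not on the sequence length, which is what lets variable-length proteins feed a fixed-size prediction layer.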
Subjects
Electrons; Neural Networks, Computer; Amino Acid Sequence; Biological Evolution; Humans; Proteins/chemistry
ABSTRACT
BACKGROUND AND AIM: The Rome IV criteria, the standard for diagnosing functional constipation (FC), deem the Bristol Stool Scale (BSS) unsuitable for assessing stool consistency in young children. Hence, the Brussels Infant and Toddler Stool Scale (BITSS) was developed. We aimed to validate the BITSS and test its reliability for hard stools and FC among infants and toddlers, for which evidence in Asian populations is limited. METHODS: FC was evaluated using the Rome IV criteria in children aged 0-48 months who came for medical examination. Stool properties provided by caregivers were assessed sequentially through three methods: the BSS, the BITSS, and caregiver reports. RESULTS: A total of 370 responses were received, with an average age of 26.2 months. Substantial agreement was observed between the BITSS and caregiver reports for hard stools (concordance rate: 91.9%, κ = 0.75), while near-perfect agreement was found between the BITSS and the BSS (concordance rate: 93.5%, κ = 0.81). The BITSS exhibited higher sensitivity than the BSS in assessing hard stools (95.3% vs 87.5%, P < 0.001). The BITSS also identified a higher prevalence of FC (23.5%) than the BSS (20.5%) and caregiver reports (18.7%), with near-perfect agreement. Moderate agreement was found when evaluating test-retest reliability between the BITSS and caregiver reports (concordance rate: 86.2%, κ = 0.44). CONCLUSIONS: The BITSS is more sensitive than the BSS in identifying abnormal stools, especially hard stools, and aids early FC detection in young children. These findings support using the BITSS over the BSS for evaluating hard stools in infants and toddlers, both in Vietnam and globally.
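The concordance rate and κ values reported in this abstract are standard agreement statistics. As a minimal sketch of how they are computed, the following assumes two raters classifying the same items and uses Cohen's kappa; the function name and the toy ratings are illustrative and are not the study's data.

```python
def concordance_and_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two raters on the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Toy example: 1 = hard stool, 0 = not hard, rated with two different scales.
    scale_one = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    scale_two = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    observed, kappa = concordance_and_kappa(scale_one, scale_two)
    print(f"concordance={observed:.3f} kappa={kappa:.3f}")
```

Kappa is lower than the raw concordance rate because it discounts the agreement two raters would reach by chance alone, which explains how an 86.2% concordance can coexist with κ = 0.44 when one category dominates.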
ABSTRACT
With the rising demand for in vitro fertilization (IVF) cycles, there is a growing need for innovative techniques to optimize procedure outcomes. One such technique is the time-lapse system (TLS) for embryo incubation, which minimizes environmental changes during embryo culture. TLS also significantly advances the prediction of embryo quality, a crucial determinant of IVF cycle success. However, embryo assessment currently remains subjective, and inter- and intra-observer variability leads to highly variable results. To address this challenge, reproductive medicine has gradually turned to artificial intelligence (AI) to establish a standardized and objective approach, aiming to achieve higher success rates. Extensive research is underway on the use of AI in TLS to predict multiple outcomes. These studies explore the application of popular AI algorithms, their specific implementations, and the advancements achieved in TLS. This review provides an overview of the advances in AI algorithms and their particular applications within the context of TLS, as well as the potential challenges and opportunities for further advancements in reproductive medicine.
Subjects
Artificial Intelligence; Reproductive Medicine; Humans; Time-Lapse Imaging/methods; Fertilization in Vitro/methods; Algorithms
ABSTRACT
PURPOSE: To determine whether an explainable artificial intelligence (XAI) model enhances the accuracy and transparency of predicting embryo ploidy status based on embryonic characteristics and clinical data. METHODS: This retrospective study used a dataset of 1908 blastocyst embryos. The dataset includes ploidy status, morphokinetic features, morphology grades, and 11 clinical variables. Six machine learning (ML) models, including Random Forest (RF), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Support Vector Machine (SVM), AdaBoost (ADA), and Light Gradient-Boosting Machine (LGBM), were trained to predict ploidy status probabilities across three distinct datasets: high-grade embryos (HGE, n = 1107), low-grade embryos (LGE, n = 364), and all-grade embryos (AGE, n = 1471). The models' performance was interpreted using XAI techniques, including SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). RESULTS: The mean maternal age was 38.5 ± 3.85 years. The RF model exhibited superior performance compared to the other five ML models, achieving an accuracy of 0.749 and an AUC of 0.808 for AGE. In the external test set, the RF model achieved an accuracy of 0.714 and an AUC of 0.750 (95% CI, 0.702-0.796). SHAP's feature impact analysis highlighted that maternal age, paternal age, time to blastocyst (tB), and day 5 morphology grade significantly impacted the predictive model. In addition, LIME offered case-specific ploidy prediction probabilities, revealing the model's assigned values for each variable within a finite range. CONCLUSION: The model highlights the potential of XAI algorithms to enhance ploidy prediction, optimize embryo selection in patient-centric consultation, and provide reliable and transparent insights into the decision-making process.
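The AUC figures reported in this abstract summarize how well predicted probabilities rank one class above the other. A minimal sketch of that computation, assuming binary labels and using the pairwise (Mann-Whitney) formulation rather than any particular library routine, is shown below; the function name `roc_auc` and the toy scores are illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: 1 = one ploidy class, 0 = the other; scores are model probabilities.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.2]
    print(roc_auc(labels, scores))
```

This quadratic pairwise form is easy to verify by hand; for large datasets a rank-based implementation gives the same value more efficiently.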
Subjects
Artificial Intelligence; Ploidies; Humans; Female; Adult; Pregnancy; Blastocyst/cytology; Retrospective Studies; Embryo Transfer/methods; Preimplantation Diagnosis/methods; Machine Learning; Fertilization in Vitro/methods; Referral and Consultation; Maternal Age; Support Vector Machine
ABSTRACT
The role of the IFI6 gene has been described in several cancers, but its involvement in esophageal cancer (ESCA) remains unclear. This study aimed to identify novel prognostic indicators for ESCA-targeted therapy by investigating IFI6's expression, epigenetic mechanisms, and signaling activities. We used public data from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) to analyze IFI6's expression, clinical characteristics, gene function, pathways, and correlation with different immune cells in ESCA. The TIMER2.0 database was employed to assess the pan-cancer expression of IFI6, while UALCAN was used to examine its expression across tumor stages and histology subtypes. Additionally, the KEGG database helped identify related pathways. Our findings revealed 95 genes positively correlated and 15 genes negatively correlated with IFI6 in ESCA. IFI6 was over-expressed in ESCA and other cancers, affected patient survival, and showed higher expression in tumor tissues than in normal tissues. IFI6 was also correlated with CD4+ T cells and B cell receptors (BCRs), both essential in the immune response. GO Biological Process (GO BP) enrichment analysis indicated that IFI6 was primarily associated with the Type I interferon signaling pathway and the defense response to viruses. Intriguingly, KEGG pathway analysis demonstrated that IFI6 and its positively correlated genes in ESCA were mostly linked to the Cytosolic DNA-sensing pathway, which plays a crucial role in innate immunity and viral defense, and the RIG-I-like receptor (RLR) signaling pathway, which detects viral infections and activates immune responses. Pathways related to various viral infections were also identified. It is important to note that our study relied on online databases. Although ESCA consists of two distinct subgroups (ESCC and EAC), most databases combine them into a single category. Future research should focus on evaluating IFI6 expression and its impact on each subgroup to gain more specific insights. In conclusion, inhibiting IFI6 using targeted therapy could be an effective strategy for treating ESCA, considering its potential as a biomarker and its correlation with immune cell factors.
Subjects
Esophageal Neoplasms; Virus Diseases; Humans; Prognosis; Multiomics; CD4-Positive T-Lymphocytes; Mitochondrial Proteins
ABSTRACT
In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.
Subjects
Computational Biology; Proteome; Data Mining; Machine Learning; Natural Language Processing
ABSTRACT
Non-small cell lung cancer (NSCLC) is the most prevalent histological type of lung cancer and a leading cause of cancer death globally. Patients with NSCLC have a poor prognosis owing to various factors, one of which is late diagnosis. The DNA methylation of CpG island sequences found in the promoter regions of tumor suppressor genes has recently received attention as a potential biomarker of human cancer. In this study, we report DNA methylation changes of the adenosine triphosphate (ATP)-binding cassette transporter G1 (ABCG1), which belongs to the ATP cassette transporter family, in NSCLC patients. Our results demonstrate that ABCG1 is hypermethylated in NSCLC samples, and these changes are negatively correlated with gene and protein expression. Furthermore, the expression of the ABCG1 gene is significantly associated with the survival time of lung adenocarcinoma (LUAD) patients; however, it did not correlate with the overall survival (OS) of lung squamous cell carcinoma (LUSC) patients. Notably, we found that ABCG1 methylation status at locus cg20214535 is strongly associated with survival time, and hypermethylation was consistently observed in LUAD samples. This novel finding suggests that ABCG1 is a potential candidate for targeted therapy in lung cancer via this specific probe. In addition, we illustrate the protein-protein interaction (PPI) of ABCG1 with other proteins and the strong communication of ABCG1 with immune cells.
Subjects
Adenocarcinoma of Lung; Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Humans; Carcinoma, Non-Small-Cell Lung/genetics; Carcinoma, Non-Small-Cell Lung/pathology; Lung Neoplasms/pathology; Adenocarcinoma of Lung/genetics; Adenocarcinoma of Lung/pathology; DNA Methylation; Epigenesis, Genetic; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; ATP Binding Cassette Transporter, Subfamily G, Member 1/genetics; ATP Binding Cassette Transporter, Subfamily G, Member 1/metabolism
ABSTRACT
Protein S-sulfenylation is a crucial post-translational modification (PTM) in which a hydroxyl group covalently binds to the thiol of cysteine. Recent studies have shown that this modification plays an important role in signal transduction, transcriptional regulation, and apoptosis. To date, the dynamics of sulfenic acids in proteins remain unclear because of their fleeting nature. Identifying S-sulfenylation sites could therefore be the key to deciphering their structures and functions, which are important in cell biology and disease. However, owing to the lack of effective methods, scientists in this field tend to be limited to a handful of wet-lab techniques that are time-consuming and not cost-effective. This motivated us to develop an in silico model for detecting S-sulfenylation sites from protein sequence information alone. In this study, protein sequences were treated as natural language sentences comprising biological subwords. A deep neural network was subsequently employed to perform classification. On the independent dataset, sensitivity, specificity, accuracy, Matthews correlation coefficient, and area under the curve reached 85.71%, 69.47%, 77.09%, 0.5554, and 0.833, respectively. Our results suggest that the proposed method (fastSulf-DNN) achieves excellent performance in predicting S-sulfenylation sites compared to other well-known tools on a benchmark dataset.
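The performance statistics listed in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are illustrative and do not reproduce the study's confusion matrix, which the abstract does not give.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in binary_metrics(tp=8, fp=3, tn=7, fn=2).items():
        print(f"{name}: {value:.4f}")
```

Reporting MCC alongside accuracy is useful when sensitivity and specificity differ noticeably, as they do here, because MCC uses all four cells of the matrix.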
Subjects
Databases, Protein; Neural Networks, Computer; Protein Processing, Post-Translational; Sequence Analysis, Protein; Sulfenic Acids; Sulfenic Acids/chemistry; Sulfenic Acids/metabolism
ABSTRACT
Recently, language representation models have drawn considerable attention in the natural language processing field because of their remarkable results. Among them, bidirectional encoder representations from transformers (BERT) has proven to be a simple yet powerful language model that achieved state-of-the-art performance. BERT adopts the concept of contextualized word embedding to capture the semantics of words and the context in which they appear. In this study, we present a novel technique that incorporates a BERT-based multilingual model into bioinformatics to represent the information in DNA sequences. We treated DNA sequences as natural sentences and then used BERT models to transform them into fixed-length numerical matrices. As a case study, we applied our method to DNA enhancer prediction, a well-known and challenging problem in this field. We observed that our BERT-based features improved sensitivity, specificity, accuracy, and Matthews correlation coefficient by more than 5-10% compared to the current state-of-the-art features in bioinformatics. Moreover, further experiments show that deep learning (represented by 2D convolutional neural networks; CNN) holds potential for learning BERT features better than traditional machine learning techniques. In conclusion, we suggest that BERT and 2D CNNs could open a new avenue in biological modeling using sequence information.
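Treating a DNA sequence as a sentence requires some rule for splitting it into word-like tokens before a language model can embed it. The abstract does not state the tokenization used, so the sketch below assumes overlapping k-mers, a common choice in the field; the function name `dna_to_sentence` and the value of `k` are assumptions for illustration.

```python
def dna_to_sentence(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers joined by spaces.

    The result can be handed to a text tokenizer as if it were a sentence.
    """
    sequence = sequence.upper()
    if len(sequence) < k:
        raise ValueError("sequence is shorter than k")
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return " ".join(kmers)


if __name__ == "__main__":
    print(dna_to_sentence("acgtac", k=3))
```

Setting `k=1` reduces this to one token per nucleotide, which is the other obvious way to read "DNA sequences as natural sentences".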
Subjects
Computational Biology/methods; DNA/genetics; Deep Learning; Enhancer Elements, Genetic; Models, Biological; Natural Language Processing; Computer Simulation; Data Accuracy; Humans; Multilingualism; Semantics; Sensitivity and Specificity; Transcription, Genetic
ABSTRACT
BACKGROUND: Timely diagnosis of meniscus injuries is key to preventing knee joint dysfunction and improving patient outcomes because it decreases morbidity and facilitates treatment planning. PURPOSE: To train and evaluate a deep learning model for automated detection of meniscus tears on knee magnetic resonance imaging (MRI). STUDY TYPE: Bicentric retrospective study. SUBJECTS: In total, 584 knee MRI studies, divided among training (n = 234), testing (n = 200), and external validation (n = 150) data sets, were used in this study. The public data set MRNet was used as a second external validation data set to evaluate the performance of the model. SEQUENCE: 3 T; coronal and sagittal images from T1-weighted proton density (PD) fast spin-echo (FSE) with fat saturation and T2-weighted FSE with fat saturation sequences. ASSESSMENT: The detection system for meniscus tears was based on an improved YOLOv4 model with Darknet-53 as the backbone. The performance of the model was also compared with that of three radiologists of varying levels of experience. The presence of a meniscus tear as determined from surgery reports was used as the ground truth for the images. STATISTICAL TESTS: Sensitivity, specificity, prevalence, positive predictive value, negative predictive value, accuracy, and the receiver operating characteristic curve were used to evaluate the performance of the detection model. Two-way analysis of variance, the Wilcoxon signed-rank test, and Tukey's multiple tests were used to evaluate differences in performance between the model and the radiologists. RESULTS: The overall accuracies for detecting meniscus tears using our model on the internal testing, internal validation, and external validation data sets were 95.4%, 95.8%, and 78.8%, respectively. One radiologist had significantly lower performance than our model in detecting meniscal tears (accuracy: 0.9025 ± 0.093 vs. 0.9580 ± 0.025). DATA CONCLUSION: The proposed model had high sensitivity, specificity, and accuracy for detecting meniscus tears on knee MRIs. EVIDENCE LEVEL: 3. TECHNICAL EFFICACY: Stage 2.
Subjects
Meniscus; Tibial Meniscus Injuries; Humans; Retrospective Studies; Menisci, Tibial; Tibial Meniscus Injuries/diagnostic imaging; Tibial Meniscus Injuries/pathology; Arthroscopy; Knee Joint/pathology; Magnetic Resonance Imaging/methods; Sensitivity and Specificity; Neural Networks, Computer
ABSTRACT
As one of the most common post-transcriptional epigenetic modifications, N6-methyladenine (6mA) plays an essential role in various cellular processes and disease pathogenesis. Accurately identifying 6mA modifications is therefore necessary for a deep understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their models were developed with small training datasets, so their practical application in genome-wide detection is quite limited. To overcome these limitations, we present a novel model based on transformer architecture and deep learning to identify DNA 6mA sites from cross-species genomes. The model is constructed on a benchmark dataset and explores features derived from pre-trained transformer word embedding approaches. Subsequently, a convolutional neural network was employed to learn the generated features and produce the prediction outcomes. As a result, our predictor achieved excellent performance on the independent test, with an accuracy and Matthews correlation coefficient (MCC) of 79.3% and 0.58, respectively. Overall, it achieved better accuracy than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, our model is expected to assist biologists in accurately identifying 6mA sites and formulating novel testable biological hypotheses. We also release the source code and datasets freely at https://github.com/khanhlee/bert-dna for front-end users.
Subjects
Genome; Neural Networks, Computer; DNA/genetics; Epigenesis, Genetic; Software
ABSTRACT
Adaptor proteins (APs) are a family of proteins that aid in intracellular membrane trafficking, and their impairments or defects are closely related to various disorders. Traditional methods for identifying and classifying APs are time-consuming and technically complex, which has motivated machine learning and computational approaches to the AP recognition task. However, most studies have focused on recognizing individual members of the AP family or on separating APs in general from non-APs, and lack a comprehensive strategy for distinguishing the complexes of AP subtypes. Herein, we propose a novel method for a novel task, discriminating the AP complexes within the AP family, using an interpretable deep neural network architecture on sequence-based encoding features. This work also introduces a benchmark data set of AP complexes originating from the UniProt and GeneOntology databases. To assess the robustness of our proposed method, we compared its performance with various machine learning algorithms and feature extraction strategies. Furthermore, the model's predictions were interpreted using t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and SHapley Additive exPlanations (SHAP) analysis to show the distribution of AP complexes on optimal features. The promising performance of our architecture can assist scientists not only in distinguishing AP complexes but also in analyzing protein sequences in general. Moreover, we have made our work publicly available on GitHub at https://github.com/khanhlee/adaptor-dnn.
Subjects
Deep Learning; Neural Networks, Computer; Machine Learning; Algorithms; Amino Acid Sequence; Proteins
ABSTRACT
Possible drug-food constituent interactions (DFIs) could change the intended efficacy of particular therapeutics in medical practice. The increasing number of multiple-drug prescriptions leads to a rise in drug-drug interactions (DDIs) and DFIs. These adverse interactions have further implications, e.g., a decline in a medication's effect, the withdrawal of various medications, and harmful impacts on patients' health. However, the importance of DFIs remains underestimated, as the number of studies on the topic is limited. Recently, scientists have applied artificial intelligence-based models to study DFIs, but limitations remained in data mining, input, and detailed annotations. This study proposes a novel prediction model to address the limitations of previous studies. In detail, we extracted 70,477 food compounds from the FooDB database and 13,580 drugs from the DrugBank database. We extracted 3780 features from each drug-food compound pair. The optimal model was eXtreme Gradient Boosting (XGBoost). We also validated the performance of our model on one external test set from a previous study, which contained 1922 DFIs. Finally, we applied our model to recommend whether a drug should or should not be taken with certain food compounds based on their interactions. The model can provide highly accurate and clinically relevant recommendations, especially for DFIs that may cause severe adverse events and even death. Our proposed model can contribute to developing more robust predictive models to help patients, under the supervision and consultation of physicians, avoid adverse DFI effects when combining drugs and foods for therapy.
Subjects
Drug-Related Side Effects and Adverse Reactions; Food-Drug Interactions; Humans; Artificial Intelligence; Machine Learning
ABSTRACT
Malignant tumors share some common morphological characteristics. Radiomics treats images as data; we hypothesize that a set of radiomics signatures extracted from CT scan images of one cancer in one specific organ may also be usable for overall survival prediction in different types of cancer in different organs. This retrospective study enrolled four data sets of cancer patients covering three different organs (420, 157, 137, and 191 patients for lung 1 training, lung 2 testing, and two external validation sets, kidney and head and neck, respectively). In the training set, radiomics features were obtained from CT scan images, and essential features were chosen by the LASSO algorithm. Univariable and multivariable analyses were then conducted to find a radiomics signature via Cox proportional hazards regression. Kaplan-Meier curves were plotted based on the risk score. The integrated time-dependent area under the ROC curve (iAUC) was calculated for each predictive model. In the training set, the Kaplan-Meier curves separated patients into high- and low-risk groups (p-value < 0.001; log-rank test). The risk score of the radiomics signature was locked and independently evaluated in the testing set and the two external validation sets, which showed significant differences (p-value < 0.05; log-rank test). A combined model (radiomics + clinical) showed improved iAUC values in the lung 1, lung 2, head and neck, and kidney data sets of 0.621 (95% CI 0.588, 0.654), 0.736 (95% CI 0.654, 0.819), 0.732 (95% CI 0.655, 0.809), and 0.834 (95% CI 0.722, 0.946), respectively. We believe that CT-based radiomics signatures for predicting overall survival across various cancer sites may exist.
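The Kaplan-Meier curves referred to in this abstract are product-limit estimates of survival over time. The sketch below shows the estimator itself on a toy cohort; it assumes right-censored data coded as 1 for an observed death and 0 for censoring, the function name `kaplan_meier` is illustrative, and the LASSO selection, Cox regression, and log-rank test used in the study are not reproduced.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        # Both deaths and censored patients leave the risk set afterwards.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy cohort: five patients, two of them censored.
    for time, prob in kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
        print(time, round(prob, 4))
```

In a risk-stratification setting, one such curve would be estimated separately for the high-risk and low-risk groups defined by the signature's risk score, and the two curves would then be compared.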
Subjects
Neoplasms; Humans; Retrospective Studies; Neoplasms/diagnostic imaging; Tomography, X-Ray Computed/methods; Neck; Kidney
ABSTRACT
Histone lysine crotonylation (Kcr) is a post-translational modification of histone proteins that is involved in the regulation of gene transcription, acute and chronic kidney injury, spermatogenesis, depression, cancer, and so forth. The identification of Kcr sites in proteins is important for characterizing and regulating primary biological mechanisms. The use of computational approaches such as machine learning and deep learning algorithms has emerged in recent years because traditional wet-lab experiments are time-consuming and costly. In this study, we propose a deep learning model based on a recurrent neural network (RNN), termed Sohoko-Kcr, for the prediction of Kcr sites. Through embedded encoding of the peptide sequences, we investigate the efficiency of RNN-based models such as long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and bidirectional gated recurrent unit (BiGRU) networks using cross-validation and independent tests. We also compared Sohoko-Kcr with other published tools to verify the efficiency of our model, based on 3-fold, 5-fold, and 10-fold cross-validation and independent set tests. The results show that the BiGRU model consistently displayed outstanding performance and computational efficiency. Based on the proposed model, a web server called Sohoko-Kcr was deployed for free use and is accessible at https://sohoko-research-9uu23.ondigitalocean.app.
Subjects
Lysine; Protein Processing, Post-Translational; Amino Acid Sequence; Histones/metabolism; Humans; Lysine/metabolism; Male; Neural Networks, Computer
ABSTRACT
As lung cancer remains the leading cause of cancer deaths globally, characterizing tumor molecular profiles is crucial to tailoring treatments for individuals at advanced stages. Cancer cells exhibit strong dependence on iron for their proliferation, and several iron-regulatory proteins have been proposed as either oncogenes or tumor suppressor genes. This study aims to evaluate the prospective therapeutic and prognostic value of the sideroflexin (SFXN) gene family, whose functions involve mitochondrial iron metabolism, in lung adenocarcinoma (LUAD). Differential expression analysis using the TIMER and UALCAN tools was first employed to compare SFXN expression levels between normal and LUAD tissues. Next, the prognostic value, biological significance, and potential of SFXNs as immunotherapy candidates were examined using the GEPIA, cBioPortal, MetaCore, Cytoscape, and TIMER databases. All members of the SFXN family except SFXN3 were differentially expressed in LUAD compared to normal samples and across different stages of LUAD. Survival analysis then revealed that SFXN1 was related to worse overall survival in patients with LUAD. Furthermore, several correlations between SFXN1 expression and immune infiltrating cells were discovered. To conclude, our study provides evidence of the SFXN gene family's relevance to the prognosis and immunotherapeutic targets of LUAD.
Subjects
Adenocarcinoma of Lung; Lung Neoplasms; Adenocarcinoma of Lung/genetics; Adenocarcinoma of Lung/metabolism; Adenocarcinoma of Lung/pathology; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; Computational Biology; Gene Expression Regulation, Neoplastic; Humans; Immunotherapy; Iron/metabolism; Iron-Regulatory Proteins/genetics; Iron-Regulatory Proteins/metabolism; Lung Neoplasms/pathology
ABSTRACT
In 2016, the World Health Organization (WHO) updated the glioma classification, including that of low-grade glioma (LGG), by incorporating molecular biology parameters. In the new scheme, LGGs have three molecular subtypes: isocitrate dehydrogenase (IDH)-mutated 1p/19q-codeleted, IDH-mutated 1p/19q-noncodeleted, and IDH-wild type 1p/19q-noncodeleted entities. This work proposes a model for predicting LGG molecular subtypes using magnetic resonance imaging (MRI). MR images were segmented and converted into radiomics features, thereby providing predictive information about the brain tumor classification. With 726 raw features obtained from the feature extraction procedure, we developed a hybrid machine learning-based radiomics approach that incorporates a genetic algorithm and an eXtreme Gradient Boosting (XGBoost) classifier to ascertain 12 optimal features for tumor classification. To address data imbalance, the synthetic minority oversampling technique (SMOTE) was applied. The XGBoost algorithm outperformed the other algorithms on the training dataset with an accuracy of 0.885. We then evaluated the XGBoost model on an external validation dataset, where it achieved an overall accuracy of 0.6905 for the three-subtype classification of LGGs. Our model is among just a few to have addressed the three-subtype LGG classification challenge with high accuracy compared with previous studies performing similar work.
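The synthetic minority oversampling technique (SMOTE) mentioned in this abstract creates new minority-class samples by interpolating between existing ones. The following is a minimal stdlib sketch of that idea, not the implementation used in the study; the function name `smote`, the neighbour count `k`, and the toy points are assumptions for illustration.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("at least two minority samples are required")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote(minority_class, n_new=3):
        print(point)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into validation or test data.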
Subjects
Brain Neoplasms; Glioma; Brain Neoplasms/diagnostic imaging; Brain Neoplasms/pathology; Glioma/pathology; Humans; Isocitrate Dehydrogenase/genetics; Machine Learning; Magnetic Resonance Imaging/methods; Mutation/genetics; Retrospective Studies
ABSTRACT
Background: SNARE proteins play a vital role in membrane fusion and in cellular physiological and pathological processes. Many potential SNARE-based therapeutics for mental diseases and even cancer have also been developed. Therefore, there is a pressing need to predict SNAREs for further manipulation of these essential proteins, which demands new and efficient approaches. Methods: Some computational frameworks have been proposed to overcome the hurdles of biological methods, which require considerable time and budget to identify SNAREs. However, the performance of existing frameworks was unsatisfactory, as they failed to retain the SNARE sequence order and to capture the many hidden features of SNAREs. This paper proposes a novel model built on a multiscan convolutional neural network (CNN) and position-specific scoring matrix (PSSM) profiles to address these limitations. We trained our model on the benchmark dataset with fivefold cross-validation and evaluated it on two different independent datasets. Results: Overall, the multiscan CNN was cross-validated on the training set and excelled in SNARE classification, reaching 0.963 in AUC and 0.955 in AUPRC. In addition, with a sensitivity, specificity, accuracy, and MCC of 0.842, 0.968, 0.955, and 0.767, respectively, our proposed framework outperformed previous models in the SNARE recognition task. Conclusions: We believe that our model can contribute to the discrimination of SNARE proteins from general proteins.
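The fivefold cross-validation mentioned in this abstract partitions the training data into five folds and holds out each fold once. A minimal sketch of the index bookkeeping is given below; it assumes a plain, unstratified split because the abstract does not say whether stratification was used, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are dealt to folds round-robin, so fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        ]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(10, k=5):
        print("train:", train, "test:", test)
```

For an imbalanced task such as SNARE versus non-SNARE classification, a stratified variant that preserves the class ratio in every fold would usually be preferable; shuffling the sample order beforehand is also common.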