Results 1-20 of 111
1.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37649385

ABSTRACT

Protein crystallization is crucial for biology, but the steps involved are complex and sensitive to both external factors and internal protein structure. To save experimental cost and time, the tendency of proteins to crystallize can first be estimated and screened computationally. This study therefore created a new pipeline that uses protein sequence to predict crystallization propensity at three stages: protein material production, purification, and crystal production. The pipeline introduces a new feature selection method that combines chi-square (χ²) scoring with recursive feature elimination to select 12 features; linear discriminant analysis then reduces dimensionality, and finally a support vector machine with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline was tested on three different datasets and achieved higher accuracy than existing pipelines. In conclusion, our model offers a new solution for predicting multistage protein crystallization propensity, which is a major challenge in computational biology.


Subject(s)
Algorithms , Machine Learning , Crystallization , Amino Acid Sequence , Computational Biology
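The chi-square (χ²) scoring step of the feature selection described above can be sketched in plain Python. This is a minimal illustration, assuming non-negative sequence-derived features (for example, amino-acid composition) and following the same per-class-sum formulation as scikit-learn's `chi2`; the helper names are hypothetical, and the recursive feature elimination, LDA and SVM stages are not shown.

```python
def chi2_scores(X, y):
    """Return one chi-square score per feature column.

    Assumes every feature value is non-negative. For each class, the observed
    value is the sum of the feature over that class's samples; the expected
    value is the class frequency times the feature's total sum.
    """
    n_samples = len(y)
    classes = sorted(set(y))
    class_freq = {c: y.count(c) / n_samples for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_freq[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features (hypothetical helper)."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: 4 samples x 2 non-negative features, binary labels.
X = [[1, 0], [2, 0], [0, 3], [0, 1]]
y = [0, 0, 1, 1]
scores = chi2_scores(X, y)
print(scores)                   # [3.0, 4.0]
print(select_top_k(scores, 1))  # [1]
```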
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36642410

ABSTRACT

Anticancer peptides (ACPs) are peptides that have been shown to have anticancer activity. Using ACPs to prevent cancer could be a viable alternative to conventional cancer treatments because they are safer and more selective. Because ACP identification is highly lab-limited, expensive and lengthy, this study proposes a computational method to predict ACPs from sequence information. The process comprises input of the peptide sequences, feature extraction using ordinal encoding with positional information and handcrafted features, and finally feature selection. The whole model consists of two modules, one based on deep learning and one on machine learning. The deep learning module contains two channels: bidirectional long short-term memory (BiLSTM) and convolutional neural network (CNN). Light Gradient Boosting Machine (LightGBM) was used in the machine learning module. Finally, the classification results of the three paths were combined by voting in a model ensemble layer. This study provides insights into ACP prediction using a novel method and shows promising performance. It used a benchmark dataset to allow comparison with, and improvement over, previous studies. Our final model has an accuracy of 0.7895, sensitivity of 0.8153 and specificity of 0.7676, an improvement of at least 2% over state-of-the-art studies in all metrics. Hence, this paper presents a novel method that can potentially predict ACPs more effectively and efficiently. The work and source code are available to researchers and developers at https://github.com/khanhlee/acp-ope/.


Subject(s)
Deep Learning , Peptides/therapeutic use , Machine Learning , Algorithms , Neural Networks, Computer
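The ensemble layer, in which the three paths (BiLSTM, CNN and LightGBM) vote on each peptide, can be illustrated with simple hard (majority) voting. This is a sketch assuming each path outputs binary labels (1 = ACP, 0 = non-ACP); the abstract does not say whether hard or soft voting was used, and the prediction lists below are made-up placeholders.

```python
from collections import Counter


def majority_vote(*prediction_lists):
    """Combine per-model label lists by hard voting.

    With an odd number of binary voters (three here) there is never a tie.
    """
    combined = []
    for votes in zip(*prediction_lists):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Placeholder predictions for four peptides from the three paths.
bilstm_preds = [1, 0, 1, 1]
cnn_preds = [1, 1, 0, 0]
lightgbm_preds = [0, 0, 1, 0]
print(majority_vote(bilstm_preds, cnn_preds, lightgbm_preds))  # [1, 0, 1, 0]
```

Soft voting (averaging predicted probabilities before thresholding) is a common alternative when every path exposes calibrated scores.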
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34472594

ABSTRACT

In the past decade, convolutional neural networks (CNNs) have become powerful tools for solving visual data tasks. However, existing efforts to apply CNNs to protein function prediction and to extracting useful information from protein sequences have certain limitations. In this research, we propose a new method that addresses the weaknesses of the previous method. mCNN-ETC is a deep learning model that transforms protein evolutionary information into image-like data composed of 20 channels, corresponding to the 20 amino acids in the protein sequence. We constructed CNN layers with different scanning windows in parallel to enhance the model's ability to detect useful patterns. We then filtered specific patterns through a 1-max pooling layer before passing them to the prediction layer. As an application, this research addresses a basic problem in biology: predicting electron transporters and classifying their corresponding complexes. The model reached an accuracy of 97.41%, nearly 6% higher than its predecessor. We have also published a web server at http://bio219.bioinfo.yzu.edu.tw, which can be used free of charge for research purposes.


Subject(s)
Electrons , Neural Networks, Computer , Amino Acid Sequence , Biological Evolution , Humans , Proteins/chemistry
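The parallel scanning windows followed by 1-max pooling can be sketched in plain Python. The sketch assumes the evolutionary profile is an L x C matrix (one row per residue, C = 20 in the described model), uses made-up kernel weights, and omits biases, activations and training. It shows only how each kernel's responses along the sequence collapse to a single maximum, so kernels of different widths yield a fixed-length feature vector regardless of sequence length.

```python
def conv1d_valid(profile, kernel):
    """Slide a W x C kernel along an L x C profile (no padding, requires L >= W).

    Returns L - W + 1 responses, each the sum of element-wise products.
    """
    length, width = len(profile), len(kernel)
    responses = []
    for start in range(length - width + 1):
        total = 0.0
        for offset in range(width):
            row = profile[start + offset]
            total += sum(p * k for p, k in zip(row, kernel[offset]))
        responses.append(total)
    return responses


def multiscan_one_max(profile, kernels):
    """Apply kernels of possibly different widths and keep each kernel's
    maximum response (1-max pooling); output length = number of kernels."""
    return [max(conv1d_valid(profile, kernel)) for kernel in kernels]


# Toy example with 2 channels instead of 20 to keep it readable.
profile = [[1, 0], [0, 1], [1, 1]]
kernels = [
    [[1, 0]],          # window width 1
    [[1, 0], [0, 1]],  # window width 2
]
print(multiscan_one_max(profile, kernels))  # [1.0, 2.0]
```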
4.
J Assist Reprod Genet ; 41(2): 239-252, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37880512

ABSTRACT

With the rising demand for in vitro fertilization (IVF) cycles, there is a growing need for innovative techniques to optimize procedure outcomes. One such technique is the time-lapse system (TLS) for embryo incubation, which minimizes environmental changes during embryo culture. TLS also substantially advances the prediction of embryo quality, a crucial determinant of IVF cycle success. However, embryo assessment remains subjective because of inter- and intra-observer variability, which produces highly variable results. To address this challenge, reproductive medicine has gradually turned to artificial intelligence (AI) to establish a standardized and objective approach, aiming to achieve higher success rates. Extensive research is underway on using AI in TLS to predict multiple outcomes. These studies explore popular AI algorithms, their specific implementations, and the advances they have achieved in TLS. This review aims to provide an overview of the advances in AI algorithms and their particular applications within the context of TLS, as well as the potential challenges and opportunities for further advancements in reproductive medicine.


Subject(s)
Artificial Intelligence , Reproductive Medicine , Humans , Time-Lapse Imaging/methods , Fertilization in Vitro/methods , Algorithms
5.
J Assist Reprod Genet ; 41(9): 2349-2358, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38963605

ABSTRACT

PURPOSE: To determine whether an explainable artificial intelligence (XAI) model enhances the accuracy and transparency of predicting embryo ploidy status based on embryonic characteristics and clinical data. METHODS: This retrospective study used a dataset of 1908 blastocyst embryos. The dataset includes ploidy status, morphokinetic features, morphology grades, and 11 clinical variables. Six machine learning (ML) models, namely Random Forest (RF), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Support Vector Machine (SVM), AdaBoost (ADA), and Light Gradient-Boosting Machine (LGBM), were trained to predict ploidy status probabilities across three distinct datasets: high-grade embryos (HGE, n = 1107), low-grade embryos (LGE, n = 364), and all-grade embryos (AGE, n = 1471). The models' predictions were interpreted using XAI techniques, including SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). RESULTS: The mean maternal age was 38.5 ± 3.85 years. The RF model outperformed the other five ML models, achieving an accuracy of 0.749 and an AUC of 0.808 for AGE. In the external test set, the RF model achieved an accuracy of 0.714 and an AUC of 0.750 (95% CI, 0.702-0.796). SHAP feature impact analysis highlighted that maternal age, paternal age, time to blastocyst (tB), and day 5 morphology grade significantly impacted the predictive model. In addition, LIME provided case-specific ploidy prediction probabilities, revealing the values the model assigned to each variable within a finite range. CONCLUSION: The model highlights the potential of XAI algorithms to enhance ploidy prediction and to optimize embryo selection as part of patient-centric consultation, and it provides reliable and transparent insights into the decision-making process.


Subject(s)
Artificial Intelligence , Ploidies , Humans , Female , Adult , Pregnancy , Blastocyst/cytology , Retrospective Studies , Embryo Transfer/methods , Preimplantation Diagnosis/methods , Machine Learning , Fertilization in Vitro/methods , Referral and Consultation , Maternal Age , Support Vector Machine
6.
Int J Mol Sci ; 25(5)2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38473938

ABSTRACT

The role of the IFI6 gene has been described in several cancers, but its involvement in esophageal cancer (ESCA) remains unclear. This study aimed to identify novel prognostic indicators for ESCA-targeted therapy by investigating IFI6's expression, epigenetic mechanisms, and signaling activities. We used public data from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) to analyze IFI6's expression, clinical characteristics, gene function, pathways, and correlation with different immune cells in ESCA. The TIMER2.0 database was used to assess the pan-cancer expression of IFI6, while UALCAN was used to examine its expression across tumor stages and histological subtypes. Additionally, the KEGG database helped identify related pathways. Our findings revealed 95 genes positively correlated and 15 genes negatively correlated with IFI6 in ESCA. IFI6 was overexpressed in ESCA and other cancers, impacting patient survival and showing higher expression in tumor tissues than in normal tissues. IFI6 was also correlated with CD4+ T cells and B cell receptors (BCRs), both essential in the immune response. GO Biological Process (GO BP) enrichment analysis indicated that IFI6 was primarily associated with the type I interferon signaling pathway and the defense response to viruses. Intriguingly, KEGG pathway analysis demonstrated that IFI6 and its positively correlated genes in ESCA were mostly linked to the cytosolic DNA-sensing pathway, which plays a crucial role in innate immunity and viral defense, and the RIG-I-like receptor (RLR) signaling pathway, which detects viral infections and activates immune responses. Pathways related to various viral infections were also identified. Note that our study relied on online databases. Although ESCA consists of two distinct subgroups (ESCC and EAC), most databases combine them into a single category. Future research should therefore evaluate IFI6 expression and its impact on each subgroup to gain more specific insights. In conclusion, given its potential as a biomarker and its correlation with immune cell factors, inhibiting IFI6 with targeted therapy could be an effective strategy for treating ESCA.


Subject(s)
Esophageal Neoplasms , Virus Diseases , Humans , Prognosis , Multiomics , CD4-Positive T-Lymphocytes , Mitochondrial Proteins
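Counting genes that are positively or negatively correlated with IFI6 amounts to computing a correlation coefficient between each gene's expression and IFI6 expression across samples, then applying a cutoff. The sketch below uses Pearson correlation with a hypothetical |r| >= 0.5 threshold and made-up expression values and gene names; the abstract does not state which coefficient or cutoff the authors used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split_by_correlation(target, genes, threshold=0.5):
    """Return (positively, negatively) correlated gene names.

    `threshold` is a hypothetical cutoff on |r|, not a value from the study.
    """
    positive, negative = [], []
    for name, values in genes.items():
        r = pearson(target, values)
        if r >= threshold:
            positive.append(name)
        elif r <= -threshold:
            negative.append(name)
    return positive, negative


# Made-up expression values across five samples (hypothetical gene names).
ifi6 = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with IFI6
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as IFI6 rises
    "GENE_C": [3.0, 1.0, 4.0, 1.0, 3.0],   # no linear relationship
}
print(split_by_correlation(ifi6, genes))  # (['GENE_A'], ['GENE_B'])
```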
7.
Proteomics ; 23(23-24): e2300011, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37381841

ABSTRACT

In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.


Subject(s)
Computational Biology , Proteome , Data Mining , Machine Learning , Natural Language Processing
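The self-attention mechanism mentioned in the abstract above, which lets every position in a sequence attend to every other position, is usually implemented as scaled dot-product attention, softmax(QKᵀ/√d)V. A minimal single-head sketch, assuming tiny hand-written query, key and value vectors rather than learned projections of real protein embeddings:

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries, keys: n x d lists; values: n x d_v lists. Each output row is a
    weighted average of all value rows, with weights softmax(q . k / sqrt(d)).
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Three positions (e.g. residues) with 2-dimensional illustrative vectors.
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for row in self_attention(q, k, v):
    print([round(x, 3) for x in row])
```

Because every output is computed from all positions at once, the rows can be evaluated in parallel, which is the property the abstract contrasts with sequential models.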
8.
Funct Integr Genomics ; 23(3): 256, 2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37523012

ABSTRACT

Non-small cell lung cancer (NSCLC) is the most prevalent histological type of lung cancer, which is the leading cause of cancer death globally. Patients with NSCLC have a poor prognosis for various reasons, one of which is late diagnosis. DNA methylation of CpG island sequences in the promoter regions of tumor suppressor genes has recently received attention as a potential biomarker of human cancer. In this study, we report DNA methylation changes in the adenosine triphosphate (ATP)-binding cassette transporter G1 (ABCG1), a member of the ATP-binding cassette transporter family, in NSCLC patients. Our results demonstrate that ABCG1 is hypermethylated in NSCLC samples and that these changes are negatively correlated with gene and protein expression. Furthermore, ABCG1 expression is significantly associated with the survival time of lung adenocarcinoma (LUAD) patients; however, it did not correlate with the overall survival (OS) of lung squamous cell carcinoma (LUSC) patients. Notably, we found that ABCG1 methylation at locus cg20214535 is strongly associated with survival time and is consistently hypermethylated in LUAD samples. This novel finding suggests that ABCG1 is a potential candidate for targeted therapy in lung cancer via this specific probe. In addition, we illustrate the protein-protein interactions (PPI) of ABCG1 with other proteins and the strong communication of ABCG1 with immune cells.


Subject(s)
Adenocarcinoma of Lung , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Lung Neoplasms/pathology , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/pathology , DNA Methylation , Epigenesis, Genetic , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , ATP Binding Cassette Transporter, Subfamily G, Member 1/genetics , ATP Binding Cassette Transporter, Subfamily G, Member 1/metabolism
9.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32613242

ABSTRACT

Protein S-sulfenylation is a crucial post-translational modification (PTM) in which a hydroxyl group covalently binds to the thiol of a cysteine. Recent studies have shown that this modification plays an important role in signal transduction, transcriptional regulation and apoptosis. To date, the dynamics of sulfenic acids in proteins remain unclear because of their fleeting nature. Identifying S-sulfenylation sites could therefore be the key to deciphering their structures and functions, which are important in cell biology and disease. However, owing to the lack of effective methods, scientists in this field have been limited to a handful of wet-lab techniques that are time-consuming and not cost-effective. This motivated us to develop an in silico model that detects S-sulfenylation sites from protein sequence information alone. In this study, protein sequences were treated as natural language sentences composed of biological subwords, and a deep neural network was then employed to perform classification. On the independent dataset, the sensitivity, specificity, accuracy, Matthews correlation coefficient and area under the curve reached 85.71%, 69.47%, 77.09%, 0.5554 and 0.833, respectively. Our results suggest that the proposed method (fastSulf-DNN) achieved excellent performance in predicting S-sulfenylation sites compared with other well-known tools on a benchmark dataset.


Subject(s)
Databases, Protein , Neural Networks, Computer , Protein Processing, Post-Translational , Sequence Analysis, Protein , Sulfenic Acids , Sulfenic Acids/chemistry , Sulfenic Acids/metabolism
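The sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC) reported in the abstract above all derive from the four confusion-matrix counts of a binary predictor. The following sketch computes them with the standard definitions; the counts in the example are invented for illustration and are not the fastSulf-DNN results.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true positive rate
    specificity = tn / (tn + fp)                # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "MCC": mcc}


# Invented counts: 80 of 100 positives and 70 of 100 negatives correct.
metrics = binary_metrics(tp=80, fn=20, tn=70, fp=30)
print({name: round(value, 3) for name, value in metrics.items()})
# {'Sn': 0.8, 'Sp': 0.7, 'Acc': 0.75, 'MCC': 0.503}
```

Unlike accuracy, MCC stays informative when the positive and negative sets are imbalanced, which is why it is often reported alongside the other three.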
10.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33539511

ABSTRACT

Recently, language representation models have drawn much attention in the natural language processing field because of their remarkable results. Among them, bidirectional encoder representations from transformers (BERT) has proven to be a simple yet powerful language model that achieved new state-of-the-art performance. BERT adopts contextualized word embeddings to capture the semantics and context of words as they appear. In this study, we present a novel technique that incorporates a BERT-based multilingual model into bioinformatics to represent the information in DNA sequences. We treated DNA sequences as natural sentences and then used BERT models to transform them into fixed-length numerical matrices. As a case study, we applied our method to DNA enhancer prediction, a well-known and challenging problem in this field. Our BERT-based features improved sensitivity, specificity, accuracy and Matthews correlation coefficient by more than 5-10% compared with the current state-of-the-art features in bioinformatics. Moreover, further experiments show that deep learning (represented by 2D convolutional neural networks; CNN) has the potential to learn BERT features better than other traditional machine learning techniques. In conclusion, we suggest that BERT and 2D CNNs could open a new avenue in biological modeling using sequence information.


Subject(s)
Computational Biology/methods , DNA/genetics , Deep Learning , Enhancer Elements, Genetic , Models, Biological , Natural Language Processing , Computer Simulation , Data Accuracy , Humans , Multilingualism , Semantics , Sensitivity and Specificity , Transcription, Genetic
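Treating a DNA sequence as a natural-language sentence typically means splitting it into overlapping k-mers that act as words before they are passed to a language model. The sketch below shows only that tokenization step; k = 3 and stride 1 are assumptions for illustration, not necessarily the settings used in the study, and the BERT embedding and 2D CNN stages are not shown.

```python
def kmer_sentence(sequence, k=3, stride=1):
    """Split a nucleotide sequence into overlapping k-mer 'words'.

    k and stride are illustrative defaults, not values taken from the paper.
    Sequences shorter than k produce an empty sentence.
    """
    sequence = sequence.upper()
    words = [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]
    return " ".join(words)


print(kmer_sentence("acgtac"))  # ACG CGT GTA TAC
```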
11.
J Magn Reson Imaging ; 57(3): 740-749, 2023 03.
Article in English | MEDLINE | ID: mdl-35648374

ABSTRACT

BACKGROUND: Timely diagnosis of meniscus injuries is key to preventing knee joint dysfunction and improving patient outcomes because it decreases morbidity and facilitates treatment planning. PURPOSE: To train and evaluate a deep learning model for automated detection of meniscus tears on knee magnetic resonance imaging (MRI). STUDY TYPE: Bicentric retrospective study. SUBJECTS: In total, 584 knee MRI studies, divided among training (n = 234), testing (n = 200), and external validation (n = 150) data sets, were used in this study. The public data set MRNet was used as a second external validation data set to evaluate the performance of the model. SEQUENCE: 3 T; coronal and sagittal images from T1-weighted proton density (PD) fast spin-echo (FSE) with fat saturation and T2-weighted FSE with fat saturation sequences. ASSESSMENT: The meniscus tear detection system was based on an improved YOLOv4 model with Darknet-53 as the backbone. The performance of the model was also compared with that of three radiologists with varying levels of experience. The presence of a meniscus tear as determined from surgery reports was used as the ground truth for the images. STATISTICAL TESTS: Sensitivity, specificity, prevalence, positive predictive value, negative predictive value, accuracy, and the receiver operating characteristic curve were used to evaluate the performance of the detection model. Two-way analysis of variance, the Wilcoxon signed-rank test, and Tukey's multiple comparisons test were used to evaluate differences in performance between the model and the radiologists. RESULTS: The overall accuracies for detecting meniscus tears using our model on the internal testing, internal validation, and external validation data sets were 95.4%, 95.8%, and 78.8%, respectively. One radiologist had significantly lower performance than our model in detecting meniscal tears (accuracy: 0.9025 ± 0.093 vs. 0.9580 ± 0.025). DATA CONCLUSION: The proposed model had high sensitivity, specificity, and accuracy for detecting meniscus tears on knee MRIs. EVIDENCE LEVEL: 3. TECHNICAL EFFICACY: Stage 2.


Subject(s)
Meniscus , Tibial Meniscus Injuries , Humans , Retrospective Studies , Menisci, Tibial , Tibial Meniscus Injuries/diagnostic imaging , Tibial Meniscus Injuries/pathology , Arthroscopy , Knee Joint/pathology , Magnetic Resonance Imaging/methods , Sensitivity and Specificity , Neural Networks, Computer
12.
Methods ; 204: 199-206, 2022 08.
Article in English | MEDLINE | ID: mdl-34915158

ABSTRACT

As one of the most common post-transcriptional epigenetic modifications, N6-methyladenine (6mA) plays an essential role in various cellular processes and disease pathogenesis. Accurately identifying 6mA modifications is therefore necessary for a deep understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their models were developed with small training datasets, so their practical application to genome-wide detection is quite limited. To overcome these limitations, we present a novel model based on transformer architecture and deep learning to identify DNA 6mA sites in cross-species genomes. The model was built on a benchmark dataset and uses features derived from pre-trained transformer word-embedding approaches. A convolutional neural network was then employed to learn from the generated features and produce the predictions. Our predictor achieved excellent performance on the independent test, with an accuracy of 79.3% and a Matthews correlation coefficient (MCC) of 0.58. Overall, it achieved better accuracy than the baseline models and significantly outperformed existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, our model is expected to help biologists accurately identify 6mA sites and formulate novel testable biological hypotheses. We also release the source code and datasets freely at https://github.com/khanhlee/bert-dna for end users.


Subject(s)
Genome , Neural Networks, Computer , DNA/genetics , Epigenesis, Genetic , Software
13.
Methods ; 207: 90-96, 2022 11.
Article in English | MEDLINE | ID: mdl-36174933

ABSTRACT

Adaptor proteins (APs) are a family of proteins that aid intracellular membrane trafficking, and their impairments or defects are closely related to various disorders. Traditional methods for identifying and classifying APs are time-consuming and require complex techniques; machine learning and computational approaches have since been applied to facilitate AP recognition. However, most studies have focused on recognizing individual members of the AP family or on distinguishing APs in general from non-APs, and a comprehensive strategy for distinguishing the complexes of AP subtypes is lacking. Herein, we propose a novel method for a new task, discriminating the AP complexes within the AP family, using an interpretable deep neural network architecture on sequence-based encoding features. This work also introduces a benchmark data set of AP complexes derived from the UniProt and Gene Ontology databases. To assess the robustness of our proposed method, we compared its performance with that of various machine learning algorithms and feature extraction strategies. Furthermore, the model's predictions were interpreted using t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and SHapley Additive exPlanations (SHAP) analysis to show the distribution of AP complexes on the optimal features. The promising performance of our architecture can assist scientists not only in distinguishing AP complexes but also in analyzing protein sequences in general. Our work is publicly available on GitHub at https://github.com/khanhlee/adaptor-dnn.


Subject(s)
Deep Learning , Neural Networks, Computer , Machine Learning , Algorithms , Amino Acid Sequence , Proteins
14.
Sensors (Basel) ; 23(8)2023 Apr 13.
Article in English | MEDLINE | ID: mdl-37112302

ABSTRACT

Possible drug-food constituent interactions (DFIs) could change the intended efficacy of particular therapeutics in medical practice. The increasing number of multiple-drug prescriptions leads to a rise in drug-drug interactions (DDIs) and DFIs. These adverse interactions have further implications, e.g., a decline in a medication's effect, the withdrawal of various medications, and harmful impacts on patients' health. However, the importance of DFIs remains underestimated, as studies on these topics are limited in number. Recently, scientists have applied artificial intelligence-based models to study DFIs. However, limitations remained in data mining, input, and detailed annotations. This study proposed a novel prediction model to address the limitations of previous studies. Specifically, we extracted 70,477 food compounds from the FooDB database and 13,580 drugs from the DrugBank database. We extracted 3780 features from each drug-food compound pair. The optimal model was eXtreme Gradient Boosting (XGBoost). We also validated the performance of our model on an external test set from a previous study, which contained 1922 DFIs. Finally, we applied our model to recommend whether a drug should or should not be taken with certain food compounds based on their interactions. The model can provide highly accurate and clinically relevant recommendations, especially for DFIs that may cause severe adverse events and even death. Our proposed model can contribute to developing more robust predictive models that help patients, under the supervision and consultation of physicians, avoid the adverse effects of DFIs when combining drugs and foods in therapy.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Food-Drug Interactions , Humans , Artificial Intelligence , Machine Learning
15.
J Digit Imaging ; 36(3): 911-922, 2023 06.
Article in English | MEDLINE | ID: mdl-36717518

ABSTRACT

Malignant tumors share some common morphological characteristics. Because radiomics treats images not only as pictures but also as data, we hypothesized that a set of radiomics signatures extracted from CT images of a tumor in one specific organ could also be used to predict overall survival for different types of cancer in other organs. This retrospective study enrolled four data sets of cancer patients with tumors in three different organs (420, 157, 137, and 191 patients for the lung 1 training set, the lung 2 testing set, and two external validation sets, kidney and head and neck, respectively). In the training set, radiomics features were obtained from CT images, and essential features were selected with the LASSO algorithm. Univariable and multivariable analyses were then conducted to find a radiomics signature via Cox proportional hazards regression. Kaplan-Meier curves were plotted based on the risk score, and the integrated time-dependent area under the ROC curve (iAUC) was calculated for each predictive model. In the training set, the Kaplan-Meier curve separated patients into high- and low-risk groups (p-value < 0.001; log-rank test). The risk score of the radiomics signature was locked and independently evaluated in the testing set and the two external validation sets, which showed significant differences (p-value < 0.05; log-rank test). A combined model (radiomics + clinical) showed improved iAUC in the lung 1, lung 2, head and neck, and kidney data sets: 0.621 (95% CI 0.588, 0.654), 0.736 (95% CI 0.654, 0.819), 0.732 (95% CI 0.655, 0.809), and 0.834 (95% CI 0.722, 0.946), respectively. We believe that CT-based radiomics signatures for predicting overall survival across various cancer sites may exist.


Subject(s)
Neoplasms , Humans , Retrospective Studies , Neoplasms/diagnostic imaging , Tomography, X-Ray Computed/methods , Neck , Kidney
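The Kaplan-Meier curves used above to separate high- and low-risk groups are step functions of the product-limit estimator S(t) = ∏ (1 − d_i / n_i) over event times t_i ≤ t, where d_i is the number of deaths at t_i and n_i the number of patients still at risk just before t_i. A minimal sketch with invented follow-up data; the LASSO selection, Cox model, log-rank test and iAUC are not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 if death observed, 0 if censored.
    Returns a list of (event_time, survival_probability) steps.
    Patients censored at an event time are still counted as at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


# Invented follow-up (months) for six patients; 0 marks a censored patient.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
# 5 0.833
# 8 0.667
# 12 0.444
# 20 0.0
```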
16.
J Proteome Res ; 21(1): 265-273, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34812044

ABSTRACT

Histone lysine crotonylation (Kcr) is a post-translational modification of histone proteins that is involved in the regulation of gene transcription, acute and chronic kidney injury, spermatogenesis, depression, cancer, and so forth. Identifying Kcr sites in proteins is important for characterizing and regulating primary biological mechanisms. Because traditional wet-lab experiments are time-consuming and costly, computational approaches such as machine learning and deep learning algorithms have emerged in recent years. In this study, we propose a deep learning model based on a recurrent neural network (RNN), termed Sohoko-Kcr, for predicting Kcr sites. Using embedded encoding of the peptide sequences, we investigate the efficiency of RNN-based models such as long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and bidirectional gated recurrent unit (BiGRU) networks through cross-validation and independent tests. We also compared Sohoko-Kcr with other published tools using 3-fold, 5-fold, and 10-fold cross-validation and independent set tests to verify the efficiency of our model. The results show that the BiGRU model consistently displayed outstanding performance and computational efficiency. Based on the proposed model, a webserver called Sohoko-Kcr was deployed for free use and is accessible at https://sohoko-research-9uu23.ondigitalocean.app.


Subject(s)
Lysine , Protein Processing, Post-Translational , Amino Acid Sequence , Histones/metabolism , Humans , Lysine/metabolism , Male , Neural Networks, Computer
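The 3-, 5- and 10-fold cross-validation used above partitions the data into k folds, trains on k − 1 of them and evaluates on the held-out fold, rotating through all k. A stdlib sketch of the index splitting, assuming a simple shuffled, non-stratified split with a fixed seed; the abstract does not describe the exact splitting scheme.

```python
import random


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Samples are shuffled once with a fixed seed, then dealt into k contiguous
    folds whose sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for k in (3, 5, 10):
    folds = list(kfold_indices(20, k))
    print(k, [len(test) for _, test in folds])
# 3 [7, 7, 6]
# 5 [4, 4, 4, 4, 4]
# 10 [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

For imbalanced site-prediction data, a stratified split that keeps the positive/negative ratio equal across folds is often preferred.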
17.
Funct Integr Genomics ; 22(5): 1057-1072, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35851932

ABSTRACT

As lung cancer remains the leading cause of cancer deaths globally, characterizing tumor molecular profiles is crucial to tailoring treatments for individuals at advanced stages. Cancer cells depend strongly on iron for their proliferation, and several iron-regulatory proteins have been proposed as either oncogenes or tumor suppressor genes. This study aims to evaluate the prospective therapeutic and prognostic value of the sideroflexin (SFXN) gene family, whose functions involve mitochondrial iron metabolism, in lung adenocarcinoma (LUAD). Differential expression analysis using the TIMER and UALCAN tools was first employed to compare SFXN expression levels between normal and LUAD tissues. Next, the SFXNs' prognostic value, biological significance, and potential as immunotherapy candidates were examined using GEPIA, cBioPortal, MetaCore, Cytoscape, and TIMER. All members of the SFXN family except SFXN3 were differentially expressed in LUAD compared with normal samples and across different stages of LUAD. Survival analysis then revealed that SFXN1 is related to worse overall survival in patients with LUAD. Furthermore, several correlations between SFXN1 expression and infiltrating immune cells were discovered. In conclusion, our study provides evidence that SFXN family genes are relevant to the prognosis of LUAD and are potential immunotherapeutic targets.


Subject(s)
Adenocarcinoma of Lung , Lung Neoplasms , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/metabolism , Adenocarcinoma of Lung/pathology , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Computational Biology , Gene Expression Regulation, Neoplastic , Humans , Immunotherapy , Iron/metabolism , Iron-Regulatory Proteins/genetics , Iron-Regulatory Proteins/metabolism , Lung Neoplasms/pathology
18.
NMR Biomed ; 35(11): e4792, 2022 11.
Article in English | MEDLINE | ID: mdl-35767281

ABSTRACT

In 2016, the World Health Organization (WHO) updated its glioma classification, including that of low-grade glioma (LGG), by incorporating molecular parameters. In the new scheme, LGGs have three molecular subtypes: isocitrate dehydrogenase (IDH)-mutated 1p/19q-codeleted, IDH-mutated 1p/19q-noncodeleted, and IDH-wild-type 1p/19q-noncodeleted entities. This work proposes a model that predicts LGG molecular subtypes from magnetic resonance imaging (MRI). MR images were segmented and converted into radiomics features, which provide predictive information for brain tumor classification. From the 726 raw features obtained by feature extraction, we developed a hybrid machine learning-based radiomics model that combines a genetic algorithm with an eXtreme Gradient Boosting (XGBoost) classifier to identify 12 optimal features for tumor classification. To address class imbalance, the synthetic minority oversampling technique (SMOTE) was applied. The XGBoost algorithm outperformed the other algorithms on the training dataset, with an accuracy of 0.885. On an external validation dataset, the XGBoost model achieved an overall accuracy of 0.6905 for the three-subtype classification of LGGs. Our model is among the few that address the three-subtype LGG classification challenge, and its accuracy is high compared with previous studies performing similar work.


Subject(s)
Brain Neoplasms , Glioma , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/pathology , Glioma/pathology , Humans , Isocitrate Dehydrogenase/genetics , Machine Learning , Magnetic Resonance Imaging/methods , Mutation/genetics , Retrospective Studies
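The LGG abstract applies SMOTE to counter class imbalance among the three subtypes. The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbors. The plain-Python sketch below assumes numeric radiomics feature vectors; the function name `smote`, the random choice of base sample, and the default `k=5` are illustrative assumptions, not the authors' settings. In practice a library implementation such as imbalanced-learn's `SMOTE` class would typically be used instead.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Its k nearest neighbours within the minority class (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # Place the new point at a random position on the segment base -> other.
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Oversampling of this kind is normally applied only to training data (or training folds), so that synthetic points never leak into validation sets.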
19.
J Chem Inf Model ; 62(19): 4820-4826, 2022 10 10.
Article in English | MEDLINE | ID: mdl-36166351

ABSTRACT

Background: SNARE proteins play a vital role in membrane fusion, cellular physiology, and pathological processes. Many potential SNARE-based therapeutics for mental illnesses and even cancer have also been developed. Therefore, there is a pressing need for new, efficient approaches to predict SNAREs for further manipulation of these essential proteins. Methods: Several computational frameworks have been proposed to overcome the limitations of biological methods, which require considerable time and budget to identify SNAREs. However, the performance of existing frameworks remained unsatisfactory because they failed to retain SNARE sequence order and to capture the many hidden features of SNAREs. This paper proposes a novel model built on a multiscan convolutional neural network (CNN) and position-specific scoring matrix (PSSM) profiles to address these limitations. We trained and evaluated our model on the benchmark dataset with fivefold cross-validation and on two different independent datasets. Results: In cross-validation on the training set, the multiscan CNN performed strongly in SNARE classification, reaching an AUC of 0.963 and an AUPRC of 0.955. In addition, with a sensitivity, specificity, accuracy, and MCC of 0.842, 0.968, 0.955, and 0.767, respectively, our proposed framework outperformed previous models in the SNARE recognition task. Conclusions: We believe that our model can contribute to discriminating SNARE proteins from general proteins.


Subject(s)
Neural Networks, Computer , SNARE Proteins , Position-Specific Scoring Matrices
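The SNARE abstract reports sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC). All four follow from the counts of a binary confusion matrix, and MCC is often preferred when the positive and negative classes are imbalanced. The sketch below shows the standard definitions; the function and class names are hypothetical, and the example counts are made up for illustration rather than taken from the study. AUC and AUPRC, which need ranked prediction scores, are not covered.

```python
import math
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # TP / (TP + FN), true positive rate
    specificity: float  # TN / (TN + FP), true negative rate
    accuracy: float     # (TP + TN) / all samples
    mcc: float          # Matthews correlation coefficient, in [-1, 1]


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return BinaryMetrics(sensitivity, specificity, accuracy, mcc)


if __name__ == "__main__":
    # Hypothetical counts: 10 positives, 90 negatives.
    print(binary_metrics(tp=8, fp=2, tn=88, fn=2))
```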
20.
Plant Mol Biol ; 107(6): 533-542, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34843033

ABSTRACT

KEY MESSAGE: This study used k-mer embeddings as effective features to identify DNA N6-methyladenine sites in plant genomes and obtained improved performance without substantial effort in feature extraction, combination, and selection. Identification of DNA N6-methyladenine sites has been a very active topic in computational biology because suitable methods to identify them accurately are lacking, especially in plants. Previous studies obtained substantial results, but only with great effort spent on extracting, heuristically searching, or fusing diverse types of features, in addition to a feature selection step. In this study, we regarded DNA sequences as textual information and employed natural language processing techniques to decipher the hidden biological meaning of those sequences. In other words, we treated DNA, the book of life, as a corpus for training DNA language models. K-mer embeddings were then generated from these language models and used in machine learning prediction models. The language models were based on skip-gram neural networks, and the prediction models used ensemble tree-based algorithms. We trained the prediction model on the Rosaceae genome dataset and tested it comprehensively on three plant genome datasets. Our proposed method shows promising performance, with an AUC approaching the ideal value on the Rosaceae dataset (0.99), a high score on the Rice dataset (0.95), and improved performance on a second Rice dataset, while relying on an elegant yet efficient feature extraction process.


Subject(s)
Adenine/analogs & derivatives , Algorithms , Models, Biological , Neural Networks, Computer , Adenine/metabolism , Base Sequence , DNA, Plant/genetics , Databases, Genetic , Nucleotides/genetics , Plants/genetics , ROC Curve , Surveys and Questionnaires
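The N6-methyladenine abstract treats DNA as text and trains skip-gram language models on k-mers. The sketch below illustrates only the preprocessing implied by that description: splitting a sequence into overlapping k-mer "words" and generating the (center, context) pairs a skip-gram network is trained on. The values `k=3` and `window=2` and the function names are assumptions for illustration; the abstract does not give the authors' settings. Training the embeddings themselves (for example with gensim's `Word2Vec` using `sg=1` for skip-gram) and the tree-based classifier are omitted.

```python
from typing import List, Tuple


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers, treated as 'words'."""
    if k < 1:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs within +/- window positions of each token."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    tokens = kmer_tokens("ACGTACG", k=3)
    print(tokens)
    print(skipgram_pairs(tokens, window=1))
```

Treating overlapping k-mers as words keeps the sequence order, so neighbouring k-mers become each other's context, which is what lets the skip-gram model learn embeddings from sequence structure.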