Results 1 - 20 of 178
1.
Mol Cell ; 75(5): 1058-1072.e9, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31375263

ABSTRACT

The endoplasmic reticulum (ER) is susceptible to wear-and-tear and proteotoxic stress, necessitating its turnover. Here, we show that the N-degron pathway mediates ER-phagy. This autophagic degradation initiates when the transmembrane E3 ligase TRIM13 (also known as RFP2) is ubiquitinated via the lysine 63 (K63) linkage. K63-ubiquitinated TRIM13 recruits p62 (also known as sequestosome-1), whose complex undergoes oligomerization. The oligomerization is induced when the ZZ domain of p62 is bound by the N-terminal arginine (Nt-Arg) of arginylated substrates. Upon activation by the Nt-Arg, oligomerized TRIM13-p62 complexes are separated along with the ER compartments and targeted to autophagosomes, leading to lysosomal degradation. When protein aggregates accumulate within the ER lumen, degradation-resistant autophagic cargoes are co-segregated by ER membranes for lysosomal degradation. We developed synthetic ligands to the p62 ZZ domain that enhance ER-phagy for ER protein quality control and alleviate ER stresses. Our results elucidate the biochemical mechanisms and pharmaceutical means that regulate ER homeostasis.


Subject(s)
Carrier Proteins/metabolism , Endoplasmic Reticulum/metabolism , Proteolysis , Sequestosome-1 Protein/metabolism , Animals , Carrier Proteins/genetics , Endoplasmic Reticulum/genetics , HEK293 Cells , HeLa Cells , Humans , Mice , Mice, Knockout , Sequestosome-1 Protein/genetics , Ubiquitination
2.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38305458

ABSTRACT

MOTIVATION: Diabetes is a chronic metabolic disorder that has been a major cause of blindness, kidney failure, heart attacks, stroke, and lower limb amputation across the world. To alleviate the impact of diabetes, researchers have developed the next generation of anti-diabetic drugs, known as dipeptidyl peptidase IV inhibitory peptides (DPP-IV-IPs). However, the discovery of these promising drugs has been restricted due to the lack of effective peptide-mining tools. RESULTS: Here, we present StructuralDPPIV, a deep learning model designed for DPP-IV-IP identification, which takes advantage of both molecular graph features of amino acids and sequence information. Experimental results on the independent test dataset and two wet experiment datasets show that our model outperforms the other state-of-the-art methods. Moreover, to better study what StructuralDPPIV learns, we used CAM technology and perturbation experiments to analyze our model, which yielded interpretable insights into the reasoning behind prediction results. AVAILABILITY AND IMPLEMENTATION: The project code is available at https://github.com/WeiLab-BioChem/Structural-DPP-IV.


Subject(s)
Deep Learning , Diabetes Mellitus , Humans , Dipeptidyl Peptidase 4 , Amino Acids , Peptides
3.
Plant Physiol ; 195(2): 1200-1213, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38428981

ABSTRACT

N6-methyladenosine (m6A), the most prevalent modification in eukaryotic mRNAs, is involved in gene expression regulation and many RNA metabolism processes. Accurate prediction of m6A modification is important for understanding its molecular mechanisms in different biological contexts. However, most existing models have a limited range of application and are species-centric. Here we present PEA-m6A, a unified, modularized and parameterized framework that can streamline m6A-Seq data analysis for predicting m6A-modified regions in plant genomes. The PEA-m6A framework builds ensemble learning-based m6A prediction models with statistic-based and deep learning-driven features, achieving superior performance with an improvement of 6.7% to 23.3% in the area under the precision-recall curve compared with the state-of-the-art regional-scale m6A predictor WeakRM in 12 plant species. In particular, PEA-m6A is capable of leveraging knowledge from pretrained models via transfer learning, representing an innovation in that it can improve prediction accuracy of m6A modifications under small-sample training tasks. PEA-m6A also has a strong capability for generalization, making it suitable for application in within- and cross-species m6A prediction. Overall, this study presents a promising m6A prediction tool, PEA-m6A, with outstanding performance in terms of its accuracy, flexibility, transferability, and generalization ability. PEA-m6A has been packaged using Galaxy and Docker technologies for ease of use and is publicly available at https://github.com/cma2015/PEA-m6A.


Subject(s)
Adenosine , Adenosine/analogs & derivatives , Adenosine/metabolism , RNA, Plant/genetics , Machine Learning , Pisum sativum/genetics , Pisum sativum/metabolism , Plants/genetics , Plants/metabolism
4.
Nucleic Acids Res ; 51(7): 3017-3029, 2023 04 24.
Article in English | MEDLINE | ID: mdl-36796796

ABSTRACT

Here, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. DeepBIO is a one-stop-shop web service that enables researchers to develop new deep-learning architectures to answer any biological question. Specifically, given any biological sequence data, DeepBIO supports a total of 42 state-of-the-art deep-learning algorithms for model training, comparison, optimization and evaluation in a fully automated pipeline. DeepBIO provides a comprehensive result visualization analysis for predictive models covering several aspects, such as model interpretability, feature analysis and functional sequential region discovery. Additionally, DeepBIO supports nine base-level functional annotation tasks using deep-learning architectures, with comprehensive interpretations and graphical visualizations to validate the reliability of annotated sites. Empowered by high-performance computers, DeepBIO allows ultra-fast prediction with up to million-scale sequence data in a few hours, demonstrating its usability in real application scenarios. Case study results show that DeepBIO provides an accurate, robust and interpretable prediction, demonstrating the power of deep learning in biological sequence functional analysis. Overall, we expect DeepBIO to ensure the reproducibility of deep-learning biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and base level from biological sequences alone. DeepBIO is publicly available at https://inner.wei-group.net/DeepBIO.


The development of next-generation sequencing techniques has led to an exponential increase in the amount of biological sequence data accessible. It naturally poses a fundamental challenge: how to build the relationships from such large-scale sequences to their functions. In this work, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. It enables researchers to develop new deep-learning architectures to answer any biological question in a fully automated pipeline. We expect DeepBIO to ensure the reproducibility of deep-learning-based biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and base level from biological sequences alone.


Subject(s)
Deep Learning , Reproducibility of Results , Algorithms , High-Throughput Nucleotide Sequencing
5.
Int J Cancer ; 155(11): 1928-1938, 2024 Dec 01.
Article in English | MEDLINE | ID: mdl-39039820

ABSTRACT

Immunotherapy, especially immune checkpoint blockade therapy, represents a major milestone in the history of cancer therapy. However, the current response rate to immunotherapy among cancer patients must be improved; thus, new strategies for sensitizing patients to immunotherapy are urgently needed. Erythroid progenitor cells (EPCs), a population of immature erythroid cells, exert potent immunosuppressive functions. As a newly recognized immunosuppressive population, EPCs have not yet been effectively targeted. In this review, we summarize the immunoregulatory mechanisms of EPCs, especially for CD45+ EPCs. Moreover, in view of the regulatory effects of EPCs on the tumor microenvironment, we propose the concept of EPC-immunity, present existing strategies for targeting EPCs, and discuss the challenges encountered in both basic research and clinical applications. In particular, the impact of existing cancer treatments on EPCs is discussed, laying the foundation for combination therapies. The aim of this review is to provide new avenues for improving the efficacy of cancer immunotherapy by targeting EPCs.


Subject(s)
Erythroid Precursor Cells , Immunotherapy , Neoplasms , Tumor Microenvironment , Humans , Neoplasms/therapy , Neoplasms/immunology , Neoplasms/pathology , Immunotherapy/methods , Tumor Microenvironment/immunology , Erythroid Precursor Cells/immunology , Animals , Leukocyte Common Antigens/metabolism
6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34882198

ABSTRACT

Metastasis is a major cause of cancer morbidity and mortality, and most cancer deaths are caused by cancer metastasis rather than by the primary tumor. The prediction of metastasis based on computational methods has not been explored much in previous research. In this study, we proposed a graph convolutional network embedded with a graph learning (GL) module, named glmGCN, to predict the distant metastasis of cancer. Both mRNA and lncRNA expression were used to provide more genetic information than mRNA alone, and we used them to construct a gene interaction graph representation that accounts for the effect of genetic interaction. Then, the prediction of cancer metastasis was performed under a GCN framework, which extracted informative and advanced features from the built non-regular graph structures. In particular, a GL module was embedded in the proposed glmGCN to learn an optimal graph representation of the gene interaction. We first constructed the protein-protein interaction network to represent the initial gene (node) relationship graph. Then, through the GL module, a new graph representation was built which optimally learned the gene interaction strength. Finally, the GCN was adopted to identify the distant metastasis cases. It is worth mentioning that the proposed method pays more attention to the gene-gene relation than the previous GCN-based method, so more accurate prediction performance can be obtained. The glmGCN was trained on two types of cancer and was further validated using two other cancer types. A series of experiments has shown the effectiveness of the proposed method. The implementation of the proposed method is available at https://github.com/RanSuLab/Metastasis-glmGCN.


Subject(s)
Neoplasms , RNA, Long Noncoding , Humans , Machine Learning , Neoplasms/genetics , Neural Networks, Computer , RNA, Long Noncoding/genetics
7.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35043144

ABSTRACT

Predicting the response of cancer patients to a particular treatment is a major goal of modern oncology and an important step toward personalized treatment. In clinical practice, clinicians prefer to obtain the most suitable drugs for a particular patient rather than to know the exact values of drug sensitivity. Instead of predicting the exact value of drug response, we proposed a deep learning-based method, named Siamese Response Deep Factorization Machines (SRDFM) Network, for personalized anti-cancer drug recommendation, which directly ranks the drugs and provides the most effective drugs. A Siamese network (SN), a type of deep learning network that is composed of identical subnetworks that share the same architecture, parameters and weights, was used to measure the relative position (RP) between drugs for each cell line. By minimizing the difference between the real RP and the predicted RP, an optimal SN model was established to provide the rank for all the candidate drugs. Specifically, the subnetwork on each side of the SN consists of a feature generation level and a predictor construction level. At the feature generation level, both drug properties and gene expression were used to build a concatenated feature vector, which even enables recommendation for newly designed drugs for which only chemical properties are known. In particular, we developed a response unit to generate a weighted genetic feature vector that simulates the biological interaction mechanism between a specific drug and the genes. The predictor construction level integrates a factorization machine (FM) component with a deep neural network component. The FM handles the discrete chemical information well, and both low-order and high-order feature interactions can be sufficiently learned. The SRDFM works well on both single-drug recommendation and synergistic drug combination.
Experimental results on both single-drug and synergistic drug data sets have shown the efficiency of the SRDFM. The Python implementation of the proposed SRDFM is available at https://github.com/RanSuLab/SRDFM. Contact: ran.su@tju.edu.cn, gbx@mju.edu.cn and weileyi@sdu.edu.cn.


Subject(s)
Antineoplastic Agents , Neoplasms , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Neural Networks, Computer
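
Illustrative note on entry 7: the abstract describes ranking candidate drugs from a predicted relative position (RP) between pairs of drugs. The sketch below shows one simple way pairwise predictions could be turned into a ranking, by counting pairwise wins. It is not the SRDFM implementation: `rank_from_pairwise`, `toy_rp`, and the toy scores are hypothetical names and values introduced only for illustration, and the win-counting aggregation is an assumption.

```python
from itertools import combinations

def rank_from_pairwise(items, relative_position):
    """Order items by how many pairwise comparisons each one wins."""
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        # Assumed convention: a positive value means "a is predicted to be
        # more effective than b" for the cell line under consideration.
        if relative_position(a, b) > 0:
            wins[a] += 1
        else:
            wins[b] += 1
    return sorted(items, key=lambda item: wins[item], reverse=True)

# Hypothetical scores standing in for the output of a trained pairwise model.
toy_scores = {"drug_A": 0.2, "drug_B": 0.9, "drug_C": 0.5}

def toy_rp(a, b):
    """Toy relative position: difference of the hypothetical scores."""
    return toy_scores[a] - toy_scores[b]

ranking = rank_from_pairwise(list(toy_scores), toy_rp)
print(ranking)
```

With a real model, `toy_rp` would be replaced by the learned pairwise predictor; the aggregation step stays the same.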
8.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34882225

ABSTRACT

Recently, machine learning methods have been developed to identify various peptide bio-activities. However, due to the lack of experimentally validated peptides, machine learning methods cannot provide a sufficiently trained model, easily resulting in poor generalizability. Furthermore, there is no generic computational framework to predict the bioactivities of different peptides. Thus, a natural question is whether we can use limited samples to build an effective predictive model for different kinds of peptides. To address this question, we propose Mutual Information Maximization Meta-Learning (MIMML), a novel meta-learning-based predictive model for bioactive peptide discovery. Using few samples from various functional peptides, MIMML can sufficiently learn the discriminative information amongst various functions and characterize functional differences. Experimental results show excellent performance of MIMML though using far fewer training samples as compared to the state-of-the-art methods. We also decipher the latent relationships among different kinds of functions to understand what meta-model learned to improve a specific task. In summary, this study is a pioneering work in the field of functional peptide mining and provides the first-of-its-kind solution for few-sample learning problems in biological sequence analysis, accelerating the new functional peptide discovery. The source codes and datasets are available on https://github.com/TearsWaiting/MIMML.


Subject(s)
Machine Learning , Peptides , Peptides/chemistry , Software
9.
BMC Microbiol ; 24(1): 158, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720268

ABSTRACT

BACKGROUND: The production of succinic acid (SA) from biomass has attracted worldwide interest. Saccharomyces cerevisiae is preferred for SA production due to its strong tolerance to low pH conditions, ease of genetic manipulation, and extensive application in industrial processes. However, when compared with bacterial producers, the SA titers and productivities achieved by engineered S. cerevisiae strains were relatively low. To develop efficient SA-producing strains, it's necessary to clearly understand how S. cerevisiae cells respond to SA. RESULTS: In this study, we cultivated five S. cerevisiae strains with different genetic backgrounds under different concentrations of SA. Among them, KF7 and NBRC1958 demonstrated high tolerance to SA, whereas NBRC2018 displayed the least tolerance. Therefore, these three strains were chosen to study how S. cerevisiae responds to SA. Under a concentration of 20 g/L SA, only a few differentially expressed genes were observed in three strains. At the higher concentration of 60 g/L SA, the response mechanisms of the three strains diverged notably. For KF7, genes involved in the glyoxylate cycle were significantly downregulated, whereas genes involved in gluconeogenesis, the pentose phosphate pathway, protein folding, and meiosis were significantly upregulated. For NBRC1958, genes related to the biosynthesis of vitamin B6, thiamin, and purine were significantly downregulated, whereas genes related to protein folding, toxin efflux, and cell wall remodeling were significantly upregulated. For NBRC2018, there was a significant upregulation of genes connected to the pentose phosphate pathway, gluconeogenesis, fatty acid utilization, and protein folding, except for the small heat shock protein gene HSP26. Overexpression of HSP26 and HSP42 notably enhanced the cell growth of NBRC1958 both in the presence and absence of SA. 
CONCLUSIONS: The inherent activities of small heat shock proteins, the levels of acetyl-CoA and the strains' potential capacity to consume SA all seem to affect the responses and tolerances of S. cerevisiae strains to SA. These factors should be taken into consideration when choosing host strains for SA production. This study provides a theoretical basis and identifies potential host strains for the development of robust and efficient SA-producing strains.


Subject(s)
Gene Expression Regulation, Fungal , Saccharomyces cerevisiae , Succinic Acid , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Succinic Acid/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Fermentation
10.
Inorg Chem ; 63(17): 7937-7945, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38629190

ABSTRACT

Urea-assisted water splitting not only enables a reduction in energy consumption during hydrogen production but also addresses the issue of environmental pollution caused by urea. Doping heterogeneous atoms in Ni-based electrocatalysts is considered an efficient means of regulating the electronic structure of Ni sites in catalytic processes. However, the current methodologies for synthesizing heteroatom-doped Ni-based electrocatalysts exhibit certain limitations, including intricate experimental procedures, prolonged reaction durations, and low product yield. Herein, Fe-doped NiO electrocatalysts were successfully synthesized using a rapid and facile solution combustion method, enabling the synthesis of 1.1107 g within a mere 5 min. The incorporation of iron atoms facilitates the modulation of the electronic environment around Ni atoms, generating a substantial decrease in the Gibbs free energy of intermediate species for the Fe-NiO catalyst. This modification promotes efficient cleavage of C-N bonds and consequently enhances the catalytic performance of the urea oxidation reaction (UOR). Benefiting from the tunability of the electronic environment around the active sites and their efficient electron transfer, Fe-NiO electrocatalysts need only 1.334 V to achieve 50 mA cm⁻² during UOR. Moreover, Fe-NiO catalysts were integrated into a dual-electrode urea electrolytic system, requiring only 1.43 V of cell voltage at 10 mA cm⁻².

11.
Methods ; 212: 31-38, 2023 04.
Article in English | MEDLINE | ID: mdl-36706825

ABSTRACT

The liver is an important metabolic organ in the human body and is sensitive to toxic chemicals or drugs. Adverse reactions caused by drug hepatotoxicity damage the liver, and hepatotoxicity is the leading cause of removal of approved drugs from the market. Therefore, it is of great significance to identify liver toxicity as early as possible in the drug development process. In this study, we developed a predictive model for drug hepatotoxicity based on histopathological whole slide images (WSIs), which are a by-product of drug experiments and have received little attention. To better represent the WSIs, we constructed a graph representation for each WSI by dividing it into small patches, taking sampled patches as nodes and calculating the correlation coefficients between node features as the edges of the graph structure. Then a WSI-level graph convolutional network (GCN) was built to effectively extract the node information of the graph and predict the toxicity. In addition, we introduced a gated attention global context vector (gaGCV) to incorporate the global context so that node features contain more comprehensive information. The results validated on rat liver in vivo data from the Open TG-GATES show that the use of WSIs for the prediction of toxicity is feasible and effective.


Subject(s)
Chemical and Drug Induced Liver Injury , Liver , Animals , Humans , Rats , Chemical and Drug Induced Liver Injury/etiology , Liver/pathology , Microscopy , Image Interpretation, Computer-Assisted
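
Illustrative note on entry 11: the abstract describes building a graph per whole slide image, with sampled patches as nodes and correlation coefficients between node features as edges. The following is a minimal sketch of that idea under stated assumptions: the feature vectors are toy values, the Pearson coefficient is assumed as the correlation measure, and the `threshold` used to keep or drop an edge is a hypothetical parameter that the abstract does not mention.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant vector: correlation undefined, treated as no edge
    return cov / sqrt(var_x * var_y)

def build_patch_graph(patch_features, threshold=0.5):
    """Return an adjacency dict: node index -> {neighbor index: edge weight}."""
    n = len(patch_features)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(patch_features[i], patch_features[j])
            # Assumption: keep only strongly correlated pairs as edges.
            if abs(r) >= threshold:
                graph[i][j] = r
                graph[j][i] = r
    return graph

# Toy feature vectors standing in for features extracted from four patches.
patches = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 1.0, 2.0, 1.0],
]
graph = build_patch_graph(patches, threshold=0.9)
print(graph)
```

The resulting adjacency structure is the kind of input a graph convolutional network would consume; the network itself is outside the scope of this sketch.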
12.
Methods ; 209: 1-9, 2023 01.
Article in English | MEDLINE | ID: mdl-36410694

ABSTRACT

With the rapid development of deep learning techniques and large-scale genomics database, it is of great potential to apply deep learning to the prediction task of anticancer drug sensitivity, which can effectively improve the identification efficiency and accuracy of therapeutic biomarkers. In this study, we propose a parallel deep learning framework DNN-PNN, which integrates rich and heterogeneous information from gene expression and pharmaceutical chemical structure data. With the proposal of DNN-PNN, a new and more effective drug data representation strategy is introduced, that is, the correlation between features is represented by product, which alleviates the limitations of high-dimensional discrete data in deep learning. Furthermore, the framework is optimized to reduce the time complexity of the model. We conducted extensive experiments on the CCLE datasets to compare DNN-PNN with its variant DNN-FM representing the traditional feature correlation model, the component DNN or PNN alone, and the common machine learning models. It is found that DNN-PNN not only has high prediction accuracy, but also has significant advantages in stability and convergence speed.


Subject(s)
Antineoplastic Agents , Neural Networks, Computer , Machine Learning , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use
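
Illustrative note on entry 12: the abstract states that the correlation between features is represented by a product. One common reading of this, assumed here, is the inner-product interaction used in product-based neural networks, where every pair of field embeddings contributes one scalar. The sketch below computes those pairwise inner products for toy embeddings; the function name and values are hypothetical and this is not the DNN-PNN code.

```python
from itertools import combinations

def inner_product_interactions(embeddings):
    """Return the inner product of every unordered pair of field embeddings."""
    return [
        sum(x * y for x, y in zip(u, v))
        for u, v in combinations(embeddings, 2)
    ]

# Toy embeddings for three feature fields (for example, discrete drug descriptors).
fields = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
interactions = inner_product_interactions(fields)
print(interactions)
```

In a full model these interaction values would be concatenated with the embeddings themselves and passed to fully connected layers.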
13.
Nucleic Acids Res ; 50(9): 4877-4899, 2022 05 20.
Article in English | MEDLINE | ID: mdl-35524568

ABSTRACT

With the advent of single-cell RNA sequencing (scRNA-seq), one major challenge is the so-called 'dropout' events that distort gene expression and remarkably influence downstream analysis of the single-cell transcriptome. To address this issue, much effort has been made, and several scRNA-seq imputation methods have been developed in two categories: model-based and deep learning-based. However, a comprehensive and systematic comparison of existing methods is still lacking. In this work, we use six simulated and two real scRNA-seq datasets to comprehensively evaluate and compare a total of 12 available imputation methods from the following four aspects: (i) gene expression recovery, (ii) cell clustering, (iii) gene differential expression, and (iv) cellular trajectory reconstruction. We demonstrate that deep learning-based approaches generally exhibit better overall performance than model-based approaches under major benchmarking comparison, indicating the power of deep learning for imputation. Importantly, we built scIMC (single-cell Imputation Methods Comparison platform), the first online platform that integrates all available state-of-the-art imputation methods for benchmarking comparison and visualization analysis, which is expected to be a convenient and useful tool for interested researchers. It is now freely accessible via https://server.wei-group.net/scIMC/.


Subject(s)
Gene Expression Profiling , Sequence Analysis, RNA , Single-Cell Analysis , Benchmarking , Cluster Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software
14.
Hemoglobin ; 48(1): 60-62, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38314576

ABSTRACT

Patients with the genotype of β0/β0 for β-thalassemia (β-thal) usually behave as β-thal major (β-TM) phenotype which is transfusion-dependent. The pathophysiology of β-thal is the imbalance between α/β-globin chains. The degree of α/β-globin imbalance can be reduced by the more effective synthesis of γ-globin chains, and increased Hb F levels, modifying clinical severity of β-TM. We report a Chinese child who had homozygous β0-thal and a heterozygous KLF1 mutation. The patient had a moderate anemia since 6 months old, keeping a baseline Hb value of 8.0-9.0 g/dL. She had normal development except for a short stature (3rd percentile) until 6 years old, when splenomegaly and facial bone deformities occurred. Although genetic alteration of KLF1 expression in β0/β0 patients can result in some degree of disease alleviation, our case shows that it is insufficient to ameliorate satisfactorily the presentation. This point should be borne in mind for physicians who provide the genetic counseling and prenatal diagnosis to at-risk families.


Subject(s)
beta-Globins , beta-Thalassemia , Child , Female , Humans , Infant , alpha-Globins/genetics , beta-Globins/genetics , beta-Thalassemia/genetics , China , Follow-Up Studies , Genotype , Mutation
15.
Brief Bioinform ; 22(1): 428-437, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31838506

ABSTRACT

Identifying hepatotoxicity as early as possible is significant in drug development. In this study, we developed a drug-induced hepatotoxicity prediction model taking account of both the biological context and the computational efficacy based on toxicogenomics data. Specifically, we proposed a novel gene selection algorithm considering gene's participation, named BioCB, to choose the discriminative genes and make more efficient prediction. Then instead of using the raw gene expression levels to characterize each drug, we developed a two-dimensional biological process feature pattern map to represent each drug. Then we employed two strategies to handle the maps and identify the hepatotoxicity, the direct use of maps, named Two-dim branch, and vectorization of maps, named One-dim branch. The two strategies subsequently used the deep convolutional neural networks and LightGBM as predictors, respectively. Additionally, we here for the first time proposed a stacked vectorized gene matrix, which was more predictive than the raw gene matrix. Results validated on both in vivo and in vitro data from two public data sets, the TG-GATES and DrugMatrix, show that the proposed One-dim branch outperforms the deep framework, the Two-dim branch, and has achieved high accuracy and efficiency. The implementation of the proposed method is available at https://github.com/RanSuLab/Hepatotoxicity.


Subject(s)
Chemical and Drug Induced Liver Injury/etiology , Drug Development/methods , Genomics/methods , Toxicogenetics/methods , Humans , Software
16.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33415328

ABSTRACT

Triple-negative breast cancer (TNBC) has been a challenging breast cancer subtype for oncological therapy. Normally, it can be classified into different molecular subtypes. Accurate and stable classification of the six subtypes is essential for personalized treatment of TNBC. In this study, we proposed a new framework to distinguish the six subtypes of TNBC, and this is one of the handful of studies that have completed the classification based on mRNA and long noncoding RNA expression data. In particular, we developed a gene selection approach named DGGA, which takes correlation information between genes into account in the process of measuring gene importance and then effectively removes redundant genes. A gene scoring approach that combines GeneRank scores with gene importance generated by a deep neural network (DNN), taking inter-subtype discrimination and inner-gene correlations into account, was devised to improve gene selection performance. More importantly, we embedded a gene connectivity matrix in the DNN for sparse learning, which additionally considers weight changes during training when measuring the relative importance of each gene. Finally, a genetic algorithm was used to simulate the natural evolutionary process to search for the optimal gene subset for TNBC subtype classification. We validated the proposed method through cross-validation, and the results demonstrate that it can use fewer genes to obtain more accurate classification results. The implementation of the proposed method is available at https://github.com/RanSuLab/TNBC.


Subject(s)
Neoplasm Proteins/genetics , Neural Networks, Computer , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , Triple Negative Breast Neoplasms/classification , Triple Negative Breast Neoplasms/genetics , Algorithms , Antineoplastic Agents/therapeutic use , Female , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Neoplasm Proteins/metabolism , Precision Medicine , RNA, Long Noncoding/metabolism , RNA, Messenger/metabolism , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/pathology
17.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33320936

ABSTRACT

The spatial distribution of the proteome at subcellular levels provides clues to protein functions and is thus important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at subcellular levels. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction in particular is quite a challenging task. Here we developed a criterion learning strategy to exploit the label-attribute relevancy and label-label relevancy. A criterion that was used to determine the final label set was automatically obtained during the learning procedure. We identified an optimal CNN architecture that gave the best results. In addition, experiments show that, compared with hand-crafted features, the deep features yield more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Proteome/metabolism , Humans
18.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33169141

ABSTRACT

MOTIVATION: N7-methylguanosine (m7G) is an important epigenetic modification, playing an essential role in gene expression regulation. Therefore, accurate identification of m7G modifications will facilitate revealing and in-depth understanding their potential functional mechanisms. Although high-throughput experimental methods are capable of precisely locating m7G sites, they are still cost ineffective. Therefore, it's necessary to develop new methods to identify m7G sites. RESULTS: In this work, by using the iterative feature representation algorithm, we developed a machine learning based method, namely m7G-IFL, to identify m7G sites. To demonstrate its superiority, m7G-IFL was evaluated and compared with existing predictors. The results demonstrate that our predictor outperforms existing predictors in terms of accuracy for identifying m7G sites. By analyzing and comparing the features used in the predictors, we found that the positive and negative samples in our feature space were more separated than in existing feature space. This result demonstrates that our features extracted more discriminative information via the iterative feature learning process, and thus contributed to the predictive performance improvement.


Subject(s)
DNA Methylation , Epigenesis, Genetic , Guanosine/analogs & derivatives , Support Vector Machine , Guanosine/genetics , Guanosine/metabolism , HeLa Cells , Hep G2 Cells , Humans
19.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33152766

ABSTRACT

Origins of replication (ORIs), the sites at which genomic DNA replication is initiated, play essential roles in the DNA replication process. Detecting the genome-wide distribution of ORIs is a key step toward an in-depth understanding of their regulatory mechanisms. In this study, we present a novel machine learning-based approach called Stack-ORI, encompassing 10 cell-specific prediction models for identifying ORIs from four eukaryotic species (Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana). For each cell-specific model, we employed 12 feature encoding schemes that cover nucleic acid composition, position-specific information and physicochemical properties. The optimal feature set was identified for each encoding individually and used to develop the respective baseline model with the eXtreme Gradient Boosting (XGBoost) classifier. Subsequently, the predicted scores of the 12 baseline models were integrated into a novel feature vector to train XGBoost and develop the final model. Extensive experimental results show that Stack-ORI achieves significantly better performance than its baseline models on both training and independent datasets. Stack-ORI also consistently outperforms existing predictors in all cell-specific models, on both the training and independent test datasets. Moreover, our approach provides interpretations that help to explain model success by leveraging the SHapley Additive exPlanation algorithm, thus highlighting the feature encoding schemes that are most important for predicting cell-specific ORIs.
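
The stacking step described above, in which each baseline model's predicted score becomes one entry of a new feature vector, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Stack-ORI code: `kmer_composition` is a generic nucleic acid composition encoding, the two toy scorers stand in for trained XGBoost baseline models, and all names are hypothetical. The final classifier trained on the stacked vector is omitted.

```python
from itertools import product


def kmer_composition(seq, k):
    """Normalized frequencies of all 4**k DNA k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def stack_scores(seq, baseline_models):
    """Build the meta-feature vector: one predicted score per baseline model.

    Each baseline model is an (encoder, scorer) pair. The scorers here are
    placeholders for classifiers trained on one encoding each.
    """
    return [scorer(encoder(seq)) for encoder, scorer in baseline_models]


# Two toy baselines (assumptions for illustration only):
# GC content from 1-mer composition, and CG frequency from 2-mer composition.
toy_baselines = [
    (lambda s: kmer_composition(s, 1), lambda v: v[1] + v[2]),  # C + G
    (lambda s: kmer_composition(s, 2), lambda v: v[6]),         # index of "CG"
]

meta_features = stack_scores("ACGT", toy_baselines)
print(meta_features)
```

With 12 encodings, as in the abstract, the meta-feature vector would have 12 entries per sequence, and a second-level classifier would be trained on those vectors.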


Subject(s)
Databases, Nucleic Acid , Models, Genetic , Replication Origin , Support Vector Machine , Transcription, Genetic , Animals , Drosophila melanogaster , Humans , Mice
20.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34117740

ABSTRACT

The prediction of peptide secondary structures is fundamentally important for revealing the functional mechanisms of peptides with potential applications as therapeutic molecules. In this study, we propose a multi-view deep learning method named Peptide Secondary Structure Prediction based on Multi-View Information, Restriction and Transfer learning (PSSP-MVIRT) for peptide secondary structure prediction. To sufficiently exploit discriminative information, we introduce a multi-view fusion strategy that integrates information from multiple perspectives, including sequential information, evolutionary information and hidden state information, and generates a unified feature space. Moreover, we construct a hybrid network architecture combining a Convolutional Neural Network and a Bi-directional Gated Recurrent Unit to extract global and local features of peptides. Furthermore, we use transfer learning to alleviate the lack of training samples (peptides with experimentally validated structures). Comparative results on independent tests demonstrate that our proposed method significantly outperforms state-of-the-art methods. In particular, our method exhibits better performance at the segment level, suggesting a strong ability of our model to capture local discriminative information. The case study also shows that PSSP-MVIRT achieves promising and robust performance in the prediction of new peptide secondary structures. Importantly, we have established a webserver that implements the proposed method, currently accessible via http://server.malab.cn/PSSP-MVIRT. We expect it to be a useful tool for interested researchers, facilitating the wide use of our method.


Subject(s)
Algorithms , Computational Biology/methods , Deep Learning , Models, Molecular , Peptides/chemistry , Protein Structure, Secondary , Databases, Protein , Reproducibility of Results , Web Browser