Results 1 - 20 of 426
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581416

ABSTRACT

The inference of gene regulatory networks (GRNs) from gene expression profiles is a key problem in systems biology, and many researchers have developed diverse computational methods for it. However, most of these methods do not reconstruct directed GRNs with regulation types, because of the lack of benchmark datasets or limitations of the methods themselves. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. The GRNs of real species are large, directed and highly sparse graphs, which hampers GRN inference. DeepFGRN therefore builds a node bidirectional representation module to capture a directed graph embedding of the GRN. Specifically, source and target generators learn low-dimensional dense embeddings of the source and target neighbors of a gene, respectively, and an adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlated, a correlation analysis module is designed that not only fully extracts gene expression features but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN is competitive for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified from the candidate FGRNs, which may help advance our knowledge of disease treatments.


Subject(s)
Gene Regulatory Networks , Liver Neoplasms , Humans , Systems Biology/methods , Transcriptome , Algorithms , Computational Biology/methods
2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38935070

ABSTRACT

Inferring gene regulatory networks (GRNs) is one of the important challenges in systems biology. Many outstanding computational methods have been proposed; however, challenges remain, especially for real datasets. In this study, we propose a directed graph convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure of a GRN, we construct a directed graph convolutional neural network that retains the structural information of the directed graph while making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to address the poor prediction accuracy caused by the large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features with a Bi-GRU and by calculating statistical physicochemical characteristics of the gene sequences. During training, a dynamic update strategy converts the predicted edge scores into edge weights that guide subsequent training of the model. Results on synthetic benchmark datasets and real datasets show that DGCGRN significantly outperforms existing models. Furthermore, case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.


Subject(s)
Computational Biology , Gene Regulatory Networks , Neural Networks, Computer , Humans , Computational Biology/methods , Algorithms , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/pathology , Escherichia coli/genetics
3.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36592058

ABSTRACT

Progress in single-cell RNA sequencing (scRNA-seq) has produced large amounts of scRNA-seq data, which are widely used in biomedical research. Noise in the raw data and the tens of thousands of genes measured make it challenging to capture the real structure and effective information of scRNA-seq data. Most existing single-cell analysis methods assume that the low-dimensional embedding of the raw data follows a Gaussian distribution or lies in a low-dimensional nonlinear space without any prior information, which greatly limits the flexibility and controllability of the model. In addition, many existing methods are computationally expensive, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoders (scGMAAE), which assumes that the low-dimensional embeddings of different cell types follow different Gaussian distributions and integrates Bayesian variational inference with adversarial training, so as to give an interpretable latent representation of complex data and discover the statistical distribution of different cell types. scGMAAE offers good controllability, interpretability and scalability, so it can process large-scale datasets in a short time and give competitive results. scGMAAE outperforms existing methods in several respects, including dimensionality-reduction visualization, cell clustering, differential expression analysis and batch effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to reach its best results.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Normal Distribution , Bayes Theorem , Single-Cell Analysis/methods , Cluster Analysis
4.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36631401

ABSTRACT

Advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at cellular resolution. Cell clustering is a prerequisite in scRNA-seq analysis because it reveals cell identities. However, the high dimensionality, noise and significant sparsity of scRNA-seq data make clustering a major challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationships among cells, which seriously affects downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and cluster cells. Specifically, to characterize and learn data representations robustly, scDCCA uses a denoising auto-encoder based on a Zero-Inflated Negative Binomial (ZINB) model to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarity between positive pairs and the difference between negative pairs, contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell separation. Furthermore, scDCCA couples feature learning with clustering, performing representation learning and cell clustering end to end. Experimental results on 14 real datasets show that scDCCA outperforms eight state-of-the-art methods in accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis of scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
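scDCCA's auto-encoder is built on a zero-inflated negative binomial (ZINB) model, which the abstract names but does not spell out. For orientation, the minimal Python sketch below evaluates the textbook ZINB probability mass function for a single count, with mean `mu`, dispersion `theta` and dropout probability `pi`. All function names are hypothetical and this is not code from the scDCCA repository; in an auto-encoder of this kind, a decoder would typically predict the three parameters per gene and cell and training would minimize the negative log-likelihood.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial with mean mu and dispersion theta."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the count is a structural zero (a dropout)."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_nll(counts, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of several counts (parameters shared here for simplicity)."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)


# Example: zero inflation raises the probability of observing a zero count.
print(zinb_pmf(0, mu=3.0, theta=2.0, pi=0.3), zinb_pmf(0, mu=3.0, theta=2.0, pi=0.0))
```

With `pi = 0` the function reduces to an ordinary negative binomial; any `pi > 0` moves extra probability mass to zero, which is how ZINB-based models account for dropout events.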
5.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36611253

ABSTRACT

Although previous studies have revealed that synonymous mutations contribute to various human diseases, distinguishing deleterious synonymous mutations from benign ones is still a challenge in medical genomics. Recently, computational tools have been introduced to predict the harmfulness of synonymous mutations. However, most of these tools rely on balanced training sets and ignore the abundant negative samples, which can degrade performance. In this study, we propose a computational model that uses a selective ensemble to predict deleterious synonymous mutations (seDSM). We construct several candidate base classifiers for the ensemble using balanced training subsets randomly sampled from the imbalanced benchmark training sets. The diversity of the base classifiers is measured with pairwise diversity metrics, and the most diverse classifiers are selected and integrated by soft voting for synonymous mutation prediction. We also design two strategies for filling in missing values in the imbalanced dataset, and we construct models using different pairwise diversity metrics. The experimental results show that a selective ensemble based on the double-fault measure, with the ensemble strategy EKNNI for filling in missing values, is the most effective scheme. Finally, using 40-dimensional biological features, we build seDSM, a selective-ensemble model for predicting deleterious synonymous mutations. seDSM outperformed other state-of-the-art methods on the independent test sets across multiple evaluation indicators, indicating outstanding predictive performance for deleterious synonymous mutations. We hope that seDSM will be useful for studying deleterious synonymous mutations and advancing our understanding of synonymous mutations. The source code of seDSM is freely accessible at https://github.com/xialab-ahu/seDSM.git.


Subject(s)
Genomics , Silent Mutation , Humans , Genomics/methods , Software , Algorithms
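The seDSM abstract says base classifiers are ranked by pairwise diversity, that the double-fault measure worked best, and that the chosen classifiers are combined by soft voting. The Python sketch below illustrates those three pieces on plain lists. The double-fault measure is the fraction of samples that both classifiers misclassify, so lower values mean more diverse classifiers. The greedy selection rule, the function names, the 0.5 threshold and the toy data are assumptions for illustration only; the abstract does not describe exactly how seDSM picks its final subset.

```python
from itertools import combinations


def double_fault(pred_a, pred_b, y_true):
    """Fraction of samples misclassified by BOTH classifiers (lower = more diverse)."""
    both_wrong = sum(1 for a, b, y in zip(pred_a, pred_b, y_true) if a != y and b != y)
    return both_wrong / len(y_true)


def select_diverse(preds, y_true, k):
    """Hypothetical greedy rule: start from the most diverse pair, then keep adding the
    classifier with the lowest mean double-fault against those already chosen (k >= 2)."""
    names = list(preds)
    first = min(combinations(names, 2),
                key=lambda p: double_fault(preds[p[0]], preds[p[1]], y_true))
    chosen = list(first)
    while len(chosen) < k:
        rest = [n for n in names if n not in chosen]
        best = min(rest, key=lambda n: sum(
            double_fault(preds[n], preds[c], y_true) for c in chosen) / len(chosen))
        chosen.append(best)
    return chosen


def soft_vote(probas, chosen, threshold=0.5):
    """Average the positive-class probabilities of the chosen classifiers, then threshold."""
    n = len(probas[chosen[0]])
    avg = [sum(probas[c][i] for c in chosen) / len(chosen) for i in range(n)]
    return [1 if p >= threshold else 0 for p in avg]


# Made-up toy example: true labels, hard predictions and probabilities of three classifiers.
y = [1, 0, 1, 0, 1, 0]
preds = {"A": [1, 0, 1, 0, 0, 0], "B": [1, 0, 1, 0, 0, 1], "C": [0, 0, 1, 0, 1, 0]}
probas = {"A": [0.9, 0.2, 0.8, 0.1, 0.4, 0.3],
          "B": [0.8, 0.3, 0.6, 0.4, 0.3, 0.6],
          "C": [0.4, 0.1, 0.7, 0.2, 0.9, 0.2]}
chosen = select_diverse(preds, y, k=2)  # A and C never fail on the same sample
print(chosen, soft_vote(probas, chosen))
```

In the toy data, A and C each make one mistake but never on the same sample, so their double-fault is zero and soft voting recovers every label.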
6.
J Proteome Res ; 23(7): 2376-2385, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38856018

ABSTRACT

Schizophrenia is a severe psychiatric disorder. Its diagnosis currently relies mainly on clinical symptoms and lacks laboratory evidence, which makes accurate diagnosis very difficult, especially at an early stage. Plasma protein profiles of patients with schizophrenia were obtained with 4D-DIA proteomics and compared with those of healthy controls, and 79 differentially expressed proteins (DEPs) were identified between the two groups. GO functional analysis indicated that the DEPs were predominantly associated with responses to toxic substances and platelet aggregation, suggesting metabolic and immune dysregulation in patients with schizophrenia. KEGG pathway enrichment analysis revealed that the DEPs were primarily enriched in the chemokine signaling pathway and cytokine receptor interactions. A diagnostic model comprising three proteins, PFN1, GAPDH and ACTBL2, was ultimately established. This model achieved an AUC of 0.972, indicating that it can accurately identify schizophrenia. PFN1, GAPDH and ACTBL2 therefore show potential as biomarkers for the early detection of schizophrenia. Our findings provide novel insights into the laboratory-based diagnosis of schizophrenia.


Subject(s)
Biomarkers , Profilins , Proteomics , Schizophrenia , Schizophrenia/metabolism , Schizophrenia/diagnosis , Schizophrenia/blood , Humans , Biomarkers/blood , Biomarkers/metabolism , Proteomics/methods , Profilins/metabolism , Female , Male , Adult , Case-Control Studies , Glyceraldehyde-3-Phosphate Dehydrogenase (Phosphorylating)/metabolism , Middle Aged , Blood Proteins/analysis , Proteome/analysis
7.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35136924

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology has allowed researchers to explore biological phenomena at the cellular scale. Clustering is a crucial step for studying cell heterogeneity. Although many clustering methods have been proposed, massive dropout events and the curse of dimensionality still make scRNA-seq data difficult to analyze, because they reduce the accuracy of clustering methods and lead to misidentification of cell types. In this work, we propose scHFC, a hybrid fuzzy clustering method optimized by natural computation and based on the Fuzzy C-Means (FCM) and Gath-Geva (GG) algorithms. Specifically, after preprocessing, principal component analysis is used to reduce the dimensionality of the scRNA-seq data. Then, an FCM algorithm optimized by simulated annealing and a genetic algorithm clusters the data and outputs a membership matrix; this matrix represents the initial clustering result and serves as the input to the GG algorithm, which produces the final clustering. We also develop a cluster-number estimation method, called multi-index comprehensive estimation, which estimates the number of clusters well by combining four clustering validity indexes. scHFC is evaluated on 17 scRNA-seq datasets and compared with six state-of-the-art methods. Experimental results show that scHFC performs better in clustering accuracy and algorithmic stability. In short, scHFC is an effective method for clustering cells in scRNA-seq data and shows great potential for downstream analysis. The source code is available at https://github.com/WJ319/scHFC.


Subject(s)
Single-Cell Analysis , Software , Algorithms , Cluster Analysis , Gene Expression Profiling/methods , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
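scHFC starts from Fuzzy C-Means (FCM), which produces the membership matrix that is then passed to the Gath-Geva step. As a rough illustration of what FCM computes, here is a plain-Python sketch of the standard algorithm with fuzzifier `m`: memberships are updated from relative distances to the centers, and centers are recomputed as membership-weighted means. It deliberately omits the simulated-annealing and genetic-algorithm optimization, the PCA preprocessing and the Gath-Geva refinement described in the abstract; the function name, defaults and toy data are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=200, tol=1e-9, seed=0):
    """Standard Fuzzy C-Means. Returns (centers, memberships), where memberships[i][j]
    is the degree to which point i belongs to cluster j; every row sums to 1."""
    rng = random.Random(seed)
    dim = len(points[0])
    centers = [list(p) for p in rng.sample(points, c)]  # start from c distinct data points
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        u = []
        for x in points:
            d = [math.dist(x, cj) for cj in centers]
            if min(d) == 0.0:
                # The point coincides with a center: give it a crisp membership.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                u.append([h / sum(hits) for h in hits])
            else:
                u.append([1.0 / sum((d[j] / d[k]) ** power for k in range(c)) for j in range(c)])
        # Center update: mean of all points weighted by u_ij ** m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wi * x[a] for wi, x in zip(w, points)) / total
                                for a in range(dim)])
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Toy 2-D data with two well-separated groups (made up for illustration).
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, u = fuzzy_c_means(pts, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(centers, labels)
```

Unlike hard k-means, each row of `u` spreads a point's membership across clusters; this soft matrix is the kind of output that scHFC hands to its Gath-Geva stage.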
8.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36305457

ABSTRACT

As research into the complex aetiology of many diseases progresses, computational drug repositioning has proven to be a shortcut around costly and inefficient traditional methods. Developing more promising computational methods is therefore indispensable for finding new candidate diseases that existing drugs could treat. In this paper, we propose GLGMPNN, a model for drug-disease association prediction that integrates a new variant of the message passing neural network with a novel gated fusion mechanism. First, a light-gated message passing neural network (LGMPNN), comprising message passing, aggregation and updating, is proposed to separately extract multiple pieces of information from the similarity networks and the association network. Then, a gated fusion mechanism consisting of a forget gate and an output gate integrates these multiple pieces of information. The forget gate, calculated from the multiple embeddings, integrates the association information into the similarity information. The final node representations are then controlled by the output gate, which fuses the topology information of the networks with the initial similarity information. Finally, a bilinear decoder reconstructs the adjacency matrix of drug-disease associations. In 10-fold cross-validation, GLGMPNN achieves excellent performance compared with current models. Further studies show that our model can effectively discover novel drug-disease associations.


Subject(s)
Computational Biology , Neural Networks, Computer , Computational Biology/methods , Drug Repositioning/methods , Algorithms
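The last step of GLGMPNN is a bilinear decoder that rebuilds the drug-disease adjacency matrix from node representations. A common form of such a decoder scores each pair as the sigmoid of h_r^T W h_d. The sketch below assumes that form and uses a fixed weight matrix `W` in place of a learned one; it does not attempt the light-gated message passing or the gated fusion, whose equations the abstract does not give. Function names and embeddings are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bilinear_decode(drug_emb, dis_emb, W):
    """Score every (drug, disease) pair as sigmoid(h_r^T W h_d).

    drug_emb: list of k-dim drug vectors; dis_emb: list of k-dim disease vectors;
    W: k x k matrix (learned in a real model, fixed here). Returns a |drugs| x |diseases| matrix.
    """
    k = len(W)
    scores = []
    for hr in drug_emb:
        hr_w = [sum(hr[a] * W[a][b] for a in range(k)) for b in range(k)]  # row vector h_r^T W
        scores.append([sigmoid(sum(x * y for x, y in zip(hr_w, hd))) for hd in dis_emb])
    return scores


# Made-up 2-dimensional embeddings; with W = identity the score is sigmoid of a dot product.
drugs = [[1.0, 0.0], [0.0, 1.0]]
diseases = [[2.0, 0.0], [0.0, 0.0], [0.0, -1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
A_hat = bilinear_decode(drugs, diseases, W)
print(A_hat)
```

A learned, non-symmetric `W` lets the decoder weight interactions between embedding dimensions, which a plain dot product cannot do.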
9.
PLoS Comput Biol ; 19(8): e1011344, 2023 08.
Article in English | MEDLINE | ID: mdl-37651321

ABSTRACT

Accumulating evidence suggests that circRNAs play crucial roles in human diseases. Predicting circRNA-disease associations is extremely helpful for understanding pathogenesis, diagnosis and prevention, as well as for identifying relevant biomarkers. In the past few years, many deep learning (DL)-based methods have been proposed for predicting circRNA-disease associations and have achieved impressive prediction performance. However, these methods have two main drawbacks. First, they underutilize the biological information in the data. Second, the features they extract do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel similarity, circRNA expression profile similarity and Jaccard similarity simultaneously for the first time, and extract hidden features with accelerated attribute network embedding (AANE) and a dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE significantly outperforms competing methods. In addition, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by the relevant literature. Furthermore, iCircDA-NEAE can effectively predict new potential circRNA-disease associations.


Subject(s)
Algorithms , RNA, Circular , Humans , RNA, Circular/genetics , Semantics
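Two of the similarity measures iCircDA-NEAE combines have well-known definitions: the Gaussian interaction profile (GIP) kernel and Jaccard similarity, both computed from the rows of a binary circRNA-disease association matrix. The sketch below uses the commonly cited GIP form, exp(-γ‖IP_i − IP_j‖²), with γ equal to γ′ divided by the mean squared profile norm and γ′ = 1 by default. The abstract does not give the exact formulas or parameters the authors used, so treat these as assumptions; the function names and the matrix in the example are made up.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of an association matrix.

    K[i][j] = exp(-gamma * ||IP_i - IP_j||^2), with gamma = gamma_prime / mean(||IP_i||^2).
    Assumes at least one non-zero row (otherwise gamma is undefined).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(a, b)) for b in profiles] for a in profiles]


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for binary profiles; 0 when both profiles are empty."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


# Hypothetical association matrix: rows are circRNAs, columns are diseases.
A = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
K = gip_kernel(A)
print(K[0][1], jaccard(A[0], A[1]))
```

Applying the same functions to the transposed matrix gives disease-side similarities.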
10.
Org Biomol Chem ; 22(4): 645-681, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38180073

ABSTRACT

Organochalcogen compounds are prevalent in numerous natural products, pharmaceuticals, agrochemicals, polymers, biological molecules and synthetic intermediates. Direct chalcogenation of C-H bonds has evolved as a step- and atom-economical method for the synthesis of chalcogen-bearing compounds. Nevertheless, direct C-H chalcogenation severely lags behind C-C, C-N and C-O bond formations. Moreover, compared with the C-H monochalcogenation, reports of selective mono-/dichalcogenation and exclusive dichalcogenation of C-H bonds are relatively scarce. The past decade has witnessed significant advancements in selective mono-/dichalcogenation and exclusive dichalcogenation of various C(sp2)-H and C(sp3)-H bonds via transition-metal-catalyzed/mediated, photocatalytic, electrochemical or metal-free approaches. In light of the significance of both mono- and dichalcogen-containing compounds in various fields of chemical science and the critical issue of chemoselectivity in organic synthesis, the present review systematically summarizes the advances in these research fields, with a special focus on elucidating scopes and mechanistic aspects. Moreover, the synthetic limitations, applications of some of these processes, the current challenges and our own perspectives on these highly active research fields are also discussed. Based on the substrate types and C-H bonds being chalcogenated, the present review is organized into four sections: (1) transition-metal-catalyzed/mediated chelation-assisted selective C-H mono-/dichalcogenation or exclusive dichalcogenation of (hetero)arenes; (2) directing group-free selective C-H mono-/dichalcogenation or exclusive dichalcogenation of electron-rich (hetero)arenes; (3) C(sp3)-H dichalcogenation; (4) dichalcogenation of both C(sp2)-H and C(sp3)-H bonds. We believe the present review will serve as an invaluable resource for future innovations and drug discovery.

11.
Chem Biodivers ; : e202400870, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38842484

ABSTRACT

Twenty-four C3'-focused hybrids of aryl/penta-1,4-dien-3-one/amine (APDA) were designed and synthesized. Among them, 2n showed improved antiproliferative effects on HER2-positive breast cancer cells (SKBr3 and BT474) and triple-negative breast cancer (TNBC) cells (MDA-MB-231 and MDA-MB-468), with IC50 values ranging from 7.45 to 10.75 µM, and was less toxic to normal MCF-10A breast cells than the first-generation hybrid 1. Additionally, 2n retained the ability to inhibit the HSP90 C-terminus, leading to degradation of the HSP90 client proteins HER2, EGFR, pAKT, AKT, and CDK4 without inducing a heat-shock response. Notably, 2n also showed improved thermostability compared with 1 and retained in vitro metabolic stability in simulated intestinal fluid. These findings will provide a scientific basis for the future development of HSP90 C-terminal inhibitors.

12.
World J Microbiol Biotechnol ; 40(7): 232, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38834810

ABSTRACT

Microbially induced carbonate precipitation (MICP) has been used, with favorable results, to cure rare earth slags (RES) containing radionuclides (e.g., Th and U) and heavy metals. However, the role of microbial extracellular polymeric substances (EPS) in MICP curing of RES remains unclear. In this study, the EPS of Lysinibacillus sphaericus K-1 was extracted and used in experiments on adsorption, induced calcium carbonate (CaCO3) precipitation and RES curing. The role of EPS in MICP curing of RES and in stabilizing radionuclides and heavy metals was analyzed by evaluating the concentrations and speciation (fraction distribution) of the radionuclides and heavy metals, as well as the compressive strength of the cured body. The results indicate that the adsorption efficiencies of EPS for Th(IV), U(VI), Cu2+, Pb2+, Zn2+, and Cd2+ were 44.83%, 45.83%, 53.7%, 61.3%, 42.1%, and 77.85%, respectively. Adding EPS solution led to the formation of nanoscale spherical particles on the microbial surface, which could act as an accumulation skeleton that facilitates CaCO3 formation. When 20 mL of EPS solution was added during curing (Treat group), the maximum unconfined compressive strength (UCS) of the cured body reached 1.922 MPa, 12.13% higher than that of the CK group. Compared with the CK group, the contents of exchangeable Th(IV) and U(VI) in the cured bodies of the Treat group decreased by 3.35% and 4.93%, respectively. Therefore, EPS enhances MICP curing of RES and reduces the potential environmental problems that radionuclides and heavy metals may cause during long-term sequestration of RES.


Subject(s)
Bacillaceae , Calcium Carbonate , Extracellular Polymeric Substance Matrix , Metals, Heavy , Thorium , Uranium , Uranium/chemistry , Uranium/metabolism , Calcium Carbonate/chemistry , Thorium/chemistry , Extracellular Polymeric Substance Matrix/metabolism , Extracellular Polymeric Substance Matrix/chemistry , Bacillaceae/metabolism , Metals, Rare Earth/chemistry , Adsorption , Chemical Precipitation
13.
Zhongguo Zhong Yao Za Zhi ; 49(7): 1785-1792, 2024 Apr.
Article in Chinese | MEDLINE | ID: mdl-38812190

ABSTRACT

From the perspective of lncRNA MALAT1 regulating cholesterol metabolism in chondrocytes, this paper explores the effect and mechanism of Tougu Xiaotong Capsules (TGXTC) in delaying the degeneration of osteoarthritis (OA). After one week of adaptive feeding, 48 eight-week-old C57BL/6 mice were randomly divided into a blank group (12 mice) and a model group (36 mice) using a random number table. Mice in the model group were anesthetized by inhalation of 5% isoflurane, and OA was induced by the Hulth method. The model mice were then randomly divided into a model group (12 mice), a drug-positive group (tauroursodeoxycholic acid, 12 mice), and a TGXTC group (12 mice). The drug-positive group received tauroursodeoxycholic acid at 500 mg·kg⁻¹ by gavage, and the TGXTC group received TGXTC at 368 mg·kg⁻¹ by gavage. The blank and model groups received the same volume of normal saline for four weeks. After the intervention, the mice in each group were sacrificed under anesthesia, and knee cartilage tissue was isolated and collected. Morphological changes of the knee cartilage were observed. The level of lncRNA MALAT1 in cartilage tissue was detected by real-time PCR, and the protein expression of ABCA1, ApoA1, LXRβ, CHOP, and caspase-3 in mouse articular cartilage was detected by Western blot. Mouse chondrocytes were transfected with sh-MALAT1 using lentivirus-packaged plasmids, and the lncRNA MALAT1 levels in the transfected chondrocytes were detected by real-time PCR. Western blot was used to detect the effect of TGXTC on the protein levels of ABCA1, ApoA1, LXRβ, CHOP, and caspase-3 in thapsigargin (TG)-induced mouse chondrocytes after lncRNA MALAT1 knockdown, and flow cytometry was used to detect the effect of TGXTC on apoptosis of TG-induced mouse chondrocytes after lncRNA MALAT1 knockdown.
HE and safranin O staining showed that, compared with the model group, the cartilage layer in the TGXTC and drug-positive groups was structurally almost intact, joint damage was significantly reduced, and safranin O staining of the cartilage matrix was significantly enhanced. Compared with the model group, the lncRNA MALAT1 level was significantly decreased in the TGXTC and drug-positive groups. Compared with the model group, the protein levels of ABCA1, ApoA1, and LXRβ were significantly increased, while those of CHOP and caspase-3 were significantly decreased, in the TGXTC and drug-positive groups. Compared with the TG group, the lncRNA MALAT1 level in the TG+sh-MALAT1 group was decreased. The lncRNA MALAT1 level in the TG+sh-MALAT1+TGXTC group was increased compared with the TG+TGXTC group. Western blot results showed that, compared with the model group, the protein expression of ABCA1, ApoA1, LXRβ, CHOP, and caspase-3 in the TGXTC group was significantly decreased; after lncRNA MALAT1 knockdown, the regulatory effects of TGXTC on ABCA1, ApoA1, LXRβ, CHOP, and caspase-3 and on apoptosis in TG-induced mouse chondrocytes were weakened. TGXTC can improve the disorder of cholesterol metabolism in OA chondrocytes and delay OA degeneration, which is closely related to its regulation of lncRNA MALAT1.


Subject(s)
Cholesterol , Chondrocytes , Drugs, Chinese Herbal , Mice, Inbred C57BL , Osteoarthritis , RNA, Long Noncoding , Animals , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Chondrocytes/metabolism , Chondrocytes/drug effects , Mice , Osteoarthritis/metabolism , Osteoarthritis/genetics , Osteoarthritis/drug therapy , Cholesterol/metabolism , Drugs, Chinese Herbal/pharmacology , Drugs, Chinese Herbal/administration & dosage , Male , Humans , Capsules
14.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33415333

ABSTRACT

Predicting disease-related long non-coding RNAs (lncRNAs) can aid the discovery of new biomarkers for the prevention, diagnosis and treatment of complex human diseases. In this paper, we propose GAERF, a machine learning-based classification approach that identifies disease-related lncRNAs using a graph auto-encoder (GAE) and a random forest (RF). First, we combine the relationships among lncRNAs, miRNAs and diseases into a heterogeneous network. Then, the GAE learns low-dimensional representation vectors of the nodes from this network, which reduces the dimensionality and heterogeneity of the biological data. Taking these feature vectors as input, we train an RF classifier to predict new lncRNA-disease associations (LDAs). Experimental results show that the proposed lncRNA-disease representation characterizes these entities accurately. Owing to ensemble learning, GAERF achieves superior performance and significantly outperforms other methods. Moreover, case studies further demonstrate that GAERF is an effective method for predicting LDAs.


Subject(s)
Lung Neoplasms/genetics , Machine Learning , Neural Networks, Computer , Prostatic Neoplasms/genetics , RNA, Long Noncoding/genetics , Stomach Neoplasms/genetics , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Computational Biology/methods , Computer Graphics/statistics & numerical data , Decision Trees , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , Male , MicroRNAs/classification , MicroRNAs/genetics , MicroRNAs/metabolism , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , RNA, Long Noncoding/classification , RNA, Long Noncoding/metabolism , ROC Curve , Risk Factors , Stomach Neoplasms/diagnosis , Stomach Neoplasms/metabolism , Stomach Neoplasms/pathology
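GAERF learns low-dimensional node vectors with a graph auto-encoder before handing them to a random forest. To make the encoder-decoder idea concrete, the sketch below runs one forward pass of a typical GAE: a single graph convolution Z = ReLU(Â X W) with the symmetric normalization Â = D^(-1/2)(A + I)D^(-1/2), followed by an inner-product decoder sigmoid(Z Zᵀ). The weights are fixed rather than trained, the toy graph is made up, and the layer count and normalization are assumptions; the abstract does not specify GAERF's architecture, and the random forest step is not shown.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalize_adjacency(A):
    """Symmetric GCN normalisation: D^(-1/2) (A + I) D^(-1/2)."""
    n = len(A)
    a_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gae_forward(A, X, W):
    """One-layer GCN encoder Z = ReLU(A_norm X W) and inner-product decoder sigmoid(Z Z^T)."""
    Z = [[max(0.0, v) for v in row] for row in matmul(normalize_adjacency(A), matmul(X, W))]
    logits = matmul(Z, [list(col) for col in zip(*Z)])
    # Z is non-negative after ReLU, so every logit is >= 0 and exp(-v) cannot overflow.
    recon = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]
    return Z, recon


# Made-up 3-node path graph, one-hot node features and a fixed 3x1 weight matrix.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W = [[1.0], [0.5], [-1.0]]
Z, recon = gae_forward(A, X, W)
print(Z, recon)
```

In training, the reconstruction `recon` would be compared with the observed adjacency matrix, and the rows of `Z` would serve as the node features passed to a downstream classifier.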
15.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33866367

ABSTRACT

Although synonymous mutations do not alter the encoded amino acids, they may affect protein function by interfering with the regulation of RNA splicing or altering transcript splicing. New progress in next-generation sequencing technologies has put the exploration of synonymous mutations at the forefront of precision medicine. Several approaches have been proposed specifically for predicting deleterious synonymous mutations, but their performance is limited by the imbalance between positive and negative samples. In this study, we first greatly expanded the number of samples from various data sources and compared six undersampling strategies for handling the imbalanced datasets; the results suggested that the cluster centroid method is the most effective scheme. Second, we present a computational model, the undersampling scheme-based method for deleterious synonymous mutation (usDSM) prediction, which uses 14-dimensional biological features and a random forest classifier to detect deleterious synonymous mutations. Results on the test datasets indicate that usDSM attains superior performance compared with other state-of-the-art machine learning methods. Lastly, extensive experiments showed that deep learning models did not play a substantial role in deleterious synonymous mutation prediction, although they achieve superior results in other fields. In conclusion, we hope our work will contribute to the future development of computational methods for more accurate prediction of the deleterious effects of human synonymous mutations. The web server of usDSM is freely accessible at http://usdsm.xialab.info/.


Subject(s)
Algorithms , Computational Biology/methods , Machine Learning , Models, Genetic , Proteins/genetics , Silent Mutation , Humans , Proteins/chemistry , Reproducibility of Results
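usDSM found cluster-centroid undersampling the most effective of the six strategies it compared. The general idea is to shrink the majority class by running k-means on it, with k equal to the number of minority samples, and to keep only the resulting centroids. The plain-Python sketch below shows that idea for two classes; the function names, the seeded random initialization and the toy data are assumptions, and it is not the usDSM pipeline (libraries such as imbalanced-learn provide a ready-made `ClusterCentroids` sampler).

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new = [[sum(axis) / len(cl) for axis in zip(*cl)] if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def cluster_centroid_undersample(X, y, majority_label, seed=0):
    """Replace the majority class by as many k-means centroids as there are minority samples."""
    minority = [(list(x), lab) for x, lab in zip(X, y) if lab != majority_label]
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    centroids = kmeans(majority, len(minority), seed=seed)
    X_new = [x for x, _ in minority] + centroids
    y_new = [lab for _, lab in minority] + [majority_label] * len(centroids)
    return X_new, y_new


# Made-up imbalanced data: 8 majority (label 0) and 2 minority (label 1) samples.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (5, 5), (5, 6)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = cluster_centroid_undersample(X, y, majority_label=0)
print(X_bal, y_bal)
```

Because the retained majority samples are centroids rather than random picks, the balanced set still reflects where the majority class is concentrated.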
16.
Bioinformatics ; 38(15): 3703-3709, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35699473

ABSTRACT

MOTIVATION: Many studies have shown that clustering is a crucial step in scRNA-seq analysis. Most existing methods are based on unsupervised learning without exploiting any prior domain knowledge, so they do not use available gold-standard labels. Given the high dimensionality and widespread dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment. RESULTS: In this article, we propose scCNC, a semi-supervised clustering method based on a capsule network that integrates domain knowledge into the clustering step. We also propose a Semi-supervised Greedy Iterative Training method to train the whole network. Experiments on several real scRNA-seq datasets show that scCNC can significantly improve clustering performance and facilitate downstream analyses. AVAILABILITY AND IMPLEMENTATION: The source code of scCNC is freely available at https://github.com/WHY-17/scCNC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis , Software
17.
Ann Surg Oncol ; 30(4): 2227-2241, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36587172

ABSTRACT

OBJECTIVE: This study aimed to construct a new staging system for patients with esophageal squamous cell carcinoma (ESCC) based on the combination of pathological TNM (pTNM) stage, radiomics, and proteomics. METHODS: This study included patients with radiomic data and pTNM stage (Cohort 1, n = 786), 103 of whom also had proteomic data (Cohort 2, n = 103). A Cox regression model with the least absolute shrinkage and selection operator (LASSO) penalty and a Cox proportional hazards model were used to construct a nomogram and predictive models. The concordance index (C-index) and the integrated area under the time-dependent receiver operating characteristic (ROC) curve (IAUC) were used to evaluate the predictive models. The corresponding staging systems were further assessed using Kaplan-Meier survival curves. RESULTS: For Cohort 1, the RadpTNM4c staging system, constructed from combined pTNM stage and radiomic features, outperformed the pTNM4c stage in both training dataset 1 (Train1; IAUC 0.711 vs. 0.706, p < 0.001) and validation dataset 1 (Valid1; IAUC 0.695 vs. 0.659, p < 0.001; C-index 0.703 vs. 0.674, p = 0.029). For Cohort 2, the ProtRadpTNM2c staging system, constructed from combined pTNM stage, radiomic, and proteomic features, outperformed the pTNM2c stage in both the Train2 (IAUC 0.777 vs. 0.610, p < 0.001; C-index 0.898 vs. 0.608, p < 0.001) and Valid2 (IAUC 0.746 vs. 0.608, p < 0.001; C-index 0.889 vs. 0.641, p = 0.009) datasets. CONCLUSIONS: The ProtRadpTNM2c staging system, based on combined pTNM stage, radiomic, and proteomic features, improves on the predictive performance of the classical pTNM staging system.
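The models above are compared using the concordance index (C-index), the fraction of comparable patient pairs in which the patient with the higher predicted risk has the shorter survival. A minimal sketch of Harrell's C-index follows, assuming that a higher risk score means a worse prognosis. As a simplification, pairs with tied survival times are skipped entirely and tied risk scores count as half-concordant; published implementations handle ties in slightly different ways. This is illustrative only and not the study's code.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk_scores: Sequence[float]) -> float:
    # times: observed follow-up times; events: 1 if death/event observed, 0 if censored.
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i is the patient with the shorter observed time.
        if times[j] < times[i]:
            i, j = j, i
        # Comparable only if times differ and the earlier time is an observed event.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Example: risk strictly decreasing with survival time gives perfect concordance.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which is the scale on which the reported values (for example 0.889 vs. 0.641) are read.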


Subject(s)
Esophageal Neoplasms , Esophageal Squamous Cell Carcinoma , Humans , Esophageal Squamous Cell Carcinoma/diagnostic imaging , Esophageal Squamous Cell Carcinoma/therapy , Esophageal Squamous Cell Carcinoma/pathology , Esophageal Neoplasms/diagnostic imaging , Esophageal Neoplasms/therapy , Esophageal Neoplasms/pathology , Proteomics , Neoplasm Staging , Nomograms
18.
Stem Cells ; 40(3): 290-302, 2022 03 31.
Article in English | MEDLINE | ID: mdl-35356984

ABSTRACT

Cellular senescence severely limits research on and application of dental pulp stem cells (DPSCs). A previous study by our research group revealed that ROR2 is closely implicated in DPSC senescence, although the mechanism by which ROR2 is regulated in DPSCs remains poorly understood. In the present study, expression of the ROR2-interacting transcription factor MSX2 was found to be increased in aging DPSCs. Depletion of MSX2 inhibited the senescence of DPSCs and restored their self-renewal capacity, and simultaneous overexpression of ROR2 enhanced this effect. Moreover, MSX2 knockdown suppressed the transcription of NOP2/Sun domain family member 2 (NSUN2), which regulates the expression of p21 by binding to the 3'-untranslated region of p21 mRNA and causing its 5-methylcytidine methylation. Interestingly, ROR2 downregulation elevated MSX2 protein levels, but not MSX2 mRNA expression, by reducing MSX2 phosphorylation and inhibiting RNF34-mediated ubiquitination and degradation of MSX2. These results demonstrate the vital role of the ROR2/MSX2/NSUN2 axis in the regulation of DPSC senescence and reveal a potential target for antagonizing DPSC aging.


Subject(s)
Cellular Senescence , Dental Pulp , Cellular Senescence/genetics , Dental Pulp/metabolism , Down-Regulation/genetics , Gene Expression Regulation , RNA, Messenger/genetics
19.
Methods ; 208: 66-74, 2022 12.
Article in English | MEDLINE | ID: mdl-36377123

ABSTRACT

BACKGROUND: Single-cell sequencing is a technology for high-throughput sequencing analysis of the genome, transcriptome and epigenome at the single-cell level. It overcomes shortcomings of traditional methods, reveals the gene structure and gene expression state of individual cells, and reflects the heterogeneity between cells. Clustering analysis is a very important step in single-cell RNA data analysis, but it faces two difficulties: dropout events and the curse of dimensionality. At present, many methods are purely data-driven and do not make full use of existing biological information. RESULTS: In this work, we propose scSSA, a clustering model based on a semi-supervised autoencoder, fast independent component analysis (FastICA) and Gaussian mixture clustering. First, the semi-supervised autoencoder imputes and denoises the scRNA-seq data and produces a low-dimensional latent representation. Second, this representation is further reduced in dimension by FastICA and then clustered by a Gaussian mixture model. Finally, scSSA is compared with Seurat, CIDR and other methods on 10 public scRNA-seq datasets. CONCLUSION: The results show that scSSA achieves superior cell clustering performance on the 10 public datasets. In conclusion, scSSA can accurately identify cell types and is generally applicable to all kinds of single-cell datasets. scSSA has great application potential in the field of scRNA-seq data analysis. The code is available at https://github.com/houtongshuai123/scSSA/.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , RNA-Seq , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis , RNA
20.
Methods ; 204: 38-46, 2022 08.
Article in English | MEDLINE | ID: mdl-35367367

ABSTRACT

A promoter is a key DNA element, located near the transcription start site, that regulates gene transcription by binding RNA polymerase. The identification of promoters is therefore an important research field in synthetic biology. Nannochloropsis is an important unicellular industrial oleaginous microalga. Some studies have identified promoters with specific functions in Nannochloropsis by biological methods, but few have used computational methods. Here, we propose DNPPro (DenseNet-Predict-Promoter), a method based on densely connected convolutional neural networks for predicting promoters of Nannochloropsis. First, we collected promoter sequences from six Nannochloropsis strains and used CD-HIT to remove sequences with more than 80% similarity within each strain, yielding a reliable positive dataset. Then, to construct a robust classifier, a within-group scrambling method was used to generate the negative dataset, which overcomes the limitation of randomly selecting non-promoter regions from the same genome as negative samples. Finally, we constructed a densely connected convolutional neural network that takes one-hot encoded sequences as input. Compared with commonly used sequence processing methods, DNPPro can better extract features from long sequences. Cross-strain experiments on an independent dataset verify the generalizability of our method. In addition, t-SNE visualization shows that our method can effectively distinguish promoters from non-promoters.
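DNPPro takes one-hot encoded sequences as input and builds its negatives by within-group scrambling. The abstract does not specify the scrambling scheme, so the sketch below assumes a hypothetical version: the sequence is cut into consecutive groups of `group_size` bases and the bases are shuffled inside each group, which preserves local nucleotide composition while destroying ordered motifs. The names `one_hot_encode`, `within_group_scramble` and `group_size` are illustrative and are not taken from the DNPPro code.

```python
import random
from typing import List

NUCLEOTIDES = "ACGT"


def one_hot_encode(seq: str) -> List[List[int]]:
    # Encode a DNA sequence as an L x 4 matrix in A, C, G, T column order.
    # Ambiguous symbols such as N become all-zero rows.
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = []
    for base in seq.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


def within_group_scramble(seq: str, group_size: int = 10, seed: int = 0) -> str:
    # Hypothetical negative-sample generator: shuffle bases only inside
    # consecutive fixed-size groups, so each group keeps its base composition.
    if group_size < 1:
        raise ValueError("group_size must be positive")
    rng = random.Random(seed)
    pieces = []
    for start in range(0, len(seq), group_size):
        group = list(seq[start:start + group_size])
        rng.shuffle(group)
        pieces.append("".join(group))
    return "".join(pieces)
```

One motivation for this kind of scheme is that negatives keep the same local GC content as real promoters, so a classifier cannot separate the classes by composition alone and must learn ordered sequence features.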


Subject(s)
Neural Networks, Computer , Synthetic Biology , Promoter Regions, Genetic