Results 1 - 20 of 288

1.
Proteomics ; 24(12-13): e2300371, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38643379

ABSTRACT

Predicting changes in protein stability caused by variants is of great importance, and improving the thermal stability of proteins is important for biomedical and industrial applications. This review discusses the latest methods for predicting the effects of mutations on protein stability, databases of protein mutations and thermodynamic parameters, and experimental techniques for efficiently assessing protein stability in high-throughput settings. Various publicly available databases for protein stability prediction are introduced. Furthermore, state-of-the-art computational approaches for predicting stability changes caused by variants are reviewed, and the feature types, base algorithm, and prediction results of each method are detailed. Additionally, some experimental approaches for verifying the predictions of computational methods are introduced. Finally, the review summarizes the progress and challenges of protein stability prediction and discusses potential models and future research directions.


Subject(s)
Protein Stability , Proteins , Thermodynamics , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Databases, Protein , Algorithms , Mutation , Humans
2.
Proteomics ; : e2300302, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38258387

ABSTRACT

Small proteins (SPs) are a unique group of proteins that play crucial roles in many important biological processes, so exploring their biological functions is necessary. In this study, the InterPro tool and the maximum relevance (MaxRel) method were used to analyze the functional domains of SPs. The purpose was to identify important functional domains that reflect the essential differences between small and large protein sequences. First, small and large proteins were represented by their functional domains using a one-hot scheme. Then, the MaxRel method was adopted to evaluate the relationship between each domain and the target variable, which indicates whether a protein is small or large. The top 36 domain features were selected for further investigation. Among them, 14 were deemed highly related to SPs because they were annotated to SPs more frequently than to large proteins. We found that functional domains such as the ubiquitin-conjugating enzyme/RWD-like domain, the nuclear transport factor 2 domain, and the alpha subunit of guanine nucleotide-binding protein (G-protein) are involved in regulating the biological function of SPs. The involvement of these domains has been confirmed by other recent studies. Our findings indicate that protein functional domains may regulate functions related to small proteins and could help predict their biological activity.
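The one-hot domain encoding and MaxRel ranking in item 2 can be illustrated with a minimal Python sketch. It assumes that MaxRel scores each binary domain feature by its mutual information with the small/large label, as in the max-relevance half of mRMR. The domain identifiers and proteins are hypothetical placeholders rather than InterPro data, and this is not the authors' code.

```python
from collections import Counter
from math import log


def one_hot_domains(proteins: dict[str, set[str]]) -> tuple[list[str], dict[str, list[int]]]:
    """Encode each protein as a 0/1 vector over the union of observed domains."""
    vocab = sorted(set().union(*proteins.values()))
    vectors = {name: [1 if d in doms else 0 for d in vocab] for name, doms in proteins.items()}
    return vocab, vectors


def mutual_information(x: list[int], y: list[int]) -> float:
    """I(X;Y) in nats for two equally long sequences of discrete values."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) * log(p(a,b) / (p(a) p(b))), with probabilities written as counts / n
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def maxrel_ranking(proteins: dict[str, set[str]], labels: dict[str, int]):
    """Rank domains by relevance (mutual information) to the small/large label."""
    vocab, vectors = one_hot_domains(proteins)
    names = list(proteins)
    y = [labels[p] for p in names]
    scores = {}
    for j, dom in enumerate(vocab):
        column = [vectors[p][j] for p in names]
        scores[dom] = mutual_information(column, y)
    return sorted(vocab, key=lambda d: scores[d], reverse=True), scores
```

Consider four hypothetical proteins in which domain D1 occurs only in small proteins, D3 only in large ones, and D2 equally in both. D1 and D3 each score ln 2 and D2 scores 0, so D2 ranks last. Both perfectly exclusive domains are relevant, whichever class they mark.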

3.
Biochem Genet ; 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383836

ABSTRACT

Breast cancer remains the most prevalent cancer in women. To date, its underlying molecular mechanisms have not been fully uncovered. Identifying the genetic factors involved is important for improving our understanding of breast cancer, because they can link specific gene expression to tumor staging. However, knowledge in this regard is still far from complete. Thus, this study aimed to address these knowledge gaps by analyzing existing gene expression profiles of 3,149 breast cancer samples, each represented by the expression of 19,644 genes and classified into a Nottingham histological grade (NHG) class (Grade 1, 2, or 3). To this end, a machine learning-based framework was designed. First, the profile data were analyzed with seven feature ranking algorithms to evaluate the importance of features (genes), producing seven feature lists, each of which ranked the features by importance as assessed from a different perspective. Then, the incremental feature selection method was applied to each list to determine the features essential for classification and to build efficient classifiers. Genes that overlapped across lists, such as AURKA, CBX2, and MYBL2, were identified as important by multiple feature ranking algorithms and were therefore deemed potentially related to breast cancer malignancy and prognosis. In addition, the study formulated classification rules reflecting the specific gene expression patterns of the three NHG classes. Some of these genes and rules were analyzed and are supported by recent literature, providing new references for studying breast cancer.
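Incremental feature selection (IFS), applied to each ranked list in item 3, evaluates nested subsets: the top 1 feature, then the top 2, and so on. It keeps the subset that scores best. The sketch below shows only that loop. The `evaluate` callback is a hypothetical stand-in for training and cross-validating a classifier on the chosen genes, and the `step` parameter is an assumption for coarser scans over long lists.

```python
from typing import Callable, Sequence


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[list[str]], float],
    step: int = 1,
) -> tuple[list[str], float, list[tuple[int, float]]]:
    """Score nested top-k subsets of a ranked feature list and keep the best one.

    `evaluate` (hypothetical, caller-supplied) trains and cross-validates a
    classifier on the given features and returns a score such as MCC.
    """
    n = len(ranked_features)
    if n == 0:
        raise ValueError("ranked_features must not be empty")
    sizes = list(range(step, n + 1, step))
    if not sizes or sizes[-1] != n:
        sizes.append(n)  # always evaluate the full list as well
    best_subset: list[str] = []
    best_score = float("-inf")
    curve: list[tuple[int, float]] = []  # (subset size, score) pairs, i.e. the IFS curve
    for k in sizes:
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        curve.append((k, score))
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score, curve
```

Breaking ties in favor of the smaller subset reflects the usual goal of IFS: the most compact gene set that reaches peak performance.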

4.
Proteomics ; 22(15-16): e2100190, 2022 08.
Article in English | MEDLINE | ID: mdl-35567424

ABSTRACT

Protein-protein interactions (PPIs) form the basis of a myriad of biological pathways and mechanisms, such as the formation of protein complexes and the components of signaling cascades. Here, we reviewed experimental methods for identifying PPI pairs, including yeast two-hybrid (Y2H), mass spectrometry (MS), co-localization, and co-immunoprecipitation. Furthermore, a range of computational methods leveraging biochemical properties, evolutionary history, protein structures, and more have enabled the identification of additional PPIs. Given the wealth of known PPIs, we reviewed important network methods for constructing and analyzing PPI networks. These methods aid biological discovery by identifying hub genes and dynamic changes in the network, and they have been widely applied in various fields of biological research. Lastly, we discussed the challenges and future directions of research that harnesses the power of PPI networks.


Subject(s)
Protein Interaction Mapping , Protein Interaction Maps , Protein Interaction Mapping/methods , Proteins/metabolism , Saccharomyces cerevisiae/metabolism
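Item 4 mentions identifying hub genes in PPI networks. One of the simplest hub criteria is degree centrality, meaning the number of distinct interaction partners. The sketch assumes an unweighted, undirected network built from a list of interacting pairs. The protein names are hypothetical, and the review covers many other network methods that are not shown here.

```python
from collections import defaultdict


def build_ppi_network(pairs):
    """Undirected PPI network as an adjacency map; self-pairs and duplicates are ignored."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def hub_genes(adj, top_n=3):
    """Rank proteins by degree (number of distinct partners), breaking ties by name."""
    return sorted(adj, key=lambda p: (-len(adj[p]), p))[:top_n]
```

Because neighbors are stored in sets, a pair reported twice (for example, as A-B and B-A) contributes only once to each degree.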
5.
Mol Genet Genomics ; 297(5): 1301-1313, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35780439

ABSTRACT

The lung is the most important organ of the human respiratory system, and its normal function is essential to human life. Under certain pathological conditions, normal lung function can no longer be maintained, and lung transplantation is generally performed to ease patients' breathing and prolong their lives. However, several risks exist during and after lung transplantation, including bleeding, infection, and transplant rejection. In particular, transplant rejection is difficult to predict or prevent and leads to the most dangerous complications and severe conditions in lung transplant recipients. Because the most common methods for monitoring and validating lung transplant rejection can take a long time and have low reproducibility, new technologies and methods are required to improve the efficacy and accuracy of rejection monitoring after lung transplantation. A recent study generated gene expression profiles of patients who underwent lung transplantation but did not provide a tool for predicting transplantation responses. Here, these profiling data were investigated in greater depth. A computational framework incorporating several machine learning algorithms, including feature selection methods and classification algorithms, was built to establish an effective prediction model that classifies patients into clinical subgroups corresponding to different rejection responses after lung transplantation. Furthermore, the framework screened essential genes with functional enrichments and created quantitative rules for distinguishing patients with different rejection responses to lung transplantation. The outcome of this work could provide guidelines for the clinical treatment of each rejection subtype and help reveal the complex rejection mechanisms of lung transplantation.


Subject(s)
Lung Transplantation , Graft Rejection , Humans , Lung , Reproducibility of Results , Transcriptome
6.
Mol Genet Genomics ; 296(4): 905-918, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33914130

ABSTRACT

Phenotype is one of the most significant concepts in genetics; it describes all the observable characteristics of a research object. Because a phenotype reflects the combined effects of genotype and environmental factors, phenotype characteristics are hard to define, and unknown phenotypes are even harder to predict. Owing to the limitations of current biological techniques, obtaining sufficient structural information on large-scale phenotype-associated genes/proteins is still expensive and time-consuming. Various bioinformatics methods have been proposed to solve this problem, and researchers have confirmed the efficacy and prediction accuracy of functional network-based prediction. However, general functional descriptions have highly complex internal structures, which complicates their use for phenotype prediction. To further address this issue and improve phenotype prediction across more than ten kinds of phenotypes, we first extracted functional enrichment features from GO and KEGG and then used node2vec to learn functional embedding features of genes from a gene-gene network. All these features were analyzed by feature selection methods (Boruta and minimum redundancy maximum relevance) to generate a feature list. This list was fed into incremental feature selection, incorporating multi-label classifiers built with RAkEL and several classic base classifiers, to build an optimal multi-label, multi-class classification model for phenotype prediction. Consistent with recent studies, our method identified many literature-supported genes/proteins and their associated phenotypes, as well as some candidate genes assigned new phenotypes, providing a new computational tool for accurate and effective phenotype prediction.


Subject(s)
Algorithms , Computational Biology/methods , Genetic Association Studies/methods , Datasets as Topic , Gene Regulatory Networks/physiology , Metabolic Networks and Pathways/genetics , Phenotype , Proteins/chemistry , Proteins/genetics , Proteins/physiology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/physiology , Structure-Activity Relationship
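node2vec, used in item 6 (and in item 9) to embed genes from a network, first generates biased second-order random walks. It then trains a skip-gram model on those walks. The sketch below covers only the walk step on an unweighted, undirected graph. The skip-gram training is omitted, and the return parameter `p`, in-out parameter `q`, walk length, and example graph are illustrative assumptions rather than the study's settings.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One node2vec biased random walk on an unweighted, undirected graph.

    Coming from node t to node v, the unnormalized weight for stepping to a
    neighbor x of v is 1/p if x == t (return), 1 if x is also a neighbor of t
    (stay nearby), and 1/q otherwise (move outward).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        prev = walk[-2]
        weights = [
            1.0 / p if x == prev else 1.0 if x in adj[prev] else 1.0 / q
            for x in neighbors
        ]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk
```

A small `q` (below 1) favors outward, depth-first-like exploration, whereas a large `q` keeps walks local. Many walks per node would then be fed to a word2vec-style model to obtain the embedding features.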
7.
Genomics ; 112(6): 4945-4958, 2020 11.
Article in English | MEDLINE | ID: mdl-32919019

ABSTRACT

Coronary artery disease (CAD) is the most common cardiovascular disease, and CAD research has progressed greatly during the past decade. mRNA has traditionally been a popular subject for investigating various diseases, including CAD. Compared with mRNA, lncRNA is more stable and thus may serve as a better disease indicator in blood. Investigating potential CAD-related lncRNAs and mRNAs will greatly contribute to the diagnosis and treatment of CAD. In this study, a computational analysis was conducted on patients with CAD using a comprehensive transcriptomic dataset that combines mRNA and lncRNA expression data. Several machine learning algorithms, including feature selection methods and classification algorithms, were applied to screen for the RNA molecules most related to CAD. Decision rules were also reported to provide a quantitative description of the effects of these RNA molecules on CAD progression. These new findings (CAD-related RNA molecules and rules) can help clarify mRNA and lncRNA expression levels in CAD.


Subject(s)
Coronary Artery Disease/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/metabolism , Coronary Artery Disease/metabolism , Gene Expression Profiling , Humans , Machine Learning
8.
Genomics ; 112(3): 2524-2534, 2020 05.
Article in English | MEDLINE | ID: mdl-32045671

ABSTRACT

The development of embryonic cells involves several successive stages, and some genes are related to embryogenesis. To date, few studies have systematically investigated changes in gene expression profiles during mammalian embryogenesis. In this study, a computational analysis using machine learning algorithms was performed on the gene expression profiles of mouse embryonic cells at seven stages. First, the profiles were analyzed with a powerful Monte Carlo feature selection method to generate a feature list. Second, incremental feature selection was applied to the list using two classification algorithms: support vector machine (SVM) and repeated incremental pruning to produce error reduction (RIPPER). Through SVM, we extracted several latent gene biomarkers that indicate the stages of embryonic cells and constructed an optimal SVM classifier that classified embryonic cells nearly perfectly. Furthermore, several interesting rules were obtained with the RIPPER algorithm, suggesting different expression patterns at different stages.


Subject(s)
Embryo, Mammalian/metabolism , Embryonic Development/genetics , Machine Learning , Transcriptome , Animals , Gene Expression Profiling , Mice , Single-Cell Analysis , Support Vector Machine
9.
Gene Ther ; 26(12): 465-478, 2019 12.
Article in English | MEDLINE | ID: mdl-31455874

ABSTRACT

Oral cancer (OC) is one of the most common cancers threatening human lives. However, OC pathogenesis has yet to be fully uncovered, so designing effective treatments remains difficult. Identifying OC-related genes is an important way to achieve this goal. In this study, we proposed three computational models for inferring novel OC-related genes. In contrast to previously proposed computational methods, which lacked a learning procedure, each proposed model adopted a one-class learning algorithm, which can provide deep insight into the features of validated OC-related genes. A network embedding algorithm (node2vec) was applied to the protein-protein interaction network to produce gene representations. The features of the OC-related genes were used to train the one-class algorithm, and the performance of the final inference model was improved through a feature selection procedure. Candidate genes were then produced by applying the trained inference model to other genes. Three tests were performed to screen out the important candidate genes. Accordingly, we obtained three mutually distinct inferred gene sets. The inferred genes also differed from previously reported genes, and some of them have been included in the public Oral Cancer Gene Database. Finally, we analyzed several inferred genes to assess whether they are novel OC-related genes.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Mouth Neoplasms/genetics , Databases, Genetic , Genetic Predisposition to Disease , Humans , Machine Learning , Protein Interaction Maps
10.
Gene Ther ; 26(1-2): 29-39, 2019 02.
Article in English | MEDLINE | ID: mdl-30443044

ABSTRACT

Many complex diseases or traits result from both genetic and environmental factors. Environmental factors affect the human body by modifying its epigenetics, which controls the activity of the genome without mutating it. Viral infection is one of the common environmental factors in complex diseases. For example, human immunodeficiency virus (HIV) infection can cause acquired immune deficiency syndrome (AIDS); HBV and HCV infections are associated with hepatocellular carcinoma; and human papillomavirus infection is a causal factor in cervical carcinoma. In this study, to investigate how HIV infection affects DNA methylation, we analyzed blood DNA methylation data at 485,512 sites from 44 HIV-negative and 142 HIV-positive individuals. Several advanced computational methods were applied to identify the core distinctive features that differed between HIV patients and healthy controls; these methods can also be used to differentiate HIV-infected patients from uninfected individuals. These core distinctive DNA methylation features were confirmed to be functionally connected to premature aging and abnormal immune regulation, two typical pathological symptoms of HIV infection. This reveals potential regulatory mechanisms by which HIV infection affects the DNA methylation status of host cells and provides novel insights into the pathogenesis of HIV infection and AIDS.


Subject(s)
DNA Methylation , Epigenesis, Genetic , HIV Infections/genetics , Algorithms , Genome, Human , Humans , Models, Genetic
11.
J Cell Biochem ; 120(5): 7068-7081, 2019 May.
Article in English | MEDLINE | ID: mdl-30368905

ABSTRACT

The mechanisms by which tissues are formed and maintained are fundamental aspects of biology but remain unknown. Tissue-specific gene expression is a valuable tool for studying such mechanisms. However, many biomedical studies use cell lines, rather than human body tissues, to investigate biological mechanisms. Whether cell lines maintain their tissue-specific characteristics after they are isolated and cultured outside the human body remains to be explored. In this study, we applied a novel computational method to identify core genes that differentiate cell lines derived from various tissues. Several advanced computational techniques, such as the Monte Carlo feature selection method, the incremental feature selection method, and the support vector machine (SVM) algorithm, were incorporated into the proposed method, which extensively analyzed the gene expression profiles of cell lines from different tissues. As a result, we extracted a group of functional genes that indicate the differences among cell lines from different tissues and built an optimal SVM classifier for identifying cell lines from different tissues. In addition, a set of rules for classifying cell lines was reported; although its performance was not better than that of the optimal SVM classifier, it gives a clearer picture of cell lines from different tissues. Finally, we compared these genes with the tissue-specific genes identified by the Genotype-Tissue Expression project. The results showed that most tissue-specific expression patterns were retained in the derived cell lines, although some genes showed unique tissue specificity.

12.
J Cell Biochem ; 120(1): 405-416, 2019 01.
Article in English | MEDLINE | ID: mdl-30125975

ABSTRACT

Synthetic lethality occurs when a combination of mutations leads to cell death. Tumor-specific synthetic lethality has been targeted in research to improve cancer therapy. With advances in molecular biology techniques, such as RNAi and CRISPR/Cas9 gene editing, efforts have been made to systematically identify synthetic lethal interactions, especially for genes frequently mutated in cancers. However, elucidating the mechanism of synthetic lethality remains a challenge because of the complexity of the conditions that influence it. In this study, we proposed a new computational method to identify critical functional features that can accurately predict synthetic lethal interactions. This method incorporates several machine learning algorithms and encodes protein-coding genes with an enrichment system derived from Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways to represent their functional features. We built a random forest-based prediction engine using 2,120 selected features and obtained a Matthews correlation coefficient of 0.532. We examined the top 15 features and found that most of them have potential roles in synthetic lethality according to previous studies. These results demonstrate the ability of our proposed method to predict synthetic lethal interactions and provide a basis for further characterization of these particular genetic combinations.


Subject(s)
Computational Biology/methods , Genes, Neoplasm/genetics , Machine Learning , Neoplasms/genetics , Synthetic Lethal Mutations/genetics , A549 Cells , Data Accuracy , Gene Editing , Gene Ontology , HeLa Cells , Humans , RNA Interference , Sensitivity and Specificity
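The Matthews correlation coefficient (MCC) reported in item 12 (0.532) summarizes a binary confusion matrix in one number between -1 and 1. The sketch below implements the standard formula, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Returning 0.0 when a marginal total is zero is a common convention and an assumption here. The counts in the test are hypothetical and unrelated to the study.

```python
from math import sqrt


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary MCC from confusion-matrix counts; returns 0.0 when it is undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC stays informative when the two classes are unbalanced, which is typical for interaction data such as synthetic lethal pairs.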
13.
Mol Genet Genomics ; 294(1): 95-110, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30203254

ABSTRACT

Breast cancer is a common and threatening malignant disease with multiple biological and clinical subtypes. It can be categorized into luminal A, luminal B, HER2-positive, and basal-like subtypes. Copy number variants (CNVs) have been reported to be potential, and even better, biomarkers for cancer diagnosis than mRNA biomarkers because they are considerably more stable and robust than gene expression. Thus, detecting CNVs in different cancers is meaningful. To identify CNV biomarkers for breast cancer subtypes, we integrated the CNV data of more than 2000 samples from two large breast cancer databases, METABRIC and The Cancer Genome Atlas (TCGA). A computational method based on Monte Carlo feature selection and incremental feature selection was proposed and tested to identify the distinctive core CNVs of different breast cancer subtypes. We identified CNV genes that may contribute to breast cancer tumorigenesis and built a set of quantitative distinctive rules for recognizing breast cancer subtypes. The Matthews correlation coefficient (MCC) was 0.515 for tenfold cross-validation on the METABRIC training set and 0.492 for the independent test on the TCGA dataset. The CNVs of PGAP3, GRB7, MIR4728, PNMT, STARD3, TCAP, and ERBB2 were important for the accurate diagnosis of breast cancer subtypes. The findings reported in this study may further uncover the differences between breast cancer subtypes and improve diagnostic accuracy.


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/diagnosis , DNA Copy Number Variations , Breast Neoplasms/genetics , Databases, Genetic , Female , Gene Expression Regulation, Neoplastic , Humans , Monte Carlo Method , Sensitivity and Specificity
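Item 13 reports performance from tenfold cross-validation on the training set. In that setup, each sample is held out exactly once while a model is trained on the other nine folds. The abstract does not say whether the folds were stratified. The sketch below assumes stratification, which keeps the subtype proportions similar across folds, and uses a hypothetical fixed `seed` for reproducibility. The labels in the test are illustrative only.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)  # deal samples round-robin across folds
        offset += len(idx)  # continue dealing where the previous class stopped
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = stratified_kfold(labels, k, seed)
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield sorted(train), sorted(test)
```

An independent test, such as the TCGA evaluation in the abstract, is separate from this procedure: the final model is trained on all training samples and scored once on the external set.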
14.
Int J Mol Sci ; 20(17)2019 Aug 31.
Article in English | MEDLINE | ID: mdl-31480430

ABSTRACT

Breast cancer is regarded worldwide as a severe human disease. Various genetic variations, including hereditary and somatic mutations, contribute to its initiation and progression. The diagnostic parameters of breast cancer are not limited to conventional protein content and can include newly discovered genetic variants and even genetic modification patterns such as methylation and microRNA. In addition, breast cancer detection extends to detailed stratification of breast cancer to provide subtype-specific indications for personalized treatment. A genome-wide expression-methylation quantitative trait loci analysis confirmed that different breast cancer subtypes have different methylation patterns. However, identifying clinically applicable methylation biomarkers is difficult because of the large number of differentially methylated genes. In this study, we attempted to re-screen a small group of functional biomarkers for identifying and distinguishing breast cancer subtypes using advanced machine learning methods. The findings may contribute to biomarker identification for different breast cancer subtypes and provide a new perspective on the differential pathogenesis of breast cancer subtypes.


Subject(s)
Breast Neoplasms/genetics , DNA Methylation , Breast Neoplasms/classification , Breast Neoplasms/pathology , Epigenesis, Genetic , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Machine Learning , Quantitative Trait Loci
15.
Int J Mol Sci ; 20(9)2019 May 02.
Article in English | MEDLINE | ID: mdl-31052553

ABSTRACT

Small nucleolar RNAs (snoRNAs) are a new type of functional small RNA involved in the chemical modification of rRNAs, tRNAs, and small nuclear RNAs. They are reported to play important roles in tumorigenesis through various regulatory modes: snoRNAs can participate in regulating methylation and pseudouridylation and can also regulate the expression of their host genes. This research investigated the expression patterns of snoRNAs in eight major cancer types in TCGA using several machine learning algorithms. The expression levels of snoRNAs were first analyzed by a powerful feature selection method, Monte Carlo feature selection (MCFS), which yielded a feature list and a set of informative features. Then, incremental feature selection (IFS) was applied to the feature list to extract the optimal features/snoRNAs, with which the support vector machine (SVM) yields the best performance. The discriminative snoRNAs included HBII-52-14, HBII-336, SNORD123, HBII-85-29, HBII-420, U3, HBI-43, SNORD116, SNORA73B, SCARNA4, HBII-85-20, etc., with which the SVM achieved a Matthews correlation coefficient (MCC) of 0.881 for predicting the eight cancer types. In parallel, the informative features were fed into the Johnson reducer and repeated incremental pruning to produce error reduction (RIPPER) algorithms to generate classification rules, which clearly show different snoRNA expression patterns across cancer types. The results indicate that the extracted discriminative snoRNAs can be important for identifying cancer samples of different types and that the expression patterns of snoRNAs in different cancer types can be partly uncovered by quantitative recognition rules.


Subject(s)
Gene Expression Regulation, Neoplastic , Machine Learning , Neoplasms/genetics , RNA, Small Nucleolar/genetics , Algorithms , Humans , Monte Carlo Method , Support Vector Machine
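For the eight-class snoRNA task in item 15, MCC needs a multiclass form, and the abstract does not state which one was used. A common choice is Gorodkin's R_K statistic, which is also what scikit-learn's `matthews_corrcoef` computes for multiclass labels. The sketch below assumes that form and takes a square confusion matrix as nested lists. It reduces to the binary MCC when there are two classes.

```python
from math import sqrt


def multiclass_mcc(confusion):
    """Multiclass MCC (Gorodkin's R_K) from a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    k = len(confusion)
    s = sum(sum(row) for row in confusion)                          # total samples
    c = sum(confusion[i][i] for i in range(k))                      # correct predictions
    t = [sum(confusion[i]) for i in range(k)]                       # true count per class
    p = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denom = sqrt((s * s - sum(x * x for x in p)) * (s * s - sum(x * x for x in t)))
    return numerator / denom if denom else 0.0
```

For example, a 2 x 2 matrix with 40 correct predictions and 10 errors in each class gives 0.6, the same value as the binary formula.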
16.
J Cell Biochem ; 119(4): 3394-3403, 2018 04.
Article in English | MEDLINE | ID: mdl-29130544

ABSTRACT

Adult neural stem cells (NSCs) are a group of multipotent, self-renewing progenitor cells that contribute to the generation of new neurons and oligodendrocytes. Three subtypes of NSCs can be isolated according to their stage in the NSC lineage: quiescent neural stem cells (qNSCs), activated neural stem cells (aNSCs), and neural progenitor cells (NPCs). Although it is widely accepted that these three groups play different roles in the development of the nervous system, their molecular signatures are poorly understood. In this study, we applied the Monte Carlo feature selection (MCFS) method to identify gene expression signatures, which yielded a Matthews correlation coefficient (MCC) of 0.918 with a support vector machine evaluated by ten-fold cross-validation. In addition, we report some classification rules, generated by the MCFS program, for distinguishing the three subtypes. Our results not only demonstrate a high classification capacity and subtype-specific gene expression patterns but also quantitatively reflect the pattern of gene expression levels across the NSC lineage, providing insight into the molecular basis of NSC differentiation.


Subject(s)
Astrocytes/cytology , Gene Expression Profiling/methods , Gene Regulatory Networks , Neural Stem Cells/classification , Algorithms , Cell Lineage , Cells, Cultured , Humans , Monte Carlo Method , Support Vector Machine
17.
Int J Cancer ; 143(7): 1731-1740, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29696646

ABSTRACT

Colorectal cancer (CRC) is the third most common cancer in males and the second in females. This disease can be caused by genetic and acquired/environmental factors. Microsatellite instability (MSI), a condition of genetic hypermutability that results from deficient DNA mismatch repair, is one of the major mechanisms of colorectal cancer. MSI has been applied to classify colorectal cancer subtypes. However, the effects of MSI status on gene expression are largely unknown. In our study, we integrated the gene expression profiles and MSI status of all CRC samples from the TCGA database and then categorized the samples into three subgroups according to MSI status: MSI-stable, MSI-low, and MSI-high. We applied a novel computational method based on machine learning and screened the genes specifically expressed in the different colorectal cancer subtypes. The results revealed distinct mechanisms for colorectal cancer subtypes with different MSI status and identified genes that may serve as optimal standards for further classifying the molecular subtypes of colorectal cancer with distinct MSI status.


Subject(s)
Algorithms , Biomarkers, Tumor/genetics , Colorectal Neoplasms/classification , Colorectal Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Microsatellite Instability , Mutation , Colorectal Neoplasms/pathology , Gene Expression Profiling , Humans , Machine Learning , Prognosis
18.
Mol Genet Genomics ; 293(1): 293-301, 2018 Feb.
Article in English | MEDLINE | ID: mdl-28932904

ABSTRACT

Epigenetic regulation has long been recognized as a significant factor in various biological processes, such as development, transcriptional regulation, spermatogenesis, and chromosome stabilization. Epigenetic alterations lead to many human diseases, including cancer, depression, autism, and immune system defects. Although efforts have been made to identify epigenetic regulators, systematically uncovering all components of epigenetic regulation at the genome level with experimental approaches remains a challenge. Advances in constructing protein-protein interaction (PPI) networks provide an excellent opportunity to identify novel epigenetic factors computationally at the genome level. In this study, we identified potential epigenetic factors using a computational method that applies the random walk with restart (RWR) algorithm to a PPI network, with reported epigenetic factors as seed nodes. False positives were identified by their specific roles in the PPI network or by low-confidence interactions and weak functional relationships with epigenetic regulators. After the false positives were filtered out, 26 candidate epigenetic factors were obtained. According to previous studies, 22 of these are thought to be involved in epigenetic regulation, suggesting the robustness of our method. Our study provides a novel computational approach that successfully identified 26 potential epigenetic factors, paving the way for a deeper understanding of epigenetic mechanisms.


Subject(s)
Epigenesis, Genetic , Gene Expression Regulation/genetics , Protein Interaction Maps/genetics , Algorithms , Computational Biology , Humans
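Random walk with restart (RWR), used in item 18, spreads probability from seed proteins (here, known epigenetic factors) through the PPI network. It iterates p <- (1 - r) * W p + r * p0 until the scores stop changing, where W moves probability evenly from each node to its neighbors and p0 is uniform over the seeds. In the sketch below, the restart probability `r = 0.8`, the tolerance, and the toy network are assumptions rather than the study's settings. The false-positive filtering described in the abstract is not shown.

```python
def random_walk_with_restart(adj, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """RWR scores on an unweighted, undirected network.

    `adj` must list every node as a key. Iterates
    p <- (1 - restart) * W p + restart * p0 until the L1 change is below `tol`.
    """
    nodes = sorted(adj)
    seed_set = set(seeds)
    if not seed_set or not seed_set <= set(nodes):
        raise ValueError("seeds must be a non-empty subset of the network's nodes")
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbors = adj[n]
            if neighbors:  # isolated nodes simply lose their walk mass (a simplification)
                share = (1.0 - restart) * p[n] / len(neighbors)
                for m in neighbors:
                    nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p
```

Non-seed nodes with high scores are the candidates. Proteins closer to the seeds, or better connected to them, receive more probability.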
19.
Mol Genet Genomics ; 293(1): 137-149, 2018 Feb.
Article in English | MEDLINE | ID: mdl-28913654

ABSTRACT

As non-coding RNAs, circular RNAs (cirRNAs) and long non-coding RNAs (lncRNAs) have attracted increasing attention. They have been confirmed to participate in many biological processes, including transcriptional regulation, regulation of protein-coding genes, and binding to RNA-associated proteins. Until now, the differences between these two types of non-coding RNAs have not been fully uncovered, and it is still quite difficult to distinguish cirRNAs from other lncRNAs using simple techniques. In this study, we investigated these two types of non-coding RNAs using several computational methods. The purpose was to extract important factors that could distinguish cirRNAs from other lncRNAs and to build an effective classification model to distinguish them. First, we collected cirRNAs, lncRNAs, and their representations from a previous study, in which each cirRNA or lncRNA was represented by 188 features derived from its graph representation, sequence, and conservation properties. Second, these features were analyzed by the minimum redundancy maximum relevance (mRMR) method. The resulting mRMR feature list, the incremental feature selection method, and the hierarchical extreme learning machine algorithm were employed to build an optimal classification model with a sensitivity of 0.703, specificity of 0.850, accuracy of 0.789, and Matthews correlation coefficient of 0.561. Finally, we analyzed the 16 most important features. Among them, features describing the sequence and structure of the RNA molecule ranked highest, implying that they can be potential indicators of the differences between cirRNAs and other lncRNAs. Meanwhile, other features, such as evolutionary conservation and sequence consecution, were also important.


Subject(s)
Computational Biology/methods , RNA, Long Noncoding/isolation & purification , RNA/isolation & purification , Algorithms , Cell-Free Nucleic Acids/genetics , Machine Learning , RNA/genetics , RNA, Circular , RNA, Long Noncoding/genetics
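The mRMR ranking in item 19 (also used in items 6 and 20) orders features greedily. At each step it picks the feature with the highest relevance to the class, minus its average redundancy with the features already chosen (the "MID" form). The sketch below assumes discrete feature values. Continuous features, such as many of the 188 RNA descriptors, would first need discretizing, which is not shown. The feature names and data in the test are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """I(X;Y) in nats for two equally long sequences of discrete values."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_order(features, target):
    """Greedy mRMR (MID): maximize I(f; target) - mean I(f; s) over selected s."""
    relevance = {f: mutual_information(col, target) for f, col in features.items()}
    remaining = sorted(features)
    selected = []

    def mid_score(f):
        if not selected:
            return relevance[f]  # first pick: pure relevance
        redundancy = sum(mutual_information(features[f], features[s]) for s in selected)
        return relevance[f] - redundancy / len(selected)

    while remaining:
        best = max(remaining, key=mid_score)  # ties go to the alphabetically first name
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what distinguishes mRMR from a pure relevance ranking. An exact duplicate of an already selected feature is pushed down the list even if it is highly relevant.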
20.
Int J Mol Sci ; 19(11)2018 Oct 31.
Article in English | MEDLINE | ID: mdl-30384456

ABSTRACT

Messenger RNA (mRNA) and long noncoding RNA (lncRNA) are two main subgroups of RNAs participating in transcription regulation. With the development of next-generation sequencing, a growing number of lncRNAs have been identified, and many previously hidden functions of lncRNAs have been revealed. However, the differences between lncRNAs and mRNAs are still unclear. For example, we need to determine whether lncRNAs have stronger tissue specificity than mRNAs and which tissues express more lncRNAs. To investigate such differences in tissue expression between mRNAs and lncRNAs, we encoded 9,339 lncRNAs and 14,294 mRNAs with 71 expression features: 69 maximum expression features for 69 cell types, one feature for the maximum expression across all cells, and one expression specificity feature measured as the Chao-Shen-corrected Shannon entropy. Using advanced feature selection methods, such as maximum relevance minimum redundancy, incremental feature selection, and the random forest algorithm, we found 13 features that captured the dissimilarity between lncRNAs and mRNAs. The 11 cell subtype features indicated the cell types in which lncRNA and mRNA expression differed most. Such cell subtypes may be potential cell models for lncRNA identification and functional investigation. The expression specificity feature suggested that mRNAs and lncRNAs are expressed in different cell types. The maximum expression feature suggested that the maximum expression levels of mRNAs and lncRNAs are different. In addition, a rule learning algorithm, repeated incremental pruning to produce error reduction, was employed to produce effective rules for classifying lncRNAs and mRNAs. These rules gave results competitive with random forest and could give a clearer picture of the different expression patterns of lncRNAs and mRNAs. The results not only revealed the heterogeneous expression patterns of lncRNA and mRNA but also led to the development of a new tool for identifying the potential biological functions of these RNA subgroups.


Subject(s)
Databases, Nucleic Acid , Gene Expression Regulation/physiology , RNA, Long Noncoding/biosynthesis , RNA, Messenger/biosynthesis , Animals , Humans , Organ Specificity/physiology , RNA, Long Noncoding/genetics , RNA, Messenger/genetics
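The expression specificity feature in item 20 is described as a Chao-Shen-corrected Shannon entropy. Here is a minimal sketch of the Chao-Shen coverage-adjusted estimator. It estimates sample coverage from the number of singletons, shrinks the observed proportions accordingly, and applies a Horvitz-Thompson correction to each term. It assumes nonnegative integer counts as input. The abstract does not say how expression values across the 69 cell types were converted into the estimator's input, so only the estimator itself is shown. Lower entropy means expression concentrated in fewer cell types, that is, higher specificity.

```python
from math import log


def chao_shen_entropy(counts):
    """Chao-Shen coverage-adjusted Shannon entropy (nats) from nonnegative integer counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("need at least one observation")
    f1 = sum(1 for c in counts if c == 1)  # number of singletons
    if f1 == n:
        f1 = n - 1  # avoid zero estimated coverage
    coverage = 1.0 - f1 / n
    h = 0.0
    for c in counts:
        pa = coverage * c / n  # coverage-adjusted probability
        h -= pa * log(pa) / (1.0 - (1.0 - pa) ** n)  # Horvitz-Thompson correction
    return h
```

With many observations and no singletons, the estimate approaches the ordinary plug-in entropy. With many singletons, it corrects the plug-in value upward to account for unobserved categories.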