Results 1 - 8 of 8
1.
Life (Basel) ; 13(4), 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37109540

ABSTRACT

Coronavirus disease 2019 (COVID-19) not only damages the respiratory system but also strains the cardiovascular system. Vascular endothelial cells and cardiomyocytes play an important role in cardiac function, and aberrant gene expression in these cells can lead to cardiovascular diseases. In this study, we sought to explain the influence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection on gene expression levels in vascular endothelial cells and cardiomyocytes. We designed an advanced machine learning-based workflow to analyze the gene expression profiles of vascular endothelial cells and cardiomyocytes from patients with COVID-19 and healthy controls. An incremental feature selection method with a decision tree was used to build efficient classifiers and to summarize quantitative classification genes and rules. The gene expression matrix covered 104,182 cardiomyocytes (12,007 cells from patients with COVID-19 and 92,175 cells from healthy controls) and 22,438 vascular endothelial cells (10,812 cells from patients with COVID-19 and 11,626 cells from healthy controls). From this matrix, key genes that exert important effects on cardiac function, such as MALAT1, MT-CO1, and CD36, were extracted. The findings reported in this study may provide insights into the effect of COVID-19 on cardiac cells, further explain the pathogenesis of COVID-19, and facilitate the identification of potential therapeutic targets.
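The incremental feature selection step named in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' workflow: the toy data, the assumed feature ranking, and the leave-one-out 1-nearest-neighbor scorer (used here in place of a decision tree so the example needs no third-party library) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def loo_1nn_accuracy(X: List[Row], y: List[int], features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier that only
    looks at the given feature indices (hypothetical stand-in scorer)."""
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for j, xj in enumerate(X):
            if i == j:
                continue  # hold out sample i
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_dist:
                best_dist, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def incremental_feature_selection(
    X: List[Row],
    y: List[int],
    ranked: List[int],
    scorer: Callable[[List[Row], List[int], List[int]], float] = loo_1nn_accuracy,
) -> Tuple[List[int], float, List[float]]:
    """Score the top-1, top-2, ..., top-k prefixes of a ranked feature list
    and return (best prefix, its score, scores of all prefixes)."""
    scores = [scorer(X, y, ranked[:k]) for k in range(1, len(ranked) + 1)]
    # max() keeps the first maximum, so ties favor the smaller feature set.
    best_k = max(range(len(scores)), key=scores.__getitem__) + 1
    return ranked[:best_k], scores[best_k - 1], scores


# Made-up toy data: feature 1 separates the classes, features 0 and 2 are noise.
X = [
    [0.1, 0.0, 5.0],
    [0.9, 0.1, 1.0],
    [0.2, 0.2, 9.0],
    [0.8, 1.0, 5.1],
    [0.3, 1.1, 0.9],
    [0.7, 0.9, 9.2],
]
y = [0, 0, 0, 1, 1, 1]
ranked = [1, 0, 2]  # assumed output of an upstream feature-ranking method

selected, best_score, scores = incremental_feature_selection(X, y, ranked)
print(selected, best_score, scores)
```

Any classifier and any performance measure can be plugged in through `scorer`; the prefix with the best score defines the feature set used for the final model.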

2.
Front Genet ; 14: 1145647, 2023.
Article in English | MEDLINE | ID: mdl-36936430

ABSTRACT

Chromatin accessibility is a general property of the eukaryotic genome that reflects the degree of physical compaction of chromatin. Recent studies have shown that chromatin accessibility is cell type dependent, indicating chromatin heterogeneity across cell lines and tissues. Identifying markers that distinguish cell types at the chromosome level is important for understanding cell function and classifying cell types. In the present study, we investigated transcriptionally active chromosome segments identified by sci-ATAC-seq at single-cell resolution, covering 69,015 cells belonging to 77 different cell types. Each cell was represented by its existence status on 20,783 genes obtained from 436,206 active chromosome segments. The gene features were first analyzed by Boruta, resulting in 3,897 genes, which were then ranked by Monte Carlo feature selection. This list was further analyzed by the incremental feature selection (IFS) method, yielding essential genes, classification rules, and an efficient random forest (RF) classifier. To improve the performance of the optimal RF classifier, its features were further processed by an autoencoder, light gradient boosting machine, and the IFS method. The final RF classifier achieved a Matthews correlation coefficient (MCC) of 0.838. Marker genes identified in this study include H2-Dmb2, which is specifically expressed in antigen-presenting cells (e.g., dendritic cells or macrophages), and Tenm2, which is specifically expressed in T cells. Our analysis revealed numerous potential epigenetic modification patterns that are unique to particular cell types, thereby advancing knowledge of the critical functions of chromatin accessibility in cell processes.
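The MCC (Matthews correlation coefficient) reported above is a multiclass score here, since the classifier separates 77 cell types. A minimal sketch of the standard multiclass generalization, computed from label counts, is given below; the function name and toy labels are illustrative assumptions, and the abstract does not say which implementation the authors used.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass Matthews correlation coefficient.

    With s samples, c correct predictions, t_k true members of class k and
    p_k predictions of class k:
        MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k = Counter(y_true)
    p_k = Counter(y_pred)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # Convention: return 0.0 when the score is undefined (e.g. one class predicted).
    return numerator / denominator if denominator else 0.0


# Made-up labels for three hypothetical cell types.
truth = ["T cell", "T cell", "B cell", "B cell", "macrophage", "macrophage"]
guess = ["T cell", "T cell", "B cell", "macrophage", "macrophage", "macrophage"]
print(multiclass_mcc(truth, guess))
```

Unlike plain accuracy, this score stays near zero for a classifier that always predicts the majority class, which matters when cell types are unevenly represented.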

3.
Front Genet ; 13: 1053772, 2022.
Article in English | MEDLINE | ID: mdl-36437952

ABSTRACT

The global outbreak of COVID-19 has become a major public health problem. Infection with the COVID-19 virus triggers a complex immune response. CD8+ T cells, in particular, play an essential role in controlling the severity of the disease. However, the mechanism by which CD8+ T cells regulate COVID-19 remains poorly investigated. In this study, single-cell gene expression profiles from three CD8+ T cell subtypes (effector, memory, and naive T cells) were downloaded. Each cell subtype included three disease states, namely, acute COVID-19, convalescent COVID-19, and unexposed individuals. The profiles of each cell subtype were analyzed individually in the same way. Irrelevant features in the profiles were first excluded by the Boruta method. The remaining features for each CD8+ T cell subtype were further analyzed by Max-Relevance and Min-Redundancy, Monte Carlo feature selection, and light gradient boosting machine methods to obtain three feature lists. These lists were then fed into the incremental feature selection method to determine the optimal features for each cell subtype. Their corresponding genes may be latent biomarkers for determining COVID-19 severity. Genes such as ZFP36, DUSP1, TCR, and IL7R can be confirmed to play an immune regulatory role in COVID-19 infection and recovery. The results of functional enrichment analysis revealed that these important genes may be associated with immune functions, such as response to cAMP, response to virus, T cell receptor complex, T cell activation, and T cell differentiation. This study further set up different gene expression patterns, represented by classification rules, on three states of COVID-19 and constructed several efficient classifiers to distinguish COVID-19 severity. The findings of this study provide new insights into the biological processes of CD8+ T cells in regulating the immune response.
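One of the ranking methods named above, Max-Relevance and Min-Redundancy, can be illustrated with a small greedy sketch. It assumes discrete feature values, mutual information as both the relevance and the redundancy measure, and the "difference" scoring form; the function names and toy columns are hypothetical, and the abstract does not specify which variant or software the authors used.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items()
    )


def mrmr_score(f: int, ranked: List[int], columns, relevance: List[float]) -> float:
    """Relevance to the target minus mean redundancy with already ranked features."""
    if not ranked:
        return relevance[f]
    redundancy = sum(
        mutual_information(columns[f], columns[s]) for s in ranked
    ) / len(ranked)
    return relevance[f] - redundancy


def mrmr_rank(columns: List[Sequence[int]], target: Sequence[int]) -> List[int]:
    """Greedy ranking: repeatedly pick the feature with the best mRMR score."""
    relevance = [mutual_information(col, target) for col in columns]
    remaining = set(range(len(columns)))
    ranked: List[int] = []
    while remaining:
        best = max(
            sorted(remaining),  # sorted so ties resolve to the lowest index
            key=lambda f: mrmr_score(f, ranked, columns, relevance),
        )
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Made-up data: feature 1 duplicates feature 0; feature 2 is weaker but less redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
]
order = mrmr_rank(columns, target)
print(order)
```

The duplicate feature is pushed to the end of the list even though it is as relevant as the first pick, which is the behavior that distinguishes this ranking from a plain relevance sort.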

4.
Front Oncol ; 12: 979336, 2022.
Article in English | MEDLINE | ID: mdl-36248961

ABSTRACT

With the increasing number of people suffering from cancer, this illness has become a major health problem worldwide. Exploring the biological functions and signaling pathways of carcinogenesis is essential for cancer detection and research. In this study, a mutation dataset for eleven cancer types was first obtained from cBioPortal for Cancer Genomics, a web-based resource, and 21,049 features were then extracted from three aspects: relationship to GO and KEGG (enrichment features), mutated genes learned by word2vec (text features), and the protein-protein interaction network analyzed by node2vec (network features). Irrelevant features were excluded using the Boruta feature filtering method, and the retained relevant features were ranked by four feature selection methods (least absolute shrinkage and selection operator, minimum redundancy maximum relevance, Monte Carlo feature selection, and light gradient boosting machine) to generate four ranked feature lists. Incremental feature selection was applied to each list to determine the optimal number of features, build the optimal classifiers, and derive interpretable classification rules. The results of the four feature-ranking methods were integrated to identify key functional pathways, such as olfactory transduction (hsa04740) and colorectal cancer (hsa05210), and the roles of these pathways in cancers were discussed with reference to the literature. Overall, this machine learning-based study revealed the altered biological functions of cancers and provided a reference for the mechanisms of different cancers.

5.
Sci China Life Sci ; 56(2): 143-55, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23393030

ABSTRACT

Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq offers the potential to capture the whole picture of the transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly. To find the similarities and differences between them, we first chose five representative assemblers from the two classes, and then investigated and compared their algorithmic features in theory and their real performance in practice. We found that all the methods can be reduced to graph reduction problems, yet they have different conceptual and practical implementations. Each assembly method therefore has its specific advantages and disadvantages, performing worse than others in certain aspects while outperforming them in other aspects. Finally, we merged the assemblies of the five assemblers and obtained a much better assembly. Additionally, we evaluated an assembler that uses a genome-guided de novo assembly approach and achieved good performance. Based on these results, we suggest that, to obtain a comprehensive set of recovered transcripts, it is better to use a combination of de novo assembly and genome-guided assembly.


Subject(s)
Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Algorithms; Animals; Brain/metabolism; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Embryonic Stem Cells/metabolism; Gene Expression Profiling/statistics & numerical data; Genome, Fungal; Genome, Human; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Mice; Schizosaccharomyces/genetics; Sequence Analysis, RNA/statistics & numerical data; Software
6.
Clin Cancer Res ; 17(7): 1722-30, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21350001

ABSTRACT

PURPOSE: To investigate the expression, regulation, potential role, and targets of miR-195 and miR-497 in breast cancer. EXPERIMENTAL DESIGN: The expression patterns of miR-195 and miR-497 were initially examined in breast cancer tissues and cell lines by Northern blotting and quantitative real-time PCR. Combined bisulfite restriction analysis and bisulfite sequencing were carried out to study the DNA methylation status of the miR-195 and miR-497 genes. Breast cancer cells stably expressing miR-195 and miR-497 were established to study their role and targets. Finally, normal, fibroadenoma, and breast cancer tissues were used to analyze the correlation between miR-195/497 levels and the malignant stages of breast tumor tissues. RESULTS: miR-195 and miR-497 were significantly downregulated in breast cancer. The methylation state of CpG islands upstream of the miR-195/497 gene was found to be responsible for the downregulation of both miRNAs. Forced expression of miR-195 or miR-497 suppressed breast cancer cell proliferation and invasion. Raf-1 and Ccnd1 were identified as novel direct targets of miR-195 and miR-497. miR-195/497 expression levels in clinical specimens were found to correlate inversely with the malignancy of breast cancer. CONCLUSIONS: Our data imply that both miR-195 and miR-497 play important inhibitory roles in breast cancer malignancy and may be potential therapeutic and diagnostic targets.


Subject(s)
Breast Neoplasms/genetics; MicroRNAs/genetics; 3' Untranslated Regions; Base Sequence; Breast Neoplasms/metabolism; Breast Neoplasms/pathology; Cell Line, Tumor; Cell Movement; Cell Proliferation; CpG Islands; Cyclin D1/metabolism; DNA Methylation; Down-Regulation; Female; Gene Silencing; Genes, Reporter; Humans; Luciferases, Renilla/biosynthesis; Luciferases, Renilla/genetics; MicroRNAs/metabolism; Neoplasm Invasiveness; Proto-Oncogene Proteins c-raf/metabolism; Restriction Mapping
7.
Mol Divers ; 14(4): 719-29, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20041294

ABSTRACT

We used a machine learning method, the nearest neighbor algorithm (NNA), to learn the relationship between miRNAs and their target proteins, generating a predictor that can judge whether a new miRNA-target pair is true or not. We acquired 198 positive (true) miRNA-target pairs from Tarbase and the literature, and generated 4,888 negative (false) pairs through random combination. A 0/1 system and the frequencies of single nucleotides and di-nucleotides were used to encode miRNAs into vectors, while various physicochemical parameters were used to encode the targets. The NNA was then applied, learning from these data to produce a predictor. We implemented minimum redundancy maximum relevance (mRMR) and properties forward selection (PFS) to reduce the redundancy of our encoding system, obtaining the 91 most efficient properties. Finally, in the jackknife cross-validation test, we obtained a positive accuracy of 69.2% and an overall accuracy of 96.0% with all 253 properties. With the 91 most efficient properties, we obtained a positive accuracy of 83.8% and an overall accuracy of 97.2%. A web server for predictions is also available at http://app3.biosino.org:8080/miRTP/index.jsp.
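The nucleotide-frequency part of the miRNA encoding described above can be sketched as follows. This is only an assumption-laden illustration: the "0/1 system" and the physicochemical target encoding are not specified in the abstract and are not reproduced, and the function name, the ACGU alphabet order, and the toy sequence are hypothetical.

```python
from itertools import product
from typing import List

NUCLEOTIDES = "ACGU"
# All 16 ordered pairs: AA, AC, AG, AU, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def encode_mirna(seq: str) -> List[float]:
    """Encode an RNA sequence as 4 single-nucleotide frequencies followed by
    16 di-nucleotide frequencies (20 values in total)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    if not seq or set(seq) - set(NUCLEOTIDES):
        raise ValueError("sequence must be non-empty and contain only A, C, G, U/T")
    mono = [seq.count(n) / len(seq) for n in NUCLEOTIDES]
    # Overlapping pairs: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    di = [pairs.count(d) / len(pairs) if pairs else 0.0 for d in DINUCLEOTIDES]
    return mono + di


# Made-up toy sequence, not a real miRNA.
vector = encode_mirna("ACGUACGU")
print(len(vector), vector[:4])
```

Fixed-length vectors like this let sequences of different lengths be compared by a distance-based method such as the nearest neighbor algorithm.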


Subject(s)
Algorithms; Base Sequence/physiology; Computational Biology/methods; MicroRNAs/metabolism; Sequence Homology; Amino Acid Sequence; Artificial Intelligence; Binding Sites/genetics; Forecasting; MicroRNAs/physiology; Molecular Sequence Annotation/methods; Molecular Sequence Data; RNA Interference/physiology
8.
J Proteome Res ; 8(11): 5212-8, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19764809

ABSTRACT

Protein complexes, which integrate multiple gene products, perform all sorts of fundamental biological functions in cells. Much effort has been put into identifying protein complexes using computational approaches. The vast majority of these approaches search for densely connected regions in the protein-protein interaction (PPI) network/graph. In this research, we try an alternative approach that analyzes protein complexes using hybrid features, and we present a method to determine whether multiple (more than two) proteins from yeast can form a protein complex. The data set consists of 493 positive protein complexes and 9,878 negative protein complexes. Every complex is represented by graph features, where the proteins in the complex form a graph (web) of interactions, and by features derived from biological properties, including protein length, biochemical properties, and physicochemical properties. These features are filtered and optimized by the Minimum Redundancy Maximum Relevance method, Incremental Feature Selection, and Forward Feature Selection, with the Nearest Neighbor Algorithm as the prediction/identification model. The jackknife cross-validation test is employed to evaluate the identification accuracy. As a result, the highest accuracy for the identification of real protein complexes using the filtered features is 69.17%, and feature analysis shows that, among the adopted features, graph features play the main role in the determination of protein complexes.
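The evaluation scheme named above, a jackknife (leave-one-out) test around a nearest neighbor classifier, can be sketched as follows. The feature vectors, labels, Euclidean distance, and function name are made-up assumptions for illustration; the abstract does not state the distance measure or any implementation details.

```python
from math import dist
from typing import List, Sequence, Tuple


def jackknife_1nn(X: List[Sequence[float]], y: List[int]) -> Tuple[float, float]:
    """Leave-one-out test of a 1-nearest-neighbor classifier.

    Each sample is held out in turn and labeled with the class of its
    closest remaining sample. Returns (overall accuracy, accuracy on the
    positive class, i.e. samples whose true label is 1)."""
    predictions = []
    for i in range(len(X)):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(X[i], X[j]),
        )
        predictions.append(y[nearest])
    overall = sum(p == t for p, t in zip(predictions, y)) / len(y)
    on_positives = [p for p, t in zip(predictions, y) if t == 1]
    positive = sum(p == 1 for p in on_positives) / len(on_positives)
    return overall, positive


# Made-up two-dimensional features; label 1 marks a "real complex".
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (1.2, 1.2)]
y = [0, 0, 0, 1, 1, 1]

overall_acc, positive_acc = jackknife_1nn(X, y)
print(overall_acc, positive_acc)
```

Reporting the positive-class accuracy separately is useful when negatives greatly outnumber positives, as in the data set described here, because overall accuracy alone can look high even when few real complexes are recovered.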


Subject(s)
Algorithms; Computational Biology/methods; Multiprotein Complexes/metabolism; Protein Interaction Mapping/methods; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae Proteins/metabolism; Software
...