1.
Brief Bioinform ; 24(4), 2023 07 20.
Article in English | MEDLINE | ID: mdl-37279467

ABSTRACT

Deoxyribonucleic acid (DNA) methylation (DNAm) is an important epigenetic mechanism that plays a role in chromatin structure and transcriptional regulation. Elucidating the relationship between DNAm and gene expression is of great importance for understanding its role in transcriptional regulation. The conventional approach is to construct machine-learning-based methods to predict gene expression based on mean methylation signals in promoter regions. However, this type of strategy only explains about 25% of gene expression variation, and hence is inadequate in elucidating the relationship between DNAm and transcriptional activity. In addition, using mean methylation as input features neglects the heterogeneity of cell populations that can be reflected by DNAm haplotypes. We here developed TRAmHap, a novel deep-learning framework that predicts gene expression by utilizing the characteristics of DNAm haplotypes in proximal promoters and distal enhancers. Using benchmark data of human and mouse normal tissues, TRAmHap shows much higher accuracy than existing machine-learning-based methods, explaining 60-80% of gene expression variation across tissue types and disease conditions. Our model demonstrated that gene expression can be accurately predicted by DNAm patterns in promoters and long-range enhancers as far as 25 kb away from the transcription start site, especially in the presence of intra-gene chromatin interactions.


Subjects
DNA Methylation, Genetic Epigenesis, Humans, Animals, Mice, Haplotypes, Chromatin/genetics
2.
Bioinformatics ; 37(24): 4892-4894, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34179956

ABSTRACT

SUMMARY: Bisulfite sequencing (BS-seq) is currently the gold standard for measuring genome-wide DNA methylation profiles at single-nucleotide resolution. Most analyses focus on mean CpG methylation and ignore methylation states on the same DNA fragments [DNA methylation haplotypes (mHaps)]. Here, we propose mHap, a simple DNA mHap format for storing DNA BS-seq data. This format reduces the size of a BAM file by 40- to 140-fold while retaining complete read-level CpG methylation information. It is also compatible with the Tabix tool for fast and random access. We implemented a command-line tool, mHapTools, for converting BAM/SAM files from existing platforms to mHap files as well as post-processing DNA methylation data in mHap format. With this tool, we processed all publicly available human reduced representation bisulfite sequencing data and provided these data as a comprehensive mHap database. AVAILABILITY AND IMPLEMENTATION: https://jiantaoshi.github.io/mHap/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Methylation, Software, Humans, Haplotypes, High-Throughput Nucleotide Sequencing, DNA
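
The distinction this abstract draws between mean CpG methylation and read-level methylation haplotypes can be illustrated with a small sketch. The read strings and function names below are hypothetical and do not reproduce the mHap file format or the mHapTools interface; each string is assumed to encode one read over the same four CpG sites, with `1` for methylated and `0` for unmethylated.

```python
from collections import Counter

def mean_methylation(reads):
    """Per-CpG mean methylation across reads covering the same CpG sites."""
    n_sites = len(reads[0])
    return [sum(int(r[i]) for r in reads) / len(reads) for i in range(n_sites)]

def haplotype_counts(reads):
    """Count how often each read-level methylation pattern occurs."""
    return Counter(reads)

# Two hypothetical read sets over the same 4 CpGs.
mixed_cells = ["1111", "1111", "0000", "0000"]  # two fully concordant populations
stochastic = ["1100", "0011", "1010", "0101"]   # discordant patterns on every read

print(mean_methylation(mixed_cells), haplotype_counts(mixed_cells))
print(mean_methylation(stochastic), haplotype_counts(stochastic))
```

Both read sets give the same per-site mean, yet their haplotype counts differ, which is the information a mean-only analysis discards.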
3.
Biochim Biophys Acta Mol Basis Dis ; 1864(6 Pt B): 2376-2383, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29197659

ABSTRACT

The human papillomavirus (HPV), a common virus that infects the reproductive tract, may lead to malignant changes within the infection area in certain cases and is directly associated with such cancers as cervical cancer, anal cancer, and vaginal cancer. Identification of novel HPV infection-related genes can lead to a better understanding of the specific signal pathways and cellular processes related to HPV infection, providing information for the development of more efficient therapies. In this study, several novel HPV infection-related genes were predicted by a computational method based on the known genes involved in HPV infection from HPVbase. This method applied the random walk with restart (RWR) algorithm to a protein-protein interaction (PPI) network. The candidate genes were further filtered by permutation and association tests. These steps eliminated genes occupying special positions in the PPI network and selected key genes with strong associations to known HPV infection-related genes based on the interaction confidence and functional similarity obtained from published databases, such as STRING, gene ontology (GO) terms and KEGG pathways. Our study identified 104 novel HPV infection-related genes, a number of which were confirmed to relate to the infection processes and complications of HPV infection, as reported in the literature. These results demonstrate the reliability of our method in identifying HPV infection-related genes. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.


Subjects
Algorithms, Data Mining/methods, Genetic Databases, Gene Regulatory Networks, Papillomaviridae, Papillomavirus Infections, Humans, Papillomaviridae/genetics, Papillomaviridae/metabolism, Papillomavirus Infections/genetics, Papillomavirus Infections/metabolism
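
The random walk with restart step named in this abstract can be sketched in its generic textbook form. This is not the authors' implementation: the toy network, the node names, the edge weights and the restart probability of 0.8 are all assumptions made for illustration. The walker starts from the known (seed) genes, moves to neighbours in proportion to edge weight, and jumps back to a seed with a fixed probability at every step; the steady-state probabilities rank the candidate genes.

```python
def random_walk_with_restart(network, seeds, restart=0.8, tol=1e-10, max_iter=10_000):
    """Return steady-state visiting probabilities for every node.

    network: dict mapping node -> dict of neighbour -> edge weight (undirected,
             so each edge is listed in both directions).
    seeds:   known genes; the walker restarts uniformly among them.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = list(network)
    seeds = set(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            total = sum(network[u].values())
            if total == 0:
                # Isolated node: keep its walking mass in place.
                nxt[u] += (1 - restart) * p[u]
                continue
            for v, w in network[u].items():
                nxt[v] += (1 - restart) * p[u] * w / total
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical 4-node path network A - B - C - D with unit weights; A is the seed.
toy = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(toy, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In a real setting the weights might come from interaction confidence scores, and the ranked list would then go through the permutation and association filters the abstract describes; neither is modelled here.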
4.
BMC Bioinformatics ; 17(Suppl 17): 535, 2016 Dec 23.
Article in English | MEDLINE | ID: mdl-28155637

ABSTRACT

BACKGROUND: A gene regulatory network (GRN) represents interactions of genes inside a cell or tissue, in which vertices and edges stand for genes and their regulatory interactions, respectively. Reconstruction of gene regulatory networks, in particular genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most network inference methods are computationally intensive; they are usually effective for small-scale tasks (e.g., networks with a few hundred genes) but struggle to construct GRNs at genome scale. RESULTS: Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by conditional mutual information using a parallel computing framework (hence the package name, CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to most current network inference methods. Moreover, running tests on synthetic datasets demonstrate that CMIP can handle large datasets, especially genome-wide datasets, within an acceptable time period. In addition, successful application to a real genomic dataset confirms the practical applicability of the package. CONCLUSIONS: This new software package provides a powerful tool for genomic network reconstruction to the biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/ .


Subjects
Computational Biology/methods, Gene Expression, Gene Regulatory Networks, Software, Algorithms, Animals, Genome, Humans, Transcriptome
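
The abstract does not say how conditional mutual information is estimated. A common choice, given here only as an assumption, treats expression values as jointly Gaussian, in which case mutual information and conditional mutual information reduce to ratios of covariance determinants. Here $C(\cdot)$ denotes the covariance matrix of the listed variables and $|\cdot|$ its determinant; $X$ and $Y$ are two genes and $Z$ is the set of conditioning genes.

```latex
% Mutual information between genes X and Y (Gaussian assumption)
I(X;Y) = \frac{1}{2}\,\ln\frac{|C(X)|\,|C(Y)|}{|C(X,Y)|}

% Conditional mutual information given a set of genes Z (Gaussian assumption)
I(X;Y \mid Z) = \frac{1}{2}\,\ln\frac{|C(X,Z)|\,|C(Y,Z)|}{|C(Z)|\,|C(X,Y,Z)|}
```

An edge between $X$ and $Y$ is typically dropped when the value falls below a chosen threshold, which is where an automatic threshold determination step such as the one mentioned in the abstract would apply.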
5.
Life (Basel) ; 13(4), 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37109540

ABSTRACT

Coronavirus disease 2019 (COVID-19) not only causes respiratory system damage but also imposes strain on the cardiovascular system. Vascular endothelial cells and cardiomyocytes play an important role in cardiac function. The aberrant expression of genes in vascular endothelial cells and cardiomyocytes can lead to cardiovascular diseases. In this study, we sought to explain the influence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection on the gene expression levels of vascular endothelial cells and cardiomyocytes. We designed an advanced machine learning-based workflow to analyze the gene expression profile data of vascular endothelial cells and cardiomyocytes from patients with COVID-19 and healthy controls. An incremental feature selection method with a decision tree was used in building efficient classifiers and summarizing quantitative classification genes and rules. Some key genes that exert important effects on cardiac function, such as MALAT1, MT-CO1, and CD36, were extracted from the gene expression matrix of 104,182 cardiomyocytes (12,007 cells from patients with COVID-19 and 92,175 cells from healthy controls) and 22,438 vascular endothelial cells (10,812 cells from patients with COVID-19 and 11,626 cells from healthy controls). The findings reported in this study may provide insights into the effect of COVID-19 on cardiac cells and further explain the pathogenesis of COVID-19, and they may facilitate the identification of potential therapeutic targets.

6.
Front Genet ; 14: 1145647, 2023.
Article in English | MEDLINE | ID: mdl-36936430

ABSTRACT

Chromatin accessibility is a generic property of the eukaryotic genome, which refers to the degree of physical compaction of chromatin. Recent studies have shown that chromatin accessibility is cell type dependent, indicating chromatin heterogeneity across cell lines and tissues. The identification of markers used to distinguish cell types at the chromosome level is important to understand cell function and classify cell types. In the present study, we investigated transcriptionally active chromosome segments identified by sci-ATAC-seq at single-cell resolution, including 69,015 cells belonging to 77 different cell types. Each cell was represented by existence status on 20,783 genes that were obtained from 436,206 active chromosome segments. The gene features were deeply analyzed by Boruta, resulting in 3897 genes, which were ranked in a list by Monte Carlo feature selection. This list was further analyzed by the incremental feature selection (IFS) method, yielding essential genes, classification rules and an efficient random forest (RF) classifier. To improve the performance of the optimal RF classifier, its features were further processed by an autoencoder, a light gradient boosting machine and the IFS method. The final RF classifier, with an MCC of 0.838, was constructed. Some marker genes were identified in this study, such as H2-Dmb2, which is specifically expressed in antigen-presenting cells (e.g., dendritic cells or macrophages), and Tenm2, which is specifically expressed in T cells. Our analysis revealed numerous potential epigenetic modification patterns that are unique to particular cell types, thereby advancing knowledge of the critical functions of chromatin accessibility in cell processes.

7.
Clin Transl Med ; 12(3): e757, 2022 03.
Article in English | MEDLINE | ID: mdl-35297204

ABSTRACT

BACKGROUND: Multiple myeloma (MM) is a clinically and biologically heterogeneous plasma-cell malignancy. Despite extensive research, disease heterogeneity and relapse remain a major challenge in MM therapeutics. We tried to dissect this disease and identify novel biomarkers for patient stratification and treatment outcome prediction by applying single-cell technology. METHODS: We performed single-cell RNA sequencing (scRNA-seq) and variable-diversity-joining regions-targeted sequencing (scVDJ-seq) concurrently on bone marrow samples from a cohort of 18 patients with newly diagnosed MM (NDMM; n = 12) or refractory/relapsed MM (RRMM; n = 6). We analysed the malignant clonotypes using scVDJ-seq data and conducted data integration and cell-type annotation through the CCA algorithm based on gene expression profiling. Furthermore, we identified disease status-specific genes and modules by comparison of NDMM and RRMM datasets and explored the findings in a larger MM cohort from the MMRF CoMMpass study. RESULTS: We found that all the myeloma cells in either diagnosed or relapsed samples were dominated by a major clone, with a few subclones in several samples (n = 5). Next, we investigated the universal transcriptional features of myeloma cells and identified eight meta-programs correlated with this disease, especially meta-programs 1 and 8 (M1 and M8), which were the most significant and related to cell cycle and stress response, respectively. Furthermore, we classified the malignant plasma cells into eight clusters and found that the cell numbers in clusters 2/6/7 were exclusively higher in relapsed samples. In addition, we identified several attractive candidates for biomarkers (e.g. SMAD1 and STMN1) associated with disease progression and relapse in our dataset and related to overall survival in the CoMMpass dataset. CONCLUSIONS: Our data provide insights into the heterogeneity of MM, highlight the relevance of intra-tumour heterogeneity and reveal novel biomarkers that might be potent therapeutic targets.


Subjects
Multiple Myeloma, Humans, Multiple Myeloma/diagnosis, Multiple Myeloma/genetics, Local Neoplasm Recurrence/genetics, Prognosis, RNA-Seq, Exome Sequencing
8.
Front Oncol ; 11: 611580, 2021.
Article in English | MEDLINE | ID: mdl-33816243

ABSTRACT

BACKGROUND: Subcutaneous panniculitis-like T-cell lymphoma (SPTCL) is a malignant primary T-cell lymphoma that is challenging to distinguish from autoimmune disorders and reactive panniculitides. Delay in diagnosis and a high misdiagnosis rate affect the prognosis and survival of patients. The difficulty of diagnosis is mainly due to an incomplete understanding of disease pathogenesis. METHODS: We performed single-cell RNA sequencing of matched subcutaneous lesion tissue, peripheral blood, and bone marrow from a patient with SPTCL, as well as peripheral blood, bone marrow, lymph node, and lung tissue samples from healthy donors as normal controls. We conducted cell clustering, gene expression program identification, differential gene expression analysis, and cell-cell interaction analysis to investigate the ecosystem of SPTCL. RESULTS: Based on gene expression profiles at single-cell resolution, we identified and characterized the malignant cells and immune subsets from a patient with SPTCL. Our analysis showed that SPTCL malignant cells expressed a distinct gene signature, including chemokine families, cytotoxic proteins, T cell immune checkpoint molecules, and the immunoglobulin family. By comparing with normal T cells, we identified potential novel markers for SPTCL (e.g., CYTOR, CXCL13, VCAM1, and TIMD4) specifically differentially expressed in the malignant cells. We also found that macrophages and fibroblasts dominated the cell-cell communication landscape with the SPTCL malignant cells. CONCLUSIONS: This work offers insight into the heterogeneity of subcutaneous panniculitis-like T-cell lymphoma, providing a better understanding of the transcription characteristics and immune microenvironment of this rare tumor.

9.
Cancer Gene Ther ; 27(1-2): 56-69, 2020 02.
Article in English | MEDLINE | ID: mdl-31138902

ABSTRACT

Acute myeloid leukemia (AML) is a type of blood cancer characterized by the rapid growth of immature white blood cells from the bone marrow. Therapy resistance resulting from the persistence of leukemia stem cells (LSCs) is found in numerous patients. Comparative transcriptome studies have been previously conducted to analyze differentially expressed genes between LSC+ and LSC- cells. However, these studies mainly focused on a limited number of genes with the most obvious expression differences between the two cell types. We developed a computational approach incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), support vector machine (SVM), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER), to identify gene expression features specific to LSCs. A total of 1159 features (genes) were first identified, which can be used to build the optimal SVM classifier for distinguishing LSC+ and LSC- cells. Among these 1159 genes, the top 17 genes were identified as LSC-specific biomarkers. In addition, six classification rules were produced by the RIPPER algorithm. The subsequent literature review on these features/genes and the classification rules, together with functional enrichment analyses of the 1159 features/genes, confirmed the relevance of the extracted genes and rules to the characteristics of LSCs.


Subjects
Tumor Biomarkers/genetics, Acute Myeloid Leukemia/genetics, Genetic Models, Neoplastic Stem Cells/pathology, Support Vector Machine, Tumor Biomarkers/analysis, Computational Biology/methods, Datasets as Topic, Antineoplastic Drug Resistance/genetics, Feasibility Studies, Gene Expression Profiling/methods, Humans, Acute Myeloid Leukemia/drug therapy, Acute Myeloid Leukemia/pathology, Monte Carlo Method, Neoplastic Stem Cells/drug effects
10.
Oncotarget ; 8(50): 87494-87511, 2017 Oct 20.
Article in English | MEDLINE | ID: mdl-29152097

ABSTRACT

Detection and diagnosis of cancer are especially important for early prevention and effective treatments. Traditional methods of cancer detection are usually time-consuming and expensive. Liquid biopsy, a newly proposed noninvasive detection approach, can promote the accuracy and decrease the cost of detection according to a personalized expression profile. However, few studies have been performed to analyze this type of data, which can promote more effective methods for detection of different cancer subtypes. In this study, we applied some reliable machine learning algorithms to analyze data retrieved from patients who had one of six cancer subtypes (breast cancer, colorectal cancer, glioblastoma, hepatobiliary cancer, lung cancer and pancreatic cancer) as well as healthy persons. Quantitative gene expression profiles were used to encode each sample. Then, they were analyzed by the maximum relevance minimum redundancy method. Two feature lists were obtained in which genes were ranked rigorously. The incremental feature selection method was applied to the mRMR feature list to extract the optimal feature subset, which can be used in the support vector machine algorithm to determine the best performance for the detection of cancer subtypes and healthy controls. The ten-fold cross-validation for the constructed optimal classification model yielded an overall accuracy of 0.751. On the other hand, we extracted the top eighteen features (genes), including TTN, RHOH, RPS20, TRBC2, in another feature list, the MaxRel feature list, and performed a detailed analysis of them. The results indicated that these genes could be important biomarkers for discriminating different cancer subtypes and healthy controls.
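
Incremental feature selection, as used in this abstract and in several others in this list, can be sketched as a loop over prefixes of a ranked feature list. The sketch below is generic and is not taken from any of the papers: `evaluate` is a hypothetical callback standing in for whatever cross-validated classifier score a study uses (for example, ten-fold cross-validation accuracy of a support vector machine), and the toy scorer and gene names are invented for the demonstration.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score the top-k features for k = 1..N and keep the best prefix.

    ranked_features: features ordered from most to least important
                     (e.g. the output of an mRMR or Monte Carlo ranking).
    evaluate:        callable taking a list of features and returning a score
                     where higher is better (hypothetical placeholder).
    """
    best_k, best_score = 0, float("-inf")
    curve = []
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        curve.append((k, score))
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score, curve

# Toy scorer (assumption): two informative genes, small penalty per extra feature.
informative = {"g1", "g2"}

def toy_score(subset):
    return len(informative & set(subset)) - 0.1 * len(subset)

selected, score, curve = incremental_feature_selection(["g1", "g2", "g3", "g4"], toy_score)
print(selected)
```

The returned `curve` is what such studies usually plot to show how performance changes as features are added one at a time.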

11.
Biomed Res Int ; 2014: 438341, 2014.
Article in English | MEDLINE | ID: mdl-25184139

ABSTRACT

Protein S-nitrosylation plays a very important role in a wide variety of cellular biological activities. To date, accurate prediction of S-nitrosylation sites remains a great challenge. In this paper, we present a framework to computationally predict S-nitrosylation sites based on kernel sparse representation classification and the minimum Redundancy Maximum Relevance algorithm. As many as 666 features derived from five categories of amino acid properties and one protein structure feature are used for numerical representation of proteins. A total of 529 protein sequences collected from open-access databases and the published literature are used to train and test our predictor. Computational results show that our predictor achieves Matthews' correlation coefficients of 0.1634 and 0.2919 for the training set and the testing set, respectively, which are better than those of the k-nearest neighbor algorithm, random forest algorithm, and sparse representation classification algorithm. The experimental results also indicate that 134 optimal features can better represent the peptides of protein S-nitrosylation than the original 666 redundant features. Furthermore, we constructed an independent testing set of 113 protein sequences to evaluate the robustness of our predictor. Experimental results showed that our predictor also yielded good performance on the independent testing set, with a Matthews' correlation coefficient of 0.2239.


Subjects
Algorithms, Computational Biology, Post-Translational Protein Processing, Proteins/chemistry, Amino Acid Sequence, Amino Acids/chemistry, Amino Acids/genetics, Protein Databases, Tertiary Protein Structure, Proteins/genetics, Proteins/metabolism, Software
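
For readers comparing the Matthews' correlation coefficients quoted in this list, the binary form of the metric can be computed directly from confusion-matrix counts. The function name and the example counts below are illustrative only and are not data from any of the papers.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews' correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero,
    a common convention for degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

print(mcc_from_counts(tp=5, tn=5, fp=0, fn=0))  # perfect classifier
print(mcc_from_counts(tp=5, tn=5, fp=5, fn=5))  # chance-level classifier
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, which is why it is often preferred when positive and negative classes are imbalanced.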