Abstract
The CRISPR-Cas system is a highly adaptive, RNA-guided immune system found in bacteria and archaea; it has applications as a genome editing tool and is a valuable system for studying the co-evolutionary dynamics of bacteria-phage interactions. Here we introduce CRISPRimmunity, a new web server designed for anti-CRISPR (Acr) protein prediction, identification of novel class 2 CRISPR-Cas loci, and dissection of key CRISPR-associated molecular events. CRISPRimmunity is built on a suite of CRISPR-oriented databases that provide a comprehensive co-evolutionary perspective on CRISPR-Cas and anti-CRISPR systems. The platform achieved a high Acr prediction accuracy of 0.997 when tested on a dataset of 99 experimentally validated Acrs and 676 non-Acrs, outperforming other existing prediction tools. Some of the novel class 2 CRISPR-Cas loci identified with CRISPRimmunity have been experimentally validated for in vitro cleavage activity. CRISPRimmunity offers catalogues of pre-identified CRISPR systems to browse and query, collected resources and databases to download, a well-designed graphical interface, a detailed tutorial, multi-faceted information, and exportable results in machine-readable formats, making it easy to use and facilitating future experimental design and further data mining. The platform is available at http://www.microbiome-bigdata.com/CRISPRimmunity. The source code for batch analysis is also published on GitHub (https://github.com/HIT-ImmunologyLab/CRISPRimmunity).
Subject(s)
CRISPR-Cas Systems, Gene Editing, Gene Editing/methods, CRISPR-Cas Systems/genetics, Bacteria/genetics, Archaea/genetics, Computers
Abstract
Super-enhancers (SEs) are critical for the transcriptional regulation of gene expression. We developed the super-enhancer archive version 3.0 (SEA v. 3.0, http://sea.edbc.org) to extend SE research. SEA v. 3.0 provides the most comprehensive archive to date, consisting of 164 545 super-enhancers. Of these, 80 549 are newly identified from 266 cell types/tissues/diseases using an optimized computational strategy, and 52 have been experimentally confirmed with manually curated references. We now support super-enhancers in 11 species, including 7 new species (zebrafish, chicken, chimp, rhesus, sheep, Xenopus tropicalis and stickleback). To facilitate functional analysis of super-enhancers, we added several new regulatory datasets, including 3 361 785 typical enhancers, chromatin interactions, SNPs, transcription factor binding sites and SpCas9 target sites. We also updated or developed new tools for criteria-based queries, genome visualization and analysis. These include a tool based on Shannon entropy to evaluate SE cell type specificity, a new genome browser that visualizes SE spatial interactions based on Hi-C data, and an enhanced enrichment analysis interface that provides online enrichment analyses of SE-related genes. SEA v. 3.0 provides a comprehensive database of all available SE information across multiple species and will facilitate super-enhancer research, especially as related to development and disease.
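The abstract does not give SEA's exact entropy formula. As a hedged illustration only, the sketch below assumes the common approach of treating an SE's signal across N cell types as a probability distribution, computing its Shannon entropy H in bits, and rescaling to a hypothetical specificity score 1 - H / log2(N), where 1 means the SE is active in a single cell type and 0 means it is uniform. The function names and the input signal values are illustrative assumptions, not SEA's implementation.

```python
import math
from typing import Sequence


def shannon_entropy(signal: Sequence[float]) -> float:
    """Shannon entropy (bits) of a non-negative signal vector across cell types."""
    total = sum(signal)
    if total <= 0:
        raise ValueError("signal must contain at least one positive value")
    entropy = 0.0
    for value in signal:
        if value < 0:
            raise ValueError("signal values must be non-negative")
        if value > 0:  # zero-signal cell types contribute nothing (0 * log 0 := 0)
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


def specificity(signal: Sequence[float]) -> float:
    """Hypothetical normalized score: 1.0 = one cell type only, 0.0 = uniform."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two cell types")
    return 1.0 - shannon_entropy(signal) / math.log2(n)


# Illustrative (made-up) SE signal over four cell types.
print(specificity([10, 0, 0, 0]))  # restricted to one cell type
print(specificity([5, 5, 5, 5]))   # evenly spread across all four
```

Normalizing by log2(N) keeps scores comparable when SEs are profiled in different numbers of cell types; other entropy-based variants (for example a per-cell-type Q statistic) are equally plausible choices.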
Subject(s)
Nucleic Acid Databases, Genetic Enhancer Elements, Animals, Binding Sites, Chromatin, Humans, Single Nucleotide Polymorphism, Transcription Factors/metabolism
Abstract
The human disease methylation database (DiseaseMeth, http://bioinfo.hrbmu.edu.cn/diseasemeth/) is an interactive database that aims to present the most complete collection and annotation of aberrant DNA methylation in human diseases, especially various cancers. Recently, high-throughput microarray and sequencing technologies have accelerated the production of methylome data that contain comprehensive knowledge of human diseases. In this DiseaseMeth update, we have increased the number of samples from 3610 to 32 701, the number of diseases from 72 to 88 and the number of disease-gene associations from 216 201 to 679 602. DiseaseMeth version 2.0 provides an expanded, comprehensive list of disease-gene associations based on manual curation from experimental studies and computational identification from high-throughput methylome data. Besides the data expansion, we also updated the search engine and visualization tools. In particular, we enhanced the differential analysis tools, which now enable online automated identification of DNA methylation abnormalities in human disease in a case-control or disease-disease manner. To facilitate further mining of the disease methylome, we developed three new web tools for cluster analysis, functional annotation and survival analysis. DiseaseMeth version 2.0 should be a useful resource platform for further understanding the molecular mechanisms of human diseases.
Subject(s)
Computational Biology/methods, DNA Methylation, Genetic Databases, Search Engine, Epigenomics/methods, Gene Expression Profiling/methods, Genetic Association Studies/methods, Humans, Software, Web Browser
Abstract
Decades of overconsumption of antimicrobials in the treatment and prevention of bacterial infections have resulted in the increasing emergence of drug-resistant bacteria, which poses a significant challenge to public health and drives the urgent need to find alternatives to conventional antibiotics. Bacteriophages are viruses that infect specific bacterial hosts, often destroying them. Phages attach to and enter their potential hosts using their tail proteins, and the composition of the tail determines the range of bacteria that can potentially be infected. To aid the exploitation of bacteriophages for therapeutic purposes, we developed the PhageTailFinder algorithm to predict tail-related proteins and identify the putative tail module in previously uncharacterized phages. PhageTailFinder relies on a two-state hidden Markov model (HMM) to predict the probability that a given protein is tail-related. The process takes into account the natural modularity of phage tail-related proteins, rather than simply considering the amino acid properties or secondary structure of each protein in isolation. Because of this sequence-independent operation, PhageTailFinder exhibited robust predictive power for tail proteins in novel phages. The performance of the prediction model was evaluated on 13 extensively studied phages and a sample of 992 complete phages from the NCBI database. The algorithm achieved a high true-positive prediction rate (>80%) in over half (571) of the studied phages, and the area under the ROC curve (AUC) was 0.877 using general models and 0.968 using the corresponding morphologic models. Notably, the median AUC across the 992 complete phages exceeded 0.75 even for novel phages, indicating the high accuracy and specificity of PhageTailFinder. When applied to a dataset containing 189,680 viral genomes derived from 11,810 bulk metagenomic human stool samples, PhageTailFinder achieved an AUC of 0.895.
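To make the two-state HMM idea concrete, here is a minimal forward-backward sketch that walks along the ordered proteins of a phage genome and returns, for each protein, the posterior probability of the hypothetical "tail" state. Everything specific here is an assumption for illustration, not PhageTailFinder's actual model: the binary observation (0 = no tail-like evidence, 1 = tail-like evidence), the state names, and all start, transition and emission probabilities. The point it shows is the modularity argument from the abstract: a protein with no evidence of its own still gets an elevated tail probability when its neighbours look tail-like.

```python
from typing import Dict, List, Sequence

STATES = ("tail", "other")  # hypothetical two-state labelling


def posterior_tail(
    obs: Sequence[int],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Sequence[float]],
) -> List[float]:
    """Forward-backward posterior P(state_t == 'tail' | all observations)."""
    n = len(obs)
    # Forward pass, normalized at every step to avoid numerical underflow.
    fwd: List[Dict[str, float]] = []
    scales: List[float] = []
    for t, o in enumerate(obs):
        row = {}
        for s in STATES:
            prior = start[s] if t == 0 else sum(fwd[t - 1][r] * trans[r][s] for r in STATES)
            row[s] = prior * emit[s][o]
        z = sum(row.values())
        scales.append(z)
        fwd.append({s: v / z for s, v in row.items()})
    # Backward pass, scaled by the same per-step constants.
    bwd = [dict.fromkeys(STATES, 1.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in STATES:
            bwd[t][s] = sum(
                trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r] for r in STATES
            ) / scales[t + 1]
    # Combine both passes and normalize over the two states at each position.
    posteriors = []
    for t in range(n):
        joint = {s: fwd[t][s] * bwd[t][s] for s in STATES}
        posteriors.append(joint["tail"] / sum(joint.values()))
    return posteriors


# Hypothetical parameters; a real model would estimate these from annotated phages.
START = {"tail": 0.1, "other": 0.9}
TRANS = {"tail": {"tail": 0.8, "other": 0.2}, "other": {"tail": 0.05, "other": 0.95}}
EMIT = {"tail": [0.3, 0.7], "other": [0.9, 0.1]}

proteins = [0, 0, 1, 1, 0, 1, 1, 0, 0]  # protein 4 has no evidence but sits inside a tail-like run
probs = posterior_tail(proteins, START, TRANS, EMIT)
print([round(p, 2) for p in probs])
```

The high self-transition probability for "tail" is what encodes modularity in this sketch: tail genes tend to occur in contiguous runs, so evidence propagates to neighbouring proteins.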
In addition, tail protein clusters can be identified for further study using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. PhageTailFinder can be accessed either as a web server (http://www.microbiome-bigdata.com/PHISDetector/index/tools/PhageTailFinder) or as a stand-alone program on a standard desktop computer (https://github.com/HIT-ImmunologyLab/PhageTailFinder).
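The abstract names DBSCAN but not the features or parameters it is applied to. The following is a minimal, self-contained sketch of the standard DBSCAN procedure on 2-D points; treating each tail protein as a 2-D feature vector, and the `eps` and `min_pts` values, are illustrative assumptions. Points with at least `min_pts` neighbours within `eps` (counting themselves) are core points; clusters grow outward from core points, and points reachable from no core point are labelled noise (-1).

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        # Brute-force eps-neighbourhood (includes the point itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be reclaimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical 2-D feature vectors for seven proteins.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(features, eps=1.5, min_pts=3))
```

DBSCAN suits this task because the number of tail protein clusters is not known in advance and isolated proteins can be left unassigned rather than forced into a cluster. The brute-force neighbour search is O(n²); a production tool would use a spatial index.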
Abstract
As the intracellular form of a bacteriophage in the bacterial host genome, a prophage usually integrates into bacterial DNA with high specificity and contributes to horizontal gene transfer (HGT). With the exponentially increasing number of microbial sequences uncovered in genomic and metagenomic studies, there is a strong demand for a tool capable of fast and accurate prophage identification. Here, we introduce DBSCAN-SWA, a command-line software tool developed to predict prophage regions in bacterial genomes. DBSCAN-SWA runs faster than any previous tool. Importantly, it has high detection power: on 184 manually curated prophages, it achieved a recall of 85% for (multi-)FASTA sequences, compared with Phage_Finder (63%), VirSorter (74%) and PHASTER (82%). Moreover, DBSCAN-SWA outperforms existing standalone prophage prediction tools on high-throughput sequencing data, based on an analysis of 19,989 contigs from 400 bacterial genomes collected by the Human Microbiome Project (HMP). DBSCAN-SWA also provides user-friendly result visualizations, including a circular prophage viewer and interactive DataTables. DBSCAN-SWA is implemented in Python 3 and is available under the open-source GPLv2 license from https://github.com/HIT-ImmunologyLab/DBSCAN-SWA/.
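Recall here is the fraction of curated prophages that a tool recovers: recall = detected / total curated. The sketch below shows one way to compute it from region coordinates. The matching rule (a curated prophage counts as detected if any predicted region on the same contig overlaps it by at least one base) and the `(contig, start, end)` tuple layout are assumptions for illustration; the abstract does not state the criterion used in the benchmark.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # hypothetical layout: (contig id, 1-based start, inclusive end)


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions lie on the same contig and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def recall(reference: List[Region], predicted: List[Region]) -> float:
    """Fraction of reference (curated) prophages hit by at least one prediction."""
    if not reference:
        raise ValueError("reference set is empty")
    detected = sum(1 for ref in reference if any(overlaps(ref, p) for p in predicted))
    return detected / len(reference)


# Made-up coordinates for illustration only.
curated = [("c1", 100, 200), ("c1", 500, 800), ("c2", 10, 90)]
calls = [("c1", 150, 260), ("c2", 1000, 2000)]
print(recall(curated, calls))  # only the first curated prophage is overlapped
```

Requiring a minimum overlap fraction instead of any overlap would make the metric stricter; whichever rule is used should be applied identically to every tool being compared.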
Abstract
Rationale: In breast cancer, high intratumor DNA methylation heterogeneity can lead to drug resistance, metastasis and poor prognosis, which increases the complexity of cancer diagnosis and treatment. However, most studies are limited to the average DNA methylation level of individual CpGs and ignore the heterogeneous DNA methylation patterns of cell subpopulations within the tumor. Quantifying the variability of DNA methylation patterns in sequencing reads is therefore valuable for understanding intratumor heterogeneity. Methods: We performed reduced representation bisulfite sequencing and RNA sequencing on tumor core and tumor periphery regions within one breast tumor. Using a method we developed, named "epialleJS" and based on Jensen-Shannon divergence, we detected the differential epialleles between tumor core and tumor periphery (CPDEs). We then explored the correlation between intratumor methylation heterogeneity and the hypoxic microenvironment in the TCGA breast cancer cohort. Results: More than 70% of CPDEs had higher epipolymorphism in the tumor core than in the tumor periphery, and these CPDEs had lower methylation in the tumor core. The CPDEs with lower methylation in the tumor core may be associated with the hypoxic tumor microenvironment. Moreover, we identified a signature of five hypoxia-related DNA methylation markers that can predict the prognosis of breast cancer patients, including the CpG site cg15190451 in the gene SLC16A5. Furthermore, immunohistochemical analysis confirmed that SLC16A5 expression was associated with the clinicopathological characteristics and survival of breast cancer patients. Conclusions: The epiallele-based analysis of intratumor DNA methylation heterogeneity reveals that disordered methylation patterns in the tumor core are associated with the hypoxic microenvironment. This provides a framework for understanding heterogeneous biological behavior and guidance for developing effective treatment schemes for breast cancer patients.
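The two quantities named above can be sketched directly. Assuming each bisulfite read covering a fixed run of CpGs is encoded as a pattern string (1 = methylated, 0 = unmethylated; this encoding is an assumption), the epiallele distribution is the frequency of each pattern. Epipolymorphism is shown in its widely used form, 1 - Σ p_i², and the Jensen-Shannon divergence between two distributions P and Q is ½·KL(P‖M) + ½·KL(Q‖M) with M = ½(P + Q), in bits, so it lies between 0 and 1. This is a minimal sketch of the general technique, not the epialleJS implementation, and the reads below are made up.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def epiallele_freqs(reads: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each methylation pattern (epiallele) among reads."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def epipolymorphism(freqs: Dict[str, float]) -> float:
    """1 - sum(p_i^2): 0 when one epiallele dominates, higher when patterns are mixed."""
    return 1.0 - sum(p * p for p in freqs.values())


def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence (bits) between two epiallele distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a: Dict[str, float]) -> float:
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Made-up reads over four consecutive CpGs.
core = epiallele_freqs(["1111", "0000", "1010", "0101"])
periphery = epiallele_freqs(["1111", "1111", "1111", "1111"])
print(epipolymorphism(core), epipolymorphism(periphery))
print(jensen_shannon(core, periphery))
```

A real analysis would also need a significance test (for example by permuting reads between regions) before calling a locus a differential epiallele.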
Subject(s)
Breast Neoplasms/genetics, Hypoxia/genetics, Tumor Microenvironment/genetics, CpG Islands/genetics, DNA Methylation/genetics, Female, Neoplastic Gene Expression Regulation/genetics, Humans, Monocarboxylic Acid Transporters/genetics, Prognosis
Abstract
The global coronavirus disease 2019 (COVID-19) pandemic caused by SARS-CoV-2 has affected more than eight million people. There is an urgent need to investigate how adaptive immunity is established in COVID-19 patients. In this study, we profiled adaptive immune cells in PBMCs from recovered COVID-19 patients with varying disease severity using single-cell RNA and TCR/BCR V(D)J sequencing. The sequencing data revealed SARS-CoV-2-specific shuffling of adaptive immune repertoires and COVID-19-induced remodeling of peripheral lymphocytes. Characterization of variations in the peripheral T and B cells of COVID-19 patients revealed a positive correlation of the humoral immune response and T-cell immune memory with disease severity. Sequencing and functional data revealed SARS-CoV-2-specific T-cell immune memory in convalescent COVID-19 patients. Furthermore, we identified novel antigens that elicit responses in convalescent patients. Altogether, our study reveals the adaptive immune repertoires underlying pathogenesis and recovery in severe versus mild COVID-19 patients, providing valuable information for potential vaccine and therapeutic development against SARS-CoV-2 infection.
Subject(s)
B-Lymphocytes/immunology, Betacoronavirus/pathogenicity, Coronavirus Infections/immunology, Cellular Immunity, Humoral Immunity, Viral Pneumonia/immunology, T-Lymphocytes/immunology, Viral Antigens/genetics, Viral Antigens/immunology, B-Lymphocytes/classification, B-Lymphocytes/virology, Betacoronavirus/immunology, COVID-19, Case-Control Studies, China, Convalescence, Coronavirus Infections/genetics, Coronavirus Infections/pathology, Coronavirus Infections/virology, Disease Progression, Gene Expression, High-Throughput Nucleotide Sequencing, Host-Pathogen Interactions/immunology, Humans, Immunologic Memory, Pandemics, Viral Pneumonia/genetics, Viral Pneumonia/pathology, Viral Pneumonia/virology, B-Cell Antigen Receptors/classification, B-Cell Antigen Receptors/genetics, B-Cell Antigen Receptors/immunology, T-Cell Antigen Receptors/classification, T-Cell Antigen Receptors/genetics, T-Cell Antigen Receptors/immunology, SARS-CoV-2, Severity of Illness Index, Single-Cell Analysis, T-Lymphocytes/classification, T-Lymphocytes/virology
Abstract
Several studies have found that DNA methylation is associated with transcriptional regulation and affects the sponge regulation of non-coding RNAs in cancer. Integrating circRNA, miRNA, DNA methylation and gene expression data to identify sponge circRNAs is important for revealing the role of DNA methylation-mediated regulation of sponge circRNAs in cancer progression. We established a DNA methylation-mediated circRNA crosstalk network by integrating gene expression, DNA methylation and non-coding RNA data from TCGA breast cancer samples. Four modules (26 candidate circRNAs) were mined. Next, using a computational process, we identified 10 DNA methylation-mediated sponge circRNAs (sp_circRNAs) and five sponge driver genes (sp_driver genes) in breast cancer in the CMD network. Among the identified driver genes, ERBB2 was associated with six sponge circRNAs, suggesting a prominent sponge regulatory role. Survival analysis showed that DNA methylation of the 10 sponge circRNA host genes is a potential prognostic biomarker in the TCGA dataset (p = 0.0239) and the GSE78754 dataset (p = 0.0377). In addition, DNA methylation of two sponge circRNA host genes showed a significant negative correlation with the expression of their driver genes. We developed a strategy to predict DNA methylation-mediated sponge circRNAs that regulate sponge driver genes in breast cancer.
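The negative correlation between host-gene methylation and driver-gene expression is the most directly computable step described above. The abstract does not say which correlation coefficient was used, so the sketch below assumes Pearson's r, computed from scratch; the methylation beta values and expression levels are hypothetical. A value near -1 means that samples with higher host-gene methylation tend to have lower driver-gene expression.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample values for one host gene / driver gene pair.
host_methylation = [0.1, 0.3, 0.5, 0.7, 0.9]  # beta values
driver_expression = [9.0, 7.0, 5.0, 3.0, 1.0]  # e.g. log2 expression
print(pearson(host_methylation, driver_expression))
```

Significance would additionally require a p-value (for example from the t distribution with n - 2 degrees of freedom, or a permutation test) and multiple-testing correction across all host/driver pairs.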
Abstract
Tumour heterogeneity is an obstacle to effective breast cancer diagnosis and therapy. DNA methylation is an important regulator of gene expression, so characterizing tumour heterogeneity by epigenetic features can be clinically informative. In this study, we explored specific prognostic subtypes based on DNA methylation status in 669 breast cancers from the TCGA database. Nine subgroups were distinguished by consensus clustering using 3869 CpGs that significantly influenced survival. The specific DNA methylation patterns differed by race, age, tumour stage, receptor status, histological type, metastasis status and prognosis. Compared with the PAM50 subtypes, which are based on gene expression clustering, the DNA methylation subtypes were more fine-grained and divided the Basal-like subtype into two subgroups with different prognoses. Additionally, 1252 CpGs (corresponding to 888 genes) were identified as specific hyper/hypomethylation sites for each subgroup. Finally, a prognosis model based on Bayesian network classification was constructed and used to classify the test set into DNA methylation subgroups, which corresponded to the classification results of the training set. These DNA methylation-based classifications can explain the heterogeneity of previous molecular subgroups in breast cancer and will help in the development of personalized treatments for the new specific subtypes.
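Consensus clustering repeatedly clusters random subsamples of tumours and records how often each pair of tumours lands in the same cluster. The sketch below shows only that aggregation step: given the label assignments from several subsampled runs (produced by whatever base clustering algorithm is used; the abstract does not name it), it builds the consensus matrix M[i][j] = (runs in which i and j share a cluster) / (runs in which both were sampled). The sample names and runs are made up for illustration.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence


def consensus_matrix(
    samples: Sequence[Hashable], runs: List[Dict[Hashable, int]]
) -> List[List[float]]:
    """Pairwise co-clustering frequency across subsampled clustering runs."""
    index = {s: i for i, s in enumerate(samples)}
    n = len(samples)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in runs:  # each run maps only the subsampled items to a cluster label
        present = [s for s in samples if s in labels]
        for a, b in combinations(present, 2):
            i, j = index[a], index[b]
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[a] == labels[b]:
                together[i][j] += 1
                together[j][i] += 1
    return [
        [
            1.0 if i == j else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three hypothetical subsampled runs over three tumours.
runs = [{"A": 0, "B": 0, "C": 1}, {"A": 1, "B": 1}, {"A": 0, "C": 0}]
for row in consensus_matrix(["A", "B", "C"], runs):
    print(row)
```

The final subgroups are then typically obtained by clustering this matrix, with the number of clusters chosen where consensus values are most cleanly separated into values near 0 and near 1.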