Results 1 - 20 of 159
1.
Nat Methods ; 21(4): 574-583, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38459383

ABSTRACT

Draft genomes generated from Oxford Nanopore Technologies (ONT) long reads are known to have a relatively high error rate. Although existing genome polishers can enhance their quality, the error rate (including mismatches, indels and switching errors between paternal and maternal haplotypes) can remain significant. Here, we develop two polishers, hypo-short and hypo-hybrid, to address this issue. Hypo-short utilizes Illumina short reads to polish an ONT-based draft assembly, resulting in a high-quality assembly with low error rates and switching errors. Expanding on this, hypo-hybrid incorporates ONT long reads to further refine the assembly into a diploid representation. Leveraging hypo-hybrid, we have created a diploid genome assembly pipeline called hypo-assembler. Hypo-assembler automates the generation of highly accurate, contiguous and nearly complete diploid assemblies using ONT long reads, Illumina short reads and optionally Hi-C reads. Notably, our solution even allows for the production of telomere-to-telomere diploid genomes with additional manual steps. As a proof of concept, we successfully assembled a fully phased telomere-to-telomere diploid genome of HG00733, achieving a quality value exceeding 50.
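
To put the "quality value exceeding 50" in context, the following is a minimal sketch that assumes the quality value (QV) follows the usual Phred-scaled convention, QV = -10 * log10(per-base error rate). The abstract does not define the measure, and the function names are hypothetical, chosen only for illustration.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error rate.

    Assumes QV = -10 * log10(error_rate); this convention is an
    assumption, not something stated in the abstract.
    """
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error rate in (0, 1] to a Phred-scaled QV."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # Under this convention, QV 50 corresponds to about one error
    # per 100,000 bases.
    print(qv_to_error_rate(50))
    print(error_rate_to_qv(1e-5))
```

Under this assumption, an assembly with a QV above 50 would contain fewer than roughly one error per 100,000 bases.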


Subject(s)
Nanopores; Diploidy; Haploidy; High-Throughput Nucleotide Sequencing/methods; Telomere/genetics; Sequence Analysis, DNA/methods
2.
Cell ; 148(1-2): 84-98, 2012 Jan 20.
Article in English | MEDLINE | ID: mdl-22265404

ABSTRACT

Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intragenic, extragenic, and intergenic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically-identified disease-associated noncoding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells.


Subject(s)
Chromatin/metabolism; Gene Expression Regulation; Promoter Regions, Genetic; RNA Polymerase II/metabolism; Transcription, Genetic; Cell Line, Tumor; Chromatin Immunoprecipitation; Enhancer Elements, Genetic; Genome-Wide Association Study; Humans
3.
PLoS Biol ; 20(10): e3001834, 2022 10.
Article in English | MEDLINE | ID: mdl-36223339

ABSTRACT

Neural stem cells (NSCs) divide asymmetrically to balance their self-renewal and differentiation, an imbalance in which can lead to NSC overgrowth and tumor formation. The functions of Parafibromin, a conserved tumor suppressor, in the nervous system are not established. Here, we demonstrate that Drosophila Parafibromin/Hyrax (Hyx) inhibits ectopic NSC formation by governing cell polarity. Hyx is essential for the asymmetric distribution and/or maintenance of polarity proteins. hyx depletion results in the symmetric division of NSCs, leading to the formation of supernumerary NSCs in the larval brain. Importantly, we show that human Parafibromin rescues the ectopic NSC phenotype in Drosophila hyx mutant brains. We have also discovered that Hyx is required for the proper formation of interphase microtubule-organizing center and mitotic spindles in NSCs. Moreover, Hyx is required for the proper localization of 2 key centrosomal proteins, Polo and AurA, and the microtubule-binding proteins Msps and D-TACC in dividing NSCs. Furthermore, Hyx directly regulates the polo and aurA expression in vitro. Finally, overexpression of polo and aurA could significantly suppress ectopic NSC formation and NSC polarity defects caused by hyx depletion. Our data support a model in which Hyx promotes the expression of polo and aurA in NSCs and, in turn, regulates cell polarity and centrosome/microtubule assembly. This new paradigm may be relevant to future studies on Parafibromin/HRPT2-associated cancers.


Subject(s)
Drosophila Proteins; Neural Stem Cells; Animals; Cell Polarity; Centrosome/metabolism; Drosophila/metabolism; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Humans; Neural Stem Cells/metabolism; Transcription Factors/metabolism
4.
Nucleic Acids Res ; 51(17): 9001-9018, 2023 09 22.
Article in English | MEDLINE | ID: mdl-37572350

ABSTRACT

Photoperiods integrate with the circadian clock to coordinate gene expression rhythms and thus ensure plant fitness to the environment. Genome-wide characterization and comparison of rhythmic genes under different light conditions revealed delayed phase under constant darkness (DD) and reduced amplitude under constant light (LL) in rice. Interestingly, ChIP-seq and RNA-seq profiling of rhythmic genes exhibit synchronous circadian oscillation in H3K9ac modifications at their loci and long non-coding RNAs (lncRNAs) expression at proximal loci. To investigate how gene expression rhythm is regulated in rice, we profiled the open chromatin regions and transcription factor (TF) footprints by time-series ATAC-seq. Although open chromatin regions did not show circadian change, a significant number of TFs were identified to rhythmically associate with chromatin and drive gene expression in a time-dependent manner. Further mapping of transcriptional regulatory networks uncovered significant correlation between core clock genes and transcription factors involved in light/temperature signaling. In situ Hi-C of ZT8-specific expressed genes displayed highly connected chromatin association at the same time, whereas this ZT8 chromatin connection network dissociates at ZT20, suggesting the circadian control of gene expression by dynamic spatial chromatin conformation. These findings together implicate the existence of a synchronization mechanism between circadian H3K9ac modifications, chromatin association of TF and gene expression, and provide insights into circadian dynamics of spatial chromatin conformation that associate with gene expression rhythms.


Subject(s)
Circadian Rhythm; Oryza; Chromatin/genetics; Circadian Clocks/genetics; Circadian Rhythm/genetics; Epigenome; Gene Expression Profiling; Oryza/genetics; Oryza/physiology; Transcription Factors/genetics
5.
Nucleic Acids Res ; 50(D1): D60-D71, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34664666

ABSTRACT

DNA methylation is known to be the most stable epigenetic modification and has been extensively studied in relation to cell differentiation, development, X chromosome inactivation and disease. Allele-specific DNA methylation (ASM) is a well-established mechanism for genomic imprinting and regulates imprinted gene expression. Previous studies have confirmed that certain special regions with ASM are susceptible and closely related to human carcinogenesis and plant development. In addition, recent studies have proven ASM to be an effective tumour marker. However, research on the functions of ASM in diseases and development is still extremely scarce. Here, we collected 4400 BS-Seq datasets and 1598 corresponding RNA-Seq datasets from 47 species, including human and mouse, to establish a comprehensive ASM database. We obtained the data on DNA methylation level, ASM and allele-specific expressed genes (ASEGs) and further analysed the ASM/ASEG distribution patterns of these species. In-depth ASM distribution analysis and differential methylation analysis conducted in nine cancer types showed results consistent with the reported changes in ASM in key tumour genes and revealed several potential ASM tumour-related genes. Finally, integrating these results, we constructed the first well-resourced and comprehensive ASM database for 47 species (ASMdb, www.dna-asmdb.com).


Subject(s)
DNA Methylation/genetics; Databases, Genetic; Epigenesis, Genetic/genetics; Genomic Imprinting/genetics; Alleles; Animals; CpG Islands/genetics; Humans; Mice; Polymorphism, Single Nucleotide/genetics; RNA-Seq; X Chromosome Inactivation/genetics
6.
EMBO Rep ; 22(4): e50994, 2021 04 07.
Article in English | MEDLINE | ID: mdl-33565211

ABSTRACT

The ability of neural stem cells (NSCs) to switch between quiescence and proliferation is crucial for brain development and homeostasis. Increasing evidence suggests that variants of histone lysine methyltransferases including KMT5A are associated with neurodevelopmental disorders. However, the function of KMT5A/Pr-set7/SETD8 in the central nervous system is not well established. Here, we show that Drosophila Pr-Set7 is a novel regulator of NSC reactivation. Loss of function of pr-set7 causes a delay in NSC reactivation and loss of H4K20 monomethylation in the brain. Through NSC-specific in vivo profiling, we demonstrate that Pr-set7 binds to the promoter region of cyclin-dependent kinase 1 (cdk1) and Wnt pathway transcriptional co-activator earthbound1/jerky (ebd1). Further validation indicates that Pr-set7 is required for the expression of cdk1 and ebd1 in the brain. Similar to Pr-set7, Cdk1 and Ebd1 promote NSC reactivation. Finally, overexpression of Cdk1 and Ebd1 significantly suppressed NSC reactivation defects observed in pr-set7-depleted brains. Therefore, Pr-set7 promotes NSC reactivation by regulating Wnt signaling and cell cycle progression. Our findings may contribute to the understanding of mammalian KMT5A/PR-SET7/SETD8 during brain development.


Subject(s)
Histones; Neural Stem Cells; Animals; CDC2 Protein Kinase; Histone-Lysine N-Methyltransferase/genetics; Histone-Lysine N-Methyltransferase/metabolism; Neural Stem Cells/metabolism
7.
Cell ; 133(6): 1106-17, 2008 Jun 13.
Article in English | MEDLINE | ID: mdl-18555785

ABSTRACT

Transcription factors (TFs) and their specific interactions with targets are crucial for specifying gene-expression programs. To gain insights into the transcriptional regulatory networks in embryonic stem (ES) cells, we use chromatin immunoprecipitation coupled with ultra-high-throughput DNA sequencing (ChIP-seq) to map the locations of 13 sequence-specific TFs (Nanog, Oct4, STAT3, Smad1, Sox2, Zfx, c-Myc, n-Myc, Klf4, Esrrb, Tcfcp2l1, E2f1, and CTCF) and 2 transcription regulators (p300 and Suz12). These factors are known to play different roles in ES-cell biology as components of the LIF and BMP signaling pathways, self-renewal regulators, and key reprogramming factors. Our study provides insights into the integration of the signaling pathways into the ES-cell-specific transcription circuitries. Intriguingly, we find specific genomic regions extensively targeted by different TFs. Collectively, the comprehensive mapping of TF-binding sites identifies important features of the transcriptional regulatory networks that define ES-cell identity.


Subject(s)
Embryonic Stem Cells/metabolism; Gene Regulatory Networks; Signal Transduction; Animals; Base Sequence; Binding Sites; Chromatin Immunoprecipitation; Genome; Kruppel-Like Factor 4; Mice; Multiprotein Complexes; Transcription Factors/metabolism
8.
Nucleic Acids Res ; 49(6): e33, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33444454

ABSTRACT

A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations, while predicting a large number of false positives. We observe that the inaccuracy is caused by incorrect alignment of reads in repetitive regions. False alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of reads which are crucial for the discovery of integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise while it also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations, and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions which had been overlooked, as well as discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts.


Subject(s)
Algorithms; Virus Integration; Alphapapillomavirus/genetics; Datasets as Topic; Genome, Human; Hepatitis B virus/genetics; Humans; Repetitive Sequences, Nucleic Acid; Sequence Analysis, RNA; Software
9.
Nucleic Acids Res ; 49(19): 10879-10894, 2021 11 08.
Article in English | MEDLINE | ID: mdl-34643730

ABSTRACT

Large indels greatly impact the observable phenotypes in different organisms including plants and humans. Hence, extracting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in 1047 Arabidopsis whole-genome sequencing data. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate compared with the previous dataset of AthCNV (1). We captured nearly twice as many ground truth deletions and on average 27% more ground truth duplications compared with AthCNV, though our dataset contains fewer large indels than AthCNV. Our large indels were positively correlated with transposon elements across the Arabidopsis genome. The non-homologous recombination events were the major formation mechanism of deletions in the Arabidopsis genome. The neighbor-joining (NJ) tree constructed based on IndelEnsembler's deletions clearly divided the geographic subgroups of the 1047 Arabidopsis accessions. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions have low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms. Some of them could affect trait performance. For instance, using deletion-based genome-wide association study (DEL-GWAS), the accessions containing a 182-bp deletion in AT1G11520 had delayed flowering time and all accessions in north Sweden had the 182-bp deletion. We also found the accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. These two deletions cannot be detected in AthCNV and, interestingly, they do not co-occur in any Arabidopsis thaliana accession. By SNP-GWAS, surrounding SNPs of these two deletions do not correlate with flowering time. This example demonstrated that existing large indel datasets miss phenotypic variations and our large indel dataset filled in the gap.


Subject(s)
Arabidopsis/genetics; Flowers/genetics; Gene Expression Regulation, Plant; Genome, Plant; INDEL Mutation; Software; Arabidopsis/classification; Arabidopsis/growth & development; Arabidopsis/metabolism; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; DNA Transposable Elements; Datasets as Topic; Flowers/growth & development; Flowers/metabolism; Gene Duplication; Gene Expression Regulation, Developmental; Genome-Wide Association Study; Linkage Disequilibrium; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait, Heritable; Recombination, Genetic
10.
Genome Res ; 29(2): 223-235, 2019 02.
Article in English | MEDLINE | ID: mdl-30606742

ABSTRACT

The aberrant activities of transcription factors such as the androgen receptor (AR) underpin prostate cancer development. While the AR cis-regulation has been extensively studied in prostate cancer, information pertaining to the spatial architecture of the AR transcriptional circuitry remains limited. In this paper, we propose a novel framework to profile long-range chromatin interactions associated with AR and its collaborative transcription factor, erythroblast transformation-specific related gene (ERG), using chromatin interaction analysis by paired-end tag (ChIA-PET). We identified ERG-associated long-range chromatin interactions as a cooperative component in the AR-associated chromatin interactome, acting in concert to achieve coordinated regulation of a subset of AR target genes. Through multifaceted functional data analysis, we found that AR-ERG interaction hub regions are characterized by distinct functional signatures, including bidirectional transcription and cotranscription factor binding. In addition, cancer-associated long noncoding RNAs were found to be connected near protein-coding genes through AR-ERG looping. Finally, we found strong enrichment of prostate cancer genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) at AR-ERG co-binding sites participating in chromatin interactions and gene regulation, suggesting GWAS target genes identified from chromatin looping data provide more biologically relevant findings than using the nearest gene approach. Taken together, our results revealed an AR-ERG-centric higher-order chromatin structure that drives coordinated gene expression in prostate cancer progression and the identification of potential target genes for therapeutic intervention.


Subject(s)
Chromatin/metabolism; Gene Expression Regulation, Neoplastic; Prostatic Neoplasms/genetics; Receptors, Androgen/metabolism; Transcription, Genetic; Cell Line, Tumor; Chromatin/chemistry; Gene Regulatory Networks; Genome, Human; Humans; Male; Oncogene Proteins, Fusion/analysis; Polymorphism, Single Nucleotide; Prostatic Neoplasms/metabolism; RNA, Long Noncoding/metabolism; Transcriptional Regulator ERG/metabolism; Transcriptional Regulator ERG/physiology
11.
Bioinformatics ; 37(11): 1497-1505, 2021 Jul 12.
Article in English | MEDLINE | ID: mdl-30989231

ABSTRACT

MOTIVATION: Structural variations (SVs) are large scale mutations in a genome; although less frequent than point mutations, due to their large size they are responsible for more heritable differences between individuals. Two prominent classes of SVs are deletions and tandem duplications. They play important roles in many devastating genetic diseases, such as Smith-Magenis syndrome, Potocki-Lupski syndrome and Williams-Beuren syndrome. Since paired-end whole genome sequencing data have become widespread and affordable, reliably calling deletions and tandem duplications has been a major target in bioinformatics; unfortunately, the problem is far from being solved, since existing solutions often offer poor results when applied to real data. RESULTS: We developed a novel caller, SurVIndel, which focuses on detecting deletions and tandem duplications from paired next-generation sequencing data. SurVIndel uses discordant paired reads, clipped reads as well as statistical methods. We show that SurVIndel outperforms existing methods on both simulated and real biological datasets. AVAILABILITY AND IMPLEMENTATION: SurVIndel is available at https://github.com/Mesh89/SurVIndel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
Bioinformatics ; 37(13): 1821-1827, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-33453108

ABSTRACT

MOTIVATION: Virus integration in the host genome is frequently reported to be closely associated with many human diseases, and the detection of virus integration is a critically challenging task. However, most existing tools show limited specificity and sensitivity. Therefore, the objective of this study is to develop a method for accurate detection of virus integration into host genomes. RESULTS: Herein, we report a novel method termed HIVID2 that is a significant upgrade of HIVID. HIVID2 performs a paired-end combination (PE-combination) for potentially integrated reads. The resulting sequences are then remapped onto the reference genomes, and both split and discordant chimeric reads are used to identify accurate integration breakpoints with high confidence. HIVID2 represents a great improvement in specificity and sensitivity, and predicts breakpoints closer to the real integrations, compared with existing methods. The advantage of our method was demonstrated using both simulated and real datasets. HIVID2 uncovered novel integration breakpoints in well-known cervical cancer-related genes, including FHIT and LRP1B, which was verified using protein expression data. In addition, HIVID2 allows the user to decide whether to automatically perform advanced analysis using the identified virus integrations. By analyzing the simulated data and real data tests, we demonstrated that HIVID2 is not only more accurate than HIVID but also better than other existing programs with respect to both sensitivity and specificity. We believe that HIVID2 will help in enhancing future research associated with virus integration. AVAILABILITY AND IMPLEMENTATION: HIVID2 can be accessed at https://github.com/zengxi-hada/HIVID2/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

13.
BMC Genomics ; 22(1): 581, 2021 Jul 30.
Article in English | MEDLINE | ID: mdl-34330207

ABSTRACT

BACKGROUND: The Muscovy duck (Cairina moschata) is an economically important duck species, with favourable growth and carcass composition parameters in comparison to other ducks. However, limited genomic resources for Muscovy duck hinder our understanding of its evolution and genetic diversity. RESULTS: We combined linked-reads sequencing technology and reference-guided methods for de novo genome assembly. The final draft assembly was 1.12 Gbp with 29 autosomes, one sex chromosome and 4,583 unlocalized scaffolds with an N50 size of 77.35 Mb. Based on universal single-copy orthologues (BUSCO), the draft genome assembly completeness was estimated to be 93.30 %. Genome annotation identified 15,580 genes, with 15,537 (99.72 %) genes annotated in public databases. We conducted comparative genomic analyses and found that species-specific and rapidly expanding gene families (compared to other birds) in Muscovy duck are mainly involved in Calcium signaling, Adrenergic signaling in cardiomyocytes, and GnRH signaling pathways. In comparison to the common domestic duck (Anas platyrhynchos), we identified 104 genes exhibiting strong signals of adaptive evolution (Ka/Ks > 1). Most of these genes were associated with immune defence pathways (e.g. IFNAR1 and TLR5). This is indicative of the existence of differences in the immune responses between the two species. Additionally, we combined divergence and polymorphism data to demonstrate the "faster-Z effect" of chromosome evolution. CONCLUSIONS: The chromosome-level genome assembly of Muscovy duck and comparative genomic analyses provide valuable resources for future molecular ecology studies, as well as the evolutionary arms race between the host and influenza viruses.
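
The N50 statistic quoted above (77.35 Mb) can be illustrated with a minimal sketch. It assumes the common definition of N50: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The abstract does not spell out its definition, and the function name and toy lengths are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold or contig lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length at which the running total first reaches half of the
    total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Toy lengths, not data from the study.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A larger N50 generally indicates a more contiguous assembly, although it says nothing about base-level accuracy or completeness.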


Subject(s)
Ducks; Genomics; Animals; Birds; Chromosomes; Ducks/genetics; Genome; Humans
14.
Genet Sel Evol ; 53(1): 35, 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33849442

ABSTRACT

BACKGROUND: The most prolific duck genetic resource in the world is located in Southeast/South Asia but little is known about the domestication and complex histories of these duck populations. RESULTS: Based on whole-genome resequencing data of 78 ducks (Anas platyrhynchos) and 31 published whole-genome duck sequences, we detected three geographically distinct genetic groups, including local Chinese, wild, and local Southeast/South Asian populations. We inferred the demographic history of these duck populations with different geographical distributions and found that the Chinese and Southeast/South Asian ducks shared similar demographic features. The Chinese domestic ducks experienced the strongest population bottleneck caused by domestication and the last glacial maximum (LGM) period, whereas the Chinese wild ducks experienced a relatively weak bottleneck caused by domestication only. Furthermore, the bottleneck was more severe in the local Southeast/South Asian populations than in the local Chinese populations, which resulted in a smaller effective population size for the former (7100-11,900). We show that extensive gene flow has occurred between the Southeast/South Asian and Chinese populations, and between the Southeast Asian and South Asian populations. Prolonged gene flow was detected between the Guangxi population from China and its neighboring Southeast/South Asian populations. In addition, based on multiple statistical approaches, we identified a genomic region that included three genes (PNPLA8, THAP5, and DNAJB9) on duck chromosome 1 with a high probability of gene flow between the Guangxi and Southeast/South Asian populations. Finally, we detected strong signatures of selection in genes that are involved in signaling pathways of the nervous system development (e.g., ADCYAP1R1 and PDC) and in genes that are associated with morphological traits such as cell growth (e.g., IGF1R). CONCLUSIONS: Our findings provide valuable information for a better understanding of the domestication and demographic history of the duck, and of the gene flow between local duck populations from Southeast/South Asia and China.


Subject(s)
Domestication; Ducks/genetics; Gene Flow; Animals; Avian Proteins/genetics; Chromosomes/genetics; Ducks/classification; Phylogeny; Selection, Genetic; Whole Genome Sequencing
15.
BMC Bioinformatics ; 21(1): 451, 2020 Oct 12.
Article in English | MEDLINE | ID: mdl-33045983

ABSTRACT

BACKGROUND: DNA methylation is an important epigenetic modification that plays a critical role in most eukaryotic organisms. Parental alleles in haploid genomes may exhibit different methylation patterns, which can lead to different phenotypes and even different therapeutic and drug responses to diseases. However, to our knowledge, no software is available for the identification of DNA methylation haplotype regions with combined allele-specific DNA methylation, single nucleotide polymorphisms (SNPs) and high-throughput chromosome conformation capture (Hi-C) data. RESULTS: In this paper, we developed a new method, MethHaplo, that identifies DNA methylation haplotype regions with allele-specific DNA methylation and SNPs from whole-genome bisulfite sequencing (WGBS) data. Our results showed that methylation haplotype regions were ten times longer than haplotypes with SNPs only. When we integrated WGBS and Hi-C data, MethHaplo could call even longer haplotypes. CONCLUSIONS: This study illustrates the usefulness of methylation haplotypes. By constructing methylation haplotypes for various cell lines, we provide a clearer picture of the effect of DNA methylation on gene expression, histone modification and three-dimensional chromosome structure at the haplotype level. Our method could benefit the study of parental inheritance-related disease and hybrid vigor in agriculture.


Subject(s)
Alleles; DNA Methylation; Haplotypes/genetics; Polymorphism, Single Nucleotide; Whole Genome Sequencing; Epigenesis, Genetic; Software
16.
Nucleic Acids Res ; 46(20): e122, 2018 11 16.
Article in English | MEDLINE | ID: mdl-30137425

ABSTRACT

Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose a SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques and other techniques like clustering and positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
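
The F1-score used above to compare callers can be sketched as follows. This is a generic illustration of the standard definition (the harmonic mean of precision and recall), not the evaluation code of TranSurVeyor; the function name and the counts are hypothetical.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Compute the F1-score from raw call counts.

    precision = TP / (TP + FP), recall = TP / (TP + FN), and
    F1 = 2 * precision * recall / (precision + recall).
    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 80 correct calls, 20 false calls, 20 missed events.
    print(f1_score(80, 20, 20))
```

Because F1 penalizes both false positives and missed events, a caller that reports many spurious transpositions scores poorly even when its recall is high.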


Subject(s)
Algorithms; Computational Biology/methods; DNA Transposable Elements/genetics; Databases, Factual; High-Throughput Nucleotide Sequencing/methods; Genome, Human/genetics; Genomics/methods; Humans; Mutagenesis, Insertional; Reproducibility of Results
17.
BMC Bioinformatics ; 20(1): 47, 2019 Jan 22.
Article in English | MEDLINE | ID: mdl-30669962

ABSTRACT

BACKGROUND: DNA methylation plays crucial roles in most eukaryotic organisms. Bisulfite sequencing (BS-Seq) is a sequencing approach that provides quantitative cytosine methylation levels in genome-wide scope and single-base resolution. However, genomic variations such as insertions and deletions (indels) affect methylation calling, and the alignment of reads near/across indels becomes inaccurate in the presence of polymorphisms. Hence, the simultaneous detection of DNA methylation and indels is important for exploring the mechanisms of functional regulation in organisms. RESULTS: These problems motivated us to develop the algorithm BatMeth2, which can align BS reads with high accuracy while allowing for variable-length indels with respect to the reference genome. The results from simulated and real bisulfite DNA methylation data demonstrated that our proposed method increases alignment accuracy. Additionally, BatMeth2 can calculate the methylation levels of individual loci, genomic regions or functional regions such as genes/transposable elements. Additional programs were also developed to provide methylation data annotation, visualization, and differentially methylated cytosine/region (DMC/DMR) detection. The whole package provides new tools and will benefit bisulfite data analysis. CONCLUSION: BatMeth2 improves DNA methylation calling, particularly for regions close to indels. It is an autorun package and easy to use. In addition, a DNA methylation visualization program and a differential analysis program are provided in BatMeth2. We believe that BatMeth2 will facilitate the study of the mechanisms of DNA methylation in development and disease. BatMeth2 is an open source software program and is available on GitHub (https://github.com/GuoliangLi-HZAU/BatMeth2/).


Subject(s)
DNA Methylation/genetics , Data Analysis , DNA Sequence Analysis/methods , Sulfites/metabolism , Algorithms , Humans , Software
18.
J Transl Med ; 17(1): 273, 2019 Aug 20.
Article in English | MEDLINE | ID: mdl-31429776

ABSTRACT

BACKGROUND: Hepatocellular carcinoma is the second most deadly cancer with late presentation and limited treatment options, highlighting an urgent need to better understand HCC to facilitate the identification of early-stage biomarkers and uncover therapeutic targets for the development of novel therapies for HCC. METHODS: Deep transcriptome sequencing of tumor and paired non-tumor liver tissues was performed to comprehensively evaluate the profiles of both the host and HBV transcripts in HCC patients. Differential gene expression patterns and the dysregulated genes associated with clinical outcomes were analyzed. Somatic mutations were identified from the sequencing data and the deleterious mutations were predicted. Lastly, human-HBV chimeric transcripts were identified, and their distribution, potential function and expression association were analyzed. RESULTS: Expression profiling identified the significantly upregulated TP73 as a nodal molecule modulating expression of apoptotic genes. Approximately 2.5% of dysregulated genes significantly correlated with HCC clinical characteristics. Of the 110 identified genes, those involved in post-translational modification, cell division and/or transcriptional regulation were upregulated, while those involved in redox reactions were downregulated in tumors of patients with poor prognosis. Mutation signature analysis identified that somatic mutations in HCC tumors were mainly non-synonymous, frequently affecting genes in the micro-environment and cancer pathways. Recurrent mutations occur mainly in ribosomal genes. The most frequently mutated genes were generally associated with a poorer clinical prognosis. Lastly, transcriptome sequencing suggests that HBV replication in the tumors of HCC patients is rare. HBV-human fusion transcripts are a common observation, with favored HBV and host insertion sites being the HBx C-terminus and gene introns (in tumors) and introns/intergenic-regions (in non-tumors), respectively. HBV-fused genes in tumors were mainly involved in RNA binding while those in non-tumor tissues varied widely. These observations suggest that while HBV may integrate randomly during chronic infection, selective expression of functional chimeric transcripts may occur during tumorigenesis. CONCLUSIONS: Transcriptome sequencing of HCC patients reveals key cancer molecules and clinically relevant pathways deregulated/mutated in HCC patients and suggests that while HBV may integrate randomly during chronic infection, selective expression of functional chimeric transcripts likely occurs during the process of tumorigenesis.


Subject(s)
Hepatocellular Carcinoma/genetics , Gene Expression Profiling , Liver Neoplasms/genetics , Transcriptome/genetics , Base Sequence , Cell Cycle/genetics , Human Chromosomes/genetics , Neoplastic Gene Expression Regulation , Viral Genome , Hepatitis B virus/genetics , Humans , Introns/genetics , Male , Mutation/genetics , Open Reading Frames/genetics , Messenger RNA/genetics , Messenger RNA/metabolism , Repetitive Nucleic Acid Sequences , Survival Analysis , Trans-Activators/genetics , Viral Regulatory and Accessory Proteins
19.
Nature ; 504(7479): 306-310, 2013 Dec 12.
Article in English | MEDLINE | ID: mdl-24213634

ABSTRACT

In multicellular organisms, transcription regulation is one of the central mechanisms modelling lineage differentiation and cell-fate determination. Transcription requires dynamic chromatin configurations between promoters and their corresponding distal regulatory elements. It is believed that their communication occurs within large discrete foci of aggregated RNA polymerases termed transcription factories in three-dimensional nuclear space. However, the dynamic nature of chromatin connectivity has not been characterized at the genome-wide level. Here, through a chromatin interaction analysis with paired-end tagging approach using an antibody that primarily recognizes the pre-initiation complexes of RNA polymerase II, we explore the transcriptional interactomes of three mouse cells of progressive lineage commitment, including pluripotent embryonic stem cells, neural stem cells and neurosphere stem/progenitor cells. Our global chromatin connectivity maps reveal approximately 40,000 long-range interactions, suggest precise enhancer-promoter associations and delineate cell-type-specific chromatin structures. Analysis of the complex regulatory repertoire shows that there are extensive colocalizations among promoters and distal-acting enhancers. Most of the enhancers associate with promoters located beyond their nearest active genes, indicating that the linear juxtaposition is not the only guiding principle driving enhancer target selection. Although promoter-enhancer interactions exhibit high cell-type specificity, promoters involved in interactions are found to be generally common and mostly active among different cells. Chromatin connectivity networks reveal that the pivotal genes of reprogramming functions are transcribed within physical proximity to each other in embryonic stem cells, linking chromatin architecture to coordinated gene expression. 
Our study sets the stage for the full-scale dissection of spatial and temporal genome structures and their roles in orchestrating development.


Subject(s)
Chromatin/genetics , Chromatin/metabolism , Genetic Enhancer Elements/genetics , Gene Expression Regulation/genetics , Genetic Promoter Regions/genetics , Animals , Cell Line , Cell Lineage , Embryonic Stem Cells/metabolism , Fluorescence In Situ Hybridization , Mice , Neural Stem Cells/metabolism , RNA Polymerase II/metabolism , Genetic Transcription/genetics , Zebrafish/genetics
20.
BMC Bioinformatics ; 18(Suppl 3): 71, 2017 Mar 14.
Article in English | MEDLINE | ID: mdl-28361674

ABSTRACT

BACKGROUND: The study of virus integrations in the human genome is important since virus integrations were shown to be associated with diseases. In the literature, a few methods have been proposed that predict virus integrations using next generation sequencing datasets. Although they work, they are slow and are not very sensitive. RESULTS AND DISCUSSION: This paper introduces a new method, BatVI, to predict viral integrations. Our method uses a fast screening method to filter out chimeric reads containing possible viral integrations. Next, sensitive alignments of these candidate chimeric reads are called by BLAST. Chimeric reads that are co-localized in the human genome are clustered. Finally, by assembling the chimeric reads in each cluster, high-confidence virus integration sites are extracted. CONCLUSION: We compared the performance of BatVI with existing methods VirusFinder and VirusSeq using both simulated and real-life datasets of liver cancer patients. BatVI ran an order of magnitude faster and was able to predict almost twice the number of true positives compared to other methods while maintaining a false positive rate less than 1%. For the liver cancer datasets, BatVI uncovered novel integrations to two important genes TERT and MLL4, which were missed by previous studies. Through gene expression data, we verified the correctness of these additional integrations. BatVI can be downloaded from http://biogpu.ddns.comp.nus.edu.sg/~ksung/batvi/index.html.
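
One step in the abstract above is clustering chimeric reads that are co-localized in the human genome. The sketch below illustrates a simple way to do that kind of grouping: sort candidate breakpoints per chromosome and start a new cluster whenever the gap to the previous breakpoint exceeds a threshold. This is an assumed, generic single-linkage scheme and not BatVI's published algorithm; `cluster_breakpoints`, the `max_gap` default, and the coordinates in the usage example are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Breakpoint = Tuple[str, int]         # (chromosome, position on the human genome)
Cluster = Tuple[str, List[int]]      # (chromosome, sorted positions in the cluster)


def cluster_breakpoints(breakpoints: Iterable[Breakpoint], max_gap: int = 500) -> List[Cluster]:
    """Group breakpoints on the same chromosome whose neighbours lie within max_gap bases."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in breakpoints:
        by_chrom[chrom].append(pos)

    clusters: List[Cluster] = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)                 # close enough: same cluster
            else:
                clusters.append((chrom, current))   # gap too large: close the cluster
                current = [pos]
        clusters.append((chrom, current))
    return clusters


# Usage with made-up coordinates (not real integration sites):
reads = [("chr1", 1000), ("chr2", 50), ("chr1", 1200), ("chr1", 9000)]
result = cluster_breakpoints(reads, max_gap=500)
```

In a real pipeline each cluster would then be passed to an assembly step, as the abstract describes; the gap threshold would typically be tied to read or fragment length.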


Subject(s)
Human Genome , Host-Pathogen Interactions/genetics , Virus Integration , Algorithms , Cluster Analysis , Viral DNA/genetics , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , High-Throughput Nucleotide Sequencing , Histone-Lysine N-Methyltransferase , Humans , Liver Neoplasms/diagnosis , Liver Neoplasms/virology , Theoretical Models , DNA Sequence Analysis , Software , Telomerase/genetics , Telomerase/metabolism