Results 1-20 of 148
1.
Nat Methods ; 21(4): 574-583, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38459383

ABSTRACT

Draft genomes generated from Oxford Nanopore Technologies (ONT) long reads are known to have a higher error rate. Although existing genome polishers can enhance their quality, the residual error rate (including mismatches, indels and switching errors between paternal and maternal haplotypes) can remain significant. Here, we develop two polishers, hypo-short and hypo-hybrid, to address this issue. Hypo-short uses Illumina short reads to polish an ONT-based draft assembly, resulting in a high-quality assembly with low error rates and few switching errors. Expanding on this, hypo-hybrid incorporates ONT long reads to further refine the assembly into a diploid representation. Leveraging hypo-hybrid, we have created a diploid genome assembly pipeline called hypo-assembler. Hypo-assembler automates the generation of highly accurate, contiguous and nearly complete diploid assemblies using ONT long reads, Illumina short reads and, optionally, Hi-C reads. Notably, our solution even allows for the production of telomere-to-telomere diploid genomes with additional manual steps. As a proof of concept, we successfully assembled a fully phased telomere-to-telomere diploid genome of HG00733, achieving a quality value exceeding 50.
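The quality value (QV) reported for the HG00733 assembly is conventionally a Phred-scaled consensus error rate, QV = -10 log10(error rate), so QV 50 corresponds to roughly one error per 100,000 bases. A minimal Python sketch of the conversion in both directions follows; the function names are hypothetical, and the 3 Gbp genome size used below is a round illustrative figure, not a number from the paper.

```python
import math


def quality_value(errors: int, total_bases: int) -> float:
    """Phred-scaled consensus quality: QV = -10 * log10(errors / total_bases)."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    if errors == 0:
        return math.inf  # no observed errors: QV is unbounded
    return -10.0 * math.log10(errors / total_bases)


def max_errors_for_qv(qv: float, total_bases: int) -> float:
    """Expected number of errors in a sequence of this size at a given QV."""
    return total_bases * 10 ** (-qv / 10.0)
```

For example, `max_errors_for_qv(50, 3_000_000_000)` gives about 30,000 errors for a hypothetical 3 Gbp haplotype, which shows why each additional 10 QV points means a tenfold drop in residual errors.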


Subjects
Nanopores , Diploidy , Haploidy , High-Throughput Nucleotide Sequencing/methods , Telomere/genetics , Sequence Analysis, DNA/methods
2.
Cell ; 148(1-2): 84-98, 2012 Jan 20.
Article in English | MEDLINE | ID: mdl-22265404

ABSTRACT

Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intragenic, extragenic, and intergenic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other, implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically identified disease-associated noncoding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells.


Subjects
Chromatin/metabolism , Gene Expression Regulation , Promoter Regions, Genetic , RNA Polymerase II/metabolism , Transcription, Genetic , Cell Line, Tumor , Chromatin Immunoprecipitation , Enhancer Elements, Genetic , Genome-Wide Association Study , Humans
3.
PLoS Biol ; 20(10): e3001834, 2022 10.
Article in English | MEDLINE | ID: mdl-36223339

ABSTRACT

Neural stem cells (NSCs) divide asymmetrically to balance self-renewal and differentiation; an imbalance between the two can lead to NSC overgrowth and tumor formation. The functions of Parafibromin, a conserved tumor suppressor, in the nervous system are not established. Here, we demonstrate that Drosophila Parafibromin/Hyrax (Hyx) inhibits ectopic NSC formation by governing cell polarity. Hyx is essential for the asymmetric distribution and/or maintenance of polarity proteins. hyx depletion results in the symmetric division of NSCs, leading to the formation of supernumerary NSCs in the larval brain. Importantly, we show that human Parafibromin rescues the ectopic NSC phenotype in Drosophila hyx mutant brains. We have also discovered that Hyx is required for the proper formation of the interphase microtubule-organizing center and mitotic spindles in NSCs. Moreover, Hyx is required for the proper localization of 2 key centrosomal proteins, Polo and AurA, and of the microtubule-binding proteins Msps and D-TACC in dividing NSCs. Furthermore, Hyx directly regulates polo and aurA expression in vitro. Finally, overexpression of polo and aurA could significantly suppress ectopic NSC formation and NSC polarity defects caused by hyx depletion. Our data support a model in which Hyx promotes the expression of polo and aurA in NSCs and, in turn, regulates cell polarity and centrosome/microtubule assembly. This new paradigm may be relevant to future studies on Parafibromin/HRPT2-associated cancers.


Subjects
Drosophila Proteins , Neural Stem Cells , Animals , Cell Polarity , Centrosome/metabolism , Drosophila/metabolism , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Humans , Neural Stem Cells/metabolism , Transcription Factors/metabolism
4.
Nucleic Acids Res ; 51(17): 9001-9018, 2023 09 22.
Article in English | MEDLINE | ID: mdl-37572350

ABSTRACT

Photoperiods integrate with the circadian clock to coordinate gene expression rhythms and thus ensure plant fitness to the environment. Genome-wide characterization and comparison of rhythmic genes under different light conditions revealed a delayed phase under constant darkness (DD) and a reduced amplitude under constant light (LL) in rice. Interestingly, ChIP-seq and RNA-seq profiling showed that rhythmic genes exhibit synchronous circadian oscillation in H3K9ac modifications at their loci and in the expression of long non-coding RNAs (lncRNAs) at proximal loci. To investigate how gene expression rhythm is regulated in rice, we profiled open chromatin regions and transcription factor (TF) footprints by time-series ATAC-seq. Although open chromatin regions did not show circadian change, a significant number of TFs were identified to rhythmically associate with chromatin and drive gene expression in a time-dependent manner. Further mapping of transcriptional regulatory networks uncovered significant correlation between core clock genes and transcription factors involved in light/temperature signaling. In situ Hi-C of ZT8-specific expressed genes displayed highly connected chromatin association at the same time, whereas this ZT8 chromatin connection network dissociates at ZT20, suggesting circadian control of gene expression by dynamic spatial chromatin conformation. Together, these findings implicate a synchronization mechanism between circadian H3K9ac modifications, chromatin association of TFs and gene expression, and provide insights into the circadian dynamics of spatial chromatin conformation associated with gene expression rhythms.
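Phase and amplitude, the two properties compared between DD and LL above, are often estimated by fitting a cosine curve to a time course (cosinor regression). The abstract does not say which rhythmicity method the authors used, so the following is only a minimal Python sketch of a single-component cosinor fit. It assumes a 24 h period and equally spaced samples covering whole periods (for example every 4 h for 48 h, a hypothetical design); under those assumptions the cosine and sine regressors are orthogonal and the least-squares coefficients reduce to simple sums.

```python
import math


def cosinor(times_h, values, period_h=24.0):
    """Fit y(t) = M + A*cos(2*pi*t/T - phi) and return (mesor M, amplitude A, peak time).

    Assumes equally spaced samples spanning a whole number of periods,
    with at least 3 samples per period, so the regressors are orthogonal.
    """
    n = len(values)
    w = 2 * math.pi / period_h
    mesor = sum(values) / n
    # Least-squares coefficients of cos(w t) and sin(w t)
    beta = 2 / n * sum(y * math.cos(w * t) for t, y in zip(times_h, values))
    gamma = 2 / n * sum(y * math.sin(w * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(beta, gamma)
    # Acrophase expressed as the time (in hours) at which the fitted curve peaks
    peak_h = (math.atan2(gamma, beta) / w) % period_h
    return mesor, amplitude, peak_h
```

In this framing, a later `peak_h` under DD would correspond to the delayed phase described above, and a smaller `amplitude` under LL to the reduced amplitude. Real analyses also test whether the fitted amplitude differs significantly from zero, which this sketch omits.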


Subjects
Circadian Rhythm , Oryza , Chromatin/genetics , Circadian Clocks/genetics , Circadian Rhythm/genetics , Epigenome , Gene Expression Profiling , Oryza/genetics , Oryza/physiology , Transcription Factors/genetics
5.
Nucleic Acids Res ; 50(D1): D60-D71, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34664666

ABSTRACT

DNA methylation is known to be the most stable epigenetic modification and has been extensively studied in relation to cell differentiation, development, X chromosome inactivation and disease. Allele-specific DNA methylation (ASM) is a well-established mechanism for genomic imprinting and regulates imprinted gene expression. Previous studies have confirmed that certain special regions with ASM are susceptible and closely related to human carcinogenesis and plant development. In addition, recent studies have proven ASM to be an effective tumour marker. However, research on the functions of ASM in diseases and development is still extremely scarce. Here, we collected 4400 BS-Seq datasets and 1598 corresponding RNA-Seq datasets from 47 species, including human and mouse, to establish a comprehensive ASM database. We obtained the data on DNA methylation level, ASM and allele-specific expressed genes (ASEGs) and further analysed the ASM/ASEG distribution patterns of these species. In-depth ASM distribution analysis and differential methylation analysis conducted in nine cancer types showed results consistent with the reported changes in ASM in key tumour genes and revealed several potential ASM tumour-related genes. Finally, integrating these results, we constructed the first well-resourced and comprehensive ASM database for 47 species (ASMdb, www.dna-asmdb.com).


Subjects
DNA Methylation/genetics , Databases, Genetic , Epigenesis, Genetic/genetics , Genomic Imprinting/genetics , Alleles , Animals , CpG Islands/genetics , Humans , Mice , Polymorphism, Single Nucleotide/genetics , RNA-Seq , X Chromosome Inactivation/genetics
6.
EMBO Rep ; 22(4): e50994, 2021 04 07.
Article in English | MEDLINE | ID: mdl-33565211

ABSTRACT

The ability of neural stem cells (NSCs) to switch between quiescence and proliferation is crucial for brain development and homeostasis. Increasing evidence suggests that variants of histone lysine methyltransferases including KMT5A are associated with neurodevelopmental disorders. However, the function of KMT5A/Pr-set7/SETD8 in the central nervous system is not well established. Here, we show that Drosophila Pr-Set7 is a novel regulator of NSC reactivation. Loss of function of pr-set7 causes a delay in NSC reactivation and loss of H4K20 monomethylation in the brain. Through NSC-specific in vivo profiling, we demonstrate that Pr-set7 binds to the promoter region of cyclin-dependent kinase 1 (cdk1) and Wnt pathway transcriptional co-activator earthbound1/jerky (ebd1). Further validation indicates that Pr-set7 is required for the expression of cdk1 and ebd1 in the brain. Similar to Pr-set7, Cdk1 and Ebd1 promote NSC reactivation. Finally, overexpression of Cdk1 and Ebd1 significantly suppressed NSC reactivation defects observed in pr-set7-depleted brains. Therefore, Pr-set7 promotes NSC reactivation by regulating Wnt signaling and cell cycle progression. Our findings may contribute to the understanding of mammalian KMT5A/PR-SET7/SETD8 during brain development.


Subjects
Histones , Neural Stem Cells , Animals , CDC2 Protein Kinase , Histone-Lysine N-Methyltransferase/genetics , Histone-Lysine N-Methyltransferase/metabolism , Neural Stem Cells/metabolism
7.
Cell ; 133(6): 1106-17, 2008 Jun 13.
Article in English | MEDLINE | ID: mdl-18555785

ABSTRACT

Transcription factors (TFs) and their specific interactions with targets are crucial for specifying gene-expression programs. To gain insights into the transcriptional regulatory networks in embryonic stem (ES) cells, we use chromatin immunoprecipitation coupled with ultra-high-throughput DNA sequencing (ChIP-seq) to map the locations of 13 sequence-specific TFs (Nanog, Oct4, STAT3, Smad1, Sox2, Zfx, c-Myc, n-Myc, Klf4, Esrrb, Tcfcp2l1, E2f1, and CTCF) and 2 transcription regulators (p300 and Suz12). These factors are known to play different roles in ES-cell biology as components of the LIF and BMP signaling pathways, self-renewal regulators, and key reprogramming factors. Our study provides insights into the integration of the signaling pathways into the ES-cell-specific transcription circuitries. Intriguingly, we find specific genomic regions extensively targeted by different TFs. Collectively, the comprehensive mapping of TF-binding sites identifies important features of the transcriptional regulatory networks that define ES-cell identity.


Subjects
Embryonic Stem Cells/metabolism , Gene Regulatory Networks , Signal Transduction , Animals , Base Sequence , Binding Sites , Chromatin Immunoprecipitation , Genome , Kruppel-Like Factor 4 , Mice , Multiprotein Complexes , Transcription Factors/metabolism
8.
Nucleic Acids Res ; 49(6): e33, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33444454

ABSTRACT

A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations, while predicting a large number of false positives. We observe that the inaccuracy is caused by incorrect alignment of reads in repetitive regions. False alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of reads which are crucial for the discovery of integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise while it also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations, and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions which had been overlooked, as well as discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts.


Subjects
Algorithms , Virus Integration , Alphapapillomavirus/genetics , Datasets as Topic , Genome, Human , Hepatitis B virus/genetics , Humans , Repetitive Sequences, Nucleic Acid , Sequence Analysis, RNA , Software
9.
Nucleic Acids Res ; 49(19): 10879-10894, 2021 11 08.
Article in English | MEDLINE | ID: mdl-34643730

ABSTRACT

Large indels greatly impact the observable phenotypes in different organisms, including plants and humans. Hence, extracting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in whole-genome sequencing data from 1047 Arabidopsis accessions. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate than the previous AthCNV dataset (1). We captured nearly twice as many ground-truth deletions and, on average, 27% more ground-truth duplications than AthCNV, although our dataset contains fewer large indels overall. Our large indels were positively correlated with transposable elements across the Arabidopsis genome. Non-homologous recombination events were the major formation mechanism of deletions in the Arabidopsis genome. The neighbor-joining (NJ) tree constructed from IndelEnsembler's deletions clearly divided the 1047 Arabidopsis accessions into their geographic subgroups. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions have low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms. Some of them could affect trait performance. For instance, in a deletion-based genome-wide association study (DEL-GWAS), accessions containing a 182-bp deletion in AT1G11520 had delayed flowering time, and all accessions in northern Sweden carried the 182-bp deletion. We also found that accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. These two deletions cannot be detected in AthCNV and, interestingly, they do not co-occur in any Arabidopsis thaliana accession. In SNP-GWAS, the SNPs surrounding these two deletions do not correlate with flowering time. This example demonstrates that existing large indel datasets miss phenotypic variations and that our large indel dataset fills this gap.
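The statement that about 49% of deletions have low LD with surrounding SNPs rests on a pairwise LD statistic, most commonly r². A minimal Python sketch follows, assuming each accession is coded 0/1 for carrying the deletion (or the SNP alternative allele). Because Arabidopsis accessions are largely homozygous inbred lines, each accession can be treated as one haplotype, and r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB. The cutoff that defines "low" LD is not given in the abstract, so none is hard-coded; the function name is hypothetical.

```python
def ld_r2(a, b):
    """r^2 between two biallelic markers coded 0/1 across the same accessions.

    Treats each (homozygous) accession as a single haplotype, so this equals
    the squared Pearson correlation of the two 0/1 vectors.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("marker vectors must be non-empty and of equal length")
    pa = sum(a) / n                                 # frequency of allele A
    pb = sum(b) / n                                 # frequency of allele B
    pab = sum(x * y for x, y in zip(a, b)) / n      # frequency of the AB haplotype
    d = pab - pa * pb                               # LD coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return float("nan")  # a monomorphic marker: r^2 is undefined
    return d * d / denom
```

A deletion whose r² with every nearby SNP falls below the chosen cutoff is poorly tagged by those SNPs, which is why such deletions can be missed by SNP-GWAS yet detected by DEL-GWAS.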


Subjects
Arabidopsis/genetics , Flowers/genetics , Gene Expression Regulation, Plant , Genome, Plant , INDEL Mutation , Software , Arabidopsis/classification , Arabidopsis/growth & development , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , DNA Transposable Elements , Datasets as Topic , Flowers/growth & development , Flowers/metabolism , Gene Duplication , Gene Expression Regulation, Developmental , Genome-Wide Association Study , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Recombination, Genetic
10.
Genome Res ; 29(2): 223-235, 2019 02.
Article in English | MEDLINE | ID: mdl-30606742

ABSTRACT

The aberrant activities of transcription factors such as the androgen receptor (AR) underpin prostate cancer development. While the AR cis-regulation has been extensively studied in prostate cancer, information pertaining to the spatial architecture of the AR transcriptional circuitry remains limited. In this paper, we propose a novel framework to profile long-range chromatin interactions associated with AR and its collaborative transcription factor, erythroblast transformation-specific related gene (ERG), using chromatin interaction analysis by paired-end tag (ChIA-PET). We identified ERG-associated long-range chromatin interactions as a cooperative component in the AR-associated chromatin interactome, acting in concert to achieve coordinated regulation of a subset of AR target genes. Through multifaceted functional data analysis, we found that AR-ERG interaction hub regions are characterized by distinct functional signatures, including bidirectional transcription and cotranscription factor binding. In addition, cancer-associated long noncoding RNAs were found to be connected near protein-coding genes through AR-ERG looping. Finally, we found strong enrichment of prostate cancer genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) at AR-ERG co-binding sites participating in chromatin interactions and gene regulation, suggesting GWAS target genes identified from chromatin looping data provide more biologically relevant findings than using the nearest gene approach. Taken together, our results revealed an AR-ERG-centric higher-order chromatin structure that drives coordinated gene expression in prostate cancer progression and the identification of potential target genes for therapeutic intervention.


Subjects
Chromatin/metabolism , Gene Expression Regulation, Neoplastic , Prostatic Neoplasms/genetics , Receptors, Androgen/metabolism , Transcription, Genetic , Cell Line, Tumor , Chromatin/chemistry , Gene Regulatory Networks , Genome, Human , Humans , Male , Oncogene Proteins, Fusion/analysis , Polymorphism, Single Nucleotide , Prostatic Neoplasms/metabolism , RNA, Long Noncoding/metabolism , Transcriptional Regulator ERG/metabolism , Transcriptional Regulator ERG/physiology
11.
Bioinformatics ; 37(11): 1497-1505, 2021 Jul 12.
Article in English | MEDLINE | ID: mdl-30989231

ABSTRACT

MOTIVATION: Structural variations (SVs) are large scale mutations in a genome; although less frequent than point mutations, due to their large size they are responsible for more heritable differences between individuals. Two prominent classes of SVs are deletions and tandem duplications. They play important roles in many devastating genetic diseases, such as Smith-Magenis syndrome, Potocki-Lupski syndrome and Williams-Beuren syndrome. Since paired-end whole genome sequencing data have become widespread and affordable, reliably calling deletions and tandem duplications has been a major target in bioinformatics; unfortunately, the problem is far from being solved, since existing solutions often offer poor results when applied to real data. RESULTS: We developed a novel caller, SurVIndel, which focuses on detecting deletions and tandem duplications from paired next-generation sequencing data. SurVIndel uses discordant paired reads, clipped reads as well as statistical methods. We show that SurVIndel outperforms existing methods on both simulated and real biological datasets. AVAILABILITY AND IMPLEMENTATION: SurVIndel is available at https://github.com/Mesh89/SurVIndel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
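The abstract only summarizes SurVIndel's evidence (discordant pairs, clipped reads and statistical methods), so the sketch below does not reproduce its algorithm. It illustrates the textbook read-pair signatures that deletion and tandem-duplication callers build on, assuming an Illumina forward-reverse (FR) library: an inward-facing pair that spans much more reference than the library insert size suggests a deletion, while an outward-facing (RF) pair suggests a tandem duplication. The `ReadPair` type and the mean + 3 SD cutoff are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    left_pos: int      # leftmost mapped base of the pair on the reference
    right_end: int     # rightmost mapped base of the pair on the reference
    left_strand: str   # '+' or '-' for the leftmost-mapped read
    right_strand: str  # '+' or '-' for the rightmost-mapped read


def classify_pair(p: ReadPair, mean_isize: float, sd_isize: float, k: float = 3.0) -> str:
    """Label a read pair by its orientation and reference span (FR library assumed)."""
    span = p.right_end - p.left_pos
    orientation = p.left_strand + p.right_strand
    if orientation == "+-":
        # Inward-facing, as expected for an FR library
        if span > mean_isize + k * sd_isize:
            return "deletion"  # reads map farther apart than the fragment length allows
        # Spans much shorter than expected (insertion signal) are not handled here
        return "concordant"
    if orientation == "-+":
        return "tandem_duplication"  # outward-facing pair across a duplication junction
    return "other"  # same-strand pairs (e.g. inversion signal) are out of scope
```

Real callers cluster many such pairs and combine them with split/clipped-read evidence before reporting a breakpoint; a single discordant pair is weak evidence on its own.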

12.
Bioinformatics ; 37(13): 1821-1827, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-33453108

ABSTRACT

MOTIVATION: Virus integration in the host genome is frequently reported to be closely associated with many human diseases, and the detection of virus integration is a critically challenging task. However, most existing tools show limited specificity and sensitivity. Therefore, the objective of this study is to develop a method for accurate detection of virus integration into host genomes. RESULTS: Herein, we report a novel method termed HIVID2 that is a significant upgrade of HIVID. HIVID2 performs a paired-end combination (PE-combination) for potentially integrated reads. The resulting sequences are then remapped onto the reference genomes, and both split and discordant chimeric reads are used to identify accurate integration breakpoints with high confidence. HIVID2 represents a great improvement in specificity and sensitivity, and predicts breakpoints closer to the real integrations, compared with existing methods. The advantage of our method was demonstrated using both simulated and real datasets. HIVID2 uncovered novel integration breakpoints in well-known cervical cancer-related genes, including FHIT and LRP1B, which was verified using protein expression data. In addition, HIVID2 allows the user to decide whether to automatically perform advanced analysis using the identified virus integrations. By analyzing the simulated data and real data tests, we demonstrated that HIVID2 is not only more accurate than HIVID but also better than other existing programs with respect to both sensitivity and specificity. We believe that HIVID2 will help in enhancing future research associated with virus integration. AVAILABILITY AND IMPLEMENTATION: HIVID2 can be accessed at https://github.com/zengxi-hada/HIVID2/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
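The abstract does not spell out how PE-combination works. One common way to combine a read pair into a single longer sequence is to merge read 1 with the reverse complement of read 2 wherever their ends overlap; the Python sketch below shows that generic idea under hypothetical parameters (a minimum overlap of 10 bases and at most 10% mismatches within the overlap). It should not be read as HIVID2's implementation, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10, max_mismatch_frac: float = 0.1):
    """Merge a read pair into one fragment if r1's 3' end overlaps revcomp(r2)'s 5' end.

    Returns the merged sequence, or None if no acceptable overlap exists.
    """
    r2rc = revcomp(r2)
    # Try the longest possible overlap first, then shorter ones
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        a, b = r1[-olen:], r2rc[:olen]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_frac * olen:
            # Keep r1 as-is and append the non-overlapping tail of revcomp(r2)
            return r1 + r2rc[olen:]
    return None
```

Accepting the longest acceptable overlap is a simple design choice; in low-complexity or repetitive sequence several overlaps can fit, so practical mergers score candidate overlaps (for example using base qualities) instead of taking the first one.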

13.
BMC Genomics ; 22(1): 581, 2021 Jul 30.
Article in English | MEDLINE | ID: mdl-34330207

ABSTRACT

BACKGROUND: The Muscovy duck (Cairina moschata) is an economically important duck species, with favourable growth and carcass composition parameters in comparison to other ducks. However, limited genomic resources for Muscovy duck hinder our understanding of its evolution and genetic diversity. RESULTS: We combined linked-reads sequencing technology and reference-guided methods for de novo genome assembly. The final draft assembly was 1.12 Gbp with 29 autosomes, one sex chromosome and 4,583 unlocalized scaffolds, with an N50 size of 77.35 Mb. Based on benchmarking universal single-copy orthologues (BUSCO), the completeness of the draft genome assembly was estimated to be 93.30 %. Genome annotation identified 15,580 genes, with 15,537 (99.72 %) genes annotated in public databases. We conducted comparative genomic analyses and found that species-specific and rapidly expanding gene families (compared to other birds) in Muscovy duck are mainly involved in calcium signaling, adrenergic signaling in cardiomyocytes, and GnRH signaling pathways. In comparison to the common domestic duck (Anas platyrhynchos), we identified 104 genes exhibiting strong signals of adaptive evolution (Ka/Ks > 1). Most of these genes were associated with immune defence pathways (e.g. IFNAR1 and TLR5), indicating differences in the immune responses between the two species. Additionally, we combined divergence and polymorphism data to demonstrate the "faster-Z effect" of chromosome evolution. CONCLUSIONS: The chromosome-level genome assembly of Muscovy duck and the comparative genomic analyses provide valuable resources for future molecular ecology studies, as well as for studies of the evolutionary arms race between the host and influenza viruses.
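The N50 quoted above is the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the lengths in the check are toy values, not the Muscovy duck scaffolds.

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length  # bases covered by the longest sequences seen so far
        if running >= half:
            return length
    raise ValueError("empty assembly")
```

For toy lengths 2 through 10 (54 bases in total), the running sum first reaches half the total (27) at the sequence of length 8, so `n50` returns 8.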


Subjects
Ducks , Genomics , Animals , Birds , Chromosomes , Ducks/genetics , Genome , Humans
14.
Genet Sel Evol ; 53(1): 35, 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33849442

ABSTRACT

BACKGROUND: The most prolific duck genetic resource in the world is located in Southeast/South Asia, but little is known about the domestication and complex histories of these duck populations. RESULTS: Based on whole-genome resequencing data of 78 ducks (Anas platyrhynchos) and 31 published whole-genome duck sequences, we detected three geographically distinct genetic groups: local Chinese, wild, and local Southeast/South Asian populations. We inferred the demographic history of these duck populations with different geographical distributions and found that the Chinese and Southeast/South Asian ducks shared similar demographic features. The Chinese domestic ducks experienced the strongest population bottleneck, caused by domestication and the last glacial maximum (LGM) period, whereas the Chinese wild ducks experienced a relatively weak bottleneck caused by domestication only. Furthermore, the bottleneck was more severe in the local Southeast/South Asian populations than in the local Chinese populations, which resulted in a smaller effective population size for the former (7100-11,900). We show that extensive gene flow has occurred between the Southeast/South Asian and Chinese populations, and between the Southeast Asian and South Asian populations. Prolonged gene flow was detected between the Guangxi population from China and its neighboring Southeast/South Asian populations. In addition, based on multiple statistical approaches, we identified a genomic region that included three genes (PNPLA8, THAP5, and DNAJB9) on duck chromosome 1 with a high probability of gene flow between the Guangxi and Southeast/South Asian populations. Finally, we detected strong signatures of selection in genes that are involved in signaling pathways of nervous system development (e.g., ADCYAP1R1 and PDC) and in genes that are associated with morphological traits such as cell growth (e.g., IGF1R).
CONCLUSIONS: Our findings provide valuable information for a better understanding of the domestication and demographic history of the duck, and of the gene flow between local duck populations from Southeast/South Asia and China.


Subjects
Domestication , Ducks/genetics , Gene Flow , Animals , Avian Proteins/genetics , Chromosomes/genetics , Ducks/classification , Phylogeny , Selection, Genetic , Whole Genome Sequencing
15.
BMC Bioinformatics ; 21(1): 451, 2020 Oct 12.
Article in English | MEDLINE | ID: mdl-33045983

ABSTRACT

BACKGROUND: DNA methylation is an important epigenetic modification that plays a critical role in most eukaryotic organisms. Parental alleles in haploid genomes may exhibit different methylation patterns, which can lead to different phenotypes and even different therapeutic and drug responses to diseases. However, to our knowledge, no software is available for the identification of DNA methylation haplotype regions that combines allele-specific DNA methylation, single nucleotide polymorphisms (SNPs) and high-throughput chromosome conformation capture (Hi-C) data. RESULTS: In this paper, we developed a new method, MethHaplo, that identifies DNA methylation haplotype regions with allele-specific DNA methylation and SNPs from whole-genome bisulfite sequencing (WGBS) data. Our results showed that methylation haplotype regions were ten times longer than haplotypes built from SNPs only. When WGBS and Hi-C data are integrated, MethHaplo can call even longer haplotypes. CONCLUSIONS: This study illustrates the usefulness of methylation haplotypes. By constructing methylation haplotypes for various cell lines, we provide a clearer picture of the effect of DNA methylation on gene expression, histone modification and three-dimensional chromosome structure at the haplotype level. Our method could benefit the study of parental inheritance-related disease and hybrid vigor in agriculture.
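The idea behind methylation haplotypes is that a single bisulfite read can carry both a heterozygous SNP allele and the methylation state of nearby CpGs, so reads can be split by allele. The abstract does not give MethHaplo's algorithm; the Python sketch below only shows that first step for one SNP-CpG pair. Its input format and its thresholds (at least 4 reads per allele and a difference of at least 0.5 in methylation level) are hypothetical.

```python
from collections import defaultdict


def allele_methylation(reads):
    """reads: iterable of (snp_allele, is_methylated) for one CpG linked to one SNP.

    Returns {allele: (methylation_level, read_count)}.
    """
    counts = defaultdict(lambda: [0, 0])  # allele -> [methylated reads, total reads]
    for allele, methylated in reads:
        counts[allele][0] += int(methylated)
        counts[allele][1] += 1
    return {allele: (m / t, t) for allele, (m, t) in counts.items()}


def is_allele_specific(reads, min_diff=0.5, min_reads=4):
    """Hypothetical ASM call: both alleles covered and their methylation levels differ strongly."""
    stats = allele_methylation(reads)
    if len(stats) != 2:
        return False  # need exactly two alleles at a heterozygous site
    (level_a, cov_a), (level_b, cov_b) = stats.values()
    if min(cov_a, cov_b) < min_reads:
        return False
    return abs(level_a - level_b) >= min_diff
```

Once a CpG is called allele-specific, its methylation state can act as an extra phased marker, which is how methylation can, in principle, extend haplotypes beyond what SNPs alone support.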


Subjects
Alleles , DNA Methylation , Haplotypes/genetics , Polymorphism, Single Nucleotide , Whole Genome Sequencing , Epigenesis, Genetic , Software
16.
Nucleic Acids Res ; 46(20): e122, 2018 11 16.
Article in English | MEDLINE | ID: mdl-30137425

ABSTRACT

Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose a SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques and other techniques like clustering and positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
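TranSurVeyor's gain is reported as a fold change in F1-score, the harmonic mean of precision and recall. A minimal Python sketch of the three metrics computed from true-positive, false-positive and false-negative counts follows; the counts used in the example are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is a harmonic mean, a caller that finds 90 of 100 true events but also reports 910 false positives has precision 0.09 and recall 0.9, and its F1 is only about 0.16. This is the failure mode the abstract attributes to false alignments in repeats.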


Subjects
Algorithms , Computational Biology/methods , DNA Transposable Elements/genetics , Databases, Factual , High-Throughput Nucleotide Sequencing/methods , Genome, Human/genetics , Genomics/methods , Humans , Mutagenesis, Insertional , Reproducibility of Results
17.
BMC Bioinformatics ; 20(1): 47, 2019 Jan 22.
Article in English | MEDLINE | ID: mdl-30669962

ABSTRACT

BACKGROUND: DNA methylation plays crucial roles in most eukaryotic organisms. Bisulfite sequencing (BS-Seq) is a sequencing approach that provides quantitative cytosine methylation levels in genome-wide scope and single-base resolution. However, genomic variations such as insertions and deletions (indels) affect methylation calling, and the alignment of reads near/across indels becomes inaccurate in the presence of polymorphisms. Hence, the simultaneous detection of DNA methylation and indels is important for exploring the mechanisms of functional regulation in organisms. RESULTS: These problems motivated us to develop the algorithm BatMeth2, which can align BS reads with high accuracy while allowing for variable-length indels with respect to the reference genome. The results from simulated and real bisulfite DNA methylation data demonstrated that our proposed method increases alignment accuracy. Additionally, BatMeth2 can calculate the methylation levels of individual loci, genomic regions or functional regions such as genes/transposable elements. Additional programs were also developed to provide methylation data annotation, visualization, and differentially methylated cytosine/region (DMC/DMR) detection. The whole package provides new tools and will benefit bisulfite data analysis. CONCLUSION: BatMeth2 improves DNA methylation calling, particularly for regions close to indels. It is an autorun package and easy to use. In addition, a DNA methylation visualization program and a differential analysis program are provided in BatMeth2. We believe that BatMeth2 will facilitate the study of the mechanisms of DNA methylation in development and disease. BatMeth2 is an open source software program and is available on GitHub (https://github.com/GuoliangLi-HZAU/BatMeth2/).
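In bisulfite sequencing, unmethylated cytosines are converted and read as T, while methylated cytosines remain C, so the methylation level of a cytosine is usually estimated as C/(C+T) over the reads covering it on the converted strand. A minimal Python sketch of that estimate, and of one common way to summarize a region by pooling counts across its cytosines, follows. Whether BatMeth2 pools counts or averages per-site levels is not stated in the abstract, so the region function is an assumption, and both function names are hypothetical.

```python
def methylation_level(c_count: int, t_count: int) -> float:
    """Per-cytosine level: fraction of covering reads that still show C."""
    depth = c_count + t_count
    if depth == 0:
        return float("nan")  # no coverage: level undefined
    return c_count / depth


def region_methylation(sites):
    """Pooled region level: total C over total C+T across (c_count, t_count) sites."""
    total_c = sum(c for c, _ in sites)
    total_depth = sum(c + t for c, t in sites)
    return total_c / total_depth if total_depth else float("nan")
```

Pooling weights each cytosine by its coverage, so a deeply covered site dominates the region; averaging per-site levels instead weights all sites equally. For example, sites with counts (3, 1) and (0, 4) pool to 3/8 = 0.375, whereas averaging their levels (0.75 and 0.0) also happens to give 0.375 here because both sites have the same depth.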


Subjects
DNA Methylation/genetics , Data Analysis , Sequence Analysis, DNA/methods , Sulfites/metabolism , Algorithms , Humans , Software
18.
J Transl Med ; 17(1): 273, 2019 08 20.
Article in English | MEDLINE | ID: mdl-31429776

ABSTRACT

BACKGROUND: Hepatocellular carcinoma (HCC) is the second most deadly cancer, with late presentation and limited treatment options, highlighting an urgent need to better understand HCC in order to identify early-stage biomarkers and uncover therapeutic targets for the development of novel therapies. METHODS: Deep transcriptome sequencing of tumor and paired non-tumor liver tissues was performed to comprehensively evaluate the profiles of both host and hepatitis B virus (HBV) transcripts in HCC patients. Differential gene expression patterns and the dysregulated genes associated with clinical outcomes were analyzed. Somatic mutations were identified from the sequencing data, and deleterious mutations were predicted. Lastly, human-HBV chimeric transcripts were identified, and their distribution, potential function and expression association were analyzed. RESULTS: Expression profiling identified the significantly upregulated TP73 as a nodal molecule modulating the expression of apoptotic genes. Approximately 2.5% of dysregulated genes were significantly correlated with HCC clinical characteristics. Of the 110 identified genes, those involved in post-translational modification, cell division and/or transcriptional regulation were upregulated, while those involved in redox reactions were downregulated in tumors of patients with poor prognosis. Mutation signature analysis showed that somatic mutations in HCC tumors were mainly non-synonymous, frequently affecting genes in the microenvironment and cancer pathways. Recurrent mutations occurred mainly in ribosomal genes. The most frequently mutated genes were generally associated with a poorer clinical prognosis. Lastly, transcriptome sequencing suggests that HBV replication in the tumors of HCC patients is rare. HBV-human fusion transcripts were commonly observed, with the favored HBV insertion site being the HBx C-terminus and the favored host insertion sites being gene introns (in tumors) and introns/intergenic regions (in non-tumors). HBV-fused genes in tumors were mainly involved in RNA binding, while those in non-tumor tissues varied widely. These observations suggest that while HBV may integrate randomly during chronic infection, selective expression of functional chimeric transcripts may occur during tumorigenesis. CONCLUSIONS: Transcriptome sequencing of HCC patients reveals key cancer molecules and clinically relevant pathways that are deregulated or mutated in HCC, and suggests that while HBV may integrate randomly during chronic infection, selective expression of functional chimeric transcripts likely occurs during tumorigenesis.


Subjects
Carcinoma, Hepatocellular/genetics , Gene Expression Profiling , Liver Neoplasms/genetics , Transcriptome/genetics , Base Sequence , Cell Cycle/genetics , Chromosomes, Human/genetics , Gene Expression Regulation, Neoplastic , Genome, Viral , Hepatitis B virus/genetics , Humans , Introns/genetics , Male , Mutation/genetics , Open Reading Frames/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Repetitive Sequences, Nucleic Acid , Survival Analysis , Trans-Activators/genetics , Viral Regulatory and Accessory Proteins
19.
Nature ; 504(7479): 306-310, 2013 Dec 12.
Article in English | MEDLINE | ID: mdl-24213634

ABSTRACT

In multicellular organisms, transcription regulation is one of the central mechanisms modelling lineage differentiation and cell-fate determination. Transcription requires dynamic chromatin configurations between promoters and their corresponding distal regulatory elements. It is believed that their communication occurs within large discrete foci of aggregated RNA polymerases, termed transcription factories, in three-dimensional nuclear space. However, the dynamic nature of chromatin connectivity has not been characterized at the genome-wide level. Here, using chromatin interaction analysis with paired-end tagging (ChIA-PET) with an antibody that primarily recognizes the pre-initiation complexes of RNA polymerase II, we explore the transcriptional interactomes of three mouse cell types of progressive lineage commitment: pluripotent embryonic stem cells, neural stem cells and neurosphere stem/progenitor cells. Our global chromatin connectivity maps reveal approximately 40,000 long-range interactions, suggest precise enhancer-promoter associations and delineate cell-type-specific chromatin structures. Analysis of the complex regulatory repertoire shows extensive colocalization among promoters and distal-acting enhancers. Most enhancers associate with promoters located beyond their nearest active genes, indicating that linear juxtaposition is not the only principle guiding enhancer target selection. Although promoter-enhancer interactions exhibit high cell-type specificity, the promoters involved in interactions are generally common to, and mostly active in, the different cell types. Chromatin connectivity networks reveal that the pivotal genes of reprogramming functions are transcribed in physical proximity to each other in embryonic stem cells, linking chromatin architecture to coordinated gene expression. Our study sets the stage for the full-scale dissection of spatial and temporal genome structures and their roles in orchestrating development.
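The observation that most enhancers contact promoters beyond their nearest active gene amounts to comparing each enhancer's interaction partner with its linearly closest active promoter. The sketch below makes that comparison explicit under simplifying assumptions that are not from the study: all elements lie on one chromosome, each promoter is reduced to a single TSS coordinate, every listed promoter is treated as active, and the gene names, coordinates and both functions are hypothetical.

```python
from typing import Dict, List, Tuple


def nearest_active_promoter(enhancer_mid: int, promoters: Dict[str, int]) -> str:
    """Gene whose TSS is linearly closest to the enhancer midpoint
    (ties go to the gene listed first)."""
    return min(promoters, key=lambda gene: abs(promoters[gene] - enhancer_mid))


def fraction_skipping_nearest(links: List[Tuple[int, str]],
                              promoters: Dict[str, int]) -> float:
    """Fraction of (enhancer midpoint, partner gene) links whose partner
    is NOT the nearest active promoter."""
    if not links:
        raise ValueError("no enhancer-promoter links given")
    skipped = sum(1 for mid, gene in links
                  if gene != nearest_active_promoter(mid, promoters))
    return skipped / len(links)


if __name__ == "__main__":
    # Hypothetical active promoters (TSS coordinates) and interaction links.
    promoters = {"GeneA": 10_000, "GeneB": 50_000, "GeneC": 120_000}
    links = [(12_000, "GeneA"), (45_000, "GeneC"), (100_000, "GeneA")]
    # Two of the three links bypass the nearest active promoter.
    print(fraction_skipping_nearest(links, promoters))
```

A real analysis would also need to restrict candidates to the same chromosome and to promoters actually called active in each cell type, and ChIA-PET links connect anchor regions rather than single coordinates.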


Subjects
Chromatin/genetics , Chromatin/metabolism , Enhancer Elements, Genetic/genetics , Gene Expression Regulation/genetics , Promoter Regions, Genetic/genetics , Animals , Cell Line , Cell Lineage , Embryonic Stem Cells/metabolism , In Situ Hybridization, Fluorescence , Mice , Neural Stem Cells/metabolism , RNA Polymerase II/metabolism , Transcription, Genetic/genetics , Zebrafish/genetics
20.
Methods ; 102: 36-49, 2016 06 01.
Article in English | MEDLINE | ID: mdl-26845461

ABSTRACT

Structural variations (SVs) are genomic mutations at least 50 nucleotides in size. They contribute to phenotypic differences among healthy individuals and cause severe diseases, even cancers, by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed, owing to the significant cost reduction of NGS experiments and their ability to detect SVs without bias at base-pair resolution. These methods vary in both sensitivity and specificity because they use different features that depend on SV properties and on library properties. As a result, predictions from different SV callers are often inconsistent. In addition, noise in the data (both platform-specific sequencing errors and artificial chimeric reads) reduces the specificity of SV detection. Poorly characterized regions of the human genome (e.g., repeat regions) greatly affect read mapping and, in turn, SV calling accuracy. Calling complex SVs requires specialized SV callers. Apart from accuracy, processing speed is another factor determining an SV caller's usability. Knowing the pros and cons of different SV calling techniques, as well as the objectives of the biological study, is essential for biologists and bioinformaticians to make informed decisions. This paper describes the different components of the SV calling pipeline and reviews the techniques used by existing SV callers. Through a simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies.
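Why insert size matters can be seen in the simplest read-pair signal used by many SV callers: a pair whose mapped distance or orientation disagrees with the library's expected insert-size distribution. The sketch below is a hypothetical, minimal read-pair classifier, not any specific caller's algorithm. It assumes an Illumina-style forward-reverse (FR) library, a pair record with the leftmost mate first, and a cutoff of mean ± k standard deviations, where `k = 3` is an illustrative choice; the `ReadPair` record and both functions are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Tuple


@dataclass
class ReadPair:
    """Hypothetical aligned read pair; the 'left' mate maps at the lower coordinate."""
    chrom_left: str
    start_left: int    # 0-based start of the left mate's alignment
    strand_left: str   # '+' or '-'
    chrom_right: str
    end_right: int     # 0-based exclusive end of the right mate's alignment
    strand_right: str


def library_stats(insert_sizes: List[int]) -> Tuple[float, float]:
    """Mean and sample standard deviation of insert size from concordant pairs."""
    return mean(insert_sizes), stdev(insert_sizes)


def classify_pair(p: ReadPair, mu: float, sigma: float, k: float = 3.0) -> str:
    """Label a pair by the SV type its discordance suggests (candidates only)."""
    if p.chrom_left != p.chrom_right:
        return "translocation"          # mates on different chromosomes
    if p.strand_left == p.strand_right:
        return "inversion"              # same-strand (FF or RR) pair
    if (p.strand_left, p.strand_right) == ("-", "+"):
        return "tandem_duplication"     # everted (RF) pair
    size = p.end_right - p.start_left   # insert size observed on the reference
    if size > mu + k * sigma:
        return "deletion"               # mates too far apart
    if size < mu - k * sigma:
        return "insertion"              # mates too close together
    return "concordant"


if __name__ == "__main__":
    mu, sigma = 400.0, 50.0  # hypothetical library: cutoffs at 250 and 550 bp
    small_del = ReadPair("chr1", 1_000, "+", "chr1", 1_500, "-")  # spans 100-bp deletion
    large_del = ReadPair("chr1", 1_000, "+", "chr1", 1_700, "-")  # spans 300-bp deletion
    print(classify_pair(small_del, mu, sigma))  # concordant (deletion missed)
    print(classify_pair(large_del, mu, sigma))  # deletion
```

With a library mean of 400 bp and a standard deviation of 50 bp, the deletion cutoff is 550 bp. A 400-bp fragment spanning a 100-bp deletion maps 500 bp apart on the reference and looks concordant, so that deletion is invisible to this signal, whereas a 300-bp deletion (700 bp apart) is flagged. A tighter insert-size distribution lowers the cutoff and lets smaller deletions through, which is one way library properties shape sensitivity.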


Subjects
Genomic Structural Variation , Genomics/methods , Sequence Analysis/methods , Data Curation , Humans , Software