Results 1 - 20 of 71
1.
Stem Cell Reports ; 13(1): 193-206, 2019 Jul 09.
Article in English | MEDLINE | ID: mdl-31231024

ABSTRACT

The temporal order of DNA replication is regulated during development and is highly correlated with gene expression, histone modifications and 3D genome architecture. We tracked changes in replication timing, gene expression, and chromatin conformation capture (Hi-C) A/B compartments over the first two cell cycles during differentiation of human embryonic stem cells to definitive endoderm. Remarkably, transcriptional programs were irreversibly reprogrammed within the first cell cycle and were largely but not universally coordinated with replication timing changes. Moreover, changes in A/B compartment and several histone modifications that normally correlate strongly with replication timing showed weak correlation during the early cell cycles of differentiation but showed increased alignment in later differentiation stages and in terminally differentiated cell lines. Thus, epigenetic cell fate transitions during early differentiation can occur despite dynamic and discordant changes in otherwise highly correlated genomic properties.

2.
Epigenetics ; 14(9): 894-911, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31177910

ABSTRACT

DNA molecules are highly compacted in the eukaryotic nucleus where distal regulatory elements reach their targets through three-dimensional chromosomal interactions. G-quadruplexes, stable four-stranded non-canonical DNA structures, can change local chromatin organization through the exclusion of nucleosomes. However, the relationship between G-quadruplexes and higher-order genome organization remains unknown. Here, we found that G-quadruplexes are significantly enriched at boundaries of topologically associating domains (TADs). Architectural protein occupancy, which plays critical roles in the formation of TADs, was highly correlated with the content of G-quadruplexes at TAD boundaries. Moreover, adjacent boundaries containing G-quadruplexes frequently interacted with each other because of the high enrichment of architectural protein binding sites. Similar to CCCTC-binding factor (CTCF) binding sites, G-quadruplexes also showed strong insulation ability in the separation of adjacent regions. Additionally, the insulation ability of CTCF binding sites and TAD boundaries was significantly reinforced by G-quadruplexes. Furthermore, G-quadruplex motifs on different strands were associated with the orientation of CTCF binding sites. These findings suggest a potential role for G-quadruplexes in loop extrusion. The enrichment of transcription factor binding sites (TFBSs) around regulatory elements containing G-quadruplexes led to frequent interactions between regulatory elements containing G-quadruplexes. Intriguingly, more than 99% of G-quadruplexes overlapped with TFBSs. The binding sites of CTCF and cohesin proteins were preferentially located surrounding G-quadruplexes. Accordingly, we proposed a new mechanism of long-distance gene regulation in which G-quadruplexes are involved in distal interactions between enhancers and promoters.

3.
Bioinformatics ; 2018 Nov 22.
Article in English | MEDLINE | ID: mdl-30475980

ABSTRACT

Motivation: The replication timing (RT) program has been linked to many key biological processes including cell fate commitment, 3D chromatin organization and transcription regulation. Significant technological progress now allows the RT program to be characterized across the entire human genome in a high-throughput and high-resolution fashion. These experiments suggest that RT changes dynamically during development in coordination with gene activity. Since RT is such a fundamental biological process, we believe that an effective quantitative profile of the local RT program from a diverse set of cell types in various developmental stages and lineages can provide crucial biological insights for a genomic locus. Results: In the present study, we explored recurrent and spatially coherent combinatorial profiles from 42 RT programs collected from multiple lineages at diverse differentiation states. We found that a Hidden Markov Model with 15 hidden states provides a good model to describe these genome-wide RT profiling data. Each hidden state represents a unique combination of RT profiles across different cell types; we refer to these as "RT states". To understand the biological properties of these RT states, we inspected their relationship with chromatin states, gene expression, functional annotation and 3D chromosomal organization. We found that the newly defined RT states possess interesting genome-wide functional properties that add complementary information to the existing annotation of the human genome. Supplementary information: R scripts for inferring HMM models and Perl scripts for further analysis are available at https://github.com/PouletAxel/script_HMM_Replication_timing. Supplementary data are available at Bioinformatics online.
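Once a hidden Markov model has been fitted, assigning each genomic bin to a hidden state is commonly done with Viterbi decoding. The following is a minimal sketch of that step only, assuming a toy two-state model with binary observations; the function name, the probabilities, and the data are hypothetical and are not taken from the study, which fitted 15 states to 42 RT profiles using the authors' own R scripts.

```python
from math import log


def viterbi(observations, start, trans, emit):
    """Most likely hidden-state path for a discrete HMM, computed in log space."""
    n_states = len(start)
    # score[s]: best log-probability of any path that ends in state s so far.
    score = [log(start[s]) + log(emit[s][observations[0]]) for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: score[r] + log(trans[r][s]))
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log(trans[best_prev][s]) + log(emit[s][obs]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: state 0 = "early", state 1 = "late";
# observation 0/1 = a noisy early/late call for one genomic bin.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]   # states tend to persist along the genome
emit = [[0.8, 0.2], [0.2, 0.8]]    # calls match the true state 80% of the time

calls = [0, 0, 1, 0, 0, 1, 1, 1]
path = viterbi(calls, start, trans, emit)
print(path)
```

In this toy run the isolated call at the third bin is smoothed away because the transition probabilities favor spatially coherent segments, which is the property that makes an HMM a natural fit for bin-level genomic signals.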

4.
Cancer Res ; 78(16): 4760-4773, 2018 Aug 15.
Article in English | MEDLINE | ID: mdl-29898995

ABSTRACT

The EWS/ETS fusion transcription factors drive Ewing sarcoma (EWS) by orchestrating an oncogenic transcription program. Therapeutic targeting of EWS/ETS has been unsuccessful; however, identifying mediators of the EWS/ETS function could offer new therapeutic options. Here, we describe the dependency of EWS/ETS-driven transcription upon chromatin reader BET bromodomain proteins and investigate the potential of BET inhibitors in treating EWS. EWS/FLI1 and EWS/ERG were found in a transcriptional complex with BRD4, and knockdown of BRD2/3/4 significantly impaired the oncogenic phenotype of EWS cells. RNA-seq analysis following BRD4 knockdown or inhibition with JQ1 revealed an attenuated EWS/ETS transcriptional signature. In contrast to previous reports, JQ1 reduced proliferation and induced apoptosis through MYC-independent mechanisms without affecting EWS/ETS protein levels; this was confirmed by depleting BET proteins using PROTAC-BET degrader (BETd). Polycomb repressive complex 2 (PRC2)-associated factor PHF19 was downregulated by JQ1/BETd or BRD4 knockdown in multiple EWS lines. EWS/FLI1 bound a distal regulatory element of PHF19, and EWS/FLI1 knockdown resulted in downregulation of PHF19 expression. Deletion of PHF19 via CRISPR-Cas9 resulted in a decreased tumorigenic phenotype, a transcriptional signature that overlapped with JQ1 treatment, and increased sensitivity to JQ1. PHF19 expression was also associated with worse prognosis in patients with EWS. In vivo, JQ1 demonstrated antitumor efficacy in multiple mouse xenograft models of EWS. Together these results indicate that EWS/ETS requires BET epigenetic reader proteins for its transcriptional program and can be mitigated by BET inhibitors. This study provides a clear rationale for the clinical utility of BET inhibitors in treating EWS. Significance: These findings reveal the dependency of EWS/ETS transcription factors on BET epigenetic reader proteins and demonstrate the potential of BET inhibitors for the treatment of EWS. Cancer Res; 78(16); 4760-73. ©2018 AACR.

5.
Gigascience ; 7(6), 2018 Jun 01.
Article in English | MEDLINE | ID: mdl-29762754

ABSTRACT

Background: Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. Findings: In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)-based high-performance computing (HPC) implementation, and the popular VCFTools. Conclusions: Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems.


Subjects
Computer Communication Networks , Information Storage and Retrieval , Software , Cluster Analysis , Humans , Workflow
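The sorted-merge operation that entry 5 describes can be illustrated on a single machine. The following is a minimal sketch, assuming each input stream is already sorted by chromosome and position and that records are simple `(chrom, pos, sample, genotype)` tuples. The function names, the chromosome ordering table, and the record layout are hypothetical; this corresponds to the traditional multiway-merge baseline rather than the Hadoop, HBase, or Spark schemas developed in the study.

```python
import heapq
from itertools import groupby

# Hypothetical chromosome ordering used as a sort key; a real pipeline
# would typically derive this from the contig lines of the VCF header.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})


def locus_key(record):
    """Sort key for a record: (chromosome rank, position)."""
    chrom, pos = record[0], record[1]
    return (CHROM_ORDER[chrom], pos)


def sorted_merge(*streams):
    """Multiway-merge per-sample record streams that are each sorted by locus.

    Yields (chrom, pos, {sample: genotype}) with one item per distinct locus.
    """
    merged = heapq.merge(*streams, key=locus_key)
    for (_, pos), group in groupby(merged, key=locus_key):
        records = list(group)
        chrom = records[0][0]
        yield chrom, pos, {sample: gt for _, _, sample, gt in records}


sample_a = [("1", 100, "A", "0/1"), ("1", 250, "A", "1/1"), ("2", 40, "A", "0/0")]
sample_b = [("1", 100, "B", "0/0"), ("2", 40, "B", "0/1"), ("X", 7, "B", "1/1")]

result = list(sorted_merge(sample_a, sample_b))
for row in result:
    print(row)
```

Because `heapq.merge` is lazy, only one pending record per input stream is held in memory at a time, which is why this pattern is the usual single-machine starting point before a job is split across workers.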
7.
BMC Res Notes ; 10(1): 530, 2017 Oct 30.
Article in English | MEDLINE | ID: mdl-29084591

ABSTRACT

OBJECTIVE: The majority of sequence variants identified by genome-wide association studies (GWASs) fall outside of the protein-coding regions. Unlike coding variants, it is challenging to connect these noncoding variants to the pathophysiology of complex diseases/traits due to the lack of functional annotations in the noncoding regions. To overcome this, by leveraging the rich collection of genomic and epigenomic profiles, we have developed DIVAN, or Disease/trait-specific Variant ANnotation, which enables the assignment of a measurement (D-score) for each base of the human genome in a disease/trait-specific manner. To facilitate the utilization of DIVAN, we pre-computed D-scores for every base of the human genome (hg19) for 45 different diseases/traits. RESULTS: In this work, we present a detailed protocol on how to utilize the DIVAN software toolkit to retrieve D-scores either by variant identifiers or by genomic regions for a disease/trait of interest. We also demonstrate the utilities of the D-scores using real data examples. We believe that the pre-computed D-scores for 45 diseases/traits are a useful resource to follow up on the discoveries made by GWASs, and the DIVAN software toolkit provides a convenient way to access this resource. DIVAN is freely available at https://sites.google.com/site/emorydivan/software .


Subjects
Genetic Variation/genetics , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics/methods , Polymorphism, Single Nucleotide/genetics , Software , Humans
8.
Stat Biosci ; 9(1): 73-90, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28919931

ABSTRACT

Modern high-throughput biotechnologies such as microarray and next generation sequencing produce a massive amount of information for each sample assayed. However, in a typical high-throughput experiment, only a limited amount of data is observed for each individual feature, hence the classical 'large p, small n' problem. The Bayesian hierarchical model, capable of borrowing strength across features within the same dataset, has been recognized as an effective tool in analyzing such data. However, the shrinkage effect, the most prominent feature of hierarchical models, can lead to undesirable over-correction for some features. In this work, we discuss possible causes of the over-correction problem and propose several alternative solutions. Our strategy is rooted in the fact that in the Big Data era, large amounts of historical data are available and should be taken advantage of. Our strategy presents a new framework to enhance the Bayesian hierarchical model. Through simulation and real data analysis, we demonstrated superior performance of the proposed strategy. Our new strategy also enables borrowing information across different platforms, which could be extremely useful with the emergence of new technologies and the accumulation of data from different platforms in the Big Data era. Our method has been implemented in the R package "adaptiveHM", which is freely available from https://github.com/benliemory/adaptiveHM.

9.
Nucleic Acids Res ; 45(W1): W445-W452, 2017 Jul 03.
Article in English | MEDLINE | ID: mdl-28402462

ABSTRACT

The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org.

10.
Sci Rep ; 7: 46398, 2017 Apr 21.
Article in English | MEDLINE | ID: mdl-28429804

ABSTRACT

A primary goal of The Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) is to develop an 'African Diaspora Power Chip' (ADPC), a genotyping array consisting of tagging SNPs, useful in comprehensively identifying African specific genetic variation. This array is designed based on the novel variation identified in 642 CAAPA samples of African ancestry with high coverage whole genome sequence data (~30× depth). This novel variation extends the pattern of variation catalogued in the 1000 Genomes and Exome Sequencing Projects to a spectrum of populations representing the wide range of West African genomic diversity. These individuals from CAAPA also comprise a large swath of the African Diaspora population and incorporate historical genetic diversity covering nearly the entire Atlantic coast of the Americas. Here we show the results of designing and producing such a microchip array. This novel array covers African specific variation far better than other commercially available arrays, and will enable better GWAS analyses for researchers with individuals of African descent in their study populations. A recent study cataloging variation in continental African populations suggests this type of African-specific genotyping array is both necessary and valuable for facilitating large-scale GWAS in populations of African ancestry.

11.
J Innate Immun ; 9(2): 126-144, 2017.
Article in English | MEDLINE | ID: mdl-27866206

ABSTRACT

STAT3 is a master transcriptional regulator that plays an important role in the induction of both immune activation and immune tolerance in dendritic cells (DCs). The transcriptional targets of STAT3 in promoting DC activation are becoming increasingly understood; however, the mechanisms underpinning its role in causing DC suppression remain largely unknown. To determine the functional gene targets of STAT3, we compared the genome-wide binding of STAT3 using ChIP sequencing coupled with gene expression microarrays to determine STAT3-dependent gene regulation in DCs after histone deacetylase (HDAC) inhibition. HDAC inhibition boosted the ability of STAT3 to bind to distinct DNA targets and regulate gene expression. Among the top 500 STAT3 binding sites, the frequency of canonical motifs was significantly higher than that of noncanonical motifs. Functional analysis revealed that after treatment with an HDAC inhibitor, the upregulated STAT3 target genes were those that were primarily the negative regulators of proinflammatory cytokines and those in the IL-10 signaling pathway. The downregulated STAT3-dependent targets were those involved in immune effector processes and antigen processing/presentation. The expression and functional relevance of these genes were validated. Specifically, functional studies confirmed that the upregulation of IL-10Ra by STAT3 contributed to the suppressive function of DCs following HDAC inhibition.


Subjects
Dendritic Cells/physiology , Histone Deacetylase Inhibitors/pharmacology , Interleukin-10/metabolism , Receptors, Interleukin-10/metabolism , STAT3 Transcription Factor/metabolism , Animals , Antigen Presentation/genetics , Bone Marrow Cells/drug effects , Bone Marrow Cells/physiology , Cell Differentiation/drug effects , Cells, Cultured , Dendritic Cells/drug effects , Female , Gene Expression Regulation/drug effects , Genome , High-Throughput Nucleotide Sequencing , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL , Microarray Analysis , Protein Binding , Receptors, Interleukin-10/genetics , STAT3 Transcription Factor/genetics , Signal Transduction/drug effects , Signal Transduction/genetics
12.
Genome Biol ; 17(1): 252, 2016 Dec 06.
Article in English | MEDLINE | ID: mdl-27923386

ABSTRACT

Understanding the link between non-coding sequence variants, identified in genome-wide association studies, and the pathophysiology of complex diseases remains challenging due to a lack of annotations in non-coding regions. To overcome this, we developed DIVAN, a novel feature selection and ensemble learning framework, which identifies disease-specific risk variants by leveraging a comprehensive collection of genome-wide epigenomic profiles across cell types and factors, along with other static genomic features. DIVAN accurately and robustly recognizes non-coding disease-specific risk variants under multiple testing scenarios; among all the features, histone marks, especially those marks associated with repressed chromatin, are often more informative than others.


Subjects
Disease/genetics , Epigenomics , Genetic Variation , Chromatin/genetics , Computational Biology/methods , Genome, Human , Genome-Wide Association Study , Histone Code/genetics , Histones/genetics , Humans , Polymorphism, Single Nucleotide , Risk Factors
13.
PeerJ ; 4: e2571, 2016.
Article in English | MEDLINE | ID: mdl-27781166

ABSTRACT

In this study we developed a genome-based method for detecting Staphylococcus aureus subtypes from metagenome shotgun sequence data. We used a binomial mixture model and the coverage counts at >100,000 known S. aureus SNP (single nucleotide polymorphism) sites derived from prior comparative genomic analysis to estimate the proportion of 40 subtypes in metagenome samples. We were able to obtain >87% sensitivity and >94% specificity at 0.025X coverage for S. aureus. We found that 321 and 149 metagenome samples from the Human Microbiome Project and metaSUB analysis of the New York City subway, respectively, contained S. aureus at genome coverage >0.025. In both projects, CC8 and CC30 were the most common S. aureus clonal complexes encountered. We found evidence that the subtype composition at different body sites of the same individual was more similar than random sampling and more limited evidence that certain body sites were enriched for particular subtypes. One surprising finding was the apparent high frequency of CC398, a lineage often associated with livestock, in samples from the tongue dorsum. Epidemiologic analysis of the HMP subject population suggested that high BMI (body mass index) and health insurance are possibly associated with S. aureus carriage, but there was limited power to identify factors linked to carriage of even the most common subtype. In the NYC subway data, we found a small signal of geographic distance affecting subtype clustering, but other unknown factors influence taxonomic distribution of the species around the city.

14.
Nat Commun ; 7: 12522, 2016 Oct 11.
Article in English | MEDLINE | ID: mdl-27725671

ABSTRACT

The African Diaspora in the Western Hemisphere represents one of the largest forced migrations in history and had a profound impact on genetic diversity in modern populations. To date, the fine-scale population structure of descendants of the African Diaspora remains largely uncharacterized. Here we present genetic variation from deeply sequenced genomes of 642 individuals from North and South American, Caribbean and West African populations, substantially increasing the lexicon of human genomic variation and suggesting much variation remains to be discovered in African-admixed populations in the Americas. We summarize genetic variation in these populations, quantifying the postcolonial sex-biased European gene flow across multiple regions. Moreover, we refine estimates on the burden of deleterious variants carried across populations and how this varies with African ancestry. Our data are an important resource for empowering disease mapping studies in African-admixed individuals and will facilitate gene discovery for diseases disproportionately affecting individuals of African ancestry.


Subjects
African Continental Ancestry Group/genetics , Gene Flow , Genome, Human , Human Migration , Base Sequence , DNA, Intergenic/genetics , Female , Genetic Heterogeneity , Geography , Humans , Male , Phylogeny , Polymorphism, Single Nucleotide/genetics , Sexism
15.
Cell Discov ; 2: 16008, 2016.
Article in English | MEDLINE | ID: mdl-27462455

ABSTRACT

Mixed lineage leukemia protein-1 (MLL1) has a critical role in human MLL1 rearranged leukemia (MLLr) and is a validated therapeutic target. However, its role in regulating global gene expression in MLLr cells, as well as its interplay with MLL1 fusion proteins, remain unclear. Here we show that despite shared DNA-binding and cofactor interacting domains at the N terminus, MLL1 and MLL-AF9 are recruited to distinct chromatin regions and have divergent functions in regulating the leukemic transcription program. We demonstrate that MLL1, probably through C-terminal interaction with WDR5, is recruited to regulatory enhancers that are enriched for binding sites of E-twenty-six (ETS) family transcription factors, whereas MLL-AF9 binds to chromatin regions that have no H3K4me1 enrichment. Transcriptome-wide changes induced by different small molecule inhibitors also highlight the distinct functions of MLL1 and MLL-AF9. Taken together, our studies provide novel insights on how MLL1 and MLL fusion proteins contribute to leukemic gene expression, which have implications for developing effective therapies in the future.

16.
PLoS Genet ; 12(5): e1006042, 2016 May.
Article in English | MEDLINE | ID: mdl-27152617

ABSTRACT

Selective neuronal vulnerability is characteristic of most degenerative disorders of the CNS, yet mechanisms underlying this phenomenon remain poorly characterized. Many forms of cerebellar degeneration exhibit an anterior-to-posterior gradient of Purkinje cell loss including Niemann-Pick type C1 (NPC) disease, a lysosomal storage disorder characterized by progressive neurological deficits that often begin in childhood. Here, we sought to identify candidate genes underlying vulnerability of Purkinje cells in anterior cerebellar lobules using data freely available in the Allen Brain Atlas. This approach led to the identification of 16 candidate neuroprotective or susceptibility genes. We demonstrate that one candidate gene, heat shock protein beta-1 (HSPB1), promoted neuronal survival in cellular models of NPC disease through a mechanism that involved inhibition of apoptosis. Additionally, we show that over-expression of wild type HSPB1 or a phosphomimetic mutant in NPC mice slowed the progression of motor impairment and diminished cerebellar Purkinje cell loss. We confirmed the modulatory effect of Hspb1 on Purkinje cell degeneration in vivo, as knockdown by Hspb1 shRNA significantly enhanced neuron loss. These results suggest that strategies to promote HSPB1 activity may slow the rate of cerebellar degeneration in NPC disease and highlight the use of bioinformatics tools to uncover pathways leading to neuronal protection in neurodegenerative disorders.


Subjects
HSP27 Heat-Shock Proteins/genetics , Nerve Degeneration/genetics , Niemann-Pick Disease, Type C/genetics , Purkinje Cells/metabolism , Animals , Apoptosis/genetics , Cell Survival/genetics , Cerebellum/metabolism , Cerebellum/pathology , Disease Models, Animal , HSP27 Heat-Shock Proteins/biosynthesis , Humans , Mice , Nerve Degeneration/pathology , Nerve Degeneration/therapy , Neurons/metabolism , Neurons/pathology , Niemann-Pick Disease, Type C/pathology , Niemann-Pick Disease, Type C/therapy , Purkinje Cells/pathology , RNA, Small Interfering/genetics , RNA, Small Interfering/therapeutic use
17.
Bioinformatics ; 32(8): 1214-6, 2016 Apr 15.
Article in English | MEDLINE | ID: mdl-26685307

ABSTRACT

Genome-wide association studies (GWASs) have successfully identified many sequence variants that are significantly associated with common diseases and traits. Tens of thousands of such trait-associated SNPs have already been cataloged, which we believe form a great resource for genomic research. Recent studies have demonstrated that the collection of trait-associated SNPs can be exploited to indicate whether a given genomic interval or intervals are likely to be functionally connected with certain phenotypes or diseases. Despite this importance, there is currently no ready-to-use computational tool able to connect genomic intervals to phenotypes. Here, we present traseR, an easy-to-use R Bioconductor package that performs enrichment analyses of trait-associated SNPs in arbitrary genomic intervals with flexible options, including testing method, type of background and inclusion of SNPs in LD. AVAILABILITY AND IMPLEMENTATION: The traseR R package, preloaded with an up-to-date collection of trait-associated SNPs, is freely available in Bioconductor. CONTACT: zhaohui.qin@emory.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study , Phenotype , Software , Genomics , Humans , Polymorphism, Single Nucleotide
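The kind of enrichment analysis that entry 17 describes can be sketched with a one-sided binomial test. This is a minimal illustration, not traseR's implementation or API: the function names, the use of the whole genome as background, the assumption that the intervals do not overlap, and the toy numbers are all assumptions made for the example.

```python
from math import comb


def count_snps_in_intervals(snp_positions, intervals):
    """Count SNP positions that fall in any half-open interval [start, end)."""
    return sum(
        any(start <= pos < end for start, end in intervals)
        for pos in snp_positions
    )


def binomial_enrichment_pvalue(hits, total_snps, interval_bp, genome_bp):
    """One-sided P(X >= hits) for X ~ Binomial(total_snps, interval_bp / genome_bp).

    Under the null, each trait-associated SNP lands inside the query
    intervals with probability equal to the fraction of the genome covered.
    """
    p = interval_bp / genome_bp
    return sum(
        comb(total_snps, k) * p**k * (1 - p) ** (total_snps - k)
        for k in range(hits, total_snps + 1)
    )


# Toy data on a single hypothetical 10,000 bp "genome".
genome_bp = 10_000
intervals = [(1000, 2000), (5000, 5500)]          # assumed non-overlapping
trait_snps = [1200, 1500, 1900, 5100, 7000, 9000]

hits = count_snps_in_intervals(trait_snps, intervals)
interval_bp = sum(end - start for start, end in intervals)
pvalue = binomial_enrichment_pvalue(hits, len(trait_snps), interval_bp, genome_bp)
print(hits, interval_bp, pvalue)
```

A small p-value here means more trait-associated SNPs fall inside the intervals than their share of the genome would predict. Choosing a different background, or adding SNPs in LD with the cataloged ones, changes `total_snps` and the null probability, which is why the abstract lists those as user options.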
18.
Bioinformatics ; 32(5): 682-9, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26519502

ABSTRACT

MOTIVATION: Modern high-throughput biotechnologies such as microarray are capable of producing a massive amount of information for each sample. However, in a typical high-throughput experiment, only a limited number of samples are assayed, hence the classical 'large p, small n' problem. On the other hand, rapid propagation of these high-throughput technologies has resulted in a substantial collection of data, often carried out on the same platform and using the same protocol. It is highly desirable to utilize the existing data when performing analysis and inference on a new dataset. RESULTS: Utilizing existing data can be carried out in a straightforward fashion under the Bayesian framework in which the repository of historical data can be exploited to build informative priors and used in new data analysis. In this work, using microarray data, we investigate the feasibility and effectiveness of deriving informative priors from historical data and using them in the problem of detecting differentially expressed genes. Through simulation and real data analysis, we show that the proposed strategy significantly outperforms existing methods including the popular and state-of-the-art Bayesian hierarchical model-based approaches. Our work illustrates the feasibility and benefits of exploiting the increasingly available genomics big data in statistical inference and presents a promising practical strategy for dealing with the 'large p, small n' problem. AVAILABILITY AND IMPLEMENTATION: Our method is implemented in the R package IPBT, which is freely available from https://github.com/benliemory/IPBT. CONTACT: yuzhu@purdue.edu; zhaohui.qin@emory.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Bayes Theorem , Algorithms , Databases, Factual , Genomics
19.
PLoS Comput Biol ; 11(8): e1004448, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26267278

ABSTRACT

With the rapid decline of sequencing costs, researchers today rush to embrace whole genome sequencing (WGS) or whole exome sequencing (WES) as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect the downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable to factor them into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice; it can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor.


Subjects
Chromosome Mapping/methods , Diploidy , Genomics/methods , Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Software , Databases, Genetic , Genome , Humans , Sequence Analysis, DNA
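The idea in entry 19 of customizing a haploid reference with genotypes known in advance can be sketched as follows. This is a minimal illustration under strong assumptions (SNPs only, 0-based positions, genotypes given as an allele pair, no phasing information), with hypothetical function and variable names; it does not reproduce RefEditor's actual algorithm or file formats.

```python
def personalize_reference(reference, genotypes):
    """Build two haplotype sequences from a haploid reference and SNP genotypes.

    reference: str, the haploid reference sequence.
    genotypes: dict mapping 0-based position -> (allele_1, allele_2).
    Without phasing information, the assignment of alleles to haplotypes at
    heterozygous sites is arbitrary; here allele_1 always goes to haplotype 1.
    """
    hap1, hap2 = list(reference), list(reference)
    for pos, (allele_1, allele_2) in genotypes.items():
        if not 0 <= pos < len(reference):
            raise ValueError(f"position {pos} is outside the reference")
        hap1[pos] = allele_1
        hap2[pos] = allele_2
    return "".join(hap1), "".join(hap2)


reference = "ACGTACGTAC"
genotypes = {
    2: ("G", "T"),  # heterozygous: reference G on one copy, T on the other
    5: ("A", "A"),  # homozygous alternate: reference C replaced on both copies
}

hap1, hap2 = personalize_reference(reference, genotypes)
print(hap1)  # ACGTAAGTAC
print(hap2)  # ACTTAAGTAC
```

Reads carrying the individual's own alleles then match one of the two sequences exactly instead of incurring a mismatch against the universal reference, which is the intuition behind the mapping and genotype-calling gains reported in the abstract.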
20.
Mol Cell ; 58(2): 216-31, 2015 Apr 16.
Article in English | MEDLINE | ID: mdl-25818644

ABSTRACT

Chromosomes of metazoan organisms are partitioned in the interphase nucleus into discrete topologically associating domains (TADs). Borders between TADs are formed in regions containing active genes and clusters of architectural protein binding sites. The transcription of most genes is repressed after temperature stress in Drosophila. Here we show that temperature stress induces relocalization of architectural proteins from TAD borders to inside TADs, and this is accompanied by a dramatic rearrangement in the 3D organization of the nucleus. TAD border strength declines, allowing for an increase in long-distance inter-TAD interactions. Similar but quantitatively weaker effects are observed upon inhibition of transcription or depletion of individual architectural proteins. Heat shock-induced inter-TAD interactions result in increased contacts among enhancers and promoters of silenced genes, which recruit Pc and form Pc bodies in the nucleolus. These results suggest that the TAD organization of metazoan genomes is plastic and can be reconfigured quickly.


Subjects
Chromatin/genetics , Chromosomes/genetics , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Polycomb-Group Proteins/metabolism , Animals , Cell Line , Drosophila Proteins/chemistry , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Enhancer Elements, Genetic , Molecular Sequence Data , Polycomb-Group Proteins/chemistry , Polycomb-Group Proteins/genetics , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Stress, Physiological , Temperature