Results 1 - 20 of 28
1.
Cell Rep ; 28(5): 1307-1322.e8, 2019 Jul 30.
Article in English | MEDLINE | ID: mdl-31365872

ABSTRACT

CD40 has major roles in B cell development, activation, and germinal center responses. CD40 hypoactivity causes immunodeficiency whereas its overexpression causes autoimmunity and lymphomagenesis. To systematically identify B cell autonomous CD40 regulators, we use CRISPR/Cas9 genome-scale screens in Daudi B cells stimulated by multimeric CD40 ligand. These highlight known CD40 pathway components and reveal multiple additional mechanisms regulating CD40. The nuclear ubiquitin ligase FBXO11 supports CD40 expression by targeting repressors CTBP1 and BCL6. FBXO11 knockout decreases primary B cell CD40 abundance and impairs class-switch recombination, suggesting that frequent lymphoma monoallelic FBXO11 mutations may balance BCL6 increase with CD40 loss. At the mRNA level, CELF1 controls exon splicing critical for CD40 activity, while the N6-adenosine methyltransferase WTAP negatively regulates CD40 mRNA abundance. At the protein level, ESCRT negatively regulates activated CD40 levels while the negative feedback phosphatase DUSP10 limits downstream MAPK responses. These results serve as a resource for future studies and highlight potential therapeutic targets.

2.
J Virol ; 93(16)2019 Aug 15.
Article in English | MEDLINE | ID: mdl-31167905

ABSTRACT

Super-enhancers (SEs) are clusters of enhancers marked by extraordinarily high and broad chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) signals for H3K27ac or other transcription factors (TFs). SEs play pivotal roles in development and oncogenesis. Epstein-Barr virus (EBV) super-enhancers (ESEs) are co-occupied by all essential EBV oncogenes and EBV-activated NF-κB subunits. Perturbation of ESEs stops lymphoblastoid cell line (LCL) growth. To further characterize ESEs and identify proteins critical for ESE function, MYC ESEs were cloned upstream of a green fluorescent protein (GFP) reporter. Reporters driven by MYC ESEs 525 kb and 428 kb upstream of MYC (525ESE and 428ESE) had very high activities in LCLs but not in EBV-negative BJAB cells. EBNA2 activated MYC ESE-driven luciferase reporters. CRISPRi targeting 525ESE significantly decreased MYC expression. Genome-wide CRISPR screens identified factors essential for ESE activity. TBP-associated factor (TAF) family proteins, including TAF8, TAF11, and TAF3, were essential for the activity of the integrated 525ESE-driven reporter in LCLs. TAF8 and TAF11 knockout significantly decreased 525ESE activity and MYC transcription. MEF2C was also identified to be essential for 525ESE activity. Depletion of MEF2C decreased 525ESE reporter activity, MYC expression, and LCL growth. MEF2C cDNA resistant to CRISPR cutting rescued MEF2C knockout and restored 525ESE reporter activity and MYC expression. MEF2C depletion decreased IRF4, EBNA2, and SPI1 binding to 525ESE in LCLs. MEF2C depletion also affected the expression of other ESE target genes, including the ETS1 and BCL2 genes. These data indicated that in addition to EBNA2, TAF family members and MEF2C are essential for ESE activity, MYC expression, and LCL growth. IMPORTANCE: SEs play critical roles in cancer development. Since SEs assemble much bigger protein complexes on enhancers than typical enhancers (TEs), they are more sensitive than TEs to perturbations. Understanding the protein composition of SEs that are linked to key oncogenes may identify novel therapeutic targets. A genome-wide CRISPR screen specifically identified proteins essential for MYC ESE activity but not for the simian virus 40 (SV40) enhancer. These proteins were not only essential for the reporter activity but also important for MYC expression and LCL growth. Targeting these proteins may lead to new therapies for EBV-associated cancers.

3.
Genome Biol ; 20(1): 118, 2019 06 04.
Article in English | MEDLINE | ID: mdl-31164141

ABSTRACT

BACKGROUND: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. RESULTS: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. CONCLUSIONS: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.


Subjects
Computational Biology/methods; Computational Biology/standards; Computer Simulation
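As a point of reference for the "classic" methods that take only p values as input, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. Choosing Benjamini-Hochberg is an assumption made for illustration: the abstract does not name the two classic methods it benchmarks, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    it uses p values only and no informative covariate.
    """
    m = len(p_values)
    # Indices of the hypotheses, sorted by ascending p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

Covariate-aware methods of the kind the abstract calls "modern" would replace the single threshold sequence with weights or groups derived from the covariate; that part is not sketched here.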
4.
J Biol Chem ; 294(25): 9734-9745, 2019 Jun 21.
Article in English | MEDLINE | ID: mdl-31073033

ABSTRACT

Early diagnosis of nasopharyngeal carcinoma (NPC) is difficult because of a lack of specific symptoms. Many patients have advanced disease at diagnosis, and these patients respond poorly to treatment. New treatments are therefore needed to improve the outcome of NPC. To better understand the molecular pathogenesis of NPC, here we used an NPC cell line in a genome-wide CRISPR-based knockout screen to identify the cellular factors and pathways essential for NPC (i.e. dependence factors). This screen identified the Moz, Ybf2/Sas3, Sas2, Tip60 histone acetyl transferase complex, NF-κB signaling, purine synthesis, and linear ubiquitination pathways; and MDM2 proto-oncogene as NPC dependence factors/pathways. Using gene knock out, complementary DNA rescue, and inhibitor assays, we found that perturbation of these pathways greatly reduces the growth of NPC cell lines but does not affect growth of SV40-immortalized normal nasopharyngeal epithelial cells. These results suggest that targeting these pathways/proteins may hold promise for achieving better treatment of patients with NPC.

5.
J Virol ; 93(13)2019 Jul 01.
Article in English | MEDLINE | ID: mdl-31019051

ABSTRACT

Epstein-Barr virus (EBV) infection of human primary resting B lymphocytes (RBLs) leads to the establishment of lymphoblastoid cell lines (LCLs) that can grow indefinitely in vitro. EBV transforms RBLs through the expression of viral latency genes, and these genes alter host transcription programs. To globally measure the transcriptome changes during EBV transformation, primary human RBLs were infected with B95.8 EBV for 0, 2, 4, 7, 14, 21, and 28 days, and poly(A) plus RNAs were analyzed by transcriptome sequencing (RNA-seq). Analyses of variance (ANOVAs) found 3,669 protein-coding genes that were differentially expressed (false-discovery rate [FDR] < 0.01). Ninety-four percent of LCL genes that are essential for LCL growth and survival were differentially expressed. Pathway analyses identified a significant enrichment of pathways involved in cell proliferation, DNA repair, metabolism, and antiviral responses. RNA-seq also identified long noncoding RNAs (lncRNAs) differentially expressed during EBV infection. Clustered regularly interspaced short palindromic repeat (CRISPR) interference (CRISPRi) and CRISPR activation (CRISPRa) found that CYTOR and NORAD lncRNAs were important for LCL growth. During EBV infection, type III EBV latency genes were expressed rapidly after infection. Immediately after LCL establishment, EBV lytic genes were also expressed in LCLs, and ∼4% of the LCLs express gp350. Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) and POLR2A chromatin interaction analysis followed by paired-end tag sequencing (ChIA-PET) data linked EBV enhancers to 90% of EBV-regulated genes. Many genes were linked to enhancers occupied by multiple EBNAs or NF-κB subunits. Incorporating these assays, we generated a comprehensive EBV regulome in LCLs. IMPORTANCE: Epstein-Barr virus (EBV) immortalization of resting B lymphocytes (RBLs) is a useful model system to study EBV oncogenesis. By incorporating transcriptome sequencing (RNA-seq), chromatin immunoprecipitation followed by deep sequencing (ChIP-seq), chromatin interaction analysis followed by paired-end tag sequencing (ChIA-PET), and a genome-wide clustered regularly interspaced short palindromic repeat (CRISPR) screen, we identified key pathways that EBV usurps to enable B cell growth and transformation. Multiple layers of regulation could be achieved by cooperation between multiple EBV transcription factors binding to the same enhancers. EBV manipulated the expression of most cell genes essential for lymphoblastoid cell line (LCL) growth and survival. In addition to proteins, long noncoding RNAs (lncRNAs) regulated by EBV also contributed to LCL growth and survival. The data presented in this paper not only allowed us to further define the molecular pathogenesis of EBV but also serve as a useful resource to the EBV research community.

6.
BMC Bioinformatics ; 19(Suppl 5): 112, 2018 04 11.
Article in English | MEDLINE | ID: mdl-29671389

ABSTRACT

BACKGROUND: Somatic copy number alterations (SCNAs) can be used to infer tumor subclonal populations in whole-genome sequencing studies, where the read count ratios between tumor-normal paired samples usually serve as the proxy for inference. Existing SCNA-based subclonal population inference tools assume that the GC bias of the tumor and normal samples has the same features and can be fully offset by the read count ratio. However, we found that the read count ratio on SCNA segments presents a log-linear biased pattern, which affects the performance of existing read-count-ratio-based subclonal inference tools. Currently, no correction tools take the read ratio bias into account. RESULTS: We present Pre-SCNAClonal, a tool that improves tumor subclonal population inference by correcting GC bias at the SCNA level. Pre-SCNAClonal first corrects GC bias using a Markov chain Monte Carlo probability model, then accurately locates baseline DNA segments (not containing any SCNAs) with a hierarchical clustering model. We show Pre-SCNAClonal's superiority to existing GC-bias correction methods at any level of subclonal population. CONCLUSIONS: Pre-SCNAClonal can be run independently or serve as a pre-processing/GC-correction step in conjunction with existing SCNA-based subclonal inference tools.

7.
Biostatistics ; 19(4): 562-578, 2018 Oct 01.
Article in English | MEDLINE | ID: mdl-29121214

ABSTRACT

Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq) required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used and numerous publications are based on data produced with this technology. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically-driven, genes not expressing RNA at the time of measurement, or technically-driven, genes expressing RNA, but not at a sufficient level to be detected by sequencing technology. Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. 
Finally, we demonstrate and discuss how batch-effects and confounded experiments can intensify the problem.
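The quantity at the center of this abstract, the proportion of genes reported as zero in each cell, is simple to compute from a count matrix. The sketch below assumes a hypothetical layout (one list of gene counts per cell) and a hypothetical function name; it only computes the proportion and does not attempt to separate technical from biological zeros.

```python
def zero_fraction_per_cell(counts):
    """Fraction of genes with a zero count in each cell.

    `counts` is assumed to be a list of per-cell lists of gene counts
    (an illustrative layout, not one prescribed by the abstract).
    """
    return [sum(1 for c in cell if c == 0) / len(cell) for cell in counts]
```

Comparing these fractions across batches is one rough way to see whether the cell-to-cell variation in zeros tracks a technical factor.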

8.
Genome Res ; 27(11): 1930-1938, 2017 11.
Article in English | MEDLINE | ID: mdl-29025895

ABSTRACT

The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics' public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To account for this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories.


Subjects
Base Composition; DNA/metabolism; Sequence Analysis, DNA/methods; Algorithms; Binding Sites; Chromatin Immunoprecipitation; DNA/chemistry; False Positive Reactions; Genomics; High-Throughput Nucleotide Sequencing
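The covariate this abstract is concerned with, GC content, can be computed per genomic window as follows. This is a minimal sketch with hypothetical function names; it computes the covariate only and is not the statistical correction the authors introduce.

```python
def gc_content(seq):
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        # No unambiguous bases (for example, a run of N): report 0.0.
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window):
    """GC content of consecutive, non-overlapping windows of fixed size."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]
```

Plotting read coverage against the windowed GC content for each experiment is a common first look at whether coverage depends on GC.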
9.
Nat Genet ; 49(11): 1613-1623, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28945250

ABSTRACT

Perturbations to mammalian SWI/SNF (mSWI/SNF or BAF) complexes contribute to more than 20% of human cancers, with driving roles first identified in malignant rhabdoid tumor, an aggressive pediatric cancer characterized by biallelic inactivation of the core BAF complex subunit SMARCB1 (BAF47). However, the mechanism by which this alteration contributes to tumorigenesis remains poorly understood. We find that BAF47 loss destabilizes BAF complexes on chromatin, absent significant changes in complex assembly or integrity. Rescue of BAF47 in BAF47-deficient sarcoma cell lines results in increased genome-wide BAF complex occupancy, facilitating widespread enhancer activation and opposition of Polycomb-mediated repression at bivalent promoters. We demonstrate differential regulation by two distinct mSWI/SNF assemblies, BAF and PBAF complexes, enhancers and promoters, respectively, suggesting that each complex has distinct functions that are perturbed upon BAF47 loss. Our results demonstrate collaborative mechanisms of mSWI/SNF-mediated gene activation, identifying functions that are co-opted or abated to drive human cancers and developmental disorders.


Subjects
Carcinogenesis/genetics; Chromosomal Proteins, Non-Histone/genetics; Gene Expression Regulation, Neoplastic; Rhabdoid Tumor/genetics; SMARCB1 Protein/genetics; Sarcoma/genetics; Transcription Factors/genetics; Carcinogenesis/metabolism; Carcinogenesis/pathology; Cell Line, Tumor; Chromatin/chemistry; Chromatin/metabolism; Chromatin Assembly and Disassembly; Chromosomal Proteins, Non-Histone/metabolism; Enhancer Elements, Genetic; Genetic Complementation Test; Humans; Polycomb-Group Proteins/genetics; Polycomb-Group Proteins/metabolism; Promoter Regions, Genetic; Rhabdoid Tumor/metabolism; Rhabdoid Tumor/pathology; SMARCB1 Protein/deficiency; Sarcoma/metabolism; Sarcoma/pathology; Transcription Factors/metabolism
10.
J Bioinform Comput Biol ; 15(5): 1750021, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28918707

ABSTRACT

Structural controllability is the generalization of traditional controllability for dynamical systems. During the last decade, interesting biological discoveries have been inferred by applying structural controllability analysis to biological networks. However, false positive/negative information (i.e. nodes and edges) is widespread in biological networks documented in public data sources, which can hinder accurate analysis of structural controllability. In this study, we propose WDNfinder, a comprehensive analysis package that provides structural controllability analysis with consideration of node connection strength in biological networks. When applied to the human cancer signaling network and the p53-mediated DNA damage response network, WDNfinder shows high accuracy in predicting essential nodes in these networks. Compared to existing methods, WDNfinder can significantly narrow down the set of minimum driver node sets (MDSs) under the restriction of domain knowledge. Using the p53-mediated DNA damage response network as an illustration, we find more meaningful MDSs with WDNfinder. The source code is implemented in Python and publicly available together with relevant data on GitHub: https://github.com/dustincys/WDNfinder .


Subjects
Algorithms; Computational Biology/methods; Neoplasms/genetics; Neoplasms/metabolism; DNA Damage/genetics; Humans; Programming Languages; Tumor Suppressor Protein p53/genetics; Tumor Suppressor Protein p53/metabolism
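For readers unfamiliar with minimum driver node sets, the sketch below shows the standard unweighted construction from structural controllability theory: find a maximum matching over the directed edges, and treat every node that is not the target of a matched edge as a driver node. This is an illustration of the general idea only. It is not WDNfinder's method, which additionally considers connection strength, and the function name is hypothetical.

```python
def minimum_driver_nodes(nodes, edges):
    """Return one minimum driver node set for a directed network.

    Unweighted sketch: a node is "matched" when a matched edge points to it;
    unmatched nodes are driver nodes. Uses simple augmenting paths, so it is
    only suitable for small networks.
    """
    successors = {u: [] for u in nodes}
    for u, v in edges:
        successors[u].append(v)

    matched_by = {}  # target node -> source node of its matched edge

    def try_match(u, seen):
        # Try to give u a matched outgoing edge, re-routing others if needed.
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or try_match(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    for u in nodes:
        try_match(u, set())

    drivers = [v for v in nodes if v not in matched_by]
    # With a perfect matching, a single (arbitrary) node still has to be driven.
    return drivers or [nodes[0]]
```

A maximum matching is generally not unique, so different matchings give different driver sets of the same size; this is the ambiguity that the abstract says domain knowledge helps narrow down.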
13.
Genome Biol ; 17: 74, 2016 Apr 23.
Article in English | MEDLINE | ID: mdl-27107712

ABSTRACT

Obtaining RNA-seq measurements involves a complex data analytical process with a large number of competing algorithms as options. There is much debate about which of these methods provides the best approach. Unfortunately, it is currently difficult to evaluate their performance due in part to a lack of sensitive assessment metrics. We present a series of statistical summaries and plots to evaluate the performance in terms of specificity and sensitivity, available as an R/Bioconductor package ( http://bioconductor.org/packages/rnaseqcomp ). Using two independent datasets, we assessed seven competing pipelines. Performance was generally poor, with two methods clearly underperforming and RSEM slightly outperforming the rest.


Subjects
Algorithms; Sequence Analysis, RNA/methods; Animals; Humans; Reference Values; Sensitivity and Specificity; Sequence Analysis, RNA/standards
14.
Bioinformatics ; 32(11): 1625-31, 2016 06 01.
Article in English | MEDLINE | ID: mdl-26568628

ABSTRACT

MOTIVATION: Single Molecule Real-Time (SMRT) sequencing has been widely applied in cutting-edge genomic studies. However, aligning noisy long SMRT reads to a reference genome with state-of-the-art aligners is still expensive, which is becoming a bottleneck in applications of SMRT sequencing. Novel approaches are needed to improve the efficiency and effectiveness of SMRT read alignment. RESULTS: We propose the Regional Hashing-based Alignment Tool (rHAT), a seed-and-extension-based read alignment approach specifically designed for noisy long reads. rHAT indexes the reference genome with a regional hash table (RHT), a hash table-based index that describes the short tokens within local windows of the reference genome. In the seeding phase, rHAT uses the RHT to efficiently count the occurrences of short token matches between a partial read and local genomic windows to find highly probable candidate sites. In the extension phase, a sparse dynamic programming-based heuristic approach is used to reduce the cost of aligning the read to the candidate sites. By benchmarking on real and simulated datasets from various prokaryote and eukaryote genomes, we demonstrated that rHAT can effectively align SMRT reads with outstanding throughput. AVAILABILITY AND IMPLEMENTATION: rHAT is implemented in C++; the source code is available at https://github.com/HIT-Bioinformatics/rHAT CONTACT: ydwang@hit.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software; Algorithms; Genomics; Sequence Alignment; Sequence Analysis, DNA
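The seeding idea described in this abstract (index the short tokens found in each local window of the reference, then count token matches between a read and each window) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not rHAT's implementation: the windows here are non-overlapping, tokens spanning a window boundary are ignored, and all function and parameter names are hypothetical.

```python
from collections import defaultdict


def build_regional_index(reference, window, k):
    """Map each k-mer to the set of window ids in which it occurs."""
    index = defaultdict(set)
    for start in range(0, len(reference), window):
        region = reference[start:start + window]
        # Only k-mers fully inside the window are indexed (a simplification).
        for i in range(len(region) - k + 1):
            index[region[i:i + k]].add(start // window)
    return index


def candidate_windows(read, index, k):
    """Rank windows by the number of read k-mers that occur in them."""
    hits = defaultdict(int)
    for i in range(len(read) - k + 1):
        for window_id in index.get(read[i:i + k], ()):
            hits[window_id] += 1
    # Most hits first; ties broken by window id for a stable order.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))
```

In a seed-and-extension design, only the top-ranked windows would then be passed to the more expensive extension step.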
15.
Bioinformatics ; 31(14): 2262-8, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25788626

ABSTRACT

MOTIVATION: Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to advances in high-throughput sequencing technologies, family genome sequencing is becoming more and more prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize family genome variants and their functions with integrated pedigree information remains a critical challenge. RESULTS: We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can effectively visualize family genomes at both the individual level and the variant level by integrating genome data with pedigree information. Family genome analysis, including determination of the parental origin of variants, detection of de novo mutations, and identification of potential recombination events and identical-by-descent segments, can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibrium, genes, variant effects, and potential phenotypes, are illustrated as well. Moreover, the FGB can automatically search for de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. AVAILABILITY AND IMPLEMENTATION: The FGB is available at http://mlg.hit.edu.cn/FGB/.


Subjects
Genome, Human; Pedigree; Software; Computer Graphics; Genetic Variation; Genomics; High-Throughput Nucleotide Sequencing; Humans; Molecular Sequence Annotation
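One of the analyses mentioned above, flagging de novo mutation candidates, reduces in its simplest form to checking whether a child carries an allele that neither parent carries. The sketch below assumes a hypothetical genotype representation (tuples of allele strings) and is not the FGB's own procedure; a real pipeline would also account for genotype quality and sequencing error.

```python
def is_de_novo_candidate(child, father, mother):
    """True if the child has an allele absent from both parents.

    Genotypes are assumed to be tuples of alleles, e.g. ("A", "G").
    """
    parental_alleles = set(father) | set(mother)
    return any(allele not in parental_alleles for allele in child)
```

This check is deliberately loose: it does not verify full Mendelian consistency, only the presence of a novel allele.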
16.
Nucleic Acids Res ; 42(Web Server issue): W192-7, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24799434

ABSTRACT

Advances in high-throughput sequencing technologies have brought us into the individual genome era. Projects such as the 1000 Genomes Project have made individual genome sequencing more and more popular. How to visualize, analyse and annotate individual genomes with knowledge bases to support genome studies and personalized healthcare is still a big challenge. The Personal Genome Browser (PGB) was developed to provide comprehensive functional annotation and visualization for individual genomes based on the genetic-molecular-phenotypic model. Investigators can easily view individual genetic variants, such as single nucleotide variants (SNVs), INDELs and structural variations (SVs), as well as genomic features and phenotypes associated with the individual genetic variants. The PGB especially highlights potential functional variants using the PGB built-in method or SIFT/PolyPhen2 scores. Moreover, the functional risks of genes can be evaluated by scanning individual genetic variants on the whole genome, a chromosome, or a cytoband based on functional implications of the variants. Investigators can then navigate to high-risk genes on the scanned individual genome. The PGB accepts Variant Call Format (VCF) and Genome Variation Format (GVF) files as input. The functional annotation of input individual genome variants can be visualized in real time by well-defined symbols and shapes. The PGB is available at http://www.pgbrowser.org/.


Subjects
Genetic Variation; Genome, Human; Software; Computer Graphics; Genomics; Humans; Internet
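Since the PGB takes VCF files as input, a minimal sketch of reading one VCF data line may help show what such input contains. It relies only on the first five fixed, tab-separated VCF columns (CHROM, POS, ID, REF, ALT); the function names are hypothetical, and the SNV check is a rough heuristic rather than how the PGB classifies variants.

```python
def parse_vcf_record(line):
    """Parse the first five fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),  # multiple ALT alleles are comma-separated
    }


def variant_kind(record):
    """Rough heuristic: single-base REF and ALT alleles are treated as SNVs."""
    if len(record["ref"]) == 1 and all(len(a) == 1 for a in record["alt"]):
        return "SNV"
    return "INDEL_OR_OTHER"
```

Header lines (those starting with `#`) are assumed to have been skipped before calling `parse_vcf_record`.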
17.
BMC Med Genomics ; 6 Suppl 1: S5, 2013.
Article in English | MEDLINE | ID: mdl-23369456

ABSTRACT

BACKGROUND: Bidirectional promoters are promoter sequences shared between the genes of a divergent gene pair (genes proximal to each other on opposite strands), and can regulate the genes in both directions. In the human genome, > 10% of protein-coding genes are arranged head-to-head on opposite strands, with transcription start sites that are separated by < 1,000 base pairs. Many transcription factor binding sites occur in the bidirectional promoters that influence the expression of 2 opposite genes. Recently, RNA polymerase II (RPol II) ChIP-seq data have been used to identify the promoters of coding genes and non-coding RNAs. However, bidirectional promoters have not yet been identified with RPol II ChIP-seq data. RESULTS: In some bidirectional promoter regions, the RPol II forms a bi-peak shape, which indicates that 2 promoters are located in the bidirectional region. We have developed a computational approach to identify the regulatory regions of all divergent gene pairs using genome-wide RPol II binding patterns derived from ChIP-seq data, based upon the assumption that the distribution of RPol II binding patterns around the bidirectional promoters is accumulated by RPol II binding of 2 promoters. In HeLa S3 cells, 249 promoter pairs and 1094 single promoters were identified, of which 76 promoters cover only positive genes, 86 promoters cover only negative genes, and 932 promoters cover 2 genes. Gene expression levels and STAT1 binding sites for different promoter categories were therefore examined. CONCLUSIONS: The regulatory region of bidirectional promoter identification based upon RPol II binding patterns provides important temporal and spatial measurements regarding the initiation of transcription. From gene expression and transcription factor binding site analysis, the promoters in bidirectional regions may regulate the closest gene, and STAT1 is involved in primary promoter.


Subjects
RNA Polymerase II/metabolism; Regulatory Sequences, Nucleic Acid/genetics; Binding Sites; Databases, Genetic; Female; Gene Expression; HeLa Cells; Humans; Promoter Regions, Genetic; Protein Binding; RNA Polymerase II/genetics; ROC Curve; STAT1 Transcription Factor/metabolism; Uterine Cervical Neoplasms/genetics; Uterine Cervical Neoplasms/metabolism
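The definition of a divergent gene pair given in this abstract (head-to-head genes on opposite strands with transcription start sites less than 1,000 base pairs apart) can be expressed directly in code. The sketch below assumes a hypothetical input layout of `(name, strand, tss)` tuples for genes on a single chromosome; it only finds candidate pairs from annotation and does not model the RPol II binding patterns the authors analyze.

```python
def divergent_pairs(genes, max_distance=1000):
    """Find head-to-head gene pairs on one chromosome.

    `genes` is assumed to be a list of (name, strand, tss) tuples.
    Returns (minus_gene, plus_gene, distance) tuples.
    """
    minus = [g for g in genes if g[1] == "-"]
    plus = [g for g in genes if g[1] == "+"]
    pairs = []
    for minus_name, _, minus_tss in minus:
        for plus_name, _, plus_tss in plus:
            # Head-to-head: the minus-strand TSS lies upstream (to the left)
            # of the plus-strand TSS, so the genes are transcribed away
            # from each other.
            distance = plus_tss - minus_tss
            if 0 < distance < max_distance:
                pairs.append((minus_name, plus_name, distance))
    return pairs
```

The quadratic loop is fine for illustration; genome-scale annotation would call for sorting by position first.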
18.
Comp Funct Genomics ; 2012: 376706, 2012.
Article in English | MEDLINE | ID: mdl-22956892

ABSTRACT

A number of empirical Bayes models (each with different statistical distribution assumptions) have now been developed to analyze differential DNA methylation using high-density oligonucleotide tiling arrays. However, it remains unclear which model performs best. For example, for analysis of differentially methylated regions for conservative and functional sequence characteristics (e.g., enrichment of transcription factor-binding sites (TFBSs)), the sensitivity of such analyses, using various empirical Bayes models, remains unclear. In this paper, five empirical Bayes models were constructed, based on either a gamma distribution or a log-normal distribution, for the identification of differential methylated loci and their cell division-(1, 3, and 5) and drug-treatment-(cisplatin) dependent methylation patterns. While differential methylation patterns generated by log-normal models were enriched with numerous TFBSs, we observed almost no TFBS-enriched sequences using gamma assumption models. Statistical and biological results suggest log-normal, rather than gamma, empirical Bayes model distribution to be a highly accurate and precise method for differential methylation microarray analysis. In addition, we presented one of the log-normal models for differential methylation analysis and tested its reproducibility by simulation study. We believe this research to be the first extensive comparison of statistical modeling for the analysis of differential DNA methylation, an important biological phenomenon that precisely regulates gene transcription.

19.
Bioinformatics ; 28(14): 1879-86, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22611130

ABSTRACT

MOTIVATION: One of the fundamental questions in genetics is to identify functional DNA variants that are responsible for a disease or phenotype of interest. Results from large-scale genetics studies, such as genome-wide association studies (GWAS), and the availability of high-throughput sequencing technologies provide opportunities to identify causal variants. Despite the technical advances, informatics methodologies need to be developed to prioritize thousands of variants for potential causative effects. RESULTS: We present regSNPs, an informatics strategy that integrates several established bioinformatics tools for prioritizing regulatory SNPs, i.e. the SNPs in promoter regions that potentially affect phenotype by changing transcription of downstream genes. Compared to existing tools, regSNPs has two distinct features. It considers the degenerate features of binding motifs by calculating the differences in binding affinity caused by the candidate variants, and it integrates potential phenotypic effects of various transcription factors. When tested using the disease-causing variants documented in the Human Gene Mutation Database, regSNPs showed mixed performance on various diseases. regSNPs predicted three SNPs that can potentially affect bone density in a region detected in an earlier linkage study. Potential effects of one of the variants were validated using a luciferase reporter assay.


Subjects
Computational Biology/methods; Polymorphism, Single Nucleotide; Promoter Regions, Genetic; Transcription Factors/genetics; Area Under Curve; Binding Sites; Databases, Genetic; Genetic Linkage; Genome, Human; Genome-Wide Association Study; HEK293 Cells; Humans; Phenotype; ROC Curve
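The notion of "differences in binding affinity caused by the candidate variants" is commonly approximated by scoring the reference and alternate sequences against a position weight matrix (PWM) and taking the difference. The sketch below is one generic way to do that, assuming a PWM given as per-position base probabilities and a uniform background; it is not regSNPs' actual scoring scheme, and the function names are hypothetical.

```python
import math


def pwm_score(seq, pwm, background=0.25):
    """Log-odds score of a sequence against a PWM.

    `pwm` is assumed to be a list of dicts mapping base -> probability,
    one dict per motif position, with no zero probabilities.
    """
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))


def allele_score_difference(ref_seq, alt_seq, pwm):
    """Score change when the reference site is replaced by the alternate."""
    return pwm_score(alt_seq, pwm) - pwm_score(ref_seq, pwm)
```

A strongly negative difference suggests the variant weakens the predicted binding site, while a positive one suggests it strengthens or creates one.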
20.
PLoS One ; 7(3): e32928, 2012.
Article in English | MEDLINE | ID: mdl-22412954

ABSTRACT

It is now established that, as compared to normal cells, the cancer cell genome has an overall inverse distribution of DNA methylation ("methylome"), i.e., predominant hypomethylation and localized hypermethylation, within "CpG islands" (CGIs). Moreover, although cancer cells have reduced methylation "fidelity" and genomic instability, accurate maintenance of aberrant methylomes that underlie malignant phenotypes remains necessary. However, the mechanism(s) of cancer methylome maintenance remains largely unknown. Here, we assessed CGI methylation patterns propagated over 1, 3, and 5 divisions of A2780 ovarian cancer cells, concurrent with exposure to the DNA cross-linking chemotherapeutic cisplatin, and observed cell generation-successive increases in total hyper- and hypo-methylated CGIs. Empirical Bayesian modeling revealed five distinct modes of methylation propagation: (1) heritable (i.e., unchanged) high-methylation (1186 probe loci in CGI microarray); (2) heritable (i.e., unchanged) low-methylation (286 loci); (3) stochastic hypermethylation (i.e., progressively increased, 243 loci); (4) stochastic hypomethylation (i.e., progressively decreased, 247 loci); and (5) considerable "random" methylation (582 loci). These results support a "stochastic model" of DNA methylation equilibrium deriving from the efficiency of two distinct processes, methylation maintenance and de novo methylation. A role for cis-regulatory elements in methylation fidelity was also demonstrated by highly significant (p < 2.2×10^-5) enrichment of transcription factor binding sites in CGI probe loci showing heritably high (118 elements) and low (47 elements) methylation, and also in loci demonstrating stochastic hyper- (30 elements) and hypo- (31 elements) methylation. Notably, loci having "random" methylation heritability displayed nearly no enrichment. These results demonstrate an influence of cis-regulatory elements on the nonrandom propagation of both strictly heritable and stochastically heritable CGIs.


Subjects
DNA Methylation; Regulatory Sequences, Nucleic Acid; Antineoplastic Agents/pharmacology; Binding Sites; Cell Cycle; Cell Line, Tumor; Cisplatin/pharmacology; Cluster Analysis; CpG Islands; DNA Methylation/drug effects; Gene Expression Profiling; Genome, Human; Humans; Neoplasms/genetics; Transcription Factors/genetics; Transcription Factors/metabolism