ABSTRACT
It has been widely accepted that 5-methylcytosine is the only form of DNA methylation in mammalian genomes. Here we identify N(6)-methyladenine as another form of DNA modification in mouse embryonic stem cells. Alkbh1 encodes a demethylase for N(6)-methyladenine. An increase of N(6)-methyladenine levels in Alkbh1-deficient cells leads to transcriptional silencing. N(6)-methyladenine deposition is inversely correlated with the evolutionary age of LINE-1 transposons; its deposition is strongly enriched at young (<1.5 million years old) but not old (>6 million years old) L1 elements. The deposition of N(6)-methyladenine correlates with epigenetic silencing of such LINE-1 transposons, together with their neighbouring enhancers and genes, thereby resisting the gene activation signals during embryonic stem cell differentiation. As young full-length LINE-1 transposons are strongly enriched on the X chromosome, genes located on the X chromosome are also silenced. Thus, N(6)-methyladenine developed a new role in epigenetic silencing in mammalian evolution distinct from its role in gene activation in other organisms. Our results demonstrate that N(6)-methyladenine constitutes a crucial component of the epigenetic regulation repertoire in mammalian genomes.
Subject(s)
Adenine/analogs & derivatives , DNA Methylation , Epigenesis, Genetic/genetics , Mouse Embryonic Stem Cells/metabolism , Adenine/metabolism , AlkB Homolog 1, Histone H2a Dioxygenase , Animals , Cell Differentiation/genetics , DNA Transposable Elements/genetics , DNA-(Apurinic or Apyrimidinic Site) Lyase/deficiency , DNA-(Apurinic or Apyrimidinic Site) Lyase/genetics , DNA-(Apurinic or Apyrimidinic Site) Lyase/metabolism , Enhancer Elements, Genetic/genetics , Evolution, Molecular , Gene Silencing , Long Interspersed Nucleotide Elements/genetics , Mammals/genetics , Mice , Mouse Embryonic Stem Cells/cytology , Up-Regulation/genetics , X Chromosome/genetics , X Chromosome/metabolism
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pgen.1005954.].
ABSTRACT
We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
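For reference, the N50 statistic quoted above is the length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembly. A minimal sketch of that calculation follows; the function name and the example lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly size is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: total = 32, so the two 8-unit contigs already cover half.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why the abstract also cites optical and genetic mapping as supporting evidence.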
Subject(s)
Bass/genetics , Chromosome Mapping , Animals , Bass/classification , Genome , In Situ Hybridization, Fluorescence , Phylogeny
ABSTRACT
Despite the recent rapid growth in genome-wide data, much of human variation remains entirely unexplained. A significant challenge in the pursuit of the genetic basis for variation in common human traits is the efficient, coordinated collection of genotype and phenotype data. We have developed a novel research framework that facilitates the parallel study of a wide assortment of traits within a single cohort. The approach takes advantage of the interactivity of the Web both to gather data and to present genetic information to research participants, while taking care to correct for the population structure inherent to this study design. Here we report initial results from a participant-driven study of 22 traits. Replications of associations (in the genes OCA2, HERC2, SLC45A2, SLC24A4, IRF4, TYR, TYRP1, ASIP, and MC1R) for hair color, eye color, and freckling validate the Web-based, self-reporting paradigm. The identification of novel associations for hair morphology (rs17646946, near TCHH; rs7349332, near WNT10A; and rs1556547, near OFCC1), freckling (rs2153271, in BNC2), the ability to smell the methanethiol produced after eating asparagus (rs4481887, near OR2M7), and photic sneeze reflex (rs10427255, near ZEB2, and rs11856995, near NR2F2) illustrates the power of the approach.
Subject(s)
Genetic Variation , Genome-Wide Association Study/methods , Chromosomes, Human , Genomics , Genotype , Hair , Humans , Internet , Models, Genetic , Phenotype
ABSTRACT
Signal peptides are N-terminal sequences that mediate the targeting and translocation of secreted or cell-surface proteins to the endoplasmic reticulum (ER) membrane. Because of the variability among signal peptides, traditional methods for predicting the effects of an amino acid substitution based on sequence conservation methods may be limited in their use. To address this, we present a scoring function that assesses the effects of an amino acid change within the signal peptide by using data from SignalP, a signal peptide prediction algorithm. Our score incorporates the maximum alterations of the C- and S-scores from SignalP between original and changed versions of the signal peptide. We demonstrate that this metric can discriminate disease-associated mutations from single nucleotide polymorphisms (SNPs) in signal peptides. We further show that polymorphisms with low minor allele frequency (MAF) are more likely to affect the function of the signal peptide. In conjunction with Sorting Intolerant From Tolerant (SIFT), a conservation-based amino acid substitution prediction method, our approach classifies such changes to signal peptides more accurately than other known alternatives, including D-score-based methods. We also examine experimentally characterized mutations and find that our metric minimizes false positives and can predict whether the mutation will affect cleavage or translocation. Finally, we apply our approach to a set of recently produced large-scale cancer somatic mutations from colon and breast cancers and generate a prioritized list of mutations in signal peptides that might impair protein function.
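The abstract does not give the exact scoring formula, so the following is only a sketch of one way the "maximum alterations of the C- and S-scores" could be computed. It assumes per-position C- and S-score profiles have already been obtained from SignalP for the original and substituted sequences. The function names and the toy score values are hypothetical, and no SignalP API is called. How the two maxima are combined into a single score is not stated in the abstract, so they are returned separately.

```python
def max_score_shift(original, changed):
    """Largest absolute per-position difference between two score profiles."""
    if len(original) != len(changed):
        # A single amino acid substitution leaves the sequence length unchanged.
        raise ValueError("score profiles must have the same length")
    return max(abs(o - c) for o, c in zip(original, changed))


def signal_peptide_shifts(orig_c, changed_c, orig_s, changed_s):
    """Return (max C-score change, max S-score change) for one substitution."""
    return (
        max_score_shift(orig_c, changed_c),
        max_score_shift(orig_s, changed_s),
    )


# Toy profiles (made-up numbers, not SignalP output).
orig_c = [0.125, 0.25, 0.75]
changed_c = [0.125, 0.25, 0.25]
orig_s = [1.0, 0.75, 0.5]
changed_s = [0.75, 0.75, 0.5]
print(signal_peptide_shifts(orig_c, changed_c, orig_s, changed_s))
```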
Subject(s)
Amino Acid Substitution , Computational Biology/methods , Protein Sorting Signals/genetics , Algorithms , Animals , Databases, Protein , Humans
ABSTRACT
GeneHub-GEPIS is a web application that performs digital expression analysis in human and mouse tissues based on an integrated gene database. Using aggregated expressed sequence tag (EST) library information and EST counts, the application calculates the normalized gene expression levels across a large panel of normal and tumor tissues, thus providing rapid expression profiling for a given gene. The backend GeneHub component of the application contains pre-defined gene structures derived from mRNA transcript sequences from major databases and includes extensive cross references for commonly used gene identifiers. ESTs are then linked to genes based on their precise genomic locations as determined by GMAP. This genome-based approach reduces incorrect matches between ESTs and genes, thus minimizing the noise seen with previous tools. In addition, the gene-centric design makes it possible to add several important features, including text searching capabilities, the ability to accept diverse input values, expression analysis for microRNAs, basic gene annotation, batch analysis and linking between mouse and human genes. GeneHub-GEPIS is available at http://www.cgl.ucsf.edu/Research/genentech/genehub-gepis/ or http://www.gepis.org/.
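The abstract does not state the normalization formula. As an illustration only, the sketch below assumes a simple counts-per-million scaling: the number of ESTs assigned to a gene in a tissue divided by the total number of ESTs sequenced from that tissue. The function name, the scale factor, and the example counts are all hypothetical.

```python
def normalized_expression(gene_est_counts, library_sizes, scale=1_000_000):
    """Scale per-tissue EST counts for one gene by the tissue's total EST count.

    gene_est_counts: dict mapping tissue -> ESTs assigned to the gene
    library_sizes:   dict mapping tissue -> total ESTs pooled for that tissue
    scale:           assumed reporting unit (ESTs per million)
    """
    levels = {}
    for tissue, total in library_sizes.items():
        if total <= 0:
            raise ValueError(f"library size for {tissue!r} must be positive")
        # Tissues with no EST for the gene get an expression level of 0.
        levels[tissue] = gene_est_counts.get(tissue, 0) * scale / total
    return levels


# Toy example with made-up counts.
counts = {"liver": 5, "lung": 0}
sizes = {"liver": 100_000, "lung": 50_000, "brain": 200_000}
print(normalized_expression(counts, sizes))
```

Dividing by library size is what makes tissues with very different sequencing depths comparable.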
Subject(s)
Algorithms , Chromosome Mapping/methods , Gene Expression Profiling/methods , Neoplasms/genetics , Sequence Analysis, DNA/methods , Software , User-Computer Interface , Biomarkers, Tumor/genetics , Expressed Sequence Tags , Genetic Testing/methods , Humans , Internet , Neoplasms/diagnosis , Online Systems , Sequence Alignment/methods
ABSTRACT
A sequence similarity metric operating on 10 kb upstream regions of gene pairs quantitatively predicts a portion of co-variation of expression of gene pairs in large-scale gene expression studies in human tumors and tumor-derived cell lines. The signal on which the metric depends most strongly originates in the compositional structure of repetitive genomic sequences (particularly Alu elements) present in these upstream regions. This effect is completely separable from effects of isochore composition on gene expression. The results implicate repetitive elements with some functional role in transcriptional regulation of the specific genes in whose promoter regions they reside and lend credence to suggestions that the general phenomenon of repetitive element insertions may be a fundamental evolutionary mechanism for modulating gene transcription.
Subject(s)
DNA/chemistry , Gene Expression Regulation , Repetitive Sequences, Amino Acid , Base Composition , DNA/genetics , Humans , Oligonucleotide Array Sequence Analysis , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Tumor Cells, Cultured
ABSTRACT
BACKGROUND: DNA copy number alterations are frequently observed in ovarian cancer, but it remains a challenge to identify the most relevant alterations and the specific causal genes in those regions. METHODS: We obtained high-resolution 500K SNP array data for 52 ovarian tumors and identified the most statistically significant minimal genomic regions with the most prevalent and highest-level copy number alterations (recurrent CNAs). Within a region of recurrent CNA, comparison of expression levels in tumors with a given CNA to tumors lacking that CNA and to whole normal ovary samples was used to select genes with CNA-specific expression patterns. A public expression array data set of laser capture micro-dissected (LCM) non-malignant fallopian tube epithelia and LCM ovarian serous adenocarcinoma was used to evaluate the effect of cell-type mixture biases. RESULTS: Fourteen recurrent deletions were detected on chromosomes 4, 6, 9, 12, 13, 15, 16, 17, 18, 22 and most prevalently on X and 8. Copy number and expression data suggest several apoptosis mediators as candidate drivers of the 8p deletions. Sixteen recurrent gains were identified on chromosomes 1, 2, 3, 5, 8, 10, 12, 15, 17, 19, and 20, with the most prevalent gains localized to 8q and 3q. Within the 8q amplicon, PVT1, but not MYC, was strongly over-expressed relative to tumors lacking this CNA and showed over-expression relative to normal ovary. Likewise, the cell polarity regulators PRKCI and ECT2 were identified as putative drivers of two distinct amplicons on 3q. Co-occurrence analyses suggested potential synergistic or antagonistic relationships between recurrent CNAs. Genes within regions of recurrent CNA showed an enrichment of Cancer Census genes, particularly when filtered for CNA-specific expression. 
CONCLUSION: These analyses provide detailed views of ovarian cancer genomic changes and highlight the benefits of using multiple reference sample types for the evaluation of CNA-specific expression changes.
ABSTRACT
BACKGROUND: MicroRNAs (miRNAs) are small noncoding RNAs that bind mRNA target transcripts and repress gene expression. They have been implicated in multiple diseases, such as cancer, but the mechanisms of this involvement are not well understood. Given the complexity and degree of interactions between miRNAs and target genes, understanding how miRNAs achieve their specificity is important to understanding miRNA function and identifying their role in disease. RESULTS: Here we report factors that influence miRNA regulation by considering the effects of both single and multiple miRNAs targeting human genes. In the case of single miRNA targeting, we developed a metric that integrates miRNA and mRNA expression data to calculate how changes in miRNA expression affect target mRNA expression. Using the metric, our global analysis shows that the repression of a given miRNA on a target mRNA is modulated by 3' untranslated region length, the number of target sites, and the distance between a pair of binding sites. Additionally, we show that some miRNAs preferentially repress transcripts with longer CTG repeats, suggesting a possible role for miRNAs in repeat expansion disorders such as myotonic dystrophy. We also examine the large class of genes targeted by multiple miRNAs and show that specific types of genes are progressively more enriched as the number of targeting miRNAs increases. Expression microarray data further show that these highly targeted genes are downregulated relative to genes targeted by few miRNAs, which suggests that highly targeted genes are tightly regulated and that their dysregulation may lead to disease. In support of this idea, cancer genes are strongly enriched among highly targeted genes.
CONCLUSION: Our data show that the rules governing miRNA targeting are complex, but that understanding the mechanisms that drive such control can uncover miRNAs' role in disease. Our study suggests that the number and arrangement of miRNA recognition sites can influence the degree and specificity of miRNA-mediated gene repression.
Subject(s)
Down-Regulation , Gene Expression Regulation , MicroRNAs/metabolism , Animals , Base Sequence , Binding Sites , Dogs , Humans , Mice , Molecular Sequence Data , Rats
ABSTRACT
MOTIVATION: We present a novel algorithm, MaMF, for identifying transcription factor (TF) binding site motifs. The method is deterministic and depends on an indexing technique to optimize the search process. On common yeast datasets, MaMF performs competitively with other methods. We also present results on a challenging group of eight sets of human genes known to be responsive to a diverse group of TFs. In every case, MaMF finds the annotated motif among the top scoring putative motifs. We compared MaMF against other motif finders on a larger human group of 21 gene sets and found that MaMF performs better than other algorithms. We analyzed the remaining high scoring motifs and show that many correspond to other TFs that are known to co-occur with the annotated TF motifs. The significant and frequent presence of co-occurring transcription factor binding sites explains in part the difficulty of human motif finding. MaMF is a very fast algorithm, suitable for application to large numbers of interesting gene sets.