Results 1 - 14 of 14
1.
Cell ; 138(2): 314-27, 2009 Jul 23.
Article in English | MEDLINE | ID: mdl-19632181

ABSTRACT

Differences in expression, protein interactions, and DNA binding of paralogous transcription factors ("TF parameters") are thought to be important determinants of regulatory and biological specificity. However, both the extent of TF divergence and the relative contribution of individual TF parameters remain undetermined. We comprehensively identify dimerization partners, spatiotemporal expression patterns, and DNA-binding specificities for the C. elegans bHLH family of TFs, and model these data into an integrated network. This network displays both specificity and promiscuity, as some bHLH proteins, DNA sequences, and tissues are highly connected, whereas others are not. By comparing all bHLH TFs, we find extensive divergence and that all three parameters contribute equally to bHLH divergence. Our approach provides a framework for examining divergence for other protein families in C. elegans and in other complex multicellular organisms, including humans. Cross-species comparisons of integrated networks may provide further insights into molecular features underlying protein family evolution. For a video summary of this article, see the PaperFlick file available with the online Supplemental Data.


Subject(s)
Basic Helix-Loop-Helix Transcription Factors/metabolism , Caenorhabditis elegans Proteins/metabolism , Caenorhabditis elegans/metabolism , Animals , Animals, Genetically Modified , Basic Helix-Loop-Helix Transcription Factors/genetics , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics , DNA/metabolism , Gene Regulatory Networks , Male , Molecular Sequence Data , Promoter Regions, Genetic , Protein Multimerization
2.
Cell ; 133(7): 1266-76, 2008 Jun 27.
Article in English | MEDLINE | ID: mdl-18585359

ABSTRACT

Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.


Subject(s)
DNA/chemistry , Homeodomain Proteins/chemistry , Animals , Base Sequence , Computational Biology , Conserved Sequence , DNA/metabolism , Evolution, Molecular , Homeodomain Proteins/metabolism , Mice , Models, Molecular , Protein Binding , Transcription Factors/chemistry , Transcription Factors/metabolism
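
The phrase "all possible 8-base sequences" in the abstract above can be made concrete with a short enumeration. The sketch below is illustrative only and is not taken from the paper: it assumes that, on double-stranded DNA, an 8-mer and its reverse complement count as the same binding site, and the function names are hypothetical.

```python
from itertools import product

# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> set[str]:
    """Return one representative per k-mer / reverse-complement pair.

    The representative is the lexicographically smaller of the two strands
    (an arbitrary but common convention, assumed here).
    """
    canon = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        canon.add(min(kmer, reverse_complement(kmer)))
    return canon


total_8mers = 4 ** 8                      # every ordered 8-base string
distinct_8mers = len(canonical_kmers(8))  # after merging reverse complements
print(total_8mers, distinct_8mers)
```

Merging strands roughly halves the count; it is not exactly half because palindromic 8-mers are their own reverse complement.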
3.
Genome Res ; 25(10): 1570-80, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26286554

ABSTRACT

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.


Subject(s)
Genetic Variation , Genome, Human , Sequence Analysis, DNA/methods , Algorithms , Carcinoma, Ductal/genetics , Carcinoma, Ductal, Breast/genetics , DNA Fragmentation , Humans , Sequence Alignment/methods
4.
Genome Res ; 23(7): 1097-108, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23568837

ABSTRACT

Cancer evolution involves cycles of genomic damage, epigenetic deregulation, and increased cellular proliferation that eventually culminate in the carcinoma phenotype. Early neoplasias, which are often found concurrently with carcinomas and are histologically distinguishable from normal breast tissue, are less advanced in phenotype than carcinomas and are thought to represent precursor stages. To elucidate their role in cancer evolution we performed comparative whole-genome sequencing of early neoplasias, matched normal tissue, and carcinomas from six patients, for a total of 31 samples. By using somatic mutations as lineage markers we built trees that relate the tissue samples within each patient. On the basis of these lineage trees we inferred the order, timing, and rates of genomic events. In four out of six cases, an early neoplasia and the carcinoma share a mutated common ancestor with recurring aneuploidies, and in all six cases evolution accelerated in the carcinoma lineage. Transition spectra of somatic mutations are stable and consistent across cases, suggesting that accumulation of somatic mutations is a result of increased ancestral cell division rather than specific mutational mechanisms. In contrast to highly advanced tumors that are the focus of much of the current cancer genome sequencing, neither the early neoplasia genomes nor the carcinomas are enriched with potentially functional somatic point mutations. Aneuploidies that occur in common ancestors of neoplastic and tumor cells are the earliest events that affect a large number of genes and may predispose breast tissue to eventual development of invasive carcinoma.


Subject(s)
Breast Neoplasms/genetics , Cell Transformation, Neoplastic/genetics , Genome, Human , Mutation , Alleles , Aneuploidy , Breast Neoplasms/pathology , Carcinoma/genetics , Carcinoma/pathology , Disease Progression , Female , Gene Frequency , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide
5.
Nucleic Acids Res ; 40(1): e5, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22064853

ABSTRACT

Highly multiplex DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries in current SNP callers. We demonstrate that we can significantly reduce the number of false positive SNP calls by pooling information across samples. Although many studies prepare and sequence multiple samples with the same protocol, most existing SNP callers ignore cross-sample information. In contrast, we propose an empirical Bayes method that uses cross-sample information to learn the error properties of the data. This error information lets us call SNPs with a lower false discovery rate than existing methods.


Subject(s)
Models, Statistical , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Alleles , Genotyping Techniques , High-Throughput Nucleotide Sequencing
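
The idea of pooling information across samples to learn the error properties of the data can be illustrated with a toy calculation. The sketch below is a much-simplified stand-in, not the empirical Bayes method of the paper: it assumes a single site, estimates the error rate for each sample from the other samples (leave-one-out, with a pseudocount), and flags a sample when its non-reference read count is improbable under that rate. All names, counts, and thresholds are hypothetical.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_site(counts, alpha=0.01, pseudo=1.0):
    """Flag samples whose non-reference count exceeds the pooled error rate.

    counts: list of (non_reference_reads, total_reads), one pair per sample,
            all at the same genomic position.
    alpha:  tail-probability cutoff (an assumed, illustrative value).
    pseudo: pseudocount that keeps the error estimate away from 0 and 1.
    """
    calls = []
    for j, (alt_j, n_j) in enumerate(counts):
        # Learn the error rate from every sample except the one being tested.
        others = counts[:j] + counts[j + 1:]
        alt = sum(a for a, _ in others)
        tot = sum(n for _, n in others)
        err = (alt + pseudo) / (tot + 2 * pseudo)
        calls.append(binom_tail(alt_j, n_j, err) < alpha)
    return calls


# Hypothetical read counts for four samples at one position.
counts = [(1, 100), (0, 100), (48, 100), (2, 100)]
print(call_site(counts))
```

In this toy data only the third sample stands out from the background error seen in the others; a real caller would model error rates per position and per protocol rather than with one pooled proportion.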
6.
Nucleic Acids Res ; 40(Database issue): D1137-43, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22102592

ABSTRACT

Recent exponential growth in the throughput of next-generation DNA sequencing platforms has dramatically spurred the use of accessible and scalable targeted resequencing approaches. This includes candidate region diagnostic resequencing and novel variant validation from whole genome or exome sequencing analysis. We have previously demonstrated that selective genomic circularization is a robust in-solution approach for capturing and resequencing thousands of target human genome loci such as exons and regulatory sequences. To facilitate the design and production of customized capture assays for any given region in the human genome, we developed the Human OligoGenome Resource (http://oligogenome.stanford.edu/). This online database contains over 21 million capture oligonucleotide sequences. It enables one to create customized and highly multiplexed resequencing assays of target regions across the human genome and is not restricted to coding regions. In total, this resource provides 92.1% in silico coverage of the human genome. The online server allows researchers to download a complete repository of oligonucleotide probes and design customized capture assays to target multiple regions throughout the human genome. The website has query tools for selecting and evaluating capture oligonucleotides from specified genomic regions.


Subject(s)
Databases, Nucleic Acid , Genome, Human , Oligonucleotide Probes/chemistry , Sequence Analysis, DNA , Chromosome Mapping , Humans , Molecular Sequence Annotation , Oligonucleotide Probes/standards
7.
Nucleic Acids Res ; 37(Database issue): D77-82, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18842628

ABSTRACT

The UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation) database hosts data generated by universal protein binding microarray (PBM) technology on the in vitro DNA-binding specificities of proteins. This initial release of the UniPROBE database provides a centralized resource for accessing comprehensive PBM data on the preferences of proteins for all possible sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In total, the database hosts DNA-binding data for over 175 nonredundant proteins from a diverse collection of organisms, including the prokaryote Vibrio harveyi, the eukaryotic malarial parasite Plasmodium falciparum, the parasitic Apicomplexan Cryptosporidium parvum, the yeast Saccharomyces cerevisiae, the worm Caenorhabditis elegans, mouse and human. Current web tools include a text-based search, a function for assessing motif similarity between user-entered data and database PWMs, and a function for locating putative binding sites along user-entered nucleotide sequences. The UniPROBE database is available at http://thebrain.bwh.harvard.edu/uniprobe/.


Subject(s)
DNA-Binding Proteins/metabolism , Databases, Protein , Protein Array Analysis , Animals , Binding Sites , Humans , Internet , Mice , Sequence Analysis, DNA , Transcription Factors/metabolism , User-Computer Interface
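
Locating putative binding sites along a nucleotide sequence with a position weight matrix (PWM), as mentioned in the abstract above, is commonly done by sliding the matrix along the sequence and scoring each window. The sketch below shows that generic technique under stated assumptions; it is not the UniPROBE implementation. The 4-column matrix, the uniform background, and the threshold are all hypothetical, and the code expects uppercase A/C/G/T only.

```python
import math

# Hypothetical PWM: one dict of base probabilities per motif position.
# It is invented for illustration (consensus "AGCT"), not taken from UniPROBE.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds(site: str, pwm=PWM) -> float:
    """Sum of log2(P(base | position) / background) over the window."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, site))


def scan(sequence: str, pwm=PWM, threshold: float = 4.0):
    """Return (start, site, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        score = log_odds(site, pwm)
        if score >= threshold:
            hits.append((start, site, score))
    return hits


for start, site, score in scan("TTAGCTTT"):
    print(start, site, round(score, 2))
```

With these made-up probabilities a single mismatch already drops a window below the threshold, so only exact consensus matches are reported; a real scan would also score the reverse strand and choose the threshold from the score distribution.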
8.
Genome Med ; 7(1): 28, 2015.
Article in English | MEDLINE | ID: mdl-25918554

ABSTRACT

BACKGROUND: All cells in an individual are related to one another by a bifurcating lineage tree, in which each node is an ancestral cell that divided into two, each branch connects two nodes, and the root is the zygote. When a somatic mutation occurs in an ancestral cell, all its descendants carry the mutation, which can then serve as a lineage marker for the phylogenetic reconstruction of tumor progression. Using this concept, we investigate cell lineage relationships and genetic heterogeneity of pre-invasive neoplasias compared to invasive carcinomas. METHODS: We deeply sequenced over a thousand phylogenetically informative somatic variants in 66 morphologically independent samples from six patients that represent a spectrum of normal, early neoplasia, carcinoma in situ, and invasive carcinoma. For each patient, we obtained a highly resolved lineage tree that establishes the phylogenetic relationships among the pre-invasive lesions and with the invasive carcinoma. RESULTS: The trees reveal lineage heterogeneity of pre-invasive lesions, both within the same lesion, and between histologically similar ones. On the basis of the lineage trees, we identified a large number of independent recurrences of PIK3CA H1047 mutations in separate lesions in four of the six patients, often separate from the diagnostic carcinoma. CONCLUSIONS: Our analyses demonstrate that multi-sample phylogenetic inference provides insights on the origin of driver mutations, lineage heterogeneity of neoplastic proliferations, and the relationship of genomically aberrant neoplasias with the primary tumors. PIK3CA driver mutations may be comparatively benign inducers of cellular proliferation.

9.
J Comput Biol ; 20(11): 933-44, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24195709

ABSTRACT

Next-generation sequencing technologies provide a powerful tool for studying genome evolution during progression of advanced diseases such as cancer. Although many recent studies have employed new sequencing technologies to detect mutations across multiple, genetically related tumors, current methods do not exploit available phylogenetic information to improve the accuracy of their variant calls. Here, we present a novel algorithm that uses somatic single-nucleotide variations (SNVs) in multiple, related tissue samples as lineage markers for phylogenetic tree reconstruction. Our method then leverages the inferred phylogeny to improve the accuracy of SNV discovery. Experimental analyses demonstrate that our method achieves up to 32% improvement for somatic SNV calling of multiple, related samples over the accuracy of GATK's Unified Genotyper, the state-of-the-art multisample SNV caller.


Subject(s)
DNA Mutational Analysis , Neoplasms/genetics , Polymorphism, Single Nucleotide , Algorithms , Computer Simulation , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Mutation , Phylogeny
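
Using somatic SNVs as lineage markers rests on a simple consistency condition, which also hints at how a phylogeny can flag suspect variant calls. The sketch below is not the algorithm of the paper: it assumes each mutation arises once and is never lost, so the sets of samples carrying any two mutations must be nested or disjoint. Pairs that violate this cannot sit on one tree and are candidates for miscalls. Sample and variant names are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(carriers: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return pairs of mutations whose carrier sets cannot share one tree.

    carriers maps a mutation name to the set of samples in which it was called.
    Under the assumed model, two mutations are compatible when one carrier set
    contains the other (same lineage) or the sets do not overlap (separate
    lineages).
    """
    conflicts = []
    for (m1, s1), (m2, s2) in combinations(carriers.items(), 2):
        nested = s1 <= s2 or s2 <= s1
        disjoint = s1.isdisjoint(s2)
        if not (nested or disjoint):
            conflicts.append((m1, m2))
    return conflicts


# Hypothetical calls in three related tissue samples from one patient.
carriers = {
    "snv1": {"neoplasia", "carcinoma"},  # shared ancestor of both lesions
    "snv2": {"carcinoma"},               # private to the carcinoma
    "snv3": {"neoplasia"},               # private to the neoplasia
    "snv4": {"neoplasia", "normal"},     # overlaps snv1 without nesting
}
print(conflicting_pairs(carriers))
```

Here snv4 conflicts with snv1, so at least one of its calls is inconsistent with the tree implied by the other variants; a full method would weigh read evidence before deciding which call to revise.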
10.
PLoS One ; 6(6): e21088, 2011.
Article in English | MEDLINE | ID: mdl-21738606

ABSTRACT

We have developed an integrated strategy for targeted resequencing and analysis of gene subsets from the human exome for variants. Our capture technology is geared towards resequencing gene subsets substantially larger than can be done efficiently with simplex or multiplex PCR but smaller in scale than exome sequencing. We describe all the steps from the initial capture assay to single nucleotide variant (SNV) discovery. The capture methodology uses in-solution 80-mer oligonucleotides. To provide optimal flexibility in choosing human gene targets, we designed an in silico set of oligonucleotides, the Human OligoExome, that covers the gene exons annotated by the Consensus Coding Sequencing Project (CCDS). This resource is openly available as an Internet accessible database where one can download capture oligonucleotides sequences for any CCDS gene and design custom capture assays. Using this resource, we demonstrated the flexibility of this assay by custom designing capture assays ranging from 10 to over 100 gene targets with total capture sizes from over 100 Kilobases to nearly one Megabase. We established a method to reduce capture variability and incorporated indexing schemes to increase sample throughput. Our approach has multiple applications that include but are not limited to population targeted resequencing studies of specific gene subsets, validation of variants discovered in whole genome sequencing surveys and possible diagnostic analysis of disease gene subsets. We also present a cost analysis demonstrating its cost-effectiveness for large population studies.


Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Exons/genetics , Genome, Human/genetics , Humans , Multiplex Polymerase Chain Reaction , Polymerase Chain Reaction
11.
Science ; 324(5935): 1720-3, 2009 Jun 26.
Article in English | MEDLINE | ID: mdl-19443739

ABSTRACT

Sequence preferences of DNA binding proteins are a primary mechanism by which cells interpret the genome. Despite the central importance of these proteins in physiology, development, and evolution, comprehensive DNA binding specificities have been determined experimentally for only a few proteins. Here, we used microarrays containing all 10-base pair sequences to examine the binding specificities of 104 distinct mouse DNA binding proteins representing 22 structural classes. Our results reveal a complex landscape of binding, with virtually every protein analyzed possessing unique preferences. Roughly half of the proteins each recognized multiple distinctly different sequence motifs, challenging our molecular understanding of how proteins interact with their DNA binding sites. This complexity in DNA recognition may be important in gene regulation and in the evolution of transcriptional regulatory networks.


Subject(s)
DNA/metabolism , Transcription Factors/chemistry , Transcription Factors/metabolism , Amino Acid Motifs , Amino Acid Sequence , Animals , Base Sequence , Binding Sites , DNA/chemistry , Electrophoretic Mobility Shift Assay , Gene Expression Regulation , Gene Regulatory Networks , Humans , Mice , Protein Array Analysis , Protein Binding , Protein Structure, Tertiary , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/metabolism
12.
Genome Res ; 19(4): 556-66, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19158363

ABSTRACT

Transcription factors (TFs) regulate the expression of genes through sequence-specific interactions with DNA-binding sites. However, despite recent progress in identifying in vivo TF binding sites by microarray readout of chromatin immunoprecipitation (ChIP-chip), nearly half of all known yeast TFs are of unknown DNA-binding specificities, and many additional predicted TFs remain uncharacterized. To address these gaps in our knowledge of yeast TFs and their cis regulatory sequences, we have determined high-resolution binding profiles for 89 known and predicted yeast TFs, over more than 2.3 million gapped and ungapped 8-bp sequences ("k-mers"). We report 50 new or significantly different direct DNA-binding site motifs for yeast DNA-binding proteins and motifs for eight proteins for which only a consensus sequence was previously known; in total, this corresponds to over a 50% increase in the number of yeast DNA-binding proteins with experimentally determined DNA-binding specificities. Among other novel regulators, we discovered proteins that bind the PAC (Polymerase A and C) motif (GATGAG) and regulate ribosomal RNA (rRNA) transcription and processing, core cellular processes that are constituent to ribosome biogenesis. In contrast to earlier data types, these comprehensive k-mer binding data permit us to consider the regulatory potential of genomic sequence at the individual word level. These k-mer data allowed us to reannotate in vivo TF binding targets as direct or indirect and to examine TFs' potential effects on gene expression in approximately 1,700 environmental and cellular conditions. These approaches could be adapted to identify TFs and cis regulatory elements in higher eukaryotes.


Subject(s)
DNA, Fungal/metabolism , DNA-Binding Proteins/metabolism , Gene Expression Regulation, Fungal , Response Elements/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism , Binding Sites , Chromatin Immunoprecipitation , Computational Biology , DNA, Fungal/genetics , DNA-Binding Proteins/genetics , Gene Expression Profiling , Genome, Fungal , Oligonucleotide Array Sequence Analysis , Polymerase Chain Reaction , Promoter Regions, Genetic , Protein Binding , Regulatory Sequences, Nucleic Acid , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/genetics , Transcription Factors/genetics
13.
Genome Res ; 17(6): 732-45, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567993

ABSTRACT

For the approximately 1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of "unannotated transcription." We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that approximately 14% of the novel TARs can be associated with known genes, while approximately 21% can be clustered into approximately 200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.


Subject(s)
Chromosome Mapping , Gene Expression Profiling , Gene Expression Regulation/physiology , Genome, Human/physiology , Quantitative Trait Loci/genetics , Transcription, Genetic/physiology , Base Sequence , Humans , Molecular Sequence Data
14.
Ann Neurol ; 56(1): 86-96, 2004 Jul.
Article in English | MEDLINE | ID: mdl-15236405

ABSTRACT

Nemaline myopathy (NM) is the most common of several congenital myopathies that present with skeletal muscle weakness and hypotonia. It is clinically heterogeneous and the diagnosis is confirmed by identification of nemaline bodies in affected muscles. The skeletal muscle alpha-actin gene (ACTA1) is one of five genes for thin filament proteins identified so far as responsible for different forms of NM. We have screened the ACTA1 gene in a cohort of 109 unrelated patients with NM. Here, we describe clinical and pathological features associated with 29 ACTA1 mutations found in 38 individuals from 28 families. Although ACTA1 mutations cause a remarkably heterogeneous range of phenotypes, they were preferentially associated with severe clinical presentations (p < 0.0001). Most pathogenic ACTA1 mutations were missense changes with two instances of single base pair deletions. Most patients with ACTA1 mutations had no prior family history of neuromuscular disease (24/28). One severe case, caused by compound heterozygous recessive ACTA1 mutations, demonstrated increased alpha-cardiac actin expression, suggesting that cardiac actin might partially compensate for ACTA1 abnormalities in the fetal/neonatal period. This cohort also includes the first instance of an ACTA1 mutation manifesting with adult-onset disease and two pedigrees exhibiting potential incomplete penetrance. Overall, ACTA1 mutations are a common cause of NM, accounting for more than half of severe cases and 26% of all NM cases in this series.


Subject(s)
Actins/genetics , Muscle, Skeletal/physiology , Mutation , Myopathies, Nemaline/genetics , Actinin/metabolism , Actins/metabolism , Adolescent , Adult , Aged , Aged, 80 and over , Animals , Biopsy , Child , Child, Preschool , DNA Mutational Analysis , Female , Humans , Infant , Male , Middle Aged , Muscle, Skeletal/pathology , Myopathies, Nemaline/diagnosis , Myopathies, Nemaline/pathology , Protein Isoforms/genetics , Protein Isoforms/metabolism