ABSTRACT
Despite modern sequencing efforts, the difficulty in assembly of highly repetitive sequences has prevented resolution of human genome gaps, including some in the coding regions of genes with important biological functions. One such gene, MUC5AC, encodes a large, secreted mucin, which is one of the two major secreted mucins in human airways. The MUC5AC region contains a gap in the human genome reference (hg19) across the large, highly repetitive, and complex central exon. This exon is predicted to contain imperfect tandem repeat sequences and multiple conserved cysteine-rich (CysD) domains. To resolve the MUC5AC genomic gap, we used high-fidelity long PCR followed by single molecule real-time (SMRT) sequencing. This technology yielded long sequence reads and robust coverage that allowed for de novo sequence assembly spanning the entire repetitive region. Furthermore, we used SMRT sequencing of PCR amplicons covering the central exon to identify genetic variation in four individuals. The results demonstrated the presence of segmental duplications of CysD domains, insertions/deletions (indels) of tandem repeats, and single nucleotide variants. Additional studies demonstrated that one of the identified tandem repeat insertions is tagged by nonexonic single nucleotide polymorphisms. Taken together, these data illustrate the successful utility of SMRT sequencing long reads for de novo assembly of large repetitive sequences to fill the gaps in the human genome. Characterization of the MUC5AC gene and the sequence variation in the central exon will facilitate genetic and functional studies for this critical airway mucin.
Subject(s)
Exons/genetics , Human Genome/genetics , Mucin 5AC/genetics , Single Nucleotide Polymorphism/genetics , Nucleic Acid Repetitive Sequences/genetics , Humans , Linkage Disequilibrium/genetics , Mucins/genetics , DNA Sequence Analysis/methods
ABSTRACT
We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding approximately 18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.
Subject(s)
Base Pairing , Computational Biology/methods , Genetic Variation , Human Genome , Ligases , DNA Sequence Analysis/methods , Africa , Base Sequence , Genomics , Genotype , Heterozygote , Homozygote , Humans , Single Nucleotide Polymorphism , Reference Standards
ABSTRACT
We developed a massive-scale RNA sequencing protocol, short quantitative random RNA libraries or SQRL, to survey the complexity, dynamics and sequence content of transcriptomes in a near-complete fashion. This method generates directional, random-primed, linear cDNA libraries that are optimized for next-generation short-tag sequencing. We surveyed the poly(A)+ transcriptomes of undifferentiated mouse embryonic stem cells (ESCs) and embryoid bodies (EBs) at an unprecedented depth (10 Gb), using the Applied Biosystems SOLiD technology. These libraries capture the genomic landscape of expression, state-specific expression, single-nucleotide polymorphisms (SNPs), the transcriptional activity of repeat elements, and both known and new alternative splicing events. We investigated the impact of transcriptional complexity on current models of key signaling pathways controlling ESC pluripotency and differentiation, highlighting how SQRL can be used to characterize transcriptome content and dynamics in a quantitative and reproducible manner, and suggesting that our understanding of transcriptional complexity is far from complete.
Subject(s)
Embryonic Stem Cells/metabolism , Gene Expression Profiling/methods , Messenger RNA/genetics , RNA Sequence Analysis/methods , Animals , Cell Differentiation , Embryonic Stem Cells/cytology , Expressed Sequence Tags , Gene Expression Profiling/statistics & numerical data , Gene Library , Mice , Pluripotent Stem Cells/cytology , Pluripotent Stem Cells/metabolism , Single Nucleotide Polymorphism , Sensitivity and Specificity , Signal Transduction
ABSTRACT
Homeobox transcription factors of the vertebrate CRX/OTX family play critical roles in photoreceptor neurons, the rostral brain and circadian processes. In mouse, the three related proteins, CRX, OTX1, and OTX2, fulfill these functions. In Drosophila, the single founding member of this gene family, called orthodenticle (otd), is required during embryonic brain and photoreceptor neuron development. We have used global gene expression analysis in late pupal heads to better characterize the post-embryonic functions of Otd in Drosophila. We have identified 61 genes that are differentially expressed between wild type and a viable eye-specific otd mutant allele. Among them, about one-third represent potentially direct targets of Otd based on their association with evolutionarily conserved Otd-binding sequences. The spectrum of biological functions associated with these gene targets establishes Otd as a critical regulator of photoreceptor morphology and phototransduction, and suggests its involvement in circadian processes. Together with the well-documented role of otd in embryonic patterning, this evidence shows that vertebrate and fly genes contribute to analogous biological processes, notwithstanding the significant divergence of the underlying genetic pathways. Our findings underscore the common evolutionary history of photoperception-based functions in vertebrates and invertebrates and support the view that a complex nervous system was already present in the last common ancestor of all bilateria.
Subject(s)
Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila/genetics , Drosophila/metabolism , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Otx Transcription Factors/genetics , Otx Transcription Factors/metabolism , Trans-Activators/genetics , Trans-Activators/metabolism , Vertebrates/genetics , Vertebrates/metabolism , Animals , Genetically Modified Animals , Circadian Rhythm/genetics , Drosophila/growth & development , Molecular Evolution , Female , Gene Expression Profiling , Developmental Gene Expression Regulation , Insect Genes , Lac Operon , Phototransduction/genetics , Male , Mutation , Oligonucleotide Array Sequence Analysis , Invertebrate Photoreceptor Cells/growth & development , Invertebrate Photoreceptor Cells/metabolism , Vertebrate Photoreceptor Cells/metabolism , Species Specificity , Vertebrates/growth & development
ABSTRACT
The Decapentaplegic and Notch signaling pathways are thought to direct regional specification in the Drosophila eye-antennal epithelium by controlling the expression of selector genes for the eye (Eyeless/Pax6, Eyes absent) and/or antenna (Distal-less). Here, we investigate the function of these signaling pathways in this process. We find that organ primordia formation is indeed controlled at the level of Decapentaplegic expression but critical steps in regional specification occur earlier than previously proposed. Contrary to previous findings, Notch does not specify eye field identity by promoting Eyeless expression but it influences eye primordium formation through its control of proliferation. Our analysis of Notch function reveals an important connection between proliferation, field size, and regional specification. We propose that field size modulates the interaction between the Decapentaplegic and Wingless pathways, thereby linking proliferation and patterning in eye primordium development.
Subject(s)
Drosophila Proteins/physiology , Eye/embryology , Membrane Proteins/physiology , Genetic Promoter Regions/physiology , Signal Transduction/physiology , Animals , Body Patterning , Cell Differentiation , Drosophila/embryology , Drosophila Proteins/metabolism , Epithelium/embryology , Epithelium/physiology , Developmental Gene Expression Regulation , Head/anatomy & histology , Head/embryology , Homeodomain Proteins , Immunohistochemistry/methods , Insect Proteins/physiology , Membrane Proteins/metabolism , Confocal Microscopy , Mutation , Notch Receptors , Signal Transduction/genetics , Time Factors , Transcription Factors
ABSTRACT
Bisulfite sequencing is widely used for analysis of DNA methylation status (i.e., 5-methylcytosine [5mC] vs. cytosine [C]) in CpG-rich or other loci in genomic DNA (gDNA). Such methods typically involve reaction of gDNA with bisulfite followed by polymerase chain reaction (PCR) amplification of specific regions of interest, which, overall, converts C→T (thymine) and 5mC→C, and then capillary sequencing to measure C versus T composition at CpG sites. Massively parallel sequencing by oligonucleotide ligation and detection (SOLiD) has recently enabled relatively low-cost whole genome sequencing, and it would be highly desirable to apply such massively parallel sequencing to bisulfite-converted whole genomes to determine DNA methylation status of an entire genome, which has heretofore not been reported. As an initial step toward achieving this goal, we have extended our ongoing interest in improving bisulfite conversion sample preparation to include a human genome-wide fragment library for SOLiD. The current article features novel use of formamide denaturant during bisulfite conversion of a suitably constructed library directly in a band slice from polyacrylamide gel electrophoresis (PAGE). To validate this new protocol for 5mC-protected fragment library conversion, which we refer to as Bis-PAGE, capillary-based size analysis and Sanger sequencing were carried out for individual amplicons derived from single-molecule PCR (smPCR) of randomly selected library fragments. smPCR/capillary Sanger sequencing of approximately 200 amplicons unambiguously demonstrated greater than 99% C→T conversion. All of these approximately 200 Sanger sequences were analyzed with a previously published web-accessible bioinformatics tool (methBLAST) for mapping to human chromosomes, the results of which indicated random distribution of analyzed fragments across all chromosomes. Although these particular Bis-PAGE conversion and quality control methods were exemplified in the context of a fragment library for SOLiD, the concepts can be generalized to include other genome-wide library constructions intended for DNA methylation analysis by alternative high-throughput or massively parallelized methods that are currently available.
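The conversion figure reported above is a ratio of converted to informative cytosines. The abstract does not say how it was computed, so the following is only a minimal sketch under stated assumptions: the read is already aligned to its unconverted reference at equal length, every reference cytosine in the fragment is unmethylated (so any C that remains counts as a conversion failure), and the function name `conversion_rate` and the example sequences are hypothetical.

```python
def conversion_rate(reference: str, read: str) -> float:
    """Fraction of reference cytosines that appear as T in a converted read.

    Assumption (not from the source): the two strings are pre-aligned,
    gap-free, and all reference cytosines are unmethylated.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")

    informative = 0  # reference C positions where the read shows C or T
    converted = 0    # subset of those positions where the read shows T

    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base != "C":
            continue
        if read_base == "T":
            informative += 1
            converted += 1
        elif read_base == "C":
            informative += 1
        # Any other base (N, sequencing error) is skipped as uninformative.

    if informative == 0:
        raise ValueError("no informative cytosine positions")
    return converted / informative


# Hypothetical 10-base fragment with four cytosines, one left unconverted.
rate = conversion_rate("ACGTCCAGCT", "ATGTTCAGTT")
```

In real data, methylated cytosines also remain C, so a calculation like this is only meaningful at positions known to be unmethylated.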
Subject(s)
DNA/analysis , Polyacrylamide Gel Electrophoresis/methods , Formamides/chemistry , Genomic Library , DNA Sequence Analysis/methods , Sulfites/chemistry , Computational Biology , DNA/chemistry , Capillary Electrophoresis , Human Genome , Humans , Male , Nucleic Acid Denaturation , Polymerase Chain Reaction
ABSTRACT
Identifying genetic variants and mutations that underlie human diseases requires development of robust, cost-effective tools for routine resequencing of regions of interest in the human genome. Here, we demonstrate that coupling the Applied Biosystems SOLiD sequencing platform with microarray capture of targeted regions provides an efficient and robust method for high-coverage resequencing and polymorphism discovery in human protein-coding exons.
Subject(s)
Genetic Polymorphism , DNA Sequence Analysis/methods , Base Sequence , Biomedical Technology/methods , Exons , Genetic Variation , Human Genome , Heterozygote , Homozygote , Humans , Molecular Sequence Data , Mutation , Oligonucleotide Array Sequence Analysis
ABSTRACT
Forward genetic mutational studies, adaptive evolution, and phenotypic screening are powerful tools for creating new variant organisms with desirable traits. However, mutations generated in the process cannot be easily identified with traditional genetic tools. We show that new high-throughput, massively parallel sequencing technologies can completely and accurately characterize a mutant genome relative to a previously sequenced parental (reference) strain. We studied a mutant strain of Pichia stipitis, a yeast capable of converting xylose to ethanol. This unusually efficient mutant strain was developed through repeated rounds of chemical mutagenesis, strain selection, transformation, and genetic manipulation over a period of seven years. We resequenced this strain on three different sequencing platforms. Surprisingly, we found fewer than a dozen mutations in open reading frames. All three sequencing technologies were able to identify each single nucleotide mutation given at least 10-15-fold nominal sequence coverage. Our results show that detecting mutations in evolved and engineered organisms is rapid and cost-effective at the whole-genome level using new sequencing technologies. Identification of specific mutations in strains with altered phenotypes will add insight into specific gene functions and guide further metabolic engineering efforts.
Subject(s)
DNA Mutational Analysis/methods , Fungal Genome , Mutation , Pichia/genetics , Sequence Alignment , DNA Sequence Analysis
ABSTRACT
Several inconsistent associations between bipolar I disorder (BD1) and polymorphisms of the gene encoding the serotonin 2A receptor (HTR2A) have been published. We conducted the Transmission Disequilibrium Test (TDT) and case-control comparisons involving nine single nucleotide polymorphisms (SNPs) at the serotonin 2A receptor gene (four SNPs in HTR2A exons and five flanking SNPs). Comparison of BD1 cases (n = 93) with a group of unrelated population-based controls (n = 92) revealed associations with SNPs on exons 2 and 3 (516C/T and 1354C/T, respectively), consistent with haplotype-based differences. Analysis of the cases and their available parents using the TDT suggested significant linkage and associations with 1354C/T, as well as haplotypes bearing this SNP. Our results support an etiological role for HTR2A in BD1. In view of the relatively small sample, replicate studies using large samples are needed.
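For readers unfamiliar with the TDT, the standard biallelic form of the statistic compares how often heterozygous parents transmit an allele to an affected child versus how often they do not, and refers (b − c)² / (b + c) to a chi-square distribution with one degree of freedom. The sketch below illustrates that textbook form only; the abstract does not report transmission counts or the software used, so the function name `tdt` and the example counts are hypothetical.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) for the biallelic TDT.

    transmitted / untransmitted: number of times heterozygous parents
    did / did not pass the test allele to an affected offspring.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parental transmissions")

    chi2 = (transmitted - untransmitted) ** 2 / n
    # Upper-tail probability of a chi-square variable with 1 degree of
    # freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, not taken from the study: 30 transmissions of the
# test allele versus 15 non-transmissions.
chi2, p_value = tdt(30, 15)
```

With small numbers of informative transmissions, an exact binomial test is often preferred over this chi-square approximation.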