Results 1 - 10 of 10
1.
Nature ; 517(7536): 608-11, 2015 Jan 29.
Article in English | MEDLINE | ID: mdl-25383537

ABSTRACT

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.


Subjects
Genetic Variation/genetics; Genome, Human/genetics; Genomics; Sequence Analysis, DNA/methods; Chromosome Inversion/genetics; Chromosomes, Human, Pair 10/genetics; Cloning, Molecular; GC Rich Sequence/genetics; Haploidy; Humans; Mutagenesis, Insertional/genetics; Reference Standards; Tandem Repeat Sequences/genetics
2.
Nature ; 471(7339): 473-9, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21179090

ABSTRACT

Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.


Subjects
Drosophila melanogaster/growth & development; Drosophila melanogaster/genetics; Gene Expression Profiling; Gene Expression Regulation, Developmental/genetics; Transcription, Genetic/genetics; Alternative Splicing/genetics; Animals; Base Sequence; Drosophila Proteins/genetics; Drosophila melanogaster/embryology; Exons/genetics; Female; Genes, Insect/genetics; Genome, Insect/genetics; Male; MicroRNAs/genetics; Oligonucleotide Array Sequence Analysis; Protein Isoforms/genetics; RNA Editing/genetics; RNA, Messenger/analysis; RNA, Messenger/genetics; RNA, Small Untranslated/analysis; RNA, Small Untranslated/genetics; Sequence Analysis; Sex Characteristics
3.
Genome Res ; 21(2): 182-92, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21177961

ABSTRACT

Core promoters are critical regions for gene regulation in higher eukaryotes. However, the boundaries of promoter regions, the relative rates of initiation at the transcription start sites (TSSs) distributed within them, and the functional significance of promoter architecture remain poorly understood. We produced a high-resolution map of promoters active in the Drosophila melanogaster embryo by integrating data from three independent and complementary methods: 21 million cap analysis of gene expression (CAGE) tags, 1.2 million RNA ligase mediated rapid amplification of cDNA ends (RLM-RACE) reads, and 50,000 cap-trapped expressed sequence tags (ESTs). We defined 12,454 promoters of 8037 genes. Our analysis indicates that, due to non-promoter-associated RNA background signal, previous studies have likely overestimated the number of promoter-associated CAGE clusters by fivefold. We show that TSS distributions form a complex continuum of shapes, and that promoters active in the embryo and adult have highly similar shapes in 95% of cases. This suggests that these distributions are generally determined by static elements such as local DNA sequence and are not modulated by dynamic signals such as histone modifications. Transcription factor binding motifs are differentially enriched as a function of promoter shape, and peaked promoter shape is correlated with both temporal and spatial regulation of gene expression. Our results contribute to the emerging view that core promoters are functionally diverse and control patterning of gene expression in Drosophila and mammals.


Subjects
Computational Biology; Drosophila melanogaster/genetics; Genome, Insect/genetics; Promoter Regions, Genetic; 3' Untranslated Regions/genetics; Animals; Chromosome Mapping; Drosophila melanogaster/embryology; Expressed Sequence Tags; Gene Expression Profiling; Gene Expression Regulation/genetics; Genome-Wide Association Study; Transcription Initiation Site
4.
Genome Res ; 21(2): 301-14, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21177962

ABSTRACT

Drosophila melanogaster cell lines are important resources for cell biologists. Here, we catalog the expression of exons, genes, and unannotated transcriptional signals for 25 lines. Unannotated transcription is substantial (typically 19% of euchromatic signal). Conservatively, we identify 1405 novel transcribed regions; 684 of these appear to be new exons of neighboring, often distant, genes. Sixty-four percent of genes are expressed detectably in at least one line, but only 21% are detected in all lines. Each cell line expresses, on average, 5885 genes, including a common set of 3109. Expression levels vary over several orders of magnitude. Major signaling pathways are well represented: most differentiation pathways are "off" and survival/growth pathways "on." Roughly 50% of the genes expressed by each line are not part of the common set, and these show considerable individuality. Thirty-one percent are expressed at a higher level in at least one cell line than in any single developmental stage, suggesting that each line is enriched for genes characteristic of small sets of cells. Most remarkable is that imaginal disc-derived lines can generally be assigned, on the basis of expression, to small territories within developing discs. These mappings reveal unexpected stability of even fine-grained spatial determination. No two cell lines show identical transcription factor expression. We conclude that each line has retained features of an individual founder cell superimposed on a common "cell line" gene expression pattern.


Subjects
Drosophila melanogaster/genetics; Genetic Variation; Transcription, Genetic; Animals; Cell Line; Cluster Analysis; Exons; Female; Gene Expression Profiling; Male; Molecular Sequence Data; Signal Transduction/genetics; Transcription Factors/genetics
5.
Genome Res ; 20(7): 890-8, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20501695

ABSTRACT

Promoters are important regulatory elements that contain the necessary sequence features for cells to initiate transcription. To functionally characterize a large set of human promoters, we measured the transcriptional activities of 4575 putative promoters across eight cell lines using transient transfection reporter assays. In parallel, we measured gene expression in the same cell lines and observed a significant correlation between promoter activity and endogenous gene expression (r = 0.43). As transient transfection assays directly measure the promoting effect of a defined fragment of DNA sequence, decoupled from epigenetic, chromatin, or long-range regulatory effects, we sought to predict whether a promoter was active using sequence features alone. CG dinucleotide content was highly predictive of ubiquitous promoter activity, necessitating the separation of promoters into two groups: high CG promoters, mostly ubiquitously active, and low CG promoters, mostly cell line-specific. Computational models trained on the binding potential of transcriptional factor (TF) binding motifs could predict promoter activities in both high and low CG groups: average area under the receiver operating characteristic curve (AUC) of the models was 91% and exceeded the AUC of CG content by an average of 23%. Known relationships, for example, between HNF4A and hepatocytes, were recapitulated in the corresponding cell lines, in this case the liver-derived cell line HepG2. Half of the associations between tissue-specific TFs and cell line-specific promoters were new. Our study underscores the importance of collecting functional information from complementary assays and conditions to understand biology in a systematic framework.


Subjects
Base Sequence/physiology; Organ Specificity/genetics; Promoter Regions, Genetic/genetics; Promoter Regions, Genetic/physiology; Base Composition/physiology; Binding Sites/genetics; Cell Line; Computational Biology/methods; Epigenesis, Genetic/physiology; Gene Expression/genetics; Gene Expression/physiology; Hep G2 Cells; Hepatocyte Nuclear Factor 4/genetics; Humans; Protein Binding; Transcription, Genetic; Transfection
6.
Sci Data ; 7(1): 399, 2020 Nov 17.
Article in English | MEDLINE | ID: mdl-33203859

ABSTRACT

The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.


Subjects
High-Throughput Nucleotide Sequencing; Mice/genetics; Zea mays/genetics; Animals; Fragaria/genetics; Genome, Plant; Metagenome; Ranidae/genetics; Sequence Analysis, DNA
7.
Nat Biotechnol ; 33(6): 623-30, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26006009

ABSTRACT

Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.


Subjects
Genome, Fungal; Genome, Human; Genome, Insect; Genome, Plant; Sequence Analysis, DNA; Animals; Arabidopsis/genetics; Base Sequence; Chromosomes/genetics; Drosophila melanogaster/genetics; Heterochromatin; High-Throughput Nucleotide Sequencing/methods; Humans; Saccharomyces cerevisiae/genetics; Sequence Alignment
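
Note on entry 7: the abstract describes overlapping noisy long reads with probabilistic, locality-sensitive hashing. As a rough illustration of that general idea only, the following minimal Python sketch estimates the Jaccard similarity of two sequences' k-mer sets with generic MinHash. It is not the MHAP implementation. The function names, the k-mer size, the number of hash functions, and the use of salted BLAKE2b are all assumptions chosen for this example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq, k=8, num_hashes=64):
    """Build a MinHash sketch: one minimum hash value per salted hash function.

    Hypothetical helper for illustration; assumes len(seq) >= k.
    """
    kmer_set = kmers(seq, k)
    sketch = []
    for seed in range(num_hashes):
        # A different salt per seed simulates an independent hash function.
        salt = seed.to_bytes(8, "little")
        sketch.append(min(
            int.from_bytes(
                hashlib.blake2b(km.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for km in kmer_set
        ))
    return sketch


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions whose minima agree, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

Two reads whose sketches agree at many positions are likely to share many k-mers, so sketches can serve as a cheap filter before a more expensive alignment step. How MHAP selects and verifies candidate overlaps is not stated in the abstract and is not modelled here.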
8.
Sci Data ; 1: 140045, 2014.
Article in English | MEDLINE | ID: mdl-25977796

ABSTRACT

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.


Subjects
Arabidopsis/genetics; Drosophila melanogaster/genetics; Escherichia coli/genetics; Genome, Bacterial; Genome, Fungal; Genome, Insect; Genome, Plant; Neurospora crassa/genetics; Saccharomyces cerevisiae/genetics; Sequence Analysis, DNA; Animals; Models, Animal
10.
Science ; 330(6012): 1787-97, 2010 Dec 24.
Article in English | MEDLINE | ID: mdl-21177974

ABSTRACT

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.


Subjects
Chromatin; Drosophila melanogaster/genetics; Gene Regulatory Networks; Genome, Insect; Molecular Sequence Annotation; Animals; Binding Sites; Chromatin/genetics; Chromatin/metabolism; Computational Biology/methods; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Drosophila melanogaster/growth & development; Drosophila melanogaster/metabolism; Epigenesis, Genetic; Gene Expression Regulation; Genes, Insect; Genomics/methods; Histones/metabolism; Nucleosomes/genetics; Nucleosomes/metabolism; Promoter Regions, Genetic; RNA, Small Untranslated/genetics; RNA, Small Untranslated/metabolism; Transcription Factors/metabolism; Transcription, Genetic