Results 1 - 9 of 9
1.
Nature ; 513(7518): 375-381, 2014 Sep 18.
Article in English | MEDLINE | ID: mdl-25186727

ABSTRACT

Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.


Subjects
Cichlids/classification , Cichlids/genetics , Molecular Evolution , Genetic Speciation , Genome/genetics , Eastern Africa , Animals , DNA Transposable Elements/genetics , Gene Duplication/genetics , Gene Expression Regulation/genetics , Genomics , Lakes , MicroRNAs/genetics , Phylogeny , Genetic Polymorphism/genetics
2.
Nature ; 496(7445): 311-6, 2013 Apr 18.
Article in English | MEDLINE | ID: mdl-23598338

ABSTRACT

The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.


Subjects
Biological Evolution , Fishes/classification , Fishes/genetics , Genome/genetics , Animals , Genetically Modified Animals , Chick Embryo , Conserved Sequence/genetics , Genetic Enhancer Elements/genetics , Molecular Evolution , Extremities/anatomy & histology , Extremities/growth & development , Fishes/anatomy & histology , Fishes/physiology , Homeobox Genes/genetics , Genomics , Immunoglobulin M/genetics , Mice , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Sequence Alignment , DNA Sequence Analysis , Vertebrates/anatomy & histology , Vertebrates/genetics , Vertebrates/physiology
3.
Genome Res ; 22(11): 2270-7, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22829535

ABSTRACT

Exceptionally accurate genome reference sequences have proven to be of great value to microbial researchers. Thus, to date, about 1800 bacterial genome assemblies have been "finished" at great expense with the aid of manual laboratory and computational processes that typically iterate over a period of months or even years. By applying a new laboratory design and new assembly algorithm to 16 samples, we demonstrate that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation. Cost and time requirements are thus dramatically reduced.


Subjects
Bacteria/genetics , Bacterial Genome , Genomic Library , DNA Sequence Analysis/methods , Algorithms
4.
Genome Res ; 22(11): 2241-9, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22800726

ABSTRACT

Eliminating the bacterial cloning step has been a major factor in the vastly improved efficiency of massively parallel sequencing approaches. However, this also has made it a technical challenge to produce the modern equivalent of the Fosmid- or BAC-end sequences that were crucial for assembling and analyzing complex genomes during the Sanger-based sequencing era. To close this technology gap, we developed Fosill, a method for converting Fosmids to Illumina-compatible jumping libraries. We constructed Fosmid libraries in vectors with Illumina primer sequences and specific nicking sites flanking the cloning site. Our family of pFosill vectors allows multiplex Fosmid cloning of end-tagged genomic fragments without physical size selection and is compatible with standard and multiplex paired-end Illumina sequencing. To excise the bulk of each cloned insert, we introduced two nicks in the vector, translated them into the inserts, and cleaved them. Recircularization of the vector via coligation of insert termini followed by inverse PCR generates a jumping library for paired-end sequencing with 101-base reads. The yield of unique Fosmid-sized jumps is sufficiently high, and the background of short, incorrectly spaced and chimeric artifacts sufficiently low, to enable applications such as mapping of structural variation and scaffolding of de novo assemblies. We demonstrate the power of Fosill to map genome rearrangements in a cancer cell line and identified three fusion genes that were corroborated by RNA-seq data. Our Fosill-powered assembly of the mouse genome has an N50 scaffold length of 17.0 Mb, rivaling the connectivity (16.9 Mb) of the Sanger-sequencing based draft assembly.


Subjects
Escherichia coli/genetics , Genetic Vectors/genetics , Bacterial Genome , Fungal Genome , Genomic Library , Schizosaccharomyces/genetics , DNA Sequence Analysis/methods , Animals , Gene Rearrangement , Mice , Inbred C57BL Mice
5.
Proc Natl Acad Sci U S A ; 108(4): 1513-8, 2011 Jan 25.
Article in English | MEDLINE | ID: mdl-21187386

ABSTRACT

Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.


Subjects
Algorithms , Genomics/methods , DNA Sequence Analysis/methods , Software , Animals , Genome/genetics , Humans , Internet , Mice , Reproducibility of Results
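
Note on the N50 statistic quoted in the abstract above: N50 is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The Python sketch below is only an illustration of that standard definition, not code from ALLPATHS-LG; the function name `n50` and the scaffold lengths are hypothetical and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in bases (illustrative only, not from the paper).
example_scaffolds = [8_000_000, 5_000_000, 3_000_000,
                     2_000_000, 1_000_000, 1_000_000]
print(n50(example_scaffolds))  # prints 5000000
```

In this toy assembly the total is 20 Mb, and the two longest scaffolds (8 Mb + 5 Mb) are the first to reach the 10 Mb halfway point, so the N50 is 5 Mb.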
6.
Nat Genet ; 46(12): 1350-5, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25326702

ABSTRACT

Complete knowledge of the genetic variation in individual human genomes is a crucial foundation for understanding the etiology of disease. Genetic variation is typically characterized by sequencing individual genomes and comparing reads to a reference. Existing methods do an excellent job of detecting variants in approximately 90% of the human genome; however, calling variants in the remaining 10% of the genome (largely low-complexity sequence and segmental duplications) is challenging. To improve variant calling, we developed a new algorithm, DISCOVAR, and examined its performance on improved, low-cost sequence data. Using a newly created reference set of variants from the finished sequence of 103 randomly chosen fosmids, we find that some standard variant call sets miss up to 25% of variants. We show that the combination of new methods and improved data increases sensitivity by several fold, with the greatest impact in challenging regions of the human genome.


Subjects
Genetic Variation , Human Genome , Algorithms , Base Sequence , Chromosome Mapping , Gene Frequency , Genome , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis , Polymerase Chain Reaction , Single Nucleotide Polymorphism , Reproducibility of Results , Sensitivity and Specificity , Software
7.
Gigascience ; 2(1): 10, 2013 Jul 22.
Article in English | MEDLINE | ID: mdl-23870653

ABSTRACT

BACKGROUND: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

8.
Genome Biol ; 10(10): R103, 2009.
Article in English | MEDLINE | ID: mdl-19796385

ABSTRACT

We demonstrate that genome sequences approaching finished quality can be generated from short paired reads. Using 36 base (fragment) and 26 base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds. Velvet and EULER-SR were less accurate. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% (ALLPATHS2), 68.7% (Velvet), and 42.1% (EULER-SR).


Subjects
Bacteria/genetics , Fungi/genetics , Genome/genetics , Genomics/methods , Software , Base Pairing/genetics , Reproducibility of Results
9.
Genome Res ; 18(5): 810-20, 2008 May.
Article in English | MEDLINE | ID: mdl-18340039

ABSTRACT

New DNA sequencing technologies deliver data at dramatically lower costs but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun "microreads." For 11 genomes of sizes up to 39 Mb, we generated high-quality assemblies from 80x coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of Campylobacter jejuni and Escherichia coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data, not only short read data, but also conventional sequence reads.


Subjects
Algorithms , Computational Biology/methods , Bacterial Genome/genetics , DNA Sequence Analysis/methods , Campylobacter jejuni/genetics , Computer Simulation , Escherichia coli/genetics , Reproducibility of Results , DNA Sequence Analysis/standards
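
Note on the coverage figure quoted in the abstract above: the read count implied by "80x coverage by paired 30-base reads" can be estimated from the usual definition of coverage as total read bases divided by genome size. The Python sketch below is a back-of-the-envelope illustration only. The function name `reads_needed` is hypothetical, and it assumes the 80x figure counts both reads of every pair, which the abstract does not state.

```python
def reads_needed(genome_size, coverage, read_length):
    """Number of reads required to reach a target average coverage.

    Uses coverage = (number of reads * read length) / genome size,
    rounded up to a whole read. All arguments are integers.
    """
    total_bases = coverage * genome_size
    return (total_bases + read_length - 1) // read_length  # integer ceiling


# Figures taken from the abstract: largest genome 39 Mb, 80x coverage, 30-base reads.
reads = reads_needed(genome_size=39_000_000, coverage=80, read_length=30)
print(reads)       # prints 104000000
print(reads // 2)  # prints 52000000 (read pairs, assuming every read is paired)
```

Under these assumptions the largest simulated genome would correspond to roughly 104 million reads, or 52 million pairs.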