Results 1 - 16 of 16
1.
Genome Res ; 28(7): 1029-1038, 2018 07.
Article in English | MEDLINE | ID: mdl-29884752

ABSTRACT

The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants, even in genomes as well studied as rat and the great apes, and how these annotations improve cross-species RNA expression experiments.


Subjects
Human Genome/genetics, Algorithms, Animals, High-Throughput Nucleotide Sequencing/methods, Humans, Molecular Sequence Annotation/methods, RNA/genetics, Rats
2.
Genome Res ; 24(12): 2077-89, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25273068

ABSTRACT

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.


Subjects
Genome, Genomics/methods, Sequence Alignment/methods, Software, Animals, Computational Biology/methods, Computer Simulation, Datasets as Topic, Genome-Wide Association Study, Humans, Mammals/genetics, Phylogeny, Reproducibility of Results
3.
Nature ; 464(7290): 898-902, 2010 Apr 08.
Article in English | MEDLINE | ID: mdl-20237475

ABSTRACT

Advances in genome technology have facilitated a new understanding of the historical and genetic processes crucial to rapid phenotypic evolution under domestication. To understand the process of dog diversification better, we conducted an extensive genome-wide survey of more than 48,000 single nucleotide polymorphisms in dogs and their wild progenitor, the grey wolf. Here we show that dog breeds share a higher proportion of multi-locus haplotypes unique to grey wolves from the Middle East, indicating that they are a dominant source of genetic diversity for dogs rather than wolves from east Asia, as suggested by mitochondrial DNA sequence data. Furthermore, we find a surprising correspondence between genetic and phenotypic/functional breed groupings but there are exceptions that suggest phenotypic diversification depended in part on the repeated crossing of individuals with novel phenotypes. Our results show that Middle Eastern wolves were a critical source of genome diversity, although interbreeding with local wolf populations clearly occurred elsewhere in the early history of specific lineages. More recently, the evolution of modern dog breeds seems to have been an iterative process that drew on a limited genetic toolkit to create remarkable phenotypic diversity.


Subjects
Domestic Animals/genetics, Dogs/genetics, Genome/genetics, Haplotypes/genetics, Single Nucleotide Polymorphism/genetics, Animals, Domestic Animals/classification, Wild Animals/classification, Wild Animals/genetics, Breeding, Computational Biology, Dogs/classification, Molecular Evolution, Eastern Asia/ethnology, Middle East/ethnology, Phenotype, Phylogeny, Wolves/classification, Wolves/genetics
4.
Genome Res ; 21(9): 1512-28, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21665927

ABSTRACT

Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new "Cactus" alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.


Subjects
Algorithms, Genomics, Sequence Alignment, Software, Animals, Computer Simulation, Humans, Mice, Primates, DNA Sequence Analysis
5.
Genome Res ; 21(8): 1294-305, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21566151

ABSTRACT

High-throughput genotyping technologies developed for model species can potentially increase the resolution of demographic history and ancestry in wild relatives. We use a SNP genotyping microarray developed for the domestic dog to assay variation in over 48K loci in wolf-like species worldwide. Despite the high mobility of these large carnivores, we find distinct hierarchical population units within gray wolves and coyotes that correspond with geographic and ecologic differences among populations. Further, we test controversial theories about the ancestry of the Great Lakes wolf and red wolf using an analysis of haplotype blocks across all 38 canid autosomes. We find that these enigmatic canids are highly admixed varieties derived from gray wolves and coyotes, respectively. This divergent genomic history suggests that they do not have a shared recent ancestry as proposed by previous researchers. Interspecific hybridization, as well as the process of evolutionary divergence, may be responsible for the observed phenotypic distinction of both forms. Such admixture complicates decisions regarding endangered species restoration and protection.


Subjects
Biological Evolution, Canidae/genetics, Genome, Animals, Coyotes/genetics, Dogs/genetics, Molecular Evolution, Genotype, Haplotypes, Genetic Hybridization, Phenotype, Single Nucleotide Polymorphism, Wolves/genetics
6.
Genome Res ; 21(12): 2224-41, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21926179

ABSTRACT

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.
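
As an illustration of what a contiguity assessment can look like, the sketch below computes N50, a widely used contiguity statistic. The abstract does not say which statistics the Assemblathon 1 evaluation relied on, so this is only a minimal example under that assumption; the function name `n50` and the sample contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # Only reached when no lengths were supplied.
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6]
print(n50(example_lengths))
```

A higher N50 indicates that more of the assembly sits in long sequences, but it says nothing about correctness, which is why an evaluation like the one described pairs contiguity with coverage, structure, and base-level accuracy.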


Subjects
Genome/physiology, Genomics/methods, DNA Sequence Analysis/methods
7.
Bioinformatics ; 29(10): 1341-2, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23505295

ABSTRACT

MOTIVATION: Large multiple genome alignments and inferred ancestral genomes are ideal resources for comparative studies of molecular evolution, and advances in sequencing and computing technology are making them increasingly obtainable. These structures can provide a rich understanding of the genetic relationships between all subsets of species they contain. Current formats for storing genomic alignments, such as XMFA and MAF, are all indexed or ordered using a single reference genome, however, which limits the information that can be queried with respect to other species and clades. This loss of information grows with the number of species under comparison, as well as their phylogenetic distance. RESULTS: We present HAL, a compressed, graph-based hierarchical alignment format for storing multiple genome alignments and ancestral reconstructions. HAL graphs are indexed on all genomes they contain. Furthermore, they are organized phylogenetically, which allows for modular and parallel access to arbitrary subclades without fragmentation because of rearrangements that have occurred in other lineages. HAL graphs can be created or read with a comprehensive C++ API. A set of tools is also provided to perform basic operations, such as importing and exporting data, identifying mutations and coordinate mapping (liftover). AVAILABILITY: All documentation and source code for the HAL API and tools are freely available at http://github.com/glennhickey/hal. CONTACT: hickey@soe.ucsc.edu or haussler@soe.ucsc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Sequence Alignment/methods, Software, Animals, Base Sequence, Molecular Evolution, Genomics/methods, Humans, Phylogeny, Programming Languages, Sequence Alignment/instrumentation
8.
Mamm Genome ; 24(1-2): 80-8, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23064780

ABSTRACT

The ability to detect recent hybridization between dogs and wolves is important for conservation and legal actions, which often require accurate and rapid resolution of ancestry. The availability of a genetic test for dog-wolf hybrids would greatly support federal and legal enforcement efforts, particularly when the individual in question lacks prior ancestry information. We have developed a panel of 100 unlinked ancestry-informative SNP markers that can detect mixed ancestry within up to four generations of dog-wolf hybridization based on simulations of seven genealogical classes constructed following the rules of Mendelian inheritance. We establish 95% confidence regions around the spatial clustering of each genealogical class using a tertiary plot of allele dosage and heterozygosity. The first- and second-backcrossed-generation hybrids were the most distinct from parental populations, with >90% correctly assigned to genealogical class. In this article we provide a tool kit with population-level statistical quantification that can detect recent dog-wolf hybridization using a panel of dog-wolf ancestry-informative SNPs with divergent allele frequency distributions.


Subjects
Dogs/genetics, Genotype, Genetic Hybridization, Single Nucleotide Polymorphism, Wolves/genetics, Alleles, Animals, Gene Frequency, Genetic Loci, Microsatellite Repeats, Principal Component Analysis
9.
Bioinformatics ; 26(12): i237-45, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529912

ABSTRACT

MOTIVATION: High-throughput data is providing a comprehensive view of the molecular changes in cancer tissues. New technologies allow for the simultaneous genome-wide assay of the state of genome copy number variation, gene expression, DNA methylation and epigenetics of tumor samples and cancer cell lines. Analyses of current data sets find that genetic alterations between patients can differ but often involve common pathways. It is therefore critical to identify relevant pathways involved in cancer progression and detect how they are altered in different patients. RESULTS: We present a novel method for inferring patient-specific genetic activities incorporating curated pathway interactions among genes. A gene is modeled by a factor graph as a set of interconnected variables encoding the expression and known activity of a gene and its products, allowing the incorporation of many types of omic data as evidence. The method predicts the degree to which a pathway's activities (e.g. internal gene states, interactions or high-level 'outputs') are altered in the patient using probabilistic inference. Compared with a competing pathway activity inference approach called SPIA, our method identifies altered activities in cancer-related pathways with fewer false-positives in both a glioblastoma multiforme (GBM) and a breast cancer dataset. PARADIGM identified consistent pathway-level activities for subsets of the GBM patients that are overlooked when genes are considered in isolation. Further, grouping GBM patients based on their significant pathway perturbations divides them into clinically-relevant subgroups having significantly different survival outcomes. These findings suggest that therapeutics might be chosen that target genes at critical points in the commonly perturbed pathway(s) of a group of patients. AVAILABILITY: Source code available at http://sbenz.github.com/Paradigm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics/methods, Neoplasms/genetics, Software, Breast Neoplasms/genetics, DNA Copy Number Variations, Female, Gene Expression Profiling/methods, Glioblastoma/genetics, Humans
10.
Nat Genet ; 50(11): 1574-1583, 2018 11.
Article in English | MEDLINE | ID: mdl-30275530

ABSTRACT

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.


Subjects
Chromosome Mapping, Genetic Loci, Genome, Haplotypes, Inbred Mice/genetics, Animals, Laboratory Animals, Chromosome Mapping/veterinary, Haplotypes/genetics, Mice, Inbred BALB C Mice/genetics, Inbred C3H Mice/genetics, Inbred C57BL Mice/genetics, Inbred CBA Mice/genetics, Inbred DBA Mice/genetics, Inbred NOD Mice/genetics, Inbred Mice/classification, Molecular Sequence Annotation, Phylogeny, Single Nucleotide Polymorphism, Species Specificity
11.
J Comput Biol ; 22(5): 387-401, 2015 May.
Article in English | MEDLINE | ID: mdl-25565268

ABSTRACT

A reference genome is a high quality individual genome that is used as a coordinate system for the genomes of a population, or genomes of closely related subspecies. Given a set of genomes partitioned by homology into alignment blocks we formalize the problem of ordering and orienting the blocks such that the resulting ordering maximally agrees with the underlying genomes' ordering and orientation, creating a pan-genome reference ordering. We show this problem is NP-hard, but also demonstrate, empirically and within simulations, the performance of heuristic algorithms based upon a cactus graph decomposition to find locally maximal solutions. We describe an extension of our Cactus software to create a pan-genome reference for whole genome alignments, and demonstrate how it can be used to create novel genome browser visualizations using human variation data as a test. In addition, we test the use of a pan-genome for describing variations and as a reference for read mapping.


Subjects
Algorithms, Population Genetics/standards, Human Genome, Software, Computer Graphics, Molecular Evolution, Population Genetics/statistics & numerical data, Humans, Reference Standards, Sequence Alignment, DNA Sequence Analysis
12.
Science ; 346(6215): 1254449, 2014 Dec 12.
Article in English | MEDLINE | ID: mdl-25504731

ABSTRACT

To provide context for the diversification of archosaurs (the group that includes crocodilians, dinosaurs, and birds), we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs.


Subjects
Alligators and Crocodiles/genetics, Birds/genetics, Dinosaurs/genetics, Molecular Evolution, Genome, Alligators and Crocodiles/classification, Animals, Biological Evolution, Birds/classification, Conserved Sequence, DNA Transposable Elements, Dinosaurs/classification, Genetic Variation, Molecular Sequence Annotation, Molecular Sequence Data, Phylogeny, Reptiles/classification, Reptiles/genetics, Sequence Alignment, DNA Sequence Analysis, Transcriptome
13.
Genome Biol ; 14(3): R22, 2013 Mar 13.
Article in English | MEDLINE | ID: mdl-23497673

ABSTRACT

BACKGROUND: Retroposed processed gene transcripts are an important source of material for new gene formation on evolutionary timescales. Most prior work on gene retrocopy discovery compared copies in reference genome assemblies to their source genes. Here, we explore gene retrocopy insertion polymorphisms (GRIPs) that are present in the germlines of individual humans, mice, and chimpanzees, and we identify novel gene retrocopy insertions in cancerous somatic tissues that are absent from patient-matched non-cancer genomes. RESULTS: Through analysis of whole-genome sequence data, we found evidence for 48 GRIPs in the genomes of one or more humans sequenced as part of the 1,000 Genomes Project and The Cancer Genome Atlas, but which were not in the human reference assembly. Similarly, we found evidence for 755 GRIPs at distinct locations in one or more of 17 inbred mouse strains but which were not in the mouse reference assembly, and 19 GRIPs across a cohort of 10 chimpanzee genomes, which were not in the chimpanzee reference genome assembly. Many of these insertions are new members of existing gene families whose source genes are highly and widely expressed, and the majority have detectable hallmarks of processed gene retrocopy formation. We estimate the rate of novel gene retrocopy insertions in humans and chimps at roughly one new gene retrocopy insertion for every 6,000 individuals. CONCLUSIONS: We find that gene retrocopy polymorphisms are a widespread phenomenon, present a multi-species analysis of these events, and provide a method for their ascertainment.


Subjects
Genome/genetics, Genomic Structural Variation/genetics, Mammals/genetics, Messenger RNA/genetics, Retroelements/genetics, Animals, Gene Ontology, Human Genome/genetics, Humans, Mice, Molecular Sequence Annotation, Insertional Mutagenesis/genetics, Neoplasms/genetics, Pan troglodytes/genetics, Messenger RNA/metabolism
14.
Gigascience ; 2(1): 10, 2013 Jul 22.
Article in English | MEDLINE | ID: mdl-23870653

ABSTRACT

BACKGROUND: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

15.
J Comput Biol ; 18(3): 469-81, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21385048

ABSTRACT

We introduce a data structure, analysis, and visualization scheme called a cactus graph for comparing sets of related genomes. In common with multi-break point graphs and A-Bruijn graphs, cactus graphs can represent duplications and general genomic rearrangements, but additionally, they naturally decompose the common substructures in a set of related genomes into a hierarchy of chains that can be visualized as two-dimensional multiple alignments and nets that can be visualized in circular genome plots. Supplementary Material is available at www.liebertonline.com/cmb.


Subjects
Computer Graphics, Genome, Genomics/methods, Sequence Alignment/methods, Algorithms, Animals, Base Sequence, DNA/genetics, Molecular Evolution, Humans, Molecular Sequence Data
16.
Mol Ecol ; 17(1): 252-74, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17877715

ABSTRACT

The recovery of the grey wolf in Yellowstone National Park is an outstanding example of a successful reintroduction. A general question concerning reintroduction is the degree to which genetic variation has been preserved and the specific behavioural mechanisms that enhance the preservation of genetic diversity and reduce inbreeding. We have analysed 200 Yellowstone wolves, including all 31 founders, for variation in 26 microsatellite loci over the 10-year reintroduction period (1995-2004). The population maintained high levels of variation (1995 H(0) = 0.69; 2004 H(0) = 0.73) with low levels of inbreeding (1995 F(IS) = -0.063; 2004 F(IS) = -0.051) and throughout, the population expanded rapidly (N(1995) = 21; N(2004) = 169). Pedigree-based effective population size ratios did not vary appreciably over the duration of population expansion (1995 N(e)/N(g) = 0.29; 2000 N(e)/N(g) = 0.26; 2004 N(e)/N(g) = 0.33). We estimated kinship and found only two of 30 natural breeding pairs showed evidence of being related (average r = -0.026, SE = 0.03). We reconstructed the genealogy of 200 wolves based on genetic and field data and discovered that they avoid inbreeding through a wide variety of behavioural mechanisms including absolute avoidance of breeding with related pack members, male-biased dispersal to packs where they breed with nonrelatives, and female-biased subordinate breeding. We documented a greater diversity of such population assembly patterns in Yellowstone than previously observed in any other natural wolf population. Inbreeding avoidance is nearly absolute despite the high probability of within-pack inbreeding opportunities and extensive interpack kinship ties between adjacent packs. Simulations showed that the Yellowstone population has levels of genetic variation similar to that of a population managed for high variation and low inbreeding, and greater than that expected for random breeding within packs or across the entire breeding pool. 
Although short-term losses in variation seem minimal, future projections of the population at carrying capacity suggest significant inbreeding depression will occur without connectivity and migratory exchange with other populations.
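
The heterozygosity and inbreeding figures quoted above can be related with a short calculation. This is a minimal sketch, assuming that H(0) denotes observed heterozygosity and that F(IS) follows the standard definition 1 - H_obs / H_exp, where H_exp is the Hardy-Weinberg expectation; the abstract does not spell out its estimators. The function name `heterozygosity_stats` and the genotypes are hypothetical, and the example covers a single locus, whereas the study typed 26 microsatellite loci.

```python
from collections import Counter


def heterozygosity_stats(genotypes):
    """Return (H_obs, H_exp, F_IS) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("genotypes must be non-empty")
    # Observed heterozygosity: share of individuals with two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies over all 2n gene copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared frequencies.
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F_IS is undefined")
    # Negative values mean more heterozygotes than expected (no sign of inbreeding).
    f_is = 1.0 - h_obs / h_exp
    return h_obs, h_exp, f_is


# Hypothetical genotypes at one locus, for illustration only.
sample = [(1, 2), (1, 2), (1, 1), (2, 2)]
print(heterozygosity_stats(sample))
```

Under this definition, the slightly negative F(IS) values reported for 1995 and 2004 correspond to observed heterozygosity marginally above the Hardy-Weinberg expectation, which is consistent with the low inbreeding the abstract describes.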


Subjects
Conservation of Natural Resources/methods, Genetic Variation, Population Genetics, Inbreeding, Wolves/genetics, Animals, Founder Effect, Microsatellite Repeats/genetics, Pedigree, Population Density, Population Dynamics, Wyoming