1.
Genome Res ; 28(7): 1029-1038, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29884752

ABSTRACT

The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants (even in genomes as well studied as rat and the great apes) and how these annotations improve cross-species RNA expression experiments.


Subject(s)
Genome, Human/genetics; Algorithms; Animals; High-Throughput Nucleotide Sequencing/methods; Humans; Molecular Sequence Annotation/methods; RNA/genetics; Rats
2.
Genome Res ; 24(12): 2077-89, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25273068

ABSTRACT

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one comprised 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.


Subject(s)
Genome; Genomics/methods; Sequence Alignment/methods; Software; Animals; Computational Biology/methods; Computer Simulation; Datasets as Topic; Genome-Wide Association Study; Humans; Mammals/genetics; Phylogeny; Reproducibility of Results
3.
Nature ; 464(7290): 898-902, 2010 Apr 08.
Article in English | MEDLINE | ID: mdl-20237475

ABSTRACT

Advances in genome technology have facilitated a new understanding of the historical and genetic processes crucial to rapid phenotypic evolution under domestication. To understand the process of dog diversification better, we conducted an extensive genome-wide survey of more than 48,000 single nucleotide polymorphisms in dogs and their wild progenitor, the grey wolf. Here we show that dog breeds share a higher proportion of multi-locus haplotypes unique to grey wolves from the Middle East, indicating that they are a dominant source of genetic diversity for dogs rather than wolves from east Asia, as suggested by mitochondrial DNA sequence data. Furthermore, we find a surprising correspondence between genetic and phenotypic/functional breed groupings but there are exceptions that suggest phenotypic diversification depended in part on the repeated crossing of individuals with novel phenotypes. Our results show that Middle Eastern wolves were a critical source of genome diversity, although interbreeding with local wolf populations clearly occurred elsewhere in the early history of specific lineages. More recently, the evolution of modern dog breeds seems to have been an iterative process that drew on a limited genetic toolkit to create remarkable phenotypic diversity.


Subject(s)
Animals, Domestic/genetics; Dogs/genetics; Genome/genetics; Haplotypes/genetics; Polymorphism, Single Nucleotide/genetics; Animals; Animals, Domestic/classification; Animals, Wild/classification; Animals, Wild/genetics; Breeding; Computational Biology; Dogs/classification; Evolution, Molecular; Asia, Eastern/ethnology; Middle East/ethnology; Phenotype; Phylogeny; Wolves/classification; Wolves/genetics
4.
Genome Res ; 21(9): 1512-28, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21665927

ABSTRACT

Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new "Cactus" alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.


Subject(s)
Algorithms; Genomics; Sequence Alignment; Software; Animals; Computer Simulation; Humans; Mice; Primates; Sequence Analysis, DNA
5.
Genome Res ; 21(8): 1294-305, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21566151

ABSTRACT

High-throughput genotyping technologies developed for model species can potentially increase the resolution of demographic history and ancestry in wild relatives. We use a SNP genotyping microarray developed for the domestic dog to assay variation in over 48K loci in wolf-like species worldwide. Despite the high mobility of these large carnivores, we find distinct hierarchical population units within gray wolves and coyotes that correspond with geographic and ecologic differences among populations. Further, we test controversial theories about the ancestry of the Great Lakes wolf and red wolf using an analysis of haplotype blocks across all 38 canid autosomes. We find that these enigmatic canids are highly admixed varieties derived from gray wolves and coyotes, respectively. This divergent genomic history suggests that they do not have a shared recent ancestry as proposed by previous researchers. Interspecific hybridization, as well as the process of evolutionary divergence, may be responsible for the observed phenotypic distinction of both forms. Such admixture complicates decisions regarding endangered species restoration and protection.


Subject(s)
Biological Evolution; Canidae/genetics; Genome; Animals; Coyotes/genetics; Dogs/genetics; Evolution, Molecular; Genotype; Haplotypes; Hybridization, Genetic; Phenotype; Polymorphism, Single Nucleotide; Wolves/genetics
6.
Genome Res ; 21(12): 2224-41, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21926179

ABSTRACT

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.
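
Contiguity, one of the properties assessed in this benchmark, is often summarized with the N50 statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases. The sketch below is a minimal illustration only, assuming contig lengths are supplied as a plain list of integers; the function name `n50` is hypothetical, and this is not the competition's evaluation code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Hypothetical helper for illustration; not taken from the
    Assemblathon evaluation code.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings coverage to half the total.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Comparing integers with `2 * running >= total` avoids the floating-point division that `running >= total / 2` would introduce.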


Subject(s)
Genome/physiology; Genomics/methods; Sequence Analysis, DNA/methods
7.
Bioinformatics ; 29(10): 1341-2, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23505295

ABSTRACT

MOTIVATION: Large multiple genome alignments and inferred ancestral genomes are ideal resources for comparative studies of molecular evolution, and advances in sequencing and computing technology are making them increasingly obtainable. These structures can provide a rich understanding of the genetic relationships between all subsets of species they contain. Current formats for storing genomic alignments, such as XMFA and MAF, are all indexed or ordered using a single reference genome, however, which limits the information that can be queried with respect to other species and clades. This loss of information grows with the number of species under comparison, as well as their phylogenetic distance. RESULTS: We present HAL, a compressed, graph-based hierarchical alignment format for storing multiple genome alignments and ancestral reconstructions. HAL graphs are indexed on all genomes they contain. Furthermore, they are organized phylogenetically, which allows for modular and parallel access to arbitrary subclades without fragmentation because of rearrangements that have occurred in other lineages. HAL graphs can be created or read with a comprehensive C++ API. A set of tools is also provided to perform basic operations, such as importing and exporting data, identifying mutations and coordinate mapping (liftover). AVAILABILITY: All documentation and source code for the HAL API and tools are freely available at http://github.com/glennhickey/hal. CONTACT: hickey@soe.ucsc.edu or haussler@soe.ucsc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome; Sequence Alignment/methods; Software; Animals; Base Sequence; Evolution, Molecular; Genomics/methods; Humans; Phylogeny; Programming Languages; Sequence Alignment/instrumentation
8.
Mamm Genome ; 24(1-2): 80-8, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23064780

ABSTRACT

The ability to detect recent hybridization between dogs and wolves is important for conservation and legal actions, which often require accurate and rapid resolution of ancestry. The availability of a genetic test for dog-wolf hybrids would greatly support federal and legal enforcement efforts, particularly when the individual in question lacks prior ancestry information. We have developed a panel of 100 unlinked ancestry-informative SNP markers that can detect mixed ancestry within up to four generations of dog-wolf hybridization based on simulations of seven genealogical classes constructed following the rules of Mendelian inheritance. We establish 95% confidence regions around the spatial clustering of each genealogical class using a tertiary plot of allele dosage and heterozygosity. The first- and second-backcrossed-generation hybrids were the most distinct from parental populations, with >90% correctly assigned to genealogical class. In this article we provide a tool kit with population-level statistical quantification that can detect recent dog-wolf hybridization using a panel of dog-wolf ancestry-informative SNPs with divergent allele frequency distributions.


Subject(s)
Dogs/genetics; Genotype; Hybridization, Genetic; Polymorphism, Single Nucleotide; Wolves/genetics; Alleles; Animals; Gene Frequency; Genetic Loci; Microsatellite Repeats; Principal Component Analysis
9.
Bioinformatics ; 26(12): i237-45, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529912

ABSTRACT

MOTIVATION: High-throughput data is providing a comprehensive view of the molecular changes in cancer tissues. New technologies allow for the simultaneous genome-wide assay of the state of genome copy number variation, gene expression, DNA methylation and epigenetics of tumor samples and cancer cell lines. Analyses of current data sets find that genetic alterations between patients can differ but often involve common pathways. It is therefore critical to identify relevant pathways involved in cancer progression and detect how they are altered in different patients. RESULTS: We present a novel method for inferring patient-specific genetic activities incorporating curated pathway interactions among genes. A gene is modeled by a factor graph as a set of interconnected variables encoding the expression and known activity of a gene and its products, allowing the incorporation of many types of omic data as evidence. The method predicts the degree to which a pathway's activities (e.g. internal gene states, interactions or high-level 'outputs') are altered in the patient using probabilistic inference. Compared with a competing pathway activity inference approach called SPIA, our method identifies altered activities in cancer-related pathways with fewer false-positives in both a glioblastoma multiforme (GBM) and a breast cancer dataset. PARADIGM identified consistent pathway-level activities for subsets of the GBM patients that are overlooked when genes are considered in isolation. Further, grouping GBM patients based on their significant pathway perturbations divides them into clinically-relevant subgroups having significantly different survival outcomes. These findings suggest that therapeutics might be chosen that target genes at critical points in the commonly perturbed pathway(s) of a group of patients. AVAILABILITY: Source code available at http://sbenz.github.com/Paradigm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics/methods; Neoplasms/genetics; Software; Breast Neoplasms/genetics; DNA Copy Number Variations; Female; Gene Expression Profiling/methods; Glioblastoma/genetics; Humans
10.
Nat Genet ; 50(11): 1574-1583, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30275530

ABSTRACT

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.


Subject(s)
Chromosome Mapping; Genetic Loci; Genome; Haplotypes; Mice, Inbred Strains/genetics; Animals; Animals, Laboratory; Chromosome Mapping/veterinary; Haplotypes/genetics; Mice; Mice, Inbred BALB C/genetics; Mice, Inbred C3H/genetics; Mice, Inbred C57BL/genetics; Mice, Inbred CBA/genetics; Mice, Inbred DBA/genetics; Mice, Inbred NOD/genetics; Mice, Inbred Strains/classification; Molecular Sequence Annotation; Phylogeny; Polymorphism, Single Nucleotide; Species Specificity
11.
J Comput Biol ; 22(5): 387-401, 2015 May.
Article in English | MEDLINE | ID: mdl-25565268

ABSTRACT

A reference genome is a high quality individual genome that is used as a coordinate system for the genomes of a population, or genomes of closely related subspecies. Given a set of genomes partitioned by homology into alignment blocks we formalize the problem of ordering and orienting the blocks such that the resulting ordering maximally agrees with the underlying genomes' ordering and orientation, creating a pan-genome reference ordering. We show this problem is NP-hard, but also demonstrate, empirically and within simulations, the performance of heuristic algorithms based upon a cactus graph decomposition to find locally maximal solutions. We describe an extension of our Cactus software to create a pan-genome reference for whole genome alignments, and demonstrate how it can be used to create novel genome browser visualizations using human variation data as a test. In addition, we test the use of a pan-genome for describing variations and as a reference for read mapping.


Subject(s)
Algorithms; Genetics, Population/standards; Genome, Human; Software; Computer Graphics; Evolution, Molecular; Genetics, Population/statistics & numerical data; Humans; Reference Standards; Sequence Alignment; Sequence Analysis, DNA
12.
Science ; 346(6215): 1254449, 2014 Dec 12.
Article in English | MEDLINE | ID: mdl-25504731

ABSTRACT

To provide context for the diversification of archosaurs (the group that includes crocodilians, dinosaurs, and birds), we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs.


Subject(s)
Alligators and Crocodiles/genetics; Birds/genetics; Dinosaurs/genetics; Evolution, Molecular; Genome; Alligators and Crocodiles/classification; Animals; Biological Evolution; Birds/classification; Conserved Sequence; DNA Transposable Elements; Dinosaurs/classification; Genetic Variation; Molecular Sequence Annotation; Molecular Sequence Data; Phylogeny; Reptiles/classification; Reptiles/genetics; Sequence Alignment; Sequence Analysis, DNA; Transcriptome
13.
Genome Biol ; 14(3): R22, 2013 Mar 13.
Article in English | MEDLINE | ID: mdl-23497673

ABSTRACT

BACKGROUND: Retroposed processed gene transcripts are an important source of material for new gene formation on evolutionary timescales. Most prior work on gene retrocopy discovery compared copies in reference genome assemblies to their source genes. Here, we explore gene retrocopy insertion polymorphisms (GRIPs) that are present in the germlines of individual humans, mice, and chimpanzees, and we identify novel gene retrocopy insertions in cancerous somatic tissues that are absent from patient-matched non-cancer genomes. RESULTS: Through analysis of whole-genome sequence data, we found evidence for 48 GRIPs in the genomes of one or more humans sequenced as part of the 1,000 Genomes Project and The Cancer Genome Atlas, but which were not in the human reference assembly. Similarly, we found evidence for 755 GRIPs at distinct locations in one or more of 17 inbred mouse strains but which were not in the mouse reference assembly, and 19 GRIPs across a cohort of 10 chimpanzee genomes, which were not in the chimpanzee reference genome assembly. Many of these insertions are new members of existing gene families whose source genes are highly and widely expressed, and the majority have detectable hallmarks of processed gene retrocopy formation. We estimate the rate of novel gene retrocopy insertions in humans and chimps at roughly one new gene retrocopy insertion for every 6,000 individuals. CONCLUSIONS: We find that gene retrocopy polymorphisms are a widespread phenomenon, present a multi-species analysis of these events, and provide a method for their ascertainment.


Subject(s)
Genome/genetics; Genomic Structural Variation/genetics; Mammals/genetics; RNA, Messenger/genetics; Retroelements/genetics; Animals; Gene Ontology; Genome, Human/genetics; Humans; Mice; Molecular Sequence Annotation; Mutagenesis, Insertional/genetics; Neoplasms/genetics; Pan troglodytes/genetics; RNA, Messenger/metabolism
14.
Gigascience ; 2(1): 10, 2013 Jul 22.
Article in English | MEDLINE | ID: mdl-23870653

ABSTRACT

BACKGROUND: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

15.
J Comput Biol ; 18(3): 469-81, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21385048

ABSTRACT

We introduce a data structure, analysis, and visualization scheme called a cactus graph for comparing sets of related genomes. In common with multi-break point graphs and A-Bruijn graphs, cactus graphs can represent duplications and general genomic rearrangements, but additionally, they naturally decompose the common substructures in a set of related genomes into a hierarchy of chains that can be visualized as two-dimensional multiple alignments and nets that can be visualized in circular genome plots. Supplementary Material is available at www.liebertonline.com/cmb .


Subject(s)
Computer Graphics; Genome; Genomics/methods; Sequence Alignment/methods; Algorithms; Animals; Base Sequence; DNA/genetics; Evolution, Molecular; Humans; Molecular Sequence Data
16.
Mol Ecol ; 17(1): 252-74, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17877715

ABSTRACT

The recovery of the grey wolf in Yellowstone National Park is an outstanding example of a successful reintroduction. A general question concerning reintroduction is the degree to which genetic variation has been preserved and the specific behavioural mechanisms that enhance the preservation of genetic diversity and reduce inbreeding. We have analysed 200 Yellowstone wolves, including all 31 founders, for variation in 26 microsatellite loci over the 10-year reintroduction period (1995-2004). The population maintained high levels of variation (1995 H(0) = 0.69; 2004 H(0) = 0.73) with low levels of inbreeding (1995 F(IS) = -0.063; 2004 F(IS) = -0.051) and throughout, the population expanded rapidly (N(1995) = 21; N(2004) = 169). Pedigree-based effective population size ratios did not vary appreciably over the duration of population expansion (1995 N(e)/N(g) = 0.29; 2000 N(e)/N(g) = 0.26; 2004 N(e)/N(g) = 0.33). We estimated kinship and found only two of 30 natural breeding pairs showed evidence of being related (average r = -0.026, SE = 0.03). We reconstructed the genealogy of 200 wolves based on genetic and field data and discovered that they avoid inbreeding through a wide variety of behavioural mechanisms including absolute avoidance of breeding with related pack members, male-biased dispersal to packs where they breed with nonrelatives, and female-biased subordinate breeding. We documented a greater diversity of such population assembly patterns in Yellowstone than previously observed in any other natural wolf population. Inbreeding avoidance is nearly absolute despite the high probability of within-pack inbreeding opportunities and extensive interpack kinship ties between adjacent packs. Simulations showed that the Yellowstone population has levels of genetic variation similar to that of a population managed for high variation and low inbreeding, and greater than that expected for random breeding within packs or across the entire breeding pool. 
Although short-term losses in variation seem minimal, future projections of the population at carrying capacity suggest significant inbreeding depression will occur without connectivity and migratory exchange with other populations.
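
The heterozygosity and inbreeding figures quoted above can be related through the standard definition F_IS = 1 - H_O / H_E. The following is a minimal sketch under two assumptions: that H(0) in the abstract denotes observed heterozygosity (H_O), and that F(IS) follows this standard definition. Expected heterozygosity (H_E) is not reported in the abstract, so the value derived here is a back-calculation for illustration only, and both function names are hypothetical.

```python
def inbreeding_coefficient(h_obs, h_exp):
    """F_IS = 1 - H_O / H_E (standard definition).

    Negative values indicate an excess of heterozygotes relative to
    random-mating expectations.
    """
    if not 0.0 < h_exp <= 1.0:
        raise ValueError("expected heterozygosity must be in (0, 1]")
    return 1.0 - h_obs / h_exp


def implied_expected_heterozygosity(h_obs, f_is):
    """Invert the definition: H_E = H_O / (1 - F_IS)."""
    if f_is >= 1.0:
        raise ValueError("F_IS must be below 1")
    return h_obs / (1.0 - f_is)


if __name__ == "__main__":
    # 1995 values quoted in the abstract: H_O = 0.69, F_IS = -0.063.
    h_exp_1995 = implied_expected_heterozygosity(0.69, -0.063)
    print(round(h_exp_1995, 3))
    print(round(inbreeding_coefficient(0.69, h_exp_1995), 3))
```

Under these assumptions the 1995 figures imply an expected heterozygosity of about 0.65, slightly below the observed 0.69, which is what a negative F_IS means.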


Subject(s)
Conservation of Natural Resources/methods; Genetic Variation; Genetics, Population; Inbreeding; Wolves/genetics; Animals; Founder Effect; Microsatellite Repeats/genetics; Pedigree; Population Density; Population Dynamics; Wyoming