Search | BVS Colombia Search Portal

1.

Whole-Genome Analyses Resolve the Phylogeny of Flightless Birds (Palaeognathae) in the Presence of an Empirical Anomaly Zone.

Cloutier, Alison; Sackton, Timothy B; Grayson, Phil; Clamp, Michele; Baker, Allan J; Edwards, Scott V.

Syst Biol ; 68(6): 937-955, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31135914

ABSTRACT

Palaeognathae represent one of the two basal lineages in modern birds, and comprise the volant (flighted) tinamous and the flightless ratites. Resolving palaeognath phylogenetic relationships has historically proved difficult, and short internal branches separating major palaeognath lineages in previous molecular phylogenies suggest that extensive incomplete lineage sorting (ILS) might have accompanied a rapid ancient divergence. Here, we investigate palaeognath relationships using genome-wide data sets of three types of noncoding nuclear markers, together totaling 20,850 loci and over 41 million base pairs of aligned sequence data. We recover a fully resolved topology placing rheas as the sister to kiwi and emu + cassowary that is congruent across marker types for two species tree methods (MP-EST and ASTRAL-II). This topology is corroborated by patterns of insertions for 4274 CR1 retroelements identified from multispecies whole-genome screening, and is robustly supported by phylogenomic subsampling analyses, with MP-EST demonstrating particularly consistent performance across subsampling replicates as compared to ASTRAL. In contrast, analyses of concatenated data supermatrices recover rheas as the sister to all other nonostrich palaeognaths, an alternative that lacks retroelement support and shows inconsistent behavior under subsampling approaches. While statistically supporting the species tree topology, conflicting patterns of retroelement insertions also occur and imply high amounts of ILS across short successive internal branches, consistent with observed patterns of gene tree heterogeneity. 
Coalescent simulations and topology tests indicate that the majority of observed topological incongruence among gene trees is consistent with coalescent variation rather than arising from gene tree estimation error alone, and estimated branch lengths for short successive internodes in the inferred species tree fall within the theoretical range encompassing the anomaly zone. Distributions of empirical gene trees confirm that the most common gene tree topology for each marker type differs from the species tree, signifying the existence of an empirical anomaly zone in palaeognaths.
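The anomaly-zone condition in this abstract can be checked numerically in its smallest form. This is an illustration, not the authors' pipeline. It assumes a four-taxon caterpillar species tree (((A,B),C),D), with the deeper internal branch x and the shallower internal branch y both measured in coalescent units. It uses the boundary a(x) derived by Degnan and Rosenberg (2006): whenever y < a(x), the symmetric gene tree ((A,B),(C,D)) is more probable than the gene tree matching the species tree. So that the boundary can be checked independently, the sketch also computes both gene tree probabilities directly under the multispecies coalescent. Function names are hypothetical.

```python
import math


def anomaly_boundary(x: float) -> float:
    """Boundary a(x) for the 4-taxon caterpillar species tree (((A,B),C),D).

    x: deeper internal branch (ancestor of A, B, C), coalescent units.
    An anomalous gene tree exists when the shallower branch y < a(x).
    """
    num = 3.0 * math.exp(2 * x) - 2.0
    den = 18.0 * (math.exp(3 * x) - math.exp(2 * x))
    return math.log(2.0 / 3.0 + num / den)


def gene_tree_probs(x: float, y: float) -> tuple[float, float]:
    """P(matching gene tree (((A,B),C),D)) and P(symmetric ((A,B),(C,D)))."""
    ex, e3x, ey = math.exp(-x), math.exp(-3 * x), math.exp(-y)
    # Three lineages entering branch x: chance of ending as 1 or as 2 lineages.
    g31 = 1 - 1.5 * ex + 0.5 * e3x
    g32 = 1.5 * (ex - e3x)
    # First term: A and B coalesce within y; second: they do not.
    p_match = (1 - ey) * (1 - 2 * ex / 3) + ey * (g31 / 3 + g32 / 9 + e3x / 18)
    p_sym = (1 - ey) * ex / 3 + ey * (g32 / 9 + e3x / 9)
    return p_match, p_sym


def in_anomaly_zone(x: float, y: float) -> bool:
    """True if branch lengths (x, y) fall inside the 4-taxon anomaly zone."""
    return y < anomaly_boundary(x)
```

For example, in_anomaly_zone(0.1, 0.01) returns True and in_anomaly_zone(1.0, 0.01) returns False. The most common gene tree departs from the species tree only when both successive internodes are short.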

Subject(s)

Genome/genetics , Palaeognathae/classification , Palaeognathae/genetics , Phylogeny , Animals , Genomics

2.

A high-resolution map of human evolutionary constraint using 29 mammals.

Lindblad-Toh, Kerstin; Garber, Manuel; Zuk, Or; Lin, Michael F; Parker, Brian J; Washietl, Stefan; Kheradpour, Pouya; Ernst, Jason; Jordan, Gregory; Mauceli, Evan; Ward, Lucas D; Lowe, Craig B; Holloway, Alisha K; Clamp, Michele; Gnerre, Sante; Alföldi, Jessica; Beal, Kathryn; Chang, Jean; Clawson, Hiram; Cuff, James; Di Palma, Federica; Fitzgerald, Stephen; Flicek, Paul; Guttman, Mitchell; Hubisz, Melissa J; Jaffe, David B; Jungreis, Irwin; Kent, W James; Kostka, Dennis; Lara, Marcia; Martins, Andre L; Massingham, Tim; Moltke, Ida; Raney, Brian J; Rasmussen, Matthew D; Robinson, Jim; Stark, Alexander; Vilella, Albert J; Wen, Jiayu; Xie, Xiaohui; Zody, Michael C; Baldwin, Jen; Bloom, Toby; Chin, Chee Whye; Heiman, Dave; Nicol, Robert; Nusbaum, Chad; Young, Sarah; Wilkinson, Jane; Worley, Kim C.

Nature ; 478(7370): 476-82, 2011 Oct 12.

Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

Subject(s)

Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA

3.

Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences.

Mikkelsen, Tarjei S; Wakefield, Matthew J; Aken, Bronwen; Amemiya, Chris T; Chang, Jean L; Duke, Shannon; Garber, Manuel; Gentles, Andrew J; Goodstadt, Leo; Heger, Andreas; Jurka, Jerzy; Kamal, Michael; Mauceli, Evan; Searle, Stephen M J; Sharpe, Ted; Baker, Michelle L; Batzer, Mark A; Benos, Panayiotis V; Belov, Katherine; Clamp, Michele; Cook, April; Cuff, James; Das, Radhika; Davidow, Lance; Deakin, Janine E; Fazzari, Melissa J; Glass, Jacob L; Grabherr, Manfred; Greally, John M; Gu, Wanjun; Hore, Timothy A; Huttley, Gavin A; Kleber, Michael; Jirtle, Randy L; Koina, Edda; Lee, Jeannie T; Mahony, Shaun; Marra, Marco A; Miller, Robert D; Nicholls, Robert D; Oda, Mayumi; Papenfuss, Anthony T; Parra, Zuly E; Pollock, David D; Ray, David A; Schein, Jacqueline E; Speed, Terence P; Thompson, Katherine; VandeBerg, John L; Wade, Claire M.

Nature ; 447(7141): 167-77, 2007 May 10.

Article in English | MEDLINE | ID: mdl-17495919

ABSTRACT

We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.

Subject(s)

Evolution, Molecular , Genome/genetics , Genomics , Opossums/genetics , Animals , Base Composition , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Protein Biosynthesis , Synteny/genetics , X Chromosome Inactivation/genetics

4.

Genome sequence, comparative analysis and haplotype structure of the domestic dog.

Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa.

Nature ; 438(7069): 803-19, 2005 Dec 08.

Article in English | MEDLINE | ID: mdl-16341006

ABSTRACT

Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

Subject(s)

Dogs/genetics , Evolution, Molecular , Genome/genetics , Genomics , Haplotypes/genetics , Animals , Conserved Sequence/genetics , Dog Diseases/genetics , Dogs/classification , Female , Humans , Hybridization, Genetic , Male , Mice , Mutagenesis/genetics , Polymorphism, Single Nucleotide/genetics , Rats , Short Interspersed Nucleotide Elements/genetics , Synteny/genetics

5.

Error, noise and bias in de novo transcriptome assemblies.

Freedman, Adam H; Clamp, Michele; Sackton, Timothy B.

Mol Ecol Resour ; 21(1): 18-29, 2021 Jan.

Article in English | MEDLINE | ID: mdl-32180366

ABSTRACT

De novo transcriptome assembly is a powerful tool, and has been widely used over the last decade for making evolutionary inferences. However, it relies on two implicit assumptions: that the assembled transcriptome is an unbiased representation of the underlying expressed transcriptome, and that expression estimates from the assembly are good, if noisy approximations of the relative abundance of expressed transcripts. Using publicly available data for model organisms, we demonstrate that, across assembly algorithms and data sets, these assumptions are consistently violated. Bias exists at the nucleotide level, with genotyping error rates ranging from 30% to 83%. As a result, diversity is underestimated in transcriptome assemblies, with consistent underestimation of heterozygosity in all but the most inbred samples. Even at the gene level, expression estimates show wide deviations from map-to-reference estimates, and positive bias at lower expression levels. Standard filtering of transcriptome assemblies improves the robustness of gene expression estimates but leads to the loss of a meaningful number of protein-coding genes, including many that are highly expressed. We demonstrate a computational method, length-rescaled CPM, to partly alleviate noise and bias in expression estimates. Researchers should consider ways to minimize the impact of bias in transcriptome assemblies.

Subject(s)

Bias , Gene Expression Profiling , Transcriptome , Algorithms

6.

Identifying novel constrained elements by exploiting biased substitution patterns.

Garber, Manuel; Guttman, Mitchell; Clamp, Michele; Zody, Michael C; Friedman, Nir; Xie, Xiaohui.

Bioinformatics ; 25(12): i54-62, 2009 Jun 15.

Article in English | MEDLINE | ID: mdl-19478016

ABSTRACT

MOTIVATION: Comparing the genomes from closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods only model conservation as a decrease in the rate of mutation and have ignored selection acting on the pattern of mutations. RESULTS: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we identify as much as 10.2% of these regions are under selection. AVAILABILITY: The algorithms are implemented in a Java software package, called SiPhy, freely available at http://www.broadinstitute.org/science/software/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
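The last component named in this abstract is a hidden Markov model that labels alignment columns as constrained or not. It can be illustrated with a generic two-state Viterbi decoder. This is a sketch, not SiPhy's code. It assumes that per-column log-likelihoods under a neutral model and a constrained model have already been computed; in SiPhy these would come from its substitution-bias model. The state-persistence probability `stay` is a hypothetical parameter.

```python
import math


def viterbi(emissions: list[tuple[float, float]], stay: float = 0.99) -> list[str]:
    """Most likely labeling of columns as 'N' (neutral) or 'C' (constrained).

    emissions[i] = (log P(column i | neutral), log P(column i | constrained)).
    stay: hypothetical probability of remaining in the same state per column.
    """
    if not emissions:
        return []
    labels = ("N", "C")
    log_stay, log_switch = math.log(stay), math.log(1 - stay)
    # Uniform initial distribution over the two states.
    score = [math.log(0.5) + emissions[0][s] for s in range(2)]
    back: list[list[int]] = []  # best predecessor state for each column and state
    for col in emissions[1:]:
        new_score, ptr = [], []
        for s in range(2):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in range(2)]
            best = max(range(2), key=lambda p: cands[p])
            new_score.append(cands[best] + col[s])
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace back from the highest-scoring final state.
    state = max(range(2), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return [labels[s] for s in reversed(path)]
```

Suppose five consecutive columns each favor the constrained state by 3 log units. The decoder labels that run "C". A single isolated column with the same preference stays "N", because the two state switches cost more than that one column gains. This smoothing is why an HMM is a natural choice for calling contiguous elements, rather than thresholding each column independently.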

Subject(s)

Algorithms , Genomics/methods , Sequence Alignment/methods , Base Sequence , Evolution, Molecular , Software

7.

Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Waterhouse, Andrew M; Procter, James B; Martin, David M A; Clamp, Michèle; Barton, Geoffrey J.

Bioinformatics ; 25(9): 1189-91, 2009 May 01.

Article in English | MEDLINE | ID: mdl-19151095

ABSTRACT

UNLABELLED: Jalview Version 2 is a system for interactive WYSIWYG editing, analysis and annotation of multiple sequence alignments. Core features include keyboard and mouse-based editing, multiple views and alignment overviews, and linked structure display with Jmol. Jalview 2 is available in two forms: a lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment, secondary structure prediction and the retrieval of alignments, sequences, annotation and structures from public databases and any DAS 1.53 compliant sequence or annotation server. AVAILABILITY: The Jalview 2 Desktop application and JalviewLite applet are made freely available under the GPL, and can be downloaded from www.jalview.org.

Subject(s)

Computational Biology/methods , Proteins/chemistry , Sequence Alignment/methods , Software , Databases, Protein , Sequence Analysis, Protein

8.

Distinguishing protein-coding and noncoding genes in the human genome.

Clamp, Michele; Fry, Ben; Kamal, Mike; Xie, Xiaohui; Cuff, James; Lin, Michael F; Kellis, Manolis; Lindblad-Toh, Kerstin; Lander, Eric S.

Proc Natl Acad Sci U S A ; 104(49): 19428-33, 2007 Dec 04.

Article in English | MEDLINE | ID: mdl-18040051

ABSTRACT

Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs, specifically their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to approximately 20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.
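The claim that many nonconserved ORFs are "random occurrences" rests partly on how easily open reading frames arise by chance. Here is a back-of-the-envelope sketch that assumes uniform, independent base composition (real transcripts are not uniform). Each codon is one of the three stop codons with probability 3/64, so a run of n codons avoids a stop with probability (61/64)^n. The ORF scanner below is illustrative only, not the method used in the paper. It looks at the three forward frames and ignores the reverse strand and ORFs that run off the end of the sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-initiated, stop-terminated ORF on the forward strand, in codons.

    The count includes the ATG but not the stop codon. Unterminated ORFs and
    the reverse strand are ignored (simplifying assumptions).
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None  # close it and look for the next ATG
    return best


def p_no_stop(n_codons: int) -> float:
    """Probability that n random codons (uniform bases) contain no stop codon."""
    return (61 / 64) ** n_codons
```

For example, p_no_stop(100) is roughly 0.008. That is small for any single start codon. A large transcript collection, however, contains a very large number of candidate start codons, so some long ORFs are expected by chance alone. Because real transcripts have non-uniform base and codon composition, these figures are only indicative.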

Subject(s)

Genetic Code , Genome, Human/genetics , Genomics , Open Reading Frames/genetics , Proteins/genetics , Animals , Base Sequence , DNA Transposable Elements/genetics , Dogs , Genes/genetics , Humans , Mice , Molecular Sequence Data , Pseudogenes/genetics , Sequence Analysis, DNA

9.

Convergent regulatory evolution and loss of flight in paleognathous birds.

Sackton, Timothy B; Grayson, Phil; Cloutier, Alison; Hu, Zhirui; Liu, Jun S; Wheeler, Nicole E; Gardner, Paul P; Clarke, Julia A; Baker, Allan J; Clamp, Michele; Edwards, Scott V.

Science ; 364(6435): 74-78, 2019 04 05.

Article in English | MEDLINE | ID: mdl-30948549

ABSTRACT

A core question in evolutionary biology is whether convergent phenotypic evolution is driven by convergent molecular changes in proteins or regulatory regions. We combined phylogenomic, developmental, and epigenomic analysis of 11 new genomes of paleognathous birds, including an extinct moa, to show that convergent evolution of regulatory regions, more so than protein-coding genes, is prevalent among developmental pathways associated with independent losses of flight. A Bayesian analysis of 284,001 conserved noncoding elements, 60,665 of which are corroborated as enhancers by open chromatin states during development, identified 2355 independent accelerations along lineages of flightless paleognaths, with functional consequences for driving gene expression in the developing forelimb. Our results suggest that the genomic landscape associated with morphological convergence in ratites has a substantial shared regulatory component.

Subject(s)

Biological Evolution , Epigenesis, Genetic , Evolution, Molecular , Flight, Animal , Palaeognathae/anatomy & histology , Palaeognathae/genetics , Animals , Bayes Theorem , Chromatin/metabolism , Conserved Sequence , Enhancer Elements, Genetic , Epigenomics , Exons/genetics , Extinction, Biological , Forelimb/anatomy & histology , Palaeognathae/physiology , Phenotype , Phylogeny

10.

Metatranscriptomics profile of the gill microbial community during Bathymodiolus azoricus aquarium acclimatization at atmospheric pressure.

Barros, Inês; Froufe, Hugo; Marnellos, George; Egas, Conceição; Delaney, Jennifer; Clamp, Michele; Santos, Ricardo Serrão; Bettencourt, Raul.

AIMS Microbiol ; 4(2): 240-260, 2018.

Article in English | MEDLINE | ID: mdl-31294213

ABSTRACT

BACKGROUND: The deep-sea mussels Bathymodiolus azoricus (Bivalvia: Mytilidae) are the dominant macrofauna subsisting at the hydrothermal vent site Menez Gwen on the Mid-Atlantic Ridge (MAR). Their adaptive success in such challenging environments is largely due to the symbiotic association of their gills with chemosynthetic bacteria. We examined the response of vent mussels as they adapt to sea-level environmental conditions, through an assessment of the relative abundance of host-symbiont related RNA transcripts, to better understand how the gill microbiome may drive host-symbiont interactions in vent mussels during hypothetical venting inactivity. RESULTS: The metatranscriptome of B. azoricus was sequenced from gill tissues sampled at different time points during a five-week acclimatization experiment, using next-generation sequencing. Illumina sequencing generated a total of 181,985,262 paired-end reads of 150 bp, with an average of 16,544,115 reads per sample. Metatranscriptome analysis confirmed that experimental acclimatization in aquaria accounted for global gill transcript variation. Additionally, the analysis of 16S and 18S rRNA sequence data allowed for a comprehensive characterization of host-symbiont interactions, which included the gradual loss of gill endosymbionts and signaling pathways, associated with stress responses and energy metabolism, under experimental acclimatization. Dominant active transcripts were assigned to the following KEGG categories: "Ribosome", "Oxidative phosphorylation" and "Chaperones and folding catalysts", suggesting specific metabolic responses to physiological adaptation to the aquarium environment. CONCLUSIONS: Gill metagenomic analyses highlighted shifts in microbial diversity and a clear pattern of varying mRNA transcript abundances and expression during acclimatization to aquarium conditions, which indicates changes in bacterial community activity.
This approach holds potential for the discovery of new host-symbiont associations, evidencing new functional transcripts and a clearer picture of methane metabolism during loss of endosymbionts. Towards the end of acclimatization, we observed trends in three major functional subsystems, as evidenced by an increase in transcripts related to genetic information processes; a decrease in chaperone and folding catalyst and oxidative phosphorylation transcripts; but no change in transcripts of gluconeogenesis and cofactors and vitamins.

11.

Three periods of regulatory innovation during vertebrate evolution.

Lowe, Craig B; Kellis, Manolis; Siepel, Adam; Raney, Brian J; Clamp, Michele; Salama, Sofie R; Kingsley, David M; Lindblad-Toh, Kerstin; Haussler, David.

Science ; 333(6045): 1019-24, 2011 Aug 19.

Article in English | MEDLINE | ID: mdl-21852499

ABSTRACT

The gain, loss, and modification of gene regulatory elements may underlie a substantial proportion of phenotypic changes on animal lineages. To investigate the gain of regulatory elements throughout vertebrate evolution, we identified genome-wide sets of putative regulatory regions for five vertebrates, including humans. These putative regulatory regions are conserved nonexonic elements (CNEEs), which are evolutionarily conserved yet do not overlap any coding or noncoding mature transcript. We then inferred the branch on which each CNEE came under selective constraint. Our analysis identified three extended periods in the evolution of gene regulatory elements. Early vertebrate evolution was characterized by regulatory gains near transcription factors and developmental genes, but this trend was replaced by innovations near extracellular signaling genes, and then innovations near posttranslational protein modifiers.
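The step "we then inferred the branch on which each CNEE came under selective constraint" can be sketched under a simple parsimony assumption. If an element is conserved in a set of species, assume constraint was gained once, on the branch leading into their most recent common ancestor, and never lost. The tree, node names and functions below are hypothetical illustrations; the species are loosely modeled on organisms listed in the subject headings. This is not the authors' data or method, and a real analysis also has to account for losses and alignment gaps.

```python
# Hypothetical species tree as a child -> parent map (internal node names are illustrative).
PARENT = {
    "human": "human-mouse", "mouse": "human-mouse",
    "human-mouse": "mammals", "cow": "mammals",
    "medaka": "fish", "stickleback": "fish",
    "mammals": "vertebrates", "fish": "vertebrates",
}


def path_to_root(node: str, parent: dict[str, str]) -> list[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def gain_node(species: set[str], parent: dict[str, str] = PARENT) -> str:
    """Most recent common ancestor of the species in which an element is conserved.

    Under the gain-once, never-lost assumption, constraint arose on the
    branch leading into this node.
    """
    paths = [path_to_root(s, parent) for s in sorted(species)]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from any one species, the first shared node is the MRCA.
    return next(node for node in paths[0] if node in shared)
```

For instance, an element conserved only in human and cow would be assigned to the branch leading to "mammals". One conserved in cow and medaka would date back to the "vertebrates" node.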

Subject(s)

Biological Evolution , Conserved Sequence , Evolution, Molecular , Regulatory Elements, Transcriptional , Regulatory Sequences, Nucleic Acid , Vertebrates/genetics , Animals , Cattle , DNA, Intergenic/genetics , Gene Expression Regulation , Genes, Developmental , Genome , Humans , Markov Chains , Mice , Oryzias/genetics , Phylogeny , Protein Processing, Post-Translational/genetics , Selection, Genetic , Sequence Alignment , Smegmamorpha/genetics , Transcription Factors/genetics

12.

Initial sequence and comparative analysis of the cat genome.

Pontius, Joan U; Mullikin, James C; Smith, Douglas R; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A; Agarwala, Richa; Narfström, Kristina; Murphy, William J; Giger, Urs; Roca, Alfred L; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E; Bourque, Guillaume; Tesler, Glenn; O'Brien, Stephen J.

Genome Res ; 17(11): 1675-89, 2007 Nov.

Article in English | MEDLINE | ID: mdl-17975172

ABSTRACT

The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing approximately 65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence.

Subject(s)

Cats/genetics , Genome , Genomics , Animals , Dogs , Humans , Mice , MicroRNAs , Microsatellite Repeats , Models, Genetic , Polymorphism, Single Nucleotide , Rats , Repetitive Sequences, Nucleic Acid

13.

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome.

Margulies, Elliott H; Cooper, Gregory M; Asimenos, George; Thomas, Daryl J; Dewey, Colin N; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B; Bickel, Peter; Holmes, Ian; Mullikin, James C; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A; Rosenbloom, Kate R; Kent, W James; Bouffard, Gerard G; Guan, Xiaobin; Hansen, Nancy F; Idol, Jacquelyn R; Maduro, Valerie V B; Maskeri, Baishali; McDowell, Jennifer C; Park, Morgan; Thomas, Pamela J; Young, Alice C; Blakesley, Robert W; Muzny, Donna M; Sodergren, Erica; Wheeler, David A; Worley, Kim C; Jiang, Huaiyang; Weinstock, George M; Gibbs, Richard A; Graves, Tina; Fulton, Robert; Mardis, Elaine R; Wilson, Richard K; Clamp, Michele; Cuff, James.

Genome Res ; 17(6): 760-74, 2007 Jun.

Article in English | MEDLINE | ID: mdl-17567995

ABSTRACT

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.

Subject(s)

Evolution, Molecular , Genome, Human , Mammals/genetics , Open Reading Frames , Phylogeny , Sequence Alignment , Animals , Human Genome Project , Humans

14.

An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing.

Margulies, Elliott H; Vinson, Jade P; Miller, Webb; Jaffe, David B; Lindblad-Toh, Kerstin; Chang, Jean L; Green, Eric D; Lander, Eric S; Mullikin, James C; Clamp, Michele.

Proc Natl Acad Sci U S A ; 102(13): 4795-800, 2005 Mar 29.

Article in English | MEDLINE | ID: mdl-15778292

ABSTRACT

With the recent completion of a high-quality sequence of the human genome, the challenge is now to understand the functional elements that it encodes. Comparative genomic analysis offers a powerful approach for finding such elements by identifying sequences that have been highly conserved during evolution. Here, we propose an initial strategy for detecting such regions by generating low-redundancy sequence from a collection of 16 eutherian mammals, beyond the 7 for which genome sequence data are already available. We show that such sequence can be accurately aligned to the human genome and used to identify most of the highly conserved regions. Although not a long-term substitute for generating high-quality genomic sequences from many mammalian species, this strategy represents a practical initial approach for rapidly annotating the most evolutionarily conserved sequences in the human genome, providing a key resource for the systematic study of human genome function.

Subject(s)

Conserved Sequence/genetics , Genome, Human , Genomics/methods , Mammals/genetics , Sequence Analysis, DNA/methods , Animals , Base Sequence , Computational Biology , Humans , Phylogeny , Sequence Alignment

15.

Biological database design and implementation.

Birney, Ewan; Clamp, Michele.

Brief Bioinform ; 5(1): 31-8, 2004 Mar.

Article in English | MEDLINE | ID: mdl-15153304

ABSTRACT

We present our experience of building biological databases. Such databases have most aspects in common with other complex databases in other fields. We do not believe that biological data are that different from complex data in other fields. Our experience has led us to emphasise simplicity and conservative technology choices when building these databases. This is a short paper of advice that we hope is useful to people designing their own biological database.

Subject(s)

Database Management Systems , Databases, Factual , Software Design , Information Storage and Retrieval , User-Computer Interface

16.

Databases and tools for browsing genomes.

Birney, Ewan; Clamp, Michele; Hubbard, Tim.

Annu Rev Genomics Hum Genet ; 3: 293-310, 2002.

Article in English | MEDLINE | ID: mdl-12194990

ABSTRACT

To maximize the value of genome sequences they need to be integrated with other types of biological data and with each other. The entire collection of data then needs to be made available in a way that is easy to view and mine for complex relationships. The recently determined vertebrate genome sequences of human and mouse are so large that building the infrastructure to manage these datasets is a major challenge. This article reviews the database systems and tools for analysis that have so far been developed to address this.

Subject(s)

Databases as Topic , Genetics , Genome , Animals , Humans , Internet , Software

17.

GeneWise and Genomewise.

Birney, Ewan; Clamp, Michele; Durbin, Richard.

Genome Res ; 14(5): 988-95, 2004 May.

Article in English | MEDLINE | ID: mdl-15123596

ABSTRACT

We present two algorithms in this paper: GeneWise, which predicts gene structure using similar protein sequences, and Genomewise, which provides a gene structure final parse across cDNA- and EST-defined spliced structure. Both algorithms are heavily used by the Ensembl annotation system. The GeneWise algorithm was developed from a principled combination of hidden Markov models (HMMs). Both algorithms are highly accurate and can provide both accurate and complete gene structures when used with the correct evidence.

Subject(s)

Software , 3' Flanking Region , 5' Flanking Region , Algorithms , Computational Biology/methods , DNA, Complementary , Models, Theoretical , Predictive Value of Tests , Research Design
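The abstract states that GeneWise was developed from a combination of hidden Markov models. Its actual state architecture is not reproduced here. As a generic illustration of the HMM decoding that such gene-structure predictors build on, the sketch below runs the standard Viterbi algorithm on a toy two-state model. The "E" (exon-like) and "I" (intron-like) states and all probabilities are invented for illustration and are not GeneWise parameters.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for seq, computed in log space."""
    # score[s]: best log-probability of any path ending in state s at the current position
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}
    backpointers = []
    for symbol in seq[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # best predecessor state for s at this position
            prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # trace back from the best final state
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Toy model (hypothetical): "E" favours G/C, "I" favours A/T, and switching states is costly.
STATES = ("E", "I")
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

print(viterbi("GGGGGGAAAAAA", STATES, START, TRANS, EMIT))  # EEEEEEIIIIII
```

Working in log space avoids numerical underflow when long genomic sequences multiply many small probabilities together.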

18.

ESTGenes: alternative splicing from ESTs in Ensembl.

Eyras, Eduardo; Caccamo, Mario; Curwen, Val; Clamp, Michele.

Genome Res ; 14(5): 976-87, 2004 May.

Article in English | MEDLINE | ID: mdl-15123595

ABSTRACT

We describe a novel algorithm for deriving the minimal set of nonredundant transcripts compatible with the splicing structure of a set of ESTs mapped on a genome. Sets of ESTs with compatible splicing are represented by a special type of graph. We describe the algorithms for building the graphs and for deriving the minimal set of transcripts from the graphs that are compatible with the evidence. These algorithms are part of the Ensembl automatic gene annotation system, and its results, using ESTs, are provided at www.ensembl.org as ESTgenes for the mosquito, Caenorhabditis briggsae, C. elegans, zebrafish, human, mouse, and rat genomes. Here we also report on the results of this method applied to the human and mouse genomes.

Subject(s)

Alternative Splicing/genetics , Expressed Sequence Tags , Software , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/genetics , Computational Biology , Culicidae/genetics , DNA, Helminth/genetics , Genes , Genes, Helminth , Genes, Insect , Humans , Mice , Predictive Value of Tests , Rats , Reproducibility of Results , Transcription, Genetic , Zebrafish/genetics
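The notion of ESTs with "compatible splicing" can be sketched in Python under a deliberately simplified, hypothetical rule. Each EST is a sorted list of (start, end) exon coordinates. Two ESTs count as compatible when they overlap and have identical intron chains across the overlap. An EST is dropped as redundant when a compatible EST spans it. This is an illustration only, not the graph-based ESTGenes algorithm, which derives a minimal set of transcripts from graphs built over compatible ESTs.

```python
Exon = tuple[int, int]  # hypothetical (start, end) genomic coordinates, start < end


def introns(exons: list[Exon]) -> list[tuple[int, int]]:
    """Introns implied by consecutive exons of one spliced EST, sorted by position."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def compatible(a: list[Exon], b: list[Exon]) -> bool:
    """Simplified rule: ESTs overlap and share the same introns across the overlap."""
    lo, hi = max(a[0][0], b[0][0]), min(a[-1][1], b[-1][1])
    if lo >= hi:
        return False  # no genomic overlap, nothing to compare

    def chain(x: list[Exon]) -> list[tuple[int, int]]:
        # introns overlapping the shared region; an exon of one EST lying in an
        # intron of the other therefore makes the two chains differ
        return [iv for iv in introns(x) if iv[0] < hi and iv[1] > lo]

    return chain(a) == chain(b)


def subsumes(a: list[Exon], b: list[Exon]) -> bool:
    """True if b lies within a's genomic span and is compatible with it."""
    return a[0][0] <= b[0][0] and b[-1][1] <= a[-1][1] and compatible(a, b)


def remove_redundant(ests: list[list[Exon]]) -> list[list[Exon]]:
    """Drop every EST subsumed by another; of identical ESTs keep the first."""
    kept = []
    for i, e in enumerate(ests):
        redundant = any(
            j != i and subsumes(o, e) and (not subsumes(e, o) or j < i)
            for j, o in enumerate(ests)
        )
        if not redundant:
            kept.append(e)
    return kept


A = [(0, 100), (200, 300)]
B = [(50, 100), (200, 250)]  # same intron as A, shorter ends: redundant
C = [(50, 90), (200, 250)]   # different splice site: kept
print(remove_redundant([A, B, C, list(A)]))
```

With these inputs the sketch keeps A and C, because B is fully explained by A and the second copy of A is a duplicate.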

19.

The Jalview Java alignment editor.

Clamp, Michele; Cuff, James; Searle, Stephen M; Barton, Geoffrey J.

Bioinformatics ; 20(3): 426-7, 2004 Feb 12.

Article in English | MEDLINE | ID: mdl-14960472

ABSTRACT

Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is known that automatic multiple sequence alignments can often be improved by manual editing. Therefore, tools are needed to view and edit multiple sequence alignments. Due to growth in the sequence databases, multiple sequence alignments can often be large and difficult to view efficiently. The Jalview Java alignment editor is presented here, which enables fast viewing and editing of large multiple sequence alignments.

Subject(s)

Documentation , Hypermedia , Information Storage and Retrieval/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , User-Computer Interface , Algorithms , Database Management Systems , Word Processing

20.

The Ensembl core software libraries.

Stabenau, Arne; McVicker, Graham; Melsopp, Craig; Proctor, Glenn; Clamp, Michele; Birney, Ewan.

Genome Res ; 14(5): 929-33, 2004 May.

Article in English | MEDLINE | ID: mdl-15123588

ABSTRACT

Systems for managing genomic data must store a vast quantity of information. Ensembl stores these data in several MySQL databases. The core software libraries provide a practical and effective means for programmers to access these data. By encapsulating the underlying database structure, the libraries present end users with a simple, abstract interface to a complex data model. Programs that use the libraries rather than SQL to access the data are unaffected by most schema changes. The architecture of the core software libraries, the schema, and the factors influencing their design are described. All code and data are freely available.

Subject(s)

Computational Biology , Software , Animals , Databases, Genetic , Humans , Software Design
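The encapsulation described in this abstract can be illustrated with a small adaptor-style sketch in Python using the standard-library sqlite3 module. Callers ask an adaptor for domain objects and never write SQL themselves. The `Gene` class, the `GeneAdaptor` class, the `fetch_by_stable_id` method, and the single-table layout are all hypothetical. They are not the Ensembl API or schema.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    """Plain object handed to callers; it carries no SQL or schema details."""
    stable_id: str
    seq_region: str
    start: int
    end: int


class GeneAdaptor:
    """Hypothetical adaptor: the only place that knows the table layout."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def fetch_by_stable_id(self, stable_id: str) -> Optional[Gene]:
        row = self._conn.execute(
            "SELECT stable_id, seq_region, seq_start, seq_end FROM gene WHERE stable_id = ?",
            (stable_id,),
        ).fetchone()
        return Gene(*row) if row else None


# Usage with a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (stable_id TEXT, seq_region TEXT, seq_start INTEGER, seq_end INTEGER)")
conn.execute("INSERT INTO gene VALUES ('GENE0001', 'chr1', 1000, 5000)")
adaptor = GeneAdaptor(conn)
print(adaptor.fetch_by_stable_id("GENE0001"))
# Gene(stable_id='GENE0001', seq_region='chr1', start=1000, end=5000)
```

If the table were renamed or its columns reorganized, only the SQL inside `GeneAdaptor` would need to change. Code that calls `fetch_by_stable_id` would keep working, which is the kind of insulation from schema changes the abstract describes.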
