1.
Mol Ecol Resour ; 21(1): 18-29, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32180366

ABSTRACT

De novo transcriptome assembly is a powerful tool, and has been widely used over the last decade for making evolutionary inferences. However, it relies on two implicit assumptions: that the assembled transcriptome is an unbiased representation of the underlying expressed transcriptome, and that expression estimates from the assembly are good, if noisy approximations of the relative abundance of expressed transcripts. Using publicly available data for model organisms, we demonstrate that, across assembly algorithms and data sets, these assumptions are consistently violated. Bias exists at the nucleotide level, with genotyping error rates ranging from 30% to 83%. As a result, diversity is underestimated in transcriptome assemblies, with consistent underestimation of heterozygosity in all but the most inbred samples. Even at the gene level, expression estimates show wide deviations from map-to-reference estimates, and positive bias at lower expression levels. Standard filtering of transcriptome assemblies improves the robustness of gene expression estimates but leads to the loss of a meaningful number of protein-coding genes, including many that are highly expressed. We demonstrate a computational method, length-rescaled CPM, to partly alleviate noise and bias in expression estimates. Researchers should consider ways to minimize the impact of bias in transcriptome assemblies.
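
For readers unfamiliar with the unit, a minimal sketch of ordinary counts-per-million (CPM) normalization follows. It is not the length-rescaled variant the abstract introduces, whose definition is not given here; the function name and the read counts are hypothetical and serve only to illustrate the baseline quantity.

```python
def counts_per_million(counts):
    """Scale raw read counts so that one library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {gene: 1_000_000 * n / total for gene, n in counts.items()}


# Hypothetical read counts for three genes in a single library.
raw = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(raw)
```

Because CPM only corrects for library size, any further correction for transcript length or assembly bias would have to be layered on top of it.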


Subjects
Bias , Gene Expression Profiling , Transcriptome , Algorithms
2.
Syst Biol ; 68(6): 937-955, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31135914

ABSTRACT

Palaeognathae represent one of the two basal lineages in modern birds, and comprise the volant (flighted) tinamous and the flightless ratites. Resolving palaeognath phylogenetic relationships has historically proved difficult, and short internal branches separating major palaeognath lineages in previous molecular phylogenies suggest that extensive incomplete lineage sorting (ILS) might have accompanied a rapid ancient divergence. Here, we investigate palaeognath relationships using genome-wide data sets of three types of noncoding nuclear markers, together totaling 20,850 loci and over 41 million base pairs of aligned sequence data. We recover a fully resolved topology placing rheas as the sister to kiwi and emu + cassowary that is congruent across marker types for two species tree methods (MP-EST and ASTRAL-II). This topology is corroborated by patterns of insertions for 4274 CR1 retroelements identified from multispecies whole-genome screening, and is robustly supported by phylogenomic subsampling analyses, with MP-EST demonstrating particularly consistent performance across subsampling replicates as compared to ASTRAL. In contrast, analyses of concatenated data supermatrices recover rheas as the sister to all other nonostrich palaeognaths, an alternative that lacks retroelement support and shows inconsistent behavior under subsampling approaches. While statistically supporting the species tree topology, conflicting patterns of retroelement insertions also occur and imply high amounts of ILS across short successive internal branches, consistent with observed patterns of gene tree heterogeneity. 
Coalescent simulations and topology tests indicate that the majority of observed topological incongruence among gene trees is consistent with coalescent variation rather than arising from gene tree estimation error alone, and estimated branch lengths for short successive internodes in the inferred species tree fall within the theoretical range encompassing the anomaly zone. Distributions of empirical gene trees confirm that the most common gene tree topology for each marker type differs from the species tree, signifying the existence of an empirical anomaly zone in palaeognaths.


Subjects
Genome/genetics , Palaeognathae/classification , Palaeognathae/genetics , Phylogeny , Animals , Genomics
3.
Science ; 364(6435): 74-78, 2019 04 05.
Article in English | MEDLINE | ID: mdl-30948549

ABSTRACT

A core question in evolutionary biology is whether convergent phenotypic evolution is driven by convergent molecular changes in proteins or regulatory regions. We combined phylogenomic, developmental, and epigenomic analysis of 11 new genomes of paleognathous birds, including an extinct moa, to show that convergent evolution of regulatory regions, more so than protein-coding genes, is prevalent among developmental pathways associated with independent losses of flight. A Bayesian analysis of 284,001 conserved noncoding elements, 60,665 of which are corroborated as enhancers by open chromatin states during development, identified 2355 independent accelerations along lineages of flightless paleognaths, with functional consequences for driving gene expression in the developing forelimb. Our results suggest that the genomic landscape associated with morphological convergence in ratites has a substantial shared regulatory component.


Subjects
Biological Evolution , Epigenesis, Genetic , Evolution, Molecular , Flight, Animal , Palaeognathae/anatomy & histology , Palaeognathae/genetics , Animals , Bayes Theorem , Chromatin/metabolism , Conserved Sequence , Enhancer Elements, Genetic , Epigenomics , Exons/genetics , Extinction, Biological , Forelimb/anatomy & histology , Palaeognathae/physiology , Phenotype , Phylogeny
4.
AIMS Microbiol ; 4(2): 240-260, 2018.
Article in English | MEDLINE | ID: mdl-31294213

ABSTRACT

BACKGROUND: The deep-sea mussels Bathymodiolus azoricus (Bivalvia: Mytilidae) are the dominant macrofauna subsisting at the hydrothermal vents site Menez Gwen in the Mid-Atlantic Ridge (MAR). Their adaptive success in such challenging environments is largely due to their gill symbiotic association with chemosynthetic bacteria. We examined the response of vent mussels as they adapt to sea-level environmental conditions, through an assessment of the relative abundance of host-symbiont related RNA transcripts to better understand how the gill microbiome may drive host-symbiont interactions in vent mussels during hypothetical venting inactivity. RESULTS: The metatranscriptome of B. azoricus was sequenced from gill tissues sampled at different time-points during a five-week acclimatization experiment, using Next-Generation-Sequencing. After Illumina sequencing, a total of 181,985,262 paired-end reads of 150 bp were generated with an average of 16,544,115 reads per sample. Metatranscriptome analysis confirmed that experimental acclimatization in aquaria accounted for global gill transcript variation. Additionally, the analysis of 16S and 18S rRNA sequence data allowed for a comprehensive characterization of host-symbiont interactions, which included the gradual loss of gill endosymbionts and signaling pathways, associated with stress responses and energy metabolism, under experimental acclimatization. Dominant active transcripts were assigned to the following KEGG categories: "Ribosome", "Oxidative phosphorylation" and "Chaperones and folding catalysts" suggesting specific metabolic responses to physiological adaptations in aquarium environment. CONCLUSIONS: Gill metagenomics analyses highlighted microbial diversity shifts and a clear pattern of varying mRNA transcript abundances and expression during acclimatization to aquarium conditions which indicate a change in bacterial community activity. 
This approach holds potential for the discovery of new host-symbiont associations, evidencing new functional transcripts and a clearer picture of methane metabolism during loss of endosymbionts. Towards the end of acclimatization, we observed trends in three major functional subsystems, as evidenced by an increment of transcripts related to genetic information processes; the decrease of chaperone and folding catalysts and oxidative phosphorylation transcripts; but no change in transcripts of gluconeogenesis and co-factors-vitamins.

5.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subjects
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
6.
Science ; 333(6045): 1019-24, 2011 Aug 19.
Article in English | MEDLINE | ID: mdl-21852499

ABSTRACT

The gain, loss, and modification of gene regulatory elements may underlie a substantial proportion of phenotypic changes on animal lineages. To investigate the gain of regulatory elements throughout vertebrate evolution, we identified genome-wide sets of putative regulatory regions for five vertebrates, including humans. These putative regulatory regions are conserved nonexonic elements (CNEEs), which are evolutionarily conserved yet do not overlap any coding or noncoding mature transcript. We then inferred the branch on which each CNEE came under selective constraint. Our analysis identified three extended periods in the evolution of gene regulatory elements. Early vertebrate evolution was characterized by regulatory gains near transcription factors and developmental genes, but this trend was replaced by innovations near extracellular signaling genes, and then innovations near posttranslational protein modifiers.


Subjects
Biological Evolution , Conserved Sequence , Evolution, Molecular , Regulatory Elements, Transcriptional , Regulatory Sequences, Nucleic Acid , Vertebrates/genetics , Animals , Cattle , DNA, Intergenic/genetics , Gene Expression Regulation , Genes, Developmental , Genome , Humans , Markov Chains , Mice , Oryzias/genetics , Phylogeny , Protein Processing, Post-Translational/genetics , Selection, Genetic , Sequence Alignment , Smegmamorpha/genetics , Transcription Factors/genetics
7.
Bioinformatics ; 25(12): i54-62, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19478016

ABSTRACT

MOTIVATION: Comparing the genomes from closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods only model conservation as a decrease in the rate of mutation and have ignored selection acting on the pattern of mutations. RESULTS: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we identify as much as 10.2% of these regions are under selection. AVAILABILITY: The algorithms are implemented in a Java software package, called SiPhy, freely available at http://www.broadinstitute.org/science/software/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Genomics/methods , Sequence Alignment/methods , Base Sequence , Evolution, Molecular , Software
8.
Bioinformatics ; 25(9): 1189-91, 2009 May 01.
Article in English | MEDLINE | ID: mdl-19151095

ABSTRACT

UNLABELLED: Jalview Version 2 is a system for interactive WYSIWYG editing, analysis and annotation of multiple sequence alignments. Core features include keyboard and mouse-based editing, multiple views and alignment overviews, and linked structure display with Jmol. Jalview 2 is available in two forms: a lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment, secondary structure prediction and the retrieval of alignments, sequences, annotation and structures from public databases and any DAS 1.53 compliant sequence or annotation server. AVAILABILITY: The Jalview 2 Desktop application and JalviewLite applet are made freely available under the GPL, and can be downloaded from www.jalview.org.


Subjects
Computational Biology/methods , Proteins/chemistry , Sequence Alignment/methods , Software , Databases, Protein , Sequence Analysis, Protein
9.
Genome Res ; 17(11): 1675-89, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17975172

ABSTRACT

The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing approximately 65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence.


Subjects
Cats/genetics , Genome , Genomics , Animals , Dogs , Humans , Mice , MicroRNAs , Microsatellite Repeats , Models, Genetic , Polymorphism, Single Nucleotide , Rats , Repetitive Sequences, Nucleic Acid
10.
Proc Natl Acad Sci U S A ; 104(49): 19428-33, 2007 Dec 04.
Article in English | MEDLINE | ID: mdl-18040051

ABSTRACT

Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs-specifically, their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to approximately 20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.


Subjects
Genetic Code , Genome, Human/genetics , Genomics , Open Reading Frames/genetics , Proteins/genetics , Animals , Base Sequence , DNA Transposable Elements/genetics , Dogs , Genes/genetics , Humans , Mice , Molecular Sequence Data , Pseudogenes/genetics , Sequence Analysis, DNA
11.
Genome Res ; 17(6): 760-74, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567995

ABSTRACT

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.


Subjects
Evolution, Molecular , Genome, Human , Mammals/genetics , Open Reading Frames , Phylogeny , Sequence Alignment , Animals , Human Genome Project , Humans
12.
Nature ; 447(7141): 167-77, 2007 May 10.
Article in English | MEDLINE | ID: mdl-17495919

ABSTRACT

We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.


Subjects
Evolution, Molecular , Genome/genetics , Genomics , Opossums/genetics , Animals , Base Composition , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Protein Biosynthesis , Synteny/genetics , X Chromosome Inactivation/genetics
13.
Nature ; 438(7069): 803-19, 2005 Dec 08.
Article in English | MEDLINE | ID: mdl-16341006

ABSTRACT

Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.


Subjects
Dogs/genetics , Evolution, Molecular , Genome/genetics , Genomics , Haplotypes/genetics , Animals , Conserved Sequence/genetics , Dog Diseases/genetics , Dogs/classification , Female , Humans , Hybridization, Genetic , Male , Mice , Mutagenesis/genetics , Polymorphism, Single Nucleotide/genetics , Rats , Short Interspersed Nucleotide Elements/genetics , Synteny/genetics
14.
Proc Natl Acad Sci U S A ; 102(13): 4795-800, 2005 Mar 29.
Article in English | MEDLINE | ID: mdl-15778292

ABSTRACT

With the recent completion of a high-quality sequence of the human genome, the challenge is now to understand the functional elements that it encodes. Comparative genomic analysis offers a powerful approach for finding such elements by identifying sequences that have been highly conserved during evolution. Here, we propose an initial strategy for detecting such regions by generating low-redundancy sequence from a collection of 16 eutherian mammals, beyond the 7 for which genome sequence data are already available. We show that such sequence can be accurately aligned to the human genome and used to identify most of the highly conserved regions. Although not a long-term substitute for generating high-quality genomic sequences from many mammalian species, this strategy represents a practical initial approach for rapidly annotating the most evolutionarily conserved sequences in the human genome, providing a key resource for the systematic study of human genome function.


Subjects
Conserved Sequence/genetics , Genome, Human , Genomics/methods , Mammals/genetics , Sequence Analysis, DNA/methods , Animals , Base Sequence , Computational Biology , Humans , Phylogeny , Sequence Alignment
15.
Genome Res ; 14(5): 929-33, 2004 May.
Article in English | MEDLINE | ID: mdl-15123588

ABSTRACT

Systems for managing genomic data must store a vast quantity of information. Ensembl stores these data in several MySQL databases. The core software libraries provide a practical and effective means for programmers to access these data. By encapsulating the underlying database structure, the libraries present end users with a simple, abstract interface to a complex data model. Programs that use the libraries rather than SQL to access the data are unaffected by most schema changes. The architecture of the core software libraries, the schema, and the factors influencing their design are described. All code and data are freely available.


Subjects
Computational Biology , Software , Animals , Databases, Genetic , Humans , Software Design
16.
Genome Res ; 14(5): 934-41, 2004 May.
Article in English | MEDLINE | ID: mdl-15123589

ABSTRACT

The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules ("Runnables" and "RunnableDBs") which are 'wrappers' for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the "RuleManager") which allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation which were necessary to enable the pipeline to scale to whole-genome analysis.


Subjects
Computational Biology/methods , Base Sequence/genetics , DNA/genetics , Databases, Genetic/standards , Programming Languages , Proteins/classification , Software , Software Design
17.
Genome Res ; 14(5): 942-50, 2004 May.
Article in English | MEDLINE | ID: mdl-15123590

ABSTRACT

As more genomes are sequenced, there is an increasing need for automated first-pass annotation which allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, and EST sequences. The gene-building system rests on top of the core Ensembl (MySQL) database schema and Perl Application Programming Interface (API), and the data generated are accessible through the Ensembl genome browser (http://www.ensembl.org). To date, the Ensembl predicted gene sets are available for the A. gambiae, C. briggsae, zebrafish, mouse, rat, and human genomes and have been heavily relied upon in the publication of the human, mouse, rat, and A. gambiae genome sequence analysis. Here we describe in detail the gene-building system and the algorithms involved. All code and data are freely available from http://www.ensembl.org.


Subjects
Automation , Computational Biology/methods , Genes/physiology , Animals , Anopheles/genetics , Caenorhabditis/genetics , DNA/genetics , DNA, Helminth/genetics , Expressed Sequence Tags , Gene Dosage , Genes, Helminth/physiology , Genes, Insect/physiology , Genome , Genome, Human , Helminth Proteins/genetics , Humans , Insect Proteins/genetics , Mice , Predictive Value of Tests , Proteins/genetics , Pseudogenes/genetics , Rats , Sequence Alignment/methods , Sequence Homology, Amino Acid , Software , Tandem Repeat Sequences/genetics , Untranslated Regions/genetics
18.
Genome Res ; 14(5): 963-70, 2004 May.
Article in English | MEDLINE | ID: mdl-15123593

ABSTRACT

With the completion of the human genome sequence and genome sequence available for other vertebrate genomes, the task of manual annotation at the large genome scale has become a priority. Possibly even more important, is the requirement to curate and improve this annotation in the light of future data. For this to be possible, there is a need for tools to access and manage the annotation. Ensembl provides an excellent means for storing gene structures, genome features, and sequence, but it does not support the extra textual data necessary for manual annotation. We have extended Ensembl to create the Otter manual annotation system. This comprises a relational database schema for storing the manual annotation data, an application-programming interface (API) to access it, an extensible markup language (XML) format to allow transfer of the data, and a server to allow multiuser/multimachine access to the data. We have also written a data-adaptor plugin for the Apollo Browser/Editor to enable it to utilize an Otter server. The otter database is currently used by the Vertebrate Genome Annotation (VEGA) site (http://vega.sanger.ac.uk), which provides access to manually curated human chromosomes. Support is also being developed for using the AceDB annotation editor, FMap, via a perl wrapper called Lace. The Human and Vertebrate Annotation (HAVANA) group annotators at the Sanger center are using this to annotate human chromosomes 1 and 20.


Subjects
Software , Computational Biology/methods , Databases, Genetic , Genes/physiology , Genome, Human , Humans , Online Systems
19.
Genome Res ; 14(5): 976-87, 2004 May.
Article in English | MEDLINE | ID: mdl-15123595

ABSTRACT

We describe a novel algorithm for deriving the minimal set of nonredundant transcripts compatible with the splicing structure of a set of ESTs mapped on a genome. Sets of ESTs with compatible splicing are represented by a special type of graph. We describe the algorithms for building the graphs and for deriving the minimal set of transcripts from the graphs that are compatible with the evidence. These algorithms are part of the Ensembl automatic gene annotation system, and its results, using ESTs, are provided at www.ensembl.org as ESTgenes for the mosquito, Caenorhabditis briggsae, C. elegans, zebrafish, human, mouse, and rat genomes. Here we also report on the results of this method applied to the human and mouse genomes.


Subjects
Alternative Splicing/genetics , Expressed Sequence Tags , Software , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/genetics , Computational Biology , Culicidae/genetics , DNA, Helminth/genetics , Genes , Genes, Helminth , Genes, Insect , Humans , Mice , Predictive Value of Tests , Rats , Reproducibility of Results , Transcription, Genetic , Zebrafish/genetics
20.
Genome Res ; 14(5): 988-95, 2004 May.
Article in English | MEDLINE | ID: mdl-15123596

ABSTRACT

We present two algorithms in this paper: GeneWise, which predicts gene structure using similar protein sequences, and Genomewise, which provides a gene structure final parse across cDNA- and EST-defined spliced structure. Both algorithms are heavily used by the Ensembl annotation system. The GeneWise algorithm was developed from a principled combination of hidden Markov models (HMMs). Both algorithms are highly accurate and can provide both accurate and complete gene structures when used with the correct evidence.


Subjects
Software , 3' Flanking Region , 5' Flanking Region , Algorithms , Computational Biology/methods , DNA, Complementary , Models, Theoretical , Predictive Value of Tests , Research Design