Results 1 - 14 of 14
1.
Nucleic Acids Res ; 52(D1): D891-D899, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953337

ABSTRACT

Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.


Subjects
Databases, Genetic , Genomics , Animals , Genome , Molecular Sequence Annotation , Phylogeny , Software , Humans
2.
bioRxiv ; 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37546854

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

3.
Nucleic Acids Res ; 51(D1): D942-D949, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36420896

ABSTRACT

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subjects
Computational Biology , Genome, Human , Humans , Animals , Mice , Molecular Sequence Annotation , Computational Biology/methods , Genome, Human/genetics , Transcriptome/genetics , Gene Expression Profiling , Databases, Genetic
4.
Nucleic Acids Res ; 51(D1): D933-D941, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36318249

ABSTRACT

Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of high-quality reference genomes through global biodiversity initiatives. In parallel, there have been major advances towards pangenome representations of higher species, where many alternative genome assemblies representing different breeds, cultivars, strains and haplotypes are now available. In order to support these efforts and accelerate downstream research, it is our goal at Ensembl to create high-quality annotations, tools and services for species across the tree of life. Here, we report our resources for popular reference genomes, the dramatic growth of our annotations (including haplotypes from the first human pangenome graphs), updates to the Ensembl Variant Effect Predictor (VEP), interactive protein structure predictions from AlphaFold DB, and the beta release of our new website.


Subjects
Databases, Genetic , Software , Animals , Humans , Molecular Sequence Annotation , Genomics , Genome
5.
Nucleic Acids Res ; 50(D1): D988-D995, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34791404

ABSTRACT

Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. We have a range of specific software tools for specific tasks, such as the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.


Subjects
Databases, Genetic , Genome/genetics , Molecular Sequence Annotation , Software , Animals , Computational Biology/classification , Humans
6.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subjects
COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics
7.
BMC Genomics ; 21(1): 196, 2020 Mar 03.
Article in English | MEDLINE | ID: mdl-32126975

ABSTRACT

BACKGROUND: Olfactory receptor (OR) genes are the largest multi-gene family in the mammalian genome, with 874 loci in human and 1483 in mouse (including pseudogenes). The expansion of the OR gene repertoire has occurred through numerous duplication events followed by diversification, resulting in a large number of highly similar paralogous genes. These characteristics have made the annotation of the complete OR gene repertoire a complex task. Most OR genes have been predicted in silico and are typically annotated as intronless coding sequences. RESULTS: Here we have developed an expert curation pipeline to analyse and annotate every OR gene in the human and mouse reference genomes. By combining evidence from structural features, evolutionary conservation and experimental data, we have unified the annotation of these gene families, and have systematically determined the protein-coding potential of each locus. We have defined the non-coding regions of many OR genes, enabling us to generate full-length transcript models. We found that 13 human and 41 mouse OR loci have coding sequences that are split across two exons. These split OR genes are conserved across mammals, and are expressed at the same level as protein-coding OR genes with an intronless coding region. Our findings challenge the long-standing and widespread notion that the coding region of a vertebrate OR gene is contained within a single exon. CONCLUSIONS: This work provides the most comprehensive curation effort of the human and mouse OR gene repertoires to date. The complete annotation has been integrated into the GENCODE reference gene set, for immediate availability to the research community.


Subjects
Conserved Sequence , Exons/genetics , Quantitative Trait Loci , Receptors, Odorant/genetics , Animals , Data Curation/methods , Databases, Genetic , Genetic Loci , Genome, Human , Humans , Mice , Pseudogenes
8.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subjects
Databases, Genetic , Genome, Human/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software
9.
Nucleic Acids Res ; 46(D1): D221-D228, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29126148

ABSTRACT

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.


Subjects
Consensus Sequence , Databases, Genetic , Open Reading Frames , Animals , Data Curation/methods , Data Curation/standards , Databases, Genetic/standards , Guidelines as Topic , Humans , Mice , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States , User-Computer Interface
10.
Genome Res ; 22(9): 1760-74, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955987

ABSTRACT

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.


Subjects
Databases, Genetic , Genome, Human , Genomics/methods , Molecular Sequence Annotation , Animals , Computational Biology/methods , DNA, Complementary/chemistry , DNA, Complementary/genetics , Evolution, Molecular , Exons , Genetic Loci , Humans , Internet , Models, Molecular , Open Reading Frames , Pseudogenes , Quality Control , RNA Splice Sites , RNA, Long Noncoding , Reproducibility of Results , Untranslated Regions
11.
Science ; 335(6070): 823-8, 2012 Feb 17.
Article in English | MEDLINE | ID: mdl-22344438

ABSTRACT

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.


Subjects
Genetic Variation , Genome, Human , Proteins/genetics , Disease/genetics , Gene Expression , Gene Frequency , Humans , Phenotype , Polymorphism, Single Nucleotide , Selection, Genetic
12.
J Gen Virol ; 90(Pt 7): 1622-1628, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19339480

ABSTRACT

Tunisia is a medium-level epidemic country for hepatitis B virus (HBV). This study characterizes, for the first time, full genome HBV strains from Tunisia. Viral load quantification and phylogenetic analyses of full genome or pre-S/S sequences were performed on 196 hepatitis B surface antigen (HBsAg)-positive plasma samples from Tunisian blood donors. The median viral load was 64.65 IU ml⁻¹ (range <5 to 7.7 × 10⁸ IU ml⁻¹) and 89% of samples had viral loads below 10,000 IU ml⁻¹. Fifty-nine strains formed a novel subgenotype D7, 41 strains clustered in subgenotype D1, seven strains in subgenotype A2 and one strain in genotype C. The novel subgenotype D7 was defined by maximum Bayesian posterior probability, a genetic divergence from other HBV/D subgenotypes by >4% and a stronger HBV/E signal in the X to core genes than subgenotype D1. In conclusion, HBV/D is dominant in asymptomatic Tunisian HBsAg carriers and a novel subgenotype, D7, was the most common subgenotype found in this population.


Subjects
DNA, Viral/genetics , Genome, Viral , Hepatitis B virus/classification , Hepatitis B virus/isolation & purification , Hepatitis B/virology , Sequence Analysis, DNA , Blood Donors , Cluster Analysis , DNA, Viral/chemistry , Genotype , Hepatitis B Surface Antigens/genetics , Hepatitis B virus/genetics , Humans , Molecular Sequence Data , Phylogeny , Sequence Homology , Tunisia , Viral Load
14.
Microb Pathog ; 43(5-6): 198-207, 2007.
Article in English | MEDLINE | ID: mdl-17600669

ABSTRACT

The contribution of gamma-glutamyl transpeptidase (GGT) to Campylobacter jejuni virulence and colonization of the avian gut has been investigated. The presence of the ggt gene in C. jejuni strains directly correlated with the expression of GGT activity as measured by cleavage and transfer of the gamma-glutamyl moiety. Inactivation of the monocistronic ggt gene in C. jejuni strain 81116 resulted in isogenic mutants with undetectable GGT activity; nevertheless, these mutants grew normally in vitro. However, the mutants had increased motility, a 5.4-fold higher invasion efficiency into INT407 cells in vitro and increased resistance to hydrogen peroxide stress. Moreover, the apoptosis-inducing activity of the ggt mutant was significantly lower than that of the parental strain. In vivo studies showed that, although GGT activity was not required for initial colonization of 1-day-old chicks, the enzyme was required for persistent colonization of the avian gut.


Subjects
Campylobacter Infections/veterinary , Campylobacter jejuni/physiology , Gastrointestinal Tract/microbiology , gamma-Glutamyltransferase/physiology , Adaptation, Biological/immunology , Animals , Birds , Campylobacter Infections/microbiology , Campylobacter jejuni/enzymology , Campylobacter jejuni/genetics , Campylobacter jejuni/pathogenicity , Chickens/microbiology , gamma-Glutamyltransferase/genetics