Results 1 - 12 of 12
1.
Nucleic Acids Res ; 51(D1): D942-D949, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36420896

ABSTRACT

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subjects
Computational Biology , Human Genome , Humans , Animals , Mice , Molecular Sequence Annotation , Computational Biology/methods , Human Genome/genetics , Transcriptome/genetics , Gene Expression Profiling , Genetic Databases
2.
Nucleic Acids Res ; 51(D1): D933-D941, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36318249

ABSTRACT

Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of high-quality reference genomes through global biodiversity initiatives. In parallel, there have been major advances towards pangenome representations of higher species, where many alternative genome assemblies representing different breeds, cultivars, strains and haplotypes are now available. In order to support these efforts and accelerate downstream research, it is our goal at Ensembl to create high-quality annotations, tools and services for species across the tree of life. Here, we report our resources for popular reference genomes, the dramatic growth of our annotations (including haplotypes from the first human pangenome graphs), updates to the Ensembl Variant Effect Predictor (VEP), interactive protein structure predictions from AlphaFold DB, and the beta release of our new website.


Subjects
Genetic Databases , Software , Animals , Humans , Molecular Sequence Annotation , Genomics , Genome
3.
Nucleic Acids Res ; 50(D1): D988-D995, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791404

ABSTRACT

Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. We have a range of specific software tools for specific tasks, such as the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.


Subjects
Genetic Databases , Genome/genetics , Molecular Sequence Annotation , Software , Animals , Computational Biology/classification , Humans
4.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subjects
COVID-19/prevention & control , Computational Biology/methods , Genetic Databases , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , Long Noncoding RNA/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Genetic Transcription/genetics
5.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subjects
Genetic Databases , Human Genome/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software
6.
Genome Res ; 22(9): 1760-74, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955987

ABSTRACT

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.


Subjects
Genetic Databases , Human Genome , Genomics/methods , Molecular Sequence Annotation , Animals , Computational Biology/methods , Complementary DNA/chemistry , Complementary DNA/genetics , Molecular Evolution , Exons , Genetic Loci , Humans , Internet , Molecular Models , Open Reading Frames , Pseudogenes , Quality Control , RNA Splice Sites , Long Noncoding RNA , Reproducibility of Results , Untranslated Regions
7.
Science ; 335(6070): 823-8, 2012 Feb 17.
Article in English | MEDLINE | ID: mdl-22344438

ABSTRACT

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.


Subjects
Genetic Variation , Human Genome , Proteins/genetics , Disease/genetics , Gene Expression , Gene Frequency , Humans , Phenotype , Single Nucleotide Polymorphism , Genetic Selection
8.
Microbiology (Reading) ; 156(Pt 11): 3255-3269, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20829291

ABSTRACT

Comparison of the complete genome sequence of Bacteroides fragilis 638R, originally isolated in the USA, was made with two previously sequenced strains isolated in the UK (NCTC 9343) and Japan (YCH46). The presence of 10 loci containing genes associated with polysaccharide (PS) biosynthesis, each including a putative Wzx flippase and Wzy polymerase, was confirmed in all three strains, despite a lack of cross-reactivity between NCTC 9343 and 638R surface PS-specific antibodies by immunolabelling and microscopy. Genomic comparisons revealed an exceptional level of PS biosynthesis locus diversity. Of the 10 divergent PS-associated loci apparent in each strain, none is similar between NCTC 9343 and 638R. YCH46 shares one locus with NCTC 9343, confirmed by mAb labelling, and a second different locus with 638R, making a total of 28 divergent PS biosynthesis loci amongst the three strains. The lack of expression of the phase-variable large capsule (LC) in strain 638R, observed in NCTC 9343, is likely to be due to a point mutation that generates a stop codon within a putative initiating glycosyltransferase, necessary for the expression of the LC in NCTC 9343. Other major sequence differences were observed to arise from different numbers and variety of inserted extra-chromosomal elements, in particular prophages. Extensive horizontal gene transfer has occurred within these strains, despite the presence of a significant number of divergent DNA restriction and modification systems that act to prevent acquisition of foreign DNA. The level of amongst-strain diversity in PS biosynthesis loci is unprecedented.


Subjects
Bacterial Capsules/genetics , Bacteroides fragilis/genetics , Genetic Variation , Bacterial Genome , Bacterial Capsules/biosynthesis , Bacteroides fragilis/isolation & purification , Comparative Genomic Hybridization , Bacterial DNA/genetics , Humans , Molecular Sequence Data , DNA Sequence Analysis
9.
BMC Genomics ; 10: 302, 2009 Jul 07.
Article in English | MEDLINE | ID: mdl-19583835

ABSTRACT

BACKGROUND: The Gram-negative bacterium Photorhabdus asymbiotica (Pa) has been recovered from human infections in both North America and Australia. Recently, Pa has been shown to have a nematode vector that can also infect insects, like its sister species the insect pathogen P. luminescens (Pl). To understand the relationship between pathogenicity to insects and humans in Photorhabdus we have sequenced the complete genome of Pa strain ATCC43949 from North America. This strain (formerly referred to as Xenorhabdus luminescens strain 2) was isolated in 1977 from the blood of an 80 year old female patient with endocarditis, in Maryland, USA. Here we compare the complete genome of Pa ATCC43949 with that of the previously sequenced insect pathogen P. luminescens strain TT01 which was isolated from its entomopathogenic nematode vector collected from soil in Trinidad and Tobago. RESULTS: We found that the human pathogen Pa had a smaller genome (5,064,808 bp) than that of the insect pathogen Pl (5,688,987 bp) but that each pathogen carries approximately one megabase of DNA that is unique to each strain. The reduced size of the Pa genome is associated with a smaller diversity in insecticidal genes such as those encoding the Toxin complexes (Tc's), Makes caterpillars floppy (Mcf) toxins and the Photorhabdus Virulence Cassettes (PVCs). The Pa genome, however, also shows the addition of a plasmid related to pMT1 from Yersinia pestis and several novel pathogenicity islands including a novel Type Three Secretion System (TTSS) encoding island. Together these data suggest that Pa may show virulence against man via the acquisition of the pMT1-like plasmid and specific effectors, such as SopB, that promote its persistence inside human macrophages. Interestingly the loss of insecticidal genes in Pa is not reflected by a loss of pathogenicity towards insects. 
CONCLUSION: Our results suggest that North American isolates of Pa have acquired virulence against man via the acquisition of a plasmid and specific virulence factors with similarity to those shown to play roles in pathogenicity against humans in other bacteria.


Subjects
Comparative Genomic Hybridization , Bacterial Genome , Photorhabdus/genetics , Photorhabdus/pathogenicity , Animals , Cell Line , Emerging Communicable Diseases/microbiology , Bacterial DNA/genetics , Enterobacteriaceae Infections/microbiology , Genomic Islands , Genomics , Humans , Mice , Moths/microbiology , North America , Photorhabdus/isolation & purification , Plasmids , DNA Sequence Analysis , Species Specificity , Virulence
10.
BMC Genomics ; 10: 239, 2009 May 21.
Article in English | MEDLINE | ID: mdl-19460133

ABSTRACT

BACKGROUND: Chlamydia trachomatis is the most common cause of sexually transmitted infections globally and the leading cause of preventable blindness in the developing world. There are two biovariants of C. trachomatis: 'trachoma', causing ocular and genital tract infections, and the invasive 'lymphogranuloma venereum' strains. Recently, a new variant of the genital tract C. trachomatis emerged in Sweden. This variant escaped routine diagnostic tests because it carries a plasmid with a deletion. Failure to detect this strain has meant it has spread rapidly across the country, provoking a worldwide alert. In addition to being a key diagnostic target, the plasmid has been linked to chlamydial virulence. Analysis of chlamydial plasmids and their cognate chromosomes was undertaken to provide insights into the evolutionary relationship between chromosome and plasmid. This is essential knowledge if the plasmid is to continue to be relied on as a key diagnostic marker, and for an understanding of the evolution of Chlamydia trachomatis. RESULTS: The genomes of two new C. trachomatis strains were sequenced, together with plasmids from six C. trachomatis isolates, including the new variant strain from Sweden. The plasmid from the new Swedish variant has a 377 bp deletion in the first predicted coding sequence, abolishing the site used for PCR detection, resulting in negative diagnosis. In addition, the variant plasmid has a 44 bp duplication downstream of the deletion. The region containing the second predicted coding sequence is the most highly conserved region of the plasmids investigated. Phylogenetic analyses of the plasmids and chromosomes are fully congruent. Moreover, this analysis also shows that ocular and genital strains diverged from a common C. trachomatis progenitor.
CONCLUSION: The evolutionary pathways of the chlamydial genome and plasmid imply that inheritance of the plasmid is tightly linked with its cognate chromosome. These data suggest that the plasmid is not a highly mobile genetic element and does not transfer readily between isolates. Comparative analysis of the plasmid sequences has revealed the most conserved regions that should be used to design future plasmid based nucleic acid amplification tests, to avoid diagnostic failures.


Subjects
Chlamydia trachomatis/genetics , Molecular Evolution , Bacterial Genome , Plasmids/genetics , Bacterial Typing Techniques , Chlamydia trachomatis/classification , Chlamydia trachomatis/isolation & purification , Bacterial DNA/genetics , Humans , INDEL Mutation , Phylogeny , Single Nucleotide Polymorphism , Sequence Alignment , DNA Sequence Analysis , Sequence Deletion , Sweden
11.
BMC Genomics ; 10: 54, 2009 Jan 28.
Article in English | MEDLINE | ID: mdl-19175920

ABSTRACT

BACKGROUND: Streptococcus uberis, a Gram positive bacterial pathogen responsible for a significant proportion of bovine mastitis in commercial dairy herds, colonises multiple body sites of the cow including the gut, genital tract and mammary gland. Comparative analysis of the complete genome sequence of S. uberis strain 0140J was undertaken to help elucidate the biology of this effective bovine pathogen. RESULTS: The genome revealed 1,825 predicted coding sequences (CDSs) of which 62 were identified as pseudogenes or gene fragments. Comparisons with related pyogenic streptococci identified a conserved core (40%) of orthologous CDSs. Intriguingly, S. uberis 0140J displayed a lower number of mobile genetic elements when compared with other pyogenic streptococci, however bacteriophage-derived islands and a putative genomic island were identified. Comparative genomics analysis revealed most similarity to the genomes of Streptococcus agalactiae and Streptococcus equi subsp. zooepidemicus. In contrast, streptococcal orthologs were not identified for 11% of the CDSs, indicating either unique retention of ancestral sequence, or acquisition of sequence from alternative sources. Functions including transport, catabolism, regulation and CDSs encoding cell envelope proteins were over-represented in this unique gene set; a limited array of putative virulence CDSs were identified. CONCLUSION: S. uberis utilises nutritional flexibility derived from a diversity of metabolic options to successfully occupy a discrete ecological niche. The features observed in S. uberis are strongly suggestive of an opportunistic pathogen adapted to challenging and changing environmental parameters.


Subjects
Biological Adaptation/genetics , Bacterial Genome , Streptococcus/genetics , Animals , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Cattle , Comparative Genomic Hybridization , Bacterial DNA/genetics , Molecular Evolution , Gene Expression Profiling , Bacterial Genes , Genomic Islands , Bovine Mastitis/microbiology , Phylogeny , DNA Sequence Analysis , Streptococcus/metabolism , Streptococcus/pathogenicity , Virulence
12.
Genome Res ; 19(1): 12-23, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19047519

ABSTRACT

Pseudomonas aeruginosa isolates have a highly conserved core genome representing up to 90% of the total genomic sequence with additional variable accessory genes, many of which are found in genomic islands or islets. The identification of the Liverpool Epidemic Strain (LES) in a children's cystic fibrosis (CF) unit in 1996 and its subsequent observation in several centers in the United Kingdom challenged the previous widespread assumption that CF patients acquire only unique strains of P. aeruginosa from the environment. To learn about the forces that shaped the development of this important epidemic strain, the genome of the earliest archived LES isolate, LESB58, was sequenced. The sequence revealed the presence of many large genomic islands, including five prophage clusters, one defective (pyocin) prophage cluster, and five non-phage islands. To determine the role of these clusters, an unbiased signature tagged mutagenesis study was performed, followed by selection in the chronic rat lung infection model. Forty-seven mutants were identified by sequencing, including mutants in several genes known to be involved in Pseudomonas infection. Furthermore, genes from four prophage clusters and one genomic island were identified, and in direct competition studies with the parent isolate, four were demonstrated to strongly impact on competitiveness in the chronic rat lung infection model. This strongly indicates that enhanced in vivo competitiveness is a major driver for maintenance and diversifying selection of these genomic prophage genes.


Subjects
Prophages/genetics , Pseudomonas Infections/microbiology , Pseudomonas Phages/genetics , Pseudomonas aeruginosa/pathogenicity , Pseudomonas aeruginosa/virology , Animals , Disease Outbreaks , Bacterial Drug Resistance/genetics , England/epidemiology , Fimbriae Proteins/genetics , Bacterial Genes , Viral Genes , Bacterial Genome , Humans , Multigene Family , Mutagenesis , O Antigens/genetics , Prophages/isolation & purification , Prophages/pathogenicity , Pseudomonas Infections/epidemiology , Pseudomonas Phages/isolation & purification , Pseudomonas Phages/pathogenicity , Pseudomonas aeruginosa/genetics , Pseudomonas aeruginosa/isolation & purification , Rats , Virulence/genetics