Results 1 - 20 of 76
1.
Nature ; 622(7981): 41-47, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37794265

ABSTRACT

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.


Subjects
Genes , Genome, Human , Molecular Sequence Annotation , Protein Isoforms , Humans , Genome, Human/genetics , Molecular Sequence Annotation/standards , Molecular Sequence Annotation/trends , Protein Isoforms/genetics , Human Genome Project , Pseudogenes , RNA/genetics
2.
EMBO J ; 40(6): e107409, 2021 Mar 15.
Article in English | MEDLINE | ID: mdl-33565128

ABSTRACT

A new inter-governmental research infrastructure, ELIXIR, aims to unify bioinformatics resources and life science data across Europe, thereby facilitating their mining and (re-)use.


Subjects
Biomedical Research , Computational Biology , Information Storage and Retrieval , Biological Science Disciplines , Europe , Humans
3.
Nat Rev Genet ; 17(12): 758-772, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27773922

ABSTRACT

A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe - or 'annotate' - genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists - from clinicians to evolutionary biologists - need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.


Subjects
Eukaryota/genetics , Genomics/methods , Molecular Sequence Annotation/methods , Sequence Analysis, DNA/methods , Animals , Humans
4.
J Proteome Res ; 20(4): 1821-1825, 2021 Apr 02.
Article in English | MEDLINE | ID: mdl-33720718

ABSTRACT

The large diversity of experimental methods in proteomics as well as their increasing usage across biological and clinical research has led to the development of hundreds if not thousands of software tools to aid in the analysis and interpretation of the resulting data. Detailed information about these tools needs to be collected, categorized, and validated to guarantee their optimal utilization. A tools registry like bio.tools enables users and developers to identify new tools with more powerful algorithms or to find tools with similar functions for comparison. Here we present the content of the registry, which now comprises more than 1000 proteomics tool entries. Furthermore, we discuss future applications and engagement with other community efforts resulting in a high impact on the bioinformatics landscape.


Subjects
Proteomics , Software , Algorithms , Computational Biology
5.
BMC Genomics ; 21(1): 196, 2020 Mar 03.
Article in English | MEDLINE | ID: mdl-32126975

ABSTRACT

BACKGROUND: Olfactory receptor (OR) genes are the largest multi-gene family in the mammalian genome, with 874 in human and 1483 loci in mouse (including pseudogenes). The expansion of the OR gene repertoire has occurred through numerous duplication events followed by diversification, resulting in a large number of highly similar paralogous genes. These characteristics have made the annotation of the complete OR gene repertoire a complex task. Most OR genes have been predicted in silico and are typically annotated as intronless coding sequences. RESULTS: Here we have developed an expert curation pipeline to analyse and annotate every OR gene in the human and mouse reference genomes. By combining evidence from structural features, evolutionary conservation and experimental data, we have unified the annotation of these gene families, and have systematically determined the protein-coding potential of each locus. We have defined the non-coding regions of many OR genes, enabling us to generate full-length transcript models. We found that 13 human and 41 mouse OR loci have coding sequences that are split across two exons. These split OR genes are conserved across mammals, and are expressed at the same level as protein-coding OR genes with an intronless coding region. Our findings challenge the long-standing and widespread notion that the coding region of a vertebrate OR gene is contained within a single exon. CONCLUSIONS: This work provides the most comprehensive curation effort of the human and mouse OR gene repertoires to date. The complete annotation has been integrated into the GENCODE reference gene set, for immediate availability to the research community.


Subjects
Conserved Sequence , Exons/genetics , Quantitative Trait Loci , Receptors, Odorant/genetics , Animals , Data Curation/methods , Databases, Genetic , Genetic Loci , Genome, Human , Humans , Mice , Pseudogenes
8.
Genes Dev ; 25(1): 1-10, 2011 Jan 01.
Article in English | MEDLINE | ID: mdl-21205862

ABSTRACT

The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.


Subjects
Gene Silencing/physiology , Genetic Privacy , Molecular Sequence Annotation , Genetic Privacy/trends , Genetic Variation , Genome, Human/genetics , Humans , Polymorphism, Single Nucleotide
9.
Nucleic Acids Res ; 44(D1): D710-6, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26687719

ABSTRACT

The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.


Subjects
Databases, Genetic , Genomics , Molecular Sequence Annotation , Animals , Genes , Genetic Variation , Humans , Internet , Mice , Proteins/genetics , Rats , Regulatory Sequences, Nucleic Acid , Software
11.
Nature ; 474(7351): 337-42, 2011 Jun 15.
Article in English | MEDLINE | ID: mdl-21677750

ABSTRACT

Gene targeting in embryonic stem cells has become the principal technology for manipulation of the mouse genome, offering unrivalled accuracy in allele design and access to conditional mutagenesis. To bring these advantages to the wider research community, large-scale mouse knockout programmes are producing a permanent resource of targeted mutations in all protein-coding genes. Here we report the establishment of a high-throughput gene-targeting pipeline for the generation of reporter-tagged, conditional alleles. Computational allele design, 96-well modular vector construction and high-efficiency gene-targeting strategies have been combined to mutate genes on an unprecedented scale. So far, more than 12,000 vectors and 9,000 conditional targeted alleles have been produced in highly germline-competent C57BL/6N embryonic stem cells. High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.


Subjects
Gene Deletion , Gene Knockout Techniques/methods , Genes/genetics , Genetic Association Studies/methods , Genome/genetics , Mice, Knockout/genetics , Alleles , Animals , Computational Biology , Embryonic Stem Cells/cytology , Embryonic Stem Cells/metabolism , Genes, Lethal/genetics , Genetic Vectors/genetics , Genomics , Genotype , Humans , Mice , Mice, Inbred C57BL , Mutagenesis, Insertional/methods , Phenotype , Polymerase Chain Reaction , Rats
12.
Nucleic Acids Res ; 43(Database issue): D123-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25352543

ABSTRACT

The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.


Subjects
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Chromosome Mapping , Humans , Internet , RNA, Untranslated/genetics , Sequence Analysis, RNA
13.
Nucleic Acids Res ; 43(Database issue): D662-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25352552

ABSTRACT

Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.


Subjects
Databases, Nucleic Acid , Genomics , Animals , Epigenesis, Genetic , Genetic Variation , Genome, Human , Humans , Internet , Mice , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , Software
14.
Proc Natl Acad Sci U S A ; 111(37): 13361-6, 2014 Sep 16.
Article in English | MEDLINE | ID: mdl-25157146

ABSTRACT

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.


Subjects
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Phylogeny , Pseudogenes/genetics , Animals , Evolution, Molecular , Genetic Association Studies , Humans , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Sequence Homology, Nucleic Acid
15.
Hum Mol Genet ; 23(22): 5866-78, 2014 Nov 15.
Article in English | MEDLINE | ID: mdl-24939910

ABSTRACT

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.


Subjects
Proteins/genetics , Computational Biology , Genome, Human , Humans , Open Reading Frames , Peptides/genetics , Proteins/metabolism , Proteomics
16.
Hum Mol Genet ; 23(16): 4420-32, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24688116

ABSTRACT

The genetic contribution to the variation in human lifespan is ~25%. Despite the large number of identified disease-susceptibility loci, it is not known which loci influence population mortality. We performed a genome-wide association meta-analysis of 7,729 long-lived individuals of European descent (≥85 years) and 16,121 younger controls (<65 years) followed by replication in an additional set of 13,060 long-lived individuals and 61,156 controls. In addition, we performed a subset analysis in cases aged ≥90 years. We observed genome-wide significant association with longevity, as reflected by survival to ages beyond 90 years, at a novel locus, rs2149954, on chromosome 5q33.3 (OR = 1.10, P = 1.74 × 10⁻⁸). We also confirmed association of rs4420638 on chromosome 19q13.32 (OR = 0.72, P = 3.40 × 10⁻³⁶), representing the TOMM40/APOE/APOC1 locus. In a prospective meta-analysis (n = 34,103), the minor allele of rs2149954 (T) on chromosome 5q33.3 associates with increased survival (HR = 0.95, P = 0.003). This allele has previously been reported to associate with low blood pressure in middle age. Interestingly, the minor allele (T) associates with decreased cardiovascular mortality risk, independent of blood pressure. We report on the first GWAS-identified longevity locus on chromosome 5q33.3 influencing survival in the general European population. The minor allele of this locus associates with low blood pressure in middle age, although the contribution of this allele to survival may be less dependent on blood pressure. Hence, the pleiotropic mechanisms by which this intragenic variation contributes to lifespan regulation have to be elucidated.


Subjects
Genetic Loci/physiology , Longevity/genetics , Age Factors , Aged , Aged, 80 and over , Cardiovascular Diseases/genetics , Chromosome Mapping , Chromosomes, Human, Pair 19 , Chromosomes, Human, Pair 5 , Female , Genome-Wide Association Study , Humans , Hypertension/genetics , Male , Phenotype , Prospective Studies , White People
17.
Genome Res ; 23(12): 1961-73, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24172201

ABSTRACT

The last decade has seen tremendous effort committed to the annotation of the human genome sequence, most notably perhaps in the form of the ENCODE project. One of the major findings of ENCODE, and other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-coding genes, as well as in the discovery of thousands of long noncoding RNAs. It is also possible that significant numbers of human transcripts have not yet been described by annotation projects, while existing transcript models are frequently incomplete. The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists. In this article, we will discuss the current state of human transcriptome annotation, drawing on our experience gained in generating the GENCODE gene annotation set. We highlight the gaps in our knowledge of transcript functionality that remain, and consider the potential computational and experimental strategies that can be used to help close them. We propose that an understanding of the true overlap between transcriptional complexity and functionality will not be gained in the short term. However, significant steps toward obtaining this knowledge can now be taken by using an integrated strategy, combining all of the experimental resources at our disposal.


Subjects
Genomics/methods , Molecular Sequence Annotation , Proteins/genetics , Transcriptome , Alternative Splicing , Animals , Databases, Genetic , Evolution, Molecular , Genome, Human , Humans , Proteomics , RNA, Long Noncoding , Sequence Alignment
18.
Nat Methods ; 10(12): 1177-84, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24185837

ABSTRACT

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.


Subjects
Computational Biology/methods , RNA Splicing , Sequence Analysis, RNA/methods , Algorithms , Animals , Caenorhabditis elegans , Drosophila melanogaster , Exons , Gene Expression Profiling , Genome , Humans , Introns , RNA Splice Sites , RNA, Messenger/metabolism , Software
19.
Nat Methods ; 10(12): 1185-91, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24185836

ABSTRACT

High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.


Subjects
RNA Splicing , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Animals , Chromosome Mapping/methods , Computational Biology/methods , Exons , False Positive Reactions , High-Throughput Nucleotide Sequencing/methods , Humans , K562 Cells , Mice , RNA, Messenger/metabolism , Reproducibility of Results , Software
20.
Nucleic Acids Res ; 42(Database issue): D771-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24316575

ABSTRACT

The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).


Subjects
Databases, Genetic , Genome , Molecular Sequence Annotation , Animals , Genome, Human , Genomics , Humans , Internet , Mice , Mice, Inbred NOD , Mice, Knockout , Rats , Swine/genetics , Zebrafish/genetics