Results 1 - 19 of 19
1.
Nature ; 622(7981): 41-47, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37794265

ABSTRACT

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.


Subjects
Genes , Genome, Human , Molecular Sequence Annotation , Protein Isoforms , Humans , Genome, Human/genetics , Molecular Sequence Annotation/standards , Molecular Sequence Annotation/trends , Protein Isoforms/genetics , Human Genome Project , Pseudogenes , RNA/genetics
2.
Cell Biol Toxicol ; 36(3): 261-272, 2020 06.
Article in English | MEDLINE | ID: mdl-31599373

ABSTRACT

In the advanced stages, malignant melanoma (MM) has a very poor prognosis. Due to tremendous efforts in cancer research over the last 10 years, and the introduction of novel therapies such as targeted therapies and immunomodulators, the rather dark horizon of the median survival has dramatically changed from under 1 year to several years. With the advent of proteomics, deep-mining studies can reach low-abundance expression levels. The complexity of the proteome, however, still surpasses the dynamic range capabilities of current analytical techniques. Consequently, many predicted protein products with potential biological functions have not yet been verified in experimental proteomic data. This category of 'missing proteins' (MPs) comprises all proteins that have been predicted but are currently unverified. As part of the initiative launched in 2016 in the USA, the European Cancer Moonshot Center has performed numerous deep proteomics analyses on samples from MM patients. In this study, nine MPs were clearly identified by mass spectrometry in MM metastases. Some MPs significantly correlated with proteins that possess identical PFAM structural domains, and other MPs were significantly associated with cancer-related proteins. To our knowledge, this is the first study in which unknown and novel proteins have been annotated in metastatic melanoma tumour tissue.


Subjects
Melanoma/genetics , Neoplasm Metastasis/genetics , Proteomics/methods , Adult , Biomarkers, Tumor/genetics , Female , Genome, Human/genetics , Humans , Male , Middle Aged , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/trends , Prognosis , Proteome/genetics , Proteome/metabolism , Skin Neoplasms/genetics , Melanoma, Cutaneous Malignant
3.
Genome Biol ; 20(1): 244, 2019 11 19.
Article in English | MEDLINE | ID: mdl-31744546

ABSTRACT

BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.


Subjects
Molecular Sequence Annotation/trends , Animals , Biofilms , Candida albicans/genetics , Drosophila melanogaster/genetics , Genome, Bacterial , Genome, Fungal , Humans , Locomotion , Memory, Long-Term , Molecular Sequence Annotation/methods , Pseudomonas aeruginosa/genetics
4.
Gigascience ; 7(8)2018 08 01.
Article in English | MEDLINE | ID: mdl-30107399

ABSTRACT

Background: The Gene Ontology (GO) is one of the most widely used resources in molecular and cellular biology, largely through the use of "enrichment analysis." To facilitate informed use of GO, we present GOtrack (https://gotrack.msl.ubc.ca), which provides access to historical records and trends in the GO and GO annotations. Findings: GOtrack gives users access to gene- and term-level information on annotations for nine model organisms as well as an interactive tool that measures the stability of enrichment results over time for user-provided "hit lists" of genes. To document the effects of GO evolution on enrichment, we analyzed more than 2,500 published hit lists of human genes (most older than 9 years); 53% of hit lists were considered to yield significantly stable enrichment results. Conclusions: Because stability is far from assured for any individual hit list, GOtrack can lead to more informed and cautious application of GO to genomics research.
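GO enrichment of a hit list is commonly scored with a one-sided hypergeometric test. The abstract does not say which statistic GOtrack uses, so the Python sketch below is an assumption, not GOtrack's implementation. Because the annotation counts (`K` and `k` below) change as GO annotations are revised, re-running the same test against different annotation editions is one way to see the kind of instability GOtrack measures. The function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background genes
    K: background genes annotated to the GO term
    n: genes in the hit list
    k: hit-list genes annotated to the GO term
    """
    # Sum the probabilities of observing k or more annotated genes in the hit list.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical example: the same hit list scored against two annotation editions
# in which the GO term gained annotations (K and k both change).
old_p = hypergeom_enrichment_p(N=18000, K=40, n=200, k=5)
new_p = hypergeom_enrichment_p(N=18000, K=120, n=200, k=6)
print(f"old edition p = {old_p:.2e}, new edition p = {new_p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the binomial coefficients; a production tool would also correct for testing many GO terms at once.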


Subjects
Gene Ontology/trends , Genomics/methods , Molecular Sequence Annotation/trends , Animals , Eukaryota/genetics , Humans
5.
Microb Biotechnol ; 11(4): 588-605, 2018 07.
Article in English | MEDLINE | ID: mdl-29806194

ABSTRACT

Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from 'the sequence tells the structure tells the function' fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader.


Subjects
Bacteria/genetics , Genome, Bacterial , Molecular Sequence Annotation/trends , Bacteria/classification , Bacteria/isolation & purification , Big Data , Computational Biology , Databases, Genetic , Molecular Sequence Annotation/methods
7.
Arq. bras. med. vet. zootec ; 68(2): 489-496, Mar.-Apr. 2016. tab
Article in Portuguese | LILACS | ID: lil-779784

ABSTRACT

The aim of this study was to estimate genetic parameters for partial and cumulative egg production in a commercial broiler female line. Ten monthly periods between 25 and 64 weeks of age, three partial periods (25 to 32, 33 to 48, and 49 to 64 weeks), three cumulative periods (from 25 weeks up to 30, 40, and 50 weeks of age), and total egg production were considered. Covariance components and genetic parameters were estimated by restricted maximum likelihood under an animal model that included the fixed effect of incubation and the additive genetic and residual random effects. Heritability estimates ranged from 0.12 to 0.41, and the periods before and after peak production showed greater genetic variability. Genetic correlations between the egg-production periods studied ranged from -0.12 to 0.98. In general, the pattern of variation was similar across the strategies evaluated, and all were genetically associated with total egg production. These results show that total egg production can be improved by selection on partial records. However, considering the relative efficiency of selection, the second month and the periods from the fortieth week of production onward would be the most suitable.
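The "relative efficiency of selection" mentioned above is conventionally computed, in standard quantitative genetics, as the ratio of the correlated response in total production (Y) obtained by selecting on a partial record (X) to the direct response obtained by selecting on Y itself. The abstract does not give the authors' exact expression, so the textbook formula below is an assumption, where i is selection intensity, h is the square root of heritability, r_A is the additive genetic correlation between X and Y, and σ_{P_Y} is the phenotypic standard deviation of Y:

```latex
\mathrm{RE} = \frac{CR_Y}{R_Y}
  = \frac{i_X \, h_X \, h_Y \, r_A \, \sigma_{P_Y}}{i_Y \, h_Y^{2} \, \sigma_{P_Y}}
  = \frac{i_X \, h_X \, r_A}{i_Y \, h_Y}
```

With equal selection intensities, selecting on the partial record is more efficient when r_A h_X > h_Y. For purely hypothetical values h_X^2 = 0.36, h_Y^2 = 0.25 and r_A = 0.9 (not taken from the study), RE = 0.9 × 0.6 / 0.5 = 1.08.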


Subjects
Animals , Poultry/anatomy & histology , Poultry/genetics , Eggs , Genetic Load , Chickens/genetics , Molecular Sequence Annotation/trends , Pedigree , Phenotype
8.
Nat Struct Mol Biol ; 22(1): 5-7, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25565026

ABSTRACT

Recent advances in RNA-sequencing technologies have led to the discovery of thousands of previously unannotated noncoding transcripts, including many long noncoding RNAs (lncRNAs) whose functions remain largely unknown. Here we discuss considerations and best practices in lncRNA identification and annotation, which we hope will foster functional and mechanistic exploration.


Subjects
Gene Expression Regulation , RNA, Untranslated/genetics , RNA, Untranslated/physiology , Molecular Biology/trends , Molecular Sequence Annotation/trends
10.
Methods ; 79-80: 32-40, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25308971

ABSTRACT

As high-throughput methods, such as whole-genome genotyping arrays, whole-exome sequencing (WES) and whole-genome sequencing (WGS), have detected large numbers of genetic variants associated with human diseases, functional annotation of these variants is an indispensable step in understanding disease etiology. Large-scale functional genomics projects, such as the ENCODE Project and the Roadmap Epigenomics Project, provide genome-wide profiles of functional elements across different human cell types and tissues. Given the urgent demand for identifying disease-causal variants, comprehensive and easy-to-use annotation tools are much needed. Here we review and discuss current progress and trends in the variant annotation field. Furthermore, we introduce a comprehensive web portal for annotating human genetic variants. We use gene-based features and the latest functional genomics datasets to annotate human single nucleotide variants (SNVs) at whole-genome scale. We further apply several function prediction algorithms to annotate SNVs that might affect different biological processes, including transcriptional gene regulation, alternative splicing, post-transcriptional regulation, translation and post-translational modifications. The SNVrap web portal is freely available at http://jjwanglab.org/snvrap.


Subjects
Molecular Sequence Annotation/methods , Polymorphism, Single Nucleotide , Algorithms , Alternative Splicing , Gene Expression Regulation , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation/trends
11.
Proc Natl Acad Sci U S A ; 111(10): 3733-8, 2014 Mar 11.
Article in English | MEDLINE | ID: mdl-24567391

ABSTRACT

The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.


Subjects
Databases, Protein , Molecular Sequence Annotation/trends , Proteins/chemistry , Proteomics/trends , Computational Biology , Molecular Sequence Annotation/methods , Species Specificity
15.
Nat Rev Genet ; 12(10): 703-14, 2011 Sep 16.
Article in English | MEDLINE | ID: mdl-21921926

ABSTRACT

Determination of haplotype phase is becoming increasingly important as we enter the era of large-scale sequencing because many of its applications, such as imputing low-frequency variants and characterizing the relationship between genetic variation and disease susceptibility, are particularly relevant to sequence data. Haplotype phase can be generated through laboratory-based experimental methods, or it can be estimated using computational approaches. We assess the haplotype phasing methods that are available, focusing in particular on statistical methods, and we discuss the practical aspects of their application. We also describe recent developments that may transform this field, particularly the use of identity-by-descent for computational phasing.


Subjects
Data Collection/trends , Haplotypes/genetics , Base Sequence , Computational Biology/methods , Computational Biology/trends , Data Collection/methods , Databases, Genetic/trends , Genome-Wide Association Study/methods , Genome-Wide Association Study/trends , Haplotypes/physiology , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/trends , Humans , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/trends , Polymorphism, Single Nucleotide/physiology
16.
Nat Rev Genet ; 12(10): 671-82, 2011 Sep 07.
Article in English | MEDLINE | ID: mdl-21897427

ABSTRACT

Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalogue of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches - reference-based, de novo and combined strategies - along with some perspectives on transcriptome assembly in the near future.


Subjects
Gene Expression Profiling/trends , Animals , Base Sequence , Cloning, Molecular , Gene Expression Profiling/methods , Gene Library , Humans , Models, Biological , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/trends , Molecular Sequence Data , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/trends , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/trends
17.
Curr Protein Pept Sci ; 12(6): 503-7, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21787300

ABSTRACT

Evidence is accumulating that small open reading frames (sORFs, <100 codons) play key roles in many important biological processes. Yet they are generally ignored in gene annotation, despite being far more abundant than genes with more than 100 codons. Here, we demonstrate that popular homology search and codon-index techniques perform worse on small genes than on larger genes, while a method dedicated to sORF discovery has a level of accuracy similar to that of homology search. This result is largely due to the small dataset of experimentally verified sORFs available for homology search and for training ab initio techniques. It highlights the urgent need for both experimental and computational studies to further improve the accuracy of sORF prediction.
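The sORF definition above (fewer than 100 codons) can be made concrete with a simple length-based scan. This Python sketch is illustrative only: it is not the homology search, codon-index, or dedicated sORF method compared in the study; it considers only ATG starts on the forward strand; and the counting convention (stop codon excluded from the codon count) and the function names are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SORF_MAX_CODONS = 100  # sORF threshold from the abstract: fewer than 100 codons


def find_orfs(seq: str, min_codons: int = 10) -> list[tuple[int, int, int]]:
    """Return (start, end, n_codons) for ATG-initiated ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `n_codons` excludes the stop.
    Only the first in-frame ATG before each stop is reported (nested ATGs skipped).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # close the ORF and look for the next ATG
    return orfs


def is_sorf(n_codons: int) -> bool:
    """Classify an ORF as small using the <100-codon definition."""
    return n_codons < SORF_MAX_CODONS


example = "ATG" + "GCT" * 20 + "TAA"
for start, end, n in find_orfs(example):
    print(start, end, n, "sORF" if is_sorf(n) else "longer ORF")
```

Length alone cannot separate functional sORFs from the many random ATG-to-stop stretches found in any sequence, which is consistent with the study's point that more experimentally verified sORFs are needed to train and evaluate predictors.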


Subjects
Codon/genetics , Computational Biology/methods , Molecular Sequence Annotation/methods , Open Reading Frames/genetics , Computational Biology/trends , Databases, Protein , Forecasting , Molecular Sequence Annotation/trends , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics
19.
BMC Biol ; 8: 149, 2010 Dec 21.
Article in English | MEDLINE | ID: mdl-21176148

ABSTRACT

BACKGROUND: The discovery that the transcriptional output of the human genome is far more complex than predicted by the current set of protein-coding annotations, and that most RNAs produced do not appear to encode proteins, has transformed our understanding of genome complexity and suggests new paradigms of genome regulation. However, the fraction of all cellular RNA whose function we do not understand and the fraction of the genome that is used to produce that RNA remain controversial. This is not simply a bookkeeping issue, because the degree to which this un-annotated transcription is present has important implications for its biological function and for the general architecture of genome regulation. For example, efforts to elucidate how non-coding RNAs (ncRNAs) regulate genome function will be compromised if that class of RNAs is dismissed as simply 'transcriptional noise'. RESULTS: We show that the relative mass of RNA whose function and/or structure we do not understand (the so-called 'dark matter' RNAs), as a proportion of all non-ribosomal, non-mitochondrial human RNA (mt-RNA), can be greater than that of protein-encoding transcripts. This observation is obscured in studies that focus only on polyA-selected RNA, a method that enriches for protein-coding RNAs and at the same time discards the vast majority of RNA prior to analysis. We further show the presence of a large number of very long, abundantly transcribed regions (hundreds of kb) in intergenic space, and that expression of these regions is associated with neoplastic transformation. These regions overlap some regions found previously in normal human embryonic tissues, which raises an interesting hypothesis about the function of these ncRNAs in both early development and neoplastic transformation.
CONCLUSIONS: We conclude that 'dark matter' RNA can constitute the majority of non-ribosomal, non-mitochondrial RNA, and that a significant fraction arises from numerous very long, intergenic transcribed regions that could be involved in neoplastic transformation.


Subjects
Genome, Human , Molecular Sequence Annotation/standards , RNA, Nuclear/genetics , Adolescent , Animals , Bone Neoplasms/genetics , Bone Neoplasms/metabolism , Bone Neoplasms/pathology , Brain/metabolism , Drosophila/genetics , Genome, Human/genetics , Genome, Insect , Humans , K562 Cells , Knowledge Bases , Liver/metabolism , Molecular Sequence Annotation/trends , Neoplasm Metastasis/genetics , RNA/genetics , RNA, Mitochondrial , RNA, Nuclear/metabolism , RNA, Ribosomal/genetics , Sarcoma, Ewing/genetics , Sarcoma, Ewing/metabolism , Sarcoma, Ewing/pathology , Sequence Analysis, RNA/standards