Results 1 - 19 of 19
1.
Nature ; 622(7981): 41-47, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37794265

ABSTRACT

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.


Subject(s)
Genes , Genome, Human , Molecular Sequence Annotation , Protein Isoforms , Humans , Genome, Human/genetics , Molecular Sequence Annotation/standards , Molecular Sequence Annotation/trends , Protein Isoforms/genetics , Human Genome Project , Pseudogenes , RNA/genetics
2.
Cell Biol Toxicol ; 36(3): 261-272, 2020 06.
Article in English | MEDLINE | ID: mdl-31599373

ABSTRACT

In the advanced stages, malignant melanoma (MM) has a very poor prognosis. Due to tremendous efforts in cancer research over the last 10 years, and the introduction of novel therapies such as targeted therapies and immunomodulators, the rather dark horizon of the median survival has dramatically changed from under 1 year to several years. With the advent of proteomics, deep-mining studies can reach low-abundant expression levels. The complexity of the proteome, however, still surpasses the dynamic range capabilities of current analytical techniques. Consequently, many predicted protein products with potential biological functions have not yet been verified in experimental proteomic data. This category of 'missing proteins' (MP) comprises all proteins that have been predicted but are currently unverified. As part of the initiative launched in 2016 in the USA, the European Cancer Moonshot Center has performed numerous deep proteomics analyses on samples from MM patients. In this study, nine MPs were clearly identified by mass spectrometry in MM metastases. Some MPs significantly correlated with proteins that possess identical PFAM structural domains, and other MPs were significantly associated with cancer-related proteins. To our knowledge, this is the first study in which unknown and novel proteins have been annotated in metastatic melanoma tumour tissue.


Subject(s)
Melanoma/genetics , Neoplasm Metastasis/genetics , Proteomics/methods , Adult , Biomarkers, Tumor/genetics , Female , Genome, Human/genetics , Humans , Male , Middle Aged , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/trends , Prognosis , Proteome/genetics , Proteome/metabolism , Skin Neoplasms/genetics , Melanoma, Cutaneous Malignant
3.
Genome Biol ; 20(1): 244, 2019 11 19.
Article in English | MEDLINE | ID: mdl-31744546

ABSTRACT

BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.


Subject(s)
Molecular Sequence Annotation/trends , Animals , Biofilms , Candida albicans/genetics , Drosophila melanogaster/genetics , Genome, Bacterial , Genome, Fungal , Humans , Locomotion , Memory, Long-Term , Molecular Sequence Annotation/methods , Pseudomonas aeruginosa/genetics
4.
Gigascience ; 7(8)2018 08 01.
Article in English | MEDLINE | ID: mdl-30107399

ABSTRACT

Background: The Gene Ontology (GO) is one of the most widely used resources in molecular and cellular biology, largely through the use of "enrichment analysis." To facilitate informed use of GO, we present GOtrack (https://gotrack.msl.ubc.ca), which provides access to historical records and trends in the GO and GO annotations. Findings: GOtrack gives users access to gene- and term-level information on annotations for nine model organisms as well as an interactive tool that measures the stability of enrichment results over time for user-provided "hit lists" of genes. To document the effects of GO evolution on enrichment, we analyzed more than 2,500 published hit lists of human genes (most older than 9 years); 53% of hit lists were considered to yield significantly stable enrichment results. Conclusions: Because stability is far from assured for any individual hit list, GOtrack can lead to more informed and cautious application of GO to genomics research.


Subject(s)
Gene Ontology/trends , Genomics/methods , Molecular Sequence Annotation/trends , Animals , Eukaryota/genetics , Humans
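Note on entry 4: as a rough illustration of the "enrichment analysis" that the GOtrack abstract refers to, the sketch below computes a one-sided hypergeometric p-value, a test commonly used to ask whether a GO term is over-represented in a hit list. It is not taken from GOtrack, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(background, annotated, hits, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes annotated with the GO term
    hits:       number of genes in the hit list
    overlap:    hit-list genes annotated with the GO term
    (All names are illustrative assumptions, not a GOtrack API.)
    """
    total = comb(background, hits)
    upper = min(annotated, hits)
    # Sum the probability of every overlap at least as large as the one observed.
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, hits - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total
```

Because the annotated count for a term changes as GO evolves, the same hit list can give a different p-value at different dates, which is the kind of instability the abstract describes.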
5.
Microb Biotechnol ; 11(4): 588-605, 2018 07.
Article in English | MEDLINE | ID: mdl-29806194

ABSTRACT

Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from 'the sequence tells the structure tells the function' fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Molecular Sequence Annotation/trends , Bacteria/classification , Bacteria/isolation & purification , Big Data , Computational Biology , Databases, Genetic , Molecular Sequence Annotation/methods
7.
Arq. bras. med. vet. zootec ; 68(2): 489-496, Mar.-Apr. 2016. tab
Article in Portuguese | LILACS | ID: lil-779784

ABSTRACT

The aim of this study was to estimate genetic parameters for partial and cumulative egg production in a female line of commercial broilers. Ten monthly periods between 25 and 64 weeks, three partial periods (25 to 32, 33 to 48 and 49 to 64 weeks), three cumulative periods (from 25 to 30, 40 and 50 weeks of age) and total egg production were considered. Covariance components and genetic parameters were estimated by restricted maximum likelihood under an animal model that included the fixed effect of incubation and the random additive genetic and residual effects. Heritability estimates ranged from 0.12 to 0.41. The periods before and after peak production showed greater genetic variability. Genetic correlations between the egg production periods studied ranged from -0.12 to 0.98. In general, the pattern of variation was similar across the strategies evaluated, and all were genetically associated with total egg production. The results show that total egg production can be improved by selection on partial records. However, considering the relative efficiency of selection, the second month and the periods from the fortieth week of production onward would be the most suitable.


Subject(s)
Animals , Poultry/anatomy & histology , Poultry/genetics , Eggs , Genetic Load , Chickens/genetics , Molecular Sequence Annotation/trends , Pedigree , Phenotype
8.
Nat Struct Mol Biol ; 22(1): 5-7, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25565026

ABSTRACT

Recent advances in RNA-sequencing technologies have led to the discovery of thousands of previously unannotated noncoding transcripts, including many long noncoding RNAs (lncRNAs) whose functions remain largely unknown. Here we discuss considerations and best practices in lncRNA identification and annotation, which we hope will foster functional and mechanistic exploration.


Subject(s)
Gene Expression Regulation , RNA, Untranslated/genetics , RNA, Untranslated/physiology , Molecular Biology/trends , Molecular Sequence Annotation/trends
10.
Methods ; 79-80: 32-40, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25308971

ABSTRACT

As high-throughput methods, such as whole genome genotyping arrays, whole exome sequencing (WES) and whole genome sequencing (WGS), have detected huge numbers of genetic variants associated with human diseases, functional annotation of these variants is an indispensable step in understanding disease etiology. Large-scale functional genomics projects, such as The ENCODE Project and Roadmap Epigenomics Project, provide genome-wide profiling of functional elements across different human cell types and tissues. Given the urgent need to identify disease-causal variants, comprehensive and easy-to-use annotation tools are in high demand. Here we review and discuss current progress and trends in the variant annotation field. Furthermore, we introduce a comprehensive web portal for annotating human genetic variants. We use gene-based features and the latest functional genomics datasets to annotate single nucleotide variants (SNVs) in humans at whole genome scale. We further apply several function prediction algorithms to annotate SNVs that might affect different biological processes, including transcriptional gene regulation, alternative splicing, post-transcriptional regulation, translation and post-translational modifications. The SNVrap web portal is freely available at http://jjwanglab.org/snvrap.


Subject(s)
Molecular Sequence Annotation/methods , Polymorphism, Single Nucleotide , Algorithms , Alternative Splicing , Gene Expression Regulation , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation/trends
11.
Proc Natl Acad Sci U S A ; 111(10): 3733-8, 2014 Mar 11.
Article in English | MEDLINE | ID: mdl-24567391

ABSTRACT

The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.


Subject(s)
Databases, Protein , Molecular Sequence Annotation/trends , Proteins/chemistry , Proteomics/trends , Computational Biology , Molecular Sequence Annotation/methods , Species Specificity
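Note on entry 11: the figures quoted in the abstract are consistent with a simple linear extrapolation, sketched below. This is a back-of-the-envelope check under the assumption of a constant growth rate, not the authors' projection model; the function and variable names are hypothetical.

```python
def years_to_target(current_pct, target_pct, rate_pct_per_year):
    """Years needed to reach target coverage at a constant (assumed) rate."""
    return (target_pct - current_pct) / rate_pct_per_year


# From the abstract: residue-level coverage rose from 30% to 40% over 10 years.
rate_with_sg = (40.0 - 30.0) / 10.0          # percentage points per year

# From the abstract: structural genomics contributed ~50% of new coverage.
sg_share = 0.5
rate_without_sg = rate_with_sg * (1.0 - sg_share)

years_with_sg = years_to_target(40.0, 55.0, rate_with_sg)
years_without_sg = years_to_target(40.0, 55.0, rate_without_sg)

print(years_with_sg, years_without_sg)
```

Under these assumptions, reaching 55% coverage takes 15 years with structural genomics and 30 years without, matching the abstract's "within 15 y" and "approximately twice as long".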
15.
Nat Rev Genet ; 12(10): 703-14, 2011 Sep 16.
Article in English | MEDLINE | ID: mdl-21921926

ABSTRACT

Determination of haplotype phase is becoming increasingly important as we enter the era of large-scale sequencing because many of its applications, such as imputing low-frequency variants and characterizing the relationship between genetic variation and disease susceptibility, are particularly relevant to sequence data. Haplotype phase can be generated through laboratory-based experimental methods, or it can be estimated using computational approaches. We assess the haplotype phasing methods that are available, focusing in particular on statistical methods, and we discuss the practical aspects of their application. We also describe recent developments that may transform this field, particularly the use of identity-by-descent for computational phasing.


Subject(s)
Data Collection/trends , Haplotypes/genetics , Base Sequence , Computational Biology/methods , Computational Biology/trends , Data Collection/methods , Databases, Genetic/trends , Genome-Wide Association Study/methods , Genome-Wide Association Study/trends , Haplotypes/physiology , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/trends , Humans , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/trends , Polymorphism, Single Nucleotide/physiology
16.
Nat Rev Genet ; 12(10): 671-82, 2011 Sep 07.
Article in English | MEDLINE | ID: mdl-21897427

ABSTRACT

Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalogue of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches - reference-based, de novo and combined strategies - along with some perspectives on transcriptome assembly in the near future.


Subject(s)
Gene Expression Profiling/trends , Animals , Base Sequence , Cloning, Molecular , Gene Expression Profiling/methods , Gene Library , Humans , Models, Biological , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/trends , Molecular Sequence Data , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/trends , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/trends
17.
Curr Protein Pept Sci ; 12(6): 503-7, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21787300

ABSTRACT

Evidence is accumulating that small open reading frames (sORFs, <100 codons) play key roles in many important biological processes. Yet they are generally ignored in gene annotation, even though they are far more abundant than genes with more than 100 codons. Here, we demonstrate that popular homology search and codon-index techniques perform poorly for small genes relative to larger genes, while a method dedicated to sORF discovery has a level of accuracy similar to that of homology search. The result is largely due to the small dataset of experimentally verified sORFs available for homology search and for training ab initio techniques. It highlights the urgent need for both experimental and computational studies to further advance the accuracy of sORF prediction.


Subject(s)
Codon/genetics , Computational Biology/methods , Molecular Sequence Annotation/methods , Open Reading Frames/genetics , Computational Biology/trends , Databases, Protein , Forecasting , Molecular Sequence Annotation/trends , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics
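Note on entry 17: to make the "<100 codons" definition concrete, the sketch below scans the three forward reading frames of a DNA string for ATG-to-stop ORFs shorter than a cutoff. It is a naive illustration, not the dedicated sORF method evaluated in the paper. Counting the start codon but not the stop codon, and ignoring the reverse strand, are assumptions made here for brevity; all names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, max_codons=100, min_codons=2):
    """Return sorted (start, end, n_codons) tuples for forward-strand sORFs.

    start/end are 0-based and end-exclusive; the span includes the stop codon.
    n_codons counts the start codon but not the stop codon (an assumption).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i          # first in-frame start codon opens the ORF
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons < max_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None           # close the ORF and look for the next start
    return sorted(orfs)
```

A scan like this reports every short ATG-to-stop stretch, most of which are unlikely to be real genes, which is one reason homology evidence or a trained model is needed to rank candidates.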
19.
BMC Biol ; 8: 149, 2010 Dec 21.
Article in English | MEDLINE | ID: mdl-21176148

ABSTRACT

BACKGROUND: The discovery that the transcriptional output of the human genome is far more complex than predicted by the current set of protein-coding annotations, and that most RNAs produced do not appear to encode proteins, has transformed our understanding of genome complexity and suggests new paradigms of genome regulation. However, the fraction of all cellular RNA whose function we do not understand and the fraction of the genome that is utilized to produce that RNA remain controversial. This is not simply a bookkeeping issue because the degree to which this un-annotated transcription is present has important implications with respect to its biologic function and to the general architecture of genome regulation. For example, efforts to elucidate how non-coding RNAs (ncRNAs) regulate genome function will be compromised if that class of RNAs is dismissed as simply 'transcriptional noise'. RESULTS: We show that the relative mass of RNA whose function and/or structure we do not understand (the so-called 'dark matter' RNAs), as a proportion of all non-ribosomal, non-mitochondrial human RNA (mt-RNA), can be greater than that of protein-encoding transcripts. This observation is obscured in studies that focus only on polyA-selected RNA, a method that enriches for protein-coding RNAs and at the same time discards the vast majority of RNA prior to analysis. We further show the presence of a large number of very long, abundantly transcribed regions (hundreds of kb) in intergenic space and show that expression of these regions is associated with neoplastic transformation. These overlap some regions found previously in normal human embryonic tissues and raise an interesting hypothesis as to the function of these ncRNAs in both early development and neoplastic transformation.
CONCLUSIONS: We conclude that 'dark matter' RNA can constitute the majority of non-ribosomal, non-mitochondrial-RNA and a significant fraction arises from numerous very long, intergenic transcribed regions that could be involved in neoplastic transformation.


Subject(s)
Genome, Human , Molecular Sequence Annotation/standards , RNA, Nuclear/genetics , Adolescent , Animals , Bone Neoplasms/genetics , Bone Neoplasms/metabolism , Bone Neoplasms/pathology , Brain/metabolism , Drosophila/genetics , Genome, Human/genetics , Genome, Insect , Humans , K562 Cells , Knowledge Bases , Liver/metabolism , Molecular Sequence Annotation/trends , Neoplasm Metastasis/genetics , RNA/genetics , RNA, Mitochondrial , RNA, Nuclear/metabolism , RNA, Ribosomal/genetics , Sarcoma, Ewing/genetics , Sarcoma, Ewing/metabolism , Sarcoma, Ewing/pathology , Sequence Analysis, RNA/standards