Results 1 - 20 of 87
1.
Cell ; 133(4): 727-41, 2008 May 16.
Article in English | MEDLINE | ID: mdl-18485879

ABSTRACT

p53 and p19(ARF) are tumor suppressors frequently mutated in human tumors. In a high-throughput screen in mice for mutations collaborating with either p53 or p19(ARF) deficiency, we identified 10,806 retroviral insertion sites, implicating over 300 loci in tumorigenesis. This dataset reveals 20 genes that are specifically mutated in either p19(ARF)-deficient, p53-deficient or wild-type mice (including Flt3, mmu-mir-106a-363, Smg6, and Ccnd3), as well as networks of significant collaborative and mutually exclusive interactions between cancer genes. Furthermore, we found candidate tumor suppressor genes, as well as distinct clusters of insertions within genes like Flt3 and Notch1 that induce mutants with different spectra of genetic interactions. Cross species comparative analysis with aCGH data of human cancer cell lines revealed known and candidate oncogenes (Mmp13, Slamf6, and Rreb1) and tumor suppressors (Wwox and Arfrp2). This dataset should prove to be a rich resource for the study of genetic interactions that underlie tumorigenesis.


Subjects
Cyclin-Dependent Kinase Inhibitor p16/metabolism, Gene Regulatory Networks, Tumor Suppressor Genes, Neoplasms/genetics, Tumor Suppressor Protein p53/metabolism, Animals, Tumor Cell Line, Molecular Cloning, Cyclin-Dependent Kinase Inhibitor p16/genetics, p53 Genes, Genomics/methods, Humans, Mice, Knockout Mice, Insertional Mutagenesis, Neoplasms/metabolism, DNA Sequence Analysis
2.
Br J Clin Pharmacol ; 88(10): 4297-4310, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34907575

ABSTRACT

Pharmacogenomics (PGx) relates to the study of genetic factors determining variability in drug response. Implementing PGx testing in paediatric patients can enhance drug safety, helping to improve drug efficacy or reduce the risk of toxicity. Despite its clinical relevance, the implementation of PGx testing in paediatric practice to date has been variable and limited. As with most paediatric pharmacological studies, there are well-recognised barriers to obtaining high-quality PGx evidence, particularly when patient numbers may be small, and off-label or unlicensed prescribing remains widespread. Furthermore, trials enrolling small numbers of children can rarely, in isolation, provide sufficient PGx evidence to change clinical practice, so extrapolation from larger PGx studies in adult patients, where scientifically sound, is essential. This review paper discusses the relevance of PGx to paediatrics and considers implementation strategies from a child health perspective. Examples are provided from Canada, the Netherlands and the UK, with consideration of the different healthcare systems and their distinct approaches to implementation, followed by future recommendations based on these cumulative experiences. Improving the evidence base demonstrating the clinical utility and cost-effectiveness of paediatric PGx testing will be critical to drive implementation forwards. International, interdisciplinary collaborations will enhance paediatric data collation, interpretation and evidence curation, while also supporting dedicated paediatric PGx educational initiatives. PGx consortia and paediatric clinical research networks will continue to play a central role in the streamlined development of effective PGx implementation strategies to help optimise paediatric pharmacotherapy.


Subjects
Pediatrics, Pharmacogenomic Testing, Child, Cost-Benefit Analysis, Humans, Netherlands, Pharmacogenetics
3.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subjects
Genetic Databases, Human Genome/genetics, Genomics, Pseudogenes/genetics, Animals, Computational Biology, Humans, Internet, Mice, Molecular Sequence Annotation, Software
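
The abstract of record 3 names the Ensembl REST API as one route to the GENCODE annotation. As a minimal sketch of what such a request could look like, the Python below builds a URL for the `lookup/symbol` endpoint on the public server at rest.ensembl.org. The helper names `lookup_url` and `lookup_gene` are hypothetical and exist only for this illustration; the abstract itself does not describe any particular client code.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Public Ensembl REST server; not stated in the abstract, assumed here.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(symbol, species="homo_sapiens"):
    """Build the URL for the `lookup/symbol/:species/:symbol` endpoint.

    The content-type query parameter asks the server for a JSON reply.
    """
    return (
        f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
        "?content-type=application/json"
    )


def lookup_gene(symbol, species="homo_sapiens", timeout=30):
    """Fetch the gene record for a symbol as a dict (needs network access)."""
    with urlopen(lookup_url(symbol, species), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `lookup_gene("BRCA2")` on a networked machine should return a dictionary describing the gene; the exact fields depend on the Ensembl release being served, so they are not listed here.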
4.
Dement Geriatr Cogn Disord ; 49(3): 295-302, 2020.
Article in English | MEDLINE | ID: mdl-32854092

ABSTRACT

INTRODUCTION: Caregivers for people with dementia face a number of challenges such as changing family relationships, social isolation, or financial difficulties. Internet usage and social media are increasingly being recognised as resources to increase support and general public health. OBJECTIVE: Using automated analysis, the aim of this study was to explore (i) the age and sex of people who post to the social media forum Reddit about dementia diagnoses, (ii) the affected person and their diagnosis, (iii) which subreddits authors are posting to, (iv) the types of messages posted, and (v) the content of these posts. METHODS: We analysed Reddit posts concerning dementia diagnoses and used a previously developed text analysis pipeline to determine attributes of the posts and their authors. The posts were further examined through manual annotation of the diagnosis provided and the person affected. Lastly, we investigated the communities posters engage with and assessed the contents of the posts with an automated topic gathering/clustering technique. RESULTS: Five hundred and thirty-five Reddit posts were identified as relevant and further processed. The majority of posters in our dataset are females and predominantly close relatives, such as parents and grandparents, are mentioned. The communities frequented and topics gathered reflect not only the person's diagnosis but also potential outcomes, for example hardships experienced by the caregiver or the requirement for legal support. CONCLUSIONS: This work demonstrates the value of social media data as a resource for in-depth examination of caregivers' experience after a dementia diagnosis. It is important to study groups actively posting online, both in topic-specific and general communities, as they are most likely to benefit from novel internet-based support systems or interventions.


Subjects
Caregivers/psychology, Dementia, Internet-Based Intervention/statistics & numerical data, Social Media/statistics & numerical data, Social Support, Dementia/diagnosis, Dementia/economics, Dementia/psychology, Family Relations, Financial Stress, Humans, Social Isolation
5.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subjects
Caenorhabditis elegans/genetics, Drosophila melanogaster/genetics, Gene Expression Profiling, Transcriptome/genetics, Animals, Caenorhabditis elegans/embryology, Caenorhabditis elegans/growth & development, Chromatin/genetics, Cluster Analysis, Drosophila melanogaster/growth & development, Developmental Gene Expression Regulation/genetics, Histones/metabolism, Humans, Larva/genetics, Larva/growth & development, Genetic Models, Molecular Sequence Annotation, Genetic Promoter Regions/genetics, Pupa/genetics, Pupa/growth & development, Untranslated RNA/genetics, RNA Sequence Analysis
6.
Proc Natl Acad Sci U S A ; 111(37): 13361-6, 2014 Sep 16.
Article in English | MEDLINE | ID: mdl-25157146

ABSTRACT

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.


Subjects
Caenorhabditis elegans/genetics, Drosophila melanogaster/genetics, Phylogeny, Pseudogenes/genetics, Animals, Molecular Evolution, Genetic Association Studies, Humans, Molecular Sequence Annotation, Genetic Promoter Regions/genetics, Nucleic Acid Sequence Homology
7.
Proc Natl Acad Sci U S A ; 111(17): 6131-8, 2014 Apr 29.
Article in English | MEDLINE | ID: mdl-24753594

ABSTRACT

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.


Subjects
DNA/genetics, Human Genome/genetics, Biological Evolution, Disease/genetics, Humans, Nucleic Acid Regulatory Sequences/genetics, Software
8.
Nat Methods ; 10(12): 1177-84, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24185837

ABSTRACT

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.


Subjects
Computational Biology/methods, RNA Splicing, RNA Sequence Analysis/methods, Algorithms, Animals, Caenorhabditis elegans, Drosophila melanogaster, Exons, Gene Expression Profiling, Genome, Humans, Introns, RNA Splice Sites, Messenger RNA/metabolism, Software
9.
Nat Methods ; 10(12): 1185-91, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24185836

ABSTRACT

High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.


Subjects
RNA Splicing, Sequence Alignment/methods, RNA Sequence Analysis/methods, Animals, Chromosome Mapping/methods, Computational Biology/methods, Exons, False Positive Reactions, High-Throughput Nucleotide Sequencing/methods, Humans, K562 Cells, Mice, Messenger RNA/metabolism, Reproducibility of Results, Software
10.
Bioinformatics ; 31(24): 4029-31, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26315906

ABSTRACT

High-throughput sequencing technologies survey genetic variation at genome scale and are increasingly used to study the contribution of rare and low-frequency genetic variants to human traits. As part of the Cohorts arm of the UK10K project, genetic variants called from low-read depth (average 7×) whole genome sequencing of 3621 cohort individuals were analysed for statistical associations with 64 different phenotypic traits of biomedical importance. Here, we describe a novel genome browser based on the Biodalliance platform developed to provide interactive access to the association results of the project. AVAILABILITY AND IMPLEMENTATION: The browser is available at http://www.uk10k.org/dalliance.html. Source code for the Biodalliance platform is available under a BSD license from http://github.com/dasmoth/dalliance, and for the LD-display plugin and backend from http://github.com/dasmoth/ldserv.


Subjects
Genetic Association Studies, Genetic Variation, Human Genome, Software, High-Throughput Nucleotide Sequencing, Humans, Linkage Disequilibrium
11.
Nucleic Acids Res ; 42(Database issue): D749-55, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24316576

ABSTRACT

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.


Subjects
Genetic Databases, Genomics, Animals, Chordata/genetics, Genetic Variation, Humans, Internet, Mice, Molecular Sequence Annotation, Phenotype, Rats
12.
Nucleic Acids Res ; 42(Database issue): D865-72, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24217909

ABSTRACT

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.


Subjects
Genetic Databases, Proteins/genetics, Animals, Exons, Genomics, Humans, Internet, Mice, Molecular Sequence Annotation, Sequence Analysis
13.
Genome Res ; 22(9): 1698-710, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955982

ABSTRACT

Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient to screen gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail sampling of low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.


Subjects
Gene Expression Profiling/methods, Human Genome, Transcriptome, Computational Biology/methods, Exons, High-Throughput Nucleotide Sequencing, Humans, Introns, Molecular Sequence Annotation, Open Reading Frames, RNA Isoforms, Messenger RNA/chemistry, Messenger RNA/genetics, Reproducibility of Results, Reverse Transcriptase Polymerase Chain Reaction, Sensitivity and Specificity
14.
Genome Res ; 22(9): 1775-89, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955988

ABSTRACT

The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to that of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally lower expressed than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.


Subjects
Genetic Databases, Long Noncoding RNA/genetics, Alternative Splicing, Animals, Cell Nucleus/genetics, Cell Nucleus/metabolism, Cluster Analysis, Molecular Evolution, Exons, Gene Expression Profiling, Gene Expression Regulation, Histones/metabolism, Humans, Molecular Sequence Annotation, Open Reading Frames, Organ Specificity/genetics, Primates/genetics, Post-Transcriptional RNA Processing, RNA Splice Sites, Messenger RNA/genetics, Genetic Selection, Genetic Transcription
15.
Genome Res ; 22(9): 1760-74, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955987

ABSTRACT

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.


Subjects
Genetic Databases, Human Genome, Genomics/methods, Molecular Sequence Annotation, Animals, Computational Biology/methods, Complementary DNA/chemistry, Complementary DNA/genetics, Molecular Evolution, Exons, Genetic Loci, Humans, Internet, Molecular Models, Open Reading Frames, Pseudogenes, Quality Control, RNA Splice Sites, Long Noncoding RNA, Reproducibility of Results, Untranslated Regions
17.
Nucleic Acids Res ; 41(Database issue): D48-55, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203987

ABSTRACT

The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.


Subjects
Genetic Databases, Genomics, Animals, Gene Expression Regulation, Genetic Variation, Humans, Internet, Mice, Molecular Sequence Annotation, Rats, Software, Zebrafish/genetics
18.
Am J Med Genet C Semin Med Genet ; 166C(1): 93-104, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24634402

ABSTRACT

Genome-wide association studies, DNA sequencing studies, and other genomic studies are finding an increasing number of genetic variants associated with clinical phenotypes that may be useful in developing diagnostic, preventive, and treatment strategies for individual patients. However, few variants have been integrated into routine clinical practice. The reasons for this are several, but two of the most significant are limited evidence about the clinical implications of the variants and a lack of a comprehensive knowledge base that captures genetic variants, their phenotypic associations, and other pertinent phenotypic information that is openly accessible to clinical groups attempting to interpret sequencing data. As the field of medicine begins to incorporate genome-scale analysis into clinical care, approaches need to be developed for collecting and characterizing data on the clinical implications of variants, developing consensus on their actionability, and making this information available for clinical use. The National Human Genome Research Institute (NHGRI) and the Wellcome Trust thus convened a workshop to consider the processes and resources needed to: (1) identify clinically valid genetic variants; (2) decide whether they are actionable and what the action should be; and (3) provide this information for clinical use. This commentary outlines the key discussion points and recommendations from the workshop.


Subjects
Genetic Variation/genetics, Medical Informatics/methods, Phenotype, Precision Medicine/methods, Education, Humans, Information Dissemination/methods, National Human Genome Research Institute (U.S.), Precision Medicine/trends, United States
19.
Genome Res ; 21(5): 756-67, 2011 May.
Article in English | MEDLINE | ID: mdl-21460061

ABSTRACT

Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching an excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation available in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).


Subjects
Alternative Splicing, Genes, Peptides/genetics, Proteomics/methods, Pseudogenes/genetics, Tandem Mass Spectrometry/methods, Animals, Genome, Genomics/methods, Mice, Peptides/chemistry
20.
Nucleic Acids Res ; 40(Database issue): D84-90, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22086963

ABSTRACT

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.


Subjects
Genetic Databases, Genomics, Animals, Gene Expression Regulation, Genetic Variation, Humans, Mice, Molecular Sequence Annotation, Rats