Search | Biblioteca Virtual em Saúde (Virtual Health Library)

A joint NCBI and EMBL-EBI transcript set for clinical genomics and research.

Morales, Joannella; Pujar, Shashikant; Loveland, Jane E; Astashyn, Alex; Bennett, Ruth; Berry, Andrew; Cox, Eric; Davidson, Claire; Ermolaeva, Olga; Farrell, Catherine M; Fatima, Reham; Gil, Laurent; Goldfarb, Tamara; Gonzalez, Jose M; Haddad, Diana; Hardy, Matthew; Hunt, Toby; Jackson, John; Joardar, Vinita S; Kay, Michael; Kodali, Vamsi K; McGarvey, Kelly M; McMahon, Aoife; Mudge, Jonathan M; Murphy, Daniel N; Murphy, Michael R; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Thibaud-Nissen, Françoise; Threadgold, Glen; Vatsan, Anjana R; Wallin, Craig; Webb, David; Flicek, Paul; Birney, Ewan; Pruitt, Kim D; Frankish, Adam; Cunningham, Fiona; Murphy, Terence D.

Nature ; 604(7905): 310-315, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.

Subjects

Computational Biology; Databases, Genetic; Genomics; Genome; Humans; Information Dissemination; Molecular Sequence Annotation; National Library of Medicine (U.S.); United States

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi.

Nucleic Acids Res ; 44(D1): D733-45, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26553804

ABSTRACT

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

Subjects

Databases, Genetic; Genomics; Animals; Cattle; Gene Expression Profiling; Genome, Fungal; Genome, Human; Genome, Microbial; Genome, Plant; Genome, Viral; Genomics/standards; Humans; Invertebrates/genetics; Mice; Molecular Sequence Annotation; Nematoda/genetics; Phylogeny; RNA, Long Noncoding/genetics; Rats; Reference Standards; Sequence Analysis, Protein; Sequence Analysis, RNA; Vertebrates/genetics
