ABSTRACT
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.
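The exact-match criterion mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each transcript is represented as a set of exon coordinates on the same reference assembly, so that identical exon coordinates imply identical exonic sequence. The `Transcript` type, the function names and the accessions are hypothetical and are not taken from the MANE pipeline.

```python
from typing import NamedTuple, Tuple


class Transcript(NamedTuple):
    accession: str                      # hypothetical identifier, illustration only
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]  # (start, end) pairs, UTR exons included


def exonic_structure(tx: Transcript):
    """Return an order-independent key describing the exons of a transcript."""
    return (tx.chrom, tx.strand, tuple(sorted(tx.exons)))


def is_exact_match(a: Transcript, b: Transcript) -> bool:
    """True only if both transcripts have identical exon boundaries end to end."""
    return exonic_structure(a) == exonic_structure(b)


# Made-up example records: same exons listed in a different order.
ensembl_tx = Transcript("ENST_EXAMPLE", "chr1", "+", ((100, 200), (300, 400)))
refseq_tx = Transcript("NM_EXAMPLE", "chr1", "+", ((300, 400), (100, 200)))
# Same coding exons, but the last exon ends 10 bp later (for example a longer UTR).
refseq_longer_utr = Transcript("NM_EXAMPLE2", "chr1", "+", ((100, 200), (300, 410)))
```

In this sketch a difference of a single base at any exon boundary, including the untranslated regions, is enough to reject the pair, which is the sense in which two identifiers could be used synonymously.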
Subject(s)
Computational Biology , Databases, Genetic , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States
ABSTRACT
Eukaryotic genomes contain many nongenic elements that function in gene regulation, chromosome organization, recombination, repair, or replication, and mutation of those elements can affect genome function and cause disease. Although numerous epigenomic studies provide high coverage of gene regulatory regions, those data are not usually exposed in traditional genome annotation and can be difficult to access and interpret without field-specific expertise. The National Center for Biotechnology Information (NCBI) therefore provides RefSeq Functional Elements (RefSeqFEs), which represent experimentally validated human and mouse nongenic elements derived from the literature. The curated data set comprises richly annotated sequence records, descriptive records in the NCBI Gene database, reference genome feature annotation, and activity-based interactions between nongenic regions, target genes, and each other. The data set provides succinct functional details and transparent experimental evidence, leverages data from multiple experimental sources, is readily accessible and adaptable, and uses a flexible data model. The data have multiple uses: for basic functional discovery, bioinformatics studies, and genetic variant interpretation; as known positive controls for epigenomic data evaluation; and as reference standards for functional interactions. Comparisons to other gene regulatory data sets show that the RefSeqFE data set includes a wider range of feature types representing more areas of biology, but it is comparatively smaller and subject to data selection biases. RefSeqFEs thus provide an alternative and complementary resource for experimentally assayed functional elements, with future data set growth expected.
Subject(s)
Computational Biology , Genome , Animals , Databases, Genetic , Eukaryota/genetics , Humans , Mice , Reference Standards
ABSTRACT
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.
Subject(s)
Consensus Sequence , Databases, Genetic , Open Reading Frames , Animals , Data Curation/methods , Data Curation/standards , Databases, Genetic/standards , Guidelines as Topic , Humans , Mice , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States , User-Computer Interface
ABSTRACT
The perpetual arms race between bacteria and phage has resulted in the evolution of efficient resistance systems that protect bacteria from phage infection. Such systems, which include the CRISPR-Cas and restriction-modification systems, have proven to be invaluable in the biotechnology and dairy industries. Here, we report on a six-gene cassette in Bacillus cereus which, when integrated into the Bacillus subtilis genome, confers resistance to a broad range of phages, including both virulent and temperate ones. This cassette includes a putative Lon-like protease, an alkaline phosphatase domain protein, a putative RNA-binding protein, a DNA methylase, an ATPase-domain protein, and a protein of unknown function. We denote this novel defense system BREX (Bacteriophage Exclusion) and show that it allows phage adsorption but blocks phage DNA replication. Furthermore, our results suggest that methylation on non-palindromic TAGGAG motifs in the bacterial genome guides self/non-self discrimination and is essential for the defensive function of the BREX system. However, unlike restriction-modification systems, phage DNA does not appear to be cleaved or degraded by BREX, suggesting a novel mechanism of defense. Pan-genomic analysis revealed that BREX and BREX-like systems, including the distantly related Pgl system described in Streptomyces coelicolor, are widely distributed in ~10% of all sequenced microbial genomes and can be divided into six coherent subtypes in which the gene composition and order are conserved. Finally, we detected a phage family that evades the BREX defense, implying that anti-BREX mechanisms may have evolved in some phages as part of their arms race with bacteria.
Subject(s)
Bacillus subtilis/virology , Bacteriophages/genetics , Bacteriophages/pathogenicity , DNA Methylation , DNA Modification Methylases/genetics , Genome, Microbial , Virulence/genetics , Bacillus subtilis/genetics , Bacteriophages/growth & development , Biological Evolution , DNA Modification Methylases/metabolism , DNA, Bacterial/genetics , DNA, Viral/genetics , Phylogeny
ABSTRACT
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
Subject(s)
Databases, Genetic , Genomics , Animals , Cattle , Gene Expression Profiling , Genome, Fungal , Genome, Human , Genome, Microbial , Genome, Plant , Genome, Viral , Genomics/standards , Humans , Invertebrates/genetics , Mice , Molecular Sequence Annotation , Nematoda/genetics , Phylogeny , RNA, Long Noncoding/genetics , Rats , Reference Standards , Sequence Analysis, Protein , Sequence Analysis, RNA , Vertebrates/genetics
ABSTRACT
Complete and accurate annotation of the mouse genome is critical to the advancement of research conducted on this important model organism. The National Center for Biotechnology Information (NCBI) develops and maintains many useful resources to assist the mouse research community. In particular, the reference sequence (RefSeq) database provides high-quality annotation of multiple mouse genome assemblies using a combinatorial approach that leverages computation, manual curation, and collaboration. Implementation of this conservative and rigorous approach, which focuses on representation of only full-length and non-redundant data, produces high-quality annotation products. RefSeq records explicitly link sequences to current knowledge in a timely manner, updating public records regularly and rapidly in response to nomenclature updates, addition of new relevant publications, collaborator discussion, and user feedback. Whole genome re-annotation is also conducted at least every 12-18 months, and often more frequently in response to assembly updates or availability of informative data. This article highlights key features and advantages of RefSeq genome annotation products and presents an overview of NCBI processes to generate these data. Further discussion of NCBI's resources highlights useful features and the best methods for accessing our data.
Subject(s)
Amino Acid Sequence/genetics , Databases, Genetic , Databases, Nucleic Acid , Genome , Animals , Internet , Mice
ABSTRACT
Recombination between homologous chromosomes of different parental origin (homologs) is necessary for their accurate segregation during meiosis. It has been suggested that meiotic inter-homolog recombination is promoted by a barrier to inter-sister-chromatid recombination, imposed by meiosis-specific components of the chromosome axis. Consistent with this, measures of Holliday junction-containing recombination intermediates (joint molecules [JMs]) show a strong bias towards inter-homolog and against inter-sister JMs. However, recombination between sister chromatids also has an important role in meiosis. The genomes of diploid organisms in natural populations are highly polymorphic for insertions and deletions, and meiotic double-strand breaks (DSBs) that form within such polymorphic regions must be repaired by inter-sister recombination. Efforts to study inter-sister recombination during meiosis, in particular to determine recombination frequencies and mechanisms, have been constrained by the inability to monitor the products of inter-sister recombination. We present here molecular-level studies of inter-sister recombination during budding yeast meiosis. We examined events initiated by DSBs in regions that lack corresponding sequences on the homolog, and show that these DSBs are efficiently repaired by inter-sister recombination. This occurs with the same timing as inter-homolog recombination, but with reduced (2- to 3-fold) yields of JMs. Loss of the meiotic-chromosome-axis-associated kinase Mek1 accelerates inter-sister DSB repair and markedly increases inter-sister JM frequencies. Furthermore, inter-sister JMs formed in mek1Δ mutants are preferentially lost, while inter-homolog JMs are maintained. These findings indicate that inter-sister recombination occurs frequently during budding yeast meiosis, with the possibility that up to one-third of all recombination events occur between sister chromatids. 
We suggest that a Mek1-dependent reduction in the rate of inter-sister repair, combined with the destabilization of inter-sister JMs, promotes inter-homolog recombination while retaining the capacity for inter-sister recombination when inter-homolog recombination is not possible.
Subject(s)
Chromatids/genetics , DNA Breaks, Double-Stranded , DNA Repair , Meiosis/genetics , Saccharomyces cerevisiae/genetics , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
Efficient repair of DNA double-strand breaks (DSBs) requires the coordination of checkpoint signaling and enzymatic repair functions. To study these processes during gene conversion at a single chromosomal break, we monitored mating-type switching in Saccharomyces cerevisiae strains defective in the Rad1-Rad10-Slx4 complex. Rad1-Rad10 is a structure-specific endonuclease that removes 3' nonhomologous single-stranded ends that are generated during many recombination events. Slx4 is a known target of the DNA damage response that forms a complex with Rad1-Rad10 and is critical for 3'-end processing during repair of DSBs by single-strand annealing. We found that mutants lacking an intact Rad1-Rad10-Slx4 complex displayed RAD9- and MAD2-dependent cell cycle delays and decreased viability during mating-type switching. In particular, these mutants exhibited a unique pattern of dead and switched daughter cells arising from the same DSB-containing cell. Furthermore, we observed that mutations in post-replicative lesion bypass factors (mms2Δ, mph1Δ) resulted in decreased viability during mating-type switching and conferred shorter cell cycle delays in rad1Δ mutants. We conclude that Rad1-Rad10-Slx4 promotes efficient repair during gene conversion events involving a single 3' nonhomologous tail and propose that the rad1Δ and slx4Δ mutant phenotypes result from inefficient repair of a lesion at the MAT locus that is bypassed by replication-mediated repair.
Subject(s)
DNA Repair , DNA-Binding Proteins/genetics , Endodeoxyribonucleases/genetics , Endonucleases/genetics , Mutation , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Cell Cycle , Cell Survival , DNA Breaks, Double-Stranded , DNA Damage , DNA Repair Enzymes , DNA-Binding Proteins/metabolism , Endodeoxyribonucleases/metabolism , Endonucleases/metabolism , Models, Genetic , MutS Homolog 2 Protein/analysis , MutS Homolog 2 Protein/metabolism , Phenotype , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/analysis , Saccharomyces cerevisiae Proteins/metabolism , Single-Strand Specific DNA and RNA Endonucleases
ABSTRACT
The Saccharomyces cerevisiae mismatch repair (MMR) protein MSH6 and the SGS1 helicase were recently shown to play similarly important roles in preventing recombination between divergent DNA sequences in a single-strand annealing (SSA) assay. In contrast, MMR factors such as Mlh1p, Pms1p, and Exo1p were shown to not be required or to play only minimal roles. In this study we tested mutations that disrupt Sgs1p helicase activity, Msh2p-Msh6p mismatch recognition, and ATP binding and hydrolysis activities for their effect on preventing recombination between divergent DNA sequences (heteroduplex rejection) during SSA. The results support a model in which the Msh proteins act with Sgs1p to unwind DNA recombination intermediates containing mismatches. Importantly, msh2 mutants that displayed separation-of-function phenotypes with respect to nonhomologous tail removal during SSA and heteroduplex rejection were characterized. These studies suggest that nonhomologous tail removal is a separate function of Msh proteins that is likely to involve a distinct DNA binding activity. The involvement of Sgs1p in heteroduplex rejection but not nonhomologous tail removal further illustrates that subsets of MMR proteins collaborate with factors in different DNA repair pathways to maintain genome stability.
Subject(s)
Base Pair Mismatch , DNA Repair/genetics , Nucleic Acid Heteroduplexes/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Adenosine Triphosphate/metabolism , DNA Helicases/genetics , DNA Helicases/metabolism , DNA Replication , DNA, Fungal , Models, Genetic , Mutation , Recombination, Genetic , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
Chromatin immunoprecipitation is a technique that allows one to examine the in vivo localization of proteins to DNA. This technique is well suited for studying genetic recombination since it can provide both a temporal and spatial assessment of the dynamic association of proteins with DNA in both wild-type and mutant backgrounds. To perform this procedure, cells undergoing a synchronous recombination event are treated with a crosslinking agent. Following cell lysis and shearing of the DNA, immunoprecipitation is used to isolate the protein of interest, along with any DNA that is crosslinked to the protein. Polymerase chain reaction (PCR) is then used to determine the relative amounts of DNA associated with the protein of interest throughout the recombination event. This in vivo chemical crosslinking technique can be used to localize proteins to both double-strand breaks and recombination intermediates.
Subject(s)
Chromatin/genetics , Recombination, Genetic , Saccharomyces cerevisiae/genetics , Chromatin/isolation & purification , Chromosomes, Fungal/genetics , Crosses, Genetic , DNA, Fungal/genetics , DNA, Fungal/isolation & purification , Indicators and Reagents , Models, Genetic , Polymerase Chain Reaction/methods
ABSTRACT
Recombination between moderately divergent DNA sequences is impaired compared with identical sequences. In yeast, an HO endonuclease-induced double-strand break can be repaired by single-strand annealing (SSA) between flanking homologous sequences. A 3% sequence divergence between 205-bp sequences flanking the double-strand break caused a 6-fold reduction in repair compared with identical sequences. This reduction, attributed to heteroduplex rejection, was suppressed in a mismatch repair-defective msh6Δ strain and partially suppressed in an msh2 separation-of-function mutant. In mlh1Δ strains, heteroduplex rejection was greater than in msh6Δ strains but less than in wild type. Deleting PMS1, MLH2, or MLH3 had no effect on heteroduplex rejection, but a pms1Δ mlh2Δ mlh3Δ triple mutant resembled mlh1Δ. However, correction of the mismatches within heteroduplex SSA intermediates required PMS1 and MLH1 to the same extent as MSH2 and MSH6. An SSA competition assay in which either diverged or identical repeats can be used for repair showed that heteroduplex DNA is likely to be unwound rather than degraded. This conclusion is supported by the finding that deleting the SGS1 helicase also suppressed heteroduplex rejection.
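To put the divergence figure in the abstract above into concrete terms, the following minimal sketch computes percent divergence between two aligned, equal-length sequences and the number of mismatches implied by a given divergence. The function names and the toy sequences are assumptions for illustration. The abstract itself does not state a mismatch count, so the value derived here (about six for 3% over 205 bp) is simple arithmetic from the stated numbers, not a reported result.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of mismatched positions between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return 100.0 * mismatches / len(seq_a)


def implied_mismatches(length_bp: int, divergence_pct: float) -> float:
    """Number of mismatches implied by a divergence over a given length."""
    return length_bp * divergence_pct / 100.0


# Values taken from the abstract: 205-bp repeats with 3% divergence.
approx_mismatches = implied_mismatches(205, 3.0)

# Toy sequences (made up): one mismatch in four positions.
toy_divergence = percent_divergence("ACGT", "ACGA")
```

Here `approx_mismatches` comes out slightly above six, so the heteroduplex formed during annealing of such repeats would carry roughly six mismatched positions.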