Search | Secretaria de Estado da Saúde

1.

A joint NCBI and EMBL-EBI transcript set for clinical genomics and research.

Morales, Joannella; Pujar, Shashikant; Loveland, Jane E; Astashyn, Alex; Bennett, Ruth; Berry, Andrew; Cox, Eric; Davidson, Claire; Ermolaeva, Olga; Farrell, Catherine M; Fatima, Reham; Gil, Laurent; Goldfarb, Tamara; Gonzalez, Jose M; Haddad, Diana; Hardy, Matthew; Hunt, Toby; Jackson, John; Joardar, Vinita S; Kay, Michael; Kodali, Vamsi K; McGarvey, Kelly M; McMahon, Aoife; Mudge, Jonathan M; Murphy, Daniel N; Murphy, Michael R; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Thibaud-Nissen, Françoise; Threadgold, Glen; Vatsan, Anjana R; Wallin, Craig; Webb, David; Flicek, Paul; Birney, Ewan; Pruitt, Kim D; Frankish, Adam; Cunningham, Fiona; Murphy, Terence D.

Nature ; 604(7905): 310-315, 2022 04.

Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.

Subjects

Computational Biology , Databases, Genetic , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States
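
The abstract of this record describes a MANE transcript as an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its RefSeq counterpart. The following is only a minimal sketch of what such a check could look like; the `Transcript` class, the coordinate convention, and the accessions are hypothetical placeholders and are not taken from the MANE pipeline, whose actual criteria are not detailed here.

```python
from dataclasses import dataclass
from typing import Tuple

# (start, end) per exon; 1-based inclusive coordinates are an assumption.
Exon = Tuple[int, int]


@dataclass(frozen=True)
class Transcript:
    """Hypothetical, simplified transcript record for illustration only."""
    accession: str
    chrom: str
    strand: str
    exons: Tuple[Exon, ...]
    sequence: str  # spliced exonic sequence


def is_exact_match(a: Transcript, b: Transcript) -> bool:
    """True when both records share location, exon structure and sequence."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and sorted(a.exons) == sorted(b.exons)
        and a.sequence.upper() == b.sequence.upper()
    )


# Made-up records; the accessions are placeholders, not real identifiers.
ens = Transcript("ENST_EXAMPLE.1", "chr1", "+", ((100, 104), (200, 203)), "ACGTAGGCT")
ref = Transcript("NM_EXAMPLE.1", "chr1", "+", ((100, 104), (200, 203)), "acgtaggct")
shifted = Transcript("NM_EXAMPLE.2", "chr1", "+", ((100, 104), (200, 206)), "ACGTAGGCTAAA")

match_ok = is_exact_match(ens, ref)
match_shifted = is_exact_match(ens, shifted)
```

Under these assumptions, only pairs that pass such a check could have their identifiers used interchangeably; a pair that differs in a single exon boundary, like `shifted`, would not qualify.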

2.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Beck, Jeff; Bolton, Evan E; Brister, J Rodney; Chan, Jessica; Comeau, Donald C; Connor, Ryan; DiCuccio, Michael; Farrell, Catherine M; Feldgarden, Michael; Fine, Anna M; Funk, Kathryn; Hatcher, Eneida; Hoeppner, Marilu; Kane, Megan; Kannan, Sivakumar; Katz, Kenneth S; Kelly, Christopher; Klimke, William; Kim, Sunghwan; Kimchi, Avi; Landrum, Melissa; Lathrop, Stacy; Lu, Zhiyong; Malheiro, Adriana; Marchler-Bauer, Aron; Murphy, Terence D; Phan, Lon; Prasad, Arjun B; Pujar, Shashikant; Sawyer, Amanda; Schmieder, Erin; Schneider, Valerie A; Schoch, Conrad L; Sharma, Shobha; Thibaud-Nissen, Françoise; Trawick, Barton W; Venkatapathi, Thilakam; Wang, Jiyao; Pruitt, Kim D; Sherry, Stephen T.

Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37994677

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.

Subjects

Databases, Genetic , National Library of Medicine (U.S.) , Biotechnology/instrumentation , Databases, Nucleic Acid , Internet , United States
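
The abstract of this record names the E-utilities as the programming interface for most NCBI databases. As a rough illustration, the sketch below only builds request URLs for the ESearch and EFetch endpoints and performs no network access. The helper names are hypothetical, and it assumes that the numeric part of the MEDLINE ID shown in this listing (for example 37994677) is the PubMed ID.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str) -> str:
    """Build an ESearch URL that looks up record IDs matching a query term."""
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode({'db': db, 'term': term})}"


def efetch_url(db: str, ids, rettype: str = "abstract", retmode: str = "text") -> str:
    """Build an EFetch URL that retrieves the records for the given IDs."""
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),
        "rettype": rettype,
        "retmode": retmode,
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Assumption: 37994677 is the PubMed ID behind "mdl-37994677".
fetch_one = efetch_url("pubmed", [37994677])
search_mane = esearch_url("pubmed", "MANE Select")
```

The resulting strings can be passed to any HTTP client. NCBI publishes usage guidelines for the E-utilities, including request-rate limits, which should be consulted before sending requests in bulk.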

3.

RefSeq Functional Elements as experimentally assayed nongenic reference standards and functional interactions in human and mouse.

Farrell, Catherine M; Goldfarb, Tamara; Rangwala, Sanjida H; Astashyn, Alexander; Ermolaeva, Olga D; Hem, Vichet; Katz, Kenneth S; Kodali, Vamsi K; Ludwig, Frank; Wallin, Craig L; Pruitt, Kim D; Murphy, Terence D.

Genome Res ; 32(1): 175-188, 2022 01.

Article in English | MEDLINE | ID: mdl-34876495

ABSTRACT

Eukaryotic genomes contain many nongenic elements that function in gene regulation, chromosome organization, recombination, repair, or replication, and mutation of those elements can affect genome function and cause disease. Although numerous epigenomic studies provide high coverage of gene regulatory regions, those data are not usually exposed in traditional genome annotation and can be difficult to access and interpret without field-specific expertise. The National Center for Biotechnology Information (NCBI) therefore provides RefSeq Functional Elements (RefSeqFEs), which represent experimentally validated human and mouse nongenic elements derived from the literature. The curated data set is comprised of richly annotated sequence records, descriptive records in the NCBI Gene database, reference genome feature annotation, and activity-based interactions between nongenic regions, target genes, and each other. The data set provides succinct functional details and transparent experimental evidence, leverages data from multiple experimental sources, is readily accessible and adaptable, and uses a flexible data model. The data have multiple uses for basic functional discovery, bioinformatics studies, genetic variant interpretation; as known positive controls for epigenomic data evaluation; and as reference standards for functional interactions. Comparisons to other gene regulatory data sets show that the RefSeqFE data set includes a wider range of feature types representing more areas of biology, but it is comparatively smaller and subject to data selection biases. RefSeqFEs thus provide an alternative and complementary resource for experimentally assayed functional elements, with future data set growth expected.

Subjects

Computational Biology , Genome , Animals , Databases, Genetic , Eukaryota/genetics , Humans , Mice , Reference Standards

4.

Database resources of the National Center for Biotechnology Information in 2023.

Sayers, Eric W; Bolton, Evan E; Brister, J Rodney; Canese, Kathi; Chan, Jessica; Comeau, Donald C; Farrell, Catherine M; Feldgarden, Michael; Fine, Anna M; Funk, Kathryn; Hatcher, Eneida; Kannan, Sivakumar; Kelly, Christopher; Kim, Sunghwan; Klimke, William; Landrum, Melissa J; Lathrop, Stacy; Lu, Zhiyong; Madden, Thomas L; Malheiro, Adriana; Marchler-Bauer, Aron; Murphy, Terence D; Phan, Lon; Pujar, Shashikant; Rangwala, Sanjida H; Schneider, Valerie A; Tse, Tony; Wang, Jiyao; Ye, Jian; Trawick, Barton W; Pruitt, Kim D; Sherry, Stephen T.

Nucleic Acids Res ; 51(D1): D29-D38, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36370100

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.

Subjects

Databases, Genetic , Databases, Nucleic Acid , United States , National Library of Medicine (U.S.) , Sequence Alignment , Biotechnology , Internet

5.

Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D.

Nucleic Acids Res ; 46(D1): D221-D228, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29126148

ABSTRACT

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.

Subjects

Consensus Sequence , Databases, Genetic , Open Reading Frames , Animals , Data Curation/methods , Data Curation/standards , Databases, Genetic/standards , Guidelines as Topic , Humans , Mice , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States , User-Computer Interface
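
The abstract of this record describes the CCDS dataset as coding regions that are identically annotated in two independently produced annotation sets. The sketch below illustrates that idea on toy data by indexing each set on its CDS structure and intersecting the keys. The record names, the coordinates, and the tuple layout are invented for illustration; the quality assurance checks and the assignment of CCDS IDs mentioned in the abstract are not modelled.

```python
from collections import defaultdict


def index_by_structure(annotations):
    """Map each CDS structure to the names of the records that carry it."""
    index = defaultdict(list)
    for name, (chrom, strand, intervals) in annotations.items():
        # Sort the intervals so that their listing order does not matter.
        key = (chrom, strand, tuple(sorted(intervals)))
        index[key].append(name)
    return index


def identical_cds(set_a, set_b):
    """Return the CDS structures present in both sets, with record names."""
    a_index = index_by_structure(set_a)
    b_index = index_by_structure(set_b)
    shared = sorted(a_index.keys() & b_index.keys())
    return [(key, sorted(a_index[key]), sorted(b_index[key])) for key in shared]


# Toy annotation sets; every name and coordinate here is made up.
ncbi_toy = {
    "ncbi_tx1": ("chr1", "+", [(100, 200), (300, 450)]),
    "ncbi_tx2": ("chr1", "+", [(100, 200), (300, 400)]),
}
ensembl_toy = {
    "ens_tx1": ("chr1", "+", [(300, 450), (100, 200)]),
    "ens_tx2": ("chr2", "-", [(50, 90)]),
}

matches = identical_cds(ncbi_toy, ensembl_toy)
```

In this toy case only `ncbi_tx1` and `ens_tx1` share a structure. In a real workflow each shared structure would be a candidate for review rather than an automatic entry, since the abstract also emphasises quality checks and manual curation.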

6.

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi.

Nucleic Acids Res ; 44(D1): D733-45, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26553804

ABSTRACT

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

Subjects

Databases, Genetic , Genomics , Animals , Cattle , Gene Expression Profiling , Genome, Fungal , Genome, Human , Genome, Microbial , Genome, Plant , Genome, Viral , Genomics/standards , Humans , Invertebrates/genetics , Mice , Molecular Sequence Annotation , Nematoda/genetics , Phylogeny , RNA, Long Noncoding/genetics , Rats , Reference Standards , Sequence Analysis, Protein , Sequence Analysis, RNA , Vertebrates/genetics

7.

RefSeq: an update on mammalian reference sequences.

Pruitt, Kim D; Brown, Garth R; Hiatt, Susan M; Thibaud-Nissen, Françoise; Astashyn, Alexander; Ermolaeva, Olga; Farrell, Catherine M; Hart, Jennifer; Landrum, Melissa J; McGarvey, Kelly M; Murphy, Michael R; O'Leary, Nuala A; Pujar, Shashikant; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Shkeda, Andrei; Sun, Hanzhen; Tamez, Pamela; Tully, Raymond E; Wallin, Craig; Webb, David; Weber, Janet; Wu, Wendy; DiCuccio, Michael; Kitts, Paul; Maglott, Donna R; Murphy, Terence D; Ostell, James M.

Nucleic Acids Res ; 42(Database issue): D756-63, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24259432

ABSTRACT

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.

Subjects

Databases, Genetic , Genomics , Mammals/genetics , Animals , Eukaryota/genetics , Exons , Genome , Genomics/standards , Humans , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/genetics , RNA/chemistry , Reference Standards

8.

Current status and new features of the Consensus Coding Sequence database.

Farrell, Catherine M; O'Leary, Nuala A; Harte, Rachel A; Loveland, Jane E; Wilming, Laurens G; Wallin, Craig; Diekhans, Mark; Barrell, Daniel; Searle, Stephen M J; Aken, Bronwen; Hiatt, Susan M; Frankish, Adam; Suner, Marie-Marthe; Rajput, Bhanu; Steward, Charles A; Brown, Garth R; Bennett, Ruth; Murphy, Michael; Wu, Wendy; Kay, Mike P; Hart, Jennifer; Rajan, Jeena; Weber, Janet; Snow, Catherine; Riddick, Lillian D; Hunt, Toby; Webb, David; Thomas, Mark; Tamez, Pamela; Rangwala, Sanjida H; McGarvey, Kelly M; Pujar, Shashikant; Shkeda, Andrei; Mudge, Jonathan M; Gonzalez, Jose M; Gilbert, James G R; Trevanion, Stephen J; Baertsch, Robert; Harrow, Jennifer L; Hubbard, Tim; Ostell, James M; Haussler, David; Pruitt, Kim D.

Nucleic Acids Res ; 42(Database issue): D865-72, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24217909

ABSTRACT

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

Subjects

Databases, Genetic , Proteins/genetics , Animals , Exons , Genomics , Humans , Internet , Mice , Molecular Sequence Annotation , Sequence Analysis

9.

Mouse genome annotation by the RefSeq project.

McGarvey, Kelly M; Goldfarb, Tamara; Cox, Eric; Farrell, Catherine M; Gupta, Tripti; Joardar, Vinita S; Kodali, Vamsi K; Murphy, Michael R; O'Leary, Nuala A; Pujar, Shashikant; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Webb, David; Wright, Mathew W; Murphy, Terence D; Pruitt, Kim D.

Mamm Genome ; 26(9-10): 379-90, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26215545

ABSTRACT

Complete and accurate annotation of the mouse genome is critical to the advancement of research conducted on this important model organism. The National Center for Biotechnology Information (NCBI) develops and maintains many useful resources to assist the mouse research community. In particular, the reference sequence (RefSeq) database provides high-quality annotation of multiple mouse genome assemblies using a combinatorial approach that leverages computation, manual curation, and collaboration. Implementation of this conservative and rigorous approach, which focuses on representation of only full-length and non-redundant data, produces high-quality annotation products. RefSeq records explicitly link sequences to current knowledge in a timely manner, updating public records regularly and rapidly in response to nomenclature updates, addition of new relevant publications, collaborator discussion, and user feedback. Whole genome re-annotation is also conducted at least every 12-18 months, and often more frequently in response to assembly updates or availability of informative data. This article highlights key features and advantages of RefSeq genome annotation products and presents an overview of NCBI processes to generate these data. Further discussion of NCBI's resources highlights useful features and the best methods for accessing our data.

Subjects

Amino Acid Sequence/genetics , Databases, Genetic , Databases, Nucleic Acid , Genome , Animals , Internet , Mice

10.

Conserved CTCF insulator elements flank the mouse and human beta-globin loci.

Farrell, Catherine M; West, Adam G; Felsenfeld, Gary.

Mol Cell Biol ; 22(11): 3820-31, 2002 Jun.

Article in English | MEDLINE | ID: mdl-11997516

ABSTRACT

A binding site for the transcription factor CTCF is responsible for enhancer-blocking activity in a variety of vertebrate insulators, including the insulators at the 5' and 3' chromatin boundaries of the chicken beta-globin locus. To date, no functional domain boundaries have been defined at mammalian beta-globin loci, which are embedded within arrays of functional olfactory receptor genes. In an attempt to define boundary elements that could separate these gene clusters, CTCF-binding sites were searched for at the most distal DNase I-hypersensitive sites (HSs) of the mouse and human beta-globin loci. Conserved CTCF sites were found at 5'HS5 and 3'HS1 of both loci. All of these sites could bind to CTCF in vitro. The sites also functioned as insulators in enhancer-blocking assays at levels correlating with CTCF-binding affinity, although enhancer-blocking activity was weak with the mouse 5'HS5 site. These results show that with respect to enhancer-blocking elements, the architecture of the mouse and human beta-globin loci is similar to that found previously for the chicken beta-globin locus. Unlike the chicken locus, the mouse and human beta-globin loci do not have nearby transitions in chromatin structure but the data suggest that 3'HS1 and 5'HS5 may function as insulators that prevent inappropriate interactions between beta-globin regulatory elements and those of neighboring domains or subdomains, many of which possess strong enhancers.

Subjects

DNA-Binding Proteins/genetics , Globins/genetics , Repressor Proteins , Transcription Factors/genetics , Animals , Base Sequence , Binding Sites/genetics , CCCTC-Binding Factor , Cell Line , Chickens , Conserved Sequence , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/metabolism , Enhancer Elements, Genetic , Humans , In Vitro Techniques , Mice , Models, Genetic , Molecular Sequence Data , Multigene Family , Sequence Homology, Nucleic Acid , Species Specificity , Transcription Factors/metabolism

11.

A complex chromatin landscape revealed by patterns of nuclease sensitivity and histone modification within the mouse beta-globin locus.

Bulger, Michael; Schübeler, Dirk; Bender, M A; Hamilton, Joan; Farrell, Catherine M; Hardison, Ross C; Groudine, Mark.

Mol Cell Biol ; 23(15): 5234-44, 2003 Aug.

Article in English | MEDLINE | ID: mdl-12861010

ABSTRACT

In order to create an extended map of chromatin features within a mammalian multigene locus, we have determined the extent of nuclease sensitivity and the pattern of histone modifications associated with the mouse beta-globin genes in adult erythroid tissue. We show that the nuclease-sensitive domain encompasses the beta-globin genes along with several flanking olfactory receptor genes that are inactive in erythroid cells. We describe enhancer-blocking or boundary elements on either side of the locus that are bound in vivo by the transcription factor CTCF, but we found that they do not coincide with transitions in nuclease sensitivity flanking the locus or with patterns of histone modifications within it. In addition, histone hyperacetylation and dimethylation of histone H3 K4 are not uniform features of the nuclease-sensitive mouse beta-globin domain but rather define distinct subdomains within it. Our results reveal a complex chromatin landscape for the active beta-globin locus and illustrate the complexity of broad structural changes that accompany gene activation.

Subjects

Chromatin/metabolism , Deoxyribonucleases/metabolism , Globins/genetics , Histones/metabolism , Repressor Proteins , Animals , Base Sequence , Binding Sites , Blotting, Southern , CCCTC-Binding Factor , Cell Nucleus/metabolism , Chickens , DNA-Binding Proteins/metabolism , Erythrocytes/metabolism , Globins/metabolism , Humans , K562 Cells , Mice , Models, Genetic , Molecular Sequence Data , Precipitin Tests , Protein Structure, Tertiary , Spleen/metabolism , Transcription Factors/metabolism

12.

Genomic domains and regulatory elements operating at the domain level.

Razin, Sergey V; Farrell, Catherine M; Recillas-Targa, Félix.

Int Rev Cytol ; 226: 63-125, 2003.

Article in English | MEDLINE | ID: mdl-12921236

ABSTRACT

The sequencing of the complete genomes of several organisms, including humans, has so far not contributed much to our understanding of the mechanisms regulating gene expression in the course of realization of developmental programs. In this so-called "postgenomic" era, we still do not understand how (if at all) the long-range organization of the genome is related to its function. The domain hypothesis of the eukaryotic genome organization postulates that the genome is subdivided into a number of semiindependent functional units (domains) that may include one or several functionally related genes, with these domains having well-defined borders, and operate under the control of special (domain-level) regulatory systems. This hypothesis was extensively discussed in the literature over the past 15 years. Yet it is still unclear whether the hypothesis is valid or not. There is evidence both supporting and questioning this hypothesis. The most conclusive data supporting the domain hypothesis come from studies of avian and mammalian beta-globin domains. In this review we will critically discuss the present state of the studies on these and other genomic domains, paying special attention to the domain-level regulatory systems known as locus control regions (LCRs). Based on this discussion, we will try to reevaluate the domain hypothesis of the organization of the eukaryotic genome.

Subjects

DNA/genetics , Gene Expression Regulation , Genes, Regulator/genetics , Genome , Animals , Humans

13.

Tracking and coordinating an international curation effort for the CCDS Project.

Harte, Rachel A; Farrell, Catherine M; Loveland, Jane E; Suner, Marie-Marthe; Wilming, Laurens; Aken, Bronwen; Barrell, Daniel; Frankish, Adam; Wallin, Craig; Searle, Steve; Diekhans, Mark; Harrow, Jennifer; Pruitt, Kim D.

Database (Oxford) ; 2012: bas008, 2012.

Article in English | MEDLINE | ID: mdl-22434842

ABSTRACT

The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a 'gold standard' definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. DATABASE URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi.

Subjects

Consensus Sequence , Database Management Systems , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , Animals , Humans , Mice

14.

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.

Pruitt, Kim D; Harrow, Jennifer; Harte, Rachel A; Wallin, Craig; Diekhans, Mark; Maglott, Donna R; Searle, Steve; Farrell, Catherine M; Loveland, Jane E; Ruef, Barbara J; Hart, Elizabeth; Suner, Marie-Marthe; Landrum, Melissa J; Aken, Bronwen; Ayling, Sarah; Baertsch, Robert; Fernandez-Banet, Julio; Cherry, Joshua L; Curwen, Val; Dicuccio, Michael; Kellis, Manolis; Lee, Jennifer; Lin, Michael F; Schuster, Michael; Shkeda, Andrew; Amid, Clara; Brown, Garth; Dukhanina, Oksana; Frankish, Adam; Hart, Jennifer; Maidak, Bonnie L; Mudge, Jonathan; Murphy, Michael R; Murphy, Terence; Rajan, Jeena; Rajput, Bhanu; Riddick, Lillian D; Snow, Catherine; Steward, Charles; Webb, David; Weber, Janet A; Wilming, Laurens; Wu, Wenyu; Birney, Ewan; Haussler, David; Hubbard, Tim; Ostell, James; Durbin, Richard; Lipman, David.

Genome Res ; 19(7): 1316-23, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19498102

ABSTRACT

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.

Subjects

Consensus Sequence , Genome , Open Reading Frames/genetics , Animals , Humans , Mice , Sequence Alignment

15.

Genome-wide prediction of conserved and nonconserved enhancers by histone acetylation patterns.

Roh, Tae-young; Wei, Gang; Farrell, Catherine M; Zhao, Keji.

Genome Res ; 17(1): 74-81, 2007 Jan.

Article in English | MEDLINE | ID: mdl-17135569

ABSTRACT

Comparative genomic studies have been useful in identifying transcriptional regulatory elements in higher eukaryotic genomes, but many important regulatory elements cannot be detected by such analyses due to evolutionary variations and alignment tool limitations. Therefore, in this study we exploit the highly conserved nature of epigenetic modifications to identify potential transcriptional enhancers. By using a high-resolution genome-wide mapping technique, which combines the chromatin immunoprecipitation and serial analysis of gene expression assays, we have recently determined the distribution of lysine 9/14-diacetylated histone H3 in human T cells. We showed the existence of 46,813 regions with clusters of histone acetylation, termed histone acetylation islands, some of which correspond to known transcriptional regulatory elements. In the present study, we find that 4679 sequences conserved between human and pufferfish coincide with histone acetylation islands, and random sampling shows that 33% (13/39) of these can function as transcriptional enhancers in human Jurkat T cells. In addition, by comparing the human histone acetylation island sequences with mouse genome sequences, we find that despite the conservation of many of these regions between these species, 21,855 of these sequences are not conserved. Furthermore, we demonstrate that about 50% (26/51) of these nonconserved sequences have enhancer activity in Jurkat cells, and that many of the orthologous mouse sequences also have enhancer activity in addition to conserved epigenetic modification patterns in mouse T-cell chromatin. Therefore, by combining epigenetic modification and sequence data, we have established a novel genome-wide method for identifying regulatory elements not discernable by comparative genomics alone.

Subjects

Genetic Enhancer Elements , Genome , Histones/metabolism , T-Lymphocytes/metabolism , Acetylation , Animals , Tumor Cell Line , Chromatin Immunoprecipitation , Conserved Sequence , Gene Expression Profiling , Human Genome , Genomics , Humans , Jurkat Cells , Mice , Takifugu , Transfection

16.

Prospects and implications of using chromatin insulators in gene therapy and transgenesis.

Recillas-Targa, Félix; Valadez-Graham, Viviana; Farrell, Catherine M.

Bioessays ; 26(7): 796-807, 2004 Jul.

Article in English | MEDLINE | ID: mdl-15221861

ABSTRACT

Gene therapy has emerged from the idea of inserting a wild-type copy of a gene in order to restore the proper expression and function of a damaged gene. Initial efforts have focused on finding the proper vector and delivery method to introduce a corrected gene to the affected tissue or cell type. Even though these first attempts are clearly promising, several problems remain unsolved. A major problem is the influence of chromatin structure on transgene expression. To overcome chromatin-dependent repressive transgenic states, researchers have begun to use chromatin regulatory elements to drive transgene expression. Insulators or chromatin boundaries are able to protect a transgene against chromatin position effects at their genomic integration sites, and they are able to maintain transgene expression for long periods of time. Therefore, these elements may be very useful tools in gene therapy applications for ensuring high-level and stable expression of transgenes.

Subjects

Chromatin/physiology , Gene Transfer Techniques , Genetic Therapy/methods , Insulator Elements/genetics , Animals , Chromatin/genetics , Genetic Vectors/genetics , Humans , Transgenes/genetics

17.

The barrier function of an insulator couples high histone acetylation levels with specific protection of promoter DNA from methylation.

Mutskov, Vesco J; Farrell, Catherine M; Wade, Paul A; Wolffe, Alan P; Felsenfeld, Gary.

Genes Dev ; 16(12): 1540-54, 2002 Jun 15.

Article in English | MEDLINE | ID: mdl-12080092

ABSTRACT

Stably integrated transgenes flanked by the chicken beta-globin HS4 insulator are protected against chromosomal position effects and gradual extinction of expression during long-term propagation in culture. To investigate the mechanism of action of this insulator, we used bisulfite genomic sequencing to examine the methylation of individual CpG sites within insulated transgenes, and compared this with patterns of histone acetylation. Surprisingly, although the histones of the entire insulated transgene are highly acetylated, only a specific region in the promoter, containing binding sites for erythroid-specific transcription factors, is highly protected from DNA methylation. This critical region is methylated in noninsulated and inactive lines. MBD3 and Mi-2, subunits of the Mi-2/NuRD repressor complex, are bound in vivo to these silenced noninsulated transgenes. In contrast, insulated cell lines do not show any enrichment of Mi-2/NuRD proteins very late in culture. In addition to the high levels of histone acetylation observed across the entire insulated transgene, significant peaks of H3 acetylation are present over the HS4 insulator elements. Targeted histone acetylation by the chicken beta-globin insulator occurs independently of gene transcription and does not require the presence of a functional enhancer. We suggest that this acetylation is in turn responsible for the maintenance of a region of unmethylated DNA over the promoter. Whereas DNA methylation often leads to histone deacetylation, here acetylation appears to prevent methylation.

Subjects

Adenosine Triphosphatases , DNA Helicases , Histones/metabolism , Acetylation , Animals , Autoantigens/metabolism , Western Blotting , Cell Line , Cell Separation , Chickens , CpG Islands , DNA Methylation , Complementary DNA/metabolism , DNA-Binding Proteins/metabolism , Flow Cytometry , Gene Silencing , Globins/metabolism , Histone Deacetylases/metabolism , Mi-2 Nucleosome Remodeling and Deacetylase Complex , Genetic Models , Plasmids/metabolism , Polymerase Chain Reaction , Genetic Promoter Regions , Protein Binding , Tertiary Protein Structure , Reverse Transcriptase Polymerase Chain Reaction , DNA Sequence Analysis , Time Factors , Transcription Factors/metabolism , Genetic Transcription , Transgenes
