Pesquisa | Portal de Pesquisa da BVS

1.

RefSeq and the prokaryotic genome annotation pipeline in the age of metagenomes.

Haft, Daniel H; Badretdin, Azat; Coulouris, George; DiCuccio, Michael; Durkin, A Scott; Jovenitti, Eric; Li, Wenjun; Mersha, Megdelawit; O'Neill, Kathleen R; Virothaisakun, Joel; Thibaud-Nissen, Françoise.

Nucleic Acids Res ; 52(D1): D762-D769, 2024 Jan 05.

Artigo em Inglês | MEDLINE | ID: mdl-37962425

RESUMO

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17000 assemblies in RefSeq, split over 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase in the past three years ago. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap.

Assuntos

Archaea , Bactérias , Bases de Dados de Ácidos Nucleicos , Metagenoma , Archaea/genética , Bactérias/genética , Bases de Dados de Ácidos Nucleicos/normas , Bases de Dados de Ácidos Nucleicos/tendências , Genoma Arqueal/genética , Genoma Bacteriano/genética , Internet , Anotação de Sequência Molecular , Proteínas/genética

2.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Beck, Jeff; Bolton, Evan E; Brister, J Rodney; Chan, Jessica; Comeau, Donald C; Connor, Ryan; DiCuccio, Michael; Farrell, Catherine M; Feldgarden, Michael; Fine, Anna M; Funk, Kathryn; Hatcher, Eneida; Hoeppner, Marilu; Kane, Megan; Kannan, Sivakumar; Katz, Kenneth S; Kelly, Christopher; Klimke, William; Kim, Sunghwan; Kimchi, Avi; Landrum, Melissa; Lathrop, Stacy; Lu, Zhiyong; Malheiro, Adriana; Marchler-Bauer, Aron; Murphy, Terence D; Phan, Lon; Prasad, Arjun B; Pujar, Shashikant; Sawyer, Amanda; Schmieder, Erin; Schneider, Valerie A; Schoch, Conrad L; Sharma, Shobha; Thibaud-Nissen, Françoise; Trawick, Barton W; Venkatapathi, Thilakam; Wang, Jiyao; Pruitt, Kim D; Sherry, Stephen T.

Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.

Artigo em Inglês | MEDLINE | ID: mdl-37994677

RESUMO

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.

Assuntos

Bases de Dados Genéticas , National Library of Medicine (U.S.) , Biotecnologia/instrumentação , Bases de Dados de Ácidos Nucleicos , Internet , Estados Unidos

3.

Analysis of 3.5 million SARS-CoV-2 sequences reveals unique mutational trends with consistent nucleotide and codon frequencies.

Fumagalli, Sarah E; Padhiar, Nigam H; Meyer, Douglas; Katneni, Upendra; Bar, Haim; DiCuccio, Michael; Komar, Anton A; Kimchi-Sarfaty, Chava.

Virol J ; 20(1): 31, 2023 02 17.

Artigo em Inglês | MEDLINE | ID: mdl-36812119

RESUMO

BACKGROUND: Since the onset of the SARS-CoV-2 pandemic, bioinformatic analyses have been performed to understand the nucleotide and synonymous codon usage features and mutational patterns of the virus. However, comparatively few have attempted to perform such analyses on a considerably large cohort of viral genomes while organizing the plethora of available sequence data for a month-by-month analysis to observe changes over time. Here, we aimed to perform sequence composition and mutation analysis of SARS-CoV-2, separating sequences by gene, clade, and timepoints, and contrast the mutational profile of SARS-CoV-2 to other comparable RNA viruses. METHODS: Using a cleaned, filtered, and pre-aligned dataset of over 3.5 million sequences downloaded from the GISAID database, we computed nucleotide and codon usage statistics, including calculation of relative synonymous codon usage values. We then calculated codon adaptation index (CAI) changes and a nonsynonymous/synonymous mutation ratio (dN/dS) over time for our dataset. Finally, we compiled information on the types of mutations occurring for SARS-CoV-2 and other comparable RNA viruses, and generated heatmaps showing codon and nucleotide composition at high entropy positions along the Spike sequence. RESULTS: We show that nucleotide and codon usage metrics remain relatively consistent over the 32-month span, though there are significant differences between clades within each gene at various timepoints. CAI and dN/dS values vary substantially between different timepoints and different genes, with Spike gene on average showing both the highest CAI and dN/dS values. Mutational analysis showed that SARS-CoV-2 Spike has a higher proportion of nonsynonymous mutations than analogous genes in other RNA viruses, with nonsynonymous mutations outnumbering synonymous ones by up to 20:1. However, at several specific positions, synonymous mutations were overwhelmingly predominant. CONCLUSIONS: Our multifaceted analysis covering both the composition and mutation signature of SARS-CoV-2 gives valuable insight into the nucleotide frequency and codon usage heterogeneity of SARS-CoV-2 over time, and its unique mutational profile compared to other RNA viruses.

Assuntos

COVID-19 , Vírus de RNA , Humanos , SARS-CoV-2/genética , Nucleotídeos , COVID-19/genética , Códon , Mutação , Genoma Viral , Vírus de RNA/genética , Evolução Molecular

4.

Collection and curation of prokaryotic genome assemblies from type strains at NCBI.

Kannan, Sivakumar; Sharma, Shobha; Ciufo, Stacy; Clark, Karen; Turner, Seán; Kitts, Paul A; Schoch, Conrad L; DiCuccio, Michael; Kimchi, Avi.

Int J Syst Evol Microbiol ; 73(1)2023 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-36748495

RESUMO

The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and its re-use to all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as reference, the taxonomy of over 1.1 million GenBank genomes were verified and the taxonomy of over 7000 new submissions before acceptance to GenBank and over 1800 existing genomes in GenBank were reclassified.

Assuntos

Bases de Dados de Ácidos Nucleicos , Ácidos Graxos , Análise de Sequência de DNA , Reprodutibilidade dos Testes , RNA Ribossômico 16S/genética , Filogenia , Composição de Bases , DNA Bacteriano/genética , Técnicas de Tipagem Bacteriana , Ácidos Graxos/química

5.

RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.

Li, Wenjun; O'Neill, Kathleen R; Haft, Daniel H; DiCuccio, Michael; Chetvernin, Vyacheslav; Badretdin, Azat; Coulouris, George; Chitsaz, Farideh; Derbyshire, Myra K; Durkin, A Scott; Gonzales, Noreen R; Gwadz, Marc; Lanczycki, Christopher J; Song, James S; Thanki, Narmada; Wang, Jiyao; Yamashita, Roxanne A; Yang, Mingzhang; Zheng, Chanjuan; Marchler-Bauer, Aron; Thibaud-Nissen, Françoise.

Nucleic Acids Res ; 49(D1): D1020-D1028, 2021 01 08.

Artigo em Inglês | MEDLINE | ID: mdl-33270901

RESUMO

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.

Assuntos

Biologia Computacional/métodos , Bases de Dados Genéticas , Genoma Arqueal/genética , Genoma Bacteriano/genética , Anotação de Sequência Molecular/métodos , Proteínas/genética , Curadoria de Dados/métodos , Mineração de Dados/métodos , Genômica/métodos , Internet , Proteínas/classificação , Interface Usuário-Computador

6.

Gene variants of coagulation related proteins that interact with SARS-CoV-2.

Holcomb, David; Alexaki, Aikaterini; Hernandez, Nancy; Hunt, Ryan; Laurie, Kyle; Kames, Jacob; Hamasaki-Katagiri, Nobuko; Komar, Anton A; DiCuccio, Michael; Kimchi-Sarfaty, Chava.

PLoS Comput Biol ; 17(3): e1008805, 2021 03.

Artigo em Inglês | MEDLINE | ID: mdl-33730015

RESUMO

Thrombosis is a recognized complication of Coronavirus disease of 2019 (COVID-19) and is often associated with poor prognosis. There is a well-recognized link between coagulation and inflammation, however, the extent of thrombotic events associated with COVID-19 warrants further investigation. Poly(A) Binding Protein Cytoplasmic 4 (PABPC4), Serine/Cysteine Proteinase Inhibitor Clade G Member 1 (SERPING1) and Vitamin K epOxide Reductase Complex subunit 1 (VKORC1), which are all proteins linked to coagulation, have been shown to interact with SARS proteins. We computationally examined the interaction of these with SARS-CoV-2 proteins and, in the case of VKORC1, we describe its binding to ORF7a in detail. We examined the occurrence of variants of each of these proteins across populations and interrogated their potential contribution to COVID-19 severity. Potential mechanisms, by which some of these variants may contribute to disease, are proposed. Some of these variants are prevalent in minority groups that are disproportionally affected by severe COVID-19. Therefore, we are proposing that further investigation around these variants may lead to better understanding of disease pathogenesis in minority groups and more informed therapeutic approaches.

Assuntos

Coagulação Sanguínea , Proteínas Sanguíneas/genética , COVID-19/metabolismo , Proteína Inibidora do Complemento C1/genética , Proteínas de Ligação a Poli(A)/genética , SARS-CoV-2/metabolismo , Vitamina K Epóxido Redutases/genética , Anticoagulantes/administração & dosagem , Proteínas Sanguíneas/metabolismo , COVID-19/fisiopatologia , COVID-19/virologia , Proteína Inibidora do Complemento C1/metabolismo , Estudo de Associação Genômica Ampla , Humanos , Modelos Moleculares , Mutação , Proteínas de Ligação a Poli(A)/metabolismo , Ligação Proteica , SARS-CoV-2/genética , Índice de Gravidade de Doença , Proteínas Virais/metabolismo , Vitamina K Epóxido Redutases/metabolismo , Varfarina/administração & dosagem

7.

RefSeq: an update on prokaryotic genome annotation and curation.

Haft, Daniel H; DiCuccio, Michael; Badretdin, Azat; Brover, Vyacheslav; Chetvernin, Vyacheslav; O'Neill, Kathleen; Li, Wenjun; Chitsaz, Farideh; Derbyshire, Myra K; Gonzales, Noreen R; Gwadz, Marc; Lu, Fu; Marchler, Gabriele H; Song, James S; Thanki, Narmada; Yamashita, Roxanne A; Zheng, Chanjuan; Thibaud-Nissen, Françoise; Geer, Lewis Y; Marchler-Bauer, Aron; Pruitt, Kim D.

Nucleic Acids Res ; 46(D1): D851-D860, 2018 01 04.

Artigo em Inglês | MEDLINE | ID: mdl-29112715

RESUMO

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes. Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule-BlastRules. Continual curation of supporting evidence, and propagation of improved names onto RefSeq proteins ensures that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature, and available for download and for reuse by other investigators. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.

Assuntos

Curadoria de Dados , Bases de Dados de Ácidos Nucleicos , Genoma , Anotação de Sequência Molecular , Células Procarióticas , Archaea/genética , Bactérias/genética , Bases de Dados de Proteínas , Eucariotos/genética , Previsões , Humanos , Homologia de Sequência , Software , Vírus/genética

8.

Using average nucleotide identity to improve taxonomic assignments in prokaryotic genomes at the NCBI.

Ciufo, Stacy; Kannan, Sivakumar; Sharma, Shobha; Badretdin, Azat; Clark, Karen; Turner, Seán; Brover, Slava; Schoch, Conrad L; Kimchi, Avi; DiCuccio, Michael.

Int J Syst Evol Microbiol ; 68(7): 2386-2392, 2018 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-29792589

RESUMO

Average nucleotide identity analysis is a useful tool to verify taxonomic identities in prokaryotic genomes, for both complete and draft assemblies. Using optimum threshold ranges appropriate for different prokaryotic taxa, we have reviewed all prokaryotic genome assemblies in GenBank with regard to their taxonomic identity. We present the methods used to make such comparisons, the current status of GenBank verifications, and recent developments in confirming species assignments in new genome submissions.

Assuntos

Bases de Dados de Ácidos Nucleicos , Genoma Arqueal , Genoma Bacteriano , Nucleotídeos/genética , Filogenia , Composição de Bases , Células Procarióticas , Análise de Sequência de DNA

9.

NCBI prokaryotic genome annotation pipeline.

Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James.

Nucleic Acids Res ; 44(14): 6614-24, 2016 08 19.

Artigo em Inglês | MEDLINE | ID: mdl-27342282

RESUMO

Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.

Assuntos

Genoma Bacteriano , Anotação de Sequência Molecular , Células Procarióticas/metabolismo , Bactérias/genética , Proteínas de Bactérias/química , Bases de Dados de Ácidos Nucleicos , Genes Bacterianos

10.

Assembly: a resource for assembled genomes at NCBI.

Kitts, Paul A; Church, Deanna M; Thibaud-Nissen, Françoise; Choi, Jinna; Hem, Vichet; Sapojnikov, Victor; Smith, Robert G; Tatusova, Tatiana; Xiang, Charlie; Zherikov, Andrey; DiCuccio, Michael; Murphy, Terence D; Pruitt, Kim D; Kimchi, Avi.

Nucleic Acids Res ; 44(D1): D73-80, 2016 Jan 04.

Artigo em Inglês | MEDLINE | ID: mdl-26578580

RESUMO

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site.

Assuntos

Bases de Dados de Ácidos Nucleicos , Genômica , Animais , Genoma , Humanos , Internet , Camundongos

11.

RefSeq: an update on mammalian reference sequences.

Pruitt, Kim D; Brown, Garth R; Hiatt, Susan M; Thibaud-Nissen, Françoise; Astashyn, Alexander; Ermolaeva, Olga; Farrell, Catherine M; Hart, Jennifer; Landrum, Melissa J; McGarvey, Kelly M; Murphy, Michael R; O'Leary, Nuala A; Pujar, Shashikant; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Shkeda, Andrei; Sun, Hanzhen; Tamez, Pamela; Tully, Raymond E; Wallin, Craig; Webb, David; Weber, Janet; Wu, Wendy; DiCuccio, Michael; Kitts, Paul; Maglott, Donna R; Murphy, Terence D; Ostell, James M.

Nucleic Acids Res ; 42(Database issue): D756-63, 2014 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-24259432

RESUMO

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.

Assuntos

Bases de Dados Genéticas , Genômica , Mamíferos/genética , Animais , Eucariotos/genética , Éxons , Genoma , Genômica/normas , Humanos , Internet , Anotação de Sequência Molecular , Proteínas/química , Proteínas/genética , RNA/química , Padrões de Referência

12.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian.

Nucleic Acids Res ; 40(Database issue): D13-25, 2012 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-22140104

RESUMO

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Assuntos

Bases de Dados como Assunto , Bases de Dados Genéticas , Bases de Dados de Proteínas , Expressão Gênica , Genômica , Internet , Modelos Moleculares , National Library of Medicine (U.S.) , Publicações Periódicas como Assunto , PubMed , Alinhamento de Sequência , Análise de Sequência de DNA , Análise de Sequência de Proteína , Análise de Sequência de RNA , Bibliotecas de Moléculas Pequenas , Estados Unidos

13.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian.

Nucleic Acids Res ; 39(Database issue): D38-51, 2011 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-21097890

RESUMO

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Assuntos

Bases de Dados Genéticas , Bases de Dados de Proteínas , Expressão Gênica , Genômica , National Library of Medicine (U.S.) , Estrutura Terciária de Proteína , PubMed , Alinhamento de Sequência , Análise de Sequência de DNA , Análise de Sequência de RNA , Software , Integração de Sistemas , Estados Unidos

14.

Lineage-specific biology revealed by a finished genome assembly of the mouse.

Church, Deanna M; Goodstadt, Leo; Hillier, Ladeana W; Zody, Michael C; Goldstein, Steve; She, Xinwe; Bult, Carol J; Agarwala, Richa; Cherry, Joshua L; DiCuccio, Michael; Hlavina, Wratko; Kapustin, Yuri; Meric, Peter; Maglott, Donna; Birtle, Zoë; Marques, Ana C; Graves, Tina; Zhou, Shiguo; Teague, Brian; Potamousis, Konstantinos; Churas, Christopher; Place, Michael; Herschleb, Jill; Runnheim, Ron; Forrest, Daniel; Amos-Landgraf, James; Schwartz, David C; Cheng, Ze; Lindblad-Toh, Kerstin; Eichler, Evan E; Ponting, Chris P.

PLoS Biol ; 7(5): e1000112, 2009 May 05.

Artigo em Inglês | MEDLINE | ID: mdl-19468303

RESUMO

The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.

Assuntos

Biologia Computacional/métodos , Genoma/genética , Animais , Bases de Dados Genéticas , Duplicação Gênica , Genoma/fisiologia , Humanos , Camundongos

15.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; John Wilbur, W; Yaschenko, Eugene; Ye, Jian.

Nucleic Acids Res ; 38(Database issue): D5-16, 2010 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-19910364

RESUMO

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Assuntos

Biologia Computacional/métodos , Bases de Dados Genéticas , Bases de Dados de Ácidos Nucleicos , Algoritmos , Animais , Biologia Computacional/tendências , Bases de Dados de Proteínas , Genoma Bacteriano , Genoma Viral , Humanos , Armazenamento e Recuperação da Informação/métodos , Internet , National Institutes of Health (U.S.) , National Library of Medicine (U.S.) , Software , Estados Unidos

16.

Protocol to identify host-viral protein interactions between coagulation-related proteins and their genetic variants with SARS-CoV-2 proteins.

Holcomb, David D; Jankowska, Katarzyna I; Hernandez, Nancy; Laurie, Kyle; Kames, Jacob; Hamasaki-Katagiri, Nobuko; Komar, Anton A; DiCuccio, Michael; Kimchi-Sarfaty, Chava.

STAR Protoc ; 3(3): 101648, 2022 09 16.

Artigo em Inglês | MEDLINE | ID: mdl-36052345

RESUMO

Here, we describe a bioinformatics pipeline that evaluates the interactions between coagulation-related proteins and genetic variants with SARS-CoV-2 proteins. This pipeline searches for host proteins that may bind to viral protein and identifies and scores the protein genetic variants to predict the disease pathogenesis in specific subpopulations. Additionally, it is able to find structurally similar motifs and identify potential binding sites within the host-viral protein complexes to unveil viral impact on regulated biological processes and/or host-protein impact on viral invasion or reproduction. For complete details on the use and execution of this protocol, please refer to Holcomb et al. (2021).

Assuntos

COVID-19 , SARS-CoV-2 , Sítios de Ligação , COVID-19/genética , Interações entre Hospedeiro e Microrganismos , Humanos , SARS-CoV-2/genética , Proteínas Virais/genética

17.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene; Ye, Jian.

Nucleic Acids Res ; 37(Database issue): D5-15, 2009 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-18940862

RESUMO

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications is custom implementation of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Assuntos

Bases de Dados Genéticas , Expressão Gênica , Genes , Genômica , Genótipo , National Library of Medicine (U.S.) , Fenótipo , Estrutura Terciária de Proteína , Proteômica , PubMed , Homologia de Sequência , Integração de Sistemas , Estados Unidos

18.

In Silico Evaluation of Cyclophilin Inhibitors as Potential Treatment for SARS-CoV-2.

Laurie, Kyle; Holcomb, David; Kames, Jacob; Komar, Anton A; DiCuccio, Michael; Ibla, Juan C; Kimchi-Sarfaty, Chava.

Open Forum Infect Dis ; 8(6): ofab189, 2021 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-34109257

RESUMO

BACKGROUND: The advent of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) provoked researchers to propose multiple antiviral strategies to improve patients' outcomes. Studies provide evidence that cyclosporine A (CsA) decreases SARS-CoV-2 replication in vitro and decreases mortality rates of coronavirus disease 2019 (COVID-19) patients. CsA binds cyclophilins, which isomerize prolines, affecting viral protein activity. METHODS: We investigated the proline composition from various coronavirus proteomes to identify proteins that may critically rely on cyclophilin's peptidyl-proline isomerase activity and found that the nucleocapsid (N) protein significantly depends on cyclophilin A (CyPA). We modeled CyPA and N protein interactions to demonstrate the N protein as a potential indirect therapeutic target of CsA, which we propose may impede coronavirus replication by obstructing nucleocapsid folding. RESULTS: Finally, we analyzed the literature and protein-protein interactions, finding evidence that, by inhibiting CyPA, CsA may impact coagulation proteins and hemostasis. CONCLUSIONS: Despite CsA's promising antiviral characteristics, the interactions between cyclophilins and coagulation factors emphasize risk stratification for COVID patients with thrombosis dispositions.

19.

Distinct signatures of codon and codon pair usage in 32 primary tumor types in the novel database CancerCoCoPUTs for cancer-specific codon usage.

Meyer, Douglas; Kames, Jacob; Bar, Haim; Komar, Anton A; Alexaki, Aikaterini; Ibla, Juan; Hunt, Ryan C; Santana-Quintero, Luis V; Golikov, Anton; DiCuccio, Michael; Kimchi-Sarfaty, Chava.

Genome Med ; 13(1): 122, 2021 07 28.

Artigo em Inglês | MEDLINE | ID: mdl-34321100

RESUMO

BACKGROUND: Gene expression is highly variable across tissues of multi-cellular organisms, influencing the codon usage of the tissue-specific transcriptome. Cancer disrupts the gene expression pattern of healthy tissue resulting in altered codon usage preferences. The topic of codon usage changes as they relate to codon demand, and tRNA supply in cancer is of growing interest. METHODS: We analyzed transcriptome-weighted codon and codon pair usage based on The Cancer Genome Atlas (TCGA) RNA-seq data from 6427 solid tumor samples and 632 normal tissue samples. This dataset represents 32 cancer types affecting 11 distinct tissues. Our analysis focused on tissues that give rise to multiple solid tumor types and cancer types that are present in multiple tissues. RESULTS: We identified distinct patterns of synonymous codon usage changes for different cancer types affecting the same tissue. For example, a substantial increase in GGT-glycine was observed in invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), and mixed invasive ductal and lobular carcinoma (IDLC) of the breast. Change in synonymous codon preference favoring GGT correlated with change in synonymous codon preference against GGC in IDC and IDLC, but not in ILC. Furthermore, we examined the codon usage changes between paired healthy/tumor tissue from the same patient. Using clinical data from TCGA, we conducted a survival analysis of patients based on the degree of change between healthy and tumor-specific codon usage, revealing an association between larger changes and increased mortality. We have also created a database that contains cancer-specific codon and codon pair usage data for cancer types derived from TCGA, which represents a comprehensive tool for codon-usage-oriented cancer research. CONCLUSIONS: Based on data from TCGA, we have highlighted tumor type-specific signatures of codon and codon pair usage. Paired data revealed variable changes to codon usage patterns, which must be considered when designing personalized cancer treatments. The associated database, CancerCoCoPUTs, represents a comprehensive resource for codon and codon pair usage in cancer and is available at https://dnahive.fda.gov/review/cancercocoputs/ . These findings are important to understand the relationship between tRNA supply and codon demand in cancer states and could help guide the development of new cancer therapeutics.

Assuntos

Uso do Códon , Códon , Biologia Computacional/métodos , Bases de Dados Genéticas , Neoplasias/diagnóstico , Neoplasias/genética , Biomarcadores Tumorais , Perfilação da Expressão Gênica , Regulação Neoplásica da Expressão Gênica , Estudo de Associação Genômica Ampla , Genômica/métodos , Humanos , Estimativa de Kaplan-Meier , Neoplasias/mortalidade , Prognóstico , Transcriptoma

20.

Database resources of the National Center for Biotechnology Information.

Wheeler, David L; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Miller, Vadim; Ostell, James; Pruitt, Kim D; Schuler, Gregory D; Shumway, Martin; Sequeira, Edwin; Sherry, Steven T; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene.

Nucleic Acids Res ; 36(Database issue): D13-21, 2008 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-18045790

RESUMO

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Assuntos

Bases de Dados Genéticas , National Library of Medicine (U.S.) , Animais , Bases de Dados de Ácidos Nucleicos , Expressão Gênica , Genômica , Genótipo , Humanos , Internet , Modelos Moleculares , Fenótipo , Proteômica , Alinhamento de Sequência , Estados Unidos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA