Results 1 - 20 of 38
1.
BMC Bioinformatics ; 25(1): 294, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39242990

ABSTRACT

Mouse (Mus musculus) models have been heavily utilized in developmental biology research to understand mammalian embryonic development, as mice share many genetic, physiological, and developmental characteristics with humans. New explorations into the integration of temporal (stage-specific) and transcriptional (tissue-specific) data have expanded our knowledge of mouse embryo tissue-specific gene functions. To better understand the substantial impact of synonymous mutational variations in the cell-state-specific transcriptome on a tissue's codon and codon pair usage landscape, we have established a novel resource, Mouse Embryo Codon and Codon Pair Usage Tables (Mouse Embryo CoCoPUTs). This webpage offers not only codon and codon pair usage but also GC, dinucleotide, and junction dinucleotide usage, encompassing four strains, 15 murine embryonic tissue groups, 18 Theiler stages, and 26 embryonic days. Here, we leverage Mouse Embryo CoCoPUTs and use heatmaps to depict usage changes over time and a comparison to human usage for each strain and embryonic time point, highlighting unique differences and similarities. The usage similarities found between mouse and human central nervous system data highlight the translational value for projects leveraging mouse models. Data for this analysis can be directly retrieved from Mouse Embryo CoCoPUTs. This cutting-edge resource plays a crucial role in deciphering the complex interplay between usage patterns and embryonic development, offering valuable insights into variation across diverse tissues, strains, and stages. Its applications extend across multiple domains, with notable advantages for biotherapeutic development, where optimizing codon usage can enhance protein expression; one can compare strains, tissues, and mouse embryonic stages in one query. Additionally, Mouse Embryo CoCoPUTs holds great potential in the field of tissue-specific genetic engineering, providing insights for tailoring gene expression to specific tissues for targeted interventions. Furthermore, this resource may enhance our understanding of the nuanced connections between usage biases and tissue-specific gene function, contributing to the development of more accurate predictive models for genetic disorders.


Subjects
Transcriptome; Animals; Mice; Transcriptome/genetics; Embryo, Mammalian/metabolism; Humans; Embryonic Development/genetics; Codon Usage/genetics
2.
Nucleic Acids Res ; 52(D1): D762-D769, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962425

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17 000 assemblies in RefSeq, split over 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies, include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase over the past three years. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap.


Subjects
Archaea; Bacteria; Databases, Nucleic Acid; Metagenome; Archaea/genetics; Bacteria/genetics; Databases, Nucleic Acid/standards; Databases, Nucleic Acid/trends; Genome, Archaeal/genetics; Genome, Bacterial/genetics; Internet; Molecular Sequence Annotation; Proteins/genetics
3.
Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37994677

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
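
As a rough illustration of the E-utilities programming interface mentioned in the abstract, the sketch below only builds an ESearch request URL and does not contact NCBI. The helper name `esearch_url` and the example query are assumptions made for illustration; the endpoint and the `db`, `term`, `retmax`, and `retmode` parameters are standard E-utilities parameters.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL (hypothetical helper, not part of any NCBI library)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query against the PubMed database; the search term is illustrative.
url = esearch_url("pubmed", "codon usage", retmax=5)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) would return a JSON document listing matching record identifiers; the request itself is left out here so the sketch stays offline.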


Subjects
Databases, Genetic; National Library of Medicine (U.S.); Biotechnology/instrumentation; Databases, Nucleic Acid; Internet; United States
4.
Article in English | MEDLINE | ID: mdl-36748495

ABSTRACT

The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and its re-use to all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as reference, the taxonomy of over 1.1 million GenBank genomes was verified, and over 7000 new submissions (before acceptance to GenBank) and over 1800 existing genomes in GenBank were reclassified.


Subjects
Databases, Nucleic Acid; Fatty Acids; Sequence Analysis, DNA; Reproducibility of Results; RNA, Ribosomal, 16S/genetics; Phylogeny; Base Composition; DNA, Bacterial/genetics; Bacterial Typing Techniques; Fatty Acids/chemistry
5.
Virol J ; 20(1): 31, 2023 Feb 17.
Article in English | MEDLINE | ID: mdl-36812119

ABSTRACT

BACKGROUND: Since the onset of the SARS-CoV-2 pandemic, bioinformatic analyses have been performed to understand the nucleotide and synonymous codon usage features and mutational patterns of the virus. However, comparatively few have attempted to perform such analyses on a considerably large cohort of viral genomes while organizing the plethora of available sequence data for a month-by-month analysis to observe changes over time. Here, we aimed to perform sequence composition and mutation analysis of SARS-CoV-2, separating sequences by gene, clade, and timepoints, and contrast the mutational profile of SARS-CoV-2 to other comparable RNA viruses. METHODS: Using a cleaned, filtered, and pre-aligned dataset of over 3.5 million sequences downloaded from the GISAID database, we computed nucleotide and codon usage statistics, including calculation of relative synonymous codon usage values. We then calculated codon adaptation index (CAI) changes and a nonsynonymous/synonymous mutation ratio (dN/dS) over time for our dataset. Finally, we compiled information on the types of mutations occurring for SARS-CoV-2 and other comparable RNA viruses, and generated heatmaps showing codon and nucleotide composition at high entropy positions along the Spike sequence. RESULTS: We show that nucleotide and codon usage metrics remain relatively consistent over the 32-month span, though there are significant differences between clades within each gene at various timepoints. CAI and dN/dS values vary substantially between different timepoints and different genes, with the Spike gene on average showing both the highest CAI and dN/dS values. Mutational analysis showed that SARS-CoV-2 Spike has a higher proportion of nonsynonymous mutations than analogous genes in other RNA viruses, with nonsynonymous mutations outnumbering synonymous ones by up to 20:1. However, at several specific positions, synonymous mutations were overwhelmingly predominant. CONCLUSIONS: Our multifaceted analysis covering both the composition and mutation signature of SARS-CoV-2 gives valuable insight into the nucleotide frequency and codon usage heterogeneity of SARS-CoV-2 over time, and its unique mutational profile compared to other RNA viruses.
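
The relative synonymous codon usage (RSCU) values mentioned in the methods can be illustrated with a minimal sketch. It uses the common definition (observed codon count divided by the mean count across that amino acid's synonymous codons) and the standard genetic code; the function name `rscu` and the toy sequence are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds: str) -> dict:
    """Return RSCU per codon for one in-frame coding sequence (illustrative helper)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(
        cds[i:i + 3] for i in range(0, usable, 3) if cds[i:i + 3] in CODON_TABLE
    )
    # Group codons into synonymous families keyed by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    result = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from the sequence
        for codon in family:
            # Observed count relative to the uniform expectation total / family size.
            result[codon] = counts[codon] * len(family) / total
    return result


values = rscu("GGTGGTGGC")  # toy sequence: three glycine codons
print(round(values["GGT"], 3), round(values["GGC"], 3))
```

A value above 1 means the codon is used more often than expected under uniform synonymous usage, and a value below 1 means it is used less often.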


Subjects
COVID-19; RNA Viruses; Humans; SARS-CoV-2/genetics; Nucleotides; COVID-19/genetics; Codon; Mutation; Genome, Viral; RNA Viruses/genetics; Evolution, Molecular
6.
STAR Protoc ; 3(3): 101648, 2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36052345

ABSTRACT

Here, we describe a bioinformatics pipeline that evaluates the interactions between coagulation-related proteins and genetic variants with SARS-CoV-2 proteins. This pipeline searches for host proteins that may bind to viral proteins, then identifies and scores the genetic variants of those proteins to predict disease pathogenesis in specific subpopulations. Additionally, it is able to find structurally similar motifs and identify potential binding sites within the host-viral protein complexes to unveil viral impact on regulated biological processes and/or host-protein impact on viral invasion or reproduction. For complete details on the use and execution of this protocol, please refer to Holcomb et al. (2021).


Subjects
COVID-19; SARS-CoV-2; Binding Sites; COVID-19/genetics; Host Microbial Interactions; Humans; SARS-CoV-2/genetics; Viral Proteins/genetics
7.
Genome Med ; 13(1): 122, 2021 Jul 28.
Article in English | MEDLINE | ID: mdl-34321100

ABSTRACT

BACKGROUND: Gene expression is highly variable across tissues of multi-cellular organisms, influencing the codon usage of the tissue-specific transcriptome. Cancer disrupts the gene expression pattern of healthy tissue, resulting in altered codon usage preferences. The topic of codon usage changes as they relate to codon demand and tRNA supply in cancer is of growing interest. METHODS: We analyzed transcriptome-weighted codon and codon pair usage based on The Cancer Genome Atlas (TCGA) RNA-seq data from 6427 solid tumor samples and 632 normal tissue samples. This dataset represents 32 cancer types affecting 11 distinct tissues. Our analysis focused on tissues that give rise to multiple solid tumor types and cancer types that are present in multiple tissues. RESULTS: We identified distinct patterns of synonymous codon usage changes for different cancer types affecting the same tissue. For example, a substantial increase in GGT-glycine was observed in invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), and mixed invasive ductal and lobular carcinoma (IDLC) of the breast. Change in synonymous codon preference favoring GGT correlated with change in synonymous codon preference against GGC in IDC and IDLC, but not in ILC. Furthermore, we examined the codon usage changes between paired healthy/tumor tissue from the same patient. Using clinical data from TCGA, we conducted a survival analysis of patients based on the degree of change between healthy and tumor-specific codon usage, revealing an association between larger changes and increased mortality. We have also created a database that contains cancer-specific codon and codon pair usage data for cancer types derived from TCGA, which represents a comprehensive tool for codon-usage-oriented cancer research. CONCLUSIONS: Based on data from TCGA, we have highlighted tumor type-specific signatures of codon and codon pair usage. Paired data revealed variable changes to codon usage patterns, which must be considered when designing personalized cancer treatments. The associated database, CancerCoCoPUTs, represents a comprehensive resource for codon and codon pair usage in cancer and is available at https://dnahive.fda.gov/review/cancercocoputs/ . These findings are important to understand the relationship between tRNA supply and codon demand in cancer states and could help guide the development of new cancer therapeutics.
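
One plausible reading of "transcriptome-weighted" codon usage is that each transcript's codon counts are scaled by its expression level before being pooled. The sketch below follows that reading; the function name `weighted_codon_usage`, the weights (for example TPM values), and the toy sequences are assumptions for illustration, since the abstract does not specify the exact weighting scheme.

```python
from collections import Counter


def weighted_codon_usage(transcripts):
    """Pool codon counts over transcripts, scaling each by an expression weight.

    `transcripts` is an iterable of (cds_sequence, expression_weight) pairs.
    Returns codon frequencies per 1000 weighted codons.
    """
    totals = Counter()
    for cds, weight in transcripts:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            totals[cds[i:i + 3]] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {codon: 1000.0 * n / grand_total for codon, n in totals.items()}


# Toy input: a highly expressed transcript and a weakly expressed one.
usage = weighted_codon_usage([("GGTGGC", 3.0), ("GGTGGT", 1.0)])
print(usage)
```

Comparing such tables between a tumor sample and its paired normal tissue is one way to quantify the kind of shift in GGT versus GGC usage that the abstract describes.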


Subjects
Codon Usage; Codon; Computational Biology/methods; Databases, Genetic; Neoplasms/diagnosis; Neoplasms/genetics; Biomarkers, Tumor; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Genome-Wide Association Study; Genomics/methods; Humans; Kaplan-Meier Estimate; Neoplasms/mortality; Prognosis; Transcriptome
8.
Open Forum Infect Dis ; 8(6): ofab189, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34109257

ABSTRACT

BACKGROUND: The advent of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) prompted researchers to propose multiple antiviral strategies to improve patients' outcomes. Studies provide evidence that cyclosporine A (CsA) decreases SARS-CoV-2 replication in vitro and decreases mortality rates of coronavirus disease 2019 (COVID-19) patients. CsA binds cyclophilins, which isomerize prolines, affecting viral protein activity. METHODS: We investigated the proline composition from various coronavirus proteomes to identify proteins that may critically rely on cyclophilin's peptidyl-proline isomerase activity and found that the nucleocapsid (N) protein significantly depends on cyclophilin A (CyPA). We modeled CyPA and N protein interactions to demonstrate the N protein as a potential indirect therapeutic target of CsA, which we propose may impede coronavirus replication by obstructing nucleocapsid folding. RESULTS: Finally, we analyzed the literature and protein-protein interactions, finding evidence that, by inhibiting CyPA, CsA may impact coagulation proteins and hemostasis. CONCLUSIONS: Despite CsA's promising antiviral characteristics, the interactions between cyclophilins and coagulation factors emphasize the need for risk stratification of COVID-19 patients predisposed to thrombosis.

9.
PLoS Comput Biol ; 17(3): e1008805, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33730015

ABSTRACT

Thrombosis is a recognized complication of Coronavirus disease of 2019 (COVID-19) and is often associated with poor prognosis. There is a well-recognized link between coagulation and inflammation; however, the extent of thrombotic events associated with COVID-19 warrants further investigation. Poly(A) Binding Protein Cytoplasmic 4 (PABPC4), Serine/Cysteine Proteinase Inhibitor Clade G Member 1 (SERPING1) and Vitamin K epOxide Reductase Complex subunit 1 (VKORC1), which are all proteins linked to coagulation, have been shown to interact with SARS proteins. We computationally examined the interaction of these with SARS-CoV-2 proteins and, in the case of VKORC1, we describe its binding to ORF7a in detail. We examined the occurrence of variants of each of these proteins across populations and interrogated their potential contribution to COVID-19 severity. Potential mechanisms by which some of these variants may contribute to disease are proposed. Some of these variants are prevalent in minority groups that are disproportionally affected by severe COVID-19. Therefore, we are proposing that further investigation around these variants may lead to better understanding of disease pathogenesis in minority groups and more informed therapeutic approaches.


Subjects
Blood Coagulation; Blood Proteins/genetics; COVID-19/metabolism; Complement C1 Inhibitor Protein/genetics; Poly(A)-Binding Proteins/genetics; SARS-CoV-2/metabolism; Vitamin K Epoxide Reductases/genetics; Anticoagulants/administration & dosage; Blood Proteins/metabolism; COVID-19/physiopathology; COVID-19/virology; Complement C1 Inhibitor Protein/metabolism; Genome-Wide Association Study; Humans; Models, Molecular; Mutation; Poly(A)-Binding Proteins/metabolism; Protein Binding; SARS-CoV-2/genetics; Severity of Illness Index; Viral Proteins/metabolism; Vitamin K Epoxide Reductases/metabolism; Warfarin/administration & dosage
10.
Nucleic Acids Res ; 49(D1): D1020-D1028, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33270901

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.


Subjects
Computational Biology/methods; Databases, Genetic; Genome, Archaeal/genetics; Genome, Bacterial/genetics; Molecular Sequence Annotation/methods; Proteins/genetics; Data Curation/methods; Data Mining/methods; Genomics/methods; Internet; Proteins/classification; User-Computer Interface
11.
F1000Res ; 9: 174, 2020.
Article in English | MEDLINE | ID: mdl-33014344

ABSTRACT

Ribosome profiling provides the opportunity to evaluate translation kinetics at codon level resolution. Here, we describe ribosome profiling data, generated from two HEK293T cell lines. The ribosome profiling data are composed of Ribo-seq (mRNA sequencing data from ribosome protected fragments) and RNA-seq data (total RNA sequencing). The two HEK293T cell lines each express a version of the F9 gene, both of which are translated into identical proteins in terms of their amino acid sequences. However, these F9 genes vary drastically in their codon usage and predicted mRNA structure. We also provide the pipeline that we used to analyze the data. Further analyzing this dataset holds great potential as it can be used i) to unveil insights into the composition and regulation of the transcriptome, ii) for comparison with other ribosome profiling datasets, iii) to measure the rate of protein synthesis across the proteome and identify differences in elongation rates, iv) to discover previously unidentified translation of peptides, v) to explore the effects of codon usage or codon context in translational kinetics and vi) to investigate cotranslational folding. Importantly, a unique feature of this dataset, compared to other available ribosome profiling data, is the presence of the F9 gene in two very distinct coding sequences.


Subjects
Codon/genetics; Factor IX/genetics; Protein Biosynthesis; Ribosomes/genetics; HEK293 Cells; Humans
12.
Sci Rep ; 10(1): 15643, 2020 Sep 24.
Article in English | MEDLINE | ID: mdl-32973171

ABSTRACT

As the SARS-CoV-2 pandemic is rapidly progressing, the need for the development of an effective vaccine is critical. A promising approach for vaccine development is to generate, through codon pair deoptimization, an attenuated virus. This approach carries the advantage that it only requires limited knowledge specific to the virus in question, other than its genome sequence. Therefore, it is well suited for emerging viruses, for which we may not have extensive data. We performed comprehensive in silico analyses of several features of SARS-CoV-2 genomic sequence (e.g., codon usage, codon pair usage, dinucleotide/junction dinucleotide usage, RNA structure around the frameshift region) in comparison with other members of the Coronaviridae family of viruses, the overall human genome, and the transcriptome of specific human tissues such as lung, which are primarily targeted by the virus. Our analysis identified the spike (S) and nucleocapsid (N) proteins as promising targets for deoptimization and suggests a roadmap for SARS-CoV-2 vaccine development, which can be generalizable to other viruses.
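
Two of the sequence features listed in the abstract, codon pair usage and junction dinucleotide usage, can be tallied with a short sketch. It assumes that a junction dinucleotide is the last base of one codon followed by the first base of the next; the function name and the toy sequence are illustrative, and the scoring step used in actual codon pair deoptimization is not shown.

```python
from collections import Counter


def codon_pair_and_junction_counts(cds: str):
    """Count adjacent codon pairs and the dinucleotides spanning codon boundaries."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    pairs = Counter()
    junctions = Counter()
    for left, right in zip(codons, codons[1:]):
        pairs[left + right] += 1           # hexamer formed by two neighboring codons
        junctions[left[2] + right[0]] += 1  # third base of left codon + first base of right
    return pairs, junctions


# Toy coding sequence: ATG GCC GCC TAA
pairs, junctions = codon_pair_and_junction_counts("ATGGCCGCCTAA")
print(dict(pairs))
print(dict(junctions))
```

Counts like these, taken over a whole genome or transcriptome, are the raw input from which observed versus expected codon pair frequencies can later be compared.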


Subjects
Betacoronavirus/genetics; Coronavirus Infections/prevention & control; Nucleocapsid Proteins/genetics; Pandemics/prevention & control; Pneumonia, Viral/prevention & control; Spike Glycoprotein, Coronavirus/genetics; Viral Vaccines/immunology; Base Sequence; COVID-19; COVID-19 Vaccines; Coronavirus Infections/immunology; Coronavirus Nucleocapsid Proteins; Genome, Viral/genetics; Humans; Nucleocapsid Proteins/immunology; Phosphoproteins; SARS-CoV-2; Spike Glycoprotein, Coronavirus/immunology; Vaccines, Inactivated/immunology; Whole Genome Sequencing
13.
bioRxiv ; 2020 Sep 18.
Article in English | MEDLINE | ID: mdl-32935103

ABSTRACT

Thrombosis has been one of the complications of the Coronavirus disease of 2019 (COVID-19), often associated with poor prognosis. There is a well-recognized link between coagulation and inflammation; however, the extent of thrombotic events associated with COVID-19 warrants further investigation. Poly(A) Binding Protein Cytoplasmic 4 (PABPC4), Serine/Cysteine Proteinase Inhibitor Clade G Member 1 (SERPING1) and Vitamin K epOxide Reductase Complex subunit 1 (VKORC1), which are all proteins linked to coagulation, have been shown to interact with SARS proteins. We computationally examined the interaction of these with SARS-CoV-2 proteins and, in the case of VKORC1, we describe its binding to ORF7a in detail. We examined the occurrence of variants of each of these proteins across populations and interrogated their potential contribution to COVID-19 severity. Potential mechanisms by which some of these variants may contribute to disease are proposed. Some of these variants are prevalent in minority groups that are disproportionally affected by severe COVID-19. Therefore, we are proposing that further investigation around these variants may lead to better understanding of disease pathogenesis in minority groups and more informed therapeutic approaches. AUTHOR SUMMARY: Increased blood clotting, especially in the lungs, is a common complication of COVID-19. Infectious diseases cause inflammation, which in turn can contribute to increased blood clotting. However, the extent of clot formation that is seen in the lungs of COVID-19 patients suggests that there may be a more direct link. We identified three human proteins that are involved indirectly in the blood clotting cascade and have been shown to interact with proteins of SARS virus, which is closely related to the novel coronavirus. We examined computationally the interaction of these human proteins with the viral proteins. We looked for genetic variants of these proteins and examined how these variants are distributed across populations. We investigated whether variants of these genes could impact severity of COVID-19. Further investigation around these variants may provide clues for the pathogenesis of COVID-19, particularly in minority groups.

14.
Thromb Haemost ; 120(12): 1668-1679, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32838472

ABSTRACT

Coronavirus disease of 2019 (COVID-19) is the clinical manifestation of the respiratory infection caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). While primarily recognized as a respiratory disease, it is clear that COVID-19 is a systemic illness impacting multiple organ systems. One defining clinical feature of COVID-19 has been the high incidence of thrombotic events. The underlying processes and risk factors for the occurrence of thrombotic events in COVID-19 remain inadequately understood. While severe bacterial, viral, or fungal infections are well recognized to activate the coagulation system, COVID-19-associated coagulopathy is likely to have unique mechanistic features. Inflammatory-driven processes are likely primary drivers of coagulopathy in COVID-19, but the exact mechanisms linking inflammation to dysregulated hemostasis and thrombosis are yet to be delineated. Cumulative findings of microvascular thrombosis have raised the question of whether the endothelium and microvasculature should be a point of investigative focus. von Willebrand factor (VWF) and its protease, a disintegrin and metalloproteinase with a thrombospondin type 1 motif, member 13 (ADAMTS-13), play an important role in the maintenance of microvascular hemostasis. In inflammatory conditions, imbalanced VWF-ADAMTS-13 characterized by elevated VWF levels and inhibited and/or reduced activity of ADAMTS-13 has been reported. Also, an imbalance between ADAMTS-13 activity and VWF antigen is associated with organ dysfunction and death in patients with systemic inflammation. A thorough understanding of VWF-ADAMTS-13 interactions during early and advanced phases of COVID-19 could help better define the pathophysiology, guide thromboprophylaxis and treatment, and improve clinical prognosis.


Subjects
COVID-19/complications; Disseminated Intravascular Coagulation/etiology; Microvessels/pathology; SARS-CoV-2/physiology; Thrombosis/etiology; ADAMTS13 Protein/metabolism; Animals; Blood Coagulation/immunology; Humans; von Willebrand Factor/metabolism
15.
bioRxiv ; 2020 Mar 31.
Article in English | MEDLINE | ID: mdl-32511300

ABSTRACT

As the SARS-CoV-2 pandemic is rapidly progressing, the need for the development of an effective vaccine is critical. A promising approach for vaccine development is to generate, through codon pair deoptimization, an attenuated virus. This approach carries the advantage that it only requires limited knowledge specific to the virus in question, other than its genome sequence. Therefore, it is well suited for emerging viruses for which we may not have extensive data. We performed comprehensive in silico analyses of several features of SARS-CoV-2 genomic sequence (e.g., codon usage, codon pair usage, dinucleotide/junction dinucleotide usage, RNA structure around the frameshift region) in comparison with other members of the Coronaviridae family of viruses, the overall human genome, and the transcriptome of specific human tissues such as lung, which are primarily targeted by the virus. Our analysis identified the spike (S) and nucleocapsid (N) proteins as promising targets for deoptimization and suggests a roadmap for SARS-CoV-2 vaccine development, which can be generalizable to other viruses.

16.
Sci Rep ; 9(1): 15449, 2019 Oct 29.
Article in English | MEDLINE | ID: mdl-31664102

ABSTRACT

Synonymous codons occur with different frequencies in different organisms, a phenomenon termed codon usage bias. Codon optimization, a common term for a variety of approaches used widely by the biopharmaceutical industry, involves synonymous substitutions to increase protein expression. It had long been presumed that synonymous variants, which, by definition, do not alter the primary amino acid sequence, have no effect on protein structure and function. However, a critical mass of reports suggests that synonymous codon variations may impact protein conformation. To investigate the impact of synonymous codon usage on protein expression and function, we designed an optimized coagulation factor IX (FIX) variant and used multiple methods to compare its properties to the wild-type FIX upon expression in HEK293T cells. We found that the two variants differ in their conformation, even when controlling for the difference in expression levels. Using ribosome profiling, we identified robust changes in the translational kinetics of the two variants and were able to identify a region in the gene that may have a role in altering the conformation of the protein. Our data have direct implications for codon optimization strategies and for the production of recombinant proteins and gene therapies.


Subjects
Codon; Factor IX/chemistry; Factor IX/genetics; Genetic Therapy; Protein Biosynthesis; Genetic Code; HEK293 Cells; Humans; Protein Conformation
18.
Int J Syst Evol Microbiol ; 68(7): 2386-2392, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29792589

ABSTRACT

Average nucleotide identity analysis is a useful tool to verify taxonomic identities in prokaryotic genomes, for both complete and draft assemblies. Using optimum threshold ranges appropriate for different prokaryotic taxa, we have reviewed all prokaryotic genome assemblies in GenBank with regard to their taxonomic identity. We present the methods used to make such comparisons, the current status of GenBank verifications, and recent developments in confirming species assignments in new genome submissions.
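
The idea of checking a taxonomic assignment against an identity threshold can be sketched in a deliberately simplified form. The code below treats ANI as the length-weighted mean identity of aligned fragments and compares it to a caller-supplied cutoff. This is an assumption-laden simplification: real ANI pipelines also filter fragments and apply alignment coverage requirements, and the function names, the toy fragments, and the 0.95 cutoff are hypothetical, not NCBI's method or thresholds.

```python
def average_nucleotide_identity(fragments) -> float:
    """Length-weighted mean identity over aligned fragments (simplified ANI).

    `fragments` is a list of (aligned_length, identity_fraction) pairs.
    """
    total_length = sum(length for length, _ in fragments)
    if total_length == 0:
        return 0.0  # nothing aligned, so no identity can be reported
    matched = sum(length * identity for length, identity in fragments)
    return matched / total_length


def supports_assignment(ani: float, threshold: float) -> bool:
    """Return True when ANI to the type-strain genome meets the taxon's cutoff."""
    return ani >= threshold


# Toy alignment summary: two fragments against a type-strain genome.
ani = average_nucleotide_identity([(1000, 0.98), (500, 0.92)])
print(round(ani, 4), supports_assignment(ani, threshold=0.95))
```

Keeping the threshold as a parameter mirrors the abstract's point that different prokaryotic taxa call for different threshold ranges.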


Subjects
Databases, Nucleic Acid; Genome, Archaeal; Genome, Bacterial; Nucleotides/genetics; Phylogeny; Base Composition; Prokaryotic Cells; Sequence Analysis, DNA
20.
Nucleic Acids Res ; 46(D1): D851-D860, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29112715

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes. Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule, BlastRules. Continual curation of supporting evidence and propagation of improved names onto RefSeq proteins ensure that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature, and available for download and for reuse by other investigators. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.


Subjects
Data Curation; Databases, Nucleic Acid; Genome; Molecular Sequence Annotation; Prokaryotic Cells; Archaea/genetics; Bacteria/genetics; Databases, Protein; Eukaryota/genetics; Forecasting; Humans; Sequence Homology; Software; Viruses/genetics