Search | BVS Regional Portal

A new and updated resource for codon usage tables.

Athey, John; Alexaki, Aikaterini; Osipova, Ekaterina; Rostovtsev, Alexandre; Santana-Quintero, Luis V; Katneni, Upendra; Simonyan, Vahan; Kimchi-Sarfaty, Chava.

BMC Bioinformatics ; 18(1): 391, 2017 Sep 02.

Article in English | MEDLINE | ID: mdl-28865429

ABSTRACT

BACKGROUND: Due to the degeneracy of the genetic code, most amino acids can be encoded by multiple synonymous codons. Synonymous codons naturally occur with different frequencies in different organisms. The choice of codons may affect protein expression, structure, and function. Recombinant gene technologies commonly take advantage of the former effect by implementing a technique termed codon optimization, in which codons are replaced with synonymous ones in order to increase protein expression. This technique relies on the accurate knowledge of codon usage frequencies. Accurately quantifying codon usage bias for different organisms is useful not only for codon optimization, but also for evolutionary and translation studies: phylogenetic relations of organisms, and host-pathogen co-evolution relationships, may be explored through their codon usage similarities. Furthermore, codon usage has been shown to affect protein structure and function through interfering with translation kinetics, and cotranslational protein folding.

RESULTS: Despite the obvious need for accurate codon usage tables, currently available resources are either limited in scope, encompassing only organisms from specific domains of life, or greatly outdated. Taking advantage of the exponential growth of GenBank and the creation of NCBI's RefSeq database, we have developed a new database, the High-performance Integrated Virtual Environment-Codon Usage Tables (HIVE-CUTs), to present and analyse codon usage tables for every organism with publicly available sequencing data. Compared to existing databases, this new database is more comprehensive, addresses concerns that limited the accuracy of earlier databases, and provides several new functionalities, such as the ability to view and compare codon usage between individual organisms and across taxonomical clades, through graphical representation or through commonly used indices. In addition, it is being routinely updated to keep up with the continuous flow of new data in GenBank and RefSeq.

CONCLUSION: Given the impact of codon usage bias on recombinant gene technologies, this database will facilitate effective development and review of recombinant drug products and will be instrumental in a wide area of biological research. The database is available at hive.biochemistry.gwu.edu/review/codon.
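As a rough illustration of what a codon usage table contains, the following minimal sketch counts codons in a single coding sequence and reports each as a frequency per thousand codons. The function name `codon_usage` and the toy sequence are assumptions made for illustration; this is not the HIVE-CUTs pipeline, which aggregates over all coding sequences of an organism.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Return codon frequencies (per thousand codons) for one coding sequence."""
    # Normalize case and accept RNA input by mapping U to T.
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Split the sequence into consecutive, non-overlapping triplets.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Toy sequence (hypothetical, not taken from any database):
# ATG CTG CTG TTA TAA, i.e. two synonymous leucine codons used unequally.
table = codon_usage("ATGCTGCTGTTATAA")
```

In this toy case the leucine codon CTG appears twice and TTA once, which is the kind of unequal use of synonymous codons that the abstract calls codon usage bias.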

Subject(s)

Codon , Databases, Nucleic Acid , Animals , Humans

High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis.

Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja.

Database (Oxford) ; 2016.

Article in English | MEDLINE | ID: mdl-26989153

ABSTRACT

The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure.

The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu.
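The idea of hierarchical access control with few explicit rules can be sketched generically as permission inheritance in a tree: a rule set on a parent object applies to all descendants unless a closer rule overrides it. The `Node` class, the `effective_permissions` function, and the user and object names below are all hypothetical; the abstract does not describe how HIVE implements its permission system, so this is only an illustration of the general technique.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One object in a hypothetical storage hierarchy."""
    name: str
    parent: Optional["Node"] = None
    # Explicit rules set on this node only: user -> set of permissions.
    rules: dict[str, set[str]] = field(default_factory=dict)


def effective_permissions(node: Node, user: str) -> set[str]:
    """Walk up the hierarchy; the nearest explicit rule for the user wins."""
    current: Optional[Node] = node
    while current is not None:
        if user in current.rules:
            return set(current.rules[user])
        current = current.parent
    # No rule anywhere on the path to the root: no access.
    return set()


# Two rules on the project cover every object beneath it.
project = Node("project", rules={"alice": {"read", "write"}, "bob": {"read"}})
run = Node("run-1", parent=project)
# A single override revokes bob's access to one file only.
reads = Node("reads.fastq", parent=run, rules={"bob": set()})
```

Here three explicit rules determine access for every user and object pair in the tree, which is the sense in which a hierarchy avoids a multiplicity of per-object rules.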

Subject(s)

High-Throughput Nucleotide Sequencing/methods , User-Computer Interface , Computational Biology , Mutation/genetics , Poliovirus/genetics , Poliovirus Vaccines/immunology , Proteomics , Recombination, Genetic , Sequence Alignment , Statistics as Topic

Whole genome single-nucleotide variation profile-based phylogenetic tree building methods for analysis of viral, bacterial and human genomes.

Faison, William J; Rostovtsev, Alexandre; Castro-Nallar, Eduardo; Crandall, Keith A; Chumakov, Konstantin; Simonyan, Vahan; Mazumder, Raja.

Genomics ; 104(1): 1-7, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24930720

ABSTRACT

Next-generation sequencing data can be mapped to a reference genome to identify single-nucleotide polymorphisms/variations (SNPs/SNVs; called SNPs hereafter). In theory, SNPs can be compared across several samples and the differences can be used to create phylogenetic trees depicting relatedness among the samples. However, in practice this is difficult because currently there is no stand-alone tool that takes SNP data directly as input and produces phylogenetic trees. In response to this need, the PhyloSNP application was created with two analysis methods: 1) a quantitative method that creates the presence/absence matrix, which can be directly used to generate phylogenetic trees, or creates a tree from a shrunk genome alignment (includes additional bases surrounding the SNP position), and 2) a qualitative method that clusters samples based on the frequency of different bases found at a particular position. The algorithms were used to generate trees from Poliovirus, Burkholderia and human cancer genomics NGS datasets. AVAILABILITY: PhyloSNP is freely available for download at http://hive.biochemistry.gwu.edu/dna.cgi?cmd=phylosnp.
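The presence/absence matrix mentioned in the abstract can be illustrated with a small sketch: each row is a sample, each column is a SNP seen in at least one sample, and a cell is 1 if the sample carries that SNP. The function names, the sample names, and the SNP calls below are made up for illustration, and the Hamming distance is just one simple choice of dissimilarity; this is not PhyloSNP's actual code.

```python
def presence_absence(samples: dict[str, set[tuple[int, str]]]):
    """Build a binary SNP presence/absence matrix.

    `samples` maps a sample name to its set of (position, alternate base) calls.
    Returns the sorted list of SNP sites and one 0/1 row per sample.
    """
    # Union of all SNPs observed in any sample defines the columns.
    sites = sorted(set().union(*samples.values()))
    matrix = {
        name: [int(site in calls) for site in sites]
        for name, calls in samples.items()
    }
    return sites, matrix


def hamming(a: list[int], b: list[int]) -> int:
    """Number of SNP sites at which two samples differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical SNP calls for three samples.
calls = {
    "s1": {(10, "A"), (25, "T")},
    "s2": {(10, "A"), (40, "G")},
    "s3": {(25, "T"), (40, "G"), (52, "C")},
}
sites, matrix = presence_absence(calls)
# Pairwise distances between samples, each unordered pair listed once.
distances = {
    (p, q): hamming(matrix[p], matrix[q])
    for p in matrix for q in matrix if p < q
}
```

A pairwise distance table of this kind could then be passed to a distance-based tree-building method such as neighbor joining.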

Subject(s)

Burkholderia pseudomallei/genetics , Genome, Human , Genomics/methods , Phylogeny , Poliovirus/genetics , Polymorphism, Single Nucleotide , Sequence Alignment/methods , Algorithms , Humans , Software
