Results 1 - 10 of 10
1.
BMC Bioinformatics ; 14: 197, 2013 Jun 18.
Article in English | MEDLINE | ID: mdl-23777206

ABSTRACT

BACKGROUND: Influenza A viruses possess RNA genomes that mutate frequently in response to immune pressures. The mutations in the hemagglutinin genes are particularly significant, as the hemagglutinin proteins mediate attachment and fusion to host cells, thereby influencing viral pathogenicity and species specificity. Large-scale influenza A genome sequencing efforts have been ongoing to understand past epidemics and pandemics and anticipate future outbreaks. Sequencing efforts thus far have generated nearly 9,000 distinct hemagglutinin amino acid sequences. DESCRIPTION: Comparative models for all publicly available influenza A hemagglutinin protein sequences (8,769 to date) were generated using the Rosetta modeling suite. The C-alpha root mean square deviations between a randomly chosen test set of models and their crystallographic templates were less than 2 Å, suggesting that the modeling protocols yielded high-quality results. The models were compiled into an online resource, the Hemagglutinin Structure Prediction (HASP) server. The HASP server was designed as a scientific tool for researchers to visualize hemagglutinin protein sequences of interest in a three-dimensional context. With a built-in molecular viewer, hemagglutinin models can be compared side-by-side and navigated by a corresponding sequence alignment. The models and alignments can be downloaded for offline use and further analysis. CONCLUSIONS: The modeling protocols used in the HASP server scale well for large amounts of sequences and will keep pace with expanded sequencing efforts. The conservative approach to modeling and the intuitive search and visualization interfaces allow researchers to quickly analyze hemagglutinin sequences of interest in the context of the most highly related experimental structures, and allow them to directly compare hemagglutinin sequences to each other simultaneously in their two- and three-dimensional contexts. 
The models and methodology have shown utility in current research efforts and the ongoing aim of the HASP server is to continue to accelerate influenza A research and have a positive impact on global public health.
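
The C-alpha root mean square deviation quoted in the abstract can be illustrated with a minimal sketch. It assumes the model and template coordinates are already superimposed and matched residue by residue. The function name `ca_rmsd` and the toy coordinates are hypothetical and are not taken from the HASP protocol or from Rosetta.

```python
import math


def ca_rmsd(model, template):
    """RMSD (in the units of the input, e.g. angstroms) between two
    equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the two structures are already superimposed and the
    coordinates are paired residue by residue.
    """
    if len(model) != len(template) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_t in zip(model, template)
        for a, b in zip(atom_m, atom_t)
    )
    return math.sqrt(squared / len(model))


# Toy example: every C-alpha atom is displaced by exactly 1 unit along y.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(ca_rmsd(model, template))  # 1.0
```

Under the abstract's criterion, a value below 2 Å from such a comparison would count as a close match between model and template.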


Subject(s)
Databases, Protein , Hemagglutinin Glycoproteins, Influenza Virus/chemistry , Protein Conformation , Sequence Alignment , Sequence Analysis, Protein , Software
2.
Nucleic Acids Res ; 41(Database issue): D571-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23093593

ABSTRACT

The goal of the Papillomavirus Episteme (PaVE) is to provide an integrated resource for the analysis of papillomavirus (PV) genome sequences and related information. The PaVE is a freely accessible, web-based tool (http://pave.niaid.nih.gov) created around a relational database, which enables storage, analysis and exchange of sequence information. From a design perspective, the PaVE adopts an Open Source software approach and stresses the integration and reuse of existing tools. Reference PV genome sequences have been extracted from publicly available databases and reannotated using a custom-created tool. To date, the PaVE contains 241 annotated PV genomes, 2245 genes and regions, 2004 protein sequences and 47 protein structures, which users can explore, analyze or download. The PaVE provides scientists with the data and tools needed to accelerate scientific progress for the study and treatment of diseases caused by PVs.


Subject(s)
Databases, Genetic , Papillomaviridae/genetics , Genome, Viral , Genomics , Internet , Molecular Sequence Annotation , Sequence Analysis , User-Computer Interface , Viral Proteins/chemistry , Viral Proteins/genetics
3.
Genome Biol ; 12(4): R33, 2011.
Article in English | MEDLINE | ID: mdl-21463505

ABSTRACT

BACKGROUND: The human malaria parasite Plasmodium falciparum survives pressures from the host immune system and antimalarial drugs by modifying its genome. Genetic recombination and nucleotide substitution are the two major mechanisms that the parasite employs to generate genome diversity. A better understanding of these mechanisms may provide important information for studying parasite evolution, immune evasion and drug resistance. RESULTS: Here, we used a high-density tiling array to estimate the genetic recombination rate among 32 progeny of a P. falciparum genetic cross (7G8 × GB4). We detected 638 recombination events and constructed a high-resolution genetic map. Comparing genetic and physical maps, we obtained an overall recombination rate of 9.6 kb per centimorgan and identified 54 candidate recombination hotspots. Similar to centromeres in other organisms, the sequences of P. falciparum centromeres are found in chromosome regions largely devoid of recombination activity. Motifs enriched in hotspots were also identified, including a 12-bp G/C-rich motif with 3-bp periodicity that may interact with a protein containing 11 predicted zinc finger arrays. CONCLUSIONS: These results show that the P. falciparum genome has a high recombination rate, although it also follows the overall rule of meiosis in eukaryotes with an average of approximately one crossover per chromosome per meiosis. GC-rich repetitive motifs identified in the hotspot sequences may play a role in the high recombination rate observed. The lack of recombination activity in centromeric regions is consistent with the observations of reduced recombination near the centromeres of other organisms.
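
As a rough check of the "approximately one crossover per chromosome per meiosis" statement, the arithmetic can be sketched as follows. The sketch assumes that all 638 detected events are crossovers, that each of the 32 progeny represents one meiotic product, and that the parasite has 14 nuclear chromosomes (a number not stated in the abstract). The function names are hypothetical, and the 960 kb / 100 cM interval is an invented illustration of the kb-per-centimorgan unit, not data from the study.

```python
def crossovers_per_chromosome(events, progeny, chromosomes):
    """Average number of recombination events per chromosome per progeny clone."""
    return events / (progeny * chromosomes)


def kb_per_centimorgan(physical_kb, genetic_cm):
    """Physical distance (kb) corresponding to one centimorgan of genetic distance."""
    return physical_kb / genetic_cm


# Numbers from the abstract (638 events, 32 progeny) plus the assumed 14 chromosomes.
rate = crossovers_per_chromosome(events=638, progeny=32, chromosomes=14)
print(round(rate, 2))  # 1.42

# Hypothetical interval: 960 kb of sequence spanning 100 cM of map distance.
print(kb_per_centimorgan(physical_kb=960, genetic_cm=100))  # 9.6
```

If some of the detected events were not crossovers, the first figure would overestimate the crossover rate, so it is best read as an upper bound under these assumptions.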


Subject(s)
Crossing Over, Genetic , Meiosis/genetics , Plasmodium falciparum/genetics , Recombination, Genetic/genetics , Chromosome Mapping , Crosses, Genetic , Genetic Variation , Genome, Protozoan , Humans , Malaria/parasitology
4.
Infect Genet Evol ; 11(1): 248-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20801234

ABSTRACT

The variable regions (VR) of the surface-exposed PorA protein of Meningococci are used for subtyping and are considered the most abundant epitopes of outer membrane vesicle-based vaccine preparations. We have developed both a database that maintains all the known VR3 alleles and a web-based application for the rapid identification and submission of new VR3 variants based on sequence comparison.


Subject(s)
Alleles , Databases, Genetic , Internet , Neisseria meningitidis/genetics , Porins/genetics
5.
Plant Cell Rep ; 30(4): 613-29, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21188383

ABSTRACT

Black cohosh (Actaea racemosa L., syn. Cimicifuga racemosa, Nutt., Ranunculaceae) is a popular herb used for relieving menopausal discomforts. A variety of secondary metabolites, including triterpenoids, phenolic dimers, and serotonin derivatives have been associated with its biological activity, but the genes and metabolic pathways as well as the tissue distribution of their production in this plant are unknown. A gene discovery effort was initiated in A. racemosa by partial sequencing of cDNA libraries constructed from young leaf, rhizome, and root tissues. In total, 2,066 expressed sequence tags (ESTs) were assembled into 1,590 unique genes (unigenes). Most of the unigenes were predicted to encode primary metabolism genes, but about 70 were identified as putative secondary metabolism genes. Several of these candidates were analyzed further and full-length cDNA and genomic sequences for a putative 2,3 oxidosqualene cyclase (CAS1) and two BAHD-type acyltransferases (ACT1 and HCT1) were obtained. Homology-based PCR screening for the central gene in plant serotonin biosynthesis, tryptophan decarboxylase (TDC), identified two TDC-related sequences in A. racemosa. CAS1, ACT1, and HCT1 were expressed in most plant tissues, whereas expression of TDC genes was detected only sporadically in immature flower heads and some very young leaf tissues. The cDNA libraries described and assorted genes identified provide initial insight into gene content and diversity in black cohosh, and provide tools and resources for detailed investigations of secondary metabolite genes and enzymes in this important medicinal plant.


Subject(s)
Cimicifuga/metabolism , Expressed Sequence Tags , Cimicifuga/genetics , Intramolecular Transferases/chemistry , Intramolecular Transferases/genetics , Intramolecular Transferases/metabolism , Plant Proteins/chemistry , Plant Proteins/genetics , Plant Proteins/metabolism , Reverse Transcriptase Polymerase Chain Reaction
6.
Hum Mutat ; 31(9): 1080-8, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20652909

ABSTRACT

Hyper-IgM syndrome and Common Variable Immunodeficiency are heterogeneous disorders characterized by a predisposition to serious infection and impaired or absent neutralizing antibody responses. Although a number of single gene defects have been associated with these immune deficiency disorders, the genetic basis of many cases is not known. To facilitate mutation screening in patients with these syndromes, we have developed a custom 300-kb resequencing array, the Hyper-IgM/CVID chip, which interrogates 1,576 coding exons and intron-exon junction regions from 148 genes implicated in B-cell development and immunoglobulin isotype switching. Genomic DNAs extracted from patients were hybridized to the array using a high-throughput protocol for target sequence amplification, pooling, and hybridization. A Web-based application, SNP Explorer, was developed to directly analyze and visualize the single nucleotide polymorphism (SNP) annotation and for quality filtering. Several mutations in known disease-susceptibility genes such as CD40LG, TNFRSF13B, IKBKG, AICDA, as well as rare nucleotide changes in other genes such as TRAF3IP2, were identified in patient DNA samples and validated by direct sequencing. We conclude that the Hyper-IgM/CVID chip combined with SNP Explorer may provide a cost-effective tool for high-throughput discovery of novel mutations among hundreds of disease-relevant genes in patients with inherited antibody deficiency.


Subject(s)
Immunologic Deficiency Syndromes/diagnosis , Immunologic Deficiency Syndromes/genetics , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Software , Gene Frequency/genetics , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing/economics , Humans , Internet , Polymerase Chain Reaction , Reproducibility of Results
7.
Mol Biol Evol ; 24(10): 2158-68, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17646255

ABSTRACT

Claims of intron-structure correlations have played a major role in debates surrounding split gene origins. In the formative (as opposed to disruptive or "insertional") model of split gene origins, introns represent the scars of chimaeric gene assembly. When analyzed retrospectively, formative introns should tend to fall between modular units, if such units exist, or at least to exhibit a preference for sites favorable to chimaera formation. However, there is another possible source of preferences: under a disruptive model of split gene origins, fortuitous intron-structure correlations may arise because the gain of introns is biased with respect to flanking nucleotide sequences. To investigate the extent to which a sequence-biased intron gain model may account for the present-day distribution of introns, data on over 10,000 introns in eukaryotic protein-coding genes were integrated with structural data from a set of 1,851 nonredundant protein chains. The positions of introns with respect to secondary structures, solvent accessibility, and so-called "modules" were evaluated relative to the expectations of a null model, a disruptive model based on amino acid frequencies at splice junctions, and a formative model defined relative to these. The null model can be excluded for most structural features and is highly improbable when intron sites are grouped by reading frame phase. Phase-dependent correlations with secondary structure and side-chain surface accessibility are particularly strong. However, these phase-dependent correlations are explained largely by the sequence-based disruptive model.


Subject(s)
Base Sequence , Introns/genetics , Models, Genetic , Protein Structure, Tertiary , Proteins , Amino Acids/chemistry , Amino Acids/genetics , Animals , Databases, Genetic , Molecular Sequence Data , Proteins/chemistry , Proteins/genetics
8.
BMC Bioinformatics ; 8: 191, 2007 Jun 08.
Article in English | MEDLINE | ID: mdl-17559666

ABSTRACT

BACKGROUND: Evolutionary analysis provides a formal framework for comparative analysis of genomic and other data. In evolutionary analysis, observed data are treated as the terminal states of characters that have evolved (via transitions between states) along the branches of a tree. The NEXUS standard of Maddison, et al. (1997; Syst. Biol. 46: 590-621) provides a portable, expressive and flexible text format for representing character-state data and trees. However, due to its complexity, NEXUS is not well supported by software and is not easily accessible to bioinformatics users and developers. RESULTS: Bio::NEXUS is an application programming interface (API) implemented in Perl, available from CPAN and SourceForge. The 22 Bio::NEXUS modules define 351 methods in 4229 lines of code, with 2706 lines of POD (Plain Old Documentation). Bio::NEXUS provides an object-oriented interface to reading, writing and manipulating the contents of NEXUS files. It closely follows the extensive explanation of the NEXUS format provided by Maddison et al., supplemented with a few extensions such as support for the NHX (New Hampshire Extended) tree format. CONCLUSION: In spite of some limitations owing to the complexity of NEXUS files and the lack of a formal grammar, NEXUS will continue to be useful for years to come. Bio::NEXUS provides a user-friendly API for NEXUS supplemented with an extensive set of methods for manipulations such as re-rooting trees and selecting subsets of data. Bio::NEXUS can be used as glue code for connecting existing software that uses NEXUS, or as a framework for new applications.
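
To give a sense of the block-and-command structure that makes NEXUS files expressive but hard to parse, here is a minimal sketch in Python. It is not the Bio::NEXUS API (which is written in Perl). The function `split_blocks` and the sample file are hypothetical, and the sketch ignores NEXUS comments, quoted tokens and other parts of the full format.

```python
import re

# A tiny hand-written NEXUS file with a TAXA block and a TREES block.
NEXUS_TEXT = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=3;
  TAXLABELS A B C;
END;
BEGIN TREES;
  TREE demo = ((A,B),C);
END;
"""


def split_blocks(text):
    """Return {BLOCK_NAME: [command, ...]} for a simple NEXUS string.

    Simplification: comments in square brackets and quoted tokens
    are not handled.
    """
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file: missing #NEXUS header")
    blocks = {}
    pattern = r"\bBEGIN\s+(\w+)\s*;(.*?)\bEND\s*;"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE | re.DOTALL):
        name = match.group(1).upper()
        # Commands inside a block are terminated by semicolons.
        commands = [c.strip() for c in match.group(2).split(";") if c.strip()]
        blocks[name] = commands
    return blocks


blocks = split_blocks(NEXUS_TEXT)
print(blocks["TAXA"])   # ['DIMENSIONS NTAX=3', 'TAXLABELS A B C']
print(blocks["TREES"])  # ['TREE demo = ((A,B),C)']
```

A real parser has to go much further than this, which is the gap that a dedicated library such as Bio::NEXUS is meant to fill.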


Subject(s)
Chromosome Mapping/methods , Computational Biology/methods , Databases, Genetic , Evolution, Molecular , Information Storage and Retrieval/methods , Programming Languages , Software , User-Computer Interface
9.
Bioinformatics ; 22(1): 120-1, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16267087

ABSTRACT

SUMMARY: Nexplorer is a web-based program for interactive browsing and manipulation of character data in NEXUS format, well suited for use with alignments and trees representing families of homologous genes or proteins. Users may upload a sequence family dataset, or choose from one of several thousand already available. Nexplorer provides a flexible means to develop customized views that combine a tree and a data matrix or alignment, to create subsets of data, and to output data files or publication-quality graphics. AVAILABILITY: Web access is from http://www.molevol.org/nexplorer


Subject(s)
Computational Biology/methods , Animals , Computer Graphics , Databases, Protein , Humans , Internet , Mitochondrial Proton-Translocating ATPases/genetics , Phylogeny , Programming Languages , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein , Software , User-Computer Interface
10.
Nucleic Acids Res ; 32(Database issue): D59-63, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681359

ABSTRACT

Xpro is a relational database that contains all the eukaryotic protein-encoding DNA sequences contained in GenBank with associated data required for the analysis of eukaryotic gene architecture. In addition to the information found in the GenBank records, which includes properties such as sequence, position, length and description about introns, exons and protein-coding regions, Xpro provides annotations on the splice sites and intron phases. Furthermore, Xpro validates intron positions using alignment information between the record's sequence and EST sequences found in dbEST. In the process of validation, alternative splicing information is also obtained and can be found in the database. The intron-containing genes in the Xpro are also classified as experimental or predicted based on the intron position validation and specific keywords in the GenBank records that are present in predicted genes. An Entrez-like query system, which is familiar to most biologists, is provided for accessing the information present in the database system. A non-redundant set of Xpro database contents is also obtained by cross-referencing to the Swiss-Prot/TrEMBL and Pfam databases. The database currently contains information for 493,983 genes: 351,918 intron-containing genes and 142,065 intron-less genes. Xpro is updated for each new GenBank release and is freely available via the internet at http://origin.bic.nus.edu.sg/xpro.
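
The intron phase annotation mentioned above follows a standard convention: the phase is the number of coding bases preceding the intron, modulo 3. A minimal sketch of that calculation follows. The function name `phases_from_exons` and the exon lengths are hypothetical and are not taken from the Xpro schema.

```python
def phases_from_exons(coding_exon_lengths):
    """Phase of each intron, given coding exon lengths in transcript order.

    Phase 0: the intron lies between two codons.
    Phase 1: the intron follows the first base of a codon.
    Phase 2: the intron follows the second base of a codon.
    """
    phases = []
    coding_bases_so_far = 0
    # There is one intron after every exon except the last.
    for length in coding_exon_lengths[:-1]:
        if length <= 0:
            raise ValueError("exon lengths must be positive")
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


# Hypothetical gene with three coding exons of 90, 100 and 50 bases.
print(phases_from_exons([90, 100, 50]))  # [0, 1]
```

Here the first intron falls after 90 coding bases (phase 0) and the second after 190 (phase 1).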


Subject(s)
Databases, Genetic , Eukaryotic Cells , Genes , Proteins/genetics , Alternative Splicing/genetics , Animals , Computational Biology , Expressed Sequence Tags , Humans , Information Storage and Retrieval , Internet , Introns/genetics , Reproducibility of Results , Sequence Homology