Results 1-7 of 7
1.
Nucleic Acids Res ; 35(22): e150, 2007.
Article in English | MEDLINE | ID: mdl-18039711

ABSTRACT

Protein structural annotation and classification is an important and challenging problem in bioinformatics. Research towards analysis of sequence-structure correspondences is critical for better understanding of a protein's structure, function, and its interaction with other molecules. Clustering of protein domains based on their structural similarities provides valuable information for protein classification schemes. In this article, we attempt to determine whether structure information alone is sufficient to adequately classify protein structures. We present an algorithm that identifies regions of structural similarity within a given set of protein structures, and uses those regions for clustering. In our approach, called STRALCP (STRucture ALignment-based Clustering of Proteins), we generate detailed information about global and local similarities between pairs of protein structures, identify fragments (spans) that are structurally conserved among proteins, and use these spans to group the structures accordingly. We also provide a web server at http://as2ts.llnl.gov/AS2TS/STRALCP/ for selecting protein structures, calculating structurally conserved regions and performing automated clustering.
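STRALCP's exact similarity scoring is not given in the abstract, so the following is only a minimal Python sketch of the general idea of grouping structures by shared conserved spans. It assumes each structure's conserved spans are already expressed as inclusive (start, end) ranges in a common alignment coordinate frame, and it substitutes a Jaccard overlap with single-linkage merging; the function names, toy inputs and the 0.5 threshold are all hypothetical.

```python
# Hypothetical sketch, not STRALCP's scoring: cluster structures whose conserved
# spans (inclusive ranges in a shared alignment frame) overlap strongly.

def span_residues(spans):
    """Expand inclusive (start, end) ranges into a set of aligned positions."""
    positions = set()
    for start, end in spans:
        positions.update(range(start, end + 1))
    return positions

def jaccard(a, b):
    """Overlap of two position sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_by_spans(spans_by_protein, threshold=0.5):
    """Single-linkage clustering: merge proteins whose overlap >= threshold."""
    names = sorted(spans_by_protein)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    positions = {n: span_residues(spans_by_protein[n]) for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(positions[a], positions[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())

if __name__ == "__main__":
    toy = {"A": [(1, 50)], "B": [(5, 55)], "C": [(200, 260)]}
    print(cluster_by_spans(toy))  # [['A', 'B'], ['C']]
```

Single linkage is chosen here only because it is the simplest transitive grouping; a real classification scheme might prefer a stricter linkage to avoid chaining.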


Subject(s)
Algorithms; Protein Structure, Tertiary; Proteins/classification; Structural Homology, Protein; Amino Acid Sequence; Cluster Analysis; Internet; Models, Molecular; Molecular Sequence Data; Sequence Alignment; Software
2.
Nucleic Acids Res ; 33(18): 5838-50, 2005.
Article in English | MEDLINE | ID: mdl-16243783

ABSTRACT

Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We apply SAP to Marburg and variola virus sequences to assess whether draft data are sufficient or whether finished sequencing is required. Simulations indicate that intermediate- to high-quality draft of target organisms, with error rates of 10^-3 to 10^-5 (approximately 8x coverage), is suitable for DNA signature prediction. Low-quality draft of target isolates, with error rates of approximately 1% (3x to 6x coverage), is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target genomes combined with low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures.
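To make the effect of per-base error rates more concrete, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that a DNA signature is a contiguous k-mer that is usable only if all k bases are read correctly and that per-base errors are independent; the signature length k = 20 and the function names are hypothetical and are not SAP parameters.

```python
import random

# Illustrative assumption: independent per-base errors, and a signature site is
# intact only if every one of its k bases is correct.

def p_signature_intact(k, error_rate):
    """Analytic probability that a k-base signature site carries no error."""
    return (1.0 - error_rate) ** k

def simulate_intact(k, error_rate, trials=20_000, seed=0):
    """Monte Carlo estimate of the same probability, for comparison."""
    rng = random.Random(seed)
    intact = 0
    for _ in range(trials):
        # A site survives if no base draws an error.
        if all(rng.random() >= error_rate for _ in range(k)):
            intact += 1
    return intact / trials

if __name__ == "__main__":
    for rate in (1e-5, 1e-3, 1e-2):
        print(f"{rate:g}: {p_signature_intact(20, rate):.4f}")
    # 1e-05: 0.9998
    # 0.001: 0.9802
    # 0.01: 0.8179
```

Under these assumptions, roughly 18% of 20-base signature sites would carry at least one error at a 1% error rate, versus about 2% at 10^-3, which is consistent in direction with the abstract's finding that low-quality target draft is inadequate.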


Subject(s)
Genome, Bacterial; Genome, Viral; Genomics/methods; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Computational Biology; Marburgvirus/genetics; Marburgvirus/isolation & purification; Phylogeny; Sequence Alignment; Software; Variola virus/genetics; Variola virus/isolation & purification; Viral Proteins/chemistry
3.
BMC Bioinformatics ; 7: 459, 2006 Oct 17.
Article in English | MEDLINE | ID: mdl-17044936

ABSTRACT

BACKGROUND: MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. DESCRIPTION: MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from GenBank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the GenBank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO.
CONCLUSION: MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high-priority agents on the websites of several governmental organizations concerned with bioterrorism. MannDB provides the user with a BLAST interface for comparison of native and non-native sequences and a query tool for conveniently selecting proteins of interest. In addition, the user has access to a web-based browser that compiles comprehensive reports. Access to MannDB is freely available at http://manndb.llnl.gov/.
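The abstract does not describe MannDB's schema. As a hypothetical illustration of how a relational store can hold many tools' results per protein and still support combined structured and free-text queries, the Python sketch below uses the standard-library `sqlite3` module with an in-memory database; the table names, columns, tool name and accessions are invented for the example.

```python
import sqlite3

# Hypothetical schema: one row per protein, one row per (protein, tool) result.
SCHEMA = """
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    accession  TEXT NOT NULL UNIQUE,
    organism   TEXT NOT NULL,
    sequence   TEXT NOT NULL
);
CREATE TABLE analysis (
    protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
    tool       TEXT NOT NULL,
    result     TEXT NOT NULL,
    PRIMARY KEY (protein_id, tool)
);
"""

def build_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def add_protein(conn, accession, organism, sequence):
    """Insert a protein and return its generated row id."""
    cur = conn.execute(
        "INSERT INTO protein (accession, organism, sequence) VALUES (?, ?, ?)",
        (accession, organism, sequence),
    )
    return cur.lastrowid

def add_result(conn, protein_id, tool, result):
    """Store one tool's output for one protein."""
    conn.execute(
        "INSERT INTO analysis (protein_id, tool, result) VALUES (?, ?, ?)",
        (protein_id, tool, result),
    )

def proteins_with_result(conn, tool, text):
    """Structured filter on tool plus free-text match on its output."""
    rows = conn.execute(
        """SELECT p.accession FROM protein AS p
           JOIN analysis AS a ON a.protein_id = p.protein_id
           WHERE a.tool = ? AND a.result LIKE ?
           ORDER BY p.accession""",
        (tool, f"%{text}%"),
    )
    return [row[0] for row in rows]

# Usage with invented records.
conn = build_db()
pa = add_protein(conn, "PROT_A", "Example organism", "MKTAYIAKQR")
add_result(conn, pa, "secondary_structure", "mostly helix")
pb = add_protein(conn, "PROT_B", "Example organism", "MSSHEGGKKK")
add_result(conn, pb, "secondary_structure", "mostly strand")
print(proteins_with_result(conn, "secondary_structure", "helix"))  # ['PROT_A']
```

Keeping each tool's output as its own row means adding a new tool requires no schema change, which is one plausible way to accommodate a growing set of analyses.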


Subject(s)
Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Databases, Protein; Information Storage and Retrieval/methods; Sequence Alignment/methods; Sequence Analysis, Protein/methods; User-Computer Interface; Algorithms; Amino Acid Sequence; Bacterial Proteins/classification; Bacterial Proteins/genetics; Binding Sites; Computer Graphics; Database Management Systems; Internet; Molecular Sequence Data; Protein Binding; Proteome/chemistry; Proteome/classification; Proteome/genetics; Proteome/metabolism; Software; Systems Integration
4.
J Clin Microbiol ; 43(4): 1807-17, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15815002

ABSTRACT

Computational analyses of genome sequences may elucidate protein signatures unique to a target pathogen. We constructed a Protein Signature Pipeline to guide the selection of short peptide sequences to serve as targets for detection and therapeutics. In silico identification of target peptides that are conserved among strains and unique compared to other species yields a list of candidate peptides. These peptides may be developed in the laboratory as targets of antibody, peptide, and ligand binding for detection assays and therapeutics, or as targets for vaccine development. In this paper, we assess how the amount of sequence data affects our ability to identify conserved, unique protein signature candidates. To determine the amount of sequence data required to select good protein signature candidates, we built a computationally intensive system called the Sequencing Analysis Pipeline (SAP). The SAP performs thousands of Monte Carlo simulations, each calling the Protein Signature Pipeline, to assess how the amount of sequence data for a target organism affects the ability to predict peptide signature candidates. Viral species differ substantially in the number of genomes required to predict protein signature targets, and no pattern related to genome structure is apparent. There are more protein signatures than DNA signatures, owing to greater intraspecific conservation at the protein level than at the nucleotide level. We conclude that the SAP must be used as a dynamic system, assessing the need for continued sequencing for each species individually and updating predictions as each additional genome is sequenced.
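The Monte Carlo idea behind SAP can be illustrated with a small Python sketch: repeatedly draw random subsets of n target genomes, count the k-mers conserved across each subset, and average. This is an assumption-laden toy rather than the SAP code: it covers only the conservation half of signature selection (no uniqueness check against other species), treats sequences as plain strings (nucleotide or amino acid), and the function names, toy genomes and trial count are hypothetical.

```python
import random
from statistics import mean

def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def conserved_kmers(genomes, k):
    """k-mers present in every genome of the given list."""
    return set.intersection(*(kmers(g, k) for g in genomes))

def monte_carlo_curve(genomes, k, trials=200, seed=0):
    """Map n -> mean number of conserved k-mers over random n-genome subsets."""
    rng = random.Random(seed)
    curve = {}
    for n in range(1, len(genomes) + 1):
        counts = [len(conserved_kmers(rng.sample(genomes, n), k))
                  for _ in range(trials)]
        curve[n] = mean(counts)
    return curve

if __name__ == "__main__":
    toy_genomes = ["ACGTACGTAA", "ACGTACGTTT", "ACGTACCTAA"]
    print(monte_carlo_curve(toy_genomes, k=4))
```

Because every subset's conserved set contains the k-mers shared by all genomes, each averaged value stays at or above the full-set value; in this toy framing, a curve that has flattened out would suggest that further genomes change the predicted conserved set little.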


Subject(s)
Base Sequence; DNA Viruses/classification; Genome, Viral; RNA Viruses/classification; Viral Proteins/chemistry; Virus Diseases/diagnosis; Computational Biology/methods; DNA Viruses/genetics; DNA Viruses/isolation & purification; Humans; Monte Carlo Method; RNA Viruses/genetics; RNA Viruses/isolation & purification; Sequence Analysis, DNA; Viral Proteins/genetics; Virus Diseases/drug therapy; Virus Diseases/virology
5.
Bioinformatics ; 21(14): 3089-96, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-15905278

ABSTRACT

MOTIVATION: Specific and sensitive ligand-based protein detection assays that employ antibodies or small molecules (such as peptides or aptamers) require that the corresponding surface region of the protein be accessible and that there be minimal cross-reactivity with non-target proteins. To reduce the time and cost of laboratory screening efforts for diagnostic reagents, we developed new methods for evaluating and selecting protein surface regions for ligand targeting. RESULTS: We devised combined structure- and sequence-based methods for identifying 3D epitopes and binding pockets on the surface of the A chain of ricin that are conserved with respect to a set of ricin A chains and unique with respect to other proteins. We (1) used structure alignment software to detect structural deviations and extracted the residue-residue correspondences from this analysis, (2) devised a method to compare corresponding residues across sets of ricin structures and structures of closely related proteins, (3) devised a sequence-based approach to determine residue infrequency in local sequence context and (4) modified a pocket-finding algorithm to identify surface crevices in close proximity to residues determined to be conserved/unique by our structure- and sequence-based methods. Applying this combined informatics approach to ricin A, we identified a conserved/unique pocket close to (but not overlapping) the active site that is suitable for bidentate ligand development. These methods are generally applicable to the identification of surface epitopes and binding pockets for the development of diagnostic reagents, therapeutics and vaccines.
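The conserved/unique comparison of corresponding residues in step (2) can be approximated, at the sequence level only, by scanning alignment columns. The Python sketch below assumes target and non-target sequences are already aligned to a common length (for instance from residue-residue correspondences produced by a structure alignment); it ignores 3D structure, surface accessibility and the local-context infrequency score of step (3), and the function name and toy sequences are hypothetical.

```python
def conserved_unique_columns(targets, others):
    """Return (column, residue) pairs where every target shares one residue
    that no non-target sequence has at that aligned column."""
    if not targets:
        raise ValueError("at least one target sequence is required")
    length = len(targets[0])
    if any(len(s) != length for s in targets + others):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(length):
        target_res = {s[col] for s in targets}
        if len(target_res) != 1:
            continue  # not conserved among the targets
        (res,) = target_res
        if res == "-":
            continue  # a shared gap is not a residue
        if all(s[col] != res for s in others):
            hits.append((col, res))  # conserved in targets, unique vs. others
    return hits

if __name__ == "__main__":
    toy_targets = ["MKVLA", "MKVLS"]
    toy_others = ["MRVIA"]
    print(conserved_unique_columns(toy_targets, toy_others))  # [(1, 'K'), (3, 'L')]
```

In the approach the abstract describes, positions like these would then be filtered by surface exposure and proximity to a detected pocket, which this sketch does not attempt.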


Subject(s)
Algorithms; Models, Chemical; Models, Molecular; Ricin/analysis; Ricin/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Binding Sites; Computer Simulation; Conserved Sequence; Molecular Sequence Data; Protein Binding; Protein Conformation; Sequence Homology, Amino Acid
6.
J Clin Microbiol ; 42(12): 5472-6, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15583268

ABSTRACT

We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives (near neighbors) that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near-neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near-neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. Severe acute respiratory syndrome and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near-neighbor sequences are urgently needed. Our results also indicate that double-stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.
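The conserved-and-unique selection rule described above can be sketched with k-mer sets in Python. This is a simplified illustration under stated assumptions, not the LLNL pipeline: candidate signatures are taken to be exact k-mers found in every target genome and in no near-neighbor genome, reverse complements and mismatches are ignored, and the function names and toy sequences are hypothetical.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def signature_candidates(targets, near_neighbors, k):
    """k-mers present in every target genome (conserved, guarding against
    false negatives) and absent from every near neighbor (unique, guarding
    against false positives)."""
    if not targets:
        raise ValueError("at least one target genome is required")
    conserved = set.intersection(*(kmers(g, k) for g in targets))
    background = set().union(*(kmers(g, k) for g in near_neighbors))
    return sorted(conserved - background)

if __name__ == "__main__":
    toy_targets = ["ACGTTGCA", "ACGTTGCC"]
    toy_neighbors = ["TTACGTAA"]
    print(signature_candidates(toy_targets, toy_neighbors, 4))
    # ['CGTT', 'GTTG', 'TTGC']
```

In this framing, adding a target genome can only shrink the conserved set and adding a near neighbor can only shrink the unique set, which is why the number of genomes of each kind matters for the predictions.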


Subject(s)
Base Sequence; DNA Viruses/classification; Genome, Viral; RNA Viruses/classification; Virus Diseases/diagnosis; DNA Viruses/genetics; Humans; Monte Carlo Method; RNA Viruses/genetics; Species Specificity; Virus Diseases/virology
7.
Brief Bioinform ; 4(2): 133-49, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12846395

ABSTRACT

Rapid advances in the genomic sequencing of bacteria and viruses over the past few years have made it possible to consider sequencing the genomes of all pathogens that affect humans and the crops and livestock upon which our lives depend. Recent events make it imperative that full genome sequencing be accomplished as soon as possible for pathogens that could be used as weapons of mass destruction or disruption. This sequence information must be exploited to provide rapid and accurate diagnostics to identify pathogens and distinguish them from harmless near-neighbours and hoaxes. The Chem-Bio Non-Proliferation (CBNP) programme of the US Department of Energy (DOE) began a large-scale effort of pathogen detection in early 2000 when it was announced that the DOE would be providing bio-security at the 2002 Winter Olympic Games in Salt Lake City, Utah. Our team at the Lawrence Livermore National Lab (LLNL) was given the task of developing reliable and validated assays for a number of the most likely bioterrorist agents. The short timeline led us to devise a novel system that utilised whole-genome comparison methods to rapidly focus on parts of the pathogen genomes that had a high probability of being unique. Assays developed with this approach have been validated by the Centers for Disease Control (CDC). They were used at the 2002 Winter Olympics, have entered the public health system, and have been in continual use for non-publicised aspects of homeland defence since autumn 2001. Assays have been developed for all major threat list agents for which adequate genomic sequence is available, as well as for other pathogens requested by various government agencies. Collaborations with comparative genomics algorithm developers have enabled our LLNL team to make major advances in pathogen detection, since many of the existing tools simply did not scale well enough to be of practical use for this application. 
It is hoped that a discussion of a real-life practical application of comparative genomics algorithms may help spur algorithm developers to tackle some of the many remaining problems that need to be addressed. Solutions to these problems will advance a wide range of biological disciplines, only one of which is pathogen detection. For example, research in evolution and phylogenetics, annotating gene coding regions, predicting and understanding gene function and regulation, and untangling gene networks all rely on tools for aligning multiple sequences, detecting gene rearrangements and duplications, and visualising genomic data. Two key problems currently needing improved solutions are: (1) aligning incomplete, fragmentary sequence (e.g., draft genome contigs or arbitrary genome regions) with both complete genomes and other fragmentary sequences; and (2) ordering, aligning and visualising non-colinear gene rearrangements and inversions in addition to the colinear alignments handled by current tools.


Subject(s)
Bioterrorism; Genomics/methods; Amino Acid Sequence; Animals; Bacterial Proteins/metabolism; Base Sequence; Genes, Bacterial; Genes, Viral; Genome; Humans; Models, Molecular; Protein Structure, Tertiary; Sequence Alignment; Software; United States; Viral Proteins/metabolism