Results 1 - 20 of 28
1.
mBio ; : e0210523, 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37905805

ABSTRACT

A cornerstone of bacterial molecular biology is the ability to genetically manipulate the microbe under study. Many bacteria are difficult to manipulate genetically, a phenotype due in part to robust removal of newly acquired DNA, for example, by restriction-modification (R-M) systems. Here, we report approaches that dramatically improve bacterial transformation efficiency, piloted using a microbe that is challenging to transform due to expression of many R-M systems, Helicobacter pylori. Initially, we identified conditions that dampened expression of several R-M systems and concomitantly enhanced transformation efficiency. We then identified an approach that would broadly protect newly acquired DNA. We computationally predicted under-represented short DNA sequences in the H. pylori genome, with the idea that these sequences reflect targets of sequence-based surveillance such as R-M systems. We then used this information to modify and eliminate such sites in antibiotic resistance cassettes, creating a "stealth" version. Modifying antibiotic resistance cassettes in this way resulted in significantly higher transformation efficiency compared to non-modified cassettes, a response that was independent of genomic locus. Our results suggest that avoiding R-M systems, via modification of under-represented DNA sequences or transformation conditions, is a powerful method to enhance DNA transformation. Our approach to identify under-represented sequences is applicable to any microbe with a sequenced genome.

IMPORTANCE: Manipulating the genomes of bacteria is critical to many fields. Such manipulations are made by genetic engineering, which often requires new pieces of DNA to be added to the genome. Bacteria have robust systems for identifying and degrading new DNA, some of which rely on restriction enzymes. These enzymes cut DNA at specific sequences. We identified a set of DNA sequences that are normally missing from a bacterium's genome, more than would be expected by chance.
Eliminating these sequences from a new piece of DNA allowed it to be incorporated into the bacterial genome at a higher frequency than new DNA containing the sequences. Removing such sequences appears to allow the new DNA to fly under the bacterial radar in "stealth" mode. This transformation improvement approach is straightforward to apply and likely broadly applicable.
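
The abstract above does not say how under-represented sequences were predicted. As a rough illustration only, the sketch below compares the observed count of each k-mer with the count expected if bases were independent and drawn at the sequence's own base frequencies. The function names, the zero-order null model, and the `max_ratio` threshold are assumptions made for this example; a real analysis would likely also count both strands and use a stronger background model.

```python
from collections import Counter
from itertools import product


def kmer_obs_exp(seq: str, k: int) -> dict[str, tuple[int, float]]:
    """Return {kmer: (observed count, expected count)} for every DNA k-mer.

    Expected counts assume independent bases drawn at the sequence's own
    base frequencies (an illustrative null model, not the paper's method).
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    observed = Counter(seq[i:i + k] for i in range(n_windows))
    result = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        p = 1.0
        for base in kmer:
            p *= base_freq[base]
        result[kmer] = (observed.get(kmer, 0), p * n_windows)
    return result


def under_represented(seq: str, k: int, max_ratio: float = 0.5) -> list[str]:
    """List k-mers whose observed/expected ratio is at or below max_ratio."""
    hits = []
    for kmer, (obs, exp) in kmer_obs_exp(seq, k).items():
        if exp > 0 and obs / exp <= max_ratio:
            hits.append(kmer)
    return sorted(hits)
```

Candidate sites flagged this way could then be checked against a cassette sequence and removed by synonymous substitution, which is the general idea the abstract describes.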

2.
mBio ; 8(1), 2017 Feb 21.
Article in English | MEDLINE | ID: mdl-28223462

ABSTRACT

Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these H. pylori isolates, including movement of the transposable element IS607, large and small inversions, multiple single nucleotide polymorphisms, and variation in cagA copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB α-1,3 lipopolysaccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains.

IMPORTANCE: Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish "the genome" of a bacterial strain.
Variability is usually reduced ("only sequence from a single colony"), ignored ("just publish the consensus"), or placed in the "too-hard" basket ("analysis of raw read data is more robust"). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen Helicobacter pylori model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory techniques. This work demonstrates that reliance on a consensus assembly as "the genome" of a bacterial strain may be misleading.


Subjects
Genetic Variation; Genome, Bacterial; Helicobacter pylori/genetics; Animals; High-Throughput Nucleotide Sequencing; Mice; Mutation
3.
Bioinformatics ; 31(12): 1897-903, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-25649617

ABSTRACT

MOTIVATION: Nanopore-based sequencing techniques can reconstruct properties of biosequences by analyzing the sequence-dependent ionic current steps produced as biomolecules pass through a pore. Typically this involves alignment of new data to a reference, where both reference construction and alignment have been performed by hand. RESULTS: We propose an automated method for aligning nanopore data to a reference through the use of hidden Markov models. Several features that arise from prior processing steps and from the class of enzyme used can be simply incorporated into the model. Previously, the M2MspA nanopore was shown to be sensitive enough to distinguish between cytosine, methylcytosine and hydroxymethylcytosine. We validated our automated methodology on a subset of that data by automatically calculating an error rate for the distinction between the three cytosine variants and show that the automated methodology produces a 2-3% error rate, lower than the 10% error rate from previous manual segmentation and alignment. AVAILABILITY AND IMPLEMENTATION: The data, output, scripts and tutorials replicating the analysis are available at https://github.com/UCSCNanopore/Data/tree/master/Automation.


Subjects
Algorithms; High-Throughput Nucleotide Sequencing/methods; Markov Chains; Nanopores; Sequence Analysis, DNA/methods; 5-Methylcytosine/chemistry; Cytosine/analogs & derivatives; Cytosine/chemistry; DNA Methylation; Epigenomics; Humans; Sequence Alignment
4.
Proc Natl Acad Sci U S A ; 110(47): 18910-5, 2013 Nov 19.
Article in English | MEDLINE | ID: mdl-24167260

ABSTRACT

Cytosine, 5-methylcytosine, and 5-hydroxymethylcytosine were identified during translocation of single DNA template strands through a modified Mycobacterium smegmatis porin A (M2MspA) nanopore under control of phi29 DNA polymerase. This identification was based on three consecutive ionic current states that correspond to passage of modified or unmodified CG dinucleotides and their immediate neighbors through the nanopore limiting aperture. To establish quality scores for these calls, we examined ~3,300 translocation events for 48 distinct DNA constructs. Each experiment analyzed a mixture of cytosine-, 5-methylcytosine-, and 5-hydroxymethylcytosine-bearing DNA strands that contained a marker that independently established the correct cytosine methylation status at the target CG of each molecule tested. To calculate error rates for these calls, we established decision boundaries using a variety of machine-learning methods. These error rates depended upon the identity of the bases immediately 5' and 3' of the targeted CG dinucleotide, and ranged from 1.7% to 12.2% for a single-pass read. We estimate that Q40 values (0.01% error rates) for methylation status calls could be achieved by reading single molecules 5-19 times depending upon sequence context.
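
The "Q40" figure in the abstract above uses the Phred convention, in which a quality score Q and an error probability p are related by Q = -10 log10(p). The short sketch below only illustrates that conversion for the error rates quoted in the abstract; the function names are chosen for this example, and it does not reproduce how the authors estimated the number of repeat reads needed.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred-scaled quality."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def error_rate_from_phred(quality: float) -> float:
    """Convert a Phred-scaled quality back to an error probability."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    # Single-pass error rates quoted in the abstract (1.7% and 12.2%).
    for p in (0.017, 0.122):
        print(f"error rate {p:.3f} -> Q{phred_quality(p):.1f}")
    # Q40 corresponds to an error probability of 1e-4, i.e. 0.01%.
    print(f"Q40 -> {error_rate_from_phred(40.0):.4%}")
```

On this scale the single-pass error rates sit roughly between Q9 and Q18, which gives a sense of how far repeated reads must push accuracy to reach Q40.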


Subjects
5-Methylcytosine/isolation & purification; Cytosine/analogs & derivatives; Cytosine/isolation & purification; DNA Methylation/genetics; DNA/analysis; Epigenomics/methods; Nanopores; 5-Methylcytosine/chemistry; Cytosine/chemistry; Research Design
5.
Nat Biotechnol ; 30(4): 344-8, 2012 Feb 14.
Article in English | MEDLINE | ID: mdl-22334048

ABSTRACT

An emerging DNA sequencing technique uses protein or solid-state pores to analyze individual strands as they are driven in single-file order past a nanoscale sensor. However, uncontrolled electrophoresis of DNA through these nanopores is too fast for accurate base reads. Here, we describe forward and reverse ratcheting of DNA templates through the α-hemolysin nanopore controlled by phi29 DNA polymerase without the need for active voltage control. DNA strands were ratcheted through the pore at median rates of 2.5-40 nucleotides per second and were examined at one nucleotide spatial precision in real time. Up to 500 molecules were processed at ∼130 molecules per hour through one pore. The probability of a registry error (an insertion or deletion) at individual positions during one pass along the template strand ranged from 10% to 24.5% without optimization. This strategy facilitates multiple reads of individual strands and is transferable to other nanopore devices for implementation of DNA sequence analysis.


Subjects
High-Throughput Nucleotide Sequencing/instrumentation; High-Throughput Nucleotide Sequencing/methods; Nanopores; DNA Replication/genetics; DNA-Directed DNA Polymerase/chemistry; DNA-Directed DNA Polymerase/genetics; Hemolysin Proteins/chemistry; Nucleotides/chemistry; Nucleotides/genetics
6.
Stand Genomic Sci ; 6(3): 336-45, 2012 Jul 30.
Article in English | MEDLINE | ID: mdl-23407329

ABSTRACT

Pyrobaculum oguniense TE7 is an aerobic hyperthermophilic crenarchaeon isolated from a hot spring in Japan. Here we describe its main chromosome of 2,436,033 bp, with three large-scale inversions and an extra-chromosomal element of 16,887 bp. We have annotated 2,800 protein-coding genes and 145 RNA genes in this genome, including nine H/ACA-like small RNA, 83 predicted C/D box small RNA, and 47 transfer RNA genes. Comparative analyses with the closest known relative, the anaerobe Pyrobaculum arsenaticum from Italy, reveals unexpectedly high synteny and nucleotide identity between these two geographically distant species. Deep sequencing of a mixture of genomic DNA from multiple cells has illuminated some of the genome dynamics potentially shared with other species in this genus.

7.
Mol Syst Biol ; 7: 539, 2011 Oct 11.
Article in English | MEDLINE | ID: mdl-21988835

ABSTRACT

Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.


Subjects
Data Mining/methods; Proteins/analysis; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Systems Biology; Algorithms; Amino Acid Sequence; Base Sequence; Databases, Factual; Molecular Sequence Data; Proteins/chemistry; Software; Systems Biology/instrumentation; Systems Biology/methods
8.
J Bacteriol ; 193(17): 4338-45, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21725005

ABSTRACT

We report the identification and characterization of a previously unidentified protein domain found in bacterial chemoreceptors and other bacterial signal transduction proteins. This domain contains a motif of three noncontiguous histidines and one cysteine, arranged as Hxx[WFYL]x(21-28)Cx[LFMVI]Gx[WFLVI]x(18-27)HxxxH (boldface type indicates residues that are nearly 100% conserved). This domain was first identified in the soluble Helicobacter pylori chemoreceptor TlpD. Using inductively coupled plasma mass spectrometry on heterologously and natively expressed TlpD, we determined that this domain binds zinc with a subfemtomolar dissociation constant. We thus named the domain CZB, for chemoreceptor zinc binding. Further analysis showed that many bacterial signaling proteins contain the CZB domain, most commonly proteins that participate in chemotaxis but also those that participate in c-di-GMP signaling and nitrate/nitrite sensing, among others. Proteins bearing the CZB domain are found in several bacterial phyla. The variety of signaling proteins using the CZB domain suggests that it plays a critical role in several signal transduction pathways.
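
The motif in the abstract above is written in a PROSITE-like notation, where `x` stands for any residue and `x(21-28)` for a run of 21 to 28 residues. A minimal sketch of scanning a protein sequence for it with a regular expression follows. The translation to regex syntax and the names `CZB_MOTIF` and `find_czb_motifs` are this example's own, and a plain pattern match is a much cruder test than the domain analysis the authors describe.

```python
import re

# Hxx[WFYL]x(21-28)Cx[LFMVI]Gx[WFLVI]x(18-27)HxxxH, with x = any residue.
CZB_MOTIF = re.compile(
    r"H..[WFYL].{21,28}C.[LFMVI]G.[WFLVI].{18,27}H...H"
)


def find_czb_motifs(protein: str) -> list[tuple[int, int]]:
    """Return 0-based (start, end) spans of non-overlapping motif matches."""
    return [m.span() for m in CZB_MOTIF.finditer(protein.upper())]
```

The input is assumed to be a single-line, one-letter amino acid string with no whitespace, since `.` in the pattern does not match newlines.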


Subjects
Bacterial Proteins/chemistry; Cytoplasm/metabolism; Receptors, Cytoplasmic and Nuclear/chemistry; Signal Transduction; Zinc/metabolism; Amino Acid Motifs; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Chemotaxis; Cloning, Molecular; Cyclic GMP/analogs & derivatives; Cyclic GMP/genetics; Cyclic GMP/metabolism; Cysteine/chemistry; Cysteine/metabolism; Escherichia coli Proteins/chemistry; Escherichia coli Proteins/genetics; Escherichia coli Proteins/metabolism; Gene Expression Regulation, Bacterial; Genes, Bacterial; Helicobacter pylori/genetics; Histidine/chemistry; Histidine/metabolism; Mutation; Protein Binding; Receptors, Cytoplasmic and Nuclear/genetics; Receptors, Cytoplasmic and Nuclear/metabolism
9.
Bioinformatics ; 27(13): 1765-71, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21551138

ABSTRACT

MOTIVATION: Accurate prediction of genes encoding small proteins (on the order of 50 amino acids or less) remains an elusive open problem in bioinformatics. Some of the best methods for gene prediction use either sequence composition analysis or sequence similarity to a known protein coding sequence. These methods often fail for small proteins, however, either due to a lack of experimentally verified small protein coding genes or due to the limited statistical significance of statistics on small sequences. Our approach is based upon the hypothesis that true small proteins will be under selective pressure for encoding the particular amino acid sequence, for ease of translation by the ribosome and for structural stability. This stability can be achieved either independently or as part of a larger protein complex. Given this assumption, it follows that small proteins should display conserved local protein structure properties much like larger proteins. Our method incorporates neural-net predictions for three local structure alphabets within a comparative genomic approach using a genomic alignment of 22 closely related bacterial genomes to generate predictions for whether or not a given open reading frame (ORF) encodes a small protein. RESULTS: We have applied this method to the complete genome of Escherichia coli strain K12 and looked at how well our method performed on a set of 60 experimentally verified small proteins from this organism. Out of a total of 11 407 possible ORFs, we found that 6 of the top 10 and 27 of the top 100 predictions belonged to the set of 60 experimentally verified small proteins. We found 35 of all the true small proteins within the top 200 predictions. We compared our method to Glimmer, using a default Glimmer protocol and a modified small ORF Glimmer protocol with a lower minimum size cutoff.
The default Glimmer protocol identified 16 of the true small proteins (all in the top 200 predictions), but failed to predict on 34 due to size cutoffs. The small ORF Glimmer protocol made predictions for all the experimentally verified small proteins but only contained 9 of the 60 true small proteins within the top 200 predictions. CONTACT: jsamayoa@jhu.edu


Subjects
Escherichia coli K12/chemistry; Escherichia coli K12/genetics; Escherichia coli Proteins/isolation & purification; Open Reading Frames; Base Sequence; Codon; Enterobacteriaceae/chemistry; Enterobacteriaceae/genetics; Escherichia coli/chemistry; Escherichia coli/genetics; Escherichia coli Proteins/chemistry; Genomics/methods; Sequence Alignment
10.
Bioinformatics ; 26(5): 596-602, 2010 Mar 01.
Article in English | MEDLINE | ID: mdl-20130034

ABSTRACT

MOTIVATION: Some first order methods for protein sequence analysis inherently treat each position as independent. We develop a general framework for introducing longer range interactions. We then demonstrate the power of our approach by applying it to secondary structure prediction; under the independence assumption, sequences produced by existing methods can produce features that are not protein like, an extreme example being a helix of length 1. Our goal was to make the predictions from state of the art methods more realistic, without loss of performance by other measures. RESULTS: Our framework for longer range interactions is described as a k-mer order model. We succeeded in applying our model to the specific problem of secondary structure prediction, to be used as an additional layer on top of existing methods. We achieved our goal of making the predictions more realistic and protein like, and remarkably this also improved the overall performance. We improve the Segment OVerlap (SOV) score by 1.8%, but more importantly we radically improve the probability of the real sequence given a prediction from an average of 0.271 per residue to 0.385. Crucially, this improvement is obtained using no additional information. AVAILABILITY: http://supfam.cs.bris.ac.uk/kmer


Subjects
Computational Biology/methods; Protein Structure, Secondary; Proteins/chemistry; Amino Acid Sequence; Databases, Protein; Models, Molecular; Molecular Sequence Data; Sequence Alignment; Sequence Analysis, Protein
11.
Proteins ; 77 Suppl 9: 114-22, 2009.
Article in English | MEDLINE | ID: mdl-19768677

ABSTRACT

A correct alignment is an essential requirement in homology modeling. Yet in order to bridge the structural gap between template and target, which may not only involve loop rearrangements, but also shifts of secondary structure elements and repacking of core residues, high-resolution refinement methods with full atomic details are needed. Here, we describe four approaches that address this "last mile of the protein folding problem" and have performed well during CASP8, yielding physically realistic models: YASARA, which runs molecular dynamics simulations of models in explicit solvent, using a new partly knowledge-based all atom force field derived from Amber, whose parameters have been optimized to minimize the damage done to protein crystal structures; the LEE-SERVER, which makes extensive use of conformational space annealing to create alignments, to help Modeller build physically realistic models while satisfying input restraints from templates and CHARMM stereochemistry, and to remodel the side-chains; ROSETTA, whose high resolution refinement protocol combines a physically realistic all atom force field with Monte Carlo minimization to allow the large conformational space to be sampled quickly; and finally UNDERTAKER, which creates a pool of candidate models from various templates and then optimizes them with an adaptive genetic algorithm, using a primarily empirical cost function that does not include bond angle, bond length, or other physics-like terms.


Subjects
Computational Biology/methods; Models, Molecular; Proteins/chemistry; Sequence Alignment/methods; Algorithms; Protein Conformation; Software
12.
Proteins ; 77 Suppl 9: 191-5, 2009.
Article in English | MEDLINE | ID: mdl-19639637

ABSTRACT

Our group tested three quality assessment functions in CASP8: a function which used only distance constraints derived from alignments (SAM-T08-MQAO), a function which added other single-model terms to the distance constraints (SAM-T08-MQAU), and a function which used both single-model and consensus terms (SAM-T08-MQAC). We analyzed the functions both for ranking models for a single target and for producing an accurate estimate of GDT_TS. Our functions were optimized for the ranking problem, so are perhaps more appropriate for metaserver applications than for providing trustworthiness estimates for single models. On the CASP8 test, the functions with more terms performed better. The MQAC consensus method was substantially better than either single-model function, and the MQAU function was substantially better than the MQAO function that used only constraints from alignments.


Subjects
Computational Biology/methods; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Models, Molecular; Protein Conformation; Software
13.
Nucleic Acids Res ; 37(Web Server issue): W492-7, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19483096

ABSTRACT

The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue-residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html.


Subjects
Protein Conformation; Software; Databases, Protein; Markov Chains; Models, Molecular; Nuclear Magnetic Resonance, Biomolecular; Proteins/chemistry; Reproducibility of Results; Sequence Alignment; Sequence Homology, Amino Acid; Structural Homology, Protein
14.
Bioinformatics ; 25(12): i281-8, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19478000

ABSTRACT

MOTIVATION: Our focus has been on detecting topological properties that are rare in real proteins, but occur more frequently in models generated by protein structure prediction methods such as Rosetta. We previously created the Knotfind algorithm, successfully decreasing the frequency of knotted Rosetta models during CASP6. We observed an additional class of knot-like loops that appeared to be equally un-protein-like and yet do not contain a mathematical knot. These topological features are commonly referred to as slip-knots and are caused by the same mechanisms that result in knotted models. Slip-knots are undetectable by the original Knotfind algorithm. We have generalized our algorithm to detect them, and analyzed CASP6 models built using the Rosetta loop modeling method. RESULTS: After analyzing known protein structures in the PDB, we found that slip-knots do occur in certain proteins, but are rare and fall into a small number of specific classes. Our group used this new Pokefind algorithm to distinguish between these rare real slip-knots and the numerous classes of slip-knots that we discovered in Rosetta models and models submitted by the various CASP7 servers. The goal of this work is to improve future models created by protein structure prediction methods. Both algorithms are able to detect un-protein-like features that current metrics such as GDT are unable to identify, so these topological filters can also be used as additional assessment tools.


Subjects
Computational Biology/methods; Protein Conformation; Proteins/chemistry; Software; Algorithms; Databases, Protein; Models, Molecular; Protein Folding
15.
Proteins ; 75(3): 540-9, 2009 May 15.
Article in English | MEDLINE | ID: mdl-19003987

ABSTRACT

Given a set of alternative models for a specific protein sequence, the model quality assessment (MQA) problem asks for an assignment of scores to each model in the set. A good MQA program assigns these scores such that they correlate well with real quality of the models, ideally scoring best that model which is closest to the true structure. In this article, we present a new approach for addressing the MQA problem. It is based on distance constraints extracted from alignments to templates of known structure, and is implemented in the Undertaker program for protein structure prediction. One novel feature is that we extract noncontact constraints as well as contact constraints. We describe how the distance constraint extraction is done and we show how they can be used to address the MQA problem. We have compared our method on CASP7 targets and the results show that our method is at least comparable with the best MQA methods that were assessed at CASP7. We also propose a new evaluation measure, Kendall's tau, that is more interpretable than conventional measures used for evaluating MQA methods (Pearson's r and Spearman's rho). We show clear examples where Kendall's tau agrees much more with our intuition of a correct MQA, and we therefore propose that Kendall's tau be used for future CASP MQA assessments.
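
To make the proposed evaluation measure concrete, the sketch below computes the simplest form of Kendall's tau (tau-a) between predicted quality scores and true quality values for a set of models. The function name and the handling of ties are choices made for this example and may differ from the variant used in the paper. One way to read the result is as the fraction of model pairs ranked in the right order minus the fraction ranked in the wrong order, which may be part of why the authors find it easy to interpret.

```python
from itertools import combinations


def kendall_tau_a(predicted: list[float], actual: list[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    if len(predicted) < 2:
        raise ValueError("need at least two items to compare")
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        product = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs
```

For four models where only one adjacent pair is swapped, five of the six pairs are concordant and one is discordant, so the value is (5 - 1) / 6.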


Subjects
Algorithms; Computational Biology/methods; Proteins/chemistry; Caspase 7/chemistry; Computer Simulation; Humans; Models, Molecular; Protein Conformation; Reproducibility of Results
16.
Proteins ; 75(3): 550-5, 2009 May 15.
Article in English | MEDLINE | ID: mdl-19004017

ABSTRACT

Undertaker is a program designed to help predict protein structure using alignments to proteins of known structure and fragment assembly. The program generates conformations and uses cost functions to select the best structures from among the generated conformations. This paper describes the use of Undertaker's cost functions for model quality assessment. We achieve an accuracy that is similar to other methods, without using consensus-based techniques. Adding consensus-based features further improves our approach substantially. We report several correlation measures, including a new weighted version of Kendall's tau (tau(3)) and show model quality assessment results superior to previously published results on all correlation measures when using only models with no missing atoms.


Subjects
Algorithms; Computational Biology/methods; Proteins/chemistry; Models, Molecular; Protein Conformation; Reproducibility of Results
17.
Bioinformatics ; 24(21): 2453-9, 2008 Nov 01.
Article in English | MEDLINE | ID: mdl-18757875

ABSTRACT

MOTIVATION: Predictions of protein local structure, derived from sequence alignment information alone, provide visualization tools for biologists to evaluate the importance of amino acid residue positions of interest in the absence of X-ray crystal/NMR structures or homology models. They are also useful as inputs to sequence analysis and modeling tools, such as hidden Markov models (HMMs), which can be used to search for homology in databases of known protein structure. In addition, local structure predictions can be used as a component of cost functions in genetic algorithms that predict protein tertiary structure. We have developed a program (predict-2nd) that trains multilayer neural networks and have applied it to numerous local structure alphabets, tuning network parameters such as the number of layers, the number of units in each layer and the window sizes of each layer. We have had the most success with four-layer networks, with gradually increasing window sizes at each layer. RESULTS: Because the four-layer neural nets occasionally get trapped in poor local optima, our training protocol now uses many different random starts, with short training runs, followed by more training on the best performing networks from the short runs. One recent addition to the program is the option to add a guide sequence to the profile inputs, increasing the number of inputs per position by 20. We find that use of a guide sequence provides a small but consistent improvement in the predictions for several different local-structure alphabets. AVAILABILITY: Local structure prediction with the methods described here is available for use online at http://www.soe.ucsc.edu/compbio/SAM_T08/T08-query.html. The source code and example networks for PREDICT-2ND are available at http://www.soe.ucsc.edu/~karplus/predict-2nd/ A required C++ library is available at http://www.soe.ucsc.edu/~karplus/ultimate/


Subjects
Protein Conformation; Proteins/chemistry; Software; Amino Acid Sequence; Molecular Sequence Data; Neural Networks, Computer; Sequence Alignment; Sequence Analysis, Protein
18.
Proteins ; 69 Suppl 8: 159-64, 2007.
Article in English | MEDLINE | ID: mdl-17932918

ABSTRACT

Prediction of protein structures continues to be a difficult problem, particularly when there are no solved structures for homologous proteins to use as templates. Local structure prediction (secondary structure and burial) is fairly reliable, but does not provide enough information to produce complete three-dimensional structures. Residue-residue contact prediction, though still not highly reliable, may provide a useful guide for assembling local structure prediction into full tertiary prediction. We develop a neural network which is applied to pairs of residue positions and outputs a probability of contact between the positions. One of the neural net inputs is a novel statistic for detecting correlated mutations: the statistical significance of the mutual information between the corresponding columns of a multiple sequence alignment. This statistic, combined with a second statistic based on the propensity of two amino acid types being in contact, results in a simple neural network that is a good predictor of contacts. Adding more features from amino-acid distributions and local structure predictions, the final neural network predicts contacts better than other submitted contact predictions at CASP7, including contact predictions derived from fragment-based tertiary models on free-modeling domains. It is still not known if contact predictions can improve tertiary models on free-modeling domains. Available at http://www.soe.ucsc.edu/research/compbio/SAM_T06/T06-query.html.
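
The correlated-mutation input described above starts from the mutual information between two columns of a multiple sequence alignment. The sketch below computes only that raw quantity, in bits, from two columns given as strings; the function name is this example's own, and the step the abstract calls novel (turning the mutual information into a statistical significance) is not attempted here.

```python
import math
from collections import Counter


def column_mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (in bits) between two alignment columns.

    col_a[i] and col_b[i] are the residues of sequence i in each column.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log2(p_ab / (p_a * p_b)), written with raw counts.
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi
```

Two columns that always change together score high, while a fully conserved column scores zero against any partner, which is one reason raw mutual information alone can be a noisy signal on small alignments.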


Subjects
Neural Networks, Computer; Protein Conformation; Algorithms; Data Interpretation, Statistical; Mutation; Proteins/chemistry; Sequence Alignment
19.
J Bacteriol ; 188(3): 1049-59, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16428409

ABSTRACT

Phase variation between smooth and rugose colony variants of Vibrio cholerae is predicted to be important for the pathogen's survival in its natural aquatic ecosystems. The rugose variant forms corrugated colonies, exhibits increased levels of resistance to osmotic, acid, and oxidative stresses, and has an enhanced capacity to form biofilms. Many of these phenotypes are mediated in part by increased production of an exopolysaccharide termed VPS. In this study, we compared total protein profiles of the smooth and rugose variants using two-dimensional gel electrophoresis and identified one protein that is present at a higher level in the rugose variant. A mutation in the gene encoding this protein, which does not have any known homologs in the protein databases, causes cells to form biofilms that are more fragile and sensitive to sodium dodecyl sulfate than wild-type biofilms. The results indicate that the gene, termed rbmA (rugosity and biofilm structure modulator A), is required for rugose colony formation and biofilm structure integrity in V. cholerae. Transcription of rbmA is positively regulated by the response regulator VpsR but not VpsT.


Subjects
Bacterial Proteins/isolation & purification; Biofilms/growth & development; Vibrio cholerae/physiology; Bacterial Proteins/metabolism; Gene Expression Regulation, Bacterial; Genes, Regulator; Polysaccharides, Bacterial/metabolism; Polysaccharides, Bacterial/physiology; Vibrio cholerae/genetics
20.
Protein Eng Des Sel ; 18(12): 597-605, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16246822

ABSTRACT

This paper proposes a strategy to translate experimental 1H NMR proton distance restraints into their corresponding heavy atom distance restraints for the purpose of protein structure prediction. The relationships between interproton distances and the corresponding heavy atom distances are determined by studying well-resolved X-ray protein structures. The data from the interproton distances of amide protons, alpha-protons, beta-protons and side chain methyl protons are plotted against the corresponding heavy atoms in scatter plots and then fitted with linear equations for lower bounds, upper bounds and optimal fits. We also transform the scatter plots into two-dimensional heat maps and three-dimensional histograms, which identify the regions where data points concentrate. The common interproton distances between amide protons, alpha-protons, beta-protons in alpha-helices, anti-parallel beta-sheets and parallel beta-sheets are also tabulated. We have found several patterns emerging from the distance relationships between heavy atom pairs and their corresponding proton pairs. All our upper bound, lower bound and optimal fit results for translating the interproton distance into their corresponding heavy atom distances are tabulated.


Subjects
Algorithms; Nuclear Magnetic Resonance, Biomolecular; Proteins/chemistry; Protein Conformation; Protons