Search | VHL Regional Portal

1.

N-Terminal Peptide Detection with Optimized Peptide-Spectrum Matching and Streamlined Sequence Libraries.

Lycette, Brynne E; Glickman, Jacob W; Roth, Samuel J; Cram, Abigail E; Kim, Tae Hee; Krizanc, Danny; Weir, Michael P.

J Proteome Res ; 15(9): 2891-9, 2016 Sep 02.

Article in English | MEDLINE | ID: mdl-27498768

ABSTRACT

We identified tryptic peptides in yeast cell lysates that map to translation initiation sites downstream of the annotated start sites using the peptide-spectrum matching algorithms OMSSA and Mascot. To increase the accuracy of peptide-spectrum matching, both algorithms were run using several standardized parameter sets, and Mascot was run utilizing a, b, and y ions from collision-induced dissociation. A large fraction (22%) of the detected N-terminal peptides mapped to translation initiation downstream of the annotated initiation sites. Expression of several truncated proteins from downstream initiation in the same reading frame as the full-length protein (frame 1) was verified by western analysis. To facilitate analysis of the larger proteome of Drosophila, we created a streamlined sequence library from which all duplicated trypsin fragments had been removed. OMSSA assessment using this "stripped" library revealed 171 peptides that map to downstream translation initiation sites, 76% of which are in the same reading frame as the full-length annotated proteins, although some are in different reading frames creating new protein sequences not in the annotated proteome. Sequences surrounding implicated downstream AUG start codons are associated with nucleotide preferences with a pronounced three-base periodicity N1^G2^A3.

Subjects

Databases, Protein/standards; Drosophila Proteins/analysis; Fungal Proteins/analysis; Peptides/analysis; Proteomics/methods; Tandem Mass Spectrometry/standards; Algorithms; Amino Acid Sequence; Animals; Codon, Initiator; Molecular Sequence Annotation; Proteomics/standards; Reading Frames; Reference Standards
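Note: the "stripped" sequence library described in the abstract above can be illustrated with a minimal sketch. The cleavage rule used here (cut after K or R unless the next residue is P), the choice to keep one copy of each duplicated fragment, and the function names `tryptic_fragments` and `stripped_library` are all assumptions made for illustration; the abstract does not specify these details, and the authors' procedure may differ.

```python
def tryptic_fragments(protein):
    """Split a protein after K or R, except when the next residue is P (assumed rule)."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing fragment that does not end in K or R.
        fragments.append(protein[start:])
    return fragments


def stripped_library(proteins):
    """Collect each distinct tryptic fragment once, in first-seen order."""
    seen = set()
    library = []
    for protein in proteins:
        for fragment in tryptic_fragments(protein):
            if fragment not in seen:
                seen.add(fragment)
                library.append(fragment)
    return library
```

Storing each fragment once shrinks the search space for a peptide-spectrum matcher, which is presumably why such a library helps with a larger proteome; the toy functions above only show the deduplication step, not any scoring.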

2.

Accuracy and efficiency of algorithms for the demarcation of bacterial ecotypes from DNA sequence data.

Francisco, Juan Carlos; Cohan, Frederick M; Krizanc, Danny.

Int J Bioinform Res Appl ; 10(4-5): 409-25, 2014.

Article in English | MEDLINE | ID: mdl-24989860

ABSTRACT

Identification of closely related, ecologically distinct populations of bacteria would benefit microbiologists working in many fields including systematics, epidemiology and biotechnology. Several laboratories have recently developed algorithms aimed at demarcating such 'ecotypes'. We examine the ability of four of these algorithms to correctly identify ecotypes from sequence data. We tested the algorithms on synthetic sequences, with known history and habitat associations, generated under the stable ecotype model and on data from Bacillus strains isolated from Death Valley where previous work has confirmed the existence of multiple ecotypes. We found that one of the algorithms (ecotype simulation) performs significantly better than the others (AdaptML, GMYC, BAPS) in both instances. Unfortunately, it was also shown to be the least efficient of the four. While ecotype simulation is the most accurate, it is by a large margin the slowest of the algorithms tested. Attempts at improving its efficiency are underway.

Subjects

Algorithms; Bacillus/classification; Computational Biology/methods; Ecotype; Sequence Analysis, DNA/methods; Bacillus/genetics; Genes, Bacterial; Models, Statistical; Software; Species Specificity

3.

Assessment of MS/MS search algorithms with parent-protein profiling.

Lin, Miin S; Cherny, Justin J; Fournier, Claire T; Roth, Samuel J; Krizanc, Danny; Weir, Michael P.

J Proteome Res ; 13(4): 1823-32, 2014 Apr 04.

Article in English | MEDLINE | ID: mdl-24533481

ABSTRACT

Peptide mass spectrometry relies crucially on algorithms that match peptides to spectra. We describe a method to evaluate the accuracy of these algorithms based on the masses of parent proteins before trypsin endoprotease digestion. Measurement of conformance to parent proteins provides a score for comparison of the performances of different algorithms as well as alternative parameter settings for a given algorithm. Tracking of conformance scores for spectrum matches to proteins with progressively lower expression levels revealed that conformance scores are not uniform within data sets but are significantly lower for less abundant proteins. Similarly peptides with lower algorithm peptide-spectrum match scores have lower conformance. Although peptide mass spectrometry data is typically filtered through decoy analysis to ensure a low false discovery rate, this analysis confirms that the filtered data should not be considered as having a uniform confidence. The analysis suggests that use of different algorithms and multiple standardized parameter settings of these algorithms can increase significantly the numbers of peptides identified. This data set can be used as a resource for future algorithm assessment.

Subjects

Algorithms; Peptide Mapping/methods; Proteomics/methods; Tandem Mass Spectrometry/methods; Databases, Protein; Humans; Peptide Fragments/analysis; Peptide Fragments/chemistry; Proteins/analysis; Proteins/chemistry; Trypsin

4.

Asymptotic structural properties of quasi-random saturated structures of RNA.

Clote, Peter; Kranakis, Evangelos; Krizanc, Danny.

Algorithms Mol Biol ; 8(1): 24, 2013 Oct 25.

Article in English | MEDLINE | ID: mdl-24156624

ABSTRACT

BACKGROUND: RNA folding depends on the distribution of kinetic traps in the landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In previous work, we investigated asymptotic combinatorics of both random saturated structures and of quasi-random saturated structures, where the latter are constructed by a natural stochastic process. RESULTS: We prove that for quasi-random saturated structures with the uniform distribution, the asymptotic expected number of external loops is O(log n) and the asymptotic expected maximum stem length is O(log n), while under the Zipf distribution, the asymptotic expected number of external loops is O(log² n) and the asymptotic expected maximum stem length is O(log n / log log n). CONCLUSIONS: Quasi-random saturated structures are generated by a stochastic greedy method, which is simple to implement. Structural features of random saturated structures appear to resemble those of quasi-random saturated structures, and the latter appear to constitute a class for which both the generation of sampled structures as well as a combinatorial investigation of structural features may be simpler to undertake.
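Note: a stochastic greedy construction of a saturated structure can be sketched as follows. This is one plausible reading, assumed for illustration only: base pairs are drawn uniformly at random from all pairs that can still be added, until none remains. Any two positions are assumed able to pair, `theta` is an assumed minimum number of unpaired bases in a hairpin, and the names `can_add` and `quasi_random_saturated` are hypothetical. The exact stochastic process analysed in the paper, including its uniform and Zipf variants, is not given in the abstract and may differ.

```python
import random


def can_add(pairs, paired, i, j, theta):
    """True if (i, j), with i < j, can be added without a base triple,
    a pseudoknot (crossing), or a hairpin with fewer than theta unpaired bases."""
    if j - i <= theta or i in paired or j in paired:
        return False
    # (i, j) crosses (k, l) when exactly one endpoint of (k, l) lies inside (i, j).
    return all((i < k < j) == (i < l < j) for k, l in pairs)


def quasi_random_saturated(n, theta=1, rng=random):
    """Add uniformly chosen admissible base pairs until the structure is saturated."""
    pairs, paired = [], set()
    while True:
        candidates = [(i, j) for i in range(n) for j in range(i + 1, n)
                      if can_add(pairs, paired, i, j, theta)]
        if not candidates:
            return sorted(pairs)
        i, j = rng.choice(candidates)
        pairs.append((i, j))
        paired.update((i, j))
```

The loop stops exactly when no base pair can be added, which is the definition of saturation given in the abstract. Recomputing the candidate list on every step is simple but slow; it is adequate only for short sequences.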

5.

Asymptotic number of hairpins of saturated RNA secondary structures.

Clote, Peter; Kranakis, Evangelos; Krizanc, Danny.

Bull Math Biol ; 75(12): 2410-30, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24142625

ABSTRACT

In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ=3 unpaired bases and the probability that any two positions can base-pair is p=3/8, then the asymptotic number of saturated structures is 1.34685 · n^(-3/2) · 1.62178^n, and the asymptotic expected number of hairpins follows a normal distribution with mean [Formula: see text]. Similar results are given for values θ=1,3, and p=1,1/2,3/8; for instance, when θ=1 and p=1, the asymptotic expected number of hairpins in saturated secondary structures is 0.123194 · n, a value greater than the asymptotic expected number 0.105573 · n of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given RNA sequence a_1,…,a_n, for each integer k, we compute that secondary structure S_k having minimum energy in the Nussinov energy model, taken over all secondary structures having k hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins.
Mathematica(™) computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/ .

Subjects

Nucleic Acid Conformation; RNA/chemistry; RNA/genetics; Algorithms; Computational Biology; Inverted Repeat Sequences; Mathematical Concepts; Models, Molecular
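Note: for readers unfamiliar with the Nussinov energy model mentioned in the abstract above, where each base pair contributes the same energy so that minimum energy means a maximum number of base pairs, the following sketch shows the textbook dynamic program. It is not the hairpin-profile algorithm of the paper, which additionally tracks the number of hairpins. The allowed pairs (Watson-Crick plus G-U wobble), the default `theta=3`, and the function name `nussinov_max_pairs` are assumptions for illustration.

```python
def nussinov_max_pairs(seq, theta=3):
    """Maximum number of non-crossing base pairs, with at least theta
    unpaired bases enclosed by every hairpin-closing pair."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(theta + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]            # case: position j is unpaired
            for k in range(i, j - theta):     # case: position j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

The table is filled in order of increasing subsequence length, so every entry read inside the loops has already been computed; the running time is cubic in the sequence length.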

6.

Speedy speciation in a bacterial microcosm: new species can arise as frequently as adaptations within a species.

Koeppel, Alexander F; Wertheim, Joel O; Barone, Laura; Gentile, Nicole; Krizanc, Danny; Cohan, Frederick M.

ISME J ; 7(6): 1080-91, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23364353

ABSTRACT

Microbiologists are challenged to explain the origins of enormous numbers of bacterial species worldwide. Contributing to this extreme diversity may be a simpler process of speciation in bacteria than in animals and plants, requiring neither sexual nor geographical isolation between nascent species. Here, we propose and test a novel hypothesis for the extreme diversity of bacterial species: that splitting of one population into multiple ecologically distinct populations (cladogenesis) may be as frequent as adaptive improvements within a single population's lineage (anagenesis). We employed a set of experimental microcosms to address the relative rates of adaptive cladogenesis and anagenesis among the descendants of a Bacillus subtilis clone, in the absence of competing species. Analysis of the evolutionary trajectories of genetic markers indicated that in at least 7 of 10 replicate microcosm communities, the original population founded one or more new, ecologically distinct populations (ecotypes) before a single anagenetic event occurred within the original population. We were able to support this inference by identifying putative ecotypes formed in these communities through differences in genetic marker association, colony morphology and microhabitat association; we then confirmed the ecological distinctness of these putative ecotypes in competition experiments. Adaptive mutations leading to new ecotypes appeared to be about as common as those improving fitness within an existing ecotype. These results suggest near parity of anagenesis and cladogenesis rates in natural populations that are depauperate of bacterial diversity.

Subjects

Bacillus subtilis/classification; Bacillus subtilis/genetics; Genetic Speciation; Adaptation, Physiological; Bacillus subtilis/physiology; Biological Evolution; Ecotype; Genetics, Population; Geography

7.

Amino termini of many yeast proteins map to downstream start codons.

Fournier, Claire T; Cherny, Justin J; Truncali, Kris; Robbins-Pianka, Adam; Lin, Miin S; Krizanc, Danny; Weir, Michael P.

J Proteome Res ; 11(12): 5712-9, 2012 Dec 07.

Article in English | MEDLINE | ID: mdl-23140384

ABSTRACT

Comprehensive knowledge of proteome complexity is crucial to understanding cell function. Amino termini of yeast proteins were identified through peptide mass spectrometry on glutaraldehyde-treated cell lysates as well as a parallel assessment of publicly deposited spectra. An unexpectedly large fraction of detected amino-terminal peptides (35%) mapped to translation initiation at AUG codons downstream of the annotated start codon. Many of the implicated genes have suboptimal sequence contexts for translation initiation near their annotated AUG, and their ribosome profiles show elevated tag densities consistent with translation initiation at downstream AUGs as well as their annotated AUGs. These data suggest that a significant fraction of the yeast proteome derives from initiation at downstream AUGs, increasing significantly the repertoire of encoded proteins and their potential functions and cellular localizations.

Subjects

Codon, Initiator/metabolism; Fungal Proteins/metabolism; Peptide Mapping/methods; Proteome/analysis; Saccharomycetales/metabolism; Acetylation; Algorithms; Codon, Initiator/genetics; Databases, Protein; Fungal Proteins/genetics; Genes, Fungal; Glutaral/metabolism; Molecular Sequence Annotation; Open Reading Frames; Peptide Chain Initiation, Translational; Proteolysis; Proteome/metabolism; Proteomics/methods; Ribosomes/metabolism; Saccharomycetales/genetics; Sequence Analysis, Protein; Tandem Mass Spectrometry

8.

On the page number of RNA secondary structures with pseudoknots.

Clote, Peter; Dobrev, Stefan; Dotu, Ivan; Kranakis, Evangelos; Krizanc, Danny; Urrutia, Jorge.

J Math Biol ; 65(6-7): 1337-57, 2012 Dec.

Article in English | MEDLINE | ID: mdl-22159642

ABSTRACT

Let S denote the set of (possibly noncanonical) base pairs {i, j} of an RNA tertiary structure; i.e. {i, j} ∈ S if there is a hydrogen bond between the ith and jth nucleotide. The page number of S, denoted π(S), is the minimum number k such that S can be decomposed into a disjoint union of k secondary structures. Here, we show that computing the page number is NP-complete; we describe an exact computation of page number, using constraint programming, and determine the page number of a collection of RNA tertiary structures, for which the topological genus is known. We describe an approximation algorithm from which it follows that ω(S) ≤ π(S) ≤ ω(S) · log n, where the clique number of S, ω(S), denotes the maximum number of base pairs that pairwise cross each other.

Subjects

Base Pairing; Models, Chemical; Nucleic Acid Conformation; RNA/chemistry; Hydrogen Bonding; Models, Genetic; Models, Molecular; Thermodynamics
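Note: the idea of decomposing a set of base pairs into "pages" (secondary structures) can be illustrated with a simple first-fit heuristic. This sketch is neither the exact constraint-programming method nor the approximation algorithm of the paper; it only yields some valid decomposition, so the number of pages it returns is an upper bound on π(S). Treating two pairs as conflicting when they cross or share a nucleotide is an assumption, and the names `conflicts` and `greedy_pages` are hypothetical.

```python
def conflicts(p, q):
    """True if two base pairs cannot lie in the same secondary structure."""
    (i, j), (k, l) = sorted(p), sorted(q)
    if {i, j} & {k, l}:
        return True                     # shared nucleotide (base triple)
    # Crossing: exactly one endpoint of (k, l) lies strictly inside (i, j).
    return (i < k < j) != (i < l < j)


def greedy_pages(base_pairs):
    """First-fit assignment of base pairs to pages without internal conflicts."""
    pages = []
    for pair in sorted(tuple(sorted(p)) for p in base_pairs):
        for page in pages:
            if not any(conflicts(pair, other) for other in page):
                page.append(pair)
                break
        else:
            pages.append([pair])        # no existing page fits: open a new one
    return pages
```

For example, a fully nested set of pairs fits on one page, while a set of base pairs that all cross one another needs one page per pair, in line with the lower bound ω(S) ≤ π(S) stated in the abstract.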

9.

Ecology of speciation in the genus Bacillus.

Connor, Nora; Sikorski, Johannes; Rooney, Alejandro P; Kopac, Sarah; Koeppel, Alexander F; Burger, Andrew; Cole, Scott G; Perry, Elizabeth B; Krizanc, Danny; Field, Nicholas C; Slaton, Michèle; Cohan, Frederick M.

Appl Environ Microbiol ; 76(5): 1349-58, 2010 Mar.

Article in English | MEDLINE | ID: mdl-20048064

ABSTRACT

Microbial ecologists and systematists are challenged to discover the early ecological changes that drive the splitting of one bacterial population into two ecologically distinct populations. We have aimed to identify newly divergent lineages ("ecotypes") bearing the dynamic properties attributed to species, with the rationale that discovering their ecological differences would reveal the ecological dimensions of speciation. To this end, we have sampled bacteria from the Bacillus subtilis-Bacillus licheniformis clade from sites differing in solar exposure and soil texture within a Death Valley canyon. Within this clade, we hypothesized ecotype demarcations based on DNA sequence diversity, through analysis of the clade's evolutionary history by Ecotype Simulation (ES) and AdaptML. Ecotypes so demarcated were found to be significantly different in their associations with solar exposure and soil texture, suggesting that these and covarying environmental parameters are among the dimensions of ecological divergence for newly divergent Bacillus ecotypes. Fatty acid composition appeared to contribute to ecotype differences in temperature adaptation, since those ecotypes with more warm-adapting fatty acids were isolated more frequently from sites with greater solar exposure. The recognized species and subspecies of the B. subtilis-B. licheniformis clade were found to be nearly identical to the ecotypes demarcated by ES, with a few exceptions where a recognized taxon is split at most into three putative ecotypes. Nevertheless, the taxa recognized do not appear to encompass the full ecological diversity of the B. subtilis-B. licheniformis clade: ES and AdaptML identified several newly discovered clades as ecotypes that are distinct from any recognized taxon.

Subjects

Bacillus/classification; Bacillus/genetics; Biodiversity; Ecosystem; Environmental Microbiology; Bacillus/chemistry; Bacillus/isolation & purification; Cluster Analysis; DNA, Bacterial/chemistry; DNA, Bacterial/genetics; Fatty Acids/analysis; Genetic Speciation; Genotype; Molecular Sequence Data; Phylogeny; Sequence Analysis, DNA; Sequence Homology; United States

10.

Asymptotics of canonical and saturated RNA secondary structures.

Clote, Peter; Kranakis, Evangelos; Krizanc, Danny; Salvy, Bruno.

J Bioinform Comput Biol ; 7(5): 869-93, 2009 Oct.

Article in English | MEDLINE | ID: mdl-19785050

ABSTRACT

It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is 1.104366 · n^(-3/2) · 2.618034^n. In this paper, we study combinatorial asymptotics for two special subclasses of RNA secondary structures - canonical and saturated structures. Canonical secondary structures are defined to have no lonely (isolated) base pairs. This class of secondary structures was introduced by Bompfünewerer et al., who noted that the run time of Vienna RNA Package is substantially reduced when restricting computations to canonical structures. Here we provide an explanation for the speed-up, by proving that the asymptotic number of canonical RNA secondary structures is 2.1614 · n^(-3/2) · 1.96798^n and that the expected number of base pairs in a canonical secondary structure is 0.31724 · n. The asymptotic number of canonical secondary structures was obtained much earlier by Hofacker, Schuster and Stadler using a different method. Saturated secondary structures have the property that no base pairs can be added without violating the definition of secondary structure (i.e. introducing a pseudoknot or base triple). Here we show that the asymptotic number of saturated structures is 1.07427 · n^(-3/2) · 2.35467^n, the asymptotic expected number of base pairs is 0.337361 · n, and the asymptotic number of saturated stem-loop structures is 0.323954 · 1.69562^n, in contrast to the number 2^(n-2) of (arbitrary) stem-loop structures as classically computed by Stein and Waterman. Finally, we apply the work of Drmota to show that the density of states for [all resp. canonical resp. saturated] secondary structures is asymptotically Gaussian. We introduce a stochastic greedy method to sample random saturated structures, called quasi-random saturated structures, and show that the expected number of base pairs is 0.340633 · n.

Subjects

Computational Biology/methods; Nucleic Acid Conformation; RNA/chemistry; Base Sequence; Computer Simulation; Methanococcaceae/chemistry; Methanococcaceae/genetics; Models, Molecular; Models, Statistical; Molecular Sequence Data; RNA, Archaeal/chemistry; RNA, Archaeal/genetics; RNA, Ribosomal, 5S/chemistry; RNA, Ribosomal, 5S/genetics; Software; Stochastic Processes
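Note: the Stein and Waterman count quoted in the abstract above can be checked numerically with the standard recurrence for secondary structures in which any two positions may pair and every hairpin encloses at least one unpaired base. That setting is assumed here to be the one behind the quoted constant 2.618034, and the function name `count_secondary_structures` is hypothetical. Position n is either unpaired, or paired with some position j, which splits the sequence into an independent left part and an enclosed part.

```python
def count_secondary_structures(n_max):
    """Return [S(0), ..., S(n_max)], where S(n) counts secondary structures
    on n bases with at least one unpaired base in every hairpin."""
    counts = []
    for n in range(n_max + 1):
        if n < 3:
            counts.append(1)                 # only the empty structure fits
            continue
        total = counts[n - 1]                # position n unpaired
        for j in range(1, n - 1):            # position n paired with j = 1..n-2
            total += counts[j - 1] * counts[n - 1 - j]
        counts.append(total)
    return counts


counts = count_secondary_structures(200)
# The ratio of successive counts should approach the growth rate in the abstract.
growth_ratio = counts[200] / counts[199]
```

Because the asymptotic form carries a factor n^(-3/2), the ratio of successive terms converges to the exponential growth rate only slowly, with an error of order 1/n.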

11.

Identifying the fundamental units of bacterial diversity: a paradigm shift to incorporate ecology into bacterial systematics.

Koeppel, Alexander; Perry, Elizabeth B; Sikorski, Johannes; Krizanc, Danny; Warner, Andrew; Ward, David M; Rooney, Alejandro P; Brambilla, Evelyne; Connor, Nora; Ratcliff, Rodney M; Nevo, Eviatar; Cohan, Frederick M.

Proc Natl Acad Sci U S A ; 105(7): 2504-9, 2008 Feb 19.

Article in English | MEDLINE | ID: mdl-18272490

ABSTRACT

The central questions of bacterial ecology and evolution require a method to consistently demarcate, from the vast and diverse set of bacterial cells within a natural community, the groups playing ecologically distinct roles (ecotypes). Because of a lack of theory-based guidelines, current methods in bacterial systematics fail to divide the bacterial domain of life into meaningful units of ecology and evolution. We introduce a sequence-based approach ("ecotype simulation") to model the evolutionary dynamics of bacterial populations and to identify ecotypes within a natural community, focusing here on two Bacillus clades surveyed from the "Evolution Canyons" of Israel. This approach has identified multiple ecotypes within traditional species, with each predicted to be an ecologically distinct lineage; many such ecotypes were confirmed to be ecologically distinct, with specialization to different canyon slopes with different solar exposures. Ecotype simulation provides a long-needed natural foundation for microbial ecology and systematics.

Subjects

Bacillus/classification; Ecology; Algorithms; Computer Simulation; Environmental Pollution; Molecular Sequence Data; Phylogeny

12.

On realizing shapes in the theory of RNA neutral networks.

Clote, Peter; Gasieniec, Leszek; Kolpakov, Roman; Kranakis, Evangelos; Krizanc, Danny.

J Theor Biol ; 236(2): 216-27, 2005 Sep 21.

Article in English | MEDLINE | ID: mdl-15878180

ABSTRACT

It is known (Reidys et al., 1997b. Bull. Math. Biol. 59(2), 339-397) that for any two secondary structures S,S' there exists an RNA sequence compatible with both, and that this result does not extend to more than two secondary structures. Indeed, a simple formula for the number of RNA sequences compatible with secondary structures S,S' plays a role in the algorithms of Flamm et al. (2001. RNA 7, 254-265) and of Abfalter et al. (2003. Proceedings of the German Conference on Bioinformatics) to design an RNA switch. Here we show that a natural extension of this problem is NP-complete. Unless P=NP, there is no polynomial time algorithm, which when given secondary structures S_1,...,S_k, for k ≥ 4, determines the least number of positions, such that after removal of all base pairs incident to these positions there exists an RNA nucleotide sequence compatible with the given secondary structures. We also consider a restricted version of this problem with a "fixed maximum" number of possible stars and show that it has a simple polynomial time solution.

Subjects

Base Sequence; Models, Genetic; Neural Networks, Computer; Nucleic Acid Conformation; RNA Splice Sites; RNA Splicing; Algorithms; Animals; Sequence Analysis, RNA; Trypanosoma/genetics

13.

Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency.

Clote, Peter; Ferré, Fabrizio; Kranakis, Evangelos; Krizanc, Danny.

RNA ; 11(5): 578-91, 2005 May.

Article in English | MEDLINE | ID: mdl-15840812

ABSTRACT

We present results of computer experiments that indicate that several RNAs for which the native state (minimum free energy secondary structure) is functionally important (type III hammerhead ribozymes, signal recognition particle RNAs, U2 small nucleolar spliceosomal RNAs, certain riboswitches, etc.) all have lower folding energy than random RNAs of the same length and dinucleotide frequency. Additionally, we find that whole mRNA as well as 5'-UTR, 3'-UTR, and cds regions of mRNA have folding energies comparable to that of random RNA, although there may be a statistically insignificant trace signal in 3'-UTR and cds regions. Various authors have used nucleotide (approximate) pattern matching and the computation of minimum free energy as filters to detect potential RNAs in ESTs and genomes. We introduce a new concept of the asymptotic Z-score and describe a fast, whole-genome scanning algorithm to compute asymptotic minimum free energy Z-scores of moving-window contents. Asymptotic Z-score computations offer another filter, to be used along with nucleotide pattern matching and minimum free energy computations, to detect potential functional RNAs in ESTs and genomic regions.

Subjects

Nucleic Acid Conformation; Nucleotides/analysis; RNA/chemistry; RNA/genetics; 3' Untranslated Regions/chemistry; 3' Untranslated Regions/genetics; 3' Untranslated Regions/metabolism; 5' Untranslated Regions/chemistry; 5' Untranslated Regions/genetics; 5' Untranslated Regions/metabolism; Algorithms; Base Composition; Base Sequence; Computational Biology; Computer Simulation; Expressed Sequence Tags; Markov Chains; Nucleotides/chemistry; Nucleotides/genetics; Nucleotides/metabolism; RNA/metabolism; Thermodynamics
