Search | Biblioteca Virtual em Saúde

Prediction of Signal Peptides in Proteins from Malaria Parasites.

Burdukiewicz, Michal; Sobczyk, Piotr; Chilimoniuk, Jaroslaw; Gagat, Przemyslaw; Mackiewicz, Pawel.

Int J Mol Sci ; 19(12), 2018 Nov 22.

Article in English | MEDLINE | ID: mdl-30469512

ABSTRACT

Signal peptides are N-terminal presequences responsible for targeting proteins to the endomembrane system and, subsequently, to subcellular or extracellular compartments; they consequently condition the proteins' proper function. The significance of signal peptides stimulates the development of new computational methods for their detection. These methods employ learning systems trained on datasets comprising signal peptides from different types of proteins and taxonomic groups. As a result, the accuracy of predictions is high for signal peptides that are well represented in databases, but might be low in other, atypical cases. Such atypical signal peptides are present in proteins found in apicomplexan parasites, the causative agents of malaria and toxoplasmosis. Apicomplexan proteins have a unique amino acid composition due to their AT-biased genomes. Therefore, we designed a new, more flexible and universal probabilistic model for recognition of atypical eukaryotic signal peptides. Our approach, called signalHsmm, includes knowledge about the structure of signal peptides and the physicochemical properties of amino acids. It is able to recognize signal peptides from the malaria parasites and related species more accurately than popular programs. Moreover, it is still universal enough to provide prediction of other signal peptides on par with the best-performing predictors.

Subjects

Plasmodium/chemistry , Protein Sorting Signals , Protozoan Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acids/chemistry , Markov Chains , Sequence Analysis, Protein/standards

Amyloidogenic motifs revealed by n-gram analysis.

Burdukiewicz, Michal; Sobczyk, Piotr; Rödiger, Stefan; Duda-Madej, Anna; Mackiewicz, Pawel; Kotulska, Malgorzata.

Sci Rep ; 7(1): 12961, 2017 10 11.

Article in English | MEDLINE | ID: mdl-29021608

ABSTRACT

Amyloids are proteins associated with several clinical disorders, including Alzheimer's disease and Creutzfeldt-Jakob disease. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns defining the hot spots, we trained predictors of amyloidogenicity using n-grams and random forest classifiers. Since amyloidogenicity may depend not on the exact sequence of amino acids but on their more general properties, we tested 524,284 reduced amino acid alphabets of different lengths (three to six letters) to find the alphabet providing the best performance in cross-validation. The predictor based on this alphabet, called AmyloGram, was benchmarked against the most popular tools for the detection of amyloid peptides using an external data set and obtained the highest values of performance measures (AUC: 0.90, MCC: 0.63). Our results showed sequential patterns in the amyloids which are strongly correlated with hydrophobicity, a tendency to form β-sheets, and lower flexibility of amino acid residues. Among the most informative n-grams of AmyloGram we identified 15 that were previously confirmed experimentally. AmyloGram is available as a web server at http://smorfland.uni.wroc.pl/shiny/AmyloGram/ and as the R package AmyloGram. R scripts and data used to produce the results of this manuscript are available at http://github.com/michbur/AmyloGramAnalysis .

Subjects

Amyloid/chemistry , Amino Acid Sequence , Area Under Curve , Peptides/chemistry , Reproducibility of Results , Software
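
The n-gram encoding over a reduced amino acid alphabet mentioned in the abstract above can be illustrated with a minimal Python sketch. The four-group alphabet and the function names `reduce_sequence` and `count_ngrams` are assumptions made for illustration only; this is not the alphabet selected for AmyloGram, and the random forest step is omitted.

```python
from collections import Counter

# Hypothetical 4-letter reduced alphabet (an assumption for illustration,
# NOT the alphabet chosen by the AmyloGram authors).
REDUCED_ALPHABET = {
    "a": "AGPST",     # small residues
    "b": "CFILMVWY",  # hydrophobic and aromatic residues
    "c": "DENQ",      # acidic residues and their amides
    "d": "HKR",       # basic residues
}

# Invert the grouping so each amino acid maps to its group letter.
AA_TO_GROUP = {aa: group for group, aas in REDUCED_ALPHABET.items() for aa in aas}


def reduce_sequence(seq):
    """Rewrite a peptide in the reduced alphabet (one group letter per residue)."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper())


def count_ngrams(seq, n):
    """Count contiguous n-grams of the reduced-alphabet form of a peptide."""
    reduced = reduce_sequence(seq)
    return Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))


if __name__ == "__main__":
    peptide = "KLVFFA"  # example hexapeptide used only as input data
    print(reduce_sequence(peptide))
    print(count_ngrams(peptide, 2))
```

Counts like these could serve as features for a classifier; reducing the alphabet shrinks the feature space from 20^n to (number of groups)^n possible n-grams.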

Controlling the Rate of GWAS False Discoveries.

Brzyski, Damian; Peterson, Christine B; Sobczyk, Piotr; Candès, Emmanuel J; Bogdan, Malgorzata; Sabatti, Chiara.

Genetics ; 205(1): 61-75, 2017 01.

Article in English | MEDLINE | ID: mdl-27784720

ABSTRACT

With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR-controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR-controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to consider the signal associated with multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on prescreening to identify the level of resolution of distinct hypotheses. We show how FDR-controlling strategies can be adapted to account for this initial selection, both with theoretical results and with simulations that mimic the dependence structure to be expected in GWAS. We demonstrate that our approach is versatile and useful when the data are analyzed using both tests based on single markers and multiple regression. We provide an R package that allows practitioners to apply our procedure to data in standard GWAS format, and illustrate its performance on lipid traits in the North Finland Birth Cohort 66 cohort study.

Subjects

Genetic Association Studies/methods , Genome-Wide Association Study/methods , Models, Genetic , Cohort Studies , False Positive Reactions , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Humans , Linear Models , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Predictive Value of Tests
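
For orientation, the standard Benjamini-Hochberg step-up procedure, a common baseline for FDR control, can be sketched in a few lines of Python. This is not the prescreening-based procedure proposed in the abstract above, and the function name `benjamini_hochberg` and its parameters are assumptions made for illustration.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices of hypotheses rejected by the plain
    Benjamini-Hochberg step-up procedure at target FDR level q.

    Baseline illustration only; it does not include the prescreening
    or cluster-selection step described in the abstract.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Made-up p-values for four markers, used only as example input.
    print(benjamini_hochberg([0.039, 0.001, 0.5, 0.008], q=0.05))
```

Each returned index counts as one discovery. If several rejected SNPs in linkage disequilibrium are later merged into a single locus, the number of discoveries changes after the fact, which is the counting mismatch the abstract identifies as the source of FDR inflation.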

Methods for comparing multiple digital PCR experiments.

Burdukiewicz, Michal; Rödiger, Stefan; Sobczyk, Piotr; Menschikowski, Mario; Schierack, Peter; Mackiewicz, Pawel.

Biomol Detect Quantif ; 9: 14-9, 2016 Sep.

Article in English | MEDLINE | ID: mdl-27551672

ABSTRACT

The estimated mean number of copies per partition (λ) is the essential information from a digital PCR (dPCR) experiment because λ can be used to calculate the target concentration in a sample. However, little information is available on how to statistically compare multiple dPCR runs or replicates. The comparison of λ values from several runs is a multiple comparison problem, which can be solved using the binary structure of dPCR data. We propose and evaluate two novel methods based on Generalized Linear Models (GLM) and Multiple Ratio Tests (MRT) for comparison of digital PCR experiments. We enriched our MRT framework with computation of simultaneous confidence intervals suitable for comparing multiple dPCR runs. The evaluation of both statistical methods supports the conclusion that MRT is faster and more robust for dPCR experiments performed at large scale. Our theoretical results were confirmed by the analysis of dPCR measurements of dilution series. Both methods were implemented in the dpcR package (v. 0.2) for the open source R statistical computing environment.
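
How λ follows from the binary (positive or negative) partition counts, and how it relates to concentration, can be sketched under the usual Poisson assumption, λ = -ln(1 - k/n) for k positive partitions out of n. The Python sketch below is an illustration only; the function names and the partition volume parameter are assumptions, not part of the dpcR package, and it does not implement the GLM or MRT comparisons.

```python
import math


def estimate_lambda(positive, total):
    """Estimate the mean number of copies per partition from binary dPCR data.

    Assumes target molecules are Poisson-distributed across partitions, so
    the fraction of negative partitions is exp(-lambda).
    """
    if total <= 0 or not 0 <= positive < total:
        # With every partition positive the estimate is unbounded.
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1 - positive / total)


def concentration(lam, partition_volume):
    """Convert lambda to copies per unit volume.

    partition_volume is a hypothetical parameter: the volume of one
    partition, in whatever unit the concentration should be reported per.
    """
    return lam / partition_volume


if __name__ == "__main__":
    # Made-up run: 5000 positive partitions out of 10000.
    lam = estimate_lambda(5000, 10000)
    print(lam)
    print(concentration(lam, partition_volume=0.001))
```

Comparing several runs then amounts to comparing several such estimates at once, which is why the abstract frames the task as a multiple comparison problem.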
