1.
Int J Mol Sci ; 19(12)2018 Nov 22.
Article in English | MEDLINE | ID: mdl-30469512

ABSTRACT

Signal peptides are N-terminal presequences that target proteins to the endomembrane system and, subsequently, to subcellular or extracellular compartments; they therefore condition the proper function of those proteins. The significance of signal peptides stimulates the development of new computational methods for their detection. These methods employ learning systems trained on datasets comprising signal peptides from different types of proteins and taxonomic groups. As a result, prediction accuracy is high for signal peptides that are well represented in databases but may be low in other, atypical cases. Such atypical signal peptides are present in proteins of apicomplexan parasites, the causative agents of malaria and toxoplasmosis. Apicomplexan proteins have a unique amino acid composition due to their AT-biased genomes. Therefore, we designed a new, more flexible and universal probabilistic model for the recognition of atypical eukaryotic signal peptides. Our approach, called signalHsmm, incorporates knowledge about the structure of signal peptides and the physicochemical properties of amino acids. It recognizes signal peptides from malaria parasites and related species more accurately than popular programs. Moreover, it remains universal enough to predict other signal peptides on par with the best-performing predictors.


Subject(s)
Plasmodium/chemistry , Protein Sorting Signals , Protozoan Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acids/chemistry , Markov Chains , Sequence Analysis, Protein/standards
2.
Sci Rep ; 7(1): 12961, 2017 Oct 11.
Article in English | MEDLINE | ID: mdl-29021608

ABSTRACT

Amyloids are proteins associated with several clinical disorders, including Alzheimer's and Creutzfeldt-Jakob diseases. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns defining the hot spots, we trained predictors of amyloidogenicity using n-grams and random forest classifiers. Since amyloidogenicity may depend not on the exact sequence of amino acids but on their more general properties, we tested 524,284 reduced amino acid alphabets of different lengths (three to six letters) to find the alphabet providing the best performance in cross-validation. The predictor based on this alphabet, called AmyloGram, was benchmarked against the most popular tools for the detection of amyloid peptides using an external data set and obtained the highest values of performance measures (AUC: 0.90, MCC: 0.63). Our results showed sequential patterns in the amyloids that are strongly correlated with hydrophobicity, a tendency to form β-sheets, and lower flexibility of amino acid residues. Among the most informative n-grams of AmyloGram, we identified 15 that were previously confirmed experimentally. AmyloGram is available as a web server at http://smorfland.uni.wroc.pl/shiny/AmyloGram/ and as the R package AmyloGram. R scripts and data used to produce the results of this manuscript are available at http://github.com/michbur/AmyloGramAnalysis .
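
The idea of encoding peptides in a reduced amino acid alphabet and then counting n-grams can be illustrated with a minimal Python sketch. The four-group alphabet below is hypothetical and chosen only for illustration; it is not the alphabet selected for AmyloGram. The function names are likewise assumptions, and only contiguous n-grams are counted here.

```python
from collections import Counter

# Hypothetical 4-letter reduced alphabet (illustration only, not AmyloGram's).
GROUPS = {
    "a": "ILVFMWC",  # broadly hydrophobic residues
    "b": "AGPST",    # small residues
    "c": "DEKRH",    # charged residues
    "d": "NQY",      # remaining polar residues
}

# Map each of the 20 standard amino acids to its group letter.
ENCODING = {aa: letter for letter, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a peptide in the reduced alphabet."""
    return "".join(ENCODING[aa] for aa in seq.upper())


def count_ngrams(seq, n):
    """Count contiguous n-grams (substrings of length n) in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


reduced = reduce_sequence("KLVFFA")
bigrams = count_ngrams(reduced, 2)
```

Counts of this kind could serve as features for a classifier such as a random forest; the feature construction actually used by the authors may differ.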


Subject(s)
Amyloid/chemistry , Amino Acid Sequence , Area Under Curve , Peptides/chemistry , Reproducibility of Results , Software
3.
Genetics ; 205(1): 61-75, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27784720

ABSTRACT

With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR-controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR-controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to consider the signal associated with multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on prescreening to identify the level of resolution of distinct hypotheses. We show how FDR-controlling strategies can be adapted to account for this initial selection, both with theoretical results and with simulations that mimic the dependence structure to be expected in GWAS. We demonstrate that our approach is versatile and useful when the data are analyzed using either tests based on single markers or multiple regression. We provide an R package that allows practitioners to apply our procedure to standard GWAS format data, and illustrate its performance on lipid traits in the North Finland Birth Cohort 66 cohort study.
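
For orientation, the standard Benjamini-Hochberg step-up procedure, one widely used FDR-controlling strategy, can be sketched in Python as follows. This is not the prescreening approach proposed in the abstract; the function name and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, marking which
    hypotheses are rejected at FDR level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values, one per tested marker.
decisions = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.05)
```

Each rejection here counts as one discovery. If several rejected neighboring SNPs are later merged into a single locus, the discoveries are counted differently, which is the mismatch the abstract describes.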


Subject(s)
Genetic Association Studies/methods , Genome-Wide Association Study/methods , Models, Genetic , Cohort Studies , False Positive Reactions , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Humans , Linear Models , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Predictive Value of Tests
4.
Biomol Detect Quantif ; 9: 14-9, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27551672

ABSTRACT

The estimated mean number of copies per partition (λ) is the essential information from a digital PCR (dPCR) experiment because λ can be used to calculate the target concentration in a sample. However, little information is available on how to statistically compare multiple dPCR runs or replicates. The comparison of λ values from several runs is a multiple comparison problem, which can be solved using the binary structure of dPCR data. We propose and evaluate two novel methods based on Generalized Linear Models (GLM) and Multiple Ratio Tests (MRT) for the comparison of digital PCR experiments. We enriched our MRT framework with the computation of simultaneous confidence intervals suitable for comparing multiple dPCR runs. The evaluation of both statistical methods supports the conclusion that MRT is faster and more robust for dPCR experiments performed at large scale. Our theoretical results were confirmed by the analysis of dPCR measurements of dilution series. Both methods were implemented in the dpcR package (v. 0.2) for the open source R statistical computing environment.
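
As a minimal sketch of how λ is commonly obtained from the binary (positive or negative) partition data, the following Python snippet assumes the usual Poisson model, under which the fraction of negative partitions estimates exp(-λ). The function name and the example counts are assumptions; this is not the dpcR implementation, and it does not perform the GLM or MRT comparisons.

```python
import math


def estimate_lambda(positive, total):
    """Estimate mean copies per partition under a Poisson assumption.

    With k positive partitions out of n, P(negative) = exp(-lambda),
    so lambda is estimated as -ln(1 - k / n).
    """
    if total <= 0 or not 0 <= positive < total:
        # k == n would give an infinite estimate (fully saturated run).
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1 - positive / total)


# Made-up run: 50 positive partitions out of 100.
lam = estimate_lambda(50, 100)
```

Multiplying λ by the number of partitions per unit volume would then give a concentration estimate, given the partition volume of the instrument.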
