Results 1-3 of 3
1.
Int J Mol Sci; 18(2), 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28157153

ABSTRACT

Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.
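
The permutation analysis mentioned above asks whether an observed AUC could plausibly arise by chance. A minimal sketch, assuming classifier scores are already available: the AUC is computed as the Mann-Whitney probability that a protective antigen (label 1) outscores a non-protective one (label 0), and a p-value is estimated by repeatedly shuffling the labels. The function names and the `n_perm`/`seed` parameters are hypothetical, and the scores are held fixed; a fuller analysis would typically retrain the classifier on every permuted labelling.

```python
import random


def auc(scores, labels):
    """AUC as the Mann-Whitney statistic: the probability that a randomly chosen
    positive (label 1) scores higher than a randomly chosen negative (label 0);
    ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Compare the observed AUC with AUCs obtained after shuffling the labels.
    Returns (observed_auc, p_value); the +1 correction keeps p_value above zero.
    Scores are held fixed here: the classifier is NOT retrained per permutation."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_perm)


# Toy example: four hypothetical BPAs (label 1) all outscore four non-BPAs (label 0).
obs, p = permutation_p_value([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1],
                             [1, 1, 1, 1, 0, 0, 0, 0])
print(obs, p)
```

A small p-value says that label shuffles rarely reach the observed separation, which is the sense in which a "signal" for protective antigens can be said to exist in the curated data.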


Subject(s)
Antigens, Bacterial/immunology; Bacterial Vaccines/immunology; Computational Biology/methods; Machine Learning; Vaccines, Subunit/immunology; Antigens, Bacterial/genetics; Area Under Curve; Bacterial Proteins/genetics; Bacterial Proteins/immunology; Bacterial Vaccines/genetics; Epitope Mapping; Epitopes/genetics; Epitopes/immunology; Humans; Mutagenesis; ROC Curve; Support Vector Machine; Vaccines, Subunit/genetics
2.
Bioinformatics; 31(15): 2530-6, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25819671

ABSTRACT

BACKGROUND: In high-throughput experimental biology, it is widely acknowledged that, although expression levels measured at the transcriptome level and at the corresponding proteome level do not, in general, correlate well, messenger RNA levels are used as convenient proxies for protein levels. Our interest is in developing data-driven computational models that can bridge the gap between these two levels of measurement, at which different regulatory mechanisms may act on different molecular species, causing any observed lack of correlation. To this end, we build data-driven predictors of protein levels using mRNA levels and known proxies of translation efficiency as covariates. Previous work showed that in such a setting, outliers with respect to the model are reliable candidates for post-translational regulation. RESULTS: Here, we present and compare two novel formulations for deriving a protein concentration predictor from which outliers may be extracted in a systematic manner. The first approach, outlier rejecting regression, allows explicit specification of a certain fraction of the data as outliers. In a regression setting, this is a non-convex optimization problem, which we solve by deriving a difference of convex functions algorithm (DCA). For post-translationally regulated proteins, one expects their concentrations to be affected primarily by disruption of protein stability. Our second algorithm exploits this observation by minimizing an asymmetric loss using quantile regression, and extracts outlier proteins whose measured concentrations are lower than what a genome-wide regression would predict. We validate the two approaches on a dataset of yeast transcriptome and proteome measurements. Functional annotation checks on the detected outliers demonstrate that the methods are able to identify post-translationally regulated genes with high statistical confidence.
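
The two outlier-extraction ideas can be illustrated on a single covariate (for example, a log mRNA level) in plain Python. This is a sketch under stated assumptions, not the authors' method: `trimmed_regression` substitutes simple alternating trimming (the "concentration step" known from least trimmed squares) for the DCA solver described above, and `quantile_regression` fits an exact quantile line by brute force, relying on the fact that some optimal linear quantile fit passes through two observations. Flagging points that fall below the fitted line is an assumed reading of "lower than what a genome-wide regression would predict"; all function names are hypothetical.

```python
from itertools import combinations


def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def trimmed_regression(xs, ys, n_outliers, max_iter=50):
    """Regression with a fixed number of rejected points, solved by alternating
    steps (NOT the paper's DCA): fit OLS on the kept points, then keep the
    n - n_outliers points with the smallest squared residuals, until stable.
    Returns ((a, b), sorted outlier indices)."""
    n = len(xs)
    kept = set(range(n))
    for _ in range(max_iter):
        idx = sorted(kept)
        a, b = ols_fit([xs[i] for i in idx], [ys[i] for i in idx])
        sq = [(ys[i] - (a + b * xs[i])) ** 2 for i in range(n)]
        new_kept = set(sorted(range(n), key=lambda i: sq[i])[: n - n_outliers])
        if new_kept == kept:
            break
        kept = new_kept
    idx = sorted(kept)
    a, b = ols_fit([xs[i] for i in idx], [ys[i] for i in idx])  # fit on final kept set
    return (a, b), sorted(set(range(n)) - kept)


def pinball(r, tau):
    """Asymmetric check loss for a residual r = observed - predicted."""
    return tau * r if r >= 0 else (tau - 1) * r


def quantile_regression(xs, ys, tau):
    """Exact tau-quantile line by brute force: some optimal linear quantile fit
    passes through two observations, so try every pair with distinct x (O(n^3))."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(pinball(y - (a + b * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def below_line(xs, ys, a, b, tol=1e-9):
    """Indices whose observed value falls below the fitted line (negative residual)."""
    return [i for i, (x, y) in enumerate(zip(xs, ys)) if y - (a + b * x) < -tol]


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] = -50  # one hypothetical protein far below the genome-wide trend
print(trimmed_regression(xs, ys, n_outliers=1))
qa, qb = quantile_regression(xs, ys, tau=0.5)
print(below_line(xs, ys, qa, qb))  # [5]
```

The brute-force search is only suitable for toy data; at genome scale, quantile regression is normally solved as a linear program.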


Subject(s)
Computational Biology/methods; Proteome/metabolism; RNA, Messenger/genetics; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/genetics; Transcriptome; Algorithms; Biomarkers/analysis; Gene Expression Regulation, Fungal; Genome, Fungal; Saccharomyces cerevisiae/metabolism
3.
Bioinformatics; 29(23): 3060-6, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-24045772

ABSTRACT

MOTIVATION: Although much dynamic cellular behaviour is achieved by accurate regulation of protein concentrations, messenger RNA abundances, measured by microarray technology and more recently by deep sequencing, are widely used as proxies for protein measurements. Although for some species and under some conditions there is good correlation between transcriptome- and proteome-level measurements, such correlation is by no means universal, owing to post-transcriptional and post-translational regulation, both of which are highly prevalent in cells. Here, we seek to develop a data-driven machine learning approach to bridging the gap between these two levels of high-throughput omic measurements in Saccharomyces cerevisiae, and to deploy the model in a novel way to uncover mRNA-protein pairs that are candidates for post-translational regulation. RESULTS: Feature selection by sparsity-inducing regression (ℓ1-norm regularization) yields a stable set of five features (mRNA, ribosomal occupancy, ribosome density, tRNA adaptation index and codon bias), reducing the feature count from 37 to 5. A linear predictor using these features predicts protein concentrations fairly accurately (R² = 0.86). Proteins whose concentrations cannot be predicted accurately, taken as outliers with respect to the predictor, are shown to have annotation evidence of post-translational modification significantly more often than random subsets of similar size (P < 0.02). In a data mining sense, this work also makes the wider point that outliers with respect to a learning method can carry meaningful information about a problem domain.
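
ℓ1 regularization drives the coefficients of uninformative features exactly to zero, which is how a 37-feature model can shrink to a handful of features. A minimal sketch of one common solver, cyclic coordinate descent with soft-thresholding, is given below; the abstract does not say which solver was used, and the objective scaling, the penalty `lam`, the toy data and the helper names are all assumptions. `worst_fitted` ranks observations by absolute residual, one plausible way to pick out proteins whose concentrations "cannot be predicted accurately".

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate
    descent. Assumes centred y and centred, non-constant columns (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                row[j] * (yi - sum(row[k] * w[k] for k in range(p) if k != j))
                for row, yi in zip(X, y)
            ) / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


def worst_fitted(X, y, w, k):
    """Indices of the k observations with the largest absolute residual."""
    res = [abs(yi - sum(xj * wj for xj, wj in zip(row, w))) for row, yi in zip(X, y)]
    return sorted(range(len(y)), key=lambda i: res[i], reverse=True)[:k]


# Toy data: three orthogonal features; y = 3*f1 - 2*f2 + 0.05*f3.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [1.05, 4.95, -5.05, -0.95]
w = lasso_cd(X, y, lam=0.1)
print(w)                          # the weak third feature is set exactly to 0.0
print(worst_fitted(X, y, w, 1))   # [2]
```

A larger `lam` zeroes more coefficients; in practice it would usually be chosen by cross-validation after standardising the features.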


Subject(s)
Computational Biology/methods; Gene Expression Regulation, Fungal; Protein Processing, Post-Translational; Proteome/analysis; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Transcriptome; Artificial Intelligence; Codon/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; RNA, Transfer/genetics; RNA, Transfer/metabolism; Ribosomes/metabolism; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/metabolism