Results 1 - 12 of 12
1.
Bioinformatics ; 33(1): 135-136, 2017 01 01.
Article in English | MEDLINE | ID: mdl-27605098

ABSTRACT

DAPAR and ProStaR are software tools for the statistical analysis of label-free XIC-based quantitative discovery proteomics experiments. DAPAR contains procedures to filter, normalize, impute missing values, aggregate peptide intensities, perform null hypothesis significance tests, and select the most likely differentially abundant proteins with a corresponding false discovery rate. ProStaR is a graphical user interface that provides user-friendly access to the DAPAR functionalities through a web browser. AVAILABILITY AND IMPLEMENTATION: DAPAR and ProStaR are implemented in the R language and are available on the website of the Bioconductor project (http://www.bioconductor.org/). A complete tutorial and a toy dataset accompany the packages. CONTACT: samuel.wieczorek@cea.fr, florence.combes@cea.fr, thomas.burger@cea.fr.
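
The pipeline described above is implemented in R; purely for illustration, the sketch below walks through the same kind of steps (normalization, missing-value imputation, peptide-to-protein aggregation, significance testing with FDR control) on simulated data in Python. It is not DAPAR's API, and all data, layouts and thresholds are hypothetical.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Hypothetical toy data: log2 peptide intensities, 100 peptides x 6 runs
# (3 replicates per condition), 4 peptides per protein; NaN marks a missing value.
intensities = rng.normal(25, 2, size=(100, 6))
intensities[rng.random(intensities.shape) < 0.1] = np.nan
protein_of_peptide = np.repeat(np.arange(25), 4)
condition = np.array([0, 0, 0, 1, 1, 1])

# 1) Normalize: align the median intensity of every run (column).
col_median = np.nanmedian(intensities, axis=0)
normalized = intensities - col_median + col_median.mean()

# 2) Impute missing values with a low-quantile constant per run
#    (a crude stand-in for a left-censored imputation strategy).
low = np.nanquantile(normalized, 0.05, axis=0)
imputed = np.where(np.isnan(normalized), low, normalized)

# 3) Aggregate peptide intensities to proteins (sum on the linear scale).
proteins = np.array([np.log2((2 ** imputed[protein_of_peptide == p]).sum(axis=0))
                     for p in range(25)])

# 4) Per-protein significance test, then Benjamini-Hochberg FDR control.
_, pval = stats.ttest_ind(proteins[:, condition == 0],
                          proteins[:, condition == 1], axis=1)
reject, _, _, _ = multipletests(pval, alpha=0.05, method="fdr_bh")
print("proteins called differentially abundant at 5% FDR:", np.flatnonzero(reject))
```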


Subject(s)
Peptides/chemistry; Proteins/chemistry; Proteomics/methods; Software
2.
J Proteome Res ; 15(4): 1116-25, 2016 Apr 01.
Article in English | MEDLINE | ID: mdl-26906401

ABSTRACT

Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation, compared them on real or simulated data sets, and recommended a list of missing value imputation methods for proteomics applications. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may differ in nature, and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but the most appropriate one. We describe a series of comparisons that support our view: for instance, we show that a supposedly "under-performing" method (i.e., one giving baseline average results), if applied at the "appropriate" step of the data-processing pipeline (before or after peptide aggregation) to a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state of the art). This leads us to formulate a few practical guidelines regarding the choice and application of an imputation method in a proteomics context.
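
As a minimal illustration of why the missingness mechanism matters, the hypothetical Python sketch below applies two imputation strategies to the same simulated intensity matrix: kNN imputation, which suits values missing at random, and a left-censored draw near the detection limit, which suits abundance-dependent missingness. It is not the paper's benchmark; data and parameters are invented.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(1)

# Hypothetical log-intensity matrix (peptides x samples) with two kinds of holes.
X = rng.normal(25, 2, size=(200, 8))
mask_mcar = rng.random(X.shape) < 0.05           # values missing at random
mask_mnar = X < np.quantile(X, 0.05)             # low values censored (MNAR-like)
X_obs = X.copy()
X_obs[mask_mcar | mask_mnar] = np.nan

# Strategy 1: kNN imputation -- a reasonable choice when values are missing
# at random.
X_knn = KNNImputer(n_neighbors=5).fit_transform(X_obs)

# Strategy 2: left-censored imputation -- draw small values near the detection
# limit, a better match when missingness is caused by low abundance.
limit = np.nanquantile(X_obs, 0.01)
X_lc = np.where(np.isnan(X_obs),
                rng.normal(limit, 0.3, size=X_obs.shape), X_obs)

# Which strategy "wins" depends on which mechanism actually created the holes.
for name, Xi in [("kNN", X_knn), ("left-censored", X_lc)]:
    err = np.abs(Xi - X)[np.isnan(X_obs)].mean()
    print(f"{name:>13} imputation: mean absolute error = {err:.2f}")
```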


Subject(s)
Adenocarcinoma/chemistry; Carcinoma, Non-Small-Cell Lung/chemistry; Lung Neoplasms/chemistry; Neoplasm Proteins/analysis; Peptides/analysis; Proteomics/statistics & numerical data; Adenocarcinoma/diagnosis; Adenocarcinoma/metabolism; Algorithms; Carcinoma, Non-Small-Cell Lung/diagnosis; Carcinoma, Non-Small-Cell Lung/metabolism; Computer Simulation; Data Interpretation, Statistical; Datasets as Topic; Humans; Lung Neoplasms/diagnosis; Lung Neoplasms/metabolism; Mass Spectrometry/statistics & numerical data; Neoplasm Proteins/metabolism; Peptides/metabolism
3.
Mol Cell Proteomics ; 13(8): 2147-67, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24872594

ABSTRACT

Photosynthesis has shaped atmospheric and ocean chemistries and probably changed the climate as well, as oxygen is released from water as part of the photosynthetic process. In photosynthetic eukaryotes, this process occurs in the chloroplast, an organelle containing the most abundant biological membrane, the thylakoids. The thylakoids of plants and some green algae are structurally inhomogeneous, consisting of two main domains: the grana, which are piles of membranes held together by stacking forces, and the stroma-lamellae, which are unstacked thylakoids connecting the grana. The major photosynthetic complexes are unevenly distributed within these compartments because of steric and electrostatic constraints. Although proteomic analysis of thylakoids has been instrumental in defining their protein components, no extensive proteomic study of the subthylakoid localization of proteins in the BBY (grana) and stroma-lamellae fractions has been achieved so far. To fill this gap, we performed a complete survey of the protein composition of these thylakoid subcompartments using thylakoid membrane fractionation. We employed semiquantitative proteomics coupled with a data analysis pipeline and manual annotation to differentiate genuine BBY and stroma-lamellae proteins from possible contaminants. About 300 thylakoid (or potentially thylakoid) proteins were shown to be enriched in either the BBY or the stroma-lamellae fraction. Overall, the present findings corroborate previous observations on photosynthetic proteins obtained with nonproteomic approaches. The originality of the present proteomic study lies in the identification of photosynthetic proteins whose differential distribution across the thylakoid subcompartments might explain previously observed phenomena such as LHCII docking. In addition, the present localization results suggest new molecular actors for photosynthesis-linked activities. For instance, because most PsbP-like subunits are differentially localized to the stroma-lamellae, these proteins could be linked to the PSI-NDH complex in the context of cyclic electron flow around PSI. Finally, we identified about a hundred new, likely minor thylakoid (or chloroplast) proteins, some of which are potential regulators of chloroplast physiology.
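
The abstract does not spell out how fraction enrichment was scored; as a loose, purely hypothetical sketch of the kind of semiquantitative comparison involved, the Python snippet below assigns proteins to the BBY (grana) or stroma-lamellae fraction from simulated spectral counts using a simple fold-change cutoff.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical spectral counts for 50 proteins in the two membrane fractions
# (3 replicate preparations each); real data and cutoffs would differ.
bby = rng.poisson(20, size=(50, 3))
stroma = rng.poisson(20, size=(50, 3))

# Relative enrichment: log2 ratio of mean counts, with a pseudo-count of 1.
log2_ratio = np.log2((bby.mean(axis=1) + 1) / (stroma.mean(axis=1) + 1))

# Simple calls: a protein is assigned to a fraction if it passes a 2-fold cutoff.
calls = np.where(log2_ratio >= 1, "BBY (grana)",
                 np.where(log2_ratio <= -1, "stroma-lamellae", "shared"))
for label in ("BBY (grana)", "stroma-lamellae", "shared"):
    print(f"{label}: {(calls == label).sum()} proteins")
```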


Subject(s)
Arabidopsis/metabolism; Mass Spectrometry/methods; Thylakoids/metabolism; Photosynthesis; Plant Proteins/isolation & purification; Proteomics/methods
4.
Brief Bioinform ; 14(4): 469-90, 2013 Jul.
Article in English | MEDLINE | ID: mdl-22851511

ABSTRACT

Genomic data integration is a key goal to be achieved on the way to large-scale genomic data analysis. This process is very challenging due to the diverse sources of information resulting from genomics experiments. In this work, we review methods designed to combine genomic data recorded in microarray gene expression (MAGE) experiments. It is widely acknowledged that the main source of variation between different MAGE datasets is the so-called 'batch effects'. The methods reviewed here perform data integration by removing (or, more precisely, attempting to remove) the unwanted variation associated with batch effects. They are presented in a unified framework together with a wide range of evaluation tools, which are essential for assessing the efficiency and quality of the data integration process. We provide a systematic description of the MAGE data integration methodology, together with basic recommendations to help users choose the appropriate tools for integrating MAGE data for large-scale analysis and to evaluate them from different perspectives in order to quantify their efficiency. All genomic data used in this study for illustration purposes were retrieved from InSilicoDB (http://insilico.ulb.ac.be).
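
To make the notion of batch-effect removal concrete, the sketch below implements one of the simplest methods covered by such reviews, batch-mean centering, on simulated data. It is only an illustration under invented data and offsets, not a reproduction of any specific method evaluated in the review.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical MAGE studies of the same biology (genes x samples), each
# measured in a different lab and carrying its own additive batch offset.
genes, n1, n2 = 500, 10, 12
signal = rng.normal(8.0, 1.0, size=(genes, 1))
batch1 = signal + rng.normal(0, 0.3, size=(genes, n1)) + 1.5
batch2 = signal + rng.normal(0, 0.3, size=(genes, n2)) - 0.8

def center_batch(x):
    """Batch-mean centering: subtract each gene's batch-specific mean."""
    return x - x.mean(axis=1, keepdims=True)

merged = np.hstack([center_batch(batch1), center_batch(batch2)])
gap_before = np.abs(batch1.mean(axis=1) - batch2.mean(axis=1)).mean()
gap_after = np.abs(merged[:, :n1].mean(axis=1) - merged[:, n1:].mean(axis=1)).mean()
print(f"mean gene-wise gap between batches: {gap_before:.2f} -> {gap_after:.2f}")
```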


Subject(s)
Genomics/methods; Oligonucleotide Array Sequence Analysis; Transcriptome; Computer Simulation; Databases, Genetic; Gene Expression; Genetic Variation; Genome
5.
Mol Divers ; 18(3): 637-54, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24671521

ABSTRACT

Antibiotic resistance has increased over the past two decades. New approaches for the discovery of novel antibacterials are required, and innovative strategies will be necessary to identify novel and effective candidates. In this context, the exploration of bacterial targets that remain unexploited by the antibiotics currently in clinical use is needed. One such target is β-ketoacyl-acyl carrier protein synthase III (FabH). Here, we report a ligand-based modeling methodology for the virtual screening of large collections of chemical compounds in the search for potential FabH inhibitors. QSAR models are developed for a diverse dataset of 296 FabH inhibitors using an in-house modeling framework. All models showed high fitting, robustness, and generalization capabilities. We further investigated the performance of the developed models in a virtual screening scenario. To carry out this investigation, we implemented a desirability-based algorithm for decoy selection that proved effective in selecting high-quality decoy sets. Once the QSAR models were validated in the context of a virtual screening experiment, their limitations became apparent. For this reason, we explored the potential of ensemble modeling to overcome the limitations associated with the use of single classifiers. A detailed evaluation of the virtual screening performance of ensemble models demonstrated, for the first time to our knowledge, the benefits of this approach in a virtual screening scenario. From all the results obtained, we reach one main conclusion: at least for FabH inhibitors, virtual screening performance is not guaranteed by predictive QSAR models.
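
The sketch below illustrates the general idea of scoring a screening library with an ensemble of classifiers and measuring the enrichment of actives among the top-ranked compounds. It uses generic scikit-learn models on synthetic descriptors, not the paper's QSAR models, descriptors or desirability-based decoy selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Stand-in data: rows are compounds described by numeric descriptors,
# label 1 = active (inhibitor), label 0 = inactive or decoy.
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)
train, screen = np.arange(300), np.arange(300, 600)

# Ensemble modeling: average the predicted activity probabilities of several
# base classifiers instead of relying on a single model.
models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(n_estimators=200, random_state=0),
          GradientBoostingClassifier(random_state=0)]
scores = np.mean([m.fit(X[train], y[train]).predict_proba(X[screen])[:, 1]
                  for m in models], axis=0)

# Virtual-screening view: how enriched are actives in the top 5% of the ranking?
top = np.argsort(scores)[::-1][: int(0.05 * len(screen))]
print(f"actives in the library: {y[screen].mean():.1%}, "
      f"in the top 5% of the ranking: {y[screen][top].mean():.1%}")
```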


Subject(s)
3-Oxoacyl-(Acyl-Carrier-Protein) Synthase/antagonists & inhibitors; Drug Evaluation, Preclinical/methods; Enzyme Inhibitors/chemistry; Enzyme Inhibitors/pharmacology; Quantitative Structure-Activity Relationship; User-Computer Interface; Escherichia coli/enzymology; Ligands; Models, Molecular
6.
BMC Bioinformatics ; 13: 335, 2012 Dec 24.
Article in English | MEDLINE | ID: mdl-23259851

ABSTRACT

BACKGROUND: With an abundance of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, the analysis itself is no longer the problem; retrieving and consistently integrating all these data before delivering them to the wide variety of existing analysis tools becomes the new bottleneck. RESULTS: We present the newly released inSilicoMerging R/Bioconductor package which, together with the earlier released inSilicoDb R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. The inSilicoMerging package also provides a set of five visual and six quantitative validation measures. CONCLUSIONS: By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove batch effects between data sets from different sources, and (iii) several validation tools enabling inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository (https://insilicodb.org/app/).
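
inSilicoMerging and inSilicoDb are R/Bioconductor packages, so the snippet below is not their API; it is a hypothetical Python sketch of the integration-plus-validation idea: restrict two studies to their common genes, remove the batch effect by gene-wise centering, and quantify residual batch structure (here with a silhouette score over the merged samples).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)

# Two hypothetical studies measuring overlapping gene sets (genes x samples),
# each with its own additive batch offset.
genes_a = [f"g{i}" for i in range(0, 800)]
genes_b = [f"g{i}" for i in range(200, 1000)]
study_a = rng.normal(7.0, 1.0, size=(len(genes_a), 15)) + 1.0
study_b = rng.normal(7.0, 1.0, size=(len(genes_b), 20)) - 1.0

# 1) Integration starts from the genes shared by both studies.
common = sorted(set(genes_a) & set(genes_b), key=lambda g: int(g[1:]))
a = study_a[[genes_a.index(g) for g in common]]
b = study_b[[genes_b.index(g) for g in common]]

# 2) Simple batch-effect removal: center each study gene-wise, then merge.
merged = np.hstack([a - a.mean(axis=1, keepdims=True),
                    b - b.mean(axis=1, keepdims=True)])

# 3) Quantitative validation: after a good merge the samples should not cluster
#    by study, so the silhouette score of the batch labels should be near zero.
batch = np.array([0] * a.shape[1] + [1] * b.shape[1])
pcs = PCA(n_components=2).fit_transform(merged.T)
print("silhouette of batch labels after merging:",
      round(silhouette_score(pcs, batch), 3))
```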


Subject(s)
Gene Expression Profiling/statistics & numerical data; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Software; Access to Information; Humans
7.
Bioinformatics ; 27(22): 3204-5, 2011 Nov 15.
Article in English | MEDLINE | ID: mdl-21937664

ABSTRACT

Microarray technology has become an integral part of biomedical research, and increasing amounts of data become available through public repositories. However, re-use of these datasets is severely hindered by unstructured, missing or incorrect biological sample information, as well as by the wide variety of preprocessing methods in use. The inSilicoDb R/Bioconductor package is a command-line front-end to the InSilico DB, a web-based database currently containing 86 104 expert-curated human Affymetrix expression profiles compiled from 1937 GEO repository series. The package builds on the Bioconductor project's focus on reproducibility by enabling a clear workflow in which not only the analysis, but also the retrieval of verified data is supported.


Subject(s)
Gene Expression Profiling; Software; Databases, Genetic; Humans; Oligonucleotide Array Sequence Analysis
8.
J Chem Inf Model ; 52(9): 2366-86, 2012 Sep 24.
Article in English | MEDLINE | ID: mdl-22856471

ABSTRACT

Computer-aided drug design has become an important component of the drug discovery process. Despite advances in this field, there is no single modeling approach that can be successfully applied to the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it yields classification results similar to or better than those previously reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple, since they consist of a weighted sum of the outputs of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
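
A minimal sketch of the ensemble structure exploited here: scikit-learn's AdaBoost with its default depth-1 trees is a weighted sum of single-feature classifiers, and its continuous score can rank compounds. The Genetic Algorithm search over feature subsets is omitted, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Stand-in descriptor matrix; in GA(M)E-QSAR the feature subset given to each
# ensemble would come from the Genetic Algorithm search, which is omitted here.
X, y = make_classification(n_samples=400, n_features=25, n_informative=8,
                           random_state=1)

# scikit-learn's AdaBoost uses depth-1 decision trees (stumps) by default, so
# each base learner looks at a single feature and the fitted model is a
# weighted sum of single-feature classifier outputs.
model = AdaBoostClassifier(n_estimators=100, random_state=1)
model.fit(X[:300], y[:300])

# The continuous AdaBoost score can serve as a ranking criterion to prioritize
# compounds for synthesis and biological evaluation after virtual screening.
scores = model.decision_function(X[300:])
ranking = np.argsort(scores)[::-1]
print("top-10 ranked test compounds:", ranking[:10])
print("actives among them:", int(y[300:][ranking[:10]].sum()))
```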


Subject(s)
Algorithms; Automation; Drug Design; Quantitative Structure-Activity Relationship; Ligands; Models, Theoretical
9.
ISRN Bioinform ; 2014: 345106, 2014.
Article in English | MEDLINE | ID: mdl-25937953

ABSTRACT

An increasing number of microarray gene expression data sets are available through public repositories. Their huge potential for new findings is yet to be unlocked by making them available for large-scale analysis. To do so, it is essential that independent studies designed for similar biological problems can be integrated, so that new insights can be obtained. These insights would remain undiscovered when analyzing the individual data sets, because the small number of biological samples used per experiment is a well-known bottleneck in genomic analysis. By increasing the number of samples, the statistical power is increased and more general and reliable conclusions can be drawn. In this work, two different approaches for conducting large-scale analysis of microarray gene expression data, meta-analysis and data merging, are compared in the context of the identification of cancer-related biomarkers, by analyzing six independent lung cancer studies. Within this study, we investigate the hypothesis that analyzing the large cohorts of samples obtained by merging independent data sets designed to study the same biological problem results in lower false discovery rates than analyzing the same data sets within a more conservative meta-analysis approach.
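
The two approaches can be contrasted per gene on simulated data: meta-analysis tests within each study and combines the study-level p-values, while data merging centers each study, pools all samples, and runs a single larger test. The sketch below does exactly that; it is illustrative only and uses invented data rather than the six lung cancer studies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def simulate_study(n_per_group, offset):
    """One stand-in study: 100 genes x 2*n samples, with a study-wide offset."""
    data = rng.normal(0.0, 1.0, size=(100, 2 * n_per_group)) + offset
    data[0, n_per_group:] += 0.8        # a true effect in gene 0, second group
    return data

def split_groups(m):
    half = m.shape[1] // 2
    return m[:, :half], m[:, half:]

studies = [simulate_study(8, rng.normal(0, 2)) for _ in range(6)]

# Meta-analysis: test each study separately, then combine the p-values.
per_study_p = np.array([stats.ttest_ind(*split_groups(s), axis=1).pvalue
                        for s in studies])
meta_p = np.array([stats.combine_pvalues(per_study_p[:, g], method="stouffer")[1]
                   for g in range(100)])

# Data merging: remove each study's offset by gene-wise centering, pool the
# samples from all studies, and run a single larger test per gene.
centered = [s - s.mean(axis=1, keepdims=True) for s in studies]
pooled_1 = np.hstack([split_groups(s)[0] for s in centered])
pooled_2 = np.hstack([split_groups(s)[1] for s in centered])
merged_p = stats.ttest_ind(pooled_1, pooled_2, axis=1).pvalue

print(f"gene 0: meta-analysis p = {meta_p[0]:.2e}, merged-data p = {merged_p[0]:.2e}")
```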

10.
Article in English | MEDLINE | ID: mdl-23929862

ABSTRACT

The potential of microarray gene expression (MAGE) data is only partially explored due to the limited number of samples in individual studies. This limitation can be surmounted by merging or integrating data sets originating from independent MAGE experiments designed to study the same biological problem. However, this process is hindered by batch effects that are study-dependent and result in random data distortion; therefore, numerical transformations are needed to render the integration of different data sets accurate and meaningful. Our contribution in this paper is two-fold. First, we propose GENESHIFT, a new nonparametric batch effect removal method based on two key elements from statistics: empirical density estimation and the inner product as a distance measure between two probability density functions. Second, we introduce a new validation index for batch effect removal methods based on the observation that samples from two independent studies drawn from the same population should exhibit similar probability density functions. We evaluated and compared GENESHIFT with four other state-of-the-art methods for batch effect removal: batch-mean centering, empirical Bayes (ComBat), distance-weighted discrimination, and cross-platform normalization. Several validation indices providing complementary information about the efficiency of batch effect removal methods were employed in our validation framework. The results show that none of the methods clearly outperforms the others. Moreover, most of the methods used for comparison perform very well with respect to some validation indices while performing very poorly with respect to others. GENESHIFT exhibits robust performance, and its average rank is the highest among those of all methods compared.
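
The validation index described above can be sketched as follows: estimate the density of a gene's expression values in each study and take a normalized inner product between the two density estimates, so that identical distributions score 1. The snippet below is an assumption-laden illustration with Gaussian kernel density estimates and simulated values, not the paper's exact estimator.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)

def density_overlap(x, y, grid_size=256):
    """Normalized inner product of two kernel density estimates (1 = identical)."""
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), grid_size)
    fx, fy = gaussian_kde(x)(grid), gaussian_kde(y)(grid)
    return (fx @ fy) / np.sqrt((fx @ fx) * (fy @ fy))

# One gene measured in two hypothetical studies drawn from the same population:
# the raw values differ only by a batch offset, the corrected values do not.
study1 = rng.normal(6.0, 1.0, size=80)
study2 = rng.normal(7.5, 1.0, size=60)       # same biology, shifted by the batch
print("overlap before correction:", round(density_overlap(study1, study2), 3))
print("overlap after correction: ",
      round(density_overlap(study1 - study1.mean(), study2 - study2.mean()), 3))
```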


Subject(s)
Computational Biology/methods; Gene Expression Profiling/methods; Models, Statistical; Software; Computer Simulation; Databases, Genetic; Humans; Oligonucleotide Array Sequence Analysis; Reproducibility of Results; Statistics, Nonparametric; Tissue Array Analysis
11.
Article in English | MEDLINE | ID: mdl-22350210

ABSTRACT

A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimensionality, usually hundreds or thousands of variables. Such data sets are now available in various application areas such as combinatorial chemistry, text mining, multivariate imaging, and bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus of this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed gene (DEG) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations, in order to reveal their technical details and to highlight their common characteristics as well as their particularities.
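
As a concrete example of the filter family, the sketch below scores every gene with a univariate statistic (the ANOVA F statistic), independently of any classifier, and keeps the top-ranked genes as candidate DEGs. The data and the choice of score are hypothetical stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Stand-in GEM data: rows = samples, columns = genes, binary phenotype.
X, y = make_classification(n_samples=60, n_features=1000, n_informative=20,
                           shuffle=False, random_state=7)

# Filter methods score each gene on its own (here the ANOVA F statistic),
# independently of any downstream classifier, and keep the top-ranked ones.
selector = SelectKBest(score_func=f_classif, k=25).fit(X, y)
top_genes = np.argsort(selector.scores_)[::-1][:25]
print("top-ranked candidate DEGs (column indices):", np.sort(top_genes))
```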


Subject(s)
Computational Biology/methods; Gene Expression Profiling; Models, Statistical; Oligonucleotide Array Sequence Analysis; Analysis of Variance; Bayes Theorem; Genetic Markers; Information Theory; ROC Curve; Statistics, Nonparametric
12.
Genome Biol ; 13(11): R104, 2012 Nov 18.
Article in English | MEDLINE | ID: mdl-23158523

ABSTRACT

Genomics datasets are increasingly useful for gaining biomedical insights, and their adoption in the clinic is underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed here is a web-based data storage hub. With a clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art, free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.


Subject(s)
Genomics/methods; Neoplasms/genetics; Software; Databases, Genetic; Genome; Humans; Web Browser