Results 1 - 20 of 25
1.
Bioinformatics ; 38(23): 5214-5221, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36264124

ABSTRACT

MOTIVATION: The standard approach for statistical inference in differential expression (DE) analyses is to control the false discovery rate (FDR). However, controlling the FDR does not in fact imply that the proportion of false discoveries is upper bounded. Moreover, no statistical guarantee can be given on subsets of genes selected by FDR thresholding. These known limitations are overcome by post hoc inference, which provides guarantees on the number or proportion of false discoveries among arbitrary gene selections. However, post hoc inference methods are not yet widely used for DE studies. RESULTS: In this article, we demonstrate the relevance and illustrate the performance of adaptive interpolation-based post hoc methods for two-group DE studies. First, we formalize the use of permutation-based methods to obtain sharp confidence bounds that are adaptive to the dependence between genes. Then, we introduce a generic linear time algorithm for computing post hoc bounds, making these bounds applicable to large-scale two-group DE studies. The use of the resulting Adaptive Simes bound is illustrated on an RNA sequencing study. Comprehensive numerical experiments based on real microarray and RNA sequencing data demonstrate the statistical performance of the method. AVAILABILITY AND IMPLEMENTATION: A cross-platform open source implementation within the R package sanssouci is available at https://sanssouci-org.github.io/sanssouci/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
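
To make the kind of guarantee described in this abstract concrete, the sketch below computes a post hoc upper bound on the number of false discoveries in an arbitrary selection of genes. It uses the classical, non-adaptive Simes threshold family t_k = alpha * k / m and the interpolation formula min over k of (number of selected p-values >= t_k) + (k - 1), capped at the selection size. This is an assumption-laden illustration: it is neither the permutation-calibrated Adaptive Simes bound nor the linear time algorithm of the article, it does not use the sanssouci API, and the function name and arguments are hypothetical. The Simes family itself is assumed valid, which holds for independent or positively dependent p-values.

```python
from bisect import bisect_left


def simes_posthoc_bound(p_selected, m, alpha=0.05):
    """Upper bound on the number of false discoveries among the selected p-values.

    p_selected: p-values of the selected hypotheses (any subset of the m tests).
    m: total number of hypotheses tested.
    alpha: target risk level of the simultaneous guarantee.
    """
    p = sorted(p_selected)
    s = len(p)
    best = s  # trivial bound: every selected item may be a false discovery
    # Thresholds with k > s cannot improve on the trivial bound, since k - 1 >= s.
    for k in range(1, s + 1):
        threshold = alpha * k / m
        n_below = bisect_left(p, threshold)  # selected p-values strictly below t_k
        best = min(best, (s - n_below) + (k - 1))
    return best
```

For a selection with p-values 0.0001, 0.0002 and 0.5 among m = 10 tests, the bound at alpha = 0.05 is 1, so at least 2 of the 3 selected items are true discoveries under the stated assumptions. Because the bound holds simultaneously over all subsets, the selection may be chosen after looking at the data.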


Subject(s)
Algorithms; RNA; Sequence Analysis, RNA/methods; RNA/genetics; Gene Expression Profiling/methods
2.
J Struct Biol ; 214(4): 107907, 2022 12.
Article in English | MEDLINE | ID: mdl-36272694

ABSTRACT

Backbone dihedral angles ϕ and ψ are the main structural descriptors of proteins and peptides. The distribution of these angles has been investigated over decades as they are essential for the validation and refinement of experimental measurements, as well as for structure prediction and design methods. The dependence of these distributions, not only on the nature of each amino acid but also on that of the closest neighbors, has been the subject of numerous studies. Although neighbor-dependent distributions are nowadays generally accepted as a good model, there is still some controversy about the combined effects of left and right neighbors. We have investigated this question using rigorous methods based on recently-developed statistical techniques. Our results unambiguously demonstrate that the influence of left and right neighbors cannot be considered independently. Consequently, three-residue fragments should be considered as the minimal building blocks to investigate polypeptide sequence-structure relationships.


Subject(s)
Peptides
3.
Neuroimage ; 260: 119492, 2022 10 15.
Article in English | MEDLINE | ID: mdl-35870698

ABSTRACT

Cluster-level inference procedures are widely used for brain mapping. These methods compare the size of clusters obtained by thresholding brain maps to an upper bound under the global null hypothesis, computed using Random Field Theory or permutations. However, the guarantees obtained by this type of inference - i.e. at least one voxel is truly activated in the cluster - are not informative with regard to the strength of the signal therein. There is thus a need for methods to assess the amount of signal within clusters; yet such methods have to take into account that clusters are defined based on the data, which creates circularity in the inference scheme. This has motivated the use of post hoc estimates that allow statistically valid estimation of the proportion of activated voxels in clusters. In the context of fMRI data, the All-Resolutions Inference framework introduced in Rosenblatt et al. (2018) provides post hoc estimates of the proportion of activated voxels. However, this method relies on parametric threshold families, which results in conservative inference. In this paper, we leverage randomization methods to adapt to data characteristics and obtain tighter false discovery control. We obtain Notip, for Non-parametric True Discovery Proportion control: a powerful, non-parametric method that yields statistically valid guarantees on the proportion of activated voxels in data-derived clusters. Numerical experiments demonstrate substantial gains in number of detections compared with state-of-the-art methods on 36 fMRI datasets. The conditions under which the proposed method brings benefits are also discussed.


Subject(s)
Brain Mapping; Brain; Brain/diagnostic imaging; Brain Mapping/methods; Humans; Magnetic Resonance Imaging/methods
4.
Brief Bioinform ; 16(4): 600-15, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25202135

ABSTRACT

A number of bioinformatic or biostatistical methods are available for analyzing DNA copy number profiles measured from microarray or sequencing technologies. In the absence of rich enough gold standard data sets, the performance of these methods is generally assessed using unrealistic simulation studies, or based on small real data analyses. To make an objective and reproducible performance assessment, we have designed and implemented a framework to generate realistic DNA copy number profiles of cancer samples with known truth. These profiles are generated by resampling publicly available SNP microarray data from genomic regions with known copy-number state. The original data have been extracted from dilution series of tumor cell lines with matched blood samples at several concentrations. Therefore, the signal-to-noise ratio of the generated profiles can be controlled through the (known) percentage of tumor cells in the sample. This article describes this framework and its application to a comparison study between methods for segmenting DNA copy number profiles from SNP microarrays. This study indicates that no single method is uniformly better than all others. It also helps identify pros and cons of the compared methods as a function of biologically informative parameters, such as the fraction of tumor cells in the sample and the proportion of heterozygous markers. This comparison study may be reproduced using the open source and cross-platform R package jointseg, which implements the proposed data generation and evaluation framework: http://r-forge.r-project.org/R/?group_id=1562.
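
The resampling idea in this abstract can be sketched in a few lines: given pools of real measurements grouped by known copy-number state, a synthetic profile is built segment by segment by drawing with replacement from the pool matching each segment's state, so the true breakpoints are known by construction. The sketch below is a minimal illustration under that reading of the abstract. It is not the jointseg implementation, and the function name, the dictionary layout of `pools`, and the state labels are hypothetical.

```python
import random


def simulate_profile(pools, segments, seed=0):
    """Build a synthetic copy number profile with known breakpoints.

    pools: dict mapping a copy-number state label to a list of observed signals.
    segments: list of (state, length) pairs describing the desired profile.
    Returns the simulated signal and the list of true breakpoint positions.
    """
    rng = random.Random(seed)
    signal = []
    boundaries = []
    position = 0
    for state, length in segments:
        # Draw with replacement from real measurements of the requested state.
        signal.extend(rng.choice(pools[state]) for _ in range(length))
        position += length
        boundaries.append(position)
    # The end of the last segment is the end of the profile, not a breakpoint.
    return signal, boundaries[:-1]
```

A segmentation method can then be scored by comparing its detected breakpoints with the returned true positions. In the framework described above, choosing pools from a dilution with a lower percentage of tumor cells would lower the signal-to-noise ratio of the generated profile.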


Subject(s)
DNA Copy Number Variations; Humans; Oligonucleotide Array Sequence Analysis; Polymorphism, Single Nucleotide
5.
Bioinformatics ; 31(18): 3054-6, 2015 Sep 15.
Article in English | MEDLINE | ID: mdl-26002884

ABSTRACT

UNLABELLED: We describe the implementation of the method introduced by Chambaz et al. in 2012. We also demonstrate its genome-wide application to the integrative search of new regions with strong association between DNA copy number and gene expression accounting for DNA methylation in breast cancers. AVAILABILITY AND IMPLEMENTATION: An open-source R package tmle.npvi is available from CRAN (http://cran.r-project.org/). CONTACT: pierre.neuvial@genopole.cnrs.fr.


Subject(s)
Breast Neoplasms/genetics; Computational Biology/methods; DNA Copy Number Variations; DNA Methylation; Gene Expression Regulation, Neoplastic; Genome, Human; Software; Algorithms; Female; Gene Expression Profiling; Humans
6.
BMC Bioinformatics ; 16: 148, 2015 May 08.
Article in English | MEDLINE | ID: mdl-25951947

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) aim at finding genetic markers that are significantly associated with a phenotype of interest. Single nucleotide polymorphism (SNP) data from the entire genome are collected for many thousands of SNP markers, leading to high-dimensional regression problems where the number of predictors greatly exceeds the number of observations. Moreover, these predictors are statistically dependent, in particular due to linkage disequilibrium (LD). We propose a three-step approach that explicitly takes advantage of the grouping structure induced by LD in order to identify common variants which may have been missed by single marker analyses (SMA). In the first step, we perform a hierarchical clustering of SNPs with an adjacency constraint using LD as a similarity measure. In the second step, we apply a model selection approach to the obtained hierarchy in order to define LD blocks. Finally, we perform Group Lasso regression on the inferred LD blocks. We investigate the efficiency of this approach compared to state-of-the-art regression methods: haplotype association tests, SMA, and Lasso and Elastic-Net regressions. RESULTS: Our results on simulated data show that the proposed method performs better than state-of-the-art approaches as soon as the number of causal SNPs within an LD block exceeds 2. Our results on semi-simulated data and a previously published HIV data set illustrate the relevance of the proposed method and its robustness to a real LD structure. The method is implemented in the R package BALD (Blockwise Approach using Linkage Disequilibrium), available from http://www.math-evry.cnrs.fr/publications/logiciels. CONCLUSIONS: Our results show that the proposed method is efficient not only at the level of LD blocks by inferring well the underlying block structure but also at the level of individual SNPs. Thus, this study demonstrates the importance of tailored integration of biological knowledge in high-dimensional genomic studies such as GWAS.
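
The last step of the approach relies on Group Lasso regression over blocks of SNPs. A building block of many Group Lasso solvers is block soft-thresholding, the proximal operator of the group penalty: each block of coefficients is shrunk toward zero as a whole, and a block whose Euclidean norm is below the penalty level is set exactly to zero. The sketch below shows only this operator, with an unweighted penalty as a simplifying assumption (group-size weights are common in practice). It is not code from the BALD package, and the function name and arguments are hypothetical.

```python
import math


def group_soft_threshold(beta, groups, lam):
    """Apply block soft-thresholding to a coefficient vector.

    beta: list of coefficients.
    groups: list of index lists, one per block (for example one per LD block).
    lam: non-negative penalty level applied to each block norm.
    """
    shrunk = list(beta)
    for indices in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in indices))
        # Whole block is zeroed when its norm does not exceed lam.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in indices:
            shrunk[i] = scale * beta[i]
    return shrunk
```

With `beta = [3.0, 4.0, 0.1, 0.1]`, blocks `[[0, 1], [2, 3]]` and `lam = 1.0`, the first block (norm 5) is scaled by 0.8 while the second block (norm about 0.14) is removed entirely, which is how selection operates at the block level rather than SNP by SNP.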


Subject(s)
Algorithms; Genome-Wide Association Study/methods; Haplotypes/genetics; Linkage Disequilibrium; Models, Theoretical; Polymorphism, Single Nucleotide/genetics; Genetic Markers/genetics; Humans
7.
Proc Natl Acad Sci U S A ; 109(8): 2724-9, 2012 Feb 21.
Article in English | MEDLINE | ID: mdl-22003129

ABSTRACT

Breast cancers are comprised of molecularly distinct subtypes that may respond differently to pathway-targeted therapies now under development. Collections of breast cancer cell lines mirror many of the molecular subtypes and pathways found in tumors, suggesting that treatment of cell lines with candidate therapeutic compounds can guide identification of associations between molecular subtypes, pathways, and drug response. In a test of 77 therapeutic compounds, nearly all drugs showed differential responses across these cell lines, and approximately one third showed subtype-, pathway-, and/or genomic aberration-specific responses. These observations suggest mechanisms of response and resistance and may inform efforts to develop molecular assays that predict clinical response.


Subject(s)
Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Breast Neoplasms/classification; Breast Neoplasms/drug therapy; Signal Transduction/drug effects; Breast Neoplasms/genetics; Cell Line, Tumor; Drug Screening Assays, Antitumor; Female; Gene Dosage/genetics; Humans; Models, Biological; Signal Transduction/genetics; Transcription, Genetic/drug effects
8.
Bioinformatics ; 28(13): 1793-4, 2012 Jul 01.
Article in English | MEDLINE | ID: mdl-22576175

ABSTRACT

SUMMARY: CalMaTe calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g. Affymetrix and Illumina. AVAILABILITY: The method is available on CRAN (http://cran.r-project.org/) in the open-source R package calmate, which also includes an add-on to the Aroma Project framework (http://www.aroma-project.org/).


Subject(s)
Alleles; Oligonucleotide Array Sequence Analysis/methods; Polymorphism, Single Nucleotide; Software; Humans; Neoplasms/genetics
9.
J Mol Biol ; 435(14): 168053, 2023 07 15.
Article in English | MEDLINE | ID: mdl-36934808

ABSTRACT

The structural investigation of intrinsically disordered proteins (IDPs) requires ensemble models describing the diversity of the conformational states of the molecule. Due to their probabilistic nature, there is a need for new paradigms that understand and treat IDPs from a purely statistical point of view, considering their conformational ensembles as well-defined probability distributions. In this work, we define a conformational ensemble as an ordered set of probability distributions and provide a suitable metric to detect differences between two given ensembles at the residue level, both locally and globally. The underlying geometry of the conformational space is properly integrated, one ensemble being characterized by a set of probability distributions supported on the three-dimensional Euclidean space (for global-scale comparisons) and on the two-dimensional flat torus (for local-scale comparisons). The inherent uncertainty of the data is also taken into account to provide finer estimations of the differences between ensembles. Additionally, an overall distance between ensembles is defined from the differences at the residue level. We illustrate the potential of the approach with several examples of applications for the comparison of conformational ensembles: (i) produced from molecular dynamics (MD) simulations using different force fields, and (ii) before and after refinement with experimental data. We also show the usefulness of the method to assess the convergence of MD simulations, and discuss other potential applications such as in machine-learning-based approaches. The numerical tool has been implemented in Python through easy-to-use Jupyter Notebooks available at https://gitlab.laas.fr/moma/WASCO.
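
For the local-scale comparisons mentioned in this abstract, distributions of backbone angle pairs live on the two-dimensional flat torus, where each angle wraps around at 2π. The sketch below shows the geodesic distance on that space, which is the ground distance one would need before comparing two angular distributions. It illustrates the geometry only: it is not taken from the WASCO notebooks, the function names are hypothetical, and angles are assumed to be given in radians.

```python
import math

TWO_PI = 2.0 * math.pi


def angular_gap(a, b):
    """Shortest separation between two angles on the circle, in radians."""
    delta = abs(a - b) % TWO_PI
    return min(delta, TWO_PI - delta)


def torus_distance(p, q):
    """Geodesic distance between two (phi, psi) pairs on the flat torus."""
    return math.hypot(angular_gap(p[0], q[0]), angular_gap(p[1], q[1]))
```

The wrap-around matters in practice: two conformations with ϕ close to -π and close to +π are near each other on the torus, even though their plain Euclidean difference is close to 2π.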


Subject(s)
Intrinsically Disordered Proteins; Intrinsically Disordered Proteins/chemistry; Protein Conformation; Molecular Dynamics Simulation; Probability; Machine Learning
10.
Bioinformatics ; 27(15): 2038-46, 2011 Aug 01.
Article in English | MEDLINE | ID: mdl-21666266

ABSTRACT

MOTIVATION: High-throughput techniques facilitate the simultaneous measurement of DNA copy number at hundreds of thousands of sites on a genome. Older techniques allow measurement only of total copy number, the sum of the copy number contributions from the two parental chromosomes. Newer single nucleotide polymorphism (SNP) techniques can in addition enable quantifying parent-specific copy number (PSCN). The raw data from such experiments are two-dimensional, but are unphased. Consequently, inference based on them necessitates development of new analytic methods. METHODS: We have adapted and enhanced the circular binary segmentation (CBS) algorithm for this purpose with focus on paired test and reference samples. The essence of paired parent-specific CBS (Paired PSCBS) is to utilize the original CBS algorithm to identify regions of equal total copy number and then to further segment these regions where there have been changes in PSCN. For the final set of regions, calls are made of equal parental copy number and loss of heterozygosity (LOH). PSCN estimates are computed both before and after calling. RESULTS: The methodology is evaluated by simulation and on glioblastoma data. In the simulation, PSCBS compares favorably to established methods. On the glioblastoma data, PSCBS identifies interesting genomic regions, such as copy-neutral LOH. AVAILABILITY: The Paired PSCBS method is implemented in an open-source R package named PSCBS, available on CRAN (http://cran.r-project.org/).


Subject(s)
Algorithms; Gene Dosage; Glioblastoma/genetics; Sequence Analysis, DNA/methods; Alleles; Computer Simulation; Gene Frequency; Humans; Loss of Heterozygosity; Polymorphism, Single Nucleotide; Software
11.
J Bioinform Comput Biol ; 19(1): 2140003, 2021 02.
Article in English | MEDLINE | ID: mdl-33653235

ABSTRACT

In many cancers, mechanisms of gene regulation can be severely altered. Identification of deregulated genes, which do not follow the regulation processes that exist between transcription factors and their target genes, is of importance to better understand the development of the disease. We propose a methodology to detect deregulation mechanisms with a particular focus on cancer subtypes. This strategy is based on the comparison between tumoral and healthy cells. First, we use gene expression data from healthy cells to infer a reference gene regulatory network. Then, we compare it with gene expression levels in tumor samples to detect deregulated target genes. We finally measure the ability of each transcription factor to explain these deregulations. We apply our method on a public bladder cancer data set derived from The Cancer Genome Atlas project and confirm that it captures hallmarks of cancer subtypes. We also show that it enables the discovery of new potential biomarkers.


Subject(s)
Algorithms; Gene Expression Regulation, Neoplastic; Models, Genetic; Neoplasms/genetics; Neoplasms/pathology; Gene Regulatory Networks; Humans; Transcription Factors/genetics; Urinary Bladder Neoplasms/genetics
12.
BMC Bioinformatics ; 11: 245, 2010 May 12.
Article in English | MEDLINE | ID: mdl-20462408

ABSTRACT

BACKGROUND: High-throughput genotyping microarrays assess both total DNA copy number and allelic composition, which makes them a tool of choice for copy number studies in cancer, including total copy number and loss of heterozygosity (LOH) analyses. Even after state of the art preprocessing methods, allelic signal estimates from genotyping arrays still suffer from systematic effects that make them difficult to use effectively for such downstream analyses. RESULTS: We propose a method, TumorBoost, for normalizing allelic estimates of one tumor sample based on estimates from a single matched normal. The method applies to any paired tumor-normal estimates from any microarray-based technology, combined with any preprocessing method. We demonstrate that it increases the signal-to-noise ratio of allelic signals, making it significantly easier to detect allelic imbalances. CONCLUSIONS: TumorBoost increases the power to detect somatic copy-number events (including copy-neutral LOH) in the tumor from allelic signals of Affymetrix or Illumina origin. We also conclude that high-precision allelic estimates can be obtained from a single pair of tumor-normal hybridizations, if TumorBoost is combined with single-array preprocessing methods such as (allele-specific) CRMA v2 for Affymetrix or BeadStudio's (proprietary) XY-normalization method for Illumina. A bounded-memory implementation is available in the open-source and cross-platform R package aroma.cn, which is part of the Aroma Project (http://www.aroma-project.org/).


Subject(s)
Alleles; Gene Dosage/genetics; Genomics/methods; Genotype; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis/methods; Software; Gene Expression Profiling/methods
13.
NAR Genom Bioinform ; 2(2): lqaa025, 2020 Jun.
Article in English | MEDLINE | ID: mdl-33575582

ABSTRACT

The development of single-cell transcriptomic technologies yields large datasets comprising multimodal information, such as transcriptomes and immunophenotypes. Despite the current explosion of methods for pre-processing and integrating multimodal single-cell data, there is currently no user-friendly software to display easily and simultaneously both immunophenotype and transcriptome-based UMAP/t-SNE plots from the pre-processed data. Here, we introduce Single-Cell Virtual Cytometer, an open-source software for flow cytometry-like visualization and exploration of pre-processed multi-omics single cell datasets. Using an original CITE-seq dataset of PBMC from a healthy donor, we illustrate its use for the integrated analysis of transcriptomes and epitopes of functional maturation in human peripheral T lymphocytes. This free and open-source algorithm thus constitutes a unique resource for biologists seeking a user-friendly analytic tool for multimodal single cell datasets.

14.
Algorithms Mol Biol ; 14: 22, 2019.
Article in English | MEDLINE | ID: mdl-31807137

ABSTRACT

BACKGROUND: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. But a major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of 10^4 to 10^5 for each chromosome. RESULTS: By assuming that the similarity between physically distant objects is negligible, we are able to propose an implementation of adjacency-constrained HAC with quasi-linear complexity. This is achieved by pre-calculating specific sums of similarities, and storing candidate fusions in a min-heap. Our illustrations on GWAS and Hi-C datasets demonstrate the relevance of this assumption, and show that this method highlights biologically meaningful signals. Thanks to its small time and memory footprint, the method can be run on a standard laptop in minutes or even seconds. AVAILABILITY AND IMPLEMENTATION: Software and sample data are available as an R package, adjclust, that can be downloaded from the Comprehensive R Archive Network (CRAN).
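
The adjacency constraint described in this abstract can be illustrated with a deliberately naive sketch: at each step, only neighbouring clusters are candidates, and the pair with the highest average similarity is merged. This is the slow baseline the abstract argues against, written out to show the constraint itself. It does not include the pre-calculated sums or the min-heap that give the quasi-linear complexity, it uses average linkage as a simplifying assumption, and it is not the adjclust API. The function name and the return format are hypothetical.

```python
def adjacent_hac(sim):
    """Naive adjacency-constrained agglomerative clustering.

    sim: square similarity matrix (list of lists) over loci in genomic order.
    Returns the merges in order, each as (first_index, last_index) of the new cluster.
    """
    clusters = [[i] for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        best_j, best_score = 0, None
        # Only consecutive clusters may be merged.
        for j in range(len(clusters) - 1):
            left, right = clusters[j], clusters[j + 1]
            total = sum(sim[a][b] for a in left for b in right)
            score = total / (len(left) * len(right))
            if best_score is None or score > best_score:
                best_j, best_score = j, score
        merged = clusters[best_j] + clusters[best_j + 1]
        merges.append((merged[0], merged[-1]))
        clusters[best_j:best_j + 2] = [merged]
    return merges
```

Because every cluster is a contiguous interval of positions, each merge can be recorded by its two end points. The resulting sequence of merges is a hierarchy of nested genomic regions, which can then be cut to define blocks.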

15.
Bioinformatics ; 23(18): 2407-14, 2007 Sep 15.
Article in English | MEDLINE | ID: mdl-17720703

ABSTRACT

MOTIVATION: One of the most challenging tasks in the post-genomic era is the reconstruction of transcriptional regulation networks. The goal is to identify, for each gene expressed in a particular cellular context, the regulators affecting its transcription, and the co-ordination of several regulators in specific types of regulation. DNA microarrays can be used to investigate relationships between regulators and their target genes, through simultaneous observations of their RNA levels. RESULTS: We propose a data mining system for inferring transcriptional regulation relationships from RNA expression values. This system is particularly suitable for the detection of cooperative transcriptional regulation. We model regulatory relationships as labelled two-layer gene regulatory networks, and describe a method for the efficient learning of these bipartite networks from discretized expression data sets. We also evaluate the statistical significance of such inferred networks and validate our methods on two public yeast expression data sets. AVAILABILITY: http://www.lri.fr/~elati/licorn.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Artificial Intelligence; Databases, Protein; Gene Expression Profiling/methods; Gene Expression Regulation/physiology; Information Storage and Retrieval/methods; Proteome/metabolism; Signal Transduction/physiology; Algorithms; Computer Simulation; Models, Biological; Proteome/genetics; RNA/metabolism
16.
Nucleic Acids Res ; 34(Web Server issue): W477-81, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845053

ABSTRACT

Assessing variations in DNA copy number is crucial for understanding constitutional or somatic diseases, particularly cancers. The recently developed array-CGH (comparative genomic hybridization) technology allows this to be investigated at the genomic level. We report the availability of a web tool for analysing array-CGH data. CAPweb (CGH array Analysis Platform on the Web) is intended as a user-friendly tool enabling biologists to completely analyse CGH arrays from the raw data to the visualization and biological interpretation. The user typically performs the following bioinformatics steps of a CGH array project within CAPweb: the secure upload of the results of CGH array image analysis and of the array annotation (genomic position of the probes); first level analysis of each array, including automatic normalization of the data (for correcting experimental biases), breakpoint detection and status assignment (gain, loss or normal); validation or deletion of the analysis based on a summary report and quality criteria; visualization and biological analysis of the genomic profiles and results through a user-friendly interface. CAPweb is accessible at http://bioinfo.curie.fr/CAPweb.


Subject(s)
Computational Biology/methods; Gene Dosage; Genomics/methods; Oligonucleotide Array Sequence Analysis/methods; Software; Chromosome Breakage; Computer Graphics; DNA/analysis; Internet; User-Computer Interface
17.
Sci Rep ; 8(1): 17945, 2018 Dec 13.
Article in English | MEDLINE | ID: mdl-30546106

ABSTRACT

A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.

18.
Bioinformatics ; 22(17): 2066-73, 2006 Sep 01.
Article in English | MEDLINE | ID: mdl-16820431

ABSTRACT

MOTIVATION: Microarray-based CGH (Comparative Genomic Hybridization), transcriptome arrays and other large-scale genomic technologies are now routinely used to generate a vast amount of genomic profiles. Exploratory analysis of these data is crucial for understanding them and for forming biological hypotheses. This step requires visualizing the data in a meaningful way, both to display the results and to perform first level analyses. RESULTS: We have developed a graphical user interface for visualization and first level analysis of molecular profiles. It is currently in use at the Institut Curie for cancer research projects involving CGH arrays, transcriptome arrays, SNP (single nucleotide polymorphism) arrays, loss of heterozygosity results (LOH), and Chromatin ImmunoPrecipitation arrays (ChIP chips). The interface offers the possibility of studying these different types of information in a consistent way. Several views are proposed, such as the classical CGH karyotype view or genome-wide multi-tumor comparison. Many functionalities for analyzing CGH data are provided by the interface, including looking for recurrent regions of alterations, confrontation to transcriptome data or clinical information, and clustering. Our tool consists of PHP scripts and of an applet written in Java. It can be run on public datasets at http://bioinfo.curie.fr/vamp. AVAILABILITY: The VAMP software (Visualization and Analysis of array-CGH, transcriptome and other Molecular Profiles) is available upon request. It can be tested on public datasets at http://bioinfo.curie.fr/vamp. The documentation is available at http://bioinfo.curie.fr/vamp/doc.


Subject(s)
Chromosome Mapping/methods; Proteome/metabolism; Sequence Analysis, DNA/methods; Software; Transcription Factors/metabolism; User-Computer Interface; Algorithms; Computer Graphics; Database Management Systems; Databases, Protein; Gene Dosage/genetics; Information Storage and Retrieval/methods; Proteome/genetics; Transcription Factors/genetics
19.
Sci Rep ; 7(1): 15126, 2017 11 09.
Article in English | MEDLINE | ID: mdl-29123141

ABSTRACT

One of the most challenging problems in the development of new anticancer drugs is the very high attrition rate. The so-called "drug repositioning process" proposes to find new therapeutic indications for already approved drugs. For this, new analytic methods are required to optimize the information present in large-scale pharmacogenomics datasets. We analyzed data from the Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia studies. We focused on common cell lines (n = 471), considering the molecular information, and the drug sensitivity for common drugs screened (n = 15). We propose a novel classification based on transcriptomic profiles of cell lines, according to a biological network-driven gene selection process. Our robust molecular classification displays greater homogeneity of drug sensitivity than cancer cell lines grouped by tissue of origin. We then identified significant associations between cell line clusters and drug response that were robustly found in both datasets. We further demonstrate the relevance of our method using two additional external datasets and distinct sensitivity metrics. Some associations remained robust despite variations in cell lines and drug responses. This study defines a robust molecular classification of cancer cell lines that could be used to find new therapeutic indications for known compounds.


Subject(s)
Antineoplastic Agents/pharmacology; Gene Expression Profiling/methods; Pharmacogenetics/methods; Cell Line, Tumor; Humans
20.
BMC Bioinformatics ; 7: 264, 2006 May 22.
Article in English | MEDLINE | ID: mdl-16716215

ABSTRACT

BACKGROUND: Array-based comparative genomic hybridization (array-CGH) is a recently developed technique for analyzing changes in DNA copy number. As in all microarray analyses, normalization is required to correct for experimental artifacts while preserving the true biological signal. We investigated various sources of systematic variation in array-CGH data and identified two distinct types of spatial effect of no biological relevance as the predominant experimental artifacts: continuous spatial gradients and local spatial bias. Local spatial bias affects a large proportion of arrays, and has not previously been considered in array-CGH experiments. RESULTS: We show that existing normalization techniques do not correct these spatial effects properly. We therefore developed an automatic method for the spatial normalization of array-CGH data. This method makes it possible to delineate and to eliminate and/or correct areas affected by spatial bias. It is based on the combination of a spatial segmentation algorithm called NEM (Neighborhood Expectation Maximization) and spatial trend estimation. We defined quality criteria for array-CGH data, demonstrating significant improvements in data quality with our method for three data sets coming from two different platforms (198, 175 and 26 BAC-arrays). CONCLUSION: We have designed an automatic algorithm for the spatial normalization of BAC CGH-array data, preventing the misinterpretation of experimental artifacts as biologically relevant outliers in the genomic profile. This algorithm is implemented in the R package MANOR (Micro-Array NORmalization), which is described at http://bioinfo.curie.fr/projects/manor and available from the Bioconductor site http://www.bioconductor.org. It can also be tested on the CAPweb bioinformatics platform at http://bioinfo.curie.fr/CAPweb.


Subject(s)
Algorithms; Artifacts; Chromosome Mapping/methods; In Situ Hybridization/methods; Models, Genetic; Oligonucleotide Array Sequence Analysis/methods; Sequence Analysis, DNA/methods; Base Sequence; Computer Simulation; Data Interpretation, Statistical; Gene Dosage; Models, Statistical; Molecular Sequence Data