Search | Virtual Health Library

Band-based similarity indices for gene expression classification and clustering.

Torrente, Aurora.

Sci Rep ; 11(1): 21609, 2021 11 03.

Article in English | MEDLINE | ID: mdl-34732744

ABSTRACT

The concept of depth induces an ordering from centre outwards in multivariate data. Most depth definitions are unfeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene expression data. This depth definition relates the centrality of each individual to its (partial) inclusion in all possible bands formed by elements of the data set. We assess (dis)similarity between pairs of observations by accounting for such bands and constructing binary matrices associated with each pair. From these, contingency tables are calculated and used to derive standard similarity indices. Our approach is computationally efficient and can be applied to bands formed by any number of observations from the data set. We have evaluated the performance of several band-based similarity indices with respect to that of other classical distances in standard classification and clustering tasks in a variety of simulated and real data sets. However, the method is not restricted to these indices; extending it to other similarity coefficients is straightforward. Our experiments show the benefits of our technique, with some of the selected indices outperforming, among others, the Euclidean distance.

Subjects

Algorithms; Biomarkers, Tumor/genetics; Data Interpretation, Statistical; Neoplasms/genetics; Cluster Analysis; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Neoplasms/classification; Neoplasms/pathology
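
The abstract above describes the Modified Band Depth as the (partial) inclusion of an observation in all bands formed by elements of the data set. A minimal sketch of that idea follows, assuming bands delimited by two observations and a brute-force loop over all pairs. The function name `modified_band_depth` and the list-of-lists input layout are illustrative choices, not taken from the paper, and this is not the efficient implementation the abstract refers to; the band-based similarity indices themselves are not reproduced here.

```python
from itertools import combinations


def modified_band_depth(data):
    """Brute-force Modified Band Depth using bands formed by two observations.

    data: list of observations, each a list of d numeric coordinates.
    Returns one depth value per observation, between 0 and 1.
    (Hypothetical helper for illustration only.)
    """
    n = len(data)
    d = len(data[0])
    pairs = list(combinations(range(n), 2))
    depths = []
    for x in data:
        total = 0.0
        for i, j in pairs:
            # Count coordinates where x lies inside the band delimited
            # by observations i and j (bounds included).
            inside = sum(
                1
                for k in range(d)
                if min(data[i][k], data[j][k]) <= x[k] <= max(data[i][k], data[j][k])
            )
            total += inside / d  # proportion of coordinates inside this band
        depths.append(total / len(pairs))  # average over all bands
    return depths


sample = [[0, 0], [1, 1], [2, 2]]
depths = modified_band_depth(sample)
print(depths)
```

In this toy sample the middle observation lies inside every band, so it receives the largest depth, which matches the centre-outwards ordering the abstract mentions.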

Induced correlations and rupture of molecular chaos by anisotropic dissipative Janus hard disks.

Lasanta, Antonio; Torrente, Aurora; López de Haro, Mariano.

Phys Rev E ; 100(5-1): 052128, 2019 Nov.

Article in English | MEDLINE | ID: mdl-31870030

ABSTRACT

A system of smooth "frozen" Janus-type disks is studied. Such disks cannot rotate and are divided by their diameter into two sides of different inelasticities. Taking as a reference a system of colored elastic disks, we find differences in the behavior of the collisions once the anisotropy is included. A homogeneous state, akin to the homogeneous cooling state of granular gases, is seen to arise, and the singular behavior of both the collisions and the precollisional correlations is highlighted.

Large Mpemba-like effect in a gas of inelastic rough hard spheres.

Torrente, Aurora; López-Castaño, Miguel A; Lasanta, Antonio; Reyes, Francisco Vega; Prados, Antonio; Santos, Andrés.

Phys Rev E ; 99(6-1): 060901, 2019 Jun.

Article in English | MEDLINE | ID: mdl-31330601

ABSTRACT

We report the emergence of a giant Mpemba effect in the uniformly heated gas of inelastic rough hard spheres: The initially hotter sample may cool sooner than the colder one, even when the initial temperatures differ by more than one order of magnitude. In order to understand this behavior, it suffices to consider the simplest Maxwellian approximation for the velocity distribution in a kinetic approach. The largeness of the effect stems from the fact that the rotational and translational temperatures, which obey two coupled evolution equations, are comparable. Our theoretical predictions agree very well with molecular dynamics and direct simulation Monte Carlo data.

clustComp, a bioconductor package for the comparison of clustering results.

Torrente, Aurora; Brazma, Alvis.

Bioinformatics ; 33(24): 4001-4003, 2017 Dec 15.

Article in English | MEDLINE | ID: mdl-28961761

ABSTRACT

SUMMARY: clustComp is an open source Bioconductor package that implements different techniques for the comparison of two gene expression clustering results. These include flat versus flat and hierarchical versus flat comparisons. The visualization of the similarities is provided by means of a bipartite graph, whose layout is heuristically optimized. Its flexibility allows a suitable visualization for both small and large datasets. AVAILABILITY AND IMPLEMENTATION: The package is available at http://bioconductor.org/packages/clustComp/ and contains a 'vignette' outlining the typical use of the algorithms. CONTACT: etorrent@est-econ.uc3m.es. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Profiling/methods; Software; Algorithms; Cluster Analysis

Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression.

Torrente, Aurora; Lukk, Margus; Xue, Vincent; Parkinson, Helen; Rung, Johan; Brazma, Alvis.

PLoS One ; 11(6): e0157484, 2016.

Article in English | MEDLINE | ID: mdl-27322383

ABSTRACT

Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human gene expression dataset from ~40,000 publicly available Affymetrix HG-U133Plus2 arrays. After strict quality control and data normalisation the data was quantified in an expression matrix of ~20,000 genes and ~28,000 samples. To enable different ways of sample grouping, existing annotations were subjected to systematic ontology-assisted categorisation and manual curation. Groups like normal tissues, neoplasmic tissues, cell lines, homoeotic cells and incompletely differentiated cells were created. Unsupervised analysis of the data confirmed global structure of expression consistent with earlier analysis but with more details revealed due to increased resolution. A suitable mixed-effects linear model was used to further investigate gene expression in solid tissue tumours, and to compare these with the respective healthy solid tissues. The analysis identified 1,285 genes with systematic expression change in cancer. The list is significantly enriched with known cancer genes from large, public, peer-reviewed databases, whereas the remaining ones are proposed as new cancer gene candidates. The compiled dataset is publicly available in the ArrayExpress Archive. It contains the most diverse collection of biological samples, making it the largest systematically annotated gene expression dataset of its kind in the public domain.

Subjects

Biomarkers, Tumor/biosynthesis; Gene Expression Regulation, Neoplastic; Neoplasm Proteins/biosynthesis; Neoplasms/genetics; Biomarkers, Tumor/genetics; Cell Cycle/genetics; Cell Differentiation/genetics; Cell Division/genetics; Computational Biology; DNA Replication/genetics; Databases, Genetic; Humans; Neoplasm Proteins/genetics; Neoplasms/pathology; Oligonucleotide Array Sequence Analysis; Principal Component Analysis; Protein Array Analysis

DepthTools: an R package for a robust analysis of gene expression data.

Torrente, Aurora; López-Pintado, Sara; Romo, Juan.

BMC Bioinformatics ; 14: 237, 2013 Jul 25.

Article in English | MEDLINE | ID: mdl-23885712

ABSTRACT

BACKGROUND: The use of DNA microarrays and oligonucleotide chips of high density in modern biomedical research provides complex, high dimensional data which have been proven to convey crucial information about gene expression levels and to play an important role in disease diagnosis. Therefore, there is a need for developing new, robust statistical techniques to analyze these data. RESULTS: depthTools is an R package for a robust statistical analysis of gene expression data, based on an efficient implementation of a feasible notion of depth, the Modified Band Depth. This software includes several visualization and inference tools successfully applied to high dimensional gene expression data. A user-friendly interface is also provided via an R-commander plugin. CONCLUSION: We illustrate the utility of the depthTools package, which could be used, for instance, to achieve a better understanding of genome-level variation between tumors and to facilitate the development of personalized treatments.

Subjects

Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Software; Algorithms; Genome; Humans; Male; Prostatic Neoplasms/genetics; Prostatic Neoplasms/metabolism

A fully scalable online pre-processing algorithm for short oligonucleotide microarray atlases.

Lahti, Leo; Torrente, Aurora; Elo, Laura L; Brazma, Alvis; Rung, Johan.

Nucleic Acids Res ; 41(10): e110, 2013 May 01.

Article in English | MEDLINE | ID: mdl-23563154

ABSTRACT

Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available only for few platforms based on pre-calculated probe effects from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales up linearly with respect to sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters based on sequential hyperparameter updates at small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections.

Subjects

Algorithms; Oligonucleotide Array Sequence Analysis; Bayes Theorem; Gene Expression Profiling; Humans

Robust depth-based tools for the analysis of gene expression data.

López-Pintado, Sara; Romo, Juan; Torrente, Aurora.

Biostatistics ; 11(2): 254-64, 2010 Apr.

Article in English | MEDLINE | ID: mdl-20064844

ABSTRACT

Microarray experiments provide data on the expression levels of thousands of genes and, therefore, statistical methods applicable to the analysis of such high-dimensional data are needed. In this paper, we propose robust nonparametric tools for the description and analysis of microarray data based on the concept of functional depth, which measures the centrality of an observation within a sample. We show that this concept can be easily adapted to high-dimensional observations and, in particular, to gene expression data. This allows the development of the following depth-based inference tools: (1) a scale curve for measuring and visualizing the dispersion of a set of points, (2) a rank test for deciding if 2 groups of multidimensional observations come from the same population, and (3) supervised classification techniques for assigning a new sample to one of G given groups. We apply these methods to microarray data, and to simulated data including contaminated models, and show that they are robust, efficient, and competitive with other procedures proposed in the literature, outperforming them in some situations.

Subjects

Biometry/methods; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Algorithms; Computer Simulation; Gene Expression/genetics; Humans; Leukemia, Myeloid, Acute/metabolism; Male; Precursor Cell Lymphoblastic Leukemia-Lymphoma/metabolism; Prostatic Neoplasms/metabolism; Statistics, Nonparametric

A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings.

Torrente, Aurora; Kapushesky, Misha; Brazma, Alvis.

Bioinformatics ; 21(21): 3993-9, 2005 Nov 01.

Article in English | MEDLINE | ID: mdl-16141251

ABSTRACT

MOTIVATION: Clustering is one of the most widely used methods in unsupervised gene expression data analysis. The use of different clustering algorithms or different parameters often produces rather different results on the same data. Biological interpretation of multiple clustering results requires understanding how different clusters relate to each other. It is particularly non-trivial to compare the results of a hierarchical and a flat, e.g. k-means, clustering. RESULTS: We present a new method for comparing and visualizing relationships between different clustering results, either flat versus flat, or flat versus hierarchical. When comparing a flat clustering to a hierarchical clustering, the algorithm cuts different branches in the hierarchical tree at different levels to optimize the correspondence between the clusters. The optimization function is based on graph layout aesthetics or on mutual information. The clusters are displayed using a bipartite graph where the edges are weighted proportionally to the number of common elements in the respective clusters and the weighted number of crossings is minimized. The performance of the algorithm is tested using simulated and real gene expression data. The algorithm is implemented in the online gene expression data analysis tool Expression Profiler. AVAILABILITY: http://www.ebi.ac.uk/expressionprofiler

Subjects

Algorithms; Cluster Analysis; Computer Graphics; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Pattern Recognition, Automated/methods; User-Computer Interface
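
The abstract above displays two clusterings as a bipartite graph whose edges are weighted by the number of common elements in the respective clusters. A minimal sketch of how such edge weights could be tabulated for the flat versus flat case follows. The function name `cluster_overlap` and the label-list input format are assumptions made for illustration; the branch-cutting and crossing-minimization steps of the published algorithm are not attempted here.

```python
from collections import Counter


def cluster_overlap(labels_a, labels_b):
    """Count common elements for every pair of clusters from two flat clusterings.

    labels_a, labels_b: cluster labels of the same items, in the same order.
    Returns a Counter mapping (cluster_in_a, cluster_in_b) to the number of
    shared items, i.e. the non-zero cells of the contingency table.
    (Hypothetical helper for illustration only.)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    return Counter(zip(labels_a, labels_b))


# Five genes clustered in two different ways (made-up labels).
first = ["a1", "a1", "a2", "a2", "a2"]
second = ["b1", "b2", "b2", "b2", "b1"]
edges = cluster_overlap(first, second)
for (cluster_a, cluster_b), weight in sorted(edges.items()):
    print(cluster_a, cluster_b, weight)
```

Each key with a non-zero count corresponds to one edge of the bipartite graph, and the count is the edge weight before any proportional scaling is applied.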

Expression Profiler: next generation--an online platform for analysis of microarray data.

Kapushesky, Misha; Kemmeren, Patrick; Culhane, Aedín C; Durinck, Steffen; Ihmels, Jan; Körner, Christine; Kull, Meelis; Torrente, Aurora; Sarkans, Ugis; Vilo, Jaak; Brazma, Alvis.

Nucleic Acids Res ; 32(Web Server issue): W465-70, 2004 Jul 01.

Article in English | MEDLINE | ID: mdl-15215431

ABSTRACT

Expression Profiler (EP, http://www.ebi.ac.uk/expressionprofiler) is a web-based platform for microarray gene expression and other functional genomics-related data analysis. The new architecture, Expression Profiler: next generation (EP:NG), modularizes the original design and allows individual analysis-task-related components to be developed by different groups and yet still seamlessly to work together and share the same user interface look and feel. Data analysis components for gene expression data preprocessing, missing value imputation, filtering, clustering methods, visualization, significant gene finding, between group analysis and other statistical components are available from the EBI (European Bioinformatics Institute) web site. The web-based design of Expression Profiler supports data sharing and collaborative analysis in a secure environment. Developed tools are integrated with the microarray gene expression database ArrayExpress and form the exploratory analytical front-end to those data. EP:NG is an open-source project, encouraging broad distribution and further extensions from the scientific community.

Subjects

Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; Software; Genomics; Internet; User-Computer Interface
