Results 1-20 of 25
1.
BMC Bioinformatics ; 23(1): 522, 2022 Dec 06.
Article in English | MEDLINE | ID: mdl-36474143

ABSTRACT

BACKGROUND: A deep understanding of carcinogenesis at the DNA level underpins many advances in cancer prevention and treatment. Mutational signatures provide a breakthrough conceptualisation, as well as an analysis framework, that can be used to build such understanding. They capture somatic mutation patterns and, at best, identify their causes. Most studies in this context have focused on an inherently additive analysis, e.g. by non-negative matrix factorization, where the mutations within a cancer sample are explained by a linear combination of independent mutational signatures. However, other recent studies show that the mutational signatures exhibit non-additive interactions. RESULTS: We carefully analysed such additive model fits from the PCAWG study cataloguing mutational signatures as well as their activities across thousands of cancers. Our analysis identified systematic and non-random structure of residuals that is left unexplained by the additive model. We used hierarchical clustering to identify cancer subsets with similar residual profiles to show that both systematic mutation count overestimation and underestimation take place. We propose an extension to the additive mutational signature model (multiplicatively acting modulatory processes) and develop a maximum-likelihood framework to identify such modulatory mutational signatures. The augmented model is expressive enough to almost fully remove the observed systematic residual patterns. CONCLUSION: We suggest that the modulatory processes biologically relate to sample-specific DNA repair propensities with cancer- or tissue-type-specific profiles. Overall, our results identify an interesting direction in which to expand signature analysis.


Subjects
Neoplasms; Humans; Mutation; Neoplasms/genetics
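The additive model in the first record explains each sample's mutation counts as a non-negative combination of signatures, λ = S·E. A minimal sketch of how a multiplicative term can be layered on top, assuming Poisson-distributed counts, fixed signatures S and exposures E, and one modulation factor per mutation channel shared by all samples (an illustrative simplification, not the paper's exact model). Under these assumptions the maximum-likelihood factor has the closed form m_k = Σ_n x_kn / Σ_n λ_kn. All matrices are made up.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def poisson_loglik(X, rates):
    """Poisson log-likelihood of count matrix X under an equally shaped rate matrix."""
    total = 0.0
    for x_row, r_row in zip(X, rates):
        for x, lam in zip(x_row, r_row):
            total += x * math.log(lam) - lam - math.lgamma(x + 1)
    return total

def fit_channel_modulation(X, rates):
    """Closed-form ML estimate of one multiplicative factor per mutation channel,
    shared across samples, with the additive rates held fixed (assumed model)."""
    return [sum(x_row) / sum(r_row) for x_row, r_row in zip(X, rates)]

# Toy example (made-up numbers): 3 mutation channels, 2 signatures, 2 samples.
S = [[0.5, 0.1],   # signatures: channels x signatures, each column sums to 1
     [0.3, 0.2],
     [0.2, 0.7]]
E = [[100, 50],    # exposures: signatures x samples
     [20, 80]]
X = [[60, 40],     # observed counts: channels x samples
     [30, 25],
     [30, 60]]

additive = matmul(S, E)                     # lambda = S E
m = fit_channel_modulation(X, additive)     # one factor per channel
modulated = [[mk * lam for lam in row] for mk, row in zip(m, additive)]

ll_additive = poisson_loglik(X, additive)
ll_modulated = poisson_loglik(X, modulated)
print([round(v, 3) for v in m])
print(ll_modulated >= ll_additive)
```

Because m = 1 recovers the additive model, the modulated fit can never have a lower likelihood; factors above or below 1 flag channels that the additive model systematically under- or overestimates.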
2.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34368832

ABSTRACT

Drug combination therapy is a promising strategy to treat complex diseases such as cancer and infectious diseases. However, current knowledge of drug combination therapies, especially in cancer patients, is limited because of adverse drug effects, toxicity and cell line heterogeneity. Screening new drug combinations requires substantial efforts since considering all possible combinations between drugs is infeasible and expensive. Therefore, building computational approaches, particularly machine learning methods, could provide an effective strategy to overcome drug resistance and improve therapeutic efficacy. In this review, we group the state-of-the-art machine learning approaches to analyze personalized drug combination therapies into three categories and discuss each method in each category. We also present a short description of relevant databases used as a benchmark in drug combination therapies and provide a list of well-known, publicly available interactive data analysis portals. We highlight the importance of data integration on the identification of drug combinations. Finally, we address the advantages of combining multiple data sources on drug combination analysis by showing an experimental comparison.


Subjects
Machine Learning; Antineoplastic Combined Chemotherapy Protocols/administration & dosage; Computational Biology/methods; Humans; Neoplasms/drug therapy; Precision Medicine
3.
Brief Bioinform ; 22(1): 346-359, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31838491

ABSTRACT

Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources and methods that can incorporate multiple sources efficiently is the challenging part of successful analysis in personalized medicine. The reason is that finding decisive factors of cancer and developing methods that can overcome the problems of integrating data, such as differences in data structures and data complexities, are difficult. In this review, we summarize recent advances in data integration-based machine learning for drug response prediction, by categorizing methods as matrix factorization-based, kernel-based and network-based methods. We also present a short description of relevant databases used as a benchmark in drug response prediction analyses, followed by providing a brief discussion of challenges faced in integrating and interpreting data from multiple sources. Finally, we address the advantages of combining multiple heterogeneous data sources on drug sensitivity analysis by showing an experimental comparison. Contact: betul.guvenc@aalto.fi.


Subjects
Drug Resistance, Neoplasm; Genomics/methods; Precision Medicine/methods; Humans; Machine Learning; Pharmacogenomic Variants
4.
Bioinformatics ; 35(14): i218-i224, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510659

ABSTRACT

MOTIVATION: Human genomic datasets often contain sensitive information that limits use and sharing of the data. In particular, simple anonymization strategies fail to provide a sufficient level of protection for genomic data, because the data are inherently identifiable. Differentially private machine learning can help by guaranteeing that the published results do not leak too much information about any individual data point. Recent research has reached promising results on differentially private drug sensitivity prediction using gene expression data. Differentially private learning with genomic data is challenging because it is more difficult to guarantee privacy in high dimensions. Dimensionality reduction can help, but if the dimension reduction mapping is learned from the data, then it needs to be differentially private too, which can carry a significant privacy cost. Furthermore, the selection of any hyperparameters (such as the target dimensionality) needs to also avoid leaking private information. RESULTS: We study an approach that uses a large public dataset of similar type to learn a compact representation for differentially private learning. We compare three representation learning methods: variational autoencoders, principal component analysis and random projection. We solve two machine learning tasks on gene expression of cancer cell lines: cancer type classification, and drug sensitivity prediction. The experiments demonstrate significant benefit from all representation learning methods with variational autoencoders providing the most accurate predictions most often. Our results significantly improve over previous state-of-the-art in accuracy of differentially private drug sensitivity prediction. AVAILABILITY AND IMPLEMENTATION: Code used in the experiments is available at https://github.com/DPBayes/dp-representation-transfer.


Subjects
Machine Learning; Humans; Neoplasms
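The fourth record argues that a representation obtained without touching the private data costs no privacy budget, and that reducing the dimension reduces the noise a differentially private release needs. A minimal sketch of that trade-off, assuming a Gaussian random projection (one of the three methods compared; it needs no data at all, whereas PCA or a variational autoencoder would be fitted on the public dataset) and, as a stand-in for the paper's predictors, an ε-DP mean released with the Laplace mechanism. The clipping bound, dimensions and simulated data are hypothetical.

```python
import random

def random_projection(in_dim, out_dim, rng):
    """Gaussian projection matrix (out_dim x in_dim), entries N(0, 1/out_dim).
    It is drawn without looking at the private data, so it uses no privacy budget."""
    sd = (1.0 / out_dim) ** 0.5
    return [[rng.gauss(0.0, sd) for _ in range(in_dim)] for _ in range(out_dim)]

def project(P, x):
    """Apply the projection matrix P to one feature vector x."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]

def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def dp_mean(rows, bound, epsilon, rng):
    """epsilon-DP mean of equal-length vectors via the Laplace mechanism.
    Coordinates are clipped to [-bound, bound]; replacing one of n rows moves each
    coordinate of the mean by at most 2*bound/n, so the L1 sensitivity is 2*bound*d/n."""
    n, d = len(rows), len(rows[0])
    clipped = [[max(-bound, min(bound, v)) for v in r] for r in rows]
    scale = 2.0 * bound * d / (n * epsilon)
    mean = [sum(col) / n for col in zip(*clipped)]
    return [m + laplace_noise(scale, rng) for m in mean], scale

rng = random.Random(0)
# Hypothetical private data: 200 samples x 50 expression-like features.
private = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(200)]

P = random_projection(50, 5, rng)
projected = [project(P, x) for x in private]

# Two alternative releases, compared only to show how the noise scale depends on d.
noisy_low, scale_low = dp_mean(projected, bound=3.0, epsilon=1.0, rng=rng)
noisy_high, scale_high = dp_mean(private, bound=3.0, epsilon=1.0, rng=rng)
print(scale_low, scale_high)  # per-coordinate noise scale is 10x smaller in 5-D
```

The sketch only captures the dimension argument; choosing the target dimension itself from private data would, as the abstract notes, also need to be made private.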
5.
Bioinformatics ; 34(13): i395-i403, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949984

ABSTRACT

Motivation: Precision medicine requires the ability to predict the efficacies of different treatments for a given individual using high-dimensional genomic measurements. However, identifying predictive features remains a challenge when the sample size is small. Incorporating expert knowledge offers a promising approach to improve predictions, but collecting such knowledge is laborious if the number of candidate features is very large. Results: We introduce a probabilistic framework to incorporate expert feedback about the impact of genomic measurements on the outcome of interest and present a novel approach to collect the feedback efficiently, based on Bayesian experimental design. The new approach outperformed other recent alternatives in two medical applications: prediction of metabolic traits and prediction of sensitivity of cancer cells to different drugs, both using genomic features as predictors. Furthermore, the intelligent approach to collect feedback reduced the workload of the expert to approximately 11%, compared to a baseline approach. Availability and implementation: Source code implementing the introduced computational methods is freely available at https://github.com/AaltoPML/knowledge-elicitation-for-precision-medicine. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genomics/methods; Precision Medicine/methods; Software; Bayes Theorem; Humans; Sequence Analysis, DNA/methods
6.
Sci Rep ; 8(1): 4034, 2018 03 05.
Article in English | MEDLINE | ID: mdl-29507319

ABSTRACT

In metazoans, epithelial architecture provides a context that dynamically modulates most if not all epithelial cell responses to intrinsic and extrinsic signals, including growth or survival signalling and transforming oncogene action. Three-dimensional (3D) epithelial culture systems provide tractable models to interrogate the function of human genetic determinants in the establishment of context-dependency. We performed an arrayed genetic shRNA screen in mammary epithelial 3D cultures to identify new determinants of epithelial architecture, finding that the key phenotype-impacting shRNAs altered not only the population average but, even more noticeably, the population distribution. The broad distributions were attributable to sporadic gene silencing by shRNA in unselected populations. We employed the Maximum Mean Discrepancy concept to capture similar population distribution patterns and demonstrate here the feasibility of the test in identifying an impact of shRNA in populations of 3D structures. Integration of the clustered morphometric data with protein-protein interaction data enabled the generation of hypotheses on novel biological pathways underlying similar 3D phenotype alterations. The results present a new strategy for 3D phenotype-driven pathway analysis, which is expected to accelerate discovery of context-dependent gene functions in epithelial biology and tumorigenesis.


Subjects
Epithelial Cells/metabolism; Signal Transduction; Cell Line; Cell Transformation, Neoplastic; Humans; Phenotype
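The sixth record uses Maximum Mean Discrepancy (MMD) to detect shRNAs that change the shape of a population distribution rather than only its mean. A minimal sketch of an MMD two-sample test, assuming a Gaussian kernel with a hand-picked bandwidth, the standard unbiased MMD² estimator and a permutation p-value; the one-dimensional "morphometric feature" and both simulated populations are hypothetical, not the paper's data or settings.

```python
import math
import random

def gaussian_kernel(a, b, sigma):
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def mmd2_unbiased(xs, ys, sigma):
    """Unbiased estimate of squared MMD between two 1-D samples (Gaussian kernel)."""
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], sigma)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], sigma)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy

def mmd_permutation_test(xs, ys, sigma, n_perm, rng):
    """Observed MMD^2 and a permutation p-value under H0: same distribution."""
    observed = mmd2_unbiased(xs, ys, sigma)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:len(xs)], pooled[len(xs):], sigma) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

rng = random.Random(1)
# Hypothetical morphometric feature (e.g. normalised structure size) per 3D structure.
control = [rng.gauss(1.0, 0.1) for _ in range(40)]
# Sporadic silencing: similar mean, but a broader, bimodal population.
knockdown = [rng.gauss(0.7, 0.1) if i % 2 == 0 else rng.gauss(1.3, 0.1) for i in range(40)]

stat, p_value = mmd_permutation_test(control, knockdown, sigma=0.2, n_perm=200, rng=rng)
print(round(stat, 4), p_value)
```

A t-test on these two populations would see almost no difference in means; the kernel-based statistic responds to the change in shape, which is the property the screen relies on.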
7.
Cell Syst ; 5(5): 485-497.e3, 2017 11 22.
Article in English | MEDLINE | ID: mdl-28988802

ABSTRACT

We report the results of a DREAM challenge designed to predict relative genetic essentialities based on a novel dataset testing 98,000 shRNAs against 149 molecularly characterized cancer cell lines. We analyzed the results of over 3,000 submissions over a period of 4 months. We found that algorithms combining essentiality data across multiple genes demonstrated increased accuracy; gene expression was the most informative molecular data type; the identity of the gene being predicted was far more important than the modeling strategy; well-predicted genes and selected molecular features showed enrichment in functional categories; and frequently selected expression features correlated with survival in primary tumors. This study establishes benchmarks for gene essentiality prediction, presents a community resource for future comparison with this benchmark, and provides insights into factors influencing the ability to predict gene essentiality from functional genetic screens. This study also demonstrates the value of releasing pre-publication data publicly to engage the community in an open research collaboration.


Subjects
Gene Expression/genetics; Genes, Essential/genetics; Algorithms; Cell Line, Tumor; Genomics/methods; Humans; RNA, Small Interfering/genetics
8.
BMC Bioinformatics ; 18(Suppl 10): 393, 2017 Sep 13.
Article in English | MEDLINE | ID: mdl-28929971

ABSTRACT

BACKGROUND: Dispersed biomedical databases limit user exploration to generate structured knowledge. Linked Data unifies data structures and makes the dispersed data easy to search across resources, but it lacks support for the human cognition needed to achieve insights. In addition, potential errors in the data are difficult to detect in their free formats. Devising a visualization that synthesizes multiple sources in such a way that links between data sources are transparent, and uncertainties, such as data conflicts, are salient is challenging. RESULTS: To investigate the requirements and challenges of uncertainty-aware visualizations of linked data, we developed MediSyn, a system that synthesizes medical datasets to support drug treatment selection. It uses a matrix-based layout to visually link drugs, targets (e.g., mutations), and tumor types. Data uncertainties are salient in MediSyn; for example, (i) missing data are exposed in the matrix view of drug-target relations; (ii) inconsistencies between datasets are shown via overlaid layers; and (iii) data credibility is conveyed through links to data provenance. CONCLUSIONS: Through the synthesis of two manually curated datasets, cancer treatment biomarkers and drug-target bioactivities, a use case shows how MediSyn effectively supports the discovery of drug-repurposing opportunities. A study with six domain experts indicated that MediSyn benefited drug selection and the discovery of data inconsistencies. Though linked publication sources supported user exploration for further information, the causes of inconsistencies were not easy to find. Additionally, MediSyn could embrace more patient data to increase its informativeness. We derive design implications from the findings.


Subjects
Databases, Factual; Drug Therapy; Software; Uncertainty; Adult; Female; Humans; Surveys and Questionnaires
9.
Bioinformatics ; 32(17): i455-i463, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587662

ABSTRACT

MOTIVATION: A key goal of computational personalized medicine is to systematically utilize genomic and other molecular features of samples to predict drug responses for a previously unseen sample. Such predictions are valuable for developing hypotheses for selecting therapies tailored for individual patients. This is especially valuable in oncology, where molecular and genetic heterogeneity of the cells has a major impact on the response. However, the prediction task is extremely challenging, raising the need for methods that can effectively model and predict drug responses. RESULTS: In this study, we propose a novel formulation of multi-task matrix factorization that allows selective data integration for predicting drug responses. To solve the modeling task, we extend the state-of-the-art kernelized Bayesian matrix factorization (KBMF) method with component-wise multiple kernel learning. In addition, our approach exploits the known pathway information in a novel and biologically meaningful fashion to learn the drug response associations. Our method quantitatively outperforms the state of the art on predicting drug responses in two publicly available cancer datasets as well as on a synthetic dataset. In addition, we validated our model predictions with lab experiments using an in-house cancer cell line panel. We finally show the practical applicability of the proposed method by utilizing prior knowledge to infer pathway-drug response associations, opening up the opportunity for elucidating drug action mechanisms. We demonstrate that pathway-response associations can be learned by the proposed model for the well-known EGFR and MEK inhibitors. AVAILABILITY AND IMPLEMENTATION: The source code implementing the method is available at http://research.cs.aalto.fi/pml/software/cwkbmf/ CONTACTS: muhammad.ammad-ud-din@aalto.fi or samuel.kaski@aalto.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics; Neoplasms; Algorithms; Bayes Theorem; Drug Delivery Systems; Drug Discovery; Humans; Metabolic Networks and Pathways; Software
10.
Nat Commun ; 7: 12460, 2016 08 23.
Article in English | MEDLINE | ID: mdl-27549343

ABSTRACT

Rheumatoid arthritis (RA) affects millions world-wide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in ∼one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant genetic heritability estimate of treatment non-response trait (h² = 0.18, P = 0.02), no significant genetic contribution to prediction accuracy is observed. Results formally confirm the expectations of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on collection of other data.


Subjects
Antibodies, Monoclonal, Humanized/therapeutic use; Arthritis, Rheumatoid/drug therapy; Genetic Predisposition to Disease/genetics; Polymorphism, Single Nucleotide; Tumor Necrosis Factor-alpha/antagonists & inhibitors; Adult; Aged; Antibodies, Monoclonal/therapeutic use; Antirheumatic Agents/therapeutic use; Arthritis, Rheumatoid/genetics; Arthritis, Rheumatoid/pathology; Certolizumab Pegol/therapeutic use; Cohort Studies; Crowdsourcing; Female; Humans; Male; Middle Aged; Prognosis; Treatment Outcome; Tumor Necrosis Factor-alpha/immunology
11.
Bioinformatics ; 32(16): 2457-63, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153643

ABSTRACT

MOTIVATION: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments. These biclustering techniques have focused on one data source, often gene expression data. We present a Bayesian approach for joint biclustering of multiple data sources, extending a recent method Group Factor Analysis to have a biclustering interpretation with additional sparsity assumptions. The resulting method enables data-driven detection of linear structure present in parts of the data sources. RESULTS: Our simulation studies show that the proposed method reliably infers biclusters from heterogeneous data sources. We tested the method on data from the NCI-DREAM drug sensitivity prediction challenge, resulting in an excellent prediction accuracy. Moreover, the predictions are based on several biclusters which provide insight into the data sources, in this case on gene expression, DNA methylation, protein abundance, exome sequence, functional connectivity fingerprints and drug sensitivity. AVAILABILITY AND IMPLEMENTATION: http://research.cs.aalto.fi/pml/software/GFAsparse/ CONTACTS: kerstin.bunte@googlemail.com or samuel.kaski@aalto.fi.


Subjects
Algorithms; Cluster Analysis; Datasets as Topic; Gene Expression Profiling; Bayes Theorem; Factor Analysis, Statistical; Information Storage and Retrieval; Oligonucleotide Array Sequence Analysis
12.
Altern Lab Anim ; 43(5): 325-32, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26551289

ABSTRACT

This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently-developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing), to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serves to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data to information relevant to human health and environmental safety.


Subjects
Animal Testing Alternatives; Computational Biology; Humans; Risk Assessment; Toxicogenetics
13.
Bioinformatics ; 30(17): i497-504, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161239

ABSTRACT

MOTIVATION: Analysis of relationships of drug structure to biological response is key to understanding off-target and unexpected drug effects, and for developing hypotheses on how to tailor drug therapies. New methods are required for integrated analyses of a large number of chemical features of drugs against the corresponding genome-wide responses of multiple cell models. RESULTS: In this article, we present the first comprehensive multi-set analysis on how the chemical structure of drugs impacts on genome-wide gene expression across several cancer cell lines [Connectivity Map (CMap) database]. The task is formulated as searching for drug response components across multiple cancers to reveal shared effects of drugs and the chemical features that may be responsible. The components can be computed with an extension of a recent approach called Group Factor Analysis. We identify 11 components that link the structural descriptors of drugs with specific gene expression responses observed in the three cell lines and identify structural groups that may be responsible for the responses. Our method quantitatively outperforms the limited earlier methods on CMap and identifies both the previously reported associations and several interesting novel findings, by taking into account multiple cell lines and advanced 3D structural descriptors. The novel observations include: previously unknown similarities in the effects induced by 15-delta prostaglandin J2 and HSP90 inhibitors, which are linked to the 3D descriptors of the drugs; and the induction by simvastatin of leukemia-specific response, resembling the effects of corticosteroids. AVAILABILITY AND IMPLEMENTATION: Source Code implementing the method is available at: http://research.ics.aalto.fi/mi/software/GFAsparse. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Antineoplastic Agents/chemistry; Antineoplastic Agents/pharmacology; Bayes Theorem; Cell Line, Tumor; Gene Expression/drug effects; Humans; Neoplasms/genetics; Neoplasms/metabolism; Structure-Activity Relationship
14.
J Chem Inf Model ; 54(8): 2347-59, 2014 Aug 25.
Article in English | MEDLINE | ID: mdl-25046554

ABSTRACT

With data from recent large-scale drug sensitivity measurement campaigns, it is now possible to build and test models predicting responses for more than one hundred anticancer drugs against several hundreds of human cancer cell lines. Traditional quantitative structure-activity relationship (QSAR) approaches focus on small molecules in searching for their structural properties predictive of the biological activity in a single cell line or a single tissue type. We extend this line of research in two directions: (1) an integrative QSAR approach predicting the responses to new drugs for a panel of multiple known cancer cell lines simultaneously and (2) a personalized QSAR approach predicting the responses to new drugs for new cancer cell lines. To solve the modeling task, we apply a novel kernelized Bayesian matrix factorization method. For maximum applicability and predictive performance, the method optionally utilizes genomic features of cell lines and target information on drugs in addition to chemical drug descriptors. In a case study with 116 anticancer drugs and 650 cell lines, we demonstrate the usefulness of the method in several relevant prediction scenarios, differing in the amount of available information, and analyze the importance of various types of drug features for the response prediction. Furthermore, after predicting the missing values of the data set, a complete global map of drug response is explored to assess treatment potential and treatment range of therapeutically interesting anticancer drugs.


Subjects
Antineoplastic Agents/pharmacology; Gene Expression Regulation, Neoplastic; Neoplasm Proteins/genetics; Quantitative Structure-Activity Relationship; Small Molecule Libraries/pharmacology; Antineoplastic Agents/chemistry; Bayes Theorem; Biomarkers, Pharmacological; Cell Line, Tumor; Factor Analysis, Statistical; Humans; Neoplasm Proteins/antagonists & inhibitors; Neoplasm Proteins/metabolism; Small Molecule Libraries/chemistry
15.
Nat Biotechnol ; 32(12): 1202-12, 2014 Dec.
Article in English | MEDLINE | ID: mdl-24880487

ABSTRACT

Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.


Subjects
Antineoplastic Agents/therapeutic use; Drug Resistance, Neoplasm/genetics; Gene Expression Profiling; Neoplasms/drug therapy; Algorithms; Antineoplastic Agents/adverse effects; Epigenomics/methods; Gene Expression Regulation, Neoplastic/drug effects; Genomics/methods; Humans; Neoplasms/genetics; Proteomics/methods
16.
Bioinformatics ; 28(18): 2349-56, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22743225

ABSTRACT

MOTIVATION: Large public repositories of gene expression measurements offer the opportunity to position a new experiment into the context of earlier studies. While previous methods rely on experimental annotation or global similarity of expression profiles across genes or gene sets, we compare experiments by measuring similarity based on an unsupervised, data-driven regulatory model around pre-specified genes of interest. Our experiment retrieval approach is novel in two conceptual respects: (i) targetable focus and interpretability: the analysis is targeted at regulatory relationships of genes that are relevant to the analyst or come from prior knowledge; (ii) regulatory model-based similarity measure: related experiments are retrieved based on the strength of inferred regulatory links between genes. RESULTS: We learn a model for the regulation of specific genes from a data repository and exploit it to construct a similarity metric for an information retrieval task. We use the Fisher kernel, a rigorous similarity measure that typically has been applied to use generative models in discriminative classifiers. Results on human and plant microarray collections indicate that our method is able to substantially improve the retrieval of related experiments against standard methods. Furthermore, it allows the user to interpret biological conditions in terms of changes in link activity patterns. Our study of the osmotic stress network for Arabidopsis thaliana shows that the method successfully identifies relevant relationships around given key genes. AVAILABILITY: The code (R) is available at http://research.ics.tkk.fi/mi/software.shtml. CONTACT: elisabeth.georgii@aalto.fi; jarkko.salojarvi@helsinki.fi; samuel.kaski@hiit.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling; Gene Regulatory Networks; Arabidopsis/genetics; Arabidopsis/metabolism; Data Mining; Gene Expression Regulation; Humans; Leukemia/genetics; Leukemia/metabolism; Linear Models; Models, Genetic; Osmotic Pressure
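The sixteenth record builds a retrieval similarity from a Fisher kernel, which compares two datasets through the gradients (Fisher scores) of a generative model's log-likelihood with respect to the model parameters. A minimal sketch, assuming a linear-Gaussian regulatory model y = w·x + ε for one target gene with regulator expressions x, link weights w taken as already learned from the repository, and the common identity approximation to the Fisher information with cosine normalisation. The paper's actual regulatory model and normalisation may differ, and the toy experiments are made up.

```python
import math

def fisher_score(experiment, w, noise_var):
    """Gradient of the Gaussian log-likelihood w.r.t. link weights w for the
    model y = w . x + noise, summed over the samples of one experiment."""
    score = [0.0] * len(w)
    for x, y in experiment:
        residual = y - sum(wi * xi for wi, xi in zip(w, x))
        for k, xk in enumerate(x):
            score[k] += residual * xk / noise_var
    return score

def fisher_kernel(u, v):
    """Cosine-normalised Fisher kernel, approximating the Fisher information by the identity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

# Link weights of two regulators -> one target gene, assumed learned from the repository.
w = [0.5, 0.5]
# Toy experiments: lists of (regulator expression vector, target expression).
exp_a = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]  # target tracks regulator 1
exp_b = [([2.0, 0.0], 2.0), ([0.0, 2.0], 0.0)]                      # same pattern, other scale
exp_c = [([1.0, 0.0], 0.0), ([0.0, 1.0], 1.0)]                      # target tracks regulator 2

u_a, u_b, u_c = (fisher_score(e, w, noise_var=1.0) for e in (exp_a, exp_b, exp_c))
k_ab = fisher_kernel(u_a, u_b)
k_ac = fisher_kernel(u_a, u_c)
print(k_ab, k_ac)  # exp_b is retrieved ahead of exp_c for query exp_a
```

Each Fisher score says in which direction an experiment would pull the link weights, so experiments that strengthen the same regulatory links score as similar even when their raw expression levels differ in scale.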
17.
BMC Bioinformatics ; 13: 112, 2012 May 30.
Article in English | MEDLINE | ID: mdl-22646858

ABSTRACT

BACKGROUND: Detailed and systematic understanding of the biological effects of millions of available compounds on living cells is a significant challenge. As most compounds impact multiple targets and pathways, traditional methods for analyzing structure-function relationships are not comprehensive enough. Therefore more advanced integrative models are needed for predicting biological effects elicited by specific chemical features. As a step towards creating such computational links we developed a data-driven chemical systems biology approach to comprehensively study the relationship of 76 structural 3D-descriptors (VolSurf, chemical space) of 1159 drugs with the microarray gene expression responses (biological space) they elicited in three cancer cell lines. The analysis covering 11350 genes was based on data from the Connectivity Map. We decomposed the biological response profiles into components, each linked to a characteristic chemical descriptor profile. RESULTS: Integrated analysis of both the chemical and biological space was more informative than either dataset alone in predicting drug similarity as measured by shared protein targets. We identified ten major components that link distinct VolSurf chemical features across multiple compounds to specific cellular responses. For example, component 2 (hydrophobic properties) strongly linked to DNA damage response, while component 3 (hydrogen bonding) was associated with metabolic stress. Individual structural and biological features were often linked to one cell line only, such as leukemia cells (HL-60) specifically responding to cardiac glycosides. CONCLUSIONS: In summary, our approach identified several novel links between specific chemical structure properties and distinct biological responses in cells incubated with these drugs. Importantly, the analysis focused on chemical-biological properties that emerge across multiple drugs. The decoding of such systematic relationships is necessary to build better models of drug effects, including unanticipated types of molecular properties having strong biological effects.


Subjects
Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Biomarkers, Pharmacological , Gene Expression Profiling/statistics & numerical data , Neoplasms/genetics , Genome, Human/drug effects , Genome, Human/genetics , Humans , Oligonucleotide Array Sequence Analysis , Structure-Activity Relationship , Systems Biology/methods , Transcriptome
18.
Bioinformatics ; 28(2): 246-53, 2012 Jan 15.
Article in English | MEDLINE | ID: mdl-22106335

ABSTRACT

MOTIVATION: Genome-wide measurement of transcript levels is a ubiquitous tool in biomedical research. As experimental data continue to be deposited in public databases, it is becoming important to develop search engines that enable the retrieval of relevant studies given a query study. While retrieval systems based on metadata already exist, data-driven approaches that retrieve studies based on similarities in the expression data itself have greater potential for uncovering novel biological insights. RESULTS: We propose an information retrieval method based on differential expression. Our method handles arbitrary experimental designs and performs competitively with alternative approaches, while making the search results interpretable in terms of differential expression patterns. We show that our model yields meaningful connections between biological conditions from different studies. Finally, we validate a previously unknown connection between malignant pleural mesothelioma and SIM2s, suggested by our method, via real-time polymerase chain reaction in an independent set of mesothelioma samples. AVAILABILITY: Supplementary data and source code are available from http://www.ebi.ac.uk/fg/research/rex.


Subjects
Gene Expression Profiling , Information Storage and Retrieval/methods , Mesothelioma/genetics , Pleural Neoplasms/genetics , Animals , Humans , Mice , Programming Languages , Rats
19.
J Comput Biol ; 18(3): 251-61, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21385032

ABSTRACT

Clustering methods are a useful and common first step in gene expression studies, but the results may be hard to interpret. We explicitly introduce an indicator of which genes tie together each cluster, turning the setup into biclustering. Furthermore, we make the indicators hierarchical, resulting in a hierarchy of progressively more specific biclusters. A non-parametric Bayesian formulation makes the model rigorous yet flexible, and keeps computations feasible. The model can additionally be used in information retrieval to relate relevant samples. We show that the model outperforms four other biclustering procedures on a large miRNA data set. We also demonstrate the model's added interpretability and information retrieval capability in a case study. Software is publicly available at http://research.ics.tkk.fi/mi/software/treebic/.


Subjects
Gene Expression Regulation, Neoplastic , MicroRNAs/genetics , Neoplasms/genetics , Bayes Theorem , Cluster Analysis , Gene Expression , Humans , Models, Genetic , Software
20.
Bioinformatics ; 26(12): i391-8, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529933

ABSTRACT

MOTIVATION: Analysis of variance (ANOVA)-type methods are the default tool for the analysis of data with multiple covariates. These tools have been generalized to the multivariate analysis of high-throughput biological datasets, where the main challenge is the combination of small sample size and high dimensionality. However, existing multi-way analysis methods are not designed for experiments, now increasingly important, in which data are obtained from multiple sources. Common examples of such settings include integrated analysis of metabolic and gene expression profiles, or, as in our case, of metabolic profiles from several tissues, in a controlled multi-way experimental setup where disease status, medical treatment, gender and time series are typical covariates. RESULTS: We extend the applicability of multivariate, multi-way ANOVA-type methods to multi-source cases by introducing a novel Bayesian model. The method is capable of finding covariate-related dependencies between the sources. It assumes the measurements consist of groups of similarly behaving variables, and estimates the multivariate covariate effects and their interaction effects for the discovered groups of variables. In particular, the method partitions the effects into those shared between the sources and source-specific ones. The method is specifically designed for datasets with small sample sizes and high dimensionality. We apply the method to a lipidomics dataset from a lung cancer study with a two-way experimental setup, where measurements from several tissues with mostly distinct lipids have been taken. The method is also directly applicable to gene expression and proteomics data. AVAILABILITY: An R implementation is available at http://www.cis.hut.fi/projects/mi/software/multiWayCCA/.


Subjects
Algorithms , Gene Expression Profiling/methods , Analysis of Variance , Data Collection , Multivariate Analysis