Search | BVS Search Portal

1.

Enter the Matrix: Factorization Uncovers Knowledge from Omics.

Stein-O'Brien, Genevieve L; Arora, Raman; Culhane, Aedin C; Favorov, Alexander V; Garmire, Lana X; Greene, Casey S; Goff, Loyal A; Li, Yifeng; Ngom, Aloune; Ochs, Michael F; Xu, Yanxun; Fertig, Elana J.

Trends Genet ; 34(10): 790-805, 2018 10.

Article in English | MEDLINE | ID: mdl-30143323

ABSTRACT

Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.

Subjects

Data Interpretation, Statistical , Genomics/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Humans , Systems Biology/statistics & numerical data
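
As a rough illustration of what "low-dimensional structure from high-dimensional data" means in the abstract above, the sketch below factors a synthetic genes-by-samples matrix into two small matrices with a truncated SVD. The SVD is only one of several MF techniques, and the names `low_rank_factors`, `amplitude` and `pattern`, the matrix sizes and the simulated data are all assumptions made for this example, not taken from the review.

```python
import numpy as np

def low_rank_factors(data, k):
    """Return (amplitude, pattern) such that data is approximately amplitude @ pattern.

    Uses a rank-k truncated SVD; other factorizations (NMF, ICA, ...) would
    produce different factors under different constraints.
    """
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    amplitude = u[:, :k] * s[:k]   # genes x k
    pattern = vt[:k, :]            # k x samples
    return amplitude, pattern

# Synthetic data: 200 hypothetical genes, 12 samples, 3 hidden patterns plus noise.
rng = np.random.default_rng(0)
true_amp = rng.random((200, 3))
true_pat = rng.random((3, 12))
data = true_amp @ true_pat + 0.01 * rng.standard_normal((200, 12))

amplitude, pattern = low_rank_factors(data, k=3)
rel_err = np.linalg.norm(data - amplitude @ pattern) / np.linalg.norm(data)
print(f"relative reconstruction error with 3 factors: {rel_err:.4f}")
```

Three factors suffice here only because the synthetic data were built from three patterns; with real omics data the number of factors is itself a modeling choice.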

2.

Evaluating optimal therapy robustness by virtual expansion of a sample population, with a case study in cancer immunotherapy.

Barish, Syndi; Ochs, Michael F; Sontag, Eduardo D; Gevertz, Jana L.

Proc Natl Acad Sci U S A ; 114(31): E6277-E6286, 2017 08 01.

Article in English | MEDLINE | ID: mdl-28716945

ABSTRACT

Cancer is a highly heterogeneous disease, exhibiting spatial and temporal variations that pose challenges for designing robust therapies. Here, we propose the VEPART (Virtual Expansion of Populations for Analyzing Robustness of Therapies) technique as a platform that integrates experimental data, mathematical modeling, and statistical analyses for identifying robust optimal treatment protocols. VEPART begins with time course experimental data for a sample population, and a mathematical model fit to aggregate data from that sample population. Using nonparametric statistics, the sample population is amplified and used to create a large number of virtual populations. At the final step of VEPART, robustness is assessed by identifying and analyzing the optimal therapy (perhaps restricted to a set of clinically realizable protocols) across each virtual population. As proof of concept, we have applied the VEPART method to study the robustness of treatment response in a mouse model of melanoma subject to treatment with immunostimulatory oncolytic viruses and dendritic cell vaccines. Our analysis (i) showed that every scheduling variant of the experimentally used treatment protocol is fragile (nonrobust) and (ii) discovered an alternative region of dosing space (lower oncolytic virus dose, higher dendritic cell dose) for which a robust optimal protocol exists.

Subjects

Cancer Vaccines/immunology , Dendritic Cells/immunology , Immunotherapy/methods , Melanoma/therapy , Models, Theoretical , Oncolytic Virotherapy/methods , Oncolytic Viruses/physiology , Algorithms , Animals , Cell Differentiation/immunology , Computer Simulation , Disease Models, Animal , Melanoma/immunology , Mice , T-Lymphocytes, Cytotoxic/immunology
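
The abstract above says the sample population is amplified with nonparametric statistics to create many virtual populations, without giving the procedure. One common way to do this is a nonparametric bootstrap, sketched below under that assumption: subjects are resampled with replacement and each replicate is summarized by its mean time course. The function name `virtual_populations` and the tumor-volume numbers are hypothetical, and the actual VEPART procedure may differ.

```python
import numpy as np

def virtual_populations(time_courses, n_virtual, seed=0):
    """Bootstrap sketch: resample subjects with replacement.

    time_courses: array of shape (n_subjects, n_timepoints).
    Returns the mean time course of each bootstrap replicate,
    shape (n_virtual, n_timepoints).
    """
    rng = np.random.default_rng(seed)
    n_subjects = time_courses.shape[0]
    # Each row of idx picks n_subjects subjects (with replacement) for one replicate.
    idx = rng.integers(0, n_subjects, size=(n_virtual, n_subjects))
    return time_courses[idx].mean(axis=1)

# Hypothetical tumor volumes for 4 mice at 3 time points (illustrative only).
tumor_volume = np.array([
    [50.0, 120.0, 300.0],
    [45.0, 100.0, 210.0],
    [60.0, 150.0, 380.0],
    [55.0,  90.0, 190.0],
])
averages = virtual_populations(tumor_volume, n_virtual=1000)
print(averages.shape)
```

In a workflow of this kind, a model would then be refit to each replicate and the optimal protocol compared across replicates; that step is omitted here.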

3.

Splice Expression Variation Analysis (SEVA) for inter-tumor heterogeneity of gene isoform usage in cancer.

Afsari, Bahman; Guo, Theresa; Considine, Michael; Florea, Liliana; Kagohara, Luciane T; Stein-O'Brien, Genevieve L; Kelley, Dylan; Flam, Emily; Zambo, Kristina D; Ha, Patrick K; Geman, Donald; Ochs, Michael F; Califano, Joseph A; Gaykalova, Daria A; Favorov, Alexander V; Fertig, Elana J.

Bioinformatics ; 34(11): 1859-1867, 2018 06 01.

Article in English | MEDLINE | ID: mdl-29342249

ABSTRACT

Motivation: Current bioinformatics methods to detect changes in gene isoform usage in distinct phenotypes compare the relative expected isoform usage in phenotypes. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches. Results: We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and benchmark SEVA's performance against EBSeq, DiffSplice and rMATS that model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data. Availability and implementation: SEVA is implemented in the R/Bioconductor package GSReg. Contact: bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Alternative Splicing , Neoplasms/genetics , Protein Isoforms/genetics , Sequence Analysis, RNA/methods , Software , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Head and Neck Neoplasms/genetics , Humans , Models, Genetic

4.

PatternMarkers & GWCoGAPS for novel data-driven biomarkers via whole transcriptome NMF.

Stein-O'Brien, Genevieve L; Carey, Jacob L; Lee, Wai Shing; Considine, Michael; Favorov, Alexander V; Flam, Emily; Guo, Theresa; Li, Sijia; Marchionni, Luigi; Sherman, Thomas; Sivy, Shawn; Gaykalova, Daria A; McKay, Ronald D; Ochs, Michael F; Colantuoni, Carlo; Fertig, Elana J.

Bioinformatics ; 33(12): 1892-1894, 2017 Jun 15.

Article in English | MEDLINE | ID: mdl-28174896

ABSTRACT

SUMMARY: Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g. time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarkers. Therefore, we developed a novel patternMarkers statistic to extract genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with patternMarkers requires whole-genome data. Therefore, we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole genome Bayesian NMF using the sparse, MCMC algorithm, CoGAPS. Additionally, a manual version of the GWCoGAPS algorithm contains analytic and visualization tools including patternMatcher, a Shiny web application. The decomposition in the manual pipeline can be replaced with any NMF algorithm, for further generalization of the software. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating GWCoGAPS and patternMarkers ascertainment of data-driven biomarkers from whole-genome data. AVAILABILITY AND IMPLEMENTATION: PatternMarkers & GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL license. CONTACT: gsteinobrien@jhmi.edu or ccolantu@jhmi.edu or ejfertig@jhmi.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Gene Expression Profiling/methods , Software , Bayes Theorem , Biomarkers , Humans , Sequence Analysis, RNA/methods

5.

Two clinical phenotypes in polycythemia vera.

Spivak, Jerry L; Considine, Michael; Williams, Donna M; Talbot, Conover C; Rogers, Ophelia; Moliterno, Alison R; Jie, Chunfa; Ochs, Michael F.

N Engl J Med ; 371(9): 808-17, 2014 Aug 28.

Article in English | MEDLINE | ID: mdl-25162887

ABSTRACT

BACKGROUND: Polycythemia vera is the ultimate phenotypic consequence of the V617F mutation in Janus kinase 2 (encoded by JAK2), but the extent to which this mutation influences the behavior of the involved CD34+ hematopoietic stem cells is unknown. METHODS: We analyzed gene expression in CD34+ peripheral-blood cells from 19 patients with polycythemia vera, using oligonucleotide microarray technology after correcting for potential confounding by sex, since the phenotypic features of the disease differ between men and women. RESULTS: Men with polycythemia vera had twice as many up-regulated or down-regulated genes as women with polycythemia vera, in a comparison of gene expression in the patients and in healthy persons of the same sex, but there were 102 genes with differential regulation that was concordant in men and women. When these genes were used for class discovery by means of unsupervised hierarchical clustering, the 19 patients could be divided into two groups that did not differ significantly with respect to age, neutrophil JAK2 V617F allele burden, white-cell count, platelet count, or clonal dominance. However, they did differ significantly with respect to disease duration; hemoglobin level; frequency of thromboembolic events, palpable splenomegaly, and splenectomy; chemotherapy exposure; leukemic transformation; and survival. The unsupervised clustering was confirmed by a supervised approach with the use of a top-scoring-pair classifier that segregated the 19 patients into the same two phenotypic groups with 100% accuracy. CONCLUSIONS: Removing sex as a potential confounder, we identified an accurate molecular method for classifying patients with polycythemia vera according to disease behavior, independently of their JAK2 V617F allele burden, and identified previously unrecognized molecular pathways in polycythemia vera outside the canonical JAK2 pathway that may be amenable to targeted therapy. 
(Funded by the Department of Defense and the National Institutes of Health.).

Subjects

Gene Expression , Janus Kinase 2/genetics , Phenotype , Polycythemia Vera/genetics , Aged , Aged, 80 and over , Antigens, CD34 , Blood Cell Count , Confounding Factors, Epidemiologic , Female , Gene Expression Regulation , Humans , Janus Kinase 2/metabolism , Male , Metabolic Networks and Pathways , Middle Aged , Oligonucleotide Array Sequence Analysis , Polycythemia Vera/classification , Polycythemia Vera/metabolism , Sex Factors
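
The abstract above confirms its clustering with a top-scoring-pair classifier. The general idea of that classifier family is to find the gene pair whose relative ordering flips most consistently between two classes, then classify a new sample by that ordering alone. The sketch below illustrates the idea on made-up numbers; `fit_tsp`, `predict_tsp` and the toy matrix are assumptions for this example, and refinements such as tie-breaking rules and cross-validation are left out.

```python
import numpy as np
from itertools import combinations

def fit_tsp(expr, labels):
    """Find the gene pair (i, j) whose ordering differs most between classes.

    expr: genes x samples array; labels: array of 0/1 class labels.
    Returns (i, j, class_when_i_less): the class predicted when gene i < gene j.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    best_score, best_model = -1.0, None
    for i, j in combinations(range(expr.shape[0]), 2):
        less = expr[i] < expr[j]
        p0 = less[labels == 0].mean()  # fraction of class 0 with gene i < gene j
        p1 = less[labels == 1].mean()  # fraction of class 1 with gene i < gene j
        score = abs(p0 - p1)
        if score > best_score:         # first pair wins ties (simplification)
            best_score, best_model = score, (i, j, int(p1 > p0))
    return best_model

def predict_tsp(model, expr):
    """Classify each sample (column) from the ordering of the chosen pair."""
    i, j, class_when_less = model
    expr = np.asarray(expr, dtype=float)
    less = expr[i] < expr[j]
    return np.where(less, class_when_less, 1 - class_when_less)

# Toy data: 3 hypothetical genes, 6 samples (first three class 0, last three class 1).
expr = [[1, 2, 1, 5, 6, 7],
        [4, 5, 6, 2, 1, 3],
        [3, 3, 3, 3, 3, 3]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_tsp(expr, labels)
predicted = predict_tsp(model, expr)
print(model, predicted.tolist())
```

Because only the within-sample ordering of two genes matters, a rule of this kind is insensitive to any monotone normalization of a sample, which is one reason such classifiers are used on microarray data.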

6.

NF-κB and stat3 transcription factor signatures differentiate HPV-positive and HPV-negative head and neck squamous cell carcinoma.

Gaykalova, Daria A; Manola, Judith B; Ozawa, Hiroyuki; Zizkova, Veronika; Morton, Kathryn; Bishop, Justin A; Sharma, Rajni; Zhang, Chi; Michailidi, Christina; Considine, Michael; Tan, Marietta; Fertig, Elana J; Hennessey, Patrick T; Ahn, Julie; Koch, Wayne M; Westra, William H; Khan, Zubair; Chung, Christine H; Ochs, Michael F; Califano, Joseph A.

Int J Cancer ; 137(8): 1879-89, 2015 Oct 15.

Article in English | MEDLINE | ID: mdl-25857630

ABSTRACT

Using high-throughput analyses and the TRANSFAC database, we characterized TF signatures of head and neck squamous cell carcinoma (HNSCC) subgroups by inferential analysis of target gene expression, correcting for the effects of DNA methylation and copy number. Using this discovery pipeline, we determined that human papillomavirus-related (HPV+) and HPV- HNSCC differed significantly based on the activity levels of key TFs including AP1, STATs, NF-κB and p53. Immunohistochemical analysis confirmed that HPV- HNSCC is characterized by co-activated STAT3 and NF-κB pathways and functional studies demonstrate that this phenotype can be effectively targeted with combined anti-NF-κB and anti-STAT therapies. These discoveries correlate strongly with previous findings connecting STATs, NF-κB and AP1 in HNSCC. We identified five top-scoring pair biomarkers from STATs, NF-κB and AP1 pathways that distinguish HPV+ from HPV- HNSCC based on TF activity and validated these biomarkers on TCGA and on independent validation cohorts. We conclude that a novel approach to TF pathway analysis can provide insight into therapeutic targeting of patient subgroup for heterogeneous disease such as HNSCC.

Subjects

Carcinoma, Squamous Cell/genetics , Head and Neck Neoplasms/genetics , NF-kappa B/genetics , Papillomavirus Infections/genetics , STAT3 Transcription Factor/genetics , Carcinoma, Squamous Cell/metabolism , Carcinoma, Squamous Cell/virology , Cell Line, Tumor , DNA Methylation , Gene Expression Regulation, Neoplastic , Head and Neck Neoplasms/metabolism , Head and Neck Neoplasms/virology , Humans , NF-kappa B/metabolism , Oligonucleotide Array Sequence Analysis , Papillomavirus Infections/metabolism , STAT3 Transcription Factor/metabolism , Signal Transduction , Squamous Cell Carcinoma of Head and Neck

7.

Correcting transcription factor gene sets for copy number and promoter methylation variations.

Rathi, Komal S; Gaykalova, Daria A; Hennessey, Patrick; Califano, Joseph A; Ochs, Michael F.

Drug Dev Res ; 75(6): 343-7, 2014 Sep.

Article in English | MEDLINE | ID: mdl-25195578

ABSTRACT

Gene set analysis provides a method to generate statistical inferences across sets of linked genes, primarily using high-throughput expression data. Common gene sets include biological pathways, operons, and targets of transcriptional regulators. In higher eukaryotes, especially when dealing with diseases with strong genetic and epigenetic components such as cancer, copy number loss and gene silencing through promoter methylation can eliminate the possibility that a gene is transcribed. This, in turn, can adversely affect the estimation of transcription factor or pathway activity from a set of target genes, as some of the targets may not be responsive to transcriptional regulation. Here we introduce a simple filtering approach that removes genes from consideration if they show copy number loss or promoter methylation, and demonstrate the improvement in inference of transcription factor activity in a simulated dataset based on the background expression observed in normal head and neck tissue.

Subjects

Computational Biology/methods , Gene Dosage , Neoplasms/genetics , Promoter Regions, Genetic , Transcription Factors/genetics , DNA Methylation , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Humans , Software
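
The filtering step described in the abstract above can be pictured as a simple set operation: drop any target gene that shows copy number loss or promoter methylation before estimating transcription factor activity. The sketch below is a minimal version of that idea; the function name `filter_gene_set` and the gene identifiers are placeholders, and how loss or methylation is called from the raw data is outside its scope.

```python
def filter_gene_set(targets, copy_number_loss, promoter_methylated):
    """Keep target genes that show neither copy number loss nor promoter methylation."""
    blocked = set(copy_number_loss) | set(promoter_methylated)
    return [gene for gene in targets if gene not in blocked]

# Placeholder gene identifiers, for illustration only.
tf_targets = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
lost = ["GENE_B"]
methylated = ["GENE_C", "GENE_X"]

usable_targets = filter_gene_set(tf_targets, lost, methylated)
print(usable_targets)
```

Only the surviving targets would then be passed to whatever gene set statistic is used downstream.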

8.

Updating annotations with the distributed annotation system and the automated sequence annotation pipeline.

Speier, William; Ochs, Michael F.

Bioinformatics ; 28(21): 2858-9, 2012 Nov 01.

Article in English | MEDLINE | ID: mdl-22945787

ABSTRACT

SUMMARY: The integration between BioDAS ProServer and Automated Sequence Annotation Pipeline (ASAP) provides an interface for querying diverse annotation sources, chaining and linking results, and standardizing the output using the Distributed Annotation System (DAS) protocol. This interface allows pipeline plans in ASAP to be integrated into any system using HTTP and also allows the information returned by ASAP to be included in the DAS registry for use in any DAS-aware system. Three example implementations have been developed: the first accesses TRANSFAC information to automatically create gene sets for the Coordinated Gene Activity in Pattern Sets (CoGAPS) algorithm; the second integrates annotations from multiple array platforms and provides unified annotations in an R environment; and the third wraps the UniProt database for integration with the SPICE DAS client. AVAILABILITY: Source code for ASAP 2.7 and the DAS 1.6 interface is available under the GNU public license. Proserver 2.20 is free software available from SourceForge. Scripts for installation and configuration on Linux are provided at our website: http://www.rits.onc.jhmi.edu/dbb/custom/A6/

Subjects

Algorithms , Databases, Factual , Information Storage and Retrieval/methods , Molecular Sequence Annotation/methods , Computational Biology/methods , High-Throughput Nucleotide Sequencing , Knowledge Bases , Programming Languages , Proteins/chemistry , Software , User-Computer Interface

9.

Gene expression signatures modulated by epidermal growth factor receptor activation and their relationship to cetuximab resistance in head and neck squamous cell carcinoma.

Fertig, Elana J; Ren, Qing; Cheng, Haixia; Hatakeyama, Hiromitsu; Dicker, Adam P; Rodeck, Ulrich; Considine, Michael; Ochs, Michael F; Chung, Christine H.

BMC Genomics ; 13: 160, 2012 May 01.

Article in English | MEDLINE | ID: mdl-22549044

ABSTRACT

BACKGROUND: Aberrant activation of signaling pathways downstream of epidermal growth factor receptor (EGFR) has been hypothesized to be one of the mechanisms of cetuximab (a monoclonal antibody against EGFR) resistance in head and neck squamous cell carcinoma (HNSCC). To infer relevant and specific pathway activation downstream of EGFR from gene expression in HNSCC, we generated gene expression signatures using immortalized keratinocytes (HaCaT) subjected to ligand stimulation and transfected with EGFR, RELA/p65, or HRASVal12D. RESULTS: The gene expression patterns that distinguished the HaCaT variants and conditions were inferred using the Markov chain Monte Carlo (MCMC) matrix factorization algorithm Coordinated Gene Activity in Pattern Sets (CoGAPS). This approach inferred gene expression signatures with greater relevance to cell signaling pathway activation than the expression signatures inferred with standard linear models. Furthermore, the pathway signature generated using HaCaT-HRASVal12D further associated with the cetuximab treatment response in isogenic cetuximab-sensitive (UMSCC1) and -resistant (1CC8) cell lines. CONCLUSIONS: Our data suggest that the CoGAPS algorithm can generate gene expression signatures that are pertinent to downstream effects of receptor signaling pathway activation and potentially be useful in modeling resistance mechanisms to targeted therapies.

Subjects

Antibodies, Monoclonal/pharmacology , Carcinoma, Squamous Cell/metabolism , ErbB Receptors/metabolism , Head and Neck Neoplasms/metabolism , Algorithms , Antibodies, Monoclonal, Humanized , Cell Line, Tumor , Cetuximab , Drug Resistance, Neoplasm/genetics , ErbB Receptors/genetics , Humans , Keratinocytes/cytology , Keratinocytes/drug effects , Keratinocytes/metabolism , Protein Binding/drug effects , Signal Transduction/drug effects

10.

Knowledge-based data analysis comes of age.

Ochs, Michael F.

Brief Bioinform ; 11(1): 30-9, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19854753

ABSTRACT

The emergence of high-throughput technologies for measuring biological systems has introduced problems for data interpretation that must be addressed for proper inference. First, analysis techniques need to be matched to the biological system, reflecting in their mathematical structure the underlying behavior being studied. When this is not done, mathematical techniques will generate answers, but the values and reliability estimates may not accurately reflect the biology. Second, analysis approaches must address the vast excess in variables measured (e.g. transcript levels of genes) over the number of samples (e.g. tumors, time points), known as the 'large-p, small-n' problem. In large-p, small-n paradigms, standard statistical techniques generally fail, and computational learning algorithms are prone to overfit the data. Here we review the emergence of techniques that match mathematical structure to the biology, the use of integrated data and prior knowledge to guide statistical analysis, and the recent emergence of analysis approaches utilizing simple biological models. We show that novel biological insights have been gained using these techniques.

Subjects

Systems Biology , Bayes Theorem , Genome-Wide Association Study , Oligonucleotide Array Sequence Analysis , Quantitative Trait Loci
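
The 'large-p, small-n' problem mentioned in the abstract above is easy to reproduce numerically. In the sketch below, an ordinary least-squares fit with far more variables than samples matches pure noise exactly on the training data and then predicts poorly on fresh data. The sizes (10 samples, 500 variables) and the simulated data are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_genes = 10, 500          # many more variables than samples

# Training data in which the response is pure noise, unrelated to the predictors.
X_train = rng.standard_normal((n_samples, n_genes))
y_train = rng.standard_normal(n_samples)

# With p > n, lstsq returns the minimum-norm solution, which fits the noise exactly.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_err = float(np.mean((X_train @ coef - y_train) ** 2))

# Fresh data drawn the same way: the "perfect" model has no predictive value.
X_test = rng.standard_normal((200, n_genes))
y_test = rng.standard_normal(200)
test_err = float(np.mean((X_test @ coef - y_test) ** 2))

print(f"training error: {train_err:.2e}, test error: {test_err:.2f}")
```

A training error near zero on data that contain no signal shows why unconstrained fits are unreliable in this regime, and why the abstract points to prior knowledge and matched model structure as ways to constrain the analysis.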

11.

CoGAPS: an R/C++ package to identify patterns and biological process activity in transcriptomic data.

Fertig, Elana J; Ding, Jie; Favorov, Alexander V; Parmigiani, Giovanni; Ochs, Michael F.

Bioinformatics ; 26(21): 2792-3, 2010 Nov 01.

Article in English | MEDLINE | ID: mdl-20810601

ABSTRACT

SUMMARY: Coordinated Gene Activity in Pattern Sets (CoGAPS) provides an integrated package for isolating gene expression driven by a biological process, enhancing inference of biological processes from transcriptomic data. CoGAPS improves on other enrichment measurement methods by combining a Markov chain Monte Carlo (MCMC) matrix factorization algorithm (GAPS) with a threshold-independent statistic inferring activity on gene sets. The software is provided as open source C++ code built on top of JAGS software with an R interface. AVAILABILITY: The R package CoGAPS and the C++ package GAPS-JAGS are provided open source under the GNU Lesser General Public License (LGPL) with a user's manual containing installation and operating instructions. CoGAPS is available through Bioconductor and depends on the rjags package available through CRAN to interface CoGAPS with GAPS-JAGS. URL: http://www.cancerbiostats.onc.jhmi.edu/cogaps.cfm

Subjects

Gene Expression , Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Computational Biology/methods , Gene Expression Profiling , Markov Chains

12.

Many ways to land upright: novel righting strategies allow spotted lanternfly nymphs to land on diverse substrates.

Kane, Suzanne Amador; Bien, Theodore; Contreras-Orendain, Luis; Ochs, Michael F; Tonia Hsieh, S.

J R Soc Interface ; 18(181): 20210367, 2021 08.

Article in English | MEDLINE | ID: mdl-34376093

ABSTRACT

Unlike large animals, insects and other very small animals are so unsusceptible to impact-related injuries that they can use falling for dispersal and predator evasion. Reorienting to land upright can mitigate lost access to resources and predation risk. Such behaviours are critical for the spotted lanternfly (SLF) (Lycorma delicatula), an invasive, destructive insect pest spreading rapidly in the USA. High-speed video of SLF nymphs released under different conditions showed that these insects self-right using both active midair righting motions previously reported for other insects and novel post-impact mechanisms that take advantage of their ability to experience near-total energy loss on impact. Unlike during terrestrial self-righting, in which an animal initially at rest on its back uses appendage motions to flip over, SLF nymphs impacted the surface at varying angles and then self-righted during the rebound using coordinated body rotations, foot-substrate adhesion and active leg motions. These previously unreported strategies were found to promote disproportionately upright, secure landings on both hard, flat surfaces and tilted, compliant host plant leaves. Our results highlight the importance of examining biomechanical phenomena in ecologically relevant contexts, and show that, for small animals, the post-impact bounce period can be critical for achieving an upright landing.

Subjects

Hemiptera , Animals , Extremities , Insects , Movement

13.

Correction: Novel Insight into Mutational Landscape of Head and Neck Squamous Cell Carcinoma.

Gaykalova, Daria A; Mambo, Elizabeth; Choudhary, Ashish; Houghton, Jeffery; Buddavarapu, Kalyan; Sanford, Tiffany; Darden, Will; Adai, Alex; Hadd, Andrew; Latham, Gary; Danilova, Ludmila V; Bishop, Justin; Li, Ryan J; Westra, William H; Hennessey, Patrick; Koch, Wayne M; Ochs, Michael F; Califano, Joseph A; Sun, Wenyue.

PLoS One ; 15(5): e0233409, 2020.

Article in English | MEDLINE | ID: mdl-32401780

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0093102.].

14.

Information systems for cancer research.

Ochs, Michael F; Casagrande, John T.

Cancer Invest ; 26(10): 1060-7, 2008 Dec.

Article in English | MEDLINE | ID: mdl-19093263

ABSTRACT

The last decade has seen a massive growth in data for cancer research, with high-throughput technologies joining clinical trials as major drivers of informatics needs. These data provide opportunities for developing new cancer treatments, but also major challenges for informatics, and we summarize the systems needed and potential issues arising in addressing these challenges. Integrating these data into the research enterprise will require investments in (1) data capture and management, (2) data analysis, (3) data integration standards, (4) visualization tools, and (5) methods for integration with other enterprise systems.

Subjects

Information Systems/statistics & numerical data , Neoplasms/therapy , Research/trends , Clinical Trials as Topic , Computational Biology , Humans , Information Systems/organization & administration , Information Systems/trends , Language , Medical Informatics , Research Design , Science/methods , Science/trends , Systems Biology

15.

Estimating gene function with least squares nonnegative matrix factorization.

Wang, Guoli; Ochs, Michael F.

Methods Mol Biol ; 408: 35-47, 2007.

Article in English | MEDLINE | ID: mdl-18314576

ABSTRACT

Nonnegative matrix factorization is a machine learning algorithm that has extracted information from data in a number of fields, including imaging and spectral analysis, text mining, and microarray data analysis. One limitation with the method for linking genes through microarray data in order to estimate gene function is the high variance observed in transcription levels between different genes. Least squares nonnegative matrix factorization uses estimates of the uncertainties on the mRNA levels for each gene in each condition, to guide the algorithm to a local minimum in normalized χ², rather than a Euclidean distance or divergence between the reconstructed data and the data itself. Herein, application of this method to microarray data is demonstrated in order to predict gene function.

Subjects

Algorithms , Genetic Techniques/statistics & numerical data , Artificial Intelligence , Cluster Analysis , Computer Simulation , Humans , Least-Squares Analysis , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Pattern Recognition, Automated/statistics & numerical data , User-Computer Interface
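
The contrast drawn in the abstract above, between a chi-squared objective and a plain Euclidean distance, can be written out as follows. The symbols are assumptions introduced here for illustration: $D$ is the genes-by-conditions data matrix, $\sigma_{ij}$ the uncertainty estimate for gene $i$ in condition $j$, and $W$, $H$ the nonnegative factors. The exact normalization used for the "normalized" chi-squared is not stated in the abstract.

```latex
% Euclidean objective of standard NMF: every entry counts equally.
E_{\mathrm{Euclid}} = \sum_{i,j} \bigl( D_{ij} - (WH)_{ij} \bigr)^{2}

% Chi-squared objective: each residual is scaled by its uncertainty,
% so noisy measurements contribute less to the fit.
\chi^{2} = \sum_{i,j} \left( \frac{D_{ij} - (WH)_{ij}}{\sigma_{ij}} \right)^{2},
\qquad W \ge 0,\; H \ge 0
```

When all $\sigma_{ij}$ are equal, the two objectives differ only by a constant factor and share the same minimizers.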

16.

Incorporation of gene ontology annotations to enhance microarray data analysis.

Ochs, Michael F; Peterson, Aidan J; Kossenkov, Andrew; Bidaut, Ghislain.

Methods Mol Biol ; 377: 243-54, 2007.

Article in English | MEDLINE | ID: mdl-17634621

ABSTRACT

Typical microarray or GeneChip experiments now provide genome-wide measurements on gene expression across many conditions. Analysis often focuses on only a few of the genes, looking for those that are "differentially expressed" between conditions or groups of conditions. However, the large number of measurements both presents statistical problems to such single gene approaches and offers a tremendous amount of information for methods focused on biological processes rather than individual genes. Here we provide a method to utilize biological annotations in the form of gene ontologies to interpret the results of individual or multiple pattern recognition analyses of a microarray experiment.

Subjects

Data Interpretation, Statistical , Genes , Microarray Analysis/methods , Molecular Biology/methods , Animals , Cluster Analysis , Gene Expression , Genome , Humans , Pattern Recognition, Automated

17.

Determining transcription factor activity from microarray data using Bayesian Markov chain Monte Carlo sampling.

Kossenkov, Andrew V; Peterson, Aidan J; Ochs, Michael F.

Stud Health Technol Inform ; 129(Pt 2): 1250-4, 2007.

Article in English | MEDLINE | ID: mdl-17911915

ABSTRACT

Many biological processes rely on remodeling of the transcriptional response of cells through activation of transcription factors. Although determination of the activity level of transcription factors from microarray data can provide insight into developmental and disease processes, it requires careful analysis because of the multiple regulation of genes. We present a novel approach that handles both the assignment of genes to multiple patterns, as required by multiple regulation, and the linking of genes in prior probability distributions according to their known transcriptional regulators. We demonstrate the power of this approach in simulations and by application to yeast cell cycle and deletion mutant data. The results of simulations in the presence of increasing noise showed improved recovery of patterns in terms of χ² fit. Analysis of the yeast data led to improved inference of biologically meaningful groups in comparison to other techniques, as demonstrated with ROC analysis. The new algorithm provides an approach for estimating the levels of transcription factor activity from microarray data, and therefore provides insights into biological response.

Subjects

Algorithms , Gene Expression Regulation , Oligonucleotide Array Sequence Analysis , Transcription Factors/metabolism , Bayes Theorem , Computational Biology , Markov Chains , Models, Genetic , Monte Carlo Method , ROC Curve , Transcription, Genetic , Yeasts/genetics

18.

Integrative computational analysis of transcriptional and epigenetic alterations implicates DTX1 as a putative tumor suppressor gene in HNSCC.

Gaykalova, Daria A; Zizkova, Veronika; Guo, Theresa; Tiscareno, Ilse; Wei, Yingying; Vatapalli, Rajita; Hennessey, Patrick T; Ahn, Julie; Danilova, Ludmila; Khan, Zubair; Bishop, Justin A; Gutkind, J Silvio; Koch, Wayne M; Westra, William H; Fertig, Elana J; Ochs, Michael F; Califano, Joseph A.

Oncotarget ; 8(9): 15349-15363, 2017 Feb 28.

Article in English | MEDLINE | ID: mdl-28146432

ABSTRACT

Over a half million new cases of Head and Neck Squamous Cell Carcinoma (HNSCC) are diagnosed annually worldwide; however, 5-year overall survival is only 50% for HNSCC patients. Recently, high throughput technologies have accelerated the genome-wide characterization of HNSCC. However, comprehensive pipelines with statistical algorithms that account for HNSCC biology and perform independent confirmatory and functional validation of candidates are needed to identify the most biologically relevant genes. We applied outlier statistics to high throughput gene expression data, and identified 76 top-scoring candidates with significant differential expression in tumors compared to normal tissues. We identified 15 epigenetically regulated candidates by focusing on a subset of the genes with a negative correlation between gene expression and promoter methylation. Differential expression and methylation of 3 selected candidates (BANK1, BIN2, and DTX1) were confirmed in independent HNSCC cohorts from Johns Hopkins and TCGA (The Cancer Genome Atlas). We further performed functional evaluation of the NOTCH regulator DTX1, which was downregulated by promoter hypermethylation in tumors, and demonstrated that decreased expression of DTX1 in HNSCC tumors may be associated with NOTCH pathway activation and increased migration potential.

Subjects

Carcinoma, Squamous Cell/genetics , Epigenomics , Gene Expression Regulation, Neoplastic , Genes, Tumor Suppressor , Head and Neck Neoplasms/genetics , Ubiquitin-Protein Ligases/genetics , Adult , Aged , Aged, 80 and over , Carcinoma, Squamous Cell/pathology , Cell Line, Tumor , Cell Movement/genetics , Cluster Analysis , Cohort Studies , Computational Biology/methods , DNA Methylation , Female , Gene Expression Profiling/methods , Head and Neck Neoplasms/pathology , Humans , Male , Middle Aged , RNA Interference , Receptors, Notch/genetics , Reverse Transcriptase Polymerase Chain Reaction , Signal Transduction/genetics

19.

LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates.

Wang, Guoli; Kossenkov, Andrew V; Ochs, Michael F.

BMC Bioinformatics ; 7: 175, 2006 Mar 28.

Article in English | MEDLINE | ID: mdl-16569230

ABSTRACT

BACKGROUND: Non-negative matrix factorisation (NMF), a machine learning algorithm, has been applied to the analysis of microarray data. A key feature of NMF is the ability to identify patterns that together explain the data as a linear combination of expression signatures. Microarray data generally include individual estimates of uncertainty for each gene in each condition; however, NMF does not exploit this information. Previous work has shown that such uncertainties can be extremely valuable for pattern recognition. RESULTS: We have created a new algorithm, least squares non-negative matrix factorization (LS-NMF), which integrates uncertainty measurements of gene expression data into the NMF updating rules. While the LS-NMF algorithm maintains the advantages of the original NMF algorithm, such as easy implementation and a guaranteed locally optimal solution, its performance in terms of linking functionally related genes has been improved. LS-NMF exceeds NMF significantly in terms of identifying functionally related genes as determined from annotations in the MIPS database. CONCLUSION: Uncertainty measurements on gene expression data provide valuable information for data analysis, and use of this information in the LS-NMF algorithm significantly improves the power of the NMF technique.

Subjects

Algorithms , Databases, Genetic , Oligonucleotide Array Sequence Analysis , Uncertainty , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Pattern Recognition, Automated/methods , RNA, Messenger/genetics
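
The abstract of entry 19 describes folding per-measurement uncertainty estimates into the NMF updating rules. The sketch below is one way to illustrate that idea and is not the authors' code: it assumes the objective is the uncertainty-weighted squared error, sum(((V - WH) / sigma)^2), and uses the standard multiplicative updates for that weighted objective. The names `weighted_nmf`, `chi2`, `V`, `sigma`, `W`, and `H` are hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_nmf(V, sigma, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor V ~ W @ H with nonnegative W and H.

    Assumed objective (not taken from the abstract):
    sum(((V - W @ H) / sigma) ** 2), minimized with multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # nonnegative random start
    H = rng.random((rank, m)) + eps
    U = 1.0 / sigma**2                # per-element weights from uncertainties
    for _ in range(n_iter):
        # Update patterns H with W fixed, then amplitudes W with H fixed.
        H *= (W.T @ (U * V)) / (W.T @ (U * (W @ H)) + eps)
        W *= ((U * V) @ H.T) / ((U * (W @ H)) @ H.T + eps)
    return W, H

def chi2(V, sigma, W, H):
    """Uncertainty-weighted squared error of the reconstruction."""
    return float(np.sum(((V - W @ H) / sigma) ** 2))

if __name__ == "__main__":
    # Synthetic data: 30 "genes" x 10 "conditions" built from 3 patterns.
    rng = np.random.default_rng(42)
    V = rng.random((30, 3)) @ rng.random((3, 10))
    sigma = 0.05 + 0.1 * rng.random(V.shape)  # made-up uncertainties
    W, H = weighted_nmf(V, sigma, rank=3)
    print("weighted error:", chi2(V, sigma, W, H))
```

Elements with large `sigma` receive small weights and therefore pull less on the fit; setting `sigma` to a constant reduces the updates to the ordinary least-squares NMF rules.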

20.

Determination of strongly overlapping signaling activity from microarray data.

Bidaut, Ghislain; Suhre, Karsten; Claverie, Jean-Michel; Ochs, Michael F.

BMC Bioinformatics ; 7: 99, 2006 Feb 28.

Article in English | MEDLINE | ID: mdl-16507110

ABSTRACT

BACKGROUND: As numerous diseases involve errors in signal transduction, modern therapeutics often target proteins involved in cellular signaling. Interpretation of the activity of signaling pathways during disease development or therapeutic intervention would assist in drug development, design of therapy, and target identification. Microarrays provide a global measure of cellular response; however, linking these responses to signaling pathways requires an analytic approach tuned to the underlying biology. An ongoing issue in pattern recognition in microarrays has been how to determine the number of patterns (or clusters) to use for data interpretation, and this is a critical issue as measures of statistical significance in gene ontology or pathways rely on proper separation of genes into groups. RESULTS: Here we introduce a method relying on gene annotation coupled to decompositional analysis of global gene expression data that allows us to estimate specific activity on strongly coupled signaling pathways and, in some cases, activity of specific signaling proteins. We demonstrate the technique using the Rosetta yeast deletion mutant data set, decompositional analysis by Bayesian Decomposition, and annotation analysis using ClutrFree. We determined from measurements of gene persistence in patterns across multiple potential dimensionalities that 15 basis vectors provide the correct dimensionality for interpreting the data. Using gene ontology and data on gene regulation in the Saccharomyces Genome Database, we identified the transcriptional signatures of several cellular processes in yeast, including cell wall creation, ribosomal disruption, chemical blocking of protein synthesis, and, critically, individual signatures of the strongly coupled mating and filamentation pathways. CONCLUSION: This work demonstrates that microarray data can provide downstream indicators of pathway activity either through use of gene ontology or transcription factor databases. This can be used to investigate the specificity and success of targeted therapeutics as well as to elucidate signaling activity in normal and disease processes.

Subjects

Gene Expression Profiling/methods , Models, Biological , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction/physiology , Transcription Factors/metabolism , Algorithms , Computer Simulation , Saccharomyces cerevisiae Proteins/genetics , Transcription Factors/genetics
