Results 1 - 20 of 33
1.
Cancer Genet ; 235-236: 1-12, 2019 06.
Article in English | MEDLINE | ID: mdl-31296308

ABSTRACT

Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Using gene expression data to predict the outcome of a patient's tumor makes biomarker discovery a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, biomarker identification to predict survival time is straightforward. However, for continuous variables like gene expression, the available methods generate highly variable results, and studies on best practices are lacking. We investigated the performance of eight methods that deal specifically with continuous data. K-means, Cox regression, concordance index, D-index, 25th-75th percentile split, median-split, distribution-based splitting, and KaplanScan were applied to four RNA-sequencing (RNA-seq) datasets from The Cancer Genome Atlas. The reliability of the eight methods was assessed by splitting each dataset into two groups and comparing the overlap of the results. Gene sets that had been identified from the literature for a specific tumor type served as positive controls to assess the accuracy of each biomarker using receiver operating characteristic (ROC) curves. Artificial RNA-seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomizing tend to have consistently poor performance while C-index, D-index, and k-means perform well in most settings. Overall, the Cox regression method had the strongest performance based on tests of accuracy, reliability, and robustness.
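
As a rough illustration of two of the approaches named in this abstract, the sketch below computes a concordance index (Harrell's C) for a continuous expression score and for the same score after a median split. The function names, the toy cohort, and the convention that higher expression means higher risk are assumptions made for this example; they are not taken from the paper, and ties in survival time are ignored for brevity.

```python
from statistics import median


def concordance_index(times, events, scores):
    """Harrell's C-index: the share of comparable patient pairs in which the
    patient with the higher score is also the one who died earlier."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def median_split(expression):
    """Dichotomize expression: True for values above the cohort median."""
    cutoff = median(expression)
    return [value > cutoff for value in expression]


# Toy cohort (made-up numbers): survival time, death indicator, expression.
times = [2, 4, 6, 8]
events = [1, 1, 1, 0]
expression = [9.0, 7.0, 5.0, 1.0]

c_continuous = concordance_index(times, events, expression)
c_split = concordance_index(times, events, median_split(expression))
```

On this toy cohort the continuous score gives C = 1.0, while the median-split score gives 5/6, because dichotomizing creates tied scores within each group. This only shows the mechanics; it says nothing about how the methods rank on real data.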


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Neoplasms/genetics , Neoplasms/mortality , Base Sequence , Biomarkers, Tumor/genetics , Data Interpretation, Statistical , Humans , Kaplan-Meier Estimate , Prognosis , Proportional Hazards Models , ROC Curve , Sequence Analysis, RNA/methods , Survival Analysis
2.
Oncogene ; 38(22): 4232-4249, 2019 05.
Article in English | MEDLINE | ID: mdl-30718920

ABSTRACT

Lysine methylation of histones and non-histone substrates by the SET domain containing protein lysine methyltransferase (KMT) G9a/EHMT2 governs transcription contributing to apoptosis, aberrant cell growth, and pluripotency. The positioning of chromosomes within the nuclear three-dimensional space involves interactions between nuclear lamina (NL) and the lamina-associated domains (LAD). Contact of individual LADs with the NL is dependent upon H3K9me2 introduced by G9a. The mechanisms governing the recruitment of G9a to distinct subcellular sites, into chromatin or to LAD, are not known. The cyclin D1 gene product encodes the regulatory subunit of the holoenzyme that phosphorylates pRB and NRF1 thereby governing cell-cycle progression and mitochondrial metabolism. Herein, we show that cyclin D1 enhanced H3K9 dimethylation through direct association with G9a. Endogenous cyclin D1 was required for the recruitment of G9a to target genes in chromatin, for G9a-induced H3K9me2 of histones, and for NL-LAD interaction. The finding that cyclin D1 is required for recruitment of G9a to target genes in chromatin and for H3K9 dimethylation identifies a novel mechanism coordinating protein methylation.


Subject(s)
Cyclin D1/metabolism , DNA Methylation/physiology , Histocompatibility Antigens/metabolism , Histone-Lysine N-Methyltransferase/metabolism , Histones/metabolism , Cell Cycle/physiology , Cell Line , Cell Line, Tumor , Chromatin/metabolism , Chromosomes/physiology , HEK293 Cells , Humans , MCF-7 Cells , Protein Binding/physiology
3.
PLoS One ; 13(8): e0201751, 2018.
Article in English | MEDLINE | ID: mdl-30092011

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is the third leading cause of cancer death in the US. Despite multiple large-scale genetic sequencing studies, identification of predictors of patient survival remains challenging. We performed a comprehensive assessment and integrative analysis of large-scale gene expression datasets, across multiple platforms, to enable discovery of a prognostic gene signature for patient survival in pancreatic cancer. PDAC RNA-Sequencing data from The Cancer Genome Atlas was stratified into Survival+ (>2-year survival) and Survival- (<1-year survival) cohorts (n = 47). Comparisons of RNA expression profiles between survival groups and normal pancreatic tissue expression data from the Gene Expression Omnibus generated an initial PDAC specific prognostic differential expression gene list. The candidate prognostic gene list was then trained on the Australian pancreatic cancer dataset from the ICGC database (n = 103), using iterative sampling based algorithms, to derive a gene signature predictive of patient survival. The gene signature was validated in 2 independent patient cohorts and against existing PDAC subtype classifications. We identified 707 candidate prognostic genes exhibiting differential expression in tumor versus normal tissue. A substantial fraction of these genes was also found to be differentially methylated between survival groups. From the candidate gene list, a 5-gene signature (ADM, ASPM, DCBLD2, E2F7, and KRT6A) was identified. Our signature demonstrated significant power to predict patient survival in two distinct patient cohorts and was independent of AJCC TNM staging. Cross-validation of our gene signature reported a better ROC AUC (≥ 0.8) when compared to existing PDAC survival signatures.
Furthermore, validation of our signature through immunohistochemical analysis of patient tumor tissue and existing gene expression subtyping data in PDAC, demonstrated a correlation to the presence of vascular invasion and the aggressive squamous tumor subtype. Assessment of these genes in patient biopsies could help further inform risk-stratification and treatment decisions in pancreatic cancer.
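
The ROC AUC quoted in this abstract can be read as the probability that a randomly chosen patient from the poor-survival group receives a higher risk score than a randomly chosen patient from the good-survival group. A minimal sketch of that calculation follows; the function name, labels, and scores are illustrative assumptions and do not reproduce the study's data or its scoring model.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the poor-survival class, 0 for the good-survival class.
    scores: risk scores, where higher is assumed to mean higher risk.
    """
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: three poor-survival and three good-survival patients.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.85, 0.4, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Here seven of the nine positive-negative pairs are ordered correctly, so the toy AUC is 7/9. The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of the size described but would be replaced by a rank-based formula for large datasets.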


Subject(s)
Carcinoma, Pancreatic Ductal/metabolism , Carcinoma, Pancreatic Ductal/mortality , Pancreas/metabolism , Pancreatic Neoplasms/metabolism , Pancreatic Neoplasms/mortality , Aged , Algorithms , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/pathology , Cohort Studies , DNA Methylation , Female , Gene Expression Regulation, Neoplastic , Humans , Immunohistochemistry , Male , Microarray Analysis , Middle Aged , Models, Biological , Pancreas/pathology , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/pathology , Prognosis , Sequence Analysis, RNA , Survival Analysis
4.
Cancer Inform ; 14: 113-9, 2015.
Article in English | MEDLINE | ID: mdl-26494976

ABSTRACT

Ovarian cancer (OC) is a leading cause of cancer mortality, but aside from a few well-studied mutations, very little is known about its underlying causes. As such, we performed survival analysis on ovarian copy number amplifications and gene expression datasets presented by The Cancer Genome Atlas in order to identify potential drivers and markers of aggressive OC. Additionally, two independent datasets from the Gene Expression Omnibus web platform were used to validate the identified markers. Based on our analysis, we identified FXYD5, a glycoprotein known to reduce cell adhesion, as a potential driver of metastasis and a significant predictor of mortality in OC. As a marker of poor outcome, the protein has effective antibodies against it for use in tissue arrays. FXYD5 bridges together a wide variety of cancers, including ovarian, breast cancer stage II, thyroid, colorectal, pancreatic, and head and neck cancers for metastasis studies.

5.
BMC Genomics ; 16: 773, 2015 Oct 13.
Article in English | MEDLINE | ID: mdl-26459834

ABSTRACT

BACKGROUND: Bacterial infections comprise a global health challenge as the incidence of antibiotic resistance increases. The pathogenic potential of bacteria has been shown to be context dependent, varying in response to environment and even within the strains of the same genus. RESULTS: We used the KEGG repository and extensive literature searches to identify among the 2527 bacterial genomes in the literature those implicated as pathogenic to the host, including those which show pathogenicity in a context dependent manner. Using data on the gene contents of these genomes, we identified sets of genes highly abundant in pathogenic but relatively absent in commensal strains and vice versa. In addition, we carried out genome comparison within a genus for the seventeen largest genera in our genome collection. We projected the resultant lists of ortholog genes onto KEGG bacterial pathways to identify clusters and circuits, which can be linked to either pathogenicity or synergy. Gene circuits relatively abundant in nonpathogenic bacteria often mediated biosynthesis of antibiotics. Other synergy-linked circuits reduced drug-induced toxicity. Pathogen-abundant gene circuits included modules in one-carbon folate, two-component system, type-3 secretion system, and peptidoglycan biosynthesis. Antibiotic-resistant bacterial strains possessed genes modulating phagocytosis, vesicle trafficking, cytoskeletal reorganization, and regulation of the inflammatory response. Our study also identified bacterial genera containing a circuit, elements of which were previously linked to Alzheimer's disease. CONCLUSIONS: The present study produces, for the first time, a signature in the form of a robust list of gene circuitry whose presence or absence could potentially define the pathogenicity of a microbiome. An extensive literature search substantiated the majority of the commensal and pathogenic circuitry in our predicted list.
Scanning microbiome libraries for these circuitry motifs will provide further insights into the complex and context dependent pathogenicity of bacteria.
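
The comparison of gene content between pathogenic and commensal genomes described in this abstract can be pictured as a presence-fraction contrast. The sketch below is one simple way to do that, not the authors' pipeline: the function names, the 0.5 gap threshold, and the gene and genome identifiers are all hypothetical.

```python
def presence_fraction(gene, genomes):
    """Fraction of genomes (each a set of gene IDs) that contain `gene`."""
    return sum(gene in genes for genes in genomes) / len(genomes)


def split_by_abundance(genomes, labels, min_gap=0.5):
    """Return (pathogen_enriched, commensal_enriched) gene lists.

    genomes: dict mapping genome name -> set of gene IDs.
    labels:  dict mapping genome name -> "pathogenic" or "commensal".
    min_gap: hypothetical cutoff on the difference in presence fractions.
    """
    pathogenic = [g for name, g in genomes.items() if labels[name] == "pathogenic"]
    commensal = [g for name, g in genomes.items() if labels[name] == "commensal"]
    if not pathogenic or not commensal:
        raise ValueError("need at least one genome of each class")
    pathogen_enriched, commensal_enriched = [], []
    for gene in sorted(set().union(*genomes.values())):
        gap = presence_fraction(gene, pathogenic) - presence_fraction(gene, commensal)
        if gap >= min_gap:
            pathogen_enriched.append(gene)
        elif gap <= -min_gap:
            commensal_enriched.append(gene)
    return pathogen_enriched, commensal_enriched


# Toy input with placeholder identifiers (not real annotations).
toy_genomes = {
    "P1": {"geneA", "geneB", "geneC"},
    "P2": {"geneA", "geneC"},
    "C1": {"geneD", "geneC"},
    "C2": {"geneD", "geneB", "geneC"},
}
toy_labels = {"P1": "pathogenic", "P2": "pathogenic",
              "C1": "commensal", "C2": "commensal"}
enriched_in_pathogens, enriched_in_commensals = split_by_abundance(toy_genomes, toy_labels)
```

In the toy input, `geneA` appears only in the pathogenic genomes and `geneD` only in the commensal ones, while `geneB` and `geneC` are equally common in both groups and are therefore not reported.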


Subject(s)
Bacteria/genetics , Bacteria/pathogenicity , Gene Regulatory Networks , Genes, Bacterial , Genome, Bacterial , Genomics/methods , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacterial Infections/microbiology , Computational Biology/methods , Drug Resistance, Bacterial , Host-Pathogen Interactions , Multigene Family
6.
Oncotarget ; 6(11): 8525-38, 2015 Apr 20.
Article in English | MEDLINE | ID: mdl-25940700

ABSTRACT

Cyclin D1 is an important molecular driver of human breast cancer but better understanding of its oncogenic mechanisms is needed, especially to enhance efforts in targeted therapeutics. Currently, pharmaceutical initiatives to inhibit cyclin D1 are focused on the catalytic component since the transforming capacity is thought to reside in the cyclin D1/CDK activity. We initiated the following study to directly test the oncogenic potential of catalytically inactive cyclin D1 in an in vivo mouse model that is relevant to breast cancer. Herein, transduction of cyclin D1(-/-) mouse embryonic fibroblasts (MEFs) with the kinase dead KE mutant of cyclin D1 led to aneuploidy, abnormalities in mitotic spindle formation, autosome amplification, and chromosomal instability (CIN) by gene expression profiling. Acute transgenic expression of either cyclin D1(WT) or cyclin D1(KE) in the mammary gland was sufficient to induce a high CIN score within 7 days. Sustained expression of cyclin D1(KE) induced mammary adenocarcinoma with similar kinetics to that of the wild-type cyclin D1. ChIP-Seq studies demonstrated recruitment of cyclin D1(WT) and cyclin D1(KE) to the genes governing CIN. We conclude that the CDK-activating function of cyclin D1 is not necessary to induce either chromosomal instability or mammary tumorigenesis.


Subject(s)
Adenocarcinoma/genetics , Cyclin D1/physiology , Mammary Neoplasms, Experimental/genetics , Amino Acid Substitution , Aneuploidy , Animals , Catalytic Domain/genetics , Cell Transformation, Neoplastic/genetics , Cells, Cultured , Centrosome/ultrastructure , Chromosomal Instability/genetics , Cyclin D1/deficiency , Cyclin D1/genetics , Female , Fibroblasts , Genes, bcl-1 , Humans , Mammary Tumor Virus, Mouse/physiology , Mice , Mice, Knockout , Mice, Transgenic , Mutation , Piperazines/pharmacology , Pyridines/pharmacology , Recombinant Fusion Proteins/metabolism , Spindle Apparatus/ultrastructure , Transduction, Genetic
7.
Nat Commun ; 4: 2812, 2013.
Article in English | MEDLINE | ID: mdl-24287487

ABSTRACT

Cyclin D1 encodes the regulatory subunit of a holoenzyme that phosphorylates the pRB protein and promotes G1/S cell-cycle progression and oncogenesis. Dicer is a central regulator of miRNA maturation, encoding an enzyme that cleaves double-stranded RNA or stem-loop-stem RNA into 20-25 nucleotide long small RNA, governing sequence-specific gene silencing and heterochromatin methylation. The mechanism by which the cell cycle directly controls the non-coding genome is poorly understood. Here we show that cyclin D1(-/-) cells are defective in pre-miRNA processing which is restored by cyclin D1a rescue. Cyclin D1 induces Dicer expression in vitro and in vivo. Dicer is transcriptionally targeted by cyclin D1, via a cdk-independent mechanism. Cyclin D1 and Dicer expression significantly correlate in luminal A and basal-like subtypes of human breast cancer. Cyclin D1 and Dicer maintain heterochromatic histone modification (Tri-m-H3K9). Cyclin D1-mediated cellular proliferation and migration are Dicer-dependent. We conclude that cyclin D1 induction of Dicer coordinates microRNA biogenesis.


Subject(s)
Breast Neoplasms/metabolism , Cyclin D1/physiology , Gene Expression Regulation, Neoplastic , Mammary Neoplasms, Experimental/metabolism , MicroRNAs/biosynthesis , Ribonuclease III/metabolism , Animals , Breast Neoplasms/enzymology , Breast Neoplasms/genetics , Cell Movement/genetics , Cell Proliferation , Female , HCT116 Cells , Histones/metabolism , Humans , MCF-7 Cells , Mammary Neoplasms, Experimental/enzymology , Mammary Neoplasms, Experimental/genetics , Mice , Mice, Inbred C57BL , Mice, Transgenic , MicroRNAs/genetics , Protein Processing, Post-Translational/genetics
8.
Cancer Res ; 73(11): 3262-74, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23492369

ABSTRACT

Hyperactive EGF receptor (EGFR) and mutant p53 are common genetic abnormalities driving the progression of non-small cell lung cancer (NSCLC), the leading cause of cancer deaths in the world. The Drosophila gene Dachshund (Dac) was originally cloned as an inhibitor of hyperactive EGFR alleles. Given the importance of EGFR signaling in lung cancer etiology, we examined the role of DACH1 expression in lung cancer development. DACH1 protein and mRNA expression was reduced in human NSCLC. Reexpression of DACH1 reduced NSCLC colony formation and tumor growth in vivo via p53. Endogenous DACH1 colocalized with p53 in a nuclear, extranucleolar location, and shared occupancy of ~15% of p53-bound genes in ChIP sequencing. The C-terminus of DACH1 was necessary for direct p53 binding, contributing to the inhibition of colony formation and cell-cycle arrest. Expression of the stem cell factor SOX2 was repressed by DACH1, and SOX2 expression was inversely correlated with DACH1 in NSCLC. We conclude that DACH1 binds p53 to inhibit NSCLC cellular growth.


Subject(s)
Adenocarcinoma/metabolism , Adenocarcinoma/pathology , Eye Proteins/metabolism , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , Transcription Factors/metabolism , Tumor Suppressor Protein p53/metabolism , Adenocarcinoma/genetics , Adenocarcinoma of Lung , Animals , Cell Cycle Checkpoints/physiology , Cell Growth Processes/physiology , Cell Line, Tumor , Cyclin-Dependent Kinase Inhibitor p21/metabolism , Eye Proteins/genetics , Female , Genes, p53 , HCT116 Cells , HEK293 Cells , Heterografts , Humans , Immunohistochemistry , Lung Neoplasms/genetics , Mice , Mice, Nude , Rad51 Recombinase/antagonists & inhibitors , Rad51 Recombinase/metabolism , SOXB1 Transcription Factors/biosynthesis , SOXB1 Transcription Factors/genetics , Transcription Factors/genetics , Transcription, Genetic , Transfection , Tumor Suppressor Protein p53/genetics
9.
Inflamm Bowel Dis ; 18(12): 2315-33, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22488912

ABSTRACT

BACKGROUND: Inflammatory bowel disease (IBD) is a complex disorder involving pathogen infection, host immune response, and altered enterocyte physiology. The incidence of IBD is increasing at an alarming rate in developed countries, warranting a detailed molecular portrait of IBD. METHODS: We used large-scale data, bioinformatics tools, and high-throughput computations to obtain gene and microRNA signatures for Crohn's disease (CD) and ulcerative colitis (UC). These signatures were then integrated with systematic literature review to draw a comprehensive portrait of IBD in relation to autoimmune diseases. RESULTS: The top upregulated genes in IBD are associated with diabetogenesis (REG1A, REG1B), bacterial signals (TLRs, NLRs), innate immunity (DEFA6, IDO1, EXOSC1), inflammation (CXCLs), and matrix degradation (MMPs). The downregulated genes coded tight junction proteins (CLDN8), solute transporters (SLCs), and adhesion proteins. Genes highly expressed in UC compared to CD included antiinflammatory ANXA1, transporter ABCA12, T-cell activator HSH2D, and immunoglobulin IGHV4-34. Compromised metabolisms for processing of drugs, nitrogen, androgen and estrogen, and lipids in IBD correlated with an increase in specific microRNA. Highly expressed IBD genes constituted targets of drugs used in gastrointestinal cancers, viral infections, and autoimmunity disorders such as rheumatoid arthritis and asthma. CONCLUSIONS: This study presents a clinically relevant gene-level portrait of IBD subtypes and their connectivity to autoimmune diseases. The study identified candidates for repositioning of existing drugs to manage IBD. Integration of mouse and human data points to an altered B-cell response as a cause for upregulation of genes in IBD involved in other aspects of immune defense such as interferon-inducible responses.


Subject(s)
Autoimmune Diseases/genetics , Inflammatory Bowel Diseases/genetics , MicroRNAs/genetics , Transcriptome/genetics , Animals , Autoimmune Diseases/drug therapy , Chromosome Mapping , Colitis, Ulcerative/drug therapy , Colitis, Ulcerative/genetics , Colitis, Ulcerative/immunology , Computational Biology , Crohn Disease/drug therapy , Crohn Disease/genetics , Crohn Disease/immunology , Gene Expression Profiling , Genes/genetics , Humans , Inflammatory Bowel Diseases/drug therapy , Inflammatory Bowel Diseases/immunology , Mice , Oligonucleotide Array Sequence Analysis
10.
PLoS One ; 7(3): e33174, 2012.
Article in English | MEDLINE | ID: mdl-22432004

ABSTRACT

BACKGROUND: Pandemic and seasonal respiratory viruses are a major global health concern. Given the genetic diversity of respiratory viruses and the emergence of drug resistant strains, the targeted disruption of human host-virus interactions is a potential therapeutic strategy for treating multi-viral infections. The availability of large-scale genomic datasets focused on host-pathogen interactions can be used to discover novel drug targets as well as potential opportunities for drug repositioning. METHODS/RESULTS: In this study, we performed a large-scale analysis of microarray datasets involving host response to infections by influenza A virus, respiratory syncytial virus, rhinovirus, SARS-coronavirus, metapneumovirus, coxsackievirus and cytomegalovirus. Common genes and pathways were found through a rigorous, iterative analysis pipeline where relevant host mRNA expression datasets were identified, analyzed for quality and gene differential expression, then mapped to pathways for enrichment analysis. Possible targets for repurposed drugs were found through database and literature searches. A total of 67 common biological pathways were identified among the seven different respiratory viruses analyzed, representing fifteen laboratories, nine different cell types, and seven different array platforms. A large overlap in the general immune response was observed among the top twenty of these 67 pathways, adding validation to our analysis strategy. Of the top five pathways, we found 53 differentially expressed genes affected by at least five of the seven viruses. We suggest five new therapeutic indications for existing small molecules or biological agents targeting proteins encoded by the genes F3, IL1B, TNF, CASP1 and MMP9. Pathway enrichment analysis also identified a potential novel host response, the Parkin-Ubiquitin Proteasomal System (Parkin-UPS) pathway, which is known to be involved in the progression of neurodegenerative Parkinson's disease.
CONCLUSIONS: Our study suggests that multiple and diverse respiratory viruses invoke several common host response pathways. Further analysis of these pathways suggests potential opportunities for therapeutic intervention.


Subject(s)
Antiviral Agents/pharmacology , Gene Expression Profiling , Host-Pathogen Interactions/drug effects , Host-Pathogen Interactions/genetics , Molecular Targeted Therapy , Respiratory Syncytial Viruses/drug effects , Signal Transduction/drug effects , Antiviral Agents/therapeutic use , Databases, Genetic , Gene Expression Regulation, Viral/drug effects , Humans , Oligonucleotide Array Sequence Analysis , Proteasome Endopeptidase Complex/metabolism , Quality Control , RNA, Messenger/genetics , RNA, Messenger/metabolism , Respiratory Syncytial Virus Infections/drug therapy , Respiratory Syncytial Virus Infections/genetics , Respiratory Syncytial Virus Infections/virology , Respiratory Syncytial Viruses/physiology , Signal Transduction/genetics , Ubiquitin/metabolism , Ubiquitin-Protein Ligases/metabolism
11.
J Clin Invest ; 122(3): 833-43, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22307325

ABSTRACT

Chromosomal instability (CIN) in tumors is characterized by chromosomal abnormalities and an altered gene expression signature; however, the mechanism of CIN is poorly understood. CCND1 (which encodes cyclin D1) is overexpressed in human malignancies and has been shown to play a direct role in transcriptional regulation. Here, we used genome-wide ChIP sequencing and found that the DNA-bound form of cyclin D1 occupied the regulatory region of genes governing chromosomal integrity and mitochondrial biogenesis. Adding cyclin D1 back to Ccnd1(-/-) mouse embryonic fibroblasts resulted in CIN gene regulatory region occupancy by the DNA-bound form of cyclin D1 and induction of CIN gene expression. Furthermore, increased chromosomal aberrations, aneuploidy, and centrosome abnormalities were observed in the cyclin D1-rescued cells by spectral karyotyping and immunofluorescence. To assess cyclin D1 effects in vivo, we generated transgenic mice with acute and continuous mammary gland-targeted cyclin D1 expression. These transgenic mice presented with increased tumor prevalence and signature CIN gene profiles. Additionally, interrogation of gene expression from 2,254 human breast tumors revealed that cyclin D1 expression correlated with CIN in luminal B breast cancer. These data suggest that cyclin D1 contributes to CIN and tumorigenesis by directly regulating a transcriptional program that governs chromosomal stability.


Subject(s)
Chromosomal Instability , Cyclin D1/genetics , Animals , Binding Sites , Breast Neoplasms/genetics , Cell Line, Tumor , Chromatin Immunoprecipitation , Chromosome Aberrations , Female , Fibroblasts/metabolism , Gene Expression Regulation, Neoplastic , Genome-Wide Association Study , Humans , Karyotyping , Mice , Mice, Transgenic , Transcription, Genetic
12.
PLoS One ; 6(8): e23293, 2011.
Article in English | MEDLINE | ID: mdl-21858059

ABSTRACT

HIV proteins target host hub proteins for transient binding interactions. The presence of viral proteins in the infected cell results in out-competition of host proteins in their interaction with hub proteins, drastically affecting cell physiology. Functional genomics and interactome datasets can be used to quantify the sequence hotspots on the HIV proteome mediating interactions with host hub proteins. In this study, we used the HIV and human interactome databases to identify HIV targeted host hub proteins and their host binding partners (H2). We developed a high throughput computational procedure utilizing motif discovery algorithms on sets of protein sequences, including sequences of HIV and H2 proteins. We identified as HIV sequence hotspots those linear motifs that are highly conserved on HIV sequences and at the same time have a statistically enriched presence on the sequences of H2 proteins. The HIV protein motifs discovered in this study are expressed by subsets of H2 host proteins potentially outcompeted by HIV proteins. A large subset of these motifs is involved in cleavage, nuclear localization, phosphorylation, and transcription factor binding events. Many such motifs are clustered on an HIV sequence in the form of hotspots. The sequential positions of these hotspots are consistent with the curated literature on phenotype altering residue mutations, as well as with existing binding site data. The hotspot map produced in this study is the first global portrayal of HIV motifs involved in altering the host protein network at highly connected hub nodes.


Subject(s)
Human Immunodeficiency Virus Proteins/metabolism , Protein Interaction Mapping/methods , Proteins/metabolism , Amino Acid Motifs/genetics , Amino Acid Sequence , Binding Sites/genetics , CREB-Binding Protein/metabolism , Calcium-Calmodulin-Dependent Protein Kinases/metabolism , Calmodulin/metabolism , Casein Kinase II/metabolism , Databases, Protein , Human Immunodeficiency Virus Proteins/chemistry , Human Immunodeficiency Virus Proteins/genetics , Humans , Hydrophobic and Hydrophilic Interactions , Mitogen-Activated Protein Kinase 1/metabolism , Models, Molecular , Protein Binding , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/genetics , env Gene Products, Human Immunodeficiency Virus/chemistry , env Gene Products, Human Immunodeficiency Virus/genetics , env Gene Products, Human Immunodeficiency Virus/metabolism , gag Gene Products, Human Immunodeficiency Virus/chemistry , gag Gene Products, Human Immunodeficiency Virus/genetics , gag Gene Products, Human Immunodeficiency Virus/metabolism , nef Gene Products, Human Immunodeficiency Virus/chemistry , nef Gene Products, Human Immunodeficiency Virus/genetics , nef Gene Products, Human Immunodeficiency Virus/metabolism , rev Gene Products, Human Immunodeficiency Virus/chemistry , rev Gene Products, Human Immunodeficiency Virus/genetics , rev Gene Products, Human Immunodeficiency Virus/metabolism , tat Gene Products, Human Immunodeficiency Virus/chemistry , tat Gene Products, Human Immunodeficiency Virus/genetics , tat Gene Products, Human Immunodeficiency Virus/metabolism
13.
PLoS One ; 6(6): e20735, 2011.
Article in English | MEDLINE | ID: mdl-21738584

ABSTRACT

Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance, but our approach is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef, highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear to crowd the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk.


Subject(s)
Computational Biology/methods , nef Gene Products, Human Immunodeficiency Virus/chemistry , nef Gene Products, Human Immunodeficiency Virus/metabolism , Amino Acid Motifs , Amino Acid Sequence , CD4 Antigens/chemistry , CD4 Antigens/metabolism , GTP-Binding Proteins/chemistry , GTP-Binding Proteins/metabolism , HLA-A Antigens/chemistry , HLA-A Antigens/metabolism , Humans , Mitogen-Activated Protein Kinase 1/chemistry , Mitogen-Activated Protein Kinase 1/metabolism , Molecular Sequence Data , Neoplasm Proteins/chemistry , Neoplasm Proteins/metabolism , Protein Binding , Proto-Oncogene Proteins c-fyn/chemistry , Proto-Oncogene Proteins c-fyn/metabolism , Proto-Oncogene Proteins c-hck/chemistry , Proto-Oncogene Proteins c-hck/metabolism , Proto-Oncogene Proteins c-vav/chemistry , Proto-Oncogene Proteins c-vav/metabolism , Receptors for Activated C Kinase , Receptors, Cell Surface/chemistry , Receptors, Cell Surface/metabolism
14.
Int J Cancer ; 128(12): 2881-91, 2011 Jun 15.
Article in English | MEDLINE | ID: mdl-21165954

ABSTRACT

The global gene expression analysis of cancer and healthy tissues typically results in large numbers of genes that are significantly altered in cancer. Such data, however, has been difficult to interpret due to the high level of variation of gene lists across laboratories and the small sample sizes used in individual studies. In this investigation, we compiled microarray data obtained from the same platform family from 84 laboratories, resulting in a database containing 1,043 healthy tissue samples and 4,900 cancer samples for 13 different tissue types. The primary cancers considered included adrenal gland, brain, breast, cervix, colon, kidney, liver, lung, ovary, pancreas, prostate and skin tissues. We normalized the data together and analyzed subsets for the discovery of genes involved in normal to cancer transformation. Our integrated significance analysis of microarrays approach produced top-400 gene lists for each of the 13 cancer types. These lists were highly statistically enriched with genes already associated with cancer in research publications excluding microarray studies (p < 1.31E-12). The genes MT1M and RRM2 appeared in nine and TOP2A in eight lists of significantly altered genes in cancer. In total, there were 132 genes present in at least four gene lists, 11 of which were not previously associated with cancer. The list contains 17 metal ion binding and 15 adenyl ribonucleotide binding proteins, six kinases and six transcription factors. Our results point to the value of integrating microarray data in the study of combination drug therapies targeting metastasis.
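
Counting how many per-cancer gene lists each gene appears in, as summarized in this abstract, is a small bookkeeping step that can be sketched as follows. The function name, the placeholder gene symbols, and the toy lists are assumptions for illustration; only the "at least four lists" threshold comes from the abstract.

```python
from collections import Counter


def recurrent_genes(gene_lists, min_lists=4):
    """Return {gene: number of lists containing it} for genes that appear
    in at least `min_lists` of the supplied lists.

    gene_lists: dict mapping cancer type -> iterable of gene symbols.
    """
    counts = Counter()
    for genes in gene_lists.values():
        counts.update(set(genes))  # set() so a repeated symbol counts once per list
    return {gene: n for gene, n in counts.items() if n >= min_lists}


# Toy lists with placeholder symbols (not the study's results).
toy_lists = {
    "breast": ["G1", "G2", "G3"],
    "colon": ["G1", "G2", "G2"],
    "lung": ["G1", "G4"],
    "liver": ["G1", "G2"],
    "kidney": ["G2", "G5"],
}
shared = recurrent_genes(toy_lists, min_lists=4)
```

With these toy lists, `G1` and `G2` each occur in four lists and are returned, while the remaining symbols are dropped.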


Subject(s)
Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Humans , Neoplasms/classification
15.
PLoS One ; 5(9): e12890, 2010 Sep 23.
Article in English | MEDLINE | ID: mdl-20886114

ABSTRACT

Single nucleotide polymorphisms (SNPs) constitute an important mode of genetic variation observed in the human genome. A small fraction of SNPs, about four thousand out of the ten million, has been associated with genetic disorders and complex diseases. The present study focuses on SNPs that fall on protein domains, 3D structures that facilitate connectivity of proteins in cell signaling and metabolic pathways. We scanned the human proteome using the PROSITE web tool and identified proteins with SNP containing domains. We showed that SNPs that fall on protein domains are highly statistically enriched among SNPs linked to hereditary disorders and complex diseases. Proteins whose domains are dramatically altered by the presence of an SNP are even more likely to be present among proteins linked to hereditary disorders. Proteins with domain-altering SNPs comprise highly connected nodes in cellular pathways such as the focal adhesion, the axon guidance pathway and the autoimmune disease pathways. Statistical enrichment of domain/motif signatures in interacting protein pairs indicates extensive loss of connectivity of cell signaling pathways due to domain-altering SNPs, potentially leading to hereditary disorders.
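
The abstract reports statistical enrichment without naming the test in this excerpt. One common choice for this kind of question is the upper tail of the hypergeometric distribution, sketched below as an assumption rather than as the authors' method; the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are sampled without replacement from a
    population of size `population` that contains `successes` marked items.

    In the enrichment setting: population = all SNPs considered,
    successes = SNPs that fall on protein domains,
    draws = disease-linked SNPs, k = disease-linked SNPs on domains.
    """
    if not 0 <= k <= draws <= population or successes > population:
        raise ValueError("inconsistent counts")
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Small worked example: 10 items, 4 marked, 3 drawn, at least 2 marked.
p_small = hypergeom_upper_tail(2, population=10, successes=4, draws=3)
```

For the small example the tail is (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120 = 1/3. Exact integer arithmetic with `math.comb` keeps the sum accurate, although for genome-scale counts a statistics library would be faster.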


Subject(s)
Polymorphism, Single Nucleotide , Proteins/chemistry , Proteins/genetics , Signal Transduction , Databases, Protein , Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/metabolism , Genome, Human , Humans , Protein Structure, Tertiary , Proteome/analysis , Proteome/genetics , Proteome/metabolism
16.
BMC Bioinformatics ; 11: 483, 2010 Sep 27.
Article in English | MEDLINE | ID: mdl-20875095

ABSTRACT

BACKGROUND: Much of the public access cancer microarray data is asymmetric, belonging to datasets containing no samples from normal tissue. Asymmetric data cannot be used in standard meta-analysis approaches (such as the inverse variance method) to obtain large sample sizes for statistical power enrichment. Noting that plenty of normal tissue microarray samples exist in studies not involving cancer, we investigated the viability and accuracy of an integrated microarray analysis approach based on significance analysis of microarrays (merged SAM) using a collection of data from separate diseased and normal samples. RESULTS: We focused on five solid cancer types (colon, kidney, liver, lung, and pancreas), where available microarray data allowed us to compare meta-analysis and integrated approaches. Our results from the merged SAM significantly overlapped gene lists from the validated inverse-variance method. Both meta-analysis and merged SAM approaches successfully captured the aberrances in the cell cycle that commonly occur in the different cancer types. However, the integrated SAM analysis replicated the known cancer literature (excluding microarray studies) with much more accuracy than the meta-analysis. CONCLUSION: The merged SAM test is a powerful, robust approach for combining data from similar platforms and for analyzing asymmetric datasets, including those with only normal or only cancer samples that cannot be utilized by meta-analysis methods. The integrated SAM approach can also be used in comparing global gene expression between various subtypes of cancer arising from the same tissue.
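
The inverse variance method named in this abstract can be illustrated with a minimal sketch. It assumes the standard fixed-effect form, in which each study contributes an effect estimate weighted by the reciprocal of its variance; the function name and the toy numbers are hypothetical and are not taken from the article. It also shows why a study with only cancer samples cannot enter this kind of pooling: without a normal group it has no per-study effect or variance to supply.

```python
import math


def inverse_variance_pool(effects, variances):
    """Fixed-effect inverse-variance pooling (illustrative, not the article's code).

    effects   -- per-study effect estimates (e.g. log fold change, cancer vs normal)
    variances -- per-study variances of those estimates
    Returns (pooled_effect, pooled_standard_error).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Each study is weighted by the reciprocal of its variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical example: two studies of one gene; the noisier study counts less.
pooled_effect, pooled_se = inverse_variance_pool([1.0, 3.0], [1.0, 3.0])
```

A study that lacks either the cancer or the normal group yields no entry for `effects`, which is the limitation the merged approach in the abstract is meant to work around.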


Subject(s)
Gene Expression Profiling/methods , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , Data Interpretation, Statistical , Databases, Genetic , Humans , Neoplasms/classification
17.
BMC Bioinformatics ; 11: 349, 2010 Jun 25.
Article in English | MEDLINE | ID: mdl-20579376

ABSTRACT

BACKGROUND: Phosphorylation events direct the flow of signals and metabolites along cellular protein networks. Current annotations of kinase-substrate binding events are far from complete. In this study, we scanned all human protein sequences using the PROSITE domain annotation tool to identify patterns of domain composition in kinases and their substrates. We identified statistically enriched pairs of strings of domains (signature pairs) in kinase-substrate couples presented in the 2006 version of the PTM database. RESULTS: The signature pairs enriched in kinase-substrate binding interactions turned out to be highly specific to kinase subtypes. The resulting list of signature pairs predicted kinase-substrate interactions in a validation dataset not used in learning with high statistical accuracy. CONCLUSIONS: The method presented here produces predictions of protein phosphorylation events with high accuracy and mid-level coverage. Our method can be used in expanding the currently available drafts of cell signaling pathways and thus will be an important tool in the development of combination drug therapies targeting complex diseases.


Subject(s)
Phosphotransferases/metabolism , Proteome/analysis , Humans , Phosphorylation , Protein Binding , Protein Structure, Tertiary , Sequence Analysis, Protein , Signal Transduction , Substrate Specificity
18.
Proc Natl Acad Sci U S A ; 107(15): 6864-9, 2010 Apr 13.
Article in English | MEDLINE | ID: mdl-20351289

ABSTRACT

The Drosophila Dachshund (Dac) gene, cloned as a dominant inhibitor of the hyperactive growth factor mutant ellipse, encodes a key component of the retinal determination gene network that governs cell fate. Herein, cyclic amplification and selection of targets identified a DACH1 DNA-binding sequence that resembles the FOX (Forkhead box-containing protein) binding site. Genome-wide in silico promoter analysis of DACH1 binding sites identified gene clusters populating cellular pathways associated with the cell cycle and growth factor signaling. ChIP coupled with high-throughput sequencing mapped DACH1 binding sites to corresponding gene clusters predicted in silico and identified a weight matrix resembling the cyclic amplification and selection of targets-defined sequence. DACH1 antagonized FOXM1 target gene expression, promoter occupancy in the context of local chromatin, and contact-independent growth. Attenuation of FOX function by the cell fate determination pathway has broad implications given the diverse role of FOX proteins in cellular biology and tumorigenesis.


Subject(s)
Eye Proteins/metabolism , Forkhead Transcription Factors/metabolism , Retina/metabolism , Transcription Factors/metabolism , Binding Sites , Cell Lineage , Chromatin/chemistry , Computational Biology/methods , DNA/chemistry , Forkhead Box Protein M1 , Gene Expression Regulation , Genome , HeLa Cells , Humans , Promoter Regions, Genetic , Protein Binding , Signal Transduction
19.
PLoS One ; 5(1): e8942, 2010 Jan 28.
Article in English | MEDLINE | ID: mdl-20126615

ABSTRACT

Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV-specific targeting using small-molecule drugs.
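
The revision that relaxes the required basic amino acids from two to one can be pictured with a small regular-expression sketch. The patterns below are assumptions chosen only for illustration: a generic docking-style layout of basic residues (K/R), a short spacer, and a hydrophobic-X-hydrophobic core. They are not the patterns published in the article, and the two peptide strings are made up.

```python
import re

# Hypothetical "standard" pattern: two basic residues, a 1-6 residue spacer,
# then hydrophobic / any / hydrophobic (L, I or V used as the hydrophobic set).
STRICT_DOCKING = re.compile(r"[KR]{2}.{1,6}[LIV].[LIV]")

# Hypothetical relaxed pattern: identical, but only one basic residue is required.
RELAXED_DOCKING = re.compile(r"[KR].{1,6}[LIV].[LIV]")


def find_docking_sites(sequence, pattern):
    """Return (start, matched_text) for each non-overlapping match in the sequence."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Made-up peptides: the first carries two adjacent basic residues, the second one.
two_basic = "AAKRSTPLELAA"
one_basic = "AAKASTPLELAA"

strict_hits_two = find_docking_sites(two_basic, STRICT_DOCKING)
strict_hits_one = find_docking_sites(one_basic, STRICT_DOCKING)
relaxed_hits_one = find_docking_sites(one_basic, RELAXED_DOCKING)
```

Under these assumed patterns, the peptide with a single basic residue is missed by the strict form and picked up by the relaxed one, which is the kind of coverage gain across subtypes that the abstract describes. A looser pattern also matches more sequences by chance, so any real use would need the specificity checks the authors report.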


Subject(s)
HIV/metabolism , Mitogen-Activated Protein Kinases/metabolism , Sequence Alignment , Viral Proteins/metabolism , Humans , Mitogen-Activated Protein Kinases/chemistry , Phosphorylation , Protein Binding
20.
BMC Med Genomics ; 2: 47, 2009 Jul 23.
Article in English | MEDLINE | ID: mdl-19627600

ABSTRACT

BACKGROUND: The HIV viral genome mutates at a high rate and poses a significant long-term health risk even in the presence of combination antiretroviral therapy. Current methods for predicting a patient's response to therapy rely on site-directed mutagenesis experiments and in vitro resistance assays. In this bioinformatics study we treat response to antiretroviral therapy as a two-body problem: response to therapy is considered to be a function of both the host and pathogen proteomes. We set out to identify potential responders based on the presence or absence of host protein and DNA motifs on the HIV proteome. RESULTS: An alignment of thousands of HIV-1 sequences attested to extensive variation in nucleotide sequence but also showed conservation of eukaryotic short linear motifs on the protein coding regions. The reduction in viral load of patients in the Stanford HIV Drug Resistance Database exhibited a bimodal distribution after 24 weeks of antiretroviral therapy, with a cutoff of 2,000 copies/ml. Similarly, patients allocated into responder/non-responder categories based on consistent viral load reduction during a 24-week period showed clear separation. In both cases of phenotype identification, a set of features composed of short linear motifs in the reverse transcriptase region of the HIV sequence accurately predicted a patient's response to therapy. Motifs that overlap resistance sites were highly predictive of responder identification in single-drug regimens but these features lost importance in defining responders in multi-drug therapies. CONCLUSION: The HIV sequence mutates in a way that preferentially preserves peptide sequence motifs that are also found in the human proteome. The presence and absence of such motifs at specific regions of the HIV sequence is highly predictive of response to therapy. Some of these predictive motifs overlap with known HIV-1 resistance sites. These motifs are well established in bioinformatics databases and hence do not require identification via in vitro mutation experiments.
