ABSTRACT
Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Because gene expression data can be used to predict the outcome of a patient's tumor, biomarker discovery is a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, identifying biomarkers that predict survival time is straightforward. For continuous variables such as gene expression, however, the available methods generate highly variable results, and studies of best practices are lacking. We investigated the performance of eight methods designed specifically for continuous data: k-means, Cox regression, concordance index (C-index), D-index, 25th-75th percentile split, median split, distribution-based splitting, and KaplanScan. These methods were applied to four RNA-sequencing (RNA-seq) datasets from The Cancer Genome Atlas. We assessed the reliability of the eight methods by splitting each dataset into two groups and comparing the overlap of the results. Gene sets identified in the literature for a specific tumor type served as positive controls. We used them to assess the accuracy of each method's biomarkers with receiver operating characteristic (ROC) curves. Artificial RNA-seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomization tend to perform consistently poorly, whereas the C-index, D-index, and k-means perform well in most settings. Overall, Cox regression had the strongest performance in tests of accuracy, reliability, and robustness.
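Two of the eight strategies compared above can be illustrated compactly. The following is a minimal pure-Python sketch, not the study's implementation. It assumes right-censored data given as parallel lists of survival times, event flags (1 = death observed, 0 = censored), and a continuous expression score in which higher values mean higher risk. The function names `harrell_c_index` and `median_split` are hypothetical. Tied survival times are handled in the simplest way: such pairs are skipped.

```python
from statistics import median
from typing import List


def harrell_c_index(times: List[float], events: List[int], scores: List[float]) -> float:
    """Harrell's C: among comparable pairs, the fraction in which the patient who
    fails earlier has the higher risk score (score ties count 0.5).
    A pair (i, j) is comparable when times[i] < times[j] and i had an event."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def median_split(scores: List[float]) -> List[bool]:
    """Dichotomize expression at the median: True marks the 'high' group."""
    cut = median(scores)
    return [s > cut for s in scores]


# Toy cohort of four patients whose expression perfectly ranks their survival
times, events, expr = [1, 2, 3, 4], [1, 1, 1, 0], [4.0, 3.0, 2.0, 1.0]
c_continuous = harrell_c_index(times, events, expr)
c_dichotomized = harrell_c_index(times, events, [int(b) for b in median_split(expr)])
```

On this toy cohort, the continuous score gives C = 1.0. The same score after a median split gives about 0.83, because the ties created by dichotomization count only as half-concordant. This is one way dichotomizing discards information.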
Subjects
Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Neoplasms/genetics , Neoplasms/mortality , Base Sequence , Biomarkers, Tumor/genetics , Data Interpretation, Statistical , Humans , Kaplan-Meier Estimate , Prognosis , Proportional Hazards Models , ROC Curve , Sequence Analysis, RNA/methods , Survival Analysis
ABSTRACT
Lysine methylation of histones and non-histone substrates by the SET domain-containing protein lysine methyltransferase (KMT) G9a/EHMT2 governs transcription, contributing to apoptosis, aberrant cell growth, and pluripotency. The positioning of chromosomes within three-dimensional nuclear space involves interactions between the nuclear lamina (NL) and lamina-associated domains (LADs). Contact of individual LADs with the NL depends on H3K9me2 introduced by G9a. The mechanisms governing the recruitment of G9a to distinct subcellular sites, whether into chromatin or to LADs, are not known. The cyclin D1 gene encodes the regulatory subunit of the holoenzyme that phosphorylates pRB and NRF1, thereby governing cell-cycle progression and mitochondrial metabolism. Herein, we show that cyclin D1 enhanced H3K9 dimethylation through direct association with G9a. Endogenous cyclin D1 was required for the recruitment of G9a to target genes in chromatin, for G9a-induced H3K9me2 of histones, and for NL-LAD interaction. Cyclin D1 is thus required for the recruitment of G9a to target genes in chromatin and for H3K9 dimethylation. This finding identifies a novel mechanism coordinating protein methylation.
Subjects
Cyclin D1/metabolism , DNA Methylation/physiology , Histocompatibility Antigens/metabolism , Histone-Lysine N-Methyltransferase/metabolism , Histones/metabolism , Cell Cycle/physiology , Cell Line , Cell Line, Tumor , Chromatin/metabolism , Chromosomes/physiology , HEK293 Cells , Humans , MCF-7 Cells , Protein Binding/physiology
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) is the third leading cause of cancer death in the US. Despite multiple large-scale genetic sequencing studies, identifying predictors of patient survival remains challenging. We performed a comprehensive assessment and integrative analysis of large-scale gene expression datasets across multiple platforms to discover a prognostic gene signature for patient survival in pancreatic cancer. PDAC RNA-sequencing data from The Cancer Genome Atlas were stratified into Survival+ (>2-year survival) and Survival- (<1-year survival) cohorts (n = 47). We compared RNA expression profiles between the survival groups and normal pancreatic tissue expression data from the Gene Expression Omnibus. This comparison generated an initial PDAC-specific list of prognostic differentially expressed genes. The candidate list was then trained on the Australian pancreatic cancer dataset from the ICGC database (n = 103), using iterative sampling-based algorithms, to derive a gene signature predictive of patient survival. The gene signature was validated in two independent patient cohorts and against existing PDAC subtype classifications. We identified 707 candidate prognostic genes exhibiting differential expression in tumor versus normal tissue. A substantial fraction of these genes were also differentially methylated between survival groups. From the candidate gene list, we identified a 5-gene signature (ADM, ASPM, DCBLD2, E2F7, and KRT6A). Our signature showed significant power to predict patient survival in two distinct patient cohorts and was independent of AJCC TNM staging. In cross-validation, our gene signature achieved a higher ROC AUC (≥ 0.8) than existing PDAC survival signatures.
Furthermore, we validated the signature through immunohistochemical analysis of patient tumor tissue and existing gene expression subtyping data in PDAC. This validation showed a correlation with the presence of vascular invasion and with the aggressive squamous tumor subtype. Assessing these genes in patient biopsies could help further inform risk stratification and treatment decisions in pancreatic cancer.
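The cohort definition and the ROC AUC used above can be sketched as follows. This is a hypothetical illustration. The abstract gives only the >2-year and <1-year cut-offs, so the handling of censored patients here is an assumption: a patient counts as Survival- only if death was observed before 12 months. The AUC uses its Mann-Whitney formulation. This is the probability that a randomly chosen poor-outcome patient has a higher risk score than a randomly chosen good-outcome patient, with ties counting one half.

```python
from typing import List, Optional


def label_survival(months: float, died: bool) -> Optional[int]:
    """Assumed cohort rule: 1 = Survival- (death observed before 12 months),
    0 = Survival+ (more than 24 months of survival), None = excluded."""
    if months > 24:
        return 0
    if months < 12 and died:
        return 1
    return None


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic; label 1 is the positive class."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy cohort: (months of follow-up, death observed, hypothetical risk score)
cohort = [(6, True, 0.8), (30, False, 0.5), (9, True, 0.4), (40, True, 0.1), (18, True, 0.9)]
labeled = [(label_survival(m, d), s) for m, d, s in cohort]
kept = [(lab, s) for lab, s in labeled if lab is not None]  # drop 12-24 month patients
auc = roc_auc([lab for lab, _ in kept], [s for _, s in kept])
```

Patients with intermediate survival are excluded before the AUC is computed, mirroring the two-extreme cohort design described in the abstract.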
Subjects
Carcinoma, Pancreatic Ductal/metabolism , Carcinoma, Pancreatic Ductal/mortality , Pancreas/metabolism , Pancreatic Neoplasms/metabolism , Pancreatic Neoplasms/mortality , Aged , Algorithms , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/pathology , Cohort Studies , DNA Methylation , Female , Gene Expression Regulation, Neoplastic , Humans , Immunohistochemistry , Male , Microarray Analysis , Middle Aged , Models, Biological , Pancreas/pathology , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/pathology , Prognosis , Sequence Analysis, RNA , Survival Analysis
ABSTRACT
Ovarian cancer (OC) is a leading cause of cancer mortality, but aside from a few well-studied mutations, very little is known about its underlying causes. We therefore performed survival analysis on the ovarian copy number amplification and gene expression datasets from The Cancer Genome Atlas to identify potential drivers and markers of aggressive OC. Two independent datasets from the Gene Expression Omnibus web platform were used to validate the identified markers. Based on our analysis, we identified FXYD5, a glycoprotein known to reduce cell adhesion, as a potential driver of metastasis and a significant predictor of mortality in OC. Effective antibodies against FXYD5 are available, supporting its use as a marker of poor outcome in tissue arrays. For metastasis studies, FXYD5 links a wide variety of cancers, including ovarian, stage II breast, thyroid, colorectal, pancreatic, and head and neck cancers.
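The abstract does not name the specific survival methods. Analyses of this kind, however, commonly compare Kaplan-Meier curves between marker-high and marker-low patients. The sketch below is a minimal pure-Python Kaplan-Meier product-limit estimator, offered as an illustration under that assumption. `kaplan_meier` is a hypothetical helper, not code from the study. At each distinct event time t_i, with d_i deaths among n_i patients at risk, the survival estimate is multiplied by (1 - d_i / n_i). Censored patients leave the risk set without contributing a death.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct event time.
    events[i] is 1 if death was observed at times[i], 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; censored ones still count as at risk here
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Toy data: five patients, two of them censored (event flag 0)
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Running the estimator separately on marker-high and marker-low groups gives the two curves that a log-rank test would then compare.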
ABSTRACT
BACKGROUND: Bacterial infections are a global health challenge as the incidence of antibiotic resistance increases. The pathogenic potential of bacteria has been shown to be context-dependent, varying with the environment and even among strains of the same genus. RESULTS: We used the KEGG repository and extensive literature searches to examine 2527 bacterial genomes from the literature. Among them, we identified those implicated as pathogenic to the host, including those that show pathogenicity in a context-dependent manner. Using data on the gene content of these genomes, we identified sets of genes that are highly abundant in pathogenic strains but relatively absent in commensal strains, and vice versa. In addition, we carried out within-genus genome comparisons for the seventeen largest genera in our genome collection. We projected the resulting lists of ortholog genes onto KEGG bacterial pathways to identify clusters and circuits that can be linked to either pathogenicity or synergy. Gene circuits relatively abundant in nonpathogenic bacteria often mediated the biosynthesis of antibiotics. Other synergy-linked circuits reduced drug-induced toxicity. Pathogen-abundant gene circuits included modules in the one-carbon pool by folate, two-component systems, the type III secretion system, and peptidoglycan biosynthesis. Antibiotic-resistant bacterial strains possessed genes modulating phagocytosis, vesicle trafficking, cytoskeletal reorganization, and regulation of the inflammatory response. Our study also identified bacterial genera containing a circuit whose elements were previously linked to Alzheimer's disease. CONCLUSIONS: The present study produces, for the first time, a signature in the form of a robust list of gene circuitry whose presence or absence could potentially define the pathogenicity of a microbiome. An extensive literature search substantiated the large majority of the commensal and pathogenic circuitry in our predicted list.
Scanning microbiome libraries for these circuitry motifs will provide further insights into the complex, context-dependent pathogenicity of bacteria.
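The abstract does not state the criterion used to call a gene "abundant in pathogenic but absent in commensal" genomes. One simple possibility, sketched here as an assumption, compares the fraction of genomes in each group that carry each ortholog. It then keeps the genes whose frequency difference exceeds a threshold. The function name, the `min_diff` threshold of 0.5, and the toy gene labels are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def differential_presence(
    pathogenic: List[Set[str]],
    commensal: List[Set[str]],
    min_diff: float = 0.5,
) -> Dict[str, List[Tuple[str, float]]]:
    """Each genome is a set of ortholog IDs. Returns genes whose carriage
    frequency differs between the groups by at least min_diff, in either direction."""
    genes = set().union(*pathogenic, *commensal)
    diffs = []
    for gene in genes:
        freq_p = sum(gene in g for g in pathogenic) / len(pathogenic)
        freq_c = sum(gene in g for g in commensal) / len(commensal)
        diffs.append((gene, freq_p - freq_c))
    return {
        # Most pathogen-biased genes first
        "pathogen": sorted([x for x in diffs if x[1] >= min_diff], key=lambda x: (-x[1], x[0])),
        # Most commensal-biased genes first
        "commensal": sorted([x for x in diffs if x[1] <= -min_diff], key=lambda x: (x[1], x[0])),
    }


# Toy genomes with made-up gene labels
pathogenic = [{"geneA", "geneB", "geneC"}, {"geneA", "geneB", "geneC"}, {"geneA", "geneC"}]
commensal = [{"geneC", "geneD"}, {"geneC", "geneD"}, {"geneC"}]
result = differential_presence(pathogenic, commensal)
```

The resulting gene lists are what would then be projected onto KEGG pathways. A shared housekeeping-like gene (geneC here) drops out because it is equally common in both groups.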
Subjects
Bacteria/genetics , Bacteria/pathogenicity , Gene Regulatory Networks , Genes, Bacterial , Genome, Bacterial , Genomics/methods , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacterial Infections/microbiology , Computational Biology/methods , Drug Resistance, Bacterial , Host-Pathogen Interactions , Multigene Family
ABSTRACT
Cyclin D1 is an important molecular driver of human breast cancer, but a better understanding of its oncogenic mechanisms is needed, especially to advance targeted therapeutics. Current pharmaceutical initiatives to inhibit cyclin D1 focus on the catalytic component, because the transforming capacity is thought to reside in cyclin D1/CDK activity. We initiated this study to directly test the oncogenic potential of catalytically inactive cyclin D1 in an in vivo mouse model relevant to breast cancer. We transduced cyclin D1(-/-) mouse embryonic fibroblasts (MEFs) with the kinase-dead KE mutant of cyclin D1. This led to aneuploidy, abnormalities in mitotic spindle formation, autosome amplification, and chromosomal instability (CIN), as assessed by gene expression profiling. Acute transgenic expression of either cyclin D1(WT) or cyclin D1(KE) in the mammary gland was sufficient to induce a high CIN score within 7 days. Sustained expression of cyclin D1(KE) induced mammary adenocarcinoma with kinetics similar to those of wild-type cyclin D1. ChIP-Seq studies demonstrated recruitment of cyclin D1(WT) and cyclin D1(KE) to the genes governing CIN. We conclude that the CDK-activating function of cyclin D1 is not necessary to induce either chromosomal instability or mammary tumorigenesis.
Subjects
Adenocarcinoma/genetics , Cyclin D1/physiology , Mammary Neoplasms, Experimental/genetics , Amino Acid Substitution , Aneuploidy , Animals , Catalytic Domain/genetics , Cell Transformation, Neoplastic/genetics , Cells, Cultured , Centrosome/ultrastructure , Chromosomal Instability/genetics , Cyclin D1/deficiency , Cyclin D1/genetics , Female , Fibroblasts , Genes, bcl-1 , Humans , Mammary Tumor Virus, Mouse/physiology , Mice , Mice, Knockout , Mice, Transgenic , Mutation , Piperazines/pharmacology , Pyridines/pharmacology , Recombinant Fusion Proteins/metabolism , Spindle Apparatus/ultrastructure , Transduction, Genetic
ABSTRACT
Cyclin D1 encodes the regulatory subunit of a holoenzyme that phosphorylates the pRB protein and promotes G1/S cell-cycle progression and oncogenesis. Dicer is a central regulator of miRNA maturation. It encodes an enzyme that cleaves double-stranded RNA or stem-loop-stem RNA into small RNAs 20-25 nucleotides long, governing sequence-specific gene silencing and heterochromatin methylation. The mechanism by which the cell cycle directly controls the non-coding genome is poorly understood. Here we show that cyclin D1(-/-) cells are defective in pre-miRNA processing, which is restored by cyclin D1a rescue. Cyclin D1 induces Dicer expression in vitro and in vivo. Dicer is transcriptionally targeted by cyclin D1 via a cdk-independent mechanism. Cyclin D1 and Dicer expression are significantly correlated in the luminal A and basal-like subtypes of human breast cancer. Cyclin D1 and Dicer maintain heterochromatic histone modification (Tri-m-H3K9). Cyclin D1-mediated cellular proliferation and migration are Dicer-dependent. We conclude that cyclin D1 induction of Dicer coordinates microRNA biogenesis.
Subjects
Breast Neoplasms/metabolism , Cyclin D1/physiology , Gene Expression Regulation, Neoplastic , Mammary Neoplasms, Experimental/metabolism , MicroRNAs/biosynthesis , Ribonuclease III/metabolism , Animals , Breast Neoplasms/enzymology , Breast Neoplasms/genetics , Cell Movement/genetics , Cell Proliferation , Female , HCT116 Cells , Histones/metabolism , Humans , MCF-7 Cells , Mammary Neoplasms, Experimental/enzymology , Mammary Neoplasms, Experimental/genetics , Mice , Mice, Inbred C57BL , Mice, Transgenic , MicroRNAs/genetics , Protein Processing, Post-Translational/genetics
ABSTRACT
Hyperactive EGF receptor (EGFR) and mutant p53 are common genetic abnormalities driving the progression of non-small cell lung cancer (NSCLC), the leading cause of cancer deaths worldwide. The Drosophila gene Dachshund (Dac) was originally cloned as an inhibitor of hyperactive EGFR alleles. Given the importance of EGFR signaling in lung cancer etiology, we examined the role of DACH1 expression in lung cancer development. DACH1 protein and mRNA expression were reduced in human NSCLC. Re-expression of DACH1 reduced NSCLC colony formation and tumor growth in vivo via p53. Endogenous DACH1 colocalized with p53 in a nuclear, extranucleolar location. In ChIP sequencing, the two proteins shared occupancy of approximately 15% of p53-bound genes. The C-terminus of DACH1 was necessary for direct p53 binding, contributing to the inhibition of colony formation and to cell-cycle arrest. DACH1 repressed expression of the stem cell factor SOX2, and SOX2 expression was inversely correlated with DACH1 in NSCLC. We conclude that DACH1 binds p53 to inhibit NSCLC cellular growth.
Subjects
Adenocarcinoma/metabolism , Adenocarcinoma/pathology , Eye Proteins/metabolism , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , Transcription Factors/metabolism , Tumor Suppressor Protein p53/metabolism , Adenocarcinoma/genetics , Adenocarcinoma of Lung , Animals , Cell Cycle Checkpoints/physiology , Cell Growth Processes/physiology , Cell Line, Tumor , Cyclin-Dependent Kinase Inhibitor p21/metabolism , Eye Proteins/genetics , Female , Genes, p53 , HCT116 Cells , HEK293 Cells , Heterografts , Humans , Immunohistochemistry , Lung Neoplasms/genetics , Mice , Mice, Nude , Rad51 Recombinase/antagonists & inhibitors , Rad51 Recombinase/metabolism , SOXB1 Transcription Factors/biosynthesis , SOXB1 Transcription Factors/genetics , Transcription Factors/genetics , Transcription, Genetic , Transfection , Tumor Suppressor Protein p53/genetics
ABSTRACT
BACKGROUND: Inflammatory bowel disease (IBD) is a complex disorder involving pathogen infection, host immune response, and altered enterocyte physiology. The incidence of IBD is increasing at an alarming rate in developed countries, which calls for a detailed molecular portrait of the disease. METHODS: We used large-scale data, bioinformatics tools, and high-throughput computations to obtain gene and microRNA signatures for Crohn's disease (CD) and ulcerative colitis (UC). We then integrated these signatures with a systematic literature review to draw a comprehensive portrait of IBD in relation to autoimmune diseases. RESULTS: The top upregulated genes in IBD are associated with diabetogenesis (REG1A, REG1B), bacterial signals (TLRs, NLRs), innate immunity (DEFA6, IDO1, EXOSC1), inflammation (CXCLs), and matrix degradation (MMPs). The downregulated genes encoded tight junction proteins (CLDN8), solute transporters (SLCs), and adhesion proteins. Genes more highly expressed in UC than in CD included the anti-inflammatory ANXA1, the transporter ABCA12, the T-cell activator HSH2D, and the immunoglobulin IGHV4-34. In IBD, compromised metabolism of drugs, nitrogen, androgens and estrogens, and lipids correlated with increases in specific microRNAs. Highly expressed IBD genes were targets of drugs used in gastrointestinal cancers, viral infections, and autoimmune disorders such as rheumatoid arthritis and asthma. CONCLUSIONS: This study presents a clinically relevant gene-level portrait of IBD subtypes and their connectivity to autoimmune diseases. The study identified candidates for repositioning existing drugs to manage IBD. Integration of mouse and human data points to an altered B-cell response as a cause of the upregulation of IBD genes involved in other aspects of immune defense, such as interferon-inducible responses.
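Ranking genes as "top upregulated" or "downregulated" usually starts from a fold-change score. The abstract does not specify its statistics, so the helper below is only a hedged sketch. It ranks genes by the log2 fold change of mean expression, adding a pseudo-count to avoid division by zero. A real analysis would add a significance test and multiple-testing correction. `rank_by_log2_fc`, the gene labels, and all expression values are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def rank_by_log2_fc(disease: Dict[str, List[float]], control: Dict[str, List[float]],
                    pseudo: float = 1.0) -> List[Tuple[str, float]]:
    """Rank genes measured in both groups from most up- to most down-regulated
    by log2((mean_disease + pseudo) / (mean_control + pseudo))."""
    scores = {
        gene: math.log2((mean(disease[gene]) + pseudo) / (mean(control[gene]) + pseudo))
        for gene in disease.keys() & control.keys()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy expression values (made up for illustration)
disease = {"geneUp": [15.0, 15.0], "geneFlat": [7.0, 7.0], "geneDown": [1.0, 1.0]}
control = {"geneUp": [1.0, 1.0], "geneFlat": [7.0, 7.0], "geneDown": [7.0, 7.0]}
ranking = rank_by_log2_fc(disease, control)
```

The head of the ranking corresponds to candidate upregulated genes and the tail to candidate downregulated genes.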
Subjects
Autoimmune Diseases/genetics , Inflammatory Bowel Diseases/genetics , MicroRNAs/genetics , Transcriptome/genetics , Animals , Autoimmune Diseases/drug therapy , Chromosome Mapping , Colitis, Ulcerative/drug therapy , Colitis, Ulcerative/genetics , Colitis, Ulcerative/immunology , Computational Biology , Crohn Disease/drug therapy , Crohn Disease/genetics , Crohn Disease/immunology , Gene Expression Profiling , Genes/genetics , Humans , Inflammatory Bowel Diseases/drug therapy , Inflammatory Bowel Diseases/immunology , Mice , Oligonucleotide Array Sequence Analysis
ABSTRACT
BACKGROUND: Pandemic and seasonal respiratory viruses are a major global health concern. Given the genetic diversity of respiratory viruses and the emergence of drug-resistant strains, targeted disruption of human host-virus interactions is a potential therapeutic strategy for treating multi-viral infections. Large-scale genomic datasets focused on host-pathogen interactions can be used to discover novel drug targets as well as opportunities for drug repositioning. METHODS/RESULTS: In this study, we performed a large-scale analysis of microarray datasets involving host responses to infection by influenza A virus, respiratory syncytial virus, rhinovirus, SARS-coronavirus, metapneumovirus, coxsackievirus, and cytomegalovirus. Common genes and pathways were identified through a rigorous, iterative analysis pipeline. In this pipeline, relevant host mRNA expression datasets were identified, analyzed for quality and differential gene expression, and then mapped to pathways for enrichment analysis. Targets for possible drug repurposing were found through database and literature searches. A total of 67 common biological pathways were identified among the seven respiratory viruses analyzed, representing fifteen laboratories, nine cell types, and seven array platforms. A large overlap in the general immune response was observed among the top twenty of these 67 pathways, supporting our analysis strategy. Within the top five pathways, we found 53 differentially expressed genes affected by at least five of the seven viruses. We suggest five new therapeutic indications for existing small molecules or biological agents targeting proteins encoded by the genes F3, IL1B, TNF, CASP1, and MMP9. Pathway enrichment analysis also identified a potential novel host response, the Parkin-Ubiquitin Proteasomal System (Parkin-UPS) pathway. This pathway is known to be involved in the progression of the neurodegenerative disorder Parkinson's disease.
CONCLUSIONS: Our study suggests that multiple and diverse respiratory viruses invoke several common host response pathways. Further analysis of these pathways suggests potential opportunities for therapeutic intervention.
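The criterion "affected by at least five of the seven viruses" is a simple count over per-virus sets of differentially expressed genes. The following is a minimal sketch of that counting step. `genes_shared_by` is a hypothetical helper, and the virus keys and gene memberships are toy values, not the study's results.

```python
from collections import Counter
from typing import Dict, List, Set


def genes_shared_by(de_by_virus: Dict[str, Set[str]], min_viruses: int) -> List[str]:
    """Genes that are differentially expressed in at least `min_viruses` virus sets."""
    counts = Counter(gene for genes in de_by_virus.values() for gene in set(genes))
    return sorted(gene for gene, c in counts.items() if c >= min_viruses)


# Toy memberships for seven hypothetical viruses v1..v7
de_by_virus = {
    "v1": {"g1", "g2", "g3"}, "v2": {"g1", "g2"}, "v3": {"g1", "g4"},
    "v4": {"g1", "g2", "g5"}, "v5": {"g2"}, "v6": {"g1"}, "v7": {"g1", "g2"},
}
shared = genes_shared_by(de_by_virus, min_viruses=5)
```

Wrapping each collection in `set()` ensures that a gene listed twice for one virus is still counted only once for that virus.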
Subjects
Antiviral Agents/pharmacology , Gene Expression Profiling , Host-Pathogen Interactions/drug effects , Host-Pathogen Interactions/genetics , Molecular Targeted Therapy , Respiratory Syncytial Viruses/drug effects , Signal Transduction/drug effects , Antiviral Agents/therapeutic use , Databases, Genetic , Gene Expression Regulation, Viral/drug effects , Humans , Oligonucleotide Array Sequence Analysis , Proteasome Endopeptidase Complex/metabolism , Quality Control , RNA, Messenger/genetics , RNA, Messenger/metabolism , Respiratory Syncytial Virus Infections/drug therapy , Respiratory Syncytial Virus Infections/genetics , Respiratory Syncytial Virus Infections/virology , Respiratory Syncytial Viruses/physiology , Signal Transduction/genetics , Ubiquitin/metabolism , Ubiquitin-Protein Ligases/metabolism
ABSTRACT
Chromosomal instability (CIN) in tumors is characterized by chromosomal abnormalities and an altered gene expression signature; however, the mechanism of CIN is poorly understood. CCND1 (which encodes cyclin D1) is overexpressed in human malignancies and has been shown to play a direct role in transcriptional regulation. Here, we used genome-wide ChIP sequencing and found that the DNA-bound form of cyclin D1 occupied the regulatory region of genes governing chromosomal integrity and mitochondrial biogenesis. Adding cyclin D1 back to Ccnd1(-/-) mouse embryonic fibroblasts resulted in CIN gene regulatory region occupancy by the DNA-bound form of cyclin D1 and induction of CIN gene expression. Furthermore, increased chromosomal aberrations, aneuploidy, and centrosome abnormalities were observed in the cyclin D1-rescued cells by spectral karyotyping and immunofluorescence. To assess cyclin D1 effects in vivo, we generated transgenic mice with acute and continuous mammary gland-targeted cyclin D1 expression. These transgenic mice presented with increased tumor prevalence and signature CIN gene profiles. Additionally, interrogation of gene expression from 2,254 human breast tumors revealed that cyclin D1 expression correlated with CIN in luminal B breast cancer. These data suggest that cyclin D1 contributes to CIN and tumorigenesis by directly regulating a transcriptional program that governs chromosomal stability.
Subjects
Chromosomal Instability , Cyclin D1/genetics , Animals , Binding Sites , Breast Neoplasms/genetics , Cell Line, Tumor , Chromatin Immunoprecipitation , Chromosome Aberrations , Female , Fibroblasts/metabolism , Gene Expression Regulation, Neoplastic , Genome-Wide Association Study , Humans , Karyotyping , Mice , Mice, Transgenic , Transcription, Genetic
ABSTRACT
HIV proteins target host hub proteins for transient binding interactions. The presence of viral proteins in the infected cell leads to out-competition of host proteins in their interactions with hub proteins, drastically affecting cell physiology. Functional genomics and interactome datasets can be used to quantify the sequence hotspots on the HIV proteome that mediate interactions with host hub proteins. In this study, we used HIV and human interactome databases to identify HIV-targeted host hub proteins and their host binding partners (H2). We developed a high-throughput computational procedure that applies motif discovery algorithms to sets of protein sequences, including the sequences of HIV and H2 proteins. We defined HIV sequence hotspots as those linear motifs that are highly conserved on HIV sequences and, at the same time, statistically enriched on the sequences of H2 proteins. The HIV protein motifs discovered in this study are shared by subsets of H2 host proteins that are potentially outcompeted by HIV proteins. A large subset of these motifs is involved in cleavage, nuclear localization, phosphorylation, and transcription factor binding events. Many of these motifs cluster on HIV sequences in the form of hotspots. The sequence positions of these hotspots are consistent with the curated literature on phenotype-altering residue mutations, as well as with existing binding site data. The hotspot map produced in this study is the first global portrayal of HIV motifs involved in altering the host protein network at highly connected hub nodes.
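The hotspot criterion above combines two conditions. A motif must be conserved across HIV sequence variants, and it must be over-represented among H2 sequences. The sketch below expresses that filter with Python regular expressions. It is an assumption-laden illustration, not the study's pipeline:

- Motifs are given as regexes.
- Conservation is the fraction of viral variants containing the motif.
- Enrichment is a simple fold ratio against a background set; a real analysis would attach a significance test.

The thresholds `min_conservation=0.9` and `min_fold=2.0` and all sequences are hypothetical.

```python
import re
from typing import List


def motif_fraction(pattern: str, sequences: List[str]) -> float:
    """Fraction of sequences containing at least one match of the motif regex."""
    rx = re.compile(pattern)
    return sum(1 for s in sequences if rx.search(s)) / len(sequences)


def is_hotspot_candidate(pattern: str, viral_variants: List[str], h2_seqs: List[str],
                         background_seqs: List[str], min_conservation: float = 0.9,
                         min_fold: float = 2.0) -> bool:
    """Keep motifs conserved on viral variants and over-represented among H2 proteins."""
    conservation = motif_fraction(pattern, viral_variants)
    f_h2 = motif_fraction(pattern, h2_seqs)
    f_bg = motif_fraction(pattern, background_seqs)
    if f_h2 == 0:
        return False  # motif absent from H2 proteins: nothing to enrich
    fold = f_h2 / f_bg if f_bg > 0 else float("inf")
    return conservation >= min_conservation and fold >= min_fold


# Toy sequences; "P..P" is a PxxP-style motif written as a regex
variants = ["AAPQRPAA", "GGPLTPGG", "CCPAAPCC"]
h2 = ["MPKAPM", "PEEP", "AAAA", "PQQP"]
background = ["AAAA", "CCCC", "PAAP", "DDDD", "EEEE", "FFFF"]
```

With these toy sets, the motif is present in every viral variant and in 3 of 4 H2 sequences, but in only 1 of 6 background sequences, so it passes both filters.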
Subjects
Human Immunodeficiency Virus Proteins/metabolism , Protein Interaction Mapping/methods , Proteins/metabolism , Amino Acid Motifs/genetics , Amino Acid Sequence , Binding Sites/genetics , CREB-Binding Protein/metabolism , Calcium-Calmodulin-Dependent Protein Kinases/metabolism , Calmodulin/metabolism , Casein Kinase II/metabolism , Databases, Protein , Human Immunodeficiency Virus Proteins/chemistry , Human Immunodeficiency Virus Proteins/genetics , Humans , Hydrophobic and Hydrophilic Interactions , Mitogen-Activated Protein Kinase 1/metabolism , Models, Molecular , Protein Binding , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/genetics , env Gene Products, Human Immunodeficiency Virus/chemistry , env Gene Products, Human Immunodeficiency Virus/genetics , env Gene Products, Human Immunodeficiency Virus/metabolism , gag Gene Products, Human Immunodeficiency Virus/chemistry , gag Gene Products, Human Immunodeficiency Virus/genetics , gag Gene Products, Human Immunodeficiency Virus/metabolism , nef Gene Products, Human Immunodeficiency Virus/chemistry , nef Gene Products, Human Immunodeficiency Virus/genetics , nef Gene Products, Human Immunodeficiency Virus/metabolism , rev Gene Products, Human Immunodeficiency Virus/chemistry , rev Gene Products, Human Immunodeficiency Virus/genetics , rev Gene Products, Human Immunodeficiency Virus/metabolism , tat Gene Products, Human Immunodeficiency Virus/chemistry , tat Gene Products, Human Immunodeficiency Virus/genetics , tat Gene Products, Human Immunodeficiency Virus/metabolism
ABSTRACT
Viral proteins redirect host protein pathways toward the synthesis of viral particles by breaking and making network edges through binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins. The approach is based on the sequences of viral and host proteins and on literature-curated virus-host protein interactome data. We apply a motif discovery algorithm repeatedly to collections of sequences of viral proteins and of the immediate binding partners of their host targets. We keep only those motifs that are conserved on viral sequences and highly statistically enriched among the binding partners of virus-targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance. However, the approach is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 T-cell epitopes of HIV Nef, highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear to crowd the T cell receptor, natural killer cell-mediated cytotoxicity, and neurotrophin signaling pathways. Scanning HIV Nef motifs on multiple alignments of the hepatitis C protein NS5A produces results consistent with the literature. This indicates the potential value of hotspot discovery in advancing our understanding of virus-host crosstalk.
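Comparing predicted hotspots with experimentally mapped binding sites reduces to an interval-overlap check along the protein sequence. The following is a minimal sketch, assuming both are given as 1-based inclusive (start, end) residue ranges. The intervals below are invented for illustration and are not Nef coordinates from the study. `overlapping_hits` and `hit_fraction` are hypothetical helpers; the fraction is the share of predicted hotspots that overlap at least one known site.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) residue range


def overlapping_hits(predicted: List[Interval], known: List[Interval]) -> List[Interval]:
    """Predicted hotspots that overlap at least one known binding site."""
    return [(ps, pe) for ps, pe in predicted
            if any(ps <= ke and ks <= pe for ks, ke in known)]


def hit_fraction(predicted: List[Interval], known: List[Interval]) -> float:
    """Share of predicted hotspots supported by a known site (0.0 if none predicted)."""
    return len(overlapping_hits(predicted, known)) / len(predicted) if predicted else 0.0


# Toy intervals (made up for illustration)
predicted = [(70, 78), (100, 105), (150, 160)]
known = [(72, 77), (155, 158)]
```

Two inclusive ranges overlap exactly when each one starts no later than the other ends, which is the single condition tested inside `any(...)`.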
Subjects
Computational Biology/methods , nef Gene Products, Human Immunodeficiency Virus/chemistry , nef Gene Products, Human Immunodeficiency Virus/metabolism , Amino Acid Motifs , Amino Acid Sequence , Antigens, CD4/chemistry , Antigens, CD4/metabolism , GTP-Binding Proteins/chemistry , GTP-Binding Proteins/metabolism , HLA-A Antigens/chemistry , HLA-A Antigens/metabolism , Humans , Mitogen-Activated Protein Kinase 1/chemistry , Mitogen-Activated Protein Kinase 1/metabolism , Molecular Sequence Data , Neoplasm Proteins/chemistry , Neoplasm Proteins/metabolism , Protein Binding , Proto-Oncogene Proteins c-fyn/chemistry , Proto-Oncogene Proteins c-fyn/metabolism , Proto-Oncogene Proteins c-hck/chemistry , Proto-Oncogene Proteins c-hck/metabolism , Proto-Oncogene Proteins c-vav/chemistry , Proto-Oncogene Proteins c-vav/metabolism , Receptors for Activated C Kinase , Receptors, Cell Surface/chemistry , Receptors, Cell Surface/metabolism
ABSTRACT
Global gene expression analysis of cancer and healthy tissues typically yields large numbers of genes that are significantly altered in cancer. Such data, however, have been difficult to interpret because gene lists vary widely across laboratories and individual studies use small sample sizes. In this investigation, we compiled microarray data obtained on the same platform family from 84 laboratories. The resulting database contains 1,043 healthy tissue samples and 4,900 cancer samples for 13 different tissue types. The primary cancers considered included adrenal gland, brain, breast, cervix, colon, kidney, liver, lung, ovary, pancreas, prostate, and skin tissues. We normalized the data together and analyzed subsets to discover genes involved in the transformation from normal to cancer tissue. Our integrated significance analysis of microarrays approach produced lists of the top 400 genes for each of the 13 cancer types. These lists were highly statistically enriched with genes already associated with cancer in research publications, excluding microarray studies (p < 1.31 × 10⁻¹²). The genes MTIM and RRM2 appeared in nine, and TOP2A in eight, of the lists of significantly altered genes in cancer. In total, 132 genes were present in at least four gene lists, and 11 of these had not previously been associated with cancer. This list contains 17 metal ion-binding proteins, 15 adenyl ribonucleotide-binding proteins, six kinases, and six transcription factors. Our results point to the value of integrating microarray data in the study of combination drug therapies targeting metastasis.
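Enrichment of a gene list for literature-associated genes is often scored with a one-sided hypergeometric test. The abstract does not state which test produced its p-value, so the following is only an illustrative sketch. `hypergeom_sf` is a hypothetical helper that computes P(X ≥ k) exactly with `math.comb` (Python 3.8+). Here N is the number of genes measured, K is the number of those already associated with cancer, n is the list length (400 here), and k is the observed overlap.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).
    Uses exact integer arithmetic; math.comb returns 0 for impossible terms."""
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)


# Small worked case: 10 genes, 4 "known", a list of 3 that contains 3 known genes
p_all_three = hypergeom_sf(3, 10, 4, 3)  # = 4 / 120 = 1/30
```

With real counts, the call would be `hypergeom_sf(k=overlap, N=genes_measured, K=known_cancer_genes, n=400)`. The study's own counts are not given in the abstract.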
Subjects
Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Humans , Neoplasms/classification
ABSTRACT
Single nucleotide polymorphisms (SNPs) constitute an important mode of genetic variation in the human genome. A small fraction of SNPs, about four thousand out of ten million, has been associated with genetic disorders and complex diseases. The present study focuses on SNPs that fall on protein domains, the 3D structures that facilitate the connectivity of proteins in cell signaling and metabolic pathways. We scanned the human proteome using the PROSITE web tool and identified proteins with SNP-containing domains. We showed that SNPs falling on protein domains are highly statistically enriched among SNPs linked to hereditary disorders and complex diseases. Proteins whose domains are dramatically altered by the presence of an SNP are even more likely to be among the proteins linked to hereditary disorders. Proteins with domain-altering SNPs form highly connected nodes in cellular pathways such as the focal adhesion, axon guidance, and autoimmune disease pathways. Statistical enrichment of domain/motif signatures in interacting protein pairs indicates extensive loss of connectivity in cell signaling pathways due to domain-altering SNPs, potentially leading to hereditary disorders.
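The domain-altering test described above can be illustrated offline with a PROSITE-style pattern. The sketch below is a simplification and an assumption, not the PROSITE web tool used in the study. `prosite_to_regex` translates the basic pattern syntax into a Python regular expression: x, [..], {..}, (n) and (n,m) repeats, and the < and > anchors. `snp_breaks_motif` reports whether a missense substitution destroys every match. Both function names are hypothetical. The example pattern is a C2H2 zinc-finger-style pattern, and the sequence is synthetic.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate basic PROSITE syntax into a Python regex.
    Unsupported (raises ValueError): '>' inside square brackets, e.g. [G>]."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        if element.startswith("<"):  # anchored at the N-terminus
            regex += "^"
            element = element[1:]
        at_c_term = element.endswith(">")  # anchored at the C-terminus
        if at_c_term:
            element = element[:-1]
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        regex += core
        if at_c_term:
            regex += "$"
    return regex


def snp_breaks_motif(sequence: str, position: int, alt: str, pattern: str) -> bool:
    """True if the wild-type sequence matches the pattern but the mutant
    (residue at 1-based `position` replaced by `alt`) no longer matches anywhere."""
    rx = re.compile(prosite_to_regex(pattern))
    mutant = sequence[: position - 1] + alt + sequence[position:]
    return bool(rx.search(sequence)) and rx.search(mutant) is None


zinc_finger_like = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
toy_seq = "CAACAAALAAAAAAAAHAAAH"  # synthetic sequence that matches once
```

In the toy sequence, replacing the first cysteine removes the only match, whereas a change in a spacer position (x) leaves the motif intact.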
Subjects
Polymorphism, Single Nucleotide, Proteins/chemistry, Proteins/genetics, Signal Transduction, Databases, Protein, Genetic Diseases, Inborn/genetics, Genetic Diseases, Inborn/metabolism, Genome, Human, Humans, Protein Structure, Tertiary, Proteome/analysis, Proteome/genetics, Proteome/metabolism
ABSTRACT
BACKGROUND: Much of the publicly available cancer microarray data is asymmetric: it belongs to datasets that contain no samples from normal tissue. Asymmetric data cannot be used in standard meta-analysis approaches (such as the inverse variance method) to obtain large sample sizes and increase statistical power. Noting that many normal tissue microarray samples exist in studies not involving cancer, we investigated the viability and accuracy of an integrated microarray analysis approach based on significance analysis of microarrays (merged SAM), using a collection of data from separate diseased and normal samples. RESULTS: We focused on five solid cancer types (colon, kidney, liver, lung, and pancreas), for which the available microarray data allowed us to compare meta-analysis and integrated approaches. Gene lists from merged SAM overlapped significantly with those from the validated inverse-variance method. Both the meta-analysis and merged SAM approaches captured the cell cycle aberrations that commonly occur across the different cancer types. However, the integrated SAM analysis reproduced the known cancer literature (excluding microarray studies) much more accurately than the meta-analysis. CONCLUSION: The merged SAM test is a powerful, robust approach for combining data from similar platforms and for analyzing asymmetric datasets, including those with only normal or only cancer samples, which cannot be used by meta-analysis methods. The integrated SAM approach can also be used to compare global gene expression between subtypes of cancer arising from the same tissue.
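The inverse variance method pools per-study effect estimates, and each estimate needs both cancer and normal samples from the same study. Asymmetric datasets therefore contribute nothing. The following Python code is a minimal fixed-effect sketch that assumes the per-study effect is a simple mean difference in expression for one gene. The function names are hypothetical. A study without both groups is skipped, which is the limitation the abstract describes.

```python
import math
from statistics import mean, variance


def study_effect(cancer, normal):
    """Mean difference and its variance for one study; None if asymmetric."""
    if len(cancer) < 2 or len(normal) < 2:
        return None  # no within-study contrast: cannot enter the meta-analysis
    diff = mean(cancer) - mean(normal)
    var = variance(cancer) / len(cancer) + variance(normal) / len(normal)
    return diff, var


def inverse_variance_pool(studies):
    """Fixed-effect pooling over (cancer, normal) value lists for one gene.

    Returns (pooled effect, standard error, z) using weights w = 1 / variance.
    """
    estimates = [e for e in (study_effect(c, n) for c, n in studies) if e is not None]
    if not estimates:
        raise ValueError("no study has both cancer and normal samples")
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return pooled, se, pooled / se
```

A cancer-only study passed in the list is silently dropped. Merged SAM instead pools samples across studies before computing one statistic, so such data are not discarded.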
Subjects
Gene Expression Profiling/methods, Neoplasms/genetics, Oligonucleotide Array Sequence Analysis/methods, Data Interpretation, Statistical, Databases, Genetic, Humans, Neoplasms/classification
ABSTRACT
BACKGROUND: Phosphorylation events direct the flow of signals and metabolites along cellular protein networks. Current annotations of kinase-substrate binding events are far from complete. In this study, we scanned all human protein sequences using the PROSITE domain annotation tool to identify patterns of domain composition in kinases and their substrates. We identified statistically enriched pairs of domain strings (signature pairs) in the kinase-substrate couples listed in the 2006 version of the PTM database. RESULTS: The signature pairs enriched in kinase-substrate binding interactions turned out to be highly specific to kinase subtypes. The resulting list of signature pairs predicted kinase-substrate interactions in a validation dataset not used for training, with high statistical accuracy. CONCLUSIONS: The method presented here predicts protein phosphorylation events with high accuracy and moderate coverage. It can be used to expand the currently available drafts of cell signaling pathways and thus will be an important tool in the development of combination drug therapies targeting complex diseases.
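The signature-pair idea can be sketched as counting (kinase domain string, substrate domain string) pairs in known couples, then keeping pairs seen more often than expected by chance. The abstract does not give its enrichment statistic. This Python sketch assumes a simple observed/expected ratio under independence of the two marginals. The thresholds, domain labels and function names are hypothetical.

```python
from collections import Counter


def enriched_signature_pairs(couples, signature, min_count=2, min_ratio=2.0):
    """Signature pairs over-represented among known kinase-substrate couples.

    couples: list of (kinase_id, substrate_id).
    signature: protein_id -> domain string (e.g. concatenated domain labels).
    Enrichment = observed / expected, with expected computed from the kinase
    and substrate signature frequencies assuming independence (an assumption).
    """
    pairs = [(signature[k], signature[s]) for k, s in couples]
    total = len(pairs)
    pair_counts = Counter(pairs)
    kin_counts = Counter(ks for ks, _ in pairs)
    sub_counts = Counter(ss for _, ss in pairs)
    enriched = set()
    for (ks, ss), observed in pair_counts.items():
        expected = kin_counts[ks] * sub_counts[ss] / total
        if observed >= min_count and observed / expected >= min_ratio:
            enriched.add((ks, ss))
    return enriched


def predict(kinase, substrate, signature, enriched):
    """Predict phosphorylation when the pair's domain signatures are enriched."""
    return (signature.get(kinase), signature.get(substrate)) in enriched
```

In this design, validation uses only couples withheld from `couples`. Predictions are made for those held-out pairs and checked against their known status.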
Subjects
Phosphotransferases/metabolism, Proteome/analysis, Humans, Phosphorylation, Protein Binding, Protein Structure, Tertiary, Sequence Analysis, Protein, Signal Transduction, Substrate Specificity
ABSTRACT
The Drosophila Dachshund (Dac) gene, cloned as a dominant inhibitor of the hyperactive growth factor mutant ellipse, encodes a key component of the retinal determination gene network that governs cell fate. Herein, cyclic amplification and selection of targets (CAST) identified a DACH1 DNA-binding sequence that resembles the FOX (Forkhead box-containing protein) binding site. Genome-wide in silico promoter analysis of DACH1 binding sites identified gene clusters populating cellular pathways associated with the cell cycle and growth factor signaling. ChIP coupled with high-throughput sequencing mapped DACH1 binding sites to the corresponding gene clusters predicted in silico and identified a weight matrix resembling the CAST-defined sequence. DACH1 antagonized FOXM1 target gene expression, promoter occupancy in the context of local chromatin, and contact-independent growth. Attenuation of FOX function by the cell fate determination pathway has broad implications, given the diverse roles of FOX proteins in cellular biology and tumorigenesis.
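A weight matrix of the kind mentioned above scores every window of a promoter sequence by summing per-position log-odds values. The following Python code is a minimal sketch of a standard position weight matrix scan. It assumes a uniform background, a pseudocount of 0.5 and forward-strand scanning only. The example sites used to build a matrix are made up and are not the DACH1 or FOX sequences.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5):
    """Log2-odds weight matrix from equal-length aligned sites vs a uniform background."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / 0.25)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Yield (position, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(pwm, window))
        if score >= threshold:
            yield start, score
```

A genome-wide promoter analysis would also scan the reverse complement and would usually calibrate the threshold against background sequences rather than fixing it by hand.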
Subjects
Eye Proteins/metabolism, Forkhead Transcription Factors/metabolism, Retina/metabolism, Transcription Factors/metabolism, Binding Sites, Cell Lineage, Chromatin/chemistry, Computational Biology/methods, DNA/chemistry, Forkhead Box Protein M1, Gene Expression Regulation, Genome, HeLa Cells, Humans, Promoter Regions, Genetic, Protein Binding, Signal Transduction
ABSTRACT
Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by the human mitogen-activated protein kinases (MAPKs) ERK1 and ERK2. MAPKs are known to phosphorylate their substrates by first binding to them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, are absent from HIV Rev, and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs; the other reduces the number of basic amino acids required in the standard docking motifs from two to one. The proposed patterns are consistent with in silico docking between ERK1 and the HIV matrix protein. Motif usage on HIV proteins differs sufficiently in amino acid sequence from that on human proteins to allow HIV-specific targeting with small-molecule drugs.
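The revision that lowers the basic-residue requirement from two to one can be illustrated with Python regular expressions. This sketch assumes a commonly cited simplified docking (D-site) form: basic residues, a short spacer, then hydrophobic-x-hydrophobic. Neither pattern is the paper's exact expression, and the test peptides are made up. A lookahead with a capture group reports overlapping matches.

```python
import re

# Hypothetical, simplified D-site patterns (assumptions, not the paper's exact
# expressions): basic residue(s), a 1-6 residue spacer, then [LIV]-x-[LIV].
STANDARD_DOCKING = re.compile(r"(?=([KR]{2}.{1,6}[LIV].[LIV]))")
ONE_BASIC_DOCKING = re.compile(r"(?=([KR].{1,6}[LIV].[LIV]))")


def docking_sites(protein, pattern):
    """Return (start, matched peptide) for every, possibly overlapping, match."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein)]
```

Applied across aligned sequences of each subtype, the relaxed pattern matches peptides that carry only one basic residue. The stricter two-residue form misses these peptides, and such differences can make a motif appear strain-dependent.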
Subjects
HIV/metabolism, Mitogen-Activated Protein Kinases/metabolism, Sequence Alignment, Viral Proteins/metabolism, Humans, Mitogen-Activated Protein Kinases/chemistry, Phosphorylation, Protein Binding
ABSTRACT
BACKGROUND: The HIV viral genome mutates at a high rate and poses a significant long-term health risk even in the presence of combination antiretroviral therapy. Current methods for predicting a patient's response to therapy rely on site-directed mutagenesis experiments and in vitro resistance assays. In this bioinformatics study we treat response to antiretroviral therapy as a two-body problem: response to therapy is considered a function of both the host and pathogen proteomes. We set out to identify potential responders based on the presence or absence of host protein and DNA motifs on the HIV proteome. RESULTS: An alignment of thousands of HIV-1 sequences revealed extensive variation in nucleotide sequence but also conservation of eukaryotic short linear motifs in the protein-coding regions. The reduction in viral load of patients in the Stanford HIV Drug Resistance Database showed a bimodal distribution after 24 weeks of antiretroviral therapy, with a cutoff of 2,000 copies/ml. Similarly, patients allocated to responder and non-responder categories based on consistent viral load reduction over a 24-week period showed clear separation. In both cases of phenotype identification, a set of features composed of short linear motifs in the reverse transcriptase region of the HIV sequence accurately predicted a patient's response to therapy. Motifs overlapping resistance sites were highly predictive of responder identification in single-drug regimens, but these features lost importance in defining responders in multi-drug therapies. CONCLUSION: The HIV sequence mutates in a way that preferentially preserves peptide sequence motifs that are also found in the human proteome. The presence or absence of such motifs at specific regions of the HIV sequence is highly predictive of response to therapy. Some of these predictive motifs overlap with known HIV-1 resistance sites. These motifs are well established in bioinformatics databases and hence do not require identification via in vitro mutation experiments.
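The prediction setup can be sketched in two parts: binary presence/absence features for each short linear motif in a reverse transcriptase (RT) sequence, and a responder label from the 2,000 copies/ml cutoff. This Python sketch assumes a patient is a responder when the week-24 viral load falls below the cutoff. It does not reproduce the study's exact labeling, and the motif patterns and function names are hypothetical. Any classifier could then be trained on the resulting `X, y`.

```python
import re

CUTOFF_COPIES_PER_ML = 2000  # cutoff reported in the abstract


def is_responder(viral_load_week24):
    """Assumption: responder if the week-24 viral load is below the cutoff."""
    return viral_load_week24 < CUTOFF_COPIES_PER_ML


def motif_features(rt_sequence, motifs):
    """Binary presence (1) / absence (0) of each motif regex in an RT sequence."""
    return [1 if re.search(pattern, rt_sequence) else 0 for pattern in motifs.values()]


def build_dataset(patients, motifs):
    """patients: iterable of (rt_sequence, week24_viral_load) pairs.

    Returns feature rows X and integer labels y (1 = responder).
    """
    X = [motif_features(seq, motifs) for seq, _ in patients]
    y = [int(is_responder(load)) for _, load in patients]
    return X, y
```

Because features are defined by motif patterns already catalogued in databases, this representation needs no in vitro mutation data; only the labels depend on measured viral loads.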