ABSTRACT
Analysis of NGS and other sequencing data, gene variants, gene expression, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high-fidelity annotated knowledgebase of protein interactions, pathways, and functional ontologies. This knowledgebase has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting. Here, we present MetaCore™ and Key Pathway Advisor (KPA), an integrated platform for functional data analysis. On the content side, MetaCore and KPA encompass a comprehensive database of molecular interactions of different types, pathways, network models, and ten functional ontologies covering human, mouse, and rat genes. The analytical toolkit includes tools for gene/protein list enrichment analysis, a statistical "interactome" tool for the identification of over- and under-connected proteins in the dataset, and a biological network analysis module made up of network generation algorithms and filters. The suite also features Advanced Search, an application for combinatorial search of the database content, as well as a Java-based tool called Pathway Map Creator for drawing and editing custom pathway maps. Applications of MetaCore and KPA include research into the molecular mode of action of disease, identification of potential biomarkers and drug targets, pathway hypothesis generation, analysis of biological effects of novel small-molecule compounds, and clinical applications (analysis of large cohorts of patients, and translational and personalized medicine).
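Gene/protein list enrichment analysis of the kind mentioned in this abstract is commonly scored with a one-sided hypergeometric test. The sketch below illustrates that general technique only; it is not taken from MetaCore or KPA, and the function name `enrichment_p_value` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, list_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    universe_size: number of genes in the background set (assumed input)
    pathway_size:  genes annotated to the pathway or ontology term
    list_size:     genes in the experimental list
    overlap:       genes found in both the list and the pathway
    """
    max_overlap = min(pathway_size, list_size)
    # Number of ways to draw the experimental list from the background.
    total = comb(universe_size, list_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

A small p-value suggests that the list shares more genes with the pathway than random sampling from the background would explain; in practice the values would also need correction for multiple testing across all pathways examined.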
Subjects
Computational Biology/methods , Gene Regulatory Networks , Protein Interaction Mapping , Algorithms , Animals , Humans , Knowledge Bases , Mice , Rats
ABSTRACT
Analysis of gene co-expression networks is a powerful "data-driven" tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of "data-driven" co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson's correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origin. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, implemented as the cisExpress software module, was specifically designed for analysis of very large data sets such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We discovered that although co-expression modules are populated with different sets of genes, they share distinct stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely with the cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top-scoring motifs are typically shared between several tissues; they define sets of target genes responsible for specific functions in carcinogenesis.
Both the co-expression modules and a database of precalculated motifs are publicly available and accessible for further studies.
Subjects
Computational Biology/methods , Gene Regulatory Networks , Neoplasms/genetics , Algorithms , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Humans , Oligonucleotide Array Sequence Analysis/methods , Response Elements
ABSTRACT
Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy.
Subjects
Algorithms , Biomarkers/metabolism , Computer Simulation , Area Under Curve , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Female , Gene Expression Profiling , Humans , Models, Biological , Paclitaxel/pharmacology , Paclitaxel/therapeutic use , ROC Curve , Reproducibility of Results , Transcriptome/genetics
ABSTRACT
Gene coexpression network analysis is a powerful "data-driven" approach essential for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate coexpression data from scattered sources in a concise "meta-analysis" framework. We generated such a resource by exploring gene coexpression networks in 82 microarray datasets from 9 major human cancer types. The analysis was conducted using an elaborate weighted gene coexpression network analysis (WGCNA) methodology and identified over 3,000 robust gene coexpression modules. The modules covered a range of known tumor features, such as proliferation, extracellular matrix remodeling, hypoxia, inflammation, angiogenesis, tumor differentiation programs, specific signaling pathways, genomic alterations, and biomarkers of individual tumor subtypes. To prioritize genes with respect to those tumor features, we ranked genes within each module by connectivity, leading to identification of module-specific functionally prominent hub genes. To showcase the utility of this network information, we positioned known cancer drug targets within the coexpression networks and predicted that Anakinra, an anti-rheumatoid therapeutic agent, may be promising for development in colorectal cancer. We offer a comprehensive, normalized, and well-documented collection of >3,000 gene coexpression modules in a variety of cancers as a rich data resource to facilitate further progress in cancer research.
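Ranking genes within a module by connectivity can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: connectivity is taken here as the sum of unsigned soft-thresholded correlations |r|^beta to the other genes in the module, the power `beta=6` is an assumed setting, and the gene names and expression values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_by_connectivity(module, beta=6):
    """Rank genes by intramodular connectivity, highest first.

    module: dict mapping gene name -> list of expression values
    beta:   soft-threshold power (assumed; unsigned adjacency |r| ** beta)
    """
    genes = list(module)
    connectivity = {
        g: sum(abs(pearson(module[g], module[h])) ** beta for h in genes if h != g)
        for g in genes
    }
    return sorted(genes, key=lambda g: connectivity[g], reverse=True)


# Hypothetical toy module: GENE_A and GENE_B track each other closely,
# while GENE_C is only weakly correlated with either of them.
toy_module = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [1.1, 2.0, 2.9, 4.2, 5.1],
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
ranking = rank_by_connectivity(toy_module)
```

Genes at the head of `ranking` are the hub candidates; in the toy module the weakly correlated GENE_C ends up last.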
Subjects
Gene Expression/genetics , Gene Regulatory Networks/genetics , Neoplasms/genetics , Biomarkers, Tumor/genetics , Cell Differentiation/genetics , Cell Proliferation/genetics , Data Mining/methods , Drug Repositioning/methods , Gene Expression Profiling/methods , Genomics/methods , Humans , Hypoxia/genetics , Inflammation/genetics , Signal Transduction/genetics
ABSTRACT
We analyzed the functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the 40 million single nucleotide polymorphisms (SNPs) dataset from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest density SNP collection for any higher plant. We have shown that the DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable ones. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the observed profound patterns of nucleotide variability reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We have observed a sharp decline in nucleotide diversity that begins at about 250 nucleotides upstream of the transcription start and reaches minimal diversity exactly at the transcription start. We found the transcription termination sites to have remarkably symmetrical patterns of SNP density, implying the presence of functional sites near transcription termination. Also, nucleotide diversity was significantly lower near 3' UTRs, an area rich in regulatory regions.
Subjects
DNA, Intergenic/genetics , Genome, Plant/genetics , Nucleotides/genetics , Polymorphism, Single Nucleotide/genetics , 3' Untranslated Regions/genetics , Codon, Terminator/genetics , Gene Regulatory Networks , Genomics/methods , Oryza/genetics , Transcription, Genetic/genetics
ABSTRACT
Using a three-dimensional coculture model, we identified significant subtype-specific changes in gene expression, metabolic, and therapeutic sensitivity profiles of breast cancer cells in contact with cancer-associated fibroblasts (CAF). CAF-induced gene expression signatures predicted clinical outcome and immune-related differences in the microenvironment. We found that fibroblasts strongly protect carcinoma cells from lapatinib, attributable to its reduced accumulation in carcinoma cells and an elevated apoptotic threshold. Fibroblasts from normal breast tissues and stromal cultures of brain metastases of breast cancer had effects similar to those of CAFs. Using synthetic lethality approaches, we identified molecular pathways whose inhibition sensitizes HER2+ breast cancer cells to lapatinib both in vitro and in vivo, including JAK2/STAT3 and hyaluronic acid. Neoadjuvant lapatinib therapy in HER2+ breast tumors led to a significant increase of phospho-STAT3+ cancer cells and a decrease in the spatial proximity of proliferating (Ki67+) cells to CAFs, impacting therapeutic responses. Our studies identify CAF-induced physiologically and clinically relevant changes in cancer cells and offer novel approaches for overcoming microenvironment-mediated therapeutic resistance. Cancer Res; 76(22); 6495-506. ©2016 AACR.
Subjects
Breast Neoplasms/metabolism , Fibroblasts/metabolism , Gene Expression Profiling/methods , Breast Neoplasms/pathology , Cell Line, Tumor , Humans , Treatment Outcome
ABSTRACT
The term 'ancient DNA' (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of 'molecular paleontology'. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas. This change was brought about by the development of new strategies for coping with the challenges of studying aDNA: damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research.
Subjects
DNA, Ancient , Evolution, Molecular , Genetics, Population/methods , Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Animals , Humans
ABSTRACT
The Kets, an ethnic group in the Yenisei River basin, Russia, are considered the last nomadic hunter-gatherers of Siberia, and the Ket language has no transparent affiliation with any language family. We investigated connections between the Kets and Siberian and North American populations, with emphasis on the Mal'ta and Paleo-Eskimo ancient genomes, using original data from 46 unrelated samples of Kets and 42 samples of their neighboring ethnic groups (Uralic-speaking Nganasans, Enets, and Selkups). We genotyped over 130,000 autosomal SNPs, identified mitochondrial and Y-chromosomal haplogroups, and performed high-coverage genome sequencing of two Ket individuals. We established that Nganasans, Kets, Selkups, and Yukaghirs form a cluster of populations most closely related to Paleo-Eskimos in Siberia (not considering indigenous populations of Chukotka and Kamchatka). Kets are closely related to modern Selkups and to some Bronze and Iron Age populations of the Altai region, with all these groups sharing a high degree of Mal'ta ancestry. Implications of these findings for the linguistic hypothesis uniting Ket and Na-Dene languages into a language macrofamily are discussed.
Subjects
DNA, Mitochondrial/genetics , Ethnicity/genetics , Genome, Human , Inuit/genetics , Phylogeny , Polymorphism, Single Nucleotide , Chromosomes, Human, Y , Genetic Variation , Haplotypes , Human Migration , Humans , Language , Phylogeography , Siberia
ABSTRACT
BACKGROUND: The length of a protein sequence is largely determined by its function. In certain species, it may also be affected by additional factors, such as growth temperature or acidity. In 2002, it was shown that in the bacterium Escherichia coli and in the archaeon Archaeoglobus fulgidus, protein sequences with no homologs were, on average, shorter than those with homologs (BMC Evol Biol 2:20, 2002). It is now generally accepted that in bacterial and archaeal genomes the distributions of protein length differ between sequences with and without homologs. In this study, we examine this postulate by conducting a comprehensive analysis of all annotated prokaryotic genomes and by focusing on certain exceptions. RESULTS: We compared the distribution of lengths of "having homologs proteins" (HHPs) and "non-having homologs proteins" (orphans or ORFans) in all currently completely sequenced and COG-annotated prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs are indeed, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFan distributions, with the mean length of ORFans greater than the mean length of HHPs. These genera constitute over 80% of atypical genomes. CONCLUSIONS: We confirmed on a broad set of genomes the previous observation that HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have very distinctive distributions of ORFan lengths.
We offer several possible biological explanations of this phenomenon, such as an adaptation to Mycoplasmataceae's ecological niche, specifically its "quiet" co-existence with host organisms, resulting in long ABC transporters.
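The criterion for an atypical genome described in this abstract (mean ORFan length exceeding mean HHP length) can be expressed in a few lines. This is a minimal sketch of the comparison only; the helper name `is_atypical`, the genome labels, and the protein lengths are hypothetical and are not data from the study.

```python
from statistics import mean


def is_atypical(hhp_lengths, orfan_lengths):
    """Return True when ORFans are, on average, longer than HHPs."""
    return mean(orfan_lengths) > mean(hhp_lengths)


# Hypothetical protein lengths (amino acids) for two made-up genomes.
genomes = {
    "typical_genome": {"hhp": [310, 450, 280, 520], "orfan": [90, 120, 75]},
    "atypical_genome": {"hhp": [300, 340, 290], "orfan": [610, 480, 550]},
}

# Collect the genomes that break the usual "ORFans are shorter" rule.
atypical = [
    name for name, g in genomes.items() if is_atypical(g["hhp"], g["orfan"])
]
```

A comparison of means is the simplest possible check; comparing the full length distributions would need a distribution-level test in addition.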
Subjects
Bacterial Proteins/metabolism , Mycoplasmataceae/metabolism , Bacterial Proteins/genetics , Genome, Bacterial/genetics , Mycoplasmataceae/genetics , Open Reading Frames/genetics
ABSTRACT
Development of drug responsive biomarkers from pre-clinical data is a critical step in drug discovery, as it enables patient stratification in clinical trial design. Such translational biomarkers can be validated in early clinical trial phases and utilized as a patient inclusion parameter in later stage trials. Here we present a study on building accurate and selective drug sensitivity models for Erlotinib or Sorafenib from pre-clinical in vitro data, followed by validation of individual models on corresponding treatment arms from patient data generated in the BATTLE clinical trial. A Partial Least Squares Regression (PLSR) based modeling framework was designed and implemented, using a special splitting strategy and canonical pathways to capture robust information for model building. Erlotinib and Sorafenib predictive models could be used to identify a sub-group of patients that respond better to the corresponding treatment, and these models are specific to the corresponding drugs. The model derived signature genes reflect each drug's known mechanism of action. Also, the models predict each drug's potential cancer indications consistent with clinical trial results from a selection of globally normalized GEO expression datasets.
Subjects
Antineoplastic Agents/pharmacology , Erlotinib Hydrochloride/pharmacology , Gene Expression Regulation, Neoplastic , Models, Statistical , Neoplasms/drug therapy , Niacinamide/analogs & derivatives , Phenylurea Compounds/pharmacology , Biomarkers, Pharmacological , Cell Line, Tumor , Clinical Trials, Phase II as Topic , Drug Evaluation, Preclinical , Drug Resistance, Neoplasm/genetics , Gene Regulatory Networks , Humans , Neoplasms/genetics , Neoplasms/mortality , Neoplasms/pathology , Niacinamide/pharmacology , Signal Transduction , Sorafenib , Survival Analysis
ABSTRACT
BACKGROUND: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. RESULTS: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. CONCLUSIONS: We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
Subjects
Gene Expression Profiling , Neuroblastoma/genetics , Oligonucleotide Array Sequence Analysis , Sequence Analysis, RNA , Adolescent , Adult , Child , Child, Preschool , Endpoint Determination , Female , Humans , Infant , Infant, Newborn , Male , Models, Genetic , Neuroblastoma/classification , Neuroblastoma/diagnosis , Tumor Cells, Cultured , Young Adult
ABSTRACT
BACKGROUND: Despite a growing number of studies evaluating prostate cancer (CaP)-specific gene alterations, oncogenic activation of the ETS Related Gene (ERG) by gene fusions remains the most validated cancer gene alteration in CaP. Prevalent gene fusions have been described between the ERG gene and promoter upstream sequences of androgen-inducible genes, predominantly TMPRSS2 (transmembrane protease serine 2). Despite the extensive evaluations of ERG genomic rearrangements, fusion transcripts and the ERG oncoprotein, the prognostic value of ERG remains to be better understood. Using a gene expression dataset from matched prostate tumor and normal epithelial cells from an 80 GeneChip experiment examining 40 tumors and their matching normal pairs in 40 patients with known ERG status, we conducted a cancer signaling-focused functional analysis of prostatic carcinoma representing moderate and aggressive cancers stratified by ERG expression. RESULTS: In the present study of matched pairs of laser capture microdissected normal epithelial cells and well-to-moderately differentiated tumor epithelial cells with known ERG gene expression status from 20 patients with localized prostate cancer, we have discovered novel ERG-associated biochemical networks. CONCLUSIONS: Using causal network reconstruction methods, we have identified three major signaling pathways related to the MAPK/PI3K cascade that may contribute synergistically to ERG-dependent tumor development. Moreover, the key components of these pathways have potential as biomarkers and therapeutic targets for ERG-positive prostate tumors.
ABSTRACT
The concordance of RNA-sequencing (RNA-seq) with microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed using a range of chemical treatment conditions. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed in triplicate to varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOAs). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is linearly correlated with treatment effect size (R² ≈ 0.8). Furthermore, the concordance is also affected by transcript abundance and biological complexity of the MOA. RNA-seq outperforms microarray (93% versus 75%) in DEG verification as assessed by quantitative PCR, with the gain mainly due to its improved accuracy for low-abundance transcripts. Nonetheless, classifiers to predict MOAs perform similarly when developed using data from either platform. Therefore, the endpoint studied and its biological complexity, transcript abundance and the genomic application are important factors in transcriptomic research and for clinical and regulatory decision making.
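The R² reported for the linear relationship between concordance and treatment effect size is the coefficient of determination of a least-squares line. The sketch below shows how such a value is computed in general; the function name `r_squared` and the example numbers are hypothetical and are not taken from the study.

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical values: effect size per treatment vs. cross-platform concordance.
effect_size = [0.0, 1.0, 2.0, 3.0]
concordance = [0.0, 1.0, 0.0, 1.0]
fit_quality = r_squared(effect_size, concordance)
```

An R² close to 1 means that most of the variation in concordance is explained by the linear dependence on effect size.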
Subjects
Oligonucleotide Array Sequence Analysis , RNA, Messenger/genetics , Sequence Analysis, RNA , Animals , Rats
ABSTRACT
Recurrent mutations in histone-modifying enzymes imply key roles in tumorigenesis, yet their functional relevance is largely unknown. Here, we show that JARID1B, encoding a histone H3 lysine 4 (H3K4) demethylase, is frequently amplified and overexpressed in luminal breast tumors, and that a somatic mutation in a basal-like breast cancer results in the gain of unique chromatin binding and luminal expression and splicing patterns. Downregulation of JARID1B in luminal cells induces basal gene expression and growth arrest, which is rescued by TGFβ pathway inhibitors. Integrated JARID1B chromatin binding, H3K4 methylation, and expression profiles suggest a key function for JARID1B in luminal cell-specific expression programs. High luminal JARID1B activity is associated with poor outcome in patients with hormone receptor-positive breast tumors.
Subjects
Breast Neoplasms/genetics , Jumonji Domain-Containing Histone Demethylases/genetics , Nuclear Proteins/genetics , Oncogenes , Repressor Proteins/genetics , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , CCCTC-Binding Factor , Cell Growth Processes/genetics , Cell Line, Tumor , Cell Lineage , Female , Gene Amplification , Gene Expression Regulation, Neoplastic , Histones/genetics , Histones/metabolism , Humans , Jumonji Domain-Containing Histone Demethylases/metabolism , MCF-7 Cells , Mutation , Nuclear Proteins/metabolism , Promoter Regions, Genetic , Pyrazoles/pharmacology , Pyrroles/pharmacology , RNA, Small Interfering/administration & dosage , RNA, Small Interfering/genetics , Repressor Proteins/metabolism , Transfection , Transforming Growth Factor beta/metabolism
ABSTRACT
The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model.
Subjects
Rats, Inbred F344/metabolism , Transcriptome , Alternative Splicing , Animals , Female , Gene Expression Profiling , Male , Protein Isoforms/metabolism , Rats, Inbred F344/growth & development , Sequence Analysis, RNA , Sex Characteristics
ABSTRACT
Early full-term pregnancy is one of the most effective natural protections against breast cancer. To investigate this effect, we have characterized the global gene expression and epigenetic profiles of multiple cell types from normal breast tissue of nulliparous and parous women and carriers of BRCA1 or BRCA2 mutations. We found significant differences in CD44(+) progenitor cells, where the levels of many stem cell-related genes and pathways, including the cell-cycle regulator p27, are lower in parous women without BRCA1/BRCA2 mutations. We also noted a significant reduction in the frequency of CD44(+)p27(+) cells in parous women and showed, using explant cultures, that parity-related signaling pathways play a role in regulating the number of p27(+) cells and their proliferation. Our results suggest that pathways controlling p27(+) mammary epithelial cells and the numbers of these cells relate to breast cancer risk and can be explored for cancer risk assessment and prevention.
Subjects
Breast Neoplasms/etiology , Cell Lineage , Cyclin-Dependent Kinase Inhibitor p27/metabolism , Gene Expression Profiling , Mammary Glands, Human/cytology , Parity/genetics , Stem Cells/cytology , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Biomarkers/metabolism , Blotting, Western , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Cell Differentiation , Cell Proliferation , Cells, Cultured , Cyclin-Dependent Kinase Inhibitor p27/genetics , Epithelial Cells/cytology , Epithelial Cells/metabolism , Female , Fibroblasts/cytology , Fibroblasts/metabolism , Flow Cytometry , Fluorescent Antibody Technique , Humans , Immunoenzyme Techniques , Mammary Glands, Human/metabolism , Mutation/genetics , Oligonucleotide Array Sequence Analysis , Pregnancy , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Signal Transduction , Stem Cells/metabolism , Stromal Cells/cytology , Stromal Cells/metabolism
ABSTRACT
The discovery of novel drug targets is a significant challenge in drug development. Although the human genome comprises approximately 30,000 genes, proteins encoded by fewer than 400 are used as drug targets in the treatment of diseases. Therefore, novel drug targets are extremely valuable as the source for first-in-class drugs. On the other hand, many of the currently known drug targets are functionally pleiotropic and involved in multiple pathologies. Several of them are exploited for treating multiple diseases, which highlights the need for methods to reliably reposition drug targets to new indications. Network-based methods have been successfully applied to prioritize novel disease-associated genes. In recent years, several such algorithms have been developed, some focusing on local network properties only, and others taking the complete network topology into account. Common to all approaches is the understanding that novel disease-associated candidates are in close overall proximity to known disease genes. However, the relevance of these methods to the prediction of novel drug targets has not yet been assessed. Here, we present a network-based approach for the prediction of drug targets for a given disease. The method allows both the repositioning of drug targets known for other diseases to the given disease and the prediction of unexploited drug targets that are not used for treatment of any disease. Our approach takes as input a disease gene expression signature and a high-quality interaction network and outputs a prioritized list of drug targets. We demonstrate the high performance of our method and highlight the usefulness of the predictions in three case studies. We present novel drug targets for scleroderma and different types of cancer with their underlying biological processes. Furthermore, we demonstrate the ability of our method to identify unsuspected repositioning candidates using type 1 diabetes as an example.
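The idea that promising candidates lie in close network proximity to disease genes can be illustrated with a simple distance-based ranking. The abstract does not specify the scoring scheme, so the sketch below is an assumption: candidates are ordered by their mean shortest-path distance to the signature genes, and the function names, the unreachable-node penalty, and the toy network are all hypothetical.

```python
from collections import deque


def distances_from(network, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in network.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def prioritize_targets(network, signature, candidates):
    """Rank candidates by mean hop distance to the signature genes, closest first.

    Unreachable signature genes count as distance len(network), an assumed penalty.
    """
    penalty = len(network)
    scores = {}
    for candidate in candidates:
        dist = distances_from(network, candidate)
        scores[candidate] = sum(dist.get(g, penalty) for g in signature) / len(signature)
    return sorted(candidates, key=lambda c: scores[c])


# Hypothetical undirected interaction network as an adjacency mapping.
toy_network = {
    "SIG1": ["T1", "SIG2"],
    "SIG2": ["SIG1", "T1"],
    "T1": ["SIG1", "SIG2", "X"],
    "X": ["T1", "T2"],
    "T2": ["X"],
}
ranked = prioritize_targets(toy_network, ["SIG1", "SIG2"], ["T2", "T1"])
```

In the toy network, T1 interacts directly with both signature genes and is therefore ranked ahead of T2, which is three hops away. Methods that use the complete network topology, as mentioned in the abstract, would replace the shortest-path score with a global measure.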
Subjects
Computational Biology/methods , Drug Discovery/methods , Drug Repositioning , Algorithms , Cluster Analysis , Computer Simulation , Gene Expression Profiling , Gene Regulatory Networks , Humans , Molecular Targeted Therapy , ROC Curve , Reproducibility of Results
ABSTRACT
As is the case with any OMICs technology, the value of proteomics data is defined by the degree of its functional interpretation in the context of phenotype. Functional analysis of proteomics profiles is inherently complex, as each of hundreds of detected proteins can belong to dozens of pathways, be connected in different context-specific groups by protein interactions, and be regulated by a variety of one-step and remote regulators. The knowledge-based approach deals with this complexity by creating a structured database of protein interactions, pathways, and protein-disease associations from experimental literature, together with a set of statistical tools to compare the proteomics profiles with this rich source of accumulated knowledge. Here we describe the main methods of ontology enrichment, interactome topology, and network analysis applied to a comprehensive, manually curated, and semantically consistent knowledge source, MetaBase, and demonstrate several case studies in different disease areas.
Subjects
Databases, Protein/standards; Knowledge Bases; Proteomics/statistics & numerical data; Databases, Protein/statistics & numerical data; Humans; Proteins/genetics
Abstract
BACKGROUND: There is a resurgence of interest within the drug and biomarker development communities in the use of primary tumorgraft models as improved predictors of patient tumor response to novel therapeutic strategies. Despite perceived advantages over cell line-derived xenograft models, there are limited data comparing the genotype and phenotype of tumorgrafts to those of the donor patient tumor, which limits the determination of the molecular relevance of the tumorgraft model. This report directly compares the genomic characteristics of patient tumors and the derived tumorgraft models, including gene expression and oncogenic mutation status. METHODS: Fresh tumor tissues from 182 cancer patients were implanted subcutaneously into immune-compromised mice for the development of primary patient tumorgraft models. Histological assessment was performed on both patient tumors and the resulting tumorgraft models. Somatic mutations in key oncogenes and gene expression levels of the resulting tumorgrafts were compared to those of the matched patient tumors using the OncoCarta (Sequenom, San Diego, CA) and human gene microarray (Affymetrix, Santa Clara, CA) platforms, respectively. The genomic stability of the established tumorgrafts was assessed across serial in vivo generations in a representative subset of models. The genomes of patient tumors that formed tumorgrafts were compared to those that did not to identify the possible molecular basis of successful engraftment or rejection. RESULTS: Of the fresh tumor tissues from 182 cancer patients implanted into immune-compromised mice, forty-nine were successfully established as tumorgraft models, exhibiting strong histological and genomic fidelity to the originating patient tumors. When compared with the matched patient tumors, the transcriptomes and oncogenic mutations of the tumorgrafts were found to be stable across four tumorgraft generations. Not only did the various tumors retain their differentiation pattern, but the supporting stromal elements were also preserved. Genes down-regulated specifically in tumorgrafts were enriched in biological pathways involved in the host immune response, consistent with the immune-deficient status of the host. Patient tumors that successfully formed tumorgrafts were enriched for cell signaling, cell cycle, and cytoskeleton pathways and exhibited evidence of reduced immunogenicity. CONCLUSIONS: The preservation of the patient's tumor genomic profile and tumor microenvironment supports the view that primary patient tumorgrafts provide a relevant model to support the translation of new therapeutic strategies and personalized medicine approaches in oncology.
Subjects
Genomics; Neoplasms/genetics; Animals; Humans; Mice; Mice, Nude; Mutation; Neoplasms/pathology
Abstract
The ability to accurately predict the toxicity of drug candidates from their chemical structure is critical for guiding experimental drug discovery toward safer medicines. Under the guidance of the MetaTox consortium (Thomson Reuters, CA, USA), which comprised toxicologists from the pharmaceutical industry and government agencies, we created a comprehensive ontology of toxic pathologies for 19 organs, classifying pathology terms by pathology type and functional organ substructure. By manual annotation of full-text research articles, the ontology was populated with chemical compounds causing specific histopathologies. Annotated compound-toxicity associations defined histologically from rat and mouse experiments were used to build quantitative structure-activity relationship models predicting subcategories of liver and kidney toxicity: liver necrosis, liver relative weight gain, liver lipid accumulation, nephron injury, kidney relative weight gain, and kidney necrosis. All models were validated using two independent test sets and demonstrated overall good performance: initial validation showed 0.80-0.96 sensitivity (correctly predicted toxic compounds) and 0.85-1.00 specificity (correctly predicted non-toxic compounds). Later validation against a test set of compounds newly added to the database in the 2 years following initial model generation showed 75-87% sensitivity and 60-78% specificity. General hepatotoxicity and nephrotoxicity models were less accurate, as expected for more complex endpoints.
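The validation figures above follow the usual definitions: sensitivity is the fraction of toxic compounds predicted as toxic, and specificity is the fraction of non-toxic compounds predicted as non-toxic. A minimal sketch of that arithmetic is given below; the labels are made up for illustration and are not data from the study.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for binary labels (1 = toxic, 0 = non-toxic)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # toxic, predicted toxic
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # toxic, missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # non-toxic, predicted non-toxic
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # non-toxic, flagged as toxic
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical test set of ten compounds (illustrative values only).
actual_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(actual_labels, predicted_labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```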