Results 1 - 20 of 68
1.
Cancers (Basel) ; 14(9), 2022 Apr 21.
Article in English | MEDLINE | ID: mdl-35565214

ABSTRACT

Seventy percent of patients with colorectal cancer develop liver metastases (CRLM), which are a decisive factor in cancer progression. Therapy outcome is largely influenced by tumor heterogeneity, but the intra- and inter-patient heterogeneity of CRLM has been poorly studied. In particular, the contribution of the WNT and EGFR pathways, which are both frequently deregulated in colorectal cancer, has not yet been addressed in this context. To this end, we comprehensively characterized normal liver tissue and eight CRLM from two patients by standardized histopathological, molecular, and proteomic subtyping. Suitable fresh-frozen tissue samples were profiled by transcriptome sequencing (RNA-Seq) and proteomic profiling with reverse phase protein arrays (RPPA) combined with bioinformatic analyses to assess tumor heterogeneity and identify WNT- and EGFR-related master regulators and metastatic effectors. A standardized data analysis pipeline for integrating RNA-Seq with clinical, proteomic, and genetic data was established. Dimensionality reduction of the transcriptome data revealed a distinct signature for CRLM differing from normal liver tissue and indicated a high degree of tumor heterogeneity. WNT and EGFR signaling were highly active in CRLM and the genes of both pathways were heterogeneously expressed between the two patients as well as between the synchronous metastases of a single patient. An analysis of the master regulators and metastatic effectors implicated in the regulation of these genes revealed a set of four genes (SFN, IGF2BP1, STAT1, PIK3CG) that were differentially expressed in CRLM and were associated with clinical outcome in a large cohort of colorectal cancer patients as well as CRLM samples. 
In conclusion, high-throughput profiling enabled us to define a CRLM-specific signature and revealed the genes of the WNT and EGFR pathways associated with inter- and intra-patient heterogeneity, which were validated as prognostic biomarkers in CRC primary tumors as well as liver metastases.

2.
PLoS One ; 16(10): e0258623, 2021.
Article in English | MEDLINE | ID: mdl-34653224

ABSTRACT

Biomedical and life science literature is an essential way to publish experimental results. With the rapid growth of the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible to aid scientists in discovering new relationships between biological entities and answering biological questions. Making use of the word2vec approach, we generated word vector representations based on a corpus consisting of over 16 million PubMed abstracts. We developed a text mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted of substituting synonymous terms with their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (CNNs) on large breast cancer gene expression data and on other cancer datasets. The performance of the resulting models was compared to that of Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons to biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with word2vec-embedding-derived networks performed sufficiently well on the metastatic event prediction tasks compared to other networks. Such performance was good enough to validate the utility of our generated word embeddings in constructing biological networks. Word representations produced by text mining algorithms such as word2vec are therefore able to capture biologically meaningful relations between entities. Our generated embeddings are publicly available at https://github.com/genexplain/Word2vec-based-Networks/blob/main/README.md.
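
The abstract above relates cosine similarity between word vectors to known biological relations and derives gene-gene networks from embeddings. The following is a minimal sketch of that idea, not the authors' pipeline: the function names, the similarity threshold, and the three-dimensional toy vectors are all assumptions made for illustration, and real word2vec vectors would have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_network(embeddings, threshold):
    """Return (term_a, term_b, similarity) for every pair at or above threshold."""
    terms = sorted(embeddings)
    edges = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            sim = cosine_similarity(embeddings[a], embeddings[b])
            if sim >= threshold:
                edges.append((a, b, sim))
    return edges


# Hypothetical toy vectors; the term names and values are invented.
toy_embeddings = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [1.0, 1.0, 0.0],
    "GENE_C": [0.0, 0.0, 1.0],
}

# The threshold of 0.5 is an arbitrary choice for this sketch.
edges = similarity_network(toy_embeddings, threshold=0.5)
print(edges)
```

Only the GENE_A/GENE_B pair passes the threshold here, because GENE_C is orthogonal to both other vectors. How the published networks were thresholded is not stated in the abstract.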


Subject(s)
Breast Neoplasms/genetics; Computational Biology/methods; Data Mining/methods; Algorithms; Breast Neoplasms/metabolism; Databases, Factual; Female; Gene Expression Regulation, Neoplastic; Humans; Machine Learning; Neural Networks, Computer; Protein Interaction Maps; Terminology as Topic

3.
NPJ Syst Biol Appl ; 7(1): 38, 2021 Oct 20.
Article in English | MEDLINE | ID: mdl-34671039

ABSTRACT

Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in MR have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet [4] was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.

4.
Front Genet ; 12: 670240, 2021.
Article in English | MEDLINE | ID: mdl-34211498

ABSTRACT

Only 2% of glioblastoma multiforme (GBM) patients respond to standard therapy and survive beyond 36 months (long-term survivors, LTS), while the majority survive less than 12 months (short-term survivors, STS). To understand the mechanism leading to poor survival, we analyzed publicly available datasets of 113 STS and 58 LTS. This analysis revealed 198 differentially expressed genes (DEGs) that characterize aggressive tumor growth and may be responsible for the poor prognosis. These genes belong largely to the Gene Ontology (GO) categories "epithelial-to-mesenchymal transition" and "response to hypoxia." In this article, we applied an upstream analysis approach that involves state-of-the-art promoter analysis and network analysis of the dysregulated genes potentially responsible for short survival in GBM. Binding sites for transcription factors (TFs) associated with GBM pathology like NANOG, NF-κB, REST, FRA-1, PPARG, and seven others were found enriched in the promoters of the dysregulated genes. We reconstructed the gene regulatory network with several positive feedback loops controlled by five master regulators [insulin-like growth factor binding protein 2 (IGFBP2), vascular endothelial growth factor A (VEGFA), VEGF165, platelet-derived growth factor A (PDGFA), adipocyte enhancer-binding protein (AEBP1), and oncostatin M (OSMR)], which can be proposed as biomarkers and as therapeutic targets for enhancing GBM prognosis. A critical analysis of this gene regulatory network gives insights into the mechanism of gene regulation by IGFBP2 via several TFs including the key molecule of GBM tumor invasiveness and progression, FRA-1. All the observations were validated in independent cohorts, and their impact on overall survival has been investigated.

5.
PLoS One ; 15(4): e0231326, 2020.
Article in English | MEDLINE | ID: mdl-32275727

ABSTRACT

Cell differentiation is a complex process orchestrated by sets of regulators precisely appearing at certain time points, resulting in regulatory cascades that affect the expression of broader sets of genes, ending up in the formation of different tissues and organ parts. The identification of stage-specific master regulators and the mechanism by which they activate each other is a key to understanding and controlling differentiation, particularly in the fields of tissue regeneration and organoid engineering. Here we present a workflow that combines a comprehensive general regulatory network based on binding site predictions with user-provided temporal gene expression data to generate a temporally connected series of stage-specific regulatory networks, which we call a temporal regulatory cascade (TRC). A TRC identifies those regulators that are unique for each time point, resulting in a cascade that shows the emergence of these regulators and regulatory interactions across time. The model was implemented in the form of a user-friendly, visual web tool that requires no expert knowledge in programming or statistics, making it directly usable for life scientists. In addition to generating TRCs, the tool links multiple interactive visual workflows, in which a user can track and further investigate different regulators, target genes, and interactions, directing the tool along the way toward biologically sensible results based on the given dataset. We applied the TRC model to two different expression datasets, one based on experiments conducted on human induced pluripotent stem cells (hiPSCs) undergoing differentiation into mature cardiomyocytes and the other based on the differentiation of H1-derived human neuronal precursor cells. The model was successful in identifying previously known and new potential key regulators, in addition to the particular time points with which these regulators are associated, in cardiac and neural development.
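
The abstract above states that a TRC identifies the regulators that are unique for each time point. A minimal sketch of that filtering step alone, assuming the regulators active at each stage are already known, might look as follows. The function name, stage labels, and regulator names are hypothetical, and the published workflow additionally relies on binding site predictions and network construction that are not reproduced here.

```python
def stage_specific_regulators(regulators_by_stage):
    """Map each stage to the regulators that appear at no other stage."""
    result = {}
    for stage, regulators in regulators_by_stage.items():
        # Collect every regulator seen at any other stage.
        seen_elsewhere = set()
        for other_stage, other_regulators in regulators_by_stage.items():
            if other_stage != stage:
                seen_elsewhere |= set(other_regulators)
        result[stage] = set(regulators) - seen_elsewhere
    return result


# Hypothetical input: stage labels and regulator names are invented.
toy_stages = {
    "day0": {"TF1", "TF2"},
    "day5": {"TF2", "TF3"},
    "day10": {"TF3", "TF4", "TF5"},
}

unique = stage_specific_regulators(toy_stages)
for stage in toy_stages:
    print(stage, sorted(unique[stage]))
```

In this toy input the middle stage has no unique regulator, since both of its regulators also occur at a neighbouring stage.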


Subject(s)
Cell Differentiation; Gene Expression Regulation, Developmental; Gene Regulatory Networks; Humans; Induced Pluripotent Stem Cells/cytology; Induced Pluripotent Stem Cells/metabolism; Myocytes, Cardiac/cytology; Myocytes, Cardiac/metabolism; Neural Stem Cells; Software; Transcription Factors/genetics; Transcription Factors/metabolism

6.
BMC Bioinformatics ; 20(Suppl 4): 119, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999858

ABSTRACT

BACKGROUND: The search for molecular biomarkers of early-onset colorectal cancer (CRC) is an important but still quite challenging and unsolved task. Detection of CpG methylation in human DNA obtained from blood or stool has been proposed as a promising approach to a noninvasive early diagnosis of CRC. Thousands of abnormally methylated CpG positions in CRC genomes are often located in non-coding parts of genes. Novel bioinformatic methods are thus urgently needed for multi-omics data analysis to reveal causative biomarkers with a potential driver role in early stages of cancer. METHODS: We have developed a method for finding potential causal relationships between epigenetic changes (DNA methylations) in gene regulatory regions that affect transcription factor binding sites (TFBS) and gene expression changes. This method also considers the topology of the involved signal transduction pathways and searches for positive feedback loops that may cause the carcinogenic aberrations in gene expression. We call this method "Walking pathways", since it searches for potential rewiring mechanisms in cancer pathways due to dynamic changes in the DNA methylation status of important gene regulatory regions ("epigenomic walking"). RESULTS: In this paper, we analysed an extensive collection of full genome gene-expression data (RNA-seq) and DNA methylation data of genomic CpG islands (using Illumina methylation arrays) generated from a sample of tumor and normal gut epithelial tissues of 300 patients with colorectal cancer (at different stages of the disease) (data generated in the EU-supported SysCol project). Identification of potential epigenetic biomarkers of DNA methylation was performed using the fully automatic multi-omics analysis web service "My Genome Enhancer" (MGE) (my-genome-enhancer.com). 
MGE uses the database on gene regulation TRANSFAC®, the signal transduction pathways database TRANSPATH®, and software that employs AI (artificial intelligence) methods for the analysis of cancer-specific enhancers. CONCLUSIONS: The identified biomarkers underwent experimental testing on an independent set of blood samples from patients with colorectal cancer. As a result, using advanced methods of statistics and machine learning, a minimum set of 6 biomarkers was selected, which together achieve the best cancer detection potential. The markers include hypermethylated positions in regulatory regions of the following genes: CALCA, ENO1, MYC, PDX1, TCF7, ZNF43.


Subject(s)
Biomarkers, Tumor/genetics; Colorectal Neoplasms/genetics; DNA Methylation/genetics; Feedback, Physiological; Signal Transduction/genetics; Binding Sites/genetics; Colorectal Neoplasms/diagnosis; Colorectal Neoplasms/pathology; CpG Islands/genetics; Epigenesis, Genetic; Female; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Male; Middle Aged; Neoplasm Staging; Transcription Factors/metabolism

7.
Front Genet ; 9: 189, 2018.
Article in English | MEDLINE | ID: mdl-29896218

ABSTRACT

Today, it is well-known that in eukaryotic cells the complex interplay of transcription factors (TFs) bound to the DNA of promoters and enhancers is the basis for precise and specific control of transcription. Computational methods have been developed for the identification of potentially cooperating TFs through the co-occurrence of their binding sites (TFBSs). One challenge of these methods is the differentiation of TFBS pairs that are specific for a given sequence set from those that are ubiquitously appearing, rendering the results highly dependent on the choice of a proper background set. Here, we present an extension of our previous PC-TraFF approach that estimates the background co-occurrence of any TF pair by preserving the (oligo-) nucleotide composition and, thus, the core of TFBSs in the sequences of interest. Applying our approach to a simulated data set with implanted TFBS pairs, we could successfully identify them as sequence-set specific under a variety of conditions. When we analyzed the gene expression data sets of five breast cancer associated subtypes, the number of overlapping pairs could be dramatically reduced in comparison to our previous approach. As a result, we could identify potentially cooperating transcriptional regulators that are characteristic for each of the five breast cancer subtypes. This indicates that our approach is able to discriminate specific potential TF cooperations against ubiquitously occurring combinations. The results obtained with our method may help to understand the genetic programs governing specific biological processes such as the development of different tumor types.

8.
Nucleic Acids Res ; 46(D1): D168-D174, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29077896

ABSTRACT

Cell-specific information on the transcriptional regulation of microRNAs (miRNAs) is crucial to a precise understanding of gene regulation in the physiological and pathological processes that occur in different tissues and cell types. The database mirTrans provides comprehensive information about cell-specific transcription of miRNAs, including the transcriptional start sites (TSSs) of miRNAs, transcription factor (TF) to miRNA regulations, and miRNA promoter sequences. mirTrans also maps the experimental H3K4me3 and DHS (DNase-I hypersensitive site) marks within miRNA promoters and expressed sequence tags (ESTs) within transcribed regions. The current version of the database covers 35 259 TSSs and over 2.3 million TF-miRNA regulations for 1513 miRNAs in a total of 54 human cell lines. These cell lines span most of the biological systems, including the circulatory, digestive, and nervous systems. Information is offered for both intragenic and intergenic miRNAs. In particular, the quality of miRNA TSSs and TF-miRNA regulations is evaluated by literature curation: 23 447 TSS records and 2148 TF-miRNA regulations are supported by specific experiments as a result of literature curation. EST coverage is also used to evaluate the accuracy of miRNA TSSs. The mirTrans interface is user-friendly and makes downloads convenient (http://mcube.nju.edu.cn/jwang/lab/soft/mirtrans/ or http://120.27.239.192/mirtrans/).


Subject(s)
Databases, Nucleic Acid; MicroRNAs/genetics; MicroRNAs/metabolism; Cell Line; Expressed Sequence Tags; Gene Expression Regulation; Histone Code; Humans; Promoter Regions, Genetic; Transcription Factors/metabolism; Transcription Initiation Site; User-Computer Interface

9.
Nucleic Acids Res ; 46(D1): D343-D347, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29087517

ABSTRACT

TFClass is a resource that classifies eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs), available online at http://tfclass.bioinf.med.uni-goettingen.de. The classification scheme of TFClass was originally derived for human TFs and is expanded here to the whole taxonomic class of Mammalia. By combining information from different resources, manually checking the retrieved mammalian TF sequences, and applying extensive phylogenetic analyses, >39 000 TFs from up to 41 mammalian species were assigned to the Superclasses, Classes, Families and Subfamilies of TFClass. As a result, TFClass now provides the corresponding sequence collection in FASTA format, sequence logos and phylogenetic trees at different classification levels, predicted TF binding sites for human, mouse, dog and cow genomes as well as links to several external databases. In particular, all those TFs that are also documented in the TRANSFAC® database (FACTOR table) have been linked and can be freely accessed. TRANSFAC® FACTOR can also be queried through its own search interface.


Subject(s)
Databases, Protein; Transcription Factors/classification; Animals; Binding Sites; Cattle; Dogs; Humans; Mammals; Mice; Phylogeny; Protein Domains; Transcription Factors/chemistry; Transcription Factors/metabolism; User-Computer Interface

10.
Circulation ; 135(19): 1832-1847, 2017 May 09.
Article in English | MEDLINE | ID: mdl-28167635

ABSTRACT

BACKGROUND: Advancing structural and functional maturation of stem cell-derived cardiomyocytes remains a key challenge for applications in disease modeling, drug screening, and heart repair. Here, we sought to advance cardiomyocyte maturation in engineered human myocardium (EHM) toward an adult phenotype under defined conditions. METHODS: We systematically investigated cell composition, matrix, and media conditions to generate EHM from embryonic and induced pluripotent stem cell-derived cardiomyocytes and fibroblasts with organotypic functionality under serum-free conditions. We used morphological, functional, and transcriptome analyses to benchmark maturation of EHM. RESULTS: EHM demonstrated important structural and functional properties of postnatal myocardium, including: (1) rod-shaped cardiomyocytes with M bands assembled as a functional syncytium; (2) systolic twitch forces at a similar level as observed in bona fide postnatal myocardium; (3) a positive force-frequency response; (4) inotropic responses to β-adrenergic stimulation mediated via canonical β1- and β2-adrenoceptor signaling pathways; and (5) evidence for advanced molecular maturation by transcriptome profiling. EHM responded to chronic catecholamine toxicity with contractile dysfunction, cardiomyocyte hypertrophy, cardiomyocyte death, and N-terminal pro B-type natriuretic peptide release; all are classical hallmarks of heart failure. In addition, we demonstrate the scalability of EHM according to anticipated clinical demands for cardiac repair. CONCLUSIONS: We provide proof-of-concept for a universally applicable technology for the engineering of macroscale human myocardium for disease modeling and heart repair from embryonic and induced pluripotent stem cell-derived cardiomyocytes under defined, serum-free conditions.


Subject(s)
Embryonic Stem Cells/transplantation; Heart Failure/therapy; Induced Pluripotent Stem Cells/transplantation; Myocytes, Cardiac/transplantation; Tissue Engineering/methods; Ventricular Remodeling/physiology; Animals; Cell Differentiation/physiology; Embryonic Stem Cells/physiology; Heart Failure/pathology; Humans; Induced Pluripotent Stem Cells/physiology; Myocardium/cytology; Myocardium/pathology; Myocytes, Cardiac/physiology; Printing, Three-Dimensional; Rats; Rats, Nude

11.
PLoS One ; 11(8): e0160803, 2016.
Article in English | MEDLINE | ID: mdl-27517874

ABSTRACT

ChIP-seq experiments detect the chromatin occupancy of known transcription factors in a genome-wide fashion. The comparisons of several species-specific ChIP-seq libraries done for different transcription factors have revealed a complex combinatorial and context-specific co-localization behavior for the identified binding regions. In this study we have investigated human derived ChIP-seq data to identify common cis-regulatory principles for the human transcription factor c-Fos. We found that in four different cell lines, c-Fos targeted proximal and distal genomic intervals show prevalences for either AP-1 motifs or CCAAT boxes as known binding motifs for the transcription factor NF-Y, and thereby act in a mutually exclusive manner. For proximal regions of co-localized c-Fos and NF-YB binding, we gathered evidence that a characteristic configuration of repeating CCAAT motifs may be responsible for attracting c-Fos, probably provided by a nearby AP-1 bound enhancer. Our results suggest a novel regulatory function of NF-Y in gene-proximal regions. Specific CCAAT dimer repeats bound by the transcription factor NF-Y define this novel cis-regulatory module. Based on this behavior we propose a new enhancer promoter interaction model based on AP-1 motif defined enhancers which interact with CCAAT-box characterized promoter regions.


Subject(s)
CCAAT-Binding Factor/chemistry; CCAAT-Binding Factor/metabolism; Promoter Regions, Genetic; Proto-Oncogene Proteins c-fos/metabolism; Amino Acid Motifs; Amino Acid Sequence; Binding Sites; Cell Line, Tumor; Dimerization; Humans; Models, Molecular; Proto-Oncogene Proteins c-fos/chemistry; Transcription Factor AP-1/metabolism; p300-CBP Transcription Factors/metabolism

12.
13.
Bioinformatics ; 32(16): 2403-10, 2016 Aug 15.
Article in English | MEDLINE | ID: mdl-27153609

ABSTRACT

MOTIVATION: Identification of microRNA (miRNA) transcriptional start sites (TSSs) is crucial to understanding the transcriptional regulation of miRNAs. As miRNA expression is highly cell specific, an automatic and systematic method that can identify miRNA TSSs accurately and cell specifically is urgently needed. RESULTS: A workflow to identify the TSSs of miRNAs was built by integrating H3K4me3 and DNase I hypersensitive site data and combining them with conservation level and sequence features. By applying the workflow to the data for 54 cell lines from the ENCODE project, we successfully identified TSSs for 663 intragenic miRNAs and 620 intergenic miRNAs, which cover 84.2% (1283/1523) of all miRNAs recorded in miRBase 18. For these cell lines, we found 4042 alternative TSSs for intragenic miRNAs and 3186 alternative TSSs for intergenic miRNAs. Our method achieved better performance than previous non-cell-specific methods on miRNA TSSs. The cell-specific method developed by Georgakilas et al. gives 158 TSSs of higher accuracy in two cell lines, benefitting from the use of deep-sequencing techniques. In contrast, our method provided a much higher number of miRNA TSSs (7228) for a broader range of cell lines without the limitation of costly deep-sequencing data, thus being more applicable to various experimental cases. Analysis showed that upstream promoters at -2 kb to -200 bp of the TSS are more conserved for independently transcribed miRNAs, while for miRNAs transcribed with host genes, their core promoters (-200 bp to 200 bp of the TSS) are significantly conserved. AVAILABILITY AND IMPLEMENTATION: Predicted miRNA TSSs and promoters can be downloaded from supplementary files. CONTACT: jwang@nju.edu.cn or jlee@nju.edu.cn or edgar.wingender@bioinf.med.uni-goettingen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
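
The counts quoted in the abstract above can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in the abstract; the variable names are chosen here for illustration.

```python
# Counts taken directly from the abstract.
intragenic_mirnas = 663
intergenic_mirnas = 620
mirbase18_mirnas = 1523

# miRNAs with at least one identified TSS, and their share of miRBase 18.
identified = intragenic_mirnas + intergenic_mirnas
coverage_pct = round(100 * identified / mirbase18_mirnas, 1)

# Alternative TSSs for intragenic plus intergenic miRNAs.
alternative_tss_total = 4042 + 3186

print(identified, coverage_pct, alternative_tss_total)
```

The sums reproduce the stated 1283 miRNAs, the 84.2% coverage, and the 7228 TSSs mentioned later in the abstract.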


Subject(s)
MicroRNAs; Transcription Initiation Site; Gene Expression Regulation; High-Throughput Nucleotide Sequencing; Promoter Regions, Genetic

14.
Front Genet ; 7: 33, 2016.
Article in English | MEDLINE | ID: mdl-27047536

ABSTRACT

Transcription factors (TFs) regulate gene expression in living organisms. In higher organisms, TFs often interact in non-random combinations with each other to control gene transcription. Understanding these interactions is key to deciphering the mechanisms underlying tissue development. The aim of this study was to analyze co-occurring transcription factor binding sites (TFBSs) in a time series dataset from a new cell-culture model of human heart muscle development in order to identify common as well as specific co-occurring TFBS pairs in the promoter regions of regulated genes, which can be essential to enhance cardiac tissue developmental processes. To this end, we separated the available RNA-seq dataset into five temporally defined groups: (i) mesoderm induction stage; (ii) early cardiac specification stage; (iii) late cardiac specification stage; (iv) early cardiac maturation stage; (v) late cardiac maturation stage, where each of these stages is characterized by unique differentially expressed genes (DEGs). To identify TFBS pairs for each stage, we applied the MatrixCatch algorithm, which is a successful method to deduce experimentally described TFBS pairs in the promoters of the DEGs. Although the DEGs in each stage are distinct, our results show that the TFBS pair networks predicted by MatrixCatch for all stages are quite similar. Thus, we extend the results of MatrixCatch utilizing a Markov clustering algorithm (MCL) to perform network analysis. Using our extended approach, we are able to separate the TFBS pair networks into several clusters to highlight stage-specific co-occurrences between TFBSs. Our approach has revealed clusters that are either common (NFAT or HMGIY clusters) or specific (SMAD or AP-1 clusters) for the individual stages. Several of these clusters are likely to play an important role during cardiomyogenesis. Further, we have shown that the related TFs of TFBSs in the clusters indicate potential synergistic or antagonistic interactions to switch between different stages. Additionally, our results suggest that cardiomyogenesis follows the hourglass model which was already proven for Arabidopsis and some vertebrates. This investigation helps us to get a better understanding of how each stage of cardiomyogenesis is affected by different combinations of TFs. Such knowledge may help to understand basic principles of stem cell differentiation into cardiomyocytes.
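
The abstract above applies Markov clustering (MCL) to TFBS pair networks. As a rough illustration of the generic MCL loop (alternating expansion and inflation of a column-stochastic matrix), the sketch below clusters a small toy graph. It is not the authors' implementation: the inflation value, the iteration count, the cluster read-out rule, and the toy adjacency matrix are all assumptions made here.

```python
def normalize_columns(matrix):
    """Scale every column of a square matrix so that it sums to 1 (in place)."""
    n = len(matrix)
    for j in range(n):
        column_sum = sum(matrix[i][j] for i in range(n))
        for i in range(n):
            matrix[i][j] /= column_sum
    return matrix


def multiply(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_clustering(adjacency, inflation=2.0, iterations=50):
    """Return clusters as a set of frozensets of node indices."""
    n = len(adjacency)
    # Self-loops are added so that every column has a non-zero sum.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(iterations):
        m = multiply(m, m)                                 # expansion
        m = [[x ** inflation for x in row] for row in m]   # inflation
        normalize_columns(m)
    # Each row with remaining mass lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical toy network: two disconnected triangles (nodes 0-2 and 3-5).
toy_adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

clusters = markov_clustering(toy_adjacency)
print(sorted(sorted(c) for c in clusters))
```

Higher inflation values generally produce finer clusters. Which settings were used for the TFBS pair networks is not stated in the abstract.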

15.
Front Genet ; 7: 42, 2016.
Article in English | MEDLINE | ID: mdl-27092172

ABSTRACT

Transcription factors (TFs) are gene regulatory proteins that are essential for an effective regulation of the transcriptional machinery. Today, it is known that their expression plays an important role in several types of cancer. Computational identification of key players in specific cancer cell lines is still an open challenge in cancer research. In this study, we present a systematic approach which combines colorectal cancer (CRC) cell lines, namely 1638N-T1 and CMT-93, and well-established computational methods in order to compare these cell lines on the level of transcriptional regulation as well as on a pathway level, i.e., the cancer cell-intrinsic pathway repertoire. For this purpose, we firstly applied the Trinity platform to detect signature genes, and then applied analyses of the geneXplain platform to these for detection of upstream transcriptional regulators and their regulatory networks. We created a CRC-specific position weight matrix (PWM) library based on the TRANSFAC database (release 2014.1) to minimize the rate of false predictions in the promoter analyses. Using our proposed workflow, we specifically focused on revealing the similarities and differences in transcriptional regulation between the two CRC cell lines, and report a number of well-known, cancer-associated TFs with significantly enriched binding sites in the promoter regions of the signature genes. We show that, although the signature genes of both cell lines show no overlap, they may still be regulated by common TFs in CRC. Based on our findings, we suggest that canonical Wnt signaling is activated in 1638N-T1, but inhibited in CMT-93 through cross-talks of Wnt signaling with the VDR signaling pathway and/or LXR-related pathways. Furthermore, our findings provide indication of several master regulators being present such as MLK3 and Mapk1 (ERK2) which might be important in cell proliferation, migration, and invasion of 1638N-T1 and CMT-93, respectively. 
Taken together, we provide new insights into the invasive potential of these cell lines, which can be used for development of effective cancer therapy.

16.
EuPA Open Proteom ; 13: 1-13, 2016 Dec.
Article in English | MEDLINE | ID: mdl-29900117

ABSTRACT

We present an "upstream analysis" strategy for causal analysis of multiple "-omics" data. It analyzes promoters using the TRANSFAC database, combines it with an analysis of the upstream signal transduction pathways and identifies master regulators as potential drug targets for a pathological process. We applied this approach to a complex multi-omics data set that contains transcriptomics, proteomics and epigenomics data. We identified the following potential drug targets against induced resistance of cancer cells towards chemotherapy by methotrexate (MTX): TGFalpha, IGFBP7, alpha9-integrin, and the following chemical compounds: zardaverine and divalproex as well as human metabolites such as nicotinamide N-oxide.

17.
BMC Bioinformatics ; 16: 400, 2015 Dec 01.
Article in English | MEDLINE | ID: mdl-26627005

ABSTRACT

BACKGROUND: Transcription factors (TFs) are important regulatory proteins that govern transcriptional regulation. Today, it is known that in higher organisms different TFs have to cooperate rather than act individually in order to control complex genetic programs. The identification of these interactions is an important challenge for understanding the molecular mechanisms that regulate biological processes. In this study, we present a new method based on pointwise mutual information, PC-TraFF, which considers the genome as a document, the sequences as sentences, and TF binding sites (TFBSs) as words to identify interacting TFs in a set of sequences. RESULTS: To demonstrate the effectiveness of PC-TraFF, we performed a genome-wide analysis and a breast cancer-associated sequence set analysis for protein coding and miRNA genes. Our results show that in each of these sequence sets, PC-TraFF is able to identify important interacting TF pairs, for most of which we found support in previously published experimental results. Further, we made a pairwise comparison between PC-TraFF and three conventional methods. The outcome of this comparison study strongly suggests that all these methods focus on different important aspects of interaction between TFs and thus the pairwise overlap between any of them is only marginal. CONCLUSIONS: In this study, adopting an idea from linguistics for use in bioinformatics, we develop a new information theoretic method, PC-TraFF, for the identification of potentially collaborating transcription factors based on the idiosyncrasy of their binding site distributions on the genome. The results of our study show that PC-TraFF can successfully identify known interacting TF pairs and thus its currently biologically unconfirmed predictions could provide new hypotheses for further experimental validation. Additionally, the comparison of the results of PC-TraFF with the results of previous methods demonstrates that different methods with their specific scopes can complement each other. Overall, our analyses indicate that PC-TraFF is a time-efficient method whose algorithm has tractable computational time and memory consumption. The PC-TraFF server is freely accessible at http://pctraff.bioinf.med.uni-goettingen.de/.
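
The pointwise mutual information idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the PC-TraFF algorithm itself; it only computes plain PMI, log2(P(a,b) / (P(a) P(b))), from per-sequence co-occurrence of binding sites. The function name `pmi_scores` and the TF labels in the usage note are hypothetical, and each sequence is assumed to be given as the set of TFs with at least one predicted site in it.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sequences):
    """Plain PMI for every TF pair that co-occurs in at least one sequence.

    sequences: iterable of collections of TF names (assumed input format),
    one collection per sequence ("sentence"), one name per TF ("word").
    """
    sequences = [set(tfs) for tfs in sequences]
    n = len(sequences)
    single = Counter()  # number of sequences containing each TF
    pair = Counter()    # number of sequences containing both TFs of a pair
    for tfs in sequences:
        single.update(tfs)
        pair.update(combinations(sorted(tfs), 2))

    scores = {}
    for (a, b), count in pair.items():
        p_ab = count / n
        p_a = single[a] / n
        p_b = single[b] / n
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores
```

For example, `pmi_scores([{"TF_A", "TF_B"}, {"TF_A", "TF_B"}, {"TF_C"}, {"TF_C"}])` scores the pair `("TF_A", "TF_B")` above zero, because the two hypothetical factors co-occur more often than their individual frequencies would predict under independence.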


Subject(s)
Algorithms , Breast Neoplasms/metabolism , Computational Biology/methods , Neoplastic Gene Expression Regulation , Human Genome , Transcription Factors/metabolism , Binding Sites , Breast Neoplasms/classification , Breast Neoplasms/genetics , Female , Humans , MicroRNAs/genetics , Genetic Promoter Regions/genetics , Protein Binding
18.
BMC Bioinformatics ; 16: 200, 2015 Jun 26.
Article in English | MEDLINE | ID: mdl-26108437

ABSTRACT

BACKGROUND: Exploratory analysis of multi-dimensional high-throughput datasets, such as microarray gene expression time series, may be instrumental in understanding the genetic programs underlying numerous biological processes. In such datasets, variations in the gene expression profiles are usually observed across replicates and time points. Thus, mining the temporal expression patterns in such multi-dimensional datasets may not only provide insights into the key biological processes governing the growth and development of organs but also facilitate the understanding of the underlying complex gene regulatory circuits. RESULTS: In this work we have developed an evolutionary multi-objective optimization for our previously introduced triclustering algorithm δ-TRIMAX. Its aim is to make optimal use of δ-TRIMAX in extracting groups of co-expressed genes from time series gene expression data, or from any 3D gene expression dataset, by adding the powerful capabilities of an evolutionary algorithm to retrieve overlapping triclusters. We have compared the performance of our newly developed algorithm, EMOA-δ-TRIMAX, with that of other existing triclustering approaches using four artificial datasets and three real-life datasets. Moreover, we have analyzed the results of our algorithm on one of these real-life datasets, which monitors the differentiation of human induced pluripotent stem cells (hiPSC) into mature cardiomyocytes. For each group of co-expressed genes belonging to one tricluster, we identified key genes by computing their membership values within the tricluster. A very high percentage of these key genes were significantly enriched in Gene Ontology categories or KEGG pathways that fit the biological context of cardiomyocyte differentiation very well. CONCLUSIONS: EMOA-δ-TRIMAX has proven instrumental in identifying groups of genes in transcriptomic data sets that represent the functional categories constituting the biological process under study. 
The executable file can be found at http://www.bioinf.med.uni-goettingen.de/fileadmin/download/EMOA-delta-TRIMAX.tar.gz .


Subject(s)
Algorithms , Biomarkers/analysis , Cell Differentiation/genetics , Gene Expression Profiling/methods , Induced Pluripotent Stem Cells/metabolism , Cardiac Myocytes/metabolism , Transcriptome/genetics , Biological Phenomena , Cluster Analysis , Datasets as Topic , Gene Regulatory Networks , Humans , Induced Pluripotent Stem Cells/cytology , Cardiac Myocytes/cytology , Oligonucleotide Array Sequence Analysis/methods , Time Factors
19.
Microarrays (Basel) ; 4(2): 270-86, 2015 May 21.
Article in English | MEDLINE | ID: mdl-27600225

ABSTRACT

A strategy is presented that allows a causal analysis of co-expressed genes, which may be subject to common regulatory influences. A state-of-the-art promoter analysis for potential transcription factor (TF) binding sites, in combination with a knowledge-based analysis of the upstream pathways that control the activity of these TFs, is shown to lead to hypothetical master regulators. This strategy was implemented as a workflow in a comprehensive bioinformatic software platform. We applied this workflow to gene sets that were identified by a novel triclustering algorithm in naphthalene-induced gene expression signatures of murine liver and lung tissue. As a result, tissue-specific master regulators were identified that are known to be linked with tumorigenic and apoptotic processes. To our knowledge, this is the first time that genes of expression triclusters were used to identify upstream regulators.

20.
Nucleic Acids Res ; 43(Database issue): D97-102, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361979

ABSTRACT

TFClass aims at classifying eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs). For this, a classification schema comprising four generic levels (superclass, class, family and subfamily) was defined that could accommodate all known DNA-binding human TFs. They were assigned to their (sub-)families as instances at two different levels, the corresponding TF genes and individual gene products (protein isoforms). In the present version, all mouse and rat orthologs have been linked to the human TFs, and the mouse orthologs have been arranged in an independent ontology. Many TFs were assigned typical DNA-binding patterns and positional weight matrices (PWMs) derived from high-throughput in-vitro binding studies. Predicted TF binding sites from human gene upstream sequences are now also attached to each human TF whenever a PWM was available for this factor or one of its paralogs. TFClass is freely available at http://tfclass.bioinf.med.uni-goettingen.de/ through a web interface and for download in OBO format.
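
The four-level schema described in this abstract can be pictured with a small sketch. It assumes, purely for illustration, that a classification entry is written as a dotted numeric identifier with one component per level; the helper `parse_levels` and the identifier used in the usage note are hypothetical and are not taken from the TFClass download.

```python
# The four generic levels named in the abstract, from most to least general.
LEVELS = ("superclass", "class", "family", "subfamily")


def parse_levels(identifier):
    """Map a dotted identifier to the cumulative identifier of each level.

    Assumption: identifiers look like "1.2.3.4", with at most one numeric
    component per level listed in LEVELS.
    """
    parts = identifier.split(".")
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"expected 1 to {len(LEVELS)} components: {identifier!r}")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"non-numeric component in {identifier!r}")
    # zip stops at the shorter input, so partial identifiers are handled.
    return {
        level: ".".join(parts[: depth + 1])
        for depth, level in enumerate(LEVELS[: len(parts)])
    }
```

With this helper, `parse_levels("1.2.3")` yields the superclass `"1"`, the class `"1.2"` and the family `"1.2.3"`, which makes it easy to group entries at any level of the hierarchy.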


Subject(s)
Protein Databases , Transcription Factors/classification , Animals , Binding Sites , DNA/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Humans , Internet , Mice , Tertiary Protein Structure , Rats , Transcription Factors/chemistry , Transcription Factors/metabolism
...