Results 1-20 of 68
1.
Nucleic Acids Res ; 46(D1): D343-D347, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29087517

ABSTRACT

TFClass is a resource that classifies eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs); it is available online at http://tfclass.bioinf.med.uni-goettingen.de. The classification scheme of TFClass was originally derived for human TFs and is expanded here to the whole taxonomic class of Mammalia. By combining information from different resources, manually checking the retrieved mammalian TF sequences and applying extensive phylogenetic analyses, >39 000 TFs from up to 41 mammalian species were assigned to the Superclasses, Classes, Families and Subfamilies of TFClass. As a result, TFClass now provides the corresponding sequence collection in FASTA format, sequence logos and phylogenetic trees at different classification levels, predicted TF binding sites for the human, mouse, dog and cow genomes, as well as links to several external databases. In particular, all TFs that are also documented in the TRANSFAC® database (FACTOR table) have been linked and can be freely accessed. TRANSFAC® FACTOR can also be queried through its own search interface.


Subject(s)
Protein Databases , Transcription Factors/classification , Animals , Binding Sites , Cattle , Dogs , Humans , Mammals , Mice , Phylogeny , Protein Domains , Transcription Factors/chemistry , Transcription Factors/metabolism , User-Computer Interface
2.
Nucleic Acids Res ; 46(D1): D168-D174, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29077896

ABSTRACT

Cell-specific information on the transcriptional regulation of microRNAs (miRNAs) is crucial to a precise understanding of gene regulation in the various physiological and pathological processes that occur in different tissues and cell types. The mirTrans database provides comprehensive information about the cell-specific transcription of miRNAs, including the transcriptional start sites (TSSs) of miRNAs, transcription factor (TF)-to-miRNA regulations and miRNA promoter sequences. mirTrans also maps experimental H3K4me3 and DHS (DNase I hypersensitive site) marks within miRNA promoters and expressed sequence tags (ESTs) within transcribed regions. The current version of the database covers 35 259 TSSs and over 2.3 million TF-miRNA regulations for 1513 miRNAs in a total of 54 human cell lines. These cell lines span most biological systems, including the circulatory, digestive and nervous systems. Information is offered for both intragenic and intergenic miRNAs. In particular, the quality of miRNA TSSs and TF-miRNA regulations is evaluated by literature curation: 23 447 TSS records and 2148 TF-miRNA regulations are supported by specific experiments identified through this curation. EST coverage is also used to evaluate the accuracy of miRNA TSSs. The mirTrans interface is user-friendly and makes downloading convenient (http://mcube.nju.edu.cn/jwang/lab/soft/mirtrans/ or http://120.27.239.192/mirtrans/).


Subject(s)
Nucleic Acid Databases , MicroRNAs/genetics , MicroRNAs/metabolism , Cell Line , Expressed Sequence Tags , Gene Expression Regulation , Histone Code , Humans , Promoter Regions, Genetic , Transcription Factors/metabolism , Transcription Initiation Site , User-Computer Interface
3.
BMC Bioinformatics ; 20(Suppl 4): 119, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999858

ABSTRACT

BACKGROUND: The search for molecular biomarkers of early-onset colorectal cancer (CRC) is an important but still challenging and unsolved task. Detection of CpG methylation in human DNA obtained from blood or stool has been proposed as a promising approach to noninvasive early diagnosis of CRC. Thousands of abnormally methylated CpG positions in CRC genomes are often located in non-coding parts of genes. Novel bioinformatic methods for multi-omics data analysis are therefore urgently needed to reveal causative biomarkers with a potential driver role in the early stages of cancer. METHODS: We have developed a method for finding potential causal relationships between epigenetic changes (DNA methylation) in gene regulatory regions that affect transcription factor binding sites (TFBS) and gene expression changes. The method also considers the topology of the involved signal transduction pathways and searches for positive feedback loops that may cause the carcinogenic aberrations in gene expression. We call this method "Walking pathways", since it searches for potential rewiring mechanisms in cancer pathways due to dynamic changes in the DNA methylation status of important gene regulatory regions ("epigenomic walking"). RESULTS: In this paper, we analysed an extensive collection of genome-wide gene expression data (RNA-seq) and DNA methylation data of genomic CpG islands (using Illumina methylation arrays) generated from tumor and normal gut epithelial tissue samples of 300 patients with colorectal cancer at different stages of the disease (data generated in the EU-supported SysCol project). Potential epigenetic biomarkers of DNA methylation were identified using the fully automatic multi-omics analysis web service "My Genome Enhancer" (MGE) (my-genome-enhancer.com). MGE uses the gene regulation database TRANSFAC®, the signal transduction pathway database TRANSPATH®, and software employing artificial intelligence (AI) methods for the analysis of cancer-specific enhancers. CONCLUSIONS: The identified biomarkers underwent experimental testing on an independent set of blood samples from patients with colorectal cancer. Using advanced statistical and machine learning methods, a minimum set of 6 biomarkers was selected that together achieve the best cancer detection potential. The markers include hypermethylated positions in the regulatory regions of the following genes: CALCA, ENO1, MYC, PDX1, TCF7, ZNF43.
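The abstract says "Walking pathways" searches pathway topology for positive feedback loops but does not spell out the search. As a generic sketch only, not the MGE implementation: in a signed directed graph, a feedback loop is positive when the product of its edge signs is +1, i.e. it contains an even number of inhibitory edges. The function name, node names and edge signs below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical signed, directed regulatory graph:
# node -> list of (target, sign), where +1 = activation and -1 = inhibition.
Graph = Dict[str, List[Tuple[str, int]]]


def positive_feedback_loops(graph: Graph) -> List[List[str]]:
    """Return every simple cycle whose edge-sign product is +1.

    Each cycle is reported once, as a node list starting at its smallest node name.
    """
    loops: List[List[str]] = []
    for start in sorted(graph):
        # Iterative DFS that only enters nodes sorting after `start`, so a cycle
        # is discovered only from its smallest node and never twice.
        stack = [(start, [start], 1)]
        while stack:
            node, path, sign = stack.pop()
            for target, edge_sign in graph.get(node, []):
                if target == start:
                    if sign * edge_sign == 1:
                        loops.append(path)
                elif target > start and target not in path:
                    stack.append((target, path + [target], sign * edge_sign))
    return loops


network: Graph = {
    "TF1": [("GENE2", +1)],
    "GENE2": [("TF1", +1), ("TF3", -1)],
    "TF3": [("GENE4", -1)],
    "GENE4": [("TF3", -1)],
    "GENE5": [("GENE6", -1)],
    "GENE6": [("GENE5", +1)],
}
print(positive_feedback_loops(network))  # [['GENE2', 'TF1'], ['GENE4', 'TF3']]
```

The GENE5/GENE6 loop is excluded because it has a single inhibitory edge. Enumerating every simple cycle can grow exponentially with network density, so on real pathway networks a cycle-length bound or a restricted subnetwork would be needed.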


Subject(s)
Biomarkers, Tumor/genetics , Colorectal Neoplasms/genetics , DNA Methylation/genetics , Feedback, Physiological , Signal Transduction/genetics , Binding Sites/genetics , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/pathology , CpG Islands/genetics , Epigenesis, Genetic , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Male , Middle Aged , Neoplasm Staging , Transcription Factors/metabolism
4.
Circulation ; 135(19): 1832-1847, 2017 May 09.
Article in English | MEDLINE | ID: mdl-28167635

ABSTRACT

BACKGROUND: Advancing the structural and functional maturation of stem cell-derived cardiomyocytes remains a key challenge for applications in disease modeling, drug screening, and heart repair. Here, we sought to advance cardiomyocyte maturation in engineered human myocardium (EHM) toward an adult phenotype under defined conditions. METHODS: We systematically investigated cell composition, matrix, and media conditions to generate EHM with organotypic functionality under serum-free conditions from embryonic and induced pluripotent stem cell-derived cardiomyocytes and fibroblasts. We used morphological, functional, and transcriptome analyses to benchmark the maturation of EHM. RESULTS: EHM demonstrated important structural and functional properties of postnatal myocardium, including: (1) rod-shaped cardiomyocytes with M bands assembled as a functional syncytium; (2) systolic twitch forces at a level similar to that observed in bona fide postnatal myocardium; (3) a positive force-frequency response; (4) inotropic responses to β-adrenergic stimulation mediated via canonical β1- and β2-adrenoceptor signaling pathways; and (5) evidence of advanced molecular maturation by transcriptome profiling. EHM responded to chronic catecholamine toxicity with contractile dysfunction, cardiomyocyte hypertrophy, cardiomyocyte death, and N-terminal pro B-type natriuretic peptide release, all of which are classical hallmarks of heart failure. In addition, we demonstrate the scalability of EHM according to anticipated clinical demands for cardiac repair. CONCLUSIONS: We provide proof-of-concept for a universally applicable technology for engineering macroscale human myocardium for disease modeling and heart repair from embryonic and induced pluripotent stem cell-derived cardiomyocytes under defined, serum-free conditions.


Subject(s)
Embryonic Stem Cells/transplantation , Heart Failure/therapy , Induced Pluripotent Stem Cells/transplantation , Myocytes, Cardiac/transplantation , Tissue Engineering/methods , Ventricular Remodeling/physiology , Animals , Cell Differentiation/physiology , Embryonic Stem Cells/physiology , Heart Failure/pathology , Humans , Induced Pluripotent Stem Cells/physiology , Myocardium/cytology , Myocardium/pathology , Myocytes, Cardiac/physiology , Printing, Three-Dimensional , Rats , Rats, Nude
5.
Bioinformatics ; 32(16): 2403-10, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153609

ABSTRACT

MOTIVATION: Identification of microRNA (miRNA) transcriptional start sites (TSSs) is crucial to understanding the transcriptional regulation of miRNAs. Because miRNA expression is highly cell specific, an automatic and systematic method that can identify miRNA TSSs accurately and in a cell-specific manner is urgently needed. RESULTS: A workflow to identify the TSSs of miRNAs was built by integrating H3K4me3 and DNase I hypersensitive site data with conservation levels and sequence features. By applying the workflow to data for 54 cell lines from the ENCODE project, we successfully identified TSSs for 663 intragenic and 620 intergenic miRNAs, which together cover 84.2% (1283/1523) of all miRNAs recorded in miRBase 18. For these cell lines, we found 4042 alternative TSSs for intragenic miRNAs and 3186 alternative TSSs for intergenic miRNAs. Our method performed better than previous non-cell-specific methods for miRNA TSSs. The cell-specific method developed by Georgakilas et al. gives 158 TSSs of higher accuracy in two cell lines, benefiting from the use of deep-sequencing technology. In contrast, our method provided a much higher number of miRNA TSSs (7228) for a broader range of cell lines without requiring costly deep-sequencing data, making it more applicable to various experimental settings. Analysis showed that upstream promoters (-2 kb to -200 bp relative to the TSS) are more conserved for independently transcribed miRNAs, whereas for miRNAs transcribed with their host genes, the core promoters (-200 bp to +200 bp relative to the TSS) are significantly conserved. AVAILABILITY AND IMPLEMENTATION: Predicted miRNA TSSs and promoters can be downloaded from the supplementary files. CONTACT: jwang@nju.edu.cn or jlee@nju.edu.cn or edgar.wingender@bioinf.med.uni-goettingen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
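The promoter windows above are given relative to the TSS (-2 kb to -200 bp upstream; -200 bp to +200 bp for the core promoter). A minimal sketch of turning such offsets into genomic coordinates, assuming integer TSS positions with offset 0 at the TSS itself and inclusive window bounds; the helper name is hypothetical and this is not the authors' workflow. The point it illustrates is that on the minus strand, upstream lies at higher coordinates.

```python
from typing import Tuple

# TSS-relative windows quoted in the abstract, in bp (negative = upstream).
UPSTREAM = (-2000, -200)
CORE = (-200, 200)


def tss_window(tss: int, strand: str, start_offset: int, end_offset: int) -> Tuple[int, int]:
    """Convert a TSS-relative window into genomic (low, high) coordinates.

    Offsets follow the direction of transcription, with 0 at the TSS itself.
    No clipping to chromosome ends is done in this sketch.
    """
    if strand == "+":
        return tss + start_offset, tss + end_offset
    if strand == "-":
        # On the minus strand, "upstream" means larger genomic coordinates.
        return tss - end_offset, tss - start_offset
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


print(tss_window(10_000, "+", *UPSTREAM))  # (8000, 9800)
print(tss_window(10_000, "-", *UPSTREAM))  # (10200, 12000)
```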


Subject(s)
MicroRNAs , Transcription Initiation Site , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Promoter Regions, Genetic
6.
Nucleic Acids Res ; 43(Database issue): D97-102, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361979

ABSTRACT

TFClass aims at classifying eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs). For this purpose, a classification schema comprising four generic levels (superclass, class, family and subfamily) was defined that can accommodate all known DNA-binding human TFs. The TFs were assigned to their (sub)families as instances at two different levels: the corresponding TF genes and the individual gene products (protein isoforms). In the present version, all mouse and rat orthologs have been linked to the human TFs, and the mouse orthologs have been arranged in an independent ontology. Many TFs were annotated with typical DNA-binding patterns and position weight matrices (PWMs) derived from high-throughput in vitro binding studies. Predicted TF binding sites in human gene upstream sequences are now also attached to each human TF whenever a PWM was available for the factor or one of its paralogs. TFClass is freely available at http://tfclass.bioinf.med.uni-goettingen.de/ through a web interface and for download in OBO format.
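To illustrate how a position weight matrix can be used to predict binding sites, here is a generic log-odds scanner. It assumes a uniform 25% background, a pseudocount of 0.25 per base and a forward-strand-only scan; the toy count matrix, function names and threshold are hypothetical, and the abstract does not describe TFClass's own prediction pipeline.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_pwm(counts: List[Dict[str, float]], pseudocount: float = 0.25,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0.0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """Return (0-based start, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical 3-bp motif observed 10 times as "TGA".
pwm = log_odds_pwm([{"T": 10}, {"G": 10}, {"A": 10}])
print(scan("CCTGACC", pwm, threshold=5.0))
```

A production scanner would also score the reverse complement and calibrate the threshold, for example against a score distribution on background sequence.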


Subject(s)
Protein Databases , Transcription Factors/classification , Animals , Binding Sites , DNA/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Humans , Internet , Mice , Protein Structure, Tertiary , Rats , Transcription Factors/chemistry , Transcription Factors/metabolism
7.
BMC Bioinformatics ; 16: 200, 2015 Jun 26.
Article in English | MEDLINE | ID: mdl-26108437

ABSTRACT

BACKGROUND: Exploratory analysis of multi-dimensional high-throughput datasets, such as microarray gene expression time series, may be instrumental in understanding the genetic programs underlying numerous biological processes. In such datasets, variations in gene expression profiles are usually observed across replicates and time points. Mining the temporal expression patterns in such multi-dimensional datasets may therefore not only provide insights into the key biological processes governing organ growth and development but also facilitate understanding of the underlying complex gene regulatory circuits. RESULTS: In this work we have developed an evolutionary multi-objective optimization for our previously introduced triclustering algorithm δ-TRIMAX. Its aim is to make optimal use of δ-TRIMAX in extracting groups of co-expressed genes from time series gene expression data, or from any 3D gene expression dataset, by adding the capability of an evolutionary algorithm to retrieve overlapping triclusters. We compared the performance of our newly developed algorithm, EMOA-δ-TRIMAX, with that of other existing triclustering approaches using four artificial datasets and three real-life datasets. Moreover, we analyzed the results of our algorithm on one of these real-life datasets, which monitors the differentiation of human induced pluripotent stem cells (hiPSC) into mature cardiomyocytes. For each group of co-expressed genes belonging to one tricluster, we identified key genes by computing their membership values within the tricluster. A very high percentage of these key genes were significantly enriched in Gene Ontology categories or KEGG pathways that fit the biological context of cardiomyocyte differentiation very well. CONCLUSIONS: EMOA-δ-TRIMAX has proven instrumental in identifying groups of genes in transcriptomic datasets that represent the functional categories constituting the biological process under study.
The executable file can be found at http://www.bioinf.med.uni-goettingen.de/fileadmin/download/EMOA-delta-TRIMAX.tar.gz .
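δ-TRIMAX, which EMOA-δ-TRIMAX builds on, judges the coherence of a tricluster (genes x samples x time points) by a mean squared residue, a three-dimensional extension of the Cheng-Church biclustering residue. The sketch below assumes the residue a_ijk - a_iJK - a_IjK - a_IJk + 2a_IJK, where each mean is taken over the dimensions not indexed; it omits the δ threshold, the deletion/addition heuristics and the evolutionary search, and the data are hypothetical. For a purely additive cube (gene + sample + time effects) the residue is zero up to floating-point rounding.

```python
from typing import List

Cube = List[List[List[float]]]  # values indexed as data[gene][sample][time]


def mean_squared_residue(data: Cube) -> float:
    """Mean squared residue of a dense genes x samples x time-points tricluster."""
    n_i, n_j, n_k = len(data), len(data[0]), len(data[0][0])
    size = n_i * n_j * n_k
    overall = sum(v for plane in data for row in plane for v in row) / size
    # Mean of each gene, sample and time point over the two remaining dimensions.
    gene_mean = [sum(v for row in data[i] for v in row) / (n_j * n_k) for i in range(n_i)]
    sample_mean = [sum(data[i][j][k] for i in range(n_i) for k in range(n_k)) / (n_i * n_k)
                   for j in range(n_j)]
    time_mean = [sum(data[i][j][k] for i in range(n_i) for j in range(n_j)) / (n_i * n_j)
                 for k in range(n_k)]
    total = 0.0
    for i in range(n_i):
        for j in range(n_j):
            for k in range(n_k):
                residue = (data[i][j][k] - gene_mean[i] - sample_mean[j]
                           - time_mean[k] + 2 * overall)
                total += residue ** 2
    return total / size


# Hypothetical purely additive cube: value = gene effect + sample effect + time effect.
genes, samples, times = [0.0, 1.0, 5.0], [0.0, 2.0], [1.0, 3.0, 4.0]
additive = [[[g + s + t for t in times] for s in samples] for g in genes]
print(mean_squared_residue(additive))
```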


Subject(s)
Algorithms , Biomarkers/analysis , Cell Differentiation/genetics , Gene Expression Profiling/methods , Induced Pluripotent Stem Cells/metabolism , Myocytes, Cardiac/metabolism , Transcriptome/genetics , Biological Phenomena , Cluster Analysis , Datasets as Topic , Gene Regulatory Networks , Humans , Induced Pluripotent Stem Cells/cytology , Myocytes, Cardiac/cytology , Oligonucleotide Array Sequence Analysis/methods , Time Factors
8.
BMC Bioinformatics ; 16: 400, 2015 Dec 01.
Article in English | MEDLINE | ID: mdl-26627005

ABSTRACT

BACKGROUND: Transcription factors (TFs) are important regulatory proteins that govern transcriptional regulation. It is now known that in higher organisms different TFs have to cooperate, rather than act individually, to control complex genetic programs. Identifying these interactions is an important challenge for understanding the molecular mechanisms that regulate biological processes. In this study, we present PC-TraFF, a new method based on pointwise mutual information, which treats the genome as a document, the sequences as sentences, and TF binding sites (TFBSs) as words in order to identify interacting TFs in a set of sequences. RESULTS: To demonstrate the effectiveness of PC-TraFF, we performed a genome-wide analysis and an analysis of a breast cancer-associated sequence set for protein-coding and miRNA genes. Our results show that in each of these sequence sets, PC-TraFF is able to identify important interacting TF pairs, most of which are supported by previously published experimental results. Further, we made a pairwise comparison between PC-TraFF and three conventional methods. The outcome of this comparison strongly suggests that these methods focus on different important aspects of the interaction between TFs, so the pairwise overlap between any two of them is only marginal. CONCLUSIONS: In this study, adopting an idea from linguistics for use in bioinformatics, we developed a new information-theoretic method, PC-TraFF, for identifying potentially collaborating transcription factors based on the idiosyncrasy of their binding site distributions in the genome. Our results show that PC-TraFF can successfully identify known interacting TF pairs, so its predictions that are not yet biologically confirmed could provide new hypotheses for further experimental validation. Additionally, comparing the results of PC-TraFF with those of previous methods demonstrates that different methods, each with its specific scope, can complement each other well. Overall, our analyses indicate that PC-TraFF is time-efficient, with tractable computational time and memory consumption. The PC-TraFF server is freely accessible at http://pctraff.bioinf.med.uni-goettingen.de/.
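PC-TraFF is built on pointwise mutual information, PMI(a, b) = log2(p(a, b) / (p(a) p(b))). The sketch below shows only plain PMI under the document/sentence/word analogy from the abstract: each sequence is reduced to the set of TFs with a predicted site, and probabilities are fractions of sequences. PC-TraFF's actual estimator and any corrections it applies are not reproduced here; the toy data and function name are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def pairwise_pmi(sequences: List[Set[str]]) -> Dict[FrozenSet[str], float]:
    """log2 PMI for every pair of TFs with predicted sites together in at least one sequence.

    Each sequence ("sentence") is reduced to the set of TFs with a predicted site
    ("words"); probabilities are estimated as fractions of sequences.
    """
    n = len(sequences)
    single = Counter(tf for tfs in sequences for tf in tfs)
    pair = Counter(frozenset(p) for tfs in sequences for p in combinations(sorted(tfs), 2))
    scores: Dict[FrozenSet[str], float] = {}
    for p, joint in pair.items():
        a, b = sorted(p)
        scores[p] = math.log2((joint / n) / ((single[a] / n) * (single[b] / n)))
    return scores


toy = [{"TF_A", "TF_B"}, {"TF_A", "TF_B"}, {"TF_C"}, {"TF_A", "TF_C"}]
for tf_pair, score in sorted(pairwise_pmi(toy).items(), key=lambda kv: -kv[1]):
    print(sorted(tf_pair), round(score, 3))
# ['TF_A', 'TF_B'] 0.415
# ['TF_A', 'TF_C'] -0.585
```

A positive score means the pair co-occurs more often than expected if the two sites were independent; a negative score means less often.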


Subject(s)
Algorithms , Breast Neoplasms/metabolism , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Genome, Human , Transcription Factors/metabolism , Binding Sites , Breast Neoplasms/classification , Breast Neoplasms/genetics , Female , Humans , MicroRNAs/genetics , Promoter Regions, Genetic/genetics , Protein Binding
9.
Nucleic Acids Res ; 41(Database issue): D165-70, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23180794

ABSTRACT

TFClass (http://tfclass.bioinf.med.uni-goettingen.de/) provides a comprehensive classification of human transcription factors based on their DNA-binding domains. Transcription factors constitute a large functional family of proteins that directly regulate the activity of genes. Most of them are sequence-specific DNA-binding proteins that read out the information encoded in the cis-regulatory DNA elements of promoters, enhancers and other regulatory regions of a genome. TFClass is a database that classifies human transcription factors by a six-level classification schema: four levels are abstractions according to different criteria, the fifth level represents TF genes and the sixth individual gene products. Altogether, nine superclasses have been identified, comprising 40 classes and 111 families. Counted by genes, 1558 human TFs have been classified so far, or >2900 different TFs when isoforms generated by alternative splicing or protein processing events are included. With this classification, we hope to provide a basis for deciphering protein-DNA recognition codes; moreover, it can be used to construct expanded transcriptional networks by inferring additional TF-target gene relations.


Subject(s)
Protein Databases , Transcription Factors/classification , DNA-Binding Proteins/chemistry , Humans , Internet , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein , Transcription Factors/chemistry
10.
BMC Bioinformatics ; 15: 96, 2014 Apr 03.
Article in English | MEDLINE | ID: mdl-24694117

ABSTRACT

BACKGROUND: The identification of functionally or structurally important non-conserved residue sites in protein multiple sequence alignments (MSAs) is an important challenge for understanding the structural basis and molecular mechanisms of protein function. Despite the rich literature on compensatory mutations and on sequence conservation analysis for detecting such residues, previous methods often rely on classical information-theoretic measures. However, these measures usually do not take into account the dis/similarities of amino acids, which are likely to be crucial for those residues. In this study, we present a new method, the Quantum Coupled Mutation Finder (QCMF), which incorporates significant dis/similar amino acid pair signals in the prediction of functionally or structurally important sites. RESULTS: The results of this study are twofold. First, we tested the QCMF method using the essential sites of two human proteins, epidermal growth factor receptor (EGFR) and glucokinase (GCK). QCMF includes two metrics based on the quantum Jensen-Shannon divergence to measure both sequence conservation and compensatory mutations. We found that QCMF achieves improved performance in identifying essential sites from the MSAs of both proteins, with a significantly higher Matthews correlation coefficient (MCC) than previous methods. Second, using a dataset of 153 proteins, we made a pairwise comparison between QCMF and three conventional methods. This comparison strongly suggests that QCMF complements the conventional methods for the identification of correlated mutations in MSAs. CONCLUSIONS: QCMF uses the notion of entanglement, a major resource of quantum information, to model significant dissimilar and similar amino acid pair signals in the detection of functionally or structurally important sites. Our results suggest that, on the one hand, QCMF significantly outperforms the previous method, which mainly focuses on dissimilar amino acid signals, in detecting essential sites in proteins. On the other hand, it is complementary to existing methods for the identification of correlated mutations. QCMF is computationally intensive; to keep its computation time feasible, we leveraged Compute Unified Device Architecture (CUDA). The QCMF server is freely accessible at http://qcmf.informatik.uni-goettingen.de/.
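For reference, the Matthews correlation coefficient used in this comparison is computed from confusion-matrix counts as MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Returning 0 when a marginal sum is empty is a common convention assumed here; the counts in the example are invented.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts (0.0 if undefined)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented example: 100 alignment columns, 10 truly essential, 12 predicted essential.
print(round(matthews_corrcoef(tp=8, tn=86, fp=4, fn=2), 3))
```

MCC ranges from -1 to +1 and, unlike accuracy, stays informative when essential sites are a small minority of all positions.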


Subject(s)
Algorithms , Mutation , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Amino Acids/chemistry , Conserved Sequence , ErbB Receptors/chemistry , ErbB Receptors/genetics , Glucokinase/chemistry , Glucokinase/genetics , Humans , Protein Conformation , Quantum Theory , Sequence Alignment
11.
PLoS Comput Biol ; 9(3): e1002958, 2013.
Article in English | MEDLINE | ID: mdl-23555204

ABSTRACT

Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increasing attention in recent years. Its main applications concern the characterization of potentially novel motifs and the clustering of a motif collection to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim for has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this purpose we developed enhanced similarity scores by including the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in the aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier, which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to deciding whether two motifs represent common or distinct binding specificities.
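The abstract defines the information coverage criterion only as the fraction of information an alignment covers in the aligned motifs. One plausible reading, sketched below as an assumption rather than the authors' formula, uses per-column information content against a uniform background, IC = 2 + Σ p_b log2 p_b bits, and takes the share of a motif's total information content that falls within the aligned columns. The toy motif and function names are hypothetical.

```python
import math
from typing import List, Sequence


def column_ic(probs: Sequence[float]) -> float:
    """Information content (bits) of one DNA column, probabilities in A, C, G, T order."""
    return 2.0 + sum(p * math.log2(p) for p in probs if p > 0)


def information_coverage(motif: List[Sequence[float]], aligned_columns: range) -> float:
    """Share of the motif's total information content lying in `aligned_columns`."""
    ics = [column_ic(column) for column in motif]
    total = sum(ics)
    if total <= 0:
        return 0.0
    return sum(ics[i] for i in aligned_columns) / total


# Hypothetical 4-column motif: two fully specific columns, one uninformative, one half-specific.
motif = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25],
         [0.0, 0.0, 1.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
print(information_coverage(motif, range(0, 2)))  # 0.4
```

Under this reading, an alignment that overlaps only low-information flanking columns gets a low coverage even if its column-wise similarity is high.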


Subject(s)
Computational Biology/methods , Nucleotide Motifs , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Cluster Analysis , DNA/genetics , DNA/metabolism , Databases, Genetic , Gene Regulatory Networks , Logistic Models , Phylogeny , Transcription Factors/genetics , Transcription Factors/metabolism
12.
BMC Bioinformatics ; 14: 241, 2013 Aug 08.
Article in English | MEDLINE | ID: mdl-23924163

ABSTRACT

BACKGROUND: Accurate recognition of regulatory elements in promoters is an essential prerequisite for understanding the mechanisms of gene regulation at the level of transcription. Composite regulatory elements are a particular type of transcriptional regulatory element consisting of pairs of individual DNA motifs. In contrast to the present approach, most available recognition techniques are based purely on statistical evaluation of the occurrence of single motifs. Such methods are limited in application, since the accuracy of recognition depends greatly on the size and quality of the sequence dataset. Methods that exploit available knowledge and have broad applicability are evidently needed. RESULTS: We developed a novel method to identify composite regulatory elements in promoters using a library of known examples. In-depth investigation of the regularities encoded in known composite elements allowed us to introduce a new characteristic measure and to improve specificity compared with other methods. Tests on an established benchmark and on real genomic data show that our method outperforms other available methods based either on known examples or on statistical evaluation. In addition to better recognition, the method offers two practical advantages: it can detect a large number of different types of composite elements, and its results allow direct biological interpretation. The program is available at http://gnaweb.helmholtz-hzi.de/cgi-bin/MCatch/MatrixCatch.pl and includes an option to extend the provided library with user-supplied data. CONCLUSIONS: The novel algorithm for the identification of composite regulatory elements presented in this paper proved superior to existing methods. Its application to tissue-specific promoters identified several highly specific composite elements relevant to their biological function. This approach, together with other methods, will further advance the understanding of the transcriptional regulation of genes.


Subject(s)
Computational Biology , Promoter Regions, Genetic , Regulatory Elements, Transcriptional , Regulatory Sequences, Nucleic Acid , Algorithms , Computational Biology/instrumentation , Computational Biology/methods , Gene Expression Regulation , Genomics/instrumentation , Genomics/methods , Nucleotide Motifs
13.
Bioinformatics ; 28(18): i509-i514, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22962474

ABSTRACT

SUMMARY: The great variety of human cell types in morphology and function is due to diverse gene expression profiles that are governed by distinctive regulatory networks in different cell types. Explaining how regulatory networks achieve this diversity of cell types is still a challenging task. Here, we report on our studies of the design principles of the tissue regulatory system, based on regulatory networks constructed for eight human tissues that subsume the regulatory interactions between transcription factors (TFs), microRNAs (miRNAs) and non-TF target genes. The results show that tissue networks contain in-hubs and out-hubs with high in- and out-degrees, respectively. Some hubs (strong hubs) maintain their hub status in all tissues where they are expressed, whereas others (weak hubs), despite their ubiquitous expression, are hubs only in some tissues. The network motifs are mostly feed-forward loops. Those without miRNAs are common motifs shared by all tissues, whereas those containing miRNAs are tissue-specific motifs found in one or a few tissues, indicating that transcriptional regulation is more conserved across tissues than post-transcriptional regulation. In particular, a common bow-tie framework was found that underlies the motif instances and shows diverse patterns in different tissues. This bow-tie framework reflects the utilization efficiency of the regulatory system as well as its high variability across tissues, and could serve as a model for further understanding the structural adaptation of the regulatory system to the specific requirements of different cell functions. CONTACT: edgar.wingender@bioinf.med.uni-goettingen.de; jwang@nju.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
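The network properties discussed here (in- and out-degree hubs, feed-forward loops) are straightforward to compute on a directed edge list. A minimal sketch with a hypothetical TF/miRNA/target network; it lists every FFL occurrence but does not include the comparison against randomized networks that is normally used to call a pattern a network motif.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def feed_forward_loops(edges: Iterable[Edge]) -> List[Tuple[str, str, str]]:
    """Return (x, y, z) with x->y, y->z and x->z over three distinct nodes."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        if source != target:  # self-loops cannot take part in an FFL
            targets[source].add(target)
    loops = []
    for x in sorted(targets):
        for y in sorted(targets[x]):
            for z in sorted(targets.get(y, set())):
                if z != x and z in targets[x]:
                    loops.append((x, y, z))
    return loops


def degrees(edges: Iterable[Edge]) -> Tuple[Counter, Counter]:
    """In- and out-degree of every node; hubs sit at the top of either ranking."""
    in_deg: Counter = Counter()
    out_deg: Counter = Counter()
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return in_deg, out_deg


edges = [("TF1", "TF2"), ("TF2", "GENE_A"), ("TF1", "GENE_A"),
         ("TF1", "MIR1"), ("MIR1", "GENE_A"), ("MIR1", "GENE_B")]
print(feed_forward_loops(edges))  # [('TF1', 'MIR1', 'GENE_A'), ('TF1', 'TF2', 'GENE_A')]
in_deg, out_deg = degrees(edges)
print(out_deg.most_common(1), in_deg.most_common(1))  # [('TF1', 3)] [('GENE_A', 3)]
```

In this toy network one FFL is purely transcriptional and one contains a miRNA, the distinction the abstract uses to separate shared from tissue-specific motifs.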


Subject(s)
Gene Regulatory Networks , Data Interpretation, Statistical , Gene Expression Regulation , Humans , MicroRNAs/metabolism , Transcription Factors/metabolism
14.
Cancers (Basel) ; 14(9), 2022 Apr 21.
Article in English | MEDLINE | ID: mdl-35565214

ABSTRACT

Seventy percent of patients with colorectal cancer develop liver metastases (CRLM), which are a decisive factor in cancer progression. Therapy outcome is largely influenced by tumor heterogeneity, but the intra- and inter-patient heterogeneity of CRLM has been poorly studied. In particular, the contribution of the WNT and EGFR pathways, which are both frequently deregulated in colorectal cancer, has not yet been addressed in this context. To this end, we comprehensively characterized normal liver tissue and eight CRLM from two patients by standardized histopathological, molecular, and proteomic subtyping. Suitable fresh-frozen tissue samples were profiled by transcriptome sequencing (RNA-Seq) and proteomic profiling with reverse phase protein arrays (RPPA) combined with bioinformatic analyses to assess tumor heterogeneity and identify WNT- and EGFR-related master regulators and metastatic effectors. A standardized data analysis pipeline for integrating RNA-Seq with clinical, proteomic, and genetic data was established. Dimensionality reduction of the transcriptome data revealed a distinct signature for CRLM differing from normal liver tissue and indicated a high degree of tumor heterogeneity. WNT and EGFR signaling were highly active in CRLM and the genes of both pathways were heterogeneously expressed between the two patients as well as between the synchronous metastases of a single patient. An analysis of the master regulators and metastatic effectors implicated in the regulation of these genes revealed a set of four genes (SFN, IGF2BP1, STAT1, PIK3CG) that were differentially expressed in CRLM and were associated with clinical outcome in a large cohort of colorectal cancer patients as well as CRLM samples. 
In conclusion, high-throughput profiling enabled us to define a CRLM-specific signature and revealed the genes of the WNT and EGFR pathways associated with inter- and intra-patient heterogeneity, which were validated as prognostic biomarkers in CRC primary tumors as well as liver metastases.

15.
Front Genet ; 12: 670240, 2021.
Article in English | MEDLINE | ID: mdl-34211498

ABSTRACT

Only 2% of glioblastoma multiforme (GBM) patients respond to standard therapy and survive beyond 36 months (long-term survivors, LTS), while the majority survive less than 12 months (short-term survivors, STS). To understand the mechanism leading to poor survival, we analyzed publicly available datasets of 113 STS and 58 LTS. This analysis revealed 198 differentially expressed genes (DEGs) that characterize aggressive tumor growth and may be responsible for the poor prognosis. These genes belong largely to the Gene Ontology (GO) categories "epithelial-to-mesenchymal transition" and "response to hypoxia." In this article, we applied an upstream analysis approach that involves state-of-the-art promoter analysis and network analysis of the dysregulated genes potentially responsible for short survival in GBM. Binding sites for transcription factors (TFs) associated with GBM pathology, such as NANOG, NF-κB, REST, FRA-1, PPARG and seven others, were found to be enriched in the promoters of the dysregulated genes. We reconstructed the gene regulatory network with several positive feedback loops controlled by five master regulators [insulin-like growth factor binding protein 2 (IGFBP2), vascular endothelial growth factor A (VEGFA), VEGF165, platelet-derived growth factor A (PDGFA), adipocyte enhancer-binding protein (AEBP1), and oncostatin M receptor (OSMR)], which can be proposed as biomarkers and as therapeutic targets for improving GBM prognosis. A critical analysis of this gene regulatory network gives insights into the mechanism of gene regulation by IGFBP2 via several TFs, including FRA-1, the key molecule of GBM tumor invasiveness and progression. All observations were validated in independent cohorts, and their impact on overall survival was investigated.
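The abstract reports TF binding sites enriched in the promoters of the dysregulated genes but does not name the statistical test. A common choice, assumed here purely for illustration, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which compares how often a site occurs in the gene set versus a background set. Only the 198 DEGs come from the abstract; the other counts and the function name are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: background promoters, K: background promoters with a site for the TF,
    n: promoters of the gene set of interest, k: how many of those n carry a site.
    """
    upper = min(n, K)
    # math.comb returns 0 when asked for more items than available, so no extra guards.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 40 of 198 DEG promoters carry a site, vs 2000 of 20000 background promoters.
p = hypergeom_enrichment_p(k=40, n=198, K=2000, N=20000)
print(f"{p:.2e}")
```

Exact integer arithmetic avoids underflow in the tail sum; when many TFs are tested, the resulting p-values would still need multiple-testing correction.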

16.
PLoS One ; 16(10): e0258623, 2021.
Article in English | MEDLINE | ID: mdl-34653224

ABSTRACT

Biomedical and life science literature is an essential medium for publishing experimental results. With the rapid growth in the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible, helping scientists discover new relationships between biological entities and answer biological questions. Using the word2vec approach, we generated word vector representations from a corpus of over 16 million PubMed abstracts. We developed a text-mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted of substituting synonymous terms with their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (Graph-CNNs) on large breast cancer gene expression data and on other cancer datasets. The performance of the resulting models was compared with that of Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons with biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with networks derived from word2vec embeddings performed sufficiently well on the metastatic event prediction tasks compared with other networks. This performance was good enough to validate the utility of our generated word embeddings for constructing biological networks. Word representations produced by text-mining algorithms such as word2vec are therefore able to capture biologically meaningful relations between entities. Our generated embeddings are publicly available at https://github.com/genexplain/Word2vec-based-Networks/blob/main/README.md.
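The abstract reports that related entities tend to have higher cosine similarity between their word vectors, and that gene-gene networks were extracted from the embeddings. A minimal sketch of that idea follows, assuming a simple similarity cut-off to decide which term pairs become edges. The threshold value, the toy three-dimensional vectors, and the function names are hypothetical and do not reproduce the authors' pipeline.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_network(embeddings, threshold):
    """Return undirected edges (term_a, term_b, similarity) for every pair
    whose cosine similarity is at or above `threshold` (a hypothetical cut-off)."""
    edges = []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine_similarity(embeddings[a], embeddings[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Toy vectors for illustration only; real word2vec embeddings are learned
# from the corpus and usually have hundreds of dimensions.
toy = {
    "TP53": [0.9, 0.1, 0.0],
    "MDM2": [0.8, 0.2, 0.1],
    "INS": [0.0, 0.1, 0.9],
}

for a, b, sim in similarity_network(toy, threshold=0.9):
    print(a, b, round(sim, 3))
```

A fixed threshold is the simplest possible rule; keeping the top-k most similar terms per gene would be an equally plausible alternative, and which rule the authors used is not stated in the abstract.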


Subject(s)
Breast Neoplasms/genetics, Computational Biology/methods, Data Mining/methods, Algorithms, Breast Neoplasms/metabolism, Databases, Factual, Female, Gene Expression Regulation, Neoplastic, Humans, Machine Learning, Neural Networks, Computer, Protein Interaction Maps, Terminology as Topic

17.
NPJ Syst Biol Appl ; 7(1): 38, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34671039

ABSTRACT

Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in MR have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems, in the same way that ImageNet [4] was fundamental to developing machine vision techniques. This study contributes six components to an advanced named-entity analysis tool for biomedicine: (a) a new Named Entity Recognition Ontology (NERO), developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity and bridges the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to reduce the annotation burden for curators; (d) an original annotated corpus comprising 35,865 sentences, which contain 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf automated named entity recognition (NER) extraction; and (f) embedding models that demonstrate the promise of the biomedical associations embedded within this corpus.

18.
Brief Bioinform ; 9(4): 326-32, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18436575

ABSTRACT

Since its beginning as a data collection more than 20 years ago, the TRANSFAC project has evolved into the basis of a complex platform for the description and analysis of gene regulatory events and networks. In the following, I describe what the original concepts were, what their present status is, and how they may be expected to contribute to future systems biology approaches.


Subject(s)
Chromosome Mapping/methods, Gene Expression Regulation/physiology, Models, Biological, Proteome/metabolism, Signal Transduction/physiology, Systems Biology/methods, Transcription Factors/metabolism, Biotechnology/methods, Computer Simulation, Systems Integration

19.
Brief Bioinform ; 9(6): 518-31, 2008 Nov.
Article in English | MEDLINE | ID: mdl-19073714

ABSTRACT

Translating the exponentially growing amount of omics data into knowledge usable for a personalized medicine approach poses a formidable challenge. In this article, taking diabetes as a use case, we present strategies for developing data repositories into computer-accessible knowledge sources that can support a systemic view of the molecular causes of diseases, thus laying the foundation for systems pathology.


Subject(s)
Databases, Factual, Information Storage and Retrieval, Knowledge Bases, Database Management Systems, Diabetes Mellitus/genetics, Diabetes Mellitus/pathology, Diabetes Mellitus/physiopathology, Gene Regulatory Networks, Humans, Information Systems, Semantics, Signal Transduction/physiology, User-Computer Interface

20.
Nucleic Acids Res ; 36(Database issue): D689-94, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18045786

ABSTRACT

EndoNet is an information resource about intercellular regulatory communication. It provides information about hormones, hormone receptors, the sources (i.e. cells, tissues and organs) where the hormones are synthesized and secreted, and the sites where the respective receptors are expressed. The database focuses on the regulatory relations between them. An elementary communication is displayed as a causal link from a cell that secretes a particular hormone to the cells that express the corresponding hormone receptor and respond to the hormone. Whenever expression, synthesis and/or secretion of another hormone is part of this response, the corresponding cell becomes an internal node of the resulting network. This intercellular communication network coordinates the functions of different organs. Therefore, the database covers the hierarchy of cellular organization of tissues and organs as modeled in the Cytomer ontology, which has now been directly embedded into EndoNet. Users can query the database, and the results can be used to visualize the intercellular information flow. A newly implemented hormone classification enables users to browse the database and may be used as an alternative entry point. EndoNet is accessible at: http://endonet.bioinf.med.uni-goettingen.de/.
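The abstract's notion of an elementary communication (a causal link from a hormone-secreting source to the cells that express the matching receptor) and of internal nodes can be sketched as a small directed graph. This is a simplified illustration, not the EndoNet schema or interface: the function names are hypothetical, the example uses the textbook hypothalamic-pituitary-adrenal cascade (CRH acting on pituitary corticotrophs, ACTH acting on the adrenal cortex) purely for illustration, and a node is treated as internal whenever it both receives and sends a signal, without checking that the onward secretion is part of the specific response.

```python
from collections import defaultdict


def communication_links(secretes, responds_to):
    """Elementary links as (source, hormone, target) triples.

    secretes:    source -> set of hormones it synthesizes and secretes
    responds_to: target -> set of hormones whose receptor it expresses
    """
    # Index receptor-expressing targets by hormone.
    targets = defaultdict(set)
    for cell, hormones in responds_to.items():
        for hormone in hormones:
            targets[hormone].add(cell)

    links = []
    for source in sorted(secretes):
        for hormone in sorted(secretes[source]):
            for target in sorted(targets[hormone]):
                links.append((source, hormone, target))
    return links


def internal_nodes(links):
    """Simplification: nodes that both receive a hormone and send one onward."""
    senders = {src for src, _, _ in links}
    receivers = {dst for _, _, dst in links}
    return senders & receivers


# Illustrative cascade; a hormone with no listed receptor (cortisol here)
# produces no link in this toy data.
hpa_secretes = {
    "hypothalamus": {"CRH"},
    "pituitary_corticotroph": {"ACTH"},
    "adrenal_cortex": {"cortisol"},
}
hpa_responds_to = {
    "pituitary_corticotroph": {"CRH"},
    "adrenal_cortex": {"ACTH"},
}

links = communication_links(hpa_secretes, hpa_responds_to)
for link in links:
    print(*link)
print(internal_nodes(links))
```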


Subject(s)
Cell Communication, Databases, Factual, Hormones/metabolism, Computer Graphics, Hormones/classification, Internet, Receptors, Cell Surface/metabolism, Receptors, Cytoplasmic and Nuclear/metabolism, User-Computer Interface