Results 1 - 20 of 68
1.
Nucleic Acids Res ; 46(D1): D343-D347, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29087517

ABSTRACT

TFClass is a resource that classifies eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs), available online at http://tfclass.bioinf.med.uni-goettingen.de. The classification scheme of TFClass was originally derived for human TFs and is expanded here to the whole taxonomic class of Mammalia. By combining information from different resources, manually checking the retrieved mammalian TF sequences and applying extensive phylogenetic analyses, >39 000 TFs from up to 41 mammalian species were assigned to the Superclasses, Classes, Families and Subfamilies of TFClass. As a result, TFClass now provides the corresponding sequence collection in FASTA format, sequence logos and phylogenetic trees at different classification levels, predicted TF binding sites for human, mouse, dog and cow genomes as well as links to several external databases. In particular, all those TFs that are also documented in the TRANSFAC® database (FACTOR table) have been linked and can be freely accessed. TRANSFAC® FACTOR can also be queried through its own search interface.


Subjects
Databases, Protein , Transcription Factors/classification , Animals , Binding Sites , Cattle , Dogs , Humans , Mammals , Mice , Phylogeny , Protein Domains , Transcription Factors/chemistry , Transcription Factors/metabolism , User-Computer Interface
2.
Nucleic Acids Res ; 46(D1): D168-D174, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29077896

ABSTRACT

Cell-specific information on the transcriptional regulation of microRNAs (miRNAs) is crucial to a precise understanding of gene regulation in the physiological and pathological processes that occur in different tissues and cell types. The mirTrans database provides comprehensive information about cell-specific transcription of miRNAs, including the transcriptional start sites (TSSs) of miRNAs, transcription factor (TF) to miRNA regulations and miRNA promoter sequences. mirTrans also maps the experimental H3K4me3 and DHS (DNase-I hypersensitive site) marks within miRNA promoters and expressed sequence tags (ESTs) within transcribed regions. The current version of the database covers 35 259 TSSs and over 2.3 million TF-miRNA regulations for 1513 miRNAs in a total of 54 human cell lines. These cell lines span most of the biological systems, including the circulatory, digestive and nervous systems. Information is offered for both intragenic and intergenic miRNAs. In particular, the quality of miRNA TSSs and TF-miRNA regulations is evaluated by literature curation: 23 447 TSS records and 2148 TF-miRNA regulations are supported by experiments reported in the curated literature. EST coverage is also used to evaluate the accuracy of miRNA TSSs. The mirTrans interface is user-friendly and offers convenient downloads (http://mcube.nju.edu.cn/jwang/lab/soft/mirtrans/ or http://120.27.239.192/mirtrans/).


Subjects
Databases, Nucleic Acid , MicroRNAs/genetics , MicroRNAs/metabolism , Cell Line , Expressed Sequence Tags , Gene Expression Regulation , Histone Code , Humans , Promoter Regions, Genetic , Transcription Factors/metabolism , Transcription Initiation Site , User-Computer Interface
3.
BMC Bioinformatics ; 20(Suppl 4): 119, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999858

ABSTRACT

BACKGROUND: The search for molecular biomarkers of early-onset colorectal cancer (CRC) is an important but still quite challenging and unsolved task. Detection of CpG methylation in human DNA obtained from blood or stool has been proposed as a promising approach to a noninvasive early diagnosis of CRC. Thousands of abnormally methylated CpG positions in CRC genomes are often located in non-coding parts of genes. Novel bioinformatic methods are thus urgently needed for multi-omics data analysis to reveal causative biomarkers with a potential driver role in early stages of cancer. METHODS: We have developed a method for finding potential causal relationships between epigenetic changes (DNA methylations) in gene regulatory regions that affect transcription factor binding sites (TFBS) and gene expression changes. This method also considers the topology of the involved signal transduction pathways and searches for positive feedback loops that may cause the carcinogenic aberrations in gene expression. We call this method "Walking pathways", since it searches for potential rewiring mechanisms in cancer pathways due to dynamic changes in the DNA methylation status of important gene regulatory regions ("epigenomic walking"). RESULTS: In this paper, we analysed an extensive collection of full genome gene-expression data (RNA-seq) and DNA methylation data of genomic CpG islands (using Illumina methylation arrays) generated from a sample of tumor and normal gut epithelial tissues of 300 patients with colorectal cancer (at different stages of the disease) (data generated in the EU-supported SysCol project). Identification of potential epigenetic biomarkers of DNA methylation was performed using the fully automatic multi-omics analysis web service "My Genome Enhancer" (MGE) (my-genome-enhancer.com). 
MGE uses the database on gene regulation TRANSFAC®, the signal transduction pathways database TRANSPATH®, and software that employs AI (artificial intelligence) methods for the analysis of cancer-specific enhancers. CONCLUSIONS: The identified biomarkers underwent experimental testing on an independent set of blood samples from patients with colorectal cancer. As a result, using advanced methods of statistics and machine learning, a minimum set of 6 biomarkers was selected, which together achieve the best cancer detection potential. The markers include hypermethylated positions in regulatory regions of the following genes: CALCA, ENO1, MYC, PDX1, TCF7, ZNF43.


Subjects
Biomarkers, Tumor/genetics , Colorectal Neoplasms/genetics , DNA Methylation/genetics , Feedback, Physiological , Signal Transduction/genetics , Binding Sites/genetics , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/pathology , CpG Islands/genetics , Epigenesis, Genetic , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Male , Middle Aged , Neoplasm Staging , Transcription Factors/metabolism
4.
Circulation ; 135(19): 1832-1847, 2017 May 09.
Article in English | MEDLINE | ID: mdl-28167635

ABSTRACT

BACKGROUND: Advancing structural and functional maturation of stem cell-derived cardiomyocytes remains a key challenge for applications in disease modeling, drug screening, and heart repair. Here, we sought to advance cardiomyocyte maturation in engineered human myocardium (EHM) toward an adult phenotype under defined conditions. METHODS: We systematically investigated cell composition, matrix, and media conditions to generate EHM from embryonic and induced pluripotent stem cell-derived cardiomyocytes and fibroblasts with organotypic functionality under serum-free conditions. We used morphological, functional, and transcriptome analyses to benchmark maturation of EHM. RESULTS: EHM demonstrated important structural and functional properties of postnatal myocardium, including: (1) rod-shaped cardiomyocytes with M bands assembled as a functional syncytium; (2) systolic twitch forces at a similar level as observed in bona fide postnatal myocardium; (3) a positive force-frequency response; (4) inotropic responses to β-adrenergic stimulation mediated via canonical β1- and β2-adrenoceptor signaling pathways; and (5) evidence for advanced molecular maturation by transcriptome profiling. EHM responded to chronic catecholamine toxicity with contractile dysfunction, cardiomyocyte hypertrophy, cardiomyocyte death, and N-terminal pro B-type natriuretic peptide release; all are classical hallmarks of heart failure. In addition, we demonstrate the scalability of EHM according to anticipated clinical demands for cardiac repair. CONCLUSIONS: We provide proof-of-concept for a universally applicable technology for the engineering of macroscale human myocardium for disease modeling and heart repair from embryonic and induced pluripotent stem cell-derived cardiomyocytes under defined, serum-free conditions.


Subjects
Embryonic Stem Cells/transplantation , Heart Failure/therapy , Induced Pluripotent Stem Cells/transplantation , Myocytes, Cardiac/transplantation , Tissue Engineering/methods , Ventricular Remodeling/physiology , Animals , Cell Differentiation/physiology , Embryonic Stem Cells/physiology , Heart Failure/pathology , Humans , Induced Pluripotent Stem Cells/physiology , Myocardium/cytology , Myocardium/pathology , Myocytes, Cardiac/physiology , Printing, Three-Dimensional , Rats , Rats, Nude
5.
Bioinformatics ; 32(16): 2403-10, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153609

ABSTRACT

MOTIVATION: Identification of microRNA (miRNA) transcriptional start sites (TSSs) is crucial for understanding the transcriptional regulation of miRNA. As miRNA expression is highly cell specific, an automatic and systematic method that can identify miRNA TSSs accurately and cell specifically is urgently needed. RESULTS: A workflow to identify the TSSs of miRNAs was built by integrating H3K4me3 and DNase I hypersensitive site data and combining them with conservation level and sequence features. By applying the workflow to the data for 54 cell lines from the ENCODE project, we successfully identified TSSs for 663 intragenic miRNAs and 620 intergenic miRNAs, which cover 84.2% (1283/1523) of all miRNAs recorded in miRBase 18. For these cell lines, we found 4042 alternative TSSs for intragenic miRNAs and 3186 alternative TSSs for intergenic miRNAs. Our method achieved a better performance than the previous non-cell-specific methods on miRNA TSSs. The cell-specific method developed by Georgakilas et al. gives 158 TSSs of higher accuracy in two cell lines, benefiting from the use of deep sequencing. In contrast, our method provided a much higher number of miRNA TSSs (7228) for a broader range of cell lines without the limitation of costly deep-sequencing data, thus being more applicable for various experimental cases. Analysis showed that upstream promoters at -2 kb to -200 bp of the TSS are more conserved for independently transcribed miRNAs, while for miRNAs transcribed with host genes, their core promoters (-200 bp to 200 bp of the TSS) are significantly conserved. AVAILABILITY AND IMPLEMENTATION: Predicted miRNA TSSs and promoters can be downloaded from supplementary files. CONTACT: jwang@nju.edu.cn or jlee@nju.edu.cn or edgar.wingender@bioinf.med.uni-goettingen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
MicroRNAs , Transcription Initiation Site , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Promoter Regions, Genetic
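
The counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. The variable names below are illustrative only; the numbers are taken directly from the abstract.

```python
# Counts quoted in the abstract above.
intragenic_mirnas = 663      # intragenic miRNAs with an identified TSS
intergenic_mirnas = 620      # intergenic miRNAs with an identified TSS
mirbase18_total = 1523       # miRNAs recorded in miRBase 18
alt_tss_intragenic = 4042    # alternative TSSs, intragenic miRNAs
alt_tss_intergenic = 3186    # alternative TSSs, intergenic miRNAs

# miRNAs covered and the corresponding share of miRBase 18.
mirnas_with_tss = intragenic_mirnas + intergenic_mirnas
coverage_pct = 100 * mirnas_with_tss / mirbase18_total

# Total number of alternative TSSs across both miRNA groups.
total_alt_tss = alt_tss_intragenic + alt_tss_intergenic

print(mirnas_with_tss)         # 1283
print(round(coverage_pct, 1))  # 84.2
print(total_alt_tss)           # 7228
```

The three printed values match the figures 1283, 84.2% and 7228 reported in the abstract.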
6.
Nucleic Acids Res ; 43(Database issue): D97-102, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361979

ABSTRACT

TFClass aims at classifying eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs). For this, a classification schema comprising four generic levels (superclass, class, family and subfamily) was defined that could accommodate all known DNA-binding human TFs. They were assigned to their (sub-)families as instances at two different levels, the corresponding TF genes and individual gene products (protein isoforms). In the present version, all mouse and rat orthologs have been linked to the human TFs, and the mouse orthologs have been arranged in an independent ontology. Many TFs were assigned typical DNA-binding patterns and positional weight matrices (PWMs) derived from high-throughput in-vitro binding studies. Predicted TF binding sites from human gene upstream sequences are now also attached to each human TF whenever a PWM was available for this factor or one of its paralogs. TFClass is freely available at http://tfclass.bioinf.med.uni-goettingen.de/ through a web interface and for download in OBO format.


Subjects
Databases, Protein , Transcription Factors/classification , Animals , Binding Sites , DNA/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Humans , Internet , Mice , Protein Structure, Tertiary , Rats , Transcription Factors/chemistry , Transcription Factors/metabolism
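
The abstract above mentions binding sites predicted from positional weight matrices but does not describe the scanning procedure. The following is a generic sketch of log-odds matrix scanning, not the TFClass pipeline: the count matrix, uniform background, pseudocount, threshold and test sequence are all invented for illustration.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# These counts are invented and do not come from TFClass.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 1.0   # avoids log(0) for bases never observed at a position


def to_log_odds(counts):
    """Convert per-position base counts into log2-odds scores against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * PSEUDOCOUNT
        pwm.append({
            base: math.log2(((column[base] + PSEUDOCOUNT) / total) / BACKGROUND)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


PWM = to_log_odds(COUNTS)
hits = scan("TTACGTGGACGTAA", PWM, threshold=4.0)
for start, window, score in hits:
    print(start, window, score)
```

With this toy matrix only the two exact `ACGT` occurrences (0-based positions 2 and 8) pass the threshold. The sketch assumes an uppercase sequence containing only A, C, G and T and scans a single strand.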
7.
BMC Bioinformatics ; 16: 200, 2015 Jun 26.
Article in English | MEDLINE | ID: mdl-26108437

ABSTRACT

BACKGROUND: Exploratory analysis of multi-dimensional high-throughput datasets, such as microarray gene expression time series, may be instrumental in understanding the genetic programs underlying numerous biological processes. In such datasets, variations in the gene expression profiles are usually observed across replicates and time points. Thus mining the temporal expression patterns in such multi-dimensional datasets may not only provide insights into the key biological processes governing organ growth and development but also facilitate the understanding of the underlying complex gene regulatory circuits. RESULTS: In this work we have developed an evolutionary multi-objective optimization for our previously introduced triclustering algorithm δ-TRIMAX. Its aim is to make optimal use of δ-TRIMAX in extracting groups of co-expressed genes from time series gene expression data, or from any 3D gene expression dataset, by adding the powerful capabilities of an evolutionary algorithm to retrieve overlapping triclusters. We have compared the performance of our newly developed algorithm, EMOA-δ-TRIMAX, with that of other existing triclustering approaches using four artificial datasets and three real-life datasets. Moreover, we have analyzed the results of our algorithm on one of these real-life datasets monitoring the differentiation of human induced pluripotent stem cells (hiPSC) into mature cardiomyocytes. For each group of co-expressed genes belonging to one tricluster, we identified key genes by computing their membership values within the tricluster. It turned out that a very high percentage of these key genes were significantly enriched in Gene Ontology categories or KEGG pathways that fitted very well to the biological context of cardiomyocyte differentiation. CONCLUSIONS: EMOA-δ-TRIMAX has proven instrumental in identifying groups of genes in transcriptomic data sets that represent the functional categories constituting the biological process under study. The executable file can be found at http://www.bioinf.med.uni-goettingen.de/fileadmin/download/EMOA-delta-TRIMAX.tar.gz .


Subjects
Algorithms , Biomarkers/analysis , Cell Differentiation/genetics , Gene Expression Profiling/methods , Induced Pluripotent Stem Cells/metabolism , Myocytes, Cardiac/metabolism , Transcriptome/genetics , Biological Phenomena , Cluster Analysis , Datasets as Topic , Gene Regulatory Networks , Humans , Induced Pluripotent Stem Cells/cytology , Myocytes, Cardiac/cytology , Oligonucleotide Array Sequence Analysis/methods , Time Factors
8.
BMC Bioinformatics ; 16: 400, 2015 Dec 01.
Article in English | MEDLINE | ID: mdl-26627005

ABSTRACT

BACKGROUND: Transcription factors (TFs) are important regulatory proteins that govern transcriptional regulation. Today, it is known that in higher organisms different TFs have to cooperate rather than act individually in order to control complex genetic programs. The identification of these interactions is an important challenge for understanding the molecular mechanisms of regulating biological processes. In this study, we present a new method based on pointwise mutual information, PC-TraFF, which considers the genome as a document, the sequences as sentences, and TF binding sites (TFBSs) as words to identify interacting TFs in a set of sequences. RESULTS: To demonstrate the effectiveness of PC-TraFF, we performed a genome-wide analysis and a breast cancer-associated sequence set analysis for protein coding and miRNA genes. Our results show that in any of these sequence sets, PC-TraFF is able to identify important interacting TF pairs, for most of which we found support by previously published experimental results. Further, we made a pairwise comparison between PC-TraFF and three conventional methods. The outcome of this comparison study strongly suggests that all these methods focus on different important aspects of interaction between TFs and thus the pairwise overlap between any of them is only marginal. CONCLUSIONS: In this study, adopting an idea from linguistics for use in bioinformatics, we develop a new information theoretic method, PC-TraFF, for the identification of potentially collaborating transcription factors based on the idiosyncrasy of their binding site distributions on the genome. The results of our study show that PC-TraFF can successfully identify known interacting TF pairs and thus its currently biologically unconfirmed predictions could provide new hypotheses for further experimental validation. Additionally, the comparison of the results of PC-TraFF with the results of previous methods demonstrates that different methods with their specific scopes can perfectly supplement each other. Overall, our analyses indicate that PC-TraFF is a time-efficient method whose algorithm has tractable computation time and memory consumption. The PC-TraFF server is freely accessible at http://pctraff.bioinf.med.uni-goettingen.de/.


Subjects
Algorithms , Breast Neoplasms/metabolism , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Genome, Human , Transcription Factors/metabolism , Binding Sites , Breast Neoplasms/classification , Breast Neoplasms/genetics , Female , Humans , MicroRNAs/genetics , Promoter Regions, Genetic/genetics , Protein Binding
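
To make the "sequences as sentences, TFBSs as words" analogy in the abstract above concrete, here is a minimal sketch of plain pointwise mutual information over co-occurring TFs. It is not the PC-TraFF implementation, whose details the abstract does not give; the TF names and the toy sequence sets are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Toy "sentences": each sequence is reduced to the set of TFs that have a
# predicted binding site in it. All names and hits are invented.
sequences = [
    {"TF_A", "TF_B"},
    {"TF_A", "TF_B", "TF_C"},
    {"TF_A", "TF_B"},
    {"TF_C"},
    {"TF_C", "TF_D"},
    {"TF_D"},
]


def pmi_table(seqs):
    """PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ) for every co-occurring TF pair."""
    n = len(seqs)
    single = Counter(tf for s in seqs for tf in s)
    pair = Counter(frozenset(p) for s in seqs for p in combinations(sorted(s), 2))
    table = {}
    for key, joint in pair.items():
        a, b = sorted(key)
        table[(a, b)] = math.log2((joint / n) / ((single[a] / n) * (single[b] / n)))
    return table


pmi = pmi_table(sequences)
for tf_pair, value in sorted(pmi.items(), key=lambda item: -item[1]):
    print(tf_pair, round(value, 3))
```

In this toy set, TF_A and TF_B always occur together and receive a positive score, while TF_A and TF_C co-occur less often than expected by chance and receive a negative one. Pairs that never co-occur are simply absent from the table.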
9.
Nucleic Acids Res ; 41(Database issue): D165-70, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23180794

ABSTRACT

TFClass (http://tfclass.bioinf.med.uni-goettingen.de/) provides a comprehensive classification of human transcription factors based on their DNA-binding domains. Transcription factors constitute a large functional family of proteins directly regulating the activity of genes. Most of them are sequence-specific DNA-binding proteins, thus reading out the information encoded in cis-regulatory DNA elements of promoters, enhancers and other regulatory regions of a genome. TFClass is a database that classifies human transcription factors by a six-level classification schema, four of which are abstractions according to different criteria, while the fifth level represents TF genes and the sixth individual gene products. Altogether, nine superclasses have been identified, comprising 40 classes and 111 families. Counted by genes, 1558 human TFs have been classified so far or >2900 different TFs when including their isoforms generated by alternative splicing or protein processing events. With this classification, we hope to provide a basis for deciphering protein-DNA recognition codes; moreover, it can be used for constructing expanded transcriptional networks by inferring additional TF-target gene relations.


Subjects
Databases, Protein , Transcription Factors/classification , DNA-Binding Proteins/chemistry , Humans , Internet , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein , Transcription Factors/chemistry
10.
BMC Bioinformatics ; 15: 96, 2014 Apr 03.
Article in English | MEDLINE | ID: mdl-24694117

ABSTRACT

BACKGROUND: The identification of functionally or structurally important non-conserved residue sites in protein MSAs is an important challenge for understanding the structural basis and molecular mechanism of protein functions. Despite the rich literature on compensatory mutations as well as sequence conservation analysis for the detection of those important residues, previous methods often rely on classical information-theoretic measures. However, these measures usually do not take into account dis/similarities of amino acids which are likely to be crucial for those residues. In this study, we present a new method, the Quantum Coupled Mutation Finder (QCMF) that incorporates significant dis/similar amino acid pair signals in the prediction of functionally or structurally important sites. RESULTS: The result of this study is twofold. First, using the essential sites of two human proteins, namely epidermal growth factor receptor (EGFR) and glucokinase (GCK), we tested the QCMF-method. The QCMF includes two metrics based on quantum Jensen-Shannon divergence to measure both sequence conservation and compensatory mutations. We found that the QCMF reaches an improved performance in identifying essential sites from MSAs of both proteins with a significantly higher Matthews correlation coefficient (MCC) value in comparison to previous methods. Second, using a data set of 153 proteins, we made a pairwise comparison between QCMF and three conventional methods. This comparison study strongly suggests that QCMF complements the conventional methods for the identification of correlated mutations in MSAs. CONCLUSIONS: QCMF utilizes the notion of entanglement, which is a major resource of quantum information, to model significant dissimilar and similar amino acid pair signals in the detection of functionally or structurally important sites. 
Our results suggest that on the one hand QCMF significantly outperforms the previous method, which mainly focuses on dissimilar amino acid signals, to detect essential sites in proteins. On the other hand, it is complementary to the existing methods for the identification of correlated mutations. The QCMF method is computationally intensive. To ensure a feasible computation time for the QCMF algorithm, we leveraged Compute Unified Device Architecture (CUDA). The QCMF server is freely accessible at http://qcmf.informatik.uni-goettingen.de/.


Subjects
Algorithms , Mutation , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Amino Acids/chemistry , Conserved Sequence , ErbB Receptors/chemistry , ErbB Receptors/genetics , Glucokinase/chemistry , Glucokinase/genetics , Humans , Protein Conformation , Quantum Theory , Sequence Alignment
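
The abstract above compares methods by their Matthews correlation coefficient (MCC). As a reminder of how that score is computed from a confusion matrix, here is a small sketch; the counts are hypothetical and are not results from the QCMF study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical prediction of essential sites in a 100-residue protein:
# 20 true positives, 70 true negatives, 5 false positives, 5 false negatives.
mcc = matthews_corrcoef(tp=20, tn=70, fp=5, fn=5)
print(round(mcc, 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it remains informative when essential sites are a small minority of all residues.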
11.
PLoS Comput Biol ; 9(3): e1002958, 2013.
Article in English | MEDLINE | ID: mdl-23555204

ABSTRACT

Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increased attention in recent years. Its main applications concern characterization of potentially novel motifs and clustering of a motif collection in order to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim at has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by inclusion of the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.


Subjects
Computational Biology/methods , Nucleotide Motifs , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Cluster Analysis , DNA/genetics , DNA/metabolism , Databases, Genetic , Gene Regulatory Networks , Logistic Models , Phylogeny , Transcription Factors/genetics , Transcription Factors/metabolism
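
The abstract above describes the information coverage criterion only in words: the fraction of a motif's information that an alignment covers. One possible reading is sketched below, under the assumption that information is measured per column in bits against a uniform background; the exact definition used in the paper may differ, and the motif is invented.

```python
import math


def column_information(col):
    """Information content in bits of one column of base probabilities (uniform background)."""
    return 2.0 + sum(p * math.log2(p) for p in col if p > 0)


def information_coverage(motif, aligned_start, aligned_end):
    """Fraction of the motif's total information in columns [aligned_start, aligned_end)."""
    info = [column_information(col) for col in motif]
    total = sum(info)
    return sum(info[aligned_start:aligned_end]) / total if total > 0 else 0.0


# Hypothetical 4-column motif (probabilities for A, C, G, T per column):
# two fully conserved core columns flanked by two uninformative columns.
motif = [
    [0.25, 0.25, 0.25, 0.25],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]

core_only = information_coverage(motif, 1, 3)   # alignment spans the conserved core
flank_only = information_coverage(motif, 0, 1)  # alignment touches only a flank
print(core_only, flank_only)
```

Under this reading, an alignment that overlaps only uninformative flanks covers none of the motif's information, however many columns it spans, whereas an alignment over the conserved core covers all of it.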
12.
BMC Bioinformatics ; 14: 241, 2013 Aug 08.
Article in English | MEDLINE | ID: mdl-23924163

ABSTRACT

BACKGROUND: Accurate recognition of regulatory elements in promoters is an essential prerequisite for understanding the mechanisms of gene regulation at the level of transcription. Composite regulatory elements represent a particular type of such transcriptional regulatory elements consisting of pairs of individual DNA motifs. In contrast to the present approach, most available recognition techniques are based purely on statistical evaluation of the occurrence of single motifs. Such methods are limited in application, since the accuracy of recognition is greatly dependent on the size and quality of the sequence dataset. Methods that exploit available knowledge and have broad applicability are evidently needed. RESULTS: We developed a novel method to identify composite regulatory elements in promoters using a library of known examples. In depth investigation of regularities encoded in known composite elements allowed us to introduce a new characteristic measure and to improve the specificity compared with other methods. Tests on an established benchmark and real genomic data show that our method outperforms other available methods based either on known examples or statistical evaluations. In addition to better recognition, a practical advantage of this method is first the ability to detect a high number of different types of composite elements, and second direct biological interpretation of the identified results. The program is available at http://gnaweb.helmholtz-hzi.de/cgi-bin/MCatch/MatrixCatch.pl and includes an option to extend the provided library by user supplied data. CONCLUSIONS: The novel algorithm for the identification of composite regulatory elements presented in this paper was proved to be superior to existing methods. Its application to tissue specific promoters identified several highly specific composite elements with relevance to their biological function. 
This approach together with other methods will further advance the understanding of transcriptional regulation of genes.


Subjects
Computational Biology , Promoter Regions, Genetic , Regulatory Elements, Transcriptional , Regulatory Sequences, Nucleic Acid , Algorithms , Computational Biology/instrumentation , Computational Biology/methods , Gene Expression Regulation , Genomics/instrumentation , Genomics/methods , Nucleotide Motifs
13.
Bioinformatics ; 28(18): i509-i514, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22962474

ABSTRACT

SUMMARY: The great variety of human cell types in morphology and function is due to the diverse gene expression profiles that are governed by the distinctive regulatory networks in different cell types. It is still a challenging task to explain how the regulatory networks achieve the diversity of different cell types. Here, we report on our studies of the design principles of the tissue regulatory system by constructing the regulatory networks of eight human tissues, which subsume the regulatory interactions between transcription factors (TFs), microRNAs (miRNAs) and non-TF target genes. The results show that there are in-/out-hubs of high in-/out-degrees in tissue networks. Some hubs (strong hubs) maintain the hub status in all the tissues where they are expressed, whereas others (weak hubs), in spite of their ubiquitous expression, are hubs only in some tissues. The network motifs are mostly feed-forward loops. Some of them having no miRNAs are the common motifs shared by all tissues, whereas the others containing miRNAs are the tissue-specific ones owned by one or several tissues, indicating that the transcriptional regulation is more conserved across tissues than the post-transcriptional regulation. In particular, a common bow-tie framework was found that underlies the motif instances and shows diverse patterns in different tissues. Such bow-tie framework reflects the utilization efficiency of the regulatory system as well as its high variability in different tissues, and could serve as the model to further understand the structural adaptation of the regulatory system to the specific requirements of different cell functions. CONTACT: edgar.wingender@bioinf.med.uni-goettingen.de; jwang@nju.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Regulatory Networks , Data Interpretation, Statistical , Gene Expression Regulation , Humans , MicroRNAs/metabolism , Transcription Factors/metabolism
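
The hub and feed-forward-loop notions used in the abstract above can be illustrated on a toy directed network. This is a generic sketch, not the authors' analysis: the edge list and node names are hypothetical, and a feed-forward loop is taken to be the standard three-node pattern A→B, B→C, A→C.

```python
from collections import defaultdict

# Hypothetical regulatory edges (regulator -> target); all names are invented.
edges = [
    ("TF1", "TF2"), ("TF1", "geneX"), ("TF2", "geneX"),      # loop without a miRNA
    ("TF1", "miR_a"), ("miR_a", "geneY"), ("TF1", "geneY"),  # loop containing a miRNA
    ("TF2", "geneZ"),
]

# Build adjacency sets and count incoming edges per node.
targets = defaultdict(set)
in_degree = defaultdict(int)
for src, dst in edges:
    targets[src].add(dst)
    in_degree[dst] += 1
out_degree = {node: len(nbrs) for node, nbrs in targets.items()}


def feed_forward_loops(targets):
    """Return sorted triples (a, b, c) with edges a->b, b->c and a->c."""
    loops = []
    for a in list(targets):
        for b in targets[a]:
            for c in targets.get(b, ()):
                if c != a and c in targets[a]:
                    loops.append((a, b, c))
    return sorted(loops)


ffls = feed_forward_loops(targets)
print(out_degree["TF1"], in_degree["geneX"])
print(ffls)
```

In this toy network TF1 has the highest out-degree, geneX has the highest in-degree (tied with geneY), and two feed-forward loops are found, one purely transcriptional and one routed through a miRNA.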
14.
Cancers (Basel) ; 14(9)2022 Apr 21.
Article in English | MEDLINE | ID: mdl-35565214

ABSTRACT

Seventy percent of patients with colorectal cancer develop liver metastases (CRLM), which are a decisive factor in cancer progression. Therapy outcome is largely influenced by tumor heterogeneity, but the intra- and inter-patient heterogeneity of CRLM has been poorly studied. In particular, the contribution of the WNT and EGFR pathways, which are both frequently deregulated in colorectal cancer, has not yet been addressed in this context. To this end, we comprehensively characterized normal liver tissue and eight CRLM from two patients by standardized histopathological, molecular, and proteomic subtyping. Suitable fresh-frozen tissue samples were profiled by transcriptome sequencing (RNA-Seq) and proteomic profiling with reverse phase protein arrays (RPPA) combined with bioinformatic analyses to assess tumor heterogeneity and identify WNT- and EGFR-related master regulators and metastatic effectors. A standardized data analysis pipeline for integrating RNA-Seq with clinical, proteomic, and genetic data was established. Dimensionality reduction of the transcriptome data revealed a distinct signature for CRLM differing from normal liver tissue and indicated a high degree of tumor heterogeneity. WNT and EGFR signaling were highly active in CRLM and the genes of both pathways were heterogeneously expressed between the two patients as well as between the synchronous metastases of a single patient. An analysis of the master regulators and metastatic effectors implicated in the regulation of these genes revealed a set of four genes (SFN, IGF2BP1, STAT1, PIK3CG) that were differentially expressed in CRLM and were associated with clinical outcome in a large cohort of colorectal cancer patients as well as CRLM samples. 
In conclusion, high-throughput profiling enabled us to define a CRLM-specific signature and revealed the genes of the WNT and EGFR pathways associated with inter- and intra-patient heterogeneity, which were validated as prognostic biomarkers in CRC primary tumors as well as liver metastases.

15.
PLoS One ; 16(10): e0258623, 2021.
Article in English | MEDLINE | ID: mdl-34653224

ABSTRACT

Biomedical and life science literature is an essential way to publish experimental results. With the rapid growth of the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible to aid scientists in discovering new relationships between biological entities and answering biological questions. Making use of the word2vec approach, we generated word vector representations based on a corpus consisting of over 16 million PubMed abstracts. We developed a text mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted of substituting synonymous terms with their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (CNNs) on large breast cancer gene expression data and on other cancer datasets. Performances of resulting models were compared to Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons to biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with word2vec-embedding-derived networks performed sufficiently well in the metastatic event prediction tasks compared to other networks. Such performance was good enough to validate the utility of our generated word embeddings in constructing biological networks. Word representations produced by text mining algorithms like word2vec are therefore able to capture biologically meaningful relations between entities. Our generated embeddings are publicly available at https://github.com/genexplain/Word2vec-based-Networks/blob/main/README.md.
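
The abstract above relates cosine similarity between word vectors to known biological relations and mentions gene-gene networks extracted from embeddings. As a minimal sketch of that general idea (not the authors' pipeline), the following Python code computes cosine similarity between toy vectors and keeps the pairs above a cutoff as network edges. The term names, the 3-dimensional vectors, the `similarity_network` helper, and the threshold of 0.5 are all assumptions made up for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_network(embeddings, threshold):
    """Return (term_a, term_b, similarity) for every pair at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine_similarity(embeddings[a], embeddings[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Toy vectors invented for illustration; real word2vec vectors have hundreds of dimensions.
toy_embeddings = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [0.8, 0.6, 0.0],
    "GENE_C": [0.0, 0.0, 1.0],
}

# The 0.5 cutoff is a hypothetical choice, not a value taken from the abstract.
edges = similarity_network(toy_embeddings, threshold=0.5)
for a, b, sim in edges:
    print(f"{a} -- {b}: {sim:.2f}")
```

With these toy inputs only the GENE_A and GENE_B pair passes the cutoff; the other pairs are orthogonal and are dropped.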


Subjects
Breast Neoplasms/genetics; Computational Biology/methods; Data Mining/methods; Algorithms; Breast Neoplasms/metabolism; Databases, Factual; Female; Gene Expression Regulation, Neoplastic; Humans; Machine Learning; Neural Networks, Computer; Protein Interaction Maps; Terminology as Topic
16.
Front Genet ; 12: 670240, 2021.
Article in English | MEDLINE | ID: mdl-34211498

ABSTRACT

Only 2% of glioblastoma multiforme (GBM) patients respond to standard therapy and survive beyond 36 months (long-term survivors, LTS), while the majority survive less than 12 months (short-term survivors, STS). To understand the mechanism leading to poor survival, we analyzed publicly available datasets of 113 STS and 58 LTS. This analysis revealed 198 differentially expressed genes (DEGs) that characterize aggressive tumor growth and may be responsible for the poor prognosis. These genes belong largely to the Gene Ontology (GO) categories "epithelial-to-mesenchymal transition" and "response to hypoxia." In this article, we applied an upstream analysis approach that involves state-of-the-art promoter analysis and network analysis of the dysregulated genes potentially responsible for short survival in GBM. Binding sites for transcription factors (TFs) associated with GBM pathology like NANOG, NF-κB, REST, FRA-1, PPARG, and seven others were found enriched in the promoters of the dysregulated genes. We reconstructed the gene regulatory network with several positive feedback loops controlled by five master regulators [insulin-like growth factor binding protein 2 (IGFBP2), vascular endothelial growth factor A (VEGFA), VEGF165, platelet-derived growth factor A (PDGFA), adipocyte enhancer-binding protein (AEBP1), and oncostatin M (OSMR)], which can be proposed as biomarkers and as therapeutic targets for enhancing GBM prognosis. A critical analysis of this gene regulatory network gives insights into the mechanism of gene regulation by IGFBP2 via several TFs including the key molecule of GBM tumor invasiveness and progression, FRA-1. All the observations were validated in independent cohorts, and their impact on overall survival has been investigated.

17.
NPJ Syst Biol Appl ; 7(1): 38, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34671039

ABSTRACT

Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in MR have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet [4] was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new, Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.

18.
Brief Bioinform ; 9(4): 326-32, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18436575

ABSTRACT

Since its beginning as a data collection more than 20 years ago, the TRANSFAC project underwent an evolution to become the basis for a complex platform for the description and analysis of gene regulatory events and networks. In the following, I describe what the original concepts were, what their present status is and how they may be expected to contribute to future systems biology approaches.


Subjects
Chromosome Mapping/methods; Gene Expression Regulation/physiology; Models, Biological; Proteome/metabolism; Signal Transduction/physiology; Systems Biology/methods; Transcription Factors/metabolism; Biotechnology/methods; Computer Simulation; Systems Integration
19.
Brief Bioinform ; 9(6): 518-31, 2008 Nov.
Article in English | MEDLINE | ID: mdl-19073714

ABSTRACT

Translating the exponentially growing amount of omics data into knowledge usable for a personalized medicine approach poses a formidable challenge. In this article, taking diabetes as a use case, we present strategies for developing data repositories into computer-accessible knowledge sources that can be used for a systemic view on the molecular causes of diseases, thus laying the foundation for systems pathology.


Subjects
Databases, Factual; Information Storage and Retrieval; Knowledge Bases; Database Management Systems; Diabetes Mellitus/genetics; Diabetes Mellitus/pathology; Diabetes Mellitus/physiopathology; Gene Regulatory Networks; Humans; Information Systems; Semantics; Signal Transduction/physiology; User-Computer Interface
20.
Nucleic Acids Res ; 36(Database issue): D689-94, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18045786

ABSTRACT

EndoNet is an information resource about intercellular regulatory communication. It provides information about hormones, hormone receptors, the sources (i.e. cells, tissues and organs) where the hormones are synthesized and secreted, and where the respective receptors are expressed. The database focuses on the regulatory relations between them. An elementary communication is displayed as a causal link from a cell that secretes a particular hormone to those cells which express the corresponding hormone receptor and respond to the hormone. Whenever expression, synthesis and/or secretion of another hormone are part of this response, it renders the corresponding cell an internal node of the resulting network. This intercellular communication network coordinates the function of different organs. Therefore, the database covers the hierarchy of cellular organization of tissues and organs as it has been modeled in the Cytomer ontology, which has now been directly embedded into EndoNet. The user can query the database; the results can be used to visualize the intercellular information flow. A newly implemented hormone classification enables browsing of the database and may be used as an alternative entry point. EndoNet is accessible at: http://endonet.bioinf.med.uni-goettingen.de/.
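
The network model described in this abstract (a causal link from a hormone-secreting cell to each cell expressing the matching receptor, with some cells acting as internal nodes) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not EndoNet's schema or API: the cell, hormone, and receptor names, the three dictionaries, and the helper functions are all hypothetical. As a simplification, a cell is treated as an internal node whenever it both receives and sends a signal, without checking that the secretion is part of the response.

```python
# Hypothetical toy records; none of these names come from EndoNet.
secretes = {
    "cell_A": {"hormone_1"},
    "cell_B": {"hormone_2"},
    "cell_C": set(),
}
receptor_for = {"receptor_1": "hormone_1", "receptor_2": "hormone_2"}
expresses = {
    "cell_A": set(),
    "cell_B": {"receptor_1"},
    "cell_C": {"receptor_2"},
}


def build_links(secretes, expresses, receptor_for):
    """Return (source_cell, hormone, target_cell) for each secretion/receptor match."""
    links = []
    for source, hormones in sorted(secretes.items()):
        for hormone in sorted(hormones):
            for target, receptors in sorted(expresses.items()):
                if any(receptor_for.get(r) == hormone for r in receptors):
                    links.append((source, hormone, target))
    return links


def internal_nodes(links):
    """Cells that appear both as a target and as a source of some link."""
    sources = {source for source, _, _ in links}
    targets = {target for _, _, target in links}
    return sources & targets


links = build_links(secretes, expresses, receptor_for)
for source, hormone, target in links:
    print(f"{source} --{hormone}--> {target}")
print("internal nodes:", sorted(internal_nodes(links)))
```

In this toy data, cell_B receives hormone_1 and secretes hormone_2, so it is the only internal node of the resulting two-link chain.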


Subjects
Cell Communication; Databases, Factual; Hormones/metabolism; Computer Graphics; Hormones/classification; Internet; Receptors, Cell Surface/metabolism; Receptors, Cytoplasmic and Nuclear/metabolism; User-Computer Interface