Results 1-20 of 211
1.
Cell ; 140(5): 744-52, 2010 Mar 05.
Article in English | MEDLINE | ID: mdl-20211142

ABSTRACT

Combinatorial interactions among transcription factors are critical to directing tissue-specific gene expression. To build a global atlas of these combinations, we have screened for physical interactions among the majority of human and mouse DNA-binding transcription factors (TFs). The complete networks contain 762 human and 877 mouse interactions. Analysis of the networks reveals that highly connected TFs are broadly expressed across tissues, and that roughly half of the measured interactions are conserved between mouse and human. The data highlight the importance of TF combinations for determining cell fate, and they lead to the identification of a SMAD3/FLI1 complex expressed during development of immunity. The availability of large TF combinatorial networks in both human and mouse will provide many opportunities to study gene regulation, tissue differentiation, and mammalian evolution.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Transcription Factors/metabolism , Animals , Cell Differentiation , Evolution, Molecular , Humans , Mice , Monocytes/cytology , Organ Specificity , Smad3 Protein/metabolism , Trans-Activators/metabolism
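The abstract reports two network-level measurements: how connected each TF is, and what share of interactions is conserved between human and mouse. A minimal sketch of both on an undirected edge list follows. It assumes that human and mouse orthologs share a gene symbol up to letter case, which a real analysis would replace with an ortholog mapping table. The edge lists are hypothetical toy data, not the study's networks.

```python
from collections import Counter


def degrees(edges):
    """Count interaction partners per TF; assumes each interaction is listed once."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def normalize(edges):
    """Store each undirected edge as a sorted, upper-cased pair so A-B equals B-A."""
    return {tuple(sorted((a.upper(), b.upper()))) for a, b in edges}


def conserved_fraction(human_edges, mouse_edges):
    """Fraction of human edges whose partner pair also appears among mouse edges.

    Simplifying assumption: orthologs share a symbol up to letter case.
    """
    human_set = normalize(human_edges)
    mouse_set = normalize(mouse_edges)
    return len(human_set & mouse_set) / len(human_set) if human_set else 0.0


# Hypothetical toy edges, not data from the study.
human = [("TFA", "TFB"), ("TFA", "TFC"), ("TFB", "TFD"), ("TFA", "TFE")]
mouse = [("Tfa", "Tfb"), ("Tfd", "Tfb"), ("Tff", "Tfe")]

print(degrees(human).most_common(1))  # [('TFA', 3)]
print(conserved_fraction(human, mouse))  # 0.5
```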
2.
Bioinformatics ; 36(4): 1121-1128, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31584626

ABSTRACT

MOTIVATION: Leucine-aspartic acid (LD) motifs are short linear interaction motifs (SLiMs) that link paxillin family proteins to factors controlling cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. RESULTS: To enable a proteome-wide assessment of LD motifs, we developed an active learning based framework (LD motif finder; LDMF) that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome revealed a dozen new proteins containing LD motifs. We found that LD motif signalling evolved in unicellular eukaryotes more than 800 Myr ago, with paxillin and vinculin as core constituents, and with nuclear export signals as a likely source of de novo LD motifs. We show that LD motif proteins form a functionally homogeneous group, all involved in cell morphogenesis and adhesion. This functional focus is recapitulated in cells by GFP-fused LD motifs, suggesting that it is intrinsic to the LD motif sequence, possibly through its effect on binding partners. Our approach elucidated the origin and dynamic adaptations of an ancestral SLiM, and can serve as a guide for identifying other SLiMs for which only a few representatives are known. AVAILABILITY AND IMPLEMENTATION: LDMF is freely available online at www.cbrc.kaust.edu.sa/ldmf; source code is available at https://github.com/tanviralambd/LD/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteome , Amino Acid Motifs , Aspartic Acid , Humans , Leucine , Prevalence
3.
Nucleic Acids Res ; 47(D1): D128-D134, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30329098

ABSTRACT

Long non-coding RNAs (lncRNAs) have significant functions in a wide range of important biological processes. Although the number of known human lncRNAs has dramatically increased, they are poorly annotated, which makes it difficult to understand their functional significance and to elucidate their complex molecular mechanisms. Here, we present LncBook (http://bigd.big.ac.cn/lncbook), a curated knowledgebase of human lncRNAs that features a comprehensive collection of human lncRNAs and systematic curation of lncRNAs by multi-omics data integration, functional annotation and disease association. In the present version, LncBook houses 270 044 lncRNAs and includes 1867 featured lncRNAs with 3762 lncRNA-function associations. It also integrates an abundance of multi-omics data from expression, methylation, genome variation and lncRNA-miRNA interaction. In addition, LncBook incorporates 3772 experimentally validated lncRNA-disease associations and identifies a total of 97 998 lncRNAs that are putatively disease-associated. Collectively, LncBook is dedicated to the integration and curation of human lncRNAs and their associated data, and thus holds great promise as a valuable knowledgebase for research communities worldwide.

4.
Bioinformatics ; 35(7): 1125-1132, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30184052

ABSTRACT

MOTIVATION: Recognition of different genomic signals and regions (GSRs) in DNA is crucial for understanding genome organization, gene regulation, and gene function, which in turn leads to better genome and gene annotations. Although many methods have been developed to recognize GSRs, their purely computational identification remains challenging. Moreover, different GSRs usually require a specialized set of features for developing robust recognition models. Recently, deep-learning (DL) methods have been shown to generate more accurate prediction models than 'shallow' methods without the need to develop specialized features for the problems in question. Here, we explore the potential use of DL for the recognition of GSRs. RESULTS: We developed DeepGSR, an optimized DL architecture for the prediction of different types of GSRs. The performance of the DeepGSR structure is evaluated on the recognition of polyadenylation signals (PAS) and translation initiation sites (TIS) of different organisms: human, mouse, bovine and fruit fly. The results show that DeepGSR outperformed the state-of-the-art methods, reducing the classification error rate of PAS and TIS prediction in the human genome by up to 29% and 86%, respectively. Moreover, the cross-organism and genome-wide analyses we performed confirmed the robustness of DeepGSR and provided new insights into the conservation of the examined GSRs across species. AVAILABILITY AND IMPLEMENTATION: DeepGSR is implemented in Python using the Keras API; it is available as open-source software and can be obtained at https://doi.org/10.5281/zenodo.1117159. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning , Genome , Genomics , Software , Animals , Cattle , Drosophila/genetics , Genome/genetics , Genome-Wide Association Study , Genomics/methods , Humans , Mice , Sequence Analysis, DNA , Software/standards
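The abstract does not describe DeepGSR's input encoding. Deep-learning models of DNA signals commonly take a one-hot matrix of the sequence window around each candidate site, so the sketch below assumes that encoding and uses ATG occurrences as translation initiation site (TIS) candidates. The function names and the `flank` parameter are illustrative, not part of DeepGSR.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of (A, C, G, T) indicators; ambiguous bases such as N become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def tis_windows(seq, flank):
    """Return (position, encoded window) for each ATG that has `flank` bases on both sides.

    Candidates too close to either end of `seq` are skipped rather than padded.
    """
    seq = seq.upper()
    hits = []
    pos = seq.find("ATG")
    while pos != -1:
        lo, hi = pos - flank, pos + 3 + flank
        if lo >= 0 and hi <= len(seq):
            hits.append((pos, one_hot(seq[lo:hi])))
        pos = seq.find("ATG", pos + 1)
    return hits


for pos, matrix in tis_windows("CCATGCC", flank=2):
    print(pos, len(matrix))  # 2 7
```

Skipping edge candidates keeps every window the same shape, which a fixed-input network needs; padding with all-zero rows would be an equally reasonable choice.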
5.
Bioinformatics ; 35(17): 2949-2956, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30649200

ABSTRACT

MOTIVATION: The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but only a limited number of species have high-quality sequences and annotations. RESULTS: Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship, and find that the relationship between open reading frame length and guanine-cytosine (GC) content diverges substantially between lncRNAs and protein-coding RNAs, as observed universally in a broad variety of species. Based on this feature relationship, we further present LGC, a novel algorithm for identifying lncRNAs that accurately distinguishes lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms, achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species ranging from plants to mammals. To our knowledge, this study is the first to differentially characterize lncRNAs and protein-coding RNAs based on feature relationship and to apply this characterization to the computational identification of lncRNAs. Taken together, our study represents a significant advance in the characterization and identification of lncRNAs, and LGC thus has broad potential utility for computational analysis of lncRNAs in a wide range of species. AVAILABILITY AND IMPLEMENTATION: The LGC web server is publicly available at http://bigd.big.ac.cn/lgc/calculator. The scripts and data can be downloaded at http://bigd.big.ac.cn/biocode/tools/BT000004. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Open Reading Frames , RNA, Long Noncoding , Animals , Mammals , Plants , Proteins
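LGC is built on the relationship between open reading frame (ORF) length and GC content. The abstract does not give LGC's scoring function, so the sketch below only computes the two raw features it relies on: the longest forward-strand ORF (ATG to the first in-frame stop codon) and the GC fraction. Ignoring the reverse strand and alternative start codons is a simplifying assumption.

```python
STOPS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_orf(seq):
    """Length in nt (stop codon included) of the longest ATG...stop ORF on the forward strand.

    ORFs that run off the end without a stop codon are not counted.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                best = max(best, i + 3 - start)
                start = None
    return best


seq = "CCATGGGGTGACC"
print(longest_orf(seq), gc_content(seq))
```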
6.
Methods ; 166: 31-39, 2019 08 15.
Article in English | MEDLINE | ID: mdl-30991099

ABSTRACT

Polyadenylation signals (PAS) are found in most protein-coding and some non-coding genes in eukaryotes. Their accurate recognition improves the understanding of gene regulation mechanisms and the recognition of the 3'-end of transcribed gene regions, where premature or alternate transcription ends may lead to various diseases. Although different methods and tools for in-silico prediction of genomic signals have been proposed, the correct identification of PAS in genomic DNA remains challenging due to the vast number of non-relevant hexamers identical to PAS hexamers. In this study, we developed a novel method for PAS recognition. The method is implemented in a hybrid PAS recognition model (HybPAS), which is based on deep neural networks (DNNs) and logistic regression models (LRMs). One such model is developed for each of the 12 most frequent human PAS hexamers. DNN models performed best for eight PAS types (including the two most frequent PAS hexamers), while LRMs performed best for the remaining four PAS types. The new models use different combinations of signal processing-based, statistical, and sequence-based features as input. The results obtained on human genomic data show that HybPAS outperforms the well-tuned state-of-the-art Omni-PolyA models for 10 of the 12 PAS types, reducing the classification error by up to 57.35%, while the Omni-PolyA models are better for the other two PAS types. For the most frequent PAS types, 'AATAAA' and 'ATTAAA', HybPAS reduced the error rate by 35.14% and 34.48%, respectively. On average, HybPAS reduces the error by 30.29%. HybPAS is implemented partly in Python and partly in MATLAB, and is available at https://github.com/EMANG-KAUST/PolyA_Prediction_LRM_DNN.


Subject(s)
Genome, Human/genetics , Genomics/methods , Neural Networks, Computer , Software , Humans , Poly A/genetics , Polyadenylation/genetics , Proteins/genetics
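HybPAS trains one model per PAS hexamer type, and the abstract notes that many genomic hexamers identical to PAS hexamers are not functional signals. The sketch below shows only the candidate-extraction step that such per-type models would consume: it collects every occurrence of each hexamer, grouped by type. Only 'AATAAA' and 'ATTAAA' are named in the abstract, so the other ten types are deliberately left out rather than guessed.

```python
from collections import defaultdict

# Only the two hexamers named in the abstract; the full set of 12 is not listed there.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def pas_candidates(seq, hexamers=PAS_HEXAMERS):
    """Map each hexamer to the 0-based positions where it occurs (overlapping matches included).

    Each group would then be scored by its own per-type classifier (not shown).
    """
    seq = seq.upper()
    hits = defaultdict(list)
    for i in range(len(seq) - 5):
        kmer = seq[i:i + 6]
        if kmer in hexamers:
            hits[kmer].append(i)
    return dict(hits)


print(pas_candidates("AATAAATTAAA"))  # {'AATAAA': [0], 'ATTAAA': [5]}
```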
7.
Nucleic Acids Res ; 46(12): e72, 2018 07 06.
Article in English | MEDLINE | ID: mdl-29617876

ABSTRACT

Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs due to the large number of TFs involved. To overcome these problems we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs while significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows on average 1.54-, 1.96- and 5.19-fold reductions of false positives at the same sensitivities compared to models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that the PWM models for TFBS prediction can be efficiently replaced by a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented as a web tool and as stand-alone software, freely available at http://cbrc.kaust.edu.sa/DRAF.


Subject(s)
Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Binding Sites , Chromatin Immunoprecipitation , DNA/chemistry , DNA/metabolism , Humans , Machine Learning , Position-Specific Scoring Matrices
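For context on the PWM baseline that DRAF is compared against, a minimal sketch of log-odds PWM scanning follows. It is not DRAF, which the abstract says combines sequence information with physicochemical properties of DNA-binding domains in machine learning models. The uniform background, pseudocount of 1 and toy counts are assumptions. Every window that clears the threshold is reported, and chance matches of this kind are where PWM false positives come from.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts (list of dicts) into log2-odds scores vs. a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background) for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for every window scoring at or above `threshold`.

    Windows containing non-ACGT symbols are skipped.
    """
    width = len(pwm)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue
        score = sum(col[b] for col, b in zip(pwm, window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical counts for a 3-bp motif "ACG" observed in 8 aligned sites.
pwm = log_odds_pwm([{"A": 8}, {"C": 8}, {"G": 8}])
print(scan("TTACGTT", pwm, threshold=4.0))
```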
8.
Nucleic Acids Res ; 46(D1): D252-D259, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29140464

ABSTRACT

We present a major update of the HOCOMOCO collection, which consists of patterns describing the DNA binding specificities of human and mouse transcription factors. In this release, we took advantage of a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based only on in vitro data, and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe the primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO with MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru), which applies HOCOMOCO models to visualize binding sites in short DNA sequences.


Subject(s)
Databases, Genetic , Transcription Factors/metabolism , Animals , Binding Sites/genetics , Chromatin Immunoprecipitation , Humans , Mice , Models, Genetic , Nucleotide Motifs , Sequence Analysis, DNA
9.
J Clin Pharm Ther ; 45(2): 379-383, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31736110

ABSTRACT

WHAT IS KNOWN AND OBJECTIVE: The HbA1C marker used to assess the quality of diabetes control is not sufficient in diabetic patients with thalassaemia. CASE DESCRIPTION: A male diabetic patient with thalassaemia was hospitalized with distal neuropathic pain, a trophic ulcer of the right toe, an unacceptable five-point glycaemic profile and an HbA1C value within the recommended range. After simultaneously initiating insulin therapy and management of the ulcer with hyperbaric oxygen, the patient showed improved glycaemic control and ulcer healing, and was discharged. WHAT IS NEW AND CONCLUSION: In thalassaemia and haemoglobinopathies, because of discrepancies between the five-point glycaemic profile and HbA1C values, it is necessary to measure HbA1C with a different method or to determine HbA1C and fructosamine simultaneously.


Subject(s)
Blood Glucose/metabolism , Diabetes Mellitus, Type 2/physiopathology , Glycated Hemoglobin/analysis , beta-Thalassemia/physiopathology , Aged , Biomarkers/analysis , Diabetes Mellitus, Type 2/drug therapy , Diabetic Foot/diagnosis , Diabetic Foot/therapy , Fructosamine/analysis , Humans , Hyperbaric Oxygenation , Hypoglycemic Agents/administration & dosage , Insulin/administration & dosage , Male
10.
BMC Genomics ; 20(1): 696, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31481022

ABSTRACT

BACKGROUND: Biosynthetic gene clusters produce a wide range of metabolites with activities that are of interest to the pharmaceutical industry. Particular interest is shown in metabolites that exhibit antimicrobial activity against multidrug-resistant bacteria, which have become a global health threat. Genera of the phylum Firmicutes are frequently identified as sources of such metabolites, but the biosynthetic potential of its genus Virgibacillus is not known. Here, we used comparative genomic analysis to determine whether Virgibacillus strains isolated from the Red Sea mangrove mud in Rabigh Harbor Lagoon, Saudi Arabia, may be an attractive source of such novel antimicrobial agents. RESULTS: A comparative genomics analysis based on Virgibacillus dokdonensis Bac330, Virgibacillus sp. Bac332 and Virgibacillus halodenitrificans Bac324 (isolated from the Red Sea) and six other previously reported Virgibacillus strains was performed. Orthology analysis was used to determine the core genome as well as the accessory genome of the nine Virgibacillus strains. The analysis shows that the Red Sea strain Virgibacillus sp. Bac332 has the highest number of unique genes and genomic islands among the genomes included in this study. Focusing on biosynthetic gene clusters, we show that marine isolates, including those from the Red Sea, are more enriched in nonribosomal peptides than the other Virgibacillus species. We also found that most nonribosomal peptide synthases identified in the Virgibacillus strains are part of genomic regions that are potentially horizontally transferred. CONCLUSIONS: The Red Sea Virgibacillus strains have a large number of biosynthetic genes in clusters that are not assigned to known products, indicating significant potential for the discovery of novel bioactive compounds. Moreover, their greater number of modular synthetase units suggests that these strains are also good candidates for experimental characterization of previously identified bioactive compounds. Future efforts will be directed towards establishing the properties of the potentially novel compounds encoded by the Red Sea-specific trans-AT PKS/NRPS cluster and the type III PKS/NRPS cluster.


Subject(s)
Data Mining , Genomics , Multigene Family/genetics , Virgibacillus/genetics , Virgibacillus/metabolism , Genome, Bacterial/genetics , Genomic Islands/genetics , Ribosomes/metabolism
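The orthology analysis splits each strain's genes into a core genome shared by all strains and an accessory genome. A sketch using set operations over orthologous-group IDs per genome follows. Here "accessory" is taken to mean every group not in the core, and "unique" means groups found in exactly one genome; these definitions, and the genome and group names, are assumptions for illustration.

```python
def pan_genome(groups_by_genome):
    """Split orthologous groups into core, accessory and per-genome unique sets.

    core:      groups present in every genome
    accessory: all remaining groups (includes the unique ones)
    unique:    for each genome, groups found in no other genome
    Requires at least one genome.
    """
    sets = list(groups_by_genome.values())
    core = set.intersection(*sets)
    accessory = set.union(*sets) - core
    unique = {}
    for name, groups in groups_by_genome.items():
        others = set().union(*(g for n, g in groups_by_genome.items() if n != name))
        unique[name] = groups - others
    return core, accessory, unique


# Hypothetical genomes and orthologous-group IDs.
genomes = {
    "genome_A": {"og1", "og2", "og3"},
    "genome_B": {"og1", "og2", "og4", "og5"},
    "genome_C": {"og1", "og3"},
}
core, accessory, unique = pan_genome(genomes)
print(sorted(core), sorted(accessory), sorted(unique["genome_B"]))
```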
11.
BMC Genomics ; 20(1): 102, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30709331

ABSTRACT

BACKGROUND: DNA methylation is involved in the regulation of gene expression. Although bisulfite-sequencing based methods profile DNA methylation at single-CpG resolution, methylation levels are usually averaged over genomic regions in downstream bioinformatic analysis. RESULTS: We demonstrate that, genome-wide, the methylation of a single CpG can serve as a more accurate predictor of gene expression than average promoter / gene body methylation. We define CpG traffic lights (CpG TL) as CpG dinucleotides with a significant correlation between methylation and expression of a nearby gene. CpG TL are enriched in all regulatory regions. Among all promoters, CpG TL are especially enriched in poised ones, suggesting involvement of DNA methylation in their regulation. Yet, binding of only a handful of transcription factors, such as NRF1, ETS, STAT and IRF-family members, could be regulated by direct methylation of transcription factor binding sites (TFBS) or their close proximity. For the majority of TFs, an alternative scenario is more likely: methylation and inactivation of the whole regulatory element indirectly represses functional TF binding, with a CpG TL being a reliable marker of such inactivation. CONCLUSIONS: CpG TL provide a promising insight into mechanisms of enhancer activity and gene regulation, linking methylation of single CpGs to gene expression. CpG TL methylation can be used as a reliable marker of enhancer activity and gene expression in applications, e.g. in the clinic, where measuring DNA methylation is easier than directly measuring gene expression due to the more stable nature of DNA.


Subject(s)
CpG Islands , DNA Methylation , Gene Expression Regulation , Genome, Human , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Humans , Promoter Regions, Genetic , Transcription Factors/genetics , Transcription, Genetic
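A CpG traffic light is defined by a significant correlation between one CpG's methylation and a nearby gene's expression across samples. The abstract does not name the correlation measure, so the sketch below assumes Spearman's rank correlation, implemented with the standard library only. A real screen would also compute p-values and correct for multiple testing; that step is omitted here.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-sample CpG methylation (beta values) and nearby-gene expression.
methylation = [0.1, 0.5, 0.9, 0.7]
expression = [10.0, 6.0, 1.0, 3.0]
print(spearman(methylation, expression))  # strongly negative: higher methylation, lower expression
```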
12.
Article in English | MEDLINE | ID: mdl-30670423

ABSTRACT

Pseudomonas aeruginosa is a prevalent and pernicious pathogen equipped with extraordinary capabilities both to infect the host and to develop antimicrobial resistance (AMR). Monitoring the emergence of AMR high-risk clones and understanding the interplay of their pathogenicity and antibiotic resistance is of paramount importance to avoid resistance dissemination and to control P. aeruginosa infections. In this study, we report the identification of a multidrug-resistant (MDR) P. aeruginosa strain, PA154197, isolated from a bloodstream infection in Hong Kong. PA154197 belongs to a distinctive MLST550 clonal complex shared by two other international P. aeruginosa isolates, VW0289 and AUS544. Comparative genome and transcriptome analysis of PA154197 with the reference strain PAO1 led to the identification of a variety of genetic variations in antibiotic resistance genes and the hyperexpression of three multidrug efflux pumps, MexAB-OprM, MexEF-OprN, and MexGHI-OpmD, in PA154197. Unexpectedly, the strain displays neither a metabolic cost nor compromised virulence compared to PAO1. Characterization of its various physiological and virulence traits demonstrated that PA154197 produces a substantially higher level of the major P. aeruginosa virulence factor pyocyanin (PYO) than PAO1, but produces a decreased level of pyoverdine and displays decreased biofilm formation compared with PAO1. Further analysis revealed that the secondary quorum-sensing (QS) system Pqs, which primarily controls PYO production, is hyperactive in PA154197 independently of the master QS systems Las and Rhl. Together, these investigations disclose a unique, uncoupled QS-mediated pathoadaptation mechanism in clinical P. aeruginosa, which may account for the high pathogenic potential and antibiotic resistance of the MDR isolate PA154197.


Subject(s)
Drug Resistance, Multiple, Bacterial/genetics , Pseudomonas aeruginosa/drug effects , Pseudomonas aeruginosa/pathogenicity , Quorum Sensing , Animals , Caenorhabditis elegans/microbiology , Drug Resistance, Multiple, Bacterial/drug effects , Gene Expression Regulation, Bacterial , Genome, Bacterial , Genomic Islands , Humans , Microbial Sensitivity Tests , Mutation , Phylogeny , Pseudomonas Infections/microbiology , Pseudomonas aeruginosa/genetics , Quorum Sensing/drug effects , Quorum Sensing/genetics , Virulence/genetics , Virulence Factors/genetics
13.
Bioinformatics ; 34(7): 1164-1173, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29186331

ABSTRACT

Motivation: Computationally finding drug-target interactions (DTIs) is a convenient strategy to identify new DTIs at low cost with reasonable accuracy. However, current DTI prediction methods suffer from high false positive prediction rates. Results: We developed DDR, a novel method that improves DTI prediction accuracy. DDR is based on the use of a heterogeneous graph that contains known DTIs with multiple similarities between drugs and multiple similarities between target proteins. DDR applies a non-linear similarity fusion method to combine different similarities. Before fusion, DDR performs a pre-processing step where a subset of similarities is selected in a heuristic process to obtain an optimized combination of similarities. Then, DDR applies a random forest model using different graph-based features extracted from the DTI heterogeneous graph. Using 5 repeats of 10-fold cross-validation, three testing setups, and the weighted average of area under the precision-recall curve (AUPR) scores, we show that DDR significantly reduces the AUPR score error relative to the next best state-of-the-art method for predicting DTIs by 34% when the drugs are new, by 23% when the targets are new, and by 34% when the drugs and the targets are known but not all DTIs between them are known. Using independent sources of evidence, we verify as correct 22 out of the top 25 DDR novel predictions. This suggests that DDR can be used as an efficient method to identify correct DTIs. Availability and implementation: The data and code are provided at https://bitbucket.org/RSO24/ddr/. Contact: vladimir.bajic@kaust.edu.sa. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Drug Interactions , Machine Learning , Area Under Curve , Humans
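The abstract reports reductions in "AUPR score error". Assuming that this error means 1 - AUPR, the sketch below computes AUPR with average precision (one common estimator among several) and the relative error reduction of one method over another. It does not reproduce DDR's cross-validation setups or its weighted averaging, and the scores and labels are toy values.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision at the rank of each positive.

    Ties in `scores` are broken by input order (sorted() is stable).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / k
    n_pos = sum(1 for label in labels if label)
    return total / n_pos if n_pos else 0.0


def relative_error_reduction(aupr_new, aupr_baseline):
    """Fractional drop in AUPR error (1 - AUPR) of a new method relative to a baseline."""
    err_new, err_base = 1 - aupr_new, 1 - aupr_baseline
    return (err_base - err_new) / err_base


ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(ap)  # (1/1 + 2/3) / 2, about 0.833
print(relative_error_reduction(0.8, 0.7))  # error drops from 0.3 to 0.2, about 0.333
```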
14.
PLoS Comput Biol ; 14(3): e1005934, 2018 03.
Article in English | MEDLINE | ID: mdl-29494619

ABSTRACT

Genetic variants underlying complex traits, including disease susceptibility, are enriched within the transcriptional regulatory elements, promoters and enhancers. There is emerging evidence that regulatory elements associated with particular traits or diseases share similar patterns of transcriptional activity. Accordingly, shared transcriptional activity (coexpression) may help prioritise loci associated with a given trait, and help to identify underlying biological processes. Using cap analysis of gene expression (CAGE) profiles of promoter- and enhancer-derived RNAs across 1824 human samples, we have analysed coexpression of RNAs originating from trait-associated regulatory regions using a novel quantitative method (network density analysis; NDA). For most traits studied, phenotype-associated variants in regulatory regions were linked to tightly-coexpressed networks that are likely to share important functional characteristics. Coexpression provides a new signal, independent of phenotype association, to enable fine mapping of causative variants. The NDA coexpression approach identifies new genetic variants associated with specific traits, including an association between the regulation of the OCT1 cation transporter and genetic variants underlying circulating cholesterol levels. NDA strongly implicates particular cell types and tissues in disease pathogenesis. For example, distinct groupings of disease-associated regulatory regions implicate two distinct biological processes in the pathogenesis of ulcerative colitis; a further two separate processes are implicated in Crohn's disease. Thus, our functional analysis of genetic predisposition to disease defines new distinct disease endotypes. We predict that patients with a preponderance of susceptibility variants in each group are likely to respond differently to pharmacological therapy. Together, these findings enable a deeper biological understanding of the causal basis of complex traits.


Subject(s)
Genetic Predisposition to Disease/genetics , Genomics/methods , Promoter Regions, Genetic/genetics , Crohn Disease/genetics , Databases, Genetic , Gene Expression Profiling , Humans , Transcriptome/genetics
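The abstract does not define the statistic behind network density analysis (NDA). As a rough illustration only, the sketch below uses the textbook graph density: the fraction of region pairs whose expression profiles have a Pearson correlation at or above a threshold. The threshold and toy profiles are assumptions, and NDA's actual method may differ substantially.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def coexpression_density(profiles, threshold):
    """Fraction of region pairs whose profiles correlate at or above `threshold`."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    edges = sum(1 for a, b in pairs if pearson(profiles[a], profiles[b]) >= threshold)
    return edges / len(pairs)


# Hypothetical expression of three regulatory regions across four samples.
profiles = {
    "region_1": [1, 2, 3, 4],
    "region_2": [2, 4, 6, 8],
    "region_3": [4, 3, 2, 1],
}
print(coexpression_density(profiles, threshold=0.8))  # one edge out of three pairs
```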
15.
Nucleic Acids Res ; 45(4): e25, 2017 02 28.
Article in English | MEDLINE | ID: mdl-27789687

ABSTRACT

Promoters and enhancers regulate the initiation of gene expression and the maintenance of expression levels in a spatial and temporal manner. Recent findings from Cap Analysis of Gene Expression (CAGE) demonstrate that promoters and enhancers, based on their expression profiles after stimulus, belong to different transcription response subclasses. One of the most promising biological features that might explain the difference in transcriptional response between subclasses is the local chromatin environment. We introduce a novel computational framework, PEDAL, for effectively distinguishing transcriptional profiles of promoters and enhancers using solely histone modification marks, chromatin accessibility and binding sites of transcription factors and co-activators. A case study on data from the MCF-7 cell line reveals that PEDAL can successfully identify the transcription response subclasses of promoters and enhancers from two different stimulations. Moreover, we report subsets of input markers that discriminate MCF-7 promoter and enhancer transcription response subclasses with minimized classification error. Our work provides a general computational approach for effectively identifying cell-specific and stimulation-specific promoter and enhancer transcriptional profiles, and thus contributes to improving our understanding of transcriptional activation in humans.


Subject(s)
Computational Biology/methods , Enhancer Elements, Genetic , Promoter Regions, Genetic , Transcription, Genetic , Algorithms , Chromatin/genetics , Epidermal Growth Factor/pharmacology , Gene Expression Profiling , Gene Expression Regulation/drug effects , Humans , MCF-7 Cells , Protein Binding , Transcription Factors , Transcriptional Activation , Workflow
16.
Nucleic Acids Res ; 45(D1): D145-D150, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27789689

ABSTRACT

Transcription factors (TFs) play a pivotal role in transcriptional regulation, making them crucial for cell survival and important biological functions. For the regulation of transcription, interactions between TFs and other regulatory proteins known as transcription co-factors (TcoFs) are essential in forming the necessary protein complexes. Although TcoFs themselves do not bind DNA directly, their indirect influence on transcriptional regulation and initiation has been shown to be significant, with the functionality of TFs strongly influenced by the presence of TcoFs. In the TcoF-DB v2 database, we collect information on TcoFs. In this article, we describe updates and improvements implemented in TcoF-DB v2. TcoF-DB v2 provides several new features that enable exploration of the roles of TcoFs. The content of the database has significantly expanded and is enriched with information from Gene Ontology, biological pathways, diseases and molecular signatures. TcoF-DB v2 now includes many more TFs, has substantially increased the number of human TcoFs to 958, and now includes information on mouse (418 new TcoFs). TcoF-DB v2 enables the exploration of information on TcoFs and allows investigations into their influence on transcriptional regulation in humans and mice. TcoF-DB v2 can be accessed at http://tcofdb.org/.


Subject(s)
Carrier Proteins , Databases, Genetic , Gene Expression Regulation , Transcription Factors , Animals , Carrier Proteins/metabolism , Humans , Mice , Protein Binding , Transcription Factors/metabolism
17.
Nucleic Acids Res ; 45(8): e58, 2017 05 05.
Article in English | MEDLINE | ID: mdl-28053124

ABSTRACT

Comparing histone modification profiles between cancer and normal states, or across different tumor samples, can provide insights into cancer initiation, progression and response to therapy. ChIP-seq histone modification data of cancer samples are distorted by the copy number variation innate to any cancer cell. We present HMCan-diff, the first method designed to analyze ChIP-seq data to detect changes in histone modifications between two cancer samples of different genetic backgrounds, or between a cancer sample and a normal control. HMCan-diff explicitly corrects for copy number bias, and for other biases in the ChIP-seq data, which significantly improves prediction accuracy compared to methods that do not consider such corrections. On in silico simulated ChIP-seq data generated using genomes with differences in copy number profiles, HMCan-diff performs much better than other methods that do not correct for copy number bias. Additionally, we benchmarked HMCan-diff on four experimental datasets, characterizing two histone marks in two different scenarios. We correlated changes in histone modifications between a cancer and a normal control sample with changes in gene expression. On all experimental datasets, HMCan-diff outperformed the other methods.


Subject(s)
Gene Expression Regulation, Neoplastic , Histone Code , Histones/genetics , Neoplasms/genetics , Software , Algorithms , Chromatin Immunoprecipitation , Datasets as Topic , Disease Progression , Gene Dosage , Histones/metabolism , Humans , Markov Chains , Neoplasms/metabolism , Neoplasms/pathology
18.
Nucleic Acids Res ; 45(5): 2838-2848, 2017 03 17.
Article in English | MEDLINE | ID: mdl-27924038

ABSTRACT

Non-coding RNA (ncRNA) genes play a major role in controlling heterogeneous cellular behavior, yet their functions are largely uncharacterized. Currently available databases lack in-depth information on ncRNA functions across a broad spectrum of cells and tissues. Here, we present FARNA, a knowledgebase of inferred functions of 10,289 human ncRNA transcripts (2,734 microRNAs and 7,555 long ncRNAs) in 119 tissues and 177 primary cells of human. Since transcription factors (TFs) and TF co-factors (TcoFs) are crucial components of the regulatory machinery that activates gene transcription, the cellular processes and diseases in which TFs and TcoFs are involved suggest functions of the transcripts they regulate. In FARNA, the functions of a transcript are inferred from the TFs and TcoFs that control the transcript and whose genes are co-expressed with it in a given cell or tissue. Transcripts were annotated with statistically enriched GO terms, pathways and diseases across cells/tissues based on the guilt-by-association principle. Expression profiles across cells/tissues based on Cap Analysis of Gene Expression (CAGE) are provided. With the most comprehensive functional annotation of the considered ncRNAs across the widest spectrum of human cells and tissues, FARNA has the potential to greatly contribute to our understanding of ncRNA roles and their regulatory mechanisms in humans. FARNA can be accessed at: http://cbrc.kaust.edu.sa/farna.


Subject(s)
Databases, Nucleic Acid , Knowledge Bases , MicroRNAs/physiology , RNA, Long Noncoding/physiology , Humans , MicroRNAs/metabolism , RNA, Long Noncoding/metabolism , Transcription Factors/metabolism
19.
BMC Complement Altern Med ; 19(1): 142, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31221160

ABSTRACT

BACKGROUND: Microbial species in the brine pools of the Red Sea and at the brine pool-seawater interfaces are exposed to high temperature, high salinity, low oxygen levels and high concentrations of heavy metals. As adaptation to these harsh conditions requires a large suite of secondary metabolites, these microbes have great potential as a source of novel anticancer molecules. METHODS: A total of 60 ethyl-acetate extracts of strains newly isolated from extreme environments of the Red Sea were tested against several human cancer cell lines for potential cytotoxic and apoptotic activities. RESULTS: Isolates from the Erba brine pool accounted for 50% of the active bacterial extracts capable of inducing 30% or greater inhibition of cell growth. Among the 60 extracts screened, seven showed selectivity towards triple-negative BT20 cells compared to normal fibroblasts. CONCLUSION: In this study, we identified several extracts able to induce caspase-dependent apoptosis in various cancer cell lines. Further investigation and isolation of the active compounds of these Red Sea brine pool microbes may offer chemotherapeutic potential for cancers with limited treatment options.


Subject(s)
Antineoplastic Agents/pharmacology , Bacteria/chemistry , Microbiota , Salts/chemistry , Seawater/microbiology , Antineoplastic Agents/isolation & purification , Apoptosis/drug effects , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Cell Line, Tumor , Humans , Indian Ocean
20.
BMC Genomics ; 19(1): 382, 2018 May 22.
Article in English | MEDLINE | ID: mdl-29788916

ABSTRACT

BACKGROUND: The increasing spectrum of multidrug-resistant bacteria is a major global public health concern, necessitating the discovery of novel antimicrobial agents. Here, members of the genus Bacillus are investigated as a potentially attractive source of novel antibiotics due to their broad spectrum of antimicrobial activities. We specifically focus on a computational analysis of the distinctive biosynthetic potential of Bacillus paralicheniformis strains isolated from the Red Sea, an ecosystem exposed to adverse, highly saline and hot conditions. RESULTS: We report the complete circular and annotated genomes of two Red Sea strains: B. paralicheniformis Bac48, isolated from mangrove mud, and B. paralicheniformis Bac84, isolated from a microbial mat collected from Rabigh Harbor Lagoon in Saudi Arabia. Comparing the genomes of B. paralicheniformis Bac48 and B. paralicheniformis Bac84 with nine publicly available complete genomes of B. licheniformis and three genomes of B. paralicheniformis revealed that all of the B. paralicheniformis strains in this study are more enriched in nonribosomal peptides (NRPs). We further report the first computationally identified trans-acyltransferase (trans-AT) nonribosomal peptide synthetase/polyketide synthase (NRPS/PKS) cluster in strains of this species. CONCLUSIONS: B. paralicheniformis species have more genes associated with the biosynthesis of antimicrobial bioactive compounds than other previously characterized species of B. licheniformis, which suggests that these species are better potential sources of novel antibiotics. Moreover, the genome of the Red Sea strain B. paralicheniformis Bac48 is more enriched in modular PKS genes compared to B. licheniformis strains and other B. paralicheniformis strains. This may be linked to adaptations that strains in the Red Sea underwent to survive in its relatively hot and saline ecosystem.


Subject(s)
Bacillus/genetics , Bacillus/metabolism , Biological Products/metabolism , Computer Simulation , Genomics , Multigene Family/genetics , Bacillus/enzymology , Bacteriocins/metabolism , Genome, Bacterial/genetics , Peptide Synthases/genetics , Polyketide Synthases/genetics , Ribosomes/metabolism