Results 1 - 20 of 54
1.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38189539

ABSTRACT

Sequence motif discovery algorithms enhance the identification of novel deoxyribonucleic acid sequences with pivotal biological significance, especially transcription factor (TF)-binding motifs. The advent of assay for transposase-accessible chromatin using sequencing (ATAC-seq) has broadened the toolkit for motif characterization. Nonetheless, prevailing computational approaches have focused on delineating TF-binding footprints, with motif discovery receiving less attention. Herein, we present Cis rEgulatory Motif Influence using de Bruijn Graph (CEMIG), an algorithm leveraging de Bruijn and Hamming distance graph paradigms to predict and map motif sites. Assessment on 129 ATAC-seq datasets from the Cistrome Data Browser demonstrates CEMIG's exceptional performance, surpassing three established methodologies on four evaluative metrics. CEMIG accurately identifies both cell-type-specific and common TF motifs within GM12878 and K562 cell lines, demonstrating its comparative genomic capabilities in the identification of evolutionary conservation and cell-type specificity. In-depth transcriptional and functional genomic studies have validated the functional relevance of CEMIG-identified motifs across various cell types. CEMIG is available at https://github.com/OSU-BMBL/CEMIG, developed in C++ to ensure cross-platform compatibility with Linux, macOS and Windows operating systems.


Subjects
Algorithms, Chromatin Immunoprecipitation Sequencing, Benchmarking, Biological Evolution, Cell Line
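
The CEMIG abstract names two graph ideas: a de Bruijn graph over k-mers and grouping of similar words by Hamming distance. As an illustration only, the Python sketch below builds a de Bruijn graph from toy sequences and groups the k-mers lying within one mismatch of the most frequent k-mer. This is not CEMIG's published algorithm (which is written in C++); the value of k, the mismatch cutoff and all function names are hypothetical choices for the example.

```python
from collections import Counter, defaultdict

# Illustrative sketch only: NOT CEMIG's published algorithm. k, the mismatch
# cutoff and all function names here are hypothetical choices for the example.


def kmer_counts(seqs, k):
    """Count every k-mer that occurs in the input sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def de_bruijn_graph(kmers):
    """Each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def seed_cluster(counts, max_dist=1):
    """Group all k-mers within max_dist mismatches of the most frequent k-mer."""
    seed, _ = counts.most_common(1)[0]
    return seed, sorted(k for k in counts if hamming(k, seed) <= max_dist)


# Toy sequences standing in for accessible-chromatin peaks (not real data).
seqs = ["ACGTGCA", "TTACGTGA", "GACGTGT", "ACGAGC"]
counts = kmer_counts(seqs, 5)
graph = de_bruijn_graph(counts)
seed, cluster = seed_cluster(counts, max_dist=1)
print(seed, cluster)  # ACGTG ['ACGAG', 'ACGTG']
```

A real motif finder would also extend seeds along graph paths, score clusters against a background and map sites back to the peaks; none of that is shown here.
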
2.
Trends Genet ; 36(12): 951-966, 2020 12.
Article in English | MEDLINE | ID: mdl-32868128

ABSTRACT

Single-cell multimodal omics (scMulti-omics) technologies have made it possible to trace cellular lineages during differentiation and to identify new cell types in heterogeneous cell populations. The derived information is especially promising for computing cell-type-specific biological networks encoded in complex diseases and improving our understanding of the underlying gene regulatory mechanisms. The integration of these networks could, therefore, give rise to a heterogeneous regulatory landscape (HRL) in support of disease diagnosis and drug therapeutics. In this review, we provide an overview of this field and pay particular attention to how diverse biological networks can be inferred in a specific cell type based on integrative methods. Then, we discuss how HRL can advance our understanding of regulatory mechanisms underlying complex diseases and aid in the prediction of prognosis and therapeutic responses. Finally, we outline challenges and future trends that will be central to bringing the field of HRL in complex diseases forward.


Subjects
Computational Biology/methods, Disease/genetics, Gene Regulatory Networks, Single-Cell Analysis/methods, Animals, Humans
3.
Brief Bioinform ; 22(2): 1639-1655, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32047891

ABSTRACT

Together with various hosts and environments, ubiquitous microbes interact closely with each other, forming an intertwined system or community. Notably, shifts in the relationships between microbes and their hosts or environments are associated with critical diseases and ecological changes. While advances in high-throughput omics technologies offer a great opportunity for understanding the structures and functions of microbiomes, it is still challenging to analyse and interpret omics data. Specifically, the heterogeneity and diversity of microbial communities, compounded by the large size of the datasets, make it tremendously challenging to mechanistically elucidate these complex communities. Fortunately, network analyses provide an efficient way to tackle this problem, and several network approaches have recently been proposed to improve this understanding. Here, we systematically illustrate the network theories that have been used in biological and biomedical research. Then, we review existing network modelling methods in microbial studies at multiple layers, from metagenomics to metabolomics and further to multi-omics. Lastly, we discuss the limitations of present studies and provide a perspective on future directions in support of understanding microbial communities.


Subjects
High-Throughput Screening Assays/methods, Metabolomics/methods, Metagenomics/methods, Microbiota, Proteomics/methods, Transcriptome, Humans
4.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32793986

ABSTRACT

Bacterial genomes are now recognized as interacting intimately with cellular processes. Uncovering the organizational mechanisms of bacterial genomes has been a primary focus of researchers seeking to reveal potential cellular activities. Advances in both experimental techniques and computational models provide a tremendous opportunity for understanding these mechanisms, and various studies have recently been proposed to explore the function-associated organization rules of bacterial genomes. This review focuses mainly on the principles that shape the organization of bacterial genomes, both locally and globally. We first illustrate local structures, such as operons/transcription units, that facilitate co-transcription and horizontal transfer of genes. We then clarify the constraints that globally shape bacterial genomes, such as metabolism, transcription and replication. Finally, we highlight challenges and opportunities to advance bacterial genomic studies and provide application perspectives of genome organization, including pathway hole assignment, genome assembly and the understanding of disease mechanisms.


Subjects
Bacterial Genome, Bacterial Chromosomes, Computational Biology/methods, DNA Replication, Metabolic Networks and Pathways, Operon, Genetic Transcription
5.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33957668

ABSTRACT

Alternative transcription units (ATUs) are dynamically encoded under different conditions and display overlapping patterns (sharing one or more genes) under a specific condition in bacterial genomes. Genome-scale identification of ATUs is essential for studying the emergence of human diseases caused by bacterial organisms. However, it is unrealistic to identify all ATUs using experimental techniques because of the complexity and dynamic nature of ATUs. Here, we present the first-of-its-kind computational framework, named SeqATU, for genome-scale ATU prediction based on next-generation RNA-Seq data. The framework utilizes a convex quadratic programming model to seek an optimum expression combination of all of the to-be-identified ATUs. The predicted ATUs in Escherichia coli reached a precision of 0.77/0.74 and a recall of 0.75/0.76 in the two RNA-Seq datasets, compared with the benchmarked ATUs from third-generation RNA-Seq data. In addition, the proportion of the 5'- or 3'-end genes of the predicted ATUs that have documented transcription factor binding sites and transcription termination sites was three times greater than that of genes located at neither end. We further evaluated the predicted ATUs by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes functional enrichment analyses. The results suggested that gene pairs frequently encoded in the same ATUs are more functionally related than those that can belong to two distinct ATUs. Overall, these results demonstrated the high reliability of the predicted ATUs. We expect that the new insights derived by SeqATU will not only improve the understanding of the transcription mechanism of bacteria but also guide the reconstruction of a genome-scale transcriptional regulatory network.


Subjects
Computational Biology/methods, Genome-Wide Association Study/methods, RNA Isoforms, Genetic Transcription, Algorithms, Bacteria/genetics, Genetic Databases, Escherichia coli/genetics, Bacterial Genome, Genomics/methods, Humans, Messenger RNA/genetics, RNA-Seq, Single-Cell Analysis/methods, Genetic Terminator Regions, Transcription Initiation Site
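
The SeqATU abstract states only that a convex quadratic program picks an optimum expression combination of candidate ATUs. One simple convex QP of that kind is non-negative least squares, in which each gene's expression is modelled as the sum of the abundances of the candidate ATUs containing it. The Python sketch below solves such a model by cyclic coordinate descent; the model, the solver and the toy gene/ATU matrix are all assumptions for illustration, not SeqATU's actual formulation.

```python
# Assumption: SeqATU's exact QP is not given in the abstract. Here each gene's
# expression is modelled as the sum of the abundances of the candidate ATUs
# that contain it, and abundances are fitted by non-negative least squares.


def nnls_coordinate_descent(A, y, iters=500):
    """Minimise ||A x - y||^2 subject to x >= 0 by cyclic coordinate descent.

    A is a list of rows (genes) by columns (candidate ATUs); y is gene expression.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    col_sq = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(iters):
        for j in range(n_cols):
            if col_sq[j] == 0:
                continue
            # Residual r = y - A x under the current estimate.
            r = [y[i] - sum(A[i][c] * x[c] for c in range(n_cols)) for i in range(n_rows)]
            # Exact minimiser along coordinate j, then projection onto x_j >= 0.
            step = sum(A[i][j] * r[i] for i in range(n_rows)) / col_sq[j]
            x[j] = max(0.0, x[j] + step)
    return x


# Hypothetical toy case: genes g1..g3 (rows) and candidate ATUs
# {g1, g2}, {g2, g3}, {g3} (columns).
A = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]]
y = [2.0, 5.0, 4.0]
abundances = nnls_coordinate_descent(A, y)
print([round(v, 3) for v in abundances])  # [2.0, 3.0, 1.0]
```

Because the objective is convex and each coordinate update is an exact projected minimisation, the iterates approach the constrained optimum; a production tool would more likely call a dedicated QP or NNLS solver.
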
6.
Nucleic Acids Res ; 48(W1): W275-W286, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32421805

ABSTRACT

A group of genes controlled as a unit, usually by the same repressor or activator gene, is known as a regulon. The ability to identify active regulons within a specific cell type, i.e., cell-type-specific regulons (CTSR), provides an extraordinary opportunity to pinpoint crucial regulators and target genes responsible for complex diseases. However, the identification of CTSRs from single-cell RNA-Seq (scRNA-Seq) data is computationally challenging. We introduce IRIS3, the first-of-its-kind web server for CTSR inference from scRNA-Seq data for human and mouse. IRIS3 is an easy-to-use server empowered by over 20 functionalities to support comprehensive interpretations and graphical visualizations of identified CTSRs. CTSR data can be used to reliably characterize and distinguish the corresponding cell type from others and can be combined with other computational or experimental analyses for biomedical studies. CTSRs can, therefore, aid in the discovery of major regulatory mechanisms and allow reliable constructions of global transcriptional regulation networks encoded in a specific cell type. The broader impact of IRIS3 includes, but is not limited to, investigation of complex disease hierarchies and heterogeneity, causal gene regulatory network construction, and drug development. IRIS3 is freely accessible from https://bmbl.bmi.osumc.edu/iris3/ with no login requirement.


Subjects
RNA-Seq, Regulon, Single-Cell Analysis, Software, Animals, Brain/metabolism, Cluster Analysis, Mice
7.
Sensors (Basel) ; 22(22)2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36433543

ABSTRACT

Simultaneous localization and mapping (SLAM) is the major solution for constructing or updating a map of an unknown environment while simultaneously keeping track of a mobile robot's location. Correlative Scan Matching (CSM) is a scan matching algorithm for obtaining the posterior distribution probability for the robot's pose in SLAM. This paper combines the non-linear optimization algorithm and CSM algorithm into an NLO-CSM (Non-linear Optimization CSM) algorithm for reducing the computation resources and the amount of computation while ensuring high calculation accuracy, and it presents an efficient hardware accelerator design of the NLO-CSM algorithm for the scan matching in 2D LiDAR SLAM. The proposed NLO-CSM hardware accelerator utilizes pipeline processing and module reusing techniques to achieve low hardware overhead, fast matching, and high energy efficiency. FPGA implementation results show that, at 100 MHz clock, the power consumption of the proposed hardware accelerator is as low as 0.79 W, while it performs a scan match at 8.98 ms and 7.15 mJ per frame. The proposed design outperforms the ARM-A9 dual-core CPU implementation with a 92.74% increase and 90.71% saving in computing speed and energy consumption, respectively. It has also achieved 80.3% LUTs, 84.13% FFs, and 20.83% DSPs saving, as well as an 8.17× increase in frame rate and 96.22% improvement in energy efficiency over a state-of-the-art hardware accelerator design in the literature. ASIC implementation in 65 nm can further reduce the computing time and energy consumption per scan to 5.94 ms and 0.06 mJ, respectively, which shows that the proposed NLO-CSM hardware accelerator design is suitable for resource-limited and energy-constrained mobile and micro robot applications.
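
Correlative scan matching evaluates many candidate poses against a likelihood grid and keeps the best-scoring one; per the abstract, NLO-CSM pairs this with non-linear optimization to reduce computation. The Python sketch below shows only the brute-force correlative part, simplified to integer translations with no rotation and no non-linear refinement. The grid, scan points and search window are hypothetical.

```python
# Simplified illustration: integer translations only (no rotation), brute-force
# search over a small window. The grid, scan and window size are hypothetical.


def score_pose(grid, scan, dx, dy):
    """Sum the grid likelihoods at the scan points shifted by (dx, dy)."""
    total = 0.0
    for px, py in scan:
        gx, gy = px + dx, py + dy
        if 0 <= gy < len(grid) and 0 <= gx < len(grid[0]):
            total += grid[gy][gx]
    return total


def correlative_match(grid, scan, window=2):
    """Return (score, dx, dy) of the best shift in [-window, window]^2."""
    best = None
    for dx in range(-window, window + 1):
        for dy in range(-window, window + 1):
            s = score_pose(grid, scan, dx, dy)
            if best is None or s > best[0]:
                best = (s, dx, dy)
    return best


# 5 x 5 likelihood grid with a wall along row y = 3, columns x = 1..3.
grid = [[0.0] * 5 for _ in range(5)]
for x in (1, 2, 3):
    grid[3][x] = 1.0
scan = [(0, 2), (1, 2), (2, 2)]  # scan endpoints before pose correction
print(correlative_match(grid, scan))  # (3.0, 1, 1)
```

The cost of this exhaustive search grows with the window size and angular resolution, which is the computation that the nested loops make explicit and that a hardware accelerator or an optimization-based refinement aims to reduce.
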

8.
Sensors (Basel) ; 22(23)2022 Nov 25.
Article in English | MEDLINE | ID: mdl-36501862

ABSTRACT

Achieving low-cost and high-performance network security communication is necessary for Internet of Things (IoT) devices, including intelligent sensors and mobile robots. Designing hardware accelerators to accelerate multiple computationally intensive cryptographic primitives in various network security protocols is challenging. Unlike existing unified reconfigurable cryptographic accelerators, which have relatively low efficiency and high latency, this paper presents the design and analysis of a reconfigurable cryptographic accelerator consisting of a reconfigurable cipher unit and a reconfigurable hash unit to support widely used cryptographic algorithms for IoT devices, which require block ciphers and hash functions simultaneously. Based on a detailed and comprehensive algorithmic analysis of both the block ciphers and hash functions in terms of basic algorithm structures and common cryptographic operators, the proposed reconfigurable cryptographic accelerator is designed by reusing key register files and operators to build unified data paths. The reconfigurable cipher unit and the reconfigurable hash unit each contain a unified data path to implement the Data Encryption Standard (DES)/Advanced Encryption Standard (AES)/ShangMi 4 (SM4) and Secure Hash Algorithm-1 (SHA-1)/SHA-256/SM3 algorithms, respectively. A reconfigurable S-Box for AES and SM4 is designed based on the composite Galois field (GF) GF(((2²)²)²), which significantly reduces hardware overhead and power consumption compared with the conventional look-up-table implementation. Experimental results based on a 65-nm application-specific integrated circuit (ASIC) implementation show that the energy efficiency and area efficiency of the proposed design are 441 Gbps/W and 37.55 Gbps/mm², respectively, which makes it suitable for IoT devices with limited battery capacity and form factor.
The result of delay analysis also shows that the number of delay cycles of our design can be reduced by 83% compared with the state-of-the-art design, which shows that the proposed design is more suitable for applications including 5G/Wi-Fi/ZigBee/Ethernet network standards to accelerate block ciphers and hash functions simultaneously.
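
Both the look-up-table and the composite-field S-box realise the same mathematical map: for AES, a multiplicative inverse in GF(2⁸) followed by a fixed affine transform. The composite field GF(((2²)²)²) is an isomorphic representation of GF(2⁸) that lets hardware compute the inverse with small subfield circuits. The Python sketch below computes the AES S-box directly in the standard polynomial basis (modulus x⁸ + x⁴ + x³ + x + 1), not through the composite-field decomposition the paper uses; the SM4 S-box, which uses different constants, is not shown.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # reduce when the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254 (= a^(2^8 - 2)); 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result, base, exp = 1, a, 254
    while exp:  # square-and-multiply exponentiation
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox(a):
    """AES S-box: field inversion followed by the affine transform with constant 0x63."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


print(hex(aes_sbox(0x00)), hex(aes_sbox(0x01)), hex(aes_sbox(0x53)))  # 0x63 0x7c 0xed
```

A look-up-table design stores all 256 outputs of `aes_sbox`; a composite-field design instead maps the input into GF(((2²)²)²), inverts there and maps back, trading memory for a small amount of logic.
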

9.
Entropy (Basel) ; 24(11)2022 Oct 30.
Article in English | MEDLINE | ID: mdl-36359655

ABSTRACT

Entropy is a measure of uncertainty or randomness. It is the foundation of almost all cryptographic systems. True random number generators (TRNGs) and physical unclonable functions (PUFs) are the silicon primitives that harvest dynamic and static entropy, respectively, to generate random bit streams. In this survey paper, we present a systematic and comprehensive review of state-of-the-art methods for harvesting entropy from silicon-based devices, including their implementations, applications and the security of the designs. Furthermore, we summarize trends in entropy source design to point out the current focus areas of entropy harvesting.
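
To make "entropy" concrete for a raw bit stream, the Python sketch below computes two plug-in estimates: Shannon entropy and the more conservative min-entropy, which is the measure typically used when crediting entropy sources for cryptographic use. These are naive frequency-based estimates that assume independent, identically distributed samples and omit the statistical confidence bounds that formal assessments such as NIST SP 800-90B apply; the sample bits are hypothetical.

```python
import math
from collections import Counter

# Plug-in estimates only: they omit the confidence bounds that formal entropy
# assessments (e.g. NIST SP 800-90B) apply, and assume i.i.d. samples.


def shannon_entropy(samples):
    """Plug-in Shannon entropy, in bits per sample."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def min_entropy(samples):
    """Plug-in min-entropy: -log2 of the most frequent symbol's frequency."""
    p_max = max(Counter(samples).values()) / len(samples)
    return -math.log2(p_max)


biased_bits = [1, 1, 1, 0, 1, 1, 0, 1]  # hypothetical raw TRNG output
print(round(shannon_entropy(biased_bits), 3), round(min_entropy(biased_bits), 3))  # 0.811 0.415
```

Min-entropy never exceeds Shannon entropy, so a bias that costs about 0.19 bit per sample in Shannon terms costs nearly 0.59 bit per sample under the worst-case measure.
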

10.
Brief Bioinform ; 20(6): 2044-2054, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30099484

ABSTRACT

Differential gene expression (DGE) analysis is one of the most common applications of RNA-sequencing (RNA-seq) data. This process allows for the elucidation of differentially expressed genes across two or more conditions and is widely used in many applications of RNA-seq data analysis. Interpretation of DGE results can be nonintuitive and time-consuming due to the variety of formats produced by the tool of choice and the numerous pieces of information provided in these results files. Here, we review the analysis of DGE results from a functional point of view, covering various visualizations. We also provide an R/Bioconductor package, Visualization of Differential Gene Expression Results using R, which generates information-rich visualizations for the interpretation of DGE results from three widely used tools: Cuffdiff, DESeq2 and edgeR. The implemented functions were also tested on five real-world data sets, consisting of one human, one Malus domestica and three Vitis riparia data sets.


Subjects
Gene Expression, RNA Sequence Analysis, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods
11.
Bioinformatics ; 36(4): 1143-1149, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31503285

ABSTRACT

MOTIVATION: The biclustering of large-scale gene expression data holds promising potential for detecting condition-specific functional gene modules (i.e. biclusters). However, existing methods do not adequately address the comprehensive detection of all significant bicluster structures and have limited power when applied to expression data generated by RNA-Sequencing (RNA-Seq), especially single-cell RNA-Seq (scRNA-Seq) data, where massive numbers of zero and low expression values are observed. RESULTS: We present a new biclustering algorithm, QUalitative BIClustering algorithm Version 2 (QUBIC2), which is empowered by: (i) a novel left-truncated mixture of Gaussian model for an accurate assessment of multimodality in zero-enriched expression data, (ii) a fast and efficient dropouts-saving expansion strategy for functional gene module optimization using information divergence and (iii) a rigorous statistical test for the significance of all the identified biclusters in any organism, including those without substantial functional annotations. QUBIC2 demonstrated considerably improved performance in detecting biclusters compared with five other widely used algorithms on various benchmark datasets from E. coli, human and simulated data. QUBIC2 also showed robust and superior performance on gene expression data generated by microarray, bulk RNA-Seq and scRNA-Seq. AVAILABILITY AND IMPLEMENTATION: The source code of QUBIC2 is freely available at https://github.com/OSU-BMBL/QUBIC2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling, RNA, Algorithms, Humans, RNA Sequence Analysis, Software
12.
Nucleic Acids Res ; 47(15): 7809-7824, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31372637

ABSTRACT

The identification of transcription factor binding sites and cis-regulatory motifs is a frontier whereupon the rules governing protein-DNA binding are being revealed. Here, we developed a new method (DEep Sequence and Shape mOtif or DESSO) for cis-regulatory motif prediction using deep neural networks and the binomial distribution model. DESSO outperformed existing tools, including DeepBind, in predicting motifs in 690 human ENCODE ChIP-sequencing datasets. Furthermore, the deep-learning framework of DESSO expanded motif discovery beyond the state of the art by allowing the identification of known and new protein-protein-DNA tethering interactions in human transcription factors (TFs). Specifically, 61 putative tethering interactions were identified among the 100 TFs expressed in the K562 cell line. In this work, the power of DESSO was further expanded by integrating the detection of DNA shape features. We found that shape information has strong predictive power for TF-DNA binding and provides new putative shape motif information for human TFs. Thus, DESSO improves the identification and structural analysis of TF binding sites by integrating the complexities of DNA binding into a deep-learning framework.


Subjects
Computational Biology/statistics & numerical data, DNA/chemistry, Deep Learning, Transcription Factors/genetics, Binding Sites, Computational Biology/methods, DNA/genetics, DNA/metabolism, Gene Expression Regulation, Humans, K562 Cells, Nucleotide Motifs, Protein Binding, Transcription Factors/classification, Transcription Factors/metabolism
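
The DESSO abstract mentions a binomial distribution model but not how it is applied. One common use of such a model in motif analysis is an enrichment test: if each sequence has a background probability p of containing a motif instance, the chance of seeing it in at least k of n sequences is a binomial upper tail. The Python sketch below computes that tail; the interpretation and the counts are assumptions for illustration, not DESSO's documented procedure.

```python
from math import comb

# Assumption: the abstract does not spell out how DESSO uses the binomial
# model; here it scores how surprising it is to see a motif in k of n peak
# sequences when each sequence has background hit probability p.


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts: motif present in 60 of 100 peaks, background rate 0.3.
p_value = binomial_tail(100, 60, 0.3)
print(f"{p_value:.2e}")
```

For large n, summing the tail directly is simple but can lose precision for extremely small probabilities; a library survival function or log-space summation is the more robust choice.
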
13.
Brief Bioinform ; 19(5): 1069-1081, 2018 09 28.
Article in English | MEDLINE | ID: mdl-28334268

ABSTRACT

Transcription factors are proteins that bind to specific DNA sequences and play important roles in controlling the expression levels of their target genes. Hence, prediction of transcription factor binding sites (TFBSs) provides a solid foundation for inferring gene regulatory mechanisms and building regulatory networks for a genome. Chromatin immunoprecipitation sequencing (ChIP-seq) technology can generate large-scale experimental data for such protein-DNA interactions, providing an unprecedented opportunity to identify TFBSs (a.k.a. cis-regulatory motifs). The bottleneck, however, is the lack of robust mathematical models, as well as efficient computational methods for TFBS prediction, to make effective use of the massive ChIP-seq data sets in the public domain. The purpose of this study is to review existing motif-finding methods for ChIP-seq data from an algorithmic perspective and provide new computational insight into this field. State-of-the-art methods were presented by summarizing eight representative motif-finding algorithms along with their corresponding challenges, and by introducing some important related functions that address specific biological demands, including discriminative motif finding and cofactor motif analysis. Finally, potential directions and plans for ChIP-seq-based motif-finding tools were outlined in support of future algorithm development.


Subjects
Algorithms, Gene Regulatory Networks, Software, Base Sequence, Binding Sites/genetics, Chromatin Immunoprecipitation/statistics & numerical data, Computational Biology/methods, DNA/genetics, DNA/metabolism, Humans, DNA Sequence Analysis/statistics & numerical data, Transcription Factors/metabolism
14.
Bioinformatics ; 35(21): 4474-4477, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31116375

ABSTRACT

MOTIVATION: Metagenomic and metatranscriptomic analyses can provide an abundance of information related to microbial communities. However, straightforward analysis of these data does not provide optimal results; the data types must be integrated to thoroughly investigate these microbiomes and their environmental interactions. RESULTS: Here, we present MetaQUBIC, an integrated biclustering-based computational pipeline for gene module detection that integrates both metagenomic and metatranscriptomic data. Additionally, we used this pipeline to investigate 735 paired DNA and RNA human gut microbiome samples, resulting in a comprehensive hybrid gene expression matrix of 2.3 million cross-species genes in the 735 human fecal samples and 155 functionally enriched gene modules. We believe both the MetaQUBIC pipeline and the generated comprehensive human gut hybrid expression matrix will facilitate further investigations into multiple levels of microbiome studies. AVAILABILITY AND IMPLEMENTATION: The package is freely available at https://github.com/OSU-BMBL/metaqubic. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gastrointestinal Microbiome, Metagenome, Feces, Humans, Metagenomics, Transcriptome
15.
Bioinformatics ; 35(14): 2395-2402, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30520961

ABSTRACT

MOTIVATION: The prediction of protein-protein interaction (PPI) sites is key to mutation design, catalytic reaction and the reconstruction of PPI networks. It is a challenging task given the large number of sequences and the class imbalance in samples. RESULTS: A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI site prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and SMOTE was applied to oversample interface residues in the feature space to address the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented for feature redundancy reduction and subset selection. Finally, Random Forest classifiers were applied to build the ensemble learning model, and the optimal feature vectors were fed into EL-SMURF to predict PPI sites. Performance validation of EL-SMURF on two independent validation datasets showed 77.1% and 77.7% accuracy, which were 6.2-15.7% and 6.1-18.9% higher than the other existing tools, respectively. AVAILABILITY AND IMPLEMENTATION: The source codes and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Software
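
EL-SMURF applies SMOTE to oversample interface residues in feature space. The core SMOTE step creates each synthetic minority sample at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The Python sketch below implements that interpolation step on plain tuples; the feature vectors, k and the random seed are hypothetical, and EL-SMURF's own settings may differ.

```python
import random

# Standard SMOTE-style interpolation in feature space. The feature vectors,
# k and the random seed are hypothetical; EL-SMURF's own settings may differ.


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples, each on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical 2-D features of interface (minority-class) residues.
interface = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
print(smote(interface, n_new=3, k=2, seed=42))
```

Because new points are interpolated rather than copied, the minority class gains variety without exact duplicates; the trade-off is that synthetic points can fall into regions occupied by the majority class when minority samples are scattered.
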
16.
Molecules ; 23(10)2018 Oct 13.
Article in English | MEDLINE | ID: mdl-30322177

ABSTRACT

Overlapping structures of protein-protein interaction networks are very prevalent in different biological processes, reflecting the sharing of common functional components. The overlapping community detection (OCD) algorithm based on central node selection (CNS) is a traditional and widely accepted algorithm for OCD in networks. The core of CNS consists of central node selection and the clustering procedure. However, the original CNS does not consider the influence among nodes or the importance of the division of edges in networks. In this paper, an OCD algorithm based on central edge selection (CES) is proposed for detecting overlapping communities in protein-protein interaction (PPI) networks. Unlike traditional CNS algorithms for OCD, the proposed algorithm uses community magnetic interference (CMI) to obtain more reasonable central edges in the CES process, and employs a new distance between a non-central edge and the set of central edges to assign each non-central edge to the correct cluster during the clustering procedure. In addition, the proposed CES improves the overlapping node pruning (ONP) strategy to make the division more precise. Experimental results on three benchmark networks and three biological PPI networks of Mus musculus, Escherichia coli and Saccharomyces cerevisiae show that the CES algorithm performs well.


Subjects
Computational Biology/methods, Escherichia coli/metabolism, Protein Interaction Mapping/methods, Saccharomyces cerevisiae/metabolism, Algorithms, Animals, Escherichia coli Proteins/metabolism, Mice, Protein Interaction Maps, Saccharomyces cerevisiae Proteins/metabolism
17.
PLoS Comput Biol ; 12(2): e1004772, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26894997

ABSTRACT

High-throughput RNA-seq technology has provided an unprecedented opportunity to reveal the very complex structures of transcriptomes. However, assembling vast amounts of short RNA-seq reads into transcriptomes with alternative splicing isoforms is an important and highly challenging task. In this study, we present a novel de novo assembler, BinPacker, which models the transcriptome assembly problem as tracking a set of trajectories of items, whose sizes represent the coverage of their corresponding isoforms, by solving a series of bin-packing problems. This approach, which subtly integrates coverage information into the procedure, has two distinctive features: 1) only splicing junctions are involved in the assembly procedure; 2) massive numbers of disordered reads are assembled as if by moving a comb along junction edges on a splicing graph. Tested on both real and simulated RNA-seq datasets, it outperforms almost all existing de novo assemblers on all the tested datasets, and even outperforms the ab initio assemblers on the real dog dataset. In addition, it runs substantially faster and requires less memory than most of the assemblers. BinPacker is published under the GNU General Public License and the source is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_1.0.tar.gz/download. A quick-installation version is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_binary.tar.gz/download.


Subjects
Gene Expression Profiling/methods, Messenger RNA/genetics, RNA Sequence Analysis/methods, Transcriptome/genetics, Algorithms, Animals, Computational Biology, Dogs, High-Throughput Nucleotide Sequencing/methods, Humans, Mice, Messenger RNA/chemistry
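
BinPacker frames assembly as a series of bin-packing problems, but the abstract does not describe its specific formulation or solver. For readers unfamiliar with bin packing itself, the Python sketch below shows the classic first-fit-decreasing heuristic, which places each item (largest first) into the first bin with enough remaining capacity. This is a generic illustration of the problem class, not BinPacker's method; the item sizes and capacity are hypothetical.

```python
# Generic bin-packing illustration (first-fit decreasing). This is NOT
# BinPacker's algorithm; item sizes and capacity are hypothetical.


def first_fit_decreasing(sizes, capacity):
    """Pack items into bins of the given capacity, largest items first;
    each item goes into the first bin that still has room for it."""
    bins = []
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:  # no existing bin had room: open a new one
            bins.append([size])
    return bins


print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))  # [[8, 2], [4, 4, 1, 1]]
```

Bin packing is NP-hard in general, which is why heuristics such as this one are common in practice; first-fit decreasing is known to stay within a constant factor of the optimal number of bins.
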
18.
BMC Genomics ; 17: 578, 2016 08 09.
Article in English | MEDLINE | ID: mdl-27507169

ABSTRACT

BACKGROUND: Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve more slowly than their surrounding non-functional sequences. Its application, however, faces several difficulties in optimizing the selection of orthologous data and reducing false positives in motif prediction. RESULTS: Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif-finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on the sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high-quality orthologous promoter set. Promoter scoring and pruning is implemented through motif voting by a set of complementary prediction tools that mine as many motif candidates as possible while eliminating the effect of random noise. We have applied the framework to the Escherichia coli K12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding-site levels, and the results showed that MP(3) consistently outperformed other popular motif-finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. CONCLUSION: The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance progress in elucidating transcriptional regulation mechanisms and thus benefit the genomic research community, and prokaryotic genome researchers in particular.


Subjects
Genome, Genomics, Nucleotide Motifs, Phylogeny, Prokaryotic Cells/classification, Prokaryotic Cells/metabolism, Nucleic Acid Regulatory Sequences, Algorithms, Binding Sites, Escherichia coli/genetics, Bacterial Genome, Genomics/methods, Statistical Models, Genetic Promoter Regions, Protein Binding
19.
Nucleic Acids Res ; 42(Web Server issue): W12-9, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24753419

ABSTRACT

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important for elucidating the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences, along with statistical scores for the predicted motifs derived from information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular.


Subjects
DNA/chemistry, Genetic Promoter Regions, Software, Binding Sites, Internet, Nucleotide Motifs, DNA Sequence Analysis, Systems Integration, Transcription Factors/metabolism
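
Function (ii) in the DMINDA abstract, scanning a query motif's instances in genomic sequences, is commonly done with a position weight matrix (PWM). The Python sketch below builds a log-odds PWM from aligned binding sites and reports every window scoring above a threshold. The abstract does not describe DMINDA's scoring, so the pseudocount of 0.5, the uniform background, the threshold and the toy sites are all assumptions; only the forward strand is scanned.

```python
import math

BASES = "ACGT"

# Assumptions: pseudocount 0.5, uniform background 0.25 and forward-strand
# scanning only. DMINDA's actual scoring scheme may differ.


def log_odds_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log2-odds position weight matrix from aligned binding sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window whose PWM score >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


sites = ["TGACA", "TGACA", "TGACT"]  # hypothetical aligned binding sites
pwm = log_odds_pwm(sites)
print(scan("CCTGACAGG", pwm, threshold=5.0))  # one hit, at start 2
```

A positive log-odds score means the window looks more like the motif than like background; choosing the threshold (for example from score distributions on shuffled sequences) controls the trade-off between sensitivity and false positives.
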
20.
BMC Bioinformatics ; 16: 356, 2015 Nov 04.
Article in English | MEDLINE | ID: mdl-26538447

ABSTRACT

BACKGROUND: Bacterial operons are considerably more complex than previously thought. At the very least, their components are dynamically defined, rather than statically defined as previously assumed. Here we present a computational study of the landscape of the transcriptional units (TUs) of E. coli K12, as revealed by the available genomic and transcriptomic data, providing new understanding of the complexity of TUs as a whole encoded in the genome of E. coli K12. RESULTS AND CONCLUSION: Our main findings are that (i) different TUs may overlap with each other by sharing common genes, giving rise to clusters of overlapped TUs (TUCs) along the genomic sequence; (ii) the intergenic regions in front of the first gene of each TU tend to have more conserved sequence motifs than those of the other genes inside the TU, suggesting that TUs each have their own promoters; (iii) the terminators associated with the 3' ends of TUCs tend to be Rho-independent terminators substantially more often than terminators of TUs that end inside a TUC; and (iv) the functional relatedness of adjacent gene pairs in individual TUs is higher than that of pairs in TUCs, suggesting that individual TUs are more basic functional units than TUCs.


Subjects
Escherichia coli K12/genetics, Operon/genetics, Genetic Transcription, Binding Sites, Conserved Sequence/genetics, Escherichia coli Proteins/metabolism, Bacterial Genes, Nucleotide Motifs/genetics, Genetic Promoter Regions, Regulon/genetics, Genetic Terminator Regions, Transcription Factors/metabolism