Results 1 - 20 of 58
1.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39082647

ABSTRACT

Deciphering the intricate relationships between transcription factors (TFs), enhancers, and genes through the inference of enhancer-driven gene regulatory networks (eGRNs) is crucial in understanding gene regulatory programs in a complex biological system. This study introduces STREAM, a novel method that leverages a Steiner forest problem model, a hybrid biclustering pipeline, and submodular optimization to infer eGRNs from jointly profiled single-cell transcriptome and chromatin accessibility data. Compared to existing methods, STREAM demonstrates enhanced performance in terms of TF recovery, TF-enhancer linkage prediction, and enhancer-gene relation discovery. Application of STREAM to an Alzheimer's disease dataset and a diffuse small lymphocytic lymphoma dataset reveals its ability to identify TF-enhancer-gene relations associated with pseudotime, as well as key TF-enhancer-gene relations and TF cooperation underlying tumor cells.


Subject(s)
Enhancer Elements, Genetic , Gene Regulatory Networks , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Chromatin Immunoprecipitation Sequencing , Algorithms , Computational Biology/methods , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Single-Cell Gene Expression Analysis
2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38189539

ABSTRACT

Sequence motif discovery algorithms enhance the identification of novel deoxyribonucleic acid sequences with pivotal biological significance, especially transcription factor (TF)-binding motifs. The advent of assay for transposase-accessible chromatin using sequencing (ATAC-seq) has broadened the toolkit for motif characterization. Nonetheless, prevailing computational approaches have focused on delineating TF-binding footprints, with motif discovery receiving less attention. Herein, we present Cis rEgulatory Motif Influence using de Bruijn Graph (CEMIG), an algorithm leveraging de Bruijn and Hamming distance graph paradigms to predict and map motif sites. Assessment on 129 ATAC-seq datasets from the Cistrome Data Browser demonstrates CEMIG's exceptional performance, surpassing three established methodologies on four evaluative metrics. CEMIG accurately identifies both cell-type-specific and common TF motifs within GM12878 and K562 cell lines, demonstrating its comparative genomic capabilities in the identification of evolutionary conservation and cell-type specificity. In-depth transcriptional and functional genomic studies have validated the functional relevance of CEMIG-identified motifs across various cell types. CEMIG is available at https://github.com/OSU-BMBL/CEMIG, developed in C++ to ensure cross-platform compatibility with Linux, macOS and Windows operating systems.


Subject(s)
Algorithms , Chromatin Immunoprecipitation Sequencing , Benchmarking , Biological Evolution , Cell Line
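The CEMIG abstract above names two graph ideas, de Bruijn graphs and Hamming distance graphs over k-mers. The sketch below only illustrates those two textbook constructions on made-up toy sequences; the function names, the choice of k, and the distance threshold are assumptions for illustration and are not taken from the CEMIG source code.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return every length-k substring of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_edges(seqs, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes following it."""
    graph = defaultdict(set)
    for seq in seqs:
        for kmer in kmers(seq, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_neighbors(kmer_list, max_dist=1):
    """List pairs of distinct k-mers that differ in at most max_dist positions."""
    uniq = sorted(set(kmer_list))
    return [(a, b) for i, a in enumerate(uniq) for b in uniq[i + 1:]
            if hamming(a, b) <= max_dist]


# Hypothetical toy input, not real ATAC-seq data.
toy_seqs = ["ACGTAC", "ACGTTC"]
graph = de_bruijn_edges(toy_seqs, k=3)
all_kmers = [km for s in toy_seqs for km in kmers(s, 3)]
pairs = hamming_neighbors(all_kmers, max_dist=1)
```

In this toy input the node `GT` branches to both `TA` and `TT`, which is the kind of divergence a Hamming-distance graph then links as near-identical k-mers.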
3.
Theor Appl Genet ; 137(9): 211, 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39210238

ABSTRACT

Soybean, a source of plant-derived lipids, contains an array of fatty acids essential for health. A comprehensive understanding of the fatty acid profiles in soybean is crucial for enhancing soybean cultivars and augmenting their qualitative attributes. Here, 180 F10 generation recombinant inbred lines (RILs), derived from the cross-breeding of the cultivated soybean variety 'Jidou 12' and the wild soybean 'Y9,' were used as primary experimental subjects. Using inclusive composite interval mapping (ICIM), this study undertook a quantitative trait locus (QTL) analysis on five distinct fatty acid components in the RIL population from 2019 to 2021. Concurrently, a genome-wide association study (GWAS) was conducted on 290 samples from a genetically diverse natural population to scrutinize the five fatty acid components during the same timeframe, thereby aiming to identify loci closely associated with fatty acid profiles. In addition, haplotype analysis and the Kyoto Encyclopedia of Genes and Genomes pathway analysis were performed to predict candidate genes. The QTL analysis elucidated 23 stable QTLs intricately associated with the five fatty acid components, exhibiting phenotypic contribution rates ranging from 2.78% to 25.37%. In addition, GWAS of the natural population unveiled 102 significant loci associated with these fatty acid components. The haplotype analysis of the colocalized loci revealed that Glyma.06G221400 on chromosome 6 exhibited a significant correlation with stearic acid content, with Hap1 showing a markedly elevated stearic acid level compared with Hap2 and Hap3. Similarly, Glyma.12G075100 on chromosome 12 was significantly associated with the contents of oleic, linoleic, and linolenic acids, suggesting its involvement in fatty acid biosynthesis. 
In the natural population, candidate genes associated with the contents of palmitic and linolenic acids were predominantly from the fatty acid metabolic pathway, indicating their potential role as pivotal genes in the critical steps of fatty acid metabolism. Furthermore, genomic selection (GS) for fatty acid components was conducted using ridge regression best linear unbiased prediction based on both random single nucleotide polymorphisms (SNPs) and SNPs significantly associated with fatty acid components identified by GWAS. GS accuracy was contingent upon the SNP set used. Notably, GS efficiency was enhanced when using SNPs derived from QTL mapping analysis and GWAS compared with random SNPs, and reached a plateau when the number of SNP markers exceeded 3,000. This study thus indicates that Glyma.06G221400 and Glyma.12G075100 are genes integral to the synthesis and regulatory mechanisms of fatty acids. It provides insights into the complex biosynthesis and regulation of fatty acids, with significant implications for the directed improvement of soybean oil quality and the selection of superior soybean varieties. The SNP markers delineated in this study can be instrumental in establishing an efficacious pipeline for marker-assisted selection and GS aimed at improving soybean fatty acid components.


Subject(s)
Chromosome Mapping , Fatty Acids , Glycine max , Quantitative Trait Loci , Glycine max/genetics , Glycine max/metabolism , Fatty Acids/metabolism , Chromosome Mapping/methods , Phenotype , Polymorphism, Single Nucleotide , Haplotypes , Plant Breeding , Genes, Plant , Genetic Association Studies , Genome-Wide Association Study
4.
Trends Genet ; 36(12): 951-966, 2020 12.
Article in English | MEDLINE | ID: mdl-32868128

ABSTRACT

Single-cell multimodal omics (scMulti-omics) technologies have made it possible to trace cellular lineages during differentiation and to identify new cell types in heterogeneous cell populations. The derived information is especially promising for computing cell-type-specific biological networks encoded in complex diseases and improving our understanding of the underlying gene regulatory mechanisms. The integration of these networks could, therefore, give rise to a heterogeneous regulatory landscape (HRL) in support of disease diagnosis and drug therapeutics. In this review, we provide an overview of this field and pay particular attention to how diverse biological networks can be inferred in a specific cell type based on integrative methods. Then, we discuss how HRL can advance our understanding of regulatory mechanisms underlying complex diseases and aid in the prediction of prognosis and therapeutic responses. Finally, we outline challenges and future trends that will be central to bringing the field of HRL in complex diseases forward.


Subject(s)
Computational Biology/methods , Disease/genetics , Gene Regulatory Networks , Single-Cell Analysis/methods , Animals , Humans
5.
Brief Bioinform ; 22(2): 1639-1655, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32047891

ABSTRACT

Together with various hosts and environments, ubiquitous microbes interact closely with each other, forming an intertwined system or community. Notably, shifts in the relationships between microbes and their hosts or environments are associated with critical diseases and ecological changes. While advances in high-throughput omics technologies offer a great opportunity for understanding the structures and functions of the microbiome, it is still challenging to analyse and interpret the omics data. Specifically, the heterogeneity and diversity of microbial communities, compounded by the large size of the datasets, make it a tremendous challenge to mechanistically elucidate these complex communities. Fortunately, network analyses provide an efficient way to tackle this problem, and several network approaches have recently been proposed to improve this understanding. Here, we systematically illustrate the network theories that have been used in biological and biomedical research. Then, we review existing network modelling methods for microbial studies at multiple layers, from metagenomics to metabolomics and further to multi-omics. Lastly, we discuss the limitations of present studies and provide a perspective on future directions in support of the understanding of microbial communities.


Subject(s)
High-Throughput Screening Assays/methods , Metabolomics/methods , Metagenomics/methods , Microbiota , Proteomics/methods , Transcriptome , Humans
6.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32793986

ABSTRACT

Bacterial genomes are now recognized as interacting intimately with cellular processes. Uncovering the organizational mechanisms of bacterial genomes has been a primary focus of researchers seeking to reveal potential cellular activities. Advances in both experimental techniques and computational models provide a tremendous opportunity for understanding these mechanisms, and various studies have recently explored the organization rules of bacterial genomes in relation to their functions. This review focuses mainly on the principles that shape the organization of bacterial genomes, both locally and globally. We first illustrate local structures such as operons/transcription units, which facilitate co-transcription and horizontal transfer of genes. We then clarify the constraints that globally shape bacterial genomes, such as metabolism, transcription and replication. Finally, we highlight challenges and opportunities to advance bacterial genomic studies and provide application perspectives of genome organization, including pathway hole assignment, genome assembly and understanding disease mechanisms.


Subject(s)
Genome, Bacterial , Chromosomes, Bacterial , Computational Biology/methods , DNA Replication , Metabolic Networks and Pathways , Operon , Transcription, Genetic
7.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33957668

ABSTRACT

Alternative transcription units (ATUs) are dynamically encoded under different conditions and display overlapping patterns (sharing one or more genes) under a specific condition in bacterial genomes. Genome-scale identification of ATUs is essential for studying the emergence of human diseases caused by bacterial organisms. However, it is unrealistic to identify all ATUs using experimental techniques because of the complexity and dynamic nature of ATUs. Here, we present the first-of-its-kind computational framework, named SeqATU, for genome-scale ATU prediction based on next-generation RNA-Seq data. The framework utilizes a convex quadratic programming model to seek an optimum expression combination of all of the to-be-identified ATUs. The predicted ATUs in Escherichia coli reached a precision of 0.77/0.74 and a recall of 0.75/0.76 in the two RNA-Sequencing datasets compared with the benchmarked ATUs from third-generation RNA-Seq data. In addition, the proportion of 5'- or 3'-end genes of the predicted ATUs, having documented transcription factor binding sites and transcription termination sites, was three times greater than that of no 5'- or 3'-end genes. We further evaluated the predicted ATUs by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes functional enrichment analyses. The results suggested that gene pairs frequently encoded in the same ATUs are more functionally related than those that can belong to two distinct ATUs. Overall, these results demonstrated the high reliability of predicted ATUs. We expect that the new insights derived by SeqATU will not only improve the understanding of the transcription mechanism of bacteria but also guide the reconstruction of a genome-scale transcriptional regulatory network.


Subject(s)
Computational Biology/methods , Genome-Wide Association Study/methods , RNA Isoforms , Transcription, Genetic , Algorithms , Bacteria/genetics , Databases, Genetic , Escherichia coli/genetics , Genome, Bacterial , Genomics/methods , Humans , RNA, Messenger/genetics , RNA-Seq , Single-Cell Analysis/methods , Terminator Regions, Genetic , Transcription Initiation Site
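The SeqATU abstract above reports precision and recall of predicted ATUs against a benchmark set. As a reminder of how those two figures are defined, here is a minimal sketch. It assumes each ATU is represented as a tuple of gene names and that a prediction counts as correct only on an exact match; the abstract does not state the matching criterion, and the gene names below are hypothetical placeholders.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of a predicted set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical ATUs, each written as a tuple of made-up gene names.
predicted_atus = {("geneA", "geneB"), ("geneB", "geneC"), ("geneD",), ("geneE", "geneF")}
benchmark_atus = {("geneA", "geneB"), ("geneD",), ("geneE", "geneF"),
                  ("geneG", "geneH"), ("geneI",)}

precision, recall = precision_recall(predicted_atus, benchmark_atus)
```

Three of the four predictions appear in the benchmark, so precision is 3/4, and three of the five benchmark ATUs are recovered, so recall is 3/5.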
8.
Nucleic Acids Res ; 48(W1): W275-W286, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32421805

ABSTRACT

A group of genes controlled as a unit, usually by the same repressor or activator gene, is known as a regulon. The ability to identify active regulons within a specific cell type, i.e., cell-type-specific regulons (CTSR), provides an extraordinary opportunity to pinpoint crucial regulators and target genes responsible for complex diseases. However, the identification of CTSRs from single-cell RNA-Seq (scRNA-Seq) data is computationally challenging. We introduce IRIS3, the first-of-its-kind web server for CTSR inference from scRNA-Seq data for human and mouse. IRIS3 is an easy-to-use server empowered by over 20 functionalities to support comprehensive interpretations and graphical visualizations of identified CTSRs. CTSR data can be used to reliably characterize and distinguish the corresponding cell type from others and can be combined with other computational or experimental analyses for biomedical studies. CTSRs can, therefore, aid in the discovery of major regulatory mechanisms and allow reliable constructions of global transcriptional regulation networks encoded in a specific cell type. The broader impact of IRIS3 includes, but is not limited to, investigation of complex diseases hierarchies and heterogeneity, causal gene regulatory network construction, and drug development. IRIS3 is freely accessible from https://bmbl.bmi.osumc.edu/iris3/ with no login requirement.


Subject(s)
RNA-Seq , Regulon , Single-Cell Analysis , Software , Animals , Brain/metabolism , Cluster Analysis , Mice
9.
Sensors (Basel) ; 22(22)2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36433543

ABSTRACT

Simultaneous localization and mapping (SLAM) is the major solution for constructing or updating a map of an unknown environment while simultaneously keeping track of a mobile robot's location. Correlative Scan Matching (CSM) is a scan matching algorithm for obtaining the posterior distribution probability for the robot's pose in SLAM. This paper combines the non-linear optimization algorithm and CSM algorithm into an NLO-CSM (Non-linear Optimization CSM) algorithm for reducing the computation resources and the amount of computation while ensuring high calculation accuracy, and it presents an efficient hardware accelerator design of the NLO-CSM algorithm for the scan matching in 2D LiDAR SLAM. The proposed NLO-CSM hardware accelerator utilizes pipeline processing and module reusing techniques to achieve low hardware overhead, fast matching, and high energy efficiency. FPGA implementation results show that, at 100 MHz clock, the power consumption of the proposed hardware accelerator is as low as 0.79 W, while it performs a scan match at 8.98 ms and 7.15 mJ per frame. The proposed design outperforms the ARM-A9 dual-core CPU implementation with a 92.74% increase and 90.71% saving in computing speed and energy consumption, respectively. It has also achieved 80.3% LUTs, 84.13% FFs, and 20.83% DSPs saving, as well as an 8.17× increase in frame rate and 96.22% improvement in energy efficiency over a state-of-the-art hardware accelerator design in the literature. ASIC implementation in 65 nm can further reduce the computing time and energy consumption per scan to 5.94 ms and 0.06 mJ, respectively, which shows that the proposed NLO-CSM hardware accelerator design is suitable for resource-limited and energy-constrained mobile and micro robot applications.

10.
Sensors (Basel) ; 22(23)2022 Nov 25.
Article in English | MEDLINE | ID: mdl-36501862

ABSTRACT

Achieving low-cost and high-performance network security communication is necessary for Internet of Things (IoT) devices, including intelligent sensors and mobile robots. Designing hardware accelerators to accelerate multiple computationally intensive cryptographic primitives in various network security protocols is challenging. Different from existing unified reconfigurable cryptographic accelerators with relatively low efficiency and high latency, this paper presents the design and analysis of a reconfigurable cryptographic accelerator consisting of a reconfigurable cipher unit and a reconfigurable hash unit to support widely used cryptographic algorithms for IoT devices, which require block ciphers and hash functions simultaneously. Based on a detailed and comprehensive algorithmic analysis of both the block ciphers and hash functions in terms of basic algorithm structures and common cryptographic operators, the proposed reconfigurable cryptographic accelerator is designed by reusing key register files and operators to build unified data paths. Both the reconfigurable cipher unit and the reconfigurable hash unit contain a unified data path to implement Data Encryption Standard (DES)/Advanced Encryption Standard (AES)/ShangMi 4 (SM4) and Secure Hash Algorithm-1 (SHA-1)/SHA-256/SM3 algorithms, respectively. A reconfigurable S-Box for AES and SM4 is designed based on the composite Galois field GF(((2^2)^2)^2), which significantly reduces hardware overhead and power consumption compared with the conventional implementation by look-up tables. The experimental results based on 65-nm application-specific integrated circuit (ASIC) implementation show that the achieved energy efficiency and area efficiency of the proposed design are 441 Gbps/W and 37.55 Gbps/mm², respectively, which is suitable for IoT devices with limited battery and form factor. 
The result of delay analysis also shows that the number of delay cycles of our design can be reduced by 83% compared with the state-of-the-art design, which shows that the proposed design is more suitable for applications including 5G/Wi-Fi/ZigBee/Ethernet network standards to accelerate block ciphers and hash functions simultaneously.

11.
Entropy (Basel) ; 24(11)2022 Oct 30.
Article in English | MEDLINE | ID: mdl-36359655

ABSTRACT

Entropy is a measure of uncertainty or randomness. It is the foundation for almost all cryptographic systems. True random number generators (TRNGs) and physical unclonable functions (PUFs) are the silicon primitives that harvest dynamic and static entropy, respectively, to generate random bit streams. In this survey paper, we present a systematic and comprehensive review of different state-of-the-art methods to harvest entropy from silicon-based devices, including the implementations, applications, and the security of the designs. Furthermore, we summarize the trends in entropy source design to point out the current focus areas of entropy harvesting.
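The abstract above describes entropy as a measure of uncertainty in random bit streams. A minimal sketch of two standard per-symbol estimates, Shannon entropy and min-entropy, is given below. It is illustrative only: the function names are assumptions, the survey itself is not known to use this code, and frequency-based estimates like these ignore correlations between successive bits.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits per symbol, from observed symbol frequencies."""
    if not symbols:
        raise ValueError("need at least one symbol")
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def min_entropy(symbols):
    """Min-entropy in bits per symbol, set by the most frequent symbol."""
    if not symbols:
        raise ValueError("need at least one symbol")
    n = len(symbols)
    most_common = max(Counter(symbols).values())
    return -math.log2(most_common / n)


balanced = "01010101"  # equal counts of 0 and 1
biased = "0001"        # three quarters of the bits are 0
h_balanced = shannon_entropy(balanced)
h_min_biased = min_entropy(biased)
```

Note that `balanced` scores a full 1 bit per symbol here even though the alternating pattern is perfectly predictable, which is why practical entropy-source evaluation needs tests that go beyond symbol counts.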

12.
Brief Bioinform ; 20(6): 2044-2054, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30099484

ABSTRACT

Differential gene expression (DGE) analysis is one of the most common applications of RNA-sequencing (RNA-seq) data. This process allows for the elucidation of differentially expressed genes across two or more conditions and is widely used in many applications of RNA-seq data analysis. Interpretation of the DGE results can be nonintuitive and time consuming due to the variety of formats based on the tool of choice and the numerous pieces of information provided in these results files. Here we reviewed DGE results analysis from a functional point of view for various visualizations. We also provide an R/Bioconductor package, Visualization of Differential Gene Expression Results using R, which generates information-rich visualizations for the interpretation of DGE results from three widely used tools, Cuffdiff, DESeq2 and edgeR. The implemented functions are also tested on five real-world data sets, consisting of one human, one Malus domestica and three Vitis riparia data sets.


Subject(s)
Gene Expression , Sequence Analysis, RNA , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods
13.
Bioinformatics ; 36(4): 1143-1149, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31503285

ABSTRACT

MOTIVATION: The biclustering of large-scale gene expression data holds promising potential for detecting condition-specific functional gene modules (i.e. biclusters). However, existing methods do not adequately address a comprehensive detection of all significant bicluster structures and have limited power when applied to expression data generated by RNA-Sequencing (RNA-Seq), especially single-cell RNA-Seq (scRNA-Seq) data, where massive zero and low expression values are observed. RESULTS: We present a new biclustering algorithm, QUalitative BIClustering algorithm Version 2 (QUBIC2), which is empowered by: (i) a novel left-truncated mixture of Gaussian model for an accurate assessment of multimodality in zero-enriched expression data, (ii) a fast and efficient dropouts-saving expansion strategy for functional gene module optimization using information divergency and (iii) a rigorous statistical test for the significance of all the identified biclusters in any organism, including those without substantial functional annotations. QUBIC2 demonstrated considerably improved performance in detecting biclusters compared to five other widely used algorithms on various benchmark datasets from E. coli, human and simulated data. QUBIC2 also showcased robust and superior performance on gene expression data generated by microarray, bulk RNA-Seq and scRNA-Seq. AVAILABILITY AND IMPLEMENTATION: The source code of QUBIC2 is freely available at https://github.com/OSU-BMBL/QUBIC2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , RNA , Algorithms , Humans , Sequence Analysis, RNA , Software
14.
Nucleic Acids Res ; 47(15): 7809-7824, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31372637

ABSTRACT

The identification of transcription factor binding sites and cis-regulatory motifs is a frontier whereupon the rules governing protein-DNA binding are being revealed. Here, we developed a new method (DEep Sequence and Shape mOtif or DESSO) for cis-regulatory motif prediction using deep neural networks and the binomial distribution model. DESSO outperformed existing tools, including DeepBind, in predicting motifs in 690 human ENCODE ChIP-sequencing datasets. Furthermore, the deep-learning framework of DESSO expanded motif discovery beyond the state-of-the-art by allowing the identification of known and new protein-protein-DNA tethering interactions in human transcription factors (TFs). Specifically, 61 putative tethering interactions were identified among the 100 TFs expressed in the K562 cell line. In this work, the power of DESSO was further expanded by integrating the detection of DNA shape features. We found that shape information has strong predictive power for TF-DNA binding and provides new putative shape motif information for human TFs. Thus, DESSO improves the identification and structural analysis of TF binding sites by integrating the complexities of DNA binding into a deep-learning framework.


Subject(s)
Computational Biology/statistics & numerical data , DNA/chemistry , Deep Learning , Transcription Factors/genetics , Binding Sites , Computational Biology/methods , DNA/genetics , DNA/metabolism , Gene Expression Regulation , Humans , K562 Cells , Nucleotide Motifs , Protein Binding , Transcription Factors/classification , Transcription Factors/metabolism
15.
Brief Bioinform ; 19(5): 1069-1081, 2018 09 28.
Article in English | MEDLINE | ID: mdl-28334268

ABSTRACT

Transcription factors are proteins that bind to specific DNA sequences and play important roles in controlling the expression levels of their target genes. Hence, prediction of transcription factor binding sites (TFBSs) provides a solid foundation for inferring gene regulatory mechanisms and building regulatory networks for a genome. Chromatin immunoprecipitation sequencing (ChIP-seq) technology can generate large-scale experimental data for such protein-DNA interactions, providing an unprecedented opportunity to identify TFBSs (a.k.a. cis-regulatory motifs). The bottleneck, however, is the lack of robust mathematical models, as well as efficient computational methods for TFBS prediction to make effective use of massive ChIP-seq data sets in the public domain. The purpose of this study is to review existing motif-finding methods for ChIP-seq data from an algorithmic perspective and provide new computational insight into this field. The state-of-the-art methods were shown through summarizing eight representative motif-finding algorithms along with corresponding challenges, and introducing some important relative functions according to specific biological demands, including discriminative motif finding and cofactor motifs analysis. Finally, potential directions and plans for ChIP-seq-based motif-finding tools were showcased in support of future algorithm development.


Subject(s)
Algorithms , Gene Regulatory Networks , Software , Base Sequence , Binding Sites/genetics , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology/methods , DNA/genetics , DNA/metabolism , Humans , Sequence Analysis, DNA/statistics & numerical data , Transcription Factors/metabolism
16.
Bioinformatics ; 35(21): 4474-4477, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31116375

ABSTRACT

MOTIVATION: Metagenomic and metatranscriptomic analyses can provide an abundance of information related to microbial communities. However, straightforward analysis of these data does not provide optimal results; integration of the data types is needed to thoroughly investigate these microbiomes and their environmental interactions. RESULTS: Here, we present MetaQUBIC, an integrated biclustering-based computational pipeline for gene module detection that integrates both metagenomic and metatranscriptomic data. Additionally, we used this pipeline to investigate 735 paired DNA and RNA human gut microbiome samples, resulting in a comprehensive hybrid gene expression matrix of 2.3 million cross-species genes in the 735 human fecal samples and 155 functionally enriched gene modules. We believe both the MetaQUBIC pipeline and the generated comprehensive human gut hybrid expression matrix will facilitate further investigations into multiple levels of microbiome studies. AVAILABILITY AND IMPLEMENTATION: The package is freely available at https://github.com/OSU-BMBL/metaqubic. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gastrointestinal Microbiome , Metagenome , Feces , Humans , Metagenomics , Transcriptome
17.
Bioinformatics ; 35(14): 2395-2402, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30520961

ABSTRACT

MOTIVATION: The prediction of protein-protein interaction (PPI) sites is a key to mutation design, catalytic reaction and the reconstruction of PPI networks. It is a challenging task considering the significant abundant sequences and the imbalance issue in samples. RESULTS: A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI sites prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and the SMOTE was applied to oversample interface residues in the feature space for the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented to reduce feature redundancy and subset selection. Finally, the Random Forest classifiers were applied to build the ensemble learning model, and the optimal feature vectors were inserted into EL-SMURF to predict PPI sites. The performance validation of EL-SMURF on two independent validation datasets showed 77.1% and 77.7% accuracy, which were 6.2-15.7% and 6.1-18.9% higher than the other existing tools, respectively. AVAILABILITY AND IMPLEMENTATION: The source codes and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software
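The EL-SMURF abstract above applies SMOTE to oversample the minority class (interface residues) in feature space. The core SMOTE step, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a generic, simplified illustration using only the standard library; the function name, parameters and toy points are assumptions and do not reproduce the EL-SMURF implementation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical two-dimensional minority-class feature vectors.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=5, k=2, seed=42)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than being exact duplicates.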
18.
Molecules ; 23(10)2018 Oct 13.
Article in English | MEDLINE | ID: mdl-30322177

ABSTRACT

Overlapping structures of protein-protein interaction networks are very prevalent in different biological processes and reflect the mechanism of sharing common functional components. The overlapping community detection (OCD) algorithm based on central node selection (CNS) is a traditional and accepted algorithm for OCD in networks. The main components of CNS are the central node selection and the clustering procedure. However, the original CNS does not consider the influence among the nodes and the importance of the division of the edges in networks. In this paper, an OCD algorithm based on a central edge selection (CES) algorithm for detection of overlapping communities of protein-protein interaction (PPI) networks is proposed. Different from the traditional CNS algorithms for OCD, the proposed algorithm uses community magnetic interference (CMI) to obtain more reasonable central edges in the process of CES, and employs a new distance between the non-central edge and the set of the central edges to divide the non-central edge into the correct cluster during the clustering procedure. In addition, the proposed CES improves the strategy of overlapping nodes pruning (ONP) to make the division more precise. The experimental results on three benchmark networks and three biological PPI networks of Mus musculus, Escherichia coli, and Saccharomyces cerevisiae show that the CES algorithm performs well.


Subject(s)
Computational Biology/methods , Escherichia coli/metabolism , Protein Interaction Mapping/methods , Saccharomyces cerevisiae/metabolism , Algorithms , Animals , Escherichia coli Proteins/metabolism , Mice , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism
19.
PLoS Comput Biol ; 12(2): e1004772, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26894997

ABSTRACT

High-throughput RNA-seq technology has provided an unprecedented opportunity to reveal the very complex structures of transcriptomes. However, it is an important and highly challenging task to assemble vast amounts of short RNA-seq reads into transcriptomes with alternative splicing isoforms. In this study, we present a novel de novo assembler, BinPacker, by modeling the transcriptome assembly problem as tracking a set of trajectories of items with their sizes representing coverage of their corresponding isoforms by solving a series of bin-packing problems. This approach, which subtly integrates coverage information into the procedure, has two exclusive features: 1) only splicing junctions are involved in the assembling procedure; 2) massive pell-mell reads are assembled seemingly by moving a comb along junction edges on a splicing graph. Being tested on both real and simulated RNA-seq datasets, it outperforms almost all the existing de novo assemblers on all the tested datasets, and even outperforms those ab initio assemblers on the real dog dataset. In addition, it runs substantially faster and requires less memory space than most of the assemblers. BinPacker is published under GNU GENERAL PUBLIC LICENSE and the source is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_1.0.tar.gz/download. Quick installation version is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_binary.tar.gz/download.


Subject(s)
Gene Expression Profiling/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Algorithms , Animals , Computational Biology , Dogs , High-Throughput Nucleotide Sequencing/methods , Humans , Mice , RNA, Messenger/chemistry
20.
BMC Genomics ; 17: 578, 2016 08 09.
Article in English | MEDLINE | ID: mdl-27507169

ABSTRACT

BACKGROUND: Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve more slowly than their surrounding non-functional sequences. Its application, however, faces several difficulties in optimizing the selection of orthologous data and reducing false positives in motif prediction. RESULTS: Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method, and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high-quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary prediction tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We applied the framework to the Escherichia coli K12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site levels, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. CONCLUSION: The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance progress in elucidating transcription regulation mechanisms, and thus benefit the genomic research community and prokaryotic genome researchers in particular.
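The abstract mentions motif voting by several prediction tools without giving the scoring rule. As a rough illustration of the voting idea only, the sketch below counts how many tools cover each promoter position and keeps the positions that reach a vote threshold. The tool names, the half-open interval format, and the threshold are all hypothetical and are not MP(3)'s actual scheme.

```python
from collections import Counter


def vote_positions(predictions, min_votes):
    """Return sorted positions supported by at least min_votes tools.

    predictions: dict mapping a tool name to a list of (start, end)
    half-open intervals predicted as motif sites on one promoter.
    Each tool contributes at most one vote per position.
    """
    votes = Counter()
    for sites in predictions.values():
        covered = set()
        for start, end in sites:
            covered.update(range(start, end))
        votes.update(covered)
    return sorted(pos for pos, count in votes.items() if count >= min_votes)


# Hypothetical predictions from three tools on the same promoter.
toy_predictions = {
    "toolA": [(10, 15)],
    "toolB": [(12, 18)],
    "toolC": [(13, 14)],
}
consensus = vote_positions(toy_predictions, min_votes=2)
```

Here positions 12, 13 and 14 are covered by at least two tools, so they form the consensus, while positions predicted by a single tool are discarded as likely noise.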


Subject(s)
Genome , Genomics , Nucleotide Motifs , Phylogeny , Prokaryotic Cells/classification , Prokaryotic Cells/metabolism , Regulatory Sequences, Nucleic Acid , Algorithms , Binding Sites , Escherichia coli/genetics , Genome, Bacterial , Genomics/methods , Models, Statistical , Promoter Regions, Genetic , Protein Binding