Results 1 - 20 of 58
1.
Nat Immunol ; 23(11): 1588-1599, 2022 11.
Article in English | MEDLINE | ID: mdl-36266363

ABSTRACT

Dysfunctional CD8+ T cells, which have defective production of antitumor effectors, represent a major mediator of immunosuppression in the tumor microenvironment. Here, we show that SUSD2 is a negative regulator of CD8+ T cell antitumor function. Susd2-/- effector CD8+ T cells showed enhanced production of antitumor molecules, which consequently blunted tumor growth in multiple syngeneic mouse tumor models. Through a quantitative mass spectrometry assay, we found that SUSD2 interacted with interleukin (IL)-2 receptor α through sushi domain-dependent protein interactions and that this interaction suppressed the binding of IL-2, an essential cytokine for the effector functions of CD8+ T cells, to IL-2 receptor α. SUSD2 was not expressed on regulatory CD4+ T cells and did not affect the inhibitory function of these cells. Adoptive transfer of Susd2-/- chimeric antigen receptor T cells induced a robust antitumor response in mice, highlighting the potential of SUSD2 as an immunotherapy target for cancer.


Subjects
CD8-Positive T-Lymphocytes , Neoplasms , Animals , Mice , Cell Line, Tumor , Immunotherapy/methods , Mice, Inbred C57BL , Neoplasms/metabolism , Receptors, Interleukin-2/metabolism , Signal Transduction , Tumor Microenvironment
2.
Cancer Immunol Immunother ; 73(3): 52, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38349405

ABSTRACT

INTRODUCTION: As one of the major components of the tumor microenvironment, tumor-associated macrophages (TAMs) possess profound inhibitory activity against T cells and facilitate tumor escape from immune checkpoint blockade therapy. Converting this pro-tumorigenic phenotype toward an anti-tumorigenic one is thus an important strategy for enhancing adaptive immunity against cancer. However, although a plethora of mechanisms have been described for pro-tumorigenic differentiation in cancer, the metabolic switches that program the anti-tumorigenic property of TAMs remain elusive. MATERIALS AND METHODS: From an unbiased analysis of single-cell transcriptome data from multiple tumor models, we discovered that anti-tumorigenic TAMs uniquely express elevated levels of a specific fatty acid receptor, G-protein-coupled receptor 84 (GPR84). Genetic ablation of GPR84 in mice leads to impaired pro-inflammatory polarization of macrophages, while enhancing their anti-inflammatory phenotype. By contrast, GPR84 activation by its agonist, 6-n-octylaminouracil (6-OAU), potentiates the pro-inflammatory phenotype via the enhanced STAT1 pathway. Moreover, 6-OAU treatment significantly retards tumor growth and increases the anti-tumor efficacy of anti-PD-1 therapy. CONCLUSION: Overall, we report a previously unappreciated fatty acid receptor, GPR84, that serves as an important metabolic sensing switch for orchestrating anti-tumorigenic macrophage polarization. Pharmacological agonists of GPR84 hold promise to reshape and reverse the immunosuppressive TME, and thereby restore the responsiveness of cancer to immune checkpoint blockade.


Subjects
Immune Checkpoint Inhibitors , Immunotherapy , Animals , Mice , Carcinogenesis , Fatty Acids , Macrophages , Tumor Microenvironment , Tumor-Associated Macrophages
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34607350

ABSTRACT

Identifying cis-regulatory motifs from genomic sequencing data (e.g. ChIP-seq and CLIP-seq) is crucial in identifying transcription factor (TF) binding sites and inferring gene regulatory mechanisms for any organism. Since 2015, deep learning (DL) methods have been widely applied to identify TF binding sites and predict motif patterns, with the strengths of offering a scalable, flexible and unified computational approach for highly accurate predictions. As far as we know, 20 DL methods have been developed. However, without a clear and systematic assessment, users will struggle to choose the most appropriate tool for their specific studies. In this manuscript, we evaluated 20 DL methods for cis-regulatory motif prediction using 690 ENCODE ChIP-seq, 126 cancer ChIP-seq and 55 RNA CLIP-seq data. Four metrics were investigated, including the accuracy of motif finding, the performance of DNA/RNA sequence classification, algorithm scalability and tool usability. The assessment results demonstrated the high complementarity of the existing DL methods. It was determined that the most suitable model should primarily depend on the data size and type and the method's outputs.


Subjects
Deep Learning , Algorithms , Base Sequence , Binding Sites/genetics , Chromatin Immunoprecipitation , Transcription Factors/genetics , Transcription Factors/metabolism
4.
Trends Genet ; 36(12): 951-966, 2020 12.
Article in English | MEDLINE | ID: mdl-32868128

ABSTRACT

Single-cell multimodal omics (scMulti-omics) technologies have made it possible to trace cellular lineages during differentiation and to identify new cell types in heterogeneous cell populations. The derived information is especially promising for computing cell-type-specific biological networks encoded in complex diseases and improving our understanding of the underlying gene regulatory mechanisms. The integration of these networks could, therefore, give rise to a heterogeneous regulatory landscape (HRL) in support of disease diagnosis and drug therapeutics. In this review, we provide an overview of this field and pay particular attention to how diverse biological networks can be inferred in a specific cell type based on integrative methods. Then, we discuss how HRL can advance our understanding of regulatory mechanisms underlying complex diseases and aid in the prediction of prognosis and therapeutic responses. Finally, we outline challenges and future trends that will be central to bringing the field of HRL in complex diseases forward.


Subjects
Computational Biology/methods , Disease/genetics , Gene Regulatory Networks , Single-Cell Analysis/methods , Animals , Humans
5.
Brief Bioinform ; 22(2): 1639-1655, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32047891

ABSTRACT

Together with various hosts and environments, ubiquitous microbes interact closely with each other, forming an intertwined system or community. Of interest, shifts in the relationships between microbes and their hosts or environments are associated with critical diseases and ecological changes. While advances in high-throughput omics technologies offer a great opportunity for understanding the structures and functions of the microbiome, it is still challenging to analyse and interpret the omics data. Specifically, the heterogeneity and diversity of microbial communities, compounded by the large size of the datasets, make it a tremendous challenge to mechanistically elucidate these complex communities. Fortunately, network analyses provide an efficient way to tackle this problem, and several network approaches have recently been proposed to improve this understanding. Here, we systematically illustrate the network theories that have been used in biological and biomedical research. Then, we review existing network modelling methods for microbial studies at multiple layers, from metagenomics to metabolomics and further to multi-omics. Lastly, we discuss the limitations of present studies and provide a perspective on further directions in support of the understanding of microbial communities.


Subjects
High-Throughput Screening Assays/methods , Metabolomics/methods , Metagenomics/methods , Microbiota , Proteomics/methods , Transcriptome , Humans
6.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33300547

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-Seq) technology provides strong technical support for the accurate and efficient analysis of single-cell gene expression data. However, the analysis of scRNA-Seq data is accompanied by many obstacles, including dropout events and the curse of dimensionality. Here, we propose scGMAI, a new single-cell Gaussian mixture clustering method based on autoencoder networks and fast independent component analysis (FastICA). Specifically, scGMAI uses autoencoder networks to reconstruct gene expression values from scRNA-Seq data, and FastICA is used to reduce the dimensions of the reconstructed data. By integrating these computational techniques, scGMAI outperforms existing tools, including Seurat, in clustering cells from 17 public scRNA-Seq datasets. In summary, scGMAI is an effective tool for accurately clustering and identifying cell types from scRNA-Seq data and shows great potential for scRNA-Seq data analysis. The source code is available at https://github.com/QUST-AIBBDRC/scGMAI/.


Subjects
Algorithms , RNA-Seq , Single-Cell Analysis , Software
7.
Bioinformatics ; 38(23): 5322-5325, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36250784

ABSTRACT

MOTIVATION: Gene expression imputation has been an essential step of the single-cell RNA-Seq data analysis workflow. Among several deep-learning methods, the debut of scGNN gained substantial recognition in 2021 for its superior performance and the ability to produce a cell-cell graph. However, the implementation of scGNN was relatively time-consuming and its performance could still be optimized. RESULTS: The implementation of scGNN 2.0 is significantly faster than scGNN thanks to a simplified closed-loop architecture. For all eight datasets, cell clustering performance was increased by 85.02% on average in terms of adjusted Rand index, and the imputation Median L1 Error was reduced by 67.94% on average. With the built-in visualizations, users can quickly assess the imputation and cell clustering results, compare against benchmarks and interpret the cell-cell interactions. The expanded input and output formats also pave the way for custom workflows that integrate scGNN 2.0 with other scRNA-Seq toolkits on both Python and R platforms. AVAILABILITY AND IMPLEMENTATION: scGNN 2.0 is implemented in Python (as of version 3.8) with the source code available at https://github.com/OSU-BMBL/scGNN2.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , RNA-Seq , Gene Expression Profiling/methods , Software , Cluster Analysis , Neural Networks, Computer
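
The abstract above reports clustering gains in terms of the adjusted Rand index. As a reading aid, the following is a minimal sketch of the standard pair-counting definition of that index in plain Python. It is not taken from the scGNN 2.0 code base; the function name `adjusted_rand_index` and its arguments are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items.

    Illustrative helper (hypothetical name), standard pair-counting formula.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both label lists must have the same length")

    n = len(labels_true)
    # Contingency counts: how many items fall in (true cluster, predicted cluster).
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs that agree within cells, rows and columns.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    total_pairs = comb(n, 2)

    # Expected pair agreement under random labelling, and its upper bound.
    expected = sum_rows * sum_cols / total_pairs if total_pairs else 0.0
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. a single cluster on both sides).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Identical partitions with swapped label names.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Partitions that disagree on every pair.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index is 1 for identical partitions regardless of label names, close to 0 for random agreement, and can be negative when agreement is worse than chance.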
8.
Bioinformatics ; 38(19): 4636-4638, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35997564

ABSTRACT

MOTIVATION: Prediction of transcription factor binding sites (TFBSs) is a crucial step in revealing the functions of transcription factors from high-throughput sequencing data. Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) provides insight into TFBSs and nucleosome positioning by probing open chromatin, and can reveal multiple TFBSs simultaneously compared to traditional technologies. Existing tools based on convolutional neural networks (CNNs) find only TFBSs of fixed length from ATAC-seq data. Graph neural networks (GNNs) can be considered an extension of CNNs and have great potential for finding multiple TFBSs of different lengths from ATAC-seq data. RESULTS: We develop a motif predictor called MMGraph, based on a three-layer GNN and the coexisting probability of k-mers, for finding multiple motifs from ATAC-seq data. The results of experiments conducted on 88 ATAC-seq datasets indicate that MMGraph achieved the best performance, with an area of eight metrics radar score of 2.31, and could find 207 higher-quality multiple motifs compared with other existing tools. AVAILABILITY AND IMPLEMENTATION: MMGraph is wrapped in a Python package, which is available at https://github.com/zhangsq06/MMGraph.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin Immunoprecipitation Sequencing , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Chromatin , Neural Networks, Computer , Probability
9.
J Med Virol ; 95(8): e29060, 2023 08.
Article in English | MEDLINE | ID: mdl-37638381

ABSTRACT

Human Papillomaviruses (HPVs) are associated with around 5%-10% of human cancer, notably nearly 99% of cervical cancer. The mechanisms by which HPV interacts with stratified epithelium (differentiated layers) during the viral life cycle and oncogenesis remain unclear. In this study, we used single-cell transcriptome analysis to study viral gene and host cell differentiation-associated heterogeneity of HPV-positive cervical cancer tissue. We examined the HPV16 genes E1, E6, and E7 and found that they were expressed differently across nine epithelial clusters. We found that three epithelial clusters had the highest proportion of HPV-positive cells (33.6%, 37.5%, and 32.4%, respectively), while two exhibited the lowest proportions (7.21% and 5.63%, respectively). Notably, the cluster with the most HPV-positive cells deviated significantly from normal epithelial layer markers, exhibiting functional heterogeneity and altered epithelial structuring, indicating that significant molecular heterogeneity existed in cancer tissues and that these cells exhibited unique gene signatures compared with normal epithelial cells. These HPV-positive cells, compared to HPV-negative cells, showed different expression of genes related to the extracellular matrix, cell adhesion, proliferation, and apoptosis. Further, the viral oncogenes E6 and E7 appeared to modify epithelial function via distinct pathways, thus contributing to cervical cancer progression. We investigated the HPV and host transcripts from a novel viewpoint focusing on layer heterogeneity. Our results indicated varied HPV expression across epithelial clusters and epithelial heterogeneity associated with viral oncogenes, contributing biological insights to this critical field of study.


Subjects
Papillomavirus Infections , Uterine Cervical Neoplasms , Humans , Female , Uterine Cervical Neoplasms/genetics , Papillomavirus Infections/genetics , Transcriptome , Oncogenes , Human Papillomavirus Viruses , Cell Differentiation
10.
Brief Bioinform ; 21(4): 1196-1208, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31271412

ABSTRACT

Appropriate ways to measure the similarity between single-cell RNA-sequencing (scRNA-seq) data are ubiquitous in bioinformatics, but using single clustering or classification methods to process scRNA-seq data is generally difficult. This has led to the emergence of integrated methods and tools that aim to automatically process specific problems associated with scRNA-seq data. These approaches have attracted a lot of interest in bioinformatics and related fields. In this paper, we systematically review the integrated methods and tools, highlighting the pros and cons of each approach. We not only pay particular attention to clustering and classification methods but also discuss methods that have emerged recently as powerful alternatives, including nonlinear and linear methods and descending dimension methods. Finally, we focus on clustering and classification methods for scRNA-seq data, in particular, integrated methods, and provide a comprehensive description of scRNA-seq data and download URLs.


Subjects
Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Datasets as Topic , Transcriptome
11.
Nucleic Acids Res ; 48(W1): W275-W286, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32421805

ABSTRACT

A group of genes controlled as a unit, usually by the same repressor or activator gene, is known as a regulon. The ability to identify active regulons within a specific cell type, i.e., cell-type-specific regulons (CTSRs), provides an extraordinary opportunity to pinpoint crucial regulators and target genes responsible for complex diseases. However, the identification of CTSRs from single-cell RNA-Seq (scRNA-Seq) data is computationally challenging. We introduce IRIS3, the first-of-its-kind web server for CTSR inference from scRNA-Seq data for human and mouse. IRIS3 is an easy-to-use server empowered by over 20 functionalities to support comprehensive interpretations and graphical visualizations of identified CTSRs. CTSR data can be used to reliably characterize and distinguish the corresponding cell type from others and can be combined with other computational or experimental analyses for biomedical studies. CTSRs can, therefore, aid in the discovery of major regulatory mechanisms and allow reliable construction of global transcriptional regulation networks encoded in a specific cell type. The broader impact of IRIS3 includes, but is not limited to, investigation of complex disease hierarchies and heterogeneity, causal gene regulatory network construction, and drug development. IRIS3 is freely accessible from https://bmbl.bmi.osumc.edu/iris3/ with no login requirement.


Subjects
RNA-Seq , Regulon , Single-Cell Analysis , Software , Animals , Brain/metabolism , Cluster Analysis , Mice
12.
Brief Bioinform ; 20(4): 1449-1464, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29490019

ABSTRACT

Biclustering is a powerful data mining technique that allows clustering of rows and columns, simultaneously, in a matrix-format data set. It was first applied to gene expression data in 2000, aiming to identify co-expressed genes under a subset of all the conditions/samples. During the past 17 years, tens of biclustering algorithms and tools have been developed to enhance the ability to make sense out of large data sets generated in the wake of high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including but not limited to, genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of the lack of knowledge for the selection of appropriate biclustering tools and further supporting computational techniques in specific studies. Here, we first deliver a brief introduction to the existing biclustering algorithms and tools in public domain, and then systematically summarize the basic applications of biclustering for biological data and more advanced applications of biclustering for biomedical data. This review will assist researchers to effectively analyze their big data and generate valuable biological knowledge and novel insights with higher efficiency.


Subjects
Cluster Analysis , Computational Biology/methods , Data Mining/methods , Algorithms , Big Data , Databases, Genetic/statistics & numerical data , Disease/classification , Disease/genetics , Gene Expression/drug effects , Gene Expression Profiling/statistics & numerical data , Gene Regulatory Networks , Humans , Molecular Sequence Annotation/statistics & numerical data
13.
Bioinformatics ; 36(4): 1074-1081, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31603468

ABSTRACT

MOTIVATION: Mitochondria are an essential organelle in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. Protein submitochondrial localization enables the understanding of protein function in studying disease pathogenesis and drug design. RESULTS: We proposed a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. Three steps are included: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are employed to extract protein sequence features, (ii) Synthetic Minority Oversampling Technique (SMOTE) is used to balance samples, and the ReliefF algorithm is applied for feature selection and (iii) the obtained feature vectors are fed into XGBoost to predict protein submitochondrial locations. SubMito-XGBoost has obtained satisfactory prediction results by the leave-one-out-cross-validation (LOOCV) compared with existing methods. The prediction accuracies of the SubMito-XGBoost method on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8-12.5% and 3.8-9.9% higher than other methods, respectively. The prediction accuracy of the independent test set M495 was 94.8%, which is significantly better than the existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost also plays an important role in new drug design for the treatment of related diseases. AVAILABILITY AND IMPLEMENTATION: The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Software , Amino Acid Sequence , Humans , Position-Specific Scoring Matrices , Proteins
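
The first feature-extraction step named in the abstract above is the g-gap dipeptide composition. The sketch below assumes the commonly used definition, the normalized frequency of residue pairs separated by g intervening positions, and is not taken from the SubMito-XGBoost source code; the function name, the `g` parameter and the handling of non-standard residues are illustrative assumptions.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-dimensional g-gap dipeptide frequency vector.

    Hypothetical helper: counts pairs (sequence[i], sequence[i + g + 1]) and
    normalizes by the number of counted pairs. Pairs containing a
    non-standard residue are skipped (an assumption of this sketch).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    step = g + 1  # distance between the two residues of a pair
    total = 0
    for i in range(len(sequence) - step):
        pair = sequence[i] + sequence[i + step]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no standard residues): all-zero vector.
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    features = g_gap_dipeptide_composition("MKTAYIAKQR", g=1)
    print(len(features), sum(features))
```

With g = 0 this reduces to the ordinary dipeptide composition; larger gaps capture correlations between residues that are further apart in the sequence.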
14.
Bioinformatics ; 36(4): 1143-1149, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31503285

ABSTRACT

MOTIVATION: The biclustering of large-scale gene expression data holds promising potential for detecting condition-specific functional gene modules (i.e. biclusters). However, existing methods do not adequately address a comprehensive detection of all significant bicluster structures and have limited power when applied to expression data generated by RNA-Sequencing (RNA-Seq), especially single-cell RNA-Seq (scRNA-Seq) data, where massive zero and low expression values are observed. RESULTS: We present a new biclustering algorithm, QUalitative BIClustering algorithm Version 2 (QUBIC2), which is empowered by: (i) a novel left-truncated mixture of Gaussian model for an accurate assessment of multimodality in zero-enriched expression data, (ii) a fast and efficient dropouts-saving expansion strategy for functional gene modules optimization using information divergency and (iii) a rigorous statistical test for the significance of all the identified biclusters in any organism, including those without substantial functional annotations. QUBIC2 demonstrated considerably improved performance in detecting biclusters compared to five other widely used algorithms on various benchmark datasets from E. coli, human and simulated data. QUBIC2 also showcased robust and superior performance on gene expression data generated by microarray, bulk RNA-Seq and scRNA-Seq. AVAILABILITY AND IMPLEMENTATION: The source code of QUBIC2 is freely available at https://github.com/OSU-BMBL/QUBIC2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling , RNA , Algorithms , Humans , Sequence Analysis, RNA , Software
15.
Nucleic Acids Res ; 47(15): 7809-7824, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31372637

ABSTRACT

The identification of transcription factor binding sites and cis-regulatory motifs is a frontier whereupon the rules governing protein-DNA binding are being revealed. Here, we developed a new method (DEep Sequence and Shape mOtif or DESSO) for cis-regulatory motif prediction using deep neural networks and the binomial distribution model. DESSO outperformed existing tools, including DeepBind, in predicting motifs in 690 human ENCODE ChIP-sequencing datasets. Furthermore, the deep-learning framework of DESSO expanded motif discovery beyond the state-of-the-art by allowing the identification of known and new protein-protein-DNA tethering interactions in human transcription factors (TFs). Specifically, 61 putative tethering interactions were identified among the 100 TFs expressed in the K562 cell line. In this work, the power of DESSO was further expanded by integrating the detection of DNA shape features. We found that shape information has strong predictive power for TF-DNA binding and provides new putative shape motif information for human TFs. Thus, DESSO improves the identification and structural analysis of TF binding sites by integrating the complexities of DNA binding into a deep-learning framework.


Subjects
Computational Biology/statistics & numerical data , DNA/chemistry , Deep Learning , Transcription Factors/genetics , Binding Sites , Computational Biology/methods , DNA/genetics , DNA/metabolism , Gene Expression Regulation , Humans , K562 Cells , Nucleotide Motifs , Protein Binding , Transcription Factors/classification , Transcription Factors/metabolism
16.
Bioinformatics ; 35(21): 4474-4477, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31116375

ABSTRACT

MOTIVATION: Metagenomic and metatranscriptomic analyses can provide an abundance of information related to microbial communities. However, straightforward analysis of these data does not provide optimal results; the two data types need to be integrated to thoroughly investigate these microbiomes and their environmental interactions. RESULTS: Here, we present MetaQUBIC, an integrated biclustering-based computational pipeline for gene module detection that integrates both metagenomic and metatranscriptomic data. Additionally, we used this pipeline to investigate 735 paired DNA and RNA human gut microbiome samples, resulting in a comprehensive hybrid gene expression matrix of 2.3 million cross-species genes in the 735 human fecal samples and 155 functionally enriched gene modules. We believe both the MetaQUBIC pipeline and the generated comprehensive human gut hybrid expression matrix will facilitate further investigations into multiple levels of microbiome studies. AVAILABILITY AND IMPLEMENTATION: The package is freely available at https://github.com/OSU-BMBL/metaqubic. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gastrointestinal Microbiome , Metagenome , Feces , Humans , Metagenomics , Transcriptome
17.
Bioinformatics ; 35(14): 2395-2402, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30520961

ABSTRACT

MOTIVATION: The prediction of protein-protein interaction (PPI) sites is key to mutation design, catalytic reaction and the reconstruction of PPI networks. It is a challenging task considering the abundance of sequences and the class imbalance among samples. RESULTS: A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI sites prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and SMOTE was applied to oversample interface residues in the feature space to address the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented to reduce feature redundancy and select a feature subset. Finally, Random Forest classifiers were applied to build the ensemble learning model, and the optimal feature vectors were fed into EL-SMURF to predict PPI sites. The performance validation of EL-SMURF on two independent validation datasets showed 77.1% and 77.7% accuracy, which were 6.2-15.7% and 6.1-18.9% higher than the other existing tools, respectively. AVAILABILITY AND IMPLEMENTATION: The source codes and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software
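
The abstract above applies SMOTE to oversample the minority class (interface residues) in feature space. The following is a minimal sketch of the basic SMOTE interpolation step in plain Python, written for illustration only. It is not the EL-SMURF implementation; the function name `smote_oversample` and the parameters `k` and `seed` are assumptions of this sketch.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper: each synthetic point lies on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (squared Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank all other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate with a random factor in [0, 1).
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    interface = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote_oversample(interface, n_new=3, k=2):
        print(point)
```

Because every synthetic sample is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class instead of duplicating existing points.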
18.
Molecules ; 23(10)2018 Oct 13.
Article in English | MEDLINE | ID: mdl-30322177

ABSTRACT

Overlapping structures of protein-protein interaction networks are very prevalent in different biological processes and reflect the mechanism of sharing common functional components. The overlapping community detection (OCD) algorithm based on central node selection (CNS) is a traditional and accepted algorithm for OCD in networks. The main components of CNS are the central node selection and the clustering procedure. However, the original CNS does not consider the influence among the nodes or the importance of the division of the edges in networks. In this paper, an OCD algorithm based on a central edge selection (CES) algorithm is proposed for the detection of overlapping communities in protein-protein interaction (PPI) networks. Unlike the traditional CNS algorithms for OCD, the proposed algorithm uses community magnetic interference (CMI) to obtain more reasonable central edges in the process of CES, and employs a new distance between a non-central edge and the set of central edges to assign the non-central edge to the correct cluster during the clustering procedure. In addition, the proposed CES improves the strategy of overlapping nodes pruning (ONP) to make the division more precise. The experimental results on three benchmark networks and three biological PPI networks of Mus musculus, Escherichia coli, and Saccharomyces cerevisiae show that the CES algorithm performs well.


Subjects
Computational Biology/methods , Escherichia coli/metabolism , Protein Interaction Mapping/methods , Saccharomyces cerevisiae/metabolism , Algorithms , Animals , Escherichia coli Proteins/metabolism , Mice , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism
20.
Patterns (N Y) ; 5(3): 100927, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38487805

ABSTRACT

In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a "bookend" model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA's performance against seven established tools. The results indicate TESA's improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.
