1.
Nat Immunol ; 18(11): 1249-1260, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28892471

ABSTRACT

Interleukin 2 (IL-2) promotes Foxp3+ regulatory T (Treg) cell responses, but inhibits T follicular helper (TFH) cell development. However, it is not clear how IL-2 affects T follicular regulatory (TFR) cells, a cell type with properties of both Treg and TFH cells. Using an influenza infection model, we found that high IL-2 concentrations at the peak of the infection prevented TFR cell development by a Blimp-1-dependent mechanism. However, once the immune response resolved, some Treg cells downregulated CD25, upregulated Bcl-6 and differentiated into TFR cells, which then migrated into the B cell follicles to prevent the expansion of self-reactive B cell clones. Thus, unlike its effects on conventional Treg cells, IL-2 inhibits TFR cell responses.


Subject(s)
Interleukin-2/pharmacology , Orthomyxoviridae Infections/immunology , Orthomyxoviridae/immunology , T-Lymphocytes, Helper-Inducer/drug effects , T-Lymphocytes, Regulatory/drug effects , Animals , Cell Movement/genetics , Cell Movement/immunology , Forkhead Transcription Factors/genetics , Forkhead Transcription Factors/immunology , Forkhead Transcription Factors/metabolism , Gene Expression Profiling/methods , Host-Pathogen Interactions/drug effects , Host-Pathogen Interactions/immunology , Interleukin-2/administration & dosage , Interleukin-2/metabolism , Interleukin-2 Receptor alpha Subunit/genetics , Interleukin-2 Receptor alpha Subunit/metabolism , Mice, Inbred C57BL , Mice, Knockout , Mice, Transgenic , Orthomyxoviridae/physiology , Orthomyxoviridae Infections/metabolism , Orthomyxoviridae Infections/virology , Positive Regulatory Domain I-Binding Factor 1 , Proto-Oncogene Proteins c-bcl-6/genetics , Proto-Oncogene Proteins c-bcl-6/metabolism , T-Lymphocytes, Helper-Inducer/immunology , T-Lymphocytes, Helper-Inducer/metabolism , T-Lymphocytes, Regulatory/immunology , T-Lymphocytes, Regulatory/metabolism , Transcription Factors/genetics , Transcription Factors/immunology , Transcription Factors/metabolism
2.
Immunity ; 50(1): 225-240.e4, 2019 01 15.
Article in English | MEDLINE | ID: mdl-30635238

ABSTRACT

Infants have a higher risk of developing allergic asthma than adults. However, the underlying mechanism remains unknown. We show here that sensitization of mice with house-dust mites (HDMs) in the presence of low-dose lipopolysaccharide (LPS) prevented T helper 2 (Th2) cell allergic responses in adult, but not infant, mice. Mechanistically, adult CD11b+ migratory dendritic cells (mDCs) upregulated the transcription factor T-bet in response to tumor necrosis factor-α (TNF-α), which was rapidly induced after HDM + LPS sensitization. Consequently, adult CD11b+ mDCs produced interleukin-12 (IL-12), which prevented Th2 cell development by promoting T-bet upregulation in responding T cells. Conversely, infant mice failed to induce TNF-α after HDM + LPS sensitization. Therefore, their CD11b+ mDCs failed to upregulate T-bet and did not secrete IL-12, and Th2 cell responses developed normally in infant mice. Thus, the availability of TNF-α dictates the ability of CD11b+ mDCs to suppress allergic Th2-cell responses upon dose-dependent endotoxin sensitization and is a key mediator governing susceptibility to allergic airway inflammation in infant mice.


Subject(s)
Dendritic Cells/physiology , Hypersensitivity/immunology , Inflammation/immunology , Th2 Cells/immunology , Tumor Necrosis Factor-alpha/metabolism , Adult , Animals , Animals, Newborn , Antigens, Dermatophagoides , Cell Differentiation , Humans , Immunization , Infant , Lipopolysaccharides/immunology , Mice , Mice, Inbred C57BL , Mice, Knockout , Pyroglyphidae/immunology , T-Box Domain Proteins/metabolism
3.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36617463

ABSTRACT

DNA and RNA sequencing technologies have revolutionized biology and biomedical sciences, sequencing full genomes and transcriptomes at very high speeds and reasonably low costs. RNA sequencing (RNA-Seq) enables transcript identification and quantification, but once sequencing has concluded researchers can be easily overwhelmed with questions such as how to go from raw data to differential expression (DE), pathway analysis and interpretation. Several pipelines and procedures have been developed to this effect. Even though there is no unique way to perform RNA-Seq analysis, it usually follows these steps: 1) raw reads quality check, 2) alignment of reads to a reference genome, 3) aligned reads' summarization according to an annotation file, 4) DE analysis and 5) gene set analysis and/or functional enrichment analysis. Each step requires researchers to make decisions, and the wide variety of options and resulting large volumes of data often lead to interpretation challenges. There also seems to be insufficient guidance on how best to obtain relevant information and derive actionable knowledge from transcription experiments. In this paper, we explain RNA-Seq steps in detail and outline differences and similarities of different popular options, as well as advantages and disadvantages. We also discuss non-coding RNA analysis, multi-omics, meta-transcriptomics and the use of artificial intelligence methods complementing the arsenal of tools available to researchers. Lastly, we perform a complete analysis from raw reads to DE and functional enrichment analysis, visually illustrating how results are not absolute truths and how algorithmic decisions can greatly impact results and interpretation.
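
To make step 4 of the workflow above concrete, the following is a minimal sketch of the arithmetic behind a differential expression comparison: counts-per-million normalization followed by a per-gene log2 fold change. It is not the analysis performed in the paper. The gene names, the counts and the pseudocount of 1.0 are hypothetical, and dedicated DE tools fit statistical models over replicates rather than comparing two samples directly.

```python
import math

def counts_per_million(counts):
    """Scale the raw read counts of one sample so that they sum to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control) on CPM values.

    The pseudocount (an assumption of this sketch) avoids taking log of zero.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in control
    }

# Hypothetical summarized read counts (the output of step 3) for two samples.
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 300}

lfc = log2_fold_change(control, treated)
for gene, value in sorted(lfc.items()):
    print(f"{gene}\t{value:+.3f}")
```

Changing the normalization or the pseudocount changes the fold changes, which is one small instance of the abstract's point that algorithmic decisions affect results.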


Subject(s)
Artificial Intelligence , Gene Expression Profiling , Gene Expression Profiling/methods , Transcriptome , Sequence Analysis, RNA/methods , Genome , High-Throughput Nucleotide Sequencing/methods , RNA/genetics
4.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32599617

ABSTRACT

Virulence factors (VFs) enable pathogens to infect their hosts. A wealth of individual, disease-focused studies has identified a wide variety of VFs, and the growing mass of bacterial genome sequence data provides an opportunity for computational methods aimed at predicting VFs. Despite their attractive advantages and performance improvements, the existing methods have some limitations and drawbacks. Firstly, as the characteristics and mechanisms of VFs are continually evolving with the emergence of antibiotic resistance, it is increasingly difficult to identify novel VFs using existing tools that were developed on outdated data sets; secondly, few systematic feature engineering efforts have been made to examine the utility of different types of features for model performance, as the majority of tools only focused on extracting very few types of features. By addressing the aforementioned issues, the accuracy of VF predictors can likely be significantly improved. This, in turn, would be particularly useful in the context of genome-wide predictions of VFs. In this work, we present a deep learning (DL)-based hybrid framework (termed DeepVF) that uses a stacking strategy to achieve more accurate identification of VFs. Using an enlarged, up-to-date dataset, DeepVF comprehensively explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms (random forest, support vector machines, extreme gradient boosting and multilayer perceptron) and three DL algorithms (convolutional neural networks, long short-term memory networks and deep neural networks) are employed to train 62 baseline models using these features. In order to integrate their individual strengths, DeepVF effectively combines these baseline models to construct the final meta model using the stacking strategy.
Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves a more accurate and stable performance compared with baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test. Using the proposed hybrid ensemble model, a user-friendly online predictor of DeepVF (http://deepvf.erc.monash.edu/) is implemented. Furthermore, its utility, from the user's viewpoint, is compared with that of existing toolkits. We believe that DeepVF will be exploited as a useful tool for screening and identifying potential VFs from protein-coding gene sequences in bacterial genomes.
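
The stacking strategy described above can be sketched in a few lines: the probabilities produced by the baseline models become the input features of a second-level (meta) model. This is a minimal illustration, not DeepVF's implementation. The three columns of probabilities, the labels, the logistic-regression meta learner and its learning rate are all assumptions made for this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta_model(base_outputs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta model on baseline-model probabilities.

    Each row of base_outputs holds one sample's predictions from every
    baseline model; in practice these should be out-of-fold predictions.
    """
    weights = [0.0] * len(base_outputs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(base_outputs, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict_meta(weights, bias, x):
    """Probability that a sample is positive, given its baseline predictions."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical predictions from three baseline models for four proteins.
base_outputs = [
    [0.9, 0.8, 0.4],
    [0.8, 0.7, 0.6],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.6],
]
labels = [1, 1, 0, 0]  # 1 = virulence factor, 0 = not

weights, bias = train_meta_model(base_outputs, labels)
print(round(predict_meta(weights, bias, [0.85, 0.9, 0.5]), 3))
```

The meta model learns how much to trust each baseline model, so an uninformative model (the third column here) ends up with little influence on the final prediction.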


Subject(s)
Bacteria , Bacterial Proteins/genetics , Databases, Protein , Deep Learning , Genome, Bacterial , Virulence Factors/genetics , Bacteria/genetics , Bacteria/pathogenicity
5.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33212503

ABSTRACT

Beta-lactamases (BLs) are enzymes localized in the periplasmic space of bacterial pathogens, where they confer resistance to beta-lactam antibiotics. Experimental identification of BLs is costly yet crucial to understand beta-lactam resistance mechanisms. To address this issue, we present DeepBL, a deep learning-based approach by incorporating sequence-derived features to enable high-throughput prediction of BLs. Specifically, DeepBL is implemented based on the Small VGGNet architecture and the TensorFlow deep learning library. Furthermore, the performance of DeepBL models is investigated in relation to the sequence redundancy level and negative sample selection in the benchmark dataset. The models are trained on datasets of varying sequence redundancy thresholds, and the model performance is evaluated by extensive benchmarking tests. Using the optimized DeepBL model, we perform proteome-wide screening for all reviewed bacterium protein sequences available from the UniProt database. These results are freely accessible at the DeepBL webserver at http://deepbl.erc.monash.edu.au/.


Subject(s)
Computational Biology , Databases, Protein , Deep Learning , Proteome , Software , beta-Lactamases/genetics
6.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33774670

ABSTRACT

Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years due to antimicrobial resistance, which is becoming an emerging global concern. Systematic experimental identification of AMPs faces many difficulties due to the limitations of current methods. Given its significance, more than 30 computational methods have been developed for accurate prediction of AMPs. These approaches show high diversity in their data set size, data quality, core algorithms, feature extraction, feature selection techniques and evaluation strategies. Here, we provide a comprehensive survey on a variety of current approaches for AMP identification and point at the differences between these methods. In addition, we evaluate the predictive performance of the surveyed tools based on an independent test data set containing 1536 AMPs and 1536 non-AMPs. Furthermore, we construct six validation data sets based on six different common AMP databases and compare different computational methods based on these data sets. The results indicate that amPEPpy achieves the best predictive performance and outperforms the other compared methods. As the predictive performances are affected by the different data sets used by different methods, we additionally perform the 5-fold cross-validation test to benchmark different traditional machine learning methods on the same data set. These cross-validation results indicate that random forest, support vector machine and eXtreme Gradient Boosting achieve comparatively better performances than other machine learning methods and are often the algorithms of choice of multiple AMP prediction tools.
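
The 5-fold cross-validation mentioned above partitions one data set into five folds and uses each fold once for testing while training on the other four. The sketch below shows only the index bookkeeping. The function name, the fixed seed and the sample count of 20 are assumptions for illustration, and a real benchmark would usually also stratify the folds by class.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical data set of 20 peptides, identified only by their index.
splits = list(k_fold_indices(20, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, sorted(test))
```

Evaluating every method on the same splits is what makes the comparison fair, since each method then sees identical training and test peptides.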


Subject(s)
Algorithms , Computational Biology/methods , Machine Learning , Pore Forming Cytotoxic Proteins/pharmacology , Bacteria/classification , Bacteria/drug effects , Biofilms/drug effects , Biofilms/growth & development , Databases, Factual , Fungi/classification , Fungi/drug effects , Pore Forming Cytotoxic Proteins/classification , Pore Forming Cytotoxic Proteins/metabolism , Support Vector Machine , Viruses/drug effects
7.
Nucleic Acids Res ; 49(D1): D651-D659, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33084862

ABSTRACT

Gram-negative bacteria utilize secretion systems to export substrates into their surrounding environment or directly into neighboring cells. These substrates are proteins that function to promote bacterial survival: by facilitating nutrient collection, disabling competitor species or, for pathogens, to disable host defenses. Following a rapid development of computational techniques, a growing number of substrates have been discovered and subsequently validated by wet lab experiments. To date, several online databases have been developed to catalogue these substrates but they have limited user options for in-depth analysis, and typically focus on a single type of secreted substrate. We therefore developed a universal platform, BastionHub, that incorporates extensive functional modules to facilitate substrate analysis and integrates the five major Gram-negative secreted substrate types (i.e. from types I-IV and VI secretion systems). To our knowledge, BastionHub is not only the most comprehensive online database available, it is also the first to incorporate substrates secreted by type I or type II secretion systems. By providing the most up-to-date details of secreted substrates and state-of-the-art prediction and visualized relationship analysis tools, BastionHub will be an important platform that can assist biologists in uncovering novel substrates and formulating new hypotheses. BastionHub is freely available at http://bastionhub.erc.monash.edu/.


Subject(s)
Databases as Topic , Gram-Negative Bacteria/metabolism , Data Curation , Molecular Sequence Annotation , Substrate Specificity
8.
Brief Bioinform ; 21(4): 1119-1135, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31204427

ABSTRACT

Human leukocyte antigen class I (HLA-I) molecules are encoded by major histocompatibility complex (MHC) class I loci in humans. The binding and interaction between HLA-I molecules and intracellular peptides derived from a variety of proteolytic mechanisms play a crucial role in subsequent T-cell recognition of target cells and the specificity of the immune response. In this context, tools that predict the likelihood for a peptide to bind to specific HLA class I allotypes are important for selecting the most promising antigenic targets for immunotherapy. In this article, we comprehensively review a variety of currently available tools for predicting the binding of peptides to a selection of HLA-I allomorphs. Specifically, we compare their calculation methods for the prediction score, employed algorithms, evaluation strategies and software functionalities. In addition, we have evaluated the prediction performance of the reviewed tools based on an independent validation data set, containing 21 101 experimentally verified ligands across 19 HLA-I allotypes. The benchmarking results show that MixMHCpred 2.0.1 achieves the best performance for predicting peptides binding to most of the HLA-I allomorphs studied, while NetMHCpan 4.0 and NetMHCcons 1.1 outperform the other machine learning-based and consensus-based tools, respectively. Importantly, it should be noted that a peptide predicted with a higher binding score for a specific HLA allotype does not necessarily imply it will be immunogenic. That said, peptide-binding predictors are still very useful in that they can help to significantly reduce the large number of epitope candidates that need to be experimentally verified. Several other factors, including susceptibility to proteasome cleavage, peptide transport into the endoplasmic reticulum and T-cell receptor repertoire, also contribute to the immunogenicity of peptide antigens, and some of them can be considered by some predictors. 
Therefore, integrating features derived from these additional factors together with HLA-binding properties by using machine-learning algorithms may increase the prediction accuracy of immunogenic peptides. As such, we anticipate that this review and benchmarking survey will assist researchers in selecting appropriate prediction tools that best suit their purposes and provide useful guidelines for the development of improved antigen predictors in the future.


Subject(s)
Computational Biology/methods , Histocompatibility Antigens Class I/metabolism , Algorithms , Datasets as Topic , Histocompatibility Antigens Class I/chemistry , Humans , Machine Learning , Reproducibility of Results
9.
Brief Bioinform ; 21(3): 1069-1079, 2020 05 21.
Article in English | MEDLINE | ID: mdl-31161204

ABSTRACT

Post-translational modifications (PTMs) play very important roles in various cell signaling pathways and biological processes. Because of these roles, many major PTMs have been studied, and their functional and mechanistic characterization is well documented in several databases. However, most currently available databases mainly focus on protein sequences, while the real 3D structures of PTMs have been largely ignored. Therefore, studies of PTM 3D structural signatures have been severely limited by the deficiency of the data. Here, we develop PRISMOID, a novel publicly available and free 3D structure database for a wide range of PTMs. PRISMOID represents an up-to-date and interactive online knowledge base with specific focus on the 3D structural contexts of PTM sites and of functionally relevant mutations that occur at or in close proximity to PTM sites. The first version of PRISMOID encompasses 17 145 non-redundant modification sites on 3919 related protein 3D structure entries pertaining to 37 different types of PTMs. Our entry web page is organized in a comprehensive manner, including detailed PTM annotation on the 3D structure and biological information in terms of mutations affecting PTMs, secondary structure features and per-residue solvent accessibility features of PTM sites, domain context, predicted natively disordered regions and sequence alignments. In addition, high-definition JavaScript packages are employed to enhance information visualization in PRISMOID. PRISMOID provides a variety of interactive and customizable search options and data browsing functions; these capabilities allow users to access data via keyword, ID and advanced options combination search in an efficient and user-friendly way. A download page is also provided to enable users to download the SQL file, computational structural features and PTM sites' data.
We anticipate PRISMOID will swiftly become an invaluable online resource, assisting both biologists and bioinformaticians to conduct experiments and develop applications supporting discovery efforts in the sequence-structural-functional relationship of PTMs and providing important insight into mutations and PTM sites interaction mechanisms. The PRISMOID database is freely accessible at http://prismoid.erc.monash.edu/. The database and web interface are implemented in MySQL, JSP, JavaScript and HTML with all major browsers supported.


Subject(s)
Databases, Protein , Mutation , Protein Processing, Post-Translational , Proteins/chemistry , Protein Conformation
10.
Brief Bioinform ; 21(3): 1047-1057, 2020 05 21.
Article in English | MEDLINE | ID: mdl-31067315

ABSTRACT

With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this to date; however, all these tools have their limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit, integrating the functionality of feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users that only want to upload their data set and select the functions they need calculated from it, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and four feature output formats are supported so as to facilitate direct output usage or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is made freely available via an online web server and a stand-alone toolkit.
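
As an illustration of what a sequence descriptor is, the sketch below computes amino acid composition, one of the simplest fixed-length encodings of a protein sequence. It does not use iLearn or reproduce its interface. The function name and the example peptide are hypothetical, and residues outside the 20 standard amino acids are simply not counted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies for one protein."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]

# Hypothetical peptide used only to show the shape of the output.
features = amino_acid_composition("MKKLLA")
for aa, value in zip(AMINO_ACIDS, features):
    if value:
        print(aa, round(value, 3))
```

Because every sequence maps to a vector of the same length, descriptors of this kind can be passed directly to clustering, feature selection or machine-learning steps.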


Subject(s)
DNA/chemistry , Machine Learning , Proteins/chemistry , RNA/chemistry , Sequence Analysis/methods , Algorithms , Internet
11.
Brief Bioinform ; 20(6): 2185-2199, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30351377

ABSTRACT

As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. 
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
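
The AUC values quoted above summarize how well a predictor ranks true sites above non-sites. A minimal way to compute it, assuming binary labels and one score per candidate, is the pairwise formulation sketched below. The labels and scores are invented for illustration and are unrelated to the paper's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive is scored higher
    than a randomly chosen negative, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical predictions for six candidate lysine sites (1 = malonylated).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; rank-based formulations are preferable for large test sets.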


Subject(s)
Computational Biology , Lysine/metabolism , Machine Learning , Malonates/metabolism , Animals , Humans
12.
Brief Bioinform ; 20(6): 2150-2166, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30184176

ABSTRACT

The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. 
We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.


Subject(s)
Benchmarking , Computational Biology , Peptide Hydrolases/metabolism , Research , Algorithms , Machine Learning , Substrate Specificity
13.
Bioinformatics ; 36(3): 704-712, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31393553

ABSTRACT

MOTIVATION: Gram-positive bacteria have developed secretion systems to transport proteins across their cell wall, a process that plays an important role during host infection. These secretion mechanisms have also been harnessed for therapeutic purposes in many biotechnology applications. Accordingly, the identification of features that select a protein for efficient secretion from these microorganisms has become an important task. Among all the secreted proteins, 'non-classical' secreted proteins are difficult to identify as they lack discernable signal peptide sequences and can make use of diverse secretion pathways. Currently, several computational methods have been developed to facilitate the discovery of such non-classical secreted proteins; however, the existing methods are based on either simulated or limited experimental datasets. In addition, they often employ basic features to train the models in a simple and coarse-grained manner. The availability of more experimentally validated datasets, advanced feature engineering techniques and novel machine learning approaches creates new opportunities for the development of improved predictors of 'non-classical' secreted proteins from sequence data. RESULTS: In this work, we first constructed a high-quality dataset of experimentally verified 'non-classical' secreted proteins, which we then used to create benchmark datasets. Using these benchmark datasets, we comprehensively analyzed a wide range of features and assessed their individual performance. Subsequently, we developed a two-layer Light Gradient Boosting Machine (LightGBM) ensemble model that integrates several single feature-based models into an overall prediction framework. At this stage, LightGBM, a gradient boosting machine, was used as a machine learning approach and the necessary parameter optimization was performed by a particle swarm optimization strategy. 
All single feature-based LightGBM models were then integrated into a unified ensemble model to further improve the predictive performance. Consequently, the final ensemble model achieved a superior performance with an accuracy of 0.900, an F-value of 0.903, a Matthews correlation coefficient of 0.803 and an area under the curve value of 0.963, outperforming previous state-of-the-art predictors on the independent test. Based on our proposed optimal ensemble model, we further developed an accessible online predictor, PeNGaRoo, to serve users' demands. We believe this online web server, together with our proposed methodology, will expedite the discovery of non-classically secreted effector proteins in Gram-positive bacteria and further inspire the development of next-generation predictors. AVAILABILITY AND IMPLEMENTATION: http://pengaroo.erc.monash.edu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
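
For readers unfamiliar with the metrics reported above, the sketch below shows how accuracy, F-value and the Matthews correlation coefficient follow from the four cells of a binary confusion matrix. The counts are hypothetical and chosen for easy arithmetic; they are not the confusion matrix behind the published figures.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F-value (F1) and Matthews correlation coefficient.

    Assumes every denominator is non-zero, which holds whenever both classes
    occur among the true labels and among the predictions.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_value = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "f_value": f_value, "mcc": mcc}

# Hypothetical independent test with 50 secreted and 50 non-secreted proteins.
metrics = binary_metrics(tp=45, fp=5, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells symmetrically, so it stays informative when the two classes are unbalanced.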


Subject(s)
Algorithms , Machine Learning , Computational Biology , Peptides , Proteins
14.
Bioinformatics ; 35(17): 2957-2965, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30649179

ABSTRACT

MOTIVATION: Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, the precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions. RESULTS: In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly took into account the sequences themselves, including both local information such as k-tuple nucleotide composition, dinucleotide-based auto covariance and global information of the entire samples based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best unique type of feature prediction results, in combination with other types of features that were subsequently added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly can achieve a better prediction performance on 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be used as a useful tool that will facilitate the discovery of both general and specific types of promoters in the post-genomic era. AVAILABILITY AND IMPLEMENTATION: The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
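
Of the encodings named above, k-tuple nucleotide composition is the easiest to show in code: it is the normalized frequency of every possible k-mer in a sequence. The sketch below is a generic illustration and not MULTiPly's code. The function name, the choice k=2 and the example sequence are assumptions, and windows containing ambiguous bases such as N are skipped.

```python
from itertools import product

def kmer_composition(sequence, k=2):
    """Normalized frequency of every k-mer over the alphabet A, C, G, T."""
    seq = sequence.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # ignore windows with non-ACGT characters
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}

# Hypothetical six-base fragment; real promoter samples are much longer.
composition = kmer_composition("ATATGC", k=2)
for kmer, freq in composition.items():
    if freq:
        print(kmer, round(freq, 2))
```

The resulting vector has 4^k entries regardless of sequence length, which is what allows sequences of different lengths to share one feature space.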


Subject(s)
Genomics , Promoter Regions, Genetic , Software , Bayes Theorem , Transcription Initiation Site
15.
Bioinformatics ; 35(12): 2017-2028, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30388198

ABSTRACT

MOTIVATION: Type III secreted effectors (T3SEs) can be injected into host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Due to their relevance in pathogen-host interactions, significant computational efforts have been put toward identification of T3SEs and these in turn have stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, these existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (or incorporating also the C-terminus) instead of the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employed only few features, most of which were extracted based on sequence-information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to effectively integrate these into a comprehensive model. RESULTS: In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features, from various types, trains single models based on these features and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM and further boosted the models' performances through a novel genetic algorithm (GA) based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 achieves a much better performance compared to commonly used methods, with an ACC value of 0.959, F-value of 0.958, MCC value of 0.917 and AUC value of 0.956, which comprehensively outperformed all other toolkits by more than 5.6% in ACC value, 5.7% in F-value, 12.4% in MCC value and 5.8% in AUC value. 
Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit, maximizing convenience for experimental scientists toward T3SE prediction. With its design to ease future discoveries of novel T3SEs and improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction. AVAILABILITY AND IMPLEMENTATION: http://bastion3.erc.monash.edu/. CONTACT: selkrig@embl.de or wyztli@163.com or trevor.lithgow@monash.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
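
The evaluation measures quoted in this abstract (ACC, F-value, MCC) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, assuming that "F-value" refers to the usual F1 score; the function name `binary_metrics` and the example counts are illustrative and are not taken from the Bastion3 paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return ACC, F-value (assumed to be F1) and MCC from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    acc = (tp + tn) / total
    # Precision and recall fall back to 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f_value = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient; defined as 0.0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F": f_value, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    print(binary_metrics(tp=45, fp=5, tn=45, fn=5))
```

AUC is omitted here because it depends on ranked prediction scores rather than on a single confusion matrix.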


Subject(s)
Machine Learning , Algorithms , Amino Acid Sequence , Bacterial Proteins , Computational Biology , Gram-Negative Bacteria , Software
16.
Bioinformatics ; 34(24): 4223-4231, 2018 12 15.
Article in English | MEDLINE | ID: mdl-29947803

ABSTRACT

Motivation: Kinase-regulated phosphorylation is a ubiquitous type of post-translational modification (PTM) in both eukaryotic and prokaryotic cells. Phosphorylation plays fundamental roles in many signalling pathways and biological processes, such as protein degradation and protein-protein interactions. Experimental studies have revealed that signalling defects caused by aberrant phosphorylation are highly associated with a variety of human diseases, especially cancers. In light of this, a number of computational methods aiming to accurately predict protein kinase family-specific or kinase-specific phosphorylation sites have been established, thereby facilitating phosphoproteomic data analysis. Results: In this work, we present Quokka, a novel bioinformatics tool that allows users to rapidly and accurately identify human kinase family-regulated phosphorylation sites. Quokka was developed by using a variety of sequence scoring functions combined with an optimized logistic regression algorithm. We evaluated Quokka based on well-prepared up-to-date benchmark and independent test datasets, curated from the Phospho.ELM and UniProt databases, respectively. The independent test demonstrates that Quokka improves the prediction performance compared with state-of-the-art computational tools for phosphorylation prediction. In summary, our tool provides users with high-quality predicted human phosphorylation sites for hypothesis generation and biological validation. Availability and implementation: The Quokka webserver and datasets are freely available at http://quokka.erc.monash.edu/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Proteome , Proteomics , Animals , Humans , Phosphorylation , Protein Processing, Post-Translational , Proteome/metabolism , Proteomics/methods
17.
Bioinformatics ; 34(15): 2546-2555, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29547915

ABSTRACT

Motivation: Many Gram-negative bacteria use type VI secretion systems (T6SS) to export effector proteins into adjacent target cells. These secreted effectors (T6SEs) play vital roles in the competitive survival in bacterial populations, as well as pathogenesis of bacteria. Although various computational analyses have been previously applied to identify effectors secreted by certain bacterial species, there is no universal method available to accurately predict T6SS effector proteins from the growing tide of bacterial genome sequence data. Results: We extracted a wide range of features from T6SE protein sequences and comprehensively analyzed the prediction performance of these features through unsupervised and supervised learning. By integrating these features, we subsequently developed a two-layer SVM-based ensemble model with fine-grain optimized parameters, to identify potential T6SEs. We further validated the predictive model using an independent dataset, which showed that the proposed model achieved an impressive performance in terms of ACC (0.943), F-value (0.946), MCC (0.892) and AUC (0.976). To demonstrate applicability, we employed this method to correctly identify two very recently validated T6SE proteins, which represent challenging prediction targets because they significantly differed from previously known T6SEs in terms of their sequence similarity and cellular function. Furthermore, a genome-wide prediction across 12 bacterial species, involving in total 54 212 protein sequences, was carried out to distinguish 94 putative T6SE candidates. We envisage both this information and our publicly accessible web server will facilitate future discoveries of novel T6SEs. Availability and implementation: http://bastion6.erc.monash.edu/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Bacterial Proteins/metabolism , Gram-Negative Bacteria/metabolism , Sequence Analysis, Protein/methods , Software , Type VI Secretion Systems/metabolism , Amino Acid Sequence , Bacterial Proteins/chemistry , Computational Biology/methods , Internet , Machine Learning , Sequence Analysis, DNA/methods , Type VI Secretion Systems/chemistry
18.
Bioinformatics ; 34(14): 2499-2502, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29528364

ABSTRACT

Summary: Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection and dimensionality reduction algorithms, greatly facilitating training, analysis and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. Availability and implementation: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Supplementary information: Supplementary data are available at Bioinformatics online.
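
To illustrate what a numerical feature representation of a protein sequence can look like, the sketch below computes amino acid composition, one of the simplest sequence encodings: the fraction of each of the 20 standard residues. It is written from scratch for illustration; the function name `amino_acid_composition` is hypothetical, and the code does not reproduce iFeature's own interface or implementation.

```python
from collections import Counter

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard residues (for example 'X' or gap
    symbols) are ignored when counting.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy peptide, used only to show the shape of the output.
    vector = amino_acid_composition("AACD")
    print(dict(zip(AMINO_ACIDS, vector)))
```

A fixed-length vector like this can be passed directly to most machine-learning libraries, regardless of the length of the input sequence.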


Subject(s)
Molecular Sequence Annotation , Peptides/metabolism , Proteins/metabolism , Sequence Analysis, Protein/methods , Software , Machine Learning , Peptides/chemistry , Peptides/physiology , Protein Conformation , Proteins/chemistry , Proteins/physiology
19.
Bioinformatics ; 34(4): 684-687, 2018 02 15.
Article in English | MEDLINE | ID: mdl-29069280

ABSTRACT

Summary: Proteases are enzymes that specifically cleave the peptide backbone of their target proteins. As an important type of irreversible post-translational modification, protein cleavage underlies many key physiological processes. When dysregulated, proteases' actions are associated with numerous diseases. Many proteases are highly specific, cleaving only those target substrates that present certain particular amino acid sequence patterns. Therefore, tools that successfully identify potential target substrates for proteases may also identify previously unknown, physiologically relevant cleavage sites, thus providing insights into biological processes and guiding hypothesis-driven experiments aimed at verifying protease-substrate interaction. In this work, we present PROSPERous, a tool for rapid in silico prediction of protease-specific cleavage sites in substrate sequences. Our tool is based on logistic regression models and uses different scoring functions and their pairwise combinations to subsequently predict potential cleavage sites. PROSPERous represents a state-of-the-art tool that enables fast, accurate and high-throughput prediction of substrate cleavage sites for 90 proteases. Availability and implementation: http://prosperous.erc.monash.edu/. Contact: jiangning.song@monash.edu or geoff.webb@monash.edu or r.pike@latrobe.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
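
The general idea of scanning a substrate for candidate cleavage sites and combining several scores through a logistic model can be sketched as follows. This is an illustrative outline only: the window size of 8 residues, the helper names `windows` and `logistic_score`, and the weights in the example are all assumptions and are not taken from PROSPERous.

```python
import math


def windows(sequence, size=8):
    """Yield (position, window) for every full-length window.

    `position` is the 0-based index of the first residue after the candidate
    scissile bond, which is placed at the midpoint of the window.
    """
    half = size // 2
    for start in range(len(sequence) - size + 1):
        yield start + half, sequence[start:start + size]


def logistic_score(scores, weights, bias):
    """Combine individual scores into a probability-like value in (0, 1)."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical scoring functions would be applied to each window; here
    # two placeholder scores are combined with made-up weights.
    for position, window in windows("ABCDEFGHIJ"):
        print(position, window, logistic_score([0.2, 0.7], [1.5, 0.8], -1.0))
```

In a real predictor, the weights and bias would be fitted per protease on experimentally verified cleavage sites.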


Subject(s)
Peptide Hydrolases/metabolism , Sequence Analysis, Protein/methods , Software , Computational Biology/methods , Computer Simulation , Data Accuracy , Proteolysis , Substrate Specificity
20.
Bioinformatics ; 33(17): 2756-2758, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28903538

ABSTRACT

SUMMARY: Evolutionary information in the form of a Position-Specific Scoring Matrix (PSSM) is a widely used and highly informative representation of protein sequences. Accordingly, PSSM-based feature descriptors have been successfully applied to improve the performance of various predictors of protein attributes. Even though a number of algorithms have been proposed in previous studies, there is currently no universal web server or toolkit available for generating this wide variety of descriptors. Here, we present POSSUM (Position-Specific Scoring matrix-based feature generator for machine learning), a versatile toolkit with an online web server that can generate 21 types of PSSM-based feature descriptors, thereby addressing a crucial need for bioinformaticians and computational biologists. We envisage that this comprehensive toolkit will be widely used as a powerful tool to facilitate feature extraction, selection, and benchmarking of machine learning-based models, thereby contributing to a more effective analysis and modeling pipeline for bioinformatics research. AVAILABILITY AND IMPLEMENTATION: http://possum.erc.monash.edu/. CONTACT: trevor.lithgow@monash.edu or jiangning.song@monash.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
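
As a small illustration of how a variable-length PSSM can be turned into a fixed-length descriptor, the sketch below averages each column of the matrix over the sequence length, a simple scheme often referred to as AAC-PSSM. The function name `aac_pssm` and the toy matrix are hypothetical, and whether POSSUM computes this descriptor in exactly this way is an assumption.

```python
def aac_pssm(pssm):
    """Average each PSSM column over all sequence positions.

    `pssm` is a list of rows, one per residue in the sequence; each row holds
    one score per amino acid type (20 columns in a real PSSM).
    """
    if not pssm:
        raise ValueError("PSSM has no rows")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("PSSM rows have inconsistent lengths")
    length = len(pssm)
    return [sum(row[j] for row in pssm) / length for j in range(n_cols)]


if __name__ == "__main__":
    # Toy matrix with 3 columns instead of 20, to keep the example readable.
    toy_pssm = [[1, 2, 3], [3, 4, 5]]
    print(aac_pssm(toy_pssm))
```

Because the result has one value per column, sequences of different lengths map to vectors of the same size.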


Subject(s)
Computational Biology/methods , Machine Learning , Position-Specific Scoring Matrices , Sequence Analysis, Protein/methods , Software