Results 1 - 7 of 7
1.
Bioinformatics ; 36(10): 3260-3262, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32096820

ABSTRACT

MOTIVATION: Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. The blurred boundary between imperfect TR motifs and non-repetitive sequences has created a need to validate the detected TRs. RESULTS: Tally-2.0 is a scoring tool based on a machine learning (ML) approach that validates the results of TR detection. It was upgraded by using improved training datasets and additional ML features. Tally-2.0 achieves 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%. AVAILABILITY AND IMPLEMENTATION: Tally-2.0 is available as a web tool and as a standalone application published under Apache License 2.0 at https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
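
For readers less familiar with the metrics quoted in this abstract, the following minimal sketch shows how sensitivity and specificity are derived from confusion-matrix counts. The function name and the counts are hypothetical, chosen only so that the arithmetic reproduces the reported rates; they are not data from the Tally-2.0 study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # fraction of true repeats that are accepted
    specificity = tn / (tn + fp)  # fraction of non-repeats that are rejected
    return sensitivity, specificity


# Hypothetical counts (assumption): 100 true TRs and 100 non-TR sequences.
sens, spec = sensitivity_specificity(tp=93, fn=7, tn=83, fp=17)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The area under the ROC curve is not computed here, since it requires the per-sequence scores rather than counts at a single threshold.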


Subjects
Algorithms , Tandem Repeat Sequences , Amino Acid Sequence , Machine Learning , Proteins/genetics , Software
2.
Genome Biol ; 20(1): 244, 2019 11 19.
Article in English | MEDLINE | ID: mdl-31744546

ABSTRACT

BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.


Subjects
Molecular Sequence Annotation/trends , Animals , Biofilms , Candida albicans/genetics , Drosophila melanogaster/genetics , Bacterial Genome , Fungal Genome , Humans , Locomotion , Long-Term Memory , Molecular Sequence Annotation/methods , Pseudomonas aeruginosa/genetics
3.
Amino Acids ; 51(8): 1187-1200, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31278492

ABSTRACT

Over the last decade, various machine learning (ML) and statistical approaches for protein-protein interaction (PPI) prediction have been developed to help annotate functional interactions among proteins, which are essential for our system-level understanding of life. Efficient ML approaches require informative and non-redundant features. In this paper, we introduce novel types of expert-crafted sequence, evolutionary and graph features and apply automatic feature engineering to further expand the feature space and improve predictive modeling. The two-step automatic feature-engineering process encompasses a hybrid method for feature generation and unsupervised feature selection, followed by supervised feature selection through a genetic algorithm (GA). The optimization of both steps allows the feature-engineering procedure to operate on a large transformed feature space at no considerable computational cost and to efficiently provide newly engineered features. Based on GA and correlation filtering, we developed a stacking algorithm, GA-STACK, for automatic ensembling of different ML algorithms to improve prediction performance. We introduced a unified method, HP-GAS, for the prediction of human PPIs, which incorporates GA-STACK and rests on both expert-crafted features and newly engineered features, the latter making up 40% of the feature set. Extensive cross-validation and comparison with state-of-the-art methods showed that HP-GAS is currently the most efficient method for proteome-wide forecasting of protein interactions, with a prediction efficacy of 0.93 AUC and 0.85 accuracy. We implemented the HP-GAS method as a free standalone application that is a time-efficient and easy-to-use tool. HP-GAS software with supplementary data can be downloaded from: http://www.vinca.rs/180/tools/HP-GAS.php .
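
The abstract names correlation filtering without describing it. As a rough illustration only, the sketch below shows one common generic form of the idea: greedily dropping any feature whose absolute Pearson correlation with an already-kept feature exceeds a threshold. The function names, the 0.9 threshold and the toy data are all assumptions; this is not the HP-GAS or GA-STACK procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / (sx * sy)


def correlation_filter(features, threshold=0.9):
    """Keep a feature only if |r| with every kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (assumption): f2 is an exact multiple of f1, so it is redundant.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.0],
    "f3": [1.0, -1.0, 1.0, -1.0],
}
print(correlation_filter(features))
```

The result depends on the order in which features are visited, which is one reason such a filter is usually paired with a supervised selection step.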


Subjects
Algorithms , Computational Biology/methods , Machine Learning , Protein Interaction Mapping/methods , Proteins/metabolism , Software , Humans , Support Vector Machine
4.
Curr Med Chem ; 26(21): 3890-3910, 2019.
Article in English | MEDLINE | ID: mdl-29446725

ABSTRACT

BACKGROUND: The significant number of protein-protein interactions (PPIs) discovered by harnessing concomitant advances in the fields of sequencing, crystallography, spectrometry and two-hybrid screening suggests astonishing prospects for remodelling drug discovery. The PPI space, which includes up to 650 000 entities, is a remarkable reservoir of potential therapeutic targets for every human disease. To allow modern drug discovery programs to leverage this, we should be able to discern complete PPI maps associated with a specific disorder and the corresponding normal physiology. OBJECTIVE: Here, we review community-available computational programs for predicting PPIs and web-based resources for storing experimentally annotated interactions. METHODS: We compared the capacities of the prediction tools iLoops, Struct2Net, HOMCOS, COTH, PrePPI, InterPreTS and PRISM to predict recently discovered protein interactions. RESULTS: We described sequence-based and structure-based PPI prediction tools and addressed their peculiarities. Additionally, since the usefulness of prediction algorithms critically depends on the quality and quantity of the experimental data they are built on, we extensively discussed community resources for protein interactions. We focused on the active and recently updated primary and secondary PPI databases, repositories specialized by subject or species, as well as databases that include both experimental and predicted PPIs. CONCLUSION: PPI complexes are the basis of important physiological processes and are therefore possible targets for cell-penetrating ligands. Reliable computational PPI predictions can speed up new target discoveries through prioritization of therapeutically relevant protein-protein complexes for experimental studies.


Subjects
Computational Biology , Protein Interaction Maps , Proteins/chemistry , Protein Databases , Humans , Protein Binding
5.
Sci Rep ; 8(1): 10563, 2018 Jul 12.
Article in English | MEDLINE | ID: mdl-30002402

ABSTRACT

Intrinsically disordered proteins (IDPs) are characterized by the lack of a fixed tertiary structure and are involved in the regulation of key biological processes via binding to multiple protein partners. IDPs are malleable, adapting to structurally different partners, and this flexibility stems from features encoded in the primary structure. The assumption that universal sequence information will facilitate coverage of the sparse zones of the human interactome motivated us to explore the possibility of predicting protein-protein interactions (PPIs) that involve IDPs based on sequence characteristics. We developed a method that relies on features of interacting and non-interacting protein pairs and uses machine learning to classify and predict IDP PPIs. Considering both the sequence determinants specific to conformational organizations and the multiplicity of IDP interactions in the training phase ensured a reliable approach that is superior to current state-of-the-art methods. By applying a strict evaluation procedure, we confirm that our method predicts interactions of the IDP of interest even at the proteome scale. This service is provided as a web tool to expedite the discovery of new interactions and IDP functions with enhanced efficiency.


Subjects
Intrinsically Disordered Proteins/metabolism , Protein Interaction Mapping/methods , Proteome/metabolism , Amino Acid Sequence/physiology , Computational Biology , Datasets as Topic , Humans , MCF-7 Cells , Machine Learning , Molecular Models , Molecular Sequence Annotation , Protein Binding/physiology , Protein Interaction Maps/physiology
6.
Pathog Dis ; 76(4), 2018 06 01.
Article in English | MEDLINE | ID: mdl-29684116

ABSTRACT

Pseudomonas aeruginosa is among the top 10 'superbugs' worldwide and causes infections with poor outcomes in both humans and animals. From 202 P. aeruginosa isolates (n = 121 animal and n = 81 human), 40 were selected on the basis of biofilm-forming ability and were compared with the type strain P. aeruginosa PAO1 in terms of virulence determinants. Biofilm formation, pyocyanin and hemolysin production, and bacterial motility patterns were compared with the ability to kill the human cell line A549 in vitro. On average, there was no significant difference between the cytotoxicity levels of animal and human isolates, while human isolates produced higher amounts of pyocyanin and hemolysins and showed increased swimming ability. Non-parametric statistical analysis identified the highest positive correlation between hemolysis and swarming ability. For the first time, an ensemble machine learning approach applied to the in vitro virulence data determined that submerged biofilm formation had the highest relative predictive importance for cytotoxicity, as an indicator of infection ability. The findings from the in vitro study were validated in vivo using zebrafish (Danio rerio) embryos. This study found no major differences between P. aeruginosa isolates from animal and human infections and highlighted the importance of pyocyanin production in cytotoxicity and infection ability.


Subjects
Biofilms/drug effects , Hemolysin Proteins/toxicity , Pseudomonas aeruginosa/pathogenicity , Pyocyanine/toxicity , Virulence Factors/toxicity , A549 Cells , Animals , Biofilms/growth & development , Cell Survival/drug effects , Nonmammalian Embryo , Gene Expression , Hemolysin Proteins/biosynthesis , Hemolysin Proteins/genetics , Hemolysis/drug effects , Host Specificity , Humans , Machine Learning , Pseudomonas Infections/microbiology , Pseudomonas Infections/pathology , Pseudomonas aeruginosa/growth & development , Pseudomonas aeruginosa/metabolism , Pyocyanine/biosynthesis , Pyocyanine/genetics , Virulence , Virulence Factors/biosynthesis , Virulence Factors/genetics , Zebrafish
7.
Bioinformatics ; 33(2): 289-291, 2017 01 15.
Article in English | MEDLINE | ID: mdl-27605104

ABSTRACT

The TRI_tool, a sequence-based web tool for prediction of protein interactions in human transcriptional regulation, is intended for biomedical investigators who work on understanding the regulation of gene expression. It has improved predictive performance owing to training on updated, human-specific, experimentally validated datasets. The TRI_tool is designed to test up to 100 potential interactions with no time delay and to report both probabilities and binarized predictions. AVAILABILITY AND IMPLEMENTATION: http://www.vin.bg.ac.rs/180/tools/tfpred.php CONTACT: vladaper@vinca.rs; nevenav@vinca.rs SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Gene Expression Regulation , Protein Binding , Software , Genetic Transcription , Humans , Internet