Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
Bioinformatics ; 36(10): 3260-3262, 2020 05 01.
Artigo em Inglês | MEDLINE | ID: mdl-32096820

RESUMO

MOTIVATION: Proteins containing tandem repeats (TRs) are abundant, frequently fold in elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. A blurred boundary between imperfect TR motifs and non-repetitive sequences gave rise to necessity to validate the detected TRs. RESULTS: Tally-2.0 is a scoring tool based on a machine learning (ML) approach, which allows to validate the results of TR detection. It was upgraded by using improved training datasets and additional ML features. Tally-2.0 performs at a level of 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%. AVAILABILITY AND IMPLEMENTATION: Tally-2.0 is available, as a web tool and as a standalone application published under Apache License 2.0, on the URL https://bioinfo.crbm.cnrs.fr/index.php? route=tools&tool=27. It is supported on Linux. Source code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Algoritmos , Sequências de Repetição em Tandem , Sequência de Aminoácidos , Aprendizado de Máquina , Proteínas/genética , Software
2.
Bioinformatics ; 32(13): 1952-8, 2016 07 01.
Artigo em Inglês | MEDLINE | ID: mdl-27153701

RESUMO

MOTIVATION: Tandem Repeats (TRs) are abundant in proteins, having a variety of fundamental functions. In many cases, evolution has blurred their repetitive patterns. This leads to the problem of distinguishing between sequences that contain highly imperfect TRs, and the sequences without TRs. The 3D structure of proteins can be used as a benchmarking criterion for TR detection in sequences, because the vast majority of proteins having TRs in sequences are built of repetitive 3D structural blocks. According to our benchmark, none of the existing scoring methods are able to clearly distinguish, based on the sequence analysis, between structures with and without 3D TRs. RESULTS: We developed a scoring tool called Tally, which is based on a machine learning approach. Tally is able to achieve a better separation between sequences with structural TRs and sequences of aperiodic structures, than existing scoring procedures. It performs at a level of 81% sensitivity, while achieving a high specificity of 74% and an Area Under the Receiver Operating Characteristic Curve of 86%. Tally can be used to select a set of structurally and functionally meaningful TRs from all TRs detected in proteomes. The generated dataset is available for benchmarking purposes. AVAILABILITY AND IMPLEMENTATION: Source code is available upon request. Tool and dataset can be accessed through our website: http://bioinfo.montp.cnrs.fr/?r=Tally CONTACT: andrey.kajava@crbm.cnrs.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Biologia Computacional/métodos , Proteínas/química , Análise de Sequência de Proteína/métodos , Sequências de Repetição em Tandem , Algoritmos , Sequência de Aminoácidos , Aprendizado de Máquina , Proteoma
3.
Biochem Soc Trans ; 43(5): 807-11, 2015 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-26517886

RESUMO

Tandem repeats (TRs) are frequently not perfect, containing a number of mutations accumulated during evolution. One of the main problems is to distinguish between the sequences that contain highly imperfect TRs and the aperiodic sequences. The majority of proteins with TRs in sequences have repetitive arrangements in their 3D structures. Therefore, the 3D structures of proteins can be used as a benchmarking criterion for TR detection in sequences. Different TR detection tools use their own scoring procedures to determine the boundary between repetitive and non-repetitive protein sequences. Here we described these scoring functions and benchmark them by using known structural TRs. Our survey shows that none of the existing scoring procedures are able to achieve an appropriate separation between genuine structural TRs and non-TR regions. This suggests that if we want to obtain a collection of structurally and functionally meaningful TRs from a large scale analysis of proteomes, the TR scoring metrics need to be improved.


Assuntos
Modelos Moleculares , Conformação Proteica , Proteômica/métodos , Sequências Repetitivas de Aminoácidos , Sequências de Repetição em Tandem , Algoritmos , Animais , Biomarcadores , Humanos , Dobramento de Proteína , Estabilidade Proteica
4.
J Struct Biol ; 186(3): 386-91, 2014 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-24681324

RESUMO

The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request.


Assuntos
Algoritmos , Bases de Dados de Proteínas , Sequências Repetitivas de Aminoácidos , Análise de Sequência de Proteína/métodos , Proteínas/química
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA