Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 7 de 7
Filtrar
1.
Proteins ; 87(3): 198-211, 2019 03.
Artigo em Inglês | MEDLINE | ID: mdl-30536635

RESUMO

RNA-protein interactions play essential roles in regulating gene expression. While some RNA-protein interactions are "specific", that is, the RNA-binding proteins preferentially bind to particular RNA sequence or structural motifs, others are "non-RNA specific." Deciphering the protein-RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein-RNA interfaces, there is a need for computational methods to identify RNA-binding residues in proteins. While most of the existing computational methods for predicting RNA-binding residues in RNA-binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner-specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner-specific protein-RNA interface prediction tools, PS-PRIP, and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, RNA-specificity metric (RSM), for quantifying the RNA-specificity of the RNA binding residues predicted by such tools. Our results show that the RNA-binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner-agnostic metrics, RNA partner-specific methods are outperformed by the state-of-the-art partner-agnostic methods. We conjecture that either (a) the protein-RNA complexes in PDB are not representative of the protein-RNA interactions in nature, or (b) the current methods for partner-specific prediction of RNA-binding residues in proteins fail to account for the differences in RNA partner-specific versus partner-agnostic protein-RNA interactions, or both.


Assuntos
Biologia Computacional , Proteínas/química , Proteínas de Ligação a RNA/genética , RNA/genética , Sequência de Aminoácidos/genética , Sequência de Bases/genética , Sítios de Ligação/genética , Modelos Moleculares , Ligação Proteica/genética , Conformação Proteica , Proteínas/genética , RNA/química , Motivos de Ligação ao RNA/genética , Proteínas de Ligação a RNA/química , Análise de Sequência de Proteína , Software
2.
Am J Prev Med ; 45(2): 228-36, 2013 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-23867031

RESUMO

Creative use of new mobile and wearable health information and sensing technologies (mHealth) has the potential to reduce the cost of health care and improve well-being in numerous ways. These applications are being developed in a variety of domains, but rigorous research is needed to examine the potential, as well as the challenges, of utilizing mobile technologies to improve health outcomes. Currently, evidence is sparse for the efficacy of mHealth. Although these technologies may be appealing and seemingly innocuous, research is needed to assess when, where, and for whom mHealth devices, apps, and systems are efficacious. In order to outline an approach to evidence generation in the field of mHealth that would ensure research is conducted on a rigorous empirical and theoretic foundation, on August 16, 2011, researchers gathered for the mHealth Evidence Workshop at NIH. The current paper presents the results of the workshop. Although the discussions at the meeting were cross-cutting, the areas covered can be categorized broadly into three areas: (1) evaluating assessments; (2) evaluating interventions; and (3) reshaping evidence generation using mHealth. This paper brings these concepts together to describe current evaluation standards, discuss future possibilities, and set a grand goal for the emerging field of mHealth research.


Assuntos
Tecnologia Biomédica , Avaliação de Resultados em Cuidados de Saúde , Telemedicina , Tecnologia Biomédica/métodos , Tecnologia Biomédica/normas , Tecnologia Biomédica/tendências , Segurança Computacional , Difusão de Inovações , Previsões , Humanos , Avaliação de Resultados em Cuidados de Saúde/métodos , Avaliação de Resultados em Cuidados de Saúde/organização & administração , Avaliação de Resultados em Cuidados de Saúde/tendências , Melhoria de Qualidade/organização & administração , Qualidade da Assistência à Saúde/normas , Reprodutibilidade dos Testes , Telemedicina/métodos , Telemedicina/normas , Telemedicina/estatística & dados numéricos
3.
BMC Bioinformatics ; 13: 89, 2012 May 10.
Artigo em Inglês | MEDLINE | ID: mdl-22574904

RESUMO

BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. RESULTS: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. CONCLUSIONS: Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.


Assuntos
Inteligência Artificial , Proteínas de Ligação a RNA/química , RNA/química , Algoritmos , Aminoácidos/química , Teorema de Bayes , Humanos , Matrizes de Pontuação de Posição Específica , Conformação Proteica , RNA/metabolismo , Proteínas de Ligação a RNA/metabolismo , Análise de Sequência de Proteína , Máquina de Vetores de Suporte
4.
BMC Bioinformatics ; 11 Suppl 8: S6, 2010 Oct 26.
Artigo em Inglês | MEDLINE | ID: mdl-21034431

RESUMO

BACKGROUND: Determination of protein subcellular localization plays an important role in understanding protein function. Knowledge of the subcellular localization is also essential for genome annotation and drug discovery. Supervised machine learning methods for predicting the localization of a protein in a cell rely on the availability of large amounts of labeled data. However, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in developing semi-supervised methods for predicting protein subcellular localization from large amounts of unlabeled data together with small amounts of labeled data. RESULTS: In this paper, we present an Abstraction Augmented Markov Model (AAMM) based approach to semi-supervised protein subcellular localization prediction problem. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) Markov models (MMs) (which do not take advantage of unlabeled data); (ii) an expectation maximization (EM); and (iii) a co-training based approaches to semi-supervised training of MMs (that make use of unlabeled data). CONCLUSIONS: The results of our experiments on three protein subcellular localization data sets show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; (ii) are more accurate than both the MMs and the EM based semi-supervised MMs; and (iii) are comparable in performance, and in some cases outperform, the co-training based semi-supervised MMs.


Assuntos
Biologia Computacional/métodos , Cadeias de Markov , Proteínas/classificação , Proteínas/metabolismo , Frações Subcelulares/metabolismo , Algoritmos , Inteligência Artificial , Análise por Conglomerados , Bases de Dados de Proteínas , Modelos Biológicos , Proteínas de Plantas/química , Proteínas de Plantas/metabolismo , Proteínas/química , Reprodutibilidade dos Testes , Frações Subcelulares/química
5.
Proc IEEE Int Conf Data Min ; : 68-77, 2010 Dec 13.
Artigo em Inglês | MEDLINE | ID: mdl-22820557

RESUMO

High accuracy sequence classification often requires the use of higher order Markov models (MMs). However, the number of MM parameters increases exponentially with the range of direct dependencies between sequence elements, thereby increasing the risk of overfitting when the data set is limited in size. We present abstraction augmented Markov models (AAMMs) that effectively reduce the number of numeric parameters of k(th) order MMs by successively grouping strings of length k (i.e., k-grams) into abstraction hierarchies. We evaluate AAMMs on three protein subcellular localization prediction tasks. The results of our experiments show that abstraction makes it possible to construct predictive models that use significantly smaller number of features (by one to three orders of magnitude) as compared to MMs. AAMMs are competitive with and, in some cases, significantly outperform MMs. Moreover, the results show that AAMMs often perform significantly better than variable order Markov models, such as decomposed context tree weighting, prediction by partial match, and probabilistic suffix trees.

6.
J Artif Intell Res ; 35(1): 449-484, 2009 May.
Artigo em Inglês | MEDLINE | ID: mdl-22822297

RESUMO

We present two algorithms for learning the structure of a Markov network from data: GSMN* and GSIMN. Both algorithms use statistical independence tests to infer the structure by successively constraining the set of structures consistent with the results of these tests. Until very recently, algorithms for structure learning were based on maximum likelihood estimation, which has been proved to be NP-hard for Markov networks due to the difficulty of estimating the parameters of the network, needed for the computation of the data likelihood. The independence-based approach does not require the computation of the likelihood, and thus both GSMN* and GSIMN can compute the structure efficiently (as shown in our experiments). GSMN* is an adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN* by additionally exploiting Pearl's well-known properties of the conditional independence relation to infer novel independences from known ones, thus avoiding the performance of statistical tests to estimate them. To accomplish this efficiently GSIMN uses the Triangle theorem, also introduced in this work, which is a simplified version of the set of Markov axioms. Experimental comparisons on artificial and real-world data sets show GSIMN can yield significant savings with respect to GSMN*, while generating a Markov network with comparable or in some cases improved quality. We also compare GSIMN to a forward-chaining implementation, called GSIMN-FCH, that produces all possible conditional independences resulting from repeatedly applying Pearl's theorems on the known conditional independence tests. The results of this comparison show that GSIMN, by the sole use of the Triangle theorem, is nearly optimal in terms of the set of independences tests that it infers.

7.
Int J Hybrid Intell Syst ; 1(1-2): 80-89, 2004 Apr 01.
Artigo em Inglês | MEDLINE | ID: mdl-20351798

RESUMO

This paper motivates and precisely formulates the problem of learning from distributed data; describes a general strategy for transforming traditional machine learning algorithms into algorithms for learning from distributed data; demonstrates the application of this strategy to devise algorithms for decision tree induction from distributed data; and identifies the conditions under which the algorithms in the distributed setting are superior to their centralized counterparts in terms of time and communication complexity; The resulting algorithms are provably exact in that the decision tree constructed from distributed data is identical to that obtained in the centralized setting. Some natural extensions leading to algorithms for learning from heterogeneous distributed data and learning under privacy constraints are outlined.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA