Results 1 - 7 of 7
1.
Mol Cell Proteomics ; 11(7): M111.016808, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22415040

ABSTRACT

Identification of protein structural neighbors to a query is fundamental in structure and function prediction. Here we present BS-align, a systematic method to retrieve backbone string neighbors from primary sequences as templates for protein modeling. The backbone conformation of a protein is represented by the backbone string, as defined in Ramachandran space. The backbone string of a query can be accurately predicted by two innovative technologies: a knowledge-driven sequence alignment and encoding of a backbone string element profile. Then, the predicted backbone string is aligned against a backbone string database to retrieve a set of backbone string neighbors. The backbone string neighbors were shown to be close to the native structures of the query proteins. BS-align was successfully employed to predict models of 10 membrane proteins with lengths ranging between 229 and 595 residues, whose high-resolution structures had been difficult to determine by both experiment and prediction. The obtained TM-scores and root mean square deviations confirmed that the models based on the backbone string neighbors retrieved by BS-align were very close to the native membrane structures, even though the query and the neighbor shared very low sequence identity. The backbone string system represents a new road for the prediction of protein structure from sequence, and suggests that the similarity of the backbone string would be more informative than describing a protein as belonging to a fold.


Subject(s)
Algorithms , Computational Biology/methods , Membrane Proteins/chemistry , Amino Acid Sequence , Databases, Protein , Humans , Models, Molecular , Molecular Sequence Data , Protein Conformation , Proteus mirabilis , Sequence Alignment , Sequence Analysis, Protein , Sequence Homology, Amino Acid , Structural Homology, Protein
2.
BMC Bioinformatics ; 11: 109, 2010 Feb 27.
Article in English | MEDLINE | ID: mdl-20187963

ABSTRACT

BACKGROUND: Recent advances in proteomics technologies such as SELDI-TOF mass spectrometry have shown promise in the detection of early stage cancers. However, dimensionality reduction and classification are considerable challenges in statistical machine learning. We therefore propose a novel approach for dimensionality reduction and test it using published high-resolution SELDI-TOF data for ovarian cancer. RESULTS: We propose a method based on statistical moments to reduce feature dimensions. After refining and t-testing, SELDI-TOF data are divided into several intervals. Four statistical moments (mean, variance, skewness and kurtosis) are calculated for each interval and are used as representative variables. The high dimensionality of the data can thus be rapidly reduced. To improve efficiency and classification performance, the data are further used in kernel PLS models. The method achieved an average sensitivity of 0.9950, specificity of 0.9916, accuracy of 0.9935 and a correlation coefficient of 0.9869 for 100 five-fold cross validations. Furthermore, only one control was misclassified in leave-one-out cross validation. CONCLUSION: The proposed method is suitable for analyzing high-throughput proteomics data.
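
The moment-based reduction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `interval_moments`, the equal-width split, and the use of population moments with non-excess kurtosis are all assumptions, since the abstract does not specify them. The refining, t-testing and kernel PLS steps are not shown.

```python
from statistics import fmean


def interval_moments(intensities, n_intervals):
    """Split a spectrum into contiguous intervals and summarize each one.

    Returns one (mean, variance, skewness, kurtosis) tuple per interval.
    Assumptions (not from the abstract): near-equal-width intervals,
    population (biased) moments, and non-excess kurtosis.
    """
    n = len(intensities)
    if n_intervals < 1 or n_intervals > n:
        raise ValueError("n_intervals must be between 1 and len(intensities)")

    features = []
    for k in range(n_intervals):
        # Integer arithmetic keeps the intervals contiguous and non-empty.
        start = k * n // n_intervals
        stop = (k + 1) * n // n_intervals
        chunk = intensities[start:stop]

        mu = fmean(chunk)
        centered = [x - mu for x in chunk]
        m2 = fmean([d ** 2 for d in centered])  # variance
        m3 = fmean([d ** 3 for d in centered])
        m4 = fmean([d ** 4 for d in centered])

        if m2 > 0:
            skewness = m3 / m2 ** 1.5
            kurtosis = m4 / m2 ** 2
        else:
            # A constant interval has no shape; report zeros by convention.
            skewness = 0.0
            kurtosis = 0.0
        features.append((mu, m2, skewness, kurtosis))
    return features
```

Under these assumptions a spectrum of any length is reduced to `4 * n_intervals` variables, which is the dimensionality reduction the abstract refers to.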


Subject(s)
Ovarian Neoplasms/classification , Proteomics/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Biomarkers, Tumor/analysis , Female , Gene Expression Profiling , Humans
3.
Biochimie ; 95(2): 354-8, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23116714

ABSTRACT

Protein-DNA interactions are involved in many biological processes essential for gene expression and regulation. To understand the molecular mechanisms of protein-DNA recognition, it is crucial to analyze and identify DNA-binding residues of protein-DNA complexes. Here, we propose a novel descriptor, the shape string, and two related features, the shape string PSSM and the shape string pair composition, to characterize DNA-binding residues. We employed the new features and the position-specific scoring matrix (PSSM) for modeling and prediction. The results on a benchmark dataset showed that our approach significantly improved the accuracy of the predictor. The overall accuracy of our approach reached 85.86% with 85.02% sensitivity and 86.02% specificity. The results also demonstrated that the shape string is a powerful descriptor for the prediction of DNA-binding residues. The two additional related features enhanced the predictive value.


Subject(s)
Algorithms , DNA/chemistry , Position-Specific Scoring Matrices , Proteins/chemistry , Software , Amino Acid Sequence , Binding Sites , Databases, Protein , Models, Molecular , Molecular Sequence Data , Protein Binding , Protein Interaction Domains and Motifs , Sensitivity and Specificity
4.
PLoS One ; 8(4): e60559, 2013.
Article in English | MEDLINE | ID: mdl-23593247

ABSTRACT

MOTIVATION: The precise prediction of protein domains, which are the structural, functional and evolutionary units of proteins, has been a research focus in recent years. Although many methods have been presented for predicting protein domains and boundaries, the accuracy of predictions could be improved. RESULTS: In this study we present a novel approach, DomHR, which is an accurate predictor of protein domain boundaries based on a creative hinge region strategy. A hinge region was defined as a segment of amino acids that covers part of a domain region and a boundary region. We developed a strategy to construct profiles of domain-hinge-boundary (DHB) features generated by sequence-domain/hinge/boundary alignment against a database of known domain structures. The DHB features had three elements: normalized domain, hinge, and boundary probabilities. The DHB features were used as input to identify domain boundaries in a sequence. DomHR used a nonredundant dataset as the training set, the DHB and predicted shape string as features, and a conditional random field as the classification algorithm. In predicted hinge regions, a residue was assigned to a domain or a boundary according to a decision threshold. After the decision thresholds were optimized, DomHR was evaluated by cross-validation, large-scale prediction, an independent test and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DomHR outperformed other well-established, publicly available domain boundary predictors in prediction accuracy. AVAILABILITY: DomHR is available at http://cal.tongji.edu.cn/domain/.


Subject(s)
Algorithms , Computational Biology/methods , Protein Structure, Tertiary/genetics , Proteins/chemistry , Proteomics/methods , Software , Amino Acid Sequence , Databases, Protein , Molecular Sequence Data
5.
Biochimie ; 94(3): 847-53, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22182488

ABSTRACT

Mycobacterium, the most common disease-causing genus, infects billions of people and is notoriously difficult to treat. Understanding the subcellular localization of mycobacterial proteins can provide essential clues for protein function and drug discovery. In this article, we present a novel approach that focuses on local sequence information to identify localization motifs that are generated by a merging algorithm and are selected based on a binomially distributed model. These localization motifs are employed as features for identifying the subcellular localization of mycobacterial proteins. Our approach provides more accurate results than previous methods and was tested on an independent dataset recently obtained from an experimental study to provide a first and reasonably accurate prediction of subcellular localization. Our approach can also be used for large-scale prediction of new protein entries in the UniProtKB database and of protein sequences obtained experimentally. In addition, our approach identified many local motifs involved in subcellular localization that also interact with the environment. Thus, our method may have widespread applications both in the study of the functions of mycobacterial proteins and in the search for potential vaccine targets for drug design.


Subject(s)
Bacterial Proteins/metabolism , Computational Biology/methods , Mycobacterium/metabolism , Algorithms
6.
J Bioinform Comput Biol ; 8 Suppl 1: 147-60, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21155025

ABSTRACT

Cancer diagnosis based on microarray technology has drawn increasing attention in the past few years. Because they yield accurate and fast diagnoses, gene expression profiles produced from microarrays are widely used by researchers. Much research highlights the importance of gene selection and has achieved good results. However, the minimum sets of genes derived from different methods seldom overlap and are often inconsistent even for the same set of data, partly because of the complexity of cancer. In this paper, cancer classification was attempted in an alternative way, using the whole gene expression profile of all samples instead of partial gene sets. Three common sets of data, for acute leukemia, prostate cancer and lung cancer respectively, were tested with the NIPALS-KPLS method. Compared to other conventional methods, the results showed a broad improvement in classification accuracy. This paper indicates that the sample profile of gene expression may be explored as a better indicator for cancer classification, which deserves further investigation.


Subject(s)
Gene Expression Profiling/statistics & numerical data , Neoplasms/classification , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Algorithms , Computational Biology , Databases, Genetic/statistics & numerical data , Discriminant Analysis , Female , Humans , Least-Squares Analysis , Leukemia/classification , Leukemia/diagnosis , Leukemia/genetics , Lung Neoplasms/classification , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Male , Neoplasms/diagnosis , Prostatic Neoplasms/classification , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics
7.
Talanta ; 66(1): 65-73, 2005 Mar 31.
Article in English | MEDLINE | ID: mdl-18969963

ABSTRACT

Non-negative matrix factorization (NMF), with the constraints of non-negativity, has been recently proposed for multivariate data analysis. Because it allows only additive, not subtractive, combinations of the original data, NMF is capable of producing region or parts-based representations of objects. It has been used for image analysis and text processing. Unlike those of PCA, the resolutions of NMF are non-negative and can be interpreted directly. Because it admits multiple solutions, the original algorithm of NMF [D.D. Lee, H.S. Seung, Nature 401 (1999) 788] is not suitable for resolving chemical mixed signals. In reality, NMF has never been applied to resolving chemical mixed signals. It must be modified according to the characteristics of the chemical signals, such as smoothness of spectra, unimodality of chromatograms, sparseness of mass spectra, etc. We have used the modified NMF algorithm to narrow the feasible solution region for resolving chemical signals, and found that it could produce reasonable and acceptable results for certain experimental errors, especially for overlapping chromatograms and sparse mass spectra. Simulated two-dimensional (2-D) data and real GUJINGGONG alcohol liquor GC-MS data have been resolved soundly by the NMF technique. Butyl caproate and its isomeric compound (butyric acid, hexyl ester) have been identified from the overlapping spectra. The result of NMF is preferable to that of heuristic evolving latent projections (HELP). It shows that NMF is a promising chemometric resolution method for complex samples.
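
For orientation, the unmodified multiplicative-update scheme of Lee and Seung (for the squared-error objective) can be sketched as below. This is only the baseline algorithm, not the modified version the abstract describes: the smoothness, unimodality and sparseness constraints are not implemented, and the names `nmf`, `frobenius_error`, `n_iter`, `seed` and `eps` are illustrative assumptions.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_wh in zip(v, wh)
               for x, y in zip(row_v, row_wh))


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (list of rows) as V ~ W H.

    Plain multiplicative updates; eps (an assumption) guards against
    division by zero. Both factors stay non-negative because each update
    multiplies by a ratio of non-negative quantities.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]

    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_cols)] for i in range(rank)]

        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(rank)] for i in range(n_rows)]
    return w, h
```

Running `nmf` with different `seed` values on the same matrix generally yields different factor pairs with similar reconstruction error, which illustrates the multiple-solutions problem the abstract gives as the reason for adding chemical constraints.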
