Results 1 - 15 of 15
1.
Bioinformatics ; 34(22): 3781-3787, 2018 Nov 15.
Article in English | MEDLINE | ID: mdl-29868708

ABSTRACT

Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that function in RNA silencing and post-transcriptional regulation of gene expression by targeting messenger RNAs (mRNAs). Because the underlying mechanisms associated with miRNA binding to mRNA are not fully understood, a major challenge of miRNA studies involves the identification of miRNA-target sites on mRNA. In silico prediction of miRNA-target sites can expedite costly and time-consuming experimental work by providing the most promising miRNA-target-site candidates. Results: In this study, we reported the design and implementation of DeepMirTar, a deep-learning-based approach for accurately predicting human miRNA targets at the site level. The predicted miRNA-target sites are those having canonical or non-canonical seed, and features, including high-level expert-designed, low-level expert-designed and raw-data-level, were used to represent the miRNA-target site. Comparison with other state-of-the-art machine-learning methods and existing miRNA-target-prediction tools indicated that DeepMirTar improved overall predictive performance. Availability and implementation: DeepMirTar is freely available at https://github.com/Bjoux2/DeepMirTar_SdA. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning , Algorithms , Computational Biology , Gene Expression Regulation , Humans , MicroRNAs , RNA Interference , RNA, Messenger
2.
Nucleic Acids Res ; 41(Web Server issue): W441-7, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23729470

ABSTRACT

Knowledge of subcellular localizations (SCLs) of plant proteins relates to their functions and aids in understanding the regulation of biological processes at the cellular level. We present PlantLoc, a highly accurate and fast webserver for predicting the multi-label SCLs of plant proteins. The PlantLoc server has two innovative features: building localization motif libraries by a recursive method without alignment and Gene Ontology information; and establishing a simple architecture for rapidly and accurately identifying plant protein SCLs without a machine learning algorithm. PlantLoc provides predicted SCLs, confidence estimates, the substantiality motif and its location on the sequence. PlantLoc achieved the highest accuracy (overall accuracy of 80.8%) in identifying plant protein SCLs when benchmarked on a new test dataset against other plant SCL prediction webservers. The ability of PlantLoc to predict multiple sites was also significantly higher than for any other webserver. The predicted substantiality motifs of queries also have great potential for analysis of relationships with protein functional regions. The PlantLoc server is available at http://cal.tongji.edu.cn/PlantLoc/.


Subject(s)
Plant Proteins/chemistry , Protein Sorting Signals , Software , Amino Acid Motifs , Internet , Plant Proteins/analysis , Sequence Analysis, Protein
3.
Mol Cell Proteomics ; 11(7): M111.016808, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22415040

ABSTRACT

Identification of protein structural neighbors to a query is fundamental in structure and function prediction. Here we present BS-align, a systematic method to retrieve backbone string neighbors from primary sequences as templates for protein modeling. The backbone conformation of a protein is represented by the backbone string, as defined in Ramachandran space. The backbone string of a query can be accurately predicted by two innovative technologies: a knowledge-driven sequence alignment and encoding of a backbone string element profile. Then, the predicted backbone string is employed to align against a backbone string database and retrieve a set of backbone string neighbors. The backbone string neighbors were shown to be close to native structures of query proteins. BS-align was successfully employed to predict models of 10 membrane proteins with lengths ranging between 229 and 595 residues, and whose high-resolution structural determinations were difficult to elucidate both by experiment and prediction. The obtained TM-scores and root mean square deviations of the models confirmed that the models based on the backbone string neighbors retrieved by the BS-align were very close to the native membrane structures although the query and the neighbor shared a very low sequence identity. The backbone string system represents a new road for the prediction of protein structure from sequence, and suggests that the similarity of the backbone string would be more informative than describing a protein as belonging to a fold.


Subject(s)
Algorithms , Computational Biology/methods , Membrane Proteins/chemistry , Amino Acid Sequence , Databases, Protein , Humans , Models, Molecular , Molecular Sequence Data , Protein Conformation , Proteus mirabilis , Sequence Alignment , Sequence Analysis, Protein , Sequence Homology, Amino Acid , Structural Homology, Protein
4.
Nucleic Acids Res ; 40(Web Server issue): W298-302, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22553364

ABSTRACT

Many studies have demonstrated that shape string is an extremely important structure representation, since it is more complete than the classical secondary structure. The shape string provides detailed information also in the regions denoted random coil. But few services are provided for systematic analysis of protein shape string. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. The performance on blind test data demonstrates that the proposed method can be used for accurate prediction of protein shape string. The DSP server provides both predicted shape string and sequence shape string profile for each query sequence. Using this information, the users can compare protein structure or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/.


Subject(s)
Protein Conformation , Software , Internet , Sequence Alignment , Sequence Analysis, Protein
5.
Bioinformatics ; 28(1): 32-9, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22065541

ABSTRACT

MOTIVATION: The precise prediction of protein secondary structure is of key importance for the prediction of 3D structure and biological function. Although the development of many excellent methods over the last few decades has allowed the achievement of prediction accuracies of up to 80%, progress seems to have reached a bottleneck, and further improvements in accuracy have proven difficult. RESULTS: We propose for the first time a structural position-specific scoring matrix (SPSSM), and establish an unprecedented database of 9 million sequences and their SPSSMs. This database, when combined with a purpose-designed BLAST tool, provides a novel prediction tool: SPSSMPred. When the SPSSMPred was validated on a large dataset (10,814 entries), the Q3 accuracy of the protein secondary structure prediction was 93.4%. Our approach was tested on the two latest EVA sets; accuracies of 82.7 and 82.0% were achieved, far higher than can be achieved using other predictors. For further evaluation, we tested our approach on newly determined sequences (141 entries), and obtained an accuracy of 89.6%. For a set of low-homology proteins (40 entries), the SPSSMPred still achieved a Q3 value of 84.6%. AVAILABILITY: The SPSSMPred server is available at http://cal.tongji.edu.cn/SPSSMPred/ CONTACT: lith@tongji.edu.cn


Subject(s)
Position-Specific Scoring Matrices , Protein Structure, Secondary , Proteins/chemistry , Animals , Humans , Sequence Homology, Amino Acid
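
The Q3 values quoted in the abstract above are per-residue accuracies over three secondary-structure states. As a minimal sketch of how such a score is usually computed (the function name and the example strings are hypothetical and are not taken from SPSSMPred), assuming the states are encoded as H (helix), E (strand) and C (coil):

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example (illustrative only, not data from the paper).
observed = "CHHHHEEECC"
predicted = "CHHHCEEECC"  # one helix residue is mispredicted as coil
print(f"Q3 = {q3_accuracy(predicted, observed):.1%}")  # Q3 = 90.0%
```

A dataset-level Q3 is commonly reported either as the mean of per-protein scores or as the pooled fraction over all residues; the abstract does not say which convention was used.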
6.
J Theor Biol ; 308: 135-40, 2012 Sep 07.
Article in English | MEDLINE | ID: mdl-22683368

ABSTRACT

The subcellular localization of proteins is closely related to their functions. In this work, we propose a novel approach based on localization motifs to improve the accuracy of predicting subcellular localization of Gram-positive bacterial proteins. Our approach performed well on a five-fold cross validation with an overall success rate of 89.5%. Besides, the overall success rate of an independent testing dataset was 97.7%. Moreover, our approach was tested using a new experimentally-determined set of Gram-positive bacteria proteins and achieved an overall success rate of 96.3%.


Subject(s)
Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Gram-Positive Bacteria/metabolism , Amino Acid Motifs , Amino Acid Sequence , Databases, Protein , Models, Biological , Molecular Sequence Data , Protein Transport , Reproducibility of Results , Subcellular Fractions/metabolism
7.
BioData Min ; 10: 1, 2017.
Article in English | MEDLINE | ID: mdl-28127402

ABSTRACT

BACKGROUND: Protein relative solvent accessibility provides insight into understanding protein structure and function. Prediction of protein relative solvent accessibility is often the first stage of predicting other protein properties. Recent predictors of relative solvent accessibility discriminate against exposed regions as compared with buried regions, resulting in higher prediction accuracy associated with buried regions relative to exposed regions. METHODS: Here, we propose a more accurate and balanced predictor of protein relative solvent accessibility. First, we collected known proteins in three subsets according to sequence length and constructed a balanced dataset after reducing redundancy within each subset. Next, we measured the performance associated with different variables and variable combinations to determine the best variable combination. Finally, a predictor called BMRSA was constructed for modelling and prediction, which used the balanced set as the training set, the position-specific scoring matrix, predicted secondary structure, buried-exposed profile, and length of a query sequence as variables, and the conditional random field as the machine-learning method. RESULTS: BMRSA performance on test sets confirmed that our approach improved prediction accuracy relative to state-of-the-art approaches and was balanced in its comparison of buried and exposed regions. Our method is valuable when higher levels of accuracy in predicting exposed-residue states are required. The BMRSA is available at: http://cheminfo.tongji.edu.cn:8080/BMRSA/.

8.
PLoS One ; 10(6): e0128334, 2015.
Article in English | MEDLINE | ID: mdl-26090958

ABSTRACT

The precise prediction of protein intrinsically disordered regions, which play a crucial role in biological processes, is a necessary prerequisite to further the understanding of the principles and mechanisms of protein function. Here, we propose a novel predictor, DisoMCS, which is a more accurate predictor of protein intrinsically disordered regions. DisoMCS is based on an original multi-class conservative score (MCS) obtained by sequence-order/disorder alignment. Initially, near-disorder regions are defined on fragments located at either terminus of an ordered region that connects to a disordered region. Then the multi-class conservative score is generated by sequence alignment against a known structure database and represented as order, near-disorder and disorder conservative scores. The MCS of each amino acid has three elements: order, near-disorder and disorder profiles. Finally, the MCS is exploited as features to identify disordered regions in sequences. DisoMCS utilizes a non-redundant data set as the training set, MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions, a residue is classified as ordered or disordered according to the optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DisoMCS was very competitive in terms of accuracy of prediction when compared with well-established publicly available disordered region predictors. They also indicated that our approach was more accurate when a query has higher homology with the knowledge database. AVAILABILITY: The DisoMCS is available at http://cal.tongji.edu.cn/disorder/.


Subject(s)
Computational Biology/methods , Intrinsically Disordered Proteins/chemistry , Software , Databases, Protein , ROC Curve , Reproducibility of Results , Web Browser
9.
Talanta ; 115: 548-55, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-24054631

ABSTRACT

Rice plays an important role in the staple food supply of approximately one-half of the world's population. In this study, Raman spectroscopy and several multivariate data analysis methods were applied to discriminate rice samples from different districts of China. A total of 42 samples were examined. It is shown that the representative Raman spectra in each group differ according to geographical origin after baseline correction to enhance spectral features. Moreover, adulteration of rice is a serious problem for consumers. In addition to the obvious effect on producer profits, adulteration can also cause severe health and safety problems. Paraffin was added to give the rice a desirable translucent appearance and increase its marketability. Detection of paraffin in the adulterated rice samples was preliminarily investigated as well. The results showed that Raman spectroscopy data combined with chemometric techniques can be applied to the rapid detection of rice adulteration with paraffin.


Subject(s)
Food Contamination/analysis , Oryza/chemistry , Paraffin/analysis , Food Contamination/prevention & control , Humans , Multivariate Analysis , Principal Component Analysis , Spectrum Analysis, Raman , Support Vector Machine
10.
Biochimie ; 95(12): 2460-4, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24056076

ABSTRACT

Protein eight-state secondary structure prediction is challenging, but is necessary to determine protein structure and function. Here, we report the development of a novel approach, SPSSM8, to predict eight-state secondary structures of proteins accurately from sequences based on the structural position-specific scoring matrix (SPSSM). The SPSSM has been successfully utilized to predict three-state secondary structures. Now we employ an eight-state SPSSM as a feature that is obtained from sequence structure alignment against a large database of 9 million sequences with putative structural information. The SPSSM8 uses a low sequence identity dataset (9062 entries) as a training set and conditional random field for the classification algorithm. The SPSSM8 achieved an average eight-state secondary structure accuracy (Q8) of 71.7% (Q3, 81.6%) for an independent testing set (463 entries), which had an improved accuracy of 10.1% and 4.6% compared with SSPro8 and CNF, respectively, and significantly improved the accuracy of eight-state secondary structure prediction. For CASP 9 dataset (92 entries) the SPSSM8 achieved a Q8 accuracy of 80.1% (Q3, 83.0%). The SPSSM8 was confirmed as an outstanding predictor for eight-state secondary structures of proteins. SPSSM8 is freely available at http://cal.tongji.edu.cn/SPSSM8.


Subject(s)
Protein Structure, Secondary , Proteins/chemistry , Amino Acid Sequence , Databases, Protein , Position-Specific Scoring Matrices , Sequence Alignment
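
The abstract above reports both Q8 and Q3 for the same predictions, which implies that eight-state labels were collapsed to three states before Q3 was scored. The sketch below shows one widely used reduction of DSSP states; the abstract does not say which mapping was applied, so the table, the function names and the example strings are assumptions for illustration only.

```python
# One common DSSP reduction (an assumption; other conventions exist):
# H, G, I -> H (helix); E, B -> E (strand); T, S, C and '-' -> C (coil).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C", "-": "C",
}


def reduce_to_three_state(ss8: str) -> str:
    """Collapse an eight-state string to three states using the table above."""
    return "".join(EIGHT_TO_THREE[state] for state in ss8)


def accuracy(predicted: str, observed: str) -> float:
    """Per-residue agreement between two equal-length state strings."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical 12-residue example (not data from the paper).
observed8 = "CHHHGGTEEBSC"
predicted8 = "CHHHHGTEEESC"  # two errors, both within the same three-state class

q8 = accuracy(predicted8, observed8)
q3 = accuracy(reduce_to_three_state(predicted8), reduce_to_three_state(observed8))
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")
```

Because confusions inside one class (for example G predicted as H) disappear after the reduction, Q3 computed this way can never be lower than Q8 for the same prediction.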
11.
PLoS One ; 8(12): e83532, 2013.
Article in English | MEDLINE | ID: mdl-24376713

ABSTRACT

Shape string is structural sequence and is an extremely important structure representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts give a strong correlation with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence , Internet , Magnetic Resonance Spectroscopy , Protein Structure, Secondary
12.
PLoS One ; 8(4): e60559, 2013.
Article in English | MEDLINE | ID: mdl-23593247

ABSTRACT

MOTIVATION: The precise prediction of protein domains, which are the structural, functional and evolutionary units of proteins, has been a research focus in recent years. Although many methods have been presented for predicting protein domains and boundaries, the accuracy of predictions could be improved. RESULTS: In this study we present a novel approach, DomHR, which is an accurate predictor of protein domain boundaries based on a creative hinge region strategy. A hinge region was defined as a segment of amino acids that covers part of a domain region and a boundary region. We developed a strategy to construct profiles of domain-hinge-boundary (DHB) features generated by sequence-domain/hinge/boundary alignment against a database of known domain structures. The DHB features had three elements: normalized domain, hinge, and boundary probabilities. The DHB features were used as input to identify domain boundaries in a sequence. DomHR used a nonredundant dataset as the training set, the DHB and predicted shape string as features, and a conditional random field as the classification algorithm. In predicted hinge regions, a residue was determined to be a domain or a boundary according to a decision threshold. After decision thresholds were optimized, DomHR was evaluated by cross-validation, large-scale prediction, independent test and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DomHR outperformed other well-established, publicly available domain boundary predictors for prediction accuracy. AVAILABILITY: The DomHR is available at http://cal.tongji.edu.cn/domain/.


Subject(s)
Algorithms , Computational Biology/methods , Protein Structure, Tertiary/genetics , Proteins/chemistry , Proteomics/methods , Software , Amino Acid Sequence , Databases, Protein , Molecular Sequence Data
13.
PLoS One ; 7(11): e48389, 2012.
Article in English | MEDLINE | ID: mdl-23144872

ABSTRACT

MOTIVATION: Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, such as α-turns, β-turns and γ-turns. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. RESULTS: In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are the profile of sequence, the profile of secondary structures and the profile of shape strings, are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the state-of-the-art predictors of certain types of turn. Newly determined sequences and the EVA and CASP9 datasets were used as independent tests, and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.


Subject(s)
Models, Molecular , Proteins/chemistry , Algorithms , Amino Acid Sequence , Databases, Protein , Internet , Protein Structure, Secondary , Reproducibility of Results
14.
Biochimie ; 94(3): 847-53, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22182488

ABSTRACT

Mycobacterium, the most common disease-causing genus, infects billions of people and is notoriously difficult to treat. Understanding the subcellular localization of mycobacterial proteins can provide essential clues for protein function and drug discovery. In this article, we present a novel approach that focuses on local sequence information to identify localization motifs that are generated by a merging algorithm and are selected based on a binomially distributed model. These localization motifs are employed as features for identifying the subcellular localization of mycobacterial proteins. Our approach provides more accurate results than previous methods and was tested on an independent dataset recently obtained from an experimental study to provide a first and reasonably accurate prediction of subcellular localization. Our approach can also be used for large-scale prediction of new protein entries in the UniProtKB database and of protein sequences obtained experimentally. In addition, our approach identified many local motifs involved with the subcellular localization that also interact with the environment. Thus, our method may have widespread applications both in the study of the functions of mycobacterial proteins and in the search for a potential vaccine target for designing drugs.


Subject(s)
Bacterial Proteins/metabolism , Computational Biology/methods , Mycobacterium/metabolism , Algorithms
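
The abstract above says that localization motifs are selected with a binomially distributed model but gives no further detail. One plausible reading, sketched here purely as an assumption, is an enrichment test: given a background probability that a motif occurs in a protein, how surprising is it to see the motif in k of the n proteins from one compartment? The function name, the counts and the background rate below are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: a motif occurs in 8 of 20 proteins from one
# compartment, while its assumed background occurrence rate is 0.1.
p_value = binomial_tail(k=8, n=20, p=0.1)
print(f"P(X >= 8) = {p_value:.2e}")

# A motif might be kept when this tail probability falls below a chosen
# cut-off; the cut-off itself is also an assumption of this sketch.
keep_motif = p_value < 0.001
```

A small tail probability means the motif is over-represented in the compartment relative to background, which is the kind of evidence a binomial selection rule would rely on.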
15.
Talanta ; 83(2): 541-8, 2010 Dec 15.
Article in English | MEDLINE | ID: mdl-21111171

ABSTRACT

Non-negative matrix approximation (NNMA) has been used in diverse scientific fields, but it still has some major limitations. In the present study a novel trilinear decomposition method, termed three-way NNMA (TWNNMA), was developed. The method decomposes three-way arrays directly without unfolding and overcomes the restriction of locking zero elements in the deduced multiplicative update rules by adding a positive symmetric matrix. Direct trilinear decomposition was used as the TWNNMA initialization method and experimental results confirm that this greatly accelerated the convergence. An obvious advantage of TWNNMA is the uniqueness of the non-negative solution, which facilitates a better understanding of the underlying physical realities of complex data. TWNNMA was applied in complex systems such as chemical kinetics, second-order calibration and analysis of GC-MS data. The results demonstrate that TWNNMA, differing from previous trilinear decomposition methods, is comparable to existing second-order calibration methods and represents a promising resolution method for complex systems.


Subject(s)
Gas Chromatography-Mass Spectrometry/methods , Algorithms , Calibration , Chromatography/methods , Kinetics , Models, Chemical , Models, Statistical , Software , Spectrometry, Fluorescence/methods
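
The "locking zero elements" restriction mentioned in the abstract above can be seen in the standard two-way multiplicative update rules for non-negative factorization (Lee and Seung, Frobenius-norm version). The sketch below is that standard two-way rule in plain Python, not the TWNNMA algorithm, and the matrices are hypothetical; it only illustrates that a factor entry initialised at zero stays at zero, because each update multiplies the old value by a ratio.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def multiplicative_step(v, w, h, eps=1e-12):
    """One standard two-way multiplicative update for V ~ W H."""
    wt = transpose(w)
    num, den = matmul(wt, v), matmul(matmul(wt, w), h)
    h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
         for i in range(len(h))]
    ht = transpose(h)
    num, den = matmul(v, ht), matmul(w, matmul(h, ht))
    w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(w[0]))]
         for i in range(len(w))]
    return w, h


# Hypothetical non-negative data and starting factors.
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [3.0, 6.0, 9.0]]
w = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
h = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0]]  # h[1][1] starts at exactly zero

for _ in range(50):
    w, h = multiplicative_step(v, w, h)

print(h[1][1])  # 0.0: the zero entry is locked by the multiplicative form
```

According to the abstract, TWNNMA avoids this behaviour by adding a positive symmetric matrix to its update rules; that modification is not reproduced here.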