Nucleic Acids Res ; 46(D1): D393-D398, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29036676


CRISPR-Cas is a widely used gene-editing tool. However, unexpected off-target effects may occur as a result of long-term nuclease activity. Anti-CRISPR proteins, which are powerful inhibitors of the CRISPR-Cas system, have the potential to promote better use of CRISPR-Cas in gene editing, especially for gene therapy. Additionally, more in-depth research on these proteins would help researchers better understand the co-evolution of bacteria and phages. It is therefore necessary to collect and integrate data on the various types of anti-CRISPRs. Here, data on these proteins were gathered manually by screening the literature. The first online resource, anti-CRISPRdb, was then constructed to organize these proteins effectively. It contains the available protein sequences, DNA sequences, coding regions, source organisms, taxonomy, virulence, protein interactors and their corresponding three-dimensional structures. Users can access the database without registration. We believe that anti-CRISPRdb can serve as a resource to facilitate research on anti-CRISPR proteins and in related fields.

Bacteriophages/physiology , CRISPR-Cas Systems , Databases, Protein , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
Biomed Res Int ; 2016: 7639397, 2016.
Article in English | MEDLINE | ID: mdl-27660763


Investigating essential genes is important for understanding the minimal gene set of a cell and for discovering potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning was introduced to predict essential genes. We focused on 25 bacteria with characterized essential genes. In tenfold cross-validation, the predictions reached a highest area under the receiver operating characteristic (ROC) curve (AUC) of 0.9716. Suitable features were then used to build models for making predictions in distantly related bacteria. Prediction accuracy was evaluated by the agreement between the predictions and the known essential genes of the target species. A highest AUC of 0.9552 and an average AUC of 0.8314 were achieved when making predictions across organisms. An independent, recently released dataset from Synechococcus elongatus was obtained to further assess the performance of our model. The AUC of these predictions was 0.7855, higher than that of other methods. This research shows that features obtained by homology mapping alone can achieve results comparable to, or even better than, those of integrated features. The work also indicates that a machine learning-based method can assign more effective weight coefficients than an empirical formula based on biological knowledge.
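The AUC values reported in this abstract can be read as the probability that a randomly chosen essential gene receives a higher score than a randomly chosen non-essential gene. The following is a minimal sketch of that rank-based definition, not the authors' pipeline; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which
    the positive example has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): 1 = essential, 0 = non-essential.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

In a tenfold cross-validation setting, a score like this would typically be computed on each held-out fold and then summarized across folds; how the authors aggregated their folds is not stated in the abstract.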

Mol Biosyst ; 12(9): 2893-900, 2016 08 16.
Article in English | MEDLINE | ID: mdl-27410247


Pseudo dinucleotide composition (PseDNC) and the Z curve have shown excellent performance in nucleotide sequence classification problems in bioinformatics. Inspired by the principle of Z curve theory, we improved PseDNC to give the phase-specific PseDNC (psPseDNC). In this study, we used the prediction of recombination spots as a case study to illustrate the capability of psPseDNC, and of PseDNC fused with Z curve theory, based on a novel machine learning method named the large margin distribution machine (LDM). We verified that combining the two widely used approaches gives better performance than using PseDNC alone with a support vector machine-based (SVM-based) model. The best Matthews correlation coefficient (MCC) achieved by our LDM-based model was 0.7037 in the rigorous jackknife test, an improvement of ∼6.6%, ∼3.2%, and ∼2.4% over three previous studies. Similarly, accuracy improved by 3.2% compared with our previous iRSpot-PseDNC web server in an independent data test. These results demonstrate that the joint use of PseDNC and the Z curve enhances performance and can extract more information from a biological sequence. To facilitate research in this area, we constructed a user-friendly web server for predicting hot/cold spots, HcsPredictor, which is freely accessible. In summary, we provide a unified algorithm that integrates the Z curve with PseDNC. We hope this unified algorithm can be extended to other classification problems involving DNA elements.
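For readers unfamiliar with the Z curve, the sketch below computes the standard cumulative Z curve coordinates of a DNA sequence: x separates purines (A, G) from pyrimidines (C, T), y separates amino bases (A, C) from keto bases (G, T), and z separates weak (A, T) from strong (C, G) hydrogen-bonding bases. This illustrates the general Z curve transform only; it is not the psPseDNC feature set or the LDM model described in the abstract, and the function name `z_curve` is hypothetical.

```python
# Per-base increments for the three Z curve components (x, y, z).
Z_STEPS = {
    "A": (1, 1, 1),     # purine, amino, weak
    "C": (-1, 1, -1),   # pyrimidine, amino, strong
    "G": (1, -1, -1),   # purine, keto, strong
    "T": (-1, -1, 1),   # pyrimidine, keto, weak
}


def z_curve(seq):
    """Return the cumulative (x, y, z) point after each base of seq."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        if base not in Z_STEPS:
            raise ValueError(f"unexpected base: {base!r}")
        dx, dy, dz = Z_STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


print(z_curve("ACGT"))  # [(1, 1, 1), (0, 2, 0), (1, 1, -1), (0, 0, 0)]
```

A sequence with equal counts of all four bases ends at the origin, as in the example; compositional biases show up as a drift along one or more axes.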

Computational Biology/methods , DNA/chemistry , DNA/genetics , Nucleotides , Algorithms , Genome, Fungal , ROC Curve , Recombination, Genetic , Reproducibility of Results , Sensitivity and Specificity , Support Vector Machine , Web Browser