Results 1 - 20 of 69
1.
Methods Mol Biol ; 2377: 423-430, 2022.
Article in English | MEDLINE | ID: mdl-34709630

ABSTRACT

Computational tools offer a low-cost, time-efficient alternative for identifying essential genes. Using the experimental essentiality sets deposited in the DEG and OGEE databases as references, we developed an automated computational tool named Geptop to select essential genes from the set of protein-coding genes in a prokaryotic genome. It uses reciprocal best hits for homology search and evolutionary distance for weight assignment. The latest version, Geptop 2.0 (http://guolab.whu.edu.cn/geptop), predicts gene essentiality with a mean AUC of 0.84 in prokaryotes and is more stable. This chapter briefly introduces the tool and explains how to use it.
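The reciprocal-best-hit idea mentioned in this abstract can be illustrated with a minimal sketch. This is not Geptop's code: the score dictionaries, gene names, and function names are hypothetical stand-ins for parsed homology-search output, and the evolutionary-distance weighting step is not shown.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` is a dict {(query, subject): similarity score}; this input
    format is an assumption made for illustration.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores, made up for illustration only.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 120.0, ("a3", "b1"): 80.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 118.0, ("b2", "a1"): 95.0}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input, gene a3 hits b1, but b1's best hit is a1, so a3 is not paired.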


Subjects
Essential Genes , Prokaryotic Cells , Computational Biology , Essential Genes/genetics , Bacterial Genome
2.
Interdiscip Sci ; 2021 Nov 24.
Article in English | MEDLINE | ID: mdl-34817803

ABSTRACT

In 2002, our research group observed a gene clustering pattern based on the base frequency of A versus T at the second codon position in the genome of Vibrio cholerae and found that the functional category distribution of genes in the two clusters was different. With the availability of a large number of sequenced genomes, we performed a systematic investigation of the A2-T2 distribution and found that 2694 out of 2764 prokaryotic genomes have an optimal clustering number of two, indicating a consistent pattern. Analysis of the functional categories of the coding genes in each cluster in 1483 prokaryotic genomes indicated that 99.33% of the genomes exhibited a significant difference (p < 0.01) in function distribution between the two clusters. Specifically, functional category P was overrepresented in the smaller cluster in 98.65% of genomes, whereas categories J, K, and L were overrepresented in the larger cluster in over 98.52% of genomes. Lineage analysis showed that these preferences appear consistently across all phyla. Overall, our work revealed an almost universal clustering pattern based on the relative frequency of A2 versus T2 and its role in functional category preference. These findings will improve understanding of the rationale for theoretically predicting the functional classes of genes from their nucleotide sequences, and of how protein function is determined by DNA sequence.
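The A2 and T2 quantities referred to above are the frequencies of A and T at the second codon position of a coding sequence. The sketch below computes only these two features for one gene; the function name and the example sequence are hypothetical, and the clustering step itself is not reproduced here.

```python
def second_position_frequencies(cds):
    """Return (A2, T2): fractions of A and T at the second codon position.

    `cds` is assumed to be an in-frame coding sequence; a trailing partial
    codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    second = cds[1:usable:3]  # bases at positions 2, 5, 8, ... (1-based)
    if not second:
        raise ValueError("sequence shorter than one codon")
    return second.count("A") / len(second), second.count("T") / len(second)


# Codons: ATG AAA GAT CTT GTT TAA (made-up example).
a2, t2 = second_position_frequencies("ATGAAAGATCTTGTTTAA")
```

Each gene then becomes a point (A2, T2), which is the kind of input a two-cluster analysis would operate on.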

3.
Comput Struct Biotechnol J ; 19: 4042-4048, 2021.
Article in English | MEDLINE | ID: mdl-34527183

ABSTRACT

Studies of codon properties can deepen our understanding of the origin of primitive life and inform biotechnological applications. Here, we propose a quantitative measure of codon-amino acid association and find that seven out of 13 physicochemical properties have stronger associations with the nucleotide identity at the second codon position, indicating that protein structure and function may be associated more closely with this position than with the other two. When the codon-amino acid association was extended to the protein level, the correlation between the second codon position (measured by the relative frequencies of nucleobases T and A at this site) and hydrophobicity (measured as the GRAVY value) became stronger, with 96% of genomes having R > 0.90 and p < 1e-60. Furthermore, we found that proteins encoded by informational genes have lower GRAVY values than operational proteins (p < 3e-37) in both prokaryotic and eukaryotic genomes. These results reveal a complete link from codon identity (A2 versus T2) to amino acid property (hydrophilic versus hydrophobic) and then to protein function (informational versus operational). Hence, our work may help to explain how the nucleotide sequence determines protein function.
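GRAVY (grand average of hydropathy) is the mean Kyte-Doolittle hydropathy value over the residues of a protein. A minimal sketch of that calculation follows; the function name and the choice to skip non-standard residue symbols are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Return the mean hydropathy of a protein sequence.

    Symbols outside the 20 standard residues are skipped (an assumption).
    """
    residues = [r for r in protein.upper() if r in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)
```

Higher values indicate a more hydrophobic protein, so a sequence rich in I and V scores above one rich in K and D.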

4.
Front Microbiol ; 12: 593979, 2021.
Article in English | MEDLINE | ID: mdl-33552037

ABSTRACT

Synthetic biology seeks to create new biological parts, devices, and systems, and to reconfigure existing natural biological systems for custom-designed purposes. The standardized BioBrick parts are the foundation of synthetic biology. The incomplete and flawed metadata of BioBrick parts, however, are a major obstacle to designing genetic circuits easily, quickly, and accurately. Here, a database termed BioMaster (http://www.biomaster-uestc.cn) was developed to extensively complement the information on BioBrick parts. It includes 47,934 BioBrick part entries from the international Genetically Engineered Machine (iGEM) Registry, with more comprehensive information integrated from 10 databases, covering functions, activities, interactions, and related literature. Moreover, BioMaster is a user-friendly platform for retrieving and analyzing relevant information on BioBrick parts.

5.
Database (Oxford) ; 2020, 2020 12 11.
Article in English | MEDLINE | ID: mdl-33306800

ABSTRACT

Essential genes are key elements that organisms need to stay alive. Building databases that store essential genes as homologous clusters, rather than as singletons, can provide more informative results, such as the general essentiality of homologous genes across multiple organisms. In 2013, the first database to store prokaryotic essential genes in clusters, CEG (Clusters of Essential Genes), was constructed. Since that last revision, the amount of available essential-gene data has increased more than threefold. Herein, we updated CEG to version 2, which includes more prokaryotic essential genes (from 16 gene datasets to 29 gene datasets) and newly added eukaryotic essential genes (nine species), specifically the human essential genes of 12 cancer cell lines. For prokaryotes, information associated with drug targets, such as protein structure, ligand-protein interaction, virulence factors and matched drugs, is also provided. Finally, we provide an essential gene prediction service for both prokaryotes and eukaryotes. We hope our updated database will benefit more researchers working on drug targets and evolutionary genomics. Database URL: http://cefg.uestc.cn/ceg.


Subjects
Eukaryotes , Essential Genes , Factual Databases , Essential Genes/genetics , Genomics , Humans , Proteins
6.
FEBS Lett ; 593(18): 2646-2654, 2019 09.
Article in English | MEDLINE | ID: mdl-31260103

ABSTRACT

In prokaryotes, Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated protein (Cas) systems constitute adaptive immune systems against mobile genetic elements (MGEs). Here, we introduce the Markov cluster algorithm (MCL) into Makarova et al.'s method in order to select a more reasonable profile. Additionally, our new Maximum Continuous Cas Subcluster (MCCS) method helps identify tightly clustered loci. A comparison with two other commonly used programs shows that the method identifies Cas proteins with higher accuracy and a lower Additional Prediction Rate (APR). Moreover, we developed a web-based server, CasLocusAnno (http://cefg.uestc.cn/CasLocusAnno), capable of annotating Cas proteins, cas loci and their (sub)types in less than ~28 s after submission of the whole proteome sequence. Its standalone version can be downloaded at https://github.com/RiversDong/CasLocusAnno.


Subjects
CRISPR-Associated Proteins/genetics , Computational Biology/methods , Genetic Loci/genetics , Internet , Molecular Sequence Annotation/methods
7.
Front Microbiol ; 10: 1236, 2019.
Article in English | MEDLINE | ID: mdl-31214154

ABSTRACT

Geptop has performed effectively in the identification of prokaryotic essential genes since its first release in 2013. It estimates gene essentiality for prokaryotes based on orthology and phylogeny. Genome-scale essentiality data are now available for more prokaryotic species, and the information has been collected in public essential gene repositories such as DEG and OGEE. A faster and more accurate toolkit is needed to keep pace with the growing volume of prokaryotic genome data. We updated Geptop by adding more validated essentiality data to the reference set (from 19 to 37 species) and by introducing multi-process technology to accelerate computation. Compared with Geptop 1.0 and other gene essentiality prediction models, Geptop 2.0 generates more stable predictions and finishes the computation in a shorter time. The software is available both as an online server and as a downloadable standalone application. We hope that the improved Geptop 2.0 will facilitate research on gene essentiality and the development of novel antibacterial drugs. The gene essentiality prediction tool is available at http://cefg.uestc.cn/geptop.

8.
Front Microbiol ; 10: 184, 2019.
Article in English | MEDLINE | ID: mdl-30814982

ABSTRACT

The in-depth study of viral genomes is of great help in many respects, especially in the treatment of human diseases caused by viral infections. With the rapid accumulation of viral sequencing data, improved or alternative gene-finding systems have become necessary to process and mine these data. In this article, we present Vgas, a system combining an ab initio method and a similarity-based method to automatically find viral genes and perform gene function annotation. Vgas was compared with existing programs such as Prodigal, GeneMarkS, and Glimmer. In tests on 5,705 virus genomes downloaded from RefSeq, Vgas achieved the highest average precision and recall (both metrics were at least 1% higher than those of the other programs); for small virus genomes (≤ 10 kb) in particular, it showed significantly improved performance (precision was 6% higher and recall was 2% higher). Moreover, Vgas provides an annotation module that supplies functional information for predicted genes based on BLASTp alignment, which may be especially useful in some cases. Combining Vgas with GeneMarkS and Prodigal gave better prediction results than any of the three programs alone, suggesting that collaborative prediction using several different programs is a viable option for gene prediction. Vgas is freely available at http://cefg.uestc.cn/vgas/ or http://121.48.162.133/vgas/. We hope that Vgas can serve as an alternative virus gene finder for annotating new genomes or reannotating existing genomes.
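Precision and recall for a gene finder can be computed by comparing predicted genes with a reference annotation. The sketch below assumes genes are represented as (start, end, strand) tuples and counts only exact matches as correct; both the representation and the matching criterion are assumptions for illustration, not the evaluation protocol of the study.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for predicted versus reference genes.

    Genes are hashable records such as (start, end, strand); a prediction
    counts as correct only if it matches a reference gene exactly.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up coordinates for illustration only.
reference = [(10, 400, "+"), (450, 900, "+"), (950, 1300, "-"),
             (1350, 1800, "+"), (1900, 2500, "-")]
predicted = [(10, 400, "+"), (450, 900, "+"), (950, 1300, "-"),
             (1400, 1800, "+")]
precision, recall = precision_recall(predicted, reference)
```

Here three of four predictions match the reference, and three of five reference genes are recovered.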

9.
IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1274-1279, 2019.
Article in English | MEDLINE | ID: mdl-28212095

ABSTRACT

Essential genes are those genes of an organism that are considered crucial for its survival. Identifying essential genes is therefore of great significance for advancing our understanding of the principles of cellular life. We have developed a novel computational method that can effectively predict bacterial essential genes by extracting and integrating homologous features, protein domain features, gene intrinsic features, and network topological features. By performing principal component regression (PCR) analysis for Escherichia coli MG1655, we established a classification model with an average area under the curve (AUC) of 0.992 in ten repetitions of 5-fold cross-validation. Furthermore, when applying this new model to a distantly related organism, Streptococcus pneumoniae TIGR4, we still obtained a reliable AUC of 0.788. These results indicate that our feature-integrated approach could have practical applications in accurately investigating essential genes across a broad range of bacterial species, and could also provide helpful guidelines for designing a minimal cell.
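The AUC values quoted above summarize how well a score ranks essential genes above non-essential ones. A minimal sketch of the rank-based definition follows: AUC is the fraction of (essential, non-essential) pairs in which the essential gene receives the higher score, with ties counted as half. The labels and scores are made up, and this is not the study's model or cross-validation procedure.

```python
def auc(labels, scores):
    """Return the area under the ROC curve via pairwise comparison.

    `labels` holds 1 for essential and 0 for non-essential genes;
    `scores` holds the corresponding predicted essentiality scores.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
value = auc(labels, scores)
```

The pairwise form is quadratic in the number of genes, which is acceptable for a sketch but slower than sorting-based implementations on genome-scale data.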


Subjects
Computational Biology/methods , Escherichia coli/genetics , Bacterial Genes , Essential Genes , Streptococcus pneumoniae/genetics , Algorithms , Area Under Curve , Genetic Databases , False Positive Reactions , Genomics/methods , Phylogeny , Protein Domains , Protein Interaction Mapping , 16S Ribosomal RNA/genetics , ROC Curve , Regression Analysis , Sensitivity and Specificity
10.
Front Microbiol ; 9: 2948, 2018.
Article in English | MEDLINE | ID: mdl-30581420

ABSTRACT

Understanding how proteins evolve is important, and the order in which amino acids were recruited into the genetic code has been found to be an important factor shaping the amino acid composition of proteins. Recent work on the last universal common ancestor (LUCA) makes it possible to determine the potential factors shaping amino acid compositions during evolution. We investigated the LUCA genes/proteins from Methanococcus maripaludis S2, which is one of the possible LUCA candidates. The evolutionary rates of these genes correlate positively with GC content, with P-values significantly lower than 0.05 for 94% of homologous genes. Linear regression showed that the compositions of amino acids coded by GC-rich codons contribute positively to the evolutionary rates, and these amino acids tend to be gained in GC-rich organisms according to our results. The first principal component correlates very well with GC content. The ratios of amino acids of the LUCA proteins coded by GC-rich codons correlate positively with the GC content of different bacterial genomes, while the ratios of amino acids coded by AT-rich codons correlate negatively with genomic GC content. Next, we found that the recruitment order does correlate with amino acid composition, but analysis of gain and loss in codons showed that newly recruited amino acids did not increase significantly over the course of evolution. Thus, we conclude that GC content is a primary factor shaping amino acid compositions. GC content shapes amino acid composition to trade off the cost of amino acids against that of bases, which could be driven by energy efficiency.

11.
Brief Bioinform ; 2018 Nov 29.
Article in English | MEDLINE | ID: mdl-30496347

ABSTRACT

Essential genes have attracted increasing attention in recent years because of their important functions in organisms. Among the methods used to identify essential genes, accurate and efficient computational methods can compensate for the shortcomings of expensive and time-consuming experimental technologies. In this review, we collected studies on essential gene prediction in prokaryotes and eukaryotes and summarized the five predominant types of features used in them: evolutionary conservation, domain information, network topology, sequence composition and expression level. We describe how to implement the useful forms of these features and evaluate their performance on data from Escherichia coli MG1655, Bacillus subtilis 168 and human. The prerequisites and applicable range of these features are described. In addition, we investigated the techniques used to weight features in various models. To assist researchers in the field, we point to two freely accessible online tools that can be used directly to predict gene essentiality in prokaryotes and humans. This article provides a simple guide to the identification of essential genes in prokaryotes and eukaryotes.

12.
Environ Microbiol ; 20(10): 3836-3850, 2018 10.
Article in English | MEDLINE | ID: mdl-30187624

ABSTRACT

To better understand the mechanisms of bacterial adaptation to oxygen environments, we explored the genes associated with aerobic living in bacteria by comparing the frequencies of Clusters of Orthologous Groups of proteins (COGs) and by analysing gene expression. Thirty-eight COGs were detected at significantly higher frequencies (p-value less than 1e-6) in aerobes than in anaerobes. Differential expression analyses between two conditions further narrowed the prediction to 27 aerobe-specific COGs. Then, we annotated the enzymes associated with these COGs. A literature review revealed that 14 COGs contained enzymes catalysing oxygen-involved reactions or products involved in aerobic pathways, suggesting that they play important roles in survival in aerobic environments. Additionally, protein-protein interaction analyses and step length comparisons of metabolic networks suggested that the other 13 COGs may function in connection with the 14 enzyme-associated COGs, indicating that these genes may be highly associated with oxygen utilization. Phylogenetic and evolutionary analyses showed that the 27 COGs did not have similar trees and were all under purifying selection. The divergence times of species containing or lacking aerobic COGs supported an appearance time for oxygen-utilizing genes of approximately 2.80 Gyr ago. In addition to improving understanding of oxygen adaptation, our method may be extended to identify genes relevant to other living environments.


Subjects
Bacteria/enzymology , Bacteria/metabolism , Bacterial Proteins/metabolism , Oxygen/metabolism , Aerobiosis , Bacteria/classification , Bacteria/genetics , Bacterial Proteins/genetics , Molecular Evolution , Metabolic Networks and Pathways , Phylogeny
13.
Genome Biol Evol ; 10(8): 2072-2085, 2018 08 01.
Article in English | MEDLINE | ID: mdl-30060177

ABSTRACT

Pandemic cholera is a major concern for public health because of its high mortality and morbidity. Mutation accumulation (MA) experiments were performed on a representative strain of the current cholera pandemic. Although the base-pair substitution mutation rates in Vibrio cholerae (1.24 × 10⁻¹⁰ per site per generation for wild-type lines and 3.29 × 10⁻⁸ for mismatch repair deficient lines) are lower than those previously reported in other bacteria using MA analysis, we discovered specific high rates (8.31 × 10⁻⁸ per site per generation for wild-type lines and 1.82 × 10⁻⁶ for mismatch repair deficient lines) of base duplication or deletion driven by large-scale copy number variations (CNVs). These duplication-deletions are located in two pathogenic islands, IMEX and the large integron island. Each element of these islands shows a different rate of rapid integration and excision, which provides clues to the evolution of pandemicity in V. cholerae. These results also suggest that large-scale structural variants such as CNVs can accumulate rapidly during short-term evolution. Mismatch repair deficient lines exhibit a significantly increased mutation rate in the larger chromosome (Chr1) at specific regions, and this pattern is not observed in wild-type lines. We propose that the high frequency of GATC sites in Chr1 improves the efficiency of MMR, resulting in similar rates of mutation in the wild-type condition. In addition, different mutation rates and spectra were observed in the MA lines under distinct growth conditions, including minimal media, rich media and antibiotic treatments.


Subjects
Base Pairing/genetics , Cholera/epidemiology , Cholera/microbiology , Gene Deletion , Gene Duplication , Pandemics , Vibrio cholerae/genetics , Bacterial Chromosomes/genetics , Culture Media , DNA Replication Timing/drug effects , Genomic Islands , Humans , Mutation Rate , Reproducibility of Results , Rifampin/pharmacology , Vibrio cholerae/drug effects
14.
Int J Biol Sci ; 14(8): 883-891, 2018.
Article in English | MEDLINE | ID: mdl-29989083

ABSTRACT

Meiotic recombination is caused by meiotic double-strand DNA breaks. In some regions the frequency of DNA recombination is relatively high, while in other regions it is low: the former are usually called "recombination hotspots" and the latter "recombination coldspots". Information on hot and cold spots may provide important clues for understanding the mechanism of genome evolution. Therefore, it is important to predict these spots accurately. In this study, we rebuilt the benchmark dataset by unifying its samples to the same length (131 bp). On this foundation, and using an SVM (Support Vector Machine) classifier, a new predictor called "iRSpot-Pse6NC" was developed by incorporating the key hexamer features into the general PseKNC (Pseudo K-tuple Nucleotide Composition) via the binomial distribution approach. Rigorous cross-validation showed that the proposed predictor is superior to its counterparts in overall accuracy, stability, sensitivity and specificity. For the convenience of most experimental scientists, a web server for iRSpot-Pse6NC has been established at http://lin-group.cn/server/iRSpot-Pse6NC, through which users can easily obtain their desired results without working through the detailed mathematical equations involved.
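Hexamer features of the kind mentioned above start from the k-mer composition of a sequence with k = 6. The sketch below computes plain overlapping k-mer frequencies only; the function name is hypothetical, and neither the binomial-distribution feature selection nor the PseKNC encoding of the published predictor is reproduced.

```python
from collections import Counter


def kmer_frequencies(seq, k=6):
    """Return {k-mer: relative frequency} over all overlapping windows."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


# Short made-up sequence; real samples in the abstract are 131 bp long.
freqs = kmer_frequencies("ACGTACGTAC", k=6)
```

A 131 bp sample yields 126 overlapping hexamers, drawn from 4096 possible ones, so the resulting feature vector is sparse.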


Subjects
Computational Biology/methods , Genetic Recombination/genetics , Saccharomyces cerevisiae/genetics , Algorithms , DNA Sequence Analysis , Software
15.
Sci Rep ; 8(1): 7382, 2018 05 09.
Article in English | MEDLINE | ID: mdl-29743515

ABSTRACT

Inconsistent results on the association between evolutionary rates and amino acid composition of proteins have been reported in eukaryotes. However, there are few studies of how amino acid composition can influence evolutionary rates in bacteria. We therefore constructed linear regression models between the composition frequencies of amino acids and evolutionary rates for bacteria. The compositions of all amino acids can on average explain 21.5% of the variation in evolutionary rates among 273 investigated bacterial organisms. In five model organisms, amino acid composition contributes more to variation in evolutionary rates than protein abundance and the frequency of optimal codons. The contribution of individual amino acid composition to evolutionary rate varies among organisms. The closer the GC-content of a genome is to its maximum or minimum, the better the correlation between amino acid content and the evolutionary rate of proteins in that genome. The amino acids that contribute significantly to evolutionary rates can be grouped into GC-rich and AT-rich amino acids. In addition, amino acids that are abundant in the proteome contribute more to evolutionary rates than those that are rare. In summary, amino acid composition contributes significantly to the rate of evolution in bacterial organisms, and this in turn is affected by GC-content.


Subjects
Amino Acid Sequence , Bacteria/genetics , Molecular Evolution , Bacterial Genome , Proteome/genetics , Bacteria/metabolism , Base Composition , Proteome/chemistry , Proteome/metabolism
16.
Nucleic Acids Res ; 46(D1): D393-D398, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29036676

ABSTRACT

CRISPR-Cas is a tool that is widely used for gene editing. However, unexpected off-target effects may occur as a result of long-term nuclease activity. Anti-CRISPR proteins, which are powerful molecules that inhibit the CRISPR-Cas system, may have the potential to promote better utilization of the CRISPR-Cas system in gene editing, especially for gene therapy. Additionally, more in-depth research on these proteins would help researchers to better understand the co-evolution of bacteria and phages. Therefore, it is necessary to collect and integrate data on the various types of anti-CRISPRs. Herein, data on these proteins were gathered manually by screening the literature. The first online resource, anti-CRISPRdb, was then constructed to organize these proteins effectively. It contains the available protein sequences, DNA sequences, coding regions, source organisms, taxonomy, virulence, protein interactors and their corresponding three-dimensional structures. Users can access our database at http://cefg.uestc.edu.cn/anti-CRISPRdb/ without registration. We believe that anti-CRISPRdb can be used as a resource to facilitate research on anti-CRISPR proteins and in related fields.


Subjects
Bacteriophages/physiology , CRISPR-Cas Systems , Protein Databases , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
17.
DNA Res ; 24(6): 623-633, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-28992099

ABSTRACT

Although a growing number of interacting participants in the translation process have been identified, how they cooperate to determine the final translation efficiency remains unclear in detail. Here, we reasoned that the basic translation components, tRNAs and amino acids, should be consistent with each other to maximize efficiency and minimize cost. We first revealed that 310 out of 410 investigated genomes from the three domains had significant co-adaptation between tRNA gene copy numbers and amino acid compositions, indicating that maximum efficiency constitutes a ubiquitous selection pressure on protein translation. Furthermore, fast-growing and larger bacteria were found to have significantly better co-adaptation, confirming the effect of this pressure. Within an organism, highly expressed proteins and those connected to acute responses have higher co-adaptation intensity. Thus, better co-adaptation probably speeds up cell growth by accelerating the translation of particular proteins. Experimentally, manipulating the tRNA gene copy number to optimize co-adaptation between enhanced green fluorescent protein (EGFP) and the tRNA gene set of Escherichia coli indeed increased the translation rate. Finally, as a newly confirmed mechanism regulating translation rate, this co-adaptation not only deepens our understanding of the translation process but also provides an easy and practicable method to improve protein translation rates and productivity.


Subjects
Amino Acids/genetics , Escherichia coli/genetics , Gene Dosage , Transfer RNA/genetics , Saccharomyces cerevisiae/genetics , Physiological Adaptation , Escherichia coli/metabolism , Green Fluorescent Proteins/genetics , Green Fluorescent Proteins/metabolism , Humans , Protein Biosynthesis , Transfer RNA/metabolism , Saccharomyces cerevisiae/metabolism , Genetic Selection
18.
BMC Syst Biol ; 11(1): 50, 2017 04 19.
Article in English | MEDLINE | ID: mdl-28420402

ABSTRACT

BACKGROUND: Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. In particular, if a single reaction is catalyzed by multiple enzymes, inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding genes. Existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions. However, none of these databases focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. DESCRIPTION: Here, we present a species-specific essential reactions database (SSER). The current version comprises the essential biochemical and transport reactions of twenty-six organisms, identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. The main contents of the database are quantitative data on the number of essential reactions, the number of essential reactions associated with their respective enzyme-encoding genes, and the essential reactions shared across organisms. CONCLUSION: SSER would be a prime source of essential reaction data and related gene and metabolite information, and it can significantly facilitate the reconstruction and analysis of metabolic network models and drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of interest through the website http://cefg.uestc.edu.cn/sser.


Subjects
Computational Biology/methods , Factual Databases , Metabolic Flux Analysis
19.
BMC Microbiol ; 17(1): 73, 2017 03 27.
Article in English | MEDLINE | ID: mdl-28347342

ABSTRACT

BACKGROUND: Genomic islands (GIs) are genomic regions that reveal evidence of horizontal DNA transfer. They can code for many functions and may augment a bacterium's adaptation to its host or environment. GIs have been identified in strain J2315 of Burkholderia cenocepacia, whereas for strain AU 1054 there have been no published works on such regions according to our text mining and keyword search in Medline. RESULTS: In this study, we identified 21 GIs in AU 1054 by combining two computational tools. Feature analyses suggested that the predictions are highly reliable and hence illustrated the advantage of joint predictions by two independent methods. Based on putative virulence factors, four GIs were further identified as pathogenicity islands (PAIs). Through experiments with gene deletion mutants in live bacteria, two putative PAIs were confirmed, and the virulence factors involved were identified as lipA and copR. The importance of the genes lipA (from PAI 1) and copR (from PAI 2) for bacterial invasion and replication indicates that they are required for the invasive properties of B. cenocepacia and may function as virulence determinants for bacterial pathogenesis and host infection. CONCLUSIONS: This approach of in silico prediction of GIs, followed by identification of potential virulence factors in the putative island regions and final validation using wet-lab experiments, could be used as an effective strategy to rapidly discover novel virulence factors in other bacterial species and strains.


Subjects
Burkholderia cenocepacia/genetics , Genomic Islands/genetics , Genomics , Virulence Factors/genetics , Virulence Factors/isolation & purification , A549 Cells , Bacterial Adhesion , Bacterial Proteins/genetics , Base Composition , Burkholderia Infections/microbiology , Burkholderia cenocepacia/growth & development , Burkholderia cenocepacia/pathogenicity , Cell Culture Techniques , Microbial Colony Count , Computational Biology/methods , Bacterial DNA , Gene Deletion , Horizontal Gene Transfer , Bacterial Genes/genetics , Bacterial Genome/genetics , Humans
20.
Bioinformatics ; 33(12): 1758-1764, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28158612

ABSTRACT

Motivation: Previously constructed classifiers for predicting eukaryotic essential genes integrated a variety of features, including experimental ones. Obtaining satisfactory predictions from nucleotide (sequence) information alone would be more promising. Three groups recently identified essential genes in human cancer cell lines using wet-lab experiments, which provided an excellent opportunity to test this idea. Here we extended the Z curve method into the λ-interval form to represent nucleotide composition and association information and used it to construct an SVM classification model. Results: Our model accurately predicted human gene essentiality with an AUC higher than 0.88 in both 5-fold cross-validation and jackknife tests. These results demonstrate that the essentiality of human genes can be reliably reflected by sequence information alone. We re-predicted the negative dataset with our Pheg server, and 118 genes were additionally predicted as essential. Among them, 20 were found to be homologues of mouse essential genes, indicating that some of the 118 genes are indeed essential but were overlooked by previous experiments. As the first available server, Pheg can predict essentiality for anonymous human gene sequences. It is also hoped that the λ-interval Z curve method can be effectively extended to classification problems for other DNA elements. Availability and Implementation: http://cefg.uestc.edu.cn/Pheg. Contact: fbguo@uestc.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
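The basic Z curve transform that this abstract builds on maps the base frequencies of a sequence to three components: purine versus pyrimidine, amino versus keto, and weak versus strong hydrogen bonding. The sketch below shows only that basic single-nucleotide form; the function name is hypothetical, and the λ-interval extension described in the abstract is not implemented here.

```python
def z_curve_components(seq):
    """Return (x, y, z) Z curve components from base frequencies.

    x = (A + G) - (C + T)   purine vs pyrimidine
    y = (A + C) - (G + T)   amino vs keto
    z = (A + T) - (G + C)   weak vs strong hydrogen bonding
    Symbols other than A, C, G, T are ignored (an assumption).
    """
    seq = seq.upper()
    n = sum(seq.count(base) for base in "ACGT")
    if n == 0:
        raise ValueError("no A, C, G or T in sequence")
    a, c, g, t = (seq.count(base) / n for base in "ACGT")
    return (a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)
```

Each component lies in [-1, 1], so the three values can be used directly as numeric features for a classifier.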


Subjects
Base Composition , Essential Genes , DNA Sequence Analysis/methods , Software , Animals , Eukaryotes/genetics , Humans , Mice , Genetic Models