Results 1 - 20 of 22
1.
Nucleic Acids Res ; 44(W1): W550-6, 2016 Jul 08.
Article in English | MEDLINE | ID: mdl-27150808

ABSTRACT

In order to foster innovation and improve the effectiveness of drug discovery, there is a considerable interest in exploring unknown 'chemical space' to identify new bioactive compounds with novel and diverse scaffolds. Hence, fragment-based drug discovery (FBDD) was developed rapidly due to its advanced expansive search for 'chemical space', which can lead to a higher hit rate and ligand efficiency (LE). However, computational screening of fragments is always hampered by the promiscuous binding model. In this study, we developed a new web server Auto Core Fragment in silico Screening (ACFIS). It includes three computational modules, PARA_GEN, CORE_GEN and CAND_GEN. ACFIS can generate core fragment structure from the active molecule using fragment deconstruction analysis and perform in silico screening by growing fragments to the junction of core fragment structure. An integrated energy calculation rapidly identifies which fragments fit the binding site of a protein. We constructed a simple interface to enable users to view top-ranking molecules in 2D and the binding mode in 3D for further experimental exploration. This makes the ACFIS a highly valuable tool for drug discovery. The ACFIS web server is free and open to all users at http://chemyang.ccnu.edu.cn/ccb/server/ACFIS/.


Subjects
Computer Simulation , Drug Discovery/methods , Drug Evaluation, Preclinical/methods , Internet , Ligands , Proteins/chemistry , Software , Binding Sites , Imaging, Three-Dimensional , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , User-Computer Interface
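
The abstract of record 1 cites ligand efficiency (LE) as a benefit of fragment-based screening. A minimal sketch of the commonly used definition, LE = -ΔG / (number of heavy atoms), is given here; whether ACFIS scores fragments this way is not stated in the abstract, and the energies and atom counts below are hypothetical.

```python
def ligand_efficiency(delta_g_kcal_per_mol: float, heavy_atoms: int) -> float:
    """Binding free energy per heavy (non-hydrogen) atom, in kcal/mol per atom.

    Uses the common definition LE = -dG / N_heavy (an assumption here,
    not taken from the ACFIS paper).
    """
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -delta_g_kcal_per_mol / heavy_atoms


# Hypothetical values: a small fragment versus a lead-sized molecule.
fragment_le = ligand_efficiency(-6.0, 12)
lead_le = ligand_efficiency(-10.0, 35)
print(f"fragment LE = {fragment_le:.2f}, lead LE = {lead_le:.2f}")
```

A weaker binder can still have the higher LE when it is much smaller, which is the usual argument for starting from fragments.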
2.
Environ Microbiol ; 19(3): 1266-1280, 2017 03.
Article in English | MEDLINE | ID: mdl-28028888

ABSTRACT

Laribacter hongkongensis is a fish-borne pathogen associated with invasive infections and gastroenteritis. Its adaptive mechanisms to oxygen-limiting conditions in various environmental niches remain unclear. In this study, we compared the transcriptional profiles of L. hongkongensis under aerobic and anaerobic conditions using RNA-sequencing. Expression of genes involved in arginine metabolism significantly increased under anoxic conditions. Arginine was exploited as the sole energy source in L. hongkongensis for anaerobic respiration via the arginine catabolism pathway: specifically via the arginine deiminase (ADI) pathway. A transcriptional regulator FNR was identified to coordinate anaerobic metabolism by tightly regulating the expression of arginine metabolism genes. FNR executed its regulatory function by binding to FNR boxes in arc operons promoters. Survival of isogenic fnr mutant in macrophages decreased significantly when compared with wild-type; and expression level of fnr increased 8 h post-infection. Remarkably, FNR directly interacted with ArgR, another regulator that influences the biological fitness and intracellular survival of L. hongkongensis by regulating arginine metabolism genes. Our results demonstrated that FNR and ArgR work in coordination to respond to oxygen changes in both extracellular and intracellular environments, by finely regulating the ADI pathway and arginine anabolism pathways, thereby optimizing bacterial fitness in various environmental niches.


Subjects
Arginine/metabolism , Bacterial Proteins/metabolism , Betaproteobacteria/physiology , Gene Expression Regulation, Bacterial , Iron-Sulfur Proteins/metabolism , Acclimatization , Adaptation, Physiological , Anaerobiosis , Bacterial Proteins/genetics , Betaproteobacteria/genetics , Hydrolases/metabolism , Iron-Sulfur Proteins/genetics , Metabolic Networks and Pathways , Operon , Promoter Regions, Genetic
3.
Mol Biol Evol ; 31(5): 1302-8, 2014 May.
Article in English | MEDLINE | ID: mdl-24531082

ABSTRACT

Mutation is the ultimate source of genetic variation and evolution. Mutation accumulation (MA) experiments are an alternative approach to study de novo mutation events directly. We have constructed a resource of Spontaneous Mutation Accumulation Lines (SMAL; http://cefg.uestc.edu.cn/smal), which contains all the current publicly available MA lines identified by high-throughput sequencing. We have relocated and mapped the mutations based on the most recent genome annotations. A total of 5,608 single base mutations and 540 other mutations were obtained and are recorded in the current version of the SMAL database. The integrated data in SMAL provide detailed information that can be used in new theoretical analyses. We believe that the SMAL resource will help researchers better understand the processes of genetic variation and the incidence of disease.


Subjects
Databases, Genetic , Mutation , Animals , Drosophila melanogaster/genetics , Escherichia coli/genetics , Evolution, Molecular , Female , Genetic Drift , Genetic Fitness , Genomics , High-Throughput Nucleotide Sequencing , Humans , Male , Models, Genetic , Salmonella typhimurium/genetics
4.
Int J Mol Sci ; 16(9): 23111-26, 2015 Sep 23.
Article in English | MEDLINE | ID: mdl-26404268

ABSTRACT

Composition bias from Chargaff's second parity rule (PR2) has long been found in sequenced genomes, and is believed to relate strongly with the replication process in microbial genomes. However, some disagreement on the underlying reason for strand composition bias remains. We performed an integrative analysis of various genomic features that might influence composition bias using a large-scale dataset of 1111 genomes. Our results indicate (1) the bias was stronger in obligate intracellular bacteria than in other free-living species (p-value=0.0305); (2) Fusobacteria and Firmicutes had the highest average bias among the 24 microbial phyla analyzed; (3) the strength of selected codon usage bias and generation times were not observably related to strand composition bias (p-value=0.3247); (4) significant negative relationships were found between GC content, genome size, rearrangement frequency, Clusters of Orthologous Groups (COG) functional subcategories A, C, I, Q, and composition bias (p-values<1.0×10^-8); (5) gene density and COG functional subcategories D, F, J, L, and V were positively related with composition bias (p-value<2.2×10^-16); and (6) gene density made the most important contribution to composition bias, indicating transcriptional bias was associated strongly with strand composition bias. Therefore, strand composition bias was found to be influenced by multiple factors with varying weights.


Subjects
Bacteria/genetics , Genome, Bacterial , Base Composition , Gene Dosage , Genes, Bacterial , Principal Component Analysis , Recombination, Genetic
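
Record 4 concerns deviations from Chargaff's second parity rule, under which A ≈ T and G ≈ C within a single strand. The abstract does not give the exact bias measure used, so the sketch below assumes one simple choice: the sum of the normalised AT and GC imbalances on one strand. The function name `pr2_bias` is hypothetical.

```python
from collections import Counter


def pr2_bias(strand: str) -> float:
    """Deviation of one DNA strand from PR2 (A = T and G = C within a strand).

    Assumed metric: |A - T| / (A + T) + |G - C| / (G + C).
    0.0 means perfect parity; 2.0 is the maximum possible value.
    """
    counts = Counter(strand.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_term = abs(a - t) / (a + t) if a + t else 0.0
    gc_term = abs(g - c) / (g + c) if g + c else 0.0
    return at_term + gc_term


print(pr2_bias("ATGCATGC"))  # balanced strand
print(pr2_bias("AAAGAAAG"))  # strongly skewed strand
```

In practice such a measure would be computed per replichore or over the whole leading strand; that choice is also left open by the abstract.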
5.
BMC Genomics ; 14: 769, 2013 Nov 09.
Article in English | MEDLINE | ID: mdl-24209780

ABSTRACT

BACKGROUND: Essential genes are indispensable for the survival of living entities. They are the cornerstones of synthetic biology, and are potential candidate targets for antimicrobial and vaccine design. DESCRIPTION: Here we describe the Cluster of Essential Genes (CEG) database, which contains clusters of orthologous essential genes. Based on the size of a cluster, users can easily decide whether an essential gene is conserved in multiple bacterial species or is species-specific. It contains the similarity value of every essential gene cluster against human proteins or genes. The CEG_Match tool is based on the CEG database, and was developed for prediction of essential genes according to function. The database is available at http://cefg.uestc.edu.cn/ceg. CONCLUSIONS: Properties contained in the CEG database, such as cluster size, and the similarity of essential gene clusters against human proteins or genes, are very important for evolutionary research and drug design. An advantage of CEG is that it clusters essential genes based on function, and therefore decreases false positive results when predicting essential genes in comparison with using the similarity alignment method.


Subjects
Databases, Genetic , Genes, Essential , Internet , Algorithms , Humans , Microarray Analysis , Software , Species Specificity
6.
Plant Genome ; 16(2): e20317, 2023 06.
Article in English | MEDLINE | ID: mdl-36896476

ABSTRACT

Fully understanding traditional Chinese medicines (TCMs) is still challenging because of the extreme complexity of their chemical components and mechanisms of action. The TCM Plant Genome Project aimed to obtain genetic information, determine gene functions, discover regulatory networks of herbal species, and elucidate the molecular mechanisms involved in the disease prevention and treatment, thereby accelerating the modernization of TCMs. A comprehensive database that contains TCM-related information will provide a vital resource. Here, we present an integrative genome database of TCM plants (IGTCM) that contains 14,711,220 records of 83 annotated TCM-related herb genomes, including 3,610,350 genes, 3,534,314 proteins and corresponding coding sequences, and 4,032,242 RNAs, as well as 1033 non-redundant component records for 68 herbs, downloaded and integrated from the GenBank and RefSeq databases. For minimal interconnectivity, each gene, protein, and component was annotated using the eggNOG-mapper tool and Kyoto Encyclopedia of Genes and Genomes database to acquire pathway information and enzyme classifications. These features can be linked across several species and different components. The IGTCM database also provides visualization and sequence similarity search tools for data analyses. These annotated herb genome sequences in IGTCM database are a necessary resource for systematically exploring genes related to the biosynthesis of compounds that have significant medicinal activities and excellent agronomic traits that can be used to improve TCM-related varieties through molecular breeding. It also provides valuable data and tools for future research on drug discovery and the protection and rational use of TCM plant resources. The IGTCM database is freely available at http://yeyn.group:96/.


Subjects
Drugs, Chinese Herbal , Medicine, Chinese Traditional , Drugs, Chinese Herbal/chemistry , Drugs, Chinese Herbal/pharmacology , Drugs, Chinese Herbal/therapeutic use
7.
Yi Chuan ; 34(4): 420-30, 2012 Apr.
Article in Chinese | MEDLINE | ID: mdl-22522159

ABSTRACT

Essential genes are indispensable for the survival of an organism under optimal conditions. Research on essential genes has recently become a hot topic in microbiology, genomics, and bioinformatics. This paper describes the experiments that determined essential genes in several microbes and reviews theoretical research on essential genes. The main topics are the comparison of essential and non-essential genes in terms of evolutionary conservation and sequence composition, in silico prediction of essential genes, and analysis of the chromosomal distribution of essential genes. Finally, recent progress is summarized and open problems are pointed out.


Subjects
Bacteria/genetics , Genes, Essential/physiology , Evolution, Molecular
8.
Front Endocrinol (Lausanne) ; 13: 830760, 2022.
Article in English | MEDLINE | ID: mdl-35360080

ABSTRACT

Purpose: Anaplastic thyroid carcinoma (ATC) and primary squamous cell carcinoma of the thyroid (PSCCTh) have similar histological findings and are currently treated using the same approaches; however, the characteristics and prognosis of these cancers are poorly researched. The objective of this study was to determine the differences in characteristics between ATC and PSCCTh and establish prognostic models. Patients and Methods: All variables of patients with ATC and PSCCTh, diagnosed from 2004 to 2015, were retrieved from the Surveillance, Epidemiology, and End Results Program (SEER) database. Percentage differences for categorical data were compared using the Chi-square test. Kaplan-Meier curves, the log-rank test, and Cox regression were used for survival analysis, and the C-index was used to evaluate the performance of the prognostic models. Results: After application of the inclusion and exclusion criteria, a total of 1164 ATC and 124 PSCCTh patients, diagnosed from 2004 to 2015, were included in the study. There were no differences in sex, ethnicity, age, marital status, or percentage of proximal metastases between the two cancers; however, radiotherapy, chemotherapy, incidence of surgical treatment, and presence of multiple primary tumors were higher in patients with ATC than those with PSCCTh. Furthermore, cancer-specific survival (CSS) of patients with PSCCTh was better than that of patients with ATC. Prognostic factors were not identical for the two cancers. Multivariate Cox model analysis indicated that age, sex, radiotherapy, chemotherapy, surgery, multiple primary tumors, marital status, and distant metastasis status are independent prognostic factors for CSS in patients with ATC, while for patients with PSCCTh, the corresponding factors are age, radiotherapy, multiple primary tumors, and surgery. The C-index values of the two models were both > 0.8, indicating that the models exhibited good discriminative ability.
Conclusion: Prognostic factors influencing CSS were not identical in patients with ATC and PSCCTh. These findings indicate that different clinical treatment and management plans are required for patients with these two types of thyroid cancer.


Subjects
Carcinoma, Squamous Cell , Thyroid Carcinoma, Anaplastic , Thyroid Neoplasms , Carcinoma, Squamous Cell/epidemiology , Carcinoma, Squamous Cell/therapy , Epithelial Cells/pathology , Humans , Prognosis , Thyroid Carcinoma, Anaplastic/epidemiology , Thyroid Carcinoma, Anaplastic/therapy , Thyroid Neoplasms/diagnosis , Thyroid Neoplasms/epidemiology , Thyroid Neoplasms/therapy
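
Record 8 compares survival with Kaplan-Meier curves. As a rough illustration of what such a curve is, the sketch below implements the standard product-limit estimator in plain Python. The follow-up times and event flags are invented for the example and are not SEER data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the death was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical cohort: four patients, the second one censored at month 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

Comparing two such curves formally requires the log-rank test, which is not sketched here.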
9.
Front Endocrinol (Lausanne) ; 13: 882279, 2022.
Article in English | MEDLINE | ID: mdl-36176465

ABSTRACT

Background: This study aimed to establish and validate an accurate prognostic model, based on demographic and clinical parameters, for predicting the cancer-specific survival (CSS) of patients with poorly differentiated thyroid carcinoma (PDTC). Materials and methods: Data on patients diagnosed with PDTC between 2004 and 2015 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. The data were randomly split into training and validation sets. Kaplan-Meier analysis with the log-rank test was performed to compare the survival distribution among cases. Univariate and multivariate Cox proportional hazards regression analyses were used to identify independent prognostic factors, which were subsequently utilized to construct a nomogram for predicting the 5- and 10-year cancer-specific survival of patients with PDTC. The discriminative ability and calibration of the nomogram model were assessed using the concordance index and calibration plots, respectively. In addition, we performed a decision curve analysis to assess the clinical value of the nomogram. We also compared the predictive performance of the nomogram model against that of the American Joint Committee on Cancer (AJCC) T-, N-, M-stage. Results: A total of 970 eligible patients were randomly assigned to either a training cohort (n = 679) or a validation cohort (n = 291). The Kaplan-Meier analysis revealed that there were no significant differences in cumulative survival based on the race, radiation, and marital status of patients. The stepwise Cox regression model showed that the model was optimal when the following five variables were included: age, tumor size, T-, N-, and M-stage. A nomogram was developed as a graphical representation of the model and exhibited good calibration and discriminative ability in the study.
Compared to the T-, N-, and M-stage, the C-index of the nomogram (training group: 0.807, validation group: 0.802), the areas under the receiver operating characteristic curve of the training set (5-year AUC: 0.843, 10-year AUC: 0.834) and the validation set (5-year AUC: 0.878, 10-year AUC: 0.811), and the calibration plots of this model all exhibited better performance. Finally, compared with T-, N-, and M-stage, the decision curve analysis indicated that the nomogram had excellent clinical net benefit. Conclusions: The nomogram we developed can accurately predict the CSS of PDTC patients. It can help clinicians determine appropriate treatment strategies for poorly differentiated thyroid carcinoma patients.


Subjects
Adenocarcinoma , Thyroid Neoplasms , Adenocarcinoma/pathology , Humans , Neoplasm Staging , Nomograms , SEER Program , Thyroid Neoplasms/epidemiology , Thyroid Neoplasms/therapy
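
Records 8 and 9 both report a concordance index (C-index) as the measure of discriminative ability. The sketch below shows one common formulation, Harrell's C, computed by direct pair counting. It is an illustration only: the abstracts do not say which software or tie-handling rule was used, and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: the fraction of comparable patient pairs in which the
    patient with the shorter observed survival also has the higher risk score.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical patients: higher score should mean earlier death.
print(concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values above 0.8 are read as good discrimination.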
10.
Indian J Biochem Biophys ; 48(6): 416-21, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22329244

ABSTRACT

Although non-coding RNA (ncRNA) genes do not encode proteins, they play vital roles in cells by producing functionally important RNAs. In this paper, we present a novel method for predicting ncRNA genes based on compositional features extracted directly from gene sequences. Our method consists of two Support Vector Machine (SVM) models--Codon model which uses codon usage features derived from ncRNA genes and protein-coding genes and Kmer model which utilizes features of nucleotide and dinucleotide frequency extracted respectively from ncRNA genes and randomly chosen genome sequences. The 10-fold cross-validation accuracy for the two models is found to be 92% and 91%, respectively. Thus, we could make an automatic prediction of ncRNA genes in one genome without manual filtration of protein-coding genes. After applying our method in Sulfolobus solfataricus genome, 25 prediction results have been generated according to 25 cut-off pairs. We have also applied the approach in E. coli and found our results comparable to those of previous studies. In general, our method enables automatic identification of ncRNA genes in newly sequenced prokaryotic genomes.


Subjects
Prokaryotic Cells , RNA, Untranslated/genetics , Models, Genetic
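
The Kmer model in record 10 is described as using nucleotide and dinucleotide frequencies as features. A minimal sketch of that kind of feature extraction follows; the function names and the 4 + 16 feature layout are assumptions for illustration, and the SVM training step is omitted.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int) -> list:
    """Relative frequency of every k-mer over the alphabet ACGT, in sorted order.

    Windows containing other symbols (for example N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def kmer_feature_vector(seq: str) -> list:
    """4 nucleotide frequencies followed by 16 dinucleotide frequencies."""
    return kmer_frequencies(seq, 1) + kmer_frequencies(seq, 2)


features = kmer_feature_vector("ACGTACGTTTGA")
print(len(features), features[:4])
```

Vectors of this form could then be passed to any SVM implementation, with ncRNA genes as positives and randomly chosen genome sequences as negatives, as the abstract describes.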
11.
Comput Struct Biotechnol J ; 19: 4042-4048, 2021.
Article in English | MEDLINE | ID: mdl-34527183

ABSTRACT

Studies on codon property would deepen our understanding of the origin of primitive life and enlighten biotechnical application. Here, we proposed a quantitative measurement of codon-amino acid association and found that seven out of 13 physicochemical properties have stronger associations with the nucleotide identity at the second codon position, indicating that protein structure and function may associate more closely with it than the other two sites. When extending the effect of codon-amino acid association to protein level, it was found that the correlation between the second codon position (measured by the relative frequencies of nucleobase T and A at this codon site) and hydrophobicity (by the form of GRAVY value) became stronger with 96% genomes having R > 0.90 and p < 1e-60. Furthermore, we revealed that informational genes encoding proteins have lower GRAVY values than operational proteins (p < 3e-37) in both prokaryotic and eukaryotic genomes. The above results reveal a complete link from codon identity (A2 versus T2) to amino acid property (hydrophilic versus hydrophobic) and then to protein functions (informational versus operational). Hence, our work may help to understand how the nucleotide sequence determines protein function.
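
The two quantities correlated in this abstract can be sketched as follows: the frequency of a given base at the second codon position of a coding sequence, and the GRAVY value of a protein (mean Kyte-Doolittle hydropathy). The hydropathy table is the standard Kyte-Doolittle scale; the function names and the toy sequences are hypothetical, and the authors' exact pipeline is not given in the abstract.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def second_position_frequency(cds: str, base: str) -> float:
    """Fraction of complete codons whose second nucleotide equals `base`."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    return sum(codon[1] == base.upper() for codon in codons) / len(codons)


# Toy coding sequence ATG TTT GAA encodes Met-Phe-Glu.
print(second_position_frequency("ATGTTTGAA", "T"), gravy("MFE"))
```

In the standard genetic code, codons with T at the second position encode hydrophobic residues (F, L, I, M, V), which is the link the abstract quantifies at genome scale.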

12.
Database (Oxford) ; 2020, 2020 12 11.
Article in English | MEDLINE | ID: mdl-33306800

ABSTRACT

Essential genes are key elements for organisms to maintain their living. Building databases that store essential genes in the form of homologous clusters, rather than storing them as a singleton, can provide more enlightening information such as the general essentiality of homologous genes in multiple organisms. In 2013, the first database to store prokaryotic essential genes in clusters, CEG (Clusters of Essential Genes), was constructed. Afterward, the amount of available data for essential genes increased by a factor >3 since the last revision. Herein, we updated CEG to version 2, including more prokaryotic essential genes (from 16 gene datasets to 29 gene datasets) and newly added eukaryotic essential genes (nine species), specifically the human essential genes of 12 cancer cell lines. For prokaryotes, information associated with drug targets, such as protein structure, ligand-protein interaction, virulence factor and matched drugs, is also provided. Finally, we provided the service of essential gene prediction for both prokaryotes and eukaryotes. We hope our updated database will benefit more researchers in drug targets and evolutionary genomics. Database URL: http://cefg.uestc.cn/ceg.


Subjects
Eukaryota , Genes, Essential , Databases, Factual , Genes, Essential/genetics , Genomics , Humans , Proteins
13.
Int J Biol Sci ; 15(7): 1396-1403, 2019.
Article in English | MEDLINE | ID: mdl-31337970

ABSTRACT

Dendritic cells (DCs) are the most potent specialized antigen-presenting cells known, and they play a crucial role in initiating and amplifying both innate and adaptive immune responses. Immunologically, the motility and T cell activation capability of DCs are closely related to the resulting immune responses. However, due to the complexity of the immune system, the dynamic changes in cell numbers during the DC-induced immune response in peripheral tissue (e.g. skin and mucosa) are still poorly understood. Therefore, this study simulated the dynamic changes in the numbers of DCs and T cells in this process by constructing several ordinary differential equations and setting the initial conditions of the functions and parameters. The results showed that these equations could simulate the dynamic changes in the numbers of DCs and T cells in peripheral tissue and lymph node, in accordance with physiological conditions such as the duration of the immune response and the proliferation rates and motilities of DCs and T cells. This model provides a theoretical reference for studying the immunologic functions of DCs and practical guidance for clinical DC-based therapy against immune-related diseases.


Subjects
Dendritic Cells/cytology , Immunity, Cellular , Models, Theoretical , T-Lymphocytes/cytology , Antigens/immunology , Cell Movement , Cell Proliferation , Humans , Immunotherapy , Inflammation , Lymph Nodes/pathology , Lymphocyte Activation
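
Record 13 models DC and T cell numbers with ordinary differential equations, but the abstract does not reproduce the equations. The sketch below is therefore a deliberately simple, hypothetical three-compartment system (DCs in tissue, DCs in the lymph node, T cells) integrated with the explicit Euler method. Every equation and parameter value here is an assumption chosen only to show the mechanics of such a simulation.

```python
def simulate(t_end=200.0, dt=0.01,
             recruitment=10.0,   # DCs entering peripheral tissue per unit time
             migration=0.5,      # rate of DC migration from tissue to lymph node
             dc_death=0.5,       # DC death rate in either compartment
             activation=0.2,     # T cells activated per lymph-node DC per unit time
             t_death=0.1):       # T cell death rate
    """Explicit Euler integration of a hypothetical linear model:

        dD_tissue/dt = recruitment - (migration + dc_death) * D_tissue
        dD_node/dt   = migration * D_tissue - dc_death * D_node
        dT/dt        = activation * D_node - t_death * T
    """
    dc_tissue = dc_node = t_cells = 0.0
    for _ in range(int(round(t_end / dt))):
        d_tissue = recruitment - (migration + dc_death) * dc_tissue
        d_node = migration * dc_tissue - dc_death * dc_node
        d_t = activation * dc_node - t_death * t_cells
        dc_tissue += dt * d_tissue
        dc_node += dt * d_node
        t_cells += dt * d_t
    return dc_tissue, dc_node, t_cells


dc_tissue, dc_node, t_cells = simulate()
print(dc_tissue, dc_node, t_cells)
```

With these made-up parameters the system settles to the analytical steady state (recruitment / (migration + dc_death) for tissue DCs, and so on), which gives a quick sanity check on the integrator. A realistic model would add nonlinear terms such as T cell proliferation.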
14.
Genome Biol Evol ; 10(8): 2072-2085, 2018 08 01.
Article in English | MEDLINE | ID: mdl-30060177

ABSTRACT

Pandemic cholera is a major concern for public health because of its high mortality and morbidity. Mutation accumulation (MA) experiments were performed on a representative strain of the current cholera pandemic. Although the base-pair substitution mutation rates in Vibrio cholerae (1.24 × 10^-10 per site per generation for wild-type lines and 3.29 × 10^-8 for mismatch repair deficient lines) are lower than that previously reported in other bacteria using MA analysis, we discovered specific high rates (8.31 × 10^-8 site/generation for wild-type lines and 1.82 × 10^-6 for mismatch repair deficient lines) of base duplication or deletion driven by large-scale copy number variations (CNVs). These duplication-deletions are located in two pathogenic islands, IMEX and the large integron island. Each element of these islands has discrepant rate in rapid integration and excision, which provides clues to the pandemicity evolution of V. cholerae. These results also suggest that large-scale structural variants such as CNVs can accumulate rapidly during short-term evolution. Mismatch repair deficient lines exhibit a significantly increased mutation rate in the larger chromosome (Chr1) at specific regions, and this pattern is not observed in wild-type lines. We propose that the high frequency of GATC sites in Chr1 improves the efficiency of MMR, resulting in similar rates of mutation in the wild-type condition. In addition, different mutation rates and spectra were observed in the MA lines under distinct growth conditions, including minimal media, rich media and antibiotic treatments.


Subjects
Base Pairing/genetics , Cholera/epidemiology , Cholera/microbiology , Gene Deletion , Gene Duplication , Pandemics , Vibrio cholerae/genetics , Chromosomes, Bacterial/genetics , Culture Media , DNA Replication Timing/drug effects , Genomic Islands , Humans , Mutation Rate , Reproducibility of Results , Rifampin/pharmacology , Vibrio cholerae/drug effects
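
The per-site, per-generation rates quoted in record 14 are the kind of figure usually obtained in MA experiments by dividing the total number of observed mutations by the summed product of analysed sites and elapsed generations across lines. The sketch below assumes that estimator; the counts are hypothetical and are not taken from the V. cholerae study.

```python
def mutation_rate(total_mutations, sites_per_line, generations_per_line):
    """Mutations per site per generation, pooled over all MA lines.

    Assumed estimator: mu = m / sum_i(n_i * T_i), where n_i is the number of
    analysed sites and T_i the number of generations for line i.
    """
    if len(sites_per_line) != len(generations_per_line):
        raise ValueError("one sites value and one generations value per line")
    denominator = sum(n * g for n, g in zip(sites_per_line, generations_per_line))
    if denominator == 0:
        raise ValueError("no site-generations observed")
    return total_mutations / denominator


# Hypothetical experiment: 2 lines, 4 Mb analysed, 5000 generations each.
rate = mutation_rate(10, [4_000_000, 4_000_000], [5000, 5000])
print(f"{rate:.2e} per site per generation")
```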
16.
BMC Syst Biol ; 11(1): 50, 2017 04 19.
Article in English | MEDLINE | ID: mdl-28420402

ABSTRACT

BACKGROUND: Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. Especially if a single reaction is catalyzed by multiple enzymes, then inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding gene. The existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions. But none of these databases especially focus on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. DESCRIPTION: Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms which are identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. Quantitative data on the number of essential reactions, number of the essential reactions associated with their respective enzyme-encoding genes and shared essential reactions across organisms are the main contents of the database. CONCLUSION: SSER would be a prime source to obtain essential reactions data and related gene and metabolite information and it can significantly facilitate the metabolic network models reconstruction and analysis, and drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser .


Subjects
Computational Biology/methods , Databases, Factual , Metabolic Flux Analysis
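
Record 16 identifies essential reactions by flux balance analysis (FBA). For orientation, the textbook FBA problem and one common way to test a reaction for essentiality are written out below. The symbols (stoichiometric matrix S, flux vector v, bounds l and u, objective weights c) follow general usage, and the threshold ε is an assumption, since the abstract does not state the cut-off the authors applied.

```latex
% Wild-type FBA: maximise the objective (typically biomass) at steady state.
\[
Z_{\mathrm{wt}} \;=\; \max_{v}\; c^{\top} v
\quad \text{subject to} \quad S\,v = 0, \qquad l \le v \le u .
\]

% Single-reaction deletion: force the flux through reaction j to zero.
\[
Z_{j} \;=\; \max_{v}\; c^{\top} v
\quad \text{subject to} \quad S\,v = 0, \qquad l \le v \le u, \qquad v_{j} = 0 .
\]

% Reaction j is called essential when the deletion (nearly) abolishes growth.
\[
\text{reaction } j \text{ is essential} \iff Z_{j} < \varepsilon\, Z_{\mathrm{wt}},
\qquad 0 < \varepsilon \ll 1 .
\]
```

Both problems are linear programs, so any LP solver can evaluate them once a curated metabolic model supplies S and the bounds.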
17.
Sci Rep ; 6: 35082, 2016 10 07.
Article in English | MEDLINE | ID: mdl-27713529

ABSTRACT

A minimal gene set (MGS) is critical for the assembly of a minimal artificial cell. We have developed a proposal of simplifying bacterial gene set to approximate a bacterial MGS by the following procedure. First, we base our simplified bacterial gene set (SBGS) on experimentally determined essential genes to ensure that the genes included in the SBGS are critical. Second, we introduced a half-retaining strategy to extract persistent essential genes to ensure stability. Third, we constructed a viable metabolic network to supplement SBGS. The proposed SBGS includes 327 genes and required 431 reactions. This report describes an SBGS that preserves both self-replication and self-maintenance systems. In the minimized metabolic network, we identified five novel hub metabolites and confirmed 20 known hubs. Highly essential genes were found to distribute the connecting metabolites into more reactions. Based on our SBGS, we expanded the pool of targets for designing broad-spectrum antibacterial drugs to reduce pathogen resistance. We also suggested a rough semi-de novo strategy to synthesize an artificial cell, with potential applications in industry.


Subjects
Artificial Cells/metabolism , Genes, Bacterial/genetics , Genes, Essential/genetics , Metabolic Networks and Pathways/genetics , Bacterial Proteins/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Regulation, Bacterial , Genomics/methods , Haemophilus influenzae/genetics , Mycoplasma genitalium/genetics
18.
Mol Biosyst ; 12(9): 2893-900, 2016 08 16.
Article in English | MEDLINE | ID: mdl-27410247

ABSTRACT

Pseudo dinucleotide composition (PseDNC) and Z curve showed excellent performance in the classification issues of nucleotide sequences in bioinformatics. Inspired by the principle of Z curve theory, we improved PseDNC to give the phase-specific PseDNC (psPseDNC). In this study, we used the prediction of recombination spots as a case to illustrate the capability of psPseDNC and also PseDNC fused with Z curve theory based on a novel machine learning method named large margin distribution machine (LDM). We verified that combining the two widely used approaches could generate better performance compared to only using PseDNC with a support vector machine based (SVM-based) model. The best Matthews correlation coefficient (MCC) achieved by our LDM-based model was 0.7037 through the rigorous jackknife test and improved by ∼6.6%, ∼3.2%, and ∼2.4% compared with three previous studies. Similarly, the accuracy was improved by 3.2% compared with our previous iRSpot-PseDNC web server through an independent data test. These results demonstrate that the joint use of PseDNC and Z curve enhances performance and can extract more information from a biological sequence. To facilitate research in this area, we constructed a user-friendly web server for predicting hot/cold spots, HcsPredictor, which can be freely accessed from . In summary, we provided a united algorithm by integrating Z curve with PseDNC. We hope this united algorithm could be extended to other classification issues in DNA elements.


Subjects
Computational Biology/methods , DNA/chemistry , DNA/genetics , Nucleotides , Algorithms , Genome, Fungal , ROC Curve , Recombination, Genetic , Reproducibility of Results , Sensitivity and Specificity , Support Vector Machine , Web Browser
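
Two ingredients named in record 18 have compact standard definitions: the Z curve, which maps a DNA sequence to three cumulative coordinates, and the Matthews correlation coefficient used to score the classifier. Both are sketched below in plain Python. This is not the psPseDNC feature set or the LDM classifier from the paper, and the confusion-matrix counts are hypothetical.

```python
import math


def z_curve(seq: str) -> list:
    """Cumulative Z-curve coordinates after each base.

    x = (A + G) - (C + T)   purine versus pyrimidine
    y = (A + C) - (G + T)   amino versus keto
    z = (A + T) - (G + C)   weak versus strong hydrogen bonding
    """
    a = c = g = t = 0
    points = []
    for base in seq.upper():
        if base == "A":
            a += 1
        elif base == "C":
            c += 1
        elif base == "G":
            g += 1
        elif base == "T":
            t += 1
        else:
            continue  # skip ambiguous symbols such as N
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from a 2x2 confusion matrix; returns 0.0 when undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


print(z_curve("AGCT")[-1])
print(round(matthews_corrcoef(tp=45, tn=40, fp=10, fn=5), 4))
```

MCC ranges from -1 to 1 and stays informative when hot and cold spots are unevenly represented, which is one reason it is often reported alongside accuracy.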
19.
Methods Mol Biol ; 1279: 205-17, 2015.
Article in English | MEDLINE | ID: mdl-25636621

ABSTRACT

Essential genes are those genes indispensable for the survival of any living cell. Bacterial essential genes constitute the cornerstones of synthetic biology and are often attractive targets in the development of antibiotics and vaccines. Because identifying essential genes in the wet lab is often costly and labor-intensive, scientists have turned to computational prediction as an alternative. To help address this issue, our research group (CEFG: group of Computational, Comparative, Evolutionary and Functional Genomics, http://cefg.uestc.edu.cn) has constructed three online services to predict essential genes in bacterial genomes. These freely available tools are applicable to single gene sequences without annotated functions, single genes with definite names, and complete genomes of bacterial strains. To ensure reliable predictions, the investigated species should belong to the same family (for EGP) or phylum (for CEG_Match and Geptop) as one of the reference species. As pilot software for this problem, their prediction accuracies have been assessed and compared with existing algorithms; note that none of the other published algorithms has been made available as an online service. We hope these services at CEFG will help scientists and researchers in the field of essential genes.


Subjects
Computational Biology/methods , Genes, Bacterial , Genes, Essential , Area Under Curve , Base Sequence , Databases, Genetic , Escherichia coli K12/genetics , Evolution, Molecular , Genomics , Multigene Family
20.
Article in English | MEDLINE | ID: mdl-24923821

ABSTRACT

Knowledge of an organism's fitness for survival is important for a complete understanding of microbial genetics and effective drug design. Current essential gene databases provide only binary essentiality data from genome-wide experiments. We therefore developed a new database that Integrates quantitative Fitness Information for Microbial genes (IFIM). The IFIM database currently contains data from 16 experiments and 2186 theoretical predictions. The highly significant correlation between the experiment-derived fitness data and our computational simulations demonstrated that the computer-generated predictions were often as reliable as the experimental data. The data in IFIM can be accessed easily, and the interface allows users to browse through the gene fitness information that it contains. IFIM is the first resource that allows easy access to fitness data of microbial genes. We believe this database will contribute to a better understanding of microbial genetics and will be useful in designing drugs to resist microbial pathogens, especially when experimental data are unavailable. Database URL: http://cefg.uestc.edu.cn/ifim/ or http://cefg.cn/ifim/


Subjects
Databases, Genetic , Genes, Microbial , Genetic Fitness , Computational Biology , Data Collection , Electronic Data Processing , Gene Dosage , Genes, Bacterial , Software , User-Computer Interface