Search | VHL Search Portal

1.

Identification of DNase I hypersensitive sites in the human genome by multiple sequence descriptors.

Jin, Yan-Ting; Tan, Yang; Gan, Zhong-Hua; Hao, Yu-Duo; Wang, Tian-Yu; Lin, Hao; Tang, Bo.

Methods ; 229: 125-132, 2024 Jul 02.

Article in English | MEDLINE | ID: mdl-38964595

ABSTRACT

DNase I hypersensitive sites (DHSs) are chromatin regions highly sensitive to DNase I enzymes. Studying DHSs is crucial for understanding complex transcriptional regulation mechanisms and localizing cis-regulatory elements (CREs). Numerous studies have indicated that disease-related loci are often enriched in DHSs regions, underscoring the importance of identifying DHSs. Although wet experiments exist for DHSs identification, they are often labor-intensive. Therefore, there is a strong need to develop computational methods for this purpose. In this study, we used experimental data to construct a benchmark dataset. Seven feature extraction methods were employed to capture information about human DHSs. The F-score was applied to filter the features. By comparing the prediction performance of various classification algorithms through five-fold cross-validation, random forest was proposed to perform the final model construction. The model could produce an overall prediction accuracy of 0.859 with an AUC value of 0.837. We hope that this model can assist scholars conducting DNase research in identifying these sites.

2.

Mslar: Microbial synthetic lethal and rescue database.

Zhu, Sen-Bin; Jiang, Qian-Hu; Chen, Zhi-Guo; Zhou, Xiang; Jin, Yan-Ting; Deng, Zixin; Guo, Feng-Biao.

PLoS Comput Biol ; 19(6): e1011218, 2023 06.

Article in English | MEDLINE | ID: mdl-37289843

ABSTRACT

Synthetic lethality (SL) occurs when mutations in two genes together lead to cell or organism death, while a single mutation in either gene does not have a significant impact. This concept can also be extended to three or more genes for SL. Computational and experimental methods have been developed to predict and verify SL gene pairs, especially for yeast and Escherichia coli. However, there is currently a lack of a specialized platform to collect microbial SL gene pairs. Therefore, we designed a synthetic interaction database for microbial genetics that collects 13,313 SL and 2,994 Synthetic Rescue (SR) gene pairs that are reported in the literature, as well as 86,981 putative SL pairs obtained through the homologous transfer method in 281 bacterial genomes. Our database website provides multiple functions such as search, browse, visualization, and Blast. Based on the SL interaction data in S. cerevisiae, we reviewed the issue of duplications' essentiality and observed that the duplicated genes and singletons have a similar ratio of being essential when we consider both individual and SL. The Microbial Synthetic Lethal and Rescue Database (Mslar) is expected to be a useful reference resource for researchers interested in the SL and SR genes of microorganisms. Mslar is open freely to everyone and available on the web at http://guolab.whu.edu.cn/Mslar/.

Subject(s)

Neoplasms , Saccharomyces cerevisiae , Humans , Saccharomyces cerevisiae/genetics , Synthetic Lethal Mutations , Mutation , Genome, Bacterial/genetics , Databases, Genetic , Neoplasms/genetics

3.

Comprehensive review of the identification of essential genes using computational methods: focusing on feature implementation and assessment.

Dong, Chuan; Jin, Yan-Ting; Hua, Hong-Li; Wen, Qing-Feng; Luo, Sen; Zheng, Wen-Xin; Guo, Feng-Biao.

Brief Bioinform ; 21(1): 171-181, 2020 Jan 17.

Article in English | MEDLINE | ID: mdl-30496347

ABSTRACT

Essential genes have attracted increasing attention in recent years due to the important functions of these genes in organisms. Among the methods used to identify the essential genes, accurate and efficient computational methods can make up for the deficiencies of expensive and time-consuming experimental technologies. In this review, we have collected researches on essential gene predictions in prokaryotes and eukaryotes and summarized the five predominant types of features used in these studies. The five types of features include evolutionary conservation, domain information, network topology, sequence component and expression level. We have described how to implement the useful forms of these features and evaluated their performance based on the data of Escherichia coli MG1655, Bacillus subtilis 168 and human. The prerequisite and applicable range of these features is described. In addition, we have investigated the techniques used to weight features in various models. To facilitate researchers in the field, two available online tools, which are accessible for free and can be directly used to predict gene essentiality in prokaryotes and humans, were referred. This article provides a simple guide for the identification of essential genes in prokaryotes and eukaryotes.

4.

Accurate prediction of human essential genes using only nucleotide composition and association information.

Guo, Feng-Biao; Dong, Chuan; Hua, Hong-Li; Liu, Shuo; Luo, Hao; Zhang, Hong-Wan; Jin, Yan-Ting; Zhang, Kai-Yue.

Bioinformatics ; 33(12): 1758-1764, 2017 Jun 15.

Article in English | MEDLINE | ID: mdl-28158612

ABSTRACT

MOTIVATION: Previously constructed classifiers in predicting eukaryotic essential genes integrated a variety of features including experimental ones. If we can obtain satisfactory prediction using only nucleotide (sequence) information, it would be more promising. Three groups recently identified essential genes in human cancer cell lines using wet experiments and it provided wonderful opportunity to accomplish our idea. Here we improved the Z curve method into the λ-interval form to denote nucleotide composition and association information and used it to construct the SVM classifying model. RESULTS: Our model accurately predicted human gene essentiality with an AUC higher than 0.88 both for 5-fold cross-validation and jackknife tests. These results demonstrated that the essentiality of human genes could be reliably reflected by only sequence information. We re-predicted the negative dataset by our Pheg server and 118 genes were additionally predicted as essential. Among them, 20 were found to be homologues in mouse essential genes, indicating that some of the 118 genes were indeed essential, however previous experiments overlooked them. As the first available server, Pheg could predict essentiality for anonymous gene sequences of human. It is also hoped the λ-interval Z curve method could be effectively extended to classification issues of other DNA elements. AVAILABILITY AND IMPLEMENTATION: http://cefg.uestc.edu.cn/Pheg. CONTACT: fbguo@uestc.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Base Composition , Genes, Essential , Sequence Analysis, DNA/methods , Software , Animals , Eukaryota/genetics , Humans , Mice , Models, Genetic

5.

Efficient Reduction of Cr(VI) with Carbon Quantum Dots.

Yang, Wei-Min; Liu, Fu; Jin, Yan-Ting; Dong, Zong-Mu; Zhao, Guang-Chao.

ACS Omega ; 7(27): 23555-23565, 2022 Jul 12.

Article in English | MEDLINE | ID: mdl-35847330

ABSTRACT

Hexavalent chromium (Cr(VI)) pollution is a global problem, and the reduction of highly toxic Cr(VI) to less toxic Cr(III) is considered to be an effective method to address Cr(VI) pollution. In this study, low-toxicity carbon quantum dots (CQDs) were used to reduce Cr(VI) in wastewater. The results show that CQDs can directly reduce Cr(VI) at pH 2 and can achieve a reduction efficiency of 94% within 120 min. It is observed that under pH higher than 2, CQDs can activate peroxymonosulfate (PMS) to produce reactive oxygen species (ROS) for the reduction of Cr(VI) and the reduction efficiency can reach 99% within 120 min even under neutral conditions. The investigation of the mechanism shows that the hydroxyl groups on the surface of CQDs can be directly oxidized by Cr(VI) because of the higher redox potential of Cr(VI) at pH 2. As the pH increases, the carbonyl groups on the surface of CQDs can activate PMS to generate ROS, O2•−, and ¹O2, which result in Cr(VI) being reduced. To facilitate the practical application of CQDs, the treatment of Cr(VI) in real water samples by CQDs was simulated and the method reduced Cr(VI) from an initial concentration of 5 mg/L to only 8 µg/L in 150 min, which is below the California water quality standard of 10 µg/L. The study provides a new method for the removal of Cr(VI) from wastewater and a theoretical basis for practical application.

6.

T-G-A Deficiency Pattern in Protein-Coding Genes and Its Potential Reason.

Jin, Yan-Ting; Pu, Dong-Kai; Guo, Hai-Xia; Deng, Zixin; Chen, Ling-Ling; Guo, Feng-Biao.

Front Microbiol ; 13: 847325, 2022.

Article in English | MEDLINE | ID: mdl-35602045

ABSTRACT

If a stop codon appears within one gene, then its translation will be terminated earlier than expected. False folding of premature protein will be adverse to the host; hence, all functional genes would tend to avoid the intragenic stop codons. Therefore, we hypothesize that there will be less frequency of nucleotides corresponding to stop codons at each codon position of genes. Here, we validate this inference by investigating the nucleotide frequency at a large scale and results from 19,911 prokaryote genomes revealed that nucleotides coinciding with stop codons indeed have the lowest frequency in most genomes. Interestingly, genes with three types of stop codons all tend to follow a T-G-A deficiency pattern, suggesting that the property of avoiding intragenic termination pressure is the same and the major stop codon TGA plays a dominant role in this effect. Finally, a positive correlation between the TGA deficiency extent and the base length was observed in start-experimentally verified genes of Escherichia coli (E. coli). This strengthens the proof of our hypothesis. The T-G-A deficiency pattern observed would help to understand the evolution of codon usage tactics in extant organisms.
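The position-specific counting described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name and the toy sequence are hypothetical, and it assumes the T-G-A pattern refers to T at codon position 1, G at position 2 and A at position 3.

```python
from collections import Counter


def codon_position_frequencies(cds):
    """Return base frequencies at codon positions 1, 2 and 3 of a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    freqs = []
    for pos in range(3):
        # Every third base starting at this codon position
        counts = Counter(cds[pos:usable:3])
        total = sum(counts[base] for base in "ACGT")  # skip ambiguous bases such as N
        freqs.append({base: counts[base] / total if total else 0.0 for base in "ACGT"})
    return freqs


# Hypothetical toy sequence, for illustration only
toy_cds = "ATGGCTAAAGCTTAA"
freqs = codon_position_frequencies(toy_cds)
print(freqs[0]["T"], freqs[1]["G"], freqs[2]["A"])
```

Comparing T at position 1, G at position 2 and A at position 3 against the other bases at the same position, genome by genome, is one way to look for the deficiency the abstract reports.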

7.

Consistent Clustering Pattern of Prokaryotic Genes Based on Base Frequency at the Second Codon Position and its Association with Functional Category Preference.

Jin, Yan-Ting; Ma, Cong; Wang, Xin; Wang, Shu-Xuan; Zhang, Kai-Yue; Zheng, Wen-Xin; Deng, Zixin; Wang, Ju; Guo, Feng-Biao.

Interdiscip Sci ; 14(2): 349-357, 2022 Jun.

Article in English | MEDLINE | ID: mdl-34817803

ABSTRACT

In 2002, our research group observed a gene clustering pattern based on the base frequency of A versus T at the second codon position in the genome of Vibrio cholerae and found that the functional category distribution of genes in the two clusters was different. With the availability of a large number of sequenced genomes, we performed a systematic investigation of A2-T2 distribution and found that 2694 out of 2764 prokaryotic genomes have an optimal clustering number of two, indicating a consistent pattern. Analysis of the functional categories of the coding genes in each cluster in 1483 prokaryotic genomes indicated that 99.33% of the genomes exhibited a significant difference (p < 0.01) in function distribution between the two clusters. Specifically, functional category P was overrepresented in the small cluster of 98.65% of genomes, whereas categories J, K, and L were overrepresented in the larger cluster of over 98.52% of genomes. Lineage analysis uncovered that these preferences appear consistently across all phyla. Overall, our work revealed an almost universal clustering pattern based on the relative frequency of A2 versus T2 and its role in functional category preference. These findings will promote the understanding of the rationality of theoretical prediction of functional classes of genes from their nucleotide sequences and how protein function is determined by DNA sequence.

Subject(s)

Proteins , Base Sequence , Cluster Analysis , Codon/genetics , Proteins/genetics

8.

Quantitative elucidation of associations between nucleotide identity and physicochemical properties of amino acids and the functional insight.

Jin, Yan-Ting; Jin, Tian-Yue; Zhang, Zhi-Li; Ye, Yuan-Nong; Deng, Zixin; Wang, Ju; Guo, Feng-Biao.

Comput Struct Biotechnol J ; 19: 4042-4048, 2021.

Article in English | MEDLINE | ID: mdl-34527183

ABSTRACT

Studies on codon property would deepen our understanding of the origin of primitive life and enlighten biotechnical application. Here, we proposed a quantitative measurement of codon-amino acid association and found that seven out of 13 physicochemical properties have stronger associations with the nucleotide identity at the second codon position, indicating that protein structure and function may associate more closely with it than the other two sites. When extending the effect of codon-amino acid association to protein level, it was found that the correlation between the second codon position (measured by the relative frequencies of nucleobase T and A at this codon site) and hydrophobicity (by the form of GRAVY value) became stronger with 96% genomes having R > 0.90 and p < 1e-60. Furthermore, we revealed that informational genes encoding proteins have lower GRAVY values than operational proteins (p < 3e-37) in both prokaryotic and eukaryotic genomes. The above results reveal a complete link from codon identity (A2 versus T2) to amino acid property (hydrophilic versus hydrophobic) and then to protein functions (informational versus operational). Hence, our work may help to understand how the nucleotide sequence determines protein function.
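The GRAVY value mentioned in this abstract is conventionally the mean Kyte-Doolittle hydropathy of a protein's residues. The minimal Python sketch below assumes that standard definition; the function name is hypothetical, and the abstract does not say how the authors handled non-standard residues (here they are simply skipped).

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean hydropathy of a protein sequence; residues outside the table are skipped."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


# Positive values indicate a hydrophobic protein, negative values a hydrophilic one
print(gravy("IR"))
```

Correlating this per-protein value with the relative frequencies of T and A at the second codon position of the corresponding gene is the kind of comparison the abstract describes.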

9.

Mutation Landscape of Base Substitutions, Duplications, and Deletions in the Representative Current Cholera Pandemic Strain.

Wei, Wen; Xiong, Lifeng; Ye, Yuan-Nong; Du, Meng-Ze; Gao, Yi-Zhou; Zhang, Kai-Yue; Jin, Yan-Ting; Yang, Zujun; Wong, Po-Chun; Lau, Susanna K P; Kan, Biao; Zhu, Jun; Woo, Patrick C Y; Guo, Feng-Biao.

Genome Biol Evol ; 10(8): 2072-2085, 2018 08 01.

Article in English | MEDLINE | ID: mdl-30060177

ABSTRACT

Pandemic cholera is a major concern for public health because of its high mortality and morbidity. Mutation accumulation (MA) experiments were performed on a representative strain of the current cholera pandemic. Although the base-pair substitution mutation rates in Vibrio cholerae (1.24 × 10⁻¹⁰ per site per generation for wild-type lines and 3.29 × 10⁻⁸ for mismatch repair deficient lines) are lower than that previously reported in other bacteria using MA analysis, we discovered specific high rates (8.31 × 10⁻⁸ site/generation for wild-type lines and 1.82 × 10⁻⁶ for mismatch repair deficient lines) of base duplication or deletion driven by large-scale copy number variations (CNVs). These duplication-deletions are located in two pathogenic islands, IMEX and the large integron island. Each element of these islands has discrepant rate in rapid integration and excision, which provides clues to the pandemicity evolution of V. cholerae. These results also suggest that large-scale structural variants such as CNVs can accumulate rapidly during short-term evolution. Mismatch repair deficient lines exhibit a significantly increased mutation rate in the larger chromosome (Chr1) at specific regions, and this pattern is not observed in wild-type lines. We propose that the high frequency of GATC sites in Chr1 improves the efficiency of MMR, resulting in similar rates of mutation in the wild-type condition. In addition, different mutation rates and spectra were observed in the MA lines under distinct growth conditions, including minimal media, rich media and antibiotic treatments.

Subject(s)

Base Pairing/genetics , Cholera/epidemiology , Cholera/microbiology , Gene Deletion , Gene Duplication , Pandemics , Vibrio cholerae/genetics , Chromosomes, Bacterial/genetics , Culture Media , DNA Replication Timing/drug effects , Genomic Islands , Humans , Mutation Rate , Reproducibility of Results , Rifampin/pharmacology , Vibrio cholerae/drug effects

10.

An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao.

Biomed Res Int ; 2016: 7639397, 2016.

Article in English | MEDLINE | ID: mdl-27660763

ABSTRACT

Investigation of essential genes is significant to comprehend the minimal gene sets of cell and discover potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning method was introduced to predict essential genes. We focused on 25 bacteria which have characterized essential genes. The predictions yielded the highest area under receiver operating characteristic (ROC) curve (AUC) of 0.9716 through tenfold cross-validation test. Proper features were utilized to construct models to make predictions in distantly related bacteria. The accuracy of predictions was evaluated via the consistency of predictions and known essential genes of target species. The highest AUC of 0.9552 and average AUC of 0.8314 were achieved when making predictions across organisms. An independent dataset from Synechococcus elongatus, which was released recently, was obtained for further assessment of the performance of our model. The AUC score of predictions is 0.7855, which is higher than other methods. This research presents that features obtained by homology mapping uniquely can achieve quite great or even better results than those integrated features. Meanwhile, the work indicates that machine learning-based method can assign more efficient weight coefficients than using empirical formula based on biological knowledge.

11.

Genomic Complexity Places Less Restrictions on the Evolution of Young Coexpression Networks than Protein-Protein Interactions.

Wei, Wen; Jin, Yan-Ting; Du, Meng-Ze; Wang, Ju; Rao, Nini; Guo, Feng-Biao.

Genome Biol Evol ; 8(8): 2624-31, 2016 09 03.

Article in English | MEDLINE | ID: mdl-27521813

ABSTRACT

The differences in evolutionary patterns of young protein-protein interactions (PPIs) among distinct species have long been a puzzle. However, based on our genome-wide analysis of available integrated experimental data, we confirm that young genes preferentially integrate into ancestral PPI networks, and that this manner is consistent in all of six model organisms with widely different levels of phenotypic complexity. We demonstrate that the level of restrictions placed on the evolution of biological networks declines with a decrease of phenotypic complexity. Compared with young PPI networks, new co-expression links have fewer evolutionary restrictions, so a young gene with a high possibility to be coexpressed with other young genes relatively frequently emerges in the four simpler genomes among the six studied. However, it is not favorable for such young-young coexpression in terms of a young gene evolving into a coexpression hub, so the coexpression pattern could gradually decline. To explain this apparent contradiction, we suggest that young genes that are initially peripheral to networks are temporarily coexpressed with other young genes, driving functional evolution because of low selective pressure. However, as the expression levels of genes increase and they gradually develop a greater effect on fitness, young genes start to be coexpressed more with members of ancestral networks and less with other young genes. Our findings provide new insights into the evolution of biological networks.

Subject(s)

Evolution, Molecular , Gene Regulatory Networks , Protein Interaction Maps , Animals , Archaea/genetics , Bacteria/genetics , Fungi/genetics , Genetic Fitness , Genome , Humans , Phenotype

12.

[Exploration of basic restorative dental materials teaching in the field of dental technology].

Jin, Yan-ting.

Shanghai Kou Qiang Yi Xue ; 21(6): 718-20, 2012 Dec.

Article in Chinese | MEDLINE | ID: mdl-23364564

ABSTRACT

This study compared the existing course materials for basic restorative dental materials with earlier ones, identified the weaknesses of the teaching mode used before the reform, and explored reform of teaching content, methods and evaluation in order to improve teaching quality.

Subject(s)

Dental Restoration, Permanent , Education, Dental , Technology, Dental , Dental Materials , Humans
