1.
Brief Bioinform ; 22(4), 2021 07 20.
Article in English | MEDLINE | ID: mdl-33279983

ABSTRACT

The protein Yin Yang 1 (YY1) can form dimers that facilitate the interaction between active enhancers and promoter-proximal elements. YY1-mediated enhancer-promoter interaction is a general feature of mammalian gene control. Recently, some computational methods have been developed to characterize the interactions between DNA elements by elucidating important features of chromatin folding; however, no computational methods have been developed for identifying YY1-mediated chromatin loops. In this study, we developed a deep learning algorithm named DeepYY1, based on word2vec, to determine whether a pair of YY1 motifs would form a loop. The proposed models showed high prediction performance (AUCs ≥ 0.93) on both training and testing datasets in different cell types, demonstrating that DeepYY1 performs well in identifying YY1-mediated chromatin loops. Our study also suggested that sequence plays an important role in the formation of YY1-mediated chromatin loops. Furthermore, we briefly discussed the distribution of replication origin sites in the loops. Finally, a user-friendly web server was established, and it can be freely accessed at http://lin-group.cn/server/DeepYY1.
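To make the word2vec-based input mentioned in this abstract more concrete, a DNA sequence can be split into overlapping k-mer "words" before any embedding is learned. The following is a minimal sketch under stated assumptions, not DeepYY1's actual pipeline: the function name `kmer_tokens`, the choice k = 3, and the two flanking sequences are hypothetical.

```python
from typing import List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical sequences around two YY1 motifs (illustrative only).
left_anchor = "AAGATGGCGGC"
right_anchor = "GCCGCCATCTT"

# One "sentence" per motif pair: tokens of both anchors, concatenated.
pair_sentence = kmer_tokens(left_anchor) + kmer_tokens(right_anchor)
```

Token lists of this kind are what a word2vec implementation would consume as sentences; the embedding and classification steps are left out here because the abstract does not describe them.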


Subjects
Chromatin/metabolism , Databases, Factual , Deep Learning , Models, Biological , YY1 Transcription Factor/metabolism , HCT116 Cells , Humans , K562 Cells
2.
Brief Bioinform ; 21(3): 982-995, 2020 05 21.
Article in English | MEDLINE | ID: mdl-31157855

ABSTRACT

5-Methylcytosine (m5C) plays an extremely important role in basic biochemical processes. Despite the great increase in identified m5C sites in a wide variety of organisms, their epigenetic roles remain largely unknown. Hence, accurate identification of m5C sites is a key step in understanding their biological functions. Over the past several years, more attention has been paid to the identification of m5C sites in multiple species. In this work, we first summarized current progress in the computational prediction of m5C sites and then constructed a more powerful and reliable model for identifying m5C sites. To train the model, we collected experimentally confirmed m5C data from Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Arabidopsis thaliana, and compared the performance of different feature extraction methods and classification algorithms to optimize the prediction model. Based on the optimal model, a novel predictor called iRNA-m5C was developed for the recognition of m5C sites. Finally, we critically evaluated the performance of iRNA-m5C and compared it with existing methods. The results showed that iRNA-m5C produced the best prediction performance. We hope that this paper can provide a guide to the computational identification of m5C sites, and we anticipate that the proposed iRNA-m5C will become a powerful tool for large-scale identification of m5C sites.


Subjects
5-Methylcytosine/metabolism , Computational Biology/methods , Algorithms , Animals , Arabidopsis/metabolism , Datasets as Topic , Humans , Mice , Saccharomyces cerevisiae/metabolism
3.
Curr Genomics ; 21(1): 11-25, 2020 Jan.
Article in English | MEDLINE | ID: mdl-32655294

ABSTRACT

MicroRNAs (miRNAs), a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors (pre-miRNAs) is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental methods and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. Therefore, this review summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides a comprehensive background on pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of the progress of research in this field.

4.
Sci Rep ; 14(1): 5274, 2024 03 04.
Article in English | MEDLINE | ID: mdl-38438393

ABSTRACT

Hepatocellular carcinoma (HCC) remains a formidable malignancy that significantly impacts human health, and early diagnosis of HCC is of paramount importance. Therefore, it is imperative to develop an effective signature for the early diagnosis of HCC. In this study, we aimed to develop early HCC predictors (eHCC-pred) using machine learning-based methods and to compare their performance with existing methods. The improvements in eHCC-pred were as follows: (i) use of a substantial number of samples, including more cirrhosis tissues without HCC (CwoHCC) samples for model training and more HCC and CwoHCC samples for model validation; (ii) incorporation of two feature selection methods, namely minimum redundancy maximum relevance and maximum relevance maximum distance, along with eight machine learning-based methods; (iii) improvement in the accuracy of early HCC identification from 78.15% to 97% on identical independent datasets; and (iv) establishment of a user-friendly web server. The eHCC-pred is freely accessible at http://www.dulab.com.cn/eHCC-pred/. Our approach, eHCC-pred, is anticipated to be employed at the individual level to facilitate early HCC diagnosis in clinical practice, surpassing currently available state-of-the-art techniques.
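The minimum redundancy maximum relevance idea named in this abstract can be illustrated with a greedy selection loop: at each step, pick the feature that is most related to the label and least related to the features already chosen. This is only a sketch under stated assumptions. Absolute Pearson correlation is swapped in here for the mutual information used in the usual mRMR formulation, to keep the example dependency-free; the function names and the toy data are hypothetical, and the maximum relevance maximum distance method is not shown.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def mrmr_select(features: List[List[float]], labels: List[float],
                n_select: int) -> List[int]:
    """Greedy mRMR-style selection; features[j] holds the column of feature j.

    Score = relevance to the label minus mean redundancy with the
    features selected so far (both measured by |Pearson r| here).
    """
    relevance = [abs(pearson(col, labels)) for col in features]
    selected: List[int] = []
    remaining = list(range(len(features)))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]
        redundancy = sum(abs(pearson(features[j], features[s]))
                         for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 1 duplicates feature 0; feature 2 is weaker but distinct.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]
f1 = [0, 0, 0, 1, 1, 1, 1, 1]
f2 = [0, 0, 1, 0, 1, 1, 0, 1]
chosen = mrmr_select([f0, f1, f2], labels, n_select=2)
```

In the toy data the duplicate column is skipped in favour of the weaker but non-redundant one, which is the behaviour the redundancy penalty is meant to produce.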


Subjects
Carcinoma, Hepatocellular , Liver Neoplasms , Humans , Carcinoma, Hepatocellular/diagnosis , Liver Neoplasms/diagnosis , Early Diagnosis , Liver Cirrhosis , Machine Learning , Prednisone
5.
Med Chem ; 16(5): 594-604, 2020.
Article in English | MEDLINE | ID: mdl-31584374

ABSTRACT

Nuclear receptors (NRs) are a superfamily of ligand-dependent transcription factors that are closely related to cell development, differentiation, reproduction, homeostasis, and metabolism. According to alignments of their conserved domains, NRs are classified into seven or eight subfamilies: (1) NR1: thyroid hormone-like (thyroid hormone, retinoic acid, RAR-related orphan receptor, peroxisome proliferator-activated, vitamin D3-like), (2) NR2: HNF4-like (hepatocyte nuclear factor 4, retinoic acid X, tailless-like, COUP-TF-like, USP), (3) NR3: estrogen-like (estrogen, estrogen-related, glucocorticoid-like), (4) NR4: nerve growth factor IB-like (NGFI-B-like), (5) NR5: fushi tarazu-F1-like (fushi tarazu-F1-like), (6) NR6: germ cell nuclear factor-like (germ cell nuclear factor), and (7) NR0: knirps-like (knirps, knirps-related, embryonic gonad protein, ODR7, trithorax) and DAX-like (DAX, SHP); alternatively, NR0 is divided into (7) NR7: knirps-like and (8) NR8: DAX-like. Different NR families have different structural features and functions. Since the function of an NR is closely correlated with the subfamily it belongs to, it is highly desirable to identify NRs and their subfamilies rapidly and effectively. The knowledge acquired is essential for a proper understanding of normal and abnormal cellular mechanisms. With the advent of the post-genomics era, the number of proteins with known sequences has increased explosively. Conventional methods for accurately classifying NR families are experimental, with high cost and low efficiency. This has created a greater need for bioinformatics tools that effectively recognize NRs and their subfamilies for the purpose of understanding their biological function. In this review, we summarized the application of machine learning methods to the prediction of NRs from different aspects. We hope that this review will provide a reference for further research on the classification of NRs and their families.


Subjects
Machine Learning , Receptors, Cytoplasmic and Nuclear/genetics , Animals , Computational Biology , Humans , Receptors, Cytoplasmic and Nuclear/metabolism
6.
Front Cell Dev Biol ; 8: 582864, 2020.
Article in English | MEDLINE | ID: mdl-33178697

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is an aggressive and lethal cancer that deeply affects human health. Diagnosing early-stage PDAC is key to PDAC patients' survival. However, the biomarkers for diagnosing early PDAC are inexact in most cases. Therefore, it is highly desirable to identify an effective PDAC diagnostic biomarker. In the current work, we designed a novel computational approach based on within-sample relative expression orderings (REOs). A feature selection technique called minimum redundancy maximum relevance was used to pick out optimal REOs. We then compared the performance of different classification algorithms for discriminating PDAC and its adjacent normal tissues from non-PDAC tissues. The support vector machine algorithm performed best for identifying early PDAC diagnostic biomarkers. First, a signature composed of nine gene pairs was acquired from microarray gene expression data sets. These gene pairs produced a classification accuracy of up to 97.53% in fivefold cross-validation. Subsequently, two types of data from diverse platforms, namely microarray and RNA-Seq, were used to validate this signature. For microarray data, all (100.00%) of 115 PDAC tissues and all (100.00%) of 31 PDAC adjacent normal tissues were correctly recognized as PDAC. In addition, 88.24% of 17 non-PDAC (normal or pancreatitis) tissues were correctly classified. For the RNA-Seq data, all (100.00%) of 177 PDAC tissues and all (100.00%) of 4 PDAC adjacent normal tissues were correctly recognized as PDAC. Validation results demonstrated that the signature had good cross-platform performance for early detection of PDAC. This work developed a new robust signature that might be a promising biomarker for early PDAC diagnosis.
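The within-sample relative expression orderings (REOs) described in this abstract can be sketched as binary features: for each gene pair, record whether the first gene is expressed more highly than the second in the same sample. This is an illustrative sketch only; the gene names, expression values, and the function name `reo_features` are hypothetical, and the nine-pair signature from the study is not reproduced.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def reo_features(sample: Dict[str, float],
                 pairs: List[Tuple[str, str]]) -> List[int]:
    """Return 1 for each pair (a, b) where expression of a exceeds b, else 0."""
    return [1 if sample[a] > sample[b] else 0 for a, b in pairs]


# Hypothetical genes and all unordered pairs among them.
genes = ["G1", "G2", "G3"]
pairs = list(combinations(genes, 2))  # (G1,G2), (G1,G3), (G2,G3)

# Two hypothetical profiles with the same ordering on different scales,
# e.g. the same tissue measured on two platforms.
sample_a = {"G1": 5.0, "G2": 2.0, "G3": 7.5}
sample_b = {"G1": 50.0, "G2": 20.0, "G3": 75.0}

features_a = reo_features(sample_a, pairs)
features_b = reo_features(sample_b, pairs)
```

Because only the ordering within a sample matters, any rescaling that preserves that ordering leaves the features unchanged, which is one plausible reason a signature of this kind can transfer between microarray and RNA-Seq data.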

7.
Article in English | MEDLINE | ID: mdl-32292778

ABSTRACT

Hepatocellular carcinoma (HCC) is a serious cancer that ranks fourth in cancer-related deaths worldwide. Hence, more accurate diagnostic models are urgently needed to aid early HCC diagnosis in clinical scenarios and thus improve HCC treatment and survival. Several conventional methods have been used for discriminating HCC from cirrhosis tissues in patients without HCC (CwoHCC). However, the recognition success rates are still far from satisfactory. In this study, we applied a machine learning-based computational approach to a set of microarray data generated from 1091 HCC samples and 242 CwoHCC samples. The within-sample relative expression orderings (REOs) method was used to extract numerical descriptors from gene expression profile datasets. After removing unrelated features by using minimum redundancy maximum relevance (mRMR) with incremental feature selection, we obtained an "11-gene-pair" signature that produced outstanding results. We further investigated the discriminative capability of the "11-gene-pair" signature for HCC recognition on several independent datasets. Excellent results were obtained, demonstrating that the selected gene pairs can serve as a signature for HCC. The proposed computational model can discriminate HCC and adjacent non-cancerous tissues from CwoHCC even for minimal biopsy specimens and inaccurately sampled specimens, which can be practical and effective for aiding early HCC diagnosis at the individual level.
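The incremental feature selection step mentioned in this abstract can be sketched as follows: given features already ranked (for example by mRMR), grow the subset one feature at a time, score each subset, and keep the best one. This is a sketch under stated assumptions rather than the study's procedure. The scoring function here is a simple majority vote over binary gene-pair features, evaluated on the same toy samples it is fitted to; the data, the vote rule, and all names are hypothetical, and a real analysis would score each subset by cross-validation with a trained classifier.

```python
from typing import Callable, List, Tuple


def incremental_feature_selection(
    ranked: List[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Score the top-1, top-2, ... ranked features; return the best subset.

    Ties are resolved in favour of the smaller subset.
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy binary features (rows = samples, columns = gene pairs) and labels.
X = [
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
y = [1, 1, 1, 0, 0, 0]


def vote_accuracy(subset: List[int]) -> float:
    """Accuracy of a strict-majority vote over the selected columns."""
    correct = 0
    for row, label in zip(X, y):
        votes = sum(row[j] for j in subset)
        predicted = 1 if 2 * votes > len(subset) else 0
        correct += int(predicted == label)
    return correct / len(y)


best_subset, best_score = incremental_feature_selection([0, 1, 2, 3],
                                                        vote_accuracy)
```

In this toy case the first three columns classify every sample correctly, while adding the fourth (noisy) column lowers accuracy, so the search stops growing the subset at three features.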

8.
Med Chem ; 16(5): 605-619, 2020.
Article in English | MEDLINE | ID: mdl-31584379

ABSTRACT

Mycobacterium tuberculosis (MTB) causes tuberculosis (TB), which is reported as one of the most dreadful epidemics. Although many drugs have been developed to cope with this disease, drug resistance, especially multidrug resistance (MDR) and extensive drug resistance (XDR), poses a huge threat to treatment. Moreover, traditional biochemical experimental methods for tackling TB are time-consuming and costly. Thanks to the availability of enormous amounts of genomic and proteomic sequence data, TB can now be tackled with sequence-based computational approaches, that is, bioinformatics. Predicting the subcellular localization of mycobacterial proteins (MBPs) with high precision and efficiency may help elucidate the biological function of these proteins and thus provide useful insights for protein function annotation as well as drug design. In this review, we report the progress that has been made in the computational prediction of the subcellular localization of MBPs, covering the following aspects: (1) construction of benchmark datasets; (2) methods of feature extraction; (3) techniques of feature selection; (4) application of several published prediction algorithms; (5) the published results; and (6) further study on the prediction of the subcellular localization of MBPs.


Subjects
Bacterial Proteins/genetics , Machine Learning , Mycobacterium tuberculosis/genetics , Bacterial Proteins/metabolism , Computational Biology , Mycobacterium tuberculosis/metabolism
9.
Fitoterapia ; 146: 104674, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32561423

ABSTRACT

Three new sesquiterpenoids (1-3), four new benzofuran dimers [(+)-4, (-)-4, (+)-5 and (-)-5], and four known benzofuran dimers [(+)-6, (-)-6, (+)-7 and (-)-7] were isolated from the underground parts of Eupatorium chinense. The enantiomers of racemates (±)-4 to (±)-7 were separated on chiral HPLC columns, and their absolute configurations were determined by circular dichroism experiments. The structures of all new compounds were elucidated on the basis of their NMR and MS data as well as by comparison with literature values. All of the isolated compounds were tested in vitro for their cytotoxic activities against the Caski, MDA-MB-231 and HepG2 cancer cell lines.


Subjects
Antineoplastic Agents, Phytogenic/pharmacology , Benzofurans/pharmacology , Eupatorium/chemistry , Sesquiterpenes/pharmacology , Antineoplastic Agents, Phytogenic/isolation & purification , Benzofurans/isolation & purification , China , Hep G2 Cells , Humans , Molecular Structure , Phytochemicals/isolation & purification , Phytochemicals/pharmacology , Plant Roots/chemistry , Sesquiterpenes/isolation & purification
10.
Curr Pharm Des ; 25(40): 4264-4273, 2019.
Article in English | MEDLINE | ID: mdl-31696804

ABSTRACT

Bioluminescent proteins (BLPs) are widely distributed in many living organisms and play a key role in light emission in bioluminescence. Bioluminescence serves various functions, such as finding food and protecting organisms from predators. With the routine biotechnological application of bioluminescence, it is recognized as essential for many medical, commercial and other general technological advances. Therefore, the prediction and characterization of BLPs are significant and can help to uncover more about bioluminescence and promote the development of its applications. Since experimental methods for BLP identification are costly and time-consuming, bioinformatics tools have played an important role in the fast and accurate prediction of BLPs by combining their sequence information with machine learning methods. In this review, we summarized and compared the application of machine learning methods to the prediction of BLPs from different aspects. We hope that this review will provide insight and inspiration for research on BLPs.


Subjects
Computational Biology , Luminescent Proteins/chemistry , Machine Learning
11.
Math Biosci Eng ; 16(4): 2466-2480, 2019 03 22.
Article in English | MEDLINE | ID: mdl-31137222

ABSTRACT

The soluble carrier hormone binding protein (HBP) plays an important role in the growth of humans and other animals. HBP can also selectively and non-covalently interact with hormones. Therefore, accurate identification of HBP is an important prerequisite for understanding its biological functions and molecular mechanisms. Since experimental methods for identifying HBP are still labor-intensive and cost-ineffective, it is necessary to develop computational methods to identify HBP accurately and efficiently. In this paper, a machine learning-based method was proposed to identify HBP, in which the samples were encoded by using the optimal tripeptide composition obtained with the binomial distribution method. In the 5-fold cross-validation test, the proposed method yielded an overall accuracy of 97.15%. For the convenience of the scientific community, a user-friendly web server called HBPred2.0 was built, which can be freely accessed at http://lin-group.cn/server/HBPred2.0/.
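The tripeptide-composition encoding mentioned in this abstract can be sketched as the relative frequency of each overlapping three-residue window in a protein sequence. The second function shows a generic binomial upper-tail probability, which is one way a binomial test of over-representation could be computed. Both are sketches under stated assumptions: the abstract does not give the exact formulas, so how the study ranks tripeptides with the binomial distribution is not reproduced here, and the function names and the example sequence are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict


def tripeptide_composition(protein: str) -> Dict[str, float]:
    """Relative frequency of each overlapping tripeptide in the sequence.

    Only tripeptides that occur are returned (a sparse view of the
    8000-dimensional vector over the 20 standard amino acids).
    """
    protein = protein.upper()
    total = len(protein) - 2
    if total <= 0:
        return {}
    counts = Counter(protein[i:i + 3] for i in range(total))
    return {tri: count / total for tri, count in counts.items()}


def binomial_upper_tail(n_observed: int, n_total: int, q: float) -> float:
    """P(X >= n_observed) for X ~ Binomial(n_total, q).

    Assumed usage: n_observed = occurrences of a tripeptide in one class,
    n_total = its occurrences in all classes, q = prior share of that class.
    A small value suggests the tripeptide is over-represented in the class.
    """
    return sum(
        comb(n_total, m) * q ** m * (1 - q) ** (n_total - m)
        for m in range(n_observed, n_total + 1)
    )


# Hypothetical short sequence, for illustration only.
composition = tripeptide_composition("MKVMKV")
```

Ranking tripeptides by such a tail probability and keeping the top-scoring ones would give a reduced feature set, which is consistent with the "optimal tripeptide composition" wording, though the selection threshold used in the study is not stated in the abstract.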


Subjects
Carrier Proteins/chemistry , Computational Biology/methods , Hormones/chemistry , Machine Learning , Algorithms , Amino Acids/chemistry , Analysis of Variance , Animals , Computer Simulation , Databases, Protein , Humans , Peptides/chemistry , Reproducibility of Results , Software , Support Vector Machine