Search | BVS IEC

1.

Reinventing gene expression connectivity through regulatory and spatial structural empowerment via principal node aggregation graph neural network.

Yan, Fengyao; Jiang, Limin; Chen, Danqian; Ceccarelli, Michele; Guo, Yan.

Nucleic Acids Res ; 52(13): e60, 2024 Jul 22.

Article in English | MEDLINE | ID: mdl-38884259

ABSTRACT

The intricacies of the human genome, manifested as a complex network of genes, transcend conventional representations in text or numerical matrices. The intricate gene-to-gene relationships inherent in this complexity find a more suitable depiction in graph structures. In the pursuit of predicting gene expression, an endeavor shared by predecessors like the L1000 and Enformer methods, we introduce a novel spatial graph-neural network (GNN) approach. This innovative strategy incorporates graph features, encompassing both regulatory and structural elements. The regulatory elements include pair-wise gene correlation, biological pathways, protein-protein interaction networks, and transcription factor regulation. The spatial structural elements include chromosomal distance, histone modification and Hi-C inferred 3D genomic features. Principal Node Aggregation models, validated independently, emerge as frontrunners, demonstrating superior performance compared to traditional regression and other deep learning models. By embracing the spatial GNN paradigm, our method significantly advances the description of the intricate network of gene interactions, surpassing the performance, predictable scope, and initial requirements set by previous methods.

Subjects

Gene Regulatory Networks , Neural Networks, Computer , Humans , Gene Expression Regulation , Protein Interaction Maps/genetics , Transcription Factors/metabolism , Transcription Factors/genetics , Genome, Human , Algorithms , Histone Code

2.

Licorice extract inhibits porcine epidemic diarrhea virus in vitro and in vivo.

Bai, Wenfei; Zhu, Qinghe; Wang, Jun; Jiang, Limin; Guo, Donghua; Li, Chunqiu; Xing, Xiaoxu; Sun, Dongbo.

J Gen Virol ; 105(3)2024 03.

Article in English | MEDLINE | ID: mdl-38471043

ABSTRACT

Porcine epidemic diarrhea virus (PEDV) causes severe diarrhea and even death in piglets, resulting in significant economic losses to the pig industry. Because of the ongoing mutation of PEDV, there might be variations between the vaccine strain and the prevailing strain, causing the vaccine to not offer full protection against different PEDV variant strains. Therefore, it is necessary to develop anti-PEDV drugs to compensate for vaccines. This study confirmed the anti-PEDV effect of licorice extract (Le) in vitro and in vivo. Le inhibited PEDV replication in a dose-dependent manner in vitro. By exploring the effect of Le on the life cycle of PEDV, we found that Le inhibited the attachment, internalization, and replication stages of the virus. In vivo, all five piglets in the PEDV-infected group died within 72 h. In comparison, the Le-treated group had a survival rate of 80% at the same time, with significant relief of clinical symptoms, pathological damage, and viral loads in the jejunum and ileum. Our results suggested that Le can exert anti-PEDV effects in vitro and in vivo. Le is effective and inexpensive; therefore it has the potential to be developed as a new anti-PEDV drug.

Subjects

Coronavirus Infections , Glycyrrhiza , Plant Extracts , Porcine epidemic diarrhea virus , Swine Diseases , Viral Vaccines , Animals , Swine , Diarrhea

3.

Two-stage-vote ensemble framework based on integration of mutation data and gene interaction network for uncovering driver genes.

Kan, Yingxin; Jiang, Limin; Guo, Yan; Tang, Jijun; Guo, Fei.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34791034

ABSTRACT

Precisely identifying driver genes among the many mutated genes promotes accurate diagnosis and treatment of cancer. In recent years, work on uncovering driver genes by integrating mutation data with gene interaction networks has gained increasing attention. However, it remains unclear whether integrating various types of mutation information (frequency and functional impact) with gene networks is more effective for prioritizing driver genes. Hence, we build a two-stage-vote ensemble framework based on somatic mutations and mutual interactions. Specifically, we first represent and combine various kinds of mutation information, which are propagated through networks by an improved iterative framework. The first vote is conducted on iteration results by voting methods, and the second vote is performed to get ensemble results of the first poll for the final driver gene list. Compared with four excellent previous approaches, our method has better performance in identifying driver genes on 33 types of cancer from The Cancer Genome Atlas. Meanwhile, we also conduct a comparative analysis of two kinds of mutation information, five gene interaction networks and four voting strategies. Our framework offers a new view for data integration and helps more latent cancer genes to be recognized.
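The two-stage vote described in this abstract can be illustrated with a small sketch. The abstract does not name the voting methods, so the Borda count used for both stages is an assumption, and the function names, network names, and gene labels are hypothetical toy values rather than anything taken from the paper.

```python
from collections import defaultdict


def borda(rankings):
    """Combine ranked gene lists with a Borda count (assumed voting rule)."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            scores[gene] += n - position  # the top gene earns n points
    # Sort by total score (descending), then by name for a deterministic order.
    return sorted(scores, key=lambda g: (-scores[g], g))


def two_stage_vote(results):
    """results maps a (hypothetical) network name to several ranked gene lists."""
    # Stage 1: one vote per network over its own ranked lists.
    first_vote = [borda(rankings) for rankings in results.values()]
    # Stage 2: a second vote over the stage-1 winners gives the final list.
    return borda(first_vote)


# Toy input: two networks, each with two rankings (e.g. one per mutation score).
results = {
    "network_A": [["GENE_A", "GENE_B", "GENE_C"], ["GENE_A", "GENE_C", "GENE_B"]],
    "network_B": [["GENE_B", "GENE_C", "GENE_A"], ["GENE_B", "GENE_A", "GENE_C"]],
}
final_ranking = two_stage_vote(results)
print(final_ranking)
```

Any rank-aggregation rule could replace `borda` in either stage; the point of the sketch is only the nesting of one vote inside another.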

Subjects

Gene Regulatory Networks , Neoplasms , Epistasis, Genetic , Humans , Mutation , Neoplasms/genetics , Oncogenes

4.

CoMutDB: the landscape of somatic mutation co-occurrence in cancers.

Jiang, Limin; Yu, Hui; Tang, Jijun; Guo, Yan.

Bioinformatics ; 39(1)2023 01 01.

Article in English | MEDLINE | ID: mdl-36355452

ABSTRACT

MOTIVATION: Somatic mutation co-occurrence has been proven to have a profound effect on tumorigenesis. While some studies have been conducted on co-mutations, a centralized resource dedicated to co-mutations in cancer is still lacking. RESULTS: Using multi-omics data from over 30,000 subjects and 1747 cancer cell lines, we present the Cancer co-mutation database (CoMutDB), the most comprehensive resource devoted to describing cancer co-mutations and their characteristics. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available in the online database CoMutDB: http://www.innovebioinfo.com/Database/CoMutDB/Home.php.

Subjects

Neoplasms , Humans , Mutation , Databases, Factual , Neoplasms/genetics , Carcinogenesis , Cell Transformation, Neoplastic

5.

Somatic mutation effects diffused over microRNA dysregulation.

Yu, Hui; Jiang, Limin; Li, Chung-I; Ness, Scott; Piccirillo, Sara G M; Guo, Yan.

Bioinformatics ; 39(9)2023 09 02.

Article in English | MEDLINE | ID: mdl-37624931

ABSTRACT

MOTIVATION: As an important player in transcriptome regulation, microRNAs may effectively diffuse somatic mutation impacts to broad cellular processes and ultimately manifest disease and dictate prognosis. Previous studies that tried to correlate mutation with gene expression dysregulation neglected to adjust for the disparate multitudes of false positives associated with unequal sample sizes and uneven class balancing scenarios. RESULTS: To properly address this issue, we developed a statistical framework to rigorously assess the extent of mutation impact on microRNAs in relation to a permutation-based null distribution of a matching sample structure. Carrying out the framework in a pan-cancer study, we ascertained 9008 protein-coding genes with statistically significant mutation impacts on miRNAs. Of these, the collective miRNA expression for 83 genes showed significant prognostic power in nine cancer types. For example, in lower-grade glioma, 10 genes' mutations broadly impacted miRNAs, all of which showed prognostic value with the corresponding miRNA expression. Our framework was further validated with functional analysis and augmented with rich features including the ability to analyze miRNA isoforms; aggregative prognostic analysis; advanced annotations such as mutation type, regulator alteration, somatic motif, and disease association; and instructive visualization such as mutation OncoPrint, Ideogram, and interactive mRNA-miRNA network. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available in MutMix, at http://innovebioinfo.com/Database/TmiEx/MutMix.php.
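The idea of comparing an observed mutation effect with a permutation-based null that keeps the same sample structure can be sketched as follows. This is a generic label-permutation test on a mean difference, not the authors' framework; the test statistic, function name, and toy expression values are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(expression, is_mutated, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the mutated vs. wild-type mean difference."""

    def abs_diff(labels):
        mut = [x for x, m in zip(expression, labels) if m]
        wt = [x for x, m in zip(expression, labels) if not m]
        return abs(mean(mut) - mean(wt))

    observed = abs_diff(is_mutated)
    rng = random.Random(seed)
    labels = list(is_mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling labels keeps both group sizes unchanged
        if abs_diff(labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Toy data: 4 mutated and 6 wild-type samples for one hypothetical miRNA.
expression = [5.1, 4.9, 5.3, 5.0, 1.0, 1.2, 0.9, 1.1, 1.0, 1.3]
is_mutated = [True] * 4 + [False] * 6
p = permutation_p_value(expression, is_mutated)
print(p)
```

Because the labels are shuffled rather than resampled, every permuted data set has the same class imbalance as the real one, which is what lets the null distribution reflect unequal group sizes.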

Subjects

Glioma , MicroRNAs , Humans , Diffusion , MicroRNAs/genetics , Mutation , RNA, Messenger

6.

Integrative data of a novel ciliate (Alveolata, Ciliophora) propose the establishment of Heterodeviata nantongensis nov. sp.

Liao, Lijian; Jiang, Limin; Hu, Xiaozhong.

BMC Microbiol ; 24(1): 27, 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38243176

ABSTRACT

BACKGROUND: As unicellular eukaryotes, ciliates are an indispensable component of micro-ecosystems that play the role of intermediate nutrition link between bacteria or algae and meiofauna. Recent faunistic studies have revealed many new taxa of hypotrich ciliates, indicating their diversity is greater than previously thought. Here we document an undescribed form isolated from an artificial brackish water pond in East China. Examination of its morphology, ontogenesis and molecular phylogeny suggests that it represents a new species. RESULTS: The morphology and morphogenesis of the new brackish-water deviatid ciliate, Heterodeviata nantongensis nov. sp., isolated from Nantong, China, were investigated using live observations and protargol staining. The diagnostic traits of the new species include three frontal cirri, one buccal cirrus, one or two parabuccal cirri, an inconspicuous frontoventral cirral row of four to six frontoventral cirri derived from two anlagen, three left and two right marginal rows, two dorsal kineties, dorsal kinety 1 with 9-14 dikinetids and dorsal kinety 2 with only two dikinetids, and one to three caudal cirri at the rear end of dorsal kinety 1. Its main morphogenetic features are: (i) the old oral apparatus is completely inherited by the proter except undulating membranes, which are reorganized in situ; (ii) anlagen for marginal rows and the left dorsal kinety develop intrakinetally in both proter and opisthe; (iii) dorsal kinety 2 is generated dorsomarginally; (iv) five cirral anlagen are formed in both proter and opisthe; (v) in the proter, anlagen I and II very likely originate from the parental undulating membranes and the buccal cirrus, respectively, anlage III from anterior parabuccal cirrus, anlage IV originates from the parental frontoventral cirri and anlage V from the innermost parental right marginal row; and (vi) anlagen I-IV of the opisthe are all generated from oral primordium, anlage V from the innermost parental right marginal row. Phylogenetic analyses based on SSU rRNA gene sequence data were performed to determine the systematic position of the new taxon. CONCLUSIONS: The study of the morphology and ontogenesis of a new brackish-water taxon increases the overall knowledge about the biodiversity of this ciliate group. It also adds to the genetic data available and further provides a reliable reference for environmental monitoring and resource investigations.

Subjects

Alveolata , Ciliophora , Phylogeny , Ecosystem , China , Water

7.

SBSA: an online service for somatic binding sequence annotation.

Jiang, Limin; Guo, Fei; Tang, Jijun; Yu, Hui; Ness, Scott; Duan, Mingrui; Mao, Peng; Zhao, Ying-Yong; Guo, Yan.

Nucleic Acids Res ; 50(1): e4, 2022 01 11.

Article in English | MEDLINE | ID: mdl-34606615

ABSTRACT

Efficient annotation of alterations in binding sequences of molecular regulators can help identify novel candidates for mechanistic studies and offer original therapeutic hypotheses. In this work, we developed Somatic Binding Sequence Annotator (SBSA) as a full-capacity online tool to annotate altered binding motifs/sequences, addressing diverse types of genomic variants and molecular regulators. The genomic variants can be somatic mutation, single nucleotide polymorphism, RNA editing, etc. The binding motifs/sequences involve transcription factors (TFs), RNA-binding proteins, miRNA seeds, miRNA-mRNA 3'-UTR binding target, or can be any custom motifs/sequences. Compared to similar tools, SBSA is the first to support miRNA seeds and miRNA-mRNA 3'-UTR binding target, and it is the first to implement a personalized genome approach that accommodates joint adjacent variants. SBSA can support an indefinite number of species, and includes preloaded reference genomes for SARS-CoV-2 and 25 other common organisms. We demonstrated SBSA by annotating multi-omics data from over 30,890 human subjects. Of the millions of somatic binding sequences identified, many have known severe biological repercussions, such as the somatic mutation in the TERT promoter region, which causes a gained binding sequence for E26 transformation-specific factor (ETS1). We further validated the function of this TERT mutation using experimental data in cancer cells. Availability: http://innovebioinfo.com/Annotation/SBSA/SBSA.php.

Subjects

COVID-19/virology , Computational Biology/instrumentation , Genomics/instrumentation , Mutation , Proteomics/instrumentation , SARS-CoV-2 , 3' Untranslated Regions , Algorithms , Amino Acid Motifs , COVID-19/metabolism , Computational Biology/methods , Computers , Genetic Techniques , Genome, Human , Genomics/methods , Humans , Internet , MicroRNAs/metabolism , Phenotype , Promoter Regions, Genetic , Protein Binding , Proteomics/methods , Proto-Oncogene Protein c-ets-1/genetics , Proto-Oncogene Protein c-ets-1/metabolism , RNA-Binding Proteins/metabolism , Telomerase/metabolism

8.

Predicting MHC class I binder: existing approaches and a novel recurrent neural network solution.

Jiang, Limin; Yu, Hui; Li, Jiawei; Tang, Jijun; Guo, Yan; Guo, Fei.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34131696

ABSTRACT

Major histocompatibility complex (MHC) possesses important research value in the treatment of complex human diseases. A plethora of computational tools has been developed to predict MHC class I binders. Here, we comprehensively reviewed 27 up-to-date MHC I binding prediction tools developed over the last decade, thoroughly evaluating feature representation methods, prediction algorithms and model training strategies on a benchmark dataset from Immune Epitope Database. A common limitation was identified during the review that all existing tools can only handle a fixed peptide sequence length. To overcome this limitation, we developed a bilateral and variable long short-term memory (BVLSTM)-based approach, named BVLSTM-MHC. It is the first variable-length MHC class I binding predictor. In comparison to the 10 mainstream prediction tools on an independent validation dataset, BVLSTM-MHC achieved the best performance in six out of eight evaluated metrics. A web server based on the BVLSTM-MHC model was developed to enable accurate and efficient MHC class I binder prediction in human, mouse, macaque and chimpanzee.

Subjects

Binding Sites , Carrier Proteins/chemistry , Computational Biology/methods , Histocompatibility Antigens Class I/chemistry , Neural Networks, Computer , Software , Amino Acid Sequence , Carrier Proteins/metabolism , Databases, Factual , Deep Learning , Epitopes/chemistry , Epitopes/immunology , Epitopes/metabolism , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class I/metabolism , Machine Learning , Protein Binding , ROC Curve , Reproducibility of Results , Web Browser

9.

Detecting SARS-CoV-2 and its variant strains with a full genome tiling array.

Jiang, Limin; Guo, Yan; Yu, Hui; Hoff, Kendal; Ding, Xun; Zhou, Wei; Edwards, Jeremy.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34097003

ABSTRACT

The coronavirus disease 2019 pandemic is the most damaging pandemic in recent human history. Rapid detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and variant strains is paramount for recovery from this pandemic. Conventional SARS-CoV-2 tests interrogate only limited regions of the whole SARS-CoV-2 genome, which leads to low specificity and misses the opportunity to detect variant strains. In this work, we developed the first SARS-CoV-2 tiling array that captures the entire SARS-CoV-2 genome at single nucleotide resolution and offers the opportunity to detect point mutations. A thorough bioinformatics protocol of two base calling methods has been developed to accompany this array. To demonstrate the effectiveness of the tiling array, we genotyped all genomic positions of eight SARS-CoV-2 samples. Using high-throughput sequencing as the benchmark, we show that the tiling array had a genome-wide accuracy of at least 99.5%. From the tiling array analysis results, we identified the D614G mutation in the spike protein in four of the eight samples, suggesting the widespread distribution of this variant at the early stage of the outbreak in the United States. Two additional nonsynonymous mutations were identified in one sample in the nucleocapsid protein (P13L and S197L), which may complicate future vaccine development. With a cost of around $5 per array, supreme accuracy, and an ultrafast bioinformatics protocol, the SARS-CoV-2 tiling array makes an invaluable toolkit for combating current and future pandemics. Our SARS-CoV-2 tiling array is currently utilized by Molecular Vision, a CLIA-certified lab for SARS-CoV-2 diagnosis.

Subjects

COVID-19 Testing , COVID-19/genetics , Genomics , SARS-CoV-2/genetics , COVID-19/virology , Genome, Viral/genetics , Humans , Mutation/genetics , SARS-CoV-2/pathogenicity

10.

Deep neural network based tissue deconvolution of circulating tumor cell RNA.

Yan, Fengyao; Jiang, Limin; Ye, Fei; Ping, Jie; Bowley, Tetiana Y; Ness, Scott A; Li, Chung-I; Marchetti, Dario; Tang, Jijun; Guo, Yan.

J Transl Med ; 21(1): 783, 2023 11 04.

Article in English | MEDLINE | ID: mdl-37925448

ABSTRACT

Prior research has shown that the deconvolution of cell-free RNA can uncover the tissue origin. The conventional deconvolution approaches rely on constructing a reference tissue-specific gene panel, which cannot capture the inherent variation present in actual data. To address this, we have developed a novel method that utilizes a neural network framework to leverage the entire training dataset. Our approach involved training a model that incorporated 15 distinct tissue types. Through one semi-independent and two complete independent validations, including deconvolution using a semi in silico dataset, deconvolution with a custom normal tissue mixture RNA-seq data, and deconvolution of longitudinal circulating tumor cell RNA-seq (ctcRNA) data from a cancer patient with metastatic tumors, we demonstrate the efficacy and advantages of the deep-learning approach which were exerted by effectively capturing the inherent variability present in the dataset, thus leading to enhanced accuracy. Sensitivity analyses reveal that neural network models are less susceptible to the presence of missing data, making them more suitable for real-world applications. Moreover, by leveraging the concept of organotropism, we applied our approach to trace the migration of circulating tumor cell-derived RNA (ctcRNA) in a cancer patient with metastatic tumors, thereby highlighting the potential clinical significance of early detection of cancer metastasis.

Subjects

Neoplastic Cells, Circulating , RNA , Humans , Neural Networks, Computer , RNA-Seq , Sequence Analysis, RNA

11.

New contributions to the Cyrtophoria ciliates (Protista, Ciliophora): Establishment of new taxa and phylogenetic analyses using two ribosomal genes.

Wang, Congcong; Jiang, Limin; Pan, Hongbo; Warren, Alan; Hu, Xiaozhong.

J Eukaryot Microbiol ; 70(1): e12938, 2023 01.

Article in English | MEDLINE | ID: mdl-35892241

ABSTRACT

Periphytic ciliates play a vital role in the material cycle and energy flow of the microbial food web; however, their taxonomy and biodiversity are inadequately studied given their high species richness. Two new and one little known species, viz. Derouxella lembodes gen. et sp. nov., Cyrtophoron multivacuolatum sp. nov., and Cyrtophoron apsheronica Aliev, 1991, collected from coastal waters of China, were investigated using modern methods. Derouxella gen. nov. can be recognized by having a dorsoventrally flattened body, a podite, one fragmented preoral kinety, two parallel circumoral kineties, and somatic kineties progressively shortened from right to left. Morphological classification and phylogenetic analyses based on nuclear small subunit ribosomal RNA (nSSU rRNA) and mitochondrial small subunit ribosomal RNA (mtSSU rRNA) gene sequence data inferred that Derouxella gen. nov. occupies an intermediate position between Hartmannulidae and Dysteriidae. Cyrtophoron multivacuolatum sp. nov. is characterized by large body size, the numbers of somatic kineties and nematodesmal rods, and having numerous contractile vacuoles. The genus Cyrtophoron and the poorly known species C. apsheronica were redefined. Even with the addition of newly obtained nSSU rRNA and mtSSU rRNA gene sequences of Cyrtophoron, the family Chlamydodontidae was still recovered as a monophyletic group, and the monophyly of Cyrtophoron was also supported.

Subjects

Ciliophora , Phylogeny , Ciliophora/genetics , Genes, rRNA , RNA, Ribosomal , China , DNA, Ribosomal/genetics

12.

Comprehensive Pan-Cancer Mutation Density Patterns in Enhancer RNA.

Zhang, Troy; Yu, Hui; Jiang, Limin; Bai, Yongsheng; Liu, Xiaoyi; Guo, Yan.

Int J Mol Sci ; 25(1)2023 Dec 30.

Article in English | MEDLINE | ID: mdl-38203707

ABSTRACT

Significant advances have been achieved in understanding the critical role of enhancer RNAs (eRNAs) in the complex field of gene regulation. However, notable uncertainty remains concerning the biology of eRNAs, highlighting the need for continued research to uncover their exact functions in cellular processes and diseases. We present a comprehensive study to scrutinize mutation density patterns, mutation strand bias, and mutation burden in eRNAs across multiple cancer types. Our findings reveal that eRNAs exhibit mutation strand bias akin to that observed in protein-coding RNAs. We also identified a novel pattern, in which mutation density is notably diminished around the central region of the eRNA, but conspicuously elevated towards both the beginning and end. This pattern can be potentially explained by a mechanism involving heightened transcriptional activity and the activation of transcription-coupled repair. The central regions of the eRNAs appear to be more conserved, hinting at a potential mechanism preserving their structural and functional integrity, while the extremities may be more susceptible to mutations due to increased exposure. The evolutionary trajectory of this mutational pattern suggests a nuanced adaptation in eRNAs, where stability at their core coexists with flexibility at their extremities, potentially facilitating their diverse interactions with other genetic entities.

Subjects

Enhancer RNAs , Neoplasms , Humans , Biological Evolution , Excision Repair , Mutation , Neoplasms/genetics , Psychomotor Agitation

13.

Highly Efficient Selective Hydrogenation of Cinnamaldehyde to Cinnamyl Alcohol over CoRe/TiO₂ Catalyst.

Chen, Mengting; Wang, Yun; Jiang, Limin; Cheng, Yuran; Liu, Yingxin; Wei, Zuojun.

Molecules ; 28(8)2023 Apr 10.

Article in English | MEDLINE | ID: mdl-37110570

ABSTRACT

Allylic alcohols, typically produced through selective hydrogenation of α,β-unsaturated aldehydes, are important intermediates in the fine chemical industry, but achieving this transformation with high selectivity remains a challenge. Herein, we report a series of TiO₂-supported CoRe bimetallic catalysts for the selective hydrogenation of cinnamaldehyde (CAL) to cinnamyl alcohol (COL) using formic acid (FA) as a hydrogen donor. The resultant catalyst with the optimized Co/Re ratio of 1:1 can achieve an exceptional COL selectivity of 89% with a CAL conversion of 99% under mild conditions of 140 °C for 4 h, and the catalyst can be reused four times without loss of activity. Meanwhile, the Co1Re1/TiO₂/FA system was efficient for the selective hydrogenation of various α,β-unsaturated aldehydes to the corresponding α,β-unsaturated alcohols. The presence of ReOx on the Co1Re1/TiO₂ catalyst surface was advantageous to the adsorption of C=O, and the ultrafine Co nanoparticles provided abundant hydrogenation active sites for the selective hydrogenation. Moreover, FA as a hydrogen donor improved the selectivity to α,β-unsaturated alcohols.

14.

EditPredict: Prediction of RNA editable sites with convolutional neural network.

Wang, Jiandong; Ness, Scott; Brown, Roger; Yu, Hui; Oyebamiji, Olufunmilola; Jiang, Limin; Sheng, Quanhu; Samuels, David C; Zhao, Ying-Yong; Tang, Jijun; Guo, Yan.

Genomics ; 113(6): 3864-3871, 2021 11.

Article in English | MEDLINE | ID: mdl-34562567

ABSTRACT

RNA editing exerts critical impacts on numerous biological processes. While millions of RNA editing events have been identified in humans, many more are expected to be discovered. In this work, we constructed Convolutional Neural Network (CNN) models to predict human RNA editing events in both Alu regions and non-Alu regions. With a validation dataset resulting from CRISPR/Cas9 knockout of the ADAR1 enzyme, the validation accuracies reached 99.5% and 93.6% for Alu and non-Alu regions, respectively. We deployed our CNN models as a web service named EditPredict. EditPredict not only works on reference genome sequences but can also take into consideration single nucleotide variants in personal genomes. In addition to the human genome, EditPredict handles other model organisms including bumblebee, fruitfly, mouse, and squid genomes. EditPredict can be used stand-alone to predict novel RNA editing, and it can be used to assist in filtering candidate RNA editing events detected from RNA-Seq data.

Subjects

Neural Networks, Computer , RNA Editing , Animals , Genome , RNA , RNA-Seq

15.

A sequence-based multiple kernel model for identifying DNA-binding proteins.

Qian, Yuqing; Jiang, Limin; Ding, Yijie; Tang, Jijun; Guo, Fei.

BMC Bioinformatics ; 22(Suppl 3): 291, 2021 May 31.

Article in English | MEDLINE | ID: mdl-34058979

ABSTRACT

BACKGROUND: DNA-Binding Proteins (DBPs) play a pivotal role in biological systems. A mounting number of researchers are studying their mechanisms and detection methods. The traditional experimental methods for detecting DBPs are time-consuming and resource-consuming. In recent years, Machine Learning methods have been used to detect DBPs. However, it is difficult to adequately describe the information of proteins in predicting DNA-binding proteins. In this study, we extract six features from protein sequences and use Multiple Kernel Learning based on Centered Kernel Alignment to integrate these features. The integrated feature is fed into a Support Vector Machine to build a predictive model and detect new DBPs. RESULTS: In our work, the data sets PDB1075 and PDB186 are employed to test our method. From the results, our model obtains better results (accuracy) than other existing methods on PDB1075 ([Formula: see text]) and PDB186 ([Formula: see text]), respectively. CONCLUSION: Multiple kernel learning could fuse the complementary information between different features. Compared with existing methods, our method achieves comparable or the best results on benchmark data sets.
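Centered kernel alignment, as commonly defined, is the normalized Frobenius inner product of two centered kernel matrices. The sketch below computes it in plain Python and then weights each kernel by its alignment with the label kernel. That weighting rule is a simple heuristic assumed here for illustration; the optimization actually used in the paper may differ, and the function names and toy matrices are hypothetical.

```python
import math


def center(K):
    """Center a square kernel matrix: Kc = H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def frobenius(A, B):
    """Frobenius inner product of two matrices of equal shape."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K1, K2):
    """Centered kernel alignment; assumes neither centered kernel is all zeros."""
    A, B = center(K1), center(K2)
    return frobenius(A, B) / math.sqrt(frobenius(A, A) * frobenius(B, B))


def kernel_weights(kernels, labels):
    """Assumed heuristic: weight each kernel by its alignment with y y^T."""
    ideal = [[yi * yj for yj in labels] for yi in labels]
    scores = [max(alignment(K, ideal), 0.0) for K in kernels]
    total = sum(scores)  # assumes at least one kernel aligns positively
    return [s / total for s in scores]


# Toy example: labels in {+1, -1}, one informative kernel and one identity kernel.
labels = [1, 1, -1, -1]
informative = [[yi * yj for yj in labels] for yi in labels]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights = kernel_weights([informative, identity], labels)
print(weights)
```

The combined kernel would then be the weighted sum of the individual kernels, which is what a Support Vector Machine with a precomputed kernel would consume.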

Subjects

DNA-Binding Proteins , Support Vector Machine , Machine Learning

16.

Correction to: LightCpG: a multi-view CpG sites detection on single-cell whole genome sequence data.

Jiang, Limin; Wang, Chongqing; Tang, Jijun; Guo, Fei.

BMC Genomics ; 20(1): 365, 2019 05 13.

Article in English | MEDLINE | ID: mdl-31084602

ABSTRACT

Following the publication of this article [1], the authors reported that the images of Fig. 2 and Fig. 3 were switched during typesetting.

17.

LightCpG: a multi-view CpG sites detection on single-cell whole genome sequence data.

Jiang, Limin; Wang, Chongqing; Tang, Jijun; Guo, Fei.

BMC Genomics ; 20(1): 306, 2019 Apr 23.

Article in English | MEDLINE | ID: mdl-31014252

ABSTRACT

BACKGROUND: DNA methylation plays an important role in multiple biological processes that are closely related to human health. The study of DNA methylation can provide an insight into the mechanism behind human health and can also have a positive effect on the assessment of human health status. However, the available sequencing technology is limited by incomplete CpG coverage. Therefore, it is crucial to discover an efficient and convenient method capable of distinguishing between the states of CpG sites. Previous studies focused on identifying methylation states of the CpG sites in single cells, and evaluated only sequence information or structural information. RESULTS: In this paper, we propose a novel model, LightCpG, which combines the positional features with the sequence and structural features to provide information on the CpG sites at two stages. Next, we used the LightGBM model for training of the CpG site identification, and further utilized sample extraction and merged features to reduce the training time. Our results indicate that our method achieves outstanding performance in recognition of DNA methylation. The average AUC values of our method using the 25 human hepatocellular carcinoma (HCC) cell datasets and six human hepatoblastoma-derived (HepG2) cell datasets were 0.9616 and 0.9213, respectively. Moreover, the average training times for our method on the HCC and HepG2 datasets were 8.3 and 5.06 s, respectively. Furthermore, the computational complexity of our model was much lower compared with other available methods that detect methylation states of the CpG sites. CONCLUSIONS: In summary, LightCpG is an accurate model for identifying the DNA methylation status of CpG sites in single cells. Furthermore, three types of feature extraction methods and two strategies used in LightCpG are helpful for other prediction problems.

Subjects

CpG Islands/genetics , Single-Cell Analysis , Whole Genome Sequencing , Carcinoma, Hepatocellular/pathology , Hep G2 Cells , Humans , Liver Neoplasms/pathology

18.

FKL-Spa-LapRLS: an accurate method for identifying human microRNA-disease association.

Jiang, Limin; Xiao, Yongkang; Ding, Yijie; Tang, Jijun; Guo, Fei.

BMC Genomics ; 19(Suppl 10): 911, 2018 Dec 31.

Article in English | MEDLINE | ID: mdl-30598109

ABSTRACT

BACKGROUND: In the process of post-transcription, microRNAs (miRNAs) are closely related to various complex human diseases. Traditional verification methods for miRNA-disease associations are time-consuming and expensive, so it is especially important to design computational methods for detecting potential associations. Considering the restrictions of previous computational methods for predicting potential miRNA-disease associations, we develop the model of FKL-Spa-LapRLS (Fast Kernel Learning Sparse kernel Laplacian Regularized Least Squares) to break through the limitations. RESULT: First, we extract three miRNA similarity kernels and three disease similarity kernels. Then, we combine these kernels into a single kernel through the Fast Kernel Learning (FKL) model, and use sparse kernel (Spa) to eliminate noise in the integrated similarity kernel. Finally, we find the associations via Laplacian Regularized Least Squares (LapRLS). Based on three evaluation methods, global and local leave-one-out cross validation (LOOCV), and 5-fold cross validation, the AUCs of our method reach 0.9563, 0.8398 and 0.9535, respectively, indicating that our method is reliable. Then, we use case studies of eight neoplasms to further analyze the performance of our method. We find that most of the predicted miRNA-disease associations are confirmed by previous traditional experiments, and that some important miRNAs, which uncover more associations with various neoplasms than other miRNAs, deserve more attention. CONCLUSIONS: Our proposed model can reveal miRNA-disease associations and improve the accuracy of correlation prediction for various diseases. Our method can also be easily extended with more similarity kernels.

Subjects

Disease/genetics , Genetic Association Studies/methods , MicroRNAs/genetics , Computational Biology , Humans
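
The abstract above reports AUC values under leave-one-out and 5-fold cross validation. As a minimal sketch of how an AUC can be computed from predicted association scores, the following uses the rank-based definition (the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half). The function name `auc_from_scores` and the toy inputs are illustrative assumptions, not part of the authors' code.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which
    the positive has the higher score; ties contribute 0.5.

    labels: iterable of 0/1 (1 = known association)
    scores: iterable of predicted scores, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores for four miRNA-disease pairs.
    print(auc_from_scores([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a sort-based variant would be preferable for large evaluation sets.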

19.

A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties.

Pan, Gaofeng; Jiang, Limin; Tang, Jijun; Guo, Fei.

Int J Mol Sci ; 19(2)2018 Feb 08.

Article in English | MEDLINE | ID: mdl-29419752

ABSTRACT

DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research on DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it is very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. In order to accurately identify whether or not a nucleotide residue is methylated under the specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria are used to evaluate the prediction results of our method: area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity. On the benchmark dataset, we could reach 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cell datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them in prediction accuracy. The improvement in AUC by our method compared to other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service, and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.

Subjects

Computational Biology/methods , DNA Methylation , DNA/chemistry , DNA/genetics , Algorithms , Animals , Bayes Theorem , Databases, Nucleic Acid , Embryonic Stem Cells/metabolism , Mice , ROC Curve , Reproducibility of Results , Sequence Analysis, DNA , Support Vector Machine
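
Of the feature types named in the abstract above, k-gram composition is the simplest to illustrate. The sketch below counts overlapping k-grams in a DNA window and normalizes them to frequencies. It is only one plausible reading of "k-gram" features; the function name `kgram_frequencies`, the choice of `k`, the handling of ambiguous bases, and the example sequence are assumptions and are not taken from the authors' released code.

```python
from itertools import product


def kgram_frequencies(seq, k=2, alphabet="ACGT"):
    """Return normalized counts of all overlapping k-grams in seq.

    The output is a fixed-length vector (len(alphabet) ** k entries)
    ordered lexicographically by k-gram. Windows containing characters
    outside the alphabet (for example 'N') are skipped.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        gram = seq[i:i + k]
        if gram in counts:
            counts[gram] += 1
            total += 1
    if total == 0:
        # No valid k-gram found: return an all-zero vector.
        return [0.0] * len(counts)
    return [counts[g] / total for g in sorted(counts)]


if __name__ == "__main__":
    # Hypothetical sequence window around a candidate CpG site.
    features = kgram_frequencies("ACGCG", k=2)
    print(len(features), features)
```

A fixed-length vector of this kind can be concatenated with other descriptors before being passed to a classifier.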

20.

Mutational screening of GLI3, SHH, preZRS, and ZRS in 102 Chinese children with nonsyndromic polydactyly.

Xiang, Ying; Jiang, Limin; Wang, Bo; Xu, Yunlan; Cai, Haiqing; Fu, Qihua.

Dev Dyn ; 246(5): 392-402, 2017 05.

Article in English | MEDLINE | ID: mdl-28127823

ABSTRACT

BACKGROUND: Polydactyly is a group of congenital limb malformations that show a high degree of phenotypic variability and genetic heterogeneity. RESULTS: In the present study, four genomic regions (exons of GLI3 and SHH, and noncoding sequences of preZRS and ZRS) involved in the hedgehog (Hh) signaling pathway were sequenced for 102 unrelated Chinese children with nonsyndromic polydactyly. Two GLI3 variants (c.2844 G > G/A; c.1486 C > C/T) and four preZRS variants (chr7:156585336 A > G; chr7:156585421 C > A; chr7:156585247 G > C; chr7:156585420 A > C) were observed in 2 (2.0%) and 6 (5.9%) patients, respectively. These variants are not over-represented in the Chinese healthy population. All 8 cases showed preaxial polydactyly in the hands. Additionally, no specific patterns of malformation predicted mutations in other candidate genes or sequences. CONCLUSIONS: This is the first report of the assessment of the frequency of GLI3/SHH/preZRS/ZRS in Chinese patients to show any higher possibility of mutations or variants for the 4 genes or sequences in China. Developmental Dynamics 246:392-402, 2017. © 2017 Wiley Periodicals, Inc.

Subjects

Hedgehog Proteins/genetics , Kruppel-Like Transcription Factors/genetics , Membrane Proteins/genetics , Nerve Tissue Proteins/genetics , Polydactyly/genetics , Adolescent , Child , Child, Preschool , Female , Gene Frequency , Genetic Testing , Hand Deformities, Congenital/genetics , Hedgehog Proteins/metabolism , Humans , Male , Mutation , Polydactyly/epidemiology , Sequence Analysis, DNA , Signal Transduction/genetics , Zinc Finger Protein Gli3
