1.
Brief Funct Genomics ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38860675

ABSTRACT

In recent years, the application of single-cell transcriptomics and spatial transcriptomics analysis techniques has become increasingly widespread. Whether dealing with single-cell transcriptomic or spatial transcriptomic data, dimensionality reduction and clustering are indispensable. Both single-cell and spatial transcriptomic data are often high-dimensional, making the analysis and visualization of such data challenging. Through dimensionality reduction, it becomes possible to visualize the data in a lower-dimensional space, allowing for the observation of relationships and differences between cell subpopulations. Clustering enables the grouping of similar cells into the same cluster, aiding in the identification of distinct cell subpopulations and revealing cellular diversity, providing guidance for downstream analyses. In this review, we systematically summarized the most widely recognized algorithms employed for the dimensionality reduction and clustering analysis of single-cell transcriptomic and spatial transcriptomic data. This endeavor provides valuable insights and ideas that can contribute to the development of novel tools in this rapidly evolving field.

2.
Front Microbiol ; 15: 1292004, 2024.
Article in English | MEDLINE | ID: mdl-38357350

ABSTRACT

Depression is one of the most prevalent mental disorders today. Over the past decade, there has been considerable attention given to the field of gut microbiota associated with depression. A substantial body of research indicates a bidirectional communication pathway between gut microbiota and the brain. In this review, we extensively detail the correlation between gut microbiota, including Lactobacillus acidophilus and Bifidobacterium longum, and metabolites such as short-chain fatty acids (SCFAs) and 5-hydroxytryptamine (5-HT) concerning depression. Furthermore, we delve into the potential health benefits of microbiome-targeted therapies, encompassing probiotics, prebiotics, and synbiotics, in alleviating depression. Lastly, we underscore the importance of employing a constraint-based modeling framework in the era of systems medicine to contextualize metabolomic measurements and integrate multi-omics data. This approach can offer valuable insights into the complex metabolic host-microbiota interactions, enabling personalized recommendations for potential biomarkers, novel drugs, and treatments for depression.

3.
Methods ; 222: 142-151, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38242383

ABSTRACT

Protein-protein interactions play an important role in various biological processes, and knowledge of them has a wide range of applications. Therefore, the correct identification of protein-protein interaction sites is crucial. In this paper, we propose a novel predictor for protein-protein interaction sites, AGF-PPIS, which combines a multi-head self-attention mechanism (introducing a graph structure), a graph convolutional network, and a feed-forward neural network. We use the Euclidean distance between each pair of protein residues to generate the corresponding protein graph as the input of AGF-PPIS. On the independent test dataset Test_60, AGF-PPIS outperforms comparative methods on seven evaluation metrics (ACC, precision, recall, F1-score, MCC, AUROC, AUPRC), which demonstrates the validity and superiority of the proposed AGF-PPIS model. The source code and usage instructions for AGF-PPIS are available at https://github.com/fxh1001/AGF-PPIS.


Subjects
Benchmarking; Proton Pump Inhibitors; Neural Networks, Computer; Software
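
The AGF-PPIS abstract above describes building a protein graph from Euclidean distances between residues. A minimal sketch of that idea follows; it is not the authors' implementation. The function name residue_graph, the use of one 3D coordinate per residue, and the 14 Å cutoff are all assumptions made for illustration.

```python
import numpy as np

def residue_graph(coords, cutoff=14.0):
    """Build a distance matrix and a binary adjacency matrix for residues.

    coords: one 3D coordinate per residue (assumed, e.g. C-alpha atoms).
    cutoff: hypothetical distance threshold in angstroms.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise coordinate differences, shape (n, n, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    # Euclidean distance between every pair of residues
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Connect residues that lie within the cutoff
    adj = (dist <= cutoff).astype(int)
    # Remove self-loops
    np.fill_diagonal(adj, 0)
    return dist, adj

# Three toy residues: the first two are close, the third is far away.
toy_dist, toy_adj = residue_graph([[0, 0, 0], [3, 4, 0], [30, 0, 0]])
```

The resulting adjacency matrix is symmetric and could serve as input to a graph neural network; whether edges are binary or distance-weighted is a design choice the abstract does not specify.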
4.
Comput Biol Med ; 170: 107937, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38217975

ABSTRACT

Heterogeneous data, especially a mixture of numerical and categorical data, widely exist in bioinformatics. Most works focus on defining new distance metrics rather than learning discriminative metrics for mixed data. Here, we create a new support vector heterogeneous metric learning framework for mixed data. A heterogeneous sample pair kernel is defined for mixed data and metric learning is then converted to a sample pair classification problem. The suggested approach lends itself well to effective resolution through conventional support vector machine solvers. Empirical assessments conducted on mixed data benchmarks and cancer datasets affirm the exceptional efficacy demonstrated by the proposed modeling technique.


Subjects
Algorithms; Computational Biology; Support Vector Machine
5.
Comput Biol Med ; 168: 107762, 2024 01.
Article in English | MEDLINE | ID: mdl-38056212

ABSTRACT

Antibiotic resistance continues to be a growing concern for global health, accentuating the need for novel antibiotic discoveries. Traditional methodologies in this field have relied heavily on extensive experimental screening, which is often time-consuming and costly. In contrast, computer-assisted drug screening offers rapid, cost-effective solutions. In this work, we propose FIAMol-AB, a deep learning model that combines graph neural networks, text convolutional networks and molecular fingerprint techniques. The method also uses an attention mechanism to fuse multiple forms of information within the model. The experiments show that FIAMol-AB may offer potential advantages in antibiotic discovery tasks over some existing methods. We conducted some analysis based on our model's results, which helps highlight the potential significance of certain features in the model's predictive performance. Compared to different models, ours demonstrates promising results, indicating potential robustness and versatility. This suggests that by integrating multi-view information and attention mechanisms, FIAMol-AB might better learn complex molecular structures, potentially improving the precision and efficiency of antibiotic discovery. We hope our FIAMol-AB can be used as a useful method in the ongoing fight against antibiotic resistance.


Subjects
Deep Learning; Anti-Bacterial Agents/pharmacology; Drug Evaluation, Preclinical; Neural Networks, Computer
6.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37779250

ABSTRACT

The microbiota-gut-brain axis denotes a two-way system of interactions between the gut and the brain, comprising three key components: (1) gut microbiota, (2) intermediates and (3) mental ailments. These constituents communicate with one another to induce changes in the host's mood, cognition and demeanor. Knowledge concerning the regulation of the host central nervous system by gut microbiota is fragmented and mostly confined to disorganized or semi-structured unrestricted texts. Such a format hinders the exploration and comprehension of unknown territories or the further advancement of artificial intelligence systems. Hence, we collated crucial information by scrutinizing an extensive body of literature, amalgamated the extant knowledge of the microbiota-gut-brain axis and depicted it in the form of a knowledge graph named MMiKG, which can be visualized on the GraphXR platform and the Neo4j database, correspondingly. By merging various associated resources and deducing prospective connections between gut microbiota and the central nervous system through MMiKG, users can acquire a more comprehensive perception of the pathogenesis of mental disorders and generate novel insights for advancing therapeutic measures. As a free and open-source platform, MMiKG can be accessed at http://yangbiolab.cn:8501/ with no login requirement.


Subjects
Mental Disorders; Microbiota; Humans; Artificial Intelligence; Pattern Recognition, Automated; Prospective Studies; Brain
7.
Comput Struct Biotechnol J ; 20: 3268-3279, 2022.
Article in English | MEDLINE | ID: mdl-35832615

ABSTRACT

Lysine crotonylation (Kcr) is a newly discovered protein post-translational modification and has been proved to be widely involved in various biological processes and human diseases. Thus, the accurate and fast identification of this modification has become a preliminary task in investigating the related biological functions. Due to the long duration, high cost and intensity of traditional high-throughput experimental techniques, constructing bioinformatics predictors based on machine learning algorithms is regarded as one of the most popular solutions. Although dozens of predictors have been reported to identify Kcr sites, only two, nhKcr and DeepKcrot, focused on human nonhistone protein sequences. Moreover, due to the imbalanced nature of the data distribution, the associated detection performance is severely biased towards the majority negative samples and leaves much room for improvement. In this research, we developed a convolutional neural network framework, dubbed iKcr_CNN, to identify human nonhistone Kcr modifications. To overcome the imbalance issue (Kcr: 15,274; non-Kcr: 74,018 with imbalance ratio: 1:4), we applied the focal loss function instead of the standard cross-entropy as the objective for optimizing the model, which not only assigns different weights to samples belonging to different categories but also distinguishes easy- and hard-classified samples. Ultimately, the obtained model presents more balanced prediction scores between real-world positive and negative samples than existing tools. The user-friendly web server is accessible at ikcrcnn.webmalab.cn/, and the Python scripts can be downloaded at github.com/lijundou/iKcr_CNN/. The proposed model may serve as an efficient tool to assist researchers with their experimental work.
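
The focal loss mentioned in the abstract above can be sketched as follows. This is the standard binary focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), written with NumPy for illustration; it is not taken from the iKcr_CNN code. The function name and the default values alpha=0.25 and gamma=2.0 are assumptions, not the settings reported by the authors.

```python
import numpy as np

def binary_focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-sample binary focal loss.

    alpha weights the positive class (1 - alpha weights the negative class);
    gamma down-weights samples that are already classified with confidence.
    Both defaults are illustrative assumptions.
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities to avoid log(0)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    # p_t is the predicted probability of the true class
    p_t = np.where(y == 1, p, 1 - p)
    # Class-dependent weight
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    # The (1 - p_t)^gamma factor shrinks the loss of easy samples
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
example_losses = binary_focal_loss([1, 1], [0.9, 0.1])
```

With gamma set to 0 the expression reduces to class-weighted cross-entropy, which makes the two roles described in the abstract (class weighting and easy/hard discrimination) easy to separate.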

8.
Bioinformatics ; 38(13): 3488-3489, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35604082

ABSTRACT

SUMMARY: Integrative analysis of single-cell RNA-sequencing (scRNA-seq) data with spatial data for the same species and organ would provide each cell sample with a predictive spatial location, which would facilitate biological study. However, publicly available spatial sequencing datasets for specific species and organs are rare and are often displayed in different formats. In this study, we introduce a new web-based scRNA-seq analysis tool, webSCST, that integrates well-organized spatial transcriptome sequencing datasets categorized by species and organs, provides a user-friendly interface for raw single-cell processing with popular integration methods and allows users to submit their raw scRNA-seq data once to obtain predicted spatial locations for each cell type. AVAILABILITY AND IMPLEMENTATION: webSCST implemented in shiny with all major browsers supported is available at http://www.webscst.com. webSCST is also freely available as an R package at https://github.com/swsoyee/webSCST.


Subjects
Single-Cell Analysis; Transcriptome; Sequence Analysis, RNA; Software; RNA; Gene Expression Profiling/methods
9.
Comput Biol Med ; 143: 105283, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35149459

ABSTRACT

As a kind of novel non-invasive marker for molecular detection, cell-free DNA (cfDNA) has potential value for the early diagnosis of diseases, prognosis assessment, and efficacy monitoring. The constant developments in molecular biology detection technologies have led to an increase in clinical studies on the use of cfDNA detection methods for patients, and many gratifying outcomes have been achieved. In this review, the contributions of bioinformatics tools to the study of cfDNA are well discussed. The focus of the review is on cfDNA identification signals, cfDNA identification methods, and the relationship of cfDNA with human diseases such as hepatic cancer, lung cancer, end-stage kidney disease, and ischemic stroke. The research significance and existing problems of using cfDNA as a biomarker for diseases are also discussed.

10.
Comput Biol Med ; 143: 105269, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35158118

ABSTRACT

Alzheimer's disease (AD) is a severe neurodegenerative disease with a slow onset and progressive deterioration over time. With the acceleration of global aging, AD has become a disease that seriously threatens the health of the elderly; therefore, the effective prevention and treatment of AD is an extremely important area of study for researchers and clinicians. Rapid technological developments have promoted the analysis of various kinds of complex data sets using machine learning methods. Common machine learning algorithms, such as Lasso, SVM and Random Forest, are very important in AD research. To help accelerate AD-related research, we review some recent research progress on Alzheimer's disease, including databases, image analysis, gene expression, etc., which can provide AD researchers with more comprehensive research methods.

11.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33834199

ABSTRACT

Post-translational modifications (PTMs) play significant roles in regulating protein structure, activity and function, and they are closely involved in various pathologies. Therefore, the identification of associated PTMs is the foundation of in-depth research on related biological mechanisms, disease treatments and drug design. Due to the high cost and time consumption of high-throughput sequencing techniques, developing machine learning-based predictors has been considered an effective approach to rapidly recognize potential modified sites. However, the imbalanced distribution of true and false PTM sites, namely, the data imbalance problem, largely affects the reliability and application of prediction tools. In this article, we conduct a systematic survey of the research progress in imbalanced PTM classification. First, we describe the modeling process in detail and outline useful data imbalance solutions. Then, we summarize the recently proposed bioinformatics tools based on imbalanced PTM data and simultaneously build a convenient website, ImClassi_PTMs (available at lab.malab.cn/∼dlj/ImbClassi_PTMs/), for researchers to browse. Moreover, we analyze the challenges of current computational predictors and propose some suggestions to improve the efficiency of imbalance learning. We hope that this work will provide comprehensive knowledge of imbalanced PTM recognition and contribute to advanced predictors in the future.


Subjects
Algorithms; Computational Biology/methods; Machine Learning; Models, Biological; Protein Processing, Post-Translational; Proteins/metabolism; Databases, Protein; Humans; Neural Networks, Computer; Proteins/classification; Reproducibility of Results
12.
RNA Biol ; 18(12): 2236-2246, 2021 12.
Article in English | MEDLINE | ID: mdl-33729104

ABSTRACT

As one of the common post-transcriptional modifications in tRNAs, dihydrouridine (D) has prominent effects on regulating the flexibility of tRNA as well as cancerous diseases. Because the sequencing techniques used to detect D modification are expensive and time-consuming, precise computational tools can greatly advance the study of molecular mechanisms and medical developments. We proposed a novel predictor, called iRNAD_XGBoost, to identify potential D sites using multiple RNA sequence representations. In this method, the imbalance problem is handled with the hybrid sampling method SMOTEENN, and the top 30 features selected by XGBoost are used to construct the model. The optimized model showed high Sn and Sp values of 97.13% and 97.38% in the jackknife test, respectively. In the independent test, these two metrics reached 91.67% and 94.74%, respectively. Compared with the iRNAD method, this model showed high generalizability and consistent prediction efficiency for positive and negative samples, yielding satisfactory MCC scores of 0.94 and 0.86, respectively. It is inferred that the chemical property and nucleotide density features (CPND), electron-ion interaction pseudopotential (EIIP and PseEIIP) as well as dinucleotide composition (DNC) are crucial to the recognition of D modification. The proposed predictor is a promising tool to help experimental biologists investigate molecular functions.


Subjects
Computational Biology/methods; RNA/chemistry; Saccharomyces cerevisiae/genetics; Uridine/chemistry; Algorithms; Animals; Humans; Mice; Molecular Structure; Nucleic Acid Conformation; RNA, Transfer/metabolism; Support Vector Machine
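
The Sn, Sp and MCC values quoted in the abstract above are standard confusion-matrix metrics. A minimal sketch of how they are computed follows; the function name and the example counts are hypothetical and are not taken from the iRNAD_XGBoost study.

```python
import math

def sn_sp_mcc(tp, fn, tn, fp):
    """Sensitivity (Sn), specificity (Sp) and Matthews correlation (MCC)."""
    # Sn: fraction of true sites that are correctly predicted
    sn = tp / (tp + fn)
    # Sp: fraction of non-sites that are correctly rejected
    sp = tn / (tn + fp)
    # MCC balances all four cells of the confusion matrix
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc

# Hypothetical counts, for illustration only.
sn, sp, mcc = sn_sp_mcc(tp=90, fn=10, tn=80, fp=20)
```

Because MCC uses all four counts, it stays informative on imbalanced data, which is why it is commonly reported alongside Sn and Sp for modification-site predictors.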
13.
J Proteome Res ; 20(1): 191-201, 2021 01 01.
Article in English | MEDLINE | ID: mdl-33090794

ABSTRACT

Lysine glutarylation is a newly reported post-translational modification (PTM) that plays significant roles in regulating metabolic and mitochondrial processes. Accurate identification of protein glutarylation is the primary task to better investigate molecular functions and various applications. Because traditional biological sequencing techniques are time-consuming and expensive, and protein data are growing explosively, building precise computational models to rapidly identify glutarylation is a popular and feasible solution. In this work, we proposed a novel AdaBoost-based predictor called iGlu_AdaBoost to distinguish glutarylation and non-glutarylation sequences. Here, the top 37 features were chosen from a total of 1768 combined features using Chi2 followed by incremental feature selection (IFS) to build the model, including 188D, the composition of k-spaced amino acid pairs (CKSAAP), and enhanced amino acid composition (EAAC). With the help of the hybrid-sampling method SMOTE-Tomek, the AdaBoost algorithm achieved recall, specificity, and AUC values of 87.48%, 72.49%, and 0.89 over 10-fold cross validation as well as 72.73%, 71.92%, and 0.63 over the independent test, respectively. Further feature analysis inferred that the positively charged amino acids R and K play critical roles in glutarylation recognition. Our model showed good generalization ability and consistent prediction results for positive and negative samples, and is comparable to four published tools. The proposed predictor is an efficient tool to find potential glutarylation sites and provides helpful suggestions for further research on glutarylation mechanisms and related disease treatments.


Subjects
Computational Biology; Lysine; Algorithms; Lysine/metabolism; Protein Processing, Post-Translational; Proteins/metabolism; Support Vector Machine
14.
Mol Ther Nucleic Acids ; 21: 332-342, 2020 Sep 04.
Article in English | MEDLINE | ID: mdl-32645685

ABSTRACT

5-Methylcytosine (m5C) is a well-known post-transcriptional modification that plays significant roles in biological processes, such as RNA metabolism, tRNA recognition, and stress responses. Traditional high-throughput techniques for identifying m5C sites are usually time consuming and expensive. In addition, the number of RNA sequences shows explosive growth in the post-genomic era. Thus, machine-learning-based methods are urgently needed to quickly predict RNA m5C modifications with high accuracy. Here, we propose a novel support-vector-machine (SVM)-based tool, called iRNA-m5C_SVM, that combines multiple sequence features to identify m5C sites in Arabidopsis thaliana. Eight kinds of popular feature-extraction methods were first investigated systematically. Then, four well-performing features were incorporated to construct a comprehensive model, including position-specific propensity (PSP) (PSNP, PSDP, and PSTP, associated with frequencies of nucleotides, dinucleotides, and trinucleotides, respectively), nucleotide composition (nucleic acid, di-nucleotide, and tri-nucleotide compositions; NAC, DNC, and TNC, respectively), electron-ion interaction pseudopotentials of trinucleotide (PseEIIPs), and general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-general). Evaluated accuracies over 10-fold cross-validation and independent tests reached 73.06% and 80.15%, respectively, which were the best predictive performances in A. thaliana among existing models. It is believed that the proposed model can be a promising alternative for further research on m5C modification sites in plants.
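
The nucleotide composition features named in the abstract above (NAC, DNC and TNC) are k-mer frequencies with k = 1, 2 and 3. A minimal sketch under stated assumptions follows: the function name is hypothetical, sequences are treated as RNA (T is mapped to U), and windows containing other characters are simply skipped. It is not the iRNA-m5C_SVM implementation.

```python
from itertools import product

def kmer_composition(seq, k):
    """Frequency of every k-mer over the alphabet ACGU, in lexicographic order."""
    seq = seq.upper().replace("T", "U")
    # All 4**k possible k-mers, e.g. AA, AC, AG, AU, CA, ... for k = 2
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous characters
            counts[window] += 1
    return [counts[m] / total for m in kmers]

# NAC, DNC and TNC vectors have 4, 16 and 64 entries respectively.
nac = kmer_composition("AACGU", 1)
dnc = kmer_composition("AACGU", 2)
tnc = kmer_composition("AACGU", 3)
```

Concatenating the three vectors gives an 84-dimensional composition feature that could be passed to a classifier such as an SVM, alongside the position-specific and pseudo-composition features the abstract lists.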

15.
Mol Ther Nucleic Acids ; 19: 293-303, 2020 Mar 06.
Article in English | MEDLINE | ID: mdl-31865116

ABSTRACT

Pseudouridine (Ψ) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Ψ sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Ψ sites, they are expensive and time consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect the Ψ sites on uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Ψ sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method based on the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. It was found that the SVM-based accuracies were 3.55% and 5.09% lower than the iPseU-CUU predictor for the H_990 and S_628 datasets, respectively. Nearly equivalent results were obtained for M_994 and an independent H_200 dataset, and a 5.0% improvement was even observed for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to show their comprehensive performances, where the effective features are selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_994 datasets were 70.54% and 72.45%, which were 2.39% and 0.65% higher than iPseU-CUU. For the S_200 dataset, accuracy also improved by 8%, from 69% to 77%. However, there was no obvious improvement for H. sapiens, which was evaluated as approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. The overall performances for Ψ identification using BPB features as well as the combined features were not obviously improved. Although some kinds of feature extraction methods based on the RNA sequence information have been applied to construct the predictors in previous studies, the corresponding accuracies are generally in the range of 60%-70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Ψ modification prediction problem.

16.
Mitochondrial DNA B Resour ; 4(2): 3571-3572, 2019 Oct 15.
Article in English | MEDLINE | ID: mdl-33366090

ABSTRACT

Salix triandra is a great willow for bees and an excellent choice for living willow structures. In this study, we assembled and annotated the complete chloroplast (cp) genome sequence of S. triandra. The whole cp genome is 155,821 base pairs (bp) in size and comprises one small single copy (SSC) region of 16,223 bp and one large single copy (LSC) region of 84,532 bp separated by a pair of inverted repeats (IRs) of 27,533 bp each. There are 131 genes, including 86 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Phylogenetic analysis with the neighbor-joining method indicates that S. triandra is closely related to S. tetrasperma. The complete cp genome will facilitate biological studies in the order Malpighiales in the future.

17.
Mitochondrial DNA B Resour ; 5(1): 125-126, 2019 Dec 12.
Article in English | MEDLINE | ID: mdl-33366451

ABSTRACT

Populus deltoides is a fast-growing, large tree and one of the largest North American hardwood trees. In this study, the complete chloroplast (cp) genome sequence of P. deltoides is characterized. The whole cp genome was assembled to 156,867 bp, including a large single copy (LSC) region of 85,534 bp, a small single copy (SSC) region of 16,513 bp and a pair of inverted repeat (IR) regions of 27,410 bp each. The base content of the P. deltoides cp genome is A (32.0%), T (31.3%), C (18.0%), and G (18.7%), and AT bases occupy a large proportion of the cp genome. The neighbor-joining phylogenetic analysis with 20 cp genomes from the Salicaceae family showed that P. deltoides is sister to Populus davidiana. These results will support evolutionary and biological studies in the Salicaceae family.
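
The region sizes reported in the two chloroplast genome abstracts above can be checked with simple arithmetic: a quadripartite cp genome consists of the LSC, the SSC and two copies of the IR. The short sketch below performs that check with the published figures; the function and variable names are illustrative only.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total cp genome length: LSC + SSC + two inverted repeat copies."""
    return lsc + ssc + 2 * ir

# Figures taken from the S. triandra and P. deltoides abstracts.
s_triandra_total = quadripartite_length(lsc=84_532, ssc=16_223, ir=27_533)
p_deltoides_total = quadripartite_length(lsc=85_534, ssc=16_513, ir=27_410)

# Base composition of the P. deltoides cp genome, in percent.
base_pct = {"A": 32.0, "T": 31.3, "C": 18.0, "G": 18.7}
at_content = base_pct["A"] + base_pct["T"]
```

Both totals match the reported genome sizes (155,821 bp and 156,867 bp), and the AT content of roughly 63.3% is consistent with the statement that AT bases make up a large proportion of the P. deltoides cp genome.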

18.
Sci Rep ; 6: 24918, 2016 04 26.
Article in English | MEDLINE | ID: mdl-27114050

ABSTRACT

Surface defects in nanomaterials are an important physical parameter that significantly influences their physical and chemical performance. In this work, a high concentration of surface oxygen vacancies (SOVs) is successfully introduced on BiOBr nanosheets with exposed {001} facets via a simple surface modification using polybasic carboxylic acids. The chelation interaction between carboxylic acid anions and Bi³⁺ weakens the Bi-O bond of BiOBr. Afterwards, under visible-light irradiation, the oxygen atoms absorb the photo-energy and are then released from the surface of BiOBr, leaving SOVs. The electron spin resonance (ESR), high-resolution transmission electron microscopy (HRTEM), and UV-vis diffuse reflectance spectra (DRS) measurements confirm the existence of SOVs. The SOVs can enhance the absorption in the visible light region and improve the separation efficiency of photo-generated charges. Hence, the transformation rate of adsorbed O₂ on the as-prepared BiOBr with SOVs to superoxide anion radicals (•O₂⁻) and the photocatalytic activity are greatly enhanced. Based on the modification by several carboxylic acids and the photocatalytic results, we propose that carboxylic acids whose natural bond orbital (NBO) electrostatic charges have absolute values greater than 0.830 are effective in modifying BiOBr.
