Results 1 - 20 of 91
1.
Zhejiang Da Xue Xue Bao Yi Xue Ban ; 53(2): 184-193, 2024 Apr 25.
Article in English, Chinese | MEDLINE | ID: mdl-38562030

ABSTRACT

OBJECTIVES: To investigate the role of m.4435A>G and YARS2 c.572G>T (p.G191V) mutations in the development of essential hypertension. METHODS: A hypertensive patient with m.4435A>G and YARS2 p.G191V mutations was identified from previously collected mitochondrial genome and exon sequencing data. Clinical data were collected, and a molecular genetic study was conducted in the proband and his family members. Peripheral venous blood was collected, and immortalized lymphocyte lines were constructed. The levels of mitochondrial transfer RNA (tRNA), mitochondrial protein, adenosine triphosphate (ATP), mitochondrial membrane potential (MMP), and reactive oxygen species (ROS) in the constructed lymphocyte cell lines were measured. RESULTS: Mitochondrial genome sequencing showed that all maternal members carried a highly conserved m.4435A>G mutation. The m.4435A>G mutation might affect the secondary structure and folding free energy of mitochondrial tRNA and change its stability, which may influence the anticodon loop structure. Compared with the control group, the cell lines carrying m.4435A>G and YARS2 p.G191V mutations had decreased mitochondrial tRNA homeostasis, mitochondrial protein expression, ATP production and MMP levels, as well as increased ROS levels (all P<0.05). CONCLUSIONS: The YARS2 p.G191V mutation aggravates the changes in mitochondrial translation and mitochondrial function caused by m.4435A>G by affecting the steady-state level of mitochondrial tRNA, and further leads to cell dysfunction. This indicates that the YARS2 p.G191V and m.4435A>G mutations have a synergistic effect in this family and jointly participate in the occurrence and development of essential hypertension.


Subjects
Essential Hypertension , Mutation , Humans , Essential Hypertension/genetics , Male , Reactive Oxygen Species/metabolism , Mitochondrial Membrane Potential/genetics , Mitochondria/genetics , Transfer RNA/genetics , Methionine Transfer RNA/genetics , Mitochondrial Genome , Female
2.
Curr Opin Struct Biol ; 86: 102793, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38447285

ABSTRACT

Protein-ligand binding site prediction is critical for protein function annotation and drug discovery. Biological experiments are time-consuming and require significant equipment, materials, and labor resources. Developing accurate and efficient computational methods for protein-ligand interaction prediction is therefore essential. Here, we summarize the key challenges associated with ligand binding site (LBS) prediction and introduce recently published methods in terms of their input features, computational algorithms, and ligand types. Furthermore, we investigate the specificity of allosteric site identification as a particular LBS type. Finally, we discuss the prospective directions for machine learning-based LBS prediction in the near future.


Subjects
Protein Binding , Proteins , Ligands , Binding Sites , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Machine Learning , Algorithms , Allosteric Site , Humans
3.
Structure ; 32(5): 611-620.e4, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38447575

ABSTRACT

Identifying binding compounds against a target protein is crucial for large-scale virtual screening in drug development. Recently, network-based methods have been developed for compound-protein interaction (CPI) prediction. However, they are difficult to apply to unseen (i.e., never-seen-before) proteins and compounds. In this study, we propose SgCPI, which incorporates local networks of known interactions to predict CPIs. SgCPI randomly samples the local CPI network of the query compound-protein pair as a subgraph and applies a heterogeneous graph neural network (HGNN) to embed the active/inactive message of the subgraph. For unseen compounds and proteins, SgCPI-KD takes SgCPI as the teacher model and distills its knowledge by estimating the potential neighbors. Experimental results indicate: (1) the sampled subgraphs of the CPI network introduce efficient knowledge for unseen molecular prediction with the HGNNs, and (2) the knowledge distillation strategy is beneficial to the double-blind interaction prediction by estimating molecular neighbors and distilling knowledge.


Subjects
Neural Networks (Computer) , Proteins , Proteins/chemistry , Proteins/metabolism , Protein Binding , Humans
4.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38483285

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction refers to the prediction of whether a given drug molecule will bind to a specific target and thus exert a targeted therapeutic effect. Although intelligent computational approaches for drug target prediction have received much attention and made many advances, DTI prediction remains a challenging task that requires further research. The main challenges are as follows: (i) most graph neural network-based methods only consider the information of the first-order neighboring nodes (drug and target) in the graph, without learning deeper and richer structural features from the higher-order neighboring nodes. (ii) Existing methods do not consider both the sequence and structural features of drugs and targets; the methods are independent of one another and cannot combine the advantages of sequence and structural features to improve the interactive learning effect. RESULTS: To address the above challenges, a Multi-view Integrated learning Network that integrates Deep learning and Graph Learning (MINDG) is proposed in this study, which consists of the following parts: (i) a mixed deep network is used to extract sequence features of drugs and targets, (ii) a higher-order graph attention convolutional network is proposed to better extract and capture structural features, and (iii) a multi-view adaptive integrated decision module is used to improve and complement the initial prediction results of the above two networks to enhance the prediction performance. We evaluate MINDG on two datasets and show that it improves DTI prediction performance compared to state-of-the-art baselines. AVAILABILITY AND IMPLEMENTATION: https://github.com/jnuaipr/MINDG.


Subjects
Algorithms , Neural Networks (Computer)
5.
Comput Biol Med ; 171: 108175, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38402841

ABSTRACT

Circular RNAs (circRNAs), a class of endogenous RNA with a covalent loop structure, can regulate gene expression by serving as sponges for microRNAs and RNA-binding proteins (RBPs). To date, most computational methods for predicting RBP binding sites on circRNAs focus on circRNA fragments instead of full circRNAs. These methods detect whether a circRNA fragment contains binding sites, but cannot determine where the binding sites are or how many binding sites are on the circRNA transcript. We report a hybrid deep learning-based tool, CircSite, to predict RBP binding sites at single-nucleotide resolution and detect key contributing nucleotides on circRNA transcripts. CircSite takes advantage of convolutional neural networks (CNNs) and Transformer for learning local and global representations of circRNAs binding to RBPs, respectively. We construct 37 datasets of circRNAs interacting with proteins for benchmarking, and the experimental results show that CircSite offers accurate predictions of RBP binding nucleotides and detects key subsequences aligning well with known binding motifs. CircSite is an easy-to-use online webserver for predicting RBP binding sites on circRNA transcripts and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/CircSite/.


Subjects
MicroRNAs , Circular RNA , Circular RNA/genetics , Protein Binding , Binding Sites , MicroRNAs/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism , Nucleotides/metabolism
6.
Nat Commun ; 15(1): 1387, 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38360714

ABSTRACT

RNA velocity is closely related to cell fate and is an important indicator for the prediction of cell states, with an elegant physical explanation derived from single-cell RNA-seq data. Most existing RNA velocity models aim to extract dynamics from the phase delay between unspliced and spliced mRNA for each individual gene. However, unspliced/spliced mRNA abundance may not provide sufficient signal for dynamic modeling, leading to poor fit in phase portraits. Motivated by the idea that RNA velocity could be driven by transcriptional regulation, we propose TFvelo, which extends the RNA velocity concept to various single-cell datasets without relying on splicing information, by introducing gene regulatory information. Our experiments on synthetic data and multiple scRNA-Seq datasets show that TFvelo can accurately fit gene dynamics on phase portraits, and effectively infer cell pseudo-time and trajectory from RNA abundance data. TFvelo opens a robust and accurate avenue for modeling RNA velocity for single-cell data.


Subjects
RNA Splicing , RNA , RNA/genetics , RNA Splicing/genetics , Messenger RNA/genetics , RNA Sequence Analysis , Single-Cell Analysis , Gene Expression Profiling
7.
Methods ; 222: 28-40, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38159688

ABSTRACT

Due to the abnormal secretion of adreno-cortico-tropic-hormone (ACTH) by tumors, Cushing's disease leads to hypercortisonemia, a precursor to a series of metabolic disorders and serious complications. After surgical resection, Cushing's disease has a high recurrence rate and a short time to recurrence, and the cause of recurrence remains unknown. Qualitative or quantitative automatic image analysis of histology images can potentially provide insights into Cushing's disease, but to the best of our knowledge no software is yet available. In this study, we propose a quantitative image analysis-based pipeline, CRCS, which aims to explore the relationship between the expression level of ACTH in normal cell tissues adjacent to tumor cells and the postoperative prognosis of patients. CRCS mainly consists of image-level clustering, cluster-level multi-modal image registration, patch-level image classification and pixel-level image segmentation on the whole slide imaging (WSI). On both image registration and classification tasks, CRCS achieves state-of-the-art performance compared to recently published methods on our collected benchmark dataset. In addition, CRCS achieves an accuracy of 0.83 for postoperative prognosis of 12 cases. CRCS demonstrates great potential for instrumenting automatic diagnosis and treatment for Cushing's disease.


Subjects
Pituitary ACTH Hypersecretion , Humans , Pituitary ACTH Hypersecretion/diagnostic imaging , Prognosis , Adrenocorticotropic Hormone
8.
BMC Bioinformatics ; 24(1): 430, 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37957563

ABSTRACT

BACKGROUND: Antibody-mediated immune responses play a crucial role in the immune defense of the human body. Advances in bioengineering have driven progress in antibody-derived drugs, which show promising efficacy in cancer and autoimmune disease therapy. A critical step of this development process is obtaining the affinity between antibodies and their binding antigens. RESULTS: In this study, we introduce a novel sequence-based antigen-antibody affinity prediction method, named DG-Affinity. DG-Affinity uses deep neural networks to efficiently and accurately predict the affinity between antibodies and antigens from sequences, without the need for structural information. The sequences of both the antigen and the antibody are first transformed into embedding vectors by two pre-trained language models, then these embeddings are concatenated into a ConvNeXt framework with a regression task. The results demonstrate the superiority of DG-Affinity over the existing structure-based prediction methods and the sequence-based tools, achieving a Pearson's correlation of over 0.65 on an independent test dataset. CONCLUSIONS: Compared to the baseline methods, DG-Affinity achieves the best performance and can advance the development of antibody design. It is freely available as an easy-to-use web server at https://www.digitalgeneai.tech/solution/affinity .


Subjects
Antibodies , Neural Networks (Computer) , Humans , Antibody Affinity
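
The Pearson's correlation quoted in the abstract above measures how well predicted affinities track measured ones. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `pearson_r` is an illustrative assumption, and the inputs are assumed to be two equal-length lists of numbers.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Hypothetical helper for illustration; not taken from DG-Affinity.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means predictions rise and fall with the measurements; it says nothing about absolute error, so it is usually read alongside an error metric.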
9.
Nat Commun ; 14(1): 7861, 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-38030641

ABSTRACT

Existing drug-target interaction (DTI) prediction methods generally fail to generalize well to novel (unseen) proteins and drugs. In this study, we propose a protein-specific meta-learning framework ZeroBind with subgraph matching for predicting protein-drug interactions from their structures. During the meta-training process, ZeroBind formulates training a protein-specific model, which is also considered a learning task, and each task uses graph neural networks (GNNs) to learn the protein graph embedding and the molecular graph embedding. Inspired by the fact that molecules bind to a binding pocket in proteins instead of the whole protein, ZeroBind introduces a weakly supervised subgraph information bottleneck (SIB) module to recognize the maximally informative and compressive subgraphs in protein graphs as potential binding pockets. In addition, ZeroBind trains the models of individual proteins as multiple tasks, whose importance is automatically learned with a task adaptive self-attention module to make final predictions. The results show that ZeroBind achieves superior performance on DTI prediction over existing methods, especially for those unseen proteins and drugs, and performs well after fine-tuning for those proteins or drugs with a few known binding partners.


Subjects
Data Compression , Learning , Drug Interactions , Neural Networks (Computer)
10.
Bioinformatics ; 39(8), 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37561093

ABSTRACT

MOTIVATION: CircRNAs play a critical regulatory role in physiological processes, and the abnormal expression of circRNAs can mediate the processes of diseases. Therefore, exploring circRNA-disease associations is gradually becoming an important area of research. Due to the high cost of validating circRNA-disease associations using traditional wet-lab experiments, novel computational methods based on machine learning are gaining more and more attention in this field. However, current computational methods suffer from insufficient consideration of latent features in circRNA-disease interactions. RESULTS: In this study, a multilayer attention neural graph-based collaborative filtering (MLNGCF) is proposed. MLNGCF first enhances multiple biological information with an autoencoder as the initial features of circRNAs and diseases. Then, by constructing a central network of different diseases and circRNAs, a multilayer cooperative attention-based message propagation is performed on the central network to obtain the high-order features of circRNAs and diseases. A neural network-based collaborative filtering is constructed to predict the unknown circRNA-disease associations and update the model parameters. Experiments on the benchmark datasets demonstrate that MLNGCF outperforms state-of-the-art methods, and the prediction results are supported by the literature in the case studies. AVAILABILITY AND IMPLEMENTATION: The source codes and benchmark datasets of MLNGCF are available at https://github.com/ABard0/MLNGCF.


Subjects
Neural Networks (Computer) , Circular RNA , Machine Learning , Software , Computational Biology/methods
11.
J Mol Biol ; 435(13): 168091, 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37054909

ABSTRACT

Identifying the interactions between proteins and ligands is significant for drug discovery and design. Considering the diverse binding patterns of ligands, the ligand-specific methods are trained per ligand to predict binding residues. However, most of the existing ligand-specific methods ignore shared binding preferences among various ligands and generally only cover a limited number of ligands with a sufficient number of known binding proteins. In this study, we propose a relation-aware framework LigBind with graph-level pre-training to enhance the ligand-specific binding residue predictions for 1159 ligands, which can effectively cover the ligands with a few known binding proteins. LigBind first pre-trains a graph neural network-based feature extractor for ligand-residue pairs and relation-aware classifiers for similar ligands. Then, LigBind is fine-tuned with ligand-specific binding data, where a domain adaptive neural network is designed to automatically leverage the diversity and similarity of various ligand-binding patterns for accurate binding residue prediction. We construct ligand-specific benchmark datasets of 1159 ligands and 16 unseen ligands, which are used to evaluate the effectiveness of LigBind. The results demonstrate LigBind's efficacy on large-scale ligand-specific benchmark datasets, and it generalizes well to unseen ligands. LigBind also enables accurate identification of the ligand-binding residues in the main protease, papain-like protease and the RNA-dependent RNA polymerase of SARS-CoV-2. The web server and source codes of LigBind are available at http://www.csbio.sjtu.edu.cn/bioinf/LigBind/ and https://github.com/YYingXia/LigBind/ for academic use.


Subjects
Protein Binding , Humans , Binding Sites , Ligands , Neural Networks (Computer) , SARS-CoV-2 , Viral Proteins
12.
Bioinformatics ; 39(4), 2023 Apr 03.
Article in English | MEDLINE | ID: mdl-36961341

ABSTRACT

MOTIVATION: Generating molecules of high quality and drug-likeness in the vast chemical space is a big challenge in drug discovery. Most existing molecule generative methods focus on diversity and novelty of molecules, but ignore the drug potentials of the generated molecules during the generation process. RESULTS: In this study, we present a novel de novo multiobjective quality assessment-based drug design approach (QADD), which integrates an iterative refinement framework with a novel graph-based molecular quality assessment model on drug potentials. QADD designs a multiobjective deep reinforcement learning pipeline to generate molecules with multiple desired properties iteratively, where a graph neural network-based model for accurate molecular quality assessment on drug potentials is introduced to guide molecule generation. Experimental results show that QADD can jointly optimize multiple molecular properties with a promising performance and the quality assessment module is capable of guiding the generated molecules with high drug potentials. Furthermore, applying QADD to generate novel molecules binding to a biological target protein DRD2 also demonstrates the algorithm's efficacy. AVAILABILITY AND IMPLEMENTATION: QADD is freely available online for academic use at https://github.com/yifang000/QADD or http://www.csbio.sjtu.edu.cn/bioinf/QADD.


Subjects
Neural Networks (Computer) , Proteins , Molecular Models , Drug Design
13.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36627113

ABSTRACT

Protein-ligand binding affinity prediction is an important task in structural bioinformatics for drug discovery and design. Although various scoring functions (SFs) have been proposed, it remains challenging to accurately evaluate the binding affinity of a protein-ligand complex with the known bound structure because of the potential preference of scoring system. In recent years, deep learning (DL) techniques have been applied to SFs without sophisticated feature engineering. Nevertheless, existing methods cannot model the differential contribution of atoms in various regions of proteins, and the relationship between atom properties and intermolecular distance is also not fully explored. We propose a novel empirical graph neural network for accurate protein-ligand binding affinity prediction (EGNA). Graphs of protein, ligand and their interactions are constructed based on different regions of each bound complex. Proteins and ligands are effectively represented by graph convolutional layers, enabling the EGNA to capture interaction patterns precisely by simulating empirical SFs. The contributions of different factors on binding affinity can thus be transparently investigated. EGNA is compared with the state-of-the-art machine learning-based SFs on two widely used benchmark data sets. The results demonstrate the superiority of EGNA and its good generalization capability.


Subjects
Neural Networks (Computer) , Proteins , Ligands , Proteins/chemistry , Protein Binding , Algorithms
14.
Article in English | MEDLINE | ID: mdl-35536814

ABSTRACT

N6-methyladenosine (m6A) is a universal post-transcriptional modification of RNAs, and it is widely involved in various biological processes. Identifying m6A modification sites accurately is indispensable to further investigate m6A-mediated biological functions. How to better represent RNA sequences is crucial for building effective computational methods for detecting m6A modification sites. However, traditional encoding methods require complex biological prior knowledge and are time-consuming. Furthermore, most of the existing m6A sites prediction methods are limited to single species, and few methods are able to predict m6A sites across different species and tissues. Thus, it is necessary to design a more efficient computational method to predict m6A sites across multiple species and tissues. In this paper, we proposed ELMo4m6A, a contextual language embedding-based method for predicting m6A sites from RNA sequences without any prior knowledge. ELMo4m6A first learns embeddings of RNA sequences using a language model ELMo, then uses a hybrid convolutional neural network (CNN) and long short-term memory (LSTM) to identify m6A sites. The results of 5-fold cross-validation and independent testing demonstrate that ELMo4m6A is superior to state-of-the-art methods. Moreover, we applied integrated gradients to find potential sequence patterns contributing to m6A sites.


Subjects
Adenosine , RNA , RNA/genetics , Adenosine/genetics , Neural Networks (Computer) , RNA Sequence Analysis/methods
15.
Protein Sci ; 31(12): e4462, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36190332

ABSTRACT

Knowledge of protein-ligand interactions is beneficial for biological process analysis and drug design. Given the complexity of the interactions and the inadequacy of experimental data, accurate ligand binding residue and pocket prediction remains challenging. In this study, we introduce an easy-to-use web server BindWeb for ligand-specific and ligand-general binding residue and pocket prediction from protein structures. BindWeb integrates a graph neural network GraphBind with a hybrid convolutional neural network and bidirectional long short-term memory network DELIA to identify binding residues. Furthermore, BindWeb clusters the predicted binding residues to binding pockets with mean shift clustering. The experimental results and case study demonstrate that BindWeb benefits from the complementarity of two base methods. BindWeb is freely available for academic use at http://www.csbio.sjtu.edu.cn/bioinf/BindWeb/.


Subjects
Neural Networks (Computer) , Proteins , Ligands , Binding Sites , Proteins/chemistry , Cluster Analysis , Protein Binding
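
The BindWeb abstract above describes grouping predicted binding residues into pockets with mean shift clustering. The sketch below illustrates the general technique with a flat (uniform) kernel on 3D coordinates. It is not BindWeb's implementation: the function name `mean_shift`, the `bandwidth` parameter, and the idea of using one coordinate per residue are all assumptions made for illustration.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift over coordinate tuples.

    Returns (labels, centers). Hypothetical helper; not taken from BindWeb.
    """
    modes = []
    for start in points:
        center = tuple(start)
        for _ in range(max_iter):
            # Points inside the window around the current center.
            window = [q for q in points if math.dist(center, q) <= bandwidth]
            if not window:
                break
            # Shift the center to the mean of the points in the window.
            shifted = tuple(sum(axis) / len(window) for axis in zip(*window))
            moved = math.dist(center, shifted)
            center = shifted
            if moved < tol:
                break
        modes.append(center)

    # Merge converged modes that lie within one bandwidth of each other.
    centers, labels = [], []
    for mode in modes:
        for index, existing in enumerate(centers):
            if math.dist(mode, existing) <= bandwidth:
                labels.append(index)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels, centers
```

Mean shift does not need the number of pockets in advance, which suits this task; the bandwidth instead controls how far apart residues can be while still counting as one pocket.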
16.
Front Neurosci ; 16: 841145, 2022.
Article in English | MEDLINE | ID: mdl-35911980

ABSTRACT

Mammalian cortical interneurons (CINs) could be classified into more than two dozen cell types that possess diverse electrophysiological and molecular characteristics, and participate in various essential biological processes in the human neural system. However, the mechanism to generate diversity in CINs remains controversial. This study aims to predict CIN diversity in mouse embryo by using single-cell transcriptomics and the machine learning methods. Data of 2,669 single-cell transcriptome sequencing results are employed. The 2,669 cells are classified into three categories, caudal ganglionic eminence (CGE) cells, dorsal medial ganglionic eminence (dMGE) cells, and ventral medial ganglionic eminence (vMGE) cells, corresponding to the three regions in the mouse subpallium where the cells are collected. Such transcriptomic profiles were first analyzed by the minimum redundancy and maximum relevance method. A feature list was obtained, which was further fed into the incremental feature selection, incorporating two classification algorithms (random forest and repeated incremental pruning to produce error reduction), to extract key genes and construct powerful classifiers and classification rules. The optimal classifier could achieve an MCC of 0.725, and category-specified prediction accuracies of 0.958, 0.760, and 0.737 for the CGE, dMGE, and vMGE cells, respectively. The related genes and rules may provide helpful information for deepening the understanding of CIN diversity.
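
The abstract above reports an MCC and per-category accuracies for a three-class problem (CGE, dMGE, vMGE). As a minimal sketch of how such figures can be computed, the code below uses the common multi-class generalization of the Matthews correlation coefficient; the abstract does not state which MCC variant the authors used, so that choice, like the function names, is an assumption.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multi-class MCC from true and predicted labels (illustrative helper)."""
    total = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)   # how often each class truly occurs
    pred_counts = Counter(y_pred)   # how often each class is predicted
    classes = set(true_counts) | set(pred_counts)
    numerator = correct * total - sum(
        true_counts[k] * pred_counts[k] for k in classes
    )
    denominator = math.sqrt(
        (total ** 2 - sum(v * v for v in pred_counts.values()))
        * (total ** 2 - sum(v * v for v in true_counts.values()))
    )
    # Conventionally 0 when one side contains a single class only.
    return numerator / denominator if denominator else 0.0


def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {k: hits[k] / totals[k] for k in totals}
```

Unlike overall accuracy, MCC stays informative when the classes differ in size, which is why it is often reported together with per-class accuracies.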

17.
Brief Bioinform ; 23(5), 2022 Sep 20.
Article in English | MEDLINE | ID: mdl-35907779

ABSTRACT

Circular RNA (circRNA) is closely involved in physiological and pathological processes of many diseases. Discovering the associations between circRNAs and diseases is of great significance. Due to the high cost of verifying circRNA-disease associations by wet-lab experiments, computational approaches for predicting the associations have become a promising research direction. In this paper, we propose a method, MDGF-MCEC, based on multi-view dual attention graph convolution network (GCN) with cooperative ensemble learning to predict circRNA-disease associations. First, MDGF-MCEC constructs two disease relation graphs and two circRNA relation graphs based on different similarities. Then, the relation graphs are fed into a multi-view GCN for representation learning. In order to learn highly discriminative features, a dual-attention mechanism is introduced to adjust the contribution weights, at both channel level and spatial level, of different features. Based on the learned embedding features of diseases and circRNAs, nine different feature combinations between diseases and circRNAs are treated as new multi-view data. Finally, we construct a multi-view cooperative ensemble classifier to predict the associations between circRNAs and diseases. Experiments conducted on the CircR2Disease database demonstrate that the proposed MDGF-MCEC model achieves a high area under the curve of 0.9744 and outperforms the state-of-the-art methods. Promising results are also obtained from experiments on the circ2Disease and circRNADisease databases. Furthermore, the predicted associated circRNAs for hepatocellular carcinoma and gastric cancer are supported by the literature. The code and dataset of this study are available at https://github.com/ABard0/MDGF-MCEC.


Subjects
Circular RNA , Stomach Neoplasms , Humans , Intercellular Signaling Peptides and Proteins , Machine Learning , Stomach Neoplasms/genetics
18.
Front Bioeng Biotechnol ; 10: 890901, 2022.
Article in English | MEDLINE | ID: mdl-35721855

ABSTRACT

Diabetes is the most common disease and a major threat to human health. Type 2 diabetes (T2D) makes up about 90% of all cases. With the development of high-throughput sequencing technologies, more and more of the fundamental pathogenesis of T2D at genetic and transcriptomic levels has been revealed. Recent single-cell sequencing can further reveal the cellular heterogeneity of complex diseases in an unprecedented way. To explore the molecular basis of T2D across multiple cell types, we investigated the expression profiling of more than 1,600 single cells (949 cells from T2D patients and 651 cells from normal controls) and identified the differential expression profiling and characteristics at the transcriptomics level that can distinguish these two groups of cells at the single-cell level. The expression profile was analyzed by several machine learning algorithms, including Monte Carlo feature selection, support vector machine, and repeated incremental pruning to produce error reduction (RIPPER). On one hand, some T2D-associated genes (MTND4P24, MTND2P28, and LOC100128906) were discovered. On the other hand, we revealed novel potential pathogenic mechanisms in the form of rules. They are induced by newly recognized genes and neglected by traditional bulk sequencing techniques. Particularly, the newly identified T2D genes were shown to follow specific quantitative rules with diabetes prediction potentials, and such rules further indicated several potential functional crosstalks involved in T2D.

19.
Front Genet ; 13: 909040, 2022.
Article in English | MEDLINE | ID: mdl-35651937

ABSTRACT

In current biology, exploring the biological functions of proteins is important. Given the large number of proteins in some organisms, exploring their functions one by one through traditional experiments is impossible. Therefore, developing quick and reliable methods for identifying protein functions is necessary. Considerable accumulation of protein knowledge and recent developments in computer science provide an alternative way to complete this task, that is, designing computational methods. Several efforts have been made in this field. Most previous methods have adopted the protein sequence features or directly used the linkage from a protein-protein interaction (PPI) network. In this study, we proposed some novel multi-label classifiers, which adopted new embedding features to represent proteins. These features were derived from functional domains and a PPI network via word embedding and network embedding, respectively. The minimum redundancy maximum relevance method was used to assess the features, generating a feature list. Incremental feature selection, incorporating RAndom k-labELsets to construct multi-label classifiers, used this list to construct two optimum classifiers, corresponding to two key measurements: accuracy and exact match. These two classifiers had good performance, and they were superior to classifiers that used features extracted by traditional methods.

20.
PLoS Comput Biol ; 18(3): e1009986, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35324898

ABSTRACT

Protein structure alignment algorithms are often time-consuming, resulting in challenges for large-scale protein structure similarity-based retrieval. There is an urgent need for more efficient structure comparison approaches as the number of protein structures increases rapidly. In this paper, we propose an effective graph-based protein structure representation learning method, GraSR, for fast and accurate structure comparison. In GraSR, a graph is constructed based on the intra-residue distance derived from the tertiary structure. Then, deep graph neural networks (GNNs) with a short-cut connection learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, a novel dynamic training data partition strategy and length-scaling cosine distance are introduced. We objectively evaluate our method GraSR on SCOPe v2.07 and a newly released independent test set from the PDB database with a designed comprehensive performance metric. Compared with other state-of-the-art methods, GraSR achieves about 7%-10% improvement on two benchmark datasets. GraSR is also much faster than alignment-based methods. We dig into the model and observe that the superiority of GraSR stems mainly from the learned discriminative residue-level and global descriptors. The web-server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use.


Subjects
Neural Networks (Computer) , Proteins , Algorithms , Learning , Software
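
The GraSR abstract above replaces pairwise structure alignment with a comparison of learned descriptors. As a rough sketch of why that is fast, the code below ranks a small database of fixed-length vectors by plain cosine distance to a query. The abstract does not define its length-scaling term, so it is omitted here; the function names and the dictionary-based database are illustrative assumptions rather than GraSR's API.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors (illustrative)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_similarity(query, database):
    """Return database keys ordered from most to least similar to the query.

    `database` maps a structure identifier to its descriptor vector.
    """
    return sorted(database, key=lambda name: cosine_distance(query, database[name]))
```

Each comparison costs time proportional to the descriptor length rather than to the product of two chain lengths, which is consistent with the speed advantage over alignment-based methods that the abstract describes.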