Results 1 - 20 of 161
1.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38483285

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction refers to predicting whether a given drug molecule will bind to a specific target and thus exert a targeted therapeutic effect. Although intelligent computational approaches to drug-target prediction have received much attention and made many advances, the task remains challenging and requires further research. The main challenges are as follows: (i) most graph neural network-based methods consider only the information of first-order neighboring nodes (drugs and targets) in the graph, without learning deeper and richer structural features from higher-order neighboring nodes; (ii) existing methods do not consider both the sequence and the structural features of drugs and targets; each method works independently of the others and cannot combine the advantages of sequence and structural features to improve interaction learning. RESULTS: To address these challenges, this study proposes a Multi-view Integrated learning Network that integrates Deep learning and Graph Learning (MINDG). It consists of the following parts: (i) a mixed deep network that extracts sequence features of drugs and targets, (ii) a higher-order graph attention convolutional network that better extracts and captures structural features, and (iii) a multi-view adaptive integrated decision module that refines and complements the initial predictions of the two networks to enhance prediction performance. We evaluate MINDG on two datasets and show that it improves DTI prediction performance compared with state-of-the-art baselines. AVAILABILITY AND IMPLEMENTATION: https://github.com/jnuaipr/MINDG.
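To make the first-order versus higher-order distinction in the MINDG motivation concrete, the sketch below groups the neighbors of a node in a toy drug-target graph by hop distance using breadth-first search. The graph, node names, and the helpers `build_adjacency` and `k_hop_neighbors` are hypothetical illustrations; this is not MINDG's higher-order graph attention convolution, only the neighborhood notion it builds on.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_hop_neighbors(adj, source, k):
    """Group nodes reachable from `source` by shortest-path distance 1..k (breadth-first search)."""
    hops = {}
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond the k-th hop
        for nbr in adj.get(node, set()):
            if nbr not in seen:
                seen.add(nbr)
                hops.setdefault(dist + 1, set()).add(nbr)
                frontier.append((nbr, dist + 1))
    return hops


# Toy drug-target graph (hypothetical): D1-T1, T1-D2, D2-T2.
graph = build_adjacency([("D1", "T1"), ("T1", "D2"), ("D2", "T2")])
print(k_hop_neighbors(graph, "D1", 1))  # first-order neighbors only
print(k_hop_neighbors(graph, "D1", 3))  # first- to third-order neighbors
```

From D1, a first-order view sees only T1; the three-hop view also reaches D2 (a drug sharing target T1) and T2, which is the kind of structural context the abstract says first-order methods miss.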


Subjects
Algorithms; Neural Networks, Computer
2.
Structure ; 32(5): 611-620.e4, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38447575

ABSTRACT

Identifying compounds that bind a target protein is crucial for large-scale virtual screening in drug development. Recently, network-based methods have been developed for compound-protein interaction (CPI) prediction. However, they are difficult to apply to unseen (i.e., never-seen-before) proteins and compounds. In this study, we propose SgCPI, which incorporates local known interaction networks to predict CPIs. SgCPI randomly samples the local CPI network of the query compound-protein pair as a subgraph and applies a heterogeneous graph neural network (HGNN) to embed the active/inactive messages of the subgraph. For unseen compounds and proteins, SgCPI-KD uses SgCPI as the teacher model and distills its knowledge by estimating the potential neighbors. Experimental results indicate that (1) the sampled subgraphs of the CPI network provide effective knowledge for predicting unseen molecules with HGNNs, and (2) the knowledge distillation strategy benefits double-blind interaction prediction by estimating molecular neighbors and distilling knowledge.
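The abstract does not specify SgCPI-KD's distillation objective. As a point of reference only, the sketch below implements the widely used temperature-scaled distillation loss (KL divergence between softened teacher and student distributions, scaled by T²) for a two-class inactive/active output. The function names, the temperature of 2.0, and the two-logit output layout are assumptions, not details from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


# Hypothetical logits for [inactive, active] on one unseen compound-protein pair.
teacher = [-1.0, 2.0]
print(distillation_loss([-1.0, 2.0], teacher))  # student matches teacher: no penalty
print(distillation_loss([1.0, -0.5], teacher))  # student disagrees: positive loss
```

A higher temperature flattens both distributions so the student also learns from the teacher's relative confidence, not just its top class; the T² factor keeps gradient magnitudes comparable across temperatures.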


Subjects
Neural Networks, Computer; Proteins; Proteins/chemistry; Proteins/metabolism; Protein Binding; Humans
3.
Curr Opin Struct Biol ; 86: 102793, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38447285

ABSTRACT

Protein-ligand binding site prediction is critical for protein function annotation and drug discovery. Biological experiments are time-consuming and require significant equipment, materials, and labor. Developing accurate and efficient computational methods for protein-ligand interaction prediction is therefore essential. Here, we summarize the key challenges associated with ligand binding site (LBS) prediction and introduce recently published methods in terms of their input features, computational algorithms, and ligand types. Furthermore, we examine what is specific to allosteric site identification as a particular LBS type. Finally, we discuss prospective directions for machine learning-based LBS prediction in the near future.


Subjects
Protein Binding; Proteins; Ligands; Binding Sites; Proteins/chemistry; Proteins/metabolism; Computational Biology/methods; Machine Learning; Algorithms; Allosteric Site; Humans
4.
Comput Biol Med ; 171: 108175, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38402841

ABSTRACT

Circular RNAs (circRNAs), a class of endogenous RNAs with a covalently closed loop structure, can regulate gene expression by serving as sponges for microRNAs and RNA-binding proteins (RBPs). To date, most computational methods for predicting RBP binding sites on circRNAs focus on circRNA fragments rather than whole circRNA transcripts. These methods detect whether a circRNA fragment contains binding sites, but cannot determine where the binding sites are or how many binding sites the circRNA transcript contains. We report a hybrid deep learning-based tool, CircSite, that predicts RBP binding sites at single-nucleotide resolution and detects key contributing nucleotides on circRNA transcripts. CircSite combines convolutional neural networks (CNNs) and a Transformer to learn local and global representations, respectively, of circRNAs binding to RBPs. We construct 37 datasets of circRNAs interacting with proteins for benchmarking, and the experimental results show that CircSite accurately predicts RBP binding nucleotides and detects key subsequences that align well with known binding motifs. CircSite is an easy-to-use online web server for predicting RBP binding sites on circRNA transcripts and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/CircSite/.


Subjects
MicroRNAs; RNA, Circular; RNA, Circular/genetics; Protein Binding; Binding Sites; MicroRNAs/metabolism; RNA-Binding Proteins/genetics; RNA-Binding Proteins/chemistry; RNA-Binding Proteins/metabolism; Nucleotides/metabolism
5.
Nat Commun ; 15(1): 1387, 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38360714

ABSTRACT

RNA velocity is closely related to cell fate and, with its elegant physical interpretation, is an important indicator for predicting cell states from single-cell RNA-seq data. Most existing RNA velocity models aim to extract dynamics from the phase delay between unspliced and spliced mRNA for each individual gene. However, unspliced/spliced mRNA abundance may not provide sufficient signal for dynamic modeling, leading to poor fits in phase portraits. Motivated by the idea that RNA velocity could be driven by transcriptional regulation, we propose TFvelo, which extends the RNA velocity concept to various single-cell datasets without relying on splicing information by introducing gene regulatory information. Our experiments on synthetic data and multiple scRNA-seq datasets show that TFvelo can accurately fit gene dynamics on phase portraits and effectively infer cell pseudo-time and trajectories from RNA abundance data. TFvelo opens a robust and accurate avenue for modeling RNA velocity in single-cell data.


Subjects
RNA Splicing; RNA; RNA/genetics; RNA Splicing/genetics; RNA, Messenger/genetics; Sequence Analysis, RNA; Single-Cell Analysis; Gene Expression Profiling
6.
J Struct Biol ; 216(1): 108059, 2024 03.
Article in English | MEDLINE | ID: mdl-38160703

ABSTRACT

Cryogenic electron microscopy (cryo-EM) maps are valuable for determining macromolecular structures. A proper quality assessment method is essential for cryo-EM map selection or revision. This article presents DeepQs, a novel approach that estimates the local quality of 3D cryo-EM density maps using a deep-learning algorithm based on a map-model fit score. DeepQs requires no parameters from users and incorporates structural information relating the map and its associated atomic model into well-trained deep learning models. More specifically, DeepQs leverages the interplay between the map and the atomic model through a predefined map-model fit score, the Q-score. With only the cryo-EM map as input, DeepQs produces results close to the ground-truth map-model fit scores. In experiments on a high-resolution dataset (≤5 Å), DeepQs achieves the lowest root mean square error with respect to the standard Fourier shell correlation metric and a high correlation with the Q-score, compared with other local quality estimation methods. DeepQs can also be applied to evaluate the quality of post-processed maps. In both cases, DeepQs runs faster with GPU acceleration. Our program is available at http://www.csbio.sjtu.edu.cn/bioinf/DeepQs for academic use.
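The DeepQs evaluation rests on two standard agreement measures: root mean square error and correlation between predicted and reference local scores. A minimal sketch of both follows, assuming the predictions and references are already paired per location (for example per residue or per voxel); the helper names and the score values in the example are made up for illustration.

```python
import math


def rmse(pred, ref):
    """Root mean square error between paired predictions and references."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up local quality scores for five locations.
predicted = [0.62, 0.55, 0.71, 0.40, 0.80]
reference = [0.60, 0.50, 0.75, 0.45, 0.78]
print(f"RMSE = {rmse(predicted, reference):.3f}, r = {pearson(predicted, reference):.3f}")
```

RMSE rewards getting the absolute values right, while the correlation only rewards getting the ranking and trend right, which is why quality-estimation papers typically report both.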


Subjects
Deep Learning; Cryoelectron Microscopy/methods; Models, Molecular; Microscopy, Electron; Algorithms; Protein Conformation
7.
Nat Commun ; 14(1): 7861, 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-38030641

ABSTRACT

Existing drug-target interaction (DTI) prediction methods generally fail to generalize well to novel (unseen) proteins and drugs. In this study, we propose ZeroBind, a protein-specific meta-learning framework with subgraph matching for predicting protein-drug interactions from their structures. During meta-training, ZeroBind treats the training of each protein-specific model as a learning task, and each task uses graph neural networks (GNNs) to learn the protein graph embedding and the molecular graph embedding. Inspired by the fact that molecules bind to a binding pocket rather than to the whole protein, ZeroBind introduces a weakly supervised subgraph information bottleneck (SIB) module to recognize the maximally informative and compressive subgraphs in protein graphs as potential binding pockets. In addition, ZeroBind trains the models of individual proteins as multiple tasks, whose importance is learned automatically by a task-adaptive self-attention module to make final predictions. The results show that ZeroBind outperforms existing methods on DTI prediction, especially for unseen proteins and drugs, and performs well after fine-tuning for proteins or drugs with only a few known binding partners.


Subjects
Data Compression; Learning; Drug Interactions; Neural Networks, Computer
8.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37561093

ABSTRACT

MOTIVATION: CircRNAs play a critical regulatory role in physiological processes, and their abnormal expression can mediate disease processes. Exploring circRNA-disease associations is therefore becoming an important area of research. Because validating circRNA-disease associations with traditional wet-lab experiments is costly, novel computational methods based on machine learning are attracting increasing attention in this field. However, current computational methods do not sufficiently consider latent features in circRNA-disease interactions. RESULTS: In this study, we propose a multilayer attention neural graph-based collaborative filtering method (MLNGCF). MLNGCF first enhances multiple types of biological information with an autoencoder to obtain the initial features of circRNAs and diseases. Then, after constructing a central network of different diseases and circRNAs, it performs multilayer cooperative attention-based message propagation on the central network to obtain high-order features of circRNAs and diseases. A neural network-based collaborative filtering model is constructed to predict unknown circRNA-disease associations and update the model parameters. Experiments on the benchmark datasets demonstrate that MLNGCF outperforms state-of-the-art methods, and the prediction results in the case studies are supported by the literature. AVAILABILITY AND IMPLEMENTATION: The source code and benchmark datasets of MLNGCF are available at https://github.com/ABard0/MLNGCF.


Subjects
Neural Networks, Computer; RNA, Circular; Machine Learning; Software; Computational Biology/methods
9.
J Mol Biol ; 435(13): 168091, 2023 07 01.
Article in English | MEDLINE | ID: mdl-37054909

ABSTRACT

Identifying the interactions between proteins and ligands is important for drug discovery and design. Given the diverse binding patterns of ligands, ligand-specific methods are trained per ligand to predict binding residues. However, most existing ligand-specific methods ignore binding preferences shared among ligands and generally cover only a limited number of ligands with a sufficient number of known binding proteins. In this study, we propose LigBind, a relation-aware framework with graph-level pre-training that enhances ligand-specific binding residue prediction for 1159 ligands and effectively covers ligands with few known binding proteins. LigBind first pre-trains a graph neural network-based feature extractor for ligand-residue pairs and relation-aware classifiers for similar ligands. LigBind is then fine-tuned with ligand-specific binding data, where a domain-adaptive neural network is designed to automatically leverage the diversity and similarity of ligand-binding patterns for accurate binding residue prediction. We construct ligand-specific benchmark datasets of 1159 ligands and 16 unseen ligands to evaluate the effectiveness of LigBind. The results demonstrate LigBind's efficacy on large-scale ligand-specific benchmark datasets and show that it generalizes well to unseen ligands. LigBind also accurately identifies ligand-binding residues in the main protease, papain-like protease, and RNA-dependent RNA polymerase of SARS-CoV-2. The web server and source code of LigBind are available at http://www.csbio.sjtu.edu.cn/bioinf/LigBind/ and https://github.com/YYingXia/LigBind/ for academic use.


Subjects
Protein Binding; Humans; Binding Sites; Ligands; Neural Networks, Computer; SARS-CoV-2; Viral Proteins
10.
Front Microbiol ; 14: 1111472, 2023.
Article in English | MEDLINE | ID: mdl-36992937

ABSTRACT

Halotolerant microorganisms have developed versatile mechanisms for coping with saline stress. As more halotolerant strains are isolated and their genomes sequenced, comparative genome analysis can help elucidate the mechanisms of salt tolerance. Six type strains of Pontixanthobacter and Allopontixanthobacter, two phylogenetically close genera, were isolated from diverse salty environments and showed different NaCl tolerances, ranging from 3 to 10% (w/v). Based on co-occurrence values greater than 0.8 between halotolerance and open reading frames (ORFs) among the six strains, possible explanations for halotolerance were discussed in terms of osmolytes, membrane permeability, transport, intracellular signaling, polysaccharide biosynthesis, and the SOS response, providing hypotheses for further investigation. The strategy of analyzing genome-wide co-occurrence between genetic diversity and physiological characteristics sheds light on how microorganisms adapt to their environment.

11.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36961341

ABSTRACT

MOTIVATION: Generating high-quality, drug-like molecules in the vast chemical space is a major challenge in drug discovery. Most existing molecular generative methods focus on the diversity and novelty of molecules but ignore the drug potential of the generated molecules during generation. RESULTS: In this study, we present QADD, a novel de novo multiobjective quality assessment-based drug design approach that integrates an iterative refinement framework with a novel graph-based model that assesses molecular quality in terms of drug potential. QADD uses a multiobjective deep reinforcement learning pipeline to iteratively generate molecules with multiple desired properties, in which a graph neural network-based model for accurate assessment of molecular drug potential guides molecule generation. Experimental results show that QADD can jointly optimize multiple molecular properties with promising performance, and that the quality assessment module can steer generation toward molecules with high drug potential. Furthermore, applying QADD to generate novel molecules that bind the biological target protein DRD2 also demonstrates the algorithm's efficacy. AVAILABILITY AND IMPLEMENTATION: QADD is freely available online for academic use at https://github.com/yifang000/QADD or http://www.csbio.sjtu.edu.cn/bioinf/QADD.


Subjects
Neural Networks, Computer; Proteins; Models, Molecular; Drug Design
12.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36736352

ABSTRACT

Deep learning has greatly improved protein tertiary structure prediction. It is important but very challenging to accurately rank and score decoy structures predicted by different models. CASP14 results show that existing quality assessment (QA) approaches lag behind the development of protein structure prediction methods: almost all existing QA models lose accuracy when the target is a high-quality decoy. Now that accurate structure prediction methods are available, accurately assessing high-accuracy decoys is particularly useful. Here we propose QATEN, a fast and effective single-model QA method that evaluates decoys using only their topological characteristics and atom types. Our model uses graph neural networks and attention mechanisms to estimate global and amino acid-level scores, and uses specific loss functions that constrain the network to focus more on high-precision decoys and protein domains. On the CASP14 evaluation decoys, QATEN outperforms other QA models on all correlation coefficients when the target is the average LDDT. QATEN also shows promising performance when only high-accuracy decoys are considered. Compared with the embedded evaluation modules, predicted $C_{\alpha}$-RMSD (pRMSD) in RoseTTAFold and predicted LDDT (pLDDT) in AlphaFold2, QATEN is complementary and achieves better evaluation on some decoy structures generated by AlphaFold2 and RoseTTAFold. These results suggest that QATEN can be used as a reliable, independent assessment algorithm for high-accuracy protein structure decoys.
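QATEN is benchmarked against LDDT, a superposition-free score that measures how well a model preserves the local inter-residue distances of a reference structure. The sketch below is a simplified, global, Cα-only variant under stated assumptions: a 15 Å inclusion radius and tolerance thresholds of 0.5, 1, 2, and 4 Å, with a pair counted as preserved when its distance difference is below a threshold. The official lDDT uses all atoms, reports per-residue scores, and applies stereochemistry checks, so the hypothetical function `ca_lddt` is illustrative only.

```python
import math
from itertools import combinations


def ca_lddt(model, reference, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT over C-alpha pairs (coordinates are (x, y, z) tuples in Å)."""
    if len(model) != len(reference):
        raise ValueError("model and reference must contain the same residues")
    diffs = []
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref < radius:  # only pairs that are local in the reference are scored
            diffs.append(abs(math.dist(model[i], model[j]) - d_ref))
    if not diffs:
        return 0.0
    # Fraction of preserved distances at each tolerance, then the mean over tolerances.
    fractions = [sum(d < t for d in diffs) / len(diffs) for t in thresholds]
    return sum(fractions) / len(thresholds)


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]  # last residue displaced by 3 Å
print(ca_lddt(reference, reference))  # a structure scored against itself
print(ca_lddt(model, reference))
```

In the toy model, two of the three pair distances are off by 3 Å, so they count as preserved only at the 4 Å tolerance; averaging over the four tolerances yields 0.5.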


Subjects
Neural Networks, Computer; Proteins; Proteins/chemistry; Algorithms; Amino Acids; Protein Domains; Protein Conformation; Computational Biology/methods
13.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36627113

ABSTRACT

Protein-ligand binding affinity prediction is an important task in structural bioinformatics for drug discovery and design. Although various scoring functions (SFs) have been proposed, accurately evaluating the binding affinity of a protein-ligand complex with a known bound structure remains challenging because of the potential biases of scoring systems. In recent years, deep learning (DL) techniques have been applied to SFs without sophisticated feature engineering. Nevertheless, existing methods cannot model the differential contributions of atoms in different regions of proteins, and the relationship between atom properties and intermolecular distance has not been fully explored. We propose EGNA, a novel empirical graph neural network for accurate protein-ligand binding affinity prediction. Graphs of the protein, the ligand, and their interactions are constructed based on different regions of each bound complex. Proteins and ligands are effectively represented by graph convolutional layers, enabling EGNA to capture interaction patterns precisely by simulating empirical SFs. The contributions of different factors to binding affinity can thus be investigated transparently. EGNA is compared with state-of-the-art machine learning-based SFs on two widely used benchmark data sets. The results demonstrate the superiority of EGNA and its good generalization capability.
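The abstract says EGNA builds separate graphs for the protein, the ligand, and their interactions, but does not state how interaction edges are defined. One common convention, assumed here, is to connect protein and ligand atoms that lie within a distance cutoff and keep the distance as an edge feature. The 5 Å cutoff, the coordinate lists, and the helper `interaction_edges` are hypothetical.

```python
import math


def interaction_edges(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return (protein_index, ligand_index, distance) for atom pairs within `cutoff` Å."""
    edges = []
    for i, p in enumerate(protein_atoms):
        for j, lig in enumerate(ligand_atoms):
            d = math.dist(p, lig)
            if d <= cutoff:
                edges.append((i, j, d))
    return edges


# Hypothetical heavy-atom coordinates (Å).
protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand = [(3.0, 4.0, 0.0)]
print(interaction_edges(protein, ligand))  # only the first protein atom is close enough
```

The double loop is quadratic in the number of atoms; for whole complexes, a spatial index such as `scipy.spatial.cKDTree` is the usual way to restrict the search to nearby pairs.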


Subjects
Neural Networks, Computer; Proteins; Ligands; Proteins/chemistry; Protein Binding; Algorithms
14.
J Struct Biol ; 215(1): 107940, 2023 03.
Article in English | MEDLINE | ID: mdl-36709787

ABSTRACT

Cryo-electron microscopy (cryo-EM) single-particle analysis is a revolutionary imaging technique for resolving and visualizing biomacromolecules. Image alignment is an important, basic step in cryo-EM that improves the precision of image distance calculation. However, it is very challenging because of the high noise level (low signal-to-noise ratio) of the images. We therefore propose a new deep unsupervised difference learning (UDL) strategy with a novel pseudo-label-guided learning network architecture and apply it to pairwise image alignment in cryo-EM. The training framework is fully unsupervised. Furthermore, we propose a variant of UDL called joint UDL (JUDL), which can utilize the similarity information of the whole dataset and thus further increases alignment precision. Assessments on both real-world and synthetic cryo-EM single-particle image datasets suggest that the new unsupervised joint alignment method achieves more accurate alignment results. Our method is highly efficient because it takes advantage of GPU devices. The source code of our methods is publicly available at "http://www.csbio.sjtu.edu.cn/bioinf/JointUDL/" for academic use.


Subjects
Single Molecule Imaging; Software; Cryoelectron Microscopy/methods; Signal-To-Noise Ratio; Image Processing, Computer-Assisted/methods
15.
Protein Sci ; 31(12): e4462, 2022 12.
Article in English | MEDLINE | ID: mdl-36190332

ABSTRACT

Knowledge of protein-ligand interactions is beneficial for biological process analysis and drug design. Given the complexity of the interactions and the scarcity of experimental data, accurate prediction of ligand binding residues and pockets remains challenging. In this study, we introduce BindWeb, an easy-to-use web server for ligand-specific and ligand-general binding residue and pocket prediction from protein structures. BindWeb integrates the graph neural network GraphBind with DELIA, a hybrid of a convolutional neural network and a bidirectional long short-term memory network, to identify binding residues. BindWeb then clusters the predicted binding residues into binding pockets using mean shift clustering. The experimental results and a case study demonstrate that BindWeb benefits from the complementarity of the two base methods. BindWeb is freely available for academic use at http://www.csbio.sjtu.edu.cn/bioinf/BindWeb/.
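BindWeb groups predicted binding residues into pockets with mean shift clustering. The sketch below shows flat-kernel mean shift on 3D coordinates, assuming each predicted binding residue is represented by a single point (for example its Cα atom) and using a hypothetical bandwidth; the abstract does not state which coordinates or bandwidth BindWeb uses. In practice, `sklearn.cluster.MeanShift` offers a tested implementation with a `bandwidth` parameter.

```python
import math


def _mean(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def mean_shift(points, bandwidth, tol=1e-4, max_iter=300):
    """Flat-kernel mean shift: move each point to the mean of the data within `bandwidth`
    until it stops moving, then merge modes closer than bandwidth / 2 into one cluster."""
    modes = []
    for start in points:
        x = start
        for _ in range(max_iter):
            window = [p for p in points if math.dist(p, x) <= bandwidth]
            new_x = _mean(window)
            shift = math.dist(new_x, x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)
    centres, labels = [], []
    for m in modes:
        for idx, c in enumerate(centres):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:  # no existing centre is close: this mode starts a new cluster
            centres.append(m)
            labels.append(len(centres) - 1)
    return centres, labels


# Hypothetical Cα coordinates (Å) of six predicted binding residues.
residues = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
            (10.0, 0.0, 0.0), (11.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
centres, labels = mean_shift(residues, bandwidth=2.0)
print(labels)  # one pocket label per residue
```

Unlike k-means, mean shift does not need the number of pockets in advance; it only needs a bandwidth, which sets how far apart two residue groups must be to count as separate pockets.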


Subjects
Neural Networks, Computer; Proteins; Ligands; Binding Sites; Proteins/chemistry; Cluster Analysis; Protein Binding
16.
Bioinformatics ; 38(21): 4941-4948, 2022 10 31.
Article in English | MEDLINE | ID: mdl-36111875

ABSTRACT

MOTIVATION: Recognizing protein subcellular distribution patterns and identifying location biomarker proteins in cancer tissues are important for understanding protein functions and related diseases. Immunohistochemical (IHC) images visualize the distribution of proteins at the tissue level and are an important resource for protein localization studies. Over the past decades, several image-based protein subcellular location prediction methods have been developed, but their prediction accuracy still has much room for improvement because of the complexity of protein patterns arising from multi-label proteins and the variation of location patterns across cell types or states. RESULTS: Here, we propose GraphLoc, a multi-label multi-instance model based on deep graph convolutional neural networks, to recognize protein subcellular location patterns. GraphLoc builds a graph of multiple IHC images for one protein, learns protein-level representations by graph convolutions, and predicts multi-label information with a dynamic threshold method. Our results show that GraphLoc is a promising, interpretable model for image-based protein subcellular location prediction. Furthermore, we apply GraphLoc to identify candidate location biomarkers and potential members of protein networks. A large portion of the predictions are supported by evidence from the existing literature, and the new candidates also provide guidance for further experimental screening. AVAILABILITY AND IMPLEMENTATION: The dataset and code are available at: www.csbio.sjtu.edu.cn/bioinf/GraphLoc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms; Neural Networks, Computer; Humans; Immunohistochemistry; Protein Transport; Proteins
17.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35907779

ABSTRACT

Circular RNA (circRNA) is closely involved in the physiological and pathological processes of many diseases. Discovering the associations between circRNAs and diseases is of great significance. Because verifying circRNA-disease associations by wet-lab experiments is costly, computational approaches for predicting these associations have become a promising research direction. In this paper, we propose MDGF-MCEC, a method based on a multi-view dual-attention graph convolutional network (GCN) with cooperative ensemble learning, to predict circRNA-disease associations. First, MDGF-MCEC constructs two disease relation graphs and two circRNA relation graphs based on different similarities. The relation graphs are then fed into a multi-view GCN for representation learning. To learn highly discriminative features, a dual-attention mechanism is introduced to adjust the contribution weights of different features at both the channel level and the spatial level. Based on the learned embedding features of diseases and circRNAs, nine different feature combinations between diseases and circRNAs are treated as new multi-view data. Finally, we construct a multi-view cooperative ensemble classifier to predict the associations between circRNAs and diseases. Experiments on the CircR2Disease database demonstrate that the proposed MDGF-MCEC model achieves a high area under the curve (AUC) of 0.9744 and outperforms state-of-the-art methods. Promising results are also obtained in experiments on the circ2Disease and circRNADisease databases. Furthermore, the circRNAs predicted to be associated with hepatocellular carcinoma and gastric cancer are supported by the literature. The code and dataset of this study are available at https://github.com/ABard0/MDGF-MCEC.
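MDGF-MCEC is evaluated by the area under the ROC curve. As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen known association is scored above a randomly chosen non-association (the Mann-Whitney formulation, counting ties as one half). The helper name, labels, and scores are invented and unrelated to the reported 0.9744.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented circRNA-disease pairs: 1 = known association, 0 = no known association.
labels = [1, 1, 0, 0]
scores = [0.9, 0.2, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is O(P·N); rank-based implementations such as `sklearn.metrics.roc_auc_score` compute the same quantity more efficiently.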


Subjects
RNA, Circular; Stomach Neoplasms; Humans; Intercellular Signaling Peptides and Proteins; Machine Learning; Stomach Neoplasms/genetics
18.
PLoS Comput Biol ; 18(3): e1009986, 2022 03.
Article in English | MEDLINE | ID: mdl-35324898

ABSTRACT

Protein structure alignment algorithms are often time-consuming, which makes large-scale retrieval based on protein structure similarity challenging. As the number of protein structures increases rapidly, there is an urgent need for more efficient structure comparison approaches. In this paper, we propose GraSR, an effective graph-based protein structure representation learning method for fast and accurate structure comparison. In GraSR, a graph is constructed based on the intra-residue distance derived from the tertiary structure. Deep graph neural networks (GNNs) with a shortcut connection then learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, a novel dynamic training data partition strategy and a length-scaling cosine distance are introduced. We objectively evaluate GraSR on SCOPe v2.07 and on a newly released independent test set from the PDB, using a comprehensive performance metric that we designed. Compared with other state-of-the-art methods, GraSR achieves an improvement of about 7%-10% on the two benchmark datasets. GraSR is also much faster than alignment-based methods. Analyzing the model, we observe that GraSR's superiority comes mainly from the learned discriminative residue-level and global descriptors. The web server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use.
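GraSR compares learned structure descriptors with a length-scaling cosine distance whose exact form the abstract does not give. The sketch below uses plain cosine distance only, to show how descriptor-based retrieval reduces to ranking vectors; it does not reproduce GraSR's length scaling, and the descriptor vectors and structure names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def rank_database(query, database):
    """Return database keys ordered from most to least similar to `query`."""
    return sorted(database, key=lambda key: cosine_distance(query, database[key]))


# Hypothetical 3-dimensional global descriptors; real learned descriptors are much longer.
database = {"struct_a": [1.0, 0.0, 0.2], "struct_b": [0.0, 1.0, 0.0], "struct_c": [0.7, 0.7, 0.0]}
print(rank_database([1.0, 0.1, 0.1], database))
```

Because each structure is encoded as a fixed-length vector once, a query costs one vector comparison per database entry instead of one structural alignment, which is where the speed advantage over alignment-based methods comes from.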


Subjects
Neural Networks, Computer; Proteins; Algorithms; Learning; Software
19.
Int J Comput Assist Radiol Surg ; 17(7): 1303-1311, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35290645

ABSTRACT

PURPOSE: Computed tomography (CT) images display patients' internal organs and are particularly suitable for preoperative surgical diagnosis. The increasing demand for computer-aided systems in recent years has driven the development of many automated algorithms, especially deep convolutional neural networks, to segment organs and tumors or identify diseases from CT images. However, the performance of some systems is strongly affected by the amount of training data, while medical image data sets, especially three-dimensional (3D) data sets, are usually small. This limits the application of deep learning. METHODS: In this study, given a practical clinical data set of 3D CT images from 20 patients with renal carcinoma, we designed a pipeline that employs transfer learning to alleviate the detrimental effect of the small sample size. A dual-channel fine segmentation network (FS-Net) was constructed to segment kidney and tumor regions, using 210 publicly available 3D images from a competition during the training phase. We also built discriminative classifiers to distinguish benign from malignant tumors based on the segmented regions, testing both handcrafted and deep features. RESULTS: The Dice values of the segmented kidney and tumor regions were 0.9662 and 0.7685, respectively, which were better than those of state-of-the-art methods. The classification model using radiomics features correctly classified most of the tumors. CONCLUSIONS: FS-Net was more effective than simply fine-tuning on the small practical data set, because the model can borrow knowledge from large auxiliary data without diluting the signal in the primary data. For the small data set, radiomics features outperformed deep features in classifying benign and malignant tumors. This work highlights the importance of architecture design in transfer learning, and the proposed pipeline is expected to provide a reference and inspiration for small data analysis.
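FS-Net's kidney and tumor segmentations are scored with the Dice coefficient (0.9662 and 0.7685). A minimal sketch of that metric on flattened binary masks follows; the toy masks are invented, the helper name is illustrative, and returning 1.0 when both masks are empty is an assumed convention, since implementations differ on that edge case.

```python
def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 8-voxel masks (1 = tumor); real CT volumes are flattened the same way.
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
ground_truth = [0, 1, 1, 0, 0, 0, 1, 1]
print(dice(predicted, ground_truth))  # 0.75
```

Dice ignores true-negative background voxels, which is why it is preferred over plain voxel accuracy for small structures such as tumors, where background dominates the volume.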


Subjects
Kidney Neoplasms; Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Kidney Neoplasms/diagnostic imaging; Machine Learning; Tomography, X-Ray Computed/methods
20.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35152277

ABSTRACT

With the rapid progress of deep learning in cryo-electron microscopy and protein structure prediction, robust methods that improve the accuracy of protein structure models using density maps and deep-learning-predicted contact/distance maps have become an urgent need. Designing an effective protein structure optimization strategy based on the density map and the predicted contact/distance map is therefore critical for improving the accuracy of structure refinement. In this article, we propose a protein structure optimization method based on the density map and the contact/distance map predicted by deep learning, guided by how well the density map matches the initial model. Physics- and knowledge-based energy functions, integrated with cryo-EM density map data and deep-learning data, were used to optimize the protein structure in the simulation. A dynamic confidence score was introduced into the iterative process to choose whether the density map or the contact/distance map dominates the movement in the simulation, improving refinement accuracy. The protocol was tested on a large set of 224 non-homologous membrane proteins and generated 214 structural models with correct folds, 4.5% of which were generated from structural models with incorrect folds. Compared with other state-of-the-art methods, the main advantage of the proposed method lies in how it uses the density map and the contact/distance map in the simulation, as well as the new energy function in the reassembly simulations. Overall, the results demonstrate that this strategy is a valuable, ready-to-use approach for atomic-level structure refinement using cryo-EM density maps and predicted contact/distance maps.


Subjects
Deep Learning; Cryoelectron Microscopy/methods; Membrane Proteins; Models, Molecular; Protein Conformation