1.
BMC Bioinformatics ; 24(1): 456, 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38053020

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) are crucial to many biological functions and cellular processes, and many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in a sequence. Many feature extraction methods rely on the sliding window technique, which simply merges the features of all residues in the window into one vector. The importance of some key residues may be weakened in that vector, leading to poor performance. RESULTS: We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature processing paths. For each residue, PPINet extracts the features of the target residue and of its context separately. These two types of features are processed by two paths in the network and combined to form a protein representation in which they carry roughly equal weight. Model ensembling is applied to make use of more features: the base models are trained with different features and then ensembled via stacking. In addition, a data balancing strategy is presented that significantly improves the model on highly unbalanced data. CONCLUSION: The proposed method is evaluated on a fused dataset constructed from Dset_186, Dset_72, and PDBset_164, as well as on the public Dset_448 dataset. It outperforms current state-of-the-art methods; on the most important metrics, AUPRC and recall, it surpasses the second-best method on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrate that the improvement is essentially due to the ensemble model and, in particular, the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet .
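The separation of target-residue features from context features described in the abstract can be illustrated with a minimal sketch. This is not the PPINet code: the function name `split_target_and_context`, the zero-padding at sequence ends, and the toy feature values are all assumptions made for illustration.

```python
from typing import List, Tuple


def split_target_and_context(
    features: List[List[float]], index: int, half_window: int
) -> Tuple[List[float], List[float]]:
    """Return (target_features, context_features) for the residue at `index`.

    The context is the concatenation of the neighbours within `half_window`
    positions on each side. Positions beyond the sequence ends are
    zero-padded (an assumption of this sketch).
    """
    dim = len(features[0])
    target = list(features[index])
    context: List[float] = []
    for pos in range(index - half_window, index + half_window + 1):
        if pos == index:
            continue  # the target residue is kept apart for its own path
        if 0 <= pos < len(features):
            context.extend(features[pos])
        else:
            context.extend([0.0] * dim)  # zero-pad past the sequence ends
    return target, context


# Toy per-residue features: 4 residues, 2 values each (hypothetical data).
toy = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
target, context = split_target_and_context(toy, index=0, half_window=1)
```

A plain sliding window would instead return a single merged vector; keeping the two outputs apart is what allows a downstream model to process them in separate paths before combining them.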


Subjects
Amino Acids , Computational Biology , Reproducibility of Results , Computational Biology/methods
2.
Physiol Mol Biol Plants ; 29(6): 783-790, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37520815

ABSTRACT

Root systems anchor plants to the substrate in addition to transporting water and nutrients, playing a fundamental role in plant survival. The LAZY1 gene mediates gravity signal transduction and participates in root and shoot development and auxin flow in many plants. In this study, a regulator, LsLAZY1, was identified from Leymus secalinus based on previous transcriptome data. Its conserved domain and evolutionary relationships were analyzed comprehensively. The role of LsLAZY1 in root development was investigated by genetic transformation and by associated gravity response and phototropism assays. Subcellular localization showed that LsLAZY1 was localized in the nucleus. LsLAZY1 overexpression in Arabidopsis thaliana (Col-0) increased the length of the primary roots (PRs) and the number of lateral roots (LRs) compared to Col-0. Furthermore, 35S:LsLAZY1 transgenic seedlings showed altered auxin transport and stronger gravitational and phototropic responses. Overexpression also promoted auxin accumulation at the root tips. These results indicate that LsLAZY1 affects root development and auxin transport. Supplementary Information: The online version contains supplementary material available at 10.1007/s12298-023-01326-4.

3.
Brief Funct Genomics ; 21(5): 357-375, 2022 09 16.
Article in English | MEDLINE | ID: mdl-35652477

ABSTRACT

Transcription factors are important cellular components of gene expression control. Transcription factor binding sites are locations where transcription factors specifically recognize DNA sequences, targeting gene-specific regions and recruiting transcription factors or chromatin regulators to fine-tune spatiotemporal gene regulation. As common proteins, transcription factors play a meaningful role in life-related activities. With the growing number of protein sequences, effective prediction of protein structure and function has become urgent. At present, protein-DNA-binding site prediction methods are based on either traditional machine learning algorithms or deep learning algorithms. Early work predicted protein-DNA-binding sites mainly with traditional machine learning algorithms. In recent years, deep learning methods that predict protein-DNA-binding sites from sequence data have achieved remarkable success. Various statistical and machine learning methods for predicting the function of DNA-binding proteins have been proposed and continuously improved. Existing deep learning methods for predicting protein-DNA-binding sites can be roughly divided into three categories: convolutional neural networks (CNNs), recurrent neural networks (RNNs) and hybrid CNN-RNN networks. The purpose of this review is to provide an overview of the computational and experimental methods currently applied to protein-DNA-binding site prediction. It introduces traditional machine learning and deep learning methods for protein-DNA-binding site prediction in terms of the data processing characteristics of existing learning frameworks and the differences between the basic model frameworks. Existing methods are relatively simple compared with those in natural language processing, computer vision, computer graphics and other fields. Therefore, a summary of existing protein-DNA-binding site prediction methods will help researchers better understand this field.
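As a rough illustration of the CNN category mentioned above, the sketch below one-hot encodes a DNA string and slides a single convolutional filter along it. It is not taken from any method covered by the review: the `TATA` filter, the input sequence and the function names are hypothetical, and a real model would learn many filters and feed their outputs into further layers.

```python
from typing import List

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as one 4-dimensional one-hot row per base."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def conv1d_scan(encoded: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Slide `kernel` (width x 4) along the sequence with stride 1 and no
    padding, returning one score per offset."""
    width = len(kernel)
    scores: List[float] = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for k in range(width):
            for c in range(len(ALPHABET)):
                score += encoded[start + k][c] * kernel[k][c]
        scores.append(score)
    return scores


# Hypothetical filter that responds to the motif "TATA".
motif_kernel = one_hot("TATA")
scores = conv1d_scan(one_hot("GGTATAGG"), motif_kernel)
best_offset = max(range(len(scores)), key=scores.__getitem__)
```

With a one-hot filter, the score at each offset is simply the number of matching bases, so the highest score marks the position where the motif occurs.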


Subjects
Algorithms , Computational Biology , Binding Sites , Chromatin , Computational Biology/methods , DNA , DNA-Binding Proteins , Transcription Factors
4.
Interdiscip Sci ; 14(2): 421-438, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35066812

ABSTRACT

As an important research field in bioinformatics, protein subcellular location prediction is critical for revealing protein functions and provides useful information for disease diagnosis and drug development. Predicting protein subcellular locations remains a challenging task because representative features and robust classifiers are difficult to find. Many feature fusion methods have been applied to tackle these issues. However, they still suffer from accuracy loss due to feature redundancy. Furthermore, predicting multiple subcellular locations for a protein is more complicated, since it is fundamentally a multi-label classification problem. Traditional binary classifiers, and even multi-class classifiers, cannot achieve satisfactory results. This paper proposes a novel method, based on deep convolutional neural networks, for protein subcellular location prediction with both single and multiple sites. Specifically, we first obtain the integrated features by simultaneously considering the pseudo amino acid, amino acid index distribution, and physicochemical property. We then adopt deep convolutional neural networks to extract high-dimensional features from the fused feature, removing the redundant preliminary features and gaining better representations of the raw sequences. Moreover, we use the self-attention mechanism and a customized loss function to ensure that the model is more inclined to positive data. In addition, we use random k-label sets to reduce the number of prediction labels. Meanwhile, we employ a hybrid strategy of over-sampling and under-sampling to tackle the data imbalance problem. We compare our model with three representative classification alternatives. The experimental results show that our model achieves the best accuracy, demonstrating the efficacy of the proposed model.
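The hybrid over-sampling and under-sampling strategy can be sketched as follows. The abstract does not specify the exact procedure, so this is only one plausible reading: the minority class is padded with random duplicates and the majority class is randomly subsampled until both reach a common `target_size`. The function name, the parameters and the toy data are assumptions, and both input lists are assumed to be non-empty.

```python
import random
from typing import List, Sequence, Tuple


def hybrid_resample(
    positives: Sequence[str],
    negatives: Sequence[str],
    target_size: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Bring both classes to `target_size` samples each."""
    rng = random.Random(seed)

    def resize(items: Sequence[str]) -> List[str]:
        items = list(items)
        if len(items) >= target_size:
            # Under-sample: draw without replacement.
            return rng.sample(items, target_size)
        # Over-sample: keep every original and add random duplicates.
        extra = [rng.choice(items) for _ in range(target_size - len(items))]
        return items + extra

    return resize(positives), resize(negatives)


# Toy data (hypothetical): 2 positive and 10 negative samples.
pos = ["p1", "p2"]
neg = ["n%d" % i for i in range(10)]
balanced_pos, balanced_neg = hybrid_resample(pos, neg, target_size=5)
```

Meeting in the middle keeps more of the majority class than pure under-sampling and creates fewer duplicates than pure over-sampling; where the meeting point should lie is a tuning decision that this sketch leaves to the caller.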


Subjects
Neural Networks, Computer , Proteins , Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry
5.
Med Biol Eng Comput ; 58(12): 3017-3038, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33078303

ABSTRACT

In the present paper, a deep convolutional neural network (DCNN) is applied to multilocus protein subcellular localization, as it is more suitable for multi-class classification. There are two main problems with this application. First, appropriate features for the correlation between multiple sites are hard to find. Second, the classifier structure is difficult to determine, as it is greatly affected by the distribution of the classified data. To solve these problems, a self-evolving framework using DCNNs for multilocus protein subcellular localization is proposed. It has three characteristics that previous algorithms lack. The first is that it combines the ant colony algorithm with the DCNN to form a self-evolving algorithm for multilocus protein subcellular localization. The second is that it randomly groups subcellular sites using a limited random k-labelsets (RAkEL) multi-label classification method; it also solves complex problems with a divide-and-conquer approach and proposes a flexible expansion model. The third is that it selects feature extraction methods at random during localization, which avoids the defects of any individual feature extraction method. The algorithm is tested on the human database, and the overall accuracy is 67.17%, which is higher than that of the stacked autoencoder (SAE), support vector machine (SVM), random forest classifier (RF), or a single deep convolutional neural network.
Graphical abstract: The algorithm consists of four main parts: protein sequence data preprocessing, integrated DCNN model construction, finding the optimal DCNN combination by ant colony optimization, and protein subcellular localization for sequences. These parts run sequentially, and the data produced by each part is the basis for the next. In the data preprocessing part, the limited RAkEL multi-label classification method is used to randomly group subcellular sites. At the same time, feature fusion of protein sequences is carried out using multiple feature extraction methods. Each combination of features and site information corresponds to one DCNN model. In the ant colony optimization part, the main purpose is to find the best combination of DCNN models through the global optimization ability of the ant colony algorithm. The localization part then obtains the multilocus subcellular localization of sequences from the optimal model combination.
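The random grouping of subcellular sites can be illustrated with a minimal sketch of disjoint random k-labelsets. The abstract does not say how the "limited" RAkEL variant constrains the grouping, so the code below shows only the generic idea: shuffle the labels, cut them into groups of at most `k`, and restrict each sample's labels to each group. The site names and function names are hypothetical.

```python
import random
from typing import FrozenSet, List, Sequence, Set


def random_k_labelsets(labels: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the labels and cut them into disjoint groups of at most `k`."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


def project(sample_labels: Set[str], labelset: Sequence[str]) -> FrozenSet[str]:
    """Restrict a sample's labels to one labelset. The resulting subset is
    the single class that the group's classifier would have to predict."""
    return frozenset(sample_labels) & frozenset(labelset)


# Hypothetical subcellular sites and one protein found at two of them.
sites = ["nucleus", "cytoplasm", "membrane", "mitochondrion", "golgi"]
groups = random_k_labelsets(sites, k=2)
sample = {"nucleus", "cytoplasm"}
targets = [project(sample, group) for group in groups]
```

Each group turns the multi-label problem into a small multi-class problem with at most 2^k classes, which is what allows one model to be trained per group and the per-group predictions to be merged afterwards.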


Subjects
Algorithms , Neural Networks, Computer , Databases, Factual , Humans , Proteins , Support Vector Machine