1.
Article in English | MEDLINE | ID: mdl-38557614

ABSTRACT

As post-transcriptional regulators of gene expression, micro-ribonucleic acids (miRNAs) are regarded as potential biomarkers for a variety of diseases. Hence, the prediction of miRNA-disease associations (MDAs) is of great significance for an in-depth understanding of disease pathogenesis and progression. Existing prediction models mainly concentrate on incorporating different sources of biological information to perform the MDA prediction task while failing to fully exploit the potential utility of MDA network information at the motif level. To overcome this problem, we propose a novel motif-aware MDA prediction model, namely MotifMDA, by fusing a variety of high- and low-order structural information. In particular, we first design several motifs of interest considering their ability to characterize how miRNAs are associated with diseases through different network structural patterns. Then, MotifMDA adopts a two-layer hierarchical attention to identify novel MDAs. Specifically, the first attention layer learns high-order motif preferences based on their occurrences in the given MDA network, while the second one learns the final embeddings of miRNAs and diseases by coupling high- and low-order preferences. Experimental results on two benchmark datasets have demonstrated the superior performance of MotifMDA over several state-of-the-art prediction models. This strongly indicates that accurate MDA prediction can be achieved by relying solely on MDA network information. Furthermore, our case studies indicate that the incorporation of motif-level structure information allows MotifMDA to discover novel MDAs from different perspectives. The data and code are available at https://github.com/stevejobws/MotifMDA.

2.
Comput Struct Biotechnol J ; 24: 213-224, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38572168

ABSTRACT

The intricate task of precisely segmenting retinal vessels from images, which is critical for diagnosing various eye diseases, presents significant challenges for models due to factors such as scale variation, complex anatomical patterns, low contrast, and limitations in training data. Building on these challenges, we offer novel contributions spanning model architecture, loss function design, robustness, and real-time efficacy. To comprehensively address these challenges, a new U-Net-like, lightweight Transformer network for retinal vessel segmentation is presented. By integrating MobileViT+ and a novel local representation in the encoder, our design emphasizes lightweight processing while capturing intricate image structures, enhancing vessel edge precision. A novel joint loss is designed, leveraging the characteristics of weighted cross-entropy and Dice loss to effectively guide the model through the task's challenges, such as foreground-background imbalance and intricate vascular structures. Exhaustive experiments were performed on three prominent retinal image databases. The results underscore the robustness and generalizability of the proposed LiViT-Net, which outperforms other methods in complex scenarios, especially in intricate environments with fine vessels or vessel edges. Importantly, optimized for efficiency, LiViT-Net excels on devices with constrained computational power, as evidenced by its fast performance. To demonstrate the model proposed in this study, a freely accessible and interactive website was established (https://hz-t3.matpool.com:28765?token=aQjYR4hqMI), revealing real-time performance with no login requirements.
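The joint loss described in this abstract combines weighted cross-entropy with Dice loss. The following is only a minimal sketch of that general idea, not the LiViT-Net implementation: the function names, the `pos_weight` and `alpha` parameters, and the mixing rule are all assumptions made for illustration, since the abstract does not give the exact formula.

```python
import math

def weighted_bce(probs, labels, pos_weight=1.0, eps=1e-7):
    """Mean binary cross-entropy with an extra weight on foreground (vessel) pixels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum of both masks + smooth)."""
    overlap = sum(p * y for p, y in zip(probs, labels))
    denom = sum(probs) + sum(labels)
    return 1.0 - (2.0 * overlap + smooth) / (denom + smooth)

def joint_loss(probs, labels, pos_weight=1.0, alpha=0.5):
    """Hypothetical mix: alpha * weighted BCE + (1 - alpha) * Dice."""
    return (alpha * weighted_bce(probs, labels, pos_weight)
            + (1.0 - alpha) * dice_loss(probs, labels))

# Toy example on four flattened pixels (1 = vessel, 0 = background).
labels = [1, 0, 1, 0]
good = joint_loss([0.9, 0.1, 0.8, 0.2], labels, pos_weight=2.0)
bad = joint_loss([0.2, 0.8, 0.3, 0.9], labels, pos_weight=2.0)
```

The cross-entropy term penalizes each pixel independently, and a `pos_weight` above 1 counteracts the scarcity of vessel pixels, whereas the Dice term scores the overlap of the whole predicted mask, which is why the two are often paired for imbalanced segmentation.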

3.
Adv Sci (Weinh) ; : e2309781, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38610112

ABSTRACT

Remote sensing technology, which conventionally employs spectrometers to capture hyperspectral images, allowing for classification and unmixing based on the reflectance spectrum, has been extensively applied in diverse fields, including environmental monitoring, land resource management, and agriculture. However, miniaturization of remote sensing systems remains a challenge due to the complicated and dispersive optical components of spectrometers. Here, m-phase GaTe0.5Se0.5 with wide-spectral photoresponses (250-1064 nm) is utilized and stacked with WSe2 to construct a two-dimensional van der Waals heterojunction (2D-vdWH), enabling the design of a gate-tunable wide-spectral photodetector. By utilizing the multi-photoresponses under varying gate voltages, high-accuracy recognition can be achieved with the aid of deep learning algorithms without the original hyperspectral reflectance data. The proof-of-concept device, featuring dozens of tunable gate voltages, achieves an average classification accuracy of 87.00% on 6 prevalent hyperspectral datasets, which is competitive with the accuracy of 250-1000 nm hyperspectral data (88.72%) and far superior to the accuracy of non-tunable photoresponse (71.17%). Artificially designed gate-tunable wide-spectral 2D-vdWH GaTe0.5Se0.5/WSe2-based photodetectors present a promising pathway for the development of miniaturized and cost-effective remote sensing classification technology.

4.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426326

ABSTRACT

The applicability of herbs in disease treatment has been verified through experience over thousands of years. However, the understanding of herb-disease associations (HDAs) is still far from complete due to the complicated mechanism inherent in multi-target and multi-component (MTMC) botanical therapeutics. Most of the existing prediction models fail to incorporate the MTMC mechanism. To overcome this problem, we propose a novel dual-channel hypergraph convolutional network, namely HGHDA, for HDA prediction. Technically, HGHDA first adopts an autoencoder to project components and target proteins onto a low-dimensional latent space so as to obtain their embeddings by preserving similarity characteristics in their original feature spaces. To model the high-order relations between herbs and their components, we design a channel in HGHDA to encode a hypergraph that describes the high-order patterns of herb-component relations via hypergraph convolution. The other channel in HGHDA is established in the same way to model the high-order relations between diseases and target proteins. The embeddings of drugs and diseases are then aggregated through our dual-channel network to obtain the prediction results with a scoring function. To evaluate the performance of HGHDA, a series of extensive experiments have been conducted on two benchmark datasets, and the results demonstrate the superiority of HGHDA over the state-of-the-art algorithms proposed for HDA prediction. Besides, our case study on Chuan Xiong and Astragalus membranaceus strongly supports the effectiveness of HGHDA, as seven and eight out of the top 10 diseases predicted by HGHDA for Chuan Xiong and Astragalus membranaceus, respectively, have been reported in the literature.


Subjects
Algorithms; Astragalus propinquus; Benchmarking; Carbamates
5.
IEEE J Biomed Health Inform ; 28(4): 2362-2372, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38265898

ABSTRACT

As a pivotal post-transcriptional modification of RNA, N6-methyladenosine (m6A) has a substantial influence on gene expression modulation and cellular fate determination. Although a variety of computational models have been developed to accurately identify potential m6A modification sites, few of them are capable of interpreting the identification process with insights gained from consensus knowledge. To overcome this problem, we propose a deep learning model, namely M6A-DCR, by discovering consensus regions for interpretable identification of m6A modification sites. In particular, M6A-DCR first constructs an instance graph for each RNA sequence by integrating specific positions and types of nucleotides. The discovery of consensus regions is then formulated as a graph clustering problem in light of aggregating all instance graphs. After that, M6A-DCR adopts a motif-aware graph reconstruction optimization process to learn high-quality embeddings of input RNA sequences, thus achieving the identification of m6A modification sites in an end-to-end manner. Experimental results demonstrate the superior performance of M6A-DCR by comparing it with several state-of-the-art identification models. The consideration of consensus regions empowers our model to make interpretable predictions at the motif level. The analysis of cross validation through different species and tissues further verifies the consistency between the identification results of M6A-DCR and the evolutionary relationships among species.


Subjects
Adenosine; RNA; Humans; Methylation; Consensus; RNA/genetics; RNA/metabolism; Adenosine/genetics; Adenosine/metabolism
6.
Comput Biol Med ; 167: 107691, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37976819

ABSTRACT

With the wide application of deep learning in drug discovery, deep generative models have shown their advantages in drug molecule generation. Generative adversarial networks can be used to learn the internal structure of molecules, but the training process may be unstable, suffering from problems such as vanishing gradients and mode collapse, which may lead to the generation of molecules that do not conform to chemical rules or that share a single style. In this paper, a novel method called STAGAN is proposed to ease model training by adding a new gradient penalty term to the discriminator and designing a parallel batch normalization layer for the generator. As an illustration of the method, STAGAN generated more valid and unique molecules than previous models on training datasets from QM9 and ZINC-250K. This indicates that the proposed method can effectively solve the instability problem in the model training process and can provide more instructive guidance for the further study of molecular graph generation.


Subjects
Deep Learning; Drug Discovery; Models, Chemical
7.
Methods ; 220: 106-114, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37972913

ABSTRACT

Discovering new indications for existing drugs is a promising development strategy at various stages of drug research and development. However, most existing computational methods complete this task by constructing a variety of heterogeneous networks without considering the higher-order connectivity patterns available in heterogeneous biological information networks, which are believed to be useful for improving the accuracy of drug discovery. To this end, we propose a computational model, called SFRLDDA, for drug-disease association prediction by using semantic graph and function similarity representation learning. Specifically, SFRLDDA first builds a heterogeneous information network (HIN) from drug-disease, drug-protein, and protein-disease associations together with their biological knowledge. Second, different representation learning strategies are applied to obtain the feature representations of drugs and diseases from different perspectives over the constructed semantic graph and function similarity graphs, respectively. Finally, a Random Forest classifier is incorporated by SFRLDDA to discover potential drug-disease associations (DDAs). Experimental results demonstrate that SFRLDDA yields the best performance when compared with other state-of-the-art models on three benchmark datasets. Moreover, case studies also indicate that the simultaneous consideration of the semantic graph and function similarity of drugs and diseases in the HIN allows SFRLDDA to precisely predict DDAs in a more comprehensive manner.


Subjects
Algorithms; Semantics; Information Services
8.
BMC Bioinformatics ; 24(1): 451, 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-38030973

ABSTRACT

BACKGROUND: As an important task in bioinformatics, clustering analysis plays a critical role in understanding the functional mechanisms of many complex biological systems, which can be modeled as biological networks. The purpose of clustering analysis in biological networks is to identify functional modules of interest, but there is a lack of online clustering tools that visualize biological networks and provide in-depth biological analysis for discovered clusters. RESULTS: Here we present BioCAIV, a novel webserver designed to maximize accessibility and applicability in the clustering analysis of biological networks. This, together with its user-friendly interface, helps biological researchers perform an accurate clustering analysis of biological networks and identify functionally significant modules for further assessment. CONCLUSIONS: BioCAIV is an efficient clustering analysis webserver designed for a variety of biological networks. BioCAIV is freely available without registration requirements at http://bioinformatics.tianshanzw.cn:8888/BioCAIV/ .


Subjects
Computational Biology; Software; Cluster Analysis
9.
Brief Funct Genomics ; 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37539561

ABSTRACT

Recently, the role of competing endogenous RNAs in regulating gene expression through the interaction of microRNAs has been closely associated with the expression of circular RNAs (circRNAs) in various biological processes such as reproduction and apoptosis. While the number of confirmed circRNA-miRNA interactions (CMIs) continues to increase, the conventional in vitro approaches for discovery are expensive, labor intensive, and time consuming. Therefore, there is an urgent need for effective prediction of potential CMIs through appropriate data modeling and prediction based on known information. In this study, we proposed a novel model, called DeepCMI, that utilizes multi-source information on circRNA/miRNA to predict potential CMIs. Comprehensive evaluations on the CMI-9905 and CMI-9589 datasets demonstrated that DeepCMI successfully infers potential CMIs. Specifically, DeepCMI achieved AUC values of 90.54% and 94.8% on the CMI-9905 and CMI-9589 datasets, respectively. These results suggest that DeepCMI is an effective model for predicting potential CMIs and has the potential to significantly reduce the need for downstream in vitro studies. To facilitate the use of our trained model and data, we have constructed a computational platform, which is available at http://120.77.11.78/DeepCMI/. The source code and datasets used in this work are available at https://github.com/LiYuechao1998/DeepCMI.
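The AUC values quoted in this abstract can be read as the probability that a randomly chosen true interaction receives a higher score than a randomly chosen non-interaction. The sketch below illustrates that pairwise definition only; it is not DeepCMI's evaluation code, and the function name and toy scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores for four candidate pairs; the first two are known interactions.
auc = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

Here three of the four positive-negative pairs are ranked correctly, so `auc` evaluates to 0.75. The quadratic loop is fine for small examples; large benchmarks would normally use a sort-based routine from an established library.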

10.
Bioinformatics ; 39(8), 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37505483

ABSTRACT

MOTIVATION: The task of predicting drug-target interactions (DTIs) plays a significant role in facilitating drug discovery. Compared with laboratory-based approaches, computational methods proposed for DTI prediction are preferred due to their high-efficiency and low-cost advantages. Recently, much attention has been paid to applying different graph neural network (GNN) models to discover underlying DTIs from a heterogeneous biological information network (HBIN). Although GNN-based prediction methods achieve better performance, they are prone to over-smoothing when learning the latent representations of drugs and targets with their rich neighborhood information in the HBIN, which reduces their discriminative ability in DTI prediction. RESULTS: In this work, an improved graph representation learning method, namely iGRLDTI, is proposed to address the above issue by capturing more discriminative representations of drugs and targets in a latent feature space. Specifically, iGRLDTI first constructs an HBIN by integrating the biological knowledge of drugs and targets with their interactions. After that, it adopts a node-dependent local smoothing strategy to adaptively decide the propagation depth of each biomolecule in the HBIN, thus significantly alleviating over-smoothing by enhancing the discriminative ability of the feature representations of drugs and targets. Finally, a Gradient Boosting Decision Tree classifier is used by iGRLDTI to predict novel DTIs. Experimental results demonstrate that iGRLDTI yields better performance than several state-of-the-art computational methods on the benchmark dataset. Besides, our case study indicates that iGRLDTI can successfully identify novel DTIs with more distinguishable features of drugs and targets. AVAILABILITY AND IMPLEMENTATION: Python code and the dataset are available at https://github.com/stevejobws/iGRLDTI/.


Subjects
Drug Discovery; Neural Networks, Computer; Computer Simulation; Drug Discovery/methods; Drug Interactions
11.
Mol Biomed ; 4(1): 21, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-37442861

ABSTRACT

Atherosclerosis (AS) is a major contributor to morbidity and mortality worldwide. However, the molecular mechanisms and mediator molecules involved remain largely unknown. Copper, which plays an essential role in cardiovascular disease, has been suggested as a potential risk factor. Copper homeostasis is closely related to the occurrence and development of AS. Recently, a new cell death pathway called cuproptosis has been discovered, which is driven by intracellular copper excess. However, no previous studies have reported a relationship between cuproptosis and AS. In this study, we integrated bulk and single-cell sequencing data to screen and identify key cuproptosis-related genes in AS. We used correlation analysis, enrichment analysis, random forest, and other bioinformatics methods to reveal their relationships. Our findings report, for the first time, the involvement of cuproptosis-related genes FDX1, SLC31A1, and GLS in atherogenesis. FDX1 and SLC31A1 were upregulated, while GLS was downregulated in atherosclerotic plaque. Receiver operating characteristic curves demonstrate their potential diagnostic value for AS. Additionally, we confirm that GLS is mainly expressed in vascular smooth muscle cells, and SLC31A1 is mainly localized in macrophages of atherosclerotic lesions in experiments. These findings shed light on the cuproptosis landscape and potential diagnostic biomarkers for AS, providing further evidence about the vital role of cuproptosis in atherosclerosis progression.

12.
PLoS Comput Biol ; 19(6): e1011207, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37339154

ABSTRACT

Interactions between transcription factors and target genes form the main part of the human gene regulation network and remain a complicating factor in biological research. Specifically, for nearly half of the interactions recorded in established databases, the interaction types are yet to be confirmed. Although several computational methods exist to predict gene interactions and their types, there is still no method available to predict them solely based on topology information. To this end, we propose here a graph-based prediction model called KGE-TGI, trained in a multi-task learning manner on a knowledge graph that we specially constructed for this problem. The KGE-TGI model relies on topology information rather than being driven by gene expression data. In this paper, we formulate the task of predicting interaction types of transcription factors and target genes as a multi-label classification problem for link types on a heterogeneous graph, coupled with solving another link prediction problem that is inherently related. We constructed a ground truth dataset as a benchmark and evaluated the proposed method on it. In 5-fold cross-validation experiments, the proposed method achieved average AUC values of 0.9654 and 0.9339 in the tasks of link prediction and link type classification, respectively. In addition, the results of a series of comparison experiments show that the introduction of knowledge information significantly benefits the prediction and that our methodology achieves state-of-the-art performance on this problem.


Subjects
Pattern Recognition, Automated; Transcription Factors; Humans; Databases, Factual; Transcription Factors/genetics; Gene Regulatory Networks; Proteome; Algorithms; Systems Biology; Gene Ontology
13.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3182-3194, 2023.
Article in English | MEDLINE | ID: mdl-37155405

ABSTRACT

Protein-protein interactions (PPIs) play a critical role in proteomics studies, and a variety of computational algorithms have been developed to predict PPIs. Though effective, their performance is constrained by the high false-positive and false-negative rates observed in PPI data. To overcome this problem, a novel PPI prediction algorithm, namely PASNVGA, is proposed in this work by combining the sequence and network information of proteins via a variational graph autoencoder. To do so, PASNVGA first applies different strategies to extract the features of proteins from their sequence and network information, and obtains a more compact form of these features using principal component analysis. In addition, PASNVGA designs a scoring function to measure the higher-order connectivity between proteins so as to obtain a higher-order adjacency matrix. With all these features and adjacency matrices, PASNVGA trains a variational graph autoencoder model to further learn the integrated embeddings of proteins. The prediction task is then completed by using a simple feedforward neural network. Extensive experiments have been conducted on five PPI datasets collected from different species. Compared with several state-of-the-art algorithms, PASNVGA has been demonstrated to be a promising PPI prediction algorithm.


Subjects
Neural Networks, Computer; Protein Interaction Mapping; Algorithms; Proteins
14.
Front Psychiatry ; 14: 1148534, 2023.
Article in English | MEDLINE | ID: mdl-37139323

ABSTRACT

As psychological diseases become more prevalent and are identified as the leading cause of acquired disability, it is essential to assist people in improving their mental health. Digital therapeutics (DTx) has been widely studied to treat psychological diseases with the advantage of cost savings. Among the techniques of DTx, a conversational agent can interact with patients through natural language dialog and has become the most promising one. However, conversational agents' ability to accurately show emotional support (ES) limits their role in DTx solutions, especially in mental health support. One of the main reasons is that the prediction of emotional support systems does not extract effective information from historical dialog data and only depends on the data derived from one single-turn interaction with users. To address this issue, we propose a novel emotional support conversation agent called the STEF agent that generates more supportive responses based on a thorough view of past emotions. The proposed STEF agent consists of the emotional fusion mechanism and strategy tendency encoder. The emotional fusion mechanism focuses on capturing the subtle emotional changes throughout a conversation. The strategy tendency encoder aims at foreseeing strategy evolution through multi-source interactions and extracting latent strategy semantic embedding. Experimental results on the benchmark dataset ESConv demonstrate the effectiveness of the STEF agent compared with competitive baselines.

15.
J Mol Model ; 29(4): 121, 2023 Mar 30.
Article in English | MEDLINE | ID: mdl-36991180

ABSTRACT

CONTEXT: In recent decades, drug development has become extremely important as different new diseases have emerged. However, drug discovery is a long and complex process with a very low success rate, and methods are needed to improve the efficiency of the process and reduce the possibility of failure. Among them, drug design from scratch has become a promising approach. Molecules are generated from scratch, reducing the reliance on trial and error and prefabricated molecular repositories, but the optimization of their molecular properties is still a challenging multi-objective optimization problem. METHODS: In this study, two stack-augmented recurrent neural networks were used to compose a generative model for generating drug-like molecules, and then reinforcement learning was used for optimization to generate molecules with desirable properties, such as binding affinity and the logarithm of the partition coefficient between octanol and water. In addition, a memory storage network was added to increase the internal diversity of the generated molecules. For multi-objective optimization, we proposed a new approach which utilized the magnitude of different attribute reward values to assign different weights to molecular optimization. The proposed model not only solves the problem that the properties of the generated molecules are extremely biased towards a certain attribute due to possible conflicts between the attributes, but also improves various properties of the generated molecules compared with the traditional weighted sum and alternating weighted sum: the molecular validity reaches 97.3%, the internal diversity is 0.8613, and the proportion of desirable molecules increases from 55.9% to 92%.


Subjects
Drug Design; Neural Networks, Computer; Drug Discovery; Reward
16.
IEEE J Biomed Health Inform ; 27(1): 562-572, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36327172

ABSTRACT

Identifying Drug-Target Interactions (DTIs) is a critical step in studying pathogenesis and drug development. Because conventional experimental methods usually suffer from high costs and low efficiency, various computational methods have been proposed to detect potential DTIs by extracting features from the biological information of drugs and their target proteins. Though effective, most of them fall short of considering the topological structure of the DTI network, which provides a global view to discover novel DTIs. In this paper, a network-based computational method, namely LG-DTI, is proposed to accurately predict DTIs over a heterogeneous information network. For drugs and target proteins, LG-DTI first learns not only their local representations from drug molecular structures and protein sequences, but also their global representations by using a semi-supervised heterogeneous network embedding method. These two kinds of representations constitute the final representations of drugs and target proteins, which are then incorporated into a Random Forest classifier to complete the task of DTI prediction. The performance of LG-DTI has been evaluated on two independent datasets and also compared with several state-of-the-art methods. Experimental results show the superior performance of LG-DTI. Moreover, our case study indicates that LG-DTI can be a valuable tool for identifying novel DTIs.


Subjects
Drug Discovery; Proteins; Humans; Drug Discovery/methods; Computer Simulation; Molecular Structure; Proteins/chemistry; Information Services
17.
IEEE J Biomed Health Inform ; 27(1): 573-582, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36301791

ABSTRACT

Identifying protein targets for drugs establishes an indispensable knowledge foundation for drug repurposing and drug development. Though expensive and time-consuming, in vitro trials are widely employed to discover drug targets, and the existing computational algorithms still cannot satisfy the demands of real applications in drug R&D with regard to prediction accuracy and efficiency, both of which urgently need improvement. To this end, we propose here the PPAEDTI model, which uses the graph personalized propagation technique to predict drug-target interactions from the known interaction network. To evaluate the prediction performance, six benchmark datasets were used for testing, and several state-of-the-art methods were compared. Using 5-fold cross-validation, the proposed PPAEDTI model achieves average AUCs > 90% on 5 collected datasets. We also manually checked the top-20 prediction lists for 2 proteins (hsa:775 and hsa:779) and one drug (D00618), and successfully confirmed 18, 17, and 20 items from the public datasets, respectively. The experimental results indicate that, given known drug-target interactions, the PPAEDTI model can provide accurate predictions for new ones, and it is anticipated to serve as a useful tool for pharmacology research. Using the proposed model trained with the collected datasets, we have built a computational platform that is accessible at http://120.77.11.78/PPAEDTI/, and the corresponding code and datasets are also released.
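The abstract names a "graph personalized propagation technique" without stating the update rule. As a rough illustration only, the sketch below uses a personalized-PageRank-style iteration, which is an assumption here and may differ from what PPAEDTI actually does; the function name, the `alpha` restart probability, and the toy graph are hypothetical.

```python
def personalized_propagation(adj, seed, alpha=0.15, steps=50):
    """Iterate z <- alpha * seed + (1 - alpha) * P^T z, with P the row-normalized adjacency."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    z = list(seed)
    for _ in range(steps):
        nxt = [alpha * s for s in seed]       # restart mass returns to the seed
        for i in range(n):
            if degree[i] == 0:
                continue                      # isolated node: its mass is dropped
            share = (1.0 - alpha) * z[i] / degree[i]
            for j in range(n):
                if adj[i][j]:
                    nxt[j] += share * adj[i][j]
        z = nxt
    return z

# Toy path graph 0 - 1 - 2, with all restart mass on node 0.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = personalized_propagation(path, seed=[1.0, 0.0, 0.0])
```

Because every step sends a fixed fraction `alpha` of the signal back to the seed, nodes near the seed keep higher scores than equally connected distant nodes, which is the property that lets this kind of propagation look many hops into a network while staying anchored to the starting node.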


Subjects
Algorithms; Drug Repositioning; Humans; Drug Interactions; Area Under Curve; Proteins/metabolism
19.
Org Lett ; 24(41): 7583-7588, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36205709

ABSTRACT

An intermolecular alkene dicarbofunctionalization via electrochemical reduction that combines alkyl and aryl iodides with styrene derivatives is reported herein. The multicomponent reaction exhibits several synthetic advantages, including simple operation, wide substrate scope, and ease of scale-up. Mechanistic investigations, including cyclic voltammetry (CV), electron paramagnetic resonance (EPR), and radical trapping reactions, support the electrochemical nickel catalytic cycle and the formation of alkyl radical species from alkyl iodides.

20.
BMC Bioinformatics ; 23(1): 447, 2022 Oct 27.
Article in English | MEDLINE | ID: mdl-36303135

ABSTRACT

BACKGROUND: The site information of substrates that can be cleaved by human immunodeficiency virus 1 proteases (HIV-1 PRs) is of great significance for designing effective inhibitors against HIV-1 viruses. A variety of machine learning-based algorithms have been developed to predict HIV-1 PR cleavage sites by extracting relevant features from substrate sequences. However, relying only on sequence information is not sufficient to ensure promising performance due to the uncertainty in the way the datasets used for training and testing are separated. Moreover, the existence of noisy data, i.e., false positive and false negative cleavage sites, could negatively influence accuracy. RESULTS: In this work, an ensemble learning algorithm for predicting HIV-1 PR cleavage sites, namely EM-HIV, is proposed by training a set of weak learners, i.e., biased support vector machine classifiers, with the asymmetric bagging strategy. By doing so, the impact of data imbalance and noisy data can be alleviated. Besides, in order to make full use of substrate sequences, the features used by EM-HIV are collected from three different coding schemes, including amino acid identities, chemical properties and variable-length coevolutionary patterns, for the purpose of constructing more relevant feature vectors of octamers. Experimental results on three independent benchmark datasets demonstrate that EM-HIV outperforms state-of-the-art prediction algorithms in terms of several evaluation metrics. Hence, EM-HIV can be regarded as a useful tool to accurately predict HIV-1 PR cleavage sites.
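The asymmetric bagging strategy mentioned in this abstract can be sketched as follows: every bag keeps all minority (positive) samples and draws an equally sized random subset of the majority (negative) samples, and the weak learners trained on the bags vote. This is a minimal sketch under stated assumptions, not the EM-HIV code: a nearest-centroid rule stands in for the biased support vector machines, and all names and toy data are hypothetical.

```python
import random

def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Each bag: all positives plus a random negative subset of the same size."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return [(list(positives), rng.sample(negatives, k)) for _ in range(n_bags)]

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train_centroid_learner(pos, neg):
    """Stand-in weak learner (assumption): predict the class of the nearer centroid."""
    cp, cn = centroid(pos), centroid(neg)
    def predict(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return 1 if dp < dn else 0
    return predict

def ensemble_predict(learners, x):
    """Majority vote over the weak learners."""
    votes = sum(learner(x) for learner in learners)
    return 1 if 2 * votes > len(learners) else 0

# Toy imbalanced data: 2 positives versus 5 negatives.
positives = [[1.0, 1.0], [0.9, 1.1]]
negatives = [[0.0, 0.0], [0.1, -0.1], [-0.2, 0.1], [0.0, 0.2], [0.1, 0.1]]
bags = asymmetric_bags(positives, negatives, n_bags=5)
learners = [train_centroid_learner(pos, neg) for pos, neg in bags]
```

Each learner sees a balanced training set, so no single learner is dominated by the majority class, while the vote across bags still makes use of most of the negative samples.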


Subjects
HIV Protease; HIV-1; Algorithms; HIV Protease/chemistry; HIV-1/enzymology; Machine Learning; Substrate Specificity