1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36562719

ABSTRACT

BACKGROUND: Cell-penetrating peptides (CPPs) have received considerable attention as a means of transporting pharmacologically active molecules into living cells without damaging the cell membrane, and thus hold great promise as future therapeutics. Recently, several machine learning-based algorithms have been proposed for predicting CPPs. However, most existing predictive methods do not consider the agreement (disagreement) between similar (dissimilar) CPPs and depend heavily on expert knowledge-based handcrafted features. RESULTS: In this study, we present SiameseCPP, a novel deep learning framework for automated CPPs prediction. SiameseCPP learns discriminative representations of CPPs based on a well-pretrained model and a Siamese neural network consisting of a transformer and gated recurrent units. Contrastive learning is used for the first time to build a CPP predictive model. Comprehensive experiments demonstrate that our proposed SiameseCPP is superior to existing baseline models for predicting CPPs. Moreover, SiameseCPP also achieves good performance on other functional peptide datasets, exhibiting satisfactory generalization ability.


Subjects
Cell-Penetrating Peptides , Cell-Penetrating Peptides/metabolism , Algorithms , Biological Transport , Neural Networks, Computer , Machine Learning
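
The SiameseCPP abstract above refers to contrastive learning over pairs of peptides. As a rough illustration only, the classic pairwise contrastive loss can be sketched as below. The Euclidean distance, the margin value, and the function names are assumptions made for this sketch; the abstract does not state which loss form the authors use.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_class, margin=1.0):
    """Classic pairwise contrastive loss (assumed form, not the paper's exact loss).

    Pairs from the same class are pulled together (loss = d^2).
    Pairs from different classes are pushed apart until they are at
    least `margin` away (loss = max(0, margin - d)^2).
    """
    d = euclidean(u, v)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

In a Siamese setup, `u` and `v` would be the embeddings that the two weight-sharing branches produce for a pair of peptides, and the loss would be averaged over many sampled pairs.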
2.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38696758

ABSTRACT

MOTIVATION: Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. RESULTS: We provide here a novel computational approach, CAPTP, which leverages the power of convolution and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP demonstrates outstanding performance, achieving a Matthews correlation coefficient of approximately 0.82 both in cross-validation and on independent test datasets. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP maintains its robustness and generalizability even when dealing with data imbalances. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can significantly inform and guide the design of safer peptide drugs. AVAILABILITY AND IMPLEMENTATION: The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP.


Subjects
Computational Biology , Peptides , Peptides/chemistry , Computational Biology/methods , Humans , Amino Acid Sequence , Algorithms , Software
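
The CAPTP abstract above reports performance as a Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, the sketch below computes it from the four cells of a binary confusion matrix. The function name and the convention of returning 0.0 when the denominator vanishes are choices made for this sketch.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

With hypothetical counts of 90 true positives, 90 true negatives, 10 false positives and 10 false negatives, the function returns 0.8. Unlike plain accuracy, MCC stays informative when the toxic and non-toxic classes are imbalanced, which is why it is often preferred for this kind of task.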
3.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35262678

ABSTRACT

Accurate prediction of drug-target interactions (DTIs) can reduce the cost and time of drug repositioning and drug discovery. Many current methods integrate information from multiple drug and target data sources to improve DTI prediction accuracy. However, these methods do not consider the complex relationships between different data sources. In this study, we propose a novel computational framework, called MccDTI, to predict potential DTIs by multiview network embedding, which can integrate the heterogeneous information of drugs and targets. MccDTI learns high-quality low-dimensional representations of drugs and targets by preserving the consistent and complementary information between multiview networks. MccDTI then adopts a matrix completion scheme for DTI prediction based on the drug and target representations. Experimental results on two datasets show that MccDTI achieves higher prediction accuracy than four state-of-the-art methods for DTI prediction. Moreover, literature verification shows that MccDTI can predict reliable potential DTIs. These results indicate that MccDTI can provide a powerful tool to predict new DTIs and accelerate drug discovery. The code and data are available at: https://github.com/ShangCS/MccDTI.


Subjects
Drug Development , Drug Repositioning , Drug Discovery , Drug Interactions
4.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36070864

ABSTRACT

The location of microRNAs (miRNAs) in cells determines their function in regulation activity. Studies have shown that miRNAs are stable in the extracellular environment that mediates cell-to-cell communication and are located in the intracellular region that responds to cellular stress and environmental stimuli. Although in situ detection techniques have made great contributions to the study of the localization and distribution of miRNAs, research on miRNA subcellular localization and its functional role is still in progress. Recently, some machine learning-based algorithms have been designed for miRNA subcellular location prediction, but their performance is still far from satisfactory. Here, we present a new data partitioning strategy that categorizes functionally similar locations for the precise and instructive prediction of miRNA subcellular location in Homo sapiens. To characterize the localization signals, we adopted one-hot encoding with post padding to represent whole miRNA sequences, and proposed a deep bidirectional long short-term memory network with multi-head self-attention to model them. The algorithm showed high selectivity in distinguishing extracellular miRNAs from intracellular miRNAs. Moreover, a series of motif analyses were performed to explore the mechanism of miRNA subcellular localization. To make the model more convenient to use, a user-friendly web server named iLoc-miRNA was established (http://iLoc-miRNA.lin-group.cn/).


Subjects
Computational Biology , MicroRNAs , Algorithms , Computational Biology/methods , Humans , Machine Learning , MicroRNAs/genetics
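
The iLoc-miRNA abstract above describes representing whole miRNA sequences by one-hot encoding with post padding. A minimal sketch of that representation follows. The four-letter alphabet `ACGU`, the all-zero row for padding and unknown bases, and the truncation of over-long sequences are assumptions made for this sketch, not details given in the abstract.

```python
ALPHABET = "ACGU"  # assumed RNA alphabet for this sketch


def one_hot_post_pad(seq, max_len):
    """Encode an RNA sequence as max_len rows of 4 values each.

    Each base becomes a one-hot row; rows of zeros are appended after
    the sequence (post padding) until max_len is reached. Sequences
    longer than max_len are truncated, and unknown characters are
    encoded as all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper()[:max_len]:
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    # Post padding: zero rows go at the end, after the real bases.
    rows.extend([0] * len(ALPHABET) for _ in range(max_len - len(rows)))
    return rows
```

Padding at the end rather than at the start keeps each base at its original position index, which matters for a sequence model that reads the input left to right.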
5.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37847658

ABSTRACT

MOTIVATION: The rapid and extensive transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to an unprecedented global health emergency, affecting millions of people and causing an immense socioeconomic impact. The identification of SARS-CoV-2 phosphorylation sites plays an important role in unraveling the complex molecular mechanisms behind infection and the resulting alterations in host cell pathways. However, currently available prediction tools for identifying these sites lack accuracy and efficiency. RESULTS: In this study, we present a comprehensive biological function analysis of SARS-CoV-2 infection in a clonal human lung epithelial A549 cell line, revealing dramatic changes in protein phosphorylation pathways in host cells. Moreover, a novel deep learning predictor called PSPred-ALE is specifically designed to identify phosphorylation sites in human host cells that are infected with SARS-CoV-2. The key idea of PSPred-ALE lies in the use of a self-adaptive learning embedding algorithm, which enables the automatic extraction of contextual sequential features from protein sequences. In addition, the tool uses a multihead attention module that captures global information, further improving the accuracy of predictions. Comparative analysis of features demonstrated that the self-adaptive learning embedding features are superior to hand-crafted statistical features in capturing discriminative sequence information. Benchmarking comparison shows that PSPred-ALE outperforms the state-of-the-art prediction tools and achieves robust performance. Therefore, the proposed model can effectively identify phosphorylation sites, assisting biomedical scientists in understanding the mechanism of phosphorylation in SARS-CoV-2 infection. AVAILABILITY AND IMPLEMENTATION: PSPred-ALE is available at https://github.com/jiaoshihu/PSPred-ALE and Zenodo (https://doi.org/10.5281/zenodo.8330277).


Subjects
COVID-19 , Neural Networks, Computer , Humans , SARS-CoV-2 , Phosphorylation , Algorithms
6.
PLoS Comput Biol ; 19(12): e1011450, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38096269

ABSTRACT

Cancer is known as a heterogeneous disease. Cancer driver genes (CDGs) need to be inferred for understanding tumor heterogeneity in cancer. However, the existing computational methods have identified many common CDGs. A key challenge in exploring cancer progression is to infer cancer subtype-specific driver genes (CSDGs), which provide guidance for the diagnosis, treatment and prognosis of cancer. The significant advancements in single-cell RNA-sequencing (scRNA-seq) technologies have opened up new possibilities for studying human cancers at the individual cell level. In this study, we develop a novel unsupervised method, CSDGI (Cancer Subtype-specific Driver Gene Inference), which applies an encoder-decoder framework consisting of low-rank residual neural networks to infer driver genes corresponding to potential cancer subtypes at the single-cell level. To infer CSDGs, we apply CSDGI to tumor single-cell transcriptomics data. To filter out redundant genes before driver gene inference, we first identify differentially expressed genes (DEGs). The experimental results demonstrate that CSDGI is effective at inferring driver genes that are cancer subtype-specific. Functional and disease enrichment analysis shows that these inferred CSDGs indicate the key biological processes and disease pathways. CSDGI is the first method to explore cancer driver genes at the cancer subtype level. We believe that it can be a useful method for understanding the mechanisms of cell transformation driving tumours.


Subjects
Neoplasms , Oncogenes , Humans , Gene Expression Profiling , Neoplasms/genetics , Neoplasms/pathology , Cell Transformation, Neoplastic/genetics , Single-Cell Analysis/methods
7.
Methods ; 211: 61-67, 2023 03.
Article in English | MEDLINE | ID: mdl-36804215

ABSTRACT

Recent advances in multi-omics databases offer the opportunity to explore complex systems of cancers across hierarchical biological levels. Some methods have been proposed to identify the genes that play a vital role in disease development by integrating multi-omics. However, the existing methods identify the related genes separately, neglecting the gene interactions that are related to the multigenic disease. In this study, we develop a learning framework to identify the interactive genes based on multi-omics data including gene expression. Firstly, we integrate different omics based on their similarities and apply spectral clustering for cancer subtype identification. Then, a gene co-expression network is constructed for each cancer subtype. Finally, we detect the interactive genes in the co-expression network by learning the dense subgraphs based on the L1 properties of eigenvectors of the modularity matrix. We apply the proposed learning framework to a multi-omics cancer dataset to identify the interactive genes for each cancer subtype. The detected genes are examined with the DAVID and KEGG tools for systematic gene ontology enrichment analysis. The analysis results show that the detected genes are related to cancer development and that the genes in different cancer subtypes are related to different biological processes and pathways, which are expected to yield important references for understanding tumor heterogeneity and improving patient survival.


Subjects
Multiomics , Neoplasms , Humans , Neoplasms/genetics , Cluster Analysis , Databases, Factual
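
The abstract above detects interactive genes from eigenvectors of the modularity matrix of a co-expression network. As background, the sketch below builds the standard modularity matrix B, with B_ij = A_ij - k_i * k_j / (2m), from a symmetric adjacency matrix. It covers only this construction step; the eigenvector analysis and the dense-subgraph learning described in the abstract are not shown, and the function name is an assumption.

```python
def modularity_matrix(adj):
    """Return B with B[i][j] = A[i][j] - k_i * k_j / (2m).

    `adj` is a symmetric (optionally weighted) adjacency matrix given as
    a list of lists; k_i is the degree (row sum) of node i and 2m is the
    sum of all degrees.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)
    if two_m == 0:
        raise ValueError("graph has no edges; modularity matrix is undefined")
    return [
        [adj[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
        for i in range(n)
    ]
```

Every row of B sums to zero, so the all-ones vector is always an eigenvector with eigenvalue 0. Community structure shows up in the eigenvectors with positive eigenvalues.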
8.
Methods ; 219: 1-7, 2023 11.
Article in English | MEDLINE | ID: mdl-37689121

ABSTRACT

With the increasing availability of large-scale QSAR (Quantitative Structure-Activity Relationship) datasets, collaborative analysis has become a promising approach for drug discovery. Traditional centralized analysis, which typically concentrates data on a central server for training, faces challenges such as data privacy and security. Distributed analysis such as federated learning offers a solution by enabling collaborative model training without sharing raw data. However, it may fail when the training data on the local devices are not independent and identically distributed (non-IID). In this paper, we propose a novel framework for collaborative drug discovery using federated learning on non-IID datasets. We address the difficulty of training on non-IID data by globally sharing a small subset of data among all institutions. Our framework allows multiple institutions to jointly train a robust predictive model while preserving the privacy of their individual data. We leverage the federated learning paradigm to distribute the model training process across local devices, eliminating the need for data exchange. The experimental results on 15 benchmark datasets demonstrate that the proposed method achieves predictive accuracy competitive with centralized analysis while respecting data privacy. Moreover, our framework offers benefits such as reduced data transmission and enhanced scalability, making it suitable for large-scale collaborative drug discovery efforts.


Subjects
Benchmarking , Drug Discovery
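
The abstract above relies on federated learning, in which institutions train locally and only model parameters are combined. The sketch below shows the widely used federated averaging step, where each client's parameters are weighted by its number of training samples. This is a generic illustration; the abstract does not specify the aggregation rule the authors use, and the function and argument names are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples per client.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

For example, two clients holding parameters `[1.0, 2.0]` and `[3.0, 4.0]` with 1 and 3 samples give the global parameters `[2.5, 3.5]`. When client data are non-IID, local models can drift apart between rounds, which is the failure mode the abstract says a small globally shared subset is meant to reduce.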
9.
BMC Biol ; 21(1): 93, 2023 04 24.
Article in English | MEDLINE | ID: mdl-37095510

ABSTRACT

BACKGROUND: RNA 5-methyluridine (m5U) modifications are obtained by methylation at the C5 position of uridine catalyzed by pyrimidine methylation transferase, and are related to the development of human diseases. Accurate identification of m5U modification sites from RNA sequences can contribute to the understanding of their biological functions and the pathogenesis of related diseases. Compared to traditional experimental methods, easy-to-use computational methods based on machine learning can identify modification sites from RNA sequences in an efficient and time-saving manner. Despite the good performance of these computational methods, there are some drawbacks and limitations. RESULTS: In this study, we have developed a novel predictor, m5U-SVM, based on multi-view features and machine learning algorithms to construct predictive models for identifying m5U modification sites from RNA sequences. In this method, we used four traditional physicochemical features and distributed representation features. The optimized multi-view features were obtained from the four fused traditional physicochemical features by using the two-step LightGBM and IFS methods, and then the distributed representation features were fused with the optimized physicochemical features to obtain the new multi-view features. The best performing classifier, a support vector machine, was identified by screening different machine learning algorithms. The comparison results show that the proposed model outperforms the existing state-of-the-art tool. CONCLUSIONS: m5U-SVM provides an effective tool that successfully captures sequence-related attributes of modifications and can accurately predict m5U modification sites from RNA sequences. The identification of m5U modification sites helps to understand and delve into the related biological processes and functions.


Subjects
RNA , Support Vector Machine , Humans , Algorithms , Methylation , Computational Biology/methods
10.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33454758

ABSTRACT

Over the past decades, learning to rank (LTR) algorithms have been gradually applied to bioinformatics. Such methods have shown significant advantages in multiple research tasks in this field. Therefore, it is necessary to summarize and discuss the application of these algorithms so that they can be used conveniently and contribute to bioinformatics. In this paper, the characteristics of LTR algorithms and their strengths over other types of algorithms are analyzed from the perspective of their multiple applications in bioinformatics. Finally, the paper further discusses the shortcomings of the LTR algorithms, the methods and means to better use the algorithms and some open problems that currently exist.


Subjects
Algorithms , Computational Biology/methods , DNA/chemistry , Drugs, Investigational/pharmacology , Proteins/chemistry , Software , Amino Acid Sequence , DNA/genetics , DNA/metabolism , Drug Discovery , Drugs, Investigational/chemical synthesis , Humans , Protein Domains , Protein Structure, Secondary , Proteins/genetics , Proteins/metabolism , Sequence Homology, Amino Acid
11.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33822870

ABSTRACT

MOTIVATION: Peptides have recently emerged as promising therapeutic agents against various diseases. For both research and safety regulation purposes, it is of high importance to develop computational methods to accurately predict the potential toxicity of peptides within the vast number of candidate peptides. RESULTS: In this study, we proposed ATSE, a peptide toxicity predictor that exploits structural and evolutionary information based on graph neural networks and an attention mechanism. More specifically, it consists of four modules: (i) a sequence processing module for converting peptide sequences to molecular graphs and evolutionary profiles, (ii) a feature extraction module designed to learn discriminative features from graph structural information and evolutionary information, (iii) an attention module employed to optimize the features and (iv) an output module determining a peptide as toxic or non-toxic, using optimized features from the attention module. CONCLUSION: Comparative studies demonstrate that the proposed ATSE significantly outperforms all other competing methods. We found that structural information is complementary to the evolutionary information, effectively improving the predictive performance. Importantly, the data-driven features learned by ATSE can be interpreted and visualized, providing additional information for further analysis. Moreover, we present a user-friendly online computational platform that implements the proposed ATSE, which is now available at http://server.malab.cn/ATSE. We expect that it can be a powerful and useful tool for interested researchers.


Subjects
Computational Biology/methods , Machine Learning , Neural Networks, Computer , Peptides/toxicity , Software , Databases, Protein , Datasets as Topic , Evolution, Molecular , Humans , Peptides/chemistry
12.
Bioinformatics ; 38(7): 1964-1971, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35134828

ABSTRACT

MOTIVATION: Drug-target interaction prediction plays an important role in new drug discovery and drug repurposing. Binding affinity indicates the strength of drug-target interactions. Predicting drug-target binding affinity is expected to provide promising candidates for biologists, which can effectively reduce the workload of wet laboratory experiments and speed up the entire process of drug research. Given that numerous new proteins are sequenced and compounds are synthesized, several improved computational methods have been proposed for such predictions, but there are still some challenges. (i) Many methods only discuss and implement one application scenario; they focus on drug repurposing and ignore the discovery of new drugs and targets. (ii) Many methods do not consider the priority order of proteins (or drugs) related to each target drug (or protein). Therefore, it is necessary to develop a comprehensive method that can be used in multiple scenarios and focuses on candidate order. RESULTS: In this study, we propose a method called NerLTR-DTA that uses the neighbor relationship of similarity and sharing to extract features, and applies a ranking framework with regression attributes to predict affinity values and priority order of query drug (or query target) and its related proteins (or compounds). It is worth noting that using the characteristics of learning to rank to set different queries can smartly realize the multi-scenario application of the method, including the discovery of new drugs and new targets. Experimental results on two commonly used datasets show that NerLTR-DTA outperforms some state-of-the-art competing methods. NerLTR-DTA achieves excellent performance in all application scenarios mentioned in this study, and the r_m^2(test) values indicate that such performance is not obtained by chance. Moreover, it can be concluded that NerLTR-DTA can provide accurate ranking lists for the relevant results of most queries through the statistics of the association relationship of each query drug (or query protein). In general, NerLTR-DTA is a powerful tool for predicting drug-target associations and can contribute to new drug discovery and drug repurposing. AVAILABILITY AND IMPLEMENTATION: The proposed method is implemented in Python and Java. Source codes and datasets are available at https://github.com/RUXIAOQING964914140/NerLTR-DTA.


Subjects
Algorithms , Software , Drug Development/methods , Drug Discovery/methods , Drug Repositioning , Proteins/chemistry
13.
Bioinformatics ; 38(6): 1514-1524, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34999757

ABSTRACT

MOTIVATION: Recently, peptides have emerged as a promising class of pharmaceuticals for the treatment of various diseases, poised between traditional small molecule drugs and therapeutic proteins. However, one of the key bottlenecks preventing their use as therapeutics is their toxicity toward human cells, and few available algorithms for predicting toxicity are specially designed for short-length peptides. RESULTS: We present ToxIBTL, a novel deep learning framework that utilizes the information bottleneck principle and transfer learning to predict the toxicity of peptides as well as proteins. Specifically, we use evolutionary information and physicochemical properties of peptide sequences and integrate the information bottleneck principle into a feature representation learning scheme, by which relevant information is retained and redundant information is minimized in the obtained features. Moreover, transfer learning is introduced to transfer the common knowledge contained in proteins to peptides, which aims to improve the feature representation capability. Extensive experimental results demonstrate that ToxIBTL not only achieves a higher prediction performance than state-of-the-art methods on the peptide dataset, but also has a competitive performance on the protein dataset. Furthermore, a user-friendly online web server is established as the implementation of the proposed ToxIBTL. AVAILABILITY AND IMPLEMENTATION: The proposed ToxIBTL and data are freely accessible at http://server.wei-group.net/ToxIBTL. Our source code is available at https://github.com/WLYLab/ToxIBTL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Machine Learning , Peptides , Humans , Proteins , Software , Algorithms
14.
Brief Bioinform ; 21(1): 11-23, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30239616

ABSTRACT

Cell-penetrating peptides (CPPs) have been shown to be a transport vehicle for delivering cargoes into live cells, offering great potential as future therapeutics. It is essential to identify CPPs for better understanding of their functional mechanisms. Machine learning-based methods have recently emerged as a main approach for computational identification of CPPs. However, one of the main challenges is to propose an effective feature representation model that sufficiently exploits the inner difference and relevance between CPPs and non-CPPs, in order to improve the predictive performance. In this paper, we have developed CPPred-FL, a powerful bioinformatics tool for fast, accurate and large-scale identification of CPPs. In our predictor, we introduce a new feature representation learning scheme that enables one to learn feature representations from a total of 45 well-trained random forest models with multiple feature descriptors from different perspectives, such as compositional information, position-specific information and physicochemical properties. We integrate class and probabilistic information into our feature representations. To improve the feature representation ability, we further remove redundant and irrelevant features by feature space optimization. Benchmarking experiments showed that CPPred-FL, using only 19 informative features, is able to achieve better performance than the state-of-the-art predictors. We anticipate that CPPred-FL will be a powerful tool for large-scale identification of CPPs, facilitating the characterization of their functional mechanisms and accelerating their applications in clinical therapy.

15.
J Biomed Inform ; 128: 104049, 2022 04.
Article in English | MEDLINE | ID: mdl-35283266

ABSTRACT

Renal cell carcinoma (RCC) is one of the deadliest cancers and mainly consists of three subtypes: kidney clear cell carcinoma (KIRC), kidney papillary cell carcinoma (KIRP), and kidney chromophobe (KICH). Gene signature identification plays an important role in the precise classification of RCC subtypes and personalized treatment. However, most of the existing gene selection methods focus on statically selecting the same informative genes for each subtype, and fail to consider the heterogeneity of patients, which causes pattern differences in each subtype. In this work, to explore different informative gene subsets for each subtype, we propose a novel gene selection method, named sequential reinforcement active feature learning (SRAFL), which dynamically acquires different genes in each sample to identify the different gene signatures for each subtype. The proposed SRAFL method combines the cancer subtype classifier with a reinforcement learning (RL) agent, which sequentially selects the active genes in each sample from three mixed RCC subtypes in a cost-sensitive manner. Moreover, module-based gene filtering is run before gene selection to filter out redundant genes. We mainly evaluate the proposed SRAFL method based on mRNA and long non-coding RNA (lncRNA) expression profiles of RCC datasets from The Cancer Genome Atlas (TCGA). The experimental results demonstrate that the proposed method can automatically identify different gene signatures for different subtypes to accurately classify RCC subtypes. More importantly, we show here for the first time that the proposed SRAFL method can consider the heterogeneity of samples to select different gene signatures for different RCC subtypes, which shows more potential for precision-based RCC care in the future.


Subjects
Carcinoma, Renal Cell , Kidney Neoplasms , Carcinoma, Renal Cell/diagnosis , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/metabolism , Genome , Humans , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Kidney Neoplasms/metabolism , RNA, Messenger
16.
Sensors (Basel) ; 22(20)2022 Oct 18.
Article in English | MEDLINE | ID: mdl-36298289

ABSTRACT

The Tactile Internet enables physical touch to be transmitted over the Internet. In the context of electronic medicine, an authenticated key agreement for the Tactile Internet allows surgeons to perform operations via robotic systems and receive tactile feedback from remote patients. The fifth generation of networks has completely changed the network space and has increased the efficiency of the Tactile Internet with its ultra-low latency, high data rates, and reliable connectivity. However, inappropriate and insecure authenticated key agreements for the Tactile Internet may cause misjudgment and improper operation by medical staff, endangering the life of patients. In 2021, Kamil et al. developed a novel and lightweight authenticated key agreement scheme that is suitable for remote surgery applications in the Tactile Internet environment. However, their scheme directly encrypts communication messages with constant secret keys and directly stores secret keys in the verifier table, making the scheme vulnerable to possible attacks. Therefore, in this investigation, we discuss the limitations of the scheme proposed by Kamil et al. and present an enhanced scheme. The enhanced scheme uses a one-time key to protect communication messages, while the verifier table is protected with a secret gateway key to mitigate the mentioned limitations. The enhanced scheme is proven secure against possible attacks, providing more security functionalities than similar schemes and retaining a lightweight computational cost.


Subjects
Computer Security , Telemedicine , Humans , Confidentiality , Touch , Internet
17.
Entropy (Basel) ; 24(9)2022 Aug 23.
Article in English | MEDLINE | ID: mdl-36141058

ABSTRACT

The publication of trajectory data provides critical information for various location-based services, and it is critical to publish trajectory data safely while ensuring its availability. Differential privacy is a promising privacy protection technology for publishing trajectory data securely. Most of the existing trajectory privacy protection schemes do not take into account the user's preference for location and the influence of semantic location. Besides, differential privacy for trajectory protection still has the problem of balance between the privacy budget and service quality. In this paper, a semantics- and prediction-based differential privacy protection scheme for trajectory data is proposed. Firstly, trajectory data are transformed into a prefix tree structure to ensure that they satisfy differential privacy. Secondly, considering the influence of semantic location on trajectory, semantic sensitivity combined with location check-in frequency is used to calculate the sensitivity of each position in the trajectory. The privacy level of the position is classified by setting thresholds. Moreover, the corresponding privacy budget is allocated according to the location privacy level. Finally, a Markov chain is used to predict the attack probability of each position in the trajectory. On this basis, the allocation of the privacy budget is further adjusted and its utilization rate is improved. Thus, the problem of the balance between the privacy budget and service quality is solved. Experimental results show that the proposed scheme is able to ensure data availability while protecting data privacy.
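
The abstract above allocates a differential privacy budget to each position according to its privacy level. The sketch below illustrates that idea with the Laplace mechanism: positions at a more sensitive level receive a smaller share of the total budget and therefore more noise. The three level names, their weights, and all function names are hypothetical; the paper's actual allocation rule, prefix tree, and Markov-chain adjustment are not reproduced here.

```python
import random

# Hypothetical weights: a more sensitive level gets a smaller budget share.
LEVEL_WEIGHTS = {"high": 1, "medium": 2, "low": 3}


def allocate_budget(levels, total_epsilon):
    """Split total_epsilon across positions in proportion to level weight."""
    weights = [LEVEL_WEIGHTS[level] for level in levels]
    weight_sum = sum(weights)
    return [total_epsilon * w / weight_sum for w in weights]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, levels, total_epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity / epsilon_i to each count."""
    rng = random.Random(seed)
    budgets = allocate_budget(levels, total_epsilon)
    return [
        count + laplace_noise(sensitivity / eps, rng)
        for count, eps in zip(counts, budgets)
    ]
```

Because the per-position budgets sum to `total_epsilon`, sequential composition bounds the overall privacy loss by that total.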

18.
Int J Med Sci ; 16(7): 949-959, 2019.
Article in English | MEDLINE | ID: mdl-31341408

ABSTRACT

Background: In recent years, the development and diagnosis of secondary cancer have become the primary concern of cancer survivors. A number of studies have developed strategies to extract knowledge from clinical data, aiming to identify important risk factors that can be used to prevent the recurrence of diseases. However, these studies do not focus on secondary cancer, which lacks strategies for clinical treatment as well as for identifying risk factors to prevent its occurrence. Methods: We propose an effective ensemble feature learning method to identify the risk factors for predicting secondary cancer by considering class imbalance and patient heterogeneity. We first divide the patients into heterogeneous groups using spectral clustering. In each group, we apply an oversampling method to balance the number of samples in each class and use them as training data for ensemble feature learning. The purpose of ensemble feature learning is to identify the risk factors and construct a diagnosis model for each group. The importance of risk factors is measured based on the properties of patients in each group separately. We predict secondary cancer by assigning the patient to the corresponding group and applying that group's diagnosis model. Results: Analysis of the results shows that, among the three classifiers, the decision tree obtains the best results for predicting secondary cancer. The best results of the decision tree are an AUC of 0.72 when dividing the patients into 15 groups and an F1 score of 0.38 when dividing the patients into 20 groups. In terms of AUC, the decision tree achieves a 67.4% improvement compared to using all 20 predictor variables and a 28.6% improvement compared to no group division. In terms of F1 score, the decision tree achieves a 216.7% improvement compared to using all 20 predictor variables and an 80.9% improvement compared to no group division. Different groups provide different ranking results for the predictor variables. Conclusion: The accuracies of predicting secondary cancer using k-nearest neighbor, decision tree, and support vector machine classifiers increased after using the selected important risk factors as predictors. Dividing patients into groups and predicting secondary cancer with separate models can further improve the prediction accuracies. The information discovered in the experiments can provide important references to the personality and clinical symptom representations on all phases of guide interventions, with the complexities of multiple symptoms associated with secondary cancer in all phases of the recurrent trajectory.
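
The per-group balancing step described in the Methods can be illustrated with a minimal sketch. The abstract does not name the oversampling method, so plain random oversampling with replacement is assumed here, and the group labels are taken as given (in the paper they come from spectral clustering). The function name `oversample_per_group` and the `(group_id, features, label)` layout are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def oversample_per_group(samples, seed=0):
    """Balance class counts inside each patient group by random oversampling.

    samples: list of (group_id, features, label) tuples (hypothetical layout).
    Returns a dict mapping group_id to a list of (features, label) rows in
    which every class present in the group has the same number of rows.
    """
    rng = random.Random(seed)

    # Bucket the feature vectors by group, then by class label.
    by_group = defaultdict(lambda: defaultdict(list))
    for group_id, features, label in samples:
        by_group[group_id][label].append(features)

    balanced = {}
    for group_id, by_class in by_group.items():
        # Every class is grown to the size of the largest class in this group.
        target = max(len(rows) for rows in by_class.values())
        out = []
        for label, rows in by_class.items():
            # Assumption: duplicates are drawn uniformly with replacement.
            extra = [rng.choice(rows) for _ in range(target - len(rows))]
            out.extend((x, label) for x in rows + extra)
        balanced[group_id] = out
    return balanced
```

Each balanced group would then serve as the training data for that group's own model, matching the abstract's description of one diagnosis model per group.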


Subjects
Cancer Survivors/statistics & numerical data , Data Analysis , Models, Biological , Neoplasms, Second Primary/diagnosis , Datasets as Topic , Decision Trees , Feasibility Studies , Humans , Neoplasms, Second Primary/epidemiology , Prognosis , Risk Assessment/methods , Risk Factors , Support Vector Machine
19.
Comput Biol Med ; 171: 108181, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38428094

ABSTRACT

In the field of drug discovery and pharmacology research, precise and rapid prediction of drug-target binding affinity (DTA) and drug-drug interaction (DDI) is essential for drug efficacy and safety. However, pharmacological data are often distributed across different institutions. Moreover, due to concerns regarding data privacy and intellectual property, the sharing of pharmacological data is often restricted. It is difficult for institutions to achieve the desired performance using only their own data. This urgent challenge calls for a solution that not only enhances collaboration between multiple institutions to improve prediction accuracy but also safeguards data privacy. In this study, we propose a novel federated learning (FL) framework to advance the prediction of DTA and DDI, namely FL-DTA and FL-DDI. The proposed framework enables multiple institutions to collaboratively train a predictive model without the need to share their local data. Moreover, to ensure data privacy, we employ secure multi-party computation (MPC) during the federated learning model aggregation phase. We evaluated the proposed method on two DTA benchmark datasets and one DDI benchmark dataset and compared it with centralized learning and local learning. The experimental results indicate that the proposed method performs close to centralized learning and significantly outperforms local learning. Moreover, the proposed framework ensures data security while promoting collaboration among institutions, thereby accelerating the drug discovery process.
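
The idea of aggregating model updates without exposing any single institution's contribution can be illustrated with a minimal sketch. The abstract does not specify the aggregation rule or the MPC protocol, so this assumes a size-weighted average (FedAvg-style) and a simplified pairwise additive masking scheme. All function names are hypothetical, and this is not the authors' implementation.

```python
import random


def fed_avg(client_weights, client_sizes):
    """Plain size-weighted average of client parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def masked_updates(updates, seed=0):
    """Hide each update behind pairwise masks that cancel in the sum.

    For every client pair (a, b), client a adds a random mask and client b
    subtracts the same mask. Assumption: a shared seed stands in for the
    pairwise key agreement a real protocol would use.
    """
    rng = random.Random(seed)
    k, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for a in range(k):
        for b in range(a + 1, k):
            mask = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for i in range(dim):
                masked[a][i] += mask[i]
                masked[b][i] -= mask[i]
    return masked


def secure_fed_avg(client_weights, client_sizes, seed=0):
    """Server-side average computed only from masked, size-scaled updates."""
    total = sum(client_sizes)
    # Each client scales its own weights by its local sample count.
    scaled = [[x * n for x in w] for w, n in zip(client_weights, client_sizes)]
    masked = masked_updates(scaled, seed)
    # The server sums the masked vectors; the masks cancel pairwise.
    summed = [sum(column) for column in zip(*masked)]
    return [s / total for s in summed]
```

In this sketch the server sees only masked vectors, yet `secure_fed_avg` returns the same result as `fed_avg` up to floating-point error, because every mask is added once and subtracted once.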


Subjects
Benchmarking , Learning , Drug Delivery Systems , Drug Discovery
20.
Comput Biol Med ; 168: 107762, 2024 01.
Article in English | MEDLINE | ID: mdl-38056212

ABSTRACT

Antibiotic resistance continues to be a growing concern for global health, accentuating the need for novel antibiotic discoveries. Traditional methodologies in this field have relied heavily on extensive experimental screening, which is often time-consuming and costly. In contrast, computer-assisted drug screening offers rapid, cost-effective solutions. In this work, we propose FIAMol-AB, a deep learning model that combines graph neural networks, text convolutional networks and molecular fingerprint techniques. The method also uses an attention mechanism to fuse multiple forms of information within the model. The experiments show that FIAMol-AB may offer potential advantages in antibiotic discovery tasks over some existing methods. We conducted some analysis based on our model's results, which helps highlight the potential significance of certain features in the model's predictive performance. Compared to different models, ours demonstrates promising results, indicating potential robustness and versatility. This suggests that by integrating multi-view information and attention mechanisms, FIAMol-AB might better learn complex molecular structures, potentially improving the precision and efficiency of antibiotic discovery. We hope that FIAMol-AB can serve as a useful method in the ongoing fight against antibiotic resistance.


Subjects
Deep Learning , Anti-Bacterial Agents/pharmacology , Drug Evaluation, Preclinical , Neural Networks, Computer