1.
Methods ; 131: 83-92, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-28694066

ABSTRACT

Protein-protein interaction (PPI) networks play an important role in studying the functional roles of proteins, including their association with diseases. However, PPI networks are not sufficient without the support of additional biological knowledge about proteins, such as their molecular functions and biological processes. To complement and enrich PPI networks, we propose to exploit biological properties of individual proteins. More specifically, we integrate keywords describing protein properties into the PPI network and construct a novel PPI-Keywords (PPIK) network consisting of both proteins and keywords as two different types of nodes. As disease proteins tend to have similar topological characteristics on the PPIK network, we further propose to represent proteins with metagraphs. Unlike a traditional network motif or subgraph, a metagraph can capture a particular topological arrangement involving the interactions/associations between both proteins and keywords. Based on the novel metagraph representations for proteins, we further build classifiers for disease protein classification through supervised learning. Our experiments on three different PPI databases demonstrate that the proposed method consistently improves disease protein prediction across various classifiers, by 15.3% in AUC on average. It outperforms the baselines, including the diffusion-based methods (e.g., RWR) and the module-based methods, by 13.8-32.9% for overall disease protein prediction. For predicting breast cancer genes, it outperforms RWR, PRINCE and the module-based baselines by 6.6-14.2%. Finally, our predictions also turn out to correlate better with literature findings from PubMed.
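
The PPIK construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the protein names, keyword annotations, and the single metagraph pattern counted here (two interacting proteins that share a keyword) are all assumptions made for the example, and the abstract does not say which metagraphs the paper actually uses.

```python
# Toy data (hypothetical): protein names, interactions, and keywords are made up.
ppi_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P3")]
keywords = {
    "P1": {"kinase", "membrane"},
    "P2": {"kinase"},
    "P3": {"membrane", "transport"},
    "P4": {"transport"},
}


def build_ppik(ppi_edges, keywords):
    """Build an adjacency map over typed nodes: ("protein", name) or ("keyword", name)."""
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Protein-protein interaction edges.
    for a, b in ppi_edges:
        add_edge(("protein", a), ("protein", b))
    # Protein-keyword association edges.
    for prot, kws in keywords.items():
        for kw in kws:
            add_edge(("protein", prot), ("keyword", kw))
    return adj


def count_shared_keyword_pattern(adj, protein):
    """Count instances of one assumed metagraph around `protein`:
    the protein interacts with a neighbor protein and both link to the same keyword."""
    node = ("protein", protein)
    count = 0
    for nb in adj.get(node, ()):
        if nb[0] != "protein":
            continue
        # Common neighbors of the two proteins, restricted to keyword nodes.
        shared = {n for n in adj[node] & adj[nb] if n[0] == "keyword"}
        count += len(shared)
    return count


adj = build_ppik(ppi_edges, keywords)
features = {p: count_shared_keyword_pattern(adj, p) for p in keywords}
print(features)
```

Counts of several such patterns per protein would form a feature vector that a supervised classifier could consume, which is the general idea the abstract describes.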


Subjects
Computational Biology/methods , Disease/genetics , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Proteins/classification , Humans , Proteins/genetics , Proteins/metabolism , Supervised Machine Learning
2.
Bioinformatics ; 31(11): 1701-7, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25630377

ABSTRACT

MOTIVATION: Genome-wide association studies (GWASs) are commonly applied to human genomic data to understand the causal gene combinations statistically connected to certain diseases. Patients involved in these GWASs could be re-identified when the studies release statistical information on a large number of single-nucleotide polymorphisms. Subsequent work, however, found that such privacy attacks are theoretically possible but unsuccessful and unconvincing in real settings. RESULTS: We derive the first practical privacy attack that can successfully identify specific individuals from limited published associations from the Wellcome Trust Case Control Consortium (WTCCC) dataset. For GWAS results computed over 25 randomly selected loci, our algorithm always pinpoints at least one patient from the WTCCC dataset. Moreover, the number of re-identified patients grows rapidly with the number of published genotypes. Finally, we discuss prevention methods to disable the attack, thus providing a solution for enhancing patient privacy. AVAILABILITY AND IMPLEMENTATION: Proofs of the theorems and additional experimental results are available in the online supporting documents. The code for the attack algorithm is publicly available at https://sites.google.com/site/zhangzhenjie/GWAS_attack.zip. The genomic dataset used in the experiments is available at http://www.wtccc.org.uk/ on request.


Subjects
Algorithms , Genetic Privacy , Genome-Wide Association Study , Genome, Human , Genotype , Humans , Polymorphism, Single Nucleotide
3.
Neural Comput Appl ; 34(16): 13355-13369, 2022.
Article in English | MEDLINE | ID: mdl-35677085

ABSTRACT

This paper investigates the problem of forecasting multivariate aggregated human mobility while preserving the privacy of the individuals concerned. Differential privacy, a state-of-the-art formal notion, has been used as the privacy guarantee in two different and independent steps when training deep learning models. On one hand, we considered gradient perturbation, which uses the differentially private stochastic gradient descent algorithm to guarantee the privacy of each time series sample in the learning stage. On the other hand, we considered input perturbation, which adds differential privacy guarantees to each sample of the series before applying any learning. We compared four state-of-the-art recurrent neural networks: Long Short-Term Memory, Gated Recurrent Unit, and their Bidirectional architectures, i.e., Bidirectional-LSTM and Bidirectional-GRU. Extensive experiments were conducted with a real-world multivariate mobility dataset, which we published openly along with this paper. As shown in the results, differentially private deep learning models trained under gradient or input perturbation achieve nearly the same performance as non-private deep learning models, with loss in performance varying between 0.57 and 2.8%. The contribution of this paper is significant for those involved in urban planning and decision-making, providing a solution to the human mobility multivariate forecast problem through differentially private deep learning models.
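
The input perturbation step mentioned in this abstract can be illustrated with a short sketch. The abstract does not name the noise mechanism, so the Laplace mechanism used here is an assumption, as are the function names, the sensitivity of 1 (one individual changes an aggregated count by at most one), and the toy series. Gradient perturbation (DP-SGD) is not shown.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_series(series, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity/epsilon to every sample.

    Assumption: each value is an aggregated count with the given sensitivity.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in series]


# Hypothetical hourly mobility counts for one region.
counts = [120, 135, 160, 150, 90]
noisy_counts = perturb_series(counts, epsilon=1.0, seed=0)
print(noisy_counts)
```

A forecasting model would then be trained on `noisy_counts` instead of the raw series. A smaller `epsilon` gives a larger noise scale, so the privacy-utility trade-off is controlled by that one parameter in this sketch.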

4.
Article in English | MEDLINE | ID: mdl-32545399

ABSTRACT

The accurate prediction of ambulance demand provides great value to emergency service providers and people living within a city. It supports the rational and dynamic allocation of ambulances and hospital staffing, and ensures patients have timely access to such resources. However, this task has been challenging due to complex multi-nature dependencies and nonlinear dynamics within ambulance demand, such as spatial characteristics involving the region of the city for which the demand is estimated, short- and long-term historical demands, as well as the demographics of a region. Machine learning techniques are thus useful to quantify these characteristics of ambulance demand. However, there is generally a lack of studies that use machine learning tools for a comprehensive modeling of the important demand dependencies to predict ambulance demands. In this paper, a novel approach that leverages machine learning tools and extraction of features based on the multi-nature insights of ambulance demands is proposed. We experimentally evaluate the performance of next-day demand prediction across several state-of-the-art machine learning techniques and ambulance demand prediction methods, using real-world ambulatory and demographic datasets obtained from Singapore. We also provide an analysis of this ambulatory dataset and demonstrate the accuracy in modeling dependencies of different natures using various machine learning techniques.
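
The short- and long-term historical dependencies mentioned in this abstract are often encoded as lag features. The sketch below shows one way to do that for next-day prediction. The abstract does not list the paper's actual features, so the three features, the function name, and the toy counts are hypothetical.

```python
def lag_features(daily_counts, day):
    """Build features for predicting demand on index `day` from earlier days only.

    Hypothetical feature set: previous day, same weekday one week earlier,
    and the mean of the preceding seven days.
    """
    if day < 7:
        raise ValueError("need at least 7 days of history")
    window = daily_counts[day - 7:day]
    return {
        "prev_day": daily_counts[day - 1],
        "same_weekday_last_week": daily_counts[day - 7],
        "mean_last_7_days": sum(window) / 7,
    }


# Made-up daily ambulance call counts for one region (index 0 is the oldest day).
counts = [12, 15, 11, 14, 13, 18, 20, 13, 16]
features = lag_features(counts, day=8)
print(features)
```

Spatial and demographic attributes of the region would be appended to the same feature dictionary before it is passed to a regression model.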


Subjects
Algorithms , Ambulances , Emergency Medical Services , Health Services Needs and Demand , Adult , Aged , Female , Humans , Machine Learning , Male , Middle Aged , Pregnancy , Singapore
5.
Methods Mol Biol ; 1807: 211-224, 2018.
Article in English | MEDLINE | ID: mdl-30030814

ABSTRACT

This chapter is based on exploiting metagraphs, network-based representations of proteins in protein-protein interaction networks, to identify candidate disease-causing proteins. Protein-protein interaction (PPI) networks are effective tools in studying the functional roles of proteins in the development of various diseases. However, they are insufficient without the support of additional biological knowledge about proteins, such as their molecular functions and biological processes. To enhance PPI networks, we utilize biological properties of individual proteins as well. More specifically, we integrate keywords from the UniProt database describing protein properties into the PPI network and construct a novel heterogeneous PPI-Keyword (PPIK) network consisting of both proteins and keywords. As proteins with similar functional duties or involved in the same metabolic pathway tend to have similar topological characteristics, we propose to represent them with metagraphs. Compared to the traditional network motif or subgraph, a metagraph can capture the topological arrangements through not only the protein-protein interactions but also protein-keyword associations. We feed those novel metagraph representations into classifiers for disease protein prediction and conduct our experiments on three different PPI databases. They show that the proposed method consistently increases disease protein prediction performance across various classifiers, by 15.3% in AUC on average. It outperforms the diffusion-based (e.g., RWR) and the module-based baselines by 13.8-32.9% in overall disease protein prediction. For breast cancer protein prediction, it outperforms RWR, PRINCE, and the module-based baselines by 6.6-14.2%. Finally, our predictions also exhibit better correlations with literature findings from the PubMed database.
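
For context on the diffusion-based baseline named in this abstract, random walk with restart (RWR) can be sketched in a few lines. This is the generic textbook iteration, not the specific baseline configuration used in the chapter: the restart probability, the toy path graph, and the function name are assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    `adj` maps each node to a set of neighbors (undirected graph);
    W spreads a node's score evenly over its neighbors.
    Assumes every node has at least one neighbor, so probability mass is conserved.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if not adj[u]:
                continue
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy path graph A - B - C - D with one known disease protein "A" (made-up data).
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(graph, seeds={"A"})
print(sorted(scores, key=scores.get, reverse=True))
```

Candidate proteins are ranked by their steady-state score, so proteins closer to known disease proteins in the network rank higher.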


Subjects
Algorithms , Computational Biology/methods , Disease/genetics , Data Mining , Humans , Protein Interaction Mapping , Publications