Results 1 - 6 of 6
1.
ArXiv ; 2023 Jan 30.
Article in English | MEDLINE | ID: mdl-37292471

ABSTRACT

As machine learning (ML) algorithms are increasingly used in high-stakes applications, concerns have arisen that they may be biased against certain social groups. Although many approaches have been proposed to make ML models fair, they typically rely on the assumption that data distributions in training and deployment are identical. Unfortunately, this assumption is commonly violated in practice, and a model that is fair during training may lead to unexpected outcomes during deployment. Although the problem of designing robust ML models under dataset shifts has been widely studied, most existing works focus only on the transfer of accuracy. In this paper, we study the transfer of both fairness and accuracy under domain generalization, where the data at test time may be sampled from never-before-seen domains. We first develop theoretical bounds on the unfairness and expected loss at deployment, and then derive sufficient conditions under which fairness and accuracy can be perfectly transferred via invariant representation learning. Guided by this, we design a learning algorithm such that fair ML models learned with training data still have high fairness and accuracy when deployment environments change. Experiments on real-world data validate the proposed algorithm. Model implementation is available at https://github.com/pth1993/FATDM.
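
The gap between fairness at training time and fairness at deployment can be illustrated with a toy measurement. The sketch below is not taken from the paper or from the FATDM repository. It assumes demographic parity difference as the unfairness measure (the abstract does not name a specific fairness notion), and the function names and the prediction data are hypothetical.

```python
from typing import Sequence


def positive_rate(preds: Sequence[int]) -> float:
    """Fraction of positive (1) predictions; 0.0 for an empty group."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_gap(preds: Sequence[int], groups: Sequence[int]) -> float:
    """Absolute difference in positive-prediction rate between group 0 and group 1.

    Assumption: a binary sensitive attribute encoded as 0/1.
    """
    g0 = [p for p, g in zip(preds, groups) if g == 0]
    g1 = [p for p, g in zip(preds, groups) if g == 1]
    return abs(positive_rate(g0) - positive_rate(g1))


# Hypothetical predictions of the same classifier on two domains.
train_preds, train_groups = [1, 0, 1, 0], [0, 0, 1, 1]
deploy_preds, deploy_groups = [1, 1, 0, 0], [0, 0, 1, 1]

train_gap = demographic_parity_gap(train_preds, train_groups)
deploy_gap = demographic_parity_gap(deploy_preds, deploy_groups)
print(f"training gap: {train_gap:.2f}, deployment gap: {deploy_gap:.2f}")
```

In this made-up example the classifier treats both groups identically on the training domain but not on the shifted domain, which is the kind of failure the abstract describes.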

2.
Knowl Inf Syst ; 65(4): 1487-1521, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36998311

ABSTRACT

In the healthcare domain, complication risk profiling, which can be seen as a set of multiple clinical risk prediction tasks, is challenging due to the complex interaction between heterogeneous clinical entities. With the availability of real-world data, many deep learning methods have been proposed for complication risk profiling. However, the existing methods face three open challenges. First, they leverage clinical data from a single view, which leads to suboptimal models. Second, most existing methods lack an effective mechanism to interpret predictions. Third, models learned from clinical data may have inherent pre-existing biases and exhibit discrimination against certain social groups. We therefore propose a multi-view multi-task network (MuViTaNet) to tackle these issues. MuViTaNet complements patient representation by using a multi-view encoder to exploit more information. Moreover, it uses multi-task learning to generate more generalized representations using both labeled and unlabeled datasets. Last, a fairness variant (F-MuViTaNet) is proposed to mitigate the unfairness issues and promote healthcare equity. The experiments show that MuViTaNet outperforms existing methods for cardiac complication profiling. Its architecture also provides an effective mechanism for interpreting the predictions, which helps clinicians discover the underlying mechanism triggering the onset of complications. F-MuViTaNet can also effectively mitigate the unfairness with only negligible impact on accuracy.

3.
Proc SIAM Int Conf Data Min ; 2022: 720-728, 2022.
Article in English | MEDLINE | ID: mdl-35509686

ABSTRACT

De novo molecular design is a key challenge in drug discovery due to the complexity of chemical space. With the availability of molecular datasets and advances in machine learning, many deep generative models have been proposed for generating novel molecules with desired properties. However, most of the existing models focus only on molecular distribution learning and target-based molecular design, which limits their potential in real-world applications. In drug discovery, phenotypic molecular design has advantages over target-based molecular design, especially in first-in-class drug discovery. In this work, we propose the first deep graph generative model (FAME) targeting phenotypic molecular design, in particular gene expression-based molecular design. FAME leverages a conditional variational autoencoder framework to learn the conditional distribution generating molecules from gene expression profiles. However, this distribution is difficult to learn due to the complexity of the molecular space and the noise in gene expression data. To tackle these issues, a gene expression denoising (GED) model that employs a contrastive objective function is first proposed to reduce noise in gene expression data. FAME is then designed to treat molecules as sequences of fragments and learn to generate these fragments in an autoregressive manner. By leveraging this fragment-based generation strategy and the denoised gene expression profiles, FAME can generate novel molecules with a high validity rate and desired biological activity. The experimental results show that FAME outperforms existing methods, including both SMILES-based and graph-based deep generative models, for phenotypic molecular design. Furthermore, the mechanism for reducing noise in gene expression data proposed in our study can be applied to omics data modeling in general to facilitate phenotypic drug discovery.
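
To make the idea of a contrastive objective concrete, here is a minimal sketch of an InfoNCE-style loss for one anchor profile. It is an illustration only: the abstract does not say which contrastive loss GED uses, so InfoNCE, the cosine similarity, the temperature value, and the toy vectors are all assumptions. The intuition is that two noisy measurements of the same condition (a positive pair) should be more similar to each other than to measurements of other conditions (negatives).

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: -log softmax score of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Hypothetical 2-dimensional "expression embeddings".
anchor = [1.0, 0.0]
replicate = [0.9, 0.1]                  # noisy copy of the same condition
others = [[0.0, 1.0], [-1.0, 0.0]]      # different conditions

good_loss = info_nce(anchor, replicate, others)
# Swapping the roles makes the "positive" dissimilar, so the loss grows.
bad_loss = info_nce(anchor, [0.0, 1.0], [replicate, [-1.0, 0.0]])
print(good_loss < bad_loss)
```

Minimizing such a loss over an encoder's outputs pulls replicate profiles together in embedding space, which is one plausible way a contrastive objective can suppress measurement noise.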

4.
Patterns (N Y) ; 3(4): 100441, 2022 Apr 08.
Article in English | MEDLINE | ID: mdl-35465231

ABSTRACT

Chemical-induced gene expression profiles provide critical information about chemicals in a biological system, thus offering new opportunities for drug discovery. Despite their success, large-scale analysis leveraging gene expression is limited by time and cost. Although several methods for predicting gene expression have been proposed, they focus only on imputation and classification settings, which have limited applicability to real-world drug discovery scenarios. Therefore, a chemical-induced gene expression ranking (CIGER) framework is proposed to target a more realistic but more challenging setting in which overall rankings in gene expression profiles induced by de novo chemicals are predicted. The experimental results show that CIGER significantly outperforms existing methods in both ranking and classification metrics. Furthermore, a drug screening pipeline based on CIGER is proposed to identify potential treatments for drug-resistant pancreatic cancer. Our predictions have been validated by experiments, thereby showing the effectiveness of CIGER for phenotypic compound screening in precision medicine.
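
A ranking setting is typically scored by how well the predicted order of genes agrees with the observed order. The sketch below computes Kendall's tau (the tau-a variant, which ignores tie corrections) between two expression profiles. The abstract does not list the ranking metrics used to evaluate CIGER, so the choice of Kendall's tau is an assumption, and the expression values are made up.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether both profiles order
        # genes i and j the same way.
        sign = (observed[i] - observed[j]) * (predicted[i] - predicted[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Hypothetical differential expression values for four genes.
observed = [2.1, -0.5, 0.3, 1.2]
predicted = [1.8, -0.2, 0.9, 0.4]

tau = kendall_tau(observed, predicted)
print(round(tau, 3))
```

Here five of the six gene pairs are ordered consistently and one is swapped, so tau is (5 - 1) / 6. A value of 1 would mean the predicted ranking matches the observed ranking exactly.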

5.
Nat Mach Intell ; 3(3): 247-257, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33796820

ABSTRACT

Phenotype-based compound screening has advantages over target-based drug discovery, but it scales poorly and offers little mechanistic understanding. A chemical-induced gene expression profile provides a mechanistic signature of phenotypic response. However, the use of such data is limited by their sparseness, unreliability, and relatively low throughput. Few methods can perform phenotype-based de novo chemical compound screening. Here, we propose a mechanism-driven neural network-based method, DeepCE, which utilizes a graph neural network and a multi-head attention mechanism to model chemical substructure-gene and gene-gene associations, for predicting the differential gene expression profile perturbed by de novo chemicals. Moreover, we propose a novel data augmentation method which extracts useful information from unreliable experiments in the L1000 dataset. The experimental results show that DeepCE achieves superior performance compared to state-of-the-art methods. The effectiveness of gene expression profiles generated from DeepCE is further supported by comparing them with observed data for downstream classification tasks. To demonstrate the value of DeepCE, we apply it to drug repurposing for COVID-19, and generate novel lead compounds consistent with clinical evidence. Thus, DeepCE provides a potentially powerful framework for robust predictive modeling by utilizing noisy omics data and screening novel chemicals for the modulation of a systemic response to disease.
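
The attention mechanism mentioned above can be illustrated by its basic building block, scaled dot-product attention. The sketch below shows a single head without learned projection matrices, so it is not DeepCE's architecture. Treating gene features as queries and chemical substructure features as keys and values is an assumption made for illustration, and all numbers are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


# One hypothetical gene query attending over two substructure embeddings.
gene_queries = [[1.0, 0.0]]
substructure_keys = [[1.0, 1.0], [1.0, 1.0]]
substructure_values = [[0.0, 2.0], [4.0, 6.0]]

context = attention(gene_queries, substructure_keys, substructure_values)
print(context)
```

Because the two keys are identical in this toy input, the query weights both values equally and the output is their average. A multi-head variant would run several such attention computations in parallel on separately projected inputs and concatenate the results.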

6.
bioRxiv ; 2020 Jul 20.
Article in English | MEDLINE | ID: mdl-32743586

ABSTRACT

Target-based high-throughput compound screening dominates the conventional one-drug-one-gene drug discovery process. However, the readout from the chemical modulation of a single protein is poorly correlated with the phenotypic response of an organism, leading to a high failure rate in drug development. A chemical-induced gene expression profile provides an attractive solution for phenotype-based screening. However, the use of such data is currently limited by their sparseness, unreliability, and relatively low throughput. Several methods have been proposed to impute missing values for gene expression datasets. However, few existing methods can perform de novo chemical compound screening. In this study, we propose a mechanism-driven neural network-based method named DeepCE (Deep Chemical Expression), which utilizes a graph convolutional neural network to learn chemical representations and a multi-head attention mechanism to model chemical substructure-gene and gene-gene feature associations. In addition, we propose a novel data augmentation method which extracts useful information from unreliable experiments in the L1000 dataset. The experimental results show that DeepCE achieves superior performance, not only in the de novo chemical setting but also in the traditional imputation setting, compared to state-of-the-art baselines for the prediction of chemical-induced gene expression. We further verify the effectiveness of gene expression profiles generated from DeepCE by comparing them with gene expression profiles in the L1000 dataset for downstream classification tasks, including drug-target and disease predictions. To demonstrate the value of DeepCE, we apply it to patient-specific drug repurposing for COVID-19 for the first time, and generate novel lead compounds consistent with clinical evidence. Thus, DeepCE provides a potentially powerful framework for robust predictive modeling by utilizing noisy omics data as well as screening novel chemicals for the modulation of a systemic response to disease.
