Results 1 - 20 of 86
1.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15098-15119, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37624713

ABSTRACT

As information exists in various modalities in the real world, effective interaction and fusion among multimodal information play a key role in the creation and perception of multimodal data in computer vision and deep learning research. With its power in modeling the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years. Instead of providing explicit guidance for network training, multimodal guidance offers intuitive and flexible means for image synthesis and editing. On the other hand, the field also faces several challenges, such as the alignment of multimodal features, the synthesis of high-resolution images, and faithful evaluation metrics. In this survey, we comprehensively contextualize recent advances in multimodal image synthesis and editing and formulate taxonomies according to data modalities and model types. We start with an introduction to the different guidance modalities in image synthesis and editing, and then describe multimodal image synthesis and editing approaches extensively according to their model types. After that, we describe benchmark datasets and evaluation metrics as well as corresponding experimental results. Finally, we provide insights about the current research challenges and possible directions for future research.

2.
IEEE Trans Med Imaging ; 42(7): 1944-1954, 2023 07.
Article in English | MEDLINE | ID: mdl-37015445

ABSTRACT

Data governance has played an instrumental role in securing privacy-critical infrastructure in the medical domain and has led to an increased need for federated learning (FL). While decentralization can limit the effectiveness of standard supervised learning, the impact of decentralization on partially supervised learning remains unclear. Besides, due to data scarcity, each client may have access to only limited partially labeled data. As a remedy, this work formulates and discusses a new learning problem, federated partially supervised learning (FPSL), for limited decentralized medical images with partial labels. We study the impact of decentralized partially labeled data on deep learning-based models via an exemplar of FPSL, namely federated partially supervised multi-label classification. By dissecting FedAVG, a seminal FL framework, we formulate and analyze two major challenges of FPSL and propose a simple yet robust FPSL framework, FedPSL, which addresses these challenges. In particular, FedPSL contains two modules, task-dependent model aggregation and task-agnostic decoupling learning, where the first module addresses the weight assignment and the second module improves the generalization ability of the feature extractor. We provide a comprehensive empirical understanding of FPSL under data scarcity with simulated experiments. The empirical results not only indicate that FPSL is an under-explored problem with practical value but also show that the proposed FedPSL can achieve robust performance against baseline methods under data challenges such as data scarcity and domain shifts. The findings of this study also pose a new research direction towards label-efficient learning on medical images.
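The FedAVG aggregation step dissected above is, at its core, a dataset-size-weighted average of client model parameters. A minimal, library-free sketch (the flat parameter vectors and client sizes below are hypothetical):

```python
def fedavg(client_weights, client_sizes):
    """FedAVG aggregation: average client parameter vectors,
    weighted by each client's local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Two hypothetical clients with 2-parameter models; the second client
# holds 3x more data, so its parameters dominate the average.
agg = fedavg([[1.0, 0.0], [3.0, 2.0]], client_sizes=[1, 3])  # -> [2.5, 1.5]
```

FedPSL's task-dependent aggregation replaces the simple size-based weights with task-aware ones; the uniform-by-size scheme above is the baseline it builds on.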


Subjects
Diagnostic Imaging , Supervised Machine Learning , Humans
3.
Value Health ; 26(9): 1301-1307, 2023 09.
Article in English | MEDLINE | ID: mdl-36736697

ABSTRACT

OBJECTIVES: The aim of this study was to assess preferences for sharing of electronic health record (EHR) and genetic information separately and to examine whether preferences differ between these 2 types of information. METHODS: Using a population-based, nationally representative survey of the United States, we conducted a discrete choice experiment in which half of the subjects (N = 790) responded to questions about sharing of genetic information and the other half (N = 751) to questions about sharing of EHR information. Conditional logistic regression models assessed relative preferences across attribute levels: where patients learn about health information sharing, whether shared data are deidentified, whether data are commercialized, how long biospecimens are kept, and what the purpose of sharing the information is. RESULTS: Individuals had strong preferences to share deidentified (vs identified) data (odds ratio [OR] 3.26, 95% confidence interval 2.68-3.96) and to be able to opt out of sharing information with commercial companies (OR 4.26, 95% confidence interval 3.42-5.30). There were no significant differences regarding how long biospecimens are kept or why the data are being shared. Individuals had a stronger preference for opting out of sharing genetic (OR 4.26) versus EHR information (OR 2.64) (P = .002). CONCLUSIONS: Hospital systems and regulatory bodies should consider patient preferences for sharing of personal medical records or genetic information. For both genetic and EHR information, patients strongly prefer their data to be deidentified and to have the choice to opt out of sharing information with commercial companies.
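The odds ratios reported above are exponentiated conditional-logit coefficients with Wald confidence intervals. A minimal sketch of that conversion; the coefficient and standard error here are hypothetical values back-solved to roughly reproduce the reported deidentification OR of 3.26 (2.68-3.96):

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logit coefficient and its standard error into an
    odds ratio with a Wald 95% confidence interval."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Hypothetical coefficient for the "deidentified" attribute level.
or_, lo, hi = odds_ratio_ci(beta=1.182, se=0.0995)
```

The same transformation applies to every attribute-level coefficient in the model.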


Subjects
Confidentiality , Electronic Health Records , Humans , United States , Information Dissemination , Logistic Models , Data Collection
4.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 12832-12843, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35917572

ABSTRACT

Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks. Despite its success, this paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes. Such limitations hinder the generalization of base-class knowledge for the detection of novel-class objects. In this work, we design Meta-DETR, which (i) is the first image-level few-shot detector, and (ii) introduces a novel inter-class correlational meta-learning strategy to capture and leverage the correlation among different classes for robust and accurate few-shot object detection. Meta-DETR works entirely at the image level without any region proposals, which circumvents the constraint of inaccurate proposals in prevalent few-shot detection frameworks. In addition, the introduced correlational meta-learning enables Meta-DETR to simultaneously attend to multiple support classes within a single feedforward pass, which allows it to capture the inter-class correlation among different classes, thus significantly reducing misclassification over similar classes and enhancing knowledge generalization to novel classes. Experiments over multiple few-shot object detection benchmarks show that the proposed Meta-DETR outperforms state-of-the-art methods by large margins. The implementation code is publicly available at https://github.com/ZhangGongjie/Meta-DETR.

5.
J Comput Biol ; 29(12): 1353-1356, 2022 12.
Article in English | MEDLINE | ID: mdl-36194088

ABSTRACT

We introduce the Python software package Kernel Mixed Model (KMM), which allows users to incorporate network structure into transcriptome-wide association studies (TWASs). Our software is based on the association algorithm KMM, a method that incorporates the network structure as the kernels of a linear mixed model for TWAS. The implementation aims to offer users simple access to the algorithm through a one-line command. Furthermore, to improve computing efficiency when the interaction network is sparse, we also provide the flexibility of computing with the sparse counterparts of the matrices in Python, which reduces both the computational operations and the memory required.
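The sparse-matrix option described above works by storing and touching only nonzero entries. A library-free sketch of the idea using a dict-of-keys matrix-vector product (the package itself presumably relies on standard sparse containers such as those in SciPy, not this toy structure):

```python
def sparse_matvec(entries, x):
    """Multiply a sparse matrix, stored as a dict mapping
    (row, col) -> value, by a dense vector. Only the stored
    nonzero entries are visited, so cost scales with nnz,
    not with rows * cols."""
    y = {}
    for (i, j), v in entries.items():
        y[i] = y.get(i, 0.0) + v * x[j]
    return y

# A 3x3 interaction network with only two nonzero edges.
A = {(0, 1): 2.0, (2, 0): 0.5}
y = sparse_matvec(A, [1.0, 3.0, -1.0])  # -> {0: 6.0, 2: 0.5}
```

For a gene-gene network with thousands of nodes but few edges, this is the difference between O(nnz) and O(n^2) work per product.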


Subjects
Software , Transcriptome , Algorithms , Linear Models , Genome-Wide Association Study/methods
6.
Nat Commun ; 13(1): 6039, 2022 10 20.
Article in English | MEDLINE | ID: mdl-36266298

ABSTRACT

The development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end. Lack of diligence can lead to technical debt, scope creep, misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, on the other hand, follow well-defined processes and testing standards to streamline development for high-quality, reliable results. The extreme case is spacecraft systems, with mission-critical measures and robustness throughout the process. Drawing on experience in both spacecraft engineering and machine learning (from research through product, across domain areas), we have developed a proven systems engineering approach for machine learning and artificial intelligence. The Machine Learning Technology Readiness Levels framework defines a principled process to ensure robust, reliable, and responsible systems while being streamlined for machine learning workflows, including key distinctions from traditional software engineering, and provides a lingua franca for people across teams and organizations to work collaboratively on machine learning and artificial intelligence technologies. Here we describe the framework and elucidate it with use-cases ranging from physics research to computer vision apps to medical diagnostics.


Subjects
Artificial Intelligence , Machine Learning , Humans , Technology , Software , Engineering
7.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 3849-3853, 2022 07.
Article in English | MEDLINE | ID: mdl-36085751

ABSTRACT

Deep neural networks (DNNs) are the primary driving force behind the current development of medical image analysis tools and often deliver exciting performance on various tasks. However, such results are usually reported as overall performance figures, such as the peak signal-to-noise ratio (PSNR) or mean squared error (MSE) for image generation tasks. As black boxes, DNNs usually produce relatively stable performance on the same task across multiple training trials, while the learned feature spaces can differ significantly. We believe additional insightful analysis, such as uncertainty analysis of the learned feature space, is equally important, if not more so. In this work, we evaluate the learned feature spaces of multiple U-Net architectures for image generation tasks using computational and clustering analysis methods. We demonstrate that the learned feature spaces are easily separable between different training trials of the same architecture with the same hyperparameter setting, indicating that the models use different criteria for the same task. This phenomenon naturally raises the question of which criteria are correct to use. Thus, our work suggests that assessments beyond overall performance are needed before applying a DNN model in real-world practice.


Subjects
Diagnostic Imaging , Neural Networks, Computer , Uncertainty
8.
Article in English | MEDLINE | ID: mdl-35895656

ABSTRACT

Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. In practice, however, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph contrastive learning constructs an instance-discrimination task that pulls together positive pairs (augmentation pairs of the same graph) and pushes apart negative pairs (augmentation pairs of different graphs) for unsupervised representation learning. However, since the negatives for a query are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias: the negatives are likely to share the query's semantic structure, leading to performance degradation. To mitigate this sampling bias, in this article, we propose a prototypical graph contrastive learning (PGCL) approach. Specifically, PGCL models the underlying semantic structure of the graph data by clustering semantically similar graphs into the same group while encouraging clustering consistency across different augmentations of the same graph. Then, given a query, it performs negative sampling by drawing graphs from clusters that differ from the query's cluster, which ensures the semantic difference between the query and its negative samples. Moreover, PGCL reweights the query's negative samples based on the distance between their prototypes (cluster centroids) and the query prototype, so that negatives at a moderate prototype distance receive relatively large weights. This reweighting strategy is proven to be more effective than uniform sampling. Experimental results on various graph benchmarks demonstrate the advantages of our PGCL over state-of-the-art methods. The code is publicly available at https://github.com/ha-lins/PGCL.
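The prototype-distance reweighting of negatives can be sketched as follows. The Gaussian bump centered at the mean distance is an illustrative assumption chosen to give moderate distances the largest weight; the paper's exact weighting formula differs:

```python
import math

def reweight_negatives(proto_dists):
    """Assign normalized weights to negative samples from their
    prototype-to-query-prototype distances, peaking at moderate
    distances (sketch: Gaussian bump centered at the mean distance)."""
    mu = sum(proto_dists) / len(proto_dists)
    var = sum((d - mu) ** 2 for d in proto_dists) / len(proto_dists)
    sigma = max(1e-8, math.sqrt(var))
    w = [math.exp(-((d - mu) ** 2) / (2 * sigma ** 2)) for d in proto_dists]
    s = sum(w)
    return [wi / s for wi in w]

# Hypothetical distances: very close, moderate, moderate, very far.
weights = reweight_negatives([0.1, 1.0, 1.1, 2.3])
```

The near-duplicate negative (distance 0.1) and the trivially different one (distance 2.3) are both down-weighted, which is the intuition behind the strategy.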

9.
J Comput Biol ; 29(3): 233-242, 2022 03.
Article in English | MEDLINE | ID: mdl-35230156

ABSTRACT

Motivated by empirical arguments that are well known from the genome-wide association study (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate single-nucleotide polymorphism (SNP) in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification for this technique as a trade-off between velocity and veracity. Second, we investigate how mixed models can correct for confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding factors, population stratification and environmental confounding, and study how different methods commonly used in practice trade off these two confounding factors differently.
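The candidate-SNP question above concerns the standard realized kinship matrix K = XX^T / p built from standardized genotypes, with the candidate SNP's column either kept in X or left out. A minimal sketch (the tiny genotype matrix and the choice of column 0 as the candidate are hypothetical):

```python
def kinship(X):
    """Realized relationship matrix K = X X^T / p from an
    n-individuals x p-SNPs matrix of (already standardized)
    genotypes, computed with plain nested sums."""
    n, p = len(X), len(X[0])
    return [[sum(X[i][k] * X[j][k] for k in range(p)) / p
             for j in range(n)] for i in range(n)]

# Hypothetical standardized genotypes for 2 individuals x 3 SNPs.
X = [[1.0, -1.0, 0.0],
     [-1.0, 1.0, 1.0]]
K_all = kinship(X)                         # candidate SNP (column 0) included
K_loco = kinship([row[1:] for row in X])   # candidate SNP left out
```

Comparing `K_all` and `K_loco` at scale is exactly the velocity-versus-veracity trade-off the abstract quantifies: reusing one kinship matrix for every candidate SNP is faster, leaving each candidate out is more exact.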


Subjects
Genome-Wide Association Study , Models, Genetic , Genome-Wide Association Study/methods , Linear Models , Polymorphism, Single Nucleotide
10.
IEEE Trans Neural Netw Learn Syst ; 33(12): 7610-7620, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34156951

ABSTRACT

Clustering algorithms based on deep neural networks have been widely studied for image analysis. Most existing methods require partial knowledge of the true labels, namely the number of clusters, which is usually not available in practice. In this article, we propose a Bayesian nonparametric framework, deep nonparametric Bayes (DNB), for jointly learning image clusters and deep representations in a doubly unsupervised manner. In doubly unsupervised learning, we deal with the problem of "unknown unknowns": we estimate not only the unknown image labels but also the unknown number of labels. The proposed algorithm alternates between generating a potentially unbounded number of clusters in the forward pass and learning the deep networks in the backward pass. With the help of Dirichlet process mixtures, the proposed method is able to partition the latent representation space without specifying the number of clusters a priori. An important feature of this work is that all the estimation is realized in an end-to-end solution, which differs from methods that rely on post hoc analysis to select the number of clusters. Another key idea in this article is to provide a principled solution to the "trivial solution" problem in deep clustering, which has not been much studied in the current literature. With extensive experiments on benchmark datasets, we show that our doubly unsupervised method achieves good clustering performance and outperforms many other unsupervised image clustering methods.
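The Dirichlet process prior that lets the model grow clusters on demand is often illustrated through its Chinese-restaurant-process view, in which the number of clusters is never fixed in advance. A generic sketch of that generative draw (alpha and the sampling loop are the textbook construction, not the paper's inference procedure):

```python
import random

def crp_partition(n, alpha, seed=0):
    """Chinese restaurant process draw: item i joins an existing
    cluster with probability proportional to its size, or opens a
    new cluster with probability proportional to alpha, so the
    number of clusters is unbounded a priori."""
    rng = random.Random(seed)
    counts = []   # current cluster sizes
    labels = []   # cluster assignment per item
    for i in range(n):
        r = rng.uniform(0, i + alpha)  # total mass = i existing + alpha new
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                labels.append(k)
                break
        else:
            counts.append(1)           # open a brand-new cluster
            labels.append(len(counts) - 1)
    return labels

labels = crp_partition(100, alpha=1.0)
```

Larger alpha yields more clusters on average; DNB alternates such forward-pass cluster generation with backward-pass representation learning.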

11.
J Chem Phys ; 154(12): 124118, 2021 Mar 28.
Article in English | MEDLINE | ID: mdl-33810693

ABSTRACT

The recent boom in computational chemistry has enabled several projects aimed at discovering useful materials or catalysts. We acknowledge and address two recurring issues in the field of computational catalyst discovery. First, calculating macro-scale catalyst properties is not straightforward when using ensembles of atomic-scale calculations [e.g., density functional theory (DFT)]. We attempt to address this issue by creating a multi-scale model that estimates bulk catalyst activity using adsorption energy predictions from both DFT and machine learning models. The second issue is that many catalyst discovery efforts seek to optimize catalyst properties, but optimization is an inherently exploitative objective that is in tension with the explorative nature of early-stage discovery projects. In other words, why invest so much time finding a "best" catalyst when it is likely to fail for some other, unforeseen problem? We address this issue by relaxing the catalyst discovery goal into a classification problem: "What is the set of catalysts that is worth testing experimentally?" Here, we present a catalyst discovery method called myopic multiscale sampling, which combines multiscale modeling with automated selection of DFT calculations. It is an active classification strategy that seeks to classify catalysts as "worth investigating" or "not worth investigating" experimentally. Our results show an ∼7-16 times speedup in catalyst classification relative to random sampling. These results were based on offline simulations of our algorithm on two different datasets: a larger, synthesized dataset and a smaller, real dataset.

12.
BMC Bioinformatics ; 22(1): 50, 2021 Feb 05.
Article in English | MEDLINE | ID: mdl-33546598

ABSTRACT

BACKGROUND: In the last decade, genome-wide association studies (GWASs) have contributed to decoding the human genome by uncovering many genetic variations associated with various diseases. Many follow-up investigations involve joint analysis of multiple independently generated GWAS data sets. While most of the computational approaches developed for joint analysis are based on summary statistics, joint analysis based on individual-level data with consideration of confounding factors remains a challenge. RESULTS: In this study, we propose a method, called Coupled Mixed Model (CMM), that enables a joint GWAS analysis of two independently collected sets of GWAS data with different phenotypes. The CMM method does not require the data sets to have the same phenotypes, as it aims to infer the unknown phenotypes using a set of multivariate sparse mixed models. Moreover, CMM addresses the confounding variables due to population stratification, family structures, and cryptic relatedness, as well as those arising during data collection, such as batch effects, that frequently appear in joint genetic studies. We evaluate the performance of CMM using simulation experiments. In real data analysis, we illustrate the utility of CMM by applying it to evaluate common genetic associations for Alzheimer's disease and substance use disorder using datasets independently collected for the two complex human disorders. Comparison of the results with those from previous experiments and analyses supports the utility of our method and provides new insights into the diseases. The software is available at https://github.com/HaohanWang/CMM


Subjects
Genome-Wide Association Study , Phenotype , Software , Algorithms , Humans , Models, Genetic , Polymorphism, Single Nucleotide
13.
Bioinformatics ; 37(16): 2340-2346, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33620460

ABSTRACT

MOTIVATION: Cryo-electron tomography (cryo-ET) is a 3D bioimaging tool that visualizes the structural and spatial organization of macromolecules at a near-native state in single cells, with broad applications in the life sciences. However, the systematic structural recognition and recovery of macromolecules captured by cryo-ET are difficult due to high structural complexity and imaging limits. Deep learning-based subtomogram classification has played a critical role in such tasks. As supervised approaches, however, their performance relies on sufficient and laborious annotation of a large training dataset. RESULTS: To alleviate this major labeling burden, we propose a Hybrid Active Learning (HAL) framework for querying subtomograms for labeling from a large unlabeled subtomogram pool. First, HAL adopts uncertainty sampling to select the subtomograms with the most uncertain predictions. This strategy enforces the model to be aware of the inductive bias during classification and subtomogram selection, which satisfies the discriminativeness principle in the active learning literature. Moreover, to mitigate the sampling bias caused by this strategy, a discriminator is introduced to judge whether a given subtomogram is labeled or unlabeled, and the model then queries the subtomograms with the highest probability of being unlabeled. This query strategy encourages matching the data distribution between the labeled and unlabeled subtomogram samples, which essentially encodes the representativeness criterion into the subtomogram selection process. Additionally, HAL introduces a subset sampling strategy to improve the diversity of the query set, so that information overlap between queried batches is decreased and algorithmic efficiency is improved. Our experiments on subtomogram classification tasks using both simulated and real data demonstrate that we can achieve comparable testing performance (on average only a 3% accuracy drop) using less than 30% of the labeled subtomograms, a very promising result for subtomogram classification with limited labeling resources. AVAILABILITY AND IMPLEMENTATION: https://github.com/xulabs/aitom. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
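HAL's first selection criterion, uncertainty sampling, is commonly implemented as picking the pool items with the highest predictive entropy. A minimal sketch (the class-probability vectors below are hypothetical, and HAL combines this with its discriminator and subset sampling):

```python
import math

def entropy(probs):
    """Predictive entropy of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_query(pool_probs, k):
    """Select the k pool items with the most uncertain predictions
    (highest entropy), the discriminativeness criterion."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]

# Three hypothetical subtomograms: confident, maximally uncertain, borderline.
pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
picks = uncertainty_query(pool, k=2)  # -> [1, 2]
```

Used alone, this query rule suffers exactly the sampling bias the abstract describes, which is why HAL pairs it with a labeled-versus-unlabeled discriminator.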

14.
J Comput Biol ; 28(5): 501-513, 2021 05.
Article in English | MEDLINE | ID: mdl-33470876

ABSTRACT

Dimensionality reduction is an important first step in the analysis of single-cell RNA-sequencing (scRNA-seq) data. In addition to enabling visualization of the profiled cells, such representations are used by many downstream analysis methods, ranging from pseudo-time reconstruction to clustering to alignment of scRNA-seq data from different experiments, platforms, and laboratories. Both supervised and unsupervised methods have been proposed to reduce the dimension of scRNA-seq data. However, all methods to date are sensitive to batch effects. When batches correlate with cell types, as is often the case, their impact can lead to representations that are batch specific rather than cell-type specific. To overcome this, we developed a domain adversarial neural network model for learning a reduced-dimension representation of scRNA-seq data. The adversarial model tries to simultaneously optimize two objectives: the accuracy of cell-type assignment and the inability to distinguish the batch (domain). We tested the method by using the resulting representation to align several different data sets. As we show, by overcoming batch effects our method was able to correctly separate cell types, improving on several prior methods suggested for this task. Analysis of the top features used by the network indicates that, by taking the batch impact into account, the reduced representation is much better able to focus on key genes for each cell type.
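The two competing objectives can be summarized as a single adversarial training signal: minimize cell-type classification loss while maximizing the batch discriminator's loss. A schematic sketch only; `lam` is a hypothetical trade-off weight, and the actual model trains both networks jointly with gradient-based updates:

```python
def adversarial_objective(cls_loss, domain_loss, lam=1.0):
    """Combined objective for the encoder: low cell-type loss is
    rewarded, while a low batch-discriminator loss (i.e. batches
    being easy to tell apart) is penalized via the reversal term."""
    return cls_loss - lam * domain_loss

# Lower is better: accurate cell typing with indistinguishable batches.
obj = adversarial_objective(cls_loss=0.4, domain_loss=0.7, lam=0.5)  # ~0.05
```

In practice the subtraction is realized with a gradient reversal layer between the encoder and the domain classifier rather than by negating a scalar loss.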


Subjects
Computational Biology/methods , Sequence Analysis, RNA/methods , Algorithms , Animals , Humans , Single-Cell Analysis , Supervised Machine Learning
15.
PLoS Comput Biol ; 16(11): e1008297, 2020 11.
Article in English | MEDLINE | ID: mdl-33151940

ABSTRACT

In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of the poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanisms of translation regulation and mRNA metabolism. Although machine learning methods have been widely used to computationally identify PAS, the need for tremendous amounts of annotation data hinders the application of existing methods in species without experimental PAS data. Therefore, cross-species PAS identification, which enables prediction of PAS in untrained species, naturally becomes a promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolutional Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use four species, build cross-species training sets with two of them, and evaluate the performance on the remaining ones. Moreover, we test our method under insufficient-data and imbalanced-data conditions and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy with a smaller or imbalanced training set.
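A CNN-based PAS classifier of this kind typically consumes one-hot-encoded DNA. A minimal sketch of that preprocessing step; the A/C/G/T channel order and all-zero handling of unknown bases are assumptions, and Poly(A)-DG's exact pipeline may differ:

```python
def one_hot_dna(seq):
    """One-hot encode a DNA string into a length x 4 matrix,
    one channel per base, as CNN input."""
    table = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
    out = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in table:      # unknown bases (e.g. N) stay all-zero
            row[table[base]] = 1
        out.append(row)
    return out

# The canonical poly(A) signal hexamer:
enc = one_hot_dna("AATAAA")
```

Windows of sequence around candidate PAS positions, encoded this way, are what the CNN-MLP network would scan for signal motifs.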


Subjects
Deep Learning , Deoxyguanosine/metabolism , Poly A/metabolism , Signal Transduction , Animals , Humans , Neural Networks, Computer , Species Specificity
16.
Int J Comput Assist Radiol Surg ; 15(7): 1205-1213, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32445127

ABSTRACT

PURPOSE: The cup-to-disc ratio (CDR), a clinical metric of the relative size of the optic cup to the optic disc, is a key indicator of glaucoma, a chronic eye disease leading to loss of vision. CDR can be measured from fundus images through segmentation of the optic disc and optic cup. Deep convolutional networks have been proposed to achieve biomedical image segmentation with less time and more accuracy, but they require large amounts of annotated training data on a target domain, which is often unavailable. Unsupervised domain adaptation frameworks alleviate this problem by leveraging off-the-shelf labeled data from relevant source domains, learning domain-invariant features and improving the generalization capability of the segmentation model. METHODS: In this paper, we propose a WGAN domain adaptation framework for detecting the optic disc-and-cup boundary in fundus images. Specifically, we build a novel adversarial domain adaptation framework that is guided by the Wasserstein distance, and therefore has better stability and convergence than typical adversarial methods. We evaluate our approach on publicly available datasets. RESULTS: Our experiments show that, compared with direct transfer learning and other state-of-the-art adversarial domain adaptation methods, the proposed approach improves the Intersection-over-Union and Dice scores for optic disc-and-cup segmentation and reduces the root-mean-square error of the cup-to-disc ratio. CONCLUSION: With this work, we demonstrate that WGAN-guided domain adaptation achieves state-of-the-art performance for joint optic disc-and-cup segmentation in fundus images.
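The evaluation metrics named in the abstract can be sketched directly. The area-based CDR below is an illustrative simplification (clinically, CDR is usually a diameter ratio measured along the vertical axis), and the pixel sets are hypothetical:

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as sets of pixels."""
    inter = len(mask_a & mask_b)
    return 2 * inter / (len(mask_a) + len(mask_b))

def cup_to_disc_ratio(cup_area, disc_area):
    """Diameter-style CDR recovered from segmented areas, assuming
    roughly circular regions (an illustrative simplification)."""
    return (cup_area / disc_area) ** 0.5

# Hypothetical predicted vs. ground-truth cup masks (pixel coordinates).
pred = {(0, 0), (0, 1), (1, 0)}
truth = {(0, 0), (1, 0), (1, 1)}
d = dice(pred, truth)                    # 2*2 / (3+3) = 0.666...
cdr = cup_to_disc_ratio(0.09, 0.36)      # ~0.5
```

The root-mean-square error reported in the paper is then computed over per-image CDR estimates against clinician measurements.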


Subjects
Deep Learning , Fundus Oculi , Glaucoma/diagnostic imaging , Image Processing, Computer-Assisted , Optic Disk/diagnostic imaging , Optical Imaging/methods , Humans
17.
BMC Med Genomics ; 13(Suppl 3): 19, 2020 02 24.
Article in English | MEDLINE | ID: mdl-32093702

ABSTRACT

BACKGROUND: The current understanding of the genetic basis of complex human diseases is that they are caused and affected by many common and rare genetic variants. A considerable number of disease-associated variants have been identified by genome-wide association studies; however, they can explain only a small proportion of heritability. One possible reason for the missing heritability is that many undiscovered disease-causing variants are only weakly associated with the disease. This poses serious challenges to many statistical methods, which seem capable of identifying only disease-associated variants with relatively strong coefficients. RESULTS: To help identify weaker variants, we propose a novel statistical method, the Constrained Sparse multi-locus Linear Mixed Model (CS-LMM), that aims to uncover genetic variants with weaker associations by incorporating known associations as prior knowledge in the model. Moreover, CS-LMM accounts for polygenic effects and corrects for complex relatedness. Our simulation experiments show that CS-LMM outperforms competing existing methods in various settings in which the combinations of MAFs and coefficients reflect different scenarios in complex human diseases. CONCLUSIONS: We also apply our method to GWAS data on alcoholism and Alzheimer's disease and discover several SNPs in an exploratory analysis. Many of these discoveries are supported by a literature survey. Furthermore, our association results strengthen the belief in genetic links between alcoholism and Alzheimer's disease.


Subjects
Genome-Wide Association Study/methods , Statistics as Topic/methods , Adult , Alcoholism/genetics , Algorithms , Alzheimer Disease/genetics , Computer Simulation , Female , Genetic Variation , Humans , Male , Models, Genetic , Polymorphism, Single Nucleotide
18.
BMC Bioinformatics ; 20(Suppl 23): 656, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881907

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have contributed to unraveling associations between genetic variants in the human genome and complex traits for more than a decade. While many follow-up works have been proposed to detect interactions between SNPs, epistasis is still yet to be modeled and discovered more thoroughly. RESULTS: In this paper, following a previous study on detecting marginal epistasis signals, and motivated by the universal approximation power of deep learning, we propose a neural network method that can potentially model arbitrary interactions between SNPs in genetic association studies, as an extension of mixed models for correcting confounding factors. Our method, the Deep Mixed Model, consists of two components: 1) a confounding-factor correction component, a large-kernel convolutional neural network that calibrates the residual phenotypes by removing factors such as population stratification, and 2) a fixed-effect estimation component, mainly a Long Short-Term Memory (LSTM) model that estimates the association effect sizes of SNPs with the residual phenotype. CONCLUSIONS: After validating the performance of our method using simulation experiments, we further apply it to Alzheimer's disease data sets. Our results provide some exploratory insights into the genetic architecture of Alzheimer's disease.


Subjects
Epistasis, Genetic , Genome-Wide Association Study , Models, Genetic , Algorithms , Alzheimer Disease/genetics , Area Under Curve , Base Sequence , Computer Simulation , Humans , Polymorphism, Single Nucleotide/genetics , ROC Curve
19.
Pac Symp Biocomput ; 24: 54-65, 2019.
Article in English | MEDLINE | ID: mdl-30864310

ABSTRACT

The proliferation of healthcare data has brought opportunities to apply data-driven approaches, such as machine learning methods, to assist diagnosis. Recently, many deep learning methods have shown impressive success in predicting disease status from raw input data. However, the "black-box" nature of deep learning and the high-reliability requirements of biomedical applications have created new challenges regarding the existence of confounding factors. In this paper, after briefly arguing that inappropriate handling of confounding factors will lead to sub-optimal model performance in real-world applications, we present an efficient method that can remove the influence of confounding factors such as age or gender to improve the across-cohort prediction accuracy of neural networks. One distinct advantage of our method is that it requires only minimal changes to the baseline model's architecture, so it can be plugged into most existing neural networks. We conduct experiments across CT-scan, MRA, and EEG brain-wave data with convolutional neural networks and LSTMs to verify the efficiency of our method.
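A simple stand-in for confounder removal is residualizing a feature against the confounder with least squares; the paper's learned, pluggable correction is more general, but the underlying idea is the same (the feature and age values below are hypothetical):

```python
def residualize(values, confounder):
    """Remove the linear effect of a confounder (e.g. age) from a
    feature by subtracting its one-variable least-squares fit."""
    n = len(values)
    mx = sum(confounder) / n
    my = sum(values) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(confounder, values))
    var = sum((x - mx) ** 2 for x in confounder)
    slope = cov / var
    return [y - (my + slope * (x - mx)) for x, y in zip(confounder, values)]

# A feature perfectly explained by age residualizes to zero:
res = residualize([2.0, 4.0, 6.0], confounder=[1.0, 2.0, 3.0])
```

A model trained on such residualized features cannot lean on the confounder, which is exactly what improves across-cohort transfer when cohorts differ in age or gender composition.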


Subjects
Deep Learning , Medical Informatics , Neural Networks, Computer , Computational Biology , Diagnosis, Computer-Assisted , Humans , Machine Learning , Medical Informatics Applications , Medical Informatics Computing
20.
Pac Symp Biocomput ; 24: 112-123, 2019.
Article in English | MEDLINE | ID: mdl-30864315

ABSTRACT

The increasing amount of scientific literature in biological and biomedical research has created a challenge for continuous and reliable curation of the latest knowledge discovered, and automatic biomedical text-mining has been one answer to this challenge. In this paper, we aim to further improve the reliability of biomedical text-mining by training the system to directly simulate human behaviors such as querying PubMed, selecting articles from the query results, and reading the selected articles for knowledge. We combine the efficiency of biomedical text-mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrative, artificially intelligent reader that can automatically identify authentic articles and effectively acquire the knowledge they convey. We construct a system whose current primary task is to build a genetic association database between genes and complex human traits. Our contributions in this paper are three-fold: 1) we propose to improve the reliability of text-mining by building a system that can directly simulate the behavior of a researcher, and we develop corresponding methods, such as a bi-directional LSTM for text mining and a Deep Q-Network for organizing behaviors; 2) we demonstrate the effectiveness of our system with an example of constructing a genetic association database; and 3) we release our implementation as a generic framework for researchers in the community to conveniently construct other databases.


Subjects
Data Mining/methods , Databases, Genetic/statistics & numerical data , Deep Learning , Genetic Association Studies/statistics & numerical data , Algorithms , Computational Biology/methods , Decision Support Techniques , Humans , Knowledge Bases , Markov Chains , PubMed , Reproducibility of Results , Unified Medical Language System