Pesquisa | BVS - MINISTÉRIO DA SAÚDE

Mapping medical image-text to a joint space via masked modeling.

Chen, Zhihong; Du, Yuhao; Hu, Jinpeng; Liu, Yang; Li, Guanbin; Wan, Xiang; Chang, Tsung-Hui.

Med Image Anal ; 91: 103018, 2024 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-37976867

RESUMO

Recently, masked autoencoders have demonstrated their feasibility in extracting effective image and text features (e.g., BERT for natural language processing (NLP) and MAE in computer vision (CV)). This study investigates the potential of applying these techniques to vision-and-language representation learning in the medical domain. To this end, we introduce a self-supervised learning paradigm, multi-modal masked autoencoders (M3AE). It learns to map medical images and texts to a joint space by reconstructing pixels and tokens from randomly masked images and texts. Specifically, we design this approach from three aspects: First, taking into account the varying information densities of vision and language, we employ distinct masking ratios for input images and text, with a notably higher masking ratio for images; Second, we utilize visual and textual features from different layers for reconstruction to address varying levels of abstraction in vision and language; Third, we develop different designs for vision and language decoders. We establish a medical vision-and-language benchmark to conduct an extensive evaluation. Our experimental results exhibit the effectiveness of the proposed method, achieving state-of-the-art results on all downstream tasks. Further analyses validate the effectiveness of the various components and discuss the limitations of the proposed approach. The source code is available at https://github.com/zhjohnchan/M3AE.

Assuntos

Benchmarking , Idioma , Humanos , Software

Why Batch Normalization Damage Federated Learning on Non-IID Data?

Wang, Yanmeng; Shi, Qingjiang; Chang, Tsung-Hui.

IEEE Trans Neural Netw Learn Syst ; PP2023 Nov 01.

Artigo em Inglês | MEDLINE | ID: mdl-37910415

RESUMO

As a promising distributed learning paradigm, federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients. To train a large-scale DNN model, batch normalization (BN) has been regarded as a simple and effective means to accelerate the training and improve the generalization capability. However, recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data. While several FL algorithms have been proposed to address this issue, their performance still falls significantly when compared to the centralized scheme. Furthermore, none of them have provided a theoretical explanation of how the BN damages the FL convergence. In this article, we present the first convergence analysis to show that under the non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes the gradient deviation between the local and global models, which, as a result, slows down and biases the FL convergence. In view of this, we develop a new FL algorithm that is tailored to BN, called FedTAN, which is capable of achieving robust FL performance under a variety of data distributions via iterative layer-wise parameter aggregation. Comprehensive experimental results demonstrate the superiority of the proposed FedTAN over existing baselines for training BN-based DNN models.

A Novel Sparse Graph-Regularized Singular Value Decomposition Model and Its Application to Genomic Data Analysis.

Min, Wenwen; Wan, Xiang; Chang, Tsung-Hui; Zhang, Shihua.

IEEE Trans Neural Netw Learn Syst ; 33(8): 3842-3856, 2022 08.

Artigo em Inglês | MEDLINE | ID: mdl-33556027

RESUMO

Learning the gene coexpression pattern is a central challenge for high-dimensional gene expression analysis. Recently, sparse singular value decomposition (SVD) has been used to achieve this goal. However, this model ignores the structural information between variables (e.g., a gene network). The typical graph-regularized penalty can be used to incorporate such prior graph information to achieve more accurate discovery and better interpretability. However, the existing approach fails to consider the opposite effect of variables with negative correlations. In this article, we propose a novel sparse graph-regularized SVD model with absolute operator (AGSVD) for high-dimensional gene expression pattern discovery. The key of AGSVD is to impose a novel graph-regularized penalty ( | u|T L| u| ). However, such a penalty is a nonconvex and nonsmooth function, so it brings new challenges to model solving. We show that the nonconvex problem can be efficiently handled in a convex fashion by adopting an alternating optimization strategy. The simulation results on synthetic data show that our method is more effective than the existing SVD-based ones. In addition, the results on several real gene expression data sets show that the proposed methods can discover more biologically interpretable expression patterns by incorporating the prior gene network.

Assuntos

Algoritmos , Análise de Dados , Redes Reguladoras de Genes , Genômica/métodos , Redes Neurais de Computação

TSCCA: A tensor sparse CCA method for detecting microRNA-gene patterns from multiple cancers.

Min, Wenwen; Chang, Tsung-Hui; Zhang, Shihua; Wan, Xiang.

PLoS Comput Biol ; 17(6): e1009044, 2021 06.

Artigo em Inglês | MEDLINE | ID: mdl-34061840

RESUMO

Existing studies have demonstrated that dysregulation of microRNAs (miRNAs or miRs) is involved in the initiation and progression of cancer. Many efforts have been devoted to identify microRNAs as potential biomarkers for cancer diagnosis, prognosis and therapeutic targets. With the rapid development of miRNA sequencing technology, a vast amount of miRNA expression data for multiple cancers has been collected. These invaluable data repositories provide new paradigms to explore the relationship between miRNAs and cancer. Thus, there is an urgent need to explore the complex cancer-related miRNA-gene patterns by integrating multi-omics data in a pan-cancer paradigm. In this study, we present a tensor sparse canonical correlation analysis (TSCCA) method for identifying cancer-related miRNA-gene modules across multiple cancers. TSCCA is able to overcome the drawbacks of existing solutions and capture both the cancer-shared and specific miRNA-gene co-expressed modules with better biological interpretations. We comprehensively evaluate the performance of TSCCA using a set of simulated data and matched miRNA/gene expression data across 33 cancer types from the TCGA database. We uncover several dysfunctional miRNA-gene modules with important biological functions and statistical significance. These modules can advance our understanding of miRNA regulatory mechanisms of cancer and provide insights into miRNA-based treatments for cancer.

Assuntos

Regulação Neoplásica da Expressão Gênica , MicroRNAs/genética , Modelos Biológicos , Neoplasias/genética , Humanos

Real-world data medical knowledge graph: construction and applications.

Li, Linfeng; Wang, Peng; Yan, Jun; Wang, Yao; Li, Simin; Jiang, Jinpeng; Sun, Zhe; Tang, Buzhou; Chang, Tsung-Hui; Wang, Shenghui; Liu, Yuting.

Artif Intell Med ; 103: 101817, 2020 03.

Artigo em Inglês | MEDLINE | ID: mdl-32143785

RESUMO

OBJECTIVE: Medical knowledge graph (KG) is attracting attention from both academic and healthcare industry due to its power in intelligent healthcare applications. In this paper, we introduce a systematic approach to build medical KG from electronic medical records (EMRs) with evaluation by both technical experiments and end to end application examples. MATERIALS AND METHODS: The original data set contains 16,217,270 de-identified clinical visit data of 3,767,198 patients. The KG construction procedure includes 8 steps, which are data preparation, entity recognition, entity normalization, relation extraction, property calculation, graph cleaning, related-entity ranking, and graph embedding respectively. We propose a novel quadruplet structure to represent medical knowledge instead of the classical triplet in KG. A novel related-entity ranking function considering probability, specificity and reliability (PSR) is proposed. Besides, probabilistic translation on hyperplanes (PrTransH) algorithm is used to learn graph embedding for the generated KG. RESULTS: A medical KG with 9 entity types including disease, symptom, etc. was established, which contains 22,508 entities and 579,094 quadruplets. Compared with term frequency - inverse document frequency (TF/IDF) method, the normalized discounted cumulative gain (NDCG@10) increased from 0.799 to 0.906 with the proposed ranking function. The embedding representation for all entities and relations were learned, which are proven to be effective using disease clustering. CONCLUSION: The established systematic procedure can efficiently construct a high-quality medical KG from large-scale EMRs. The proposed ranking function PSR achieves the best performance under all relations, and the disease clustering result validates the efficacy of the learned embedding vector as entity's semantic representation. Moreover, the obtained KG finds many successful applications due to its statistics-based quadruplet. where Ncomin is a minimum co-occurrence number and R is the basic reliability value. The reliability value can measure how reliable is the relationship between Si and Oij. The reason for the definition is the higher value of Nco(Si, Oij), the relationship is more reliable. However, the reliability values of the two relationships should not have a big difference if both of their co-occurrence numbers are very big. In our study, we finally set Ncomin = 10 and R = 1 after some experiments. For instance, if co-occurrence numbers of three relationships are 1, 100 and 10000, their reliability values are 1, 2.96 and 5 respectively.

Assuntos

Bases de Dados Factuais , Registros Eletrônicos de Saúde/organização & administração , Reconhecimento Automatizado de Padrão/métodos , Semântica , Algoritmos , Humanos , Reprodutibilidade dos Testes

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA