Search | BVS Regional Portal

1.

The hatred of all against all? Evidence from online community platforms in South Korea.

Koo, Jeong-Woo; Suh, Chan S; Chung, Jin Won; Sohn, Kyung-Ah; Han, Kyungsik.

PLoS One ; 19(5): e0300530, 2024.

Article in English | MEDLINE | ID: mdl-38709721

ABSTRACT

BACKGROUND: Despite several years of recent efforts to make sense of and detect online hate speech, we still know relatively little about how hateful expressions enter online platforms and whether there are patterns and features characterizing the corpus of hateful speech. OBJECTIVE: In this research, we introduce a new conceptual framework suitable for better capturing the overall scope and dynamics of the current forms of online hateful speech. METHODS: We adopt several Python-based crawlers to collect a comprehensive data set covering a variety of subjects from a multiplicity of online communities in South Korea. We apply the notions of marginalization and polarization in identifying patterns and dynamics of online hateful speech. RESULTS: Our analyses suggest that polarization driven by political orientation and age difference predominates in the hateful speech in most communities, while marginalization of social minority groups is also salient in other communities. Furthermore, we identify a temporal shift in the trends of online hate from gender-based to age-based, reflecting the changing sociopolitical conditions within the polarization dynamics in South Korea. CONCLUSION: By expanding our understanding of how hatred shifts and evolves in online communities, our study provides theoretical and practical implications for both researchers and policy-makers.

Subjects

Internet , Republic of Korea , Humans , Male , Female , Adult , Politics , Young Adult , Middle Aged

2.

Frequency Domain Deep Learning With Non-Invasive Features for Intraoperative Hypotension Prediction.

Moon, Jeong-Hyeon; Lee, Garam; Lee, Seung Mi; Ryu, Jiho; Kim, Dokyoon; Sohn, Kyung-Ah.

IEEE J Biomed Health Inform ; PP2024 May 20.

Article in English | MEDLINE | ID: mdl-38768003

ABSTRACT

BACKGROUND: Intraoperative hypotension can lead to postoperative organ dysfunction. Previous studies primarily used invasive arterial pressure as the key biosignal for the detection of hypotension. However, these studies had limitations in incorporating different biosignal modalities and utilizing the periodic nature of biosignals. To address these limitations, we utilized frequency-domain information, which provides key insights that time-domain analysis cannot provide, as revealed by recent advances in deep learning. With the frequency-domain information, we propose a deep-learning approach that integrates multiple biosignal modalities. METHODS: We used the discrete Fourier transform to extract frequency information from biosignal data, which we then combined with the original time-domain data as input for our deep learning model. To improve the interpretability of our results, we incorporated recent interpretable modules for deep-learning models into our analysis. RESULTS: We constructed 75,994 segments from the data of 3,226 patients to predict hypotension during surgery. Our proposed frequency-domain deep-learning model outperformed conventional approaches that rely solely on time-domain information. Notably, our model achieved a greater increase in AUROC performance than the time-domain deep learning models when trained on non-invasive biosignal data only (AUROC 0.898 [95% CI: 0.885-0.91] vs. 0.853 [95% CI: 0.839-0.867]). Further analysis revealed that the 1.5-3.0 Hz frequency band played an important role in predicting hypotension events. CONCLUSION: Utilizing the frequency domain not only demonstrated high performance on invasive data but also showed significant performance improvement when applied to non-invasive data alone. Our proposed framework offers clinicians a novel perspective for predicting intraoperative hypotension.
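The step described in the abstract above, extracting frequency information with the discrete Fourier transform and combining it with the time-domain signal, can be illustrated with a minimal sketch. This is not the authors' pipeline: the 50 Hz sampling rate, the 4-second segment, the synthetic 2 Hz sine wave, and the function names `dft_magnitudes`, `band_energy_fraction`, and `build_input` are all assumptions made for illustration.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return normalized magnitudes of the one-sided DFT (bins 0..n/2)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        mags.append(abs(coeff) / n)
    return mags


def band_energy_fraction(signal, fs, low_hz, high_hz):
    """Fraction of non-DC spectral power that falls in [low_hz, high_hz]."""
    n = len(signal)
    power = [m * m for m in dft_magnitudes(signal)]
    total = sum(power[1:])  # skip the DC bin
    in_band = sum(p for k, p in enumerate(power)
                  if k >= 1 and low_hz <= k * fs / n <= high_hz)
    return in_band / total if total > 0 else 0.0


def build_input(signal):
    """Concatenate time-domain samples with their DFT magnitudes."""
    return list(signal) + dft_magnitudes(signal)


# Hypothetical segment: 4 s of a 2 Hz sine sampled at an assumed 50 Hz.
fs = 50.0
segment = [math.sin(2 * math.pi * 2.0 * t / fs) for t in range(200)]
features = build_input(segment)
fraction = band_energy_fraction(segment, fs, 1.5, 3.0)
```

Here `features` holds the 200 time-domain samples followed by 101 frequency magnitudes, and `fraction` measures how much of the segment's power lies in the 1.5-3.0 Hz band that the abstract highlights. A real implementation would more likely use a fast Fourier transform routine than this direct summation.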

3.

Author Correction: Predicting Alzheimer's disease progression using multi-modal deep learning approach.

Lee, Garam; Nho, Kwangsik; Kang, Byungkon; Sohn, Kyung-Ah; Kim, Dokyoon.

Sci Rep ; 13(1): 12466, 2023 Aug 01.

Article in English | MEDLINE | ID: mdl-37528098

4.

ClearF++: Improved Supervised Feature Scoring Using Feature Clustering in Class-Wise Embedding and Reconstruction.

Wang, Sehee; Kim, So Yeon; Sohn, Kyung-Ah.

Bioengineering (Basel) ; 10(7)2023 Jul 10.

Article in English | MEDLINE | ID: mdl-37508851

ABSTRACT

Feature selection methods are essential for accurate disease classification and identifying informative biomarkers. While information-theoretic methods have been widely used, they often exhibit limitations such as high computational costs. Our previously proposed method, ClearF, addresses these issues by using reconstruction error from low-dimensional embeddings as a proxy for the entropy term in the mutual information. However, ClearF still has limitations, including a nontransparent bottleneck layer selection process, which can result in unstable feature selection. To address these limitations, we propose ClearF++, which simplifies the bottleneck layer selection and incorporates feature-wise clustering to enhance biomarker detection. We compare its performance with other commonly used methods such as MultiSURF and IFS, as well as ClearF, across multiple benchmark datasets. Our results demonstrate that ClearF++ consistently outperforms these methods in terms of prediction accuracy and stability, even with limited samples. We also observe that employing the Deep Embedded Clustering (DEC) algorithm for feature-wise clustering improves performance, indicating its suitability for handling complex data structures with limited samples. ClearF++ offers an improved biomarker prioritization approach with enhanced prediction performance and faster execution. Its stability and effectiveness with limited samples make it particularly valuable for biomedical data analysis.

5.

A deep learning model for screening type 2 diabetes from retinal photographs.

Yun, Jae-Seung; Kim, Jaesik; Jung, Sang-Hyuk; Cha, Seon-Ah; Ko, Seung-Hyun; Ahn, Yu-Bae; Won, Hong-Hee; Sohn, Kyung-Ah; Kim, Dokyoon.

Nutr Metab Cardiovasc Dis ; 32(5): 1218-1226, 2022 05.

Article in English | MEDLINE | ID: mdl-35197214

ABSTRACT

BACKGROUND AND AIMS: We aimed to develop and evaluate a non-invasive deep learning algorithm for screening type 2 diabetes in UK Biobank participants using retinal images. METHODS AND RESULTS: The deep learning model for prediction of type 2 diabetes was trained on retinal images from 50,077 UK Biobank participants and tested on 12,185 participants. We evaluated its performance in terms of predicting traditional risk factors (TRFs) and genetic risk for diabetes. Next, we compared the performance of three models in predicting type 2 diabetes using 1) an image-only deep learning algorithm, 2) TRFs, 3) the combination of the algorithm and TRFs. Assessing net reclassification improvement (NRI) allowed quantification of the improvement afforded by adding the algorithm to the TRF model. When predicting TRFs with the deep learning algorithm, the areas under the curve (AUCs) obtained with the validation set for age, sex, and HbA1c status were 0.931 (0.928-0.934), 0.933 (0.929-0.936), and 0.734 (0.715-0.752), respectively. When predicting type 2 diabetes, the AUC of the composite logistic model using non-invasive TRFs was 0.810 (0.790-0.830), and that for the deep learning model using only fundus images was 0.731 (0.707-0.756). Upon addition of TRFs to the deep learning algorithm, discriminative performance was improved to 0.844 (0.826-0.861). The addition of the algorithm to the TRFs model improved risk stratification with an overall NRI of 50.8%. CONCLUSION: Our results demonstrate that this deep learning algorithm can be a useful tool for stratifying individuals at high risk of type 2 diabetes in the general population.

Subjects

Deep Learning , Type 2 Diabetes Mellitus , Algorithms , Area Under Curve , Type 2 Diabetes Mellitus/diagnosis , Type 2 Diabetes Mellitus/epidemiology , Fundus Oculi , Humans
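The net reclassification improvement (NRI) reported in the abstract above can be illustrated with a short sketch. The abstract does not say whether a categorical or a category-free NRI was used, so the category-free (continuous) form below is an assumption, as are the function name `continuous_nri` and the made-up risk values.

```python
def continuous_nri(old_risk, new_risk, outcome):
    """Category-free NRI for a new model relative to an old one.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)],
    where "up" means the new model assigns a higher risk than the old one.
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, y in zip(old_risk, new_risk, outcome):
        if y == 1:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical predicted risks: risk factors only vs. risk factors plus images.
old = [0.2, 0.4, 0.6, 0.3, 0.5, 0.1]
new = [0.3, 0.5, 0.5, 0.2, 0.4, 0.2]
had_diabetes = [1, 1, 1, 0, 0, 0]
nri = continuous_nri(old, new, had_diabetes)
```

In this toy data two of three events move up and two of three non-events move down, so each component contributes 1/3 and the NRI is 2/3. A positive value indicates that the added predictor moves risk estimates in the right direction more often than in the wrong one.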

6.

A Novel Scoring System for Response of Preoperative Chemoradiotherapy in Locally Advanced Rectal Cancer Using Early-Treatment Blood Features Derived From Machine Learning.

Kim, Jaesik; Sohn, Kyung-Ah; Kwak, Jung-Hak; Kim, Min Jung; Ryoo, Seung-Bum; Jeong, Seung-Yong; Park, Kyu Joo; Kang, Hyun-Cheol; Chie, Eui Kyu; Jung, Sang-Hyuk; Kim, Dokyoon; Park, Ji Won.

Front Oncol ; 11: 790894, 2021.

Article in English | MEDLINE | ID: mdl-34912724

ABSTRACT

BACKGROUND: Preoperative chemoradiotherapy (CRT) is a standard treatment for locally advanced rectal cancer (LARC). However, individual responses to preoperative CRT vary from patient to patient. The aim of this study is to develop a scoring system for the response of preoperative CRT in LARC using blood features derived from machine learning. METHODS: Patients who underwent total mesorectal excision after preoperative CRT were included in this study. The performance of machine learning models using blood features before CRT (pre-CRT) and from 1 to 2 weeks after CRT (early-CRT) was evaluated. Based on the best model, important features were selected. The scoring system was developed from the selected model and features. The performance of the new scoring system was compared with those of systemic inflammatory indicators: neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, lymphocyte-to-monocyte ratio, and the prognostic nutritional index. RESULTS: The models using early-CRT blood features had better performances than those using pre-CRT blood features. Based on the ridge regression model, which showed the best performance among the machine learning models (AUROC 0.6322 and AUPRC 0.5965), a novel scoring system for the response of preoperative CRT, named Response Prediction Score (RPS), was developed. The RPS system showed higher predictive power (AUROC 0.6747) than single blood features and systemic inflammatory indicators and stratified the tumor regression grade and overall downstaging clearly. CONCLUSION: We discovered that we can more accurately predict CRT response by using early-treatment blood data. With larger data, we can develop a more accurate and reliable indicator that can be used in real daily practices. In the future, we urge the collection of early-treatment blood data and pre-treatment blood data.

7.

HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball.

Kim, Jaesik; Kim, Dokyoon; Sohn, Kyung-Ah.

Bioinformatics ; 37(18): 2971-2980, 2021 09 29.

Article in English | MEDLINE | ID: mdl-33760022

ABSTRACT

MOTIVATION: Knowledge manipulation of Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be done primarily by using vector representation of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space to measure their semantic similarity using an embedding method such as the Word2Vec-based method to represent entities as numeric vectors. However, this method has the limitation that embedding large graph-structured data in the Euclidean space cannot prevent a loss of information of latent hierarchies, thus precluding the semantics of GO and GOA from being captured optimally. On the other hand, hyperbolic spaces such as the Poincaré ball are more suitable for modeling hierarchies, as they have a geometric property in which the distance increases exponentially as it nears the boundary because of negative curvature. RESULTS: In this article, we propose hierarchical representations of GO and genes (HiG2Vec) by applying Poincaré embedding specialized in the representation of hierarchy through a two-step procedure: GO embedding and gene embedding. Through experiments, we show that our model represents the hierarchical structure better than other approaches and predicts the interaction of genes or gene products as well as or better than previous studies. The results indicate that HiG2Vec is superior to other methods in capturing the GO and gene semantics and in data utilization as well. It can be robustly applied to manipulate various biological knowledge. AVAILABILITY AND IMPLEMENTATION: https://github.com/JaesikKim/HiG2Vec. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology , Proteins , Gene Ontology , Computational Biology/methods , Proteins/genetics , Semantics , Molecular Sequence Annotation , RNA
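The geometric property mentioned in the abstract above, that distances in the Poincaré ball grow rapidly near the boundary, can be checked with the standard Poincaré distance formula. The sketch below is independent of the HiG2Vec code; the function name `poincare_distance` and the sample points are assumptions chosen for illustration.

```python
import math


def poincare_distance(u, v):
    """Distance between two points inside the unit Poincaré ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Two pairs with the same Euclidean gap (0.1), one near the center
# and one near the boundary of the ball.
d_center = poincare_distance((0.0, 0.0), (0.1, 0.0))
d_boundary = poincare_distance((0.85, 0.0), (0.95, 0.0))
```

The pair near the boundary is several times farther apart in hyperbolic distance than the pair near the center, which is what leaves room for the many leaf nodes of a hierarchy to sit near the boundary while general terms stay near the origin.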

8.

Multi-layered network-based pathway activity inference using directed random walks: application to predicting clinical outcomes in urologic cancer.

Kim, So Yeon; Choe, Eun Kyung; Shivakumar, Manu; Kim, Dokyoon; Sohn, Kyung-Ah.

Bioinformatics ; 37(16): 2405-2413, 2021 Aug 25.

Article in English | MEDLINE | ID: mdl-33543748

ABSTRACT

MOTIVATION: To better understand the molecular features of cancers, a comprehensive analysis using multi-omics data has been conducted. In addition, a pathway activity inference method has been developed to facilitate the integrative effects of multiple genes. In this respect, we have recently proposed a novel integrative pathway activity inference approach, iDRW, and demonstrated the effectiveness of the method with respect to dichotomizing two survival groups. However, there were several limitations, such as a lack of generality. In this study, we designed a directed gene-gene graph using pathway information by assigning interactions between genes in multiple layers of networks. RESULTS: As a proof-of-concept study, the approach was evaluated using three genomic profiles of urologic cancer patients. The proposed integrative approach achieved improved outcome prediction performances compared with a single genomic profile alone and other existing pathway activity inference methods. The integrative approach also identified common/cancer-specific candidate driver pathways as predictive prognostic features in urologic cancers. Furthermore, it provides better biological insights into the prioritized pathways and genes in an integrated view using a multi-layered gene-gene network. Our framework is not specifically designed for urologic cancers and is generally applicable to various datasets. AVAILABILITY AND IMPLEMENTATION: iDRW is implemented as an R software package. The source code is available at https://github.com/sykim122/iDRW. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.

Interpretable temporal graph neural network for prognostic prediction of Alzheimer's disease using longitudinal neuroimaging data.

Kim, Mansu; Kim, Jaesik; Qu, Jeffrey; Huang, Heng; Long, Qi; Sohn, Kyung-Ah; Kim, Dokyoon; Shen, Li.

Proceedings (IEEE Int Conf Bioinformatics Biomed) ; 2021: 1381-1384, 2021 Dec.

Article in English | MEDLINE | ID: mdl-35299717

ABSTRACT

Alzheimer's disease (AD) is a progressive neurodegenerative brain disorder characterized by memory loss and cognitive decline. Early detection and accurate prognosis of AD is an important research topic, and numerous machine learning methods have been proposed to solve this problem. However, traditional machine learning models face challenges in effectively integrating longitudinal neuroimaging data and biologically meaningful structure and knowledge to build accurate and interpretable prognostic predictors. To bridge this gap, we propose an interpretable graph neural network (GNN) model for AD prognostic prediction based on longitudinal neuroimaging data while embracing the valuable knowledge of structural brain connectivity. In our empirical study, we demonstrate that 1) the proposed model outperforms several competing models (i.e., DNN, SVM) in terms of prognostic prediction accuracy, and 2) our model can capture neuroanatomical contribution to the prognostic predictor and yield biologically meaningful interpretation to facilitate better mechanistic understanding of Alzheimer's disease. Source code is available at https://github.com/JaesikKim/temporal-GNN.

10.

Epidemiology, Comorbidities, and Prescription Patterns of Korean Prurigo Nodularis Patients: A Multi-Institution Study.

Woo, Yu-Ri; Wang, Sehee; Sohn, Kyung-Ah; Kim, Hei-Sung.

J Clin Med ; 11(1)2021 Dec 24.

Article in English | MEDLINE | ID: mdl-35011837

ABSTRACT

Prurigo nodularis (PN) is a chronic dermatosis typified by extraordinarily itchy nodules. However, little is known of the nature and extent of PN in Asian people. This study aimed to describe the epidemiology, comorbidities, and prescription pattern of PN in Koreans based on a large dermatology outpatient cohort. Patients with PN were identified from the Catholic Medical Center (CMC) clinical data warehouse. Anonymized data on age, sex, diagnostic codes, prescriptions, visitation dates, and other relevant parameters were collected. Pearson correlation analysis was used to calculate the correlation between PN prevalence and patient age. Conditional logistic regression modeling was adopted to measure the comorbidity risk of PN. A total of 3591 patients with PN were identified at the Catholic Medical Center Health System dermatology outpatient clinic in the period 2007-2020. A comparison of the study patients with age- and sex-matched controls (dermatology outpatients without PN) indicated that PN was associated with various comorbidities including chronic kidney disease (adjusted odds ratio (aOR), 1.48; 95% confidence interval (CI), 1.29-1.70), dyslipidemia (aOR, 1.88; 95% CI, 1.56-2.27), type 2 diabetes mellitus (aOR, 1.37; 95% CI, 1.22-1.54), arterial hypertension (aOR, 1.50; 95% CI, 1.30-1.73), autoimmune thyroiditis (aOR, 2.43; 95% CI, 1.42-4.16), non-Hodgkin's lymphoma (aOR, 1.95; 95% CI, 1.23-3.07), and atopic dermatitis (aOR, 2.16; 95% CI, 1.91-2.45). Regarding prescription patterns, topical steroids were most favored, followed by topical calcineurin inhibitors; oral antihistamines were the most preferred systemic agent for PN. PN is a relatively rare but significant disease among Korean dermatology outpatients with a high comorbidity burden compared to dermatology outpatients without PN. There is a great need for breakthroughs in PN treatment.

11.

EEG-Based Emotion Classification for Alzheimer's Disease Patients Using Conventional Machine Learning and Recurrent Neural Network Models.

Seo, Jungryul; Laine, Teemu H; Oh, Gyuhwan; Sohn, Kyung-Ah.

Sensors (Basel) ; 20(24)2020 Dec 16.

Article in English | MEDLINE | ID: mdl-33339334

ABSTRACT

As the number of patients with Alzheimer's disease (AD) increases, the effort needed to care for these patients increases as well. At the same time, advances in information and sensor technologies have reduced caring costs, providing a potential pathway for developing healthcare services for AD patients. For instance, if a virtual reality (VR) system can provide emotion-adaptive content, the time that AD patients spend interacting with VR content is expected to be extended, allowing caregivers to focus on other tasks. As the first step towards this goal, in this study, we develop a classification model that detects AD patients' emotions (e.g., happy, peaceful, or bored). We first collected electroencephalography (EEG) data from 30 Korean female AD patients who watched emotion-evoking videos at a medical rehabilitation center. We applied conventional machine learning algorithms, such as a multilayer perceptron (MLP) and support vector machine, along with deep learning models of recurrent neural network (RNN) architectures. The best performance was obtained from MLP, which achieved an average accuracy of 70.97%; the RNN model's accuracy reached only 48.18%. Our study results open a new stream of research in the field of EEG-based emotion detection for patients with neurological disorders.

Subjects

Alzheimer Disease , Electroencephalography , Emotions/classification , Machine Learning , Computer Neural Networks , Alzheimer Disease/diagnosis , Female , Humans

12.

An Exploration of Machine Learning Methods for Robust Boredom Classification Using EEG and GSR Data.

Seo, Jungryul; Laine, Teemu H; Sohn, Kyung-Ah.

Sensors (Basel) ; 19(20)2019 Oct 20.

Article in English | MEDLINE | ID: mdl-31635194

ABSTRACT

In recent years, affective computing has been actively researched to provide a higher level of emotion-awareness. Numerous studies have been conducted to detect the user's emotions from physiological data. Among a myriad of target emotions, boredom, in particular, has been suggested to cause not only medical issues but also challenges in various facets of daily life. However, to the best of our knowledge, no previous studies have used electroencephalography (EEG) and galvanic skin response (GSR) together for boredom classification, although these data have potential features for emotion classification. To investigate the combined effect of these features on boredom classification, we collected EEG and GSR data from 28 participants using off-the-shelf sensors. During data acquisition, we used a set of stimuli comprising a video clip designed to elicit boredom and two other video clips of entertaining content. The collected samples were labeled based on the participants' questionnaire-based testimonies on experienced boredom levels. Using the collected data, we initially trained 30 models with 19 machine learning algorithms and selected the top three candidate classifiers. After tuning the hyperparameters, we validated the final models through 1000 iterations of 10-fold cross validation to increase the robustness of the test results. Our results indicated that a Multilayer Perceptron model performed the best with a mean accuracy of 79.98% (AUC: 0.781). It also revealed the correlation between boredom and the combined features of EEG and GSR. These results can be useful for building accurate affective computing systems and understanding the physiological properties of boredom.

Subjects

Boredom , Electroencephalography/methods , Machine Learning , Adult , Area Under Curve , Discriminant Analysis , Female , Galvanic Skin Response , Humans , Male , ROC Curve , Surveys and Questionnaires , Young Adult

13.

MildInt: Deep Learning-Based Multimodal Longitudinal Data Integration Framework.

Lee, Garam; Kang, Byungkon; Nho, Kwangsik; Sohn, Kyung-Ah; Kim, Dokyoon.

Front Genet ; 10: 617, 2019.

Article in English | MEDLINE | ID: mdl-31316553

ABSTRACT

As large amounts of heterogeneous biomedical data become available, numerous methods for integrating such datasets have been developed to extract complementary knowledge from multiple domains of sources. Recently, a deep learning approach has shown promising results in a variety of research areas. However, applying the deep learning approach requires expertise for constructing a deep architecture that can take multimodal longitudinal data. Thus, in this paper, a deep learning-based Python package for data integration is developed. The Python package, the deep learning-based multimodal longitudinal data integration framework (MildInt), provides a preconstructed deep learning architecture for a classification task. MildInt contains two learning phases: learning feature representation from each modality of data and training a classifier for the final decision. Adopting deep architecture in the first phase leads to learning more task-relevant feature representation than a linear model. In the second phase, a linear regression classifier is used for detecting and investigating biomarkers from multimodal data. Thus, by combining the linear model and the deep learning model, higher accuracy and better interpretability can be achieved. We validated the performance of our package using simulation data and real data. For the real data, as a pilot study, we used clinical and multimodal neuroimaging datasets in Alzheimer's disease to predict the disease progression. MildInt is capable of integrating multiple forms of numerical data including time series and non-time series data for extracting complementary features from the multimodal dataset.

14.

ClearF: a supervised feature scoring method to find biomarkers using class-wise embedding and reconstruction.

Wang, Sehee; Jeong, Hyun-Hwan; Sohn, Kyung-Ah.

BMC Med Genomics ; 12(Suppl 5): 95, 2019 07 11.

Article in English | MEDLINE | ID: mdl-31296201

ABSTRACT

BACKGROUND: Feature selection or scoring methods for the detection of biomarkers are essential in bioinformatics. Various feature selection methods have been developed for the detection of biomarkers, and several studies have employed information-theoretic approaches. However, most of these methods generally require a long processing time. In addition, information-theoretic methods discretize continuous features, which is a drawback that can lead to the loss of information. RESULTS: In this paper, a novel supervised feature scoring method named ClearF is proposed. The proposed method is suitable for continuous-valued data, which is similar to the principle of feature selection using mutual information, with the added advantage of a reduced computation time. The proposed score calculation is motivated by the association between the reconstruction error and the information-theoretic measurement. Our method is based on class-wise low-dimensional embedding and the resulting reconstruction error. Given multi-class datasets such as a case-control study dataset, low-dimensional embedding is first applied to each class to obtain a compressed representation of the class, and also for the entire dataset. Reconstruction is then performed to calculate the error of each feature and the final score for each feature is defined in terms of the reconstruction errors. The correlation between the information theoretic measurement and the proposed method is demonstrated using a simulation. For performance validation, we compared the classification performance of the proposed method with those of various algorithms on benchmark datasets. CONCLUSIONS: The proposed method showed higher accuracy and lower execution time than the other established methods. Moreover, an experiment was conducted on the TCGA breast cancer dataset, and it was confirmed that the genes with the highest scores were highly associated with subtypes of breast cancer.

Subjects

Biomarkers/metabolism , Computational Biology/methods , Supervised Machine Learning , Benchmarking

15.

Topological integration of RPPA proteomic data with multi-omics data for survival prediction in breast cancer via pathway activity inference.

Kim, Tae Rim; Jeong, Hyun-Hwan; Sohn, Kyung-Ah.

BMC Med Genomics ; 12(Suppl 5): 94, 2019 07 11.

Article in English | MEDLINE | ID: mdl-31296204

ABSTRACT

BACKGROUND: The analysis of integrated multi-omics data enables the identification of disease-related biomarkers that cannot be identified from a single omics profile. Although protein-level data reflects the cellular status of cancer tissue more directly than gene-level data, past studies have mainly focused on multi-omics integration using gene-level data as opposed to protein-level data. However, the use of protein-level data (such as mass spectrometry) in multi-omics integration has some limitations. For example, the correlation between the characteristics of gene-level data (such as mRNA) and protein-level data is weak, and it is difficult to detect low-abundance signaling proteins that are used to target cancer. The reverse phase protein array (RPPA) is a highly sensitive antibody-based quantification method for signaling proteins. However, the number of protein features in RPPA data is extremely low compared to the number of gene features in gene-level data. In this study, we present a new method for integrating RPPA profiles with RNA-Seq and DNA methylation profiles for survival prediction based on the integrative directed random walk (iDRW) framework proposed in our previous study. In the iDRW framework, each omics profile is merged into a single pathway profile that reflects the topological information of the pathway. In order to address the sparsity of RPPA profiles, we employ the random walk with restart (RWR) approach on the pathway network. RESULTS: Our model was validated using survival prediction analysis for a breast cancer dataset from The Cancer Genome Atlas. Our proposed model exhibited improved performance compared with other methods that utilize pathway information and also out-performed models that did not include the RPPA data utilized in our study. The risk pathways identified for breast cancer in this study were closely related to well-known breast cancer risk pathways. 
CONCLUSIONS: Our results indicated that RPPA data is useful for survival prediction for breast cancer patients under our framework. We also observed that iDRW effectively integrates RNA-Seq, DNA methylation, and RPPA profiles, while variation in the composition of the omics data can affect both prediction performance and risk pathway identification. These results suggest that omics data composition is a critical parameter for iDRW.

Subjects

Breast Neoplasms/metabolism , Protein Array Analysis , Proteomics , Breast Neoplasms/genetics , DNA Methylation , Humans , Survival Analysis
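The random walk with restart (RWR) that the abstract above applies to the pathway network can be sketched on a toy graph. This is a generic illustration rather than the iDRW implementation: the four-node graph, the restart probability of 0.3, and the function name `random_walk_with_restart` are assumptions, and the abstract specifies none of them.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adjacency[i][j] is the weight of the edge from node j to node i;
    W is adjacency with each column normalized to sum to 1.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    seed_total = float(sum(seeds))
    p0 = [s / seed_total for s in seeds]
    p = p0[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            walk = sum(adjacency[i][j] / col_sums[j] * p[j]
                       for j in range(n) if col_sums[j] > 0)
            new_p.append((1.0 - restart) * walk + restart * p0[i])
        delta = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p


# Hypothetical toy network: a chain 0 - 1 - 2 - 3 with symmetric edges,
# where only node 0 carries a measured signal (the seed).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(chain, seeds=[1, 0, 0, 0])
```

The resulting scores decrease with distance from the seed, so nodes without a direct measurement still receive a value from their neighbors. That propagation is the general mechanism by which RWR can compensate for a sparse profile.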

16.

Assessment of intratumoral heterogeneity with mutations and gene expression profiles.

Sung, Ji-Yong; Shin, Hyun-Tae; Sohn, Kyung-Ah; Shin, Soo-Yong; Park, Woong-Yang; Joung, Je-Gun.

PLoS One ; 14(7): e0219682, 2019.

Article in English | MEDLINE | ID: mdl-31310640

ABSTRACT

Intratumoral heterogeneity (ITH) refers to the presence of distinct tumor cell populations. It provides vital information for the clinical prognosis, drug responsiveness, and personalized treatment of cancer patients. As genomic ITH in various cancers affects the expression patterns of genes, the expression profile could be utilized for determining ITH level. Herein, we present a novel approach to directly detect high ITH, defined as a larger number of subclones, from the gene expression pattern through machine learning approaches. We examined associations between gene expression profile and ITH of 12 cancer types from The Cancer Genome Atlas (TCGA) database. Using stomach adenocarcinoma (STAD), which showed a high association, we evaluated the performance of our method in predicting ITH by employing three machine learning algorithms using gene expression profile data. We classified tumors into high and low heterogeneity groups using the learning model with LASSO feature selection. The result showed that support vector machines (SVMs) outperformed other algorithms (AUC = 0.84 in SVMs and 0.82 in Naïve Bayes) and we were able to improve predictive power by using combined data from both mutation and expression. Furthermore, we evaluated the prediction ability of each model using simulation data generated by mixing cell lines of the Cancer Cell Line Encyclopedia (CCLE), and obtained results consistent with those from the real dataset. Our approach could be utilized for discriminating tumors with heterogeneous cell populations to characterize ITH.

Subjects

Adenocarcinoma/genetics , Gene Expression Profiling , Mutation , Stomach Neoplasms/genetics , Algorithms , Area Under Curve , Bayes Theorem , Tumor Cell Line , Computer Simulation , Factual Databases , Neoplastic Gene Expression Regulation , Genetic Heterogeneity , Human Genome , Genomics , Humans , Prognosis , ROC Curve , Support Vector Machine , Transcriptome

17.

Robust pathway-based multi-omics data integration using directed random walks for survival prediction in multiple cancer studies.

Kim, So Yeon; Jeong, Hyun-Hwan; Kim, Jaesik; Moon, Jeong-Hyeon; Sohn, Kyung-Ah.

Biol Direct ; 14(1): 8, 2019 04 29.

Article in English | MEDLINE | ID: mdl-31036036

ABSTRACT

BACKGROUND: Integrating the rich information from multi-omics data has been a popular approach to survival prediction and bio-marker identification for several cancer studies. To facilitate the integrative analysis of multiple genomic profiles, several studies have suggested utilizing pathway information rather than using individual genomic profiles. METHODS: We have recently proposed an integrative directed random walk-based method utilizing pathway information (iDRW) for more robust and effective genomic feature extraction. In this study, we applied iDRW to multiple genomic profiles for two different cancers, and designed a directed gene-gene graph which reflects the interaction between gene expression and copy number data. In the experiments, the performances of the iDRW method and four state-of-the-art pathway-based methods were compared using a survival prediction model which classifies samples into two survival groups. RESULTS: The results show that the integrative analysis guided by pathway information not only improves prediction performance, but also provides better biological insights into the top pathways and genes prioritized by the model in both the neuroblastoma and the breast cancer datasets. The pathways and genes selected by the iDRW method were shown to be related to the corresponding cancers. CONCLUSIONS: In this study, we demonstrated the effectiveness of a directed random walk-based multi-omics data integration method applied to gene expression and copy number data for both breast cancer and neuroblastoma datasets. We revamped a directed gene-gene graph considering the impact of copy number variation on gene expression and redefined the weight initialization and gene-scoring method. The benchmark result for iDRW with four pathway-based methods demonstrated that the iDRW method improved survival prediction performance and jointly identified cancer-related pathways and genes for two different cancer datasets. 
REVIEWERS: This article was reviewed by Helena Molina-Abril and Marta Hidalgo.
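
The abstract above refers to a directed random walk over a gene-gene graph. As a rough illustration only, the following sketch shows a generic random walk with restart on a small directed graph. It is not the authors' iDRW implementation: the function name, the restart value of 0.7, the toy edges, the uniform initial weights, and the handling of nodes without outgoing edges are all assumptions made for this example.

```python
def random_walk_with_restart(edges, initial, restart=0.7, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on a directed graph (illustrative sketch).

    edges   : list of (source, target) pairs; every node must be a key of `initial`
    initial : dict mapping node -> non-negative starting weight (hypothetical values)
    restart : probability of jumping back to the initial distribution at each step
    """
    nodes = sorted(initial)
    total = sum(initial.values())
    # Normalise the starting weights so that they sum to 1.
    w0 = {n: initial[n] / total for n in nodes}

    # Outgoing neighbours of each node.
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    w = dict(w0)
    for _ in range(max_iter):
        # Restart term: a fixed share of the mass returns to the start distribution.
        nxt = {n: restart * w0[n] for n in nodes}
        for src in nodes:
            targets = out[src]
            if not targets:
                # Assumption: a node with no outgoing edges hands its mass
                # back to the initial distribution so that total mass stays 1.
                for n in nodes:
                    nxt[n] += (1 - restart) * w[src] * w0[n]
                continue
            # Otherwise the remaining mass is split evenly along outgoing edges.
            share = (1 - restart) * w[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        diff = sum(abs(nxt[n] - w[n]) for n in nodes)
        w = nxt
        if diff < tol:
            break
    return w


# Toy directed graph with hypothetical node names: A -> B, B -> C, A -> C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C")]
toy_initial = {"A": 1.0, "B": 1.0, "C": 1.0}
scores = random_walk_with_restart(toy_edges, toy_initial)
```

In this toy graph the node that receives the most incoming edges ends up with the highest score, which is the general intuition behind using such a walk to weight topologically important genes. How iDRW initialises the weights and converts the resulting gene scores into pathway features is described in the paper itself and is not reproduced here.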

Subjects

Breast Neoplasms/epidemiology , DNA Copy Number Variations , Gene Expression Regulation, Neoplastic , Genome, Human , Neuroblastoma/epidemiology , Breast Neoplasms/genetics , Computational Biology/methods , Humans , Models, Genetic , Neuroblastoma/genetics , Survival Analysis

18.

Predicting Alzheimer's disease progression using multi-modal deep learning approach.

Lee, Garam; Nho, Kwangsik; Kang, Byungkon; Sohn, Kyung-Ah; Kim, Dokyoon.

Sci Rep ; 9(1): 1952, 2019 02 13.

Article in English | MEDLINE | ID: mdl-30760848

ABSTRACT

Alzheimer's disease (AD) is a progressive neurodegenerative condition marked by a decline in cognitive functions, with no validated disease-modifying treatment. For timely treatment, it is critical to detect AD at an early stage, before clinical manifestation. Mild cognitive impairment (MCI) is an intermediate stage between cognitively normal older adults and AD. To predict conversion from MCI to probable AD, we applied a deep learning approach, a multimodal recurrent neural network. We developed an integrative framework that combines not only cross-sectional neuroimaging biomarkers at baseline but also longitudinal cerebrospinal fluid (CSF) and cognitive performance biomarkers obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. The proposed framework integrated longitudinal multi-domain data. Our results showed that 1) our prediction model for MCI conversion to AD yielded up to 75% accuracy (area under the curve (AUC) = 0.83) when using only a single modality of data separately; and 2) our prediction model achieved the best performance, with 81% accuracy (AUC = 0.86), when incorporating longitudinal multi-domain data. A multi-modal deep learning approach has the potential to identify persons at risk of developing AD who might benefit most from a clinical trial, or to serve as a stratification approach within clinical trials.

Subjects

Alzheimer Disease/physiopathology , Cognitive Dysfunction/diagnosis , Forecasting/methods , Aged , Aged, 80 and over , Amyloid beta-Peptides/cerebrospinal fluid , Area Under Curve , Biomarkers/cerebrospinal fluid , Brain/metabolism , Cognition/physiology , Cross-Sectional Studies , Deep Learning , Disease Progression , Female , Humans , Male , Neuroimaging/methods , Peptide Fragments/cerebrospinal fluid , tau Proteins/cerebrospinal fluid

19.

Integrative pathway-based survival prediction utilizing the interaction between gene expression and DNA methylation in breast cancer.

Kim, So Yeon; Kim, Tae Rim; Jeong, Hyun-Hwan; Sohn, Kyung-Ah.

BMC Med Genomics ; 11(Suppl 3): 68, 2018 Sep 14.

Article in English | MEDLINE | ID: mdl-30255812

ABSTRACT

BACKGROUND: Integrative analysis of multi-omics data has gained much attention recently. To investigate the interactive effect of gene expression and DNA methylation on cancer, we propose a directed random walk-based approach on an integrated gene-gene graph that is guided by pathway information. METHODS: Our approach first extracts a single pathway profile matrix from the gene expression and DNA methylation data by performing the random walk over the integrated graph. We then apply a denoising autoencoder to the pathway profile to further identify important pathway features and genes. The extracted features are validated in a survival prediction task for breast cancer patients. RESULTS: The results show that the proposed method substantially improves survival prediction performance compared to other pathway-based prediction methods, revealing that the combined effect of gene expression and methylation data is well reflected in the integrated gene-gene graph combined with pathway information. Furthermore, we show that our joint analysis of the methylation features and gene expression profile identifies cancer-specific pathways with genes related to breast cancer. CONCLUSIONS: In this study, we proposed a directed random walk (DRW)-based method on an integrated gene-gene graph with expression and methylation profiles in order to utilize the interactions between them. The results showed that the constructed integrated gene-gene graph can successfully reflect the combined effect of methylation features on gene expression profiles. We also found that the features selected by the denoising autoencoder (DA) effectively capture topologically important pathways and genes specifically related to breast cancer.

Subjects

Biomarkers, Tumor/genetics , Breast Neoplasms/mortality , DNA Methylation , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genomics/methods , Polymorphism, Single Nucleotide , Breast Neoplasms/genetics , Female , Gene Expression Profiling , Humans , Prognosis , Survival Rate , Transcriptome

20.

Biological Brain Age Prediction Using Cortical Thickness Data: A Large Scale Cohort Study.

Aycheh, Habtamu M; Seong, Joon-Kyung; Shin, Jeong-Hyeon; Na, Duk L; Kang, Byungkon; Seo, Sang W; Sohn, Kyung-Ah.

Front Aging Neurosci ; 10: 252, 2018.

Article in English | MEDLINE | ID: mdl-30186151

ABSTRACT

Brain age estimation from anatomical features has been attracting more attention in recent years. This interest is motivated by the importance of biological age prediction in health informatics, with an application to early prediction of neurocognitive disorders. It is well known that normal brain aging follows a specific pattern, which enables researchers and practitioners to predict the age of a human brain from its degeneration. In this paper, we model brain age as predicted from cortical thickness data gathered from brain images of a large cohort. We included 2,911 cognitively normal subjects (age 45-91 years) at a single medical center and acquired their brain magnetic resonance (MR) images. All images were acquired using the same scanner with the same protocol. We propose to first apply Sparse Group Lasso (SGL) for feature selection by utilizing the brain's anatomical grouping. Once the features are selected, a non-parametric, non-linear regression using the Gaussian Process Regression (GPR) algorithm is applied to fit the final age prediction model. Experimental results demonstrate that the proposed method achieves a mean absolute error of 4.05 years, which is comparable with or superior to several recent methods. Our method can also be a critical tool for clinicians to differentiate patients with neurodegenerative brain disease by extracting a cortical thinning pattern associated with normal aging.
