1.
Article En | MEDLINE | ID: mdl-38083327

A preliminary analysis was conducted on data acquired from RNA sequencing and SomaScan platforms, for the classification of patients with Inflammation of Unknown Origin. To this end, a multimodal data integration approach was designed, by combining the two platforms, in order to assess the potential of learning estimators, using the differentially expressed features from the independent profiling experiments of both platforms. The classification framing was the differentiation of Inflammation of Unknown Origin patients against a multitude of Systemic Autoinflammatory disease patients. Separate false discovery rate analyses were performed on each dataset to extract statistically significant features between the two designated sample groups. Genomic analysis managed higher overall classification metrics compared to proteomic analysis, averaging a ~19% increase across all metrics and classifiers, with a ~0.07% increase in standard error. The multimodal data integration approach achieved similar results to the individual platforms' analyses. More specifically, it managed the same classification accuracy, sensitivity, and specificity scores as the best individual analysis, with the simple Logistic Regression estimator. Clinical Relevance- This study highlights the advantage of exploiting RNA sequencing data to identify potential Inflammation of Unknown Origin disease specific biomarkers, even against other Systemic Autoinflammatory diseases. These findings are further emphasized given the non-apparent clinical discrepancy between Inflammation of Unknown Origin and other Systemic Autoinflammatory diseases.


Hereditary Autoinflammatory Diseases , Proteomics , Humans , Proteomics/methods , RNA-Seq , Genomics/methods , Sequence Analysis, RNA/methods , Syndrome
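
The abstract above mentions separate false discovery rate analyses but does not name the procedure used. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not the authors' stated method), feature selection from a list of p-values could look like this. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is
    rejected while controlling the false discovery rate at `alpha`."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per feature (not taken from the study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
significant = benjamini_hochberg(p_values, alpha=0.05)
```

With these illustrative inputs only the two smallest p-values pass the threshold. In a two-platform setting the same function would be applied once per dataset, matching the "separate analyses" wording of the abstract.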
2.
Psychooncology ; 32(11): 1762-1770, 2023 11.
Article En | MEDLINE | ID: mdl-37830776

OBJECTIVE: This study aimed to describe distinct trajectories of anxiety/depression symptoms and overall health status/quality of life over a period of 18 months following a breast cancer diagnosis, and identify the medical, socio-demographic, lifestyle, and psychological factors that predict these trajectories. METHODS: 474 females (mean age = 55.79 years) were enrolled in the first weeks after surgery or biopsy. Data from seven assessment points over 18 months, at 3-month intervals, were used. The two outcomes were assessed at all points. Potential predictors were assessed at baseline and the first follow-up. Machine-Learning techniques were used to detect latent patterns of change and identify the most important predictors. RESULTS: Five trajectories were identified for each outcome: stably high, high with fluctuations, recovery, deteriorating/delayed response, and stably poor well-being (chronic distress). Psychological factors (i.e., negative affect, coping, sense of control, social support), age, and a few medical variables (e.g., symptoms, immune-related inflammation) predicted patients' participation in the delayed response and the chronic distress trajectories versus all other trajectories. CONCLUSIONS: There is a strong possibility that resilience does not always reflect a stable response pattern, as there might be some interim fluctuations. The use of machine-learning techniques provides a unique opportunity for the identification of illness trajectories and a shortlist of major bio/behavioral predictors. This will facilitate the development of early interventions to prevent a significant deterioration in patient well-being.


Breast Neoplasms , Female , Humans , Middle Aged , Breast Neoplasms/psychology , Quality of Life/psychology , Adaptation, Psychological , Depression/psychology , Anxiety/psychology
3.
J Med Internet Res ; 25: e43838, 2023 06 12.
Article En | MEDLINE | ID: mdl-37307043

BACKGROUND: Health professionals are often faced with the need to identify women at risk of manifesting poor psychological resilience following the diagnosis and treatment of breast cancer. Machine learning algorithms are increasingly used to support clinical decision support (CDS) tools in helping health professionals identify women who are at risk of adverse well-being outcomes and plan customized psychological interventions for women at risk. Clinical flexibility, cross-validated performance accuracy, and model explainability permitting person-specific identification of risk factors are highly desirable features of such tools. OBJECTIVE: This study aimed to develop and cross-validate machine learning models designed to identify breast cancer survivors at risk of poor overall mental health and global quality of life and identify potential targets of personalized psychological interventions according to an extensive set of clinical recommendations. METHODS: A set of 12 alternative models was developed to improve the clinical flexibility of the CDS tool. All models were validated using longitudinal data from a prospective, multicenter clinical pilot at 5 major oncology centers in 4 countries (Italy, Finland, Israel, and Portugal; the Predicting Effective Adaptation to Breast Cancer to Help Women to BOUNCE Back [BOUNCE] project). A total of 706 patients with highly treatable breast cancer were enrolled shortly after diagnosis and before the onset of oncological treatments and were followed up for 18 months. An extensive set of demographic, lifestyle, clinical, psychological, and biological variables measured within 3 months after enrollment served as predictors. Rigorous feature selection isolated key psychological resilience outcomes that could be incorporated into future clinical practice. 
RESULTS: Balanced random forest classifiers were successful at predicting well-being outcomes, with accuracies ranging between 78% and 82% (for 12-month end points after diagnosis) and between 74% and 83% (for 18-month end points after diagnosis). Explainability and interpretability analyses built on the best-performing models were used to identify potentially modifiable psychological and lifestyle characteristics that, if addressed systematically in the context of personalized psychological interventions, would be most likely to promote resilience for a given patient. CONCLUSIONS: Our results highlight the clinical utility of the BOUNCE modeling approach by focusing on resilience predictors that can be readily available to practicing clinicians at major oncology centers. The BOUNCE CDS tool paves the way for personalized risk assessment methods to identify patients at high risk of adverse well-being outcomes and direct valuable resources toward those most in need of specialized psychological interventions.


Breast Neoplasms , Decision Support Systems, Clinical , Resilience, Psychological , Humans , Female , Prospective Studies , Quality of Life , Risk Assessment , Machine Learning
4.
Sci Rep ; 13(1): 7059, 2023 04 29.
Article En | MEDLINE | ID: mdl-37120428

Identifying individual patient characteristics that contribute to long-term mental health deterioration following diagnosis of breast cancer (BC) is critical in clinical practice. The present study employed a supervised machine learning pipeline to address this issue in a subset of data from a prospective, multinational cohort of women diagnosed with stage I-III BC with a curative treatment intention. Patients were classified as displaying stable HADS scores (Stable Group; n = 328) or reporting a significant increase in symptomatology between BC diagnosis and 12 months later (Deteriorated Group; n = 50). Sociodemographic, life-style, psychosocial, and medical variables collected on the first visit to their oncologist and three months later served as potential predictors of patient risk stratification. The flexible and comprehensive machine learning (ML) pipeline used entailed feature selection, model training, validation and testing. Model-agnostic analyses aided interpretation of model results at the variable- and patient-level. The two groups were discriminated with a high degree of accuracy (Area Under the Curve = 0.864) and a fair balance of sensitivity (0.85) and specificity (0.87). Both psychological (negative affect, certain coping with cancer reactions, lack of sense of control/positive expectations, and difficulties in regulating negative emotions) and biological variables (baseline percentage of neutrophils, thrombocyte count) emerged as important predictors of mental health deterioration in the long run. Personalized break-down profiles revealed the relative impact of specific variables toward successful model predictions for each patient. Identifying key risk factors for mental health deterioration is an essential first step toward prevention. Supervised ML models may guide clinical recommendations toward successful illness adaptation.


Breast Neoplasms , Mental Health , Humans , Female , Prospective Studies , Breast Neoplasms/diagnosis , Breast Neoplasms/psychology , Algorithms , Adaptation, Psychological
5.
Article En | MEDLINE | ID: mdl-36085801

Being diagnosed with breast cancer (BC) can be a traumatic experience for patients who may experience symptoms of depression. In order to facilitate the prevention of such symptoms, it is crucial to understand how and why depressive symptoms emerge and evolve for each individual, from diagnosis through treatment and recovery. In the present work, data from a multicentric study of 706 BC patients followed for 12 months are analyzed. First, a trajectory-based unsupervised clustering based on K-means is performed to capture the dynamic patterns of change in patients' depressive symptoms after BC diagnosis and to identify distinct trajectory clusters. Then a supervised learning approach was employed to build a classification model of depression progression and to identify potential predictors. Patients were clustered into 4 groups: stable low, stable high, improving, and worsening depressive symptoms. In a nested cross-validation pipeline, the performance of the Support Vector Machine model for discriminating between "good" and "poor" progression was 0.78±0.05 in terms of AUC. Several psychological variables emerged as highly predictive of the evolution of depressive symptoms with the most important ones being negative affectivity and anxious preoccupation. Clinical Relevance-The findings of the present study may help clinicians tailor individualized psychological interventions aiming at alleviating the burden of these symptoms in women with breast cancer and improving their overall well-being.


Breast Neoplasms , Breast Neoplasms/complications , Breast Neoplasms/diagnosis , Cluster Analysis , Depression/diagnosis , Depression/etiology , Female , Humans , Longitudinal Studies , Support Vector Machine
6.
Article En | MEDLINE | ID: mdl-36086666

A meta-analysis study was conducted to compare high-throughput technologies in the classification of Adult-Onset Still's Disease patients, using differentially expressed genes from independent profiling experiments. We exploited two publicly available datasets from the Gene Expression Omnibus and performed a separate differential expression analysis on each dataset to extract statistically important genes. We then mapped the genes of the two datasets and subsequently we employed well-established machine learning algorithms to evaluate the denoted genes as candidate biomarkers. Using next-generation sequencing data, we managed to achieve the maximum (100%) classification accuracy, sensitivity and specificity with the Gradient Boosting and the Random Forest classifiers, compared to the 83% of the DNA microarray data. Clinical Relevance- When biomarkers derived from one study are applied to the data of another, in many cases the results may diverge significantly. Here we establish that in cross-profiling meta-analysis approaches based on differential expression analysis, next-generation sequencing data provide more accurate results than microarray experiments in the classification of Adult-Onset Still's Disease patients.


Gene Expression Profiling , Still's Disease, Adult-Onset , Biomarkers , Gene Expression Profiling/methods , Humans , Machine Learning , Oligonucleotide Array Sequence Analysis/methods , Still's Disease, Adult-Onset/diagnosis , Still's Disease, Adult-Onset/genetics
7.
Comput Biol Med ; 141: 105176, 2022 02.
Article En | MEDLINE | ID: mdl-35007991

The coronavirus disease 2019 (COVID-19) which is caused by severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) is consistently causing profound wounds in the global healthcare system due to its increased transmissibility. Currently, there is an urgent unmet need to identify the underlying dynamic associations among COVID-19 patients and distinguish patient subgroups with common clinical profiles towards the development of robust classifiers for ICU admission and mortality. To address this need, we propose a four step pipeline which: (i) enhances the quality of multiple timeseries clinical data through an automated data curation workflow, (ii) deploys Dynamic Bayesian Networks (DBNs) for the detection of features with increased connectivity based on dynamic association analysis across multiple points, (iii) utilizes Self Organizing Maps (SOMs) and trajectory analysis for the early identification of COVID-19 patients with common clinical profiles, and (iv) trains robust multiple additive regression trees (MART) for ICU admission and mortality classification based on the extracted homogeneous clusters, to identify risk factors and biomarkers for disease progression. The contribution of the extracted clusters and the dynamically associated clinical data improved the classification performance for ICU admission to sensitivity 0.83 and specificity 0.83, and for mortality to sensitivity 0.74 and specificity 0.76. Additional information was included to enhance the performance of the classifiers yielding an increase by 4% in sensitivity and specificity for mortality. According to the risk factor analysis, the number of lymphocytes, SatO2, PO2/FiO2, and O2 supply type were highlighted as risk factors for ICU admission and the percentage of neutrophils and lymphocytes, PO2/FiO2, LDH, and ALP for mortality, among others. 
To our knowledge, this is the first study that combines dynamic modeling with clustering analysis to identify homogeneous groups of COVID-19 patients towards the development of robust classifiers for ICU admission and mortality.


COVID-19 , Bayes Theorem , Hospitalization , Humans , Intensive Care Units , Retrospective Studies , SARS-CoV-2
8.
Comput Struct Biotechnol J ; 20: 471-484, 2022.
Article En | MEDLINE | ID: mdl-35070169

For many decades, the clinical unmet needs of primary Sjögren's Syndrome (pSS) have been left unresolved due to the rareness of the disease and the complexity of the underlying pathogenic mechanisms, including the pSS-associated lymphomagenesis process. Here, we present the HarmonicSS cloud-computing exemplar which offers beyond the state-of-the-art data analytics services to address the pSS clinical unmet needs, including the development of lymphoma classification models and the identification of biomarkers for lymphomagenesis. The users of the platform have been able to successfully interlink, curate, and harmonize 21 regional, national, and international European cohorts of 7,551 pSS patients with respect to the ethical and legal issues for data sharing. Federated AI algorithms were trained across the harmonized databases, with reduced execution time complexity, yielding robust lymphoma classification models with 85% accuracy, 81.25% sensitivity, 85.4% specificity along with 5 biomarkers for lymphoma development. To our knowledge, this is the first GDPR compliant platform that provides federated AI services to address the pSS clinical unmet needs.

9.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 1753-1756, 2021 11.
Article En | MEDLINE | ID: mdl-34891626

Breast cancer diagnosis has been associated with poor mental health, with significant impairment of quality of life. In order to ensure support for successful adaptation to this illness, it is of paramount importance to identify the most prominent factors affecting well-being that allow for accurate prediction of mental health status across time. Here we exploit a rich set of clinical, psychological, socio-demographic and lifestyle data from a large multicentre study of patients recently diagnosed with breast cancer, in order to classify patients based on their mental health status and further identify potential predictors of such status. For this purpose, a supervised learning pipeline using cross-sectional data was implemented for the formulation of a classification scheme of mental health status 6 months after diagnosis. Model performance in terms of AUC ranged from 0.81 ± 0.04 to 0.90 ± 0.03. Several psychological variables, including initial levels of anxiety and depression, emerged as highly predictive of short-term mental health status of women diagnosed with breast cancer.


Breast Neoplasms , Mental Health , Breast Neoplasms/diagnosis , Breast Neoplasms/psychology , Cross-Sectional Studies , Depression/diagnosis , Female , Humans , Quality of Life
10.
Comput Struct Biotechnol J ; 19: 5546-5555, 2021.
Article En | MEDLINE | ID: mdl-34712399

Artificial Intelligence (AI) has recently altered the landscape of cancer research and medical oncology using traditional Machine Learning (ML) algorithms and cutting-edge Deep Learning (DL) architectures. In this review article we focus on the ML aspect of AI applications in cancer research and present the most indicative studies with respect to the ML algorithms and data used. The PubMed and dblp databases were considered to obtain the most relevant research works of the last five years. Based on a comparison of the proposed studies and their research clinical outcomes concerning the medical ML application in cancer research, three main clinical scenarios were identified. We give an overview of the well-known DL and Reinforcement Learning (RL) methodologies, as well as their application in clinical practice, and we briefly discuss Systems Biology in cancer research. We also provide a thorough examination of the clinical scenarios with respect to disease diagnosis, patient classification and cancer prognosis and survival. The most relevant studies identified in the preceding year are presented along with their primary findings. Furthermore, we examine the effective implementation and the main points that need to be addressed in the direction of robustness, explainability and transparency of predictive models. Finally, we summarize the most recent advances in the field of AI/ML applications in cancer research and medical oncology, as well as some of the challenges and open issues that need to be addressed before data-driven models can be implemented in healthcare systems to assist physicians in their daily practice.

11.
Comput Biol Med ; 131: 104266, 2021 04.
Article En | MEDLINE | ID: mdl-33607379

Displaying resilience following a diagnosis of breast cancer is crucial for successful adaptation to illness, well-being, and health outcomes. Several theoretical and computational models have been proposed toward understanding the complex process of illness adaptation, involving a large variety of patient sociodemographic, lifestyle, medical, and psychological characteristics. To date, conventional multivariate statistical methods have been used extensively to model resilience. In the present work we describe a computational pipeline designed to identify the most prominent predictors of mental health outcomes following breast cancer diagnosis. A machine learning framework was developed and tested on the baseline data (recorded immediately post diagnosis) from an ongoing prospective, multinational study. This fully annotated dataset includes socio-demographic, lifestyle, medical and self-reported psychological characteristics of women recently diagnosed with breast cancer (N = 609). Nine different feature selection and cross-validated classification schemes were compared on their performance in classifying patients into low vs high depression symptom severity. Best-performing approaches involved a meta-estimator combined with a Support Vector Machines (SVMs) classification algorithm, exhibiting balanced accuracy of 0.825, and a fair balance between sensitivity (90%) and specificity (74%). These models consistently identified a set of psychological traits (optimism, perceived ability to cope with trauma, resilience as trait, ability to comprehend the illness), and subjective perceptions of personal functionality (physical, social, cognitive) as key factors accounting for concurrent depression symptoms. A comprehensive supervised learning pipeline is proposed for the identification of predictors of depression symptoms which could severely impede adaptation to illness.


Breast Neoplasms , Breast Neoplasms/diagnosis , Demography , Female , Humans , Life Style , Machine Learning , Outcome Assessment, Health Care , Prospective Studies , Self Report
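
The abstract above reports balanced accuracy together with sensitivity and specificity. As a small sketch of how these three quantities relate, the function below computes them from confusion-matrix counts. The counts are hypothetical, chosen only to reproduce the reported 90% sensitivity and 74% specificity; they are not the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and balanced accuracy
    from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Balanced accuracy is the unweighted mean of the two rates.
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical counts: 100 positives and 100 negatives.
sens, spec, bal_acc = classification_metrics(tp=90, fn=10, tn=74, fp=26)
```

With these counts the balanced accuracy comes out at 0.82, close to but not identical to the reported 0.825. The gap may reflect rounding of the published percentages or averaging across cross-validation folds; the abstract does not say which.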
12.
Diagnostics (Basel) ; 12(1)2021 Dec 28.
Article En | MEDLINE | ID: mdl-35054223

BACKGROUND: Although several studies have been launched towards the prediction of risk factors for mortality and admission in the intensive care unit (ICU) in COVID-19, none of them focuses on the development of explainable AI models to define an ICU scoring index using dynamically associated biological markers. METHODS: We propose a multimodal approach which combines explainable AI models with dynamic modeling methods to shed light on the clinical features of COVID-19. Dynamic Bayesian networks were used to seek associations among cytokines across four time intervals after hospitalization. Explainable gradient boosting trees were trained to predict the risk for ICU admission and mortality towards the development of an ICU scoring index. RESULTS: Our results highlight LDH, IL-6, IL-8, Cr, number of monocytes, lymphocyte count, and TNF as risk predictors for ICU admission and survival, along with LDH, age, CRP, Cr, WBC, and lymphocyte count for mortality in the ICU, with prediction accuracy 0.79 and 0.81, respectively. These risk factors were combined with dynamically associated biological markers to develop an ICU scoring index with accuracy 0.9. CONCLUSIONS: To our knowledge, this is the first multimodal and explainable AI model which quantifies the risk of intensive care with accuracy up to 0.9 across multiple timepoints.

13.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 5544-5547, 2020 07.
Article En | MEDLINE | ID: mdl-33019234

In this study, we propose a dynamic Bayesian network (DBN)-based approach to behavioral modelling of community-dwelling older adults at risk for falls during the daily sessions of a hologram-enabled vestibular rehabilitation therapy programme. The component of human behavior being modelled is the level of frustration experienced by the user at each exercise, as it is assessed by the NASA Task Load Index. Herein, we present the topology of the DBN and test its inference performance on real-patient data. Clinical Relevance- Precise behavioral modelling will provide an indicator for tailoring the rehabilitation programme to each individual's personal psychological needs.


Augmented Reality , Postural Balance , Accidental Falls/prevention & control , Aged , Bayes Theorem , Humans , Physical Therapy Modalities
14.
Comput Biol Med ; 116: 103577, 2020 01.
Article En | MEDLINE | ID: mdl-32001012

Genomic profiling of cancer studies has generated comprehensive gene expression patterns for diverse phenotypes. Computational methods which employ transcriptomics datasets have been proposed to model gene expression data. Dynamic Bayesian Networks (DBNs) have been used for modeling time series datasets and for the inference of regulatory networks. Furthermore, cancer classification through DBN-based approaches could reveal the importance of exploiting knowledge from statistically significant genes and key regulatory molecules. Although microarray datasets have been employed extensively by several classification methods for decision making, the use of new knowledge from the pathway level has not been addressed adequately in the literature in terms of DBNs for cancer classification. In the present study, we identify the genes that act as regulators and mediate the activity of transcription factors that have been found in all promoters of our differentially expressed gene sets. These features serve as potential priors for distinguishing tumor from normal samples using a DBN-based classification approach. We employed three microarray datasets from the Gene Expression Omnibus (GEO) public functional repository and performed differential expression analysis. Promoter and pathway analysis of the identified genes revealed the key regulators which influence the transcription mechanisms of these genes. We applied the DBN algorithm on selected genes and identified the features that can accurately classify the samples into tumors and controls. Both accuracy and Area Under the Curve (AUC) were high for the gene sets comprising the differentially expressed genes along with their master regulators (accuracy: 70.8%-98.5%; AUC: 0.562-0.985).


Gene Regulatory Networks , Neoplasms , Algorithms , Bayes Theorem , Computational Biology , Gene Expression Profiling , Gene Regulatory Networks/genetics , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis
15.
IEEE Open J Eng Med Biol ; 1: 83-90, 2020.
Article En | MEDLINE | ID: mdl-35402941

Goal: To present a framework for data sharing, curation, harmonization and federated data analytics to solve open issues in healthcare, such as, the development of robust disease prediction models. Methods: Data curation is applied to remove data inconsistencies. Lexical and semantic matching methods are used to align the structure of the heterogeneous, curated cohort data along with incremental learning algorithms including class imbalance handling and hyperparameter optimization to enable the development of disease prediction models. Results: The applicability of the framework is demonstrated in a case study of primary Sjögren's Syndrome, yielding harmonized data with increased quality and more than 85% agreement, along with lymphoma prediction models with more than 80% sensitivity and specificity. Conclusions: The framework provides data quality, harmonization and analytics workflows that can enhance the statistical power of heterogeneous clinical data and enables the development of robust models for disease prediction.

16.
IEEE Open J Eng Med Biol ; 1: 49-56, 2020.
Article En | MEDLINE | ID: mdl-35402956

Lymphoma development constitutes one of the most serious clinico-pathological manifestations of patients with Sjögren's Syndrome (SS). Over the last decades the risk for lymphomagenesis in SS patients has been studied aiming to identify novel biomarkers and risk factors predicting lymphoma development in this patient population. Objective: The current study aims to explore whether genetic susceptibility profiles of SS patients along with known clinical, serological and histological risk factors enhance the accuracy of predicting lymphoma development in this patient population. Methods: The potential predicting role of both genetic variants, clinical and laboratory risk factors were investigated through a Machine Learning-based (ML) framework which encapsulates ensemble classifiers. Results: Ensemble methods empower the classification accuracy with approaches which are sensitive to minor perturbations in the training phase. The evaluation of the proposed methodology based on a 10-fold stratified cross validation procedure yielded considerable results in terms of balanced accuracy (GB: 0.7780 ± 0.1514, RF Gini: 0.7626 ± 0.1787, RF Entropy: 0.7590 ± 0.1837). Conclusions: The initial clinical, serological, histological and genetic findings at an early diagnosis have been exploited in an attempt to establish predictive tools in clinical practice and further enhance our understanding towards lymphoma development in SS.
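
The evaluation described above relies on 10-fold stratified cross-validation. As an illustrative sketch of what stratification means, the generator below builds folds that preserve class proportions. It is a simplified, deterministic version with no shuffling, all names are hypothetical, and it is not the authors' implementation.

```python
from collections import defaultdict


def stratified_k_fold(labels, k=10):
    """Yield (train_indices, test_indices) pairs in which every test
    fold keeps roughly the same class proportions as the full set."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Deal each class's indices round-robin across the k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, test


# Hypothetical imbalanced labels: 20 positive and 80 negative samples.
labels = [1] * 20 + [0] * 80
splits = list(stratified_k_fold(labels, k=10))
```

With 20 positives and 80 negatives, each of the 10 test folds holds 2 positives and 8 negatives, so a per-fold metric such as balanced accuracy is always computed on both classes. A practical pipeline would typically also shuffle within each class before dealing the indices.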

17.
Clin Exp Rheumatol ; 37 Suppl 118(3): 90-96, 2019.
Article En | MEDLINE | ID: mdl-31287405

OBJECTIVES: To address the need for automatically assessing the quality of clinical data in terms of accuracy, relevance, conformity, and completeness, through the concise development and application of an automated method which is able to automatically detect problematic fields and match clinical terms under a specific domain. METHODS: The proposed methodology involves the automated construction of three diagnostic reports that summarise valuable information regarding the types and ranges of each term in the dataset, along with the detected outliers, inconsistencies, and missing values, followed by a set of clinically relevant terms based on a reference model which serves as a set of terms which describes the domain knowledge of a disease of interest. RESULTS: A case study was conducted using anonymised data from 250 patients who were diagnosed with primary Sjögren's syndrome (pSS), yielding reliable outcomes that were highlighted for clinical evaluation. Our method was able to successfully identify 28 features with detected outliers, and unknown data types, as well as, identify outliers, missing values, similar terms, and inconsistencies within the dataset. The data standardisation method was able to match 76 out of 85 (89.41%) pSS-related terms according to a standard pSS reference model which has been introduced by the clinicians. CONCLUSIONS: Our results confirm the clinical value of the data curation method towards the improvement of the dataset quality through the precise identification of outliers, missing values, inconsistencies, and similar terms, as well as, through the automated detection of pSS-related relevant terms towards data standardisation.


Data Curation , Sjogren's Syndrome , Data Accuracy , Humans
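
The abstract above describes matching dataset terms against a reference model and reports 76 of 85 terms matched (89.41%). The matching algorithm is not specified, so the sketch below assumes a purely lexical approach using Python's standard `difflib` module. The terms, the cutoff, and the function name are hypothetical.

```python
import difflib


def match_terms(dataset_terms, reference_terms, cutoff=0.8):
    """Map each dataset term to its closest reference term, or to None
    when no reference term reaches the similarity cutoff."""
    # Compare in lower case, but report the original reference spelling.
    normalized_reference = {term.lower(): term for term in reference_terms}
    matches = {}
    for term in dataset_terms:
        candidates = difflib.get_close_matches(
            term.lower(), list(normalized_reference), n=1, cutoff=cutoff
        )
        matches[term] = normalized_reference[candidates[0]] if candidates else None
    return matches


# Hypothetical terms, not taken from the study's reference model.
reference = ["lymphoma", "dry eyes", "dry mouth", "anti-Ro/SSA"]
dataset = ["Lymphoma", "dry_eyes", "dry mouth", "blood pressure"]
matches = match_terms(dataset, reference)

# Fraction of dataset terms that found a reference match.
coverage = sum(value is not None for value in matches.values()) / len(dataset)
```

The `coverage` value plays the same role as the 89.41% figure in the abstract: the share of terms that could be aligned with the reference model. Lexical similarity alone misses synonyms, which is presumably why the related abstracts in this listing also mention semantic matching.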
18.
Comput Biol Med ; 107: 270-283, 2019 04.
Article En | MEDLINE | ID: mdl-30878889

Data quality assessment has gained attention in recent years since more and more companies and medical centers are highlighting the importance of an automated framework to effectively manage the quality of their big data. Data cleaning, also known as data curation, lies at the heart of data quality assessment and is a key aspect prior to the development of any data analytics services. In this work, we present the objectives, functionalities and methodological advances of an automated framework for data curation from a medical perspective. The steps towards the development of a system for data quality assessment are first described along with multidisciplinary data quality measures. A three-layer architecture which realizes these steps is then presented. Emphasis is placed on the detection and tracking of inconsistencies, missing values, outliers, and similarities, as well as on data standardization to finally enable data harmonization. A case study is conducted in order to demonstrate the applicability and reliability of the proposed framework on two well-established cohorts with clinical data related to primary Sjögren's Syndrome (pSS). Our results confirm the validity of the proposed framework towards the automated and fast identification of outliers, inconsistencies, and highly-correlated and duplicated terms, as well as the successful matching of more than 85% of the pSS-related medical terms in both cohorts, yielding more accurate, relevant, and consistent clinical data.


Data Accuracy , Data Curation/methods , Electronic Health Records , Big Data , Female , Humans , Male , Sjogren's Syndrome
19.
Annu Int Conf IEEE Eng Med Biol Soc ; 2017: 3876-3879, 2017 Jul.
Article En | MEDLINE | ID: mdl-29060744

We propose a meta-analysis scheme for identifying differentially expressed genes in Oral Squamous Cell Carcinoma (OSCC) from different microarray studies. We detect a subset of relevant features and further classify samples under two experimental conditions (i.e., healthy and cancer samples) for better patient stratification. A well-established meta-analysis method is adopted and gene expression data sets are derived from a public functional genomics data repository. Our primary aim is the accurate identification of up- and down-regulated genes in order to extract valuable biological information concerning the changes in expression between healthy and cancer samples. According to our results and the extracted informative gene list, high accuracy in classifying healthy and OSCC tumor samples is achieved with as few genes as possible. Furthermore, the proposed scheme implies that the combination of datasets from different origins may reduce the estimated percentage of false predictions, while the power of gene identification and disease classification is increased.


Mouth Neoplasms , Carcinoma, Squamous Cell , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Oligonucleotide Array Sequence Analysis
20.
IEEE J Biomed Health Inform ; 21(2): 320-327, 2017 03.
Article En | MEDLINE | ID: mdl-28114044

Oral squamous cell carcinoma has been characterized as a complex disease which involves dynamic genomic changes at the molecular level. These changes make it worthwhile to explore the interactions of the molecules, and especially of the differentially expressed genes that contribute to cancer progression. Moreover, based on this knowledge the identification of differentially expressed genes and related molecular pathways is of great importance. In the present study, we exploit differentially expressed genes in order to further perform pathway enrichment analysis. According to our results, we found significant pathways in which the disease-associated genes have been identified as strongly enriched. Furthermore, based on the results of the pathway enrichment analysis we propose a methodology for predicting oral cancer recurrence using dynamic Bayesian networks. The methodology takes into consideration time series gene expression data in order to predict disease recurrence. Subsequently, we are able to conjecture about the causal interactions between genes in consecutive time intervals. Concerning the performance of the predictive models, the overall accuracy of the algorithm is 81.8% and the area under the ROC curve is 89.2% regarding the knowledge from the overrepresented pre-NOTCH Expression and processing pathway.


Computational Biology/methods , Models, Statistical , Mouth Neoplasms/genetics , Neoplasm Recurrence, Local/genetics , Signal Transduction/genetics , Algorithms , Carcinoma, Squamous Cell/epidemiology , Carcinoma, Squamous Cell/genetics , Carcinoma, Squamous Cell/metabolism , Gene Expression Profiling , Humans , Mouth Neoplasms/epidemiology , Mouth Neoplasms/metabolism , Neoplasm Recurrence, Local/epidemiology , Neoplasm Recurrence, Local/metabolism , ROC Curve , Transcriptome/genetics
...