Results 1 - 20 of 37
1.
Front Aging Neurosci ; 16: 1356745, 2024.
Article in English | MEDLINE | ID: mdl-38813529

ABSTRACT

Objectives: Accurately predicting when patients with mild cognitive impairment (MCI) will progress to dementia is a formidable challenge. This work aims to develop a predictive deep learning model to accurately predict future cognitive decline and magnetic resonance imaging (MRI) marker changes over time at the individual level for patients with MCI. Methods: We recruited 657 amnestic patients with MCI from the Samsung Medical Center who underwent cognitive tests, brain MRI scans, and amyloid-β (Aβ) positron emission tomography (PET) scans. We devised a novel deep learning architecture by leveraging an attention mechanism in a recurrent neural network. We trained a predictive model by inputting age, gender, education, apolipoprotein E genotype, neuropsychological test scores, and brain MRI and amyloid PET features. Cognitive outcomes and MRI features of an MCI subject were predicted using the proposed network. Results: The proposed predictive model demonstrated good prediction performance (AUC = 0.814 ± 0.035) in five-fold cross-validation, along with reliable prediction in cognitive decline and MRI markers over time. Faster cognitive decline and brain atrophy in larger regions were forecasted in patients with Aβ (+) than with Aβ (-). Conclusion: The proposed method provides effective and accurate means for predicting the progression of individuals within a specific period. This model could assist clinicians in identifying subjects at a higher risk of rapid cognitive decline by predicting future cognitive decline and MRI marker changes over time for patients with MCI. Future studies should validate and refine the proposed predictive model further to improve clinical decision-making.

2.
Genomics ; 116(3): 110834, 2024 05.
Article in English | MEDLINE | ID: mdl-38527595

ABSTRACT

edgeR (Robust) is a popular approach for identifying differentially expressed genes (DEGs) from RNA-Seq profiles. However, it performs poorly in the presence of gene-specific outliers and cannot handle missing observations. To address these issues, we propose a pre-processing approach for RNA-Seq count data that combines iLOO-based outlier detection with random forest-based missing value imputation to boost the performance of edgeR (Robust). Analyses of both simulated and real RNA-Seq count data showed that the proposed edgeR (Robust) outperformed the conventional edgeR (Robust). To investigate the effectiveness of the identified DEGs for the diagnosis and therapy of ovarian cancer (OC), we selected the 12 top-ranked DEGs (IL6, XCL1, CXCL8, C1QC, C1QB, SNAI2, TYROBP, COL1A2, SNAP25, NTS, CXCL2, and AGT) and suggested the 10 top-ranked candidate drug molecules, guided by hub-DEGs, for the treatment of OC. Hence, our proposed procedure might be an effective computational tool for exploring potential DEGs from RNA-Seq profiles for the diagnosis and therapy of any disease.


Subjects
Tumor Biomarkers, Ovarian Neoplasms, RNA-Seq, Humans, Ovarian Neoplasms/genetics, Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/therapy, Female, Tumor Biomarkers/genetics, Software, Transcriptome, Gene Expression Profiling
3.
Technol Health Care ; 32(1): 75-87, 2024.
Article in English | MEDLINE | ID: mdl-37248924

ABSTRACT

BACKGROUND: In practice, datasets collected for analysis are usually incomplete because some records have missing attribute values. Many related works focus on constructing specific models that estimate replacements for the missing values, so that the original incomplete datasets become complete. Another type of solution handles the incomplete datasets directly, without missing value imputation, with decision trees being the major technique for this purpose. OBJECTIVE: To introduce a novel approach, namely Deep Learning-based Decision Tree Ensembles (DLDTE), which borrows the bounding box and sliding window strategies used in deep learning techniques to divide an incomplete dataset into a number of subsets and learn from each subset with a decision tree, resulting in decision tree ensembles. METHOD: Two medical-domain datasets containing several hundred feature dimensions, with missing rates of 10% to 50%, are used for performance comparison. RESULTS: The proposed DLDTE provides the highest classification accuracy when compared with the baseline decision tree method, two missing value imputation methods (mean and k-nearest neighbor), and the case deletion method. CONCLUSION: The results demonstrate the effectiveness of DLDTE for handling incomplete medical datasets with different missing rates.


Subjects
Deep Learning, Humans, Cluster Analysis, Decision Trees
4.
Int J Med Inform ; 178: 105191, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37657203

ABSTRACT

BACKGROUND: Mortality risk prediction aims to predict whether a patient is at risk of death based on relevant diagnosis and treatment data. Accurately predicting patient mortality risk from electronic health records (EHR) is currently a hot research topic in the healthcare field. Real medical datasets often contain many missing values, which can seriously degrade model predictions. However, when missing values are interpolated, most existing methods do not take into account the fidelity or confidence of the interpolated values. Misestimation of missing variables can lead to modeling difficulties and performance degradation, and the reliability of the model may be compromised in clinical environments. MATERIALS AND METHODS: We propose a model based on Missing Value Imputation and Reliability Assessment for mortality risk prediction (MVIRA). The model uses a combination of a variational autoencoder and recurrent neural networks to interpolate missing values and enhance the characterization ability of EHR data, thus improving the performance of mortality risk prediction. In addition, we introduce the Monte Carlo Dropout method to calculate the uncertainty of the model's predictions and thus assess the reliability of the model. RESULTS: We validate the performance of the model on the public datasets MIMIC-III and MIMIC-IV. The proposed model showed improved performance compared with competitive models in terms of overall specialties. CONCLUSION: The proposed model can effectively improve the accuracy of mortality risk prediction, and can help medical institutions assess the condition of patients.

5.
Sensors (Basel) ; 23(10), 2023 May 09.
Article in English | MEDLINE | ID: mdl-37430495

ABSTRACT

With an increasing number of offshore wind farms, monitoring and evaluating the effects of the wind turbines on the marine environment have become important tasks. Here we conducted a feasibility study with the focus on monitoring these effects by utilizing different machine learning methods. A multi-source dataset for a study site in the North Sea is created by combining satellite data, local in situ data and a hydrodynamic model. The machine learning algorithm DTWkNN, which is based on dynamic time warping and k-nearest neighbor, is used for multivariate time series data imputation. Subsequently, unsupervised anomaly detection is performed to identify possible inferences in the dynamic and interdepending marine environment around the offshore wind farm. The anomaly results are analyzed in terms of location, density and temporal variability, granting access to information and building a basis for explanation. Temporal detection of anomalies with COPOD is found to be a suitable method. Actionable insights are the direction and magnitude of potential effects of the wind farm on the marine environment, depending on the wind direction. This study works towards a digital twin of offshore wind farms and provides a set of methods based on machine learning to monitor and evaluate offshore wind farm effects, supporting stakeholders with information for decision making on future maritime energy infrastructures.
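
The abstract above names dynamic time warping (DTW) as one of the two building blocks of DTWkNN. As a rough illustration only, the following sketch computes the classic DTW distance between two univariate series; the function name `dtw_distance` is hypothetical, and this is not the DTWkNN implementation used in the study.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = step + min(cost[i - 1][j],      # a[i-1] repeats the match to b[j-1]
                                    cost[i][j - 1],      # b[j-1] repeats the match to a[i-1]
                                    cost[i - 1][j - 1])  # both series advance
    return cost[n][m]
```

A nearest-neighbor imputer could use such a distance to rank candidate donor series, so that series with the same shape but slightly shifted timing still count as close.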

6.
J Biomed Inform ; 144: 104440, 2023 08.
Article in English | MEDLINE | ID: mdl-37429511

ABSTRACT

The imputation of missing values in multivariate time series (MTS) data is critical in ensuring data quality and producing reliable data-driven predictive models. Apart from many statistical approaches, a few recent studies have proposed state-of-the-art deep learning methods to impute missing values in MTS data. However, the evaluation of these deep methods is limited to one or two data sets, low missing rates, and completely random missing value types. This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets. Our extensive analysis reveals that no single imputation method outperforms the others on all five data sets. The imputation performance depends on data types, individual variable statistics, missing value rates, and types. Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputations of missing values in time series data yield statistically better data quality than traditional imputation methods. Although computationally expensive, deep learning methods are practical given the current availability of high-performance computing resources, especially when data quality and sample size are of paramount importance in healthcare informatics. Our findings highlight the importance of data-centric selection of imputation methods to optimize data-driven predictive models.


Subjects
Benchmarking, Research Design, Time Factors, Cross-Sectional Studies, Surveys and Questionnaires
7.
Brief Bioinform ; 24(4), 2023 07 20.
Article in English | MEDLINE | ID: mdl-37419612

ABSTRACT

Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we utilized renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, and bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression datasets. Our results demonstrate that ProJect consistently performs better than other referenced MVI methods. It achieves the lowest normalized root mean square error (on average, scoring 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the lowest Procrustes sum of squared error (Procrustes SS) (exhibiting 79.71% less error in RC_C, 38.36% in RC_full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared to the next best method). ProJect also leads with the highest correlation coefficient among all types of MV combinations (0.64% higher in RC_C, 0.24% in RC_full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM versus the second-best performing method). ProJect's key strength is its ability to handle different types of MVs commonly found in real-world data. Unlike most MVI methods that are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines if an MV is missing at random or missing not at random. It then employs targeted imputation strategies for each MV type, resulting in more accurate and reliable imputation outcomes.
An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.


Subjects
Algorithms, Genomics, Bayes Theorem, Oligonucleotide Array Sequence Analysis/methods, Mass Spectrometry/methods
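
The ProJect abstract above reports results as normalized root mean square error (NRMSE). The sketch below shows one common way to compute it, assuming the error is normalized by the standard deviation of the true values; the normalization choice and the function name `nrmse` are assumptions for illustration, and the paper may define the metric differently.

```python
import math

def nrmse(true_values, imputed_values):
    """RMSE between true and imputed values, divided by the std. dev. of the true values."""
    n = len(true_values)
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / n
    mean = sum(true_values) / n
    variance = sum((t - mean) ** 2 for t in true_values) / n  # population variance
    return math.sqrt(mse / variance)
```

Under this definition a score of 0 means perfect recovery of the masked values, while a score near 1 means the imputation is about as good as filling every gap with the mean of the true values.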
8.
Drug Discov Today ; 28(9): 103661, 2023 09.
Article in English | MEDLINE | ID: mdl-37301250

ABSTRACT

In data-processing pipelines, upstream steps can influence downstream processes because of their sequential nature. Among these data-processing steps, batch effect (BE) correction (BEC) and missing value imputation (MVI) are crucial for ensuring data suitability for advanced modeling and reducing the likelihood of false discoveries. Although BEC-MVI interactions are not well studied, they are ultimately interdependent. Batch sensitization can improve the quality of MVI. Conversely, accounting for missingness also improves proper BE estimation in BEC. Here, we discuss how BEC and MVI are interconnected and interdependent. We show how batch sensitization can improve any MVI and bring attention to the idea of BE-associated missing values (BEAMs). Finally, we discuss how batch-class imbalance problems can be mitigated by borrowing ideas from machine learning.


Subjects
Electronic Data Processing
9.
Nutrients ; 15(6), 2023 Mar 09.
Article in English | MEDLINE | ID: mdl-36986073

ABSTRACT

Recommendations to reduce intake of free sugars are included in some national dietary guidelines. However, as the content of free sugars is absent from most of the food composition tables, the adherence to such recommendations is hard to monitor. We developed a novel method to estimate the free sugar content in the Philippines food composition table, based on a data-driven algorithm that enabled automated annotation. We then used these estimates to analyze the free sugar intake of 66,016 Filipinos aged 4 years and over. The average free sugar consumption was 19 g/day, accounting for an average of 3% of the total caloric intake. Snacks and breakfast were the meals with the highest content of free sugars. Intake of free sugars, in grams per day and as % of energy, was positively associated with wealth status. The same pattern was observed for the consumption of sugar-sweetened beverages.


Subjects
Diet, Sugars, Humans, Beverages/analysis, Nutrition Surveys, Energy Intake, Meals
10.
Ann Oper Res ; 325(1): 557-588, 2023.
Article in English | MEDLINE | ID: mdl-35068645

ABSTRACT

The importance of young athletes in the field of professional cycling has sky-rocketed during the past years. Nevertheless, the early talent identification of these riders largely remains a subjective assessment. Therefore, an analytical system which automatically detects talented riders based on their freely available youth results should be installed. However, such a system cannot be copied directly from related fields, as large distinctions are observed between cycling and other sports. The aim of this paper is to develop such a data analytical system, which leverages the unique features of each race and thereby focusses on feature engineering, data quality, and visualization. To facilitate the deployment of prediction algorithms in situations without complete cases, we propose an adaptation to the k-nearest neighbours imputation algorithm which uses expert knowledge. Overall, our proposed method correlates strongly with eventual rider performance and can aid scouts in targeting young talents. On top of that, we introduce several model interpretation tools to give insight into which current starting professional riders are expected to perform well and why.
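
The cycling abstract above builds on k-nearest neighbours imputation. As a minimal sketch of the plain algorithm only, the code below fills each missing value with the mean of that feature over the `k` closest rows, measuring closeness on the features both rows have observed. The function name `knn_impute` and the toy data layout (lists with `None` for missing) are assumptions; the expert-knowledge adaptation proposed in the paper is not reproduced here.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows."""
    def distance(r, s):
        # root mean squared difference over features observed in both rows
        shared = [(x - y) ** 2 for x, y in zip(r, s)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum(shared) / len(shared))

    imputed = [list(r) for r in rows]  # copy, so the input stays untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # donors: other rows that observed column j and are comparable to this row
            donors = [(distance(row, other), other[j])
                      for t, other in enumerate(rows)
                      if t != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no comparable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

Distances are averaged over shared features rather than summed, so that rows with many missing values are not favored merely because fewer terms enter the distance.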

11.
Front Mol Biosci ; 9: 907150, 2022.
Article in English | MEDLINE | ID: mdl-36458095

ABSTRACT

Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of gene expression data. However, more complex analysis for classification of sample observations, or discovery of feature genes requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in analysis of expression microarray data. Even though the methods are discussed in the context of expression microarrays, they can also be applied for the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values, and the methods and approaches usually employed in their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery along with their evaluation parameters are described in detail. We believe that this detailed review will help the users to select appropriate methods for preprocessing and analysis of their data based on the expected outcome.

12.
Entropy (Basel) ; 24(12), 2022 Dec 09.
Article in English | MEDLINE | ID: mdl-36554203

ABSTRACT

Time series data are usually characterized by missing values, high dimensionality, and large data volume. To address the problem of high-dimensional time series with missing values, this paper proposes an attention-based sequence-to-sequence model to impute missing values in time series (ASSM), which is a sequence-to-sequence model based on the combination of feature learning and data computation. The model consists of two parts, an encoder and a decoder. The encoder is a BiGRU recurrent neural network that incorporates a self-attention mechanism to make the model more capable of handling long-range time series; the decoder is a GRU recurrent neural network that incorporates a cross-attention mechanism to associate it with the encoder. The relationship weights between the generated sequences in the decoder and the known sequences in the encoder are calculated so that the model focuses on the sequences with a high degree of correlation. In this paper, we conduct comparison experiments with four evaluation metrics and six models on four real datasets. The experimental results show that the proposed model outperforms the six comparative missing value interpolation algorithms.

13.
Proteomics ; 22(23-24): e2200092, 2022 12.
Article in English | MEDLINE | ID: mdl-36349819

ABSTRACT

Proteomics data are often plagued with missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reducing statistical power, introducing bias, and failing to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods perform their tasks based on different prior assumptions (e.g., data is normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even when dealing with the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide decision making on a suitable MVI method, we provide a decision chart which facilitates strategic considerations on datasets presenting different characteristics. We also bring attention to other issues that can impact proper MVI, such as the presence of confounders (e.g., batch effects), which can influence MVI performance. Thus, these too should be considered during or before MVI.


Subjects
Algorithms, Proteomics
14.
Knowl Based Syst ; 249, 2022 Aug 05.
Article in English | MEDLINE | ID: mdl-36159738

ABSTRACT

Missing values in tabular data restrict the use and performance of machine learning, requiring the imputation of missing values. The most popular imputation algorithm is arguably multiple imputation by chained equations (MICE), which estimates missing values from linear conditioning on observed values. This paper proposes methods to improve both the imputation accuracy of MICE and the classification accuracy of imputed data by replacing MICE's linear regressors with ensemble learning and deep neural networks (DNN). The imputation accuracy is further improved by characterizing individual samples with cluster labels (CISCL) obtained from the training data. Our extensive analyses involving six tabular data sets, up to 80% missing values, and three missing types (missing completely at random, missing at random, missing not at random) reveal that ensemble or deep learning within MICE is superior to the baseline MICE (b-MICE), both of which are consistently outperformed by CISCL. Results show that CISCL + b-MICE outperforms b-MICE for all percentages and types of missingness. Our proposed DNN-based MICE and gradient boosting MICE plus CISCL (GB-MICE-CISCL) outperform seven state-of-the-art imputation algorithms in most experimental cases. The classification accuracy of GB-MICE imputed data is further improved by our proposed GB-MICE-CISCL imputation method across all missingness percentages. Results also reveal a shortcoming of the MICE framework at high missingness (>50%) and when the missing type is not random. This paper provides a generalized approach to identifying the best imputation model for a data set with a missingness percentage and type.
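
To make the baseline in the abstract above concrete, here is a minimal sketch of the chained-equations idea with linear regressors: each incomplete column is regressed in turn on all other columns, and its missing cells are overwritten with the predictions. All function names are hypothetical. The sketch is deliberately simplified and produces a single deterministic imputation; it omits the random draws and multiple completed data sets of full MICE, and the ensemble, DNN and CISCL extensions proposed in the paper. It also assumes the predictors are not perfectly collinear, since the normal equations would otherwise be singular.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(xs, ys):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(x) for x in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear(xtx, xty)

def chained_imputation(rows, n_iter=5):
    """Impute None entries by cycling linear regressions over incomplete columns."""
    n_cols = len(rows[0])
    missing = [[v is None for v in r] for r in rows]
    filled = [list(r) for r in rows]
    # initial guess: mean of the observed values in each column
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed)
        for i in range(len(rows)):
            if missing[i][j]:
                filled[i][j] = mean
    for _ in range(n_iter):
        for j in range(n_cols):
            if not any(m[j] for m in missing):
                continue  # column j is complete, nothing to impute
            def predictors(r):
                return [r[c] for c in range(n_cols) if c != j]
            # fit on rows where column j was actually observed
            train = [(predictors(r), r[j])
                     for r, m in zip(filled, missing) if not m[j]]
            beta = fit_ols([x for x, _ in train], [y for _, y in train])
            for r, m in zip(filled, missing):
                if m[j]:
                    x = [1.0] + predictors(r)
                    r[j] = sum(b * v for b, v in zip(beta, x))
    return filled
```

Replacing `fit_ols` with a non-linear learner is, in spirit, the substitution the abstract describes; the outer loop over columns stays the same.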

15.
J Am Med Inform Assoc ; 29(11): 1949-1957, 2022 10 07.
Article in English | MEDLINE | ID: mdl-36040195

ABSTRACT

OBJECTIVE: Despite efforts to improve screening and early detection of prostate cancer (PC), no available biomarker has shown acceptable performance in patients with prostate-specific antigen (PSA) gray zones. We aimed to develop a deep learning-based prediction model with minimized parameters and missing value handling algorithms for PC and clinically significant PC (CSPC). MATERIALS AND METHODS: We retrospectively analyzed data from 18 824 prostate biopsies collected between March 2003 and December 2020 from 2 databases, resulting in 12 739 cases in the PSA gray zone of 2.0-10.0 ng/mL. Dense neural network (DNN) and extreme gradient boosting (XGBoost) models for PC and CSPC were developed with 5-fold cross-validation. The area under the curve of the receiver operating characteristic (AUROC) was compared with that of serum PSA, PSA density, free PSA (fPSA) portion, and prostate health index (PHI). RESULTS: The AUROC values in the DNN model with the imputation of missing values were 0.739 and 0.708 (PC) and 0.769 and 0.742 (CSPC) in internal and external validation, whereas those of the non-imputed dataset were 0.740 and 0.771 (PC) and 0.807 and 0.771 (CSPC), respectively. The performance of the DNN model was similar to that of the XGBoost model, but better than that of all tested clinical biomarkers for both PC and CSPC. The developed DNN model outperformed PHI, serum PSA, and percent-fPSA with or without missing value imputation. DISCUSSION: DNN models with missing value imputation can be used to predict PC and CSPC. Further validation in real-life scenarios is needed before actual implementation can be recommended, but the results from our study support the increasing role of deep learning analytics in the clinical setting. CONCLUSIONS: A deep learning model for PC and CSPC in PSA gray zones using minimal, routinely used clinical parameter variables and data imputation of missing values was successfully developed and validated.


Subjects
Clinical Decision Support Systems, Deep Learning, Prostatic Neoplasms, Biopsy/methods, Humans, Male, Prostate/pathology, Prostate-Specific Antigen, Prostatic Neoplasms/diagnosis, ROC Curve, Retrospective Studies
16.
BMC Genomics ; 23(1): 496, 2022 Jul 08.
Article in English | MEDLINE | ID: mdl-35804317

ABSTRACT

BACKGROUND: Reliable and effective label-free quantification (LFQ) analyses are dependent not only on the method of data acquisition in the mass spectrometer, but also on the downstream data processing, including software tools, query database, data normalization and imputation. In non-human primates (NHP), LFQ is challenging because the query databases for NHP are limited since the genomes of these species are not comprehensively annotated. This invariably results in limited discovery of proteins and associated Post Translational Modifications (PTMs) and a higher fraction of missing data points. While identification of fewer proteins and PTMs due to database limitations can negatively impact uncovering important and meaningful biological information, missing data also limits downstream analyses (e.g., multivariate analyses), decreases statistical power, biases statistical inference, and makes biological interpretation of the data more challenging. In this study we attempted to address both issues: first, we used the MetaMorpheus proteomics search engine to counter the limits of NHP query databases and maximize the discovery of proteins and associated PTMs, and second, we evaluated different imputation methods for accurate data inference. We used a generic approach for missing data imputation analysis without distinguishing the potential source of missing data (either non-assigned m/z or missing values across runs). RESULTS: Using the MetaMorpheus proteomics search engine we obtained quantitative data for 1622 proteins and 10,634 peptides including 58 different PTMs (biological, metal and artifacts) across a diverse age range of NHP brain frontal cortex. However, among the 1622 proteins identified, only 293 proteins were quantified across all samples with no missing values, emphasizing the importance of implementing an accurate and statistically valid imputation method to fill in missing data. In our imputation analysis we demonstrate that Single Imputation methods that borrow information from correlated proteins, such as Generalized Ridge Regression (GRR), Random Forest (RF), local least squares (LLS), and Bayesian Principal Component Analysis (BPCA), are able to estimate missing protein abundance values with great accuracy. CONCLUSIONS: Overall, this study offers a detailed comparative analysis of LFQ data generated in NHP and proposes strategies for improved LFQ in NHP proteomics data.


Subjects
Algorithms, Proteomics, Animals, Bayes Theorem, Primates, Proteomics/methods, Software
17.
Quant Finance ; 22(6): 1113-1132, 2022.
Article in English | MEDLINE | ID: mdl-35782965

ABSTRACT

Analysts' forecasts are among the most common and important estimators of firms' future earnings. However, they are challenging to fully utilize because of missing values. This study applies machine learning techniques to impute missing values in individual analysts' forecasts and subsequently to predict firms' future earnings based on both imputed and observed forecasts. After imputing missing values, the forecast error is reduced by 41% compared to the mean forecast, suggesting that imputed values are indeed useful for earnings forecasts. We analyze multiple imputation methods and show that the out-performance of matrix factorization (MF) is consistent across different evaluation measures and across firms. Finally, we propose a stochastic gradient descent based coupled matrix factorization (CMF) to augment the imputation quality of missing values with multiple datasets. CMF further reduces the error of earnings forecasts by 19% compared to MF with a single dataset.
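
The abstract above imputes a forecast matrix by matrix factorization (MF) trained with stochastic gradient descent. The sketch below illustrates plain single-matrix MF on a toy table in which rows could stand for analysts and columns for firms; the function name `mf_impute`, the hyperparameters and the toy data are assumptions chosen for illustration, and the coupled variant (CMF) proposed in the paper is not reproduced.

```python
import random

def mf_impute(matrix, rank=1, lr=0.01, epochs=5000, seed=0):
    """Fill None entries with the product of learned row and column factors."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # small positive starting factors; u holds row factors, v column factors
    u = [[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(n_rows)]
    v = [[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(n_cols)]
    observed = [(i, j, matrix[i][j])
                for i in range(n_rows) for j in range(n_cols)
                if matrix[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, x in observed:
            pred = sum(a * b for a, b in zip(u[i], v[j]))
            err = x - pred
            for f in range(rank):
                ui, vj = u[i][f], v[j][f]
                # gradient step on the squared error of this single entry
                u[i][f] += lr * err * vj
                v[j][f] += lr * err * ui
    # keep observed entries as they are, predict only the missing ones
    return [[matrix[i][j] if matrix[i][j] is not None
             else sum(a * b for a, b in zip(u[i], v[j]))
             for j in range(n_cols)] for i in range(n_rows)]
```

For example, with `[[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, None]]` the observed entries follow a rank-1 pattern, so a rank-1 factorization should place the missing cell close to 9. On real data a regularization term and a held-out validation set would normally be added to choose `rank` and avoid overfitting.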

18.
PeerJ ; 10: e13525, 2022.
Article in English | MEDLINE | ID: mdl-35769140

ABSTRACT

One of the difficulties encountered in the statistical analysis of metaproteomics data is the high proportion of missing values, which are usually treated by imputation. Nevertheless, imputation methods are based on restrictive assumptions regarding missingness mechanisms, namely "at random" or "not at random". To circumvent these limitations in the context of feature selection in a multi-class comparison, we propose a univariate selection method that combines a test of association between missingness and classes, and a test for difference of observed intensities between classes. This approach implicitly handles both missingness mechanisms. We performed a quantitative and qualitative comparison of our procedure with imputation-based feature selection methods on two experimental data sets, as well as simulated data with various scenarios regarding the missingness mechanisms and the nature of the difference of expression (differential intensity or differential presence). Whereas we observed similar performances in terms of prediction on the experimental data set, the feature ranking and selection from various imputation-based methods were strongly divergent. We showed that the combined test reaches a compromise by correlating reasonably with other methods, and remains efficient in all simulated scenarios unlike imputation-based feature selection methods.


Subjects
Proteomics, Research Design, Data Accuracy
19.
Neural Netw ; 150: 422-439, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35364417

ABSTRACT

If left untreated, Alzheimer's disease (AD) is a leading cause of slowly progressive dementia. Therefore, it is critical to detect AD to prevent its progression. In this study, we propose a bidirectional progressive recurrent network with imputation (BiPro) that uses longitudinal data, including patient demographics and biomarkers of magnetic resonance imaging (MRI), to forecast clinical diagnoses and phenotypic measurements at multiple timepoints. To compensate for missing observations in the longitudinal data, we use an imputation module to inspect both temporal and multivariate relations associated with the mean and forward relations inherent in the time series data. To encode the imputed information, we define a modification of the long short-term memory (LSTM) cell by using a progressive module to compute the progression score of each biomarker between the given timepoint and the baseline through a negative exponential function. These features are used for the prediction task. The proposed system is an end-to-end deep recurrent network that can accomplish multiple tasks at the same time, including (1) imputing missing values, (2) forecasting phenotypic measurements, and (3) predicting the clinical status of a patient based on longitudinal data. We experimented on 1,335 participants from The Alzheimer's Disease Prediction of Longitudinal Evolution (TADPOLE) challenge cohort. The proposed method achieved a mean area under the receiver-operating characteristic curve (mAUC) of 78% for predicting the clinical status of patients, a mean absolute error (MAE) of 3.5ml for forecasting MRI biomarkers, and an MAE of 6.9ml for missing value imputation. The results confirm that our proposed model outperforms prevalent approaches, and can be used to minimize the progression of Alzheimer's disease.


Subjects
Alzheimer Disease, Alzheimer Disease/diagnostic imaging, Biomarkers, Forecasting, Humans, Magnetic Resonance Imaging/methods
20.
Entropy (Basel) ; 24(2), 2022 Feb 16.
Article in English | MEDLINE | ID: mdl-35205580

ABSTRACT

Handling missing values in matrix data is an important step in data analysis. To date, many methods to estimate missing values based on data pattern similarity have been proposed. Most previously proposed methods perform missing value imputation based on data trends over the entire feature space. However, individual missing values are likely to show similarity to data patterns in local feature space. In addition, most existing methods focus on single class data, while multiclass analysis is frequently required in various fields. Missing value imputation for multiclass data must consider the characteristics of each class. In this paper, we propose two methods based on closed itemsets, CIimpute and ICIimpute, to achieve missing value imputation using local feature space for multiclass matrix data. CIimpute estimates missing values using closed itemsets extracted from each class. ICIimpute is an improved method of CIimpute in which an attribute reduction process is introduced. Experimental results demonstrate that attribute reduction considerably reduces computational time and improves imputation accuracy. Furthermore, it is shown that, compared to existing methods, ICIimpute provides superior imputation accuracy but requires more computational time.
