Results 1 - 20 of 76
1.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37419612

ABSTRACT

Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we utilized renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, and bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression datasets. Our results demonstrate that ProJect consistently performs better than the other referenced MVI methods. It achieves the lowest normalized root mean square error (on average, scoring 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the lowest Procrustes sum of squared error (Procrustes SS) (79.71% less error in RC_C, 38.36% in RC_full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared to the next best method). ProJect also leads with the highest correlation coefficient among all types of MV combinations (0.64% higher in RC_C, 0.24% in RC_full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM versus the second-best performing method). ProJect's key strength is its ability to handle the different types of MVs commonly found in real-world data. Unlike most MVI methods, which are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines whether an MV is missing at random or missing not at random. It then applies a targeted imputation strategy to each MV type, resulting in more accurate and reliable imputation outcomes.
An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.
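The two-stage idea — first decide whether a value is missing at random (MAR) or missing not at random (MNAR), then apply a type-specific fill — can be illustrated with a deliberately simplified sketch. The decision rule below (observed feature mean below the overall mean implies left-censoring) is a hypothetical stand-in for ProJect's actual decision-making algorithm:

```python
def impute_mixed(cols):
    """Toy MAR/MNAR-aware imputation. cols maps feature name -> list of
    observations with None for missing values. The MNAR branch fills with the
    observed minimum (a censoring proxy); the MAR branch fills with the
    feature mean. The decision heuristic is an assumption, not ProJect's."""
    all_obs = [v for col in cols.values() for v in col if v is not None]
    overall = sum(all_obs) / len(all_obs)
    out = {}
    for name, col in cols.items():
        obs = [v for v in col if v is not None]
        mean = sum(obs) / len(obs)
        # low-abundance feature -> treat its MVs as left-censored (MNAR)
        fill = min(obs) if mean < overall else mean
        out[name] = [fill if v is None else v for v in col]
    return out
```

In a real pipeline the MNAR branch would draw from a down-shifted distribution rather than reuse the minimum, and the MAR branch would borrow information from correlated features.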


Subjects
Algorithms , Genomics , Bayes Theorem , Oligonucleotide Array Sequence Analysis/methods , Mass Spectrometry/methods
2.
Genomics ; 116(3): 110834, 2024 05.
Article in English | MEDLINE | ID: mdl-38527595

ABSTRACT

edgeR (Robust) is a popular approach for identifying differentially expressed genes (DEGs) from RNA-Seq profiles. However, it performs poorly against gene-specific outliers and cannot handle missing observations. To address these issues, we propose a pre-processing approach for RNA-Seq count data that combines iLOO-based outlier detection with random forest-based missing value imputation to boost the performance of edgeR (Robust). Both simulated and real RNA-Seq count data analyses showed that the proposed edgeR (Robust) outperformed the conventional edgeR (Robust). To investigate the effectiveness of the identified DEGs for the diagnosis and therapy of ovarian cancer (OC), we selected the 12 top-ranked DEGs (IL6, XCL1, CXCL8, C1QC, C1QB, SNAI2, TYROBP, COL1A2, SNAP25, NTS, CXCL2, and AGT) and, guided by the hub-DEGs, suggested the 10 top-ranked candidate drug molecules for treatment against OC. Hence, our proposed procedure might be an effective computational tool for exploring potential DEGs from RNA-Seq profiles for the diagnosis and therapy of any disease.
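The outlier-to-missing pre-processing step can be sketched in a few lines. Note that the rule below is a simple median-absolute-deviation (MAD) filter used as a stand-in for iLOO, which works differently, and `flag_outliers` is a hypothetical name:

```python
def flag_outliers(counts, k=3.0):
    """Flag per-gene count outliers as missing (None) using a MAD rule.
    A hypothetical stand-in for the iLOO detector described above; flagged
    values would then be re-estimated by the imputation step."""
    srt = sorted(counts)
    med = srt[len(srt) // 2]
    mad = sorted(abs(c - med) for c in counts)[len(counts) // 2]
    if mad == 0:  # (nearly) constant gene: nothing to flag
        return list(counts)
    return [None if abs(c - med) / mad > k else c for c in counts]
```

The flagged `None` entries would then be filled by the random forest imputer before the data reach edgeR (Robust).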


Subjects
Biomarkers, Tumor , Ovarian Neoplasms , RNA-Seq , Humans , Ovarian Neoplasms/genetics , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/therapy , Female , Biomarkers, Tumor/genetics , Software , Transcriptome , Gene Expression Profiling
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34791014

ABSTRACT

High-throughput next-generation sequencing now makes it possible to generate a vast amount of multi-omics data for various applications. These data have revolutionized biomedical research by providing a more comprehensive understanding of the biological systems and molecular mechanisms of disease development. Recently, deep learning (DL) algorithms have become one of the most promising methods in multi-omics data analysis, due to their predictive performance and capability of capturing nonlinear and hierarchical features. While integrating and translating multi-omics data into useful functional insights remain the biggest bottleneck, there is a clear trend towards incorporating multi-omics analysis in biomedical research to help explain the complex relationships between molecular layers. Multi-omics data have a role in improving prevention, early detection and prediction; monitoring progression; interpreting patterns and endotypes; and designing personalized treatments. In this review, we outline a roadmap of multi-omics integration using DL and offer a practical perspective on the advantages, challenges and barriers to the implementation of DL in multi-omics data.


Subjects
Deep Learning , Genomics , Algorithms , High-Throughput Nucleotide Sequencing
4.
Network ; : 1-24, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38828665

ABSTRACT

The imputation of missing values in multivariate time-series data is a basic and widely used data processing technique. Recently, some studies have exploited Recurrent Neural Networks (RNNs) and Generative Adversarial Networks (GANs) to impute the missing values in multivariate time-series data. However, when faced with datasets with high missing rates, the imputation error of these methods increases dramatically. To this end, we propose a neural network model based on dynamic contribution and attention, denoted ContrAttNet. ContrAttNet consists of three novel modules: a feature attention module, an iLSTM (imputation Long Short-Term Memory) module, and a 1D-CNN (1-Dimensional Convolutional Neural Network) module. ContrAttNet exploits temporal information and spatial feature information to predict missing values, where iLSTM attenuates the memory of the LSTM according to the characteristics of the missing values in order to learn the contributions of different features. Moreover, the feature attention module introduces an attention mechanism based on these contributions to calculate supervised weights. Furthermore, under the influence of these supervised weights, the 1D-CNN processes the time-series data by treating them as spatial features. Experimental results show that ContrAttNet outperforms other state-of-the-art models in the missing value imputation of multivariate time-series data, with an average MAPE of 6% and MAE of 9% on the benchmark datasets.
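The memory-attenuation idea behind iLSTM resembles the time-decay mechanism popularized by GRU-D: the longer a feature has gone unobserved, the less the recurrent memory should be trusted. A minimal sketch of such a decay factor (the exponential form and the weight `w` are assumptions, not ContrAttNet's actual parameterization):

```python
import math

def decay_weight(time_gap, w=0.5):
    """Attenuate recurrent memory as the gap since the last observed
    value grows; a gap of 0 leaves the memory intact (weight 1.0)."""
    return math.exp(-max(0.0, w * time_gap))
```

In a full model this scalar would multiply the hidden state (or cell state) per feature before each recurrent update.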

5.
BMC Bioinformatics ; 24(1): 281, 2023 Jul 11.
Article in English | MEDLINE | ID: mdl-37434115

ABSTRACT

BACKGROUND: Network analysis is a powerful tool for studying gene regulation and identifying biological processes associated with gene function. However, constructing gene co-expression networks can be a challenging task, particularly when dealing with a large number of missing values. RESULTS: We introduce GeCoNet-Tool, an integrated gene co-expression network construction and analysis tool. The tool comprises two main parts: network construction and network analysis. In the network construction part, GeCoNet-Tool offers users various options for processing gene co-expression data derived from diverse technologies. The output of the tool is an edge list, with the option of weights associated with each link. In the network analysis part, the user can produce a table that includes several network properties, such as communities, cores, and centrality measures. With GeCoNet-Tool, users can explore and gain insights into the complex interactions between genes.
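The construction step — pairwise correlation followed by thresholding into a weighted edge list — can be sketched generically (this is an illustration of the idea, not GeCoNet-Tool's actual code):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def edge_list(expr, threshold=0.8):
    """expr maps gene -> expression vector; returns weighted edges whose
    absolute Pearson correlation clears the threshold."""
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if abs(r) >= threshold:
                edges.append((g, h, round(r, 3)))
    return edges
```

The resulting `(gene, gene, weight)` tuples are exactly the kind of edge list the tool emits for downstream community and centrality analysis.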


Subjects
Gene Regulatory Networks , Software
6.
J Biomed Inform ; 144: 104440, 2023 08.
Article in English | MEDLINE | ID: mdl-37429511

ABSTRACT

The imputation of missing values in multivariate time series (MTS) data is critical in ensuring data quality and producing reliable data-driven predictive models. Apart from many statistical approaches, a few recent studies have proposed state-of-the-art deep learning methods to impute missing values in MTS data. However, the evaluation of these deep methods is limited to one or two data sets, low missing rates, and completely random missing value types. This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets. Our extensive analysis reveals that no single imputation method outperforms the others on all five data sets. The imputation performance depends on data types, individual variable statistics, missing value rates, and types. Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputations of missing values in time series data yield statistically better data quality than traditional imputation methods. Although computationally expensive, deep learning methods are practical given the current availability of high-performance computing resources, especially when data quality and sample size are of paramount importance in healthcare informatics. Our findings highlight the importance of data-centric selection of imputation methods to optimize data-driven predictive models.
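A data-centric benchmark of this kind boils down to hiding known values, imputing, and scoring the error on the hidden cells. A minimal RMSE harness (the mean imputer and the 20% rate are illustrative, not the survey's setup):

```python
import math
import random

def benchmark_imputer(series, impute_fn, missing_rate=0.2, seed=0):
    """Hide a random subset of known values, run impute_fn (which receives
    a list with None for missing entries), and return RMSE on hidden cells."""
    rng = random.Random(seed)
    hidden = rng.sample(range(len(series)), max(1, int(missing_rate * len(series))))
    masked = [None if i in set(hidden) else v for i, v in enumerate(series)]
    filled = impute_fn(masked)
    return math.sqrt(sum((filled[i] - series[i]) ** 2 for i in hidden) / len(hidden))

def mean_impute(xs):
    """Baseline: fill every missing entry with the observed mean."""
    obs = [v for v in xs if v is not None]
    m = sum(obs) / len(obs)
    return [m if v is None else v for v in xs]
```

Swapping `mean_impute` for any other imputer (statistical or deep) makes the comparison across missing rates and missingness types mechanical.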


Subjects
Benchmarking , Research Design , Time Factors , Cross-Sectional Studies , Surveys and Questionnaires
7.
Sensors (Basel) ; 23(10)2023 May 09.
Article in English | MEDLINE | ID: mdl-37430495

ABSTRACT

With an increasing number of offshore wind farms, monitoring and evaluating the effects of the wind turbines on the marine environment have become important tasks. Here we conducted a feasibility study with the focus on monitoring these effects by utilizing different machine learning methods. A multi-source dataset for a study site in the North Sea is created by combining satellite data, local in situ data and a hydrodynamic model. The machine learning algorithm DTWkNN, which is based on dynamic time warping and k-nearest neighbor, is used for multivariate time series data imputation. Subsequently, unsupervised anomaly detection is performed to identify possible inferences in the dynamic and interdepending marine environment around the offshore wind farm. The anomaly results are analyzed in terms of location, density and temporal variability, granting access to information and building a basis for explanation. Temporal detection of anomalies with COPOD is found to be a suitable method. Actionable insights are the direction and magnitude of potential effects of the wind farm on the marine environment, depending on the wind direction. This study works towards a digital twin of offshore wind farms and provides a set of methods based on machine learning to monitor and evaluate offshore wind farm effects, supporting stakeholders with information for decision making on future maritime energy infrastructures.
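The DTWkNN building blocks — a dynamic-time-warping distance plus neighbor-based filling — can be sketched as follows. This is a simplified illustration (comparing on observed values only, unweighted neighbor average); the actual method's neighbor search and weighting will differ:

```python
def dtw(a, b):
    """Classic O(n*m) dynamic time warping distance, absolute-difference cost."""
    INF = float("inf")
    D = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[-1][-1]

def dtwknn_fill(target, candidates, k=1):
    """Fill None entries in target from the k DTW-nearest complete series,
    comparing on the observed values only (a simplification)."""
    obs = [v for v in target if v is not None]
    ranked = sorted(candidates, key=lambda c: dtw(obs, c))
    filled = list(target)
    for i, v in enumerate(filled):
        if v is None:
            vals = [c[i] for c in ranked[:k]]
            filled[i] = sum(vals) / len(vals)
    return filled
```

Unlike Euclidean matching, DTW tolerates the temporal misalignments typical of environmental sensor series.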

8.
Proteomics ; 22(23-24): e2200092, 2022 12.
Article in English | MEDLINE | ID: mdl-36349819

ABSTRACT

Proteomics data are often plagued with missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reduction of statistical power, introduction of bias, and failure to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods perform their tasks based on different prior assumptions (e.g., data is normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even when dealing with the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide decision making on suitable MVI method, we provide a decision chart which facilitates strategic considerations on datasets presenting different characteristics. We also bring attention to other issues that can impact proper MVI such as the presence of confounders (e.g., batch effects) which can influence MVI performance. Thus, these too, should be considered during or before MVI.
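A decision chart of this kind can be paraphrased as a small rule function. The thresholds and method names below are illustrative assumptions, not the paper's exact chart:

```python
def choose_mvi_method(missing_rate, mnar_suspected, correlated_features):
    """Toy encoding of an MVI decision chart. Inputs: fraction of MVs,
    whether left-censoring (MNAR) is suspected, and whether features are
    correlated enough for similarity-based borrowing."""
    if missing_rate > 0.5:
        return "filter the feature rather than impute"
    if mnar_suspected and correlated_features:
        return "hybrid: left-censored fill for MNAR + kNN/LLS for MAR"
    if mnar_suspected:
        return "left-censored imputation (e.g. QRILC, MinProb)"
    if correlated_features:
        return "local similarity methods (e.g. kNN, LLS, BPCA)"
    return "global mean/median imputation"
```

Confounders such as batch effects sit outside this function on purpose: as the abstract notes, they should be assessed before or alongside the imputation choice.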


Subjects
Algorithms , Proteomics
9.
BMC Bioinformatics ; 23(1): 170, 2022 May 09.
Article in English | MEDLINE | ID: mdl-35534830

ABSTRACT

BACKGROUND: Gene co-expression networks (GCNs) can be used to determine gene regulation and attribute gene function to biological processes. Different high throughput technologies, including one and two-channel microarrays and RNA-sequencing, allow evaluating thousands of gene expression data simultaneously, but these methodologies provide results that cannot be directly compared. Thus, it is complex to analyze co-expression relations between genes, especially when there are missing values arising for experimental reasons. Networks are a helpful tool for studying gene co-expression, where nodes represent genes and edges represent co-expression of pairs of genes. RESULTS: In this paper, we establish a method for constructing a gene co-expression network for the Anopheles gambiae transcriptome from 257 unique studies obtained with different methodologies and experimental designs. We introduce the sliding threshold approach to select node pairs with high Pearson correlation coefficients. The resulting network, which we name AgGCN1.0, is robust to random removal of conditions and has similar characteristics to small-world and scale-free networks. Analysis of network sub-graphs revealed that the core is largely comprised of genes that encode components of the mitochondrial respiratory chain and the ribosome, while different communities are enriched for genes involved in distinct biological processes. CONCLUSION: Analysis of the network reveals that both the architecture of the core sub-network and the network communities are based on gene function, supporting the power of the proposed method for GCN construction. Application of network science methodology reveals that the overall network structure is driven to maximize the integration of essential cellular functions, possibly allowing the flexibility to add novel functions.
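The core analysis mentioned above rests on standard network-science primitives; for instance, a network core can be extracted with iterative k-core pruning (a generic sketch, not the paper's code):

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the node set of the k-core: repeatedly delete nodes with
    degree < k until every remaining node has degree >= k."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for n in list(adj):
            if len(adj[n]) < k:
                for m in adj[n]:
                    adj[m].discard(n)  # detach n from its neighbors
                del adj[n]
                changed = True
    return set(adj)
```

Applied to an edge list like AgGCN1.0's, this isolates densely interconnected genes — the kind of core the authors found enriched for respiratory-chain and ribosomal components.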


Subjects
Gene Regulatory Networks , Transcriptome , Gene Expression Profiling/methods , Sequence Analysis, RNA
10.
BMC Genomics ; 23(1): 496, 2022 Jul 08.
Article in English | MEDLINE | ID: mdl-35804317

ABSTRACT

BACKGROUND: Reliable and effective label-free quantification (LFQ) analyses are dependent not only on the method of data acquisition in the mass spectrometer, but also on the downstream data processing, including software tools, query database, data normalization and imputation. In non-human primates (NHP), LFQ is challenging because the query databases for NHP are limited, since the genomes of these species are not comprehensively annotated. This invariably results in limited discovery of proteins and associated Post Translational Modifications (PTMs) and a higher fraction of missing data points. While identification of fewer proteins and PTMs due to database limitations can negatively impact uncovering important and meaningful biological information, missing data also limit downstream analyses (e.g., multivariate analyses), decrease statistical power, bias statistical inference, and make biological interpretation of the data more challenging. In this study we attempted to address both issues: first, we used the MetaMorpheus proteomics search engine to counter the limits of NHP query databases and maximize the discovery of proteins and associated PTMs, and second, we evaluated different imputation methods for accurate data inference. We used a generic approach for missing data imputation analysis without distinguishing the potential source of missing data (either non-assigned m/z or missing values across runs). RESULTS: Using the MetaMorpheus proteomics search engine, we obtained quantitative data for 1622 proteins and 10,634 peptides, including 58 different PTMs (biological, metal and artifacts), across a diverse age range of NHP brain frontal cortex. However, among the 1622 proteins identified, only 293 proteins were quantified across all samples with no missing values, emphasizing the importance of implementing an accurate and statistically valid imputation method to fill in missing data.
In our imputation analysis we demonstrate that single imputation methods that borrow information from correlated proteins, such as Generalized Ridge Regression (GRR), Random Forest (RF), local least squares (LLS), and Bayesian Principal Component Analysis (BPCA), are able to estimate missing protein abundance values with great accuracy. CONCLUSIONS: Overall, this study offers a detailed comparative analysis of LFQ data generated in NHP and proposes strategies for improved LFQ in NHP proteomics data.


Subjects
Algorithms , Proteomics , Animals , Bayes Theorem , Primates , Proteomics/methods , Software
11.
Stat Med ; 41(7): 1172-1190, 2022 03 30.
Article in English | MEDLINE | ID: mdl-34786744

ABSTRACT

Confidence intervals for the mean of discrete exponential families are widely used in many applications. Since missing data are commonly encountered, the interval estimation for incomplete data is an important problem. The performances of the existing multiple imputation confidence intervals are unsatisfactory. We propose modified multiple imputation confidence intervals to improve the existing confidence intervals for the mean of the discrete exponential families with quadratic variance functions. A simulation study shows that the coverage probabilities of the modified confidence intervals are closer to the nominal level than the existing confidence intervals when the true mean is near the boundaries of the parameter space. These confidence intervals are also illustrated with real data examples.
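Multiple-imputation intervals are conventionally built by pooling per-imputation estimates with Rubin's rules; the paper's modifications start from this baseline. A minimal normal-approximation sketch (the modified intervals themselves are not reproduced here):

```python
import math

def rubin_interval(estimates, variances, z=1.96):
    """Pool m completed-data estimates and their variances into one interval.
    Total variance = within-imputation + (1 + 1/m) * between-imputation."""
    m = len(estimates)
    qbar = sum(estimates) / m                                 # pooled estimate
    ubar = sum(variances) / m                                 # within variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)     # between variance
    t = ubar + (1 + 1 / m) * b
    half = z * math.sqrt(t)
    return qbar - half, qbar + half
```

A full implementation would replace the fixed `z = 1.96` with a t reference distribution using Barnard-Rubin degrees of freedom, which matters exactly in the small-m, near-boundary settings the abstract discusses.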


Subjects
Confidence Intervals , Computer Simulation , Humans , Probability , Statistical Distributions
12.
Sensors (Basel) ; 22(15)2022 Aug 05.
Article in English | MEDLINE | ID: mdl-35957433

ABSTRACT

Despite extensive efforts, accurate traffic time series forecasting remains challenging. Taking the non-linear nature of traffic into account in depth, we propose a novel ST-CRMF model consisting of a Compensated Residual Matrix Factorization with Spatial-Temporal regularization for graph-based traffic time series forecasting. Our model inherits the benefits of MF and regularizer optimization and further carries out compensatory modeling of the spatial-temporal correlations through a well-designed bi-directional residual structure. Of particular note, MF modeling and the subsequent residual learning share and synchronize iterative updates as equal training parameters, which considerably alleviates the error-propagation problem associated with rolling forecasting. Moreover, most existing prediction models have neglected the hard-to-avoid issue of missing traffic data; the ST-CRMF model can repair possible missing values while fulfilling the forecasting tasks. After testing the effects of key parameters on model performance, extensive experimental results confirm that our ST-CRMF model efficiently captures comprehensive spatial-temporal dependencies and significantly outperforms state-of-the-art models on short- to long-term (5-/15-/30-/60-min) traffic forecasting tasks on the open Seattle-Loop and METR-LA traffic datasets.


Subjects
Time Factors , Forecasting
13.
Sensors (Basel) ; 22(9)2022 May 07.
Article in English | MEDLINE | ID: mdl-35591257

ABSTRACT

Existing material identification for loose particles inside sealed relays focuses on the selection and optimization of classification algorithms, ignoring the features in the material dataset. In this paper, we propose a feature optimization method for loose-particle material identification in sealed relays. First, for the missing value problem, multiple methods were used to process the material dataset. By comparing the identification accuracy achieved by a Random-Forest-based classifier (RF classifier) on the differently processed datasets, the direct-discarding method was found to be optimal. Second, for the problem of uneven data distribution, multiple processing methods were again compared, and min-max standardization was found to be optimal. Then, for the feature selection problem, an innovative multi-index-fusion feature selection method was designed, and its superiority was verified through several tests. Test results show that the identification accuracy achieved by the RF classifier on the dataset improved from 59.63% to 63.60%. Test results on ten material verification datasets show that the identification accuracies achieved by the RF classifier were greatly improved, with an average improvement of 3.01%. This strongly promotes research progress in loose-particle material identification and is an important supplement to existing loose-particle detection research. It is also the highest loose-particle material identification accuracy yet achieved in aerospace engineering, which has important practical value for improving the reliability of aerospace systems. In principle, the approach can also be applied to feature optimization in machine learning more broadly.


Subjects
Algorithms , Machine Learning , Reproducibility of Results
14.
Entropy (Basel) ; 24(2)2022 Feb 16.
Article in English | MEDLINE | ID: mdl-35205580

ABSTRACT

Handling missing values in matrix data is an important step in data analysis. To date, many methods that estimate missing values based on data pattern similarity have been proposed. Most previously proposed methods perform missing value imputation based on data trends over the entire feature space. However, individual missing values are likely to show similarity to data patterns in a local feature space. In addition, most existing methods focus on single-class data, while multiclass analysis is frequently required in various fields. Missing value imputation for multiclass data must consider the characteristics of each class. In this paper, we propose two methods based on closed itemsets, CIimpute and ICIimpute, to achieve missing value imputation using local feature space for multiclass matrix data. CIimpute estimates missing values using closed itemsets extracted from each class. ICIimpute is an improved version of CIimpute that introduces an attribute reduction process. Experimental results demonstrate that attribute reduction considerably reduces computational time and improves imputation accuracy. Furthermore, we show that, compared to existing methods, ICIimpute provides superior imputation accuracy but requires more computational time.

15.
Entropy (Basel) ; 24(12)2022 Dec 09.
Article in English | MEDLINE | ID: mdl-36554203

ABSTRACT

Time series data are usually characterized by missing values, high dimensionality, and large data volume. To address the problem of high-dimensional time series with missing values, this paper proposes an attention-based sequence-to-sequence model for imputing missing values in time series (ASSM), built on the combination of feature learning and data computation. The model consists of two parts, an encoder and a decoder. The encoder is a BiGRU recurrent neural network that incorporates a self-attention mechanism to make the model more capable of handling long-range time series. The decoder is a GRU recurrent neural network that incorporates a cross-attention mechanism to associate with the encoder. The relationship weights between the sequences generated in the decoder and the known sequences in the encoder are calculated so as to focus on the sequences with a high degree of correlation. In this paper, we conduct comparison experiments with four evaluation metrics and six models on four real datasets. The experimental results show that the proposed model outperforms the six comparative missing value interpolation algorithms.

16.
Knowl Based Syst ; 249, 2022 Aug 05.
Article in English | MEDLINE | ID: mdl-36159738

ABSTRACT

Missing values in tabular data restrict the use and performance of machine learning, requiring the imputation of missing values. The most popular imputation algorithm is arguably multiple imputation by chained equations (MICE), which estimates missing values by linear conditioning on observed values. This paper proposes methods to improve both the imputation accuracy of MICE and the classification accuracy of imputed data by replacing MICE's linear regressors with ensemble learning and deep neural networks (DNN). The imputation accuracy is further improved by characterizing individual samples with cluster labels (CISCL) obtained from the training data. Our extensive analyses involving six tabular data sets, up to 80% missing values, and three missing types (missing completely at random, missing at random, missing not at random) reveal that ensemble or deep learning within MICE is superior to the baseline MICE (b-MICE), and both are consistently outperformed by CISCL. Results show that CISCL + b-MICE outperforms b-MICE for all percentages and types of missingness. Our proposed DNN-based MICE and gradient boosting MICE plus CISCL (GB-MICE-CISCL) outperform seven state-of-the-art imputation algorithms in most experimental cases. The classification accuracy of GB-MICE-imputed data is further improved by our proposed GB-MICE-CISCL imputation method across all missingness percentages. Results also reveal a shortcoming of the MICE framework at high missingness (>50%) and when the missing type is not random. This paper provides a generalized approach to identifying the best imputation model for a data set given its missingness percentage and type.
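The MICE loop itself is easy to sketch: initialize missing cells, then cycle through columns, regressing each on the others and re-predicting its missing entries. The two-column, plain-least-squares version below is a deterministic skeleton — real MICE draws from a posterior to produce multiple imputations, and the paper swaps the linear regressor for ensembles or DNNs:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def simple_mice(rows, sweeps=10):
    """rows: list of [x0, x1] with None for missing. Mean-initialize, then
    alternately regress each column on the other and refill its missing cells."""
    data = [list(r) for r in rows]
    miss = {(i, j) for i, r in enumerate(data) for j, v in enumerate(r) if v is None}
    for j in (0, 1):  # initialization with column means
        obs = [r[j] for r in data if r[j] is not None]
        m = sum(obs) / len(obs)
        for r in data:
            if r[j] is None:
                r[j] = m
    for _ in range(sweeps):  # chained-equations sweeps
        for j in (0, 1):
            train = [(r[1 - j], r[j]) for i, r in enumerate(data)
                     if (i, j) not in miss]  # fit on originally observed targets
            a, b = fit_line([t[0] for t in train], [t[1] for t in train])
            for i, jj in miss:
                if jj == j:
                    data[i][j] = a + b * data[i][1 - j]
    return data
```

Replacing `fit_line` with a gradient-boosted or neural regressor is precisely the kind of substitution the paper evaluates.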

17.
Neuroimage ; 237: 118143, 2021 08 15.
Article in English | MEDLINE | ID: mdl-33991694

ABSTRACT

Alzheimer's disease (AD) is one of the major causes of dementia and is characterized by slow progression over several years, with no available treatments or medicines. In this regard, there have been efforts to identify the risk of developing AD at its earliest stage. While many previous works considered cross-sectional analysis, more recent studies have focused on the diagnosis and prognosis of AD with longitudinal or time series data in the form of disease progression modeling. Under the same problem settings, in this work we propose a novel computational framework that can predict phenotypic measurements of MRI biomarkers and trajectories of clinical status, along with cognitive scores, at multiple future time points. However, time series data generally contain many unexpected missing observations. To handle this unfavorable situation, we define a secondary problem of estimating those missing values and tackle it systematically by taking account of the temporal and multivariate relations inherent in time series data. Concretely, we propose a deep recurrent network that jointly tackles four problems: (i) missing value imputation, (ii) phenotypic measurement forecasting, (iii) trajectory estimation of a cognitive score, and (iv) clinical status prediction of a subject, based on his or her longitudinal imaging biomarkers. Notably, the learnable parameters of all the modules in our predictive models are trained end-to-end, taking the morphological features and cognitive scores as input, with a carefully defined loss function. In our experiments on The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) challenge cohort, we measured performance on various metrics and compared our method to competing methods in the literature. Exhaustive analyses and ablation studies were also conducted to confirm the effectiveness of our method.


Subjects
Alzheimer Disease/diagnosis , Cognitive Dysfunction/diagnosis , Deep Learning , Disease Progression , Aged , Aged, 80 and over , Alzheimer Disease/diagnostic imaging , Biomarkers , Cognitive Dysfunction/diagnostic imaging , Female , Humans , Longitudinal Studies , Magnetic Resonance Imaging , Male , Middle Aged , Neuropsychological Tests , Prognosis
18.
Int J Mol Sci ; 22(17)2021 Sep 06.
Article in English | MEDLINE | ID: mdl-34502557

ABSTRACT

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values widely vary when performing comparisons across different sample treatments. For example, one would expect a consistent rate of "missing at random" (MAR) across batches of samples and varying rates of "missing not at random" (MNAR) depending on the inherent difference in sample treatments within the study. The missing value imputation strategy must thus be selected that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding the appropriate missing value imputation strategy: (1) when it is appropriate to impute data; (2) how to choose a method that reflects the combinatorial manner of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
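A common left-censored strategy of the kind advocated here draws imputed values from a down-shifted, narrowed Gaussian fitted to the observed (log-scale) intensities, as in Perseus-style "down-shift"/MinProb imputation. The shift and scale defaults below are conventional choices for that family of methods, not prescriptions from this paper:

```python
import random

def downshift_impute(values, shift=1.8, scale=0.3, seed=0):
    """Fill None entries with draws from N(mean - shift*sd, (scale*sd)^2),
    mimicking below-detection-limit (left-censored, MNAR) intensities."""
    rng = random.Random(seed)
    obs = [v for v in values if v is not None]
    mean = sum(obs) / len(obs)
    sd = (sum((v - mean) ** 2 for v in obs) / (len(obs) - 1)) ** 0.5
    return [rng.gauss(mean - shift * sd, scale * sd) if v is None else v
            for v in values]
```

A hybrid approach would route only MNAR-classified values through this fill and send MAR-classified values to a correlation-based imputer such as kNN.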


Subjects
Data Accuracy , Databases, Protein/statistics & numerical data , Mass Spectrometry/methods , Proteomics/statistics & numerical data , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Cell Line, Tumor , Glucose/metabolism , Humans , Proteomics/methods , Proteomics/standards
19.
J Biomed Inform ; 111: 103576, 2020 11.
Article in English | MEDLINE | ID: mdl-33010424

ABSTRACT

Electronic health records (EHRs) often suffer missing values, for which recent advances in deep learning offer a promising remedy. We develop a deep learning-based, unsupervised method to impute missing values in patient records, then examine its imputation effectiveness and predictive efficacy for peritonitis patient management. Our method builds on a deep autoencoder framework, incorporates missing patterns, accounts for essential relationships in patient data, considers temporal patterns common to patient records, and employs a novel loss function for error calculation and regularization. Using a data set of 27,327 patient records, we perform a comparative evaluation of the proposed method and several prevalent benchmark techniques. The results indicate the greater imputation performance of our method relative to all the benchmark techniques, recording 5.3-15.5% lower imputation errors. Furthermore, the data imputed by the proposed method better predict readmission, length of stay, and mortality than those obtained from any benchmark techniques, achieving 2.7-11.5% improvements in predictive efficacy. The illustrated evaluation indicates the proposed method's viability, imputation effectiveness, and clinical decision support utilities. Overall, our method can reduce imputation biases and be applied to various missing value scenarios clinically, thereby empowering physicians and researchers to better analyze and utilize EHRs for improved patient management.


Subjects
Deep Learning , Electronic Health Records , Data Accuracy , Humans , Research Design
20.
Sensors (Basel) ; 20(6)2020 Mar 23.
Article in English | MEDLINE | ID: mdl-32210112

ABSTRACT

For efficient and effective energy management, accurate energy consumption forecasting is required in energy management systems (EMSs). Recently, several artificial intelligence-based techniques have been proposed for accurate electric load forecasting; moreover, complete energy consumption data are critical for the prediction. However, owing to diverse reasons, such as device malfunctions and signal transmission errors, missing data are frequently observed in actual data. Many imputation methods have previously been proposed to compensate for missing values; however, these methods have achieved limited success in imputing electric energy consumption data because the periods of missing data are long and the dependency on historical data is high. In this study, we propose a novel missing-value imputation scheme for electricity consumption data. The proposed scheme uses a bagging ensemble of multilayer perceptrons (MLPs), called a softmax ensemble network, wherein the ensemble weight of each MLP is determined by a softmax function. This ensemble network learns electric energy consumption data with explanatory variables and imputes missing values in the data. To evaluate the performance of our scheme, we performed diverse experiments on real electric energy consumption data and confirmed that the proposed scheme delivers superior performance compared to other imputation methods.
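The softmax-weighted combination at the heart of the scheme — each ensemble member's output weighted by the softmax of a per-member score — reduces to a few lines. Here plain lists stand in for MLP outputs, and deriving weights from validation scores is a simplified reading of the approach:

```python
import math

def softmax_ensemble(predictions, scores):
    """Weighted average of member predictions, with weights given by a
    softmax over per-member scores (higher score -> larger weight)."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * p[i] for w, p in zip(weights, predictions))
            for i in range(len(predictions[0]))]
```

With equal scores this degenerates to plain bagging (a uniform average); unequal scores let better members dominate the imputed values.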
