Results 1 - 20 of 129
1.
Brief Bioinform; 25(1), 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38113074

ABSTRACT

Optimizing and benchmarking data reduction methods for dynamic or spatial visualization and interpretation (DSVI) face challenges due to many factors, including data complexity, lack of ground truth, time-dependent metrics, dimensionality bias and different visual mappings of the same data. Current studies often focus on independent static visualization or interpretability metrics that require ground truth. To overcome this limitation, we propose the MIBCOVIS framework, a comprehensive and interpretable benchmarking and computational approach. MIBCOVIS enhances the visualization and interpretability of high-dimensional data without relying on ground truth by integrating five robust metrics, including a novel time-ordered Markov-based structural metric, into a semi-supervised hierarchical Bayesian model. The framework assesses method accuracy and considers interaction effects among metric features. We apply MIBCOVIS using linear and nonlinear dimensionality reduction methods to evaluate optimal DSVI for four distinct dynamic and spatial biological processes captured by three single-cell data modalities: CyTOF, scRNA-seq and CODEX. These data vary in complexity based on feature dimensionality, unknown cell types and dynamic or spatial differences. Unlike traditional single-summary score approaches, MIBCOVIS compares accuracy distributions across methods. Our findings underscore the joint evaluation of visualization and interpretability, rather than relying on separate metrics. We reveal that prioritizing average performance can obscure method feature performance. Additionally, we explore the impact of data complexity on visualization and interpretability. Specifically, we provide optimal parameters and features and recommend methods, like the optimized variational contractive autoencoder, for targeted DSVI for various data complexities. MIBCOVIS shows promise for evaluating dynamic single-cell atlases and spatiotemporal data reduction models.
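As a rough illustration of the core idea of comparing score distributions rather than single summary values, the sketch below bootstraps a neighbourhood-preservation metric over two reduction methods; scikit-learn's trustworthiness and the digits dataset are stand-ins for the paper's five metrics and single-cell data, not the authors' implementation.

```python
# Compare dimensionality-reduction methods by *distributions* of a quality
# metric over bootstrap resamples, rather than a single summary score.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)
X = load_digits().data

def score_distribution(make_reducer, n_boot=20, sample=300):
    scores = []
    for _ in range(n_boot):
        idx = rng.choice(len(X), size=sample, replace=False)
        emb = make_reducer().fit_transform(X[idx])
        scores.append(trustworthiness(X[idx], emb, n_neighbors=10))
    return np.array(scores)

for name, factory in [("PCA", lambda: PCA(n_components=2)),
                      ("kPCA", lambda: KernelPCA(n_components=2, kernel="rbf"))]:
    s = score_distribution(factory)
    iqr = np.percentile(s, 75) - np.percentile(s, 25)
    print(f"{name}: median={np.median(s):.3f}, IQR={iqr:.3f}")
```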


Subjects
Benchmarking, Single-Cell Analysis, Bayes Theorem, Single-Cell Analysis/methods
2.
J Synchrotron Radiat; 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39078694

ABSTRACT

MuscleX is an integrated, open-source computer software suite for data reduction of X-ray fiber diffraction patterns from striated muscle and other fibrous systems. It is written in Python and runs on Linux, Microsoft Windows or macOS. Most modules can be run either from a graphical user interface or in a "headless mode" from the command line, suitable for incorporation into beamline control systems. Here, we provide an overview of the general structure of the MuscleX software package and describe the specific features of the individual modules as well as examples of applications.

3.
Stat Med; 42(26): 4776-4793, 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37635131

ABSTRACT

Understanding the relationships between exposure and disease incidence is an important problem in environmental epidemiology. Typically, a large number of exposures are measured, and it is found either that a few exposures transmit risk or that each exposure transmits a small amount of risk that, taken together, may pose a substantial disease risk. Further, these exposure effects can be nonlinear. We develop a latent functional approach, which assumes that the individual effect of each exposure can be characterized as one of a series of unobserved functions, where the number of latent functions is less than or equal to the number of exposures. We propose Bayesian methodology to fit models with a large number of exposures and show that existing Bayesian group LASSO approaches are a special case of the proposed model. An efficient Markov chain Monte Carlo sampling algorithm is developed for carrying out Bayesian inference. The deviance information criterion is used to choose an appropriate number of nonlinear latent functions. We demonstrate the good properties of the approach using simulation studies. Further, we show that complex exposure relationships can be represented with only a few latent functional curves. The proposed methodology is illustrated with an analysis of the effect of cumulative pesticide exposure on cancer risk in a large cohort of farmers.
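The deviance information criterion mentioned above can be computed directly from posterior draws. The sketch below uses a toy normal-mean model in place of the latent functional likelihood; DIC = D(θ̄) + 2p_D, with p_D the effective number of parameters.

```python
# Deviance Information Criterion (DIC) from posterior draws; a toy normal
# model stands in for the latent-functional likelihood.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=100)                 # observed data
mu_draws = rng.normal(y.mean(), 0.1, size=4000)    # stand-in posterior draws

def deviance(mu):                                  # D(mu) = -2 log p(y | mu)
    return -2.0 * stats.norm.logpdf(y, loc=mu, scale=1.0).sum()

D_bar = np.mean([deviance(m) for m in mu_draws])   # posterior mean deviance
D_at_mean = deviance(mu_draws.mean())              # deviance at posterior mean
p_D = D_bar - D_at_mean                            # effective n. of parameters
dic = D_at_mean + 2.0 * p_D                        # equivalently D_bar + p_D
print(f"p_D = {p_D:.2f}, DIC = {dic:.2f}")
```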

4.
BMC Med Res Methodol; 23(1): 56, 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36859239

ABSTRACT

BACKGROUND: Science is becoming increasingly data intensive as digital innovations bring new capacity for continuous data generation and storage. This progress also brings challenges, as many scientific initiatives are overwhelmed by the sheer volumes of data produced. Here we present a case study of a data-intensive randomized clinical trial assessing the utility of continuous pressure imaging (CPI) for reducing pressure injuries. OBJECTIVE: To explore an approach to reducing the amount of CPI data required for analyses to a manageable size, without loss of critical information, using a nested subset of pressure data. METHODS: Data from four enrolled study participants excluded from the analytical phase of the study were used to develop an approach to data reduction. A two-step data strategy was used. First, raw data were sampled at different frequencies (5, 30, 60, 120, and 240 s) to identify the optimal measurement frequency. Second, similarity between adjacent frames was evaluated using correlation coefficients to identify position changes of enrolled study participants. The performance of the data strategy was evaluated through visual inspection using heat maps and time series plots. RESULTS: A sampling frequency of every 60 s provided a reasonable representation of changes in interface pressure over time. This approach translated to using only 1.7% of the collected data in analyses. In the second step it was found that 160 frames within 24 h represented the pressure states of study participants. In total, only 480 frames from the 72 h of collected data would be needed for analyses without loss of information. Only ~0.2% of the raw data collected would be required for assessment of the primary trial outcome. CONCLUSIONS: Data reduction is an important component of big data analytics. Our two-step strategy markedly reduced the amount of data required for analyses without loss of information. This data reduction strategy, if validated, could be used in other CPI settings and in other settings where large amounts of both temporal and spatial data must be analysed.
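A minimal sketch of the two-step strategy described in METHODS, assuming frames arrive as 2D pressure maps with timestamps; the 0.9 correlation cutoff is an illustrative choice, not the study's calibrated value.

```python
# Two-step CPI reduction: (1) keep one pressure frame per 60 s window;
# (2) flag position changes where the Pearson correlation between adjacent
# retained frames drops below a cutoff.
import numpy as np

def reduce_cpi(frames, times, step_s=60.0, corr_cutoff=0.9):
    """frames: (n, rows, cols) pressure maps; times: (n,) seconds."""
    keep = [0]
    for i in range(1, len(times)):
        if times[i] - times[keep[-1]] >= step_s:   # step 1: 60 s sampling
            keep.append(i)
    kept = frames[keep]
    flat = kept.reshape(len(kept), -1)
    changes = []
    for a, b in zip(flat[:-1], flat[1:]):          # step 2: adjacent-frame r
        r = np.corrcoef(a, b)[0, 1]
        changes.append(r < corr_cutoff)
    return kept, np.array(changes)

rng = np.random.default_rng(0)
frames = rng.random((720, 32, 16))                 # 1 frame / 10 s for 2 h
times = np.arange(720) * 10.0
kept, changes = reduce_cpi(frames, times)
print(len(kept), "frames kept,", changes.sum(), "position changes flagged")
```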


Subjects
Technology, Humans, Data Collection, Time Factors, Signal Processing, Computer-Assisted
5.
Anal Bioanal Chem; 415(10): 1791-1801, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36826506

ABSTRACT

Non-target screening (NTS) based on high-resolution mass spectrometry (HRMS) is necessary to comprehensively characterize per- and polyfluoroalkyl substances (PFAS) in environmental, biological, and technical samples due to the very limited availability of authentic PFAS reference standards. Since MS/MS information is not always obtainable in trace analysis and only selected PFAS occur in homologous series, further techniques that prioritize measured HRMS data (features) according to their likelihood of being PFAS are highly desirable for efficient data reduction during NTS. Kaufmann et al. (J AOAC Int, 2022) presented a very promising approach to separating selected PFAS from sample matrix features by plotting the mass defect (MD) normalized to the number of carbons (MD/C) vs. the mass normalized to the number of carbons (m/C). We systematically evaluated the advantages and limitations of this approach by using ~490,000 chemical formulas of organic chemicals (~210,000 PFAS, ~160,000 organic contaminants, and ~125,000 natural organic matter compounds) and calculating how efficiently, and especially which, PFAS can be prioritized. While PFAS with high fluorine content (approximately F/C > 0.8, H/F < 0.8, mass percent of fluorine > 55%) can be separated well, partially fluorinated PFAS with a high hydrogen content are more difficult to prioritize, which we discuss for selected PFAS. In the MD/C-m/C approach, even compounds with highly positive MDs above 0.5 Da, and hence incorrectly assigned negative MDs, can still be separated from true negative mass defect features by the normalized mass (m/C). Furthermore, based on the position in the MD/C-m/C plot, we propose estimating the fluorine fraction in molecules for selected PFAS classes. The promising MD/C-m/C approach can be widely used in PFAS research and routine analysis. The concept is also applicable to other compound classes such as iodinated compounds.
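The MD/C and m/C coordinates are straightforward to compute from a molecular formula. A minimal sketch follows, with monoisotopic masses hard-coded for a few elements; PFOA (C8HF15O2) is used as a check that a heavily fluorinated compound lands where the approach expects (MD/C near zero to slightly negative, high m/C).

```python
# Carbon-normalized mass defect (MD/C) and carbon-normalized mass (m/C)
# from a molecular formula, as used in the MD/C-m/C prioritization plot.
MONO = {"C": 12.0, "H": 1.007825, "F": 18.998403, "O": 15.994915,
        "N": 14.003074, "S": 31.972071}            # monoisotopic masses

def mdc_mc(formula_counts):
    m = sum(MONO[el] * n for el, n in formula_counts.items())
    md = m - round(m)                              # mass defect vs nominal mass
    n_c = formula_counts["C"]
    return md / n_c, m / n_c                       # (MD/C, m/C)

print(mdc_mc({"C": 8, "H": 1, "F": 15, "O": 2}))   # PFOA
```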

6.
J Sports Sci; 41(20): 1845-1851, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38184790

ABSTRACT

The monitoring of athletes is crucial to preventing injuries, identifying fatigue or supporting return-to-play decisions. The purpose of this study was to explore the ability of Kohonen neural network self-organizing maps (SOM) to objectively characterize movement patterns during sidestepping and their association with injury risk. Further, the network's sensitivity in detecting limb dominance was assessed. The data of 67 athletes with a total of 613 trials were included in this study. The 3D trajectories of 28 lower-body passive markers collected during sidestepping were used to train a SOM. The network consisted of 1247 neurons distributed over a 43 × 29 rectangular map with a hexagonal neighbourhood topology. Out of 61,913 input vectors, the SOM identified 1247 unique body postures. Visualizing the movement trajectories and adding several hidden variables allows for the investigation of different movement patterns and their association with joint loading. This approach identified athletes who show significantly different movement strategies when sidestepping with their dominant or non-dominant leg, where one strategy was clearly associated with ACL-injury-relevant risk factors. The results highlight the ability of unsupervised machine learning to monitor an individual athlete's status without the necessity of reducing the complexity of the data describing the movement.
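A sketch of a comparable SOM setup, using the minisom package as one possible implementation (not the authors' code); the map dimensions match the 43 × 29 grid above, while the training data and hyperparameters are stand-ins.

```python
# 43 x 29 self-organizing map (1247 neurons) with hexagonal topology,
# trained on marker-trajectory vectors. Requires: pip install minisom
import numpy as np
from minisom import MiniSom

# stand-in data: rows = body postures, cols = 28 markers x 3 coordinates
X = np.random.default_rng(0).normal(size=(5000, 84))

som = MiniSom(43, 29, input_len=X.shape[1],
              sigma=2.0, learning_rate=0.5, topology='hexagonal',
              random_seed=0)
som.train_random(X, num_iteration=20000)

# each input vector maps to its best-matching unit (a "body posture")
bmus = np.array([som.winner(x) for x in X])
print(len(np.unique(bmus, axis=0)), "distinct map units activated")
```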


Subjects
Anterior Cruciate Ligament Injuries, Knee Joint, Humans, Knee Joint/physiology, Unsupervised Machine Learning, Neural Networks, Computer, Movement/physiology, Athletes, Anterior Cruciate Ligament Injuries/etiology, Biomechanical Phenomena
7.
Sensors (Basel); 23(23), 2023 Nov 26.
Article in English | MEDLINE | ID: mdl-38067800

ABSTRACT

With the development of intelligent IoT applications, vast volumes of data are generated by various sensors. These sensor data need to be reduced at the sensor and then reconstructed later to save bandwidth and energy. As the amount of reduction increases, the reconstructed data become less accurate. Usually, the trade-off between reduction rate and reconstruction accuracy is controlled by the reduction threshold, which is calculated through experiments on historical data. Considering the dynamic nature of IoT, a fixed threshold cannot adaptively balance the reduction rate with the reconstruction accuracy. Aiming to dynamically balance the reduction rate with the reconstruction accuracy, an autonomous IoT data reduction method based on an adaptive threshold is proposed. During data reduction, concept drift detection is performed to capture IoT dynamic changes and trigger threshold adjustment. During data reconstruction, a data trend is added to improve reconstruction accuracy. The effectiveness of the proposed method is demonstrated by comparing it with the basic Kalman filtering algorithm, the LMS algorithm, and the PIP algorithm on stationary and nonstationary datasets. Compared with not applying the adaptive threshold, on average, there is an 11.7% improvement in accuracy for the same reduction rate, or a 17.3% improvement in reduction rate for the same accuracy.
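A minimal sketch of threshold-based reduction with a crude adaptive rule standing in for the paper's concept drift detector; the window size and adjustment factors are illustrative assumptions.

```python
# Dead-band reduction: transmit a reading only when it deviates from the last
# transmitted value by more than the threshold; adapt the threshold when the
# recent error statistics drift.
import numpy as np

def reduce_stream(x, thr=1.0, window=50, grow=1.2, shrink=0.9):
    sent, last, errs = [(0, x[0])], x[0], []
    for i, v in enumerate(x[1:], start=1):
        err = abs(v - last)                        # last-value-hold error
        errs.append(err)
        if err > thr:
            sent.append((i, v))
            last = v
        if len(errs) >= window:                    # crude drift check:
            recent = np.mean(errs[-window:])
            # more volatile recently -> tighten threshold (favor accuracy);
            # quieter recently -> loosen it (favor reduction)
            thr = thr * shrink if recent > thr else thr * grow
            errs = errs[-window:]
    return sent, thr

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(20, 0.2, 2000), rng.normal(25, 2.0, 2000)])
sent, thr = reduce_stream(x)
print(f"kept {len(sent)} of {len(x)} samples, final threshold {thr:.2f}")
```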

8.
Sensors (Basel); 23(4), 2023 Feb 17.
Article in English | MEDLINE | ID: mdl-36850865

ABSTRACT

Wideband spectrum sensing is a challenging problem in the framework of cognitive radio and spectrum surveillance, mainly because of the high sampling rates required by standard approaches. In this paper, a compressed sensing approach is considered to solve this problem, relying on a sub-Nyquist (Xampling) scheme known as the modulated wideband converter. First, data reduction at its output is performed in order to enable a highly effective processing scheme for spectrum reconstruction. The impact of this data transformation on the behavior of the most popular sparse reconstruction algorithms is then analyzed. A new mathematical approach is proposed to demonstrate that greedy reconstruction algorithms, such as Orthogonal Matching Pursuit, are invariant with respect to the proposed data reduction. Relying on the same formalism, a data-reduction-invariant version of the LASSO (least absolute shrinkage and selection operator) reconstruction algorithm is also introduced. It is finally demonstrated that the proposed algorithm provides good reconstruction results in a wideband spectrum sensing scenario, using both synthetic and measured data.
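For reference, a compact NumPy version of Orthogonal Matching Pursuit, the greedy algorithm whose invariance to the data reduction is analyzed above: at each step the atom most correlated with the residual is selected, and the support is re-fit by least squares.

```python
# Orthogonal Matching Pursuit: greedy sparse recovery of x from y = A x.
import numpy as np

def omp(A, y, k):
    """A: (m, n) sensing matrix, y: (m,) measurements, k: sparsity."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))  # best-matching atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef         # orthogonal projection
    x[support] = coef
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 256)) / 8.0
x0 = np.zeros(256); x0[[10, 99, 200]] = [1.0, -2.0, 0.5]
print(np.allclose(omp(A, A @ x0, 3), x0, atol=1e-6))   # exact recovery
```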

9.
Sensors (Basel); 23(12), 2023 Jun 08.
Article in English | MEDLINE | ID: mdl-37420612

ABSTRACT

Depth data, and the digital bottom models created from them, are very important in studies of inland and coastal water zones. This paper addresses bathymetric data processing using reduction methods and examines the impact of data reduction on the resulting representations of the bottom surface in the form of numerical bottom models. Data reduction is an approach meant to reduce the size of the input dataset so that it is easier and more efficient to analyse, transmit and store. For the purposes of this article, test datasets were created by discretizing a selected polynomial function. The real dataset used to verify the analyses was acquired with an interferometric echosounder mounted on a HydroDron-1 autonomous survey vessel; the data were collected in a strip of Lake Klodno, Zawory. Data reduction was conducted in two commercial programs, with three identical reduction parameters adopted for each algorithm. The research part of the paper presents the results of the analyses of the reduced bathymetric datasets, based on a visual comparison of the numerical bottom models and isobaths as well as on statistical parameters. The article contains tabular results with statistics, as well as spatial visualizations of the studied fragments of the numerical bottom models and isobaths. This research is being used in the course of work on an innovative project that aims to develop a prototype of a multi-dimensional and multi-temporal coastal zone monitoring system using autonomous, unmanned floating platforms in a single survey pass.
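A sketch of how such a test dataset and a simple reduction might look: points sampled from a polynomial bottom surface, thinned by keeping the shallowest sounding per grid cell (a common conservative choice in hydrography; the commercial algorithms used in the study are not reproduced here).

```python
# Synthetic bathymetric test data from a polynomial surface, reduced by
# keeping the shallowest sounding per grid cell.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 100, (2, 50_000))            # survey point positions
z = -(0.002 * x**2 + 0.05 * y + 3.0)               # polynomial depth surface

cell = 5.0                                         # reduction grid size [m]
keys = (np.floor(x / cell).astype(int), np.floor(y / cell).astype(int))
best = {}
for xi, yi, zi, kx, ky in zip(x, y, z, *keys):
    k = (kx, ky)
    if k not in best or zi > best[k][2]:           # keep shallowest depth
        best[k] = (xi, yi, zi)
reduced = np.array(list(best.values()))
print(f"{len(reduced)} of {len(x)} points kept "
      f"({100 * len(reduced) / len(x):.1f}%)")
```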


Subjects
Algorithms, Surveys and Questionnaires
10.
Sensors (Basel); 24(1), 2023 Dec 28.
Article in English | MEDLINE | ID: mdl-38203036

ABSTRACT

This study uses a neural network to propose a methodology for creating digital bathymetric models of shallow water areas that are partially covered by a mix of hydroacoustic and photogrammetric data. A key challenge of this approach is the preparation of the training dataset from such data. Focusing on cases in which the training dataset covers only part of the measured depths, the approach employs generalized linear regression for data optimization, followed by multilayer perceptron neural networks for bathymetric model creation. The research assessed the impact of data reduction, outlier elimination, and regression surface-based filtering on neural network learning. The average root mean square (RMS) errors for the studied nearshore, middle, and deep water areas were 0.12 m, 0.03 m, and 0.06 m, respectively; the corresponding mean absolute errors (MAE) were 0.11 m, 0.02 m, and 0.04 m. Following detailed quantitative and qualitative error analyses, the results indicate variable accuracy across the different study areas. Nonetheless, the methodology demonstrated its effectiveness in depth calculations for water bodies, although it faces challenges with respect to accuracy, especially in preserving nearshore values in shallow areas.
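A minimal sketch of the pipeline's two stages, with synthetic depths in place of the mixed hydroacoustic/photogrammetric data: a regression surface screens outliers, then an MLP learns the (x, y) → depth mapping. The layer sizes and the 3-sigma screening rule are assumptions.

```python
# Stage 1: regression-surface outlier screening; stage 2: MLP depth model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
XY = rng.uniform(0, 100, (5000, 2))
z = -(5 + 0.03 * XY[:, 0] + 0.02 * XY[:, 1]) + rng.normal(0, 0.05, 5000)

# 1) polynomial surface fit; drop points far from it (outliers)
P = PolynomialFeatures(degree=2).fit_transform(XY)
lin = LinearRegression().fit(P, z)
resid = z - lin.predict(P)
ok = np.abs(resid) < 3 * resid.std()

# 2) MLP depth model on the screened points
mlp = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000,
                   random_state=0).fit(XY[ok], z[ok])
rmse = np.sqrt(np.mean((mlp.predict(XY[ok]) - z[ok]) ** 2))
print(f"train RMSE: {rmse:.3f} m")
```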

11.
Environ Monit Assess; 195(12): 1434, 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-37940769

ABSTRACT

Studying spatiotemporal water quality characteristics and their correlation with land use/land cover (LULC) patterns is essential for discerning the origins of various pollution sources and for informing strategic land use planning. This, in turn, requires a comprehensive analysis of spatiotemporal water quality data to understand how surface water quality evolves across time and space. In this study, we compared catchment-, riparian-, and reach-scale models to assess the effect of LULC on water quality (WQ). Using various multivariate techniques, a 14-year dataset of 20 WQ variables from 20 monitoring stations (67,200 observations) is studied along the Middle Ganga Basin (MGB). Based on the similarity and dissimilarity of WQ parameters (WQPs), the K-means clustering algorithm classified the 20 monitoring stations into four clusters. Seasonally, the three PCs chosen explained 75.69% and 75% of the variance in the data. With loadings > 0.70 on these PCs, the variables EC, pH, Temp, TDS, NO2 + NO3, P-Tot, BOD, COD, and DO were identified as dominant pollution sources. The applied RDA analysis revealed that LULC makes a moderate to strong contribution to WQPs during the wet season but not during the dry season. Furthermore, dense vegetation is critical for keeping water clean, whereas agriculture, barren land, and built-up areas degrade WQ. The findings also suggest that the relationship between WQPs and LULC differs at different scales. The stacked ensemble regression (SER) model is applied to understand the model's predictive power across different clusters and scales. Overall, the results indicate that the riparian scale is more predictive than the watershed and reach scales.
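A sketch of the multivariate steps named above (standardization, k-means with k = 4, PCA with loadings read against the 0.70 cutoff), run here on stand-in station data rather than the MGB observations.

```python
# Standardize stations x parameters, cluster stations with k-means, and read
# dominant parameters off PCA loadings against the 0.70 cutoff.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
params = ["EC", "pH", "Temp", "TDS", "NO2+NO3", "P-Tot", "BOD", "COD", "DO"]
X = rng.normal(size=(20, len(params)))             # 20 stations (stand-in)

Xs = StandardScaler().fit_transform(X)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Xs)

pca = PCA(n_components=3).fit(Xs)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
for i in range(3):
    dominant = [p for p, w in zip(params, loadings[:, i]) if abs(w) > 0.70]
    print(f"PC{i+1}: {pca.explained_variance_ratio_[i]:.1%} variance", dominant)
```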


Subjects
Environmental Monitoring, Water Quality, Environmental Monitoring/methods, Agriculture, Rivers
12.
Neuroimage; 254: 119169, 2022 Jul 01.
Article in English | MEDLINE | ID: mdl-35367650

ABSTRACT

Preterm birth is closely associated with diffuse white matter dysmaturation inferred from diffusion MRI (dMRI) and with neurocognitive impairment in childhood. Diffusion tensor imaging (DTI) and neurite orientation dispersion and density imaging (NODDI) are distinct dMRI modalities, yet metrics derived from these two methods share variance across tracts. This raises the hypothesis that dimensionality reduction approaches may provide efficient whole-brain estimates of white matter microstructure that capture (dys)maturational processes. To investigate the optimal model for accurate classification of generalised white matter dysmaturation in preterm infants, we assessed variation in DTI and NODDI metrics across 16 major white matter tracts using principal component analysis and structural equation modelling in 79 term and 141 preterm infants at term-equivalent age. We used logistic regression models to evaluate the performance of single-metric and multimodality general-factor frameworks for efficient classification of preterm infants based on variation in white matter microstructure. Single-metric general factors from DTI and NODDI captured substantial shared variance (41.8-72.5%) across the 16 white matter tracts, and two multimodality factors captured 93.9% of the variance shared between the DTI and NODDI metrics themselves. The general factors associate with preterm birth, and a single model that includes all seven DTI and NODDI metrics provides the most accurate prediction of microstructural variations associated with preterm birth. This suggests that despite the global covariance of dMRI metrics in neonates, each metric represents information about specific (and additive) aspects of the underlying microstructure that differ in preterm compared with term subjects.
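A minimal sketch of the single-metric general-factor construction: the first principal component across 16 tract values per metric, with the resulting factors feeding a logistic regression for preterm classification. The seven metric names and all data below are stand-ins, not the study's measurements.

```python
# One "general factor" per dMRI metric = PC1 score across 16 tracts;
# general factors then feed a preterm-vs-term logistic regression.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 220                                            # 79 term + 141 preterm
metrics = {m: rng.normal(size=(n, 16)) for m in
           ["FA", "MD", "AD", "RD", "NDI", "ODI", "ISO"]}  # assumed names
preterm = rng.integers(0, 2, n)

G = np.column_stack([
    PCA(n_components=1).fit_transform(
        StandardScaler().fit_transform(v)).ravel()
    for v in metrics.values()])

clf = LogisticRegression().fit(G, preterm)
print("in-sample accuracy:", clf.score(G, preterm))
```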


Subjects
Premature Birth, White Matter, Brain/diagnostic imaging, Diffusion Tensor Imaging/methods, Female, Humans, Infant, Infant, Newborn, Infant, Premature, Neurites, Pregnancy, White Matter/diagnostic imaging
13.
J Synchrotron Radiat; 29(Pt 6): 1420-1428, 2022 Nov 01.
Article in English | MEDLINE | ID: mdl-36345750

ABSTRACT

As synchrotron facilities continue to generate increasingly brilliant X-rays and detector speeds increase, swift data reduction from the collected area-detector images to more workable 1D diffractograms becomes increasingly important. This work reports an integration algorithm that can produce integrated diffractograms in real time on modern laptops and can reach 10 kHz integration speeds on modern workstations, using an efficient pixel-splitting and parallelization scheme. The algorithm is limited not by the computation of the integration itself but rather by the speed of data transfer to the processor, data decompression and/or the saving of results. The algorithm and its implementation are described, and its performance is investigated on 2D scanning X-ray diffraction/fluorescence data collected at the interface between an implant and forming bone.
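The core reduction, stripped of the paper's pixel splitting and parallelization, is radial binning of detector pixels; a minimal NumPy sketch:

```python
# Azimuthal integration: bin each pixel by its radius from the beam centre
# and average intensities per bin (2D image -> 1D diffractogram).
import numpy as np

def integrate(image, center, n_bins=500):
    ys, xs = np.indices(image.shape)
    r = np.hypot(xs - center[0], ys - center[1]).ravel()
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins, weights=image.ravel(), minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    return sums / np.maximum(counts, 1)            # mean intensity per radius

img = np.random.default_rng(0).poisson(100, size=(2048, 2048)).astype(float)
profile = integrate(img, center=(1024, 1024))
print(profile.shape)                               # (500,) radial profile
```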


Subjects
Algorithms, Synchrotrons, X-Ray Diffraction, X-Rays, Radiography
14.
Sensors (Basel); 22(2), 2022 Jan 13.
Article in English | MEDLINE | ID: mdl-35062547

ABSTRACT

Household appliances, climate control machines, vehicles, elevators, cash counting machines, etc., are complex machines with key contributions to the smart city. These devices have limited memory and processing power, but they are not just actuators; they embed tens of sensors and actuators managed by several microcontrollers and microprocessors communicating over control buses. On the other hand, predictive maintenance and the capability of identifying failures to avoid greater damage to machines is becoming a topic of great relevance in Industry 4.0, and the large amount of data to be processed is a concern. This article proposes a layered methodology to enable complex machines with automatic fault detection or predictive maintenance. It presents a layered structure to perform the collection, filtering and extraction of indicators, along with their processing. The aim is to reduce the amount of data to work with and to generate indicators that concentrate the information the data provide. To test its applicability, a prototype of a cash counting machine was used. With this prototype, different failure cases were simulated by introducing defective elements. After the extraction of the indicators, using the Kullback-Leibler divergence, it was possible to visualize the differences between the data associated with normal and failure operation. Subsequently, using a neural network, good results were obtained, with the failure correctly classified in 90% of the cases. These results demonstrate the proper functioning of the proposed approach on complex machines.
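A sketch of the indicator-comparison step: histogram an indicator under normal and failure operation and score their separation with the Kullback-Leibler divergence. The Gaussian stand-in data and bin count are assumptions.

```python
# Separation of normal vs failure indicator distributions via KL divergence.
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(0)
normal_op = rng.normal(0.0, 1.0, 5000)             # indicator, healthy machine
failure_op = rng.normal(0.8, 1.3, 5000)            # indicator, defective part

bins = np.histogram_bin_edges(np.r_[normal_op, failure_op], bins=50)
p, _ = np.histogram(normal_op, bins=bins, density=True)
q, _ = np.histogram(failure_op, bins=bins, density=True)
p, q = p + 1e-12, q + 1e-12                        # avoid empty bins

print("KL(normal || failure) =", entropy(p, q))    # larger = more separable
```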


Subjects
Algorithms, Neural Networks, Computer, Industries
15.
Sensors (Basel); 22(17), 2022 Aug 24.
Article in English | MEDLINE | ID: mdl-36080845

ABSTRACT

Data storage is a problem that cannot be ignored in the long-term monitoring of a phase-sensitive optical time-domain reflectometry (Φ-OTDR) system. In this paper, we propose a data-reduction approach for heterodyne Φ-OTDR using an ultra-low sampling resolution and undersampling techniques. The operating principles are demonstrated, and experiments with different sensing configurations were carried out to verify the proposed method. The results showed that the vibration signal could be accurately reconstructed from the undersampled 1-bit data. A space saving ratio of 98.75% was achieved by converting 128 MB of data (corresponding to 268.44 ms of sensing time) to 1.6 MB. The proposed method thus offers a new data-reduction approach for heterodyne Φ-OTDR and provides economical guidance for the selection of the data-acquisition device.
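The storage arithmetic can be illustrated directly: with 16-bit raw samples, an undersampling factor of 5, and 1 bit kept per sample, the retained fraction is 1/80 = 1.25%, i.e. a 98.75% space saving, matching 128 MB → 1.6 MB. These factors are one combination consistent with the reported numbers, not the authors' exact acquisition settings.

```python
# Bookkeeping for 1-bit + undersampled storage of a sampled waveform.
import numpy as np

rng = np.random.default_rng(0)
x = (rng.normal(size=1_000_000) * 1000).astype(np.int16)  # stand-in samples

undersampled = x[::5]                              # undersampling factor 5
one_bit = np.packbits(undersampled >= 0)           # keep only the sign bit

raw_bytes = x.nbytes                               # 16 bits per raw sample
reduced_bytes = one_bit.nbytes
print(f"space saving: {100 * (1 - reduced_bytes / raw_bytes):.2f}%")  # 98.75%
```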

16.
Sensors (Basel); 22(11), 2022 May 31.
Article in English | MEDLINE | ID: mdl-35684818

ABSTRACT

Electrochemical impedance spectroscopy (EIS) is the gold-standard tool for many emerging biomedical applications, as it describes the behavior, stability, and long-term durability of physical interfaces over a specific frequency range. Impedance measurements of any biointerface during in vivo and clinical applications can be used for assessing long-term biopotential measurements and for diagnostic purposes. In this paper, a novel approach to predicting impedance behavior is presented. It consists of a dimensionality reduction procedure that converts EIS data recorded over many days of an experiment into a one-dimensional sequence of values using a novel formula called the day factor (DF), followed by a long short-term memory (LSTM) network that predicts the future behavior of the DF. Three neural interfaces of different material compositions, subjected to long-term in vitro aging tests, were used to validate the proposed approach. The results showed good accuracy in predicting the quantitative change in impedance behavior (i.e., higher than 75%), in addition to good prediction of the similarity between the actual and predicted DF signals, which expresses the impedance fluctuations among soaking days. The DF approach showed lower computational time and algorithmic complexity than principal component analysis (PCA) and provided the ability to involve or emphasize several important frequencies or impedance ranges in a more flexible way.
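A sketch of the forecasting stage only, taking the one-dimensional day-factor series as given (the paper's DF formula itself is not reproduced here); the window length and layer sizes are assumptions.

```python
# LSTM forecasting of a 1-D daily sequence: predict the next day-factor
# value from a short window of preceding days.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
df = np.cumsum(rng.normal(0, 0.05, 200)) + 1.0     # stand-in DF series

win = 7
X = np.array([df[i:i + win] for i in range(len(df) - win)])[..., None]
y = df[win:]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(win, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=16, verbose=0)

pred = model.predict(df[-win:][None, :, None], verbose=0)
print("next-day DF prediction:", pred.ravel())
```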


Subjects
Dielectric Spectroscopy, Electric Impedance, Forecasting
17.
Behav Res Methods; 54(1): 42-53, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34100199

ABSTRACT

Assessing the intelligibility of speech-disordered individuals generally involves asking them to read aloud texts such as word lists, a procedure that can be time-consuming if the materials are lengthy. This paper seeks to optimize such elicitation materials by identifying a trade-off between the quantity of material needed for assessment purposes and its capacity to elicit robust intelligibility metrics. More specifically, it investigates the effect of reducing the number of pseudowords used in a phonetic-acoustic decoding task in a speech-impaired population, in terms of the subsequent impact on the intelligibility classifier as quantified by accuracy indexes (area under the ROC curve, balanced accuracy index and F-scores). A comparison of the obtained accuracy indexes shows that when the reduction of the elicitation material is based on a phonetic criterion (here, phonotactic complexity), the classifier has a higher classifying ability than when the material is arbitrarily reduced. Crucially, downsizing the material to about 30% of the original dataset neither diminishes the classifier's performance nor affects its stability. This result is of significant interest to clinicians as well as patients, since it validates a tool that is both reliable and efficient.
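The three accuracy indexes are standard and can be computed with scikit-learn; a minimal sketch on stand-in classifier outputs:

```python
# AUC of the ROC, balanced accuracy, and F-score for a binary classifier.
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, f1_score,
                             roc_auc_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)                   # impaired vs control
scores = y_true * 0.6 + rng.normal(0.2, 0.3, 200)  # stand-in classifier scores
y_pred = (scores > 0.5).astype(int)

print("AUC:", roc_auc_score(y_true, scores))
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
```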


Subjects
Speech Intelligibility, Speech Perception, Humans, Phonetics, Speech Production Measurement/methods, Surveys and Questionnaires
18.
Comput Stat; 1-19, 2022 Sep 15.
Article in English | MEDLINE | ID: mdl-36124011

ABSTRACT

A long tradition of analysing ordinal response data deals with parametric models, starting with the seminal approach of cumulative models. When data are collected by means of Likert-scale survey questions in which several scored items measure one or more latent traits, one of the thorny issues is how to deal with the ordered categories. A stacked ensemble (or hybrid) model is introduced to tackle the limitations of simply summing up the items. In particular, multiple item responses are synthesised into a single meta-item, defined via a joint data reduction approach; the meta-item is then modelled according to regression approaches for ordered polytomous variables that account for potential scaling effects. Finally, a recursive partitioning method yielding trees provides automatic variable selection. The performance of the method is evaluated empirically using a survey on the perception of Distance Learning.
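A rough sketch of the hybrid idea, with PCA standing in for the paper's joint data reduction and statsmodels' ordered logit for the ordinal regression stage; the Likert data are simulated.

```python
# Compress several Likert items into one meta-item, then fit an
# ordered-logit model for the overall ordinal response.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
items = rng.integers(1, 6, size=(400, 8)).astype(float)  # eight Likert items
meta = PCA(n_components=1).fit_transform(items).ravel()  # the "meta-item"
response = pd.Categorical(rng.integers(1, 6, 400), ordered=True)

mod = OrderedModel(response, meta[:, None], distr="logit")
res = mod.fit(method="bfgs", disp=False)
print(res.params)                                  # slope + threshold cuts
```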

19.
Sensors (Basel); 21(21), 2021 Oct 26.
Article in English | MEDLINE | ID: mdl-34770398

ABSTRACT

Multi-channel impact-echo (IE) testing was used to evaluate debonding defects at the interface between the track concrete layer (TCL) and the hydraulically stabilized base course (HSB) in a real-scale mockup of concrete slab tracks for the Korea high-speed railway (KHSR) system. The mockup includes three debonding defects fabricated by inserting thin 400 mm × 400 mm plastic foam boards of three different thicknesses (5 mm, 10 mm, and 15 mm) before casting the TCL concrete. Multi-channel IE signals obtained over solid concrete and debonding defects were reduced to three critical IE testing parameters: the velocity of concrete, peak frequency, and Q factor. Bilinear classification models were used to evaluate the characteristic parameters individually and in combination. The best evaluation performance was obtained using the average peak frequency, or the combination of average peak frequency and average Q factor, obtained from the eight accelerometers in the multi-channel IE device. The results and discussion in this study should improve the understanding of the characteristics of multiple IE testing parameters in concrete slab tracks and provide a basis for developing an effective non-destructive evaluation model for debonding defects at the TCL-HSB interface in concrete slab tracks.
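Two of the three IE parameters can be extracted from a single record as follows; the sampling rate, windowing, and damped-sinusoid test signal are illustrative assumptions.

```python
# Reduce an impact-echo record to peak frequency and Q factor
# (peak frequency divided by the half-power bandwidth).
import numpy as np

def peak_freq_and_q(signal, fs):
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    k = int(np.argmax(spec))
    half = spec[k] / np.sqrt(2.0)                  # -3 dB level
    lo = k
    while lo > 0 and spec[lo] > half:              # walk down the peak
        lo -= 1
    hi = k
    while hi < len(spec) - 1 and spec[hi] > half:
        hi += 1
    bw = freqs[hi] - freqs[lo]                     # half-power bandwidth
    return freqs[k], freqs[k] / max(bw, 1e-9)      # (peak frequency, Q)

fs = 500_000                                       # 500 kHz sampling (assumed)
t = np.arange(4096) / fs
sig = np.exp(-2000 * t) * np.sin(2 * np.pi * 12_000 * t)  # damped 12 kHz mode
print(peak_freq_and_q(sig, fs))
```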

20.
Sensors (Basel); 21(21), 2021 Oct 23.
Article in English | MEDLINE | ID: mdl-34770342

ABSTRACT

Enormous amounts of heterogeneous sensory data are generated in the Internet of Things (IoT) for various applications. These big data are characterized by additional features related to IoT, including trustworthiness, timing and spatial features. These extra perspectives must be considered during processing, posing vast challenges to traditional data fusion methods at different fusion levels for collection and analysis. In this paper, an IoT-based spatiotemporal data fusion (STDF) approach for low-level data-in-data-out fusion is proposed for real-time spatial IoT source aggregation. It achieves optimal performance by leveraging traditional data fusion methods based on big data analytics while exclusively maintaining the data expiry, trustworthiness and spatial and temporal IoT data perspectives, in addition to volume and velocity. It applies cluster sampling for data reduction upon data acquisition from all IoT sources. For each source, it utilizes a combination of k-means clustering for spatial analysis and Tiny AGgregation (TAG) for temporal aggregation to maintain spatiotemporal data fusion at the processing server. STDF is validated via a public IoT data stream simulator. The experiments examine diverse IoT processing challenges in different datasets, reducing the data size by 95% and the processing time by 80%, with an accuracy of up to 90% for the largest dataset used.
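A minimal sketch of the two STDF building blocks named above, k-means spatial grouping followed by TAG-style per-window aggregation; k, the window length, and the simulated stream are assumptions.

```python
# k-means spatial grouping of readings, then per-window temporal
# aggregation (one fused value per cluster and time window).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# stand-in stream: (timestamp s, latitude, longitude, value)
n = 10_000
data = np.column_stack([np.sort(rng.uniform(0, 3600, n)),
                        rng.uniform(30.0, 30.2, n),
                        rng.uniform(31.0, 31.2, n),
                        rng.normal(25, 2, n)])

labels = KMeans(n_clusters=5, n_init=10,
                random_state=0).fit_predict(data[:, 1:3])

window = 60.0                                      # aggregate per minute
w = (data[:, 0] // window).astype(int)
fused = {}
for c in range(5):
    for wi in np.unique(w[labels == c]):
        sel = (labels == c) & (w == wi)
        fused[(c, wi)] = data[sel, 3].mean()       # one value per cluster-window
print(f"reduced {n} readings to {len(fused)} fused values")
```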
