ABSTRACT
PURPOSE: The objective of this study was to propose a novel preprocessing approach that simultaneously corrects for frequency and phase drifts in MRS data using a cross-correlation technique. METHODS: The performance of the proposed method was first investigated at different SNR levels using simulation. Random frequency and phase offsets were added to previously acquired STEAM human data at 7 T, simulating two different noise levels with and without baseline artifacts. Alongside the proposed spectral cross-correlation (SC) method, three other simultaneous alignment approaches were evaluated. Validation was performed on human brain data at 3 T and mouse brain data at 16.4 T. RESULTS: The results showed that the SC technique effectively corrects for both small and large frequency and phase drifts, even at low SNR levels. Furthermore, the mean square measurement error of the SC algorithm was comparable to that of the other three methods, with a much faster processing time. The efficacy of the proposed technique was successfully demonstrated both in human brain MRS data and in a noisy MRS dataset acquired from a small volume-of-interest in the mouse brain. CONCLUSION: The study demonstrated a fast and robust technique that accurately corrects for both small and large frequency and phase shifts in MRS.
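The core idea can be sketched in a few lines of Python. The following is a minimal illustration, not the authors' exact SC algorithm: the frequency offset of each transient is taken as the lag that maximizes the cross-correlation of magnitude spectra, and the residual zero-order phase is estimated from a complex inner product with the reference. The dwell-time parameterization and the absence of interpolation or spectral wrap-around handling are simplifications.

```python
import numpy as np

def align_transient(ref_fid, fid, dwell_time):
    """Align one FID to a reference via spectral cross-correlation (sketch).

    The frequency offset is the lag maximizing the cross-correlation of the
    magnitude spectra; the phase offset is estimated from the complex inner
    product after the frequency shift has been applied.
    """
    n = len(fid)
    ref_spec = np.abs(np.fft.fft(ref_fid))
    spec = np.abs(np.fft.fft(fid))
    xcorr = np.correlate(ref_spec, spec, mode="full")   # lags -(n-1)..(n-1)
    lag = int(np.argmax(xcorr)) - (n - 1)               # shift in points
    df = lag / (n * dwell_time)                         # shift in Hz
    t = np.arange(n) * dwell_time
    shifted = fid * np.exp(2j * np.pi * df * t)         # apply frequency shift
    dphi = np.angle(np.vdot(shifted, ref_fid))          # residual phase offset
    return shifted * np.exp(1j * dphi), df, dphi
```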
Subjects
Algorithms, Artifacts, Brain, Magnetic Resonance Spectroscopy, Humans, Mice, Brain/diagnostic imaging, Brain/metabolism, Animals, Magnetic Resonance Spectroscopy/methods, Reproducibility of Results, Sensitivity and Specificity, Signal-to-Noise Ratio, Computer-Assisted Signal Processing
ABSTRACT
Automated EEG pre-processing pipelines provide several key advantages over traditional manual data cleaning approaches; primarily, they are less time-intensive and remove potential experimenter error/bias. Automated pipelines also require less technical expertise, as they remove the need for manual artefact identification. We recently developed the fully automated Reduction of Electroencephalographic Artefacts (RELAX) pipeline and demonstrated its performance in cleaning EEG data recorded from adult populations. Here, we introduce the RELAX-Jr pipeline, which was adapted from RELAX and designed specifically for pre-processing of data collected from children. RELAX-Jr implements multi-channel Wiener filtering (MWF) and/or wavelet-enhanced independent component analysis (wICA) combined with the adjusted-ADJUST automated independent component classification algorithm, using algorithms adapted to optimally identify and reduce artefacts in EEG recordings taken from children. Using a dataset of resting-state EEG recordings (N = 136) from children spanning early-to-middle childhood (4-12 years), we assessed the cleaning performance of RELAX-Jr using a range of metrics, including signal-to-error ratio, artefact-to-residue ratio, ability to reduce blink and muscle contamination, and differences in estimates of alpha power between eyes-open and eyes-closed recordings. We also compared the performance of RELAX-Jr against four publicly available automated cleaning pipelines. We demonstrate that RELAX-Jr provides strong cleaning performance across a range of metrics, supporting its use as an effective and fully automated cleaning pipeline for neurodevelopmental EEG data.
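To illustrate one of the stages, a generic wavelet-enhanced ICA cleaning step might look like the sketch below. This is a simplified stand-in using scikit-learn's FastICA and the PyWavelets package, not the RELAX-Jr implementation (which classifies components with adjusted-ADJUST and has its own thresholding scheme); the wavelet choice and threshold rule are assumptions.

```python
import numpy as np
import pywt
from sklearn.decomposition import FastICA

def wica_clean(eeg, artifact_ics, wavelet="sym4", k=3.0):
    """Generic wavelet-enhanced ICA cleaning; eeg is (channels x samples).

    For each flagged component, large wavelet coefficients are taken as the
    artifact estimate and subtracted, preserving low-amplitude (putatively
    neural) activity within the component.
    """
    ica = FastICA(n_components=eeg.shape[0], random_state=0)
    sources = ica.fit_transform(eeg.T).T                  # ICs x samples
    for ic in artifact_ics:
        coeffs = pywt.wavedec(sources[ic], wavelet)
        thr = k * np.median(np.abs(coeffs[-1])) / 0.6745  # robust sigma estimate
        artifact = [np.where(np.abs(c) > thr, c, 0.0) for c in coeffs]
        sources[ic] -= pywt.waverec(artifact, wavelet)[: sources.shape[1]]
    return ica.inverse_transform(sources.T).T             # back to channel space
```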
Subjects
Artifacts, Electroencephalography, Computer-Assisted Signal Processing, Humans, Electroencephalography/methods, Electroencephalography/standards, Child, Preschool Child, Male, Female, Brain/physiology, Algorithms
ABSTRACT
This review surveys recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information in comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as data quality and availability and the integration of high-throughput experimentation. Finally, it demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction-condition design and enable novel discoveries in synthetic chemistry.
ABSTRACT
Droplet-based microfluidics techniques coupled to microscopy allow for the characterization of cells at the single-cell scale. However, such techniques generate substantial amounts of data and microscopy images that must be analyzed. Droplets on these images usually need to be classified depending on the number of cells they contain. This verification, when visually carried out by the experimenter image-per-image, is time-consuming and impractical for analysis of many assays or when an assay yields many putative droplets of interest. Machine learning models have already been developed to classify cell-containing droplets within microscopy images, but not in the context of assays in which non-cellular structures are present inside the droplet in addition to cells. Here we develop a deep learning model based on the ResNet-50 neural network that can be applied to functional droplet-based microfluidic assays to classify droplets according to the number of cells they contain, with >90% accuracy and very short runtimes. This model classifies with high accuracy both droplets containing cells together with non-cellular structures and droplets containing cells alone, and it can accommodate several different cell types, for generalization to a broader array of droplet-based microfluidics applications.
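A setup of this kind is typically built by replacing the classification head of a pretrained ResNet-50. The torchvision sketch below is a plausible reconstruction rather than the authors' code; the class count, weights choice, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical class count, e.g. droplets with 0, 1, or >1 cells.
NUM_CLASSES = 3

# Pretrained backbone with a new classification head for droplet classes.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimization step on a batch of droplet images (N, 3, 224, 224)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```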
ABSTRACT
This review article provides a comprehensive examination of the state of the art in maize disease detection leveraging Convolutional Neural Networks (CNNs). Beginning with the intrinsic significance of plants and the pivotal role of maize in global agriculture, the increasing importance of detecting and mitigating maize diseases for ensuring food security is explored. The transformative potential of artificial intelligence, particularly CNNs, in automating the identification and diagnosis of maize diseases is investigated. Various aspects of the existing research landscape, including data sources, datasets, and the diversity of maize diseases covered, are scrutinized. A detailed analysis of data preprocessing strategies and data collection zones is conducted to add depth to the understanding of the field. The spectrum of algorithms and models employed in maize disease detection is comprehensively outlined, shedding light on their unique contributions and performance outcomes. The role of hyperparameter optimization techniques in refining model performance is explored across multiple studies. Performance metrics such as accuracy, precision, recall, F1 score, IoU, and mAP are systematically presented, offering insights into the efficacy of different CNN-based approaches. Challenges faced in maize disease detection are critically examined, emerging opportunities are identified, and future research directions are outlined. The review concludes by emphasizing the transformative impact of CNNs in revolutionizing maize disease detection while highlighting the need for ongoing research to address existing challenges and unlock the full potential of this technology.
ABSTRACT
Adaptive filtering methods based on the least-mean-square (LMS) error criterion have been commonly used in auscultation to reduce ambient noise. For non-Gaussian signals containing pulse components, such methods are prone to weight misalignment. Unlike the commonly used variable step-size methods, this paper introduced linear preprocessing to address this issue. The role of linear preprocessing in improving the denoising performance of the normalized least-mean-square (NLMS) adaptive filtering algorithm was analyzed. It was shown that the steady-state mean square weight deviation of the NLMS adaptive filter was proportional to the variance of the body sounds and inversely proportional to the variance of the ambient noise signals in the secondary channel. Preprocessing with properly set parameters could suppress the spikes of body sounds and decrease the variance and power spectral density of the body sounds, without significantly reducing (and in some cases even increasing) the variance and power spectral density of the ambient noise signals in the secondary channel. As a result, the preprocessing could reduce weight misalignment and, correspondingly, significantly improve the performance of ambient-noise reduction. Finally, a case of heart-sound auscultation was given to demonstrate how to design the preprocessing and how it improved the ambient-noise reduction performance. The results can guide the design of adaptive denoising algorithms for body-sound auscultation.
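For reference, a textbook NLMS noise canceller of the kind analyzed in the paper can be written as follows. This is a minimal sketch: the tap count and step size are assumptions, and the paper's linear preprocessing would be applied to both channels before the filter is run.

```python
import numpy as np

def nlms(primary, reference, n_taps=64, mu=0.5, eps=1e-8):
    """NLMS ambient-noise canceller.

    primary:   body sounds + ambient noise (stethoscope channel)
    reference: ambient noise only (secondary channel)
    Returns the error signal e, i.e. the denoised body sounds.
    """
    w = np.zeros(n_taps)
    e = np.zeros(len(primary))
    for n in range(n_taps, len(primary)):
        x = reference[n - n_taps:n][::-1]      # reference tap vector
        y = w @ x                              # estimate of the ambient noise
        e[n] = primary[n] - y                  # denoised output sample
        w += mu * e[n] * x / (eps + x @ x)     # normalized weight update
    return e
```

In this framing, body-sound spikes enter the update through e[n]; a linear preprocessing stage that suppresses those spikes reduces the variance term driving weight misalignment, which is the effect the paper analyzes.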
Subjects
Algorithms, Auscultation, Noise, Computer-Assisted Signal Processing, Signal-to-Noise Ratio, Humans, Noise/prevention & control, Auscultation/methods, Least-Squares Analysis
ABSTRACT
Infertility affects a significant number of people, and assisted reproduction technologies have been developed to address it. In vitro fertilization (IVF) is one of the best options, and its success relies on selecting a high-quality embryo for transfer. This selection has traditionally been performed manually by examining embryos under a microscope. Traditional morphological assessment of embryos has well-known disadvantages: it is labour- and time-intensive and carries risks of bias arising from the subjective estimations of individual embryologists. Various computer vision (CV) and artificial intelligence (AI) techniques and devices have recently been applied in fertility hospitals to improve efficacy. AI refers to the imitation of intelligent behaviour and the capability of machines to simulate cognitive functions such as learning, thinking, and problem-solving typically associated with humans. Deep learning (DL) and machine learning (ML) are advanced AI approaches applied in various fields and are considered core algorithms for future human-assistant technology. This study presents an Embryo Development and Morphology Using Computer Vision-Aided Swin Transformer with Boosted Dipper-Throated Optimization (EDMCV-STBDTO) technique. The EDMCV-STBDTO technique aims to accurately and efficiently detect embryo development, which is critical for improving fertility treatments and advancing developmental biology using medical CV techniques. First, the EDMCV-STBDTO method performs image preprocessing using a bilateral filter (BF) model to remove noise. Next, the Swin transformer method is implemented for feature extraction. The EDMCV-STBDTO model then employs the variational autoencoder (VAE) method to classify human embryo development. Finally, the hyperparameter selection of the VAE method is performed using the boosted dipper-throated optimization (BDTO) technique. The efficiency of the EDMCV-STBDTO method is validated by comprehensive studies on a benchmark dataset. The experimental results show that the EDMCV-STBDTO method performs better than recent techniques.
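The bilateral-filter denoising step, for instance, is a single call in OpenCV. The snippet below is illustrative only; the filename and parameter values are assumptions, not those of the paper.

```python
import cv2

# Edge-preserving smoothing: each output pixel is a weighted average of its
# neighbours, with weights combining spatial closeness and intensity
# similarity, so noise is suppressed while embryo boundaries are kept sharp.
img = cv2.imread("embryo.png")  # hypothetical input image
denoised = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
```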
ABSTRACT
BACKGROUND AND OBJECTIVE: Preprocessing of data is a vital step for almost all deep learning workflows. In computer vision, manipulation of data intensity and spatial properties can improve network stability and can provide an important source of generalisation for deep neural networks. Models are frequently trained with preprocessing pipelines composed of many stages, but these pipelines come with a drawback; each stage that resamples the data costs time, degrades image quality, and adds bias to the output. Long pipelines can also be complex to design, especially in medical imaging, where cropping data early can cause significant artifacts. METHODS: We present Lazy Resampling, a software framework that rephrases spatial preprocessing operations as a graphics pipeline. Rather than each transform individually modifying the data, the transforms generate transform descriptions that are composited together into a single resample operation wherever possible. This reduces pipeline execution time and, most importantly, limits signal degradation. It enables simpler pipeline design as crops and other operations become non-destructive. Lazy Resampling is designed in such a way that it provides the maximum benefit to users without requiring them to understand the underlying concepts or change the way that they build pipelines. RESULTS: We evaluate Lazy Resampling by comparing traditional pipelines with the corresponding lazy resampling pipelines on tasks from the Medical Segmentation Decathlon datasets. We demonstrate lower information loss in lazy pipelines than in traditional pipelines. We demonstrate that Lazy Resampling can avoid the catastrophic loss of semantic segmentation label accuracy that occurs in traditional pipelines when passing labels through a pipeline and then back through the inverted pipeline. Finally, we demonstrate statistically significant improvements when training UNets for semantic segmentation. CONCLUSION: Lazy Resampling reduces the loss of information that occurs when running processing pipelines that traditionally have multiple resampling steps and enables researchers to build simpler pipelines by making operations such as rotation and cropping effectively non-destructive. It makes it possible to invert labels back through a pipeline without catastrophic loss of accuracy. A reference implementation for Lazy Resampling can be found at https://github.com/KCL-BMEIS/LazyResampling. Lazy Resampling is being implemented as a core feature in MONAI, an open source python-based deep learning library for medical imaging, with a roadmap for full integration.
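The central idea, accumulating transform descriptions and resampling once, can be illustrated with homogeneous matrices. The sketch below uses NumPy and SciPy rather than MONAI's actual lazy-resampling machinery, and the specific transforms and shapes are illustrative.

```python
import numpy as np
from scipy.ndimage import affine_transform

def rotation(theta):
    """Homogeneous 2-D rotation about the array origin (illustrative)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def translation(dy, dx):
    """Homogeneous 2-D translation (a crop is a translation plus an output shape)."""
    return np.array([[1.0, 0.0, dy], [0.0, 1.0, dx], [0.0, 0.0, 1.0]])

img = np.random.rand(128, 128)

# Traditional pipeline: rotate and resample, then translate/crop and resample
# again: two interpolations, two rounds of signal degradation.
# Lazy pipeline: composite the transform descriptions first...
M = translation(-10.0, -10.0) @ rotation(np.deg2rad(15.0))

# ...then resample exactly once. affine_transform maps output coordinates to
# input coordinates, so the inverse of the composed matrix is passed.
Minv = np.linalg.inv(M)
out = affine_transform(img, Minv[:2, :2], offset=Minv[:2, 2],
                       output_shape=(100, 100), order=1)
```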
ABSTRACT
Fast and accurate anomaly detection is critical in telemetry systems because it helps operators take appropriate actions in response to abnormal behaviours. However, recent techniques are accurate but not fast enough to deal with real-time data. There is a need to reduce the anomaly detection time, which motivates us to propose two new algorithms called AnDePeD (Anomaly Detector on Periodic Data) and AnDePeD Pro. The novelty of the proposed algorithms lies in exploiting the periodic nature of data for anomaly detection. Our proposed algorithms apply a variational mode decomposition technique to find and extract periodic components from the original data before using Long Short-Term Memory neural networks to detect anomalies in the remainder time series. Furthermore, our methods include advanced techniques to eliminate prediction errors and automatically tune operational parameters. Extensive numerical results show that the proposed algorithms achieve comparable performance in terms of Precision, Recall, F-score, and MCC metrics while outperforming most state-of-the-art anomaly detection approaches in terms of initialisation delay and detection delay, which is favourable for practical applications.
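The detection stage on the remainder series can be sketched as follows. This is a schematic stand-in, not the AnDePeD implementation: it assumes the periodic components have already been extracted and subtracted (e.g., by variational mode decomposition), and the architecture sizes and threshold rule are illustrative assumptions rather than the paper's automatically tuned parameters.

```python
import torch
import torch.nn as nn

class RemainderLSTM(nn.Module):
    """One-step-ahead predictor for the remainder series left after the
    periodic components have been removed."""
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict the next value

def flag_anomalies(model, windows, targets, k=4.0):
    """Flag points whose prediction error greatly exceeds the typical error."""
    with torch.no_grad():
        err = (model(windows).squeeze(-1) - targets).abs()
    thr = k * err.median()                # illustrative robust threshold
    return err > thr
```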
ABSTRACT
Radiomics is a method to extract detailed information from diagnostic images that cannot be perceived by the naked eye. Although radiomics research carries great potential to improve clinical decision-making, its inherent methodological complexities make it difficult to comprehend every step of the analysis, often causing reproducibility and generalizability issues that hinder clinical adoption. Critical steps in the radiomics analysis and model development pipeline, such as image preprocessing, application of image filters, and selection of feature extraction parameters, can greatly affect the values of radiomic features. Moreover, common errors in data partitioning, model comparison, fine-tuning, assessment, and calibration can reduce reproducibility and impede clinical translation. Clinical adoption of radiomics also requires a deep understanding of model explainability and the development of intuitive interpretations of radiomic features. To address these challenges, it is essential for radiomics model developers and clinicians to be well-versed in current best practices. Proper knowledge and application of these practices is crucial for accurate radiomics feature extraction, robust model development, and thorough assessment, ultimately increasing reproducibility, generalizability, and the likelihood of successful clinical translation. In this article, we provide researchers with our recommendations along with practical examples to facilitate good research practices in radiomics. KEY POINTS: Radiomics' inherent methodological complexity should be understood to ensure rigorous radiomic model development to improve clinical decision-making. Adherence to radiomics-specific checklists and quality assessment tools ensures methodological rigor. Use of standardized radiomics tools and best practices enhances clinical translation of radiomics models.
ABSTRACT
Single-cell DNA methylation sequencing technology has seen rapid advancements in recent years, playing a crucial role in uncovering cellular heterogeneity and the mechanisms of epigenetic regulation. As sequencing technologies have progressed, the quality and quantity of single-cell methylation data have also increased, making standardized preprocessing workflows and appropriate analysis methods essential for ensuring data comparability and result reliability. However, a comprehensive data analysis pipeline to guide researchers in mining existing data has yet to be established. This review systematically summarizes the preprocessing steps and analysis methods for single-cell methylation data, introduces relevant algorithms and tools, and explores the application prospects of single-cell methylation technology in neuroscience, hematopoietic differentiation, and cancer research. The aim is to provide guidance for researchers in data analysis and to promote the development and application of single-cell methylation sequencing technology.
Subjects
DNA Methylation, DNA Sequence Analysis, Single-Cell Analysis, Single-Cell Analysis/methods, Humans, DNA Sequence Analysis/methods, Genetic Epigenesis, Animals, Algorithms
ABSTRACT
Brain tumors, characterized by uncontrolled cell growth in the central nervous system, present substantial challenges in medical diagnosis and treatment. Early and accurate detection is essential for effective intervention. This study aims to enhance the detection and classification of brain tumors in Magnetic Resonance Imaging (MRI) scans using an innovative framework combining Vision Transformer (ViT) and Gated Recurrent Unit (GRU) models. We utilized primary MRI data from Bangabandhu Sheikh Mujib Medical College Hospital (BSMMCH) in Faridpur, Bangladesh. Our hybrid ViT-GRU model extracts essential features via ViT and identifies relationships between these features using GRU, addressing class imbalance and outperforming existing diagnostic methods. We extensively preprocessed the dataset, trained the model using various optimizers (SGD, Adam, AdamW), and evaluated it through rigorous 10-fold cross-validation. Additionally, we incorporated Explainable Artificial Intelligence (XAI) techniques (Attention Map, SHAP, and LIME) to enhance the interpretability of the model's predictions. For the primary dataset, BrTMHD-2023, the ViT-GRU model achieved precision, recall, and F1-score metrics of 97%. The highest accuracies obtained with the SGD, Adam, and AdamW optimizers were 81.66%, 96.56%, and 98.97%, respectively. Our model outperformed existing Transfer Learning models by 1.26%, as validated through comparative analysis and cross-validation. The proposed model also shows excellent performance on another brain tumor Kaggle dataset, outperforming existing research on that dataset with 96.08% accuracy. The proposed ViT-GRU framework significantly improves the detection and classification of brain tumors in MRI scans. The integration of XAI techniques enhances the model's transparency and reliability, fostering trust among clinicians and facilitating clinical application. Future work will expand the dataset and apply the findings to real-time diagnostic devices, advancing the field.
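In outline, a hybrid of this kind can be expressed in PyTorch as below. This is a schematic reconstruction, not the paper's architecture: a small patch-embedding transformer encoder stands in for the actual ViT, and all dimensions, depths, and the class count are assumptions.

```python
import torch
import torch.nn as nn

class ViTGRU(nn.Module):
    """Schematic ViT-GRU hybrid: a ViT-style encoder produces a patch-token
    sequence, and a GRU models relationships between the token features."""
    def __init__(self, img=224, patch=16, dim=256, heads=8, depth=4,
                 gru_hidden=128, n_classes=4):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_tokens = (img // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.gru = nn.GRU(dim, gru_hidden, batch_first=True)
        self.cls = nn.Linear(gru_hidden, n_classes)

    def forward(self, x):                                  # x: (B, 3, 224, 224)
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens + self.pos)           # ViT-style features
        _, h = self.gru(tokens)                            # GRU over tokens
        return self.cls(h[-1])                             # class logits
```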
Subjects
Brain Neoplasms, Magnetic Resonance Imaging, Humans, Bangladesh, Magnetic Resonance Imaging/methods, Brain Neoplasms/diagnostic imaging, Brain Neoplasms/classification, Brain Neoplasms/pathology, Artificial Intelligence, Algorithms, Computer-Assisted Image Interpretation/methods
ABSTRACT
In the present investigation, we devised a hyperspectral imaging (HSI) apparatus to assess the chemical characteristics and freshness of the yellow croaker (Larimichthys polyactis) throughout its storage period. This system operates within the shortwave infrared spectrum, ranging from 900 to 1700 nm. A variety of spectral pre-processing techniques, including standard normal variate (SNV), multiplicative scatter correction, and Savitzky-Golay (SG) derivatives, were employed to augment the predictive accuracy of total volatile basic nitrogen (TVB-N), which serves as a critical freshness parameter. Among the assessed methodologies, SG-1 pre-processing demonstrated superior predictive accuracy (Rp² = 0.8166). Furthermore, this investigation visualized freshness indicators as concentration images to elucidate the spatial distribution of TVB-N across the samples. These results indicate that HSI, in conjunction with chemometric analysis, constitutes an efficacious instrument for the surveillance of quality and safety in yellow croakers during their storage phase. Moreover, this methodology helps guarantee the freshness and safety of seafood products within the aquatic food sector.
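For reference, the two preprocessing families singled out here reduce to short NumPy/SciPy functions. The sketch below is generic chemometrics code, not the authors' pipeline; the window length and polynomial order are assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

def snv(spectra):
    """Standard normal variate: per-spectrum centering and scaling.
    spectra: (n_samples, n_wavelengths)."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, keepdims=True)
    return (spectra - mu) / sd

def sg_first_derivative(spectra, window=11, polyorder=2):
    """Savitzky-Golay first derivative (the SG-1 treatment), applied along
    the wavelength axis; window and polyorder are illustrative."""
    return savgol_filter(spectra, window, polyorder, deriv=1, axis=1)
```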
ABSTRACT
With the advancement of deep learning (DL) technology, DL-based intrusion detection models have emerged as a focal point of research within the domain of cybersecurity. This paper first provides an overview of the datasets widely utilized in this research, establishing a basis for future investigation and analysis. It then summarizes the prevalent data preprocessing methods and feature engineering techniques used in intrusion detection. Following this, it reviews seven deep learning-based intrusion detection models, namely deep autoencoders, deep belief networks, deep neural networks, convolutional neural networks, recurrent neural networks, generative adversarial networks, and transformers. Each model is examined from various dimensions, highlighting its unique architecture and applications within the context of cybersecurity. Furthermore, this paper broadens its scope to include intrusion detection techniques facilitated by two families of large-scale pretrained models: the BERT series and the GPT series. These models, leveraging the power of transformers and attention mechanisms, have demonstrated remarkable capabilities in understanding and processing sequential data. In light of these findings, the paper concludes with a prospective outlook on future research directions, identifying four key areas for further research. By addressing these issues and advancing research in these areas, this paper envisions a future in which DL-based intrusion detection systems are not only more accurate and efficient but also better aligned with the dynamic and evolving landscape of cybersecurity threats.
ABSTRACT
Liquid chromatography coupled to mass spectrometry (LC/MS) is a powerful tool for conducting accurate and reproducible investigations of numerous metabolites in natural products (NPs). LC/MS has gained prominence in metabolomic research due to its high throughput, the availability of multiple ionization techniques, and its ability to provide comprehensive metabolite coverage. This unique method can significantly influence various scientific domains. This review provides a thorough overview of the state of the art in LC/MS-based metabolomics for the investigation of NPs. It covers the principles of LC/MS and various aspects of LC/MS-based metabolomics, such as sample preparation, LC modes, method development, ionization techniques, and data pre-processing. Moreover, it presents the applications of LC/MS-based metabolomics in numerous fields of NP research, including biomarker discovery, agricultural research, food analysis, the study of marine NPs, and microbiological research. Additionally, this review discusses the challenges and limitations of LC/MS-based metabolomics, as well as emerging trends and developments in this field.
ABSTRACT
This study explored the quantitative inversion of the chlorophyll content in Paulownia seedling leaves under drought stress and analyzed the factors influencing the chlorophyll content from multiple perspectives to obtain the optimal model. Paulownia seedlings were selected as the experimental materials for potted water-control experiments. Four drought stress treatments were set up to obtain four types of Paulownia seedlings: one pair of top leaves (T1), two pairs of leaves (T2), three pairs of leaves (T3), and four pairs of leaves (T4). In total, 23 spectral transformations were selected, and the following four partial least squares (PLS) construction methods, together with vegetation index modeling, were adopted to construct the prediction model, select the best spectral preprocessing method, and explore the influence of water bands: partial least squares modeling with all spectral bands (all-band partial least squares, AB-PLS), principal component analysis partial least squares (PCA-PLS), correlation analysis partial least squares (CA-PLS), and correlation analysis (water band) partial least squares (CA(W)-PLS). Based on the prediction accuracy and the uniformity across different leaf positions, the optimal model was systematically explored. The analysis of spectral reflectance showed significant differences between leaf positions. The sensitive bands of chlorophyll were located near 550 nm, whereas the sensitive bands of water were located near 1440 and 1920 nm. The results of the vegetation index models indicate that the multiple-index models outperformed the single-index models, with accuracy decreasing as the number of indices decreased. We found that different model construction methods matched different optimal spectral preprocessing methods. First-derivative spectra (R') were the best preprocessing method for the AB-PLS, PCA-PLS, and CA-PLS models, whereas inverse log spectra (log(1/R)) were the best preprocessing method for the CA(W)-PLS model. Among the 14 indices, the green normalized difference vegetation index (GNDVI) was most correlated with the chlorophyll-sensitive indices, and the water index (WI) was most correlated with the water-sensitive indices. At the same time, the water bands affected the cross-validation accuracy: when characteristic bands were used for modeling, the cross-validation accuracy increased significantly, whereas when vegetation indices were used, the accuracy increased only slightly while predictive ability was reduced, so these changes could be ignored. We also found that leaf position affected the prediction accuracy, with the first pair of top leaves exhibiting the worst predictive ability; this was a bottleneck that limited predictive capability. Finally, we found that the CA(W)-PLS model was optimal. The model was based on 23 spectral transformations, four PLS construction methods, water bands, and different leaf positions to ensure systematicity, stability, and applicability.
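As an illustration of the modeling recipe (first-derivative preprocessing followed by all-band PLS with cross-validation), a minimal scikit-learn sketch might look like the following. The arrays are synthetic stand-ins, and the component count, derivative window, and fold settings are assumptions, not the study's values.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# X: (n_samples, n_bands) reflectance spectra; y: measured chlorophyll content.
X = np.random.rand(80, 500)
y = np.random.rand(80)

# R' preprocessing: Savitzky-Golay first derivative along the band axis.
X_d1 = savgol_filter(X, 11, 2, deriv=1, axis=1)

# AB-PLS-style model: PLS regression on all (preprocessed) spectral bands,
# assessed by cross-validation.
pls = PLSRegression(n_components=8)
r2_cv = cross_val_score(pls, X_d1, y, cv=5, scoring="r2")
print(f"cross-validated R2: {r2_cv.mean():.3f}")
```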
Subjects
Chlorophyll, Droughts, Plant Leaves, Chlorophyll/analysis, Chlorophyll/metabolism, Plant Leaves/chemistry, Plant Leaves/metabolism, Plant Leaves/physiology, Least-Squares Analysis, Physiological Stress/physiology, Water/chemistry, Water/metabolism, Principal Component Analysis, Spectral Analysis/methods
ABSTRACT
Sentiment analysis is an essential task that involves the extraction, identification, characterization, and classification of textual data to understand and categorize the attitudes and opinions expressed by individuals. While other languages have extensive datasets in this field, the number of sentiment analysis datasets in the Kurdish language is extremely limited, highlighting the necessity of building datasets for the language to advance its development. This paper presents a Twitter dataset comprising 24,668 tweets retained from an initial sample of 30,009 texts. Human annotators labelled the tweets based on subjectivity, sentiment, offensiveness, and target. After the initial annotation, an independent reviewer examined all labelled data to ensure the construction of a robust dataset. The cleaned dataset includes 8772 subjective tweets and 15,896 non-subjective tweets. Regarding sentiment, 12,938 were classified as negative, 3189 as neutral, and 8541 as positive. Moreover, 22,436 were non-offensive tweets, while 2232 were offensive. Additionally, the dataset distinguishes between targeted and non-targeted tweets, with 22,436 tweets not aimed at specific individuals or entities, and 2232 tweets directed towards particular targets. This dataset serves as an essential resource for scholars in the field to build state-of-the-art models for the Kurdish language.
ABSTRACT
Determining the geographical origin of kimchi holds significance because of the considerable variation in quality and price among kimchi products from different regions. This study explored the feasibility of employing Fourier transform near-infrared spectroscopy in conjunction with supervised chemometric techniques to differentiate domestic and imported kimchi products. A total of 30 domestic and 30 imported kimchi products were used to build the datasets. Three categories of preprocessing methods were used: scattering correction (multiplicative signal correction and standard normal variate), spectral derivatives (first and second derivatives), and data smoothing (Savitzky-Golay filtering and Norris derivative filtering). K-nearest neighbors, support vector machine, random forest, and partial least squares-discriminant analysis were employed as classifiers. With appropriate preprocessing of the spectral data, these four methods successfully distinguished between the two sample groups based on their origin. Notably, the k-nearest neighbors method exhibited exceptional performance, accurately classifying the sample groups irrespective of the preprocessing method employed and achieving this classification swiftly. In comparison, the classification and regression tree and naïve Bayes methods were outperformed by the aforementioned four classification techniques. Given its efficiency and accuracy, the k-nearest neighbors method is the most recommended chemometric tool for determining the geographical origin of kimchi.
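A minimal version of the winning recipe (scattering correction followed by k-nearest neighbors, evaluated by cross-validation) could be sketched as follows with scikit-learn; the data are synthetic stand-ins and the neighbor count is an assumption.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def snv(X):
    """Standard normal variate scattering correction, per spectrum."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# X: (60, n_wavenumbers) FT-NIR spectra; y: 0 = domestic, 1 = imported.
X, y = np.random.rand(60, 1000), np.repeat([0, 1], 30)   # stand-in data

clf = make_pipeline(FunctionTransformer(snv),
                    KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(clf, X, y, cv=5).mean())            # origin classification
```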
ABSTRACT
The study of the extensive datasets generated by spectrometers, commonly referred to as big data, plays a crucial role in extracting valuable information on mineral composition in various fields, such as chemistry, geology, archaeology, pharmacy, and anthropology. The analysis of these spectroscopic data requires the application of advanced statistical methods such as principal component analysis and cluster analysis. However, the sheer volume of data recorded by spectrometers makes it very difficult to obtain reliable results from raw data, so the usual approach is to apply various mathematical transformations to the raw data first. Here, we propose using the affine transformation to highlight the underlying features of each sample. Finally, an application to spectroscopic data collected from minerals and rocks, recorded by NASA's Jet Propulsion Laboratory, is presented. An illustrative example analyses three mineral samples that have different diageneses and parageneses and belong to different mineralogical groups.
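The paper's specific affine map is not reproduced here, but the general recipe (an affine transformation of each raw spectrum followed by a multivariate method such as principal component analysis) can be sketched as below. Per-sample min-max scaling is used as one plausible affine choice, and the data are stand-ins.

```python
import numpy as np
from sklearn.decomposition import PCA

def affine_minmax(X):
    """Per-sample affine map x -> (x - min) / (max - min): one plausible
    affine transformation for highlighting each spectrum's shape."""
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    return (X - lo) / (hi - lo)

spectra = np.random.rand(3, 2048)   # stand-in for three mineral spectra
scores = PCA(n_components=2).fit_transform(affine_minmax(spectra))
```

Because an affine map rescales and shifts each spectrum without distorting its shape, the subsequent PCA or cluster analysis compares samples on their underlying features rather than on acquisition-dependent offsets and gains.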
ABSTRACT
BACKGROUND: The optimization of patient care pathways is crucial for hospital managers in the context of scarce medical resources. Assuming unlimited capacities, the pathway of a patient would be governed only by pure medical logic so as to best meet the patient's needs. However, logistical limitations (eg, resources such as inpatient beds) are often associated with delayed treatments and may ultimately affect patient pathways. This is especially true for unscheduled patients, as when a patient in the emergency department needs to be admitted to another medical unit without disturbing the flow of planned hospitalizations. OBJECTIVE: In this study, we proposed a new framework to automatically detect activities in patient pathways that may be unrelated to patients' needs but rather induced by logistical limitations. METHODS: The scientific contribution lies in a method that transforms a database of historical pathways with bias into 2 databases: a labeled pathway database, where each activity is labeled as relevant (related to a patient's needs) or irrelevant (induced by logistical limitations), and a corrected pathway database, where each activity corresponds to the activity that would occur assuming unlimited resources. The labeling algorithm was assessed through medical expertise. In total, 2 case studies quantified the impact of our method of preprocessing health care data using process mining and discrete event simulation. RESULTS: Focusing on unscheduled patient pathways, we collected data covering 12 months of activity at the Groupe Hospitalier Bretagne Sud in France. Our algorithm had 87% accuracy and demonstrated its usefulness for preprocessing traces and obtaining a clean database. The 2 case studies showed the importance of our preprocessing step before any analysis. The process graphs of the processed data had, on average, 40% (SD 10%) fewer variants than those of the raw data. The simulation revealed that 30% of the medical units had a >1 bed difference in capacity between the processed and raw data. CONCLUSIONS: Patient pathway data reflect the actual activity of hospitals, which is governed by both medical requirements and logistical limitations. Before these data are used, such limitations should be identified and corrected. We anticipate that our approach can be generalized to obtain unbiased analyses of patient pathways for other hospitals.