Results 1 - 20 of 341
1.
Prog Brain Res ; 287: 1-24, 2024.
Article in English | MEDLINE | ID: mdl-39097349

ABSTRACT

In a recent study employing time production, a number of participants presented aberrant data, which normally would have marked them as being outliers. Given the ongoing discussion in the literature regarding the illusory nature of the flow of time, in this paper we consider whether their data may indicate discontinuity in time perception. We analyze the log-log plots for these outliers, investigating to what degree linearity is preserved for all the data points, as opposed to achieving a better fit using bisegmental regression. The current results, though preliminary, can contribute to the debate regarding the non-linearity of subjective time. It would seem that with longer target durations, the ongoing experience of time can be either one of a subjective slowing down of time (longer time units, increase in slope), or of a subjective speeding up of time (shorter time units, decrease in slope).


Subjects
Psychophysics , Time Perception , Humans , Time Perception/physiology , Time Factors
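
The comparison described in the first abstract, a single straight line versus a bisegmental fit on a log-log plot, can be sketched as follows. This is only a minimal illustration under stated assumptions: the data are hypothetical, the helper names `fit_line` and `best_two_segment_fit` are invented for this example, and the two segments are fitted independently (they are not forced to join at the breakpoint), which may differ from the procedure the authors used.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def best_two_segment_fit(xs, ys, min_points=3):
    """Try every split of the x-sorted data into two runs; keep the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for k in range(min_points, len(sx) - min_points + 1):
        left = fit_line(sx[:k], sy[:k])
        right = fit_line(sx[k:], sy[k:])
        total = left[2] + right[2]
        if best is None or total < best["sse"]:
            best = {"split_index": k, "left_slope": left[0],
                    "right_slope": right[0], "sse": total}
    return best


# Hypothetical target durations (seconds); produced durations follow slope 1
# up to 8 s and slope 1.5 beyond it on the log-log scale.
targets = [1, 2, 4, 8, 16, 32, 64, 128]
log_target = [math.log10(t) for t in targets]
brk = math.log10(8)
log_produced = [lt if lt <= brk else brk + 1.5 * (lt - brk) for lt in log_target]

single = fit_line(log_target, log_produced)
two = best_two_segment_fit(log_target, log_produced)
print("single-line SSE:", single[2])
print("two-segment fit:", two)
```

In this toy case the second segment has the steeper slope, which in the abstract's terms would correspond to longer subjective time units at longer target durations.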
2.
Sci Rep ; 14(1): 18542, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39122861

ABSTRACT

In the mechanical cutting industry, trial production is used for predicting and evaluating the quality of product processes before batch production, and it can be expressed through the qualification rate. However, it cannot objectively and comprehensively evaluate the quality of product processes. This study optimizes the analysis of outliers and stability in mathematical statistics to better apply it in the mechanical cutting industry; then, it combines them with process capability analysis. Simultaneously, considering the non-normal distribution of process parameters, a batch production-prediction model is proposed. The reliability of the batch production-prediction model is verified by the diameter, roundness and roughness of structural common samples. Meanwhile, for other mechanical parts in the mechanical cutting industry, the model proposed in this paper can be used to quickly and accurately predict and evaluate batch production.

4.
Behav Res Methods ; 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39048860

ABSTRACT

When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features to the data such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failure to collectively address these features can result in statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167-90, 2004a, Journal of Econometrics, 128(2), 301-23, 2004c, 2004b; Skrondal and Rabe-Hesketh 2004) in handling data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78-103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate a latent intelligibility from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges; its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, results indicated the superiority of the model in predicting empirical phenomena over the normal LMM, and its ability to quantify a latent potential intelligibility. Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility. 
Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting the empirical phenomena.

5.
Sci Rep ; 14(1): 17599, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39080303

ABSTRACT

Linear regression is critical for data modelling, especially for scientists. Nevertheless, with the abundance of high-dimensional data, some datasets have more explanatory variables than observations. In such circumstances, traditional approaches fail. This paper proposes a modified sparse regression model that solves the problem of heterogeneity using seaweed big data as a use case. The modified heterogeneity models for ridge, LASSO and Elastic net were used to model the data. Robust estimations M Bi-Square, M Hampel, M Huber, MM and S were used. Based on the results, the hybrid model of sparse regression for before, after, and modified heterogeneity robust regression with the 45 high ranking variables and a 2-sigma limit can be used efficiently and effectively to reduce the outliers. The obtained results confirm that the hybrid model of the modified sparse LASSO with the M Bi-Square estimator for the 45 high ranking parameters performed better compared with other existing methods.

6.
Stat Med ; 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38899515

ABSTRACT

Meta-analysis is an essential tool to comprehensively synthesize and quantitatively evaluate results of multiple clinical studies in evidence-based medicine. In many meta-analyses, the characteristics of some studies might markedly differ from those of the others, and these outlying studies can generate biases and potentially yield misleading results. In this article, we provide effective robust statistical inference methods using generalized likelihoods based on the density power divergence. The robust inference methods are designed to adjust the influences of outliers through the use of modified estimating equations based on a robust criterion, even when multiple and serious influential outliers are present. We provide the robust estimators, statistical tests, and confidence intervals via the generalized likelihoods for the fixed-effect and random-effects models of meta-analysis. We also assess the contribution rates of individual studies to the robust overall estimators that indicate how the influences of outlying studies are adjusted. Through simulations and applications to two recently published systematic reviews, we demonstrate that the overall conclusions and interpretations of meta-analyses can be markedly changed if the robust inference methods are applied and that relying only on the conventional inference methods might produce misleading evidence. We recommend using these methods at least as a sensitivity analysis in the practice of meta-analysis. We have also developed an R package, robustmeta, that implements the robust inference methods.
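
To make the idea of a density-power-divergence estimating equation concrete, here is a minimal sketch for the fixed-effect model only, assuming each study estimate is normally distributed around a common effect with a known standard error. The function name `dpd_fixed_effect`, the tuning value `alpha=0.5`, and the data are assumptions made for illustration; this does not reproduce the authors' generalized likelihoods, tests, or confidence intervals, and it is not the API of the robustmeta package.

```python
import math


def dpd_fixed_effect(estimates, std_errors, alpha=0.5, tol=1e-10, max_iter=500):
    """Fixed-point iteration for a density-power-divergence style pooled effect.

    Each study gets weight se^-(alpha+2) * exp(-alpha * r^2 / (2 se^2)),
    where r is its residual from the current pooled value. With alpha = 0
    the weights reduce to the usual inverse-variance weights.
    """
    mu = sorted(estimates)[len(estimates) // 2]  # robust starting value
    for _ in range(max_iter):
        weights = [
            se ** (-(alpha + 2.0)) * math.exp(-alpha * (y - mu) ** 2 / (2.0 * se ** 2))
            for y, se in zip(estimates, std_errors)
        ]
        new_mu = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical study estimates; the last one is an outlying study.
effects = [0.10, 0.12, 0.08, 0.11, 0.09, 1.50]
ses = [0.05] * 6

conventional = dpd_fixed_effect(effects, ses, alpha=0.0)  # inverse-variance mean
robust = dpd_fixed_effect(effects, ses, alpha=0.5)
print("conventional:", conventional)
print("robust:", robust)
```

Running both values of `alpha` side by side mirrors the sensitivity-analysis use that the abstract recommends: a large gap between the two pooled estimates signals that outlying studies are driving the conventional result.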

7.
Planta ; 260(1): 32, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38896307

ABSTRACT

MAIN CONCLUSION: By studying Cistus albidus shrubs in their natural habitat, we show that biological outliers can help us to understand the causes and consequences of maximum photochemical efficiency decreases in plants, thus reinforcing the importance of integrating these often-neglected data into scientific practice. Outliers are individuals with exceptional traits that are often excluded from data analysis. However, this may result in serious mistakes, such as failing to capture the true trajectory of the population, thereby limiting our understanding of a given biological process. Here, we studied the role of biological outliers in understanding the causes and consequences of maximum photochemical efficiency decreases in plants, using the semi-deciduous shrub C. albidus growing in a Mediterranean-type ecosystem. We assessed interindividual variability in winter, spring and summer maximum PSII photochemical efficiency in a population of C. albidus growing under Mediterranean conditions. A strong correlation was observed between maximum PSII photochemical efficiency (Fv/Fm ratio) and leaf water desiccation. While decreases in maximum PSII photochemical efficiency did not result in any damage at the organ level during winter, reductions in the Fv/Fm ratio were associated with leaf mortality during summer. However, all plants could recover after rainfalls, thus maximum PSII photochemical efficiency decreases did not result in an increased mortality at the organism level, despite extreme water deficit and temperatures exceeding 40 °C during the summer. We conclude that, once methodological outliers are excluded, biological outliers not only must not be excluded from data analysis, but focusing on them is crucial to understand the causes and consequences of maximum PSII photochemical efficiency decreases in plants.


Subjects
Cistus , Photosystem II Protein Complex , Plant Leaves , Seasons , Photosystem II Protein Complex/metabolism , Plant Leaves/physiology , Plant Leaves/metabolism , Cistus/physiology , Photosynthesis , Ecosystem , Water , Temperature , Chlorophyll/metabolism
8.
Sci Rep ; 14(1): 13529, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38866829

ABSTRACT

In real-life situations, we have to analyze data that contain atypical observations, and the presence of outliers has adverse effects on the performance of ordinary least squares estimates. In this situation, redescending M-estimators, proposed by Huber (1964), are used to tackle the effects of outliers to increase the efficiency of least squares estimates. In this study, we introduce a redescending M-estimator designed to generate robust estimates by mitigating the influence of outlier observations, even when the tuning constant is set to low values. This innovative estimator exhibits enhanced linearity at its core and maintains continuity throughout its range. Our proposed estimator stands out for its novelty, simplicity, differentiability, and practical applicability across real-world scenarios. The results of the proposed redescending M-estimator are compared with existing robust estimators using an extensive simulation study. Two examples based on real-life data are also added to validate the performance of the suggested function. The formulated redescending M-estimator produced efficient results as compared to all the considered redescending M-estimators.
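
For readers unfamiliar with redescending M-estimators, the sketch below uses Tukey's bisquare function, a well-known existing redescending choice, to estimate a location parameter by iteratively reweighted averaging. It is not the estimator proposed in this abstract, whose functional form is not given here. The function names, the data, and the fixed MAD-based scale are assumptions for illustration; 4.685 is the commonly quoted bisquare tuning constant.

```python
import statistics


def bisquare_weight(u, c=4.685):
    """Tukey bisquare weight: smooth down-weighting that is exactly 0 for |u| >= c."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


def bisquare_location(values, c=4.685, tol=1e-9, max_iter=100):
    """Iteratively reweighted mean with a fixed scale (1.4826 * MAD)."""
    mu = statistics.median(values)
    scale = 1.4826 * statistics.median([abs(v - mu) for v in values])
    if scale == 0:
        return mu  # more than half the values are identical
    for _ in range(max_iter):
        weights = [bisquare_weight((v - mu) / scale, c) for v in values]
        total = sum(weights)
        if total == 0:
            return mu  # every point rejected; keep the last estimate
        new_mu = sum(w * v for w, v in zip(weights, values)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical sample with one gross outlier.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 55.0]
plain_mean = statistics.mean(sample)
robust_center = bisquare_location(sample)
print("mean:", plain_mean)
print("bisquare location:", robust_center)
```

Because the weight falls to exactly zero beyond the cutoff, the gross outlier has no influence on the final estimate, which is the defining property of a redescending function. Lowering `c` rejects more points at the cost of efficiency on clean data.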

9.
ISA Trans ; 151: 164-173, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38811310

ABSTRACT

The existence of dynamic outliers poses a significant challenge to the Kalman filter (KF). In addressing this challenge, this paper presents an innovative solution: Firstly, by analyzing a period of measurement information to more accurately identify state and measurement dynamic outliers, the system's capacity to adapt to dynamic changes is significantly improved. Next, noise is modeled as a Gaussian-Student's t mixture distribution (GSTM), with mixed model parameters inferred using the variational Bayesian (VB) method based on measurement information, cleverly integrated into the Moving Horizon Estimation (MHE) framework, thus enhancing the flexibility and accuracy of the noise model. Lastly, the optimal window size was identified through simulation experiment analysis to further increase the estimation accuracy. Simulation results demonstrate that the proposed filter exhibits stronger robustness in resisting dynamic outliers compared to existing filters.

10.
Biomedicines ; 12(5), 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38790902

ABSTRACT

Angiotensin-converting enzyme (ACE) metabolizes a number of important peptides participating in blood pressure regulation and vascular remodeling. Elevated ACE expression in tissues (which is generally reflected by blood ACE levels) is associated with an increased risk of cardiovascular diseases. Elevated blood ACE is also a marker for granulomatous diseases. Decreased blood ACE activity is becoming a new risk factor for Alzheimer's disease. We applied our novel approach, ACE phenotyping, to characterize pairs of tissues (lung, heart, lymph nodes) and serum ACE in 50 patients. ACE phenotyping includes (1) measurement of ACE activity with two substrates (ZPHL and HHL); (2) calculation of the ratio of hydrolysis of these substrates (ZPHL/HHL ratio); (3) determination of ACE immunoreactive protein levels using mAbs to ACE; and (4) ACE conformation with a set of mAbs to ACE. The ACE phenotyping approach in screening format with special attention to outliers, combined with analysis of sequencing data, allowed us to identify a patient with a unique ACE phenotype related to a decreased ability of albumin to inhibit ACE activity, likely due to competition with high CCL18 in this patient for binding to ACE. We also confirmed recently discovered gender differences in sialylation of some glycosylation sites of ACE. ACE phenotyping is a promising new approach for the identification of ACE phenotype outliers with potential clinical significance, making it useful for screening in a personalized medicine approach.

11.
Eur J Intern Med ; 2024 May 11.
Article in English | MEDLINE | ID: mdl-38735801

ABSTRACT

BACKGROUND: The burden of acute complex patients, increasingly older and polypathological, accessing Emergency Departments (ED) leads to hospital overcrowding and the outlying phenomenon. These issues highlight the need for new, adequate patient management strategies. The aim of this study is to analyse the effects on in-hospital patient flow and clinical outcomes of a high-technology and time-limited Medical Admission Unit (MAU) run by internists. METHODS: All consecutive patients admitted to MAU from Dec-2017 to Nov-2019 were included in the study. The number of admissions from ED and the hospitalization rate, the overall in-hospital mortality rate in the medical department, the total days of hospitalization and the overall outlier bed days were compared to those from the previous two years. RESULTS: 2162 patients were admitted in MAU, 2085(95.6%) from ED, 476(22.0%) were directly discharged, 88(4.1%) died and 1598(73.9%) were transferred to other wards, with a median in-MAU time of stay of 64.5 [0.2-344.2] hours. Compared with the previous 24 months, despite the increase in admissions/year from ED in the medical department (3842 ± 106 in Dec2015-Nov2017 vs 4062 ± 100 in Dec2017-Nov2019, p<0.001), the number of outlier bed days has been reduced, especially in the surgical department (11.46 ± 6.25% in Dec2015-Nov2017 vs 6.39 ± 3.08% in Dec2017-Nov2019, p=0.001), and mortality in the medical area has dropped from 8.74 ± 0.37% to 7.29 ± 0.57%, p<0.001. CONCLUSIONS: Over two years, a patient-centred and problem-oriented approach in a medical admission buffer unit run by internists has ensured a constant flow of acute patients with positive effects on clinical risk and quality of care, reducing medical outliers and in-hospital mortality.

12.
Behav Res Methods ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38811517

ABSTRACT

A methodological problem in most reaction time (RT) studies is that some measured RTs may be outliers; that is, they may be very fast or very slow for reasons unconnected to the task-related processing of interest. Numerous ad hoc methods have been suggested to discriminate between such outliers and the valid RTs of interest, but it is extremely difficult to determine how well these methods work in practice because virtually nothing is known about the actual characteristics of outliers in real RT datasets. This article proposes a new method of pooling cumulative distribution function values for examining empirical RT distributions to assess both the proportions of outliers and their latencies relative to those of the valid RTs. As the method is developed, its strengths and weaknesses are examined using simulations based on previously suggested ad hoc models for RT outliers with particular assumed proportions and distributions of valid RTs and outliers. The method is then applied to several large RT datasets from lexical decision tasks, and the results provide the first empirically based description of outlier RTs. For these datasets, fewer than 1% of the RTs seem to be outliers, and the median outlier latency appears to be approximately 4-6 standard deviations of RT above the mean of the valid RT distribution.

13.
BMC Med Res Methodol ; 24(1): 89, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622516

ABSTRACT

BACKGROUND: Outliers, data points that significantly deviate from the norm, can have a substantial impact on statistical inference and provide valuable insights in data analysis. Multiple methods have been developed for outlier detection; however, almost all available approaches fail to consider the spatial dependence and heterogeneity in spatial data. Spatial data has diverse formats and semantics, requiring specialized outlier detection methodology to handle these unique properties. To date, limited research exists on robust spatial outlier detection methods designed specifically under the spatial error model (SEM) structure. METHOD: We propose the Spatial-Θ-Iterative Procedure for Outlier Detection (Spatial-Θ-IPOD), which utilizes a mean-shift vector to identify outliers within the SEM. Our method enables an effective detection of spatial outliers while also providing robust coefficient estimates. To assess the performance of our approach, we conducted extensive simulations and applied it to a real-world empirical study using life expectancy data from multiple countries. RESULTS: Simulation results showed that the masking and JD (Joint Detection) indicators of our Spatial-Θ-IPOD method outperformed several commonly used methods, even in high-dimensional scenarios, demonstrating stable performance. Conversely, the Θ-IPOD method proved to be ineffective in detecting outliers when spatial correlation was present. Moreover, our model successfully provided reliable coefficient estimation alongside outlier detection. The proposed method consistently outperformed other models (both robust and non-robust) in most cases. In the empirical study, our proposed model successfully detected outliers and provided valuable insights in the modeling process. CONCLUSIONS: Our proposed Spatial-Θ-IPOD offers an effective solution for detecting spatial outliers for SEM while providing robust coefficient estimates.
Notably, our approach showcases its relative superiority even in the presence of high leverage points. By successfully identifying outliers, our method enhances the overall understanding of the data and provides valuable insights for further analysis.

14.
Sensors (Basel) ; 24(8), 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38676027

ABSTRACT

The variety of equipment implementing laser triangulation technology for 3D scanning makes it difficult to analyse their performance, comparability, and traceability. In this study, three laser triangulation sensors arranged in different configurations are analysed using high precision spheres made of different materials and surface finishes. Three types of reference parameters were used: diameter, form error, and standard deviation of the point cloud. The experimentation was based on studying the quality of the point clouds generated by the three sensors, which enabled us to find and quantify an edge effect in the horizon of the scanned surface. A procedure to reach the optimal filtering conditions was proposed, and a chart of recommended usage of each sphere (material and finish) was created for the different types of sensors. This filter enables removal of both spurious points and those few points that spoil the form error, greatly improving the quality of the measurement.

15.
J Peripher Nerv Syst ; 29(2): 202-212, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38581130

ABSTRACT

BACKGROUND: Caused by duplications of the gene encoding peripheral myelin protein 22 (PMP22), Charcot-Marie-Tooth disease type 1A (CMT1A) is the most common hereditary neuropathy. Despite this shared genetic origin, there is considerable variability in clinical severity. It is hypothesized that genetic modifiers contribute to this heterogeneity, the identification of which may reveal novel therapeutic targets. In this study, we present a comprehensive analysis of clinical examination results from 1564 CMT1A patients sourced from a prospective natural history study conducted by the RDCRN-INC (Inherited Neuropathy Consortium). Our primary objective is to delineate extreme phenotype profiles (mild and severe) within this patient cohort, thereby enhancing our ability to detect genetic modifiers with large effects. METHODS: We have conducted large-scale statistical analyses of the RDCRN-INC database to characterize CMT1A severity across multiple metrics. RESULTS: We defined patients below the 10th (mild) and above the 90th (severe) percentiles of age-normalized disease severity based on the CMT Examination Score V2 and foot dorsiflexion strength (MRC scale). Based on extreme phenotype categories, we defined a statistically justified recruitment strategy, which we propose to use in future modifier studies. INTERPRETATION: Leveraging whole genome sequencing with base pair resolution, a future genetic modifier evaluation will include single nucleotide association, gene burden tests, and structural variant analysis. The present work not only provides insight into the severity and course of CMT1A, but also elucidates the statistical foundation and practical considerations for a cost-efficient and straightforward patient enrollment strategy that we intend to conduct on additional patients recruited globally.


Subjects
Charcot-Marie-Tooth Disease , Charcot-Marie-Tooth Disease/genetics , Charcot-Marie-Tooth Disease/physiopathology , Humans , Adult , Male , Female , Middle Aged , Adolescent , Young Adult , Severity of Illness Index , Child , Myelin Proteins/genetics , Patient Selection , Phenotype , Aged , Genes, Modifier , Child, Preschool
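
The percentile rule in the CMT1A abstract (mild below the 10th percentile, severe above the 90th) can be sketched with the standard library. The scores below are hypothetical and are assumed to be already age-normalized; the abstract does not describe the normalization itself, and the function name `label_extremes` is invented for this example.

```python
import statistics


def label_extremes(scores):
    """Label each score as mild (< 10th percentile), severe (> 90th), or intermediate."""
    deciles = statistics.quantiles(scores, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    labels = []
    for score in scores:
        if score < p10:
            labels.append("mild")
        elif score > p90:
            labels.append("severe")
        else:
            labels.append("intermediate")
    return labels


# Hypothetical age-normalized severity scores for 20 patients.
severity = list(range(1, 21))
labels = label_extremes(severity)
print(list(zip(severity, labels)))
```

Note that the exact cut points depend on the quantile interpolation method, so a real analysis should state which definition of percentile it uses.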
16.
Heliyon ; 10(8): e28934, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38681655

ABSTRACT

Various authors have put sincere effort into proposing ratio estimators for estimating the population mean and variance under different situations and sampling methods. But a problem arises when data are unstable, imprecise, ambiguous, incomplete and vague. In such situations, classical methods of estimation do not yield precise results, as these methods are not meant for such problems. Given this difficulty, Neutrosophic statistics is the only alternative, as it deals with indeterminacy. So in this study, we have proposed a generalized Neutrosophic robust ratio-type estimator which can be used to provide good results in such situations, as well as in the presence of outliers. For evaluation, we have used a real data set and a simulation study to check the efficacy of our suggested estimators over the existing estimators mentioned.

17.
Behav Res Methods ; 56(4): 4162-4172, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38528245

ABSTRACT

Beyond the challenge of keeping up to date with current best practices regarding the diagnosis and treatment of outliers, an additional difficulty arises concerning the mathematical implementation of the recommended methods. Here, we provide an overview of current recommendations and best practices and demonstrate how they can easily and conveniently be implemented in the R statistical computing software, using the {performance} package of the easystats ecosystem. We cover univariate, multivariate, and model-based statistical outlier detection methods, their recommended threshold, standard output, and plotting methods. We conclude by reviewing the different theoretical types of outliers, whether to exclude or winsorize them, and the importance of transparency. A preprint of this paper is available at: 10.31234/osf.io/bu6nt.


Subjects
Models, Statistical , Software , Humans , Data Interpretation, Statistical
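
The paper above demonstrates its methods in R with the {performance} package; that API is not reproduced here. As a language-neutral illustration of one univariate approach of the kind it covers, the following Python sketch flags values whose robust z-score (distance from the median in units of 1.4826 × MAD) exceeds a cutoff. The cutoff of 3.29, roughly the two-sided 0.1% normal quantile, the function name, and the data are assumptions for this example.

```python
import statistics


def robust_z_outliers(values, threshold=3.29):
    """Return one boolean per value: True when |x - median| / (1.4826 * MAD) > threshold."""
    center = statistics.median(values)
    mad = 1.4826 * statistics.median([abs(v - center) for v in values])
    if mad == 0:
        # Scale is undefined when more than half the values are identical.
        return [False] * len(values)
    return [abs(v - center) / mad > threshold for v in values]


# Hypothetical measurements with one extreme value.
measurements = [2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 9.5]
flags = robust_z_outliers(measurements)
print(list(zip(measurements, flags)))
```

Flagging is only the detection step. Whether a flagged value is then excluded, winsorized, or kept is a separate decision that, as the abstract stresses, should be reported transparently.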
18.
Mol Oncol ; 18(6): 1460-1485, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38468448

ABSTRACT

Multiple strategies are continuously being explored to expand the drug target repertoire in solid tumors. We devised a novel computational workflow for transcriptome-wide gene expression outlier analysis that allows the systematic identification of both overexpression and underexpression events in cancer cells. Here, it was applied to expression values obtained through RNA sequencing in 226 colorectal cancer (CRC) cell lines that were also characterized by whole-exome sequencing and microarray-based DNA methylation profiling. We found cell models displaying an abnormally high or low expression level for 3533 and 965 genes, respectively. Gene expression abnormalities that have been previously associated with clinically relevant features of CRC cell lines were confirmed. Moreover, by integrating multi-omics data, we identified both genetic and epigenetic alterations underlying outlier expression values. Importantly, our atlas of CRC gene expression outliers can guide the discovery of novel drug targets and biomarkers. As a proof of concept, we found that CRC cell lines lacking expression of the MTAP gene are sensitive to treatment with a PRMT5-MTA inhibitor (MRTX1719). Finally, other tumor types may also benefit from this approach.


Subjects
Colorectal Neoplasms , Gene Expression Regulation, Neoplastic , Transcriptome , Humans , Colorectal Neoplasms/genetics , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/pathology , Colorectal Neoplasms/metabolism , Gene Expression Regulation, Neoplastic/drug effects , Cell Line, Tumor , Transcriptome/genetics , Gene Expression Profiling , DNA Methylation/genetics
19.
Sci Total Environ ; 927: 171950, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38537822

ABSTRACT

Information on sedimentary microplastics and phthalates has been restricted to the coastal regions of the Persian Gulf and the Gulf of Makran. Our basin-wide study monitored their levels, spatial behaviors, and potential risks using GIS-based techniques. Microplastics and phthalates ranged from 5 to 75 particles/kg d.w and 0.004-1.219 µg g⁻¹ d.w, respectively. Microplastics were in the size category of 100 µm to 3 mm, and black microfibers (< 1 mm) and high-density polymers were dominant. The total number of microplastics was between 356.333 × 10¹² and 469.075 × 10¹² particles in the surface sediments of the studied regions (confidence interval = 99 %). Diethylhexyl phthalate (DEHP) and Di-isobutyl phthalate contributed 88 % of detected phthalates. Significant correlations among microplastic abundance, total phthalates, and DEHP were distinguished (p < 0.05). Overall, the findings reiterated the widespread presence of microplastics and a potential link between phthalates and microplastics. Semi-variogram, cluster Voronoi polygons, and Trend analysis identified spatial outliers and major deposition sites of microplastics and phthalates and consequently outlined the localities where upcoming studies should be concentrated. A hotspot of potential risks was marked using Fuzzy logic and GIS-based algorithms in the Sea of Makran, covering an area equal to 342.99 km².

20.
Water Res ; 255: 121499, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38552494

ABSTRACT

Recently, there has been a significant advancement in water quality index (WQI) models utilizing data-driven approaches, especially those integrating machine learning and artificial intelligence (ML/AI) technology. However, several recent studies have revealed that data-driven models produce inconsistent results due to data outliers, which significantly impact model reliability and accuracy. The present study was carried out to assess the impact of data outliers on a recently developed Irish Water Quality Index (IEWQI) model, which relies on data-driven techniques. To the author's best knowledge, there has been no systematic framework for evaluating the influence of data outliers on such models. This was the first research initiative to introduce a comprehensive approach that combines machine learning with advanced statistical techniques for assessing the impact of data outliers on the water quality (WQ) model. The proposed framework was implemented in Cork Harbour, Ireland, to evaluate the IEWQI model's sensitivity to outliers in the input indicators used to assess water quality. To detect data outliers within the dataset, the study utilized two widely used ML techniques, Isolation Forest (IF) and Kernel Density Estimation (KDE), and then predicted WQ with and without these outliers. For validating the ML results, the study used five commonly used statistical measures. The performance metric (R²) indicates that the model performance improved slightly (R² increased from 0.92 to 0.95) in predicting WQ after removing the data outliers from the input. But the IEWQI scores revealed that there were no statistically significant differences among the actual values, predictions with outliers, and predictions without outliers, with a 95 % confidence interval at p < 0.05.
The results of model uncertainty also revealed that the model contributed <1 % uncertainty to the final assessment results for both datasets (with and without outliers). In addition, all statistical measures indicated that the ML techniques provided reliable results that can be utilized for detecting outliers and their impacts on the IEWQI model. The findings of the research reveal that although the data outliers had no significant impact on the IEWQI model architecture, they had moderate impacts on the rating schemes of the model. This finding indicated that detecting the data outliers could improve the accuracy of the IEWQI model in rating WQ as well as be helpful in mitigating the model eclipsing problem. In addition, the results of the research provide evidence of how the data outliers influenced the data-driven model in predicting WQ and reliability, particularly since the study confirmed that the IEWQI model could be effective for accurately rating WQ despite the presence of the data outliers in the input. This could occur due to the spatio-temporal variability inherent in WQ indicators. However, the research assesses the influence of data input outliers on the IEWQI model and underscores important areas for future investigation. These areas include expanding temporal analysis using multi-year data, examining spatial outlier patterns, and evaluating detection methods. Moreover, it is essential to explore the real-world impacts of revised rating categories, involve stakeholders in outlier management, and fine-tune model parameters. Analysing model performance across varying temporal and spatial resolutions and incorporating additional environmental data can significantly enhance the accuracy of WQ assessment. Consequently, this study offers valuable insights to strengthen the IEWQI model's robustness and provides avenues for enhancing its utility in broader WQ assessment applications.
Moreover, the study successfully adopted the framework for evaluating how data input outliers affect a data-driven model such as the IEWQI model. The current study has been carried out in Cork Harbour for only a single year of WQ data. The framework should be tested across various domains for evaluating the response of the IEWQI model in terms of the spatio-temporal resolution of the domain. Nevertheless, the study recommended that future research should be conducted to adjust or revise the IEWQI model's rating schemes and investigate the practical effects of data outliers on updated rating categories. However, the study provides potential recommendations for enhancing the IEWQI model's adaptability and reveals its effectiveness in expanding its applicability in more general WQ assessment scenarios.
