1 - 20 of 8,584
1.
Comput Methods Programs Biomed ; 252: 108234, 2024 Jul.
Article En | MEDLINE | ID: mdl-38823206

BACKGROUND AND OBJECTIVE: Patient-specific 3D computational fluid dynamics (CFD) models are increasingly being used to understand and predict transarterial radioembolization procedures used for hepatocellular carcinoma treatment. While sensitivity analyses of these CFD models can help to determine the most impactful input parameters, such analyses are computationally costly. Therefore, we aim to use surrogate modelling to allow relatively cheap sensitivity analysis. As an example, we compute Sobol's sensitivity indices for three input waveform shape parameters. METHODS: We extracted three characteristic shape parameters from our input mass flow rate waveform (peak systolic mass flow rate, heart rate, systolic duration) and defined our 3D input parameter space by varying these parameters within 75 %-125 % of their nominal values. To fit our surrogate model with a minimal number of costly CFD simulations, we developed an adaptive design of experiments (ADOE) algorithm. The ADOE uses 100 Latin hypercube sampled points in 3D input space to define the initial design of experiments (DOE). Subsequently, we re-sample input space with 10,000 Latin Hypercube sampled points and cheaply estimate the outputs using the surrogate model. In each of 27 equivolume bins which divide our input space, we determine the most uncertain prediction of the 10,000 points, compute the true outputs using CFD, and add these points to the DOE. For each ADOE iteration, we calculate Sobol's sensitivity indices, and we continue to add batches of 27 samples to the DOE until the Sobol indices have stabilized. RESULTS: We tested our ADOE algorithm on the Ishigami function and showed that we can reliably obtain Sobol's indices with an absolute error <0.1. Applying ADOE to our waveform sensitivity problem, we found that the first-order sensitivity indices were 0.0550, 0.0191 and 0.407 for the peak systolic mass flow rate, heart rate, and the systolic duration, respectively. 
CONCLUSIONS: Although the current study was an illustrative case, the ADOE allows reliable sensitivity analysis with a limited number of complex model evaluations, and performs well even when the optimal DOE size is a priori unknown. This enables us to identify the highest-impact input parameters of our model, and other novel, costly models in the future.


Algorithms , Carcinoma, Hepatocellular , Embolization, Therapeutic , Liver Neoplasms , Humans , Liver Neoplasms/radiotherapy , Carcinoma, Hepatocellular/radiotherapy , Embolization, Therapeutic/methods , Normal Distribution , Liver , Computer Simulation , Hydrodynamics , Regression Analysis , Imaging, Three-Dimensional
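The Ishigami benchmark mentioned in the abstract above is a convenient way to see what a first-order Sobol index measures. The sketch below is not the authors' ADOE or surrogate model: it is a plain Monte Carlo (Saltelli-style) estimator evaluated directly on the function. The coefficients a = 7 and b = 0.1 are the commonly used values and are an assumption, since the abstract does not state them; all function names are hypothetical.

```python
import math
import random


def ishigami(x1, x2, x3, a=7.0, b=0.1):
    # Ishigami test function; a and b are assumed (commonly used) values.
    return math.sin(x1) + a * math.sin(x2) ** 2 + b * x3 ** 4 * math.sin(x1)


def first_order_sobol(f, dim, n, rng):
    # Two independent sample matrices A and B, uniform on [-pi, pi]^dim.
    A = [[rng.uniform(-math.pi, math.pi) for _ in range(dim)] for _ in range(n)]
    B = [[rng.uniform(-math.pi, math.pi) for _ in range(dim)] for _ in range(n)]
    fA = [f(*row) for row in A]
    fB = [f(*row) for row in B]
    both = fA + fB
    mean = sum(both) / len(both)
    var = sum((y - mean) ** 2 for y in both) / (len(both) - 1)
    indices = []
    for i in range(dim):
        acc = 0.0
        for j in range(n):
            # Row of A with only column i replaced by the value from B.
            row = list(A[j])
            row[i] = B[j][i]
            acc += fB[j] * (f(*row) - fA[j])
        indices.append(acc / n / var)
    return indices


def analytic_ishigami(a=7.0, b=0.1):
    # Closed-form first-order indices of the Ishigami function.
    v1 = 0.5 * (1 + b * math.pi ** 4 / 5) ** 2
    v2 = a ** 2 / 8
    total = a ** 2 / 8 + b * math.pi ** 4 / 5 + b ** 2 * math.pi ** 8 / 18 + 0.5
    return [v1 / total, v2 / total, 0.0]


estimated = first_order_sobol(ishigami, 3, 20000, random.Random(0))
exact = analytic_ishigami()
for name, est, ref in zip(("x1", "x2", "x3"), estimated, exact):
    print(f"S_{name}: estimated {est:.3f}, analytic {ref:.3f}")
```

This direct estimator needs n * (dim + 2) function evaluations, which is the cost a surrogate model is meant to avoid when each evaluation is a CFD run.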
2.
PLoS One ; 19(5): e0301259, 2024.
Article En | MEDLINE | ID: mdl-38709733

Bayesian Control charts are emerging as the most efficient statistical tools for monitoring manufacturing processes and providing effective control over process variability. The Bayesian approach is particularly suitable for addressing parametric uncertainty in the manufacturing industry. In this study, we determine the monitoring threshold for the shape parameter of the Inverse Gaussian distribution (IGD) and design different exponentially-weighted-moving-average (EWMA) control charts based on different loss functions (LFs). The impact of hyperparameters is investigated on Bayes estimates (BEs) and posterior risks (PRs). The performance measures such as average run length (ARL), standard deviation of run length (SDRL), and median of run length (MRL) are employed to evaluate the suggested approach. The designed Bayesian charts are evaluated for different settings of smoothing constant of the EWMA chart, different sample sizes, and pre-specified false alarm rates. The simulative study demonstrates the effectiveness of the suggested Bayesian method-based EWMA charts as compared to the conventional classical setup-based EWMA charts. The proposed techniques of EWMA charts are highly efficient in detecting shifts in the shape parameter and outperform their classical counterpart in detecting faults quickly. The proposed technique is also applied to real-data case studies from the aerospace manufacturing industry. The quality characteristic of interest was selected as the monthly industrial production index of aircraft from January 1980 to December 2022. The real-data-based findings also validate the conclusions based on the simulative results.


Bayes Theorem , Normal Distribution , Algorithms , Humans , Models, Statistical
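For readers unfamiliar with EWMA charts, the following minimal sketch shows the classical (non-Bayesian) recursion and time-varying control limits that the abstract above uses as its comparison baseline. It does not implement the Bayesian estimators, loss functions, or the Inverse Gaussian shape parameter of the study. The function name, the default smoothing constant of 0.2, and the limit width of 3 are illustrative assumptions.

```python
import math


def ewma_chart(xs, mu0, sigma, lam=0.2, width=3.0):
    """Classical two-sided EWMA chart.

    Returns (rows, first_signal): rows holds (z_t, lcl_t, ucl_t) for each
    observation, and first_signal is the 0-based index of the first point
    outside the limits, or None if the chart never signals.
    """
    z = mu0  # the EWMA statistic starts at the in-control mean
    rows = []
    first_signal = None
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z
        # Exact (time-varying) standard deviation of z_t under control.
        half = width * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        lcl, ucl = mu0 - half, mu0 + half
        rows.append((z, lcl, ucl))
        if first_signal is None and not (lcl <= z <= ucl):
            first_signal = t - 1
    return rows, first_signal


in_control = [10.0] * 15
shifted = [10.0] * 5 + [13.0] * 10  # mean shifts by 3 sigma at index 5

_, signal_in_control = ewma_chart(in_control, mu0=10.0, sigma=1.0)
_, signal_shifted = ewma_chart(shifted, mu0=10.0, sigma=1.0)
print(signal_in_control, signal_shifted)
```

The run length to the first signal is the quantity whose mean, standard deviation, and median are reported as ARL, SDRL, and MRL.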
3.
Sci Rep ; 14(1): 12148, 2024 05 27.
Article En | MEDLINE | ID: mdl-38802532

MPS III is an autosomal recessive lysosomal storage disease caused mainly by missense variants in the NAGLU, GNS, HGSNAT, and SGSH genes. The pathogenicity interpretation of missense variants is still challenging. We aimed to develop unsupervised clustering-based pathogenicity predictor scores using features extracted from eight in silico predictors to predict the impact of novel missense variants of Sanfilippo syndrome. The model was trained on a dataset consisting of 415 missense NAGLU variants of uncertain significance (VUS). The performance of the SanfilippoPred tool was evaluated on validation and test datasets consisting of 197 labelled NAGLU missense variants and was compared with that of individual pathogenicity predictors using receiver operating characteristic (ROC) analysis. Moreover, we tested the SanfilippoPred tool on 427 additional labelled missense variants to assess its specificity and sensitivity threshold. Application of the trained machine learning (ML) model to the test dataset of labelled NAGLU missense variants showed that SanfilippoPred has an accuracy of 0.93 (95% CI 0.86-0.97), sensitivity of 0.93, and specificity of 0.92. SanfilippoPred performed better (AUC = 0.908) than the individual predictors SIFT (AUC = 0.756), Polyphen-2 (AUC = 0.788), CADD (AUC = 0.568), REVEL (AUC = 0.548), MetaLR (AUC = 0.751), and AlphaMissense (AUC = 0.885). Evaluation on high-confidence labelled NAGLU variants showed that SanfilippoPred has an 85.7% sensitivity threshold. The poor correlation between the Sanfilippo syndrome phenotype and genotype represents a demand for a new tool to classify its missense variants. This study provides a significant tool for preventing the misinterpretation of missense variants of the Sanfilippo syndrome-relevant genes.
Finally, it seems that ML-based pathogenicity predictors and Sanfilippo syndrome-specific prediction tools could be feasible and efficient pathogenicity predictors in the future.


Bayes Theorem , Mucopolysaccharidosis III , Mutation, Missense , Mucopolysaccharidosis III/genetics , Humans , Machine Learning , ROC Curve , Computational Biology/methods , Normal Distribution
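The AUC, sensitivity, and specificity figures quoted in the abstract above can be computed from predictor scores and binary labels as in the sketch below. The scores are made-up toy values, not data from the study, and the function names are hypothetical. The AUC is computed as the fraction of (pathogenic, benign) pairs that the score ranks correctly, which is equivalent to the area under the ROC curve.

```python
def roc_auc(scores, labels):
    # Probability that a random positive scores higher than a random
    # negative; ties count as half a win.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    # A variant is called positive when its score reaches the threshold.
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 1 = pathogenic, 0 = benign (illustrative values only).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
auc = roc_auc(toy_scores, toy_labels)
sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
print(auc, sens, spec)
```

Unlike sensitivity and specificity, the AUC does not depend on a chosen threshold, which is why it is the usual basis for comparing predictors with different score scales.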
4.
Hum Brain Mapp ; 45(7): e26692, 2024 May.
Article En | MEDLINE | ID: mdl-38712767

In neuroimaging studies, combining data collected from multiple study sites or scanners is becoming common to increase the reproducibility of scientific discoveries. At the same time, unwanted variations arise by using different scanners (inter-scanner biases), which need to be corrected before downstream analyses to facilitate replicable research and prevent spurious findings. While statistical harmonization methods such as ComBat have become popular in mitigating inter-scanner biases in neuroimaging, recent methodological advances have shown that harmonizing heterogeneous covariances results in higher data quality. In vertex-level cortical thickness data, heterogeneity in spatial autocorrelation is a critical factor that affects covariance heterogeneity. Our work proposes a new statistical harmonization method called spatial autocorrelation normalization (SAN) that preserves homogeneous covariance in vertex-level cortical thickness data across different scanners. We use an explicit Gaussian process to characterize scanner-invariant and scanner-specific variations to reconstruct spatially homogeneous data across scanners. SAN is computationally feasible, and it easily allows the integration of existing harmonization methods. We demonstrate the utility of the proposed method using cortical thickness data from the Social Processes Initiative in the Neurobiology of the Schizophrenia(s) (SPINS) study. SAN is publicly available as an R package.


Cerebral Cortex , Magnetic Resonance Imaging , Schizophrenia , Humans , Magnetic Resonance Imaging/standards , Magnetic Resonance Imaging/methods , Schizophrenia/diagnostic imaging , Schizophrenia/pathology , Cerebral Cortex/diagnostic imaging , Cerebral Cortex/anatomy & histology , Neuroimaging/methods , Neuroimaging/standards , Image Processing, Computer-Assisted/methods , Image Processing, Computer-Assisted/standards , Male , Female , Adult , Normal Distribution , Brain Cortical Thickness
5.
Article En | MEDLINE | ID: mdl-38717876

Neurovascular coupling (NVC) provides important insights into the intricate activity of brain functioning and may aid in the early diagnosis of brain diseases. Emerging evidence has shown that NVC could be assessed by the coupling between electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS). However, this endeavor presents significant challenges due to the absence of standardized methodologies and reliable techniques for coupling analysis of these two modalities. In this study, we introduced a novel method, i.e., the collaborative multi-output variational Gaussian process convergent cross-mapping (CMVGP-CCM) approach, to advance coupling analysis of EEG and fNIRS. To validate the robustness and reliability of the CMVGP-CCM method, we conducted extensive experiments using chaotic time series models with varying noise levels, sequence lengths, and causal driving strengths. In addition, we employed the CMVGP-CCM method to explore the NVC between EEG and fNIRS signals collected from 26 healthy participants using a working memory (WM) task. Results revealed a significant causal effect of EEG signals, particularly the delta, theta, and alpha frequency bands, on the fNIRS signals during WM. This influence was notably observed in the frontal lobe, and its strength declined as cognitive demands increased. This study illuminates the complex connections between brain electrical activity and cerebral blood flow, offering new insights into the underlying NVC mechanisms of WM.


Algorithms , Electroencephalography , Memory, Short-Term , Neurovascular Coupling , Spectroscopy, Near-Infrared , Humans , Electroencephalography/methods , Male , Female , Spectroscopy, Near-Infrared/methods , Adult , Normal Distribution , Neurovascular Coupling/physiology , Young Adult , Memory, Short-Term/physiology , Healthy Volunteers , Reproducibility of Results , Multivariate Analysis , Frontal Lobe/physiology , Frontal Lobe/diagnostic imaging , Brain Mapping/methods , Theta Rhythm/physiology , Brain/physiology , Brain/diagnostic imaging , Brain/blood supply , Nonlinear Dynamics , Delta Rhythm/physiology , Alpha Rhythm/physiology
6.
Biometrics ; 80(2)2024 Mar 27.
Article En | MEDLINE | ID: mdl-38708763

Time-series data collected from a network of random variables are useful for identifying temporal pathways among the network nodes. Observed measurements may contain multiple sources of signal and noise, including Gaussian signals of interest and non-Gaussian noise such as artifacts, structured noise, and other unobserved factors (eg, genetic risk factors, disease susceptibility). Existing methods, including vector autoregression (VAR) and dynamic causal modeling, do not account for unobserved non-Gaussian components. Furthermore, existing methods cannot effectively distinguish contemporaneous relationships from temporal relations. In this work, we propose a novel method to identify latent temporal pathways using time-series biomarker data collected from multiple subjects. The model adjusts for the non-Gaussian components and separates the temporal network from the contemporaneous network. Specifically, an independent component analysis (ICA) is used to extract the unobserved non-Gaussian components, and residuals are used to estimate the contemporaneous and temporal networks among the node variables based on the method of moments. The algorithm is fast and can easily scale up. We derive the identifiability and the asymptotic properties of the temporal and contemporaneous networks. We demonstrate superior performance of our method by extensive simulations and an application to a study of attention-deficit/hyperactivity disorder (ADHD), where we analyze the temporal relationships between brain regional biomarkers. We find that temporal network edges connected different brain regions, while most contemporaneous network edges were bilateral between the same regions and belonged to a subset of the functional connectivity network.


Algorithms , Biomarkers , Computer Simulation , Models, Statistical , Humans , Biomarkers/analysis , Normal Distribution , Attention Deficit Disorder with Hyperactivity , Time Factors , Biometry/methods
7.
Environ Monit Assess ; 196(6): 563, 2024 May 21.
Article En | MEDLINE | ID: mdl-38771410

The greenhouse gas (GHG) emissions inventories in our context result from the production of electricity from fuel oil at the Mbalmayo thermal power plant between 2016 and 2020. Our study area is located in the Central Cameroon region. The empirical method of the second level of industrialisation was applied to estimate GHG emissions and the application of the genetic algorithm-Gaussian (GA-Gaussian) coupling method was used to optimise the estimation of GHG emissions. Our work is of an experimental nature and aims to estimate the quantities of GHG produced by the Mbalmayo thermal power plant during its operation. The search for the best objective function using genetic algorithms is designed to bring us closer to the best concentration, and the Gaussian model is used to estimate the concentration level. The results obtained show that the average monthly emissions in kilograms (kg) of GHGs from the Mbalmayo thermal power plant are: 526 kg for carbon dioxide (CO2), 971.41 kg for methane (CH4) and 309.41 kg for nitrous oxide (N2O), for an average monthly production of 6058.12 kWh of energy. Evaluation of the stack height shows that increasing the stack height helps to reduce local GHG concentrations. According to the Cameroonian standards published in 2021, the limit concentrations of GHGs remain below 30 mg/m3 for CO2 and 200 µg/m3 for N2O, while for CH4 we reach the limit value of 60 µg/m3. These results will enable the authorities to take appropriate measures to reduce GHG concentrations.


Air Pollutants , Algorithms , Environmental Monitoring , Greenhouse Gases , Methane , Power Plants , Greenhouse Gases/analysis , Environmental Monitoring/methods , Air Pollutants/analysis , Cameroon , Methane/analysis , Carbon Dioxide/analysis , Nitrous Oxide/analysis , Air Pollution/statistics & numerical data , Normal Distribution
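The stack-height finding in the abstract above can be illustrated with the textbook Gaussian plume equation. The abstract does not give the exact form of the authors' Gaussian model or its coupling with the genetic algorithm, so the sketch below is the standard steady-state formula with ground reflection; the emission rate, wind speed, and dispersion coefficients are illustrative assumptions, not values from the study.

```python
import math


def plume_concentration(q, u, sigma_y, sigma_z, h, y=0.0, z=0.0):
    """Standard Gaussian plume concentration with ground reflection.

    q: emission rate (g/s), u: wind speed (m/s),
    sigma_y, sigma_z: dispersion coefficients (m) at the downwind
    distance of interest, h: effective stack height (m),
    y, z: crosswind and vertical receptor coordinates (m).
    Returns the concentration in g/m^3.
    """
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    vertical = (
        math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
        + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2))
    )
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical


# Ground-level centreline concentration for two stack heights, with all
# other inputs fixed (assumed, illustrative values).
low_stack = plume_concentration(q=10.0, u=3.0, sigma_y=60.0, sigma_z=30.0, h=20.0)
high_stack = plume_concentration(q=10.0, u=3.0, sigma_y=60.0, sigma_z=30.0, h=50.0)
print(low_stack, high_stack)
```

With the dispersion coefficients held fixed, the ground-level value falls as the effective stack height grows, which matches the direction of the effect reported in the abstract.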
8.
Article En | MEDLINE | ID: mdl-38564353

Electroencephalographic (EEG) source imaging (ESI) is a powerful method for studying brain functions and surgical resection of epileptic foci. However, accurately estimating the location and extent of brain sources remains challenging due to noise and background interference in EEG signals. To reconstruct extended brain sources, we propose a new ESI method called Variation Sparse Source Imaging based on Generalized Gaussian Distribution (VSSI-GGD). VSSI-GGD uses the generalized Gaussian prior as a sparse constraint on the spatial variation domain and embeds it into the Bayesian framework for source estimation. Using a variational technique, we approximate the intractable true posterior with a Gaussian density. Through convex analysis, the Bayesian inference problem is transformed entirely into a series of regularized L2p -norm ( ) optimization problems, which are efficiently solved with the ADMM algorithm. Imaging results of numerical simulations and human experimental dataset analysis reveal the superior performance of VSSI-GGD, which provides higher spatial resolution with clear boundaries compared to benchmark algorithms. VSSI-GGD can potentially serve as an effective and robust spatiotemporal EEG source imaging method. The source code of VSSI-GGD is available at https://github.com/Mashirops/VSSI-GGD.git.


Brain , Electroencephalography , Humans , Bayes Theorem , Normal Distribution , Electroencephalography/methods , Brain/diagnostic imaging , Brain Mapping/methods , Algorithms , Magnetoencephalography/methods
9.
J Neural Eng ; 21(2)2024 Apr 09.
Article En | MEDLINE | ID: mdl-38592090

Objective. The extended infomax algorithm for independent component analysis (ICA) can separate sub- and super-Gaussian signals but converges slowly as it uses stochastic gradient optimization. In this paper, an improved extended infomax algorithm is presented that converges much faster. Approach. Accelerated convergence is achieved by replacing the natural gradient learning rule of extended infomax by a fully-multiplicative orthogonal-group based update scheme of the ICA unmixing matrix, leading to an orthogonal extended infomax algorithm (OgExtInf). The computational performance of OgExtInf was compared with original extended infomax and with two fast ICA algorithms: the popular FastICA and Picard, a preconditioned limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm belonging to the family of quasi-Newton methods. Main results. OgExtInf converges much faster than original extended infomax. For small-size electroencephalogram (EEG) data segments, as used for example in online EEG processing, OgExtInf is also faster than FastICA and Picard. Significance. OgExtInf may be useful for fast and reliable ICA, e.g. in online systems for epileptic spike and seizure detection or brain-computer interfaces.


Algorithms , Brain-Computer Interfaces , Electroencephalography , Learning , Normal Distribution
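The distinction between sub- and super-Gaussian signals in the abstract above can be made concrete with excess kurtosis, which is negative for flatter-than-normal distributions and positive for heavier-tailed ones. The sketch below only illustrates that distinction on synthetic samples; it is not the extended infomax or OgExtInf update rule, and the sample sizes and distributions are assumptions chosen for illustration.

```python
import random


def excess_kurtosis(xs):
    # Fourth standardized moment minus 3 (zero for a normal distribution).
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(1)
n_samples = 20000

# Uniform samples are sub-Gaussian (theoretical excess kurtosis -1.2).
uniform_signal = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
# Laplace samples are super-Gaussian (theoretical excess kurtosis 3).
laplace_signal = [
    rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(n_samples)
]
# Gaussian reference (theoretical excess kurtosis 0).
gaussian_signal = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

k_uniform = excess_kurtosis(uniform_signal)
k_laplace = excess_kurtosis(laplace_signal)
k_gaussian = excess_kurtosis(gaussian_signal)
print(k_uniform, k_laplace, k_gaussian)
```

An ICA method that assumes only super-Gaussian sources would handle the Laplace-like signal but not the uniform-like one, which is the limitation the extended infomax family is designed to remove.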
10.
PLoS One ; 19(4): e0300688, 2024.
Article En | MEDLINE | ID: mdl-38652734

Despite their widespread use as therapeutics, clinical development of small molecule drugs remains challenging. Among the many parameters that undergo optimization during the drug development process, increasing passive cell permeability (i.e., log(P)) can have some of the largest impacts on potency. Cyclic peptides (CPs) have emerged as a viable alternative to small molecules, as they retain many of the advantages of small molecules (oral availability, target specificity) while being highly effective at traversing the plasma membrane. However, the relationship between the dominant conformations that typify CPs in an aqueous versus a membrane environment and cell permeability remains poorly characterized. In this study, we have used Gaussian accelerated molecular dynamics (GaMD) simulations to characterize the effect of solvent on the free energy landscape of lariat peptides, a subset of CPs that have recently shown potential for drug development (Kelly et al., JACS 2021). Differences in the free energy of lariat peptides as a function of solvent can be used to predict permeability of these molecules, and our results show that permeability is most greatly influenced by N-methylation and exposure to solvent. Our approach lays the groundwork for using GaMD as a way to virtually screen large libraries of CPs and drive forward development of CP-based therapeutics.


Molecular Dynamics Simulation , Peptides, Cyclic , Peptides, Cyclic/chemistry , Peptides, Cyclic/metabolism , Solvents/chemistry , Cell Membrane Permeability , Permeability , Thermodynamics , Normal Distribution
11.
PLoS One ; 19(4): e0298467, 2024.
Article En | MEDLINE | ID: mdl-38630677

The giant honeybee Apis dorsata (Fabricius, 1793) is an evolutionarily ancient species that builds its nests in the open. The nest consists of a single honeycomb covered with the bee curtain, which consists of several layers of worker bees that remain almost motionless with their heads up and abdomens down on the nest surface, except for the mouth area, the hub between inner- and outer-nest activities. A colony may change this semi-quiescence several times a day, depending on its reproductive state and ambient temperature, to enter the state of mass flight activity (MFA), in which nest organisation is restructured and defense ability is likely to be suppressed (predicted by the mass-flight-suspend-defensiveness hypothesis). For this study, three episodes of MFA (mfa1-3) of a selected experimental nest were analysed in a case study with sequences of >60 000 images at 50 Hz, each comprising a short pre-MFA session, the MFA, and a post-MFA phase of a further 10 min. To test colony defensiveness under normative conditions, a dummy wasp was cyclically presented with a standardised motion programme (Pd) with intervening sessions without such a presentation (nPd). Motion activity at five selected surveillance zones (sz1-5) on the nest was analysed. In contrast to mfa1-2, in mfa3 the experimental regime started with the cyclic presentation of the dummy wasp only after the MFA had subsided. As a result, the MFA intensity in mfa3 was significantly lower than in mfa1-2, suggesting that a colony is able to perceive external threats during the MFA. Characteristic ripples appear in the motion profiles, which can be interpreted as a start signal for the transition to MFA. Because they are strongest in the mouth zone and shift to higher frequencies on their way to the nest periphery, it can be concluded that MFA starts earlier in the mouth zone than in the peripheral zones, also suggesting that the mouth zone is a control centre for the scheduling of MFA.
In Pd phases of pre- and post-MFA, the histogram-based motion spectra are biphasic, suggesting two cohorts in the process, one remaining at quiescence and the other involved in shimmering. Under MFA, nPd and Pd spectra were typically Gaussian, suggesting that the nest mates with a uniform workload shifted to higher motion activity. At the end of the MFA, the spectra shift back to the lower motion activities and the Pd spectra become biphasic again. This happens a few minutes earlier in the peripheral zones than in the mouth zone. Using time profiles of the skewness of the Pd motion spectra, the mass-flight-suspend-defensiveness hypothesis is confirmed, whereby the inhibition of defense ability was found to increase progressively during the MFA. These sawtooth-like time profiles of skewness during MFA show that defense capability is recovered quite quickly at the end of MFA. Finally, with the help of the Pd motion spectra, clear indications can be obtained that the giant honeybees engage in a decision in the sense of a tradeoff between MFA and collective defensiveness, especially in the regions peripheral to the mouth zone.


Porifera , Wasps , Bees , Animals , Motion , Wasps/physiology , Normal Distribution , Bedding and Linens
12.
Neural Netw ; 175: 106281, 2024 Jul.
Article En | MEDLINE | ID: mdl-38579573

Due to distribution shift, deep learning based methods for image dehazing suffer from performance degradation when applied to real-world hazy images. This study considers a dehazing framework based on conditional diffusion models for improved generalization to real haze. First, our work finds that optimizing the training objective of diffusion models, i.e., Gaussian noise vectors, is non-trivial. The spectral bias of deep networks hinders the higher frequency modes in Gaussian vectors from being learned and hence impairs the reconstruction of image details. To tackle this issue, this study designs a network unit, named Frequency Compensation block (FCB), with a bank of filters that jointly emphasize the mid-to-high frequencies of an input signal. Our work demonstrates that diffusion models with FCB achieve significant gains in both perceptual and distortion metrics. Second, to further boost the generalization performance, this study proposes a novel data synthesis pipeline, HazeAug, to augment haze in terms of degree and diversity. Within the framework, a solid baseline for blind dehazing is set up where models are trained on synthetic hazy-clean pairs and directly generalize to real data. Extensive evaluations on real dehazing datasets demonstrate the superior performance of the proposed dehazing diffusion model in distortion metrics. Compared to recent methods pre-trained on large-scale, high-quality image datasets, our model achieves a significant PSNR improvement of over 1 dB on challenging databases such as Dense-Haze and Nh-Haze.


Deep Learning , Neural Networks, Computer , Image Processing, Computer-Assisted/methods , Humans , Algorithms , Normal Distribution
13.
Comput Biol Med ; 175: 108437, 2024 Jun.
Article En | MEDLINE | ID: mdl-38669732

Gastric cancer (GC), characterized by its inconspicuous initial symptoms and rapid invasiveness, presents a formidable challenge. Overlooking postoperative intervention opportunities may result in the dissemination of tumors to adjacent areas and distant organs, thereby substantially diminishing prospects for patient survival. Consequently, the prompt recognition and management of GC postoperative recurrence emerge as a matter of paramount urgency to mitigate the deleterious implications of the ailment. This study proposes an enhanced feature selection model, bRSPSO-FKNN, integrating boosted particle swarm optimization (RSPSO) with fuzzy k-nearest neighbor (FKNN), for predicting GC. It incorporates the Runge-Kutta search, for improved model accuracy, and Gaussian sampling, enhancing the search performance and helping to avoid locally optimal solutions. It outperforms the sophisticated variants of particle swarm optimization when evaluated in the CEC 2014 test suite. Furthermore, the bRSPSO-FKNN feature selection model was introduced for GC recurrence prediction analysis, achieving up to 82.082 % and 86.185 % accuracy and specificity, respectively. In summation, this model attains a notable level of precision, poised to ameliorate the early warning system for GC recurrence and, in turn, advance therapeutic options for afflicted patients.


Neoplasm Recurrence, Local , Stomach Neoplasms , Stomach Neoplasms/pathology , Humans , Algorithms , Normal Distribution
14.
Comput Med Imaging Graph ; 115: 102372, 2024 Jul.
Article En | MEDLINE | ID: mdl-38581959

PURPOSE: To investigate the feasibility of a deep learning algorithm combining variational autoencoder (VAE) and two-dimensional (2D) convolutional neural networks (CNN) for automatically quantifying hard tissue presence and morphology in multi-contrast magnetic resonance (MR) images of peripheral arterial disease (PAD) occlusive lesions. METHODS: Multi-contrast MR images (T2-weighted and ultrashort echo time) were acquired from lesions harvested from six amputated legs with high isotropic spatial resolution (0.078 mm and 0.156 mm, respectively) at 9.4 T. A total of 4014 pseudo-color combined images were generated, with 75% used to train a VAE employing custom 2D CNN layers. A Gaussian mixture model (GMM) was employed to classify the latent space data into four tissue classes: I) concentric calcified (c), II) eccentric calcified (e), III) occluded with hard tissue (h) and IV) occluded with soft tissue (s). Test image probabilities, encoded by the trained VAE were used to evaluate model performance. RESULTS: GMM component classification probabilities ranged from 0.92 to 0.97 for class (c), 1.00 for class (e), 0.82-0.95 for class (h) and 0.56-0.93 for the remaining class (s). Due to the complexity of soft-tissue lesions reflected in the heterogeneity of the pseudo-color images, more GMM components (n=17) were attributed to class (s), compared to the other three (c, e and h) (n=6). CONCLUSION: Combination of 2D CNN VAE and GMM achieves high classification probabilities for hard tissue-containing lesions. Automatic recognition of these classes may aid therapeutic decision-making and identifying uncrossable lesions prior to endovascular intervention.


Feasibility Studies , Magnetic Resonance Imaging , Peripheral Arterial Disease , Humans , Peripheral Arterial Disease/diagnostic imaging , Magnetic Resonance Imaging/methods , Normal Distribution , Algorithms , Neural Networks, Computer , Deep Learning
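The class probabilities reported in the abstract above come from a Gaussian mixture model, which assigns each point a posterior probability of belonging to each component. The sketch below fits a two-component mixture to synthetic one-dimensional data with the EM algorithm. It is only an illustration of the technique: the study applies a GMM to a VAE latent space, which is not reproduced here, and all names, the initialisation scheme, and the synthetic data are assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def responsibilities(x, weights, means, variances):
    # Posterior probability of each component given the point x.
    dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [d / total for d in dens]


def fit_gmm_1d(xs, k=2, iters=100):
    # Deterministic initialisation: means spread evenly over the data range,
    # equal weights, and the overall variance for every component.
    lo, hi = min(xs), max(xs)
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    centre = sum(xs) / len(xs)
    var0 = sum((x - centre) ** 2 for x in xs) / len(xs)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: soft assignment of every point to every component.
        resp = [responsibilities(x, weights, means, variances) for x in xs]
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against collapse
    return weights, means, variances


rng = random.Random(2)
data = [rng.gauss(0.0, 1.0) for _ in range(300)]
data += [rng.gauss(5.0, 1.0) for _ in range(300)]

weights, means, variances = fit_gmm_1d(data)
probs_at_5 = responsibilities(5.0, weights, means, variances)
print(means, probs_at_5)
```

A point's largest responsibility plays the role of the "classification probability" in the abstract: values near 1 mean the point sits clearly inside one component, while values near 0.5 indicate overlap between components.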
15.
Environ Sci Pollut Res Int ; 31(22): 32784-32799, 2024 May.
Article En | MEDLINE | ID: mdl-38662293

The precise assessment of a water body's eutrophication status is essential for making informed decisions in water environment management. However, conventional approaches frequently fail to consider the randomness, fuzziness, and inherent hidden information of water quality indicators, which can result in an unreliable assessment. In this study, an enhanced method was proposed for eutrophication assessment under uncertainty. The multi-dimensional Gaussian cloud distribution was introduced to capture the randomness and fuzziness. The Shannon entropy based on various sample sizes and trophic levels was proposed to maximize the valuable information hidden in the datasets. Twenty-seven significant lakes and reservoirs located in the Yangtze River Basin were selected to demonstrate the proposed method. Sensitivity and consistency were used to evaluate the accuracy of the proposed method. Results indicate that the proposed method can effectively assess the eutrophication status of lakes and reservoirs under uncertainty and that it has better sensitivity, since it can identify 33-50% more trophic levels than the traditional methods. Further scenario experiments revealed that the sample information richness, i.e., the sample size and the number of trophic levels, is of great significance to the accuracy/robustness of the method. Moreover, a sample size of 60 can offer the most favorable balance between accuracy/robustness and monitoring expenses. These findings are crucial to optimizing the eutrophication assessment.


Environmental Monitoring , Eutrophication , Lakes , Environmental Monitoring/methods , Uncertainty , Normal Distribution , China , Rivers/chemistry
16.
J Chem Inf Model ; 64(8): 3059-3079, 2024 Apr 22.
Article En | MEDLINE | ID: mdl-38498942

Condensing the many physical variables defining a chemical system into a fixed-size array poses a significant challenge in the development of chemical Machine Learning (ML). Atom Centered Symmetry Functions (ACSFs) offer an intuitive featurization approach by means of a tedious and labor-intensive selection of tunable parameters. In this work, we implement an unsupervised ML strategy relying on a Gaussian Mixture Model (GMM) to automatically optimize the ACSF parameters. GMMs effortlessly decompose the vastness of the chemical and conformational spaces into well-defined radial and angular clusters, which are then used to build tailor-made ACSFs. The unsupervised exploration of the space has demonstrated general applicability across a diverse range of systems, spanning from various unimolecular landscapes to heterogeneous databases. The impact of the sampling technique and temperature on space exploration is also addressed, highlighting the particularly advantageous role of high-temperature Molecular Dynamics (MD) simulations. The reliability of the resulting features is assessed through the estimation of the atomic charges of a prototypical capped amino acid and a heterogeneous collection of CHON molecules. The automatically constructed ACSFs serve as high-quality descriptors, consistently yielding typical prediction errors below 0.010 electrons bound for the reported atomic charges. Altering the spatial distribution of the functions with respect to the cluster highlights the critical role of symmetry rupture in achieving significantly improved features. More specifically, using two separate functions to describe the lower and upper tails of the cluster results in the best performing models with errors as low as 0.006 electrons. 
Finally, the effectiveness of finely tuned features was checked across different architectures, unveiling the superior performance of Gaussian Process (GP) models over Feed Forward Neural Networks (FFNNs), particularly in low-data regimes, with nearly a 2-fold increase in prediction quality. Altogether, this approach paves the way toward an easier construction of local chemical descriptors, while providing valuable insights into how radial and angular spaces should be mapped. This work also opens the possibility of encoding many-body information beyond angular terms into upcoming ML features.


Molecular Dynamics Simulation , Unsupervised Machine Learning , Normal Distribution , Automation
17.
Stat Methods Med Res ; 33(3): 449-464, 2024 Mar.
Article En | MEDLINE | ID: mdl-38511638

Motivated by measurement errors in radiographic diagnosis of osteoarthritis, we propose a Bayesian approach to identify latent classes in a model with continuous response subject to a monotonic, that is, non-decreasing or non-increasing, process with measurement error. A latent class linear mixed model has been introduced to consider measurement error while the monotonic process is accounted for via truncated normal distributions. The main purpose is to classify the response trajectories through the latent classes to better describe the disease progression within homogeneous subpopulations.


Bayes Theorem , Latent Class Analysis , Normal Distribution
18.
Biometrics ; 80(1)2024 Jan 29.
Article En | MEDLINE | ID: mdl-38497826

Multiple testing has been a prominent topic in statistical research. Despite extensive work in this area, controlling false discoveries remains a challenging task, especially when the test statistics exhibit dependence. Various methods have been proposed to estimate the false discovery proportion (FDP) under arbitrary dependencies among the test statistics. One key approach is to transform arbitrary dependence into weak dependence and subsequently establish the strong consistency of FDP and false discovery rate under weak dependence. As a result, FDPs converge to the same asymptotic limit within the framework of weak dependence. However, we have observed that the asymptotic variance of FDP can be significantly influenced by the dependence structure of the test statistics, even when they exhibit only weak dependence. Quantifying this variability is of great practical importance, as it serves as an indicator of the quality of FDP estimation from the data. To the best of our knowledge, there is limited research on this aspect in the literature. In this paper, we aim to fill in this gap by quantifying the variation of FDP, assuming that the test statistics exhibit weak dependence and follow normal distributions. We begin by deriving the asymptotic expansion of the FDP and subsequently investigate how the asymptotic variance of the FDP is influenced by different dependence structures. Based on the insights gained from this study, we recommend that in multiple testing procedures utilizing FDP, reporting both the mean and variance estimates of FDP can provide a more comprehensive assessment of the study's outcomes.


Uncertainty , Normal Distribution
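
Illustrative note (not part of the abstract): the claim that weak dependence leaves the limit of the FDP unchanged but can alter its variance can be explored with a small simulation. The sketch below is a minimal example under stated assumptions and is not the authors' procedure. AR(1)-correlated normal test statistics stand in for weak dependence, the rejection rule is a fixed one-sided threshold, and the function names and all parameter values (`m`, `m1`, `effect`, `threshold`, `rho`) are hypothetical.

```python
import math
import random
import statistics


def simulate_fdp(rho, m=500, m1=50, effect=3.0, threshold=2.0, rng=None):
    """Simulate one set of m normal test statistics and return its FDP.

    Assumptions (illustrative only): the first m1 hypotheses are true signals
    with mean `effect`, the noise follows a stationary AR(1) process with
    correlation `rho`, and a hypothesis is rejected when z > threshold.
    """
    rng = rng or random.Random()
    scale = math.sqrt(1.0 - rho * rho)  # keeps the marginal variance at 1
    noise = rng.gauss(0.0, 1.0)
    false_rejections = 0
    true_rejections = 0
    for i in range(m):
        if i > 0:
            noise = rho * noise + scale * rng.gauss(0.0, 1.0)
        is_signal = i < m1
        z = noise + (effect if is_signal else 0.0)
        if z > threshold:
            if is_signal:
                true_rejections += 1
            else:
                false_rejections += 1
    total = false_rejections + true_rejections
    # FDP is defined as 0 when nothing is rejected
    return false_rejections / total if total else 0.0


def fdp_mean_and_variance(rho, reps=300, seed=0):
    """Return the Monte Carlo mean and sample variance of the FDP."""
    rng = random.Random(seed)
    fdps = [simulate_fdp(rho, rng=rng) for _ in range(reps)]
    return statistics.mean(fdps), statistics.variance(fdps)


if __name__ == "__main__":
    for rho in (0.0, 0.8):
        mean, var = fdp_mean_and_variance(rho)
        print(f"rho={rho}: mean FDP={mean:.3f}, variance={var:.5f}")
```

In this toy setting the mean FDP should be similar for both values of `rho` while the variance differs, which is the reason for reporting both quantities.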
19.
J Neural Eng ; 21(2)2024 May 02.
Article En | MEDLINE | ID: mdl-38513289

The detection of events in time-series data is a common signal-processing problem. When the data can be modeled as a known template signal with an unknown delay in Gaussian noise, detection of the template signal can be done with a traditional matched filter. However, in many applications, the event of interest is represented in multimodal data consisting of both Gaussian and point-process time series. Neuroscience experiments, for example, can simultaneously record multimodal neural signals such as local field potentials (LFPs), which can be modeled as Gaussian, and neuronal spikes, which can be modeled as point processes. Currently, no method exists for event detection from such multimodal data, and as such our objective in this work is to develop a method to meet this need. Here we address this challenge by developing the multimodal event detector (MED) algorithm which simultaneously estimates event times and classes. To do this, we write a multimodal likelihood function for Gaussian and point-process observations and derive the associated maximum likelihood estimator of simultaneous event times and classes. We additionally introduce a cross-modal scaling parameter to account for model mismatch in real datasets. We validate this method in extensive simulations as well as in a neural spike-LFP dataset recorded during an eye-movement task, where the events of interest are eye movements with unknown times and directions. We show that the MED can successfully detect eye movement onset and classify eye movement direction. Further, the MED successfully combines information across data modalities, with multimodal performance exceeding unimodal performance. This method can facilitate applications such as the discovery of latent events in multimodal neural population activity and the development of brain-computer interfaces for naturalistic settings without constrained tasks or prior knowledge of event times.


Algorithms , Neurons/physiology , Normal Distribution , Animals , Models, Neurological , Action Potentials/physiology , Computer Simulation , Humans
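
Illustrative note (not part of the abstract): the traditional matched filter mentioned above, for a known template with unknown delay in Gaussian noise, can be sketched as a sliding correlation whose peak gives the delay estimate. This is a minimal sketch of that classical baseline only, not of the MED algorithm. The Barker-13 template, the noise level, and the function names are assumptions chosen for illustration.

```python
import random

# Barker-13 code: an assumed template, chosen because its autocorrelation
# has a single sharp peak, which makes the delay easy to locate.
TEMPLATE = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def matched_filter_delay(signal, template):
    """Return the delay at which the template correlates best with the signal.

    For a known template in white Gaussian noise, maximizing this correlation
    over the delay is the maximum likelihood delay estimate.
    """
    n_lags = len(signal) - len(template) + 1
    scores = [
        sum(signal[k + i] * t for i, t in enumerate(template))
        for k in range(n_lags)
    ]
    return max(range(n_lags), key=scores.__getitem__)


def simulate(true_delay=120, length=300, noise_sd=0.2, seed=0):
    """Build a synthetic signal: Gaussian noise plus the template at true_delay."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, noise_sd) for _ in range(length)]
    for i, t in enumerate(TEMPLATE):
        signal[true_delay + i] += t
    return signal


if __name__ == "__main__":
    synthetic = simulate(true_delay=120)
    print("estimated delay:", matched_filter_delay(synthetic, TEMPLATE))
```

Point-process observations such as spikes do not fit this Gaussian model, which is the gap the abstract describes.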
20.
Graefes Arch Clin Exp Ophthalmol ; 262(6): 1819-1828, 2024 Jun.
Article En | MEDLINE | ID: mdl-38446204

PURPOSE: The aim of this study is to investigate the distribution of spherical equivalent and axial length in the general population and to analyze the influence of education on spherical equivalent with a focus on ocular biometric parameters. METHODS: The Gutenberg Health Study is a population-based cohort study in Mainz, Germany. Participants underwent comprehensive ophthalmologic examinations as part of the 5-year follow-up examination in 2012-2017 including genotyping. The spherical equivalent and axial length distributions were modeled with Gaussian mixture models. Regression analysis (on person-individual level) was performed to analyze associations between biometric parameters and educational factors. Mendelian randomization analysis explored the causal effect between spherical equivalent, axial length, and education. Additionally, effect mediation analysis examined the link between spherical equivalent and education. RESULTS: A total of 8532 study participants were included (median age: 57 years, 49% female). The distribution of spherical equivalent and axial length follows a bi-Gaussian function, partially explained by the length of education (i.e., < 11 years education vs. 11-20 years). Mendelian randomization indicated an effect of education on refractive error using a genetic risk score of education as an instrumental variable (-0.35 diopters per SD increase in the instrument, 95% CI, -0.64-0.05, p = 0.02) and an effect of education on axial length (0.63 mm per SD increase in the instrument, 95% CI, 0.22-1.04, p = 0.003). Spherical equivalent, axial length and anterior chamber depth were associated with length of education in regression analyses. Mediation analysis revealed that the association between spherical equivalent and education is mainly driven (70%) by alteration in axial length. CONCLUSIONS: The distribution of axial length and spherical equivalent is represented by subgroups of the population (bi-Gaussian). This distribution can be partially explained by length of education. The impact of education on spherical equivalent is mainly driven by alteration in axial length.


Axial Length, Eye , Educational Status , Humans , Female , Male , Middle Aged , Germany/epidemiology , Axial Length, Eye/pathology , Normal Distribution , Biometry/methods , Refraction, Ocular/physiology , Follow-Up Studies , Refractive Errors/physiopathology , Refractive Errors/diagnosis , Refractive Errors/genetics , Aged , Adult
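
Illustrative note (not part of the abstract): a bi-Gaussian description of a measurement corresponds to a two-component Gaussian mixture, which can be fitted with expectation-maximization (EM). The sketch below is a minimal one-dimensional EM written from scratch. It is not the study's analysis code, the function names are hypothetical, and the example data are synthetic values with no relation to the reported measurements.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(data, n_iter=100):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns (weights, means, standard deviations), each a list of length 2.
    """
    xs = sorted(data)
    n = len(xs)
    # Crude initialisation (an assumption): quartiles as means, wide spreads
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    spread = (xs[-1] - xs[0]) / 4.0
    sd = [spread, spread]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means, and spreads from the responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = math.sqrt(var)
    return w, mu, sd


def synthetic_sample(seed=1):
    """Synthetic two-subgroup data (made-up parameters, for demonstration only)."""
    rng = random.Random(seed)
    low = [rng.gauss(-4.0, 1.0) for _ in range(600)]
    high = [rng.gauss(0.0, 1.0) for _ in range(1400)]
    return low + high


if __name__ == "__main__":
    weights, means, sds = fit_two_gaussians(synthetic_sample())
    print("weights:", weights)
    print("means:", means)
    print("standard deviations:", sds)
```

A production analysis would normally use an established mixture-model implementation and compare against a single-Gaussian fit before concluding that two subgroups are present.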
...