Results 1 - 20 of 22
1.
BMC Public Health ; 23(1): 359, 2023 Feb 17.
Article in English | MEDLINE | ID: mdl-36803324

ABSTRACT

BACKGROUND: The spread of COVID-19 (SARS-CoV-2) and the surging number of cases across the United States have resulted in full hospitals and exhausted health care workers. Limited availability and questionable reliability of the data make outbreak prediction and resource planning difficult, and any estimates or forecasts are subject to high uncertainty and low accuracy. The aim of this study is to apply, automate, and assess a Bayesian time series model for the real-time estimation and forecasting of COVID-19 cases and hospitalizations in Wisconsin Healthcare Emergency Readiness Coalition (HERC) regions. METHODS: This study makes use of the publicly available Wisconsin COVID-19 historical data by county. Cases and the effective time-varying reproduction number Rt are estimated by HERC region over time using Bayesian latent variable models. Hospitalizations are estimated by HERC region over time using a Bayesian regression model. Cases, effective Rt, and hospitalizations are forecasted over 1-day, 3-day, and 7-day time horizons using the last 28 days of data, and the 20%, 50%, and 90% Bayesian credible intervals of the forecasts are calculated. The frequentist coverage probability is compared to the Bayesian credible level to evaluate performance. RESULTS: For cases and effective Rt, all three time horizons outperform all three credible levels of the forecast. For hospitalizations, all three time horizons outperform the 20% and 50% credible intervals of the forecast; in contrast, the 1-day and 3-day horizons underperform the 90% credible intervals. For all three metrics, uncertainty quantification should be re-assessed using the frequentist coverage probability of the Bayesian credible interval computed from observed data. CONCLUSIONS: We present an approach to automate the real-time estimation and forecasting of cases and hospitalizations and the corresponding uncertainty using publicly available data. The models were able to infer short-term trends consistent with reported values at the HERC region level. Additionally, the models were able to accurately forecast and estimate the uncertainty of the measurements. This study can help identify the most affected regions and major upcoming outbreaks. The workflow can be adapted to other geographic regions, states, and even countries where decision-making processes can be supported in real time by the proposed modeling system.
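The evaluation idea at the heart of this paper, checking the frequentist coverage probability of Bayesian credible intervals against their nominal level, can be sketched as follows; the counts and interval widths below are synthetic stand-ins, not the Wisconsin data:

```python
import numpy as np

def empirical_coverage(lower, upper, observed):
    """Fraction of observed values falling inside the forecast interval;
    compare this against the nominal credible level (e.g. 0.90)."""
    lower, upper, observed = map(np.asarray, (lower, upper, observed))
    return np.mean((observed >= lower) & (observed <= upper))

# Synthetic stand-in for 7-day-ahead case forecasts and observed counts.
rng = np.random.default_rng(0)
observed = rng.poisson(100, size=200)
forecast = observed + rng.normal(0, 15, size=200)
half_width = 1.645 * 12            # stand-in 90% interval half-width
coverage = empirical_coverage(forecast - half_width,
                              forecast + half_width, observed)
print(f"empirical coverage: {coverage:.2f} vs. nominal 0.90")
```

Coverage above the nominal level means the intervals are conservative ("outperform" in the abstract's terms); coverage below means they are too narrow.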


Subjects
COVID-19, Humans, United States, COVID-19/epidemiology, SARS-CoV-2, Public Health, Bayes Theorem, Wisconsin/epidemiology, Reproducibility of Results, Forecasting, Uncertainty, Hospitalization
2.
Sensors (Basel) ; 22(15), 2022 Jul 25.
Article in English | MEDLINE | ID: mdl-35898047

ABSTRACT

Predicting not only the target but also an accurate measure of uncertainty is important for many machine learning applications, in particular safety-critical ones. In this work, we study the calibration of uncertainty prediction for regression tasks, which often arise in real-world systems. We show that the existing definition of calibration for regression uncertainty has severe limitations in distinguishing informative from non-informative uncertainty predictions. We propose a new definition that escapes this caveat, together with an evaluation method based on a simple histogram-based approach. Our method clusters examples with similar uncertainty predictions and compares the predicted uncertainty with the empirical uncertainty on these examples. We also propose a simple, scaling-based calibration method that performs as well as much more complex ones. We show results on both a synthetic, controlled problem and on the object-detection bounding-box regression task using the COCO and KITTI datasets.
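The histogram-based evaluation described above can be sketched as below: sort examples by predicted uncertainty, bin them, and compare each bin's mean predicted sigma with the empirical RMSE of its errors. The binning choice and names are illustrative, not the paper's exact procedure:

```python
import numpy as np

def binned_uncertainty_calibration(pred_sigma, errors, n_bins=10):
    """Per bin of similar predicted uncertainty, compare the mean predicted
    sigma with the empirical RMSE of the realized errors; for a calibrated,
    informative model the two columns should track each other across bins."""
    order = np.argsort(pred_sigma)
    rows = []
    for idx in np.array_split(order, n_bins):
        rows.append((pred_sigma[idx].mean(),
                     np.sqrt(np.mean(errors[idx] ** 2))))
    return np.array(rows)   # columns: mean predicted sigma, empirical RMSE

rng = np.random.default_rng(0)
sigma = rng.uniform(0.5, 3.0, 1000)   # predicted per-example uncertainties
errors = rng.normal(0.0, sigma)       # errors consistent with those sigmas
print(binned_uncertainty_calibration(sigma, errors))
```

Unlike a single global calibration score, this per-bin view exposes non-informative predictors that are only calibrated on average.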

3.
BMC Infect Dis ; 21(1): 533, 2021 Jun 07.
Article in English | MEDLINE | ID: mdl-34098885

ABSTRACT

BACKGROUND: Many popular disease transmission models have helped nations respond to the COVID-19 pandemic by informing decisions about pandemic planning, resource allocation, implementation of social distancing measures, lockdowns, and other non-pharmaceutical interventions. We study how five epidemiological models forecast and assess the course of the pandemic in India: a baseline curve-fitting model, an extended SIR (eSIR) model, two extended SEIR (SAPHIRE and SEIR-fansy) models, and a semi-mechanistic Bayesian hierarchical model (ICM). METHODS: Using COVID-19 case-recovery-death count data reported in India from March 15 to October 15 to train the models, we generate predictions from each of the five models from October 16 to December 31. To compare prediction accuracy with respect to reported cumulative and active case counts and reported cumulative death counts, we compute the symmetric mean absolute prediction error (SMAPE) for each of the five models. For reported cumulative cases and deaths, we compute Pearson's and Lin's correlation coefficients to investigate how well the projected and observed reported counts agree. We also present underreporting factors when available, and comment on the uncertainty of projections from each model. RESULTS: For active case counts, SMAPE values are 35.14% (SEIR-fansy) and 37.96% (eSIR). For cumulative case counts, SMAPE values are 6.89% (baseline), 6.59% (eSIR), 2.25% (SAPHIRE) and 2.29% (SEIR-fansy). For cumulative death counts, the SMAPE values are 4.74% (SEIR-fansy), 8.94% (eSIR) and 0.77% (ICM). Three models (SAPHIRE, SEIR-fansy and ICM) also return total (sum of reported and unreported) cumulative case counts. We compute underreporting factors as of October 31 and note that for cumulative cases, the SEIR-fansy model yields an underreporting factor of 7.25 and the ICM model yields 4.54 for the same quantity. For total (sum of reported and unreported) cumulative deaths, the SEIR-fansy model reports an underreporting factor of 2.97. On October 31, we observe 8.18 million cumulative reported cases, while the projections (in millions) from the baseline model are 8.71 (95% credible interval: 8.63-8.80), eSIR yields 8.35 (7.19-9.60), SAPHIRE returns 8.17 (7.90-8.52), and SEIR-fansy projects 8.51 (8.18-8.85) million cases. Cumulative case projections from the eSIR model have the highest uncertainty in terms of width of 95% credible intervals, followed by those from SAPHIRE, the baseline model and finally SEIR-fansy. CONCLUSIONS: In this comparative paper, we describe five different models used to study the transmission dynamics of the SARS-CoV-2 virus in India. While simulation studies are the only gold-standard way to compare the accuracy of the models, here we were uniquely positioned to compare the projected case counts against observed data on a test period. The largest variability across models is observed in predicting the "total" number of infections including reported and unreported cases (on which we have no validation data). The degree of under-reporting has been a major concern in India and is characterized in this report. Overall, the SEIR-fansy model appeared to be a good choice, with a publicly available R package and the desired flexibility and accuracy.
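For reference, the SMAPE metric used throughout this comparison can be computed as below; this is one common variant of the definition, and the paper's normalization details may differ:

```python
import numpy as np

def smape(actual, predicted):
    """Symmetric mean absolute prediction error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    denom = (np.abs(actual) + np.abs(predicted)) / 2.0
    return 100.0 * np.mean(np.abs(predicted - actual) / denom)

# e.g. cumulative case counts (millions): observed vs. a model's projections
print(smape([8.18, 8.40, 8.62], [8.35, 8.60, 8.90]))
```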


Subjects
COVID-19/epidemiology, COVID-19/transmission, Pandemics, Bayes Theorem, Communicable Disease Control/methods, Computer Simulation, Forecasting, Humans, India/epidemiology, Statistical Models
4.
J Environ Manage ; 289: 112509, 2021 Jul 01.
Article in English | MEDLINE | ID: mdl-33836439

ABSTRACT

Rural land valuation plays an important role in the development of land use policies for agricultural purposes. The advance of computational software and machine learning methods has enhanced mass appraisal methodologies for modeling and predicting economic values. New machine learning methods, like tree-based regression models, have been proposed as an alternative to linear regression to predict economic values from ancillary variables, since these algorithms are able to handle non-normality and non-linearity in the data. However, regression trees are commonly estimated assuming independent rather than spatially correlated data. This study aims to build a tree-based regression model that will help to tackle methodological problems related to the determination of prices of rural lands. The Quantile Regression Forest (QRF) algorithm was used to provide a regression model to predict and assess the uncertainty associated with model-derived predictions. However, the classical QRF ignores the autocorrelation underlying spatialized land values. The objective of this work was to develop, implement, and evaluate a spatial version of QRF, named sQRF, for computer-assisted mass appraisal of rural land values accounting for information from neighboring sites. We compared predictions of land values from sQRF with those obtained from spatial random forest, kriging regression, and linear regression models. sQRF performed well in predicting rural land values; indeed, it performed better than multiple linear regression. An important feature of sQRF is its ability to produce a direct uncertainty measure to assess the goodness of the predictions. Land values reflect a complex mix of agricultural returns, localization, and access to markets, which can be predicted from ancillary environmental variables. Good predictive models are essential to determine land values for multiple purposes including territorial taxation.
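A rough sketch of how a quantile-forest-style prediction interval can be obtained is shown below. Note the caveats: this uses the spread of per-tree predictions as an approximation rather than Meinshausen's exact QRF weighting, and a spatial variant like the paper's sQRF would additionally feed coordinates or neighboring values in as covariates; the data are synthetic placeholders:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Quantiles of the per-tree predictions approximate a predictive interval.
tree_preds = np.stack([tree.predict(X[:5]) for tree in rf.estimators_])
lo, median, hi = np.percentile(tree_preds, [5, 50, 95], axis=0)
uncertainty = hi - lo    # direct per-prediction uncertainty measure
```

The interval width is exactly the kind of "direct uncertainty measure" the abstract highlights as an advantage over plain multiple linear regression.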


Subjects
Environmental Monitoring, Machine Learning, Algorithms, Linear Models, Spatial Analysis
5.
Molecules ; 25(6), 2020 Mar 23.
Article in English | MEDLINE | ID: mdl-32210186

ABSTRACT

A great variety of computational approaches support drug design processes, helping in the selection of new potentially active compounds and the optimization of their physicochemical and ADMET properties. Machine learning is a group of methods able to evaluate enormous amounts of data in a relatively short time. However, the quality of machine-learning-based prediction depends on the data supplied for model training. In this study, we used deep neural networks for the task of compound activity prediction and developed dropout-based approaches for estimating prediction uncertainty. Several types of analyses were performed: we examined the relationships between prediction error, similarity to the training set, prediction uncertainty, and the number and standard deviation of activity values. We tested whether incorporating information about prediction uncertainty influences compound ranking based on predicted activity, and we used prediction uncertainty to search for potential errors in the ChEMBL database. The outcome indicates that incorporating information about the uncertainty of compound activity prediction can be of great help during virtual screening experiments.
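A minimal PyTorch sketch of the dropout-based uncertainty idea: keep dropout active at inference time and treat the spread across stochastic forward passes as the prediction uncertainty. The architecture and input size here are placeholders, not the study's network:

```python
import torch
import torch.nn as nn

# Placeholder activity-prediction network over compound fingerprints.
model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(p=0.2),
                      nn.Linear(256, 1))

def mc_dropout_predict(model, x, n_passes=50):
    """Monte Carlo dropout: dropout stays on, so repeated passes differ;
    their mean is the prediction and their std the uncertainty estimate."""
    model.train()                      # keeps Dropout layers active
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_passes)])
    return preds.mean(dim=0), preds.std(dim=0)

x = torch.randn(4, 1024)               # four hypothetical fingerprints
mean, uncertainty = mc_dropout_predict(model, x)
```

Compounds with high `uncertainty` can then be deprioritized in ranking or flagged for inspection, mirroring the ChEMBL error search described above.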


Subjects
Chemical Databases, Deep Learning, Drug Design, Drug Discovery, Chemical Models
6.
Food Microbiol ; 76: 504-512, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30166180

ABSTRACT

Building secondary models that describe the growth rate as a function of multiple environmental conditions is often very labour intensive and costly. As such, the current research aims to decrease the required experimental effort by studying the efficacy of both design of experiments (DOE) and optimal experimental design (OED) techniques. This is the first research in predictive microbiology (i) to compare these techniques based on the (relative) model prediction uncertainty of the obtained models and (ii) to compare OED criteria for the design of experiments with static (instead of dynamic) environmental conditions. A comparison of the DOE techniques demonstrated that the inscribed central composite design and full factorial design were most suitable. Five conventional and two tailor-made OED criteria were tested. The commonly used D-criterion performed best of the conventional designs and almost as well as the best of the dedicated criteria. Moreover, the modelling results of the D-criterion were less dependent on the experimental variability and differences in the microbial response than the two selected DOE techniques. Finally, it was shown that optimisation of the D-criterion can be made more efficient by using, as the Jacobian matrix, the sensitivities of the growth rate relative to its value instead of the sensitivities of the cell density measurements.
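The D-criterion highlighted here can be sketched as follows: score each candidate design by the log-determinant of the Fisher information built from the Jacobian of model sensitivities, and keep the design with the largest score. The sensitivities below are random placeholders; in the study they would come from the growth-rate model:

```python
import numpy as np

def d_criterion(jacobian):
    """D-optimality score: log det of the Fisher information J^T J.
    Larger is better; maximizing it shrinks the parameter confidence region."""
    sign, logdet = np.linalg.slogdet(jacobian.T @ jacobian)
    return logdet if sign > 0 else -np.inf

# Placeholder: sensitivities of the growth rate w.r.t. 3 model parameters
# at 6 candidate static environmental conditions (one row per condition).
rng = np.random.default_rng(1)
candidate_designs = [rng.normal(size=(6, 3)) for _ in range(10)]
best = max(candidate_designs, key=d_criterion)
```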


Subjects
Bacteria/growth & development, Research Design, Bacteria/chemistry, Kinetics, Biological Models
7.
Stat Med ; 35(12): 2016-30, 2016 May 30.
Article in English | MEDLINE | ID: mdl-26712471

ABSTRACT

Prediction of an outcome for a given unit based on prediction models built on a training sample plays a major role in many research areas. In current practice, the uncertainty of the prediction is predominantly characterized by subject sampling variation, whereby prediction models built on hypothetically re-sampled units yield variable predictions for the same unit of interest. It is almost always true that the predictors used to build prediction models are simply a subset of the entirety of factors related to the outcome. Following the frequentist principle, we can also account for the variation due to hypothetically re-sampled predictors used to build the prediction models. This is particularly important in medicine, where the prediction has important and sometimes life-or-death consequences for a patient's health status. In this article, we discuss the rationale along this line in the context of medicine. We propose a simple approach to estimate the standard error of the prediction that accounts for the variation due to sampling both subjects and predictors under logistic and Cox regression models. A simulation study is presented to support our argument and demonstrate the performance of our method. The concept and method are applied to a real data set. Copyright © 2015 John Wiley & Sons, Ltd.
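The idea can be emulated numerically with a double bootstrap that resamples both subjects (rows) and the predictor subset (columns) before refitting; this brute-force sketch illustrates the concept only and is not the authors' analytic standard-error estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def prediction_se(X, y, x_new, n_boot=200, n_predictors=5, seed=0):
    """Std. error of predicted P(outcome | x_new) when both the subjects
    and the predictor subset used in the model are treated as sampled.
    x_new: shape (1, p) row for the unit of interest."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    preds = []
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)          # resample subjects
        if len(np.unique(y[rows])) < 2:            # need both classes to fit
            continue
        cols = rng.choice(p, size=n_predictors, replace=False)  # predictors
        fit = LogisticRegression(max_iter=1000).fit(X[rows][:, cols], y[rows])
        preds.append(fit.predict_proba(x_new[:, cols])[0, 1])
    preds = np.asarray(preds)
    return preds.mean(), preds.std()
```

The spread across refits now reflects both sources of variation, so the standard error is larger than the subject-sampling-only version.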


Subjects
Sampling Studies, Uncertainty, Biomedical Research/methods, Forecasting, Humans, Likelihood Functions, Statistical Models, Theoretical Models, Probability, Proportional Hazards Models, Research Design
8.
Glob Chang Biol ; 21(3): 1328-41, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25294087

ABSTRACT

Predicting rice (Oryza sativa) productivity under future climates is important for global food security. Ecophysiological crop models in combination with climate model outputs are commonly used in yield prediction, but uncertainties associated with crop models remain largely unquantified. We evaluated 13 rice models against multi-year experimental yield data at four sites with diverse climatic conditions in Asia and examined whether different modeling approaches to major physiological processes contribute to the uncertainty of predictions of field-measured yields and to the uncertainty of sensitivity to changes in temperature and CO2 concentration [CO2]. We also examined whether the use of an ensemble of crop models can reduce the uncertainties. Individual models did not consistently reproduce both experimental and regional yields well, and uncertainty was larger at the warmest and coolest sites. The variation in yield projections was larger among crop models than the variation resulting from 16 global climate model-based scenarios. However, the mean of the predictions of all crop models reproduced the experimental data, with an uncertainty of less than 10% of measured yields. Using an ensemble of eight models calibrated only for phenology, or five models calibrated in detail, resulted in an uncertainty equivalent to that of the measured yield in well-controlled agronomic field experiments. Sensitivity analysis indicates the necessity of improving the accuracy of predictions of both biomass and harvest index in response to increasing [CO2] and temperature.


Subjects
Agriculture, Climate, Theoretical Models, Oryza/growth & development, Asia, Food Supply, Sensitivity and Specificity, Uncertainty
9.
Environ Int ; 188: 108764, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38788418

ABSTRACT

A strong need exists for broadly applicable nano-QSARs capable of predicting toxicological outcomes for untested species and nanomaterials under different environmental conditions. Existing nano-QSARs are generally limited to only a few species, but including species characteristics in the models can help make them applicable to multiple species, even those for which no toxicity data are available. Species traits were used to create classification and regression machine learning models to predict the acute toxicity of metallic nanomaterials towards aquatic species. Afterwards, the individual classification and regression models were stacked into a meta-model to improve performance. Additionally, the uncertainty and limitations of the models were assessed in detail (beyond the OECD principles), and it was investigated whether the models would benefit from the addition of more data. Results showed a significant improvement in model performance following model stacking. Investigation of model uncertainties and limitations highlighted the discrepancy between the applicability domain and the accuracy of predictions: data points outside the assessed chemical space did not have a higher likelihood of generating inadequate predictions, or vice versa. It is therefore concluded that the applicability domain does not give complete insight into the uncertainty of predictions, and the generation of prediction intervals can help in this regard. Furthermore, results indicated that increasing the dataset size did not improve model performance. This implies that larger datasets may not necessarily improve model performance and, conversely, that large datasets are not necessarily required for predicting acute toxicity with nano-QSARs.
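Stacking individual learners into a meta-model, as done here, follows a standard pattern; a generic scikit-learn sketch is below, with placeholder estimators and synthetic data standing in for the nano-QSAR feature set:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# The meta-learner is trained on out-of-fold predictions of the base models,
# which is what allows stacking to improve on the individual learners.
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)), ("svr", SVR())],
    final_estimator=Ridge(),
    cv=5,
)
stack.fit(X, y)
```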


Subjects
Quantitative Structure-Activity Relationship, Uncertainty, Nanostructures/toxicity, Animals, Machine Learning, Aquatic Organisms/drug effects
10.
Sci Total Environ ; 919: 170972, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38360318

ABSTRACT

Assessment and proper management of sites contaminated with heavy metals require precise information on the spatial distribution of these metals. This study aimed to predict and map the distribution of Cd, Cu, Ni, Pb, and Zn across the conterminous USA using point observations, environmental variables, and Histogram-based Gradient Boosting (HGB) modeling. Over 9180 surficial soil observations from the Soil Geochemistry Spatial Database (SGSD) (n = 1150), the Geochemical and Mineralogical Survey of Soils (GMSS) (n = 4857), and the Holmgren Dataset (HD) (n = 3400), and 28 covariates (100 m × 100 m grid) representing climate, topography, vegetation, soils, and anthropic activity were compiled. Model performance was evaluated on 20% of the data not used in calibration using the coefficient of determination (R2), concordance correlation coefficient (ρc), and root mean square error (RMSE). Uncertainty of the predictions was calculated as the difference between the estimated 95% and 5% quantiles provided by HGB. The model explained up to 50% of the variance in the data, with RMSE ranging between 0.16 mg kg-1 for Cu and 23.4 mg kg-1 for Zn. Likewise, ρc ranged between 0.55 (Cu) and 0.68 (Zn), and Zn had the highest R2 (0.50) among all predictions. We observed high Pb concentrations near urban areas. Peak concentrations of all studied metals were found in the Lower Mississippi River Valley. Cu, Ni, and Zn concentrations were higher on the West Coast, while Cd concentrations were higher in the central USA. Clay, pH, potential evapotranspiration, temperature, and precipitation were among the model's top five important covariates for spatial predictions of heavy metals. The combined use of point observations and environmental covariates coupled with machine learning provided a reliable prediction of heavy metal distribution in the soils of the conterminous USA. The updated maps could support environmental assessments, monitoring, and decision-making, and the methodology is applicable to other soil databases worldwide.
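The uncertainty measure described (95% minus 5% quantile) can be reproduced with quantile-loss gradient boosting; a sketch using scikit-learn's HistGradientBoostingRegressor, which supports a quantile loss in recent versions, with synthetic placeholders for the soil data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=6, noise=15.0, random_state=0)

q05 = HistGradientBoostingRegressor(loss="quantile", quantile=0.05).fit(X, y)
q95 = HistGradientBoostingRegressor(loss="quantile", quantile=0.95).fit(X, y)

# Per-location uncertainty as the width between the 95% and 5% quantiles.
uncertainty = q95.predict(X[:10]) - q05.predict(X[:10])
```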

11.
J Cheminform ; 16(1): 65, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816859

ABSTRACT

This study describes the development and evaluation of six new models for predicting physical-chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility (in water, SW, and octanol, SO), vapor pressure (VP), and the octanol-water (KOW), octanol-air (KOA), and air-water (KAW) partition ratios. The models are implemented in the Iterative Fragment Selection Quantitative Structure-Activity Relationship (IFSQSAR) python package, Version 1.1.0, as Poly-Parameter Linear Free Energy Relationship (PPLFER) equations that combine experimentally calibrated system parameters and solute descriptors predicted with QSPRs. Two ancillary models have also been developed and implemented: a QSPR for Molar Volume (MV) and a classifier for the physical state of chemicals at room temperature. The IFSQSAR methods for characterizing the applicability domain (AD) and calculating uncertainty estimates, expressed as 95% prediction intervals (PI) for predicted properties, are described and tested on 9,000 measured partition ratios and 4,000 VP and SW values. The measured data are external to the IFSQSAR training and validation datasets and are used to assess the predictivity of the models for "novel chemicals" in an unbiased manner. The 95% PIs calculated from validation datasets for partition ratios needed to be scaled by a factor of 1.25 to capture 95% of the external data. Predictions for VP and SW are more uncertain, primarily due to the challenge of differentiating the physical state of chemicals (i.e., liquid or solid) at room temperature. The prediction accuracy of the models for log KOW, log KAW, and log KOA of novel, data-poor chemicals is estimated to be in the range of 0.7 to 1.4 root mean squared error of prediction (RMSEP), with RMSEP in the range of 1.7-1.8 for log VP and log SW. Scientific contribution: The new partitioning models integrate empirical PPLFER equations and QSARs, allowing for seamless integration of experimental data and model predictions. This work tests the real predictivity of the models for novel chemicals which are not in the model training or external validation datasets.

12.
Diagnostics (Basel) ; 12(6), 2022 Jun 18.
Article in English | MEDLINE | ID: mdl-35741303

ABSTRACT

CNN-based image processing has been actively applied to histopathological analysis to detect and classify cancerous tumors automatically. However, CNN-based classifiers generally predict labels with overconfidence, which becomes a serious problem in the medical domain. The objective of this study is to propose a new training method, called MixPatch, designed to improve a CNN-based classifier by specifically addressing the prediction uncertainty problem, and to examine its effectiveness in improving diagnostic performance in the context of histopathological image analysis. MixPatch generates and uses a new sub-training dataset, consisting of mixed-patches and their predefined ground-truth labels, for every mini-batch. Mixed-patches are generated from small clean patches confirmed by pathologists, while their ground-truth labels are defined using a proportion-based soft labeling method. Our results, obtained using a large histopathological image dataset, show that the proposed method performs better and alleviates overconfidence more effectively than any other method examined in the study. More specifically, our model achieved 97.06% accuracy, an increase of 1.6% to 12.18% over the other models, with an expected calibration error of 0.76%, a decrease of 0.6% to 6.3%. By specifically considering the mixed-region variation characteristics of histopathology images, MixPatch augments existing mixed-image methods for medical image analysis in which prediction uncertainty is a crucial issue. The proposed method provides a new way to systematically alleviate the overconfidence problem of CNN-based classifiers and improve their prediction accuracy, contributing toward more calibrated and reliable histopathology image analysis.
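The core data-generation step can be sketched as follows: tile small clean patches into one mixed patch and define its ground truth as the class proportions among the tiles. This is a schematic reading of MixPatch; the grid size, patch shapes, and function names are assumptions, not the paper's exact implementation:

```python
import numpy as np

def make_mixed_patch(patches, labels, n_classes, grid=2, seed=0):
    """Tile grid x grid clean patches into one mixed patch whose soft label
    is the proportion of each class among the tiles."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(patches), size=grid * grid, replace=False)
    rows = [np.concatenate(patches[idx[r * grid:(r + 1) * grid]], axis=1)
            for r in range(grid)]
    mixed = np.concatenate(rows, axis=0)
    soft_label = np.bincount(labels[idx], minlength=n_classes) / (grid * grid)
    return mixed, soft_label

patches = np.random.rand(100, 64, 64, 3)     # stand-in clean patches
labels = np.random.randint(0, 2, size=100)   # pathologist-confirmed classes
mixed, soft = make_mixed_patch(patches, labels, n_classes=2)  # e.g. [0.75 0.25]
```

Training against such soft labels penalizes the network for assigning full confidence to mixed regions, which is the mechanism behind the calibration gains reported above.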

13.
PeerJ ; 10: e13872, 2022.
Article in English | MEDLINE | ID: mdl-36032939

ABSTRACT

Background: Biodiversity varies in space and time, often in response to environmental heterogeneity. Indicators in the form of local biodiversity measures, such as species richness or abundance, are common tools to capture this variation. The rise of readily available remote sensing data has enabled the characterization of environmental heterogeneity in a globally robust and replicable manner. Based on the assumption that differences in biodiversity measures are generally related to differences in environmental heterogeneity, these data have enabled projections and extrapolations of biodiversity in space and time. However, so far little work has been done to quantitatively evaluate if, and how accurately, local biodiversity measures can be predicted. Methods: Here I combine estimates of biodiversity measures from terrestrial local biodiversity surveys with remotely sensed data on environmental heterogeneity globally. I then determine, through a cross-validation framework, how accurately local biodiversity measures can be predicted within ("predictability") and across similar ("transferability") biodiversity surveys. Results: I found that prediction errors can be substantial, with error magnitudes varying between biodiversity measures, taxonomic groups, sampling techniques, and types of environmental heterogeneity characterization. Although errors associated with model predictability were in many cases relatively low, these results question, particularly for transferability, our capability to accurately predict and project local biodiversity measures based on environmental heterogeneity. I make the case that future predictions should be evaluated based on their accuracy and inherent uncertainty, and that ecological theories should be tested against whether we are able to make accurate predictions from local biodiversity data.
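The within- versus across-survey distinction maps onto standard versus grouped cross-validation; a schematic with synthetic data, where variable names are placeholders for the survey data described above:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((400, 12))                 # environmental heterogeneity metrics
y = rng.random(400)                       # e.g. scaled species richness
survey = rng.integers(0, 8, size=400)     # which survey each site belongs to

model = RandomForestRegressor(random_state=0)
# "Predictability": test folds share surveys with the training data.
within = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0))
# "Transferability": whole surveys are held out, so folds never share a survey.
across = cross_val_score(model, X, y, groups=survey, cv=GroupKFold(5))
```

The gap between the two score distributions is precisely the predictability-transferability gap the paper reports.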


Subjects
Agriculture, Biodiversity
14.
J Hazard Mater ; 421: 126746, 2022 Jan 05.
Article in English | MEDLINE | ID: mdl-34388923

ABSTRACT

Deep convolutional neural networks (DCNNs) have proved to be a promising tool for identifying organic chemicals of environmental concern. However, the uncertainty associated with DCNN predictions remains to be quantified. The training process contains many random configurations, including dataset segmentation, input sequences, and initial weights. Moreover, the DCNN working mechanism is non-linear and opaque. To increase confidence in this novel approach, persistent, bioaccumulative, and toxic substances (PBTs) were used as representative chemicals of environmental concern to estimate the prediction uncertainty under five distinct datasets and ten different molecular descriptor (MD) arrangements, covering 111,852 chemicals and 2424 available MDs. An internal correlation coefficient test indicated that the prediction confidence reached 0.98 when the mean of 50 DCNNs' predictions was used instead of a single DCNN prediction. A threshold for PBT categorization was determined by weighing the costs of false-negative against false-positive predictions. As revealed by guided backpropagation-class activation mapping (GBP-CAM) saliency images, only 12% of all selected MDs were activated by the DCNN and influenced the decision-making process. However, the activated MDs not only varied among chemical classes but also shifted between different DCNNs. Principal component analysis indicated that the 2424 MDs could be transformed into 370 orthogonal variables. Both results suggest that redundancy exists among the selected MDs. Yet the DCNN was found to adapt to redundant data by focusing on the most important information, yielding better prediction performance.


Subjects
Neural Networks (Computer), Organic Compounds
15.
Front Mol Biosci ; 9: 800856, 2022.
Article in English | MEDLINE | ID: mdl-35281278

ABSTRACT

Dynamic behavior of biological systems is commonly represented by non-linear models such as ordinary differential equations. A frequently encountered task in such systems is the estimation of model parameters based on measurements of biochemical compounds. Non-linear models require special techniques to estimate the uncertainty of the obtained model parameters and predictions, e.g. by exploiting the concept of the profile likelihood. Model parameters with significant uncertainty associated with their estimates hinder the interpretation of model results. Informing these model parameters through optimal experimental design minimizes the additional amount of data, and therefore resources, required in experiments. However, existing experimental design techniques either require prior parameter distributions, in Bayesian approaches, or do not adequately deal with the non-linearity of the system, in frequentist approaches. For the identification of optimal experimental designs, we propose a two-dimensional profile likelihood approach, providing a design criterion that meaningfully represents the expected parameter uncertainty after measuring data for a specified experimental condition. The approach is implemented in the open-source toolbox Data2Dynamics in MATLAB. Its applicability is demonstrated on an established systems biology model. For this demonstration, available data were censored to simulate a setting in which parameters are not yet well determined. After determining the optimal experimental condition from the censored ones, a realistic evaluation was possible by re-introducing the censored data point corresponding to the optimal experimental condition. This validated that our method is feasible in real-world applications. The approach applies to, but is not limited to, models in systems biology.

16.
Methods Mol Biol ; 2390: 1-59, 2022.
Article in English | MEDLINE | ID: mdl-34731463

ABSTRACT

Artificial intelligence (AI) has undergone rapid development in recent years and has been successfully applied to real-world problems such as drug design. In this chapter, we review recent applications of AI to problems in drug design, including virtual screening, computer-aided synthesis planning, and de novo molecule generation, with a focus on the limitations of applying AI in these areas and opportunities for improvement. Furthermore, we discuss the broader challenges of translating AI from theoretical practice to real-world drug design, including quantifying prediction uncertainty and explaining model behavior.


Subjects
Artificial Intelligence, Drug Design
17.
Int J Numer Method Biomed Eng ; 36(10): e3388, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32691507

ABSTRACT

Patient outcome in transcatheter aortic valve implantation (TAVI) therapy partly relies on a patient's haemodynamic properties, which cannot be determined from current diagnostic methods alone. In this study, we predict changes in haemodynamic parameters (as a part of patient outcome) after valve replacement treatment in aortic stenosis patients. A framework to incorporate uncertainty in patient-specific model predictions for decision support is presented. A 0D lumped parameter model including the left ventricle, a stenotic valve, and the systemic circulatory system was developed, based on previously published models. The unscented Kalman filter (UKF) is used to optimize model input parameters to fit measured data pre-intervention. After optimization, the valve treatment is simulated by significantly reducing the valve resistance. Uncertain model parameters are then propagated using a polynomial chaos expansion approach. To test the proposed framework, three in silico test cases with clinically feasible measurements were developed, with the quality and availability of the simulated measured patient data decreased in each case. The UKF approach is compared to a Markov Chain Monte Carlo (MCMC) approach, a well-known approach to modelling predictions with uncertainty. Both methods show increased confidence intervals as measurement quality decreases. By considering three in silico test cases, we were able to show that the proposed framework can incorporate optimization uncertainty in model predictions and is faster than the MCMC approach, although it is more sensitive to noise in flow measurements. To conclude, this work shows that the proposed framework is ready to be applied to real patient data.


Subjects
Aortic Valve Stenosis, Computer Simulation, Markov Chains, Uncertainty, Aortic Valve/surgery, Aortic Valve Stenosis/surgery, Humans, Treatment Outcome
18.
J Chromatogr A ; 1587: 101-110, 2019 Feb 22.
Article in English | MEDLINE | ID: mdl-30579636

ABSTRACT

Mechanistic modeling of chromatography has been around in academia for decades and has gained increased support in pharmaceutical companies in recent years. Despite the large number of published successful applications, process development in the pharmaceutical industry today still does not fully benefit from a systematic mechanistic model-based approach. The hesitation on the part of industry to systematically apply mechanistic models can often be attributed to the absence of a general approach for determining if a model is qualified to support decision making in process development. In this work a Bayesian framework for the calibration and quality assessment of mechanistic chromatography models is introduced. Bayesian Markov Chain Monte Carlo is used to assess parameter uncertainty by generating samples from the parameter posterior distribution. Once the parameter posterior distribution has been estimated, it can be used to propagate the parameter uncertainty to model predictions, allowing a prediction-based uncertainty assessment of the model. The benefit of this uncertainty assessment is demonstrated using the example of a mechanistic model describing the separation of an antibody from its impurities on a strong cation exchanger. The mechanistic model was calibrated at moderate column load density and used to make extrapolations at high load conditions. Using the Bayesian framework, it could be shown that despite significant parameter uncertainty, the model can extrapolate beyond observed process conditions with high accuracy and is qualified to support process development.
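The calibrate-then-propagate workflow described here can be illustrated on a toy problem with the emcee sampler; the exponential "mechanism", noise level, and flat positivity prior below are placeholders standing in for the chromatography model:

```python
import numpy as np
import emcee

# Toy "mechanistic" model y = a * exp(-b * t); calibrate a, b from noisy data.
t = np.linspace(0, 5, 20)
y_obs = 2.0 * np.exp(-0.7 * t) + np.random.default_rng(0).normal(0, 0.05, 20)

def log_prob(theta):
    a, b = theta
    if a <= 0 or b <= 0:
        return -np.inf                          # flat positivity prior
    resid = y_obs - a * np.exp(-b * t)
    return -0.5 * np.sum((resid / 0.05) ** 2)   # Gaussian likelihood

sampler = emcee.EnsembleSampler(16, 2, log_prob)
sampler.run_mcmc(np.abs(np.random.randn(16, 2)) + 0.5, 2000, progress=False)
posterior = sampler.get_chain(discard=500, flat=True)   # parameter posterior

# Propagate parameter uncertainty to a prediction at an extrapolated condition.
t_new = 8.0
pred = posterior[:, 0] * np.exp(-posterior[:, 1] * t_new)
lo, hi = np.percentile(pred, [2.5, 97.5])               # 95% prediction band
```

The width of the prediction band at the extrapolated condition is what tells you whether the model, despite parameter uncertainty, is qualified to support decisions beyond observed conditions.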


Subjects
Chromatography/methods, Theoretical Models, Uncertainty, Bayes Theorem, Calibration, Humans, Markov Chains, Monte Carlo Method
19.
Acta Trop ; 185: 391-399, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29932934

ABSTRACT

Zika virus, which has been linked to severe congenital abnormalities, is exacerbating global public health problems with its rapid transnational expansion, fueled by increased global travel and trade. Suitability mapping of the transmission risk of Zika virus is essential for drafting public health plans and disease control strategies, which are especially important in areas where medical resources are relatively scarce. Predicting the risk of Zika virus outbreaks has been studied in recent years, but the published literature rarely includes multiple model comparisons or predictive uncertainty analysis. Here, three relatively popular machine learning models, a backward propagation neural network (BPNN), a gradient boosting machine (GBM), and a random forest (RF), were adopted to map the probability of Zika epidemic outbreak at the global level, pairing high-dimensional multidisciplinary covariate layers with comprehensive location data on recorded Zika virus infection in humans. The results show that the predicted high-risk areas for Zika transmission are concentrated in four regions: southeastern North America, eastern South America, Central Africa, and eastern Asia. To evaluate the performance of the machine learning models, 50 modeling runs were conducted on a training dataset. The BPNN model obtained the highest predictive accuracy, with a 10-fold cross-validation area under the curve (AUC) of 0.966 [95% confidence interval (CI) 0.965-0.967], followed by the GBM model (10-fold cross-validation AUC = 0.964 [0.963-0.965]) and the RF model (10-fold cross-validation AUC = 0.963 [0.962-0.964]). Compared with the BPNN-based model, significant differences in prediction accuracy were observed for the GBM and RF models (p = 0.0258* and p = 0.0001***, respectively). Importantly, the prediction uncertainty introduced by the selection of absence data was quantified, providing more accurate fundamental and scientific information for further study of disease transmission prediction and risk assessment.
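The model-comparison protocol (10-fold cross-validated AUC for BPNN-, GBM-, and RF-style learners) looks like the following in scikit-learn, with synthetic data in place of the Zika occurrence dataset and MLPClassifier standing in for the BPNN:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
models = [("BPNN", MLPClassifier(max_iter=1000, random_state=0)),
          ("GBM", GradientBoostingClassifier(random_state=0)),
          ("RF", RandomForestClassifier(random_state=0))]
for name, clf in models:
    auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
    print(f"{name}: 10-fold CV AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```

Repeating such runs with different resampled absence (pseudo-negative) data is one way to quantify the uncertainty the abstract attributes to absence-data selection.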


Subjects
Machine Learning, Zika Virus Infection/transmission, Disease Outbreaks/prevention & control, Humans, Neural Networks (Computer), Risk, Zika Virus Infection/epidemiology, Zika Virus Infection/etiology
20.
Int J Food Microbiol ; 282: 1-8, 2018 Oct 03.
Article in English | MEDLINE | ID: mdl-29885972

ABSTRACT

Building mathematical models in predictive microbiology is a data-driven science. As such, the experimental data (and their uncertainty) influence the final predictions and even the calculation of the model prediction uncertainty. Therefore, the current research studies the influence of both the parameter estimation method and the uncertainty propagation method on the calculation of the model prediction uncertainty. The study is also intended as a tutorial on uncertainty propagation techniques for researchers in (predictive) microbiology. To this end, an in silico case study was used in which the effect of temperature on the microbial growth rate was modelled and used to make simulations for a temperature profile characterised by variability. The comparison of the parameter estimation methods demonstrated that the one-step method yields more accurate and precise calculations of the model prediction uncertainty than the two-step method. Four uncertainty propagation methods were assessed, with their applicability evaluated under both experimental uncertainty and model input uncertainty. The linear approximation was demonstrated not to always provide reliable results. The Monte Carlo method was computationally very intensive compared to its competitors. Polynomial chaos expansion was computationally efficient and accurate but is relatively complex to implement. Finally, the sigma point method was preferred as it is (i) computationally efficient, (ii) robust with respect to experimental uncertainty, and (iii) easily implemented.
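The preferred sigma point approach is the unscented transform: propagate a parameter mean and covariance through the non-linear model using only 2n+1 deterministically chosen points, far cheaper than Monte Carlo. A sketch follows; the square-root-type growth-rate model and parameter values are illustrative assumptions, not the study's calibrated model:

```python
import numpy as np

def sigma_point_propagate(mu, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Unscented transform: push 2n+1 sigma points through a non-linear
    model f and recombine them into an output mean and variance."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)     # matrix square root
    pts = np.vstack([mu, mu + S.T, mu - S.T])   # (2n+1, n) sigma points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1 - alpha**2 + beta)
    ys = np.array([f(p) for p in pts])
    mean = wm @ ys
    var = wc @ (ys - mean) ** 2
    return mean, var

# Hypothetical square-root-type growth-rate model at 20 degrees C, p = (b, Tmin).
f = lambda p: (p[0] * (20.0 - p[1])) ** 2
mean, var = sigma_point_propagate(np.array([0.03, 5.0]),
                                  np.diag([1e-5, 0.25]), f)
```

Only five model evaluations are needed here (n = 2), which is what makes the method both computationally efficient and easy to implement, as the abstract concludes.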


Subjects
Bacteria/growth & development, Microbiological Techniques/standards, Algorithms, Bacteria/chemistry, Computer Simulation, Theoretical Models, Monte Carlo Method