1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38517697

ABSTRACT

Non-coding variants associated with complex traits can alter the motifs of transcription factor (TF)-deoxyribonucleic acid binding. Although many computational models have been developed to predict the effects of non-coding variants on TF binding, their predictive power lacks systematic evaluation. Here we have evaluated 14 different models built on position weight matrices (PWMs), support vector machines, ordinary least squares and deep neural networks (DNNs), using large-scale in vitro (i.e. SNP-SELEX) and in vivo (i.e. allele-specific binding, ASB) TF binding data. Our results show that the accuracy of each model in predicting SNP effects in vitro significantly exceeds that achieved in vivo. For in vitro variant impact prediction, kmer/gkm-based machine learning methods (deltaSVM_HT-SELEX, QBiC-Pred) trained on in vitro datasets exhibit the best performance. For in vivo ASB variant prediction, DNN-based multitask models (DeepSEA, Sei, Enformer) trained on the ChIP-seq dataset exhibit relatively superior performance. Among the PWM-based methods, tRap demonstrates better performance in both in vitro and in vivo evaluations. In addition, we find that TF classes such as basic leucine zipper factors could be predicted more accurately, whereas those such as C2H2 zinc finger factors are predicted less accurately, aligning with the evolutionary conservation of these TF classes. We also underscore the significance of non-sequence factors such as cis-regulatory element type, TF expression, interactions and post-translational modifications in influencing the in vivo predictive performance of TFs. Our research provides valuable insights into selecting prioritization methods for non-coding variants and further optimizing such models.
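None of the 14 evaluated tools is reproduced here. As a generic illustration of how a PWM-based method can score a non-coding variant, the sketch below computes a log-odds PWM score for the best-matching window in the reference and alternative sequences and reports their difference. The motif frequencies, uniform background, sequences and function names are hypothetical, and only the forward strand is scanned.

```python
import math

# Hypothetical 4-position motif as base frequencies (A, C, G, T) per position.
# Illustrative numbers only, not taken from any real TF motif database.
MOTIF_FREQS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds_pwm(freqs):
    """Convert per-position base frequencies into log2-odds weights."""
    return [{b: math.log2(p / BACKGROUND) for b, p in col.items()} for col in freqs]


def best_window_score(seq, pwm):
    """Highest PWM score over all windows of the sequence (forward strand only)."""
    w = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(w))
        for start in range(len(seq) - w + 1)
    )


def delta_score(ref_seq, alt_seq, pwm):
    """Alt minus ref: a negative value suggests the variant weakens the motif match."""
    return best_window_score(alt_seq, pwm) - best_window_score(ref_seq, pwm)


pwm = log_odds_pwm(MOTIF_FREQS)
ref = "TTAGCTTT"  # contains the consensus AGCT
alt = "TTATCTTT"  # G>T substitution inside the motif
print(round(delta_score(ref, alt, pwm), 3))
```

Machine-learning and DNN-based tools replace the fixed PWM with a learned scoring function, but many still report a ref-versus-alt difference of this kind.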


Subjects
Single Nucleotide Polymorphism, Transcription Factors, Binding Sites/genetics, Protein Binding/genetics, Transcription Factors/genetics, Transcription Factors/metabolism, DNA/genetics
2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38018909

ABSTRACT

Model quality evaluation is a crucial part of protein structural biology. How to distinguish high-quality models from low-quality models, and how to identify relatively incorrect regions of high-quality models that need improvement, remain challenges. More importantly, the quality assessment of multimer models is a hot topic in structure prediction. In this study, we propose GraphCPLMQA, a novel approach for evaluating residue-level model quality that combines graph coupled networks and embeddings from protein language models. GraphCPLMQA consists of a graph encoding module and a transform-based convolutional decoding module. In the encoding module, the underlying relational representations of sequence and high-dimensional geometric structure are extracted by protein language models with Evolutionary Scale Modeling. In the decoding module, the mapping between structure and quality is inferred from these representations and low-dimensional features. Specifically, triangular location and residue-level contact order features are designed to enhance the association between local structure and overall topology. Experimental results demonstrate that GraphCPLMQA using single-sequence embedding achieves the best performance compared with the CASP15 residue-level interface evaluation methods among 9108 models in the local residue interface test set of CASP15 multimers. In the CAMEO blind test (20 May 2022 to 13 August 2022), GraphCPLMQA ranked first compared with other servers (https://www.cameo3d.org/quality-estimation). GraphCPLMQA also outperforms state-of-the-art methods on 19,035 models in the CASP13 and CASP14 monomer test sets.


Subjects
Computational Biology, Neural Networks (Computer), Computational Biology/methods, Proteins/chemistry, Language
3.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37381618

ABSTRACT

Although sequencing-based high-throughput chromatin interaction data are widely used to uncover genome-wide three-dimensional chromatin architecture, their sparseness and low signal-to-noise ratio greatly restrict the precision of the obtained structural elements. To improve data quality, we present iEnhance (chromatin interaction data resolution enhancement), a multi-scale spatial projection and encoding network that predicts high-resolution chromatin interaction matrices from low-resolution, noisy input data. Specifically, iEnhance projects the input data into matrix spaces to extract multi-scale global and local feature sets, then hierarchically fuses these features with an attention mechanism. Dense channel encoding and residual channel decoding are then used to infer robust chromatin interaction maps. iEnhance outperforms state-of-the-art Hi-C resolution enhancement tools in both visual and quantitative evaluations. Comprehensive analysis shows that, unlike other tools, iEnhance can precisely recover both short-range structural elements and long-range interaction patterns. More importantly, iEnhance can be transferred to the enhancement of data from other tissues or cell lines of unknown resolution. Furthermore, iEnhance performs robustly on diverse chromatin interaction data, including single-cell Hi-C and Micro-C data.


Subjects
Chromatin, Chromosomes, Chromatin/genetics, Genome, Cell Line
4.
Proc Natl Acad Sci U S A ; 119(15): e2113561119, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35394862

ABSTRACT

Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multimodel ensemble forecast that combined predictions from dozens of groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-wk horizon three to five times larger than when predicting at a 1-wk horizon. This project underscores the role that collaboration and active coordination between governmental public-health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
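The abstract does not say how the Hub combined the individual forecasts. One common approach for quantile-format forecasts, assumed here, is to take the median across models at each quantile level. The sketch uses made-up numbers for three hypothetical models.

```python
from statistics import median

# Quantile levels reported by each hypothetical model for one target
# (e.g. weekly incident deaths in one state). All values are made up.
QUANTILE_LEVELS = [0.025, 0.25, 0.5, 0.75, 0.975]

model_forecasts = {
    "model_a": [80, 95, 100, 110, 130],
    "model_b": [70, 90, 105, 120, 150],
    "model_c": [90, 100, 108, 115, 125],
}


def median_ensemble(forecasts):
    """Combine quantile forecasts by taking the median across models at each level."""
    per_level = zip(*forecasts.values())  # group the i-th quantile of every model
    return [median(values) for values in per_level]


ensemble = median_ensemble(model_forecasts)
for level, value in zip(QUANTILE_LEVELS, ensemble):
    print(f"q{level}: {value}")
```

A median is robust to a single outlying model, which is one reason it is a popular default for combining many independently built forecasts.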


Subjects
COVID-19, COVID-19/mortality, Data Accuracy, Forecasting, Humans, Pandemics, Probability, Public Health/trends, United States/epidemiology
5.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35848999

ABSTRACT

Drug-induced liver injury (DILI) is one of the most significant concerns in medical practice, yet it still cannot be fully recapitulated with existing in vivo, in vitro and in silico approaches. To address this challenge, Chen et al. [1] developed a deep learning-based DILI prediction model based on chemical structure information alone. The reported model yielded outstanding prediction performance on a test set (AUC 0.958, accuracy 0.976, recall 0.935, precision 0.947, F1-score 0.926 and specificity 0.913), far outperforming all publicly available and similar in silico DILI models. This extraordinary performance runs counter to what we know about the underlying biology of DILI and to the principles and hypotheses behind this type of in silico approach. In this Letter to the Editor, we raise awareness of several issues concerning data curation, model validation and comparison practices, and data and model reproducibility.
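Apart from AUC, which needs ranked scores, the metrics listed above all derive from a single confusion matrix, and F1 in particular is fully determined by precision and recall. The sketch below computes them from hypothetical counts; the `ConfusionMatrix` type and the counts are illustrative, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def binary_metrics(cm: ConfusionMatrix) -> dict:
    """Threshold-based metrics; AUC is excluded because it needs ranked scores."""
    precision = cm.tp / (cm.tp + cm.fp)
    recall = cm.tp / (cm.tp + cm.fn)  # also called sensitivity
    specificity = cm.tn / (cm.tn + cm.fp)
    accuracy = (cm.tp + cm.tn) / (cm.tp + cm.fp + cm.tn + cm.fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, not the test set discussed above.
print(binary_metrics(ConfusionMatrix(tp=40, fp=10, tn=45, fn=5)))
```

Because these quantities are linked, a reader can cross-check a reported set of values for internal consistency.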


Subjects
Artificial Intelligence, Chemical and Drug Induced Liver Injury, Computer Simulation, Humans, Biological Models, Reproducibility of Results
6.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38465982

ABSTRACT

In many modern machine learning applications, changes in covariate distributions and difficulty in acquiring outcome information have posed challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to unlabeled target populations using existing labeled data from a source population. However, there is a paucity of literature on transferring performance metrics, especially receiver operating characteristic (ROC) parameters, of a trained model. In this paper, we aim to evaluate the performance of a trained binary classifier on an unlabeled target population based on ROC analysis. We propose Semisupervised Transfer lEarning of Accuracy Measures (STEAM), an efficient three-step estimation procedure that employs (1) double-index modeling to construct calibrated density ratio weights and (2) robust imputation to leverage the large amount of unlabeled data to improve estimation efficiency. We establish the consistency and asymptotic normality of the proposed estimator under correct specification of either the density ratio model or the outcome model. We also correct for potential overfitting bias of the estimators in finite samples using cross-validation. We compare our proposed estimators with existing methods and show reductions in bias and gains in efficiency through simulations. We illustrate the practical utility of the proposed method by evaluating the prediction performance of a phenotyping model for rheumatoid arthritis (RA) on a temporally evolving electronic health record (EHR) cohort.
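STEAM's double-index calibration and imputation steps are beyond a short example. The simpler idea underneath, reweighting labeled source data toward a target population, can be illustrated with a density-ratio-weighted AUC. This is not the STEAM estimator; the weights are assumed to be given (in practice they must be estimated), and the function name is hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """Density-ratio-weighted AUC estimated on labeled source data.

    weights[i] approximates p_target(x_i) / p_source(x_i). With equal weights
    this reduces to the ordinary Mann-Whitney AUC. Ties count as one half.
    """
    num = 0.0
    den = 0.0
    for s_i, y_i, w_i in zip(scores, labels, weights):
        if y_i != 1:
            continue  # outer loop runs over positives only
        for s_j, y_j, w_j in zip(scores, labels, weights):
            if y_j != 0:
                continue  # inner loop runs over negatives only
            pair_w = w_i * w_j
            den += pair_w
            if s_i > s_j:
                num += pair_w
            elif s_i == s_j:
                num += 0.5 * pair_w
    return num / den


# Up-weighting the low-scoring positive lowers the estimated target AUC.
print(weighted_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0], [1, 1, 3, 1]))
```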


Subjects
Machine Learning, Supervised Machine Learning, Humans, ROC Curve, Research Design, Bias
7.
Diabetes Obes Metab ; 26(2): 663-672, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38073424

ABSTRACT

AIM: To develop a visual prediction model for gestational diabetes (GD) in pregnant women and to establish an effective and practical tool for clinical application. METHODS: To establish the prediction model, the modelling set included 1756 women enrolled in the Zunyi birth cohort, the internal validation set included 1234 enrolled women, and pregnant women in the Wuhan cohort formed the external validation set. We established a demographic-lifestyle factor model (DLFM) and a demographic-lifestyle-environmental pollution factor model (DLEFM) based on whether the women's exposure to environmental pollutants was considered. Least absolute shrinkage and selection operator (LASSO)-logistic regression analyses were used to identify the independent predictors of GD and to construct a nomogram for predicting its occurrence. RESULTS: The DLEFM regression analysis showed that a family history of diabetes (odds ratio [OR] 2.28; 95% confidence interval [CI] 1.05-4.71), a history of GD (OR 4.22; 95% CI 1.89-9.41), being overweight or obese before pregnancy (OR 1.71; 95% CI 1.27-2.29), a history of hypertension (OR 2.61; 95% CI 1.41-4.72), sedentary time (h/day) (OR 1.16; 95% CI 1.08-1.24), monobenzyl phthalate (OR 1.95; 95% CI 1.45-2.67) and Q4 mono-ethyl phthalate concentration (OR 1.85; 95% CI 1.26-2.73) were independent predictors. The areas under the receiver operating characteristic curves for internal validation of the DLEFM and the DLFM constructed using these seven factors were 0.827 and 0.783, respectively. The calibration curve of the DLEFM was close to the diagonal. The DLEFM was therefore the better model and was selected. CONCLUSIONS: A nomogram based on preconception factors was constructed to predict the occurrence of GD in the second and third trimesters. It provides an effective tool for the early prediction and timely management of GD.
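Odds ratios such as those above are exponentiated logistic-regression coefficients. For an ordinary (unpenalized) logistic fit, a Wald 95% CI is exp(beta ± 1.96·SE). How the study derived its CIs after LASSO selection is not stated, so this sketch only shows the standard conversion, using a hypothetical coefficient and standard error.

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient.

    beta: log-odds coefficient; se: its standard error; z: normal quantile
    (1.96 for a 95% interval). The interval is symmetric on the log scale.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient: beta = ln(2), i.e. an odds ratio of 2.
or_, lower, upper = odds_ratio_ci(math.log(2), 0.2)
print(f"OR {or_:.2f}; 95% CI {lower:.2f}-{upper:.2f}")
```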


Subjects
Gestational Diabetes, Phthalic Acids, Pregnancy, Female, Humans, Gestational Diabetes/epidemiology, Life Style, Calibration
8.
Eur J Clin Pharmacol ; 80(6): 813-826, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38483544

ABSTRACT

BACKGROUND AND OBJECTIVES: Although polymyxin B (PMB) is used clinically to treat infections, its narrow therapeutic range, considerable interpatient variability in its pharmacokinetics and frequent occurrence of acute kidney injury have significantly hindered its widespread use. Recent research on the population pharmacokinetics (Pop-PK) of PMB has provided valuable insights. This study reviews the relevant literature to establish a theoretical foundation for individualized clinical management. METHODS: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, Pop-PK studies of PMB were searched in the PubMed and EMBASE databases from their inception until March 2023. RESULT: To date, 22 population-based studies have been conducted, encompassing 756 subjects across six countries. The recruited populations included critically ill individuals infected with multidrug-resistant bacteria, patients with varying renal function, patients with cystic fibrosis, kidney or lung transplant recipients, patients undergoing extracorporeal membrane oxygenation (ECMO) or continuous renal replacement therapy (CRRT), individuals with obesity, and pediatric populations. Among these studies, seven employed a one-compartment model, with typical clearance (CL) and volume (Vc) ranging from 1.18 to 2.5 L/h and from 12.09 to 47.2 L, respectively. Fifteen studies employed a two-compartment model, in which the ranges of clearance (CL), volume of the central compartment (Vc), volume of the peripheral compartment (Vp) and intercompartmental clearance (Q) were 1.27-8.65 L/h, 5.47-38.6 L, 4.52-174.69 L and 1.34-24.3 L/h, respectively. Primary covariates identified in these studies included creatinine clearance and body weight, while other covariates considered were CRRT, albumin, age and SOFA score. Internal evaluation was conducted in 19 studies, and only one study was externally validated using an independent dataset. CONCLUSION: We conclude that small sample sizes, lack of multicentre collaboration and patient homogeneity are the primary reasons for the discrepancies among the results of current studies. In addition, most studies were limited to internal evaluation, which has constrained the implementation of model-informed precision dosing strategies.
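To make the reported one-compartment parameters concrete, the sketch computes the plasma concentration after a single IV bolus, C(t) = (Dose/V)·exp(-(CL/V)·t), and the elimination half-life ln 2·V/CL. PMB is usually given by infusion, so the bolus form is a simplification; the 100 mg dose is hypothetical, while CL = 2 L/h and V = 20 L are simply values chosen from within the ranges reported above.

```python
import math


def one_compartment_bolus(dose_mg, cl_l_per_h, v_l, t_h):
    """Plasma concentration (mg/L) at time t after an IV bolus, one-compartment model."""
    k_el = cl_l_per_h / v_l  # first-order elimination rate constant (1/h)
    return dose_mg / v_l * math.exp(-k_el * t_h)


def half_life_h(cl_l_per_h, v_l):
    """Elimination half-life (h): ln(2) * V / CL."""
    return math.log(2) * v_l / cl_l_per_h


t_half = half_life_h(2.0, 20.0)
print(f"C0 = {one_compartment_bolus(100, 2.0, 20.0, 0.0):.2f} mg/L, t1/2 = {t_half:.2f} h")
```

The same two parameters explain why CL covariates such as creatinine clearance matter: a lower CL lengthens the half-life and raises exposure for a given dose.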


Subjects
Anti-Bacterial Agents, Polymyxin B, Humans, Polymyxin B/pharmacokinetics, Polymyxin B/administration & dosage, Anti-Bacterial Agents/pharmacokinetics, Anti-Bacterial Agents/administration & dosage, Biological Models, Extracorporeal Membrane Oxygenation, Critical Illness
9.
J Biomed Inform ; 157: 104692, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39009174

ABSTRACT

BACKGROUND: Inherent differences exist between male and female bodies, and the historical under-representation of females in clinical trials has widened this gap in existing healthcare data. The fairness of clinical decision-support tools is at risk when they are developed from biased data. This paper aims to quantitatively assess gender bias in risk prediction models, and to generalize the findings by performing the investigation on multiple use cases at different hospitals. METHODS: First, we conduct a thorough analysis of the source data to find gender-based disparities. Second, we assess model performance on different gender groups at different hospitals and on different use cases, quantified using the area under the receiver operating characteristic curve (AUROC). Lastly, we investigate the clinical implications of these biases by analyzing the underdiagnosis and overdiagnosis rates and by decision curve analysis (DCA). We also investigate the influence of model calibration on mitigating gender-related disparities in decision-making. RESULTS: Our data analysis reveals notable variations in incidence rates, AUROC and overdiagnosis rates across genders, hospitals and clinical use cases. However, the underdiagnosis rate is consistently higher in the female population. In general, the female population exhibits lower incidence rates, and the models perform worse when applied to this group. Furthermore, decision curve analysis shows no statistically significant difference in the models' clinical utility across gender groups within the range of thresholds of interest. CONCLUSION: The presence of gender bias in risk prediction models varies across clinical use cases and healthcare institutions. Although inherent differences are observed between male and female populations at the data source level, this variance does not affect the parity of clinical utility. Overall, the evaluations conducted in this study highlight the importance of continuously monitoring gender-based disparities from various perspectives in clinical risk prediction models.
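The abstract does not define its underdiagnosis rate precisely. A common choice, assumed here, is the false-negative rate among truly positive patients at a fixed decision threshold, computed separately for each group. The function name and the small example data are hypothetical.

```python
from collections import defaultdict


def underdiagnosis_rate_by_group(y_true, y_pred, groups):
    """False-negative rate among true positives, computed separately per group."""
    positives = defaultdict(int)
    missed = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 0:  # positive patient predicted negative: underdiagnosed
                missed[group] += 1
    return {g: missed[g] / positives[g] for g in positives}


y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
groups = ["F", "F", "F", "M", "M", "M", "M", "F"]
print(underdiagnosis_rate_by_group(y_true, y_pred, groups))
```

An overdiagnosis rate can be computed the same way as the false-positive rate among truly negative patients.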


Subjects
ROC Curve, Sexism, Humans, Female, Male, Sexism/statistics & numerical data, Risk Assessment/methods, Hospitals, Area Under Curve, Clinical Decision Support Systems
10.
Acta Pharmacol Sin ; 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39349764

ABSTRACT

Therapeutic antibodies are at the forefront of biotherapeutics, valued for their high target specificity and binding affinity. Despite their potential, optimizing antibodies for superior efficacy is costly in both money and time. Recent strides in computational methods and artificial intelligence (AI), especially generative diffusion models, have begun to address these challenges, offering novel approaches to antibody design. This review examines diffusion-based generative methodologies tailored to antibody design tasks, including de novo antibody design and optimization of complementarity-determining region (CDR) loops, along with their evaluation metrics. We aim to provide an exhaustive overview of this burgeoning field as a resource for applying diffusion-based generative models to antibody design.

11.
J Dairy Sci ; 107(9): 6771-6784, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38754833

ABSTRACT

Automated measurements of the ratio of methane to carbon dioxide concentrations, [CH4]:[CO2], in breath from individual animals (the so-called "sniffer technique"), together with estimated CO2 production, can be used to estimate CH4 production, provided that CO2 production can be reliably calculated. This would allow CH4 production to be estimated for individual cows in large cohorts, making it possible to rank cows according to their CH4 production and to use these values in breeding low CH4-emitting animals. Estimates of CO2 production are typically based on predictions of heat production, which can be calculated from body weight (BW), energy-corrected milk yield and days of pregnancy. The objectives of the present study were to develop predictions of CO2 production directly from milk production, dietary and animal variables, and to develop different models for different scenarios depending on available data. An international dataset of 2,244 records from individual lactating cows, including CO2 production and associated traits such as dry matter intake (DMI), diet composition, BW, milk production and composition, days in milk and days pregnant, was compiled as the training dataset. Research location and experiment nested within research location were included as random intercepts. The method of CO2 production measurement (respiration chamber [RC] or GreenFeed [GF]) was confounded with research location and was therefore excluded from the model. In total, 3 models were developed from the training dataset: model 1 ("best model"), which included all significant traits; model 2 ("on-farm model"), which excluded DMI; and model 3 ("reduced on-farm model"), which excluded both DMI and BW. Evaluation on test datasets containing either RC data (n = 103), GF data without additives (n = 478), or GF data restricted to observations in which cows were fed nitrate, 3-nitrooxypropanol (3-NOP) or a combination of nitrate and 3-NOP (GF+: n = 295) showed good precision for all 3 models, as illustrated by low slope bias both in absolute values (-0.22 to 0.097) and as a percentage of mean square error (MSE; 0.049 to 4.89). However, the mean bias (MB) indicated systematic overprediction and underprediction of CO2 production when the models were evaluated on the GF and RC test datasets, respectively. To address this bias, the 3 models were evaluated on a modified test dataset in which the measured CO2 production (g/d) was adjusted by the absolute MB obtained from evaluating each model on the RC, GF and GF+ test datasets: the MB was subtracted from measurements obtained by RC and added to measurements obtained by GF. With this modification, the absolute values of MB and MB as a percentage of MSE became negligible. In conclusion, the 3 models were precise in predicting CO2 production from lactating dairy cows.
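The abstract expresses bias as a percentage of MSE. A common way to do this splits MSE into mean bias, slope bias and random error; the sketch below uses that squared-term decomposition with population (divide-by-n) statistics, so the three parts sum exactly to MSE. The signed slope-bias values reported above (e.g. -0.22) suggest the study may use a different, for example regression-based, convention, so treat this as an illustrative assumption with made-up data.

```python
from statistics import fmean, pstdev


def mse_decomposition(observed, predicted):
    """Split mean squared error into mean bias, slope bias and random error.

    Population statistics are used so the three parts sum exactly to MSE.
    """
    mse = fmean((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_o, mean_p = fmean(observed), fmean(predicted)
    s_o, s_p = pstdev(observed), pstdev(predicted)
    cov = fmean((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    r = cov / (s_o * s_p)  # Pearson correlation of observed and predicted
    mean_bias = (mean_o - mean_p) ** 2
    slope_bias = (s_p - r * s_o) ** 2
    random_error = (1 - r ** 2) * s_o ** 2
    return {
        "mse": mse,
        "mean_bias_pct": 100 * mean_bias / mse,
        "slope_bias_pct": 100 * slope_bias / mse,
        "random_pct": 100 * random_error / mse,
        "mean_bias_signed": mean_p - mean_o,  # > 0 means overprediction on average
    }


observed = [10, 12, 14, 16, 18]   # hypothetical CO2 production values
predicted = [11, 13, 13, 17, 20]
print(mse_decomposition(observed, predicted))
```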


Subjects
Carbon Dioxide, Diet, Lactation, Methane, Milk, Animals, Cattle, Female, Carbon Dioxide/metabolism, Milk/metabolism, Milk/chemistry, Diet/veterinary, Methane/biosynthesis, Methane/metabolism, Animal Feed, Body Weight
12.
Ecotoxicol Environ Saf ; 275: 116240, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38520811

ABSTRACT

Modelling approaches to estimate the bioaccumulation of organic chemicals by earthworms are important for improving realism in chemical risk assessment. However, the applicability of existing models is uncertain, partly owing to the lack of independent datasets to test them. This study therefore conducted a comprehensive literature review of existing empirical and kinetic models that estimate the bioaccumulation of organic chemicals in earthworms, and gathered two independent datasets from the published literature to evaluate the predictive performance of these models. The Belfroid et al. (1995a) model is the best-performing empirical model, with 91.2% of simulated earthworm body residues within an order of magnitude of the observations. However, this model is limited to the more hydrophobic pesticides and to the earthworm species Eisenia fetida or Eisenia andrei. The kinetic model proposed by Jager et al. (2003b), which outperforms that of Armitage and Gobas (2007), predicted uptake of PCB 153 in the earthworm E. andrei to within a factor of 10. However, the applicability of Jager et al.'s model to other organic compounds and other earthworm species is unknown owing to the limited evaluation dataset. The model needs to be parameterised for different chemical, soil and species types before use, which restricts its applicability to risk assessment on a broad scale. Both the empirical and kinetic models leave room for improvement in their ability to reliably predict bioaccumulation in earthworms. Whether they are fit for purpose in environmental risk assessment needs careful consideration on a case-by-case basis.
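The "within an order of magnitude" criterion used above amounts to checking whether |log10(predicted/observed)| is at most 1. A minimal sketch with hypothetical residue values (all concentrations must be positive):

```python
import math


def fraction_within_factor(observed, predicted, factor=10.0):
    """Share of predictions whose ratio to the observation lies within [1/factor, factor]."""
    limit = math.log10(factor)
    hits = sum(
        1 for o, p in zip(observed, predicted)
        if abs(math.log10(p / o)) <= limit
    )
    return hits / len(observed)


# Hypothetical observed and simulated body residues (same units).
observed = [1.0, 2.0, 5.0, 0.1]
predicted = [3.0, 25.0, 0.4, 0.1]
print(fraction_within_factor(observed, predicted))
```

Working on the log scale treats over- and underprediction by the same factor symmetrically, which suits residues that span several orders of magnitude.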


Subjects
Oligochaeta, Pesticides, Soil Pollutants, Animals, Soil Pollutants/analysis, Bioaccumulation, Organic Chemicals, Soil/chemistry
13.
Proteins ; 91(12): 1800-1810, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37622458

ABSTRACT

Ribonucleic acid (RNA) molecules serve as master regulators of cells by encoding their biological function in the ribonucleotide sequence, particularly their ability to interact with other molecules. To understand how RNA molecules perform their biological tasks and to design new sequences with specific functions, it is of great benefit to be able to computationally predict how RNA folds and interacts in the cellular environment. Our workflow for computational modeling of the 3D structures of RNA and its interactions with other molecules uses a set of methods developed in our laboratory, including MeSSPredRNA for predicting canonical and non-canonical base pairs, PARNASSUS for detecting remote homology based on comparisons of sequences and secondary structures, ModeRNA for comparative modeling, the SimRNA family of programs for modeling RNA 3D structure and its complexes with other molecules, and QRNAS for model refinement. In this study, we present the results of testing this workflow in predicting RNA 3D structures in the CASP15 experiment. The overall high score of the computational models predicted by our group demonstrates the robustness of our workflow and its individual components in terms of predicting RNA 3D structures of acceptable quality that are close to the target structures. However, the variance in prediction quality is still quite high, and the results are still too far from the level of protein 3D structure predictions. This exercise led us to consider several improvements, especially to better predict and enforce stacking interactions and non-canonical base pairs.


Subjects
RNA, RNA/chemistry, Nucleic Acid Conformation, Molecular Models, Base Pairing, Computer Simulation
14.
Environ Sci Technol ; 57(25): 9224-9233, 2023 06 27.
Article in English | MEDLINE | ID: mdl-37294067

ABSTRACT

The use of passive air samplers (PAS) for semi-volatile organic compounds (SVOCs) continues to expand. To advance quantitative understanding of uptake kinetics, we calibrated the XAD-PAS, which uses a styrene-divinylbenzene sorbent, through a year-long side-by-side deployment with an active sampler. Twelve XAD-PASs deployed in June 2020 were retrieved at 4-week intervals, while gas-phase SVOCs were quantified in 48 consecutive week-long active samples taken from June 2020 to May 2021. Consistent with XAD's high uptake capacity, even relatively volatile SVOCs, such as hexachlorobutadiene, displayed linear uptake throughout the entire deployment. Sampling rates (SRs) ranged between 0.1 and 0.6 m³ day⁻¹ for 26 SVOCs, including brominated flame retardants, organophosphate esters and halogenated methoxylated benzenes. These SRs are compared with experimental SRs reported previously. The ability of the existing mechanistic uptake model PAS-SIM to reproduce the observed uptake and SRs was evaluated. Agreement between simulated and measured uptake curves was reasonable but varied with compound volatility and the assumed thickness of the stagnant air boundary layer. Although PAS-SIM succeeds in predicting the SR range for the studied SVOCs, it fails to capture the volatility dependence of the SR because it underestimates the length of the linear uptake period and does not consider the kinetics of sorption.
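During linear uptake, the amount sequestered grows as SR × C_air × t, so a sampling rate can be estimated as the slope of sequestered amount versus deployment time divided by the gas-phase concentration from the active sampler. The sketch fits that slope by least squares through the origin; the amounts and the 0.5 ng/m³ concentration are hypothetical, and the method only applies while uptake remains linear.

```python
def sampling_rate_m3_per_day(days, sequestered_ng, air_conc_ng_per_m3):
    """Sampling rate (m3/day) from the slope of a linear uptake curve.

    Fits amount = slope * days through the origin by least squares, then
    divides the slope (ng/day) by the mean gas-phase concentration (ng/m3).
    """
    slope = sum(t * m for t, m in zip(days, sequestered_ng)) / sum(t * t for t in days)
    return slope / air_conc_ng_per_m3


# Hypothetical samplers retrieved after 4, 8 and 12 weeks.
days = [28, 56, 84]
amounts_ng = [5.6, 11.2, 16.8]
print(sampling_rate_m3_per_day(days, amounts_ng, 0.5))
```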


Subjects
Air Pollutants, Volatile Organic Compounds, Volatile Organic Compounds/analysis, Calibration, Air Pollutants/analysis, Environmental Monitoring, Kinetics
15.
Environ Res ; 237(Pt 1): 116857, 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-37579963

ABSTRACT

Against the backdrop of global warming, rapid urbanization has caused urban building spaces to aggregate, and the heat island effect is becoming increasingly serious, hindering sustainable urban development. To investigate the potential of green roofs in different types of neighborhoods to mitigate the urban heat island effect, and methods for doing so, this study used multivariate data for surface temperature inversion and local climate zone (LCZ) classification, and evaluated the potential of green roofs to reduce the heat island effect by combining LCZ zoning with the ENVI-met prediction model. Finally, a multi-scenario analysis including economic factors was conducted to derive the optimal implementation path for green roofs. The results show that across LCZs 1-9, green roofs can reduce the daytime average air temperature by up to 0.41 °C at 0.5 m above the roof (in LCZ8) and by up to 0.37 °C at the 1.2 m pedestrian level (in LCZ6). Based on the surface cooling efficiency of green roofs in each LCZ, the best construction order is LCZ3, LCZ6, LCZ8 > LCZ2, LCZ5, LCZ7 > LCZ1, LCZ4, LCZ9. Constructing green roofs in the heat island areas within the fifth ring road of Beijing can reduce the area of high-temperature and sub-high-temperature zones by 52.55% and 29.17%, respectively, compared with no green roof construction. The study clarifies a methodology for assessing the cooling efficiency of green roofs in different types of neighborhoods and their reduction of the urban-scale heat island effect, providing a reference for planning green roofs on urban buildings.

16.
BMC Public Health ; 23(1): 782, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37118796

ABSTRACT

BACKGROUND: The COVID-19 pandemic has highlighted the role of infectious disease forecasting in informing public policy. However, significant barriers remain for effectively linking infectious disease forecasts to public health decision making, including a lack of model validation. Forecasting model performance and accuracy should be evaluated retrospectively to understand under which conditions models were reliable and could be improved in the future. METHODS: Using archived forecasts from the California Department of Public Health's California COVID Assessment Tool ( https://calcat.covid19.ca.gov/cacovidmodels/ ), we compared how well different forecasting models predicted COVID-19 hospitalization census across California counties and regions during periods of Alpha, Delta, and Omicron variant predominance. RESULTS: Based on mean absolute error estimates, forecasting models had variable performance across counties and through time. When accounting for model availability across counties and dates, some individual models performed consistently better than the ensemble model, but model rankings still differed across counties. Local transmission trends, variant prevalence, and county population size were informative predictors for determining which model performed best for a given county based on a random forest classification analysis. Overall, the ensemble model performed worse in less populous counties, in part because of fewer model contributors in these locations. CONCLUSIONS: Ensemble model predictions could be improved by incorporating geographic heterogeneity in model coverage and performance. Consistency in model reporting and improved model validation can strengthen the role of infectious disease forecasting in real-time public health decision making.
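Ranking forecasting models by mean absolute error for a single county can be sketched as below. The model names and hospitalization counts are hypothetical, and the study's handling of model availability across counties and dates is not reproduced.

```python
from statistics import fmean


def mean_absolute_error(observed, forecast):
    """Average absolute difference between observed and forecast values."""
    return fmean(abs(o - f) for o, f in zip(observed, forecast))


def rank_models(observed, forecasts_by_model):
    """Return model names sorted from lowest to highest MAE for one county."""
    scores = {
        name: mean_absolute_error(observed, fc)
        for name, fc in forecasts_by_model.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical hospitalization census for one county over three dates.
observed = [100, 120, 150]
forecasts = {
    "ensemble": [110, 115, 140],
    "model_x": [98, 125, 152],
    "model_y": [130, 140, 170],
}
print(rank_models(observed, forecasts))
```

Repeating this per county shows how, as the abstract reports, an individual model can beat the ensemble in some places while rankings differ elsewhere.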


Subjects
COVID-19, Communicable Diseases, Humans, Pandemics, Retrospective Studies, COVID-19/epidemiology, SARS-CoV-2, Communicable Diseases/epidemiology, California/epidemiology, Public Policy, Decision Making, Hospitalization, Forecasting
17.
J Med Internet Res ; 25: e48763, 2023 08 31.
Article in English | MEDLINE | ID: mdl-37651179

ABSTRACT

BACKGROUND: The reporting of machine learning (ML) prognostic and diagnostic modeling studies is often inadequate, making it difficult to understand and replicate such studies. To address this issue, multiple consensus and expert reporting guidelines for ML studies have been published. However, these guidelines cover different parts of the analytics lifecycle, and individually, none of them provide a complete set of reporting requirements. OBJECTIVE: We aimed to consolidate the ML reporting guidelines and checklists in the literature to provide reporting items for prognostic and diagnostic ML in in-silico and shadow mode studies. METHODS: We conducted a literature search that identified 192 unique peer-reviewed English articles that provide guidance and checklists for reporting ML studies. The articles were screened by their title and abstract against a set of 9 inclusion and exclusion criteria. Articles that were filtered through had their quality evaluated by 2 raters using a 9-point checklist constructed from guideline development good practices. The average κ was 0.71 across all quality criteria. The resulting 17 high-quality source papers were defined as having a quality score equal to or higher than the median. The reporting items in these 17 articles were consolidated and screened against a set of 6 inclusion and exclusion criteria. The resulting reporting items were sent to an external group of 11 ML experts for review and updated accordingly. The updated checklist was used to assess the reporting in 6 recent modeling papers in JMIR AI. Feedback from the external review and initial validation efforts was used to improve the reporting items. RESULTS: In total, 37 reporting items were identified and grouped into 5 categories based on the stage of the ML project: defining the study details, defining and collecting the data, modeling methodology, model evaluation, and explainability. None of the 17 source articles covered all the reporting items. 
The study details and data description reporting items were the most common in the source literature, with explainability and methodology guidance (ie, data preparation and model training) having the least coverage. For instance, a median of 75% of the data description reporting items appeared in each of the 17 high-quality source guidelines, but only a median of 33% of the data explainability reporting items appeared. The highest-quality source articles tended to have more items on reporting study details. Other categories of reporting items were not related to the source article quality. We converted the reporting items into a checklist to support more complete reporting. CONCLUSIONS: Our findings supported the need for a set of consolidated reporting items, given that existing high-quality guidelines and checklists do not individually provide complete coverage. The consolidated set of reporting items is expected to improve the quality and reproducibility of ML modeling studies.
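The quality-screening step above (2 raters, an average κ across criteria, and a median cutoff for "high quality") can be illustrated with a minimal sketch. It assumes unweighted Cohen's kappa computed per quality criterion and then averaged; the abstract does not state which kappa variant was used, and the function names and input layout (`cohen_kappa`, `average_kappa`, `select_high_quality`, criterion-to-labels dictionaries) are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[lab] * counts_b[lab] for lab in labels) / (n * n)
    if p_expected == 1:
        # Both raters used one identical label for every item; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def average_kappa(ratings_a, ratings_b):
    """Mean kappa across criteria; each argument maps criterion -> list of labels (hypothetical layout)."""
    return mean(cohen_kappa(ratings_a[c], ratings_b[c]) for c in ratings_a)


def select_high_quality(scores):
    """Return the articles whose quality score is at least the median score."""
    cutoff = median(scores.values())
    return sorted(name for name, score in scores.items() if score >= cutoff)
```

For example, `select_high_quality({"A": 9, "B": 5, "C": 7, "D": 6})` uses a median of 6.5 and keeps `["A", "C"]`. Because the cutoff is "equal to or higher than the median", ties at the median are retained, which is why roughly half of the screened articles survive.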


Subjects
Checklist , Machine Learning , Humans , Prognosis , Reproducibility of Results , Consensus
18.
Ecotoxicol Environ Saf ; 255: 114806, 2023 Apr 15.
Article in English | MEDLINE | ID: mdl-36948010

ABSTRACT

Cancer, the second leading cause of death among human diseases, has become a major public health problem. Predicting the carcinogenicity of chemicals before they are synthesized is therefore crucial. In this paper, seven machine learning algorithms (i.e., Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVM), Complement Naive Bayes (CNB), K-Nearest Neighbor (KNN), XGBoost, and Multilayer Perceptron (MLP)) were used to construct the carcinogenicity triple classification prediction (TCP) model (i.e., Category 1A, 1B, and 2). A total of 1444 descriptors of 118 hazardous organic chemicals were calculated with Discovery Studio 2020, Sybyl X-2.0 and PaDEL-Descriptor software. The constructed carcinogenicity TCP model was evaluated with five indicators (i.e., Accuracy, Precision, Recall, F1 Score and AUC). The evaluation results show that all five indicators meet the requirement (greater than 0.6). The accuracy of the RF, LR, XGBoost, and MLP models for predicting Category 2 carcinogenicity is 91.67%, 79.17%, 100%, and 100%, respectively. In addition, the machine learning models constructed in this study show potential for error correction. Taking the XGBoost model as an example, the predicted carcinogenicity level of 1,2,3-Trichloropropane (96-18-4) is Category 2, whereas the actual level is 1B. However, the difference between the Category 2 and 1B predictions is only 0.004, indicating that XGBoost is the optimal model among the seven machine learning models constructed. The results also showed that functional groups such as chlorine and the benzene ring might influence the prediction of carcinogenicity classification. Therefore, considering the functional group characteristics of chemicals before constructing a carcinogenicity prediction model for organic chemicals is recommended. The carcinogenicity of the organic chemicals predicted with the optimal machine learning model (i.e., XGBoost) was also evaluated and verified using toxicokinetics.
The RF and XGBoost TCP models constructed in this paper can be used for carcinogenicity detection before new organic substances are synthesized. They also provide technical support for the subsequent management of organic chemicals.
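The evaluation indicators named above (Accuracy, Precision, Recall, F1 Score) can be computed for a three-class problem as in the minimal sketch below. It assumes one-vs-rest per-class metrics with an unweighted (macro) F1 average; the abstract does not say which averaging scheme was used, and the function name and the label strings `"1A"`, `"1B"`, `"2"` are illustrative choices. AUC is omitted here; for a multiclass setting, scikit-learn's `roc_auc_score` with `multi_class="ovr"` is one common option.

```python
LABELS = ("1A", "1B", "2")  # hypothetical encoding of the three carcinogenicity categories


def classification_report(y_true, y_pred, labels=LABELS):
    """Accuracy, one-vs-rest precision/recall/F1 per class, and macro-averaged F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in labels:
        # Treat `label` as the positive class and every other class as negative.
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = (precision, recall, f1)
    # Macro average: every class counts equally regardless of its size.
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(labels)
    return accuracy, per_class, macro_f1
```

Macro averaging is a reasonable default for a small data set such as 118 chemicals, because it keeps a minority category from being swamped by the majority class. A weighted average would instead reflect class frequencies.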


Subjects
Carcinogens , Hazardous Substances , Machine Learning , Organic Chemicals , Bayes Theorem , Carcinogenesis , Carcinogens/toxicity , Carcinogens/chemistry , Hazardous Substances/chemistry , Hazardous Substances/toxicity , Organic Chemicals/toxicity , Organic Chemicals/chemistry , Support Vector Machine , World Health Organization , Algorithms , United States , European Union , China , Databases, Factual
19.
Prev Sci ; 24(3): 467-479, 2023 04.
Article in English | MEDLINE | ID: mdl-34519939

ABSTRACT

Statistical analysis of categorical data often relies on multiway contingency tables; yet, as the number of categories and/or variables increases, so does the number of table cells with few (or zero) observations. Unfortunately, sparse contingency tables invalidate the use of standard goodness-of-fit statistics. Limited-information fit statistics and bootstrapping procedures offer valuable solutions to this problem, but they raise an additional concern because they rely strictly on the (potentially misleading) observed data. To address both of these issues, we demonstrate the Bayesian model checking technique, which yields insightful, useful, and comprehensive evaluations of specific properties of a given model. We illustrate this technique using item response data from a patient-reported psychopathology screening questionnaire, and we provide annotated R code to promote the dissemination of this informative method to other prevention science modeling scenarios.
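The core of Bayesian model checking is the posterior predictive p-value: draw parameters from the posterior, simulate replicated data sets, and ask how often a chosen discrepancy statistic on the replicates is at least as extreme as on the observed data. The sketch below is deliberately much simpler than the IRT analysis in the paper and is not the authors' annotated R code. It assumes a hypothetical model in which every binary item shares one endorsement probability θ with a Beta prior, and it uses the variance of respondents' total scores as the discrepancy, a choice made here for illustration because such a model cannot reproduce the person-to-person spread that a latent trait creates.

```python
import random
from statistics import pvariance


def posterior_predictive_pvalue(responses, n_draws=2000, prior=(1.0, 1.0), seed=0):
    """Posterior predictive p-value for a single-theta Bernoulli model (illustrative only).

    responses: list of rows (respondents), each a list of 0/1 item responses.
    prior: (alpha, beta) of the Beta prior on theta.
    """
    if not responses or not responses[0]:
        raise ValueError("responses must be a non-empty respondents x items matrix")
    rng = random.Random(seed)
    n_people, n_items = len(responses), len(responses[0])
    endorsed = sum(map(sum, responses))
    # Conjugate update: theta | data ~ Beta(alpha + successes, beta + failures).
    alpha = prior[0] + endorsed
    beta = prior[1] + n_people * n_items - endorsed
    # Discrepancy on the observed data: variance of respondents' total scores.
    t_obs = pvariance([sum(row) for row in responses])
    exceed = 0
    for _ in range(n_draws):
        theta = rng.betavariate(alpha, beta)
        # Replicate a data set of the same shape under the drawn theta.
        scores = [sum(rng.random() < theta for _ in range(n_items)) for _ in range(n_people)]
        if pvariance(scores) >= t_obs:
            exceed += 1
    # Values near 0 or 1 flag a model property the data do not support.
    return exceed / n_draws
```

Unlike a single omnibus fit statistic, the result depends on the discrepancy chosen, which is what lets this approach target specific model properties. Swapping in a different statistic (for example, pairwise item associations) checks a different aspect of fit.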


Subjects
Models, Statistical , Models, Theoretical , Humans , Bayes Theorem , Research Design
20.
Prev Sci ; 24(3): 455-466, 2023 04.
Article in English | MEDLINE | ID: mdl-33970410

ABSTRACT

The Tucker-Lewis index (TLI; Tucker & Lewis, 1973), also known as the non-normed fit index (NNFI; Bentler & Bonett, 1980), is one of the many incremental fit indices widely used in linear mean and covariance structure modeling, particularly in exploratory factor analysis, both of which are popular tools in prevention research. It augments the information provided by other indices such as the root-mean-square error of approximation (RMSEA). In this paper, we develop and examine an analogous index for categorical item-level data modeled with item response theory (IRT). The proposed Tucker-Lewis index for IRT (TLIRT) is based on Maydeu-Olivares and Joe's (2005) [Formula: see text] family of limited-information overall model fit statistics. These limited-information fit statistics have a significantly better chi-square approximation and more power than traditional full-information Pearson or likelihood ratio statistics in realistic situations. Building on the incremental fit assessment principle, the TLIRT locates the fit of the model under consideration on a spectrum ranging from the worst to the best possible model fit. We examine the performance of the new index using simulated and empirical data. Results from a simulation study suggest that the new index behaves as theoretically expected and can offer additional insights about model fit not available from other sources. In addition, a cutoff value more stringent than Hu and Bentler's (1999) traditional criterion for continuous variables may be needed. In the empirical data analysis, we use a data set from a measurement development project supporting cigarette smoking cessation research to illustrate the usefulness of the TLIRT. We found that, had we used only the RMSEA index, we could have arrived at qualitatively different conclusions about model fit depending on the choice of test statistic, an issue to which the TLIRT is less susceptible.
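The incremental-fit principle behind the TLI can be made concrete with the standard TLI/NNFI formula, which compares the chi-square-to-df ratio of the model under consideration with that of a baseline (worst-fitting) model, scaled so that a model whose statistic equals its degrees of freedom scores 1. The sketch below shows that classical formula together with a common RMSEA definition for comparison. It assumes generic chi-square-type inputs; the paper's TLIRT substitutes limited-information statistics and its own choice of baseline, whose exact definitions are given in the paper rather than here, and the function names are hypothetical.

```python
import math


def tucker_lewis_index(chi2_null, df_null, chi2_model, df_model):
    """Classical TLI/NNFI from baseline (null) and target model chi-square statistics."""
    if df_null <= 0 or df_model <= 0:
        raise ValueError("degrees of freedom must be positive")
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    if ratio_null == 1.0:
        raise ValueError("baseline chi2/df equals 1; the index is undefined")
    # 1 when the model's chi2/df is 1 (its expected value under correct fit); near 0 at baseline fit.
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


def rmsea(chi2, df, n):
    """RMSEA using the (N - 1) convention; some software divides by N instead."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```

For instance, a baseline model with χ² = 1000 on 45 df and a target model with χ² = 60 on 35 df give a TLI of about 0.966, while `rmsea(60, 35, 500)` is about 0.038. Because the TLI is anchored to a baseline rather than to the sample size alone, it can tell a different story than the RMSEA when the test statistic changes, which is the contrast the abstract draws.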


Subjects
Factor Analysis , Humans , Psychometrics , Reproducibility of Results