2.
Big Data ; 11(3): 199-214, 2023 06.
Article in English | MEDLINE | ID: mdl-34612727

ABSTRACT

Although confirmatory modeling has dominated much of applied research in medical, business, and behavioral sciences, modeling large data sets with the goal of accurate prediction has become more widely accepted. The current practice for fitting predictive models is guided by heuristic-based modeling frameworks that lead researchers to make a series of often isolated decisions regarding data preparation and cleaning that may result in substandard predictive performance. In this article, we use an experimental design to evaluate the impact of six factors related to data preparation and model selection (techniques for numerical imputation, categorical imputation, encoding, subsampling for unbalanced data, feature selection, and machine learning algorithm) and their interactions on the predictive accuracy of models applied to a large, publicly available heart transplantation database. Our factorial experiment includes 10,800 models evaluated on 5 independent test partitions of the data. Results confirm that some decisions made early in the modeling process interact with later decisions to affect predictive performance; therefore, the current practice of making these decisions independently can negatively affect predictive outcomes. A key result of this case study is to highlight the need for improved rigor in applied predictive research. By using the scientific method to inform predictive modeling, we can work toward a framework for applied predictive modeling and a standard for reproducibility in predictive research.
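The full factorial design described in this abstract can be sketched as follows. This is only an illustration: the abstract names the six factors but not their levels, so every level below is a hypothetical placeholder and the resulting count does not reproduce the 10,800 models reported.

```python
from itertools import product

# The six factor names come from the abstract; the levels are
# hypothetical placeholders, since the abstract does not list them.
factors = {
    "numerical_imputation": ["mean", "median"],
    "categorical_imputation": ["mode", "missing_category"],
    "encoding": ["one_hot", "ordinal"],
    "subsampling": ["none", "undersample", "oversample"],
    "feature_selection": ["none", "filter"],
    "algorithm": ["logistic_regression", "random_forest"],
}

# Full factorial design: one configuration per combination of levels,
# so interactions between early and late decisions can be estimated.
names = list(factors)
design = [dict(zip(names, combo)) for combo in product(*factors.values())]

print(len(design))  # 2 * 2 * 2 * 3 * 2 * 2 = 96 configurations
print(design[0])
```

Enumerating every combination, rather than tuning one decision at a time, is what allows interactions between data preparation and model selection choices to be measured.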


Subjects
Algorithms , Machine Learning , Reproducibility of Results , Databases, Factual
3.
JMIR Public Health Surveill ; 8(7): e32164, 2022 07 19.
Article in English | MEDLINE | ID: mdl-35476722

ABSTRACT

BACKGROUND: Socially vulnerable communities are at increased risk for adverse health outcomes during a pandemic. Although this association has been established for H1N1, Middle East respiratory syndrome (MERS), and COVID-19 outbreaks, understanding the factors influencing the outbreak pattern for different communities remains limited. OBJECTIVE: Our 3 objectives are to determine how many distinct clusters of time series there are for COVID-19 deaths in 3108 contiguous counties in the United States, how the clusters are geographically distributed, and what factors influence the probability of cluster membership. METHODS: We proposed a 2-stage data analytic framework that can account for different levels of temporal aggregation for the pandemic outcomes and community-level predictors. Specifically, we used time-series clustering to identify clusters with similar outcome patterns for the 3108 contiguous US counties. Multinomial logistic regression was used to explain the relationship between community-level predictors and cluster assignment. We analyzed county-level confirmed COVID-19 deaths from Sunday, March 1, 2020, to Saturday, February 27, 2021. RESULTS: Four distinct patterns of deaths were observed across the contiguous US counties. The multinomial regression model correctly classified 1904 (61.25%) of the counties' outbreak patterns/clusters. CONCLUSIONS: Our results provide evidence that county-level patterns of COVID-19 deaths are different and can be explained in part by social and political predictors.


Subjects
COVID-19 , Influenza A Virus, H1N1 Subtype , Cluster Analysis , Humans , SARS-CoV-2 , Time Factors , United States/epidemiology
4.
PLoS One ; 16(11): e0242896, 2021.
Article in English | MEDLINE | ID: mdl-34731173

ABSTRACT

OBJECTIVE: The COVID-19 pandemic in the U.S. has exhibited a distinct multiwave pattern beginning in March 2020. Paradoxically, most counties do not exhibit this same multiwave pattern. We aim to answer three research questions: (1) How many distinct clusters of counties exhibit similar COVID-19 patterns in the time-series of daily confirmed cases? (2) What is the geographic distribution of the counties within each cluster? and (3) Are county-level demographic, socioeconomic and political variables associated with the COVID-19 case patterns? MATERIALS AND METHODS: We analyzed data from counties in the U.S. from March 1, 2020 to January 2, 2021. Time series clustering identified clusters in the daily confirmed cases of COVID-19. An explanatory model was used to identify demographic, socioeconomic and political variables associated with the outbreak patterns. RESULTS: Three patterns were identified from the cluster solution including counties in which cases are still increasing, those that peaked in the late fall, and those with low case counts to date. Several county-level demographic, socioeconomic, and political variables showed significant associations with the identified clusters. DISCUSSION: The pattern of the outbreak is related both to the geographic location within the U.S. and several variables including population density and government response. CONCLUSION: The reported pattern of cases in the U.S. is observed through aggregation of the daily confirmed COVID-19 cases, suggesting that local trends may be more informative. The pattern of the outbreak varies by county, and is associated with important demographic, socioeconomic, political and geographic factors.
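The point about aggregation can be illustrated with a toy example. The counts below are invented for illustration and are not taken from the study; the helper `count_peaks` is likewise a hypothetical name. The sketch shows how three counties that each have a single wave at different times can sum to a multiwave national curve.

```python
def count_peaks(series):
    """Count strict local maxima in a sequence."""
    return sum(
        1
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] > series[i + 1]
    )

# Toy weekly case counts for three hypothetical counties (not real data).
county_a = [1, 5, 9, 5, 1, 0, 0, 0, 0, 0, 0, 0]
county_b = [0, 0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0]
county_c = [0, 0, 0, 0, 0, 0, 0, 1, 5, 9, 5, 1]

# Aggregate to the national level by summing across counties.
national = [a + b + c for a, b, c in zip(county_a, county_b, county_c)]

for name, series in [("A", county_a), ("B", county_b), ("C", county_c)]:
    print(name, count_peaks(series))  # each county has one peak
print("national", count_peaks(national))  # the sum has three peaks
```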


Subjects
COVID-19/epidemiology , Cluster Analysis , Humans , Models, Biological , Retrospective Studies , Time and Motion Studies , United States/epidemiology
5.
Appl Ergon ; 90: 103262, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32927403

ABSTRACT

Advancements in sensing and network technologies have increased the amount of data being collected to monitor worker conditions. In this study, we consider the use of time series methods to forecast physical fatigue using subjective ratings of perceived exertion (RPE) and gait data from wearable sensors captured during a simulated in-lab manual material handling task (Lab Study 1) and a fatiguing squatting with intermittent walking cycle (Lab Study 2). To determine whether time series models can accurately forecast individual response and for how many time periods ahead, five models were compared: naïve method, autoregression (AR), autoregressive integrated moving average (ARIMA), vector autoregression (VAR), and the vector error correction model (VECM). For forecasts of three or more time periods ahead, the VECM that incorporates historical RPE and wearable sensor data outperformed the other models, with median mean absolute error (MAE) <1.24 and median MAE <1.22 across all participants for Lab Study 1 and Lab Study 2, respectively. These results suggest that wearable sensor data can support forecasting a worker's condition and that the forecasts obtained are as good as those of current state-of-the-art models using multiple sensors for current time prediction.
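As a rough illustration of the baseline and the error metric named in this abstract, the sketch below implements a naïve forecast (repeat the last observed value) and the mean absolute error. The RPE values and the function names are hypothetical and are not taken from the study; the AR, ARIMA, VAR, and VECM models are not shown.

```python
def naive_forecast(history, horizon):
    """Naive method: repeat the last observed value for each future period."""
    return [history[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical RPE ratings for one participant (invented for illustration).
rpe = [7, 8, 9, 11, 12, 13, 14]
train, test = rpe[:4], rpe[4:]

forecast = naive_forecast(train, len(test))  # three periods ahead
mae = mean_absolute_error(test, forecast)

print(forecast)  # [11, 11, 11]
print(mae)       # (1 + 2 + 3) / 3 = 2.0
```

Any candidate model would need to achieve a lower MAE than this baseline at the same forecast horizon to justify its added complexity.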


Subjects
Physical Exertion , Wearable Electronic Devices , Fatigue/diagnosis , Forecasting , Humans , Research Design