Results 1 - 13 of 13
1.
Sci Rep ; 13(1): 22930, 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38129635

ABSTRACT

Time series data collected using wireless sensors, such as temperature and humidity measurements, can provide insight into a building's heating, ventilation, and air conditioning (HVAC) system. Anomalies in these sensor measurements can be used to identify locations in a building that are poorly designed or maintained. Resolving the anomalies present in these locations can improve occupants' thermal comfort, as well as air quality and energy efficiency in that space. In this study, we developed a scoring method to identify sensors that show collective anomalies due to environmental issues, which in turn identifies problematic locations within commercial and institutional buildings. A Dynamic Time Warping (DTW)-based anomaly detection method was applied to identify collective anomalies. Then, a score for each sensor was obtained by taking the weighted sum of the number of anomalies, the vertical distance to an anomaly point, and the DTW distance. The weights were optimized using a well-defined simulation study and a grid search algorithm. Finally, we evaluated the performance of the scoring method using a synthetic data set and the results of a case study. In conclusion, the newly developed scoring method successfully detects collective anomalies even with only one week of data, whereas machine learning models need more data for training.
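
A minimal sketch of two ingredients named above, for illustration only: the classic dynamic-programming DTW distance between two series, and a per-sensor score formed as a weighted sum of the three components. The function names and the weights `(0.5, 0.3, 0.2)` are hypothetical placeholders (not the grid-search-optimized values), and the anomaly count and vertical distance are assumed to be computed upstream.

```python
from typing import Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: step in a only, step in b only, or step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def sensor_score(
    n_anomalies: int,
    vertical_distance: float,
    dtw_dist: float,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical weights
) -> float:
    """Weighted sum of the three anomaly components for one sensor."""
    w_count, w_vert, w_dtw = weights
    return w_count * n_anomalies + w_vert * vertical_distance + w_dtw * dtw_dist


if __name__ == "__main__":
    reference = [21.0, 21.5, 22.0, 22.0, 21.5]  # hypothetical reference temperature profile
    sensor = [21.0, 21.0, 23.5, 24.0, 21.5]     # hypothetical sensor readings
    d = dtw_distance(reference, sensor)
    print(sensor_score(n_anomalies=3, vertical_distance=1.8, dtw_dist=d))
```

Sensors would then be ranked by descending score, so the highest-scoring sensors point to the locations to inspect first.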

2.
PeerJ Comput Sci ; 8: e1081, 2022.
Article in English | MEDLINE | ID: mdl-36262135

ABSTRACT

High-dimensional classification problems have gained increasing attention in machine learning, and feature selection has become essential when executing machine learning algorithms. In general, most feature selection methods compare the scores of several feature subsets and select the one that gives the maximum score. However, there may be smaller feature subsets with a slightly lower score, where the difference is negligible. This article proposes and applies an extended version of such feature selection methods, which selects a smaller feature subset whose performance is within a pre-defined threshold of the original subset. It further validates the results of this extended version of Principal Component Loading Feature Selection (PCLFS-ext) by simulating data for several practical scenarios with different numbers of features and different imbalance rates, across several classification methods. Our simulation results show that the proposed method outperforms the original PCLFS and the existing Recursive Feature Elimination (RFE) method by giving reasonable feature reduction on various data sets, which is important in some applications.
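
The selection rule of the extension can be sketched as follows, assuming the threshold is an absolute difference in score (e.g. accuracy) and that the best subset of each size has already been scored by the base method. `smallest_within_threshold` is a hypothetical helper, not code from the article.

```python
from typing import Dict


def smallest_within_threshold(scores: Dict[int, float], threshold: float) -> int:
    """Smallest subset size whose score is within `threshold` of the best score.

    `scores` maps a feature-subset size k to the score of the best subset of that size.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    best = max(scores.values())
    # the size attaining `best` always qualifies, so min() never sees an empty sequence
    return min(k for k, s in scores.items() if best - s <= threshold)


if __name__ == "__main__":
    # hypothetical accuracies of the best subsets of 2, 4, 6 and 8 features
    scores = {2: 0.80, 4: 0.88, 6: 0.90, 8: 0.91}
    print(smallest_within_threshold(scores, threshold=0.02))
```

With a threshold of 0 the rule reduces to the original "maximum score" selection; larger thresholds trade a small, bounded loss in score for fewer features.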

3.
JMIR Form Res ; 6(9): e37984, 2022 Sep 07.
Article in English | MEDLINE | ID: mdl-36069846

ABSTRACT

BACKGROUND: The COVID-19 pandemic is a substantial public health crisis that negatively affects human health and well-being. After being infected with the coronavirus, patients can experience long-term health effects known as long COVID syndrome. This syndrome is characterized by multiple symptoms, and it is crucial to identify them because they may negatively impact patients' day-to-day lives. Breathlessness, fatigue, and brain fog are the 3 most common continuing and debilitating symptoms that patients with long COVID have reported, often months after the onset of COVID-19. OBJECTIVE: This study aimed to understand the patterns and behavior of long COVID symptoms reported by patients on the Twitter social media platform, which is vital to improving our understanding of long COVID. METHODS: Long COVID-related Twitter data were collected from May 1, 2020, to December 31, 2021. We used association rule mining techniques to identify frequent symptoms and establish relationships between symptoms among patients with long COVID in Twitter social media discussions. Detection based on the highest confidence level was used to determine the most significant rules, with 10% minimum confidence, 0.01% minimum support, and a positive lift. RESULTS: Among the 30,327 tweets included in our study, the most frequent symptoms were brain fog (n=7812, 25.8%), fatigue (n=5284, 17.4%), breathing/lung issues (n=4750, 15.7%), heart issues (n=2900, 9.6%), flu symptoms (n=2824, 9.3%), depression (n=2256, 7.4%), and general pains (n=1786, 5.9%). Loss of smell and taste, cold, cough, chest pain, fever, headache, and arm pain emerged in 1.6% (n=474) to 5.3% (n=1616) of patients with long COVID. Furthermore, highest confidence level-based detection demonstrated the potential of association analysis and the Apriori algorithm to establish patterns, yielding 57 meaningful relationship rules among long COVID symptoms. The strongest relationship revealed that patients with lung/breathing problems and loss of taste are likely to also have loss of smell, with 77% confidence. CONCLUSIONS: There are very active social media discussions that could support the growing understanding of COVID-19 and its long-term impact. These discussions open a potential field of research for analyzing the behavior of long COVID syndrome. Exploratory data analysis using natural language processing methods revealed the symptoms and medical conditions related to long COVID discussions on the Twitter social media platform. Using Apriori algorithm-based association rules, we determined interesting and meaningful relationships between symptoms.
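
The rule metrics used above (support, confidence, lift) can be illustrated with a small sketch over symptom sets, one set per tweet. This is a brute-force enumeration rather than Apriori's candidate pruning, it interprets "positive lift" as lift > 1 (an assumption), and the symptom labels and toy transactions are hypothetical.

```python
from itertools import combinations
from typing import AbstractSet, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str, float, float, float]


def rule_metrics(transactions: List[AbstractSet[str]], antecedent: AbstractSet[str],
                 consequent: AbstractSet[str]) -> Tuple[float, float, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    both = antecedent | consequent
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if both <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def mine_rules(transactions: List[AbstractSet[str]], min_support: float = 0.0001,
               min_confidence: float = 0.10, max_len: int = 3) -> List[Rule]:
    """Enumerate single-consequent rules from itemsets of size 2..max_len (brute force)."""
    items = sorted(set().union(*transactions))
    rules: List[Rule] = []
    for size in range(2, max_len + 1):
        for itemset in combinations(items, size):
            full = frozenset(itemset)
            if sum(1 for t in transactions if full <= t) / len(transactions) < min_support:
                continue
            for consequent in itemset:
                antecedent = full - {consequent}
                support, conf, lift = rule_metrics(transactions, antecedent, {consequent})
                if conf >= min_confidence and lift > 1.0:
                    rules.append((antecedent, consequent, support, conf, lift))
    # highest-confidence rules first, mirroring confidence-based ranking
    return sorted(rules, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    tweets = [{"lung", "taste", "smell"}, {"lung", "taste", "smell"}, {"lung", "taste"}, {"fatigue"}]
    for rule in mine_rules(tweets):
        print(rule)
```

A production analysis would use an Apriori implementation that prunes itemsets below the support threshold before counting larger ones; the metrics themselves are the same.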

4.
Energy Inform ; 5(1): 1, 2022.
Article in English | MEDLINE | ID: mdl-35252758

ABSTRACT

An efficient building should be able to control its internal temperature in a manner that considers both the building's energy efficiency and the comfort level of its occupants. Thermostats help to control the temperature within a building by providing real-time data on the temperature inside that space to determine whether it is within the acceptable range of that building's control system, and proper thermostat placement helps to better control a building's temperature. More thermostats can provide better control of a building, as well as a better understanding of the building's temperature distribution. To determine the minimum number of thermostats required to accurately measure and control the internal temperature distribution of a building, it is necessary to find the locations that show similar environmental conditions. In this paper, we analyzed high-resolution temperature measurements from a commercial building using wireless sensors to assess the performance and health of the building's HVAC zoning and controls system. We then conducted two cluster analyses to evaluate the efficiency of the existing zoning structure and to find the optimal number of clusters. K-means and time series clustering were used to identify the temperature clusters per building floor. Based on statistical assessments, we observed that time series clustering performed better than k-means clustering.
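
A minimal pure-Python k-means (Lloyd's algorithm) sketch, assuming each sensor's temperature series is treated as one fixed-length vector compared by Euclidean distance. The time series clustering in the paper may use a different, shape-based distance, and the readings below are hypothetical.

```python
from typing import List, Sequence, Tuple


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points: List[Sequence[float]], k: int,
           n_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; each point is one sensor's temperature series."""
    # naive deterministic initialisation with the first k sensors
    # (k-means++ seeding or several restarts are common in practice)
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(n_iter):
        # assignment step: each sensor goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # update step: each centroid becomes the mean series of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # hypothetical hourly readings (deg C) from four sensors on one floor
    sensors = [[20, 20, 21], [20.5, 20, 21], [25, 26, 25], [25, 25.5, 26]]
    print(kmeans(sensors, k=2))
```

Sensors that land in the same cluster experience similar conditions, which is the basis for judging whether fewer thermostats could represent a zone.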

5.
Int J Inf Technol ; 14(2): 607-618, 2022.
Article in English | MEDLINE | ID: mdl-35106437

ABSTRACT

Identifying sub-networks within a network is essential to understanding the network's functionality. This process is called community detection. Various community detection algorithms exist, and their performance can vary with the network structure. In this paper, we introduce a novel random graph generator using a mixture of Gaussian distributions; the community sizes of the generated network depend on the given Gaussian distributions. We then conduct simulation studies to understand the impact of network density and sparsity on community detection. We use the Infomap, Label Propagation, Spinglass, and Louvain algorithms to detect communities. The similarity between true and detected communities is evaluated using the Adjusted Rand Index, Adjusted Mutual Information, and Normalized Mutual Information similarity scores. We also develop a method to generate heatmaps to compare these similarity scores. The results indicate that the Louvain algorithm has the highest capacity to detect perfect communities, while Label Propagation has the lowest.
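
The Adjusted Rand Index used to compare true and detected communities can be computed from the contingency table of the two labelings. A pure-Python sketch follows; the function name is hypothetical, and established libraries such as scikit-learn (`adjusted_rand_score`) also provide this metric.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two community assignments of the same (>= 2) nodes."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))  # contingency table cells
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    # expected index under random labelings with the same community sizes
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:  # degenerate case, e.g. both put every node in one community
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    true_communities = [0, 0, 0, 1, 1, 1]
    detected = [0, 0, 1, 1, 2, 2]  # hypothetical output of a detection algorithm
    print(adjusted_rand_index(true_communities, detected))
```

Because the index is corrected for chance, a perfect recovery scores 1 regardless of how communities are numbered, and random assignments score close to 0.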

6.
J Appl Stat ; 48(10): 1775-1797, 2021.
Article in English | MEDLINE | ID: mdl-35706711

ABSTRACT

We introduce an approach to modeling the batting outcomes of baseball batters based on the weighted likelihood approach, and we use our methodology to estimate commonly used baseball batting metrics. The weighted likelihood allows relevant information to be shared among players. Specifically, inference for each batter can use the batting data of all other players in the league, which improves inference. MAMSE (Minimum Averaged Mean Squared Error) weights are used as the likelihood weights. For comparison, we implemented a semi-parametric Bayesian approach based on the Dirichlet process, which enables the borrowing of information across batters while providing a natural clustering mechanism. We demonstrate and compare these approaches using data on 2018 Major League Baseball (MLB) batters.
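
To show how a weighted likelihood borrows information, here is a sketch for a binomial model of hits per at-bat. It assumes the weights are already given (computing MAMSE weights is not reproduced here) and that each player's log-likelihood is scaled by 1/n_j, one common weighted-likelihood formulation. Setting the derivative of the weighted log-likelihood to zero gives p̂ = Σ w_j (x_j / n_j) / Σ w_j. All names and numbers are hypothetical.

```python
import math
from typing import Sequence


def weighted_log_likelihood(p: float, hits: Sequence[int], at_bats: Sequence[int],
                            weights: Sequence[float]) -> float:
    """Sum over players j of (w_j / n_j) times the binomial log-likelihood at rate p."""
    return sum(
        (w / n) * (x * math.log(p) + (n - x) * math.log(1.0 - p))
        for w, x, n in zip(weights, hits, at_bats)
    )


def weighted_mle(hits: Sequence[int], at_bats: Sequence[int], weights: Sequence[float]) -> float:
    """Closed-form maximiser: weighted average of each player's observed rate."""
    return sum(w * x / n for w, x, n in zip(weights, hits, at_bats)) / sum(weights)


if __name__ == "__main__":
    # target batter first, then two other batters; weights are hypothetical
    hits, at_bats, weights = [30, 25, 40], [100, 80, 120], [0.6, 0.25, 0.15]
    print(weighted_mle(hits, at_bats, weights))
```

With all the weight on the target batter, the estimate reduces to that batter's own average; shifting weight to other players shrinks it toward theirs.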

7.
Comput Math Methods Med ; 2018: 8134132, 2018.
Article in English | MEDLINE | ID: mdl-30159005

ABSTRACT

We consider a Bayesian approach for assessing hypotheses of equivalence in two-arm trials with binary data. We discuss the development of the likelihood, the prior, and the posterior distributions of the parameters of interest. We then examine the suitability of a normal approximation to the posterior distribution obtained via a Taylor series expansion. Bayesian inference is carried out using Markov chain Monte Carlo (MCMC) methods. We illustrate the methods using actual data from two-arm clinical trials on preventing mortality after myocardial infarction.
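
Under independent conjugate Beta priors on the two arms' event probabilities, the posterior is available in closed form, so the posterior probability of equivalence can be estimated by direct Monte Carlo sampling. This is a simplification swapped in for illustration, not the paper's MCMC scheme or its normal approximation; the margin `delta`, the Beta(1, 1) prior and the counts are assumptions.

```python
import random


def prob_equivalence(x1: int, n1: int, x2: int, n2: int, delta: float,
                     a: float = 1.0, b: float = 1.0, draws: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|p1 - p2| < delta | data) under independent Beta(a, b) priors."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(draws):
        # conjugacy: each arm's posterior is Beta(a + events, b + non-events)
        p1 = rng.betavariate(a + x1, b + n1 - x1)
        p2 = rng.betavariate(a + x2, b + n2 - x2)
        if abs(p1 - p2) < delta:
            inside += 1
    return inside / draws


if __name__ == "__main__":
    # hypothetical deaths out of patients randomised to each arm
    print(prob_equivalence(x1=50, n1=1000, x2=52, n2=1000, delta=0.02))
```

Equivalence would be declared when this posterior probability exceeds a pre-specified level, for example 0.95.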


Subjects
Bayes Theorem, Meta-Analysis as Topic, Monte Carlo Method, Markov Chains, Research Design
8.
BMJ Open ; 7(6): e016173, 2017 06 23.
Article in English | MEDLINE | ID: mdl-28645978

ABSTRACT

OBJECTIVE: This research proposes a model-based method to facilitate the selection of disease case definitions from validation studies for administrative health data. The method is demonstrated for a rheumatoid arthritis (RA) validation study. STUDY DESIGN AND SETTING: Data came from 148 definitions for ascertaining RA cases in hospital, physician and prescription medication administrative data. We considered: (A) separate univariate models for sensitivity and specificity, (B) a univariate model for Youden's summary index and (C) a bivariate (ie, joint) mixed-effects model for sensitivity and specificity. Model covariates included the number of diagnoses in physician, hospital and emergency department records, physician diagnosis observation time, duration of time between physician diagnoses and number of RA-related prescription medication records. RESULTS: The most common case definition attributes were 1+ hospital diagnosis (65%), 2+ physician diagnoses (43%), 1+ specialist physician diagnosis (51%) and 2+ years of physician diagnosis observation time (27%). For the separate univariate models, statistically significant improvements in sensitivity and/or specificity (all p values <0.01) were associated with 2+ and 3+ physician diagnoses, unlimited physician diagnosis observation time, 1+ specialist physician diagnosis and 1+ RA-related prescription medication records (65+ years only). The bivariate model produced similar results. Youden's index was associated with these same case definition criteria, except for the length of the physician diagnosis observation time. CONCLUSION: A model-based method provides valuable empirical evidence to aid in selecting one or more definitions for ascertaining diagnosed disease cases from administrative health data. The choice between univariate and bivariate models depends on the goals of the validation study and the number of case definitions.
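
For reference, the per-definition quantities that models (A) and (B) summarize can be computed from a 2x2 validation table against a reference standard. The counts below are hypothetical, and the paper's mixed-effects models for these quantities are not reproduced here.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1 (0 = no better than chance, 1 = perfect)."""
    return sensitivity + specificity - 1.0


if __name__ == "__main__":
    # hypothetical counts for one case definition versus the reference standard
    sens, spec = sensitivity_specificity(tp=80, fn=20, tn=90, fp=10)
    print(sens, spec, youden_index(sens, spec))
```

Youden's index collapses the two accuracy measures into one number, which is why a single univariate model (B) suffices for it, whereas sensitivity and specificity need either two models (A) or a joint one (C).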


Subjects
Rheumatoid Arthritis/diagnosis, Statistical Models, Adult, Aged, Canada, Factual Databases, Female, Humans, Male, Middle Aged, Sensitivity and Specificity, Severity of Illness Index, Young Adult
9.
Can J Respir Ther ; 53(3): 37-44, 2017.
Article in English | MEDLINE | ID: mdl-30996632

ABSTRACT

OBJECTIVE: COPD is a high-cost disease and results in frequent contacts with the healthcare system. The study objective was to compare the accuracy of classification models with different covariates for classifying COPD patients into cost groups. METHODS: Linked health administrative databases from Saskatchewan, Canada, were used to identify a cohort of newly diagnosed COPD patients (April 1, 2007 to March 31, 2011) and their episodes of healthcare encounters for disease exacerbations. Total costs of the first and follow-up episodes were computed, and patients were categorized as persistently high cost, occasionally high cost, or persistently low cost based on cumulative cost distribution ranking, using the 75th percentile as the cutoff for high-cost status. Classification accuracy was compared for seven multinomial logistic regression models containing either socio-demographic characteristics only (i.e., the base model) or socio-demographic and prior healthcare use characteristics (i.e., the comparator models). RESULTS: Of the 1182 patients identified, 8.5% were classified as persistently high cost, 26.1% as occasionally high cost, and the remainder as persistently low cost. In their first exacerbation episode, persistently high-cost and occasionally high-cost patients incurred about 10 times ($12 449 vs $1263) and seven times ($9334 vs $1263) the costs of persistently low-cost patients, respectively. Classification accuracy was 0.67 for the base model, whereas the comparator model containing socio-demographic characteristics and the number of prior hospital admissions had the highest accuracy (0.72). CONCLUSIONS: Costs associated with COPD exacerbation episodes are substantial. Adding prior hospitalization to socio-demographic characteristics produced the largest improvement in classification accuracy. Accurate classification models are important for identifying potential healthcare cost management strategies.
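
One way to turn a 75th-percentile cutoff into the three cost groups is sketched below. The exact ranking rule is not spelled out in the abstract, so this assumes a hypothetical rule: an episode counts as high cost if its cost is at or above the 75th percentile of all patients' costs for that same episode, and a patient is "persistently high" if every episode is high, "occasionally high" if some are, and "persistently low" otherwise. Patient IDs and costs are hypothetical.

```python
from statistics import quantiles
from typing import Dict, List


def cost_groups(episode_costs: Dict[str, List[float]]) -> Dict[str, str]:
    """Label patients by how often their episode costs reach the 75th percentile."""
    n_eps = max(len(costs) for costs in episode_costs.values())
    cutoffs = []
    for e in range(n_eps):
        values = [costs[e] for costs in episode_costs.values() if len(costs) > e]
        # third quartile (75th percentile) of this episode's costs; needs >= 2 values
        cutoffs.append(quantiles(values, n=4, method="inclusive")[2])
    groups = {}
    for patient, costs in episode_costs.items():
        flags = [cost >= cutoffs[e] for e, cost in enumerate(costs)]
        if all(flags):
            groups[patient] = "persistently high"
        elif any(flags):
            groups[patient] = "occasionally high"
        else:
            groups[patient] = "persistently low"
    return groups


if __name__ == "__main__":
    # hypothetical [first episode, follow-up episode] costs in dollars
    costs = {"A": [12000, 9000], "B": [1000, 9500], "C": [1300, 1200],
             "D": [1100, 1000], "E": [900, 800]}
    print(cost_groups(costs))
```

The resulting labels would then serve as the outcome of the multinomial logistic regression models being compared.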

10.
Medicine (Baltimore) ; 95(9): e2888, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26945376

ABSTRACT

Healthcare pathways are important to measure because they are expected to affect outcomes. However, they are challenging to define because patients exhibit heterogeneity in their use of healthcare services. The objective of this study was to identify and describe healthcare pathways during episodes of chronic obstructive pulmonary disease (COPD) exacerbations. Linked administrative databases from Saskatchewan, Canada, were used to identify a cohort of newly diagnosed COPD patients and their episodes of healthcare use for disease exacerbations. Latent class analysis (LCA) was used to classify the cohort into homogeneous pathways using indicators of respiratory-related hospitalizations, emergency department (ED) visits, general and specialist physician visits, and outpatient prescription drug dispensations. Multinomial logistic regression models tested which patient demographic and disease characteristics were associated with pathway group membership. The most frequent healthcare contact sequences in each pathway were described. Mean costs were compared across groups using a model-based approach with χ² statistics. LCA identified 3 distinct pathways for patients with hospital-initiated (n = 963) and ED-initiated (n = 364) episodes. For hospital-initiated episodes, pathway group 1 members followed complex pathways in which multiple healthcare services were used repeatedly, and they incurred substantially higher costs than patients in the other pathway groups. For patients with an ED-initiated episode, pathway group 1 members also had higher costs than other groups. Pathway groups differed with respect to patient demographic and disease characteristics. A minority of patients were discharged from the ED or hospital but did not have any follow-up care during the remainder of their episode. Patients who followed complex pathways could benefit from case management interventions to streamline their journeys through the healthcare system.
The minority of patients whose pathways were not consistent with recommended follow-up care should be further investigated to fully align COPD treatment in the province with recommended care practices.
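
After each episode has been assigned to a pathway group (for example, its most probable latent class), describing the most frequent contact sequences in each pathway reduces to counting. The contact labels (`"HOSP"`, `"GP"`, `"SPEC"`, `"ED"`) and the input layout are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def top_sequences(episodes: Iterable[Tuple[Hashable, Sequence[str]]],
                  k: int = 3) -> Dict[Hashable, List[Tuple[Tuple[str, ...], int]]]:
    """Return the k most frequent ordered contact sequences within each pathway group."""
    by_group: Dict[Hashable, Counter] = defaultdict(Counter)
    for group, contacts in episodes:
        by_group[group][tuple(contacts)] += 1  # tuples are hashable, lists are not
    return {group: counts.most_common(k) for group, counts in by_group.items()}


if __name__ == "__main__":
    episodes = [
        (1, ["HOSP", "GP", "SPEC"]),
        (1, ["HOSP", "GP", "SPEC"]),
        (1, ["HOSP", "ED"]),
        (2, ["ED", "GP"]),
    ]
    print(top_sequences(episodes))
```

Episodes whose sequence ends at a hospital or ED discharge with no later contact would correspond to the minority without follow-up care.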


Subjects
Critical Pathways/economics, Chronic Obstructive Pulmonary Disease/therapy, Adult, Aged, Female, Humans, Male, Middle Aged, Chronic Obstructive Pulmonary Disease/economics, Retrospective Studies
11.
Stat Methods Med Res ; 25(1): 352-65, 2016 Feb.
Article in English | MEDLINE | ID: mdl-22802045

ABSTRACT

This article develops a Bayesian approach to meta-analysis using the Dirichlet process. The key aspect of the Dirichlet process in meta-analysis is its ability to assess evidence of statistical heterogeneity, or variation in the underlying effects across studies, while relaxing distributional assumptions. We assume that the study effects are generated from a Dirichlet process. Under a Dirichlet process model, the study effect parameters have support on a discrete space, which enables borrowing of information across studies while facilitating clustering among studies. We illustrate the proposed method by applying it to a dataset from the Program for International Student Assessment covering 30 countries. Results from the data analysis, simulation studies, and the log pseudo-marginal likelihood model selection procedure indicate that the Dirichlet process model performs better than conventional alternative methods.
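
The Dirichlet process assumption can be made concrete with the truncated stick-breaking construction: weights π_k = v_k ∏_{l<k} (1 − v_l) with v_k ~ Beta(1, α), atoms drawn from a base distribution, and study effects drawn from the resulting discrete distribution. Because that distribution is discrete, several studies can share the same effect value, which is the clustering mentioned above. The normal base distribution, α = 1 and the truncation level are assumptions, and this simulates the prior only, not the paper's posterior sampler.

```python
import random
from typing import List


def stick_breaking_weights(alpha: float, truncation: int, rng: random.Random) -> List[float]:
    """First `truncation` stick-breaking weights of a DP(alpha, G0)."""
    weights = []
    remaining = 1.0  # length of stick not yet broken off
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def sample_study_effects(n_studies: int, alpha: float, base_mean: float, base_sd: float,
                         truncation: int = 100, seed: int = 0) -> List[float]:
    """Draw study effects from one (truncated) realisation of a DP with normal base G0."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    atoms = [rng.gauss(base_mean, base_sd) for _ in weights]
    # discrete distribution: studies that pick the same atom share an effect (a cluster)
    return rng.choices(atoms, weights=weights, k=n_studies)


if __name__ == "__main__":
    effects = sample_study_effects(30, alpha=1.0, base_mean=0.0, base_sd=1.0)
    print(len(set(effects)), "distinct effects among", len(effects), "studies")
```

Larger α spreads the mass over more atoms, giving more distinct effects; smaller α concentrates studies into fewer clusters.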


Subjects
Bayes Theorem, Meta-Analysis as Topic, Alzheimer Disease/drug therapy, Alzheimer Disease/psychology, Biostatistics, Cluster Analysis, Computer Simulation, Educational Status, Humans, Likelihood Functions, Markov Chains, Statistical Models, Monte Carlo Method, Tacrine/therapeutic use
12.
Pharm Stat ; 14(6): 471-8, 2015.
Article in English | MEDLINE | ID: mdl-26276902

ABSTRACT

Various methodologies proposed for some inference problems associated with two-arm trials are known to suffer from difficulties, as documented in Senn (2001). We propose an alternative Bayesian approach to these problems that deals with these difficulties by providing an explicit measure of statistical evidence and of the strength of this evidence. Bayesian methods are often criticized for their intrinsic subjectivity. We show how these concerns can be addressed by assessing the bias induced by the prior, by model checking, and by checking for prior-data conflict.
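
A check for prior-data conflict can be illustrated with a prior predictive tail probability for a single binomial arm with a Beta(a, b) prior: compute the beta-binomial prior predictive m(y) and report P(m(Y) ≤ m(x_obs)). A small value means the observed count is surprising under the prior. This is a sketch of one standard form of the check under these assumptions, not necessarily the exact procedure of the paper; the numbers are hypothetical.

```python
from math import exp, lgamma
from typing import List


def log_beta(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(y: int, n: int, a: float, b: float) -> float:
    """Prior predictive P(Y = y) for Y ~ Binomial(n, p), p ~ Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
    return exp(log_choose + log_beta(y + a, n - y + b) - log_beta(a, b))


def prior_data_conflict_pvalue(x: int, n: int, a: float, b: float) -> float:
    """P(m(Y) <= m(x)) under the prior predictive m; small values signal conflict."""
    pmf: List[float] = [beta_binomial_pmf(y, n, a, b) for y in range(n + 1)]
    m_obs = pmf[x]
    # tiny relative tolerance so floating-point ties (e.g. a flat prior) count as <=
    return sum(p for p in pmf if p <= m_obs * (1 + 1e-9))


if __name__ == "__main__":
    # a prior tightly centred at 0.5 versus 20 successes in 20 trials
    print(prior_data_conflict_pvalue(x=20, n=20, a=50.0, b=50.0))
```

A flat Beta(1, 1) prior makes every count equally likely a priori, so it can never conflict with the data (the tail probability is 1).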


Subjects
Bayes Theorem, Controlled Clinical Trials as Topic/methods, Statistical Models, Bias, Statistical Data Interpretation, Humans
13.
Stat Methods Med Res ; 22(3): 261-77, 2013 Jun.
Article in English | MEDLINE | ID: mdl-21300626

ABSTRACT

An existing generalized p-value approach from the statistical literature is applied to assess noninferiority of an experimental treatment in a three-arm clinical trial that includes a placebo. Two generalized test functions (GTFs) are constructed, and Monte Carlo simulations are used to compute the p-value. The GTFs perform well in maintaining the Type I error probabilities, and the power of the tests is shown to increase to 1 as both the sample size and the parameter denoting the fraction of the reference drug's effect relative to placebo increase. The generalized confidence intervals are shown to retain their coverage probabilities. A published dataset is re-analysed using the proposed test, and the results agree with earlier findings.
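
A Monte Carlo generalized p-value for three-arm noninferiority can be sketched with generalized pivotal quantities (GPQs) for normal means with unknown, arm-specific variances. The sketch assumes larger outcomes are better and tests H0: μ_E − μ_P ≤ θ(μ_R − μ_P), with θ the fraction of the reference effect; it is one standard GPQ construction and not necessarily either of the paper's two GTFs. The summary statistics are hypothetical.

```python
import math
import random
from typing import Tuple

ArmSummary = Tuple[float, float, int]  # (sample mean, sample SD, sample size)


def gpq_mean(summary: ArmSummary, rng: random.Random) -> float:
    """One draw of a generalized pivotal quantity for a normal mean with unknown variance."""
    xbar, s, n = summary
    u = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square draw with n - 1 degrees of freedom
    z = rng.gauss(0.0, 1.0)
    sigma = s * math.sqrt((n - 1) / u)  # pivotal draw for the standard deviation
    return xbar - z * sigma / math.sqrt(n)


def generalized_p_value(experimental: ArmSummary, reference: ArmSummary, placebo: ArmSummary,
                        theta: float, n_sim: int = 20000, seed: int = 1) -> float:
    """Monte Carlo generalized p-value for H0: mu_E - mu_P <= theta * (mu_R - mu_P)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_sim):
        r_e = gpq_mean(experimental, rng)
        r_r = gpq_mean(reference, rng)
        r_p = gpq_mean(placebo, rng)
        # count draws in which the pivotal contrast falls in the null region
        if (r_e - r_p) - theta * (r_r - r_p) <= 0.0:
            count += 1
    return count / n_sim


if __name__ == "__main__":
    p = generalized_p_value((10.0, 2.0, 50), (10.0, 2.0, 50), (0.0, 2.0, 50), theta=0.8)
    print(p)  # a small value supports noninferiority
```

Noninferiority would be concluded when the generalized p-value falls below the nominal level, for example 0.05.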


Subjects
Statistical Models, Monte Carlo Method, Probability