Results 1 - 9 of 9
1.
PLoS One ; 19(2): e0298049, 2024.
Article in English | MEDLINE | ID: mdl-38346030

ABSTRACT

We investigate the dynamic characteristics of Covid-19 daily infection rates in Taiwan during its initial surge period, focusing on 79 districts within the seven largest cities. By employing computational techniques, we extract 18 features from each district-specific curve, transforming unstructured data into structured data. Our analysis reveals distinct patterns of asymmetric growth and decline among the curves. Utilizing theoretical information measurements such as conditional entropy and mutual information, we identify major factors of order-1 and order-2 that influence the peak value and curvature at the peak of the curves, crucial features characterizing the infection rates. Additionally, we examine the impact of geographic and socioeconomic factors on the curves by encoding each of the 79 districts with two binary characteristics: North-vs-South and Urban-vs-Suburban. Furthermore, leveraging this data-driven understanding at the district level, we explore the fine-scale behavioral effects on disease spread by examining the similarity among 96 age-group-specific curves within urban districts of Taipei and suburban districts of New Taipei City, which collectively represent a substantial portion of the nation's population. Our findings highlight the implicit influence of human behaviors related to living, traveling, and working on the dynamics of Covid-19 transmission in Taiwan.


Subject(s)
COVID-19 , Humans , Taiwan/epidemiology , COVID-19/epidemiology , Socioeconomic Factors , Cities/epidemiology , Employment
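The abstract above names the peak value and the curvature at the peak as key features of each district curve, but it does not list the 18 features or say how they were computed. The following is a minimal sketch, assuming a daily curve stored as a plain list and using a central second difference as a stand-in for curvature; the function name `peak_features` and the returned keys are hypothetical.

```python
from typing import Dict, Sequence


def peak_features(curve: Sequence[float]) -> Dict[str, float]:
    """Summarise one daily-rate curve by a few peak-related features.

    Assumption: curvature is approximated by the central second difference
    y[t-1] - 2*y[t] + y[t+1]; the paper's own definition is not given
    in the abstract.
    """
    if len(curve) < 3:
        raise ValueError("need at least three daily values")
    # Only interior days are candidates, so both neighbours exist.
    peak_day = max(range(1, len(curve) - 1), key=lambda t: curve[t])
    curvature = curve[peak_day - 1] - 2 * curve[peak_day] + curve[peak_day + 1]
    return {
        "peak_day": peak_day,
        "peak_value": curve[peak_day],
        "curvature_at_peak": curvature,
        # Lengths of the growth and decline phases expose asymmetry.
        "days_of_growth": peak_day,
        "days_of_decline": len(curve) - 1 - peak_day,
    }
```

For a toy curve such as `[0, 2, 8, 6, 4, 2, 1, 0]`, the growth phase is shorter than the decline phase, which is the kind of asymmetry the abstract describes. A more negative curvature value means a sharper peak.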
2.
Sci Rep ; 12(1): 17130, 2022 10 12.
Article in English | MEDLINE | ID: mdl-36224306

ABSTRACT

Air pollution exposure has been linked to various diseases, including dementia. However, a novel method for investigating the associations between air pollution exposure and disease is lacking. The objective of this study was to investigate whether long-term exposure to ambient particulate air pollution increases dementia risk using both the traditional Cox model approach and a novel machine learning (ML) with random forest (RF) method. We used health data from a national population-based cohort in Taiwan from 2000 to 2017. We collected the following ambient air pollution data from the Taiwan Environmental Protection Administration (EPA): fine particulate matter (PM2.5) and gaseous pollutants, including sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), nitrogen oxide (NOx), nitric oxide (NO), and nitrogen dioxide (NO2). Spatiotemporal-estimated air quality data calculated based on a geostatistical approach, namely, the Bayesian maximum entropy method, were collected. Each subject's residential county and township were reviewed monthly and linked to air quality data based on the corresponding township and month of the year for each subject. The Cox model approach and the ML with RF method were used. Increasing the concentration of PM2.5 by one interquartile range (IQR) increased the risk of dementia by approximately 5% (HR = 1.05 with 95% CI = 1.04-1.05). The comparison of the performance of the extended Cox model approach with the RF method showed that the prediction accuracy was approximately 0.7 by the RF method, but the AUC was lower than that of the Cox model approach. This national cohort study over an 18-year period provides supporting evidence that long-term particulate air pollution exposure is associated with increased dementia risk in Taiwan. The ML with RF method appears to be an acceptable approach for exploring associations between air pollutant exposure and disease.


Subject(s)
Air Pollutants , Air Pollution , Dementia , Ozone , Air Pollutants/adverse effects , Air Pollutants/analysis , Air Pollution/adverse effects , Air Pollution/analysis , Bayes Theorem , Carbon Monoxide , Cohort Studies , Dementia/epidemiology , Dementia/etiology , Environmental Exposure/adverse effects , Environmental Exposure/analysis , Humans , Machine Learning , Nitric Oxide , Nitrogen Dioxide , Nitrogen Oxides/analysis , Ozone/adverse effects , Ozone/analysis , Particulate Matter/adverse effects , Particulate Matter/analysis , Sulfur Dioxide
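The abstract above reports a hazard ratio per interquartile-range (IQR) increase in PM2.5. In a Cox model with a linear exposure term, that quantity is the exponential of the coefficient times the increment. The sketch below shows the arithmetic only; the coefficient, standard error, and IQR passed in are hypothetical inputs, since the abstract gives none of them.

```python
import math
from typing import Tuple


def hazard_ratio(beta: float, increment: float) -> float:
    """HR for an exposure increase of `increment` units, given Cox coefficient beta."""
    return math.exp(beta * increment)


def hazard_ratio_ci(beta: float, se: float, increment: float,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wald-type confidence interval for the same HR (z = 1.96 for about 95%)."""
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return lower, upper
```

Scaling by the IQR makes hazard ratios comparable across pollutants measured in different units, which is presumably why the study reports them that way.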
3.
PLoS One ; 17(4): e0266838, 2022.
Article in English | MEDLINE | ID: mdl-35395047

ABSTRACT

Tennis is a popular sport, and professional tennis matches are probably the most watched games globally. Many studies consider statistical or machine learning models to predict the results of professional tennis matches. In this study, we propose a statistical approach for predicting the match outcomes of Grand Slam tournaments, in addition to applying exploratory data analysis (EDA) to explore variables related to match results. The proposed approach introduces new variables via the Glicko rating model, a Bayesian method commonly used in professional chess. We use EDA tools to determine important variables and apply classification models (e.g., logistic regression, support vector machine, neural network and light gradient boosting machine) to evaluate the classification results through cross-validation. The empirical study is based on men's and women's singles matches of Grand Slam tournaments (2000-2019). Our analysis results show that professional tennis ranking is the most important variable and that the accuracy of the proposed Glicko model is slightly higher than that of other models.


Subject(s)
Tennis , Bayes Theorem , Female , Forecasting , Humans , Logistic Models , Machine Learning , Male
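The abstract above builds new variables from the Glicko rating model without saying which ones. As an illustration of what the model provides, the sketch below computes the expected outcome between two rated players using the formula from Glickman's published description of the Glicko system. The ratings and rating deviations in any call are hypothetical, and this is not necessarily the variable construction used in the study.

```python
import math

# Glicko scaling constant q = ln(10) / 400.
Q = math.log(10) / 400


def g(rd: float) -> float:
    """Attenuation factor: a larger rating deviation (RD) shrinks the rating gap's effect."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected_outcome(r_a: float, rd_a: float, r_b: float, rd_b: float) -> float:
    """Expected score of player A against player B, between 0 and 1.

    Uses the combined uncertainty sqrt(rd_a^2 + rd_b^2) of both players.
    """
    combined_rd = math.sqrt(rd_a ** 2 + rd_b ** 2)
    return 1.0 / (1.0 + 10.0 ** (-g(combined_rd) * (r_a - r_b) / 400.0))
```

With equal ratings the expected outcome is 0.5 regardless of the deviations. When one player is rated higher, larger deviations pull the prediction back toward 0.5, reflecting less certainty about the true rating gap.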
4.
Entropy (Basel) ; 24(2)2022 Jan 24.
Article in English | MEDLINE | ID: mdl-35205465

ABSTRACT

For a large ensemble of complex systems, a Many-System Problem (MSP) studies how heterogeneity constrains and hides structural mechanisms, and how to uncover and reveal hidden major factors from homogeneous parts. All member systems in an MSP share common governing principles of dynamics, but differ in idiosyncratic characteristics. A typical dynamic is found underlying response features with respect to covariate features of quantitative or qualitative data types. Neither all-system-as-one-whole nor individual system-specific functional structures are assumed in such response-vs-covariate (Re-Co) dynamics. We developed a computational protocol for identifying various collections of major factors of various orders underlying Re-Co dynamics. We first demonstrate the immanent effects of heterogeneity among member systems, which constrain compositions of major factors and even hide essential ones. Secondly, we show that fuller collections of major factors are discovered by breaking heterogeneity into many homogeneous parts. This process further realizes Anderson's "More is Different" phenomenon. We employ the categorical nature of all features and develop a Categorical Exploratory Data Analysis (CEDA)-based major factor selection protocol. Information theoretical measurements (conditional mutual information and entropy) are heavily used in two selection criteria: [C1: confirmable] and [C2: irreplaceable]. All conditional entropies are evaluated through contingency tables with algorithmically computed reliability against the finite sample phenomenon. We study one artificially designed MSP and then two real collectives of Major League Baseball (MLB) pitching dynamics with 62 slider pitchers and 199 fastball pitchers, respectively. Finally, our MSP data analyzing techniques are applied to resolve a scientific issue related to the Rosenberg Self-Esteem Scale.

5.
Entropy (Basel) ; 24(10)2022 Sep 28.
Article in English | MEDLINE | ID: mdl-37420402

ABSTRACT

We reformulate and reframe a series of increasingly complex parametric statistical topics into a framework of response-vs.-covariate (Re-Co) dynamics that is described without any explicit functional structures. Then we resolve these topics' data analysis tasks by discovering major factors underlying such Re-Co dynamics by only making use of data's categorical nature. The major factor selection protocol at the heart of the Categorical Exploratory Data Analysis (CEDA) paradigm is illustrated and carried out by employing Shannon's conditional entropy (CE) and mutual information (I[Re;Co]) as the two key information-theoretic measurements. Through the process of evaluating these two entropy-based measurements and resolving statistical tasks, we acquire several computational guidelines for carrying out the major factor selection protocol in a do-and-learn fashion. Specifically, practical guidelines are established for evaluating CE and I[Re;Co] in accordance with the criterion called [C1:confirmable]. Following the [C1:confirmable] criterion, we make no attempt to acquire consistent estimates of these information-theoretic measurements. All evaluations are carried out on a contingency table platform, upon which the practical guidelines also provide ways of lessening the effects of the curse of dimensionality. We explicitly carry out six examples of Re-Co dynamics, within each of which several widely extended scenarios are also explored and discussed.
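The abstract above evaluates Shannon's conditional entropy (CE) and mutual information I[Re;Co] on a contingency table. A minimal sketch of those two textbook quantities follows, assuming a table of counts whose rows are covariate categories and whose columns are response categories; the function names are illustrative, and the reliability checks mentioned in the abstract are not reproduced.

```python
import math
from typing import Sequence


def entropy(counts: Sequence[float]) -> float:
    """Shannon entropy (in bits) of the distribution given by non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def conditional_entropy(table: Sequence[Sequence[float]]) -> float:
    """CE = H(Re | Co): row-weighted average of the entropy within each covariate row."""
    n = sum(sum(row) for row in table)
    return sum((sum(row) / n) * entropy(row) for row in table if sum(row) > 0)


def mutual_information(table: Sequence[Sequence[float]]) -> float:
    """I[Re;Co] = H(Re) - H(Re | Co), with H(Re) taken from the column totals."""
    column_totals = [sum(col) for col in zip(*table)]
    return entropy(column_totals) - conditional_entropy(table)
```

A covariate that fully determines the response gives a CE of 0 and a mutual information equal to H(Re). A covariate unrelated to the response leaves CE at H(Re) and mutual information at 0.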

6.
Entropy (Basel) ; 23(12)2021 Dec 15.
Article in English | MEDLINE | ID: mdl-34945990

ABSTRACT

Without assuming any functional or distributional structure, we select collections of major factors embedded within response-versus-covariate (Re-Co) dynamics via selection criteria [C1: confirmable] and [C2: irreplaceable], which are based on information theoretic measurements. The two criteria are constructed based on the computing paradigm called Categorical Exploratory Data Analysis (CEDA) and linked to Wiener-Granger causality. All the information theoretical measurements, including conditional mutual information and entropy, are evaluated through the contingency table platform, which primarily rests on the categorical nature within all involved features of any data types: quantitative or qualitative. Our selection task identifies one chief collection, together with several secondary collections of major factors of various orders underlying the targeted Re-Co dynamics. Each selected collection is checked with algorithmically computed reliability against the finite sample phenomenon, and so is each member's major factor individually. The developments of our selection protocol are illustrated in detail through two experimental examples: a simple one and a complex one. We then apply this protocol to two data sets pertaining to two somewhat related but distinct pitching dynamics of two pitch types: slider and fastball. In particular, we refer to a specific Major League Baseball (MLB) pitcher and we consider data of multiple seasons.

7.
Entropy (Basel) ; 23(7)2021 Jun 22.
Article in English | MEDLINE | ID: mdl-34206624

ABSTRACT

All features of any data type are universally equipped with categorical nature revealed through histograms. A contingency table framed by two histograms affords directional and mutual associations based on rescaled conditional Shannon entropies for any feature-pair. The heatmap of the mutual association matrix of all features becomes a roadmap showing which features are highly associative with which features. We develop our data analysis paradigm called categorical exploratory data analysis (CEDA) with this heatmap as a foundation. CEDA is demonstrated to provide new resolutions for two topics: multiclass classification (MCC) with one single categorical response variable and response manifold analytics (RMA) with multiple response variables. We compute visible and explainable information contents with multiscale and heterogeneous deterministic and stochastic structures in both topics. MCC involves all feature-group specific mixing geometries of labeled high-dimensional point-clouds. Upon each identified feature-group, we devise an indirect distance measure, a robust label embedding tree (LET), and a series of tree-based binary competitions to discover and present asymmetric mixing geometries. Then, a chain of complementary feature-groups offers a collection of mixing geometric pattern-categories with multiple perspective views. RMA studies a system's regulating principles via multiple dimensional manifolds jointly constituted by targeted multiple response features and selected major covariate features. This manifold is marked with categorical localities reflecting major effects. Diverse minor effects are checked and identified across all localities for heterogeneity. Both MCC and RMA information contents are computed for data's information content with predictive inferences as by-products. We illustrate CEDA developments via Iris data and demonstrate its applications on data taken from the PITCHf/x database.
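The abstract above describes categorizing every feature through its histogram and measuring association with rescaled conditional Shannon entropies. The sketch below is one plausible reading, not the paper's stated method: it assumes equal-width bins, rescales H(Y|X) by H(Y) for the directional association, and averages the two directions for the mutual association. The abstract confirms none of these details, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def bin_feature(values: Sequence[float], n_bins: int) -> List[int]:
    """Turn a quantitative feature into categories via equal-width histogram bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a sequence of category labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(y: Sequence[Hashable], x: Sequence[Hashable]) -> float:
    """H(Y | X): weighted entropy of y within each category of x."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / len(x) * entropy(g) for g in groups.values())


def directional_association(y: Sequence[Hashable], x: Sequence[Hashable]) -> float:
    """Assumed rescaling H(Y|X) / H(Y): 0 means x determines y, 1 means no help."""
    h_y = entropy(y)
    return conditional_entropy(y, x) / h_y if h_y > 0 else 0.0


def mutual_association(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Assumed symmetric measure: the average of the two directional associations."""
    return 0.5 * (directional_association(y, x) + directional_association(x, y))
```

Computing `mutual_association` for every feature pair gives a symmetric matrix that could be drawn as the heatmap the abstract refers to. Under this convention, small values mark strongly associated pairs.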

8.
Entropy (Basel) ; 23(5)2021 May 11.
Article in English | MEDLINE | ID: mdl-34064857

ABSTRACT

We develop Categorical Exploratory Data Analysis (CEDA) with mimicking to explore and exhibit the complexity of information content that is contained within any data matrix: categorical, discrete, or continuous. Such complexity is shown through visible and explainable serial multiscale structural dependency with heterogeneity. CEDA is developed upon all features' categorical nature via histograms, and it is guided by all features' associative patterns (order-2 dependence) in a mutual conditional entropy matrix. Higher-order structural dependency of k(≥3) features is exhibited through block patterns within heatmaps that are constructed by permuting contingency-kD-lattices of counts. By growing k, the resultant heatmap series contains global and large scales of structural dependency that constitute the data matrix's information content. When involving continuous features, principal component analysis (PCA) extracts fine-scale information content from each block in the final heatmap. Our mimicking protocol coherently simulates this heatmap series by preserving global-to-fine scales of structural dependency. At every step of the mimicking process, each accepted simulated heatmap is subject to constraints with respect to all of the reliable observed categorical patterns. For reliability and robustness in sciences, CEDA with mimicking enhances data visualization by revealing deterministic and stochastic structures within each scale-specific structural dependency. For inferences in Machine Learning (ML) and Statistics, it clarifies upon which scales which covariate feature-groups have major-vs.-minor predictive powers on response features. For the social justice of Artificial Intelligence (AI) products, it checks whether a data matrix incompletely prescribes the targeted system.

9.
J Neurosci Methods ; 295: 111-120, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29247676

ABSTRACT

BACKGROUND: Phase clustering within a single neurophysiological signal plays a significant role in a wide array of cognitive functions. Inter-trial phase coherence (ITC) is commonly used to assess to what extent phases are clustered in a similar direction over samples. However, this measure is especially dependent on sample size. Although ITC was transformed into ITCz, namely, Rayleigh's Z, to "correct" for the sample-size effect in previous research, the validity of this strategy has not been formally tested. NEW METHOD: This study introduced cosine similarity (CS) as an alternative solution, as this measure is an unbiased and consistent estimator for finite sample size and is considered less sensitive to the sample-size effect. RESULTS: In a series of studies using either artificial or real datasets, CS was robust against sample size variation even with small sample sizes. Moreover, several different aspects of examinations confirmed that CS could successfully detect phase-clustering differences between datasets with different sample sizes. COMPARISON WITH EXISTING METHODS: Existing measures suffer from sample-size effects. ITCz produced a mixed pattern of bias in assessing phase clustering according to sample size, whereas ITC overestimated the degree of phase clustering with small sample sizes. CONCLUSIONS: The current study not only reveals the incompetence of the previous "correction" measure, ITCz, but also provides converging evidence showing that CS may serve as an optimal measure to quantify phase clustering.


Subject(s)
Brain/physiology , Cluster Analysis , Sample Size , Signal Processing, Computer-Assisted , Adult , Computer Simulation , Facial Recognition/physiology , Female , Humans , Magnetoencephalography/methods , Male , Neurophysiology/methods , Periodicity , Young Adult
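The abstract above compares three phase-clustering measures: ITC, ITCz (Rayleigh's Z), and cosine similarity (CS). The sketch below uses the standard definitions of the first two: ITC as the length of the mean unit phase vector, and Rayleigh's Z as n times ITC squared. The abstract does not give the CS formula, so the mean pairwise cosine between phase vectors used here is an assumption, and the function names are hypothetical.

```python
import cmath
import math
from typing import Sequence


def itc(phases: Sequence[float]) -> float:
    """Inter-trial phase coherence: length of the mean unit vector (phases in radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def rayleigh_z(phases: Sequence[float]) -> float:
    """ITCz, i.e. Rayleigh's Z = n * ITC^2."""
    return len(phases) * itc(phases) ** 2


def pairwise_cosine_similarity(phases: Sequence[float]) -> float:
    """Assumed CS form: average of cos(phi_i - phi_j) over all distinct pairs."""
    n = len(phases)
    if n < 2:
        raise ValueError("need at least two phases")
    total = sum(math.cos(phases[i] - phases[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)
```

Under this assumed form, CS equals (n * ITC^2 - 1) / (n - 1), so its expectation is 0 for uniformly random phases at any sample size. ITC, by contrast, stays positive for small n even without true clustering, which matches the overestimation the abstract reports.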