Search | BVS Nicaragua

1.

JASPER: Fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression.

Mbatchou, Joelle; McPeek, Mary Sara.

Am J Hum Genet ; 2024 Jul 10.

Article in English | MEDLINE | ID: mdl-39025064

ABSTRACT

Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.

2.

Reliability of energy landscape analysis of resting-state functional MRI data.

Khanra, Pitambar; Nakuci, Johan; Muldoon, Sarah; Watanabe, Takamitsu; Masuda, Naoki.

Eur J Neurosci ; 2024 Jun 04.

Article in English | MEDLINE | ID: mdl-38837814

ABSTRACT

Energy landscape analysis is a data-driven method to analyse multidimensional time series, including functional magnetic resonance imaging (fMRI) data. It has been shown to be a useful characterization of fMRI data in health and disease. It fits an Ising model to the data and captures the dynamics of the data as movement of a noisy ball constrained on the energy landscape derived from the estimated Ising model. In the present study, we examine test-retest reliability of the energy landscape analysis. To this end, we construct a permutation test that assesses whether or not indices characterizing the energy landscape are more consistent across different sets of scanning sessions from the same participant (i.e. within-participant reliability) than across different sets of sessions from different participants (i.e. between-participant reliability). We show that the energy landscape analysis has significantly higher within-participant than between-participant test-retest reliability with respect to four commonly used indices. We also show that a variational Bayesian method, which enables us to estimate energy landscapes tailored to each participant, displays comparable test-retest reliability to that using the conventional likelihood maximization method. The proposed methodology paves the way to perform individual-level energy landscape analysis for given data sets with a statistically controlled reliability.

3.

Investigating the effects of chiropractic care on resting-state EEG of MCI patients.

Ziloochi, Fahimeh; Niazi, Imran Khan; Amjad, Imran; Cade, Alice; Duehr, Jenna; Ghani, Usman; Holt, Kelly; Haavik, Heidi; Shalchyan, Vahid.

Front Aging Neurosci ; 16: 1406664, 2024.

Article in English | MEDLINE | ID: mdl-38919600

ABSTRACT

Introduction: Mild cognitive impairment (MCI) is a stage between health and dementia, with various symptoms including memory, language, and visuospatial impairment. Chiropractic, a manual therapy that seeks to improve the function of the body and spine, has been shown to affect sensorimotor processing, multimodal sensory processing, and mental processing tasks. Methods: In this paper, the effect of chiropractic intervention on Electroencephalogram (EEG) signals in patients with mild cognitive impairment was investigated. EEG signals from two groups of patients with mild cognitive impairment (n = 13 people in each group) were recorded pre- and post-control and chiropractic intervention. A comparison of relative power was done with the support vector machine (SVM) method and non-parametric cluster-based permutation test showing the two groups could be separately identified with high accuracy. Results: The highest accuracy was obtained in beta2 (25-35 Hz) and theta (4-8 Hz) bands. A comparison of different brain areas with the SVM method showed that the intervention had a greater effect on frontal areas. Also, interhemispheric coherence in all regions increased significantly after the intervention. The results of the Wilcoxon test showed that intrahemispheric coherence changes in frontal-occipital, frontal-temporal and right temporal-occipital regions were significantly different in two groups. Discussion: Comparison of the results obtained from chiropractic intervention and previous studies shows that chiropractic intervention can have a positive effect on MCI disease and using this method may slow down the progression of mild cognitive impairment to Alzheimer's disease.

4.

Statistical Methods for Comparing Predictive Values in Medical Diagnosis.

Park, Chanrim; Park, Seo Young; Kim, Hwa Jung; Shin, Hee Jung.

Korean J Radiol ; 25(7): 656-661, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38942459

ABSTRACT

Evaluating the performance of a binary diagnostic test, including artificial intelligence classification algorithms, involves measuring sensitivity, specificity, positive predictive value, and negative predictive value. Particularly when comparing the performance of two diagnostic tests applied on the same set of patients, these metrics are crucial for identifying the more accurate test. However, comparing predictive values presents statistical challenges because their denominators depend on the test outcomes, unlike the comparison of sensitivities and specificities. This paper reviews existing methods for comparing predictive values and proposes using the permutation test. The permutation test is an intuitive, non-parametric method suitable for datasets with small sample sizes. We demonstrate each method using a dataset from MRI and combined modality of mammography and ultrasound in diagnosing breast cancer.

Subject(s)

Breast Neoplasms , Magnetic Resonance Imaging , Predictive Value of Tests , Humans , Breast Neoplasms/diagnostic imaging , Female , Magnetic Resonance Imaging/methods , Mammography/methods , Sensitivity and Specificity , Algorithms , Ultrasonography, Mammary/methods
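
To make the idea in the abstract above concrete, here is a minimal sketch of a permutation test for the difference in positive predictive value (PPV) between two tests applied to the same patients. It assumes that, under the null hypothesis, the two test results are exchangeable within each patient, so they can be swapped at random. The function names, the record layout `(disease, test1, test2)`, and the choice of statistic are illustrative assumptions and are not taken from the paper.

```python
import random


def ppv(records, k):
    """PPV of test k (1 or 2) over (disease, test1, test2) records of 0/1 values."""
    positives = [r for r in records if r[k] == 1]
    if not positives:
        return None  # PPV is undefined when the test flags nobody
    return sum(r[0] for r in positives) / len(positives)


def ppv_permutation_test(records, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for PPV(test1) - PPV(test2)."""
    rng = random.Random(seed)
    observed = ppv(records, 1) - ppv(records, 2)
    hits = 0
    used = 0
    for _ in range(n_perm):
        # Swap the two test results within each patient with probability 1/2.
        shuffled = [
            (d, t2, t1) if rng.random() < 0.5 else (d, t1, t2)
            for d, t1, t2 in records
        ]
        p1, p2 = ppv(shuffled, 1), ppv(shuffled, 2)
        if p1 is None or p2 is None:
            continue  # skip permutations where a PPV is undefined
        used += 1
        if abs(p1 - p2) >= abs(observed) - 1e-12:
            hits += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (used + 1)
```

Swapping within patients preserves the pairing of the two tests, which is why the denominators of the two PPVs are recomputed on every permutation rather than held fixed.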

5.

Extending the CWM approach to intraspecific trait variation: how to deal with overly optimistic standard tests?

Zelený, David; Helsen, Kenny; Lee, Yi-Nuo.

Oecologia ; 205(2): 257-269, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38806949

ABSTRACT

Community weighted means (CWMs) are widely used to study the relationship between community-level functional traits and environment. For certain null hypotheses, CWM-environment relationships assessed by linear regression or ANOVA and tested by standard parametric tests are prone to inflated Type I error rates. Previous research has found that this problem can be solved by permutation tests (i.e., the max test). A recent extension of the CWM approach allows the inclusion of intraspecific trait variation (ITV) by the separate calculation of fixed, site-specific, and intraspecific CWMs. The question is whether the same Type I error rate inflation exists for the relationship between environment and site-specific or intraspecific CWM. Using simulated and real-world community datasets, we show that site-specific CWM-environment relationships have also inflated Type I error rate, and this rate is negatively related to the relative ITV magnitude. In contrast, for intraspecific CWM-environment relationships, standard parametric tests have the correct Type I error rate, although somewhat reduced statistical power. We introduce an ITV-extended version of the max test, which can solve the inflation problem for site-specific CWM-environment relationships and, without considering ITV, becomes equivalent to the "original" max test used for the CWM approach. We show that this new ITV-extended max test works well across the full possible magnitude of ITV on both simulated and real-world data. Most real datasets probably do not have intraspecific trait variation large enough to alleviate the problem of inflated Type I error rate, and published studies possibly report overly optimistic significance results.

Subject(s)

Ecosystem
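
As a rough illustration of the "original" max test mentioned in the abstract above (not the ITV-extended version the paper introduces), the sketch below computes community weighted means and combines a row-based permutation (shuffling environment values across sites) with a column-based permutation (shuffling trait values across species), reporting the larger of the two p-values. The use of the absolute Pearson correlation as the statistic, and all function names, are assumptions made for this example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cwm(abund, traits):
    """Community weighted mean per site: abundance-weighted average trait value."""
    return [sum(a * t for a, t in zip(row, traits)) / sum(row) for row in abund]


def max_test(abund, traits, env, n_perm=999, seed=0):
    """Return (p_row, p_col, p_max) for the CWM-environment correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(cwm(abund, traits), env))
    row_hits = col_hits = 0
    for _ in range(n_perm):
        env_perm = list(env)
        rng.shuffle(env_perm)  # row-based: permute sites
        if abs(pearson(cwm(abund, traits), env_perm)) >= observed:
            row_hits += 1
        traits_perm = list(traits)
        rng.shuffle(traits_perm)  # column-based: permute species
        if abs(pearson(cwm(abund, traits_perm), env)) >= observed:
            col_hits += 1
    p_row = (row_hits + 1) / (n_perm + 1)
    p_col = (col_hits + 1) / (n_perm + 1)
    return p_row, p_col, max(p_row, p_col)
```

Taking the maximum means the relationship is declared significant only when both the site-level and the species-level null models are rejected.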

6.

In-depth analysis of volatolomic and odorous profiles of novel craft beer by permutation test features selection and multivariate correlation analysis.

Tufariello, Maria; Palombi, Lorenzo; Baiano, Antonietta; Grieco, Francesco.

Food Chem ; 453: 139702, 2024 Sep 30.

Article in English | MEDLINE | ID: mdl-38772309

ABSTRACT

This research explored the impact of binary cereal blends [barley with durum wheat (DW) and soft wheat (CW)], four autochthonous yeast strains (9502, 9518, 14061 and 17290) and two refermentation sugar concentrations (6-9 g/L), on volatolomics (VOCs) and odour profiles of craft beers using unsupervised statistics. For the first time, we applied a permutation test to select volatiles with higher significance in explaining variance among samples. The unsupervised approach on the 19 selected VOCs revealed cereal-yeast interaction to be the main source of variability and DW-9502-6/9, DW-17290-6, CW-17290-6 and CW-9518-6 being the best technological strategies. In particular, in samples DW-9502-6/9, concentrations of some of the selected volatiles were observed to be approximately three to more than seven times higher than the average. PLS-correlation between VOCs and odour profiles proved to be very useful in assessing the weight of each of the selected VOCs on the perception of odour notes.

Subject(s)

Beer , Odorants , Volatile Organic Compounds , Beer/analysis , Odorants/analysis , Volatile Organic Compounds/chemistry , Volatile Organic Compounds/analysis , Multivariate Analysis , Triticum/chemistry , Triticum/genetics , Hordeum/chemistry , Hordeum/genetics , Hordeum/microbiology , Humans , Fermentation

7.

Statistical considerations in model-based dose finding for binary responses under model uncertainty.

Yan, Zhiwu; Yang, Min.

Stat Med ; 43(12): 2472-2485, 2024 May 30.

Article in English | MEDLINE | ID: mdl-38605556

ABSTRACT

The statistical methodology for model-based dose finding under model uncertainty has attracted increasing attention in recent years. While the underlying principles are simple and easy to understand, developing and implementing an efficient approach for binary responses can be a formidable task in practice. Motivated by the statistical challenges encountered in a phase II dose finding study, we explore several key design and analysis issues related to the hybrid testing-modeling approaches for binary responses. The issues include candidate model selection and specifications, optimal design and efficient sample size allocations, and, notably, the methods for dose-response testing and estimation. Specifically, we consider a class of generalized linear models suited for the candidate set and establish D-optimal designs for these models. Additionally, we propose using permutation-based tests for dose-response testing to avoid asymptotic normality assumptions typically required for contrast-based tests. We perform trial simulations to enhance our understanding of these issues.

Subject(s)

Computer Simulation , Dose-Response Relationship, Drug , Models, Statistical , Humans , Uncertainty , Linear Models , Clinical Trials, Phase II as Topic/methods , Clinical Trials, Phase II as Topic/statistics & numerical data , Sample Size , Research Design , Data Interpretation, Statistical

8.

Inferential procedures based on the weighted Pearson correlation coefficient test statistic.

Yu, Han; Hutson, Alan D.

J Appl Stat ; 51(3): 481-496, 2024.

Article in English | MEDLINE | ID: mdl-38370269

ABSTRACT

In this note, we evaluated the type I error control of the commonly used t-test found in most statistical software packages for testing the hypothesis H0: ρ = 0 vs. H1: ρ > 0 based on the sample weighted Pearson correlation coefficient. We found the type I error rate is severely inflated in general cases, even under bivariate normality. To address this issue, we derived the large sample variance of the weighted Pearson correlation. Based on this result, we proposed an asymptotic test and a set of studentized permutation tests. A comprehensive set of simulation studies with a range of sample sizes and a variety of underlying distributions were conducted. The studentized permutation test based on Fisher's Z statistic was shown to robustly control the type I error even in the small sample and non-normality settings. The method was demonstrated with an example dataset of country-level preterm birth rates.
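
For readers unfamiliar with the statistic discussed in the abstract above, the following sketch computes a sample weighted Pearson correlation from its usual definition (weighted covariance divided by the square root of the product of weighted variances). It does not reproduce the paper's large-sample variance or its studentized permutation tests, and the function name `weighted_pearson` is an assumption made for this example.

```python
from math import sqrt


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    # Weighted means of x and y.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted sums of cross-products and squares about the weighted means.
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / sqrt(sxx * syy)
```

With equal weights this reduces to the ordinary Pearson correlation, which gives a quick sanity check for any implementation.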

9.

The siren song of so-called evidence: Why the evidence for social ecology models is not as strong as we think.

Zhong, Jingwen; Brashears, Matthew E.

Soc Sci Res ; 118: 102978, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38336421

ABSTRACT

Ecological competition models from biology have been adopted for the study of a wide variety of social entities, including workplace organizations and voluntary associations. Despite their popularity, a number of fundamental challenges to these models have not been sufficiently recognized or addressed. As a result, it's possible that some apparently supportive evidence for ecological competition is in fact the outcome of chance or other processes. We propose a permutation test to compare observed evidence for ecological competition against an appropriate counterfactual population. To demonstrate our approach and validate our concern about the quality of evidence for ecological competition models, we apply the permutation test to one specific case. The results indicate that K-correlation values that have been taken as evidence for a well-established model, the Ecology of Affiliation, are quite common even in the absence of ecological competition. We conclude that the existing evidence for social ecology models may not be as reliable as commonly believed due to the disconnect between theory and empirical testing.

Subject(s)

Ecology , Models, Theoretical , Humans , Social Environment

10.

The effects of trauma on feedback processing: an MEG study.

Sawalma, Abdulrahman S; Kiefer, Christian M; Boers, Frank; Shah, N Jon; Khudeish, Nibal; Neuner, Irene; Herzallah, Mohammad M; Dammers, Jürgen.

Front Neurosci ; 17: 1172549, 2023.

Article in English | MEDLINE | ID: mdl-38027493

ABSTRACT

The cognitive impact of psychological trauma can manifest as a range of post-traumatic stress symptoms that are often attributed to impairments in learning from positive and negative outcomes, aka reinforcement learning. Research on the impact of trauma on reinforcement learning has mainly been inconclusive. This study aimed to circumscribe the impact of psychological trauma on reinforcement learning in the context of neural response in time and frequency domains. Two groups of participants were tested - those who had experienced psychological trauma and a control group who had not - while they performed a probabilistic classification task that dissociates learning from positive and negative feedback during a magnetoencephalography (MEG) examination. While the exposure to trauma did not exhibit any effects on learning accuracy or response time for positive or negative feedback, MEG cortical activity was modulated in response to positive feedback. In particular, the medial and lateral orbitofrontal cortices (mOFC and lOFC) exhibited increased activity, while the insular and supramarginal cortices showed decreased activity during positive feedback presentation. Furthermore, when receiving negative feedback, the trauma group displayed higher activity in the medial portion of the superior frontal cortex. The timing of these activity changes occurred between 160 and 600 ms post feedback presentation. Analysis of the time-frequency domain revealed heightened activity in theta and alpha frequency bands (4-10 Hz) in the lOFC in the trauma group. Moreover, dividing the two groups according to their learning performance, the activity for the non-learner subgroup was found to be lower in lOFC and higher in the supramarginal cortex. These differences were found in the trauma group only. The results highlight the localization and neural dynamics of feedback processing that could be affected by exposure to psychological trauma. 
This approach and associated findings provide a novel framework for understanding the cognitive correlates of psychological trauma in relation to neural dynamics in the space, time, and frequency domains. Subsequent work will focus on the stratification of cognitive and neural correlates as a function of various symptoms of psychological trauma. Clinically, the study findings and approach open the possibility for neuromodulation interventions that synchronize cognitive and psychological constructs for individualized treatment.

11.

Testing exchangeability of multivariate distributions.

Kalina, Jan; Janácek, Patrik.

J Appl Stat ; 50(15): 3142-3156, 2023.

Article in English | MEDLINE | ID: mdl-37969545

ABSTRACT

Although a number of tests of bivariate exchangeability, i.e. bivariate symmetry for bivariate distributions, are available, the literature is devoid of tests of whether a multivariate distribution with more than two dimensions is exchangeable or not. In this paper, multivariate permutation tests of exchangeability of multivariate distributions are proposed, which are based on the non-parametric combination methodology, i.e. on combining non-parametric bivariate exchangeability tests. Numerical experiments on real as well as simulated multivariate data with more than two dimensions are presented here. The multivariate permutation test turns out to be typically more powerful than a bivariate exchangeability test performed only over a single pair of variables, and also more suitable compared to tests exploiting the approaches of Benjamini-Yekutieli or Bonferroni.

12.

Topological inference on brain networks across subtypes of post-stroke aphasia.

Wang, Yuan; Yin, Jian; Desai, Rutvik H.

ArXiv ; 2023 Nov 02.

Article in English | MEDLINE | ID: mdl-37961747

ABSTRACT

Persistent homology (PH) characterizes the shape of brain networks through the persistence features. Group comparison of persistence features from brain networks can be challenging as they are inherently heterogeneous. A recent scale-space representation of persistence diagram (PD) through heat diffusion reparameterizes using the finite number of Fourier coefficients with respect to the Laplace-Beltrami (LB) eigenfunction expansion of the domain, which provides a powerful vectorized algebraic representation for group comparisons of PDs. In this study, we advance a transposition-based permutation test for comparing multiple groups of PDs through the heat-diffusion estimates of the PDs. We evaluate the empirical performance of the spectral transposition test in capturing within- and between-group similarity and dissimilarity with respect to statistical variation of topological noise and hole location. We also illustrate how the method extends naturally into a clustering scheme by subtyping individuals with post-stroke aphasia through the PDs of their resting-state functional brain networks.

13.

Limitation of permutation-based differential correlation analysis.

Song, Hoseung; Wu, Michael C.

Genet Epidemiol ; 47(8): 637-641, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37947279

ABSTRACT

The comparison of biological systems, through the analysis of molecular changes under different conditions, has played a crucial role in the progress of modern biological science. Specifically, differential correlation analysis (DCA) has been employed to determine whether relationships between genomic features differ across conditions or outcomes. Because ascertaining the null distribution of test statistics to capture variations in correlation is challenging, several DCA methods utilize permutation which can loosen parametric (e.g., normality) assumptions. However, permutation is often problematic for DCA due to violating the assumption that samples are exchangeable under the null. Here, we examine the limitations of permutation-based DCA and investigate instances where the permutation-based DCA exhibits poor performance. Experimental results show that the permutation-based DCA often fails to control the type I error under the null hypothesis of equal correlation structures.

Subject(s)

Genomics , Humans , Statistics as Topic
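
The kind of procedure the abstract above critiques can be sketched as follows: compute the difference in Pearson correlation between two conditions, then build a null distribution by shuffling condition labels over the pooled samples. This is a generic illustration, not the authors' code, and the function names are assumptions. Note that label shuffling enforces equality of the whole joint distribution across conditions, not just equality of correlations, which is the exchangeability concern the abstract raises.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def corr_of_pairs(pairs):
    """Correlation between the two features across a list of (x, y) samples."""
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


def dca_permutation_test(group_a, group_b, n_perm=999, seed=0):
    """Permutation p-value for |corr(group_a) - corr(group_b)|."""
    rng = random.Random(seed)
    observed = abs(corr_of_pairs(group_a) - corr_of_pairs(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to conditions at random
        diff = abs(corr_of_pairs(pooled[:n_a]) - corr_of_pairs(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

If the two conditions differ in variance or marginal shape while sharing the same correlation, the shuffled samples are not exchangeable and the resulting p-value may not control the type I error.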

14.

Model-agnostic unsupervised detection of bots in a Likert-type questionnaire.

Ilagan, Michael John; Falk, Carl F.

Behav Res Methods ; 2023 Nov 20.

Article in English | MEDLINE | ID: mdl-37985637

ABSTRACT

To detect bots in online survey data, there is a wealth of literature on statistical detection using only responses to Likert-type items. There are two traditions in the literature. One tradition requires labeled data, forgoing strong model assumptions. The other tradition requires a measurement model, forgoing collection of labeled data. In the present article, we consider the problem where neither requirement is available, for an inventory that has the same number of Likert-type categories for all items. We propose a bot detection algorithm that is both model-agnostic and unsupervised. Our proposed algorithm involves a permutation test with leave-one-out calculations of outlier statistics. For each respondent, it outputs a p value for the null hypothesis that the respondent is a bot. Such an algorithm offers nominal sensitivity calibration that is robust to the bot response distribution. In a simulation study, we found our proposed algorithm to improve upon naive alternatives in terms of 95% sensitivity calibration and, in many scenarios, in terms of classification accuracy.

15.

RgnTX: Colocalization analysis of transcriptome elements in the presence of isoform heterogeneity and ambiguity.

Wang, Yue; Wei, Zhen; Su, Jionglong; Coenen, Frans; Meng, Jia.

Comput Struct Biotechnol J ; 21: 4110-4117, 2023.

Article in English | MEDLINE | ID: mdl-37671241

ABSTRACT

Colocalization analysis of genomic region sets has been widely adopted to unveil potential functional interactions between corresponding biological attributes, which often serves as the basis for further investigation. A number of methods have been developed for colocalization analysis of genomic elements. However, none of them explicitly considered the transcriptome heterogeneity and isoform ambiguity, making them less appropriate for analyzing transcriptome elements. Here, we developed RgnTX, an R/Bioconductor tool for the colocalization analysis of transcriptome elements with permutation tests. Unlike existing approaches, RgnTX directly takes advantage of transcriptome annotation, and offers high flexibility in the null model to simulate realistic transcriptome-wide background, such as the complex alternative splicing patterns. Importantly, it supports the testing of transcriptome elements without clear isoform association, which is often the real scenario due to technical limitations. The proposed package offers a wide selection of pre-defined functions that users can easily apply to visualize permutation results, calculate shifted z-scores, and conduct multiple hypothesis testing under Benjamini-Hochberg correction. Moreover, with synthetic and real datasets, we show that RgnTX's novel testing modes return distinct and more significant results compared to existing genome-based methods. We believe RgnTX should make a useful tool to characterize the randomness of the transcriptome, and for conducting statistical association analysis for genomic region sets within the heterogeneous transcriptome. The package has now been accepted by Bioconductor and is freely available at: https://bioconductor.org/packages/RgnTX.

16.

Accurate and fast small p-value estimation for permutation tests in high-throughput genomic data analysis with the cross-entropy method.

Shi, Yang; Shi, Weiping; Wang, Mengqiao; Lee, Ji-Hyun; Kang, Huining; Jiang, Hui.

Stat Appl Genet Mol Biol ; 22(1), 2023 Jan 01.

Article in English | MEDLINE | ID: mdl-37622330

ABSTRACT

Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge in the application of permutation tests in genomic studies is that an enormous number of permutations are often needed to obtain reliable estimates of very small p-values, leading to intensive computational effort. To address this issue, we develop algorithms for the accurate and efficient estimation of small p-values in permutation tests for paired and independent two-group genomic data, and our approaches leverage a novel framework for parameterizing the permutation sample spaces of those two types of data respectively using the Bernoulli and conditional Bernoulli distributions, combined with the cross-entropy method. The performance of our proposed algorithms is demonstrated through the application to two simulated datasets and two real-world gene expression datasets generated by microarray and RNA-Seq technologies and comparisons to existing methods such as crude permutations and SAMC, and the results show that our approaches can achieve orders of magnitude of computational efficiency gains in estimating small p-values. Our approaches offer promising solutions for the improvement of computational efficiencies of existing permutation test procedures and the development of new testing methods using permutations in genomic data analysis.

Subject(s)

Genomics , Research Design , Entropy , Algorithms , Data Analysis
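
The computational burden described in the abstract above follows directly from how a crude Monte Carlo permutation p-value is formed. The sketch below, with function names chosen for this example, uses the absolute difference in group means as the statistic and the common (hits + 1) / (B + 1) estimator; it shows only the crude baseline, not the cross-entropy method the paper develops.

```python
import random


def crude_permutation_pvalue(a, b, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for the absolute difference in means."""
    rng = random.Random(seed)

    def mean_diff(x, y):
        return abs(sum(x) / len(x) - sum(y) / len(y))

    observed = mean_diff(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        if mean_diff(pooled[: len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def min_attainable_p(n_perm):
    """Smallest p-value the crude estimator can report with n_perm permutations."""
    return 1 / (n_perm + 1)
```

Because the estimate can never fall below 1 / (B + 1), resolving a p-value near 1e-8 with this estimator requires on the order of 1e8 permutations per test, which is what motivates more efficient schemes.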

17.

Reliability of energy landscape analysis of resting-state functional MRI data.

Khanra, Pitambar; Nakuci, Johan; Muldoon, Sarah; Watanabe, Takamitsu; Masuda, Naoki.

ArXiv ; 2023 May 31.

Article in English | MEDLINE | ID: mdl-37396616

ABSTRACT

Energy landscape analysis is a data-driven method to analyze multidimensional time series, including functional magnetic resonance imaging (fMRI) data. It has been shown to be a useful characterization of fMRI data in health and disease. It fits an Ising model to the data and captures the dynamics of the data as movement of a noisy ball constrained on the energy landscape derived from the estimated Ising model. In the present study, we examine test-retest reliability of the energy landscape analysis. To this end, we construct a permutation test that assesses whether or not indices characterizing the energy landscape are more consistent across different sets of scanning sessions from the same participant (i.e., within-participant reliability) than across different sets of sessions from different participants (i.e., between-participant reliability). We show that the energy landscape analysis has significantly higher within-participant than between-participant test-retest reliability with respect to four commonly used indices. We also show that a variational Bayesian method, which enables us to estimate energy landscapes tailored to each participant, displays comparable test-retest reliability to that using the conventional likelihood maximization method. The proposed methodology paves the way to perform individual-level energy landscape analysis for given data sets with a statistically controlled reliability.

18.

A New Analysis of Real-Time Fatality Rate in the Initial Stage of COVID-19.

Zhou, Chuanbo; Fang, Jiaohong; Mao, Mingzhi.

Entropy (Basel) ; 25(7), 2023 Jul 06.

Article in English | MEDLINE | ID: mdl-37509975

ABSTRACT

Mortality is one of the most important epidemiological measures and a key indicator of the effectiveness of potential treatments or interventions. In this paper, a permutation test method of variance analysis is proposed to test the null hypothesis that the real-time fatality rates of multiple groups were equal during the epidemic period. In light of large-scale simulation studies, the proposed test method can accurately identify the differences between different groups and display satisfactory performance. We apply the proposed method to the real dataset of the COVID-19 epidemic in mainland China (excluding Hubei), Hubei Province (excluding Wuhan), and Wuhan from 31 January 2020 to 30 March 2020. By comparing the differences in the disease severity for different cities, we show that the severity of the early disease of COVID-19 may be related to the effectiveness of interventions and the improvement in medical resources.

19.

The importance of choosing a proper validation strategy in predictive models. A tutorial with real examples.

Lopez, Eneko; Etxebarria-Elezgarai, Jaione; Amigo, Jose Manuel; Seifert, Andreas.

Anal Chim Acta ; 1275: 341532, 2023 Sep 22.

Article in English | MEDLINE | ID: mdl-37524478

ABSTRACT

Machine learning is the art of combining a set of measurement data and predictive variables to forecast future events. Every day, new model approaches (with high levels of sophistication) can be found in the literature. However, less importance is given to the crucial stage of validation. Validation is the assessment that the model reliably links the measurements and the predictive variables. Nevertheless, there are many ways in which a model can be validated and cross-validated reliably, but still, it may be a model that wrongly reflects the real nature of the data and cannot be used to predict external samples. This manuscript shows in a didactical manner how important the data structure is when a model is constructed and how easy it is to obtain models that look promising with wrongly designed cross-validation and external validation strategies. A comprehensive overview of the main validation strategies is shown, exemplified by three different scenarios, all of them focused on classification.

20.

Identifying potential significant factors impacting zero-inflated proportion data.

Ribaud, Mélina; Gabriel, Edith; Hughes, Joseph; Soubeyrand, Samuel.

Stat Med ; 42(19): 3467-3486, 2023 Aug 30.

Article in English | MEDLINE | ID: mdl-37290435

ABSTRACT

Classical supervised methods like linear regression and decision trees are not completely adapted for identifying impacting factors on a response variable corresponding to zero-inflated proportion data (ZIPD) that are dependent, continuous and bounded. In this article we propose a within-block permutation-based methodology to identify factors (discrete or continuous) that are significantly correlated with ZIPD, we propose a performance indicator quantifying the percentage of correlation explained by the subset of significant factors, and we show how to predict the ranks of the response variables conditionally on the observation of these factors. The methodology is illustrated on simulated data and on two real data sets dealing with epidemiology. In the first data set, ZIPD correspond to probabilities of transmission of Influenza between horses. In the second data set, ZIPD correspond to probabilities that geographic entities (e.g., states and countries) have the same COVID-19 mortality dynamics.

Subject(s)

COVID-19 , Models, Statistical , Animals , Horses , COVID-19/epidemiology , Linear Models , Probability
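
The phrase "within-block permutation" in the abstract above can be illustrated with a small sketch: response values are shuffled only among observations that share a block label, so any dependence structure captured by the blocks is preserved under the null. How the paper defines its blocks and its test statistic is not stated in the abstract, so the helper names, the block labels, and the user-supplied `statistic` callable below are all assumptions.

```python
import random


def shuffle_within_blocks(values, blocks, rng):
    """Return a copy of values shuffled only among positions sharing a block label."""
    by_block = {}
    for idx, label in enumerate(blocks):
        by_block.setdefault(label, []).append(idx)
    out = list(values)
    for indices in by_block.values():
        block_values = [values[i] for i in indices]
        rng.shuffle(block_values)
        for i, v in zip(indices, block_values):
            out[i] = v
    return out


def within_block_pvalue(factor, response, blocks, statistic, n_perm=999, seed=0):
    """Permutation p-value for statistic(factor, response), permuting within blocks."""
    rng = random.Random(seed)
    observed = statistic(factor, response)
    hits = 0
    for _ in range(n_perm):
        permuted = shuffle_within_blocks(response, blocks, rng)
        if statistic(factor, permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A caller would pass any association measure as `statistic`, for example an absolute rank correlation between the factor and the response.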
