Search | VHL Regional Portal (BVS)

Homogeneity pursuit and variable selection in regression models for multivariate abundance data.

Hui, Francis K C; Maestrini, Luca; Welsh, Alan H.

Biometrics ; 80(1), 2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38364807

ABSTRACT

When building regression models for multivariate abundance data in ecology, it is important to allow for the fact that the species are correlated with each other. Moreover, there is often evidence that species exhibit some degree of homogeneity in their responses to each environmental predictor, and that most species are informed by only a subset of predictors. We propose a generalized estimating equation (GEE) approach for simultaneous homogeneity pursuit (i.e., grouping species with similar coefficient values while allowing differing groups for different covariates) and variable selection in regression models for multivariate abundance data. Using GEEs allows us to straightforwardly account for between-response correlations through a (reduced-rank) working correlation matrix. We augment the GEE with both adaptive fused lasso- and adaptive lasso-type penalties, which aim to cluster the species-specific coefficients within each covariate and encourage differing levels of sparsity across the covariates, respectively. Numerical studies demonstrate the strong finite sample performance of the proposed method relative to several existing approaches for modeling multivariate abundance data. Applying the proposed method to presence-absence records collected along the Great Barrier Reef in Australia reveals both a substantial degree of homogeneity and sparsity in species-environmental relationships. We show this leads to a more parsimonious model for understanding the environmental drivers of seabed biodiversity, and results in stronger out-of-sample predictive performance relative to methods that do not accommodate such features.
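
As a rough illustration of the kind of penalty this abstract describes, the following is a generic sketch and not necessarily the authors' exact formulation. The notation is assumed here: \beta_{jk} is the coefficient of covariate k for species j (j = 1, ..., m; k = 1, ..., p), \lambda is a tuning parameter, and w_{k,jj'} and v_{jk} are adaptive weights.

```latex
P_\lambda(\beta) = \lambda \sum_{k=1}^{p} \left\{
    \sum_{1 \le j < j' \le m} w_{k,jj'} \, \lvert \beta_{jk} - \beta_{j'k} \rvert
    + \sum_{j=1}^{m} v_{jk} \, \lvert \beta_{jk} \rvert
\right\}
```

The first inner sum is of fused lasso type and pulls the coefficients of different species toward a common value within each covariate. The second is of lasso type and pulls individual coefficients toward zero. In adaptive versions the weights are usually taken inversely proportional to the size of an initial unpenalized estimate. Whether the paper uses one tuning parameter or several is not stated in the abstract.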

C-reactive protein and serum creatinine, but not haemoglobin A1c, are independent predictors of coronary heart disease risk in non-diabetic Chinese.

Salim, Agus; Tai, E Shyong; Tan, Vincent Y; Welsh, Alan H; Liew, Reginald; Naidoo, Nasheen; Wu, Yi; Yuan, Jian-Min; Koh, Woon P; van Dam, Rob M.

Eur J Prev Cardiol ; 23(12): 1339-49, 2016 Aug.

Article in English | MEDLINE | ID: mdl-26780920

ABSTRACT

BACKGROUND: In western populations, high-sensitivity C-reactive protein (hsCRP), and to a lesser degree serum creatinine and haemoglobin A1c, predict risk of coronary heart disease (CHD). However, data on Asian populations that are increasingly affected by CHD are sparse and it is not clear whether these biomarkers can be used to improve CHD risk classification. DESIGN AND METHODS: We conducted a nested case-control study within the Singapore Chinese Health Study cohort, with incident 'hard' CHD (myocardial infarction or CHD death) as an outcome. We used data from 965 men (298 cases, 667 controls) and 528 women (143 cases, 385 controls) to examine the utility of hsCRP, serum creatinine and haemoglobin A1c in improving the prediction of CHD risk over and above traditional risk factors for CHD included in the ATP III model. For each sex, the performance of models with only traditional risk factors used in the ATP III model was compared with models with the biomarkers added using weighted Cox proportional hazards analysis. The impact of adding these biomarkers was assessed using the net reclassification improvement index. RESULTS: For men, log_e hsCRP (hazard ratio 1.25, 95% confidence interval: 1.05; 1.49) and log_e serum creatinine (hazard ratio 4.82, 95% confidence interval: 2.10; 11.04) showed statistically significant associations with CHD risk when added to the ATP III model. We did not observe a significant association between log_e haemoglobin A1c and CHD risk (hazard ratio 1.83, 95% confidence interval: 0.21; 16.06). Adding hsCRP and serum creatinine to the ATP III model improved risk classification in men with a net gain of 6.3% of cases (p-value = 0.001) being reclassified to a higher risk category, while it did not significantly reduce the accuracy of classification for non-cases. For women, squared hsCRP was borderline significantly (hazard ratio 1.01, 95% confidence interval: 1.00; 1.03) and squared serum creatinine was significantly (hazard ratio 1.81, 95% confidence interval: 1.49; 2.21) associated with CHD risk. However, the association between squared haemoglobin A1c and CHD risk was not significant (hazard ratio 1.05, 95% confidence interval: 0.99; 1.12). The addition of hsCRP and serum creatinine to the ATP III model resulted in 3.7% of future cases being reclassified to a higher risk category (p-value = 0.025), while it did not significantly reduce the accuracy of classification for non-cases. CONCLUSION: Adding hsCRP and serum creatinine, but not haemoglobin A1c, to traditional risk factors improved CHD risk prediction among non-diabetic Singaporean Chinese. The improved risk estimates will allow better identification of individuals at high risk of CHD than existing risk calculators such as the ATP III model.
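
The net reclassification improvement mentioned in this abstract can be illustrated with a minimal sketch. It uses the usual unweighted, category-based definition: among cases, the proportion moved to a higher risk category minus the proportion moved to a lower one, with the reverse for non-cases. The function name and the toy data are hypothetical. The study itself used a weighted analysis for its nested case-control design, which this sketch does not reproduce.

```python
def net_reclassification(old_cat, new_cat, is_case):
    """Category-based NRI; returns (cases, non-cases, total).

    old_cat / new_cat: ordered risk categories (e.g. 0 = low, 1 = medium, 2 = high)
    under the baseline and the extended model; is_case: True for subjects with an event.
    """
    up_c = down_c = n_c = 0   # counts among cases
    up_n = down_n = n_n = 0   # counts among non-cases
    for old, new, case in zip(old_cat, new_cat, is_case):
        if case:
            n_c += 1
            up_c += new > old
            down_c += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    nri_cases = (up_c - down_c) / n_c        # moving up is an improvement for cases
    nri_noncases = (down_n - up_n) / n_n     # moving down is an improvement for non-cases
    return nri_cases, nri_noncases, nri_cases + nri_noncases


# Hypothetical toy data: four cases followed by four non-cases.
old = [0, 0, 1, 1, 0, 1, 1, 2]
new = [1, 0, 2, 0, 0, 0, 1, 2]
case = [True] * 4 + [False] * 4
print(net_reclassification(old, new, case))
```

The case component corresponds to the kind of figure quoted in the abstract (a net percentage of cases reclassified to a higher category), while the non-case component measures whether classification of non-cases got worse.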

Subjects

C-Reactive Protein/metabolism; Coronary Disease/blood; Creatinine/blood; Glycated Hemoglobin/metabolism; Risk Assessment; Aged; Biomarkers/blood; Case-Control Studies; Coronary Disease/epidemiology; Diabetes Mellitus; Female; Follow-Up Studies; Humans; Incidence; Male; Middle Aged; Predictive Value of Tests; Risk Factors; Singapore/epidemiology

Adjusting for one issue while ignoring others can make things worse.

Welsh, Alan H; Lindenmayer, David B; Donnelly, Christine F.

PLoS One ; 10(3): e0120817, 2015.

Article in English | MEDLINE | ID: mdl-25786210

Subjects

Algorithms; Ecology/statistics & numerical data; Models, Biological; Models, Statistical

Response.

Welsh, Alan H; Knight, Emma J.

Med Sci Sports Exerc ; 47(4): 886, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25783667

Subjects

Bayes Theorem; Data Interpretation, Statistical; Sports Medicine/statistics & numerical data; Humans

"Magnitude-based inference": a statistical review.

Welsh, Alan H; Knight, Emma J.

Med Sci Sports Exerc ; 47(4): 874-84, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25051387

ABSTRACT

PURPOSE: We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. METHODS: We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. RESULTS AND CONCLUSIONS: We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.

Subjects

Bayes Theorem; Data Interpretation, Statistical; Sports Medicine/statistics & numerical data; Humans

Fitting and interpreting occupancy models.

Welsh, Alan H; Lindenmayer, David B; Donnelly, Christine F.

PLoS One ; 8(1): e52015, 2013.

Article in English | MEDLINE | ID: mdl-23326323

ABSTRACT

We show that occupancy models are more difficult to fit than is generally appreciated because the estimating equations often have multiple solutions, including boundary estimates which produce fitted probabilities of zero or one. The estimates are unstable when the data are sparse, making them difficult to interpret, and, even in ideal situations, highly variable. As a consequence, making accurate inference is difficult. When abundance varies over sites (which is the general rule in ecology because we expect spatial variance in abundance) and detection depends on abundance, the standard analysis suffers bias (attenuation in detection, biased estimates of occupancy and potentially finding misleading relationships between occupancy and other covariates), asymmetric sampling distributions, and slow convergence of the sampling distributions to normality. The key result of this paper is that the biases are of similar magnitude to those obtained when we ignore non-detection entirely. The fact that abundance is subject to detection error and hence is not directly observable, means that we cannot tell when bias is present (or, equivalently, how large it is) and we cannot adjust for it. This implies that we cannot tell which fit is better: the fit from the occupancy model or the fit ignoring the possibility of detection error. Therefore trying to adjust occupancy models for non-detection can be as misleading as ignoring non-detection completely. Ignoring non-detection can actually be better than trying to adjust for it.
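
To make the fitting problem concrete, here is a minimal sketch of the log-likelihood for the most basic occupancy model, assuming a constant occupancy probability `psi` and a constant per-visit detection probability `p`. The function name and the toy data are hypothetical, and the models examined in the paper may include covariates and other extensions not shown here.

```python
import math


def occupancy_loglik(psi, p, detections, n_visits):
    """Log-likelihood of a constant-probability occupancy model.

    detections: number of visits (out of n_visits) with a detection, one entry per site.
    Requires 0 < psi <= 1 and 0 < p < 1.
    """
    ll = 0.0
    for y in detections:
        if y > 0:
            # Detected at least once: the site must be occupied.
            ll += math.log(psi) + y * math.log(p) + (n_visits - y) * math.log(1.0 - p)
        else:
            # Never detected: either occupied but missed on every visit, or unoccupied.
            ll += math.log(psi * (1.0 - p) ** n_visits + (1.0 - psi))
    return ll


# Hypothetical sparse data: six sites, three visits each.
counts = [0, 0, 0, 1, 2, 0]
# Two quite different parameter pairs can be compared on the same data.
print(occupancy_loglik(0.5, 0.5, counts, 3))
print(occupancy_loglik(1.0, 0.2, counts, 3))
```

The all-zero branch mixes two explanations for a site with no detections. With sparse data, that is one way to see why different (`psi`, `p`) pairs, including the boundary value `psi = 1`, can be hard to distinguish.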

Subjects

Algorithms; Ecology/statistics & numerical data; Models, Biological; Models, Statistical; Computer Simulation; Ecology/methods; Population Density; Population Dynamics; Reproducibility of Results

Designing 2-phase prevalence studies in the absence of a "gold standard" test.

Salim, Agus; Welsh, Alan H.

Am J Epidemiol ; 170(3): 369-78, 2009 Aug 01.

Article in English | MEDLINE | ID: mdl-19505999

ABSTRACT

A population survey for estimating prevalence is challenging when a disease or condition is difficult to diagnose. If clinical diagnosis is expensive, a 2-phase study, in which less expensive but less accurate tests are administered to all study subjects in the first phase (screening phase) and a more accurate but expensive or time-consuming test is administered to only a subset of the subjects in the second phase, is an attractive approach. Published research has discussed ways of maximizing precision of the prevalence estimate from a 2-phase study with a "gold standard" second-phase test. For many psychiatric disorders, even the best diagnostic tests are not of gold standard quality. In this paper, the authors propose a quasi-optimal design for 2-phase prevalence studies without a gold standard test; random-effects latent class analysis facilitates the estimation of prevalence and appropriately addresses the issue of dependent errors among the diagnostic tests. The authors show that the quasi-optimal design is efficient compared with the balanced and random designs when there is strong inter-test dependence caused by additional factors, apart from disease status, and highlight the importance of collecting data on those subjects testing negative in the first phase.

Subjects

Cognition Disorders/epidemiology; Epidemiologic Research Design; Algorithms; Australia/epidemiology; Cognition Disorders/diagnosis; Cohort Studies; Cross-Sectional Studies; Female; Humans; Male; Mathematical Computing; Middle Aged; Models, Statistical; Patient Selection; Predictive Value of Tests; Prevalence; Sampling Studies; Sensitivity and Specificity

Statistical modeling of a ligand knowledge base.

Mansson, Ralph A; Welsh, Alan H; Fey, Natalie; Orpen, A Guy.

J Chem Inf Model ; 46(6): 2591-600, 2006.

Article in English | MEDLINE | ID: mdl-17125199

ABSTRACT

A range of different statistical models has been fitted to experimental data for the Tolman electronic parameter (TEP) based on a large set of calculated descriptors in a prototype ligand knowledge base (LKB) of phosphorus(III) donor ligands. The models have been fitted by ordinary least squares using subsets of descriptors, principal component regression, and partial least squares which use variables derived from the complete set of descriptors, least angle regression, and the least absolute shrinkage and selection operator. None of these methods is robust against outliers, so we also applied a robust estimation procedure to the linear regression model. Criteria for model evaluation and comparison have been discussed, highlighting the importance of resampling methods for assessing the robustness of models and the scope for making predictions in chemically intuitive models. For the ligands covered by this LKB, ordinary least squares models of descriptor subsets provide a good representation of the data, while partial least squares, principal component regression, and least angle regression models are less suitable for our dual aims of prediction and interpretation. A linear regression model with robustly fitted parameters achieves the best model performance over all classes of models fitted to TEP data, and the weightings assigned to ligands during the robust estimation procedure are chemically intuitive. The increased model complexity when compared to the ordinary least squares linear model is justified by the reduced influence of individual ligands on the model parameters and predictions of new ligands. Robust linear regression models therefore represent the best compromise for achieving statistical robustness in simple, chemically meaningful models.
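
The contrast between ordinary least squares and a robustly fitted linear model can be sketched as follows. The abstract does not say which robust estimator was used, so this example assumes a Huber-type M-estimator fitted by iteratively reweighted least squares, with a single predictor. The function names, the tuning constant `c = 1.345`, and the toy data are illustrative choices, not taken from the paper.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def huber_line(x, y, c=1.345, n_iter=50):
    """Huber-type robust line fit by iteratively reweighted least squares."""
    w = [1.0] * len(x)
    a, b = weighted_line(x, y, w)          # start from the ordinary least squares fit
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute residual, rescaled for normal errors.
        scale = median(abs(r) for r in resid) / 0.6745
        if scale == 0.0:                   # at least half the points are fitted exactly
            break
        # Huber weights: 1 for small residuals, shrinking for large ones.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
        a, b = weighted_line(x, y, w)
    return a, b, w


# Hypothetical data: an exact line y = 2 + 3x with one gross outlier at the end.
x = list(range(10))
y = [2.0 + 3.0 * xi for xi in x]
y[-1] = 100.0

a_ols, b_ols = weighted_line(x, y, [1.0] * len(x))
a_rob, b_rob, weights = huber_line(x, y)
print("OLS slope:", b_ols)
print("robust slope:", b_rob)
print("weight given to the outlier:", weights[-1])
```

The returned weights play the role of the per-ligand weightings mentioned in the abstract: observations that the fitted model cannot accommodate receive small weights, which limits their influence on the parameters.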

Subjects

Phosphorus/chemistry; Algorithms; Electrons; Knowledge Bases; Ligands; Linear Models; Models, Chemical; Models, Statistical; Models, Theoretical; Nonlinear Dynamics; Regression Analysis; Software
