ABSTRACT
Emotion dysregulation is a central process implicated in the genesis and maintenance of obsessive-compulsive disorder (OCD). However, past research on OCD has examined emotion regulation with a trait-level approach, thereby neglecting important situational and temporal dynamics. The present study is the first to examine moment-to-moment emotion regulation in individuals with OCD. A 6-day ecological momentary assessment was used to assess affect, emotion regulation strategies, perceived effectiveness of emotion regulation strategies, and acceptance of emotional experiences in n = 72 individuals with OCD and n = 54 psychologically healthy controls. As expected, individuals with OCD reported more negative and less positive affect. Group differences in positive (but not negative) affect remained significant when controlling for baseline depression. Furthermore, the OCD group reported using a higher momentary number of avoidance-oriented regulation strategies and lower perceived effectiveness of emotion regulation, even when controlling for current symptoms and negative affect or baseline depression scores. Further, irrespective of group, more momentary negative affect amplified use of avoidance-oriented strategies and diminished perceived effectiveness and emotional acceptance. Contrary to expectations, these effects were not more pronounced in the OCD group. Possible explanations for unexpected findings and implications for future research, particularly regarding more holistic emotion regulation treatments, are discussed.
Subject(s)
Ecological Momentary Assessment , Emotional Regulation , Obsessive-Compulsive Disorder , Humans , Obsessive-Compulsive Disorder/psychology , Female , Male , Adult , Middle Aged , Depression/psychology , Affect , Emotions , Young Adult
ABSTRACT
We propose a novel method for predicting time-to-event data in the presence of cure fractions based on flexible survival models integrated into a deep neural network (DNN) framework. Our approach allows for nonlinear relationships and high-dimensional interactions between covariates and survival and is suitable for large-scale applications. To ensure the identifiability of the overall predictor formed of an additive decomposition of interpretable linear and nonlinear effects and potential higher-dimensional interactions captured through a DNN, we employ an orthogonalization layer. We demonstrate the usefulness and computational efficiency of our method via simulations and apply it to a large portfolio of U.S. mortgage loans. Here, we find not only a better predictive performance of our framework but also a more realistic picture of covariate effects.
ABSTRACT
OBJECTIVE: Recent evidence shows that during slow-wave sleep (SWS), the brain is cleared from potentially toxic metabolites, such as the amyloid-beta protein. Poor sleep or elevated cortisol levels can worsen amyloid-beta clearance, potentially leading to the formation of amyloid plaques, a neuropathological hallmark of Alzheimer disease. Here, we explored how nocturnal neural and endocrine activity affects amyloid-beta fluctuations in the peripheral blood. METHODS: We acquired simultaneous polysomnography and all-night blood sampling in 60 healthy volunteers aged 20-68 years. Nocturnal plasma concentrations of amyloid-beta-40, amyloid-beta-42, cortisol, and growth hormone were assessed every 20 minutes. Amyloid-beta fluctuations were modeled with sleep stages, (non)oscillatory power, and hormones as predictors while controlling for age and participant-specific random effects. RESULTS: Amyloid-beta-40 and amyloid-beta-42 levels correlated positively with growth hormone concentrations, SWS proportion, and slow-wave (0.3-4Hz) oscillatory and high-band (30-48Hz) nonoscillatory power, but negatively with cortisol concentrations and rapid eye movement sleep (REM) proportion measured 40-100 minutes previously (all t values > |3|, p values < 0.003). Older participants showed higher amyloid-beta-40 levels. INTERPRETATION: Slow-wave oscillations are associated with higher plasma amyloid-beta levels, whereas REM sleep is related to decreased amyloid-beta plasma levels, possibly representing changes in central amyloid-beta production or clearance. Strong associations between cortisol, growth hormone, and amyloid-beta presumably reflect the sleep-regulating role of the corresponding releasing hormones. A positive association between age and amyloid-beta-40 may indicate that peripheral clearance becomes less efficient with age. ANN NEUROL 2024;96:46-60.
Subject(s)
Amyloid beta-Peptides , Polysomnography , Sleep, REM , Sleep, Slow-Wave , Humans , Middle Aged , Amyloid beta-Peptides/blood , Amyloid beta-Peptides/metabolism , Adult , Male , Aged , Female , Sleep, Slow-Wave/physiology , Young Adult , Sleep, REM/physiology , Hydrocortisone/blood , Peptide Fragments/blood
ABSTRACT
Motivation: Cell fate decisions, such as apoptosis or proliferation, are communicated via signaling pathways. The pathways are heavily intertwined and often consist of sequential interaction of proteins (kinases). Information integration takes place on the protein level via n-to-1 interactions. A state-of-the-art procedure to quantify information flow (edges) between signaling proteins (nodes) is network inference. However, edge weight calculation typically refers to 1-to-1 interactions only and relies on mean protein phosphorylation levels instead of single cell distributions. Information theoretic measures such as the mutual information (MI) have the potential to overcome these shortcomings but are still rarely used. Results: This work proposes a Bayesian nearest neighbor-based MI estimator (BannMI) to quantify n-to-1 kinase dependency in signaling pathways. BannMI outperforms the state-of-the-art MI estimator on protein-like data in terms of mean squared error and Pearson correlation. Using BannMI, we analyze apoptotic signaling in phosphoproteomic cancerous and noncancerous breast cell line data. Our work provides evidence for cooperative signaling of several kinases in programmed cell death and identifies a potential key role of the mitogen-activated protein kinase p38. Availability and implementation: Source code and applications are available at: https://github.com/zuiop11/nn_info and can be downloaded via Pip as Python package: nn-info.
ABSTRACT
Recent years have seen the development of many novel scoring tools for disease prognosis and prediction. To become accepted for use in clinical applications, these tools have to be validated on external data. In practice, validation is often hampered by logistical issues, resulting in multiple small-sized validation studies. It is therefore necessary to synthesize the results of these studies using techniques for meta-analysis. Here we consider strategies for meta-analyzing the concordance probability for time-to-event data ("C-index"), which has become a popular tool to evaluate the discriminatory power of prediction models with a right-censored outcome. We show that standard meta-analysis of the C-index may lead to biased results, as the magnitude of the concordance probability depends on the length of the time interval used for evaluation (defined, e.g., by the follow-up time, which might differ considerably between studies). To address this issue, we propose a set of methods for random-effects meta-regression that incorporate time directly as a covariate in the model equation. In addition to analyzing nonlinear time trends via fractional polynomial, spline, and exponential decay models, we provide recommendations on suitable transformations of the C-index before meta-regression. Our results suggest that the C-index is best meta-analyzed using fractional polynomial meta-regression with logit-transformed C-index values. Classical random-effects meta-analysis (not considering time as a covariate) is demonstrated to be a suitable alternative when follow-up times are small. Our findings have implications for the reporting of C-index values in future studies, which should include information on the length of the time interval underlying the calculations.
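The idea of regressing logit-transformed C-index values on time can be illustrated with a minimal sketch. It is not the authors' method: it fits a plain inverse-variance weighted straight line rather than a random-effects fractional polynomial model, the standard error on the logit scale uses the delta method, and all study-level numbers and function names are hypothetical.

```python
import math


def logit(c):
    """Map a C-index in (0, 1) to the real line."""
    return math.log(c / (1.0 - c))


def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_se(c, se):
    """Delta method: d/dc logit(c) = 1 / (c * (1 - c))."""
    return se / (c * (1.0 - c))


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical study-level inputs (illustration only, not from the paper):
# follow-up time in years, reported C-index, and its standard error.
years = [1.0, 2.0, 5.0, 10.0]
c_index = [0.80, 0.78, 0.74, 0.70]
c_se = [0.03, 0.02, 0.02, 0.04]

y = [logit(c) for c in c_index]
w = [1.0 / logit_se(c, s) ** 2 for c, s in zip(c_index, c_se)]
intercept, slope = weighted_line(years, y, w)

# Back-transform the fitted value at 3 years to the C-index scale.
predicted_c_at_3_years = expit(intercept + slope * 3.0)
```

Working on the logit scale keeps back-transformed predictions inside (0, 1), which is one practical reason to transform before regression.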
Subject(s)
Algorithms , Prognosis , Probability
ABSTRACT
We develop a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale, and shape. Our approach enables the simultaneous modeling of all distribution parameters of an arbitrary parametric distribution of a multivariate response conditional on explanatory variables, while being applicable to potentially high-dimensional data. Moreover, the boosting algorithm incorporates data-driven variable selection, taking various different types of effects into account. As a special merit of our approach, it allows for modeling the association between multiple continuous or discrete outcomes through the relevant covariates. After a detailed simulation study investigating estimation and prediction performance, we demonstrate the full flexibility of our approach in three diverse biomedical applications. The first is based on high-dimensional genomic cohort data from the UK Biobank, considering a bivariate binary response (chronic ischemic heart disease and high cholesterol). Here, we are able to identify genetic variants that are informative for the association between cholesterol and heart disease. The second application considers the demand for health care in Australia with the number of consultations and the number of prescribed medications as a bivariate count response. The third application analyses two dimensions of childhood undernutrition in Nigeria as a bivariate response and we find that the correlation between the two undernutrition scores is considerably different depending on the child's age and the region the child lives in.
Subject(s)
Algorithms , Models, Statistical , Child , Humans , Computer Simulation , Australia , Nigeria
ABSTRACT
Capturing complex dependence structures between outcome variables (e.g., study endpoints) is of high relevance in contemporary biomedical data problems and medical research. Distributional copula regression provides a flexible tool to model the joint distribution of multiple outcome variables by disentangling the marginal response distributions and their dependence structure. In a regression setup, each parameter of the copula model, that is, the marginal distribution parameters and the copula dependence parameters, can be related to covariates via structured additive predictors. We propose a framework to fit distributional copula regression via model-based boosting, which is a modern estimation technique that incorporates useful features like an intrinsic variable selection mechanism, parameter shrinkage and the capability to fit regression models in high-dimensional data settings, that is, situations with more covariates than observations. Thus, model-based boosting not only complements existing Bayesian and maximum-likelihood based estimation frameworks for this model class but also enables unique intrinsic mechanisms that can be helpful in many applied problems. The performance of our boosting algorithm for copula regression models with continuous margins is evaluated in simulation studies that cover low- and high-dimensional data settings and situations with and without dependence between the responses. Moreover, distributional copula boosting is used to jointly analyze and predict the length and the weight of newborns conditional on sonographic measurements of the fetus before delivery together with other clinical variables.
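The separation of margins and dependence that copula models rely on can be shown with a small simulation sketch. It assumes a bivariate Gaussian copula with a fixed dependence parameter and two arbitrary margins (normal and exponential); none of the numbers, names, or distributional choices come from the paper, and no regression or boosting step is included.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_sample(n, rho, seed=0):
    """Draw n pairs (u1, u2) from a bivariate Gaussian copula."""
    rng = random.Random(seed)
    std = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The probability integral transform gives uniform margins.
        pairs.append((std.cdf(z1), std.cdf(z2)))
    return pairs


def exponential_quantile(u, rate):
    """Inverse CDF of the exponential distribution."""
    return -math.log(1.0 - u) / rate


# Hypothetical setup: dependence rho = 0.7, margins chosen for illustration.
pairs = gaussian_copula_sample(2000, rho=0.7, seed=1)
margin_1 = NormalDist(mu=50.0, sigma=2.5)
y1 = [margin_1.inv_cdf(u1) for u1, _ in pairs]
y2 = [exponential_quantile(u2, rate=0.5) for _, u2 in pairs]
```

Changing either margin leaves the dependence between the uniform pairs untouched, which is what allows margins and dependence to be modeled separately.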
Subject(s)
Algorithms , Models, Statistical , Infant, Newborn , Humans , Likelihood Functions , Bayes Theorem , Computer Simulation
ABSTRACT
BACKGROUND: Due to contradictory results in current research, whether age at menopause is increasing or decreasing in Western countries remains an open question, yet worth studying as later ages at menopause are likely to be related to an increased risk of breast cancer. Using data from breast cancer screening programs to study the temporal trend of age at menopause is difficult since especially younger women in the same generational cohort have often not yet reached menopause. Excluding these younger women from breast cancer risk analyses may bias the results. The aim of this study is therefore to recover missing menopause ages as a covariate by comparing methods for handling missing data. Additionally, the study makes a contribution to understanding the evolution of age at menopause for several generations born in Portugal between 1920 and 1970. METHODS: Data from a breast cancer screening program in Portugal including 278,282 women aged 45-69 and collected between 1990 and 2010 are used to compare two approaches of imputing age at menopause: (i) a multiple imputation methodology based on a truncated distribution but ignoring the mechanism of missingness; (ii) a copula-based multiple imputation method that simultaneously handles the age at menopause and the missing mechanism. The linear predictors considered in both cases have a semiparametric additive structure accommodating linear and non-linear effects defined via splines or Markov random field smoothers in the case of spatial variables. RESULTS: Both imputation methods unveiled an increasing trend of age at menopause when viewed as a function of the birth year for the youngest generation. This trend is hidden if we model only women with an observed age at menopause. CONCLUSION: When studying age at menopause, missing ages must be recovered with an adequate procedure for incomplete data.
Imputing these missing ages avoids excluding the younger generation cohort of the screening program in breast cancer risk analyses and hence reduces the bias stemming from this exclusion. In addition, imputing the not yet observed ages of menopause for mostly younger women is also crucial when studying the time trend of age at menopause; otherwise the analysis will be biased.
Subject(s)
Breast Neoplasms , Menopause , Bias , Breast Neoplasms/epidemiology , Cohort Studies , Female , Humans , Risk Assessment
ABSTRACT
There is an increasing interest in machine learning (ML) algorithms for predicting patient outcomes, as these methods are designed to automatically discover complex data patterns. For example, the random forest (RF) algorithm is designed to identify relevant predictor variables out of a large set of candidates. In addition, researchers may also use external information for variable selection to improve model interpretability and variable selection accuracy, and thereby prediction quality. However, it is unclear to what extent, if at all, RF and ML methods may benefit from external information. In this paper, we examine the usefulness of external information from prior variable selection studies that used traditional statistical modeling approaches such as the Lasso, or suboptimal methods such as univariate selection. We conducted a plasmode simulation study based on subsampling a data set from a pharmacoepidemiologic study with nearly 200,000 individuals, two binary outcomes and 1152 candidate predictor (mainly sparse binary) variables. When the scope of candidate predictors was reduced based on external knowledge, RF models achieved better calibration, that is, better agreement of predictions and observed outcome rates. However, prediction quality measured by cross-entropy, AUROC or the Brier score did not improve. We recommend appraising the methodological quality of studies that serve as an external information source for future prediction model development.
ABSTRACT
Although regression models play a central role in the analysis of medical research projects, there still exist many misconceptions about various aspects of modeling, leading to faulty analyses. Indeed, the rapidly developing statistical methodology and its recent advances in regression modeling do not seem to be adequately reflected in many medical publications. This problem of knowledge transfer from statistical research to application was identified by some medical journals, which have published series of statistical tutorials and (shorter) papers mainly addressing medical researchers. The aim of this review was to assess the current level of knowledge with regard to regression modeling contained in such statistical papers. We searched for target series by a request to international statistical experts. We identified 23 series including 57 topic-relevant articles. Within each article, two independent raters analyzed the content by investigating 44 predefined aspects on regression modeling. We assessed to what extent the aspects were explained and if examples, software advice, and recommendations for or against specific methods were given. Most series (21/23) included at least one article on multivariable regression. Logistic regression was the most frequently described regression type (19/23), followed by linear regression (18/23), Cox regression and survival models (12/23) and Poisson regression (3/23). Most general aspects on regression modeling, e.g. model assumptions, reporting and interpretation of regression results, were covered. We did not find many misconceptions or misleading recommendations, but we identified relevant gaps, in particular with respect to addressing nonlinear effects of continuous predictors, model specification and variable selection. Specific recommendations on software were rarely given.
Statistical guidance should be developed for nonlinear effects, model specification and variable selection to better support medical researchers who perform or interpret regression analyses.
Subject(s)
Medical Writing , Models, Statistical , Regression Analysis , Humans , Periodicals as Topic
ABSTRACT
We present a new procedure for enhanced variable selection for component-wise gradient boosting. Statistical boosting is a computational approach that emerged from machine learning, which allows fitting regression models in the presence of high-dimensional data. Furthermore, the algorithm can lead to data-driven variable selection. In practice, however, the final models typically tend to include too many variables in some situations. This occurs particularly for low-dimensional data (p
Subject(s)
Algorithms , Quality of Life , Cohort Studies , Humans , Longitudinal Studies , Machine Learning
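For readers unfamiliar with component-wise gradient boosting as mentioned in the entry above, the following is a minimal sketch of the standard algorithm with squared-error loss and simple linear base-learners. It is not the enhanced selection procedure the abstract proposes. The function name, the step length `nu`, the number of iterations, and the toy data are all assumptions made for illustration, and predictor columns are assumed to be centered.

```python
def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Plain component-wise L2 boosting with linear base-learners.

    X is a list of rows whose columns are assumed to be centered.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    coefs = [0.0] * p
    resid = [yi - intercept for yi in y]
    cols = [[row[j] for row in X] for j in range(p)]
    ss = [sum(v * v for v in col) for col in cols]
    for _ in range(n_iter):
        best_j, best_beta, best_gain = None, 0.0, -1.0
        for j in range(p):
            if ss[j] == 0.0:
                continue  # constant column carries no information
            # Least-squares fit of the current residuals on column j alone.
            beta = sum(v * r for v, r in zip(cols[j], resid)) / ss[j]
            gain = beta * beta * ss[j]  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        if best_j is None:
            break
        # Update only the best-fitting component, shrunken by nu.
        coefs[best_j] += nu * best_beta
        resid = [r - nu * best_beta * v for r, v in zip(resid, cols[best_j])]
    return intercept, coefs


# Hypothetical toy data: y depends on the first column only.
X = [[-2.0, 1.0, 1.0],
     [-1.0, -1.0, -2.0],
     [0.0, 0.0, 2.0],
     [1.0, -1.0, -2.0],
     [2.0, 1.0, 1.0]]
y = [-1.0, 1.0, 3.0, 5.0, 7.0]
intercept, coefs = componentwise_l2_boost(X, y)
```

Because only one coefficient is updated per iteration, predictors that are never selected keep a coefficient of exactly zero, which is the source of the data-driven variable selection; stopping early limits how many variables enter the model.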
ABSTRACT
BACKGROUND: Statistical model building requires selection of variables for a model depending on the model's aim. In descriptive and explanatory models, a common recommendation often met in the literature is to include all variables in the model which are assumed or known to be associated with the outcome independent of their identification with data-driven selection procedures. An open question is how reliable this assumed "background knowledge" truly is. In fact, "known" predictors might be findings from preceding studies which may also have employed inappropriate model building strategies. METHODS: We conducted a simulation study assessing the influence of treating variables as "known predictors" in model building when in fact this knowledge resulting from preceding studies might be insufficient. Within randomly generated preceding study data sets, model building with variable selection was conducted. A variable was subsequently considered as a "known" predictor if a predefined number of preceding studies identified it as relevant. RESULTS: Even if several preceding studies identified a variable as a "true" predictor, this classification is often false positive. Moreover, variables not identified might still be truly predictive. This especially holds true if the preceding studies employed inappropriate selection methods such as univariable selection. CONCLUSIONS: The source of "background knowledge" should be evaluated with care. Knowledge generated from preceding studies can cause misspecification.
Subject(s)
Models, Statistical , Causality , Computer Simulation , Humans
ABSTRACT
We propose a new highly flexible and tractable Bayesian approach to undertake variable selection in non-Gaussian regression models. It uses a copula decomposition for the joint distribution of observations on the dependent variable. This allows the marginal distribution of the dependent variable to be calibrated accurately using a nonparametric or other estimator. The family of copulas employed are "implicit copulas" that are constructed from existing hierarchical Bayesian models widely used for variable selection, and we establish some of their properties. Even though the copulas are high dimensional, they can be estimated efficiently and quickly using Markov chain Monte Carlo. A simulation study shows that when the responses are non-Gaussian, the approach selects variables more accurately than contemporary benchmarks. A real data example in the Web Appendix illustrates that accounting for even mild deviations from normality can lead to a substantial increase in accuracy. To illustrate the full potential of our approach, we extend it to spatial variable selection for fMRI. Using real data, we show our method allows for voxel-specific marginal calibration of the magnetic resonance signal at over 6000 voxels, leading to an increase in the quality of the activation maps.
Subject(s)
Magnetic Resonance Imaging , Bayes Theorem , Computer Simulation , Markov Chains , Monte Carlo Method
ABSTRACT
In the last decades, statistical methodology has developed rapidly, in particular in the field of regression modeling. Multivariable regression models are applied in almost all medical research projects. Therefore, the potential impact of statistical misconceptions within this field can be enormous. Indeed, the current theoretical statistical knowledge is not always adequately transferred to the current practice in medical statistics. Some medical journals have identified this problem and published isolated statistical articles and even whole series thereof. In this systematic review, we aim to assess the current level of education on regression modeling that is provided to medical researchers via series of statistical articles published in medical journals. The present manuscript is a protocol for a systematic review that aims to assess which aspects of regression modeling are covered by statistical series published in medical journals that intend to train and guide applied medical researchers with limited statistical knowledge. Statistical paper series cannot easily be summarized and identified by common keywords in an electronic search engine like Scopus. We therefore identified series by a systematic request to statistical experts who are part of or related to the STRATOS Initiative (STRengthening Analytical Thinking for Observational Studies). Within each identified article, two raters will independently check the content of the articles with respect to a predefined list of key aspects related to regression modeling. The content analysis of the topic-relevant articles will be performed using a predefined report form to assess the content as objectively as possible. Any disputes will be resolved by a third reviewer. Summary analyses will identify potential methodological gaps and misconceptions that may have an important impact on the quality of analyses in medical research.
This review will thus provide a basis for future guidance papers and tutorials in the field of regression modeling which will enable medical researchers 1) to interpret publications in a correct way, 2) to perform basic statistical analyses in a correct way and 3) to identify situations when the help of a statistical expert is required.
Subject(s)
Biomedical Research/statistics & numerical data , Models, Statistical , Regression Analysis , Bias , Biomedical Research/education , Biostatistics/methods , Data Collection , Data Management , Data Science/education , Data Science/statistics & numerical data , Humans , Observational Studies as Topic , Periodicals as Topic
ABSTRACT
Agricultural expansion drives biodiversity loss globally, but impact assessments are biased towards recent time periods. This can lead to a gross underestimation of species declines in response to habitat loss, especially when species declines are gradual and occur over long time periods. Using Cold War spy satellite images (Corona), we show that a grassland keystone species, the bobak marmot (Marmota bobak), continues to respond to agricultural expansion that happened more than 50 years ago. Although burrow densities of the bobak marmot today are highest in croplands, densities declined most strongly in areas that were persistently used as croplands since the 1960s. This response to historical agricultural conversion spans roughly eight marmot generations and suggests the longest recorded response of a mammal species to agricultural expansion. We also found evidence for remarkable philopatry: nearly half of all burrows retained their exact location since the 1960s, and this was most pronounced in grasslands. Our results stress the need for farsighted decisions, because contemporary land management will affect biodiversity decades into the future. Finally, our work pioneers the use of Corona historical Cold War spy satellite imagery for ecology. This vastly underused global remote sensing resource provides a unique opportunity to expand the time horizon of broad-scale ecological studies.
Subject(s)
Agriculture , Biodiversity , Conservation of Natural Resources , Satellite Imagery , Crops, Agricultural , Ecosystem
ABSTRACT
In this paper, we propose the class of generalized additive models for location, scale and shape in a test for the association of genetic markers with non-normally distributed phenotypes comprising a spike at zero. The resulting statistical test is a generalization of the quantitative transmission disequilibrium test with mating type indicator, which was originally designed for normally distributed quantitative traits and parent-offspring data. As a motivational example, we consider coronary artery calcification (CAC), which can accurately be identified by electron beam tomography. In the investigated regions, individuals will have a continuous measure of the extent of calcium found or they will be calcium-free. Hence, the resulting distribution is a mixed discrete-continuous distribution with spike at zero. We carry out parent-offspring simulations motivated by such CAC measurement values in a screening population to study statistical properties of the proposed test for genetic association. Furthermore, we apply the approach to data of the Genetic Analysis Workshop 16 that are based on real genotype and family data of the Framingham Heart Study, and test the association of selected genetic markers with simulated coronary artery calcification.
ABSTRACT
Bivariate copula regression allows for the flexible combination of two arbitrary, continuous marginal distributions with regression effects being placed on potentially all parameters of the resulting bivariate joint response distribution. Motivated by the risk factors for adverse birth outcomes, many of which are dichotomous, we consider mixed binary-continuous responses that extend the bivariate continuous framework to the situation where one response variable is discrete (more precisely, binary) whereas the other response remains continuous. Utilizing the latent continuous representation of binary regression models, we implement a penalized likelihood-based approach for the resulting class of copula regression models and employ it in the context of modeling gestational age and the presence/absence of low birth weight. The analysis demonstrates the advantage of the flexible specification of regression impacts including nonlinear effects of continuous covariates and spatial effects. Our results imply that racial and spatial inequalities in the risk factors for infant mortality are even greater than previously suggested.
Subject(s)
Infant, Premature , Models, Statistical , Pregnancy Outcome/epidemiology , Regression Analysis , Female , Gestational Age , Humans , Infant , Infant Mortality , Infant, Low Birth Weight , Infant, Newborn , Likelihood Functions , Pregnancy
ABSTRACT
We specify a Bayesian, geoadditive Stochastic Frontier Analysis (SFA) model to assess hospital performance along the dimensions of resources and quality of stroke care in German hospitals. With 1,100 annual observations and data from 2006 to 2013 and risk-adjusted patient volume as output, we introduce a production function that captures quality, resource inputs, hospital inefficiency determinants and spatial patterns of inefficiencies. With high relevance for hospital management and health system regulators, we identify performance improvement mechanisms by considering marginal effects for the average hospital. Specialization and certification can substantially reduce mortality. Regional and hospital-level concentration can improve quality and resource efficiency. Finally, our results demonstrate a trade-off between quality improvement and resource reduction and substantial regional variation in efficiency.
Subject(s)
Efficiency, Organizational , Hospitals , Quality Assurance, Health Care , Stroke/therapy , Bayes Theorem , Geography, Medical , Germany , Humans , Patient Readmission , Quality Assurance, Health Care/methods , Specialization , Stochastic Processes , Stroke/mortality
ABSTRACT
BACKGROUND: Exposure to green space seems to be beneficial for self-reported mental health. In this study we used an objective health indicator, namely antidepressant prescription rates. Current studies rely exclusively upon mean regression models assuming linear associations. It is, however, plausible that the presence of green space is non-linearly related to different quantiles of the outcome antidepressant prescription rates. These restrictions may contribute to inconsistent findings. OBJECTIVE: Our aim was: a) to assess antidepressant prescription rates in relation to green space, and b) to analyze how the relationship varies non-linearly across different quantiles of antidepressant prescription rates. METHODS: We used cross-sectional data for the year 2014 at a municipality level in the Netherlands. Ecological Bayesian geoadditive quantile regressions were fitted for the 15%, 50%, and 85% quantiles to estimate green space-prescription rate correlations, controlling for physical activity levels, socio-demographics, urbanicity, etc. RESULTS: The results suggested that green space was overall inversely and non-linearly associated with antidepressant prescription rates. More importantly, the associations differed across the quantiles, although the variation was modest. Significant non-linearities were apparent: the associations were slightly positive in the lower quantile and strongly negative in the upper one. CONCLUSION: Our findings imply that an increased availability of green space within a municipality may contribute to a reduction in the number of antidepressant prescriptions dispensed. Green space is thus a central health and community asset, whilst a minimum level of 28% needs to be established for health gains. The highest effectiveness occurred at a municipality surface percentage higher than 79%. This inverse dose-dependent relation has important implications for setting future community-level health and planning policies.
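Quantile regression, as opposed to mean regression, targets a chosen quantile by minimizing the check (pinball) loss. The sketch below shows only that loss and how its minimizer over a sample is the corresponding sample quantile; it is not the Bayesian geoadditive model used in the study, and the function names and toy values are assumptions for illustration.

```python
def pinball_loss(y, q_hat, tau):
    """Average check loss of a constant prediction q_hat at quantile level tau."""
    total = 0.0
    for yi in y:
        u = yi - q_hat
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(y)


def best_constant(y, tau):
    """Pick the observed value that minimizes the pinball loss."""
    return min(y, key=lambda q: pinball_loss(y, q, tau))


# Hypothetical toy sample with one large outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(sample, tau=0.5)
upper_fit = best_constant(sample, tau=0.85)
```

Fitting separate models at several values of `tau` (for example 0.15, 0.50, and 0.85) is what lets an association differ between the lower and upper parts of the outcome distribution.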