ABSTRACT
BACKGROUND: In binary logistic regression, data are 'separable' if there exists a linear combination of explanatory variables that perfectly predicts the observed outcome, leading to non-existence of some of the maximum likelihood coefficient estimates. A popular solution to obtain finite estimates even with separable data is Firth's logistic regression (FL), which was originally proposed to reduce the bias in coefficient estimates. The question of convergence becomes more involved when analyzing clustered data, as frequently encountered in clinical research (e.g. data collected in several study centers, or individuals contributing multiple observations), using marginal logistic regression models fitted by generalized estimating equations (GEE). From our experience we suspect that separable data are a sufficient, but not a necessary, condition for non-convergence of GEE. Thus, we expect that generalizations of approaches that can handle separable uncorrelated data may reduce but not fully remove the non-convergence issues of GEE. METHODS: We investigate one recently proposed and two new extensions of FL to GEE. With 'penalized GEE', the GEE are treated as score equations, i.e. as derivatives of a log-likelihood set to zero, which are then modified as in FL. We introduce two approaches motivated by the equivalence of FL and maximum likelihood estimation with iteratively augmented data. Specifically, we consider fully iterated and single-step versions of this 'augmented GEE' approach. We compare the three approaches with respect to convergence behavior, practical applicability and performance using simulated data and a real data example. RESULTS: Our simulations indicate that all three extensions of FL to GEE substantially improve convergence compared to ordinary GEE, while showing similar or even better performance in terms of accuracy of coefficient estimates and predictions. Penalized GEE often slightly outperforms the augmented GEE approaches, but this comes at the cost of a higher implementation burden. CONCLUSIONS: When fitting marginal logistic regression models using GEE on sparse data, we recommend applying penalized GEE if one has access to a suitable software implementation, and single-step augmented GEE otherwise.
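As an illustration of the FL building block only (not of the penalized or augmented GEE extensions studied here), the following is a minimal sketch of Firth's modified score iteration for independent observations with an intercept and a single covariate. The function name `firth_logistic`, the toy data, and the plain Fisher-scoring update without step-halving are assumptions made for this example; they are not taken from the article or from any particular software package.

```python
import math

def firth_logistic(x, y, max_iter=200, tol=1e-10):
    """Firth-type penalized logistic regression, intercept plus one covariate.

    Hypothetical helper for illustration: solves the modified score equations
    sum_i (y_i - p_i + h_i * (0.5 - p_i)) * x_i = 0 by Fisher scoring.
    No step-halving and no check for a singular information matrix.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design columns (1, x)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse information
        # Diagonal of the hat matrix: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified residuals and score vector
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Toy, perfectly separated data (hypothetical): x = 0 always gives y = 0,
# x = 1 always gives y = 1, so the ordinary ML slope estimate does not exist.
x = [0, 0, 0, 1, 1, 1]
y = [0, 0, 0, 1, 1, 1]
b0, b1 = firth_logistic(x, y)
print(b0, b1)  # finite estimates despite separation
```

For this saturated two-group design the penalized estimates coincide with adding 0.5 to each cell of the 2x2 table, which gives a slope of 2 ln 7. A production implementation would normally add step-halving and handle clustered data, which this sketch does not attempt.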
Subject(s)
Statistical Models, Bias, Computer Simulation, Humans, Likelihood Functions, Logistic Models
ABSTRACT
We present a topological method for the detection and quantification of bone microstructure from non-linear microscopy images. Specifically, we analyse second harmonic generation (SHG) and two-photon excited autofluorescence (TPaF) images of bone tissue, which capture the distribution of matrix (fibrillar collagen) structure and autofluorescent molecules, respectively. Using persistent homology statistics with a signed Euclidean distance transform filtration on binary patches of images, we are able to quantify the number, size, distribution, and crowding of holes within and across samples imaged at the microscale. We apply our methodology to a previously characterized murine model of skeletal pathology in which vascular endothelial growth factor expression was deleted in osteocalcin-expressing cells (OcnVEGFKO), resulting in increased cortical porosity compared to wild type (WT) littermate controls. We show significant differences in topological statistics between the OcnVEGFKO and WT groups. When classifying males and females separately into OcnVEGFKO or WT groups, we obtain prediction accuracies of 98.7% (males) and 77.8% (females) for SHG images, and 74.2% (males) and 65.8% (females) for TPaF images. The persistence statistics that we use are fully interpretable, can highlight regions of abnormality within an image, and can identify features at different spatial scales.
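To make the filtration function concrete, here is a minimal brute-force sketch of a signed Euclidean distance transform on a small binary patch. The function name `signed_edt`, the toy patch, and the sign convention (negative inside foreground pixels, positive inside holes) are assumptions for illustration; the article's own convention and implementation are not stated in the abstract, and the persistent homology step is not shown.

```python
import math

def signed_edt(patch):
    """Signed Euclidean distance transform of a binary patch (list of rows).

    Assumed convention: a foreground pixel (1) gets minus the distance to the
    nearest background pixel; a background pixel (0) gets plus the distance
    to the nearest foreground pixel. Brute force, suitable for tiny patches.
    """
    cells = [(r, c) for r in range(len(patch)) for c in range(len(patch[0]))]
    fg = [(r, c) for r, c in cells if patch[r][c] == 1]
    bg = [(r, c) for r, c in cells if patch[r][c] == 0]
    if not fg or not bg:
        raise ValueError("patch must contain both foreground and background")
    out = [[0.0] * len(patch[0]) for _ in patch]
    for r, c in cells:
        inside = patch[r][c] == 1
        others = bg if inside else fg
        d = min(math.hypot(r - rr, c - cc) for rr, cc in others)
        out[r][c] = -d if inside else d
    return out

# Hypothetical 3x3 patch: solid matrix with a single one-pixel hole.
patch = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
for row in signed_edt(patch):
    print([round(v, 3) for v in row])
```

Thresholding such a signed distance image at increasing values yields a nested sequence of sets, which is the kind of filtration on which persistence statistics for hole size and crowding can be computed.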
Subject(s)
Microscopy, Vascular Endothelial Growth Factor A, Male, Female, Mice, Animals, Fibrillar Collagens, Bone and Bones/diagnostic imaging, Photons
ABSTRACT
OBJECTIVE: The aim of this study was to investigate the impact of a program of repeated assessments, feedback, and training on the quality of coded clinical data in general practice. DESIGN: A prospective uncontrolled intervention study was conducted in a general practice research network. MEASUREMENTS: The percentage of recorded consultations with a coded problem title and the percentage of patients receiving a specific drug (e.g., tamoxifen) who had the relevant morbidity code (e.g., breast cancer) were calculated. Annual period prevalence of 12 selected morbidities was compared with parallel data derived from the fourth National Study of Morbidity Statistics from General Practice (MSGP4). RESULTS: The first two measures showed variation between practices at baseline, but on repeat assessments all practices improved or maintained their levels of coding. The period prevalence figures were also variable, but over time rates increased to levels comparable with, or above, MSGP4 rates. Practices were able to provide time and resources for feedback and training sessions. CONCLUSION: A program of repeated assessments, feedback, and training appears to improve data quality in a range of practices. The program is likely to be generalizable to other practices but needs a trained support team to implement it, which has implications for cost and resources.
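The drug-to-morbidity measure described under MEASUREMENTS can be sketched as a simple set calculation. The function name `percent_with_code` and the patient identifiers below are hypothetical and serve only to show the arithmetic; they do not come from the study.

```python
def percent_with_code(drug_patients, coded_patients):
    """Percentage of patients on a drug who also carry the relevant morbidity code.

    Both arguments are sets of patient identifiers (hypothetical data layout).
    """
    if not drug_patients:
        raise ValueError("no patients receive the drug")
    linked = drug_patients & coded_patients  # on the drug AND coded
    return 100.0 * len(linked) / len(drug_patients)

# Hypothetical example: four patients prescribed tamoxifen,
# three of whom have a breast cancer code recorded.
on_tamoxifen = {"p1", "p2", "p3", "p4"}
breast_cancer_coded = {"p1", "p2", "p4", "p9"}
print(percent_with_code(on_tamoxifen, breast_cancer_coded))  # 75.0
```

A value well below 100% would suggest that diagnoses are being treated but not coded, which is the kind of gap that repeated assessment and feedback aim to close.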
Subject(s)
Computer User Training, Family Practice/standards, Computerized Medical Records Systems/standards, Drug Therapy, Forms and Records Control, Humans, Physicians' Practice Patterns/standards, Prospective Studies, Research Design, United Kingdom
ABSTRACT
BACKGROUND: Searching medical records of study non-responders to investigate selection bias is no longer acceptable. We explore an alternative by comparing consultation rates in survey responders who consented to medical record review with anonymized consultation rates for the total practice populations. METHODS: Anonymized aggregated consultation rates for the year following a population-based survey were calculated for headache and a number of other conditions (chosen to reflect a mixture of chronic and episodic conditions). These rates were compared across two groups of adults: (i) responders to the survey who consented to medical record review and (ii) a 'population group' created from records of the general practices participating in the survey to represent all patients aged 18 years and over at the mid-point of the study year. The consultation rates for the conditions were compared across the two groups using direct standardization. RESULTS: Adjusted consultation rates were similar in the two groups but generally higher in the responders. CONCLUSIONS: The alternative method applied here offers one potential approach to determining whether study respondents are representative of the population from which they were sampled with respect to general practice consultations.
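Direct standardization, as named in METHODS, weights the stratum-specific rates of a study group by the stratum sizes of a standard population. The sketch below shows the arithmetic under assumed inputs: the age bands, counts, and the function name `directly_standardized_rate` are hypothetical and are not data from the study.

```python
def directly_standardized_rate(events, persons, standard_pop):
    """Directly standardized rate.

    events[k] / persons[k] is the observed rate in stratum k of the study
    group; standard_pop[k] is the size of stratum k in the standard population.
    """
    total_standard = sum(standard_pop.values())
    expected = sum(
        standard_pop[k] * events[k] / persons[k] for k in standard_pop
    )
    return expected / total_standard

# Hypothetical responders: consulters and group sizes by age band.
events = {"18-44": 20, "45-64": 30, "65+": 40}
persons = {"18-44": 400, "45-64": 500, "65+": 400}
# Hypothetical standard (total practice) population by age band.
standard_pop = {"18-44": 5000, "45-64": 3000, "65+": 2000}

crude = sum(events.values()) / sum(persons.values())
adjusted = directly_standardized_rate(events, persons, standard_pop)
print(round(crude, 4), round(adjusted, 4))  # 0.0692 0.063
```

In this made-up example the responders are older than the standard population, so the crude rate overstates the rate that would be expected if they had the standard age structure; standardizing removes that difference before the two groups are compared.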