ABSTRACT
HostSeq was launched in April 2020 as a national initiative to integrate whole genome sequencing data from 10,000 Canadians infected with SARS-CoV-2 with clinical information related to their disease experience. The mandate of HostSeq is to support the Canadian and international research communities in their efforts to understand the risk factors for disease and associated health outcomes and support the development of interventions such as vaccines and therapeutics. HostSeq is a collaboration among 13 independent epidemiological studies of SARS-CoV-2 across five provinces in Canada. Aggregated data collected by HostSeq are made available to the public through two data portals: a phenotype portal showing summaries of major variables and their distributions, and a variant search portal enabling queries in a genomic region. Individual-level data is available to the global research community for health research through a Data Access Agreement and Data Access Compliance Office approval. Here we provide an overview of the collective project design along with summary level information for HostSeq. We highlight several statistical considerations for researchers using the HostSeq platform regarding data aggregation, sampling mechanism, covariate adjustment, and X chromosome analysis. In addition to serving as a rich data source, the diversity of study designs, sample sizes, and research objectives among the participating studies provides unique opportunities for the research community.
Subject(s)
COVID-19, SARS-CoV-2, Humans, SARS-CoV-2/genetics, COVID-19/epidemiology, Canada/epidemiology, Genomics, Whole Genome Sequencing
ABSTRACT
Two- or multi-phase study designs are often used in settings involving failure times. In most studies, whether or not certain covariates are measured on an individual depends on their failure time and status. For example, when failures are rare, case-cohort or case-control designs are used to increase the number of failures relative to a random sample of the same size. Another scenario is where certain covariates are expensive to measure, so they are obtained only for selected individuals in a cohort. This paper considers such situations and focuses on cases where we wish to test hypotheses of no association between failure time and expensive covariates. Efficient score tests based on maximum likelihood are developed and shown to have a simple form for a wide class of models and sampling designs. Some numerical comparisons of study designs are presented.
Subject(s)
Likelihood Functions, Linear Models, Bias, Cohort Studies, Computer Simulation, Genetic Association Studies, Humans, Proportional Hazards Models
ABSTRACT
In follow-up studies on chronic disease cohorts, individuals are often observed at irregular visit times that may be related to their previous disease history and other factors. This can produce bias in standard methods of estimation. Working in the context of multistate models, we consider a method of nonparametric estimation for state occupancy probabilities that adjusts for dependent follow-up through the use of inverse-intensity-of-visit weighted estimating functions and smoothing. The methodology is applied to the estimation of viral rebound probabilities in the Canadian Observational Cohort on HIV-positive persons. Copyright © 2016 John Wiley & Sons, Ltd.
Subject(s)
Follow-Up Studies, HIV Infections/virology, Models, Statistical, Probability, Viral Load/statistics & numerical data, Canada/epidemiology, HIV Infections/epidemiology, Humans, Markov Chains, Observational Studies as Topic, Time Factors
ABSTRACT
Multistate models provide important methods of analysis for many life history processes, and this is an area where John Klein made numerous contributions. When individuals in a study group are observed continuously so that all transitions between states, and their times, are known, estimation and model checking are fairly straightforward. However, individuals in many studies are observed intermittently, and only the states occupied at the observation times are known. We review methods of estimation and assessment for Markov models in this situation. Numerical studies that show the effects of inter-observation times are provided, and new methods for assessing fit are given. An illustration involving viral load dynamics for HIV-positive persons is presented.
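A minimal sketch of the panel-data likelihood in the simplest case, assuming a time-homogeneous two-state Markov model with intensity a from state 0 to state 1 and intensity b from state 1 to state 0, for which the transition matrix P(t) has a closed form. Only the states at the visit times enter the likelihood. The function names and data layout are hypothetical; models with more states would need the matrix exponential of the intensity matrix instead of the closed form.

```python
import math


def two_state_transition(a, b, t):
    """Transition matrix P(t) for a two-state time-homogeneous Markov process.

    States are 0 and 1; a is the 0->1 intensity and b the 1->0 intensity.
    This is the closed-form solution of the Kolmogorov forward equations.
    """
    s = a + b
    e = math.exp(-s * t)
    p00 = b / s + (a / s) * e
    p11 = a / s + (b / s) * e
    return [[p00, 1.0 - p00], [1.0 - p11, p11]]


def panel_log_likelihood(a, b, histories):
    """Log-likelihood for intermittently observed (panel) data.

    Each history is a list of (visit_time, state) pairs sorted by time.
    Consecutive visits contribute log P_{s0,s1}(t1 - t0); visit times are
    assumed to be non-informative about the process.
    """
    ll = 0.0
    for visits in histories:
        for (t0, s0), (t1, s1) in zip(visits, visits[1:]):
            P = two_state_transition(a, b, t1 - t0)
            ll += math.log(P[s0][s1])
    return ll
```

The log-likelihood can then be maximized over (a, b) with any numerical optimizer; the sketch leaves that step out.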
Subject(s)
Markov Chains, Regression Analysis, Adult, Canada/epidemiology, Cohort Studies, HIV Infections/blood, HIV Infections/epidemiology, HIV Infections/virology, Humans, Likelihood Functions, Viral Load, Young Adult
ABSTRACT
We consider survival or duration times associated with spells (sojourns in some state) or events experienced by individuals in a population over a specified time period. Duration distributions can be estimated from data recorded during follow-up of panel members in longitudinal surveys, but adjustments for the sample design, population structure and losses to follow-up are typically required. We provide weighted Kaplan-Meier estimates that allow for these features and, in particular, adjust for dependent loss to follow-up through the use of inverse probability of censoring weights.
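A minimal sketch of a weighted product-limit estimator, assuming each individual carries a single time-fixed weight supplied by the user (for example, a survey design weight multiplied by an inverse probability of censoring weight). Time-varying censoring weights and variance estimation are not shown, and the function and argument names are hypothetical.

```python
def weighted_kaplan_meier(times, events, weights):
    """Weighted Kaplan-Meier (product-limit) estimate of S(t).

    times   : observed durations (event or censoring time)
    events  : 1 if the spell ended in the event, 0 if censored
    weights : time-fixed per-individual weights (assumed given)
    Returns a list of (t, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events, weights))
    at_risk = float(sum(weights))  # weighted risk set just before the first time
    surv = 1.0
    out = []
    i, n = 0, len(data)
    while i < n:
        t = data[i][0]
        d_w = 0.0  # weighted number of events at t
        r_w = 0.0  # total weight leaving the risk set at t
        while i < n and data[i][0] == t:
            _, e, w = data[i]
            d_w += w * e
            r_w += w
            i += 1
        if d_w > 0:
            surv *= 1.0 - d_w / at_risk
            out.append((t, surv))
        at_risk -= r_w  # censored and failed individuals both leave after t
    return out
```

With all weights equal, the estimate reduces to the ordinary Kaplan-Meier estimate.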
Subject(s)
Kaplan-Meier Estimate, Cohort Studies, Employment/statistics & numerical data, Follow-Up Studies, Humans, Longitudinal Studies, Time Factors
ABSTRACT
We consider lifetime data involving pairs of study individuals with more than one possible cause of failure for each individual. Non-parametric estimation of cause-specific distribution functions is considered under independent censoring. Properties of the estimators are discussed and an illustration of their application is given.
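The univariate building block can be sketched as the standard Aalen-Johansen estimate of a cause-specific cumulative incidence function under independent censoring. This sketch treats individuals one at a time and does not estimate the pair-level (bivariate) distribution functions; the function name and the coding of causes (0 for censored) are assumptions for illustration.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric estimate of F_k(t) = P(T <= t, cause = k).

    causes: 0 for a censored observation, otherwise a positive cause label.
    At each event time, F_k jumps by S(t-) * d_k(t) / n(t), where S is the
    all-cause Kaplan-Meier survivor function and n(t) the number at risk.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0  # all-cause S(t-)
    cif = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_all = d_k = leaving = 0
        while i < len(data) and data[i][0] == t:
            c = data[i][1]
            d_all += c != 0               # failures from any cause
            d_k += c == cause_of_interest  # failures from cause k
            leaving += 1
            i += 1
        if d_k:
            cif += surv * d_k / n_at_risk
            out.append((t, cif))
        if d_all:
            surv *= 1.0 - d_all / n_at_risk
        n_at_risk -= leaving
    return out
```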
Subject(s)
Biometry/methods, Comorbidity, Data Interpretation, Statistical, Models, Statistical, Risk Assessment/methods, Survival Analysis, Causality, Computer Simulation, Epidemiology, Statistical Distributions, Survival Rate
ABSTRACT
Events that may occur repeatedly for individual subjects are of interest in many medical studies. We review methods of analysis for repeated events, emphasizing that the approach taken in a given study should allow clinical questions to be addressed as directly as possible. Methods based on full models for event processes as well as on simpler 'marginal' assumptions are considered. The treatment of dependent terminating events related to the recurrent events is also discussed. We apply various methods of analysis to studies involving pulmonary exacerbations in persons with cystic fibrosis, and the occurrence of bone metastases and skeletal events in cancer patients, respectively. Most of the methodology considered can be implemented with existing software.
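As one example of a marginal method, the mean cumulative number of events mu(t) = E[N(t)] can be estimated nonparametrically with a Nelson-Aalen-type estimator. A minimal sketch, assuming each subject's follow-up ends independently of the event process, so it does not handle the dependent terminating events mentioned above; the data layout and function name are hypothetical.

```python
def mean_cumulative_function(subjects):
    """Nonparametric estimate of the mean number of events mu(t) = E[N(t)].

    subjects: list of (event_times, end_of_followup), one entry per subject,
    where event_times is a list of that subject's event times.
    At each distinct event time u, mu jumps by d(u) / n(u): d(u) is the number
    of events at u and n(u) the number of subjects still observed at u.
    """
    event_times = sorted({u for events, _ in subjects for u in events})
    mu = 0.0
    out = []
    for u in event_times:
        d = sum(list(events).count(u) for events, _ in subjects)
        n = sum(1 for _, end in subjects if end >= u)
        mu += d / n
        out.append((u, mu))
    return out
```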
Subject(s)
Biometry, Life Change Events, Asthma/drug therapy, Asthma/physiopathology, Bone Neoplasms/secondary, Cystic Fibrosis/complications, Cystic Fibrosis/drug therapy, Humans, Hydrocephalus/surgery, Lung Diseases/etiology, Lung Diseases/prevention & control, Models, Biological, Poisson Distribution, Recurrence, Treatment Failure
ABSTRACT
We consider recurrent event data when the durations or gap times between successive event occurrences are of intrinsic interest. Subject heterogeneity not attributable to observed covariates is usually handled by random effects, which induce an exchangeable correlation structure for the gap times of a subject. Recent work has sought to relax this restriction to allow non-exchangeable correlation. Here we consider dynamic models in which random effects can vary stochastically over the gap times. We extend the traditional Gaussian variance components models and evaluate a previously proposed proportional hazards model through a simulation study and some examples. In addition, semiparametric estimation of the proportional hazards models is considered. Both types of model are easy to use. The Gaussian models are easily interpreted in terms of the variance structure, whereas the proportional hazards models may be more appropriate in the context of survival analysis, particularly for interpreting the regression parameters. The proportional hazards models can be sensitive to the choice of model for the random effects but not to the choice of baseline hazard function.
Subject(s)
Proportional Hazards Models, Recurrence, Stochastic Processes, Animals, Female, Gastrointestinal Motility, Hong Kong, Humans, Intestine, Small/physiology, Mammary Neoplasms, Animal/pathology, Rats, Research/statistics & numerical data
ABSTRACT
Many studies in medicine involve conditions whereby subjects make transitions among a set of defined states over time. In such situations the durations of sojourns in specific states are frequently of interest. This article considers the modelling and analysis of sojourn times, beginning with semi-Markov models in which the durations of different sojourns are independent, and then considering extended models incorporating chronological time effects and random effects. Methodologic challenges for inference are discussed, and examples involving a relapse-remitting process and recurrent events are considered.
Subject(s)
Chronic Disease/epidemiology, Models, Biological, Bronchitis/epidemiology, Diabetes Mellitus, Type 1/epidemiology, HIV Infections/epidemiology, Humans, Intestine, Small/physiology, Likelihood Functions, Longitudinal Studies, Markov Chains, Normal Distribution, Proportional Hazards Models, Recurrence, Time Factors
ABSTRACT
Many models have been proposed that relate failure times and stochastic time-varying covariates. In some of these models, failure occurs when a particular observable marker crosses a threshold level. We are interested in the more difficult, and often more realistic, situation where failure is not related deterministically to an observable marker. In this case, joint models for marker evolution and failure tend to lead to complicated calculations for characteristics such as the marginal distribution of failure time or the joint distribution of failure time and marker value at failure. This paper presents a model based on a bivariate Wiener process in which one component represents the marker and the second, which is latent (unobservable), determines the failure time. In particular, failure occurs when the latent component crosses a threshold level. The model yields reasonably simple expressions for the characteristics mentioned above and is easy to fit to commonly occurring data that involve the marker value at the censoring time for surviving cases and the marker value and failure time for failing cases. Parametric and predictive inference are discussed, as well as model checking. An extension of the model permits the construction of a composite marker from several candidate markers that may be available. The methodology is demonstrated by a simulated example and a case application.
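For the latent component alone, the first time a Wiener process with positive drift crosses a fixed threshold has an inverse Gaussian distribution. A minimal sketch of its density and distribution function, assuming the latent process starts at 0 with drift mu > 0 and variance sigma**2 per unit time, and the threshold is a > 0. It covers only the marginal failure-time distribution, not the joint model with the observed marker, and the function names are hypothetical.

```python
import math


def _phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def first_passage_pdf(t, a, mu, sigma):
    """Density of T = inf{t : Y(t) >= a}, where Y is a Wiener process with
    Y(0) = 0, drift mu and variance sigma**2 per unit time (inverse Gaussian)."""
    return a / (sigma * math.sqrt(2.0 * math.pi * t ** 3)) * math.exp(
        -((a - mu * t) ** 2) / (2.0 * sigma ** 2 * t))


def first_passage_cdf(t, a, mu, sigma):
    """P(T <= t) for the same first-passage time."""
    s = sigma * math.sqrt(t)
    return (_phi((mu * t - a) / s)
            + math.exp(2.0 * a * mu / sigma ** 2) * _phi(-(a + mu * t) / s))
```

Under these assumptions the mean failure time is a / mu, so a larger drift in the latent process shortens expected survival.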
Subject(s)
Life Tables, Models, Statistical, Biometry, Humans, Stochastic Processes
ABSTRACT
Chronic medical conditions are often manifested by the incidence of recurrent adverse clinical events. In clinical trials designed to investigate therapeutic interventions for such conditions it is natural to make treatment comparisons on the basis of event occurrence. However, when there is a more serious, possibly related, event that terminates the occurrence of the recurrent events, the problem of dependent censoring arises. Here, we consider robust modelling strategies for expressing covariate effects on the recurrent event process that address the possible dependence between the recurrent and terminal events. The various methods differ in the way the dependence is addressed, and hence in the interpretation of covariate effects. The methods are applied to a data set from a kidney transplant study and simulated data chosen for illustrative purposes.
Subject(s)
Models, Statistical, Chronic Disease, Clinical Trials as Topic/methods, Graft Rejection/epidemiology, Humans, Kidney Transplantation/statistics & numerical data, Multivariate Analysis, Recurrence, Statistics, Nonparametric, Time Factors
ABSTRACT
A method of interim monitoring is described for longitudinal comparative studies in which the outcome of interest is a recurrent event and treatment comparisons are based on expected numbers of events. The nonparametric methods described by Cook, Lawless, and Nadeau (1996, Biometrics 52, 116-130) are generalized to provide a robust estimate of the covariance matrix for a sequence of test statistics calculated over time. The error spending function methodology of Lan and DeMets (1983, Biometrika 70, 659-663) is adopted to control the experimental type I error rate. A simulation study indicates satisfactory frequency properties of this procedure for the moderate to large scale trials for which it is intended. Extensions of this approach to handle stratified designs and studies with multitype recurrent events are indicated. Data from a kidney transplant study (Cole et al., 1994, Transplantation 57, 60-67) are used for illustrative purposes.
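The error spending idea can be sketched by computing how much type I error each interim analysis is allowed to use. The two spending functions below are the commonly cited O'Brien-Fleming-type and Pocock-type choices in the Lan-DeMets framework; which function the paper used is not stated here, and the information fractions are assumed given. Turning these increments into critical values also requires the joint distribution of the sequence of test statistics, which is where the robust covariance estimate enters; that step is omitted. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def obf_spending(t, alpha=0.05):
    """O'Brien-Fleming-type spending: alpha(t) = 2 - 2 * Phi(z_{1-alpha/2} / sqrt(t))."""
    if t <= 0:
        return 0.0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * NormalDist().cdf(z / math.sqrt(t))


def pocock_spending(t, alpha=0.05):
    """Pocock-type spending: alpha(t) = alpha * log(1 + (e - 1) * t)."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def error_spent_per_look(information_fractions, spending=obf_spending, alpha=0.05):
    """Type I error allotted to each analysis: alpha(t_k) - alpha(t_{k-1})."""
    spent, previous = [], 0.0
    for t in information_fractions:
        current = spending(t, alpha)
        spent.append(current - previous)
        previous = current
    return spent
```

Both functions spend the full alpha at information fraction 1; the O'Brien-Fleming-type function spends very little early, which makes early stopping harder.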
Subject(s)
Biometry/methods, Antilymphocyte Serum/pharmacology, Computer Simulation, Graft Rejection/prevention & control, Humans, Kidney Transplantation/immunology, Longitudinal Studies, Muromonab-CD3/pharmacology, Randomized Controlled Trials as Topic/statistics & numerical data, Recurrence, Regression Analysis
ABSTRACT
Robust nonparametric tests are considered for use in longitudinal studies in which the response of interest is a recurrent event. The tests are robust in the sense that they do not rely on distributional assumptions regarding the processes generating the events. The methods we describe are presented in the context of a clinical trial with attention initially directed at the two-sample problem in which a single experimental treatment is compared to a control. We investigate a family of generalized pseudo-score statistics (Lawless and Nadeau, 1995, Technometrics 37, 158-168) in which weight functions may be chosen to generate tests sensitive to various types of departure from the null hypothesis that the mean functions for the treatment and control groups are identical. All tests we consider are evaluated by simulation with respect to the type I error rate and power under a variety of practical scenarios. An application involving data from a kidney transplant study illustrates these procedures. For trials with multiple treatment arms, we generalize these approaches and indicate test statistics appropriate for unstructured alternatives and tests based on linear contrasts of the treatment-specific mean functions. Extensions of this methodology for stratified designs are also indicated.
Subject(s)
Biometry, Clinical Trials as Topic/statistics & numerical data, Antibodies/pharmacology, Antibodies, Monoclonal/pharmacology, Graft Rejection/prevention & control, Humans, Kidney Transplantation/adverse effects, Kidney Transplantation/immunology, Longitudinal Studies, Models, Statistical, Poisson Distribution, Randomized Controlled Trials as Topic/statistics & numerical data
ABSTRACT
Multi-state Markov models can be useful in analysing disease history data. We apply the general estimation methods of Kalbfleisch and Lawless to panel data in which individuals are viewed over only a portion of their life history and complete information about transition times between states is unavailable. Methods to assess goodness-of-fit are proposed. To illustrate the methods, we consider models of HIV disease relating important immunological marker measurements to the onset of AIDS.
Subject(s)
Acquired Immunodeficiency Syndrome/immunology, CD4-Positive T-Lymphocytes/immunology, HIV Infections/immunology, Immunoglobulin A/analysis, Leukocyte Count, Markov Chains, Medical History Taking/statistics & numerical data, Adult, Biomarkers, Bisexuality, Follow-Up Studies, HIV Seropositivity/immunology, Homosexuality, Humans, Models, Statistical
ABSTRACT
Toxicologists frequently conduct toxicity experiments in which different treatment conditions are applied to groups of animals and the resulting mortality in each group is measured at a number of discrete time points over the course of the experiment. Both survival analysis and generalized linear models have been proposed for analyzing this type of data. Whatever the approach taken, the model should allow for the presence of extra-multinomial variation arising from the use of groups of animals rather than individuals as the experimental units. We consider a number of models for overdispersion that can be incorporated into the generalized linear model framework for multinomial data. These models are extensions of ones proposed for binomial data by Williams (1982, Applied Statistics 31, 144-148) and Moore (1986, Biometrika 73, 583-588; 1987, Applied Statistics 36, 8-14). In addition, we examine robust asymptotic covariance matrix estimators for regression parameters, similar to those given in Liang and Zeger (1986, Biometrika 73, 13-22) and Zeger and Liang (1986, Biometrics 42, 121-130), and compare them to the model-based asymptotic estimators. Recommendations for analysis are given.
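A simple way to gauge extra-multinomial variation is a moment estimate of a common dispersion factor from the Pearson statistic. This is a simpler, widely used quasi-likelihood device, not the Williams or Moore models cited in the abstract; it assumes fitted category probabilities are already available from a multinomial regression, and the function name is hypothetical.

```python
def pearson_dispersion(counts, fitted_probs, n_params):
    """Moment estimate of a common dispersion factor for grouped multinomial data.

    counts       : per-group count vectors (e.g. animals dying in each time
                   interval plus survivors), one vector per group of animals
    fitted_probs : matching fitted category probabilities for each group
    n_params     : number of regression parameters estimated
    Returns X^2 / df, where X^2 is the Pearson statistic and
    df = sum over groups of (categories - 1) minus n_params.
    """
    x2 = 0.0
    df = -n_params
    for y, p in zip(counts, fitted_probs):
        n = sum(y)
        for y_j, p_j in zip(y, p):
            expected = n * p_j
            x2 += (y_j - expected) ** 2 / expected
        df += len(y) - 1
    return x2 / df
```

Under a constant-dispersion quasi-likelihood model, model-based standard errors are multiplied by the square root of this factor.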
Subject(s)
Models, Statistical, Toxicology/statistics & numerical data, Analysis of Variance, Animals, Biometry, Linear Models, Regression Analysis, Survival Analysis
ABSTRACT
The number of cases of transfusion-associated acquired immune deficiency syndrome (TA-AIDS) that will be seen over the next few years is difficult to estimate because of uncertainty about the number of persons infected with the human immunodeficiency virus (HIV) via blood transfusion and about the duration of the incubation period from HIV infection via transfusion to diagnosis of AIDS. Presented here are a mathematical model and nonparametric and parametric statistical analyses of recent data on TA-AIDS that clearly indicate the existing estimability problems. The methods provide short-term projections of new TA-AIDS cases to be reported; the results suggest about 1100 new cases to be reported in the United States between July 1988 and June 1989 and about 1500 more between July 1989 and June 1990. Estimates of the number of eventual TA-AIDS cases are considerably more uncertain and require additional assumptions about the incubation distribution. Under the assumption that the probability of an infected person developing AIDS within 8 years of infection is 0.40 (an estimate derived from cohort studies in homosexual men and hemophiliacs), parametric and nonparametric analyses give point estimates of 14,300 and 15,000, respectively, for the number of eventual cases of AIDS (in the age group 13-69) attributable to infection by blood transfusion prior to July 1985. The parametric analysis gives a corresponding 95 percent confidence interval.
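Short-term projections of this kind rest on a convolution: each group of persons infected at a given time contributes expected diagnoses in a later calendar interval according to the incubation distribution. A minimal sketch, assuming the numbers infected and the incubation distribution function are supplied by the user; the function name and inputs are hypothetical, and uncertainty in both inputs is what produces the estimability problems described above.

```python
def expected_diagnoses(infections, incubation_cdf, start, end):
    """Expected number of AIDS diagnoses in the calendar interval (start, end].

    infections     : list of (infection_time, number_infected) pairs
    incubation_cdf : F(d) = P(incubation period <= d), with F(d) = 0 for d <= 0
    A cohort of n persons infected at time s contributes
    n * [F(end - s) - F(start - s)].
    """
    total = 0.0
    for s, n in infections:
        total += n * (incubation_cdf(end - s) - incubation_cdf(start - s))
    return total
```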
Subject(s)
Acquired Immunodeficiency Syndrome/epidemiology, Blood Transfusion, Acquired Immunodeficiency Syndrome/transmission, Adolescent, Adult, Aged, Child, Child, Preschool, Humans, Infant, Middle Aged, Probability, Time Factors, United States
ABSTRACT
Data related to life histories of individuals can be obtained in many different ways, and the usefulness of multi-state models for statistical analysis is generally highly dependent on the type and nature of the data. In this paper, we focus on this dependence and present an approach to estimation for certain 'difficult' situations associated with retrospective or incomplete prospective observation. The paper begins with the identification of some problem areas in the analysis of data on life history processes. We discuss maximum likelihood estimation in some simple contexts and introduce a pseudo-likelihood that enables the simple analysis of some sampling procedures. This approach is illustrated on standard retrospective and case-cohort designs.
Subject(s)
Epidemiologic Methods, Statistics as Topic, Humans, Models, Biological, Morbidity, Mortality, Prospective Studies, Retrospective Studies
ABSTRACT
Regression and clustering methods have both been used to explore the effects of explanatory variables on survival times for patients with cancer or other chronic diseases. This paper discusses effective and computationally feasible approaches for this task in situations where there are fairly large and complex data sets; the techniques stressed are all-subsets regression and a kind of recursive partition clustering. We compare the two approaches in a rather general way, in part by examining some survival data for patients with ovarian carcinoma, and conclude that both have strong points to recommend them.
Subject(s)
Mortality, Regression Analysis, Space-Time Clustering, Female, Humans, Middle Aged, Models, Biological, Ovarian Neoplasms/mortality
ABSTRACT
This paper describes a system written to carry out regression analyses under certain generalized linear models that are widely used in biomedical research. These include continuous response models such as the Weibull, log-logistic, log-normal and Cox proportional hazards models used in survival analysis, and also discrete Poisson, binomial and multinomial response regression models. The system fits models, generates residuals and other diagnostic output, and has an all-subsets regression feature. This paper describes the models implemented and gives statistical background; Part II describes the ISMOD system and presents examples of its application.