Results 1 - 20 of 38
1.
Proc Natl Acad Sci U S A; 118(4), 2021 Jan 26.
Article in English | MEDLINE | ID: mdl-33414276

ABSTRACT

The weaponization of digital communications and social media to conduct disinformation campaigns at immense scale, speed, and reach presents new challenges to identify and counter hostile influence operations (IOs). This paper presents an end-to-end framework to automate detection of disinformation narratives, networks, and influential actors. The framework integrates natural language processing, machine learning, graph analytics, and a network causal inference approach to quantify the impact of individual actors in spreading IO narratives. We demonstrate its capability on real-world hostile IO campaigns with Twitter datasets collected during the 2017 French presidential elections and known IO accounts disclosed by Twitter over a broad range of IO campaigns (May 2007 to February 2020), covering over 50,000 accounts, 17 countries, and different account types, including both trolls and bots. Our system detects IO accounts with 96% precision, 79% recall, and 96% area under the precision-recall (P-R) curve; maps out salient network communities; and discovers high-impact accounts that escape the lens of traditional impact statistics based on activity counts and network centrality. Results are corroborated with independent sources of known IO accounts from US Congressional reports, investigative journalism, and IO datasets provided by Twitter.


Subject(s)
Communications Media/trends, Information Dissemination/methods, Politics, Social Media/trends, Communication, Humans, Social Network Analysis, Social Networking
2.
Proc Natl Acad Sci U S A; 117(22): 12004-12010, 2020 Jun 2.
Article in English | MEDLINE | ID: mdl-32414914

ABSTRACT

A catalytic prior distribution is designed to stabilize a high-dimensional "working model" by shrinking it toward a "simplified model." The shrinkage is achieved by supplementing the observed data with a small amount of "synthetic data" generated from a predictive distribution under the simpler model. We apply this framework to generalized linear models, where we propose various strategies for the specification of a tuning parameter governing the degree of shrinkage and study resultant theoretical properties. In simulations, the resulting posterior estimation using such a catalytic prior outperforms maximum likelihood estimation from the working model and is generally comparable with or superior to existing competitive methods in terms of frequentist prediction accuracy of point estimation and coverage accuracy of interval estimation. The catalytic priors have simple interpretations and are easy to formulate.
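A minimal sketch of the catalytic-prior idea for a linear working model (illustrative only — the paper treats generalized linear models, and the simulated data, the intercept-only simplified model, and the tuning values M and tau used here are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Working model: linear regression with p = 5 covariates but few observations.
n, p = 12, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# Simplified model: intercept-only; its predictive distribution generates
# the synthetic data that carry the prior information.
M, tau = 200, 4.0                    # synthetic sample size, total prior weight
y_syn = y.mean() + y.std() * rng.normal(size=M)
X_syn = rng.normal(size=(M, p))      # synthetic covariates

# Supplement the observed data; each synthetic row gets weight tau / M.
Xa = np.vstack([X, X_syn])
ya = np.concatenate([y, y_syn])
w = np.concatenate([np.ones(n), np.full(M, tau / M)])

# Weighted least squares = posterior-mode estimate under the catalytic prior;
# the synthetic rows shrink the fit toward the simplified intercept-only model.
beta_cat = np.linalg.solve((Xa.T * w) @ Xa, (Xa.T * w) @ ya)
beta_mle = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_cat.round(2), beta_mle.round(2))
```

The ratio tau / M keeps the total weight of the synthetic data fixed regardless of how many synthetic rows are drawn, which is what makes tau interpretable as the prior's effective sample size.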


Subject(s)
Computer Simulation/statistics & numerical data, Linear Models, Bayes Theorem, Computer Simulation/trends, Data Analysis, Data Collection, Sample Size, Statistics as Topic
3.
Proc Natl Acad Sci U S A; 117(32): 19045-19053, 2020 Aug 11.
Article in English | MEDLINE | ID: mdl-32723822

ABSTRACT

Data analyses typically rely upon assumptions about the missingness mechanisms that lead to observed versus missing data, assumptions that are typically unassessable. We explore an approach where the joint distribution of observed data and missing data are specified in a nonstandard way. In this formulation, which traces back to a representation of the joint distribution of the data and missingness mechanism, apparently first proposed by J. W. Tukey, the modeling assumptions about the distributions are either assessable or are designed to allow relatively easy incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both observed and missing. We develop Tukey's representation for exponential-family models, propose a computationally tractable approach to inference in this class of models, and offer some general theoretical comments. We then illustrate the utility of this approach with an example in systems biology.

4.
Stat Med; 40(25): 5565-5586, 2021 Nov 10.
Article in English | MEDLINE | ID: mdl-34374106

ABSTRACT

We describe a new method that combines propensity-score matching with regression adjustment in treatment-control studies with binary outcomes, by multiply imputing potential outcomes under control for the matched treated subjects. This enables the estimation of clinically meaningful measures of effect such as the risk difference. We used Monte Carlo simulation to explore how the number of imputed potential outcomes under control for the matched treated subjects affects inferences about the risk difference. We found that imputing potential outcomes under control (by either single or multiple imputation) substantially reduced bias compared with conventional nearest-neighbor matching alone. Increasing the number of imputed potential outcomes under control resulted in more efficient estimation of the risk difference. The greatest relative increase in efficiency was achieved by imputing five potential outcomes; once 20 outcomes under control were imputed for each matched treated subject, further improvements in efficiency were negligible. We also examined the effect of the number of imputed potential outcomes on: (i) estimated standard errors; (ii) mean squared error; and (iii) coverage of estimated confidence intervals. We illustrate the application of the method by estimating the effect on the risk of death within 1 year of prescribing beta-blockers to patients discharged from hospital with a diagnosis of heart failure.
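A simplified sketch of the core procedure (not the authors' implementation: the simulated data, the hand-rolled logistic fits, and nearest-neighbor matching with replacement are all assumptions made for a self-contained example):

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logistic(X, y, iters=25):
    """Plain Newton-Raphson logistic regression with an intercept."""
    Xd = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xd @ b))
        b += np.linalg.solve((Xd.T * (p * (1 - p))) @ Xd, Xd.T @ (y - p))
    return b

def predict(b, X):
    return 1 / (1 + np.exp(-np.column_stack([np.ones(len(X)), X]) @ b))

# Simulated treatment-control study with a binary outcome; the true
# treatment effect lowers the event risk.
n = 2000
x = rng.normal(size=(n, 2))
z = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x[:, 0])))
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.7 * x[:, 0] + 0.4 * x[:, 1] - 0.8 * z))))

# 1. Propensity-score model and 1:1 nearest-neighbor matching (with replacement).
ps = predict(fit_logistic(x, z), x)
treated, controls = np.where(z == 1)[0], np.where(z == 0)[0]
matches = controls[np.argmin(np.abs(ps[treated][:, None] - ps[controls][None, :]), axis=1)]

# 2. Outcome model fit to the matched controls, used to impute each matched
#    treated subject's potential outcome under control, m = 20 times.
p0 = predict(fit_logistic(x[matches], y[matches]), x[treated])
m = 20
y0_imp = rng.binomial(1, np.tile(p0, (m, 1)))

# 3. Risk difference: observed risk under treatment minus imputed risk under control.
rd = y[treated].mean() - y0_imp.mean()
print(round(rd, 3))
```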


Subject(s)
Research Design, Bias, Computer Simulation, Humans, Monte Carlo Method, Propensity Score
5.
Proc Natl Acad Sci U S A; 115(37): 9157-9162, 2018 Sep 11.
Article in English | MEDLINE | ID: mdl-30150408

ABSTRACT

Although complete randomization ensures covariate balance on average, the chance of observing significant differences between treatment and control covariate distributions increases with many covariates. Rerandomization discards randomizations that do not satisfy a predetermined covariate balance criterion, generally resulting in better covariate balance and more precise estimates of causal effects. Previous work has derived finite-sample theory for rerandomization under the assumptions of equal treatment group sizes, Gaussian covariate and outcome distributions, or additive causal effects, but not the general sampling distribution of the difference-in-means estimator for the average causal effect. We develop asymptotic theory for rerandomization without these assumptions, which reveals a non-Gaussian asymptotic distribution for this estimator, specifically a linear combination of a Gaussian random variable and truncated Gaussian random variables. This distribution follows because rerandomization affects only the projection of potential outcomes onto the covariate space but does not affect the corresponding orthogonal residuals. We demonstrate that, compared with complete randomization, rerandomization reduces the asymptotic quantile ranges of the difference-in-means estimator. Moreover, our work constructs accurate large-sample confidence intervals for the average causal effect.
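A toy illustration of the rerandomization mechanism itself (not the paper's asymptotic theory; the covariate data, the threshold of 1.0, and the 50/50 group split are all assumptions): draws of a completely randomized assignment are discarded until the Mahalanobis distance between group covariate means falls below the threshold.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical covariate data for n units to be split 50/50.
n, p = 100, 4
X = rng.normal(size=(n, p))
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(w):
    """Mahalanobis distance between treated and control covariate means."""
    d = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)
    n1 = (w == 1).sum()
    return d @ cov_inv @ d * n1 * (n - n1) / n

def complete_randomization():
    w = np.zeros(n, dtype=int)
    w[rng.choice(n, n // 2, replace=False)] = 1
    return w

def rerandomize(threshold=1.0):
    # Redraw the complete randomization until the balance criterion holds.
    while True:
        w = complete_randomization()
        if mahalanobis(w) <= threshold:
            return w

# Rerandomized assignments are markedly better balanced on average.
m_cr = np.mean([mahalanobis(complete_randomization()) for _ in range(500)])
m_rr = np.mean([mahalanobis(rerandomize()) for _ in range(500)])
print(round(m_cr, 2), round(m_rr, 2))
```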


Subject(s)
Theoretical Models, Random Allocation
6.
J Biopharm Stat; 26(6): 1020-1024, 2016.
Article in English | MEDLINE | ID: mdl-27611988

ABSTRACT

The wise use of statistical ideas in practice essentially requires some Bayesian thinking, in contrast to the classical rigid frequentist dogma. This dogma too often has seemed to influence the applications of statistics, even at agencies like the FDA. Greg Campbell was one of the most important advocates there for more nuanced modes of thought, especially Bayesian statistics. Because two brilliant statisticians, Ronald Fisher and Jerzy Neyman, are often credited with instilling the traditional frequentist approach in current practice, I argue that both men were actually seeking very Bayesian answers, and neither would have endorsed the rigid application of their ideas.


Subject(s)
Bayes Theorem, Statistics as Topic, United States Food and Drug Administration, Humans, United States
7.
Stat Med; 34(24): 3214-3222, 2015 Oct 30.
Article in English | MEDLINE | ID: mdl-25959735

ABSTRACT

By 'partially post-hoc' subgroup analyses, we mean analyses that compare existing data from a randomized experiment-from which a subgroup specification is derived-to new, subgroup-only experimental data. We describe a motivating example in which partially post hoc subgroup analyses instigated statistical debate about a medical device's efficacy. We clarify the source of such analyses' invalidity and then propose a randomization-based approach for generating valid posterior predictive p-values for such partially post hoc subgroups. Lastly, we investigate the approach's operating characteristics in a simple illustrative setting through a series of simulations, showing that it can have desirable properties under both null and alternative hypotheses.


Subject(s)
Patient Selection, Randomized Controlled Trials as Topic/methods, Research Design, Biometry, Computer Simulation, Equipment and Supplies, Gels/therapeutic use, Humans, Knee Osteoarthritis/drug therapy, United States, United States Food and Drug Administration
8.
Stat Med; 34(26): 3381-3398, 2015 Nov 20.
Article in English | MEDLINE | ID: mdl-26013308

ABSTRACT

Estimation of causal effects in non-randomized studies comprises two distinct phases: design, without outcome data, and analysis of the outcome data according to a specified protocol. Recently, Gutman and Rubin (2013) proposed a new analysis-phase method for estimating treatment effects when the outcome is binary and there is only one covariate, which viewed causal effect estimation explicitly as a missing data problem. Here, we extend this method to situations with continuous outcomes and multiple covariates and compare it with other commonly used methods (such as matching, subclassification, weighting, and covariance adjustment). We show, using an extensive simulation, that of all methods considered, and in many of the experimental conditions examined, our new 'multiple-imputation using two subclassification splines' method appears to be the most efficient and has coverage levels that are closest to nominal. In addition, it can estimate finite population average causal effects as well as non-linear causal estimands. This type of analysis also allows the identification of subgroups of units for which the effect appears to be especially beneficial or harmful.


Subject(s)
Statistical Models, Research Design, Therapeutics, Observational Studies as Topic, Randomized Controlled Trials as Topic, Treatment Outcome
9.
Stat Med; 34(23): 3081-3103, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26045214

ABSTRACT

Health and medical data are increasingly being generated, collected, and stored in electronic form in healthcare facilities and administrative agencies. Such data hold a wealth of information vital to effective health policy development and evaluation, as well as to enhanced clinical care through evidence-based practice and safety and quality monitoring. These initiatives are aimed at improving individuals' health and well-being. Nevertheless, analyses of health data archives must be conducted in such a way that individuals' privacy is not compromised. One important aspect of protecting individuals' privacy is protecting the confidentiality of their data. It is the purpose of this paper to provide a review of a number of approaches to reducing disclosure risk when making data available for research, and to present a taxonomy for such approaches. Some of these methods are widely used, whereas others are still in development. It is important to have a range of methods available because there is also a range of data-use scenarios, and it is important to be able to choose between methods suited to differing scenarios. In practice, it is necessary to find a balance between allowing the use of health and medical data for research and protecting confidentiality. This balance is often presented as a trade-off between disclosure risk and data utility, because methods that reduce disclosure risk, in general, also reduce data utility.


Subject(s)
Biomedical Research/legislation & jurisprudence, Confidentiality/legislation & jurisprudence, Statistical Data Interpretation, Evidence-Based Medicine/legislation & jurisprudence, Health Policy/legislation & jurisprudence, Australia, Biomedical Research/methods, Biomedical Research/statistics & numerical data, Computer Security/legislation & jurisprudence, Computer Security/standards, Computer Security/statistics & numerical data, Confidentiality/standards, European Union, Evidence-Based Medicine/methods, Evidence-Based Medicine/statistics & numerical data, Health Insurance Portability and Accountability Act, Humans, United States
10.
Stat Med; 33(24): 4170-4185, 2014 Oct 30.
Article in English | MEDLINE | ID: mdl-24845086

ABSTRACT

Although recent guidelines for dealing with missing data emphasize the need for sensitivity analyses, and such analyses have a long history in statistics, universal recommendations for conducting and displaying these analyses are scarce. We propose graphical displays that help formalize and visualize the results of sensitivity analyses, building upon the idea of 'tipping-point' analysis for randomized experiments with a binary outcome and a dichotomous treatment. The resulting 'enhanced tipping-point displays' are convenient summaries of conclusions obtained from making different modeling assumptions about missingness mechanisms. The primary goal of the displays is to make formal sensitivity analyses more comprehensible to practitioners, thereby helping them assess the robustness of the experiment's conclusions to plausible missingness mechanisms. We also present a recent example of these enhanced displays in a medical device clinical trial that helped lead to FDA approval.
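The tipping-point idea the displays build on can be sketched numerically (the trial counts below are hypothetical, and the enhanced graphical displays themselves are not reproduced): impute every possible split of the missing binary outcomes and record where the trial's conclusion flips.

```python
import math
import numpy as np

# Hypothetical trial: 100 randomized per arm; some binary outcomes missing.
n_t, s_t, miss_t = 100, 60, 10   # treatment: 60 successes observed, 10 missing
n_c, s_c, miss_c = 100, 45, 8    # control:   45 successes observed,  8 missing

def two_sided_p(s1, n1, s0, n0):
    """Two-sided z-test p-value for a difference in proportions."""
    p1, p0, pp = s1 / n1, s0 / n0, (s1 + s0) / (n1 + n0)
    z = (p1 - p0) / math.sqrt(pp * (1 - pp) * (1 / n1 + 1 / n0))
    return math.erfc(abs(z) / math.sqrt(2))

# Tipping-point grid: impute every possible split of the missing outcomes
# into successes, from all failures (0) up to all successes (miss).
grid = np.empty((miss_t + 1, miss_c + 1))
for i in range(miss_t + 1):          # imputed successes among missing treated
    for j in range(miss_c + 1):      # imputed successes among missing controls
        grid[i, j] = two_sided_p(s_t + i, n_t, s_c + j, n_c)

significant = grid < 0.05
print(f"{significant.mean():.0%} of missingness scenarios remain significant")
```

A display would shade this grid by significance; the boundary between the shaded and unshaded regions is the tipping point the analyst must judge for plausibility.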


Subject(s)
Statistical Data Interpretation, Statistical Models, Randomized Controlled Trials as Topic/methods, Computer Simulation, Compression Fractures/surgery, Humans, Kyphoplasty/adverse effects, Kyphoplasty/standards, Pain/prevention & control, Spinal Fractures/surgery, United States
11.
Stat Med; 33(13): 2238-2250, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24443287

ABSTRACT

A number of mixture modeling approaches assume both normality and independent observations. However, these two assumptions are at odds with the reality of many data sets, which are often characterized by an abundance of zero-valued or highly skewed observations as well as observations from biologically related (i.e., non-independent) subjects. We present here a finite mixture model with a zero-inflated Poisson regression component that may be applied to both types of data. This flexible approach allows the use of covariates to model both the Poisson mean and rate of zero inflation and can incorporate random effects to accommodate non-independent observations. We demonstrate the utility of this approach by applying these models to a candidate endophenotype for schizophrenia, but the same methods are applicable to other types of data characterized by zero inflation and non-independence.
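A stripped-down sketch of the zero-inflated Poisson component (intercept-only: the paper's covariates for the Poisson mean and zero-inflation rate, and its random effects, are omitted; the simulated mixing proportion and rate are assumptions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(3)

# Simulated counts: structural zeros with probability pi, else Poisson(lam).
n, pi_true, lam_true = 2000, 0.3, 2.5
structural_zero = rng.random(n) < pi_true
y = np.where(structural_zero, 0, rng.poisson(lam_true, n))

def neg_loglik(theta):
    """ZIP negative log-likelihood; theta = (logit of pi, log of lam)."""
    pi = 1 / (1 + np.exp(-theta[0]))
    lam = np.exp(theta[1])
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))               # P(Y = 0)
    ll_pos = np.log(1 - pi) - lam + y * np.log(lam) - gammaln(y + 1)
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

fit = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
pi_hat = 1 / (1 + np.exp(-fit.x[0]))
lam_hat = np.exp(fit.x[1])
print(round(pi_hat, 2), round(lam_hat, 2))
```

The key modeling point is visible in `ll_zero`: an observed zero can come from either the structural-zero component or the Poisson component, so the two sources are mixed in the likelihood rather than attributed to one or the other.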


Subject(s)
Datasets as Topic/statistics & numerical data, Statistical Models, Poisson Distribution, Adult, Endophenotypes, Humans, Middle Aged, Odds Ratio, Schizophrenia/genetics
12.
Stat Methods Med Res; 33(5): 825-837, 2024 May.
Article in English | MEDLINE | ID: mdl-38499338

ABSTRACT

Existing methods that use propensity scores for heterogeneous treatment effect estimation on non-experimental data do not readily extend to the case of more than two treatment options. In this work, we develop a new propensity score-based method for heterogeneous treatment effect estimation when there are three or more treatment options, and prove that it generates unbiased estimates. We demonstrate our method on a real patient registry of patients in Singapore with diabetic dyslipidemia. On this dataset, our method generates heterogeneous treatment recommendations for patients among three options: statins, fibrates, and non-pharmacological treatment to control patients' lipid ratios (total cholesterol divided by high-density lipoprotein level). In our numerical study, our proposed method generated more stable estimates compared to a benchmark method based on a multi-dimensional propensity score.


Subject(s)
Dyslipidemias, Hydroxymethylglutaryl-CoA Reductase Inhibitors, Propensity Score, Humans, Dyslipidemias/drug therapy, Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use, Singapore, Causality, Statistical Models, Fibric Acids/therapeutic use, Hypolipidemic Agents/therapeutic use
13.
Stat Methods Med Res; 29(3): 728-751, 2020 Mar.
Article in English | MEDLINE | ID: mdl-30569832

ABSTRACT

Matching on an estimated propensity score is frequently used to estimate the effects of treatments from observational data. Since the 1970s, different authors have proposed methods to combine matching at the design stage with regression adjustment at the analysis stage when estimating treatment effects for continuous outcomes. Previous work has consistently shown that the combination has generally superior statistical properties than either method by itself. In biomedical and epidemiological research, survival or time-to-event outcomes are common. We propose a method that combines regression adjustment and propensity-score matching to estimate survival curves and hazard ratios, based on imputing a potential outcome under control for each successfully matched treated subject. The imputation uses either an accelerated failure time parametric survival model or a Cox proportional hazards model fit to the matched control subjects; the fitted model is then applied to the matched treated subjects to simulate each one's missing potential outcome under control. Conventional survival analyses (e.g., estimation of survival curves and hazard ratios) can then be conducted using the observed outcome under treatment and the imputed outcome under control. We evaluated the repeated-sampling bias of the proposed methods using simulations. When using nearest-neighbor matching, the proposed method resulted in decreased bias compared to crude analyses in the matched sample. We illustrate the method with an example: prescribing beta-blockers at hospital discharge to patients hospitalized with heart failure.


Subject(s)
Propensity Score, Bias, Humans, Monte Carlo Method, Proportional Hazards Models, Survival Analysis
14.
Stat Methods Med Res; 28(7): 1958-1978, 2019 Jul.
Article in English | MEDLINE | ID: mdl-29187059

ABSTRACT

Consider a statistical analysis that draws causal inferences from an observational dataset, inferences that are presented as being valid in the standard frequentist senses; i.e. the analysis produces: (1) consistent point estimates, (2) valid p-values, valid in the sense of rejecting true null hypotheses at the nominal level or less often, and/or (3) confidence intervals, which are presented as having at least their nominal coverage for their estimands. For the hypothetical validity of these statements, the analysis must embed the observational study in a hypothetical randomized experiment that created the observed data, or a subset of that hypothetical randomized data set. This multistage effort with thought-provoking tasks involves: (1) a purely conceptual stage that precisely formulates the causal question in terms of a hypothetical randomized experiment where the exposure is assigned to units; (2) a design stage that approximates a randomized experiment before any outcome data are observed; (3) a statistical analysis stage comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment; and (4) a summary stage providing conclusions about statistical evidence for the sizes of possible causal effects. Stages 2 and 3 may rely on modern computing to implement the effort, whereas Stage 1 demands careful scientific argumentation to make the embedding plausible to scientific readers of the proffered statistical analysis. Otherwise, the resulting analysis is vulnerable to criticism for being simply a presentation of scientifically meaningless arithmetic calculations. The conceptually most demanding tasks are often the most scientifically interesting to the dedicated researcher and readers of the resulting statistical analyses. In practice, this perspective is rarely implemented with any rigor; for example, analyses often eschew the first stage entirely.
We illustrate our approach using an example examining the effect of parental smoking on children's lung function collected in families living in East Boston in the 1970s.


Subject(s)
Causality, Statistical Models, Observational Studies as Topic/statistics & numerical data, Randomized Controlled Trials as Topic/statistics & numerical data, Research Design, Adult, Female, Humans, Male, Parents, Smoking Cessation/statistics & numerical data
15.
Biometrika; 105(3): 745-752, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30174335

ABSTRACT

The seminal work of Morgan & Rubin (2012) considers rerandomization for all the units at one time. In practice, however, experimenters may have to rerandomize units sequentially. For example, a clinician studying a rare disease may be unable to wait to perform an experiment until all the experimental units are recruited. Our work offers a mathematical framework for sequential rerandomization designs, where the experimental units are enrolled in groups. We formulate an adaptive rerandomization procedure for balancing treatment/control assignments over some continuous or binary covariates, using Mahalanobis distance as the imbalance measure. Our key result proves that, under certain mild assumptions and given the same number of rerandomizations, sequential rerandomization achieves better covariate balance, in expectation, than rerandomization performed all at once.

16.
Psychol Methods; 23(2): 337-350, 2018 Jun.
Article in English | MEDLINE | ID: mdl-28406674

ABSTRACT

Blinded randomized controlled trials (RCTs) require participants to be uncertain whether they are receiving a treatment or placebo. Although uncertainty is ideal for isolating the treatment effect from all other potential effects, it is poorly suited for estimating the treatment effect under actual conditions of intended use, that is, when individuals are certain that they are receiving a treatment. We propose an experimental design, randomization to randomization probabilities (R2R), which significantly improves estimates of treatment effects under actual conditions of use by manipulating participant expectations about receiving treatment. In the R2R design, participants are first randomized to a value, π, denoting their probability of receiving treatment (vs. placebo). Subjects are then told their value of π and randomized to either treatment or placebo with probabilities π and 1-π, respectively. Analysis of the treatment effect includes statistical controls for π (necessary for causal inference) and typically a π-by-treatment interaction. Random assignment of subjects to π and disclosure of its value to subjects manipulates subject expectations about receiving the treatment without deception. This method offers a better treatment effect estimate under actual conditions of use than does a conventional RCT. Design properties, guidelines for power analyses, and limitations of the approach are discussed. We illustrate the design by implementing an RCT of caffeine effects on mood and vigilance and show that some of the actual effects of caffeine differ by the expectation that one is receiving the active drug.
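The two-stage randomization can be sketched as follows (simulated data only; the outcome model with its expectancy term is a hypothetical illustration, not the caffeine study):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6000

# Stage 1: randomize each participant to a disclosed treatment probability pi.
pis = rng.choice([0.2, 0.5, 0.8], size=n)
# Stage 2: randomize to treatment (1) or placebo (0) with that probability.
z = rng.binomial(1, pis)

# Hypothetical outcome model: a true drug effect of 1.0 plus an expectancy
# effect that grows with the disclosed probability of receiving the drug.
y = 1.0 * z + 0.5 * pis + rng.normal(size=n)

# Analysis controls for pi, as the design requires; a pi-by-treatment
# interaction term could be added the same way.
X = np.column_stack([np.ones(n), z, pis])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))   # [intercept, treatment effect, expectancy slope]
```

Because π is randomized and then conditioned on in the analysis, the expectancy effect is separated from the pharmacological effect without deceiving participants.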


Subject(s)
Biomedical Research/methods, Outcome Assessment (Health Care)/methods, Random Allocation, Randomized Controlled Trials as Topic/methods, Research Design, Adult, Affect/drug effects, Arousal/drug effects, Caffeine/pharmacology, Central Nervous System Stimulants/pharmacology, Humans
18.
Stat Med; 31(24): 2778-2779, 2012 Oct 30.
Article in English | MEDLINE | ID: mdl-23037139
19.
J Abnorm Psychol; 116(1): 16-29, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17324013

ABSTRACT

Prior research has focused on the latent structure of endophenotypic markers of schizophrenia liability, or schizotypy. The work supports the existence of 2 relatively distinct latent classes and derives largely from the taxometric analysis of psychometric values. The present study used finite mixture modeling as a technique for discerning latent structure and the laboratory-measured endophenotypes of sustained attention deficits and eye-tracking dysfunction as endophenotype indexes. In a large adult community sample (N=311), finite mixture analysis of the sustained attention index d' and 2 eye-tracking indexes (gain and catch-up saccade rate) revealed evidence for 2 latent components. A putative schizotypy class accounted for 27% of the sample. A supplementary maximum covariance taxometric analysis yielded highly consistent results. Subjects in the schizotypy component displayed higher rates of schizotypal personality features and an increased rate of treated schizophrenia in their first-degree biological relatives compared with subjects in the other component. Implications of these results are examined in light of major theories of schizophrenia liability, and methodological advantages of finite mixture modeling for psychopathology research, with particular emphasis on genomic issues, are discussed.


Subject(s)
Phenotype, Schizophrenia/genetics, Attention, Cognition Disorders/epidemiology, Cognition Disorders/genetics, Eye Movements/physiology, Female, Humans, Male, Middle Aged, Schizophrenia/epidemiology