ABSTRACT
PURPOSE: To assess the validity of privacy-preserving synthetic data by comparing results from analyses of synthetic versus original EHR data. METHODS: A published retrospective cohort study on the real-world effectiveness of COVID-19 vaccines by Maccabi Healthcare Services in Israel was replicated using synthetic data generated from the same source, and results were compared between the synthetic and original datasets. The endpoints included COVID-19 infection, symptomatic COVID-19 infection, and hospitalization due to infection, and were also assessed in several demographic and clinical subgroups. In comparing synthetic versus original data estimates, several metrics were used: standardized mean differences (SMD), decision agreement, estimate agreement, confidence interval overlap, and the Wald test. Synthetic data were generated five times to assess the stability of results. RESULTS: The distributions of demographic and clinical characteristics showed very small differences (SMD < 0.01). In the comparison of vaccine effectiveness, assessed as relative risk reduction, between synthetic and original data, there was 100% decision agreement, 100% estimate agreement, and a high level of confidence interval overlap (88.7%-99.7%) in all five replicates across all subgroups. Similar findings were obtained in the assessment of vaccine effectiveness against symptomatic COVID-19 infection. In the comparison of hazard ratios for COVID-19-related hospitalization and odds ratios for symptomatic COVID-19 infection, the Wald tests suggested no significant difference between the respective effect estimates in all five replicates for all patient subgroups, but there were disagreements in the estimate and decision metrics in some subgroups and replicates. CONCLUSIONS: Overall, the comparison of synthetic versus original real-world data demonstrated good validity and reliability.
Transparency on the process to generate high fidelity synthetic data and assurances of patient privacy are warranted.
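The agreement metrics named above can be made concrete with a small sketch. This is an illustrative implementation only, not the study's code: the pooled-SD form of the SMD and the definition of CI overlap as the fraction of the narrower interval covered by the intersection are assumptions, and the numbers are invented.

```python
import math

def standardized_mean_difference(mean_a, mean_b, sd_a, sd_b):
    """SMD between two groups using the pooled standard deviation."""
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)
    return (mean_a - mean_b) / pooled_sd

def ci_overlap(ci_a, ci_b):
    """Fraction of the narrower interval covered by the intersection."""
    lo = max(ci_a[0], ci_b[0])
    hi = min(ci_a[1], ci_b[1])
    if hi <= lo:
        return 0.0
    shorter = min(ci_a[1] - ci_a[0], ci_b[1] - ci_b[0])
    return (hi - lo) / shorter

# Toy comparison of an original vs. synthetic estimate (invented numbers)
print(round(standardized_mean_difference(52.1, 52.0, 14.9, 15.0), 4))  # 0.0067
print(round(ci_overlap((0.85, 0.95), (0.86, 0.97)), 2))  # 0.9
```

An SMD below 0.01, as reported above, indicates near-identical marginal distributions; the CI overlap of 0.9 here would fall inside the 88.7%-99.7% range the study observed.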
Subjects
COVID-19 Vaccines , COVID-19 , Electronic Health Records , Humans , COVID-19/prevention & control , COVID-19/epidemiology , COVID-19 Vaccines/administration & dosage , Israel/epidemiology , Retrospective Studies , Male , Female , Vaccine Efficacy , Middle Aged , Hospitalization/statistics & numerical data , Reproducibility of Results , Adult , Aged , Privacy , Cohort Studies
ABSTRACT
Replication studies are increasingly conducted to assess the credibility of scientific findings. Most of these replication attempts target studies with a superiority design, but there is a lack of methodology regarding the analysis of replication studies with alternative types of designs, such as equivalence. In order to fill this gap, we propose two approaches, the two-trials rule and the sceptical two one-sided tests (TOST) procedure, adapted from methods used in superiority settings. Both methods have the same overall Type-I error rate, but the sceptical TOST procedure allows replication success even for nonsignificant original or replication studies. This leads to a larger project power and other differences in relevant operating characteristics. Both methods can be used for sample size calculation of the replication study, based on the results from the original one. The two methods are applied to data from the Reproducibility Project: Cancer Biology.
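The transfer of the two-trials rule to equivalence settings can be sketched as follows. This is a hedged illustration assuming normal test statistics and a symmetric margin; the estimates, standard errors, and margin are invented, and the sceptical TOST procedure described above modifies the success criterion in ways not shown here.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def tost_p(estimate, se, margin):
    """TOST p-value for equivalence within (-margin, +margin), assuming an
    approximately normal estimate: the larger of the two one-sided p-values."""
    p_lower = 1 - norm_cdf((estimate + margin) / se)  # H0: effect <= -margin
    p_upper = norm_cdf((estimate - margin) / se)      # H0: effect >= +margin
    return max(p_lower, p_upper)

def two_trials_equivalence(p_original, p_replication, alpha=0.05):
    """Two-trials-style success rule: each study must pass TOST on its own."""
    return p_original < alpha and p_replication < alpha

# Invented numbers: both studies estimate a near-zero effect, margin 0.2
p_orig = tost_p(0.02, 0.05, 0.2)
p_rep = tost_p(-0.01, 0.06, 0.2)
print(two_trials_equivalence(p_orig, p_rep))  # True
```

Under this strict rule a nonsignificant study sinks the replication claim, which is exactly the limitation the sceptical TOST procedure is designed to relax.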
Subjects
Biometry , Biometry/methods , Humans , Reproducibility of Results , Sample Size , Equivalence Trials as Topic
ABSTRACT
SomaScan is an aptamer-based proteomics assay designed for the simultaneous measurement of thousands of human proteins across a broad range of endogenous concentrations. The 7K SomaScan assay has recently been expanded into the new 11K version. Following up on our previous assessment of the 7K assay, here we expand our work on technical replicates from donors enrolled in the Baltimore Longitudinal Study of Aging. By generating SomaScan data from a second batch of technical replicates in the 7K version, as well as additional intra- and interplate replicate measurements in the new 11K version using the same donor samples, this work provides useful precision benchmarks for the SomaScan user community. Beyond updating our previous technical assessment of the 7K assay with increased statistics, we estimate interbatch variability, assess inter- and intraplate variability in the new 11K assay, compare the observed variability between the 7K and 11K assays (leveraging the use of overlapping pairs of technical replicates), and explore the potential effects of sample storage time (ranging from 2 to 30 years) on the assays' precision.
ABSTRACT
We discuss a relatively new meta-scientific research design: many-analyst studies that attempt to assess the replicability and credibility of research based on large-scale observational data. In these studies, a large number of analysts try to answer the same research question using the same data. The key idea is the greater the variation in results, the greater the uncertainty in answering the research question and, accordingly, the lower the credibility of any individual research finding. Compared to individual replications, the large crowd of analysts allows for a more systematic investigation of uncertainty and its sources. However, many-analyst studies are also resource-intensive, and there are some doubts about their potential to provide credible assessments. We identify three issues that any many-analyst study must address: 1) identifying the source of variation in the results; 2) providing an incentive structure similar to that of standard research; and 3) conducting a proper meta-analysis of the results. We argue that some recent many-analyst studies have failed to address these issues satisfactorily and have therefore provided an overly pessimistic assessment of the credibility of science. We also provide some concrete guidance on how future many-analyst studies could provide a more constructive assessment.
ABSTRACT
Although zebrafish (Danio rerio) neuroscience research is rapidly expanding, the fundamental question of how these fish should be maintained in research laboratories remains largely unstudied. This may explain the diverse practices and broad range of environmental parameters used in zebrafish facilities. Here, we provide examples of these parameters and practices, including housing density, tank size, and water chemistry. We discuss the principles of stochastic resonance versus homeostasis and provide hypothetical examples to explain why keeping zebrafish outside of their tolerated range of environmental parameters may increase phenotypical variance and reduce replicability. We call for systematic studies to establish the optimal maintenance conditions for zebrafish. Furthermore, we discuss why knowing more about the natural behavior and ecology of this species could be a guiding principle for these studies.
ABSTRACT
For many problems in clinical practice, multiple treatment alternatives are available. Given data from a randomized controlled trial or an observational study, an important challenge is to estimate an optimal decision rule that specifies, for each client, the most effective treatment alternative given his or her pattern of pretreatment characteristics. In the present paper we look for such a rule within the insightful family of classification trees. Unfortunately, there is a dearth of readily accessible software tools for optimal decision tree estimation in the case of more than two treatment alternatives. Moreover, this primary tree estimation problem is also cursed with two secondary problems: structural missingness in typical studies on treatment evaluation (because every individual is assigned to a single treatment alternative only), and a major issue of replicability. In this paper we propose solutions for both the primary and the secondary problems at stake. We evaluate the proposed solution in a simulation study, and illustrate it with an application to the search for an optimal tree-based treatment regime in a randomized controlled trial on K = 3 different types of aftercare for younger women with early-stage breast cancer. We conclude by arguing that the proposed solutions may have relevance for several other classification problems inside and outside the domain of optimal treatment assignment.
Subjects
Decision Trees , Humans , Female , Breast Neoplasms/therapy , Randomized Controlled Trials as Topic/methods , Computer Simulation , Algorithms
ABSTRACT
Complex systems pose significant challenges to traditional scientific and statistical methods due to their inherent unpredictability and resistance to simplification. Accurately detecting complex behavior, and the uncertainty that comes with it, is therefore essential. Building on previous studies, we introduce a new information-theoretic measure, termed "incoherence". By applying an adapted Jensen-Shannon divergence across an ensemble of outcomes, we quantify the aleatoric uncertainty of the system. We first compare this measure to established statistical tests using both continuous and discrete data, before demonstrating how incoherence can be applied to identify key characteristics of complex systems, including sensitivity to initial conditions, criticality, and response to perturbations.
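The ensemble idea can be illustrated with the generalized Jensen-Shannon divergence over discrete outcome distributions. Note this is a generic JSD sketch, not the adapted "incoherence" measure itself, and the ensembles are invented: runs that agree on the outcome distribution score near zero, runs that disagree score high.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def jensen_shannon(ensemble):
    """Generalized Jensen-Shannon divergence of an equally weighted ensemble
    of outcome distributions: H(mixture) minus the mean member entropy."""
    n = len(ensemble)
    k = len(ensemble[0])
    mixture = [sum(dist[i] for dist in ensemble) / n for i in range(k)]
    return entropy(mixture) - sum(entropy(d) for d in ensemble) / n

agreeing = [[0.90, 0.10], [0.88, 0.12], [0.91, 0.09]]
disagreeing = [[0.95, 0.05], [0.05, 0.95], [0.50, 0.50]]
print(jensen_shannon(agreeing) < jensen_shannon(disagreeing))  # True
```

The divergence is zero exactly when every member of the ensemble is identical, which matches the intuition that a fully reproducible system has no aleatoric spread across runs.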
ABSTRACT
Performance in tests of various cognitive abilities has often been compared, both within and between species. In intraspecific comparisons, habitat effects on cognition have been a popular topic, frequently with an underlying assumption that urban animals should perform better than their rural conspecifics. In this study, we tested problem-solving ability in great tits Parus major in a string-pulling and a plug-opening test. Our aim was to compare performance between urban and rural great tits, and to compare their performance with previously published problem-solving studies. Our great tits performed better in string-pulling than their conspecifics in previous studies (solving success: 54%), and better than their close relative, the mountain chickadee Poecile gambeli, in the plug-opening test (solving success: 70%). Solving latency became shorter over four repeated sessions, indicating learning ability, and showed an among-individual correlation between the two tests. However, solving ability did not differ between habitat types in either test. Somewhat unexpectedly, we found marked differences between study years even though we tried to keep conditions identical. These were probably due to small changes in the experimental protocol between years, for example the unavoidable changes of observers and changes in the size and material of the test devices. This has an important implication: if small changes in an otherwise identical set-up can have strong effects, meaningful comparisons of cognitive performance between different labs must be extremely hard. In a wider perspective, this highlights the replicability problem often present in animal behaviour studies.
Subjects
Problem Solving , Animals , Male , Female , Ecosystem , Passeriformes/physiology
ABSTRACT
BACKGROUND AND OBJECTIVE: The minimum sample size for multistakeholder Delphi surveys remains understudied. Drawing from three large international multistakeholder Delphi surveys, this study aimed to: 1) investigate the effect of increasing sample size on the replicability of results; 2) assess whether the level of replicability differed with participant characteristics such as gender, age, and profession. METHODS: We used data from Delphi surveys conducted to develop guidance for improved reporting of health-care intervention trials: the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) and CONSORT (Consolidated Standards of Reporting Trials) extension for surrogate end points (n = 175, 22 items rated); CONSORT-SPI, the CONSORT extension for Social and Psychological Interventions (n = 333, 77 items rated); and a core outcome set for burn care (n = 553, 88 items rated). Resampling with replacement was used to draw random subsamples from the participant data set in each of the three surveys. For each subsample, the median value of every rated survey item was calculated and compared to the median from the full participant data set. The median number (and interquartile range) of medians replicated was used to calculate the percentage replicability (and variability). High replicability was defined as ≥80% and moderate as ≥60% and <80%. RESULTS: The average median replicability (variability) as a percentage of the total number of items rated across the three datasets was 81% (10%) at a sample size of 60. In one of the datasets (CONSORT-SPI), ≥80% replicability was reached at a sample size of 80. On average, increasing the sample size from 80 to 160 increased the replicability of results by a further 3% and reduced variability by 1%.
For subgroup analyses based on participant characteristics (e.g., gender, age, professional role), resampled subsamples of 20 to 100 showed that a sample size of 20 to 30 resulted in moderate replicability levels of 64% to 77%. CONCLUSION: We found that a minimum sample size of 60-80 participants in multistakeholder Delphi surveys provides a high level of replicability (≥80%) in the results. For Delphi studies limited to individual stakeholder groups (such as researchers, clinicians, or patients), a sample size of 20 to 30 per group may be sufficient.
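The resampling-with-replacement procedure above can be sketched in a few lines. The rating scale, item counts, and the "subsample median equals full-sample median" criterion below are invented stand-ins for the study's data, intended only to show the mechanics.

```python
import random
import statistics

def replicability(ratings, sample_size, n_resamples=1000, seed=1):
    """Median share of items whose subsample median equals the full-sample
    median, over `n_resamples` subsamples drawn with replacement."""
    rng = random.Random(seed)
    full_medians = {item: statistics.median(r) for item, r in ratings.items()}
    shares = []
    for _ in range(n_resamples):
        hits = sum(
            statistics.median(rng.choices(scores, k=sample_size))
            == full_medians[item]
            for item, scores in ratings.items()
        )
        shares.append(hits / len(ratings))
    return statistics.median(shares)

# Invented survey: 5 items rated on a 1-9 scale by 200 simulated participants
rng = random.Random(0)
ratings = {f"item{i}": [rng.randint(1, 9) for _ in range(200)]
           for i in range(5)}
print(replicability(ratings, sample_size=60))
```

Plotting this share against the subsample size reproduces the kind of curve from which the 60-80 participant recommendation above is read off.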
Subjects
Delphi Technique , Humans , Sample Size , Reproducibility of Results , Female , Male , Surveys and Questionnaires/standards , Middle Aged , Adult , Research Design/standards
ABSTRACT
Although reproducibility is central to the scientific method, its understanding within the research community remains insufficient. We aimed to explore the perceptions of research reproducibility among stakeholders within academia, learn about possible barriers and facilitators to reproducibility-related practices, and gather their suggestions for the Croatian Reproducibility Network website. We conducted four focus groups with researchers, teachers, editors, research managers, and policymakers from Croatia (n = 23). The participants observed a lack of consensus on the core definitions of reproducibility, both generally and between disciplines. They noted that incentivization and recognition of reproducibility-related practices by publishers and institutions, alongside comprehensive education adapted to the researchers' career stage, could help with implementing reproducibility. Education was considered essential to these efforts, as it could help create a research culture based on good reproducibility-related practices and behavior rather than one driven by mandates or career advancement. This was seen as particularly relevant for growing reproducibility efforts globally. Regarding the Croatian Reproducibility Network website, the participants suggested we adapt the content to users from different disciplines or career stages and offer guidance and tools for reproducibility through which we should present core reproducibility concepts. Our findings could inform other initiatives focused on improving research reproducibility.
ABSTRACT
BACKGROUND: Single-cell transcriptome sequencing (scRNA-Seq) has allowed new types of investigations at unprecedented levels of resolution. Among the primary goals of scRNA-Seq is the classification of cells into distinct types. Many approaches build on existing clustering literature to develop tools specific to single-cell data. However, almost all of these methods rely on heuristics or user-supplied parameters to control the number of clusters. This affects both the resolution of the clusters within the original dataset and their replicability across datasets. While many recommendations exist, in general there is little assurance that any given set of parameters will represent an optimal choice in the trade-off between cluster resolution and replicability. For instance, another set of parameters may result in more clusters that are also more replicable. RESULTS: Here, we propose Dune, a new method for optimizing the trade-off between the resolution of clusters and their replicability. Our method takes as input a set of clustering results (or partitions) on a single dataset and iteratively merges clusters within each partition in order to maximize their concordance between partitions. As demonstrated on multiple datasets from different platforms, Dune outperforms existing techniques that rely on hierarchical merging for reducing the number of clusters, in terms of both replicability of the resultant merged clusters and concordance with ground truth. Dune is available as an R package on Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/Dune.html . CONCLUSIONS: Cluster refinement by Dune helps improve the robustness of any clustering analysis and reduces reliance on tuning parameters. This method provides an objective approach for borrowing information across multiple clusterings to generate replicable clusters most likely to represent common biological features across multiple datasets.
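The merge-to-maximize-concordance idea can be illustrated with a toy greedy step. This is not the Dune algorithm (an R package that iteratively maximizes mean pairwise concordance across all input partitions); it is a single merge against one reference partition, with the adjusted Rand index standing in as the concordance score.

```python
from collections import Counter
from itertools import combinations
from math import comb

def adjusted_rand_index(a, b):
    """ARI between two cluster labelings of the same cells."""
    n = len(a)
    contingency = Counter(zip(a, b))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

def best_merge(partition, reference):
    """Greedily merge the one pair of cluster labels in `partition` that
    most improves concordance (ARI) with `reference`."""
    best_score = adjusted_rand_index(partition, reference)
    best_partition = partition
    for x, y in combinations(sorted(set(partition)), 2):
        merged = [x if lab == y else lab for lab in partition]
        score = adjusted_rand_index(merged, reference)
        if score > best_score:
            best_score, best_partition = score, merged
    return best_partition

reference = [0, 0, 0, 1, 1, 1]
partition = [0, 0, 2, 1, 1, 1]  # cluster 2 over-splits cluster 0
print(best_merge(partition, reference))  # [0, 0, 0, 1, 1, 1]
```

Repeating such merge steps until no merge improves concordance yields coarser but more replicable clusters, which is the trade-off the abstract describes.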
Subjects
RNA-Seq , Single-Cell Analysis , Software , Single-Cell Analysis/methods , RNA-Seq/methods , Cluster Analysis , Algorithms , Sequence Analysis, RNA/methods , Humans , Transcriptome/genetics , Reproducibility of Results , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis
ABSTRACT
Grouping/read-across is widely used for predicting the toxicity of data-poor target substances using data-rich source substances. While the chemical industry and regulators recognise its benefits, registration dossiers are often rejected due to weak analogue/category justifications based largely on the structural similarity of source and target substances. Here we demonstrate how multi-omics measurements can improve confidence in grouping via a statistical assessment of the similarity of molecular effects. Six azo dyes provided a pool of potential source substances to predict long-term toxicity to aquatic invertebrates (Daphnia magna) for the dye Disperse Yellow 3 (DY3) as the target substance. First, we assessed the structural similarities of the dyes, generating a grouping hypothesis with DY3 and two Sudan dyes within one group. Daphnia magna were exposed acutely to equi-effective doses of all seven dyes (each at three doses and three time points), and transcriptomics and metabolomics data were generated from 760 samples. Multi-omics bioactivity-profile-based grouping uniquely revealed that Sudan 1 (S1) is the most suitable analogue for read-across to DY3. Mapping ToxPrint structural fingerprints of the dyes onto the bioactivity-profile-based grouping indicated that an aromatic alcohol moiety could be responsible for this bioactivity similarity. The long-term reproductive toxicity of DY3 to aquatic invertebrates was predicted from S1 (21-day NOEC, 40 µg/L). This prediction was confirmed experimentally by measuring the toxicity of DY3 in D. magna. While limitations of this 'omics approach are identified, the study illustrates an effective statistical approach for building chemical groups.
Subjects
Azo Compounds , Coloring Agents , Daphnia , Water Pollutants, Chemical , Daphnia/drug effects , Animals , Azo Compounds/toxicity , Azo Compounds/chemistry , Coloring Agents/toxicity , Water Pollutants, Chemical/toxicity , Metabolomics , Toxicity Tests/methods , Transcriptome/drug effects , Daphnia magna , Multiomics
ABSTRACT
Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.
Subjects
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Colitis, Ulcerative/genetics , Reproducibility of Results , Phenotype , Genotype
ABSTRACT
PURPOSE: Our objective is to describe how the U.S. Food and Drug Administration (FDA)'s Sentinel System implements best practices to ensure trust in drug safety studies using real-world data from disparate sources. METHODS: We present a stepwise schematic for Sentinel's data harmonization, data quality check, query design and implementation, and reporting practices, and describe approaches to enhancing the transparency, reproducibility, and replicability of studies at each step. CONCLUSIONS: Each Sentinel data partner converts its source data into the Sentinel Common Data Model. The transformed data undergoes rigorous quality checks before it can be used for Sentinel queries. The Sentinel Common Data Model framework, data transformation codes for several data sources, and data quality assurance packages are publicly available. Designed to run against the Sentinel Common Data Model, Sentinel's querying system comprises a suite of pre-tested, parametrizable computer programs that allow users to perform sophisticated descriptive and inferential analysis without having to exchange individual-level data across sites. Detailed documentation of capabilities of the programs as well as the codes and information required to execute them are publicly available on the Sentinel website. Sentinel also provides public trainings and online resources to facilitate use of its data model and querying system. Its study specifications conform to established reporting frameworks aimed at facilitating reproducibility and replicability of real-world data studies. Reports from Sentinel queries and associated design and analytic specifications are available for download on the Sentinel website. Sentinel is an example of how real-world data can be used to generate regulatory-grade evidence at scale using a transparent, reproducible, and replicable process.
Subjects
Pharmacoepidemiology , United States Food and Drug Administration , Pharmacoepidemiology/methods , Reproducibility of Results , United States Food and Drug Administration/standards , Humans , United States , Data Accuracy , Adverse Drug Reaction Reporting Systems/statistics & numerical data , Adverse Drug Reaction Reporting Systems/standards , Drug-Related Side Effects and Adverse Reactions/epidemiology , Databases, Factual/standards , Research Design/standards
ABSTRACT
In the last decade, scientists investigating human social cognition have started bringing traditional laboratory paradigms more "into the wild" to examine how socio-cognitive mechanisms of the human brain work in real-life settings. As this implies transferring 2D observational paradigms to 3D interactive environments, there is a risk of compromising experimental control. In this context, we propose a methodological approach which uses humanoid robots as proxies of social interaction partners and embeds them in experimental protocols that adapt classical paradigms of cognitive psychology to interactive scenarios. This allows for a relatively high degree of "naturalness" of interaction and excellent experimental control at the same time. Here, we present two case studies where our methods and tools were applied and replicated across two different laboratories, namely the Italian Institute of Technology in Genova (Italy) and the Agency for Science, Technology and Research in Singapore. In the first case study, we present a replication of an interactive version of a gaze-cueing paradigm reported in Kompatsiari et al. (J Exp Psychol Gen 151(1):121-136, 2022). The second case study presents a replication of a "shared experience" paradigm reported in Marchesi et al. (Technol Mind Behav 3(3):11, 2022). As both studies replicate results across labs and different cultures, we argue that our methods allow for reliable and replicable setups, even though the protocols are complex and involve social interaction. We conclude that our approach can be of benefit to the research field of social cognition and grant higher replicability, for example, in cross-cultural comparisons of social cognition mechanisms.
Subjects
Social Cognition , Social Interaction , Humans , Robotics/methods , Male , Italy , Cues , Female , Adult , Cognition/physiology , Interpersonal Relations
ABSTRACT
The functional connectome changes with aging. We systematically evaluated aging-related alterations in the functional connectome using a whole-brain connectome network analysis in 39,675 participants in the UK Biobank project. We used adaptive dense network discovery tools to identify networks directly associated with aging from resting-state fMRI data. We replicated our findings in 499 participants from the Lifespan Human Connectome Project in Aging study. The results consistently revealed two motor-related subnetworks (both permutation test p-values < 0.001) that showed a decline in resting-state functional connectivity (rsFC) with increasing age. The first network primarily comprises sensorimotor and dorsal/ventral attention regions from the precentral gyrus, postcentral gyrus, superior temporal gyrus, and insular gyrus, while the second network is exclusively composed of basal ganglia regions, namely the caudate, putamen, and globus pallidus. Path analysis indicates that white matter fractional anisotropy mediates 19.6% (p < 0.001, 95% CI [7.6%, 36.0%]) and 11.5% (p < 0.001, 95% CI [6.3%, 17.0%]) of the age-related decrease in the two networks, respectively. The total volume of white matter hyperintensity mediates 32.1% (p < 0.001, 95% CI [16.8%, 53.0%]) of the aging-related effect on rsFC in the first subnetwork.
ABSTRACT
The two-trials rule for drug approval requires "at least two adequate and well-controlled studies, each convincing on its own, to establish effectiveness." This is usually implemented by requiring two significant pivotal trials and is the standard regulatory requirement for providing evidence of a new drug's efficacy. However, there is a need to develop suitable alternatives to this rule for a number of reasons, including the possible availability of data from more than two trials. I consider the case of up to three studies and stress the importance of controlling the partial Type-I error rate, where only some studies have a true null effect, while maintaining the overall Type-I error rate of the two-trials rule, where all studies have a null effect. Some less well-known P-value combination methods are useful to achieve this: Pearson's method, Edgington's method, and the recently proposed harmonic mean χ²-test. I study their properties and discuss how they can be extended to a sequential assessment of success while still ensuring overall Type-I error control. I compare the different methods in terms of partial Type-I error rate, project power, and the expected number of studies required. Edgington's method is eventually recommended as it is easy to implement and communicate, has only moderate partial Type-I error rate inflation, and substantially increases project power.
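Edgington's method combines studies via the sum of their p-values; under the global null this sum follows the Irwin-Hall distribution, which gives a closed-form combined p-value. A minimal sketch (the example p-values are invented, and the decision thresholds studied in the paper are not shown):

```python
from math import comb, factorial

def edgington_p(p_values):
    """Edgington's combined p-value: the probability that the sum of n
    independent U(0,1) variables falls below the observed sum of p-values
    (the Irwin-Hall CDF evaluated at that sum)."""
    s = sum(p_values)
    n = len(p_values)
    return sum((-1) ** k * comb(n, k) * (s - k) ** n
               for k in range(int(s) + 1)) / factorial(n)

# Invented example: three trials, two clearly significant, one borderline
print(round(edgington_p([0.01, 0.04, 0.12]), 5))  # 0.00082
```

Because only the sum matters, a borderline third trial can still yield strong combined evidence, which is the property that lets such methods outperform the all-or-nothing two-trials rule in project power.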
Subjects
Drug Approval , Humans , Clinical Trials as Topic/economics , Models, Statistical , Research Design
ABSTRACT
When analyzing data, researchers make choices that are either arbitrary, based on subjective beliefs about the data-generating process, or for which equally justifiable alternatives could have been made. This wide range of data-analytic choices can be abused and has been one of the underlying causes of the replication crisis in several fields. The recent introduction of multiverse analysis provides researchers with a method to evaluate the stability of results across the reasonable choices that could be made when analyzing data. Multiverse analysis is, however, confined to a descriptive role, lacking a proper and comprehensive inferential procedure. Specification curve analysis adds an inferential procedure to multiverse analysis, but this approach is limited to simple cases related to the linear model, and only allows researchers to infer whether at least one specification rejects the null hypothesis, not which specifications should be selected. In this paper we present a Post-selection Inference approach to Multiverse Analysis (PIMA), a flexible and general inferential approach that accounts for all possible models, i.e., the multiverse of reasonable analyses. The approach allows for a wide range of data specifications (i.e., preprocessing choices) and any generalized linear model; it tests the null hypothesis that a given predictor is not associated with the outcome by combining information from all reasonable models of the multiverse analysis, and it provides strong control of the family-wise error rate, allowing researchers to claim that the null hypothesis can be rejected for any specification that shows a significant effect. The inferential proposal is based on a conditional resampling procedure. We formally prove that the Type I error rate is controlled, and compute the statistical power of the test through a simulation study.
Finally, we apply the PIMA procedure to the analysis of a real dataset on the self-reported hesitancy for the COronaVIrus Disease 2019 (COVID-19) vaccine before and after the 2020 lockdown in Italy. We conclude with practical recommendations to be considered when implementing the proposed procedure.
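The flavor of such a procedure can be caricatured as a max-statistic (min-p) test over the multiverse, calibrated by sign-flip resampling. Everything below (the score statistic, the two toy specifications, and the data) is an invented sketch, not the authors' PIMA implementation, whose conditional resampling scheme is more involved.

```python
import random
import statistics

def multiverse_minp(y, x, specs, n_flips=2000, seed=7):
    """Max-statistic multiverse test via sign-flip resampling (sketch).

    Each spec is a preprocessing function applied to the outcome; the
    statistic is |sum_i x_i * (y_i - mean(y))|, and the max over specs is
    calibrated by randomly flipping the signs of x, which is exchangeable
    under the null of no association."""
    rng = random.Random(seed)

    def stat(xs, values):
        m = statistics.mean(values)
        return abs(sum(xi * (v - m) for xi, v in zip(xs, values)))

    observed = max(stat(x, spec(y)) for spec in specs)
    exceed = 0
    for _ in range(n_flips):
        flipped = [rng.choice((-1, 1)) * xi for xi in x]
        exceed += max(stat(flipped, spec(y)) for spec in specs) >= observed
    return (exceed + 1) / (n_flips + 1)

# Invented data: a strong predictor, analyzed raw and winsorized
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
specs = [lambda v: v, lambda v: [min(t, 2.0) for t in v]]
print(multiverse_minp(y, x, specs) < 0.01)  # True
```

Because the same resamples calibrate the maximum over all specifications, the rejection threshold automatically accounts for multiplicity, which is the mechanism behind the family-wise error control described above.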
Subjects
Psychometrics , Humans , Psychometrics/methods , Models, Statistical , Data Interpretation, Statistical , COVID-19/epidemiology , Linear Models , Computer Simulation
ABSTRACT
Identifying which variables influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs (He et al., 2022) and introduce variable selection methods based on penalized regression that achieve false discovery rate (FDR) control. We report empirical results from extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease, and demonstrate a significant improvement in power.