ABSTRACT
We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where the data are stored locally at each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms. Motivated by these considerations, we develop federated methods to draw inferences on the average treatment effects of the combined data across sites. Our methods first compute summary statistics locally using propensity scores and then aggregate these statistics across sites to obtain point and variance estimators of the average treatment effects. We show that these estimators are consistent and asymptotically normal. To achieve these asymptotic properties, we find that the aggregation schemes need to account for the heterogeneity in treatment assignments and in outcomes across sites. We demonstrate the validity of our federated methods through a comparative study of two large medical claims databases.
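The local-then-aggregate pattern described above can be sketched in a few lines. The sketch below is illustrative only, not the paper's estimator: each site computes an inverse-propensity-weighted effect estimate and an influence-function-based variance from its own data, and a coordinator combines the site summaries with simple sample-size weights (the paper's aggregation schemes additionally account for cross-site heterogeneity). All function names and the simulated data are assumptions.

```python
import numpy as np

def local_summary(y, t, ps):
    """Site-local step: inverse-propensity-weighted ATE and its
    variance, computed without sharing individual records."""
    w1, w0 = t / ps, (1 - t) / (1 - ps)
    ate = np.mean(w1 * y) - np.mean(w0 * y)
    infl = w1 * y - w0 * y - ate           # influence-function values
    return ate, np.var(infl) / len(y), len(y)

def aggregate(summaries):
    """Coordinator step: sample-size-weighted combination of the
    site-level point and variance estimates."""
    ates, variances, ns = map(np.array, zip(*summaries))
    w = ns / ns.sum()
    return float(w @ ates), float(w**2 @ variances)

# Two simulated sites with a true effect of 2 and known 50/50 assignment.
rng = np.random.default_rng(0)
sites = []
for n in (200, 300):
    t = rng.integers(0, 2, n)
    y = 2.0 * t + rng.normal(size=n)
    ps = np.full(n, 0.5)
    sites.append(local_summary(y, t, ps))
ate, var = aggregate(sites)
```

Only the triples `(ate, variance, n)` cross site boundaries, which is what makes the scheme compatible with the privacy constraint in the abstract.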
Subject(s)
Propensity Score, Humans, Causality, Factual Databases, Statistical Data Interpretation

ABSTRACT
BACKGROUND: Joint and individual variation explained (JIVE), distinct and common simultaneous component analysis (DISCO), and O2-PLS (a two-block (X-Y) latent variable regression method with an integral OSC filter) can all be used for the integrated analysis of multiple data sets, decomposing them into three terms: a low(er)-rank approximation capturing variation common across data sets, low(er)-rank approximations for structured variation distinctive to each data set, and residual noise. In this paper, the three methods are compared with respect to their mathematical properties and their respective ways of defining common and distinctive variation. RESULTS: The methods are applied to simulated data and to mRNA and miRNA data sets from glioblastoma multiforme (GBM) brain tumors to examine their overlap and differences. When the common variation is abundant, all methods are able to find the correct solution. With real data, however, complexities in the data are treated differently by the three methods. CONCLUSIONS: Each of the three methods has its own approach to estimating common and distinctive variation, with specific strengths and weaknesses. Because of their orthogonality properties and the algorithms they use, their views of the data differ slightly. By assuming orthogonality between the common and distinctive components, true natural or biological phenomena that may not be orthogonal at all might be misinterpreted.
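As a toy illustration of the three-term decomposition that all three methods target (common + distinctive + noise), the sketch below follows a simplified, JIVE-flavoured recipe with user-fixed ranks: estimate a joint low-rank term on the concatenated blocks, then a distinctive low-rank term on each block's residual. It is not any of the published algorithms verbatim; the function names, ranks, and simulated data are illustrative.

```python
import numpy as np

def low_rank(M, r):
    """Best rank-r approximation of M via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def joint_individual(blocks, r_joint, r_indiv):
    """One pass of a JIVE-style split: a shared low-rank term J
    estimated on the concatenated blocks (which share columns/samples),
    then a distinctive low-rank term on each block's residual."""
    X = np.vstack(blocks)
    J = low_rank(X, r_joint)
    Js = np.split(J, np.cumsum([b.shape[0] for b in blocks])[:-1])
    As = [low_rank(b - j, r) for b, j, r in zip(blocks, Js, r_indiv)]
    return Js, As

# Two blocks driven by one shared score vector c plus small noise.
rng = np.random.default_rng(1)
c = rng.normal(size=(1, 40))
X1 = rng.normal(size=(5, 1)) @ c + 0.1 * rng.normal(size=(5, 40))
X2 = rng.normal(size=(8, 1)) @ c + 0.1 * rng.normal(size=(8, 40))
Js, As = joint_individual([X1, X2], r_joint=1, r_indiv=[1, 1])
```

When the common variation dominates, as in this simulation, the joint term absorbs most of each block, mirroring the paper's finding that all methods agree in that regime.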
Subject(s)
Algorithms, Brain Neoplasms/genetics, Brain Neoplasms/metabolism, Brain Neoplasms/pathology, Glioblastoma/genetics, Glioblastoma/metabolism, Glioblastoma/pathology, Humans, MicroRNAs/metabolism, Principal Component Analysis, Messenger RNA/metabolism

ABSTRACT
The structure determination of an integral membrane protein using synchrotron X-ray diffraction data collected at room temperature directly in vapour-diffusion crystallization plates (in situ) is demonstrated. Exposing the crystals in situ eliminates manual sample handling and, since it is performed at room temperature, removes the complication of cryoprotection and potential structural anomalies induced by sample cryocooling. Essential to the method is the ability to limit radiation damage by recording a small amount of data per sample from many samples and subsequently assembling the resulting data sets using specialized software. The validity of this procedure is established by the structure determination of Haemophilus influenzae TehA at 2.3 Å resolution. The method presented offers an effective protocol for the fast and efficient determination of membrane-protein structures at room temperature using third-generation synchrotron beamlines.
Subject(s)
X-Ray Crystallography/methods, Membrane Proteins/chemistry, Bacterial Proteins/chemistry, Haemophilus influenzae/chemistry, Molecular Models, Protein Conformation, Temperature

ABSTRACT
Background: In recent years, many researchers have focused on the use of legacy data, such as pooled analyses that collect and reanalyze data from multiple studies. However, a methodology for integrating preexisting databases whose data were collected for different purposes has not been established. Previously, we developed a tool to efficiently generate Study Data Tabulation Model (SDTM) data from hypothetical clinical trial data using the Clinical Data Interchange Standards Consortium (CDISC) SDTM. Objective: This study aimed to design a practical model for integrating preexisting databases using the CDISC SDTM. Methods: Data integration was performed in three phases: (1) confirmation of the variables, (2) SDTM mapping, and (3) generation of the SDTM data. In phase 1, the definitions of the variables were confirmed in detail, and the data sets were converted to a vertical structure. In phase 2, the items derived from the SDTM format were set as mapping items. Three types of metadata (domain name, variable name, and test code), based on the CDISC SDTM, were embedded in the Research Electronic Data Capture (REDCap) field annotation. In phase 3, the data dictionary, including the SDTM metadata, was output in the Operational Data Model (ODM) format. Finally, the mapped SDTM data were generated using REDCap2SDTM version 2. Results: SDTM data were generated as a comma-separated values file for each of the 7 domains defined in the metadata. A total of 17 items were commonly mapped to all 3 databases. Because the SDTM data were set correctly in each database, we were able to integrate the 3 independently preexisting databases into 1 database in the CDISC SDTM format. Conclusions: Our project suggests that the CDISC SDTM is useful for integrating multiple preexisting databases.
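The phase-2/phase-3 idea of embedding SDTM metadata (domain name, variable name, test code) in field annotations and then emitting one table per domain can be sketched as follows. The annotation syntax (`SDTM=DM.AGE`, `VSTESTCD=SYSBP`), field names, and record values here are hypothetical stand-ins for illustration, not the actual REDCap2SDTM convention.

```python
from collections import defaultdict

# Hypothetical field annotations carrying the three metadata items the
# study embeds in REDCap: domain name, variable name, and test code.
annotations = {
    "age":   "SDTM=DM.AGE",
    "sex":   "SDTM=DM.SEX",
    "sysbp": "SDTM=VS.VSORRES,VSTESTCD=SYSBP",
}

records = [{"age": "54", "sex": "F", "sysbp": "128"}]

def to_sdtm(records, annotations):
    """Group mapped values into one table per SDTM domain; each
    domain's rows could then be written to its own CSV file."""
    domains = defaultdict(list)
    for rec in records:
        rows = defaultdict(dict)
        for field, value in rec.items():
            ann = annotations.get(field)
            if ann is None:
                continue                       # unmapped field: skip
            parts = dict(p.split("=") for p in ann.split(","))
            domain, var = parts.pop("SDTM").split(".")
            rows[domain][var] = value
            rows[domain].update(parts)         # extra metadata, e.g. VSTESTCD
        for domain, row in rows.items():
            domains[domain].append(row)
    return domains

sdtm = to_sdtm(records, annotations)
```

Because every source database carries the same embedded metadata, the per-domain tables from different databases share column names and can simply be concatenated, which is what makes the integration in the abstract possible.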
ABSTRACT
This study used multiple data sets and modelling choices in an environmental life cycle assessment (LCA) to compare typical disposable beverage cups made from polystyrene (PS), polylactic acid (PLA; bioplastic) and paper lined with bioplastic (biopaper). Incineration and recycling were considered as waste processing options for all cups, and for the PLA and biopaper cups also composting and anaerobic digestion. Multiple data sets and modelling choices were systematically used to calculate, for each disposable cup, the average result and the spread in results in eleven impact categories. The LCA results of all combinations of data sets and modelling choices consistently identify three processes that dominate the environmental impact: (1) production of the cup's basic material (PS, PLA, biopaper), (2) cup manufacturing, and (3) waste processing. However, the large spreads in results for the impact categories strongly overlap among the cups and therefore do not allow a preference for one type of cup material. Comparison of the individual waste treatment options suggests some cautious preferences. The average waste treatment results indicate that recycling is the preferred option for PLA cups, followed by anaerobic digestion and incineration. Recycling is slightly preferred over incineration for the biopaper cups. There is no preferred waste treatment option for the PS cups. Taking into account the spread in waste treatment results for all cups, however, none of these preferences for waste processing options can be justified. The only exception is composting, which is the least preferred option for both PLA and biopaper cups. Our study illustrates that using multiple data sets and modelling choices can lead to considerable spread in LCA results. This makes comparing products more complex, but makes the outcomes more robust.
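The core comparison step in the study (compute an average result and a spread per cup across all data-set and modelling-choice combinations, then treat overlapping spreads as precluding a preference) can be sketched as below for a single impact category. The impact scores are made-up numbers for illustration, not the study's data.

```python
import numpy as np

# Illustrative impact scores for one impact category; one value per
# combination of data set and modelling choice (fabricated numbers).
results = {
    "PS":       [4.1, 5.0, 3.6, 4.8],
    "PLA":      [3.2, 5.5, 2.9, 4.9],
    "biopaper": [3.8, 4.6, 3.3, 5.2],
}

def summarise(scores):
    """Average result and spread (min-max range) per cup."""
    return {cup: (float(np.mean(v)), (min(v), max(v)))
            for cup, v in scores.items()}

def ranges_overlap(scores):
    """True if all min-max ranges share a common region, i.e. the
    spread does not support a preference for any one cup."""
    los = [min(v) for v in scores.values()]
    his = [max(v) for v in scores.values()]
    return max(los) <= min(his)

summary = summarise(results)
overlap = ranges_overlap(results)
```

Running the same summary over each of the eleven impact categories, and again over the waste treatment variants, reproduces the kind of average-versus-spread comparison the abstract describes.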