ABSTRACT
OBJECTIVES: We report the results of glucose measurements performed during one year by the same measurement procedures (MPs) in 58 Norwegian hospital laboratories using control materials provided by external quality assessment (EQA) schemes from two different providers. The providers used materials with presumed vs. verified commutability and assigned target values using a reference material vs. a highest-order reference MP. METHODS: Data from six Labquality and three Noklus glucose EQA surveys were aggregated for each MP (Abbott Alinity, Abbott Architect, Roche Cobas, and Siemens Advia) in each scheme. For each EQA result, the percent difference from the target value (% bias) was calculated. The median % bias for each MP per scheme was then calculated. RESULTS: The median % biases observed for each MP in the Labquality scheme were significantly larger than those in the Noklus scheme, which uses verified commutable control materials and highest-order reference MP target values. The difference ranged from 1.2 (Roche Cobas, 2.9% vs. 1.7%) to 4.4 percentage points (Siemens Advia, 3.2% vs. -1.2%). The order of bias size for the various MPs differed between the two schemes. In contrast to the Labquality scheme, the median % biases observed in the Noklus scheme for Abbott Alinity (-0.1%), Abbott Architect (-0.5%), and Siemens Advia (-1.2%) were not significantly different from the target value (p>0.756). CONCLUSIONS: This study underlines the importance of using verified commutable EQA materials and target values traceable to reference MPs in EQA schemes designed for assessment of the metrological traceability of laboratory results.
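The bias calculation described in the METHODS is straightforward to reproduce. Below is a minimal Python sketch, using hypothetical records and target values (not the schemes' actual data), that computes the percent bias of each result and the median per measurement procedure and scheme:

```python
import statistics
from collections import defaultdict

def percent_bias(result, target):
    """Percent difference of an EQA result from its target value."""
    return 100.0 * (result - target) / target

# Hypothetical records: (scheme, measurement procedure, result, target value)
eqa_results = [
    ("Noklus", "Roche Cobas", 5.62, 5.50),
    ("Noklus", "Roche Cobas", 5.58, 5.50),
    ("Labquality", "Roche Cobas", 5.71, 5.50),
    ("Labquality", "Siemens Advia", 5.68, 5.50),
]

# Collect % bias per (scheme, MP) and report the median.
biases = defaultdict(list)
for scheme, mp, result, target in eqa_results:
    biases[(scheme, mp)].append(percent_bias(result, target))

for (scheme, mp), values in sorted(biases.items()):
    print(f"{scheme:11s} {mp:14s} median % bias = {statistics.median(values):+.1f}")
```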
Subject(s)
Hospital Laboratories, Laboratories, Humans, Quality Control, Glucose, Bias, Reference Values, Reference Standards
ABSTRACT
Single-cell transcriptomics has revolutionized our understanding of basic biology and disease. Since transcript levels often do not correlate with protein expression, it is crucial to complement transcriptomics approaches with proteome analyses at single-cell resolution. Despite continuous technological improvements in sensitivity, mass-spectrometry-based single-cell proteomics ultimately faces the challenge of reproducibly comparing the protein expression profiles of thousands of individual cells. Here, we combine two hitherto opposing analytical strategies, DIA and tandem mass tag (TMT) multiplexing, to generate highly reproducible, quantitative proteome signatures from ultralow-input samples. We developed a novel, identification-independent proteomics data-analysis pipeline that allows quantitative comparison of DIA-TMT proteome signatures across hundreds of samples independent of their biological origin to identify cell types and single protein knockouts. These proteome signatures overcome the need to impute quantitative data due to accumulating detrimental amounts of missing data in standard multibatch TMT experiments. We validate our approach using integrative data analysis of different human cell lines and standard database searches for knockouts of defined proteins. Our data establish a novel and reproducible approach to markedly expand the number of proteins detected from ultralow-input samples.
Subject(s)
Proteome, Tandem Mass Spectrometry, Cell Line, Humans, Post-Translational Protein Processing, Proteome/metabolism, Proteomics
ABSTRACT
Wireless sensor networks (WSNs) are structured for monitoring an area with distributed sensors and built-in batteries. However, most of their battery energy is consumed during the data transmission process. In recent years, several methodologies, such as routing optimization, topology control, and sleep scheduling algorithms, have been introduced to improve the energy efficiency of WSNs. This study introduces a novel deep learning method that utilizes variational autoencoders (VAEs) to improve the energy efficiency of WSNs by compressing transmission data. The VAE approach is customized in this work to compress WSN data while retaining its important features. This is achieved by analyzing the statistical structure of the sensor data rather than providing a fixed-size latent representation. The performance of the proposed model is verified using a MATLAB simulation platform, integrating a pre-trained variational autoencoder model with openly available wireless sensor data. The proposed model performs satisfactorily in comparison with traditional methods, such as compressed sensing, lightweight temporal compression, and the standard autoencoder, achieving an average compression rate of 1.5572. The WSN simulation also indicates that the VAE-incorporated architecture attains a maximum network lifetime of 1491 s and suggests that VAEs could be used for compression-based transmission in WSNs, as the reconstruction rate of 0.9902 is better than that of all the other techniques.
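A variational autoencoder of the kind described above can be sketched in a few lines of PyTorch; the layer sizes, latent dimension, and sensor window length below are purely illustrative and not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorVAE(nn.Module):
    """Toy VAE that compresses a window of sensor readings into a small latent code."""
    def __init__(self, window: int = 64, latent: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(window, 32), nn.ReLU())
        self.mu = nn.Linear(32, latent)       # latent mean
        self.logvar = nn.Linear(32, latent)   # latent log-variance
        self.decoder = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, window))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence to the standard normal prior.
    recon_err = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl

model = SensorVAE()
x = torch.randn(16, 64)                 # a batch of simulated sensor windows
recon, mu, logvar = model(x)
print(vae_loss(recon, x, mu, logvar).item())
```

In a WSN setting, only the latent code (here 8 values instead of 64) would be transmitted, and the base station would reconstruct the window with the decoder.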
ABSTRACT
Due to the uniqueness of the underwater environment, traditional data aggregation schemes face many challenges. Most existing data aggregation solutions do not fully consider node trustworthiness, which may result in the inclusion of falsified data sent by malicious nodes during the aggregation process, thereby affecting the accuracy of the aggregated results. Additionally, because of the dynamically changing nature of the underwater environment, current solutions often lack sufficient flexibility to handle situations such as node movement and network topology changes, significantly impacting the stability and reliability of data transmission. To address the aforementioned issues, this paper proposes a secure data aggregation algorithm based on a trust mechanism. By dynamically adjusting the number and size of node slices based on node trust values and transmission distances, the proposed algorithm effectively reduces network communication overhead and improves the accuracy of data aggregation. Due to the variability in the number of node slices, even if attackers intercept some slices, it is difficult for them to reconstruct the complete data, thereby ensuring data security.
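The slicing idea can be illustrated with a simple rule that maps trust value and transmission distance to a slice count; the thresholds and weights below are invented for illustration and are not the paper's actual parameters.

```python
import random

def slice_count(trust: float, distance_m: float) -> int:
    """More slices for low-trust or long-distance links (illustrative thresholds)."""
    n = 2
    if trust < 0.5:
        n += 2
    if distance_m > 100.0:
        n += 1
    return n

def slice_reading(value: float, n: int) -> list[float]:
    """Split a reading into n random additive slices that sum back to the value."""
    parts = [random.uniform(-value, value) for _ in range(n - 1)]
    parts.append(value - sum(parts))
    return parts

reading = 23.7
parts = slice_reading(reading, slice_count(trust=0.4, distance_m=150.0))
print(parts, "recombined:", round(sum(parts), 6))
```

Because the slice count varies per node, an attacker who intercepts only a subset of slices cannot recover the original reading.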
ABSTRACT
Existing secure data aggregation protocols are weak at eliminating data redundancy and protecting wireless sensor networks (WSNs); most existing approaches address only a single one of these issues when aggregating data. However, there is a need for a multi-featured protocol to handle the multiple problems of data aggregation, such as energy efficiency, authentication, authorization, and maintaining the security of the network. Given this demand for a multi-featured data aggregation protocol, we propose the secure data aggregation using authentication and authorization (SDAAA) protocol to detect malicious attacks, particularly cyberattacks such as Sybil and sinkhole attacks, and to extend network performance. These attacks are difficult to address through existing cryptographic protocols. The proposed SDAAA protocol comprises a node authorization algorithm that permits only legitimate nodes to communicate within the network. The SDAAA protocol's methods help improve quality of service (QoS) parameters. Furthermore, we introduce a mathematical model to improve accuracy, energy efficiency, data freshness, authorization, and authentication. Finally, our protocol is tested in an intelligent healthcare WSN patient-monitoring application scenario and verified using the OMNET++ simulator. Based on the results, we confirm that our proposed SDAAA protocol attains a throughput of 444 kb/s, representing 98% of the data/network channel capacity rate; an energy consumption of 2.6 joules, representing 99% network energy efficiency; a network effectiveness of 2.45, representing 99.5% overall network performance; and a time complexity of 0.08 s, representing 98.5% efficiency of the proposed SDAAA approach. By contrast, contending protocols such as SD, EEHA, HAS, IIF, and RHC have throughputs of 415-443 kb/s, representing 85-90% of the data rate/channel capacity of the network; energy consumption in the range of 3.0-3.6 joules, representing 88-95% network energy efficiency; a network effectiveness of 2.98, representing 72-89% overall network performance; and time complexities of around 0.20 s, representing 72-89% efficiency of those approaches. Therefore, our proposed SDAAA protocol outperforms other known approaches, such as SD, EEHA, HAS, IIF, and RHC, designed for secure data aggregation in a similar environment.
ABSTRACT
Multinomial processing tree (MPT) models are prominent and frequently used tools to model and measure cognitive processes underlying responses in many experimental paradigms. Although MPT models typically refer to cognitive processes within single individuals, they have often been applied to group data aggregated across individuals. We investigate the conditions under which MPT analyses of aggregate data make sense. After introducing the notions of structural and empirical aggregation invariance of MPT models, we show that any MPT model that holds at the level of single individuals must also hold at the aggregate level when it is both structurally and empirically aggregation invariant. Moreover, group-level parameters of aggregation-invariant MPT models are equivalent to the expected values (i.e., means) of the corresponding individual parameters. To investigate the robustness of MPT results for aggregate data when one or both invariance conditions are violated, we additionally performed a series of simulation studies, systematically manipulating (1) the sample sizes in different trees of the model, (2) model parameterization, (3) means and variances of crucial model parameters, and (4) their correlations with other parameters of the respective MPT model. Overall, our results show that MPT parameter estimates based on aggregate data are trustworthy under rather general conditions, provided that a few preconditions are met.
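The equivalence between aggregate-level parameters and the means of the corresponding individual parameters can be checked with a toy one-high-threshold model (detect with probability r, otherwise guess correctly with probability g). This is a simplified illustration, not the authors' simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_trials, g = 2000, 200, 0.5

# Individual detection probabilities r_i vary across subjects.
r = rng.beta(4, 6, size=n_subj)                      # mean about 0.4
p_correct = r + (1 - r) * g                          # one-high-threshold model
hits = rng.binomial(n_trials, p_correct)             # each subject's correct responses

# Fit the same model to the aggregated data (pooled proportion correct).
p_agg = hits.sum() / (n_subj * n_trials)
r_agg = (p_agg - g) / (1 - g)

print(f"mean individual r = {r.mean():.3f}, aggregate-level r = {r_agg:.3f}")
```

Here the aggregate-level estimate recovers the mean of the individual detection probabilities, as expected for an aggregation-invariant model.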
Subject(s)
Statistical Models, Humans, Cognition/physiology, Computer Simulation, Psychological Models, Statistical Data Interpretation
ABSTRACT
Although xylem embolism is a key process during drought-induced tree mortality, its relationship to wood anatomy remains debated. While the functional link between bordered pits and embolism resistance is known, there is no direct, mechanistic explanation for the traditional assumption that wider vessels are more vulnerable than narrow ones. We used data from 20 temperate broad-leaved tree species to study the inter- and intraspecific relationship of water potential at 50% loss of conductivity (P50) with hydraulically weighted vessel diameter (Dh) and tested its link to pit membrane thickness (TPM) and specific conductivity (Ks) at the species level. Embolism-resistant species had thick pit membranes and narrow vessels. While Dh was weakly associated with TPM, the P50-Dh relationship remained highly significant after accounting for TPM. The interspecific pattern between P50 and Dh was mirrored by a link between P50 and Ks, but there was no evidence for an intraspecific relationship. Our results provide robust evidence for an interspecific P50-Dh relationship across our species. As a potential cause of the inconsistencies in published P50-Dh relationships, our analysis suggests differences in the range of trait values covered and in the level of data aggregation (species, tree or sample level) studied.
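The species-level test of whether Dh explains variance in P50 beyond TPM amounts to a multiple regression; a minimal sketch with simulated trait values (arbitrary units, not the study's data) is shown below.

```python
import numpy as np

rng = np.random.default_rng(1)
n_species = 20

# Simulated species-level traits (illustrative values only).
TPM = rng.normal(300, 60, n_species)                 # pit membrane thickness (nm)
Dh = rng.normal(60, 15, n_species)                   # hydraulic vessel diameter (um)
P50 = -0.01 * TPM - 0.03 * Dh + rng.normal(0, 0.3, n_species)

# P50 ~ Dh + TPM: does Dh explain variance after accounting for TPM?
X = np.column_stack([np.ones(n_species), Dh, TPM])
coef, *_ = np.linalg.lstsq(X, P50, rcond=None)
print("intercept, slope(Dh), slope(TPM):", np.round(coef, 4))
```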
Subject(s)
Embolism, Xylem, Xylem/anatomy & histology, Wood/anatomy & histology, Droughts, Water, Trees
ABSTRACT
The continual release of image databases with fully or partially identical inner categories dramatically hampers the development of autonomous Computer-Aided Diagnostics (CAD) systems for truly comprehensive medical diagnostics. The first challenge is the frequent massive bulk release of medical image databases, which often suffer from two common drawbacks: image duplication and corruption. The many subsequent releases of the same data with the same classes or categories come with no clear evidence that those identical classes have been successfully concatenated across image databases. This issue stands as a stumbling block in the path of hypothesis-based experiments aiming at a single learning model that can classify all of them correctly. Removing redundant data, enhancing performance, and optimizing energy resources are among the most challenging aspects. In this article, we propose a global data aggregation scale model that incorporates six image databases selected from specific global resources. The proposed learner is trained on all the unique patterns within any given data release, thereby effectively creating a single unique dataset. The MD5 hash algorithm generates a unique hash value for each image, making it suitable for duplicate removal. T-distributed stochastic neighbor embedding (t-SNE), with a tunable perplexity parameter, is used to represent the data in low dimensions. Both the MD5 and t-SNE algorithms are applied recursively, producing a balanced and uniform database containing equal samples per category: normal, pneumonia, and Coronavirus Disease 2019 (COVID-19). We evaluated the performance of all proposed datasets and the new automated version using the Inception V3 pre-trained model with various evaluation metrics. The proposed scale model showed more respectable results than traditional data aggregation, achieving a high accuracy of 98.48%, along with high precision, recall, and F1-score. The results were confirmed by statistical t-tests, with all t-values significant and the corresponding p-values supporting rejection of the null hypothesis. Furthermore, the final aggregated dataset outperformed all other datasets across all metrics when diagnosing various lung infections under the same conditions.
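Duplicate removal with MD5 hashes, as used in the aggregation pipeline above, can be sketched as follows; the directory and file pattern are hypothetical.

```python
import hashlib
from pathlib import Path

def md5_of_file(path: Path) -> str:
    """Hash the raw bytes of an image file; identical files yield identical digests."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def deduplicate(image_dir: str) -> list[Path]:
    seen, unique = set(), []
    for path in sorted(Path(image_dir).glob("*.png")):
        digest = md5_of_file(path)
        if digest not in seen:       # keep only the first copy of each identical image
            seen.add(digest)
            unique.append(path)
    return unique

# unique_images = deduplicate("chest_xrays/")   # hypothetical directory
```

Note that hashing raw bytes only catches exact duplicates; re-encoded or resized copies would require perceptual hashing instead.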
Subject(s)
COVID-19, Pneumonia, Humans, COVID-19/diagnostic imaging, X-Rays, Pneumonia/diagnostic imaging, Algorithms, Lung/diagnostic imaging
ABSTRACT
By design, conventional aggregation leaves transmitted data visible in clear text at the aggregating units or nodes. Transmitting data without encryption exposes it to security issues concerning confidentiality, integrity and authentication, as well as to attacks by adversaries. On the other hand, encryption at each hop requires extra computation for decrypting, aggregating, and then re-encrypting the data, which increases complexity, not only in terms of computation but also because of the required sharing of keys; sharing the same key across various nodes makes the security more vulnerable. An alternative solution for securing the aggregation process is an end-to-end security protocol, wherein intermediary nodes combine the data without decoding the acquired data. As a consequence, the intermediary aggregating nodes do not have to maintain confidential key values, enabling end-to-end security between sensor devices and base stations. This research presents End-to-End Homomorphic Encryption (EEHE)-based safe and secure data gathering in IoT-based wireless sensor networks (WSNs), which protects end-to-end security and enables the use of aggregator functions such as COUNT, SUM and AVERAGE on encrypted messages. Such an approach can also employ message authentication codes (MACs) to validate data integrity throughout data aggregation and transmission, allowing fraudulent content to be identified as early as feasible. Additionally, when data are communicated across a WSN, there is a higher likelihood of a wormhole attack within the data aggregation process; the proposed solution also ensures the early detection of wormhole attacks during data aggregation.
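End-to-end additively homomorphic aggregation can be illustrated with a toy Paillier cryptosystem: intermediate nodes multiply ciphertexts, which adds the underlying plaintexts, without ever decrypting. The tiny key below is for demonstration only and is not secure, and this is a generic Paillier sketch rather than the specific EEHE construction.

```python
import math, random

# Toy Paillier key (insecure demo primes).
p, q = 1789, 1787
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                      # valid because we use g = n + 1

def encrypt(m: int) -> int:
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:            # r must be coprime with n
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

readings = [21, 35, 17]                   # plaintext sensor values
ciphertexts = [encrypt(m) for m in readings]

# The aggregator multiplies ciphertexts without decrypting: homomorphic SUM.
c_sum = math.prod(ciphertexts) % n2
print("SUM =", decrypt(c_sum), "AVERAGE =", decrypt(c_sum) / len(readings))
```

A COUNT can be obtained the same way by having each node additionally encrypt the constant 1 and multiplying those ciphertexts.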
Subject(s)
Computer Security, Data Aggregation, Computer Communication Networks, Algorithms, Confidentiality
ABSTRACT
The Internet of Things (IoT) is an advanced technology comprising numerous devices carrying sensors to collect, send, and receive data. Due to its vast popularity and efficiency, it is employed in collecting crucial data for the health sector. As the sensors generate huge amounts of data, it is preferable to aggregate the data before transmitting it further. These sensors frequently generate redundant data and transmit the same values again and again when there is no variation in the readings. The baseline scheme has no mechanism for recognizing duplicate data. This problem has a negative effect on the performance of heterogeneous networks: it increases energy consumption, requires high control overhead, and demands additional transmission slots to send the data. To address these challenges posed by duplicate data in the IoT-based health sector, this paper presents a fuzzy data aggregation system (FDAS) that aggregates data proficiently and reduces the volume of normal-range data to increase network performance and decrease energy consumption. The appropriate parent node is selected by fuzzy logic, considering input parameters that are crucial from the parent-node-selection perspective, while redundant values are replaced by the Boolean digit 0 and stored in a repository for future use. This increases the network lifespan by reducing the energy consumption of sensors in heterogeneous environments, so that the efficiency of FDAS remains stable even as the complexity of the environment surges. The performance of the proposed scheme has been validated using a network simulator and compared with baseline schemes. According to the findings, the proposed technique (FDAS) dominates in terms of reducing energy consumption in both phases, achieves better aggregation, reduces control overhead, and requires the fewest transmission slots.
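The two core ideas, parent selection from several input parameters and suppression of redundant readings, can be sketched as below. The weighted score is only a crude stand-in for a fuzzy inference system, and the parameters, scales, and tolerance are invented for illustration.

```python
def parent_score(residual_energy: float, distance: float, link_quality: float) -> float:
    """Crude stand-in for fuzzy inference: normalize inputs to [0, 1] and weight them."""
    energy_term = min(residual_energy / 100.0, 1.0)     # joules, illustrative scale
    distance_term = 1.0 - min(distance / 50.0, 1.0)     # metres, closer is better
    return 0.5 * energy_term + 0.3 * distance_term + 0.2 * link_quality

def choose_parent(candidates: list[dict]) -> dict:
    return max(candidates, key=lambda c: parent_score(c["energy"], c["distance"], c["lq"]))

def encode_reading(value: float, last_sent: float | None, tolerance: float = 0.1):
    """Send 0 as a 'repeat' marker when the reading is effectively unchanged."""
    if last_sent is not None and abs(value - last_sent) <= tolerance:
        return 0, last_sent                 # redundant: transmit only the marker
    return value, value                     # new value: transmit and remember it

candidates = [{"energy": 80, "distance": 12, "lq": 0.9},
              {"energy": 95, "distance": 30, "lq": 0.7}]
print(choose_parent(candidates))
print(encode_reading(36.6, last_sent=36.65))
```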
ABSTRACT
INTRODUCTION: Recent developments in the postoperative evaluation of deep brain stimulation surgery on the group level warrant the detection of achieved electrode positions based on postoperative imaging. Computed tomography (CT) is a frequently used imaging modality, but because of its idiosyncrasies (high spatial accuracy at low soft tissue resolution), it has not been sufficient for the parallel determination of electrode position and details of the surrounding brain anatomy (nuclei). The common solution is rigid fusion of CT images and magnetic resonance (MR) images, which have much better soft tissue contrast and allow accurate normalization into template spaces. Here, we explored a deep-learning approach to directly relate positions (usually the lead position) in postoperative CT images to the native anatomy of the midbrain and group space. MATERIALS AND METHODS: Deep learning is used to create derived tissue contrasts (white matter, gray matter, cerebrospinal fluid, brainstem nuclei) based on the CT image; that is, a convolutional neural network (CNN) takes solely the raw CT image as input and outputs several tissue probability maps. The ground truth is based on coregistrations with MR contrasts. The tissue probability maps are then used to either rigidly coregister or normalize the CT image in a deformable way to group space. The CNN was trained in 220 patients and tested in a set of 80 patients. RESULTS: Rigorous validation of such an approach is difficult because of the lack of ground truth. We examined the agreements between the classical and proposed approaches and considered the spread of implantation locations across a group of identically implanted subjects, which serves as an indicator of the accuracy of the lead localization procedure. The proposed procedure agrees well with current magnetic resonance imaging-based techniques, and the spread is comparable or even lower. CONCLUSIONS: Postoperative CT imaging alone is sufficient for accurate localization of the midbrain nuclei and normalization to the group space. In the context of group analysis, it seems sufficient to have a single postoperative CT image of good quality for inclusion. The proposed approach will allow researchers and clinicians to include cases that were not previously suitable for analysis.
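The core mapping, a CNN that takes a raw CT volume and outputs per-voxel tissue probability maps, can be sketched with a small fully convolutional network in PyTorch; the real model presumably differs in architecture, depth, and training details.

```python
import torch
import torch.nn as nn

class TissueSegmenter(nn.Module):
    """Minimal fully convolutional net: CT intensities in, per-voxel tissue probabilities out."""
    def __init__(self, n_tissues: int = 4):   # e.g. WM, GM, CSF, brainstem nuclei
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, n_tissues, kernel_size=1),
        )

    def forward(self, ct):
        return torch.softmax(self.net(ct), dim=1)    # probabilities sum to 1 per voxel

model = TissueSegmenter()
ct_patch = torch.randn(1, 1, 32, 32, 32)             # one single-channel CT patch
probs = model(ct_patch)                               # shape: (1, 4, 32, 32, 32)
print(probs.shape, probs.sum(dim=1).mean().item())    # roughly 1.0 per voxel
```

The resulting probability maps would then be fed to a standard rigid or deformable registration step, as described above.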
Subject(s)
Deep Brain Stimulation, Deep Learning, Humans, Computer-Assisted Image Processing/methods, Brain/diagnostic imaging, Brain/surgery, X-Ray Computed Tomography/methods, Magnetic Resonance Imaging/methods
ABSTRACT
Threatened species monitoring can produce enormous quantities of acoustic and visual recordings which must be searched for animal detections. Data coding is extremely time-consuming for humans and even though machine algorithms are emerging as useful tools to tackle this task, they too require large amounts of known detections for training. Citizen scientists are often recruited via crowd-sourcing to assist. However, the results of their coding can be difficult to interpret because citizen scientists lack comprehensive training and typically each codes only a small fraction of the full dataset. Competence may vary between citizen scientists, but without knowing the ground truth of the dataset, it is difficult to identify which citizen scientists are most competent. We used a quantitative cognitive model, cultural consensus theory, to analyze both empirical and simulated data from a crowdsourced analysis of audio recordings of Australian frogs. Several hundred citizen scientists were asked whether the calls of nine frog species were present on 1260 brief audio recordings, though most only coded a fraction of these recordings. Through modeling, characteristics of both the citizen scientist cohort and the recordings were estimated. We then compared the model's output to expert coding of the recordings and found agreement between the cohort's consensus and the expert evaluation. This finding adds to the evidence that crowdsourced analyses can be utilized to understand large-scale datasets, even when the ground truth of the dataset is unknown. The model-based analysis provides a promising tool to screen large datasets prior to investing expert time and resources.
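A much-simplified flavor of consensus estimation can be sketched as an iterative scheme in which each coder's competence is their agreement with the current consensus and the consensus is a competence-weighted vote. The full cultural consensus theory model used in the study is more elaborate than this, so treat the sketch only as intuition.

```python
import numpy as np

rng = np.random.default_rng(2)
n_coders, n_clips = 50, 200
truth = rng.integers(0, 2, size=n_clips)              # unknown in practice

# Simulated citizen-scientist answers with varying competence.
competence_true = rng.uniform(0.55, 0.95, n_coders)
answers = np.where(rng.random((n_coders, n_clips)) < competence_true[:, None],
                   truth, 1 - truth)

consensus = (answers.mean(axis=0) > 0.5).astype(int)   # start from a majority vote
for _ in range(10):
    competence = (answers == consensus).mean(axis=1)    # agreement with consensus
    weights = competence - 0.5                          # down-weight near-chance coders
    consensus = ((weights[:, None] * answers).sum(axis=0)
                 > weights.sum() * 0.5).astype(int)

print("accuracy vs. simulated truth:", (consensus == truth).mean())
```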
ABSTRACT
A number of statistical approaches have been proposed for incorporating supplemental information in randomized clinical trials. Existing methods often compare the marginal treatment effects to evaluate the degree of consistency between sources. Dissimilar marginal treatment effects would either lead to increased bias or down-weighting of the supplemental data. This represents a limitation in the presence of treatment effect heterogeneity, in which case the marginal treatment effect may differ between the sources solely due to differences between the study populations. We introduce the concept of covariate-adjusted exchangeability, in which differences in the marginal treatment effect can be explained by differences in the distributions of the effect modifiers. The potential outcomes framework is used to conceptualize covariate-adjusted and marginal exchangeability. We utilize a linear model and the existing multisource exchangeability models framework to facilitate borrowing when marginal treatment effects are dissimilar but covariate-adjusted exchangeability holds. We investigate the operating characteristics of our method using simulations. We also illustrate our method using data from two clinical trials of very low nicotine content cigarettes. Our method has the ability to incorporate supplemental information in a wider variety of situations than when only marginal exchangeability is considered.
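The notion of covariate-adjusted exchangeability can be illustrated with a small simulation in which two sources share the same conditional treatment effect but differ in the distribution of an effect modifier; the data-generating values below are arbitrary and unrelated to the trials analyzed in the article.

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate_source(n, x_mean):
    """Same conditional treatment effect in both sources, different modifier distribution."""
    x = rng.normal(x_mean, 1.0, n)                  # effect modifier
    t = rng.integers(0, 2, n)                       # randomized treatment
    y = 1.0 + 0.5 * t + 0.8 * x + 1.5 * t * x + rng.normal(0, 1, n)
    return x, t, y

for label, x_mean in [("primary trial", 0.0), ("supplemental source", 1.0)]:
    x, t, y = simulate_source(20_000, x_mean)
    marginal = y[t == 1].mean() - y[t == 0].mean()
    # Covariate-adjusted linear model: y ~ t + x + t:x
    X = np.column_stack([np.ones_like(x), t, x, t * x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    print(f"{label}: marginal effect = {marginal:.2f}, "
          f"adjusted (t, t:x) = {beta[1]:.2f}, {beta[3]:.2f}")
```

The marginal effects differ between sources because of the shifted modifier distribution, while the covariate-adjusted coefficients agree, which is exactly the situation in which covariate-adjusted borrowing is justified.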
Subject(s)
Statistical Models, Tobacco Products, Bias, Humans, Research Design
ABSTRACT
Definitive clinical trials are resource intensive, often requiring a large number of participants over several years. One approach to improve the efficiency of clinical trials is to incorporate historical information into the primary trial analysis. This approach has tremendous potential in the areas of pediatric or rare disease trials, where achieving reasonable power is difficult. In this article, we introduce a novel Bayesian group-sequential trial design based on Multisource Exchangeability Models, which allows for dynamic borrowing of historical information at the interim analyses. Our approach achieves synergy between group sequential and adaptive borrowing methodology to attain improved power and reduced sample size. We explore the frequentist operating characteristics of our design through simulation and compare our method to a traditional group-sequential design. Our method achieves earlier stopping of the primary study while increasing power under the alternative hypothesis but has a potential for type I error inflation under some null scenarios. We discuss the issues of decision boundary determination, power and sample size calculations, and the issue of information accrual. We present our method for a continuous and binary outcome, as well as in a linear regression setting.
Subject(s)
Research Design, Bayes Theorem, Child, Computer Simulation, Humans, Sample Size
ABSTRACT
In this article, we will discuss the genesis, evolution, and progress of the INternational Soft Tissue SaRcoma ConsorTium (INSTRuCT), which aims to foster international research and collaboration focused on pediatric soft tissue sarcoma. We will begin by highlighting the current state of clinical research for pediatric soft tissue sarcomas, including rhabdomyosarcoma and non-rhabdomyosarcoma soft tissue sarcoma. We will then explore challenges and research priorities, describe the development of INSTRuCT, and discuss how the consortium aims to address key research priorities.
Subject(s)
Rhabdomyosarcoma, Sarcoma, Soft Tissue Neoplasms, Child, Humans, Sarcoma/therapy, Soft Tissue Neoplasms/therapy
ABSTRACT
Abnormal electricity data, caused by electricity theft or meter failure, lead to inaccurate aggregation results. These inaccurate results not only harm the interests of users but also affect the decision-making of the power system. However, existing data aggregation schemes do not consider the impact of abnormal data, and filtering out such data is a challenge. To solve this problem, we propose a lightweight and privacy-friendly data aggregation scheme that is robust against abnormal data: valid data are correctly aggregated while abnormal data are filtered out during the aggregation process. The adoption of lightweight matrix encryption makes the scheme well suited to resource-limited smart meters. Automatic filtering of abnormal data without additional processing rounds and the detection of abnormal data sources are where our protocol outperforms other schemes. A detailed security analysis shows that the proposed scheme protects the privacy of users' data, and extensive simulations demonstrate that the additional computation cost of filtering abnormal data remains within an acceptable range, showing that the proposed scheme is still very effective.
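The filtering idea can be illustrated independently of the encryption layer: readings outside a plausibility window are excluded from the aggregate and their sources flagged. This plain range-check sketch is only a stand-in for the paper's matrix-encryption-based mechanism, and the bounds are invented.

```python
def aggregate_with_filter(readings: dict[str, float],
                          low: float = 0.0, high: float = 50.0):
    """Sum only plausible readings (kWh) and report which meters look abnormal."""
    valid, abnormal = {}, []
    for meter_id, kwh in readings.items():
        if low <= kwh <= high:
            valid[meter_id] = kwh
        else:
            abnormal.append(meter_id)     # candidate for electricity theft or meter failure
    return sum(valid.values()), abnormal

total, flagged = aggregate_with_filter({"m1": 12.3, "m2": 9.8, "m3": -4.0, "m4": 310.0})
print("aggregate =", total, "abnormal sources =", flagged)
```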
Subject(s)
Computer Security, Privacy, Algorithms, Confidentiality, Data Aggregation
ABSTRACT
A wireless sensor network (WSN) consists of a very large number of sensors deployed in a specific area of interest. A sensor is an electronic device equipped with a small processor and a small-capacity memory. WSNs offer low cost, easy deployment, and flexible reconfiguration. In this paper, an energy-efficient load-balancing tree-based data aggregation scheme (LB-TBDAS) for grid-based WSNs is proposed. In this scheme, the sensing area is partitioned into the cells of a grid, and the sensor node with the maximum residual energy in each cell is elected as the cell head. A tree-like path is then established using a minimum spanning tree algorithm. The tree construction must satisfy three constraints: a minimum-energy-consumption spanning tree, a bounded network depth, and a maximum number of child nodes. During data transmission, each cell head is responsible for collecting the sensing data in its cell, and the collected data are transmitted along the tree-like path to the base station (BS). Simulation results show that the total energy consumption of LB-TBDAS is significantly less than that of GB-PEDAP and PEDAP, and that LB-TBDAS extends the network lifetime by more than 100% compared with both. The proposed LB-TBDAS avoids excessive energy consumption of sensor nodes during multi-hop data transmission and also mitigates the hotspot problem of WSNs.
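The head-election and tree-building steps can be sketched with Prim's algorithm over the elected cell heads; the grid size, energy model (squared distance as a transmission-cost proxy), and the way the child-count constraint is enforced here are illustrative rather than the paper's exact procedure, and the depth constraint is omitted.

```python
import random

random.seed(3)
CELL, AREA = 25, 100                      # 4 x 4 grid over a 100 m x 100 m field
nodes = [{"pos": (random.uniform(0, AREA), random.uniform(0, AREA)),
          "energy": random.uniform(0.5, 2.0)} for _ in range(60)]

# Elect one cell head per grid cell: the node with maximum residual energy.
heads = {}
for node in nodes:
    cell = (int(node["pos"][0] // CELL), int(node["pos"][1] // CELL))
    if cell not in heads or node["energy"] > heads[cell]["energy"]:
        heads[cell] = node
heads = list(heads.values())

def cost(a, b):                           # squared distance as a simple energy proxy
    return (a["pos"][0] - b["pos"][0]) ** 2 + (a["pos"][1] - b["pos"][1]) ** 2

# Prim's algorithm over the cell heads, capping each parent at 3 children.
MAX_CHILDREN = 3
in_tree, children = {0}, {i: [] for i in range(len(heads))}
while len(in_tree) < len(heads):
    _, parent, child = min((cost(heads[i], heads[j]), i, j)
                           for i in in_tree if len(children[i]) < MAX_CHILDREN
                           for j in range(len(heads)) if j not in in_tree)
    children[parent].append(child)
    in_tree.add(child)

print({p: c for p, c in children.items() if c})
```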
Subject(s)
Conservation of Energy Resources, Data Aggregation, Child, Humans, Computer Systems, Data Collection, Electronics
ABSTRACT
With the development of the Internet of Things, smart grids have become indispensable in our daily life and can provide people with reliable electricity generation, transmission, distribution and control. How to design a privacy-preserving data aggregation protocol has therefore been a research hotspot in smart grid technology. However, the proposed protocols often contain complex cryptographic operations that are not suitable for resource-constrained smart meter devices. In this paper, we combine data aggregation with the outsourcing of computations to design two privacy-preserving outsourcing algorithms for the modular exponentiation operations involved in multi-dimensional data aggregation, which allow smart meter devices to delegate complex computation tasks to nearby servers. By utilizing our proposed outsourcing algorithms, the computational overhead of resource-constrained smart meter devices can be greatly reduced during data encryption and aggregation. In addition, the proposed algorithms protect the privacy of the smart meter devices' inputs and allow the devices to verify the correctness of the servers' results at a very small computational cost. We give a detailed analysis of the proposed algorithms in terms of security, verifiability and efficiency. Finally, experiments show that our algorithms improve the efficiency of data encryption and aggregation on the smart meter device side.
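The flavor of outsourcing a modular exponentiation can be shown with a naive split of the secret exponent between two non-colluding servers; real verifiable-outsourcing schemes, including the ones proposed in the paper, are considerably more involved, so treat this purely as an illustration of the idea.

```python
import random

p = 2**127 - 1                      # a public prime modulus (demo value)

def server(base: int, exponent: int) -> int:
    """An untrusted helper simply computes base^exponent mod p."""
    return pow(base, exponent, p)

def outsourced_pow(u: int, a: int) -> int:
    """Split the secret exponent a between two servers and recombine their answers."""
    a1 = random.randrange(1, a)
    a2 = a - a1                                  # neither server sees a itself
    result = (server(u, a1) * server(u, a2)) % p

    # Cheap spot-check: query a small known exponent and compare with a local computation.
    t = random.randrange(2, 100)
    assert server(u, t) == pow(u, t, p), "server returned an inconsistent result"
    return result

u, a = 123456789, 987654321
assert outsourced_pow(u, a) == pow(u, a, p)
print("outsourced result verified")
```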
Subject(s)
Outsourced Services, Privacy, Algorithms, Computer Security, Computer Systems, Humans
ABSTRACT
Estimating individualized treatment rules, particularly in the context of right-censored outcomes, is challenging because the treatment effect heterogeneity of interest is often small, thus difficult to detect. While this motivates the use of very large datasets such as those from multiple health systems or centres, data privacy may be of concern with participating data centres reluctant to share individual-level data. In this case study on the treatment of depression, we demonstrate an application of distributed regression for privacy protection used in combination with dynamic weighted survival modelling (DWSurv) to estimate an optimal individualized treatment rule whilst obscuring individual-level data. In simulations, we demonstrate the flexibility of this approach to address local treatment practices that may affect confounding, and show that DWSurv retains its double robustness even when performed through a (weighted) distributed regression approach. The work is motivated by, and illustrated with, an analysis of treatment for unipolar depression using the United Kingdom's Clinical Practice Research Datalink.
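The distributed-regression building block, in which each centre shares only aggregate sufficient statistics and never patient-level rows, can be sketched as follows; integrating it with DWSurv's weighting and censoring handling is more involved than shown, and the simulated data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
beta_true = np.array([1.0, -2.0, 0.5])

def site_statistics(n):
    """Each centre computes X'WX and X'Wy locally and shares only these summaries."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ beta_true + rng.normal(scale=0.5, size=n)
    w = rng.uniform(0.5, 1.5, size=n)            # e.g. treatment/censoring weights
    W = np.diag(w)
    return X.T @ W @ X, X.T @ W @ y

# Three centres contribute their summaries; no individual-level data leave a site.
XtWX = np.zeros((3, 3))
XtWy = np.zeros(3)
for n in (400, 250, 600):
    A, b = site_statistics(n)
    XtWX += A
    XtWy += b

beta_hat = np.linalg.solve(XtWX, XtWy)
print("pooled weighted-regression estimate:", np.round(beta_hat, 3))
```

Summing the per-site matrices and solving once reproduces exactly the weighted regression that would have been fitted on the pooled individual-level data.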
Subject(s)
Confidentiality, Depression, Precision Medicine, Depression/therapy, Humans, Patient Acuity, Treatment Outcome
ABSTRACT
BACKGROUND: We previously found additive effects of long- and short-term exposures to fine particulate matter (PM2.5), ozone (O3), and nitrogen dioxide (NO2) on the all-cause mortality rate using a generalized propensity score (GPS) adjustment approach. The study addressed an important question of how many early deaths were caused by each exposure. However, it was computationally expensive, did not capture possible interactions and high-order nonlinearities, and omitted potential confounders. METHODS: We proposed two new methods and re-ran the analysis using the same cohort of Medicare beneficiaries in Massachusetts during 2000-2012, which consisted of 1.5 million individuals with 3.8 billion person-days of follow-up. The first method, weighted least squares (WLS), leverages the large volume of data by aggregating person-days, which gives results equivalent to the linear probability model (LPM) used in the previous analysis while significantly reducing the computational burden. The second method, m-out-of-n random forests (moonRF), implements scalable random forests that capture all possible interactions and nonlinearities in the GPS model. To minimize confounding bias, we additionally controlled for relative humidity and health care utilization, which were not included previously. Further, we performed low-level analyses by restricting to person-days with exposure levels below increasingly stringent thresholds. RESULTS: We found consistent results between LPM/WLS and moonRF: all exposures were positively associated with the mortality rate, even at low levels. For long-term PM2.5 and O3, the effect estimates became larger at lower levels. Long-term exposure to PM2.5 posed the highest risk: a 1 µg/m3 increase in long-term PM2.5 was associated with 1053 (95% confidence interval [CI]: 984, 1122; based on LPM/WLS) or 1058 (95% CI: 988, 1127; based on moonRF) early deaths each year among the Medicare population in Massachusetts. CONCLUSIONS: This study provides more rigorous causal evidence linking PM2.5, O3, and NO2 exposures to mortality, even at low levels. The largest effect estimate for long-term PM2.5 suggests that reducing PM2.5 could yield the most substantial benefits. The consistency between LPM/WLS and moonRF suggests that there were not many interactions or high-order nonlinearities. In the big-data context, the proposed methods will be useful for future scientific work estimating causality on an additive scale.
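The equivalence that makes the WLS shortcut work, namely that aggregating identical person-days and weighting by cell counts reproduces the individual-level linear probability model, can be checked on a toy example; the variable names and tiny simulated dataset below are illustrative only and unrelated to the Medicare cohort.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Individual person-days with a discretized exposure and a binary death indicator.
exposure = rng.integers(0, 5, size=n).astype(float)        # e.g. binned PM2.5 level
p_death = 0.001 + 0.0005 * exposure
death = rng.binomial(1, p_death)

# Individual-level linear probability model (OLS of death on exposure).
X = np.column_stack([np.ones(n), exposure])
beta_lpm = np.linalg.lstsq(X, death, rcond=None)[0]

# Aggregate person-days sharing the same covariate value, then fit WLS with count weights.
levels, counts = np.unique(exposure, return_counts=True)
mean_death = np.array([death[exposure == v].mean() for v in levels])
Xa = np.column_stack([np.ones_like(levels), levels])
W = np.diag(counts.astype(float))
beta_wls = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ mean_death)

print("LPM:", np.round(beta_lpm, 6), "WLS on aggregated cells:", np.round(beta_wls, 6))
```

The two coefficient vectors agree up to floating-point error, which is why fitting WLS on the aggregated cells scales to billions of person-days without changing the estimates.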