Results 1 - 20 of 37
1.
Biostatistics ; 24(3): 635-652, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-34893807

ABSTRACT

Nonignorable technical variation is commonly observed across data from multiple experimental runs, platforms, or studies. These so-called batch effects can lead to difficulty in merging data from multiple sources, as they can severely bias the outcome of the analysis. Many groups have developed approaches for removing batch effects from data, usually by accommodating batch variables into the analysis (one-step correction) or by preprocessing the data prior to the formal or final analysis (two-step correction). One-step correction is often desirable due to its simplicity, but its flexibility is limited and it can be difficult to include batch variables uniformly when an analysis has multiple stages. Two-step correction allows for richer models of batch mean and variance. However, prior investigation has indicated that two-step correction can lead to incorrect statistical inference in downstream analysis. Generally speaking, two-step approaches introduce a correlation structure in the corrected data, which, if ignored, may lead to either exaggerated or diminished significance in downstream applications such as differential expression analysis. Here, we provide more intuitive and more formal evaluations of the impacts of two-step batch correction compared to existing literature. We demonstrate that the undesired impacts of two-step correction (exaggerated or diminished significance) depend on both the nature of the study design and the batch effects. We also provide strategies for overcoming these negative impacts in downstream analyses using the estimated correlation matrix of the corrected data. We compare the results of our proposed workflow with the results from other published one-step and two-step methods and show that our methods lead to more consistent false discovery control and power of detection across a variety of batch effect scenarios. Software for our method is available through GitHub (https://github.com/jtleek/sva-devel) and will be available in future versions of the sva R package in the Bioconductor project (https://bioconductor.org/packages/release/bioc/html/sva.html).
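
The dependence that two-step correction induces can be seen in a small numerical sketch. The Python example below is a minimal illustration (not the sva workflow): each gene is location/scale-adjusted within each batch, after which a gene's corrected values within a batch are constrained to sum to zero, the kind of induced correlation the abstract warns about. All data and settings are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 genes x 30 samples in 3 batches of 10.
n_genes = 200
batch = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(n_genes, batch.size))
X = X * np.array([1.0, 2.0, 0.5])[batch] + np.array([0.0, 1.5, -1.0])[batch]

# Two-step correction: location/scale-adjust each gene within each batch
# (a simplified stand-in for ComBat-style adjustment).
X_corr = np.empty_like(X)
for b in np.unique(batch):
    cols = batch == b
    mu = X[:, cols].mean(axis=1, keepdims=True)
    sd = X[:, cols].std(axis=1, ddof=1, keepdims=True)
    X_corr[:, cols] = (X[:, cols] - mu) / sd

# The induced dependence: after correction, each gene's values within a batch
# sum to zero, so samples are no longer independent (pairwise correlation of
# roughly -1/(n_b - 1) within a batch of size n_b).
gene0 = X_corr[0]
print([round(gene0[batch == b].sum(), 6) for b in range(3)])
```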


Subject(s)
Gene Expression, Humans, Phylogeny, Research Design
2.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36576001

ABSTRACT

MOTIVATION: In the training of predictive models using high-dimensional genomic data, multiple studies' worth of data are often combined to increase sample size and improve generalizability. A drawback of this approach is that there may be different sets of features measured in each study due to variations in expression measurement platform or technology. It is common practice to work only with the intersection of features measured in common across all studies, which results in the blind discarding of potentially useful feature information that is measured in individual or subsets of studies. RESULTS: We characterize the loss in predictive performance incurred by using only the intersection of feature information available across all studies when training predictors using gene expression data from microarray and sequencing datasets. We study the properties of linear and polynomial regression for imputing discarded features and demonstrate improvements in the external performance of prediction functions through simulation and in gene expression data collected on breast cancer patients. To improve this process, we propose a pairwise strategy that applies any imputation algorithm to two studies at a time and averages imputed features across pairs. We demonstrate that the pairwise strategy is preferable to first merging all datasets together and imputing any resulting missing features. Finally, we provide insights on which subsets of intersected and study-specific features should be used so that missing-feature imputation best promotes cross-study replicability. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/YujieWuu/Pairwise_imputation. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
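
As a rough illustration of the pairwise strategy described above, the following Python sketch imputes a feature missing from one study by training a separate regression on every other study that measured it and averaging the predictions across those pairs. The studies, gene names, and the use of LinearRegression are illustrative assumptions, not the authors' implementation (their code is at the linked repository).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def pairwise_impute(studies, target, feature):
    """Impute `feature` for the `target` study: train one regression per other
    study that measured the feature (using shared features as predictors) and
    average the predictions across those study pairs."""
    preds = []
    for name, df in studies.items():
        if name == target or feature not in df.columns:
            continue
        shared = [c for c in df.columns
                  if c != feature and c in studies[target].columns]
        model = LinearRegression().fit(df[shared], df[feature])
        preds.append(model.predict(studies[target][shared]))
    return np.mean(preds, axis=0) if preds else None

# Synthetic example: three studies; study "C" did not measure gene "g3".
rng = np.random.default_rng(1)
def toy(n, cols):
    return pd.DataFrame(rng.normal(size=(n, len(cols))), columns=cols)

studies = {"A": toy(50, ["g1", "g2", "g3"]),
           "B": toy(40, ["g1", "g2", "g3"]),
           "C": toy(30, ["g1", "g2"])}
studies["C"]["g3"] = pairwise_impute(studies, target="C", feature="g3")
print(studies["C"].head())
```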


Subject(s)
Algorithms, Genomics, Humans, Sample Size, Genome, Computer Simulation
3.
BMC Infect Dis ; 24(1): 610, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902649

ABSTRACT

BACKGROUND: Blood-based transcriptional gene signatures for tuberculosis (TB) have been developed with potential use to diagnose disease. However, an unresolved issue is whether gene set enrichment analysis of the signature transcripts alone is sufficient for prediction and differentiation or whether it is necessary to use the original model created when the signature was derived. Intra-method comparison is complicated by the unavailability of original training data and missing details about the original trained model. To facilitate the utilization of these signatures in TB research, comparisons between gene set scoring methods with cross-data validation of original model implementations are needed. METHODS: We compared the performance of 19 TB gene signatures across 24 transcriptomic datasets using both rebuilt original models and gene set scoring methods. Existing gene set scoring methods, including ssGSEA, GSVA, PLAGE, Singscore, and Zscore, were used as alternative approaches to obtain the profile scores. The area under the ROC curve (AUC) value was computed to measure performance. Correlation analysis and Wilcoxon paired tests were used to compare the performance of enrichment methods with the original models. RESULTS: For many signatures, the predictions from gene set scoring methods were highly correlated and statistically equivalent to the results given by the original models. In some cases, PLAGE outperformed the original models when considering signatures' weighted mean AUC values and the AUC results within individual studies. CONCLUSION: Gene set enrichment scoring of existing gene sets can distinguish patients with active TB disease from other clinical conditions with equivalent or improved accuracy compared to the original methods and models. These data justify using gene set scoring methods of published TB gene signatures for predicting TB risk and treatment outcomes, especially when original models are difficult to apply or implement.
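
To make the gene set scoring idea concrete, here is a minimal Python sketch of one of the simpler listed methods (Z-score scoring: average the per-gene z-scores of the signature genes within each sample), followed by an AUC evaluation. The toy signature, expression matrix, and labels are synthetic; the paper itself evaluates ssGSEA, GSVA, PLAGE, Singscore, and Zscore against rebuilt original models.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def zscore_signature_score(expr, signature_genes):
    """Z-score scoring: standardize each gene across samples,
    then average the z-scores of the signature genes per sample."""
    sig = expr.loc[signature_genes]
    z = sig.sub(sig.mean(axis=1), axis=0).div(sig.std(axis=1, ddof=1), axis=0)
    return z.mean(axis=0)

# Toy dataset: 100 genes x 60 samples, with the signature genes shifted
# upward in the 30 "active TB" samples.
rng = np.random.default_rng(2)
genes = [f"gene{i}" for i in range(100)]
labels = np.array([1] * 30 + [0] * 30)
expr = pd.DataFrame(rng.normal(size=(100, 60)), index=genes)
signature = genes[:10]
expr.loc[signature, labels == 1] += 1.0

scores = zscore_signature_score(expr, signature)
print("AUC:", round(roc_auc_score(labels, scores), 3))
```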


Subject(s)
Gene Expression Profiling, Tuberculosis, Humans, Tuberculosis/diagnosis, Tuberculosis/genetics, Tuberculosis/microbiology, Gene Expression Profiling/methods, Mycobacterium tuberculosis/genetics, Transcriptome, ROC Curve, Reproducibility of Results
4.
J Community Health ; 49(1): 91-99, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37507525

ABSTRACT

Occupational exposure to SARS-CoV-2 varies by profession, but "essential workers" are often considered in aggregate in COVID-19 models. This aggregation complicates efforts to understand risks to specific types of workers or industries and target interventions, specifically towards non-healthcare workers. We used census tract-resolution American Community Survey data to develop novel essential worker categories among the occupations designated as COVID-19 Essential Services in Massachusetts. Census tract-resolution COVID-19 cases and deaths were provided by the Massachusetts Department of Public Health. We evaluated the association between essential worker categories and cases and deaths over two phases of the pandemic from March 2020 to February 2021 using adjusted mixed-effects negative binomial regression, controlling for other sociodemographic risk factors. We observed elevated COVID-19 case incidence in census tracts in the highest tertile of workers in construction/transportation/buildings maintenance (Phase 1: IRR 1.32 [95% CI 1.22, 1.42]; Phase 2: IRR: 1.19 [1.13, 1.25]), production (Phase 1: IRR: 1.23 [1.15, 1.33]; Phase 2: 1.18 [1.12, 1.24]), and public-facing sales and services occupations (Phase 1: IRR: 1.14 [1.07, 1.21]; Phase 2: IRR: 1.10 [1.06, 1.15]). We found reduced case incidence associated with greater percentage of essential workers able to work from home (Phase 1: IRR: 0.85 [0.78, 0.94]; Phase 2: IRR: 0.83 [0.77, 0.88]). Similar trends exist in the associations between essential worker categories and deaths, though attenuated. Estimating industry-specific risk for essential workers is important in targeting interventions for COVID-19 and other diseases and our categories provide a reproducible and straightforward way to support such efforts.
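
A simplified sketch of the modeling approach (negative binomial regression of case counts on an essential-worker covariate with a population offset, so that exponentiated coefficients are IRRs) is shown below in Python with statsmodels. The study used adjusted mixed-effects models with many covariates; this toy example with a single hypothetical predictor only illustrates how the reported IRRs arise.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Toy census-tract data: case counts, population, and the share of workers in
# a hypothetical "production" essential-worker category.
rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({
    "population": rng.integers(1_000, 8_000, size=n),
    "pct_production": rng.uniform(0, 0.3, size=n),
})
true_log_rate = np.log(0.02) + 0.8 * df["pct_production"]
df["cases"] = rng.poisson(df["population"] * np.exp(true_log_rate))

# Negative binomial GLM with a population offset; exponentiated coefficients
# are incidence rate ratios (IRRs). The paper used mixed-effects models with
# additional covariates -- this is a deliberately simplified sketch.
X = sm.add_constant(df[["pct_production"]])
model = sm.GLM(df["cases"], X,
               family=sm.families.NegativeBinomial(alpha=1.0),
               offset=np.log(df["population"])).fit()
print(np.exp(model.params))   # IRR per unit increase in pct_production
```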


Subject(s)
COVID-19, Humans, COVID-19/epidemiology, SARS-CoV-2, Occupations, Industry, Massachusetts/epidemiology
5.
Bioinformatics ; 37(11): 1521-1527, 2021 07 12.
Article in English | MEDLINE | ID: mdl-33245114

ABSTRACT

MOTIVATION: Genomic data are often produced in batches due to practical restrictions, which may lead to unwanted variation in data caused by discrepancies across batches. Such 'batch effects' often have a negative impact on downstream biological analysis and need careful consideration. In practice, batch effects are usually addressed by specifically designed software, which merges the data from different batches, then estimates batch effects and removes them from the data. Here, we focus on classification and prediction problems, and propose a different strategy based on ensemble learning. We first develop prediction models within each batch, then integrate them through ensemble weighting methods. RESULTS: We provide a systematic comparison between these two strategies using studies targeting diverse populations infected with tuberculosis. In one study, we simulated increasing levels of heterogeneity across random subsets of the study, which we treat as simulated batches. We then use the two methods to develop a genomic classifier for the binary indicator of disease status. We evaluate the accuracy of prediction in another independent study targeting a different population cohort. We observed that in independent validation, while merging followed by batch adjustment provides better discrimination at low levels of heterogeneity, our ensemble learning strategy achieves more robust performance, especially at high severity of batch effects. These observations provide practical guidelines for handling batch effects in the development and evaluation of genomic classifiers. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available in the article and in its online supplementary material. Processed data are available in the Github repository with implementation code, at https://github.com/zhangyuqing/bea_ensemble. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
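
The ensemble strategy described above can be sketched as follows: fit one classifier per batch, weight each classifier by its discrimination on the other batches, and average the weighted predictions. This Python example is a hedged illustration with synthetic batches and a simple cross-batch AUC weighting scheme, not the specific weighting methods evaluated in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def batch_ensemble(batches, X_new):
    """Fit one classifier per batch and combine predictions with weights
    proportional to each model's cross-batch validation AUC."""
    models, weights = [], []
    for i, (Xi, yi) in enumerate(batches):
        clf = LogisticRegression(max_iter=1000).fit(Xi, yi)
        # Validate each model on the other batches to estimate its weight.
        aucs = [roc_auc_score(yj, clf.predict_proba(Xj)[:, 1])
                for j, (Xj, yj) in enumerate(batches) if j != i]
        models.append(clf)
        weights.append(np.mean(aucs))
    weights = np.array(weights) / np.sum(weights)
    probs = np.column_stack([m.predict_proba(X_new)[:, 1] for m in models])
    return probs @ weights

# Toy example: three batches with shifted feature means.
rng = np.random.default_rng(4)
def make_batch(shift, n=100):
    X = rng.normal(loc=shift, size=(n, 5))
    y = (X[:, 0] + rng.normal(scale=1.0, size=n) > shift).astype(int)
    return X, y

batches = [make_batch(s) for s in (0.0, 0.5, 1.0)]
X_test, y_test = make_batch(0.25, n=200)
print("ensemble AUC:",
      round(roc_auc_score(y_test, batch_ensemble(batches, X_test)), 3))
```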


Subject(s)
Genome, Genomics, Humans, Machine Learning
6.
BMC Infect Dis ; 21(1): 686, 2021 Jul 16.
Article in English | MEDLINE | ID: mdl-34271870

ABSTRACT

BACKGROUND: Associations between community-level risk factors and COVID-19 incidence have been used to identify vulnerable subpopulations and target interventions, but the variability of these associations over time remains largely unknown. We evaluated variability in the associations between community-level predictors and COVID-19 case incidence in 351 cities and towns in Massachusetts from March to October 2020. METHODS: Using publicly available sociodemographic, occupational, environmental, and mobility datasets, we developed mixed-effect, adjusted Poisson regression models to depict associations between these variables and town-level COVID-19 case incidence data across five distinct time periods from March to October 2020. We examined town-level demographic variables, including population proportions by race, ethnicity, and age, as well as factors related to occupation, housing density, economic vulnerability, air pollution (PM2.5), and institutional facilities. We calculated incidence rate ratios (IRR) associated with these predictors and compared these values across the multiple time periods to assess variability in the observed associations over time. RESULTS: Associations between key predictor variables and town-level incidence varied across the five time periods. We observed reductions over time in the association between the percentage of Black residents and COVID-19 incidence (IRR = 1.12 [95%CI: 1.12-1.13] in early spring, IRR = 1.01 [95%CI: 1.00-1.01] in early fall). The association with number of long-term care facility beds per capita also decreased over time (IRR = 1.28 [95%CI: 1.26-1.31] in spring, IRR = 1.07 [95%CI: 1.05-1.09] in fall). Controlling for other factors, towns with higher percentages of essential workers experienced elevated incidences of COVID-19 throughout the pandemic (e.g., IRR = 1.30 [95%CI: 1.27-1.33] in spring, IRR = 1.20 [95%CI: 1.17-1.22] in fall). Towns with higher proportions of Latinx residents also had sustained elevated incidence over time (IRR = 1.19 [95%CI: 1.18-1.21] in spring, IRR = 1.14 [95%CI: 1.13-1.15] in fall). CONCLUSIONS: Town-level COVID-19 risk factors varied with time in this study. In Massachusetts, racial (but not ethnic) disparities in COVID-19 incidence may have decreased across the first 8 months of the pandemic, perhaps indicating greater success in risk mitigation in selected communities. Our approach can be used to evaluate effectiveness of public health interventions and target specific mitigation efforts on the community level.
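
To illustrate how period-specific IRRs such as those above are obtained, the sketch below fits a separate Poisson GLM with a population offset for each of two toy time periods and exponentiates the coefficient of interest. The data, covariate, and effect sizes are fabricated for illustration; the study used adjusted mixed-effects Poisson models with many predictors.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Toy town-level data for two periods; the predictor effect weakens over time,
# mimicking the declining IRRs described in the abstract.
rng = np.random.default_rng(5)
rows = []
for period, beta in [("spring", 0.30), ("fall", 0.05)]:
    pct = rng.uniform(0, 0.4, size=300)
    pop = rng.integers(5_000, 50_000, size=300)
    rate = 0.01 * np.exp(beta * (pct * 10))   # effect per 10-point increase
    rows.append(pd.DataFrame({"period": period, "pct_x10": pct * 10,
                              "population": pop,
                              "cases": rng.poisson(pop * rate)}))
df = pd.concat(rows, ignore_index=True)

# Fit a separate Poisson GLM per period; exp(coef) is that period's IRR.
for period, sub in df.groupby("period"):
    X = sm.add_constant(sub[["pct_x10"]])
    fit = sm.GLM(sub["cases"], X, family=sm.families.Poisson(),
                 offset=np.log(sub["population"])).fit()
    print(period, "IRR per 10-point increase:",
          round(np.exp(fit.params["pct_x10"]), 3))
```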


Subject(s)
COVID-19/epidemiology, Occupations/statistics & numerical data, Social Environment, Transportation/statistics & numerical data, Adult, Aged, Aged, 80 and over, COVID-19/ethnology, Ethnicity/statistics & numerical data, Female, Health Status Disparities, Humans, Incidence, Income/statistics & numerical data, Male, Massachusetts/epidemiology, Middle Aged, Movement/physiology, Pandemics, Residence Characteristics/statistics & numerical data, Risk Factors, SARS-CoV-2/physiology, Socioeconomic Factors, Time Factors, Vulnerable Populations/ethnology, Vulnerable Populations/statistics & numerical data, Young Adult
7.
Surg Endosc ; 35(1): 182-191, 2021 01.
Article in English | MEDLINE | ID: mdl-31953733

ABSTRACT

BACKGROUND: Postoperative gastrointestinal leak and venous thromboembolism (VTE) are devastating complications of bariatric surgery. The performance of currently available predictive models for these complications remains wanting, while machine learning has shown promise to improve on traditional modeling approaches. The purpose of this study was to compare the ability of two machine learning strategies, artificial neural networks (ANNs) and gradient boosting machines (XGBs), with conventional models using logistic regression (LR) in predicting leak and VTE after bariatric surgery. METHODS: ANN, XGB, and LR prediction models for leak and VTE among adults undergoing initial elective weight loss surgery were trained and validated using preoperative data from 2015 to 2017 from the Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program database. Data were randomly split into training, validation, and testing populations. Model performance was measured by the area under the receiver operating characteristic curve (AUC) on the testing data for each model. RESULTS: The study cohort contained 436,807 patients. The incidences of leak and VTE were 0.70% and 0.46%. ANN (AUC 0.75, 95% CI 0.73-0.78) was the best-performing model for predicting leak, followed by XGB (AUC 0.70, 95% CI 0.68-0.72) and then LR (AUC 0.63, 95% CI 0.61-0.65, p < 0.001 for all comparisons). In detecting VTE, ANN, XGB, and LR achieved similar AUCs of 0.65 (95% CI 0.63-0.68), 0.67 (95% CI 0.64-0.70), and 0.64 (95% CI 0.61-0.66), respectively; the performance difference between XGB and LR was statistically significant (p = 0.001). CONCLUSIONS: ANN and XGB outperformed traditional LR in predicting leak. These results suggest that ML has the potential to improve risk stratification for bariatric surgery, especially as techniques to extract more granular data from medical records improve. Further studies investigating the merits of machine learning to improve patient selection and risk management in bariatric surgery are warranted.
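
A minimal Python/scikit-learn sketch of the three-way comparison (logistic regression, a gradient boosting machine, and a small neural network, each evaluated by test set AUC on imbalanced data) is given below. The synthetic dataset and model settings are assumptions; the study used the MBSAQIP database and XGBoost rather than scikit-learn's GradientBoostingClassifier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced toy data standing in for a rare (~1%) postoperative complication.
X, y = make_classification(n_samples=20_000, n_features=30, weights=[0.99],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

models = {
    "LR":  LogisticRegression(max_iter=2000),
    "GBM": GradientBoostingClassifier(),            # stand-in for XGBoost
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32, 16),
                                       max_iter=300, random_state=0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```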


Subject(s)
Anastomotic Leak/etiology, Bariatric Surgery/adverse effects, Machine Learning, Postoperative Complications/etiology, Venous Thromboembolism/etiology, Adult, Cohort Studies, Databases, Factual, Diagnosis, Computer-Assisted, Humans, Logistic Models, Neural Networks, Computer
8.
Proc Natl Acad Sci U S A ; 115(11): 2578-2583, 2018 03 13.
Article in English | MEDLINE | ID: mdl-29531060

ABSTRACT

This article considers replicability of the performance of predictors across studies. We suggest a general approach to investigating this issue, based on ensembles of prediction models trained on different studies. We quantify how the common practice of training on a single study accounts in part for the observed challenges in replicability of prediction performance. We also investigate whether ensembles of predictors trained on multiple studies can be combined, using unique criteria, to design robust ensemble learners trained upfront to incorporate replicability into different contexts and populations.


Subject(s)
Biomedical Research/methods, Biomedical Research/standards, Machine Learning, Computer Simulation, Databases, Factual, Female, Humans, Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/mortality, Prognosis, Reproducibility of Results
9.
Bioinformatics ; 31(14): 2318-23, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25788628

ABSTRACT

MOTIVATION: Prior to applying genomic predictors to clinical samples, the genomic data must be properly normalized to ensure that the test set data are comparable to the data upon which the predictor was trained. The most effective normalization methods depend on data from multiple patients. From a biomedical perspective, this implies that predictions for a single patient may change depending on which other patient samples they are normalized with. This test set bias will occur when any cross-sample normalization is used before clinical prediction. RESULTS: Using a set of curated, publicly available breast cancer microarray experiments, we demonstrate that results from existing gene signatures that rely on normalizing test data may be irreproducible when the patient population changes in composition or size. As an alternative, we examine the use of gene signatures that rely on ranks from the data and show why signatures using rank-based features can avoid test set bias while maintaining highly accurate classification, even across platforms. AVAILABILITY AND IMPLEMENTATION: The code, data and instructions necessary to reproduce our entire analysis are available at https://github.com/prpatil/testsetbias.
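
The rank-based idea is easy to demonstrate: because within-sample ranks depend only on that sample's own measurements, a patient's features are unchanged whether they are processed alone or alongside any other test samples. A small Python sketch with synthetic data follows.

```python
import numpy as np
import pandas as pd

def rank_transform(expr):
    """Convert each sample's expression values to within-sample ranks.
    Ranks depend only on that sample, so adding or removing other patients
    from the test set cannot change a patient's features (no test set bias)."""
    return expr.rank(axis=0)

# Toy expression matrix: genes x samples.
rng = np.random.default_rng(6)
expr = pd.DataFrame(rng.normal(size=(5, 3)),
                    index=[f"g{i}" for i in range(5)],
                    columns=["patient1", "patient2", "patient3"])

ranks_all = rank_transform(expr)
ranks_one = rank_transform(expr[["patient1"]])
# The rank-based features for patient1 are identical either way.
print((ranks_all["patient1"] == ranks_one["patient1"]).all())
```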


Subject(s)
Biomarkers, Tumor/genetics, Breast Neoplasms/genetics, Gene Expression Profiling/methods, Genomics/methods, Models, Statistical, Oligonucleotide Array Sequence Analysis/methods, Breast Neoplasms/mortality, Breast Neoplasms/pathology, Female, Gene Expression Regulation, Neoplastic, Humans, Middle Aged, Neoplasm Grading, Neoplasm Staging, Prognosis, Receptor, ErbB-2/metabolism, Receptors, Estrogen/metabolism, Reproducibility of Results, Survival Rate
10.
Eur J Nutr ; 54(6): 863-80, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26154777

ABSTRACT

BACKGROUND: Type 2 diabetes (T2D), one of the major common human health problems, is growing at an alarming rate around the globe. Alpha-glucosidase and dipeptidyl peptidase IV (DPP-IV) enzymes play a significant role in the development of T2D. Hence, reduction or inhibition of their activity can be one of the important strategies in the management of T2D. Studies in the field of bioactive peptides have shown that dietary proteins could be a natural source of alpha-glucosidase and DPP-IV inhibitory peptides. PURPOSE: The purpose of this review is to provide an overview of food protein-derived peptides as potential inhibitors of alpha-glucosidase and DPP-IV, with a major focus on milk proteins. METHODS: Efforts have been made to review the available information in the literature on the relationship between food protein-derived peptides and T2D. This review summarizes the current data on alpha-glucosidase and dipeptidyl peptidase IV inhibitory bioactive peptides derived from proteins and examines the potential value of these peptides in the treatment and prevention of T2D. In addition, the proposed modes of inhibition of peptide inhibitors are also discussed. RESULTS: Studies revealed that milk and other food protein-derived bioactive peptides play a vital role in controlling T2D through several mechanisms, such as the satiety response, regulation of incretin hormones, insulinemia levels, and reducing the activity of carbohydrate-degrading digestive enzymes. CONCLUSIONS: The bioactive peptides could be used in the prevention and management of T2D through functional foods or nutraceutical supplements. Further clinical trials are necessary to validate the findings of in vitro studies and to confirm the efficacy of these peptides for such applications.


Subject(s)
Diabetes Mellitus, Type 2/therapy, Dietary Proteins/chemistry, Dipeptidyl-Peptidase IV Inhibitors/therapeutic use, Glycoside Hydrolase Inhibitors/therapeutic use, Peptides/therapeutic use, Amino Acid Sequence, Animals, Blood Glucose/analysis, Carbohydrate Metabolism, Cultured Milk Products, Diabetes Mellitus, Type 2/enzymology, Diabetes Mellitus, Type 2/prevention & control, Dietary Proteins/administration & dosage, Digestion, Dipeptidyl-Peptidase IV Inhibitors/administration & dosage, Glycoside Hydrolase Inhibitors/administration & dosage, Humans, Incretins, Milk Proteins/administration & dosage, Milk Proteins/chemistry, Molecular Sequence Data, Peptides/administration & dosage, Peptides/chemistry, Postprandial Period, Satiation/drug effects, alpha-Glucosidases/metabolism
11.
Circulation ; 127(4): 517-26, 2013 Jan 29.
Article in English | MEDLINE | ID: mdl-23261867

ABSTRACT

BACKGROUND: Pharmacogenetics in warfarin clinical trials has failed to show a significant benefit in comparison with standard clinical therapy. This study demonstrates a computational framework to systematically evaluate preclinical trial design of target population, pharmacogenetic algorithms, and dosing protocols to optimize primary outcomes. METHODS AND RESULTS: We programmatically created an end-to-end framework that systematically evaluates warfarin clinical trial designs. The framework includes options to create a patient population, multiple dosing strategies including genetic-based and nongenetic clinical-based, multiple-dose adjustment protocols, pharmacokinetic/pharmacodynamic modeling and international normalized ratio prediction, and various types of outcome measures. We validated the framework by conducting 1000 simulations of the primary end points of the CoumaGen clinical trial (Applying Pharmacogenetic Algorithms to Individualize Dosing of Warfarin). The simulation predicted a mean time in therapeutic range of 70.6% and 72.2% (P=0.47) in the standard and pharmacogenetic arms, respectively. Then, we evaluated another dosing protocol under the same original conditions and found a significant difference in the time in therapeutic range between the pharmacogenetic and standard arms (78.8% versus 73.8%; P=0.0065). CONCLUSIONS: We demonstrate that this simulation framework is useful in the preclinical assessment phase to study and evaluate design options and provide evidence to optimize the clinical trial for patient efficacy and reduced risk.
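
The following Python snippet is a deliberately oversimplified Monte Carlo sketch of the kind of arm-level comparison such a framework performs: simulate many trials, draw a time-in-therapeutic-range (TTR) value per patient in each arm from an assumed distribution, and test for a between-arm difference. It omits the patient-generation, PK/PD, INR-prediction, and dose-adjustment components of the actual framework; all distributions and sample sizes here are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_trials, n_per_arm = 1000, 250
p_values = []
for _ in range(n_trials):
    # Assumed per-patient TTR distributions for the two arms (illustrative only).
    ttr_standard = rng.normal(loc=70.6, scale=20, size=n_per_arm).clip(0, 100)
    ttr_pgx = rng.normal(loc=72.2, scale=20, size=n_per_arm).clip(0, 100)
    p_values.append(stats.ttest_ind(ttr_standard, ttr_pgx).pvalue)

# Fraction of simulated trials detecting a difference at alpha = 0.05 (power).
print("power:", np.mean(np.array(p_values) < 0.05))
```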


Subject(s)
Drug Evaluation, Preclinical/methods, Pharmacogenetics/methods, Randomized Controlled Trials as Topic/methods, Systems Theory, Thrombosis/drug therapy, Warfarin/therapeutic use, Animals, Anticoagulants/therapeutic use, Computer Simulation, Humans, Models, Theoretical, Thrombosis/genetics
12.
J Cancer Policy ; 39: 100460, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38061493

ABSTRACT

In India, the cancer burden for 2021 was 26.7 million disability-adjusted life years (DALYs), and this is expected to increase to 29.8 million in 2025 (Kulothungan et al., 2022). According to the World Health Organisation (WHO), cancer is a leading cause of death worldwide, accounting for one in six deaths. As per the WHO, palliative care is a strategy that assists both adults and children, along with their families, in dealing with life-threatening illnesses. Currently, only 14% of those in need of pain and palliative (P&P) care receive it globally (WHO, 2020). Financial toxicity (FT) is the term used to describe the negative effects that an excessive financial burden resulting from cancer has on patients, their families, and society (Desai and Gyawali, 2020). Addressing this gap will require significant adjustments to both demand- and supply-side policies to ensure accessible and equitable cancer care in India (Caduff et al., 2019). Measuring FT along with health-related quality of life (HRQoL) represents a clinically relevant and patient-centred approach (de Souza et al., 2017). AIM AND OBJECTIVE: To estimate FT and its association with quality of life (QoL). MATERIALS AND METHODS: This was an observational descriptive study conducted among cancer patients recommended for P&P care. Scores were estimated from September 2022 to February 2023 using official tools: the Functional Assessment of Chronic Illness Therapy Comprehensive Score for Financial Toxicity (FACIT-COST) and the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire (QLQ-C30). RESULTS: Of 150 patients (70 males and 80 females, mean age 54.96 ± 13.5 years), 92.6% experienced FT. Eleven patients (7.3%) had FT grade 0, 41 (27.3%) had FT grade 1, 98 (65.3%) had FT grade 2, and no patients had FT grade 3. At a critical alpha of 0.05 (95% CI), FT and the global score for HRQoL showed an association. Among inpatient department (IPD) expenses, medication bills contributed the greatest share at 33%, and among outpatient department (OPD) expenses, treatment expenses contributed 50% of the total. Breast cancer (30 cases, 20%) and oral cancer (26 cases, 17.3%) were the most frequent cancers. CONCLUSION: FT measured using the COST tool showed an association with HRQoL. POLICY SUMMARY: This paper refers to the insurance policies available for cancer patients irrespective of P&P care treatment.


Subject(s)
Breast Neoplasms, Quality of Life, Male, Adult, Child, Female, Humans, Middle Aged, Aged, Financial Stress, Palliative Care/methods, Pain Management, Pain
13.
Health Justice ; 12(1): 11, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38472497

ABSTRACT

BACKGROUND: Currently, there are more than two million people in prisons or jails, with nearly two-thirds meeting the criteria for a substance use disorder. Following these patterns, overdose is the leading cause of death following release from prison and the third leading cause of death during periods of incarceration in jails. Traditional quantitative methods analyzing the factors associated with overdose following incarceration may fail to capture structural and environmental factors present in specific communities. People with lived experiences in the criminal legal system and with substance use disorder hold unique perspectives and must be involved in the research process. OBJECTIVE: To identify perceived factors that impact overdose following release from incarceration among people with direct criminal legal involvement and experience with substance use. METHODS: Within a community-engaged approach to research, we used concept mapping to center the perspectives of people with personal experience with the carceral system. The following prompt guided our study: "What do you think are some of the main things that make people who have been in jail or prison more and less likely to overdose?" Individuals participated in three rounds of focus groups, which included brainstorming, sorting and rating, and community interpretation. We used the Concept Systems Inc. platform groupwisdom for our analyses and constructed cluster maps. RESULTS: Eight individuals (ages 33 to 53) from four states participated. The brainstorming process resulted in 83 unique factors that impact overdose. The concept mapping process resulted in five clusters: (1) Community-Based Prevention, (2) Drug Use and Incarceration, (3) Resources for Treatment for Substance Use, (4) Carceral Factors, and (5) Stigma and Structural Barriers. CONCLUSIONS: Our study provides critical insight into community-identified factors associated with overdose following incarceration. These factors should be accounted for during resource planning and decision-making.

14.
Ann Epidemiol ; 94: 81-90, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38710239

ABSTRACT

PURPOSE: Identifying predictors of opioid overdose following release from prison is critical for opioid overdose prevention. METHODS: We leveraged an individually linked, state-wide database from 2015-2020 to predict the risk of opioid overdose within 90 days of release from Massachusetts state prisons. We developed two decision tree modeling schemes: a model fit on all individuals with a single weight for those that experienced an opioid overdose and models stratified by race/ethnicity. We compared the performance of each model using several performance measures and identified factors that were most predictive of opioid overdose within racial/ethnic groups and across models. RESULTS: We found that out of 44,246 prison releases in Massachusetts between 2015-2020, 2237 (5.1%) resulted in opioid overdose in the 90 days following release. The performance of the two predictive models varied. The single weight model had high sensitivity (79%) and low specificity (56%) for predicting opioid overdose and was more sensitive for White non-Hispanic individuals (sensitivity = 84%) than for racial/ethnic minority individuals. CONCLUSIONS: Stratified models had better balanced performance metrics for both White non-Hispanic and racial/ethnic minority groups and identified different predictors of overdose between racial/ethnic groups. Across racial/ethnic groups and models, involuntary commitment (involuntary treatment for alcohol/substance use disorder) was an important predictor of opioid overdose.
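
The two modeling schemes can be sketched in Python/scikit-learn as below: a single decision tree with upweighted overdose cases versus separate trees per group, each evaluated by sensitivity and specificity. The data, group indicator, tree depth, and class weights are synthetic stand-ins, not the linked state-wide database or the study's tuned models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def sens_spec(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp / (tp + fn), tn / (tn + fp)

# Toy data standing in for post-release records; `group` mimics a
# race/ethnicity indicator used to stratify the second modeling scheme.
X, y = make_classification(n_samples=20_000, n_features=15, weights=[0.95],
                           random_state=0)
group = np.random.default_rng(8).integers(0, 2, size=len(y))
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, stratify=y, random_state=0)

# Scheme 1: one model, upweighting the rare overdose class.
single = DecisionTreeClassifier(max_depth=5, class_weight={0: 1, 1: 10},
                                random_state=0).fit(X_tr, y_tr)
print("single-model sens/spec:", sens_spec(y_te, single.predict(X_te)))

# Scheme 2: separate models per group, evaluated within each group.
for g in (0, 1):
    m = DecisionTreeClassifier(max_depth=5, class_weight={0: 1, 1: 10},
                               random_state=0).fit(X_tr[g_tr == g],
                                                   y_tr[g_tr == g])
    print(f"group {g} sens/spec:",
          sens_spec(y_te[g_te == g], m.predict(X_te[g_te == g])))
```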


Subject(s)
Decision Trees, Opiate Overdose, Humans, Male, Opiate Overdose/epidemiology, Adult, Female, Massachusetts/epidemiology, Opioid-Related Disorders/epidemiology, Opioid-Related Disorders/ethnology, Prisoners/statistics & numerical data, Prisons/statistics & numerical data, Middle Aged, Analgesics, Opioid/poisoning, Analgesics, Opioid/adverse effects, Ethnicity/statistics & numerical data, Young Adult
15.
bioRxiv ; 2023 Jan 30.
Article in English | MEDLINE | ID: mdl-36711818

ABSTRACT

Rationale: Many blood-based transcriptional gene signatures for tuberculosis (TB) have been developed with potential use to diagnose disease, predict risk of progression from infection to disease, and monitor TB treatment outcomes. However, an unresolved issue is whether gene set enrichment analysis (GSEA) of the signature transcripts alone is sufficient for prediction and differentiation, or whether it is necessary to use the original statistical model created when the signature was derived. Intra-method comparison is complicated by the unavailability of original training data, missing details about the original trained model, and inadequate publicly available software tools or source code implementing models. To facilitate these signatures' replicability and appropriate utilization in TB research, comprehensive comparisons between gene set scoring methods with cross-data validation of original model implementations are needed. Objectives: We compared the performance of 19 TB gene signatures across 24 transcriptomic datasets using both rebuilt original models and gene set scoring methods to evaluate whether gene set scoring is a reasonable proxy to the performance of the original trained model. We have provided an open-access software implementation of the original models for all 19 signatures for future use. Methods: We considered existing gene set scoring and machine learning methods, including ssGSEA, GSVA, PLAGE, Singscore, and Zscore, as alternative approaches to profile gene signature performance. The sample-size-weighted mean area under the curve (AUC) value was computed to measure each signature's performance across datasets. Correlation analysis and Wilcoxon paired tests were used to analyze the performance of enrichment methods with the original models. Measurement and Main Results: For many signatures, the predictions from gene set scoring methods were highly correlated and statistically equivalent to the results given by the original diagnostic models. PLAGE outperformed all other gene scoring methods. In some cases, PLAGE outperformed the original models when considering signatures' weighted mean AUC values and the AUC results within individual studies. Conclusion: Gene set enrichment scoring of existing blood-based biomarker gene sets can distinguish patients with active TB disease from latent TB infection and other clinical conditions with equivalent or improved accuracy compared to the original methods and models. These data justify using gene set scoring methods of published TB gene signatures for predicting TB risk and treatment outcomes, especially when original models are difficult to apply or implement.

16.
Clin Kidney J ; 16(1): 90-99, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36726432

ABSTRACT

Background: Protein biomarkers may provide insight into kidney disease pathology but their use for the identification of phenotypically distinct kidney diseases has not been evaluated. Methods: We used unsupervised hierarchical clustering on 225 plasma biomarkers in 541 individuals enrolled into the Boston Kidney Biopsy Cohort, a prospective cohort study of individuals undergoing kidney biopsy with adjudicated histopathology. Using principal component analysis, we studied biomarker levels by cluster and examined differences in clinicopathologic diagnoses and histopathologic lesions across clusters. Cox proportional hazards models tested associations of clusters with kidney failure and death. Results: We identified three biomarker-derived clusters. The mean estimated glomerular filtration rate was 72.9 ± 28.7, 72.9 ± 33.4 and 39.9 ± 30.4 mL/min/1.73 m2 in Clusters 1, 2 and 3, respectively. The top-contributing biomarker in Cluster 1 was AXIN, a negative regulator of the Wnt signaling pathway. The top-contributing biomarker in Clusters 2 and 3 was Placental Growth Factor, a member of the vascular endothelial growth factor family. Compared with Cluster 1, individuals in Cluster 3 were more likely to have tubulointerstitial disease (P < .001) and diabetic kidney disease (P < .001) and had more severe mesangial expansion [odds ratio (OR) 2.44, 95% confidence interval (CI) 1.29, 4.64] and inflammation in the fibrosed interstitium (OR 2.49 95% CI 1.02, 6.10). After multivariable adjustment, Cluster 3 was associated with higher risks of kidney failure (hazard ratio 3.29, 95% CI 1.37, 7.90) compared with Cluster 1. Conclusion: Plasma biomarkers may identify clusters of individuals with kidney disease that associate with different clinicopathologic diagnoses, histopathologic lesions and adverse outcomes, and may uncover biomarker candidates and relevant pathways for further study.
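
A schematic Python version of the clustering step (standardize the biomarker matrix, apply agglomerative hierarchical clustering, and inspect clusters on principal components) is shown below with simulated data. The number of clusters, linkage choice, and data are illustrative assumptions; the downstream survival analysis (Cox models) is not reproduced here.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy stand-in for a plasma-biomarker matrix: individuals x 225 biomarkers,
# simulated with three latent groups.
rng = np.random.default_rng(9)
centers = rng.normal(scale=2.0, size=(3, 225))
X = np.vstack([rng.normal(loc=c, size=(180, 225)) for c in centers])

Xz = StandardScaler().fit_transform(X)

# Unsupervised hierarchical (agglomerative) clustering into three clusters.
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(Xz)

# PCA to inspect how biomarker levels separate by cluster.
pcs = PCA(n_components=2).fit_transform(Xz)
for k in range(3):
    print(f"cluster {k}: n={np.sum(labels == k)}, "
          f"mean PC1={pcs[labels == k, 0].mean():.2f}")
```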

17.
Ann Epidemiol ; 80: 62-68.e3, 2023 04.
Article in English | MEDLINE | ID: mdl-36822278

ABSTRACT

PURPOSE: When studying health risks across a large geographic region such as a state or province, researchers often assume that finer-resolution data on health outcomes and risk factors will improve inferences by avoiding ecological bias and other issues associated with geographic aggregation. However, coarser-resolution data (e.g., at the town or county-level) are more commonly publicly available and packaged for easier access, allowing for rapid analyses. The advantages and limitations of using finer-resolution data, which may improve precision at the cost of time spent gaining access and processing data, have not been considered in detail to date. METHODS: We systematically examine the implications of conducting town-level mixed-effect regression analyses versus census-tract-level analyses to study sociodemographic predictors of COVID-19 in Massachusetts. In a series of negative binomial regressions, we vary the spatial resolution of the outcome, the resolution of variable selection, and the resolution of the random effect to allow for more direct comparison across models. RESULTS: We find stability in some estimates across scenarios, changes in magnitude, direction, and significance in others, and tighter confidence intervals on the census-tract level. Conclusions regarding sociodemographic predictors are robust when regions of high concentration remain consistent across town and census-tract resolutions. CONCLUSIONS: Inferences about high-risk populations may be misleading if derived from town- or county-resolution data, especially for covariates that capture small subgroups (e.g., small racial minority populations) or are geographically concentrated or skewed (e.g., % college students). Our analysis can help inform more rapid and efficient use of public health data by identifying when finer-resolution data are truly most informative, or when coarser-resolution data may be misleading.


Subject(s)
COVID-19, Humans, COVID-19/epidemiology, Massachusetts/epidemiology, Risk Factors, Students, Regression Analysis
18.
J Racial Ethn Health Disparities ; 10(4): 2071-2080, 2023 08.
Article in English | MEDLINE | ID: mdl-36056195

ABSTRACT

Infectious disease surveillance frequently lacks complete information on race and ethnicity, making it difficult to identify health inequities. Greater awareness of this issue has occurred due to the COVID-19 pandemic, during which inequities in cases, hospitalizations, and deaths were reported but with evidence of substantial missing demographic details. Although the problem of missing race and ethnicity data in COVID-19 cases has been well documented, neither its spatiotemporal variation nor its particular drivers have been characterized. Using individual-level data on confirmed COVID-19 cases in Massachusetts from March 2020 to February 2021, we show how missing race and ethnicity data: (1) varied over time, appearing to increase sharply during two different periods of rapid case growth; (2) differed substantially between towns, indicating a nonrandom distribution; and (3) was associated significantly with several individual- and town-level characteristics in a mixed-effects regression model, suggesting a combination of personal and infrastructural drivers of missing data that persisted despite state and federal data-collection mandates. We discuss how a variety of factors may contribute to persistent missing data but could potentially be mitigated in future contexts.


Subject(s)
COVID-19, Ethnicity, Humans, Pandemics, Racial Groups, Massachusetts/epidemiology
19.
PLoS Comput Biol ; 7(8): e1002147, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21901085

ABSTRACT

In this overview to biomedical computing in the cloud, we discussed two primary ways to use the cloud (a single instance or cluster), provided a detailed example using NGS mapping, and highlighted the associated costs. While many users new to the cloud may assume that entry is as straightforward as uploading an application and selecting an instance type and storage options, we illustrated that there is substantial up-front effort required before an application can make full use of the cloud's vast resources. Our intention was to provide a set of best practices and to illustrate how those apply to a typical application pipeline for biomedical informatics, but also general enough for extrapolation to other types of computational problems. Our mapping example was intended to illustrate how to develop a scalable project and not to compare and contrast alignment algorithms for read mapping and genome assembly. Indeed, with a newer aligner such as Bowtie, it is possible to map the entire African genome using one m2.2xlarge instance in 48 hours for a total cost of approximately $48 in computation time. In our example, we were not concerned with data transfer rates, which are heavily influenced by the amount of available bandwidth, connection latency, and network availability. When transferring large amounts of data to the cloud, bandwidth limitations can be a major bottleneck, and in some cases it is more efficient to simply mail a storage device containing the data to AWS (http://aws.amazon.com/importexport/). More information about cloud computing, detailed cost analysis, and security can be found in references.
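
The quoted cost figure is simple arithmetic worth making explicit: 48 hours of compute at roughly $1 per hour (an assumed on-demand rate for an m2.2xlarge at the time) gives about $48, excluding storage and data-transfer charges.

```python
# Back-of-the-envelope check of the quoted figure; the hourly rate is an
# assumed approximation of the era's on-demand price, not an exact quote.
hours = 48
hourly_rate_usd = 1.00
print(f"compute cost ≈ ${hours * hourly_rate_usd:.2f}")
```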


Subject(s)
Information Storage and Retrieval/methods, Internet, Software, Computational Biology, Computer Security, Information Storage and Retrieval/economics
20.
Ann Appl Stat ; 16(4): 2145-2165, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36274786

ABSTRACT

We propose the "study strap ensemble", which combines advantages of two common approaches to fitting prediction models when multiple training datasets ("studies") are available: pooling studies and fitting one model versus averaging predictions from multiple models each fit to individual studies. The study strap ensemble fits models to bootstrapped datasets, or "pseudo-studies." These are generated by resampling from multiple studies with a hierarchical resampling scheme that generalizes the randomized cluster bootstrap. The study strap is controlled by a tuning parameter that determines the proportion of observations to draw from each study. When the parameter is set to its lowest value, each pseudo-study is resampled from only a single study. When it is high, the study strap ignores the multi-study structure and generates pseudo-studies by merging the datasets and drawing observations like a standard bootstrap. We empirically show the optimal tuning value often lies in between, and prove that special cases of the study strap draw the merged dataset and the set of original studies as pseudo-studies. We extend the study strap approach with an ensemble weighting scheme that utilizes information in the distribution of the covariates of the test dataset. Our work is motivated by neuroscience experiments using real-time neurochemical sensing during awake behavior in humans. Current techniques to perform this kind of research require measurements from an electrode placed in the brain during awake neurosurgery and rely on prediction models to estimate neurotransmitter concentrations from the electrical measurements recorded by the electrode. These models are trained by combining multiple datasets that are collected in vitro under heterogeneous conditions in order to promote accuracy of the models when applied to data collected in the brain. A prevailing challenge is deciding how to combine studies or ensemble models trained on different studies to enhance model generalizability. Our methods produce marked improvements in simulations and in this application. All methods are available in the studyStrap CRAN package.
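
The hierarchical resampling at the heart of the study strap can be sketched as follows: draw studies with replacement, then resample observations from the drawn studies to form a pseudo-study. In this Python illustration a `bag_size` parameter plays the role of the tuning parameter (one study per pseudo-study at its lowest value, approximately a merged-data bootstrap when it is large); this is a simplified reading for illustration, not the studyStrap package's implementation.

```python
import numpy as np
import pandas as pd

def study_strap_pseudo_study(studies, bag_size, rng):
    """Hierarchical resampling: draw `bag_size` studies with replacement, then
    resample observations from each drawn study. With bag_size=1 a pseudo-study
    comes from a single study; as bag_size grows the draw approaches a standard
    bootstrap of the merged data."""
    picks = rng.integers(0, len(studies), size=bag_size)
    n_total = sum(len(s) for s in studies)
    n_draw = n_total // bag_size
    parts = []
    for i in picks:
        idx = rng.integers(0, len(studies[i]), size=n_draw)
        parts.append(studies[i].iloc[idx])
    return pd.concat(parts, ignore_index=True)

# Synthetic example: three studies with shifted feature means.
rng = np.random.default_rng(10)
studies = [pd.DataFrame({"x": rng.normal(loc=mu, size=100), "study": k})
           for k, mu in enumerate([0.0, 1.0, 2.0])]
pseudo = study_strap_pseudo_study(studies, bag_size=2, rng=rng)
print(pseudo["study"].value_counts())
```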
