Results 1 - 17 of 17
1.
Cancer ; 130(7): 1101-1111, 2024 04 01.
Article in English | MEDLINE | ID: mdl-38100619

ABSTRACT

BACKGROUND: Many parents of children with advanced cancer report curative goals and continue intensive therapies that can compound symptoms and suffering. The factors that lead parents to choose palliation as the primary treatment goal are not well understood. The objective of this study was to examine experiences that influence parents' report of palliative goals, adjusted for time. The authors hypothesized that awareness of poor prognosis, recall of oncologists' prognostic disclosure, intensive treatments, and burdensome symptoms and suffering would influence palliative goal-setting.
METHODS: The authors collected prospective, longitudinal surveys from parents of children with relapsed/refractory neuroblastoma at nine pediatric cancer centers across the United States, beginning at relapse and continuing every 3 months for 18 months or until death. Hypothesized covariates were examined for possible associations with parental report of palliative goals. Generalized linear mixed models were used to evaluate factors associated with parents' report of palliative goals at different time points.
RESULTS: A total of 96 parents completed surveys. Parents were more likely to report a primary goal of palliation when they recalled communication about prognosis by their child's oncologist (odds ratio [OR], 52.48; p = .010). Treatment intensity and previous ineffective therapeutic regimens were not associated with parents' report of palliative goals adjusted for time. A parent who reported new suffering for their child was less likely to report palliative goals (OR, 0.13; p = .008).
CONCLUSIONS: Parents of children with poor-prognosis cancer may not report palliative goals spontaneously in the setting of treatment-related suffering. Prognostic communication, however, does influence palliative goal-setting. Evidence-based interventions are needed to encourage timely, person-centered prognostic disclosure in the setting of advanced pediatric cancer.
PLAIN LANGUAGE SUMMARY: Many parents of children with poor-prognosis cancer continue to pursue curative treatments that may worsen symptoms and suffering. Little is known about which factors influence parents to choose palliative care as their child's main treatment goal. To explore this question, we asked parents of children with advanced neuroblastoma across the United States to complete multiple surveys over time. We found that the intensity of treatment, number of treatments, and suffering from treatment did not influence parents to choose palliative goals. However, when parents remembered their child's oncologist talking about prognosis, they were more likely to choose palliative goals of care.


Subject(s)
Neuroblastoma , Palliative Care , Child , Humans , Goals , Prospective Studies , Neoplasm Recurrence, Local/therapy , Neuroblastoma/therapy , Parents , Surveys and Questionnaires , Longitudinal Studies
2.
Stat Med ; 42(17): 3032-3049, 2023 07 30.
Article in English | MEDLINE | ID: mdl-37158137

ABSTRACT

Longitudinal outcomes are prevalent in clinical studies, where the presence of missing data can make the statistical learning of individualized treatment rules (ITRs) a much more challenging task. We analyzed a longitudinal calcium supplementation trial in the ELEMENT Project and established a novel ITR to reduce the risk of adverse outcomes of lead exposure on child growth and development. Lead exposure, particularly in utero exposure, can seriously impair children's health, especially their cognitive and neurobehavioral development, which necessitates clinical interventions such as calcium supplementation during pregnancy. Using the longitudinal outcomes from a randomized clinical trial of calcium supplementation, we developed a new ITR for daily calcium intake during pregnancy to mitigate persistent lead exposure in children at age 3 years. To overcome the technical challenges posed by missing data, we illustrate a new learning approach, termed longitudinal self-learning (LS-learning), that utilizes longitudinal measurements of children's blood lead concentration in the derivation of the ITR. Our LS-learning method relies on a temporally weighted self-learning paradigm to synergize serially correlated training data sources. The resulting ITR is the first of its kind in precision nutrition and would contribute to a reduction in expected blood lead concentration in children aged 0-3 years should it be implemented in the entire study population of pregnant women.


Subject(s)
Calcium , Lead , Child , Humans , Pregnancy , Female , Child, Preschool , Learning , Dietary Supplements , Nutrients
3.
J Biopharm Stat ; 32(1): 90-106, 2022 01 02.
Article in English | MEDLINE | ID: mdl-34632951

ABSTRACT

In current clinical trial development, historical information is receiving increased attention, as it provides utility beyond sample size calculation. Meta-analytic-predictive (MAP) priors and robust MAP priors have been proposed for prospectively borrowing historical data on a single endpoint. To simultaneously synthesize control information from multiple endpoints in confirmatory clinical trials, we propose to approximate posterior probabilities from a Bayesian hierarchical model and estimate critical values by deep learning to construct pre-specified strategies for hypothesis testing. This feature is important for ensuring study integrity by establishing prospective decision functions before trial conduct. Simulations show that our method properly controls the family-wise error rate and preserves power compared with the typical practice of choosing constant critical values over a subset of the null space. Satisfactory performance under prior-data conflict is also demonstrated. We further illustrate our method using a case study in immunology.
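The historical-borrowing idea underlying MAP priors can be sketched, for a single normal endpoint, as a conjugate normal-normal update in which a historical control summary serves as the prior. The prior and sampling parameters below are hypothetical, and this sketch omits the paper's multi-endpoint hierarchical model and deep-learning-based critical values:

```python
import numpy as np
from scipy.stats import norm

def posterior_control_mean(y, sigma, prior_mean, prior_sd):
    """Conjugate normal-normal update: combine a historical-control
    prior N(prior_mean, prior_sd^2) with current control data y,
    assuming a known outcome standard deviation sigma."""
    n = len(y)
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * np.mean(y)) / post_prec
    return post_mean, np.sqrt(1.0 / post_prec)

# Hypothetical setting: historical trials suggest a control mean near 0.0
# (sd 0.5); the current trial observes 20 control subjects.
rng = np.random.default_rng(1)
y = rng.normal(0.2, 1.0, size=20)
m, s = posterior_control_mean(y, sigma=1.0, prior_mean=0.0, prior_sd=0.5)
# Posterior probability that the control mean exceeds 0
p_gt0 = 1.0 - norm.cdf(0.0, loc=m, scale=s)
```

Borrowing shrinks the posterior toward the historical mean and tightens it relative to using either source alone; a robust MAP prior would additionally mix in a vague component to guard against the prior-data conflict the simulations examine.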


Subject(s)
Research Design , Bayes Theorem , Computer Simulation , Humans , Probability , Sample Size
4.
PLoS Comput Biol ; 16(4): e1007768, 2020 04.
Article in English | MEDLINE | ID: mdl-32302299

ABSTRACT

Mediation analysis with high-dimensional DNA methylation markers is important for identifying epigenetic pathways between environmental exposures and health outcomes. Although there have been some methodological developments in mediation analysis with high-dimensional mediators, high-dimensional mediation analysis methods for time-to-event outcomes have yet to be developed. To address these challenges, we propose a new high-dimensional mediation analysis procedure for survival models that incorporates sure independence screening and minimax concave penalty techniques for variable selection, with the Sobel test and the joint significance test for the indirect effect. Simulation studies show that the proposed procedure performs well in identifying correct biomarkers, controlling the false discovery rate, and minimizing estimation bias. We also apply this approach to study the causal pathway from smoking to overall survival among lung cancer patients, potentially mediated by 365,307 DNA methylation sites in the TCGA lung cancer cohort. Mediation analysis using a Cox proportional hazards model estimates that patients with a serious smoking history have increased mortality risk mediated through methylation markers including cg21926276, cg27042065, and cg26387355, with significant hazard ratios of 1.2497 (95% CI: 1.1121, 1.4045), 1.0920 (95% CI: 1.0170, 1.1726), and 1.1489 (95% CI: 1.0518, 1.2550), respectively. The three methylation sites are located in three genes that have been shown to be associated with lung cancer events or overall survival. The three CpG sites themselves (cg21926276, cg27042065, and cg26387355), however, have not previously been reported and are newly identified as potential novel epigenetic markers linking smoking and survival in lung cancer patients. Collectively, the proposed high-dimensional mediation analysis procedure performs well in mediator selection and indirect effect estimation.


Subject(s)
Computational Biology/methods , Models, Statistical , Survival Analysis , Adult , Aged , Aged, 80 and over , DNA Methylation/genetics , Epigenomics , Humans , Lung Neoplasms/genetics , Lung Neoplasms/mortality , Middle Aged , Smoking/genetics , Smoking/mortality
5.
Biometrics ; 77(4): 1254-1264, 2021 12.
Article in English | MEDLINE | ID: mdl-32918486

ABSTRACT

One central task in precision medicine is to establish individualized treatment rules (ITRs) for patients with heterogeneous responses to different therapies. Motivated by a randomized clinical trial comparing two drugs, pioglitazone and gliclazide, in Type 2 diabetic patients, we consider the problem of utilizing promising candidate biomarkers to improve an existing ITR. This calls for a biomarker evaluation procedure that makes it possible to gauge the added value of individual biomarkers. We propose an assessment analytic, termed the net benefit index (NBI), that quantifies the contrast between the gain and loss of treatment benefits when a biomarker enters the ITR and reallocates patients between treatments. We optimize reallocation schemes via outcome weighted learning (OWL), in which the optimal treatment group labels are generated by a weighted support vector machine (SVM). To account for sampling uncertainty in assessing a biomarker, we propose an NBI-based test for a significant improvement over the existing ITR, where the empirical null distribution is constructed via stratified permutation by treatment arm. Applying NBI to the motivating diabetes trial, we found that baseline fasting insulin is an important biomarker that leads to an improvement over an existing ITR based only on a patient's baseline fasting plasma glucose (FPG), age, and body mass index (BMI) to reduce FPG over a period of 52 weeks.
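Outcome weighted learning with a weighted SVM can be sketched as follows: the observed (randomized) treatments are used as class labels, and each subject is weighted by outcome divided by treatment propensity, so the classifier is pushed to reproduce treatment choices that led to good outcomes. The simulated trial below, with one biomarker and a 0.5 propensity, is illustrative only and omits the paper's NBI statistic and stratified permutation test:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
biomarker = rng.normal(size=(n, 1))
treat = rng.choice([-1, 1], size=n)      # randomized 1:1, propensity 0.5
# Benefit occurs when the assigned treatment matches the biomarker's sign
reward = 1.0 + treat * np.sign(biomarker[:, 0]) + 0.3 * rng.normal(size=n)

# OWL: classify observed treatments, weighting each subject by reward/propensity
weights = np.clip(reward, 1e-3, None) / 0.5
itr = SVC(kernel="linear").fit(biomarker, treat, sample_weight=weights)

recommend = itr.predict(biomarker)
agreement = np.mean(recommend == np.sign(biomarker[:, 0]))
```

In this simulation the learned rule largely agrees with the oracle rule "treat when the biomarker is positive", which is exactly the kind of biomarker-driven reallocation the NBI is designed to score.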


Subject(s)
Diabetes Mellitus, Type 2 , Precision Medicine , Biomarkers , Diabetes Mellitus, Type 2/drug therapy , Humans , Hypoglycemic Agents/therapeutic use , Learning , Machine Learning , Precision Medicine/methods , Research Design
6.
Int Stat Rev ; 88(2): 462-513, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32834402

ABSTRACT

Multi-compartment models have played a central role in modelling infectious disease dynamics since the early 20th century. They are a class of mathematical models widely used to describe the mechanism of an evolving epidemic. Integrated with certain sampling schemes, such mechanistic models can be applied to analyse public health surveillance data, for example to assess the effectiveness of preventive measures (e.g. social distancing and quarantine) and to forecast disease spread patterns. This review begins with a nationwide macromechanistic model and related statistical analyses, including model specification, estimation, inference and prediction. It then presents a community-level micromodel that enables high-resolution analyses of regional surveillance data to provide current and future risk information useful for local governments and residents making decisions on reopening local businesses and personal travel. R software and scripts are provided whenever appropriate to illustrate the numerical details of algorithms and calculations. Coronavirus disease 2019 pandemic surveillance data from the state of Michigan are used for illustration throughout this paper.
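A minimal discrete-time SIR compartment model illustrates the mechanism such reviews build on: new infections scale with the product of the susceptible and infected fractions, and a preventive measure is modeled as a reduction of the transmission rate. The parameter values are illustrative only (the review itself provides R scripts; Python is used here):

```python
import numpy as np

def sir(beta, gamma, s0, i0, days):
    """Discrete-time SIR model tracking (susceptible, infected, removed)
    population fractions over the given number of days."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    traj = [(s, i, r)]
    for _ in range(days):
        new_inf = beta * s * i      # transmission
        new_rem = gamma * i         # recovery/removal
        s, i, r = s - new_inf, i + new_inf - new_rem, r + new_rem
        traj.append((s, i, r))
    return np.array(traj)

# R0 = beta/gamma = 2.5 unmitigated; a preventive measure that halves
# the transmission rate flattens the epidemic curve
unmitigated = sir(beta=0.5, gamma=0.2, s0=0.999, i0=0.001, days=365)
mitigated = sir(beta=0.25, gamma=0.2, s0=0.999, i0=0.001, days=365)
peak_u = unmitigated[:, 1].max()
peak_m = mitigated[:, 1].max()
```

The compartments always sum to one, and halving the transmission rate sharply lowers the peak infected fraction, which is the quantity surveillance-driven intervention analyses typically target.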

7.
J Virol ; 89(17): 8855-70, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26085163

ABSTRACT

When expressed alone at high levels, the human adenovirus E4orf4 protein exhibits tumor cell-specific, p53-independent toxicity. A major E4orf4 target is the B55 class of PP2A regulatory subunits, and we have shown recently that binding of E4orf4 inhibits PP2A(B55) phosphatase activity in a dose-dependent fashion by preventing access of substrates (M. Z. Mui et al., PLoS Pathog 9:e1003742, 2013, http://dx.doi.org/10.1371/journal.ppat.1003742). While interaction with B55 subunits is essential for toxicity, E4orf4 mutants exist that, despite binding B55 at high levels, are defective in cell killing, suggesting that other essential targets exist. In an attempt to identify additional targets, we undertook a proteomics approach to characterize E4orf4-interacting proteins. Our findings indicated that, in addition to PP2A(B55) subunits, ASPP-PP1 complex subunits were found among the major E4orf4-binding species. Both the PP2A and ASPP-PP1 phosphatases are known to positively regulate effectors of the Hippo signaling pathway, which controls the expression of cell growth/survival genes by dephosphorylating the YAP transcriptional coactivator. We find here that expression of E4orf4 results in hyperphosphorylation of YAP, suggesting that Hippo signaling is affected by E4orf4 interactions with PP2A(B55) and/or ASPP-PP1 phosphatases. Furthermore, knockdown of YAP1 expression was seen to enhance E4orf4 killing, again consistent with a link between E4orf4 toxicity and inhibition of the Hippo pathway. This effect may in fact contribute to the cancer cell specificity of E4orf4 toxicity, as many human cancer cells rely heavily on the Hippo pathway for their enhanced proliferation.
IMPORTANCE: The human adenovirus E4orf4 protein has been known for some time to induce tumor cell-specific death when expressed at high levels; thus, knowledge of its mode of action could be of importance for the development of new cancer therapies. Although the B55 form of the phosphatase PP2A has long been known as an essential E4orf4 target, genetic analyses indicated that others must exist. To identify additional E4orf4 targets, we performed, for the first time, a large-scale affinity purification/mass spectrometry analysis of E4orf4 binding partners. Several additional candidates were detected, including key regulators of the Hippo signaling pathway, which enhances cell viability in many cancers, and the results of preliminary studies suggested a link between inhibition of Hippo signaling and E4orf4 toxicity.


Subject(s)
Adaptor Proteins, Signal Transducing/antagonists & inhibitors , Adaptor Proteins, Signal Transducing/genetics , Apoptosis Regulatory Proteins/antagonists & inhibitors , Phosphoproteins/genetics , Protein Phosphatase 2/antagonists & inhibitors , Viral Proteins/genetics , Adaptor Proteins, Signal Transducing/metabolism , Apoptosis Regulatory Proteins/metabolism , Cell Death/genetics , Cell Line, Tumor , Cell Proliferation/genetics , Cell Survival/genetics , HEK293 Cells , Hippo Signaling Pathway , Humans , Phosphoproteins/metabolism , Protein Binding/genetics , Protein Binding/physiology , Protein Phosphatase 2/metabolism , Protein Serine-Threonine Kinases/metabolism , RNA Interference , RNA, Small Interfering , Signal Transduction , Transcription Factors , Viral Proteins/metabolism , YAP-Signaling Proteins
8.
J Natl Cancer Inst ; 116(5): 642-646, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38273668

ABSTRACT

Data commons have proven to be an indispensable avenue for advancing pediatric cancer research by serving as unified information technology platforms that, when coupled with data standards, facilitate data sharing. The Pediatric Cancer Data Commons, the flagship project of Data for the Common Good (D4CG), collaborates with disease-based consortia to facilitate the development of clinical data standards, the harmonization and pooling of clinical data from disparate sources, the establishment of governance structures, and the sharing of clinical data. In the interest of international collaboration, researchers developed the Hodgkin Lymphoma Data Collaboration and forged a relationship with the Pediatric Cancer Data Commons to establish a data commons for pediatric Hodgkin lymphoma. Herein, we describe the progress made in the formation of the Hodgkin Lymphoma Data Collaboration and its foundational goals to advance pediatric Hodgkin lymphoma research.


Subject(s)
Hodgkin Disease , Hodgkin Disease/therapy , Humans , Child , Information Dissemination , Biomedical Research/organization & administration , Databases, Factual
9.
Blood Adv ; 8(3): 686-698, 2024 02 13.
Article in English | MEDLINE | ID: mdl-37991991

ABSTRACT

Serial prognostic evaluation after allogeneic hematopoietic cell transplantation (allo-HCT) might help identify patients at high risk of lethal organ dysfunction. Current prediction algorithms based on models that do not incorporate changes to patients' clinical condition after allo-HCT have limited predictive ability. We developed and validated a robust risk-prediction algorithm to predict short- and long-term survival after allo-HCT in pediatric patients that includes baseline biological variables and changes in the patients' clinical status after allo-HCT. The model was developed using clinical data from children and young adults treated at a single academic quaternary-care referral center. The model was created using a randomly split training data set (70% of the cohort), internally validated (remaining 30% of the cohort), and then externally validated on patient data from another tertiary-care referral center. Repeated clinical measurements performed from 30 days before allo-HCT to 30 days afterwards were extracted from the electronic medical record and incorporated into the model to predict survival at 100 days, 1 year, and 2 years after allo-HCT. Naïve Bayes machine learning models incorporating longitudinal data were significantly better than models constructed from baseline variables alone at predicting whether patients would be alive or deceased at the given time points. This proof-of-concept study demonstrates that, unlike traditional prognostic tools that use fixed variables for risk assessment, incorporating dynamic variability using clinical and laboratory data improves the prediction of mortality in patients undergoing allo-HCT.
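The value of longitudinal features over baseline-only features can be sketched with a naïve Bayes classifier on simulated data, where each patient's repeated measurements are summarized as a baseline value plus a post-transplant slope. The data-generating model and feature summaries here are hypothetical, not the study's variables:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 300
baseline = rng.normal(size=n)               # pre-transplant lab value
slope = rng.normal(size=n)                  # how the value changes afterwards
# Outcome driven mostly by the *trajectory*, not the baseline value
died = (slope + 0.3 * baseline + 0.5 * rng.normal(size=n)) > 0

static_only = baseline.reshape(-1, 1)
dynamic = np.column_stack([baseline, slope])

acc_static = cross_val_score(GaussianNB(), static_only, died, cv=5).mean()
acc_dynamic = cross_val_score(GaussianNB(), dynamic, died, cv=5).mean()
```

When the signal sits in the trajectory, the model with the longitudinal summary clearly outperforms the baseline-only model, mirroring the study's comparison of dynamic versus fixed-variable prediction.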


Subject(s)
Hematopoietic Stem Cell Transplantation , Young Adult , Humans , Child , Transplantation, Homologous/adverse effects , Bayes Theorem , Retrospective Studies , Prognosis , Hematopoietic Stem Cell Transplantation/adverse effects
10.
J Cell Biol ; 223(9)2024 Sep 02.
Article in English | MEDLINE | ID: mdl-38949658

ABSTRACT

Contact sites between lipid droplets and other organelles are essential for cellular lipid and energy homeostasis upon metabolic demands. Detection of these contact sites at the nanometer scale over time in living cells is challenging. We developed a tool kit for detecting contact sites based on fluorogen-activated bimolecular complementation at CONtact sites, FABCON, using a reversible, low-affinity split fluorescent protein, splitFAST. FABCON labels contact sites with minimal perturbation to organelle interaction. Via FABCON, we quantitatively demonstrated that endoplasmic reticulum (ER)- and mitochondria (mito)-lipid droplet contact sites are dynamic foci in distinct metabolic conditions, such as during lipid droplet biogenesis and consumption. An automated analysis pipeline further classified individual contact sites into distinct subgroups based on size, likely reflecting differential regulation and function. Moreover, FABCON is generalizable to visualize a repertoire of organelle contact sites including ER-mito. Altogether, FABCON reveals insights into the dynamic regulation of lipid droplet-organelle contact sites and generates new hypotheses for further mechanistic interrogation during metabolic regulation.


Subject(s)
Endoplasmic Reticulum , Lipid Droplets , Mitochondria , Lipid Droplets/metabolism , Humans , Endoplasmic Reticulum/metabolism , Mitochondria/metabolism , Mitochondria/genetics , Fluorescent Dyes/chemistry , Fluorescent Dyes/metabolism , Lipid Metabolism , HeLa Cells , HEK293 Cells , Luminescent Proteins/metabolism , Luminescent Proteins/genetics
11.
Cancer Chemother Pharmacol ; 92(1): 1-6, 2023 07.
Article in English | MEDLINE | ID: mdl-37199744

ABSTRACT

PURPOSE: The Stanford V chemotherapy regimen has been used to treat Hodgkin lymphoma (HL) patients since 2002 with excellent cure rates; however, mechlorethamine is no longer available. Bendamustine, a drug structurally similar to both alkylating agents and nitrogen mustards, is being substituted for mechlorethamine in combination therapy in a frontline trial for low- and intermediate-risk pediatric HL patients, forming a new backbone of BEABOVP (bendamustine, etoposide, doxorubicin, bleomycin, vincristine, vinblastine, and prednisone). This study evaluated the pharmacokinetics and tolerability of a 180 mg/m2 dose of bendamustine every 28 days and examined factors that may explain inter-individual pharmacokinetic variability.
METHODS: Bendamustine plasma concentrations were measured in 118 samples from 20 pediatric patients with low- and intermediate-risk HL who received a single-day dose of 180 mg/m2 of bendamustine. A pharmacokinetic model was fit to the data using nonlinear mixed-effects modeling.
RESULTS: Bendamustine concentration vs. time data demonstrated a trend toward decreasing clearance with increasing age (p = 0.074), and age explained 23% of the inter-individual variability in clearance. The median (range) AUC was 12,415 (8,539-18,642) µg hr/L, and the median (range) maximum concentration was 11,708 (8,034-15,741) µg/L. Bendamustine was well tolerated, with no grade 3 toxicities resulting in treatment delays of more than 7 days.
CONCLUSIONS: A single-day dose of 180 mg/m2 of bendamustine every 28 days was safe and well tolerated in pediatric patients. While age accounted for 23% of the inter-individual variability observed in bendamustine clearance, the differences did not affect the safety and tolerability of bendamustine in our patient population.
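The relationship between clearance and the reported AUC follows from basic one-compartment theory (AUC = dose/clearance) and can be checked numerically. Bendamustine is actually given as a short infusion; the IV-bolus model and all parameter values below are simplifying assumptions for illustration, not the study's estimates:

```python
import numpy as np

def conc_iv_bolus(t, dose, cl, v):
    """One-compartment IV bolus model: C(t) = (dose/V) * exp(-(CL/V) * t)."""
    return (dose / v) * np.exp(-(cl / v) * t)

dose = 180.0 * 1.2          # mg, assuming a hypothetical 1.2 m2 body surface area
cl, v = 40.0, 20.0          # illustrative clearance (L/hr) and volume (L)

t = np.linspace(0.0, 24.0, 2001)
c = conc_iv_bolus(t, dose, cl, v)

cmax = c[0]                                           # peak right after the dose
auc_trapezoid = np.sum((c[1:] + c[:-1]) / 2 * np.diff(t))   # numerical AUC
auc_closed_form = dose / cl                           # AUC = dose / clearance
```

Under this model, lower clearance (the trend observed with increasing age) directly raises AUC at a fixed dose, which is why clearance variability is the quantity of interest.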


Subject(s)
Hodgkin Disease , Humans , Child , Hodgkin Disease/drug therapy , Bendamustine Hydrochloride , Mechlorethamine/adverse effects , Neoplasm Recurrence, Local/drug therapy , Doxorubicin , Antineoplastic Combined Chemotherapy Protocols
12.
Pediatr Obes ; 17(6): e12887, 2022 06.
Article in English | MEDLINE | ID: mdl-35023314

ABSTRACT

BACKGROUND: Alterations in body composition (BC) during adolescence relate to future metabolic risk, yet the underlying mechanisms remain unclear.
OBJECTIVES: To assess the association of the metabolome with changes in adiposity (body mass index [BMI], waist circumference [WC], triceps skinfold [TS], body fat percentage [BF%]) and muscle mass (MM).
METHODS: In Mexican adolescents (n = 352), untargeted serum metabolomics was profiled at baseline, and the data were reduced by pairing hierarchical clustering with confirmatory factor analysis, yielding 30 clusters and 51 singleton metabolites. At the baseline and follow-up visits (1.6-3.5 years apart), anthropometry was collected to identify associations between baseline metabolite clusters and change in BC (∆) using seemingly unrelated and linear regression.
RESULTS: Between visits, MM increased in boys and adiposity increased in girls. Sex differences were observed between metabolite clusters and changes in BC. In boys, aromatic amino acids (AAA), branched-chain amino acids (BCAA), and fatty acid oxidation metabolites were associated with increases in ∆BMI and ∆BF%, while phospholipids were associated with decreases in ∆TS and ∆MM. Negative associations with ∆MM were observed in boys for a cluster including AAA and BCAA, whereas positive associations were found for a cluster containing tryptophan metabolites. Few associations were observed between metabolites and BC change in girls, with one cluster comprising methionine, proline, and lipids associated with decreases in ∆BMI, ∆WC, and ∆MM.
CONCLUSION: Sex-specific associations between the metabolome and change in BC were observed, highlighting metabolic pathways underlying adolescent physical growth.
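The data-reduction step of hierarchically clustering correlated metabolites can be sketched as follows; the two-pathway structure, noise level, and cluster count are all hypothetical, and the confirmatory factor analysis stage is omitted:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(7)
n_subj = 100
# Two latent "pathways" (e.g. a BCAA-like and a lipid-like factor) each
# drive a block of six correlated metabolites
f1, f2 = rng.normal(size=(2, n_subj))
block1 = f1[:, None] + 0.5 * rng.normal(size=(n_subj, 6))
block2 = f2[:, None] + 0.5 * rng.normal(size=(n_subj, 6))
X = np.hstack([block1, block2])        # subjects x 12 metabolites

# Cluster metabolites on correlation distance
corr = np.corrcoef(X.T)
dist = squareform(1.0 - np.abs(corr), checks=False)
labels = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
```

Average-linkage clustering on correlation distance recovers the two metabolite blocks, which could then each be summarized by a factor score for downstream regression against change in body composition.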


Subject(s)
Adiposity , Obesity , Adolescent , Amino Acids, Branched-Chain , Body Mass Index , Female , Humans , Male , Metabolomics , Muscles , Waist Circumference
13.
PLoS One ; 15(8): e0228520, 2020.
Article in English | MEDLINE | ID: mdl-32857775

ABSTRACT

Health advances are contingent on the continuous development of new methods and approaches to foster data-driven discovery in the biomedical and clinical sciences. Open science and team-based scientific discovery offer hope for tackling some of the difficult challenges associated with managing, modeling, and interpreting large, complex, and multisource data. Translating raw observations into useful information and actionable knowledge depends on effective domain-independent reproducibility, area-specific replicability, data curation, analysis protocols, and the organization, management, and sharing of health-related digital objects. This study expands the functionality and utility of an ensemble semi-supervised machine learning technique called Compressive Big Data Analytics (CBDA). Applied to high-dimensional data, CBDA (1) identifies salient features and key biomarkers enabling reliable and reproducible forecasting of binary, multinomial, and continuous outcomes (i.e., feature mining); and (2) suggests the most accurate algorithms/models for predictive analytics of the observed data (i.e., model mining). The method relies on iterative subsampling, combines function optimization and statistical inference, and generates ensemble predictions for observed univariate outcomes. The novelty of this study is highlighted by a new and expanded set of CBDA features, including (1) efficient handling of extremely large datasets (>100,000 cases and >1,000 features); (2) generalization of the internal and external validation steps; (3) an expanded set of base learners for joint ensemble prediction; (4) automated selection of CBDA specifications; and (5) mechanisms to assess CBDA convergence, evaluate prediction accuracy, and measure result consistency. To ground the mathematical model and the corresponding computational algorithm, CBDA 2.0 validation utilizes synthetic datasets as well as a population-wide census-like study.
Specifically, empirical validation of the CBDA technique is based on translational health research using a large-scale clinical study (UK Biobank), which includes imaging, cognitive, and clinical assessment data. The UK Biobank archive presents several difficult challenges related to the aggregation, harmonization, modeling, and interrogation of the information. These problems stem from the complex longitudinal structure, variable heterogeneity, feature multicollinearity, incongruency, and missingness of the data, as well as violations of classical parametric assumptions. Our results show the scalability, efficiency, and usability of CBDA for distilling complex data into structured information that leads to derived knowledge and translational action. Applying CBDA 2.0 to the UK Biobank case study allows the prediction of various outcomes of interest, e.g., mood disorders and irritability, and suggests new and exciting avenues of evidence-based research in the context of identifying, tracking, and treating mental health and aging-related diseases. Following open-science principles, we share the entire end-to-end protocol, source code, and results, facilitating independent validation, result reproducibility, and team-based collaborative discovery.


Subject(s)
Data Mining/methods , Data Science/methods , Algorithms , Big Data , Data Compression , Humans , Machine Learning , Meta-Analysis as Topic , Models, Theoretical , Physical Phenomena , Prognosis , Reproducibility of Results , Software
14.
Harv Data Sci Rev ; 2020(Suppl 1)2020.
Article in English | MEDLINE | ID: mdl-32607504

ABSTRACT

With only 536 cases and 11 fatalities, India took the historic decision of a 21-day national lockdown on March 25, 2020. The lockdown was first extended to May 3 soon after the analysis of this paper was completed, and then to May 18 while this paper was being revised. In this paper, we use a Bayesian extension of the Susceptible-Infected-Removed model (eSIR) designed for intervention forecasting to study the short- and long-term impact of an initial 21-day lockdown on the total number of COVID-19 infections in India compared with other, less severe non-pharmaceutical interventions. We compare the effects of hypothetical durations of lockdown on reducing the number of active and new infections. We find that the lockdown, if implemented correctly, can reduce the total number of cases in the short term and buy India invaluable time to prepare its healthcare and disease-monitoring systems. Our analysis shows that some measures of suppression need to remain in place after the lockdown for increased benefit (as measured by a reduction in the number of cases). A longer lockdown of 42-56 days is preferable for substantially "flattening the curve" compared with 21-28 days of lockdown. Our models focus solely on projecting the number of COVID-19 infections and thus inform policymakers about one aspect of this multi-faceted decision-making problem. We conclude with a discussion of the pivotal role of increased testing, reliable and transparent data, proper uncertainty quantification, accurate interpretation of forecasting models, and reproducible data science methods and tools that can enable data-driven policymaking during a pandemic. Our software products are available at covind19.org.

15.
Sci Rep ; 9(1): 6012, 2019 04 12.
Article in English | MEDLINE | ID: mdl-30979917

ABSTRACT

The UK Biobank is a rich national health resource that provides enormous opportunities for international researchers to examine, model, and analyze census-like multisource healthcare data. The archive presents several challenges related to the aggregation and harmonization of complex data elements, feature heterogeneity and salience, and health analytics. Using 7,614 imaging, clinical, and phenotypic features of 9,914 subjects, we performed deep computed phenotyping using unsupervised clustering and derived two distinct sub-cohorts. Using parametric and nonparametric tests, we determined the top 20 most salient features contributing to the cluster separation. Our approach generated decision rules to predict the presence and progression of depression and other mental illnesses by jointly representing and modeling the significant clinical and demographic variables along with the derived salient neuroimaging features. We report consistency and reliability measures of the derived computed phenotypes and of the top salient imaging biomarkers that contributed to the unsupervised clustering. This clinical decision support system holistically identified and utilized the most critical biomarkers for predicting mental health outcomes, e.g., depression. External validation of this technique on different populations may help reduce healthcare expenses and improve the processes of diagnosing, forecasting, and tracking normal and pathological aging.
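The unsupervised clustering plus feature-salience workflow can be sketched on synthetic data: derive two sub-cohorts with k-means, then rank features by a between-cluster test statistic. The dimensions and effect sizes below are hypothetical stand-ins for the UK Biobank features:

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n, p = 200, 30
X = rng.normal(size=(n, p))
# Half the subjects form a latent sub-cohort shifted on the first 3 features
X[:100, :3] += 2.0

# Unsupervised clustering into two sub-cohorts
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Rank features by how strongly they separate the derived clusters
tstats = np.abs(ttest_ind(X[clusters == 0], X[clusters == 1], axis=0).statistic)
top = np.argsort(tstats)[::-1][:3]     # most salient features
```

The ranking recovers the three features that actually drive the sub-cohort split, which is the role the "top 20 most salient features" play in the study's decision rules.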


Subject(s)
Biological Specimen Banks , Data Science , Humans
16.
PLoS One ; 13(8): e0202674, 2018.
Article in English | MEDLINE | ID: mdl-30161148

ABSTRACT

The theoretical foundations of Big Data science are not yet fully developed. This study proposes a new scalable framework for Big Data representation, high-throughput analytics (variable selection and noise reduction), and model-free inference. Specifically, we explore the core principles of distribution-free and model-agnostic methods for scientific inference based on Big Data sets. Compressive Big Data Analytics (CBDA) iteratively generates random (sub)samples from a large and complex dataset. This subsampling with replacement is conducted on both the feature and case levels and results in samples that are not necessarily consistent or congruent across iterations. The approach relies on an ensemble predictor in which established model-based or model-free inference techniques are iteratively applied to preprocessed and harmonized samples. Repeating the subsampling and prediction steps many times yields derived likelihoods, probabilities, or parameter estimates, which can be used to assess algorithm reliability and the accuracy of findings via bootstrapping methods, or to extract important features via controlled variable selection. CBDA provides a scalable algorithm for addressing some of the challenges associated with handling complex, incongruent, incomplete, and multisource data. Although not yet fully developed, a CBDA mathematical framework will enable the study of the ergodic properties and asymptotics of the specific statistical inference approaches used within CBDA. We implemented the high-throughput CBDA method using pure R as well as via the graphical pipeline environment. To validate the technique, we used several simulated datasets as well as a real neuroimaging-genetics of Alzheimer's disease case study. The CBDA approach may be customized to provide a generic representation of complex multimodal datasets and stable scientific inference for large, incomplete, and multisource datasets.
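The core CBDA loop, iterative subsampling over both cases and features followed by ensemble aggregation of which features look important, can be sketched as below. The base learner, subsample sizes, and importance threshold are illustrative choices, not the paper's specification:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.normal(size=(n, p))
# Only features 0 and 1 carry signal for the binary outcome
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=n)) > 0

counts = np.zeros(p)
n_iter = 200
for _ in range(n_iter):
    rows = rng.choice(n, size=200, replace=True)    # case-level subsample
    cols = rng.choice(p, size=10, replace=False)    # feature-level subsample
    model = LogisticRegression(max_iter=1000).fit(X[np.ix_(rows, cols)], y[rows])
    coefs = np.abs(model.coef_[0])
    # Record which features look important in this subsample
    counts[cols[coefs > 0.5]] += 1

selection_freq = counts / n_iter
top2 = np.argsort(selection_freq)[::-1][:2]
```

Aggregating selection frequencies over many incongruent subsamples is the "controlled variable selection" idea: signal features are flagged repeatedly across iterations while noise features are flagged rarely.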


Subject(s)
Algorithms , Alzheimer Disease/diagnosis , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Data Mining , Databases, Factual , Humans , Neuroimaging