1.
Nat Commun ; 14(1): 3482, 2023 Jun 15.
Article En | MEDLINE | ID: mdl-37321988

Subseasonal forecasting, that is, predicting temperature and precipitation 2 to 6 weeks ahead, is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have advanced the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skills remain poor, partly due to stubborn errors in representing atmospheric dynamics and physics inside dynamical models. Here, to counter these errors, we introduce an adaptive bias correction (ABC) method that combines state-of-the-art dynamical forecasts with observations using machine learning. We show that, when applied to the leading subseasonal model from the European Centre for Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting skill by 60-90% (over baseline skills of 0.18-0.25) and precipitation forecasting skill by 40-69% (over baseline skills of 0.11-0.15) in the contiguous U.S. We couple these performance improvements with a practical workflow to explain ABC skill gains and identify higher-skill windows of opportunity based on specific climate conditions.


Climate , Weather , Droughts , Floods , Temperature , Forecasting
2.
Proc Natl Acad Sci U S A ; 118(51)2021 12 21.
Article En | MEDLINE | ID: mdl-34903654

The COVID-19 pandemic presented enormous data challenges in the United States. Policy makers, epidemiological modelers, and health researchers all require up-to-date data on the pandemic and relevant public behavior, ideally at fine spatial and temporal resolution. The COVIDcast API is our attempt to fill this need: Operational since April 2020, it provides open access to both traditional public health surveillance signals (cases, deaths, and hospitalizations) and many auxiliary indicators of COVID-19 activity, such as signals extracted from deidentified medical claims data, massive online surveys, cell phone mobility data, and internet search trends. These are available at a fine geographic resolution (mostly at the county level) and are updated daily. The COVIDcast API also tracks all revisions to historical data, allowing modelers to account for the frequent revisions and backfill that are common for many public health data sources. All of the data are available in a common format through the API and accompanying R and Python software packages. This paper describes the data sources and signals, and provides examples demonstrating that the auxiliary signals in the COVIDcast API present information relevant to tracking COVID activity, augmenting traditional public health reporting and empowering research and decision-making.


COVID-19/epidemiology , Databases, Factual , Health Status Indicators , Ambulatory Care/trends , Epidemiologic Methods , Humans , Internet/statistics & numerical data , Physical Distancing , Surveys and Questionnaires , Travel , United States/epidemiology
3.
Article En | MEDLINE | ID: mdl-32923903

PURPOSE: Our goal was to identify the opportunities and challenges in analyzing data from the American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange (GENIE), a multi-institutional database derived from clinically driven genomic testing, at both the inter- and the intra-institutional level. Inter-institutionally, we identified genotypic differences between primary and metastatic tumors across the 3 most represented cancers in GENIE. Intra-institutionally, we analyzed the clinical characteristics of the Vanderbilt-Ingram Cancer Center (VICC) subset of GENIE to inform the interpretation of GENIE as a whole. METHODS: We performed overall cohort matching on the basis of age, ethnicity, and sex of 13,208 patients stratified by cancer type (breast, colon, or lung) and sample site (primary or metastatic). We then determined whether detected variants, at the gene level, were associated with primary or metastatic tumors. We extracted clinical data for the VICC subset from VICC's clinical data warehouse. Treatment exposures were mapped to a 13-class schema derived from the HemOnc ontology. RESULTS: Across 756 genes, there were significant differences in all cancer types. In breast cancer, ESR1 variants were over-represented in metastatic samples (odds ratio, 5.91; q < 10^-6). TP53 mutations were over-represented in metastatic samples across all cancers. VICC had a significantly different cancer type distribution than that of GENIE but patients were well matched with respect to age, sex, and sample type. Treatment data from VICC was used for a bipartite network analysis, demonstrating clusters with a mix of histologies and others being more histology specific. CONCLUSION: This article demonstrates the feasibility of deriving meaningful insights from GENIE at the inter- and intra-institutional level and illuminates the opportunities and challenges of the data GENIE contains. The results should help guide future development of GENIE, with the goal of fully realizing its potential for accelerating precision medicine.

4.
JAMA Netw Open ; 3(3): e200265, 2020 03 02.
Article En | MEDLINE | ID: mdl-32119094

Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms. Design, Setting, and Participants: In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016. Main Outcomes and Measurements: Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to cancer yes/no within 12 months. Algorithm accuracy for breast cancer detection was evaluated using area under the curve and algorithm specificity compared with radiologists' specificity with radiologists' sensitivity set at 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated. Results: Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity. Conclusions and Relevance: While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods for enhancing mammography screening interpretation.
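
The comparison above reports algorithm specificity at the radiologists' sensitivity. A minimal Python sketch of that kind of matched-sensitivity calculation follows. It is an illustration only, not the challenge's scoring code: the function name, the threshold search, and the toy data are assumptions made here.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, specificity) at the strictest threshold whose
    sensitivity reaches target_sensitivity.

    scores: higher means more suspicious. labels: 1 = cancer, 0 = no cancer.
    Hypothetical helper for illustration; not taken from the study.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    # Walk candidate thresholds from strictest to most lenient and stop at
    # the first one that flags enough of the true cancers.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            return threshold, specificity
    raise ValueError("target sensitivity is not reachable")


# Toy data (invented): eight examinations, four of them cancer positive.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_threshold, toy_specificity = specificity_at_sensitivity(
    toy_scores, toy_labels, target_sensitivity=0.75
)
```

Fixing sensitivity first and then reading off specificity puts the algorithm and the human reader on the same operating point, so a single number can be compared.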


Breast Neoplasms/diagnostic imaging , Deep Learning , Image Interpretation, Computer-Assisted/methods , Mammography/methods , Radiologists , Adult , Aged , Algorithms , Artificial Intelligence , Early Detection of Cancer , Female , Humans , Middle Aged , Radiology , Sensitivity and Specificity , Sweden , United States
5.
J Am Acad Child Adolesc Psychiatry ; 58(8): 787-798, 2019 08.
Article En | MEDLINE | ID: mdl-30768381

OBJECTIVE: Sex differences in the brain are traditionally treated as binary. We present new evidence that a continuous measure of sex differentiation of the brain can explain sex differences in psychopathology. The degree of sex-differentiated brain features (ie, features that are more common in one sex) may predispose individuals toward sex-biased psychopathology and may also be influenced by the genome. We hypothesized that individuals with a female-biased differentiation score would have greater female-biased psychopathology (internalizing symptoms, such as anxiety and depression), whereas individuals with a male-biased differentiation score would have greater male-biased psychopathology (externalizing symptoms, such as disruptive behaviors). METHOD: Using the Philadelphia Neurodevelopmental Cohort database acquired from the database of Genotypes and Phenotypes, we calculated the sex differentiation measure, a continuous data-driven calculation of each individual's degree of sex-differentiating features extracted from multimodal brain imaging data (magnetic resonance imaging [MRI]/diffusion MRI) from the imaged participants (n = 866, 407 female and 459 male). RESULTS: In male individuals, higher differentiation scores were correlated with higher levels of externalizing symptoms (r = 0.119, p = .016). The differentiation measure reached genome-wide association study significance (p < 5 × 10^-8) in male individuals with single nucleotide polymorphisms Chromosome5:rs111161632:RASGEF1C and Chromosome19:rs75918199:GEMIN7, and in female individuals with Chromosome2:rs78372132:PARD3B and Chromosome15:rs73442006:HCN4. CONCLUSION: The sex differentiation measure provides an initial topography of quantifying male and female brain features. This demonstration that the sex of the human brain can be conceptualized on a continuum has implications for both the presentation of psychopathology and the relation of the brain with genetic variants that may be associated with brain differentiation.


Brain/physiopathology , Chromosomes, Human/genetics , Sex Characteristics , Sex Differentiation/genetics , Adolescent , Brain/diagnostic imaging , Child , Cohort Studies , Databases, Factual , Diffusion Magnetic Resonance Imaging , Female , Genome-Wide Association Study , Genotype , Humans , Male , Phenotype , Philadelphia , Psychopathology , Young Adult
6.
AMIA Jt Summits Transl Sci Proc ; 2017: 217-226, 2018.
Article En | MEDLINE | ID: mdl-29888076

Escalating healthcare costs and inconsistent quality are exacerbated by clinical practice variability. Diagnostic testing is the highest volume medical activity, but human intuition is typically unreliable for quantitative inferences on diagnostic performance characteristics. Electronic medical records from a tertiary academic hospital (2008-2014) allow us to systematically predict laboratory pre-test probabilities of being normal under different conditions. We find that low yield laboratory tests are common (e.g., ~90% of blood cultures are normal). Clinical decision support could triage cases based on available data, such as consecutive use (e.g., lactate, potassium, and troponin are >90% normal given two previously normal results) or more complex patterns assimilated through common machine learning methods (nearly 100% precision for the top 1% of several example labs).
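
The "consecutive use" rule above amounts to an empirical conditional frequency: how often a result is normal given that the preceding results were normal. The Python sketch below shows one way to compute it. The data layout (one chronological list of booleans per patient) and the function name are assumptions for illustration, not the study's pipeline.

```python
def normal_rate_after_streak(patient_results, streak=2):
    """Fraction of results that are normal among those preceded by
    `streak` consecutive normal results for the same patient.

    patient_results: list of per-patient chronological lists of booleans,
    where True means the result was normal (hypothetical layout).
    Returns None when no result has a long enough normal history.
    """
    eligible = 0
    normal = 0
    for results in patient_results:
        for i in range(streak, len(results)):
            # Only count results whose preceding `streak` results were all normal.
            if all(results[i - streak:i]):
                eligible += 1
                normal += results[i]
    return normal / eligible if eligible else None


# Toy data (invented): two patients with repeated measurements of one lab test.
toy_results = [
    [True, True, True, False],
    [True, False, True, True, True],
]
toy_rate = normal_rate_after_streak(toy_results, streak=2)
```

A rate near 1 for a given test would suggest that a repeat order after two normal results is unlikely to add information.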

7.
PLoS One ; 13(12): e0208422, 2018.
Article En | MEDLINE | ID: mdl-30596661

Checkpoint inhibitor immunotherapies have had major success in treating patients with late-stage cancers, yet only a minority of patients benefit. Mutation load and PD-L1 staining are leading biomarkers associated with response, but each is an imperfect predictor. A key challenge to predicting response is modeling the interaction between the tumor and immune system. We begin to address this challenge with a multifactorial model for response to anti-PD-L1 therapy. We train a model to predict immune response in patients after treatment based on 36 clinical, tumor, and circulating features collected prior to treatment. We analyze data from 21 bladder cancer patients using the elastic net high-dimensional regression procedure and, as training set error is a biased and overly optimistic measure of prediction error, we use leave-one-out cross-validation to obtain unbiased estimates of accuracy on held-out patients. In held-out patients, the model explains 79% of the variance in T cell clonal expansion. This predicted immune response is multifactorial, as the variance explained is at most 23% if clinical, tumor, or circulating features are excluded. Moreover, if patients are triaged according to predicted expansion, only 38% of non-durable clinical benefit (DCB) patients need be treated to ensure that 100% of DCB patients are treated. In contrast, using mutation load or PD-L1 staining alone, one must treat at least 77% of non-DCB patients to ensure that all DCB patients receive treatment. Thus, integrative models of immune response may improve our ability to anticipate clinical benefit of immunotherapy.
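
To illustrate why held-out error is reported instead of training error, here is a minimal, dependency-free Python sketch of leave-one-out cross-validation. Note the substitution: the study fits an elastic net over 36 features, whereas this sketch uses ordinary least squares on a single feature so that it runs without external libraries. All names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_predictions(xs, ys):
    """Predict each point from a model fitted on all the other points."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


def variance_explained(ys, preds):
    """1 - residual sum of squares / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Toy data (invented): one pre-treatment feature and one response per patient.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]

slope, intercept = fit_line(toy_x, toy_y)
train_r2 = variance_explained(toy_y, [slope * x + intercept for x in toy_x])
heldout_r2 = variance_explained(toy_y, loocv_predictions(toy_x, toy_y))
```

For least squares the held-out figure can never exceed the in-sample one, which is the optimism of training error that the abstract refers to.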


B7-H1 Antigen/antagonists & inhibitors , Cell Proliferation , Immunotherapy/methods , Lymphocytes, Tumor-Infiltrating/physiology , Models, Statistical , Protein Kinase Inhibitors/therapeutic use , T-Lymphocytes/physiology , Adult , Antibodies, Monoclonal/therapeutic use , Antibodies, Monoclonal, Humanized , B7-H1 Antigen/immunology , Biomarkers, Pharmacological/analysis , Biomarkers, Tumor/analysis , Carcinoma, Transitional Cell/drug therapy , Carcinoma, Transitional Cell/immunology , Carcinoma, Transitional Cell/pathology , Cell Proliferation/drug effects , Cell Proliferation/genetics , Clonal Evolution/drug effects , Clonal Evolution/genetics , Female , Humans , Lymphocytes, Tumor-Infiltrating/drug effects , Male , Mutation , Risk Assessment , T-Lymphocytes/drug effects , Treatment Outcome , Urinary Bladder Neoplasms/drug therapy , Urinary Bladder Neoplasms/immunology , Urinary Bladder Neoplasms/pathology
8.
BMJ Open ; 7(1): e011580, 2017 01 11.
Article En | MEDLINE | ID: mdl-28077408

OBJECTIVES: To compare the ability of standard versus enhanced models to predict future high-cost patients, especially those who move from a lower to the upper decile of per capita healthcare expenditures within 1 year (that is, 'cost bloomers'). DESIGN: We developed alternative models to predict being in the upper decile of healthcare expenditures in year 2 of a sample, based on data from year 1. Our 6 alternative models ranged from a standard cost-prediction model with 4 variables (ie, traditional model features), to our largest enhanced model with 1053 non-traditional model features. To quantify any increases in predictive power that enhanced models achieved over standard tools, we compared the prospective predictive performance of each model. PARTICIPANTS AND SETTING: We used the population of Western Denmark between 2004 and 2011 (2 146 801 individuals) to predict future high-cost patients and characterise high-cost patient subgroups. Using the most recent 2-year period (2010-2011) for model evaluation, our whole-population model used a cohort of 1 557 950 individuals with a full year of active residency in year 1 (2010). Our cost-bloom model excluded the 155 795 individuals who were already high cost at the population level in year 1, resulting in 1 402 155 individuals for prediction of cost bloomers in year 2 (2011). PRIMARY OUTCOME MEASURES: Using unseen data from a future year, we evaluated each model's prospective predictive performance by calculating the ratio of predicted high-cost patient expenditures to the actual high-cost patient expenditures in year 2 (that is, cost capture). RESULTS: Our best enhanced model achieved a 21% and 30% improvement in cost capture over a standard diagnosis-based model for predicting population-level high-cost patients and cost bloomers, respectively. CONCLUSIONS: In combination with modern statistical learning methods for analysing large data sets, models enhanced with a large and diverse set of features led to better performance, especially for predicting future cost bloomers.
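
The cost capture metric defined above can be sketched in a few lines of Python. This is one reading of the definition and an assumption on our part: the year-2 spending of the patients a model flags as high cost, divided by the year-2 spending of the patients who actually were in the top group. Function names and toy figures are hypothetical.

```python
def top_fraction_indices(values, fraction=0.1):
    """Indices of the highest `fraction` of values (rounded down, at least one)."""
    k = max(1, int(len(values) * fraction))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:k])


def cost_capture(predicted_risk, actual_cost, fraction=0.1):
    """Spending of predicted high-cost patients relative to the spending
    of the patients who actually were high cost (1.0 is a perfect ranking)."""
    predicted_high = top_fraction_indices(predicted_risk, fraction)
    actual_high = top_fraction_indices(actual_cost, fraction)
    captured = sum(actual_cost[i] for i in predicted_high)
    attainable = sum(actual_cost[i] for i in actual_high)
    return captured / attainable


# Toy data (invented): year-2 costs and model risk scores for ten patients.
toy_cost = [100, 900, 50, 400, 30, 20, 10, 60, 70, 80]
toy_risk = [0.1, 0.9, 0.2, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.8]

# With the top fifth flagged, the model finds the 900 patient but picks an
# 80 patient in place of the 400 patient.
toy_capture = cost_capture(toy_risk, toy_cost, fraction=0.2)
```

Because the metric weights patients by spending, missing one very expensive patient costs more than missing several who sit just above the cut-off.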


Health Care Costs , Health Expenditures , Insurance, Health/statistics & numerical data , Denmark/epidemiology , Female , Health Care Costs/statistics & numerical data , Health Care Surveys , Health Expenditures/statistics & numerical data , Humans , Longitudinal Studies , Male , Models, Econometric , Risk Adjustment , Utilization Review
9.
J Am Med Inform Assoc ; 24(3): 472-480, 2017 May 01.
Article En | MEDLINE | ID: mdl-27655861

OBJECTIVE: Build probabilistic topic model representations of hospital admissions processes and compare the ability of such models to predict clinical order patterns as compared to preconstructed order sets. MATERIALS AND METHODS: The authors evaluated the first 24 hours of structured electronic health record data for > 10 K inpatients. Drawing an analogy between structured items (e.g., clinical orders) and words in a text document, the authors performed latent Dirichlet allocation probabilistic topic modeling. These topic models use initial clinical information to predict clinical orders for a separate validation set of > 4 K patients. The authors evaluated these topic model-based predictions vs existing human-authored order sets by area under the receiver operating characteristic curve, precision, and recall for subsequent clinical orders. RESULTS: Existing order sets predict clinical orders used within 24 hours with area under the receiver operating characteristic curve 0.81, precision 16%, and recall 35%. This can be improved to 0.90, 24%, and 47% (P < 10^-20) by using probabilistic topic models to summarize clinical data into up to 32 topics. Many of these latent topics yield natural clinical interpretations (e.g., "critical care," "pneumonia," "neurologic evaluation"). DISCUSSION: Existing order sets tend to provide nonspecific, process-oriented aid, with usability limitations impairing more precise, patient-focused support. Algorithmic summarization has the potential to breach this usability barrier by automatically inferring patient context, but with potential tradeoffs in interpretability. CONCLUSION: Probabilistic topic modeling provides an automated approach to detect thematic trends in patient care and generate decision support content. A potential use case finds related clinical orders for decision support.
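
Precision and recall for a set of suggested orders, as reported above, can be computed as in the following Python sketch. It treats suggestions and subsequent orders as plain sets for a single patient. The function name and the order names are invented for illustration, and the study's evaluation across patients may aggregate differently.

```python
def precision_recall(suggested, actual):
    """Precision and recall of suggested orders against the orders
    that were actually placed (both treated as sets)."""
    suggested, actual = set(suggested), set(actual)
    hits = len(suggested & actual)
    # Precision: share of suggestions that were actually ordered.
    precision = hits / len(suggested) if suggested else 0.0
    # Recall: share of actual orders that had been suggested.
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Toy data (invented): an order set's suggestions versus what was ordered.
toy_suggested = ["cbc", "bmp", "chest x-ray", "blood culture"]
toy_actual = ["cbc", "chest x-ray", "troponin"]
toy_precision, toy_recall = precision_recall(toy_suggested, toy_actual)
```

Low precision with moderate recall, as reported for existing order sets, corresponds to long suggestion lists of which only a few items are used.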


Algorithms , Decision Support Systems, Clinical , Electronic Health Records , Medical Order Entry Systems , Models, Statistical , Patient Care , Data Mining , Diagnostic Tests, Routine , Hospitalization , Humans , ROC Curve
10.
IEEE Trans Pattern Anal Mach Intell ; 37(2): 290-306, 2015 Feb.
Article En | MEDLINE | ID: mdl-26353242

We develop a Bayesian nonparametric approach to a general family of latent class problems in which individuals can belong simultaneously to multiple classes and where each class can be exhibited multiple times by an individual. We introduce a combinatorial stochastic process known as the negative binomial process (NBP) as an infinite-dimensional prior appropriate for such problems. We show that the NBP is conjugate to the beta process, and we characterize the posterior distribution under the beta-negative binomial process (BNBP) and hierarchical models based on the BNBP (the HBNBP). We study the asymptotic properties of the BNBP and develop a three-parameter extension of the BNBP that exhibits power-law behavior. We derive MCMC algorithms for posterior inference under the HBNBP, and we present experiments using these algorithms in the domains of image segmentation, object recognition, and document analysis.


Cluster Analysis , Informatics/methods , Algorithms , Bayes Theorem , Computer Simulation , Image Processing, Computer-Assisted , Models, Theoretical , Statistics, Nonparametric
11.
Nat Biotechnol ; 33(1): 51-7, 2015 Jan.
Article En | MEDLINE | ID: mdl-25362243

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with substantial heterogeneity in its clinical presentation. This makes diagnosis and effective treatment difficult, so better tools for estimating disease progression are needed. Here, we report results from the DREAM-Phil Bowen ALS Prediction Prize4Life challenge. In this crowdsourcing competition, competitors developed algorithms for the prediction of disease progression of 1,822 ALS patients from standardized, anonymized phase 2/3 clinical trials. The two best algorithms outperformed a method designed by the challenge organizers as well as predictions by ALS clinicians. We estimate that using both winning algorithms in future trial designs could reduce the required number of patients by at least 20%. The DREAM-Phil Bowen ALS Prediction Prize4Life challenge also identified several potential nonstandard predictors of disease progression, including uric acid, creatinine and, surprisingly, blood pressure, shedding light on ALS pathobiology. This analysis reveals the potential of a crowdsourcing competition that uses clinical trial data for accelerating ALS research and development.


Amyotrophic Lateral Sclerosis/pathology , Clinical Trials as Topic , Crowdsourcing , Algorithms , Disease Progression , Humans
...