ABSTRACT
The COVID-19 pandemic presented enormous data challenges in the United States. Policy makers, epidemiological modelers, and health researchers all require up-to-date data on the pandemic and relevant public behavior, ideally at fine spatial and temporal resolution. The COVIDcast API is our attempt to fill this need: operational since April 2020, it provides open access to both traditional public health surveillance signals (cases, deaths, and hospitalizations) and many auxiliary indicators of COVID-19 activity, such as signals extracted from deidentified medical claims data, massive online surveys, cell phone mobility data, and internet search trends. These are available at a fine geographic resolution (mostly at the county level) and are updated daily. The COVIDcast API also tracks all revisions to historical data, allowing modelers to account for the frequent revisions and backfill that are common for many public health data sources. All of the data are available in a common format through the API and accompanying R and Python software packages. This paper describes the data sources and signals, and provides examples demonstrating that the auxiliary signals in the COVIDcast API present information relevant to tracking COVID-19 activity, augmenting traditional public health reporting and empowering research and decision-making.
Subject(s)
COVID-19/epidemiology , Databases, Factual , Health Status Indicators , Ambulatory Care/trends , Epidemiologic Methods , Humans , Internet/statistics & numerical data , Physical Distancing , Surveys and Questionnaires , Travel , United States/epidemiology
ABSTRACT
BACKGROUND: Cytoreductive surgery with hyperthermic intraperitoneal chemotherapy (CRS-HIPEC) provides oncologic benefit in well-selected patients with peritoneal carcinomatosis; however, it is a morbid procedure. Decision tools for preoperative patient selection are limited. We developed a risk score to predict severity of 90-day complications for CRS-HIPEC. PATIENTS AND METHODS: Adults who underwent CRS-HIPEC at the University of Pittsburgh Medical Center (March 2001-April 2020) were analyzed. Primary endpoint was severe complications within 90 days following CRS-HIPEC, defined using Comprehensive Complication Index (CCI) scores as both a dichotomous variable (with the cutoff determined using restricted cubic splines) and a continuous variable. Data were divided into training and test sets. Several machine learning and traditional algorithms were considered. RESULTS: For the 1959 CRS-HIPEC procedures included, CCI ranged from 0 to 100 (median 32.0). Adjusted restricted cubic splines model defined severe complications as CCI > 61. At least 20 variables were required for any of the models to achieve optimal performance. Linear regression achieved the highest area under the receiver operating characteristic curve (AUC, 0.74) and outperformed the NSQIP Surgical Risk calculator (AUC 0.80 vs. 0.66). Factors most positively associated with severe complications included peritoneal carcinomatosis index score, symptomatic status, and undergoing pancreatectomy, while American Society of Anesthesiologists class 2, appendiceal diagnosis, and preoperative albumin were most negatively associated with severe complications. CONCLUSIONS: This study refines our ability to predict severe complications within 90 days of discharge from a hospitalization in which CRS-HIPEC was performed. This advancement is timely and relevant given the growing interest in this procedure and may have implications for patient selection, patient and referring provider comfort, and survival.
Subject(s)
Hyperthermia, Induced , Peritoneal Neoplasms , Adult , Humans , Peritoneal Neoplasms/therapy , Combined Modality Therapy , Antineoplastic Combined Chemotherapy Protocols/adverse effects , Chemotherapy, Adjuvant , Cytoreduction Surgical Procedures/adverse effects , Judgment , Hyperthermia, Induced/adverse effects , Survival Rate , Retrospective Studies
ABSTRACT
With COVID-19 now pervasive, identification of high-risk individuals is crucial. Using data from a major healthcare provider in Southwestern Pennsylvania, we develop survival models predicting severe COVID-19 progression. In this endeavor, we face a tradeoff between more accurate models relying on many features and less accurate models relying on a few features aligned with clinician intuition. Complicating matters, many EHR features tend to be under-coded, degrading the accuracy of smaller models. In this study, we develop two sets of high-performance risk scores: (i) an unconstrained model built from all available features; and (ii) a pipeline that learns a small set of clinical concepts before training a risk predictor. Learned concepts boost performance over the corresponding features (C-index 0.858 vs. 0.844) and demonstrate improvements over (i) when evaluated out-of-sample (subsequent time periods). Our models outperform previous work (C-index 0.844-0.872 vs. 0.598-0.810).
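For readers unfamiliar with the C-index quoted above, the following is a minimal sketch of a Harrell-style concordance index for right-censored survival data. It is not the authors' evaluation code: the function name `concordance_index`, the simplified handling of tied times (such pairs are skipped), and the toy data are all assumptions made for illustration.

```python
from itertools import permutations


def concordance_index(times, events, risks):
    """Simplified Harrell-style C-index for right-censored data.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in permutations(range(len(times)), 2):
        # A pair is comparable only when subject i had an observed event
        # strictly before subject j's follow-up time.
        if events[i] == 1 and times[i] < times[j]:
            comparable += 1
            if risks[i] > risks[j]:
                concordant += 1.0   # model ranks the earlier event as riskier
            elif risks[i] == risks[j]:
                concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy, made-up data: the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.3, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of comparable pairs, which is the scale on which the figures in the abstract should be read.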
Subject(s)
COVID-19 , Humans , Machine Learning , Risk Factors , Pennsylvania
ABSTRACT
Since the COVID-19 pandemic began, the United States' case fatality rate (CFR) has plummeted. Using national and Florida data, we unpack the drop in CFR between April and December 2020, accounting for such confounders as expanded testing, age distribution shift, and detection-to-death lags. Guided by the insight that treatment improvements in this period should correspond to decreases in hospitalization fatality rate (HFR), and using a block-bootstrapping procedure to quantify uncertainty, we find that although treatment improvements do not follow the same trajectory in Florida and nationally (with Florida undergoing a comparatively severe second peak), by December, significant improvements are observed both in Florida and nationally (at least 17% and 55%, respectively). These estimates paint a more realistic picture of improvements than the drop in aggregate CFR (70.8%-91.1%). We publish a website where users can apply our analyses to selected demographics, regions, and dates of interest.
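The abstract does not spell out the block-bootstrapping procedure, so the following is only a generic moving-block bootstrap sketch for a fatality ratio computed from daily time series. The function names, the block length of 7 days, the percentile interval, and the toy counts are all assumptions for illustration; the sketch also ignores the detection-to-death lags and age adjustments that the study accounts for.

```python
import random


def moving_block_bootstrap(n, block_len, rng):
    """Return n resampled day indices built from contiguous blocks."""
    indices = []
    while len(indices) < n:
        start = rng.randrange(0, n - block_len + 1)
        indices.extend(range(start, start + block_len))
    return indices[:n]  # trim the last block so the length matches


def hfr_interval(deaths, admissions, block_len=7, n_boot=1000, seed=0):
    """Percentile interval for total deaths / total admissions."""
    rng = random.Random(seed)
    n = len(deaths)
    estimates = []
    for _ in range(n_boot):
        idx = moving_block_bootstrap(n, block_len, rng)
        total_deaths = sum(deaths[i] for i in idx)
        total_admissions = sum(admissions[i] for i in idx)
        estimates.append(total_deaths / total_admissions)
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, upper


# Toy, made-up daily counts (not real data).
admissions = [100 + 5 * (day % 7) for day in range(60)]
deaths = [10 + (day % 5) for day in range(60)]
lower, upper = hfr_interval(deaths, admissions)
print(lower, upper)
```

Resampling contiguous blocks rather than individual days keeps short-range temporal dependence (for example, weekly reporting patterns) inside each resampled series, which is the usual motivation for a block bootstrap on time series.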
Subject(s)
COVID-19 , Age Distribution , COVID-19/epidemiology , Florida/epidemiology , Hospitalization , Humans , Pandemics
ABSTRACT
Numerous studies have established that estimated brain age constitutes a valuable biomarker that is predictive of cognitive decline and various neurological diseases. In this work, we curate a large-scale brain MRI data set of healthy individuals, on which we train a uniform deep learning model for brain age estimation. We demonstrate age estimation accuracy on a hold-out test set (mean absolute error = 4.06 years, r = 0.970) and an independent life span evaluation data set (mean absolute error = 4.21 years, r = 0.960). We further demonstrate the utility of the estimated age in a life span aging analysis of cognitive functions. In summary, we achieve age estimation performance comparable to previous studies, but with a more heterogeneous data set, confirming the efficacy of this deep learning framework. We also evaluated training with varying age distributions. The analysis of regional contributions to our brain age predictions through multiple analyses, and confirmation of the association of divergence between the estimated and chronological brain age with neuropsychological measures, may be useful in the development and evaluation of similar imaging biomarkers.
Subject(s)
Brain/pathology , Deep Learning , Healthy Aging/pathology , Magnetic Resonance Imaging , Adolescent , Adult , Aged , Aged, 80 and over , Datasets as Topic , Female , Humans , Longevity , Male , Middle Aged , Young Adult
ABSTRACT
This paper provides new insight into maximizing F1 measures in the context of binary classification and also in the context of multilabel classification. The harmonic mean of precision and recall, the F1 measure is widely used to evaluate the success of a binary classifier when one class is rare. Micro-average, macro-average, and per-instance-average F1 measures are used in multilabel classification. For any classifier that produces a real-valued output, we derive the relationship between the best achievable F1 value and the decision-making threshold that achieves this optimum. As a special case, if the classifier outputs are well-calibrated conditional probabilities, then the optimal threshold is half the optimal F1 value. As another special case, if the classifier is completely uninformative, then the optimal behavior is to classify all examples as positive. When the actual prevalence of positive examples is low, this behavior can be undesirable. As a case study, we discuss the results, which can be surprising, of maximizing F1 when predicting 26,853 labels for Medline documents.
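The claim that the optimal threshold equals half the optimal F1 value can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's derivation: it treats the scores as perfectly calibrated probabilities, works with expected counts (so F1 = 2·TP / (number predicted positive + number actually positive)), and uses a made-up, skewed score distribution. The function name `optimal_expected_f1` is hypothetical.

```python
from itertools import accumulate


def optimal_expected_f1(probs):
    """Find the top-k cut that maximizes expected F1 for calibrated scores.

    Returns (best F1, lowest score predicted positive,
             highest score predicted negative).
    """
    ranked = sorted(probs, reverse=True)
    total_pos = sum(ranked)  # expected number of actual positives
    best_k, best_f1 = 0, 0.0
    for k, expected_tp in enumerate(accumulate(ranked), start=1):
        # 2TP + FP + FN = (TP + FP) + (TP + FN) = k + total_pos
        f1 = 2.0 * expected_tp / (k + total_pos)
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    lowest_included = ranked[best_k - 1]
    highest_excluded = ranked[best_k] if best_k < len(ranked) else 0.0
    return best_f1, lowest_included, highest_excluded


# Made-up calibrated scores, skewed so that positives are rare.
n = 1000
probs = [((i + 0.5) / n) ** 3 for i in range(n)]
f_star, lowest_in, highest_out = optimal_expected_f1(probs)
print(f_star / 2, lowest_in, highest_out)
```

In this setting, half the best F1 value falls between the highest score left negative and the lowest score labeled positive, so thresholding at half the optimal F1 reproduces the optimal cut.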
ABSTRACT
In this work, an approach to jointly estimating the tone hole configuration (fingering) and reed model parameters of a saxophone is presented. The problem is not merely one of estimating pitch, since a single applied fingering can be used to produce several different pitches by bugling or overblowing. Nor can a fingering be estimated solely from the spectral envelope of the produced sound (as it might be for estimation of vocal tract shape in speech), since one fingering can produce markedly different spectral envelopes depending on the player's embouchure and control of the reed. The problem is therefore addressed by jointly estimating both the reed (source) parameters and the fingering (filter) of a saxophone model using convex optimization and 1) a bank of filter frequency responses derived from measurement of the saxophone configured with all possible fingerings and 2) sample recordings of notes produced using all possible fingerings, played with different overblowing, dynamics, and timbre. The saxophone model couples one of several possible frequency response pairs (corresponding to the applied fingering) with a quasi-static reed model generating input pressure at the mouthpiece, with control parameters being blowing pressure and reed stiffness. Applied fingering and reed parameters are estimated for a given recording by formalizing a minimization problem, where the cost function is the error between the recording and the synthesized sound produced by the model having incremental parameter values for blowing pressure and reed stiffness. The minimization problem is nonlinear and not differentiable and is made solvable using convex optimization. The fingering identification is evaluated and shows better accuracy than the previously reported value.