1.
Int J Obes (Lond) ; 44(8): 1753-1765, 2020 08.
Article in English | MEDLINE | ID: mdl-32494036

ABSTRACT

BACKGROUND: Electronic health records (EHRs) are potentially important components in addressing pediatric obesity in clinical settings and at the population level. This work aims to identify temporal condition patterns surrounding obesity incidence in a large pediatric population that may inform clinical care and childhood obesity policy and prevention efforts. METHODS: EHR data from healthcare visits with an initial record of obesity incidence (index visit) from 2009 through 2016 at the Children's Hospital of Philadelphia, and visits immediately before (pre-index) and after (post-index), were compared with a matched control population of patients with a healthy weight to characterize the prevalence of common diagnoses and condition trajectories. The study population consisted of 49,694 patients with pediatric obesity and their corresponding matched controls. The SPADE algorithm was used to identify common temporal condition patterns in the case population. McNemar's test was used to assess the statistical significance of pattern prevalence differences between the case and control populations. RESULTS: SPADE identified 163 condition patterns that were present in at least 1% of cases; 80 were significantly more common among cases and 45 were significantly more common among controls (p < 0.05). Asthma and allergic rhinitis were strongly associated with childhood obesity incidence, particularly during the pre-index and index visits. Seven conditions were commonly diagnosed for cases exclusively during pre-index visits, including ear, nose, and throat disorders and gastroenteritis. CONCLUSIONS: The novel application of SPADE on a large retrospective dataset revealed temporally dependent condition associations with obesity incidence. Allergic rhinitis and asthma had a particularly high prevalence during pre-index visits. These conditions, along with those exclusively observed during pre-index visits, may represent signals of future obesity. 
While causation cannot be inferred from these associations, the temporal condition patterns identified here represent hypotheses that can be investigated to determine causal relationships in future obesity research.
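The matched case-control comparison described above can be illustrated with a minimal sketch of McNemar's test. It assumes each case is paired with one control and that only the discordant pairs are counted; the function names and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: matched pairs in which only the case shows the condition pattern
    c: matched pairs in which only the control shows the condition pattern
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test and capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b: int, c: int) -> float:
    """Continuity-corrected McNemar chi-square statistic (1 degree of freedom)."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)


# Hypothetical counts: 15 pairs where only the case has the pattern, 5 the reverse
p_value = mcnemar_exact(15, 5)
statistic = mcnemar_chi2(15, 5)
```

Concordant pairs (both or neither member shows the pattern) do not enter the statistic, which is why the test suits a matched design.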


Subject(s)
Big Data , Pediatric Obesity/epidemiology , Adolescent , Asthma/epidemiology , Case-Control Studies , Child , Child, Preschool , Electronic Health Records , Female , Humans , Incidence , Male , Philadelphia/epidemiology , Retrospective Studies , Rhinitis, Allergic/epidemiology
2.
J Pediatr ; 217: 59-65.e1, 2020 02.
Article in English | MEDLINE | ID: mdl-31604632

ABSTRACT

OBJECTIVE: To determine if time to antibiotic administration is associated with mortality and in-hospital outcomes in a neonatal intensive care unit (NICU) population. STUDY DESIGN: We conducted a prospective evaluation of infants with suspected sepsis between September 2014 and February 2018; sepsis was defined as clinical concern prompting blood culture collection and antibiotic administration. Time to antibiotic administration was calculated from time of sepsis identification, defined as the order time of either blood culture or an antibiotic, to time of first antibiotic administration. We used linear models with generalized estimating equations to determine the association between time to antibiotic administration and mortality, ventilator-free and inotrope-free days, and NICU length of stay in patients with culture-proven sepsis. RESULTS: Among 1946 sepsis evaluations, we identified 128 episodes of culture-proven sepsis in 113 infants. Among them, prolonged time to antibiotic administration was associated with significantly increased risk of mortality at 14 days (OR, 1.47; 95% CI, 1.15-1.87) and 30 days (OR, 1.47; 95% CI, 1.11-1.94) as well as fewer inotrope-free days (incidence rate ratio, 0.91; 95% CI, 0.84-0.98). No significant associations with ventilator-free days or NICU length of stay were demonstrated. CONCLUSIONS: Among infants with sepsis, delayed time to antibiotic administration was an independent risk factor for death and prolonged cardiovascular dysfunction. Further study is needed to define optimal timing of antimicrobial administration in high-risk NICU populations.


Subject(s)
Anti-Bacterial Agents/administration & dosage , Sepsis/drug therapy , Sepsis/mortality , Comorbidity , Electronic Health Records , Female , Humans , Infant , Infant, Newborn , Intensive Care Units, Neonatal , Intensive Care, Neonatal , Length of Stay , Linear Models , Male , Multivariate Analysis , Probability , Prospective Studies , Risk Factors , Sepsis/microbiology , Time-to-Treatment , Treatment Outcome
3.
Anesthesiology ; 133(3): 523-533, 2020 09.
Article in English | MEDLINE | ID: mdl-32433278

ABSTRACT

BACKGROUND: Children are required to fast before elective general anesthesia. This study hypothesized that prolonged fasting causes volume depletion that manifests as low blood pressure. This study aimed to assess the association between fluid fasting duration and postinduction low blood pressure. METHODS: A retrospective cohort study was performed of 15,543 anesthetized children without preinduction venous access who underwent elective surgery from 2016 to 2017 at Children's Hospital of Philadelphia. Low blood pressure was defined as systolic blood pressure lower than 2 standard deviations below the mean (approximately the 2.5th percentile) for sex- and age-specific reference values. Two epochs were assessed: epoch 1 was from induction to completion of anesthesia preparation, and epoch 2 was during surgical preparation. RESULTS: In epoch 1, the incidence of low systolic blood pressure was 5.2% (697 of 13,497), and no association was observed with the fluid fasting time groups: less than 4 h (4.6%, 141 of 3,081), 4 to 8 h (6.0%, 219 of 3,652), 8 to 12 h (4.9%, 124 of 2,526), and more than 12 h (5.0%, 213 of 4,238). In epoch 2, the incidence of low systolic blood pressure was 6.9% (889 of 12,917) and varied across the fasting groups: less than 4 h (5.6%, 162 of 2,918), 4 to 8 h (8.1%, 285 of 3,531), 8 to 12 h (5.9%, 143 of 2,423), and more than 12 h (7.4%, 299 of 4,045); after adjusting for confounders, fasting 4 to 8 h (adjusted odds ratio, 1.33; 95% CI, 1.07 to 1.64; P = 0.009) and greater than 12 h (adjusted odds ratio, 1.28; 95% CI, 1.04 to 1.57; P = 0.018) were associated with significantly higher odds of low systolic blood pressure compared with the group who fasted less than 4 h, whereas the increased odds of low systolic blood pressure associated with fasting 8 to 12 h (adjusted odds ratio, 1.11; 95% CI, 0.87 to 1.42; P = 0.391) was nonsignificant. 
CONCLUSIONS: Longer durations of clear fluid fasting in anesthetized children were associated with increased risk of postinduction low blood pressure during surgical preparation, although this association appeared nonlinear.
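As a worked check on the epoch 2 counts reported above, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% interval for the 4 to 8 h group against the less than 4 h group. This is not the study's adjusted model: the function name is hypothetical, and the crude estimate is expected to differ from the adjusted odds ratio of 1.33 because no confounders are included.

```python
import math


def crude_odds_ratio(events_exposed, n_exposed, events_reference, n_reference):
    """Unadjusted odds ratio and Wald 95% CI from two groups' event counts."""
    a = events_exposed                    # exposed group, low blood pressure
    b = n_exposed - events_exposed        # exposed group, no low blood pressure
    c = events_reference                  # reference group, low blood pressure
    d = n_reference - events_reference    # reference group, no low blood pressure
    odds_ratio = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    high = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, low, high


# Counts from the abstract: 285 of 3,531 (4 to 8 h) versus 162 of 2,918 (< 4 h)
or_4_8, ci_low, ci_high = crude_odds_ratio(285, 3531, 162, 2918)
```

The crude estimate is roughly 1.49, higher than the adjusted value, which is consistent with part of the raw difference being explained by the covariates in the adjusted analysis.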


Subject(s)
Fasting/adverse effects , Hypotension/etiology , Hypotension/physiopathology , Preoperative Care/methods , Blood Pressure , Child , Child, Preschool , Cohort Studies , Female , Humans , Male , Prospective Studies , Retrospective Studies , Time Factors
4.
Ear Hear ; 41(2): 231-238, 2020.
Article in English | MEDLINE | ID: mdl-31408044

ABSTRACT

The use of "big data" for pediatric hearing research requires new approaches to both data collection and research methods. The widespread deployment of electronic health record systems creates new opportunities and corresponding challenges in the secondary use of large volumes of audiological and medical data. Opportunities include cost-effective hypothesis generation, rapid cohort expansion for rare conditions, and observational studies based on sample sizes in the thousands to tens of thousands. Challenges include finding and forming appropriately skilled teams, access to data, data quality assessment, and engagement with a research community new to big data. The authors share their experience and perspective on the work required to build and validate a pediatric hearing research database that integrates clinical data for over 185,000 patients from the electronic health record systems of three major academic medical centers.


Subject(s)
Audiology , Child , Cohort Studies , Databases, Factual , Hearing , Humans
5.
J Cardiothorac Vasc Anesth ; 34(2): 479-482, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31327699

ABSTRACT

Congenital heart disease (CHD) is one of the most common birth anomalies, and the care of children with CHD has improved over the past 4 decades. However, children with CHD who undergo general anesthesia remain at increased risk for morbidity and mortality. The proliferation of electronic health record systems and sophisticated patient monitors affords the opportunity to capture and analyze large amounts of CHD patient data, and the application of novel, effective analytics methods to these data can enable clinicians to enhance their care of pediatric CHD patients. This narrative review covers recent efforts to leverage analytics in pediatric cardiac anesthesia and critical care to improve the care of children with CHD.


Subject(s)
Anesthesia, Cardiac Procedures , Heart Defects, Congenital , Anesthesia, General , Child , Critical Care , Heart Defects, Congenital/surgery , Humans
6.
J Biomed Inform ; 69: 86-92, 2017 05.
Article in English | MEDLINE | ID: mdl-28389234

ABSTRACT

Annotating unstructured texts in Electronic Health Records data is usually a necessary step for conducting machine learning research on such datasets. Manual annotation by domain experts provides data of the best quality, but has become increasingly impractical given the rapid increase in the volume of EHR data. In this article, we examine the effectiveness of crowdsourcing with unscreened online workers as an alternative for transforming unstructured texts in EHRs into annotated data that are directly usable in supervised learning models. We find the crowdsourced annotation data to be just as effective as expert data in training a sentence classification model to detect mentions of abnormal ear anatomy in audiology-related radiology reports. Furthermore, we have discovered that enabling workers to self-report a confidence level associated with each annotation can help researchers pinpoint less-accurate annotations requiring expert scrutiny. Our findings suggest that even crowd workers without specific domain knowledge can contribute effectively to the task of annotating unstructured EHR datasets.


Subject(s)
Crowdsourcing , Data Curation , Electronic Health Records , Audiology , Humans , Radiology
7.
BMC Genomics ; 17 Suppl 4: 434, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27535360

ABSTRACT

BACKGROUND: High-throughput molecular sequencing and increased biospecimen variety have introduced significant informatics challenges for research biorepository infrastructures. We applied a modular system integration approach to develop an operational biorepository management system. This method enables aggregation of the clinical, specimen and genomic data collected for biorepository resources. METHODS: We introduce an electronic Honest Broker (eHB) and Biorepository Portal (BRP) open source project that, in tandem, allow for data integration while protecting patient privacy. This modular approach allows data and specimens to be associated with a biorepository subject at any time point asynchronously. This lowers the bar to develop new research projects based on scientific merit without institutional review for a proposal. RESULTS: By facilitating the automated de-identification of specimens and associated clinical and genomic data, we create a future-proofed specimen set that can withstand new workflows and be connected to new associated information over time, thus facilitating collaborative advanced genomic and tissue research. CONCLUSIONS: As of January 2016, there are 23 unique protocols/patient cohorts being managed in the Biorepository Portal (BRP). There are over 4000 unique subject records in the electronic honest broker (eHB), over 30,000 specimens accessioned and 8 institutions participating in various biobanking activities using this tool kit. We specifically set out to build rich annotation of biospecimens with longitudinal clinical data; integrate BRP with REDCap for multi-institutional repositories; integrate with the EMR; further annotate specimens with genomic data specific to a domain; and build application hooks for experiments at the specimen level integrated with analytic software, all while protecting privacy per the Office for Civil Rights (OCR) and HIPAA.


Subject(s)
Biological Specimen Banks , Software , Specimen Handling/methods , Translational Research, Biomedical , Genome, Human , Genomics , High-Throughput Nucleotide Sequencing/methods , Humans , Privacy
9.
BMC Med Inform Decis Mak ; 16: 65, 2016 06 06.
Article in English | MEDLINE | ID: mdl-27267768

ABSTRACT

BACKGROUND: Radiology reports are a rich resource for biomedical research. Prior to utilization, trained experts must manually review reports to identify discrete outcomes. The Audiological and Genetic Database (AudGenDB) is a public, de-identified research database that contains over 16,000 radiology reports. Because the reports are unlabeled, it is difficult to select those with specific abnormalities. We implemented a classification pipeline using a human-in-the-loop machine learning approach and open source libraries to label the reports with one or more of four abnormality region labels: inner, middle, outer, and mastoid, indicating the presence of an abnormality in the specified ear region. METHODS: Trained abstractors labeled radiology reports taken from AudGenDB to form a gold standard. These were split into training (80 %) and test (20 %) sets. We applied open source libraries to normalize and convert every report to an n-gram feature vector. We trained logistic regression, support vector machine (linear and Gaussian), decision tree, random forest, and naïve Bayes models for each ear region. The models were evaluated on the hold-out test set. RESULTS: Our gold-standard data set contained 726 reports. The best classifiers were linear support vector machine for inner and outer ear, logistic regression for middle ear, and decision tree for mastoid. Classifier test set accuracy was 90 %, 90 %, 93 %, and 82 % for the inner, middle, outer and mastoid regions, respectively. The logistic regression method was very consistent, achieving accuracy scores within 2.75 % of the best classifier across regions and a receiver operator characteristic area under the curve of 0.92 or greater across all regions. CONCLUSIONS: Our results indicate that the applied methods achieve accuracy scores sufficient to support our objective of extracting discrete features from radiology reports to enhance cohort identification in AudGenDB. 
The models described here are available in several free, open source libraries that make them more accessible and simplify their utilization as demonstrated in this work. We additionally implemented the models as a web service that accepts radiology report text in an HTTP request and provides the predicted region labels. This service has been used to label the reports in AudGenDB and is freely available.
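The step of normalizing each report and converting it to an n-gram feature vector can be sketched as follows. The abstract does not state the n-gram order, the tokenization rules, or the libraries used, so the unigram-plus-bigram choice, the regular expression, and all function names here are assumptions made for illustration only.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def all_ngrams(text, max_n):
    """Collect every n-gram of order 1..max_n from a report."""
    tokens = normalize(text)
    return [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]


def build_vocabulary(reports, max_n=2):
    """Map each n-gram seen in the training reports to a column index."""
    vocab = sorted({g for report in reports for g in all_ngrams(report, max_n)})
    return {gram: index for index, gram in enumerate(vocab)}


def vectorize(report, vocab, max_n=2):
    """Count-based feature vector; n-grams outside the vocabulary are ignored."""
    counts = Counter(all_ngrams(report, max_n))
    vector = [0] * len(vocab)
    for gram, count in counts.items():
        if gram in vocab:
            vector[vocab[gram]] = count
    return vector


# Hypothetical report snippets, not drawn from AudGenDB
training_reports = ["Normal inner ear.", "Abnormal inner ear structures."]
vocabulary = build_vocabulary(training_reports)
features = vectorize("Inner ear appears abnormal", vocabulary)
```

Vectors of this kind are what a classifier such as logistic regression or a linear support vector machine would consume, with one binary model per ear region.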


Subject(s)
Audiology/classification , Machine Learning , Natural Language Processing , Radiology/classification , Temporal Bone/diagnostic imaging , Databases as Topic , Humans
10.
BMC Bioinformatics ; 15: 248, 2014 Jul 21.
Article in English | MEDLINE | ID: mdl-25047600

ABSTRACT

BACKGROUND: Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging variants that must be individually assessed to establish clear association with patient phenotype. To assist interpretation, we implemented an algorithm that ranks a given set of genes relative to patient phenotype. The algorithm orders genes by the semantic similarity computed between phenotypic descriptors associated with each gene and those describing the patient. Phenotypic descriptor terms are taken from the Human Phenotype Ontology (HPO) and semantic similarity is derived from each term's information content. RESULTS: Model validation was performed via simulation and with clinical data. We simulated 33 Mendelian diseases with 100 patients per disease. We modeled clinical conditions by adding noise and imprecision, i.e. phenotypic terms unrelated to the disease and terms less specific than the actual disease terms. We ranked the causative gene against all 2488 HPO annotated genes. The median causative gene rank was 1 for the optimal and noise cases, 12 for the imprecision case, and 60 for the imprecision with noise case. Additionally, we examined a clinical cohort of subjects with hearing impairment. The disease gene median rank was 22. However, when also considering the patient's exome data and filtering non-exomic and common variants, the median rank improved to 3. CONCLUSIONS: Semantic similarity can rank a causative gene highly within a gene list relative to patient phenotype characteristics, provided that imprecision is mitigated. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. 
We expect that this combined prioritization approach may increase accuracy and decrease effort for clinical genetic diagnosis.
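The ranking idea can be sketched with a toy example. The abstract states only that semantic similarity is derived from each term's information content; the Resnik-style measure used here (information content of the most informative common ancestor), the best-match averaging, the miniature ontology, and the gene annotations are all assumptions and do not reproduce real HPO content or the authors' implementation.

```python
import math

# Hypothetical toy ontology: term -> parent terms (not real HPO identifiers)
PARENTS = {
    "root": [],
    "ear_abnormality": ["root"],
    "hearing_impairment": ["ear_abnormality"],
    "sensorineural_loss": ["hearing_impairment"],
    "conductive_loss": ["hearing_impairment"],
    "eye_abnormality": ["root"],
    "cataract": ["eye_abnormality"],
}

# Hypothetical gene-to-phenotype annotations
GENE_TERMS = {
    "GENE_A": {"sensorineural_loss"},
    "GENE_B": {"conductive_loss"},
    "GENE_C": {"cataract"},
    "GENE_D": {"hearing_impairment", "cataract"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of genes annotated with t or one of its descendants)."""
    ic = {}
    for term in PARENTS:
        count = sum(
            1 for terms in GENE_TERMS.values()
            if any(term in ancestors(t) for t in terms)
        )
        if count:
            ic[term] = -math.log(count / len(GENE_TERMS))
    return ic


def resnik(term_a, term_b, ic):
    """Similarity = IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic.get(term, 0.0) for term in common)


def gene_score(patient_terms, gene_terms, ic):
    """Average, over patient terms, of the best match among the gene's terms."""
    best = [max(resnik(p, g, ic) for g in gene_terms) for p in patient_terms]
    return sum(best) / len(best)


def rank_genes(patient_terms):
    """Order genes from most to least similar to the patient's phenotype."""
    ic = information_content()
    scores = {g: gene_score(patient_terms, t, ic) for g, t in GENE_TERMS.items()}
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


ranking = rank_genes({"sensorineural_loss"})
```

In this toy setting, a less specific patient term such as `hearing_impairment` can no longer separate `GENE_A` from `GENE_B`, which mirrors the abstract's observation that imprecision degrades the causative gene's rank.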


Subject(s)
Biological Ontologies , Computational Biology/methods , Data Mining/methods , Disease/genetics , Phenotype , Semantics , Algorithms , Databases, Genetic , Exome/genetics , Humans , Software
11.
JMIR Form Res ; 8: e48894, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38427407

ABSTRACT

BACKGROUND: The development of digital health tools that are clinically relevant requires a deep understanding of the unmet needs of stakeholders, such as clinicians and patients. One way to reveal unforeseen stakeholder needs is through qualitative research, including stakeholder interviews. However, conventional qualitative data analytical approaches are time-consuming and resource-intensive, rendering them untenable in many industry settings where digital tools are conceived of and developed. Thus, a more time-efficient process for identifying clinically relevant target needs for digital tool development is needed. OBJECTIVE: The objective of this study was to address the need for an accessible, simple, and time-efficient alternative to conventional thematic analysis of qualitative research data through text analysis of semistructured interview transcripts. In addition, we sought to identify important themes across expert psychiatrist advisor interview transcripts to efficiently reveal areas for the development of digital tools that target unmet clinical needs. METHODS: We conducted 10 (1-hour-long) semistructured interviews with US-based psychiatrists treating major depressive disorder. The interviews were conducted using an interview guide that comprised open-ended questions predesigned to (1) understand the clinicians' experience of the care management process and (2) understand the clinicians' perceptions of the patients' experience of the care management process. We then implemented a hybrid analytical approach that combines computer-assisted text analyses with deductive analyses as an alternative to conventional qualitative thematic analysis to identify word combination frequencies, content categories, and broad themes characterizing unmet needs in the care management process. 
RESULTS: Using this hybrid computer-assisted analytical approach, we were able to identify several key areas that are of interest to clinicians in the context of major depressive disorder and would be appropriate targets for digital tool development. CONCLUSIONS: A hybrid approach to qualitative research combining computer-assisted techniques with deductive techniques provides a time-efficient approach to identifying unmet needs, targets, and relevant themes to inform digital tool development. This can increase the likelihood that useful and practical tools are built and implemented to ultimately improve health outcomes for patients.

12.
Neural Netw ; 162: 581-588, 2023 May.
Article in English | MEDLINE | ID: mdl-37011460

ABSTRACT

In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, a method that approximates the effect that leave-one-out training has on the loss function, have been shown to be fragile. The proposed reason for their fragility remains unclear. Although previous work suggests the use of regularization to increase robustness, this does not hold in all cases. In this work, we seek to investigate the experiments performed in the prior work in an effort to understand the underlying mechanisms of influence function fragility. First, we verify influence functions using procedures from the literature under conditions where the convexity assumptions of influence functions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Here, we analyze the key metrics and procedures that are used to validate influence functions. Our results indicate that the validation procedures may cause the observed fragility.
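For orientation, the standard formulation of influence functions from the general literature (not given in this abstract, so the notation below is an assumption) estimates how upweighting a training point z changes the loss at a test point:

```latex
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top}
     H_{\hat\theta}^{-1}\,
     \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta)
```

Removing z corresponds to an upweighting of -1/n, so the leave-one-out change in test loss is approximated by -(1/n) times this quantity. The inverse Hessian is well defined when the training objective is strictly convex at the fitted parameters, which is the convexity assumption that deeper, non-convex models relax.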

13.
Front Digit Health ; 5: 1221754, 2023.
Article in English | MEDLINE | ID: mdl-37771820

ABSTRACT

Introduction: Digital health technologies (DHTs) driven by artificial intelligence applications, particularly those including predictive models derived with machine learning methods, have garnered substantial attention and financial investment in recent years. Yet, there is little evidence of widespread adoption and scant proof of gains in patient health outcomes. One factor of this paradox is the disconnect between DHT developers and digital health ecosystem stakeholders, which can result in developing technologies that are highly sophisticated but clinically irrelevant. Here, we aimed to uncover challenges faced by psychiatrists treating patients with major depressive disorder (MDD). Specifically, we focused on challenges psychiatrists raised about bipolar disorder (BD) misdiagnosis. Methods: We conducted semi-structured interviews with 10 United States-based psychiatrists. We applied text and thematic analysis to the resulting interview transcripts. Results: Three main themes emerged: (1) BD is often misdiagnosed, (2) information crucial to evaluating BD is often occluded from clinical observation, and (3) BD misdiagnosis has important treatment implications. Discussion: Using upstream stakeholder engagement methods, we were able to identify a narrow, unforeseen, and clinically relevant problem. We propose an organizing framework for development of digital tools based upon clinician-identified unmet need.

14.
PLOS Digit Health ; 1(8): e0000073, 2022 Aug.
Article in English | MEDLINE | ID: mdl-36812554

ABSTRACT

In this work, we present a study of electronic health record (EHR) data that aims to identify pediatric obesity clinical subtypes. Specifically, we examine whether certain temporal condition patterns associated with childhood obesity incidence tend to cluster together to characterize subtypes of clinically similar patients. In a previous study, the sequence mining algorithm SPADE was applied to EHR data from a large retrospective cohort (n = 49 594 patients) to identify common condition trajectories surrounding pediatric obesity incidence. In this study, we used Latent Class Analysis (LCA) to identify potential subtypes formed by these temporal condition patterns. The demographic characteristics of patients in each subtype are also examined. An LCA model with 8 classes was developed that identified clinically similar patient subtypes. Patients in Class 1 had a high prevalence of respiratory and sleep disorders, patients in Class 2 had high rates of inflammatory skin conditions, patients in Class 3 had a high prevalence of seizure disorders, and patients in Class 4 had a high prevalence of asthma. Patients in Class 5 lacked a clear characteristic morbidity pattern, and patients in Classes 6, 7, and 8 had a high prevalence of gastrointestinal issues, neurodevelopmental disorders, and physical symptoms, respectively. Subjects generally had high membership probability for a single class (>70%), suggesting shared clinical characterization within the individual groups. We identified patient subtypes with temporal condition patterns that are significantly more common among obese pediatric patients using a Latent Class Analysis approach. Our findings may be used to characterize the prevalence of common conditions among newly obese pediatric patients and to identify pediatric obesity subtypes.
The identified subtypes align with prior knowledge on comorbidities associated with childhood obesity, including gastro-intestinal, dermatologic, developmental, and sleep disorders, as well as asthma.

15.
Autism Res ; 15(1): 117-130, 2022 01.
Article in English | MEDLINE | ID: mdl-34741438

ABSTRACT

Commercially available wearable biosensors have the potential to enhance psychophysiology research and digital health technologies for autism by enabling stress or arousal monitoring in naturalistic settings. However, such monitors may not be comfortable for children with autism due to sensory sensitivities. To determine the feasibility of wearable technology in children with autism age 8-12 years, we first selected six consumer-grade wireless cardiovascular monitors and tested them during rest and movement conditions in 23 typically developing adults. Subsequently, the best performing monitors (based on data quality robustness statistics), Polar and Mio Fuse, were evaluated in 32 children with autism and 23 typically developing children during a 2-h session, including rest and mild stress-inducing tasks. Cardiovascular data were recorded simultaneously across monitors using custom software. We administered the Comfort Rating Scales to children. Although the Polar monitor was less comfortable for children with autism than typically developing children, absolute scores demonstrated that, on average, all children found each monitor comfortable. For most children, data from the Mio Fuse (96%-100%) and Polar (83%-96%) passed quality thresholds of data robustness. Moreover, in the stress relative to rest condition, heart rate increased for the Polar, F(1,53) = 135.70, p < 0.001, ηp² = 0.78, and Mio Fuse, F(1,53) = 71.98, p < 0.001, ηp² = 0.61, respectively, and heart rate variability decreased for the Polar, F(1,53) = 13.41, p = 0.001, ηp² = 0.26, and Mio Fuse, F(1,53) = 8.89, p = 0.005, ηp² = 0.16, respectively. This feasibility study suggests that select consumer-grade wearable cardiovascular monitors can be used with children with autism and may be a promising means for tracking physiological stress or arousal responses in community settings.
LAY SUMMARY: Commercially available heart rate trackers have the potential to advance stress research with individuals with autism. Due to sensory sensitivities common in autism, their comfort wearing such trackers is vital to gathering robust and valid data. After assessing six trackers with typically developing adults, we tested the best trackers (based on data quality) in typically developing children and children with autism and found that two of them met criteria for comfort, robustness, and validity.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Wearable Electronic Devices , Adult , Child , Fitness Trackers , Heart Rate , Humans
16.
JCO Clin Cancer Inform ; 6: e2200081, 2022 09.
Article in English | MEDLINE | ID: mdl-36198128

ABSTRACT

PURPOSE: Adverse events (AEs) on Children's Oncology Group (COG) trials are manually ascertained using Common Terminology Criteria for Adverse Events. Despite significant effort, we previously demonstrated that COG typhlitis reporting sensitivity was only 37% when compared with gold standard physician chart abstraction. This study tested an automated typhlitis identification algorithm using electronic health record data. METHODS: Electronic health record data from children with leukemia age 0-22 years treated at a single institution from 2006 to 2019 were included. Patients were divided into derivation and validation cohorts. Rigorous chart abstraction of validation cohort patients established a gold standard AE data set. We created an automated algorithm to identify typhlitis matching Common Terminology Criteria for Adverse Events v5 that included antibiotics, neutropenia, and non-negated mention of typhlitis in a note. We iteratively refined the algorithm using the derivation cohort and then applied the algorithm to the validation cohort; performance was compared with the gold standard. For patients on trial AAML1031, COG AE report performance was compared with the gold standard. RESULTS: The derivation cohort included 337 patients. The validation cohort included 270 patients (961 courses). Chart abstraction identified 16 courses with typhlitis. The algorithm identified 37 courses with typhlitis; 13 were true positives (sensitivity 81.3%, positive predictive value 35.1%). For patients on AAML1031, chart abstraction identified nine courses with typhlitis, and COG reporting correctly identified 4 (sensitivity 44.4%, positive predictive value 100.0%). CONCLUSION: The automated algorithm identified true cases of typhlitis with higher sensitivity than COG reporting. The algorithm identified false positives but reduced the number of courses needing manual review by 96% (961 to 37) by detecting potential typhlitis. 
This algorithm could provide a useful screening tool to reduce manual effort required for typhlitis AE reporting.
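The reported performance figures follow directly from the course counts in the abstract, as the short worked example below shows. The function names are illustrative; the counts are those stated above (16 gold-standard typhlitis courses, 37 flagged by the algorithm, 13 true positives, 961 courses in total).

```python
def sensitivity(true_positives, false_negatives):
    """Share of gold-standard positive courses that were identified."""
    return true_positives / (true_positives + false_negatives)


def positive_predictive_value(true_positives, false_positives):
    """Share of flagged courses that were true typhlitis courses."""
    return true_positives / (true_positives + false_positives)


# Automated algorithm on the validation cohort
gold_standard_courses = 16
flagged_courses = 37
true_positives = 13
algo_sensitivity = sensitivity(true_positives, gold_standard_courses - true_positives)
algo_ppv = positive_predictive_value(true_positives, flagged_courses - true_positives)

# COG reporting on AAML1031: 4 of 9 gold-standard courses, no false positives
cog_sensitivity = sensitivity(4, 9 - 4)
cog_ppv = positive_predictive_value(4, 0)

# Fraction of courses that no longer need manual review (961 reduced to 37)
review_reduction = 1 - flagged_courses / 961
```

These evaluate to 13/16 = 0.8125 and 13/37 ≈ 0.351 for the algorithm, 4/9 ≈ 0.444 and 1.0 for COG reporting, and a review reduction of about 0.96, matching the percentages in the abstract.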


Subject(s)
Electronic Health Records , Typhlitis , Adolescent , Adult , Algorithms , Anti-Bacterial Agents , Child , Child, Preschool , Humans , Infant , Infant, Newborn , Predictive Value of Tests , Young Adult
17.
PLoS One ; 16(3): e0247784, 2021.
Article in English | MEDLINE | ID: mdl-33647071

ABSTRACT

Early childhood asthma diagnosis is common; however, many children diagnosed before age 5 experience symptom resolution and it remains difficult to identify individuals whose symptoms will persist. Our objective was to develop machine learning models to identify which individuals diagnosed with asthma before age 5 continue to experience asthma-related visits. We curated a retrospective dataset for 9,934 children derived from electronic health record (EHR) data. We trained five machine learning models to differentiate individuals without subsequent asthma-related visits (transient diagnosis) from those with asthma-related visits between ages 5 and 10 (persistent diagnosis) given clinical information up to age 5 years. Based on average NPV-Specificity area (ANSA), all models performed significantly better than random chance, with XGBoost obtaining the best performance (0.43 mean ANSA). Feature importance analysis indicated age of last asthma diagnosis under 5 years, total number of asthma related visits, self-identified black race, allergic rhinitis, and eczema as important features. Although our models appear to perform well, a lack of prior models utilizing a large number of features to predict individual persistence makes direct comparison infeasible. However, feature importance analysis indicates our models are consistent with prior research indicating diagnosis age and prior health service utilization as important predictors of persistent asthma. We therefore find that machine learning models can predict which individuals will experience persistent asthma with good performance and may be useful to guide clinician and parental decisions regarding asthma counselling in early childhood.


Subject(s)
Asthma/diagnosis , Machine Learning , Child, Preschool , Eczema/diagnosis , Electronic Health Records , Humans , Probability , Prognosis , Rhinitis, Allergic/diagnosis
18.
Int J Med Inform ; 150: 104454, 2021 06.
Article in English | MEDLINE | ID: mdl-33866231

ABSTRACT

OBJECTIVE: This study compares seven machine learning models developed to predict childhood obesity from age > 2 to ≤ 7 years using Electronic Healthcare Record (EHR) data up to age 2 years. MATERIALS AND METHODS: EHR data from 860,510 patients with 11,194,579 healthcare encounters were obtained from the Children's Hospital of Philadelphia. After applying stringent quality control to remove implausible growth values and including only individuals with all recommended wellness visits by age 7 years, 27,203 (50.78 % male) patients remained for model development. Seven machine learning models were developed to predict obesity incidence as defined by the Centers for Disease Control and Prevention (age/sex adjusted BMI>95th percentile). Model performance was evaluated by multiple standard classifier metrics and the differences among the seven models were compared using Cochran's Q test and post-hoc pairwise testing. RESULTS: XGBoost yielded 0.81 (0.001) AUC, which outperformed all other models. It also achieved statistically significantly better performance than all other models on standard classifier metrics (sensitivity fixed at 80 %): precision 30.90 % (0.22 %), F1-score 44.60 % (0.26 %), accuracy 66.14 % (0.41 %), and specificity 63.27 % (0.41 %). DISCUSSION AND CONCLUSION: Early childhood obesity prediction models were developed from the largest cohort reported to date. Relative to prior research, our models generalize to include males and females in a single model and extend the time frame for obesity incidence prediction to 7 years of age. The presented machine learning model development workflow can be adapted to various EHR-based studies and may be valuable for developing other clinical prediction models.
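
The reported XGBoost metrics are linked by standard identities, which the sketch below works through. It is an illustrative check, not the authors' code: the function names are hypothetical, the inputs are the rounded figures from the abstract, and the outcome prevalence is not reported there, so it is back-solved here as an assumption. Because the published values are averages over repeated evaluations, only approximate agreement should be expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


def implied_prevalence(precision: float, sensitivity: float, specificity: float) -> float:
    """Solve precision = s*p / (s*p + (1 - spec)*(1 - p)) for the prevalence p."""
    false_positive_rate = 1 - specificity
    numerator = precision * false_positive_rate
    return numerator / (sensitivity * (1 - precision) + numerator)


def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy is the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Rounded values taken from the abstract (sensitivity fixed at 80 %).
precision, sensitivity, specificity = 0.3090, 0.80, 0.6327

f1 = f1_score(precision, sensitivity)
prevalence = implied_prevalence(precision, sensitivity, specificity)  # assumption: back-solved, not reported
accuracy = accuracy_from_rates(sensitivity, specificity, prevalence)

print(f"F1 from precision and sensitivity: {f1:.4f}")
print(f"Implied obesity prevalence:        {prevalence:.4f}")
print(f"Accuracy implied by those rates:   {accuracy:.4f}")
```

Under these assumptions the derived F1 and accuracy land close to the reported 44.60 % and 66.14 %, which suggests the published metrics are mutually consistent.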


Subject(s)
Electronic Health Records , Pediatric Obesity , Child , Child, Preschool , Cohort Studies , Female , Humans , Incidence , Machine Learning , Male , Pediatric Obesity/epidemiology
19.
Int J Lab Hematol ; 43(6): 1341-1356, 2021 Dec.
Article in English | MEDLINE | ID: mdl-33949115

ABSTRACT

INTRODUCTION: Early diagnosis and antibiotic administration are essential for reducing sepsis morbidity and mortality; however, diagnosis remains difficult due to complex pathogenesis and presentation. We created a machine learning model for bacterial sepsis identification in the neonatal intensive care unit (NICU) using hematological analyzer data. METHODS: Hematological analyzer data were gathered from NICU patients up to 48 hours prior to clinical evaluation for bacterial sepsis. Five models, Support Vector Machine, K-nearest-neighbors, Logistic Regression, Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained on 60 hematological and nine clinical variables for 2357 cases (1692 control, 665 septic). Clinical feature only models (nine variables) were additionally trained and compared with models including hematological variables. Feature importance was used to assess relative contributions of parameters to performance. RESULTS: The three best performing models were RF, Logistic Regression, and XGBoost. RF achieved an average accuracy of 0.74, AUC-ROC of 0.73, Sensitivity of 0.38, and Specificity of 0.88. Logistic Regression achieved an average accuracy of 0.70, AUC-ROC of 0.74, Sensitivity of 0.62, and Specificity of 0.73. XGBoost achieved an average accuracy of 0.72, AUC-ROC of 0.71, Sensitivity of 0.40, and Specificity of 0.85. All models with hematological variables had significantly stronger performance than models trained on only clinical features. Neutrophil parameters had the highest average feature importance. CONCLUSIONS: Machine learning models using hematological analyzer data can classify NICU patients as sepsis positive or negative with stronger performance compared to clinical feature only models. Hematological analyzer variables could augment current sepsis classification machine learning algorithms.
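
With the class counts given in the abstract (1692 control, 665 septic), each model's accuracy can be approximated from its sensitivity and specificity. The sketch below is an illustrative consistency check, not the authors' evaluation code; the helper name is hypothetical, and it assumes the rounded averages behave like rates on the full cohort, which holds only approximately for cross-validated results.

```python
N_CONTROL = 1692  # non-septic evaluations, from the abstract
N_SEPTIC = 665    # septic evaluations, from the abstract

# Rounded average metrics as reported in the abstract.
REPORTED = {
    "Random Forest": {"sensitivity": 0.38, "specificity": 0.88, "accuracy": 0.74},
    "Logistic Regression": {"sensitivity": 0.62, "specificity": 0.73, "accuracy": 0.70},
    "XGBoost": {"sensitivity": 0.40, "specificity": 0.85, "accuracy": 0.72},
}


def accuracy_from_rates(sensitivity: float, specificity: float,
                        n_positive: int, n_negative: int) -> float:
    """Accuracy = (true positives + true negatives) / all cases."""
    true_positives = sensitivity * n_positive
    true_negatives = specificity * n_negative
    return (true_positives + true_negatives) / (n_positive + n_negative)


implied = {
    name: accuracy_from_rates(m["sensitivity"], m["specificity"], N_SEPTIC, N_CONTROL)
    for name, m in REPORTED.items()
}

for name, value in implied.items():
    print(f"{name}: implied accuracy {value:.3f}, reported {REPORTED[name]['accuracy']:.2f}")
```

The exercise also makes the class imbalance visible: because controls outnumber septic cases by roughly 2.5 to 1, a model with low sensitivity and high specificity (RF) can still post the highest accuracy.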


Subject(s)
Bacteremia/blood , Machine Learning , Neonatal Sepsis/blood , Algorithms , Bacteremia/diagnosis , Hematologic Tests , Humans , Infant, Newborn , Logistic Models , Neonatal Sepsis/diagnosis , Risk Assessment
20.
J Am Med Inform Assoc ; 27(4): 558-566, 2020 04 01.
Article in English | MEDLINE | ID: mdl-32049282

ABSTRACT

OBJECTIVE: This study introduces a temporal condition pattern mining methodology to address the sparse nature of coded condition concept utilization in electronic health record data. As a validation study, we applied this method to reveal condition patterns surrounding an initial diagnosis of pediatric asthma. MATERIALS AND METHODS: The SPADE (Sequential PAttern Discovery using Equivalence classes) algorithm was used to identify common temporal condition patterns surrounding the initial diagnosis of pediatric asthma in a study population of 71 824 patients from the Children's Hospital of Philadelphia. SPADE was applied to a dataset with diagnoses coded using International Classification of Diseases (ICD) concepts and separately to a dataset with the ICD codes mapped to their corresponding expanded diagnostic clusters (EDCs). Common temporal condition patterns surrounding the initial diagnosis of pediatric asthma ascertained by SPADE from both the ICD and EDC datasets were compared. RESULTS: SPADE identified 36 unique diagnoses in the mapped EDC dataset, whereas only 19 were recognized in the ICD dataset. Temporal trends in condition diagnoses ascertained from the EDC data were not discoverable in the ICD dataset. DISCUSSION: Mining frequent temporal condition patterns from large electronic health record datasets may reveal previously unknown associations between diagnoses that could inform future research into causation or other relationships. Mapping sparsely coded medical concepts into homogenous groups was essential to discovering potentially useful information from our dataset. CONCLUSIONS: We expect that the presented methodology is applicable to the study of diagnostic trajectories for other clinical conditions and can be extended to study temporal patterns of other coded medical concepts such as medications and procedures.
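
The quantity that sequential pattern mining thresholds on is the support of an ordered pattern: the share of patients whose visit history contains the pattern's conditions in the stated order. The sketch below shows only that containment-and-count step by brute force. It is not the SPADE algorithm itself, which enumerates candidates efficiently using vertical id-lists and equivalence classes, and the patient histories, condition labels, and function names are hypothetical toy values rather than study data.

```python
from typing import List, Set

Visit = Set[str]        # condition groups recorded at one visit
History = List[Visit]   # a patient's visits in chronological order


def contains_pattern(history: History, pattern: List[Set[str]]) -> bool:
    """True if each pattern element is a subset of some visit, in order."""
    position = 0
    for visit in history:
        if position < len(pattern) and pattern[position] <= visit:
            position += 1  # greedy earliest match is sufficient for containment
    return position == len(pattern)


def support(histories: List[History], pattern: List[Set[str]]) -> float:
    """Fraction of patients whose history contains the pattern."""
    matches = sum(contains_pattern(h, pattern) for h in histories)
    return matches / len(histories)


# Hypothetical toy histories, for illustration only.
toy_histories: List[History] = [
    [{"otitis_media"}, {"allergic_rhinitis"}, {"asthma"}],
    [{"allergic_rhinitis", "eczema"}, {"asthma"}],
    [{"asthma"}, {"allergic_rhinitis"}],
    [{"gastroenteritis"}, {"asthma"}],
]

rhinitis_then_asthma = [{"allergic_rhinitis"}, {"asthma"}]
pattern_support = support(toy_histories, rhinitis_then_asthma)
print(f"support(allergic_rhinitis -> asthma) = {pattern_support:.2f}")
```

The same counting logic explains why mapping sparse codes to broader groups matters: when many distinct codes are collapsed into one group label, more histories share that label, so more patterns can clear a fixed minimum-support threshold.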


Subject(s)
Algorithms , Asthma/diagnosis , Data Mining/methods , Electronic Health Records , Pattern Recognition, Automated/methods , Child , Datasets as Topic , Humans , International Classification of Diseases , Time Factors