Results 1 - 20 of 46
1.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34009266

ABSTRACT

Cytolytic T-cells play an essential role in the adaptive immune system by seeking out, binding and killing cells that present foreign antigens on their surface. An improved understanding of T-cell immunity will greatly aid in the development of new cancer immunotherapies and vaccines for life-threatening pathogens. Central to the design of such targeted therapies are computational methods to predict which non-native peptides will elicit a T-cell response; however, we currently lack accurate immunogenicity inference methods. Another challenge is the ability to accurately simulate immunogenic peptides for specific human leukocyte antigen alleles, both for synthetic biology applications and to augment real training datasets. Here, we propose a beta-binomial distribution approach to derive peptide immunogenic potential from sequence alone. We conducted systematic benchmarking of five traditional machine learning models (ElasticNet, K-nearest neighbors, support vector machine, Random Forest and AdaBoost) and three deep learning models (convolutional neural network (CNN), Residual Net and graph neural network) using three independent, previously validated immunogenic peptide collections (dengue virus, cancer neoantigen and SARS-CoV-2). We chose the CNN as the best prediction model, based on its adaptivity for small and large datasets and its performance relative to existing methods. In addition to outperforming two widely used immunogenicity prediction algorithms, DeepImmuno-CNN correctly predicts which residues are most important for T-cell antigen recognition and predicts novel impacts of SARS-CoV-2 variants. Our independent generative adversarial network (GAN) approach, DeepImmuno-GAN, was further able to accurately simulate immunogenic peptides with physicochemical properties and immunogenicity predictions similar to those of real antigens. We provide DeepImmuno-CNN as source code and an easy-to-use web interface.
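The beta-binomial idea underlying this approach can be sketched in a few lines of Python (a minimal illustration; the donor counts and the Beta(1, 1) prior are hypothetical, not the paper's parameters):

```python
from math import comb, lgamma, exp

def log_beta(a, b):
    # log of the Beta function via log-gamma, for numerical stability
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    # probability of k responders among n donors when the per-donor
    # response rate is itself Beta(a, b)-distributed
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

# Hypothetical counts: a peptide elicited responses in 7 of 10 tested donors.
k, n, a, b = 7, 10, 1, 1                 # uniform Beta(1, 1) prior (assumed)
posterior_mean = (k + a) / (n + a + b)   # posterior mean response rate
```

The posterior mean shrinks the raw response fraction toward the prior, which stabilizes scores for peptides tested in only a few donors.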


Subject(s)
COVID-19/immunology, Peptides/immunology, SARS-CoV-2/immunology, Algorithms, COVID-19/virology, Deep Learning, Humans, Machine Learning, Neural Networks, Computer, Peptides/genetics, SARS-CoV-2/genetics, SARS-CoV-2/pathogenicity, Software, T-Lymphocytes/immunology, T-Lymphocytes/virology
2.
Dev Med Child Neurol ; 65(1): 100-106, 2023 01.
Article in English | MEDLINE | ID: mdl-35665923

ABSTRACT

AIM: To predict ambulatory status and Gross Motor Function Classification System (GMFCS) levels in patients with cerebral palsy (CP) by applying natural language processing (NLP) to electronic health record (EHR) clinical notes. METHOD: Individuals aged 8 to 26 years with a diagnosis of CP in the EHR between January 2009 and November 2020 (~12 years of data) were included in a cross-sectional retrospective cohort of 2483 patients. The cohort was divided into train-test and validation groups. Positive predictive value, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were calculated for prediction of ambulatory status and GMFCS levels. RESULTS: The median age was 15 years (interquartile range 10-20 years) for the total cohort, with 56% being male and 75% White. The validation group yielded 70% sensitivity, 88% specificity, 81% positive predictive value, and 0.89 AUC for predicting ambulatory status. NLP applied to the EHR differentiated between GMFCS levels I-II and III (15% sensitivity, 96% specificity, 46% positive predictive value, and 0.71 AUC); and IV and V (81% sensitivity, 51% specificity, 70% positive predictive value, and 0.75 AUC). INTERPRETATION: NLP applied to the EHR demonstrated excellent differentiation between ambulatory and non-ambulatory status, and good differentiation between GMFCS levels I-II and III, and IV and V. Clinical use of NLP may help to individualize functional characterization and management. WHAT THIS PAPER ADDS: Natural language processing (NLP) applied to the electronic health record (EHR) can predict ambulatory status in children with cerebral palsy (CP). NLP provides good prediction of Gross Motor Function Classification System level in children with CP using the EHR. NLP methods described could be integrated in an EHR system to provide real-time information.
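The validation metrics quoted above follow directly from confusion-matrix counts; a minimal sketch (the counts below are hypothetical, chosen only to illustrate the formulas):

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and positive predictive value
    from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    return sensitivity, specificity, ppv

# Hypothetical validation counts (not the study's raw data):
sens, spec, ppv = binary_metrics(tp=70, fp=12, tn=88, fn=30)
```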


Subject(s)
Cerebral Palsy, Child, Humans, Male, Adolescent, Young Adult, Adult, Female, Cerebral Palsy/complications, Cerebral Palsy/diagnosis, Natural Language Processing, Retrospective Studies, Cross-Sectional Studies, Electronic Health Records
3.
Genet Med ; 24(11): 2329-2337, 2022 11.
Article in English | MEDLINE | ID: mdl-36098741

ABSTRACT

PURPOSE: The variable expressivity and multisystem features of Noonan syndrome (NS) make it difficult for patients to obtain a timely diagnosis. Genetic testing can confirm a diagnosis, but underdiagnosis is prevalent owing to a lack of recognition and referral for testing. Our study investigated the utility of using electronic health records (EHRs) to identify patients at high risk of NS. METHODS: Using diagnosis texts extracted from Cincinnati Children's Hospital's EHR database, we constructed deep learning models from 162 NS cases and 16,200 putative controls. Performance was evaluated on 2 independent test sets, one containing patients with NS who were previously diagnosed and the other containing patients with undiagnosed NS. RESULTS: Our novel method performed significantly better than the previous method, with the convolutional neural network model achieving the highest area under the precision-recall curve in both test sets (diagnosed: 0.43, undiagnosed: 0.16). CONCLUSION: The results suggested the validity of using text-based deep learning methods to analyze EHR and showed the value of this approach as a potential tool to identify patients with features of rare diseases. Given the paucity of medical geneticists, this has the potential to reduce disease underdiagnosis by prioritizing patients who will benefit most from a genetics referral.
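Area under the precision-recall curve, the headline metric here, can be computed as average precision; a self-contained sketch (the labels and scores are invented for illustration):

```python
def average_precision(labels, scores):
    # Area under the precision-recall curve, computed as average precision:
    # the mean of precision at each rank where a true positive is retrieved.
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda t: -t[0])]
    total_pos = sum(ranked)
    tp, ap = 0, 0.0
    for i, label in enumerate(ranked, start=1):
        if label:
            tp += 1
            ap += tp / i
    return ap / total_pos

# Hypothetical scores for 6 patients, 2 of whom truly have the syndrome:
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
```

On heavily imbalanced screening problems like this one, average precision is far more informative than ROC AUC, which is why the abstract reports it.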


Subject(s)
Deep Learning, Noonan Syndrome, Humans, Child, Electronic Health Records, Noonan Syndrome/diagnosis, Noonan Syndrome/genetics, Databases, Factual, Genetic Testing
4.
Pediatr Transplant ; 26(3): e14204, 2022 05.
Article in English | MEDLINE | ID: mdl-34881481

ABSTRACT

BACKGROUND: Pediatric heart transplant (PHT) patients have the highest waitlist mortality of solid organ transplants, yet more than 40% of viable hearts go unutilized. A tool for risk prediction could impact these outcomes. This study aimed to compare and validate the PHT risk score models (RSMs) in the literature. METHODS: The literature was reviewed to identify published RSMs. The United Network for Organ Sharing (UNOS) registry was used to validate the identified models in a pediatric cohort (<18 years) transplanted between 2017 and 2019, compared against the Scientific Registry of Transplant Recipients (SRTR) 2021 model. The primary outcome was post-transplant 1-year mortality. Odds ratios were obtained to evaluate the association between risk score groups and 1-year mortality. Area under the curve (AUC) was used to compare the RSM scores on their goodness-of-fit, using DeLong's test. RESULTS: Six recipient and one donor RSMs published between 2008 and 2021 were included in the analysis. The validation cohort included 1,003 PHT recipients. Low-risk groups had significantly better survival than high-risk groups as predicted by the Choudhry (OR = 4.59, 95% CI [2.36-8.93]) and Fraser III (3.17 [1.43-7.05]) models. The Choudhry and SRTR models achieved the best overall performance (AUC = 0.69 and 0.68, respectively). When adjusted for CHD and ventricular assist device support, all models reported better predictability (AUC > 0.6). Choudhry (AUC = 0.69) and SRTR (AUC = 0.71) remained the best-predicting RSMs even after adjustment. CONCLUSION: Although the RSMs by SRTR and Choudhry provided the best prediction for 1-year mortality, none demonstrated a strong (AUC ≥ 0.8) concordance statistic. All published studies lacked advanced analytical approaches and were derived from an inherently limited dataset.
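The odds ratios with confidence intervals reported above are typically derived from a 2x2 outcome table; a hedged sketch using the standard log-OR normal approximation (the counts are hypothetical, not from the UNOS cohort):

```python
from math import log, exp, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and normal-approximation 95% CI from a 2x2 table:
    a = high-risk deaths, b = high-risk survivors,
    c = low-risk deaths,  d = low-risk survivors."""
    or_ = (a * d) / (b * c)
    se = sqrt(1/a + 1/b + 1/c + 1/d)          # SE of log(OR)
    lo, hi = exp(log(or_) - z * se), exp(log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts, chosen only to illustrate the computation:
or_, lo, hi = odds_ratio_ci(a=20, b=80, c=10, d=160)
```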


Subject(s)
Heart Transplantation, Child, Humans, Registries, Risk Factors, Tissue Donors, Transplant Recipients, Waiting Lists
5.
J Med Internet Res ; 23(9): e26231, 2021 09 10.
Article in English | MEDLINE | ID: mdl-34505837

ABSTRACT

BACKGROUND: Day-of-surgery cancellation (DoSC) represents a substantial wastage of hospital resources and can cause significant inconvenience to patients and families. Cancellation is reported to affect between 2% and 20% of the 50 million procedures performed annually in American hospitals. Up to 85% of cancellations may be amenable to the modification of patients' and families' behaviors. However, the factors underlying DoSC and the barriers experienced by families are not well understood. OBJECTIVE: This study aims to conduct a geospatial analysis of patient-specific variables from electronic health records (EHRs) of Cincinnati Children's Hospital Medical Center (CCHMC) and of Texas Children's Hospital (TCH), as well as linked socioeconomic factors measured at the census tract level, to understand potential underlying contributors to disparities in DoSC rates across neighborhoods. METHODS: The study population included pediatric patients who underwent scheduled surgeries at CCHMC and TCH. A 5-year data set was extracted from the CCHMC EHR, and addresses were geocoded. An equivalent data set spanning more than 5.7 years was extracted from the TCH EHR. Case-based data related to patients' health care use were aggregated at the census tract level. Community-level variables were extracted from the American Community Survey as surrogates for patients' socioeconomic and minority status as well as markers of the surrounding context. Leveraging the selected variables, we built spatial models to understand the variation in DoSC rates across census tracts. The findings were compared to those of nonspatial regression and deep learning models. Model performance was evaluated from the root mean squared error (RMSE) using nested 10-fold cross-validation. Feature importance was evaluated by computing the increase in RMSE when a single variable was shuffled within the data set. RESULTS: Data collection yielded sets of 463 census tracts at CCHMC (DoSC rates 1.2%-12.5%) and 1024 census tracts at TCH (DoSC rates 3%-12.2%). For CCHMC, an L2-normalized generalized linear regression model achieved the best performance in predicting the all-cause DoSC rate (RMSE 1.299%, 95% CI 1.21%-1.387%); however, its improvement over the others was marginal. For TCH, an L2-normalized generalized linear regression model also performed best (RMSE 1.305%, 95% CI 1.257%-1.352%). The all-cause DoSC rate at CCHMC was predicted most strongly by previous no-shows. As for community-level data, the proportion of African American inhabitants per census tract was consistently an important predictor. In the Texas area, the proportion of overcrowded households was salient to the DoSC rate. CONCLUSIONS: Our findings suggest that geospatial analysis offers potential for use in targeting interventions for census tracts at a higher risk of cancellation. Our study also demonstrates the importance of home location, socioeconomic disadvantage, and racial minority status on the DoSC of children's surgery. The success of future efforts to reduce cancellation may benefit from taking social, economic, and cultural issues into account.
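The shuffle-based feature-importance procedure described in METHODS can be sketched as follows (a toy model and data set; `predict`, the feature layout, and the seed are all invented for illustration):

```python
import random
from statistics import mean

def rmse(y_true, y_pred):
    # root mean squared error between observed and predicted values
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred)) ** 0.5

def permutation_importance(predict, X, y, col, seed=0):
    """Increase in RMSE when feature `col` is shuffled across rows.
    A larger increase means the model relies more on that feature."""
    base = rmse(y, [predict(row) for row in X])
    rng = random.Random(seed)
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    return rmse(y, [predict(row) for row in X_perm]) - base

# Toy model: cancellation rate depends only on feature 0 ("previous no-show" rate)
predict = lambda row: 2.0 * row[0]
X = [[0.1, 5.0], [0.4, 1.0], [0.8, 3.0], [0.2, 9.0]]
y = [0.2, 0.8, 1.6, 0.4]
```

Shuffling a feature the model ignores (here, feature 1) leaves the RMSE unchanged, so its importance is exactly zero.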


Subject(s)
Minority Groups, Residence Characteristics, Child, Electronic Health Records, Hospitals, Pediatric, Humans, Socioeconomic Factors
6.
J Med Internet Res ; 21(5): e13047, 2019 05 22.
Article in English | MEDLINE | ID: mdl-31120022

ABSTRACT

BACKGROUND: The continued digitization and maturation of health care information technology has made access to real-time data easier and feasible for more health care organizations. With this increased availability, the promise of using data to algorithmically detect health care-related events in real time has become more of a reality. However, as more researchers and clinicians utilize real-time data delivery capabilities, it has become apparent that simply gaining access to the data is not a panacea, and some unique data challenges have come to the forefront in the process. OBJECTIVE: The aim of this viewpoint was to highlight some of the challenges that are germane to real-time processing of health care system-generated data and the accurate interpretation of the results. METHODS: Distinct challenges related to the use and processing of real-time data for safety event detection were compiled and reported by several informatics and clinical experts at a quaternary pediatric academic institution. The challenges were collated from the experiences of the researchers implementing real-time event detection on more than half a dozen distinct projects. The challenges are presented in a challenge category / specific challenge / example format. RESULTS: In total, 8 major challenge categories were reported, with 13 specific challenges and 9 specific examples detailed to provide context for the challenges. The examples reported are anchored to a specific project using medication order, medication administration record, and smart infusion pump data to detect discrepancies and errors between the 3 datasets. CONCLUSIONS: The use of real-time data to drive safety event detection and clinical decision support is extremely powerful, but it presents its own set of challenges, including data quality and technical complexity. These challenges must be recognized and accommodated if the full promise of accurate, real-time safety event clinical decision support is to be realized.


Subject(s)
Data Analysis, Decision Support Systems, Clinical/standards, Electronic Health Records/standards, Humans
7.
Pediatr Emerg Care ; 35(12): 868-873, 2019 Dec.
Article in English | MEDLINE | ID: mdl-30281551

ABSTRACT

OBJECTIVE: Inefficient patient recruitment, including sociotechnical barriers, is a major obstacle to the timely and efficacious conduct of translational studies. We conducted a time-and-motion study to investigate the workflow of clinical trial enrollment in a pediatric emergency department. METHODS: We observed clinical research coordinators during 3 clinically staffed shifts, shadowing one clinical research coordinator at a time. Tasks were marked in 30-second intervals and annotated to include patient screening, patient contact, performing procedures, and physician contact. Statistical analysis was conducted on the patient enrollment activities. RESULTS: We conducted fifteen 120-minute observations from December 12, 2013, to January 3, 2014, and shadowed 8 clinical research coordinators. Patient screening took 31.62% of their time, patient contact took 18.67%, performing procedures took 17.6%, physician contact took 1.0%, and other activities took 31.0%. CONCLUSIONS: Screening patients for eligibility consumed the most time; automated screening methods could help reduce this time. The findings suggest improvement areas in recruitment planning to increase the efficiency of clinical trial enrollment.
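The task-share percentages above come from tallying fixed-length observation intervals; a minimal sketch (the interval counts are hypothetical, not the study's raw annotations):

```python
from collections import Counter

# Hypothetical 30-second interval annotations from one 30-minute session
# (60 intervals), labeled with the observed coordinator task:
intervals = (["screening"] * 19 + ["patient_contact"] * 11
             + ["procedures"] * 10 + ["physician_contact"] * 1
             + ["other"] * 19)

counts = Counter(intervals)
shares = {task: n / len(intervals) for task, n in counts.items()}
```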


Subject(s)
Eligibility Determination/methods, Emergency Service, Hospital/organization & administration, Mass Screening/methods, Child, Clinical Trials as Topic, Emergency Service, Hospital/standards, Humans, Patient Selection, Prospective Studies, Research Design, Time and Motion Studies, Workflow
8.
Psychiatr Q ; 89(4): 817-828, 2018 12.
Article in English | MEDLINE | ID: mdl-29713946

ABSTRACT

School violence has increased over the past ten years. This study evaluated students using a more standardized and sensitive method to help identify those at high risk for school violence. 103 participants were recruited through Cincinnati Children's Hospital Medical Center (CCHMC) from psychiatry outpatient clinics, the inpatient units, and the emergency department. Participants (ages 12-18) were active students in 74 traditional (i.e., non-online) schools. Collateral information was gathered from guardians before participants were evaluated. School risk evaluations were performed with each participant, and audio recordings from the evaluations were later transcribed and manually annotated. The BRACHA (School Version) and the School Safety Scale (SSS), both 14-item scales, were used, along with a template of open-ended questions. Of the 103 students evaluated, 55 were found to be moderate to high risk and 48 low risk based on the paper risk assessments, including the BRACHA and SSS. Both the BRACHA and the SSS were highly correlated with risk of violence to others (Pearson correlations > 0.82). There were significant differences in BRACHA and SSS total scores between the low-risk and moderate-to-high-risk groups (p-values < 0.001, unpaired t-test), and in particular there were significant differences in individual SSS items between the two groups (p-value < 0.001). Of these items, Previous Violent Behavior (Pearson correlation = 0.80), Impulsivity (0.69), School Problems (0.64), and Negative Attitudes (0.61) were positively correlated with risk to others. The novel machine learning algorithm achieved an AUC of 91.02% when using the interview content to predict risk of school violence, and the AUC increased to 91.45% when demographic and socioeconomic data were added. Our study indicates that the BRACHA and SSS are clinically useful for assessing risk for school violence. The machine learning algorithm was highly accurate in assessing school violence risk.
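The item-level Pearson correlations reported above can be reproduced with the standard product-moment formula; a small sketch (the item and risk scores are invented for illustration):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical item scores vs. overall risk ratings for five students:
item = [0, 1, 1, 2, 3]
risk = [1, 2, 2, 4, 5]
r = pearson_r(item, risk)
```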


Subject(s)
Adolescent Behavior, Aggression, Machine Learning, Risk Assessment/methods, Schools, Violence, Adolescent, Child, Female, Humans, Male, Natural Language Processing
9.
Psychiatr Q ; 88(3): 447-457, 2017 09.
Article in English | MEDLINE | ID: mdl-27528455

ABSTRACT

School violence has increased over the past decade, and innovative, sensitive, and standardized approaches to assess school violence risk are needed. In our current feasibility study, we piloted a standardized, sensitive, and rapid school violence risk approach based on manual annotation. Manual annotation is the process of analyzing a student's transcribed interview to extract relevant information (e.g., key words) related to school violence risk levels that are associated with students' behaviors, attitudes, feelings, use of technology (social media and video games), and other activities. In this feasibility study, we first implemented school violence risk assessments to evaluate risk levels by interviewing the student and parent separately at the school or the hospital to complete our novel school safety scales. We completed 25 risk assessments, resulting in 25 transcribed interviews of 12-18 year olds from 15 schools in Ohio and Kentucky. We then analyzed structured professional judgments, language, and patterns associated with school violence risk levels by using manual annotation and statistical methodology. To analyze the student interviews, we initiated the development of an annotation guideline to extract key information associated with students' behaviors, attitudes, feelings, use of technology, and other activities. Statistical analysis was applied to associate the significant categories with students' risk levels to identify key factors, which will help with developing action steps to reduce risk. In a future study, we plan to recruit more subjects in order to fully develop the manual annotation approach, which will result in a more standardized and sensitive approach to school violence assessments.


Subject(s)
Adolescent Behavior/psychology, Child Behavior/psychology, Qualitative Research, Risk Assessment/methods, Schools, Violence/psychology, Adolescent, Child, Feasibility Studies, Female, Humans, Male, Pilot Projects
10.
J Biomed Inform ; 57: 124-33, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26190267

ABSTRACT

OBJECTIVE: To improve neonatal patient safety through automated detection of medication administration errors (MAEs) in high alert medications including narcotics, vasoactive medication, intravenous fluids, parenteral nutrition, and insulin using the electronic health record (EHR); to evaluate rates of MAEs in neonatal care; and to compare the performance of computerized algorithms to traditional incident reporting for error detection. METHODS: We developed novel computerized algorithms to identify MAEs within the EHR of all neonatal patients treated in a level four neonatal intensive care unit (NICU) in 2011 and 2012. We evaluated the rates and types of MAEs identified by the automated algorithms and compared their performance to incident reporting. Performance was evaluated by physician chart review. RESULTS: In the combined 2011 and 2012 NICU data sets, the automated algorithms identified MAEs at the following rates: fentanyl, 0.4% (4 errors/1005 fentanyl administration records); morphine, 0.3% (11/4009); dobutamine, 0 (0/10); and milrinone, 0.3% (5/1925). We found higher MAE rates for other vasoactive medications including: dopamine, 11.6% (5/43); epinephrine, 10.0% (289/2890); and vasopressin, 12.8% (54/421). Fluid administration error rates were similar: intravenous fluids, 3.2% (273/8567); parenteral nutrition, 3.2% (649/20124); and lipid administration, 1.3% (203/15227). We also found 13 insulin administration errors with a resulting rate of 2.9% (13/456). MAE rates were higher for medications that were adjusted frequently and fluids administered concurrently. The algorithms identified many previously unidentified errors, demonstrating significantly better sensitivity (82% vs. 5%) and precision (70% vs. 50%) than incident reporting for error recognition. CONCLUSIONS: Automated detection of medication administration errors through the EHR is feasible and performs better than currently used incident reporting systems. Automated algorithms may be useful for real-time error identification and mitigation.
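A rule-based discrepancy check of the general kind described might compare each administration record against its order; a simplified sketch (the field names, tolerance, and records are invented, not the study's algorithm):

```python
def find_mae(orders, administrations, tolerance=0.10):
    """Flag administrations whose dose deviates from the matching order
    by more than `tolerance` (relative), or that have no matching order.
    All field names here are hypothetical."""
    errors = []
    by_id = {o["order_id"]: o for o in orders}
    for adm in administrations:
        order = by_id.get(adm["order_id"])
        if order is None:
            errors.append((adm, "no matching order"))
        elif abs(adm["dose"] - order["dose"]) > tolerance * order["dose"]:
            errors.append((adm, "dose outside tolerance"))
    return errors

orders = [{"order_id": 1, "drug": "fentanyl", "dose": 2.0}]
administrations = [
    {"order_id": 1, "drug": "fentanyl", "dose": 2.0},  # matches the order
    {"order_id": 1, "drug": "fentanyl", "dose": 3.0},  # 50% over: flagged
    {"order_id": 9, "drug": "morphine", "dose": 1.0},  # no order: flagged
]
errs = find_mae(orders, administrations)
```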


Subject(s)
Analgesics, Opioid/therapeutic use, Intensive Care Units, Neonatal, Medication Errors, Patient Safety, Risk Management, Automation, Humans, Infant, Newborn, Intensive Care, Neonatal, Medical Order Entry Systems
11.
BMC Med Inform Decis Mak ; 15: 28, 2015 Apr 14.
Article in English | MEDLINE | ID: mdl-25881112

ABSTRACT

BACKGROUND: Manual eligibility screening (ES) for a clinical trial typically requires a labor-intensive review of patient records that utilizes many resources. Leveraging state-of-the-art natural language processing (NLP) and information extraction (IE) technologies, we sought to improve the efficiency of physician decision-making in clinical trial enrollment. In order to markedly reduce the pool of potential candidates for staff screening, we developed an automated ES algorithm to identify patients who meet core eligibility characteristics of an oncology clinical trial. METHODS: We collected narrative eligibility criteria from ClinicalTrials.gov for 55 clinical trials actively enrolling oncology patients in our institution between 12/01/2009 and 10/31/2011. In parallel, our ES algorithm extracted clinical and demographic information from the Electronic Health Record (EHR) data fields to represent profiles of all 215 oncology patients admitted to cancer treatment during the same period. The automated ES algorithm then matched the trial criteria with the patient profiles to identify potential trial-patient matches. Matching performance was validated on a reference set of 169 historical trial-patient enrollment decisions, and workload, precision, recall, negative predictive value (NPV) and specificity were calculated. RESULTS: Without automation, an oncologist would need to review 163 patients per trial on average to replicate the historical patient enrollment for each trial. This workload is reduced by 85% to 24 patients when using automated ES (precision/recall/NPV/specificity: 12.6%/100.0%/100.0%/89.9%). Without automation, an oncologist would need to review 42 trials per patient on average to replicate the patient-trial matches that occur in the retrospective data set. With automated ES this workload is reduced by 90% to four trials (precision/recall/NPV/specificity: 35.7%/100.0%/100.0%/95.5%). CONCLUSION: By leveraging NLP and IE technologies, automated ES could dramatically increase the trial screening efficiency of oncologists and enable participation of small practices, which are often left out from trial enrollment. The algorithm has the potential to significantly reduce the effort to execute clinical research at a point in time when new initiatives of the cancer care community intend to greatly expand both the access to trials and the number of available trials.
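The workload-reduction figures quoted above are simple ratios; a sketch that reproduces the 85% and 90% reductions from the abstract's own numbers:

```python
def workload_reduction(before, after):
    # fraction of manual chart reviews eliminated by automated screening
    return (before - after) / before

per_trial = workload_reduction(163, 24)    # patients reviewed per trial
per_patient = workload_reduction(42, 4)    # trials reviewed per patient
```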


Subject(s)
Clinical Trials as Topic/methods, Eligibility Determination/methods, Information Storage and Retrieval/methods, Natural Language Processing, Neoplasms/therapy, Patient Selection, Child, Humans
12.
BMC Med Inform Decis Mak ; 15: 37, 2015 May 06.
Article in English | MEDLINE | ID: mdl-25943550

ABSTRACT

BACKGROUND: In this study we implemented and developed state-of-the-art machine learning (ML) and natural language processing (NLP) technologies and built a computerized algorithm for medication reconciliation. Our specific aims were: (1) to develop a computerized algorithm for medication discrepancy detection between patients' discharge prescriptions (structured data) and medications documented in free-text clinical notes (unstructured data); and (2) to assess the performance of the algorithm on real-world medication reconciliation data. METHODS: We collected clinical notes and discharge prescription lists for all 271 patients enrolled in the Complex Care Medical Home Program at Cincinnati Children's Hospital Medical Center between 1/1/2010 and 12/31/2013. A double-annotated, gold-standard set of medication reconciliation data was created for this collection. We then developed a hybrid algorithm consisting of three processes: (1) an ML algorithm to identify medication entities from clinical notes, (2) a rule-based method to link medication names with their attributes, and (3) an NLP-based, hybrid approach to match medications with structured prescriptions in order to detect medication discrepancies. The performance was validated on the gold-standard medication reconciliation data, where precision (P), recall (R), F-value (F) and workload were assessed. RESULTS: The hybrid algorithm achieved 95.0%/91.6%/93.3% of P/R/F on medication entity detection and 98.7%/99.4%/99.1% of P/R/F on attribute linkage. The medication matching achieved 92.4%/90.7%/91.5% (P/R/F) on identifying matched medications in the gold-standard and 88.6%/82.5%/85.5% (P/R/F) on discrepant medications. By combining all processes, the algorithm achieved 92.4%/90.7%/91.5% (P/R/F) and 71.5%/65.2%/68.2% (P/R/F) on identifying the matched and the discrepant medications, respectively. The error analysis on algorithm outputs identified challenges to be addressed in order to improve medication discrepancy detection. CONCLUSION: By leveraging ML and NLP technologies, an end-to-end, computerized algorithm achieves promising outcomes in reconciling medications between clinical notes and discharge prescriptions.
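The F-values above are the harmonic means of precision and recall; for example, the entity-detection figures from the abstract combine as:

```python
def f_measure(precision, recall):
    # F1: harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Entity-detection figures quoted in the abstract (P = 95.0%, R = 91.6%):
f = f_measure(0.950, 0.916)   # rounds to 0.933, matching the reported 93.3%
```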


Subject(s)
Algorithms, Drug Prescriptions/standards, Machine Learning, Medication Reconciliation/standards, Natural Language Processing, Patient Discharge/standards, Adult, Humans
13.
J Biomed Inform ; 50: 173-183, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24556292

ABSTRACT

OBJECTIVE: The current study aims to fill the gap in available healthcare de-identification resources by creating a new sharable dataset with realistic Protected Health Information (PHI) without reducing the value of the data for de-identification research. By releasing the annotated gold standard corpus with a Data Use Agreement, we would like to encourage other computational linguists to experiment with our data and develop new machine learning models for de-identification. This paper describes: (1) the modifications required by the Institutional Review Board before sharing the de-identification gold standard corpus; (2) our efforts to keep the PHI as realistic as possible; and (3) the tests to show the effectiveness of these efforts in preserving the value of the modified data set for machine learning model development. MATERIALS AND METHODS: In a previous study we built an original de-identification gold standard corpus annotated with true PHI from 3503 randomly selected clinical notes for the 22 most frequent clinical note types of our institution. In the current study we modified the original gold standard corpus to make it suitable for external sharing by replacing HIPAA-specified PHI with newly generated realistic PHI. Finally, we evaluated the research value of this new dataset by comparing the performance of an existing published in-house de-identification system, when trained on the new de-identification gold standard corpus, with the performance of the same system, when trained on the original corpus. We assessed the potential benefits of using the new de-identification gold standard corpus to identify PHI in the i2b2 and PhysioNet datasets that were released by other groups for de-identification research. We also measured the effectiveness of the i2b2 and PhysioNet de-identification gold standard corpora in identifying PHI in our original clinical notes. RESULTS: Performance of the de-identification system using the new gold standard corpus as a training set was very close to training on the original corpus (92.56 vs. 93.48 overall F-measures). Best i2b2/PhysioNet/CCHMC cross-training performances were obtained when training on the new shared CCHMC gold standard corpus, although performances were still lower than corpus-specific trainings. DISCUSSION AND CONCLUSION: We successfully modified a de-identification dataset for external sharing while preserving the de-identification research value of the modified gold standard corpus, with only a limited drop in machine learning de-identification performance.


Subject(s)
Medical Informatics, Computer Security, Electronic Health Records, Health Insurance Portability and Accountability Act, United States
14.
Environ Adv ; 14, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38094913

ABSTRACT

Background: Cystic fibrosis (CF) is a genetic disease but is greatly impacted by non-genetic (social/environmental and stochastic) influences. Some people with CF experience rapid decline, a precipitous drop in lung function relative to patient- and/or center-level norms. Those who experience rapid decline in early adulthood, compared to adolescence, typically exhibit less severe clinical disease but greater loss of lung function. The extent to which timing and degree of rapid decline are informed by social and environmental determinants of health (geomarkers) is unknown. Methods: A longitudinal cohort study was performed (24,228 patients, aged 6-21 years) using the U.S. CF Foundation Patient Registry. Geomarkers at the ZIP Code Tabulation Area level measured air pollution/respiratory hazards, greenspace, crime, and socioeconomic deprivation. A composite score quantifying social-environmental adversity was created and used in covariate-adjusted functional principal component analysis, which was applied to cluster longitudinal lung function trajectories. Results: Social-environmental phenotyping yielded three primary phenotypes corresponding to early, middle, and late timing of peak decline in lung function over age. Geographic differences were related to distinct cultural and socioeconomic regions. Extent of peak decline, estimated as the annual loss in forced expiratory volume in 1 s (% predicted), ranged from 2.8 to 4.1% predicted/year depending on social-environmental adversity. Middle decliners with increased social-environmental adversity experienced rapid decline 14.2 months earlier than their counterparts with lower social-environmental adversity, while timing was similar within the other phenotypes. Early and middle decliners experienced mortality peaks during early adolescence and adulthood, respectively. Conclusion: While early decliners had the most severe CF lung disease, middle and late decliners lost more lung function. Higher social-environmental adversity was associated with increased risk of rapid decline and mortality during young adulthood among middle decliners. This sub-phenotype may benefit from enhanced lung-function monitoring and personalized secondary environmental health interventions to mitigate chemical and non-chemical stressors.
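The Methods above describe clustering longitudinal lung-function trajectories with covariate-adjusted functional principal component analysis. The sketch below illustrates only the plain, unadjusted version of that idea on synthetic data: simulate trajectories with early/middle/late decline timing, extract functional principal component (FPC) scores via SVD, and cluster them. All ages, decline shapes, and parameters are invented for illustration; this is not the study's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate FEV1 %-predicted trajectories for 300 hypothetical patients,
# observed yearly from age 6 to 21 (values are illustrative only).
ages = np.arange(6, 22)
n = 300
peaks = rng.choice([10.0, 15.0, 19.0], size=n)     # early/middle/late decline timing
base = 100 - 0.5 * (ages - 6)                      # slow background decline
traj = np.array([
    base - 8 * np.exp(-0.5 * ((ages - p) / 2.0) ** 2)  # localized drop at age p
    + rng.normal(0, 1.0, ages.size)
    for p in peaks
])

# Functional PCA: SVD of the mean-centered trajectory matrix.
centered = traj - traj.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * s                       # per-patient FPC scores
explained = s**2 / np.sum(s**2)      # variance explained per component

# Cluster patients on their leading FPC scores with a tiny k-means.
def kmeans(X, k, iters=50, seed=1):
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        new = []
        for j in range(k):
            pts = X[labels == j]
            new.append(pts.mean(axis=0) if len(pts) else centers[j])
        centers = np.array(new)
    return labels

labels = kmeans(scores[:, :2], k=3)
print("variance explained by first 2 FPCs:", explained[:2].sum().round(3))
```

The clusters recovered this way correspond to the timing of the simulated decline, which is the intuition behind phenotyping trajectories rather than single measurements.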

15.
Pediatr Pulmonol ; 58(5): 1501-1513, 2023 05.
Article in English | MEDLINE | ID: mdl-36775890

ABSTRACT

BACKGROUND: The extent to which environmental exposures and community characteristics of the built environment collectively predict rapid lung function decline during adolescence and early adulthood in cystic fibrosis (CF) has not been examined. OBJECTIVE: To identify built environment characteristics predictive of rapid CF lung function decline. METHODS: We performed a retrospective, single-center, longitudinal cohort study (n = 173 individuals with CF aged 6-20 years, 2012-2017). We used a stochastic model to predict lung function, measured as forced expiratory volume in 1 s (FEV1) % predicted. Traditional demographic/clinical characteristics were evaluated as predictors. Built environment predictors included exposure to elemental carbon attributable to traffic sources (ECAT), neighborhood material deprivation (poverty, education, housing, and healthcare access), greenspace near the home, and residential drive time to the CF center. MEASUREMENTS AND MAIN RESULTS: The final model, which included ECAT, material deprivation index, and greenspace alongside traditional demographic/clinical predictors, significantly improved fit and prediction compared with demographic/clinical predictors alone (likelihood ratio test statistic: 26.78, p < 0.0001; difference in Akaike Information Criterion: 15). An increase of 0.1 µg/m3 of ECAT was associated with 0.104% predicted/year (95% confidence interval: 0.024, 0.183) more rapid decline. Although not statistically significant, material deprivation was similarly associated (a 0.1-unit increase corresponded to an additional decline of 0.103% predicted/year [-0.113, 0.319]). High-risk regional areas of rapid decline and age-related heterogeneity were identified from prediction mapping. CONCLUSION: Traffic-related air pollution exposure is an important predictor of rapid pulmonary decline that, coupled with community-level material deprivation and routinely collected demographic/clinical characteristics, enhances CF prognostication and enables personalized environmental health interventions.
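The model comparison reported above (likelihood ratio test and AIC) is simple arithmetic once the log-likelihoods are in hand. The log-likelihoods and parameter counts below are hypothetical placeholders, chosen only so that the resulting statistics roughly match the values the abstract reports (LRT statistic 26.78, AIC difference near 15); they are not the study's actual numbers.

```python
# Hypothetical model-comparison arithmetic (illustrative values only).
k_reduced, k_full = 8, 14          # assumed parameter counts for the two models
ll_reduced = -5000.0               # hypothetical log-likelihood, clinical-only model
ll_full = ll_reduced + 26.78 / 2   # chosen so the LRT statistic matches the abstract

# Likelihood ratio test statistic: twice the gain in log-likelihood.
lrt = 2 * (ll_full - ll_reduced)

# Akaike Information Criterion: penalizes parameters, rewards fit.
def aic(k, ll):
    return 2 * k - 2 * ll

delta_aic = aic(k_reduced, ll_reduced) - aic(k_full, ll_full)
print(f"LRT statistic = {lrt:.2f}, AIC improvement = {delta_aic:.2f}")
```

The AIC improvement equals the LRT statistic minus twice the number of added parameters, which is why a large LRT statistic can still correspond to a smaller AIC gain.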


Subject(s)
Cystic Fibrosis, Adolescent, Humans, Adult, Longitudinal Studies, Retrospective Studies, Cohort Studies, Lung, Forced Expiratory Volume
16.
Sci Rep ; 13(1): 1971, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36737471

ABSTRACT

The electronic Medical Records and Genomics (eMERGE) Network assessed the feasibility of deploying portable phenotype rule-based algorithms with natural language processing (NLP) components added to improve the performance of existing algorithms using electronic health records (EHRs). Based on scientific merit and predicted difficulty, eMERGE selected six existing phenotypes to enhance with NLP. We assessed performance, portability, and ease of use. We summarized lessons learned in terms of: (1) challenges; (2) best practices to address challenges based on existing evidence and/or eMERGE experience; and (3) opportunities for future research. Adding NLP resulted in improved, or equivalent, precision and/or recall for all but one algorithm. Portability, phenotyping workflow/process, and technology were major themes. With NLP, development and validation took longer. Besides portability of NLP technology and algorithm replicability, factors to ensure success include privacy protection, technical infrastructure setup, intellectual property agreements, and efficient communication. Workflow improvements can improve communication and reduce implementation time. NLP performance varied mainly due to clinical document heterogeneity; therefore, we suggest using semi-structured notes, comprehensive documentation, and customization options. NLP portability is possible with improved phenotype algorithm performance, but careful planning and architecture of the algorithms are essential to support local customizations.


Subject(s)
Electronic Health Records, Natural Language Processing, Genomics, Algorithms, Phenotype
17.
JMIR Med Inform ; 10(12): e37833, 2022 Dec 16.
Article in English | MEDLINE | ID: mdl-36525289

ABSTRACT

BACKGROUND: Artificial intelligence (AI) technologies, such as machine learning and natural language processing, have the potential to provide new insights into complex health data. Although powerful, these algorithms rarely move from experimental studies to direct clinical care implementation. OBJECTIVE: We aimed to describe the key components for successful development and integration of two AI technology-based research pipelines for clinical practice. METHODS: We summarized the approach, results, and key learnings from the implementation of the following two systems implemented at a large, tertiary care children's hospital: (1) epilepsy surgical candidate identification (or epilepsy ID) in an ambulatory neurology clinic; and (2) an automated clinical trial eligibility screener (ACTES) for the real-time identification of patients for research studies in a pediatric emergency department. RESULTS: The epilepsy ID system performed as well as board-certified neurologists in identifying surgical candidates (with a sensitivity of 71% and positive predictive value of 77%). The ACTES system decreased coordinator screening time by 12.9%. The success of each project was largely dependent upon the collaboration between machine learning experts, research and operational information technology professionals, longitudinal support from clinical providers, and institutional leadership. CONCLUSIONS: These projects showcase novel interactions between machine learning recommendations and providers during clinical care. Our deployment provides seamless, real-time integration of AI technology to provide decision support and improve patient care.

18.
Stud Health Technol Inform ; 290: 517-521, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673069

ABSTRACT

Weight entry errors can cause significant patient harm in pediatrics due to pervasive weight-based dosing practices. While computerized algorithms can assist in error detection, they have not achieved the high sensitivity and specificity needed for further development as a clinical decision support tool. To train an advanced algorithm, expert-annotated weight errors are essential but difficult to collect. In this study, we developed a visual annotation tool to gather large amounts of expertly annotated pediatric weight charts and conducted a formal user-centered evaluation. Key features of the tool included configurable grid sizes and annotation styles. User feedback was collected through a structured survey and user clicks on the interface. The results show that the visual annotation tool has high usability (average SUS = 86.4). Different combinations of the key features, however, did not significantly improve annotation efficiency or duration. We have used this tool to collect expert annotations for algorithm development and benchmarking.


Subject(s)
Decision Support Systems, Clinical, Pediatrics, Algorithms, Child, Feedback, Humans
19.
Neuroimage ; 56(2): 662-73, 2011 May 15.
Article in English | MEDLINE | ID: mdl-20348000

ABSTRACT

This paper introduces two kernel-based regression schemes to decode or predict brain states from functional brain scans as part of the Pittsburgh Brain Activity Interpretation Competition (PBAIC) 2007, in which our team was awarded first place. Our procedure involved image realignment, spatial smoothing, detrending of low-frequency drifts, and application of multivariate linear and non-linear kernel regression methods: namely, kernel ridge regression (KRR) and relevance vector regression (RVR). RVR is based on a Bayesian framework that automatically determines a sparse solution through maximization of the marginal likelihood. KRR is the dual-form formulation of ridge regression, which solves regression problems with high-dimensional data in a computationally efficient way. Feature selection based on prior knowledge about human brain function was also used. Post-processing by constrained deconvolution and re-convolution was used to refine the predictions. This paper also contains a detailed description of how prior knowledge was used to fine-tune predictions of specific "feature ratings," which we believe is one of the key factors in our prediction accuracy. The impact of pre-processing was also evaluated, demonstrating that different pre-processing choices may lead to significantly different accuracies. Although the original work was aimed at the PBAIC, many of the techniques described in this paper can be applied generally to fMRI decoding work to increase prediction accuracy.
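KRR's appeal for fMRI decoding is the dual-form trick the abstract mentions: solving an n x n system (samples) instead of a d x d one (voxels) when d >> n. A minimal sketch with a linear kernel on synthetic data (not PBAIC data; the sizes, noise level, and regularization value are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for fMRI decoding: predict a continuous "feature rating"
# from high-dimensional scan features. Few samples, many voxels.
n, d = 100, 2000
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
y = X @ w / np.sqrt(d) + 0.1 * rng.normal(size=n)

def krr_fit(X, y, lam=1.0):
    """Kernel ridge regression with a linear kernel (dual form of ridge).

    Works in the n x n Gram-matrix space, so the linear solve is cheap
    even when the feature dimension d is much larger than n."""
    K = X @ X.T                                       # Gram matrix, n x n
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return alpha

def krr_predict(X_train, alpha, X_new):
    # Prediction is a weighted sum of kernel similarities to training points.
    return (X_new @ X_train.T) @ alpha

alpha = krr_fit(X, y)
pred = krr_predict(X, alpha, X)
print("training correlation:", np.corrcoef(pred, y)[0, 1].round(3))
```

Swapping the linear kernel `X @ X.T` for a nonlinear kernel gives the non-linear variant without changing the solver, which is the design choice that makes kernel methods attractive here.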


Subject(s)
Artificial Intelligence, Brain Mapping/methods, Brain/physiology, Image Processing, Computer-Assisted/methods, Magnetic Resonance Imaging, Pattern Recognition, Automated, Algorithms, Computer Simulation, Humans, Models, Neurological
20.
J Am Med Inform Assoc ; 28(10): 2116-2127, 2021 09 18.
Article in English | MEDLINE | ID: mdl-34333636

ABSTRACT

OBJECTIVE: Substance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learning technologies to detect substance use information from both structured and unstructured EHR data. MATERIALS AND METHODS: Pediatric patients (10-20 years of age) with any encounter between July 1, 2012, and October 31, 2017, were included (n = 3890 patients; 19 478 encounters). EHR data were extracted at each encounter, manually reviewed for substance use (alcohol, tobacco, marijuana, opiate, any use), and coded as lifetime use, current use, or family use. Logic rules mapped structured EHR indicators to screening results. A knowledge-based NLP system and a deep learning model detected substance use information from unstructured clinical narratives. System performance was evaluated using positive predictive value, sensitivity, negative predictive value, specificity, and area under the receiver-operating characteristic curve (AUC). RESULTS: The dataset included 17 235 structured indicators and 27 141 clinical narratives. Manual review of clinical narratives captured 94.0% of positive screening results, while structured EHR data captured 22.0%. Logic rules detected screening results from structured data with 1.0 and 0.99 for sensitivity and specificity, respectively. The knowledge-based system detected substance use information from clinical narratives with 0.86, 0.79, and 0.88 for AUC, sensitivity, and specificity, respectively. The deep learning model further improved detection capacity, achieving 0.88, 0.81, and 0.85 for AUC, sensitivity, and specificity, respectively. Finally, integrating predictions from structured and unstructured data achieved high detection capacity across all cases (0.96, 0.85, and 0.87 for AUC, sensitivity, and specificity, respectively). 
CONCLUSIONS: It is feasible to detect substance use screening practices and results among pediatric patients using logic rules, NLP, and machine learning technologies.
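The integration step described above, combining a deterministic logic-rule flag from structured data with a thresholded NLP model score, and the sensitivity/specificity arithmetic can be sketched as follows. All data, prevalence rates, score distributions, and the 0.5 threshold are illustrative inventions, not the study's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic encounters: truth = whether a positive substance use screen exists.
n = 1000
truth = rng.random(n) < 0.3                    # hypothetical 30% prevalence
rule_flag = truth & (rng.random(n) < 0.22)     # structured data catches only some
# NLP score on the clinical narrative: higher for true positives.
nlp_prob = np.where(truth, rng.beta(5, 2, n), rng.beta(2, 5, n))

# Integrate: positive if the deterministic rule fires OR the NLP
# score clears a threshold.
pred = rule_flag | (nlp_prob >= 0.5)

def sens_spec(pred, truth):
    tp = np.sum(pred & truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    return tp / (tp + fn), tn / (tn + fp)

sensitivity, specificity = sens_spec(pred, truth)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

The OR-combination can only raise sensitivity relative to either source alone, at some cost to specificity, which mirrors the trade-off pattern in the reported results.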


Subject(s)
Electronic Health Records, Substance-Related Disorders, Adolescent, Child, Humans, Machine Learning, Narration, Natural Language Processing, Substance-Related Disorders/diagnosis