ABSTRACT
BACKGROUND: A barrier to the use of genomic information during prescribing is the limited number of software solutions that combine a user-friendly interface with complex medical data. We designed and built a secure, custom online interface termed the Genomic Prescribing System (GPS). METHODS: Actionable pharmacogenomic (PGx) information was reviewed, collected, and stored in the back-end of GPS to enable creation of customized drug- and variant-specific clinical decision support (CDS) summaries. The database architecture used a star schema to store information. Patient raw genomic data underwent transformation via custom-designed algorithms to enable gene- and phenotype-level associations. Multiple external data sets (PubMed, the Systematized Nomenclature of Medicine (SNOMED), National Drug File - Reference Terminology (NDF-RT), and a publicly available PGx knowledgebase) were integrated to facilitate the delivery of patient, drug, disease, and genomic information. Institutional security infrastructure was leveraged to securely store patient genomic and clinical data on a HIPAA-compliant server farm. RESULTS: As of May 17, 2016, the GPS back-end housed 257 CDS encompassing 112 genetic variants, 42 genes, and 46 PGx-actionable drugs. The GPS user interface presented patient-specific CDS alongside a recognizable traffic light symbol (green/yellow/red), denoting PGx risk for each genomic result. The number of traffic lights per visit increased with the corresponding increase in the number of available PGx-annotated drugs over time. An integrated drug and disease search functionality, links to primary literature sources, and potential alternative PGx drugs were included. The system, which was initially used as stand-alone CDS software within our clinical environment, was then integrated with the institutional electronic medical record for enhanced usability. There have been nearly 2000 logins in 43 months since inception, with usage exceeding 56 logins per month and system uptime of 99.99%. For all patient-provider visits encompassing >3 years of implementation, unique alert click-through rates corresponded to genomic risk: red lights clicked 100%, yellow lights 79%, green lights 43%. CONCLUSIONS: Successful deployment of GPS by combining complex data and recognizable iconography led to a tool that enabled point-of-care genomic delivery with high usability. Continued scalability and incorporation of additional clinical elements to be considered alongside PGx information could expand future impact.
Subject(s)
Pharmacogenetics, Clinical Decision Support Systems, Electronic Health Records, Humans, Precision Medicine, Software, User-Computer Interface
ABSTRACT
Objective: The Pediatric Cancer Data Commons (PCDC), a project of Data for the Common Good, houses clinical pediatric oncology data and uses the open-source Gen3 platform. To meet the needs of end users, the PCDC development team expanded the out-of-the-box functionality and developed additional custom features that should be useful to any group developing a similar data commons. Materials and Methods: Modifications of the PCDC data portal software were implemented to provide the desired functionality. Results: Newly developed functionality includes updates to authorization methods, expansion of filtering capabilities, and addition of data analysis functions. Discussion: We describe the process by which custom functionalities were developed. Features are open source and available to be implemented and adapted to suit the needs of data portals that use the Gen3 platform. Conclusion: Data portals are indispensable tools for facilitating data sharing. Open-source infrastructure facilitates a modular and collaborative approach for meeting the needs of end users and stakeholders.
ABSTRACT
PURPOSE: Although the International Neuroblastoma Risk Group Data Commons (INRGdc) has enabled seminal large cohort studies, the research is limited by the lack of real-world, electronic health record (EHR) treatment data. To address this limitation, we evaluated the feasibility of extracting treatment data directly from EHRs using the REDCap Clinical Data Interoperability Services (CDIS) module for future submission to the INRGdc. METHODS: Patients enrolled on the Children's Oncology Group neuroblastoma biology study ANBL00B1 (ClinicalTrials.gov identifier: NCT00904241) who received care at the University of Chicago (UChicago) or the Vanderbilt University Medical Center (VUMC) after the go-live dates for the Fast Healthcare Interoperability Resources (FHIR)-compliant EHRs were identified. Antineoplastic drug orders were extracted using the CDIS module. To validate the CDIS output, antineoplastic agents extracted through FHIR were compared with those queried through EHR relational databases (UChicago's Clinical Research Data Warehouse and VUMC's Epic Clarity database) and manual chart review. RESULTS: The analytic cohort consisted of 41 patients at UChicago and 32 VUMC patients. Antineoplastic drug orders were identified in the extracted EHR records of 39 (95.1%) UChicago patients and 26 (81.3%) VUMC patients. Manual chart review confirmed that patients with missing (n = 8) or discontinued (n = 1) orders in the CDIS output did not receive antineoplastic agents during the timeframe of the study. More than 99% of the antineoplastic drug orders in the EHR relational databases were identified in the corresponding CDIS output. CONCLUSION: Our results demonstrate the feasibility of extracting EHR treatment data with high fidelity using HL7-FHIR via REDCap CDIS for future submission to the INRGdc.
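The validation step described above, checking how many orders in the relational database also appear in the CDIS output, can be illustrated with a small set comparison. This is only a sketch under stated assumptions: the `(patient_id, drug, order_date)` tuple layout, the `coverage` helper, and the sample records are hypothetical and are not taken from the study.

```python
def coverage(reference_orders, extracted_orders):
    """Return the fraction of reference orders found in the extracted set,
    plus the sorted list of reference orders that were not found."""
    reference = set(reference_orders)
    if not reference:
        raise ValueError("reference set is empty")
    found = reference & set(extracted_orders)
    missing = sorted(reference - found)
    return len(found) / len(reference), missing


# Hypothetical orders as (patient_id, drug, order_date); not study data.
warehouse_orders = [
    ("P01", "cyclophosphamide", "2021-03-01"),
    ("P01", "topotecan", "2021-03-01"),
    ("P02", "carboplatin", "2021-04-12"),
    ("P02", "etoposide", "2021-04-12"),
]
fhir_orders = [
    ("P01", "cyclophosphamide", "2021-03-01"),
    ("P01", "topotecan", "2021-03-01"),
    ("P02", "carboplatin", "2021-04-12"),
]

rate, missing = coverage(warehouse_orders, fhir_orders)
print(f"{rate:.0%} of reference orders found; missing: {missing}")
```

In practice, drug names and dates would likely need normalization (for example, mapping to a common drug vocabulary) before an exact-match comparison like this is meaningful.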
Subject(s)
Electronic Health Records, Neuroblastoma, Humans, Neuroblastoma/drug therapy, Neuroblastoma/therapy, Female, Male, Child, Preschool Child, Health Information Interoperability, Infant, Antineoplastic Agents/therapeutic use, Factual Databases
ABSTRACT
Data commons have proven to be an indispensable avenue for advancing pediatric cancer research by serving as unified information technology platforms that, when coupled with data standards, facilitate data sharing. The Pediatric Cancer Data Commons, the flagship project of Data for the Common Good (D4CG), collaborates with disease-based consortia to facilitate development of clinical data standards, harmonization and pooling of clinical data from disparate sources, establishment of a governance structure, and sharing of clinical data. In the interest of international collaboration, researchers developed the Hodgkin Lymphoma Data Collaboration and forged a relationship with the Pediatric Cancer Data Commons to establish a data commons for pediatric Hodgkin lymphoma. Herein, we describe the progress made in the formation of the Hodgkin Lymphoma Data Collaboration and its foundational goals to advance pediatric Hodgkin lymphoma research.
Subject(s)
Hodgkin Disease, Hodgkin Disease/therapy, Humans, Child, Information Dissemination, Biomedical Research/organization & administration, Factual Databases
ABSTRACT
[This corrects the article DOI: 10.1017/cts.2023.670.].
ABSTRACT
The Pediatric Cancer Data Commons (PCDC) comprises an international community whose ironclad commitment to data sharing is combatting pediatric cancer in an unprecedented way. The byproduct of their data sharing efforts is a gold-standard consensus data model covering many types of pediatric cancer. This article describes an effort to utilize SSSOM, an emerging specification for semantically-rich data mappings, to provide a "hub and spoke" model of mappings from several common data models (CDMs) to the PCDC data model. This provides important contributions to the research community, including: 1) a clear view of the current coverage of these CDMs in the domain of pediatric oncology, and 2) a demonstration of creating standardized mappings. These mappings can allow downstream crosswalk for data transformation and enhance data sharing. This can guide those who currently create and maintain brittle ad hoc data mappings in order to utilize the growing volume of viable research data.
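To make the "hub and spoke" idea concrete, the sketch below reads a few SSSOM-style rows and derives a crosswalk between two source models by routing through the shared hub term. The column names (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`) follow the SSSOM specification; every identifier in the sample rows (`cdm_a:`, `cdm_b:`, `pcdc:` terms) and both helper functions are hypothetical illustrations, not mappings from the article.

```python
import csv
import io

# Hypothetical SSSOM-style rows: each spoke (CDM term) maps to a hub (PCDC term).
SSSOM_TSV = """subject_id\tpredicate_id\tobject_id\tmapping_justification
cdm_a:birth_date\tskos:exactMatch\tpcdc:date_of_birth\tsemapv:ManualMappingCuration
cdm_b:dob\tskos:exactMatch\tpcdc:date_of_birth\tsemapv:ManualMappingCuration
cdm_a:gender\tskos:closeMatch\tpcdc:sex\tsemapv:ManualMappingCuration
"""


def load_mappings(tsv_text):
    """Parse tab-separated mapping rows into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def crosswalk(mappings, source_prefix, target_prefix,
              predicates=("skos:exactMatch",)):
    """Map source-model terms to target-model terms via the shared hub term.

    Only mappings whose predicate is in `predicates` are followed. If a
    source term has several qualifying rows, the last one wins (a
    simplification for this sketch).
    """
    to_hub = {}    # source term -> hub term
    from_hub = {}  # hub term -> list of target terms
    for m in mappings:
        if m["predicate_id"] not in predicates:
            continue
        prefix = m["subject_id"].split(":", 1)[0]
        if prefix == source_prefix:
            to_hub[m["subject_id"]] = m["object_id"]
        elif prefix == target_prefix:
            from_hub.setdefault(m["object_id"], []).append(m["subject_id"])
    return {s: from_hub[h] for s, h in to_hub.items() if h in from_hub}


result = crosswalk(load_mappings(SSSOM_TSV), "cdm_a", "cdm_b")
print(result)
```

With N source models, this design needs only N mapping sets to the hub rather than a separate pairwise mapping for every combination of models.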
Subject(s)
Neoplasms, Child, Humans, Medical Oncology, Information Dissemination
ABSTRACT
PURPOSE: Matching patients to clinical trials is cumbersome and costly. Attempts have been made to automate the matching process; however, most have used a trial-centric approach, which focuses on a single trial. In this study, we developed a patient-centric matching tool that matches patient-specific demographic and clinical information with free-text clinical trial inclusion and exclusion criteria extracted using natural language processing to return a list of relevant clinical trials ordered by the patient's likelihood of eligibility. MATERIALS AND METHODS: Records from pediatric leukemia clinical trials were downloaded from ClinicalTrials.gov. Regular expressions were used to discretize and extract individual trial criteria. A multilabel support vector machine (SVM) was trained to classify sentence embeddings of criteria into relevant clinical categories. Labeled criteria were parsed using regular expressions to extract numbers, comparators, and relationships. In the validation phase, a patient-trial match score was generated for each trial and returned in the form of a ranked list for each patient. RESULTS: In total, 5,251 discretized criteria were extracted from 216 protocols. The most frequent criterion was previous chemotherapy/biologics (17%). The multilabel SVM demonstrated a pooled accuracy of 75%. The text processing pipeline was able to automatically extract 68% of eligibility criteria rules, as compared with 80% in a manual version of the tool. Automated matching was accomplished in approximately 4 seconds, as compared with several hours using manual derivation. CONCLUSION: To our knowledge, this project represents the first open-source attempt to generate a patient-centric clinical trial matching tool. The tool demonstrated acceptable performance when compared with a manual version, and it has potential to save time and money when matching patients to trials.
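The step in which labeled criteria are parsed with regular expressions to extract numbers and comparators might look roughly like the following. This is a minimal sketch, not the authors' pipeline: the comparator vocabulary, the unit list, and the `extract_rules` helper are assumptions chosen for illustration.

```python
import re

# Hypothetical comparator vocabulary mapped to canonical operators.
COMPARATORS = {
    "greater than or equal to": ">=", "at least": ">=", "≥": ">=", ">=": ">=",
    "less than or equal to": "<=", "at most": "<=", "≤": "<=", "<=": "<=",
    "greater than": ">", "more than": ">", ">": ">",
    "less than": "<", "under": "<", "<": "<",
}

# Try longer phrases first so "less than or equal to" wins over "less than".
_alternatives = "|".join(
    re.escape(key) for key in sorted(COMPARATORS, key=len, reverse=True)
)
PATTERN = re.compile(
    rf"(?P<cmp>{_alternatives})\s*(?P<num>\d+(?:\.\d+)?)"
    r"\s*(?P<unit>years?|months?|days?)?",
    re.IGNORECASE,
)


def extract_rules(criterion):
    """Return (operator, number, unit-or-None) tuples found in a criterion."""
    rules = []
    for match in PATTERN.finditer(criterion):
        operator = COMPARATORS[match.group("cmp").lower()]
        unit = (match.group("unit") or "").lower() or None
        rules.append((operator, float(match.group("num")), unit))
    return rules


age_rules = extract_rules("Age at least 1 year and less than 22 years at enrollment")
print(age_rules)
```

Each extracted tuple can then be compared against a patient attribute (here, age) to contribute to a patient-trial match score. Free-text criteria vary widely, so a pattern this small would miss many phrasings.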
Subject(s)
Leukemia, Natural Language Processing, Child, Humans, Eligibility Determination/methods, Leukemia/diagnosis, Leukemia/therapy, Patient Selection, Patient-Centered Care, Clinical Trials as Topic
ABSTRACT
The mechanisms that underlie the timing of labor in humans are largely unknown. In most pregnancies, labor is initiated at term (≥37 weeks' gestation), but in a significant number of women spontaneous labor occurs preterm and is associated with increased perinatal mortality and morbidity. The objective of this study was to characterize the cells at the maternal-fetal interface (MFI) in term and preterm pregnancies in both the laboring and non-laboring state in Black women, who have among the highest preterm birth rates in the U.S. Using mass cytometry to obtain high-dimensional single-cell resolution, we identified 31 cell populations at the MFI, including 25 immune cell types and six non-immune cell types. Among the immune cells, maternal PD1+ CD8 T cell subsets were less abundant in term laboring compared to term non-laboring women. Among the non-immune cells, PD-L1+ maternal (stromal) and fetal (extravillous trophoblast) cells were less abundant in preterm laboring compared to term laboring women. Consistent with these observations, the expression of CD274, the gene encoding PD-L1, was significantly depressed and less responsive to fetal signaling molecules in cultured mesenchymal stromal cells from the decidua of preterm compared to term women. Overall, these results suggest that the PD1/PD-L1 pathway at the MFI may perturb the delicate balance between immune tolerance and rejection and contribute to the onset of spontaneous preterm labor.
Subject(s)
Obstetric Labor, Premature Obstetric Labor, Premature Birth, Pregnancy, Humans, Female, Newborn Infant, B7-H1 Antigen/genetics, Premature Obstetric Labor/metabolism, T-Lymphocyte Subsets
ABSTRACT
Background/Objective: Non-clinical aspects of life, such as social, environmental, behavioral, psychological, and economic factors, what we call the sociome, play significant roles in shaping patient health and health outcomes. This paper introduces the Sociome Data Commons (SDC), a new research platform that enables large-scale data analysis for investigating such factors. Methods: This platform focuses on "hyper-local" data, i.e., at the neighborhood or point level, a geospatial scale of data not adequately considered in existing tools and projects. We enumerate key insights gained regarding data quality standards, data governance, and organizational structure for long-term project sustainability. A pilot use case investigating sociome factors associated with asthma exacerbations in children residing on the South Side of Chicago used machine learning and six SDC datasets. Results: The pilot use case reveals one dominant spatial cluster for asthma exacerbations and important roles of housing conditions and cost, proximity to Superfund pollution sites, urban flooding, violent crime, lack of insurance, and a poverty index. Conclusion: The SDC has been purposefully designed to support and encourage extension of the platform into new data sets as well as the continued development, refinement, and adoption of standards for dataset quality, dataset inclusion, metadata annotation, and data access/governance. The asthma pilot has served as the first driver use case and demonstrates promise for future investigation into the sociome and clinical outcomes. Additional projects will be selected, in part for their ability to exercise and grow the capacity of the SDC to meet its ambitious goals.
ABSTRACT
OBJECTIVE: Adherence to a treatment plan by HIV-positive patients is necessary to decrease their mortality and improve their quality of life; however, some patients display poor appointment adherence and become lost to follow-up (LTFU). We applied natural language processing (NLP) to analyze indications towards or against LTFU in HIV-positive patients' notes. MATERIALS AND METHODS: Unstructured lemmatized notes were labeled with an LTFU or Retained status using a 183-day threshold. An NLP and supervised machine learning system with a linear model and elastic net regularization was trained to predict this status. The prevalence of characteristic domains in the learned model weights was evaluated. RESULTS: We analyzed 838 LTFU vs 2964 Retained notes and obtained a weighted F1 mean of 0.912 via nested cross-validation; another experiment with notes from the same patients in both classes showed substantially lower metrics. "Comorbidities" were associated with LTFU through, for instance, "HCV" (hepatitis C virus), and likewise "Good adherence" with Retained, represented by "Well on ART" (antiretroviral therapy). DISCUSSION: Mentions of mental health disorders and substance use were associated with disparate retention outcomes; however, history vs active use was not investigated. There remains a need to model transitions between LTFU and being retained in care over time. CONCLUSION: We provided an important step for the future development of a model that could eventually help to identify patients who are at risk for falling out of care and to analyze which characteristics could be factors for this. Further research is needed to enhance this method with structured electronic medical record fields.
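The abstract states that notes were labeled using a 183-day threshold but does not give the exact rule. One plausible reading, sketched below purely as an assumption, labels a visit's note LTFU when more than 183 days pass before the patient's next visit (or before the end of the observation window for the final visit). The `label_notes` helper and the sample dates are hypothetical.

```python
from datetime import date

THRESHOLD_DAYS = 183  # threshold stated in the abstract


def label_notes(visit_dates, observation_end):
    """Label each visit by the gap to the next visit (assumed rule).

    A gap longer than THRESHOLD_DAYS yields "LTFU", otherwise "Retained".
    For the last visit the gap is measured to `observation_end`, so a short
    final gap means "not yet known to be lost" rather than confirmed retention.
    """
    visits = sorted(visit_dates)
    labels = []
    for i, visit in enumerate(visits):
        next_date = visits[i + 1] if i + 1 < len(visits) else observation_end
        gap_days = (next_date - visit).days
        status = "LTFU" if gap_days > THRESHOLD_DAYS else "Retained"
        labels.append((visit, status))
    return labels


# Hypothetical visit history for one patient.
history = [date(2019, 1, 10), date(2019, 4, 15), date(2020, 2, 1)]
labels = label_notes(history, observation_end=date(2020, 3, 1))
for visit, status in labels:
    print(visit.isoformat(), status)
```

Labels produced this way would then serve as targets for the supervised text classifier described in the abstract.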
Subject(s)
Electronic Health Records, HIV Infections/therapy, Natural Language Processing, Patient Compliance, Retention in Care, Adult, Female, Humans, Lost to Follow-Up, Male, Theoretical Models
ABSTRACT
The international pediatric oncology community has a long history of research collaboration. In the United States, the 2019 launch of the Children's Cancer Data Initiative puts the focus on developing a rich and robust data ecosystem for pediatric oncology. In this spirit, we present here our experience in constructing the Pediatric Cancer Data Commons (PCDC) to highlight the significance of this effort in fighting pediatric cancer and improving outcomes and to provide essential information to those creating resources in other disease areas. The University of Chicago's PCDC team has worked with the international research community since 2015 to build data commons for children's cancers. We identified six critical features of successful data commons design and implementation: (1) establish the need for a data commons, (2) develop and deploy the technical infrastructure, (3) establish and implement governance, (4) make the data commons platform easy and intuitive for researchers, (5) socialize the data commons and create working knowledge and expertise in the research community, and (6) plan for longevity and sustainability. Data commons are critical to conducting research on large patient cohorts that will ultimately lead to improved outcomes for children with cancer. There is value in connecting high-quality clinical and phenotype data to external sources of data such as genomic, proteomics, and imaging data. Next steps for the PCDC include creating an informed and invested data-sharing culture, developing sustainable methods of data collection and sharing, standardizing genetic biomarker reporting, incorporating radiologic and molecular analysis data, and building models for electronic patient consent. The methods and processes described here can be extended to any clinical area and provide a blueprint for others wishing to develop similar resources.
Subject(s)
Biomedical Research, Neoplasms, Child, Ecosystem, Genomics, Humans, Medical Oncology, Neoplasms/epidemiology, Neoplasms/therapy, United States
ABSTRACT
PURPOSE: Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions. PATIENTS AND METHODS: Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS: We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance. CONCLUSION: Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
Subject(s)
Electronic Health Records, Heuristics, Medical Informatics/methods, Natural Language Processing, Molecular Pathology/methods, Research Report, Software, Adolescent, Adult, Aged, Aged 80 and over, Algorithms, Female, Humans, Machine Learning, Male, Middle Aged, Neoplasm Staging, Neoplasms/diagnosis, User-Computer Interface, Workflow, Young Adult
ABSTRACT
CAPriCORN, the Chicago Area Patient Centered Outcomes Research Network, is one of the eleven PCORI-funded Clinical Data Research Networks. A collaboration of six academic medical centers, a Chicago public hospital, two VA hospitals and a network of federally qualified health centers, CAPriCORN addresses the needs of a diverse community and overlapping populations. To capture complete medical records without compromising patient privacy and confidentiality, the network created policies and mechanisms for patient consultation, central IRB approval, de-identification, de-duplication, and integration of patient data by study cohort, randomization and sampling, re-identification for consent by providers and patients, and communication with patients to elicit patient-reported outcomes through validated instruments. The paper describes these policies and mechanisms and discusses two case studies to prove the feasibility and effectiveness of the network.