1.
J Microsc ; 294(3): 350-371, 2024 Jun.
Article En | MEDLINE | ID: mdl-38752662

Bioimage data are generated in diverse research fields throughout the life and biomedical sciences. Their potential for advancing scientific progress via modern, data-driven discovery approaches reaches beyond disciplinary borders. To fully exploit this potential, it is necessary to make bioimaging data, in general multidimensional microscopy images and image series, FAIR, that is, findable, accessible, interoperable and reusable. These FAIR principles for research data management are now widely accepted in the scientific community and have been adopted by funding agencies, policymakers and publishers. To remain competitive and at the forefront of research, researchers and research infrastructures must implement the FAIR principles in their daily routines, an essential but challenging task. Imaging core facilities, well-established providers of access to imaging equipment and expertise, are in an excellent position to lead this transformation in bioimaging research data management. They sit at the intersection of research groups, IT infrastructure providers, the institution's administration, and microscope vendors. Within German BioImaging - Society for Microscopy and Image Analysis (GerBI-GMB), cross-institutional working groups and third-party funded projects have been initiated in recent years to advance the bioimaging community's capability and capacity for FAIR bioimage data management. Here, we provide an imaging-core-facility-centric perspective outlining the experience and current strategies in Germany to facilitate the practical adoption of the FAIR principles, closely aligned with the international bioimaging community. We highlight which tools and services are ready to be implemented and outline future directions for FAIR bioimage data.


Microscopy , Biomedical Research/methods , Data Management/methods , Image Processing, Computer-Assisted/methods , Microscopy/methods
2.
F1000Res ; 13: 8, 2024.
Article En | MEDLINE | ID: mdl-38779317

Biomedical research projects are becoming increasingly complex and require technological solutions that support all phases of the data lifecycle and application of the FAIR principles. At the Berlin Institute of Health (BIH), we have developed and established a flexible and cost-effective approach to building customized cloud platforms for supporting research projects. The approach is based on a microservice architecture and on the management of a portfolio of supported services. On this basis, we created and maintained cloud platforms for several international research projects. In this article, we present our approach and argue that building customized cloud platforms can offer multiple advantages over using multi-project platforms. Our approach is transferable to other research environments and can be easily adapted by other projects and other service providers.


Biomedical Research , Cloud Computing , Data Management , Humans , Data Management/methods
3.
Methods Mol Biol ; 2787: 3-38, 2024.
Article En | MEDLINE | ID: mdl-38656479

In this chapter, we explore the application of high-throughput crop phenotyping facilities for phenotype data acquisition and the extraction of meaningful information from the collected data through image processing and data mining methods. Additionally, the construction and outlook of crop phenotype databases are introduced, and the need for global cooperation and data sharing is emphasized. High-throughput crop phenotyping markedly improves accuracy and efficiency compared with traditional measurements, helping to overcome bottlenecks in the phenotyping field and to advance crop genetics.


Crops, Agricultural , Data Mining , Image Processing, Computer-Assisted , Phenotype , Crops, Agricultural/genetics , Crops, Agricultural/growth & development , Data Mining/methods , Image Processing, Computer-Assisted/methods , Data Management/methods , High-Throughput Screening Assays/methods
4.
Physiol Rev ; 104(3): 1387-1408, 2024 Jul 01.
Article En | MEDLINE | ID: mdl-38451234

Effective data management is crucial for scientific integrity and reproducibility, a cornerstone of scientific progress. Well-organized and well-documented data enable validation and building on results. Data management encompasses activities including organization, documentation, storage, sharing, and preservation. Robust data management establishes credibility, fostering trust within the scientific community and benefiting researchers' careers. In experimental biomedicine, comprehensive data management is vital due to the typically intricate protocols, extensive metadata, and large datasets. Low-throughput experiments, in particular, require careful management to address variations and errors in protocols and raw data quality. Transparent and accountable research practices rely on accurate documentation of procedures, data collection, and analysis methods. Proper data management ensures long-term preservation and accessibility of valuable datasets. Well-managed data can be revisited, contributing to cumulative knowledge and potential new discoveries. Publicly funded research has an added responsibility for transparency, resource allocation, and avoiding redundancy. Meeting funding agency expectations increasingly requires rigorous methodologies, adherence to standards, comprehensive documentation, and widespread sharing of data, code, and other auxiliary resources. This review provides critical insights into raw and processed data, metadata, high-throughput versus low-throughput datasets, a common language for documentation, experimental and reporting guidelines, efficient data management systems, sharing practices, and relevant repositories. We systematically present available resources and optimal practices for wide use by experimental biomedical researchers.


Biomedical Research , Data Management , Information Dissemination , Biomedical Research/standards , Biomedical Research/methods , Information Dissemination/methods , Humans , Animals , Data Management/methods
6.
Environ Int ; 165: 107334, 2022 07.
Article En | MEDLINE | ID: mdl-35696847

Management of datasets that include health information and other sensitive personal information of European study participants has to be compliant with the General Data Protection Regulation (GDPR, Regulation (EU) 2016/679). Within scientific research, the widely subscribed 'FAIR' data principles should apply, meaning that research data should be findable, accessible, interoperable and re-usable. Balancing the aim of open-science-driven FAIR data management with GDPR-compliant personal data protection safeguards is now a common challenge for many research projects dealing with (sensitive) personal data. In December 2020, a workshop was held with representatives of several large EU research consortia and of the European Commission to reflect on how to apply the FAIR data principles to environment and health (E&H) research. Several recent data-intensive EU-funded E&H research projects face this challenge and are working intensively towards developing solutions to access, exchange, store, handle, share, process and use such sensitive personal data, with the aim of supporting European and transnational collaborations. As a result, several recommendations, opportunities and current limitations were formulated. New technical developments such as federated data management and analysis systems, machine learning combined with advanced search software, harmonized ontologies and data quality standards should in principle facilitate the FAIRification of data. To address the ethical, legal, political and financial obstacles to the wider re-use of data for research purposes, both specific expertise and underpinning infrastructure are needed. E&H research data also need to find their place in the European Open Science Cloud. Communities using health and population data, environmental data and other publicly available data have to interconnect and synergize. To maximize the use and re-use of environment and health data, a dedicated supporting European infrastructure effort, such as the EIRENE research infrastructure within the ESFRI roadmap 2021, is needed that would interact with existing infrastructures.


Computer Security , Data Management , Health Records, Personal , Data Management/methods , Europe , Humans
7.
PLoS One ; 17(1): e0262523, 2022.
Article En | MEDLINE | ID: mdl-35045100

Risk quantification algorithms in the ICU can (1) provide an early alert to the clinician that a patient is at extreme risk and (2) help manage limited resources efficiently or remotely. With electronic health records, large data sets allow the training of predictive models to quantify patient risk. A gradient boosting classifier was trained to predict high-risk and low-risk trauma patients, where patients were labeled high-risk if they expired within the next 10 hours or within the last 10% of their ICU stay duration. The MIMIC-III database was filtered to extract 5,400 trauma patient records (526 non-survivors), each of which contained 5 static variables (age, gender, etc.) and 28 dynamic variables (e.g., vital signs and metabolic panel). Training data were also extracted from the dynamic variables using a 3-hour moving time window, whereby each window was treated as a unique patient-time fragment. We extracted the mean, standard deviation, and skew from each of these 3-hour fragments and included them as inputs for training. Additionally, a survival metric upon admission was calculated for each patient using a previously developed gradient boosting model trained on the National Trauma Data Bank (NTDB). The final model distinguished between high-risk and low-risk patients with an AUROC (area under the receiver operating characteristic curve) of 92.9%. Importantly, the dynamic survival probability plots for patients who die appear considerably different from those of patients who survive, an example of reducing the high dimensionality of the patient record to a single trauma trajectory.
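For illustration, a minimal sketch of the rolling-window feature extraction described above, assuming a pandas DataFrame of time-stamped dynamic variables for a single patient; the column names, window handling, and classifier settings are illustrative assumptions rather than details taken from the paper:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def window_features(vitals: pd.DataFrame, hours: int = 3) -> pd.DataFrame:
    """Summarize each dynamic variable over a rolling time window.

    `vitals` is assumed to be indexed by timestamp, with one column per
    dynamic variable (e.g., heart_rate, map, spo2). Each row of the result
    is one patient-time fragment described by mean, std, and skew.
    """
    roll = vitals.rolling(f"{hours}h")
    feats = pd.concat(
        {"mean": roll.mean(), "std": roll.std(), "skew": roll.skew()},
        axis=1,
    )
    feats.columns = [f"{var}_{stat}" for stat, var in feats.columns]
    return feats.dropna()

# Once fragments from all patients are stacked into a matrix X with a
# high-risk/low-risk label y per fragment, a gradient boosting model
# can be fit on them:
clf = GradientBoostingClassifier(random_state=0)
# clf.fit(X, y)
```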


Hospital Mortality/trends , Risk Assessment/methods , Adult , Aged , Algorithms , Data Management/methods , Databases, Factual , Electronic Health Records , Female , Hospitalization/statistics & numerical data , Hospitalization/trends , Humans , Injury Severity Score , Intensive Care Units/statistics & numerical data , Machine Learning , Male , Middle Aged , Probability , Prognosis , ROC Curve , Retrospective Studies , Risk Factors
8.
Anesth Analg ; 134(2): 380-388, 2022 02 01.
Article En | MEDLINE | ID: mdl-34673658

BACKGROUND: The retrospective analysis of electroencephalogram (EEG) signals acquired from patients under general anesthesia is crucial to understanding the state of the patient's brain during unconsciousness. However, the creation of such a database is often tedious and cumbersome and involves considerable human labor. Hence, we developed a Raspberry Pi-based system for archiving EEG signals recorded from patients under anesthesia in operating rooms (ORs) with minimal human involvement. METHODS: Using this system, we archived patient EEG signals from over 500 unique surgeries at the Emory University Orthopaedics and Spine Hospital, Atlanta, over about 18 months. For this, we developed a software package that runs on a Raspberry Pi and archives patient EEG signals from a SedLine Root EEG Monitor (Masimo) to secure, Health Insurance Portability and Accountability Act (HIPAA)-compliant cloud storage. The OR number corresponding to each surgery was archived along with the EEG signal to facilitate retrospective EEG analysis. We retrospectively processed the archived EEG signals and performed signal quality checks. We also proposed a formula to compute the proportion of true EEG signal and calculated the corresponding statistics. Further, we curated patient medical record information and interleaved it with the corresponding EEG signals. RESULTS: Retrospective processing of the EEG signals demonstrated a statistically significant negative correlation between the relative alpha power (8-12 Hz) of the EEG captured under anesthesia and the patient's age. CONCLUSIONS: Our system is a standalone EEG archiver built from low-cost, readily available hardware. We demonstrated that a large-scale EEG database can be created with minimal human involvement. Moreover, we showed that the captured EEG signal is of sufficient quality for retrospective analysis, and we combined the EEG signal with the patient medical records. The project's software has been released under an open-source license so that others can use and contribute to it.
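As a hedged illustration of the relative alpha power metric mentioned in the results, the fraction of spectral power in the 8-12 Hz band can be estimated from a raw EEG trace with Welch's method; the function name, segment length, and band edges below are assumptions for the example, not details taken from the paper:

```python
import numpy as np
from scipy.signal import welch

def relative_alpha_power(eeg: np.ndarray, fs: float,
                         band: tuple = (8.0, 12.0)) -> float:
    """Fraction of total spectral power falling in the alpha band.

    `eeg` is a 1-D EEG trace, `fs` the sampling rate in Hz. The 4-second
    Welch segments are an illustrative choice.
    """
    freqs, psd = welch(eeg, fs=fs, nperseg=int(4 * fs))
    alpha = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.trapz(psd[alpha], freqs[alpha]) / np.trapz(psd, freqs))

# Example with synthetic data: 60 s of noise sampled at 250 Hz.
print(relative_alpha_power(np.random.randn(60 * 250), fs=250.0))
```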


Data Curation/methods , Electroencephalography/instrumentation , Electroencephalography/methods , Monitoring, Intraoperative/instrumentation , Monitoring, Intraoperative/methods , Adult , Aged , Aged, 80 and over , Data Management/instrumentation , Data Management/methods , Female , Humans , Male , Middle Aged , Retrospective Studies , Young Adult
9.
Retina ; 42(1): 4-10, 2022 01 01.
Article En | MEDLINE | ID: mdl-34081638

PURPOSE: To review the current literature on the management of proliferative diabetic retinopathy (PDR) and the challenges in the real-world setting. METHODS: A review of the literature was performed on the therapeutic options for PDR, with a focus on the real-world data presented by the Pan-American Collaborative Retina Study Group. RESULTS: Data from clinical trials and previous literature have reported that intravitreal antivascular endothelial growth factor (anti-VEGF) therapy is noninferior to the gold standard panretinal photocoagulation for treating PDR. However, PDR recurs rapidly after cessation of anti-VEGF therapy. This is especially important in the context of the diabetic population that is prone to loss to follow-up. In a real-world, prospective study, patients with prior panretinal photocoagulation followed by anti-VEGF therapy had higher rates of sustained PDR regression relative to anti-VEGF therapy alone. CONCLUSION: Owing to its transient therapeutic effect, anti-VEGF therapy in patients with diabetes can present a risk of recurrent retinal neovascularization and progression of PDR if follow-up cannot be guaranteed. A combined paradigm with less aggressive, immediate panretinal photocoagulation followed by anti-VEGF therapy should be considered in this population.


Data Management/methods , Diabetic Retinopathy/therapy , Disease Management , Diabetic Retinopathy/epidemiology , Humans , Latin America/epidemiology , Morbidity , Spain/epidemiology
10.
Nutrients ; 13(12)2021 Nov 24.
Article En | MEDLINE | ID: mdl-34959759

The European Commission-funded project Stance4Health (S4H) aims to develop a complete personalised nutrition service. To succeed, its sources of information on the nutritional composition and other characteristics of foods need to be as comprehensive as possible. Food composition tables or databases (FCT/FCDB) are the most commonly used tools for this purpose. The aim of this study is to describe the harmonisation efforts carried out to obtain the Stance4Health FCDB. A total of 10 FCT/FCDB were selected from different countries and organizations. Data were classified using FoodEx2 and INFOODS tagnames to harmonise the information. Hazard analysis and critical control points (HACCP) was applied as the quality control method. Data were processed using spreadsheets and MySQL. The S4H FCDB is composed of 880 elements, including nutrients and bioactive compounds. A total of 2,648 unified foods were used to fill missing values in the national FCT/FCDB. Recipes and dishes were estimated following EuroFIR standards via linked tables. The S4H FCDB will be part of the smartphone app developed in the framework of the Stance4Health European project, which will be used in different personalised nutrition intervention studies. The S4H FCDB has great potential, being one of the most complete in terms of the number of harmonised foods, nutrients and bioactive compounds included.
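A schematic sketch of the kind of harmonisation step described above: aligning a national food composition table with a table of unified reference foods on a shared FoodEx2 code and filling missing nutrient values. The file names, column names, and INFOODS-style tagnames are hypothetical:

```python
import pandas as pd

# Hypothetical inputs: each table carries a FoodEx2 code plus nutrient
# columns named with INFOODS-style tagnames.
national = pd.read_csv("national_fct.csv")      # one national FCT/FCDB
unified = pd.read_csv("unified_foods.csv")      # harmonised reference foods
nutrients = ["ENERC_kJ", "PROT_g", "FAT_g", "CHO_g"]  # illustrative tagnames

# Align on the FoodEx2 classification code, keeping national values and
# pulling the reference values alongside them.
merged = national.merge(unified, on="foodex2_code",
                        how="left", suffixes=("", "_ref"))

# Fill gaps in the national data from the unified reference foods.
for col in nutrients:
    merged[col] = merged[col].fillna(merged[f"{col}_ref"])

harmonised = merged[["foodex2_code", "food_name"] + nutrients]
harmonised.to_csv("s4h_fcdb_subset.csv", index=False)
```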


Data Management/methods , Databases as Topic/standards , Food Analysis/statistics & numerical data , Food/statistics & numerical data , Nutrition Therapy , Europe , Food/standards , Food Analysis/standards , Humans , Nutrients/analysis , Phytochemicals/analysis , Proportional Hazards Models , Quality Control
11.
Comput Math Methods Med ; 2021: 1725490, 2021.
Article En | MEDLINE | ID: mdl-34868338

The purpose of this article is to conduct in-depth research on, and analysis of, an artificial intelligence coordination and optimization mechanism for the management of students by college counselors using big data technology. The study places the collaborative ideological and political work of colleges and universities in the context of big data and, by analyzing its basic connotations and changes in the real-world situation, explores how colleges and universities can make full use of big data resources to cultivate a collaborative education model. Such a model supports whole-staff, whole-process, all-round and accurate ideological education and value-led services, and helps shape well-rounded young college students. The first step is to scientifically build a multilevel, linked big data management platform for counselor professionalization: plan the technical architecture of the organizational platform, build a cloud database of counselor career files, and extract valuable information and data from organizational activities at the macro level and personal activities at the micro level related to counselor professionalization. The second step is to realize the integrated application of information resources for counselor team construction: visualise, accurately analyze, and evaluate the counselor group's focus on career development and individual counselors' feedback on career capacity building, and improve the overall level of team construction, personalized education management, and capacity for self-improvement. Finally, in the professionalization of counselors, attention should be paid to the scientific selection and risk prevention of big data applications, ensuring the authenticity and reliability of data as well as leakage prevention and control.


Artificial Intelligence , Big Data , Counselors , Students , Universities , Computational Biology , Data Management/methods , Humans
12.
Nutrients ; 13(10)2021 Oct 03.
Article En | MEDLINE | ID: mdl-34684504

Comprehensive food lists and databases are a critical input for programs aiming to alleviate undernutrition. However, standard methods for developing them may produce databases that are irrelevant for marginalised groups where nutritional needs are highest. Our study provides a method for identifying the critical contextual information required to build relevant food lists for Indigenous populations. We used a mixed-methods study design with a community-based approach. Between July and October 2019, we interviewed 74 participants from the Batwa and Bakiga communities in south-western Uganda, conducting focus group discussions (FGDs), individual dietary surveys, and market and shop assessments. Locally validated information on foods consumed by Indigenous populations can differ from the foods listed in national food composition tables; the construction of food lists is influenced by multiple factors such as food culture and the meaning of food, environmental changes, dietary transition, and social context. Without a community-based approach to understanding socio-environmental contexts, we would have missed 33 commonly consumed recipes and foods, and we would not have known the variety and quantity of ingredients in each recipe or the traditional foraged foods. The food culture, food systems and nutrition of Indigenous and vulnerable communities are unique and need to be considered when developing food lists.


Data Management/methods , Databases, Factual , Diet/ethnology , Food Supply , Black People/ethnology , Culture , Diet Surveys , Focus Groups , Food Assistance , Humans , Indigenous Peoples , Rural Population , Social Environment , Uganda
13.
Value Health ; 24(10): 1484-1489, 2021 10.
Article En | MEDLINE | ID: mdl-34593172

OBJECTIVES: To explore the use of data dashboards to convey information about a drug's value, and reduce the need to collapse dimensions of value to a single measure. METHODS: Review of the literature on US Drug Value Assessment Frameworks, and discussion of the value of data dashboards to improve the manner in which information on value is displayed. RESULTS: The incremental cost per quality-adjusted life-year ratio is a useful starting point for conversation about a drug's value, but it cannot reflect all of the elements of value about which different audiences care deeply. Data dashboards for drug value assessments can draw from other contexts. Decision makers should be presented with well-designed value dashboards containing various metrics, including conventional cost per quality-adjusted life-year ratios as well as measures of a drug's impact on clinical and patient-centric outcomes, and on budgetary and distributional consequences, to convey a drug's value along different dimensions. CONCLUSIONS: The advent of US drug value frameworks in health care has forced a concomitant effort to develop appropriate information displays. Researchers should formally test different formats and elements.


Data Management/methods , Pharmaceutical Preparations/economics , Budgets , Data Management/standards , Data Management/trends , Humans , Social Media/instrumentation , Social Media/standards , Social Media/statistics & numerical data , United States
14.
PLoS One ; 16(10): e0257923, 2021.
Article En | MEDLINE | ID: mdl-34648520

Facial imaging and facial recognition technologies, now common in our daily lives, are also increasingly incorporated into health care processes, enabling touch-free appointment check-in, matching patients accurately, and assisting with the diagnosis of certain medical conditions. The use, sharing, and storage of facial data is expected to expand in coming years, yet little is documented about the perspectives of patients and participants regarding these uses. We developed a pair of surveys to gather public perspectives on uses of facial images and facial recognition technologies in healthcare and in health-related research in the United States. We used Qualtrics Panels to collect responses from general public respondents using two complementary and overlapping survey instruments: one focused on six types of biometrics (including facial images and DNA) and their uses in a wide range of societal contexts (including healthcare and research), and the other focused on facial imaging, facial recognition technology, and related data practices in health and research contexts specifically. We collected responses from a diverse group of 4,048 adults in the United States (2,038 and 2,010 from the two surveys, respectively). A majority of respondents (55.5%) indicated they were equally worried about the privacy of medical records, DNA, and facial images collected for precision health research. A vignette was used to gauge willingness to participate in a hypothetical precision health study, with respondents split between willing (39.6%), unwilling (30.1%), and unsure (30.3%). Nearly one-quarter of respondents (24.8%) reported they would prefer to opt out of the DNA component of such a study, and 22.0% reported they would prefer to opt out of both the DNA and facial imaging components. Few indicated willingness to pay a fee to opt out of the collection of their research data. Finally, respondents were offered options for the ideal governance design of their data: "open science," "gated science," or "closed science." No option elicited a majority response. Our findings indicate that while a majority of research participants might be comfortable with facial images and facial recognition technologies in healthcare and health-related research, a significant fraction expressed concern for the privacy of their own face-based data, similar to their privacy concerns about DNA data and medical records. A nuanced approach to uses of face-based data in healthcare and health-related research is needed, taking into consideration storage protection plans and the contexts of use.


Automated Facial Recognition/methods , Biomedical Research/methods , Data Management/methods , Delivery of Health Care/methods , Facial Recognition , Information Dissemination/methods , Public Opinion , Adolescent , Adult , Aged , Female , Humans , Male , Medical Records , Middle Aged , Privacy , Surveys and Questionnaires , United States , Young Adult
15.
Oncology ; 99(12): 802-812, 2021.
Article En | MEDLINE | ID: mdl-34515209

INTRODUCTION: Physicians spend an ever-increasing amount of time collecting relevant information from highly variable medical reports and integrating it into an assessment of the patient's health condition. OBJECTIVES: We compared synoptic reporting based on data elements with narrative reporting in order to evaluate its capability to collect and integrate clinical information. METHODS: We developed a novel system that aligns medical reporting with data integration requirements and tested it in prostate cancer screening. We compared expenditure of time, data quality, and user satisfaction for data acquisition, integration, and evaluation. RESULTS: In a total of 26 sessions, 2 urologists, 2 radiologists, and 2 pathologists conducted the diagnostic work-up for prostate cancer screening with both narrative reporting and the novel system. The novel system led to a significantly reduced time for the collection and integration of patient information (91%, p < 0.001) and for reporting in radiology (44%, p < 0.001), as well as a reduction for reporting in pathology (33%, p = 0.154). System usage had a strongly positive effect on the evaluated data quality parameters (completeness, format, understandability) as well as on user satisfaction. CONCLUSION: This study provides evidence that synoptic reporting based on data elements effectively reduces the time needed for collection and integration of patient information. Further research is needed to assess the system's impact across different patient journeys.


Data Management/methods , Early Detection of Cancer/methods , Medical Oncology/methods , Prostatic Neoplasms/diagnostic imaging , Software , Hospitals, University , Humans , Magnetic Resonance Imaging/methods , Male , Pathologists/psychology , Pilot Projects , Prostate-Specific Antigen , Prostatic Neoplasms/epidemiology , Prostatic Neoplasms/pathology , Radiologists/psychology , Research Report , Switzerland/epidemiology , Urologists/psychology
16.
Biochemistry ; 60(38): 2902-2914, 2021 09 28.
Article En | MEDLINE | ID: mdl-34491035

Citrullination is an enzyme-catalyzed post-translational modification (PTM) that is essential for a host of biological processes, including gene regulation, programmed cell death, and organ development. While this PTM is required for normal cellular functions, aberrant citrullination is a hallmark of autoimmune disorders as well as cancer. Although aberrant citrullination is linked to human pathology, the exact role of citrullination in disease remains poorly characterized, in part because of the challenges associated with identifying the specific arginine residues that are citrullinated. Tandem mass spectrometry is the most precise method for uncovering sites of citrullination; however, due to the small mass shift (+0.984 Da) that results from citrullination, current database search algorithms commonly misannotate spectra, leading to a high number of false-positive assignments. To address this challenge, we developed an automated workflow to rigorously and rapidly mine proteomic data to unambiguously identify the sites of citrullination from complex peptide mixtures. The crux of this streamlined workflow is the ionFinder software program, which classifies citrullination sites with high confidence on the basis of the presence of diagnostic fragment ions. These diagnostic ions include the neutral loss of isocyanic acid, which is a dissociative event that is unique to citrulline residues. Using the ionFinder program, we have mapped the sites of autocitrullination on purified protein arginine deiminases (PAD1-4) and mapped the global citrullinome in a PAD2-overexpressing cell line. The ionFinder algorithm is a highly versatile, user-friendly, and open-source program that is agnostic to the type of instrument and mode of fragmentation that are used.
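To make the diagnostic-ion idea concrete, the sketch below flags spectra in which a fragment peak is consistent with the neutral loss of isocyanic acid (HNCO, 43.0058 Da) from the precursor. It is a simplified, hypothetical check for illustration only and does not reproduce the ionFinder implementation; the ppm tolerance and example values are invented:

```python
import numpy as np

HNCO = 43.0058       # monoisotopic mass of isocyanic acid (Da)
CIT_SHIFT = 0.984    # citrullination mass shift reported above (not used below)

def has_neutral_loss(precursor_mz: float, charge: int,
                     fragment_mzs: np.ndarray, tol_ppm: float = 10.0) -> bool:
    """Return True if any fragment matches the precursor minus HNCO.

    The expected m/z after a neutral loss of isocyanic acid from an ion of
    charge z is precursor_mz - HNCO / z; observed fragments are compared
    against it within a ppm tolerance.
    """
    expected = precursor_mz - HNCO / charge
    ppm_error = np.abs(fragment_mzs - expected) / expected * 1e6
    return bool(np.any(ppm_error <= tol_ppm))

# Example: a doubly charged precursor and a few observed fragment peaks
# (all numbers are made up for illustration).
print(has_neutral_loss(600.300, 2, np.array([578.797, 450.120, 300.050])))
```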


Citrullination/physiology , Data Mining/methods , Proteomics/methods , Algorithms , Arginine/metabolism , Citrullination/genetics , Citrulline/chemistry , Citrulline/genetics , Citrulline/metabolism , Data Analysis , Data Management/methods , Humans , Peptides/metabolism , Protein Processing, Post-Translational , Protein-Arginine Deiminases/genetics , Protein-Arginine Deiminases/metabolism , Tandem Mass Spectrometry/methods
17.
Plast Reconstr Surg ; 148(5): 735e-741e, 2021 Nov 01.
Article En | MEDLINE | ID: mdl-34529595

SUMMARY: The Plastic Surgeries Registry Network, supported by the American Society of Plastic Surgeons (ASPS) and the Plastic Surgery Foundation, offers a variety of options for procedural data and outcomes assessment and research. The Tracking Operations and Outcomes for Plastic Surgeons (TOPS) database is a registry created for and used by active ASPS members to monitor all types of procedural outcomes. It allows individual or group practices to follow their surgical outcomes and constitutes a large research registry that ASPS members can access for registry-based projects. The TOPS registry was launched in 2002, has undergone several iterations and improvements over the years, and now includes more than 1 million procedure records. Although ASPS member surgeons have proven to be valuable contributors of data to the TOPS registry, fewer have leveraged the database for registry-based research. This article reviews the authors' experience using the TOPS registry for a database research project to demonstrate the process, usefulness, and accessibility of TOPS data for ASPS member surgeons conducting registry-based research. It pairs with the authors' report of a TOPS registry investigation of 30-day adverse events associated with incision location in augmentation mammaplasty.


Data Management/education , Outcome Assessment, Health Care/methods , Surgeons/education , Surgery, Plastic/statistics & numerical data , Data Management/methods , Humans , Registries/statistics & numerical data , Societies, Medical , United States
18.
Radiology ; 301(1): 115-122, 2021 10.
Article En | MEDLINE | ID: mdl-34342503

Background: Patterns of metastasis in cancer are increasingly relevant to prognostication and treatment planning but have historically been documented by means of autopsy series. Purpose: To show the feasibility of using natural language processing (NLP) to gather accurate data from radiology reports for assessing spatial and temporal patterns of metastatic spread in a large patient cohort. Materials and Methods: In this retrospective longitudinal study, consecutive patients who underwent CT from July 2009 to April 2019 and whose CT reports followed a departmental structured template were included. Three radiologists manually curated a sample of 2,219 reports for the presence or absence of metastases across 13 organs; these manually curated reports were used to develop three NLP models with an 80%-20% split for training and test sets. A separate random sample of 448 manually curated reports was used for validation. Model performance was measured by accuracy, precision, and recall for each organ. The best-performing NLP model was used to generate a final database of metastatic disease across all patients. For each cancer type, descriptive statistics were produced by analyzing the frequencies of metastatic disease at the report and patient levels. Results: In 91,665 patients (mean age ± standard deviation, 61 years ± 15; 46,939 women), 387,359 reports were labeled. The best-performing NLP model achieved accuracies from 90% to 99% across all organs. Metastases were most frequently reported in abdominopelvic (23.6% of all reports) and thoracic (17.6%) nodes, followed by lungs (14.7%), liver (13.7%), and bones (9.9%). Metastatic disease tropism was distinct among common cancers, with the most common first site being bone in prostate and breast cancers and liver in pancreatic and colorectal cancers. Conclusion: Natural language processing may be applied to the CT reports of cancer patients to generate a large database of metastatic phenotypes. Such a database could be combined with genomic studies and used to explore prognostic imaging phenotypes with relevance to treatment planning.
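As a hedged illustration of the report-labeling setup (the abstract does not specify the authors' model architecture), a per-organ text classifier trained on manually curated reports could be prototyped as below; the example reports, labels, and model choice are invented for demonstration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical curated data: report text plus a 0/1 label for liver metastases.
reports = [
    "New hepatic lesions compatible with metastases.",
    "Liver unremarkable. No suspicious osseous lesions.",
    "Innumerable liver metastases, increased from prior.",
    "No evidence of metastatic disease in the abdomen.",
]
liver_label = [1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    reports, liver_label, test_size=0.2, random_state=0)

# Bag-of-words features feeding a linear classifier, one model per organ.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

pred = model.predict(X_test)
print(accuracy_score(y_test, pred),
      precision_score(y_test, pred, zero_division=0),
      recall_score(y_test, pred, zero_division=0))
```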


Data Management/methods , Databases, Factual/statistics & numerical data , Electronic Health Records , Natural Language Processing , Neoplasms/epidemiology , Tomography, X-Ray Computed/methods , Feasibility Studies , Female , Humans , Longitudinal Studies , Male , Middle Aged , Neoplasm Metastasis , Reproducibility of Results , Retrospective Studies
19.
PLoS One ; 16(8): e0255562, 2021.
Article En | MEDLINE | ID: mdl-34411131

The growing popularity of big data analysis and cloud computing has created new big data management requirements. Programmers may have to interact with a number of heterogeneous data stores, both SQL and NoSQL, depending on the information they are responsible for. Interacting with heterogeneous data models via numerous APIs and query languages imposes challenging tasks on developers of multi-data-store applications. Indeed, complex queries that span several data stores cannot currently be expressed declaratively, as they can be within a single data store, and therefore require additional development effort. Many models have been proposed to address complex queries over multistore applications; some provide a unified and fast approach, while others are not efficient enough for this type of complex database query. This paper presents CQNS, an automated, fast and easy-to-use unified architecture for answering simple and complex SQL and NoSQL queries over heterogeneous data stores. The proposed framework can be used in cloud environments or in any big data application to help developers manage basic and complicated database queries automatically. CQNS consists of three layers: a matching selector layer, a processing layer, and a query execution layer. The matching selector layer is the heart of the architecture: incoming user queries are compared with the queries stored per engine in the architecture's library, and a proposed algorithm directs each query to the appropriate SQL or NoSQL database engine. Furthermore, CQNS supports several NoSQL databases, including MongoDB, Cassandra, Riak, CouchDB, and Neo4j. The paper presents a Spark-based framework that can handle both SQL and NoSQL databases. Benchmark datasets from four scenarios are used to evaluate the proposed CQNS for querying different NoSQL databases in terms of optimization performance and query execution time. The results show that CQNS achieves the best latency and throughput among the compared systems.
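A toy sketch of the routing idea behind a matching selector layer: classify an incoming query string and dispatch it to a relational or document-store handler. The patterns, engine names, and handlers are invented for illustration and are not CQNS's actual algorithm:

```python
import re
from typing import Any, Callable, Dict

# Illustrative handlers; real ones would wrap engine-specific drivers.
def run_on_sql(query: str) -> Any:
    print(f"[relational engine] {query}")

def run_on_document_store(query: str) -> Any:
    print(f"[document-store engine] {query}")

# A tiny "library" of patterns mapping query shapes to engines.
ROUTES: Dict[str, Callable[[str], Any]] = {
    r"^\s*select\b": run_on_sql,                                 # SQL statements
    r"^\s*db\.\w+\.(find|aggregate)\b": run_on_document_store,   # Mongo-shell style
}

def matching_selector(query: str) -> Any:
    """Send the query to the first engine whose pattern matches it."""
    for pattern, handler in ROUTES.items():
        if re.search(pattern, query, flags=re.IGNORECASE):
            return handler(query)
    raise ValueError("No registered engine matches this query")

matching_selector("SELECT name FROM users WHERE age > 30")
matching_selector('db.users.find({"age": {"$gt": 30}})')
```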


Algorithms , Cloud Computing/statistics & numerical data , Data Management/methods , Database Management Systems/standards , Databases, Factual , Information Storage and Retrieval/statistics & numerical data , Software
20.
Anticancer Res ; 41(7): 3607-3613, 2021 Jul.
Article En | MEDLINE | ID: mdl-34230157

BACKGROUND/AIM: We evaluated the timeliness of care at a safety-net hospital after implementation of a multidisciplinary breast program. PATIENTS AND METHODS: A prospective database of patients with breast cancer was created after the multidisciplinary breast program was initiated in 2018. Patients were tracked to obtain time to completion of diagnostic imaging, biopsy, and treatment initiation. Patients with breast cancer diagnosed from 2015-2017 were reviewed for comparison. RESULTS: A total of 102 patients were identified. There was no statistically significant difference in time to completion of imaging, biopsy, and initial treatment between the 2018 and the 2015-2017 cohorts (p>0.05), and no statistically significant difference in these intervals between races (p>0.05). CONCLUSION: Within the same socioeconomic status, there was no differential delivery of screening, work-up, and treatment by race. Despite protocol implementation, the efficiency of care remained limited in a safety-net hospital lacking financial resources.


Breast Neoplasms/diagnosis , Aged , Biopsy , Breast/pathology , Breast Neoplasms/pathology , Data Management/methods , Female , Health Equity , Humans , Mass Screening/methods , Medically Underserved Area , Middle Aged , Social Class
...