Results 1 - 6 of 6
1.
Article in English | MEDLINE | ID: mdl-35300321

ABSTRACT

Colonoscopy plays a critical role in screening for colorectal carcinoma (CC). Unfortunately, the data related to this procedure are stored in disparate documents: colonoscopy, pathology, and radiology reports. The lack of integrated, standardized documentation impedes accurate reporting of quality metrics as well as clinical and translational research. Natural language processing (NLP) has been used as an alternative to manual data abstraction. The performance of Machine Learning (ML) based NLP solutions depends heavily on the accuracy of annotated corpora. The availability of large-volume annotated corpora is limited by data privacy laws and by the cost and effort required. In addition, the manual annotation process is error-prone, making the lack of quality annotated corpora the largest bottleneck in deploying ML solutions. The objective of this study is to identify clinical entities critical to colonoscopy quality and to build a high-quality annotated corpus using domain-specific taxonomies and standardized annotation guidelines. The annotated corpus can be used to train ML models for a variety of downstream tasks.

2.
J Am Med Inform Assoc ; 29(4): 609-618, 2022 03 15.
Article in English | MEDLINE | ID: mdl-34590684

ABSTRACT

OBJECTIVE: In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations. MATERIALS AND METHODS: We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using 4 federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements. RESULTS: Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source Common Data Model conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of 56 sites, 37 sites (66%) demonstrated issues through these heuristics. These 37 sites demonstrated improvement after receiving feedback. DISCUSSION: We encountered site-to-site differences in DQ which would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for DQ improvement that will support improved research analytics locally and in aggregate. CONCLUSION: By combining rapid, continual assessment of DQ with a large volume of multisite data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.


Subjects
COVID-19 , Cohort Studies , Data Accuracy , Health Insurance Portability and Accountability Act , Humans , United States
3.
Stud Health Technol Inform ; 281: 432-436, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042780

ABSTRACT

Named Entity Recognition (NER), which aims to identify and classify entities into predefined categories, is a critical pre-processing task in Natural Language Processing (NLP) pipelines. Readily available off-the-shelf NER algorithms or programs are trained on a general corpus and often need to be retrained when applied to a different domain. The end model's performance depends on the quality of the named entities generated by the NER models used in the NLP task. To improve NER model accuracy, researchers build domain-specific corpora for both model training and evaluation. However, in the clinical domain, there is a dearth of training data because of privacy concerns, forcing many studies to use NER models trained in non-clinical domains to generate the NER feature set. This in turn affects the performance of downstream NLP tasks such as information extraction and de-identification. In this paper, our objective is to create a high-quality annotated clinical corpus for training NER models that generalize easily and can be used in a downstream de-identification task to generate a named-entity feature set.


Subjects
Names , Patient Discharge , Algorithms , Humans , Information Storage and Retrieval , Natural Language Processing
4.
Sci Data ; 7(1): 414, 2020 11 24.
Article in English | MEDLINE | ID: mdl-33235265

ABSTRACT

As the COVID-19 pandemic unfolds, radiology imaging is playing an increasingly vital role in determining therapeutic options, patient management, and research directions. Publicly available data are essential to drive new research into disease etiology, early detection, and response to therapy. In response to the COVID-19 crisis, the National Cancer Institute (NCI) has extended the Cancer Imaging Archive (TCIA) to include COVID-19 related images. Rural populations are at risk of underrepresentation in such public repositories. We have published in TCIA a collection of radiographic and CT imaging studies for patients who tested positive for COVID-19 in the state of Arkansas. A set of clinical data describes each patient, including demographics, comorbidities, selected lab data, and key radiology findings. These data are cross-linked to SARS-CoV-2 cDNA sequence data extracted from clinical isolates from the same population and uploaded to the GenBank repository. We believe this collection will help to address population imbalance in COVID-19 data by providing samples from this normally underrepresented population.


Subjects
COVID-19/diagnostic imaging , Thoracic Radiography , Rural Population , Adult , Aged , Aged 80 and over , Female , Humans , Male , Middle Aged , National Cancer Institute (U.S.) , X-Ray Computed Tomography , United States , Young Adult
5.
Healthc Inform Res ; 26(3): 193-200, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32819037

ABSTRACT

OBJECTIVE: The time-dependent study of comorbidities provides insight into disease progression and trajectory. We hypothesize that understanding longitudinal disease characteristics can lead to more timely intervention and improve clinical outcomes. As a first step, we developed an efficient and easy-to-install toolkit, the Time-based Elixhauser Comorbidity Index (TECI), which pre-calculates time-based Elixhauser comorbidities and can be extended to common data models (CDMs). METHODS: A Structured Query Language (SQL)-based toolkit, TECI, was built to pre-calculate time-specific Elixhauser comorbidity indices using data from a clinical data repository (CDR). It was then extended to the Informatics for Integrating Biology and the Bedside (I2B2) and Observational Medical Outcomes Partnership (OMOP) CDMs. RESULTS: At the University of Arkansas for Medical Sciences (UAMS), the TECI toolkit was successfully installed to compute the indices from CDR data, and the scores were integrated into the I2B2 and OMOP CDMs. Comorbidity scores calculated by TECI were validated against scores available in the 2015 quarter 1-3 Nationwide Readmissions Database (NRD) and against scores calculated with a previously validated algorithm on the 2015 quarter 4 NRD. Furthermore, TECI identified 18,846 UAMS patients whose comorbidity scores changed over time (2013 to 2019). Comorbidities for a random sample of patients were independently reviewed, and the results were found to be accurate in all cases. CONCLUSION: TECI facilitates the study of comorbidities within a time-dependent context, allowing better understanding of disease associations and trajectories, which has the potential to improve clinical outcomes.

6.
Stud Health Technol Inform ; 257: 31-35, 2019.
Article in English | MEDLINE | ID: mdl-30741168

ABSTRACT

The increased demand for clinical data to conduct clinical and translational research incentivized repurposing of the University of Arkansas for Medical Sciences' enterprise data warehouse (EDW) to meet researchers' data needs. The EDW was renamed the Arkansas Clinical Data Repository (AR-CDR), underwent content enhancements, and deployed a self-service cohort estimation tool in late 2016. To increase adoption of the AR-CDR, a team of physician informaticists and information technology professionals conducted informational sessions across the UAMS campus to raise awareness of the AR-CDR and its informatics capabilities. The restructuring of the data warehouse resulted in a four-fold increase in utilization of the AR-CDR data services in 2017. To assess acceptance rates of the AR-CDR and quantify outcomes of the services provided, Everett Rogers' diffusion of innovation (DOI) framework was applied, and a survey was distributed. Results show that the factors that contributed to increased adoption were the presence of a physician informaticist to mediate interactions between researchers and analysts, data quality, communication with and engagement of researchers, and the AR-CDR team's responsiveness and customer service mindset.


Subjects
Data Warehousing , Physicians , Translational Biomedical Research , Diffusion of Innovation , Humans , Surveys and Questionnaires