Results 1 - 12 of 12
1.
Data Brief ; 51: 109738, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38020426

ABSTRACT

Total joint arthroplasty (TJA) is the most common and fastest-growing inpatient surgical procedure in the elderly nationwide. Owing to the increasing number of TJA patients and advancements in healthcare, a growing number of scientific articles are being published on a daily basis. These articles offer important insights into TJA, covering aspects such as diagnosis, prevention, treatment strategies, and epidemiological factors. However, there has been limited effort to compile a large-scale text dataset from these articles and make it publicly available for open scientific research in TJA. Yet applying computational text analysis to these large volumes of scientific literature holds great potential for uncovering new knowledge to enhance our understanding of joint diseases and improve the quality of TJA care and clinical outcomes. This work builds a dataset entitled HexAI-TJAtxt, which includes 61,936 scientific abstracts collected from PubMed using MeSH (Medical Subject Headings) terms within "MeSH Subheading" and "MeSH Major Topic," with publication dates from 01/01/2000 to 12/31/2022. The dataset is freely and publicly available at https://github.com/pitthexai/HexAI-TJAtxt and will be updated bi-monthly with new abstracts published on PubMed.
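The MeSH-restricted, date-bounded collection described above can be sketched as a query against NCBI's E-utilities. A minimal sketch follows, using the standard `esearch` parameters (`db`, `term`, `datetype`, `mindate`, `maxdate`); the MeSH term shown is an illustrative assumption, not the dataset's actual query.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_query(mesh_term, start="2000/01/01", end="2022/12/31"):
    """Build an esearch URL restricted to a MeSH Major Topic and a date window."""
    params = {
        "db": "pubmed",
        "term": f'"{mesh_term}"[MeSH Major Topic]',
        "datetype": "pdat",   # filter on publication date
        "mindate": start,
        "maxdate": end,
        "retmode": "json",
    }
    return BASE + "?" + urlencode(params)

url = build_pubmed_query("Arthroplasty, Replacement")
```

Fetching the resulting URL returns PMIDs, whose abstracts could then be retrieved with `efetch`.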

2.
J Cardiovasc Electrophysiol ; 32(9): 2504-2514, 2021 09.
Article in English | MEDLINE | ID: mdl-34260141

ABSTRACT

INTRODUCTION: The efficacy of cardiac resynchronization therapy (CRT) has been widely studied in the medical literature; however, about 30% of candidates fail to respond to this treatment strategy. Smart computational approaches based on clinical data can help expose hidden patterns useful for identifying CRT responders. METHODS: We retrospectively analyzed the electronic health records of 1664 patients who underwent CRT procedures from January 1, 2002 to December 31, 2017. An ensemble-of-ensembles (EoE) machine learning (ML) system composed of a supervised and an unsupervised ML layer was developed to generate a prediction model for CRT response. RESULTS: We compared the performance of EoE against traditional ML methods and a state-of-the-art convolutional neural network (CNN) model trained on raw electrocardiographic (ECG) waveforms. We observed that the models improved in performance as more features were incrementally used for training. Using the most comprehensive set of predictors, the EoE model achieved an area under the receiver operating characteristic curve of 0.76 and an F1-score of 0.73. Direct application of the CNN model to the raw ECG waveforms did not generate promising results. CONCLUSION: The proposed CRT risk calculator discriminates which heart failure (HF) patients are likely to respond to CRT significantly better than clinical guidelines and traditional ML methods, suggesting that the tool can enhance care management of HF patients by helping to identify high-risk patients.
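One common way to combine an unsupervised and a supervised layer, as the EoE architecture describes, is to feed the unsupervised layer's cluster assignment into the supervised ensemble as an extra feature. This is a minimal sketch of that pattern, not the authors' implementation; the toy data, k-means/random-forest choices, and all parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # toy clinical feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy CRT-response label

# Unsupervised layer: cluster patients, then append the cluster id as a feature.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, clusters])

# Supervised layer: an ensemble classifier trained on the augmented features.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_aug, y)
train_acc = model.score(X_aug, y)
```

In practice the two layers would be fit on training folds only and evaluated on held-out data, with AUC/F1 as in the abstract.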


Subjects
Cardiac Resynchronization Therapy, Heart Failure, Heart Failure/diagnosis, Heart Failure/therapy, Humans, Machine Learning, Retrospective Studies, Treatment Outcome
3.
J Arthroplasty ; 36(3): 922-926, 2021 03.
Article in English | MEDLINE | ID: mdl-33051119

ABSTRACT

BACKGROUND: Natural language processing (NLP) methods can process clinical free text in electronic health records, decreasing the need for costly manual chart review and improving data quality. We developed rule-based NLP algorithms to automatically extract surgery-specific data elements from knee arthroplasty operative notes. METHODS: Within a cohort of 20,000 knee arthroplasty operative notes from 2000 to 2017 at a large tertiary institution, we randomly selected independent pairs of training and test sets to develop and evaluate NLP algorithms detecting five major data elements. The training and test datasets were similar in size, ranging from 420 to 1592 surgeries. Expert rules based on keywords in operative notes were used to implement NLP algorithms capturing: (1) category of surgery (total knee arthroplasty, unicompartmental knee arthroplasty, patellofemoral arthroplasty), (2) laterality of surgery, (3) constraint type, (4) presence of patellar resurfacing, and (5) implant model (catalog numbers). We used institutional registry data as our gold standard to evaluate the NLP algorithms. RESULTS: The NLP algorithms detecting category of surgery, laterality, constraint, and patellar resurfacing achieved 98.3%, 99.5%, 99.2%, and 99.4% accuracy on the test datasets, respectively. The implant model algorithm achieved an F1-score (harmonic mean of precision and recall) of 99.9%. CONCLUSIONS: NLP algorithms are a promising alternative to costly manual chart review for automating the extraction of information embedded in knee arthroplasty operative notes. Further validation in other hospital settings will enhance widespread implementation and efficiency of data capture for research and clinical purposes. LEVEL OF EVIDENCE: Level III.
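A rule-based extractor of the kind described, for the "category of surgery" element, amounts to ordered keyword rules over the note text. The rules below are hypothetical illustrations, not the validated institutional rules.

```python
import re

# Ordered keyword rules: more specific categories are checked first.
# These patterns are invented for illustration only.
RULES = [
    ("unicompartmental knee arthroplasty",
     re.compile(r"\bunicompartmental\b|\buka\b", re.IGNORECASE)),
    ("patellofemoral arthroplasty",
     re.compile(r"\bpatellofemoral\b", re.IGNORECASE)),
    ("total knee arthroplasty",
     re.compile(r"\btotal knee\b|\btka\b", re.IGNORECASE)),
]

def classify_note(text):
    """Return the first matching surgery category, or 'unknown'."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "unknown"
```

A real system would add negation handling and evaluate against registry gold-standard labels, as the abstract describes.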


Subjects
Knee Arthroplasty, Algorithms, Common Data Elements, Electronic Health Records, Humans, Natural Language Processing
4.
Health Data Sci ; 2021: 1504854, 2021.
Article in English | MEDLINE | ID: mdl-38487509

ABSTRACT

Background. Patients increasingly use asynchronous communication platforms to converse with care teams. Natural language processing (NLP) to classify content and automate triage of these messages has great potential to enhance clinical efficiency. We characterize the contents of a corpus of patient-generated portal messages using NLP methods, aiming to demonstrate descriptive analyses of patient text that can contribute to the development of future, more sophisticated NLP applications. Methods. We collected approximately 3,000 portal messages from the cardiology, dermatology, and gastroenterology departments at Mayo Clinic. After labeling these messages as Active Symptom, Logistical, Prescription, or Update, we used named entity recognition (NER) to identify medical concepts based on the UMLS library. We hierarchically analyzed the distribution of these messages in terms of departments, message types, medical concepts, and the keywords therein. Results. The Active Symptom and Logistical content types comprised approximately 67% of the message cohort. The "Findings" medical concept had the largest number of keywords across all groupings of content types and departments. "Anatomical Sites" and "Disorders" keywords were more prevalent in Active Symptom messages, while "Drugs" keywords were most prevalent in Prescription messages. Logistical messages tended to have lower proportions of "Anatomical Sites," "Disorders," "Drugs," and "Findings" keywords than other message content types. Conclusions. This descriptive corpus analysis sheds light on the content and foci of portal messages. Insight into the content and differences among message themes can inform the development of more robust NLP models.
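The hierarchical tally described (keyword counts per message content type) can be sketched with a nested counter. The labels and keywords below are invented toy data, not the paper's corpus.

```python
from collections import Counter

# Toy (message type, extracted keywords) pairs standing in for NER output.
messages = [
    ("Active Symptom", ["knee", "pain", "swelling"]),
    ("Prescription", ["lisinopril", "refill"]),
    ("Active Symptom", ["rash", "arm", "pain"]),
]

# Aggregate keyword counts per message content type.
by_type = {}
for msg_type, keywords in messages:
    by_type.setdefault(msg_type, Counter()).update(keywords)
```

The same structure extends one level up (department → message type → concept group → keyword) for the full hierarchy.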

5.
J Biomed Inform ; 102: 103364, 2020 02.
Article in English | MEDLINE | ID: mdl-31891765

ABSTRACT

Machine learning has become ubiquitous and a key technology for mining electronic health records (EHRs) to facilitate clinical research and practice. Unsupervised machine learning, as opposed to supervised learning, has shown promise in identifying novel patterns and relations from EHRs without using human-created labels. In this paper, we investigate the application of unsupervised machine learning models to discovering latent disease clusters and patient subgroups from EHRs. We utilized Latent Dirichlet Allocation (LDA), a generative probabilistic model, and proposed a novel model named the Poisson Dirichlet Model (PDM), which extends the LDA approach by using a Poisson distribution to model patients' disease diagnoses and alleviating age and sex factors by considering both observed and expected observations. In empirical experiments, we evaluated LDA and PDM on three patient cohorts, namely the Osteoporosis, Delirium/Dementia, and Chronic Obstructive Pulmonary Disease (COPD)/Bronchiectasis cohorts, with EHR data retrieved from the Rochester Epidemiology Project (REP) medical records linkage system, for the discovery of latent disease clusters and patient subgroups. We compared the effectiveness of LDA and PDM in identifying disease clusters through visualization of the disease representations, and tested their performance in differentiating patient subgroups through survival analysis as well as statistical analysis of demographics and Elixhauser Comorbidity Index (ECI) scores in those subgroups. The experimental results show that the proposed PDM could effectively identify distinct disease clusters from the latent patterns hidden in the EHR data by alleviating the impact of age and sex, and that LDA could stratify patients into differentiable subgroups with larger p-values than PDM; however, the subgroups identified by LDA are highly associated with patients' age and sex. The subgroups discovered by PDM may reflect underlying disease patterns of greater interest in epidemiology research, owing to the alleviation of age and sex effects. Both unsupervised machine learning approaches can be leveraged to discover patient subgroups from EHRs, but with different foci.
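The LDA side of the comparison can be illustrated on a synthetic patient-by-diagnosis count matrix; each latent component then plays the role of a "disease cluster," and each patient's topic mixture supports subgrouping. This is a minimal sketch of plain LDA (not the paper's PDM), with invented data and parameters.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Synthetic stand-in for a patient-by-diagnosis-code count matrix.
counts = rng.poisson(lam=2.0, size=(50, 8))   # 50 patients x 8 codes

lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(counts)             # per-patient cluster mixture
```

Patients could then be assigned to the subgroup `theta.argmax(axis=1)` for downstream survival analysis, as the paper does.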


Subjects
Electronic Health Records, Unsupervised Machine Learning, Disease Hotspot, Humans, Machine Learning, Statistical Models
6.
Stud Health Technol Inform ; 264: 1783-1784, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438342

ABSTRACT

Patients' hospital length of stay (LOS) as a surgical outcome is an important indicator of quality of care. We used EMR data to build artificial neural network models to better understand the impact of cold weather on the outcomes of first-in-a-day surgeries, in comparison with a matched cohort receiving surgical treatment on warm days. We found that LOS for first-in-a-day cardiac and orthopedic surgical cases is longer on very cold days.


Subjects
Length of Stay, Neural Networks (Computer), Weather, Cohort Studies, Humans, Retrospective Studies, Treatment Outcome
7.
Data Brief ; 17: 71-75, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29876376

ABSTRACT

A fully labeled image dataset provides a unique resource for reproducible research inquiries and data analyses in several computational fields, such as computer vision, machine learning, and deep learning. With the present contribution, a large-scale fully labeled image dataset is provided and made publicly and freely available to the research community. The dataset, entitled MCIndoor20000, includes more than 20,000 digital images from three indoor object categories: doors, stairs, and hospital signs. To make a comprehensive dataset addressing current challenges in indoor object modeling, we cover multiple image variations, such as rotation and intra-class variation, plus various noise models. The dataset is freely and publicly available at https://github.com/bircatmcri/MCIndoor20000.
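Variations of the kind mentioned (rotation plus a noise model) can be sketched on a placeholder image array; the noise parameters are illustrative assumptions, not the dataset's actual generation pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder grayscale image standing in for a dataset sample.
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# 90-degree rotation variation.
rotated = np.rot90(image)

# Additive Gaussian noise variation, clipped back to valid pixel range.
noisy = image.astype(float) + rng.normal(0.0, 10.0, image.shape)
noisy = np.clip(noisy, 0, 255).astype(np.uint8)
```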

8.
Micron ; 97: 41-55, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28343096

ABSTRACT

Scanning electron microscopy (SEM) imaging has been a principal component of many studies in the biomedical, mechanical, and materials sciences since its emergence. Despite the high resolution of the captured images, they remain two-dimensional (2D). In this work, a novel framework using sparse-dense correspondence is introduced and investigated for 3D reconstruction of stereo SEM images. SEM micrographs of microscopic samples are captured by tilting the specimen stage by a known angle. The pair of SEM micrographs is then rectified using sparse scale-invariant feature transform (SIFT) features/descriptors and a contrario RANSAC for matching-outlier removal, ensuring a gross horizontal displacement between corresponding points. This is followed by dense correspondence estimation using dense SIFT descriptors, employing a factor graph representation of the energy minimization functional and loopy belief propagation (LBP) as the means of optimization. Given the pixel-by-pixel correspondence and the tilt angle of the specimen stage during micrograph acquisition, depth can be recovered. Extensive tests reveal the strength of the proposed method for high-quality reconstruction of microscopic samples.
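The final depth-recovery step (correspondence plus known tilt angle) follows the classical stereo-SEM parallax relation z = p / (2 sin(θ/2)), where p is the measured parallax between corresponding points and θ the tilt between the two micrographs. The formula is offered as a generic photogrammetric illustration, not necessarily the paper's exact model; the pixel size is a hypothetical parameter.

```python
import math

def height_from_disparity(parallax_px, tilt_deg, px_size_um=1.0):
    """Recover height from parallax under a known eucentric tilt angle."""
    p = parallax_px * px_size_um                      # parallax in micrometres
    return p / (2.0 * math.sin(math.radians(tilt_deg) / 2.0))
```

Applied per corresponding pixel pair, this yields a dense height map from the dense SIFT/LBP correspondences.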

9.
PLoS One ; 11(9): e0162721, 2016.
Article in English | MEDLINE | ID: mdl-27685652

ABSTRACT

BACKGROUND: Many new biomedical research articles are published every day, accumulating rich information such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining of large-scale scientific literature can discover novel knowledge to better understand human diseases and improve the quality of disease diagnosis, prevention, and treatment. RESULTS: In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure composed of Apache Spark data streaming and machine learning methods combined with a Cassandra NoSQL database. To demonstrate its performance in classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. CONCLUSIONS: This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time updates from new articles published daily. SparkText can be extended to other areas of biomedical research.
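A scaled-down, single-machine analogue of the cancer-type classification step (not SparkText itself, which runs on Spark) is bag-of-words features feeding a Naïve Bayes classifier. The toy corpus and labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy corpus standing in for PubMed article text.
docs = [
    "mammography screening and breast tumor biopsy",
    "prostate specific antigen psa levels elevated",
    "lung nodule detected on chest ct scan",
    "breast cancer chemotherapy response",
    "prostate gland biopsy results",
    "lung adenocarcinoma smoking history",
]
labels = ["breast", "prostate", "lung", "breast", "prostate", "lung"]

# Bag-of-words + Naive Bayes, one of the three model families named above.
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(docs, labels)
pred = model.predict(["breast tumor mammography follow-up"])[0]
```

The Spark version distributes the same featurize-then-classify pipeline across a cluster, which is where the reported 11-hours-to-6-minutes speedup comes from.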

10.
Micron ; 87: 33-45, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27200484

ABSTRACT

Structural analysis of microscopic objects is a longstanding topic in several scientific disciplines, such as the biological, mechanical, and materials sciences. The scanning electron microscope (SEM), a promising imaging instrument, has been used for decades to determine the surface properties (e.g., compositions or geometries) of specimens, achieving increased magnification and contrast and resolution finer than one nanometer. Whereas SEM micrographs remain two-dimensional (2D), many research and educational questions truly require knowledge of their three-dimensional (3D) structures. 3D surface reconstruction from SEM images leads to a remarkable understanding of microscopic surfaces, allowing informative and qualitative visualization of the samples under investigation. In this contribution, we integrate several computational technologies, including machine learning, the a contrario methodology, and epipolar geometry, to design and develop a novel and efficient method called 3DSEM++ for multi-view 3D SEM surface reconstruction in an adaptive and intelligent fashion. Experiments performed on real and synthetic data show that the approach achieves significant precision in both SEM extrinsic calibration and 3D surface modeling.

11.
Data Brief ; 6: 112-6, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26779561

ABSTRACT

The Scanning Electron Microscope (SEM), as a 2D imaging instrument, has been widely used in many scientific disciplines, including the biological, mechanical, and materials sciences, to determine the surface attributes of microscopic objects. However, SEM micrographs remain 2D images. To effectively measure and visualize surface properties, we need to restore the 3D shape model from the 2D SEM images. Having 3D surfaces provides the anatomic shape of micro-samples, which allows quantitative measurements and informative visualization of the specimens being investigated. 3DSEM is a dataset for 3D microscopy vision that is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples.

12.
Micron ; 78: 54-66, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26277082

ABSTRACT

The scanning electron microscope (SEM), one of the most commonly used instruments in biology and the materials sciences, employs electrons instead of light to determine the surface properties of specimens. However, SEM micrographs remain 2D images. To effectively measure and visualize surface attributes, we need to restore the 3D shape model from the SEM images. 3D surface reconstruction is a longstanding topic in microscopy vision, as it offers quantitative and visual information for a variety of applications, including medicine, pharmacology, chemistry, and mechanics. In this paper, we survey the expanding body of work in this area, including a discussion of recent techniques and algorithms. With the present work, we also enhance the reliability, accuracy, and speed of 3D SEM surface reconstruction by designing and developing an optimized multi-view framework. We then consider several real-world experiments as well as synthetic data to examine the qualitative and quantitative attributes of the proposed framework. Furthermore, we present a taxonomy of 3D SEM surface reconstruction approaches and address several challenging issues as part of our future work.


Subjects
Three-Dimensional Imaging/methods, Algorithms, Electrons, Scanning Electron Microscopy, Reproducibility of Results, Surface Properties