Results 1 - 10 of 10
1.
Mol Cell Proteomics ; 23(1): 100682, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37993103

ABSTRACT

Global phosphoproteomics experiments quantify tens of thousands of phosphorylation sites. However, data interpretation is hampered by our limited knowledge of the functions, biological contexts, or precipitating enzymes of the phosphosites. This study establishes a repository of phosphosites with associated evidence in biomedical abstracts, using deep learning-based natural language processing techniques. Our model for illuminating the dark phosphoproteome through PubMed mining (IDPpub) was generated by fine-tuning BioBERT, a deep learning tool for biomedical text mining. Trained using sentences containing protein substrates and phosphorylation site positions from 3000 abstracts, the IDPpub model was then used to extract phosphorylation sites from all MEDLINE abstracts. The extracted proteins were normalized to gene symbols using the National Center for Biotechnology Information gene query, and sites were mapped to human UniProt sequences using ProtMapper and to mouse UniProt sequences by direct match. Precision and recall were calculated using 150 curated abstracts, and utility was assessed by analyzing the CPTAC (Clinical Proteomics Tumor Analysis Consortium) pan-cancer phosphoproteomics datasets and the PhosphoSitePlus database. Using 10-fold cross-validation, pairs of correct substrates and phosphosite positions were extracted with an average precision of 0.93 and recall of 0.94. After entity normalization and site mapping to human reference sequences, an independent validation achieved a precision of 0.91 and a recall of 0.77. The IDPpub repository contains 18,458 unique human phosphorylation sites with evidence sentences from 58,227 abstracts, as well as 5918 mouse sites in 14,610 abstracts. This includes evidence sentences for 1803 sites identified in CPTAC studies that are not covered by manually curated functional information in PhosphoSitePlus. Evaluation results demonstrate the potential of IDPpub as an effective biomedical text mining tool for collecting phosphosites.
Moreover, the repository (http://idppub.ptmax.org), which can be automatically updated, can serve as a powerful complement to existing resources.
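The precision and recall reported above are computed over extracted (substrate, phosphosite) pairs. A minimal sketch of that set-based evaluation follows; the example pairs are invented for illustration, not taken from the repository:

```python
def pair_metrics(predicted, gold):
    """Precision and recall over extracted (substrate, phosphosite) pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # correctly extracted pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Invented example pairs (gene symbol, site position) for illustration only.
pred = {("EGFR", "Y1068"), ("AKT1", "S473"), ("TP53", "S15"), ("AKT1", "T308")}
gold = {("EGFR", "Y1068"), ("AKT1", "S473"), ("TP53", "S15"), ("GSK3B", "S9")}
p, r = pair_metrics(pred, gold)
```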


Subject(s)
Data Mining, Natural Language Processing, Humans, Data Mining/methods, Factual Databases, PubMed
2.
BMC Bioinformatics ; 24(Suppl 3): 477, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38102593

ABSTRACT

BACKGROUND: As more clinical trials offer optional participation in the collection of bio-specimens for biobanking, the requirements of informed consent forms grow increasingly complex. The aim of this study is to develop an automatic natural language processing (NLP) tool to annotate informed consent documents to promote biorepository data regulation, sharing, and decision support. We collected informed consent documents from several publicly available sources and manually annotated them, covering sentences that contain permission information about sharing bio-specimens or donor data, or about conducting genetic or future research using bio-specimens or donor data. RESULTS: We evaluated a variety of machine learning algorithms, including random forest (RF) and support vector machine (SVM), for the automatic identification of these sentences. 120 informed consent documents containing 29,204 sentences were annotated, of which 1250 sentences (4.28%) provide answers to a permission question. An SVM model achieved an F1 score of 0.95 in classifying the sentences against a gold standard, a prefiltered corpus containing all relevant sentences. CONCLUSIONS: This study demonstrates the feasibility of using machine learning tools to classify permission-related sentences in informed consent documents.
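The gold standard above is a prefiltered corpus of relevant sentences. A naive keyword prefilter for permission-related sentences might look like the sketch below; the cue list is an assumption for illustration, not the authors' lexicon:

```python
import re

# Illustrative (not the study's) cue list for permission-related sentences.
PERMISSION_CUES = re.compile(
    r"\b(bio-?specimen|sample|tissue|genetic|future research|share|sharing|donat)\w*",
    re.IGNORECASE,
)

def prefilter(sentences):
    """Keep only sentences mentioning a permission-related cue."""
    return [s for s in sentences if PERMISSION_CUES.search(s)]

sents = [
    "Your samples may be shared with other researchers.",
    "The study visit will last two hours.",
    "We may use your specimens for future research, including genetic research.",
]
kept = prefilter(sents)
```

A classifier such as an SVM would then be trained on the kept sentences rather than the raw document.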


Subject(s)
Biological Specimen Banks, Consent Forms, Machine Learning, Algorithms, Natural Language Processing
3.
Clin Gastroenterol Hepatol ; 21(5): 1198-1204, 2023 05.
Article in English | MEDLINE | ID: mdl-36115659

ABSTRACT

BACKGROUND & AIMS: Identifying dysplasia of Barrett's esophagus (BE) in the electronic medical record (EMR) requires manual abstraction of unstructured data. Natural language processing (NLP) creates structure from unstructured free text. We aimed to develop and validate an NLP algorithm to identify dysplasia in BE patients from histopathology reports with varying report formats in a large integrated EMR system. METHODS: We randomly selected 600 pathology reports for NLP development and 400 reports for validation from patients with suspected BE in the national Veterans Affairs databases. BE and dysplasia were verified by manual review of the pathology reports. We used NLP software (Clinical Language Annotation, Modeling, and Processing Toolkit; Melax Tech, Houston, TX) to develop an algorithm that identifies dysplasia from report findings. The algorithm performance characteristics were calculated as recall, precision, accuracy, and F-measure. RESULTS: In the development set of 600 patients, 457 patients had confirmed BE (60 with dysplasia). The NLP identified dysplasia with 98.0% accuracy, 91.7% recall, and 93.2% precision, with an F-measure of 92.4%. All 7 patients with confirmed high-grade dysplasia were classified by the algorithm as having dysplasia. Among the 400 patients in the validation cohort, 230 had confirmed BE (39 with dysplasia). Compared with manual review, the NLP algorithm identified dysplasia with 98.7% accuracy, 92.3% recall, and 100.0% precision, with an F-measure of 96.0%. CONCLUSIONS: NLP yielded a high degree of sensitivity and accuracy in identifying dysplasia from diverse types of pathology reports for patients with BE. Applying this algorithm would facilitate research and clinical care in an EMR system with text reports in large data repositories.
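The F-measure is the harmonic mean of precision and recall, so the validation figure can be reproduced directly from the reported precision (100.0%) and recall (92.3%):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# Validation-cohort figures reported above.
f1 = f_measure(precision=1.000, recall=0.923)  # should be close to 0.960
```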


Subject(s)
Barrett Esophagus, Humans, Barrett Esophagus/complications, Barrett Esophagus/diagnosis, Natural Language Processing, Software, Algorithms, Hyperplasia
4.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 599-602, 2022 07.
Article in English | MEDLINE | ID: mdl-36085691

ABSTRACT

KerNL is a general kernel-based framework for autocalibrated reconstruction that does not need any explicit formula for the kernel function characterizing nonlinear relationships between acquired and unacquired k-space data. It is non-iterative and does not require a large amount of computation. Because only limited autocalibration signals (ACS) are acquired to perform KerNL calibration, the calibration suffers from overfitting, and more training data can improve the kernel model's accuracy. In this work, virtual conjugate coil data are incorporated into the KerNL calibration and estimation process to enhance reconstruction performance. Experimental results show that the proposed method can further suppress noise and aliasing artifacts with fewer ACS data and higher acceleration factors. Computational efficiency is retained, keeping reconstruction fast through random projection.
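The virtual-conjugate-coil idea augments each physical coil with a mirrored, conjugated copy of its k-space data. A minimal sketch on a DFT grid, assuming a 1D single-coil signal (the paper's multi-coil setting is analogous): for a real-valued object, the virtual coil coincides with the original k-space, which is exactly the symmetry the extra "coil" exploits.

```python
import numpy as np

def virtual_conjugate_coil(kspace):
    """Virtual conjugate coil data: S_vcc(k) = conj(S(-k)) on a periodic DFT grid."""
    n = kspace.shape[-1]
    idx = (-np.arange(n)) % n  # map each sample index k to -k on the grid
    return np.conj(kspace[..., idx])

rng = np.random.default_rng(0)
image = rng.standard_normal(64)   # real-valued object
kspace = np.fft.fft(image)
vcc = virtual_conjugate_coil(kspace)
# For a real object, conjugate symmetry makes vcc identical to kspace.
```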


Subject(s)
Acceleration, Artifacts, Calibration
5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 1456-1459, 2022 07.
Article in English | MEDLINE | ID: mdl-36085960

ABSTRACT

Channel suppression can reduce the redundant information in multiple-channel receiver coils and accelerate reconstruction to meet real-time imaging requirements. Principal component analysis (PCA) has been used for channel suppression, but it is difficult to interpret because all channels contribute to each principal component. Furthermore, the importance of interpretability in machine learning has recently attracted increasing attention in radiology. To improve the interpretability of PCA-based channel suppression, a sparse PCA method is proposed that drives most coils' loadings to zero. Channel suppression is formulated as a nonlinear eigenvalue problem solved with the inverse power method instead of direct matrix decomposition. Experimental results on in vivo data show that sparse PCA-based channel suppression not only improves interpretability through sparse channel loadings, but also improves reconstruction quality compared with standard PCA-based reconstruction, at a similar reconstruction time.
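One common way to obtain a sparse leading component is truncated power iteration, which keeps only the largest-magnitude loadings at each step. This is a generic sketch of the sparse-PCA idea on a planted covariance matrix, not the paper's inverse-power formulation:

```python
import numpy as np

def sparse_leading_vector(C, s, iters=100):
    """Truncated power iteration: leading eigenvector with at most s nonzero loadings."""
    v = np.ones(C.shape[0]) / np.sqrt(C.shape[0])
    for _ in range(iters):
        v = C @ v
        keep = np.argsort(np.abs(v))[-s:]  # indices of the s largest loadings
        mask = np.zeros_like(v)
        mask[keep] = 1.0
        v = v * mask                        # zero out all other loadings
        v /= np.linalg.norm(v)
    return v

# Planted sparse component on channels 2, 7, 11 of a 16-channel covariance.
u = np.zeros(16)
u[[2, 7, 11]] = [0.6, 0.6, 0.52]
u /= np.linalg.norm(u)
C = 5.0 * np.outer(u, u) + 0.1 * np.eye(16)
v = sparse_leading_vector(C, s=3)
support = set(np.nonzero(v)[0])             # channels with nonzero loading
```

Unlike standard PCA, only three channels carry nonzero loadings, so the component can be read as "these specific coils matter".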


Subject(s)
Algorithms, Plastic Surgery Procedures, Magnetic Resonance Imaging/methods, Principal Component Analysis, Records
6.
Magn Reson Imaging ; 92: 108-119, 2022 10.
Article in English | MEDLINE | ID: mdl-35772581

ABSTRACT

Autocalibration signals (ACS) are acquired in k-space-based parallel MRI reconstruction to estimate interpolation coefficients and reconstruct missing unacquired data. Many ACS lines can suppress aliasing artifacts and noise by covering the low-frequency signal region. However, more ACS lines delay the data acquisition process and therefore lengthen the scan time. Furthermore, a single interpolator is often used to recover missing k-space data, and model error may arise if the interpolator size is not selected appropriately. In this work, based on the idea of disagreement-based semi-supervised learning, a dual-interpolator strategy is proposed to collaboratively reconstruct missing k-space data. Two interpolators with different sizes are alternately applied to estimate and re-estimate missing data in k-space. The disagreement between the two interpolators converges, and the true missing values are co-estimated from the two views. The experimental results show that the proposed method outperforms the GRAPPA, SPIRiT, and nonlinear GRAPPA methods using a relatively small number of ACS lines, and reduces aliasing artifacts and noise in reconstructed images.
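The basic ACS calibration step that both interpolators rely on can be shown with a 1D toy: fit interpolation weights on the fully sampled ACS region by least squares, then apply them to fill the skipped samples. This is an illustrative GRAPPA-like sketch, not the paper's dual-interpolator method:

```python
import numpy as np

w0 = 0.4
x = np.cos(w0 * np.arange(64))     # fully sampled ground-truth "k-space line"

# Calibration: predict x[n] from its two neighbors using ACS samples 20..43.
acs = np.arange(20, 44)
A = np.stack([x[acs - 1], x[acs + 1]], axis=1)
c, *_ = np.linalg.lstsq(A, x[acs], rcond=None)  # interpolation coefficients

# Reconstruction: re-estimate the skipped (odd) samples from even neighbors.
recon = x.copy()
odd = np.arange(1, 63, 2)
recon[odd] = c[0] * x[odd - 1] + c[1] * x[odd + 1]
err = np.max(np.abs(recon - x))
```

For a pure cosine, x[n] = (x[n-1] + x[n+1]) / (2 cos w0) holds exactly, so the calibration recovers c = [1/(2 cos w0), 1/(2 cos w0)] and the reconstruction error is at numerical precision; real multi-coil data only satisfy such shift-invariant relations approximately, which is where interpolator-size model error enters.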


Subject(s)
Algorithms, Image Enhancement, Artifacts, Image Enhancement/methods, Computer-Assisted Image Processing/methods, Magnetic Resonance Imaging/methods, Radionuclide Imaging
7.
Aliment Pharmacol Ther ; 54(4): 481-492, 2021 08.
Article in English | MEDLINE | ID: mdl-34224163

ABSTRACT

BACKGROUND: Previous studies have demonstrated an association between nonselective beta-blockers (NSBBs) and a lower risk of hepatocellular carcinoma (HCC) in cirrhosis. However, there has been no population-based study investigating the risk of HCC among cirrhotic patients treated with carvedilol. AIMS: To determine the risk of HCC among cirrhotic patients on NSBBs, including carvedilol. METHODS: This retrospective cohort study utilised the Cerner Health Facts database in the United States from 2000 to 2017. Kaplan-Meier estimates, Cox proportional hazards regression, and propensity score matching (PSM) were used to test the HCC risk in the carvedilol, nadolol, and propranolol groups compared with the no-beta-blocker group. RESULTS: The final cohort comprised 107,428 eligible patients. The 100-month cumulative HCC incidence was significantly lower in each NSBB group than in the no-beta-blocker group: carvedilol 11.24% vs 15.69%, nadolol 27.55% vs 32.11%, and propranolol 26.17% vs 28.84% (all P < 0.0001). NSBBs were associated with a significantly lower risk of HCC after PSM in the multivariable Cox analysis (hazard ratios: carvedilol 0.61, 95% CI 0.51-0.73; nadolol 0.74, 95% CI 0.63-0.87; propranolol 0.75, 95% CI 0.66-0.84). In subgroup analyses, NSBBs reduced the risk of HCC in cirrhosis with complications and in non-alcoholic cirrhosis. CONCLUSIONS: NSBBs, including carvedilol, were associated with a significantly decreased risk of HCC in patients with cirrhosis compared with no beta-blocker, regardless of complication status. Future randomised controlled studies comparing the incidence of HCC among NSBBs should elucidate which NSBB is the best option to prevent HCC in cirrhosis.
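The cumulative-incidence comparison above rests on Kaplan-Meier estimation, which handles censored follow-up. A minimal sketch of the estimator on invented toy data (illustrative only, not the study's pipeline):

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each event time; events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    surv, out, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        d = sum(e for tt, e in data if tt == t)    # events at time t
        n_t = sum(1 for tt, _ in data if tt >= t)  # subjects still at risk
        if d:
            surv *= 1 - d / n_t                    # survival drops at event times only
            out.append((t, surv))
        i += sum(1 for tt, _ in data if tt == t)   # skip past all ties at t
    return out

# Toy follow-up data: months until event (1) or censoring (0).
km = kaplan_meier([5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1])
```

Censored subjects leave the risk set without forcing a drop in the curve, which is why censoring-heavy cohorts cannot be compared by raw event proportions.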


Subject(s)
Hepatocellular Carcinoma, Liver Neoplasms, Adrenergic beta-Antagonists/therapeutic use, Hepatocellular Carcinoma/epidemiology, Hepatocellular Carcinoma/etiology, Hepatocellular Carcinoma/prevention & control, Humans, Liver Cirrhosis/epidemiology, Liver Neoplasms/epidemiology, Liver Neoplasms/etiology, Liver Neoplasms/prevention & control, Retrospective Studies, United States/epidemiology
8.
J Am Med Inform Assoc ; 28(7): 1393-1400, 2021 07 14.
Article in English | MEDLINE | ID: mdl-33647938

ABSTRACT

OBJECTIVE: Automated analysis of vaccine postmarketing surveillance narrative reports is important to understand the progression of rare but severe vaccine adverse events (AEs). This study implemented and evaluated state-of-the-art deep learning algorithms for named entity recognition to extract nervous system disorder-related events from vaccine safety reports. MATERIALS AND METHODS: We collected Guillain-Barré syndrome (GBS)-related influenza vaccine safety reports from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016. VAERS reports were selected and manually annotated with major entities related to nervous system disorders, including investigation, nervous_AE, other_AE, procedure, social_circumstance, and temporal_expression. A variety of conventional machine learning and deep learning algorithms were then evaluated for the extraction of these entities. We further pretrained a domain-specific BERT (Bidirectional Encoder Representations from Transformers) model using VAERS reports (VAERS BERT) and compared its performance with existing models. RESULTS AND CONCLUSIONS: Ninety-one VAERS reports were annotated, resulting in 2512 entities. The corpus was made publicly available to promote community efforts on vaccine AE identification. Deep learning-based methods (eg, bi-directional long short-term memory and BERT models) outperformed conventional machine learning-based methods (ie, conditional random fields with extensive features). The BioBERT large model achieved the highest exact-match F-1 scores on nervous_AE, procedure, social_circumstance, and temporal_expression, while the VAERS BERT large model achieved the highest exact-match F-1 scores on investigation and other_AE. An ensemble of these 2 models achieved the highest exact-match microaveraged F-1 score at 0.6802 and the second-highest lenient-match microaveraged F-1 score at 0.8078 among peer models.
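Exact-match entity scores like those above are computed over typed spans decoded from per-token tags. A minimal sketch of the standard BIO-to-span decoding, using the study's entity type names as labels (the tag sequence itself is invented):

```python
def bio_to_spans(tags):
    """Decode BIO tags into (start, end, type) spans; end is exclusive.
    Orphan I- tags (no preceding B-) are silently dropped."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last open span
        if tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != etype
        ):
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        # a matching I- tag simply extends the currently open span
    return spans

tags = ["O", "B-nervous_AE", "I-nervous_AE", "O", "B-procedure", "O"]
spans = bio_to_spans(tags)
```

Exact-match F-1 then requires predicted and gold spans to agree on start, end, and type, which is why it runs well below lenient-match scores.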


Subject(s)
Deep Learning, Guillain-Barre Syndrome, Influenza Vaccines, Adverse Drug Reaction Reporting Systems, Computer Systems, Humans, Influenza Vaccines/adverse effects, United States
9.
J Am Med Inform Assoc ; 28(6): 1275-1283, 2021 06 12.
Article in English | MEDLINE | ID: mdl-33674830

ABSTRACT

The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership common data model. A hybrid approach of combining deep learning-based models, curated lexicons, and pattern-based rules was applied to quickly build the COVID-19 SignSym from CLAMP, with optimized performance. Our extensive evaluation using 3 external sites with clinical notes of COVID-19 patients, as well as the online medical dialogues of COVID-19, shows COVID-19 SignSym can achieve high performance across data sources. The workflow used for this study can be generalized to other use cases, where existing clinical natural language processing tools need to be customized for specific information needs within a short time. COVID-19 SignSym is freely accessible to the research community as a downloadable package (https://clamp.uth.edu/covid/nlp.php) and has been used by 16 healthcare organizations to support clinical research of COVID-19.
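The pattern-based layer of such a hybrid pipeline can be illustrated with a toy sign/symptom extractor that also assigns one of the attributes mentioned above (negation). The lexicon, negation cues, and character window are assumptions for illustration, not CLAMP's or SignSym's actual rules:

```python
import re

# Illustrative symptom lexicon and negation cues (assumptions, not CLAMP rules).
SYMPTOMS = re.compile(r"\b(fever|cough|dyspnea|fatigue|headache)\b", re.IGNORECASE)
NEGATORS = ("no ", "denies", "without", "negative for")

def extract(text):
    """Return (symptom, negated) pairs; negation = naive substring check
    in a fixed 25-character window before the mention."""
    results = []
    for m in SYMPTOMS.finditer(text):
        window = text[max(0, m.start() - 25):m.start()].lower()
        negated = any(neg in window for neg in NEGATORS)
        results.append((m.group(0).lower(), negated))
    return results

found = extract("Patient reports fever and cough but denies dyspnea.")
```

Production tools replace the fixed window and substring check with proper scope rules, but the attribute-assignment structure is the same.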


Subject(s)
COVID-19/diagnosis, Electronic Health Records, Information Storage and Retrieval/methods, Natural Language Processing, Deep Learning, Humans, Symptom Assessment/methods
10.
ArXiv ; 2020 Jul 13.
Article in English | MEDLINE | ID: mdl-32908948

ABSTRACT

The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership common data model. A hybrid approach of combining deep learning-based models, curated lexicons, and pattern-based rules was applied to quickly build the COVID-19 SignSym from CLAMP, with optimized performance. Our extensive evaluation using 3 external sites with clinical notes of COVID-19 patients, as well as the online medical dialogues of COVID-19, shows COVID-19 SignSym can achieve high performance across data sources. The workflow used for this study can be generalized to other use cases, where existing clinical natural language processing tools need to be customized for specific information needs within a short time. COVID-19 SignSym is freely accessible to the research community as a downloadable package (https://clamp.uth.edu/covid/nlp.php) and has been used by 16 healthcare organizations to support clinical research of COVID-19.
