Results 1 - 20 of 39
1.
J Biomed Inform ; 143: 104362, 2023 07.
Article in English | MEDLINE | ID: mdl-37146741

ABSTRACT

Scientific literature presents a wealth of information yet to be explored. As the number of researchers increases each year and more publications are released, specialized fields of research are becoming more prevalent. As this trend continues, it further separates interdisciplinary publications and makes keeping up to date with the literature a laborious task. Literature-based discovery (LBD) aims to mitigate these concerns by promoting information sharing among non-interacting literature while extracting potentially meaningful information. Furthermore, recent advances in neural network architectures and data representation techniques have fueled their respective research communities in achieving state-of-the-art performance in many downstream tasks. However, neural network-based methods for LBD remain largely unexplored. We introduce and explore a deep learning neural network-based approach for LBD. Additionally, we investigate various approaches to represent terms as concepts and analyze the effect of feature scaling the representations fed into our model. We compare the evaluation performance of our method on five hallmarks of cancer datasets utilized for closed discovery. Our results show that the representation chosen as input into our model affects evaluation performance. We found that feature scaling our input representations increases evaluation performance and decreases the number of epochs needed to achieve model generalization. We also explore two approaches to represent model output. We found that reducing the model's output to capture a subset of concepts improved evaluation performance at the cost of model generalizability. We also compare the efficacy of our method on the five hallmarks of cancer datasets to a set of randomly chosen relations between concepts. We found these experiments confirm our method's suitability for LBD.


Subject(s)
Deep Learning , Neoplasms , Humans , Neural Networks, Computer , Knowledge Discovery/methods , Publications
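
The abstract for entry 1 does not say which feature-scaling scheme was applied. As a minimal sketch, assuming column-wise min-max scaling to [0, 1], the step of scaling concept vectors before they reach a model might look like the following. The function name `min_max_scale` and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List


def min_max_scale(vectors: List[List[float]]) -> List[List[float]]:
    """Rescale each feature (column) to the range [0, 1].

    Assumption: min-max scaling stands in for whatever scheme the
    authors actually used; the abstract does not specify it.
    """
    if not vectors:
        return []
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in vectors:
        scaled_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant feature carries no information; map it to 0.0.
            scaled_row.append((value - low) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


if __name__ == "__main__":
    # Hypothetical 2-dimensional concept vectors on very different scales.
    toy = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
    print(min_max_scale(toy))
```

Putting every feature on a comparable scale is one common reason a network needs fewer epochs to converge, which is in line with the effect the abstract reports.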
2.
J Biomed Inform ; 137: 104252, 2023 01.
Article in English | MEDLINE | ID: mdl-36464228

ABSTRACT

Biomedical Entity Linking (BEL) is the task of mapping spans of text within biomedical documents to normalized, unique identifiers within an ontology. This is an important task in natural language processing, both for translational information extraction applications and for providing context for downstream tasks like relationship extraction. In this paper, we survey the progression of BEL from its inception in the late 1980s to present-day state-of-the-art systems, provide a comprehensive list of datasets available for training BEL systems, reference shared tasks focused on BEL, discuss the technical components that comprise BEL systems, and discuss possible directions for the future of the field.


Subject(s)
Data Mining , Text Messaging , Natural Language Processing
3.
Front Res Metr Anal ; 7: 1001266, 2022.
Article in English | MEDLINE | ID: mdl-36352893

ABSTRACT

Temporal expression recognition and normalization (TERN) is the foundation for all higher-level temporal reasoning tasks in natural language processing, such as timeline extraction, so it must be performed well to limit error propagation. Achieving new heights in state-of-the-art performance for TERN in clinical texts requires knowledge of where current systems struggle. In this work, we summarize the results of a detailed error analysis of three top-performing state-of-the-art TERN systems that participated in the 2012 i2b2 Clinical Temporal Relation Challenge, and compare them with our own home-grown system, Chrono, to identify specific areas in need of improvement. Performance metrics and an error analysis reveal that all systems have reduced performance in normalization of relative temporal expressions, specifically in disambiguating temporal types and in identifying the correct anchor time. To address the issue of temporal disambiguation, we developed and integrated a module into Chrono that utilizes temporally fine-tuned contextual word embeddings to disambiguate relative temporal expressions. Chrono now achieves state-of-the-art performance for temporal disambiguation of relative temporal expressions in clinical text, and is the only TERN system to output dual annotations in both the TimeML and SCATE schemes.
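
To illustrate why the anchor time matters when normalizing relative temporal expressions, here is a toy rule-based sketch. It is not Chrono's method (the abstract describes contextual word embeddings); the pattern, the supported units, and the function name `normalize_relative` are assumptions made purely for illustration.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical, deliberately tiny grammar: "<N> day(s)|week(s) ago|later".
_UNIT_DAYS = {"day": 1, "week": 7}
_PATTERN = re.compile(r"^(\d+) (day|week)s? (ago|later)$")


def normalize_relative(expression: str, anchor: date) -> Optional[date]:
    """Resolve a relative expression to a calendar date, given an anchor date.

    Returns None when the expression falls outside the toy grammar.
    """
    match = _PATTERN.match(expression.strip().lower())
    if match is None:
        return None
    count = int(match.group(1))
    unit = match.group(2)
    direction = match.group(3)
    offset = timedelta(days=count * _UNIT_DAYS[unit])
    return anchor - offset if direction == "ago" else anchor + offset


if __name__ == "__main__":
    admission = date(2012, 5, 10)  # hypothetical anchor, e.g. an admission date
    resolved = normalize_relative("3 days ago", admission)
    print(resolved.isoformat() if resolved else None)
```

The same expression resolves to a different date for every anchor, so choosing the wrong anchor yields a wrong normalized value even when the expression itself is parsed correctly.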

4.
Molecules ; 27(17), 2022 Aug 31.
Article in English | MEDLINE | ID: mdl-36080376

ABSTRACT

Reducing the use of solvents is an important aim of green chemistry. Using micelles self-assembled from amphiphilic molecules dispersed in water (considered a green solvent) has facilitated reactions of organic compounds. When performing reactions in micelles, the hydrophobic effect can considerably accelerate apparent reaction rates, as well as enhance selectivity. Here, we review micellar reaction media and their potential role in sustainable chemical production. The focus of this review is applications of engineered amphiphilic systems for reactions (surface-active ionic liquids, designer surfactants, and block copolymers) as reaction media. Micelles are a versatile platform for performing a large array of organic chemistries using water as the bulk solvent. Building on this foundation, synthetic sequences combining several reaction steps in one pot have been developed. Telescoping multiple reactions can reduce solvent waste by limiting the volume of solvents, as well as eliminating purification processes. Thus, in particular, we review recent advances in "one-pot" multistep reactions achieved using micellar reaction media with potential applications in medicinal chemistry and agrochemistry. Photocatalyzed reactions in micellar reaction media are also discussed. In addition to the use of micelles, we emphasize the process (steps to isolate the product and reuse the catalyst).


Subject(s)
Micelles , Polymers , Hydrophobic and Hydrophilic Interactions , Polymers/chemistry , Solvents , Water/chemistry
5.
JMIR Form Res ; 6(9): e32460, 2022 Sep 06.
Article in English | MEDLINE | ID: mdl-36066925

ABSTRACT

BACKGROUND: Community-engaged research (CEnR) is a research approach in which scholars partner with community organizations or individuals with whom they share an interest in the study topic, typically with the goal of supporting that community's well-being. CEnR is well-established in numerous disciplines including the clinical and social sciences. However, universities experience challenges reporting comprehensive CEnR metrics, limiting the development of appropriate CEnR infrastructure and the advancement of relationships with communities, funders, and stakeholders. OBJECTIVE: We propose a novel approach to identifying and categorizing community-engaged studies by applying attention-based deep learning models to human participants protocols that have been submitted to the university's institutional review board (IRB). METHODS: We manually classified a sample of 280 protocols submitted to the IRB using a 3- and 6-level CEnR heuristic. We then trained an attention-based bidirectional long short-term memory unit (Bi-LSTM) on the classified protocols and compared it to transformer models such as Bidirectional Encoder Representations From Transformers (BERT), Bio + Clinical BERT, and Cross-lingual Language Model-Robustly Optimized BERT Pre-training Approach (XLM-RoBERTa). We applied the best-performing models to the full sample of unlabeled IRB protocols submitted in the years 2013-2019 (n>6000). RESULTS: Although transfer learning is superior, receiving a 0.9952 evaluation F1 score for all transformer models implemented compared to the attention-based Bi-LSTM (between 48%-80%), there were key issues with overfitting. This finding is consistent across several methodological adjustments: an augmented data set with and without cross-validation, an unaugmented data set with and without cross-validation, a 6-class CEnR spectrum, and a 3-class one. 
CONCLUSIONS: Transfer learning is a more viable method than the attention-based bidirectional-LSTM for differentiating small data sets characterized by the idiosyncrasies and variability of CEnR descriptions used by principal investigators in research protocols. Despite these issues involving overfitting, BERT and the other transformer models remarkably showed an understanding of our data unlike the attention-based Bi-LSTM model, promising a more realistic path toward solving this real-world application.

6.
Database (Oxford) ; 2022, 2022 08 11.
Article in English | MEDLINE | ID: mdl-35951425

ABSTRACT

TopEx is a natural language processing application developed to facilitate the exploration of topics and key words in a set of texts through a user interface that requires no programming or natural language processing knowledge, thus enhancing the ability of nontechnical researchers to explore and analyze textual data. The underlying algorithm groups semantically similar sentences together and then performs a topic analysis on each group to identify the key topics discussed in a collection of texts. Implementation is achieved via a Python library back end and a web application front end built with React and D3.js for visualizations. TopEx has been successfully used to identify themes, topics, and key words in a variety of corpora, including Coronavirus disease 2019 (COVID-19) discharge summaries and tweets. Feedback from the BioCreative VII Challenge Track 4 concludes that TopEx is a useful tool for text exploration for a variety of users and tasks. DATABASE URL: http://topex.cctr.vcu.edu.


Subject(s)
COVID-19 , Algorithms , Data Mining/methods , Humans , Natural Language Processing , Software
7.
ACS Synth Biol ; 11(6): 2043-2054, 2022 06 17.
Article in English | MEDLINE | ID: mdl-35671034

ABSTRACT

Scientific articles contain a wealth of information about experimental methods and results describing biological designs. Due to its unstructured nature and multiple sources of ambiguity and variability, extracting this information from text is a difficult task. In this paper, we describe the development of the synthetic biology knowledge system (SBKS) text processing pipeline. The pipeline uses natural language processing techniques to extract and correlate information from the literature for synthetic biology researchers. Specifically, we apply named entity recognition, relation extraction, concept grounding, and topic modeling to extract information from published literature to link articles to elements within our knowledge system. Our results show the efficacy of each of the components on synthetic biology literature and provide future directions for further advancement of the pipeline.


Subject(s)
Data Mining , Synthetic Biology , Data Mining/methods , Natural Language Processing
8.
J Biomed Inform ; 130: 104062, 2022 06.
Article in English | MEDLINE | ID: mdl-35413440

ABSTRACT

MOTIVATION: Training domain-specific named entity recognition (NER) models requires high-quality, hand-curated gold standard datasets, which are time-consuming and expensive to create. Furthermore, the storage and memory required to deploy NLP models can be prohibitive when the number of tasks is large. In this work, we explore utilizing multi-task learning to reduce the amount of training data needed to train new domain-specific models. We evaluate our system across 22 distinct biomedical NER datasets and evaluate the extent to which transfer learning helps task performance using two forms of ablation. RESULTS: We found that multitask models generally do not improve performance, but in many cases perform on par with single-task models. However, we show that in some cases, new unseen tasks can be trained as a single model using less data by starting with weights from a multitask model, and that doing so can improve performance. AVAILABILITY: The software underlying this article is available at: https://github.com/NLPatVCU/multitasking_bert-1.


Subject(s)
Natural Language Processing , Software
9.
Proc Int World Wide Web Conf ; 2022: 823-832, 2022 Apr.
Article in English | MEDLINE | ID: mdl-37465200

ABSTRACT

Since the rise of the COVID-19 pandemic, peer-reviewed biomedical repositories have experienced a surge in chemical and disease related queries. These queries use a wide variety of naming conventions and nomenclatures, from trademark and generic names to chemical composition mentions. Normalizing or disambiguating these mentions within texts provides researchers and data curators with more relevant articles returned by their search query. Named entity normalization aims to automate this disambiguation process by linking entity mentions to their appropriate candidate concepts within a biomedical knowledge base or ontology. We explore several term embedding aggregation techniques, as well as how the term's context affects evaluation performance. We also evaluate our embedding approaches for normalizing term instances containing one or many relations within unstructured texts.

10.
J Biomed Inform ; 126: 103970, 2022 02.
Article in English | MEDLINE | ID: mdl-34920128

ABSTRACT

Systematic reviews are labor-intensive processes to combine all knowledge about a given topic into a coherent summary. Despite the high labor investment, they are necessary to create an exhaustive overview of current evidence relevant to a research question. In this work, we evaluate three state-of-the-art supervised multi-label sequence classification systems to automatically identify 24 different experimental design factors for the categories of Animal, Dose, Exposure, and Endpoint from journal articles describing experiments related to toxicity and health effects of environmental agents. We then present an in-depth analysis of the results, evaluating the lexical diversity of the design parameters with respect to model performance, the impact of tokenization and non-contiguous mentions, and finally the dependencies between entities within the category entities. We demonstrate that, in general, algorithms that use embedded representations of the sequences outperform statistical algorithms, but that even these algorithms struggle with lexically diverse entities.


Subject(s)
Algorithms , Natural Language Processing , Systematic Reviews as Topic
11.
AMIA Jt Summits Transl Sci Proc ; 2021: 420-429, 2021.
Article in English | MEDLINE | ID: mdl-34457157

ABSTRACT

Adverse drug events (ADEs) are unexpected incidents caused by the administration of a drug or medication. To identify and extract these events, we require information about not just the drug itself but also attributes describing the drug (e.g., strength, dosage), the reason why the drug was initially prescribed, and any adverse reaction to the drug. This paper explores the relationship between a drug and its associated attributes using relation extraction techniques. We explore three approaches: a rule-based approach, a deep learning-based approach, and a contextualized language model-based approach. We evaluate our system on the n2c2-2018 ADE extraction dataset. Our experimental results demonstrate that the contextualized language model-based approach outperformed the other models overall and obtained state-of-the-art performance in ADE extraction with a Precision of 0.93, Recall of 0.96, and an F1 score of 0.94; however, for certain relation types, the rule-based approach obtained a higher Precision and Recall than either learning approach.


Subject(s)
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Pharmaceutical Preparations , Humans , Natural Language Processing
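
The F1 score reported in entry 11 is the harmonic mean of precision and recall, so the three figures can be checked against each other. The short sketch below does that arithmetic; the helper name `f1_score` is ours, and the inputs are the rounded values quoted in the abstract, so the result is only approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Rounded values quoted in the abstract: P = 0.93, R = 0.96.
    print(round(f1_score(0.93, 0.96), 2))  # prints 0.94
```

With P = 0.93 and R = 0.96, 2PR / (P + R) = 1.7856 / 1.89 ≈ 0.945, which rounds to the 0.94 given in the abstract.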
12.
AMIA Jt Summits Transl Sci Proc ; 2021: 575-584, 2021.
Article in English | MEDLINE | ID: mdl-34457173

ABSTRACT

One of the primary challenges for clinical Named Entity Recognition (NER) is the availability of annotated training data. Technical and legal hurdles prevent the creation and release of corpora related to electronic health records (EHRs). In this work, we look at the impact of pseudo-data generation on clinical NER using gazetteering utilizing a neural network model. We report that gazetteers can result in the inclusion of proper terms with the exclusion of determiners and pronouns in preceding and middle positions. Gazetteers that had higher numbers of terms inclusive to the original dataset had a higher impact.


Subject(s)
Electronic Health Records , Neural Networks, Computer , Humans , Language
13.
ACS Synth Biol ; 10(9): 2276-2285, 2021 09 17.
Article in English | MEDLINE | ID: mdl-34387462

ABSTRACT

The Synthetic Biology Knowledge System (SBKS) is an instance of the SynBioHub repository that includes text and data information that has been mined from papers published in ACS Synthetic Biology. This paper describes the SBKS curation framework that is being developed to construct the knowledge stored in this repository. The text mining pipeline performs automatic annotation of the articles using natural language processing techniques to identify salient content such as key terms, relationships between terms, and main topics. The data mining pipeline performs automatic annotation of the sequences extracted from the supplemental documents with the genetic parts used in them. Together these two pipelines link genetic parts to papers describing the context in which they are used. Ultimately, SBKS will reduce the time necessary for synthetic biologists to find the information necessary to complete their designs.


Subject(s)
Synthetic Biology , User-Computer Interface , Animals , Cell Line , Data Mining , Humans
14.
Front Res Metr Anal ; 6: 644728, 2021.
Article in English | MEDLINE | ID: mdl-34250435

ABSTRACT

In this paper, we describe how we applied LBD techniques to discover lecithin cholesterol acyltransferase (LCAT) as a druggable target for cardiac arrest. We fully describe our process, which includes the use of high-throughput metabolomic analysis to identify metabolites significantly related to cardiac arrest, and how we used LBD to gain insights into how these metabolites relate to cardiac arrest. These insights led to our proposal (for the first time) of LCAT as a druggable target, the effects of which are supported by in vivo studies that were brought forth by this work. Metabolites are the end product of many biochemical pathways within the human body. Observed changes in metabolite levels are indicative of changes in these pathways, and provide valuable insights into the cause, progression, and treatment of diseases. Following cardiac arrest, we observed changes in metabolite levels pre- and post-resuscitation. We used LBD to help discover diseases implicitly linked via these metabolites of interest. Results of LBD indicated a strong link between fish eye disease and cardiac arrest. Since fish eye disease is characterized by an LCAT deficiency, we began an investigation into the effects of LCAT on cardiac arrest survival. In the investigation, we found that decreased LCAT activity may increase cardiac arrest survival rates by increasing ω-3 polyunsaturated fatty acid availability in circulation. We verified the effects of ω-3 polyunsaturated fatty acids on increasing survival rate following cardiac arrest in vivo with rat models.

15.
Micromachines (Basel) ; 12(7), 2021 Jun 30.
Article in English | MEDLINE | ID: mdl-34209404

ABSTRACT

Optimization of extrusion-based bioprinting (EBB) parameters has been systematically conducted through experimentation. However, the process is time- and resource-intensive and not easily translatable to other laboratories. This study approaches EBB parameter optimization through machine learning (ML) models trained using data collected from the published literature. We investigated regression-based and classification-based ML models and their abilities to predict printing outcomes of cell viability and filament diameter for cell-containing alginate and gelatin composite bioinks. In addition, we investigated whether regression-based models can predict suitable extrusion pressure given the desired cell viability when keeping other experimental parameters constant. We also compared models trained across data from the general literature to models trained across data from one literature source that utilized alginate and gelatin bioinks. The results indicate that models trained on large amounts of data can impart physical trends on cell viability, filament diameter, and extrusion pressure seen in past literature. Regression models trained on the larger dataset also predict cell viability closer to experimental values for material concentration combinations not seen in the training data of the single-paper-based regression models. While the best-performing classification models for cell viability can achieve an average prediction accuracy of 70%, the cell viability predictions remained constant despite altering input parameter combinations. Our models trained on bioprinting literature data show the potential of applying ML models to bioprinting experimental design.

16.
Front Res Metr Anal ; 6: 688353, 2021.
Article in English | MEDLINE | ID: mdl-34322654

ABSTRACT

Chemical patents are an essential source of information about novel chemicals and chemical reactions. However, with the increasing volume of such patents, mining information about these chemicals and chemical reactions has become a time-intensive and laborious endeavor. In this study, we present a system to extract chemical reaction events from patents automatically. Our approach consists of two steps: 1) named entity recognition (NER): the automatic identification of chemical reaction parameters from the corresponding text, and 2) event extraction (EE): the automatic classifying and linking of entities based on their relationships to each other. For our NER system, we evaluate bidirectional long short-term memory (BiLSTM)-based and bidirectional encoder representations from transformer (BERT)-based methods. For our EE system, we evaluate BERT-based, convolutional neural network (CNN)-based, and rule-based methods. We evaluate our NER and EE components independently and as an end-to-end system, reporting the precision, recall, and F1 score. Our results show that the BiLSTM-based method performed best at identifying the entities, and the CNN-based method performed best at extracting events.

17.
J Biomed Inform ; 118: 103784, 2021 06.
Article in English | MEDLINE | ID: mdl-33862232

ABSTRACT

Understanding a patient's medical history, such as how long symptoms last or when a procedure was performed, is vital to diagnosing problems and providing good care. Frequently, important information regarding a patient's medical timeline is buried in their Electronic Health Record (EHR) in the form of unstructured clinical notes. This results in care providers spending time reading notes in a patient's record in order to become familiar with their condition prior to developing a diagnosis or treatment plan. Valuable time could be saved if this information were readily accessible for searching and visualization for fast comprehension by the medical team. Clinical Natural Language Processing (NLP) is an area of research that aims to build computational methods to automatically extract medically relevant information from unstructured clinical texts. A key component of Clinical NLP is Temporal Reasoning, as understanding a patient's medical history relies heavily on the ability to identify, assimilate, and reason over temporal information. In this work, we review the current state of Temporal Reasoning in the clinical domain with respect to Clinical Timeline Extraction. While much progress has been made, the current state of the art still has a long way to go before practical application in the clinical setting will be possible. Areas such as handling relative and implicit temporal expressions, both in normalization and in identifying temporal relationships, improving co-reference resolution, and building interoperable timeline extraction tools that can integrate multiple types of data are in need of new and innovative solutions to improve performance on clinical data.


Subject(s)
Electronic Health Records , Natural Language Processing , Humans , Problem Solving , Time
18.
J Biomed Inform ; 112: 103589, 2020 12.
Article in English | MEDLINE | ID: mdl-33035705

ABSTRACT

Patient-physician communication is an often overlooked yet very important aspect of providing medical care. Positive patient-physician communication quality within discourse influences various aspects of a consultation, such as a patient's adherence to the prescribed medical regimen and their medical care outcome. As few reference standards exist for exploring semantics within the patient-physician setting and its effects on personalized healthcare, this paper presents a study exploring three methods to capture, model, and evaluate patient-physician communication among three distinct data sources. We introduce, compare, and contrast these methods for capturing and modeling patient-physician communication quality using relatedness between discourse content within a given consultation. Results are shown for all three data sources, and communication quality scores among physicians are recorded. We found that our models demonstrate the ability to capture positive communication quality between both participants within a consultation. We also evaluate these findings against self-reported questionnaires highlighting various aspects of the consultation and rank communication quality among seventeen physicians who consulted with one hundred and thirty-two patients.


Subject(s)
Physician-Patient Relations , Physicians , Communication , Humans , Patient Satisfaction , Semantics , Surveys and Questionnaires
19.
J Biomed Inform ; 110: 103552, 2020 10.
Article in English | MEDLINE | ID: mdl-32890727

ABSTRACT

Adverse drug events (ADEs) are unintended incidents that involve the taking of a medication. ADEs pose significant health and financial problems worldwide. Information about ADEs can inform health care and improve patient safety. However, much of this information is buried in narrative texts and needs to be extracted with Natural Language Processing techniques, in order to be useful to computerized methods. ADEs can be found on drug labels, contained in the different sections such as descriptions of the drug's active components or more prominently in descriptions of studied side-effects. Extracting these automatically could be useful in triaging and processing drug reports. In this paper, we present three base methods consisting of a Conditional Random Field (CRF), a bi-directional Long Short Term Memory unit with a CRF layer (biLSTM+CRF), and a pre-trained Bi-directional Encoder Representations from Transformers (BERT) model. We also present several ensembles of the CRF and biLSTM+CRF methods for extracting ADEs and their Reason from FDA drug labels. We show that all three methods perform well on our task, and that combining the models through different ensemble methods can improve results, providing increases in recall for the majority class and improving precision for all other classes. We also show the potential of framing ADE extraction from drug labels as a multi-class classification task on the Reason, or type, of ADE.


Subject(s)
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Pharmaceutical Preparations , Drug Labeling , Humans , Natural Language Processing
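
The abstract for entry 19 mentions "different ensemble methods" without naming them. As one possible illustration, and purely as an assumption, the sketch below combines per-token label sequences from several taggers by strict majority vote. The function name `majority_vote`, the BIO-style labels such as `B-ADE`, and the fallback label `O` are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]],
                  fallback: str = "O") -> List[str]:
    """Combine per-token labels from several models by strict majority.

    Assumption: majority voting is only one plausible ensemble scheme;
    the paper's actual ensembles are not described in the abstract.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(labels) != length for labels in predictions):
        raise ValueError("all models must label the same number of tokens")
    combined = []
    for token_labels in zip(*predictions):
        top, top_count = Counter(token_labels).most_common(1)[0]
        # Without a strict majority, fall back to the 'outside' label.
        combined.append(top if top_count > len(token_labels) / 2 else fallback)
    return combined


if __name__ == "__main__":
    # Hypothetical outputs of three taggers over the same three tokens.
    crf = ["O", "B-ADE", "I-ADE"]
    bilstm_crf = ["O", "B-ADE", "O"]
    bert = ["O", "O", "O"]
    print(majority_vote([crf, bilstm_crf, bert]))
```

Requiring a strict majority tends to favor precision, since a label is emitted only when most models agree; a union-style ensemble would instead favor recall.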
20.
AMIA Jt Summits Transl Sci Proc ; 2020: 201-210, 2020.
Article in English | MEDLINE | ID: mdl-32477639

ABSTRACT

Individuals increasingly rely on social media to discuss health-related issues. One way to provide easier access to relevant information is through sentiment analysis: classifying text into polarity classes such as positive and negative. In this paper, we generated freely available datasets of WebMD.com drug reviews and star ratings for Common, Cancer, Depression, Diabetes, and Hypertension drugs. We explored four supervised learning models for determining the polarity of drug reviews: Naive Bayes, Random Forests, Support Vector Machines (SVM), and Convolutional Neural Networks. We conducted inter-domain and cross-domain evaluations. We found that SVM obtained the highest F-measure on average and that cross-domain training produced results similar to or higher than those of models trained directly on their respective datasets.
