1.
J Biomed Inform ; 145: 104460, 2023 09.
Article in English | MEDLINE | ID: mdl-37532000

ABSTRACT

While a large number of knowledge graphs have previously been developed by automatically extracting and structuring knowledge from literature, there is currently no knowledge graph that encodes relationships between food, biochemicals and mental illnesses, even though a large amount of knowledge about these relationships is available as unstructured text in biomedical literature articles. To address this limitation, this article describes the development of GENA (Graph of mEntal-health and Nutrition Association), a knowledge graph that represents relations between nutrition and mental health, extracted from biomedical abstracts. GENA is constructed from PubMed abstracts that contain keywords relating to chemicals, food, and health. A hybrid named entity recognition (NER) model is first applied to these abstracts to identify various entities of interest. Subsequently, a deep syntax-based relation extraction model is used to detect binary relations between the identified entities. Finally, the resulting relations are used to populate the GENA knowledge graph, whose relationships can be accessed in an intuitive and interpretable manner using the Neo4J Database Management System. To evaluate the reliability of GENA, two annotators manually assessed a subset of the extracted relations. The evaluation results show that our methods obtain high precision for the NER task and acceptable precision and relative recall for the relation extraction task. GENA consists of 43,367 relationships that encode information about nutrition and health, of which 94.04% are new relations that are not present in existing ontologies of food and diseases. GENA is constructed based on scientific principles, and has the potential to be used within further applications to contribute towards scientific research within the domain. It is a pioneering knowledge graph in nutrition and mental health, containing a diverse range of relationship types. All of our source code and results are publicly available at https://github.com/ddlinh/gena-db.
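
The final steps described in this abstract (collecting binary relations into a graph and measuring how many are absent from existing ontologies) can be illustrated with a minimal sketch. It is not the GENA code and does not use Neo4J: the triples, the `known` set, and the helper names `build_graph` and `novelty_ratio` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples standing in for extractor output.
extracted = [
    ("omega-3", "REDUCES", "depression"),
    ("caffeine", "ASSOCIATED_WITH", "anxiety"),
    ("omega-3", "REDUCES", "depression"),  # same relation found in a second abstract
    ("folate", "ASSOCIATED_WITH", "depression"),
]

# Hypothetical relations assumed to exist already in some ontology.
known = {("caffeine", "ASSOCIATED_WITH", "anxiety")}


def build_graph(triples):
    """Group unique triples into a map: subject -> set of (relation, object)."""
    graph = defaultdict(set)
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))
    return graph


def novelty_ratio(triples, known_triples):
    """Fraction of unique extracted triples that are absent from known_triples."""
    unique = set(triples)
    if not unique:
        return 0.0
    return len(unique - known_triples) / len(unique)


graph = build_graph(extracted)
ratio = novelty_ratio(extracted, known)
print(f"{len(set(extracted))} unique relations, {ratio:.2%} not in the known set")
```

Duplicated triples are collapsed before counting, so the ratio reflects distinct relations rather than the number of mentions.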


Subjects
Mental Health , Automated Pattern Recognition , Reproducibility of Results , Software , PubMed
2.
J Biomed Inform ; 141: 104347, 2023 05.
Article in English | MEDLINE | ID: mdl-37030658

ABSTRACT

Automatic extraction of patient medication histories from free-text clinical notes can increase the amount of relevant information available to clinicians for developing treatment plans. In addition to detecting medication events, clinical text mining systems must also be able to predict event context, such as negation, uncertainty, and time of occurrence, in order to construct accurate patient timelines. Towards this goal, we introduce Levitated Context Markers (LCMs), a novel transformer-based model for contextualized event extraction. LCMs are an adaptation of levitated markers (originally developed for relation extraction) that allow pretrained transformer models to utilize global input representations while also focusing on event-related subspans using a sparse attention mechanism. In addition to outperforming a strong baseline model on the Contextualized Medication Event Dataset, we show that LCMs' sparse attention can provide interpretable predictions by detecting relevant context cues in an unsupervised manner.


Subjects
Data Mining , Records , Humans , Natural Language Processing
3.
Am J Hum Genet ; 96(2): 266-74, 2015 Feb 05.
Article in English | MEDLINE | ID: mdl-25620203

ABSTRACT

Singleton-Merten syndrome (SMS) is an autosomal-dominant multi-system disorder characterized by dental dysplasia, aortic calcification, skeletal abnormalities, glaucoma, psoriasis, and other conditions. Despite an apparent autosomal-dominant pattern of inheritance, the genetic background of SMS and information about its phenotypic heterogeneity remain unknown. Recently, we found a family affected by glaucoma, aortic calcification, and skeletal abnormalities. Unlike subjects with classic SMS, affected individuals showed normal dentition, suggesting atypical SMS. To identify genetic causes of the disease, we performed exome sequencing in this family and identified a variant (c.1118A>C [p.Glu373Ala]) of DDX58, whose protein product is also known as RIG-I. Further analysis of DDX58 in 100 individuals with congenital glaucoma identified another variant (c.803G>T [p.Cys268Phe]) in a family who harbored neither dental anomalies nor aortic calcification but who suffered from glaucoma and skeletal abnormalities. Cys268 and Glu373 residues of DDX58 belong to ATP-binding motifs I and II, respectively, and these residues are predicted to be located closer to the ADP and RNA molecules than other nonpathogenic missense variants by protein structure analysis. Functional assays revealed that DDX58 alterations confer constitutive activation and thus lead to increased interferon (IFN) activity and IFN-stimulated gene expression. In addition, when we transduced primary human trabecular meshwork cells with c.803G>T (p.Cys268Phe) and c.1118A>C (p.Glu373Ala) mutants, cytopathic effects and a significant decrease in cell number were observed. Taken together, our results demonstrate that DDX58 mutations cause atypical SMS manifesting with variable expression of glaucoma, aortic calcification, and skeletal abnormalities without dental anomalies.


Subjects
Aortic Diseases/genetics , DEAD-box RNA Helicases/genetics , Dental Enamel Hypoplasia/genetics , Glaucoma/genetics , Metacarpus/abnormalities , Molecular Models , Muscular Diseases/genetics , Odontodysplasia/genetics , Osteoporosis/genetics , Vascular Calcification/genetics , Adult , Aortic Diseases/pathology , Base Sequence , Cultured Cells , Preschool Child , DEAD Box Protein 58 , DEAD-box RNA Helicases/chemistry , Dental Enamel Hypoplasia/pathology , Exome/genetics , Female , Dominant Genes/genetics , Humans , Male , Metacarpus/pathology , Molecular Sequence Data , Muscular Diseases/pathology , Musculoskeletal Abnormalities/diagnostic imaging , Musculoskeletal Abnormalities/genetics , Missense Mutation/genetics , Odontodysplasia/diagnostic imaging , Odontodysplasia/pathology , Osteoporosis/pathology , Pedigree , Single Nucleotide Polymorphism/genetics , Radiography , Immunologic Receptors , DNA Sequence Analysis , Vascular Calcification/pathology
4.
BMC Bioinformatics ; 16: 107, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-25887686

ABSTRACT

BACKGROUND: Relation extraction is a fundamental technology in biomedical text mining. Most of the previous studies on relation extraction from biomedical literature have focused on specific or predefined types of relations, which inherently limits the types of the extracted relations. With the aim of fully leveraging the knowledge described in the literature, we address much broader types of semantic relations using a single extraction framework. RESULTS: Our system, which we name PASMED, extracts diverse types of binary relations from biomedical literature using deep syntactic patterns. Our experimental results demonstrate that it achieves a level of recall considerably higher than the state of the art, while maintaining reasonable precision. We have then applied PASMED to the whole MEDLINE corpus and extracted more than 137 million semantic relations. The extracted relations provide a quantitative understanding of what kinds of semantic relations are actually described in MEDLINE and can be ultimately extracted by (possibly type-specific) relation extraction systems. CONCLUSION: PASMED extracts a large number of relations that have previously been missed by existing text mining systems. The entire collection of the relations extracted from MEDLINE is publicly available in machine-readable form, so that it can serve as a potential knowledge base for high-level text-mining applications.


Subjects
Data Mining/methods , MEDLINE , Semantics
5.
J Biomed Inform ; 56: 94-102, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26004792

ABSTRACT

Many text mining applications in the biomedical domain benefit from automatic clustering of relational phrases into synonymous groups, since it alleviates the problem of spurious mismatches caused by the diversity of natural language expressions. Most of the previous work that has addressed this task of synonymy resolution uses similarity metrics between relational phrases based on textual strings or dependency paths, which, for the most part, ignore the context around the relations. To overcome this shortcoming, we employ a word embedding technique to encode relational phrases. We then apply the k-means algorithm on top of the distributional representations to cluster the phrases. Our experimental results show that this approach outperforms state-of-the-art statistical models including latent Dirichlet allocation and Markov logic networks.
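
The clustering step in this abstract (k-means over vector representations of relational phrases) can be illustrated with a small, dependency-free sketch. It is an assumption-laden toy, not the authors' implementation: the two-dimensional vectors are made-up stand-ins for learned embeddings, and the initial centroids are fixed by hand so that the run is deterministic.

```python
import math

# Hypothetical 2-D "embeddings" for four relational phrases (real ones would be learned).
phrases = ["inhibits", "suppresses", "activates", "stimulates"]
vectors = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; returns (cluster index per point, final centroids)."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        centroids = updated
    return assignments, centroids


# Fixed initial centroids (an assumption made for reproducibility).
labels, centers = kmeans(vectors, centroids=[vectors[0], vectors[2]])
for phrase, label in zip(phrases, labels):
    print(phrase, "-> cluster", label)
```

With these toy vectors, phrases that are close in the embedding space end up in the same cluster, which is the behaviour that synonymy resolution relies on.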


Subjects
Data Mining/methods , Natural Language Processing , Controlled Vocabulary , Algorithms , Cluster Analysis , Factual Databases , False Positive Reactions , Fuzzy Logic , MEDLINE , Markov Chains , Medical Informatics/methods , Statistical Models , Probability , Reproducibility of Results , Semantics
6.
Front Res Metr Anal ; 8: 1247094, 2023.
Article in English | MEDLINE | ID: mdl-38173988

ABSTRACT

Biomedical entity linking is the task of mapping mentions that occur in a particular textual context to a unique concept or entity in a knowledge base, e.g., the Unified Medical Language System (UMLS). One of the most challenging aspects of the entity linking task is the ambiguity of mentions, i.e., (1) mentions whose surface forms are very similar, but which map to different entities in different contexts, and (2) entities that can be expressed using diverse types of mentions. Recent studies have used BERT-based encoders to encode mentions and entities into distinguishable representations such that their similarity can be measured using distance metrics. However, most real-world biomedical datasets suffer from severe imbalance, i.e., some classes have many instances while others appear only once or are completely absent from the training data. A common way to address this issue is to down-sample the dataset, i.e., to reduce the number of instances of the majority classes to make the dataset more balanced. In the context of entity linking, however, down-sampling reduces the ability of the model to comprehensively learn the representations of mentions in different contexts, which is very important. To tackle this issue, we propose a metric-based learning method that treats a given entity and its mentions as a whole, regardless of the number of mentions in the training set. Specifically, our method uses a triplet loss-based function in conjunction with a clustering technique to learn the representation of mentions and entities. Through evaluations on two challenging biomedical datasets, i.e., MedMentions and BC5CDR, we show that our proposed method is able to address the issue of imbalanced data and to perform competitively with other state-of-the-art models. Moreover, our method significantly reduces computational cost in both training and inference steps. Our source code is publicly available here.
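
For readers unfamiliar with triplet losses, the standard formulation is sketched below. The abstract only says that a "triplet loss-based function" is used, so this is the textbook version, max(0, d(a, p) - d(a, n) + margin), and not necessarily the exact function in the paper. The vectors and the margin value are hypothetical.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss with Euclidean distance.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is; otherwise it grows linearly with the violation.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D representations: a mention (anchor), its correct entity
# (positive), and two candidate wrong entities (negatives).
mention = (0.0, 0.0)
correct_entity = (0.0, 1.0)
far_wrong_entity = (3.0, 4.0)   # distance 5 from the mention
near_wrong_entity = (1.0, 0.0)  # distance 1 from the mention

easy = triplet_loss(mention, correct_entity, far_wrong_entity)
hard = triplet_loss(mention, correct_entity, near_wrong_entity)
print(easy, hard)
```

The first triplet already satisfies the margin and contributes no loss, while the second is penalised because the wrong entity is as close to the mention as the correct one.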

7.
JAMIA Open ; 4(4): ooab104, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34927002

ABSTRACT

The COVID-19 pandemic resulted in an unprecedented production of scientific literature spanning several fields. To facilitate navigation of the scientific literature related to various aspects of the pandemic, we developed an exploratory search system. The system is based on automatically identified technical terms, document citations, and their visualization, accelerating identification of relevant documents. It offers a multi-view interactive search and navigation interface, bringing together unsupervised approaches of term extraction and citation analysis. We conducted a user evaluation with domain experts, including epidemiologists, biochemists, medicinal chemists, and medical students. In general, most users were satisfied with the relevance and speed of the search results. More interestingly, participants mostly agreed on the capacity of the system to enable exploration and discovery of the search space using the graph visualization and filters. The system is updated on a weekly basis and is publicly available at http://www.nactem.ac.uk/cord/.

8.
J Extracell Vesicles ; 10(4): e12057, 2021 02.
Article in English | MEDLINE | ID: mdl-33643546

ABSTRACT

Natural extracellular vesicles (EVs) are ideal drug carriers due to their remarkable biocompatibility. Their delivery specificity can be achieved by the conjugation of targeting ligands. However, existing methods to engineer target-specific EVs are tedious or inefficient, having to compromise between harsh chemical treatments and transient interactions. Here, we describe a novel method for the covalent conjugation of EVs with high copy numbers of targeting moieties using protein ligases. Conjugation of EVs with either an epidermal growth factor receptor (EGFR)-targeting peptide or anti-EGFR nanobody facilitates their accumulation in EGFR-positive cancer cells, both in vitro and in vivo. Systemic delivery of paclitaxel by EGFR-targeting EVs at a low dose significantly increases drug efficacy in a xenografted mouse model of EGFR-positive lung cancer. The method is also applicable to the conjugation of EVs with peptides and nanobodies targeting other receptors, such as HER2 and SIRP alpha, and the conjugated EVs can deliver RNA in addition to small molecules, supporting the versatile application of EVs in cancer therapies. This simple, yet efficient and versatile method for the stable surface modification of EVs bypasses the need for genetic and chemical modifications, thus facilitating safe and specific delivery of therapeutic payloads to target cells.


Subjects
Drug Delivery Systems/methods , Extracellular Vesicles , Peptides/therapeutic use , Single-Domain Antibodies/therapeutic use , Animals , Phytogenic Antineoplastic Agents/therapeutic use , Tumor Cell Line , Drug Carriers/chemistry , Drug Carriers/therapeutic use , ErbB Receptors/chemistry , ErbB Receptors/therapeutic use , Erythrocytes , Humans , Lung Neoplasms/drug therapy , Mice , Paclitaxel/therapeutic use , Peptides/chemistry , Single-Domain Antibodies/chemistry , Xenograft Model Antitumor Assays
9.
J Am Med Inform Assoc ; 27(1): 22-30, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31197355

ABSTRACT

OBJECTIVE: This article describes an ensembling system to automatically extract adverse drug events and drug-related entities from clinical narratives, which was developed for the 2018 n2c2 Shared Task Track 2. MATERIALS AND METHODS: We designed a neural model to tackle both nested entities (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenized the MIMIC III data set by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a feature-based conditional random field model and created an ensemble to combine its predictions with those of the neural model. RESULTS: Our method achieved a lenient micro F1-score of 92.78%, with 95.99% lenient precision and 89.79% lenient recall. Experimental results showed that combining the predictions of either multiple models, or of a single model with different settings, can improve performance. DISCUSSION: Analysis of the development set showed that our neural models can detect more informative text regions than feature-based conditional random field models. Furthermore, most entity types significantly benefit from subword representation, which also allows us to extract sparse entities, especially nested entities. CONCLUSION: The overall results have demonstrated that the ensemble method can accurately recognize entities, including nested and polysemous entities. Additionally, our method can recognize sparse entities by reconsidering the clinical narratives at a finer-grained subword level, rather than at the word level.
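
As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the rounded precision and recall quoted in the abstract; the function name `f1_score` is ours, and a small discrepancy is expected because the published inputs are themselves rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Lenient precision and recall as quoted in the abstract, in percent.
f1 = f1_score(95.99, 89.79)
print(f"F1 = {f1:.2f}%")
```

The recomputed value agrees with the reported 92.78% to within about 0.01 percentage points.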


Subjects
Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Computer Neural Networks , Humans , Narration
10.
Biodivers Data J ; (7): e29626, 2019.
Article in English | MEDLINE | ID: mdl-30700967

ABSTRACT

BACKGROUND: Species occurrence records are very important in the biodiversity domain. While several available corpora contain only annotations of species names or habitats and geographical locations, there is no consolidated corpus that covers all types of entities necessary for extracting species occurrence from biodiversity literature. In order to alleviate this issue, we have constructed the COPIOUS corpus, a gold standard corpus that covers a wide range of biodiversity entities. RESULTS: Two annotators manually annotated the corpus with five categories of entities, i.e., taxon names, geographical locations, habitats, temporal expressions and person names. The overall inter-annotator agreement on 200 doubly-annotated documents is approximately 81.86% F-score. Amongst the five categories, the agreement on habitat entities was the lowest, indicating that this type of entity is complex. The COPIOUS corpus consists of 668 documents downloaded from the Biodiversity Heritage Library with over 26K sentences and more than 28K entities. Named entity recognisers trained on the corpus could achieve an F-score of 74.58%. Moreover, in recognising taxon names, our model performed better than two available tools in the biodiversity domain, namely the SPECIES tagger and the Global Name Recognition and Discovery. More than 1,600 binary relations of Taxon-Habitat, Taxon-Person, Taxon-Geographical locations and Taxon-Temporal expressions were identified by applying a pattern-based relation extraction system to the gold standard. Based on the extracted relations, we can produce a knowledge repository of species occurrences. CONCLUSION: The paper describes in detail the construction of a gold standard named entity corpus for the biodiversity domain. An investigation of the performance of named entity recognition (NER) tools trained on the gold standard revealed that the corpus is sufficiently reliable and sizeable for both training and evaluation purposes. The corpus can be further used for relation extraction to locate species occurrences in literature, a useful task for monitoring species distribution and preserving biodiversity.

11.
Aging Cell ; 18(3): e12906, 2019 06.
Article in English | MEDLINE | ID: mdl-30773781

ABSTRACT

PDZ domain-containing proteins (PDZ proteins) act as scaffolds for protein-protein interactions and are crucial for a variety of signal transduction processes. However, the role of PDZ proteins in organismal lifespan and aging remains poorly understood. Here, we demonstrate that KIN-4, a PDZ domain-containing microtubule-associated serine-threonine (MAST) protein kinase, is a key longevity factor acting through binding PTEN phosphatase in Caenorhabditis elegans. Through a targeted genetic screen for PDZ proteins, we find that kin-4 is required for the long lifespan of daf-2/insulin/IGF-1 receptor mutants. We then show that neurons are crucial tissues for the longevity-promoting role of kin-4. We find that the PDZ domain of KIN-4 binds PTEN, a key factor for the longevity of daf-2 mutants. Moreover, the interaction between KIN-4 and PTEN is essential for the extended lifespan of daf-2 mutants. As many aspects of lifespan regulation in C. elegans are evolutionarily conserved, MAST family kinases may regulate aging and/or age-related diseases in mammals through their interaction with PTEN.


Subjects
Caenorhabditis elegans Proteins/metabolism , PTEN Phosphohydrolase/metabolism , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics , Longevity/genetics , PDZ Domains/genetics , PTEN Phosphohydrolase/genetics
12.
PLoS One ; 12(4): e0175277, 2017.
Article in English | MEDLINE | ID: mdl-28414821

ABSTRACT

The increasing growth of literature in biodiversity presents challenges to users who need to discover pertinent information in an efficient and timely manner. In response, text mining techniques offer solutions by facilitating the automated discovery of knowledge from large textual data. An important step in text mining is the recognition of concepts via their linguistic realisation, i.e., terms. However, a given concept may be referred to in text using various synonyms or term variants, making search systems likely to overlook documents that mention lesser-known variants but are nonetheless relevant to a query term. Domain-specific terminological resources, which include term variants, synonyms and related terms, are thus important in supporting semantic search over large textual archives. This article describes the use of text mining methods for the automatic construction of a large-scale biodiversity term inventory. The inventory consists of names of species, amongst which naming variations are prevalent. We apply a number of distributional semantic techniques on all of the titles in the Biodiversity Heritage Library, to compute semantic similarity between species names and support the automated construction of the resource. With the construction of our biodiversity term inventory, we demonstrate that distributional semantic models are able to identify semantically similar names that are not yet recorded in existing taxonomies. Such methods can thus be used to update existing taxonomies semi-automatically by deriving semantically related taxonomic names from a text corpus and allowing expert curators to validate them. We also evaluate our inventory as a means to improve search by facilitating automatic query expansion. Specifically, we developed a visual search interface that suggests semantically related species names, which are available in our inventory but not always in other repositories, to incorporate into the search query. An assessment of the interface by domain experts reveals that our query expansion based on related names is useful for increasing the number of relevant documents retrieved. Its exploitation can benefit both users and developers of search engines and text mining applications.
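
The idea of suggesting semantically related names for query expansion can be sketched with cosine similarity over distributional vectors. This is an illustrative toy rather than the system described in the abstract: the two-dimensional vectors, the similarity threshold, and the helper names `cosine` and `expand_query` are assumptions made for the example.

```python
import math

# Hypothetical distributional vectors for three names (real ones would be
# learned from a large corpus such as the library's titles).
name_vectors = {
    "Panthera leo": (1.0, 0.1),
    "Felis leo": (0.9, 0.2),      # an older name for the same species
    "Quercus robur": (0.1, 1.0),  # an unrelated taxon
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(term, vectors, threshold=0.9):
    """Return other names whose similarity to `term` meets the threshold, best first."""
    query_vec = vectors[term]
    scored = [
        (name, cosine(query_vec, vec))
        for name, vec in vectors.items()
        if name != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, score in scored if score >= threshold]


suggestions = expand_query("Panthera leo", name_vectors)
print(suggestions)
```

In a curation workflow of the kind the abstract describes, such suggestions would be shown to an expert for validation rather than added to a taxonomy automatically.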


Subjects
Biodiversity , Data Mining/methods , Algorithms , Libraries , Search Engine , Semantics , Terminology as Topic
13.
J Microbiol ; 54(9): 583-587, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27572506

ABSTRACT

RIG-I is a cytosolic receptor that recognizes virus-specific RNA structures and initiates antiviral signaling that induces the production of interferons and proinflammatory cytokines. Because inappropriate RIG-I signaling affects either viral clearance or immune toxicity, multiple mechanisms regulating RIG-I have been investigated since its discovery as the viral RNA detector. In this review, we describe the recent progress in research on the regulation of RIG-I activity or abundance. Specifically, we focus on the mechanisms that modulate the RIG-I-dependent antiviral response through post-translational modifications of RIG-I or protein-protein interactions with it.


Subjects
DEAD Box Protein 58/genetics , DEAD Box Protein 58/immunology , Signal Transduction , Virus Diseases/immunology , Viruses/immunology , Animals , Gene Expression Regulation , Host-Pathogen Interactions , Humans , Protein Binding , Immunologic Receptors , Virus Diseases/genetics , Virus Diseases/virology , Viruses/genetics
14.
Sci Rep ; 6: 23377, 2016 Mar 21.
Article in English | MEDLINE | ID: mdl-26996158

ABSTRACT

RIG-I is a key cytosolic RNA sensor that mediates innate immune defense against RNA viruses. Aberrant RIG-I activity leads to severe pathological states such as autosomal dominant multi-system disorder, inflammatory myopathies and dermatomyositis. Therefore, identifying regulators that ensure efficient defense without harmful immunopathology is particularly critical for dealing with RIG-I-associated diseases. Here, we present the inflammation-inducible protein FAT10 as a novel negative regulator of the RIG-I-mediated inflammatory response. In various cell lines, FAT10 protein is undetectable unless it is induced by pro-inflammatory cytokines. FAT10 non-covalently associated with the 2CARD domain of RIG-I, and inhibited viral RNA-induced IRF3 and NF-κB activation by modulating RIG-I protein solubility. We further demonstrated that FAT10 was recruited to RIG-I-TRIM25 to form an inhibitory complex in which FAT10 was stabilized by the E3 ligase TRIM25. As a result, FAT10 inhibited the formation of antiviral stress granules containing RIG-I and sequestered active RIG-I away from the mitochondria. Our study presents a novel mechanism to dampen RIG-I activity. Highly accumulated FAT10 is observed in various cancers with a pro-inflammatory environment. Our finding of the suppressive effect of accumulated FAT10 during the virus-mediated inflammatory response may therefore also provide a molecular clue to understanding carcinogenesis related to infection and inflammation.


Subjects
DEAD Box Protein 58/metabolism , Inflammation/metabolism , Inflammation/virology , Signal Transduction , Ubiquitins/metabolism , HEK293 Cells , Humans , Interferon Regulatory Factor-3/metabolism , NF-kappa B/metabolism , Immunologic Receptors , Transcription Factors/metabolism , Tripartite Motif Proteins/metabolism , Ubiquitin-Protein Ligases/metabolism