Results 1 - 20 of 66
1.
AMIA Annu Symp Proc ; 2021: 515-524, 2021.
Article in English | MEDLINE | ID: mdl-34457167

ABSTRACT

Natural language is continually changing. Given the prevalence of unstructured, free-text clinical notes in the healthcare domain, understanding the aspects of this change is of critical importance to clinical Natural Language Processing (NLP) systems. In this study, we examine two previously described semantic change laws based on word frequency and polysemy, and analyze how they apply to the clinical domain. We also explore a new facet of change: whether domain-specific clinical terms exhibit different change patterns compared to general-purpose English. Using a corpus spanning eighteen years of clinical notes, we find that the previously described laws of semantic change hold for our data set. We also find that domain-specific biomedical terms change faster compared to general English words.


Subjects
Natural Language Processing , Semantics , Humans , Language , Unified Medical Language System
2.
J Biomed Inform ; 117: 103755, 2021 05.
Article in English | MEDLINE | ID: mdl-33781919

ABSTRACT

Resource Description Framework (RDF) is one of the three standardized data formats in the HL7 Fast Healthcare Interoperability Resources (FHIR) specification and is being used by healthcare and research organizations to join FHIR and non-FHIR data. However, RDF previously had not been integrated into popular FHIR tooling packages, hindering the adoption of FHIR RDF in the semantic web and other communities. The objective of the study is to develop and evaluate a Java based FHIR RDF data transformation toolkit to facilitate the use and validation of FHIR RDF data. We extended the popular HAPI FHIR tooling to add RDF support, thus enabling FHIR data in XML or JSON to be transformed to or from RDF. We also developed an RDF Shape Expression (ShEx)-based validation framework to verify conformance of FHIR RDF data to the ShEx schemas provided in the FHIR specification for FHIR versions R4 and R5. The effectiveness of ShEx validation was demonstrated by testing it against 2693 FHIR R4 examples and 2197 FHIR R5 examples that are included in the FHIR specification. A total of 5 types of errors including missing properties, unknown element, missing resource Type, invalid attribute value, and unknown resource name in the R5 examples were revealed, demonstrating the value of the ShEx in the quality assurance of the evolving R5 development. This FHIR RDF data transformation and validation framework, based on HAPI and ShEx, is robust and ready for community use in adopting FHIR RDF, improving FHIR data quality, and evolving the FHIR specification.


Subjects
Delivery of Health Care , Electronic Health Records
3.
J Biomed Inform ; 110: 103541, 2020 10.
Article in English | MEDLINE | ID: mdl-32814201

ABSTRACT

Free-text problem descriptions are brief explanations of patient diagnoses and issues, commonly found in problem lists and other prominent areas of the medical record. These compact representations often express complex and nuanced medical conditions, making their semantics challenging to fully capture and standardize. In this study, we describe a framework for transforming free-text problem descriptions into standardized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) models. This approach leverages a combination of domain-specific dependency parsers, Bidirectional Encoder Representations from Transformers (BERT) natural language models, and cui2vec Unified Medical Language System (UMLS) concept vectors to align extracted concepts from free-text problem descriptions into structured FHIR models. A neural network classification model is used to classify thirteen relationship types between concepts, facilitating mapping to the FHIR Condition resource. We use data programming, a weak supervision approach, to eliminate the need for a manually annotated training corpus. Shapley values, a mechanism to quantify contribution, are used to interpret the impact of model features. We found that our methods identified the focus concept, or primary clinical concern of the problem description, with an F1 score of 0.95. Relationships from the focus to other modifying concepts were extracted with an F1 score of 0.90. When classifying relationships, our model achieved a 0.89 weighted average F1 score, enabling accurate mapping of attributes into HL7 FHIR models. We also found that the BERT input representation predominantly contributed to the classifier decision as shown by the Shapley values analysis.

4.
J Biomed Inform ; 109: 103526, 2020 09.
Article in English | MEDLINE | ID: mdl-32768446

ABSTRACT

BACKGROUND: Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement. OBJECTIVES: In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications. METHODS: Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library. RESULTS: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.

5.
AMIA Jt Summits Transl Sci Proc ; 2020: 171-180, 2020.
Article in English | MEDLINE | ID: mdl-32477636

ABSTRACT

The effective use of EHR data for clinical research is challenged by the lack of methodologic standards, transparency, and reproducibility. For example, our empirical analysis on clinical research ontologies and reporting standards found little-to-no informatics-related standards. To address these issues, our study aims to leverage natural language processing techniques to discover the reporting patterns and data abstraction methodologies for EHR-based clinical research. We conducted a case study using a collection of full articles of EHR-based population studies published using the Rochester Epidemiology Project infrastructure. Our investigation discovered an upward trend of reporting EHR-related research methodologies, good practice, and the use of informatics-related methods. For example, among 1279 articles, 24.0% reported training for data abstraction, 6% reported the abstractors were blinded, 4.5% tested the inter-observer agreement, 5% reported the use of a screening/data collection protocol, 1.5% reported that team meetings were organized for consensus building, and 0.8% mentioned supervision activities by senior researchers. Despite that, the overall ratio of reporting/adoption of methodologic standards was still low. There was also a high variation regarding clinical research reporting. Thus, continuously developing process frameworks, ontologies, and reporting guidelines for promoting good data practice in EHR-based clinical research is recommended.

6.
AMIA Jt Summits Transl Sci Proc ; 2020: 497-506, 2020.
Article in English | MEDLINE | ID: mdl-32477671

ABSTRACT

An important function of the patient record is to effectively and concisely communicate patient problems. In many cases, these problems are represented as short textual summarizations and appear in various sections of the record including problem lists, diagnoses, and chief complaints. While free-text problem descriptions effectively capture the clinicians' intent, these unstructured representations are problematic for downstream analytics. We present an automated approach to converting free-text problem descriptions into structured Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) expressions. Our methods focus on incorporating new advances in deep learning to build formal semantic representations of summary level clinical problems from text. We evaluate our methods against current approaches as well as against a large clinical corpus. We find that our methods outperform current techniques on the important relation identification sub-task of this conversion, and highlight the challenges of applying these methods to real-world clinical text.

7.
Trends Genet ; 36(7): 461-463, 2020 07.
Article in English | MEDLINE | ID: mdl-32544447

ABSTRACT

Since 2002, published miRNAs have been collected and named by the online repository miRBase. However, with 11 000 annual publications this has become challenging. Recently, four specialized miRNA databases were published, addressing particular needs for diverse scientific communities. This development provides major opportunities for the future of miRNA annotation and nomenclature.


Subjects
Databases, Nucleic Acid , Gene Expression Regulation , MicroRNAs/genetics , Molecular Sequence Annotation/standards , Sequence Analysis, RNA/standards , Software , Genomics , Humans
9.
Nucleic Acids Res ; 48(D1): D132-D141, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31598695

ABSTRACT

Small non-coding RNAs have gained substantial attention due to their roles in animal development and human disorders. Among them, microRNAs are special because individual gene sequences are conserved across the animal kingdom. In addition, unique and mechanistically well understood features can clearly distinguish bona fide miRNAs from the myriad other small RNAs generated by cells. However, making this distinction is not a common practice and, thus, not surprisingly, the heterogeneous quality of available miRNA complements has become a major concern in microRNA research. We addressed this by extensively expanding our curated microRNA gene database - MirGeneDB - to 45 organisms, encompassing a wide phylogenetic swath of animal evolution. By consistently annotating and naming 10,899 microRNA genes in these organisms, we show that previous microRNA annotations contained not only many false positives, but surprisingly lacked >2000 bona fide microRNAs. Indeed, curated microRNA complements of closely related organisms are very similar and can be used to reconstruct ancestral miRNA repertoires. MirGeneDB represents a robust platform for microRNA-based research, providing deeper and more significant insights into the biology and evolution of miRNAs as well as biomedical and biomarker research. MirGeneDB is publicly and freely available at http://mirgenedb.org/.


Subjects
Computational Biology/methods , Databases, Nucleic Acid , MicroRNAs/genetics , Software , Web Browser , Animals , Conserved Sequence , Evolution, Molecular , MicroRNAs/classification , Molecular Sequence Annotation , Phylogeny , User-Computer Interface
11.
EMBO Rep ; 20(2)2019 02.
Article in English | MEDLINE | ID: mdl-30573526

ABSTRACT

Testis-expressed X-linked genes typically evolve rapidly. Here, we report on a testis-expressed X-linked microRNA (miRNA) cluster that despite rapid alterations in sequence has retained its position in the Fragile-X region of the X chromosome in placental mammals. Surprisingly, the miRNAs encoded by this cluster (Fx-mir) have a predilection for targeting the immediately adjacent gene, Fmr1, an unexpected finding given that miRNAs usually act in trans, not in cis. Robust repression of Fmr1 is conferred by combinations of Fx-mir miRNAs induced in Sertoli cells (SCs) during postnatal development when they terminate proliferation. Physiological significance is suggested by the finding that FMRP, the protein product of Fmr1, is downregulated when Fx-mir miRNAs are induced, and that FMRP loss causes SC hyperproliferation and spermatogenic defects. Fx-mir miRNAs not only regulate the expression of FMRP, but also regulate the expression of eIF4E and CYFIP1, which together with FMRP form a translational regulatory complex. Our results support a model in which Fx-mir family members act cooperatively to regulate the translation of batteries of mRNAs in a developmentally regulated manner in SCs.


Subjects
Fragile X Mental Retardation Protein/genetics , MicroRNAs/genetics , Multigene Family , RNA Interference , RNA, Messenger/genetics , Spermatogenesis/genetics , 3' Untranslated Regions , Animals , Gene Expression Regulation , Humans , Male , Mice , Testis/metabolism
12.
Curr Biol ; 28(20): 3288-3295.e5, 2018 10 22.
Article in English | MEDLINE | ID: mdl-30318349

ABSTRACT

The emergence of multicellular animals was associated with an increase in phenotypic complexity and with the acquisition of spatial cell differentiation and embryonic development. Paradoxically, this phenotypic transition was not paralleled by major changes in the underlying developmental toolkit and regulatory networks. In fact, most of these systems are ancient, established already in the unicellular ancestors of animals [1-5]. In contrast, the Microprocessor protein machinery, which is essential for microRNA (miRNA) biogenesis in animals, as well as the miRNA genes themselves produced by this Microprocessor, have not been identified outside of the animal kingdom [6]. Hence, the Microprocessor, with the key proteins Pasha and Drosha, is regarded as an animal innovation [7-9]. Here, we challenge this evolutionary scenario by investigating unicellular sister lineages of animals through genomic and transcriptomic analyses. We identify in Ichthyosporea both Drosha and Pasha (DGCR8 in vertebrates), indicating that the Microprocessor complex evolved long before the last common ancestor of animals, consistent with a pre-metazoan origin of most of the animal developmental gene elements. Through small RNA sequencing, we also discovered expressed bona fide miRNA genes in several species of the ichthyosporeans harboring the Microprocessor. A deep, pre-metazoan origin of the Microprocessor and miRNAs complies with a view that the origin of multicellular animals was not directly linked to the innovation of these key regulatory components.


Subjects
Evolution, Molecular , Mesomycetozoea/genetics , MicroRNAs/genetics , Animals , Base Sequence , Mesomycetozoea/metabolism , MicroRNAs/metabolism , Phylogeny
13.
PLoS One ; 13(9): e0204234, 2018.
Article in English | MEDLINE | ID: mdl-30260966

ABSTRACT

Earthworms show a wide spectrum of regenerative potential, with certain species like Eisenia fetida capable of regenerating more than two-thirds of their body while other closely related species, such as Paranais litoralis, seem to have lost this ability. Earthworms belong to the phylum Annelida, in which the genomes of the marine oligochaete Capitella teleta and the freshwater leech Helobdella robusta have been sequenced and studied. Herein, we report the transcriptomic changes in Eisenia fetida (Indian isolate) during regeneration. Following injury, E. fetida regenerates the posterior segments in a time spanning several weeks. We analyzed gene expression changes both in the newly regenerating cells and in the adjacent tissue, at early (15 days post amputation), intermediate (20 days post amputation) and late (30 days post amputation) stages by RNAseq-based de novo assembly and comparison of transcriptomes. We also generated a draft genome sequence of this terrestrial red worm using short reads and mate-pair reads. An in-depth analysis of the miRNome of the worm showed that many miRNA gene families have undergone extensive duplications. Sox4, a master regulator of TGF-beta mediated epithelial-mesenchymal transition, was induced in the newly regenerated tissue. Genes for several proteins such as sialidases and neurotrophins were identified amongst the differentially expressed transcripts. The regeneration of the ventral nerve cord was also accompanied by the induction of nerve growth factor and neurofilament genes. We identified 315 novel differentially expressed transcripts in the transcriptome that have no homolog in any other species. Surprisingly, 82% of these novel differentially expressed transcripts showed poor potential for coding proteins, suggesting that novel ncRNAs may play a critical role in regeneration of the earthworm.


Subjects
Gene Expression Profiling/methods , Gene Regulatory Networks , Oligochaeta/physiology , Sequence Analysis, DNA/methods , Animals , Evolution, Molecular , Gene Expression Regulation , Genome , MicroRNAs/genetics , Multigene Family , Oligochaeta/genetics , Phylogeny , Regeneration , SOXC Transcription Factors/genetics , Sequence Analysis, RNA/methods
14.
Proc Natl Acad Sci U S A ; 115(38): E8909-E8918, 2018 09 18.
Article in English | MEDLINE | ID: mdl-30181261

ABSTRACT

The animal kingdom exhibits a great diversity of organismal form (i.e., disparity). Whether the extremes of disparity were achieved early in animal evolutionary history or clades continually explore the limits of possible morphospace is subject to continuing debate. Here we show, through analysis of the disparity of the animal kingdom, that, even though many clades exhibit maximal initial disparity, arthropods, chordates, annelids, echinoderms, and mollusks have continued to explore and expand the limits of morphospace throughout the Phanerozoic, expanding dramatically the envelope of disparity occupied in the Cambrian. The "clumpiness" of morphospace occupation by living clades is a consequence of the extinction of phylogenetic intermediates, indicating that the original distribution of morphologies was more homogeneous. The morphological distances between phyla mirror differences in complexity, body size, and species-level diversity across the animal kingdom. Causal hypotheses of morphologic expansion include time since origination, increases in genome size, protein repertoire, gene family expansion, and gene regulation. We find a strong correlation between increasing morphological disparity, genome size, and microRNA repertoire, but no correlation to protein domain diversity. Our results are compatible with the view that the evolution of gene regulation has been influential in shaping metazoan disparity whereas the invasion of terrestrial ecospace appears to represent an additional gestalt, underpinning the post-Cambrian expansion of metazoan disparity.


Subjects
Biodiversity , Biological Evolution , Gene Expression Regulation/physiology , Genome Size/physiology , MicroRNAs/physiology , Animals , Fossils , Proteins/genetics
15.
Genome Biol Evol ; 10(6): 1457-1470, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29788279

ABSTRACT

microRNAs are conserved noncoding regulatory factors implicated in diverse physiological and developmental processes in multicellular organisms, as causal macroevolutionary agents and for phylogeny inference. However, the conservation and phylogenetic utility of microRNAs has been questioned on evidence of pervasive loss. Here, we show that apparent widespread losses are, largely, an artefact of poorly sampled and annotated microRNAomes. Using a curated data set of animal microRNAomes, we reject the view that miRNA families are never lost, but they are rarely lost (92% are never lost). A small number of families account for a majority of losses (1.7% of families account for >45% losses), and losses are associated with lineages exhibiting phenotypic simplification. Phylogenetic analyses based on the presence/absence of microRNA families among animal lineages, and based on microRNA sequences among Osteichthyes, demonstrate the power of these small data sets in phylogenetic inference. Perceptions of widespread evolutionary loss of microRNA families are due to the uncritical use of public archives corrupted by spurious microRNA annotations, and failure to discriminate false absences that occur because of incomplete microRNAome annotation.


Subjects
MicroRNAs/genetics , Animals , Conserved Sequence/genetics , Evolution, Molecular , Molecular Sequence Annotation/methods , Phenotype , Phylogeny
16.
Trends Genet ; 34(3): 165-167, 2018 03.
Article in English | MEDLINE | ID: mdl-29361313

ABSTRACT

A lack of knowledge of the cellular origin of miRNAs has greatly confounded functional and biomarkers studies. Recently, three studies characterized miRNA expression patterns across >78 human cell types. These combined data expand our knowledge of miRNA expression localization and confirm that many miRNAs show cell type-specific expression patterns.


Subjects
Gene Expression Profiling/methods , Gene Expression Regulation , High-Throughput Nucleotide Sequencing/methods , MicroRNAs/genetics , Animals , Eukaryotic Cells/cytology , Eukaryotic Cells/metabolism , Humans , Organ Specificity/genetics , RNA, Messenger/genetics
17.
AMIA Annu Symp Proc ; 2018: 1451-1460, 2018.
Article in English | MEDLINE | ID: mdl-30815190

ABSTRACT

Summary-level clinical text is an important part of the overall clinical record as it provides a condensed and efficient view into the issues pertinent to the patient, or their "problem list." These problem lists contain a wealth of information pertaining to the patient's history as well as current state and well-being. In this study, we explore the structure of these problem list entries both grammatically and semantically in an attempt to learn the specialized rules, or "sublanguage" that governs them. Our methods focus on a large-scale corpus analysis of problem list entries. Using Resource Description Framework (RDF), we incorporate inferencing and reasoning via domain-specific ontologies into our analysis to elicit common semantic patterns. We also explore how these methods can be applied dynamically to learn specific sublanguage features of interest for a particular concept or topic within the domain.


Subjects
Medical Records, Problem-Oriented , Natural Language Processing , Systematized Nomenclature of Medicine , Terminology as Topic , Humans , Semantics , Unified Medical Language System
18.
AMIA Annu Symp Proc ; 2017: 1372-1381, 2017.
Article in English | MEDLINE | ID: mdl-29854206

ABSTRACT

A value set is a collection of permissible values used to describe a specific conceptual domain for a given purpose. By helping to establish a shared semantic understanding across use cases, these artifacts are important enablers of interoperability and data standardization. As the size of repositories cataloging these value sets expands, knowledge management challenges become more pronounced. Specifically, discovering value sets applicable to a given use case may be challenging in a large repository. In this study, we describe methods to extract implicit relationships between value sets, and utilize these relationships to overlay organizational structure onto value set repositories. We successfully extract two different structurings, hierarchy and clustering, and show how tooling can leverage these structures to enable more effective value set discovery.


Subjects
Data Mining , Vocabulary, Controlled , Cluster Analysis , Data Mining/methods , Health Information Interoperability , Semantics
19.
Cell ; 165(2): 382-95, 2016 Apr 07.
Article in English | MEDLINE | ID: mdl-27040500

ABSTRACT

Gene duplication is a major evolutionary force driving adaptation and speciation, as it allows for the acquisition of new functions and can augment or diversify existing functions. Here, we report a gene duplication event that yielded another outcome--the generation of antagonistic functions. One product of this duplication event--UPF3B--is critical for the nonsense-mediated RNA decay (NMD) pathway, while its autosomal counterpart--UPF3A--encodes an enigmatic protein previously shown to have trace NMD activity. Using loss-of-function approaches in vitro and in vivo, we discovered that UPF3A acts primarily as a potent NMD inhibitor that stabilizes hundreds of transcripts. Evidence suggests that UPF3A acquired repressor activity through simple impairment of a critical domain, a rapid mechanism that may have been widely used in evolution. Mice conditionally lacking UPF3A exhibit "hyper" NMD and display defects in embryogenesis and gametogenesis. Our results support a model in which UPF3A serves as a molecular rheostat that directs developmental events.


Subjects
Embryonic Development , Genes, Duplicate , Nonsense Mediated mRNA Decay , RNA-Binding Proteins/metabolism , Animals , Cell Line, Tumor , Evolution, Molecular , Gametogenesis , HeLa Cells , Humans , Mice
20.
Genome Biol Evol ; 8(2): 330-44, 2016 Jan 05.
Article in English | MEDLINE | ID: mdl-26733575

ABSTRACT

Placental mammals comprise three principal clades: Afrotheria (e.g., elephants and tenrecs), Xenarthra (e.g., armadillos and sloths), and Boreoeutheria (all other placental mammals), the relationships among which are the subject of controversy and a touchstone for debate on the limits of phylogenetic inference. Previous analyses have found support for all three hypotheses, leading some to conclude that this phylogenetic problem might be impossible to resolve due to the compounded effects of incomplete lineage sorting (ILS) and a rapid radiation. Here we show, using a genome scale nucleotide data set, microRNAs, and the reanalysis of the three largest previously published amino acid data sets, that the root of Placentalia lies between Atlantogenata and Boreoeutheria. Although we found evidence for ILS in early placental evolution, we are able to reject previous conclusions that the placental root is a hard polytomy that cannot be resolved. Reanalyses of previous data sets recover Atlantogenata + Boreoeutheria and show that contradictory results are a consequence of poorly fitting evolutionary models; instead, when the evolutionary process is better-modeled, all data sets converge on Atlantogenata. Our Bayesian molecular clock analysis estimates that marsupials diverged from placentals 157-170 Ma, crown Placentalia diverged 86-100 Ma, and crown Atlantogenata diverged 84-97 Ma. Our results are compatible with placental diversification being driven by dispersal rather than vicariance mechanisms, postdating early phases in the protracted opening of the Atlantic Ocean.


Subjects
Evolution, Molecular , Mammals/genetics , Models, Genetic , Phylogeny , Placenta/anatomy & histology , Animals , Female , Fossils , Genetic Speciation , Genome , Mammals/classification , MicroRNAs/genetics , Pregnancy