Results 1 - 20 of 26
1.
Nucleic Acids Res ; 52(W1): W540-W546, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38572754

ABSTRACT

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.


Subject(s)
PubMed , Artificial Intelligence , Humans , Software , Data Mining/methods , Semantics , Internet
2.
Nucleic Acids Res ; 51(D1): D1512-D1518, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350613

ABSTRACT

LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), first launched in February 2020, is a first-of-its-kind literature hub for tracking up-to-date published research on COVID-19. The number of articles in LitCovid has increased from 55 000 to ∼300 000 over the past 2.5 years, with a consistent growth rate of ∼10 000 articles per month. In addition to the rapid literature growth, the COVID-19 pandemic has evolved dramatically. For instance, the Omicron variant now accounts for over 98% of new infections in the United States. In response to the continuing evolution of the COVID-19 pandemic, this article describes significant updates to LitCovid over the last 2 years. First, we introduced the long Covid collection, consisting of articles on COVID-19 survivors experiencing ongoing multisystemic symptoms, including respiratory issues, cardiovascular disease, cognitive impairment, and profound fatigue. Second, we provided new annotations on the latest COVID-19 strains and vaccines mentioned in the literature. Third, we improved several existing features with more accurate machine learning algorithms for annotating topics and classifying articles relevant to COVID-19. LitCovid has been widely used, with millions of accesses by users worldwide for various information needs, and continues to play a critical role in collecting, curating and standardizing the latest knowledge in the COVID-19 literature.


Subject(s)
COVID-19 , Bibliographic Databases , Humans , COVID-19/epidemiology , Pandemics , Post-Acute COVID-19 Syndrome , SARS-CoV-2 , United States
3.
J Neurochem ; 2024 May 04.
Article in English | MEDLINE | ID: mdl-38702968

ABSTRACT

Ependymal cells form a specialized brain-cerebrospinal fluid (CSF) interface and regulate local CSF microcirculation. It is becoming increasingly recognized that ependymal cells assume a reactive state in response to aging and disease, including conditions involving hypoxia, hydrocephalus, neurodegeneration, and neuroinflammation. Yet what transcriptional signatures govern these reactive states and whether this reactivity shares any similarities with classical descriptions of glial reactivity (i.e., in astrocytes) remain largely unexplored. Using single-cell transcriptomics, we interrogated this phenomenon by directly comparing the reactive ependymal cell transcriptome to the reactive astrocyte transcriptome using a well-established model of autoimmune-mediated neuroinflammation (MOG35-55 EAE). In doing so, we unveiled core glial reactivity-associated genes that defined the reactive ependymal cell and astrocyte response to MOG35-55 EAE. Interestingly, known reactive astrocyte genes from other CNS injury/disease contexts were also up-regulated by MOG35-55 EAE ependymal cells, suggesting that this state may be conserved in response to a variety of pathologies. We were also able to recapitulate features of the reactive ependymal cell state acutely using a classic neuroinflammatory cocktail (IFNγ/LPS) both in vitro and in vivo. Taken together, by comparing reactive ependymal cells and astrocytes, we identified a conserved signature underlying glial reactivity that was present in several neuroinflammatory contexts. Future work will explore the mechanisms driving ependymal reactivity and assess downstream functional consequences.

4.
Eur J Neurosci ; 59(10): 2465-2482, 2024 May.
Article in English | MEDLINE | ID: mdl-38487941

ABSTRACT

The enteric nervous system (ENS) comprises a complex network of neurons, a subset of which appears to be dopaminergic, although their characteristics, roles, and implications in disease are poorly understood. Most investigations relating to enteric dopamine (DA) neurons rely on immunoreactivity to tyrosine hydroxylase (TH), the rate-limiting enzyme in the production of DA. However, TH immunoreactivity is likely to provide an incomplete picture. This study provides a comprehensive characterization of DA neurons in the gut using a reporter mouse line expressing a fluorescent protein (tdTomato) under control of the DA transporter (DAT) promoter. Our findings confirm a unique localization of DA neurons in the gut and unveil the discrete subtypes of DA neurons in this organ, which we characterized using both immunofluorescence and single-cell transcriptomics, and validated using in situ hybridization. We observed distinct subtypes of DAT-tdTomato neurons expressing co-transmitters and modulators across both plexuses; some of them likely co-release acetylcholine, while others were positive for a slew of canonical DAergic markers (TH, VMAT2 and GIRK2). Interestingly, we uncovered a seemingly novel population of DA neurons unique to the ENS which was ChAT/DAT-tdTomato-immunoreactive and expressed Grp, Calcb, and Sst. Given the clear heterogeneity of DAergic gut neurons, further investigation is warranted to define their functional signatures and decipher their implication in disease.


Subject(s)
Dopamine Plasma Membrane Transport Proteins , Dopaminergic Neurons , Enteric Nervous System , Animals , Dopamine Plasma Membrane Transport Proteins/metabolism , Dopamine Plasma Membrane Transport Proteins/genetics , Dopaminergic Neurons/metabolism , Mice , Enteric Nervous System/metabolism , Enteric Nervous System/cytology , Transgenic Mice , Tyrosine 3-Monooxygenase/metabolism , Dopamine/metabolism , Male , Luminescent Proteins/metabolism , Luminescent Proteins/genetics , Vesicular Monoamine Transport Proteins/metabolism , Vesicular Monoamine Transport Proteins/genetics
5.
Bioinformatics ; 38(18): 4449-4451, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35904569

ABSTRACT

MOTIVATION: Previous studies have shown that automated text-mining tools are becoming increasingly important for successfully unlocking variant information in scientific literature at large scale. Despite multiple attempts in the past, existing tools are still of limited recognition scope and precision. RESULT: We propose tmVar 3.0: an improved variant recognition and normalization system. Compared to its predecessors, tmVar 3.0 recognizes a wider spectrum of variant-related entities (e.g. allele and copy number variants), and groups together different variant mentions belonging to the same genomic sequence position in an article for improved accuracy. Moreover, tmVar 3.0 provides advanced variant normalization options such as allele-specific identifiers from the ClinGen Allele Registry. tmVar 3.0 exhibits state-of-the-art performance with over 90% in F-measure for variant recognition and normalization, when evaluated on three independent benchmarking datasets. tmVar 3.0 as well as annotations for the entire PubMed and PMC datasets are freely available for download. AVAILABILITY AND IMPLEMENTATION: https://github.com/ncbi/tmVar3.


Subject(s)
Data Mining , Publications , PubMed , Genomics
6.
PLoS Biol ; 18(6): e3000716, 2020 06.
Article in English | MEDLINE | ID: mdl-32479517

ABSTRACT

Data-driven research in biomedical science requires structured, computable data. Increasingly, these data are created with support from automated text mining. Text-mining tools have rapidly matured: although not perfect, they now frequently provide outstanding results. We describe 10 straightforward writing tips, and a web tool, PubReCheck, that guide authors in addressing the most common cases that remain difficult for text-mining tools. We anticipate these guides will help authors' work be found more readily and used more widely, ultimately increasing the impact of their work and the overall benefit to both authors and readers. PubReCheck is available at http://www.ncbi.nlm.nih.gov/research/pubrecheck.


Subject(s)
Data Mining , Automation , Internet , Software
7.
Nucleic Acids Res ; 49(D1): D1534-D1540, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33166392

ABSTRACT

Since the outbreak of the current pandemic in 2020, there has been a rapid growth of published articles on COVID-19 and SARS-CoV-2, with about 10,000 new articles added each month. This is causing an increasingly serious information overload, making it difficult for scientists, healthcare professionals and the general public to remain up to date on the latest SARS-CoV-2 and COVID-19 research. Hence, we developed LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), a curated literature hub, to track up-to-date scientific information in PubMed. LitCovid is updated daily with newly identified relevant articles organized into curated categories. To support manual curation, advanced machine-learning and deep-learning algorithms have been developed, evaluated and integrated into the curation workflow. To the best of our knowledge, LitCovid is the first-of-its-kind COVID-19-specific literature resource, with all of its collected articles and curated data freely available. Since its release, LitCovid has been widely used, with millions of accesses by users worldwide for various information needs, such as evidence synthesis, drug discovery and text and data mining, among others.


Subject(s)
COVID-19/prevention & control , Data Curation/statistics & numerical data , Data Mining/statistics & numerical data , Factual Databases , PubMed/statistics & numerical data , SARS-CoV-2/isolation & purification , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Data Mining/methods , Humans , Internet , Machine Learning , Pandemics , Publications/statistics & numerical data , SARS-CoV-2/physiology
8.
Nucleic Acids Res ; 49(W1): W352-W358, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33950204

ABSTRACT

Searching and reading relevant literature is a routine practice in biomedical research. However, it is challenging for a user to design optimal search queries using all the keywords related to a given topic. As such, existing search systems such as PubMed often return suboptimal results. Several computational methods have been proposed as an effective alternative to keyword-based query methods for literature recommendation. However, those methods require specialized knowledge in machine learning and natural language processing, which can make them difficult for biologists to utilize. In this paper, we propose LitSuggest, a web server that provides an all-in-one literature recommendation and curation service to help biomedical researchers stay up to date with scientific literature. LitSuggest combines advanced machine learning techniques for suggesting relevant PubMed articles with high accuracy. In addition to innovative text-processing methods, LitSuggest offers multiple advantages over existing tools. First, LitSuggest allows users to curate, organize, and download classification results in a single interface. Second, users can easily fine-tune LitSuggest results by updating the training corpus. Third, results can be readily shared, enabling collaborative analysis and curation of scientific literature. Finally, LitSuggest provides an automated personalized weekly digest of newly published articles for each user's project. LitSuggest is publicly available at https://www.ncbi.nlm.nih.gov/research/litsuggest.


Subject(s)
Publications , Software , COVID-19 , Data Curation , Healthcare Disparities , Humans , Internet , Liver Neoplasms/epidemiology , Machine Learning
10.
Nucleic Acids Res ; 47(W1): W587-W593, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31114887

ABSTRACT

PubTator Central (https://www.ncbi.nlm.nih.gov/research/pubtator/) is a web service for viewing and retrieving bioconcept annotations in full text biomedical articles. PubTator Central (PTC) provides automated annotations from state-of-the-art text mining systems for genes/proteins, genetic variants, diseases, chemicals, species and cell lines, all available for immediate download. PTC annotates PubMed (29 million abstracts) and the PMC Text Mining subset (3 million full text articles). The new PTC web interface allows users to build full text document collections and visualize concept annotations in each document. Annotations are downloadable in multiple formats (XML, JSON and tab delimited) via the online interface, a RESTful web service and bulk FTP. Improved concept identification systems and a new disambiguation module based on deep learning increase annotation accuracy, and the new server-side architecture is significantly faster. PTC is synchronized with PubMed and PubMed Central, with new articles added daily. The original PubTator service has served annotated abstracts for ∼300 million requests, enabling third-party research in use cases such as biocuration support, gene prioritization, genetic disease analysis, and literature-based knowledge discovery. We demonstrate the full text results in PTC significantly increase biomedical concept coverage and anticipate this expansion will both enhance existing downstream applications and enable new use cases.


Subject(s)
Data Mining/methods , Software , Cell Line , Data Curation , Disease , Genes , Genetic Variation , Humans , Proteins , PubMed , User-Computer Interface
11.
Nucleic Acids Res ; 47(W1): W594-W599, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31020319

ABSTRACT

Literature search is a routine practice for scientific studies as new discoveries build on knowledge from the past. Current tools (e.g. PubMed, PubMed Central), however, generally require significant effort in query formulation and optimization (especially in searching the full-length articles) and do not allow direct retrieval of specific statements, which is key for tasks such as comparing/validating new findings with previous knowledge and performing evidence attribution in biocuration. Thus, we introduce LitSense, which is the first web-based system that specializes in sentence retrieval for biomedical literature. LitSense provides unified access to PubMed and PMC content with over a half-billion sentences in total. Given a query, LitSense returns best-matching sentences using both a traditional term-weighting approach that up-weights sentences that contain more of the rare terms in the user query as well as a novel neural embedding approach that enables the retrieval of semantically relevant results without explicit keyword match. LitSense provides a user-friendly interface that assists its users to quickly browse the returned sentences in context and/or further filter search results by section or publication date. LitSense also employs PubTator to highlight biomedical entities (e.g. gene/proteins) in the sentences for better result visualization. LitSense is freely available at https://www.ncbi.nlm.nih.gov/research/litsense.


Subject(s)
Data Mining/methods , Software , Abstracting and Indexing , PubMed , Publications
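The term-weighting idea mentioned in the LitSense abstract (up-weighting sentences that contain more of the rare query terms) can be illustrated with a small inverse-document-frequency (IDF) ranker. This is only a sketch under stated assumptions: the smoothed IDF formula, the tokenizer, the function names and the three example sentences are all illustrative choices, not LitSense's actual implementation, and the neural embedding component is not covered.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(sentences):
    """Smoothed IDF per term: rarer terms receive larger weights."""
    n = len(sentences)
    doc_freq = Counter()
    for sentence in sentences:
        doc_freq.update(set(tokenize(sentence)))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in doc_freq.items()}


def score(query, sentence, idf):
    """Sum the IDF weights of the query terms that occur in the sentence."""
    terms = set(tokenize(sentence))
    return sum(idf.get(t, 0.0) for t in set(tokenize(query)) if t in terms)


def rank(query, sentences):
    """Return the sentences ordered from best to worst match."""
    idf = idf_table(sentences)
    return sorted(sentences, key=lambda s: score(query, s, idf), reverse=True)


# Hypothetical example sentences, invented for this sketch.
SENTENCES = [
    "BRCA1 mutations increase the risk of breast cancer.",
    "The risk of infection rises with age.",
    "The study measured the risk of relapse.",
]

if __name__ == "__main__":
    for sentence in rank("BRCA1 risk", SENTENCES):
        print(sentence)
```

Because "risk" occurs in every example sentence while "BRCA1" occurs in only one, the rare term dominates the score and the first sentence is ranked highest.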
12.
Nucleic Acids Res ; 46(W1): W530-W536, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29762787

ABSTRACT

The identification and interpretation of genomic variants play a key role in the diagnosis of genetic diseases and related research. These tasks increasingly rely on accessing relevant manually curated information from domain databases (e.g. SwissProt or ClinVar). However, due to the sheer volume of medical literature and the high cost of expert curation, curated variant information in existing databases is often incomplete and out-of-date. In addition, the same genetic variant can be mentioned in publications with various names (e.g. 'A146T' versus 'c.436G>A' versus 'rs121913527'). A search in PubMed using only one name usually cannot retrieve all relevant articles for the variant of interest. Hence, to help scientists, healthcare professionals, and database curators find the most up-to-date published variant research, we have developed LitVar for the search and retrieval of standardized variant information. In addition, LitVar uses advanced text mining techniques to compute and extract relationships between variants and other associated entities such as diseases and chemicals/drugs. LitVar is publicly available at https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar.


Subject(s)
Data Curation/methods , Data Mining/methods , Single Nucleotide Polymorphism , Search Engine , User-Computer Interface , Medical Genetics , Human Genome , Genomics/methods , Humans , Internet , PubMed , Semantics
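The naming problem described in the LitVar abstract (one variant written as 'A146T', 'c.436G>A' or 'rs121913527') can be made concrete with a small sketch that classifies a mention by its surface form and maps it to a shared key. The regular expressions cover only these three simple forms, the choice of the rsID as the shared key is an assumption, and the lookup table is a hypothetical stand-in for a real normalization step; none of this reflects how LitVar itself is implemented.

```python
import re

# Simplified surface patterns, checked in order; real variant nomenclature is far richer.
PATTERNS = [
    ("rsid", re.compile(r"^rs\d+$")),
    ("cdna", re.compile(r"^c\.\d+[ACGT]>[ACGT]$")),
    ("protein", re.compile(r"^(?:p\.)?[A-Z]\d+[A-Z]$")),
]

# Hypothetical lookup table: the three names are the ones given in the abstract,
# grouped here under the rsID as an illustrative shared key.
SYNONYMS = {
    "A146T": "rs121913527",
    "c.436G>A": "rs121913527",
    "rs121913527": "rs121913527",
}


def mention_type(mention):
    """Classify a variant mention by its surface form."""
    for name, pattern in PATTERNS:
        if pattern.match(mention):
            return name
    return "unknown"


def normalize(mention):
    """Map a mention to its shared key, or None when it is not in the table."""
    return SYNONYMS.get(mention)


if __name__ == "__main__":
    for m in ("A146T", "c.436G>A", "rs121913527"):
        print(m, mention_type(m), normalize(m))
```

Searching by the shared key rather than by a single surface form is what lets one query reach articles that use any of the three names.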
13.
Nucleic Acids Res ; 46(D1): D802-D808, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29092050

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.


Subject(s)
Archaea/genetics , Bacteria/genetics , Genetic Databases , Protein Databases , Eukaryota/genetics , Genomics , Amino Acid Sequence , Animals , Base Sequence , Data Mining , Forecasting , Genome , Molecular Sequence Annotation , RNA/genetics , User-Computer Interface
14.
Mol Biol Evol ; 34(8): 2016-2034, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28460059

ABSTRACT

Cilia (flagella) are important eukaryotic organelles, present in the Last Eukaryotic Common Ancestor, and are involved in cell motility and integration of extracellular signals. Ciliary dysfunction causes a class of genetic diseases known as ciliopathies; however, current knowledge of the underlying mechanisms is still limited and a better characterization of genes is needed. As cilia have been lost independently several times during evolution and they are subject to important functional variation between species, ciliary genes can be investigated through comparative genomics. We performed phylogenetic profiling by predicting orthologs of human protein-coding genes in 100 eukaryotic species. The analysis integrated three independent methods to predict a consensus set of 274 ciliary genes, including 87 new promising candidates. A fine-grained analysis of the phylogenetic profiles allowed a partitioning of ciliary genes into modules with distinct evolutionary histories and ciliary functions (assembly, movement, centriole, etc.) and thus propagation of potential annotations to previously undocumented genes. The cilia/basal body localization was experimentally confirmed for five of these previously unannotated proteins (LRRC23, LRRC34, TEX9, WDR27, and BIVM), validating the relevance of our approach. Furthermore, our multi-level analysis sheds light on the core gene sets retained in gamete-only flagellates or Ecdysozoa, for instance. By combining gene-centric and species-oriented analyses, this work reveals new ciliary and ciliopathy gene candidates and provides clues about the evolution of ciliary processes in the eukaryotic domain. Additionally, the positive and negative reference gene sets and the phylogenetic profile of human genes constructed during this study can be exploited in future work.


Subject(s)
Cilia/genetics , Ciliopathies/genetics , Animals , Cell Movement/genetics , Cilia/metabolism , Ciliopathies/metabolism , Nucleic Acid Databases , Eukaryota , Eukaryotic Cells , Molecular Evolution , Flagella/genetics , Flagella/metabolism , Genomics , Humans , Phylogeny , DNA Sequence Analysis/methods
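Phylogenetic profiling, as used in the cilia study above, represents each gene as a presence/absence vector of its orthologs across species and looks for genes whose vectors resemble those of known ciliary genes. The sketch below illustrates only that general idea with a Jaccard similarity; the study itself integrated three independent methods that are not described in the abstract, and the gene labels and five-species profiles here are invented toy data.

```python
def jaccard(p, q):
    """Jaccard similarity of two presence/absence profiles of equal length."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


# Toy profiles over five hypothetical species (1 = ortholog present, 0 = absent).
PROFILES = {
    "known_ciliary": (1, 1, 0, 1, 0),
    "candidate_1": (1, 1, 0, 1, 0),  # lost in the same species as the reference
    "candidate_2": (1, 1, 1, 1, 1),  # present everywhere, so less informative
    "candidate_3": (0, 0, 1, 0, 1),  # complementary pattern
}


def rank_candidates(reference, profiles):
    """Rank all other genes by profile similarity to the reference gene."""
    ref = profiles[reference]
    scored = [(g, jaccard(ref, p)) for g, p in profiles.items() if g != reference]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for gene, similarity in rank_candidates("known_ciliary", PROFILES):
        print(gene, round(similarity, 2))
```

A gene that is lost in exactly the species that lost the reference gene scores highest, which is the intuition behind using independent losses of cilia as a signal.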
15.
J Med Internet Res ; 19(6): e212, 2017 06 16.
Article in English | MEDLINE | ID: mdl-28623182

ABSTRACT

BACKGROUND: The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction. OBJECTIVE: MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data. METHODS: MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization. RESULTS: MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user's specific interests and provides an efficient way to share information with collaborators. Furthermore, the user's behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents. CONCLUSIONS: We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data. 
MyGeneFriends is available at lbgi.fr/mygenefriends.


Subject(s)
Inborn Genetic Diseases/genetics , Genetic Testing/methods , Social Networking , Telemedicine/statistics & numerical data , Friends , Humans , Research Personnel
16.
Bioinformatics ; 31(3): 447-8, 2015 Feb 01.
Article in English | MEDLINE | ID: mdl-25273105

ABSTRACT

SUMMARY: We previously developed OrthoInspector, a package incorporating an original algorithm for the detection of orthology and inparalogy relations between different species. We have added new functionalities to the package. While its original algorithm was not modified, performing similar orthology predictions, we facilitated the prediction of very large databases (thousands of proteomes), refurbished its graphical interface, added new visualization tools for comparative genomics/protein family analysis and facilitated its deployment in a network environment. Finally, we have released three online databases of precomputed orthology relationships. AVAILABILITY: Package and databases are freely available at http://lbgi.fr/orthoinspector with all major browsers supported. CONTACT: odile.lecompte@unistra.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computer Graphics , Factual Databases , Proteomics/methods , Protein Sequence Analysis/methods , Software , Humans , Molecular Sequence Annotation , Phylogeny
17.
Bioinformatics ; 29(20): 2643-4, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-23929031

ABSTRACT

SUMMARY: We present PARSEC (PAtteRn Search and Contextualization), a new open source platform for guided discovery, allowing localization and biological characterization of short genomic sites in entire eukaryotic genomes. PARSEC can search for a sequence or a degenerate pattern. The retrieved set of genomic sites can be characterized in terms of (i) conservation in model organisms, (ii) genomic context (proximity to genes) and (iii) function of neighboring genes. These modules allow the user to explore, visualize, filter and extract biological knowledge from a set of short genomic regions such as transcription factor binding sites. AVAILABILITY: Web site implemented in Java, JavaScript and C++, with all major browsers supported. Freely available at lbgi.fr/parsec. Source code is freely available at sourceforge.net/projects/genomicparsec.


Subject(s)
Genomics/methods , Algorithms , Genome , Humans , Internet , Nonlinear Dynamics , Programming Languages , Software
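Searching a genome for a degenerate pattern, as the PARSEC abstract describes, is commonly done by expanding IUPAC nucleotide ambiguity codes into character classes. The sketch below shows that general technique in Python; it is not PARSEC's code (which the abstract says is written in Java, JavaScript and C++), the function names are illustrative, the example sequence is invented, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_regex(pattern):
    """Expand a degenerate IUPAC pattern into a regular expression string."""
    parts = []
    for base in pattern.upper():
        allowed = IUPAC[base]  # raises KeyError on a non-IUPAC symbol
        parts.append(allowed if len(allowed) == 1 else "[" + allowed + "]")
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (start, matched_site) pairs, including overlapping matches."""
    # The lookahead makes each match zero-width, so overlapping sites are reported.
    regex = re.compile("(?=(" + to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Invented example sequence containing two sites matching TGANTCA.
    print(find_sites("TGANTCA", "ccTGACTCAggTGAGTCA"))
```

A fuller search would also scan the reverse complement; that step is left out to keep the sketch short.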
18.
ArXiv ; 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38410657

ABSTRACT

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases, and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

19.
Patterns (N Y) ; 4(1): 100659, 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36471749

ABSTRACT

A significant percentage of COVID-19 survivors experience ongoing multisystemic symptoms that often affect daily living, a condition known as Long Covid or post-acute-sequelae of SARS-CoV-2 infection. However, identifying scientific articles relevant to Long Covid is challenging since there is no standardized or consensus terminology. We developed an iterative human-in-the-loop machine learning framework combining data programming with active learning into a robust ensemble model, demonstrating higher specificity and considerably higher sensitivity than other methods. Analysis of the Long Covid Collection shows that (1) most Long Covid articles do not refer to Long Covid by any name, (2) when the condition is named, the name used most frequently in the literature is Long Covid, and (3) Long Covid is associated with disorders in a wide variety of body systems. The Long Covid Collection is updated weekly and is searchable online at the LitCovid portal: https://www.ncbi.nlm.nih.gov/research/coronavirus/docsum?filters=e_condition.LongCovid.

20.
IEEE/ACM Trans Comput Biol Bioinform ; 19(5): 2584-2595, 2022.
Article in English | MEDLINE | ID: mdl-35536809

ABSTRACT

The rapid growth of biomedical literature poses a significant challenge for curation and interpretation. This has become more evident during the COVID-19 pandemic. LitCovid, a literature database of COVID-19 related papers in PubMed, has accumulated over 200,000 articles with millions of accesses. Approximately 10,000 new articles are added to LitCovid every month. A main curation task in LitCovid is topic annotation, where an article is assigned up to eight topics, e.g., Treatment and Diagnosis. The annotated topics have been widely used both in LitCovid (e.g., accounting for ∼18% of total uses) and in downstream studies such as network generation. However, topic annotation has been a primary curation bottleneck due to the nature of the task and the rapid literature growth. This study proposes LITMC-BERT, a transformer-based multi-label classification method for biomedical literature. It uses a shared transformer backbone for all the labels while also capturing label-specific features and the correlations between label pairs. We compare LITMC-BERT with three baseline models on two datasets. Its micro-F1 and instance-based F1 are 5% and 4% higher than the current best results, respectively, and it requires only ∼18% of the inference time of the Binary BERT baseline. The related datasets and models are available via https://github.com/ncbi/ml-transformer.


Subject(s)
COVID-19 , Data Mining , Data Mining/methods , Factual Databases , Humans , Pandemics , Publications
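The LITMC-BERT abstract reports two multi-label metrics, micro-F1 and instance-based F1. The sketch below shows one common way to compute them, assuming micro-F1 pools true and false positives over all articles and instance-based F1 averages a per-article F1; the paper's exact definitions may differ. The topic names beyond Treatment and Diagnosis, and the gold and predicted label sets, are invented for illustration.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold, pred):
    """Pool the counts over all articles, then compute a single F1."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    return f1(tp, fp, fn)


def instance_f1(gold, pred):
    """Compute an F1 per article, then average over the articles."""
    scores = [f1(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)]
    return sum(scores) / len(scores) if scores else 0.0


# Invented example: two articles with gold and predicted topic sets.
GOLD = [{"Treatment", "Diagnosis", "Prevention"}, {"Mechanism"}]
PRED = [{"Treatment", "Diagnosis", "Prevention"}, {"Treatment"}]

if __name__ == "__main__":
    print("micro-F1:", micro_f1(GOLD, PRED))
    print("instance-based F1:", instance_f1(GOLD, PRED))
```

In this toy case the pooled counts give a micro-F1 of 0.75, while the per-article average is 0.5 because the second article is entirely wrong, which shows why the two metrics are reported separately.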