ABSTRACT
Ependymal cells form a specialized brain-cerebrospinal fluid (CSF) interface and regulate local CSF microcirculation. It is becoming increasingly recognized that ependymal cells assume a reactive state in response to aging and disease, including conditions involving hypoxia, hydrocephalus, neurodegeneration, and neuroinflammation. Yet what transcriptional signatures govern these reactive states and whether this reactivity shares any similarities with classical descriptions of glial reactivity (i.e., in astrocytes) remain largely unexplored. Using single-cell transcriptomics, we interrogated this phenomenon by directly comparing the reactive ependymal cell transcriptome to the reactive astrocyte transcriptome using a well-established model of autoimmune-mediated neuroinflammation (MOG35-55 EAE). In doing so, we unveiled core glial reactivity-associated genes that defined the reactive ependymal cell and astrocyte response to MOG35-55 EAE. Interestingly, known reactive astrocyte genes from other CNS injury/disease contexts were also up-regulated by MOG35-55 EAE ependymal cells, suggesting that this state may be conserved in response to a variety of pathologies. We were also able to recapitulate features of the reactive ependymal cell state acutely using a classic neuroinflammatory cocktail (IFNγ/LPS) both in vitro and in vivo. Taken together, by comparing reactive ependymal cells and astrocytes, we identified a conserved signature underlying glial reactivity that was present in several neuroinflammatory contexts. Future work will explore the mechanisms driving ependymal reactivity and assess downstream functional consequences.
Subject(s)
Astrocytes , Encephalomyelitis, Autoimmune, Experimental , Ependyma , Mice, Inbred C57BL , Animals , Astrocytes/metabolism , Astrocytes/pathology , Ependyma/metabolism , Ependyma/pathology , Mice , Encephalomyelitis, Autoimmune, Experimental/pathology , Encephalomyelitis, Autoimmune, Experimental/metabolism , Encephalomyelitis, Autoimmune, Experimental/immunology , Female , Neuroinflammatory Diseases/pathology , Transcriptome
ABSTRACT
PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.
Subject(s)
PubMed , Artificial Intelligence , Humans , Software , Data Mining/methods , Semantics , Internet
ABSTRACT
The enteric nervous system (ENS) comprises a complex network of neurons, a subset of which appears to be dopaminergic, although their characteristics, roles, and implications in disease are less well understood. Most investigations of enteric dopamine (DA) neurons rely on immunoreactivity to tyrosine hydroxylase (TH), the rate-limiting enzyme in the production of DA. However, TH immunoreactivity is likely to provide an incomplete picture. This study provides a comprehensive characterization of DA neurons in the gut using a reporter mouse line expressing a fluorescent protein (tdTomato) under the control of the DA transporter (DAT) promoter. Our findings confirm a unique localization of DA neurons in the gut and unveil discrete subtypes of DA neurons in this organ, which we characterized using both immunofluorescence and single-cell transcriptomics and validated using in situ hybridization. We observed distinct subtypes of DAT-tdTomato neurons expressing co-transmitters and modulators across both plexuses; some likely co-release acetylcholine, while others were positive for a slew of canonical DAergic markers (TH, VMAT2 and GIRK2). Interestingly, we uncovered a seemingly novel population of DA neurons unique to the ENS that was ChAT/DAT-tdTomato-immunoreactive and expressed Grp, Calcb, and Sst. Given the clear heterogeneity of DAergic gut neurons, further investigation is warranted to define their functional signatures and decipher their implication in disease.
Subject(s)
Dopamine Plasma Membrane Transport Proteins , Dopaminergic Neurons , Enteric Nervous System , Animals , Mice , Dopamine/metabolism , Dopamine Plasma Membrane Transport Proteins/metabolism , Dopamine Plasma Membrane Transport Proteins/genetics , Dopaminergic Neurons/metabolism , Enteric Nervous System/metabolism , Enteric Nervous System/cytology , Luminescent Proteins/metabolism , Luminescent Proteins/genetics , Mice, Transgenic , Tyrosine 3-Monooxygenase/metabolism , Vesicular Monoamine Transport Proteins/metabolism , Vesicular Monoamine Transport Proteins/genetics , Genes, Reporter
ABSTRACT
A significant percentage of COVID-19 survivors experience ongoing multisystemic symptoms that often affect daily living, a condition known as Long Covid or post-acute sequelae of SARS-CoV-2 infection. However, identifying scientific articles relevant to Long Covid is challenging since there is no standardized or consensus terminology. We developed an iterative human-in-the-loop machine learning framework combining data programming with active learning into a robust ensemble model, demonstrating higher specificity and considerably higher sensitivity than other methods. Analysis of the Long Covid Collection shows that (1) most Long Covid articles do not refer to Long Covid by any name, (2) when the condition is named, the name used most frequently in the literature is Long Covid, and (3) Long Covid is associated with disorders in a wide variety of body systems. The Long Covid Collection is updated weekly and is searchable online at the LitCovid portal: https://www.ncbi.nlm.nih.gov/research/coronavirus/docsum?filters=e_condition.LongCovid.
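The framework above combines data programming with active learning. As a rough illustration of the data-programming half only, the sketch below applies a few hypothetical keyword labeling functions to an article's text and combines their votes by simple majority. Real data programming instead learns how much to trust each labeling function, and the active-learning loop and ensemble are omitted here. The function names and keyword rules are invented for illustration and are not the authors' rules.

```python
# Hypothetical keyword labeling functions combined by majority vote.
# NOTE: simplified stand-in; data programming proper learns labeling-function
# accuracies with a label model, and active learning is omitted entirely.
from typing import Callable, List

LONG_COVID, NOT_LONG_COVID, ABSTAIN = 1, 0, -1


def lf_named(text: str) -> int:
    # Fires when the condition is named explicitly (hypothetical heuristic).
    if "long covid" in text or "post-acute sequelae" in text:
        return LONG_COVID
    return ABSTAIN


def lf_persistent(text: str) -> int:
    # Catches articles that describe the condition without naming it.
    if "persistent symptoms" in text and "covid" in text:
        return LONG_COVID
    return ABSTAIN


def lf_acute_only(text: str) -> int:
    # Acute-phase studies without follow-up are treated as weak negatives.
    if "acute phase" in text and "follow-up" not in text:
        return NOT_LONG_COVID
    return ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], int]] = [lf_named, lf_persistent, lf_acute_only]


def majority_vote(text: str) -> int:
    """Return the majority non-abstaining vote, or ABSTAIN on no votes or a tie."""
    lowered = text.lower()
    votes = [lf(lowered) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    positives = votes.count(LONG_COVID)
    negatives = len(votes) - positives
    if positives == negatives:  # also covers the no-vote case (0 == 0)
        return ABSTAIN
    return LONG_COVID if positives > negatives else NOT_LONG_COVID


print(majority_vote("Persistent symptoms six months after COVID infection"))  # 1
```

The second labeling function mirrors the finding above that most relevant articles never name the condition, which is why purely name-based retrieval falls short.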
ABSTRACT
LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), first launched in February 2020, is a first-of-its-kind literature hub for tracking up-to-date published research on COVID-19. The number of articles in LitCovid has increased from 55 000 to ~300 000 over the past 2.5 years, with a consistent growth rate of ~10 000 articles per month. In addition to the rapid literature growth, the COVID-19 pandemic has evolved dramatically. For instance, the Omicron variant now accounts for over 98% of new infections in the United States. In response to the continuing evolution of the COVID-19 pandemic, this article describes significant updates to LitCovid over the last 2 years. First, we introduced the Long Covid collection, consisting of articles on COVID-19 survivors experiencing ongoing multisystemic symptoms, including respiratory issues, cardiovascular disease, cognitive impairment, and profound fatigue. Second, we provided new annotations on the latest COVID-19 strains and vaccines mentioned in the literature. Third, we improved several existing features with more accurate machine learning algorithms for annotating topics and classifying articles relevant to COVID-19. LitCovid has been widely used, with millions of accesses by users worldwide for various information needs, and continues to play a critical role in collecting, curating and standardizing the latest knowledge on the COVID-19 literature.
Subject(s)
COVID-19 , Databases, Bibliographic , Humans , COVID-19/epidemiology , Pandemics , Post-Acute COVID-19 Syndrome , SARS-CoV-2 , United States
ABSTRACT
The coronavirus disease 2019 (COVID-19) pandemic has been severely impacting global society since December 2019. Related findings, such as vaccine and drug development, have been reported in the biomedical literature at a rate of about 10 000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200 000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g. Diagnosis and Treatment) to the articles in LitCovid. The annotated topics have been widely used for navigating the COVID literature, rapidly locating articles of interest and other downstream studies. However, annotating the topics has been the bottleneck of manual curation. Despite the continuing advances in biomedical text-mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30 000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multi-label classification datasets in biomedical scientific literature. Nineteen teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181 and 0.9394 for macro-F1-score, micro-F1-score and instance-based F1-score, respectively. Notably, these scores are substantially higher (e.g. 12% higher for macro-F1-score) than the corresponding scores of the state-of-the-art multi-label classification method. The level of participation and the results demonstrate a successful track and help close the gap between dataset curation and method development.
The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/.
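The three scores reported for the track weigh errors differently, which is why they can diverge on the same predictions. The sketch below assumes the common definitions: macro-F1 as the unweighted mean of per-topic F1, micro-F1 from true/false positive and false negative counts pooled over all topics, and instance-based F1 as the mean of per-article F1. It is not the track's official evaluation script; some evaluations compute macro-F1 from macro-averaged precision and recall instead. The topic labels in the toy example are illustrative only.

```python
from typing import Dict, List, Set


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from true/false positive and false negative counts (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def multilabel_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> Dict[str, float]:
    counts = {label: [0, 0, 0] for label in labels}  # per-topic [tp, fp, fn]
    per_article = []
    for g, p in zip(gold, pred):
        for label in labels:
            if label in g and label in p:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
        overlap = len(g & p)
        # Instance-based: score each article's predicted topic set against its gold set.
        per_article.append(f1_from_counts(overlap, len(p) - overlap, len(g) - overlap))

    # Macro: unweighted mean of per-topic F1, so rare topics count as much as common ones.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(labels)
    # Micro: pool counts over all topics first, so frequent topics dominate.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1_from_counts(tp, fp, fn)
    instance = sum(per_article) / len(per_article)
    return {"macro": macro, "micro": micro, "instance": instance}


# Toy example: three articles, three topics (illustrative labels only).
gold = [{"Treatment"}, {"Diagnosis", "Treatment"}, {"Prevention"}]
pred = [{"Treatment"}, {"Diagnosis"}, {"Treatment"}]
scores = multilabel_f1(gold, pred, ["Treatment", "Diagnosis", "Prevention"])
# macro = 0.5, micro = 4/7 (about 0.571), instance = 5/9 (about 0.556)
```

In the toy example, the never-predicted topic drags macro-F1 down the most, while micro-F1 and instance-based F1 are less sensitive to it.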
Subject(s)
COVID-19 , COVID-19/epidemiology , Data Mining/methods , Databases, Factual , Humans , PubMed , Publications
ABSTRACT
MOTIVATION: Previous studies have shown that automated text-mining tools are becoming increasingly important for successfully unlocking variant information in scientific literature at large scale. Despite multiple attempts in the past, existing tools are still of limited recognition scope and precision. RESULT: We propose tmVar 3.0: an improved variant recognition and normalization system. Compared to its predecessors, tmVar 3.0 recognizes a wider spectrum of variant-related entities (e.g. allele and copy number variants), and groups together different variant mentions belonging to the same genomic sequence position in an article for improved accuracy. Moreover, tmVar 3.0 provides advanced variant normalization options such as allele-specific identifiers from the ClinGen Allele Registry. tmVar 3.0 exhibits state-of-the-art performance with over 90% in F-measure for variant recognition and normalization, when evaluated on three independent benchmarking datasets. tmVar 3.0 as well as annotations for the entire PubMed and PMC datasets are freely available for download. AVAILABILITY AND IMPLEMENTATION: https://github.com/ncbi/tmVar3.
Subject(s)
Data Mining , Publications , PubMed , Genomics
ABSTRACT
The rapid growth of biomedical literature poses a significant challenge for curation and interpretation. This has become more evident during the COVID-19 pandemic. LitCovid, a literature database of COVID-19-related papers in PubMed, has accumulated over 200,000 articles with millions of accesses. Approximately 10,000 new articles are added to LitCovid every month. A main curation task in LitCovid is topic annotation, where an article is assigned up to eight topics, e.g., Treatment and Diagnosis. The annotated topics have been widely used both in LitCovid (e.g., accounting for ~18% of total uses) and in downstream studies such as network generation. However, topic annotation has been a primary curation bottleneck due to the nature of the task and the rapid literature growth. This study proposes LITMC-BERT, a transformer-based multi-label classification method for biomedical literature. It uses a shared transformer backbone for all the labels while also capturing label-specific features and the correlations between label pairs. We compare LITMC-BERT with three baseline models on two datasets. Its micro-F1 and instance-based F1 are 5% and 4% higher than the current best results, respectively, and it requires only ~18% of the inference time of the Binary BERT baseline. The related datasets and models are available via https://github.com/ncbi/ml-transformer.
Subject(s)
COVID-19 , Data Mining , Data Mining/methods , Databases, Factual , Humans , Pandemics , Publications
ABSTRACT
The COVID-19 (coronavirus disease 2019) pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for both researchers and the public. Natural language processing (NLP), the branch of artificial intelligence that interprets human language, can be applied to address many of the information needs made urgent by the COVID-19 pandemic. This review surveys approximately 150 NLP studies and more than 50 systems and datasets addressing the COVID-19 pandemic. We detail work on four core NLP tasks: information retrieval, named entity recognition, literature-based discovery, and question answering. We also describe work that directly addresses aspects of the pandemic through four additional tasks: topic modeling, sentiment and emotion analysis, caseload forecasting, and misinformation detection. We conclude by discussing observable trends and remaining challenges.
Subject(s)
COVID-19/epidemiology , Information Storage and Retrieval/methods , Natural Language Processing , Communication , Data Mining/methods , Datasets as Topic , Emotions , Humans , Knowledge Discovery , Pandemics , Periodicals as Topic , Software
ABSTRACT
Searching and reading relevant literature is a routine practice in biomedical research. However, it is challenging for a user to design optimal search queries using all the keywords related to a given topic. As such, existing search systems such as PubMed often return suboptimal results. Several computational methods have been proposed as an effective alternative to keyword-based query methods for literature recommendation. However, those methods require specialized knowledge in machine learning and natural language processing, which can make them difficult for biologists to utilize. In this paper, we propose LitSuggest, a web server that provides an all-in-one literature recommendation and curation service to help biomedical researchers stay up to date with scientific literature. LitSuggest combines advanced machine learning techniques for suggesting relevant PubMed articles with high accuracy. In addition to innovative text-processing methods, LitSuggest offers multiple advantages over existing tools. First, LitSuggest allows users to curate, organize, and download classification results in a single interface. Second, users can easily fine-tune LitSuggest results by updating the training corpus. Third, results can be readily shared, enabling collaborative analysis and curation of scientific literature. Finally, LitSuggest provides an automated personalized weekly digest of newly published articles for each user's project. LitSuggest is publicly available at https://www.ncbi.nlm.nih.gov/research/litsuggest.
Subject(s)
Publications , Software , COVID-19 , Data Curation , Healthcare Disparities , Humans , Internet , Liver Neoplasms/epidemiology , Machine Learning
ABSTRACT
OBJECTIVE: Reticular pseudodrusen (RPD), a key feature of age-related macular degeneration (AMD), are poorly detected by human experts on standard color fundus photography (CFP) and typically require advanced imaging modalities such as fundus autofluorescence (FAF). The objective was to develop and evaluate the performance of a novel multimodal, multitask, multiattention (M3) deep learning framework on RPD detection. MATERIALS AND METHODS: A deep learning framework (M3) was developed to detect RPD presence accurately using CFP alone, FAF alone, or both, employing >8000 CFP-FAF image pairs obtained prospectively (Age-Related Eye Disease Study 2). The M3 framework includes multimodal (detection from single or multiple image modalities), multitask (training different tasks simultaneously to improve generalizability), and multiattention (improving ensembled feature representation) operation. Performance on RPD detection was compared with state-of-the-art deep learning models and 13 ophthalmologists; performance on detection of 2 other AMD features (geographic atrophy and pigmentary abnormalities) was also evaluated. RESULTS: For RPD detection, M3 achieved an area under the receiver-operating characteristic curve (AUROC) of 0.832, 0.931, and 0.933 for CFP alone, FAF alone, and both, respectively. M3 performance on CFP was very substantially superior to human retinal specialists (median F1 score = 0.644 vs 0.350). External validation (the Rotterdam Study) demonstrated high accuracy on CFP alone (AUROC, 0.965). The M3 framework also accurately detected geographic atrophy and pigmentary abnormalities (AUROC, 0.909 and 0.912, respectively), demonstrating its generalizability. CONCLUSIONS: This study demonstrates the successful development, robust evaluation, and external validation of a novel deep learning framework that enables accessible, accurate, and automated AMD diagnosis and prognosis.
Subject(s)
Deep Learning , Diagnosis, Computer-Assisted , Retinal Drusen/diagnosis , Aged , Computer Simulation , Datasets as Topic , Female , Fundus Oculi , Humans , Macular Degeneration/diagnosis , Male
ABSTRACT
Since the outbreak of the current pandemic in 2020, there has been a rapid growth of published articles on COVID-19 and SARS-CoV-2, with about 10,000 new articles added each month. This is causing an increasingly serious information overload, making it difficult for scientists, healthcare professionals and the general public to remain up to date on the latest SARS-CoV-2 and COVID-19 research. Hence, we developed LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), a curated literature hub, to track up-to-date scientific information in PubMed. LitCovid is updated daily with newly identified relevant articles organized into curated categories. To support manual curation, advanced machine-learning and deep-learning algorithms have been developed, evaluated and integrated into the curation workflow. To the best of our knowledge, LitCovid is the first-of-its-kind COVID-19-specific literature resource, with all of its collected articles and curated data freely available. Since its release, LitCovid has been widely used, with millions of accesses by users worldwide for various information needs, such as evidence synthesis, drug discovery and text and data mining, among others.
Subject(s)
COVID-19/prevention & control , Data Curation/statistics & numerical data , Data Mining/statistics & numerical data , Databases, Factual , PubMed/statistics & numerical data , SARS-CoV-2/isolation & purification , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Data Mining/methods , Humans , Internet , Machine Learning , Pandemics , Publications/statistics & numerical data , SARS-CoV-2/physiology
ABSTRACT
By 2040, age-related macular degeneration (AMD) will affect ~288 million people worldwide. Identifying individuals at high risk of progression to late AMD, the sight-threatening stage, is critical for clinical actions, including medical interventions and timely monitoring. Although deep learning has shown promise in diagnosing/screening AMD using color fundus photographs, it remains difficult to predict individuals' risks of late AMD accurately. For both tasks, these initial deep learning attempts have remained largely unvalidated in independent cohorts. Here, we demonstrate how deep learning and survival analysis can predict the probability of progression to late AMD using 3298 participants (over 80,000 images) from the Age-Related Eye Disease Studies AREDS and AREDS2, the largest longitudinal clinical trials in AMD. When validated against an independent test data set of 601 participants, our model achieved high prognostic accuracy (5-year C-statistic 86.4 (95% confidence interval 86.2-86.6)) that substantially exceeded that of retinal specialists using two existing clinical standards (81.3 (81.1-81.5) and 82.0 (81.8-82.3), respectively). Interestingly, our approach offers additional strengths over the existing clinical standards in AMD prognosis (e.g., risk ascertainment above 50%) and is likely to be highly generalizable, given the breadth of training data from 82 US retinal specialty clinics. Indeed, during external validation through training on AREDS and testing on AREDS2 as an independent cohort, our model retained substantially higher prognostic accuracy than existing clinical standards. These results highlight the potential of deep learning systems to enhance clinical decision-making in AMD patients.
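The C-statistic (concordance index) reported above measures how often a model orders pairs of participants correctly by predicted risk of progression. A minimal sketch of Harrell's C for right-censored follow-up follows; it assumes higher scores mean higher risk, skips tied follow-up times for simplicity, and, unlike the 5-year statistic in the abstract, applies no time horizon. The abstract reports C on a 0 to 100 scale, so a value of 0.864 here would correspond to 86.4. The toy data are invented.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]; it is concordant when the model assigns i the higher
    risk. Ties in predicted risk count as half; ties in time are skipped
    (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: years to late AMD (or censoring), progression indicator, model risk score.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
risk = [0.9, 0.6, 0.7, 0.1]
c = harrell_c(times, events, risk)  # 4 of 5 comparable pairs concordant -> 0.8
```

Censored participants still contribute as the later member of a pair, which is what lets survival analysis use incomplete follow-up rather than discarding it.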
ABSTRACT
Data-driven research in biomedical science requires structured, computable data. Increasingly, these data are created with support from automated text mining. Text-mining tools have rapidly matured: although not perfect, they now frequently provide outstanding results. We describe 10 straightforward writing tips, and a web tool, PubReCheck, to guide authors in addressing the most common cases that remain difficult for text-mining tools. We anticipate these guides will help authors' work be found more readily and used more widely, ultimately increasing the impact of their work and the overall benefit to both authors and readers. PubReCheck is available at http://www.ncbi.nlm.nih.gov/research/pubrecheck.
Subject(s)
Data Mining , Automation , Internet , Software
ABSTRACT
PubTator Central (https://www.ncbi.nlm.nih.gov/research/pubtator/) is a web service for viewing and retrieving bioconcept annotations in full text biomedical articles. PubTator Central (PTC) provides automated annotations from state-of-the-art text mining systems for genes/proteins, genetic variants, diseases, chemicals, species and cell lines, all available for immediate download. PTC annotates PubMed (29 million abstracts) and the PMC Text Mining subset (3 million full text articles). The new PTC web interface allows users to build full text document collections and visualize concept annotations in each document. Annotations are downloadable in multiple formats (XML, JSON and tab delimited) via the online interface, a RESTful web service and bulk FTP. Improved concept identification systems and a new disambiguation module based on deep learning increase annotation accuracy, and the new server-side architecture is significantly faster. PTC is synchronized with PubMed and PubMed Central, with new articles added daily. The original PubTator service has served annotated abstracts for ~300 million requests, enabling third-party research in use cases such as biocuration support, gene prioritization, genetic disease analysis, and literature-based knowledge discovery. We demonstrate the full text results in PTC significantly increase biomedical concept coverage and anticipate this expansion will both enhance existing downstream applications and enable new use cases.
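For the tab-delimited download option mentioned above, the sketch below parses text in the classic PubTator layout, assumed here to be: `PMID|t|title` and `PMID|a|abstract` lines, then one tab-separated annotation per line (PMID, start offset, end offset, mention, concept type, identifier), with blank lines between documents. This layout and the offset convention are assumptions to check against the service's own documentation, and the sample record, PMID included, is made up.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Title/abstract lines look like "PMID|t|text" and "PMID|a|text" (assumed layout).
TEXT_LINE = re.compile(r"^(\d+)\|([ta])\|(.*)$")


@dataclass
class Annotation:
    start: int  # character offset into title + " " + abstract (assumed convention)
    end: int
    mention: str
    concept_type: str
    identifier: str


@dataclass
class Document:
    pmid: str
    title: str = ""
    abstract: str = ""
    annotations: List[Annotation] = field(default_factory=list)


def parse_pubtator(text: str) -> Dict[str, Document]:
    docs: Dict[str, Document] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate documents
        match = TEXT_LINE.match(line)
        if match:
            pmid, kind, body = match.groups()
            doc = docs.setdefault(pmid, Document(pmid))
            if kind == "t":
                doc.title = body
            else:
                doc.abstract = body
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # e.g. relation lines with fewer columns are skipped here
        pmid, start, end, mention, concept_type = fields[:5]
        identifier = fields[5] if len(fields) > 5 else ""
        docs.setdefault(pmid, Document(pmid)).annotations.append(
            Annotation(int(start), int(end), mention, concept_type, identifier)
        )
    return docs


# Made-up sample record (hypothetical PMID) in the assumed layout.
SAMPLE = (
    "12345678|t|BRCA1 mutations in breast cancer\n"
    "12345678|a|We studied mutation carriers.\n"
    "12345678\t0\t5\tBRCA1\tGene\t672\n"
    "12345678\t19\t32\tbreast cancer\tDisease\tMESH:D001943\n"
)
docs = parse_pubtator(SAMPLE)
```

Keeping offsets alongside the normalized identifier lets downstream code both highlight mentions in context and aggregate by concept.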
Subject(s)
Data Mining/methods , Software , Cell Line , Data Curation , Disease , Genes , Genetic Variation , Humans , Proteins , PubMed , User-Computer Interface
ABSTRACT
Literature search is a routine practice for scientific studies as new discoveries build on knowledge from the past. Current tools (e.g. PubMed, PubMed Central), however, generally require significant effort in query formulation and optimization (especially in searching the full-length articles) and do not allow direct retrieval of specific statements, which is key for tasks such as comparing/validating new findings with previous knowledge and performing evidence attribution in biocuration. Thus, we introduce LitSense, which is the first web-based system that specializes in sentence retrieval for biomedical literature. LitSense provides unified access to PubMed and PMC content with over a half-billion sentences in total. Given a query, LitSense returns best-matching sentences using both a traditional term-weighting approach that up-weights sentences that contain more of the rare terms in the user query and a novel neural embedding approach that enables the retrieval of semantically relevant results without explicit keyword match. LitSense provides a user-friendly interface that helps users quickly browse the returned sentences in context and/or further filter search results by section or publication date. LitSense also employs PubTator to highlight biomedical entities (e.g. gene/proteins) in the sentences for better result visualization. LitSense is freely available at https://www.ncbi.nlm.nih.gov/research/litsense.
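To make the idea of up-weighting rare query terms concrete, the sketch below scores sentences by the summed inverse document frequency (IDF) of the query terms they contain, using the hypothetical weight log(1 + N/df), where N is the number of sentences and df the number of sentences containing the term. It is a simplified stand-in, not LitSense's actual term-weighting function, and it leaves out the neural embedding component entirely.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text.lower())


def rank_sentences(query: str, sentences: List[str]) -> List[Tuple[float, str]]:
    """Score each sentence by the summed IDF of query terms it contains."""
    term_sets = [set(tokenize(s)) for s in sentences]
    n = len(term_sets)
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for terms in term_sets for term in terms)
    query_terms = set(tokenize(query))
    scored = []
    for sentence, terms in zip(sentences, term_sets):
        # Rare terms (low df) get a large log(1 + n/df) weight; common ones a small one.
        score = sum(math.log(1 + n / df[t]) for t in query_terms & terms)
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


sentences = [
    "BRCA1 variants increase cancer risk.",
    "Cancer risk rises with age.",
    "Cancer screening reduces mortality.",
]
ranked = rank_sentences("BRCA1 cancer risk", sentences)
# "BRCA1" occurs in one sentence only, so the first sentence ranks highest.
```

Every sentence mentions "cancer", so that term contributes little; the sentence matching the rare term "BRCA1" wins even though all three share query words.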
Subject(s)
Data Mining/methods , Software , Abstracting and Indexing , PubMed , Publications
ABSTRACT
The identification and interpretation of genomic variants play a key role in the diagnosis of genetic diseases and related research. These tasks increasingly rely on accessing relevant manually curated information from domain databases (e.g. SwissProt or ClinVar). However, due to the sheer volume of medical literature and the high cost of expert curation, curated variant information in existing databases is often incomplete and out of date. In addition, the same genetic variant can be mentioned in publications with various names (e.g. 'A146T' versus 'c.436G>A' versus 'rs121913527'). A search in PubMed using only one name usually cannot retrieve all relevant articles for the variant of interest. Hence, to help scientists, healthcare professionals, and database curators find the most up-to-date published variant research, we have developed LitVar for the search and retrieval of standardized variant information. In addition, LitVar uses advanced text mining techniques to compute and extract relationships between variants and other associated entities such as diseases and chemicals/drugs. LitVar is publicly available at https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar.
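The naming problem described above can be illustrated with a small sketch that recognizes three common surface forms of a variant mention (a protein change such as 'A146T', an HGVS-style cDNA change such as 'c.436G>A', and a dbSNP rsID) and groups mentions that resolve to the same identifier. The regular expressions cover only simple substitutions, and the synonym table is a hand-written toy built from the abstract's own example, not how LitVar performs normalization.

```python
import re
from typing import Dict, List, Optional, Tuple

# Three common surface forms for a single-nucleotide variant mention.
PATTERNS = [
    ("dbSNP rsID", re.compile(r"rs\d+")),
    ("HGVS cDNA substitution", re.compile(r"c\.\d+[ACGT]>[ACGT]")),
    ("protein substitution", re.compile(r"[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]")),
]

# Hand-written toy synonym table, taken from the example names in the abstract.
# A real system would link names through sequence-level normalization instead.
SYNONYMS: Dict[str, str] = {
    "A146T": "rs121913527",
    "c.436G>A": "rs121913527",
    "rs121913527": "rs121913527",
}


def classify(mention: str) -> Optional[str]:
    """Return the notation type of a mention, or None if no pattern matches."""
    for notation, pattern in PATTERNS:
        if pattern.fullmatch(mention):
            return notation
    return None


def group_mentions(mentions: List[str]) -> Dict[Optional[str], List[Tuple[str, Optional[str]]]]:
    """Group mentions by canonical ID; unresolved mentions go under None."""
    groups: Dict[Optional[str], List[Tuple[str, Optional[str]]]] = {}
    for raw in mentions:
        mention = raw.strip()
        key = SYNONYMS.get(mention)
        groups.setdefault(key, []).append((mention, classify(mention)))
    return groups


groups = group_mentions(["A146T", "c.436G>A", "rs121913527", "G12D"])
# groups["rs121913527"] holds all three names; "G12D" is recognized but unresolved.
```

Separating recognition (which notation is this?) from normalization (which variant is this?) is what allows a single search to retrieve articles that use any of the three names.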