Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 7 de 7
Filtrar
1.
Nature ; 571(7763): 95-98, 2019 07.
Artigo em Inglês | MEDLINE | ID: mdl-31270483

RESUMO

The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases1,2, which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing3-10, which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings11-13 (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.


Assuntos
Mineração de Dados/métodos , Conhecimento , Ciência dos Materiais , Processamento de Linguagem Natural , Relatório de Pesquisa , Pesquisa , Terminologia como Assunto , Aprendizado de Máquina não Supervisionado , Condutividade Elétrica , Eletrodos , Ferro , Lítio , Magnetismo , Reprodutibilidade dos Testes , Semântica , Temperatura
2.
J Med Internet Res ; 23(7): e26995, 2021 07 16.
Artigo em Inglês | MEDLINE | ID: mdl-34138726

RESUMO

BACKGROUND: Papers on COVID-19 are being published at a high rate and concern many different topics. Innovative tools are needed to aid researchers to find patterns in this vast amount of literature to identify subsets of interest in an automated fashion. OBJECTIVE: We present a new online software resource with a friendly user interface that allows users to query and interact with visual representations of relationships between publications. METHODS: We publicly released an application called PLATIPUS (Publication Literature Analysis and Text Interaction Platform for User Studies) that allows researchers to interact with literature supplied by COVIDScholar via a visual analytics platform. This tool contains standard filtering capabilities based on authors, journals, high-level categories, and various research-specific details via natural language processing and dozens of customizable visualizations that dynamically update from a researcher's query. RESULTS: PLATIPUS is available online and currently links to over 100,000 publications and is still growing. This application has the potential to transform how COVID-19 researchers use public literature to enable their research. CONCLUSIONS: The PLATIPUS application provides the end user with a variety of ways to search, filter, and visualize over 100,00 COVID-19 publications.


Assuntos
COVID-19 , Interpretação de Imagem Assistida por Computador , Armazenamento e Recuperação da Informação , SARS-CoV-2 , Humanos , Processamento de Linguagem Natural , Software , Interface Usuário-Computador
3.
Nat Commun ; 15(1): 1418, 2024 Feb 15.
Artigo em Inglês | MEDLINE | ID: mdl-38360817

RESUMO

Extracting structured knowledge from scientific text remains a challenging task for machine learning models. Here, we present a simple approach to joint named entity recognition and relation extraction and demonstrate how pretrained large language models (GPT-3, Llama-2) can be fine-tuned to extract useful records of complex scientific knowledge. We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.

4.
PLoS One ; 18(2): e0281147, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-36724184

RESUMO

The ongoing COVID-19 pandemic produced far-reaching effects throughout society, and science is no exception. The scale, speed, and breadth of the scientific community's COVID-19 response lead to the emergence of new research at the remarkable rate of more than 250 papers published per day. This posed a challenge for the scientific community as traditional methods of engagement with the literature were strained by the volume of new research being produced. Meanwhile, the urgency of response lead to an increasingly prominent role for preprint servers and a diffusion of relevant research through many channels simultaneously. These factors created a need for new tools to change the way scientific literature is organized and found by researchers. With this challenge in mind, we present an overview of COVIDScholar https://covidscholar.org, an automated knowledge portal which utilizes natural language processing (NLP) that was built to meet these urgent needs. The search interface for this corpus of more than 260,000 research articles, patents, and clinical trials served more than 33,000 users at an average of 2,000 monthly active users and a peak of more than 8,600 weekly active users in the summer of 2020. Additionally, we include an analysis of trends in COVID-19 research over the course of the pandemic with a particular focus on the first 10 months, which represents a unique period of rapid worldwide shift in scientific attention.


Assuntos
COVID-19 , Humanos , Pandemias , Publicações , Processamento de Linguagem Natural
5.
Patterns (N Y) ; 3(4): 100488, 2022 Apr 08.
Artigo em Inglês | MEDLINE | ID: mdl-35465225

RESUMO

A bottleneck in efficiently connecting new materials discoveries to established literature has arisen due to an increase in publications. This problem may be addressed by using named entity recognition (NER) to extract structured summary-level data from unstructured materials science text. We compare the performance of four NER models on three materials science datasets. The four models include a bidirectional long short-term memory (BiLSTM) and three transformer models (BERT, SciBERT, and MatBERT) with increasing degrees of domain-specific materials science pre-training. MatBERT improves over the other two BERTBASE-based models by 1%∼12%, implying that domain-specific pre-training provides measurable advantages. Despite relative architectural simplicity, the BiLSTM model consistently outperforms BERT, perhaps due to its domain-specific pre-trained word embeddings. Furthermore, MatBERT and SciBERT models outperform the original BERT model to a greater extent in the small data limit. MatBERT's higher-quality predictions should accelerate the extraction of structured data from materials science literature.

6.
Nat Commun ; 8(1): 323, 2017 08 22.
Artigo em Inglês | MEDLINE | ID: mdl-28831161

RESUMO

Auxetics comprise a rare family of materials that manifest negative Poisson's ratio, which causes an expansion instead of contraction under tension. Most known homogeneously auxetic materials are porous foams or artificial macrostructures and there are few examples of inorganic materials that exhibit this behavior as polycrystalline solids. It is now possible to accelerate the discovery of materials with target properties, such as auxetics, using high-throughput computations, open databases, and efficient search algorithms. Candidates exhibiting features correlating with auxetic behavior were chosen from the set of more than 67 000 materials in the Materials Project database. Poisson's ratios were derived from the calculated elastic tensor of each material in this reduced set of compounds. We report that this strategy results in the prediction of three previously unidentified homogeneously auxetic materials as well as a number of compounds with a near-zero homogeneous Poisson's ratio, which are here denoted "anepirretic materials".There are very few inorganic materials with auxetic homogenous Poisson's ratio in polycrystalline form. Here authors develop an approach to screening materials databases for target properties such as negative Poisson's ratio by using stability and structural motifs to predict new instances of homogenous auxetic behavior as well as a number of materials with near-zero Poisson's ratio.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA