1.
Chem Mater ; 36(2): 772-785, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38282687

ABSTRACT

We used data-driven methods to understand the formation of impurity phases in BiFeO3 thin-film synthesis through the sol-gel technique. Using a high-quality dataset of 331 synthesis procedures and outcomes extracted manually from 177 scientific articles, we trained decision tree models that reinforce important experimental heuristics for the avoidance of phase impurities but ultimately show limited predictive capability. We find that several important synthesis features, identified by our model, are often not reported in the literature. To test our ability to correctly impute missing synthesis parameters, we attempted to reproduce nine syntheses from the literature with varying degrees of "missingness". We demonstrate how a text-mined dataset can be made useful by informing new controlled experiments and forming a better understanding of impurity phase formation in this complex oxide system.

2.
PLoS One ; 18(2): e0281147, 2023.
Article in English | MEDLINE | ID: mdl-36724184

ABSTRACT

The ongoing COVID-19 pandemic produced far-reaching effects throughout society, and science is no exception. The scale, speed, and breadth of the scientific community's COVID-19 response led to the emergence of new research at the remarkable rate of more than 250 papers published per day. This posed a challenge for the scientific community as traditional methods of engagement with the literature were strained by the volume of new research being produced. Meanwhile, the urgency of response led to an increasingly prominent role for preprint servers and a diffusion of relevant research through many channels simultaneously. These factors created a need for new tools to change the way scientific literature is organized and found by researchers. With this challenge in mind, we present an overview of COVIDScholar (https://covidscholar.org), an automated knowledge portal, built to meet these urgent needs, that utilizes natural language processing (NLP). The search interface for this corpus of more than 260,000 research articles, patents, and clinical trials served more than 33,000 users at an average of 2,000 monthly active users and a peak of more than 8,600 weekly active users in the summer of 2020. Additionally, we include an analysis of trends in COVID-19 research over the course of the pandemic with a particular focus on the first 10 months, which represents a unique period of rapid worldwide shift in scientific attention.


Subject(s)
COVID-19, Humans, Pandemics, Publications, Natural Language Processing
3.
Chem Mater ; 34(16): 7323-7336, 2022 Aug 23.
Article in English | MEDLINE | ID: mdl-36032555

ABSTRACT

There currently exist no quantitative methods to determine the appropriate conditions for solid-state synthesis. This not only hinders the experimental realization of novel materials but also complicates the interpretation and understanding of solid-state reaction mechanisms. Here, we demonstrate a machine-learning approach that predicts synthesis conditions using large solid-state synthesis data sets text-mined from scientific journal articles. Using feature importance ranking analysis, we discovered that optimal heating temperatures have strong correlations with the stability of precursor materials quantified using melting points and formation energies (ΔG_f, ΔH_f). In contrast, features derived from the thermodynamics of synthesis-related reactions did not directly correlate with the chosen heating temperatures. This correlation between optimal solid-state heating temperature and precursor stability extends Tammann's rule from intermetallics to oxide systems, suggesting the importance of reaction kinetics in determining synthesis conditions. Heating times are shown to be strongly correlated with the chosen experimental procedures and instrument setups, which may be indicative of human bias in the data set. Using these predictive features, we constructed machine-learning models with good performance and general applicability to predict the conditions required to synthesize diverse chemical systems.
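The link between heating temperature and precursor melting point can be illustrated with a Tammann-style rule of thumb: the heating temperature is taken as a fixed fraction of a precursor melting point, with both expressed in kelvin. The sketch below is not the machine-learning model described in the abstract. The function name, the choice of the lowest-melting precursor, and the fraction are all assumptions made for illustration; the fraction is a required argument because the abstract does not state one.

```python
from typing import Sequence

KELVIN_OFFSET = 273.15


def tammann_estimate_c(melting_points_c: Sequence[float], fraction: float) -> float:
    """Estimate a heating temperature in degrees Celsius.

    Hypothetical helper: scales the melting point of the lowest-melting
    precursor (an assumption) by `fraction`, working in kelvin because the
    rule of thumb is a ratio of absolute temperatures.
    """
    if not melting_points_c:
        raise ValueError("at least one precursor melting point is required")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    lowest_k = min(melting_points_c) + KELVIN_OFFSET  # Celsius -> kelvin
    return fraction * lowest_k - KELVIN_OFFSET        # kelvin -> Celsius
```

Doing the scaling in kelvin matters: applying the same fraction directly to Celsius values would give a different, physically meaningless result.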

4.
Sci Data ; 9(1): 234, 2022 May 26.
Article in English | MEDLINE | ID: mdl-35618761

ABSTRACT

Gold nanoparticles are highly desired for a range of technological applications due to their tunable properties, which are dictated by the size and shape of the constituent particles. Many heuristic methods for controlling the morphological characteristics of gold nanoparticles are well known. However, the underlying mechanisms controlling their size and shape remain poorly understood, partly due to the immense range of possible combinations of synthesis parameters. Data-driven methods can offer insight to help guide understanding of these underlying mechanisms, so long as sufficient synthesis data are available. To facilitate data mining in this direction, we have constructed and made publicly available a dataset of codified gold nanoparticle synthesis protocols and outcomes extracted directly from the nanoparticle materials science literature using natural language processing and text-mining techniques. This dataset contains 5,154 data records, each representing a single gold nanoparticle synthesis article, filtered from a database of 4,973,165 publications. Each record contains codified synthesis protocols and extracted morphological information from a total of 7,608 experimental and 12,519 characterization paragraphs.

5.
Patterns (N Y) ; 3(4): 100488, 2022 Apr 08.
Article in English | MEDLINE | ID: mdl-35465225

ABSTRACT

A bottleneck in efficiently connecting new materials discoveries to established literature has arisen due to an increase in publications. This problem may be addressed by using named entity recognition (NER) to extract structured summary-level data from unstructured materials science text. We compare the performance of four NER models on three materials science datasets. The four models include a bidirectional long short-term memory (BiLSTM) and three transformer models (BERT, SciBERT, and MatBERT) with increasing degrees of domain-specific materials science pre-training. MatBERT improves over the other two BERT_BASE-based models by 1% to 12%, implying that domain-specific pre-training provides measurable advantages. Despite relative architectural simplicity, the BiLSTM model consistently outperforms BERT, perhaps due to its domain-specific pre-trained word embeddings. Furthermore, MatBERT and SciBERT models outperform the original BERT model to a greater extent in the small data limit. MatBERT's higher-quality predictions should accelerate the extraction of structured data from materials science literature.
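NER models are commonly compared using an entity-level F1 score, in which a predicted entity counts as correct only if both its label and its exact token span match the gold annotation. The abstract does not name the metric used, so the following is a minimal sketch under that assumption; the BIO tagging scheme, the function names, and the labels `MAT` and `PRO` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labelled spans."""
    spans: Set[Span] = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))  # close the currently open span
            label = None
        # Open a new span on "B-", or leniently on a stray "I-" tag.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a prediction is correct only on an exact span match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with gold tags `["B-MAT", "I-MAT", "O", "B-PRO"]` and predicted tags `["B-MAT", "I-MAT", "O", "O"]`, precision is 1 and recall is 1/2, so the F1 score is 2/3.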

6.
J Med Internet Res ; 23(7): e26995, 2021 Jul 16.
Article in English | MEDLINE | ID: mdl-34138726

ABSTRACT

BACKGROUND: Papers on COVID-19 are being published at a high rate and concern many different topics. Innovative tools are needed to help researchers find patterns in this vast amount of literature and identify subsets of interest in an automated fashion. OBJECTIVE: We present a new online software resource with a friendly user interface that allows users to query and interact with visual representations of relationships between publications. METHODS: We publicly released an application called PLATIPUS (Publication Literature Analysis and Text Interaction Platform for User Studies) that allows researchers to interact with literature supplied by COVIDScholar via a visual analytics platform. This tool contains standard filtering capabilities based on authors, journals, high-level categories, and various research-specific details via natural language processing and dozens of customizable visualizations that dynamically update from a researcher's query. RESULTS: PLATIPUS is available online and currently links to over 100,000 publications and is still growing. This application has the potential to transform how COVID-19 researchers use public literature to enable their research. CONCLUSIONS: The PLATIPUS application provides the end user with a variety of ways to search, filter, and visualize over 100,000 COVID-19 publications.


Subject(s)
COVID-19, Computer-Assisted Image Interpretation, Information Storage and Retrieval, SARS-CoV-2, Humans, Natural Language Processing, Software, User-Computer Interface
7.
iScience ; 24(3): 102155, 2021 Mar 19.
Article in English | MEDLINE | ID: mdl-33665573

ABSTRACT

Research publications are the major repository of scientific knowledge. However, their unstructured and highly heterogeneous format creates a significant obstacle to large-scale analysis of the information contained within. Recent progress in natural language processing (NLP) has provided a variety of tools for high-quality information extraction from unstructured text. These tools are primarily trained on non-technical text and struggle to produce accurate results when applied to scientific text, which involves specific technical terminology. In recent years, significant efforts in information retrieval have been made for biomedical and biochemical publications. For materials science, text mining (TM) methodology is still at the dawn of its development. In this review, we survey the recent progress in creating and applying TM and NLP approaches to the materials science field. This review is directed at the broad class of researchers aiming to learn the fundamentals of TM as applied to materials science publications.
