Results 1 - 4 of 4
1.
Nature ; 571(7763): 95-98, 2019 07.
Article in English | MEDLINE | ID: mdl-31270483

ABSTRACT

The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases [1,2], which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing [3-10], which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings [11-13] (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.


Subject(s)
Data Mining/methods, Knowledge, Materials Science, Natural Language Processing, Research Report, Research, Terminology as Topic, Unsupervised Machine Learning, Electric Conductivity, Electrodes, Iron, Lithium, Magnetics, Reproducibility of Results, Semantics, Temperature
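
The abstract of record 1 describes word embeddings as vector representations of words and says an unsupervised method can recommend materials for functional applications. One common way to rank words by relatedness in an embedding space is cosine similarity; the sketch below illustrates that general idea and is not presented as the authors' implementation. The vectors, the placeholder names `material_a`, `material_b`, `material_c`, and the helper `rank_candidates` are all hypothetical and chosen only for illustration.

```python
import math

# Toy 3-dimensional vectors. Real embeddings are learned from text and
# typically have hundreds of dimensions. Every word and number below is
# an illustrative assumption, not data from the paper.
EMBEDDINGS = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "material_a": [0.8, 0.2, 0.1],
    "material_b": [0.1, 0.9, 0.3],
    "material_c": [0.5, 0.5, 0.5],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates, embeddings):
    """Sort candidate words by similarity to the query word, highest first."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, embeddings[name]))
        for name in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(
    "thermoelectric",
    ["material_a", "material_b", "material_c"],
    EMBEDDINGS,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, `material_a` ranks first because its vector points in nearly the same direction as the query vector. No chemical knowledge enters the ranking; it depends only on the geometry of the vectors.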
2.
Soft Matter ; 17(20): 5131-5136, 2021 May 26.
Article in English | MEDLINE | ID: mdl-34037064

ABSTRACT

Understanding the diffusive behavior of particles and large molecules in channels is of fundamental importance in biological and synthetic systems, such as channel proteins, nanopores, and nanofluidics. Although theoretical and numerical modeling has suggested some solutions, these models have not been fully supported by direct experimental measurements. Here, we demonstrate that experimental diffusion coefficients of particles in finite open-ended channels are always higher than the prediction based on the conventional theoretical model of infinitely long channels. By combining microfluidic experiments, numerical simulations, and analytical modeling, we show that diffusion coefficients are dependent not only on the radius ratio but also on the channel length, the boundary conditions of the neighboring reservoirs, and the compressibility of the medium.

3.
Sci Data ; 6(1): 273, 2019 11 15.
Article in English | MEDLINE | ID: mdl-31729397

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

4.
Sci Data ; 6(1): 203, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31615989

ABSTRACT

Materials discovery has become significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has displaced the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no fundamental theory for materials synthesis, one might attempt a data-driven approach for predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database containing synthesis processes. To overcome this limitation, we have generated a dataset of "codified recipes" for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs by using text mining and natural language processing approaches. Every entry contains information about target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.
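
The abstract of record 4 lists what each entry holds: target material, starting compounds, operations with their conditions, and a balanced reaction equation. A minimal sketch of how one such entry might be modelled and queried follows. The class names, field names, and the helper `entries_using` are hypothetical and do not reflect the dataset's actual schema; the example record, including its temperature and time values, is an illustrative textbook-style reaction and is not taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operation:
    """One processing step, e.g. mixing or heating (hypothetical fields)."""
    kind: str
    conditions: Dict[str, str] = field(default_factory=dict)


@dataclass
class SynthesisEntry:
    """One codified recipe (field names are assumptions, not the real schema)."""
    target: str
    precursors: List[str]
    operations: List[Operation]
    reaction: str


def entries_using(entries, precursor):
    """Return the entries whose starting compounds include the given precursor."""
    return [entry for entry in entries if precursor in entry.precursors]


# Illustrative record only; the conditions are made-up placeholder values.
example = SynthesisEntry(
    target="BaTiO3",
    precursors=["BaCO3", "TiO2"],
    operations=[
        Operation("mixing"),
        Operation("heating", {"temperature": "1100 C", "time": "4 h"}),
    ],
    reaction="BaCO3 + TiO2 -> BaTiO3 + CO2",
)

matches = entries_using([example], "TiO2")
print([entry.target for entry in matches])
```

Keeping operations as an ordered list preserves the sequence of steps, which matters when mining the data for questions such as which heating conditions follow which mixing steps.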
