A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing.
Shetty, Pranav; Rajan, Arunkumar Chitteth; Kuenneth, Chris; Gupta, Sonakshi; Panchumarti, Lakshmi Prerana; Holm, Lauren; Zhang, Chao; Ramprasad, Rampi.
Affiliation
  • Shetty P; School of Computational Science & Engineering, Atlanta, GA USA.
  • Rajan AC; School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta, 30332 GA USA.
  • Kuenneth C; School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta, 30332 GA USA.
  • Gupta S; Department of Metallurgy Engineering and Materials Science, Indian Institute of Technology, Indore, Madhya Pradesh India.
  • Panchumarti LP; School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta, 30332 GA USA.
  • Holm L; School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta, 30332 GA USA.
  • Zhang C; School of Computational Science & Engineering, Atlanta, GA USA.
  • Ramprasad R; School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta, 30332 GA USA.
NPJ Comput Mater; 9(1): 52, 2023.
Article in English | MEDLINE | ID: mdl-37033291
ABSTRACT
The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from the literature. We used natural language processing methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, on 2.4 million materials science abstracts; it outperforms other baseline models on three out of five named entity recognition datasets. Using this pipeline, we obtained ~300,000 material property records from ~130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse range of applications such as fuel cells, supercapacitors, and polymer solar cells to recover non-trivial insights. The data extracted through our pipeline is available at polymerscholar.org, which can be used to locate material property data recorded in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with extracted material property information.
Full text: 1 Collections: 01-international Database: MEDLINE Language: En Publication year: 2023 Document type: Article