Creation of a structured solar cell material dataset and performance prediction using large language models.
Xie, Tong; Wan, Yuwei; Zhou, Yufei; Huang, Wei; Liu, Yixuan; Linghu, Qingyuan; Wang, Shaozhou; Kit, Chunyu; Grazian, Clara; Zhang, Wenjie; Hoex, Bram.
Affiliation
  • Xie T; School of Photovoltaic and Renewable Energy Engineering, University of New South Wales, Kensington, NSW, Australia.
  • Wan Y; GreenDynamics Pty. Ltd, Kensington, NSW, Australia.
  • Zhou Y; GreenDynamics Pty. Ltd, Kensington, NSW, Australia.
  • Huang W; Department of Linguistics and Translation, City University of Hong Kong, Hong Kong, China.
  • Liu Y; Department of Linguistics and Translation, City University of Hong Kong, Hong Kong, China.
  • Linghu Q; School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia.
  • Wang S; GreenDynamics Pty. Ltd, Kensington, NSW, Australia.
  • Kit C; GreenDynamics Pty. Ltd, Kensington, NSW, Australia.
  • Grazian C; School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia.
  • Zhang W; School of Photovoltaic and Renewable Energy Engineering, University of New South Wales, Kensington, NSW, Australia.
  • Hoex B; GreenDynamics Pty. Ltd, Kensington, NSW, Australia.
Patterns (N Y); 5(5): 100955, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38800367
ABSTRACT
Materials scientists usually collect experimental data to summarize experiences and predict improved materials. However, a crucial issue is how to proficiently utilize unstructured data to update existing structured data, particularly in applied disciplines. This study introduces a new natural language processing (NLP) task called structured information inference (SII) to address this problem. We propose an end-to-end approach to summarize and organize the multi-layered device-level information from the literature into structured data. After comparing different methods, we fine-tuned LLaMA, which achieved an F1 score of 87.14%, and used it to update an existing perovskite solar cell dataset with articles published since its release, allowing its direct use in subsequent data analysis. Using the structured information, we developed regression tasks to predict the electrical performance of solar cells. Our results demonstrate performance comparable to traditional machine-learning methods without feature selection and highlight the potential of large language models for scientific knowledge acquisition and material development.
Full text: 1 Database: MEDLINE Language: En Journal: Patterns (N Y) Year: 2024 Document type: Article Country of affiliation: Australia
