ABSTRACT
Inspired by a previous machine-learning study showing that the number of hydrogen-bond acceptors (NHBA) is an important index for the hole mobility of organic semiconductors, seven dithienobenzothiazole (DBT) derivatives 1a-g (NHBA = 5) were designed and synthesized by one-step functionalization from a common precursor. X-ray single-crystal structural analyses confirmed that the molecular arrangements of 1b (the diethyl and ethylthienyl derivative) and 1c (the di(n-propyl) and n-propylthienyl derivative) in the crystal are classified as brickwork structures with multidirectional intermolecular charge-transfer integrals, as a result of the incorporation of multiple hydrogen-bond acceptors. The solution-processed top-gate bottom-contact devices of 1b and 1c had hole mobilities of 0.16 and 0.029 cm² V⁻¹ s⁻¹, respectively.
ABSTRACT
Machine learning interatomic potentials (MLIPs) are one of the main techniques in the materials science toolbox, able to bridge ab initio accuracy with the computational efficiency of classical force fields. This allows simulations ranging from atoms, molecules, and biosystems to solid and bulk materials, surfaces, nanomaterials, and their interfaces and complex interactions. A recent class of advanced MLIPs, which use equivariant representations and deep graph neural networks, is known as universal models. These models are proposed as foundation models suitable for any system, covering most elements of the periodic table. Current universal MLIPs (UIPs) have been trained on the largest consistent data sets currently available; however, these data sets are composed mostly of DFT calculations of bulk materials. In this article, we assess the universality of all openly available UIPs, namely MACE, CHGNet, and M3GNet, in a representative generalization task: the calculation of surface energies. We find that the out-of-the-box foundation models have significant shortcomings in this task, with errors that correlate with the total energy of the surface simulations, which lie out of domain with respect to the training data set. Our results show that UIPs are an efficient starting point for fine-tuning specialized models, and we envision the potential of increasing the coverage of the materials space toward universal training data sets for MLIPs.
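For readers unfamiliar with the benchmark quantity, the following is a minimal sketch of how a surface energy is commonly obtained from a slab calculation, assuming a symmetric slab with two equivalent faces. The abstract does not state the exact formula or workflow the authors used; the function name and all numbers below are illustrative assumptions, not values from the study.

```python
# Conversion factor: 1 eV/Å^2 expressed in J/m^2.
EV_PER_A2_TO_J_PER_M2 = 16.02176634


def surface_energy(e_slab, n_atoms, e_bulk_per_atom, area):
    """Surface energy in eV/Å^2 for a symmetric slab (two equivalent faces).

    e_slab           total energy of the slab cell (eV)
    n_atoms          number of atoms in the slab cell
    e_bulk_per_atom  bulk reference energy per atom (eV)
    area             area of ONE face of the slab cell (Å^2)
    """
    if area <= 0:
        raise ValueError("area must be positive")
    # Excess energy of the slab over the same number of bulk atoms,
    # shared between the two exposed faces.
    return (e_slab - n_atoms * e_bulk_per_atom) / (2.0 * area)


# Illustrative (made-up) numbers: a 12-atom slab with a 10 Å^2 face.
gamma = surface_energy(e_slab=-40.0, n_atoms=12, e_bulk_per_atom=-3.5, area=10.0)
print(f"{gamma:.3f} eV/Å^2 = {gamma * EV_PER_A2_TO_J_PER_M2:.3f} J/m^2")
```

Because the slab and bulk energies enter as a difference, an error in either total energy predicted by a potential propagates directly into the surface energy.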
ABSTRACT
Charge transport in organic semiconductors occurs via overlapping molecular orbitals quantified by transfer integrals. However, no statistical study of transfer integrals for a wide variety of molecules has been reported. Here we present a statistical analysis of transfer integrals for more than 27,000 organic compounds in the Cambridge Structural Database. Interatomic transfer integrals were used to identify substructures with high transfer integrals. As a result, thione and amine groups as in thiourea were found to exhibit high transfer integrals. Such compounds are considered as potential non-aromatic, water-soluble organic semiconductors.
The analysis of interatomic transfer integrals for 27,718 organic compounds revealed that thione (S=R)-amine (NR3) and thione-thione interactions tend to increase transfer integrals and are suitable for high-mobility organic semiconductors.
ABSTRACT
Modern data mining techniques using machine learning (ML) and deep learning (DL) algorithms have been shown to excel in the regression-based task of materials property prediction using various materials representations. To improve the predictive performance of deep neural network models, researchers have added more layers and developed new architectural components, creating sophisticated and deep models that can aid the training process and improve the predictive ability of the final model. However, these modifications usually require substantial computational resources, which further increases the already long model training time and is often not feasible, limiting their use for most researchers. In this paper, we study and propose a deep neural network framework for regression-based problems, comprising fully connected layers, that can work with any numerical vector-based materials representation as model input. We present a novel deep regression neural network, iBRNet, with branched skip connections and multiple schedulers, which can reduce the number of parameters used to construct the model, improve the accuracy, and decrease the training time of the predictive model. We train the models using composition-based numerical vectors representing the elemental fractions of the respective materials and compare their performance against traditional ML and several known DL architectures. Using multiple datasets with varying data sizes for training and testing, we show that the proposed iBRNet models outperform the state-of-the-art ML and DL models for all data sizes. We also show that the branched structure and the use of multiple schedulers lead to fewer parameters and faster model training with better convergence than other neural networks.
Scientific contribution: The combination of multiple callback functions in deep neural networks minimizes training time and maximizes accuracy in a controlled computational environment with parametric constraints for the task of materials property prediction.
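The "composition-based numerical vectors representing the elemental fractions" mentioned above can be illustrated with a small sketch. It assumes a flat chemical formula without parentheses and a fixed, ordered element vocabulary; the six-element list here is a hypothetical truncation for illustration, and the function name is not taken from the iBRNet code.

```python
import re

# Hypothetical, truncated element vocabulary; a real input vector would
# cover every element that appears in the training data.
ELEMENTS = ["H", "Li", "O", "Ti", "Fe", "Ba"]

# One element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def elemental_fractions(formula, elements=ELEMENTS):
    """Return the atomic fraction of each element, in `elements` order."""
    counts = {}
    for symbol, amount in _TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    if not counts:
        raise ValueError(f"no elements found in {formula!r}")
    unknown = sorted(set(counts) - set(elements))
    if unknown:
        raise ValueError(f"elements outside the vocabulary: {unknown}")
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]


print(elemental_fractions("BaTiO3"))
```

Each material becomes a fixed-length vector whose entries sum to 1, which is the kind of input a fully connected regression network can consume directly.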
ABSTRACT
Materials informatics employs data-driven approaches for the analysis and discovery of materials. Features, also referred to as descriptors, are essential in generating reliable and accurate machine-learning models. While general data can be obtained through public and commercial sources, features must be tailored to specific applications. Common featurizers suitable for generic chemical problems may not be effective for feature-property mapping in solid-state materials with ML models. Here, we have assembled the Oliynyk property list for compositional feature generation, which performs well on limited datasets (50 to 1000 training data points) in the solid-state materials domain. The dataset contains 98 elemental features for atomic numbers from 1 to 92, including thermodynamic properties, electronic structure data, size, electronegativity, and bulk properties such as melting point, density, and conductivity. The dataset has been used in peer-reviewed publications for predicting material hardness, classification, discovery of novel Heusler compounds, band gap prediction, and determining the site preference of atoms, using machine learning models including support vector machines and random forests for classification and support vector regression for regression problems. We have compiled the dataset by parsing data from publicly available databases and literature and further supplementing it by interpolating values with Gaussian process regression.
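As a rough illustration of compositional feature generation from an elemental property table, the sketch below computes a composition-weighted mean and a range for each property. The two-element table, the property names, and the choice of statistics are assumptions made for illustration; they are not taken from the Oliynyk property list, and the abstract does not specify which statistics are derived from it.

```python
# Tiny illustrative property table (NOT the Oliynyk list): atomic number
# and Pauling electronegativity for two elements.
PROPERTIES = {
    "Fe": {"atomic_number": 26, "pauling_en": 1.83},
    "O": {"atomic_number": 8, "pauling_en": 3.44},
}


def composition_features(composition, table=PROPERTIES):
    """Map {element: amount} to composition-level features.

    For each elemental property this returns the fraction-weighted mean
    and the max-min range over the elements present.
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain a positive amount")
    weights = {el: amt / total for el, amt in composition.items()}
    property_names = sorted(next(iter(table.values())))
    features = {}
    for name in property_names:
        values = {el: table[el][name] for el in composition}
        features[f"mean_{name}"] = sum(weights[el] * values[el] for el in composition)
        features[f"range_{name}"] = max(values.values()) - min(values.values())
    return features


print(composition_features({"Fe": 2, "O": 3}))
```

With a full table, the same loop yields a fixed-length feature vector per composition that can be passed to models such as support vector machines or random forests.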
ABSTRACT
Combining materials science, artificial intelligence (AI), physical chemistry, and other disciplines, materials informatics is continuously accelerating the development of new materials. The emergence of "GPT (Generative Pre-trained Transformer) AI" shows that scientific research has entered an era of intelligent civilization with "data" as the basic factor and "algorithm + computing power" as the core productivity. The continuous innovation of AI will affect cognitive laws and scientific methods and reconstruct the system of knowledge and wisdom, which prompts further reflection on materials informatics. Here, a comprehensive discussion of AI models and materials infrastructures is provided, and the advances in the discovery and design of new materials are reviewed. With the rise of new research paradigms triggered by "AI for Science", a guiding direction for materials informatics, "MatGPT", is proposed, and a technical path is planned covering data, descriptors, generative models, pretraining models, directed design models, collaborative training, and experimental robots, together with the efforts and preparations needed to develop a new generation of materials informatics. Finally, the challenges and constraints faced by materials informatics are discussed, with the aim of achieving a more digital, intelligent, and automated materials informatics through the joint efforts of more interdisciplinary scientists.
ABSTRACT
The electrical resistivity and the Hall effect of topological insulator Bi2Te3 and Bi2Se3 single crystals were studied in the temperature range from 4.2 to 300 K and in magnetic fields up to 10 T. Theoretical calculations of the electronic structure of these compounds were carried out within the density functional approach, taking into account spin-orbit coupling and crystal structure data for temperatures of 5, 50, and 300 K. A clear correlation was found between the density of electronic states at the Fermi level and the current carrier concentration. In the case of Bi2Te3, the density of states at the Fermi level and the current carrier concentration increase with increasing temperature, from 0.296 states eV⁻¹ cell⁻¹ (5 K) to 0.307 states eV⁻¹ cell⁻¹ (300 K) and from 0.9 × 10¹⁹ cm⁻³ (5 K) to 2.6 × 10¹⁹ cm⁻³ (300 K), respectively. In contrast, in the case of Bi2Se3, the density of states decreases with increasing temperature, from 0.201 states eV⁻¹ cell⁻¹ (5 K) to 0.198 states eV⁻¹ cell⁻¹ (300 K), and, as a consequence, the charge carrier concentration also decreases, from 2.94 × 10¹⁹ cm⁻³ (5 K) to 2.81 × 10¹⁹ cm⁻³ (300 K).
ABSTRACT
The integration of artificial intelligence (AI) algorithms in materials design is revolutionizing the field of materials engineering thanks to their power to predict material properties, design de novo materials with enhanced features, and discover new mechanisms beyond intuition. In addition, they can be used to infer complex design principles and identify high-quality candidates more rapidly than trial-and-error experimentation. From this perspective, herein we describe how these tools can enable the acceleration and enrichment of each stage of the discovery cycle of novel materials with optimized properties. We begin by outlining the state-of-the-art AI models in materials design, including machine learning (ML), deep learning, and materials informatics tools. These methodologies enable the extraction of meaningful information from vast amounts of data, enabling researchers to uncover complex correlations and patterns within material properties, structures, and compositions. Next, a comprehensive overview of AI-driven materials design is provided and its potential future prospects are highlighted. By leveraging such AI algorithms, researchers can efficiently search and analyze databases containing a wide range of material properties, enabling the identification of promising candidates for specific applications. This capability has profound implications across various industries, from drug development to energy storage, where materials performance is crucial. Ultimately, AI-based approaches are poised to revolutionize our understanding and design of materials, ushering in a new era of accelerated innovation and advancement.
ABSTRACT
A multimodal deep-learning (MDL) framework is presented for predicting physical properties of a ten-dimensional acrylic polymer composite material by merging physical attributes and chemical data. The MDL model comprises four modules, including three generative deep-learning models for material structure characterization and a fourth model for property prediction. The approach handles an 18-dimensional complexity, with ten compositional inputs and eight property outputs, successfully predicting 913,680 property data points across 114,210 composition conditions. This level of complexity is unprecedented in computational materials science, particularly for materials with undefined structures. A framework is proposed to analyze the high-dimensional information space for inverse material design, demonstrating flexibility and adaptability to various materials and scales, provided sufficient data are available. This study advances future research on different materials and the development of more sophisticated models, drawing the authors closer to the ultimate goal of predicting all properties of all materials.
ABSTRACT
Why are the transition temperatures (Tc) of superconducting materials so different? The answer to this question is not only of great significance in revealing the mechanism of high-Tc superconductivity but also can be used as a guide for the design of new superconductors. However, so far, it is still challenging to identify the governing factors affecting the Tc. In this work, with the aid of machine learning and first-principles calculations, we found a close relevance between the upper limit of the Tc and the energy-level distribution of valence electrons. It implies that some additional inter-orbital electron-electron interaction should be considered in the interpretation of high-Tc superconductivity.
ABSTRACT
Using machine learning based on a random forest (RF) regression algorithm, we attempted to predict the amount of serum protein adsorbed on polymer brush films from the films' physicochemical information and the chemical structures of the monomers constituting the films. After training on data from polymer brush films synthesized from five different types of monomers, the RF model became capable of predicting the amount of adsorbed protein from the chemical structure and physicochemical properties of the monomer molecules and from structural parameters (density and thickness of the films). Analysis of the trained RF model quantified the importance of each structural parameter and each physicochemical property of the monomers for serum protein adsorption (SPA). The ranking of the significance of the parameters agrees with our general understanding and perception. Based on the results, we discuss the correlation between the brush films' physical properties (such as thickness and density) and SPA and attempt to provide a guideline for the design of antibiofouling polymer brush films.
Subject(s)
Blood Proteins, Polymers, Adsorption, Machine Learning, Surface Properties
ABSTRACT
BACKGROUND: Drug design is one of the important applications of biological science. Extensive studies have been done on computer-aided drug design based on inverse quantitative structure-activity relationship (inverse QSAR), which aims to infer chemical compounds from given chemical activities and constraints. However, exact or optimal solutions are not guaranteed in most of the existing methods. METHOD: Recently, a novel framework based on artificial neural networks (ANNs) and mixed integer linear programming (MILP) has been proposed for designing chemical structures. This framework consists of two phases: an ANN is used to construct a prediction function, and then an MILP formulated on the trained ANN and a graph search algorithm are used to infer desired chemical structures. In this paper, we use linear regression instead of ANNs to construct a prediction function. For this, we derive a novel MILP formulation that simulates the computation process of a prediction function by linear regression. RESULTS: For the first phase, we performed computational experiments using 18 chemical properties, and the proposed method achieved good prediction accuracy for a relatively large number of properties in comparison with the ANNs in our previous work. For the second phase, we performed computational experiments on five chemical properties, and the method could infer chemical structures with up to around 50 non-hydrogen atoms. CONCLUSIONS: The combination of linear regression and integer programming is a potentially useful approach to computational molecular design.
Subject(s)
Algorithms, Quantitative Structure-Activity Relationship, Drug Design, Linear Models, Neural Networks, Computer
ABSTRACT
Automated molecule design by computers is an essential topic in materials informatics. Still, generating practical structures is not easy because of the difficulty in treating material stability, synthetic difficulty, mechanical properties, and other miscellaneous parameters, often leading to the generation of junk molecules. The problem is tackled by introducing supervised/unsupervised machine learning and quantum-inspired annealing. This autonomous molecular design system can help experimental researchers discover practical materials more efficiently. Like the human design process, new molecules are explored based on knowledge of existing compounds. A new solid-state polymer electrolyte for lithium-ion batteries is designed and synthesized, giving a promising room temperature conductivity of 10⁻⁵ S cm⁻¹ with reasonable thermal, chemical, and mechanical properties.
Subject(s)
Lithium, Polymers, Humans, Lithium/chemistry, Electric Power Supplies, Electrolytes/chemistry, Ions
ABSTRACT
We develop a framework powered by machine learning (ML) and high-throughput density functional theory (DFT) computations for the prediction and screening of functional impurities in groups IV, III-V, and II-VI zinc blende semiconductors. Elements spanning the length and breadth of the periodic table are considered as impurity atoms at the cation, anion, or interstitial sites in supercells of 34 candidate semiconductors, leading to a chemical space of approximately 12,000 points, 10% of which are used to generate a DFT dataset of charge dependent defect formation energies. Descriptors based on tabulated elemental properties, defect coordination environment, and relevant semiconductor properties are used to train ML regression models for the DFT computed neutral state formation energies and charge transition levels of impurities. Optimized kernel ridge, Gaussian process, random forest, and neural network regression models are applied to screen impurities with lower formation energy than dominant native defects in all compounds.
ABSTRACT
The success of machine learning (ML) in materials property prediction depends heavily on how the materials are represented for learning. Two dominant families of material descriptors exist: one encodes crystal structure in the representation, and the other uses only stoichiometric information with the hope of discovering new materials. Graph neural networks (GNNs) in particular have excelled in predicting material properties within chemical accuracy. However, current GNNs are limited to only one of these two avenues owing to the limited overlap between the respective material representations. Here, a new concept, the formula graph, which unifies stoichiometry-only and structure-based material descriptors, is introduced. A self-attention integrated GNN that assimilates a formula graph is further developed, and it is found that the proposed architecture produces material embeddings transferable between the two domains. The proposed model can outperform some previously reported structure-agnostic models and their structure-based counterparts while exhibiting better sample efficiency and faster convergence. Finally, the model is applied in a challenging exemplar to predict the complex dielectric function of materials and nominate new substances that potentially exhibit epsilon-near-zero phenomena.
Subject(s)
Machine Learning, Neural Networks, Computer, Models, Chemical, Structure-Activity Relationship
ABSTRACT
The ever-growing data acquisition speed represents a challenge for data analysis in materials sciences in general and the field of solar cells in particular. This is because many unsupervised and supervised learning algorithms require model re-derivation when presented with new samples that are markedly different from those used for model construction. Dynamic segmentation addresses this problem by continuously updating the cluster structure, for example, by splitting old clusters or opening new ones, as new samples are presented. In this work, we present the application of a Dynamic Classification Unit (DCU) to the study of the photovoltaic space. Using a database of 1165 metal oxide-based solar cells, constructed from five libraries, we demonstrate that the DCU algorithm, when initiated with only 10% of the database, correctly classified 82% of the remaining 90% of the samples. At the same time, the algorithm unveiled the presence of interesting trends, outliers, and compositional activity cliffs. These abilities may prove useful for the analysis of the photovoltaic space and in turn may contribute to the design of solar cells with improved properties. We suggest that DCU and other dynamic clustering methods will find wide applications in the rapidly developing field of materials informatics.
Subject(s)
Algorithms, Materials Science, Cluster Analysis, Databases, Factual, Oxides/chemistry
ABSTRACT
In this paper, we develop a data-driven machine learning (ML) approach to predict the adiabatic temperature change (ΔT) in BaTiO3-based ceramics as a function of chemical composition, temperature, and applied electric field. The data set was curated from a survey of published electrocaloric measurements. Each chemical composition was represented by elemental descriptors of A-site and B-site elements. Pair-wise statistical correlation analysis was used to remove linearly correlated descriptors. We trained two separate regression-based ML models for indirect and direct measurements and found that both are capable of capturing the general trend of the temperature vs ΔT curve for various applied electric fields. We then complemented the regression models with a classification learning model that predicts the expected phase as a function of chemical composition and temperature. The combined regression and classification learning ML models predict a global maximum in ΔT near rhombohedral to cubic or tetragonal to cubic phase transition regions. An interactive, open-source web application is developed to enable interested users to query our trained models and accelerate the design of novel BaTiO3-based ceramics with targeted phase and ΔT properties for electrocaloric applications.
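The descriptor-pruning step ("pair-wise statistical correlation analysis was used to remove linearly correlated descriptors") can be sketched as a greedy filter on the Pearson correlation coefficient. The abstract does not give the threshold, the order in which descriptors are visited, or the tie-breaking rule, so the 0.95 cutoff, the keep-first policy, and the descriptor names below are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def drop_correlated(columns, threshold=0.95):
    """Keep a descriptor only if |r| with every already-kept one is below threshold.

    `columns` maps descriptor name -> list of values; insertion order
    decides which member of a correlated pair survives (assumed policy).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptors: the second is an exact multiple of the first.
descriptors = {
    "a_site_radius": [1.0, 2.0, 3.0, 4.0],
    "a_site_radius_scaled": [2.0, 4.0, 6.0, 8.0],
    "b_site_valence": [1.0, 0.0, 1.0, 0.0],
}
print(drop_correlated(descriptors))
```

Removing one member of each strongly correlated pair keeps the regression inputs closer to linearly independent without discarding the information they share.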
ABSTRACT
First-principles calculation based on density functional theory is a powerful tool for understanding and designing magnetic materials. It enables us to quantitatively describe magnetic properties and structural stability, although further methodological developments for the treatment of strongly correlated 4f electrons and finite-temperature magnetism are needed. Here, we review recent developments of computational schemes for rare-earth magnet compounds and summarize our theoretical studies on Nd2Fe14B and RFe12-type compounds. Effects of chemical substitution and interstitial dopants are clarified. We also discuss how data-driven approaches are used for studying multinary systems. Chemical composition can be optimized with fewer trials by Bayesian optimization. We also present a data-assimilation method for predicting finite-temperature magnetization in a wide composition space by integrating computational and experimental data.
ABSTRACT
Materials discovery via machine learning has become increasingly popular due to its ability to predict materials properties rapidly and at low cost. However, one limitation in this field is the lack of benchmark datasets, particularly those that encompass the range of sizes, tasks, material systems, and data modalities present in the materials informatics literature. This makes it difficult to identify optimal machine learning model choices, including algorithm, model architecture, data splitting, and data featurization, for a given task. Here, we attempt to address this lack of benchmark datasets by assembling a unique repository of 50 different datasets for materials properties. The repository contains both experimental and computational data, data suited for regression as well as classification, sizes ranging from 12 to 6354 samples, and materials systems spanning the diversity of materials research. Data were extracted from 16 publications. In addition to cleaning the data where necessary, each dataset was split for training, validation, and testing. For datasets with more than 100 values, train-validation-test splits were created with either a 5-fold or 10-fold cross-validation method, depending on what the respective paper did. Datasets with fewer than 100 values had train-test splits created using the leave-one-out cross-validation method. These benchmark data can serve as a basis for a more diverse benchmark dataset in the future, further improving their effectiveness in the comparison of machine learning models.
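The splitting rule described above can be sketched with index-based folds in plain Python. This is an illustration under stated assumptions rather than the repository's own code: the abstract does not say how a dataset of exactly 100 values is treated (here it goes to k-fold), how the validation portion is carved out of each training fold (omitted here), or which random seed was used.

```python
import random


def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def benchmark_splits(n_samples, n_folds=5, seed=0):
    """Leave-one-out below 100 samples, k-fold (5 or 10) otherwise."""
    if n_samples < 100:
        # Leave-one-out is k-fold with one sample per fold.
        return list(kfold_indices(n_samples, n_samples, seed))
    return list(kfold_indices(n_samples, n_folds, seed))


small = benchmark_splits(12)        # 12 leave-one-out splits
large = benchmark_splits(250, 10)   # 10 folds of 25 test samples each
print(len(small), len(large), len(large[0][1]))
```

Fixing the seed makes the folds reproducible, which is the property a benchmark needs so that different models are compared on identical splits.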
ABSTRACT
Quasicrystals, which have long-range order without periodicity and exhibit rotational symmetries that are in most cases disallowed for periodic crystals, have emerged as the third class of solid-state materials, distinct from periodic crystals and amorphous solids. To date, more than one hundred stable quasicrystals have been reported, leading to the discovery of many new and exciting phenomena. However, the pace of discovery of new quasicrystals has slowed in recent years, largely owing to the lack of clear guiding principles for their synthesis. Here, it is shown that the discovery of new quasicrystals can be accelerated with a simple machine-learning workflow. With a list of the chemical compositions of known stable quasicrystals, approximant crystals, and ordinary crystals, a prediction model is trained to solve the three-class classification task, and its predictive performance is evaluated against the observed phase diagrams of ternary aluminum systems. The validation experiments strongly support the superior predictive power of machine learning, with the overall accuracy of the phase prediction task reaching ≈0.728. Furthermore, by analyzing the input-output relationships black-boxed in the model, nontrivial, human-interpretable empirical equations that describe conditions necessary for stable quasicrystal formation are identified.