Results 1 - 18 of 18
1.
Sci Rep ; 13(1): 16799, 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37798325

ABSTRACT

Efforts to optimize known materials and enhance their performance are ongoing, driven by the advancements resulting from the discovery of novel functional materials. Traditionally, the search for and optimization of functional materials has relied on the experience and intuition of specialized researchers. However, materials informatics (MI), which integrates materials data and machine learning, has frequently been used to realize systematic and efficient materials exploration without depending on manual tasks. Nonetheless, the discovery of new materials using MI remains challenging. In this study, we propose a method for the discovery of materials outside the scope of existing databases by combining MI with the experience and intuition of researchers. Specifically, we designed a two-dimensional map that plots known materials data based on their composition and structure, facilitating researchers' intuitive search for new materials. The materials map was implemented using an autoencoder-based neural network. We focused on the conductivity of 708 lithium oxide materials and considered the correlation with migration energy (ME), an index of lithium-ion conductivity. The distribution of existing data reflected in the materials map can contribute to the development of new lithium-ion conductive materials by enhancing the experience and intuition of material researchers.

2.
RSC Adv ; 12(47): 30696-30703, 2022 Oct 24.
Article in English | MEDLINE | ID: mdl-36337942

ABSTRACT

NASICON-type LiZr2(PO4)3 (LZP) has attracted significant attention as a solid oxide electrolyte for all-solid-state Li-ion or Li-metal batteries owing to its high Li-ion conductivity, usability in all-solid-state batteries, and electrochemical stability against Li metal. In this study, we aim to improve the Li-ion conductivity of Li-rich NASICON-type LZPs doped with CaO and SiO2, i.e., $\mathrm{Li}_{1+x+2y}\mathrm{Ca}_{y}\mathrm{Zr}_{2-y}\mathrm{Si}_{x}\mathrm{P}_{3-x}\mathrm{O}_{12}$ ($0 \le x \le 0.3$, $0 \le y \le 0.3$) (LCZSP). Herein, a total of 49 compositions were synthesised, and their crystal structures, relative densities, and Li-ion conductivities were characterised experimentally. We confirmed that simultaneously substituting Ca for Zr and Si for P improves the Li-ion conductivity. However, determining by intuition the composition with the highest Li-ion conductivity is technically difficult because the dependence of the relative density and the crystalline phase of the sample on composition is very complex. Bayesian optimisation (BO) was performed to efficiently discover the optimal composition, i.e., the one exhibiting the highest Li-ion conductivity among the experimentally evaluated samples. We also optimised the composition of LCZSP using multi-task Gaussian process regression after transferring prior knowledge of 47 compositions of $\mathrm{Li}_{1+x+2y}\mathrm{Y}_{x}\mathrm{Ca}_{y}\mathrm{Zr}_{2-x-y}\mathrm{P}_{3}\mathrm{O}_{12}$ ($0 \le x \le 0.376$, $0 \le y \le 0.376$) (LYCZP), i.e., BO with transfer learning. The present study demonstrated that BO with transfer learning can find optimal compositions twice as fast as the conventional BO approach. This approach could be widely applied to the optimisation of various functional materials as well as ionic conductors.
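
As a quick consistency check on the LCZSP formula, the sketch below assumes formal valences Li(+1), Ca(+2), Zr(+4), Si(+4), P(+5) and O(-2) and verifies with exact fractions that every composition in the stated range is charge-neutral, which is why the Li content rises as 1 + x + 2y when Ca replaces Zr and Si replaces P. The 0.05 grid spacing is a hypothetical choice for illustration, not taken from the study.

```python
from fractions import Fraction

# Formal oxidation states assumed for the charge-balance check.
VALENCE = {"Li": 1, "Ca": 2, "Zr": 4, "Si": 4, "P": 5, "O": -2}


def lczsp_stoichiometry(x, y):
    """Element counts of Li(1+x+2y) Ca(y) Zr(2-y) Si(x) P(3-x) O(12)."""
    return {
        "Li": 1 + x + 2 * y,
        "Ca": y,
        "Zr": 2 - y,
        "Si": x,
        "P": 3 - x,
        "O": Fraction(12),
    }


def net_charge(counts):
    """Sum of count * formal valence; zero means the formula is charge-neutral."""
    return sum(VALENCE[element] * n for element, n in counts.items())


# Hypothetical grid: x and y from 0 to 0.3 in exact steps of 0.05.
step = Fraction(1, 20)
grid = [i * step for i in range(7)]
compositions = [(x, y) for x in grid for y in grid]

for x, y in compositions:
    assert net_charge(lczsp_stoichiometry(x, y)) == 0

print(len(compositions), "compositions checked")
```

Using `Fraction` instead of floats keeps the check exact, so the zero test is not affected by rounding.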

3.
Neural Comput ; 34(12): 2408-2431, 2022 Nov 08.
Article in English | MEDLINE | ID: mdl-36283050

ABSTRACT

Complex processes in science and engineering are often formulated as multistage decision-making problems. In this letter, we consider a cascade process, a type of multistage decision-making process in which the output of one stage is used as an input for the subsequent stage. When each stage is expensive to evaluate, it is difficult to search exhaustively for the optimal controllable parameters of each stage. To address this problem, we formulate the optimization of the cascade process as an extension of the Bayesian optimization framework and propose two types of acquisition functions based on credible intervals and expected improvement. We investigate the theoretical properties of the proposed acquisition functions and demonstrate their effectiveness through numerical experiments. In addition, we consider the suspension setting, an extension that often arises in practical problems, in which the cascade process may be suspended partway through the multistage decision-making process. We apply the proposed method to a test problem involving a solar cell simulator, which motivated this study.
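
For readers unfamiliar with the two acquisition families named above, here is a minimal sketch of their standard single-stage forms for a maximization problem, assuming a Gaussian posterior with mean `mu` and standard deviation `sigma` at a candidate. The cascade-process versions proposed in the letter are not reproduced; `xi`, `beta` and the candidate values are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    # erfc keeps precision in the lower tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` (maximization) under N(mu, sigma^2)."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * normal_cdf(z) + sigma * normal_pdf(z)


def upper_credible_bound(mu, sigma, beta=4.0):
    """Upper end of a credible interval: mu + sqrt(beta) * sigma."""
    return mu + math.sqrt(beta) * sigma


# Hypothetical posterior summaries (mean, std) for three candidate settings.
candidates = {"a": (0.80, 0.05), "b": (0.60, 0.40), "c": (0.20, 0.10)}
best_so_far = 0.70
for name, (mu, sigma) in candidates.items():
    print(name, expected_improvement(mu, sigma, best_so_far),
          upper_credible_bound(mu, sigma))
```

EI rewards candidates whose posterior puts mass above the current best, while the credible-bound score trades mean against uncertainty through `beta`.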

4.
Neural Comput ; 34(10): 2145-2203, 2022 Sep 12.
Article in English | MEDLINE | ID: mdl-36027725

ABSTRACT

Bayesian optimization (BO) is a popular method for expensive black-box optimization problems; however, querying the objective function at every iteration can be a bottleneck that hinders efficient search. In this regard, multifidelity Bayesian optimization (MFBO) aims to accelerate BO by incorporating lower-fidelity observations that are cheaper to sample. In our previous work, we proposed an information-theoretic approach to MFBO, referred to as multifidelity max-value entropy search (MF-MES), which inherits the practical effectiveness and computational simplicity of the well-known max-value entropy search (MES) for single-fidelity BO. However, MF-MES is still limited to the case in which a single observation is obtained sequentially. In this letter, we generalize MF-MES so that information gain can be evaluated even when multiple observations are obtained simultaneously. This generalization enables MF-MES to address two practical problem settings: synchronous parallelization and trace-aware querying. We show that the acquisition functions for these extensions inherit the simplicity of MF-MES without introducing additional assumptions. We also provide computational techniques for entropy evaluation and posterior sampling in the acquisition functions, which are common to all variants of MF-MES. The effectiveness of MF-MES is demonstrated using benchmark functions and real-world applications such as materials science data and hyperparameter tuning of machine-learning algorithms.
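
The single-fidelity MES criterion that MF-MES builds on has a closed form once samples of the unknown maximum y* are available. The sketch below assumes a Gaussian posterior at the candidate and takes the y* samples as given (how they are drawn is outside its scope). It does not implement the multifidelity, parallel, or trace-aware extensions described in the letter.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    # erfc keeps precision in the lower tail, where 1 + erf(...) would round to 0.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def mes_acquisition(mu, sigma, max_samples):
    """Single-fidelity max-value entropy search score (maximization).

    mu, sigma   : Gaussian posterior mean and standard deviation at a candidate.
    max_samples : sampled values y* of the unknown global maximum.
    Averages gamma * pdf(gamma) / (2 * cdf(gamma)) - log(cdf(gamma))
    over the samples, with gamma = (y* - mu) / sigma.
    """
    total = 0.0
    for y_star in max_samples:
        gamma = (y_star - mu) / sigma
        cdf = max(normal_cdf(gamma), 1e-300)  # crude guard against log(0)
        total += gamma * normal_pdf(gamma) / (2.0 * cdf) - math.log(cdf)
    return total / len(max_samples)


# Hypothetical posterior at two candidates and three sampled maxima.
y_star_samples = [1.2, 1.4, 1.5]
print(mes_acquisition(1.0, 0.3, y_star_samples))  # mean close to the sampled maxima
print(mes_acquisition(0.0, 0.3, y_star_samples))  # mean far below them
```

A candidate scores highly when observing it is expected to reveal much about where the maximum value lies, which happens when its posterior overlaps the sampled y* values.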

5.
Chem Commun (Camb) ; 58(67): 9328-9340, 2022 Aug 18.
Article in English | MEDLINE | ID: mdl-35950409

ABSTRACT

All-solid-state Li-ion batteries are of considerable interest as safer alternatives to Li-ion batteries containing flammable organic electrolytes. To date, however, achieving sufficient charging and discharging rates, in addition to capacity, at room temperature using these all-solid-state batteries has been challenging. To overcome these issues, material simulations and informatics investigations of a relatively new Na superionic conductor (NASICON)-type LiZr2(PO4)3 (LZP) electrolyte were conducted to elucidate its characteristics and material functions. The following thermodynamic and/or kinetic properties of NASICON-type Li-ion conductive oxides were investigated with respect to the crystal structure mainly using material simulation and informatics approaches: (1) the electrochemical stabilities of LZP materials with respect to Li metal and (2) Li-ion conductivities in the bulk and at the grain boundaries. An efficient materials informatics search method was employed to optimise the material functions of the LZP electrolyte via Bayesian optimisation. This study should promote the application of LZP in all-solid-state batteries for use in technologies such as mobile devices and electric vehicles and enable more complex composition and process control.

7.
Commun Biol ; 4(1): 362, 2021 03 19.
Article in English | MEDLINE | ID: mdl-33742139

ABSTRACT

Microbial rhodopsins are photoreceptive membrane proteins that are used as molecular tools in optogenetics. Here, a machine learning (ML)-based experimental design method is introduced for screening rhodopsins that are likely to be red-shifted relative to representative rhodopsins in the same subfamily. Among 3,022 ion-pumping rhodopsins identified by a protein BLAST search of several protein databases, the ML-based method selected 65 candidate rhodopsins. The absorption wavelengths of 39 of them could be determined experimentally by expressing the proteins in an Escherichia coli system, and 32 of these (82%, $p = 7.025 \times 10^{-5}$) actually showed red-shift gains. In addition, four showed red-shift gains >20 nm, and two were found to have desirable ion-transporting properties, indicating that they could be useful in optogenetics. These findings suggest that data-driven ML-based approaches can play an effective role in the experimental design of rhodopsin and other photobiological studies.


Subject(s)
Ion Channels/metabolism , Machine Learning , Optogenetics , Rhodopsins, Microbial/metabolism , Amino Acid Sequence , Bayes Theorem , Color , Databases, Protein , Escherichia coli/genetics , Escherichia coli/metabolism , Hydrogen-Ion Concentration , Ion Channels/genetics , Ion Channels/radiation effects , Light , Proof of Concept Study , Protein Conformation, alpha-Helical , Rhodopsins, Microbial/genetics , Rhodopsins, Microbial/radiation effects , Sequence Analysis, Protein
8.
Neural Comput ; 32(12): 2486-2531, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33080163

ABSTRACT

Identifying the conditions under which a product satisfies the desired properties is a fundamental problem in the manufacturing industry. If the condition and the property are regarded as the input and the output of a black-box function, respectively, this task can be interpreted as level set estimation (LSE): the problem of identifying input regions in which the function value is above (or below) a threshold. Although various methods for LSE problems have been developed, many issues remain to be solved before they can be used in practice. As one such issue, we consider the case where the input conditions cannot be controlled precisely, that is, LSE problems under input uncertainty. We introduce a basic framework for handling input uncertainty in LSE problems and then propose efficient methods with proper theoretical guarantees. The proposed methods and theories can be applied generally to a variety of challenges related to LSE under input uncertainty, such as cost-dependent input uncertainties and unknown input uncertainties. We apply the proposed methods to artificial and real data to demonstrate their applicability and effectiveness.
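
The LSE task can be made concrete with the classification and query rule of the basic setting without input uncertainty, which this letter extends. This sketch assumes a posterior mean and standard deviation are already available for each candidate input; the bound width `beta` is a hypothetical tuning parameter. A point is declared above or below the threshold h only when its whole interval mean +/- beta * std lies on one side, and the next query is the unresolved point whose interval straddles h the most, scored as min(upper - h, h - lower).

```python
def classify_level_set(means, stds, threshold, beta=3.0):
    """Label each candidate 'above', 'below' or 'unknown' relative to threshold."""
    labels = []
    for mu, sd in zip(means, stds):
        lower, upper = mu - beta * sd, mu + beta * sd
        if lower > threshold:
            labels.append("above")
        elif upper < threshold:
            labels.append("below")
        else:
            labels.append("unknown")
    return labels


def next_query(means, stds, threshold, beta=3.0):
    """Index of the most ambiguous unresolved candidate, or None if all are resolved."""
    best_index, best_score = None, float("-inf")
    for i, (mu, sd) in enumerate(zip(means, stds)):
        lower, upper = mu - beta * sd, mu + beta * sd
        score = min(upper - threshold, threshold - lower)  # >= 0 only if unresolved
        if score > best_score:
            best_index, best_score = i, score
    return best_index if best_score >= 0 else None


# Hypothetical posterior summaries for three candidate input conditions.
means, stds = [5.0, 0.0, 1.0], [0.1, 0.1, 1.0]
print(classify_level_set(means, stds, threshold=1.0))
print(next_query(means, stds, threshold=1.0))
```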

9.
Neural Comput ; 32(10): 1998-2031, 2020 10.
Article in English | MEDLINE | ID: mdl-32795233

ABSTRACT

In this letter, we study an active learning problem for maximizing an unknown linear function with high-dimensional binary features. This problem is notoriously complex but arises in many important contexts. When the sampling budget, that is, the number of possible function evaluations, is smaller than the number of dimensions, it tends to be impossible to identify all of the optimal binary features. Therefore, in practice, only a small number of such features are considered, with the majority kept fixed at certain default values, which we call the working set heuristic. The main contribution of this letter is to formally study the working set heuristic and present a suite of theoretically robust algorithms for more efficient use of the sampling budget. Technically, we introduce a novel method for estimating the confidence regions of model parameters that is tailored to active learning with high-dimensional binary features. We provide a rigorous theoretical analysis of these algorithms and prove that a commonly used working set heuristic can identify optimal binary features with favorable sample complexity. We explore the performance of the proposed approach through numerical simulations and an application to a functional protein design problem.


Subject(s)
Linear Models , Supervised Machine Learning , Bayes Theorem , Confidence Intervals , Humans
10.
Sci Rep ; 9(1): 15794, 2019 Oct 31.
Article in English | MEDLINE | ID: mdl-31673031

ABSTRACT

In this study, an efficient method for estimating material parameters from experimental data on precipitate shape is proposed. First, a computational model is developed that predicts the energetically favorable shape of a precipitate for a given d-dimensional material parameter (x). Second, the discrepancy (y) between the precipitate shape obtained through the experiment and that predicted by the computational model is calculated. Third, a Gaussian process (GP) is used to model the relation between x and y. Finally, to identify the "low-error region (LER)" in the material parameter space where y is less than a threshold, we introduce an adaptive sampling strategy in which the estimated GP model suggests the next candidate x to be sampled/calculated. To evaluate the effectiveness of the proposed method, we apply it to the estimation of interface energy and lattice mismatch between MgZn2 ([Formula: see text]) and α-Mg phases in an Mg-based alloy. The results show that the proposed method significantly decreases the number of precipitate-shape calculations required for LER estimation.
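
The third and fourth steps (GP model of the discrepancy, then adaptive sampling toward the low-error region) can be sketched in pure Python. This is a minimal illustration under stated assumptions: a one-dimensional parameter x instead of the paper's d-dimensional one, a zero-mean GP with a unit-variance RBF kernel and fixed hyperparameters, hypothetical discrepancy values, and an LSE-style ambiguity rule for choosing the next x, which is not necessarily the paper's sampling criterion.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(train_x, train_y, test_x, noise=1e-6, length_scale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at each test point."""
    n = len(train_x)
    K = [[rbf(train_x[i], train_x[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, train_y))  # (K + noise I)^-1 y
    means, stds = [], []
    for t in test_x:
        k_star = [rbf(t, a, length_scale) for a in train_x]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_lower(L, k_star)
        var = 1.0 - sum(vi * vi for vi in v)  # rbf(t, t) == 1
        stds.append(math.sqrt(max(var, 0.0)))
    return means, stds


def next_parameter(candidates, means, stds, threshold, beta=2.0):
    """Candidate whose credible interval is most ambiguous about y < threshold."""
    scores = [min(threshold - (m - beta * s), (m + beta * s) - threshold)
              for m, s in zip(means, stds)]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


# Hypothetical discrepancies y at three already-simulated parameter values x.
xs, ys = [0.0, 1.0, 2.5], [0.9, 0.2, 0.7]
grid = [i * 0.25 for i in range(13)]  # 0.0, 0.25, ..., 3.0
mu, sd = gp_posterior(xs, ys, grid)
print("next x to simulate:", next_parameter(grid, mu, sd, threshold=0.4))
```

Each new simulation would be appended to `xs`/`ys` and the loop repeated until the region with y below the threshold is resolved.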

11.
Neural Comput ; 31(12): 2432-2491, 2019 12.
Article in English | MEDLINE | ID: mdl-31614101

ABSTRACT

Distance metric learning has been widely used to obtain the optimal distance function based on the given training data. We focus on a triplet-based loss function, which imposes a penalty such that a pair of instances in the same class is closer than a pair in different classes. However, the number of possible triplets can be quite large even for a small data set, and this considerably increases the computational cost of metric optimization. In this letter, we propose safe triplet screening, which identifies triplets that can be safely removed from the optimization problem without losing optimality. In comparison with existing safe screening studies, triplet screening is particularly significant because of the huge number of possible triplets and the semidefinite constraint in the optimization problem. We demonstrate and verify the effectiveness of our screening rules by using several benchmark data sets.
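
To make the triplet objective concrete, the sketch below uses a common hinge form, max(0, margin + d_M(x_i, x_j) - d_M(x_i, x_l)) with x_j in the same class as x_i and x_l in a different class, and a squared Mahalanobis distance d_M. The margin of 1 and the toy data are assumptions, and the safe screening rules themselves are not shown. Enumerating all triplets also shows why their number grows roughly cubically with the number of instances.

```python
def mahalanobis_sq(u, v, M):
    """Squared distance (u - v)^T M (u - v); M is a square matrix as nested lists."""
    d = [a - b for a, b in zip(u, v)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def all_triplets(labels):
    """Every (i, j, l) with j != i in the same class as i and l in another class."""
    n = len(labels)
    return [(i, j, l)
            for i in range(n) for j in range(n) for l in range(n)
            if j != i and labels[j] == labels[i] and labels[l] != labels[i]]


def triplet_hinge_loss(X, triplets, M, margin=1.0):
    """Sum of max(0, margin + d_M(x_i, x_j) - d_M(x_i, x_l)) over the triplets."""
    return sum(max(0.0, margin + mahalanobis_sq(X[i], X[j], M)
                   - mahalanobis_sq(X[i], X[l], M))
               for i, j, l in triplets)


# Hypothetical toy data: two classes separated along the first axis.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
labels = [0, 0, 1, 1]
T = all_triplets(labels)
identity = [[1.0, 0.0], [0.0, 1.0]]
second_axis_only = [[0.0, 0.0], [0.0, 1.0]]
print(len(T), triplet_hinge_loss(X, T, identity), triplet_hinge_loss(X, T, second_axis_only))
```

A metric that ignores the informative first axis (`second_axis_only`) incurs a positive loss, while the identity metric satisfies every triplet with margin to spare; such zero-loss triplets are the kind screening aims to discard early.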

12.
Sci Rep ; 8(1): 15580, 2018 10 22.
Article in English | MEDLINE | ID: mdl-30349075

ABSTRACT

The light-dependent ion-transport function of microbial rhodopsin has been widely used in optogenetics for optical control of neural activity. To increase the variety of rhodopsin proteins with a wide range of absorption wavelengths, the light-absorption properties of various wild-type rhodopsins and their artificially mutated variants have been investigated in the literature. Here, we demonstrate that a machine-learning-based (ML-based) data-driven approach is useful for understanding and predicting the light-absorption properties of microbial rhodopsin proteins. We constructed a database of 796 proteins consisting of microbial rhodopsin wild types and their variants. We then proposed an ML method that produces a statistical model describing the relationship between amino-acid sequences and absorption wavelengths, and demonstrated that the fitted statistical model is useful for understanding colour-tuning rules and predicting absorption wavelengths. By applying the ML method to the database, we newly identified two residues, not considered in previous studies, as important for colour shift.


Subject(s)
Chemical Phenomena , Color , Mutant Proteins/chemistry , Rhodopsins, Microbial/chemistry , Machine Learning , Models, Statistical , Mutant Proteins/genetics , Rhodopsins, Microbial/genetics
13.
Cell Syst ; 5(5): 485-497.e3, 2017 11 22.
Article in English | MEDLINE | ID: mdl-28988802

ABSTRACT

We report the results of a DREAM challenge designed to predict relative genetic essentialities based on a novel dataset testing 98,000 shRNAs against 149 molecularly characterized cancer cell lines. We analyzed the results of over 3,000 submissions over a period of 4 months. We found that algorithms combining essentiality data across multiple genes demonstrated increased accuracy; gene expression was the most informative molecular data type; the identity of the gene being predicted was far more important than the modeling strategy; well-predicted genes and selected molecular features showed enrichment in functional categories; and frequently selected expression features correlated with survival in primary tumors. This study establishes benchmarks for gene essentiality prediction, presents a community resource for future comparison with this benchmark, and provides insights into factors influencing the ability to predict gene essentiality from functional genetic screens. This study also demonstrates the value of releasing pre-publication data publicly to engage the community in an open research collaboration.


Subject(s)
Gene Expression/genetics , Genes, Essential/genetics , Algorithms , Cell Line, Tumor , Genomics/methods , Humans , RNA, Small Interfering/genetics
14.
Brief Bioinform ; 18(4): 619-633, 2017 07 01.
Article in English | MEDLINE | ID: mdl-27197545

ABSTRACT

Triple-negative (TN) breast cancer (BC) patients have limited treatment options and a poor prognosis even after existing treatments and standard chemotherapeutic regimens. Linking TN patients to clinically known phenotypes with appropriate treatments is vital. Location-specific sequence variants are expected to be useful for this purpose because they can identify subgroups within a disease population. Single-gene mutational signatures with related phenotypes have been widely reported in the literature. We thoroughly survey currently available mutations (and mutated genes) linked to BC phenotypes to demonstrate their limited performance as sole predictors/biomarkers for assigning phenotypes to patients. We then explore mutational combinations, as a pilot study, using The Cancer Genome Atlas Research Network mutational data of BC and three machine learning methods: association rules (limitless arity multiple procedure), decision tree and hierarchical disjoint clustering. The study yields a patient classification scheme based on combinatorial mutations in Phosphatidylinositol-4,5-Bisphosphate 3-Kinase and tumor protein 53 that is consistent across all three methods, implying its validity from diverse viewpoints. However, further research is warranted to select multi-gene signatures that identify phenotypes specifically and can be used routinely in the clinic.
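
The association-rule idea behind the combinatorial analysis can be illustrated with the two basic rule measures, support and confidence. This sketch uses hypothetical patient records; "PI3K" is a stand-in label for the kinase gene named in the abstract, and the statistical significance testing that the limitless arity multiple procedure adds is not shown.

```python
def support(transactions, itemset):
    """Fraction of transactions (patients) containing every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical patient records: mutated genes plus an observed phenotype label.
patients = [
    {"PI3K", "TP53", "phenotype_A"},
    {"PI3K", "TP53", "phenotype_A"},
    {"PI3K", "phenotype_B"},
    {"TP53", "phenotype_B"},
    {"TP53", "phenotype_A"},
]
print(confidence(patients, {"TP53"}, {"phenotype_A"}))          # single gene
print(confidence(patients, {"PI3K", "TP53"}, {"phenotype_A"}))  # gene pair
```

In this toy data the gene pair predicts the phenotype more reliably than the single gene, mirroring the abstract's argument that single-gene markers are weaker than mutational combinations.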


Subject(s)
Breast Neoplasms , Humans , Mutation , Phenotype , Pilot Projects
15.
IEEE Trans Neural Netw Learn Syst ; 24(12): 1999-2012, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24805218

ABSTRACT

Graph-based approaches have been highly successful in semisupervised learning. In this paper, we focus on label propagation in graph-based semisupervised learning. One essential point of label propagation is that its performance depends heavily on how well the input graph incorporates the underlying manifold of the given data. Another, more important point is that in many recent real-world applications, the same instances are represented by multiple heterogeneous data sources. A key challenge in this setting is to integrate the different data representations automatically to achieve better predictive performance. In this paper, we address the problem of obtaining the optimal linear combination of multiple different graphs in the label propagation setting. For this problem, we propose a new formulation with a sparsity property (in the coefficients of the graph combination) that no existing method properly achieves. This unique feature provides two important advantages: 1) improved prediction performance by eliminating irrelevant or noisy graphs and 2) interpretable results, i.e., easy identification of the graphs that are informative for classification. We propose efficient optimization algorithms for the proposed approach, which also provide a clear interpretation of the mechanism behind sparsity. Through experiments on various synthetic data sets and two real-world data sets, we empirically demonstrate the advantages of our approach in both prediction performance and graph selection ability.
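
The setting above (label propagation on a linear combination of graphs) can be sketched with harmonic-function label propagation: unlabeled nodes repeatedly take the weighted average of their neighbours while labeled nodes stay clamped. The combination weights here are fixed, hypothetical values; the paper instead learns sparse combination coefficients, which this sketch does not attempt. Setting a weight to zero mimics discarding a noisy graph.

```python
def combine_graphs(graphs, weights):
    """Weighted sum W = sum_k beta_k * W_k of adjacency matrices (nested lists)."""
    n = len(graphs[0])
    return [[sum(b * G[i][j] for b, G in zip(weights, graphs)) for j in range(n)]
            for i in range(n)]


def propagate_labels(W, labels, iterations=1000, tol=1e-12):
    """Harmonic label propagation with labeled nodes clamped.

    labels: dict mapping node index -> +1.0 or -1.0 for the labeled nodes.
    Unlabeled scores are updated in place (Gauss-Seidel) until they stop changing.
    """
    n = len(W)
    f = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(iterations):
        change = 0.0
        for i in range(n):
            if i in labels:
                continue
            degree = sum(W[i])
            if degree == 0:
                continue
            new = sum(W[i][j] * f[j] for j in range(n)) / degree
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f


# Hypothetical graphs over four nodes: an informative path and a noisy cross graph.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
noisy = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
W = combine_graphs([path, noisy], weights=[1.0, 0.0])  # noisy graph switched off
f = propagate_labels(W, {0: 1.0, 3: -1.0})
print([round(v, 3) for v in f])
```

On the path graph the harmonic solution interpolates linearly between the two labeled end nodes, so the sign of each score gives the predicted class.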

16.
Neural Netw ; 34: 46-55, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22831849

ABSTRACT

Canonical correlation analysis (CCA) is a classical dimensionality reduction technique for two sets of variables that iteratively finds projection directions with maximum correlation. Although CCA is still widely used in many practical application areas, recent real-world data often contain more complicated nonlinear correlations that cannot be properly captured by classical CCA. In this paper, we thus propose an extension of CCA that can effectively capture such complicated nonlinear correlations through statistical dependency maximization. The proposed method, which we call least-squares canonical dependency analysis (LSCDA), is based on a squared-loss variant of mutual information, and it has various useful properties besides its ability to capture higher-order correlations: for example, it can simultaneously find multiple projection directions (i.e., subspaces), it does not involve density estimation, and it is equipped with a model selection strategy. We demonstrate the usefulness of LSCDA through various experiments on artificial and real-world datasets.
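
For a single pair of scalar variables, the first CCA direction reduces to the absolute Pearson correlation, which the sketch below computes. It illustrates the motivation stated above: a purely linear criterion reports essentially zero correlation for the deterministic but nonlinear relation y = x^2 on inputs symmetric about zero. LSCDA's squared-loss mutual information estimator is not implemented here.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [i / 10 for i in range(-10, 11)]  # symmetric around zero
print(pearson(x, [2 * a + 1 for a in x]))  # linear dependence
print(pearson(x, [a * a for a in x]))      # deterministic but nonlinear dependence
```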


Subject(s)
Algorithms , Artificial Intelligence , Artificial Intelligence/statistics & numerical data , Databases, Factual , Least-Squares Analysis
17.
IEEE Trans Neural Netw ; 22(10): 1613-25, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21880570

ABSTRACT

Regularization path algorithms have been proposed to deal with the model selection problem in several machine learning approaches. These algorithms compute the entire path of solutions for every value of the regularization parameter by exploiting the fact that the solution paths are piecewise linear. In this paper, we extend the applicability of the regularization path algorithm to a class of learning machines that have a quadratic loss and a quadratic penalty term. This class contains several important learning machines, such as the squared hinge loss support vector machine (SVM) and the modified Huber loss SVM. We first show that the solution paths of this class of learning machines are piecewise nonlinear, and that the segments between two breakpoints are characterized by a class of rational functions. We then develop an algorithm that can efficiently follow the piecewise nonlinear path by solving these rational equations. To solve them, we use a rational approximation technique with a quadratic convergence rate; thus, our algorithm can follow the nonlinear path much more precisely than existing approaches such as predictor-corrector type nonlinear-path approximation. We show the algorithm's performance on several artificial and real data sets.
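
The piecewise-rational structure can be seen in a toy one-dimensional squared hinge loss SVM without bias, minimising lam/2 * w^2 + sum_i max(0, 1 - a_i w)^2 with hypothetical margins a_i = y_i x_i > 0. For a fixed active set A (points with a_i w < 1), setting the derivative to zero gives w(lam) = 2 * sum_A a_i / (lam + 2 * sum_A a_i^2), a rational function of lam; breakpoints occur where a point enters or leaves A. This sketch simply solves each lam by checking candidate active sets; it is not the paper's path-following algorithm or its rational-approximation solver.

```python
def squared_hinge_weight(a, lam, tol=1e-12):
    """Minimiser of lam/2 * w**2 + sum_i max(0, 1 - a_i * w)**2 (1-D, no bias).

    Assumes every a_i = y_i * x_i is positive and lam > 0, so the optimum is w > 0
    and the active set {i : a_i * w < 1} consists of the smallest a_i.
    """
    a = sorted(a)
    for k in range(len(a), 0, -1):  # candidate active set = the k smallest margins
        active, inactive = a[:k], a[k:]
        w = 2.0 * sum(active) / (lam + 2.0 * sum(v * v for v in active))
        if all(v * w <= 1 + tol for v in active) and all(v * w >= 1 - tol for v in inactive):
            return w
    raise ValueError("no consistent active set; need all a_i > 0 and lam > 0")


def objective_gradient(a, lam, w):
    """Derivative of the objective at w; zero at the optimum."""
    return lam * w - 2.0 * sum(max(0.0, 1.0 - v * w) * v for v in a)


margins = [0.5, 1.0, 2.0]  # hypothetical a_i values
for lam in (100.0, 1.0, 0.01):
    w = squared_hinge_weight(margins, lam)
    print(lam, w, objective_gradient(margins, lam, w))
```

As lam decreases, points leave the active set one by one (all three at lam = 100, two at lam = 1, one at lam = 0.01), and each change switches w(lam) to a different rational segment.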


Subject(s)
Algorithms , Artificial Intelligence , Neural Networks, Computer , Nonlinear Dynamics , Computers , Humans , Software Design
18.
IEEE Trans Neural Netw ; 21(7): 1048-59, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20550990

ABSTRACT

We propose a multiple incremental decremental algorithm for support vector machines (SVM). In online learning, the trained model must be updated when new observations arrive and/or some observations become obsolete. To add or remove a single data point, the conventional single incremental decremental algorithm can update the model efficiently. However, to add and/or remove multiple data points, the computational cost of the current update algorithm becomes prohibitive because it must be applied repeatedly for each data point. In this paper, we develop an extension of the incremental decremental algorithm that works efficiently for the simultaneous update of multiple data points. Analyses and experimental results show that the proposed algorithm can substantially reduce the computational cost. Our approach is especially useful for online SVM learning, in which old data points must be removed and new data points added within a short time.


Subject(s)
Algorithms , Artificial Intelligence , Information Storage and Retrieval , Learning , Computer Simulation , Humans , Online Systems , Pattern Recognition, Automated/methods