Search | BVS IEC

graphkernels: R and Python packages for graph comparison.

Sugiyama, Mahito; Ghisu, M Elisabetta; Llinares-López, Felipe; Borgwardt, Karsten.

Bioinformatics ; 34(3): 530-532, 2018 Feb 01.

Article in English | MEDLINE | ID: mdl-29028902

ABSTRACT

Summary: Measuring the similarity of graphs is a fundamental step in the analysis of graph-structured data, which is omnipresent in computational biology. Graph kernels have been proposed as a powerful and efficient approach to this problem of graph comparison. Here we provide graphkernels, the first R and Python graph kernel libraries including baseline kernels such as label histogram based kernels, classic graph kernels such as random walk based kernels, and the state-of-the-art Weisfeiler-Lehman graph kernel. The core of all graph kernels is implemented in C++ for efficiency. Using the kernel matrices computed by the package, we can easily perform tasks such as classification, regression and clustering on graph-structured samples. Availability and implementation: The R and Python packages including source code are available at https://CRAN.R-project.org/package=graphkernels and https://pypi.python.org/pypi/graphkernels. Contact: mahito@nii.ac.jp or elisabetta.ghisu@bsse.ethz.ch. Supplementary information: Supplementary data are available online at Bioinformatics.
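As an illustration of the simplest baseline the abstract mentions, a label histogram based kernel can be sketched in plain Python. This is not the graphkernels API: the function names and the representation of a graph as a bare list of vertex labels are assumptions made for this example. The kernel value is the dot product of the two graphs' vertex-label count vectors.

```python
from collections import Counter


def vertex_histogram_kernel(labels_a, labels_b):
    """Dot product of the vertex-label histograms of two graphs.

    Each graph is given only as a list of its vertex labels
    (a hypothetical, simplified representation).
    """
    hist_a, hist_b = Counter(labels_a), Counter(labels_b)
    # Counter returns 0 for labels that do not occur in hist_b.
    return sum(count * hist_b[label] for label, count in hist_a.items())


def kernel_matrix(graphs):
    """Symmetric matrix of pairwise kernel values for a list of graphs."""
    return [[vertex_histogram_kernel(g, h) for h in graphs] for g in graphs]


# Toy data: three "graphs" described by atom-type labels.
graphs = [["C", "C", "O"], ["C", "O", "O"], ["N", "N"]]
K = kernel_matrix(graphs)
```

A matrix like `K` is the kind of object that a kernel method (for example, a support vector machine with a precomputed kernel) would consume for classification, regression or clustering.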

Subjects

Computational Biology/methods; Software

Genome-wide detection of intervals of genetic heterogeneity associated with complex traits.

Llinares-López, Felipe; Grimm, Dominik G; Bodenham, Dean A; Gieraths, Udo; Sugiyama, Mahito; Rowan, Beth; Borgwardt, Karsten.

Bioinformatics ; 31(12): i240-9, 2015 Jun 15.

Article in English | MEDLINE | ID: mdl-26072488

ABSTRACT

MOTIVATION: Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon that is of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: (i) they require the definition of an exact interval in the genome that is to be tested for genetic heterogeneity, potentially missing intervals of high relevance, or (ii) they suffer from an enormous multiple hypothesis testing problem due to the large number of potential candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals. RESULTS: Here, we present an approach that overcomes both problems: it allows one to automatically find all contiguous sequences of single nucleotide polymorphisms in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana genome-wide association study data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping. CONCLUSIONS: Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes. AVAILABILITY AND IMPLEMENTATION: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html.
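To make the scale of the multiple testing problem concrete, the sketch below counts the contiguous intervals that can be formed from a sequence of SNPs and shows the per-test threshold a naive Bonferroni correction would impose. The function names are hypothetical, and Bonferroni is used here only to illustrate the burden described in the abstract; it is not the method the paper proposes.

```python
def contiguous_intervals(num_snps):
    """Yield every contiguous interval as an inclusive (start, end) index pair."""
    for start in range(num_snps):
        for end in range(start, num_snps):
            yield (start, end)


def count_intervals(num_snps):
    """Closed form for the number of contiguous intervals: n(n + 1) / 2."""
    return num_snps * (num_snps + 1) // 2


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level under a naive Bonferroni correction."""
    return alpha / num_tests


# Four SNPs already give ten candidate intervals.
small = list(contiguous_intervals(4))

# The count grows quadratically with the number of SNPs.
n_tests = count_intervals(1000)
threshold = bonferroni_threshold(0.05, n_tests)
```

With 1000 SNPs there are 500500 candidate intervals, so the naive per-test threshold falls to roughly 1e-7, which illustrates why testing every interval directly costs statistical power.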

Subjects

Genetic Heterogeneity; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Algorithms; Arabidopsis/genetics; Phenotype

Efficient network-guided multi-locus association mapping with graph cuts.

Azencott, Chloé-Agathe; Grimm, Dominik; Sugiyama, Mahito; Kawahara, Yoshinobu; Borgwardt, Karsten M.

Bioinformatics ; 29(13): i171-9, 2013 Jul 01.

Article in English | MEDLINE | ID: mdl-23812981

ABSTRACT

MOTIVATION: As an increasing number of genome-wide association studies reveal the limitations of the attempt to explain phenotypic heritability by single genetic loci, there is a recent focus on associating complex phenotypes with sets of genetic loci. Although several methods for multi-locus mapping have been proposed, it is often unclear how to relate the detected loci to the growing knowledge about gene pathways and networks. The few methods that take biological pathways or networks into account are either restricted to investigating a limited number of predetermined sets of loci or do not scale to genome-wide settings. RESULTS: We present SConES, a new efficient method to discover sets of genetic loci that are maximally associated with a phenotype while being connected in an underlying network. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints, which can be solved exactly and rapidly. SConES outperforms state-of-the-art competitors in terms of runtime, scales to hundreds of thousands of genetic loci and exhibits higher power in detecting causal SNPs in simulation studies than other methods. On flowering time phenotypes and genotypes from Arabidopsis thaliana, SConES detects loci that enable accurate phenotype prediction and that are supported by the literature. AVAILABILITY: Code is available at http://webdav.tuebingen.mpg.de/u/karsten/Forschung/scones/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
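The minimum cut reformulation can be illustrated on a toy network. The abstract does not state the objective, so the form used below is an assumption: choose a feature set that maximizes the sum of (score - eta) over selected features minus lam times the total weight of network edges that join a selected feature to an unselected one. Under that assumption the optimum is the source side of an s-t minimum cut. The function name, parameters and the small Edmonds-Karp max-flow routine are written for this example and are not taken from the SConES code.

```python
from collections import deque


def select_connected_features(scores, edges, eta, lam):
    """Solve the assumed objective exactly via an s-t minimum cut."""
    n = len(scores)
    source, sink = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, c in enumerate(scores):
        if c > eta:
            cap[source][i] = c - eta   # cost of dropping a high-scoring feature
        else:
            cap[i][sink] = eta - c     # cost of keeping a low-scoring feature
    for i, j, w in edges:              # undirected network edges
        cap[i][j] += lam * w
        cap[j][i] += lam * w

    def residual_parents():
        """Breadth-first search from the source in the residual graph."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = residual_parents()
        if sink not in parent:
            # No augmenting path left: features still reachable from the
            # source form the source side of a minimum cut.
            return sorted(v for v in parent if v < n)
        # Find the bottleneck on the augmenting path, then push flow along it.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]


# Toy example: five loci on a path network 0-1-2-3-4 with unit edge weights.
scores = [5, 1, 4, 0, 0]
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)]
selected = select_connected_features(scores, edges, eta=2, lam=1)
```

In this toy example the weakly associated locus 1 is selected along with loci 0 and 2 because it connects them in the network, which is the behaviour that a connectivity term is meant to encourage.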

Subjects

Genetic Loci; Genome-Wide Association Study/methods; Phenotype; Polymorphism, Single Nucleotide; Arabidopsis/genetics; Arabidopsis/growth & development; Flowers; Genotype; Humans

Molecular Graph Generation by Decomposition and Reassembling.

Yamada, Masatsugu; Sugiyama, Mahito.

ACS Omega ; 8(22): 19575-19586, 2023 Jun 06.

Article in English | MEDLINE | ID: mdl-37305268

ABSTRACT

Designing molecular structures with desired chemical properties is an essential task in drug discovery and materials design. However, finding molecules with the optimized desired properties is still a challenging task due to combinatorial explosion of the candidate space of molecules. Here we propose a novel decomposition-and-reassembling-based approach, which does not include any optimization in hidden space, and our generation process is highly interpretable. Our method is a two-step procedure: In the first decomposition step, we apply frequent subgraph mining to a molecular database to collect a smaller size of subgraphs as building blocks of molecules. In the second reassembling step, we search desirable building blocks guided via reinforcement learning and combine them to generate new molecules. Our experiments show that our method not only can find better molecules in terms of two standard criteria, the penalized log P and druglikeness, but also can generate drug molecules showing the valid intermediate molecules.

Artificial Neural Networks Applied as Molecular Wave Function Solvers.

Yang, Peng-Jian; Sugiyama, Mahito; Tsuda, Koji; Yanai, Takeshi.

J Chem Theory Comput ; 16(6): 3513-3529, 2020 Jun 09.

Article in English | MEDLINE | ID: mdl-32320233

ABSTRACT

We use artificial neural networks (ANNs) based on the Boltzmann machine (BM) architectures as an encoder of ab initio molecular many-electron wave functions represented with the complete active space configuration interaction (CAS-CI) model. As first introduced by the work of Carleo and Troyer for physical systems, the coefficients of the electronic configurations in the CI expansion are parametrized with the BMs as a function of their occupancies that act as descriptors. This ANN-based wave function ansatz is referred to as the neural-network quantum state (NQS). The machine learning is used for training the BMs in terms of finding a variationally optimal form of the ground-state wave function on the basis of the energy minimization. It is relevant to reinforcement learning and does not use any reference data nor prior knowledge of the wave function, while the Hamiltonian is given based on a user-specified chemical structure in the first-principles manner. Carleo and Troyer used the restricted Boltzmann machine (RBM), which has hidden units, for the neural network architecture of NQS, while, in this study, we further introduce its replacement with the BM that has only visible units but with different orders of connectivity. For this hidden-node free BM, the second- and third-order BMs based on quadratic and cubic energy functions, respectively, were implemented. We denote these second- and third-order BMs as BM2 and BM3, respectively. The pilot implementation of the NQS solver into an exact diagonalization module of the quantum chemistry program was made to assess the capability of variants of the BM-based NQS. The test calculations were performed by determining the CAS-CI wave functions of illustrative molecular systems, indocyanine green, and dinitrogen dissociation. The simulated energies have been shown to converge to CAS-CI energy in most cases by improving RBM with an increasing number of hidden nodes. 
BM3 systematically yields lower energies than BM2, reproducing the CAS-CI energies of dinitrogen across potential energy curves within an error of 50 µEh.
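The idea of a hidden-node free Boltzmann machine with quadratic or cubic energy terms can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the parameter names, the sign convention, and the restriction to real, positive amplitudes are all choices made for this example. Each configuration is an occupancy vector of 0s and 1s, and its unnormalized coefficient is the exponential of a polynomial in those occupancies.

```python
import math
from itertools import combinations


def bm2_log_amplitude(occ, bias, pair):
    """Second-order (quadratic) model: linear terms plus pairwise couplings."""
    n = len(occ)
    linear = sum(bias[i] * occ[i] for i in range(n))
    quadratic = sum(pair[i][j] * occ[i] * occ[j]
                    for i, j in combinations(range(n), 2))
    return linear + quadratic


def bm3_log_amplitude(occ, bias, pair, triple):
    """Third-order (cubic) model: BM2 plus three-body couplings.

    `triple` is a hypothetical sparse dict mapping (i, j, k) with i < j < k
    to a coupling strength; missing entries count as zero.
    """
    cubic = sum(triple.get((i, j, k), 0.0) * occ[i] * occ[j] * occ[k]
                for i, j, k in combinations(range(len(occ)), 3))
    return bm2_log_amplitude(occ, bias, pair) + cubic


def normalized_coefficients(configs, log_amplitude):
    """Exponentiate the log-amplitudes and normalize to unit 2-norm."""
    amps = [math.exp(log_amplitude(c)) for c in configs]
    norm = math.sqrt(sum(a * a for a in amps))
    return [a / norm for a in amps]


# Toy example: three orbitals, two configurations, arbitrary parameters.
bias = [0.1, -0.2, 0.3]
pair = [[0.0, 0.5, -0.1], [0.0, 0.0, 0.2], [0.0, 0.0, 0.0]]
triple = {(0, 1, 2): -0.4}
configs = [(1, 1, 0), (1, 1, 1)]
coeffs = normalized_coefficients(
    configs, lambda c: bm3_log_amplitude(c, bias, pair, triple))
```

In a variational solver of the kind the abstract describes, parameters such as `bias`, `pair` and `triple` would be adjusted to minimize the energy; that optimization loop is omitted here.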
