Results 1 - 15 of 15
1.
BMC Bioinformatics ; 25(1): 193, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755527

ABSTRACT

We have developed AMRViz, a toolkit for analyzing, visualizing, and managing bacterial genomics samples. The toolkit is bundled with the current best-practice analysis pipeline, allowing researchers to perform comprehensive analysis of a collection of samples directly from raw sequencing data with a single command line. The analysis results in a report showing the genome structure, genome annotations, antibiotic resistance and virulence profile for each sample. The pan-genome of all samples in the collection is analyzed to identify core and accessory genes. Phylogenies of the whole genome as well as all gene clusters are also generated. The toolkit provides a web-based visualization dashboard allowing researchers to interactively examine various aspects of the analysis results. Availability: AMRViz is implemented in Python and NodeJS, and is publicly available under the open-source MIT license at https://github.com/amromics/amrviz.


Subjects
Bacterial Genome, Genomics, Software, Genomics/methods, Bacterial Drug Resistance/genetics, Phylogeny, Bacteria/genetics, Bacteria/drug effects, Antibacterial Agents/pharmacology
2.
Nucleic Acids Res ; 52(3): e15, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38084888

ABSTRACT

Whole genome sequencing has increasingly become the essential method for studying the genetic mechanisms of antimicrobial resistance and for surveillance of drug-resistant bacterial pathogens. The majority of bacterial genomes sequenced to date have been sequenced with Illumina sequencing technology, owing to its high throughput, excellent sequence accuracy, and low cost. However, because of the short-read nature of the technology, these assemblies are fragmented into large numbers of contigs, which hinders recovery of the full information of the genome. We develop Pasa, a graph-based algorithm that utilizes the pangenome graph and the assembly graph information to improve scaffolding quality. By leveraging the population information of the bacterial species, Pasa is able to utilize the linkage information of the gene families of the species to resolve the contig graph of the assembly. We show that our method outperforms the current state of the art in terms of accuracy and, at the same time, is computationally efficient enough to be applied to a large number of existing draft assemblies.


Subjects
Algorithms, Bacteria, Bacterial Genome, Bacteria/classification, Bacteria/genetics, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods
3.
Article in English | MEDLINE | ID: mdl-37018091

ABSTRACT

Predicting drug-drug interactions (DDIs) is the problem of predicting side effects (unwanted outcomes) of a pair of drugs using drug information and known side effects of many pairs. This problem can be formulated as predicting labels (i.e., side effects) for each pair of nodes in a DDI graph, whose nodes are drugs and whose edges are interacting drug pairs with known labels. State-of-the-art methods for this problem are graph neural networks (GNNs), which leverage neighborhood information in the graph to learn node representations. For DDI, however, there are many labels with complicated relationships due to the nature of side effects. Usual GNNs often fix labels as one-hot vectors that do not reflect label relationships and potentially do not obtain the highest performance in the difficult cases of infrequent labels. In this brief, we formulate DDI as a hypergraph where each hyperedge is a triple: two nodes for drugs and one node for a label. We then present CentSmoothie, a hypergraph neural network (HGNN) that learns representations of nodes and labels altogether with a novel "central-smoothing" formulation. We empirically demonstrate the performance advantages of CentSmoothie in simulations as well as real datasets.
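
The hypergraph formulation in this abstract (each hyperedge is a triple of two drug nodes and one label node) can be pictured with a small data-structure sketch. This is only an illustration under assumed, made-up records; the drug and side-effect names and both helper functions are hypothetical and are not taken from CentSmoothie.

```python
from collections import defaultdict

# Hypothetical toy records: (drug_a, drug_b, side_effect). Names are illustrative only.
records = [
    ("drug_A", "drug_B", "nausea"),
    ("drug_A", "drug_C", "headache"),
    ("drug_B", "drug_C", "nausea"),
]


def build_hyperedges(records):
    """Return a set of hyperedges, each an (unordered drug pair, label) triple."""
    edges = set()
    for a, b, label in records:
        # frozenset makes the drug pair order-independent
        edges.add((frozenset((a, b)), label))
    return edges


def incident_hyperedges(edges):
    """Map every node (drug or label) to the hyperedges it takes part in."""
    index = defaultdict(list)
    for drugs, label in edges:
        for node in (*drugs, label):
            index[node].append((drugs, label))
    return index


edges = build_hyperedges(records)
index = incident_hyperedges(edges)
print(len(edges), len(index["nausea"]), len(index["drug_A"]))
```

Treating the label as a node of the hyperedge, rather than as a fixed one-hot target, is what lets a model learn label representations jointly with drug representations.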

4.
Bioinformatics ; 38(Suppl 1): i333-i341, 2022 Jun 24.
Article in English | MEDLINE | ID: mdl-35758803

ABSTRACT

MOTIVATION: Predicting side effects of drug-drug interactions (DDIs) is an important task in pharmacology. The state-of-the-art methods for DDI prediction use hypergraph neural networks to learn latent representations of drugs and side effects to express high-order relationships among two interacting drugs and a side effect. The idea of these methods is that each side effect is caused by a unique combination of latent features of the corresponding interacting drugs. However, in reality, a side effect might have multiple, different mechanisms that cannot be represented by a single combination of latent features of drugs. Moreover, DDI data are sparse, suggesting that using a sparsity regularization would help to learn better latent representations and improve prediction performance. RESULTS: We propose SPARSE, which encodes the DDI hypergraph and drug features to latent spaces to learn multiple types of combinations of latent features of drugs and side effects, controlling the model sparsity by a sparse prior. Our extensive experiments using both synthetic and three real-world DDI datasets showed the clear predictive performance advantage of SPARSE over cutting-edge competing methods. Also, latent feature analysis over unknown top predictions by SPARSE demonstrated the interpretability advantage contributed by the model sparsity. AVAILABILITY AND IMPLEMENTATION: Code and data can be accessed at https://github.com/anhnda/SPARSE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Drug-Related Side Effects and Adverse Reactions, Computer Neural Networks, Drug Interactions, Humans
5.
iScience ; 24(1): 102002, 2021 Jan 22.
Article in English | MEDLINE | ID: mdl-33490910

ABSTRACT

The biological carbon pump, in which carbon fixed by photosynthesis is exported to the deep ocean through sinking, is a major process in Earth's carbon cycle. The proportion of primary production that is exported is termed the carbon export efficiency (CEE). Based on in-lab or regional scale observations, viruses were previously suggested to affect the CEE (i.e., viral "shunt" and "shuttle"). In this study, we tested associations between viral community composition and CEE measured at a global scale. A regression model based on relative abundance of viral marker genes explained 67% of the variation in CEE. Viruses with high importance in the model were predicted to infect ecologically important hosts. These results are consistent with the view that the viral shunt and shuttle functions at a large scale and further imply that viruses likely act in this process in a way dependent on their hosts and ecosystem dynamics.

6.
Brief Bioinform ; 22(1): 164-177, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-31838499

ABSTRACT

MOTIVATION: Adverse drug reaction (ADR) or drug side effect studies play a crucial role in drug discovery. Recently, with the rapid increase of both clinical and non-clinical data, machine learning methods have emerged as prominent tools to support analyzing and predicting ADRs. Nonetheless, challenges remain in ADR studies. RESULTS: In this paper, we summarize ADR data sources and review ADR studies in three tasks: drug-ADR benchmark data creation, drug-ADR prediction and ADR mechanism analysis. We focus on machine learning methods used in each task and then compare the performance of the methods on the drug-ADR prediction task. Finally, we discuss open problems for further ADR studies. AVAILABILITY: Data and code are available at https://github.com/anhnda/ADRPModels.


Subjects
Computational Biology/methods, Drug-Related Side Effects and Adverse Reactions/etiology, Machine Learning, Humans
7.
IEEE Trans Pattern Anal Mach Intell ; 43(8): 2710-2722, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32086195

ABSTRACT

A hypergraph is a general way of representing high-order relations on a set of objects. It is a generalization of a graph, in which only pairwise relations can be represented. It finds applications in various domains where relationships of more than two objects are observed. On a hypergraph, as a generalization of a graph, one wishes to learn a smooth function with respect to its topology. A fundamental issue is to find suitable smoothness measures of functions on the nodes of a graph/hypergraph. We show a general framework that generalizes previously proposed smoothness measures and also generates new ones. To address the problem of irrelevant or noisy data, we wish to incorporate a sparse learning framework into learning on hypergraphs. We propose sparsely smooth formulations that learn smooth functions and induce sparsity on hypergraphs at both hyperedge and node levels. We show their properties and sparse support recovery results. We conduct experiments to show that our sparsely smooth models are beneficial to learning with irrelevant and noisy data, and usually give similar or improved performance compared to dense models.
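
To make "smoothness measure of a function on the nodes of a hypergraph" concrete, here is a minimal sketch of two commonly used measures: a clique-expansion sum of squared pairwise differences, and a per-hyperedge squared max-minus-min difference. These two are chosen as familiar examples and are assumptions of this sketch; the abstract does not say which measures the paper's framework covers, and the function names are hypothetical.

```python
def clique_smoothness(hyperedges, f):
    """Sum over hyperedges of w * (f[u] - f[v])**2 for every node pair in the edge."""
    total = 0.0
    for nodes, weight in hyperedges:
        nodes = list(nodes)
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                total += weight * (f[nodes[i]] - f[nodes[j]]) ** 2
    return total


def max_diff_smoothness(hyperedges, f):
    """Sum over hyperedges of w * (max f - min f)**2 within the edge."""
    total = 0.0
    for nodes, weight in hyperedges:
        values = [f[v] for v in nodes]
        total += weight * (max(values) - min(values)) ** 2
    return total


# Toy hypergraph: hyperedges given as (node tuple, weight); f is indexed by node id.
hyperedges = [((0, 1, 2), 1.0), ((2, 3), 2.0)]
f = [0.0, 0.0, 1.0, 1.0]
print(clique_smoothness(hyperedges, f), max_diff_smoothness(hyperedges, f))
```

Both measures are zero for a function that is constant within every hyperedge and grow as the function varies inside hyperedges, which is the sense in which a small value means "smooth with respect to the topology".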

8.
Bioinformatics ; 35(14): i164-i172, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31510641

ABSTRACT

MOTIVATION: Metabolite identification is an important task in metabolomics to enhance the knowledge of biological systems. There have been a number of machine learning-based methods proposed for this task, which predict a chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching chemical compounds (in a database) corresponding to the predicted fingerprints. Fingerprints are feature vectors, which are usually very large to cover all possible substructures and chemical properties, and therefore heavily redundant, in the sense of having many molecular (sub)structures irrelevant to the task, causing limited predictive performance and slow prediction. RESULTS: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data, to be consistent with both spectra and chemical structures of metabolites. In more detail, molecular vectors are generated by a model parameterized by a message passing neural network, and parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of the Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and, importantly, adaptive (specific) to both the given data and the task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method of metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE by using benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency. AVAILABILITY AND IMPLEMENTATION: The code will be accessible through http://www.bic.kyoto-u.ac.jp/pathway/tools/ADAPTIVE after the acceptance of this article.
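
The Hilbert-Schmidt Independence Criterion (HSIC) named in this abstract has a standard empirical estimator, trace(K H L H) / (n - 1)^2, where K and L are kernel matrices of the two samples and H is the centering matrix. The sketch below computes it in plain Python. The Gaussian kernel, the scalar inputs and the function names are assumptions made for brevity; ADAPTIVE works with molecular vectors and spectra, and its kernel choices are not stated here.

```python
import math


def gaussian_kernel(xs, sigma=1.0):
    """Gaussian kernel matrix for a list of scalars (scalars assumed for brevity)."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]


def center(K):
    """Double-center a kernel matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(xs)
    Kc = center(gaussian_kernel(xs, sigma))
    Lc = center(gaussian_kernel(ys, sigma))
    # trace(K H L H) equals the elementwise product sum of the two centered matrices
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


xs = [0.0, 1.0, 2.0, 3.0]
print(hsic(xs, xs), hsic(xs, [5.0] * len(xs)))
```

A larger value indicates stronger statistical dependence between the two samples; a constant second sample gives zero, which is why maximizing HSIC pushes learned vectors to carry information about the spectra.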


Subjects
Metabolomics, Tandem Mass Spectrometry, Benchmarking, Factual Databases, Machine Learning
9.
Brief Bioinform ; 20(6): 2028-2043, 2019 Nov 27.
Article in English | MEDLINE | ID: mdl-30099485

ABSTRACT

MOTIVATION: Metabolomics involves studies of a great number of metabolites, which are small molecules present in biological systems. They perform many important functions, such as energy transport, signaling, serving as building blocks of cells, and inhibition/catalysis. Understanding biochemical characteristics of the metabolites is an essential and significant part of metabolomics to enlarge the knowledge of biological systems. It is also the key to the development of many applications and areas such as biotechnology, biomedicine or pharmaceuticals. However, the identification of the metabolites remains a challenging task in metabolomics, with a huge number of potentially interesting but unknown metabolites. The standard method for identifying metabolites is based on mass spectrometry (MS) preceded by a separation technique. Over many decades, many techniques with different approaches have been proposed for the MS-based metabolite identification task, which can be divided into the following four groups: mass spectra database, in silico fragmentation, fragmentation tree and machine learning. In this review paper, we thoroughly survey currently available tools for metabolite identification, with a focus on in silico fragmentation and machine learning-based approaches. We also give an intensive discussion on advanced machine learning methods, which can lead to further improvement on this task.


Subjects
Computational Biology/methods, Machine Learning, Metabolomics, Computer Simulation, Magnetic Resonance Spectroscopy, Mass Spectrometry
10.
Bioinformatics ; 34(13): i323-i332, 2018 Jul 01.
Article in English | MEDLINE | ID: mdl-29950009

ABSTRACT

Motivation: Recent success in metabolite identification from tandem mass spectra has been led by machine learning, which has two stages: mapping mass spectra to molecular fingerprint vectors and then retrieving candidate molecules from the database. In the first stage, i.e. fingerprint prediction, spectrum peaks are features, and considering their interactions would be reasonable for more accurate identification of unknown metabolites. Existing approaches to fingerprint prediction are based only on individual peaks in the spectra, without explicitly considering the peak interactions. Also, the current cutting-edge method is based on kernels, which are computationally heavy and difficult to interpret. Results: We propose two learning models that incorporate peak interactions for fingerprint prediction. First, we extend the state-of-the-art kernel learning method by developing kernels for peak interactions to combine with kernels for peaks through multiple kernel learning (MKL). Second, we formulate a sparse interaction model for metabolite peaks, which we call SIMPLE, which is computationally light and interpretable for fingerprint prediction. The formulation of SIMPLE is convex and guarantees global optimization, for which we develop an alternating direction method of multipliers (ADMM) algorithm. Experiments using the MassBank dataset show that both models achieved prediction accuracy comparable to the current top-performance kernel method. Furthermore, SIMPLE clearly revealed individual peaks and peak interactions which contribute to enhancing the performance of fingerprint prediction. Availability and implementation: The code will be accessible through http://mamitsukalab.org/tools/SIMPLE/.
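
As a rough illustration of what a sparse peak-interaction model can look like, the sketch below scores a spectrum with a main-effect term per peak plus a pairwise term per peak pair, and shows soft-thresholding, the proximal step that ADMM-style solvers typically use to drive L1-penalized weights to exactly zero. The functional form, the parameter names `w`, `W`, `b` and the toy numbers are assumptions of this sketch, not the published SIMPLE formulation.

```python
def interaction_score(x, w, W, b=0.0):
    """Score = sum_i w[i]*x[i] + sum_{i,j} W[i][j]*x[i]*x[j] + b (assumed generic form)."""
    n = len(x)
    main = sum(w[i] * x[i] for i in range(n))
    pair = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return main + pair + b


def soft_threshold(value, lam):
    """Proximal operator of lam * |value|: shrink toward zero, clipping at zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Hypothetical spectrum with three peak intensities and one interacting peak pair (0, 2).
x = [1.0, 0.0, 2.0]
w = [0.5, 1.0, -0.25]
W = [[0.0, 0.0, 0.1],
     [0.0, 0.0, 0.0],
     [0.1, 0.0, 0.0]]
print(interaction_score(x, w, W))
print([soft_threshold(v, 0.5) for v in (0.3, 1.5, -1.5)])
```

Because most entries of `w` and `W` end up exactly zero under such shrinkage, the surviving entries point directly at the peaks and peak pairs that matter, which is the interpretability the abstract refers to.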


Subjects
Algorithms, Chemical Models, Tandem Mass Spectrometry, Factual Databases, Machine Learning, Software
11.
Bioinformatics ; 32(13): 2067-8, 2016 Jul 01.
Article in English | MEDLINE | ID: mdl-27153725

ABSTRACT

The popularity of using NMR spectroscopy in metabolomics and natural products has driven the development of an array of NMR spectral analysis tools and databases. In particular, web applications have recently become widely used because they are platform-independent and easy to extend through reusable web components. Currently available web applications provide the analysis of NMR spectra. However, they still lack the necessary processing and interactive visualization functionalities. To overcome these limitations, we present NMRPro, a web component that can be easily incorporated into current web applications, enabling easy-to-use online interactive processing and visualization. NMRPro integrates server-side processing with client-side interactive visualization through three parts: a Python package to efficiently process large NMR datasets on the server side, a Django App managing server-client interaction, and SpecdrawJS for client-side interactive visualization. AVAILABILITY AND IMPLEMENTATION: Demo and installation instructions are available at http://mamitsukalab.org/tools/nmrpro/ CONTACT: mohamed@kuicr.kyoto-u.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computer-Assisted Image Processing, Magnetic Resonance Spectroscopy, Software, Factual Databases, Internet, Metabolomics
12.
Brief Bioinform ; 17(2): 309-21, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26153512

ABSTRACT

Research in natural products has always enhanced drug discovery by providing new and unique chemical compounds. Recently, however, drug discovery from natural products has been slowed by the increasing chance of re-isolating known compounds. Rapid identification of previously isolated compounds in an automated manner, called dereplication, steers researchers toward novel findings, thereby reducing the time and effort for identifying new drug leads. Dereplication identifies compounds by comparing processed experimental data with those of known compounds, so diverse computational resources such as databases and tools to process and compare compound data are necessary. Automating the dereplication process through the integration of computational resources has long been a goal of natural product researchers. To increase the utilization of current computational resources for natural products, we first provide an overview of the dereplication process, and then list useful resources, categorizing them into databases, methods and software tools and further explaining them from a dereplication perspective. Finally, we discuss the current challenges to automating dereplication and propose solutions.


Subjects
Algorithms, Biological Products/chemistry, Chromatography/methods, Pharmaceutical Databases, Magnetic Resonance Spectroscopy/methods, Mass Spectrometry/methods, Biological Products/analysis, Data Mining/methods, Database Management Systems, Automated Pattern Recognition/methods, Small Molecule Libraries/analysis, Small Molecule Libraries/chemistry
13.
Bioinformatics ; 30(21): 3139-41, 2014 Nov 01.
Article in English | MEDLINE | ID: mdl-25075120

ABSTRACT

NetPathMiner is a general framework for mining, from genome-scale networks, paths that are related to specific experimental conditions. NetPathMiner interfaces with various input formats including KGML, SBML and BioPAX files and allows for manipulation of networks in three different forms: metabolic, reaction and gene representations. NetPathMiner ranks the obtained paths and applies Markov model-based clustering and classification methods to the ranked paths for easy interpretation. NetPathMiner also provides static and interactive visualizations of networks and paths to aid manual investigation. AVAILABILITY: The package is available through Bioconductor and from GitHub at http://github.com/ahmohamed/NetPathMiner.


Subjects
Data Mining/methods, Metabolic Networks and Pathways/genetics, Signal Transduction/genetics, Software, Carbohydrate Metabolism/genetics, Cluster Analysis, Computer Graphics, Genomics, Transcriptome
14.
IEEE Trans Neural Netw Learn Syst ; 23(11): 1793-804, 2012 Nov.
Article in English | MEDLINE | ID: mdl-24808073

ABSTRACT

Predicting new links in a network is a problem of interest in many application domains. Most of the prediction methods utilize information on the network's entities, such as nodes, to build a model of links. Network structures are usually not used except for networks with similarity or relatedness semantics. In this paper, we use network structures for link prediction with a more general network type with latent feature models. The problem with these models is the computational cost to train the models directly for large data. We propose a method to solve this problem using kernels and cast the link prediction problem into a binary classification problem. The key idea is not to infer latent features explicitly, but to represent these features implicitly in the kernels, making the method scalable to large networks. In contrast to the other methods for latent feature models, our method inherits all the advantages of the kernel framework: optimality, efficiency, and nonlinearity. On sparse graphs, we show that our proposed kernels are close enough to the ideal kernels defined directly on latent features. We apply our method to real data of protein-protein interaction and gene regulatory networks to show the merits of our method.
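
Casting link prediction as binary classification with kernels means defining a kernel between node pairs, so that a kernel classifier can separate linked from unlinked pairs. The sketch below shows the generic tensor-product pairwise kernel as one well-known way to do this. It is a swapped-in illustration, not the kernel proposed in this paper, which represents latent features implicitly from network structure; the explicit feature vectors and node names are hypothetical.

```python
def linear_kernel(u, v):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_kernel(pair1, pair2, feats, base=linear_kernel):
    """Symmetric tensor-product kernel between node pairs (a, b) and (c, d):
    k(a, c) * k(b, d) + k(a, d) * k(b, c)."""
    a, b = pair1
    c, d = pair2
    return (base(feats[a], feats[c]) * base(feats[b], feats[d])
            + base(feats[a], feats[d]) * base(feats[b], feats[c]))


# Hypothetical node features, for illustration only.
feats = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}
print(pairwise_kernel(("p1", "p2"), ("p1", "p2"), feats))
print(pairwise_kernel(("p1", "p3"), ("p2", "p3"), feats))
```

The symmetrized form makes the kernel independent of the order of nodes within a pair, which suits undirected links such as protein-protein interactions.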


Subjects
Algorithms, Computational Biology, Computer Neural Networks, Computational Biology/methods, Computer Simulation, Humans, Learning, Predictive Value of Tests
15.
IEEE Trans Neural Netw ; 22(9): 1395-405, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21788187

ABSTRACT

In many applications, the available information is encoded in graph structures. This is a common problem in biological networks, social networks, web communities and document citations. We investigate the problem of classifying nodes' labels on a similarity graph given only a graph structure on the nodes. Conventional machine learning methods usually require data to reside in some Euclidean spaces or to have a kernel representation. Applying these methods to nodes on graphs would require embedding the graphs into these spaces. By embedding and then learning the nodes on graphs, most methods are either flexible with different learning objectives or efficient enough for large scale applications. We propose a method to embed a graph into a feature space for a discriminative purpose. Our idea is to include label information into the embedding process, making the space representation tailored to the task. We design embedding objective functions such that the subsequent learning formulations become spectral transforms. We then reformulate these spectral transforms into multiple kernel learning problems. Our method, while being tailored to the discriminative tasks, is efficient and can scale to massive data sets. We show the need for discriminative embedding on some simulations. Applied to biological network problems, our method is shown to outperform baselines.


Subjects
Artificial Intelligence, Computer Graphics, Decision Support Techniques, Automated Pattern Recognition/methods, Computer Simulation, Humans, Information Systems