Results 1 - 20 of 46
1.
Sci Rep ; 14(1): 4068, 2024 02 19.
Article in English | MEDLINE | ID: mdl-38374282

ABSTRACT

The gut microbiome is a diverse ecosystem, dominated by bacteria; however, fungi, phages/viruses, archaea, and protozoa are also important members of the gut microbiota. Exploration of taxonomic compositions beyond bacteria as well as an understanding of the interaction between the bacteriome with the other members is limited using 16S rDNA sequencing. Here, we developed a pipeline enabling the simultaneous interrogation of the gut microbiome (bacteriome, mycobiome, archaeome, eukaryome, DNA virome) and of antibiotic resistance genes based on optimized long-read shotgun metagenomics protocols and custom bioinformatics. Using our pipeline we investigated the longitudinal composition of the gut microbiome in an exploratory clinical study in patients undergoing allogeneic hematopoietic stem cell transplantation (alloHSCT; n = 31). Pre-transplantation microbiomes exhibited a 3-cluster structure, characterized by Bacteroides spp./Phocaeicola spp., mixed composition and Enterococcus abundances. We revealed substantial inter-individual and temporal variabilities of microbial domain compositions, human DNA, and antibiotic resistance genes during the course of alloHSCT. Interestingly, viruses and fungi accounted for substantial proportions of microbiome content in individual samples. In the course of HSCT, bacterial strains were stable or newly acquired. Our results demonstrate the disruptive potential of alloHSCT on the gut microbiome and pave the way for future comprehensive microbiome studies based on long-read metagenomics.


Subject(s)
Gastrointestinal Microbiome , Hematopoietic Stem Cell Transplantation , Microbiota , Humans , Gastrointestinal Microbiome/genetics , Microbiota/genetics , Bacteria/genetics , Anti-Bacterial Agents , Fungi/genetics , DNA, Ribosomal , Metagenomics/methods
2.
Genome Biol ; 25(1): 26, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38243222

ABSTRACT

Potato is one of the world's major staple crops, and like many important crop plants, it has a polyploid genome. Polyploid haplotype assembly poses a major computational challenge. We introduce a novel strategy for the assembly of polyploid genomes and present an assembly of the autotetraploid potato cultivar Altus. Our method uses low-depth sequencing data from an offspring population to achieve chromosomal clustering and haplotype phasing on the assembly graph. Our approach generates high-quality assemblies of individual chromosomes with haplotype-specific sequence resolution of whole chromosome arms and can be applied in common breeding scenarios where collections of offspring are available.


Subject(s)
Solanum tuberosum , Tetraploidy , Humans , Haplotypes , Sequence Analysis, DNA , Solanum tuberosum/genetics , Plant Breeding , Polyploidy
3.
Cell Syst ; 14(12): 1122-1130.e3, 2023 12 20.
Article in English | MEDLINE | ID: mdl-38128484

ABSTRACT

The efficacy of epitope vaccines depends on the included epitopes as well as the probability that the selected epitopes are presented by the major histocompatibility complex (MHC) proteins of a vaccinated individual. Designing vaccines that effectively immunize a high proportion of the population is challenging because of high MHC polymorphism, diverging MHC-peptide binding affinities, and physical constraints on epitope vaccine constructs. Here, we present HOGVAX, a combinatorial optimization approach for epitope vaccine design. To optimize population coverage within the constraint of limited vaccine construct space, HOGVAX employs a hierarchical overlap graph (HOG) to identify and exploit overlaps between selected peptides and explicitly models the structure of linkage disequilibrium in the MHC. In a SARS-CoV-2 case study, we demonstrate that HOGVAX-designed vaccines contain substantially more epitopes than vaccines built from concatenated peptides and predict vaccine efficacy in over 98% of the population with high numbers of presented peptides in vaccinated individuals.


Subject(s)
COVID-19 , Vaccines , Humans , SARS-CoV-2 , COVID-19/prevention & control , Epitopes, T-Lymphocyte , Peptides
4.
J Comput Aided Mol Des ; 37(8): 357-371, 2023 08.
Article in English | MEDLINE | ID: mdl-37310542

ABSTRACT

An Online tool for Fragment-based Molecule Parametrization (OFraMP) is described. OFraMP is a web application for assigning atomic interaction parameters to large molecules by matching sub-fragments within the target molecule to equivalent sub-fragments within the Automated Topology Builder (ATB, atb.uq.edu.au) database. OFraMP identifies and compares alternative molecular fragments from the ATB database, which contains over 890,000 pre-parameterized molecules, using a novel hierarchical matching procedure. Atoms are considered within the context of an extended local environment (buffer region) with the degree of similarity between an atom in the target molecule and that in the proposed match controlled by varying the size of the buffer region. Adjacent matching atoms are combined into progressively larger matched sub-structures. The user then selects the most appropriate match. OFraMP also allows users to manually alter interaction parameters and automates the submission of missing substructures to the ATB in order to generate parameters for atoms in environments not represented in the existing database. The utility of OFraMP is illustrated using the anti-cancer agent paclitaxel and a dendrimer used in organic semiconductor devices. OFraMP applied to paclitaxel (ATB ID 35922).


Subject(s)
Software , Databases, Factual
5.
Orig Life Evol Biosph ; 52(4): 263-275, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36383289

ABSTRACT

Protein coordinated iron-sulfur clusters drive electron flow within metabolic pathways for organisms throughout the tree of life. It is not known how iron-sulfur clusters were first incorporated into proteins. Structural analogies to iron-sulfide minerals present on early Earth suggest a connection in the evolution of both proteins and minerals. The availability of large protein and mineral crystallographic structure data sets provides an opportunity to explore co-evolution of proteins and minerals on a large scale using informatics approaches. However, quantitative comparisons are confounded by the infinite, repeating nature of the mineral lattice, in contrast to metal clusters in proteins, which are finite in size. We address this problem using the Niggli reduction to transform a mineral lattice to a finite, unique structure that when translated reproduces the crystal lattice. Protein and reduced mineral structures were represented as quotient graphs with the edges and nodes corresponding to bonds and atoms, respectively. We developed a graph theory-based method to calculate the maximum common connected edge subgraph (MCCES) between mineral and protein quotient graphs. MCCES can accommodate differences in structural volumes and easily allows additional chemical criteria to be considered when calculating similarity. To account for graph size differences, we use the Tversky similarity index. Using consistent criteria, we found little similarity between putative ancient iron-sulfur protein clusters and iron-sulfur mineral lattices, suggesting these metal sites are not as evolutionarily connected as once thought. We discuss possible evolutionary implications of these findings in addition to suggesting an alternative proxy, mineral surfaces, for better understanding the coevolution of the geosphere and biosphere.
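The Tversky index mentioned in this abstract can be illustrated with a minimal sketch. It assumes graph sizes are counted in edges and that the size of the common subgraph is already known; the function name, the parameter names, and the default weights alpha = beta = 0.5 are illustrative assumptions, not values taken from the paper.

```python
def tversky_index(common: int, size_a: int, size_b: int,
                  alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky similarity computed from the size of a common substructure.

    common      -- number of edges in the common subgraph (for example an MCCES)
    size_a      -- number of edges in the first graph
    size_b      -- number of edges in the second graph
    alpha, beta -- weights on the unmatched part of each graph (assumed values)
    """
    if common > min(size_a, size_b):
        raise ValueError("common subgraph cannot be larger than either graph")
    only_a = size_a - common  # edges of graph A outside the common subgraph
    only_b = size_b - common  # edges of graph B outside the common subgraph
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0
```

Setting alpha = 1 and beta = 0 ignores the unmatched part of the larger graph, which is one way a size-asymmetric comparison (a small cluster against a large lattice cell) could be handled; alpha = beta = 1 gives the Tanimoto coefficient.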


Subject(s)
Iron-Sulfur Proteins , Metalloproteins , Minerals , Iron-Sulfur Proteins/chemistry , Iron-Sulfur Proteins/metabolism , Sulfur/chemistry , Sulfur/metabolism , Iron/chemistry
6.
iScience ; 25(6): 104461, 2022 Jun 17.
Article in English | MEDLINE | ID: mdl-35692633

ABSTRACT

An important challenge in genome assembly is haplotype phasing, that is, to reconstruct the different haplotype sequences of an individual genome. Phasing becomes considerably more difficult with increasing ploidy, which makes polyploid phasing a notoriously hard computational problem. We present a novel genetic phasing method for plant breeding with the aim to phase two deep-sequenced parental samples with the help of a large number of progeny samples sequenced at low depth. The key ideas underlying our approach are to (i) integrate the individually weak Mendelian progeny signals with a Bayesian log-likelihood model, (ii) cluster alleles according to their likelihood of co-occurrence, and (iii) assign them to haplotypes via an interval scheduling approach. We show on two deep-sequenced parental and 193 low-depth progeny potato samples that our approach computes high-quality sparse phasings and that it scales to whole genomes.

7.
Cell Genom ; 2(2), 2022 Feb 09.
Article in English | MEDLINE | ID: mdl-35382456

ABSTRACT

Recent genome-wide CRISPR-Cas9 loss-of-function screens have identified genetic dependencies across many cancer cell lines. Associations between these dependencies and genomic alterations in the same cell lines reveal phenomena such as oncogene addiction and synthetic lethality. However, comprehensive identification of such associations is complicated by complex interactions between genes across genetically heterogeneous cancer types. We introduce and apply the algorithm SuperDendrix to CRISPR-Cas9 loss-of-function screens from 769 cancer cell lines, to identify differential dependencies across cell lines and to find associations between differential dependencies and combinations of genomic alterations and cell-type-specific markers. These associations respect the position and type of interactions within pathways: for example, we observe increased dependencies on downstream activators of pathways, such as NFE2L2, and decreased dependencies on upstream activators of pathways, such as CDK6. SuperDendrix also reveals dozens of dependencies on lineage-specific transcription factors, identifies cancer-type-specific correlations between dependencies, and enables annotation of individual mutated residues.

8.
Elife ; 10, 2021 06 21.
Article in English | MEDLINE | ID: mdl-34152268

ABSTRACT

In the adult heart, the epicardium becomes activated after injury, contributing to cardiac healing by secretion of paracrine factors. Here, we analyzed by single-cell RNA sequencing combined with RNA in situ hybridization and lineage tracing of Wilms tumor protein 1-positive (WT1+) cells, the cellular composition, location, and hierarchy of epicardial stromal cells (EpiSC) in comparison to activated myocardial fibroblasts/stromal cells in infarcted mouse hearts. We identified 11 transcriptionally distinct EpiSC populations, which can be classified into three groups, each containing a cluster of proliferating cells. Two groups expressed cardiac specification markers and sarcomeric proteins suggestive of cardiomyogenic potential. Transcripts of hypoxia-inducible factor (HIF)-1α and HIF-responsive genes were enriched in EpiSC consistent with an epicardial hypoxic niche. Expression of paracrine factors was not limited to WT1+ cells but was a general feature of activated cardiac stromal cells. Our findings provide the cellular framework by which myocardial ischemia may trigger in EpiSC the formation of cardioprotective/regenerative responses.


Subject(s)
Fibroblasts/metabolism , Myocardium/metabolism , Pericardium/physiology , Stromal Cells/metabolism , Transcriptome , Animals , Gene Expression Profiling , In Situ Hybridization , Male , Mice , Mice, Inbred C57BL , RNA , Sequence Analysis, RNA , Single-Cell Analysis , WT1 Proteins/metabolism
9.
Algorithms Mol Biol ; 16(1): 11, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34183036

ABSTRACT

Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. All data used in the experiments as well as our source code are freely available. We demonstrate its usefulness within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time.
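As an illustration of the string problem named in this abstract, the sketch below assumes the usual reading of a run subsequence: a subsequence of the input in which every symbol occurs in at most one block of consecutive positions. It is a plain exhaustive dynamic program, exponential in the alphabet size, and is not the reduction rules or the algorithms of the paper; the function name is hypothetical.

```python
from functools import lru_cache


def longest_run_subsequence(s: str) -> int:
    """Length of a longest subsequence of s in which each symbol forms one run."""
    symbols = sorted(set(s))
    index = {c: k for k, c in enumerate(symbols)}

    @lru_cache(maxsize=None)
    def best(i: int, current: int, closed: int) -> int:
        # i       -- next position of s to consider
        # current -- index of the symbol whose run is open (-1 if none yet)
        # closed  -- bitmask of symbols whose run has already ended
        if i == len(s):
            return 0
        c = index[s[i]]
        result = best(i + 1, current, closed)  # option 1: skip s[i]
        if c == current:
            # option 2: extend the open run
            result = max(result, 1 + best(i + 1, current, closed))
        elif not (closed >> c) & 1:
            # option 3: close the open run and start a new run of symbol c
            new_closed = closed | (1 << current) if current >= 0 else closed
            result = max(result, 1 + best(i + 1, c, new_closed))
        return result

    return best(0, -1, 0)
```

For the input "abab" the function returns 3 (for example "aab"), because keeping all four characters would split the symbol a into two runs.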

10.
Genome Biol ; 21(1): 252, 2020 09 21.
Article in English | MEDLINE | ID: mdl-32951599

ABSTRACT

Resolving genomes at haplotype level is crucial for understanding the evolutionary history of polyploid species and for designing advanced breeding strategies. Polyploid phasing still presents considerable challenges, especially in regions of collapsing haplotypes. We present WHATSHAP POLYPHASE, a novel two-stage approach that addresses these challenges by (i) clustering reads and (ii) threading the haplotypes through the clusters. Our method outperforms the state-of-the-art in terms of phasing quality. Using a real tetraploid potato dataset, we demonstrate how to assemble local genomic regions of interest at the haplotype level. Our algorithm is implemented as part of the widely used open source tool WhatsHap.


Subject(s)
Haplotypes , Models, Genetic , Polyploidy , Algorithms , Solanum tuberosum/genetics
11.
Psychol Methods ; 2020 Jun 22.
Article in English | MEDLINE | ID: mdl-32567870

ABSTRACT

Numerous applications in psychological research require that a pool of elements is partitioned into multiple parts. While many applications seek groups that are well-separated, that is, dissimilar from each other, others require the different groups to be as similar as possible. Examples include the assignment of students to parallel courses, assembling stimulus sets in experimental psychology, splitting achievement tests into parts of equal difficulty, and dividing a data set for cross-validation. We present anticlust, an easy-to-use and free software package for solving these problems fast and in an automated manner. The package anticlust is an open source extension to the R programming language and implements the methodology of anticlustering. Anticlustering divides elements into similar parts, ensuring similarity between groups by enforcing heterogeneity within groups. Thus, anticlustering is the direct reversal of cluster analysis that aims to maximize homogeneity within groups and dissimilarity between groups. Our package anticlust implements 2 anticlustering criteria, reversing the clustering methods k-means and cluster editing, respectively. In a simulation study, we show that anticlustering returns excellent results and outperforms alternative approaches like random assignment and matching. In 3 example applications, we illustrate how to apply anticlust on real data sets. We demonstrate how to assign experimental stimuli to equivalent sets based on norming data, how to divide a large data set for cross-validation, and how to split a test into parts of equal item difficulty and discrimination. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
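The idea of reversing k-means can be sketched in a few lines. The example below is written in Python rather than R and does not use or reproduce the anticlust package; it assumes a simple pairwise exchange heuristic that maximizes the within-group sum of squares, and all function names are hypothetical.

```python
import random


def within_group_ss(points, labels, k):
    """Sum of squared Euclidean distances of points to their group centroid."""
    dim = len(points[0])
    total = 0.0
    for g in range(k):
        members = [p for p, lab in zip(points, labels) if lab == g]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def anticluster(points, k, seed=0):
    """Split points into k equal-sized groups that are as similar as possible.

    Starts from a random balanced assignment and keeps any swap of two items
    from different groups that increases the within-group sum of squares
    (the k-means objective, maximized instead of minimized).
    """
    n = len(points)
    labels = [i % k for i in range(n)]      # balanced group sizes
    random.Random(seed).shuffle(labels)
    best = within_group_ss(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                if labels[i] == labels[j]:
                    continue
                labels[i], labels[j] = labels[j], labels[i]      # try the swap
                score = within_group_ss(points, labels, k)
                if score > best + 1e-12:
                    best, improved = score, True                 # keep it
                else:
                    labels[i], labels[j] = labels[j], labels[i]  # undo it
    return labels, best
```

With the four one-dimensional items 1, 2, 9 and 10 and k = 2, the heuristic pairs 1 with 10 and 2 with 9, so that both groups have the same mean. Because only swaps are performed, group sizes never change.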

12.
Algorithms Mol Biol ; 14: 1, 2019.
Article in English | MEDLINE | ID: mdl-30839948

ABSTRACT

A key factor in computational drug design is the consistency and reliability with which intermolecular interactions between a wide variety of molecules can be described. Here we present a procedure to efficiently, reliably and automatically assign partial atomic charges to atoms based on known distributions. We formally introduce the molecular charge assignment problem, where the task is to select a charge from a set of candidate charges for every atom of a given query molecule. Charges are accompanied by a score that depends on their observed frequency in similar neighbourhoods (chemical environments) in a database of previously parameterised molecules. The aim is to assign the charges such that the total charge equals a known target charge within a margin of error while maximizing the sum of the charge scores. We show that the problem is a variant of the well-studied multiple-choice knapsack problem and thus weakly NP-complete. We propose solutions based on Integer Linear Programming and a pseudo-polynomial time Dynamic Programming algorithm. We demonstrate that the results obtained for novel molecules not included in the database are comparable to the ones obtained performing explicit charge calculations while decreasing the time to determine partial charges for a molecule from hours or even days to below a second. Our software is openly available.
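The charge assignment problem described in this abstract can be sketched as a small dynamic program over achievable total charges. This is a minimal illustration, not the authors' software: it assumes charges can be rounded to a fixed grid (the hypothetical `scale` parameter, here thousandths of an elementary charge), and the function name and input layout are invented for the example.

```python
def assign_charges(candidates, target, margin, scale=1000):
    """Pick one (charge, score) per atom, maximizing the total score.

    candidates -- one list per atom of (charge, score) pairs
    target     -- required total charge of the molecule
    margin     -- allowed absolute deviation from the target
    scale      -- charges are rounded to multiples of 1/scale (assumption)

    Returns (best_score, chosen_charges), or None if no selection is feasible.
    """
    # dp maps an integer total charge to the best (score, charges) reaching it
    dp = {0: (0.0, [])}
    for options in candidates:
        nxt = {}
        for total, (score, chosen) in dp.items():
            for charge, s in options:
                t = total + round(charge * scale)
                cand = (score + s, chosen + [charge])
                if t not in nxt or cand[0] > nxt[t][0]:
                    nxt[t] = cand
        dp = nxt
    lo = round((target - margin) * scale)
    hi = round((target + margin) * scale)
    feasible = [v for t, v in dp.items() if lo <= t <= hi]
    return max(feasible, key=lambda v: v[0]) if feasible else None
```

The number of table entries is bounded by the range of reachable integer totals, which is what makes the running time pseudo-polynomial: it grows with the magnitude of the scaled charges, not only with the number of atoms.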

13.
F1000Res ; 7: 519, 2018.
Article in English | MEDLINE | ID: mdl-29983924

ABSTRACT

eXamine is a Cytoscape app that displays set membership as contours on top of a node-link layout of a small graph. In addition to facilitating interpretation of enriched gene sets of small biological networks, eXamine can be used in other domains such as the visualization of communities in small social networks. eXamine was made available on the Cytoscape App Store in March 2014, has since registered more than 7,700 downloads, and has been highly rated by more than 25 users. In this paper, we present eXamine's new automation features that enable researchers to compose reproducible analysis workflows to generate visualizations of small, set-annotated graphs.

14.
J Comput Biol ; 25(7): 689-708, 2018 07.
Article in English | MEDLINE | ID: mdl-29658782

ABSTRACT

Cancer is an evolutionary process driven by somatic mutations. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the many types of mutations in cancer and the fact that nearly all cancer sequencing is of a bulk tumor, measuring a superposition of somatic mutations present in different cells. We study the problem of reconstructing tumor phylogenies from copy-number aberrations (CNAs) measured in bulk-sequencing data. We introduce the Copy-Number Tree Mixture Deconvolution (CNTMD) problem, which aims to find the phylogenetic tree with the fewest number of CNAs that explain the copy-number data from multiple samples of a tumor. We design an algorithm for solving the CNTMD problem and apply the algorithm to both simulated and real data. On simulated data, we find that our algorithm outperforms existing approaches that either perform deconvolution/factorization of mixed tumor samples or build phylogenetic trees assuming homogeneous tumor samples. On real data, we analyze multiple samples from a prostate cancer patient, identifying clones within these samples and a phylogenetic tree that relates these clones and their differing proportions across samples. This phylogenetic tree provides a higher resolution view of copy-number evolution of this cancer than published analyses.


Subject(s)
Computational Biology , DNA Copy Number Variations/genetics , Neoplasms/genetics , Phylogeny , Algorithms , Humans , Neoplasms/pathology
15.
Appl Environ Microbiol ; 83(21), 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-28842544

ABSTRACT

Whooping cough is a highly contagious respiratory disease caused by Bordetella pertussis. Despite widespread vaccination, its incidence has been rising alarmingly, and yet, the physiology of B. pertussis remains poorly understood. We combined genome-scale metabolic reconstruction, a novel optimization algorithm, and experimental data to probe the full metabolic potential of this pathogen, using B. pertussis strain Tohama I as a reference. Experimental validation showed that B. pertussis secretes a significant proportion of nitrogen as arginine and purine nucleosides, which may contribute to modulation of the host response. We also found that B. pertussis can be unexpectedly versatile, being able to metabolize many compounds while displaying minimal nutrient requirements. It can grow without cysteine, using inorganic sulfur sources, such as thiosulfate, and it can grow on organic acids, such as citrate or lactate, as sole carbon sources, providing in vivo demonstration that its tricarboxylic acid (TCA) cycle is functional. Although the metabolic reconstruction of eight additional strains indicates that the structural genes underlying this metabolic flexibility are widespread, experimental validation suggests a role of strain-specific regulatory mechanisms in shaping metabolic capabilities. Among five alternative strains tested, three strains were shown to grow on substrate combinations requiring a functional TCA cycle, but only one strain could use thiosulfate. Finally, the metabolic model was used to rationally design growth media with >2-fold improvements in pertussis toxin production. This study thus provides novel insights into B. pertussis physiology and highlights the potential, but also the limitations, of models based solely on metabolic gene content. IMPORTANCE: The metabolic capabilities of Bordetella pertussis, the causative agent of whooping cough, were investigated from a systems-level perspective. We constructed a comprehensive genome-scale metabolic model for B. pertussis and challenged its predictions experimentally. This systems approach shed light on new potential host-microbe interactions and allowed us to rationally design novel growth media with >2-fold improvements in pertussis toxin production. Most importantly, we also uncovered the potential for metabolic flexibility of B. pertussis (significantly larger range of substrates than previously alleged; novel active pathways allowing growth in minimal, nearly mineral nutrient combinations where only the carbon source must be organic), although our results also highlight the importance of strain-specific regulatory determinants in shaping metabolic capabilities. Deciphering the underlying regulatory mechanisms appears to be crucial for a comprehensive understanding of B. pertussis's lifestyle and the epidemiology of whooping cough. The contribution of metabolic models in this context will require the extension of the genome-scale metabolic model to integrate this regulatory dimension.

16.
Sci Rep ; 6: 36812, 2016 11 23.
Article in English | MEDLINE | ID: mdl-27876821

ABSTRACT

Mining large datasets using machine learning approaches often leads to models that are hard to interpret and not amenable to the generation of hypotheses that can be experimentally tested. We present 'Logic Optimization for Binary Input to Continuous Output' (LOBICO), a computational approach that infers small and easily interpretable logic models of binary input features that explain a continuous output variable. Applying LOBICO to a large cancer cell line panel, we find that logic combinations of multiple mutations are more predictive of drug response than single gene predictors. Importantly, we show that the use of the continuous information leads to robust and more accurate logic models. LOBICO implements the ability to uncover logic models around predefined operating points in terms of sensitivity and specificity. As such, it represents an important step towards practical application of interpretable logic models.


Subject(s)
Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Algorithms , Cell Line, Tumor , Data Mining/methods , Humans , Logic , Models, Theoretical , Precision Medicine/methods , Sensitivity and Specificity
17.
J Comput Biol ; 23(9): 718-36, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27280382

ABSTRACT

In diploid genomes, haplotype assembly is the computational problem of reconstructing the two parental copies, called haplotypes, of each chromosome starting from sequencing reads, called fragments, possibly affected by sequencing errors. Minimum error correction (MEC) is a prominent computational problem for haplotype assembly and, given a set of fragments, aims at reconstructing the two haplotypes by applying the minimum number of base corrections. MEC is computationally hard to solve, but some approximation-based or fixed-parameter approaches have been proved capable of obtaining accurate results on real data. In this work, we expand the current characterization of the computational complexity of MEC from the approximation and the fixed-parameter tractability point of view. In particular, we show that MEC is not approximable within a constant factor, whereas it is approximable within a logarithmic factor in the size of the input. Furthermore, we answer open questions on the fixed-parameter tractability for parameters of classical or practical interest: the total number of corrections and the fragment length. In addition, we present a direct 2-approximation algorithm for a variant of the problem that has also been applied in the framework of clustering data. Finally, since polyploid genomes, such as those of plants and fishes, are composed of more than two copies of the chromosomes, we introduce a novel formulation of MEC, namely the k-ploid MEC problem, that extends the traditional problem to deal with polyploid genomes. We show that the novel formulation is still both computationally hard and hard to approximate. Nonetheless, from the parameterized point of view, we prove that the problem is tractable for parameters of practical interest such as the number of haplotypes and the coverage, or the number of haplotypes and the fragment length.
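The MEC objective defined in this abstract can be made concrete with a brute-force sketch for the diploid case. It assumes fragments are equal-length strings over '0', '1' and '-' (position not covered), tries every assignment of fragments to the two haplotypes, and counts the entries that disagree with the majority allele on each side. It is exponential in the number of fragments and is meant only to illustrate the definition; it is not one of the algorithms analyzed in the paper, and the function names are hypothetical.

```python
from itertools import product


def side_cost(fragments):
    """Corrections needed so that all fragments on one side agree everywhere."""
    cost = 0
    for column in zip(*fragments):
        zeros, ones = column.count("0"), column.count("1")
        cost += min(zeros, ones)  # flip the minority allele; '-' costs nothing
    return cost


def mec_brute_force(fragments):
    """Return (minimum number of corrections, fragment-to-haplotype assignment)."""
    best = None
    for assignment in product((0, 1), repeat=len(fragments)):
        parts = ([], [])
        for frag, side in zip(fragments, assignment):
            parts[side].append(frag)
        cost = side_cost(parts[0]) + side_cost(parts[1])
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best
```

For the fragments 0011, 001-, 1100, -100 and 0111, the optimum needs a single correction: the last fragment disagrees with every other fragment in at least one position, so it cannot join either haplotype for free.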


Subject(s)
Algorithms , Diploidy , Genome, Human , Haplotypes , Polyploidy , Sequence Analysis, DNA/methods , Humans , Models, Genetic , Polymorphism, Single Nucleotide
18.
Genome Biol ; 17: 16, 2016 Jan 30.
Article in English | MEDLINE | ID: mdl-26831908

ABSTRACT

We present CIDANE, a novel framework for genome-based transcript reconstruction and quantification from RNA-seq reads. CIDANE assembles transcripts efficiently with significantly higher sensitivity and precision than existing tools. Its algorithmic core not only reconstructs transcripts ab initio, but also allows the use of the growing annotation of known splice sites, transcription start and end sites, or full-length transcripts, which are available for most model organisms. CIDANE supports the integrated analysis of RNA-seq and additional gene-boundary data and recovers splice junctions that are invisible to other methods. CIDANE is available at http://ccb.jhu.edu/software/cidane/.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , RNA/genetics , Sequence Analysis, RNA/methods , Algorithms , Gene Expression Profiling , Protein Isoforms/isolation & purification , RNA Splicing/genetics , Software , Transcriptome/genetics
19.
Bioinformatics ; 32(11): 1610-7, 2016 06 01.
Article in English | MEDLINE | ID: mdl-26315913

ABSTRACT

MOTIVATION: Haplotype assembly is the computational problem of reconstructing haplotypes in diploid organisms and is of fundamental importance for characterizing the effects of single-nucleotide polymorphisms on the expression of phenotypic traits. Haplotype assembly highly benefits from the advent of 'future-generation' sequencing technologies and their capability to produce long reads at increasing coverage. Existing methods are not able to deal with such data in a fully satisfactory way, either because accuracy or performances degrade as read length and sequencing coverage increase or because they are based on restrictive assumptions. RESULTS: By exploiting a feature of future-generation technologies (the uniform distribution of sequencing errors), we designed an exact algorithm, called HapCol, that is exponential in the maximum number of corrections for each single-nucleotide polymorphism position and that minimizes the overall error-correction score. We performed an experimental analysis, comparing HapCol with the current state-of-the-art combinatorial methods both on real and simulated data. On a standard benchmark of real data, we show that HapCol is competitive with state-of-the-art methods, improving the accuracy and the number of phased positions. Furthermore, experiments on realistically simulated datasets revealed that HapCol requires significantly less computing resources, especially memory. Thanks to its computational efficiency, HapCol can overcome the limits of previous approaches, making it possible to phase datasets with higher coverage and without the traditional all-heterozygous assumption. AVAILABILITY AND IMPLEMENTATION: Our source code is available under the terms of the GNU General Public License at http://hapcol.algolab.eu/ CONTACT: bonizzoni@disco.unimib.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Haplotypes , Algorithms , Diploidy , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Software
20.
Bioinformatics ; 32(11): 1678-85, 2016 06 01.
Article in English | MEDLINE | ID: mdl-26342232

ABSTRACT

MOTIVATION: The human microbiome plays a key role in health and disease. Thanks to comparative metatranscriptomics, the cellular functions that are deregulated by the microbiome in disease can now be computationally explored. Unlike gene-centric approaches, pathway-based methods provide a systemic view of such functions; however, they typically consider each pathway in isolation and in its entirety. They can therefore overlook the key differences that (i) span multiple pathways, (ii) contain bidirectionally deregulated components, (iii) are confined to a pathway region. To capture these properties, computational methods that reach beyond the scope of predefined pathways are needed. RESULTS: By integrating an existing module discovery algorithm into comparative metatranscriptomic analysis, we developed metaModules, a novel computational framework for automated identification of the key functional differences between health- and disease-associated communities. Using this framework, we recovered significantly deregulated subnetworks that were indeed recognized to be involved in two well-studied, microbiome-mediated oral diseases, such as butanoate production in periodontal disease and metabolism of sugar alcohols in dental caries. More importantly, our results indicate that our method can be used for hypothesis generation based on automated discovery of novel, disease-related functional subnetworks, which would otherwise require extensive and laborious manual assessment. AVAILABILITY AND IMPLEMENTATION: metaModules is available at https://bitbucket.org/alimay/metamodules/ CONTACT: a.may@vu.nl or s.abeln@vu.nl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Microbiota , Algorithms , Dental Caries , Humans