Ten simple rules for a successful international consortium in big data omics.

Stobbe, Miranda D; Gonzalez-Perez, Abel; Lopez-Bigas, Nuria; Gut, Ivo Glynne.

PLoS Comput Biol ; 18(10): e1010546, 2022 Oct.

Article in English | MEDLINE | ID: mdl-36264838

Subject(s)

Big Data

Framework for quality assessment of whole genome cancer sequences.

Whalley, Justin P; Buchhalter, Ivo; Rheinbay, Esther; Raine, Keiran M; Stobbe, Miranda D; Kleinheinz, Kortine; Werner, Johannes; Beltran, Sergi; Gut, Marta; Hübschmann, Daniel; Hutter, Barbara; Livitz, Dimitri; Perry, Marc D; Rosenberg, Mara; Saksena, Gordon; Trotta, Jean-Rémi; Eils, Roland; Gerhard, Daniela S; Campbell, Peter J; Schlesner, Matthias; Gut, Ivo G.

Nat Commun ; 11(1): 5040, 2020 Oct 07.

Article in English | MEDLINE | ID: mdl-33028839

ABSTRACT

Bringing together cancer genomes from different projects increases power and allows the investigation of pan-cancer, molecular mechanisms. However, working with whole genomes sequenced over several years in different sequencing centres requires a framework to compare the quality of these sequences. We used the Pan-Cancer Analysis of Whole Genomes cohort as a test case to construct such a framework. This cohort contains whole cancer genomes of 2832 donors from 18 sequencing centres. We developed a non-redundant set of five quality control (QC) measurements to establish a star rating system. These QC measures reflect known differences in sequencing protocol and provide a guide to downstream analyses and allow for exclusion of samples of poor quality. We have found that this is an effective framework of quality measures. The implementation of the framework is available at: https://dockstore.org/containers/quay.io/jwerner_dkfz/pancanqc:1.2.2 .

Subject(s)

Genome, Human/genetics , Genomics/standards , Neoplasms/genetics , Quality Control , Chromosome Mapping/standards , Chromosomes, Human/genetics , DNA Mutational Analysis/standards , Female , Genomics/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Male , Mutation , Software , Whole Genome Sequencing/standards

Recurrent somatic mutations reveal new insights into consequences of mutagenic processes in cancer.

Stobbe, Miranda D; Thun, Gian A; Diéguez-Docampo, Andrea; Oliva, Meritxell; Whalley, Justin P; Raineri, Emanuele; Gut, Ivo G.

PLoS Comput Biol ; 15(11): e1007496, 2019 Nov.

Article in English | MEDLINE | ID: mdl-31765368

ABSTRACT

The sheer size of the human genome makes it improbable that identical somatic mutations at the exact same position are observed in multiple tumours solely by chance. The scarcity of cancer driver mutations also precludes positive selection as the sole explanation. Therefore, recurrent mutations may be highly informative of characteristics of mutational processes. To explore the potential, we use recurrence as a starting point to cluster >2,500 whole genomes of a pan-cancer cohort. We describe each genome with 13 recurrence-based and 29 general mutational features. Using principal component analysis we reduce the dimensionality and create independent features. We apply hierarchical clustering to the first 18 principal components followed by k-means clustering. We show that the resulting 16 clusters capture clinically relevant cancer phenotypes. High levels of recurrent substitutions separate the clusters that we link to UV-light exposure and deregulated activity of POLE from the one representing defective mismatch repair, which shows high levels of recurrent insertions/deletions. Recurrence of both mutation types characterizes cancer genomes with somatic hypermutation of immunoglobulin genes and the cluster of genomes exposed to gastric acid. Low levels of recurrence are observed for the cluster where tobacco-smoke exposure induces mutagenesis and the one linked to increased activity of cytidine deaminases. Notably, the majority of substitutions are recurrent in a single tumour type, while recurrent insertions/deletions point to shared processes between tumour types. Recurrence also reveals susceptible sequence motifs, including TT[C>A]TTT and AAC[T>G]T for the POLE and 'gastric-acid exposure' clusters, respectively. Moreover, we refine knowledge of mutagenesis, including increased C/G deletion levels in general for lung tumours and specifically in midsize homopolymer sequence contexts for microsatellite instable tumours. 
Our findings are an important step towards the development of a generic cancer diagnostic test for clinical practice based on whole-genome sequencing that could replace multiple diagnostics currently in use.

Subject(s)

Computational Biology/methods , Neoplasms/classification , Neoplasms/genetics , Cohort Studies , Databases, Nucleic Acid , Genetic Predisposition to Disease/genetics , Genome, Human/genetics , Humans , INDEL Mutation/genetics , Mutagenesis/genetics , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Sequence Deletion/genetics

Building the future of bioinformatics through student-facilitated conferencing.

Ramdayal, Kavisha; Stobbe, Miranda D; Mishra, Tarun; Michaut, Magali.

PLoS Comput Biol ; 10(1): e1003458, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24499938

ABSTRACT

Sharing results, techniques, and challenges is paramount to advance our understanding of any field of science. In the scientific community this exchange of ideas is mainly made possible through national and international conferences. Scientists have the opportunity to showcase their work, receive feedback, and improve their presentation skills. However, conferences can be large and intimidating for young researchers. In addition, for many of the more prestigious conferences, the very high number of submissions and low selection rate are major limitations to aspiring young researchers aiming to present their work to the scientific community. To improve student participation and proliferation of information, regional student groups have successfully organized conferences and symposia specifically aimed at students. This gives more students the opportunity to present their work and receive valuable experience and insight from peers and leaders in the field. At the same time, it is an ideal way for students to gain familiarity with the conference experience. In this paper, we highlight some of the benefits of participating in such student conferences, and we review the challenges we have encountered when organizing them. Both topics are illustrated in detail with examples from different ISCB Student Council Regional Student Groups.

Subject(s)

Computational Biology/education , Computational Biology/methods , Students , Communication , Congresses as Topic , Humans , Societies, Scientific

Knowledge representation in metabolic pathway databases.

Stobbe, Miranda D; Jansen, Gerbert A; Moerland, Perry D; van Kampen, Antoine H C.

Brief Bioinform ; 15(3): 455-70, 2014 May.

Article in English | MEDLINE | ID: mdl-23202525

ABSTRACT

The accurate representation of all aspects of a metabolic network in a structured format, such that it can be used for a wide variety of computational analyses, is a challenge faced by a growing number of researchers. Analysis of five major metabolic pathway databases reveals that each database has made widely different choices to address this challenge, including how to deal with knowledge that is uncertain or missing. In concise overviews, we show how concepts such as compartments, enzymatic complexes and the direction of reactions are represented in each database. Importantly, also concepts which a database does not represent are described. Which aspects of the metabolic network need to be available in a structured format and to what detail differs per application. For example, for in silico phenotype prediction, a detailed representation of gene-protein-reaction relations and the compartmentalization of the network is essential. Our analysis also shows that current databases are still limited in capturing all details of the biology of the metabolic network, further illustrated with a detailed analysis of three metabolic processes. Finally, we conclude that the conceptual differences between the databases, which make knowledge exchange and integration a challenge, have not been resolved, so far, by the exchange formats in which knowledge representation is standardized.

Subject(s)

Artificial Intelligence/statistics & numerical data , Computational Biology/methods , Databases, Factual , Metabolic Networks and Pathways , Computer Simulation , Enzymes/genetics , Enzymes/metabolism , Fatty Acids/metabolism , Humans

Consensus and conflict cards for metabolic pathway databases.

Stobbe, Miranda D; Swertz, Morris A; Thiele, Ines; Rengaw, Trebor; van Kampen, Antoine H C; Moerland, Perry D.

BMC Syst Biol ; 7: 50, 2013 Jun 26.

Article in English | MEDLINE | ID: mdl-23803311

ABSTRACT

BACKGROUND: The metabolic network of H. sapiens and many other organisms is described in multiple pathway databases. The level of agreement between these descriptions, however, has proven to be low. We can use these different descriptions to our advantage by identifying conflicting information and combining their knowledge into a single, more accurate, and more complete description. This task is, however, far from trivial. RESULTS: We introduce the concept of Consensus and Conflict Cards (C2Cards) to provide concise overviews of what the databases do or do not agree on. Each card is centered at a single gene, EC number or reaction. These three complementary perspectives make it possible to distinguish disagreements on the underlying biology of a metabolic process from differences that can be explained by different decisions on how and in what detail to represent knowledge. As a proof-of-concept, we implemented C2Cards(Human), as a web application http://www.molgenis.org/c2cards, covering five human pathway databases. CONCLUSIONS: C2Cards can contribute to ongoing reconciliation efforts by simplifying the identification of consensus and conflicts between pathway databases and lowering the threshold for experts to contribute. Several case studies illustrate the potential of the C2Cards in identifying disagreements on the underlying biology of a metabolic process. The overviews may also point out controversial biological knowledge that should be subject of further research. Finally, the examples provided emphasize the importance of manual curation and the need for a broad community involvement.

Subject(s)

Data Mining/methods , Databases, Genetic , Metabolic Networks and Pathways , Conflict, Psychological , Consensus , Humans , Metabolic Networks and Pathways/genetics

A community-driven global reconstruction of human metabolism.

Thiele, Ines; Swainston, Neil; Fleming, Ronan M T; Hoppe, Andreas; Sahoo, Swagatika; Aurich, Maike K; Haraldsdottir, Hulda; Mo, Monica L; Rolfsson, Ottar; Stobbe, Miranda D; Thorleifsson, Stefan G; Agren, Rasmus; Bölling, Christian; Bordel, Sergio; Chavali, Arvind K; Dobson, Paul; Dunn, Warwick B; Endler, Lukas; Hala, David; Hucka, Michael; Hull, Duncan; Jameson, Daniel; Jamshidi, Neema; Jonsson, Jon J; Juty, Nick; Keating, Sarah; Nookaew, Intawat; Le Novère, Nicolas; Malys, Naglis; Mazein, Alexander; Papin, Jason A; Price, Nathan D; Selkov, Evgeni; Sigurdsson, Martin I; Simeonidis, Evangelos; Sonnenschein, Nikolaus; Smallbone, Kieran; Sorokin, Anatoly; van Beek, Johannes H G M; Weichart, Dieter; Goryanin, Igor; Nielsen, Jens; Westerhoff, Hans V; Kell, Douglas B; Mendes, Pedro; Palsson, Bernhard Ø.

Nat Biotechnol ; 31(5): 419-25, 2013 May.

Article in English | MEDLINE | ID: mdl-23455439

ABSTRACT

Multiple models of human metabolism have been reconstructed, but each represents only a subset of our knowledge. Here we describe Recon 2, a community-driven, consensus 'metabolic reconstruction', which is the most comprehensive representation of human metabolism that is applicable to computational modeling. Compared with its predecessors, the reconstruction has improved topological and functional features, including ~2× more reactions and ~1.7× more unique metabolites. Using Recon 2 we predicted changes in metabolite biomarkers for 49 inborn errors of metabolism with 77% accuracy when compared to experimental data. Mapping metabolomic data and drug information onto Recon 2 demonstrates its potential for integrating and analyzing diverse data types. Using protein expression data, we automatically generated a compendium of 65 cell type-specific models, providing a basis for manual curation or investigation of cell-specific metabolic properties. Recon 2 will facilitate many future biomedical studies and is freely available at http://humanmetabolism.org/.

Subject(s)

Databases, Protein , Metabolome/physiology , Models, Biological , Proteome/metabolism , Computer Simulation , Humans

Improving the description of metabolic networks: the TCA cycle as example.

Stobbe, Miranda D; Houten, Sander M; van Kampen, Antoine H C; Wanders, Ronald J A; Moerland, Perry D.

FASEB J ; 26(9): 3625-36, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22661004

ABSTRACT

To collect the ever-increasing yet scattered knowledge on metabolism, multiple pathway databases like the Kyoto Encyclopedia of Genes and Genomes have been created. A complete and accurate description of the metabolic network for human and other organisms is essential to foster new biological discoveries. Previous research has shown, however, that the level of agreement among pathway databases is surprisingly low. We investigated whether the lack of consensus among databases can be explained by an inaccurate representation of the knowledge described in scientific literature. As an example, we focus on the well-known tricarboxylic acid (TCA) cycle and evaluated the description of this pathway as found in a comprehensive selection of 10 human metabolic pathway databases. Remarkably, none of the descriptions given by these databases is entirely correct. Moreover, consensus exists on only 3 reactions. Mistakes in pathway databases might lead to the propagation of incorrect knowledge, misinterpretation of high-throughput molecular data, and poorly designed follow-up experiments. We provide an improved description of the TCA cycle via the community-curated database WikiPathways. We review various initiatives that aim to improve the description of the human metabolic network and discuss the importance of the active involvement of biological experts in these.

Subject(s)

Citric Acid Cycle , Humans

Critical assessment of human metabolic pathway databases: a stepping stone for future integration.

Stobbe, Miranda D; Houten, Sander M; Jansen, Gerbert A; van Kampen, Antoine H C; Moerland, Perry D.

BMC Syst Biol ; 5: 165, 2011 Oct 14.

Article in English | MEDLINE | ID: mdl-21999653

ABSTRACT

BACKGROUND: Multiple pathway databases are available that describe the human metabolic network and have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts. RESULTS: We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially on reaction level, where the databases agree on 3% of the 6968 reactions they have combined. Even for the well-established tricarboxylic acid cycle the databases agree on only 5 out of the 30 reactions in total. We identified the main causes for the lack of overlap. Importantly, the databases are partly complementary. Other explanations include the number of steps a conversion is described in and the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous names for metabolites also affect the comparison. CONCLUSIONS: Our results show that each of the five networks compared provides us with a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, next to a need for standardizing the metabolite names and identifiers, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism. 
Our comparison provides a stepping stone for such an endeavor.
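
The overlap figure reported in this abstract (agreement on 3% of the 6968 combined reactions) can be read as the size of the intersection of the databases' reaction sets divided by the size of their union. The following is a minimal sketch of that calculation, assuming each database has already been reduced to a set of comparable reaction identifiers. The database names, the identifiers, and the function `consensus_fraction` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
from functools import reduce


def consensus_fraction(reaction_sets):
    """Return (n_consensus, n_combined, fraction) for a dict of name -> set.

    n_consensus: reactions present in every database (intersection).
    n_combined:  reactions present in at least one database (union).
    """
    sets = list(reaction_sets.values())
    if not sets:
        return 0, 0, 0.0
    combined = reduce(set.union, sets, set())
    consensus = reduce(set.intersection, sets[1:], set(sets[0]))
    fraction = len(consensus) / len(combined) if combined else 0.0
    return len(consensus), len(combined), fraction


# Toy data: hypothetical databases and reaction identifiers.
toy = {
    "db_a": {"r1", "r2", "r3", "r4"},
    "db_b": {"r1", "r3", "r5"},
    "db_c": {"r1", "r6", "r7", "r8"},
}
n_consensus, n_combined, fraction = consensus_fraction(toy)
print(f"{n_consensus} of {n_combined} reactions shared ({fraction:.1%})")
```

In practice the hard part is the step this sketch takes for granted: as the abstract notes, missing metabolite identifiers and ambiguous metabolite names affect the comparison, so reactions have to be matched across databases before any set arithmetic is meaningful.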

Subject(s)

Citric Acid Cycle , Databases, Factual , Metabolic Networks and Pathways , Databases, Genetic , Genes , Humans , Terminology as Topic
