Results 1 - 7 of 7
1.
BMC Bioinformatics ; 19(1): 28, 2018 Jan 31.
Article in English | MEDLINE | ID: mdl-29385983

ABSTRACT

BACKGROUND: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge.

RESULTS: The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases.

CONCLUSIONS: The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
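As a rough illustration of what counting germline leakage involves, the sketch below intersects a set of predicted somatic SNVs with a patient's known germline variants. The function name `count_germline_leakage`, the variant tuple layout, and the toy coordinates are assumptions made for this example; it is not the GermlineFilter implementation described in the abstract.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def count_germline_leakage(
    somatic_calls: Iterable[Variant],
    germline_variants: Iterable[Variant],
) -> Tuple[int, Set[Variant]]:
    """Return how many predicted somatic SNVs coincide with known germline
    variants, together with the coinciding variants themselves."""
    leaked = set(somatic_calls) & set(germline_variants)
    return len(leaked), leaked


# Toy data with hypothetical coordinates (not taken from the paper).
somatic = [("1", 1000, "A", "T"), ("2", 2000, "G", "C"), ("3", 3000, "T", "G")]
germline = [("2", 2000, "G", "C"), ("4", 4000, "C", "A")]

n_leaked, leaked = count_germline_leakage(somatic, germline)
print(n_leaked, leaked)
```

Exact matching on position and alleles is the simplest possible definition of a leaked variant; a real filter would also need to handle normalization of variant representations.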


Subject(s)
Genome, Human; Germ Cells/metabolism; Polymorphism, Single Nucleotide; Algorithms; Humans; Internet; Neoplasms/genetics; Neoplasms/pathology; User-Computer Interface; Whole Genome Sequencing
2.
BMC Bioinformatics ; 19(1): 339, 2018 Sep 25.
Article in English | MEDLINE | ID: mdl-30253747

ABSTRACT

BACKGROUND: Platform-specific error profiles necessitate confirmatory studies where predictions made on data generated using one technology are additionally verified by processing the same samples on an orthogonal technology. However, verifying all predictions can be costly and redundant, and testing a subset of findings is often used to estimate the true error profile.

RESULTS: To determine how to create subsets of predictions for validation that maximize accuracy of global error profile inference, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates. We evaluated these selection strategies on one simulated and two experimental datasets.

CONCLUSIONS: Valection is implemented in multiple programming languages, available at: http://labs.oicr.on.ca/boutros-lab/software/valection.
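To make the idea of selecting verification candidates concrete, here is a minimal sketch of one plausible strategy: spending an equal share of a fixed verification budget on each caller. The function name `sample_equal_per_caller`, its parameters, and the toy call identifiers are hypothetical; this is not Valection's API, and the abstract does not say which strategies the tool implements.

```python
import random
from typing import Dict, List, Set


def sample_equal_per_caller(
    calls_by_caller: Dict[str, Set[str]], budget: int, seed: int = 0
) -> List[str]:
    """Pick up to `budget` distinct calls, giving each caller an equal share."""
    if not calls_by_caller:
        return []
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    per_caller = budget // len(calls_by_caller)
    selected: Set[str] = set()
    for caller in sorted(calls_by_caller):
        # Skip calls already chosen via another caller so none is counted twice.
        candidates = sorted(calls_by_caller[caller] - selected)
        k = min(per_caller, len(candidates))
        selected.update(rng.sample(candidates, k))
    return sorted(selected)


# Toy input: two hypothetical callers with partially overlapping calls.
calls = {
    "callerA": {"v1", "v2", "v3", "v4"},
    "callerB": {"v3", "v4", "v5", "v6"},
}
picked = sample_equal_per_caller(calls, budget=4)
print(picked)
```

An equal-per-caller split keeps a prolific caller from dominating the verified subset, at the cost of under-sampling callers that make many predictions.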


Subject(s)
Sequence Analysis, DNA/methods; Software Validation
3.
Nat Methods ; 12(7): 623-30, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25984700

ABSTRACT

The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.
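The abstract does not say how the ensemble of pipelines was built. As one simple possibility, the sketch below keeps a variant when at least a minimum number of pipelines report it. The name `majority_vote`, the `min_votes` threshold, and the toy call sets are assumptions for illustration only.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Assumed representation: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def majority_vote(call_sets: Iterable[Set[Variant]], min_votes: int) -> Set[Variant]:
    """Keep variants reported by at least `min_votes` of the input call sets."""
    votes = Counter(variant for calls in call_sets for variant in calls)
    return {variant for variant, n in votes.items() if n >= min_votes}


# Toy call sets from three hypothetical pipelines.
pipeline_a = {("1", 100, "A", "T"), ("1", 200, "C", "G")}
pipeline_b = {("1", 100, "A", "T"), ("2", 300, "G", "A")}
pipeline_c = {("1", 100, "A", "T"), ("1", 200, "C", "G")}

consensus = majority_vote([pipeline_a, pipeline_b, pipeline_c], min_votes=2)
print(sorted(consensus))
```

Because each pipeline's calls are a set, a variant gets at most one vote per pipeline; raising `min_votes` trades recall for precision.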


Subject(s)
Benchmarking; Crowdsourcing; Genome; Neoplasms/genetics; Polymorphism, Single Nucleotide; Algorithms; Humans
4.
Nat Methods ; 11(10): 1071-5, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25173705

ABSTRACT

As high-throughput sequencing continues to increase in speed and throughput, routine clinical and industrial application draws closer. These 'production' settings will require enhanced quality monitoring and quality control to optimize output and reduce costs. We developed SeqControl, a framework for predicting sequencing quality and coverage using a set of 15 metrics describing overall coverage, coverage distribution, basewise coverage and basewise quality. Using whole-genome sequences of 27 prostate cancers and 26 normal references, we derived multivariate models that predict sequencing quality and depth. SeqControl robustly predicted how much sequencing was required to reach a given coverage depth (area under the curve (AUC) = 0.993), accurately classified clinically relevant formalin-fixed, paraffin-embedded samples, and made predictions from as little as one-eighth of a sequencing lane (AUC = 0.967). These techniques can be immediately incorporated into existing sequencing pipelines to monitor data quality in real time. SeqControl is available at http://labs.oicr.on.ca/Boutros-lab/software/SeqControl/.
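For readers unfamiliar with the AUC figures quoted above, the following sketch computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The function `roc_auc` and the toy labels and scores are illustrative assumptions and are unrelated to SeqControl's models or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` holds 1 for positive and 0 for negative examples; ties between
    a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(auc)  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for a demonstration but would be replaced by a sort-based computation on large datasets.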


Subject(s)
Computational Biology/methods; Prostatic Neoplasms/metabolism; Sequence Analysis, DNA/methods; Algorithms; Area Under Curve; Genome; Genotype; Humans; Linear Models; Male; Multivariate Analysis; Quality Control; Software
5.
Alzheimers Dement ; 12(6): 645-53, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27079753

ABSTRACT

Identifying accurate biomarkers of cognitive decline is essential for advancing early diagnosis and prevention therapies in Alzheimer's disease. The Alzheimer's disease DREAM Challenge was designed as a computational crowdsourced project to benchmark the current state-of-the-art in predicting cognitive outcomes in Alzheimer's disease based on high dimensional, publicly available genetic and structural imaging data. This meta-analysis failed to identify a meaningful predictor developed from either data modality, suggesting that alternate approaches should be considered for prediction of cognitive performance.


Subject(s)
Alzheimer Disease/complications; Cognition Disorders/diagnosis; Cognition Disorders/etiology; Alzheimer Disease/genetics; Apolipoproteins E/genetics; Biomarkers; Cognition Disorders/genetics; Computational Biology; Databases, Bibliographic/statistics & numerical data; Humans; Predictive Value of Tests
6.
Cell Syst ; 12(8): 827-838.e5, 2021 Aug 18.
Article in English | MEDLINE | ID: mdl-34146471

ABSTRACT

The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.


Subject(s)
Neoplasms; Humans; Neoplasms/genetics; Protein Isoforms/genetics; RNA/genetics; RNA-Seq; Sequence Analysis, RNA
7.
Genome Biol ; 19(1): 188, 2018 Nov 06.
Article in English | MEDLINE | ID: mdl-30400818

ABSTRACT

BACKGROUND: The phenotypes of cancer cells are driven in part by somatic structural variants. Structural variants can initiate tumors, enhance their aggressiveness, and provide unique therapeutic opportunities. Whole-genome sequencing of tumors can allow exhaustive identification of the specific structural variants present in an individual cancer, facilitating both clinical diagnostics and the discovery of novel mutagenic mechanisms. A plethora of somatic structural variant detection algorithms have been created to enable these discoveries; however, there are no systematic benchmarks of them. Rigorous performance evaluation of somatic structural variant detection methods has been challenged by the lack of gold standards, extensive resource requirements, and difficulties arising from the need to share personal genomic information.

RESULTS: To facilitate structural variant detection algorithm evaluations, we create a robust simulation framework for somatic structural variants by extending the BAMSurgeon algorithm. We then organize and enable a crowdsourced benchmarking within the ICGC-TCGA DREAM Somatic Mutation Calling Challenge (SMC-DNA). We report here the results of structural variant benchmarking on three different tumors, comprising 204 submissions from 15 teams. In addition to ranking methods, we identify characteristic error profiles of individual algorithms and general trends across them. Surprisingly, we find that ensembles of analysis pipelines do not always outperform the best individual method, indicating a need for new ways to aggregate somatic structural variant detection approaches.

CONCLUSIONS: The synthetic tumors and somatic structural variant detection leaderboards remain available as a community benchmarking resource, and BAMSurgeon is available at https://github.com/adamewing/bamsurgeon.
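Benchmarking against simulated tumors comes down to comparing predicted breakpoints with a known truth set. The sketch below scores a call set by precision, recall, and F1, matching each predicted breakpoint to at most one true breakpoint within a position tolerance. The function `score_calls`, the greedy matching rule, and the 100 bp default tolerance are assumptions for illustration, not the scoring used by the challenge.

```python
from typing import List, Tuple

# Assumed representation of a breakpoint: (chromosome, position).
Breakpoint = Tuple[str, int]


def score_calls(
    predicted: List[Breakpoint], truth: List[Breakpoint], tolerance: int = 100
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted breakpoints against truth."""
    unmatched = list(truth)  # true breakpoints not yet claimed by a prediction
    true_positives = 0
    for chrom, pos in predicted:
        for i, (t_chrom, t_pos) in enumerate(unmatched):
            if chrom == t_chrom and abs(pos - t_pos) <= tolerance:
                true_positives += 1
                del unmatched[i]  # each true breakpoint can be matched once
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: one prediction lands within tolerance, one does not.
truth = [("1", 1000), ("2", 5000), ("3", 9000)]
predicted = [("1", 1040), ("2", 7000)]

precision, recall, f1 = score_calls(predicted, truth)
print(precision, recall, f1)
```

Greedy first-match assignment is order dependent when breakpoints cluster within the tolerance; an optimal assignment would be needed to score such cases fairly.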


Subject(s)
Benchmarking; Computer Simulation; Crowdsourcing; Genetic Variation; Genome, Human; Genomics/methods; Neoplasms/genetics; Algorithms; Databases, Genetic; High-Throughput Nucleotide Sequencing; Humans; Software