1.
J Thorac Cardiovasc Surg ; 166(1): 141-152.e1, 2023 07.
Article in English | MEDLINE | ID: mdl-34689984

ABSTRACT

OBJECTIVES: We examined differences in pre-left ventricular assist device (LVAD) implantation myocardial transcriptome signatures among patients with different degrees of mitral regurgitation (MR). METHODS: Between January 2018 and October 2019, we collected left ventricular (LV) cores during durable LVAD implantation (n = 72). A retrospective chart review was performed. Total RNA was isolated from LV cores and used to construct cDNA sequence libraries. The libraries were sequenced with the NovaSeq system, and data were quantified using Kallisto. Gene Set Enrichment Analysis (GSEA) and Gene Ontology analyses were performed, with a false discovery rate <0.05 considered significant. RESULTS: Comparing patients with preoperative mild or less MR (n = 30) and those with moderate-severe MR (n = 42), the moderate-severe MR group weighed less (P = .004) and had more tricuspid valve repairs (P = .043), without differences in demographics or comorbidities. We then compared both groups with a group of human donor hearts without heart failure (n = 8). Compared with the donor hearts, there were 3985 differentially expressed genes (DEGs) for mild or less MR and 4587 DEGs for moderate-severe MR. Specifically altered genes included 448 DEGs specific for mild or less MR and 1050 DEGs specific for moderate-severe MR. On GSEA, commonly regulated genes showed increased immune gene expression and reduced expression of contraction and energetic genes. Of the 1050 genes specific for moderate-severe MR, there were additional up-regulated genes related to inflammation and reduced expression of genes related to cellular proliferation. CONCLUSIONS: Patients undergoing durable LVAD implantation with moderate-severe MR had increased activation of genes related to inflammation and reduction of cellular proliferation genes. This may have important implications for myocardial recovery.


Subject(s)
Heart Failure , Heart Transplantation , Heart-Assist Devices , Mitral Valve Insufficiency , Humans , Mitral Valve Insufficiency/diagnostic imaging , Mitral Valve Insufficiency/genetics , Mitral Valve Insufficiency/surgery , Transcriptome , Retrospective Studies , Treatment Outcome , Tissue Donors , Heart Failure/genetics , Heart Failure/surgery , Inflammation
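
The DEG counts reported in the abstract above can be cross-checked with simple arithmetic: subtracting each group's MR-specific DEGs from its total should leave the same number of shared DEGs. The following is a minimal sketch, assuming (the abstract does not state this explicitly) that each group's total splits cleanly into shared and group-specific genes; the dictionary keys and the function name are illustrative only.

```python
# Counts are taken from the abstract; the clean shared/specific
# partition is an assumption made for this consistency check.
deg_counts = {
    "mild_or_less_MR": {"total": 3985, "specific": 448},
    "moderate_severe_MR": {"total": 4587, "specific": 1050},
}


def shared_degs(group):
    """Return the DEGs (vs. donor hearts) that are not specific to this MR group."""
    return group["total"] - group["specific"]


shared = {name: shared_degs(group) for name, group in deg_counts.items()}
print(shared)
```

Under that assumption both groups yield the same shared count, which suggests the reported totals and group-specific counts are internally consistent.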
2.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36477833

ABSTRACT

MOTIVATION: While many quantum computing (QC) methods promise theoretical advantages over classical counterparts, quantum hardware remains limited. Exploiting near-term QC in computer-aided drug design (CADD) thus requires judicious partitioning between classical and quantum calculations. RESULTS: We present HypaCADD, a hybrid classical-quantum workflow for finding ligands binding to proteins, while accounting for genetic mutations. We explicitly identify modules of our drug-design workflow currently amenable to replacement by QC: non-intuitively, we identify the mutation-impact predictor as the best candidate. HypaCADD thus combines classical docking and molecular dynamics with quantum machine learning (QML) to infer the impact of mutations. We present a case study with the coronavirus (SARS-CoV-2) protease and associated mutants. We map a classical machine-learning module onto QC, using a neural network constructed from qubit-rotation gates. We have implemented this in simulation and on two commercial quantum computers. We find that the QML models can perform on par with, if not better than, classical baselines. In summary, HypaCADD offers a successful strategy for leveraging QC for CADD. AVAILABILITY AND IMPLEMENTATION: Jupyter Notebooks with Python code are freely available for academic use on GitHub: https://www.github.com/hypahub/hypacadd_notebook. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19 , Software , Humans , Workflow , Computing Methodologies , Quantum Theory , SARS-CoV-2 , Drug Design , Molecular Dynamics Simulation
3.
Nat Biotechnol ; 39(9): 1151-1160, 2021 09.
Article in English | MEDLINE | ID: mdl-34504347

ABSTRACT

The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.


Subject(s)
Benchmarking , Breast Neoplasms/genetics , DNA Mutational Analysis/standards , High-Throughput Nucleotide Sequencing/standards , Whole Genome Sequencing/standards , Cell Line, Tumor , Datasets as Topic , Germ Cells , Humans , Mutation , Reference Standards , Reproducibility of Results
4.
Circ Heart Fail ; 13(4): e006409, 2020 04.
Article in English | MEDLINE | ID: mdl-32264717

ABSTRACT

BACKGROUND: Ischemic tolerance of donor hearts has a major impact on the efficiency in utilization and clinical outcomes. Molecular events during storage may influence the severity of ischemic injury. METHODS: RNA sequencing was used to study the transcriptional profile of the human left ventricle (LV, n=4) and right ventricle (RV, n=4) after 0, 4, and 8 hours of cold storage in histidine-tryptophan-ketoglutarate preservation solution. Gene set enrichment analysis and gene ontology analysis were used to examine transcriptomic changes with cold storage. Terminal deoxynucleotidyl transferase 2´-Deoxyuridine, 5´-Triphosphate nick end labeling and p65 staining were used to examine cell death and NFκB activation, respectively. RESULTS: The LV showed activation of genes related to inflammation and allograft rejection but downregulation of oxidative phosphorylation and fatty acid metabolism pathway genes. In contrast, inflammation-related genes were down-regulated in the RV, while oxidative phosphorylation genes were activated. These transcriptomic changes were most significant at 8 hours, with much smaller differences observed between 0 and 4 hours. RNA velocity estimates corroborated the finding that immune-related genes were activated in the LV but not in the RV during storage. With increasing preservation duration, the LV showed an increase in nuclear translocation of NFκB (p65), whereas the RV showed increased cell death close to the endocardium, especially at 8 hours. CONCLUSIONS: Our results demonstrated that the LV and RV of human donor hearts have distinct responses to cold ischemic storage. Transcriptomic changes related to inflammation, oxidative phosphorylation, and fatty acid metabolism pathways as well as cell death and NFκB activation were most pronounced after 8 hours of storage.


Subject(s)
Cold Temperature/adverse effects , Heart Transplantation , Heart Ventricles/metabolism , Organ Preservation , Primary Graft Dysfunction/genetics , Transcriptome , Apoptosis/drug effects , Apoptosis/genetics , Energy Metabolism/drug effects , Energy Metabolism/genetics , Gene Expression Profiling , Glucose/pharmacology , Heart Transplantation/adverse effects , Heart Ventricles/drug effects , Heart Ventricles/pathology , Humans , Inflammation/genetics , Inflammation/pathology , Mannitol/pharmacology , Organ Preservation/adverse effects , Organ Preservation Solutions/pharmacology , Potassium Chloride/pharmacology , Primary Graft Dysfunction/pathology , Primary Graft Dysfunction/prevention & control , Procaine/pharmacology , Risk Factors , Time Factors , Transcriptome/drug effects
5.
Sci Rep ; 10(1): 4983, 2020 03 18.
Article in English | MEDLINE | ID: mdl-32188929

ABSTRACT

Tumor Mutational Burden (TMB) is a measure of the abundance of somatic mutations in a tumor, which has been shown to be an emerging biomarker for both anti-PD-(L)1 treatment and prognosis; however, multiple challenges still hinder the adoption of TMB as a biomarker. The key challenges are the inconsistency of tumor mutational burden measurement among assays and the lack of a meaningful threshold for TMB classification. Here we describe a new method, ecTMB (Estimation and Classification of TMB), which uses an explicit background mutation model to predict TMB robustly and to classify samples into biologically meaningful subtypes defined by tumor mutational burden.


Subject(s)
Biomarkers, Tumor/genetics , DNA, Neoplasm/genetics , Genome, Human , Mutation , Neoplasms/classification , Neoplasms/genetics , Tumor Burden , DNA Mutational Analysis , DNA, Neoplasm/analysis , Exome , Humans , Immunotherapy/methods , Models, Statistical , Neoplasms/drug therapy , Neoplasms/pathology , Prognosis , Treatment Outcome
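
For orientation, the quantity discussed in the abstract above is conventionally reported as somatic mutations per megabase of sequenced territory. The sketch below shows only that naive count-per-megabase calculation; it is not the ecTMB method, which the abstract says relies on an explicit background mutation model. The function name and the example numbers are hypothetical.

```python
def tumor_mutational_burden(n_somatic_mutations, covered_bases):
    """Return a naive TMB: somatic mutations per megabase of covered sequence.

    Illustrative only; this is NOT the ecTMB background-model estimator.
    """
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_somatic_mutations / (covered_bases / 1_000_000)


# Hypothetical example: 300 somatic mutations over a 30 Mb capture region.
example_tmb = tumor_mutational_burden(300, 30_000_000)
print(example_tmb)
```

Because the denominator depends on the assay's covered territory, the same tumor can yield different naive values on different panels, which is one form of the inter-assay inconsistency the abstract mentions.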
6.
Nat Commun ; 10(1): 1041, 2019 03 04.
Article in English | MEDLINE | ID: mdl-30833567

ABSTRACT

Accurate detection of somatic mutations is still a challenge in cancer analysis. Here we present NeuSomatic, the first convolutional neural network approach for somatic mutation detection, which significantly outperforms previous methods on different sequencing platforms, sequencing strategies, and tumor purities. NeuSomatic summarizes sequence alignments into small matrices and incorporates more than a hundred features to capture mutation signals effectively. It can be used universally as a stand-alone somatic mutation detection method or with an ensemble of existing methods to achieve the highest accuracy.


Subject(s)
Computational Biology/methods , DNA Mutational Analysis/methods , Machine Learning , Mutation , Neural Networks, Computer , Computational Biology/instrumentation , DNA Mutational Analysis/instrumentation , Databases, Genetic , Diploidy , Exome , Genes, Neoplasm , Humans , Neoplasms/genetics , Sequence Alignment , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods
7.
Nat Commun ; 9(1): 1069, 2018 03 14.
Article in English | MEDLINE | ID: mdl-29540679

ABSTRACT

The human genome is generally organized into stable chromosomes, and only tumor cells are known to accumulate kilobase (kb)-sized extrachromosomal circular DNA elements (eccDNAs). However, it must be expected that kb eccDNAs exist in normal cells as a result of mutations. Here, we purify and sequence eccDNAs from muscle and blood samples from 16 healthy men, detecting ~100,000 unique eccDNA types from 16 million nuclei. Half of these structures carry genes or gene fragments and the majority are smaller than 25 kb. Transcription from eccDNAs suggests that eccDNAs reside in nuclei and recurrence of certain eccDNAs in several individuals implies DNA circularization hotspots. Gene-rich chromosomes contribute to more eccDNAs per megabase and the most transcribed protein-coding gene in muscle, TTN (titin), provides the most eccDNAs per gene. Thus, somatic genomes are rich in chromosome-derived eccDNAs that may influence phenotypes through altered gene copy numbers and transcription of full-length or truncated genes.


Subject(s)
Chromosomes, Human/genetics , DNA, Circular/genetics , Humans , Mutation/genetics , Transcription, Genetic/genetics
8.
Genome Res ; 28(4): 423-431, 2018 04.
Article in English | MEDLINE | ID: mdl-29567674

ABSTRACT

Over a decade ago, the Atacama humanoid skeleton (Ata) was discovered in the Atacama region of Chile. The Ata specimen carried a strange phenotype (6-in stature, fewer ribs than expected, an elongated cranium, and accelerated bone age), leading to speculation that this was a preserved nonhuman primate, a human fetus harboring genetic mutations, or even an extraterrestrial. We previously reported that it was human by DNA analysis with an estimated bone age of about 6-8 yr at the time of demise. To determine the possible genetic drivers of the observed morphology, DNA from the specimen was subjected to whole-genome sequencing using the Illumina HiSeq platform with an average 11.5× coverage of 101-bp, paired-end reads. In total, 3,356,569 single nucleotide variations (SNVs), 518,365 insertions and deletions (indels), and 1047 structural variations (SVs) were detected relative to the human reference genome. Here, we present the detailed whole-genome analysis showing that Ata is a female of human origin, likely of Chilean descent, and its genome harbors mutations in genes (COL1A1, COL2A1, KMT2D, FLNB, ATR, TRIP11, PCNT) previously linked with diseases of small stature, rib anomalies, cranial malformations, premature joint fusion, and osteochondrodysplasia (also known as skeletal dysplasia). Together, these findings provide a molecular characterization of Ata's peculiar phenotype, which likely results from multiple known and novel putative gene mutations affecting bone development and ossification.


Subject(s)
DNA, Ancient/analysis , Genome, Human/genetics , Osteochondrodysplasias/genetics , Whole Genome Sequencing , Animals , Female , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Molecular Sequence Annotation , Mutation/genetics , Osteochondrodysplasias/physiopathology , Phenotype , Polymorphism, Single Nucleotide/genetics
9.
Nat Commun ; 8(1): 59, 2017 07 05.
Article in English | MEDLINE | ID: mdl-28680106

ABSTRACT

RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.


Subject(s)
Embryonic Stem Cells , Transcriptome , Base Sequence , Cell Line , Humans
10.
Hum Mutat ; 38(9): 1155-1168, 2017 09.
Article in English | MEDLINE | ID: mdl-28397312

ABSTRACT

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible classes of disease and one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of the 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of the 63 patients (62%). Each prediction group correctly diagnosed at least one patient who was not successfully diagnosed by any other group. We discuss the causal variant predictions by different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false-positive rate of DNA-guided analysis in the absence of prior phenotypic indication.


Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Databases, Genetic , Genetic Predisposition to Disease , Genetic Testing , Humans , Phenotype
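
The success rates quoted in the abstract above follow directly from the reported patient counts. A minimal sketch of that arithmetic, using only the numbers given in the abstract; the dictionary keys and helper name are illustrative.

```python
# (patients whose disease class was identified by at least one predictor,
#  patients in the group), as reported in the abstract.
groups = {
    "variant_reported": (36, 43),
    "no_variant_reported": (39, 63),
}


def percent(hits, total):
    """Return hits/total as a percentage rounded to the nearest integer."""
    return round(100 * hits / total)


rates = {name: percent(hits, total) for name, (hits, total) in groups.items()}
total_patients = sum(total for _, total in groups.values())
print(rates, total_patients)
```

The two groups also sum to the 106 patients stated in the abstract.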
11.
Bioinformatics ; 32(24): 3829-3832, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27667791

ABSTRACT

LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. AVAILABILITY AND IMPLEMENTATION: LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd CONTACT: hugo.lam@roche.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Sequence Alignment
12.
BMC Genomics ; 17: 64, 2016 Jan 16.
Article in English | MEDLINE | ID: mdl-26772178

ABSTRACT

BACKGROUND: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. RESULTS: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. 
CONCLUSIONS: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.


Subject(s)
Genome, Human , Genomic Structural Variation , Software , Benchmarking , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Pedigree , Polymorphism, Single Nucleotide/genetics
13.
Nature ; 526(7571): 75-81, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics
15.
Genome Biol ; 16: 197, 2015 Sep 17.
Article in English | MEDLINE | ID: mdl-26381235

ABSTRACT

SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.


Subject(s)
DNA Mutational Analysis/methods , Machine Learning , Neoplasms/genetics , Humans , INDEL Mutation
16.
Sci Rep ; 5: 14493, 2015 Sep 28.
Article in English | MEDLINE | ID: mdl-26412485

ABSTRACT

A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.


Subject(s)
Benchmarking , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/standards , Humans
17.
Nat Commun ; 6: 7256, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-26028266

ABSTRACT

Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyse 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi-C interactions, and histone marks and the substitution patterns of nucleotides near them, we find that breakpoints with the signature of non-allelic homologous recombination (NAHR) are associated with open chromatin. We hypothesize that some NAHR deletions occur without DNA replication and cell division, in embryonic and germline cells. In contrast, breakpoints associated with non-homologous (NH) mechanisms often have sequence microinsertions, templated from later replicating genomic sites, spaced at two characteristic distances from the breakpoint. These microinsertions are consistent with template-switching events and suggest a particular spatiotemporal configuration for DNA during the events.


Subject(s)
Chromosome Breakpoints , DNA/metabolism , Gene Deletion , Genome, Human/genetics , Chromatin , DNA Replication , Homologous Recombination , Humans , Mutation , Nucleotides , Sequence Deletion
18.
Bioinformatics ; 31(16): 2741-4, 2015 Aug 15.
Article in English | MEDLINE | ID: mdl-25861968

ABSTRACT

Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy particularly for insertions. We propose MetaSV, an integrated SV caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes. Using simulation and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes. AVAILABILITY AND IMPLEMENTATION: Code in Python is at http://bioinform.github.io/metasv/. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Mutagenesis, Insertional , Sequence Deletion
19.
Bioinformatics ; 31(9): 1469-71, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25524895

ABSTRACT

SUMMARY: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. AVAILABILITY AND IMPLEMENTATION: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Genomics , Humans , Mutation , Neoplasms/genetics , Sequence Alignment