Results 1 - 20 of 38
1.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36477833

ABSTRACT

MOTIVATION: While many quantum computing (QC) methods promise theoretical advantages over classical counterparts, quantum hardware remains limited. Exploiting near-term QC in computer-aided drug design (CADD) thus requires judicious partitioning between classical and quantum calculations. RESULTS: We present HypaCADD, a hybrid classical-quantum workflow for finding ligands binding to proteins, while accounting for genetic mutations. We explicitly identify modules of our drug-design workflow currently amenable to replacement by QC: non-intuitively, we identify the mutation-impact predictor as the best candidate. HypaCADD thus combines classical docking and molecular dynamics with quantum machine learning (QML) to infer the impact of mutations. We present a case study with the coronavirus (SARS-CoV-2) protease and associated mutants. We map a classical machine-learning module onto QC, using a neural network constructed from qubit-rotation gates. We have implemented this in simulation and on two commercial quantum computers. We find that the QML models can perform on par with, if not better than, classical baselines. In summary, HypaCADD offers a successful strategy for leveraging QC for CADD. AVAILABILITY AND IMPLEMENTATION: Jupyter Notebooks with Python code are freely available for academic use on GitHub: https://www.github.com/hypahub/hypacadd_notebook. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
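As a rough illustration of what a "neural network constructed from qubit-rotation gates" can mean, the sketch below simulates a single qubit classically. It assumes one RY rotation applied to the |0> state, with the rotation angle set by a weighted input; the names `qubit_neuron`, `weight`, and `bias` are hypothetical, and this is not the HypaCADD circuit, whose structure the abstract does not describe.

```python
import math


def ry_probability_of_one(theta: float) -> float:
    """Probability of measuring |1> after applying RY(theta) to |0>.

    RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>, so P(1) = sin^2(theta/2).
    """
    return math.sin(theta / 2) ** 2


def qubit_neuron(x: float, weight: float, bias: float) -> float:
    """Hypothetical one-qubit 'neuron': the input sets the rotation angle.

    The output is a probability in [0, 1], playing the role that a sigmoid
    activation plays in a classical neuron.
    """
    return ry_probability_of_one(weight * x + bias)


# No rotation leaves the qubit in |0>; a rotation by pi flips it to |1>.
p_zero_rotation = ry_probability_of_one(0.0)
p_full_flip = ry_probability_of_one(math.pi)
p_half = qubit_neuron(1.0, weight=math.pi / 2, bias=0.0)
```

In a real workflow the weights would be trained and the circuit would run on a simulator or quantum hardware; here the closed-form probability stands in for both.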


Subject(s)
COVID-19 , Software , Humans , Workflow , Computing Methodologies , Quantum Theory , SARS-CoV-2 , Drug Design , Molecular Dynamics Simulation
2.
J Thorac Cardiovasc Surg ; 166(1): 141-152.e1, 2023 07.
Article in English | MEDLINE | ID: mdl-34689984

ABSTRACT

OBJECTIVES: We examined differences in pre-left ventricular assist device (LVAD) implantation myocardial transcriptome signatures among patients with different degrees of mitral regurgitation (MR). METHODS: Between January 2018 and October 2019, we collected left ventricular (LV) cores during durable LVAD implantation (n = 72). A retrospective chart review was performed. Total RNA was isolated from LV cores and used to construct cDNA sequence libraries. The libraries were sequenced with the NovaSeq system, and data were quantified using Kallisto. Gene Set Enrichment Analysis (GSEA) and Gene Ontology analyses were performed, with a false discovery rate <0.05 considered significant. RESULTS: Comparing patients with preoperative mild or less MR (n = 30) and those with moderate-severe MR (n = 42), the moderate-severe MR group weighed less (P = .004) and had more tricuspid valve repairs (P = .043), without differences in demographics or comorbidities. We then compared both groups with a group of human donor hearts without heart failure (n = 8). Compared with the donor hearts, there were 3985 differentially expressed genes (DEGs) for mild or less MR and 4587 DEGs for moderate-severe MR. Specifically altered genes included 448 DEGs specific for mild or less MR and 1050 DEGs specific for moderate-severe MR. On GSEA, commonly regulated genes showed increased immune gene expression and reduced expression of contraction and energetic genes. Of the 1050 genes specific for moderate-severe MR, there were additional up-regulated genes related to inflammation and reduced expression of genes related to cellular proliferation. CONCLUSIONS: Patients undergoing durable LVAD implantation with moderate-severe MR had increased activation of genes related to inflammation and reduction of cellular proliferation genes. This may have important implications for myocardial recovery.
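The abstract applies a false discovery rate cutoff of 0.05 but does not say which procedure was used. As a sketch only, and assuming the widely used Benjamini-Hochberg adjustment, the function below converts raw p-values into adjusted values that can be compared with that cutoff; the example p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as one common FDR procedure; the
    abstract does not state which method the authors applied.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for four hypothetical gene sets.
raw_p = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in q_values]
```

With these invented inputs only the first gene set passes the 0.05 cutoff, even though three raw p-values are below 0.05, which is the point of controlling the FDR across many tests.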


Subject(s)
Heart Failure , Heart Transplantation , Heart-Assist Devices , Mitral Valve Insufficiency , Humans , Mitral Valve Insufficiency/diagnostic imaging , Mitral Valve Insufficiency/genetics , Mitral Valve Insufficiency/surgery , Transcriptome , Retrospective Studies , Treatment Outcome , Tissue Donors , Heart Failure/genetics , Heart Failure/surgery , Inflammation
3.
Nat Biotechnol ; 39(9): 1151-1160, 2021 09.
Article in English | MEDLINE | ID: mdl-34504347

ABSTRACT

The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.


Subject(s)
Benchmarking , Breast Neoplasms/genetics , DNA Mutational Analysis/standards , High-Throughput Nucleotide Sequencing/standards , Whole Genome Sequencing/standards , Cell Line, Tumor , Datasets as Topic , Germ Cells , Humans , Mutation , Reference Standards , Reproducibility of Results
4.
Circ Heart Fail ; 13(4): e006409, 2020 04.
Article in English | MEDLINE | ID: mdl-32264717

ABSTRACT

BACKGROUND: Ischemic tolerance of donor hearts has a major impact on the efficiency of utilization and on clinical outcomes. Molecular events during storage may influence the severity of ischemic injury. METHODS: RNA sequencing was used to study the transcriptional profile of the human left ventricle (LV, n=4) and right ventricle (RV, n=4) after 0, 4, and 8 hours of cold storage in histidine-tryptophan-ketoglutarate preservation solution. Gene set enrichment analysis and gene ontology analysis were used to examine transcriptomic changes with cold storage. Terminal deoxynucleotidyl transferase dUTP nick end labeling (TUNEL) and p65 staining were used to examine cell death and NFκB activation, respectively. RESULTS: The LV showed activation of genes related to inflammation and allograft rejection but downregulation of oxidative phosphorylation and fatty acid metabolism pathway genes. In contrast, inflammation-related genes were down-regulated in the RV, while oxidative phosphorylation genes were activated. These transcriptomic changes were most significant at 8 hours, with much smaller differences observed between 0 and 4 hours. RNA velocity estimates corroborated the finding that immune-related genes were activated in the LV but not in the RV during storage. With increasing preservation duration, the LV showed an increase in nuclear translocation of NFκB (p65), whereas the RV showed increased cell death close to the endocardium, especially at 8 hours. CONCLUSIONS: Our results demonstrated that the LV and RV of human donor hearts have distinct responses to cold ischemic storage. Transcriptomic changes related to inflammation, oxidative phosphorylation, and fatty acid metabolism pathways as well as cell death and NFκB activation were most pronounced after 8 hours of storage.


Subject(s)
Cold Temperature/adverse effects , Heart Transplantation , Heart Ventricles/metabolism , Organ Preservation , Primary Graft Dysfunction/genetics , Transcriptome , Apoptosis/drug effects , Apoptosis/genetics , Energy Metabolism/drug effects , Energy Metabolism/genetics , Gene Expression Profiling , Glucose/pharmacology , Heart Transplantation/adverse effects , Heart Ventricles/drug effects , Heart Ventricles/pathology , Humans , Inflammation/genetics , Inflammation/pathology , Mannitol/pharmacology , Organ Preservation/adverse effects , Organ Preservation Solutions/pharmacology , Potassium Chloride/pharmacology , Primary Graft Dysfunction/pathology , Primary Graft Dysfunction/prevention & control , Procaine/pharmacology , Risk Factors , Time Factors , Transcriptome/drug effects
5.
Sci Rep ; 10(1): 4983, 2020 03 18.
Article in English | MEDLINE | ID: mdl-32188929

ABSTRACT

Tumor Mutational Burden (TMB) is a measure of the abundance of somatic mutations in a tumor, which has been shown to be an emerging biomarker for both anti-PD-(L)1 treatment and prognosis; however, multiple challenges still hinder the adoption of TMB as a biomarker. The key challenges are the inconsistency of tumor mutational burden measurement among assays and the lack of a meaningful threshold for TMB classification. Here we describe a new method, ecTMB (Estimation and Classification of TMB), which uses an explicit background mutation model to predict TMB robustly and to classify samples into biologically meaningful subtypes defined by tumor mutational burden.
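For orientation, the simplest way TMB is commonly reported is as somatic mutations per megabase of sequenced territory. The sketch below shows that naive count-based calculation and a fixed-threshold classification; it is not the ecTMB method, which the abstract says relies on an explicit background mutation model, and the function names, counts, and threshold are hypothetical.

```python
def tumor_mutational_burden(n_somatic_mutations: int, covered_bases: int) -> float:
    """Naive TMB: somatic mutations per megabase of covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_somatic_mutations / (covered_bases / 1_000_000)


def classify_tmb(tmb: float, threshold: float) -> str:
    """Label a sample with a fixed cutoff (threshold is a hypothetical value)."""
    return "high" if tmb >= threshold else "low"


# Invented numbers: the same tumor assayed by an exome and by a small panel.
exome_tmb = tumor_mutational_burden(300, 30_000_000)  # 30 Mb exome
panel_tmb = tumor_mutational_burden(15, 1_200_000)    # 1.2 Mb panel
exome_label = classify_tmb(exome_tmb, threshold=12.0)
panel_label = classify_tmb(panel_tmb, threshold=12.0)
```

With these invented inputs the two assays land on opposite sides of the same cutoff, which illustrates the inconsistency among assays and the threshold problem that the abstract names as the key challenges.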


Subject(s)
Biomarkers, Tumor/genetics , DNA, Neoplasm/genetics , Genome, Human , Mutation , Neoplasms/classification , Neoplasms/genetics , Tumor Burden , DNA Mutational Analysis , DNA, Neoplasm/analysis , Exome , Humans , Immunotherapy/methods , Models, Statistical , Neoplasms/drug therapy , Neoplasms/pathology , Prognosis , Treatment Outcome
6.
Nat Commun ; 10(1): 1041, 2019 03 04.
Article in English | MEDLINE | ID: mdl-30833567

ABSTRACT

Accurate detection of somatic mutations is still a challenge in cancer analysis. Here we present NeuSomatic, the first convolutional neural network approach for somatic mutation detection, which significantly outperforms previous methods on different sequencing platforms, sequencing strategies, and tumor purities. NeuSomatic summarizes sequence alignments into small matrices and incorporates more than a hundred features to capture mutation signals effectively. It can be used universally as a stand-alone somatic mutation detection method or with an ensemble of existing methods to achieve the highest accuracy.
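To make "summarizes sequence alignments into small matrices" concrete, the sketch below reduces a toy pileup to a per-column base-count matrix. This is only an assumed, simplified stand-in: the abstract does not specify NeuSomatic's matrix layout or its additional features, and the `count_matrix` name and the toy reads are invented.

```python
BASES = "ACGT-"  # row order of the matrix; '-' marks a gap in the alignment


def count_matrix(aligned_reads):
    """Count each base per alignment column.

    aligned_reads: equal-length strings over A, C, G, T and '-'.
    Returns a list of len(BASES) rows, one count per column.
    Any other character (for example 'N') raises ValueError.
    """
    if not aligned_reads:
        return []
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all reads must have the same aligned length")
    matrix = [[0] * width for _ in BASES]
    for read in aligned_reads:
        for col, base in enumerate(read):
            matrix[BASES.index(base)][col] += 1
    return matrix


# Toy pileup: three reads over four reference positions.
toy_reads = ["ACGT", "ACTT", "A-GT"]
toy_matrix = count_matrix(toy_reads)
```

A fixed-size matrix like this can be fed to a convolutional network regardless of read depth, which is one plausible reason for summarizing alignments this way.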


Subject(s)
Computational Biology/methods , DNA Mutational Analysis/methods , Machine Learning , Mutation , Neural Networks, Computer , Computational Biology/instrumentation , DNA Mutational Analysis/instrumentation , Databases, Genetic , Diploidy , Exome , Genes, Neoplasm , Humans , Neoplasms/genetics , Sequence Alignment , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods
7.
Genome Res ; 28(4): 423-431, 2018 04.
Article in English | MEDLINE | ID: mdl-29567674

ABSTRACT

Over a decade ago, the Atacama humanoid skeleton (Ata) was discovered in the Atacama region of Chile. The Ata specimen carried a strange phenotype (6-in stature, fewer ribs than expected, elongated cranium, and accelerated bone age), leading to speculation that this was a preserved nonhuman primate, a human fetus harboring genetic mutations, or even an extraterrestrial. We previously reported that it was human by DNA analysis with an estimated bone age of about 6-8 yr at the time of demise. To determine the possible genetic drivers of the observed morphology, DNA from the specimen was subjected to whole-genome sequencing using the Illumina HiSeq platform with an average 11.5× coverage of 101-bp, paired-end reads. In total, 3,356,569 single nucleotide variations (SNVs), 518,365 insertions and deletions (indels), and 1047 structural variations (SVs) were detected as compared to the human reference genome. Here, we present the detailed whole-genome analysis showing that Ata is a female of human origin, likely of Chilean descent, and its genome harbors mutations in genes (COL1A1, COL2A1, KMT2D, FLNB, ATR, TRIP11, PCNT) previously linked with diseases of small stature, rib anomalies, cranial malformations, premature joint fusion, and osteochondrodysplasia (also known as skeletal dysplasia). Together, these findings provide a molecular characterization of Ata's peculiar phenotype, which likely results from multiple known and novel putative gene mutations affecting bone development and ossification.


Subject(s)
DNA, Ancient/analysis , Genome, Human/genetics , Osteochondrodysplasias/genetics , Whole Genome Sequencing , Animals , Female , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Molecular Sequence Annotation , Mutation/genetics , Osteochondrodysplasias/physiopathology , Phenotype , Polymorphism, Single Nucleotide/genetics
8.
Nat Commun ; 9(1): 1069, 2018 03 14.
Article in English | MEDLINE | ID: mdl-29540679

ABSTRACT

The human genome is generally organized into stable chromosomes, and only tumor cells are known to accumulate kilobase (kb)-sized extrachromosomal circular DNA elements (eccDNAs). However, it must be expected that kb eccDNAs exist in normal cells as a result of mutations. Here, we purify and sequence eccDNAs from muscle and blood samples from 16 healthy men, detecting ~100,000 unique eccDNA types from 16 million nuclei. Half of these structures carry genes or gene fragments and the majority are smaller than 25 kb. Transcription from eccDNAs suggests that eccDNAs reside in nuclei and recurrence of certain eccDNAs in several individuals implies DNA circularization hotspots. Gene-rich chromosomes contribute to more eccDNAs per megabase and the most transcribed protein-coding gene in muscle, TTN (titin), provides the most eccDNAs per gene. Thus, somatic genomes are rich in chromosome-derived eccDNAs that may influence phenotypes through altered gene copy numbers and transcription of full-length or truncated genes.


Subject(s)
Chromosomes, Human/genetics , DNA, Circular/genetics , Humans , Mutation/genetics , Transcription, Genetic/genetics
9.
Nat Commun ; 8(1): 59, 2017 07 05.
Article in English | MEDLINE | ID: mdl-28680106

ABSTRACT

RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.


Subject(s)
Embryonic Stem Cells , Transcriptome , Base Sequence , Cell Line , Humans
10.
Hum Mutat ; 38(9): 1155-1168, 2017 09.
Article in English | MEDLINE | ID: mdl-28397312

ABSTRACT

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible classes of disease, as well as one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of the 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of the 63 patients (62%). Each prediction group correctly diagnosed at least one patient that was not successfully diagnosed by any other group. We discuss the causal variant predictions by different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false-positive rate of DNA-guided analysis in the absence of prior phenotypic indication.
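The two success rates quoted above follow directly from the patient counts, and the counts also sum to the 106 patients in the challenge. The short check below reproduces the arithmetic; the combined rate is derived here from the same counts and is not a figure stated in the abstract.

```python
def percent(hits: int, total: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * hits / total)


# Patients for whom the Hopkins laboratory reported a variant.
with_variant = percent(36, 43)
# Patients for whom the laboratory did not find a variant.
without_variant = percent(39, 63)
# Derived here (not stated in the abstract): both groups combined.
total_patients = 43 + 63
combined = percent(36 + 39, total_patients)
```

Both groups together cover all 106 patients, so at least one predictor identified the correct disease class for 75 of them under these counts.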


Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Databases, Genetic , Genetic Predisposition to Disease , Genetic Testing , Humans , Phenotype
11.
Bioinformatics ; 32(24): 3829-3832, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27667791

ABSTRACT

LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. AVAILABILITY AND IMPLEMENTATION: LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd CONTACT: hugo.lam@roche.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Sequence Alignment
12.
BMC Genomics ; 17: 64, 2016 Jan 16.
Article in English | MEDLINE | ID: mdl-26772178

ABSTRACT

BACKGROUND: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. RESULTS: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. 
CONCLUSIONS: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.


Subject(s)
Genome, Human , Genomic Structural Variation , Software , Benchmarking , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Pedigree , Polymorphism, Single Nucleotide/genetics
13.
Nature ; 526(7571): 75-81, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics
14.
Genome Biol ; 16: 197, 2015 Sep 17.
Article in English | MEDLINE | ID: mdl-26381235

ABSTRACT

SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.


Subject(s)
DNA Mutational Analysis/methods , Machine Learning , Neoplasms/genetics , Humans , INDEL Mutation
15.
Sci Rep ; 5: 14493, 2015 Sep 28.
Article in English | MEDLINE | ID: mdl-26412485

ABSTRACT

A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.


Subject(s)
Benchmarking , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/standards , Humans
17.
Nat Commun ; 6: 7256, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-26028266

ABSTRACT

Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyse 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi-C interactions, and histone marks and the substitution patterns of nucleotides near them, we find that breakpoints with the signature of non-allelic homologous recombination (NAHR) are associated with open chromatin. We hypothesize that some NAHR deletions occur without DNA replication and cell division, in embryonic and germline cells. In contrast, breakpoints associated with non-homologous (NH) mechanisms often have sequence microinsertions, templated from later replicating genomic sites, spaced at two characteristic distances from the breakpoint. These microinsertions are consistent with template-switching events and suggest a particular spatiotemporal configuration for DNA during the events.


Subject(s)
Chromosome Breakpoints , DNA/metabolism , Gene Deletion , Genome, Human/genetics , Chromatin , DNA Replication , Homologous Recombination , Humans , Mutation , Nucleotides , Sequence Deletion
18.
Bioinformatics ; 31(16): 2741-4, 2015 Aug 15.
Article in English | MEDLINE | ID: mdl-25861968

ABSTRACT

Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy, particularly for insertions. We propose MetaSV, an integrated SV caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes. Using simulation and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes. AVAILABILITY AND IMPLEMENTATION: Code in Python is at http://bioinform.github.io/metasv/. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
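The merging step can be pictured with a small sketch. It assumes calls of the same type on the same chromosome are grouped when they share at least 50% reciprocal overlap, which is a common convention but not something the abstract specifies; the `SV` record, the `merge_calls` function, the tool names, and the coordinates are all hypothetical and do not reflect MetaSV's actual code.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str
    tool: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the intervals are disjoint."""
    len_a, len_b = a.end - a.start, b.end - b.start
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0 or len_a <= 0 or len_b <= 0:
        # Zero-length calls (e.g. insertions given as a point) need another rule.
        return 0.0
    return min(overlap / len_a, overlap / len_b)


def merge_calls(calls, min_overlap: float = 0.5):
    """Greedily group calls with the same chromosome and type that overlap enough.

    Each call is compared with the first member of every existing cluster.
    """
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start)):
        for cluster in clusters:
            rep = cluster[0]
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and reciprocal_overlap(rep, call) >= min_overlap):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return clusters


# Invented calls from three hypothetical tools.
example_calls = [
    SV("chr1", 1000, 2000, "DEL", "toolA"),
    SV("chr1", 1050, 2050, "DEL", "toolB"),
    SV("chr1", 5000, 5300, "DEL", "toolA"),
    SV("chr1", 1000, 2000, "DUP", "toolC"),
]
clusters = merge_calls(example_calls)
supporting_tools = [sorted({c.tool for c in cluster}) for cluster in clusters]
```

The number of distinct tools supporting each cluster is one simple signal of confidence; the abstract indicates that MetaSV goes further, using soft-clipped reads, local assembly, and coverage information.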


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Mutagenesis, Insertional , Sequence Deletion
19.
Bioinformatics ; 31(9): 1469-71, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25524895

ABSTRACT

SUMMARY: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. AVAILABILITY AND IMPLEMENTATION: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
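Comparing variants "binned in size ranges" can be sketched as follows. The bin edges, the tuple layout `(chrom, pos, length)`, and the exact-match rule are assumptions made for illustration; the abstract does not give VarSim's bins or matching criteria.

```python
from bisect import bisect_right

# Hypothetical bin edges in base pairs: [1, 50), [50, 1000), [1000, 100000), ...
BIN_EDGES = [1, 50, 1000, 100_000]


def size_bin(length: int):
    """Return the (low, high) size range containing length; high is None for the last bin."""
    if length < BIN_EDGES[0]:
        raise ValueError("length is below the smallest bin edge")
    i = bisect_right(BIN_EDGES, length) - 1
    high = BIN_EDGES[i + 1] if i + 1 < len(BIN_EDGES) else None
    return (BIN_EDGES[i], high)


def recall_by_bin(truth, called):
    """Fraction of true variants recovered, reported per size bin.

    Variants are (chrom, pos, length) tuples and must match exactly.
    """
    called_set = set(called)
    totals, hits = {}, {}
    for variant in truth:
        b = size_bin(variant[2])
        totals[b] = totals.get(b, 0) + 1
        if variant in called_set:
            hits[b] = hits.get(b, 0) + 1
    return {b: hits.get(b, 0) / totals[b] for b in totals}


# Invented truth set and call set.
truth_set = [("chr1", 100, 1), ("chr1", 200, 3), ("chr1", 300, 120), ("chr1", 400, 5000)]
call_set = [("chr1", 100, 1), ("chr1", 300, 120)]
recall = recall_by_bin(truth_set, call_set)
```

Reporting accuracy per bin rather than as one overall number keeps the many small variants from masking poor performance on the much rarer large structural variants.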


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Genomics , Humans , Mutation , Neoplasms/genetics , Sequence Alignment