Results 1 - 19 of 19
1.
Plant Biotechnol J ; 21(6): 1240-1253, 2023 06.
Article in English | MEDLINE | ID: mdl-36807472

ABSTRACT

Rapid adaptation of weeds to herbicide applications in agriculture through resistance development is a widespread phenomenon. In particular, the grass Alopecurus myosuroides is an extremely problematic weed in cereal crops with the potential to manifest resistance in only a few generations. Target-site resistances (TSRs), with their strong phenotypic response, play an important role in this rapid adaptive response. Recently, using PacBio's long-read amplicon sequencing technology in hundreds of individuals, we were able to decipher the genomic context in which TSR mutations occur. However, sequencing individual amplicons is costly and time-consuming, and thus impractical to implement for other resistance loci or applications. Pool-based approaches overcome these limitations and provide reliable allele frequencies, although at the expense of not preserving haplotype information. In this proof-of-concept study, we used PacBio High Fidelity (HiFi) reads to sequence long-range amplicons (13.2 kb) encompassing the entire ACCase gene in pools of over 100 individuals, and resolved them into haplotypes using the clustering algorithm PacBio amplicon analysis (pbaa), a new application for pools in plants and other organisms. From these amplicon pools, we were able to recover most haplotypes from previously sequenced individuals of the same population. In addition, we analysed new pools from a Germany-wide collection of A. myosuroides populations and found that TSR mutations originating from soft sweeps of independent origin were common. Forward-in-time simulations indicate that TSR haplotypes will persist for decades even at relatively low frequencies and without selection, highlighting the importance of accurate measurement of TSR haplotype prevalence for weed management.
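The idea behind a forward-in-time simulation can be illustrated with a minimal neutral Wright-Fisher sketch. This is not the authors' simulation framework: the function name `simulate_drift`, the population size and the starting frequency are hypothetical, and the model assumes a constant-size diploid population with no selection, mutation or migration. It only shows how a haplotype at low frequency can drift across generations before it is lost or fixed.

```python
import random


def simulate_drift(p0, n_diploid, generations, seed=None):
    """Track the frequency of one haplotype under neutral Wright-Fisher drift.

    p0          -- starting haplotype frequency (hypothetical value)
    n_diploid   -- number of diploid individuals (assumed constant)
    generations -- number of generations to simulate
    Returns a list of frequencies, one per generation, starting with p0.
    """
    rng = random.Random(seed)
    n_copies = 2 * n_diploid  # each diploid individual carries two gene copies
    freq = p0
    trajectory = [freq]
    for _ in range(generations):
        # Binomial sampling: each gene copy in the next generation is drawn
        # independently from the current haplotype frequency.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    # Hypothetical scenario: a haplotype at 5% in 200 individuals, 30 generations.
    traj = simulate_drift(0.05, 200, 30, seed=1)
    print(["%.3f" % f for f in traj])
```

Running many replicates with different seeds and counting how often the haplotype is still present after a given number of generations gives a rough persistence estimate under these simplifying assumptions.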


Subjects
Acetyl-CoA Carboxylase , Herbicide Resistance , Poaceae , Acetyl-CoA Carboxylase/genetics , Agriculture , Gene Frequency/genetics , Haplotypes/genetics , Herbicide Resistance/genetics , Herbicides/pharmacology , Mutation , Poaceae/genetics
2.
PLoS Comput Biol ; 18(5): e1009123, 2022 05.
Article in English | MEDLINE | ID: mdl-35639788

ABSTRACT

Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. VCF can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of variant files. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how we can address more complex variation through pangenome graph formats, variation that cannot easily be represented by VCF.
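The variant classes listed above can be told apart from the REF and ALT columns of a VCF data line. The sketch below uses only the Python standard library and does not call vcflib, bio-vcf, cyvcf2, hts-nim or slivar. The function names `classify_allele` and `parse_vcf_line` and the example records are hypothetical, and the length-based classification is a simplification that ignores normalisation and left-alignment.

```python
def classify_allele(ref, alt):
    """Classify one ALT allele relative to REF using simple length rules."""
    if alt in ("*", "."):
        return "other"  # spanning deletion or missing allele
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "symbolic_or_breakend"  # e.g. <DEL>, or breakend notation
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "insertion" if len(alt) > len(ref) else "deletion"


def parse_vcf_line(line):
    """Parse the first five fixed columns of a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    alts = alt.split(",")  # multi-allelic sites list ALT alleles comma-separated
    return {
        "chrom": chrom,
        "pos": int(pos),  # VCF positions are 1-based
        "id": variant_id,
        "ref": ref,
        "alts": alts,
        "types": [classify_allele(ref, a) for a in alts],
    }


if __name__ == "__main__":
    # Hypothetical records, not taken from any real call set.
    for record in (
        "chr1\t100\t.\tA\tG,AT\t50\tPASS\t.",
        "chr1\t200\t.\tACG\tA\t50\tPASS\t.",
        "chr2\t300\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900",
    ):
        print(parse_vcf_line(record))
```

For real pipelines, a dedicated parser such as the libraries named in the abstract is the safer choice, since header handling, INFO and FORMAT fields and genotype columns are out of scope here.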


Subjects
Ecosystem , Genetic Variation , Computational Biology , Genetic Variation/genetics , Nucleotides , Software
3.
Nat Commun ; 12(1): 1935, 2021 04 28.
Article in English | MEDLINE | ID: mdl-33911078

ABSTRACT

Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without parental data, and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97% compared to 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.


Subjects
Contig Mapping/methods , Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Animals , Cattle , Haplotypes/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Zebrafish/genetics
4.
Nat Commun ; 11(1): 2071, 2020 04 29.
Article in English | MEDLINE | ID: mdl-32350247

ABSTRACT

Inbred animals were historically chosen for genome analysis to circumvent assembly issues caused by haplotype variation but this resulted in a composite of the two genomes. Here we report a haplotype-aware scaffolding and polishing pipeline which was used to create haplotype-resolved, chromosome-level genome assemblies of Angus (taurine) and Brahman (indicine) cattle subspecies from contigs generated by the trio binning method. These assemblies reveal structural and copy number variants that differentiate the subspecies and show that variant detection is sensitive to the specific reference genome chosen. Six genes with immune-related functions have additional copies in the indicine compared with the taurine lineage, and an indicus-specific extra copy of fatty acid desaturase is under positive selection. The haplotyped genomes also enable transcripts to be phased to detect allele-specific expression. This work exemplifies the value of haplotype-resolved genomes to better explore evolutionary and functional variations.


Subjects
Cattle/genetics , Genetic Variation , Genome , Haplotypes/genetics , Alleles , Allelic Imbalance , Animals , Base Sequence , Chromosomes, Mammalian/genetics , Female , Genetic Loci , INDEL Mutation/genetics , Male , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Repetitive Sequences, Nucleic Acid/genetics
5.
Ann Hum Genet ; 84(2): 125-140, 2020 03.
Article in English | MEDLINE | ID: mdl-31711268

ABSTRACT

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
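NG50, the contiguity metric quoted above, is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of an assumed genome size. N50 uses the total assembly length instead. The sketch below is a generic illustration, not the authors' evaluation code. The function names `ng50` and `n50` and the toy contig lengths are hypothetical.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or 0 if contigs cover less than half
    of the assumed genome size."""
    cumulative = 0
    for length in sorted(contig_lengths, reverse=True):
        cumulative += length
        # Integer comparison avoids floating-point issues with genome_size / 2.
        if 2 * cumulative >= genome_size:
            return length
    return 0


def n50(contig_lengths):
    """N50 is NG50 computed against the total assembly length."""
    return ng50(contig_lengths, sum(contig_lengths))


if __name__ == "__main__":
    contigs = [50, 40, 30, 20, 10]  # hypothetical contig lengths
    print("N50:", n50(contigs))
    print("NG50 (assumed genome size 200):", ng50(contigs, 200))
```

Because NG50 is normalised by genome size rather than assembly size, it allows a fairer comparison between assemblies of the same genome that differ in total length, as with the HiFi and CLR assemblies compared here.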


Subjects
Biomarkers/analysis , Genetic Variation , Genome, Human , Haploidy , Hydatidiform Mole/genetics , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Female , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Pregnancy
6.
Science ; 366(6463)2019 10 18.
Article in English | MEDLINE | ID: mdl-31624180

ABSTRACT

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.


Subjects
Genetic Introgression , Animals , Chromosome Duplication , Chromosomes, Human, Pair 16/genetics , Chromosomes, Human, Pair 8/genetics , DNA Copy Number Variations , Evolution, Molecular , Genome, Human , Haplotypes , Hominidae/genetics , Humans , Melanesia , Models, Genetic , Neanderthals/genetics , Polymorphism, Genetic , Selection, Genetic , Whole Genome Sequencing
7.
Nat Commun ; 10(1): 4233, 2019 09 17.
Article in English | MEDLINE | ID: mdl-31530812

ABSTRACT

We present a high-quality de novo genome assembly (rheMacS) of the Chinese rhesus macaque (Macaca mulatta) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), rheMacS increases sequence contiguity 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). We improve gene annotation by generating more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison to ape genomes. Many ASSVs map within ChIP-seq predicted enhancer regions where apes and macaque show diverged enhancer activity and gene expression. We further characterize a subset that may contribute to ape- or great-ape-specific phenotypic traits, including taillessness, brain volume expansion, improved manual dexterity, and large body size. The rheMacS genome assembly serves as an ideal reference for future biomedical and evolutionary studies.


Subjects
Genome , Hominidae/genetics , Macaca mulatta/genetics , Animals , China , Evolution, Molecular , Hominidae/classification , Humans , Macaca mulatta/classification , Male , Molecular Sequence Annotation , Phenotype , Sequence Analysis, RNA , Species Specificity
8.
Nat Commun ; 10(1): 1784, 2019 04 16.
Article in English | MEDLINE | ID: mdl-30992455

ABSTRACT

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.


Subjects
Genome, Human/genetics , Genomic Structural Variation , Genomics/methods , Haplotypes/genetics , Algorithms , Chromosome Mapping/methods , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Whole Genome Sequencing/methods
9.
Cell ; 176(4): 743-756.e17, 2019 02 07.
Article in English | MEDLINE | ID: mdl-30735633

ABSTRACT

Direct comparisons of human and non-human primate brains can reveal molecular pathways underlying remarkable specializations of the human brain. However, chimpanzee tissue is inaccessible during neocortical neurogenesis when differences in brain size first appear. To identify human-specific features of cortical development, we leveraged recent innovations that permit generating pluripotent stem cell-derived cerebral organoids from chimpanzee. Despite metabolic differences, organoid models preserve gene regulatory networks related to primary cell types and developmental processes. We further identified 261 differentially expressed genes in human compared to both chimpanzee organoids and macaque cortex, enriched for recent gene duplications, and including multiple regulators of PI3K-AKT-mTOR signaling. We observed increased activation of this pathway in human radial glia, dependent on two receptors upregulated specifically in human: INSR and ITGB8. Our findings establish a platform for systematic analysis of molecular changes contributing to human brain development and evolution.


Subjects
Cerebral Cortex/cytology , Organoids/metabolism , Animals , Biological Evolution , Brain/cytology , Cell Culture Techniques/methods , Cell Differentiation/genetics , Cerebral Cortex/metabolism , Gene Regulatory Networks/genetics , Humans , Induced Pluripotent Stem Cells/cytology , Macaca , Neurogenesis/genetics , Organoids/growth & development , Pan troglodytes , Pluripotent Stem Cells/cytology , Single-Cell Analysis , Species Specificity , Transcriptome/genetics
10.
Science ; 360(6393)2018 06 08.
Article in English | MEDLINE | ID: mdl-29880660

ABSTRACT

Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.


Subjects
Evolution, Molecular , Genome, Human , Hominidae/genetics , Animals , Contig Mapping , Genetic Variation , Humans , Molecular Sequence Annotation , Sequence Analysis, DNA
11.
Genome Res ; 28(7): 1029-1038, 2018 07.
Article in English | MEDLINE | ID: mdl-29884752

ABSTRACT

The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants, even in genomes as well studied as rat and the great apes, and show how these annotations improve cross-species RNA expression experiments.


Subjects
Genome, Human/genetics , Algorithms , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Molecular Sequence Annotation/methods , RNA/genetics , Rats
13.
Cell ; 171(3): 710-722.e12, 2017 Oct 19.
Article in English | MEDLINE | ID: mdl-28965761

ABSTRACT

To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ∼1.5 × 10⁻⁸ SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10⁻³, OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10⁻³), suggesting a path forward for genetically characterizing more complex cases of autism.


Subjects
Autistic Disorder/genetics , DNA Copy Number Variations , Polymorphism, Single Nucleotide , Animals , DNA Mutational Analysis , Female , Genome-Wide Association Study , Humans , INDEL Mutation , Male , Mice
14.
J Virol ; 91(4)2017 02 15.
Article in English | MEDLINE | ID: mdl-27928012

ABSTRACT

Viruses are under relentless selective pressure from host immune defenses. To study how poxviruses adapt to innate immune detection pathways, we performed serial vaccinia virus infections in primary human cells. Independent courses of experimental evolution with a recombinant strain lacking E3L revealed several high-frequency point mutations in conserved poxvirus genes, suggesting important roles for essential poxvirus proteins in innate immune subversion. Two distinct mutations were identified in the viral RNA polymerase gene A24R, which seem to act through different mechanisms to increase virus replication. Specifically, a Leu18Phe substitution encoded within A24R conferred fitness trade-offs, including increased activation of the antiviral factor protein kinase R (PKR). Intriguingly, this A24R variant underwent a drastic selective sweep during passaging, despite enhanced PKR activity. We showed that the sweep of this variant could be accelerated by the presence of copy number variation (CNV) at the K3L locus, which in multiple copies strongly reduced PKR activation. Therefore, adaptive cases of CNV can facilitate the accumulation of point mutations separate from the expanded locus. This study reveals how rapid bouts of gene copy number amplification during accrual of distant point mutations can potently facilitate poxvirus adaptation to host defenses. IMPORTANCE: Viruses can evolve quickly to defeat host immune functions. For poxviruses, little is known about how multiple adaptive mutations emerge in populations at the same time. In this study, we uncovered a means of vaccinia virus adaptation involving the accumulation of distinct genetic variants within a single population. We identified adaptive point mutations in the viral RNA polymerase gene A24R and, surprisingly, found that one of these mutations activates the nucleic acid sensing factor PKR. We also found that gene copy number variation (CNV) can provide dual benefits to evolving virus populations, including evidence that CNV facilitates the accumulation of a point mutation distant from the expanded locus. Our data suggest that transient CNV can accelerate the fixation of mutations conferring modest benefits, or even fitness trade-offs, and highlight how structural variation might aid poxvirus adaptation through both direct and indirect actions.


Subjects
Biological Evolution , DNA Copy Number Variations , DNA-Directed RNA Polymerases/genetics , Gene Amplification , Vaccinia virus/physiology , Adaptation, Biological , Alleles , DNA-Directed RNA Polymerases/metabolism , Evolution, Molecular , Fibroblasts , Gene Frequency , Genetic Fitness , Genome, Viral , Humans , Mutation Rate , Open Reading Frames , Point Mutation , RNA, Viral , Virus Replication
15.
Genome Res ; 27(5): 677-685, 2017 05.
Article in English | MEDLINE | ID: mdl-27895111

ABSTRACT

In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.


Subjects
Contig Mapping/methods , Genome, Human , Genomic Structural Variation , Haploidy , Sequence Analysis, DNA/methods , Contig Mapping/standards , Human Genome Project , Humans , Sequence Analysis, DNA/standards
16.
Nat Commun ; 7: 13316, 2016 11 08.
Article in English | MEDLINE | ID: mdl-27824329

ABSTRACT

Recurrent de novo (DN) and likely gene-disruptive (LGD) mutations contribute significantly to autism spectrum disorders (ASDs) but have been primarily investigated in European cohorts. Here, we sequence 189 risk genes in 1,543 Chinese ASD probands (1,045 from trios). We report an 11-fold increase in the odds of DN LGD mutations compared with expectation under an exome-wide neutral model of mutation. In aggregate, ∼4% of ASD patients carry a DN mutation in one of just 29 autism risk genes. The most prevalent gene for recurrent DN mutations is SCN2A (1.1% of patients) followed by CHD8, DSCAM, MECP2, POGZ, WDFY3 and ASH1L. We identify novel DN LGD recurrences (GIGYF2, MYT1L, CUL3, DOCK8 and ZNF292) and DN mutations in previous ASD candidates (ARHGAP32, NCOR1, PHIP, STXBP1, CDKL5 and SHANK1). Phenotypic follow-up confirms potential subtypes and highlights how large global cohorts might be leveraged to prove the pathogenic significance of individually rare mutations.


Subjects
Autism Spectrum Disorder/genetics , Mutation/genetics , Asian People/genetics , Cohort Studies , DNA Mutational Analysis , Exome/genetics , Genetic Predisposition to Disease , Geography , Humans , Inheritance Patterns/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors
17.
Genome Res ; 26(11): 1453-1467, 2016 11.
Article in English | MEDLINE | ID: mdl-27803192

ABSTRACT

Recurrent rearrangements of Chromosome 8p23.1 are associated with congenital heart defects and developmental delay. The complexity of this region has led to inconsistencies in the current reference assembly, confounding studies of genetic variation. Using comparative sequence-based approaches, we generated a high-quality 6.3-Mbp alternate reference assembly of an inverted Chromosome 8p23.1 haplotype. Comparison with nonhuman primates reveals a 746-kbp duplicative transposition and two separate inversion events that arose in the last million years of human evolution. The breakpoints associated with these rearrangements map to an ape-specific interchromosomal core duplicon that clusters at sites of evolutionary inversion (P = 7.8 × 10⁻⁵). Refinement of microdeletion breakpoints identifies a subgroup of patients that map to the same interchromosomal core involved in the evolutionary formation of the duplication blocks. Our results define a higher-order genomic instability element that has shaped the structure of specific chromosomes during primate evolution contributing to rearrangements associated with inversion and disease.


Subjects
Evolution, Molecular , Genetic Predisposition to Disease , Genomic Instability , Segmental Duplications, Genomic , Animals , Chromosome Breakpoints , Chromosome Deletion , Chromosomes, Human, Pair 8/genetics , Humans , Primates/genetics
18.
Science ; 352(6281): aae0344, 2016 Apr 01.
Article in English | MEDLINE | ID: mdl-27034376

ABSTRACT

Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome.


Subjects
Gorilla gorilla/genetics , Sequence Analysis, DNA/methods , Animals , Contig Mapping , Evolution, Molecular , Expressed Sequence Tags , Female , Genetic Variation , Genome, Human , Genomics , Humans , Sequence Alignment
19.
PLoS Comput Biol ; 11(12): e1004572, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26625158

ABSTRACT

Existing methods for identifying structural variants (SVs) from short-read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch), and demonstrate Wham's ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support, please post questions to https://www.biostars.org/.


Subjects
Algorithms , Chromosome Mapping/methods , Genetic Association Studies/methods , Genetic Variation/genetics , Genome, Human/genetics , Genomic Structural Variation/genetics , Base Sequence , Humans , Molecular Sequence Data , Software