Results 1 - 20 of 28
1.
Nature ; 583(7814): 83-89, 2020 07.
Article in English | MEDLINE | ID: mdl-32460305

ABSTRACT

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.


Subjects
Genetic Variation , Human Genome/genetics , Whole Genome Sequencing , Alleles , Case-Control Studies , Genetic Epigenesis , Female , Gene Dosage/genetics , Population Genetics , High-Throughput Nucleotide Sequencing , Humans , Male , Molecular Sequence Annotation , Quantitative Trait Loci , Racial Groups/genetics , Software
2.
Am J Hum Genet ; 109(4): 680-691, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35298919

ABSTRACT

Identification of rare-variant associations is crucial to full characterization of the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and haplotype structure in real data. Additionally, importing real-variant annotation enables in silico comparison of methods, such as rare-variant association tests and polygenic scoring methods, that focus on putative causal variants. Existing simulation methods are either unable to employ real-variant annotation or severely under- or overestimate the number of singletons and doubletons, thereby reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare-variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real-variant annotations. We highlight RAREsim's utility across various genetic regions, sample sizes, ancestries, and variant classes.


Subjects
Genetic Variation , Research Design , Computer Simulation , Genetic Variation/genetics , Haplotypes/genetics , Humans , Genetic Models , Multifactorial Inheritance
3.
Nat Methods ; 19(4): 445-448, 2022 04.
Article in English | MEDLINE | ID: mdl-35396485

ABSTRACT

Structural variants are associated with cancers and developmental disorders, but challenges with estimating population frequency remain a barrier to prioritizing mutations over inherited variants. In particular, variability in variant calling heuristics and filtering limits the use of current structural variant catalogs. We present STIX, a method that, instead of relying on variant calls, indexes and searches the raw alignments from thousands of samples to enable more comprehensive allele frequency estimation.


Subjects
Genome , Genomic Structural Variation , Neoplasms , Algorithms , Genomic Structural Variation/genetics , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Software
4.
Nucleic Acids Res ; 48(12): 6597-6610, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32479598

ABSTRACT

The human genome encodes an order of magnitude more gene expression enhancers than promoters, suggesting that most genes are regulated by the combined action of multiple enhancers. We have previously shown that neighboring estrogen-responsive enhancers exhibit complex synergistic contributions to the production of an estrogenic transcriptional response. Here we sought to determine the molecular underpinnings of this enhancer cooperativity. We generated genetic deletions of four estrogen receptor α (ER) bound enhancers that regulate two genes and found that enhancers containing full estrogen response element (ERE) motifs control ER binding at neighboring sites, while enhancers with pre-existing histone acetylation/accessibility confer a permissible chromatin environment to the neighboring enhancers. Genome engineering revealed that two enhancers with half EREs could not compensate for the lack of a full ERE site within the cluster. In contrast, two enhancers with full EREs produced a transcriptional response greater than the wild-type locus. By swapping genomic sequences, we found that the genomic location of a full ERE strongly influences enhancer activity. Our results lead to a model in which a full ERE is required for ER recruitment, but the presence of a pre-existing permissible chromatin environment can also be needed for estrogen-driven gene regulation to occur.


Subjects
Genetic Enhancer Elements/genetics , Estrogen Receptor alpha/genetics , Nucleotide Motifs/genetics , Genetic Transcription , Acetylation , Chromatin/genetics , DNA-Binding Proteins/genetics , Gene Expression Regulation/genetics , Human Genome/genetics , Humans , Genetic Promoter Regions/genetics
5.
Nat Methods ; 15(2): 123-126, 2018 02.
Article in English | MEDLINE | ID: mdl-29309061

ABSTRACT

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.


Subjects
Breast Neoplasms/genetics , Human Genome , Genomics/methods , Search Engine/methods , DNA Sequence Analysis/methods , Software , Genetic Databases , Female , Humans , Internet
7.
Bioinformatics ; 35(22): 4782-4787, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31218349

ABSTRACT

SUMMARY: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps (including deletions, duplications, mobile element insertions, inversions and other rearrangements) in many thousands of human genomes. We show that this pipeline achieves similar variant detection performance to established per-sample methods (e.g. LUMPY), while providing fast and affordable joint analysis at the scale of ≥100 000 genomes. These tools will help enable the next generation of human genetics studies. AVAILABILITY AND IMPLEMENTATION: svtools is implemented in Python and freely available (MIT) from https://github.com/hall-lab/svtools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Human Genome , Software , Humans , Sequence Deletion , Whole Genome Sequencing
8.
Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.


Subjects
Genomics/methods , Software , Chromatin Immunoprecipitation , GATA1 Transcription Factor/metabolism , Internet , DNA Sequence Analysis , User-Computer Interface
9.
Nat Methods ; 13(1): 63-5, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26550772

ABSTRACT

Genotype Query Tools (GQT) is an indexing strategy that expedites analyses of genome-variation data sets in Variant Call Format based on sample genotypes, phenotypes and relationships. GQT's compressed genotype index minimizes decompression for analysis, and its performance relative to that of existing methods improves with cohort size. We show substantial (up to 443-fold) gains in performance over existing methods and demonstrate GQT's utility for exploring massive data sets involving thousands to millions of genomes. GQT can be accessed at https://github.com/ryanlayer/gqt.


Subjects
Genetic Variation , Genotype , Datasets as Topic
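
The idea of querying genotypes through a per-sample index can be illustrated with a toy bitmap sketch. This is not GQT's implementation or file format: the function names `build_index` and `variants_where_all` are hypothetical, genotypes are assumed to be coded 0, 1 or 2 (homozygous reference, heterozygous, homozygous alternate), and no compression is applied.

```python
def build_index(genotypes):
    """Build one bitmap per (sample, genotype) pair.

    genotypes[s][v] is the genotype (0, 1 or 2) of sample s at variant v.
    Bit v of index[s][g] is set when sample s has genotype g at variant v.
    """
    index = []
    for row in genotypes:
        bitmaps = {0: 0, 1: 0, 2: 0}
        for variant, genotype in enumerate(row):
            bitmaps[genotype] |= 1 << variant
        index.append(bitmaps)
    return index


def variants_where_all(index, samples, genotype, n_variants):
    """Return the variants at which every listed sample has `genotype`."""
    result = (1 << n_variants) - 1  # start with all variants selected
    for sample in samples:
        result &= index[sample][genotype]  # one bitwise AND per sample
    return [v for v in range(n_variants) if (result >> v) & 1]


# Hypothetical data: 3 samples (rows) by 4 variants (columns).
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 2],
]
index = build_index(genotypes)
shared_het = variants_where_all(index, samples=[0, 1], genotype=1, n_variants=4)
```

Organizing the bitmaps by sample means a query over a subset of samples touches only those samples' bitmaps, and each combination step is a single bitwise operation over all variants at once.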
10.
Nat Methods ; 12(10): 966-8, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26258291

ABSTRACT

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.


Subjects
Human Genome , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Software , Genetic Variation , Humans , Neoplasms/genetics , Single Nucleotide Polymorphism , Precision Medicine/methods , Workflow
11.
Proc IEEE Inst Electr Electron Eng ; 105(3): 542-551, 2017 Mar.
Article in English | MEDLINE | ID: mdl-30333632

ABSTRACT

The comparison of sets of genome intervals (e.g., genes, repeats, ChIP-seq peaks) is essential to genome research, especially as modern sequencing technologies enable ever larger and more complex experiments. Relationships between genomic features are commonly identified by their intersection: that is, if feature sets contain overlapping intervals then it is inferred that they share a common biological function or origin. Using this technique, researchers identify genomic regions that are common among multiple (or unique to individual) datasets. While there have been recent advances in algorithms for pairwise intersections between two sets of genomic intervals, few advances have been made to the intersection of many sets of genomic intervals. Identifying intersections among many interval sets is particularly important when attempting to distill biological insights from the massive, multi-dimensional datasets that are common to modern genome research. For such analyses, speed and efficiency are crucial given the size and sheer number of datasets involved. To solve this problem, we present a novel "slice-then-sweep" algorithm that, given N interval sets, efficiently reveals the subset of intervals that are common to all N sets. We demonstrate that our algorithm is more efficient in the sequential case and has a vastly higher capacity for parallelization with a 19x speedup over the existing algorithm.
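
The problem statement above (given N interval sets, report what is common to all N) can be made concrete with a plain sequential sweep. This is not the "slice-then-sweep" algorithm from the abstract, only a minimal baseline sketch; it assumes half-open `(start, end)` intervals on a single chromosome, and the names `merge` and `common_regions` are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching intervals within one set."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def common_regions(interval_sets):
    """Return the regions covered by every one of the N interval sets."""
    n = len(interval_sets)
    events = []
    for intervals in interval_sets:
        # After merging, each set contributes at most 1 to the depth.
        for start, end in merge(intervals):
            events.append((start, 1))   # interval opens
            events.append((end, -1))    # interval closes
    # Tuples sort by position, then by delta, so a close (-1) precedes
    # an open (+1) at the same position, matching half-open semantics.
    events.sort()

    regions = []
    depth = 0
    region_start = None
    for position, delta in events:
        depth += delta
        if delta == 1 and depth == n:
            region_start = position          # all N sets now cover this point
        elif delta == -1 and depth == n - 1 and region_start is not None:
            regions.append((region_start, position))
            region_start = None
    return regions


# Hypothetical example with N = 3 sets.
sets = [
    [(1, 10), (20, 30)],
    [(5, 25)],
    [(0, 7), (8, 22)],
]
shared = common_regions(sets)
```

The sweep visits every interval endpoint once after a global sort, which is the sequential cost that a parallel approach would aim to divide across workers.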

12.
Genome Res ; 23(5): 762-76, 2013 May.
Article in English | MEDLINE | ID: mdl-23410887

ABSTRACT

Tumor genomes are generally thought to evolve through a gradual accumulation of mutations, but the observation that extraordinarily complex rearrangements can arise through single mutational events suggests that evolution may be accelerated by punctuated changes in genome architecture. To assess the prevalence and origins of complex genomic rearrangements (CGRs), we mapped 6179 somatic structural variation breakpoints in 64 cancer genomes from seven tumor types and screened for clusters of three or more interconnected breakpoints. We find that complex breakpoint clusters are extremely common: 154 clusters comprise 25% of all somatic breakpoints, and 75% of tumors exhibit at least one complex cluster. Based on copy number state profiling, 63% of breakpoint clusters are consistent with being CGRs that arose through a single mutational event. CGRs have diverse architectures including focal breakpoint clusters, large-scale rearrangements joining clusters from one or more chromosomes, and staggeringly complex chromothripsis events. Notably, chromothripsis has a significantly higher incidence in glioblastoma samples (39%) relative to other tumor types (9%). Chromothripsis breakpoints also show significantly elevated intra-tumor allele frequencies relative to simple SVs, which indicates that they arise early during tumorigenesis or confer selective advantage. Finally, assembly and analysis of 4002 somatic and 6982 germline breakpoint sequences reveal that somatic breakpoints show significantly less microhomology and fewer templated insertions than germline breakpoints, and this effect is stronger at CGRs than at simple variants. These results are inconsistent with replication-based models of CGR genesis and strongly argue that nonhomologous repair of concurrently arising DNA double-strand breaks is the predominant mechanism underlying complex cancer genome rearrangements.


Subjects
Chromosome Aberrations , Chromosome Breakpoints , Mutation/genetics , Neoplasms/genetics , Base Sequence , Double-Stranded DNA Breaks , DNA Replication/genetics , Human Genome , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/pathology
13.
Bioinformatics ; 29(1): 1-7, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23129298

ABSTRACT

MOTIVATION: The comparison of diverse genomic datasets is fundamental to understand genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. RESULTS: We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. AVAILABILITY: https://github.com/arq5x/bits.


Subjects
Algorithms , Genomics/methods , Monte Carlo Method , Sequence Alignment , DNA Sequence Analysis
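
Counting interval intersections with binary searches, as the BITS abstract describes, can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes closed `(start, end)` coordinates on one chromosome, and the function name `count_intersections` is hypothetical. The count for a query is the database size minus the intervals that end before the query starts and minus those that start after the query ends.

```python
from bisect import bisect_left, bisect_right


def count_intersections(queries, database):
    """Count, for each query interval, the database intervals it overlaps."""
    # Starts and ends are sorted independently; only counts are needed,
    # so the pairing between a start and its end can be discarded.
    starts = sorted(start for start, _ in database)
    ends = sorted(end for _, end in database)
    total = len(database)

    counts = []
    for q_start, q_end in queries:
        # Intervals that end strictly before the query starts.
        ending_before = bisect_left(ends, q_start)
        # Intervals that start strictly after the query ends.
        starting_after = total - bisect_right(starts, q_end)
        counts.append(total - ending_before - starting_after)
    return counts


# Hypothetical example data.
database = [(1, 5), (3, 8), (10, 12), (20, 25)]
queries = [(4, 11), (13, 19), (5, 5)]
counts = count_intersections(queries, database)
```

Each query needs only two independent binary searches and no shared state, which is why this formulation maps naturally onto parallel hardware such as the graphics processing units mentioned in the abstract.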
14.
Sci Rep ; 14(1): 3432, 2024 02 10.
Article in English | MEDLINE | ID: mdl-38341450

ABSTRACT

Many nocturnally active fireflies use precisely timed bioluminescent patterns to identify mates, making them especially vulnerable to light pollution. As urbanization continues to brighten the night sky, firefly populations are under constant stress, and close to half of the species are now threatened. Ensuring the survival of firefly biodiversity depends on a large-scale conservation effort to monitor and protect thousands of populations. While species can be identified by their flash patterns, current methods require expert measurement and manual classification and are infeasible given the number and geographic distribution of fireflies. Here we present the application of a recurrent neural network (RNN) for accurate automated firefly flash pattern classification. Using recordings from commodity cameras, we can extract flash trajectories of individuals within a swarm and classify their species with an accuracy of approximately seventy percent. In addition to its potential in population monitoring, automated classification provides the means to study firefly behavior at the population level. We employ the classifier to measure and characterize the variability within and between swarms, unlocking a new dimension of their behavior. Our method is open source, and deployment in community science applications could revolutionize our ability to monitor and understand firefly populations.


Subjects
Fireflies , Animal Sexual Behavior , Humans , Animals
15.
Front Genet ; 12: 639355, 2021.
Article in English | MEDLINE | ID: mdl-33732289

ABSTRACT

Genomic structural variants (SVs) are a major source of genetic and phenotypic variation but have not been investigated systematically in rainbow trout (Oncorhynchus mykiss), an important aquaculture species of cold freshwater. The objectives of this study were 1) to identify and validate high-confidence SVs in rainbow trout using whole-genome re-sequencing; and 2) to examine the contribution of transposable elements (TEs) to SVs in rainbow trout. A total of 96 rainbow trout, including 11 homozygous lines and 85 outbred fish from three breeding populations, were whole-genome sequenced with an average genome coverage of 17.2×. Putative SVs were identified using the program Smoove which integrates LUMPY and other associated tools into one package. After rigorous filtering, 13,863 high-confidence SVs were identified. Pacific Biosciences long-reads of Arlee, one of the homozygous lines used for SV detection, validated 98% (3,948 of 4,030) of the high-confidence SVs identified in the Arlee homozygous line. Based on principal component analysis, the 85 outbred fish clustered into three groups consistent with their populations of origin, further indicating that the high-confidence SVs identified in this study are robust. The repetitive DNA content of the high-confidence SV sequences was 86.5%, which is much higher than the 57.1% repetitive DNA content of the reference genome, and is also higher than the repetitive DNA content of Atlantic salmon SVs reported previously. TEs thus contribute substantially to SVs in rainbow trout as TEs make up the majority of repetitive sequences. Hundreds of the high-confidence SVs were annotated as exon-loss or gene-fusion variants, and may have phenotypic effects. The high-confidence SVs reported in this study provide a foundation for further rainbow trout SV studies.

16.
Genome Biol ; 22(1): 161, 2021 05 25.
Article in English | MEDLINE | ID: mdl-34034781

ABSTRACT

Visual validation is an important step to minimize false-positive predictions from structural variant (SV) detection. We present Samplot, a tool for creating images that display the read depth and sequence alignments necessary to adjudicate purported SVs across samples and sequencing technologies. These images can be rapidly reviewed to curate large SV call sets. Samplot is applicable to many biological problems such as SV prioritization in disease studies, analysis of inherited variation, or de novo SV review. Samplot includes a machine learning package that dramatically decreases the number of false positives without human review. Samplot is available at https://github.com/ryanlayer/samplot.


Subjects
Genomic Structural Variation , Software , Automation , Chromosome Inversion , Gene Duplication , Reproducibility of Results , Genetic Translocation
17.
Front Genet ; 11: 152, 2020.
Article in English | MEDLINE | ID: mdl-32194629

ABSTRACT

SUMMARY: Genotype Query Tools (GQT), developed to discover disease-causing variants among billions of genotypes and millions of genomes, processes data substantially faster than other existing methods. While GQT has been available to a wide audience as command-line software, the difficulty of constructing queries has limited its use among researchers without an IT or bioinformatics background. To overcome this limitation, we developed webGQT, an easy-to-use tool with a graphical user interface. With pre-built queries across three modules, webGQT allows for pedigree analysis, case-control studies, and population frequency studies. As a package, webGQT allows researchers with little or no applied bioinformatics/IT experience to mine potential disease-causing variants from billions of genotypes. RESULTS: webGQT offers a flexible and easy-to-use interface for model-based candidate variant filtering for Mendelian diseases from thousands to millions of genomes at a reduced computation time. Additionally, webGQT provides adjustable parameters to reduce false positives and rescue missing genotypes across all modules. Using a case study, we demonstrate the applicability of webGQT to query non-human genomes. In addition, we demonstrate the scalability of webGQT on large data sets by implementing complex population-specific queries on the 1000 Genomes Project Phase 3 data set, which includes 8.4 billion variants from 2504 individuals across 26 different populations. Furthermore, webGQT supports filtering single-nucleotide variants, short insertions/deletions, copy number or any other variant genotypes supported by the VCF specification. Our results show that webGQT can be used as an online web service, or deployed on personal computers or local servers within research groups.
AVAILABILITY: webGQT is made available to users in three forms: 1) as a webserver available at https://vm1138.kaj.pouta.csc.fi/webgqt/, 2) as an R package to install on personal computers, and 3) as part of the same R package to configure on the user's own servers. The application is available for installation at https://github.com/arumds/webgqt.

18.
Nat Commun ; 11(1): 5176, 2020 10 14.
Article in English | MEDLINE | ID: mdl-33056985

ABSTRACT

Structural variants (SVs) are a major source of genetic and phenotypic variation, but remain challenging to accurately type and are hence poorly characterized in most species. We present an approach for reliable SV discovery in non-model species using whole genome sequencing and report 15,483 high-confidence SVs in 492 Atlantic salmon (Salmo salar L.) sampled from a broad phylogeographic distribution. These SVs recover population genetic structure with high resolution, include an active DNA transposon, widely affect functional features, and overlap more duplicated genes retained from an ancestral salmonid autotetraploidization event than expected. Changes in SV allele frequency between wild and farmed fish indicate polygenic selection on behavioural traits during domestication, targeting brain-expressed synaptic networks linked to neurological disorders in humans. This study offers novel insights into the role of SVs in genome evolution and the genetic architecture of domestication traits, along with resources supporting reliable SV discovery in non-model species.


Subjects
Wild Animals/genetics , Domestication , Genome , Genomic Structural Variation , Salmo salar/genetics , Animals , DNA Transposable Elements/genetics , Fisheries , Gene Duplication , Gene Frequency , Genetic Variation , Population Genetics , Genotyping Techniques , Male , Molecular Sequence Annotation , Phylogeography , Whole Genome Sequencing , Workflow
19.
Nat Genet ; 51(1): 88-95, 2019 01.
Article in English | MEDLINE | ID: mdl-30531870

ABSTRACT

Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality.


Subjects
Human Genome/genetics , Open Reading Frames/genetics , Chromosome Mapping/methods , Developmental Disabilities/genetics , Humans , Mutation/genetics
20.
Gigascience ; 7(7), 2018 07 01.
Article in English | MEDLINE | ID: mdl-29860504

ABSTRACT

SV-plaudit is a framework for rapidly curating structural variant (SV) predictions. For each SV, we generate an image that visualizes the coverage and alignment signals from a set of samples. Images are uploaded to our cloud framework where users assess the quality of each image using a client-side web application. Reports can then be generated as a tab-delimited file or annotated Variant Call Format (VCF) file. As a proof of principle, nine researchers collaborated for 1 hour to evaluate 1,350 SVs each. We anticipate that SV-plaudit will become a standard step in variant calling pipelines and the crowd-sourced curation of other biological results. Code available at https://github.com/jbelyeu/SV-plaudit. Demonstration video available at https://www.youtube.com/watch?v=ono8kHMKxDs.


Subjects
Genomics/methods , High-Throughput Nucleotide Sequencing , Medical Informatics/methods , Sequence Alignment , DNA Sequence Analysis , False Positive Reactions , Genetic Variation , Human Genome , Humans , Internet , Software