Search | BVS - MINISTÉRIO DA SAÚDE

A cost-effective sequencing method for genetic studies combining high-depth whole exome and low-depth whole genome.

Bhérer, Claude; Eveleigh, Robert; Trajanoska, Katerina; St-Cyr, Janick; Paccard, Antoine; Nadukkalam Ravindran, Praveen; Caron, Elizabeth; Bader Asbah, Nimara; McClelland, Peyton; Wei, Clare; Baumgartner, Iris; Schindewolf, Marc; Döring, Yvonne; Perley, Danielle; Lefebvre, François; Lepage, Pierre; Bourgey, Mathieu; Bourque, Guillaume; Ragoussis, Jiannis; Mooser, Vincent; Taliun, Daniel.

NPJ Genom Med ; 9(1): 8, 2024 Feb 07.

Article in English | MEDLINE | ID: mdl-38326393

ABSTRACT

Whole genome sequencing (WGS) at high depth (30X) allows the accurate discovery of variants in the coding and non-coding DNA regions and helps elucidate the genetic underpinnings of human health and diseases. Yet, due to the prohibitive cost of high-depth WGS, most large-scale genetic association studies use genotyping arrays or high-depth whole exome sequencing (WES). Here we propose a cost-effective method, which we call "Whole Exome Genome Sequencing" (WEGS), that combines low-depth WGS and high-depth WES with up to 8 samples pooled and sequenced simultaneously (multiplexed). We experimentally assess the performance of WEGS with four different depth-of-coverage and sample-multiplexing configurations. We show that the optimal WEGS configurations are 1.7-2.0 times cheaper than standard WES (no-plexing), are 1.8-2.1 times cheaper than high-depth WGS, reach similar recall and precision rates in detecting coding variants as WES, and capture more population-specific variants in the rest of the genome that are difficult to recover when using genotype imputation methods. We apply WEGS to 862 patients with peripheral artery disease and show that it directly assesses more known disease-associated variants than a typical genotyping array, as well as thousands of non-imputable variants per disease-associated locus.

RADProc: A computationally efficient de novo locus assembler for population studies using RADseq data.

Nadukkalam Ravindran, Praveen; Bentzen, Paul; Bradbury, Ian R; Beiko, Robert G.

Mol Ecol Resour ; 19(1): 272-282, 2019 Jan.

Article in English | MEDLINE | ID: mdl-30312001

ABSTRACT

Restriction site-associated DNA sequencing (RADseq) is a powerful tool for genotyping of individuals, but the identification of loci and assignment of sequence reads is a crucial and often challenging step. The optimal parameter settings for a given de novo RADseq assembly vary between data sets and can be difficult and computationally expensive to determine. Here, we introduce RADProc, a software package that uses a graph data structure to represent all sequence reads and their similarity relationships. Storing sequence-comparison results in a graph eliminates unnecessary and redundant sequence similarity calculations. De novo locus formation for a given parameter set can be performed on the precomputed graph, making parameter sweeps far more efficient. RADProc also uses a clustering approach for faster nucleotide-distance calculation. The performance of RADProc compares favourably with that of the widely used Stacks software. Run-time comparisons between RADProc and Stacks for 32 different parameter settings using 20 green crab (Carcinus maenas) samples showed that RADProc took as little as 2 hr 40 min compared to 78 hr for Stacks, while 16 brown trout (Salmo trutta L.) samples were processed by RADProc and Stacks in 23 and 263 hr, respectively. Comparisons of the de novo loci formed and the catalogs built using both methods demonstrate that the improvement in processing speed achieved by RADProc has little effect on the loci formed or on the results of downstream analyses based on those loci.
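The idea of reusing a precomputed similarity graph across a parameter sweep can be illustrated with a minimal Python sketch. This is not RADProc's code: the function names (`build_graph`, `form_loci`), the use of Hamming distance, and the rule that a locus is a connected component are all assumptions made for illustration. Each pairwise distance is computed once and stored as an edge; locus formation for each distance threshold then only filters the stored edges.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(seqs, max_dist):
    """Compute every pairwise distance once; keep edges with distance <= max_dist."""
    edges = []
    for i, j in combinations(range(len(seqs)), 2):
        d = hamming(seqs[i], seqs[j])
        if d <= max_dist:
            edges.append((i, j, d))
    return edges


def form_loci(n, edges, m):
    """Hypothetical locus rule: connected components over edges with distance <= m."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, d in edges:
        if d <= m:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy reads; the graph is built once and reused for every threshold.
seqs = ["ACGTAC", "ACGTAA", "ACGGAA", "TTTTTT"]
edges = build_graph(seqs, max_dist=3)
sweep = {m: form_loci(len(seqs), edges, m) for m in (0, 1, 2, 3)}
```

In this toy sweep the distance calculations run only inside `build_graph`; each additional threshold costs one pass over the stored edges.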

Subjects

Computational Biology/methods; Genetic Loci; Genomics/methods; Sequence Analysis, DNA/methods; Software; Animals; Brachyura/genetics

PMERGE: Computational filtering of paralogous sequences from RAD-seq data.

Nadukkalam Ravindran, Praveen; Bentzen, Paul; Bradbury, Ian R; Beiko, Robert G.

Ecol Evol ; 8(14): 7002-7013, 2018 Jul.

Article in English | MEDLINE | ID: mdl-30073062

ABSTRACT

Restriction-site associated DNA sequencing (RAD-seq) can identify and score thousands of genetic markers from a group of samples for population-genetics studies. One challenge of de novo RAD-seq analysis is to distinguish paralogous sequence variants (PSVs) from true single-nucleotide polymorphisms (SNPs) associated with orthologous loci. In the absence of a reference genome, it is difficult to differentiate true SNPs from PSVs, and their impact on downstream analysis remains unclear. Here, we introduce a network-based approach, PMERGE, that connects fragments based on their DNA sequence similarity to identify probable PSVs. Applying our method to de novo RAD-seq data from 150 Atlantic salmon (Salmo salar) samples collected from 15 locations across the Southern Newfoundland coast allowed the identification of 87% of the total PSVs identified through alignment to the Atlantic salmon genome. Removal of these paralogs altered the inferred population structure, highlighting the potential impact of filtering in RAD-seq analysis. PMERGE was also applied to a green crab (Carcinus maenas) data set consisting of 242 samples from 11 different locations and successfully identified and removed the majority of paralogous loci (62%). The PMERGE software can be run as part of the widely used Stacks analysis package.
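A network-based paralog screen of this general kind can be sketched in Python as follows. This is an illustrative assumption, not PMERGE's actual algorithm: the function name `flag_probable_paralogs`, the mismatch threshold, and the rule that any cluster of two or more similar loci is flagged are all hypothetical. Loci are nodes, an edge joins two loci whose consensus sequences differ by at most `max_mismatch` positions, and connected components of size greater than one are reported.

```python
from collections import deque


def mismatches(a, b):
    """Count differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def flag_probable_paralogs(loci, max_mismatch):
    """loci maps a locus name to its consensus sequence (equal lengths assumed).

    Returns the names of loci that fall in a similarity cluster of size > 1.
    """
    names = list(loci)
    adjacency = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if mismatches(loci[a], loci[b]) <= max_mismatch:
                adjacency[a].add(b)
                adjacency[b].add(a)

    flagged, seen = set(), set()
    for start in names:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(component) > 1:
            flagged.update(component)
    return flagged


# Toy data: locus_1 and locus_2 differ at one position, locus_3 is unrelated.
loci = {"locus_1": "ACGTACGT", "locus_2": "ACGTACGA", "locus_3": "GGGGCCCC"}
flagged = flag_probable_paralogs(loci, max_mismatch=2)
```

A real screen would need to separate allelic variation within one locus from similarity between paralogous loci, which this toy threshold does not attempt.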

RAD sequencing reveals genomewide divergence between independent invasions of the European green crab (Carcinus maenas) in the Northwest Atlantic.

Jeffery, Nicholas W; DiBacco, Claudio; Van Wyngaarden, Mallory; Hamilton, Lorraine C; Stanley, Ryan R E; Bernier, Renée; FitzGerald, Jennifer; Matheson, K; McKenzie, C H; Nadukkalam Ravindran, Praveen; Beiko, Robert; Bradbury, Ian R.

Ecol Evol ; 7(8): 2513-2524, 2017 Apr.

Article in English | MEDLINE | ID: mdl-28428843

ABSTRACT

Genomic studies of invasive species can reveal both invasive pathways and functional differences underpinning patterns of colonization success. The European green crab (Carcinus maenas) was initially introduced to eastern North America nearly 200 years ago, where it expanded northwards to eastern Nova Scotia. A subsequent invasion of Nova Scotia from a northern European source allowed further range expansion, providing a unique opportunity to study the invasion genomics of a species with multiple invasions. Here, we use restriction-site-associated DNA sequencing-derived SNPs to explore fine-scale genomewide differentiation between these two invasions. We identified 9137 loci from green crab sampled from 11 locations along eastern North America and compared spatial variation to mitochondrial COI sequence variation used previously to characterize these invasions. Overall spatial divergence among invasions was high (pairwise FST ~0.001 to 0.15) and spread across many loci, with a mean FST ~0.052 and 52% of loci examined characterized by FST values >0.05. The majority of the most divergent loci (i.e., outliers, ~1.2%) displayed latitudinal clines in allele frequency, highlighting extensive genomic divergence among the invasions. Discriminant analysis of principal components (both neutral and outlier loci) clearly resolved the two invasions spatially and was highly correlated with mitochondrial divergence. Our results reveal extensive cryptic intraspecific genomic diversity associated with differing patterns of colonization success and demonstrate clear utility for genomic approaches to delineating the distribution and colonization success of aquatic invasive species.
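For readers unfamiliar with FST, the following Python sketch shows one textbook way to compute it for a single biallelic locus in two populations, using Wright's definition FST = (HT - HS) / HT with expected heterozygosities. The abstract does not state which estimator the authors used, so this is an assumption for illustration only; the function name `fst_biallelic` is hypothetical and equal population sizes are assumed.

```python
def fst_biallelic(p1, p2):
    """Wright's FST for one biallelic locus and two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within the two subpopulations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_t == 0:
        # The locus is monomorphic overall, so there is no differentiation to measure.
        return 0.0
    return (h_t - h_s) / h_t


# Allele frequencies of 0.4 and 0.6 give HT = 0.5 and HS = 0.48.
example = fst_biallelic(0.4, 0.6)
```

Identical frequencies give a value of 0 and alternately fixed alleles give 1, so the reported mean of ~0.052 sits toward the low end of that range.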

megasat: automated inference of microsatellite genotypes from sequence data.

Zhan, Luyao; Paterson, Ian G; Fraser, Bonnie A; Watson, Beth; Bradbury, Ian R; Nadukkalam Ravindran, Praveen; Reznick, David; Beiko, Robert G; Bentzen, Paul.

Mol Ecol Resour ; 17(2): 247-256, 2017 Mar.

Article in English | MEDLINE | ID: mdl-27333119

ABSTRACT

megasat is software that enables genotyping of microsatellite loci using next-generation sequencing data. Microsatellites are amplified in large multiplexes, and then sequenced in pooled amplicons. megasat reads sequence files and automatically scores microsatellite genotypes. It uses fuzzy matches to allow for sequencing errors and applies decision rules to account for amplification artefacts, including nontarget amplification products, replication slippage during PCR (amplification stutter) and differential amplification of alleles. An important feature of megasat is the generation of histograms of the length-frequency distributions of amplification products for each locus and each individual. These histograms, analogous to electropherograms traditionally used to score microsatellite genotypes, enable rapid evaluation and editing of automatically scored genotypes. megasat is written in Perl, runs on Windows, Mac OS X and Linux systems, and includes a simple graphical user interface. We demonstrate megasat using data from guppy, Poecilia reticulata. We genotype 1024 guppies at 43 microsatellites per run on an Illumina MiSeq sequencer. We evaluated the accuracy of automatically called genotypes using two methods, based on pedigree and repeat genotyping data, and obtained estimates of mean genotyping error rates of 0.021 and 0.012. In both estimates, three loci accounted for a disproportionate fraction of genotyping errors; conversely, 26 loci were scored with 0-1 detected error (error rate ≤0.007). Our results show that with appropriate selection of loci, automated genotyping of microsatellite loci can be achieved with very high throughput, low genotyping error and very low genotyping costs.
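The length-frequency histogram and a rule-based genotype call can be sketched in Python as below. This is not megasat's code (megasat is written in Perl) and does not reproduce its decision rules: the function names, the `min_ratio` and `stutter_ratio` thresholds, and the single stutter rule are hypothetical simplifications. The sketch counts amplicon lengths for one locus in one individual, then accepts a second allele only if it is abundant enough relative to the most common length, applying a stricter threshold to a peak exactly one repeat unit shorter, where PCR stutter would be expected.

```python
from collections import Counter


def length_histogram(amplicons):
    """Count reads by amplicon length for one locus in one individual."""
    return Counter(len(seq) for seq in amplicons)


def call_genotype(hist, repeat_len, min_ratio=0.3, stutter_ratio=0.5):
    """Return a (shorter, longer) allele-length pair, or None if there are no reads.

    Thresholds are illustrative assumptions, not megasat's defaults.
    """
    if not hist:
        return None
    ranked = hist.most_common()
    top_len, top_count = ranked[0]
    for length, count in ranked[1:]:
        # A peak one repeat unit shorter than the main peak may be stutter,
        # so it must clear a higher bar before it is called as an allele.
        in_stutter_position = (top_len - length) == repeat_len
        threshold = stutter_ratio if in_stutter_position else min_ratio
        if count >= threshold * top_count:
            return tuple(sorted((top_len, length)))
    return (top_len, top_len)


# Toy reads: a main peak at 12 bp, a weak stutter peak at 10 bp, a second allele at 18 bp.
het_reads = ["A" * 12] * 20 + ["A" * 10] * 4 + ["A" * 18] * 15
hom_reads = ["A" * 12] * 20 + ["A" * 10] * 4
het_call = call_genotype(length_histogram(het_reads), repeat_len=2)
hom_call = call_genotype(length_histogram(hom_reads), repeat_len=2)
```

Printing the `Counter` for an individual gives a text analogue of the histograms the abstract describes for reviewing automated calls.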

Subjects

Computational Biology/methods; Genotype; Genotyping Techniques/methods; Microsatellite Repeats; Nucleic Acid Amplification Techniques/methods; Sequence Analysis, DNA/methods; Animals; Poecilia/classification; Poecilia/genetics; Software
