Results 1 - 20 of 51
1.
J Am Med Inform Assoc ; 27(9): 1425-1430, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32719837

ABSTRACT

OBJECTIVE: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. METHODS: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680), for analysis and exploration of genomic variant datasets. RESULTS: Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. CONCLUSIONS: We present a timely piece on one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?


Subject(s)
Cloud Computing , Genome-Wide Association Study , Cloud Computing/economics , Computer Communication Networks , Cost-Benefit Analysis , Genome-Wide Association Study/economics , Genome-Wide Association Study/methods , Genomics/methods , Humans
2.
Genet Epidemiol ; 44(6): 537-549, 2020 09.
Article in English | MEDLINE | ID: mdl-32519380

ABSTRACT

A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to fully capture genetic variation, but remains prohibitively expensive for large sample sizes. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture a wider set of variants. However, imputation quality depends crucially on reference panel size and genetic distance from the target population. Here, we consider sequencing a subset of GWAS participants and imputing the rest using a reference panel that includes both sequenced GWAS participants and an external reference panel. We investigate how imputation quality and GWAS power are affected by the number of participants sequenced for admixed populations (African and Latino Americans) and European population isolates (Sardinians and Finns), and identify powerful, cost-effective GWAS designs given current sequencing and array costs. For populations that are well-represented in existing reference panels, we find that array genotyping alone is cost-effective and well-powered to detect common- and rare-variant associations. For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power.


Subject(s)
Genome, Human , Genome-Wide Association Study , Whole Genome Sequencing , Cost-Benefit Analysis , Gene Frequency/genetics , Genome-Wide Association Study/economics , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Whole Genome Sequencing/economics
3.
Genet Epidemiol ; 41(7): 587-598, 2017 11.
Article in English | MEDLINE | ID: mdl-28726280

ABSTRACT

Increasing evidence has shown that genes may cause prenatal, neonatal, and pediatric diseases depending on their parental origins. Statistical models that incorporate parent-of-origin effects (POEs) can improve the power of detecting disease-associated genes and help explain the missing heritability of diseases. In many studies, children have been sequenced for genome-wide association testing, but it may become unaffordable to also sequence their parents and evaluate POEs. Motivated by this reality, we proposed a budget-friendly study design of sequencing children and only genotyping their parents using a single nucleotide polymorphism array. We developed a powerful likelihood-based method, which takes into account both sequence reads and linkage disequilibrium to infer the parental origins of children's alleles and estimate their POEs on the outcome. We evaluated the performance of our proposed method and compared it with an existing method using only genotypes, through extensive simulations. Our method showed higher power than the genotype-based method. When either the mean read depth or the paired-end length was reasonably large, our method achieved ideal power. When single parents' genotypes were unavailable or parental genotypes at the testing locus were not typed, both methods lost power compared with when complete data were available, but the power loss from our method was smaller than that from the genotype-based method. We also extended our method to accommodate mixed genotype, low-coverage, and high-coverage sequence data from children and their parents. In the presence of sequence errors, low-coverage parental sequence data may lead to lower power than parental genotype data.


Subject(s)
DNA Mutational Analysis/methods , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study/methods , Models, Genetic , Research Design , Alleles , Child , Computer Simulation , DNA Mutational Analysis/economics , Female , Genome-Wide Association Study/economics , Humans , Likelihood Functions , Linkage Disequilibrium , Male , Nuclear Family , Pedigree
4.
Med Sci (Paris) ; 32(5): 519-22, 2016 May.
Article in French | MEDLINE | ID: mdl-27225928

ABSTRACT

A genome-wide association study on propensity to motion sickness published by 23andMe gives interesting results, shows the validity of self-reported phenotypic information, and underlines the value of the model developed by the company for customer participation in genetic studies.


Subject(s)
Drug Industry , Genome-Wide Association Study , Motion Sickness/genetics , Self Report , Commerce , Drug Industry/economics , Genetic Testing/economics , Genome-Wide Association Study/economics , Genome-Wide Association Study/methods , Genome-Wide Association Study/trends , Humans , Models, Economic , Motion Sickness/epidemiology , Patient Participation , Polymorphism, Single Nucleotide
5.
Article in English | MEDLINE | ID: mdl-26253094

ABSTRACT

New sequencing methods capable of rapidly analyzing the genome at increasing resolution have transformed diagnosis of single-gene or oligogenic genetic disorders in pediatric and adult medicine. Targeted tests, consisting of disease-focused multigene panels and diagnostic exome sequencing to interrogate the sequence of the coding regions of nearly all genes, are now clinically offered when there is suspicion for an undiagnosed genetic disorder or cancer in children and adults. Implementation of diagnostic exome and genome sequencing tests on invasively and noninvasively obtained fetal DNA samples for prenatal genetic diagnosis is also being explored. We predict that they will become more widely integrated into prenatal care in the near future. Providers must prepare for the practical, ethical, and societal dilemmas that accompany the capacity to generate and analyze large amounts of genetic information about the fetus during pregnancy.


Subject(s)
Fetal Diseases/diagnosis , Genetic Diseases, Inborn/diagnosis , Genome, Human/genetics , Genome-Wide Association Study/methods , Prenatal Diagnosis/methods , Amniotic Fluid/chemistry , Chorionic Villi Sampling/economics , Chorionic Villi Sampling/methods , Confidentiality , Exome/genetics , Female , Fetal Diseases/economics , Fetal Diseases/genetics , Genetic Counseling/economics , Genetic Counseling/ethics , Genetic Diseases, Inborn/genetics , Genetic Testing/economics , Genetic Testing/methods , Genetic Variation/genetics , Genome-Wide Association Study/economics , Humans , Incidental Findings , Informed Consent , Mutation/genetics , Patient Satisfaction , Phenotype , Pregnancy , Prenatal Diagnosis/economics , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/methods
7.
PLoS One ; 10(3): e0119096, 2015.
Article in English | MEDLINE | ID: mdl-25763822

ABSTRACT

Case-control association studies often suffer from population stratification bias. A previous triple combination strategy of stratum matching, genomic controlling, and multiple DNA pooling can correct the bias and save genotyping cost. However, the method requires researchers to prepare a multitude of DNA pools: more than 30 case-control pooling sets in total (polyset). In this paper, the authors propose a permutation test for oligoset DNA pooling studies. Monte-Carlo simulations show that the proposed test has a type I error rate under control and a power comparable to that of individual genotyping. For a researcher on a tight budget, oligoset DNA pooling is a viable option.


Subject(s)
DNA/genetics , Genome-Wide Association Study/economics , Genome-Wide Association Study/methods , Algorithms , Case-Control Studies , Genotype , Humans , Models, Genetic , Monte Carlo Method
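
The abstract above does not spell out the proposed permutation test, so the following is only a minimal sketch of a generic permutation test on pool-level allele frequency estimates, not the authors' oligoset method. The function name `pooled_permutation_pvalue` and the choice of test statistic (absolute difference in mean pool allele frequency) are assumptions made for illustration. With only a handful of pools, every relabeling of pools as case or control can be enumerated exactly.

```python
from itertools import combinations


def pooled_permutation_pvalue(case_freqs, control_freqs):
    """Exact two-sided permutation p-value (hypothetical helper).

    case_freqs / control_freqs: allele frequency estimates, one per DNA pool.
    Statistic: |mean(case pools) - mean(control pools)|.
    """
    values = list(case_freqs) + list(control_freqs)
    n_total = len(values)
    n_case = len(case_freqs)
    total = sum(values)

    def statistic(case_sum):
        # Difference in means given the sum over the pools labeled "case".
        return abs(case_sum / n_case - (total - case_sum) / (n_total - n_case))

    observed = statistic(sum(case_freqs))
    hits = 0
    count = 0
    # Enumerate every way of choosing which pools carry the "case" label.
    for idx in combinations(range(n_total), n_case):
        stat = statistic(sum(values[i] for i in idx))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
        count += 1
    return hits / count


if __name__ == "__main__":
    # Made-up pool estimates, purely illustrative.
    cases = [0.30, 0.32, 0.31, 0.33]
    controls = [0.20, 0.22, 0.21, 0.23]
    print(pooled_permutation_pvalue(cases, controls))
```

Exact enumeration is feasible only because the number of pools is small; with many pools, random relabelings would normally be sampled instead.
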
8.
Am J Hum Biol ; 27(3): 295-303, 2015.
Article in English | MEDLINE | ID: mdl-25711975

ABSTRACT

The study of epigenetics, or chemical modifications to the genome that may alter gene expression, is a growing area of interest for social scientists. Anthropologists and human biologists are interested in epigenetics specifically, as it provides a potential link between the environment and the genome, as well as a new layer of complexity for the study of human biological variation. In pace with the rapid increase in interest in epigenetic research, the range of methods has greatly expanded over the past decade. The primary objective of this article is to provide an overview of the current methods for assaying DNA methylation, the most commonly studied epigenetic modification. We will address considerations for all steps required to plan and conduct an analysis of DNA methylation, from appropriate sample collection, to the most commonly used methods for laboratory analyses of locus-specific and genome-wide approaches, and recommendations for statistical analyses. Key challenges in the study of DNA methylation are also discussed, including tissue specificity, the stability of measures, timing of sample collection, statistical considerations, batch effects, and challenges related to analysis and interpretation of data. Our hope is that this review serves as a primer for anthropologists and human biologists interested in incorporating epigenetic data into their research programs.


Subject(s)
DNA Methylation , Epigenomics/methods , Genetic Techniques/instrumentation , Epigenesis, Genetic/physiology , Gene Expression/physiology , Genetic Techniques/economics , Genome-Wide Association Study/economics , Genome-Wide Association Study/instrumentation , Humans , Specimen Handling/methods
9.
Orphanet J Rare Dis ; 10: 10, 2015 Feb 04.
Article in English | MEDLINE | ID: mdl-25648394

ABSTRACT

High-throughput assays tend to be expensive per subject. Often studies are limited not so much by the number of subjects available as by assay costs, making assay choice a critical issue. We have developed a framework for assay choice that maximises the number of true disease-causing mechanisms 'seen', given limited resources. Although straightforward, some of the ramifications of our methodology run counter to received wisdom on study design. We illustrate our methodology with examples, and have built a website allowing calculation of quantities of interest to those designing rare disease studies.


Subject(s)
Genome-Wide Association Study/economics , Rare Diseases/diagnosis , Rare Diseases/genetics , Cost-Benefit Analysis , Genetic Predisposition to Disease , Humans , Mutation
10.
J Dev Orig Health Dis ; 6(1): 10-6, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25315715

ABSTRACT

Analysis of DNA methylation data in epigenome-wide association studies presents many bioinformatics and statistical challenges. Not least of these is the non-independence of individual DNA methylation marks from each other, from genotype and from technical sources of variation. In this review we discuss DNA methylation data from the Infinium450K array and processing methodologies to reduce technical variation. We describe recent approaches to harness the concordance of neighbouring DNA methylation values to improve power in association studies. We also describe how the non-independence of genotype and DNA methylation has been used to infer causality (in the case of Mendelian randomization approaches); to suggest the mediating effect of DNA methylation in linking intergenic single nucleotide polymorphisms, identified in genome-wide association studies, to phenotype; and to uncover the widespread influence of gene and environment interactions on methylation levels.


Subject(s)
DNA Methylation , Epigenesis, Genetic , Gene-Environment Interaction , Genome-Wide Association Study/methods , Cytosine/analysis , Cytosine Nucleotides/genetics , Data Interpretation, Statistical , Genome-Wide Association Study/economics , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Sulfites/analysis
12.
Eur J Hum Genet ; 23(7): 975-83, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25293720

ABSTRACT

The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluation of performance issues that include the relative utility of study-specific compared with general/multipopulation reference panels; genotyping with various array scaffolds; effects of different ethnic backgrounds; and assessment of ranges of allele frequencies. Here we compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating ~200 000 variants) provided highly informative content across the entire genome. Gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter badly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies.


Subject(s)
Genome, Human/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Cost-Benefit Analysis , Gene Frequency , Genetics, Population , Genome-Wide Association Study/economics , Genotype , Haplotypes , Humans , Italy , Minnesota , Research Design , Sequence Analysis, DNA/economics , White People/genetics
13.
Int J Environ Res Public Health ; 11(12): 12283-303, 2014 Nov 28.
Article in English | MEDLINE | ID: mdl-25464127

ABSTRACT

Longitudinal data enable detection of the effect of aging/time, and a repeated measures design is statistically more efficient than cross-sectional data if the correlations between repeated measurements are not large. In particular, when genotyping is more expensive than phenotyping, the collection of longitudinal data can be an efficient strategy for genetic association analysis. However, in spite of these advantages, genome-wide association studies (GWAS) with longitudinal data have rarely been analyzed taking this into account. In this report, we calculate the required sample size to achieve 80% power at the genome-wide significance level for both longitudinal and cross-sectional data, and compare their statistical efficiency. Furthermore, we analyzed the GWAS of eight phenotypes with three observations on each individual in the Korean Association Resource (KARE). A linear mixed model allowing for the correlations between observations for each individual was applied to analyze the longitudinal data, and linear regression was used to analyze the first observation on each individual as cross-sectional data. We found 12 novel genome-wide significant disease susceptibility loci that were then confirmed in the Health Examination cohort, as well as some significant interactions between age/sex and SNPs.


Subject(s)
Genome-Wide Association Study/methods , Genotype , Polymorphism, Single Nucleotide , Adult , Aging , Asian People , Epigenesis, Genetic , Female , Gene Expression Regulation , Genome-Wide Association Study/economics , Humans , Linkage Disequilibrium , Longitudinal Studies , Male , Middle Aged , Republic of Korea , Research Design , Sex Factors
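
The sample-size comparison described in the abstract above can be illustrated with a textbook approximation. This is a sketch under stated assumptions, not the calculation used in the paper: it assumes a 1-degree-of-freedom test, a genome-wide significance level of 5e-8, a SNP explaining a fraction `q2` of trait variance, and repeated measures with a common within-person correlation `rho` (compound symmetry), so that averaging `n_obs` observations scales the residual variance by (1 + (n_obs - 1) * rho) / n_obs. All function and parameter names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_ncp(alpha=5e-8, power=0.80):
    """Noncentrality needed for the target power (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = -std_normal.inv_cdf(alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) ** 2


def cross_sectional_n(q2, alpha=5e-8, power=0.80):
    """Individuals needed with one observation each."""
    return ceil(required_ncp(alpha, power) * (1 - q2) / q2)


def longitudinal_n(q2, n_obs, rho, alpha=5e-8, power=0.80):
    """Individuals needed with n_obs correlated observations each."""
    design_factor = (1 + (n_obs - 1) * rho) / n_obs
    return ceil(required_ncp(alpha, power) * (1 - q2) / q2 * design_factor)


if __name__ == "__main__":
    # Illustrative values only: SNP explaining 0.1% of variance,
    # three observations per person, within-person correlation 0.5.
    print(cross_sectional_n(0.001))
    print(longitudinal_n(0.001, n_obs=3, rho=0.5))
```

Under these assumptions, three observations per individual with a correlation of 0.5 reduce the required number of individuals to about two-thirds of the cross-sectional figure, and the advantage vanishes as the correlation approaches 1, consistent with the abstract's remark that the gain depends on the correlations not being large.
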
14.
Curr Mol Med ; 14(7): 833-40, 2014.
Article in English | MEDLINE | ID: mdl-25109794

ABSTRACT

A new standard for medicine is emerging that aims to improve individual drug responses through studying associations with genetic variations. This field, pharmacogenomics, is undergoing a rapid expansion due to a variety of technological advancements that are enabling higher throughput with reductions in cost. Here we review the advantages, limitations, and opportunities for using lymphoblastoid cell lines (LCL) as a model system for human pharmacogenomic studies. There is a wide range of publicly available resources with genome-wide data available for LCLs from both related and unrelated populations, removing the cost of genotyping for drug response studies. Furthermore, in contrast to human clinical trials or in vivo model systems, with high-throughput in vitro screening technologies, pharmacogenomics studies can easily be scaled to accommodate large sample sizes. An important component of leveraging genome-wide data in LCL models is association mapping. Several methods are discussed herein, including multivariate concentration response modeling, issues with multiple testing, and successful examples of the 'triangle model' to identify candidate variants. Once candidate gene variants have been determined, their biological roles can be elucidated using pathway analyses and functionally confirmed using siRNA knockdown experiments. The wealth of genomics data being produced using related and unrelated populations is creating many exciting opportunities leading to new insights into the genetic contribution and heritability of drug response.


Subject(s)
Genome-Wide Association Study/methods , Lymphocytes/drug effects , Pharmacogenetics/methods , Polymorphism, Single Nucleotide , Cell Line, Transformed , Cost-Benefit Analysis , Genome-Wide Association Study/economics , Humans , Lymphocytes/metabolism , Models, Biological , Models, Genetic , Pharmacogenetics/economics , RNA Interference , Reproducibility of Results , Translational Research, Biomedical/economics , Translational Research, Biomedical/methods
15.
Clin Chem ; 60(5): 724-33, 2014 May.
Article in English | MEDLINE | ID: mdl-24227285

ABSTRACT

BACKGROUND: Extraordinary technological advances and decreases in the cost of DNA sequencing have made whole genome sequencing (WGS) feasible as a highly accessible clinical test for numerous indications. There have been many recent, successful applications of WGS in establishing the etiology of complex diseases and guiding therapeutic decision-making in neoplastic and nonneoplastic diseases and in various aspects of reproductive health. However, there are major, but not insurmountable, obstacles to the increased clinical implementation of WGS, such as hidden costs, issues surrounding sequencing and analysis, quality assurance and standardization protocols, ethical dilemmas, and difficulties with interpretation of the results. CONTENT: The widespread use of WGS in routine clinical practice remains a distant proposition. Prospective trials will be needed to establish if, and for whom, the benefits of WGS will outweigh the likely substantial costs associated with follow-up tests, the risks of overdiagnosis and overtreatment, and the associated emotional distress. SUMMARY: WGS should be carefully implemented in the clinic to allow the realization of its potential to improve patient health in specific indications. To minimize harm, the use of WGS for all other reasons must be carefully evaluated before clinical implementation.


Subject(s)
Disease/genetics , Genetic Testing/methods , Genome, Human/genetics , Genome-Wide Association Study/methods , Costs and Cost Analysis , Genetic Predisposition to Disease , Genetic Testing/economics , Genetic Testing/ethics , Genetic Testing/standards , Genome-Wide Association Study/economics , Genome-Wide Association Study/ethics , Genome-Wide Association Study/standards , Humans , Quality Control
16.
Eur J Hum Genet ; 21 Suppl 1: S6-26, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23677179

ABSTRACT

Sequencing an individual's complete genome is expected to be possible for a relatively low sum, 'one thousand dollars', within a few years. Sequencing refers to determining the order of base pairs that make up the genome. The result is a library of three billion letter combinations. Cheap whole-genome sequencing is of greatest importance to medical scientific research. Comparing individual complete genomes will lead to a better understanding of the contribution genetic variation makes to health and disease. As knowledge increases, the 'thousand-dollar genome' will also become increasingly important to healthcare. The applications that come within reach raise a number of ethical questions. This monitoring report addresses the issue.


Subject(s)
Genomics/ethics , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/ethics , Adult , Ethics, Medical , Female , Genome-Wide Association Study/economics , Genome-Wide Association Study/ethics , Humans , Male , Netherlands , Oligonucleotide Array Sequence Analysis , Pregnancy , Prenatal Diagnosis/ethics
19.
Methods Inf Med ; 52(1): 91-5, 2013.
Article in English | MEDLINE | ID: mdl-23223640

ABSTRACT

BACKGROUND: Until recently, genotype studies were limited to the investigation of single SNP effects due to the computational burden incurred when studying pairwise interactions of SNPs. However, some genetic effects as simple as coloring (in plants and animals) cannot be ascribed to a single locus but only understood when epistasis is taken into account [1]. It is expected that such effects are also found in complex diseases where many genes contribute to the clinical outcome of affected individuals. Only recently have such problems become feasible computationally. OBJECTIVES: The inherently parallel structure of the problem makes it a perfect candidate for massive parallelization on either grid or cloud architectures. Since we are also dealing with confidential patient data, we were not able to consider a cloud-based solution but had to find a way to process the data in-house and aimed to build a local GPU-based grid structure. METHODS: Sequential epistasis calculations were ported to GPU using CUDA at various levels. Parallelization on the CPU was compared to corresponding GPU counterparts with regard to performance and cost. RESULTS: A cost-effective solution was created by combining custom-built nodes equipped with relatively inexpensive consumer-level graphics cards with highly parallel GPUs in a local grid. The GPU method outperforms current cluster-based systems on a price/performance criterion, as a single GPU shows speed performance comparable to that of up to 200 CPU cores. CONCLUSION: The outlined approach will work for problems that easily lend themselves to massive parallelization. Code for various tasks has been made available and ongoing development of tools will further ease the transition from sequential to parallel algorithms.


Subject(s)
Computer Communication Networks/economics , Computing Methodologies , Epistasis, Genetic , Genome-Wide Association Study/economics , Software/economics , Computer Systems/economics , Cost-Benefit Analysis/economics , Genetic Privacy/economics , Germany , Humans
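
The "inherently parallel structure" mentioned in the abstract above comes from the fact that every SNP pair can be scored independently of every other pair. The sketch below shows that structure sequentially on the CPU in plain Python; it is not the authors' CUDA code. The scoring function, a Pearson chi-square on the table of joint genotype versus case status, is a stand-in chosen for illustration (it tests joint association rather than isolating the interaction term), and the names `pair_chi_square` and `scan_pairs` are hypothetical.

```python
from itertools import combinations


def pair_chi_square(snp_a, snp_b, is_case):
    """Pearson chi-square on the (joint genotype) x (case status) table."""
    counts = {}  # joint genotype -> [n_controls, n_cases]
    for ga, gb, case in zip(snp_a, snp_b, is_case):
        cell = counts.setdefault((ga, gb), [0, 0])
        cell[1 if case else 0] += 1

    n = len(is_case)
    n_case = sum(1 for c in is_case if c)
    col_totals = (n - n_case, n_case)

    stat = 0.0
    for cell in counts.values():
        row_total = cell[0] + cell[1]
        for j in (0, 1):
            expected = row_total * col_totals[j] / n
            if expected > 0:
                stat += (cell[j] - expected) ** 2 / expected
    return stat


def scan_pairs(genotypes, is_case):
    """Score every SNP pair. No pair depends on another, so the loop
    body could be distributed across workers or GPU threads."""
    return {
        (i, j): pair_chi_square(genotypes[i], genotypes[j], is_case)
        for i, j in combinations(range(len(genotypes)), 2)
    }


if __name__ == "__main__":
    # Toy data: case status follows an XOR of SNP 0 and SNP 1,
    # so neither SNP is associated on its own but the pair is.
    genotypes = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]]
    is_case = [False, True, True, False]
    for pair, score in scan_pairs(genotypes, is_case).items():
        print(pair, score)
```

The number of pairs grows quadratically with the number of SNPs, which is why the pairwise scan was long considered computationally prohibitive and why an independent per-pair kernel maps well onto many parallel cores.
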
20.
Anim Sci J ; 83(11): 719-26, 2012 Nov.
Article in English | MEDLINE | ID: mdl-23126324

ABSTRACT

Genome-wide association mapping for complex traits in cattle populations is a powerful, but expensive, selection tool. The DNA pooling technique can potentially reduce the cost of genome-wide association studies. However, in DNA pooling design, the additional variance generated by pooling-specific errors must be taken into account. Therefore, this study aimed to investigate factors such as: (i) the accuracy of allele frequency estimation; (ii) the magnitude of errors in pooling construction and in the array; and (iii) the effect of the number of replicate arrays on P-values estimated by a genome-wide association study. Results showed that the Illumina correction method is the most effective method to correct the allele frequency estimation; pooling errors, especially array variance, should be taken into account in DNA pooling design; and the risk of a type I error can be reduced by using at least two replicate arrays. These results indicate the practical capability and cost-effectiveness of pool-based genome-wide association studies using the BovineSNP50 array in a cattle population.


Subject(s)
Cattle/genetics , DNA/genetics , Gene Pool , Genome-Wide Association Study/economics , Genome-Wide Association Study/methods , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide/genetics , Animals , Cost-Benefit Analysis , Gene Frequency