ABSTRACT
Whole-exome sequencing (WES) and whole-genome sequencing (WGS) studies are underway to investigate the impact of genetic variants on complex diseases and traits. It is customary to perform single-variant association tests for common variants and region-based association tests for rare variants. The latter may target variants with similar or opposite effects, interrogate variants with different frequencies or different functional annotations, and examine a variety of regions. The large number of tests that are performed necessitates adjustment for multiple testing. The conventional Bonferroni correction is overly conservative as the test statistics are correlated. To address this challenge, we propose a simple and accurate method based on parametric bootstrap to assess genomewide significance. We show that the correlations of the test statistics are determined primarily by the genotypes, such that the same significance threshold can be used in different studies that share a common sequencing platform. We demonstrate the usefulness of the proposed method with WES data from the National Heart, Lung, and Blood Institute Exome Sequencing Project and WGS data from the 1000 Genomes Project. We recommend the p value of 5 × 10⁻⁹ as the genomewide significance threshold for testing all common and low-frequency variants (MAFs ≥ 0.1%) in the human genome.
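The claim that Bonferroni is overly conservative for correlated test statistics can be illustrated with a small Monte Carlo sketch. This is not the authors' parametric bootstrap: it assumes equicorrelated normal test statistics (the correlation `rho`, the number of tests, and the replicate count are all hypothetical), whereas the abstract states that the real correlations are determined by the genotypes.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def min_p_threshold(n_tests, rho, n_reps, alpha=0.05, seed=1):
    """Estimate the per-test p-value threshold that controls the
    family-wise error rate at alpha, by simulating null test statistics
    that share a pairwise correlation rho (an illustrative assumption)."""
    rng = random.Random(seed)
    shared_w = math.sqrt(rho)
    own_w = math.sqrt(1.0 - rho)
    min_ps = []
    for _ in range(n_reps):
        shared = rng.gauss(0.0, 1.0)
        max_abs_z = max(
            abs(shared_w * shared + own_w * rng.gauss(0.0, 1.0))
            for _ in range(n_tests)
        )
        min_ps.append(two_sided_p(max_abs_z))
    min_ps.sort()
    # Empirical alpha-quantile of the smallest p value across all tests.
    return min_ps[int(alpha * n_reps)]


n_tests = 50
bonferroni = 0.05 / n_tests
simulated = min_p_threshold(n_tests, rho=0.8, n_reps=2000)
effective_tests = 0.05 / simulated

print(f"Bonferroni threshold: {bonferroni:.2e}")
print(f"Simulated threshold:  {simulated:.2e}")
print(f"Implied number of effective independent tests: {effective_tests:.1f}")
```

By the same arithmetic, if a family-wise error rate of 0.05 is assumed, a threshold of 5 × 10⁻⁹ corresponds to roughly 10⁷ effective independent tests.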
Subjects
Genetic Association Studies , Genome, Human/genetics , Whole Genome Sequencing , Exome , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Association Studies/statistics & numerical data , Genotype , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Models, Theoretical , Phenotype , Polymorphism, Single Nucleotide , Practice Guidelines as Topic , Reproducibility of Results , Whole Genome Sequencing/methods , Whole Genome Sequencing/statistics & numerical data
ABSTRACT
Von Willebrand disease (VWD) constitutes the most common inherited human bleeding disorder. It is associated with a mucocutaneous bleeding phenotype that can significantly impact upon quality of life. Despite its prevalence and associated morbidity, the diagnosis and subclassification of VWD continue to pose significant clinical challenges. This is in part attributable to the fact that plasma von Willebrand factor (VWF) levels vary over a wide range in the normal population, together with the multiple different physiological functions played by VWF in vivo. Over recent years, substantial progress has been achieved in elucidating the biological roles of VWF. Significant advances have also been made into defining the pathophysiological mechanisms underpinning both quantitative and qualitative VWD. In particular, several new laboratory assays have been developed that enable more precise assessment of specific aspects of VWF activity. In the present review, we discuss these recent developments in the field of VWD diagnosis, and consider how these advances can impact upon clinical diagnostic algorithms for use in routine clinical practice. In addition, we review some important recent advances pertaining to the various treatment options available for managing patients with VWD.
Subjects
von Willebrand Diseases/diagnosis , von Willebrand Diseases/therapy , Biomarkers , Clinical Decision-Making , Combined Modality Therapy , Disease Management , Disease Susceptibility , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Predisposition to Disease , Genotype , Humans , Molecular Diagnostic Techniques/methods , Molecular Diagnostic Techniques/standards , Phenotype , Treatment Outcome , von Willebrand Diseases/etiology
ABSTRACT
Massively parallel whole-genome sequencing (WGS) data have ushered in a new era in human genetics. These data are now being used to understand the role of rare variants in complex traits and to advance the goals of precision medicine. The technological and computing advances that have enabled us to generate WGS data on thousands of individuals have also outpaced our ability to perform analyses in scientifically and statistically rigorous and thoughtful ways. The past several years have witnessed the application of whole-exome sequencing (WES) to complex traits and diseases. From our analysis of NHLBI Exome Sequencing Project (ESP) data, not only have a number of important disease and complex trait association findings emerged, but our collective experience offers some valuable lessons for WGS initiatives. These include caveats associated with generating automated pipelines for quality control and analysis of rare variants; the importance of studying minority populations; sample size requirements and efficient study designs for identifying rare-variant associations; and the significance of incidental findings in population-based genetic research. With the ESP as an example, we offer guidance and a framework on how to conduct a large-scale association study in the era of WGS.
Subjects
Exome/genetics , Genetic Association Studies/methods , Genetic Association Studies/standards , National Heart, Lung, and Blood Institute (U.S.) , Female , Genetic Variation , Genome, Human/genetics , Guidelines as Topic , Humans , Male , Quality Control , Reproducibility of Results , Sequence Analysis, DNA , United States
ABSTRACT
BACKGROUND: Copy number variation (CNV) analysis is an integral component of the study of human genomes in both research and clinical settings. Array-based CNV analysis is the current first-tier approach in clinical cytogenetics. Decreasing costs in high-throughput sequencing and cloud computing have opened doors for the development of sequencing-based CNV analysis pipelines with fast turnaround times. We carry out a systematic and quantitative comparative analysis for several low-coverage whole-genome sequencing (WGS) strategies to detect CNV in the human genome. METHODS: We compared the CNV detection capabilities of WGS strategies (short insert, 3 kb insert mate pair and 5 kb insert mate pair) each at 1×, 3× and 5× coverages relative to each other and to 17 currently used high-density oligonucleotide arrays. For benchmarking, we used a set of gold standard (GS) CNVs generated for the 1000 Genomes Project CEU subject NA12878. RESULTS: Overall, low-coverage WGS strategies detect drastically more GS CNVs compared with arrays and are accompanied with smaller percentages of CNV calls without validation. Furthermore, we show that WGS (at ≥1× coverage) is able to detect all seven GS deletion CNVs >100 kb in NA12878, whereas only one is detected by most arrays. Lastly, we show that the much larger 15 Mbp Cri du chat deletion can be readily detected with short-insert paired-end WGS at even just 1× coverage. CONCLUSIONS: CNV analysis using low-coverage WGS is efficient and outperforms the array-based analysis that is currently used for clinical cytogenetics.
Subjects
Comparative Genomic Hybridization , DNA Copy Number Variations , Genome, Human , Genomics , Whole Genome Sequencing , Comparative Genomic Hybridization/methods , Comparative Genomic Hybridization/standards , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Predisposition to Disease , Genetic Testing , Genomics/methods , Genomics/standards , Humans , Reference Standards , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
Despite the extensive discovery of disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants may explain additional disease risk or trait variability. Although sequencing technology provides an excellent opportunity to investigate the roles of rare variants in complex diseases, detection of these variants in sequencing-based association studies presents substantial challenges. In this article, we propose novel statistical tests, based on cross-validation prediction error (PE), of the association between rare and common variants in a genomic region and a complex trait of interest. We first propose a PE method based on ridge regression. Based on PE, we also propose two further tests, PE-WS and PE-TOW, which test a weighted combination of variants under two different weighting schemes. PE-WS is the PE version of the test based on the weighted sum statistic (WS) and PE-TOW is the PE version of the test based on the optimally weighted combination of variants (TOW). Using extensive simulation studies, we show that (1) PE-TOW and PE-WS are consistently more powerful than TOW and WS, respectively, and (2) PE is the most powerful test when causal variants contain both common and rare variants.
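The idea of a cross-validation prediction error under ridge regression can be sketched as follows. This is an illustration only, not the authors' test statistic: the simulated genotypes, allele frequencies, effect sizes, penalty `lam`, and fold assignment are all hypothetical, and the abstract does not say how significance is assessed.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_fit(xs, ys, lam):
    """Ridge coefficients (X'X + lam I)^-1 X'y for centred data."""
    p = len(xs[0])
    xtx = [[sum(row[i] * row[j] for row in xs) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(xs, ys)) for i in range(p)]
    return solve(xtx, xty)


def cv_prediction_error(xs, ys, lam, k=5):
    """Mean squared k-fold cross-validation prediction error."""
    n, p = len(ys), len(xs[0])
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        y_mean = sum(ys[i] for i in train) / len(train)
        x_mean = [sum(xs[i][j] for i in train) / len(train) for j in range(p)]
        xc = [[xs[i][j] - x_mean[j] for j in range(p)] for i in train]
        yc = [ys[i] - y_mean for i in train]
        beta = ridge_fit(xc, yc, lam)
        for i in test:
            pred = y_mean + sum(b * (xs[i][j] - x_mean[j]) for j, b in enumerate(beta))
            total += (ys[i] - pred) ** 2
    return total / n


# Hypothetical region: two common and three rarer variants, 200 individuals.
rng = random.Random(7)
mafs = [0.30, 0.30, 0.05, 0.01, 0.01]
effects = [1.0, 1.0, 0.0, 0.0, 2.0]
genotypes = [[sum(rng.random() < maf for _ in range(2)) for maf in mafs]
             for _ in range(200)]
trait = [sum(e * g for e, g in zip(effects, row)) + rng.gauss(0.0, 1.0)
         for row in genotypes]

pe_genotypes = cv_prediction_error(genotypes, trait, lam=1.0)
# Baseline: a constant predictor, which reduces to the training-fold mean.
pe_null = cv_prediction_error([[0.0] for _ in trait], trait, lam=1.0)
print(f"PE with genotypes: {pe_genotypes:.3f}; PE without: {pe_null:.3f}")
```

A prediction error clearly below the mean-only baseline suggests that the region carries signal for the trait.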
Subjects
Genetic Association Studies/standards , Genetic Variation/genetics , Predictive Value of Tests , Quantitative Trait, Heritable , Algorithms , Computer Simulation , Humans , Models, Genetic , Phenotype , Reproducibility of Results
ABSTRACT
Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
Subjects
Data Interpretation, Statistical , Genetic Association Studies/methods , Genotyping Techniques/methods , Models, Genetic , Pedigree , Sequence Analysis, DNA/methods , Software , Computer Simulation , Europe , Genetic Association Studies/standards , Humans , Principal Component Analysis
ABSTRACT
PURPOSE: The advent of next-generation sequencing resulted in substantial increases in the number of variants detected, interpreted, and reported by molecular genetics diagnostic laboratories. Recent publications have provided standards for the interpretation of sequence variants, but there are currently no standards regarding reinterpretation of these variants. Recognizing that significant changes in variant classification may occur over time, many genetics diagnostic laboratories have independently developed practices for variant reinterpretation. The purpose of this study is to describe our laboratory approach to variant reinterpretation. METHODS: We surveyed eight genetics diagnostic laboratories in Canada and the United States. RESULTS: Each laboratory had differing protocols, but most felt that clinically relevant changes to variant classifications should be communicated to ordering providers. Based on results of this survey and our experience, we developed a cost-effective and resource-efficient approach to variant reinterpretation. CONCLUSION: Ongoing variant reinterpretation is required to maintain the highest standards for delivering genetics laboratory services. Our approach to variant reinterpretation offers an efficient solution that does not compromise accuracy or timely delivery of genetics laboratory services.
Subjects
Genetic Variation , Molecular Sequence Annotation/standards , Canada , Communication , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Predisposition to Disease , Genetic Testing/standards , Guidelines as Topic , Health Care Surveys , Humans , Laboratories , United States , Workflow
ABSTRACT
PURPOSE: Clinical genome sequencing produces uncertain diagnostic results, raising concerns about how to communicate the method's inherent complexities in ways that reduce potential misunderstandings and harm. This study investigates clinicians' communications and patient/participant responses to uncertain diagnostic results arising from a clinical exome sequencing research study, contributing empirical data to the debate surrounding disclosure of uncertain genomic information. METHODS: We investigated the communication and impact of uncertain diagnostic results using ethnographic observations of result disclosures with 21 adults and 11 parents of child patients, followed by two semistructured interviews with these same participants. RESULTS: Participants understood their uncertain results in ways that were congruent with clinical geneticists' communications. They followed recommendations for further consultation, although family testing to resolve uncertainty was not always done. Participants were prepared for learning an uncertain result and grasped the key concept that it should not be used to guide health-care or other decisions. They did not express regret for having learned the uncertain result; most regarded it as potentially valuable in the future. CONCLUSION: This study suggests that uncertain diagnostic results from genome sequencing can be relayed to patients in ways they can understand and consistent with providers' interpretations, without causing undue harm.
Subjects
Data Accuracy , Genetic Association Studies/standards , Uncertainty , Adult , Aged , Aged, 80 and over , Communication , Exome , Female , Genetic Association Studies/methods , Genetic Counseling , Genetic Testing/standards , Humans , Male , Middle Aged , Patient Participation , Referral and Consultation , Exome Sequencing , Young Adult
ABSTRACT
PURPOSE: To describe the frequency and nature of differences in variant classifications between clinicians and genetic testing laboratories. METHODS: Retrospective review of variants identified through genetic testing ordered in routine clinical care by clinicians in the Stanford Center for Inherited Cardiovascular Disease. We compared classifications made by clinicians, the testing laboratory, and other laboratories in ClinVar. RESULTS: Of 688 laboratory classifications, 124 (18%) differed from the clinicians' classifications. Most differences in classification would probably affect clinical care of the patient and/or family (83%, 103/124). The frequency of discordant classifications differed depending on the testing laboratory (P < 0.0001) and the testing laboratory's classification (P < 0.00001). For the majority (82/124, 66%) of discordant classifications, clinicians were more conservative (less likely to classify a variant pathogenic or likely pathogenic). The clinicians' classification was discordant with one or more submitter in ClinVar in 49.1% (28/57) of cases, while the testing laboratory's classification was discordant with a ClinVar submitter in 82.5% of cases (47/57, P = 0.0002). CONCLUSION: The clinical team disagreed with the laboratory's classification at a rate similar to that of reported disagreements between laboratories. Most of this discordance was clinically significant, with clinicians tending to be more conservative than laboratories in their classifications.
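The percentages reported above can be recomputed directly from the stated counts. The short check below uses only the numerators and denominators given in the abstract; the dictionary keys are descriptive labels introduced here for illustration.

```python
# Counts as reported in the abstract: (numerator, denominator).
reported_counts = {
    "laboratory vs clinician discordance": (124, 688),
    "discordances likely to affect care": (103, 124),
    "clinician more conservative": (82, 124),
    "clinician discordant with a ClinVar submitter": (28, 57),
    "laboratory discordant with a ClinVar submitter": (47, 57),
}

# Percentage for each comparison, rounded to one decimal place.
percentages = {
    label: round(100.0 * num / den, 1)
    for label, (num, den) in reported_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The recomputed values (18.0%, 83.1%, 66.1%, 49.1%, and 82.5%) agree with the rounded figures in the abstract.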
Subjects
Genetic Variation , Genetics, Medical/standards , Laboratories , Molecular Sequence Annotation/standards , Physicians , Alleles , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Predisposition to Disease , Genetic Testing/methods , Genetic Testing/standards , Genetics, Medical/methods , Humans
ABSTRACT
PURPOSE: As genome science advances, people receiving personalized genetic information may receive reinterpretations of pathogenicity. Little is known about responses to adjusted results. We examined how reinterpretations might affect attitudes about genetic testing and intentions to share results with family. METHODS: Data were collected from high-socioeconomic-status participants (n = 58) in a genome sequencing study. Twenty-nine originally learned they were carriers of Duarte variant galactosemia, based on a variant that was reclassified as benign. Positive testers (n = 19) had a newly identified causative variant and remained carriers. Negative testers (n = 10) learned they were no longer carriers. Twenty-nine controls were carriers for a disease of comparable severity with no reclassification. Participants completed baseline, immediate, and 3-month follow-up surveys. RESULTS: Approximately 80% of participants demonstrated complete or partially accurate recall of their results and reported positive or neutral reactions to their result and about genetic information more generally. Positive testers reported lower intentions to share the change in their result with family. Controls reported the lowest intentions to learn future results. There were no significant group differences or changes over time in perceived ambiguity or negative emotions. CONCLUSION: The results suggest that high-socioeconomic-status participants understand reinterpretations conferring a neutral change or a change from carrier to noncarrier status. Participants' responses to changes in carrier results for a low-risk condition indicated minimal adverse effects.
Subjects
Genetic Association Studies , Genetic Predisposition to Disease , Genetic Testing , Genetic Variation , Patient Participation , Case-Control Studies , Data Accuracy , Emotions , Follow-Up Studies , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Counseling , Genetic Testing/methods , Genetic Testing/standards , Humans , Intention , Perception , Surveys and Questionnaires
ABSTRACT
PURPOSE: Genetic testing is an integral diagnostic component of pediatric medicine. Standard of care is often a time-consuming stepwise approach involving chromosomal microarray analysis and targeted gene sequencing panels, which can be costly and inconclusive. Whole-genome sequencing (WGS) provides a comprehensive testing platform that has the potential to streamline genetic assessments, but there are limited comparative data to guide its clinical use. METHODS: We prospectively recruited 103 patients from pediatric non-genetic subspecialty clinics, each with a clinical phenotype suggestive of an underlying genetic disorder, and compared the diagnostic yield and coverage of WGS with those of conventional genetic testing. RESULTS: WGS identified diagnostic variants in 41% of individuals, representing a significant increase over conventional testing results (24%; P = 0.01). Genes clinically sequenced in the cohort (n = 1,226) were well covered by WGS, with a median exonic coverage of 40× ± 8× (mean ± SD). All the molecular diagnoses made by conventional methods were captured by WGS. The 18 new diagnoses made with WGS included structural and non-exonic sequence variants not detectable with whole-exome sequencing, and confirmed recent disease associations with the genes PIGG, RNU4ATAC, TRIO, and UNC13A. CONCLUSION: WGS as a primary clinical test provided a higher diagnostic yield than conventional genetic testing in a clinically heterogeneous cohort.
Subjects
Genetic Association Studies , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genetic Testing , Sequence Analysis, DNA , Whole Genome Sequencing , Computational Biology/methods , DNA Copy Number Variations , Exome , Female , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Testing/methods , Genetic Testing/standards , Genetic Variation , Humans , Male , Molecular Sequence Annotation , Phenotype , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Exome Sequencing/methods , Exome Sequencing/standards , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards
ABSTRACT
Ascertaining a diagnosis through exome sequencing can provide potential benefits to patients, insurance companies, and the healthcare system. Yet, as diagnostic sequencing is increasingly employed, vast amounts of human genetic data are produced that need careful curation. We discuss methods for accurately assessing the clinical validity of gene-disease relationships to interpret new research findings in a clinical context and increase the diagnostic rate. The specifics of a gene-disease scoring system adapted for use in a clinical laboratory are described. In turn, clinical validity scoring of gene-disease relationships can inform exome reporting for the identification of new or the upgrade of previous, clinically relevant gene findings. Our retrospective analysis of all reclassification reports from the first 4 years of diagnostic exome sequencing showed that 78% were due to new gene-disease discoveries published in the literature. Among all exome positive/likely positive findings in characterized genes, 32% were in genetic etiologies that were discovered after 2010. Our data underscore the importance and benefits of active and up-to-date curation of a gene-disease database combined with critical clinical validity scoring and proactive reanalysis in the clinical genomics era.
Subjects
Exome , Genetic Association Studies/methods , Genomics/methods , Genetic Association Studies/standards , Genomics/standards , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results , Sequence Analysis, DNA
ABSTRACT
Microdeletions of the Y chromosome (YCMs), Klinefelter syndrome (47,XXY), and CFTR mutations are known genetic causes of severe male infertility, but the majority of cases remain idiopathic. Here, we describe a novel method using single molecule Molecular Inversion Probes (smMIPs), to screen infertile men for mutations and copy number variations affecting known disease genes. We designed a set of 4,525 smMIPs targeting the coding regions of causal (n = 6) and candidate (n = 101) male infertility genes. After extensive validation, we screened 1,112 idiopathic infertile men with non-obstructive azoospermia or severe oligozoospermia. In addition to five chromosome YCMs and six other sex chromosomal anomalies, we identified five patients with rare recessive mutations in CFTR as well as a patient with a rare heterozygous frameshift mutation in SYCP3 that may be of clinical relevance. This results in a genetic diagnosis in 11-17 patients (1%-1.5%), a yield that may increase significantly when more genes are confidently linked to male infertility. In conclusion, we developed a flexible and scalable method to reliably detect genetic causes of male infertility. The assay consolidates the detection of different types of genetic variation while increasing the diagnostic yield and detection precision at the same or lower price compared with currently used methods.
Subjects
Azoospermia/diagnosis , Azoospermia/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Genetic Testing , Oligospermia/diagnosis , Oligospermia/genetics , Chromosome Aberrations , Computational Biology/methods , DNA Copy Number Variations , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Testing/methods , Genetic Testing/standards , High-Throughput Nucleotide Sequencing , Humans , Male , Mutation , Phenotype , Reproducibility of Results , Severity of Illness Index , Sex Chromosome Aberrations , Sperm Count
ABSTRACT
The unprecedented efficiency of the CRISPR/Cas9 system in genome engineering has opened the prospect of employing mutant founders for phenotyping cohorts, thus accelerating research projects by circumventing the requirement to generate cohorts using conventional two- or three-generation crosses. However, these first-generation mutants are often genetic mosaics, with a complex and difficult to define genetic make-up. Here, we discuss the potential benefits, challenges and scientific validity of such models.
Subjects
Gene Editing , Genome , Mutation , Phenotype , Animals , CRISPR-Cas Systems , Chimerism , Gene Editing/methods , Gene Editing/standards , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Engineering/methods , Genetic Engineering/standards , Mosaicism
ABSTRACT
Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes.
Subjects
Genetic Association Studies/methods , Models, Genetic , Genetic Association Studies/standards , Genotype , Humans , Phenotype , Quantitative Trait, Heritable , Sample Size
ABSTRACT
BACKGROUND/AIMS: Genome-wide association studies (GWAS) have identified many variants that each affect multiple phenotypes, which suggests that pleiotropic effects on human complex phenotypes may be widespread. Therefore, statistical methods that can jointly analyze multiple phenotypes in GWAS may have advantages over analyzing each phenotype individually. Several statistical methods have been developed to utilize such multivariate phenotypes in genetic association studies; however, the performance of these methods under different scenarios is largely unknown. Our goal was to provide researchers with useful guidelines on selecting statistical methods for real-data applications involving multiple phenotypes. METHODS: In this study, we evaluated the performance of some of the existing methods for association studies using multiple phenotypes. These methods included the O'Brien method (OB), cross-validation method (CV), optimal weight method (OW), Trait-based Association Test that uses Extended Simes procedure (TATES), principal components of heritability (PCH), canonical correlation analysis (CCA), multivariate analysis of variance (MANOVA), and a joint model of multiple phenotypes (MultiPhen). We used simulation studies to compare the powers of these methods under a variety of scenarios, including different numbers of phenotypes, different values of between-phenotype correlation, different minor allele frequencies, and different mean and variance models. RESULTS AND CONCLUSION: Our simulation results show that there is no single method with consistently good performance among all the scenarios. Each method has its own advantages and disadvantages.
Subjects
Genetic Association Studies/methods , Models, Genetic , Phenotype , Computer Simulation , Data Interpretation, Statistical , Gene Frequency , Genetic Association Studies/standards , Humans
ABSTRACT
A correct, harmonized statistical re-analysis of the data published in this Journal by I.V. Polyakova et al. (2014) clearly shows that, contrary to the authors' opinion, the genotype distributions among residents of besieged Leningrad and residents of the North-West region of Russia are statistically indistinguishable in all five genes studied. The main causes of the authors' erroneous conclusions are neglect of the multiple-comparisons problem and the fundamental impossibility of sampling an adequate control group. A scheme for harmonized statistical analysis of such data is presented. It comprises not only frequentist but also Bayesian point and interval estimates for genotype proportions and their differences, for the fixation index (coefficient of inbreeding) F_IS, for the effect size φ based on the χ² statistic (contingency coefficient), and for the achieved power (1 − β), as well as estimates of the posterior probability of the null hypothesis P(H_0 | D), Bayes factors BF_01, observed p-values (p_obs) with prediction intervals, and p-values adjusted for the multiplicity of null hypotheses tested (P_S).
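The effect of adjusting for multiplicity across five genes can be shown with a small sketch. The abstract does not define how P_S is computed, so the Šidák formula below is an assumption, the Bonferroni adjustment is shown only for comparison, and the raw p-values are invented for illustration rather than taken from the re-analysis.

```python
def sidak_adjust(p_values):
    """Sidak adjustment: probability of at least one p value this small
    among m independent true null hypotheses."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def bonferroni_adjust(p_values):
    """Bonferroni adjustment, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values, one per gene (not from the cited data).
raw_p = [0.02, 0.04, 0.20, 0.45, 0.70]
sidak_p = sidak_adjust(raw_p)
bonferroni_p = bonferroni_adjust(raw_p)

for raw, s, b in zip(raw_p, sidak_p, bonferroni_p):
    print(f"raw {raw:.2f} -> Sidak {s:.4f}, Bonferroni {b:.4f}")
```

In this made-up example, two raw p-values fall below 0.05, yet none remains below 0.05 after adjustment for five tests, which is the kind of reversal the re-analysis describes.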
Subjects
Genetic Association Studies , Pyridoxal Phosphate/analogs & derivatives , Uncoupling Protein 1/genetics , Bayes Theorem , Data Interpretation, Statistical , Genetic Association Studies/methods , Genetic Association Studies/standards , Genetic Association Studies/statistics & numerical data , Humans , Models, Statistical , Pyridoxal Phosphate/genetics , Russia , Survivors/statistics & numerical data , World War II
ABSTRACT
MOTIVATION: Next-generation sequencing and other high-throughput technology advances have promoted great interest in detecting associations between complex traits and genetic variants. Phenotype selection, quality control (QC) and control of confounders are crucial and can have a great impact on the ability to detect associations. Although there are programs to perform association analyses, e.g. PLINK and GenABEL, they cannot be used for comprehensive management and QC of phenotype data. To address this need, PhenoMan was developed. It can select individuals based on multiple phenotype criteria or population membership; control for missing covariate data; remove related individuals, duplicate samples and individuals with incorrect sex specification; recode primary traits and covariates; transform data; remove or winsorize outliers; select covariates for analysis; and create residuals. To ensure consistency and harmonization between analyses, a report is generated for every dataset. Summary statistics are also provided in graphical or text format. PhenoMan can be used for selection and manipulation of quantitative, disease and control data. SUMMARY: PhenoMan is freeware that provides approaches for efficient exploration and management of phenotype data. Proper QC of phenotypes before proceeding to the association analysis is critical to ensure control of type I and II errors, reliable effect estimates and consistent results between studies. PhenoMan is highly beneficial for the preparation of qualitative and quantitative trait data for association studies using new datasets as well as those obtained from public repositories. AVAILABILITY AND IMPLEMENTATION: code.google.com/p/phenoman
Subjects
Genetic Association Studies/methods , Genetic Variation , Phenotype , Software , Genetic Association Studies/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Quality Control , Quantitative Trait, Heritable
ABSTRACT
BACKGROUND: Advances in genomics technology have led to a dramatic increase in the number of published genetic association studies. Systematic reviews and meta-analyses are a common method of synthesizing findings and providing reliable estimates of the effect of a genetic variant on a trait of interest. However, summary estimates are subject to bias due to the varying methodological quality of individual studies. We embarked on an effort to develop and evaluate a tool that assesses the quality of published genetic association studies. Performance characteristics (i.e. validity, reliability, and item discrimination) were evaluated using a sample of thirty studies randomly selected from a previously conducted systematic review. RESULTS: The tool demonstrates excellent psychometric properties and generates a quality score for each study with corresponding ratings of 'low', 'moderate', or 'high' quality. We applied our tool to a published systematic review to exclude studies of low quality, and found a decrease in heterogeneity and an increase in precision of summary estimates. CONCLUSION: This tool can be used in systematic reviews to inform the selection of studies for inclusion, to conduct sensitivity analyses, and to perform meta-regressions.