ABSTRACT
Although there are many methods available for inferring copy-number variants (CNVs) from next-generation sequence data, there remains a need for a system that is computationally efficient but that retains good sensitivity and specificity across all types of CNVs. Here, we introduce a new method, estimation by read depth with single-nucleotide variants (ERDS), and use various approaches to compare its performance to other methods. We found that for common CNVs and high-coverage genomes, ERDS performs as well as the best method currently available (Genome STRiP), whereas for rare CNVs and high-coverage genomes, ERDS performs better than any available method. Importantly, ERDS accommodates both unique and highly amplified regions of the genome and does so without requiring separate alignments for calling CNVs and other variants. These comparisons show that for genomes sequenced at high coverage, ERDS provides a computationally convenient method that calls CNVs as well as or better than any currently available method.
Subject(s)
DNA Copy Number Variations , Genome, Human , Sequence Analysis, DNA/methods , Algorithms , Gene Deletion , Genotyping Techniques , Humans , Validation Studies as Topic
ABSTRACT
To date, the widely used genome-wide association studies (GWASs) of the human genome have reported thousands of variants that are significantly associated with various human traits. However, in the vast majority of these cases, the causal variants responsible for the observed associations remain unknown. In order to facilitate the identification of causal variants, we designed a simple computational method called the "preferential linkage disequilibrium (LD)" approach, which follows the variants discovered by GWASs to pinpoint the causal variants, even if they are rare compared with the discovery variants. The approach is based on the hypothesis that the GWAS-discovered variant is better at tagging the causal variants than are most other variants evaluated in the original GWAS. Applying the preferential LD approach to the GWAS signals of five human traits for which the causal variants are already known, we successfully placed the known causal variants among the top ten candidates in the majority of these cases. Application of this method to additional GWASs, including those of hepatitis C virus treatment response, plasma levels of clotting factors, and late-onset Alzheimer disease, has led to the identification of a number of promising candidate causal variants. This method represents a useful tool for delineating causal variants by bringing together GWAS signals and the rapidly accumulating variant data from next-generation sequencing.
Subject(s)
Genome-Wide Association Study , Linkage Disequilibrium , Computational Biology/methods , Gene Frequency , Genetic Predisposition to Disease , Genome, Human , Humans , Polymorphism, Single Nucleotide
ABSTRACT
Schizophrenia is a severe psychiatric disorder with strong heritability and marked heterogeneity in symptoms, course, and treatment response. There is strong interest in identifying genetic risk factors that can help to elucidate the pathophysiology and that might result in the development of improved treatments. Linkage and genome-wide association studies (GWASs) suggest that the genetic basis of schizophrenia is heterogeneous. However, it remains unclear whether the underlying genetic variants are mostly moderately rare and can be identified by the genotyping of variants observed in sequenced cases in large follow-up cohorts or whether they will typically be much rarer and therefore more effectively identified by gene-based methods that seek to combine candidate variants. Here, we consider 166 persons who have schizophrenia or schizoaffective disorder and who have had either their genomes or their exomes sequenced to high coverage. From these data, we selected 5,155 variants that were further evaluated in an independent cohort of 2,617 cases and 1,800 controls. No single variant showed a study-wide significant association in the initial or follow-up cohorts. However, we identified a number of case-specific variants, some of which might be real risk factors for schizophrenia, and these can be readily interrogated in other data sets. Our results indicate that schizophrenia risk is unlikely to be predominantly influenced by variants just outside the range detectable by GWASs. Rather, multiple rarer genetic variants must contribute substantially to the predisposition to schizophrenia, suggesting that both very large sample sizes and gene-based association tests will be required for securely identifying genetic risk factors.
Subject(s)
Exome/genetics , Genetic Predisposition to Disease/genetics , Schizophrenia/genetics , Base Sequence , Finland , Genome-Wide Association Study , Genotype , Humans , Molecular Sequence Data , Risk Factors , Sequence Alignment , Sequence Analysis, DNA , United States
ABSTRACT
Idiopathic generalized epilepsy (IGE) is a complex disease with high heritability, but little is known about its genetic architecture. Rare copy-number variants have been found to explain nearly 3% of individuals with IGE; however, it remains unclear whether variants with moderate effect size and frequencies below what are reliably detected with genome-wide association studies contribute significantly to disease risk. In this study, we compare the exome sequences of 118 individuals with IGE and 242 controls of European ancestry by using next-generation sequencing. The exome-sequenced epilepsy cases include study subjects with two forms of IGE, including juvenile myoclonic epilepsy (n = 93) and absence epilepsy (n = 25). However, our discovery strategy did not assume common genetic control between the subtypes of IGE considered. In the sequence data, as expected, no variants were significantly associated with the IGE phenotype or more specific IGE diagnoses. We then selected 3,897 candidate epilepsy-susceptibility variants from the sequence data and genotyped them in a larger set of 878 individuals with IGE and 1,830 controls. Again, no variant achieved statistical significance. However, 1,935 variants were observed exclusively in cases either as heterozygous or homozygous genotypes. It is likely that this set of variants includes real risk factors. The lack of significant association evidence of single variants with disease in this two-stage approach emphasizes the high genetic heterogeneity of epilepsy disorders, suggests that the impact of any individual single-nucleotide variant in this disease is small, and indicates that gene-based approaches might be more successful for future sequencing studies of epilepsy predisposition.
Subject(s)
Epilepsy, Generalized/genetics , Exome/genetics , Genetic Predisposition to Disease/genetics , Base Sequence , Genome-Wide Association Study , Genotype , Humans , Molecular Sequence Data , Sequence Alignment , Sequence Analysis, DNA , White People/genetics
ABSTRACT
One of the longest running debates in evolutionary biology concerns the kind of genetic variation that is primarily responsible for phenotypic variation in species. Here, we address this question for humans specifically from the perspective of population allele frequency of variants across the complete genome, including both coding and noncoding regions. We establish simple criteria to assess the likelihood that variants are functional based on their genomic locations and then use whole-genome sequence data from 29 subjects of European origin to assess the relationship between the functional properties of variants and their population allele frequencies. We find that for all criteria used to assess the likelihood that a variant is functional, the rarer variants are significantly more likely to be functional than the more common variants. Strikingly, these patterns disappear when we focus on only those variants in which the major alleles are derived. These analyses indicate that the majority of the genetic variation in terms of phenotypic consequence may result from a mutation-selection balance, as opposed to balancing selection, and have direct relevance to the study of human disease.
Subject(s)
Genetic Variation , Alleles , Conserved Sequence , Evolution, Molecular , Gene Frequency , Genes, Regulator , Genome, Human , Genome-Wide Association Study , Humans , Models, Genetic , Mutation , Phenotype , Polymorphism, Single Nucleotide , Selection, Genetic , White People/genetics
ABSTRACT
Although many methods are available to test sequence variants for association with complex diseases and traits, methods that specifically seek to identify causal variants are less developed. Here we develop and evaluate a Bayesian hierarchical regression method that incorporates prior information on the likelihood of variant causality through weighting of variant effects. By simulation studies using both simulated and real sequence variants, we compared a standard single variant test for analyzing variant-disease association with the proposed method using different weighting schemes. We found that by leveraging linkage disequilibrium of variants with known GWAS signals and sequence conservation (phastCons), the proposed method provides a powerful approach for detecting causal variants while controlling false positives.
Subject(s)
Causality , Regression Analysis , Exome , Genome-Wide Association Study , Genotype , Models, Theoretical
ABSTRACT
Deletions at 16p13.11 are associated with schizophrenia, mental retardation, and most recently idiopathic generalized epilepsy. To evaluate the role of 16p13.11 deletions, as well as other structural variation, in epilepsy disorders, we used genome-wide screens to identify copy number variation in 3812 patients with a diverse spectrum of epilepsy syndromes and in 1299 neurologically-normal controls. Large deletions (> 100 kb) at 16p13.11 were observed in 23 patients, whereas no control had a deletion greater than 16 kb. Patients, even those with identically sized 16p13.11 deletions, presented with highly variable epilepsy phenotypes. For a subset of patients with a 16p13.11 deletion, we show a consistent reduction of expression for included genes, suggesting that haploinsufficiency might contribute to pathogenicity. We also investigated another possible mechanism of pathogenicity by using hybridization-based capture and next-generation sequencing of the homologous chromosome for ten 16p13.11-deletion patients to look for unmasked recessive mutations. Follow-up genotyping of suggestive polymorphisms failed to identify any convincing recessive-acting mutations in the homologous interval corresponding to the deletion. The observation that two of the 16p13.11 deletions were larger than 2 Mb in size led us to screen for other large deletions. We found 12 additional genomic regions harboring deletions > 2 Mb in epilepsy patients, and none in controls. Additional evaluation is needed to characterize the role of these exceedingly large, non-locus-specific deletions in epilepsy. Collectively, these data implicate 16p13.11 and possibly other large deletions as risk factors for a wide range of epilepsy disorders, and they appear to point toward haploinsufficiency as a contributor to the pathogenicity of deletions.
Subject(s)
Chromosomes, Human, Pair 16 , Disease Susceptibility , Epilepsy/genetics , Mutation , Sequence Deletion , Humans , Nucleic Acid Hybridization/genetics , Syndrome
ABSTRACT
A fundamental goal of systems biology is to identify genetic elements that contribute to complex phenotypes and to understand how they interact in networks predictive of system response to genetic variation. Few studies in plants have developed such networks, and none have examined their conservation among functionally specialized organs. Here we used genetical genomics in an interspecific hybrid population of the model hardwood plant Populus to uncover transcriptional networks in xylem, leaves, and roots. Pleiotropic eQTL hotspots were detected and used to construct coexpression networks a posteriori, for which regulators were predicted based on cis-acting expression regulation. Networks were shown to be enriched for groups of genes that function in biologically coherent processes and for cis-acting promoter motifs with known roles in regulating common groups of genes. When contrasted among xylem, leaves, and roots, transcriptional networks were frequently conserved in composition, but almost invariably regulated by different loci. Similarly, the genetic architecture of gene expression regulation is highly diversified among plant organs, with less than one-third of genes with eQTL detected in two organs being regulated by the same locus. However, colocalization in eQTL position increases to 50% when they are detected in all three organs, suggesting conservation in the genetic regulation is a function of ubiquitous expression. Genes conserved in their genetic regulation among all organs are primarily cis regulated (approximately 92%), whereas genes with eQTL in only one organ are largely trans regulated. Trans-acting regulation may therefore be the primary driver of differentiation in function between plant organs.
Subject(s)
Gene Expression Regulation, Plant , Gene Regulatory Networks , Populus/genetics , Genome-Wide Association Study , Plant Leaves/genetics , Plant Roots/genetics , Quantitative Trait Loci , Xylem/genetics
ABSTRACT
Although more than 2,400 genes have been shown to contain variants that cause Mendelian disease, there are still several thousand such diseases yet to be molecularly defined. The ability of new whole-genome sequencing technologies to rapidly identify most of the genetic variants in any given genome opens an exciting opportunity to identify these disease genes. Here we sequenced the whole genome of a single patient with the dominant Mendelian disease, metachondromatosis (OMIM 156250), and used partial linkage data from her small family to focus our search for the responsible variant. In the proband, we identified an 11 bp deletion in exon 4 of PTPN11, which alters frame, results in premature translation termination, and co-segregates with the phenotype. In a second metachondromatosis family, we confirmed our result by identifying a nonsense mutation in exon 4 of PTPN11 that also co-segregates with the phenotype. Sequencing PTPN11 exon 4 in 469 controls showed no such protein truncating variants, supporting the pathogenicity of these two mutations. This combination of a new technology and a classical genetic approach provides a powerful strategy to discover the genes responsible for unexplained Mendelian disorders.
Subject(s)
Genetic Linkage , Genetic Predisposition to Disease , Genome, Human , Protein Tyrosine Phosphatase, Non-Receptor Type 11/genetics , Exons , Female , Genome-Wide Association Study , Humans , Male , Mutation , Pedigree , Sequence Analysis, DNA
ABSTRACT
We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop-loss variants in genes representing a diverse set of pathways.
Subject(s)
Genome, Human/genetics , Sequence Analysis, DNA , Base Sequence , Case-Control Studies , DNA Copy Number Variations/genetics , Databases, Genetic , Exons/genetics , Factor VIII/genetics , Gene Duplication/genetics , Gene Knockout Techniques , Genetics, Population , Genotype , Hemophilia A/genetics , Humans , INDEL Mutation/genetics , Oligonucleotide Array Sequence Analysis , Open Reading Frames/genetics , Polymorphism, Genetic , Polymorphism, Single Nucleotide/genetics
ABSTRACT
SUMMARY: Here we present Sequence Variant Analyzer (SVA), a software tool that assigns a predicted biological function to variants identified in next-generation sequencing studies and provides a browser to visualize the variants in their genomic contexts. SVA also provides for flexible interaction with software implementing variant association tests allowing users to consider both the bioinformatic annotation of identified variants and the strength of their associations with studied traits. We illustrate the annotation features of SVA using two simple examples of sequenced genomes that harbor Mendelian mutations. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://www.svaproject.org.
Subject(s)
Genome, Human , Software , Audiovisual Aids , Base Sequence , Genomic Structural Variation , Humans , Internet , Sequence Analysis, DNA/methods
ABSTRACT
We sequenced the genomes of ten unrelated individuals and identified heterozygous stop codon-gain variants in protein-coding genes; we then sequenced their transcriptomes and assessed the expression levels of the stop codon-gain alleles. An ANOVA showed statistically significant differences between their expression levels (p = 4 × 10⁻¹⁶). This difference was almost entirely accounted for by whether the stop codon-gain variant had a second, non-protein-truncating function in or near an alternate transcript: stop codon-gains without alternate functions were generally not found in the cDNA (p = 3 × 10⁻⁵). Additionally, stop codon-gain variants in two intronless genes were not expressed, an unexpected outcome given previous studies. In this study, stop codon-gain variants were either well expressed in all individuals or were never expressed. Our finding that stop codon-gain variants were generally expressed only when they had an alternate function suggests that most naturally occurring stop codon-gain variants in protein-coding genes are either not transcribed or have their transcripts destroyed.
Subject(s)
Allelic Imbalance , Codon, Nonsense/genetics , Genome, Human , Analysis of Variance , DNA, Complementary/genetics , Gene Expression Profiling , Gene Expression Regulation , Humans , Leukocytes, Mononuclear/cytology , Polymorphism, Genetic , Sequence Alignment , Transcriptome
ABSTRACT
Certolizumab pegol (CZP) is a PEGylated Fc-free tumor necrosis factor (TNF) inhibitor antibody approved for use in the treatment of rheumatoid arthritis (RA), Crohn's disease, psoriatic arthritis, axial spondyloarthritis and psoriasis. In a clinical trial of patients with severe RA, CZP improved disease symptoms in approximately half of patients. However, variability in CZP efficacy remains a problem for clinicians; the aim of this study was therefore to identify genetic variants predictive of CZP response. We performed a genome-wide association study (GWAS) of 302 RA patients treated with CZP in the REALISTIC trial to identify common single nucleotide polymorphisms (SNPs) associated with treatment response. Whole-exome sequencing was also performed for 74 CZP extreme responders and non-responders within the same population, as well as 1546 population controls. No common SNPs or rare functional variants were significantly associated with CZP response, though a non-significant enrichment in the RA-implicated KCNK5 gene was observed. Two SNPs near spondin-1 and semaphorin-4G approached genome-wide significance. The results of the current study did not provide an unambiguous predictor of CZP response.
Subject(s)
Antirheumatic Agents , Arthritis, Rheumatoid , Antirheumatic Agents/therapeutic use , Arthritis, Rheumatoid/chemically induced , Arthritis, Rheumatoid/drug therapy , Arthritis, Rheumatoid/genetics , Certolizumab Pegol/therapeutic use , Genome-Wide Association Study , Humans , Treatment Outcome , Tumor Necrosis Factor Inhibitors
ABSTRACT
Psychiatric disorders such as schizophrenia are commonly accompanied by cognitive impairments that are treatment resistant and crucial to functional outcome. There has been great interest in studying cognitive measures as endophenotypes for psychiatric disorders, with the hope that their genetic basis will be clearer. To investigate this, we performed a genome-wide association study involving 11 cognitive phenotypes from the Cambridge Neuropsychological Test Automated Battery. We showed these measures to be heritable by comparing the correlation in 100 monozygotic and 100 dizygotic twin pairs. The full battery was tested in approximately 750 subjects, and for spatial and verbal recognition memory, we investigated a further 500 individuals to search for smaller genetic effects. We were unable to find any genome-wide significant associations with either SNPs or common copy number variants. Nor could we formally replicate any polymorphism that has been previously associated with cognition, although we found a weak signal of lower than expected P-values for variants in a set of 10 candidate genes. We additionally investigated SNPs in genomic loci that have been shown to harbor rare variants that associate with neuropsychiatric disorders, to see if they showed any suggestion of association when considered as a separate set. Only NRXN1 showed evidence of significant association with cognition. These results suggest that common genetic variation does not strongly influence cognition in healthy subjects and that cognitive measures do not represent a more tractable genetic trait than clinical endpoints such as schizophrenia. We discuss a possible role for rare variation in cognitive genomics.
Subject(s)
Cognition , DNA Copy Number Variations , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Twins/genetics , Adolescent , Adult , Aged , Calcium-Binding Proteins , Cell Adhesion Molecules, Neuronal , Female , Genetics, Population , Humans , Male , Middle Aged , Nerve Tissue Proteins/genetics , Neural Cell Adhesion Molecules , Neuropsychological Tests , Twins/psychology , Young Adult
ABSTRACT
Numerous genome-wide screens for polymorphisms that influence gene expression have provided key insights into the genetic control of transcription. Despite this work, the relevance of specific polymorphisms to in vivo expression and splicing remains unclear. We carried out the first genome-wide screen, to our knowledge, for SNPs that associate with alternative splicing and gene expression in human primary cells, evaluating 93 autopsy-collected cortical brain tissue samples with no defined neuropsychiatric condition and 80 peripheral blood mononucleated cell samples collected from living healthy donors. We identified 23 high confidence associations with total expression and 80 with alternative splicing as reflected by expression levels of specific exons. Fewer than 50% of the implicated SNPs, however, show effects in both tissue types, reflecting strong evidence for distinct genetic control of splicing and expression in the two tissue types. The data generated here also suggest the possibility that splicing effects may be responsible for up to 13 out of 84 reported genome-wide significant associations with human traits. These results emphasize the importance of establishing a database of polymorphisms affecting splicing and expression in primary tissue types and suggest that splicing effects may be of more phenotypic significance than overall gene expression changes.
Subject(s)
Organ Specificity/genetics , Quantitative Trait, Heritable , RNA Splicing/genetics , Brain/metabolism , Exons/genetics , Genome, Human/genetics , Humans , Leukocytes, Mononuclear/metabolism , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , Quantitative Trait Loci/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Regulatory Sequences, Nucleic Acid/genetics , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
We performed a whole-genome association study of human immunodeficiency virus type 1 (HIV-1) set point among a cohort of African Americans (n = 515), and an intronic single-nucleotide polymorphism (SNP) in the HLA-B gene showed one of the strongest associations. We use a subset of patients to demonstrate that this SNP reflects the effect of the HLA-B*5703 allele, which shows a genome-wide statistically significant association with viral load set point (P = 5.6 × 10⁻¹⁰). These analyses therefore confirm a member of the HLA-B*57 group of alleles as the most important common variant that influences viral load variation in African Americans, which is consistent with what has been observed for individuals of European ancestry, among whom the most important common variant is HLA-B*5701.
Subject(s)
Black or African American/genetics , Genome-Wide Association Study , HIV Infections/genetics , HIV-1/immunology , Adolescent , Adult , DNA-Binding Proteins/genetics , DNA-Binding Proteins/immunology , Disease Progression , Genotype , HIV Infections/immunology , HIV Infections/virology , HLA-B Antigens/genetics , HLA-B Antigens/immunology , HLA-C Antigens/genetics , HLA-C Antigens/immunology , Humans , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Viral Load/genetics , Young Adult
ABSTRACT
OBJECTIVE: The Genetic Absence Epilepsy Rats from Strasbourg (GAERS) are an inbred Wistar rat strain widely used as a model of genetic generalized epilepsy with absence seizures. As in humans, the genetic architecture that results in genetic generalized epilepsy in GAERS is poorly understood. Here we present the strain-specific variants found among the epileptic GAERS and their related Non-Epileptic Control (NEC) strain. The GAERS and NEC represent a powerful opportunity to identify neurobiological factors that are associated with the genetic generalized epilepsy phenotype. METHODS: We performed whole genome sequencing on adult epileptic GAERS and adult NEC rats, a strain derived from the same original Wistar colony. We also generated whole genome sequencing on four double-crossed (GAERS with NEC) F2 progeny selected for high-seizing (n = 2) and non-seizing (n = 2) phenotypes. RESULTS: Specific to the GAERS genome, we identified 1.12 million single nucleotide variants, 296.5K short insertion-deletions, and 354 putative copy number variants that result in complete or partial loss/duplication of 41 genes. Of the GAERS-specific variants that met high quality criteria, 25 are annotated as stop codon gain/loss, 56 as putative essential splice sites, and 56 indels are predicted to result in a frameshift. Subsequent screening against the two F2 progeny sequenced for having the highest and two F2 progeny for having the lowest seizure burden identified only the selected Cacna1h GAERS-private protein-coding variant as exclusively co-segregating with the two high-seizing F2 rats. SIGNIFICANCE: This study highlights an approach for using whole genome sequencing to narrow down to a manageable candidate list of genetic variants in a complex genetic epilepsy animal model, and suggests utility of this sequencing design to investigate other spontaneously occurring animal models of human disease.
Subject(s)
Calcium Channels, T-Type/genetics , Epilepsy, Absence/genetics , Genome , Animals , Brain/diagnostic imaging , Brain/metabolism , Brain/pathology , DNA/chemistry , DNA/isolation & purification , DNA/metabolism , Disease Models, Animal , Electroencephalography , Epilepsy, Absence/pathology , Female , Genotype , High-Throughput Nucleotide Sequencing , Male , Polymorphism, Single Nucleotide , Rats , Rats, Wistar , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high coverage whole-genome sequencing to those identified by high coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.