ABSTRACT
To evaluate whether germline variants in genes encoding pancreatic secretory enzymes contribute to pancreatic cancer susceptibility, we sequenced the coding regions of CPB1 and other genes encoding pancreatic secretory enzymes and known pancreatitis susceptibility genes (PRSS1, CPA1, CTRC, and SPINK1) in a hospital series of pancreatic cancer cases and controls. Variants in CPB1, CPA1 (encoding carboxypeptidase B1 and A1), and CTRC were evaluated in a second set of cases with familial pancreatic cancer and controls. More deleterious CPB1 variants, defined as having impaired protein secretion and induction of endoplasmic reticulum (ER) stress in transfected HEK 293T cells, were found in the hospital series of pancreatic cancer cases (5/986, 0.5%) than in controls (0/1,045, P = 0.027). Among familial pancreatic cancer cases, ER stress-inducing CPB1 variants were found in 4 of 593 (0.67%) vs. 0 of 967 additional controls (P = 0.020), with a combined prevalence in pancreatic cancer cases of 9/1,579 vs. 0/2,012 controls (P < 0.01). More ER stress-inducing CPA1 variants were also found in the combined set of hospital and familial cases with pancreatic cancer than in controls [7/1,546 vs. 1/2,012; P = 0.025; odds ratio, 9.36 (95% CI, 1.15-76.02)]. Overall, 16 (1%) of 1,579 pancreatic cancer cases had an ER stress-inducing CPA1 or CPB1 variant, compared with 1 of 2,068 controls (P < 0.00001). No other candidate genes had statistically significant differences in variant prevalence between cases and controls. Our study indicates that ER stress-inducing variants in CPB1 and CPA1 are associated with pancreatic cancer susceptibility and implicates ER stress in pancreatic acinar cells in pancreatic cancer development.
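The case-control comparisons in this abstract are tests on 2x2 tables of carrier counts. The abstract does not name the test that was used, so the following is only a minimal sketch that assumes a two-sided Fisher's exact test, written with the Python standard library. The function name `fisher_exact_two_sided` is illustrative and not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: carriers and non-carriers among cases
    c, d: carriers and non-carriers among controls
    Assumption: the two-sided p-value sums the probabilities of all tables
    (with the same margins) that are no more likely than the observed one.
    """
    cases = a + b
    controls = c + d
    carriers = a + c
    denom = comb(cases + controls, carriers)

    def prob(x):
        # Hypergeometric probability of x carriers falling among the cases.
        return comb(cases, x) * comb(controls, carriers - x) / denom

    p_obs = prob(a)
    lo = max(0, carriers - controls)
    hi = min(carriers, cases)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hospital series counts from the abstract: 5 of 986 cases, 0 of 1,045 controls.
p_hospital = fisher_exact_two_sided(5, 986 - 5, 0, 1045)
print(f"hospital series: p = {p_hospital:.3f}")
```

Under these assumptions the hospital-series counts give a value of roughly 0.027, in line with the P value quoted in the abstract; the other comparisons may have been computed differently.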
Subjects
Carboxypeptidase B , Carboxypeptidases A , Endoplasmic Reticulum Stress/genetics , Genetic Predisposition to Disease , Mutation , Neoplasm Proteins , Pancreatic Neoplasms , Aged , Aged, 80 and over , Carboxypeptidase B/genetics , Carboxypeptidase B/metabolism , Carboxypeptidases A/genetics , Carboxypeptidases A/metabolism , Cell Line, Tumor , Female , Humans , Male , Middle Aged , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Pancreatic Neoplasms/enzymology , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/pathology
ABSTRACT
In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limited power of classical single-marker association analysis for rare variants has been a challenge in such studies. A new generation of statistical methods for case-control association studies has been developed to meet this challenge. A common approach to association analysis of rare variants is to use burden-style collapsing methods, which combine rare variant data within individuals across or within genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets with nominal p-values < 0.05, of which one, the MAPK signaling pathway (KEGG), reaches trend-level significance after correcting for multiple testing.
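To make the burden-style collapsing idea concrete, here is a minimal sketch of the collapsing step followed by a simple permutation test. It illustrates only the generic burden component, not the hybrid likelihood model proposed in this abstract, which also tests the position distribution of variants. The function names, the toy genotype matrix, and the choice of a one-sided permutation test are all assumptions made for illustration.

```python
import random


def collapse_carriers(genotypes):
    """Collapse rare-variant genotypes into one carrier flag per individual.

    genotypes: one row per individual, holding minor-allele counts (0, 1, 2)
    at each rare variant in a gene or gene set.
    """
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def burden_permutation_test(carrier_flags, is_case, n_perm=10000, seed=0):
    """One-sided permutation p-value for an excess of carriers among cases."""
    rng = random.Random(seed)
    observed = sum(f for f, case in zip(carrier_flags, is_case) if case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between status and genotype
        permuted = sum(f for f, case in zip(carrier_flags, labels) if case)
        if permuted >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 3 cases carrying a rare variant, 3 controls without.
genotypes = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
is_case = [True, True, True, False, False, False]

flags = collapse_carriers(genotypes)
observed, p_value = burden_permutation_test(flags, is_case, n_perm=2000)
print(flags, observed, p_value)
```

A pure collapsing test like this loses power when protective and risk variants occur in the same gene, which is one of the situations the abstract says the hybrid model is designed to handle.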
Subjects
Genetic Association Studies , Mitogen-Activated Protein Kinase Kinases , Models, Genetic , Signal Transduction/genetics , Case-Control Studies , Computer Simulation , Exome , Genome, Human , Humans , Likelihood Functions , Mitogen-Activated Protein Kinase Kinases/genetics , Mitogen-Activated Protein Kinase Kinases/metabolism , Models, Theoretical , Polymorphism, Single Nucleotide , Probability
ABSTRACT
BACKGROUND: The processing and analysis of the large-scale data generated by next-generation sequencing (NGS) experiments is challenging and is a burgeoning area of new methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calling of these tools and compare their relative accuracy to determine which data processing pipeline is optimal. RESULTS: We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing on a total of 700 variants or array genotyping data on a total of 9,935 single-nucleotide polymorphisms. A head-to-head comparison showed that the Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved to be crucial to accurate variant calling. The GATK HaplotypeCaller algorithm for variant calling outperformed the UnifiedGenotyper algorithm. We also showed a relationship between mapping quality, read depth and allele balance, and SNV call accuracy. However, if best practices are used in data processing, then additional filtering based on these metrics provides little gain and accuracies of >99% are achievable. CONCLUSIONS: Our findings will help to determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our code is freely available at http://metamoodics.org/wes.
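The accuracy figures in this abstract are positive predictive values (PPV), the fraction of called variants that the reference method confirms. The sketch below shows that calculation on a hypothetical toy call set; the function name, the `(chromosome, position, alternate allele)` tuple layout, and the example data are assumptions for illustration and do not reproduce the study's pipeline.

```python
def positive_predictive_value(ngs_calls, confirmed):
    """Return PPV = TP / (TP + FP) for a set of variant calls.

    ngs_calls: set of (chrom, pos, alt) tuples called from NGS data at sites
               that were also assayed by the reference method.
    confirmed: set of (chrom, pos, alt) tuples the reference method confirmed.
    """
    true_pos = len(ngs_calls & confirmed)
    false_pos = len(ngs_calls - confirmed)
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no NGS calls to evaluate")
    return true_pos / total


# Hypothetical example: four NGS calls, three confirmed by the reference method.
ngs_calls = {
    ("chr1", 100, "A"),
    ("chr1", 200, "T"),
    ("chr2", 50, "G"),
    ("chr3", 10, "C"),
}
confirmed = {
    ("chr1", 100, "A"),
    ("chr1", 200, "T"),
    ("chr2", 50, "G"),
    ("chr4", 75, "T"),  # missed by NGS; affects sensitivity, not PPV
}

ppv = positive_predictive_value(ngs_calls, confirmed)
print(f"PPV = {ppv:.2%}")
```

PPV ignores variants that the caller missed, so a full comparison of callers would report sensitivity alongside it.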
Subjects
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Bipolar Disorder/genetics , Data Interpretation, Statistical , Exome , Humans , Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Although nearly half of the human genome is composed of repetitive sequences, the expression profile of these elements remains largely uncharacterized. Recently developed high-throughput sequencing technologies provide us with a powerful new set of tools to study repeat elements. Hence, we performed whole transcriptome sequencing to investigate the expression of repetitive elements in human frontal cortex using postmortem tissue obtained from the Stanley Medical Research Institute. RESULTS: We found that a significant number of reads from the human frontal cortex originate from repeat elements. We also noticed that Alu elements were expressed at levels higher than expected by random or background transcription. In contrast, L1 elements were expressed at lower than expected levels. CONCLUSIONS: Repetitive elements are expressed abundantly in the human brain. This expression pattern appears to be element specific and cannot be explained by random or background transcription. These results demonstrate that our knowledge about repetitive elements is far from complete. Further characterization is required to determine the mechanism, the control, and the effects of repeat element expression.
Subjects
Brain/metabolism , Repetitive Sequences, Nucleic Acid/genetics , Alu Elements/genetics , Humans , Long Interspersed Nucleotide Elements/genetics
ABSTRACT
Suicidal behavior is a complex and devastating phenotype with a heritable component that has not been fully explained by existing common genetic variant analyses. This study represents the first large-scale DNA sequencing project designed to assess the role of rare functional genetic variation in suicidal behavior risk. To accomplish this, whole-exome sequencing data for ~19,000 genes were generated for 387 bipolar disorder subjects with a history of suicide attempt and 631 bipolar disorder subjects with no prior suicide attempts. Rare functional variants were assessed in all exome genes as well as pathways hypothesized to contribute to suicidal behavior risk. No result survived conservative Bonferroni correction, though many suggestive findings have arisen that merit additional attention. In addition, nominal support for past associations in genes, such as BDNF, and pathways, such as the hypothalamic-pituitary-adrenal axis, was also observed. Finally, a novel pathway was identified that is driven by aldehyde dehydrogenase genes. Ultimately, this investigation explores variation left largely untouched by existing efforts in suicidal behavior, providing a wealth of novel information to add to future investigations, such as meta-analyses.
ABSTRACT
IMPORTANCE: Complex disorders, such as bipolar disorder (BD), likely result from the influence of both common and rare susceptibility alleles. While common variation has been widely studied, rare variant discovery has only recently become feasible with next-generation sequencing. OBJECTIVE: To utilize a combined family-based and case-control approach to exome sequencing in BD using multiplex families as an initial discovery strategy, followed by association testing in a large case-control meta-analysis. DESIGN, SETTING, AND PARTICIPANTS: We performed exome sequencing of 36 affected members with BD from 8 multiplex families and tested rare, segregating variants in 3 independent case-control samples consisting of 3541 BD cases and 4774 controls. MAIN OUTCOMES AND MEASURES: We used penalized logistic regression and 1-sided gene-burden analyses to test for association of rare, segregating damaging variants with BD. Permutation-based analyses were performed to test for overall enrichment with previously identified gene sets. RESULTS: We found 84 rare (frequency <1%), segregating variants that were bioinformatically predicted to be damaging. These variants were found in 82 genes that were enriched for gene sets previously identified in de novo studies of autism (19 observed vs. 10.9 expected, P = .0066) and schizophrenia (11 observed vs. 5.1 expected, P = .0062) and for targets of the fragile X mental retardation protein (FMRP) pathway (10 observed vs. 4.4 expected, P = .0076). The case-control meta-analyses yielded 19 genes that were nominally associated with BD based either on individual variants or a gene-burden approach. Although no gene was individually significant after correction for multiple testing, this group of genes continued to show evidence for significant enrichment of de novo autism genes (6 observed vs 2.6 expected, P = .028). 
CONCLUSIONS AND RELEVANCE: Our results are consistent with the presence of prominent locus and allelic heterogeneity in BD and suggest that very large samples will be required to definitively identify individual rare variants or genes conferring risk for this disorder. However, we also identify significant associations with gene sets composed of previously discovered de novo variants in autism and schizophrenia, as well as targets of the FMRP pathway, providing preliminary support for the overlap of potential autism and schizophrenia risk genes with rare, segregating variants in families with BD.
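The "observed vs. expected" gene-set figures reported in this abstract come from permutation-based enrichment tests. The abstract does not describe the permutation scheme, so the following is a simplified sketch that assumes genes are drawn uniformly at random from a common universe; real analyses often also match on properties such as gene length. All names and the toy gene lists are hypothetical.

```python
import random


def enrichment_by_permutation(universe, candidates, reference, n_perm=10000, seed=0):
    """Estimate enrichment of a candidate gene list for a reference gene set.

    Returns (observed overlap, mean overlap of random draws, empirical p-value).
    Assumes candidates and reference are both subsets of universe.
    """
    rng = random.Random(seed)
    pool = sorted(universe)  # fixed order keeps the draws reproducible
    reference = set(reference)
    candidates = set(candidates)
    observed = len(candidates & reference)
    hits = 0
    total = 0
    for _ in range(n_perm):
        draw = rng.sample(pool, len(candidates))
        overlap = sum(1 for gene in draw if gene in reference)
        total += overlap
        if overlap >= observed:
            hits += 1
    expected = total / n_perm
    return observed, expected, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 1,000 genes, a 100-gene reference set, and a
# 40-gene candidate list that shares 10 genes with the reference set.
universe = [f"G{i}" for i in range(1000)]
reference = universe[:100]
candidates = universe[:10] + universe[500:530]

observed, expected, p_value = enrichment_by_permutation(
    universe, candidates, reference, n_perm=2000
)
print(observed, round(expected, 2), p_value)
```

With uniform sampling the expected overlap is close to the candidate list size times the reference fraction of the universe, here 40 x 100 / 1000 = 4.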
Subjects
Bipolar Disorder/genetics , Exome/genetics , Sequence Analysis, DNA , Alleles , Autistic Disorder/genetics , Autistic Disorder/psychology , Bipolar Disorder/diagnosis , Bipolar Disorder/psychology , Case-Control Studies , Fragile X Mental Retardation Protein/genetics , Genetic Heterogeneity , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Humans , Schizophrenia/genetics , Schizophrenic Psychology
ABSTRACT
UNLABELLED: Pancreatic cancer is projected to become the second leading cause of cancer-related death in the United States by 2020. A familial aggregation of pancreatic cancer has been established, but the cause of this aggregation in most families is unknown. To determine the genetic basis of susceptibility in these families, we sequenced the germline genomes of 638 patients with familial pancreatic cancer and the tumor exomes of 39 familial pancreatic adenocarcinomas. Our analyses support the role of previously identified familial pancreatic cancer susceptibility genes such as BRCA2, CDKN2A, and ATM, and identify novel candidate genes harboring rare, deleterious germline variants for further characterization. We also show how somatic point mutations that occur during hematopoiesis can affect the interpretation of genome-wide studies of hereditary traits. Our observations have important implications for the etiology of pancreatic cancer and for the identification of susceptibility genes in other common cancer types. SIGNIFICANCE: The genetic basis of disease susceptibility in the majority of patients with familial pancreatic cancer is unknown. We whole genome sequenced 638 patients with familial pancreatic cancer and demonstrate that the genetic underpinning of inherited pancreatic cancer is highly heterogeneous. This has significant implications for the management of patients with familial pancreatic cancer.
Subjects
Carcinoma/genetics , Genetic Predisposition to Disease , Germ-Line Mutation , Pancreatic Neoplasms/genetics , Sequence Analysis, DNA/methods , Ataxia Telangiectasia Mutated Proteins/genetics , BRCA2 Protein/genetics , Cyclin-Dependent Kinase Inhibitor p16/genetics , Humans , Point Mutation
ABSTRACT
Exome sequencing of 343 families, each with a single child on the autism spectrum and at least one unaffected sibling, reveals de novo small indels and point substitutions, which come mostly from the paternal line in an age-dependent manner. We do not see significantly greater numbers of de novo missense mutations in affected versus unaffected children, but gene-disrupting mutations (nonsense, splice site, and frame shifts) are twice as frequent, 59 to 28. Based on this differential and the number of recurrent and total targets of gene disruption found in our and similar studies, we estimate between 350 and 400 autism susceptibility genes. Many of the disrupted genes in these studies are associated with the fragile X protein, FMRP, reinforcing links between autism and synaptic plasticity. We find FMRP-associated genes are under greater purifying selection than the remainder of genes and suggest they are especially dosage-sensitive targets of cognitive disorders.
Subjects
Child Development Disorders, Pervasive/genetics , Fragile X Mental Retardation Protein/genetics , Genetic Predisposition to Disease , Mutation/genetics , Child , Child Development Disorders, Pervasive/etiology , Child, Preschool , Family Health , Female , Gene Dosage , Genetic Association Studies , Humans , Male , Models, Molecular , Parents , Phenotype
ABSTRACT
BACKGROUND: Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as well as systematically identify variables that should be considered in the use of those data. RESULTS: Each exome kit performed well at capturing the targets it was designed to capture, which mainly correspond to the consensus coding sequences (CCDS) annotations of the human genome. In addition, based on their respective targets, each capture kit coupled with high-coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and so, not surprisingly, the exome kits did not capture these additional regions. CONCLUSIONS: Commercial exome capture kits provide a very efficient way to sequence select areas of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.