ABSTRACT
MOTIVATION: High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing be done as efficiently as possible. RESULTS: We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs twofold parallelism: multi-threading and, additionally, SIMD vectorization. Both types of parallelism are used to speed up the computation of pairwise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar based on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. AVAILABILITY AND IMPLEMENTATION: https://github.com/seqan/flexbar. CONTACT: johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de.
Subject(s)
High-Throughput Nucleotide Sequencing/methods; Software; Animals; Caenorhabditis elegans/genetics; Genome, Helminth; Sequence Alignment/methods; Sequence Analysis, DNA/methods
ABSTRACT
Understanding the genetics of speciation and the processes that drive it is a central goal of evolutionary biology. Grasshoppers of the Chorthippus species group differ strongly in calling song (and corresponding female preferences) but are exceedingly similar in other characteristics such as morphology. Here, we performed a population genomic scan on three Chorthippus species (Chorthippus biguttulus, C. mollis and C. brunneus) to gain insight into the genes and processes involved in divergence and speciation in this group. Using an RNA-seq approach, we examined functional variation between the species by calling SNPs for each of the three species pairs and using FST-based approaches to identify outliers. We found approximately 1% of SNPs in each comparison to be outliers. Between 37% and 40% of these outliers were nonsynonymous SNPs (as opposed to a global level of 17%), indicating that we recovered loci under selection. Among the outliers were several genes that may be involved in song production and hearing as well as genes involved in other traits such as food preferences and metabolism. Differences in food preferences between species were confirmed with a behavioural experiment. This indicates that multiple phenotypic differences implicating multiple evolutionary processes (sexual selection and natural selection) are present between the species.
Subject(s)
Genetics, Population/methods; Genome, Insect; Grasshoppers/classification; Animals; Bayes Theorem; Female; Food Preferences; Genomics/methods; Genotype; Grasshoppers/genetics; Male; Phenotype; Polymorphism, Single Nucleotide; Reproductive Isolation; Sequence Analysis, RNA; Species Specificity; Transcriptome
ABSTRACT
More than 800 published genetic association studies have implicated dozens of potential risk loci in Parkinson's disease (PD). To facilitate the interpretation of these findings, we have created a dedicated online resource, PDGene, that comprehensively collects and meta-analyzes all published studies in the field. A systematic literature screen of ~27,000 articles yielded 828 eligible articles from which relevant data were extracted. In addition, individual-level data from three publicly available genome-wide association studies (GWAS) were obtained and subjected to genotype imputation and analysis. Overall, we performed meta-analyses on more than seven million polymorphisms originating from GWAS datasets and/or from smaller-scale PD association studies. Meta-analyses on 147 SNPs were supplemented by unpublished GWAS data from up to 16,452 PD cases and 48,810 controls. Eleven loci showed genome-wide significant (P < 5 × 10^-8) association with disease risk: BST1, CCDC62/HIP1R, DGKQ/GAK, GBA, LRRK2, MAPT, MCCC1/LAMP3, PARK16, SNCA, STK39, and SYT11/RAB25. In addition, we identified novel evidence for genome-wide significant association with a polymorphism in ITGA8 (rs7077361, OR 0.88, P = 1.3 × 10^-8). All meta-analysis results are freely available on a dedicated online database (www.pdgene.org), which is cross-linked with a customized track on the UCSC Genome Browser. Our study provides an exhaustive and up-to-date summary of the status of PD genetics research that can be readily scaled to include the results of future large-scale genetics projects, including next-generation sequencing studies.
Subject(s)
Databases, Genetic; Genome-Wide Association Study; Parkinson Disease/genetics; Genome, Human; Humans; Internet; Polymorphism, Single Nucleotide
ABSTRACT
Distinguishing true from false positive findings is a major challenge in human genetic epidemiology. Several strategies have been devised to facilitate this, including the positive predictive value (PPV) and a set of epidemiological criteria, known as the "Venice" criteria. The PPV measures the probability of a true association, given a statistically significant finding, while the Venice criteria grade the credibility based on the amount of evidence, consistency of replication and protection from bias. A vast majority of journals use significance thresholds to identify the true positive findings. We studied the effect of p value thresholds on the PPV and used the PPV and Venice criteria to define usable thresholds of statistical significance. Theoretical and empirical analyses of data published on AlzGene show that at a nominal p value threshold of 0.05 most "positive" findings will turn out to be false if the prior probability of association is below 0.10, even if the statistical power of the study is higher than 0.80. However, in underpowered studies (power of 0.25) with a low prior probability of 1 × 10^-3, a p value of 1 × 10^-5 yields a high PPV (>96%). Here we have shown that the p value threshold of 1 × 10^-5 gives very strong evidence of association in almost all studies. However, in the case of a very high prior probability of association (0.50), a p value threshold of 0.05 may be sufficient, while for studies with a very low prior probability of association (1 × 10^-4; genome-wide association studies, for instance) 1 × 10^-7 may serve as a useful threshold to declare significance.
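The dependence of the PPV on prior probability, power and significance threshold can be explored numerically. The sketch below assumes the simplest textbook form of the PPV, PPV = (power × prior) / (power × prior + alpha × (1 - prior)), with no bias term. The abstract does not state which formulation the authors used, so the numbers may differ from theirs. The function name `ppv` and the scenario list are illustrative only.

```python
def ppv(prior: float, power: float, alpha: float) -> float:
    """Positive predictive value of a significant finding.

    Assumption: no bias term, i.e. true positives = power * prior and
    false positives = alpha * (1 - prior).
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Illustrative (prior, power, alpha) combinations taken from the abstract.
scenarios = [
    (1e-3, 0.25, 1e-5),  # underpowered study, low prior, strict threshold
    (0.50, 0.80, 0.05),  # very high prior, nominal threshold
    (1e-4, 0.80, 1e-7),  # GWAS-like prior, very strict threshold
]

for prior, power, alpha in scenarios:
    print(f"prior={prior:g} power={power:g} alpha={alpha:g} "
          f"PPV={ppv(prior, power, alpha):.3f}")
```

Under this simplified formula the first scenario gives a PPV of about 0.96, in line with the ">96%" figure quoted in the abstract.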
Subject(s)
Alzheimer Disease/genetics; Bias; False Positive Reactions; Genomics/statistics & numerical data; Molecular Epidemiology; Genomics/methods; Humans; Molecular Epidemiology/methods; Molecular Epidemiology/statistics & numerical data
ABSTRACT
BACKGROUND: Single nucleotide polymorphisms (SNPs) rs429358 (ε4) and rs7412 (ε2), both invoking changes in the amino-acid sequence of the apolipoprotein E (APOE) gene, have previously been tested for association with multiple sclerosis (MS) risk. However, none of these studies was sufficiently powered to detect modest effect sizes at acceptable type-I error rates. As both SNPs are only imperfectly captured on commonly used microarray genotyping platforms, their evaluation in the context of genome-wide association studies has been hindered until recently. METHODS: We genotyped 12 740 subjects hitherto not studied for their APOE status, imputed raw genotype data from 8739 subjects from five independent genome-wide association studies datasets using the most recent high-resolution reference panels, and extracted genotype data for 8265 subjects from previous candidate gene assessments. RESULTS: Despite sufficient power to detect associations at genome-wide significance thresholds across a range of ORs, our analyses did not support a role of rs429358 or rs7412 on MS susceptibility. This included meta-analyses of the combined data across 13 913 MS cases and 15 831 controls (OR=0.95, p=0.259, and OR 1.07, p=0.0569, for rs429358 and rs7412, respectively). CONCLUSION: Given the large sample size of our analyses, it is unlikely that the two APOE missense SNPs studied here exert any relevant effects on MS susceptibility.
Subject(s)
Apolipoproteins E/genetics; Genetic Association Studies; Genetic Predisposition to Disease; Multiple Sclerosis/genetics; Databases, Genetic; Humans; Polymorphism, Single Nucleotide/genetics; Risk Factors; White People/genetics
ABSTRACT
Hi-C, capture Hi-C (CHC) and Capture-C have contributed greatly to our present understanding of the three-dimensional organization of genomes in the context of transcriptional regulation by characterizing the roles of topologically associating domains, enhancer-promoter loops and other three-dimensional genomic interactions. The analysis is based on counts of chimeric read pairs that map to interacting regions of the genome. However, the processing and quality control present a number of unique challenges. We review here the experimental and computational foundations and explain how the characteristics of restriction digests, sonication fragments and read pairs can be exploited to distinguish technical artefacts from valid read pairs originating from true chromatin interactions.
Subject(s)
Chromatin/genetics; Computational Biology; Genome; Genomics; Chromosome Mapping; Computational Biology/methods; Databases, Genetic; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans; Quality Control
ABSTRACT
OBJECTIVES: Recently, 2 independent studies reported that a rare missense variant, rs75932628 (R47H), in exon 2 of the gene encoding the "triggering receptor expressed on myeloid cells 2" (TREM2) significantly increases the risk of Alzheimer disease (AD) with an effect size comparable to that of the APOE ε4 allele. METHODS: In this study, we attempted to replicate the association between rs75932628 and AD risk by directly genotyping rs75932628 in 2 independent Caucasian family cohorts consisting of 927 families (with 1,777 affected and 1,235 unaffected) and in 2 Caucasian case-control cohorts composed of 1,314 cases and 1,609 controls. In addition, we imputed genotypes in 3 independent Caucasian case-control cohorts containing 1,906 cases and 1,503 controls. RESULTS: Meta-analysis of the 2 family-based and the 5 case-control cohorts yielded a p value of 0.0029, while the overall summary estimate (using case-control data only) resulted in an odds ratio of 1.67 (95% confidence interval 0.95-2.92) for the association between the TREM2 R47H and increased AD risk. CONCLUSIONS: While our results serve to confirm the association between R47H and risk of AD, the observed effect on risk was substantially smaller than that previously reported.
Subject(s)
Alzheimer Disease/genetics; Genetic Predisposition to Disease/genetics; Membrane Glycoproteins/genetics; Receptors, Immunologic/genetics; Case-Control Studies; Genotype; Humans; Mutation, Missense/genetics; White People/genetics
ABSTRACT
Quantitative and systems biology approaches benefit from the unprecedented depth of next-generation sequencing. A typical experiment yields millions of short reads, which oftentimes carry particular sequence tags. These tags may be (a) specific to the sequencing platform and library construction method (e.g., adapter sequences), (b) introduced by experimental design (e.g., sample barcodes), or (c) part of some biological signal (e.g., splice leader sequences in nematodes). Our software FLEXBAR enables accurate recognition, sorting and trimming of sequence tags with maximal flexibility, based on exact overlap sequence alignment. The software supports data formats from all current sequencing platforms, including color-space reads. FLEXBAR maintains read pairings and processes separate barcode reads on demand. Our software facilitates the fine-grained adjustment of sequence tag detection parameters and search regions. FLEXBAR is multi-threaded and combines speed with precision. Even complex read processing scenarios can be executed with a single command line call. We demonstrate the utility of the software in terms of read mapping applications, library demultiplexing and splice leader detection. FLEXBAR and additional information are available for academic use from the website: http://sourceforge.net/projects/flexbar/.
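To illustrate the general idea of recognising a sequence tag by overlap with the read end, the sketch below trims a 3' adapter from a read. It is a deliberate simplification and not FLEXBAR's implementation: it compares the read suffix with the adapter prefix position by position (no gaps), whereas the abstract describes an alignment-based approach. The function name `trim_adapter`, its parameters and the example sequences are hypothetical.

```python
def trim_adapter(read: str, adapter: str,
                 min_overlap: int = 3, max_error_rate: float = 0.1) -> str:
    """Remove a 3' adapter, or a prefix of it that runs off the read end.

    Simplifying assumption: ungapped comparison with a mismatch budget
    proportional to the overlap length.
    """
    for start in range(len(read) - min_overlap + 1):
        # Length of the region where read and adapter overlap at this offset.
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            continue
        mismatches = sum(
            r != a for r, a in zip(read[start:start + overlap], adapter)
        )
        if mismatches <= max_error_rate * overlap:
            # Leftmost acceptable hit: keep only the bases before the tag.
            return read[:start]
    return read


# Hypothetical example: 14 bases of insert followed by 10 adapter bases.
adapter = "AGATCGGAAGAGC"
read = "ACGTACGTTTGACC" + "AGATCGGAAG"
print(trim_adapter(read, adapter))
```

Scanning from the left means the longest acceptable overlap is found first. A production tool would also need to handle gaps, quality values and paired reads, which this sketch ignores.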
ABSTRACT
BACKGROUND: Although genetic studies have reported a number of loci associated with cutaneous melanoma (CM) risk, a comprehensive synopsis of genetic association studies published in the field and systematic meta-analysis for all eligible polymorphisms have not been reported. METHODS: We systematically annotated data from all genetic association studies published in the CM field (n = 145), including data from genome-wide association studies (GWAS), and performed random-effects meta-analyses across all eligible polymorphisms on the basis of four or more independent case-control datasets in the main analyses. Supplementary analyses of three available datasets derived from GWAS and GWAS-replication studies were also done. Nominally statistically significant associations between polymorphisms and CM were graded for the strength of epidemiological evidence on the basis of the Human Genome Epidemiology Network Venice criteria. All statistical tests were two-sided. RESULTS: Forty-two polymorphisms across 18 independent loci evaluated in four or more datasets including candidate gene studies and available GWAS data were subjected to meta-analysis. Eight loci were identified in the main meta-analyses as being associated with a risk of CM (P < .05), of which four loci showed a genome-wide statistically significant association (P < 1 × 10^-7), including 16q24.3 (MC1R), 20q11.22 (MYH7B/PIGU/ASIP), 11q14.3 (TYR), and 5p13.2 (SLC45A2). Grading of the cumulative evidence by the Venice criteria suggested strong epidemiological credibility for all four loci with genome-wide statistical significance and one additional gene at 9p23 (TYRP1). In the supplementary meta-analyses, a locus at 9p21.3 (CDKN2A/MTAP) reached genome-wide statistical significance with CM and had strong epidemiological credibility.
CONCLUSIONS: To the best of our knowledge, this is the first comprehensive field synopsis and systematic meta-analysis to identify genes associated with an increased susceptibility to CM.
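As an illustration of what a random-effects meta-analysis computes, the sketch below pools per-study log odds ratios with the DerSimonian-Laird estimator. This is one common choice and is used here as an assumption; the abstract does not state which estimator was applied. The function name and the input values are hypothetical and are not data from the study.

```python
import math


def dersimonian_laird(log_ors, variances):
    """Pool log odds ratios under a random-effects model.

    Returns (pooled log OR, standard error, between-study variance tau^2).
    Assumption: DerSimonian-Laird moment estimator for tau^2.
    """
    k = len(log_ors)
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, log_ors)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2


# Hypothetical per-study log odds ratios and their variances.
pooled, se, tau2 = dersimonian_laird([0.0, 0.5, 1.0, -0.5], [0.01] * 4)
print(f"pooled OR={math.exp(pooled):.2f} SE={se:.3f} tau^2={tau2:.3f}")
```

When the studies agree closely, tau^2 is estimated as zero and the result coincides with a fixed-effect analysis. With heterogeneous studies, the larger tau^2 widens the confidence interval of the pooled estimate.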
Subject(s)
Melanoma/epidemiology; Melanoma/genetics; Polymorphism, Single Nucleotide; Skin Neoplasms/epidemiology; Skin Neoplasms/genetics; Agouti Signaling Protein/genetics; Antigens, Neoplasm/genetics; Cardiac Myosins/genetics; Confounding Factors, Epidemiologic; Cyclin-Dependent Kinase Inhibitor p16/genetics; Databases, Genetic; Gene Frequency; Genome-Wide Association Study; Genotype; Humans; Membrane Glycoproteins/genetics; Membrane Proteins/genetics; Membrane Transport Proteins/genetics; Meta-Analysis as Topic; Molecular Epidemiology; Myosin Heavy Chains/genetics; Neoplasm Proteins/genetics; Oxidoreductases/genetics; Receptor, Melanocortin, Type 1/genetics; Receptors, Calcitriol/genetics; Reproducibility of Results; Research Design
ABSTRACT
Agonist-induced activation of the δ-opioid receptor (δOR) was recently shown to augment β- and γ-secretase activities, which increased the production of β-amyloid peptide (Aβ), known to accumulate in the brain tissues of Alzheimer's disease (AD) patients. Previously, the δOR variant with a phenylalanine at position 27 (δOR-Phe27) exhibited more efficient receptor maturation and higher stability at the cell surface than did the less common cysteine (δOR-Cys27) variant. For this study, we expressed these variants in human SH-SY5Y and HEK293 cells expressing exogenous or endogenous amyloid precursor protein (APP) and assessed the effects on APP processing. Expression of δOR-Cys27, but not δOR-Phe27, resulted in a robust accumulation of the APP C83 C-terminal fragment and the APP intracellular domain, while the total soluble APP and, particularly, the β-amyloid 40 levels were decreased. These changes upon δOR-Cys27 expression coincided with decreased localization of APP C-terminal fragments in late endosomes and lysosomes. Importantly, a long-term treatment with a subset of δOR-specific ligands or a c-Src tyrosine kinase inhibitor suppressed the δOR-Cys27-induced APP phenotype. These data suggest that an increased constitutive internalization and/or concurrent signaling of the δOR-Cys27 variant affects APP processing through altered endocytic trafficking of APP.