Results 1 - 20 of 27
1.
Bioinformatics ; 38(3): 604-611, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34726732

ABSTRACT

MOTIVATION: With the increasing throughput of sequencing technologies, structural variant (SV) detection has become possible across tens of thousands of genomes. Non-reference sequence (NRS) variants have drawn less attention compared with other types of SVs due to the computational complexity of detecting them. When using short-read data, the detection of NRS variants inevitably involves a de novo assembly which requires high-quality sequence data at high coverage. Previous studies have demonstrated how sequence data of multiple genomes can be combined for the reliable detection of NRS variants. However, the algorithms proposed in these studies have limited scalability to larger sets of genomes. RESULTS: We introduce PopIns2, a tool to discover and characterize NRS variants in many genomes, which scales to considerably larger numbers of genomes than its predecessor PopIns. In this article, we briefly outline the PopIns2 workflow and highlight our novel algorithmic contributions. We developed an entirely new approach for merging contig assemblies of unaligned reads from many genomes into a single set of NRS using a colored de Bruijn graph. Our tests on simulated data indicate that the new merging algorithm ranks among the best approaches in terms of quality and reliability and that PopIns2 shows the best precision for a growing number of genomes processed. Results on the Polaris Diversity Cohort and a set of 1000 Icelandic human genomes demonstrate unmatched scalability for the application on population-scale datasets. AVAILABILITY AND IMPLEMENTATION: The source code of PopIns2 is available from https://github.com/kehrlab/PopIns2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Humans , Sequence Analysis, DNA/methods , Reproducibility of Results , Genome, Human , High-Throughput Nucleotide Sequencing/methods
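The colored de Bruijn graph mentioned in the abstract above can be illustrated with a minimal sketch. This is not the PopIns2 merging algorithm; it only shows the underlying data structure, in which every k-mer carries the set of samples ("colors") it was seen in. The sample names, contigs, and function names are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def colored_de_bruijn(contigs_by_sample, k):
    """Map each k-mer to the set of samples (colors) whose contigs contain it."""
    colors = defaultdict(set)
    for sample, contigs in contigs_by_sample.items():
        for contig in contigs:
            for i in range(len(contig) - k + 1):
                colors[contig[i:i + k]].add(sample)
    return colors


def successors(colors, kmer):
    """Return k-mers in the graph that overlap `kmer` by k-1 bases."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in colors]


# Toy input (hypothetical): two samples whose contigs share a common stretch.
contigs = {
    "sampleA": ["ACGTACGGA"],
    "sampleB": ["GTACGGATT"],
}
graph = colored_de_bruijn(contigs, k=5)

# k-mers carrying both colors mark sequence supported by both samples.
shared = sorted(kmer for kmer, c in graph.items() if len(c) == 2)
print(shared)
print(successors(graph, "ACGGA"))
```

Walking such a graph along k-mers that share colors is one way contigs from many genomes could be merged into a single sequence set; the actual traversal strategy is described in the article, not here.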
2.
Datenbank Spektrum ; 21(3): 255-260, 2021.
Article in English | MEDLINE | ID: mdl-34786019

ABSTRACT

Today's scientific data analysis very often requires complex Data Analysis Workflows (DAWs) executed over distributed computational infrastructures, e.g., clusters. Much research effort is devoted to the tuning and performance optimization of specific workflows for specific clusters. However, an arguably even more important problem for accelerating research is the reduction of development, adaptation, and maintenance times of DAWs. We describe the design and setup of the Collaborative Research Center (CRC) 1404 "FONDA -- Foundations of Workflows for Large-Scale Scientific Data Analysis", in which roughly 50 researchers jointly investigate new technologies, algorithms, and models to increase the portability, adaptability, and dependability of DAWs executed over distributed infrastructures. We describe the motivation behind our project, explain its underlying core concepts, introduce FONDA's internal structure, and sketch our vision for the future of workflow-based scientific data analysis. We also describe some lessons learned during the "making of" a CRC in Computer Science with strong interdisciplinary components, with the aim to foster similar endeavors.

3.
Bioinformatics ; 37(19): 3128-3135, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33830196

ABSTRACT

MOTIVATION: Genome Architecture Mapping (GAM) was recently introduced as a digestion- and ligation-free method to detect chromatin conformation. Orthogonal to existing approaches based on chromatin conformation capture (3C), GAM's ability to capture both inter- and intra-chromosomal contacts from low amounts of input data makes it particularly well suited for allele-specific analyses in a clinical setting. Allele-specific analyses are powerful tools to investigate the effects of genetic variants on many cellular phenotypes including chromatin conformation, but require the haplotypes of the individuals under study to be known a priori. So far, however, no algorithm exists for haplotype reconstruction and phasing of genetic variants from GAM data, hindering the allele-specific analysis of chromatin contact points in non-model organisms or individuals with unknown haplotypes. RESULTS: We present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies from GAM data and employs a GAM-specific probabilistic model of haplotype capture to optimize phasing accuracy. Using a hybrid mouse embryonic stem cell line with known haplotype structure as a benchmark dataset, we assess correctness and completeness of the reconstructed haplotypes, and demonstrate the power of GAMIBHEAR to infer accurate genome-wide haplotypes from GAM data. AVAILABILITY AND IMPLEMENTATION: GAMIBHEAR is available as an R package under the open-source GPL-2 license at https://bitbucket.org/schwarzlab/gamibhear. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Nat Commun ; 12(1): 730, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33526789

ABSTRACT

Thousands of genomic structural variants (SVs) segregate in the human population and can impact phenotypic traits and diseases. Their identification in whole-genome sequence data of large cohorts is a major computational challenge. Most current approaches identify SVs in single genomes and afterwards merge the identified variants into a joint call set across many genomes. We describe the approach PopDel, which directly identifies deletions of about 500 to at least 10,000 bp in length in data of many genomes jointly, eliminating the need for subsequent variant merging. PopDel scales to tens of thousands of genomes as we demonstrate in evaluations on up to 49,962 genomes. We show that PopDel reliably reports common, rare and de novo deletions. On genomes with available high-confidence reference call sets PopDel shows excellent recall and precision. Genotype inheritance patterns in up to 6794 trios indicate that genotypes predicted by PopDel are more reliable than those of previous SV callers. Furthermore, PopDel's running time is competitive with the fastest tested previous tools. The demonstrated scalability and accuracy of PopDel enable routine scans for deletions in large-scale sequencing studies.


Subject(s)
Genome, Human/genetics , Genomic Structural Variation , Metagenomics/methods , Sequence Deletion , Feasibility Studies , Female , High-Throughput Nucleotide Sequencing , Humans , Inheritance Patterns , Male , Reproducibility of Results , Sequence Analysis, DNA
5.
Circ Genom Precis Med ; 14(1): e003029, 2021 02.
Article in English | MEDLINE | ID: mdl-33315477

ABSTRACT

BACKGROUND: Loss-of-function mutations in the LDL (low-density lipoprotein) receptor gene (LDLR) cause elevated levels of LDL cholesterol and premature cardiovascular disease. To date, a gain-of-function mutation in LDLR with a large effect on LDL cholesterol levels has not been described. Here, we searched for sequence variants in LDLR that have a large effect on LDL cholesterol levels. METHODS: We analyzed whole-genome sequencing data from 43 202 Icelanders. Single-nucleotide polymorphisms and structural variants including deletions, insertions, and duplications were genotyped using whole-genome sequencing-based data. LDL cholesterol associations were carried out in a sample of >100 000 Icelanders with genetic information (imputed or whole-genome sequencing). Molecular analyses were performed using RNA sequencing and protein expression assays in Epstein-Barr virus-transformed lymphocytes. RESULTS: We discovered a 2.5-kb deletion (del2.5) overlapping the 3' untranslated region of LDLR in 7 heterozygous carriers from a single family. Mean level of LDL cholesterol was 74% lower in del2.5 carriers than in 101 851 noncarriers, a difference of 2.48 mmol/L (96 mg/dL; P=8.4×10^-8). Del2.5 results in production of an alternative mRNA isoform with a truncated 3' untranslated region. The truncation leads to a loss of target sites for microRNAs known to repress translation of LDLR. In Epstein-Barr virus-transformed lymphocytes derived from del2.5 carriers, expression of the alternative mRNA isoform was 1.84-fold higher than the wild-type isoform (P=0.0013), and there was 1.79-fold higher surface expression of the LDL receptor than in noncarriers (P=0.0086). We did not find a highly penetrant detrimental impact of lifelong very low levels of LDL cholesterol due to del2.5 on health of the carriers. CONCLUSIONS: Del2.5 is the first reported gain-of-function mutation in LDLR causing a large reduction in LDL cholesterol. These data point to a role for alternative polyadenylation of LDLR mRNA as a potent regulator of LDL receptor expression in humans.


Subject(s)
Cholesterol, LDL/blood , Receptors, LDL/genetics , 3' Untranslated Regions , Alternative Splicing , Gain of Function Mutation , Gene Deletion , Genetic Vectors/genetics , Genetic Vectors/metabolism , Herpesvirus 4, Human/genetics , Heterozygote , Humans , Hyperlipoproteinemia Type II/genetics , Hyperlipoproteinemia Type II/pathology , Iceland , Lymphocytes/cytology , Lymphocytes/metabolism , MicroRNAs/metabolism , Pedigree , Protein Isoforms/genetics , RNA, Messenger/metabolism
6.
Med Genet ; 33(2): 133-145, 2021 Jun.
Article in English | MEDLINE | ID: mdl-38836034

ABSTRACT

High-throughput sequencing techniques have significantly increased the molecular diagnosis rate for patients with monogenic disorders. This is primarily due to a substantially increased identification rate of disease mutations in the coding sequence, primarily SNVs and indels. Further progress is hampered by difficulties in the detection of structural variants and the interpretation of variants outside the coding sequence. In this review, we provide an overview of how novel sequencing techniques and state-of-the-art algorithms can be used to discover small and structural variants across the whole genome and introduce bioinformatic tools for predicting the effects that variants may have in the non-coding part of the genome.

7.
Hum Mol Genet ; 28(7): 1199-1211, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30476138

ABSTRACT

Urine dipstick tests are widely used in routine medical care to diagnose kidney and urinary tract and metabolic diseases. Several environmental factors are known to affect the test results, whereas the effects of genetic diversity are largely unknown. We tested 32.5 million sequence variants for association with urinary biomarkers in a set of 150 274 Icelanders with urine dipstick measurements. We detected 20 association signals, of which 14 are novel, associating with at least one of five clinical entities defined by the urine dipstick: glucosuria, ketonuria, proteinuria, hematuria and urine pH. These include three independent glucosuria variants at SLC5A2, the gene encoding the sodium-dependent glucose transporter (SGLT2), a protein targeted pharmacologically to increase urinary glucose excretion in the treatment of diabetes. Two variants associating with proteinuria are in LRP2 and CUBN, encoding the co-transporters megalin and cubilin, respectively, that mediate proximal tubule protein uptake. One of the hematuria-associated variants is a rare, previously unreported 2.5 kb exonic deletion in COL4A3. Of the four signals associated with urine pH, we note that the pH-increasing alleles of two variants (POU2AF1, WDR72) associate significantly with increased risk of kidney stones. Our results reveal that genetic factors affect variability in urinary biomarkers, in both a disease dependent and independent context.


Subject(s)
Biomarkers/analysis , Biomarkers/urine , Genetic Variation/genetics , Adult , Aged , Alleles , Female , Hematuria/genetics , Hematuria/urine , Humans , Hydrogen-Ion Concentration , Iceland , Ketosis/genetics , Ketosis/urine , Kidney/metabolism , Male , Middle Aged , Proteinuria/genetics , Proteinuria/urine , Sodium-Glucose Transporter 2/genetics , Whole Genome Sequencing/methods
8.
Nat Genet ; 50(12): 1674-1680, 2018 12.
Article in English | MEDLINE | ID: mdl-30397338

ABSTRACT

De novo mutations (DNMs) cause a large proportion of severe rare diseases of childhood. DNMs that occur early may result in mosaicism of both somatic and germ cells. Such early mutations can cause recurrence of disease. We scanned 1,007 sibling pairs from 251 families and identified 878 DNMs shared by siblings (ssDNMs) at 448 genomic sites. We estimated DNM recurrence probability based on parental mosaicism, sharing of DNMs among siblings, parent-of-origin, mutation type and genomic position. We detected 57.2% of ssDNMs in the parental blood. The recurrence probability of a DNM decreases by 2.27% per year for paternal DNMs and 1.78% per year for maternal DNMs. Maternal ssDNMs are more likely to be T>C mutations than paternal ssDNMs, and less likely to be C>T mutations. Depending on the properties of the DNM, the recurrence probability ranges from 0.011% to 28.5%. We have launched an online calculator to allow estimation of DNM recurrence probability for research purposes.


Subject(s)
Family , Inheritance Patterns , Mutation , Parent-Child Relations , Adult , Child , Embryonic Germ Cells/metabolism , Family Characteristics , Female , Germ-Line Mutation , Humans , Inheritance Patterns/genetics , Male , Mosaicism , Pedigree
9.
Nat Genet ; 50(11): 1616, 2018 11.
Article in English | MEDLINE | ID: mdl-30237445

ABSTRACT

In the version of this article published, statements about the impact of insertions and deletions on gene conversions were incorrect. We reported a bias toward deletions, whereas in fact the bias was toward insertions. We are deeply indebted to Laurent Duret and Brice Letcher for noticing this mistake in our manuscript. The following statements are incorrect in the published manuscript.

10.
Sci Data ; 4: 170115, 2017 09 21.
Article in English | MEDLINE | ID: mdl-28933420

ABSTRACT

Understanding of sequence diversity is the cornerstone of analysis of genetic disorders, population genetics, and evolutionary biology. Here, we present an update of our sequencing set to 15,220 Icelanders who we sequenced to an average genome-wide coverage of 34X. We identified 39,020,168 autosomal variants passing GATK filters: 31,079,378 SNPs and 7,940,790 indels. Calling de novo mutations (DNMs) is a formidable challenge given the high false positive rate in sequencing datasets relative to the mutation rate. Here we addressed this issue by using segregation of alleles in three-generation families. Using this transmission assay, we controlled the false positive rate and identified 108,778 high quality DNMs. Furthermore, we used our extended family structure and read pair tracing of DNMs to a panel of phased SNPs, to determine the parent of origin of 42,961 DNMs.


Subject(s)
Genome, Human , Humans , INDEL Mutation , Iceland , Polymorphism, Single Nucleotide
11.
Nat Genet ; 49(11): 1654-1660, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28945251

ABSTRACT

A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in both small and population-scale sequencing studies.


Subject(s)
Algorithms , Genome, Human , Genotyping Techniques/instrumentation , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/statistics & numerical data , Alleles , Base Sequence , Computer Graphics , HLA Antigens/genetics , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Sequence Alignment , Sequence Analysis, DNA/methods , Software
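The idea of a variation-aware graph whose paths represent possible haplotypes can be sketched in a few lines. The example below is only an illustration under simplifying assumptions (biallelic, sorted, non-overlapping variants on a short toy reference); it is not Graphtyper's data structure or API, and the function name and input format are hypothetical.

```python
from itertools import product


def haplotype_paths(reference, variants):
    """Enumerate haplotype sequences encoded by a linear variation graph.

    `variants` is a list of (position, ref_allele, alt_allele) tuples with
    0-based positions, sorted by position and non-overlapping.
    """
    segments = []  # alternates between shared sequence and allele choices
    cursor = 0
    for pos, ref, alt in variants:
        # The stated reference allele must match the reference sequence.
        assert reference[pos:pos + len(ref)] == ref, "ref allele mismatch"
        segments.append([reference[cursor:pos]])  # node shared by all paths
        segments.append([ref, alt])               # bubble: one node per allele
        cursor = pos + len(ref)
    segments.append([reference[cursor:]])
    # Each combination of allele choices is one path through the graph.
    return ["".join(choice) for choice in product(*segments)]


# Toy reference with one SNP (G>A) and one 1-bp deletion (GC>G).
paths = haplotype_paths("ACGTTGCA", [(2, "G", "A"), (5, "GC", "G")])
print(paths)
```

With n independent biallelic sites the number of paths grows as 2^n, which is why practical tools work on the graph itself rather than enumerating every haplotype.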
12.
Nature ; 549(7673): 519-522, 2017 09 28.
Article in English | MEDLINE | ID: mdl-28959963

ABSTRACT

The characterization of mutational processes that generate sequence diversity in the human genome is of paramount importance both to medical genetics and to evolutionary studies. To understand how the age and sex of transmitting parents affect de novo mutations, here we sequence 1,548 Icelanders, their parents, and, for a subset of 225, at least one child, to 35× genome-wide coverage. We find 108,778 de novo mutations, both single nucleotide polymorphisms and indels, and determine the parent of origin of 42,961. The number of de novo mutations from mothers increases by 0.37 per year of age (95% CI 0.32-0.43), a quarter of the 1.51 per year from fathers (95% CI 1.45-1.57). The number of clustered mutations increases faster with the mother's age than with the father's, and the genomic span of maternal de novo mutation clusters is greater than that of paternal ones. The types of de novo mutation from mothers change substantially with age, with a 0.26% (95% CI 0.19-0.33%) decrease in cytosine-phosphate-guanine to thymine-phosphate-guanine (CpG>TpG) de novo mutations and a 0.33% (95% CI 0.28-0.38%) increase in C>G de novo mutations per year, respectively. Remarkably, these age-related changes are not distributed uniformly across the genome. A striking example is a 20 megabase region on chromosome 8p, with a maternal C>G mutation rate that is up to 50-fold greater than the rest of the genome. The age-related accumulation of maternal non-crossover gene conversions also mostly occurs within these regions. Increased sequence diversity and linkage disequilibrium of C>G variants within regions affected by excess maternal mutations indicate that the underlying mutational process has persisted in humans for thousands of years. Moreover, the regional excess of C>G variation in humans is largely shared by chimpanzees, less by gorillas, and is almost absent from orangutans. This demonstrates that sequence diversity in humans results from evolving interactions between age, sex, mutation type, and genomic location.


Subject(s)
Aging/genetics , Germ-Line Mutation/genetics , Maternal Age , Mutagenesis , Parents , Paternal Age , Adolescent , Adult , Aged , Animals , Child , Chromosomes, Human, Pair 8/genetics , Evolution, Molecular , Female , GC Rich Sequence , Genome, Human/genetics , Gorilla gorilla/genetics , Humans , INDEL Mutation , Iceland , Linkage Disequilibrium/genetics , Male , Middle Aged , Mutation Rate , Pan troglodytes/genetics , Polymorphism, Single Nucleotide , Pongo/genetics , Young Adult
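The per-year rates quoted in the abstract above lend themselves to a short worked calculation. The sketch below uses only the two point estimates from the abstract (0.37 and 1.51 de novo mutations per year of maternal and paternal age) and assumes a linear age effect; it computes the additional mutations attributable to extra parental age, not an absolute count, because the abstract gives no baseline. The function and constant names are hypothetical.

```python
# Point estimates quoted in the abstract (de novo mutations per year of age).
MATERNAL_PER_YEAR = 0.37
PATERNAL_PER_YEAR = 1.51


def extra_dnms(maternal_years, paternal_years):
    """Additional de novo mutations expected for the given extra parental ages,
    assuming the linear per-year increases reported in the abstract."""
    return MATERNAL_PER_YEAR * maternal_years + PATERNAL_PER_YEAR * paternal_years


# Both parents ten years older: 3.7 maternal + 15.1 paternal extra mutations.
print(round(extra_dnms(10, 10), 1))

# The maternal rate is about a quarter of the paternal rate, as stated.
ratio = MATERNAL_PER_YEAR / PATERNAL_PER_YEAR
print(round(ratio, 3))
```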
13.
Nat Commun ; 8: 14755, 2017 05 03.
Article in English | MEDLINE | ID: mdl-28466842

ABSTRACT

Lynch syndrome, caused by germline mutations in the mismatch repair genes, is associated with increased cancer risk. Here, using a large whole-genome sequencing data bank, cancer registry and colorectal tumour bank, we determine the prevalence of Lynch syndrome, associated cancer risks and pathogenicity of several variants in the Icelandic population. We use colorectal cancer samples from 1,182 patients diagnosed between 2000 and 2009. One hundred and thirty-two (11.2%) tumours are mismatch repair deficient per immunohistochemistry. Twenty-one (1.8%) have Lynch syndrome while 106 (9.0%) have somatic hypermethylation or mutations in the mismatch repair genes. The population prevalence of Lynch syndrome is 0.442%. We discover a translocation disrupting MLH1 and three mutations in MSH6 and PMS2 that increase endometrial, colorectal, brain and ovarian cancer risk. We find thirteen mismatch repair variants of uncertain significance that are not associated with cancer risk. We find that founder mutations in MSH6 and PMS2 prevail in Iceland unlike most other populations.


Subject(s)
Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , DNA-Binding Proteins/genetics , Founder Effect , Germ-Line Mutation , Mismatch Repair Endonuclease PMS2/genetics , Adult , Aged , Aged, 80 and over , Base Pair Mismatch , Colorectal Neoplasms, Hereditary Nonpolyposis/epidemiology , Female , Genetic Predisposition to Disease , Humans , Iceland/epidemiology , Male , Middle Aged , Prevalence
14.
Hum Mol Genet ; 26(12): 2364-2376, 2017 06 15.
Article in English | MEDLINE | ID: mdl-28398513

ABSTRACT

Common sequence variants at the haptoglobin gene (HP) have been associated with blood lipid levels. Through whole-genome sequencing of 8,453 Icelanders, we discovered a splice donor founder mutation in HP (NM_001126102.1:c.190+1G>C, minor allele frequency = 0.56%). This mutation occurs on the HP1 allele of the common copy number variant in HP and leads to a loss of function of HP1. It associates with lower levels of haptoglobin (P = 2.1 × 10^-54), higher levels of non-high density lipoprotein cholesterol (β = 0.26 mmol/l, P = 2.6 × 10^-9) and greater risk of coronary artery disease (odds ratio = 1.30, 95% confidence interval: 1.10-1.54, P = 0.0024). Through haplotype analysis and with RNA sequencing, we provide evidence of a causal relationship between one of the two haptoglobin isoforms, namely Hp1, and lower levels of non-HDL cholesterol. Furthermore, we show that the HP1 allele associates with various other quantitative biological traits.


Subject(s)
Coronary Artery Disease/genetics , Haptoglobins/genetics , Adult , Alleles , Base Sequence , Coronary Artery Disease/metabolism , DNA Copy Number Variations/genetics , Female , Gene Frequency/genetics , Genetic Association Studies/methods , Genetic Variation , Haptoglobins/metabolism , Humans , Iceland , Lipids/blood , Lipids/genetics , Lipoproteins/genetics , Male , Mutation , Odds Ratio , RNA Splice Sites/genetics , Risk Factors
15.
Nat Genet ; 49(4): 588-593, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28250455

ABSTRACT

Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r^2 > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10^-8, odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease.


Subject(s)
Base Sequence/genetics , Genetic Variation/genetics , Genome, Human/genetics , Animals , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Genotype , Humans , Linkage Disequilibrium/genetics , Myocardial Infarction/genetics , Pan paniscus/genetics , Phenotype
16.
Bioinformatics ; 33(24): 4041-4048, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-27591079

ABSTRACT

MOTIVATION: Microsatellites, also known as short tandem repeats (STRs), are tracts of repetitive DNA sequences containing motifs ranging from two to six bases. Microsatellites are one of the most abundant types of variation in the human genome, after single nucleotide polymorphisms (SNPs) and indels. Microsatellite analysis has a wide range of applications, including medical genetics, forensics and construction of genetic genealogy. However, microsatellite variations are rarely considered in whole-genome sequencing studies, in large part due to a lack of tools capable of analyzing them. RESULTS: Here we present a microsatellite genotyper, optimized for Illumina WGS data, which is both faster and more accurate than other methods previously presented. There are two main ingredients to our improvements. First, we reduce the amount of sequencing data necessary for creating microsatellite profiles by using previously aligned sequencing data. Second, we use population information to train microsatellite and individual specific error profiles. By comparing our genotyping results to genotypes generated by capillary electrophoresis we show that our error rates are 50% lower than those of lobSTR, another program specifically developed to determine microsatellite genotypes. AVAILABILITY AND IMPLEMENTATION: Source code is available on GitHub: https://github.com/DecodeGenetics/popSTR. CONTACT: snaedis.kristmundsdottir@decode.is or bjarni.halldorsson@decode.is.


Subject(s)
Microsatellite Repeats , Genotype , Humans , Software , Whole Genome Sequencing
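The definition of a microsatellite given in the abstract above (a tract of tandemly repeated motifs of two to six bases) can be made concrete with a small scanner. This is a naive illustration using a regular expression with a back-reference, not the genotyping method of popSTR; the function name and the `min_copies` threshold are assumptions, and homopolymer runs such as AAAAAA are also reported as repeats of the motif AA.

```python
import re


def find_microsatellites(seq, min_copies=3):
    """Return (start, motif, copies) for tandem repeats of 2-6 bp motifs.

    The lazy quantifier prefers the shortest motif, and the back-reference
    requires at least `min_copies` consecutive copies in total.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence containing four tandem copies of the dinucleotide AC.
hits = find_microsatellites("TTGACACACACAGGT")
print(hits)
```

Genotyping a microsatellite from sequencing reads then amounts to estimating `copies` per allele while accounting for sequencing error, which is where the error profiles described in the abstract come in.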
17.
Sci Rep ; 6: 36189, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27811963

ABSTRACT

Only a few common variants in the sequence of the genome have been shown to impact cognitive traits. Here we demonstrate that polygenic scores of educational attainment predict specific aspects of childhood cognition, as measured with IQ. Recently, three sequence variants were shown to associate with educational attainment, a confluence phenotype of genetic and environmental factors contributing to academic success. We show that one of these variants associating with educational attainment, rs4851266-T, also associates with Verbal IQ in dyslexic children (P = 4.3 × 10^-4, β = 0.16 s.d.). The effect of 0.16 s.d. corresponds to 1.4 IQ points for heterozygotes and 2.8 IQ points for homozygotes. We verified this association in independent samples consisting of adults (P = 8.3 × 10^-5, β = 0.12 s.d., combined P = 2.2 × 10^-7, β = 0.14 s.d.). Childhood cognition is unlikely to be affected by education attained later in life, and the variant explains a greater fraction of the variance in verbal IQ than in educational attainment (0.7% vs 0.12%, P = 1.0 × 10^-5).


Subject(s)
Cognition , Dyslexia/genetics , Intelligence/genetics , Polymorphism, Single Nucleotide , Academic Success , Adolescent , Adult , Child , Chromosomes, Human, Pair 2/genetics , Databases, Genetic , Educational Status , Female , Genetic Markers , Humans , Iceland , Male , Multifactorial Inheritance , Nuclear Proteins/genetics
18.
Nat Genet ; 48(11): 1377-1384, 2016 11.
Article in English | MEDLINE | ID: mdl-27643539

ABSTRACT

Meiotic recombination involves a combination of gene conversion and crossover events that, along with mutations, produce germline genetic diversity. Here we report the discovery of 3,176 SNP and 61 indel gene conversions. Our estimate of the non-crossover (NCO) gene conversion rate (G) is 7.0 for SNPs and 5.8 for indels per megabase per generation, and the GC bias is 67.6%. For indels, we demonstrate a 65.6% preference for the shorter allele. NCO gene conversions from mothers are longer than those from fathers, and G is 2.17 times greater in mothers. Notably, G increases with the age of mothers, but not the age of fathers. A disproportionate number of NCO gene conversions in older mothers occur outside double-strand break (DSB) regions and in regions with relatively low GC content. This points to age-related changes in the mechanisms of meiotic gene conversion in oocytes.


Subject(s)
Gene Conversion , Meiosis , Adult , Base Composition , Child , Female , Humans , Male , Maternal Age , Polymorphism, Single Nucleotide , Sex Characteristics
19.
Bioinformatics ; 32(14): 2202-4, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27153590

ABSTRACT

Advances in sequencing capacity have led to the generation of unprecedented amounts of genomic data. The processing of this data frequently leads to I/O bottlenecks, e.g., when analyzing a small genomic region across a large number of samples. The largest I/O burden is, however, often not imposed by the amount of data needed for the analysis but rather by index files that help retrieve this data. We have developed chopBAI, a program that can chop a BAM index (BAI) file into small pieces. The program outputs a list of BAI files each indexing a specified genomic interval. The output files are much smaller in size but maintain compatibility with existing software tools. We show how preprocessing BAI files with chopBAI can lead to a reduction of I/O by more than 95% during the analysis of 10 kb genomic regions, eventually enabling the joint analysis of more than 10 000 individuals. AVAILABILITY AND IMPLEMENTATION: The software is implemented in C++, GPL licensed and available at http://github.com/DecodeGenetics/chopBAI. CONTACT: birte.kehr@decode.is.


Subject(s)
Computational Biology/methods , Genomics/methods , Software , Humans
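As background to why a small genomic interval needs only a small part of a BAM index, the binning scheme defined in the SAM/BAM specification can be sketched as follows. This is a Python port of the specification's `reg2bin` and `reg2bins` routines for the standard BAI layout (minimum bin size 2^14, five levels below the root); it is not code from chopBAI, and the example coordinates are arbitrary.

```python
def reg2bin(beg, end):
    """Smallest BAI bin fully containing the 0-based half-open region [beg, end)."""
    end -= 1
    # (shift, index of the first bin on that level), finest level first.
    for shift, offset in ((14, 4681), (17, 585), (20, 73), (23, 9), (26, 1)):
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # the root bin spans the whole 512 Mbp range


def reg2bins(beg, end):
    """All BAI bins that may hold alignments overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for shift, offset in ((26, 1), (23, 9), (20, 73), (17, 585), (14, 4681)):
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins


# A 10 kb region touches only a handful of the 37,450 possible bins.
region_bins = reg2bins(1_000_000, 1_010_000)
print(reg2bin(1_000_000, 1_010_000), region_bins)
```

Because a query over a 10 kb interval consults only these few bins, an index restricted to that interval can be far smaller than the genome-wide file, which is consistent with the I/O reduction reported in the abstract.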
20.
Hum Mol Genet ; 25(5): 1008-18, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26740556

ABSTRACT

Transcriptional and splicing anomalies have been observed in intron 8 of the CASP8 gene (encoding procaspase-8) in association with cutaneous basal-cell carcinoma (BCC) and linked to a germline SNP rs700635. Here, we show that the rs700635[C] allele, which is associated with increased risk of BCC and breast cancer, is protective against prostate cancer [odds ratio (OR) = 0.91, P = 1.0 × 10^-6]. rs700635[C] is also associated with failures to correctly splice out CASP8 intron 8 in breast and prostate tumours and in corresponding normal tissues. Investigation of rs700635[C] carriers revealed that they have a human-specific short interspersed element-variable number of tandem repeat-Alu (SINE-VNTR-Alu), subfamily-E retrotransposon (SVA-E) inserted into CASP8 intron 8. The SVA-E shows evidence of prior activity, because it has transduced some CASP8 sequences during subsequent retrotransposition events. Whole-genome sequence (WGS) data were used to tag the SVA-E with a surrogate SNP rs1035142[T] (r^2 = 0.999), which showed associations with both the splicing anomalies (P = 6.5 × 10^-32) and with protection against prostate cancer (OR = 0.91, P = 3.8 × 10^-7).


Subject(s)
Breast Neoplasms/genetics , Carcinoma, Basal Cell/genetics , Caspase 8/genetics , Prostatic Neoplasms/genetics , RNA Splicing , Retroelements , Skin Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Alleles , Base Sequence , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Carcinoma, Basal Cell/metabolism , Carcinoma, Basal Cell/pathology , Caspase 8/metabolism , Female , Genome-Wide Association Study , Humans , Introns , Male , Middle Aged , Molecular Sequence Data , Odds Ratio , Polymorphism, Single Nucleotide , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , Prostatic Neoplasms/prevention & control , Protective Factors , Skin Neoplasms/metabolism , Skin Neoplasms/pathology