Results 1 - 20 of 27
1.
Bioinformatics ; 38(3): 604-611, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34726732

ABSTRACT

MOTIVATION: With the increasing throughput of sequencing technologies, structural variant (SV) detection has become possible across tens of thousands of genomes. Non-reference sequence (NRS) variants have drawn less attention compared with other types of SVs due to the computational complexity of detecting them. When using short-read data, the detection of NRS variants inevitably involves a de novo assembly which requires high-quality sequence data at high coverage. Previous studies have demonstrated how sequence data of multiple genomes can be combined for the reliable detection of NRS variants. However, the algorithms proposed in these studies have limited scalability to larger sets of genomes. RESULTS: We introduce PopIns2, a tool to discover and characterize NRS variants in many genomes, which scales to considerably larger numbers of genomes than its predecessor PopIns. In this article, we briefly outline the PopIns2 workflow and highlight our novel algorithmic contributions. We developed an entirely new approach for merging contig assemblies of unaligned reads from many genomes into a single set of NRS using a colored de Bruijn graph. Our tests on simulated data indicate that the new merging algorithm ranks among the best approaches in terms of quality and reliability and that PopIns2 shows the best precision for a growing number of genomes processed. Results on the Polaris Diversity Cohort and a set of 1000 Icelandic human genomes demonstrate unmatched scalability for the application on population-scale datasets. AVAILABILITY AND IMPLEMENTATION: The source code of PopIns2 is available from https://github.com/kehrlab/PopIns2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software , Humans , Sequence Analysis, DNA/methods , Reproducibility of Results , Genome, Human , High-Throughput Nucleotide Sequencing/methods
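
The PopIns2 abstract in entry 1 describes merging contigs from many genomes with a colored de Bruijn graph. The sketch below only illustrates that data structure and is not PopIns2's implementation: every k-mer is a node, and its "colors" are the samples whose contigs contain it. The function names, sample names, and toy contigs are assumptions made for illustration, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def build_colored_dbg(contigs_by_sample, k):
    """Map each k-mer to the set of samples (colors) whose contigs contain it."""
    colors = defaultdict(set)
    for sample, contigs in contigs_by_sample.items():
        for contig in contigs:
            for i in range(len(contig) - k + 1):
                colors[contig[i:i + k]].add(sample)
    return colors


def successors(colors, kmer):
    """Return the k-mers in the graph that overlap `kmer` by k-1 bases."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in colors]


# Hypothetical toy input: one short contig per sample.
contigs_by_sample = {
    "sample_A": ["ACGTACGGA"],
    "sample_B": ["CGTACGGAT"],
}
graph = build_colored_dbg(contigs_by_sample, k=4)

# K-mers carrying both colors are sequence shared by the two samples.
shared = sorted(kmer for kmer, c in graph.items() if len(c) == 2)
print(shared)
```

In this toy graph, k-mers carrying both colors mark sequence present in both samples, which is the kind of signal a merging step can use to collapse redundant contigs into one set.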
2.
Datenbank Spektrum ; 21(3): 255-260, 2021.
Article in English | MEDLINE | ID: mdl-34786019

ABSTRACT

Today's scientific data analysis very often requires complex Data Analysis Workflows (DAWs) executed over distributed computational infrastructures, e.g., clusters. Much research effort is devoted to the tuning and performance optimization of specific workflows for specific clusters. However, an arguably even more important problem for accelerating research is the reduction of development, adaptation, and maintenance times of DAWs. We describe the design and setup of the Collaborative Research Center (CRC) 1404 "FONDA -- Foundations of Workflows for Large-Scale Scientific Data Analysis", in which roughly 50 researchers jointly investigate new technologies, algorithms, and models to increase the portability, adaptability, and dependability of DAWs executed over distributed infrastructures. We describe the motivation behind our project, explain its underlying core concepts, introduce FONDA's internal structure, and sketch our vision for the future of workflow-based scientific data analysis. We also describe some lessons learned during the "making of" a CRC in Computer Science with strong interdisciplinary components, with the aim to foster similar endeavors.

3.
Bioinformatics ; 37(19): 3128-3135, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33830196

ABSTRACT

MOTIVATION: Genome Architecture Mapping (GAM) was recently introduced as a digestion- and ligation-free method to detect chromatin conformation. Orthogonal to existing approaches based on chromatin conformation capture (3C), GAM's ability to capture both inter- and intra-chromosomal contacts from low amounts of input data makes it particularly well suited for allele-specific analyses in a clinical setting. Allele-specific analyses are powerful tools to investigate the effects of genetic variants on many cellular phenotypes including chromatin conformation, but require the haplotypes of the individuals under study to be known a priori. So far, however, no algorithm exists for haplotype reconstruction and phasing of genetic variants from GAM data, hindering the allele-specific analysis of chromatin contact points in non-model organisms or individuals with unknown haplotypes. RESULTS: We present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies from GAM data and employs a GAM-specific probabilistic model of haplotype capture to optimize phasing accuracy. Using a hybrid mouse embryonic stem cell line with known haplotype structure as a benchmark dataset, we assess correctness and completeness of the reconstructed haplotypes, and demonstrate the power of GAMIBHEAR to infer accurate genome-wide haplotypes from GAM data. AVAILABILITY AND IMPLEMENTATION: GAMIBHEAR is available as an R package under the open-source GPL-2 license at https://bitbucket.org/schwarzlab/gamibhear. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Nat Commun ; 12(1): 730, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33526789

ABSTRACT

Thousands of genomic structural variants (SVs) segregate in the human population and can impact phenotypic traits and diseases. Their identification in whole-genome sequence data of large cohorts is a major computational challenge. Most current approaches identify SVs in single genomes and afterwards merge the identified variants into a joint call set across many genomes. We describe the approach PopDel, which directly identifies deletions of about 500 to at least 10,000 bp in length in data of many genomes jointly, eliminating the need for subsequent variant merging. PopDel scales to tens of thousands of genomes as we demonstrate in evaluations on up to 49,962 genomes. We show that PopDel reliably reports common, rare and de novo deletions. On genomes with available high-confidence reference call sets PopDel shows excellent recall and precision. Genotype inheritance patterns in up to 6794 trios indicate that genotypes predicted by PopDel are more reliable than those of previous SV callers. Furthermore, PopDel's running time is competitive with the fastest tested previous tools. The demonstrated scalability and accuracy of PopDel enables routine scans for deletions in large-scale sequencing studies.


Subjects
Genome, Human/genetics , Genomic Structural Variation , Metagenomics/methods , Sequence Deletion , Feasibility Studies , Female , High-Throughput Nucleotide Sequencing , Humans , Inheritance Patterns , Male , Reproducibility of Results , Sequence Analysis, DNA
5.
Circ Genom Precis Med ; 14(1): e003029, 2021 02.
Article in English | MEDLINE | ID: mdl-33315477

ABSTRACT

BACKGROUND: Loss-of-function mutations in the LDL (low-density lipoprotein) receptor gene (LDLR) cause elevated levels of LDL cholesterol and premature cardiovascular disease. To date, a gain-of-function mutation in LDLR with a large effect on LDL cholesterol levels has not been described. Here, we searched for sequence variants in LDLR that have a large effect on LDL cholesterol levels. METHODS: We analyzed whole-genome sequencing data from 43 202 Icelanders. Single-nucleotide polymorphisms and structural variants including deletions, insertions, and duplications were genotyped using whole-genome sequencing-based data. LDL cholesterol associations were carried out in a sample of >100 000 Icelanders with genetic information (imputed or whole-genome sequencing). Molecular analyses were performed using RNA sequencing and protein expression assays in Epstein-Barr virus-transformed lymphocytes. RESULTS: We discovered a 2.5-kb deletion (del2.5) overlapping the 3' untranslated region of LDLR in 7 heterozygous carriers from a single family. The mean level of LDL cholesterol was 74% lower in del2.5 carriers than in 101 851 noncarriers, a difference of 2.48 mmol/L (96 mg/dL; P=8.4×10⁻⁸). Del2.5 results in production of an alternative mRNA isoform with a truncated 3' untranslated region. The truncation leads to a loss of target sites for microRNAs known to repress translation of LDLR. In Epstein-Barr virus-transformed lymphocytes derived from del2.5 carriers, expression of the alternative mRNA isoform was 1.84-fold higher than that of the wild-type isoform (P=0.0013), and there was 1.79-fold higher surface expression of the LDL receptor than in noncarriers (P=0.0086). We did not find a highly penetrant detrimental impact of lifelong very low levels of LDL cholesterol due to del2.5 on the health of the carriers. CONCLUSIONS: Del2.5 is the first reported gain-of-function mutation in LDLR causing a large reduction in LDL cholesterol. These data point to a role for alternative polyadenylation of LDLR mRNA as a potent regulator of LDL receptor expression in humans.


Subjects
Cholesterol, LDL/blood , Receptors, LDL/genetics , 3' Untranslated Regions , Alternative Splicing , Gain of Function Mutation , Gene Deletion , Genetic Vectors/genetics , Genetic Vectors/metabolism , Herpesvirus 4, Human/genetics , Heterozygote , Humans , Hyperlipoproteinemia Type II/genetics , Hyperlipoproteinemia Type II/pathology , Iceland , Lymphocytes/cytology , Lymphocytes/metabolism , MicroRNAs/metabolism , Pedigree , Protein Isoforms/genetics , RNA, Messenger/metabolism
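
The abstract in entry 5 reports the LDL cholesterol difference in two units, 2.48 mmol/L and 96 mg/dL. As a quick check of that conversion, the sketch below uses the standard factor for cholesterol (about 38.67 mg/dL per mmol/L, which follows from a molar mass of roughly 386.65 g/mol). The constant and function names are illustrative and are not taken from the paper.

```python
# Conversion factor for cholesterol: 1 mmol/L is about 38.67 mg/dL
# (molar mass of cholesterol is roughly 386.65 g/mol).
CHOLESTEROL_MG_DL_PER_MMOL_L = 38.67


def mmol_l_to_mg_dl(mmol_l):
    """Convert a cholesterol concentration from mmol/L to mg/dL."""
    return mmol_l * CHOLESTEROL_MG_DL_PER_MMOL_L


difference_mg_dl = mmol_l_to_mg_dl(2.48)
print(round(difference_mg_dl))  # 96
```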
6.
Med Genet ; 33(2): 133-145, 2021 Jun.
Article in English | MEDLINE | ID: mdl-38836034

ABSTRACT

High-throughput sequencing techniques have significantly increased the molecular diagnosis rate for patients with monogenic disorders. This is primarily due to a substantially increased identification rate of disease mutations in the coding sequence, primarily SNVs and indels. Further progress is hampered by difficulties in the detection of structural variants and the interpretation of variants outside the coding sequence. In this review, we provide an overview of how novel sequencing techniques and state-of-the-art algorithms can be used to discover small and structural variants across the whole genome, and introduce bioinformatic tools for predicting the effects that variants may have in the non-coding part of the genome.

7.
Hum Mol Genet ; 28(7): 1199-1211, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30476138

ABSTRACT

Urine dipstick tests are widely used in routine medical care to diagnose kidney, urinary tract, and metabolic diseases. Several environmental factors are known to affect the test results, whereas the effects of genetic diversity are largely unknown. We tested 32.5 million sequence variants for association with urinary biomarkers in a set of 150 274 Icelanders with urine dipstick measurements. We detected 20 association signals, of which 14 are novel, associating with at least one of five clinical entities defined by the urine dipstick: glucosuria, ketonuria, proteinuria, hematuria and urine pH. These include three independent glucosuria variants at SLC5A2, the gene encoding the sodium-dependent glucose transporter (SGLT2), a protein targeted pharmacologically to increase urinary glucose excretion in the treatment of diabetes. Two variants associating with proteinuria are in LRP2 and CUBN, encoding the co-transporters megalin and cubilin, respectively, that mediate proximal tubule protein uptake. One of the hematuria-associated variants is a rare, previously unreported 2.5 kb exonic deletion in COL4A3. Of the four signals associated with urine pH, we note that the pH-increasing alleles of two variants (POU2AF1, WDR72) associate significantly with increased risk of kidney stones. Our results reveal that genetic factors affect variability in urinary biomarkers, in both disease-dependent and disease-independent contexts.


Subjects
Biomarkers/analysis , Biomarkers/urine , Genetic Variation/genetics , Adult , Aged , Alleles , Female , Hematuria/genetics , Hematuria/urine , Humans , Hydrogen-Ion Concentration , Iceland , Ketosis/genetics , Ketosis/urine , Kidney/metabolism , Male , Middle Aged , Proteinuria/genetics , Proteinuria/urine , Sodium-Glucose Transporter 2/genetics , Whole Genome Sequencing/methods
8.
Nat Genet ; 50(12): 1674-1680, 2018 12.
Article in English | MEDLINE | ID: mdl-30397338

ABSTRACT

De novo mutations (DNMs) cause a large proportion of severe rare diseases of childhood. DNMs that occur early may result in mosaicism of both somatic and germ cells. Such early mutations can cause recurrence of disease. We scanned 1,007 sibling pairs from 251 families and identified 878 DNMs shared by siblings (ssDNMs) at 448 genomic sites. We estimated DNM recurrence probability based on parental mosaicism, sharing of DNMs among siblings, parent-of-origin, mutation type and genomic position. We detected 57.2% of ssDNMs in the parental blood. The recurrence probability of a DNM decreases by 2.27% per year for paternal DNMs and 1.78% per year for maternal DNMs. Maternal ssDNMs are more likely to be T>C mutations than paternal ssDNMs, and less likely to be C>T mutations. Depending on the properties of the DNM, the recurrence probability ranges from 0.011% to 28.5%. We have launched an online calculator to allow estimation of DNM recurrence probability for research purposes.


Subjects
Family , Inheritance Patterns , Mutation , Parent-Child Relations , Adult , Child , Embryonic Germ Cells/metabolism , Family Characteristics , Female , Germ-Line Mutation , Humans , Inheritance Patterns/genetics , Male , Mosaicism , Pedigree
9.
Nat Genet ; 50(11): 1616, 2018 11.
Article in English | MEDLINE | ID: mdl-30237445

ABSTRACT

In the version of this article published, statements about the impact of insertions and deletions on gene conversions were incorrect. We reported a bias toward deletions, whereas in fact the bias was toward insertions. We are deeply indebted to Laurent Duret and Brice Letcher for noticing this mistake in our manuscript. The following statements are incorrect in the published manuscript.

10.
Sci Data ; 4: 170115, 2017 09 21.
Article in English | MEDLINE | ID: mdl-28933420

ABSTRACT

Understanding of sequence diversity is the cornerstone of analysis of genetic disorders, population genetics, and evolutionary biology. Here, we present an update of our sequencing set to 15,220 Icelanders who we sequenced to an average genome-wide coverage of 34X. We identified 39,020,168 autosomal variants passing GATK filters: 31,079,378 SNPs and 7,940,790 indels. Calling de novo mutations (DNMs) is a formidable challenge given the high false positive rate in sequencing datasets relative to the mutation rate. Here we addressed this issue by using segregation of alleles in three-generation families. Using this transmission assay, we controlled the false positive rate and identified 108,778 high quality DNMs. Furthermore, we used our extended family structure and read pair tracing of DNMs to a panel of phased SNPs, to determine the parent of origin of 42,961 DNMs.


Subjects
Genome, Human , Humans , INDEL Mutation , Iceland , Polymorphism, Single Nucleotide
11.
Nat Genet ; 49(11): 1654-1660, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28945251

ABSTRACT

A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in both small and population-scale sequencing studies.


Subjects
Algorithms , Genome, Human , Genotyping Techniques/instrumentation , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/statistics & numerical data , Alleles , Base Sequence , Computer Graphics , HLA Antigens/genetics , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Sequence Alignment , Sequence Analysis, DNA/methods , Software
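
The Graphtyper abstract in entry 11 describes a pangenome as a variation-aware graph in which possible haplotypes are paths. The sketch below is a minimal illustration of that idea only, not Graphtyper's algorithm or data model: a tiny graph with one SNP "bubble" whose paths spell out two haplotypes. The node names, sequences, and the substring-based read check are all assumptions made for the example.

```python
def haplotypes(nodes, edges, start, end):
    """Enumerate every start-to-end path and spell out its sequence."""
    def walk(node):
        if node == end:
            yield nodes[node]
            return
        for nxt in edges.get(node, []):
            for suffix in walk(nxt):
                yield nodes[node] + suffix
    return list(walk(start))


# Hypothetical graph: reference flanks around a single T/C SNP.
nodes = {"ref1": "ACG", "snp_ref": "T", "snp_alt": "C", "ref2": "GA"}
edges = {
    "ref1": ["snp_ref", "snp_alt"],
    "snp_ref": ["ref2"],
    "snp_alt": ["ref2"],
}

paths = haplotypes(nodes, edges, "ref1", "ref2")
print(paths)

# A read supports the haplotypes whose sequence contains it
# (exact matching only; real aligners tolerate mismatches).
read = "GCG"
supported = [h for h in paths if read in h]
print(supported)
```

Exhaustive path enumeration grows exponentially with the number of variants, so it is only workable for toy graphs like this one.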
12.
Nature ; 549(7673): 519-522, 2017 09 28.
Article in English | MEDLINE | ID: mdl-28959963

ABSTRACT

The characterization of mutational processes that generate sequence diversity in the human genome is of paramount importance both to medical genetics and to evolutionary studies. To understand how the age and sex of transmitting parents affect de novo mutations, here we sequence 1,548 Icelanders, their parents, and, for a subset of 225, at least one child, to 35× genome-wide coverage. We find 108,778 de novo mutations, both single nucleotide polymorphisms and indels, and determine the parent of origin of 42,961. The number of de novo mutations from mothers increases by 0.37 per year of age (95% CI 0.32-0.43), a quarter of the 1.51 per year from fathers (95% CI 1.45-1.57). The number of clustered mutations increases faster with the mother's age than with the father's, and the genomic span of maternal de novo mutation clusters is greater than that of paternal ones. The types of de novo mutation from mothers change substantially with age, with a 0.26% (95% CI 0.19-0.33%) decrease in cytosine-phosphate-guanine to thymine-phosphate-guanine (CpG>TpG) de novo mutations and a 0.33% (95% CI 0.28-0.38%) increase in C>G de novo mutations per year, respectively. Remarkably, these age-related changes are not distributed uniformly across the genome. A striking example is a 20 megabase region on chromosome 8p, with a maternal C>G mutation rate that is up to 50-fold greater than the rest of the genome. The age-related accumulation of maternal non-crossover gene conversions also mostly occurs within these regions. Increased sequence diversity and linkage disequilibrium of C>G variants within regions affected by excess maternal mutations indicate that the underlying mutational process has persisted in humans for thousands of years. Moreover, the regional excess of C>G variation in humans is largely shared by chimpanzees, less by gorillas, and is almost absent from orangutans. 
This demonstrates that sequence diversity in humans results from evolving interactions between age, sex, mutation type, and genomic location.


Subjects
Aging/genetics , Germ-Line Mutation/genetics , Maternal Age , Mutagenesis , Parents , Paternal Age , Adolescent , Adult , Aged , Animals , Child , Chromosomes, Human, Pair 8/genetics , Evolution, Molecular , Female , GC Rich Sequence , Genome, Human/genetics , Gorilla gorilla/genetics , Humans , INDEL Mutation , Iceland , Linkage Disequilibrium/genetics , Male , Middle Aged , Mutation Rate , Pan troglodytes/genetics , Polymorphism, Single Nucleotide , Pongo/genetics , Young Adult
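
The abstract in entry 12 gives per-year increases in de novo mutations of 0.37 for mothers and 1.51 for fathers, and calls the maternal rate "a quarter" of the paternal one. The sketch below works through that arithmetic. It uses only the two slopes quoted in the abstract and assumes a linear effect of age; the abstract gives no intercepts, so only differences between ages are computed. The names are illustrative.

```python
# Per-year increases in de novo mutations, as quoted in the abstract.
MATERNAL_DNM_PER_YEAR = 0.37
PATERNAL_DNM_PER_YEAR = 1.51


def extra_dnms(extra_years_mother, extra_years_father):
    """Expected additional de novo mutations for older parents (linear assumption)."""
    return (extra_years_mother * MATERNAL_DNM_PER_YEAR
            + extra_years_father * PATERNAL_DNM_PER_YEAR)


ratio = MATERNAL_DNM_PER_YEAR / PATERNAL_DNM_PER_YEAR
print(round(ratio, 2))  # 0.25, i.e. about a quarter

# Both parents ten years older: roughly 3.7 maternal plus 15.1 paternal mutations.
print(round(extra_dnms(10, 10), 1))  # 18.8
```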
13.
Nat Commun ; 8: 14755, 2017 05 03.
Article in English | MEDLINE | ID: mdl-28466842

ABSTRACT

Lynch syndrome, caused by germline mutations in the mismatch repair genes, is associated with increased cancer risk. Here, using a large whole-genome sequencing data bank, cancer registry and colorectal tumour bank, we determine the prevalence of Lynch syndrome, associated cancer risks and pathogenicity of several variants in the Icelandic population. We use colorectal cancer samples from 1,182 patients diagnosed between 2000 and 2009. One hundred and thirty-two (11.2%) tumours are mismatch repair deficient per immunohistochemistry. Twenty-one (1.8%) have Lynch syndrome while 106 (9.0%) have somatic hypermethylation or mutations in the mismatch repair genes. The population prevalence of Lynch syndrome is 0.442%. We discover a translocation disrupting MLH1 and three mutations in MSH6 and PMS2 that increase endometrial, colorectal, brain and ovarian cancer risk. We find thirteen mismatch repair variants of uncertain significance that are not associated with cancer risk. We find that founder mutations in MSH6 and PMS2 prevail in Iceland, unlike in most other populations.


Subjects
Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , DNA-Binding Proteins/genetics , Founder Effect , Germ-Line Mutation , Mismatch Repair Endonuclease PMS2/genetics , Adult , Aged , Aged, 80 and over , Base Pair Mismatch , Colorectal Neoplasms, Hereditary Nonpolyposis/epidemiology , Female , Genetic Predisposition to Disease , Humans , Iceland/epidemiology , Male , Middle Aged , Prevalence
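
The percentages in the abstract of entry 13 can be recomputed from the counts it reports, taking the 1,182 colorectal cancer patients as the denominator. The sketch below does that; the dictionary keys and the helper name are illustrative labels, not terms from the paper.

```python
TOTAL_PATIENTS = 1182

# Counts as reported in the abstract.
counts = {
    "mismatch_repair_deficient": 132,
    "lynch_syndrome": 21,
    "somatic_hypermethylation_or_mutation": 106,
}


def percent(count, total=TOTAL_PATIENTS):
    """Share of the cohort as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {name: percent(n) for name, n in counts.items()}
print(percentages)
```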
14.
Hum Mol Genet ; 26(12): 2364-2376, 2017 06 15.
Article in English | MEDLINE | ID: mdl-28398513

ABSTRACT

Common sequence variants at the haptoglobin gene (HP) have been associated with blood lipid levels. Through whole-genome sequencing of 8,453 Icelanders, we discovered a splice donor founder mutation in HP (NM_001126102.1:c.190+1G>C, minor allele frequency = 0.56%). This mutation occurs on the HP1 allele of the common copy number variant in HP and leads to a loss of function of HP1. It associates with lower levels of haptoglobin (P = 2.1 × 10⁻⁵⁴), higher levels of non-high density lipoprotein cholesterol (β = 0.26 mmol/l, P = 2.6 × 10⁻⁹) and greater risk of coronary artery disease (odds ratio = 1.30, 95% confidence interval: 1.10-1.54, P = 0.0024). Through haplotype analysis and with RNA sequencing, we provide evidence of a causal relationship between one of the two haptoglobin isoforms, namely Hp1, and lower levels of non-HDL cholesterol. Furthermore, we show that the HP1 allele associates with various other quantitative biological traits.


Subjects
Coronary Artery Disease/genetics , Haptoglobins/genetics , Adult , Alleles , Base Sequence , Coronary Artery Disease/metabolism , DNA Copy Number Variations/genetics , Female , Gene Frequency/genetics , Genetic Association Studies/methods , Genetic Variation , Haptoglobins/metabolism , Humans , Iceland , Lipids/blood , Lipids/genetics , Lipoproteins/genetics , Male , Mutation , Odds Ratio , RNA Splice Sites/genetics , Risk Factors
15.
Nat Genet ; 49(4): 588-593, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28250455

ABSTRACT

Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r² > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10⁻⁸, odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease.


Subjects
Base Sequence/genetics , Genetic Variation/genetics , Genome, Human/genetics , Animals , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Genotype , Humans , Linkage Disequilibrium/genetics , Myocardial Infarction/genetics , Pan paniscus/genetics , Phenotype
16.
Bioinformatics ; 33(24): 4041-4048, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-27591079

ABSTRACT

MOTIVATION: Microsatellites, also known as short tandem repeats (STRs), are tracts of repetitive DNA sequences containing motifs ranging from two to six bases. Microsatellites are one of the most abundant types of variation in the human genome, after single nucleotide polymorphisms (SNPs) and indels. Microsatellite analysis has a wide range of applications, including medical genetics, forensics and construction of genetic genealogy. However, microsatellite variations are rarely considered in whole-genome sequencing studies, in large part due to a lack of tools capable of analyzing them. RESULTS: Here we present a microsatellite genotyper, optimized for Illumina WGS data, which is both faster and more accurate than other methods previously presented. There are two main ingredients to our improvements. First, we reduce the amount of sequencing data necessary for creating microsatellite profiles by using previously aligned sequencing data. Second, we use population information to train microsatellite and individual specific error profiles. By comparing our genotyping results to genotypes generated by capillary electrophoresis, we show that our error rates are 50% lower than those of lobSTR, another program specifically developed to determine microsatellite genotypes. AVAILABILITY AND IMPLEMENTATION: Source code is available on GitHub: https://github.com/DecodeGenetics/popSTR. CONTACT: snaedis.kristmundsdottir@decode.is or bjarni.halldorsson@decode.is.


Subjects
Microsatellite Repeats , Genotype , Humans , Software , Whole Genome Sequencing
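
The abstract in entry 16 defines microsatellites as tandem repeats of motifs two to six bases long. The sketch below only illustrates that definition and is not popSTR's genotyping method: a regular expression finds exact tandem repeats with a 2-6 base motif. The threshold of at least three copies, the exclusion of single-base runs, and the function name are assumptions chosen for the example.

```python
import re

# A 2-6 base motif (shortest candidate tried first) followed by at
# least two further exact copies, i.e. three or more copies in total.
STR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{2,}")


def find_strs(sequence):
    """Return (start, motif, copies) for each exact tandem repeat found."""
    hits = []
    for match in STR_PATTERN.finditer(sequence):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Runs of a single base (e.g. AAAAAA) are homopolymers, not 2-6 bp motifs.
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


print(find_strs("TTACACACACGG"))  # [(2, 'AC', 4)]
```

Real sequencing reads contain interrupted and error-bearing repeats, which is why a dedicated genotyper models error profiles instead of relying on exact matching.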
17.
Sci Rep ; 6: 36189, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27811963

ABSTRACT

Only a few common variants in the sequence of the genome have been shown to impact cognitive traits. Here we demonstrate that polygenic scores of educational attainment predict specific aspects of childhood cognition, as measured with IQ. Recently, three sequence variants were shown to associate with educational attainment, a confluence phenotype of genetic and environmental factors contributing to academic success. We show that one of these variants associating with educational attainment, rs4851266-T, also associates with Verbal IQ in dyslexic children (P = 4.3 × 10⁻⁴, β = 0.16 s.d.). The effect of 0.16 s.d. corresponds to 1.4 IQ points for heterozygotes and 2.8 IQ points for homozygotes. We verified this association in independent samples consisting of adults (P = 8.3 × 10⁻⁵, β = 0.12 s.d., combined P = 2.2 × 10⁻⁷, β = 0.14 s.d.). Childhood cognition is unlikely to be affected by education attained later in life, and the variant explains a greater fraction of the variance in verbal IQ than in educational attainment (0.7% vs 0.12%, P = 1.0 × 10⁻⁵).


Subjects
Cognition , Dyslexia/genetics , Intelligence/genetics , Polymorphism, Single Nucleotide , Academic Success , Adolescent , Adult , Child , Chromosomes, Human, Pair 2/genetics , Databases, Genetic , Educational Status , Female , Genetic Markers , Humans , Iceland , Male , Multifactorial Inheritance , Nuclear Proteins/genetics
18.
Nat Genet ; 48(11): 1377-1384, 2016 11.
Article in English | MEDLINE | ID: mdl-27643539

ABSTRACT

Meiotic recombination involves a combination of gene conversion and crossover events that, along with mutations, produce germline genetic diversity. Here we report the discovery of 3,176 SNP and 61 indel gene conversions. Our estimate of the non-crossover (NCO) gene conversion rate (G) is 7.0 for SNPs and 5.8 for indels per megabase per generation, and the GC bias is 67.6%. For indels, we demonstrate a 65.6% preference for the shorter allele. NCO gene conversions from mothers are longer than those from fathers, and G is 2.17 times greater in mothers. Notably, G increases with the age of mothers, but not the age of fathers. A disproportionate number of NCO gene conversions in older mothers occur outside double-strand break (DSB) regions and in regions with relatively low GC content. This points to age-related changes in the mechanisms of meiotic gene conversion in oocytes.


Subjects
Gene Conversion , Meiosis , Adult , Base Composition , Child , Female , Humans , Male , Maternal Age , Polymorphism, Single Nucleotide , Sex Characteristics
19.
Bioinformatics ; 32(14): 2202-4, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27153590

ABSTRACT

Advances in sequencing capacity have led to the generation of unprecedented amounts of genomic data. The processing of this data frequently leads to I/O bottlenecks, e.g. when analyzing a small genomic region across a large number of samples. The largest I/O burden is, however, often not imposed by the amount of data needed for the analysis but rather by index files that help retrieve this data. We have developed chopBAI, a program that can chop a BAM index (BAI) file into small pieces. The program outputs a list of BAI files each indexing a specified genomic interval. The output files are much smaller in size but maintain compatibility with existing software tools. We show how preprocessing BAI files with chopBAI can lead to a reduction of I/O by more than 95% during the analysis of 10 kb genomic regions, eventually enabling the joint analysis of more than 10 000 individuals. AVAILABILITY AND IMPLEMENTATION: The software is implemented in C++, GPL licensed and available at http://github.com/DecodeGenetics/chopBAI. CONTACT: birte.kehr@decode.is.


Subjects
Computational Biology/methods , Genomics/methods , Software , Humans
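
The chopBAI abstract in entry 19 notes that a small genomic region needs only a small part of a BAM index. To illustrate why, the sketch below ports the `reg2bin` and `reg2bins` helpers given in the SAM/BAM format specification, which map a 0-based half-open interval to the bins of the BAI binning scheme. This is not chopBAI's source code, and the 10 kb example coordinates are arbitrary.

```python
# (shift, offset) pairs for the five non-root levels of the BAI
# binning scheme, from 64 Mb bins down to 16 kb bins.
LEVELS = ((26, 1), (23, 9), (20, 73), (17, 585), (14, 4681))


def reg2bin(beg, end):
    """Smallest bin that fully contains the 0-based half-open interval [beg, end)."""
    end -= 1
    for shift, offset in reversed(LEVELS):
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0


def reg2bins(beg, end):
    """All bins that may hold alignments overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for shift, offset in LEVELS:
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins


# An arbitrary 10 kb region touches only a handful of bins.
region_bins = reg2bins(1_000_000, 1_010_000)
print(region_bins)
print(len(region_bins))
```

Because a query over such a region consults so few bins, most of a whole-genome index is irrelevant to it, which is consistent with the I/O reduction the abstract reports for chopped index files.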
20.
Hum Mol Genet ; 25(5): 1008-18, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26740556

ABSTRACT

Transcriptional and splicing anomalies have been observed in intron 8 of the CASP8 gene (encoding procaspase-8) in association with cutaneous basal-cell carcinoma (BCC) and linked to a germline SNP rs700635. Here, we show that the rs700635[C] allele, which is associated with increased risk of BCC and breast cancer, is protective against prostate cancer [odds ratio (OR) = 0.91, P = 1.0 × 10⁻⁶]. rs700635[C] is also associated with failures to correctly splice out CASP8 intron 8 in breast and prostate tumours and in corresponding normal tissues. Investigation of rs700635[C] carriers revealed that they have a human-specific short interspersed element-variable number of tandem repeat-Alu (SINE-VNTR-Alu), subfamily-E retrotransposon (SVA-E) inserted into CASP8 intron 8. The SVA-E shows evidence of prior activity, because it has transduced some CASP8 sequences during subsequent retrotransposition events. Whole-genome sequence (WGS) data were used to tag the SVA-E with a surrogate SNP rs1035142[T] (r² = 0.999), which showed associations with both the splicing anomalies (P = 6.5 × 10⁻³²) and with protection against prostate cancer (OR = 0.91, P = 3.8 × 10⁻⁷).


Subjects
Breast Neoplasms/genetics , Carcinoma, Basal Cell/genetics , Caspase 8/genetics , Prostatic Neoplasms/genetics , RNA Splicing , Retroelements , Skin Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Alleles , Base Sequence , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Carcinoma, Basal Cell/metabolism , Carcinoma, Basal Cell/pathology , Caspase 8/metabolism , Female , Genome-Wide Association Study , Humans , Introns , Male , Middle Aged , Molecular Sequence Data , Odds Ratio , Polymorphism, Single Nucleotide , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , Prostatic Neoplasms/prevention & control , Protective Factors , Skin Neoplasms/metabolism , Skin Neoplasms/pathology