1.
Bioinformatics ; 38(3): 604-611, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34726732

ABSTRACT

MOTIVATION: With the increasing throughput of sequencing technologies, structural variant (SV) detection has become possible across tens of thousands of genomes. Non-reference sequence (NRS) variants have drawn less attention compared with other types of SVs due to the computational complexity of detecting them. When using short-read data, the detection of NRS variants inevitably involves a de novo assembly which requires high-quality sequence data at high coverage. Previous studies have demonstrated how sequence data of multiple genomes can be combined for the reliable detection of NRS variants. However, the algorithms proposed in these studies have limited scalability to larger sets of genomes. RESULTS: We introduce PopIns2, a tool to discover and characterize NRS variants in many genomes, which scales to considerably larger numbers of genomes than its predecessor PopIns. In this article, we briefly outline the PopIns2 workflow and highlight our novel algorithmic contributions. We developed an entirely new approach for merging contig assemblies of unaligned reads from many genomes into a single set of NRS using a colored de Bruijn graph. Our tests on simulated data indicate that the new merging algorithm ranks among the best approaches in terms of quality and reliability and that PopIns2 shows the best precision for a growing number of genomes processed. Results on the Polaris Diversity Cohort and a set of 1000 Icelandic human genomes demonstrate unmatched scalability for the application on population-scale datasets. AVAILABILITY AND IMPLEMENTATION: The source code of PopIns2 is available from https://github.com/kehrlab/PopIns2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Software; Humans; Sequence Analysis, DNA/methods; Reproducibility of Results; Genome, Human; High-Throughput Nucleotide Sequencing/methods
2.
Nature ; 549(7673): 519-522, 2017 09 28.
Article in English | MEDLINE | ID: mdl-28959963

ABSTRACT

The characterization of mutational processes that generate sequence diversity in the human genome is of paramount importance both to medical genetics and to evolutionary studies. To understand how the age and sex of transmitting parents affect de novo mutations, here we sequence 1,548 Icelanders, their parents, and, for a subset of 225, at least one child, to 35× genome-wide coverage. We find 108,778 de novo mutations, both single nucleotide polymorphisms and indels, and determine the parent of origin of 42,961. The number of de novo mutations from mothers increases by 0.37 per year of age (95% CI 0.32-0.43), a quarter of the 1.51 per year from fathers (95% CI 1.45-1.57). The number of clustered mutations increases faster with the mother's age than with the father's, and the genomic span of maternal de novo mutation clusters is greater than that of paternal ones. The types of de novo mutation from mothers change substantially with age, with a 0.26% (95% CI 0.19-0.33%) decrease in cytosine-phosphate-guanine to thymine-phosphate-guanine (CpG>TpG) de novo mutations and a 0.33% (95% CI 0.28-0.38%) increase in C>G de novo mutations per year, respectively. Remarkably, these age-related changes are not distributed uniformly across the genome. A striking example is a 20 megabase region on chromosome 8p, with a maternal C>G mutation rate that is up to 50-fold greater than the rest of the genome. The age-related accumulation of maternal non-crossover gene conversions also mostly occurs within these regions. Increased sequence diversity and linkage disequilibrium of C>G variants within regions affected by excess maternal mutations indicate that the underlying mutational process has persisted in humans for thousands of years. Moreover, the regional excess of C>G variation in humans is largely shared by chimpanzees, less by gorillas, and is almost absent from orangutans. This demonstrates that sequence diversity in humans results from evolving interactions between age, sex, mutation type, and genomic location.


Subjects
Aging/genetics; Germ-Line Mutation/genetics; Maternal Age; Mutagenesis; Parents; Paternal Age; Adolescent; Adult; Aged; Animals; Child; Chromosomes, Human, Pair 8/genetics; Evolution, Molecular; Female; GC Rich Sequence; Genome, Human/genetics; Gorilla gorilla/genetics; Humans; INDEL Mutation; Iceland; Linkage Disequilibrium/genetics; Male; Middle Aged; Mutation Rate; Pan troglodytes/genetics; Polymorphism, Single Nucleotide; Pongo/genetics; Young Adult
3.
Bioinformatics ; 37(19): 3128-3135, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33830196

ABSTRACT

MOTIVATION: Genome Architecture Mapping (GAM) was recently introduced as a digestion- and ligation-free method to detect chromatin conformation. Orthogonal to existing approaches based on chromatin conformation capture (3C), GAM's ability to capture both inter- and intra-chromosomal contacts from low amounts of input data makes it particularly well suited for allele-specific analyses in a clinical setting. Allele-specific analyses are powerful tools to investigate the effects of genetic variants on many cellular phenotypes including chromatin conformation, but require the haplotypes of the individuals under study to be known a priori. So far, however, no algorithm exists for haplotype reconstruction and phasing of genetic variants from GAM data, hindering the allele-specific analysis of chromatin contact points in non-model organisms or individuals with unknown haplotypes. RESULTS: We present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies from GAM data and employs a GAM-specific probabilistic model of haplotype capture to optimize phasing accuracy. Using a hybrid mouse embryonic stem cell line with known haplotype structure as a benchmark dataset, we assess correctness and completeness of the reconstructed haplotypes, and demonstrate the power of GAMIBHEAR to infer accurate genome-wide haplotypes from GAM data. AVAILABILITY AND IMPLEMENTATION: GAMIBHEAR is available as an R package under the open-source GPL-2 license at https://bitbucket.org/schwarzlab/gamibhear. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Hum Mol Genet ; 28(7): 1199-1211, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30476138

ABSTRACT

Urine dipstick tests are widely used in routine medical care to diagnose kidney, urinary tract and metabolic diseases. Several environmental factors are known to affect the test results, whereas the effects of genetic diversity are largely unknown. We tested 32.5 million sequence variants for association with urinary biomarkers in a set of 150 274 Icelanders with urine dipstick measurements. We detected 20 association signals, of which 14 are novel, associating with at least one of five clinical entities defined by the urine dipstick: glucosuria, ketonuria, proteinuria, hematuria and urine pH. These include three independent glucosuria variants at SLC5A2, the gene encoding the sodium-dependent glucose transporter (SGLT2), a protein targeted pharmacologically to increase urinary glucose excretion in the treatment of diabetes. Two variants associating with proteinuria are in LRP2 and CUBN, encoding the co-transporters megalin and cubilin, respectively, that mediate proximal tubule protein uptake. One of the hematuria-associated variants is a rare, previously unreported 2.5 kb exonic deletion in COL4A3. Of the four signals associated with urine pH, we note that the pH-increasing alleles of two variants (POU2AF1, WDR72) associate significantly with increased risk of kidney stones. Our results reveal that genetic factors affect variability in urinary biomarkers, in both disease-dependent and disease-independent contexts.


Subjects
Biomarkers/analysis; Biomarkers/urine; Genetic Variation/genetics; Adult; Aged; Alleles; Female; Hematuria/genetics; Hematuria/urine; Humans; Hydrogen-Ion Concentration; Iceland; Ketosis/genetics; Ketosis/urine; Kidney/metabolism; Male; Middle Aged; Proteinuria/genetics; Proteinuria/urine; Sodium-Glucose Transporter 2/genetics; Whole Genome Sequencing/methods
5.
Hum Mol Genet ; 26(12): 2364-2376, 2017 06 15.
Article in English | MEDLINE | ID: mdl-28398513

ABSTRACT

Common sequence variants at the haptoglobin gene (HP) have been associated with blood lipid levels. Through whole-genome sequencing of 8,453 Icelanders, we discovered a splice donor founder mutation in HP (NM_001126102.1:c.190+1G>C, minor allele frequency = 0.56%). This mutation occurs on the HP1 allele of the common copy number variant in HP and leads to a loss of function of HP1. It associates with lower levels of haptoglobin (P = 2.1 × 10^-54), higher levels of non-high density lipoprotein cholesterol (β = 0.26 mmol/l, P = 2.6 × 10^-9) and greater risk of coronary artery disease (odds ratio = 1.30, 95% confidence interval: 1.10-1.54, P = 0.0024). Through haplotype analysis and with RNA sequencing, we provide evidence of a causal relationship between one of the two haptoglobin isoforms, namely Hp1, and lower levels of non-HDL cholesterol. Furthermore, we show that the HP1 allele associates with various other quantitative biological traits.


Subjects
Coronary Artery Disease/genetics; Haptoglobins/genetics; Adult; Alleles; Base Sequence; Coronary Artery Disease/metabolism; DNA Copy Number Variations/genetics; Female; Gene Frequency/genetics; Genetic Association Studies/methods; Genetic Variation; Haptoglobins/metabolism; Humans; Iceland; Lipids/blood; Lipids/genetics; Lipoproteins/genetics; Male; Mutation; Odds Ratio; RNA Splice Sites/genetics; Risk Factors
6.
Hum Mol Genet ; 25(5): 1008-18, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26740556

ABSTRACT

Transcriptional and splicing anomalies have been observed in intron 8 of the CASP8 gene (encoding procaspase-8) in association with cutaneous basal-cell carcinoma (BCC) and linked to a germline SNP rs700635. Here, we show that the rs700635[C] allele, which is associated with increased risk of BCC and breast cancer, is protective against prostate cancer [odds ratio (OR) = 0.91, P = 1.0 × 10^-6]. rs700635[C] is also associated with failures to correctly splice out CASP8 intron 8 in breast and prostate tumours and in corresponding normal tissues. Investigation of rs700635[C] carriers revealed that they have a human-specific short interspersed element-variable number of tandem repeat-Alu (SINE-VNTR-Alu), subfamily-E retrotransposon (SVA-E) inserted into CASP8 intron 8. The SVA-E shows evidence of prior activity, because it has transduced some CASP8 sequences during subsequent retrotransposition events. Whole-genome sequence (WGS) data were used to tag the SVA-E with a surrogate SNP rs1035142[T] (r^2 = 0.999), which showed associations with both the splicing anomalies (P = 6.5 × 10^-32) and with protection against prostate cancer (OR = 0.91, P = 3.8 × 10^-7).


Subjects
Breast Neoplasms/genetics; Carcinoma, Basal Cell/genetics; Caspase 8/genetics; Prostatic Neoplasms/genetics; RNA Splicing; Retroelements; Skin Neoplasms/genetics; Adult; Aged; Aged, 80 and over; Alleles; Base Sequence; Breast Neoplasms/metabolism; Breast Neoplasms/pathology; Carcinoma, Basal Cell/metabolism; Carcinoma, Basal Cell/pathology; Caspase 8/metabolism; Female; Genome-Wide Association Study; Humans; Introns; Male; Middle Aged; Molecular Sequence Data; Odds Ratio; Polymorphism, Single Nucleotide; Prostatic Neoplasms/metabolism; Prostatic Neoplasms/pathology; Prostatic Neoplasms/prevention & control; Protective Factors; Skin Neoplasms/metabolism; Skin Neoplasms/pathology
7.
Bioinformatics ; 33(24): 4041-4048, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-27591079

ABSTRACT

MOTIVATION: Microsatellites, also known as short tandem repeats (STRs), are tracts of repetitive DNA sequences containing motifs ranging from two to six bases. Microsatellites are one of the most abundant types of variation in the human genome, after single nucleotide polymorphisms (SNPs) and indels. Microsatellite analysis has a wide range of applications, including medical genetics, forensics and construction of genetic genealogy. However, microsatellite variations are rarely considered in whole-genome sequencing studies, in large part due to a lack of tools capable of analyzing them. RESULTS: Here we present a microsatellite genotyper, optimized for Illumina WGS data, which is both faster and more accurate than other methods previously presented. There are two main ingredients to our improvements. First, we reduce the amount of sequencing data necessary for creating microsatellite profiles by using previously aligned sequencing data. Second, we use population information to train microsatellite and individual specific error profiles. By comparing our genotyping results to genotypes generated by capillary electrophoresis we show that our error rates are 50% lower than those of lobSTR, another program specifically developed to determine microsatellite genotypes. AVAILABILITY AND IMPLEMENTATION: Source code is available on GitHub: https://github.com/DecodeGenetics/popSTR. CONTACT: snaedis.kristmundsdottir@decode.is or bjarni.halldorsson@decode.is.


Subjects
Microsatellite Repeats; Genotype; Humans; Software; Whole Genome Sequencing
8.
Bioinformatics ; 32(14): 2202-4, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27153590

ABSTRACT

Advances in sequencing capacity have led to the generation of unprecedented amounts of genomic data. The processing of this data frequently leads to I/O bottlenecks, e.g., when analyzing a small genomic region across a large number of samples. The largest I/O burden is, however, often not imposed by the amount of data needed for the analysis but rather by index files that help retrieve this data. We have developed chopBAI, a program that can chop a BAM index (BAI) file into small pieces. The program outputs a list of BAI files each indexing a specified genomic interval. The output files are much smaller in size but maintain compatibility with existing software tools. We show how preprocessing BAI files with chopBAI can lead to a reduction of I/O by more than 95% during the analysis of 10 kb genomic regions, eventually enabling the joint analysis of more than 10 000 individuals. AVAILABILITY AND IMPLEMENTATION: The software is implemented in C++, GPL licensed and available at http://github.com/DecodeGenetics/chopBAI. CONTACT: birte.kehr@decode.is.


Subjects
Computational Biology/methods; Genomics/methods; Software; Humans
9.
Bioinformatics ; 32(7): 961-7, 2016 04 01.
Article in English | MEDLINE | ID: mdl-25926346

ABSTRACT

MOTIVATION: The detection of genomic structural variation (SV) has advanced tremendously in recent years due to progress in high-throughput sequencing technologies. Novel sequence insertions, insertions without similarity to a human reference genome, have received less attention than other types of SVs due to the computational challenges in their detection from short read sequencing data, which inherently involves de novo assembly. De novo assembly is not only computationally challenging, but also requires high-quality data. Although the reads from a single individual may not always meet this requirement, using reads from multiple individuals can increase power to detect novel insertions. RESULTS: We have developed the program PopIns, which can discover and characterize non-reference insertions of 100 bp or longer on a population scale. In this article, we describe the approach we implemented in PopIns. It takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions. Our tests on simulated data indicate that the merging step greatly improves the quality and reliability of predicted insertions and that PopIns shows significantly better recall and precision than the recent tool MindTheGap. Preliminary results on a dataset of 305 Icelanders demonstrate the practicality of the new approach. AVAILABILITY AND IMPLEMENTATION: The source code of PopIns is available from http://github.com/bkehr/popins. CONTACT: birte.kehr@decode.is. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Genomic Structural Variation; Humans; Mutagenesis, Insertional; Reproducibility of Results
10.
Bioinformatics ; 30(4): 540-8, 2014 Feb 15.
Article in English | MEDLINE | ID: mdl-24336806

ABSTRACT

MOTIVATION: Owing to recent advancements in high-throughput technologies, protein-protein interaction networks of more and more species are becoming available in public databases. The question of how to identify functionally conserved proteins across species attracts a lot of attention in computational biology. Network alignments provide a systematic way to solve this problem. However, most existing alignment tools encounter limitations in tackling this problem. Therefore, the demand for faster and more efficient alignment tools is growing. RESULTS: We present a fast and accurate algorithm, NetCoffee, which finds a global alignment of multiple protein-protein interaction networks. NetCoffee searches for a global alignment by maximizing a target function using simulated annealing on a set of weighted bipartite graphs that are constructed using a triplet approach similar to T-Coffee. To assess its performance, NetCoffee was applied to four real datasets. Our results suggest that NetCoffee remedies several limitations of previous algorithms, outperforms all existing alignment tools in terms of speed and nevertheless identifies biologically meaningful alignments. AVAILABILITY: The source code and data are freely available for download under the GNU GPL v3 license at https://code.google.com/p/netcoffee/.


Subjects
Algorithms; Computational Biology/methods; Gene Regulatory Networks; Protein Interaction Mapping/methods; Proteins/metabolism; Sequence Alignment/methods; Animals; Bacteria; Databases, Protein; Humans; Models, Biological; Software
11.
BMC Bioinformatics ; 15: 99, 2014 Apr 09.
Article in English | MEDLINE | ID: mdl-24712884

ABSTRACT

BACKGROUND: Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. RESULTS: We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. CONCLUSION: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.


Subjects
Genomics/methods; Sequence Alignment/methods; Algorithms; Computer Graphics; Genome
12.
BMC Bioinformatics ; 12 Suppl 9: S15, 2011 Oct 05.
Article in English | MEDLINE | ID: mdl-22151882

ABSTRACT

BACKGROUND: Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast, but heuristic, and hence sometimes miss significant matches. RESULTS: We present here the local pairwise aligner STELLAR that has full sensitivity for ε-alignments, i.e. guarantees to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. CONCLUSIONS: STELLAR is very practical and fast on very long sequences which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de.


Subjects
Genomics/methods; Sequence Alignment/methods; Software; Algorithms; Animals; Drosophila/genetics
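Illustrative note on entry 12: the abstract characterizes ε-alignments as local alignments with a given minimal length and maximal error rate. The sketch below checks that condition for an already computed alignment. It is not STELLAR code; the function name is hypothetical, and it assumes the error rate is the number of mismatch or gap columns divided by the total number of alignment columns, with alignment length measured in columns.

```python
def is_epsilon_match(row_a, row_b, min_length, max_error_rate):
    """Check an alignment (two gapped rows) against length and error-rate bounds.

    Assumption: every column where the rows differ (mismatch or gap '-')
    counts as one error, and length is the number of alignment columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    columns = len(row_a)
    errors = sum(1 for a, b in zip(row_a, row_b) if a != b)
    return columns >= min_length and errors <= max_error_rate * columns


# Hypothetical 10-column alignment with one gap column and one mismatch.
top = "ACGT-ACGTA"
bottom = "ACGTTACGAA"
print(is_epsilon_match(top, bottom, min_length=10, max_error_rate=0.2))
print(is_epsilon_match(top, bottom, min_length=10, max_error_rate=0.1))
```

A lossless filter followed by exact verification, as the abstract describes, would guarantee that every alignment passing such a test is reported; the check itself is the simple part.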
13.
Med Genet ; 33(2): 133-145, 2021 Jun.
Article in English | MEDLINE | ID: mdl-38836034

ABSTRACT

High-throughput sequencing techniques have significantly increased the molecular diagnosis rate for patients with monogenic disorders. This is primarily due to a substantially increased identification rate of disease mutations in the coding sequence, primarily SNVs and indels. Further progress is hampered by difficulties in the detection of structural variants and the interpretation of variants outside the coding sequence. In this review, we provide an overview of how novel sequencing techniques and state-of-the-art algorithms can be used to discover small and structural variants across the whole genome, and we introduce bioinformatic tools for predicting the effects that variants may have in the non-coding part of the genome.

14.
Nat Commun ; 12(1): 730, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33526789

ABSTRACT

Thousands of genomic structural variants (SVs) segregate in the human population and can impact phenotypic traits and diseases. Their identification in whole-genome sequence data of large cohorts is a major computational challenge. Most current approaches identify SVs in single genomes and afterwards merge the identified variants into a joint call set across many genomes. We describe the approach PopDel, which directly identifies deletions of about 500 to at least 10,000 bp in length in data of many genomes jointly, eliminating the need for subsequent variant merging. PopDel scales to tens of thousands of genomes as we demonstrate in evaluations on up to 49,962 genomes. We show that PopDel reliably reports common, rare and de novo deletions. On genomes with available high-confidence reference call sets PopDel shows excellent recall and precision. Genotype inheritance patterns in up to 6794 trios indicate that genotypes predicted by PopDel are more reliable than those of previous SV callers. Furthermore, PopDel's running time is competitive with the fastest tested previous tools. The demonstrated scalability and accuracy of PopDel enable routine scans for deletions in large-scale sequencing studies.


Subjects
Genome, Human/genetics; Genomic Structural Variation; Metagenomics/methods; Sequence Deletion; Feasibility Studies; Female; High-Throughput Nucleotide Sequencing; Humans; Inheritance Patterns; Male; Reproducibility of Results; Sequence Analysis, DNA
15.
Datenbank Spektrum ; 21(3): 255-260, 2021.
Article in English | MEDLINE | ID: mdl-34786019

ABSTRACT

Today's scientific data analysis very often requires complex Data Analysis Workflows (DAWs) executed over distributed computational infrastructures, e.g., clusters. Much research effort is devoted to the tuning and performance optimization of specific workflows for specific clusters. However, an arguably even more important problem for accelerating research is the reduction of development, adaptation, and maintenance times of DAWs. We describe the design and setup of the Collaborative Research Center (CRC) 1404 "FONDA -- Foundations of Workflows for Large-Scale Scientific Data Analysis", in which roughly 50 researchers jointly investigate new technologies, algorithms, and models to increase the portability, adaptability, and dependability of DAWs executed over distributed infrastructures. We describe the motivation behind our project, explain its underlying core concepts, introduce FONDA's internal structure, and sketch our vision for the future of workflow-based scientific data analysis. We also describe some lessons learned during the "making of" a CRC in Computer Science with strong interdisciplinary components, with the aim to foster similar endeavors.

16.
Circ Genom Precis Med ; 14(1): e003029, 2021 02.
Article in English | MEDLINE | ID: mdl-33315477

ABSTRACT

BACKGROUND: Loss-of-function mutations in the LDL (low-density lipoprotein) receptor gene (LDLR) cause elevated levels of LDL cholesterol and premature cardiovascular disease. To date, a gain-of-function mutation in LDLR with a large effect on LDL cholesterol levels has not been described. Here, we searched for sequence variants in LDLR that have a large effect on LDL cholesterol levels. METHODS: We analyzed whole-genome sequencing data from 43 202 Icelanders. Single-nucleotide polymorphisms and structural variants including deletions, insertions, and duplications were genotyped using whole-genome sequencing-based data. LDL cholesterol associations were carried out in a sample of >100 000 Icelanders with genetic information (imputed or whole-genome sequencing). Molecular analyses were performed using RNA sequencing and protein expression assays in Epstein-Barr virus-transformed lymphocytes. RESULTS: We discovered a 2.5-kb deletion (del2.5) overlapping the 3' untranslated region of LDLR in 7 heterozygous carriers from a single family. Mean level of LDL cholesterol was 74% lower in del2.5 carriers than in 101 851 noncarriers, a difference of 2.48 mmol/L (96 mg/dL; P=8.4×10^-8). Del2.5 results in production of an alternative mRNA isoform with a truncated 3' untranslated region. The truncation leads to a loss of target sites for microRNAs known to repress translation of LDLR. In Epstein-Barr virus-transformed lymphocytes derived from del2.5 carriers, expression of alternative mRNA isoform was 1.84-fold higher than the wild-type isoform (P=0.0013), and there was 1.79-fold higher surface expression of the LDL receptor than in noncarriers (P=0.0086). We did not find a highly penetrant detrimental impact of lifelong very low levels of LDL cholesterol due to del2.5 on health of the carriers. CONCLUSIONS: Del2.5 is the first reported gain-of-function mutation in LDLR causing a large reduction in LDL cholesterol. These data point to a role for alternative polyadenylation of LDLR mRNA as a potent regulator of LDL receptor expression in humans.


Subjects
Cholesterol, LDL/blood; Receptors, LDL/genetics; 3' Untranslated Regions; Alternative Splicing; Gain of Function Mutation; Gene Deletion; Genetic Vectors/genetics; Genetic Vectors/metabolism; Herpesvirus 4, Human/genetics; Heterozygote; Humans; Hyperlipoproteinemia Type II/genetics; Hyperlipoproteinemia Type II/pathology; Iceland; Lymphocytes/cytology; Lymphocytes/metabolism; MicroRNAs/metabolism; Pedigree; Protein Isoforms/genetics; RNA, Messenger/metabolism
17.
Nat Genet ; 50(11): 1616, 2018 11.
Article in English | MEDLINE | ID: mdl-30237445

ABSTRACT

In the version of this article published, statements about the impact of insertions and deletions on gene conversions were incorrect. We reported a bias toward deletions, whereas in fact the bias was toward insertions. We are deeply indebted to Laurent Duret and Brice Letcher for noticing this mistake in our manuscript. The following statements are incorrect in the published manuscript.

18.
Nat Genet ; 50(12): 1674-1680, 2018 12.
Article in English | MEDLINE | ID: mdl-30397338

ABSTRACT

De novo mutations (DNMs) cause a large proportion of severe rare diseases of childhood. DNMs that occur early may result in mosaicism of both somatic and germ cells. Such early mutations can cause recurrence of disease. We scanned 1,007 sibling pairs from 251 families and identified 878 DNMs shared by siblings (ssDNMs) at 448 genomic sites. We estimated DNM recurrence probability based on parental mosaicism, sharing of DNMs among siblings, parent-of-origin, mutation type and genomic position. We detected 57.2% of ssDNMs in the parental blood. The recurrence probability of a DNM decreases by 2.27% per year for paternal DNMs and 1.78% per year for maternal DNMs. Maternal ssDNMs are more likely to be T>C mutations than paternal ssDNMs, and less likely to be C>T mutations. Depending on the properties of the DNM, the recurrence probability ranges from 0.011% to 28.5%. We have launched an online calculator to allow estimation of DNM recurrence probability for research purposes.


Subjects
Family; Inheritance Patterns; Mutation; Parent-Child Relations; Adult; Child; Embryonic Germ Cells/metabolism; Family Characteristics; Female; Germ-Line Mutation; Humans; Inheritance Patterns/genetics; Male; Mosaicism; Pedigree
19.
Nat Genet ; 49(11): 1654-1660, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28945251

ABSTRACT

A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in both small and population-scale sequencing studies.


Subjects
Algorithms; Genome, Human; Genotyping Techniques/instrumentation; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/statistics & numerical data; Alleles; Base Sequence; Computer Graphics; HLA Antigens/genetics; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; Sequence Alignment; Sequence Analysis, DNA/methods; Software
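Illustrative note on entry 19: the abstract describes a pangenome as a variation-aware graph in which possible haplotypes are graph paths. The sketch below shows that idea on a toy scale and is not Graphtyper's data structure or algorithm. The region layout, the function names, and the read are hypothetical, and the graph is simplified to a chain of segments where each segment lists its alternative alleles.

```python
from itertools import product


def haplotype_paths(segments):
    """Enumerate haplotypes: each path picks one allele from every segment."""
    return ["".join(choice) for choice in product(*segments)]


def compatible_paths(read, paths):
    """Return the haplotype paths that contain the read as an exact substring."""
    return [path for path in paths if read in path]


# Hypothetical region: shared flanks, a SNP (A/G), and a one-base deletion (T or nothing).
region = [["AC"], ["A", "G"], ["GG"], ["T", ""], ["CA"]]
paths = haplotype_paths(region)

print(paths)
print(compatible_paths("GGGC", paths))
```

Exhaustive enumeration grows exponentially with the number of variant sites, so it only serves to make the path idea concrete; a real genotyper would need a far more economical representation and inexact read matching.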
20.
Nat Genet ; 49(4): 588-593, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28250455

ABSTRACT

Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r^2 > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10^-8, odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease.


Subjects
Base Sequence/genetics; Genetic Variation/genetics; Genome, Human/genetics; Animals; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Genotype; Humans; Linkage Disequilibrium/genetics; Myocardial Infarction/genetics; Pan paniscus/genetics; Phenotype