Search | VHL Regional Portal

Improving population scale statistical phasing with whole-genome sequencing data.

Wertenbroek, Rick; Hofmeister, Robin J; Xenarios, Ioannis; Thoma, Yann; Delaneau, Olivier.

PLoS Genet; 20(7): e1011092, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38959269

ABSTRACT

Haplotype estimation, or phasing, has gained significant traction in large-scale projects due to its valuable contributions to population genetics, variant analysis, and the creation of reference panels for imputation and phasing of new samples. To scale with the growing number of samples, haplotype estimation methods designed for population scale rely on highly optimized statistical models to phase genotype data, and usually ignore read-level information. Statistical methods excel at resolving common variants; however, they still struggle with rare variants due to the lack of statistical information. In this study we introduce SAPPHIRE, a new method that leverages whole-genome sequencing data to enhance the precision of haplotype calls produced by statistical phasing. SAPPHIRE achieves this by refining haplotype estimates through the realignment of sequencing reads, particularly targeting low-confidence phase calls. Our findings demonstrate that SAPPHIRE significantly enhances the accuracy of haplotypes obtained from state-of-the-art methods and also provides the subset of phase calls that are validated by sequencing reads. Finally, we show that our method scales to large data sets by its successful application to the extensive 3.6 petabytes of sequencing data of the last UK Biobank 200,031-sample release.

Subject(s)

Genetics, Population; Haplotypes; Whole Genome Sequencing; Whole Genome Sequencing/methods; Humans; Genetics, Population/methods; Genome, Human; Polymorphism, Single Nucleotide/genetics; Genome-Wide Association Study/methods; Algorithms

Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank.

Hofmeister, Robin J; Ribeiro, Diogo M; Rubinacci, Simone; Delaneau, Olivier.

Nat Genet; 55(7): 1243-1249, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37386248

ABSTRACT

Phasing involves distinguishing the two parentally inherited copies of each chromosome into haplotypes. Here, we introduce SHAPEIT5, a new phasing method that quickly and accurately processes large sequencing datasets, and apply it to UK Biobank (UKB) whole-genome and whole-exome sequencing data. We demonstrate that SHAPEIT5 phases rare variants with low switch error rates of below 5% for variants present in just 1 sample out of 100,000. Furthermore, we outline a method for phasing singletons, which, although less precise, constitutes an important step towards future developments. We then demonstrate that the use of UKB as a reference panel improves the accuracy of genotype imputation, which is even more pronounced when phased with SHAPEIT5 compared with other methods. Finally, we screen the UKB data for loss-of-function compound heterozygous events and identify 549 genes where both gene copies are knocked out. These genes complement current knowledge of gene essentiality in the human genome.

Subject(s)

Biological Specimen Banks; Genome, Human; Humans; Exome Sequencing; Sequence Analysis, DNA/methods; Genotype; Haplotypes; Genome, Human/genetics; United Kingdom; Polymorphism, Single Nucleotide/genetics
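
The switch error rate quoted in the SHAPEIT5 abstract above is a standard measure of phasing accuracy: among consecutive heterozygous sites, it is the fraction of adjacent pairs whose relative phase disagrees with a trusted reference (for example, trio-derived haplotypes). The sketch below is only an illustration of that definition, not the evaluation code used by the authors; the function name `switch_error_rate` and the tuple encoding of phased genotypes are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical encoding: one (allele on haplotype 0, allele on haplotype 1)
# tuple per variant site, with alleles coded as 0 (reference) or 1 (alternate).
PhasedGenotype = Tuple[int, int]


def switch_error_rate(truth: List[PhasedGenotype],
                      estimate: List[PhasedGenotype]) -> float:
    """Fraction of adjacent heterozygous site pairs with discordant relative phase."""
    if len(truth) != len(estimate):
        raise ValueError("truth and estimate must cover the same sites")

    # For each heterozygous site, record whether the estimated orientation
    # matches the true orientation (True) or is flipped (False).
    orientation = []
    for t, e in zip(truth, estimate):
        if t[0] == t[1]:
            continue  # homozygous sites carry no phase information
        if sorted(t) != sorted(e):
            raise ValueError("genotypes differ between truth and estimate")
        orientation.append(t == e)

    if len(orientation) < 2:
        return 0.0  # fewer than two heterozygous sites: no pair to evaluate

    # A switch error occurs wherever the orientation changes between
    # two consecutive heterozygous sites.
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    return switches / (len(orientation) - 1)


truth = [(0, 1), (1, 0), (0, 0), (0, 1), (1, 0)]
estimate = [(0, 1), (1, 0), (0, 0), (1, 0), (0, 1)]
print(switch_error_rate(truth, estimate))  # one switch among three adjacent het pairs
```

Note that an estimate with every site flipped has a switch error rate of zero, because only the relative phase between neighbouring sites is scored, not which haplotype is labelled first.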

Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes.

Rubinacci, Simone; Hofmeister, Robin J; Sousa da Mota, Bárbara; Delaneau, Olivier.

Nat Genet; 55(7): 1088-1090, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37386250

ABSTRACT

The release of 150,119 UK Biobank sequences represents an unprecedented opportunity as a reference panel to impute low-coverage whole-genome sequencing data with high accuracy, but current methods cannot cope with the size of the data. Here we introduce GLIMPSE2, a low-coverage whole-genome sequencing imputation method that scales sublinearly in both the number of samples and markers, achieving efficient whole-genome imputation from the UK Biobank reference panel while retaining high accuracy for ancient and modern genomes, particularly at rare variants and for very low-coverage samples.

Subject(s)

Biological Specimen Banks; Polymorphism, Single Nucleotide; Gene Frequency; Polymorphism, Single Nucleotide/genetics; Genome; United Kingdom; Genotype

Parent-of-Origin inference for biobanks.

Hofmeister, Robin J; Rubinacci, Simone; Ribeiro, Diogo M; Buil, Alfonso; Kutalik, Zoltán; Delaneau, Olivier.

Nat Commun; 13(1): 6668, 2022 Nov 5.

Article in English | MEDLINE | ID: mdl-36335127

ABSTRACT

Identical genetic variations can have different phenotypic effects depending on their parent of origin. Yet, studies focusing on parent-of-origin effects have been limited in terms of sample size due to the lack of parental genomes or known genealogies. We propose a probabilistic approach to infer the parent-of-origin of individual alleles that requires neither parental genomes nor prior knowledge of genealogy. Our model uses Identity-By-Descent sharing with second- and third-degree relatives to assign alleles to parental groups and leverages chromosome X data in males to distinguish maternal from paternal groups. We combine this with robust haplotype inference and haploid imputation to infer the parent-of-origin for 26,393 UK Biobank individuals. We screen 99 phenotypes for parent-of-origin effects and replicate the discoveries of 6 GWAS studies, confirming signals on body mass index, type 2 diabetes, standing height and multiple blood biomarkers, including the known maternal effect at the MEG3/DLK1 locus on platelet phenotypes. We also report a novel maternal effect at the TERT gene on telomere length, thereby providing new insights on the heritability of this phenotype. All our summary statistics are publicly available to help the community better characterize the molecular mechanisms leading to parent-of-origin effects and their implications for human health.

Subject(s)

Diabetes Mellitus, Type 2; Humans; Male; Alleles; Biological Specimen Banks; Genome-Wide Association Study; Phenotype; Female

The molecular basis, genetic control and pleiotropic effects of local gene co-expression.

Ribeiro, Diogo M; Rubinacci, Simone; Ramisch, Anna; Hofmeister, Robin J; Dermitzakis, Emmanouil T; Delaneau, Olivier.

Nat Commun; 12(1): 4842, 2021 Aug 10.

Article in English | MEDLINE | ID: mdl-34376650

ABSTRACT

Nearby genes are often expressed as a group. Yet, the prevalence, molecular mechanisms and genetic control of local gene co-expression are far from being understood. Here, by leveraging gene expression measurements across 49 human tissues and hundreds of individuals, we find that local gene co-expression occurs in 13% to 53% of genes per tissue. By integrating various molecular assays (e.g. ChIP-seq and Hi-C), we estimate the ability of several mechanisms, such as enhancer-gene interactions, in distinguishing gene pairs that are co-expressed from those that are not. Notably, we identify 32,636 expression quantitative trait loci (eQTLs) which associate with co-expressed gene pairs and often overlap enhancer regions. Due to affecting several genes, these eQTLs are more often associated with multiple human traits than other eQTLs. Our study paves the way to comprehend trait pleiotropy and functional interpretation of QTL and GWAS findings. All local gene co-expression identified here is available through a public database ( https://glcoex.unil.ch/ ).

Subject(s)

Gene Expression Regulation; Genetic Pleiotropy/genetics; Genome, Human/genetics; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Quantitative Trait Loci/genetics; Binding Sites/genetics; Gene Ontology; Genetic Association Studies/methods; Humans; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/metabolism

Publisher Correction: Efficient phasing and imputation of low-coverage sequencing data using large reference panels.

Rubinacci, Simone; Ribeiro, Diogo M; Hofmeister, Robin J; Delaneau, Olivier.

Nat Genet; 53(3): 412, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33473199

Efficient phasing and imputation of low-coverage sequencing data using large reference panels.

Rubinacci, Simone; Ribeiro, Diogo M; Hofmeister, Robin J; Delaneau, Olivier.

Nat Genet; 53(1): 120-126, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33414550

ABSTRACT

Low-coverage whole-genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined because current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. GLIMPSE achieves imputation of a genome for less than US$1 in computational cost, considerably outperforming other methods and improving imputation accuracy over the full allele frequency range. As a proof of concept, we show that 1× coverage enables effective gene expression association studies and outperforms dense SNP arrays in rare variant burden tests. Overall, this study illustrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.

Subject(s)

Sequence Analysis, DNA; Genome, Human; Genotype; Humans; Likelihood Functions; Polymorphism, Single Nucleotide/genetics; Reference Standards
