1.
medRxiv ; 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-38076997

ABSTRACT

Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) [1-3]. Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL is the first application that demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.
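The combinatorial explosion mentioned in this abstract can be made concrete with a short calculation. The sketch below only counts candidate SNP sets per interaction order; the marker count `n_snps` is an illustrative assumption, not a figure reported for NeEDL, and the function name is hypothetical.

```python
from math import comb


def candidate_sets(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of size `order` among `n_snps` markers."""
    return comb(n_snps, order)


# Illustrative marker count (an assumption, not taken from the abstract).
n_snps = 500_000
for order in range(2, 6):
    # Each additional SNP in the set multiplies the search space
    # by roughly n_snps / order.
    print(f"order {order}: {candidate_sets(n_snps, order):.3e} candidate sets")
```

Exhaustive testing becomes impractical within a few orders, which is why a method that prunes the search space (here, via network information and local search) is needed for sets of about five SNPs.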

2.
Bioinformatics ; 38(22): 4999-5006, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36130053

ABSTRACT

MOTIVATION: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. RESULTS: We developed EagleImp, a software tool based on the methods used in the existing tools Eagle2 and PBWT, which allows accurate and accelerated phasing and imputation in a single tool through algorithmic and technical improvements and new features. We compared accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with 1 million reference genomes. EagleImp was 2-30 times faster (depending on the single or multiprocessor configuration selected and the size of the reference panel) than Eagle2 combined with PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical genome-wide association studies, EagleImp provided the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. Additional features include automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files and various user-configurable algorithm and output options. Due to the technical optimizations, EagleImp can perform fast and accurate reference-based phasing and imputation and is ready for future large reference panels on the order of 1 million genomes. AVAILABILITY AND IMPLEMENTATION: EagleImp is implemented in C++ and freely available for download at https://github.com/ikmb/eagleimp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
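For readers unfamiliar with the positional Burrows-Wheeler transform (PBWT) that this abstract builds on, the following is a minimal sketch of its core ordering step for biallelic haplotypes. It is a textbook-style illustration under stated assumptions (alleles coded 0/1, toy data, hypothetical function name) and is not EagleImp's C++ implementation.

```python
def pbwt_prefix_arrays(haplotypes):
    """Return the haplotype ordering before site 0 and after each site.

    After processing site k, haplotypes are sorted by their reversed
    prefixes (sites k, k-1, ..., 0), so haplotypes that match over a long
    stretch ending at site k end up adjacent in the ordering.
    """
    n_sites = len(haplotypes[0]) if haplotypes else 0
    order = list(range(len(haplotypes)))
    orders = [order]
    for k in range(n_sites):
        # Stable partition: keep the current order within each allele class.
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


# Toy panel: 4 haplotypes x 3 biallelic sites (illustrative data only).
panel = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
for k, order in enumerate(pbwt_prefix_arrays(panel)):
    print(f"ordering after {k} site(s): {order}")
```

Each site is handled in one linear pass over the haplotypes, which is the property that makes PBWT-based matching attractive for very large reference panels.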


Subject(s)
Genome-Wide Association Study; Genome; Genome-Wide Association Study/methods; Haplotypes; Software; Genotype; Polymorphism, Single Nucleotide
3.
Hum Mol Genet ; 31(23): 3945-3966, 2022 11 28.
Article in English | MEDLINE | ID: mdl-35848942

ABSTRACT

Given the highly variable clinical phenotype of Coronavirus disease 2019 (COVID-19), a deeper analysis of the host genetic contribution to severe COVID-19 is important to improve our understanding of underlying disease mechanisms. Here, we describe an extended genome-wide association meta-analysis of a well-characterized cohort of 3255 COVID-19 patients with respiratory failure and 12 488 population controls from Italy, Spain, Norway and Germany/Austria, including stratified analyses based on age, sex and disease severity, as well as targeted analyses of chromosome Y haplotypes, the human leukocyte antigen region and the SARS-CoV-2 peptidome. By inversion imputation, we traced a reported association at 17q21.31 to a ~0.9-Mb inversion polymorphism that creates two highly differentiated haplotypes and characterized the potential effects of the inversion in detail. Our data, together with the 5th release of summary statistics from the COVID-19 Host Genetics Initiative including non-Caucasian individuals, also identified a new locus at 19q13.33, including NAPSA, a gene which is expressed primarily in alveolar cells responsible for gas exchange in the lung.


Subject(s)
COVID-19; Humans; COVID-19/genetics; SARS-CoV-2/genetics; Genome-Wide Association Study; Haplotypes; Polymorphism, Genetic
4.
Gigascience ; 10(6)2021 06 29.
Article in English | MEDLINE | ID: mdl-34184051

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) involving 1 million GWAS samples from dozens of population-based biobanks present a considerable computational challenge and are carried out by large scientific groups under great expenditure of time and personnel. Automating these processes requires highly efficient and scalable methods and software, but so far there is no workflow solution to easily process 1 million GWAS samples. RESULTS: Here we present BIGwas, a portable, fully automated quality control and association testing pipeline for large-scale binary and quantitative trait GWAS data provided by biobank resources. By using Nextflow workflow and Singularity software container technology, BIGwas performs resource-efficient and reproducible analyses on a local computer or any high-performance compute (HPC) system with just 1 command, with no need to manually install a software execution environment or various software packages. For a single-command GWAS analysis with 974,818 individuals and 92 million genetic markers, BIGwas takes ∼16 days on a small HPC system with only 7 compute nodes to perform a complete GWAS QC and association analysis protocol. Our dynamic parallelization approach enables shorter runtimes for large HPCs. CONCLUSIONS: Researchers without extensive bioinformatics knowledge and with few computer resources can use BIGwas to perform multi-cohort GWAS with 1 million GWAS samples and, if desired, use it to build their own (genome-wide) PheWAS resource. BIGwas is freely available for download from http://github.com/ikmb/gwas-qc and http://github.com/ikmb/gwas-assoc.


Subject(s)
Biological Specimen Banks; Genome-Wide Association Study; Genome; Humans; Phenotype; Polymorphism, Single Nucleotide; Quality Control; Software
5.
Methods Mol Biol ; 2212: 17-35, 2021.
Article in English | MEDLINE | ID: mdl-33733347

ABSTRACT

We present SNPInt-GPU, a software package that provides several methods for statistical epistasis testing. SNPInt-GPU supports GPU acceleration using the Nvidia CUDA framework, but can also be used without GPU hardware. The software implements logistic regression (as in PLINK epistasis testing), BOOST, log-linear regression, mutual information (MI), and information gain (IG) for pairwise testing as well as mutual information and information gain for third-order tests. Optionally, r² scores for testing for linkage disequilibrium (LD) can be calculated on-the-fly. SNPInt-GPU is publicly available on GitHub. The software requires a Linux-based operating system and CUDA libraries. This chapter describes detailed installation and usage instructions as well as examples for basic preliminary quality control and analysis of results.
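To illustrate what a pairwise mutual-information test measures, here is a small CPU-only sketch. It is not SNPInt-GPU code: the function names are hypothetical, and the information-gain formula used (pair MI minus the two single-SNP MIs) is one common definition assumed here, which may differ from the one the software implements.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable observations."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(features, phenotype):
    """I(features; phenotype) = H(features) + H(phenotype) - H(features, phenotype)."""
    joint = list(zip(features, phenotype))
    return entropy(features) + entropy(phenotype) - entropy(joint)


def pair_information_gain(geno_a, geno_b, phenotype):
    """Assumed definition: MI of the SNP pair minus the MI of each SNP alone."""
    pair = list(zip(geno_a, geno_b))
    return (mutual_information(pair, phenotype)
            - mutual_information(geno_a, phenotype)
            - mutual_information(geno_b, phenotype))


# Toy XOR-like data: neither SNP is informative alone, but the pair is.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
status = [0, 1, 1, 0]
print(mutual_information(snp_a, status))
print(pair_information_gain(snp_a, snp_b, status))
```

The toy data show why interaction tests are needed at all: a purely epistatic signal is invisible to single-marker tests and only appears when both genotypes are considered jointly.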


Subject(s)
Algorithms; Data Curation/statistics & numerical data; Epistasis, Genetic; Software; Entropy; Humans; Linkage Disequilibrium; Logistic Models; Quality Control
6.
N Engl J Med ; 383(16): 1522-1534, 2020 10 15.
Article in English | MEDLINE | ID: mdl-32558485

ABSTRACT

BACKGROUND: There is considerable variation in disease behavior among patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (Covid-19). Genomewide association analysis may allow for the identification of potential genetic factors involved in the development of Covid-19. METHODS: We conducted a genomewide association study involving 1980 patients with Covid-19 and severe disease (defined as respiratory failure) at seven hospitals in the Italian and Spanish epicenters of the SARS-CoV-2 pandemic in Europe. After quality control and the exclusion of population outliers, 835 patients and 1255 control participants from Italy and 775 patients and 950 control participants from Spain were included in the final analysis. In total, we analyzed 8,582,968 single-nucleotide polymorphisms and conducted a meta-analysis of the two case-control panels. RESULTS: We detected cross-replicating associations with rs11385942 at locus 3p21.31 and with rs657152 at locus 9q34.2, which were significant at the genomewide level (P<5×10⁻⁸) in the meta-analysis of the two case-control panels (odds ratio, 1.77; 95% confidence interval [CI], 1.48 to 2.11; P = 1.15×10⁻¹⁰; and odds ratio, 1.32; 95% CI, 1.20 to 1.47; P = 4.95×10⁻⁸, respectively). At locus 3p21.31, the association signal spanned the genes SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6 and XCR1. The association signal at locus 9q34.2 coincided with the ABO blood group locus; in this cohort, a blood-group-specific analysis showed a higher risk in blood group A than in other blood groups (odds ratio, 1.45; 95% CI, 1.20 to 1.75; P = 1.48×10⁻⁴) and a protective effect in blood group O as compared with other blood groups (odds ratio, 0.65; 95% CI, 0.53 to 0.79; P = 1.06×10⁻⁵). CONCLUSIONS: We identified a 3p21.31 gene cluster as a genetic susceptibility locus in patients with Covid-19 with respiratory failure and confirmed a potential involvement of the ABO blood-group system. (Funded by Stein Erik Hagen and others.)
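The odds ratios, confidence intervals and P values in this abstract are linked through the usual normal approximation on the log-odds scale. The sketch below shows that relationship only; it assumes a symmetric 95% Wald interval, uses a hypothetical function name, and works from the rounded published figures, so the reconstructed P values are approximate and will not reproduce the reported ones exactly.

```python
from math import erfc, log, sqrt


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate SE, z statistic and two-sided P from an OR and its 95% CI.

    Assumes the interval was built as exp(log(OR) +/- z_crit * SE).
    """
    log_or = log(odds_ratio)
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # 2 * (1 - Phi(|z|))
    return se, z, p_two_sided


# Rounded values for rs11385942 at 3p21.31 as quoted in the abstract.
se, z, p = wald_from_ci(1.77, 1.48, 2.11)
print(f"SE(log OR) ~ {se:.3f}, z ~ {z:.2f}, P ~ {p:.1e}")
```

Because the inputs are rounded to two decimals, a check of this kind is useful for spotting gross inconsistencies rather than for recomputing exact significance levels.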


Subject(s)
ABO Blood-Group System/genetics; Betacoronavirus; Chromosomes, Human, Pair 3/genetics; Coronavirus Infections/genetics; Genetic Predisposition to Disease; Pneumonia, Viral/genetics; Polymorphism, Single Nucleotide; Respiratory Insufficiency/genetics; Aged; COVID-19; Case-Control Studies; Chromosomes, Human, Pair 9/genetics; Coronavirus Infections/complications; Female; Genetic Loci; Genome-Wide Association Study; Humans; Italy; Male; Middle Aged; Multigene Family; Pandemics; Pneumonia, Viral/complications; Respiratory Insufficiency/etiology; SARS-CoV-2; Spain
7.
J Allergy Clin Immunol ; 145(4): 1208-1218, 2020 04.
Article in English | MEDLINE | ID: mdl-31707051

ABSTRACT

BACKGROUND: Fifteen percent of atopic dermatitis (AD) liability-scale heritability could be attributed to 31 susceptibility loci identified by using genome-wide association studies, with only 3 of them (IL13, IL-6 receptor [IL6R], and filaggrin [FLG]) resolved to protein-coding variants. OBJECTIVE: We examined whether a significant portion of unexplained AD heritability is further explained by low-frequency and rare variants in the gene-coding sequence. METHODS: We evaluated common, low-frequency, and rare protein-coding variants using exome chip and replication genotype data of 15,574 patients and 377,839 control subjects combined with whole-transcriptome data on lesional, nonlesional, and healthy skin samples of 27 patients and 38 control subjects. RESULTS: An additional 12.56% (SE, 0.74%) of AD heritability is explained by rare protein-coding variation. We identified docking protein 2 (DOK2) and CD200 receptor 1 (CD200R1) as novel genome-wide significant susceptibility genes. Rare coding variants associated with AD are further enriched in 5 genes (IL-4 receptor [IL4R], IL13, Janus kinase 1 [JAK1], JAK2, and tyrosine kinase 2 [TYK2]) of the IL13 pathway, all of which are targets for novel systemic AD therapeutics. Multiomics-based network and RNA sequencing analysis revealed DOK2 as a central hub interacting with, among others, CD200R1, IL6R, and signal transducer and activator of transcription 3 (STAT3). Multitissue gene expression profile analysis for 53 tissue types from the Genotype-Tissue Expression project showed that disease-associated protein-coding variants exert their greatest effect in skin tissues. CONCLUSION: Our discoveries highlight a major role of rare coding variants in AD acting independently of common variants. Further extensive functional studies are required to detect all potential causal variants and to specify the contribution of the novel susceptibility genes DOK2 and CD200R1 to overall disease susceptibility.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics; Dermatitis, Atopic/genetics; Genotype; Orexin Receptors/genetics; Phosphoproteins/genetics; Skin/metabolism; Adult; Cohort Studies; Filaggrin Proteins; Gene Frequency; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Organ Specificity; Polymorphism, Genetic; Risk; Transcriptome
8.
Comput Struct Biotechnol J ; 17: 1082-1090, 2019.
Article in English | MEDLINE | ID: mdl-31452861

ABSTRACT

The evolutionary analysis of genetic data is an important subject of modern bioscience, with practical applications in diverse fields. Parameters of interest in this context include effective population sizes, mutation rates, population growth rates and the times to most recent common ancestors. Studying Y-chromosomal microsatellite data, in particular, has proven useful to unravel the recent patrilineal history of Homo sapiens populations. We compared the individual analysis options and technical details of four software tools that are widely used for this purpose, namely BATWING, BEAST, IMa2 and LAMARC, all of which use Bayesian coalescent-based Markov chain Monte Carlo (MCMC) methods for parameter estimation. More specifically, we simulated datasets for either eight or 20 hypothetical Y-chromosomal microsatellites, assuming a mutation rate of 0.0030 per generation and a constant or exponentially increasing population size, and used these data to evaluate the parameter estimation capacity of each tool. The datasets comprised between 100 and 1000 samples. In addition to runtime, the practical utility of the tools of interest can also be expected to depend critically upon the convergence behavior of the actual MCMC implementation. In fact, we found that runtime increased, and convergence rate decreased, with increasing sample size, as expected. BATWING performed best with respect to runtime and convergence behavior, but only supports simple evolutionary models. As regards the spectrum of evolutionary models covered, and also in terms of cross-platform usability, BEAST provided the greatest flexibility. Finally, IMa2 and LAMARC proved best suited to incorporating elaborate migration models in the analysis process.

9.
Article in English | MEDLINE | ID: mdl-26451813

ABSTRACT

High-throughput genotyping technologies (such as SNP arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies is an important but time-consuming operation since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively long runtimes; e.g., processing a moderately sized dataset consisting of about 500,000 SNPs and 5,000 samples requires several days using state-of-the-art tools on a standard 3 GHz CPU. In this paper, we demonstrate how this task can be accelerated using a combination of fine-grained and coarse-grained parallelism on two different computing systems. The first architecture is based on reconfigurable hardware (FPGAs) while the second architecture uses multiple GPUs connected to the same host. We show that both systems can achieve speedups of around four orders of magnitude compared to the sequential implementation. This significantly reduces the runtimes for detecting epistasis to only a few minutes for moderately sized datasets and to a few hours for large-scale datasets.
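The per-pair work described in this abstract typically starts from a genotype-by-genotype-by-status contingency table. The sketch below shows that step sequentially in plain Python; the 0/1/2 genotype coding, the table layout and the function names are assumptions for illustration and do not reproduce the FPGA or GPU implementations.

```python
from itertools import combinations


def pair_table(geno_a, geno_b, is_case):
    """Count samples per cell: table[genotype A][genotype B][0=control, 1=case]."""
    table = [[[0, 0] for _ in range(3)] for _ in range(3)]
    for a, b, case in zip(geno_a, geno_b, is_case):
        table[a][b][int(case)] += 1
    return table


def all_pair_tables(genotypes, is_case):
    """Yield (i, j, table) for every unordered SNP pair.

    Every pair is independent of the others, so this outer loop is the
    part that can be distributed across parallel hardware.
    """
    for i, j in combinations(range(len(genotypes)), 2):
        yield i, j, pair_table(genotypes[i], genotypes[j], is_case)


# Toy data: 3 SNPs x 4 samples, genotypes coded as minor-allele counts.
genotypes = [[0, 1, 2, 0], [0, 1, 1, 2], [2, 2, 0, 0]]
is_case = [1, 0, 1, 0]
for i, j, table in all_pair_tables(genotypes, is_case):
    print(i, j, table)
```

With n SNPs the outer loop runs n(n-1)/2 times and each table costs one pass over all samples, which is where the multi-day sequential runtimes quoted above come from.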


Subject(s)
Computer Graphics/instrumentation; DNA Mutational Analysis/instrumentation; Epistasis, Genetic/genetics; Genome-Wide Association Study/instrumentation; High-Throughput Nucleotide Sequencing/instrumentation; Polymorphism, Single Nucleotide/genetics; Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Equipment Design; Equipment Failure Analysis; Genome-Wide Association Study/methods; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted/instrumentation
10.
Sci Rep ; 5: 11534, 2015 Jul 13.
Article in English | MEDLINE | ID: mdl-26166306

ABSTRACT

Several pathogenic viruses such as hepatitis B and human immunodeficiency viruses may integrate into the host genome. These virus/host integrations are detectable using paired-end next generation sequencing. However, the low number of expected true virus integrations may be difficult to distinguish from the noise of many false positive candidates. Here, we propose a novel filtering approach that increases specificity without compromising sensitivity for virus/host chimera detection. Our detection pipeline termed Vy-PER (Virus integration detection bY Paired End Reads) outperforms existing similar tools in speed and accuracy. We analysed whole genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. This analysis was motivated by the recently reported virus integrations at genomic rearrangement sites and association with chromosomal instability in liver cancer. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients finds no significant evidence for integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage) and only our method detects singleton human-phiX174-chimeras caused by optical errors of the Illumina HiSeq platform. This high accuracy is useful for detecting low virus integration levels as well as non-integrated viruses.


Subject(s)
Computational Biology; Software; False Positive Reactions; Genome, Human; Germ Cells/metabolism; Hepatitis B virus/genetics; Hepatitis B virus/physiology; Herpesviridae/genetics; Herpesviridae/physiology; High-Throughput Nucleotide Sequencing; Humans; Liver Neoplasms/genetics; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics; Precursor Cell Lymphoblastic Leukemia-Lymphoma/virology; Sequence Analysis, DNA; Virus Integration