Search | BVS Colombia Search Portal

1.

Single-cell multiregion dissection of Alzheimer's disease.

Mathys, Hansruedi; Boix, Carles A; Akay, Leyla Anne; Xia, Ziting; Davila-Velderrain, Jose; Ng, Ayesha P; Jiang, Xueqiao; Abdelhady, Ghada; Galani, Kyriaki; Mantero, Julio; Band, Neil; James, Benjamin T; Babu, Sudhagar; Galiana-Melendez, Fabiola; Louderback, Kate; Prokopenko, Dmitry; Tanzi, Rudolph E; Bennett, David A; Tsai, Li-Huei; Kellis, Manolis.

Nature ; 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39048816

ABSTRACT

Alzheimer's disease is the leading cause of dementia worldwide, but the cellular pathways that underlie its pathological progression across brain regions remain poorly understood¹⁻³. Here we report a single-cell transcriptomic atlas of six different brain regions in the aged human brain, covering 1.3 million cells from 283 post-mortem human brain samples across 48 individuals with and without Alzheimer's disease. We identify 76 cell types, including region-specific subtypes of astrocytes and excitatory neurons and an inhibitory interneuron population unique to the thalamus and distinct from canonical inhibitory subclasses. We identify vulnerable populations of excitatory and inhibitory neurons that are depleted in specific brain regions in Alzheimer's disease, and provide evidence that the Reelin signalling pathway is involved in modulating the vulnerability of these neurons. We develop a scalable method for discovering gene modules, which we use to identify cell-type-specific and region-specific modules that are altered in Alzheimer's disease and to annotate transcriptomic differences associated with diverse pathological variables. We identify an astrocyte program that is associated with cognitive resilience to Alzheimer's disease pathology, tying choline metabolism and polyamine biosynthesis in astrocytes to preserved cognitive function late in life. Together, our study develops a regional atlas of the ageing human brain and provides insights into cellular vulnerability, response and resilience to Alzheimer's disease pathology.

2.

Prediction of disease-free survival for precision medicine using cooperative learning on multi-omic data.

Hahn, Georg; Prokopenko, Dmitry; Hecker, Julian; Lutz, Sharon M; Mullin, Kristina; Sejour, Leinal; Hide, Winston; Vlachos, Ioannis; DeSantis, Stacia; Tanzi, Rudolph E; Lange, Christoph.

Brief Bioinform ; 25(4)2024 May 23.

Article in English | MEDLINE | ID: mdl-38836403

ABSTRACT

In precision medicine, both predicting the disease susceptibility of an individual and forecasting their disease-free survival are areas of key research. Besides the classical epidemiological predictor variables, data from multiple (omic) platforms are increasingly available. To integrate this wealth of information, we propose new methodology to combine both cooperative learning, a recent approach to leverage the predictive power of several datasets, and polygenic hazard score models. Polygenic hazard score models provide a practitioner with a more differentiated view of the predicted disease-free survival than the one given by merely a point estimate, for instance computed with a polygenic risk score. Our aim is to leverage the advantages of cooperative learning for the computation of polygenic hazard score models via Cox's proportional hazard model, thereby improving the prediction of the disease-free survival. In our experimental study, we apply our methodology to forecast the disease-free survival for Alzheimer's disease (AD) using three layers of data. One layer contains epidemiological variables such as sex, APOE (apolipoprotein E, a genetic risk factor for AD) status and 10 leading principal components. Another layer contains selected genomic loci, and the last layer contains methylation data for selected CpG sites. We demonstrate that the survival curves computed via cooperative learning yield an AUC of around 0.7, above the state-of-the-art performance of its competitors. Importantly, the proposed methodology returns (1) a linear score that can be easily interpreted (in contrast to machine learning approaches), and (2) a weighting of the predictive power of the involved data layers, allowing for an assessment of the importance of each omic (or other) platform. Similarly to polygenic hazard score models, our methodology also allows one to compute individual survival curves for each patient.

Subject(s)

Alzheimer Disease; Precision Medicine; Humans; Precision Medicine/methods; Alzheimer Disease/genetics; Alzheimer Disease/mortality; Disease-Free Survival; Machine Learning; Proportional Hazards Models; Multifactorial Inheritance; Male; Female; Multiomics

3.

Polygenic hazard score models for the prediction of Alzheimer's free survival using the lasso for Cox's proportional hazards model.

Hahn, Georg; Prokopenko, Dmitry; Hecker, Julian; Lutz, Sharon M; Mullin, Kristina; Tanzi, Rudolph E; DeSantis, Stacia; Lange, Christoph.

Genet Epidemiol ; 2024 Jul 09.

Article in English | MEDLINE | ID: mdl-38982682

ABSTRACT

The prediction of the susceptibility of an individual to a certain disease is an important and timely research area. An established technique is to estimate the risk of an individual with the help of an integrated risk model, that is, a polygenic risk score with added epidemiological covariates. However, integrated risk models do not capture any time dependence, and may provide a point estimate of the relative risk with respect to a reference population. The aim of this work is twofold. First, we explore and advocate the idea of predicting the time-dependent hazard and survival (defined as disease-free time) of an individual for the onset of a disease. This provides a practitioner with a much more differentiated view of absolute survival as a function of time. Second, to compute the time-dependent risk of an individual, we use published methodology to fit a Cox's proportional hazard model to data from a genetic SNP study of time to Alzheimer's disease (AD) onset, using the lasso to incorporate further epidemiological variables such as sex, APOE (apolipoprotein E, a genetic risk factor for AD) status, 10 leading principal components, and selected genomic loci. We apply the lasso for Cox's proportional hazards to a data set of 6792 AD patients (composed of 4102 cases and 2690 controls) and 87 covariates. We demonstrate that fitting a lasso model for Cox's proportional hazards allows one to obtain more accurate survival curves than with state-of-the-art (likelihood-based) methods. Moreover, the methodology allows one to obtain personalized survival curves for a patient, thus giving a much more differentiated view of the expected progression of a disease than the view offered by integrated risk models. The runtime to compute personalized survival curves is under a minute for the entire data set of AD patients, thus enabling it to handle datasets with 60,000-100,000 subjects in less than 1 h.
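As a rough illustration of how a fitted Cox proportional hazards model turns into a personalized survival curve, the sketch below applies the standard relation S(t | x) = S0(t)^exp(x·β) to a baseline curve. The baseline values, coefficients, and covariates are invented for illustration and are not taken from the paper; the lasso fitting step itself is not shown.

```python
import math

def personalized_survival(baseline_survival, beta, covariates):
    """Return S(t | x) = S0(t) ** exp(x . beta) for each baseline time point."""
    linear_score = sum(b * x for b, x in zip(beta, covariates))
    relative_hazard = math.exp(linear_score)
    return [s0 ** relative_hazard for s0 in baseline_survival]

# Hypothetical baseline disease-free survival at four increasing ages.
baseline = [0.99, 0.95, 0.80, 0.55]
# Hypothetical coefficients for (sex, APOE e4 allele count).
# A lasso fit would typically shrink many coefficients to exactly zero.
beta = [0.10, 0.60]

low_risk = personalized_survival(baseline, beta, [0, 0])
high_risk = personalized_survival(baseline, beta, [1, 2])
```

With all covariates at zero the curve equals the baseline; a positive linear score lowers the curve at every time point, which is the time-dependent view that a single relative-risk estimate cannot give.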

4.

A comparison between similarity matrices for principal component analysis to assess population stratification in sequenced genetic data sets.

Lee, Sanghun; Hahn, Georg; Hecker, Julian; Lutz, Sharon M; Mullin, Kristina; Hide, Winston; Bertram, Lars; DeMeo, Dawn L; Tanzi, Rudolph E; Lange, Christoph; Prokopenko, Dmitry.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36585781

ABSTRACT

Genetic similarity matrices are commonly used to assess population substructure (PS) in genetic studies. Through simulation studies and by the application to whole-genome sequencing (WGS) data, we evaluate the performance of three genetic similarity matrices: the unweighted and weighted Jaccard similarity matrices and the genetic relationship matrix. We describe different scenarios that can create numerical pitfalls and lead to incorrect conclusions in some instances. We consider scenarios in which PS is assessed based on loci that are located across the genome ('globally') and based on loci from a specific genomic region ('locally'). We also compare scenarios in which PS is evaluated based on loci from different minor allele frequency bins: common (>5%), low-frequency (5-0.5%) and rare (<0.5%) single-nucleotide variations (SNVs). Overall, we observe that all approaches provide the best clustering performance when computed based on rare SNVs. The performance of the similarity matrices is very similar for common and low-frequency variants, but for rare variants, the unweighted Jaccard matrix provides preferable clustering features. Based on visual inspection and in terms of standard clustering metrics, its clusters are the densest and the best separated in the principal component analysis of variants with rare SNVs compared with the other methods and different allele frequency cutoffs. In an application, we assessed the role of rare variants on local and global PS, using WGS data from multiethnic Alzheimer's disease data sets and European or East Asian populations from the 1000 Genome Project.

Subject(s)

Genome; Genomics; Principal Component Analysis; Gene Frequency; Computer Simulation; Genome-Wide Association Study; Polymorphism, Single Nucleotide
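To make the unweighted Jaccard similarity matrix from the abstract above concrete, here is a minimal sketch using the textbook Jaccard index: for two individuals, the number of variants carried by both divided by the number carried by either. The function name, the toy carrier sets, and the convention of returning 0 when neither individual carries any variant are assumptions for illustration; the paper's exact definition and its weighted variant may differ.

```python
def jaccard_similarity_matrix(carriers):
    """Pairwise Jaccard index; carriers[i] is the set of variant indices
    at which individual i carries the minor allele."""
    n = len(carriers)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            union = carriers[i] | carriers[j]
            shared = carriers[i] & carriers[j]
            # Assumed convention: similarity is 0 when the union is empty.
            value = len(shared) / len(union) if union else 0.0
            matrix[i][j] = matrix[j][i] = value
    return matrix

# Toy data: three individuals and the rare variants each one carries.
toy_carriers = [{0, 1, 2}, {1, 2, 3}, {7}]
similarity = jaccard_similarity_matrix(toy_carriers)
```

The principal components used to assess population substructure would then come from an eigendecomposition of this matrix, which is not shown here.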

5.

A robust and adaptive framework for interaction testing in quantitative traits between multiple genetic loci and exposure variables.

Hecker, Julian; Prokopenko, Dmitry; Moll, Matthew; Lee, Sanghun; Kim, Wonji; Qiao, Dandi; Voorhies, Kirsten; Kim, Woori; Vansteelandt, Stijn; Hobbs, Brian D; Cho, Michael H; Silverman, Edwin K; Lutz, Sharon M; DeMeo, Dawn L; Weiss, Scott T; Lange, Christoph.

PLoS Genet ; 18(11): e1010464, 2022 11.

Article in English | MEDLINE | ID: mdl-36383614

ABSTRACT

The identification and understanding of gene-environment interactions can provide insights into the pathways and mechanisms underlying complex diseases. However, testing for gene-environment interaction remains a challenge since a.) statistical power is often limited and b.) modeling of environmental effects is nontrivial and such model misspecifications can lead to false positive interaction findings. To address the lack of statistical power, recent methods aim to identify interactions on an aggregated level using, for example, polygenic risk scores. While this strategy can increase the power to detect interactions, identifying contributing genes and pathways is difficult based on these relatively global results. Here, we propose RITSS (Robust Interaction Testing using Sample Splitting), a gene-environment interaction testing framework for quantitative traits that is based on sample splitting and robust test statistics. RITSS can incorporate sets of genetic variants and/or multiple environmental factors. Based on the user's choice of statistical/machine learning approaches, a screening step selects and combines potential interactions into scores with improved interpretability. In the testing step, the application of robust statistics minimizes the susceptibility to main effect misspecifications. Using extensive simulation studies, we demonstrate that RITSS controls the type 1 error rate in a wide range of scenarios, and we show how the screening strategy influences statistical power. In an application to lung function phenotypes and human height in the UK Biobank, RITSS identified highly significant interactions based on subcomponents of genetic risk scores. While the contributing single variant interaction signals are weak, our results indicate interaction patterns that result in strong aggregated effects, providing potential insights into underlying gene-environment interaction mechanisms.

Subject(s)

Models, Genetic; Polymorphism, Single Nucleotide; Humans; Genetic Loci; Gene-Environment Interaction; Phenotype; Computer Simulation; Genome-Wide Association Study

6.

Fast computation of the eigensystem of genomic similarity matrices.

Hahn, Georg; Lutz, Sharon M; Hecker, Julian; Prokopenko, Dmitry; Cho, Michael H; Silverman, Edwin K; Weiss, Scott T; Lange, Christoph.

BMC Bioinformatics ; 25(1): 43, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38273228

ABSTRACT

The computation of a similarity measure for genomic data is a standard tool in computational genetics. The principal components of such matrices are routinely used to correct for biases due to confounding by population stratification, for instance in linear regressions. However, the calculation of both a similarity matrix and its singular value decomposition (SVD) are computationally intensive. The contribution of this article is threefold. First, we demonstrate that the calculation of three matrices (called the covariance matrix, the weighted Jaccard matrix, and the genomic relationship matrix) can be reformulated in a unified way which allows for the application of a randomized SVD algorithm, which is faster than the traditional computation. The fast SVD algorithm we present is adapted from an existing randomized SVD algorithm and ensures that all computations are carried out in sparse matrix algebra. The algorithm only assumes that row-wise and column-wise subtraction and multiplication of a vector with a sparse matrix is available, an operation that is efficiently implemented in common sparse matrix packages. An exception is the so-called Jaccard matrix, which does not have a structure applicable for the fast SVD algorithm. Second, an approximate Jaccard matrix is introduced to which the fast SVD computation is applicable. Third, we establish guaranteed theoretical bounds on the accuracy (in [Formula: see text] norm and angle) between the principal components of the Jaccard matrix and the ones of our proposed approximation, thus putting the proposed Jaccard approximation on a solid mathematical foundation, and derive the theoretical runtime of our algorithm. We illustrate that the approximation error is low in practice and empirically verify the theoretical runtime scalings on both simulated data and data of the 1000 Genome Project.

Subject(s)

Genome; Genomics; Algorithms; Linear Models

7.

On the effect heterogeneity of established disease susceptibility loci for Alzheimer's disease across different genetic ancestries.

Lee, Sanghun; Hecker, Julian; Hahn, Georg; Mullin, Kristina; Lutz, Sharon M; Tanzi, Rudolph E; Lange, Christoph; Prokopenko, Dmitry.

Alzheimers Dement ; 20(5): 3397-3405, 2024 05.

Article in English | MEDLINE | ID: mdl-38563508

ABSTRACT

INTRODUCTION: Genome-wide association studies have identified numerous disease susceptibility loci (DSLs) for Alzheimer's disease (AD). However, only a limited number of studies have investigated the dependence of the genetic effect size of established DSLs on genetic ancestry. METHODS: We utilized the whole genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP) including 35,569 participants. A total of 25,459 subjects in four distinct populations (African ancestry, non-Hispanic White, admixed Hispanic, and Asian) were analyzed. RESULTS: We found that nine DSLs showed significant heterogeneity across populations. Single nucleotide polymorphism (SNP) rs2075650 in translocase of outer mitochondrial membrane 40 (TOMM40) showed the largest heterogeneity (Cochran's Q = 0.00, I² = 90.08), followed by other SNPs in apolipoprotein C1 (APOC1) and apolipoprotein E (APOE). Two additional loci, signal-induced proliferation-associated 1 like 2 (SIPA1L2) and solute carrier 24 member 4 (SLC24A4), showed significant heterogeneity across populations. DISCUSSION: We observed substantial heterogeneity for the APOE-harboring 19q13.32 region with TOMM40/APOE/APOC1 genes. The largest risk effect was seen among African Americans, while Asians showed a surprisingly small risk effect.

Subject(s)

Alzheimer Disease; Genetic Predisposition to Disease; Genome-Wide Association Study; Mitochondrial Precursor Protein Import Complex Proteins; Polymorphism, Single Nucleotide; Humans; Alzheimer Disease/genetics; Genetic Predisposition to Disease/genetics; Polymorphism, Single Nucleotide/genetics; Apolipoproteins E/genetics; Female; Male; Apolipoprotein C-I/genetics; Aged; Membrane Transport Proteins/genetics; Genetic Loci/genetics
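The heterogeneity measures named in the abstract above, Cochran's Q and I², can be sketched with the standard inverse-variance formulas. The per-population effect sizes and standard errors below are invented log odds ratios for illustration only; they are not ADSP results, and the study's actual pipeline may differ.

```python
def cochran_q_and_i2(effects, standard_errors):
    """Cochran's Q statistic and I^2 (in percent) from per-population
    effect estimates, using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Hypothetical log odds ratios for one SNP in four populations.
q_stat, i2_percent = cochran_q_and_i2(
    effects=[0.9, 0.5, 0.4, 0.1],
    standard_errors=[0.1, 0.1, 0.1, 0.1],
)
```

Q is compared against a chi-square distribution with (number of populations − 1) degrees of freedom, while I² expresses the same heterogeneity on a 0 to 100 scale.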

8.

Zinc finger protein 33B demonstrates sex interaction with atopy-related markers in childhood asthma.

Lee, Sanghun; Prokopenko, Dmitry; Kelly, Rachel S; Lutz, Sharon; Ann Lasky-Su, Jessica; Cho, Michael H; Laurie, Cecelia; Celedón, Juan C; Lange, Christoph; Weiss, Scott T; Hecker, Julian; DeMeo, Dawn L.

Eur Respir J ; 61(1)2023 01.

Article in English | MEDLINE | ID: mdl-35953101

ABSTRACT

BACKGROUND: Sex differences related to immune responses can influence atopic manifestations in childhood asthma. While genome-wide association studies have investigated a sex-specific genetic architecture of the immune response, gene-by-sex interactions have not been extensively analysed for atopy-related markers including allergy skin tests, IgE and eosinophils in asthmatic children. METHODS: We performed a genome-wide gene-by-sex interaction analysis for atopy-related markers using whole-genome sequencing data based on 889 trios from the Genetic Epidemiology of Asthma in Costa Rica Study (GACRS) and 284 trios from the Childhood Asthma Management Program (CAMP). We also tested the findings in UK Biobank participants with self-reported childhood asthma. Furthermore, downstream analyses in GACRS integrated gene expression to disentangle observed associations. RESULTS: Single nucleotide polymorphism (SNP) rs1255383 at 10q11.21 demonstrated a genome-wide significant gene-by-sex interaction (p_interaction = 9.08×10⁻¹⁰) for atopy (positive skin test) with opposite direction of effects between females and males. In the UK Biobank participants with a history of childhood asthma, the signal was consistently observed with the same sex-specific effect directions for high eosinophil count (p_interaction = 0.0058). Gene expression of ZNF33B (zinc finger protein 33B), located at 10q11.21, was moderately associated with atopy in girls, but not in boys. CONCLUSIONS: We report SNPs in/near a zinc finger gene as novel sex-differential loci for atopy-related markers with opposite effect directions in females and males. A potential role for ZNF33B should be studied further as an important driver of sex-divergent features of atopy in childhood asthma.

Subject(s)

Asthma; Hypersensitivity, Immediate; Child; Humans; Male; Female; Genome-Wide Association Study; Immunoglobulin E; Asthma/epidemiology; Hypersensitivity, Immediate/genetics; Hypersensitivity, Immediate/epidemiology; Eosinophils; Polymorphism, Single Nucleotide; Genetic Predisposition to Disease

9.

Region-based analysis of rare genomic variants in whole-genome sequencing datasets reveal two novel Alzheimer's disease-associated genes: DTNB and DLG2.

Prokopenko, Dmitry; Lee, Sanghun; Hecker, Julian; Mullin, Kristina; Morgan, Sarah; Katsumata, Yuriko; Weiner, Michael W; Fardo, David W; Laird, Nan; Bertram, Lars; Hide, Winston; Lange, Christoph; Tanzi, Rudolph E.

Mol Psychiatry ; 27(4): 1963-1969, 2022 04.

Article in English | MEDLINE | ID: mdl-35246634

ABSTRACT

Alzheimer's disease (AD) is a genetically complex disease for which nearly 40 loci have now been identified via genome-wide association studies (GWAS). We attempted to identify groups of rare variants (alternate allele frequency <0.01) associated with AD in a region-based, whole-genome sequencing (WGS) association study (rvGWAS) of two independent AD family datasets (NIMH/NIA; 2247 individuals; 605 families). Employing a sliding window approach across the genome, we identified several regions that achieved association p values <10⁻⁶, using the burden test or the SKAT statistic. The genomic region around the dystobrevin beta (DTNB) gene was identified with the burden and SKAT test and replicated in case/control samples from the ADSP study reaching genome-wide significance after meta-analysis (p_meta = 4.74 × 10⁻⁸). SKAT analysis also revealed region-based association around the Discs large homolog 2 (DLG2) gene and replicated in case/control samples from the ADSP study (p_meta = 1 × 10⁻⁶). In conclusion, in a region-based rvGWAS of AD we identified two novel AD genes, DLG2 and DTNB, based on association with rare variants.

Subject(s)

Alzheimer Disease; Dystrophin-Associated Proteins/genetics; Neuropeptides/genetics; Alzheimer Disease/genetics; Dithionitrobenzoic Acid; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study; Genomics; Guanylate Kinases/genetics; Humans; Polymorphism, Single Nucleotide/genetics; Tumor Suppressor Proteins/genetics; Whole Genome Sequencing

10.

Unsupervised outlier detection applied to SARS-CoV-2 nucleotide sequences can identify sequences of common variants and other variants of interest.

Hahn, Georg; Lee, Sanghun; Prokopenko, Dmitry; Abraham, Jonathan; Novak, Tanya; Hecker, Julian; Cho, Michael; Khurana, Surender; Baden, Lindsey R; Randolph, Adrienne G; Weiss, Scott T; Lange, Christoph.

BMC Bioinformatics ; 23(1): 547, 2022 Dec 19.

Article in English | MEDLINE | ID: mdl-36536276

ABSTRACT

As of June 2022, the GISAID database contains more than 11 million SARS-CoV-2 genomes, including several thousand nucleotide sequences for the most common variants such as delta or omicron. These SARS-CoV-2 strains have been collected from patients around the world since the beginning of the pandemic. We start by assessing the similarity of all pairs of nucleotide sequences using the Jaccard index and principal component analysis. As shown previously in the literature, an unsupervised cluster analysis applied to the SARS-CoV-2 genomes results in clusters of sequences according to certain characteristics such as their strain or their clade. Importantly, we observe that nucleotide sequences of common variants are often outliers in clusters of sequences stemming from variants identified earlier on during the pandemic. Motivated by this finding, we are interested in applying outlier detection to nucleotide sequences. We demonstrate that nucleotide sequences of common variants (such as alpha, delta, or omicron) can be identified solely based on a statistical outlier criterion. We argue that outlier detection might be a useful surveillance tool to identify emerging variants in real time as the pandemic progresses.

Subject(s)

COVID-19; Humans; Base Sequence; SARS-CoV-2; Cluster Analysis; Databases, Factual

11.

locStra: Fast analysis of regional/global stratification in whole-genome sequencing studies.

Hahn, Georg; Lutz, Sharon M; Hecker, Julian; Prokopenko, Dmitry; Cho, Michael H; Silverman, Edwin K; Weiss, Scott T; Lange, Christoph.

Genet Epidemiol ; 45(1): 82-98, 2021 02.

Article in English | MEDLINE | ID: mdl-32929743

ABSTRACT

locStra is an R package for the analysis of regional and global population stratification in whole-genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region on the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding window approach, the regional similarity matrices are compared with the global ones, based on user-defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for the specification of the window size is provided. As the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on single cores, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome-wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.

Subject(s)

Genome-Wide Association Study; Genome; Algorithms; Genomics; Humans; Polymorphism, Single Nucleotide; Whole Genome Sequencing

12.

Negative evidence for a role of APH1B T27I variant in Alzheimer's disease.

Zhang, Xulun; Zhang, Can; Prokopenko, Dmitry; Liang, Yingxia; Han, Weinong; Tanzi, Rudolph E; Sisodia, Sangram S.

Hum Mol Genet ; 29(6): 955-966, 2020 04 15.

Article in English | MEDLINE | ID: mdl-31995180

ABSTRACT

γ-secretase is a macromolecular complex that catalyzes intramembranous hydrolysis of more than 100 membrane-bound substrates. The complex is composed of presenilin (PS1 or PS2), anterior pharynx defect-1 (APH-1), nicastrin (NCT) and PEN-2, and early-onset, autosomal dominant forms of Alzheimer's disease (AD) are caused by inheritance of mutations of PS. No mutations in genes encoding NCT or PEN-2 have been identified to date that cause AD. In this regard, a large genetic meta-analysis of four cohorts consisting of more than 600 000 individuals identified a common missense variant, rs117618017 in the APH1B gene that results in a T27I mutation, as a novel genome-wide significant locus. In order to confirm the findings that rs117618017 is associated with risk of AD, we performed a genetic screen from deep whole genome sequencing of the large NIMH family-based Alzheimer's Disease (AD) dataset. In parallel, we sought to uncover potential molecular mechanism(s) by which APH-1B T27I might be associated with AD by generating stable HEK293 cell lines, wherein endogenous APH-1A and APH-1B expression was silenced and into which either the wild type APH-1B or the APH-1B T27I variant was stably expressed. We then tested the impact of expressing either the wild type APH-1B or the APH-1B T27I variant on γ-secretase processing of human APP, the murine Notch derivative mNΔE and human neuregulin-1. We now report that we fail to confirm the association of rs1047552 with AD in our cohort and that cells expressing the APH-1B T27I variant show no discernable impact on the γ-secretase processing of established substrates compared with cells expressing wild-type APH-1B.

Subject(s)

Alzheimer Disease/pathology; Amyloid Precursor Protein Secretases/metabolism; Endopeptidases/genetics; Membrane Proteins/genetics; Polymorphism, Single Nucleotide; Alzheimer Disease/genetics; HEK293 Cells; Humans; Mutation; Protein Binding

13.

Genome-Wide Gene-by-Smoking Interaction Study of Chronic Obstructive Pulmonary Disease.

Kim, Woori; Prokopenko, Dmitry; Sakornsakolpat, Phuwanat; Hobbs, Brian D; Lutz, Sharon M; Hokanson, John E; Wain, Louise V; Melbourne, Carl A; Shrine, Nick; Tobin, Martin D; Silverman, Edwin K; Cho, Michael H; Beaty, Terri H.

Am J Epidemiol ; 190(5): 875-885, 2021 05 04.

Article in English | MEDLINE | ID: mdl-33106845

ABSTRACT

Risk of chronic obstructive pulmonary disease (COPD) is determined by both cigarette smoking and genetic susceptibility, but little is known about gene-by-smoking interactions. We performed a genome-wide association analysis of 179,689 controls and 21,077 COPD cases from UK Biobank subjects of European ancestry recruited from 2006 to 2010, considering genetic main effects and gene-by-smoking interaction effects simultaneously (2-degrees-of-freedom (df) test) as well as interaction effects alone (1-df interaction test). We sought to replicate significant results in COPDGene (United States, 2008-2010) and SpiroMeta Consortium (multiple countries, 1947-2015) data. We considered 2 smoking variables: 1) ever/never and 2) current/noncurrent. In the 1-df test, we identified 1 genome-wide significant locus on 15q25.1 (cholinergic receptor nicotinic β4 subunit, or CHRNB4) for ever- and current smoking and identified PI*Z allele (rs28929474) of serpin family A member 1 (SERPINA1) for ever-smoking and 3q26.2 (MDS1 and EVI1 complex locus, or MECOM) for current smoking in an analysis of previously reported COPD loci. In the 2-df test, most of the significant signals were also significant for genetic marginal effects, aside from 16q22.1 (sphingomyelin phosphodiesterase 3, or SMPD3) and 19q13.2 (Egl-9 family hypoxia inducible factor 2, or EGLN2). The significant effects at 15q25.1 and 19q13.2 loci, both previously described in prior genome-wide association studies of COPD or smoking, were replicated in COPDGene and SpiroMeta. We identified interaction effects at previously reported COPD loci; however, we failed to identify novel susceptibility loci.

Subject(s)

Gene-Environment Interaction; Genome-Wide Association Study; Pulmonary Disease, Chronic Obstructive/genetics; Smoking/genetics; Case-Control Studies; Female; Genetic Predisposition to Disease; Humans; Male; Middle Aged; Pulmonary Disease, Chronic Obstructive/physiopathology; Respiratory Function Tests; United Kingdom; White People/genetics
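The difference between the 1-df interaction test and the 2-df joint test mentioned in the abstract above can be sketched with generic Wald statistics. This is an assumption for illustration: the abstract does not state which test statistics the study used, and the coefficients and variances below are invented.

```python
import math

def interaction_test_1df(beta_gxe, se_gxe):
    """Wald test of the interaction coefficient alone (chi-square, 1 df)."""
    chi2 = (beta_gxe / se_gxe) ** 2
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square(1) upper tail
    return chi2, p_value

def joint_test_2df(beta_g, beta_gxe, var_g, var_gxe, cov):
    """Wald test of the genetic main effect and the interaction together:
    b' V^-1 b with a 2x2 covariance matrix V (chi-square, 2 df)."""
    det = var_g * var_gxe - cov ** 2
    chi2 = (beta_g ** 2 * var_gxe
            - 2.0 * beta_g * beta_gxe * cov
            + beta_gxe ** 2 * var_g) / det
    p_value = math.exp(-chi2 / 2.0)  # chi-square(2) upper tail, closed form
    return chi2, p_value

# Hypothetical estimates for one SNP and one smoking variable.
chi2_int, p_int = interaction_test_1df(beta_gxe=0.196, se_gxe=0.1)
chi2_joint, p_joint = joint_test_2df(0.2, 0.1, 0.01, 0.01, 0.0)
```

The joint test can flag a locus whose signal comes mostly from the main effect, which is consistent with the abstract's remark that most 2-df signals were also significant for genetic marginal effects.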

14.

Whole-genome sequencing reveals new Alzheimer's disease-associated rare variants in loci related to synaptic function and neuronal development.

Prokopenko, Dmitry; Morgan, Sarah L; Mullin, Kristina; Hofmann, Oliver; Chapman, Brad; Kirchner, Rory; Amberkar, Sandeep; Wohlers, Inken; Lange, Christoph; Hide, Winston; Bertram, Lars; Tanzi, Rudolph E.

Alzheimers Dement ; 17(9): 1509-1527, 2021 09.

Article in English | MEDLINE | ID: mdl-33797837

ABSTRACT

INTRODUCTION: Genome-wide association studies have led to numerous genetic loci associated with Alzheimer's disease (AD). Whole-genome sequencing (WGS) now permits genome-wide analyses to identify rare variants contributing to AD risk. METHODS: We performed single-variant and spatial clustering-based testing on rare variants (minor allele frequency [MAF] ≤1%) in a family-based WGS-based association study of 2247 subjects from 605 multiplex AD families, followed by replication in 1669 unrelated individuals. RESULTS: We identified 13 new AD candidate loci that yielded consistent rare-variant signals in discovery and replication cohorts (4 from single-variant, 9 from spatial-clustering), implicating these genes: FNBP1L, SEL1L, LINC00298, PRKCH, C15ORF41, C2CD3, KIF2A, APC, LHX9, NALCN, CTNNA2, SYTL3, and CLSTN2. DISCUSSION: Downstream analyses of these novel loci highlight synaptic function, in contrast to common AD-associated variants, which implicate innate immunity and amyloid processing. These loci have not been associated previously with AD, emphasizing the ability of WGS to identify AD-associated rare variants, particularly outside of the exome.

Subject(s)

Alzheimer Disease/genetics; Gene Frequency/genetics; Genetic Predisposition to Disease; Whole Genome Sequencing; Genome-Wide Association Study; Humans; Ion Channels/genetics; Kinesins/genetics; Membrane Proteins/genetics; Microtubule-Associated Proteins/genetics; Proteins/genetics

15.

TMEM106B and CPOX are genetic determinants of cerebrospinal fluid Alzheimer's disease biomarker levels.

Hong, Shengjun; Dobricic, Valerija; Ohlei, Olena; Bos, Isabelle; Vos, Stephanie J B; Prokopenko, Dmitry; Tijms, Betty M; Andreasson, Ulf; Blennow, Kaj; Vandenberghe, Rik; Gabel, Silvy; Scheltens, Philip; Teunissen, Charlotte E; Engelborghs, Sebastiaan; Frisoni, Giovanni; Blin, Olivier; Richardson, Jill C; Bordet, Regis; Lleó, Alberto; Alcolea, Daniel; Popp, Julius; Clark, Christopher; Peyratout, Gwendoline; Martinez-Lage, Pablo; Tainta, Mikel; Dobson, Richard J B; Legido-Quigley, Cristina; Sleegers, Kristel; Van Broeckhoven, Christine; Tanzi, Rudolph E; Ten Kate, Mara; Wittig, Michael; Franke, Andre; Lill, Christina M; Barkhof, Frederik; Lovestone, Simon; Streffer, Johannes; Zetterberg, Henrik; Visser, Pieter Jelle; Bertram, Lars.

Alzheimers Dement ; 17(10): 1628-1640, 2021 10.

Article in English | MEDLINE | ID: mdl-33991015

ABSTRACT

INTRODUCTION: Neurofilament light (NfL), chitinase-3-like protein 1 (YKL-40), and neurogranin (Ng) are biomarkers for Alzheimer's disease (AD) to monitor axonal damage, astroglial activation, and synaptic degeneration, respectively. METHODS: We performed genome-wide association studies (GWAS) using DNA and cerebrospinal fluid (CSF) samples from the EMIF-AD Multimodal Biomarker Discovery study for discovery, and the Alzheimer's Disease Neuroimaging Initiative study for validation analyses. GWAS were performed for all three CSF biomarkers using linear regression models adjusting for relevant covariates. RESULTS: We identify novel genome-wide significant associations between DNA variants in TMEM106B and CSF levels of NfL, and between CPOX and YKL-40. We confirm previous work suggesting that YKL-40 levels are associated with DNA variants in CHI3L1. DISCUSSION: Our study provides important new insights into the genetic architecture underlying interindividual variation in three AD-related CSF biomarkers. In particular, our data shed light on the sequence of events regarding the initiation and progression of neuropathological processes relevant in AD.

Subject(s)

Alzheimer Disease/genetics , Biomarkers/cerebrospinal fluid , Genome-Wide Association Study , Membrane Proteins/genetics , Nerve Tissue Proteins/genetics , Aged , Chitinase-3-Like Protein 1/genetics , Female , Humans , Male , Neurofilament Proteins/genetics , Neurogranin/cerebrospinal fluid

16.

Whole exome sequencing analysis in severe chronic obstructive pulmonary disease.

Qiao, Dandi; Ameli, Asher; Prokopenko, Dmitry; Chen, Han; Kho, Alvin T; Parker, Margaret M; Morrow, Jarrett; Hobbs, Brian D; Liu, Yanhong; Beaty, Terri H; Crapo, James D; Barnes, Kathleen C; Nickerson, Deborah A; Bamshad, Michael; Hersh, Craig P; Lomas, David A; Agusti, Alvar; Make, Barry J; Calverley, Peter M A; Donner, Claudio F; Wouters, Emiel F; Vestbo, Jørgen; Paré, Peter D; Levy, Robert D; Rennard, Stephen I; Tal-Singer, Ruth; Spitz, Margaret R; Sharma, Amitabh; Ruczinski, Ingo; Lange, Christoph; Silverman, Edwin K; Cho, Michael H.

Hum Mol Genet ; 27(21): 3801-3812, 2018 11 01.

Article in English | MEDLINE | ID: mdl-30060175

ABSTRACT

Chronic obstructive pulmonary disease (COPD), one of the leading causes of death worldwide, is substantially influenced by genetic factors. Alpha-1 antitrypsin deficiency demonstrates that rare coding variants of large effect can influence COPD susceptibility. To identify additional rare coding variants in patients with severe COPD, we conducted whole exome sequencing analysis in 2543 subjects from two family-based studies (Boston Early-Onset COPD Study and International COPD Genetics Network) and one case-control study (COPDGene). Applying a gene-based segregation test in the family-based data, we identified significant segregation of rare loss of function variants in TBC1D10A and RFPL1 (P-value < 2 × 10⁻⁶), but were unable to find similar variants in the case-control study. In single-variant, gene-based and pathway association analyses, we were unable to find significant findings that replicated or were significant in meta-analysis. However, we found that the top results in the two datasets were in proximity to each other in the protein-protein interaction network (P-value = 0.014), suggesting enrichment of these results for similar biological processes. A network of these association results and their neighbors was significantly enriched in the transforming growth factor beta-receptor binding and cilia-related pathways. Finally, in a more detailed examination of candidate genes, we identified individuals with putative high-risk variants, including patients harboring homozygous mutations in genes associated with cutis laxa and Niemann-Pick Disease Type C. Our results likely reflect heterogeneity of genetic risk for COPD along with limitations of statistical power and functional annotation, and highlight the potential of network analysis to gain insight into genetic association studies.

Subject(s)

Exome Sequencing , Genetic Predisposition to Disease , Single Nucleotide Polymorphism , Chronic Obstructive Pulmonary Disease/genetics , Adolescent , Adult , Aged , Case-Control Studies , DNA Mutational Analysis , Female , Genetic Association Studies , Humans , Male , Middle Aged , Mutation , Young Adult

17.

PolyGEE: a generalized estimating equation approach to the efficient and robust estimation of polygenic effects in large-scale association studies.

Hecker, Julian; Prokopenko, Dmitry; Lange, Christoph; Fier, Heide Loehlein.

Biostatistics ; 19(3): 295-306, 2018 07 01.

Article in English | MEDLINE | ID: mdl-28968646

ABSTRACT

To quantify polygenic effects, i.e. undetected genetic effects, in large-scale association studies, we propose a generalized estimating equation (GEE) based estimation framework. We develop a marginal model for single-variant association test statistics of complex diseases that generalizes existing approaches such as LD Score regression and that is applicable to population-based designs, to family-based designs or to arbitrary combinations of both. We extend the standard GEE approach so that the parameters of the proposed marginal model can be estimated based on working-correlation/linkage-disequilibrium (LD) matrices from external reference panels. Our method achieves substantial efficiency gains over standard approaches, while it is robust against misspecification of the LD structure, i.e. the LD structure of the reference panel can differ substantially from the true LD structure in the study population. In simulation studies and in applications to population-based and family-based studies, we illustrate the features of the proposed GEE framework. Our results suggest that our approach can be up to 100% more efficient than existing methodology.

Subject(s)

Biostatistics/methods , Genome-Wide Association Study/methods , Linkage Disequilibrium , Statistical Models , Computer Simulation , Humans , Mental Disorders/genetics , Regression Analysis

18.

Whole-Genome Sequencing in Severe Chronic Obstructive Pulmonary Disease.

Prokopenko, Dmitry; Sakornsakolpat, Phuwanat; Fier, Heide Loehlein; Qiao, Dandi; Parker, Margaret M; McDonald, Merry-Lynn N; Manichaikul, Ani; Rich, Stephen S; Barr, R Graham; Williams, Christopher J; Brantly, Mark L; Lange, Christoph; Beaty, Terri H; Crapo, James D; Silverman, Edwin K; Cho, Michael H.

Am J Respir Cell Mol Biol ; 59(5): 614-622, 2018 11.

Article in English | MEDLINE | ID: mdl-29949718

ABSTRACT

Genome-wide association studies have identified common variants associated with chronic obstructive pulmonary disease (COPD). Whole-genome sequencing (WGS) offers comprehensive coverage of the entire genome, as compared with genotyping arrays or exome sequencing. We hypothesized that WGS in subjects with severe COPD and smoking control subjects with normal pulmonary function would allow us to identify novel genetic determinants of COPD. We sequenced 821 patients with severe COPD and 973 control subjects from the COPDGene and Boston Early-Onset COPD studies, including both non-Hispanic white and African American individuals. We performed single-variant and grouped-variant analyses, and in addition, we assessed the overlap of variants between sequencing- and array-based imputation. Our most significantly associated variant was in a known region near HHIP (combined P = 1.6 × 10⁻⁹); additional variants approaching genome-wide significance included previously described regions in CHRNA5, TNS1, and SERPINA6/SERPINA1 (the latter in African American individuals). None of our associations were clearly driven by rare variants, and we found minimal evidence of replication of genes identified by previously reported smaller sequencing studies. With WGS, we identified more than 20 million new variants, not seen with imputation, including more than 10,000 of potential importance in previously identified COPD genome-wide association study regions. WGS in severe COPD identifies a large number of potentially important functional variants, with the strongest associations being in known COPD risk loci, including HHIP and SERPINA1. Larger sample sizes will be needed to identify associated variants in novel regions of the genome.

Subject(s)

Genome-Wide Association Study , Lung/metabolism , Single Nucleotide Polymorphism , Chronic Obstructive Pulmonary Disease/genetics , Severity of Illness Index , Whole Genome Sequencing/methods , Black or African American/statistics & numerical data , Aged , Case-Control Studies , Cohort Studies , Female , Genetic Predisposition to Disease , Humans , Lung/pathology , Male , Middle Aged , Chronic Obstructive Pulmonary Disease/ethnology , White People/statistics & numerical data

19.

On the association analysis of genome-sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows.

Loehlein Fier, Heide; Prokopenko, Dmitry; Hecker, Julian; Cho, Michael H; Silverman, Edwin K; Weiss, Scott T; Tanzi, Rudolph E; Lange, Christoph.

Genet Epidemiol ; 41(4): 332-340, 2017 05.

Article in English | MEDLINE | ID: mdl-28318110

ABSTRACT

For the association analysis of whole-genome sequencing (WGS) studies, we propose an efficient and fast spatial-clustering algorithm. Compared to existing analysis approaches for WGS data, that define the tested regions either by sliding or consecutive windows of fixed sizes along variants, a meaningful grouping of nearby variants into consecutive regions has the advantage that, compared to sliding window approaches, the number of tested regions is likely to be smaller. In comparison to consecutive, fixed-window approaches, our approach is likely to group nearby variants together. Given existing biological evidence that disease-associated mutations tend to physically cluster in specific regions along the chromosome, the identification of meaningful groups of nearby located variants could thus lead to a potential power gain for association analysis. Our algorithm defines consecutive genomic regions based on the physical positions of the variants, assuming an inhomogeneous Poisson process and groups together nearby variants. As parameters are estimated locally, the algorithm takes the differing variant density along the chromosome into account and provides locally optimal partitioning of variants into consecutive regions. An R-implementation of the algorithm is provided. We discuss the theoretical advances of our algorithm compared to existing, window-based approaches and show the performance and advantage of our introduced algorithm in a simulation study and by an application to Alzheimer's disease WGS data. Our analysis identifies a region in the ITGB3 gene that potentially harbors disease susceptibility loci for Alzheimer's disease. The region-based association signal of ITGB3 replicates in an independent data set and achieves formally genome-wide significance. Software Implementation: An implementation of the algorithm in R is available at: https://github.com/heidefier/cluster_wgs_data.

Subject(s)

Genome-Wide Association Study , Genome , DNA Sequence Analysis , Algorithms , Alzheimer Disease/genetics , Cluster Analysis , Computer Simulation , Genomics , Humans , Genetic Models , Software
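
Illustrative note on entry 19: the abstract describes grouping nearby variants into consecutive, nonoverlapping regions using locally estimated parameters of an inhomogeneous Poisson process. The Python sketch below is a simplified stand-in, not the authors' algorithm (their R implementation is at the GitHub URL given in the abstract). It assumes that gaps between neighbouring variants are roughly exponential with a locally estimated mean, and starts a new region when a gap is improbably large under that model. The function name `partition_by_gaps` and the parameters `window` and `alpha` are hypothetical.

```python
import math


def partition_by_gaps(positions, window=5, alpha=0.05):
    """Split sorted variant positions into consecutive, nonoverlapping regions.

    Assumption (not from the paper): a region boundary is placed wherever the
    gap to the previous variant has tail probability below `alpha` under an
    exponential model whose mean is estimated from the surrounding gaps.
    """
    if not positions:
        return []
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    regions = [[positions[0]]]
    for i, gap in enumerate(gaps):
        # Local estimate: mean of up to `window` gaps on either side (inclusive).
        lo, hi = max(0, i - window), min(len(gaps), i + window + 1)
        local = gaps[lo:hi]
        mean_gap = sum(local) / len(local)
        # Exponential tail probability P(G >= gap) = exp(-gap / mean_gap).
        tail = math.exp(-gap / mean_gap) if mean_gap > 0 else 1.0
        if tail < alpha:
            regions.append([])  # unusually large gap: open a new region
        regions[-1].append(positions[i + 1])
    return regions


# Example: two tight groups of variants separated by a long stretch.
example_regions = partition_by_gaps([100, 110, 120, 5000, 5010, 5020])
```

Because the mean gap is re-estimated around every variant, the threshold adapts to regions of high and low variant density, which is the property the abstract emphasises; the specific test used here is only one plausible choice.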

20.

Utilizing the Jaccard index to reveal population stratification in sequencing data: a simulation study and an application to the 1000 Genomes Project.

Prokopenko, Dmitry; Hecker, Julian; Silverman, Edwin K; Pagano, Marcello; Nöthen, Markus M; Dina, Christian; Lange, Christoph; Fier, Heide Loehlein.

Bioinformatics ; 32(9): 1366-72, 2016 05 01.

Article in English | MEDLINE | ID: mdl-26722118

ABSTRACT

MOTIVATION: Population stratification is one of the major sources of confounding in genetic association studies, potentially causing false-positive and false-negative results. Here, we present a novel approach for the identification of population substructure in high-density genotyping data/next generation sequencing data. The approach exploits the co-appearances of rare genetic variants in individuals. The method can be applied to all available genetic loci and is computationally fast. Using sequencing data from the 1000 Genomes Project, the features of the approach are illustrated and compared to existing methodology (i.e. EIGENSTRAT). We examine the effects of different cutoffs for the minor allele frequency on the performance of the approach. We find that our approach works particularly well for genetic loci with very small minor allele frequencies. The results suggest that the inclusion of rare-variant data/sequencing data in our approach provides a much higher resolution picture of population substructure than can be obtained with existing methodology. Furthermore, in simulation studies, we find scenarios where our method was able to control the type 1 error more precisely and showed higher power. CONTACT: dmitry.prokopenko@uni-bonn.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome , Animals , Computer Simulation , Gene Frequency , Genetic Association Studies , Genetic Variation , Genotype , High-Throughput Nucleotide Sequencing , Humans
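
Illustrative note on entry 20: the abstract says the approach exploits co-appearances of rare variants in individuals, and the title names the Jaccard index. As a minimal sketch (not the authors' implementation), the Jaccard index between two individuals can be taken as the number of rare variants both carry divided by the number carried by at least one of them. The data layout (a dict mapping sample IDs to sets of variant IDs) and the names `jaccard` and `jaccard_matrix` are assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(carriers):
    """Pairwise Jaccard similarity between samples.

    `carriers` maps each sample ID to the set of rare-variant IDs it carries
    (hypothetical layout). Returns a dict keyed by (sample, sample) pairs.
    """
    names = sorted(carriers)
    sim = {(n, n): 1.0 for n in names}
    for x, y in combinations(names, 2):
        sim[(x, y)] = sim[(y, x)] = jaccard(carriers[x], carriers[y])
    return sim


# Toy example: s1 and s2 share two of four distinct variants; s3 shares none.
toy = {
    "s1": {"v1", "v2", "v3"},
    "s2": {"v2", "v3", "v4"},
    "s3": {"v9"},
}
toy_sim = jaccard_matrix(toy)
```

A similarity matrix of this kind could then be passed to a clustering or dimension-reduction step to look for population substructure; which downstream step the paper uses is not stated in the abstract.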
