Results 1 - 20 of 54
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38836403

ABSTRACT

In precision medicine, both predicting the disease susceptibility of an individual and forecasting their disease-free survival are key research areas. Besides the classical epidemiological predictor variables, data from multiple (omic) platforms are increasingly available. To integrate this wealth of information, we propose a new methodology that combines cooperative learning, a recent approach for leveraging the predictive power of several datasets, with polygenic hazard score models. Polygenic hazard score models give a practitioner a more differentiated view of the predicted disease-free survival than a mere point estimate, such as one computed with a polygenic risk score. Our aim is to leverage the advantages of cooperative learning for the computation of polygenic hazard score models via Cox's proportional hazards model, thereby improving the prediction of disease-free survival. In our experimental study, we apply our methodology to forecast disease-free survival for Alzheimer's disease (AD) using three layers of data. One layer contains epidemiological variables such as sex, APOE (apolipoprotein E, a genetic risk factor for AD) status and 10 leading principal components. Another layer contains selected genomic loci, and the last layer contains methylation data for selected CpG sites. We demonstrate that the survival curves computed via cooperative learning yield an AUC of around 0.7, above the state-of-the-art performance of competing methods. Importantly, the proposed methodology returns (1) a linear score that can be easily interpreted (in contrast to machine learning approaches), and (2) a weighting of the predictive power of the involved data layers, allowing the importance of each omic (or other) platform to be assessed. Like polygenic hazard score models, our methodology also allows individual survival curves to be computed for each patient.
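
Once a Cox model has produced a linear risk score for each subject, an individual disease-free survival curve follows from an estimate of the baseline hazard. The sketch below is a minimal Python illustration of that last step only, assuming the fitted risk scores are already available and using the Breslow estimator for the baseline cumulative hazard; it is not the authors' cooperative-learning code, and the function names are hypothetical.

```python
import math

def breslow_cumulative_hazard(times, events, risk_scores):
    """Breslow estimate of the baseline cumulative hazard H0(t).

    times: follow-up times; events: 1 if disease onset was observed, 0 if censored;
    risk_scores: fitted linear predictor beta'x per subject (assumed given).
    Returns a list of (event_time, H0(event_time)) steps in increasing time order.
    """
    steps, cumulative = [], 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        # number of onsets at time t (ties handled the Breslow way)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        # sum of exp(risk score) over subjects still at risk at time t
        at_risk = sum(math.exp(r) for ti, r in zip(times, risk_scores) if ti >= t)
        cumulative += d / at_risk
        steps.append((t, cumulative))
    return steps

def survival_curve(steps, risk_score):
    """Individual disease-free survival S(t | x) = exp(-H0(t) * exp(beta'x))."""
    return [(t, math.exp(-h * math.exp(risk_score))) for t, h in steps]
```

A larger risk score shifts the whole curve downward, which is what makes the curves patient-specific.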


Subject(s)
Alzheimer Disease , Precision Medicine , Humans , Precision Medicine/methods , Alzheimer Disease/genetics , Alzheimer Disease/mortality , Disease-Free Survival , Machine Learning , Proportional Hazards Models , Multifactorial Inheritance , Male , Female , Multiomics
2.
Genet Epidemiol ; 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982682

ABSTRACT

The prediction of an individual's susceptibility to a certain disease is an important and timely research area. An established technique is to estimate the risk of an individual with the help of an integrated risk model, that is, a polygenic risk score with added epidemiological covariates. However, integrated risk models do not capture any time dependence and typically provide only a point estimate of the relative risk with respect to a reference population. The aim of this work is twofold. First, we explore and advocate the idea of predicting the time-dependent hazard and survival (defined as disease-free time) of an individual for the onset of a disease. This provides a practitioner with a much more differentiated view of absolute survival as a function of time. Second, to compute the time-dependent risk of an individual, we use published methodology to fit a Cox proportional hazards model to data from a genetic SNP study of time to Alzheimer's disease (AD) onset, using the lasso to incorporate further epidemiological variables such as sex, APOE (apolipoprotein E, a genetic risk factor for AD) status, 10 leading principal components, and selected genomic loci. We apply the lasso for Cox's proportional hazards model to a data set of 6792 subjects (4102 AD cases and 2690 controls) and 87 covariates. We demonstrate that fitting a lasso model for Cox's proportional hazards yields more accurate survival curves than state-of-the-art (likelihood-based) methods. Moreover, the methodology yields personalized survival curves for a patient, giving a much more differentiated view of the expected progression of a disease than the view offered by integrated risk models. The runtime to compute personalized survival curves is under a minute for the entire AD data set, which would allow datasets with 60,000-100,000 subjects to be handled in less than 1 h.
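
The lasso for Cox's proportional hazards model minimizes the negative log partial likelihood plus an L1 penalty on the coefficients. The sketch below only evaluates that objective for a given coefficient vector (an optimizer such as coordinate descent would then minimize it). The 1/n scaling, the Breslow handling of ties and the function name are assumptions made here for illustration, not details taken from the abstract.

```python
import math

def lasso_cox_objective(beta, X, times, events, lam):
    """(1/n) * negative log partial likelihood + lam * ||beta||_1 (Breslow ties)."""
    n = len(times)
    # linear predictor eta_i = beta' x_i for every subject
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects only enter through the risk sets
        risk_set = [eta[j] for j in range(n) if times[j] >= times[i]]
        m = max(risk_set)  # shift for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(e - m) for e in risk_set))
        loglik += eta[i] - log_denominator
    return -loglik / n + lam * sum(abs(b) for b in beta)
```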

3.
Hum Mol Genet ; 32(4): 696-707, 2023 01 27.
Article in English | MEDLINE | ID: mdl-36255742

ABSTRACT

BACKGROUND: Asthma is a heterogeneous common respiratory disease that remains poorly understood. The established genetic associations fail to explain the high estimated heritability, and the prevalence of asthma differs between populations and geographic regions. Robust association analyses incorporating different genetic ancestries and whole-genome sequencing data may identify novel genetic associations. METHODS: We performed family-based genome-wide association analyses of childhood-onset asthma based on whole-genome sequencing (WGS) data for 'The Genetic Epidemiology of Asthma in Costa Rica' study (GACRS) and the Childhood Asthma Management Program (CAMP). Based on parent-child trios with children diagnosed with asthma, we performed a single-variant analysis using an additive and a recessive genetic model and a region-based association analysis of low-frequency and rare variants. RESULTS: Based on 1180 asthmatic trios (894 GACRS trios and 286 CAMP trios, a total of 3540 samples with WGS data), we identified three novel genetic loci associated with childhood-onset asthma: rs4832738 on 4p14 (P = 1.72 × 10^-9, recessive model), rs1581479 on 8p22 (P = 1.47 × 10^-8, additive model) and rs73367537 on 10q26 (P = 1.21 × 10^-8, additive model in GACRS only). Integrative analyses suggested potential novel candidate genes underlying these associations: PGM2 on 4p14 and FGF20 on 8p22. CONCLUSION: Our family-based whole-genome sequencing analysis identified three novel genetic loci for childhood-onset asthma. Gene expression data and integrative analyses point to PGM2 on 4p14 and FGF20 on 8p22 as linked genes. Furthermore, region-based analyses suggest independent potential low-frequency/rare variant associations on 8p22. Follow-up analyses are needed to understand the functional mechanisms and generalizability of these associations.
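
For trios, the simplest family-based association test is the classical transmission disequilibrium test (TDT), which compares how often heterozygous parents do and do not transmit the tested allele to their affected child. The study itself used family-based (FBAT-type) analyses; the sketch below shows only the classical TDT, which is closely related to the additive-model test in the trio setting. The counts in the example are made up.

```python
import math

def tdt(transmitted, untransmitted):
    """Classical TDT from heterozygous parents of affected children.

    transmitted: times the tested allele was passed on; untransmitted: times it was not.
    Returns the 1-df chi-square statistic and its p-value.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    # upper tail of a 1-df chi-square: P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p = tdt(30, 10)  # hypothetical counts: 30 transmissions vs 10 non-transmissions
```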


Subject(s)
Asthma , Genome-Wide Association Study , Humans , Genetic Predisposition to Disease , Asthma/genetics , Genetic Loci , Whole Genome Sequencing , Polymorphism, Single Nucleotide/genetics , Fibroblast Growth Factors/genetics
4.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36585781

ABSTRACT

Genetic similarity matrices are commonly used to assess population substructure (PS) in genetic studies. Through simulation studies and by the application to whole-genome sequencing (WGS) data, we evaluate the performance of three genetic similarity matrices: the unweighted and weighted Jaccard similarity matrices and the genetic relationship matrix. We describe different scenarios that can create numerical pitfalls and lead to incorrect conclusions in some instances. We consider scenarios in which PS is assessed based on loci that are located across the genome ('globally') and based on loci from a specific genomic region ('locally'). We also compare scenarios in which PS is evaluated based on loci from different minor allele frequency bins: common (>5%), low-frequency (5-0.5%) and rare (<0.5%) single-nucleotide variations (SNVs). Overall, we observe that all approaches provide the best clustering performance when computed based on rare SNVs. The performance of the similarity matrices is very similar for common and low-frequency variants, but for rare variants, the unweighted Jaccard matrix provides preferable clustering features. Based on visual inspection and in terms of standard clustering metrics, its clusters are the densest and the best separated in the principal component analysis of rare SNVs, compared with the other methods and allele frequency cutoffs. In an application, we assessed the role of rare variants in local and global PS, using WGS data from multiethnic Alzheimer's disease data sets and European or East Asian populations from the 1000 Genomes Project.
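
The unweighted Jaccard similarity between two subjects is the number of variants both carry divided by the number of variants either carries. The sketch below builds the full subject-by-subject matrix from a small genotype table; treating any nonzero minor-allele count as "carried" and setting the similarity of two variant-free subjects to 1 are assumptions made here, and the function name is hypothetical.

```python
def jaccard_matrix(genotypes):
    """Unweighted Jaccard similarity between subjects.

    genotypes: one row per subject of 0/1/2 minor-allele counts; a variant is treated
    as carried when its count is > 0 (assumption).
    """
    carried = [{k for k, g in enumerate(row) if g > 0} for row in genotypes]
    n = len(carried)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = carried[i] | carried[j]
            # assumed convention: two subjects without any variant have similarity 1
            sim[i][j] = len(carried[i] & carried[j]) / len(union) if union else 1.0
    return sim
```

Because rare variants are carried by few subjects, shared rare variants contribute strongly to this ratio, which fits the observation that the unweighted Jaccard matrix clusters best on rare SNVs.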


Subject(s)
Genome , Genomics , Principal Component Analysis , Gene Frequency , Computer Simulation , Genome-Wide Association Study , Polymorphism, Single Nucleotide
5.
PLoS Genet ; 18(11): e1010464, 2022 11.
Article in English | MEDLINE | ID: mdl-36383614

ABSTRACT

The identification and understanding of gene-environment interactions can provide insights into the pathways and mechanisms underlying complex diseases. However, testing for gene-environment interaction remains a challenge since (a) statistical power is often limited and (b) modeling of environmental effects is nontrivial, and misspecified models can lead to false-positive interaction findings. To address the lack of statistical power, recent methods aim to identify interactions on an aggregated level using, for example, polygenic risk scores. While this strategy can increase the power to detect interactions, identifying contributing genes and pathways is difficult based on these relatively global results. Here, we propose RITSS (Robust Interaction Testing using Sample Splitting), a gene-environment interaction testing framework for quantitative traits that is based on sample splitting and robust test statistics. RITSS can incorporate sets of genetic variants and/or multiple environmental factors. Based on the user's choice of statistical/machine learning approaches, a screening step selects and combines potential interactions into scores with improved interpretability. In the testing step, the application of robust statistics minimizes the susceptibility to main effect misspecifications. Using extensive simulation studies, we demonstrate that RITSS controls the type 1 error rate in a wide range of scenarios, and we show how the screening strategy influences statistical power. In an application to lung function phenotypes and human height in the UK Biobank, RITSS identified highly significant interactions based on subcomponents of genetic risk scores. While the contributing single-variant interaction signals are weak, our results indicate interaction patterns that result in strong aggregated effects, providing potential insights into underlying gene-environment interaction mechanisms.
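
RITSS itself combines user-chosen screening methods with robust test statistics. The toy sketch below only illustrates the sample-splitting idea that keeps the testing step valid: a G x E term is selected on one half of the data and evaluated on the other, independent half, so the selection cannot inflate the test. As a simplifying assumption it uses plain correlations of the product term with the trait and does not adjust for main effects, which is exactly the kind of misspecification RITSS's robust statistics are designed to guard against. All names are hypothetical.

```python
import math
import random

def split_halves(n, seed=0):
    """Randomly split subject indices into a screening half and a testing half."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[: n // 2], idx[n // 2:]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def screen_then_test(y, G, E, seed=0):
    """Toy split-sample G x E analysis: pick a term on one half, test it on the other."""
    screen, test = split_halves(len(y), seed)

    def term(rows, k):  # product term G_k * E for the given subjects
        return [G[i][k] * E[i] for i in rows]

    y_screen = [y[i] for i in screen]
    # screening step: the variant whose G x E term is most correlated with y
    best = max(range(len(G[0])), key=lambda k: abs(pearson(term(screen, k), y_screen)))
    # testing step: Fisher z statistic computed only on the independent half
    r = pearson(term(test, best), [y[i] for i in test])
    r = max(-0.999999, min(0.999999, r))  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(test) - 3)
    return best, z
```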


Subject(s)
Models, Genetic , Polymorphism, Single Nucleotide , Humans , Genetic Loci , Gene-Environment Interaction , Phenotype , Computer Simulation , Genome-Wide Association Study
6.
BMC Bioinformatics ; 25(1): 43, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38273228

ABSTRACT

The computation of a similarity measure for genomic data is a standard tool in computational genetics. The principal components of such matrices are routinely used to correct for biases due to confounding by population stratification, for instance in linear regressions. However, calculating both a similarity matrix and its singular value decomposition (SVD) is computationally intensive. The contribution of this article is threefold. First, we demonstrate that the calculation of three matrices (called the covariance matrix, the weighted Jaccard matrix, and the genomic relationship matrix) can be reformulated in a unified way that allows the application of a randomized SVD algorithm, which is faster than the traditional computation. The fast SVD algorithm we present is adapted from an existing randomized SVD algorithm and ensures that all computations are carried out in sparse matrix algebra. The algorithm only assumes that row-wise and column-wise subtraction and multiplication of a vector with a sparse matrix are available, operations that are efficiently implemented in common sparse matrix packages. An exception is the so-called Jaccard matrix, which does not have a structure amenable to the fast SVD algorithm. Second, an approximate Jaccard matrix is introduced to which the fast SVD computation is applicable. Third, we establish guaranteed theoretical bounds on the accuracy (in [Formula: see text] norm and angle) between the principal components of the Jaccard matrix and those of our proposed approximation, thus putting the proposed Jaccard approximation on a solid mathematical foundation, and derive the theoretical runtime of our algorithm. We illustrate that the approximation error is low in practice and empirically verify the theoretical runtime scalings on both simulated data and data of the 1000 Genomes Project.
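
Randomized SVD algorithms access the matrix only through products with vectors, which is why the reformulation matters: a centered (or otherwise shifted) genotype matrix never has to be stored densely. The sketch below shows this for the covariance-matrix case, assuming column (variant) means are subtracted; the dict-of-rows sparse format and the function names are illustrative choices, not the paper's implementation.

```python
def sparse_matvec(rows, v):
    """X v for a sparse X stored as one {column: value} dict per row."""
    return [sum(val * v[j] for j, val in row.items()) for row in rows]

def sparse_rmatvec(rows, u, n_cols):
    """X^T u for the same sparse storage."""
    out = [0.0] * n_cols
    for i, row in enumerate(rows):
        for j, val in row.items():
            out[j] += val * u[i]
    return out

def column_means(rows, n_cols):
    sums = [0.0] * n_cols
    for row in rows:
        for j, val in row.items():
            sums[j] += val
    return [s / len(rows) for s in sums]

def centered_matvec(rows, mu, v):
    """(X - 1 mu^T) v computed as X v - (mu . v) 1, without densifying X."""
    shift = sum(m * x for m, x in zip(mu, v))
    return [y - shift for y in sparse_matvec(rows, v)]

def centered_rmatvec(rows, mu, u, n_cols):
    """(X - 1 mu^T)^T u computed as X^T u - (sum of u) mu."""
    total = sum(u)
    return [y - total * m for y, m in zip(sparse_rmatvec(rows, u, n_cols), mu)]
```

Each product costs one sparse pass plus O(n + m) work for the correction term, so the sparsity of rare-variant data is preserved throughout.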


Subject(s)
Genome , Genomics , Algorithms , Linear Models
7.
Hum Mol Genet ; 31(22): 3873-3885, 2022 11 10.
Article in English | MEDLINE | ID: mdl-35766891

ABSTRACT

RATIONALE: Genetic variation contributes substantially to chronic obstructive pulmonary disease (COPD) and lung function measurements. Heritability estimates using genome-wide genotyping data can be biased if analyses do not appropriately account for the nonuniform distribution of genetic effects across the allele frequency and linkage disequilibrium (LD) spectrum. In addition, the contribution of rare variants has been unclear. OBJECTIVES: We sought to assess the heritability of COPD and lung function using whole-genome sequence data from the Trans-Omics for Precision Medicine program. METHODS: Using the genome-based restricted maximum likelihood method, we partitioned the genome into bins based on minor allele frequency and LD scores and estimated the heritability of COPD, FEV1% predicted and FEV1/FVC ratio in 11,051 European ancestry and 5853 African-American participants. MEASUREMENTS AND MAIN RESULTS: In European ancestry participants, the estimated heritabilities of COPD, FEV1% predicted and FEV1/FVC ratio were 35.5%, 55.6% and 32.5%, of which 18.8%, 19.7% and 17.8% were from common variants, and 16.6%, 35.8% and 14.6% were from rare variants. These estimates had wide confidence intervals, with common variants and some sets of rare variants showing a statistically significant contribution (P-value < 0.05). In African-Americans, common variant heritability was similar to that in European ancestry participants, but the lower sample size precluded calculation of rare variant heritability. CONCLUSIONS: Our study provides updated and unbiased estimates of heritability for COPD and lung function and suggests an important contribution of rare variants. Larger studies of more diverse ancestry will improve the accuracy of these estimates.
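
Partitioning variants by minor allele frequency and LD score is the bookkeeping step that precedes a multi-component GREML fit, with one genetic relationship matrix per bin. The sketch below assigns each variant to a MAF bin and, within that bin, to an LD-score quantile group. The cutoffs (0.01 and 0.05) and the number of LD groups are placeholder assumptions, since the abstract does not state the bins used, and the GREML fit itself is not shown.

```python
def maf_ld_bins(mafs, ld_scores, maf_cutoffs=(0.01, 0.05), n_ld_groups=2):
    """Label each variant with (MAF bin, LD-score group within that MAF bin)."""
    # MAF bin = number of cutoffs the variant's MAF reaches (0 = rarest bin)
    maf_bin = [sum(m >= c for c in maf_cutoffs) for m in mafs]
    labels = [None] * len(mafs)
    for b in sorted(set(maf_bin)):
        # variants in this MAF bin, ordered by increasing LD score
        members = sorted((k for k, mb in enumerate(maf_bin) if mb == b), key=lambda k: ld_scores[k])
        for rank, k in enumerate(members):
            # equal-sized LD-score groups within the MAF bin
            labels[k] = (b, rank * n_ld_groups // len(members))
    return labels
```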


Subject(s)
Genetic Predisposition to Disease , Pulmonary Disease, Chronic Obstructive , Humans , Polymorphism, Single Nucleotide/genetics , Pulmonary Disease, Chronic Obstructive/genetics , Genome-Wide Association Study , Phenotype
8.
Eur Respir J ; 63(5)2024 May.
Article in English | MEDLINE | ID: mdl-38514093

ABSTRACT

RATIONALE: Respiratory virus-induced inflammation is the leading cause of asthma exacerbation and is frequently accompanied by induction of interferon-stimulated genes (ISGs). How asthma-susceptibility genes modulate the cellular response to viral infection by fine-tuning ISG induction and subsequent airway inflammation in genetically susceptible asthma patients remains largely unknown. OBJECTIVES: To decipher the functions of gasdermin B (encoded by GSDMB) in respiratory virus-induced lung inflammation. METHODS: In two independent cohorts, we analysed the expression correlation between GSDMB and ISGs. In a human bronchial epithelial cell line or in primary bronchial epithelial cells, we generated GSDMB-overexpressing and GSDMB-deficient cells. A series of quantitative PCR, ELISA and co-immunoprecipitation assays were performed to determine the function and mechanism of GSDMB in ISG induction. We also generated a novel transgenic mouse line with inducible expression of the human-specific GSDMB gene in airway epithelial cells and infected the mice with respiratory syncytial virus to determine the role of GSDMB in respiratory syncytial virus-induced lung inflammation in vivo. RESULTS: GSDMB is one of the most significant asthma-susceptibility genes at 17q21 and acts as a novel RNA sensor, promoting mitochondrial antiviral-signalling protein (MAVS)-TANK binding kinase 1 (TBK1) signalling and subsequent inflammation. In airway epithelium, GSDMB is induced by respiratory viral infections. Expression of GSDMB and ISGs was significantly correlated in respiratory epithelium from two independent asthma cohorts. Notably, inducible expression of human GSDMB in mouse airway epithelium led to enhanced ISG induction and increased airway inflammation with mucus hypersecretion upon respiratory syncytial virus infection. CONCLUSIONS: GSDMB promotes ISG expression and airway inflammation upon respiratory virus infection, thereby conferring asthma risk in risk allele carriers.


Subject(s)
Adaptor Proteins, Signal Transducing , Asthma , Gasdermins , Protein Serine-Threonine Kinases , Signal Transduction , Animals , Humans , Asthma/metabolism , Asthma/genetics , Mice , Adaptor Proteins, Signal Transducing/metabolism , Adaptor Proteins, Signal Transducing/genetics , Protein Serine-Threonine Kinases/metabolism , Protein Serine-Threonine Kinases/genetics , Mice, Transgenic , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Genetic Predisposition to Disease , Respiratory Syncytial Virus Infections/metabolism , Respiratory Syncytial Virus Infections/genetics , Epithelial Cells/metabolism , Cell Line , Bronchi/metabolism , Bronchi/pathology , Pneumonia/metabolism , Pneumonia/genetics , Pneumonia/virology , Female , Lung/metabolism , Lung/pathology
9.
Am J Respir Crit Care Med ; 208(7): 791-801, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37523715

ABSTRACT

Rationale: In addition to rare genetic variants and the MUC5B locus, common genetic variants contribute to idiopathic pulmonary fibrosis (IPF) risk. The predictive power of common variants outside the MUC5B locus for IPF and interstitial lung abnormalities (ILAs) is unknown. Objectives: We tested the predictive value of IPF polygenic risk scores (PRSs) with and without the MUC5B region for IPF, ILA, and ILA progression. Methods: Using an IPF genome-wide association study, we developed PRSs that included (PRS-M5B) and excluded (PRS-NO-M5B) the MUC5B region (500-kb window around rs35705950-T). We assessed PRS associations and area under the receiver operating characteristic curve (AUC) metrics for IPF, ILA, and ILA progression. Measurements and Main Results: We included 14,650 participants (1,970 IPF; 1,068 ILA) from six multi-ancestry population-based and case-control cohorts. In cases excluded from the genome-wide association study, the PRS-M5B (odds ratio [OR] per SD of the score, 3.1; P = 7.1 × 10^-95) and PRS-NO-M5B (OR per SD, 2.8; P = 2.5 × 10^-87) were associated with IPF. Participants in the top PRS-NO-M5B quintile had approximately sevenfold odds of IPF compared with those in the first quintile. A clinical model predicted IPF (AUC, 0.61); rs35705950-T and PRS-NO-M5B demonstrated higher AUCs (0.73 and 0.7, respectively), and adding both genetic predictors to a clinical model yielded the highest performance (AUC, 0.81). The PRS-NO-M5B was associated with ILA (OR, 1.25) and ILA progression (OR, 1.16) in European ancestry participants. Conclusions: A common genetic variant risk score complements the MUC5B variant to identify individuals at high risk of interstitial lung abnormalities and pulmonary fibrosis.
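
A PRS is a weighted sum of risk-allele dosages, and the AUC reported for it equals the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below computes both directly; the pairwise count is O(cases × controls), adequate for illustration but not how one would score a cohort of 14,650 participants. The weights and scores in the checks are made up.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 per variant)."""
    return sum(d * w for d, w in zip(dosages, weights))

def auc(scores, labels):
    """AUC as the Mann-Whitney probability P(case score > control score), ties count 1/2."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```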


Subject(s)
Genome-Wide Association Study , Idiopathic Pulmonary Fibrosis , Humans , Idiopathic Pulmonary Fibrosis/genetics , Risk Factors , Lung , Mucin-5B/genetics , Genetic Predisposition to Disease
10.
Alzheimers Dement ; 20(5): 3397-3405, 2024 05.
Article in English | MEDLINE | ID: mdl-38563508

ABSTRACT

INTRODUCTION: Genome-wide association studies have identified numerous disease susceptibility loci (DSLs) for Alzheimer's disease (AD). However, only a limited number of studies have investigated the dependence of the genetic effect size of established DSLs on genetic ancestry. METHODS: We utilized the whole genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP) including 35,569 participants. A total of 25,459 subjects in four distinct populations (African ancestry, non-Hispanic White, admixed Hispanic, and Asian) were analyzed. RESULTS: We found that nine DSLs showed significant heterogeneity across populations. Single nucleotide polymorphism (SNP) rs2075650 in translocase of outer mitochondrial membrane 40 (TOMM40) showed the largest heterogeneity (Cochran's Q = 0.00, I² = 90.08), followed by other SNPs in apolipoprotein C1 (APOC1) and apolipoprotein E (APOE). Two additional loci, signal-induced proliferation-associated 1 like 2 (SIPA1L2) and solute carrier 24 member 4 (SLC24A4), showed significant heterogeneity across populations. DISCUSSION: We observed substantial heterogeneity for the APOE-harboring 19q13.32 region with TOMM40/APOE/APOC1 genes. The largest risk effect was seen among African Americans, while Asians showed a surprisingly small risk effect.
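
Cochran's Q and I² quantify how much per-population effect estimates disagree beyond what their standard errors would explain. The sketch below uses fixed-effect inverse-variance weights; the effect sizes in the checks are invented, not the ADSP estimates. A p-value for Q would come from a chi-square distribution with (number of populations - 1) degrees of freedom, which is not computed here.

```python
def cochran_q_i2(betas, ses):
    """Return (pooled effect, Cochran's Q, I^2 in percent) for per-population estimates."""
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of the variation attributed to heterogeneity, truncated at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2
```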


Subject(s)
Alzheimer Disease , Genetic Predisposition to Disease , Genome-Wide Association Study , Mitochondrial Precursor Protein Import Complex Proteins , Polymorphism, Single Nucleotide , Humans , Alzheimer Disease/genetics , Genetic Predisposition to Disease/genetics , Polymorphism, Single Nucleotide/genetics , Apolipoproteins E/genetics , Female , Male , Apolipoprotein C-I/genetics , Aged , Membrane Transport Proteins/genetics , Genetic Loci/genetics
11.
Eur Respir J ; 61(1)2023 01.
Article in English | MEDLINE | ID: mdl-35953101

ABSTRACT

BACKGROUND: Sex differences related to immune responses can influence atopic manifestations in childhood asthma. While genome-wide association studies have investigated a sex-specific genetic architecture of the immune response, gene-by-sex interactions have not been extensively analysed for atopy-related markers including allergy skin tests, IgE and eosinophils in asthmatic children. METHODS: We performed a genome-wide gene-by-sex interaction analysis for atopy-related markers using whole-genome sequencing data based on 889 trios from the Genetic Epidemiology of Asthma in Costa Rica Study (GACRS) and 284 trios from the Childhood Asthma Management Program (CAMP). We also tested the findings in UK Biobank participants with self-reported childhood asthma. Furthermore, downstream analyses in GACRS integrated gene expression to disentangle observed associations. RESULTS: Single nucleotide polymorphism (SNP) rs1255383 at 10q11.21 demonstrated a genome-wide significant gene-by-sex interaction (p_interaction = 9.08 × 10^-10) for atopy (positive skin test) with opposite direction of effects between females and males. In the UK Biobank participants with a history of childhood asthma, the signal was consistently observed with the same sex-specific effect directions for high eosinophil count (p_interaction = 0.0058). Gene expression of ZNF33B (zinc finger protein 33B), located at 10q11.21, was moderately associated with atopy in girls, but not in boys. CONCLUSIONS: We report SNPs in/near a zinc finger gene as novel sex-differential loci for atopy-related markers with opposite effect directions in females and males. A potential role for ZNF33B should be studied further as an important driver of sex-divergent features of atopy in childhood asthma.


Subject(s)
Asthma , Hypersensitivity, Immediate , Child , Humans , Male , Female , Genome-Wide Association Study , Immunoglobulin E , Asthma/epidemiology , Hypersensitivity, Immediate/genetics , Hypersensitivity, Immediate/epidemiology , Eosinophils , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease
12.
Respir Res ; 24(1): 63, 2023 Feb 26.
Article in English | MEDLINE | ID: mdl-36842969

ABSTRACT

BACKGROUND: Asthma is a heterogeneous disease with high morbidity. Advances in high-throughput multi-omics approaches have enabled the collection of molecular assessments at different layers, providing a complementary perspective on complex diseases. Numerous computational methods have been developed for omics-based patient classification or disease outcome prediction. Yet, a systematic benchmarking of those methods using various combinations of omics data for the prediction of asthma development is still lacking. OBJECTIVE: We aimed to investigate computational methods for disease status prediction using multi-omics data. METHOD: We systematically benchmarked 18 computational methods using all 63 combinations of six omics data types (GWAS, miRNA, mRNA, microbiome, metabolome, DNA methylation) collected in The Vitamin D Antenatal Asthma Reduction Trial (VDAART) cohort. We evaluated each method using standard performance metrics for each of the 63 omics combinations. RESULTS: Our results indicate that, overall, Logistic Regression, Multi-Layer Perceptron, and MOGONET display superior performance, and that the combination of transcriptional, genomic and microbiome data achieves the best prediction. Moreover, we find that including clinical data can further improve prediction performance for some, but not all, omics combinations. CONCLUSIONS: Specific omics combinations can reach optimal prediction of asthma development in children, and certain computational methods outperformed the others.
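
The 63 combinations are simply all non-empty subsets of the six omics layers (2^6 - 1 = 63). A short enumeration, with the layer names taken from the abstract, makes the count concrete; how each combination is then fed to the 18 methods is not shown.

```python
from itertools import combinations

OMICS_LAYERS = ["GWAS", "miRNA", "mRNA", "microbiome", "metabolome", "DNA methylation"]

def all_layer_combinations(layers):
    """All non-empty subsets of the omics layers, smallest subsets first."""
    return [combo for k in range(1, len(layers) + 1) for combo in combinations(layers, k)]

combos = all_layer_combinations(OMICS_LAYERS)
print(len(combos))  # 63
```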


Subject(s)
Asthma , MicroRNAs , Pregnancy , Humans , Female , Child , Benchmarking , Genomics/methods , Asthma/diagnosis , Asthma/epidemiology , Asthma/genetics , Prognosis
13.
Mol Psychiatry ; 27(4): 1963-1969, 2022 04.
Article in English | MEDLINE | ID: mdl-35246634

ABSTRACT

Alzheimer's disease (AD) is a genetically complex disease for which nearly 40 loci have now been identified via genome-wide association studies (GWAS). We attempted to identify groups of rare variants (alternate allele frequency < 0.01) associated with AD in a region-based, whole-genome sequencing (WGS) association study (rvGWAS) of two independent AD family datasets (NIMH/NIA; 2247 individuals; 605 families). Employing a sliding window approach across the genome, we identified several regions that achieved association p values < 10^-6, using the burden test or the SKAT statistic. The genomic region around the dystobrevin beta (DTNB) gene was identified with both the burden and SKAT tests and replicated in case/control samples from the ADSP study, reaching genome-wide significance after meta-analysis (p_meta = 4.74 × 10^-8). SKAT analysis also revealed a region-based association around the Discs large homolog 2 (DLG2) gene, which replicated in case/control samples from the ADSP study (p_meta = 1 × 10^-6). In conclusion, in a region-based rvGWAS of AD we identified two novel AD genes, DLG2 and DTNB, based on association with rare variants.
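
In a sliding-window rvGWAS, the genome is cut into overlapping windows and the rare variants in each window are collapsed into one score per individual; the burden test then relates that score to disease status, while SKAT instead uses a variance-component statistic. The sketch below shows only the windowing and an unweighted burden score. The window width and step are hypothetical parameters, and the family-based testing used in the study is omitted.

```python
def sliding_windows(positions, width, step):
    """Yield (start, variant indices) for windows [start, start + width) that contain variants."""
    if not positions:
        return
    start, last = min(positions), max(positions)
    while start <= last:
        idx = [k for k, p in enumerate(positions) if start <= p < start + width]
        if idx:  # skip windows without any rare variant
            yield start, idx
        start += step

def burden_scores(genotypes, idx):
    """Unweighted burden: each individual's count of rare alleles within the window."""
    return [sum(row[k] for k in idx) for row in genotypes]
```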


Subject(s)
Alzheimer Disease , Dystrophin-Associated Proteins/genetics , Neuropeptides/genetics , Alzheimer Disease/genetics , Dithionitrobenzoic Acid , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Genomics , Guanylate Kinases/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Tumor Suppressor Proteins/genetics , Whole Genome Sequencing
14.
BMC Bioinformatics ; 23(1): 547, 2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36536276

ABSTRACT

As of June 2022, the GISAID database contains more than 11 million SARS-CoV-2 genomes, including several thousand nucleotide sequences for the most common variants such as delta or omicron. These SARS-CoV-2 strains have been collected from patients around the world since the beginning of the pandemic. We start by assessing the similarity of all pairs of nucleotide sequences using the Jaccard index and principal component analysis. As shown previously in the literature, an unsupervised cluster analysis applied to the SARS-CoV-2 genomes results in clusters of sequences according to certain characteristics such as their strain or their clade. Importantly, we observe that nucleotide sequences of common variants are often outliers in clusters of sequences stemming from variants identified earlier on during the pandemic. Motivated by this finding, we are interested in applying outlier detection to nucleotide sequences. We demonstrate that nucleotide sequences of common variants (such as alpha, delta, or omicron) can be identified solely based on a statistical outlier criterion. We argue that outlier detection might be a useful surveillance tool to identify emerging variants in real time as the pandemic progresses.
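
The abstract does not spell out its statistical outlier criterion, so the sketch below swaps in a simple one for illustration: each genome is represented by its set of mutated positions relative to a reference, its mean Jaccard distance to all other genomes is computed, and genomes above Tukey's upper fence (Q3 + 1.5 × IQR) are flagged. The set representation, the Tukey criterion and the function names are assumptions; the paper itself works with Jaccard similarities and principal component analysis.

```python
import statistics

def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def flag_outliers(mutation_sets):
    """Indices whose mean Jaccard distance to all other genomes exceeds Q3 + 1.5 * IQR."""
    n = len(mutation_sets)
    mean_dist = [
        sum(jaccard_distance(mutation_sets[i], mutation_sets[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    q1, _, q3 = statistics.quantiles(mean_dist, n=4)
    fence = q3 + 1.5 * (q3 - q1)
    return [i for i, d in enumerate(mean_dist) if d > fence]
```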


Subject(s)
COVID-19 , Humans , Base Sequence , SARS-CoV-2 , Cluster Analysis , Databases, Factual
15.
Genet Epidemiol ; 45(1): 82-98, 2021 02.
Article in English | MEDLINE | ID: mdl-32929743

ABSTRACT

locStra is an R package for the analysis of regional and global population stratification in whole-genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region of the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding window approach, the regional similarity matrices are compared with the global ones, based on user-defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for the specification of the window size is provided. As the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on a single core, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome-wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.
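
One of the window metrics mentioned above is the correlation between regional and global eigenvectors. The sketch below reproduces that comparison for small dense similarity matrices, using power iteration for the leading eigenvector and the absolute correlation because an eigenvector's sign is arbitrary. It is a plain-Python illustration, not locStra's R/C++ interface, and it assumes a positive semidefinite similarity matrix whose leading eigenvalue is well separated.

```python
import math

def leading_eigenvector(S, iters=500):
    """Power iteration for the leading eigenvector of a symmetric similarity matrix."""
    n = len(S)
    v = [1.0 + 0.1 * k for k in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def abs_correlation(a, b):
    """|Pearson correlation|; the sign of an eigenvector is arbitrary."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb)

def regional_vs_global(S_regional, S_global):
    """Compare one window's similarity matrix with the genome-wide one."""
    return abs_correlation(leading_eigenvector(S_regional), leading_eigenvector(S_global))
```

In a sliding-window scan, S_regional would be rebuilt from the loci of each window (for example with a Jaccard matrix), while S_global is computed once.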


Subject(s)
Genome-Wide Association Study , Genome , Algorithms , Genomics , Humans , Polymorphism, Single Nucleotide , Whole Genome Sequencing
16.
Genet Epidemiol ; 45(7): 685-693, 2021 10.
Article in English | MEDLINE | ID: mdl-34159627

ABSTRACT

SARS-CoV-2 mortality has been extensively studied in relation to host susceptibility. How sequence variations in the SARS-CoV-2 genome affect pathogenicity is poorly understood. Starting in October 2020, using the methodology of genome-wide association studies (GWAS), we looked at the association between whole-genome sequencing (WGS) data of the virus and COVID-19 mortality as a potential method of early identification of highly pathogenic strains to target for containment. Although continuously updating our analysis, in December 2020, we analyzed 7548 single-stranded SARS-CoV-2 genomes of COVID-19 patients in the GISAID database and associated variants with mortality using a logistic regression. In total, evaluating 29,891 sequenced loci of the viral genome for association with patient/host mortality, two loci, at 12,053 and 25,088 bp, achieved genome-wide significance (p values of 4.09e-09 and 4.41e-23, respectively), though only 25,088 bp remained significant in follow-up analyses. Our association findings were exclusively driven by the samples that were submitted from Brazil (p value of 4.90e-13 for 25,088 bp). The mutation frequency of 25,088 bp in the Brazilian samples on GISAID has rapidly increased from about 0.4 in October/December 2020 to 0.77 in March 2021. Although GWAS methodology is suitable for samples in which mutation frequencies varies between geographical regions, it cannot account for mutation frequencies that change rapidly overtime, rendering a GWAS follow-up analysis of the GISAID samples that have been submitted after December 2020 as invalid. The locus at 25,088 bp is located in the P.1 strain, which later (April 2021) became one of the distinguishing loci (precisely, substitution V1176F) of the Brazilian strain as defined by the Centers for Disease Control. Specifically, the mutations at 25,088 bp occur in the S2 subunit of the SARS-CoV-2 spike protein, which plays a key role in viral entry of target host cells. 
Because the mutations alter amino acid coding sequences, they may impose structural changes that could enhance viral infectivity and symptom severity. Our analysis suggests that GWAS methodology can provide suitable tools for the real-time detection of new, more transmissible and pathogenic viral strains in databases such as GISAID, though new approaches are needed to accommodate mutation frequencies that change rapidly over time while case/control ratios change simultaneously. Improving the quality and availability of the associated metadata/patient information will also be important to fully utilize the potential of GWAS methodology in this field.
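The per-locus association test described above (viral variant versus patient mortality via logistic regression) can be illustrated for its simplest case: one binary predictor (mutant versus reference allele) and no covariates. In that case the logistic-regression estimate of the slope equals the log odds ratio of the 2×2 table, and its Wald standard error is the square root of the summed reciprocal cell counts. This is a minimal sketch under that assumption only; the authors' actual covariates and significance threshold are not given here, and the helper `locus_wald_test`, its arguments, and the toy counts are hypothetical.

```python
import math

def locus_wald_test(mut_dead, mut_alive, ref_dead, ref_alive):
    """Wald test for one locus: logistic regression of mortality on a binary
    mutant indicator with no covariates (hypothetical helper).

    With a single binary predictor, the MLE of the slope is the log odds ratio
    of the 2x2 table, and its estimated variance is the sum of reciprocal counts.
    """
    cells = (mut_dead, mut_alive, ref_dead, ref_alive)
    if min(cells) <= 0:
        # The MLE is infinite (does not exist) when any cell is empty.
        raise ValueError("all four cell counts must be positive")
    # Odds of death among mutants divided by odds of death among reference carriers.
    log_or = math.log((mut_dead * ref_alive) / (mut_alive * ref_dead))
    se = math.sqrt(sum(1.0 / c for c in cells))
    z = log_or / se
    # Two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return log_or, se, p_value

# Toy counts for illustration only (not data from the study).
log_or, se, p = locus_wald_test(mut_dead=30, mut_alive=70, ref_dead=10, ref_alive=90)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}, p = {p:.2e}")
```

In a genome-wide scan, this test would be repeated for every sequenced locus; as the abstract notes, it assumes the mutation frequency is stable over the sampling period.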


Subject(s)
COVID-19 , Spike Glycoprotein, Coronavirus , Brazil , Genome-Wide Association Study , Humans , Mutation , Phylogeny , SARS-CoV-2 , Spike Glycoprotein, Coronavirus/genetics
17.
Bioinformatics ; 36(22-23): 5432-5438, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33367522

ABSTRACT

MOTIVATION: Analysis of rare variants in family-based studies remains a challenge. Transmission-based approaches provide robustness against population stratification, but evaluating the significance of test statistics based on asymptotic theory can be imprecise. Moreover, power depends heavily on the choice of test statistic and on the underlying genetic architecture of the locus, which is generally unknown. RESULTS: In our proposed framework, we use the FBAT haplotype algorithm to obtain the conditional offspring genotype distribution under the null hypothesis, given the sufficient statistic. Based on this conditional offspring genotype distribution, the significance of virtually any association test statistic can be evaluated by simulation or exact computation, without the need for asymptotic approximations. Besides standard linear burden-type statistics, this enables our approach to evaluate other test statistics, such as variance-component statistics, higher criticism approaches, and maximum single-variant statistics, for which asymptotic theory may be complicated or may not provide accurate approximations for rare variant data. Based on the resulting P-values, combined test statistics such as the aggregated Cauchy association test (ACAT) can also be applied. In simulation studies, we show that our framework outperforms existing approaches for family-based studies in several scenarios. We also applied our methodology to a TOPMed whole-genome sequencing dataset with 897 asthmatic trios from Costa Rica. AVAILABILITY AND IMPLEMENTATION: FBAT software is available at https://sites.google.com/view/fbatwebpage. Simulation code is available at https://github.com/julianhecker/FBAT_rare_variant_test_simulations. Whole-genome sequencing data for 'NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica' are available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000988.v4.p1.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
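The ACAT combination mentioned above turns each P-value into a Cauchy variate and sums them. The sketch below assumes the standard Cauchy-combination form T = sum_i w_i tan((0.5 - p_i)π), normalized by the total weight, with p = 0.5 - arctan(T)/π as the combined P-value; it is an illustration, not the FBAT software's implementation, and the helper `acat` and its equal-weight default are assumptions.

```python
import math
from typing import Optional, Sequence

def acat(p_values: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Cauchy combination (ACAT-style) of P-values; hypothetical helper."""
    if not p_values:
        raise ValueError("need at least one P-value")
    if weights is None:
        weights = [1.0] * len(p_values)  # equal weights by default (assumption)
    if len(weights) != len(p_values):
        raise ValueError("need one weight per P-value")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P-values must lie strictly between 0 and 1")

    # T = sum_i w_i * tan((0.5 - p_i) * pi), divided by the total weight so that
    # it is standard Cauchy under the null (exactly so for independent P-values).
    t = 0.0
    for p, w in zip(p_values, weights):
        if p < 1e-15:
            # For tiny p, tan((0.5 - p) * pi) is about 1 / (p * pi); avoids precision loss.
            t += w / (p * math.pi)
        else:
            t += w * math.tan((0.5 - p) * math.pi)
    t /= sum(weights)

    # Upper tail of the standard Cauchy: P(C > t) = 0.5 - atan(t) / pi.
    # For t > 0 this equals atan(1 / t) / pi, which stays accurate for huge t.
    if t > 0:
        return math.atan(1.0 / t) / math.pi
    return 0.5 - math.atan(t) / math.pi

print(acat([1e-6, 0.5, 0.9]))  # dominated by the smallest P-value
```

The combined P-value is driven mainly by the smallest inputs, which is why a Cauchy combination suits settings where only one of several statistics (burden, variance-component, maximum single-variant) has power.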

18.
Mol Psychiatry ; 26(8): 4179-4190, 2021 08.
Article in English | MEDLINE | ID: mdl-31712720

ABSTRACT

Panic disorder (PD) has a lifetime prevalence of 2-4% and heritability estimates of 40%. The contributory genetic variants remain largely unknown, with few and inconsistent loci having been reported. The present report describes the largest genome-wide association study (GWAS) of PD to date, comprising genome-wide genotype data of 2248 clinically well-characterized PD patients and 7992 ethnically matched controls. The samples originated from four European countries (Denmark, Estonia, Germany, and Sweden). Standard GWAS quality control procedures were conducted on each individual dataset, and imputation was performed using the 1000 Genomes Project reference panel. A meta-analysis was then performed using the Ricopili pipeline. No genome-wide significant locus was identified. Leave-one-out analyses generated highly significant polygenic risk scores (PRS) (explained variance of up to 2.6%). Linkage disequilibrium (LD) score regression analysis of the GWAS data showed that the estimated heritability for PD was 28.0-34.2%. After correction for multiple testing, a significant genetic correlation was found between PD and major depressive disorder, depressive symptoms, and neuroticism. A total of 255 single-nucleotide polymorphisms (SNPs) with p < 1 × 10⁻⁴ were followed up in an independent sample of 2408 PD patients and 228,470 controls from Denmark, Iceland, and the Netherlands. In the combined analysis, SNP rs144783209 showed the strongest association with PD (p_comb = 3.10 × 10⁻⁷). Sign tests revealed a significant enrichment of SNPs with a discovery p-value of <0.0001 in the combined follow-up cohort (p = 0.048). The present integrative analysis represents a major step towards elucidating the genetic susceptibility to PD.
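A sign test of the kind reported above asks whether more SNPs than expected by chance show the same direction of effect in the discovery and follow-up samples. The sketch below assumes a one-sided exact binomial test against 50% concordance; the abstract does not spell out the authors' exact procedure, and the helper `sign_test` and the toy effect sizes are hypothetical.

```python
from math import comb
from typing import Sequence

def sign_test(discovery_betas: Sequence[float],
              followup_betas: Sequence[float]) -> tuple[int, int, float]:
    """One-sided exact binomial sign test for direction-of-effect concordance.

    Returns (concordant SNPs, SNPs compared, p-value); hypothetical helper.
    """
    if len(discovery_betas) != len(followup_betas):
        raise ValueError("effect-size lists must have equal length")
    # Drop SNPs with a zero effect in either sample: they have no direction to compare.
    pairs = [(d, f) for d, f in zip(discovery_betas, followup_betas) if d != 0 and f != 0]
    n = len(pairs)
    k = sum(1 for d, f in pairs if (d > 0) == (f > 0))
    # P(X >= k) for X ~ Binomial(n, 0.5), computed exactly with integer arithmetic.
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p_value

# Toy effect sizes (illustrative only): two of three SNPs change direction.
print(sign_test([0.1, -0.2, 0.3, 0.0], [0.05, 0.1, -0.4, 0.2]))
```

An exact binomial tail is used rather than a normal approximation so the sketch stays valid for small numbers of follow-up SNPs.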


Subject(s)
Depressive Disorder, Major , Neuroticism , Panic Disorder , Denmark , Depression/genetics , Depressive Disorder, Major/genetics , Estonia , Genetic Predisposition to Disease , Genome-Wide Association Study , Germany , Humans , Panic Disorder/genetics , Polymorphism, Single Nucleotide , Sweden
19.
Genet Epidemiol ; 44(2): 139-147, 2020 03.
Article in English | MEDLINE | ID: mdl-31713269

ABSTRACT

In the analysis of current life science datasets, we often encounter scenarios in which applying asymptotic theory to hypothesis testing can be problematic. Besides improved asymptotic results, permutation/simulation-based tests are a general approach to address this issue. However, these randomized tests can impose a massive computational burden, for example, when large numbers of statistical tests are computed and the specified significance level is very small. Stopping rules aim to assess significance with the smallest possible number of draws while controlling the probabilities of errors due to statistical uncertainty. In this communication, we derive a general stopping rule, QUICK-STOP, based on sequential testing theory, that is easy to implement, rigorously controls the error probabilities, and is nearly optimal in terms of the expected number of draws. In a simulation study, we show that our approach outperforms current stopping approaches for general randomized tests by a factor of 10 and does not impose an additional computational burden. We illustrate our approach by applying our stopping rule to a single-variant analysis of a whole-genome sequencing study of lung function.
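To make the idea of a stopping rule concrete, the sketch below implements a classical sequential Monte Carlo p-value in the style of Besag and Clifford (1991). It is not QUICK-STOP, whose boundaries are not given in this abstract: draws stop as soon as `h` null statistics reach the observed value, or after `max_draws` draws, so clearly non-significant statistics stop after roughly h/p draws. The names `sequential_mc_pvalue`, `draw_null`, `h`, and `max_draws`, and the normal toy null, are hypothetical.

```python
import random
from typing import Callable

def sequential_mc_pvalue(t_obs: float, draw_null: Callable[[], float],
                         h: int = 10, max_draws: int = 9_999) -> tuple[float, int]:
    """Sequential Monte Carlo p-value with a Besag-Clifford-style stopping rule.

    Returns (p_value, number_of_draws_used); hypothetical helper.
    """
    if h < 1 or max_draws < h:
        raise ValueError("need 1 <= h <= max_draws")
    exceed = 0
    for n in range(1, max_draws + 1):
        if draw_null() >= t_obs:
            exceed += 1
            if exceed == h:
                # Stopped early: h exceedances after n draws, estimate h / n.
                return h / n, n
    # h exceedances never reached: fall back to the usual (G + 1) / (N + 1) estimate.
    return (exceed + 1) / (max_draws + 1), max_draws

rng = random.Random(1)
# Toy example: observed statistic 0.3 against a standard-normal null.
p, used = sequential_mc_pvalue(0.3, lambda: rng.gauss(0.0, 1.0), h=10, max_draws=9_999)
print(p, used)
```

Only statistics that look significant consume the full budget of draws, which is the source of the savings that stopping rules such as QUICK-STOP aim to maximize while controlling error probabilities.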


Subject(s)
Computer Simulation , Confidence Intervals , Genome, Human , Genome-Wide Association Study , Humans , Models, Genetic , Numerical Analysis, Computer-Assisted , Compliance , Probability , Pulmonary Disease, Chronic Obstructive/genetics
20.
Thorax ; 76(12): 1227-1230, 2021 12.
Article in English | MEDLINE | ID: mdl-33888571

ABSTRACT

Most genome-wide association studies of obesity and body mass index (BMI) have so far assumed an additive mode of inheritance, although association testing supports a recessive effect for some of the established loci, for example, rs1421085 in FTO. In two whole-genome sequencing (WGS) studies of children with asthma and their parents (892 Costa Rican trios and 286 North American trios), we discovered an association between a locus (rs9292139) in LOC102724122 and BMI that reaches genome-wide significance under a recessive model in the combined analysis. As the association does not achieve significance under an additive model, our finding illustrates the benefits of considering a recessive model in WGS analyses.
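The additive and recessive models differ only in how genotypes are coded before they enter the association test. The sketch below codes minor-allele counts both ways and compares their Pearson correlation with a trait on toy values in which only homozygous carriers differ. The helpers and numbers are hypothetical, and a family-based trio analysis such as the one reported would use a transmission-based test rather than a plain correlation.

```python
from math import sqrt
from typing import Sequence

def additive(genotypes: Sequence[int]) -> list[int]:
    """Additive coding: number of minor alleles (0, 1, 2)."""
    return [int(g) for g in genotypes]

def recessive(genotypes: Sequence[int]) -> list[int]:
    """Recessive coding: 1 only for two copies of the minor allele, else 0."""
    return [1 if g == 2 else 0 for g in genotypes]

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between a coded genotype and a trait."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy data (illustrative only): the trait is elevated only in minor-allele homozygotes.
genotypes = [0, 0, 1, 1, 2, 2]
bmi = [20.0, 20.0, 20.0, 20.0, 26.0, 26.0]
print(pearson(additive(genotypes), bmi))   # additive coding fits imperfectly
print(pearson(recessive(genotypes), bmi))  # recessive coding matches the pattern exactly
```

When the true effect is confined to homozygotes, the additive coding spreads it across heterozygotes and loses signal, which is consistent with a locus reaching significance only under the recessive model.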


Subject(s)
Asthma , Genome-Wide Association Study , Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics , Asthma/genetics , Body Mass Index , Child , Genetic Predisposition to Disease , Humans , Polymorphism, Single Nucleotide