Results 1 - 20 of 159
1.
PLoS Genet ; 20(3): e1011192, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38517939

ABSTRACT

The HostSeq initiative recruited 10,059 Canadians infected with SARS-CoV-2 between March 2020 and March 2023, obtained clinical information on their disease experience and whole genome sequenced (WGS) their DNA. We analyzed the WGS data for genetic contributors to severe COVID-19 (considering 3,499 hospitalized cases and 4,975 non-hospitalized after quality control). We investigated the evidence for replication of loci reported by the International Host Genetics Initiative (HGI); analyzed the X chromosome; conducted rare variant gene-based analysis and polygenic risk score testing. Population stratification was adjusted for using meta-analysis across ancestry groups. We replicated two loci identified by the HGI for COVID-19 severity: the LZTFL1/SLC6A20 locus on chromosome 3 and the FOXP4 locus on chromosome 6 (the latter with a variant significant at P < 5E-8). We found novel significant associations with MRAS and WDR89 in gene-based analyses, and constructed a polygenic risk score that explained 1.01% of the variance in severe COVID-19. This study provides independent evidence confirming the robustness of previously identified COVID-19 severity loci by the HGI and identifies novel genes for further investigation.


Subject(s)
COVID-19 , North American People , Humans , COVID-19/genetics , SARS-CoV-2/genetics , Genetic Predisposition to Disease , Single Nucleotide Polymorphism , Canada/epidemiology , Genome-Wide Association Study , Membrane Transport Proteins , Forkhead Transcription Factors
2.
Stat Med ; 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38932470

ABSTRACT

Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS."

3.
Nat Rev Genet ; 19(2): 110-124, 2018 02.
Article in English | MEDLINE | ID: mdl-29225335

ABSTRACT

Genetic architecture describes the characteristics of genetic variation that are responsible for heritable phenotypic variability. It depends on the number of genetic variants affecting a trait, their frequencies in the population, the magnitude of their effects and their interactions with each other and the environment. Defining the genetic architecture of a complex trait or disease is central to the scientific and clinical goals of human genetics, which are to understand disease aetiology and aid in disease screening, diagnosis, prognosis and therapy. Recent technological advances have enabled genome-wide association studies and emerging next-generation sequencing studies to begin to decipher the nature of the heritable contribution to traits and disease. Here, we describe the types of genetic architecture that have been observed, how architecture can be measured and why an improved understanding of genetic architecture is central to future advances in the field.


Subject(s)
Inborn Genetic Diseases/diagnosis , Inborn Genetic Diseases/genetics , Multifactorial Inheritance , Single Nucleotide Polymorphism , Heritable Quantitative Trait , Genome-Wide Association Study , Humans
4.
Nucleic Acids Res ; 50(15): 8441-8458, 2022 08 26.
Article in English | MEDLINE | ID: mdl-35947648

ABSTRACT

Defining the impact of missense mutations on the recognition of DNA motifs is highly dependent on bioinformatic tools that define DNA binding elements. However, classical motif analysis tools remain limited in their capacity to identify subtle changes in complex binding motifs between distinct conditions. To overcome this limitation, we developed a new tool, MoMotif, that facilitates a sensitive identification, at the single base-pair resolution, of complex, or subtle, alterations to core binding motifs, discerned from ChIP-seq data. We employed MoMotif to define the previously uncharacterized recognition motif of CTCF zinc-finger 1 (ZF1), and to further define the impact of CTCF ZF1 mutation on its association with chromatin. Mutations of CTCF ZF1 are exclusive to breast cancer and are associated with metastasis and therapeutic resistance, but the underlying mechanisms are unclear. Using MoMotif, we identified an extension of the CTCF core binding motif, necessitating a functional ZF1 to bind appropriately. Using a combination of ChIP-Seq and RNA-Seq, we discover that the inability to bind this extended motif drives an altered transcriptional program associated with the oncogenic phenotypes observed clinically. Our study demonstrates that MoMotif is a powerful new tool for comparative ChIP-seq analysis and characterising DNA-protein contacts.


Subject(s)
Chromatin , Zinc , CCCTC-Binding Factor/genetics , CCCTC-Binding Factor/metabolism , Zinc/metabolism , Chromatin/genetics , DNA/chemistry , Mutation , Binding Sites
5.
Dev Psychobiol ; 66(4): e22481, 2024 May.
Article in English | MEDLINE | ID: mdl-38538956

ABSTRACT

This study explored the interactions among prenatal stress, child sex, and polygenic risk scores (PGS) for attention-deficit/hyperactivity disorder (ADHD) on structural developmental changes of brain regions implicated in ADHD. We used data from two population-based birth cohorts: Growing Up in Singapore Towards healthy Outcomes (GUSTO) from Singapore (n = 113) and Generation R from Rotterdam, the Netherlands (n = 433). Prenatal stress was assessed using questionnaires. We obtained latent constructs of prenatal adversity and prenatal mood problems using confirmatory factor analyses. The participants were genotyped using genome-wide single nucleotide polymorphism arrays, and ADHD PGSs were computed. Magnetic resonance imaging scans were acquired at 4.5 and 6 years (GUSTO), and at 10 and 14 years (Generation R). We estimated the age-related rate of change for brain outcomes related to ADHD and performed (1) prenatal stress by sex interaction models, (2) prenatal stress by ADHD PGS interaction models, and (3) 3-way interaction models, including prenatal stress, sex, and ADHD PGS. We observed an interaction between prenatal stress and ADHD PGS on mean cortical thickness annual rate of change in Generation R (i.e., in individuals with higher ADHD PGS, higher prenatal stress was associated with a lower rate of cortical thinning, whereas in individuals with lower ADHD PGS, higher prenatal stress was associated with a higher rate of cortical thinning). None of the other tested interactions were statistically significant. Higher prenatal stress may promote a slower brain developmental rate during adolescence in individuals with higher ADHD genetic vulnerability, whereas it may promote a faster brain developmental rate in individuals with lower ADHD genetic vulnerability.


Subject(s)
Attention Deficit Disorder with Hyperactivity , Child , Adolescent , Humans , Attention Deficit Disorder with Hyperactivity/genetics , Cerebral Cortical Thinning , Brain/diagnostic imaging , Genetic Risk Score , Multifactorial Inheritance
6.
Genet Epidemiol ; 46(7): 446-462, 2022 10.
Article in English | MEDLINE | ID: mdl-35753057

ABSTRACT

5-hydroxymethylcytosine (5hmC) is a methylation state linked with gene regulation, commonly found in cells of the central nervous system. 5hmC is associated with demethylation of cytosines from 5-methylcytosine (5mC) to the unmethylated state. The presence of 5hmC can be inferred by a paired experiment involving bisulfite and oxidation-bisulfite treatments on the same sample, followed by a methylation assay using a platform such as the Illumina Infinium MethylationEPIC BeadChip (EPIC). Existing methods for analysis of the resulting EPIC data are not ideal. Most approaches ignore the correlation between the two experiments and any imprecision associated with DNA damage from the additional treatment. Estimates of 5mC/5hmC levels free from these limitations are desirable to reveal associations between methylation states and phenotypes. We propose a hierarchical Bayesian method called Constrained HYdroxy Methylation Estimation (CHYME) to simultaneously estimate 5mC/5hmC signals as well as any associations between these signals and covariates or phenotypes, while accounting for the potential impact of DNA damage and dependencies induced by the experimental design. Simulations show that CHYME has valid type 1 error and better power than a range of alternative methods, including the popular OxyBS method and linear models on transformed proportions. Other methods we examined suffer from hugely inflated type 1 error for inference on 5hmC proportions. We use CHYME to explore genome-wide associations between 5mC/5hmC levels and cause of death in postmortem prefrontal cortex brain tissue samples. These analyses indicate that CHYME is a useful tool to reveal phenotypic associations with 5mC/5hmC levels.


Subject(s)
DNA Methylation , Genetic Models , Bayes Theorem , Cytosine , DNA Methylation/genetics , Humans , Phenotype
7.
PLoS Genet ; 16(5): e1008766, 2020 05.
Article in English | MEDLINE | ID: mdl-32365090

ABSTRACT

Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed effect predictors greatly exceeds the number of samples. False positives or false negatives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects' relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop a general penalized LMM with a single random effect called ggmix for simultaneous SNP selection and adjustment for population structure in high dimensional prediction models. We develop a blockwise coordinate descent algorithm with automatic tuning parameter selection which is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations and three real data examples, we show that ggmix leads to more parsimonious models compared to the two-stage approach or principal component adjustment with better prediction accuracy. Our method performs well even in the presence of highly correlated markers, and when the causal SNPs are included in the kinship matrix. ggmix can be used to construct polygenic risk scores and select instrumental variables in Mendelian randomization studies. Our algorithms are available in an R package available on CRAN (https://cran.r-project.org/package=ggmix).


Subject(s)
Algorithms , Genome-Wide Association Study/methods , Genetic Models , Single Nucleotide Polymorphism , Animals , Computer Simulation , Genetic Crosses , Population Genetics/methods , Genome-Wide Association Study/statistics & numerical data , Humans , Leishmania tropica/genetics , Cutaneous Leishmaniasis/genetics , Linear Models , Mice , Inbred Strain Mice , Multifactorial Inheritance/genetics , Mycobacterium bovis , Population Dynamics , Sample Size , Software , Tuberculosis/genetics , Tuberculosis/pathology
8.
Genet Epidemiol ; 45(8): 874-890, 2021 12.
Article in English | MEDLINE | ID: mdl-34468045

ABSTRACT

Medical research increasingly includes high-dimensional regression modeling with a need for error-in-variables methods. The Convex Conditioned Lasso (CoCoLasso) utilizes a reformulated Lasso objective function and an error-corrected cross-validation to enable error-in-variables regression, but requires heavy computations. Here, we develop a Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm for modeling high-dimensional data that are only partially corrupted by measurement error. This algorithm separately optimizes the estimation of the uncorrupted and corrupted features in an iterative manner to reduce computational cost, with a specially calibrated formulation of cross-validation error. Through simulations, we show that the BDCoCoLasso algorithm successfully copes with much larger feature sets than CoCoLasso, and as expected, outperforms the naïve Lasso with enhanced estimation accuracy and consistency, as the intensity and complexity of measurement errors increase. Also, a new smoothly clipped absolute deviation penalization option is added that may be appropriate for some data sets. We apply the BDCoCoLasso algorithm to data selected from the UK Biobank. We develop and showcase the utility of covariate-adjusted genetic risk scores for body mass index, bone mineral density, and lifespan. We demonstrate that by leveraging more information than the naïve Lasso in partially corrupted data, the BDCoCoLasso may achieve higher prediction accuracy. These innovations, together with an R package, BDCoCoLasso, make error-in-variables adjustments more accessible for high-dimensional data sets. We posit the BDCoCoLasso algorithm has the potential to be widely applied in various fields, including genomics-facilitated personalized medicine research.


Subject(s)
Algorithms , Genetic Models , Humans , Research Design
9.
Hum Genet ; 141(8): 1431-1447, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35147782

ABSTRACT

Drug development and biological discovery require effective strategies to map existing genetic associations to causal genes. To approach this problem, we selected 12 common diseases and quantitative traits for which highly powered genome-wide association studies (GWAS) were available. For each disease or trait, we systematically curated positive control gene sets from Mendelian forms of the disease and from targets of medicines used for disease treatment. We found that these positive control genes were highly enriched in proximity of GWAS-associated single-nucleotide variants (SNVs). We then performed quantitative assessment of the contribution of commonly used genomic features, including open chromatin maps, expression quantitative trait loci (eQTL), and chromatin conformation data. Using these features, we trained and validated an Effector Index (Ei), to map target genes for these 12 common diseases and traits. Ei demonstrated high predictive performance, both with cross-validation on the training set, and an independently derived set for type 2 diabetes. Key predictive features included coding or transcript-altering SNVs, distance to gene, and open chromatin-based metrics. This work outlines a simple, understandable approach to prioritize genes at GWAS loci for functional follow-up and drug development, and provides a systematic strategy for prioritization of GWAS target genes.


Subject(s)
Type 2 Diabetes Mellitus , Genome-Wide Association Study , Chromatin/genetics , Type 2 Diabetes Mellitus/genetics , Genetic Predisposition to Disease , Humans , Single Nucleotide Polymorphism , Quantitative Trait Loci
10.
Genet Med ; 24(7): 1545-1555, 2022 07.
Article in English | MEDLINE | ID: mdl-35460399

ABSTRACT

PURPOSE: The study aimed to evaluate whether polygenic risk scores could be helpful in addition to family history for triaging individuals to undergo deep-depth diagnostic sequencing for identifying monogenic causes of complex diseases. METHODS: Among 44,550 exome-sequenced European ancestry UK Biobank participants, we identified individuals with a clinically reported or computationally predicted monogenic pathogenic variant for breast cancer, bowel cancer, heart disease, diabetes, or Alzheimer disease. We derived polygenic risk scores for these diseases. We tested whether a polygenic risk score could identify rare pathogenic variant heterozygotes among individuals with a parental disease history. RESULTS: Monogenic causes of complex diseases were more prevalent among individuals with a parental disease history than in the rest of the population. Polygenic risk scores showed moderate discriminative power to identify familial monogenic causes. For instance, we showed that prescreening the patients with a polygenic risk score for type 2 diabetes can prioritize individuals to undergo diagnostic sequencing for monogenic diabetes variants and reduce needs for such sequencing by up to 37%. CONCLUSION: Among individuals with a family history of complex diseases, those with a low polygenic risk score are more likely to have monogenic causes of the disease and could be prioritized to undergo genetic testing.


Subject(s)
Type 2 Diabetes Mellitus , Multifactorial Inheritance , Type 2 Diabetes Mellitus/diagnosis , Type 2 Diabetes Mellitus/genetics , Exome , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance/genetics , Risk Factors
11.
Clin Proteomics ; 19(1): 34, 2022 Sep 28.
Article in English | MEDLINE | ID: mdl-36171541

ABSTRACT

INTRODUCTION: Severe COVID-19 leads to important changes in circulating immune-related proteins. To date it has been difficult to understand their temporal relationship and identify cytokines that are drivers of severe COVID-19 outcomes and underlie differences in outcomes between sexes. Here, we measured 147 immune-related proteins during acute COVID-19 to investigate these questions. METHODS: We measured circulating protein abundances using the SOMAscan nucleic acid aptamer panel in two large independent hospital-based COVID-19 cohorts in Canada and the United States. We fit generalized additive models with cubic splines from the start of symptom onset to identify protein levels over the first 14 days of infection which were different between severe cases and controls, adjusting for age and sex. Severe cases were defined as individuals with COVID-19 requiring invasive or non-invasive mechanical respiratory support. RESULTS: 580 individuals were included in the analysis. Mean subject age was 64.3 (sd 18.1), and 47% were male. Of the 147 proteins, 69 showed a significant difference between cases and controls (p < 3.4 × 10⁻⁴). Three clusters were formed by 108 highly correlated proteins that replicated in both cohorts, making it difficult to determine which proteins have a true causal effect on severe COVID-19. Six proteins showed sex differences in levels over time, of which 3 were also associated with severe COVID-19: CCL26, IL1RL2, and IL3RA, providing insights to better understand the marked differences in outcomes by sex. CONCLUSIONS: Severe COVID-19 is associated with large changes in 69 immune-related proteins. Further, five proteins were associated with sex differences in outcomes. These results provide direct insights into immune-related proteins that are strongly influenced by severe COVID-19 infection.
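
Note on the threshold in the record above: p < 3.4 × 10⁻⁴ is numerically what a Bonferroni correction of a 0.05 family-wise error rate over the 147 measured proteins would give. The abstract does not name the correction that was used, so the short Python sketch below only checks that arithmetic under this assumption; the names `bonferroni_threshold`, `alpha` and `n_tests` are illustrative and do not come from the paper.

```python
# Assumption: a family-wise error rate of 0.05 divided evenly over all tests
# (Bonferroni). The abstract does not state which correction was applied.
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(alpha=0.05, n_tests=147)
print(f"{threshold:.1e}")  # prints 3.4e-04
```

If the authors used a different procedure that happens to land on the same value, this check would not distinguish the two.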

12.
Mol Psychiatry ; 26(6): 2663-2676, 2021 06.
Article in English | MEDLINE | ID: mdl-33414497

ABSTRACT

Genomic copy number variants (CNVs) are routinely identified and reported back to patients with neuropsychiatric disorders, but their quantitative effects on essential traits such as cognitive ability are poorly documented. We have recently shown that the effect size of deletions on cognitive ability can be statistically predicted using measures of intolerance to haploinsufficiency. However, the effect sizes of duplications remain unknown. It is also unknown if the effect of multigenic CNVs are driven by a few genes intolerant to haploinsufficiency or distributed across tolerant genes as well. Here, we identified all CNVs > 50 kilobases in 24,092 individuals from unselected and autism cohorts with assessments of general intelligence. Statistical models used measures of intolerance to haploinsufficiency of genes included in CNVs to predict their effect size on intelligence. Intolerant genes decrease general intelligence by 0.8 and 2.6 points of intelligence quotient when duplicated or deleted, respectively. Effect sizes showed no heterogeneity across cohorts. Validation analyses demonstrated that models could predict CNV effect sizes with 78% accuracy. Data on the inheritance of 27,766 CNVs showed that deletions and duplications with the same effect size on intelligence occur de novo at the same frequency. We estimated that around 10,000 intolerant and tolerant genes negatively affect intelligence when deleted, and less than 2% have large effect sizes. Genes encompassed in CNVs were not enriched in any GOterms but gene regulation and brain expression were GOterms overrepresented in the intolerant subgroup. Such pervasive effects on cognition may be related to emergent properties of the genome not restricted to a limited number of biological pathways.


Subject(s)
DNA Copy Number Variations , Genome , Cognition , DNA Copy Number Variations/genetics , Gene Dosage , Humans , Intelligence Tests
13.
J Child Psychol Psychiatry ; 63(6): 636-645, 2022 06.
Article in English | MEDLINE | ID: mdl-34389974

ABSTRACT

BACKGROUND: Polygenic risk scores (PRSs) operationalize genetic propensity toward a particular mental disorder and hold promise as early predictors of psychopathology, but before a PRS can be used clinically, explanatory power must be increased and the specificity for a psychiatric domain established. To enable early detection, it is crucial to study these psychometric properties in childhood. We examined whether PRSs associate more with general or with specific psychopathology in school-aged children. Additionally, we tested whether psychiatric PRSs can be combined into a multi-PRS score for improved performance. METHODS: We computed 16 PRSs based on GWASs of psychiatric phenotypes, but also neuroticism and cognitive ability, in mostly adult populations. Study participants were 9,247 school-aged children from three population-based cohorts of the DREAM-BIG consortium: ALSPAC (UK), The Generation R Study (Netherlands), and MAVAN (Canada). We associated each PRS with general and specific psychopathology factors, derived from a bifactor model based on self-report and parental, teacher, and observer reports. After fitting each PRS in separate models, we also tested a multi-PRS model, in which all PRSs are entered simultaneously as predictors of the general psychopathology factor. RESULTS: Seven PRSs were associated with the general psychopathology factor after multiple testing adjustment, two with specific externalizing and five with specific internalizing psychopathology. PRSs predicted general psychopathology independently of each other, with the exception of depression and depressive symptom PRSs. Most PRSs associated with a specific psychopathology domain were also associated with general child psychopathology. CONCLUSIONS: The results suggest that PRSs based on current GWASs of psychiatric phenotypes tend to be associated with general psychopathology, or both general and specific psychiatric domains, but not with one specific psychopathology domain only. Furthermore, PRSs can be combined to improve predictive ability. PRS users should therefore be conscious of nonspecificity and consider using multiple PRSs simultaneously when predicting psychiatric disorders.


Subject(s)
Major Depressive Disorder , Mental Disorders , Child , Major Depressive Disorder/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Mental Disorders/genetics , Multifactorial Inheritance , Phenotype , Risk Factors
14.
PLoS Genet ; 15(8): e1008344, 2019 08.
Article in English | MEDLINE | ID: mdl-31469826

ABSTRACT

Pancreatic adenocarcinoma (PC) is a lethal malignancy that is familial or associated with genetic syndromes in 10% of cases. Gene-based surveillance strategies for at-risk individuals may improve clinical outcomes. However, familial PC (FPC) is plagued by genetic heterogeneity and the genetic basis for the majority of FPC remains elusive, hampering the development of gene-based surveillance programs. The study was powered to identify genes with a cumulative pathogenic variant prevalence of at least 3%, which includes the most prevalent PC susceptibility gene, BRCA2. Since the majority of known PC susceptibility genes are involved in DNA repair, we focused on genes implicated in these pathways. We performed a region-based association study using the Mixed-Effects Score Test, followed by leave-one-out characterization of PC-associated gene regions and variants to identify the genes and variants driving risk associations. We evaluated 398 cases from two case series and 987 controls without a personal history of cancer. The first case series consisted of 109 patients with either FPC (n = 101) or PC at ≤50 years of age (n = 8). The second case series was composed of 289 unselected PC cases. We validated this discovery strategy by identifying known pathogenic BRCA2 variants, and also identified SMG1, encoding a serine/threonine protein kinase, to be significantly associated with PC following correction for multiple testing (p = 3.22 × 10⁻⁷). The SMG1 association was validated in a second independent series of 532 FPC cases and 753 controls (p < 0.0062, OR = 1.88, 95% CI 1.17-3.03). We showed segregation of the c.4249A>G SMG1 variant in 3 affected relatives in a FPC kindred, and we found c.103G>A to be a recurrent SMG1 variant associating with PC in both the discovery and validation series. These results suggest that SMG1 is a novel PC susceptibility gene, and we identified specific SMG1 gene variants associated with PC risk.


Subject(s)
Genetic Association Studies/methods , Pancreatic Neoplasms/genetics , DNA Sequence Analysis/methods , Adenocarcinoma/genetics , Adult , BRCA2 Protein/genetics , Carcinoma/genetics , Female , BRCA2 Genes , Genetic Predisposition to Disease/genetics , Germ-Line Mutation/genetics , Humans , Male , Middle Aged , Pancreas/pathology , Pancreatic Neoplasms/metabolism , Protein Serine-Threonine Kinases/genetics , Pancreatic Neoplasms
15.
Genet Epidemiol ; 44(8): 825-840, 2020 11.
Article in English | MEDLINE | ID: mdl-32783248

ABSTRACT

It is challenging to estimate the phenotypic impact of the structural genome changes known as copy-number variations (CNVs), since there are many unique CNVs which are nonrecurrent, and most are too rare to be studied individually. In recent work, we found that CNV-aggregated genomic annotations, that is, specifically the intolerance to mutation as measured by the pLI score (probability of being loss-of-function intolerant), can be strong predictors of intellectual quotient (IQ) loss. However, this aggregation method only estimates the individual CNV effects indirectly. Here, we propose the use of hierarchical Bayesian models to directly estimate individual effects of rare CNVs on measures of intelligence. Annotation information on the impact of major mutations in genomic regions is extracted from genomic databases and used to define prior information for the approach we call HBIQ. We applied HBIQ to the analysis of CNV deletions and duplications from three datasets and identified several genomic regions containing CNVs demonstrating significant deleterious effects on IQ, some of which validate previously known associations. We also show that several CNVs were identified as deleterious by HBIQ even if they have a zero pLI score, and the converse is also true. Furthermore, we show that our new model yields higher out-of-sample concordance (78%) for predicting the consequences of carrying known recurrent CNVs compared with our previous approach.


Subject(s)
DNA Copy Number Variations/genetics , Intelligence/genetics , Genetic Models , Adolescent , Bayes Theorem , Child , Human Chromosome Pair 16/genetics , Human Chromosome Pair 22/genetics , Cohort Studies , Genome , Humans , Intelligence Tests , Linear Models , Principal Component Analysis , Sample Size
16.
Bioinformatics ; 36(6): 1840-1847, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31697315

ABSTRACT

MOTIVATION: The human microbiota is the collection of microorganisms colonizing the human body, and plays an integral part in human health. A growing trend in microbiome analysis is to construct a network to estimate the co-occurrence patterns among taxa through precision matrices. Existing methods do not facilitate investigation into how these networks change with respect to covariates. RESULTS: We propose a new model called Microbiome Differential Network Estimation (MDiNE) to estimate network changes with respect to a binary covariate. The counts of individual taxa in the samples are modeled through a multinomial distribution whose probabilities depend on a latent Gaussian random variable. A sparse precision matrix over all the latent terms determines the co-occurrence network among taxa. The model fit is obtained and evaluated using Hamiltonian Monte Carlo methods. The performance of our model is evaluated through an extensive simulation study and is shown to outperform existing methods in terms of estimation of network parameters. We also demonstrate an application of the model to estimate changes in the intestinal microbial network topology with respect to Crohn's disease. AVAILABILITY AND IMPLEMENTATION: MDiNE is implemented in a freely available R package: https://github.com/kevinmcgregor/mdine. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Microbiota , Humans , Microbial Consortia , Monte Carlo Method , Normal Distribution , Probability
17.
Genet Med ; 23(3): 508-515, 2021 03.
Article in English | MEDLINE | ID: mdl-33110269

ABSTRACT

PURPOSE: Identifying rare genetic causes of common diseases can improve diagnostic and treatment strategies, but incurs high costs. We tested whether individuals with common disease and low polygenic risk score (PRS) for that disease generated from less expensive genome-wide genotyping data are more likely to carry rare pathogenic variants. METHODS: We identified patients with one of five common complex diseases among 44,550 individuals who underwent exome sequencing in the UK Biobank. We derived PRS for these five diseases, and identified pathogenic rare variant heterozygotes. We tested whether individuals with disease and low PRS were more likely to carry rare pathogenic variants. RESULTS: While rare pathogenic variants conferred, at most, 5.18-fold (95% confidence interval [CI]: 2.32-10.13) increased odds of disease, a standard deviation increase in PRS, at most, increased the odds of disease by 5.25-fold (95% CI: 5.06-5.45). Among diseased patients, a standard deviation decrease in the PRS was associated with, at most, 2.82-fold (95% CI: 1.14-7.46) increased odds of identifying rare variant heterozygotes. CONCLUSION: Rare pathogenic variants were more prevalent among affected patients with a low PRS. Therefore, prioritizing individuals for sequencing who have disease but low PRS may increase the yield of sequencing studies to identify rare variant heterozygotes.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide , Risk Factors
18.
Biometrics ; 77(2): 424-438, 2021 06.
Article in English | MEDLINE | ID: mdl-32438470

ABSTRACT

Identifying disease-associated changes in DNA methylation can help us gain a better understanding of disease etiology. Bisulfite sequencing allows the generation of high-throughput methylation profiles at single-base resolution. However, optimally modeling and analyzing these sparse and discrete sequencing data is still very challenging due to variable read depth, missing data patterns, long-range correlations, data errors, and confounding from cell type mixtures. We propose a regression-based hierarchical model that allows covariate effects to vary smoothly along genomic positions, and we have built a specialized EM algorithm, which explicitly allows for experimental errors and cell type mixtures, to make inference about smooth covariate effects in the model. Simulations show that the proposed method provides accurate estimates of covariate effects and captures the major underlying methylation patterns with excellent power. We also apply our method to analyze data from rheumatoid arthritis patients and controls. The method has been implemented in the R package SOMNiBUS.


Subject(s)
DNA Methylation , High-Throughput Nucleotide Sequencing , DNA Methylation/genetics , Humans , Sequence Analysis, DNA , Sulfites
19.
Nature ; 526(7571): 82-90, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26367797

ABSTRACT

The contribution of rare and low-frequency variants to human traits is largely unexplored. Here we describe insights from sequencing whole genomes (low read depth, 7×) or exomes (high read depth, 80×) of nearly 10,000 individuals from population-based and disease collections. In extensively phenotyped cohorts we characterize over 24 million novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with levels of triglycerides (APOB), adiponectin (ADIPOQ) and low-density lipoprotein cholesterol (LDLR and RGAG1) from single-marker and rare variant aggregation tests. We describe population structure and functional annotation of rare and low-frequency variants, use the data to estimate the benefits of sequencing for association studies, and summarize lessons from disease-specific collections. Finally, we make available an extensive resource, including individual-level genetic and phenotypic data and web-based tools to facilitate the exploration of association results.


Subject(s)
Disease/genetics , Genetic Variation/genetics , Genome, Human/genetics , Health , Adiponectin/blood , Alleles , Cohort Studies , Exome/genetics , Female , Genetic Predisposition to Disease/genetics , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Humans , Lipid Metabolism/genetics , Male , Molecular Sequence Annotation , Receptors, LDL/genetics , Reference Standards , Sequence Analysis, DNA , Triglycerides/blood , United Kingdom
20.
Genet Epidemiol ; 43(4): 373-401, 2019 06.
Article in English | MEDLINE | ID: mdl-30635941

ABSTRACT

In Mendelian randomization (MR), inference about the causal relationship between a phenotype of interest and a response or disease outcome can be obtained by constructing instrumental variables from genetic variants. However, MR inference requires three assumptions, one of which is that the genetic variants only influence the outcome through the phenotype of interest. Pleiotropy, that is, the situation in which some genetic variants affect more than one phenotype, can invalidate these genetic variants for use as instrumental variables; thus a naive analysis will give biased estimates of the causal relation. Here, we present new methods (constrained instrumental variable [CIV] methods) to construct valid instrumental variables and perform adjusted causal effect estimation when pleiotropy exists and when the pleiotropic phenotypes are available. We demonstrate that a smoothed version of CIV performs approximate selection of genetic variants that are valid instruments, and provides unbiased estimates of the causal effects. We provide details on a number of existing methods, together with a comparison of their performance in a large series of simulations. CIV performs robustly across different pleiotropic violations of the MR assumptions. We also analyzed the data from the Alzheimer's disease (AD) neuroimaging initiative (ADNI; Mueller et al., 2005. Alzheimer's Dementia, 11(1), 55-66) to disentangle causal relationships of several biomarkers with AD progression.
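
The basic instrumental-variable idea behind MR can be sketched with the standard single-instrument ratio estimator, cov(G, Y) / cov(G, X). This is not the CIV method described above and it does not handle pleiotropy. The simulated data, the effect sizes, and the variable names are hypothetical, and the variant is assumed to satisfy all three MR assumptions.

```python
import random

def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

def ratio_estimate(g, x, y):
    """Instrumental-variable (Wald ratio) estimate of the effect of x on y."""
    return covariance(g, y) / covariance(g, x)

def ols_slope(x, y):
    """Naive regression slope of y on x, ignoring confounding."""
    return covariance(x, y) / covariance(x, x)

rng = random.Random(0)
n = 20000
true_effect = 0.3  # hypothetical causal effect of phenotype on outcome

g, x, y = [], [], []
for _ in range(n):
    gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # genotype: 0, 1 or 2
    u = rng.gauss(0, 1)                                # unmeasured confounder
    xi = 0.5 * gi + u + rng.gauss(0, 1)                # phenotype of interest
    yi = true_effect * xi + u + rng.gauss(0, 1)        # outcome
    g.append(gi)
    x.append(xi)
    y.append(yi)

iv = ratio_estimate(g, x, y)
naive = ols_slope(x, y)
```

Because the confounder `u` raises both the phenotype and the outcome, the naive slope overstates the effect, while the ratio estimate stays close to the simulated value as long as the variant influences the outcome only through the phenotype.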


Subject(s)
Genetic Pleiotropy/physiology , Mendelian Randomization Analysis/methods , Algorithms , Confounding Factors, Epidemiologic , Genetic Association Studies , Genetic Variation , Humans , Models, Genetic , Phenotype