ABSTRACT
Major depressive disorder (MDD) often goes undiagnosed due to the absence of clear biomarkers. We sought to identify voice biomarkers for MDD and to separate biomarkers indicative of MDD predisposition from biomarkers reflecting current depressive symptoms. Using a two-stage meta-analytic design to remove confounds, we tested the association between features representing vocal pitch and MDD in a multisite case-control cohort study of Chinese women with recurrent depression. Sixteen features were replicated in an independent cohort, with absolute association coefficients (beta values) from the combined analysis ranging from 0.24 to 1.07, indicating moderate to large effects. The statistical significance of these associations remained robust, with P-values ranging from 7.2 × 10⁻⁶ to 6.8 × 10⁻⁵⁸. Eleven features were significantly associated with current depressive symptoms. Using genotype data, we found that this association was driven in part by a genetic correlation with MDD. Significant voice features, reflecting a slower pitch change and a lower pitch, achieved an AUC-ROC of 0.90 (sensitivity of 0.85 and specificity of 0.81) in MDD classification. Our results return vocal features to a more central position in clinical and research work on MDD.
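The AUC-ROC, sensitivity, and specificity reported above are standard classification metrics. The sketch below shows how they can be computed from per-subject risk scores. It assumes that a higher score means "more likely MDD" and uses a hypothetical decision threshold and toy data. It illustrates the metrics only, not the authors' classifier or feature set.

```python
from typing import Sequence, Tuple


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as the probability that a randomly chosen case outscores a
    randomly chosen control (ties count one half); this equals the
    normalized Mann-Whitney U statistic."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Call a subject a case when score >= threshold; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_case = s >= threshold
        if y == 1 and predicted_case:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 1 = MDD case, 0 = control.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(auc_roc(scores, labels))                       # 0.9375
print(sensitivity_specificity(scores, labels, 0.5))  # (0.75, 0.75)
```

The AUC does not depend on any threshold. Sensitivity and specificity do, which is why the two are reported at a chosen operating point.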
ABSTRACT
Identifying and refining clinically significant patient stratification is a critical step toward realizing the promise of precision medicine in asthma. Several peripheral blood hallmarks, including total peripheral blood eosinophil count (BEC) and immunoglobulin E (IgE) levels, are routinely used in asthma clinical practice for endotype classification and for predicting response to state-of-the-art targeted biologic drugs. However, these biomarkers appear ineffective in predicting treatment outcomes in some patients, and their distributions differ between racially and ethnically diverse populations, potentially compromising medical care and hindering health equity through biases in drug eligibility. Here, we propose constructing an unbiased patient stratification score based on DNA methylation (DNAm) and using it to refine the efficacy of hallmark biomarkers for predicting drug response. We developed Phenotype Aware Component Analysis (PACA), a novel contrastive machine-learning method for learning combinations of DNAm sites reflecting biomedically meaningful patient stratifications. Leveraging whole-blood DNAm from Latino (discovery; n=1,016) and African American (replication; n=756) pediatric asthma case-control cohorts, we applied PACA to refine the prediction of bronchodilator response (BDR) to the short-acting β2-agonist albuterol, the most commonly used drug to treat acute bronchospasm worldwide. While BEC and IgE correlate with BDR in the general patient population, our PACA-derived DNAm score renders these biomarkers predictive of drug response only in patients with high DNAm scores. BEC correlates with BDR in patients with upper-quartile DNAm scores (OR 1.12; 95% CI [1.04, 1.22]; P = 7.9 × 10⁻⁴) but not in patients with lower-quartile scores (OR 1.05; 95% CI [0.95, 1.17]; P = 0.21). IgE correlates with BDR in patients with above-median scores (OR for response 1.42; 95% CI [1.24, 1.63]; P = 3.9 × 10⁻⁷) but not in patients with below-median scores (OR 1.05; 95% CI [0.92, 1.2]; P = 0.57).
These results hold within the commonly recognized type 2 (T2)-high asthma endotype but not in T2-low patients, suggesting that our DNAm score primarily represents an unknown variation of T2 asthma. Among T2-high patients with high DNAm scores, elevated BEC or IgE also corresponds to a baseline clinical presentation known to benefit more from biologic treatment. This presentation includes higher exacerbation scores, greater allergen sensitization, lower BMI, more recent oral corticosteroid prescriptions, and lower lung function. Our findings suggest that BEC and IgE, the traditional biomarkers of T2-high asthma, are poor biomarkers for millions of patients worldwide. Revisiting existing drug eligibility criteria that rely on these biomarkers in asthma medical care may enhance precision and equity in treatment.
ABSTRACT
Our knowledge of the contribution of genetic interactions (epistasis) to variation in human complex traits remains limited, partly due to the lack of efficient, powerful, and interpretable algorithms to detect interactions. Recently proposed approaches for set-based association tests show promise in improving the power to detect epistasis by examining the aggregated effects of multiple variants. Nevertheless, these methods either do not scale to large Biobank data sets or lack interpretability. We propose QuadKAST, a scalable algorithm that tests for pairwise interaction effects (quadratic effects) on a trait within small to medium-sized sets of genetic variants (window size ≤100) and provides a quantified interpretation of these effects. Comprehensive simulations show that QuadKAST is well calibrated. QuadKAST is also highly sensitive in detecting loci with epistatic signals and accurate in its estimation of quadratic effects. We applied QuadKAST to 52 quantitative phenotypes measured in ≈300,000 unrelated white British individuals in the UK Biobank to test for quadratic effects within each of 9,515 protein-coding genes. We detect 32 trait-gene pairs across 17 traits and 29 genes that demonstrate statistically significant signals of quadratic effects (accounting for the number of genes and traits tested). Across these trait-gene pairs, the proportion of trait variance explained by quadratic effects is comparable to that explained by additive effects, with five pairs having a quadratic-to-additive ratio >1. Our method enables the detailed investigation of epistasis on a large scale, offering new insights into its role and importance.
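This sketch makes "pairwise interaction (quadratic) effects within a set of variants" concrete. It builds the products of genotype dosages for every pair of distinct variants in a window and counts how many such terms a window of size m produces. Two assumptions apply: squared terms of a single variant are excluded, and the function names are hypothetical. It illustrates only the feature space being tested, not QuadKAST's kernel-based test or its scaling strategy.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_interaction_terms(genotypes: Sequence[Sequence[float]]) -> List[List[float]]:
    """For each individual (row), return the products x_i * x_j of genotype
    dosages for every pair of distinct variants i < j in the window."""
    terms = []
    for row in genotypes:
        terms.append([row[i] * row[j] for i, j in combinations(range(len(row)), 2)])
    return terms


def n_pairwise_terms(window_size: int) -> int:
    """Number of distinct variant pairs in a window: m * (m - 1) / 2."""
    return window_size * (window_size - 1) // 2


# Two hypothetical individuals genotyped at a 3-variant window (0/1/2 allele dosages).
window = [[0, 1, 2], [1, 2, 1]]
print(pairwise_interaction_terms(window))  # [[0, 0, 2], [2, 1, 2]]
print(n_pairwise_terms(100))               # 4950
```

A 100-variant window already yields 4,950 interaction terms per individual. Testing them explicitly across thousands of genes and ≈300,000 people is costly, which is why scalability is emphasized.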
Subjects
Algorithms, Genetic Epistasis, Humans, Genetic Models, Quantitative Trait Loci, Multifactorial Inheritance, Phenotype, Single Nucleotide Polymorphism, Genome-Wide Association Study/methods
ABSTRACT
The genetic influence on human vocal pitch in tonal and non-tonal languages remains largely unknown. In tonal languages, such as Mandarin Chinese, pitch changes differentiate word meanings, whereas in non-tonal languages, such as Icelandic, pitch is used to convey intonation. We addressed this question by searching for genetic associations with interindividual variation in median pitch in a Chinese major depression case-control cohort and compared our results with a genome-wide association study from Iceland. The same genetic variant, rs11046212-T in an intron of the ABCC9 gene, was one of the most strongly associated loci with median pitch in both samples. Our meta-analysis revealed four genome-wide significant hits, including two novel associations. The discovery of genetic variants influencing vocal pitch across both tonal and non-tonal languages suggests the possibility of a common genetic contribution to the human vocal system shared in two distinct populations with languages that differ in tonality (Icelandic and Mandarin).
Subjects
Genome-Wide Association Study, Language, Humans, Male, Female, Single Nucleotide Polymorphism, Adult, Iceland, Case-Control Studies, Middle Aged, Voice/physiology, Pitch Perception, Asian People/genetics
ABSTRACT
Knowing the genes involved in quantitative traits provides an entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six quantitative trait loci (QTLs) by quantitative complementation, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2, and Sh3gl, have known roles in synapse function. The fifth, Psip1, was not previously implicated in behavior, and the sixth is a long non-coding RNA, 4933413L06Rik, of unknown function. Variation in transcriptome and epigenetic modalities occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results relieve a bottleneck in using genetic mapping of QTLs to uncover biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.
Subjects
Fear, Quantitative Trait Loci, Animals, Female, Male, Mice, Animal Behavior/physiology, Chromosome Mapping, Fear/physiology, Inbred C57BL Mice, Genetic Complementation Test
ABSTRACT
Knowing the genes involved in quantitative traits provides a critical entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. Here we address a key step towards that goal by deploying a test that directly queries whether a gene mediates the effect of a quantitative trait locus (QTL). To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six QTLs, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2, and Sh3gl, have known roles in synapse function. The fifth gene, Psip1, is a transcriptional co-activator not previously implicated in behavior, and the sixth is a long non-coding RNA, 4933413L06Rik, with no known function. Single-nucleus transcriptomic and epigenetic analyses implicated excitatory neurons as likely mediators of the genetic effects. Surprisingly, variation in transcriptome and epigenetic modalities between inbred strains occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results relieve a bottleneck in using genetic mapping of QTLs to find novel biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.
ABSTRACT
Attention deficit hyperactivity disorder (ADHD) is a complex disorder that manifests variability in long-term outcomes and clinical presentations. The genetic contributions to such heterogeneity are not well understood. Here we show several genetic links to clinical heterogeneity in ADHD in a case-only study of 14,084 diagnosed individuals. First, we identify one genome-wide significant locus by comparing cases with ADHD and autism spectrum disorder (ASD) to cases with ADHD but not ASD. Second, we show that cases with ASD and ADHD, substance use disorder and ADHD, or first diagnosed with ADHD in adulthood have unique polygenic score (PGS) profiles that distinguish them from complementary case subgroups and controls. Finally, a PGS for an ASD diagnosis in ADHD cases predicted cognitive performance in an independent developmental cohort. Our approach uncovered evidence of genetic heterogeneity in ADHD, helping us to understand its etiology and providing a model for studies of other disorders.
Subjects
Attention Deficit Disorder with Hyperactivity, Autism Spectrum Disorder, Humans, Autism Spectrum Disorder/genetics, Attention Deficit Disorder with Hyperactivity/genetics, Multifactorial Inheritance/genetics
ABSTRACT
Relating genetic variants to behavior remains a fundamental challenge. To assess the utility of DNA methylation marks in discovering causative variants, we examined their relationship to genetic variation by generating single-nucleus methylomes from the hippocampus of eight inbred mouse strains. At CpG sequence densities under 40 CpG/kb, cells compensate for the loss of methylated sites by methylating additional sites to maintain methylation levels. At higher CpG sequence densities, the exact location of a methylated site becomes more important. This suggests that variants affecting methylation will have a greater effect when they occur at higher CpG densities than at lower ones. We found this to be true for a variant's effect on transcript abundance, indicating that candidate variants can be prioritized based on CpG sequence density. Our findings imply that DNA methylation influences the likelihood that mutations occur at specific sites in the genome, supporting the view that the distribution of mutations is not random.
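The 40 CpG/kb threshold refers to local CpG sequence density. This minimal sketch computes that density for a genomic window and assigns the window to the low- or high-density regime. It assumes density is counted as CG dinucleotides on one strand per 1,000 bp. The window size, strand convention, and helper names are hypothetical and may differ from the study's.

```python
def cpg_density_per_kb(sequence: str) -> float:
    """Count CG dinucleotides on the given strand and scale to CpGs per 1,000 bp."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return 1000.0 * n_cpg / len(seq)


def density_regime(sequence: str, threshold: float = 40.0) -> str:
    """Label a window 'low' if its density is under the threshold, else 'high'."""
    return "low" if cpg_density_per_kb(sequence) < threshold else "high"


# Hypothetical 1-kb windows with 39 and 40 CpGs.
low_window = "CG" * 39 + "A" * 922
high_window = "CG" * 40 + "A" * 920
print(cpg_density_per_kb(low_window), density_regime(low_window))    # 39.0 low
print(cpg_density_per_kb(high_window), density_regime(high_window))  # 40.0 high
```

Under the prioritization idea in the abstract, a candidate variant falling in a "high" window would be ranked ahead of one in a "low" window.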
ABSTRACT
An individual's disease risk is affected by the populations that they belong to, due to shared genetics and environmental factors. The study of fine-scale populations in clinical care is important for identifying and reducing health disparities and for developing personalized interventions. To assess patterns of clinical diagnoses and healthcare utilization by fine-scale populations, we leveraged genetic data and electronic medical records from 35,968 patients as part of the UCLA ATLAS Community Health Initiative. We defined clusters of individuals using identity by descent, a form of genetic relatedness that utilizes shared genomic segments arising due to a common ancestor. In total, we identified 376 clusters, including clusters with patients of Afro-Caribbean, Puerto Rican, Lebanese Christian, Iranian Jewish and Gujarati ancestry. Our analysis uncovered 1,218 significant associations between disease diagnoses and clusters and 124 significant associations with specialty visits. We also examined the distribution of pathogenic alleles and found 189 significant alleles at elevated frequency in particular clusters, including many that are not regularly included in population screening efforts. Overall, this work progresses the understanding of health in understudied communities and can provide the foundation for further study into health inequities.
Subjects
Delivery of Health Care, Patient Acceptance of Health Care, Humans, Los Angeles, Iran, Ethnicity
ABSTRACT
A picture description task is a component of Miro Health's platform for self-administration of neurobehavioral assessments. Picture description has been used as a screening tool for identifying individuals with Alzheimer's disease and mild cognitive impairment (MCI). However, it currently requires in-person administration and scoring by someone with access to, and familiarity with, a scoring rubric. The Miro Health implementation allows broader use of this assessment through self-administration and automated processing, analysis, and scoring, delivering clinically useful quantifications of the users' speech production, vocal characteristics, and language. Picture description responses were collected from 62 healthy controls (HC) and 33 participants with MCI: 18 with amnestic MCI (aMCI) and 15 with non-amnestic MCI (naMCI). Speech and language features, and contrasts between pairs of features, were evaluated for differences in their distributions across the participant subgroups. Picture description features were selected and combined using penalized logistic regression to form risk scores for classification of HC versus MCI as well as HC versus specific MCI subtypes. A picture-description-based risk score distinguishes MCI from HC with an area under the receiver operating characteristic curve (AUROC) of 0.74. When contrasting specific MCI subtypes with HC, the classifiers have an AUROC of 0.88 for aMCI versus HC and an AUROC of 0.61 for naMCI versus HC. Tests of association of individual features, or contrasts of pairs of features, with HC versus aMCI identified 20 features with p-values below 5 × 10⁻³ and False Discovery Rates (FDRs) at or below 0.113, and 61 contrasts with p-values below 5 × 10⁻⁴ and FDRs at or below 0.132. These findings suggest that the performance of picture description as a screening tool for MCI detection will vary greatly by MCI subtype, or by the proportion of various subtypes in an undifferentiated MCI population.
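The abstract reports FDRs alongside p-values but does not state how they were estimated. This sketch assumes the widely used Benjamini-Hochberg procedure, which the source does not confirm. Under that assumption, a per-feature FDR (q-value) can be computed from the list of p-values as follows; the function name and p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for four speech features.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A q-value depends on how many tests were run. This is why features and pairwise contrasts can carry different FDRs even at similar p-value cutoffs.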
ABSTRACT
BACKGROUND: The Miro Health Mobile Assessment Platform consists of self-administered neurobehavioral and cognitive assessments that capture behaviors typically assessed by specialized clinicians. OBJECTIVE: To evaluate the Miro Health Mobile Assessment Platform's concurrent validity, test-retest reliability, and mild cognitive impairment (MCI) classification performance. METHOD: Sixty study participants were evaluated with Miro Health version V.2. Healthy controls (HC), amnestic MCI (aMCI), and nonamnestic MCI (naMCI) participants aged 64-85 were evaluated with version V.3. Additional participants were recruited at Johns Hopkins Hospital to represent clinic patients, with wider ranges of age and diagnosis. In all, 90 HC, 21 aMCI, 17 naMCI, and 15 other cases were evaluated with V.3. Concurrent validity of the Miro Health variables and legacy neuropsychological test scores was assessed with Spearman correlations. Reliability was quantified with the scores' intraclass correlations. A machine-learning algorithm combined Miro Health variable scores into a Risk score to differentiate HC from MCI or MCI subtypes. RESULTS: In HC, correlations of Miro Health variables with legacy test scores ranged from 0.27 to 0.68. Test-retest reliabilities ranged from 0.25 to 0.79, with minimal learning effects. The Risk score differentiated individuals with aMCI from HC with an area under the receiver operating characteristic curve (AUROC) of 0.97; naMCI from HC with an AUROC of 0.80; combined MCI from HC with an AUROC of 0.89; and aMCI from naMCI with an AUROC of 0.83. CONCLUSION: The Miro Health Mobile Assessment Platform provides valid and reliable assessment of neurobehavioral and cognitive status, effectively distinguishes between HC and MCI, and differentiates aMCI from naMCI.
Subjects
Cognitive Dysfunction, Aged, Aged 80 and over, Cognitive Dysfunction/diagnosis, Cognitive Dysfunction/psychology, Humans, Machine Learning, Middle Aged, Neuropsychological Tests, Reproducibility of Results, Computer-Assisted Signal Processing
ABSTRACT
Circulating cell-free DNA (cfDNA) in the bloodstream originates from dying cells and is a promising noninvasive biomarker for cell death. Here, we propose an algorithm, CelFiE, to accurately estimate the relative abundances of cell types and tissues contributing to cfDNA from epigenetic cfDNA sequencing. In contrast to previous work, CelFiE accommodates low-coverage data and does not require CpG site curation. It also estimates contributions from multiple unknown cell types that are not available in external reference data. In simulations, CelFiE accurately estimates known and unknown cell type proportions from low-coverage and noisy cfDNA mixtures, including cell types composing less than 1% of the total mixture. In two clinically relevant situations, CelFiE correctly estimates a large placenta component in pregnant women and an elevated skeletal muscle component in amyotrophic lateral sclerosis (ALS) patients, consistent with the muscle wasting typical of these patients. Together, these results show how CelFiE could be a useful tool for biomarker discovery and for monitoring the progression of degenerative disease.
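CelFiE is described as estimating cell-type contributions from cfDNA methylation reads. The sketch below shows only the simplest ingredient of that idea: an expectation-maximization (EM) estimate of mixing proportions when every cell type's methylation profile is known in advance. This is a hypothetical simplification. It omits CelFiE's estimation of unknown cell types and of the reference itself, and the function name and data are illustrative.

```python
from typing import List, Sequence


def em_mixture_proportions(
    methylated: Sequence[int],
    total: Sequence[int],
    reference: Sequence[Sequence[float]],
    n_iter: int = 500,
) -> List[float]:
    """Estimate cell-type proportions from per-CpG methylated/total read counts.

    reference[k][j] is the (known) methylation level of cell type k at CpG j.
    Each read is assumed to come from one cell type; EM treats that origin as latent.
    """
    n_types = len(reference)
    alpha = [1.0 / n_types] * n_types  # start from uniform proportions
    for _ in range(n_iter):
        expected = [0.0] * n_types
        for j, (m, n) in enumerate(zip(methylated, total)):
            u = n - m
            # E-step: posterior origin of a methylated / unmethylated read at CpG j.
            w_m = [alpha[k] * reference[k][j] for k in range(n_types)]
            w_u = [alpha[k] * (1.0 - reference[k][j]) for k in range(n_types)]
            s_m, s_u = sum(w_m), sum(w_u)
            for k in range(n_types):
                if m and s_m > 0:
                    expected[k] += m * w_m[k] / s_m
                if u and s_u > 0:
                    expected[k] += u * w_u[k] / s_u
        # M-step: each proportion is that cell type's expected share of all reads.
        total_reads = sum(expected)
        alpha = [e / total_reads for e in expected]
    return alpha


# Hypothetical reference: methylation level of two cell types at four CpGs.
reference = [[0.9, 0.1, 0.8, 0.2],   # cell type A
             [0.1, 0.9, 0.2, 0.8]]   # cell type B
# Noise-free read counts from a 70% A / 30% B mixture.
total = [1000, 1000, 1000, 1000]
methylated = [660, 340, 620, 380]
print([round(a, 3) for a in em_mixture_proportions(methylated, total, reference)])  # [0.7, 0.3]
```

Working at the level of individual reads, rather than per-site methylation fractions, is what lets this style of model use low-coverage data. Sites with few reads simply contribute less to the expected counts.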
Subjects
Algorithms, Amyotrophic Lateral Sclerosis/genetics, Cell-Free Nucleic Acids/genetics, DNA Methylation, Genetic Epigenesis, Adult, Amyotrophic Lateral Sclerosis/blood, Amyotrophic Lateral Sclerosis/immunology, Amyotrophic Lateral Sclerosis/pathology, B-Lymphocytes/immunology, B-Lymphocytes/metabolism, Biomarkers/blood, Case-Control Studies, Cell-Free Nucleic Acids/blood, Cell-Free Nucleic Acids/classification, Female, Humans, Macrophages/immunology, Macrophages/metabolism, Male, Monocytes/immunology, Monocytes/metabolism, Skeletal Muscle/immunology, Skeletal Muscle/metabolism, Skeletal Muscle/pathology, Neutrophils/immunology, Neutrophils/metabolism, Organ Specificity, Pregnancy, Pregnancy Trimesters/blood, Pregnancy Trimesters/genetics, T-Lymphocytes/immunology, T-Lymphocytes/metabolism
ABSTRACT
To identify rare variants associated with prostate cancer susceptibility, and to better characterize the mechanisms and cumulative disease risk associated with common risk variants, we conducted an integrated study of prostate cancer genetic etiology in two cohorts using custom genotyping microarrays, large imputation reference panels, and functional annotation approaches. Specifically, 11,984 men (6,196 prostate cancer cases and 5,788 controls) of European ancestry from Northern California Kaiser Permanente were genotyped and meta-analyzed with 196,269 men of European ancestry (7,917 prostate cancer cases and 188,352 controls) from the UK Biobank. Three novel loci, including two rare variants (European ancestry minor allele frequency < 0.01, at 3p21.31 and 8p12), reached genome-wide significance in the meta-analysis. Gene-based rare variant tests implicated a known prostate cancer gene (HOXB13) and a novel candidate gene (ILDR1), which encodes a receptor highly expressed in prostate tissue and related to the B7/CD28 family of T-cell immune checkpoint markers. Haplotypic patterns of long-range linkage disequilibrium were observed for rare genetic variants at HOXB13 and other loci, reflecting their evolutionary history. In addition, a polygenic risk score (PRS) of 188 prostate cancer variants was strongly associated with risk (90th vs. 40th-60th percentile OR = 2.62, P = 2.55 × 10⁻¹⁹¹). Many of the 188 variants exhibited functional signatures of gene expression regulation or transcription factor binding, including a 6-fold difference in log-probability of androgen receptor binding at the variant rs2680708 (17q22). Rare variant and PRS associations, with concomitant functional interpretation of risk mechanisms, can help clarify the full genetic architecture of prostate cancer and other complex traits.
SIGNIFICANCE: This study maps the biological relationships between diverse risk factors for prostate cancer, integrating different functional datasets to interpret and model genome-wide data from over 200,000 men with and without prostate cancer. See related commentary by Lachance, p. 1637.
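A PRS such as the 188-variant score above is conventionally computed as a weighted sum of risk-allele dosages, with weights taken from per-variant effect sizes such as log odds ratios. Comparisons like "90th vs. 40th-60th percentile" then place each person within the distribution of scores. This minimal sketch follows those assumptions. The weights, helper names, and tie-handling convention are illustrative, not the study's.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant),
    using per-variant weights such as log odds ratios."""
    if len(dosages) != len(weights):
        raise ValueError("expected one weight per variant")
    return sum(d * w for d, w in zip(dosages, weights))


def percentile_rank(value: float, population: Sequence[float]) -> float:
    """Percentage of population scores below value, counting ties as one half."""
    ranked = sorted(population)
    below = bisect_left(ranked, value)
    ties = bisect_right(ranked, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ranked)


# Hypothetical 3-variant score and a toy reference distribution of scores.
score = polygenic_score([2, 1, 0], [0.1, 0.2, -0.05])
print(round(score, 3))                          # 0.4
print(percentile_rank(5, list(range(1, 11))))  # 45.0
```

In a real analysis, the reference distribution would be the scores of the study population (or of controls). The percentile bins are then contrasted in a logistic model to obtain odds ratios like the one reported.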
Subjects
Multifactorial Inheritance, Prostatic Neoplasms, Genetic Predisposition to Disease, Genome-Wide Association Study, Genomics, Humans, Male, Single Nucleotide Polymorphism, Prostatic Neoplasms/genetics
ABSTRACT
The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.
Subjects
Black or African American/genetics, Genome-Wide Association Study/methods, Genetic Models, Transcriptome, Gene Expression Profiling/methods, Gene Expression Profiling/standards, Genome-Wide Association Study/standards, Humans, Quantitative Trait Loci, RNA-Seq/methods, RNA-Seq/standards, Reference Standards
ABSTRACT
The observation that disease-associated genetic variants typically reside outside of exons has inspired widespread investigation into the genetic basis of transcriptional regulation. Associations between the mRNA abundance of a gene and its proximal SNPs (cis-eQTLs) are now readily identified. In contrast, identification of high-quality distal associations (trans-eQTLs) has been limited by a heavy multiple-testing burden and a proneness to false-positive signals. To address these issues, we develop GBAT, a powerful gene-based pipeline that allows robust detection of high-quality trans-gene regulation signals.
Subjects
Gene Expression Regulation, Genetic Testing/methods, Genome-Wide Association Study, Gene Expression Profiling, Gene Regulatory Networks, Genotype, Humans, Single Nucleotide Polymorphism, Messenger RNA
ABSTRACT
Large-scale cohorts with combined genetic and phenotypic data, coupled with methodological advances, have produced increasingly accurate genetic predictors of complex human phenotypes called polygenic risk scores (PRSs). In addition to the potential translational impact of identifying at-risk individuals, PRSs are being used for a growing list of scientific applications. These include causal inference, identifying pleiotropy and genetic correlation, and powerful gene-based and mixed-model association tests. Existing PRS approaches rely on external large-scale genetic cohorts that have also measured the phenotype of interest. They further require matching on ancestry and genotyping platform or imputation quality. In this work, we present a novel reference-free method to produce a PRS that does not rely on an external cohort. We show that naive implementations of reference-free PRS result in either substantial overfitting or prohibitive increases in computational time. Our algorithm avoids both of these issues and can produce informative in-sample PRSs over a single cohort without overfitting. We then demonstrate several novel applications of reference-free PRSs, including detection of pleiotropy across 246 metabolic traits and efficient mixed-model association testing.
Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study/statistics & numerical data, Multifactorial Inheritance/genetics, Humans, Linear Models, Phenotype, Risk Factors
ABSTRACT
Genetic studies of metabolites have identified thousands of variants, many of which are associated with downstream metabolic and obesogenic disorders. However, these studies have relied on univariate analyses, reducing power and limiting context-specific understanding. Here we aim to provide an integrated perspective on the genetic basis of metabolites by leveraging the Finnish Metabolic Syndrome In Men (METSIM) cohort. This unique genetic resource contains metabolic measurements, mostly lipids, across distinct time points as well as information on statin usage. Applying the Covariates for Multi-phenotype Studies (CMS) approach doubles the effective sample size on average, and we identify 588 significant SNP-metabolite associations, including 228 new associations. Our analysis pinpoints a small number of master metabolic regulator genes that balance the relative proportions of dozens of metabolite levels. We further identify associations with changes in metabolite levels across time, as well as genetic interactions with statin use at both the master metabolic regulator and genome-wide levels.
Subjects
Genetic Pleiotropy, Metabolic Syndrome/genetics, Metabolome/genetics, Aged, Amino Acids/genetics, Amino Acids/metabolism, Cohort Studies, Fatty Acids/genetics, Fatty Acids/metabolism, Gene Regulatory Networks, Genome-Wide Association Study, Humans, HDL Lipoproteins/genetics, HDL Lipoproteins/metabolism, IDL Lipoproteins/genetics, IDL Lipoproteins/metabolism, LDL Lipoproteins/genetics, LDL Lipoproteins/metabolism, VLDL Lipoproteins/genetics, VLDL Lipoproteins/metabolism, Magnetic Resonance Spectroscopy, Male, Middle Aged, Single Nucleotide Polymorphism
ABSTRACT
High-throughput measurements of molecular phenotypes provide an unprecedented opportunity to model cellular processes and their impact on disease. These highly structured datasets are usually strongly confounded, creating false positives and reducing power. This has motivated many approaches based on principal components analysis (PCA) to estimate and correct for confounders, which have become indispensable elements of association tests between molecular phenotypes and both genetic and nongenetic factors. Here, we show that these correction approaches induce a bias, and that it persists for large sample sizes and replicates out-of-sample. We prove this theoretically for PCA by deriving an analytic, deterministic, and intuitive bias approximation. We assess other methods with realistic simulations, which show that perturbing any of several basic parameters can cause false positive rate (FPR) inflation. Our experiments show the bias depends on covariate and confounder sparsity, effect sizes, and their correlation. Surprisingly, when the covariate and confounder have [Formula: see text], standard two-step methods all have [Formula: see text]-fold FPR inflation. Our analysis informs best practices for confounder correction in genomic studies, and suggests many false discoveries have been made and replicated in some differential expression analyses.
Subjects
Genome-Wide Association Study/methods, Phenotype, Principal Component Analysis/methods, Animals, Genome-Wide Association Study/standards, Humans, Genetic Models, Principal Component Analysis/standards, Quantitative Trait Loci, Reproducibility of Results
ABSTRACT
Common diseases often show sex differences in prevalence, onset, symptomology, treatment, or prognosis. Although studies have been performed to evaluate sex differences at specific SNP associations, this work aims to comprehensively survey a number of complex heritable diseases and anthropometric traits. Potential genetically encoded sex differences we investigated include differential genetic liability thresholds or distributions, gene-sex interaction at autosomal loci, major contribution of the X-chromosome, or gene-environment interactions reflected in genes responsive to androgens or estrogens. Finally, we tested the overlap between sex-differential association with anthropometric traits and disease risk. We utilized complementary approaches of assessing GWAS association enrichment and SNP-based heritability estimation to explore explicit sex differences, as well as enrichment in sex-implicated functional categories. We do not find consistent increased genetic load in the lower-prevalence sex, or a disproportionate role for the X-chromosome in disease risk, despite sex-heterogeneity on the X for several traits. We find that all anthropometric traits show less than complete correlation between the genetic contribution to males and females, and find a convincing example of autosome-wide genome-sex interaction in multiple sclerosis (P = 1 × 10⁻⁹). We also find some evidence for hormone-responsive gene enrichment, and striking evidence of the contribution of sex-differential anthropometric associations to common disease risk, implying that general mechanisms of sexual dimorphism determining secondary sex characteristics have shared effects on disease risk.
Subjects
Genetic Predisposition to Disease, Genetic Models, Sex Characteristics, Body Size/genetics, Human X Chromosome/genetics, Female, Genetic Load, Humans, Male, Single Nucleotide Polymorphism, Heritable Quantitative Trait
ABSTRACT
OBJECTIVE: Plasma kynurenine/tryptophan ratio, a biomarker of indoleamine 2,3-dioxygenase-1 (IDO) activity, is a strong independent predictor of mortality in HIV-infected Ugandans initiating antiretroviral therapy (ART) and may play a key role in HIV pathogenesis. We performed a genome-wide study to identify potential host genetic determinants of kynurenine/tryptophan ratio in HIV-infected ART-suppressed Ugandans. DESIGN/METHODS: We performed genome-wide and exome array genotyping and measured plasma kynurenine/tryptophan ratio during the initial 6-12 months of suppressive ART in Ugandans. We evaluated more than 16 million single nucleotide polymorphisms for association with log10 kynurenine/tryptophan ratio using linear mixed models adjusted for cohort, sex, pregnancy, and ancestry. RESULTS: Among 597 Ugandans, 62% were women, median age was 35, median baseline CD4 cell count was 135 cells/µl, and median baseline HIV-1 RNA was 5.1 log10 copies/ml. Several polymorphisms in candidate genes TNF, IFNGR1, and TLR4 were associated with log10 kynurenine/tryptophan ratio (P < 5.0 × 10⁻…). An intergenic polymorphism between CSPG5 and ELP6 was genome-wide significant, whereas several others exhibited suggestive associations (P < 5.0 × 10⁻…). These included genes encoding protein tyrosine phosphatases (PTPRM and PTPRN2) and the vitamin D metabolism gene CYP24A1. Several of these single nucleotide polymorphisms were associated with markers of inflammation, coagulation, and monocyte activation, but did not replicate in a small US cohort (N = 262; 33% African-American). CONCLUSION: Our findings highlight a potentially important role of IFN-γ, TNF-α, and Toll-like receptor signaling in determining IDO activity and subsequent mortality risk in HIV-infected ART-suppressed Ugandans. These results also identify potential novel pathways involved in IDO immunoregulation. Further studies are needed to confirm these findings in treated HIV-infected populations.