1.
PLoS Genet ; 16(8): e1008927, 2020 08.
Article in English | MEDLINE | ID: mdl-32797036

ABSTRACT

The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.


Subjects
Black or African American/genetics, Genome-Wide Association Study/methods, Genetic Models, Transcriptome, Gene Expression Profiling/methods, Gene Expression Profiling/standards, Genome-Wide Association Study/standards, Humans, Quantitative Trait Loci, RNA-Seq/methods, RNA-Seq/standards, Reference Standards
2.
Cogn Behav Neurol ; 35(3): 169-178, 2022 09 01.
Article in English | MEDLINE | ID: mdl-35749748

ABSTRACT

BACKGROUND: The Miro Health Mobile Assessment Platform consists of self-administered neurobehavioral and cognitive assessments that measure behaviors typically measured by specialized clinicians. OBJECTIVE: To evaluate the Miro Health Mobile Assessment Platform's concurrent validity, test-retest reliability, and mild cognitive impairment (MCI) classification performance. METHOD: Sixty study participants were evaluated with Miro Health version V.2. Healthy controls (HC), amnestic MCI (aMCI), and nonamnestic MCI (naMCI) ages 64-85 were evaluated with version V.3. Additional participants were recruited at Johns Hopkins Hospital to represent clinic patients, with wider ranges of age and diagnosis. In all, 90 HC, 21 aMCI, 17 naMCI, and 15 other cases were evaluated with V.3. Concurrent validity of the Miro Health variables and legacy neuropsychological test scores was assessed with Spearman correlations. Reliability was quantified with the scores' intraclass correlations. A machine-learning algorithm combined Miro Health variable scores into a Risk score to differentiate HC from MCI or MCI subtypes. RESULTS: In HC, correlations of Miro Health variables with legacy test scores ranged 0.27-0.68. Test-retest reliabilities ranged 0.25-0.79, with minimal learning effects. The Risk score differentiated individuals with aMCI from HC with an area under the receiver operator curve (AUROC) of 0.97; naMCI from HC with an AUROC of 0.80; combined MCI from HC with an AUROC of 0.89; and aMCI from naMCI with an AUROC of 0.83. CONCLUSION: The Miro Health Mobile Assessment Platform provides valid and reliable assessment of neurobehavioral and cognitive status, effectively distinguishes between HC and MCI, and differentiates aMCI from naMCI.


Subjects
Cognitive Dysfunction, Aged, Aged 80 and over, Cognitive Dysfunction/diagnosis, Cognitive Dysfunction/psychology, Humans, Machine Learning, Middle Aged, Neuropsychological Tests, Reproducibility of Results, Computer-Assisted Signal Processing
3.
Birth Defects Res A Clin Mol Teratol ; 103(8): 692-702, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26010994

ABSTRACT

BACKGROUND: The National Birth Defects Prevention Study (NBDPS) contains a wealth of information on affected and unaffected family triads, and thus provides numerous opportunities to study gene-environment interactions (G×E) in the etiology of birth defect outcomes. Depending on the research objective, several analytic options exist to estimate G×E effects that use varying combinations of individuals drawn from available triads. METHODS: In this study, we discuss important considerations in the collection of genetic data and environmental exposures. RESULTS: We will also present several population- and family-based approaches that can be applied to data from the NBDPS including case-control, case-only, family-based trio, and maternal versus fetal effects. For each, we describe the data requirements, applicable statistical methods, advantages, and disadvantages. CONCLUSION: A range of approaches can be used to evaluate potentially important G×E effects in the NBDPS. Investigators should be aware of the limitations inherent to each approach when choosing a study design and interpreting results.


Subjects
Congenital Abnormalities/etiology, Gene-Environment Interaction, Genetic Predisposition to Disease, Statistical Models, Research Design, Case-Control Studies, Female, Humans, Male, Pedigree, Single Nucleotide Polymorphism, Risk Factors
4.
Cell Genom ; 4(5): 100545, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38697120

ABSTRACT

Knowing the genes involved in quantitative traits provides an entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six quantitative trait loci (QTLs) by quantitative complementation, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2, and Sh3gl, have known roles in synapse function; the fifth, Psip1, was not previously implicated in behavior; and the sixth is a long non-coding RNA, 4933413L06Rik, of unknown function. Variation in transcriptome and epigenetic modalities occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results relieve a bottleneck in using genetic mapping of QTLs to uncover biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.


Subjects
Fear, Quantitative Trait Loci, Animals, Female, Male, Mice, Animal Behavior/physiology, Chromosome Mapping, Fear/physiology, Inbred C57BL Mice, Genetic Complementation Test
5.
Commun Biol ; 7(1): 540, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714798

ABSTRACT

The genetic influence on human vocal pitch in tonal and non-tonal languages remains largely unknown. In tonal languages, such as Mandarin Chinese, pitch changes differentiate word meanings, whereas in non-tonal languages, such as Icelandic, pitch is used to convey intonation. We addressed this question by searching for genetic associations with interindividual variation in median pitch in a Chinese major depression case-control cohort and compared our results with a genome-wide association study from Iceland. The same genetic variant, rs11046212-T in an intron of the ABCC9 gene, was one of the most strongly associated loci with median pitch in both samples. Our meta-analysis revealed four genome-wide significant hits, including two novel associations. The discovery of genetic variants influencing vocal pitch across both tonal and non-tonal languages suggests the possibility of a common genetic contribution to the human vocal system shared in two distinct populations with languages that differ in tonality (Icelandic and Mandarin).


Subjects
Genome-Wide Association Study, Language, Humans, Male, Female, Single Nucleotide Polymorphism, Adult, Iceland, Case-Control Studies, Middle Aged, Voice/physiology, Pitch Perception, Asian People/genetics
6.
bioRxiv ; 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38260483

ABSTRACT

Knowing the genes involved in quantitative traits provides a critical entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. Here we address a key step towards that goal by deploying a test that directly queries whether a gene mediates the effect of a quantitative trait locus (QTL). To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six QTLs, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2 and Sh3gl, have known roles in synapse function; the fifth gene, Psip1, is a transcriptional co-activator not previously implicated in behavior; the sixth is a long non-coding RNA 4933413L06Rik with no known function. Single nucleus transcriptomic and epigenetic analyses implicated excitatory neurons as likely mediating the genetic effects. Surprisingly, variation in transcriptome and epigenetic modalities between inbred strains occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results open a bottleneck in using genetic mapping of QTLs to find novel biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.

7.
Nat Genet ; 56(2): 234-244, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38036780

ABSTRACT

Attention deficit hyperactivity disorder (ADHD) is a complex disorder that manifests variability in long-term outcomes and clinical presentations. The genetic contributions to such heterogeneity are not well understood. Here we show several genetic links to clinical heterogeneity in ADHD in a case-only study of 14,084 diagnosed individuals. First, we identify one genome-wide significant locus by comparing cases with ADHD and autism spectrum disorder (ASD) to cases with ADHD but not ASD. Second, we show that cases with ASD and ADHD, substance use disorder and ADHD, or first diagnosed with ADHD in adulthood have unique polygenic score (PGS) profiles that distinguish them from complementary case subgroups and controls. Finally, a PGS for an ASD diagnosis in ADHD cases predicted cognitive performance in an independent developmental cohort. Our approach uncovered evidence of genetic heterogeneity in ADHD, helping us to understand its etiology and providing a model for studies of other disorders.


Subjects
Attention Deficit Disorder with Hyperactivity, Autism Spectrum Disorder, Humans, Autism Spectrum Disorder/genetics, Attention Deficit Disorder with Hyperactivity/genetics, Multifactorial Inheritance/genetics
8.
Genet Epidemiol ; 36(6): 642-51, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22807252

ABSTRACT

New sequencing technologies provide an opportunity for assessing the impact of rare and common variants on complex diseases. Several methods have been developed for evaluating rare variants, many of which use weighted collapsing to combine rare variants. Some approaches require arbitrary frequency thresholds below which to collapse alleles, and most assume that effect sizes for each collapsed variant are either the same or a function of minor allele frequency. Some methods also further assume that all rare variants are deleterious rather than protective. We expect that such assumptions will not hold in general, and as a result performance of these tests will be adversely affected. We propose a hierarchical model, implemented in the new program CHARM, to detect the joint signal from rare and common variants within a genomic region while properly accounting for linkage disequilibrium between variants. Our model explores the scale, rather than the center of the odds ratio distribution, allowing for both causative and protective effects. We use cross-validation to assess the evidence for association in a region. We use model averaging to widen the range of disease models under which we will have good power. To assess this approach, we simulate data under a range of disease models with effects at common and/or rare variants. Overall, our method had more power than other well-known rare variant approaches; it performed well when either only rare, or only common variants were causal, and better than other approaches when both common and rare variants contributed to disease.


Subjects
Genetic Variation, Genetic Models, Rare Diseases/genetics, Human Chromosomes Pair 17, Gene Frequency, Humans, Likelihood Functions, Linkage Disequilibrium, Logistic Models, Single Nucleotide Polymorphism, Reproducibility of Results
9.
J Hum Genet ; 58(6): 353-61, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23677058

ABSTRACT

Acute myeloid leukemia (AML) is a clinically heterogeneous disease, with a 5-year disease-free survival (DFS) ranging from under 10% to over 70% for distinct groups of patients. At our institution, cytarabine, etoposide and busulfan are used in first or second remission patients treated with a two-step approach to autologous stem cell transplantation (ASCT). In this study, we tested the hypothesis that polymorphisms in the pharmacokinetic and pharmacodynamic pathway genes of these drugs are associated with DFS in AML patients. A total of 1659 variants in 42 genes were analyzed for their association with DFS using a Cox proportional hazards model. One hundred and fifty-four genetically European patients were used for the primary analysis. An intronic single nucleotide polymorphism (SNP) in ABCC3 (rs4148405) was associated with a significantly shorter DFS (hazard ratio (HR)=3.2, P=5.6 × 10(-6)) in our primary cohort. In addition, a SNP in the GSTM1-GSTM5 locus, rs3754446, was significantly associated with a shorter DFS in all patients (HR=1.8, P=0.001 for 154 European ancestry patients; HR=1.7, P=0.028 for 125 non-European patients). Thus, for the first time, genetic variants in drug pathway genes are shown to be associated with DFS in AML patients treated with chemotherapy-based ASCT.


Subjects
Acute Myeloid Leukemia/drug therapy, Acute Myeloid Leukemia/genetics, Single Nucleotide Polymorphism, Adolescent, Adult, Aged, Busulfan/therapeutic use, Chromosome Mapping, Cytarabine/therapeutic use, DNA/genetics, Disease-Free Survival, Etoposide/therapeutic use, Female, Follow-Up Studies, Genetic Loci, Genotype, Glutathione Transferase/genetics, Hematopoietic Stem Cell Transplantation, Humans, Male, Membrane Transport Proteins/genetics, Middle Aged, Phenotype, Proportional Hazards Models, Remission Induction, Young Adult
10.
PLOS Digit Health ; 2(3): e0000197, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36913425

ABSTRACT

A picture description task is a component of Miro Health's platform for self-administration of neurobehavioral assessments. Picture description has been used as a screening tool for identification of individuals with Alzheimer's disease and mild cognitive impairment (MCI), but currently requires in-person administration and scoring by someone with access to and familiarity with a scoring rubric. The Miro Health implementation allows broader use of this assessment through self-administration and automated processing, analysis, and scoring to deliver clinically useful quantifications of the users' speech production, vocal characteristics, and language. Picture description responses were collected from 62 healthy controls (HC), and 33 participants with MCI: 18 with amnestic MCI (aMCI) and 15 with non-amnestic MCI (naMCI). Speech and language features and contrasts between pairs of features were evaluated for differences in their distributions in the participant subgroups. Picture description features were selected and combined using penalized logistic regression to form risk scores for classification of HC versus MCI as well as HC versus specific MCI subtypes. A picture-description based risk score distinguishes MCI and HC with an area under the receiver operator curve (AUROC) of 0.74. When contrasting specific subtypes of MCI and HC, the classifiers have an AUROC of 0.88 for aMCI versus HC and an AUROC of 0.61 for naMCI versus HC. Tests of association of individual features or contrasts of pairs of features with HC versus aMCI identified 20 features with p-values below 5e-3 and False Discovery Rates (FDRs) at or below 0.113, and 61 contrasts with p-values below 5e-4 and FDRs at or below 0.132. Findings suggest that performance of picture description as a screening tool for MCI detection will vary greatly by MCI subtype or by the proportion of various subtypes in an undifferentiated MCI population.

11.
Cell Genom ; 3(12): 100454, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38116123

ABSTRACT

Relating genetic variants to behavior remains a fundamental challenge. To assess the utility of DNA methylation marks in discovering causative variants, we examined their relationship to genetic variation by generating single-nucleus methylomes from the hippocampus of eight inbred mouse strains. At CpG sequence densities under 40 CpG/Kb, cells compensate for loss of methylated sites by methylating additional sites to maintain methylation levels. At higher CpG sequence densities, the exact location of a methylated site becomes more important, suggesting that variants affecting methylation will have a greater effect when occurring in higher CpG densities than in lower. We found this to be true for a variant's effect on transcript abundance, indicating that candidate variants can be prioritized based on CpG sequence density. Our findings imply that DNA methylation influences the likelihood that mutations occur at specific sites in the genome, supporting the view that the distribution of mutations is not random.

12.
Nat Med ; 29(7): 1845-1856, 2023 07.
Article in English | MEDLINE | ID: mdl-37464048

ABSTRACT

An individual's disease risk is affected by the populations that they belong to, due to shared genetics and environmental factors. The study of fine-scale populations in clinical care is important for identifying and reducing health disparities and for developing personalized interventions. To assess patterns of clinical diagnoses and healthcare utilization by fine-scale populations, we leveraged genetic data and electronic medical records from 35,968 patients as part of the UCLA ATLAS Community Health Initiative. We defined clusters of individuals using identity by descent, a form of genetic relatedness that utilizes shared genomic segments arising due to a common ancestor. In total, we identified 376 clusters, including clusters with patients of Afro-Caribbean, Puerto Rican, Lebanese Christian, Iranian Jewish and Gujarati ancestry. Our analysis uncovered 1,218 significant associations between disease diagnoses and clusters and 124 significant associations with specialty visits. We also examined the distribution of pathogenic alleles and found 189 significant alleles at elevated frequency in particular clusters, including many that are not regularly included in population screening efforts. Overall, this work progresses the understanding of health in understudied communities and can provide the foundation for further study into health inequities.


Subjects
Delivery of Health Care, Patient Acceptance of Health Care, Humans, Los Angeles, Iran, Ethnicity
13.
Nat Commun ; 12(1): 2717, 2021 05 11.
Article in English | MEDLINE | ID: mdl-33976150

ABSTRACT

Circulating cell-free DNA (cfDNA) in the bloodstream originates from dying cells and is a promising noninvasive biomarker for cell death. Here, we propose an algorithm, CelFiE, to accurately estimate the relative abundances of cell types and tissues contributing to cfDNA from epigenetic cfDNA sequencing. In contrast to previous work, CelFiE accommodates low coverage data, does not require CpG site curation, and estimates contributions from multiple unknown cell types that are not available in external reference data. In simulations, CelFiE accurately estimates known and unknown cell type proportions from low coverage and noisy cfDNA mixtures, including from cell types composing less than 1% of the total mixture. When used in two clinically-relevant situations, CelFiE correctly estimates a large placenta component in pregnant women, and an elevated skeletal muscle component in amyotrophic lateral sclerosis (ALS) patients, consistent with the occurrence of muscle wasting typical in these patients. Together, these results show how CelFiE could be a useful tool for biomarker discovery and monitoring the progression of degenerative disease.


Subjects
Algorithms, Amyotrophic Lateral Sclerosis/genetics, Cell-Free Nucleic Acids/genetics, DNA Methylation, Genetic Epigenesis, Adult, Amyotrophic Lateral Sclerosis/blood, Amyotrophic Lateral Sclerosis/immunology, Amyotrophic Lateral Sclerosis/pathology, B-Lymphocytes/immunology, B-Lymphocytes/metabolism, Biomarkers/blood, Case-Control Studies, Cell-Free Nucleic Acids/blood, Cell-Free Nucleic Acids/classification, Female, Humans, Macrophages/immunology, Macrophages/metabolism, Male, Monocytes/immunology, Monocytes/metabolism, Skeletal Muscle/immunology, Skeletal Muscle/metabolism, Skeletal Muscle/pathology, Neutrophils/immunology, Neutrophils/metabolism, Organ Specificity, Pregnancy, Pregnancy Trimesters/blood, Pregnancy Trimesters/genetics, T-Lymphocytes/immunology, T-Lymphocytes/metabolism
14.
Cancer Res ; 81(7): 1695-1703, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33293427

ABSTRACT

To identify rare variants associated with prostate cancer susceptibility and better characterize the mechanisms and cumulative disease risk associated with common risk variants, we conducted an integrated study of prostate cancer genetic etiology in two cohorts using custom genotyping microarrays, large imputation reference panels, and functional annotation approaches. Specifically, 11,984 men (6,196 prostate cancer cases and 5,788 controls) of European ancestry from Northern California Kaiser Permanente were genotyped and meta-analyzed with 196,269 men of European ancestry (7,917 prostate cancer cases and 188,352 controls) from the UK Biobank. Three novel loci, including two rare variants (European ancestry minor allele frequency < 0.01, at 3p21.31 and 8p12), were significant genome wide in a meta-analysis. Gene-based rare variant tests implicated a known prostate cancer gene (HOXB13), as well as a novel candidate gene (ILDR1), which encodes a receptor highly expressed in prostate tissue and is related to the B7/CD28 family of T-cell immune checkpoint markers. Haplotypic patterns of long-range linkage disequilibrium were observed for rare genetic variants at HOXB13 and other loci, reflecting their evolutionary history. In addition, a polygenic risk score (PRS) of 188 prostate cancer variants was strongly associated with risk (90th vs. 40th-60th percentile OR = 2.62, P = 2.55 × 10(-191)). Many of the 188 variants exhibited functional signatures of gene expression regulation or transcription factor binding, including a 6-fold difference in log-probability of androgen receptor binding at the variant rs2680708 (17q22). Rare variant and PRS associations, with concomitant functional interpretation of risk mechanisms, can help clarify the full genetic architecture of prostate cancer and other complex traits.
SIGNIFICANCE: This study maps the biological relationships between diverse risk factors for prostate cancer, integrating different functional datasets to interpret and model genome-wide data from over 200,000 men with and without prostate cancer. See related commentary by Lachance, p. 1637.


Subjects
Multifactorial Inheritance, Prostatic Neoplasms, Genetic Predisposition to Disease, Genome-Wide Association Study, Genomics, Humans, Male, Single Nucleotide Polymorphism, Prostatic Neoplasms/genetics
15.
BMC Genomics ; 11: 482, 2010 Aug 23.
Article in English | MEDLINE | ID: mdl-20731868

ABSTRACT

BACKGROUND: Two-way hierarchical clustering, with results visualized as heatmaps, has served as the method of choice for exploring structure in large matrices of expression data since the advent of microarrays. While it has delivered important insights, including a typology of breast cancer subtypes, it suffers from instability in the face of gene or sample selection, and an inability to detect small sets that may be dominated by larger sets such as the estrogen-related genes in breast cancer. The rank-based partitioning algorithm introduced in this paper addresses several of these limitations. It delivers results comparable to two-way hierarchical clustering, and much more. Applied systematically across a range of parameter settings, it enumerates all the partition-inducing gene sets in a matrix of expression values. RESULTS: Applied to four large breast cancer datasets, this alternative exploratory method detects more than thirty sets of co-regulated genes, many of which are conserved across experiments and across platforms. Many of these sets are readily identified in biological terms, e.g., "estrogen", "erbb2", and 8p11-12, and several are clinically significant as prognostic of either increased survival ("adipose", "stromal"...) or diminished survival ("proliferation", "immune/interferon", "histone",...). Of special interest are the sets that effectively factor "immune response" and "stromal signalling". CONCLUSION: The gene sets induced by the enumeration include many of the sets reported in the literature. In this regard these inventories confirm and consolidate findings from microarray-based work on breast cancer over the last decade. But, the enumerations also identify gene sets that have not been studied as of yet, some of which are prognostic of survival. The sets induced are robust, biologically meaningful, and serve to reveal a finer structure in existing breast cancer microarrays.


Subjects
Breast Neoplasms/genetics, Genetic Databases, Neoplasm Genes/genetics, Breast Neoplasms/immunology, Breast Neoplasms/pathology, Cluster Analysis, Cohort Studies, Conserved Sequence/genetics, Female, Neoplastic Gene Expression Regulation, Humans, Oligonucleotide Array Sequence Analysis, ErbB-2 Receptor/genetics, ErbB-2 Receptor/metabolism, Stromal Cells/metabolism, Stromal Cells/pathology, Survival Analysis, Sweden
17.
J Comput Biol ; 27(4): 599-612, 2020 04.
Article in English | MEDLINE | ID: mdl-32077750

ABSTRACT

Large-scale cohorts with combined genetic and phenotypic data, coupled with methodological advances, have produced increasingly accurate genetic predictors of complex human phenotypes called polygenic risk scores (PRSs). In addition to the potential translational impacts of identifying at-risk individuals, PRS are being utilized for a growing list of scientific applications, including causal inference, identifying pleiotropy and genetic correlation, and powerful gene-based and mixed-model association tests. Existing PRS approaches rely on external large-scale genetic cohorts that have also measured the phenotype of interest. They further require matching on ancestry and genotyping platform or imputation quality. In this work, we present a novel reference-free method to produce a PRS that does not rely on an external cohort. We show that naive implementations of reference-free PRS either result in substantial overfitting or prohibitive increases in computational time. We show that our algorithm avoids both of these issues and can produce informative in-sample PRSs over a single cohort without overfitting. We then demonstrate several novel applications of reference-free PRSs, including detection of pleiotropy across 246 metabolic traits and efficient mixed-model association testing.


Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study/statistics & numerical data, Multifactorial Inheritance/genetics, Humans, Linear Models, Phenotype, Risk Factors
18.
Genome Biol ; 21(1): 211, 2020 08 24.
Article in English | MEDLINE | ID: mdl-32831138

ABSTRACT

The observation that disease-associated genetic variants typically reside outside of exons has inspired widespread investigation into the genetic basis of transcriptional regulation. While associations between the mRNA abundance of a gene and its proximal SNPs (cis-eQTLs) are now readily identified, identification of high-quality distal associations (trans-eQTLs) has been limited by a heavy multiple testing burden and the proneness to false-positive signals. To address these issues, we develop GBAT, a powerful gene-based pipeline that allows robust detection of high-quality trans-gene regulation signal.


Subjects
Gene Expression Regulation, Genetic Testing/methods, Genome-Wide Association Study, Gene Expression Profiling, Gene Regulatory Networks, Genotype, Humans, Single Nucleotide Polymorphism, Messenger RNA
19.
Genetics ; 211(4): 1179-1189, 2019 04.
Article in English | MEDLINE | ID: mdl-30692194

ABSTRACT

High-throughput measurements of molecular phenotypes provide an unprecedented opportunity to model cellular processes and their impact on disease. These highly structured datasets are usually strongly confounded, creating false positives and reducing power. This has motivated many approaches based on principal components analysis (PCA) to estimate and correct for confounders, which have become indispensable elements of association tests between molecular phenotypes and both genetic and nongenetic factors. Here, we show that these correction approaches induce a bias, and that it persists for large sample sizes and replicates out-of-sample. We prove this theoretically for PCA by deriving an analytic, deterministic, and intuitive bias approximation. We assess other methods with realistic simulations, which show that perturbing any of several basic parameters can cause false positive rate (FPR) inflation. Our experiments show the bias depends on covariate and confounder sparsity, effect sizes, and their correlation. Surprisingly, when the covariate and confounder have [Formula: see text], standard two-step methods all have [Formula: see text]-fold FPR inflation. Our analysis informs best practices for confounder correction in genomic studies, and suggests many false discoveries have been made and replicated in some differential expression analyses.


Subjects
Genome-Wide Association Study/methods, Phenotype, Principal Component Analysis/methods, Animals, Genome-Wide Association Study/standards, Humans, Genetic Models, Principal Component Analysis/standards, Quantitative Trait Loci, Reproducibility of Results
20.
Nat Commun ; 10(1): 4788, 2019 10 21.
Article in English | MEDLINE | ID: mdl-31636271

ABSTRACT

Genetic studies of metabolites have identified thousands of variants, many of which are associated with downstream metabolic and obesogenic disorders. However, these studies have relied on univariate analyses, reducing power and limiting context-specific understanding. Here we aim to provide an integrated perspective of the genetic basis of metabolites by leveraging the Finnish Metabolic Syndrome In Men (METSIM) cohort, a unique genetic resource which contains metabolic measurements, mostly lipids, across distinct time points as well as information on statin usage. We increase effective sample size by an average of two-fold by applying the Covariates for Multi-phenotype Studies (CMS) approach, identifying 588 significant SNP-metabolite associations, including 228 new associations. Our analysis pinpoints a small number of master metabolic regulator genes, balancing the relative proportion of dozens of metabolite levels. We further identify associations to changes in metabolic levels across time as well as genetic interactions with statin at both the master metabolic regulator and genome-wide level.


Subjects
Genetic Pleiotropy, Metabolic Syndrome/genetics, Metabolome/genetics, Aged, Amino Acids/genetics, Amino Acids/metabolism, Cohort Studies, Fatty Acids/genetics, Fatty Acids/metabolism, Gene Regulatory Networks, Genome-Wide Association Study, Humans, HDL Lipoproteins/genetics, HDL Lipoproteins/metabolism, IDL Lipoproteins/genetics, IDL Lipoproteins/metabolism, LDL Lipoproteins/genetics, LDL Lipoproteins/metabolism, VLDL Lipoproteins/genetics, VLDL Lipoproteins/metabolism, Magnetic Resonance Spectroscopy, Male, Middle Aged, Single Nucleotide Polymorphism