Results 1 - 20 of 120
1.
Cell ; 185(16): 3041-3055.e25, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35917817

ABSTRACT

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.


Subject(s)
DNA Copy Number Variations; Genome, Human; DNA Copy Number Variations/genetics; Gene Dosage; Haploinsufficiency/genetics; Humans
2.
Am J Hum Genet ; 111(4): 654-667, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38471507

ABSTRACT

Allele-specific methylation (ASM) is an epigenetic modification whereby one parental allele becomes methylated and the other unmethylated at a specific locus. ASM is most often driven by the presence of nearby heterozygous variants that influence methylation, but also occurs somatically in the context of genomic imprinting. In this study, we investigate ASM using publicly available single-cell reduced representation bisulfite sequencing (scRRBS) data on 608 B cells sampled from six healthy B cell samples and 1,230 cells from 11 chronic lymphocytic leukemia (CLL) samples. We developed a likelihood-based criterion to test whether a CpG exhibited ASM, based on the distributions of methylated and unmethylated reads both within and across cells. Applying our likelihood ratio test, 65,998 CpG sites exhibited ASM in healthy B cell samples according to a Bonferroni criterion (p < 8.4 × 10⁻⁹), and 32,862 CpG sites exhibited ASM in CLL samples (p < 8.5 × 10⁻⁹). We also called ASM at the sample level. To evaluate the accuracy of our method, we called heterozygous variants from the scRRBS data, which enabled variant-based calls of ASM within each cell. Comparing sample-level ASM calls to the variant-based measures of ASM, we observed a positive predictive value of 76%-100% across samples. We observed high concordance of ASM across samples and an overrepresentation of ASM in previously reported imprinted genes and genes with imprinting binding motifs. Our study demonstrates that single-cell bisulfite sequencing is a potentially powerful tool to investigate ASM, especially as studies expand to increase the number of samples and cells sequenced.


Subject(s)
DNA Methylation; Leukemia, Lymphocytic, Chronic, B-Cell; Sulfites; Humans; DNA Methylation/genetics; Alleles; Leukemia, Lymphocytic, Chronic, B-Cell/genetics; Likelihood Functions; Genomic Imprinting/genetics; CpG Islands/genetics
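
The Bonferroni thresholds quoted in the abstract above (8.4 × 10⁻⁹ and 8.5 × 10⁻⁹) are the kind of value obtained by dividing a family-wise significance level by the number of tests. The following is a minimal sketch of that arithmetic, assuming a family-wise level of 0.05; the abstract reports neither the level nor the number of CpG sites tested, so both the constant `ALPHA` and the implied test counts are assumptions, and the function names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(alpha: float, threshold: float) -> int:
    """Number of tests implied by a reported Bonferroni threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return round(alpha / threshold)


# Assumption: a family-wise level of 0.05 (not stated in the abstract).
ALPHA = 0.05
healthy_tests = implied_number_of_tests(ALPHA, 8.4e-9)
cll_tests = implied_number_of_tests(ALPHA, 8.5e-9)
```

Under that assumption, both thresholds would correspond to slightly under six million tested CpG sites per analysis.
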
3.
Am J Hum Genet ; 111(7): 1448-1461, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38821058

ABSTRACT

Both trio and population designs are popular study designs for identifying risk genetic variants in genome-wide association studies (GWASs). The trio design, as a family-based design, is robust to confounding due to population structure, whereas the population design is often more powerful due to larger sample sizes. Here, we propose KnockoffHybrid, a knockoff-based statistical method for hybrid analysis of both the trio and population designs. KnockoffHybrid provides a unified framework that brings together the advantages of both designs and produces powerful hybrid analysis while controlling the false discovery rate (FDR) in the presence of linkage disequilibrium and population structure. Furthermore, KnockoffHybrid has the flexibility to leverage different types of summary statistics for hybrid analyses, including expression quantitative trait loci (eQTL) and GWAS summary statistics. We demonstrate in simulations that KnockoffHybrid offers power gains over non-hybrid methods for the trio and population designs with the same number of cases while controlling the FDR with complex correlation among variants and population structure among subjects. In hybrid analyses of three trio cohorts for autism spectrum disorders (ASDs) from the Autism Speaks MSSNG, Autism Sequencing Consortium, and Autism Genome Project with GWAS summary statistics from the iPSYCH project and eQTL summary statistics from the MetaBrain project, KnockoffHybrid outperforms conventional methods by replicating several known risk genes for ASDs and identifying additional associations with variants in other genes, including the PRAME family genes involved in axon guidance and which may act as common targets for human speech/language evolution and related disorders.


Subject(s)
Autism Spectrum Disorder; Genome-Wide Association Study; Linkage Disequilibrium; Quantitative Trait Loci; Genome-Wide Association Study/methods; Humans; Autism Spectrum Disorder/genetics; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide; Computer Simulation; Models, Genetic
4.
Am J Hum Genet ; 110(1): 23-29, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36480927

ABSTRACT

We present LDAK-GBAT, a tool for gene-based association testing using summary statistics from genome-wide association studies that is computationally efficient, produces well-calibrated p values, and is significantly more powerful than existing tools. LDAK-GBAT takes approximately 30 min to analyze imputed data (2.9M common, genic SNPs), requiring less than 10 Gb memory. It shows good control of type 1 error given an appropriate reference panel. Across 109 phenotypes (82 from the UK Biobank, 18 from the Million Veteran Program, and nine from the Psychiatric Genetics Consortium), LDAK-GBAT finds on average 19% (SE: 1%) more significant genes than the existing tool sumFREGAT-ACAT, with even greater gains in comparison with MAGMA, GCTA-fastBAT, sumFREGAT-SKAT-O, and sumFREGAT-PCA.


Subject(s)
Genetic Testing; Genome-Wide Association Study; Phenotype; Polymorphism, Single Nucleotide/genetics
5.
Am J Hum Genet ; 110(8): 1304-1318, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37433298

ABSTRACT

Multimorbidity is a rising public health challenge with important implications for health management and policy. The most common multimorbidity pattern is the combination of cardiometabolic and osteoarticular diseases. Here, we study the genetic underpinning of the comorbidity between type 2 diabetes and osteoarthritis. We find genome-wide genetic correlation between the two diseases and robust evidence for association-signal colocalization at 18 genomic regions. We integrate multi-omics and functional information to resolve the colocalizing signals and identify high-confidence effector genes, including FTO and IRX3, which provide proof-of-concept insights into the epidemiologic link between obesity and both diseases. We find enrichment for lipid metabolism and skeletal formation pathways for signals underpinning the knee and hip osteoarthritis comorbidities with type 2 diabetes, respectively. Causal inference analysis identifies complex effects of tissue-specific gene expression on comorbidity outcomes. Our findings provide insights into the biological basis for the type 2 diabetes-osteoarthritis disease co-occurrence.


Subject(s)
Diabetes Mellitus, Type 2; Osteoarthritis; Humans; Diabetes Mellitus, Type 2/complications; Diabetes Mellitus, Type 2/genetics; Comorbidity; Osteoarthritis/epidemiology; Osteoarthritis/genetics; Obesity/complications; Obesity/epidemiology; Obesity/genetics; Causality; Genome-Wide Association Study; Mendelian Randomization Analysis; Polymorphism, Single Nucleotide; Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics
6.
Trends Genet ; 38(10): 1013-1018, 2022 10.
Article in English | MEDLINE | ID: mdl-35581032

ABSTRACT

Some rare genetic disorders, such as retinitis pigmentosa or Alport syndrome, are caused by the co-inheritance of DNA variants at two different genetic loci (digenic inheritance). To capture the effects of these disease-causing variants and their possible interactive effects, various statistical methods have been developed in human genetics. Analogous developments have taken place in the field of machine learning, particularly for the field that is now called Big Data. In the past, these two areas have grown independently and have started to converge only in recent years. We discuss an overview of each of the two fields, paying special attention to machine learning methods for uncovering the combined effects of pairs of variants on human disease.


Subject(s)
Inheritance Patterns; Multifactorial Inheritance; Humans; Inheritance Patterns/genetics; Machine Learning; Mutation; Pedigree
7.
Am J Hum Genet ; 109(8): 1388-1404, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35931050

ABSTRACT

Transcriptome-wide association studies (TWASs) are a powerful approach to identify genes whose expression is associated with complex disease risk. However, non-causal genes can exhibit association signals due to confounding by linkage disequilibrium (LD) patterns and eQTL pleiotropy at genomic risk regions, which necessitates fine-mapping of TWAS signals. Here, we present MA-FOCUS, a multi-ancestry framework for the improved identification of genes underlying traits of interest. We demonstrate that by leveraging differences in ancestry-specific patterns of LD and eQTL signals, MA-FOCUS consistently outperforms single-ancestry fine-mapping approaches with equivalent total sample sizes across multiple metrics. We perform TWASs for 15 blood traits using genome-wide summary statistics (average n_EA = 511k, n_AA = 13k) and lymphoblastoid cell line eQTL data from cohorts of primarily European and African continental ancestries. We recapitulate evidence demonstrating shared genetic architectures for eQTL and blood traits between the two ancestry groups and observe that gene-level effects correlate 20% more strongly across ancestries than SNP-level effects. Lastly, we perform fine-mapping using MA-FOCUS and find evidence that genes at TWAS risk regions are more likely to be shared across ancestries than they are to be ancestry specific. Using multiple lines of evidence to validate our findings, we find that gene sets produced by MA-FOCUS are more enriched in hematopoietic categories than alternative approaches (p = 2.36 × 10⁻¹⁵). Our work demonstrates that including and appropriately accounting for genetic diversity can drive more profound insights into the genetic architecture of complex traits.


Subject(s)
Genome-Wide Association Study; Transcriptome; Humans; Linkage Disequilibrium; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics; Transcriptome/genetics
8.
Am J Hum Genet ; 109(7): 1286-1297, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35716666

ABSTRACT

Despite the growing number of genome-wide association studies (GWASs), it remains unclear to what extent gene-by-gene and gene-by-environment interactions influence complex traits in humans. The magnitude of genetic interactions in complex traits has been difficult to quantify because GWASs are generally underpowered to detect individual interactions of small effect. Here, we develop a method to test for genetic interactions that aggregates information across all trait-associated loci. Specifically, we test whether SNPs in regions of European ancestry shared between European American and admixed African American individuals have the same causal effect sizes. We hypothesize that in African Americans, the presence of genetic interactions will drive the causal effect sizes of SNPs in regions of European ancestry to be more similar to those of SNPs in regions of African ancestry. We apply our method to two traits: gene expression in 296 African Americans and 482 European Americans in the Multi-Ethnic Study of Atherosclerosis (MESA) and low-density lipoprotein cholesterol (LDL-C) in 74K African Americans and 296K European Americans in the Million Veteran Program (MVP). We find significant evidence for genetic interactions in our analysis of gene expression; for LDL-C, we observe a similar point estimate, although this is not significant, most likely due to lower statistical power. These results suggest that gene-by-gene or gene-by-environment interactions modify the effect sizes of causal variants in human complex traits.


Subject(s)
Genome-Wide Association Study; Multifactorial Inheritance; Cholesterol, LDL; Gene Expression; Humans; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics; White People/genetics
9.
Am J Hum Genet ; 109(9): 1638-1652, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055212

ABSTRACT

Hypoxia-inducible factor prolyl hydroxylase inhibitors (HIF-PHIs) are currently under clinical development for treating anemia in chronic kidney disease (CKD), but it is important to monitor their cardiovascular safety. Genetic variants can be used as predictors to help inform the potential risk of adverse effects associated with drug treatments. We therefore aimed to use human genetics to help assess the risk of adverse cardiovascular events associated with therapeutically altered EPO levels to help inform clinical trials studying the safety of HIF-PHIs. By performing a genome-wide association meta-analysis of EPO (n = 6,127), we identified a cis-EPO variant (rs1617640) lying in the EPO promoter region. We validated this variant as most likely causal in controlling EPO levels by using genetic and functional approaches, including single-base gene editing. Using this variant as a partial predictor for therapeutic modulation of EPO and large genome-wide association data in Mendelian randomization tests, we found no evidence (at p < 0.05) that genetically predicted long-term rises in endogenous EPO, equivalent to a 2.2-unit increase, increased risk of coronary artery disease (CAD, OR [95% CI] = 1.01 [0.93, 1.07]), myocardial infarction (MI, OR [95% CI] = 0.99 [0.87, 1.15]), or stroke (OR [95% CI] = 0.97 [0.87, 1.07]). We could exclude increased odds of 1.15 for cardiovascular disease for a 2.2-unit EPO increase. A combination of genetic and functional studies provides a powerful approach to investigate the potential therapeutic profile of EPO-increasing therapies for treating anemia in CKD.


Subject(s)
Anemia; Coronary Artery Disease; Myocardial Infarction; Renal Insufficiency, Chronic; Anemia/drug therapy; Anemia/genetics; Coronary Artery Disease/genetics; Genome-Wide Association Study; Humans; Mendelian Randomization Analysis; Myocardial Infarction/genetics; Renal Insufficiency, Chronic/genetics
10.
BMC Genomics ; 25(1): 375, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38627641

ABSTRACT

BACKGROUND: Approximately 95% of samples analyzed in univariate genome-wide association studies (GWAS) are of European ancestry. This bias toward European ancestry populations in association screening also exists for other analyses and methods that are often developed and tested on European ancestry only. However, existing data in non-European populations, which are often of modest sample size, could benefit from innovative approaches as recently illustrated in the context of polygenic risk scores. METHODS: Here, we extend and assess the potential limitations and gains of our multi-trait GWAS pipeline, JASS (Joint Analysis of Summary Statistics), for the analysis of non-European ancestries. To this end, we conducted the joint GWAS of 19 hematological traits and glycemic traits across five ancestries (European (EUR), admixed American (AMR), African (AFR), East Asian (EAS), and South-East Asian (SAS)). RESULTS: We detected 367 new genome-wide significant associations in non-European populations (15 in Admixed American (AMR), 72 in African (AFR) and 280 in East Asian (EAS)). New associations detected represent 5%, 17% and 13% of associations in the AFR, AMR and EAS populations, respectively. Overall, multi-trait testing increases the replication of European associated loci in non-European ancestry by 15%. Pleiotropic effects were highly similar at significant loci across ancestries (e.g. the mean correlation between multi-trait genetic effects of EUR and EAS ancestries was 0.88). For hematological traits, strong discrepancies in multi-trait genetic effects are tied to known evolutionary divergences: the ACKR1 locus, which is adaptive against P. vivax-induced malaria. CONCLUSIONS: Multi-trait GWAS can be a valuable tool to narrow the genetic knowledge gap between European and non-European populations.


Subject(s)
Asian People; Black People; Genome-Wide Association Study; Humans; Asian People/genetics; Black People/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Phenotype; Polymorphism, Single Nucleotide; European People/genetics
11.
Am J Hum Genet ; 108(11): 2099-2111, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34678161

ABSTRACT

The integration of genomic data into health systems offers opportunities to identify genomic factors underlying the continuum of rare and common disease. We applied a population-scale haplotype association approach based on identity-by-descent (IBD) in a large multi-ethnic biobank to a spectrum of disease outcomes derived from electronic health records (EHRs) and uncovered a risk locus for liver disease. We used genome sequencing and in silico approaches to fine-map the signal to a non-coding variant (c.2784-12T>C) in the gene ABCB4. In vitro analysis confirmed the variant disrupted splicing of the ABCB4 pre-mRNA. Four of five homozygotes had evidence of advanced liver disease, and there was a significant association with liver disease among heterozygotes, suggesting the variant is linked to increased risk of liver disease in an allele dose-dependent manner. Population-level screening revealed the variant to be at a carrier rate of 1.95% in Puerto Rican individuals, likely as the result of a Puerto Rican founder effect. This work demonstrates that integrating EHR and genomic data at a population scale can facilitate strategies for understanding the continuum of genomic risk for common diseases, particularly in populations underrepresented in genomic medicine.


Subject(s)
Delivery of Health Care/organization & administration; Genetic Predisposition to Disease; Liver Diseases/genetics; ATP Binding Cassette Transporter, Subfamily B/genetics; Electronic Health Records; Haplotypes; Heterozygote; Hispanic or Latino/genetics; Homozygote; Humans; Puerto Rico
12.
Biostatistics ; 2023 Oct 28.
Article in English | MEDLINE | ID: mdl-37897441

ABSTRACT

Microbiome scientists critically need modern tools to explore and analyze microbial evolution. Often this involves studying the evolution of microbial genomes as a whole. However, different genes in a single genome can be subject to different evolutionary pressures, which can result in distinct gene-level evolutionary histories. To address this challenge, we propose to treat estimated gene-level phylogenies as data objects, and present an interactive method for the analysis of a collection of gene phylogenies. We use a local linear approximation of phylogenetic tree space to visualize estimated gene trees as points in low-dimensional Euclidean space, and address important practical limitations of existing related approaches, allowing an intuitive visualization of complex data objects. We demonstrate the utility of our proposed approach through microbial data analyses, including by identifying outlying gene histories in strains of Prevotella, and by contrasting Streptococcus phylogenies estimated using different gene sets. Our method is available as an open-source R package, and assists with estimating, visualizing, and interacting with a collection of bacterial gene phylogenies.

13.
BMC Bioinformatics ; 24(1): 170, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37101120

ABSTRACT

BACKGROUND: Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or alleviated by using prior biological knowledge to favor some hypotheses over others. Here we compare these two methods in terms of their abilities to boost the power of hypothesis testing. RESULTS: We provide a quantitative estimate for progress in cohort sizes and present a theoretical analysis of the power of oracular hard priors: priors that select a subset of hypotheses for testing, with an oracular guarantee that all true positives are within the tested subset. This theory demonstrates that for GWAS, strong priors that limit testing to 100-1000 genes provide less power than typical annual 20-40% increases in cohort sizes. Furthermore, non-oracular priors that exclude even a small fraction of true positives from the tested set can perform worse than not using a prior at all. CONCLUSION: Our results provide a theoretical explanation for the continued dominance of simple, unbiased univariate hypothesis tests for GWAS: if a statistical question can be answered by larger cohort sizes, it should be answered by larger cohort sizes rather than by more complicated biased methods involving priors. We suggest that priors are better suited for non-statistical aspects of biology, such as pathway structure and causality, that are not yet easily captured by standard hypothesis tests.


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Humans; Population Density; Transcriptome
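
The trade-off described in the abstract above, fewer tested hypotheses versus a larger cohort, can be explored with a textbook power calculation. The following is a rough sketch rather than the authors' analysis: it assumes a two-sided z-test at a Bonferroni-corrected level and a non-centrality parameter that grows with the square root of the cohort size. The function names and all numeric inputs are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def z_test_power(ncp: float, n_tests: int, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test at a Bonferroni-corrected level.

    ncp is the non-centrality parameter (true effect divided by its
    standard error); n_tests is the number of hypotheses tested.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / (2 * n_tests))
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


def scaled_ncp(ncp: float, cohort_growth: float) -> float:
    """Non-centrality after the cohort grows by the given fraction.

    Assumption: the standard error shrinks as 1/sqrt(n), so the
    non-centrality scales with sqrt(1 + cohort_growth).
    """
    return ncp * (1 + cohort_growth) ** 0.5


# Hypothetical inputs: ncp of 4.0 and one million tests at baseline.
baseline = z_test_power(4.0, 1_000_000)
with_prior = z_test_power(4.0, 1_000)  # testing restricted by a prior
with_growth = z_test_power(scaled_ncp(4.0, 0.3), 1_000_000)  # 30% more samples
```

Which option gains more depends on the effect size and on how many years of cohort growth are compared, so the sketch is a way to explore the question, not a restatement of the paper's result.
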
14.
Rep Prog Phys ; 86(5)2023 04 04.
Article in English | MEDLINE | ID: mdl-36944245

ABSTRACT

This review is about statistical genetics, an interdisciplinary topic between statistical physics and population biology. The focus is on the phase of quasi-linkage equilibrium (QLE). Our goals here are to clarify under which conditions the QLE phase can be expected to hold in population biology and how the stability of the QLE phase is lost. The QLE state, which has many similarities to a thermal equilibrium state in statistical mechanics, was discovered by M Kimura for a two-locus two-allele model, and was extended and generalized to the global genome scale by Neher & Shraiman (2011). What we will refer to as the Kimura-Neher-Shraiman theory describes a population evolving due to the mutations, recombination, natural selection and possibly genetic drift. A QLE phase exists at sufficiently high recombination rate (r) and/or mutation rates µ with respect to selection strength. We show how in QLE it is possible to infer the epistatic parameters of the fitness function from the knowledge of the (dynamical) distribution of genotypes in a population. We further consider the breakdown of the QLE regime for high enough selection strength. We review recent results for the selection-mutation and selection-recombination dynamics. Finally, we identify and characterize a new phase which we call the non-random coexistence where variability persists in the population without either fixating or disappearing.


Subject(s)
Models, Genetic; Selection, Genetic; Linkage Disequilibrium; Mutation; Genotype; Genetics, Population
15.
Brief Bioinform ; 22(1): 515-525, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31982909

ABSTRACT

By reviewing previous CpG-related studies, we consider that the transcription regulation of about half of the human genes, mostly housekeeping (HK) genes, involves CpG islands (CGIs), their methylation states, CpG spacing and other chromosomal parameters. However, the precise CGI definition and positioning of CGIs within gene structures, as well as specific CGI-associated regulatory mechanisms, all remain to be explained at individual gene and gene-family levels, together with consideration of species and lineage specificity. Although previous studies have already classified CGIs into high-CpG (HCGI), intermediate-CpG (ICGI) and low-CpG (LCGI) densities based on CpG density variation, the correlation between CGI density and gene expression regulation, such as co-regulation of CGIs and TATA box on HK genes, remains to be elucidated. First, this study introduces such a problem-solving protocol for human-genome annotation, which is based on a combination of GTEx, JBLA and Gene Ontology (GO) analysis. Next, we discuss why CGI-associated genes are most likely regulated by HCGI and tend to be HK genes; the HCGI/TATA± and LCGI/TATA± combinations show different GO enrichment, whereas the ICGI/TATA± combination is less characteristic based on GO enrichment analysis. Finally, we demonstrate that the Hadoop MapReduce-based MR-JBLA algorithm is more efficient than the original JBLA in k-mer counting and CGI-associated gene analysis.


Subject(s)
CpG Islands; Genes, Essential; Molecular Sequence Annotation/methods; Software; DNA Methylation; Humans; TATA Box
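
The abstract above mentions k-mer counting and CpG density without giving details. The following is a minimal single-machine sketch of both ideas; it is not the JBLA or MR-JBLA algorithm, whose internals the abstract does not describe. The function names are illustrative, and the observed/expected CpG ratio used here is the commonly cited form (number of CG dinucleotides times sequence length, divided by the product of the C and G counts).

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (case-insensitive)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cpg_observed_expected(sequence: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, since the ratio
    is undefined in that case.
    """
    seq = sequence.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cg_count = count_kmers(seq, 2)["CG"]
    return cg_count * len(seq) / (c_count * g_count)
```

A MapReduce version would typically split the sequence into chunks, count k-mers per chunk in the map step and sum the counters in the reduce step, taking care to overlap chunk boundaries by k - 1 bases.
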
16.
Genet Med ; 25(3): 100355, 2023 03.
Article in English | MEDLINE | ID: mdl-36496179

ABSTRACT

PURPOSE: The congenital Long QT Syndrome (LQTS) and Brugada Syndrome (BrS) are Mendelian autosomal dominant diseases that frequently precipitate fatal cardiac arrhythmias. Incomplete penetrance is a barrier to clinical management of heterozygotes harboring variants in the major implicated disease genes KCNQ1, KCNH2, and SCN5A. We apply and evaluate a Bayesian penetrance estimation strategy that accounts for this phenomenon. METHODS: We generated Bayesian penetrance models for KCNQ1-LQT1 and SCN5A-LQT3 using variant-specific features and clinical data from the literature, international arrhythmia genetic centers, and population controls. We analyzed the distribution of posterior penetrance estimates across 4 genotype-phenotype relationships and compared continuous estimates with ClinVar annotations. Posterior estimates were mapped onto protein structure. RESULTS: Bayesian penetrance estimates of KCNQ1-LQT1 and SCN5A-LQT3 are empirically equivalent to 10 and 5 clinically phenotyped heterozygotes, respectively. Posterior penetrance estimates were bimodal for KCNQ1-LQT1 and KCNH2-LQT2, with a higher fraction of missense variants with high penetrance among KCNQ1 variants. There was a wide distribution of variant penetrance estimates among identical ClinVar categories. Structural mapping revealed heterogeneity among "hot spot" regions and featured high penetrance estimates for KCNQ1 variants in contact with calmodulin and the S6 domain. CONCLUSIONS: Bayesian penetrance estimates provide a continuous framework for variant interpretation.


Subject(s)
Channelopathies; KCNQ1 Potassium Channel; Humans; KCNQ1 Potassium Channel/genetics; Mutation; Penetrance; Bayes Theorem; Channelopathies/genetics; Arrhythmias, Cardiac/genetics
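
The statement above that a prior is "equivalent to 10 and 5 clinically phenotyped heterozygotes" can be read as a prior that carries the weight of that many observations. The following is a minimal sketch of that idea using a conjugate beta-binomial update. It is an illustration under stated assumptions, not the authors' model: the function name, the prior mean and the observed counts are all hypothetical.

```python
def posterior_penetrance(prior_mean: float, prior_weight: float,
                         affected: int, carriers: int):
    """Beta-binomial update of a variant's penetrance.

    prior_mean   -- prior penetrance estimate, between 0 and 1
    prior_weight -- number of heterozygotes the prior is "worth"
    affected     -- observed affected heterozygotes
    carriers     -- observed heterozygotes in total

    Returns (alpha, beta, posterior_mean) of the Beta posterior.
    """
    if not 0.0 < prior_mean < 1.0:
        raise ValueError("prior_mean must lie strictly between 0 and 1")
    if not 0 <= affected <= carriers:
        raise ValueError("affected must be between 0 and carriers")
    alpha = prior_mean * prior_weight + affected
    beta = (1.0 - prior_mean) * prior_weight + (carriers - affected)
    return alpha, beta, alpha / (alpha + beta)


# Hypothetical example: a prior worth 10 heterozygotes with mean 0.5,
# updated with 3 affected out of 4 observed carriers.
alpha, beta, mean = posterior_penetrance(0.5, 10, affected=3, carriers=4)
```

With few observed carriers the posterior stays close to the prior, and with many carriers the observed fraction dominates, which is one way a continuous estimate can remain usable for rare variants.
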
17.
Biom J ; 65(1): e2100309, 2023 01.
Article in English | MEDLINE | ID: mdl-35839474

ABSTRACT

False discovery rates are routinely controlled by application of the Benjamini-Hochberg step-up procedure to a set of p-values. A method is demonstrated for representing the values so obtained (the BH-FDRs) on a quantile-quantile (Q-Q) plot of the p-values transformed to the negative-logarithmic scale. Recognition of this connection between the BH-FDR and the Q-Q plot facilitates both understanding of the meaning of the BH-FDR and interpretation of the BH-FDR in a particular data set.
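
The two quantities named in the abstract above can be computed in a few lines. The following is a minimal sketch, assuming strictly positive p-values and using i/m as the expected quantile for the i-th smallest of m p-values; the paper may use a different plotting position, and the function names are illustrative. With this choice the unadjusted ratio m·p(i)/i equals 10 raised to the power (expected minus observed) on the negative-logarithmic scale, which is one way to see a link between the BH-FDR and a point's position relative to the Q-Q diagonal.

```python
import math


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def qq_points(p_values):
    """(expected, observed) coordinates on the -log10 scale.

    Assumption: the expected quantile of the i-th smallest p-value is i/m.
    """
    m = len(p_values)
    ranked = sorted(p_values)
    return [(-math.log10(rank / m), -math.log10(p))
            for rank, p in enumerate(ranked, start=1)]
```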

18.
BMC Genomics ; 23(1): 842, 2022 Dec 20.
Article in English | MEDLINE | ID: mdl-36539699

ABSTRACT

BACKGROUND: Organisms in the wild can acquire disease- and stress-resistance traits that outstrip the programs endogenous to humans. Finding the molecular basis of such natural resistance characters is a key goal of evolutionary genetics. Standard statistical-genetic methods toward this end can perform poorly in organismal systems that lack high rates of meiotic recombination, like Caenorhabditis worms. RESULTS: Here we discovered unique ER stress resistance in a wild Kenyan C. elegans isolate, which in inter-strain crosses was passed by hermaphrodite mothers to hybrid offspring. We developed an unbiased version of the reciprocal hemizygosity test, RH-seq, to explore the genetics of this parent-of-origin-dependent phenotype. Among top-scoring gene candidates from a partial-coverage RH-seq screen, we focused on the neuronally-expressed, cuticlin-like gene cutl-24 for validation. In gene-disruption and controlled crossing experiments, we found that cutl-24 was required in Kenyan hermaphrodite mothers for ER stress tolerance in their inter-strain hybrid offspring; cutl-24 was also a contributor to the trait in purebred backgrounds. CONCLUSIONS: These data establish the Kenyan strain allele of cutl-24 as a determinant of a natural stress-resistant state, and they set a precedent for the dissection of natural trait diversity in invertebrate animals without the need for a panel of meiotic recombinants.


Subject(s)
Caenorhabditis elegans Proteins; Caenorhabditis; Humans; Animals; Caenorhabditis elegans/genetics; Kenya; Phenotype; Caenorhabditis elegans Proteins/genetics
19.
Am J Hum Genet ; 104(6): 1025-1039, 2019 06 06.
Article in English | MEDLINE | ID: mdl-31056107

ABSTRACT

Genome-wide association studies (GWASs) are valuable for understanding human biology, but associated loci typically contain multiple associated variants and genes. Thus, algorithms that prioritize likely causal genes and variants for a given phenotype can provide biological interpretations of association data. However, a critical, currently missing capability is to objectively compare performance of such algorithms. Typical comparisons rely on "gold standard" genes harboring causal coding variants, but such gold standards may be biased and incomplete. To address this issue, we developed Benchmarker, an unbiased, data-driven benchmarking method that compares performance of similarity-based prioritization strategies to each other (and to random chance) by leave-one-chromosome-out cross-validation with stratified linkage disequilibrium (LD) score regression. We first applied Benchmarker to 20 well-powered GWASs and compared gene prioritization based on strategies employing three different data sources, including annotated gene sets and gene expression; genes prioritized based on gene sets had higher per-SNP heritability than those prioritized based on gene expression. Additionally, in a direct comparison of three methods, DEPICT and MAGMA outperformed NetWAS. We also evaluated combinations of methods; our results indicated that combining data sources and algorithms can help prioritize higher-quality genes for follow-up. Benchmarker provides an unbiased approach to evaluate any similarity-based method that provides genome-wide prioritization of genes, variants, or gene sets and can determine the best such method for any particular GWAS. Our method addresses an important unmet need for rigorous tool assessment and can assist in mapping genetic associations to causal function.


Subject(s)
Algorithms; Genetic Loci; Genome-Wide Association Study/methods; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Benchmarking; Chromosome Mapping; Humans; Phenotype
20.
Biostatistics ; 22(2): 365-380, 2021 04 10.
Article in English | MEDLINE | ID: mdl-31612223

ABSTRACT

The estimated accuracy of a classifier is a random quantity with variability. A common practice in supervised machine learning is thus to test if the estimated accuracy is significantly better than chance level. This method of signal detection is particularly popular in neuroimaging and genetics. We provide evidence that using a classifier's accuracy as a test statistic can be an underpowered strategy for finding differences between populations, compared to a bona fide statistical test. It is also computationally more demanding than a statistical test. Via simulation, we compare test statistics that are based on classification accuracy to others based on multivariate test statistics. We find that the probability of detecting differences between two distributions is lower for accuracy-based statistics. We examine several candidate causes for the low power of accuracy-tests. These causes include: the discrete nature of the accuracy-test statistic, the type of signal accuracy-tests are designed to detect, their inefficient use of the data, and their suboptimal regularization. When the purpose of the analysis is the evaluation of a particular classifier, not signal detection, we suggest several improvements to increase power. In particular, we suggest replacing V-fold cross-validation with the Leave-One-Out Bootstrap.


Subject(s)
Neuroimaging; Supervised Machine Learning; Computer Simulation; Humans; Probability
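
The "accuracy better than chance" test discussed in the abstract above is often carried out as a binomial test. The following is a minimal sketch of the exact one-sided version, assuming the test-set predictions are independent, which holds for a single held-out set but not across cross-validation folds; the function name is illustrative and this is not the authors' code. Because the number of correct predictions is an integer, the p-value can take at most n_total + 1 distinct values, which illustrates the discreteness the abstract lists among the candidate causes of low power.

```python
from math import comb


def accuracy_p_value(n_correct: int, n_total: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value for accuracy above chance level.

    Computes P(X >= n_correct) for X ~ Binomial(n_total, chance).
    """
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must be between 0 and n_total")
    return sum(
        comb(n_total, k) * chance ** k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )
```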