Results 1 - 20 of 40
1.
bioRxiv ; 2024 May 30.
Article in English | MEDLINE | ID: mdl-38854009

ABSTRACT

Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures, suggesting that specific features of the ARG may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ancestral recombination graph (ARG). Here we examine the performance in simulation of six ARG estimation methods: ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle/ASMC-clust, and SINGER, using their estimated coalescent trees and examining bias, mean squared error (MSE), confidence interval coverage, and Type I and II error rates of the downstream methods. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust used samples ten times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust are of greatest importance when the recent past is of interest; further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.
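
The benchmark criteria named in this abstract (bias, MSE, and confidence interval coverage) can be illustrated with a minimal sketch. This is not the authors' benchmarking code: the function name `summarize_estimates` and the replicate values are hypothetical, and the sketch assumes each simulation replicate yields a point estimate and an interval for a single known true value.

```python
from statistics import mean

def summarize_estimates(estimates, lowers, uppers, truth):
    """Return (bias, MSE, CI coverage) for replicate estimates of one true value."""
    errors = [est - truth for est in estimates]
    bias = mean(errors)                      # average signed error
    mse = mean(e * e for e in errors)        # average squared error
    covered = [lo <= truth <= hi for lo, hi in zip(lowers, uppers)]
    coverage = sum(covered) / len(covered)   # fraction of intervals containing the truth
    return bias, mse, coverage

# Hypothetical replicate estimates of a quantity whose true value is 1.0,
# each with a symmetric interval of half-width 0.25 (illustrative numbers only).
estimates = [0.9, 1.1, 1.3, 0.7]
half_width = 0.25
lowers = [e - half_width for e in estimates]
uppers = [e + half_width for e in estimates]

bias, mse, coverage = summarize_estimates(estimates, lowers, uppers, truth=1.0)
print(bias, mse, coverage)
```

An estimator can be nearly unbiased, as in this toy example, and still have intervals that cover the truth too rarely, which is why the criteria are reported separately.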

2.
bioRxiv ; 2024 May 30.
Article in English | MEDLINE | ID: mdl-38854110

ABSTRACT

With advances in sequencing technology, forensic workers can access genetic information from increasingly challenging samples. A recently published computational approach, IBDGem, analyzes sequencing reads, including from low-coverage samples, in order to arrive at likelihood ratios for tests of identity. Here, we show that likelihood ratios produced by IBDGem test a null hypothesis different from the traditional one used in a forensic genetics context. In particular, rather than testing the hypothesis that the sample comes from a person unrelated to the person of interest, IBDGem tests the hypothesis that the sample comes from an individual who is included in the reference sample used to run the method. This null hypothesis is not generally of forensic interest, because the defense hypothesis is not that the evidence comes from an individual included in a reference panel. Further, it does not take into account genetic variation outside the reference panel, and as a result, the computed likelihood ratios can be much larger than likelihood ratios computed for the standard forensic null hypothesis, often by many orders of magnitude, thus potentially creating an impression of stronger evidence for identity than is warranted. We lay out this result and illustrate it with examples, giving suggestions for directions that might lead to likelihood ratios that have the traditional interpretation.

3.
bioRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496530

ABSTRACT

In both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci or other aspects of the phenotype or environment and a focal trait. In these two fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we derive a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., Genome-Wide Association Studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop intuition for why and when spurious correlations may occur using analytical theory and conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique, namely including both the genetic relatedness matrix (GRM) and its leading eigenvectors (which correspond to the principal components of the genotype matrix) in a regression model, can mitigate spurious correlations in phylogenetic analyses. As a case study of this, we re-examine an analysis testing for co-evolution of expression levels between genes across a fungal phylogeny, and show that including covariance matrix eigenvectors as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
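
The GRM and its leading eigenvector, as mentioned in this abstract, can be sketched on a toy genotype matrix. This is an illustrative simplification rather than the authors' pipeline: the function names and the toy data are hypothetical, variants are standardized by their empirical variance, and the leading eigenvector is found by plain power iteration instead of a full eigendecomposition.

```python
import math

def standardize_columns(genotypes):
    """Center and scale each variant (column); drop variants with no variation."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n
        if var > 0:
            sd = math.sqrt(var)
            columns.append([(x - mu) / sd for x in col])
    return columns

def genetic_relatedness_matrix(genotypes):
    """GRM[i][k] = average over variants of the product of standardized genotypes."""
    cols = standardize_columns(genotypes)
    n, m = len(genotypes), len(cols)
    return [[sum(c[i] * c[k] for c in cols) / m for k in range(n)] for i in range(n)]

def leading_eigenvector(matrix, iterations=200):
    """Approximate the top eigenvector of a symmetric matrix by power iteration."""
    n = len(matrix)
    vec = [1.0 + i for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical toy data: rows are individuals, columns are variants (0/1/2 copies).
# The first two individuals and the last two form two genetically distinct groups.
genotypes = [
    [2, 2, 0, 0, 1],
    [2, 2, 0, 0, 0],
    [0, 0, 2, 2, 1],
    [0, 0, 2, 2, 0],
]
grm = genetic_relatedness_matrix(genotypes)
pc1 = leading_eigenvector(grm)
print(pc1)
```

In this toy case the leading eigenvector takes opposite signs in the two groups, which is the kind of structure that entering it as a covariate in a regression is meant to absorb.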

4.
Proc Natl Acad Sci U S A ; 121(12): e2319496121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38470926

ABSTRACT

Without the ability to control or randomize environments (or genotypes), it is difficult to determine the degree to which observed phenotypic differences between two groups of individuals are due to genetic vs. environmental differences. However, some have suggested that these concerns may be limited to pathological cases, and methods have appeared that seem to give, directly or indirectly, some support to claims that aggregate heritable variation within groups can be related to heritable variation among groups. We consider three families of approaches: the "between-group heritability" sometimes invoked in behavior genetics, the statistic PST used in empirical work in evolutionary quantitative genetics, and methods based on variation in ancestry in an admixed population, used in anthropological and statistical genetics. We take up these examples to show mathematically that aggregate within-group genetic and phenotypic information cannot separate among-group differences into genetic and environmental components, and we provide simulation results that support our claims. We discuss these results in terms of the long-running debate on this topic.


Subjects
Biological Evolution; Genetics, Population; Humans; Phenotype; Genotype; Computer Simulation; Genetic Variation

5.
bioRxiv ; 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-37986815

ABSTRACT

Without the ability to control or randomize environments (or genotypes), it is difficult to determine the degree to which observed phenotypic differences between two groups of individuals are due to genetic vs. environmental differences. However, some have suggested that these concerns may be limited to pathological cases, and methods have appeared that seem to give, directly or indirectly, some support to claims that aggregate heritable variation within groups can be related to heritable variation among groups. We consider three families of approaches: the "between-group heritability" sometimes invoked in behavior genetics, the statistic PST used in empirical work in evolutionary quantitative genetics, and methods based on variation in ancestry in an admixed population, used in anthropological and statistical genetics. We take up these examples to show mathematically that aggregate within-group genetic and phenotypic information cannot separate among-group differences into genetic and environmental components, and we provide simulation results that support our claims. We discuss these results in terms of the long-running debate on this topic.

6.
Am J Hum Genet ; 110(12): 2077-2091, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38065072

ABSTRACT

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.


Subjects
Genetics, Population; Genome-Wide Association Study; Quantitative Trait Loci; Humans; Chromosome Mapping/methods; Models, Genetic; Phenotype; Quantitative Trait Loci/genetics; Native Hawaiian or Other Pacific Islander/genetics

7.
bioRxiv ; 2023 Oct 13.
Article in English | MEDLINE | ID: mdl-37873208

ABSTRACT

The demographic history of a population drives the pattern of genetic variation and is encoded in the gene-genealogical trees of the sampled alleles. However, existing methods to infer demographic history from genetic data tend to use relatively low-dimensional summaries of the genealogy, such as allele frequency spectra. As a step toward capturing more of the information encoded in the genome-wide sequence of genealogical trees, here we propose a novel framework called the genealogical likelihood (gLike), which derives the full likelihood of a genealogical tree under any hypothesized demographic history. Employing a graph-based structure, gLike summarizes across independent trees the relationships among all lineages in a tree with all possible trajectories of population memberships through time and efficiently computes the exact marginal probability under a parameterized demographic model. Through extensive simulations and empirical applications on populations that have experienced multiple admixtures, we showed that gLike can accurately estimate dozens of demographic parameters when the true genealogy is known, including ancestral population sizes, admixture timing, and admixture proportions. Moreover, when using genealogical trees inferred from genetic data, we showed that gLike outperformed conventional demographic inference methods that leverage only the allele-frequency spectrum and yielded parameter estimates that align with established historical knowledge of the demographic histories of populations like Latino Americans and Native Hawaiians. Furthermore, our framework can trace ancestral histories by analyzing a sample from the admixed population without proxies for its source populations, removing the need to sample ancestral populations that may no longer exist. Taken together, our proposed gLike framework harnesses underutilized genealogical information to offer exceptional sensitivity and accuracy in inferring complex demographies for humans and other species, particularly as estimation of genome-wide genealogies improves.

8.
iScience ; 26(10): 107992, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37841589

ABSTRACT

The 20 short tandem repeat (STR) loci of the combined DNA index system (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about the collection of CODIS genotypes is that the CODIS loci are thought to contain little information about ancestry or traits. However, in the past 20 years, a growing field has identified hundreds of thousands of genotype-trait associations. Here, we conduct a survey of the landscape of such associations surrounding the CODIS loci as compared with non-CODIS STRs. Although this study cannot establish or quantify associations between CODIS genotypes and phenotypes, we find that the regions around the CODIS loci are enriched for both known pathogenic variants (> 90th percentile) and for trait-associated SNPs identified in genome-wide association studies (GWAS) (≥ 95th percentile in 10kb and 100kb flanking regions), compared with other random sets of autosomal tetranucleotide-repeat STRs.

9.
Cell Genom ; 3(5): 100297, 2023 May 10.
Article in English | MEDLINE | ID: mdl-37228747

ABSTRACT

Sex differences in complex traits are suspected to be in part due to widespread gene-by-sex interactions (GxSex), but empirical evidence has been elusive. Here, we infer the mixture of ways in which polygenic effects on physiological traits covary between males and females. We find that GxSex is pervasive but acts primarily through systematic sex differences in the magnitude of many genetic effects ("amplification") rather than in the identity of causal variants. Amplification patterns account for sex differences in trait variance. In some cases, testosterone may mediate amplification. Finally, we develop a population-genetic test linking GxSex to contemporary natural selection and find evidence of sexually antagonistic selection on variants affecting testosterone levels. Our results suggest that amplification of polygenic effects is a common mode of GxSex that may contribute to sex differences and fuel their evolution.

10.
bioRxiv ; 2023 Apr 08.
Article in English | MEDLINE | ID: mdl-37066144

ABSTRACT

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide Association Studies (GWAS) are a powerful way to find genetic loci associated with phenotypes. GWAS are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix given the ARG (local eGRM). Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to identify a large-effect BMI locus, the CREBRF gene, in a sample of Native Hawaiians in which it was not previously detectable by GWAS because of a lack of population-specific imputation resources. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.

11.
bioRxiv ; 2023 Mar 09.
Article in English | MEDLINE | ID: mdl-36945578

ABSTRACT

The 20 short tandem repeat (STR) markers of the combined DNA index system (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about the collection of CODIS genotypes is that the CODIS markers are thought to contain information relevant to identification only (such as a human fingerprint would), with little information about ancestry or traits. However, in the past 20 years, a quickly growing field has identified hundreds of thousands of genotype-trait associations. Here we conduct a survey of the landscape of such associations surrounding the CODIS loci as compared with non-CODIS STRs. We find that the regions around the CODIS markers are enriched for both known pathogenic variants (>90th percentile) and for SNPs identified as trait-associated in genome-wide association studies (GWAS) (≥95th percentile in 10kb and 100kb flanking regions), compared with other random sets of autosomal tetranucleotide-repeat STRs. Although it is not obvious how much phenotypic information CODIS would need to convey to strain the "DNA fingerprint" analogy, the CODIS markers, considered as a set, are in regions unusually dense with variants with known phenotypic associations.

12.
Genetics ; 222(4), 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36218390

ABSTRACT

The 1997 film Gattaca has emerged as a canonical pop culture reference used to discuss modern controversies in genetics and bioethics. It appeared in theaters a few years prior to the announcement of the "completion" of the human genome (2000), as the science of human genetics was developing a renewed sense of its social implications. The story is set in a near-future world in which parents can, with technological assistance, influence the genetic composition of their offspring on the basis of predicted life outcomes. The current moment, 25 years after the film's release, offers an opportunity to reflect on where society currently stands with respect to the ideas explored in Gattaca. Here, we review and discuss several active areas of genetic research (genetic prediction, embryo selection, forensic genetics, and others) that interface directly with scenes and concepts in the film. On its silver anniversary, we argue that Gattaca remains an important reflection of society's expectations and fears with respect to the ways that genetic science has manifested in the real world. In accompanying supplemental material, we offer some thought questions to guide group discussions inside and outside of the classroom.

14.
Trends Genet ; 38(2): 113-115, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34740452

ABSTRACT

Advocates of transparency in science often point to the benefits of open practices for the scientific process. Here, we focus on a possibly underappreciated effect of standards for transparency: their influence on non-scientific decisions. As a case study, we consider the current state of probabilistic genotyping software in forensics.


Subjects
Forensic Genetics; Forensic Sciences; Humans

16.
Am J Phys Anthropol ; 175(2): 406-421, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33772750

ABSTRACT

OBJECTIVES: In genetic admixture processes, source groups for an admixed population possess distinct patterns of genotype and phenotype at the onset of admixture. Particularly in the context of recent and ongoing admixture, such differences are sometimes taken to serve as markers of ancestry for individuals; that is, phenotypes initially associated with the ancestral background in one source population are assumed to continue to reflect ancestry in that population. Such phenotypes might possess ongoing significance in social categorizations of individuals, owing in part to perceived continuing correlations with ancestry. However, genotypes or phenotypes initially associated with ancestry in one specific source population have been seen to decouple from overall admixture levels, so that they no longer serve as proxies for genetic ancestry. Here, we aim to develop an understanding of the joint dynamics of admixture levels and phenotype distributions in an admixed population. METHODS: We devise a mechanistic model, consisting of an admixture model, a quantitative trait model, and a mating model. We analyze the behavior of the mechanistic model in relation to the model parameters. RESULTS: We find that it is possible for the decoupling of genetic ancestry and phenotype to proceed quickly, and that it occurs faster if the phenotype is driven by fewer loci. Positive assortative mating attenuates the process of dissociation relative to a scenario in which mating is random with respect to genetic admixture and with respect to phenotype. CONCLUSIONS: The mechanistic framework suggests that in an admixed population, a trait that initially differed between source populations might serve as a reliable proxy for ancestry for only a short time, especially if the trait is determined by few loci. It follows that a social categorization based on such a trait is increasingly uninformative about genetic ancestry and about other traits that differed between source populations at the onset of admixture.


Subjects
Gene Frequency/genetics; Genetics, Population; Anthropology, Physical; Female; Gene Flow/genetics; Genome, Human/genetics; Genotype; Humans; Male; Phenotype; Skin Pigmentation/genetics

17.
Elife ; 9, 2020 Jan 07.
Article in English | MEDLINE | ID: mdl-31908268

ABSTRACT

Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people with genomes that share identical by state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.


Subjects
Data Management/statistics & numerical data; Databases, Genetic/standards; Genetic Privacy; Humans

19.
Evol Med Public Health ; 2019(1): 26-34, 2019.
Article in English | MEDLINE | ID: mdl-30838127

ABSTRACT

Recent analyses of polygenic scores have opened new discussions concerning the genetic basis and evolutionary significance of differences among populations in distributions of phenotypes. Here, we highlight limitations in research on polygenic scores, polygenic adaptation and population differences. We show how genetic contributions to traits, as estimated by polygenic scores, combine with environmental contributions so that differences among populations in trait distributions need not reflect corresponding differences in genetic propensity. Under a null model in which phenotypes are selectively neutral, genetic propensity differences contributing to phenotypic differences among populations are predicted to be small. We illustrate this null hypothesis in relation to health disparities between African Americans and European Americans, discussing alternative hypotheses with selective and environmental effects. Close attention to the limitations of research on polygenic phenomena is important for the interpretation of their relationship to human population differences.

20.
Genetics ; 211(1): 235-262, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30389808

ABSTRACT

Genome-wide association studies (GWAS) have revealed that many traits are highly polygenic, in that their within-population variance is governed, in part, by small-effect variants at many genetic loci. Standard population-genetic methods for inferring evolutionary history are ill-suited for polygenic traits: when there are many variants of small effect, signatures of natural selection are spread across the genome and are subtle at any one locus. In the last several years, various methods have emerged for detecting the action of natural selection on polygenic scores, sums of genotypes weighted by GWAS effect sizes. However, most existing methods do not reveal the timing or strength of selection. Here, we present a set of methods for estimating the historical time course of a population-mean polygenic score using local coalescent trees at GWAS loci. These time courses are estimated by using coalescent theory to relate the branch lengths of trees to allele-frequency change. The resulting time course can be tested for evidence of natural selection. We present theory and simulations supporting our procedures, as well as estimated time courses of polygenic scores for human height. Because of its grounding in coalescent theory, the framework presented here can be extended to a variety of demographic scenarios, and its usefulness will increase as both GWAS and ancestral-recombination-graph inference continue to progress.
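
The definition of a polygenic score given in this abstract (a sum of genotypes weighted by GWAS effect sizes) can be written out as a short sketch. The function names, effect sizes, and genotypes below are hypothetical illustrations, not values or code from the paper; the sketch assumes diploid genotypes coded as 0, 1, or 2 copies of the effect allele.

```python
def polygenic_score(genotype, effect_sizes):
    """PGS for one individual: sum over loci of allele count times effect size."""
    return sum(g * b for g, b in zip(genotype, effect_sizes))

def population_mean_score(allele_frequencies, effect_sizes):
    """Mean PGS in a diploid population: sum over loci of 2 * frequency * effect size."""
    return sum(2 * p * b for p, b in zip(allele_frequencies, effect_sizes))

# Hypothetical effect sizes for three loci and genotypes for three individuals.
effect_sizes = [0.5, -0.25, 1.0]
genotypes = [
    [2, 0, 1],
    [1, 1, 0],
    [0, 2, 1],
]

scores = [polygenic_score(g, effect_sizes) for g in genotypes]

# Sample allele frequency at each locus: allele copies divided by 2N chromosomes.
n = len(genotypes)
frequencies = [sum(row[j] for row in genotypes) / (2 * n) for j in range(len(effect_sizes))]

mean_from_scores = sum(scores) / n
mean_from_frequencies = population_mean_score(frequencies, effect_sizes)
print(scores, mean_from_scores, mean_from_frequencies)
```

Because the mean score depends on the data only through the allele frequencies, a time course of allele frequencies at the scored loci implies a time course for the population-mean score.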


Subjects
Body Height/genetics; Genome-Wide Association Study/methods; Models, Genetic; Multifactorial Inheritance; Evolution, Molecular; Gene Frequency; Genome-Wide Association Study/standards; Humans; Quantitative Trait Loci