1.
Elife ; 13, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38913556

ABSTRACT

LD score regression (LDSC) is a method to estimate narrow-sense heritability from genome-wide association study (GWAS) summary statistics alone, making it a fast and popular approach. In this work, we present interaction-LD score (i-LDSC) regression: an extension of the original LDSC framework that accounts for interactions between genetic variants. By studying a wide range of generative models in simulations, and by re-analyzing 25 well-studied quantitative phenotypes from 349,468 individuals in the UK Biobank and up to 159,095 individuals in BioBank Japan, we show that the inclusion of a cis-interaction score (i.e., interactions between a focal variant and proximal variants) recovers genetic variance that is not captured by LDSC. For each of the 25 traits analyzed in the UK Biobank and BioBank Japan, i-LDSC detects additional variation contributed by genetic interactions. The i-LDSC software and its application to these biobanks represent a step towards resolving the contribution of non-additive genetic effects to complex trait variation.


Subjects
Genome-Wide Association Study; Genome-Wide Association Study/methods; Humans; Japan; United Kingdom; Polymorphism, Single Nucleotide/genetics; Models, Genetic; Phenotype; Genetic Variation; Multifactorial Inheritance/genetics; Biological Specimen Banks
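
To make the regression behind entry 1 concrete, here is a minimal sketch of the standard (additive) LD score regression, in which the expected chi-square statistic of SNP j is modelled as intercept + (N * h2 / M) * ld_score_j. It is not the i-LDSC software: the function names and the noise-free toy data are assumptions for illustration, and the weighting and block-jackknife steps used in practice are omitted. Going by the abstract, i-LDSC adds a cis-interaction score as a further regressor, which this sketch does not attempt.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ldsc_heritability(ld_scores, chi2, n_samples, n_snps):
    """Estimate h2 from the slope of chi-square statistics on LD scores.

    Hypothetical helper: unweighted, no jackknife standard errors.
    """
    intercept, slope = ols_fit(ld_scores, chi2)
    # slope = N * h2 / M, so h2 = slope * M / N
    return slope * n_snps / n_samples, intercept


# Toy, noise-free summary statistics generated from the model itself.
N, M, H2 = 50_000, 1_000, 0.4
ld_scores = [1.0 + 0.5 * (j % 40) for j in range(M)]
chi2 = [1.0 + N * H2 * score / M for score in ld_scores]

h2_hat, intercept_hat = ldsc_heritability(ld_scores, chi2, N, M)
print(round(h2_hat, 6), round(intercept_hat, 6))
```

Because the toy statistics are generated without noise, the fit returns the simulated heritability; with real summary statistics the estimate would carry sampling error.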
2.
bioRxiv ; 2024 May 10.
Article in English | MEDLINE | ID: mdl-38766004

ABSTRACT

Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for various genetic analyses. In this study, we first benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million diverse, research-consented 23andMe, Inc. customers and the UK Biobank (UKB). We find that both perform exceptionally well. Beagle's median switch error rate (SER) (after excluding single SNP switches) in white British trios from UKB is 0.026% compared to 0.00% for European ancestry 23andMe research participants; 55.6% of European ancestry 23andMe research participants have zero non-single SNP switches, compared to 42.4% of white British trios. South Asian ancestry 23andMe research participants have the highest median SER amongst the 23andMe populations, but it is still remarkably low at 0.46%. We also investigate the relationship between identity-by-descent (IBD) and SER, finding that switch errors tend to occur in regions of little or no IBD segment coverage. SHAPEIT and Beagle excel at 'intra-chromosomal' phasing, but lack the ability to phase across chromosomes, motivating us to develop an inter-chromosomal phasing method, called HAPTIC (HAPlotype TIling and Clustering), that assigns paternal and maternal variants discretely genome-wide. Our approach uses IBD segments to phase blocks of variants on different chromosomes. HAPTIC represents the segments a focal individual shares with their relatives as nodes in a signed graph and performs bipartite clustering on the signed graph using spectral clustering. We test HAPTIC on 1022 UKB trios, yielding a median phase error of 0.08% in regions covered by IBD segments (33.5% of sites). We also ran HAPTIC in the 23andMe database and found a median phase error rate (the rate of mismatching alleles between the inferred and true phase) of 0.92% in Europeans (93.8% of sites) and 0.09% in admixed Africans (92.7% of sites). HAPTIC's precision depends heavily on data from relatives, so it will increase as datasets grow larger and more diverse. HAPTIC enables analyses that require the parent-of-origin of variants, such as association studies and ancestry inference of untyped parents.
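
The signed-graph clustering step described in entry 2 can be illustrated with a generic sketch. This is not HAPTIC's implementation: the edge encoding (positive weight for segments believed to come from the same parent, negative for opposite parents), the function name and the toy graph are all assumptions. The sketch takes the signs of the leading eigenvector of the signed adjacency matrix, found by shifted power iteration, as the two-way split.

```python
def signed_bipartition(n_nodes, signed_edges, n_iter=200):
    """Split the nodes of a connected signed graph into two groups.

    signed_edges: list of (i, j, weight); weight > 0 means "same group",
    weight < 0 means "opposite groups" (hypothetical encoding).
    """
    adj = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j, w in signed_edges:
        adj[i][j] += w
        adj[j][i] += w

    # Shift by the largest absolute row sum so that no eigenvalue is
    # negative and power iteration converges to the top eigenvector.
    shift = max(sum(abs(w) for w in row) for row in adj)

    vec = [1.0] + [0.0] * (n_nodes - 1)  # node 0 fixes the orientation
    for _ in range(n_iter):
        new = [
            shift * vec[i] + sum(adj[i][j] * vec[j] for j in range(n_nodes))
            for i in range(n_nodes)
        ]
        norm = sum(x * x for x in new) ** 0.5
        vec = [x / norm for x in new]

    # The sign of each entry gives the group label; node 0 is in group 0.
    return [0 if x >= 0 else 1 for x in vec]


# Toy graph: nodes 0-2 are segments from one parent, 3-5 from the other.
edges = [
    (0, 1, 1.0), (1, 2, 1.0),                  # same-parent links
    (3, 4, 1.0), (4, 5, 1.0),                  # same-parent links
    (0, 3, -1.0), (1, 4, -1.0), (2, 5, -1.0),  # opposite-parent links
]
labels = signed_bipartition(6, edges)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The labels only separate the two parental groups; deciding which group is paternal and which is maternal needs extra information not modelled here.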

3.
Histopathology ; 85(1): 116-132, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38556922

ABSTRACT

AIMS: Deep learning holds immense potential for histopathology, automating tasks that are simple for expert pathologists and revealing novel biology for tasks that were previously considered difficult or impossible to solve by eye alone. However, the extent to which the visual strategies learned by deep learning models in histopathological analysis are trustworthy or not has yet to be systematically analysed. Here, we systematically evaluate deep neural networks (DNNs) trained for histopathological analysis in order to understand if their learned strategies are trustworthy or deceptive. METHODS AND RESULTS: We trained a variety of DNNs on a novel data set of 221 whole-slide images (WSIs) from lung adenocarcinoma patients, and evaluated their effectiveness at (1) molecular profiling of KRAS versus EGFR mutations, (2) determining the primary tissue of a tumour and (3) tumour detection. While DNNs achieved above-chance performance on molecular profiling, they did so by exploiting correlations between histological subtypes and mutations, and failed to generalise to a challenging test set obtained through laser capture microdissection (LCM). In contrast, DNNs learned robust and trustworthy strategies for determining the primary tissue of a tumour as well as detecting and localising tumours in tissue. CONCLUSIONS: Our work demonstrates that DNNs hold immense promise for aiding pathologists in analysing tissue. However, they are also capable of achieving seemingly strong performance by learning deceptive strategies that leverage spurious correlations, and are ultimately unsuitable for research or clinical work. The framework we propose for model evaluation and interpretation is an important step towards developing reliable automated systems for histopathological analysis.


Subjects
Adenocarcinoma of Lung; Deep Learning; Lung Neoplasms; Humans; Lung Neoplasms/pathology; Lung Neoplasms/genetics; Adenocarcinoma of Lung/pathology; Adenocarcinoma of Lung/genetics; Neural Networks, Computer; Mutation
4.
Cell ; 187(5): 1059-1075, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38428388

ABSTRACT

Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology, statistical methods, and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of molecular and cellular effects of genetic variants have provided insights into biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increases the use of genetic data to understand both the history of our species and its applications to improve human health.


Subjects
Human Genetics; Humans; Genetic Variation; Multifactorial Inheritance; Phenotype
5.
PLoS Genet ; 19(8): e1010399, 2023 08.
Article in English | MEDLINE | ID: mdl-37578977

ABSTRACT

Evidence of interbreeding between archaic hominins and humans comes from methods that infer the locations of segments of archaic haplotypes, or 'archaic coverage', using the genomes of people living today. As more estimates of archaic coverage have emerged, it has become clear that most of this coverage is found on the autosomes; very little is retained on chromosome X. Here, we summarize published estimates of archaic coverage on autosomes and chromosome X from extant human samples. We find on average 7 times more archaic coverage on autosomes than chromosome X, and identify broad continental patterns in this ratio: greatest in European samples, and least in South Asian samples. We also perform extensive simulation studies to investigate how the amount of archaic coverage, lengths of coverage, and rates of purging of archaic coverage are affected by sex-bias caused by an unequal sex ratio within the archaic introgressors. Our results generally confirm that, with increasing male sex-bias, less archaic coverage is retained on chromosome X. Ours is the first study to explicitly model such sex-bias and its potential role in creating the dearth of archaic coverage on chromosome X.


Subjects
Genetic Introgression; Genome, Human; Hominidae; X Chromosome; Animals; Humans; Male; Asian People/genetics; Genome; Genome, Human/genetics; Hominidae/genetics; Neanderthals/genetics; X Chromosome/genetics; Sex Factors; Haplotypes/genetics; Genetic Introgression/genetics; Chromosomes, Human/genetics; Female; South Asian People/genetics; European People/genetics
6.
PLoS Comput Biol ; 19(5): e1011175, 2023 May.
Article in English | MEDLINE | ID: mdl-37235578

ABSTRACT

Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.


Subjects
Genome; Machine Learning; Reproducibility of Results
8.
Am J Hum Genet ; 109(9): 1667-1679, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055213

ABSTRACT

African populations are the most diverse in the world yet are sorely underrepresented in medical genetics research. Here, we examine the structure of African populations using genetic and comprehensive multi-generational ethnolinguistic data from the Neuropsychiatric Genetics of African Populations-Psychosis study (NeuroGAP-Psychosis) consisting of 900 individuals from Ethiopia, Kenya, South Africa, and Uganda. We find that self-reported language classifications meaningfully tag underlying genetic variation that would be missed with consideration of geography alone, highlighting the importance of culture in shaping genetic diversity. Leveraging our uniquely rich multi-generational ethnolinguistic metadata, we track language transmission through the pedigree, observing the disappearance of several languages in our cohort as well as notable shifts in frequency over three generations. We find suggestive evidence for the rate of language transmission in matrilineal groups having been higher than that for patrilineal ones. We highlight both the diversity of variation within Africa as well as how within-Africa variation can be informative for broader variant interpretation; many variants that are rare elsewhere are common in parts of Africa. The work presented here improves the understanding of the spectrum of genetic variation in African populations and highlights the enormous and complex genetic and ethnolinguistic diversity across Africa.


Subjects
Genetic Variation; Genetics, Population; Africa, Southern; Black People/genetics; Genetic Structures; Genetic Variation/genetics; Humans
9.
iScience ; 25(7): 104553, 2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35769876

ABSTRACT

In this paper, we propose a new approach for variable selection using a collection of Bayesian neural networks with a focus on quantifying uncertainty over which variables are selected. Motivated by fine-mapping applications in statistical genetics, we refer to our framework as an "ensemble of single-effect neural networks" (ESNN) which generalizes the "sum of single effects" regression framework by both accounting for nonlinear structure in genotypic data (e.g., dominance effects) and having the capability to model discrete phenotypes (e.g., case-control studies). Through extensive simulations, we demonstrate our method's ability to produce calibrated posterior summaries such as credible sets and posterior inclusion probabilities, particularly for traits with genetic architectures that have significant proportions of non-additive variation driven by correlated variants. Lastly, we use real data to demonstrate that the ESNN framework improves upon the state of the art for identifying true effect variables underlying various complex traits.

10.
Philos Trans R Soc Lond B Biol Sci ; 377(1852): 20200410, 2022 06 06.
Article in English | MEDLINE | ID: mdl-35430881

ABSTRACT

Over the past 50 years, geneticists have made great strides in understanding how our species' evolutionary history gave rise to current patterns of human genetic diversity classically summarized by Lewontin in his 1972 paper, 'The Apportionment of Human Diversity'. One evolutionary process that requires special attention in both population genetics and statistical genetics is admixture: gene flow between two or more previously separated source populations to form a new admixed population. The admixture process introduces ancestry-based structure into patterns of genetic variation within and between populations, which in turn influences the inference of demographic histories, identification of genetic targets of selection and prediction of complex traits. In this review, we outline some challenges for admixture population genetics, including limitations of applying methods designed for populations without recent admixture to the study of admixed populations. We highlight recent studies and methodological advances that aim to overcome such challenges, leveraging genomic signatures of admixture that occurred in the past tens of generations to gain insights into human history, natural selection and complex trait architecture. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.


Subjects
Genetics, Population; Metagenomics; Gene Flow; Genetic Variation; Human Genetics; Humans; Selection, Genetic
12.
Am J Hum Genet ; 109(5): 871-884, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35349783

ABSTRACT

Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales (from genes to genomic regions and pathways) have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and BioBank Japan as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. We identify 1,000 gene-level associations that are genome-wide significant in at least two ancestry cohorts across these 25 traits as well as highly conserved pathway associations with triglyceride levels in European, East Asian, and Native Hawaiian cohorts.


Subjects
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genome-Wide Association Study/methods; Humans; Multifactorial Inheritance; Phenotype; Polymorphism, Single Nucleotide/genetics; Racial Groups
13.
PLoS Genet ; 17(8): e1009754, 2021 08.
Article in English | MEDLINE | ID: mdl-34411094

ABSTRACT

In this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects, and the hidden layer models the aggregated effects among SNP-sets. We treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses variational inference to provide posterior summaries which allow researchers to simultaneously perform (i) mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art association mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a random subset of individuals of European ancestry from the UK Biobank, we show that BANNs is able to replicate known associations in high- and low-density lipoprotein cholesterol content.


Subjects
Genome-Wide Association Study/methods; Molecular Sequence Annotation/methods; Animals; Genome/genetics; Genomics/methods; Genotype; Humans; Models, Genetic; Multifactorial Inheritance/genetics; Neural Networks, Computer; Phenotype; Polymorphism, Single Nucleotide/genetics; Software
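
The partially connected architecture described in entry 13 can be pictured with a small forward pass in which each SNP feeds only the hidden unit of the SNP-set it is annotated to. This sketch is an assumption-laden illustration, not the BANNs software: the function name, the tanh activation, the fixed point-estimate weights and the toy annotation are all hypothetical, and the priors and variational inference mentioned in the abstract are left out.

```python
import math


def annotated_forward(genotypes, snp_weights, snp_to_set, set_weights):
    """Forward pass of a toy annotation-masked network.

    genotypes:   allele counts per SNP (0, 1 or 2)
    snp_weights: one input-layer weight per SNP
    snp_to_set:  index of the SNP-set (e.g. gene) each SNP belongs to
    set_weights: one output weight per SNP-set
    """
    hidden = [0.0] * len(set_weights)
    for g, w, s in zip(genotypes, snp_weights, snp_to_set):
        hidden[s] += g * w  # a SNP only reaches its own annotated set
    activated = [math.tanh(h) for h in hidden]  # activation is an assumption
    output = sum(a * v for a, v in zip(activated, set_weights))
    return hidden, output


# Four SNPs annotated to two SNP-sets (hypothetical values).
hidden, prediction = annotated_forward(
    genotypes=[0, 1, 2, 1],
    snp_weights=[0.5, -0.2, 0.1, 0.3],
    snp_to_set=[0, 0, 1, 1],
    set_weights=[1.0, 2.0],
)
print(hidden, prediction)
```

Keeping one hidden unit per SNP-set is what lets the input weights be read as SNP-level effects and the hidden units as set-level effects.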
14.
PLoS Genet ; 17(3): e1008887, 2021 03.
Article in English | MEDLINE | ID: mdl-33735180

ABSTRACT

The winged insects of the order Diptera are colloquially named for their most recognizable phenotype: flight. These insects rely on flight for a number of important life history traits, such as dispersal, foraging, and courtship. Despite the importance of flight, relatively little is known about the genetic architecture of flight performance. Accordingly, we sought to uncover the genetic modifiers of flight using a measure of flies' reaction and response to an abrupt drop in a vertical flight column. We conducted a genome-wide association study (GWAS) using 197 of the Drosophila Genetic Reference Panel (DGRP) lines, and identified a combination of additive and marginal variants, epistatic interactions, whole genes, and enrichment across interaction networks. Egfr, a highly pleiotropic developmental gene, was among the most significant additive variants identified. We functionally validated 13 of the additive candidate genes (Adgf-A/Adgf-A2/CG32181, bru1, CadN, flapper (CG11073), CG15236, flippy (CG9766), CREG, Dscam4, form3, fry, Lasp/CG9692, Pde6, Snoo), and introduce a novel approach to whole gene significance screens: PEGASUS_flies. Additionally, we identified ppk23, an Acid Sensing Ion Channel (ASIC) homolog, as an important hub for epistatic interactions. We propose a model that suggests genetic modifiers of wing and muscle morphology, nervous system development and function, BMP signaling, sexually dimorphic neural wiring, and gene regulation are all important for the observed differences in flight performance in a natural population. Additionally, these results represent a snapshot of the genetic modifiers affecting drop-response flight performance in Drosophila, with implications for other insects.


Subjects
Drosophila melanogaster/genetics; Drosophila/genetics; Gene Expression Regulation, Developmental; Genetic Variation; Neurogenesis/genetics; Animals; Drosophila/embryology; Drosophila melanogaster/metabolism; Epigenesis, Genetic; Female; Flight, Animal; Genetic Association Studies; Male; Phenotype; Polymorphism, Single Nucleotide
15.
PLoS Genet ; 16(6): e1008855, 2020 06.
Article in English | MEDLINE | ID: mdl-32542026

ABSTRACT

Traditional univariate genome-wide association studies generate false positives and negatives due to difficulties distinguishing associated variants from variants with spurious nonzero effects that do not directly influence the trait. Recent efforts have been directed at identifying genes or signaling pathways enriched for mutations in quantitative traits or case-control studies, but these can be computationally costly and hampered by strict model assumptions. Here, we present gene-ε, a new approach for identifying statistical associations between sets of variants and quantitative traits. Our key insight is that enrichment studies on the gene-level are improved when we reformulate the genome-wide SNP-level null hypothesis to identify spurious small-to-intermediate SNP effects and classify them as non-causal. gene-ε efficiently identifies enriched genes under a variety of simulated genetic architectures, achieving greater than a 90% true positive rate at 1% false positive rate for polygenic traits. Lastly, we apply gene-ε to summary statistics derived from six quantitative traits using European-ancestry individuals in the UK Biobank, and identify enriched genes that are in biologically relevant pathways.


Subjects
Genome-Wide Association Study/statistics & numerical data; Models, Genetic; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide; Quantitative Trait Loci/genetics; Data Interpretation, Statistical; Databases, Genetic/statistics & numerical data; Humans; United Kingdom; White People/genetics
16.
Genetics ; 215(2): 511-529, 2020 06.
Article in English | MEDLINE | ID: mdl-32245788

ABSTRACT

Emerging large-scale biobanks pairing genotype data with phenotype data present new opportunities to prioritize shared genetic associations across multiple phenotypes for molecular validation. Past research, by our group and others, has shown gene-level tests of association produce biologically interpretable characterization of the genetic architecture of a given phenotype. Here, we present a new method, Ward clustering to identify Internal Node branch length outliers using Gene Scores (WINGS), for identifying shared genetic architecture among multiple phenotypes. The objective of WINGS is to identify groups of phenotypes, or "clusters," sharing a core set of genes enriched for mutations in cases. We validate WINGS using extensive simulation studies and then combine gene-level association tests with WINGS to identify shared genetic architecture among 81 case-control and seven quantitative phenotypes in 349,468 European-ancestry individuals from the UK Biobank. We identify eight prioritized phenotype clusters and recover multiple published gene-level associations within prioritized clusters.


Subjects
Genome-Wide Association Study; Genotype; Phenotype; Polymorphism, Single Nucleotide; White People/genetics; Case-Control Studies; Cluster Analysis; Computer Simulation; Humans
17.
Cell Rep ; 30(9): 2900-2908.e4, 2020 03 03.
Article in English | MEDLINE | ID: mdl-32130895

ABSTRACT

The immune composition of the tumor microenvironment influences response and resistance to immunotherapies. While numerous studies have identified somatic correlates of immune infiltration, germline features that associate with immune infiltrates in cancers remain incompletely characterized. We analyze seven million autosomal germline variants in the TCGA cohort and test for association with established immune-related phenotypes that describe the tumor immune microenvironment. We identify one SNP associated with the amount of infiltrating follicular helper T cells; 23 candidate genes, some of which are involved in cytokine-mediated signaling and others containing cancer-risk SNPs; and networks with genes that are part of the DNA repair and transcription elongation pathways. In addition, we find a positive association between polygenic risk for rheumatoid arthritis and amount of infiltrating CD8+ T cells. Overall, we identify multiple germline genetic features associated with tumor-immune phenotypes and develop a framework for probing inherited features that contribute to differences in immune infiltration.


Subjects
Germ Cells/metabolism; Lymphocytes, Tumor-Infiltrating/immunology; Neoplasms/genetics; Neoplasms/immunology; Autoimmune Diseases/immunology; DNA Repair/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Leukocytes/metabolism; Multifactorial Inheritance; Phenotype; Polymorphism, Single Nucleotide/genetics; Risk Factors; T-Lymphocytes, Helper-Inducer/immunology; Transcription, Genetic
19.
Am J Hum Genet ; 105(5): 921-932, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31607426

ABSTRACT

Meiotic nondisjunction and resulting aneuploidy can lead to severe health consequences in humans. Aneuploidy rescue can restore euploidy but may result in uniparental disomy (UPD), the inheritance of both homologs of a chromosome from one parent with no representative copy from the other. Current understanding of UPD is limited to ∼3,300 case subjects for which UPD was associated with clinical presentation due to imprinting disorders or recessive diseases. Thus, the prevalence of UPD and its phenotypic consequences in the general population are unknown. We searched for instances of UPD across 4,400,363 consented research participants from the personal genetics company 23andMe, Inc., and 431,094 UK Biobank participants. Using computationally detected DNA segments identical-by-descent (IBD) and runs of homozygosity (ROH), we identified 675 instances of UPD across both databases. We estimate that UPD is twice as common as previously thought, and we present a machine-learning framework to detect UPD using ROH. While we find a nominally significant association between UPD of chromosome 22 and autism risk, we do not find significant associations between UPD and deleterious traits in the 23andMe database.


Subjects
Uniparental Disomy/genetics; Aneuploidy; Female; Genomic Imprinting/genetics; Homozygote; Humans; Male; Phenotype; Polymorphism, Single Nucleotide/genetics; Prevalence
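
Entry 19 relies on runs of homozygosity (ROH) as a signal for uniparental disomy. The sketch below shows only the basic idea of scanning genotype calls for long stretches without heterozygous sites. It is not the authors' machine-learning framework: the function name, the site-count threshold and the toy genotypes are assumptions, and practical ROH callers work with physical or genetic distance and tolerate genotyping errors.

```python
def runs_of_homozygosity(genotypes, min_sites=3):
    """Return half-open (start, end) index ranges of homozygous runs.

    genotypes: allele counts along one chromosome (0 or 2 = homozygous,
    1 = heterozygous). min_sites is a hypothetical length threshold.
    """
    runs = []
    start = None
    for idx, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = idx  # a new homozygous stretch begins
        else:
            if start is not None and idx - start >= min_sites:
                runs.append((start, idx))
            start = None  # a heterozygous site ends the stretch
    if start is not None and len(genotypes) - start >= min_sites:
        runs.append((start, len(genotypes)))
    return runs


calls = [0, 2, 1, 0, 0, 2, 2, 0, 1, 2, 2]
roh = runs_of_homozygosity(calls)
print(roh)  # [(3, 8)]
```

A chromosome covered almost entirely by such runs in an individual without similar runs elsewhere would be a candidate for closer inspection; how the actual framework scores candidates is not described in the abstract.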
20.
PLoS Genet ; 15(9): e1008293, 2019 09.
Article in English | MEDLINE | ID: mdl-31539367

ABSTRACT

Sex-biased demographic events ("sex-bias") involve unequal numbers of females and males. These events are typically inferred from the relative amount of X-chromosomal to autosomal genetic variation and have led to conflicting conclusions about human demographic history. Though population size changes alter the relative amount of X-chromosomal to autosomal genetic diversity even in the absence of sex-bias, this has generally not been accounted for in sex-bias estimators to date. Here, we present a novel method to identify sex-bias from genetic sequence data that models population size changes and estimates the female fraction of the effective population size during each time epoch. Compared to recent sex-bias inference methods, our approach can detect sex-bias that changes on a single population branch without requiring data from an outgroup or knowledge of divergence events. When applied to simulated data, conventional sex-bias estimators are biased by population size changes, especially recent growth or bottlenecks, while our estimator is unbiased. We next apply our method to high-coverage exome data from the 1000 Genomes Project and estimate a male bias in Yorubans (47% female) and Europeans (44%), possibly due to stronger background selection on the X chromosome than on the autosomes. Finally, we apply our method to the 1000 Genomes Project Phase 3 high-coverage Complete Genomics whole-genome data and estimate a female bias in Yorubans (63% female), Europeans (84%), Punjabis (82%), as well as Peruvians (56%), and a male bias in the Southern Han Chinese (45%). Our method additionally identifies a male-biased migration out of Africa based on data from Europeans (20% female). Our results demonstrate that modeling population size change is necessary to estimate sex-bias parameters accurately. Our approach gives insight into signatures of sex-bias in sexual species, and the demographic models it produces can serve as more accurate null models for tests of selection.


Subjects
Demography/methods; Genetics, Population/methods; Sequence Analysis, DNA/methods; Bias; Chromosomes, Human, X/genetics; Female; Genetic Variation/genetics; Genome/genetics; Humans; Male; Models, Genetic; Population Density; Selection, Genetic/genetics; Whole Genome Sequencing/methods
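
For context on entry 20, the conventional link between the X-to-autosome diversity ratio and the female fraction can be written down in a few lines. This is the classical constant-size expectation, not the authors' estimator: with N_f breeding females and N_m breeding males, the autosomal effective size is 4*N_f*N_m/(N_f + N_m) and the X-chromosomal effective size is 9*N_f*N_m/(4*N_m + 2*N_f), so their ratio depends only on the female fraction p = N_f/(N_f + N_m). Function names are hypothetical, and the calculation ignores population size change, selection and mutation-rate differences, which is the limitation the abstract highlights.

```python
def expected_x_to_autosome_ratio(female_fraction):
    """Classical X/A effective-size ratio for a constant-size population."""
    return 9.0 / (8.0 * (2.0 - female_fraction))


def female_fraction_from_ratio(x_to_a_ratio):
    """Invert the classical relationship to get a naive sex-bias estimate."""
    return 2.0 - 9.0 / (8.0 * x_to_a_ratio)


balanced = expected_x_to_autosome_ratio(0.5)
print(balanced)                            # 0.75 with equal sex ratio
print(female_fraction_from_ratio(0.75))    # 0.5
```

Under these assumptions the ratio can only range from 9/16 (all breeders male) to 9/8 (all breeders female), so observed ratios outside that interval already point to forces the naive inversion does not model.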