Results 1 - 20 of 47
1.
Cell ; 187(5): 1059-1075, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38428388

ABSTRACT

Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology, statistical methods, and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of molecular and cellular effects of genetic variants have provided insights into biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increases the use of genetic data to understand both the history of our species and its applications to improve human health.


Subject(s)
Human Genetics , Humans , Genetic Variation , Multifactorial Inheritance , Phenotype
2.
Cell ; 174(6): 1424-1435.e15, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30078708

ABSTRACT

FOXP2, initially identified for its role in human speech, contains two nonsynonymous substitutions derived in the human lineage. Evidence for a recent selective sweep in Homo sapiens, however, is at odds with the presence of these substitutions in archaic hominins. Here, we comprehensively reanalyze FOXP2 in hundreds of globally distributed genomes to test for recent selection. We do not find evidence of recent positive or balancing selection at FOXP2. Instead, the original signal appears to have been due to sample composition. Our tests do identify an intronic region that is enriched for highly conserved sites that are polymorphic among humans, compatible with a loss of function in humans. This region is lowly expressed in relevant tissue types that were tested via RNA-seq in human prefrontal cortex and RT-PCR in immortalized human brain cells. Our results represent a substantial revision to the adaptive history of FOXP2, a gene regarded as vital to human evolution.


Subject(s)
Forkhead Transcription Factors/genetics , Brain/cytology , Brain/metabolism , Cell Line , Databases, Genetic , Exons , Female , Genome, Human , Haplotypes , Humans , Introns , Male , Markov Chains , Polymorphism, Single Nucleotide , Prefrontal Cortex/metabolism
3.
PLoS Genet ; 19(8): e1010399, 2023 08.
Article in English | MEDLINE | ID: mdl-37578977

ABSTRACT

Evidence of interbreeding between archaic hominins and humans comes from methods that infer the locations of segments of archaic haplotypes, or 'archaic coverage', using the genomes of people living today. As more estimates of archaic coverage have emerged, it has become clear that most of this coverage is found on the autosomes; very little is retained on chromosome X. Here, we summarize published estimates of archaic coverage on autosomes and chromosome X from extant human samples. We find on average 7 times more archaic coverage on autosomes than chromosome X, and identify broad continental patterns in this ratio: greatest in European samples, and least in South Asian samples. We also perform extensive simulation studies to investigate how the amount of archaic coverage, lengths of coverage, and rates of purging of archaic coverage are affected by sex-bias caused by an unequal sex ratio within the archaic introgressors. Our results generally confirm that, with increasing male sex-bias, less archaic coverage is retained on chromosome X. Ours is the first study to explicitly model such sex-bias and its potential role in creating the dearth of archaic coverage on chromosome X.


Subject(s)
Genetic Introgression , Genome, Human , Hominidae , X Chromosome , Animals , Humans , Male , Asian People/genetics , Genome , Genome, Human/genetics , Hominidae/genetics , Neanderthals/genetics , X Chromosome/genetics , Sex Factors , Haplotypes/genetics , Genetic Introgression/genetics , Chromosomes, Human/genetics , Female , South Asian People/genetics , European People/genetics
4.
Am J Hum Genet ; 109(5): 871-884, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35349783

ABSTRACT

Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales (from genes to genomic regions and pathways) have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. We identify 1,000 gene-level associations that are genome-wide significant in at least two ancestry cohorts across these 25 traits as well as highly conserved pathway associations with triglyceride levels in European, East Asian, and Native Hawaiian cohorts.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide/genetics , Racial Groups
5.
Am J Hum Genet ; 109(9): 1667-1679, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055213

ABSTRACT

African populations are the most diverse in the world yet are sorely underrepresented in medical genetics research. Here, we examine the structure of African populations using genetic and comprehensive multi-generational ethnolinguistic data from the Neuropsychiatric Genetics of African Populations-Psychosis study (NeuroGAP-Psychosis) consisting of 900 individuals from Ethiopia, Kenya, South Africa, and Uganda. We find that self-reported language classifications meaningfully tag underlying genetic variation that would be missed with consideration of geography alone, highlighting the importance of culture in shaping genetic diversity. Leveraging our uniquely rich multi-generational ethnolinguistic metadata, we track language transmission through the pedigree, observing the disappearance of several languages in our cohort as well as notable shifts in frequency over three generations. We find suggestive evidence for the rate of language transmission in matrilineal groups having been higher than that for patrilineal ones. We highlight both the diversity of variation within Africa as well as how within-Africa variation can be informative for broader variant interpretation; many variants that are rare elsewhere are common in parts of Africa. The work presented here improves the understanding of the spectrum of genetic variation in African populations and highlights the enormous and complex genetic and ethnolinguistic diversity across Africa.


Subject(s)
Genetic Variation , Genetics, Population , Africa, Southern , Black People/genetics , Genetic Structures , Genetic Variation/genetics , Humans
6.
Histopathology ; 85(1): 116-132, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38556922

ABSTRACT

AIMS: Deep learning holds immense potential for histopathology, automating tasks that are simple for expert pathologists and revealing novel biology for tasks that were previously considered difficult or impossible to solve by eye alone. However, the extent to which the visual strategies learned by deep learning models in histopathological analysis are trustworthy or not has yet to be systematically analysed. Here, we systematically evaluate deep neural networks (DNNs) trained for histopathological analysis in order to understand if their learned strategies are trustworthy or deceptive. METHODS AND RESULTS: We trained a variety of DNNs on a novel data set of 221 whole-slide images (WSIs) from lung adenocarcinoma patients, and evaluated their effectiveness at (1) molecular profiling of KRAS versus EGFR mutations, (2) determining the primary tissue of a tumour and (3) tumour detection. While DNNs achieved above-chance performance on molecular profiling, they did so by exploiting correlations between histological subtypes and mutations, and failed to generalise to a challenging test set obtained through laser capture microdissection (LCM). In contrast, DNNs learned robust and trustworthy strategies for determining the primary tissue of a tumour as well as detecting and localising tumours in tissue. CONCLUSIONS: Our work demonstrates that DNNs hold immense promise for aiding pathologists in analysing tissue. However, they are also capable of achieving seemingly strong performance by learning deceptive strategies that leverage spurious correlations, and are ultimately unsuitable for research or clinical work. The framework we propose for model evaluation and interpretation is an important step towards developing reliable automated systems for histopathological analysis.


Subject(s)
Adenocarcinoma of Lung , Deep Learning , Lung Neoplasms , Humans , Lung Neoplasms/pathology , Lung Neoplasms/genetics , Adenocarcinoma of Lung/pathology , Adenocarcinoma of Lung/genetics , Neural Networks, Computer , Mutation
7.
PLoS Comput Biol ; 19(5): e1011175, 2023 May.
Article in English | MEDLINE | ID: mdl-37235578

ABSTRACT

Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.


Subject(s)
Genome , Machine Learning , Reproducibility of Results
8.
PLoS Genet ; 17(8): e1009754, 2021 08.
Article in English | MEDLINE | ID: mdl-34411094

ABSTRACT

In this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects, and the hidden layer models the aggregated effects among SNP-sets. We treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses variational inference to provide posterior summaries which allow researchers to simultaneously perform (i) mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art association mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a random subset of individuals of European ancestry from the UK Biobank, we show that BANNs is able to replicate known associations in high- and low-density lipoprotein cholesterol content.


Subject(s)
Genome-Wide Association Study/methods , Molecular Sequence Annotation/methods , Animals , Genome/genetics , Genomics/methods , Genotype , Humans , Models, Genetic , Multifactorial Inheritance/genetics , Neural Networks, Computer , Phenotype , Polymorphism, Single Nucleotide/genetics , Software
9.
PLoS Genet ; 17(3): e1008887, 2021 03.
Article in English | MEDLINE | ID: mdl-33735180

ABSTRACT

The winged insects of the order Diptera are colloquially named for their most recognizable phenotype: flight. These insects rely on flight for a number of important life history traits, such as dispersal, foraging, and courtship. Despite the importance of flight, relatively little is known about the genetic architecture of flight performance. Accordingly, we sought to uncover the genetic modifiers of flight using a measure of flies' reaction and response to an abrupt drop in a vertical flight column. We conducted a genome-wide association study (GWAS) using 197 of the Drosophila Genetic Reference Panel (DGRP) lines, and identified a combination of additive and marginal variants, epistatic interactions, whole genes, and enrichment across interaction networks. Egfr, a highly pleiotropic developmental gene, was among the most significant additive variants identified. We functionally validated 13 of the additive candidate genes (Adgf-A/Adgf-A2/CG32181, bru1, CadN, flapper (CG11073), CG15236, flippy (CG9766), CREG, Dscam4, form3, fry, Lasp/CG9692, Pde6, Snoo), and introduce a novel approach to whole gene significance screens: PEGASUS_flies. Additionally, we identified ppk23, an Acid Sensing Ion Channel (ASIC) homolog, as an important hub for epistatic interactions. We propose a model that suggests genetic modifiers of wing and muscle morphology, nervous system development and function, BMP signaling, sexually dimorphic neural wiring, and gene regulation are all important for the observed differences in flight performance in a natural population. Additionally, these results represent a snapshot of the genetic modifiers affecting drop-response flight performance in Drosophila, with implications for other insects.


Subject(s)
Drosophila melanogaster/genetics , Drosophila/genetics , Gene Expression Regulation, Developmental , Genetic Variation , Neurogenesis/genetics , Animals , Drosophila/embryology , Drosophila melanogaster/metabolism , Epigenesis, Genetic , Female , Flight, Animal , Genetic Association Studies , Male , Phenotype , Polymorphism, Single Nucleotide
11.
PLoS Genet ; 16(6): e1008855, 2020 06.
Article in English | MEDLINE | ID: mdl-32542026

ABSTRACT

Traditional univariate genome-wide association studies generate false positives and negatives due to difficulties distinguishing associated variants from variants with spurious nonzero effects that do not directly influence the trait. Recent efforts have been directed at identifying genes or signaling pathways enriched for mutations in quantitative traits or case-control studies, but these can be computationally costly and hampered by strict model assumptions. Here, we present gene-ε, a new approach for identifying statistical associations between sets of variants and quantitative traits. Our key insight is that enrichment studies on the gene-level are improved when we reformulate the genome-wide SNP-level null hypothesis to identify spurious small-to-intermediate SNP effects and classify them as non-causal. gene-ε efficiently identifies enriched genes under a variety of simulated genetic architectures, achieving greater than a 90% true positive rate at 1% false positive rate for polygenic traits. Lastly, we apply gene-ε to summary statistics derived from six quantitative traits using European-ancestry individuals in the UK Biobank, and identify enriched genes that are in biologically relevant pathways.


Subject(s)
Genome-Wide Association Study/statistics & numerical data , Models, Genetic , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Humans , United Kingdom , White People/genetics
12.
Am J Hum Genet ; 105(5): 921-932, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31607426

ABSTRACT

Meiotic nondisjunction and resulting aneuploidy can lead to severe health consequences in humans. Aneuploidy rescue can restore euploidy but may result in uniparental disomy (UPD), the inheritance of both homologs of a chromosome from one parent with no representative copy from the other. Current understanding of UPD is limited to ∼3,300 case subjects for which UPD was associated with clinical presentation due to imprinting disorders or recessive diseases. Thus, the prevalence of UPD and its phenotypic consequences in the general population are unknown. We searched for instances of UPD across 4,400,363 consented research participants from the personal genetics company 23andMe, Inc., and 431,094 UK Biobank participants. Using computationally detected DNA segments identical-by-descent (IBD) and runs of homozygosity (ROH), we identified 675 instances of UPD across both databases. We estimate that UPD is twice as common as previously thought, and we present a machine-learning framework to detect UPD using ROH. While we find a nominally significant association between UPD of chromosome 22 and autism risk, we do not find significant associations between UPD and deleterious traits in the 23andMe database.


Subject(s)
Uniparental Disomy/genetics , Aneuploidy , Female , Genomic Imprinting/genetics , Homozygote , Humans , Male , Phenotype , Polymorphism, Single Nucleotide/genetics , Prevalence
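The abstract in entry 12 names runs of homozygosity (ROH) as one signal used to detect UPD. The idea of an ROH scan can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the genotype coding (alternate-allele counts 0/1/2), the function name `runs_of_homozygosity`, and the `min_sites` threshold are assumptions made for illustration, and production ROH callers typically tolerate occasional heterozygous calls and work with physical or genetic length rather than site counts.

```python
from typing import List, Tuple


def runs_of_homozygosity(genotypes: List[int], min_sites: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of maximal homozygous runs.

    genotypes holds alternate-allele counts per site (hypothetical coding):
    0 or 2 means homozygous, 1 means heterozygous.
    Runs shorter than min_sites are discarded.
    """
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current homozygous run began, if any
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            # A heterozygous site closes the current run.
            if start is not None and i - start >= min_sites:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last site.
    if start is not None and len(genotypes) - start >= min_sites:
        runs.append((start, len(genotypes) - 1))
    return runs


if __name__ == "__main__":
    print(runs_of_homozygosity([0, 2, 0, 1, 2, 2, 1, 0, 0, 0, 0], min_sites=3))
```

A chromosome-wide run in a single individual is the kind of pattern that would be flagged for follow-up in such a scan; shorter runs are common and usually reflect ordinary shared ancestry.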
13.
PLoS Genet ; 15(9): e1008293, 2019 09.
Article in English | MEDLINE | ID: mdl-31539367

ABSTRACT

Sex-biased demographic events ("sex-bias") involve unequal numbers of females and males. These events are typically inferred from the relative amount of X-chromosomal to autosomal genetic variation and have led to conflicting conclusions about human demographic history. Though population size changes alter the relative amount of X-chromosomal to autosomal genetic diversity even in the absence of sex-bias, this has generally not been accounted for in sex-bias estimators to date. Here, we present a novel method to identify sex-bias from genetic sequence data that models population size changes and estimates the female fraction of the effective population size during each time epoch. Compared to recent sex-bias inference methods, our approach can detect sex-bias that changes on a single population branch without requiring data from an outgroup or knowledge of divergence events. When applied to simulated data, conventional sex-bias estimators are biased by population size changes, especially recent growth or bottlenecks, while our estimator is unbiased. We next apply our method to high-coverage exome data from the 1000 Genomes Project and estimate a male bias in Yorubans (47% female) and Europeans (44%), possibly due to stronger background selection on the X chromosome than on the autosomes. Finally, we apply our method to the 1000 Genomes Project Phase 3 high-coverage Complete Genomics whole-genome data and estimate a female bias in Yorubans (63% female), Europeans (84%), Punjabis (82%), as well as Peruvians (56%), and a male bias in the Southern Han Chinese (45%). Our method additionally identifies a male-biased migration out of Africa based on data from Europeans (20% female). Our results demonstrate that modeling population size change is necessary to estimate sex-bias parameters accurately. Our approach gives insight into signatures of sex-bias in sexual species, and the demographic models it produces can serve as more accurate null models for tests of selection.


Subject(s)
Demography/methods , Genetics, Population/methods , Sequence Analysis, DNA/methods , Bias , Chromosomes, Human, X/genetics , Female , Genetic Variation/genetics , Genome/genetics , Humans , Male , Models, Genetic , Population Density , Selection, Genetic/genetics , Whole Genome Sequencing/methods
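Entry 13 contrasts its method with conventional sex-bias estimators based on the ratio of X-chromosomal to autosomal diversity. The following Python sketch shows that conventional calculation only, under the standard assumptions of a constant-size, neutrally evolving population; it is not the authors' estimator, and the function names are hypothetical. With female fraction p of the effective population size, the expected ratio of X to autosomal effective size is 9 / (8 (2 - p)), which gives the familiar 3/4 when p = 0.5.

```python
def expected_x_to_autosome_ratio(female_fraction: float) -> float:
    """Expected N_e(X) / N_e(autosome) at equilibrium for a given female fraction.

    Derived from N_e(A) = 4*Nm*Nf / (Nm + Nf) and N_e(X) = 9*Nm*Nf / (4*Nm + 2*Nf).
    """
    if not 0.0 <= female_fraction <= 1.0:
        raise ValueError("female_fraction must lie in [0, 1]")
    return 9.0 / (8.0 * (2.0 - female_fraction))


def female_fraction_from_ratio(ratio: float) -> float:
    """Invert the equilibrium relationship to recover the female fraction."""
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    return 2.0 - 9.0 / (8.0 * ratio)


if __name__ == "__main__":
    print(expected_x_to_autosome_ratio(0.5))
    print(female_fraction_from_ratio(0.75))
```

Because this inversion assumes constant population size, growth or bottlenecks shift the observed ratio and can be misread as sex-bias, which is the problem the abstract says its epoch-based model addresses.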
14.
Proc Natl Acad Sci U S A ; 112(5): 1265-72, 2015 Feb 03.
Article in English | MEDLINE | ID: mdl-25605893

ABSTRACT

Worldwide patterns of genetic variation are driven by human demographic history. Here, we test whether this demographic history has left similar signatures on phonemes (sound units that distinguish meaning between words in languages) to those it has left on genes. We analyze, jointly and in parallel, phoneme inventories from 2,082 worldwide languages and microsatellite polymorphisms from 246 worldwide populations. On a global scale, both genetic distance and phonemic distance between populations are significantly correlated with geographic distance. Geographically close language pairs share significantly more phonemes than distant language pairs, whether or not the languages are closely related. The regional geographic axes of greatest phonemic differentiation correspond to axes of genetic differentiation, suggesting that there is a relationship between human dispersal and linguistic variation. However, the geographic distribution of phoneme inventory sizes does not follow the predictions of a serial founder effect during human expansion out of Africa. Furthermore, although geographically isolated populations lose genetic diversity via genetic drift, phonemes are not subject to drift in the same way: within a given geographic radius, languages that are relatively isolated exhibit more variance in number of phonemes than languages with many neighbors. This finding suggests that relatively isolated languages are more susceptible to phonemic change than languages with many neighbors. Within a language family, phoneme evolution along genetic, geographic, or cognate-based linguistic trees predicts similar ancestral phoneme states to those predicted from ancient sources. More genetic sampling could further elucidate the relative roles of vertical and horizontal transmission in phoneme evolution.


Subject(s)
Genetic Variation , Linguistics , Founder Effect , Humans , Phylogeography , Principal Component Analysis
15.
Bioinformatics ; 32(18): 2817-23, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27283948

ABSTRACT

MOTIVATION: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across multiple disciplines from ecology to text data mining. RESULTS: We introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a native interactive D3.js visualization. pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared with other methods that process output from mixed-membership models. We apply pong to 225 705 unlinked genome-wide single-nucleotide variants from 2426 unrelated individuals in the 1000 Genomes Project, and identify previously overlooked aspects of global human population structure. We show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools. AVAILABILITY AND IMPLEMENTATION: pong is freely available and can be installed using the Python package management system pip. pong's source code is available at https://github.com/abehr/pong CONTACT: aaron_behr@alumni.brown.edu or sramachandran@brown.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Data Mining , Population Groups , Software , Algorithms , Cluster Analysis , Computer Graphics , Genetics, Population , Humans , Programming Languages , Protein Conformation
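Entry 15 states that pong solves the Assignment Problem to align output from mixed-membership models, since cluster labels can be permuted between runs. The sketch below illustrates that idea in Python and makes no claim about pong's internals: the squared-distance cost, the function name `match_clusters`, and the exhaustive search over permutations are all assumptions. The exhaustive search is exact but only practical for a small number of clusters; an efficient solver such as the Hungarian algorithm would replace it at scale.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def match_clusters(q_a: Sequence[Sequence[float]], q_b: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Align clusters of two membership matrices (rows: individuals, columns: clusters).

    Returns a tuple m where m[i] is the column of q_b matched to column i of q_a,
    chosen to minimize the total squared distance between matched columns.
    """
    k = len(q_a[0])

    def column(matrix: Sequence[Sequence[float]], j: int) -> List[float]:
        return [row[j] for row in matrix]

    # cost[i][j]: squared distance between cluster i in run A and cluster j in run B.
    cost = [
        [sum((x - y) ** 2 for x, y in zip(column(q_a, i), column(q_b, j))) for j in range(k)]
        for i in range(k)
    ]
    # Exact brute-force solution of the assignment problem (k! candidates).
    return min(permutations(range(k)), key=lambda perm: sum(cost[i][perm[i]] for i in range(k)))


if __name__ == "__main__":
    run_a = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
    # Same memberships as run_a, with cluster columns relabeled.
    run_b = [[0.0, 0.9, 0.1], [0.1, 0.1, 0.8], [0.8, 0.0, 0.2]]
    print(match_clusters(run_a, run_b))
```

Once the matching is known, the columns of the second run can be reordered so that the same color or label refers to the same cluster across runs.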
16.
Annu Rev Genomics Hum Genet ; 12: 245-74, 2011.
Article in English | MEDLINE | ID: mdl-21801023

ABSTRACT

Human groups show structured levels of genetic similarity as a consequence of factors such as geographical subdivision and genetic drift. Surveying this structure gives us a scientific perspective on human origins, sheds light on evolutionary processes that shape both human adaptation and disease, and is integral to effectively carrying out the mission of global medical genetics and personalized medicine. Surveys of population structure have been ongoing for decades, but in the past three years, single-nucleotide-polymorphism (SNP) array technology has provided unprecedented detail on human population structure at global and regional scales. These studies have confirmed well-known relationships between distantly related populations and uncovered previously unresolvable relationships among closely related human groups. SNPs represent the first dense genome-wide markers, and as such, their analysis has raised many challenges and insights relevant to the study of population genetics with whole-genome sequences. Here we draw on the lessons from these studies to anticipate the directions that will be most fruitful to pursue during the emerging whole-genome sequencing era.


Subject(s)
Biological Evolution , Genetics, Population , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide
18.
Adv Exp Med Biol ; 825: 227-66, 2014.
Article in English | MEDLINE | ID: mdl-25201108

ABSTRACT

At its most fundamental level the goal of genetics is to connect genotype to phenotype. This question is asked at a basic level evaluating the role of genes and pathways in genetic model organisms. Increasingly, this question is being asked in the clinic. Genomes of individuals and populations are being sequenced and compared. The challenge often comes at the stage of analysis. The variant positions are analyzed with the hope of understanding human disease. However, after a genome or exome has been sequenced, the researcher is often deluged with hundreds of potentially relevant variations. Traditionally, amino-acid changing mutations were considered the tractable class of disease-causing mutations; however, mutations that disrupt noncoding elements are the subject of growing interest. These noncoding changes are a major avenue of disease (e.g., one in three hereditary disease alleles are predicted to affect splicing). Here, we review some current practices of medical genetics, the basic theory behind biochemical binding and functional assays, and then explore technical advances in how variations that alter RNA-protein recognition events are detected and studied. These advances are advances in scale: high-throughput implementations of traditional biochemical assays that are feasible to perform in any molecular biology laboratory. This chapter utilizes a case study approach to illustrate some methods for analyzing polymorphisms. The first characterizes a functional intronic SNP that deletes a high affinity PTB site using traditional low-throughput biochemical and functional assays. From here we demonstrate the utility of high-throughput splicing and spliceosome assembly assays for screening large sets of SNPs and disease alleles for allelic differences in gene expression. Finally we perform three pilot drug screens with small molecules (G418, tetracycline, and valproic acid) that illustrate how compounds that rescue specific instances of differential pre-mRNA processing can be discovered.


Subject(s)
Alleles , Genetic Diseases, Inborn , Mutation, Missense , Polymorphism, Single Nucleotide , RNA-Binding Proteins , Amino Acid Substitution , Animals , DNA Mutational Analysis/methods , Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/metabolism , Humans , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
19.
Proc Natl Acad Sci U S A ; 108(13): 5154-62, 2011 Mar 29.
Article in English | MEDLINE | ID: mdl-21383195

ABSTRACT

Africa is inferred to be the continent of origin for all modern human populations, but the details of human prehistory and evolution in Africa remain largely obscure owing to the complex histories of hundreds of distinct populations. We present data for more than 580,000 SNPs for several hunter-gatherer populations: the Hadza and Sandawe of Tanzania, and the ≠Khomani Bushmen of South Africa, including speakers of the nearly extinct N|u language. We find that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations. Hunter-gatherer populations also tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations. We analyzed geographic patterns of linkage disequilibrium and population differentiation, as measured by F(ST), in Africa. The observed patterns are consistent with an origin of modern humans in southern Africa rather than eastern Africa, as is generally assumed. Additionally, genetic variation in African hunter-gatherer populations has been significantly affected by interaction with farmers and herders over the past 5,000 y, through both severe population bottlenecks and sex-biased migration. However, African hunter-gatherer populations continue to maintain the highest levels of genetic diversity in the world.


Subject(s)
Biological Evolution , Black People/genetics , Genetic Variation , Genetics, Population , Polymorphism, Single Nucleotide , Africa , Culture , Ethnicity/genetics , Genome, Human , Humans , Linkage Disequilibrium
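Entry 19 measures population differentiation with F(ST). As a reminder of what that statistic captures, the Python sketch below computes Wright's F_ST = (H_T - H_S) / H_T at a single biallelic locus for two equally weighted populations. It is an illustration under simplifying assumptions (known allele frequencies, no sample-size correction, a hypothetical function name), not the estimator used in the study; estimators applied to real SNP data correct for finite samples and combine many loci.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's F_ST at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # H_S: mean expected heterozygosity within populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    # H_T: expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall, so there is no differentiation to measure
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_two_populations(0.2, 0.8))
```

Values near 0 indicate populations with similar allele frequencies, and values near 1 indicate populations fixed for different alleles.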
20.
bioRxiv ; 2024 May 10.
Article in English | MEDLINE | ID: mdl-38766004

ABSTRACT

Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for various genetic analyses. In this study, we first benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million diverse, research-consented 23andMe, Inc. customers and the UK Biobank (UKB). We find that both perform exceptionally well. Beagle's median switch error rate (SER) (after excluding single SNP switches) in white British trios from UKB is 0.026% compared to 0.00% for European ancestry 23andMe research participants; 55.6% of European ancestry 23andMe research participants have zero non-single SNP switches, compared to 42.4% of white British trios. South Asian ancestry 23andMe research participants have the highest median SER amongst the 23andMe populations, but it is still remarkably low at 0.46%. We also investigate the relationship between identity-by-descent (IBD) and SER, finding that switch errors tend to occur in regions of little or no IBD segment coverage. SHAPEIT and Beagle excel at 'intra-chromosomal' phasing, but lack the ability to phase across chromosomes, motivating us to develop an inter-chromosomal phasing method, called HAPTIC (HAPlotype TIling and Clustering), that assigns paternal and maternal variants discretely genome-wide. Our approach uses identity-by-descent (IBD) segments to phase blocks of variants on different chromosomes. HAPTIC represents the segments a focal individual shares with their relatives as nodes in a signed graph and performs bipartite clustering on the signed graph using spectral clustering. We test HAPTIC on 1022 UKB trios, yielding a median phase error of 0.08% in regions covered by IBD segments (33.5% of sites). We also ran HAPTIC in the 23andMe database and found a median phase error rate (the rate of mismatching alleles between the inferred and true phase) of 0.92% in Europeans (93.8% of sites) and 0.09% in admixed Africans (92.7% of sites). HAPTIC's precision depends heavily on data from relatives, so will increase as datasets grow larger and more diverse. HAPTIC enables analyses that require the parent-of-origin of variants, such as association studies and ancestry inference of untyped parents.
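The two accuracy measures quoted in entry 20, switch error rate and phase error rate, can be made concrete with a short Python sketch. It rests on assumptions not stated in the abstract: each haplotype is given as the allele (0 or 1) carried on "haplotype 1" at each heterozygous site, the phase error is taken under the better of the two possible haplotype labelings, and the single-SNP-switch exclusion mentioned in the abstract is not applied. The function names are hypothetical.

```python
from typing import Sequence


def switch_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs where the phase orientation flips."""
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return 0.0
    # agree[i] is True when inferred haplotype 1 carries the same allele as true haplotype 1.
    agree = [t == h for t, h in zip(true_hap, inferred_hap)]
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(agree) - 1)


def phase_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> float:
    """Fraction of sites with mismatching alleles, under the better haplotype labeling."""
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    n = len(true_hap)
    if n == 0:
        return 0.0
    mismatches = sum(1 for t, h in zip(true_hap, inferred_hap) if t != h)
    # Swapping the two inferred haplotypes turns every mismatch into a match and vice versa.
    return min(mismatches, n - mismatches) / n


if __name__ == "__main__":
    truth = [0, 1, 0, 1, 1, 0]
    inferred = [0, 1, 1, 0, 0, 1]  # correct for two sites, then flipped for the rest
    print(switch_error_rate(truth, inferred), phase_error_rate(truth, inferred))
```

The example shows why the two measures differ: a single switch can leave a long stretch of sites on the wrong haplotype, so one switch error may produce many mismatching alleles.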
