Results 1 - 20 of 97
1.
Nat Med ; 30(5): 1384-1394, 2024 May.
Article in English | MEDLINE | ID: mdl-38740997

ABSTRACT

How human genetic variation contributes to vaccine effectiveness in infants is unclear, and data are limited on these relationships in populations with African ancestries. We undertook genetic analyses of vaccine antibody responses in infants from Uganda (n = 1391), Burkina Faso (n = 353) and South Africa (n = 755), identifying associations between human leukocyte antigen (HLA) and antibody response for five of eight tested antigens spanning pertussis, diphtheria and hepatitis B vaccines. In addition, through HLA typing 1,702 individuals from 11 populations of African ancestry derived predominantly from the 1000 Genomes Project, we constructed an imputation resource, fine-mapping class II HLA-DR and DQ associations explaining up to 10% of antibody response variance in our infant cohorts. We observed differences in the genetic architecture of pertussis antibody response between the cohorts with African ancestries and an independent cohort with European ancestry, but found no in silico evidence of differences in HLA peptide binding affinity or breadth. Using immune cell expression quantitative trait loci datasets derived from African-ancestry samples from the 1000 Genomes Project, we found evidence of differential HLA-DRB1 expression correlating with inferred protection from pertussis following vaccination. This work suggests that HLA-DRB1 expression may play a role in vaccine response and should be considered alongside peptide selection to improve vaccine design.


Subject(s)
HLA-DRB1 Chains , Humans , HLA-DRB1 Chains/genetics , HLA-DRB1 Chains/immunology , Infant , Black People/genetics , Hepatitis B Vaccines/immunology , Quantitative Trait Loci , Male , Female , Uganda , Antibody Formation/genetics , Antibody Formation/immunology , Pertussis Vaccine/immunology , Pertussis Vaccine/genetics , Vaccination , Whooping Cough/prevention & control , Whooping Cough/immunology , Whooping Cough/genetics
2.
Am J Hum Genet ; 111(2): 295-308, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38232728

ABSTRACT

Infectious agents contribute significantly to the global burden of diseases through both acute infection and their chronic sequelae. We leveraged the UK Biobank to identify genetic loci that influence humoral immune response to multiple infections. From 45 genome-wide association studies in 9,611 participants from UK Biobank, we identified NFKB1 as a locus associated with quantitative antibody responses to multiple pathogens, including those from the herpes, retro-, and polyoma-virus families. An insertion-deletion variant thought to affect NFKB1 expression (rs28362491) was mapped as the likely causal variant and could play a key role in regulation of the immune response. Using 121 infection- and inflammation-related traits in 487,297 UK Biobank participants, we show that the deletion allele was associated with an increased risk of infection from diverse pathogens but had a protective effect against allergic disease. We propose that altered expression of NFKB1, as a result of the deletion, modulates hematopoietic pathways and likely impacts cell survival, antibody production, and inflammation. Taken together, we show that disruptions to the tightly regulated immune processes may tip the balance between exacerbated immune responses and allergy, or increased risk of infection and impaired resolution of inflammation.


Subject(s)
Genetic Predisposition to Disease , Hypersensitivity , Inflammation , Humans , Genome-Wide Association Study , Hypersensitivity/genetics , Inflammation/genetics , NF-kappa B p50 Subunit/genetics , UK Biobank
3.
Nat Genet ; 55(11): 1854-1865, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37814053

ABSTRACT

The analysis of longitudinal data from electronic health records (EHRs) has the potential to improve clinical diagnoses and enable personalized medicine, motivating efforts to identify disease subtypes from patient comorbidity information. Here we introduce an age-dependent topic modeling (ATM) method that provides a low-rank representation of longitudinal records of hundreds of distinct diseases in large EHR datasets. We applied ATM to 282,957 UK Biobank samples, identifying 52 diseases with heterogeneous comorbidity profiles; analyses of 211,908 All of Us samples produced concordant results. We defined subtypes of the 52 heterogeneous diseases based on their comorbidity profiles and compared genetic risk across disease subtypes using polygenic risk scores (PRSs), identifying 18 disease subtypes whose PRS differed significantly from other subtypes of the same disease. We further identified specific genetic variants with subtype-dependent effects on disease risk. In conclusion, ATM identifies disease subtypes with differential genome-wide and locus-specific genetic risk profiles.


Subject(s)
Genetic Predisposition to Disease , Population Health , Humans , Biological Specimen Banks , Genome-Wide Association Study/methods , Risk Factors , Comorbidity , Multifactorial Inheritance/genetics , United Kingdom/epidemiology
4.
Cell Genom ; 3(8): 100371, 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37601973

ABSTRACT

Many diseases show patterns of co-occurrence, possibly driven by systemic dysregulation of underlying processes affecting multiple traits. We have developed a method (treeLFA) for identifying such multimorbidities from routine health-care data, which combines topic modeling with an informative prior derived from medical ontology. We apply treeLFA to UK Biobank data and identify a variety of topics representing multimorbidity clusters, including a healthy topic. We find that loci identified using topic weights as traits in a genome-wide association study (GWAS) analysis, which we validated with a range of approaches, only partially overlap with loci from GWASs on constituent single diseases. We also show that treeLFA improves upon existing methods like latent Dirichlet allocation in various ways. Overall, our findings indicate that topic models can characterize multimorbidity patterns and that genetic analysis of these patterns can provide insight into the etiology of complex traits that cannot be determined from the analysis of constituent traits alone.

5.
Nat Commun ; 14(1): 4023, 2023 07 07.
Article in English | MEDLINE | ID: mdl-37419925

ABSTRACT

Polygenic scores (PGSs) are individual-level measures that aggregate the genome-wide genetic predisposition to a given trait. As PGS have predominantly been developed using European-ancestry samples, trait prediction using such European ancestry-derived PGS is less accurate in non-European ancestry individuals. Although there has been recent progress in combining multiple PGS trained on distinct populations, the problem of how to maximize performance given a multiple-ancestry cohort is largely unexplored. Here, we investigate the effect of sample size and ancestry composition on PGS performance for fifteen traits in UK Biobank. For some traits, PGS estimated using a relatively small African-ancestry training set outperformed, on an African-ancestry test set, PGS estimated using a much larger European-ancestry only training set. We observe similar, but not identical, results when considering other minority-ancestry groups within UK Biobank. Our results emphasise the importance of targeted data collection from underrepresented groups in order to address existing disparities in PGS performance.


Subject(s)
Black People , Genetics, Population , Multifactorial Inheritance , Humans , Black People/genetics , Data Collection , Genetic Predisposition to Disease , Genome-Wide Association Study , Minority Groups
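
As an illustration of the definition in the abstract above, a polygenic score is commonly computed as a weighted sum of an individual's effect-allele dosages. The sketch below assumes that simple additive form; the function name, the weights and the genotype are hypothetical and do not come from the article.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive score: sum over variants of (effect-allele dosage x weight).

    Dosages are copies of the effect allele (0, 1 or 2, or imputed fractions).
    Weights are per-variant effect sizes estimated in a training cohort.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (assumed, for illustration only).
example_weights = [0.5, -0.25, 1.0]
# Hypothetical genotype of one individual at the same three variants.
example_dosages = [2, 1, 0]
example_score = polygenic_score(example_dosages, example_weights)
```

Because the weights are estimated in a training cohort, the same genotype can receive different scores under weights trained in cohorts of different ancestry, which is the setting the abstract investigates.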
6.
Nat Commun ; 13(1): 4398, 2022 07 29.
Article in English | MEDLINE | ID: mdl-35906236

ABSTRACT

Fetal growth restriction (FGR) affects 5-10% of pregnancies, and can have serious consequences for both mother and child. Prevention and treatment are limited because FGR pathogenesis is poorly understood. Genetic studies implicate KIR and HLA genes in FGR; however, linkage disequilibrium, genetic influence from both parents, and challenges with investigating human pregnancies make the risk alleles and their functional effects difficult to map. Here, we demonstrate that the interaction between the maternal KIR2DL1, expressed on uterine natural killer (NK) cells, and the paternally inherited HLA-C*0501, expressed on fetal trophoblast cells, leads to FGR in a humanized mouse model. We show that the KIR2DL1 and C*0501 interaction leads to pathogenic uterine arterial remodeling and modulation of uterine NK cell function. This initial effect cascades to altered transcriptional expression and intercellular communication at the maternal-fetal interface. These findings provide mechanistic insight into specific FGR risk alleles, and provide avenues of prevention and treatment.


Subject(s)
Fetal Growth Retardation , Trophoblasts , Animals , Cell Communication/genetics , Female , Fetal Growth Retardation/genetics , Fetal Growth Retardation/metabolism , Fetus/metabolism , HLA-C Antigens/genetics , HLA-C Antigens/metabolism , Mice , Pregnancy , Trophoblasts/metabolism
7.
PLoS Biol ; 20(5): e3001669, 2022 05.
Article in English | MEDLINE | ID: mdl-35639797

ABSTRACT

The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. As opposed to the situation during the majority of the 20th century, in which the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. Thereby, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.


Subject(s)
Genomics , Metagenomics , Genomics/methods
8.
Nat Commun ; 13(1): 1818, 2022 04 05.
Article in English | MEDLINE | ID: mdl-35383168

ABSTRACT

Certain infectious agents are recognised causes of cancer and other chronic diseases. To understand the pathological mechanisms underlying such relationships, here we design a Multiplex Serology platform to measure quantitative antibody responses against 45 antigens from 20 infectious agents including human herpes, hepatitis, polyoma, papilloma, and retroviruses, as well as Chlamydia trachomatis, Helicobacter pylori and Toxoplasma gondii, then assayed a random subset of 9695 UK Biobank participants. We find seroprevalence estimates consistent with those expected from prior literature and confirm multiple associations of antibody responses with sociodemographic characteristics (e.g., lifetime sexual partners with C. trachomatis), HLA genetic variants (rs6927022 with Epstein-Barr virus (EBV) EBNA1 antibodies) and disease outcomes (human papillomavirus-16 seropositivity with cervical intraepithelial neoplasia, and EBV responses with multiple sclerosis). Our accessible dataset is one of the largest incorporating diverse infectious agents in a prospective UK cohort offering opportunities to improve our understanding of host-pathogen-disease relationships with significant clinical and public health implications.


Subject(s)
Epstein-Barr Virus Infections , Uterine Cervical Neoplasms , Biological Specimen Banks , Female , Herpesvirus 4, Human/genetics , Humans , Prospective Studies , Seroepidemiologic Studies , United Kingdom/epidemiology
9.
Science ; 375(6583): eabi8264, 2022 02 25.
Article in English | MEDLINE | ID: mdl-35201891

ABSTRACT

The sequencing of modern and ancient genomes from around the world has revolutionized our understanding of human history and evolution. However, the problem of how best to characterize ancestral relationships from the totality of human genomic variation remains unsolved. Here, we address this challenge with nonparametric methods that enable us to infer a unified genealogy of modern and ancient humans. This compact representation of multiple datasets explores the challenges of missing and erroneous data and uses ancient samples to constrain and date relationships. We demonstrate the power of the method to recover relationships between individuals and populations as well as to identify descendants of ancient samples. Finally, we introduce a simple nonparametric estimator of the geographical location of ancestors that recapitulates key events in human history.


Subject(s)
DNA, Ancient , Genome, Human , Genomics , Pedigree , Africa , Chromosomes, Human, Pair 20/genetics , Computer Simulation , Databases, Nucleic Acid , Datasets as Topic , Evolution, Molecular , Genetic Variation , Genetics, Population , Geography , Haplotypes , Human Migration , Humans , Mutation , Sequence Analysis, DNA , Spatio-Temporal Analysis , Statistics, Nonparametric
10.
Nat Genet ; 53(11): 1543-1552, 2021 11.
Article in English | MEDLINE | ID: mdl-34741163

ABSTRACT

Irritable bowel syndrome (IBS) results from disordered brain-gut interactions. Identifying susceptibility genes could highlight the underlying pathophysiological mechanisms. We designed a digestive health questionnaire for UK Biobank and combined identified cases with IBS with independent cohorts. We conducted a genome-wide association study with 53,400 cases and 433,201 controls and replicated significant associations in a 23andMe panel (205,252 cases and 1,384,055 controls). Our study identified and confirmed six genetic susceptibility loci for IBS. Implicated genes included NCAM1, CADM2, PHF2/FAM120A, DOCK9, CKAP2/TPTE2P3 and BAG6. The first four are associated with mood and anxiety disorders, expressed in the nervous system, or both. Mirroring this, we also found strong genome-wide correlation between the risk of IBS and anxiety, neuroticism and depression (rg > 0.5). Additional analyses suggested this arises due to shared pathogenic pathways rather than, for example, anxiety causing abdominal symptoms. Implicated mechanisms require further exploration to help understand the altered brain-gut interactions underlying IBS.


Subject(s)
Anxiety Disorders/genetics , Irritable Bowel Syndrome/genetics , Mood Disorders/genetics , Aged , CD56 Antigen/genetics , Cell Adhesion Molecules/genetics , Cytoskeletal Proteins/genetics , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Guanine Nucleotide Exchange Factors/genetics , Homeodomain Proteins/genetics , Humans , Irritable Bowel Syndrome/epidemiology , Male , Middle Aged , Molecular Chaperones/genetics , Polymorphism, Single Nucleotide , United Kingdom/epidemiology
11.
PLoS Genet ; 17(8): e1009723, 2021 08.
Article in English | MEDLINE | ID: mdl-34437535

ABSTRACT

Inherited genetic variation contributes to individual risk for many complex diseases and is increasingly being used for predictive patient stratification. Previous work has shown that genetic factors are not equally relevant to human traits across age and other contexts, though the reasons for such variation are not clear. Here, we introduce methods to infer the form of the longitudinal relationship between genetic relative risk for disease and age and to test whether all genetic risk factors behave similarly. We use a proportional hazards model within an interval-based censoring methodology to estimate age-varying individual variant contributions to genetic relative risk for 24 common diseases within the British ancestry subset of UK Biobank, applying a Bayesian clustering approach to group variants by their relative risk profile over age and permutation tests for age dependency and multiplicity of profiles. We find evidence for age-varying relative risk profiles in nine diseases, including hypertension, skin cancer, atherosclerotic heart disease, hypothyroidism and calculus of gallbladder, several of which show evidence, albeit weak, for multiple distinct profiles of genetic relative risk. The predominant pattern shows genetic risk factors having the greatest relative impact on risk of early disease, with a monotonic decrease over time, at least for the majority of variants, although the magnitude and form of the decrease varies among diseases. As a consequence, for diseases where genetic relative risk decreases over age, genetic risk factors have stronger explanatory power among younger populations, compared to older ones. We show that these patterns cannot be explained by a simple model involving the presence of unobserved covariates such as environmental factors. We discuss possible models that can explain our observations and the implications for genetic risk prediction.


Subject(s)
Age Factors , Disease/genetics , Bayes Theorem , Humans , Models, Statistical , Proportional Hazards Models , Risk Factors
12.
PLoS Comput Biol ; 17(8): e1009287, 2021 08.
Article in English | MEDLINE | ID: mdl-34411093

ABSTRACT

There is an abundance of malaria genetic data being collected from the field, yet using these data to understand the drivers of regional epidemiology remains a challenge. A key issue is the lack of models that relate parasite genetic diversity to epidemiological parameters. Classical models in population genetics characterize changes in genetic diversity in relation to demographic parameters, but fail to account for the unique features of the malaria life cycle. In contrast, epidemiological models, such as the Ross-Macdonald model, capture malaria transmission dynamics but do not consider genetics. Here, we have developed an integrated model encompassing both parasite evolution and regional epidemiology. We achieve this by combining the Ross-Macdonald model with an intra-host continuous-time Moran model, thus explicitly representing the evolution of individual parasite genomes in a traditional epidemiological framework. Implemented as a stochastic simulation, we use the model to explore relationships between measures of parasite genetic diversity and parasite prevalence, a widely-used metric of transmission intensity. First, we explore how varying parasite prevalence influences genetic diversity at equilibrium. We find that multiple genetic diversity statistics are correlated with prevalence, but the strength of the relationships depends on whether variation in prevalence is driven by host- or vector-related factors. Next, we assess the responsiveness of a variety of statistics to malaria control interventions, finding that those related to mixed infections respond quickly (∼months) whereas other statistics, such as nucleotide diversity, may take decades to respond. These findings provide insights into the opportunities and challenges associated with using genetic data to monitor malaria epidemiology.


Subject(s)
Genetic Variation , Malaria, Falciparum/epidemiology , Plasmodium falciparum/pathogenicity , Animals , Humans , Malaria, Falciparum/parasitology , Malaria, Falciparum/transmission , Models, Theoretical , Plasmodium falciparum/genetics , Prevalence
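
For orientation, the classical Ross-Macdonald component mentioned in the abstract above can be sketched as two coupled equations for the infected fraction of hosts (x) and of vectors (z). This is a textbook simplification without an incubation delay, integrated with a plain Euler step; it is not the article's stochastic model, it contains no Moran-model genetics, and all parameter values are assumed for illustration.

```python
def ross_macdonald(m, a, b, c, r, g, x0=0.01, z0=0.0, dt=0.05, days=1000):
    """Euler integration of a simplified Ross-Macdonald model.

    m: mosquitoes per host        a: bites per mosquito per day
    b: vector-to-host infection probability per bite
    c: host-to-vector infection probability per bite
    r: host recovery rate per day g: mosquito death rate per day
    Returns the infected fractions (x, z) of hosts and vectors at the end.
    """
    x, z = x0, z0
    for _ in range(int(days / dt)):
        dx = m * a * b * z * (1.0 - x) - r * x   # new host infections minus recoveries
        dz = a * c * x * (1.0 - z) - g * z       # new vector infections minus deaths
        x += dt * dx
        z += dt * dz
    return x, z


def endemic_host_prevalence(m, a, b, c, r, g):
    """Closed-form equilibrium host prevalence of the same simplified model."""
    r0 = m * a * a * b * c / (r * g)             # basic reproduction number
    if r0 <= 1.0:
        return 0.0
    return (r0 - 1.0) / (r0 + a * c / g)


# Assumed parameter values, chosen only to give an endemic equilibrium.
params = dict(m=10.0, a=0.3, b=0.5, c=0.5, r=0.01, g=0.1)
x_final, z_final = ross_macdonald(**params)
x_star = endemic_host_prevalence(**params)
```

In this simplified form, prevalence can be changed through host-related parameters (such as r) or vector-related parameters (such as m, a or g), which is the distinction the abstract draws when relating prevalence to genetic diversity.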
13.
Am J Cardiol ; 148: 157-164, 2021 06 01.
Article in English | MEDLINE | ID: mdl-33675770

ABSTRACT

The American College of Cardiology / American Heart Association pooled cohort equations tool (ASCVD-PCE) is currently recommended to assess 10-year risk for atherosclerotic cardiovascular disease (ASCVD). ASCVD-PCE does not currently include genetic risk factors. Polygenic risk scores (PRSs) have been shown to offer a powerful new approach to measuring genetic risk for common diseases, including ASCVD, and to enhance risk prediction when combined with ASCVD-PCE. Most work to date, including the assessment of tools, has focused on performance in individuals of European ancestries. Here we present evidence for the clinical validation of a new integrated risk tool (IRT), ASCVD-IRT, which combines ASCVD-PCE with PRS to predict 10-year risk of ASCVD across diverse ethnicity and ancestry groups. We demonstrate improved predictive performance of ASCVD-IRT over ASCVD-PCE, not only in individuals of self-reported White ethnicities (net reclassification improvement [NRI]; with 95% confidence interval = 2.7% [1.1 to 4.2]) but also Black / African American / Black Caribbean / Black African (NRI = 2.5% [0.6-4.3]) and South Asian (Indian, Bangladeshi or Pakistani) ethnicities (NRI = 8.7% [3.1 to 14.4]). NRI confidence intervals were wider and included zero for ethnicities with smaller sample sizes, including Hispanic (NRI = 7.5% [-1.4 to 16.5]), but PRS effect sizes in these ethnicities were significant and of comparable size to those seen in individuals of White ethnicities. Comparable results were obtained when individuals were analyzed by genetically inferred ancestry. Together, these results validate the performance of ASCVD-IRT in multiple ethnicities and ancestries, and favor their generalization to all ethnicities and ancestries.


Subject(s)
Atherosclerosis/epidemiology , Genetic Predisposition to Disease , Heart Disease Risk Factors , Adult , Aged , Asia, Western , Asian People , Atherosclerosis/ethnology , Atherosclerosis/genetics , Black People , Cohort Studies , Female , Humans , Male , Middle Aged , Reproducibility of Results , White People
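
The net reclassification improvement (NRI) reported in the abstract above is commonly computed, in its categorical form, as the net proportion of events moved to a higher risk category plus the net proportion of non-events moved to a lower one. The sketch below assumes that categorical definition; the abstract does not state which NRI variant or risk threshold was used, and the data are invented for illustration.

```python
def net_reclassification_improvement(events, old_category, new_category):
    """Categorical NRI comparing a new risk model against an old one.

    events: iterable of booleans (True if the individual had the outcome).
    old_category / new_category: ordinal risk categories (higher = riskier).
    """
    up_e = down_e = n_e = 0      # movements among events
    up_n = down_n = n_n = 0      # movements among non-events
    for is_event, old, new in zip(events, old_category, new_category):
        if is_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Invented example: 4 events and 4 non-events, categories 0 (low) and 1 (high).
example_events = [True, True, True, True, False, False, False, False]
example_old = [0, 0, 1, 1, 0, 0, 1, 1]
example_new = [1, 1, 1, 0, 0, 0, 0, 1]
example_nri = net_reclassification_improvement(example_events, example_old, example_new)
```

Here two events move up and one moves down (net 1 of 4), and one non-event moves down (net 1 of 4), so the example NRI is 0.5, or 50%.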
14.
Genome Res ; 30(8): 1154-1169, 2020 08.
Article in English | MEDLINE | ID: mdl-32817236

ABSTRACT

The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context required for resolution, and mapping approaches, in which improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently unfeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multisample, colored de Bruijn graphs from short-read data for all samples, align long-read-derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel nonallelic homologous recombination events.


Subject(s)
Genome, Protozoan/genetics , High-Throughput Nucleotide Sequencing/methods , Mutation/genetics , Plasmodium falciparum/genetics , Whole Genome Sequencing/methods , Algorithms , Base Sequence , Genetic Variation/genetics , Sequence Alignment , Sequence Analysis, DNA/methods , Software
15.
PLoS Genet ; 16(5): e1008619, 2020 05.
Article in English | MEDLINE | ID: mdl-32369493

ABSTRACT

Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past.


Subject(s)
Algorithms , Base Sequence/physiology , Genetics, Population/methods , Genome-Wide Association Study/methods , Models, Genetic , Cohort Studies , Computer Simulation , Evolution, Molecular , Genome/genetics , Genome-Wide Association Study/statistics & numerical data , Humans , Linkage Disequilibrium , Recombination, Genetic/physiology , Sample Size
16.
PLoS Biol ; 18(1): e3000586, 2020 01.
Article in English | MEDLINE | ID: mdl-31951611

ABSTRACT

The origin and fate of new mutations within species is the fundamental process underlying evolution. However, while much attention has been focused on characterizing the presence, frequency, and phenotypic impact of genetic variation, the evolutionary histories of most variants are largely unexplored. We have developed a nonparametric approach for estimating the date of origin of genetic variants in large-scale sequencing data sets. The accuracy and robustness of the approach are demonstrated through simulation. Using data from two publicly available human genomic diversity resources, we estimated the age of more than 45 million single-nucleotide polymorphisms (SNPs) in the human genome and release the Atlas of Variant Age as a public online database. We characterize the relationship between variant age and frequency in different geographical regions and demonstrate the value of age information in interpreting variants of functional and selective importance. Finally, we use allele age estimates to power a rapid approach for inferring the ancestry shared between individual genomes and to quantify genealogical relationships at different points in the past, as well as to describe and explore the evolutionary history of modern human populations.


Subject(s)
Genetic Speciation , Genetics, Population/methods , Polymorphism, Single Nucleotide , Racial Groups/genetics , Age Factors , Alleles , Computer Simulation , Datasets as Topic , Evolution, Molecular , Gene Frequency , Genetic Variation , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Pedigree , Phylogeny , Sequence Analysis, DNA , Statistics as Topic/methods , Time Factors
17.
Nat Genet ; 52(1): 126-134, 2020 01.
Article in English | MEDLINE | ID: mdl-31873298

ABSTRACT

Genetic risk factors frequently affect multiple common human diseases, providing insight into shared pathophysiological pathways and opportunities for therapeutic development. However, systematic identification of genetic profiles of disease risk is limited by the availability of both comprehensive clinical data on population-scale cohorts and the lack of suitable statistical methodology that can handle the scale of and differential power inherent in multi-phenotype data. Here, we develop a disease-agnostic approach to cluster the genetic risk profiles for 3,025 genome-wide independent loci across 19,155 disease classification codes from 320,644 participants in the UK Biobank, representing a large and heterogeneous population. We identify 339 distinct disease association profiles and use multiple approaches to link clusters to the underlying biological pathways. We show how clusters can decompose the variance and covariance in risk for disease, thereby identifying underlying biological processes and their impact. We demonstrate the use of clusters in defining disease relationships and their potential in informing therapeutic strategies.


Subject(s)
Biological Specimen Banks , Genetic Diseases, Inborn/genetics , Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Adult , Aged , Female , Gene-Environment Interaction , Humans , Male , Middle Aged , Phenotype , Prospective Studies , Risk Factors , United Kingdom
18.
Nat Genet ; 51(11): 1660, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31591513

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

19.
Nat Genet ; 51(9): 1330-1338, 2019 09.
Article in English | MEDLINE | ID: mdl-31477934

ABSTRACT

Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because this history encodes information about the events and forces that have influenced a species. However, current methods are limited, and the most accurate techniques are able to process no more than a hundred samples. As datasets that consist of millions of genomes are now being collected, there is a need for scalable and efficient inference methods to fully utilize these resources. Here we introduce an algorithm that is able to not only infer whole-genome histories with comparable accuracy to the state-of-the-art but also process four orders of magnitude more sequences. The approach also provides an 'evolutionary encoding' of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.


Subject(s)
Algorithms , Evolution, Molecular , Genetics, Population , Genome, Human , Pedigree , Selection, Genetic , Computer Simulation , Datasets as Topic , Haplotypes , Humans , Models, Genetic , Mutation , Polymorphism, Single Nucleotide , Population Density
20.
Nat Commun ; 10(1): 3017, 2019 07 09.
Article in English | MEDLINE | ID: mdl-31289267

ABSTRACT

Differences among hosts, resulting from genetic variation in the immune system or heterogeneity in drug treatment, can impact within-host pathogen evolution. Genetic association studies can potentially identify such interactions. However, extensive and correlated genetic population structure in hosts and pathogens presents a substantial risk of confounding analyses. Moreover, the multiple testing burden of interaction scanning can potentially limit power. We present a Bayesian approach for detecting host influences on pathogen evolution that exploits vast existing data sets of pathogen diversity to improve power and control for stratification. The approach models key processes, including recombination and selection, and identifies regions of the pathogen genome affected by host factors. Our simulations and empirical analysis of drug-induced selection on the HIV-1 genome show that the method recovers known associations and has superior precision-recall characteristics compared to other approaches. We build a high-resolution map of HLA-induced selection in the HIV-1 genome, identifying novel epitope-allele combinations.


Subject(s)
Evolution, Molecular , HIV-1/genetics , HLA Antigens/immunology , Host-Pathogen Interactions/genetics , Models, Genetic , Anti-HIV Agents/pharmacology , Anti-HIV Agents/therapeutic use , Bayes Theorem , Datasets as Topic , Epitopes/drug effects , Epitopes/genetics , Epitopes/immunology , Genome, Viral/drug effects , HIV Infections/drug therapy , HIV Infections/immunology , HIV Infections/virology , HIV-1/drug effects , HIV-1/immunology , Host-Pathogen Interactions/immunology , Humans , Recombination, Genetic/drug effects , Recombination, Genetic/immunology , Selection, Genetic/drug effects , Selection, Genetic/immunology