ABSTRACT
Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant global health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including data for 563,085 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering a range of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation.
Subjects
Genetic Predisposition to Disease/genetics , Multifactorial Inheritance/genetics , Female , Gene Regulatory Networks/genetics , Genome-Wide Association Study/methods , Hematopoiesis/genetics , Humans , Male , Phenotype , Single Nucleotide Polymorphism/genetics
ABSTRACT
The origin and fate of new mutations within species is the fundamental process underlying evolution. However, while much attention has been focused on characterizing the presence, frequency, and phenotypic impact of genetic variation, the evolutionary histories of most variants are largely unexplored. We have developed a nonparametric approach for estimating the date of origin of genetic variants in large-scale sequencing data sets. The accuracy and robustness of the approach are demonstrated through simulation. Using data from two publicly available human genomic diversity resources, we estimated the age of more than 45 million single-nucleotide polymorphisms (SNPs) in the human genome and release the Atlas of Variant Age as a public online database. We characterize the relationship between variant age and frequency in different geographical regions and demonstrate the value of age information in interpreting variants of functional and selective importance. Finally, we use allele age estimates to power a rapid approach for inferring the ancestry shared between individual genomes and to quantify genealogical relationships at different points in the past, as well as to describe and explore the evolutionary history of modern human populations.
Subjects
Genetic Speciation , Population Genetics/methods , Single Nucleotide Polymorphism , Racial Groups/genetics , Age Factors , Alleles , Computer Simulation , Datasets as Topic , Molecular Evolution , Gene Frequency , Genetic Variation , Human Genome , High-Throughput Nucleotide Sequencing , Humans , Pedigree , Phylogeny , DNA Sequence Analysis , Statistics as Topic/methods , Time Factors
ABSTRACT
Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α = 2.5 × 10⁻⁶) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
Subjects
Inborn Genetic Diseases , Genetic Variation , Genome-Wide Association Study , Theoretical Models , Alleles , Computer Simulation , Type 2 Diabetes Mellitus/genetics , Exome/genetics , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Phenotype
ABSTRACT
The Atlantic Forest (AF) harbours one of the most diverse vertebrate faunas of the world, including 199 endemic species of birds. Understanding the evolutionary processes behind such diversity has become the focus of many recent, primarily single locus, phylogeographic studies. These studies suggest that isolation in forest refugia may have been a major mechanism promoting diversification, although there is also support for a role of riverine and geotectonic barriers, two sets of hypotheses that can best be tested with multilocus data. Here we combined multilocus data (one mtDNA marker and eight anonymous nuclear loci) from two species of parapatric antbirds, Myrmeciza loricata and M. squamosa, and Approximate Bayesian Computation to determine whether isolation in refugia explains current patterns of genetic variation and their status as independent evolutionary units. Patterns of population structure, differences in intraspecific levels of divergence and coalescent estimates of historical demography fit the predictions of a recently proposed model of refuge isolation in which climatic stability in the northern AF sustains higher diversity and demographic stability than in the southern AF. However, a pre-Pleistocene divergence associated with their abutting range limits in a region of past tectonic activity also suggests a role for rivers or geotectonic barriers. Little or no gene flow between these species suggests the development of reproductive barriers or competitive exclusion. Our results suggest that limited marker sampling in recent AF studies may compromise estimates of divergence times and historical demography, and we discuss the effects of such sampling on this and other studies.
Subjects
Biodiversity , Mitochondrial DNA/genetics , Genetic Variation , Passeriformes/genetics , Animals , Ecosystem , Molecular Evolution , Genetic Markers/genetics , Haplotypes/genetics , Phylogeography , Reproductive Isolation , DNA Sequence Analysis
ABSTRACT
COVID-19 is a respiratory illness caused by a novel coronavirus called SARS-CoV-2. The viral spike (S) protein engages the human angiotensin-converting enzyme 2 (ACE2) receptor to invade host cells with ~10-15-fold higher affinity compared to SARS-CoV S-protein, making it highly infectious. Here, we assessed whether ACE2 polymorphisms can alter host susceptibility to SARS-CoV-2 by affecting this interaction. We analyzed over 290,000 samples representing >400 population groups from public genomic datasets and identified multiple ACE2 protein-altering variants. Using reported structural data, we identified natural ACE2 variants that could potentially affect virus-host interaction and thereby alter host susceptibility. These include variants S19P, I21V, E23K, K26R, T27A, N64K, T92I, Q102P and H378R that were predicted to increase susceptibility, while variants K31R, N33I, H34R, E35K, E37K, D38V, Y50F, N51S, M62V, K68E, F72V, Y83H, G326E, G352V, D355N, Q388L and D509Y were predicted to be protective variants that show decreased binding to S-protein. Using biochemical assays, we confirmed that K31R and E37K had decreased affinity, and K26R and T92I variants showed increased affinity for S-protein when compared to wildtype ACE2. Consistent with this, soluble ACE2 K26R and T92I were more effective in blocking entry of S-protein pseudotyped virus, suggesting that ACE2 variants can modulate susceptibility to SARS-CoV-2.
Subjects
Angiotensin-Converting Enzyme 2/genetics , COVID-19/genetics , Genetic Predisposition to Disease/genetics , Missense Mutation/genetics , Genetic Polymorphism , Virus Receptors/genetics , Amino Acid Sequence , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/metabolism , COVID-19/metabolism , COVID-19/virology , Host-Pathogen Interactions , Humans , Molecular Models , Protein Binding , Protein Domains , Virus Receptors/chemistry , Virus Receptors/metabolism , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Amino Acid Sequence Homology , Coronavirus Spike Glycoprotein/chemistry , Coronavirus Spike Glycoprotein/metabolism , Virus Internalization
ABSTRACT
Genetic risk factors frequently affect multiple common human diseases, providing insight into shared pathophysiological pathways and opportunities for therapeutic development. However, systematic identification of genetic profiles of disease risk is limited both by the availability of comprehensive clinical data on population-scale cohorts and by the lack of suitable statistical methodology that can handle the scale of and differential power inherent in multi-phenotype data. Here, we develop a disease-agnostic approach to cluster the genetic risk profiles for 3,025 genome-wide independent loci across 19,155 disease classification codes from 320,644 participants in the UK Biobank, representing a large and heterogeneous population. We identify 339 distinct disease association profiles and use multiple approaches to link clusters to the underlying biological pathways. We show how clusters can decompose the variance and covariance in risk for disease, thereby identifying underlying biological processes and their impact. We demonstrate the use of clusters in defining disease relationships and their potential in informing therapeutic strategies.
Subjects
Biological Specimen Banks , Inborn Genetic Diseases/genetics , Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Single Nucleotide Polymorphism , Heritable Quantitative Trait , Adult , Aged , Female , Gene-Environment Interaction , Humans , Male , Middle Aged , Phenotype , Prospective Studies , Risk Factors , United Kingdom
ABSTRACT
Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because this history encodes information about the events and forces that have influenced a species. However, current methods are limited, and the most accurate techniques are able to process no more than a hundred samples. As datasets that consist of millions of genomes are now being collected, there is a need for scalable and efficient inference methods to fully utilize these resources. Here we introduce an algorithm that is able to not only infer whole-genome histories with comparable accuracy to the state-of-the-art but also process four orders of magnitude more sequences. The approach also provides an 'evolutionary encoding' of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.
Subjects
Algorithms , Molecular Evolution , Population Genetics , Human Genome , Pedigree , Genetic Selection , Computer Simulation , Datasets as Topic , Haplotypes , Humans , Genetic Models , Mutation , Single Nucleotide Polymorphism , Population Density
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Hypoxia-inducible factor 1α is a key regulator of the hypoxia response in normal and cancer tissues. It is well recognized to regulate glycolysis and is a target for therapy. However, how tumor cells adapt to grow in the absence of HIF1α is poorly understood, and the flexibility of the metabolic response to hypoxia via alternative pathways is an important concept to understand for developing targeted therapies. We analyzed pathways that allow cells to survive hypoxic stress in the absence of HIF1α, using the HCT116 colon cancer cell line with deleted HIF1α versus control. Spheroids were used to provide a 3D model of metabolic gradients. We conducted a metabolomic, transcriptomic, and proteomic analysis and integrated the results. Surprisingly, these showed that in three-dimensional growth, a key regulatory step of glycolysis is Aldolase A rather than phosphofructokinase. Furthermore, glucose uptake could be maintained in hypoxia through upregulation of GLUT14, not previously recognized in this role. Finally, there was a marked adaptation and change of phosphocreatine energy pathways, which made the cells susceptible to inhibition of creatine metabolism in hypoxic conditions. Overall, our studies show a complex adaptation to hypoxia that can bypass HIF1α but is targetable, and they provide new insight into the key metabolic pathways involved in cancer growth. IMPLICATIONS: Under hypoxia and HIF1 blockade, cancer cells adapt their energy metabolism via upregulation of the GLUT14 glucose transporter and creatine metabolism, providing new avenues for drug targeting.