Results 1 - 20 of 27
1.
Nucleic Acids Res ; 51(D1): D1109-D1116, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36243989

ABSTRACT

Structural variations (SVs) play important roles in human evolution and diseases, but there is a lack of data resources concerning representative samples, especially for East Asians. Taking advantage of both next-generation sequencing and third-generation sequencing data at the whole-genome level, we developed the database PGG.SV to provide a practical platform for both regionally and globally representative structural variants. In its current version, PGG.SV archives 584 277 SVs obtained from whole-genome sequencing data of 6048 samples, including 1030 long-read sequencing genomes representing 177 global populations. PGG.SV provides (i) high-quality SVs with fine-scale and precise genomic locations in both GRCh37 and GRCh38, covering underrepresented SVs in existing sequencing and microarray data; (ii) hierarchical estimation of SV prevalence in geographical populations; (iii) informative annotations of SV-related genes, potential functions and clinical effects; (iv) an analysis platform to facilitate SV-based case-control association studies and (v) various visualization tools for understanding the SV structures in the human genome. Taken together, PGG.SV provides a user-friendly online interface, easy-to-use analysis tools and a detailed presentation of results. PGG.SV is freely accessible via https://www.biosino.org/pggsv.


Subjects
Genomics , High-Throughput Nucleotide Sequencing , Humans , Genomics/methods , Whole Genome Sequencing , Human Genome , Genetic Databases , Genomic Structural Variation , DNA Sequence Analysis/methods
2.
Hum Mol Genet ; 27(6): 1067-1077, 2018 03 15.
Article in English | MEDLINE | ID: mdl-29346564

ABSTRACT

Transcriptomic diversity across human populations reflects differential regulatory mechanisms. Allelic-imbalanced gene expression is a genetic regulatory mechanism that contributes to human phenotypic variation. To systematically investigate genome-wide allele-specific expression (ASE), we analyzed RNA-Seq data from European and African populations provided by the Geuvadis project. We identified 11 sites in 8 genes showing ASE in both Europeans and Africans, and 9 sites in 9 genes showing population-specific ASE, including both novel and known ASE signals. Notably, the top signal of differentiated ASE between intercontinental populations was observed in DNAJC15, in which the derived allele of rs12015, a single nucleotide polymorphism (SNP), showed significantly higher expression than the ancestral allele specifically in European individuals. We identified a unique haplotype of DNAJC15, where a few SNPs highly differentiated between European and African populations were strongly linked to sites with high ASE. Among these, SNP rs17553284 affected the binding of several transcription factors as well as the genotype-dependent expression of DNAJC15. Therefore, we speculated that rs17553284 could be a regulatory causal variant that mediates the ASE of rs12015. We found several variations in ASE between intercontinental populations. The highly differentiated ASE genes identified here may be implicated in phenotypic variations among populations that are both evolutionarily and medically important.


Subjects
Black People/genetics , Gene Frequency , White People/genetics , Gene Expression Regulation , Genetic Variation , Genome-Wide Association Study/methods , HSP40 Heat-Shock Proteins/genetics , Haplotypes , Humans , Single Nucleotide Polymorphism , Quantitative Trait Loci , RNA Sequence Analysis/methods , Transcriptome
3.
BMC Genomics ; 20(1): 842, 2019 Nov 12.
Article in English | MEDLINE | ID: mdl-31718558

ABSTRACT

BACKGROUND: Recent advances in genomic technologies have facilitated genome-wide investigation of human genetic variations. However, most efforts have focused on the major populations, and trio genomes of indigenous populations from Southeast Asia have been under-investigated. RESULTS: We analyzed the whole-genome deep sequencing data (~30×) of five native trios from Peninsular Malaysia and North Borneo, and characterized the genomic variants, including single nucleotide variants (SNVs), small insertions and deletions (indels) and copy number variants (CNVs). We discovered approximately 6.9 million SNVs, 1.2 million indels, and 9000 CNVs in the 15 samples, of which 2.7% of SNVs, 2.3% of indels and 22% of CNVs were novel, implying insufficient coverage of population diversity in existing databases. We identified a higher proportion of novel variants in the Orang Asli (OA) samples, i.e., the indigenous people from Peninsular Malaysia, than in the North Bornean (NB) samples, likely due to the more complex demographic history and long-time isolation of the OA groups. We used the pedigree information to identify de novo variants and estimated the autosomal mutation rates to be 0.81 × 10⁻⁸ to 1.33 × 10⁻⁸, 1.0 × 10⁻⁹ to 2.9 × 10⁻⁹, and ~0.001 per site per generation for SNVs, indels, and CNVs, respectively. The trio genomes also allowed for haplotype phasing with high accuracy, which serves as a reference for future genomic studies of OA and NB populations. In addition, high-frequency inherited CNVs specific to OA or NB were identified. One example is a 50-kb duplication in DEFA1B detected only in the Negrito trios, implying plausible effects on host defense against exposure to the diverse microbes in the tropical rainforest environment of these hunter-gatherers. The CNVs shared between OA and NB groups were much fewer than those specific to each group. Nevertheless, we identified a 142-kb duplication in AMY1A in all the 15 samples, and this gene is associated with the high-starch diet. Moreover, novel insertions shared with archaic hominids were identified in our samples. CONCLUSION: Our study presents a full catalogue of the genome variants of the native Malaysian populations, which complements the known genome diversity in Southeast Asians. It implies a specific population history of the native inhabitants, and demonstrates the necessity of more genome sequencing efforts on the multi-ethnic native groups of Malaysia and Southeast Asia.
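
The per-site, per-generation mutation rates quoted in this abstract can be illustrated with a minimal sketch. It assumes the common trio-based estimator (de novo variants divided by twice the number of callable sites, because the child carries two haploid genomes); the abstract does not state which estimator the authors used, and the function name and the input numbers below are hypothetical.

```python
def trio_mutation_rate(de_novo_count, callable_sites, n_trios=1):
    """Estimate a per-site, per-generation mutation rate from trio data.

    Assumption: rate = de novo variants / (2 * callable sites * trios).
    The factor 2 accounts for the two haploid genomes each child inherits.
    """
    if callable_sites <= 0 or n_trios <= 0:
        raise ValueError("callable_sites and n_trios must be positive")
    if de_novo_count < 0:
        raise ValueError("de_novo_count must be non-negative")
    return de_novo_count / (2 * callable_sites * n_trios)


# Hypothetical inputs, not taken from the study:
# 60 de novo SNVs in one trio over 2.6 billion callable autosomal sites.
rate = trio_mutation_rate(de_novo_count=60, callable_sites=2_600_000_000)
print(f"{rate:.3e}")
```

With these made-up inputs the estimate is on the order of 10⁻⁸, the same order of magnitude as the SNV range reported above.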


Subjects
Genetic Variation , Human Genome , Animals , Borneo/ethnology , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Hominidae/genetics , Humans , INDEL Mutation , Malaysia/ethnology , Mutation Rate
4.
Am J Hum Genet ; 99(3): 580-594, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27569548

ABSTRACT

The origin of Tibetans remains one of the most contentious puzzles in history, anthropology, and genetics. Analyses of deeply sequenced (30×-60×) genomes of 38 Tibetan highlanders and 39 Han Chinese lowlanders, together with available data on archaic and modern humans, allow us to comprehensively characterize the ancestral makeup of Tibetans and uncover their origins. Non-modern human sequences compose ∼6% of the Tibetan gene pool and form unique haplotypes in some genomic regions, where Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown ancestries are entangled and elevated. The shared ancestry of Tibetan-enriched sequences dates back to ∼62,000-38,000 years ago, predating the Last Glacial Maximum (LGM) and representing early colonization of the plateau. Nonetheless, most of the Tibetan gene pool is of modern human origin and diverged from that of Han Chinese ∼15,000 to ∼9,000 years ago, which can be largely attributed to post-LGM arrivals. Analysis of ∼200 contemporary populations showed that Tibetans share ancestry with populations from East Asia (∼82%), Central Asia and Siberia (∼11%), South Asia (∼6%), and western Eurasia and Oceania (∼1%). Our results support that Tibetans arose from a mixture of multiple ancestral gene pools but that their origins are much more complicated and ancient than previously suspected. We provide compelling evidence of the co-existence of Paleolithic and Neolithic ancestries in the Tibetan gene pool, indicating a genetic continuity between pre-historical highland-foragers and present-day Tibetans. In particular, highly differentiated sequences harbored in highlanders' genomes were most likely inherited from pre-LGM settlers of multiple ancestral origins (SUNDer) and maintained in high frequency by natural selection.


Subjects
Asian People/genetics , Gene Flow/genetics , Human Genome/genetics , Phylogeny , Altitude , Animals , China/ethnology , Ethnicity/genetics , Gene Pool , Population Genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans , Male , Genetic Models , Neanderthals/genetics , Oceania/ethnology , Genetic Selection , Tibet
6.
Mol Biol Evol ; 34(10): 2572-2582, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28595347

ABSTRACT

The Uyghur people residing in Xinjiang, a territory located in the far west of China and crossed by the Silk Road, are a key ethnic group for understanding the history of human dispersion in Eurasia. Here we assessed the genetic structure and ancestry of 951 Xinjiang Uyghurs (XJU) representing 14 geographical subpopulations. We observed a southwest and northeast differentiation within XJU, which was likely shaped jointly by the Tianshan Mountains, which traverse the region from east to west as a natural barrier, and gene flow from both east and west directions. In XJU, we identified four major ancestral components that were potentially derived from two earlier admixed groups: one from the West, harboring European (25-37%) and South Asian ancestries (12-20%), and the other from the East, with Siberian (15-17%) and East Asian (29-47%) ancestries. By using a newly developed method, MultiWaver, the complex admixture history of XJU was modeled as a two-wave admixture. An ancient wave was dated back to ∼3,750 years ago (ya), which is much earlier than that estimated by previous studies, but fits within the dating range (4,000-2,000 ya) of mummies with European features discovered in the Tarim basin in southern Xinjiang; a more recent wave occurred around 750 ya, which is in agreement with the estimate from a recent study using other methods. We unveiled a more complex scenario of ancestral origins and admixture history in XJU than previously reported, which further suggests Bronze Age massive migrations in Eurasia and East-West contacts across the Silk Road.


Subjects
Asian People/genetics , Ethnicity/genetics , Population Genetics/methods , China/ethnology , Gene Flow , Geography , Haplotypes/genetics , Humans , Phylogeography , Single Nucleotide Polymorphism/genetics , White People/genetics
7.
Am J Hum Genet ; 97(1): 54-66, 2015 Jul 02.
Article in English | MEDLINE | ID: mdl-26073780

ABSTRACT

Tibetan high-altitude adaptation (HAA) has been studied extensively, and many candidate genes have been reported. Subsequent efforts targeting HAA functional variants, however, have not been that successful (e.g., no functional variant has been suggested for the top candidate HAA gene, EPAS1). With WinXPCNVer, a method developed in this study, we detected in microarray data a Tibetan-enriched deletion (TED) carried by 90% of Tibetans; 50% were homozygous for the deletion, whereas only 3% carried the TED and 0% carried the homozygous deletion in 2,792 worldwide samples (p < 10⁻¹⁵). We employed long PCR and Sanger sequencing technologies to determine the exact copy number and breakpoints of the TED in 70 additional Tibetan and 182 diverse samples. The TED had identical boundaries (chr2: 46,694,276-46,697,683; hg19) and was 80 kb downstream of EPAS1. Notably, the TED was in strong linkage disequilibrium (LD; r² = 0.8) with EPAS1 variants associated with reduced blood concentrations of hemoglobin. It was also in complete LD with the 5-SNP motif, which was suspected to be introgressed from Denisovans, but the deletion itself was absent from the Denisovan sequence. Correspondingly, we detected that footprints of positive selection for the TED occurred 12,803 (95% confidence interval = 12,075-14,725) years ago. We further whole-genome deep sequenced (>60×) seven Tibetans and verified the TED but failed to identify any other copy-number variations with comparable patterns, giving this TED top priority for further study. We speculate that the specific patterns of the TED resulted from its own functionality in HAA of Tibetans or LD with a functional variant of EPAS1.


Subjects
Biological Adaptation/genetics , Altitude , Basic Helix-Loop-Helix Transcription Factors/genetics , DNA Copy Number Variations/genetics , Ethnicity/genetics , Molecular Evolution , Hominidae/genetics , Algorithms , Animals , Base Sequence , Population Genetics , Hemoglobins/genetics , Hemoglobins/metabolism , Humans , Linkage Disequilibrium , Microarray Analysis/methods , Molecular Sequence Data , Polymerase Chain Reaction/methods , DNA Sequence Analysis , Tibet
8.
Heredity (Edinb) ; 120(1): 83-89, 2018 01.
Article in English | MEDLINE | ID: mdl-29234170

ABSTRACT

Disease-associated variants in the human genome are continually being identified using DNA sequencing technologies that are especially effective for Mendelian disorders. Here we sequenced, to high coverage (>30×), the whole genomes of 6 members of a 7-generation family with dwarfism from a consanguineous tribe in Pakistan to determine the causal variant(s). We identified a missense variant rs111033552 (c.2011T>C [p.Ser671Pro]) located in COL10A1 (which encodes the alpha chain of type X collagen) as the most likely contributor to the dwarfism. We further confirmed the variant in 22 family members using Sanger sequencing. All affected individuals are heterozygous for the missense mutation rs111033552 and no homozygous individual was observed. Moreover, the mutation was absent in 69,985 individuals representing >150 global populations. Taking advantage of whole-genome sequencing data, we also examined other variant forms, including copy number variation and insertion/deletion, but failed to identify such variants enriched in the affected individuals. Thus rs111033552 had priority for linkage with dwarfism.


Subjects
Collagen Type XI/genetics , Dwarfism/genetics , High-Throughput Nucleotide Sequencing/methods , Missense Mutation , Point Mutation , Whole Genome Sequencing/methods , Adolescent , Adult , Base Sequence , Child , Consanguinity , Family Health , Female , Genetic Predisposition to Disease/genetics , Heterozygote , Humans , Male , Middle Aged , Pakistan , Pedigree , Single Nucleotide Polymorphism , Nucleic Acid Sequence Homology , Young Adult
9.
J Med Genet ; 54(10): 685-692, 2017 10.
Article in English | MEDLINE | ID: mdl-28705883

ABSTRACT

BACKGROUND: Copy number variation (CNV) is a valuable source of genetic diversity in the human genome and a well-recognised cause of various genetic diseases. However, CNVs have been considerably under-represented in population-based studies, particularly in the Han Chinese, the largest ethnic group in the world. OBJECTIVES: To build a representative CNV map for the Han Chinese population. METHODS: We conducted a genome-wide CNV study involving 451 male Han Chinese samples from 11 geographical regions encompassing 28 dialect groups, representing a less-biased panel compared with the currently available data. We detected CNVs by using 4.2M NimbleGen comparative genomic hybridisation array and whole-genome deep sequencing of 51 samples to optimise the filtering conditions in CNV discovery. RESULTS: A comprehensive Han Chinese CNV map was built based on a set of high-quality variants (positive predictive value >0.8, with sizes ranging from 369 bp to 4.16 Mb and a median of 5907 bp). The map consists of 4012 CNV regions (CNVRs), and more than half are novel to the 30 East Asian CNV Project and the 1000 Genomes Project Phase 3. We further identified 81 CNVRs specific to regional groups, which was indicative of the subpopulation structure within the Han Chinese population. CONCLUSIONS: Our data are complementary to public data sources, and the CNV map may facilitate the identification of pathogenic CNVs and further biomedical research studies involving the Han Chinese population.


Subjects
Asian People/genetics , DNA Copy Number Variations , Ethnicity/genetics , Genetic Variation , Human Genome , China , Humans , Male
10.
Hum Genet ; 135(11): 1279-1286, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27487801

ABSTRACT

Hair straightness/curliness is one of the most conspicuous features of human variation and is particularly diverse among populations. A recent genome-wide scan found common variants in the Trichohyalin (TCHH) gene that are associated with hair straightness in Europeans, but different genes might affect this phenotype in other populations. By sampling 2899 Han Chinese, we performed the first genome-wide scan of hair straightness in East Asians, and found EDAR (rs3827760) as the predominant gene (P = 4.67 × 10⁻¹⁶), accounting for 3.66% of the total variance. The candidate gene approach did not find further significant associations, suggesting that hair straightness may be affected by a large number of genes with subtle effects. Notably, genetic variants associated with hair straightness in Europeans are generally low in frequency in Han Chinese, and vice versa. To evaluate the relative contribution of these variants, we performed a second genome-wide scan in 709 samples from the Uyghur, an admixed population with both eastern and western Eurasian ancestries. In Uyghurs, both EDAR (rs3827760: P = 1.92 × 10⁻¹²) and TCHH (rs11803731: P = 1.46 × 10⁻³) are associated with hair straightness, but EDAR (OR 0.415) has a greater effect than TCHH (OR 0.575). We found no significant interaction between EDAR and TCHH (P = 0.645), suggesting that these two genes affect hair straightness through different mechanisms. Furthermore, haplotype analysis indicates that TCHH is not subject to selection. While EDAR is under strong selection in East Asia, it does not appear to be subject to selection after the admixture in Uyghurs. These results suggest that hair straightness is unlikely to be a trait under selection.


Subjects
Antigens/genetics , Edar Receptor/genetics , Genome-Wide Association Study , Hair , Intermediate Filament Proteins/genetics , Asian People/genetics , China , Female , Gene Frequency , Genetic Predisposition to Disease , Hair/growth & development , Hair/metabolism , Hair/ultrastructure , Haplotypes , Humans , Male , Phenotype , Single Nucleotide Polymorphism , White People/genetics
11.
J Med Genet ; 51(9): 614-22, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25074363

ABSTRACT

BACKGROUND: Drug absorption, distribution, metabolism and excretion (ADME) contribute to the high heterogeneity of drug responses in humans. However, the same standard for drug dosage has been applied to all populations in China although genetic differences in ADME genes are expected to exist in different ethnic groups. In particular, the ethnic minorities in northwestern China with substantial ancestry contribution from Western Eurasian people might violate such a single unified standard. METHODS: In this study, we used Affymetrix SNP Array 6.0 to investigate the genetic diversity of 282 ADME genes in five northwestern Chinese minority populations, namely, Tajik, Uyghur, Kazakh, Kirgiz and Hui, and attempted to identify the highly differential SNPs and haplotypes and further explore their clinical implications. RESULTS: We found that genetic diversity of many ADME genes in the five minority groups was substantially different from those in the Han Chinese population. For instance, we identified 10 functional SNPs with substantial allele frequency differences, 14 functional SNPs with highly different heterozygous states and eight genes with significant haplotype differences between these admixed minority populations and the Han Chinese population. We further confirmed that these differences mainly resulted from the European gene flow, that is, this gene flow increased the genetic diversity in the admixed populations. CONCLUSIONS: These results suggest that the ADME genes vary substantially among different Chinese ethnic groups. We suggest it could cause potential clinical risk if the same dosage of substances (eg, antitumour drugs) is used without considering population stratification.


Subjects
Drug Dose-Response Relationship , Ethnicity/genetics , Genetic Variation , Pharmacokinetics , Asian People/genetics , Gene Flow/genetics , Gene Frequency , Population Genetics , Haplotypes/genetics , Humans , Single Nucleotide Polymorphism/genetics , White People/genetics
12.
Hum Mol Genet ; 21(7): 1611-24, 2012 Apr 01.
Article in English | MEDLINE | ID: mdl-22186022

ABSTRACT

Traditionally, genetic disorders have been classified as either Mendelian diseases or complex diseases. This nosology has greatly benefited genetic counseling and the development of gene mapping strategies. However, based on two well-established databases, we identified that 54% (524 of 968) of the Mendelian disease genes were also involved in complex diseases, and genes of this kind have not been systematically analyzed. Here, we classified human genes into five categories: Mendelian and complex disease (MC) genes, Mendelian but not complex disease (MNC) genes, complex but not Mendelian disease (CNM) genes, essential genes and OTHER genes. First, we found that MC genes were associated with more diseases and phenotypes, and were involved in a more complex protein-protein interaction network than MNC or CNM genes on average. Secondly, MC genes encoded the longest proteins and had the highest transcript count among all gene categories. In particular, the tissue specificity of MC genes was much higher than that of any other gene category (P < 7.5 × 10⁻⁵), although their expression level was similar to that of essential genes. Thirdly, several lines of evidence supported that MC genes have been subjected to both purifying and positive selection. Interestingly, functions of some human disease genes might be different from those of their orthologous genes in non-primate mammals since they were even less conserved than OTHER genes. The significant over-representation of copy number variations (CNVs) in CNM genes suggested the important roles of CNVs in complex diseases. In brief, our study not only revealed the characteristics of MC genes, but also provided new insights into the other four gene categories.


Subjects
Disease/genetics , Inborn Genetic Diseases/genetics , Molecular Evolution , Gene Dosage , Genes , Humans , Phenotype , Protein Interaction Mapping , Proteins/genetics , Genetic Selection
13.
J Med Genet ; 50(8): 534-42, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23735306

ABSTRACT

BACKGROUND: Investigating variations in gene expression, which can be quantitatively measured on a genome-wide scale, is essential to understand and interpret phenotypic differences among human populations. Several previous studies have examined and compared variations in gene expression between continental populations. However, differences in gene expression variation between closely related populations have not been studied yet. METHOD: We performed a genome-wide analysis and systematically compared expression profiles of Han Chinese with those of the Japanese population. RESULTS: We identified 768 genes (4.4% of 17 354 expressed genes) which were expressed differentially between the two populations, with 165 showing highly differential expression and enriched in genes involved in the spliceosome pathway, mRNA processing, mRNA metabolic process, RNA processing, RNA splicing and mitochondrial transport. We further identified cis- and trans-variants that regulated these differential gene expressions, and found that cis-variants shared by the two populations were centred within a range of 200 kb around the transcription start site. Our analysis indicated that genetic differences in the cis-associated genes between the two populations could explain 7-43% of the identified expression divergence. CONCLUSIONS: In summary, despite considerable heterogeneity, gene expression profiles between Han Chinese and Japanese did show an overall difference, with well-differentiated expressions regulated by genetic variants which have been reported to be associated with hematological and biochemical traits in Japanese populations. Our results support the view that gene expression is regulated by genetic variants and that there is a genetic basis for the phenotypic differences between Han Chinese and Japanese populations.


Subjects
Asian People/genetics , Gene Expression , Human Genome , Gene Expression Profiling , Genetic Variation , Population Genetics , Genome-Wide Association Study , Genotype , Humans , Microarray Analysis , Phenotype , Messenger RNA/metabolism
14.
Mol Biol Evol ; 28(2): 1003-11, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20961960

ABSTRACT

Genetic studies of Tibetans, an ethnic group with a long-lasting presence on the Tibetan Plateau, which is known as the highest plateau in the world, may offer a unique opportunity to understand the biological adaptations of human beings to high-altitude environments. We conducted a genome-wide study of 1,000,000 genetic variants in 46 Tibetans (TBN) and 92 Han Chinese (HAN) to identify the signals of high-altitude adaptations (HAAs) in Tibetan genomes. We discovered the most differentiated variants between TBN and HAN at chromosome 1q42.2 and 2p21. EGLN1 (or HIFPH2, MIM 606425) and EPAS1 (or HIF2A, MIM 603349), both related to hypoxia-inducible factor, were found most differentiated in the two regions, respectively. Strong positive correlations were also observed between the frequency of TBN-dominant haplotypes in the two gene regions and altitude in East Asian populations. Linkage disequilibrium and further haplotype network analyses of world-wide populations suggested the antiquity of the TBN-dominant haplotypes and long-term persistence of the natural selection. Finally, a "dominant haplotype carrier" hypothesis could describe the role of the two genes in HAA. All of our population genomic and statistical analyses indicate that EPAS1 and EGLN1 are most likely responsible for HAA of Tibetans. Interestingly, one or the other of the two genes, but not both, was also identified by each of three recent studies. We reanalyzed the available data and found the escaped top signal (EPAS1) could be recaptured with data quality control and our approaches. Based on this experience, we call for more attention to be paid to controlling data quality and batch effects introduced in public data integration. Our results also suggest limitations of the extended haplotype homozygosity-based method due to its compromised power when natural selection began long ago, particularly in genomic regions with recombination hotspots.


Subjects
Altitude Sickness/genetics , Asian People/genetics , Genome-Wide Association Study , Altitude , Basic Helix-Loop-Helix Transcription Factors/genetics , Haplotypes , Humans , Hypoxia-Inducible Factor-Proline Dioxygenases , Procollagen-Proline Dioxygenase/genetics , Tibet
15.
Am J Hum Genet ; 85(6): 762-74, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19944404

ABSTRACT

To date, most genome-wide association studies (GWAS) and studies of fine-scale population structure have been conducted primarily on Europeans. Han Chinese, the largest ethnic group in the world, composing 20% of the entire global human population, is largely underrepresented in such studies. A well-recognized challenge is the fact that population structure can cause spurious associations in GWAS. In this study, we examined population substructures in a diverse set of over 1700 Han Chinese samples collected from 26 regions across China, each genotyped at approximately 160K single-nucleotide polymorphisms (SNPs). Our results showed that the Han Chinese population is intricately substructured, with the main observed clusters corresponding roughly to northern Han, central Han, and southern Han. However, simulated case-control studies showed that genetic differentiation among these clusters, although very small (F_ST = 0.0002-0.0009), is sufficient to lead to an inflated rate of false-positive results even when the sample size is moderate. The top two SNPs with the greatest frequency differences between the northern Han and southern Han clusters (F_ST > 0.06) were found in the FADS2 gene, which associates with the fatty acid composition in phospholipids, and in the HLA complex P5 gene (HCP5), which associates with HIV infection, psoriasis, and psoriatic arthritis. Ingenuity Pathway Analysis (IPA) showed that most differentiated genes among clusters are involved in cardiac arteriopathy (p < 10⁻¹⁰¹). These signals indicating significant differences among Han Chinese subpopulations should be interpreted carefully if they are also detected in association studies, especially when sample sources are diverse.
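
To give a sense of scale for the fixation index values quoted in this abstract, the following is a minimal sketch of Wright's F_ST for one biallelic SNP in two equally weighted populations. The abstract does not say which estimator the authors used, so the formula choice, the function name, and the allele frequencies are assumptions for illustration only.

```python
def wright_fst(p1, p2):
    """Wright's F_ST for a biallelic SNP in two equally weighted populations.

    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of
    the pooled population and H_S is the mean within-population value.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # monomorphic site: no variation to partition
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, not taken from the study.
print(round(wright_fst(0.50, 0.52), 6))
```

Under these assumptions, a frequency difference of only two percentage points already yields an F_ST of about 0.0004, which falls inside the 0.0002-0.0009 range reported for the Han Chinese clusters.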


Subjects
Genetic Variation/genetics , Psoriatic Arthritis/genetics , Asian People , China , Ethnicity , False Positive Reactions , Fatty Acid Desaturases/genetics , Population Genetics , Heart Diseases/genetics , Humans , Major Histocompatibility Complex/genetics , Oligonucleotide Array Sequence Analysis , Single Nucleotide Polymorphism , Principal Component Analysis , Psoriasis/genetics , Long Noncoding RNA , Untranslated RNA
16.
Cell Syst ; 13(4): 321-333.e6, 2022 04 20.
Article in English | MEDLINE | ID: mdl-35180379

ABSTRACT

Even though the human reference genome assembly is continually being improved, it remains debatable whether a population-specific reference is necessary for every ethnic group. Here, we de novo assembled an individual genome (TJ1) from the Tujia population, an ethnic minority group most closely related to the Han Chinese. TJ1 provided a high-quality haplotype-resolved assembly of chromosome-scale with a scaffold N50 size >78 Mb. Compared with GRCh38 and other de novo assemblies, TJ1 improved short-read mapping, enhanced calling precision for structural variants, and detected rare and low-frequency variants. This revealed fine-scale differences between the closely related Han and Tujia populations, such as population-stratified variants of LCT and UBXN8, and improved screening for ancestry informative markers. We demonstrated that TJ1 could reduce false positives in clinical diagnosis and analyzed the PRSS1-PRSS2 locus as a test case. Our results suggest that population-specific assemblies are necessary for genetic and medical analysis, especially when closely related populations are studied. A record of this paper's transparent peer review process is included in the supplemental information.


Subjects
Asian People , Ethnicity , Asian People/genetics , Human Genome/genetics , Haplotypes/genetics , Humans , Minority Groups , Trypsin/genetics , Trypsinogen/genetics
17.
Nat Commun ; 12(1): 6232, 2021 10 29.
Article in English | MEDLINE | ID: mdl-34716342

ABSTRACT

We developed a method, ArchaicSeeker 2.0, to identify introgressed hominin sequences and model multiple-wave admixture. The new method enabled us to discern two waves of introgression from both Denisovan-like and Neanderthal-like hominins in present-day Eurasian populations and an ancient Siberian individual. We estimated that an early Denisovan-like introgression occurred in Eurasia around 118.8-94.0 thousand years ago (kya). In contrast, we detected only a single episode of Denisovan-like admixture in indigenous peoples east of the Wallace Line. Modeling ancient admixtures suggested an early dispersal of modern humans throughout Asia before the Toba volcanic super-eruption 74 kya, predating the initial peopling of Asia as proposed by the traditional Out-of-Africa model. Surviving archaic sequences are involved in various phenotypes including immune and body mass (e.g., ZNF169), cardiovascular and lung function (e.g., HHAT), UV response and carbohydrate metabolism (e.g., HYAL1/HYAL2/HYAL3), while "archaic deserts" are enriched with genes associated with skin development and keratinization.


Subjects
Genetic Introgression , Hominidae/genetics , Metagenomics/methods , Genetic Models , Algorithms , Animals , Asia , DNA-Binding Proteins/genetics , Europe , Human Genome , Humans , Neanderthals/genetics , Single Nucleotide Polymorphism , Quantitative Trait Loci , Siberia
18.
Natl Sci Rev ; 7(2): 391-402, 2020 Feb.
Article in English | MEDLINE | ID: mdl-34692055

ABSTRACT

Structural variants (SVs) may play important roles in human adaptation to extreme environments such as high altitude but have been under-investigated. Here, combining long-read sequencing with multiple scaffolding techniques, we assembled a high-quality Tibetan genome (ZF1), with a contig N50 length of 24.57 mega-base pairs (Mb) and a scaffold N50 length of 58.80 Mb. The ZF1 assembly filled 80 remaining N-gaps (0.25 Mb in total length) in the reference human genome (GRCh38). Notably, we detected 17 900 SVs, among which the ZF1-specific SVs are enriched in GTPase activity that is required for activation of the hypoxic pathway. Further population analysis uncovered a 163-bp intronic deletion in the MKL1 gene showing large divergence between highland Tibetans and lowland Han Chinese. This deletion is significantly associated with lower systolic pulmonary arterial pressure, one of the key adaptive physiological traits in Tibetans. Moreover, using the high-quality de novo assembly, we observed a much higher rate of genome-wide non-reference sequences shared with archaic hominids (Altai Neanderthal and Denisovan) in ZF1 (1.32%-1.53%) than in other East Asian genomes (0.70%-0.98%), reflecting a unique genomic composition of Tibetans. One such shared archaic sequence, a 662-bp intronic insertion in the SCUBE2 gene, is enriched in Tibetans and associated with better lung function (the FEV1/FVC ratio). Collectively, we generated the first high-resolution Tibetan reference genome, and the identified SVs may serve as valuable resources for future evolutionary and medical studies.

19.
Natl Sci Rev ; 6(6): 1201-1222, 2019 Nov.
Article in English | MEDLINE | ID: mdl-34691999

ABSTRACT

Human genetic adaptation to high altitudes (>2500 m) has been extensively studied over the last few years, but few functional adaptive genetic variants have been identified, largely owing to the lack of deep-genome sequencing data available to previous studies. Here, we build a list of putative adaptive variants, including 63 missense, 7 loss-of-function, 1,298 evolutionarily conserved variants and 509 expression quantitative trait loci. Notably, the top signal of selection is located in TMEM247, a transmembrane protein-coding gene. The Tibetan version of TMEM247 harbors one high-frequency (76.3%) missense variant, rs116983452 (c.248C>T; p.Ala83Val), with the T allele derived from archaic ancestry and carried by >94% of Tibetans but absent or at low frequency (<3%) in non-Tibetan populations. The rs116983452-T allele is strongly and positively correlated with altitude and significantly associated with reduced hemoglobin concentration (p = 5.78 × 10⁻⁵), red blood cell count (p = 5.72 × 10⁻⁷) and hematocrit (p = 2.57 × 10⁻⁶). In particular, TMEM247-rs116983452 shows a greater effect size and better predicts the phenotypic outcome than any EPAS1 variant in association with adaptive traits in Tibetans. Modeling the interaction between TMEM247-rs116983452 and EPAS1 variants indicates weak but statistically significant epistatic effects. Our results suggest that multiple variants may jointly contribute to the fitness of Tibetans on the plateau, and that a complex model is needed to elucidate the mechanism of adaptive evolution.

20.
Syst Appl Microbiol ; 41(1): 1-12, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29129355

ABSTRACT

Distinct enterotypes have been observed in the human gut, but little is known about the genetic basis of the microbiome. Moreover, it is not clear how many genetic differences exist between enterotypes within or between populations. In this study, both the 16S rRNA gene and the metagenomes of the gut microbiota were sequenced from 48 Han Chinese, 48 Kazaks, and 96 Uyghurs, and taxonomies were assigned after de novo assembly. Single nucleotide polymorphisms were also identified by referring to data from the Human Microbiome Project. Systematic analysis of the gut communities in terms of their abundance and genetic composition was also performed, together with a genome-wide association study of the host genomes. The gut microbiota of the 192 subjects was clearly classified into two enterotypes (Bacteroides and Prevotella). Interestingly, the two enterotypes showed a clear genetic differentiation in terms of their functional catalogue of genes, especially for genes involved in amino acid and carbohydrate metabolism. In addition, several differentiated genera and genes were found among the three populations. Notably, one human variant (rs878394) showed significant association with the abundance of Prevotella; this variant is linked to LYPLAL1, a gene associated with body fat distribution, the waist-hip ratio and insulin sensitivity. Taken together, considerable differentiation was observed in gut microbes between enterotypes and among populations, reflected in both the taxonomic composition and the genetic makeup of their functional genes, which could have been influenced by a variety of factors, such as diet and host genetic variation.


Subjects
Bacteria/classification , Bacteria/genetics , Gastrointestinal Microbiome , Metagenomics , Microbiota , Asian People , Cluster Analysis , Bacterial DNA/chemistry , Bacterial DNA/genetics , Ribosomal DNA/chemistry , Ribosomal DNA/genetics , Ethnicity , Genetic Association Studies , Healthy Volunteers , Humans , Islam , Lysophospholipase/genetics , Phylogeny , Single Nucleotide Polymorphism , 16S Ribosomal RNA/genetics , DNA Sequence Analysis