Results 1 - 20 of 68
1.
Cell ; 185(18): 3426-3440.e19, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055201

ABSTRACT

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.


Subjects
Human Genome , Whole Genome Sequencing , Female , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Male , Single Nucleotide Polymorphism
2.
Genes Cells ; 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39308264

ABSTRACT

G-protein-coupled receptors (GPCRs) are the largest superfamily in the human genome and the major targets of marketed drugs. Recent large-scale genomics studies revealed numerous natural variations in the general population. 54KJPN is the most extensive Japanese population genomics study, curating the whole genome sequences from about 54,000 individuals. Here, by analyzing 390 non-olfactory GPCR genes in the 54KJPN dataset, we annotated 25,443 missense single-nucleotide variations. Among them, we found 120 major variations that appear with an allele frequency greater than 0.5, including variations that occurred on posttranslational modification sites. Structural alignment of GPCRs using the generic numbering system in the GPCRdb reveals enrichment of alterations in the conserved arginine residue within the DRY motif, which contributes to downstream G-protein signaling. A comparison with the worldwide 1000 Genomes Project (1KGP) dataset found 23 variations that were present exclusively in the 54KJPN dataset. This study will be the basis for future pharmacogenomics studies for the Japanese population.

3.
RNA ; 28(4): 478-492, 2022 04.
Article in English | MEDLINE | ID: mdl-35110373

ABSTRACT

Polymorphism drives survival under stress and provides adaptability. Genetic polymorphism of ribosomal RNA (rRNA) genes derives from internal repeat variation of this multicopy gene, and from interindividual variation. A considerable amount of rRNA sequence heterogeneity has been proposed but has been challenging to estimate given the scarcity of accurate reference sequences. We identified four rDNA copies on chromosome 21 (GRCh38) with 99% similarity to recently introduced reference sequence KY962518.1. We customized a GATK bioinformatics pipeline using the four rDNA loci, spanning a total of 145 kb, for variant calling and used high-coverage whole-genome sequencing (WGS) data from the 1000 Genomes Project to analyze variants in 2504 individuals from 26 populations. We identified a total of 3791 variant positions. The variants were positioned nonrandomly on the rRNA gene. Invariant regions included the promoter, early 5' ETS, most of 18S, 5.8S, ITS1, and large areas of the intragenic spacer. A total of 470 variant positions were observed on 28S rRNA. The majority of the 28S rRNA variants were located on highly flexible human-expanded rRNA helical folds ES7L and ES27L, suggesting that these represent positions of diversity and are potentially under continuous evolution. Several variants were validated based on RNA-seq analyses. Population analyses showed remarkable ancestry-linked genetic variance and the presence of both high penetrance and frequent variants in the 5' ETS, ITS2, and 28S regions segregating according to the continental populations. These findings provide a genetic view of rRNA gene array heterogeneity and raise the need to functionally assess how the 28S rRNA variants affect ribosome functions.


Subjects
Genetic Heterogeneity , Genome , Ribosomal DNA/genetics , rRNA Genes/genetics , Humans , Ribosomal RNA/genetics , 18S Ribosomal RNA , 28S Ribosomal RNA/genetics
4.
Hum Reprod ; 38(Supplement_2): ii57-ii68, 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37982420

ABSTRACT

STUDY QUESTION: Was polycystic ovary syndrome (PCOS), which impairs fertility and adheres to the evolutionary paradox, subject to evolutionary selection during ancestral times and did it rapidly diminish in prevalence? SUMMARY ANSWER: This study strengthened the hypothesis that positive selection of genetic variants occurred and may account for the high prevalence of PCOS observed today. WHAT IS KNOWN ALREADY: PCOS is a complex endocrine disorder characterized by both reproductive and metabolic disturbances. As a heritable disease that impairs fertility, PCOS should diminish rapidly in prevalence; however, it is the most common cause of female subfertility globally. Few scientific genetic studies have attempted to provide evidence for the positive selection of gene variants underlying PCOS. STUDY DESIGN, SIZE, DURATION: We performed an evolutionary analysis of 2,504 individuals from 14 populations of the 1000 Genomes Project. PARTICIPANTS/MATERIALS, SETTING, METHODS: We tested the signature of positive selection for 37 single-nucleotide polymorphisms (SNPs) associated with PCOS in previous genome-wide association studies using six parameters of positive selection. MAIN RESULTS AND THE ROLE OF CHANCE: Analyzing the evolutionary indices together, there was obvious positive selection at the PCOS-related SNP loci, especially within the original evolution window of humans, demonstrated by significant Tajima's D values. Compared to the genome background, six of the 37 SNPs in or close to five genes (DENN domain-containing protein 1A: DENND1A; aminopeptidase O, formerly chromosome 9 open reading frame 3: AOPEP; thyroid adenoma-associated protein: THADA; diacylglycerol kinase iota: DGKI; and netrin receptor UNC5C: UNC5C) showed significant evidence of positive selection, among which DENND1A, AOPEP, and THADA represent the set of most established susceptibility genes for PCOS. LIMITATIONS, REASONS FOR CAUTION: First, only well-documented SNPs were selected from well-designed experiments.
Second, it is difficult to determine which hypothesis of PCOS evolution is at play. After considering the most significant functions of these genes, we found that they had a wide variety of functions with no obvious association between them. WIDER IMPLICATIONS OF THE FINDINGS: Our findings provide additional evidence for the positive evolution of PCOS. Our analyses require confirmation in a larger study with more evolutionary indicators and larger data range. Further research to identify the roles of the DENND1A, AOPEP, THADA, DGKI, and UNC5C genes is also necessary. STUDY FUNDING/COMPETING INTEREST(S): This study was supported by the National Key Research and Development Program of China (2021YFC2700400 and 2021YFC2700701), Basic Science Center Program of NSFC (31988101), CAMS Innovation Fund for Medical Sciences (2021-I2M-5-001), National Natural Science Foundation of China (82192874, 31871509, and 82071606), Shandong Provincial Key Research and Development Program (2020ZLYS02), Taishan Scholars Program of Shandong Province (ts20190988), and Fundamental Research Funds of Shandong University. The authors have no conflicts of interest to disclose. TRIAL REGISTRATION NUMBER: N/A.
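
The abstract names Tajima's D as one of its selection indices but does not give the computation. As a minimal sketch, assuming phased haplotypes coded as strings of '0'/'1' alleles (a hypothetical input format, not the authors' pipeline), the standard Tajima (1989) statistic can be computed as follows:

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length '0'/'1' haplotype strings."""
    n = len(haplotypes)
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites in the sample
    seg = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    if seg == 0:
        return 0.0  # convention used here: no variation, report 0
    # pi: mean number of pairwise differences
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))
```

A sample dominated by singletons yields a negative value, while one split into two common haplotypes yields a positive value. Assessing significance against the genome background, as the abstract describes, would need an empirical or simulated null distribution, which this sketch does not provide.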


Subjects
Female Infertility , Polycystic Ovary Syndrome , Humans , Female , Polycystic Ovary Syndrome/genetics , Genome-Wide Association Study , Fertility , Reproduction
5.
Hum Mutat ; 43(12): 1979-1993, 2022 12.
Article in English | MEDLINE | ID: mdl-36054329

ABSTRACT

Detection of de novo variants (DNVs) is critical for studies of disease-related variation and mutation rates. To accelerate DNV calling, we developed a graphics processing units-based workflow. We applied our workflow to whole-genome sequencing data from three parent-child sequenced cohorts including the Simons Simplex Collection (SSC), Simons Foundation Powering Autism Research (SPARK), and the 1000 Genomes Project (1000G) that were sequenced using DNA from blood, saliva, and lymphoblastoid cell lines (LCLs), respectively. The SSC and SPARK DNV callsets were within expectations for number of DNVs, percent at CpG sites, phasing to the paternal chromosome of origin, and average allele balance. However, the 1000G DNV callset was not within expectations and contained excessive DNVs that are likely cell line artifacts. Mutation signature analysis revealed 30% of 1000G DNV signatures matched B-cell lymphoma. Furthermore, we found variants in DNA repair genes and at ClinVar pathogenic or likely-pathogenic sites and significant excess of protein-coding DNVs in IGLL5, a gene known to be involved in B-cell lymphomas. Our study provides a new rapid DNV caller for the field and elucidates important implications of using sequencing data from LCLs for reference building and disease-related projects.
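
The basic logic behind trio-based DNV calling and the allele-balance check mentioned above can be illustrated with a short sketch. It is not the authors' GPU workflow. The `TrioSite` record and the 0.25 to 0.75 allele-balance window are hypothetical choices made only for this example.

```python
from dataclasses import dataclass


@dataclass
class TrioSite:
    """Genotypes as allele-index pairs (0 = reference, 1 = alternate)."""
    child_gt: tuple
    father_gt: tuple
    mother_gt: tuple
    child_ref_depth: int  # reads supporting the reference allele in the child
    child_alt_depth: int  # reads supporting the alternate allele in the child


def allele_balance(site):
    """Fraction of the child's reads that carry the alternate allele."""
    total = site.child_ref_depth + site.child_alt_depth
    return site.child_alt_depth / total if total else 0.0


def is_candidate_dnv(site, min_ab=0.25, max_ab=0.75):
    """Child heterozygous, both parents homozygous reference, balanced reads."""
    child_het = sorted(site.child_gt) == [0, 1]
    parents_ref = site.father_gt == (0, 0) and site.mother_gt == (0, 0)
    return child_het and parents_ref and min_ab <= allele_balance(site) <= max_ab
```

A germline heterozygous call is expected to have an allele balance near 0.5. Calls far from that value are more likely to be mosaic events or artifacts, which is one reason the abstract reports average allele balance as a quality indicator.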


Subjects
Neoplasms , Humans , Alleles , Mutation , Neoplasms/genetics , Whole Genome Sequencing
6.
Am J Hum Genet ; 105(1): 78-88, 2019 07 03.
Article in English | MEDLINE | ID: mdl-31178127

ABSTRACT

Relationship estimation and segment detection between individuals is an important aspect of disease gene mapping. Existing methods are either tailored for computational efficiency or require phasing to improve accuracy. We developed TRUFFLE, a method that integrates computational techniques and statistical principles for the identification and visualization of identity-by-descent (IBD) segments using un-phased data. By skipping the haplotype phasing step and, instead, relying on a simpler region-based approach, our method is computationally efficient while maintaining inferential accuracy. In addition, an error model corrects for segment break-ups that occur as a consequence of genotyping errors. TRUFFLE can estimate relatedness for 3.1 million pairs from the 1000 Genomes Project data in a few minutes on a typical laptop computer. Consistent with expectation, we identified only three second cousin or closer pairs across different populations, while commonly used methods identified a large number of such pairs. Similarly, within populations, we identified many fewer related pairs. Compared to methods relying on phased data, TRUFFLE has comparable accuracy but is drastically faster and has fewer broken segments. We also identified specific local genomic regions that are commonly shared within populations, suggesting selection. When applied to pedigree data, we observed 99.6% accuracy in detecting 1st to 5th degree relationships. As genomic datasets become much larger, TRUFFLE can enable disease gene mapping through implicit shared haplotypes by accurate IBD segment detection.
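
To illustrate how shared segments can be sought in unphased data, the sketch below scans two genotype vectors for runs of markers without opposite homozygotes (IBS0), a common proxy for regions where two individuals could share at least one haplotype. This is an assumption-laden toy, not the TRUFFLE algorithm: genotypes are coded as alternate-allele counts (0, 1, 2), the `min_markers` cutoff is hypothetical, and no genotyping-error model is applied.

```python
def ibs_states(g1, g2):
    """Identity-by-state per marker: 2, 1, or 0 alleles shared."""
    return [2 - abs(a - b) for a, b in zip(g1, g2)]


def ibs0_free_runs(g1, g2, min_markers=3):
    """Return (start, end) half-open index ranges containing no IBS0 marker."""
    runs, start = [], None
    for i, state in enumerate(ibs_states(g1, g2)):
        if state == 0:
            # An opposite-homozygote marker ends the current run
            if start is not None and i - start >= min_markers:
                runs.append((start, i))
            start = None
        elif start is None:
            start = i
    n = min(len(g1), len(g2))
    if start is not None and n - start >= min_markers:
        runs.append((start, n))
    return runs
```

In this toy a single miscalled genotype splits a run in two, which is the kind of segment break-up the abstract says its error model corrects.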


Subjects
Chromosome Mapping/methods , Genetic Predisposition to Disease , Population Genetics , Genome-Wide Association Study/methods , Single Nucleotide Polymorphism , Heritable Quantitative Trait , Software , Algorithms , Computer Simulation , Female , Genetic Linkage , Human Genome , Genomics , Germ-Line Mutation , Haplotypes , Humans , Male , Genetic Models , Pedigree
7.
BMC Biol ; 18(1): 167, 2020 11 13.
Article in English | MEDLINE | ID: mdl-33187521

ABSTRACT

BACKGROUND: Structural variants comprise diverse genomic arrangements including deletions, insertions, inversions, and translocations, which can generally be detected in humans through sequence comparison to the reference genome. Among structural variants, insertions are the least frequently identified variants, mainly due to ascertainment bias in the reference genome, lack of previous sequence knowledge, and low complexity of typical insertion sequences. Though recent developments in long-read sequencing deliver promise in annotating individual non-reference insertions, population-level catalogues on non-reference insertion variants have not been identified and the possible functional roles of these hidden variants remain elusive. RESULTS: To detect non-reference insertion variants, we developed a pipeline, InserTag, which generates non-reference contigs by local de novo assembly and then infers the full-sequence of insertion variants by tracing contigs from non-human primates and other human genome assemblies. Application of the pipeline to data from 2535 individuals of the 1000 Genomes Project helped identify 1696 non-reference insertion variants and re-classify the variants as retention of ancestral sequences or novel sequence insertions based on the ancestral state. Genotyping of the variants showed that individuals had, on average, 0.92-Mbp sequences missing from the reference genome, 92% of the variants were common (allele frequency > 5%) among human populations, and more than half of the variants were major alleles. Among human populations, African populations were the most divergent and had the most non-reference sequences, which was attributed to the greater prevalence of high-frequency insertion variants. The subsets of insertion variants were in high linkage disequilibrium with phenotype-associated SNPs and showed signals of recent continent-specific selection. 
CONCLUSIONS: Non-reference insertion variants represent an important type of genetic variation in the human population, and our developed pipeline, InserTag, provides the frameworks for the detection and genotyping of non-reference sequences missing from human populations.


Subjects
Contig Mapping , Gene Frequency , Human Genome , Insertional Mutagenesis , Humans
8.
BMC Biol ; 18(1): 38, 2020 04 13.
Article in English | MEDLINE | ID: mdl-32279660

ABSTRACT

BACKGROUND: The advent of next generation sequencing (NGS) has allowed the discovery of short and long non-coding RNAs (ncRNAs) in an unbiased manner using reverse genetics approaches, enabling the discovery of multiple categories of ncRNAs and characterization of the way their expression is regulated. We previously showed that the identities and abundances of microRNA isoforms (isomiRs) and transfer RNA-derived fragments (tRFs) are tightly regulated, and that they depend on a person's sex and population origin, as well as on tissue type, tissue state, and disease type. Here, we characterize the regulation and distribution of fragments derived from ribosomal RNAs (rRNAs). rRNAs form a group that includes four (5S, 5.8S, 18S, 28S) rRNAs encoded by the human nuclear genome and two (12S, 16S) by the mitochondrial genome. rRNAs constitute the most abundant RNA type in eukaryotic cells. RESULTS: We analyzed rRNA-derived fragments (rRFs) across 434 transcriptomic datasets obtained from lymphoblastoid cell lines (LCLs) derived from healthy participants of the 1000 Genomes Project. The 434 datasets represent five human populations and both sexes. We examined each of the six rRNAs and their respective rRFs, and did so separately for each population and sex. Our analysis shows that all six rRNAs produce rRFs with unique identities, normalized abundances, and lengths. The rRFs arise from the 5'-end (5'-rRFs), the interior (i-rRFs), and the 3'-end (3'-rRFs) or straddle the 5' or 3' terminus of the parental rRNA (x-rRFs). Notably, a large number of rRFs are produced in a population-specific or sex-specific manner. Preliminary evidence suggests that rRF production is also tissue-dependent. Of note, we find that rRF production is not affected by the identity of the processing laboratory or the library preparation kit. 
CONCLUSIONS: Our findings suggest that rRFs are produced in a regimented manner by currently unknown processes that are influenced by both ubiquitous as well as population-specific and sex-specific factors. The properties of rRFs mirror the previously reported properties of isomiRs and tRFs and have implications for the study of homeostasis and disease.


Subjects
MicroRNAs/genetics , Ribosomal RNA/genetics , Aged , Cell Line , Female , Humans , Male , MicroRNAs/metabolism , Middle Aged , Ribosomal RNA/metabolism , Sex Factors , Transcriptome
9.
BMC Bioinformatics ; 21(1): 14, 2020 Jan 10.
Article in English | MEDLINE | ID: mdl-31924160

ABSTRACT

BACKGROUND: Linkage disequilibrium (LD), the non-random association of alleles at different loci, defines population-specific haplotypes which vary by genomic ancestry. Assessment of allelic frequencies and LD patterns from a variety of ancestral populations enables researchers to better understand population histories as well as improve genetic understanding of diseases in which risk varies by ethnicity. RESULTS: We created an interactive web module which allows for quick geographic visualization of linkage disequilibrium (LD) patterns between two user-specified germline variants across geographic populations included in the 1000 Genomes Project. Interactive maps and a downloadable, sortable summary table allow researchers to easily compute and compare allele frequencies and LD statistics of dbSNP catalogued variants. The geographic mapping of each SNP's allele frequencies by population as well as visualization of LD statistics allows the user to easily trace geographic allelic correlation patterns and examine population-specific differences. CONCLUSIONS: LDpop is a free and publicly available cross-platform web tool which can be accessed online at https://ldlink.nci.nih.gov/?tab=ldpop.
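
For readers unfamiliar with the LD statistics such a tool reports, the following sketch computes D, the absolute value of D', and r² for two biallelic variants from phased haplotypes. The input format, a list of (allele at variant A, allele at variant B) pairs with 0/1 coding, is an assumption for this example and says nothing about how LDpop is implemented.

```python
def ld_stats(haplotypes):
    """Return (D, |D'|, r2) for two biallelic variants.

    haplotypes: list of (a, b) tuples, one per chromosome, alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # D is bounded by the allele frequencies; the bound depends on its sign
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

Running the function separately on the haplotypes of each population gives the kind of per-population comparison the abstract describes.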


Subjects
Genome-Wide Association Study , Linkage Disequilibrium , User-Computer Interface , Alleles , Gene Frequency , Genomics/methods , Haplotypes , Humans , Single Nucleotide Polymorphism
10.
Am J Hum Genet ; 100(4): 635-649, 2017 Apr 06.
Article in English | MEDLINE | ID: mdl-28366442

ABSTRACT

The vast majority of genome-wide association studies (GWASs) are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g., linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWASs, we used published summary statistics to calculate polygenic risk scores for eight well-studied phenotypes. We identify directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk are typically highest in the population from which summary statistics were derived. We demonstrate that scores inferred from European GWASs are biased by genetic drift in other populations even when choosing the same causal variants and that biases in any direction are possible and unpredictable. This work cautions that summarizing findings from large-scale GWASs may have limited portability to other populations using standard approaches and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
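
A polygenic risk score built from summary statistics is, in its simplest form, a weighted sum of allele dosages. The sketch below shows that sum. The variant identifiers and effect sizes in the usage note are made up for illustration, and treating a missing genotype as dosage 0 is a simplifying assumption rather than the study's procedure.

```python
def polygenic_score(dosages, effect_sizes):
    """Sum of effect size times effect-allele dosage over the scored variants.

    dosages: dict mapping variant ID -> effect-allele dosage (0 to 2)
    effect_sizes: dict mapping variant ID -> per-allele effect from a GWAS
    """
    # Variants missing from the individual's genotypes contribute 0 (assumption)
    return sum(beta * dosages.get(variant, 0.0)
               for variant, beta in effect_sizes.items())
```

For example, with hypothetical effects `{"var1": 0.5, "var2": -0.25}` and dosages `{"var1": 2, "var2": 1}`, the score is 0.75. The effect sizes and the allele frequencies behind the dosages both depend on the population studied, so scores of this kind may transfer poorly across ancestries, which is the abstract's central caution.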


Subjects
Genetic Predisposition to Disease , Racial Groups/genetics , Americas , Medical Genetics , Population Genetics , Haplotypes , Human Genome Project , Humans , Multifactorial Inheritance
11.
Genet Epidemiol ; 42(7): 636-647, 2018 10.
Article in English | MEDLINE | ID: mdl-30156736

ABSTRACT

Complex traits can share a substantial proportion of their polygenic heritability. However, genome-wide polygenic correlations between pairs of traits can mask heterogeneity in their shared polygenic effects across loci. We propose a novel method (weighted maximum likelihood-regional polygenic correlation [RPC]) to evaluate polygenic correlation between two complex traits in small genomic regions using summary association statistics. Our method tests for evidence that the polygenic effect at a given region affects two traits concurrently. We show through simulations that our method is well calibrated, powerful, and more robust to misspecification of linkage disequilibrium than other methods under a polygenic model. As small genomic regions are more likely to harbor specific genetic effects, our method is ideal to identify heterogeneity in shared polygenic correlation across regions. We illustrate the usefulness of our method by addressing two questions related to cardiometabolic traits. First, we explored how RPC can inform on the strong epidemiological association between high-density lipoprotein cholesterol and coronary artery disease (CAD), suggesting a key role for triglyceride metabolism. Second, we investigated the potential role of PPARγ activators in the prevention of CAD. Our results provide a compelling argument that shared heritability between complex traits is highly heterogeneous across loci.


Subjects
Linkage Disequilibrium/genetics , Multifactorial Inheritance/genetics , HDL Cholesterol/genetics , Computer Simulation , Coronary Artery Disease/drug therapy , Coronary Artery Disease/genetics , Genetic Loci , Human Genome , Genome-Wide Association Study , Haplotypes/genetics , Humans , Genetic Models , PPAR gamma/metabolism , Phenotype , Single Nucleotide Polymorphism/genetics , Risk Factors , Thiazolidinediones/therapeutic use
12.
RNA ; 23(1): 14-22, 2017 01.
Article in English | MEDLINE | ID: mdl-27807179

ABSTRACT

As most RNA structures are elusive to structure determination, obtaining solvent accessible surface areas (ASAs) of nucleotides in an RNA structure is an important first step to characterize potential functional sites and core structural regions. Here, we developed RNAsnap, the first machine-learning method trained on protein-bound RNA structures for solvent accessibility prediction. Built on sequence profiles from multiple sequence alignment (RNAsnap-prof), the method provided robust prediction in fivefold cross-validation and an independent test (Pearson correlation coefficients, r, between predicted and actual ASA values are 0.66 and 0.63, respectively). Application of the method to 6178 mRNAs revealed its positive correlation to mRNA accessibility by dimethyl sulphate (DMS) experimentally measured in vivo (r = 0.37) but not in vitro (r = 0.07), despite the lack of training on mRNAs and the fact that DMS accessibility is only an approximation to solvent accessibility. We further found strong association across coding and noncoding regions between predicted solvent accessibility of the mutation site of a single nucleotide variant (SNV) and the frequency of that variant in the population for 2.2 million SNVs obtained in the 1000 Genomes Project. Moreover, mapping solvent accessibility of RNAs to the human genome indicated that introns, 5' cap of 5' and 3' cap of 3' untranslated regions, are more solvent accessible, consistent with their respective functional roles. These results support conformational selections as the mechanism for the formation of RNA-protein complexes and highlight the utility of genome-scale characterization of RNA tertiary structures by RNAsnap. The server and its stand-alone downloadable version are available at http://sparks-lab.org.


Subjects
RNA/chemistry , RNA/genetics , Solvents/chemistry , Computational Biology/methods , Human Genome , Humans , Machine Learning , Molecular Models , Molecular Conformation
13.
Int J Immunogenet ; 46(2): 49-58, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30659741

ABSTRACT

Allele-specific analyses to understand frequency differences across populations, particularly populations not well studied, are important to help identify variants that may have a functional effect on disease mechanisms and phenotypic predisposition, facilitating new Genome-Wide Association Studies (GWAS). We aimed to compare the allele frequency of 11 asthma-associated and 16 liver disease-associated single nucleotide polymorphisms (SNPs) between the Estonian, HapMap and 1000 Genomes Project populations. When comparing EGCUT with HapMap populations, the largest difference in allele frequencies was observed with the Maasai population in Kinyawa, Kenya, with 12 SNP variants reporting statistical significance. Similarly, when comparing EGCUT with 1000 Genomes Project populations, the largest difference in allele frequencies was observed with pooled African populations with 22 SNP variants reporting statistical significance. For 11 asthma-associated and 16 liver disease-associated SNPs, Estonians are genetically similar to other European populations but significantly different from African populations. Understanding differences in genetic architecture between ethnic populations is important to facilitate new GWAS targeted at underserved ethnic groups to enable novel genetic findings to aid the development of new therapies to reduce morbidity and mortality.
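
The abstract does not say which statistical test was used to compare allele frequencies between populations. One common choice is a Pearson chi-square test on the 2 x 2 table of allele counts, sketched below with the standard library only. The function name and argument layout are hypothetical.

```python
from math import erfc, sqrt


def allele_frequency_test(alt_a, ref_a, alt_b, ref_b):
    """Pearson chi-square (1 df, no continuity correction) on allele counts.

    alt_a/ref_a: alternate and reference allele counts in population A
    alt_b/ref_b: the same counts in population B
    Returns (chi2, p_value).
    """
    n = alt_a + ref_a + alt_b + ref_b
    row_a, row_b = alt_a + ref_a, alt_b + ref_b
    col_alt, col_ref = alt_a + alt_b, ref_a + ref_b
    denom = row_a * row_b * col_alt * col_ref
    if denom == 0:
        return 0.0, 1.0  # a monomorphic or empty table carries no signal
    chi2 = n * (alt_a * ref_b - ref_a * alt_b) ** 2 / denom
    # Survival function of chi-square with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

When many SNPs are tested, as with the 27 variants here, a multiple-testing correction such as Bonferroni would normally be applied to the resulting p-values.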


Subjects
Asthma/genetics , Gene Frequency/genetics , Population Genetics , Human Genome , HapMap Project , Liver Diseases/genetics , Single Nucleotide Polymorphism/genetics , Estonia , Humans
14.
Proc Natl Acad Sci U S A ; 113(16): E2326-34, 2016 Apr 19.
Article in English | MEDLINE | ID: mdl-27001843

ABSTRACT

Endogenous retroviruses (ERVs) have contributed to more than 8% of the human genome. The majority of these elements lack function due to accumulated mutations or internal recombination resulting in a solitary (solo) LTR, although members of one group of human ERVs (HERVs), HERV-K, were recently active with members that remain nearly intact, a subset of which is present as insertionally polymorphic loci that include approximately full-length (2-LTR) and solo-LTR alleles in addition to the unoccupied site. Several 2-LTR insertions have intact reading frames in some or all genes that are expressed as functional proteins. These properties reflect the activity of HERV-K and suggest the existence of additional unique loci within humans. We sought to determine the extent to which other polymorphic insertions are present in humans, using sequenced genomes from the 1000 Genomes Project and a subset of the Human Genome Diversity Project panel. We report analysis of a total of 36 nonreference polymorphic HERV-K proviruses, including 19 newly reported loci, with insertion frequencies ranging from <0.0005 to >0.75 that varied by population. Targeted screening of individual loci identified three new unfixed 2-LTR proviruses within our set, including an intact provirus present at Xq21.33 in some individuals, with the potential for retained infectivity.


Subjects
Alleles , Endogenous Retroviruses/genetics , Genetic Loci , Insertional Mutagenesis , Genetic Polymorphism , Terminal Repeat Sequences , Female , Humans , Male
15.
BMC Bioinformatics ; 18(1): 535, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-29191167

ABSTRACT

BACKGROUND: In the search for novel causal mutations, public and/or private variant databases are nearly always used to facilitate the search as they result in a massive reduction of putative variants in one step. Practically, variant filtering is often done by either using all variants from the variant database (called the absence-approach, i.e. it is assumed that disease-causing variants do not reside in variant databases) or by using the subset of variants with an allelic frequency > 1% (called the 1%-approach). We investigate the validity of these two approaches in terms of false negatives (the true disease-causing variant does not pass all filters) and false positives (a harmless mutation passes all filters and is erroneously retained in the list of putative disease-causing variants) and compare them with a novel approach which we named the quantile-based approach. This approach applies variable instead of static frequency thresholds and the calculation of these thresholds is based on prior knowledge of disease prevalence, inheritance models, database size and database characteristics. RESULTS: Based on real-life data, we demonstrate that the quantile-based approach outperforms the absence-approach in terms of false negatives. At the same time, this quantile-based approach deals more appropriately with the variable allele frequencies of disease-causing alleles in variant databases relative to the 1%-approach and as such allows a better control of the number of false positives. We also introduce an alternative application for variant database usage and the quantile-based approach. If disease-causing variants in variant databases deviate substantially from theoretical expectancies calculated with the quantile-based approach, their association between genotype and phenotype had to be reconsidered in 12 out of 13 cases. 
CONCLUSIONS: We developed a novel method and demonstrated that this so-called quantile-based approach is a highly suitable method for variant filtering. In addition, the quantile-based approach can also be used for variant flagging. For user friendliness, lookup tables and easy-to-use R calculators are provided.
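
The idea of a threshold that varies with prevalence, inheritance model and database size can be sketched as follows. The abstract does not state the authors' exact formula (they provide lookup tables and R calculators), so this Python sketch is an assumption. It supposes a fully penetrant recessive disease caused by a single allele in Hardy-Weinberg equilibrium, so the allele frequency is the square root of the prevalence, and it models the allele count in a database as binomial.

```python
from math import sqrt


def binomial_quantile(n, p, level):
    """Smallest k with P(X <= k) >= level for X ~ Binomial(n, p), 0 < p < 1."""
    pmf = (1 - p) ** n  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < level and k < n:
        # Recurrence: P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n - k) / (k + 1) * p / (1 - p)
        k += 1
        cdf += pmf
    return k


def max_allele_count_recessive(prevalence, n_individuals, level=0.95):
    """Upper allele-count threshold in a database of diploid individuals."""
    q = sqrt(prevalence)  # single-allele recessive model (assumption)
    return binomial_quantile(2 * n_individuals, q, level)
```

Under this toy model a candidate variant would be kept if its observed allele count in the database does not exceed the threshold. A rarer disease or a smaller database gives a lower threshold, in contrast to a fixed 1% cutoff.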


Subjects
Genetic Databases , Genetic Association Studies , Alleles , Congenital Abnormalities/genetics , Congenital Abnormalities/pathology , Gene Frequency , Genotype , Humans , Phenotype , Single Nucleotide Polymorphism
16.
Hum Mutat ; 38(8): 1025-1032, 2017 08.
Article in English | MEDLINE | ID: mdl-28493391

ABSTRACT

Recently, the Haplotype Reference Consortium (HRC) released a large imputation panel that allows more accurate imputation of genetic variants. In this study, we compared a set of directly assayed common and rare variants from an exome array to genotypes imputed with the 1000 Genomes Project (1000GP) and HRC panels. We showed that imputation using the HRC panel improved the concordance between assayed and imputed genotypes at common, and especially, low-frequency variants. Furthermore, we performed a genome-wide association meta-analysis of vertical cup-disc ratio, a highly heritable endophenotype of glaucoma, in four cohorts using 1000GP and HRC imputations. We compared the results of the meta-analysis using 1000GP to the meta-analysis results using HRC. Overall, we found that using HRC imputation significantly improved P values (P = 3.07 × 10^-61), particularly for suggestive variants. Both meta-analyses were performed in the same sample size, yet we found eight genome-wide significant loci in the HRC-based meta-analysis versus seven genome-wide significant loci in the 1000GP-based meta-analysis. This study provides supporting evidence of the new avenues for gene discovery and fine mapping that the HRC imputation panel offers.
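
Concordance between assayed and imputed genotypes can be measured in several ways, and the abstract does not specify which was used. The simplest is the fraction of identical best-guess genotype calls, sketched below. Genotypes are assumed to be coded as alternate-allele counts, with `None` marking a missing call.

```python
def genotype_concordance(assayed, imputed):
    """Fraction of samples whose assayed and imputed genotypes agree.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call
    and the sample is skipped. Returns NaN when nothing can be compared.
    """
    compared = matches = 0
    for a, b in zip(assayed, imputed):
        if a is None or b is None:
            continue
        compared += 1
        matches += a == b
    return matches / compared if compared else float("nan")
```

For rare variants, overall concordance is inflated by the many homozygous-reference calls, so evaluations often also report concordance restricted to samples carrying a non-reference allele, or the squared correlation between dosages.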


Subjects
Exome/genetics , Haplotypes/genetics , Gene Frequency/genetics , Genetic Variation/genetics , Human Genome/genetics , Genome-Wide Association Study , Genotype , Humans , Single Nucleotide Polymorphism/genetics
17.
BMC Nephrol ; 18(1): 267, 2017 Aug 11.
Article in English | MEDLINE | ID: mdl-28800731

ABSTRACT

BACKGROUND: APOL1 gene variants have been shown to be associated with an increased risk of multiple kinds of disease, particularly in African Americans, but not in Caucasians and Asians. In this study, we explored the single nucleotide polymorphism (SNP) and haplotype diversity of the APOL1 gene in different races provided by the 1000 Genomes Project. METHODS: Variants of the APOL1 gene in the 1000 Genomes Project were obtained, and SNPs located in the regulatory region or coding region were selected for genetic variation analysis. A total of 2504 individuals from 26 populations were classified into four groups: African, European, Asian and Admixed populations. Tag SNPs were selected to evaluate the haplotype diversity in the four groups using HaploStats software. RESULTS: The APOL1 gene was surrounded by some of the most polymorphic genes in the human genome. Variation in the APOL1 gene was common, with up to 613 SNPs reported by the 1000 Genomes Project, 99 of them (16.2%) with MAF ≥ 1%. There were 79 SNPs in the URR and 92 SNPs in the 3'UTR, of which 12 SNPs in the URR and 24 SNPs in the 3'UTR were considered common variants with MAF ≥ 1%. Notably, URR-1 was present at lower frequencies in European populations, while the other three haplotypes showed the opposite pattern. The 3'UTR presented several high-frequency variant sites in a short segment, and its haplotypes differed significantly among populations (P < 0.01): UTR-1 and UTR-5 were much more frequent in the African population, while UTR-2, UTR-3 and UTR-4 were much less frequent. In the APOL1 coding region, the two higher-frequency G1 SNPs lowered the frequency of haplotype H-1 when all populations were pooled, and the two G1 mutations widened the diversity among the four populations (P1 = 3.33E-4 vs P2 = 3.61E-30).
CONCLUSIONS: The distributions of APOL1 gene variants and haplotypes were significantly different among the different populations, in both regulatory and coding regions. This could provide clues for future genetic studies of APOL1-related diseases.


Subjects
Apolipoprotein L1/genetics , Genetic Variation/genetics , Genome, Human/genetics , Haplotypes/genetics , Polymorphism, Single Nucleotide/genetics , Black or African American/genetics , Asian People/genetics , Humans , White People/genetics
18.
Biopolymers ; 106(5): 633-44, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27160989

ABSTRACT

Defensins confer host defense against microorganisms and are important for human health. Single nucleotide polymorphisms (SNPs) in defensin gene-coding regions could lead to less active variants. Using SNP data available in the dbSNP database and frequency information from the 1000 Genomes Project, two DEFA5 (L26I and R13H) and eight DEFB1 (C35S, K31T, K33R, R29G, V06I, C12Y, Y28* and C05*) missense and nonsense SNPs located within the mature regions of the encoded defensins were retrieved. Such SNPs are rare and population restricted. To assess their antibacterial activity against Escherichia coli, two linear regression models from previous work were used, which model antibacterial activity as a function of solvation potential energy using molecular dynamics data. Considering only the antibacterial predictions, no biological differences between wild-type HD5 and its variants were observed, whereas for HBD1 the results suggest that the R29G, K31T, Y28* and C05* variants could be less active than the wild type. The data reported here could lead to a substantial improvement in knowledge about the impact of missense SNPs in human defensins and their worldwide distribution. © 2016 Wiley Periodicals, Inc. Biopolymers (Pept Sci) 106: 633-644, 2016.


Subjects
Anti-Bacterial Agents , Escherichia coli/drug effects , Molecular Dynamics Simulation , Polymorphism, Single Nucleotide , alpha-Defensins , beta-Defensins , Anti-Bacterial Agents/chemistry , Anti-Bacterial Agents/pharmacology , Humans , alpha-Defensins/chemistry , alpha-Defensins/genetics , alpha-Defensins/pharmacology , beta-Defensins/chemistry , beta-Defensins/genetics , beta-Defensins/pharmacology
19.
Genet Epidemiol ; 37(8): 787-801, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24123217

ABSTRACT

Population stratification is of primary interest in genetic studies, both to infer human evolutionary history and to avoid spurious findings in association testing. Although it is well studied with high-density single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWASs), next-generation sequencing brings both new opportunities and challenges to uncovering population structure at finer scales. Several recent studies have noted different confounding effects from variants of different minor allele frequencies (MAFs). In this paper, using a low-coverage sequencing dataset from the 1000 Genomes Project, we compared a popular method, principal component analysis (PCA), with a recently proposed spectral clustering technique, called spectral dimensional reduction (SDR), in detecting and adjusting for population stratification at the level of ethnic subgroups. We investigated how the performance of adjusting for population stratification varies with the types and sets of variants used for adjustment and with the types of variants being tested. One main conclusion is that principal components based on all variants or common variants were generally most effective in controlling inflation caused by population stratification; in particular, contrary to many speculations on the effectiveness of rare variants, we did not find much added value in using only rare variants. In addition, SDR was confirmed to be more robust than PCA, especially when applied to rare variants.


Subjects
Genetics, Population , High-Throughput Nucleotide Sequencing , Principal Component Analysis , Africa/ethnology , Black People/genetics , Cluster Analysis , Europe/ethnology , Gene Frequency , Genome, Human/genetics , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide/genetics , White People/genetics
20.
Genes (Basel) ; 15(4)2024 04 22.
Article in English | MEDLINE | ID: mdl-38674455

ABSTRACT

The nomenclature of star alleles has been widely used in pharmacogenomics to enhance treatment outcomes, predict drug response variability, and reduce adverse reactions. However, the discovery of numerous rare functional variants through genome sequencing introduces complexities into the star-allele system. This study aimed to assess the nature and impact of the rapid discovery of numerous rare functional variants in the traditional haplotype-based star-allele system. We developed a new method to construct haplogroups, representing a common ancestry structure, by iteratively excluding rare and functional variants of the 25 representative pharmacogenes using the 2504 genomes from the 1000 Genomes Project. In total, 192 haplogroups and 288 star alleles were identified, with an average of 7.68 ± 4.2 cross-ethnic haplogroups per gene. Most of the haplogroups (70.8%, 136/192) were highly aligned with their corresponding classical star alleles (VI = 1.86 ± 0.78), exhibiting higher genetic diversity than the star alleles. Approximately 41.3% (N = 119) of the star alleles in the 2504 genomes did not belong to any of the haplogroups, and most of them (91.3%, 105/116) were determined by a single variant according to the allele-definition table provided by CPIC. These functional single variants had low allele frequency (MAF < 1%), high evolutionary conservation, and variant deleteriousness, which suggests significant negative selection. It is suggested that the traditional haplotype-based naming system for pharmacogenetic star alleles now needs to be adjusted by balancing both traditional haplotyping and newly emerging variant-sequencing approaches to reduce naming complexity.


Subjects
Alleles , Haplotypes , Terminology as Topic , Humans , Pharmacogenetics/methods , Gene Frequency , Genetic Variation