Results 1 - 20 of 394
1.
Cell ; 152(4): 703-13, 2013 Feb 14.
Article in English | MEDLINE | ID: mdl-23415221

ABSTRACT

Although several hundred regions of the human genome harbor signals of positive natural selection, few of the relevant adaptive traits and variants have been elucidated. Using full-genome sequence variation from the 1000 Genomes (1000G) Project and the composite of multiple signals (CMS) test, we investigated 412 candidate signals and leveraged functional annotation, protein structure modeling, epigenetics, and association studies to identify and extensively annotate candidate causal variants. The resulting catalog provides a tractable list for experimental follow-up; it includes 35 high-scoring nonsynonymous variants, 59 variants associated with expression levels of a nearby coding gene or lincRNA, and numerous variants associated with susceptibility to infectious disease and other phenotypes. We experimentally characterized one candidate nonsynonymous variant in Toll-like receptor 5 (TLR5) and show that it leads to altered NF-κB signaling in response to bacterial flagellin.


Subjects
Genetic Techniques , Human Genome , Genome-Wide Association Study , Mutation , Animals , Bacteria/metabolism , Flagellin/metabolism , HapMap Project , Humans , NF-kappa B/metabolism , Quantitative Trait Loci , Transcriptional Regulatory Elements , Signal Transduction , Toll-Like Receptor 5/genetics , Toll-Like Receptor 5/metabolism
2.
PLoS Genet ; 17(2): e1009303, 2021 02.
Article in English | MEDLINE | ID: mdl-33539374

ABSTRACT

Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation in the field is the reduced access to many genetic databases due to concerns about violations of individual privacy, although they would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrated that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with little to no privacy loss. We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection. To illustrate the promising outcomes of our method, we showed that imputation quality for low-frequency alleles can be improved by augmenting reference panels with AGs, and that the RBM latent space provides a relevant encoding of the data, hence allowing further exploration of the reference dataset and features for solving supervised tasks. Generative models and AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.


Subjects
Computer Simulation , Human Genome , Machine Learning , Population/genetics , Algorithms , Alleles , Human Chromosome Pair 15/genetics , Factual Databases , Genetic Databases , Deep Learning , HapMap Project , Humans , Markov Chains , Computer Neural Networks , Single Nucleotide Polymorphism
3.
PLoS Genet ; 17(1): e1009210, 2021 01.
Article in English | MEDLINE | ID: mdl-33428619

ABSTRACT

Modern day Saudi Arabia occupies the majority of historical Arabia, which may have contributed to ancient waves of migration out of Africa. This ancient history has left a lasting imprint in the genetics of the region, including the diverse set of tribes that call Saudi Arabia their home. How these tribes relate to each other and to the world's major populations remains an unanswered question. In an attempt to improve our understanding of the population structure of Saudi Arabia, we conducted genomic profiling of 957 unrelated individuals who self-identify with 28 large tribes in Saudi Arabia. Consistent with the tradition of intra-tribal unions, the subjects showed strong clustering along tribal lines with the distance between clusters correlating with their geographical proximities in Arabia. However, these individuals form a unique cluster when compared to the world's major populations. The ancient origin of these tribal affiliations is supported by analyses that revealed little evidence of ancestral origin from within the 28 tribes. Our results disclose a granular map of population structure and have important implications for future genetic studies into Mendelian and common diseases in the region.


Subjects
Arabs/genetics , Human Genome/genetics , Population Groups/genetics , Africa/epidemiology , Arabia/epidemiology , Arabs/history , Asia/epidemiology , Europe/epidemiology , Female , HapMap Project , Haplotypes/genetics , Ancient History , Humans , Inbreeding , Male , Population Groups/history , Principal Component Analysis , Saudi Arabia/epidemiology
4.
Bioinformatics ; 38(2): 318-324, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34601584

ABSTRACT

MOTIVATION: Tea is a cross-pollinated woody perennial plant, which limits the application of conventional breeding for its genetic improvement. Moreover, the lack of genome-wide high-density SNP markers and genome-wide haplotype information has greatly hampered the use of tea genetic resources in fast-track tea breeding programs. To address this challenge, we have generated a first-generation haplotype map of tea (Tea HapMap-1). The out-crossing and highly heterozygous nature of tea plants makes DNA-level variant discovery more complicated. RESULTS: In this study, whole-genome re-sequencing data of 369 tea genotypes were used to generate 2,334,564 biallelic SNPs and 1,447,985 InDels. Around 2928.04 million paired-end reads were used, with an average mapping depth of ∼0.31× per accession. The polymorphic sites identified in this study will be useful for mapping the genomic regions responsible for important traits of tea. These resources lay the foundation for future research to understand the genetic diversity within tea germplasm and to utilize genes that determine tea quality. They will further facilitate the understanding of tea genome evolution and tea metabolite pathways, thus enabling effective germplasm utilization for breeding tea varieties. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Camellia sinensis , Camellia sinensis/genetics , Haplotypes , HapMap Project , Plant Breeding , Tea , Plant Genome
5.
Hum Genet ; 139(8): 1107-1117, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32270270

ABSTRACT

Extensive studies have been conducted on the analysis of genome function, especially on expression quantitative trait loci (eQTL). These studies offered promising results for characterizing functional sequence variation and understanding the basic processes of gene regulation. Parent-of-origin effect (POE) is an important epigenetic phenomenon in which the expression of certain genes depends on the parent from which the allele was inherited, and it is known to play important roles in human complex diseases. However, traditional eQTL mapping approaches do not allow for the detection of imprinting, or they focus on modeling the additive genetic effect, thereby ignoring the estimation of the dominance genetic effect. In this study, we proposed a statistical framework to test the additive and dominance genetic effects of candidate eQTLs along with detection of POE, using a functional model and an orthogonal model for RNA-seq data. We demonstrated the desirable power and well-controlled Type I error of the methods in most scenarios, especially for the orthogonal model, with unbiased estimation of the genetic effects and of the over-dispersion of the RNA-seq data. The application to a HapMap project trio dataset validated existing imprinting genes and discovered two novel imprinting genes with potential dominance genetic effects, RB1 and IGF1R. This study provides new insights into the next generation of statistical modeling for eQTL mapping, for a better understanding of the genetic architecture underlying the mechanisms of gene expression regulation.


Subjects
Gene Expression Regulation/genetics , Genomic Imprinting/genetics , Genetic Models , Statistical Models , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci/genetics , Alleles , Child , Computer Simulation , Family , Female , Dominant Genes/genetics , Genotype , HapMap Project , Humans , Male , Parents , RNA-Seq
6.
Am J Hum Genet ; 100(4): 581-591, 2017 Apr 06.
Article in English | MEDLINE | ID: mdl-28285767

ABSTRACT

Efforts to decipher the causal relationships between differences in gene regulation and corresponding differences in phenotype have been stymied by several basic technical challenges. Although detecting local, cis-eQTLs is now routine, trans-eQTLs, which are distant from the genes of origin, are far more difficult to find because millions of SNPs must currently be compared to thousands of transcripts. Here, we demonstrate an alternative approach: we looked for SNPs associated with the expression of many genes simultaneously and found that hundreds of trans-eQTLs each affect hundreds of transcripts in lymphoblastoid cell lines across three African populations. These trans-eQTLs target the same genes across the three populations and show the same direction of effect. We discovered that target transcripts of a high-confidence set of trans-eQTLs encode proteins that interact more frequently than expected by chance, are bound by the same transcription factors, and are enriched for pathway annotations indicative of roles in basic cell homeostasis. We thus demonstrate that our approach can uncover trans-acting transcriptional control circuits that affect co-regulated groups of genes: a key to understanding how cellular pathways and processes are orchestrated.


Subjects
Gene Expression Regulation , Quantitative Trait Loci , Genetic Transcription , Algorithms , Black People/genetics , Cell Line , Gene Expression Profiling , HapMap Project , Humans , Single Nucleotide Polymorphism , Protein Interaction Maps
7.
Am J Hum Genet ; 100(2): 228-237, 2017 02 02.
Article in English | MEDLINE | ID: mdl-28065468

ABSTRACT

We analyzed the mRNA levels for 36,778 transcript expression traits (probes) from 2,765 individuals to comprehensively investigate the genetic architecture and degree of missing heritability for gene expression in peripheral blood. We identified 11,204 cis and 3,791 trans independent expression quantitative trait loci (eQTL) by using linear mixed models to perform genome-wide association analyses. Furthermore, using information on both closely and distantly related individuals, heritability was estimated for all expression traits. Of the set of expressed probes (15,966), 10,580 (66%) had an estimated narrow-sense heritability (h²) greater than zero with a mean (median) value of 0.192 (0.142). Across these probes, on average the proportion of genetic variance explained by all eQTL (h²_COJO) was 31% (0.060/0.192), meaning that 69% is missing, with the sentinel SNP of the largest eQTL explaining 87% (0.052/0.060) of the variance attributed to all identified cis- and trans-eQTL. For the same set of probes, the genetic variance attributed to genome-wide common (MAF > 0.01) HapMap 3 SNPs (h²_g) accounted for on average 48% (0.093/0.192) of h². Taken together, the evidence suggests that approximately half the genetic variance for gene expression is not tagged by common SNPs, and of the variance that is tagged by common SNPs, a large proportion can be attributed to identifiable eQTL of large effect, typically in cis. Finally, we present evidence that, compared with a meta-analysis, using individual-level data results in an increase of approximately 50% in power to detect eQTL.
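The percentages quoted in the abstract above are ratios of the reported mean variance components. The short Python check below uses only numbers given in the abstract and reproduces those percentages:

```python
# Mean variance components quoted in the abstract, averaged over the
# 10,580 expressed probes with an estimated h² > 0.
H2_TOTAL = 0.192     # narrow-sense heritability, h²
H2_COJO = 0.060      # variance explained by all identified cis- and trans-eQTL, h²_COJO
H2_SENTINEL = 0.052  # variance explained by the sentinel SNP of the largest eQTL
H2_SNP = 0.093       # variance tagged by common (MAF > 0.01) HapMap 3 SNPs, h²_g


def pct(part: float, whole: float) -> int:
    """Return part / whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


heritable_probes = pct(10_580, 15_966)      # share of expressed probes with h² > 0
explained_by_eqtl = pct(H2_COJO, H2_TOTAL)  # share of h² captured by identified eQTL
missing = 100 - explained_by_eqtl           # "missing" heritability
sentinel_share = pct(H2_SENTINEL, H2_COJO)  # share of eQTL variance from the sentinel SNP
tagged_by_snps = pct(H2_SNP, H2_TOTAL)      # share of h² tagged by common SNPs
not_tagged = 100 - tagged_by_snps           # the "approximately half" not tagged

print(heritable_probes, explained_by_eqtl, missing, sentinel_share, tagged_by_snps, not_tagged)
```

Running it prints `66 31 69 87 48 52`. These values match the 66%, 31%, 69%, 87% and 48% figures and the "approximately half" of genetic variance not tagged by common SNPs. As the abstract's own fractions show, each percentage is a ratio of averages across probes.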


Subjects
Gene Expression , Inheritance Patterns , Quantitative Trait Loci , Messenger RNA/blood , Genetic Association Studies , Human Genome , Genotype , HapMap Project , Humans , Linear Models , Linkage Disequilibrium , Genetic Models , Phenotype , Single Nucleotide Polymorphism , Messenger RNA/genetics
8.
Brief Bioinform ; 19(1): 89-100, 2018 01 01.
Article in English | MEDLINE | ID: mdl-27760738

ABSTRACT

At present, understanding of DNA methylation at the population level is still limited. Here, we first extended the classical framework of population genetics (single nucleotide polymorphism allele frequency, linkage disequilibrium (LD), LD blocks and haplotypes) to epigenetics. Then, as an example, we compared DNA methylation disequilibrium (MD) maps between the HapMap CEU (Caucasian residents of European ancestry from Utah) and YRI (Yoruba people from Ibadan) populations, using lymphoblastoid cell lines. We analyzed the differences and similarities between CEU and YRI in terms of SMP (single methylation polymorphism) allele frequency, SMP allele association, MD, MD blocks and methylation haplotype (meplotype) frequency. The results showed that CEU and YRI had similar distributions of SMP allele frequency and shared many MD block regions. We believe that the framework of population genetics can be applied to population epigenetics. The population epigenetic framework also has potential applications in the study of complex diseases, such as epigenome-wide association studies.
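The abstract applies the standard two-locus LD statistics to single methylation polymorphisms (SMPs). The sketch below shows that idea under two assumptions. First, each SMP is coded as 1 (methylated) or 0 (unmethylated) on the same set of haplotypes or reads. Second, D, D′ and r² are computed exactly as for biallelic SNPs. The function name and input format are hypothetical and are not the authors' implementation:

```python
from typing import Sequence


def methylation_disequilibrium(a: Sequence[int], b: Sequence[int]) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two binary methylation sites.

    Hypothetical helper: a[i] and b[i] are 1 if site A / site B is methylated
    on haplotype (or read) i, else 0. Formulas are the usual two-locus LD ones.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty vectors of equal length")
    p_a = sum(a) / n                               # methylated frequency at site A
    p_b = sum(b) / n                               # methylated frequency at site B
    p_ab = sum(x * y for x, y in zip(a, b)) / n    # both sites methylated
    d = p_ab - p_a * p_b                           # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0       # squared correlation r^2
    # D' normalises D by its maximum attainable magnitude given the frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


print(methylation_disequilibrium([1, 1, 0, 0, 1], [1, 1, 0, 0, 0]))
```

Pairs of sites with high r² could then be grouped into MD blocks, by analogy with LD blocks. The abstract does not specify the block-definition rule, so that step is omitted here.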


Subjects
Black People/genetics , Genetic Epigenesis , Population Genetics/methods , Single Nucleotide Polymorphism , White People/genetics , Alleles , Cultured Cells , DNA Methylation , HapMap Project , Haplotypes , Humans , Linkage Disequilibrium , Lymphocytes/cytology , Lymphocytes/metabolism
9.
Int J Legal Med ; 134(1): 123-134, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31760471

ABSTRACT

Ancestry-informative markers (AIMs) can be used to infer the ancestry of an individual to minimize the inaccuracy of self-reported ethnicity in biomedical research. In this study, we describe three methods for selecting AIM SNPs for the Malay population (Malay AIM panel), based on pairwise FST, informativeness for assignment (I_n), and PCA-correlated SNPs (PCAIMs). These Malay AIM panels were extracted from SNP-array genotype data hosted by the Malaysian node of the Human Variome Project (MyHVP) and the Singapore Genome Variation Project (SGVP). In particular, genotype data from a total of 165 Malay individuals were analyzed: 117 individuals genotyped on the Affymetrix SNP-6 array platform and 48 individuals genotyped on the Illumina OMNI 2.5 array platform. The HapMap phase 3 database (1397 individuals from 11 populations) was used as a reference for comparison with the Malay genotype data. The accuracy of each resulting Malay AIM panel was evaluated using a machine learning "ancestry-predictive model" constructed with WEKA, a comprehensive machine learning platform written in Java. A total of 1250 SNPs were finally selected, which distinguished Malay individuals from other world populations with an accuracy of 90%. With the pairwise FST method, the accuracy decreased to 80% using 157 SNPs, while a panel of 200 SNPs selected using I_n and PCAIMs identified Malay individuals with an accuracy of approximately 80%.
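The abstract does not say which FST estimator underlies the pairwise FST panel. The hypothetical sketch below assumes Wright's FST computed directly from the allele frequencies of two populations, with equal weights and no sample-size correction. It ranks SNPs by differentiation between a target and a reference population and keeps the top k as candidate AIMs. The SNP IDs and frequencies are placeholders, not data from the study:

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic SNP from two population allele frequencies.

    With equal population weights, (H_T - H_S) / H_T simplifies to
    (p1 - p2)^2 / (4 * pbar * (1 - pbar)).
    """
    pbar = (p1 + p2) / 2
    if pbar in (0.0, 1.0):
        return 0.0  # monomorphic in both populations: no differentiation
    return (p1 - p2) ** 2 / (4 * pbar * (1 - pbar))


def top_aims(freqs: dict[str, tuple[float, float]], k: int) -> list[str]:
    """Return the k SNP IDs with the highest pairwise F_ST (hypothetical helper)."""
    ranked = sorted(freqs, key=lambda snp: wright_fst(*freqs[snp]), reverse=True)
    return ranked[:k]


# Hypothetical allele frequencies: (target population, reference population).
example = {
    "snpA": (0.90, 0.10),
    "snpB": (0.50, 0.45),
    "snpC": (0.20, 0.70),
}
print(top_aims(example, 2))
```

With these placeholder frequencies the ranking is snpA (FST = 0.64), snpC (≈ 0.25) and snpB (≈ 0.0025), so `['snpA', 'snpC']` is printed. Published estimators such as Weir and Cockerham's also correct for sample size, which matters for modest samples like the 165 Malay individuals analyzed here.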


Subjects
Genetic Databases , Ethnicity/genetics , Population Genetics/methods , Genotype , Single Nucleotide Polymorphism , Asian People/genetics , Genetic Markers , HapMap Project , Humans , Malaysia/ethnology , Statistical Models , Native Hawaiian or Other Pacific Islander/genetics , Principal Component Analysis , Singapore
10.
PLoS Genet ; 13(6): e1006328, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28640878

ABSTRACT

Traditional genome-wide scans for positive selection have mainly uncovered selective sweeps associated with monogenic traits. While selection on quantitative traits is much more common, very few such signals have been detected because of their polygenic nature. We searched for positive selection signals underlying coronary artery disease (CAD) in worldwide populations, using novel approaches to quantify relationships between polygenic selection signals and CAD genetic risk. We identified new candidate adaptive loci that appear to have been directly modified by disease pressures given their significant associations with CAD genetic risk. These candidates were all uniquely and consistently associated with many different male and female reproductive traits, suggesting that selection may also have targeted them because of their direct effects on fitness. We found that CAD loci are significantly enriched for lifetime reproductive success relative to the rest of the human genome, with evidence that the relationship between CAD and lifetime reproductive success is antagonistic. This supports the presence of antagonistic-pleiotropic tradeoffs on CAD loci and provides a novel explanation for the maintenance and high prevalence of CAD in modern humans. Lastly, using HapMap3 lymphoblastoid cell lines, we found that positive selection more often targeted CAD gene regulatory variants, which further highlights the unique biological significance of candidate adaptive loci underlying CAD. Our study provides a novel approach for detecting selection on polygenic traits and evidence that modern human genomes have evolved in response to CAD-induced selection pressures and other early-life traits sharing pleiotropic links with CAD.


Subjects
Coronary Artery Disease/genetics , Genetic Loci , Genetic Pleiotropy , Genetic Selection , Genetic Fitness , HapMap Project , Humans , Single Nucleotide Polymorphism
11.
Genome Res ; 26(11): 1565-1574, 2016 11.
Article in English | MEDLINE | ID: mdl-27646535

ABSTRACT

Haplotypes are fundamental to fully characterize the diploid genome of an individual, yet methods to directly chart the unique genetic makeup of each parental chromosome are lacking. Here we introduce single-cell DNA template strand sequencing (Strand-seq) as a novel approach to phasing diploid genomes along the entire length of all chromosomes. We demonstrate this by building a complete haplotype for a HapMap individual (NA12878) at high accuracy (concordance 99.3%), without using generational information or statistical inference. By use of this approach, we mapped all meiotic recombination events in a family trio with high resolution (median range ∼14 kb) and phased larger structural variants like deletions, indels, and balanced rearrangements like inversions. Lastly, the single-cell resolution of Strand-seq allowed us to observe loss of heterozygosity regions in a small number of cells, a significant advantage for studies of heterogeneous cell populations, such as cancer cells. We conclude that Strand-seq is a unique and powerful approach to completely phase individual genomes and map inheritance patterns in families, while preserving haplotype differences between single cells.


Subjects
Chromosome Mapping/methods , Human Chromosomes/genetics , Haplotypes , Single-Cell Analysis/methods , Cell Line , HapMap Project , Homologous Recombination , Humans , Lymphocytes/cytology , Lymphocytes/metabolism , Mutation
12.
Nature ; 499(7456): 79-82, 2013 Jul 04.
Article in English | MEDLINE | ID: mdl-23676674

ABSTRACT

Gene expression differs among individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analysed extensively in human populations, our knowledge is limited regarding the differences in human protein abundance and the genetic basis for this difference. Variation in messenger RNA expression is not a perfect surrogate for protein expression because the latter is influenced by an array of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest. Here we used isobaric tag-based quantitative mass spectrometry to determine relative protein levels of 5,953 genes in lymphoblastoid cell lines from 95 diverse individuals genotyped in the HapMap Project. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations and sexes. Levels of specific sets of proteins involved in the same biological process covary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high-throughput human proteome quantification that, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.


Subjects
Gene Expression Profiling , Gene Expression Regulation/genetics , Phenotype , Protein Biosynthesis , Proteome/analysis , Proteome/genetics , Cell Line , Ethnicity/genetics , Female , Genetic Variation , Genotype , HapMap Project , Humans , Male , Mass Spectrometry , Proteome/biosynthesis , Proteomics , Quantitative Trait Loci , Messenger RNA/analysis , Messenger RNA/genetics , Transcriptome
13.
Int J Immunogenet ; 46(2): 49-58, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30659741

ABSTRACT

Allele-specific analyses of frequency differences across populations, particularly populations that are not well studied, are important to help identify variants that may have a functional effect on disease mechanisms and phenotypic predisposition, facilitating new genome-wide association studies (GWAS). We aimed to compare the allele frequencies of 11 asthma-associated and 16 liver disease-associated single nucleotide polymorphisms (SNPs) between the Estonian population (Estonian Genome Center, University of Tartu; EGCUT) and the HapMap and 1000 Genomes Project populations. When comparing EGCUT with HapMap populations, the largest difference in allele frequencies was observed with the Maasai population in Kinyawa, Kenya, with 12 SNPs showing statistically significant differences. Similarly, when comparing EGCUT with 1000 Genomes Project populations, the largest difference in allele frequencies was observed with the pooled African populations, with 22 SNPs showing statistically significant differences. For these 11 asthma-associated and 16 liver disease-associated SNPs, Estonians are genetically similar to other European populations but significantly different from African populations. Understanding differences in genetic architecture between ethnic populations is important to facilitate new GWAS targeted at underserved ethnic groups, enabling novel genetic findings that aid the development of new therapies to reduce morbidity and mortality.


Subjects
Asthma/genetics , Gene Frequency/genetics , Population Genetics , Human Genome , HapMap Project , Liver Diseases/genetics , Single Nucleotide Polymorphism/genetics , Estonia , Humans
14.
J Autoimmun ; 94: 83-89, 2018 11.
Article in English | MEDLINE | ID: mdl-30143393

ABSTRACT

Genome-wide association studies (GWAS) have identified a large number of genetic risk loci for autoimmune diseases. However, the functional variants underlying these disease associations remain largely unknown. There is evidence that microRNA-mediated regulation may play an important role in this context. Therefore, we assessed whether autoimmune disease loci exert their effects by altering microRNA expression in relevant immune cells. To this end, we performed comprehensive integration of many large, publicly available datasets to combine information on autoimmune disease risk loci with RNA-Seq-based microRNA expression data. Specifically, we carried out microRNA expression quantitative trait loci (eQTL) analyses across 115 GWAS regions associated with 12 autoimmune diseases using next-generation sequencing data of 345 lymphoblastoid cell lines. Statistical analyses included the application and extension of a recently proposed framework (joint likelihood mapping, JLIM) to microRNA expression data, and microRNA target gene enrichment analyses of relevant GWAS data. Overall, according to JLIM, only a minority of autoimmune disease risk loci may exert their pathophysiologic effects by altering microRNA expression. However, detailed functional fine-mapping revealed two independent GWAS regions harboring autoimmune disease risk SNPs with significant effects on microRNA expression. These relate to SNPs associated with Crohn's disease (CD; rs102275) and rheumatoid arthritis (RA; rs968567), which affect the expression of miR-1908-5p (p_rs102275 = 1.44e-20, p_rs968567 = 2.54e-14). In addition, an independent CD risk SNP, rs3853824, was found to alter the expression of miR-3614-5p (p = 5.70e-7). To support these findings, we demonstrate that GWAS signals for RA and CD were enriched in genes predicted to be targeted by both microRNAs (all with p < 0.05). In summary, our study points towards a potential pathophysiological role of miR-1908-5p and miR-3614-5p in autoimmunity.


Subjects
Rheumatoid Arthritis/genetics , Crohn Disease/genetics , Lymphocytes/immunology , MicroRNAs/genetics , Rheumatoid Arthritis/diagnosis , Rheumatoid Arthritis/immunology , Rheumatoid Arthritis/pathology , Cell Line , Computational Biology/methods , Crohn Disease/diagnosis , Crohn Disease/immunology , Crohn Disease/pathology , Datasets as Topic , Gene Expression Profiling , Gene Expression Regulation , Genome-Wide Association Study , HapMap Project , Humans , Lymphocytes/pathology , MicroRNAs/immunology , Quantitative Trait Loci , Risk
15.
J Hum Genet ; 63(4): 533-536, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29410509

ABSTRACT

Discoveries from the Human Genome, HapMap, and 1000 Genomes projects have collectively contributed to the creation of a catalog of human genetic variation that has improved our understanding of human diversity. Despite the collegial nature of many of these genome consortia, which has led to the cataloging of genetic variation in ethnic groups from around the world, genome data on the Arab population remain overwhelmingly underrepresented. The National Arab Genome project in the United Arab Emirates (UAE) aims to address this deficiency by using Next Generation Sequencing (NGS) technology to provide data that improve our understanding of the Arab genome and to catalog variants that are unique to the Arab population of the UAE. The project was conceived to shed light on the similarities and differences between the Arab genome and those of other ethnic groups.


Subjects
Arabs/genetics , Population Genetics , Human Genome , Genomics , Genomics/methods , HapMap Project , Humans , United Arab Emirates
16.
BMC Bioinformatics ; 18(1): 258, 2017 May 12.
Article in English | MEDLINE | ID: mdl-28499414

ABSTRACT

BACKGROUND: Several recent studies showed that next-generation sequencing (NGS)-based human leukocyte antigen (HLA) typing is a feasible and promising technique for variant calling in highly polymorphic regions. To date, however, no method with sufficient read depth has completely solved the allele phasing issue. In this study, we developed a new method (HLAscan) for HLA genotyping using NGS data. RESULTS: HLAscan aligns reads to HLA sequences from the international ImMunoGeneTics project/human leukocyte antigen (IMGT/HLA) database. The distribution of aligned reads was used to calculate a score function that determines correctly phased alleles by progressively removing false-positive alleles. Comparative HLA typing tests using public datasets from the 1000 Genomes Project and the International HapMap Project demonstrated that HLAscan could perform HLA typing more accurately than previously reported NGS-based methods such as HLAreporter and PHLAT. In addition, the results of HLA-A, -B, and -DRB1 typing by HLAscan using data generated by NextGen were identical to those obtained using a Sanger sequencing-based method. We also applied HLAscan to a family dataset with various coverage depths generated on the Illumina HiSeq X-TEN platform. HLAscan identified allele types of HLA-A, -B, -C, -DQB1, and -DRB1 with 100% accuracy for sequences at ≥ 90× depth, and the overall accuracy was 96.9%. CONCLUSIONS: HLAscan, an alignment-based program that takes read distribution into account to determine true allele types, outperformed previously developed HLA typing tools. Therefore, HLAscan can be reliably applied to determine HLA types from whole-genome, exome, and targeted sequencing data.


Subjects
HLA Antigens/genetics , Histocompatibility Testing/methods , Alleles , Area Under Curve , Exons , Genotype , HLA Antigens/chemistry , HLA Antigens/metabolism , HLA-A Antigens/chemistry , HLA-A Antigens/genetics , HLA-A Antigens/metabolism , HLA-B Antigens/chemistry , HLA-B Antigens/metabolism , HLA-DRB1 Chains/chemistry , HLA-DRB1 Chains/genetics , HLA-DRB1 Chains/metabolism , HapMap Project , High-Throughput Nucleotide Sequencing , Humans , ROC Curve , DNA Sequence Analysis
17.
Genome Res ; 24(4): 664-72, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24614977

ABSTRACT

The development of high-throughput genomic technologies has impacted many areas of genetic research. While many applications of these technologies focus on the discovery of genes involved in disease from population samples, applications of genomic technologies to an individual's genome or personal genomics have recently gained much interest. One such application is the identification of relatives from genetic data. In this application, genetic information from a set of individuals is collected in a database, and each pair of individuals is compared in order to identify genetic relatives. An inherent issue that arises in the identification of relatives is privacy. In this article, we propose a method for identifying genetic relatives without compromising privacy by taking advantage of novel cryptographic techniques customized for secure and private comparison of genetic information. We demonstrate the utility of these techniques by allowing a pair of individuals to discover whether or not they are related without compromising their genetic information or revealing it to a third party. The idea is that individuals only share enough special-purpose cryptographically protected information with each other to identify whether or not they are relatives, but not enough to expose any information about their genomes. We show in HapMap and 1000 Genomes data that our method can recover first- and second-order genetic relationships and, through simulations, show that our method can identify relationships as distant as third cousins while preserving privacy.


Subjects
Genetic Privacy , Genetic Research , Human Genome , Family , Genomics , HapMap Project , Human Genome Project , Humans
18.
Heredity (Edinb) ; 118(5): 503-510, 2017 05.
Article in English | MEDLINE | ID: mdl-28198814

ABSTRACT

To infer the histories of population admixture, one important challenge for methods based on admixture linkage disequilibrium (ALD) is to remove the effect of source LD (SLD), which is directly inherited from the source populations. In previous methods, only the decay curve of weighted LD between pairs of sites whose genetic distance was larger than a certain starting distance was fitted by single or multiple exponential functions, for the inference of recent single- or multiple-wave admixture. However, the effect of SLD has not been well defined, and no tool has been developed to estimate the effect of SLD on weighted LD decay. In this study, we defined the SLD in the formularized weighted LD statistic under the two-way admixture model and proposed a polynomial spectrum (p-spectrum) to study the weighted SLD and weighted LD. We also found that reference populations could be used to reduce the SLD in weighted LD statistics. We further developed a method, iMAAPs, to infer multiple-wave admixture by fitting ALD using a p-spectrum. We evaluated the performance of iMAAPs under various admixture models in simulated data and applied iMAAPs to the analysis of genome-wide single nucleotide polymorphism data from the Human Genome Diversity Project and the HapMap Project. We showed that iMAAPs is a considerable improvement over other current methods and further facilitates the inference of histories of complex population admixtures.
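The earlier approach described in the abstract fits the decay of weighted LD with genetic distance by exponential functions. The sketch below covers only the single-exponential case and is not the iMAAPs p-spectrum method. It assumes weighted LD(d) ≈ A·exp(−n·d) for a single admixture pulse n generations ago, with d in Morgans, all values positive and no background constant. It fits the curve by linear least squares on the log scale. The synthetic curve and its parameters are hypothetical:

```python
import math


def fit_single_exponential(dists_morgan: list[float], weighted_ld: list[float]) -> tuple[float, float]:
    """Fit weighted_ld ≈ A * exp(-n * d) by least squares on log(weighted_ld).

    Returns (A, n); n estimates generations since a single admixture pulse.
    Assumes every weighted LD value is positive and any constant background
    term is negligible.
    """
    xs = list(dists_morgan)
    ys = [math.log(y) for y in weighted_ld]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of log LD against distance = -n
    intercept = y_mean - slope * x_mean    # intercept = log A
    return math.exp(intercept), -slope


# Synthetic, noise-free curve for a hypothetical pulse 20 generations ago,
# sampled from a starting distance of 0.5 cM (0.005 M) up to 20 cM.
d = [0.005 + 0.005 * i for i in range(40)]
ld = [0.01 * math.exp(-20 * x) for x in d]
amp, gens = fit_single_exponential(d, ld)
print(round(amp, 4), round(gens, 2))
```

Because the synthetic curve is noise-free, the fit recovers A = 0.01 and n = 20 up to floating-point error. Real curves also carry source LD at short distances. That is why such fits start above a minimum distance, and why the abstract focuses on modeling SLD explicitly.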


Subjects
Genetics, Population/methods , Linkage Disequilibrium , Models, Genetic , Algorithms , Computer Simulation , Genome, Human , HapMap Project , Humans
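The iMAAPs abstract contrasts its p-spectrum approach with earlier methods that fit the decay of weighted LD with exponential functions beyond a starting distance. As an illustration of that earlier single-wave idea only (not of iMAAPs or its p-spectrum), the sketch below fits w(d) = A * exp(-n * d) by log-linear least squares, where d is genetic distance in Morgans and n is read as the number of generations since admixture. The function name, the default starting distance of 0.005 Morgans, and the omission of a constant offset term are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def fit_single_wave_decay(
    distances_morgans: Sequence[float],
    weighted_ld: Sequence[float],
    start_distance: float = 0.005,  # hypothetical cutoff, in Morgans
) -> Tuple[float, float]:
    """Fit w(d) = A * exp(-n * d) for d > start_distance.

    Uses ordinary least squares on log(w) versus d and returns (n, A),
    where n estimates generations since a single admixture wave.
    Assumes no constant offset and positive weighted LD values.
    """
    xs, ys = [], []
    for d, w in zip(distances_morgans, weighted_ld):
        # Skip short distances, where source LD (SLD) inflates weighted LD.
        if d > start_distance and w > 0:
            xs.append(d)
            ys.append(math.log(w))
    if len(xs) < 2:
        raise ValueError("need at least two points beyond start_distance")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # log(w) = log(A) - n * d, so the decay rate is the negated slope.
    return -slope, math.exp(intercept)
```

For noise-free input generated with n = 20 and A = 0.02, the fit recovers those values up to floating-point error; real weighted LD curves are noisy and may need a constant term or a nonlinear fit.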
19.
PLoS Comput Biol ; 12(1): e1004714, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26808494

ABSTRACT

Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which not only offers a significant power improvement over more traditional enrichment strategies but also eliminates the problem of arbitrary threshold selection inherent in any binary membership-based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control but also yields more biologically meaningful discoveries.


Subjects
Algorithms , Computational Biology/methods , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , HapMap Project , Humans , Phenotype , Software
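The Pascal abstract describes gene scores built from the sum and the maximum of per-SNP chi-squared statistics, and a modified Fisher method for pathways. The sketch below shows only the textbook ingredients, under the simplifying assumption that SNPs (and genes) are independent: it computes the two gene statistics from SNP z-scores, a p-value for the maximum statistic, and the classical Fisher combination of p-values. Pascal itself accounts for LD between SNPs and uses a modified Fisher method, neither of which is reproduced here; all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def gene_chi2_statistics(snp_z_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (sum, max) of the per-SNP chi-squared statistics z**2 for one gene."""
    chi2 = [z * z for z in snp_z_scores]
    return sum(chi2), max(chi2)


def max_chi2_pvalue_independent(max_chi2: float, n_snps: int) -> float:
    """P(max of n_snps independent chi2(1) variables >= max_chi2).

    Assumes independent SNPs; this ignores LD, which Pascal models.
    """
    # For one SNP, P(chi2_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    p_single = math.erfc(math.sqrt(max_chi2 / 2.0))
    return 1.0 - (1.0 - p_single) ** n_snps


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Classical Fisher's method: -2 * sum(log p) ~ chi2 with 2k df.

    Valid when the k p-values are independent under the null.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # Closed-form survival function for chi2 with even df 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With a single p-value, Fisher's statistic reduces to that p-value, which gives a quick sanity check: `fisher_combined_pvalue([0.3])` returns 0.3 up to rounding.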
20.
Indian J Med Res ; 145(6): 753-757, 2017 Jun.
Article in English | MEDLINE | ID: mdl-29067977

ABSTRACT

BACKGROUND & OBJECTIVES: Indian data have been largely missing from genome-wide databases that provide information on genetic variation in different populations, which hinders association studies of complex disorders in India. This study aimed to determine whether the complex genetic structure and endogamy among Indians could influence the design of case-control studies for autoimmune disorders in the south Indian population. METHODS: Twelve single nucleotide variations (SNVs) related to genes associated with autoimmune disorders were genotyped in 370 healthy individuals belonging to six different caste groups in southern India. Allele frequencies were estimated, and genetic divergence and phylogenetic relationships among the caste groups and other HapMap populations were assessed. RESULTS: Allele frequencies of the genotyped SNVs did not differ significantly among the groups studied. Wright's FST was 0.001 per cent among the study population and 0.38 per cent in comparison with the Gujarati Indians in Houston (GIH) population from HapMap. Analysis of molecular variance attributed 97 per cent of the variation to differences within the study population and <1 per cent to differences between castes. Phylogenetic analysis separated the Dravidian population from other HapMap populations, particularly from the GIH population. INTERPRETATION & CONCLUSIONS: Despite the complex genetic origins of the Indian population, our study indicated a low level of genetic differentiation among Dravidian language-speaking people of south India. Case-control association studies among Dravidians of south India may not require stratification by language and caste.


Subjects
Autoimmune Diseases/genetics , Genetic Association Studies , Genetics, Population , Phylogeny , Adult , Autoimmune Diseases/epidemiology , Autoimmune Diseases/pathology , DNA, Mitochondrial/genetics , Ethnicity/genetics , Female , Gene Frequency , Genetic Variation/genetics , Genotype , HapMap Project , Haplotypes , Humans , India/epidemiology , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics
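The Indian J Med Res abstract reports Wright's FST as a percentage. One common way to estimate it from allele frequencies is Nei's heterozygosity-based form, FST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S is the mean expected heterozygosity within subpopulations. The sketch below applies this to biallelic SNPs and combines SNPs as a ratio of sums. The abstract does not state which estimator or software the authors used, so the estimator choice, the equal weighting of subpopulations, and the function name are assumptions.

```python
from typing import Sequence


def fst_biallelic(subpop_freqs_per_snp: Sequence[Sequence[float]]) -> float:
    """Multi-locus FST = sum(H_T - H_S) / sum(H_T) over biallelic SNPs.

    subpop_freqs_per_snp[j][i] is the frequency of one allele at SNP j in
    subpopulation i. Subpopulations are weighted equally (an assumption).
    """
    num = 0.0
    den = 0.0
    for freqs in subpop_freqs_per_snp:
        p_bar = sum(freqs) / len(freqs)
        # Expected heterozygosity of the pooled population.
        h_t = 2.0 * p_bar * (1.0 - p_bar)
        # Mean expected heterozygosity within subpopulations.
        h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
        num += h_t - h_s
        den += h_t
    # Monomorphic data (H_T = 0 everywhere) shows no differentiation.
    return num / den if den > 0 else 0.0
```

Multiplying the returned value by 100 expresses it in per cent, the unit used in the abstract.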