Results 1 - 20 of 392
1.
Sci Rep ; 14(1): 4654, 2024 02 26.
Article in English | MEDLINE | ID: mdl-38409353

ABSTRACT

Admixture mapping has been useful in identifying genetic variations linked to phenotypes, adaptation and diseases. Copy number variations (CNVs) represent genomic structural variants spanning large regions of chromosomes, reaching several megabases. In this investigation, the "Canary" algorithm was applied to 102 Tunisian samples and 991 individuals from eleven HapMap III populations to genotype 1279 copy number polymorphisms (CNPs). In the present work, we investigate the Tunisian population structure using the CNP markers previously identified among Tunisians. The study revealed that Sub-Saharan African populations exhibited the highest diversity with the highest proportions of allelic CNPs. Among all the African populations, Tunisia showed the least diversity. Individual ancestry proportions computed using STRUCTURE analysis revealed a major European component among Tunisians with lesser contributions from Sub-Saharan Africa and Asia. Population structure analysis indicated genetic proximity to Europeans and a noticeable distance from the Sub-Saharan African and East Asian clusters. Seven genes harbouring high-frequency Tunisian CNPs were identified that are known to be associated with 9 Mendelian diseases and/or phenotypes. Functional annotation of genes under selection highlighted a noteworthy enrichment of biological processes related to receptor pathway and activity as well as glutathione metabolism. Additionally, pathways of potential concern for health such as drug metabolism, infectious diseases and cancers exhibited significant enrichment. The distinctive genetic makeup of the Tunisians might have been influenced by various factors including natural selection and genetic drift, resulting in the development of distinct genetic variations playing roles in specific biological processes. Our research provides a justification for focusing on the exclusive genome organization of this population and uncovers previously overlooked elements of the genome.


Subjects
DNA Copy Number Variations , Genome , North African People , Humans , HapMap Project , Genotype , Genetics, Population , Polymorphism, Single Nucleotide
2.
Sci Data ; 10(1): 198, 2023 04 10.
Article in English | MEDLINE | ID: mdl-37037860

ABSTRACT

Honey bee, Apis mellifera, drones are typically haploid, developing from an unfertilized egg, inheriting only their queen's alleles and none from the many drones she mated with. Thus the ordered combination or 'phase' of alleles is known, making drones a valuable haplotype resource. We collated whole-genome sequence data for 1,407 drones, including 45 newly sequenced Scottish drones, collectively representing 19 countries, 8 subspecies and various hybrids. Following alignment to Amel_HAv3.1, variant calling and quality filtering, we retained 17.4 M high quality variants across 1,328 samples with a genotyping rate of 98.7%. We demonstrate the utility of this haplotype resource, AmelHap, for genotype imputation, returning >95% concordance when up to 61% of data is missing in haploids and up to 12% of data is missing in diploids. AmelHap will serve as a useful resource for the community for imputation from low-depth sequencing or SNP chip data, accurate phasing of diploids for association studies, and as a comprehensive reference panel for population genetic and evolutionary analyses.


Subjects
Bees , Genome, Insect , Animals , Female , Base Sequence , Bees/genetics , Biological Evolution , Genotype , HapMap Project
3.
Genetics ; 222(4)2022 11 30.
Article in English | MEDLINE | ID: mdl-36171678

ABSTRACT

Whole-exome sequencing (WES) enables the detection of copy number variants (CNVs) with high resolution in protein-coding regions. However, variants in the intergenic or intragenic regions are excluded from studies. Fortunately, many of these samples have been previously genotyped on other platforms, such as SNP arrays, which are sparse but cover a wide range of genomic regions. Moreover, conventional single sample-based methods suffer from a high false discovery rate due to prominent data noise. Therefore, methods for integrating multiple genotyping platforms and multiple samples are in high demand for improved copy number variant detection. We developed BMI-CNV, a Bayesian Multisample and Integrative CNV profiling method for data generated by both whole-exome sequencing and microarray. For the multisample integration, we identify the shared copy number variant regions across samples using a Bayesian probit stick-breaking process model coupled with a Gaussian Mixture model estimation. With extensive simulations, BMI-CNV outperformed existing methods with improved accuracy. In the matched data from the 1000 Genomes Project and HapMap project, BMI-CNV also accurately detected common variants and significantly enlarged the detection spectrum of whole-exome sequencing. Further application to the data from The Research of International Cancer of Lung consortium (TRICL) identified lung cancer risk variant candidates in the 17q11.2, 1p36.12, 8q23.1, and 5q22.2 regions.


Subjects
DNA Copy Number Variations , Genotype , Bayes Theorem , Body Mass Index , HapMap Project
4.
Int J Neural Syst ; 32(6): 2250028, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35579974

ABSTRACT

Over the last decades, the exuberant development of next-generation sequencing has revolutionized gene discovery. These technologies have boosted the mapping of single nucleotide polymorphisms (SNPs) across the human genome, providing a complex universe of heterogeneity characterizing individuals worldwide. Fractal dimension (FD) measures the degree of geometric irregularity, quantifying how "complex" a self-similar natural phenomenon is. We compared two FD algorithms, box-counting dimension (BCD) and Higuchi's fractal dimension (HFD), to characterize genome-wide patterns of SNPs extracted from the HapMap data set, which includes data from 1184 healthy subjects of eleven populations. In addition, we have used cluster and classification analysis to relate the genetic distances within chromosomes based on FD similarities to the geographical distances among the 11 global populations. We found that HFD outperformed BCD at both grand average clusterization analysis by the cophenetic correlation coefficient, in which the closest value to 1 represents the most accurate clustering solution (0.981 for the HFD and 0.956 for the BCD) and classification (79.0% accuracy, 61.7% sensitivity, and 96.4% specificity for the HFD with respect to 69.1% accuracy, 43.2% sensitivity, and 94.9% specificity for the BCD) of the 11 populations present in the HapMap data set. These results support the evidence that HFD is a reliable measure helpful in representing individual variations within all chromosomes and categorizing individuals and global populations.
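
As a rough illustration of the Higuchi fractal dimension (HFD) named in this abstract, the sketch below estimates HFD for a one-dimensional numeric series. The abstract does not say how SNP patterns are encoded as a series, so the input format, the function name `higuchi_fd`, and the default `kmax` are assumptions made for illustration only, not the authors' implementation.

```python
import math
import random


def higuchi_fd(series, kmax=8):
    """Estimate Higuchi's fractal dimension of a 1-D numeric sequence.

    `kmax` (largest lag) is an illustrative default, not a value from the paper.
    """
    n = len(series)
    if n < 2 * kmax:
        raise ValueError("series too short for the chosen kmax")
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # starting offsets 0 .. k-1
            steps = (n - 1 - m) // k  # number of lag-k increments from offset m
            total = sum(
                abs(series[m + i * k] - series[m + (i - 1) * k])
                for i in range(1, steps + 1)
            )
            # Normalise for the unequal number of increments per offset.
            lengths.append(total * (n - 1) / (steps * k) / k)
        mean_len = sum(lengths) / len(lengths)
        if mean_len == 0:
            raise ValueError("constant series: curve length is zero")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))
    # HFD is the least-squares slope of log L(k) against log(1/k).
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_len) / len(log_len)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inv_k, log_len))
    sxx = sum((x - mean_x) ** 2 for x in log_inv_k)
    return sxy / sxx


straight_line = list(range(200))  # smooth curve, dimension close to 1
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # rough curve, close to 2
fd_line = higuchi_fd(straight_line)
fd_noise = higuchi_fd(white_noise)
```

A perfectly smooth series yields a value near 1 and uncorrelated noise a value near 2, which is the sense in which HFD quantifies how "complex" a pattern is.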


Subjects
Fractals , Genome, Human , Algorithms , Genetic Variation , HapMap Project , Humans
5.
Bioinformatics ; 38(2): 318-324, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34601584

ABSTRACT

MOTIVATION: Tea is a cross-pollinated woody perennial plant, so the application of conventional breeding to its genetic improvement is limited. In addition, the lack of genome-wide high-density SNP markers and genome-wide haplotype information has greatly hampered the utilization of tea genetic resources in fast-track tea breeding programs. To address this challenge, we have generated a first-generation haplotype map of tea (Tea HapMap-1). The out-crossing and highly heterozygous nature of tea plants makes DNA-level variant discovery more complicated. RESULTS: In this study, whole genome re-sequencing data of 369 tea genotypes were used to generate 2,334,564 biallelic SNPs and 1,447,985 InDels. Around 2928.04 million paired-end reads were used with an average mapping depth of ∼0.31× per accession. The polymorphic sites identified in this study will be useful in mapping the genomic regions responsible for important traits of tea. These resources lay the foundation for future research to understand the genetic diversity within tea germplasm and utilize genes that determine tea quality. This will further facilitate the understanding of tea genome evolution and tea metabolite pathways, and thus enable effective germplasm utilization for breeding tea varieties. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Camellia sinensis , Camellia sinensis/genetics , Haplotypes , HapMap Project , Plant Breeding , Tea , Genome, Plant
6.
Genome Biol ; 22(1): 180, 2021 06 13.
Article in English | MEDLINE | ID: mdl-34120636

ABSTRACT

BACKGROUND: Canonical nonsense-mediated decay (NMD) is an important splicing-dependent process for mRNA surveillance in mammals. However, processed pseudogenes are not able to trigger NMD due to their lack of introns. It is largely unknown whether they have evolved other surveillance mechanisms. RESULTS: Here, we find that the RNAs of pseudogenes, especially processed pseudogenes, have dramatically higher m6A levels than their cognate protein-coding genes, associated with de novo m6A peaks and motifs in human cells. Furthermore, pseudogenes have rapidly accumulated m6A motifs during evolution. The m6A sites of pseudogenes are evolutionarily younger than neutral sites and their m6A levels are increasing, supporting the idea that m6A on the RNAs of pseudogenes is under positive selection. We then find that the m6A RNA modification of processed, rather than unprocessed, pseudogenes promotes cytosolic RNA degradation and attenuates interference with the RNAs of their cognate protein-coding genes. We experimentally validate the m6A RNA modification of two processed pseudogenes, DSTNP2 and NAP1L4P1, which promotes the RNA degradation of both pseudogenes and their cognate protein-coding genes DSTN and NAP1L4. In addition, the m6A of DSTNP2 regulation of DSTN is partially dependent on the miRNA miR-362-5p. CONCLUSIONS: Our discovery reveals a novel evolutionary role of m6A RNA modification in cleaning up the unnecessary processed pseudogene transcripts to attenuate their interference with the regulatory network of protein-coding genes.


Subjects
Adenosine/analogs & derivatives , Genome, Human , Pseudogenes , RNA Splicing , RNA, Messenger/genetics , Selection, Genetic , Adenosine/genetics , Adenosine/metabolism , Cell Line , Cell Line, Transformed , Destrin/genetics , Destrin/metabolism , HEK293 Cells , HapMap Project , Human Embryonic Stem Cells/cytology , Human Embryonic Stem Cells/metabolism , Humans , Lymphocytes/cytology , Lymphocytes/metabolism , MicroRNAs/genetics , MicroRNAs/metabolism , Nonsense Mediated mRNA Decay , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , RNA, Messenger/metabolism
7.
Mol Brain ; 14(1): 52, 2021 03 12.
Article in English | MEDLINE | ID: mdl-33712038

ABSTRACT

The HapMap Project is a major international research effort to construct a resource to facilitate the discovery of relationships between human genetic variations and health and disease. The Ser19Stop single nucleotide polymorphism (SNP) of human phytanoyl-CoA hydroxylase-interacting protein-like (PHYHIPL) gene was detected in HapMap project and registered in the dbSNP. PHYHIPL gene expression is altered in global ischemia and glioblastoma multiforme. However, the function of PHYHIPL is unknown. We generated PHYHIPL Ser19Stop knock-in mice and found that PHYHIPL impacts the morphology of cerebellar Purkinje cells (PCs), the innervation of climbing fibers to PCs, the inhibitory inputs to PCs from molecular layer interneurons, and motor learning ability. Thus, the Ser19Stop SNP of the PHYHIPL gene may be associated with cerebellum-related diseases.


Subjects
Cerebellum/cytology , Intracellular Signaling Peptides and Proteins/genetics , Polymorphism, Single Nucleotide , Purkinje Cells/ultrastructure , Amino Acid Sequence , Animals , CRISPR-Cas Systems , Cell Shape , Codon, Terminator , Female , Gene Knock-In Techniques , HapMap Project , Humans , Interneurons/physiology , Intracellular Signaling Peptides and Proteins/physiology , Learning , Mice , Mice, Inbred C57BL , Mice, Transgenic , Motor Activity , Nerve Fibers/physiology , Purkinje Cells/metabolism , Rotarod Performance Test , Sequence Alignment , Sequence Homology, Amino Acid
8.
PLoS Genet ; 17(2): e1009303, 2021 02.
Article in English | MEDLINE | ID: mdl-33539374

ABSTRACT

Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation in the field is the reduced access to many genetic databases due to concerns about violations of individual privacy, although they would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrated that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with none to little privacy loss. We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection. To illustrate the promising outcomes of our method, we showed that imputation quality for low frequency alleles can be improved by data augmentation to reference panels with AGs and that the RBM latent space provides a relevant encoding of the data, hence allowing further exploration of the reference dataset and features for solving supervised tasks. Generative models and AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.


Subjects
Computer Simulation , Genome, Human , Machine Learning , Population/genetics , Algorithms , Alleles , Chromosomes, Human, Pair 15/genetics , Databases, Factual , Databases, Genetic , Deep Learning , HapMap Project , Humans , Markov Chains , Neural Networks, Computer , Polymorphism, Single Nucleotide
9.
PLoS Genet ; 17(1): e1009210, 2021 01.
Article in English | MEDLINE | ID: mdl-33428619

ABSTRACT

Modern day Saudi Arabia occupies the majority of historical Arabia, which may have contributed to ancient waves of migration out of Africa. This ancient history has left a lasting imprint in the genetics of the region, including the diverse set of tribes that call Saudi Arabia their home. How these tribes relate to each other and to the world's major populations remains an unanswered question. In an attempt to improve our understanding of the population structure of Saudi Arabia, we conducted genomic profiling of 957 unrelated individuals who self-identify with 28 large tribes in Saudi Arabia. Consistent with the tradition of intra-tribal unions, the subjects showed strong clustering along tribal lines with the distance between clusters correlating with their geographical proximities in Arabia. However, these individuals form a unique cluster when compared to the world's major populations. The ancient origin of these tribal affiliations is supported by analyses that revealed little evidence of ancestral origin from within the 28 tribes. Our results disclose a granular map of population structure and have important implications for future genetic studies into Mendelian and common diseases in the region.


Subjects
Arabs/genetics , Genome, Human/genetics , Population Groups/genetics , Africa/epidemiology , Arabia/epidemiology , Arabs/history , Asia/epidemiology , Europe/epidemiology , Female , HapMap Project , Haplotypes/genetics , History, Ancient , Humans , Inbreeding , Male , Population Groups/history , Principal Component Analysis , Saudi Arabia/epidemiology
10.
Hum Genet ; 139(8): 1107-1117, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32270270

ABSTRACT

Extensive studies have been conducted on the analysis of genome function, especially on the expression quantitative trait loci (eQTL). These studies offered promising results for characterization of the functional sequencing variation and understanding of the basic processes of gene regulation. Parent of origin effect (POE) is an important epigenetic phenomenon describing that the expression of certain genes depends on their allelic parent-of-origin and it is known to play important roles in human complex diseases. However, traditional eQTL mapping approaches do not allow for the detection of imprinting, or they focus on modeling the additive genetic effect thereby ignoring the estimation of the dominance genetic effect. In this study, we proposed a statistical framework to test the additive and dominance genetic effects of the candidate eQTLs along with detection of the POE with a functional model and an orthogonal model for RNA-seq data. We demonstrated the desirable power and preserved Type I errors of the methods in most scenarios, especially the orthogonal model with un-biased estimation of the genetic effects and over-dispersion of the RNA-seq data. The application to a HapMap project trio dataset validated existing imprinting genes and discovered two novel imprinting genes with potential dominance genetic effect and RB1 and IGF1R genes. This study provides new insights into the next generation statistical modeling of eQTL mapping for better understanding of the genetic architecture underlying the mechanisms of gene expression regulation.


Subjects
Gene Expression Regulation/genetics , Genomic Imprinting/genetics , Models, Genetic , Models, Statistical , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Alleles , Child , Computer Simulation , Family , Female , Genes, Dominant/genetics , Genotype , HapMap Project , Humans , Male , Parents , RNA-Seq
11.
Int J Legal Med ; 134(1): 123-134, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31760471

ABSTRACT

Ancestry-informative markers (AIMs) can be used to infer the ancestry of an individual to minimize the inaccuracy of self-reported ethnicity in biomedical research. In this study, we describe three methods for selecting AIM SNPs for the Malay population (Malay AIM panel) using different approaches based on pairwise FST, informativeness for assignment (In), and PCA-correlated SNPs (PCAIMs). These Malay AIM panels were extracted from genotype data stored in SNP arrays hosted by the Malaysian node of the Human Variome Project (MyHVP) and the Singapore Genome Variation Project (SGVP). In particular, genotype data from a total of 165 Malay individuals were analyzed, comprising data on 117 individual genotypes from the Affymetrix SNP-6 SNP array platform and data on 48 individual genotypes from the OMNI 2.5 Illumina SNP array platform. The HapMap phase 3 database (1397 individuals from 11 populations) was used as a reference for comparison with the Malay genotype data. The accuracy of each resulting Malay AIM panel was evaluated using a machine learning "ancestry-predictive model" constructed by using WEKA, a comprehensive machine learning platform written in Java. A total of 1250 SNPs were finally selected, which successfully identified Malay individuals from other world populations with an accuracy of 90%, but the accuracy decreased to 80% using 157 SNPs according to the pairwise FST method, while a panel of 200 SNPs selected using In and PCAIMs could be used to identify Malay individuals with an accuracy of approximately 80%.
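
To make the pairwise FST selection criterion in this abstract concrete, here is a minimal sketch that scores one biallelic SNP by a simple heterozygosity-based FST between two populations and ranks SNPs by that score. The abstract does not state which FST estimator was used, so the equal-weight formula, the function names, and the SNP labels are all hypothetical.

```python
def pairwise_fst(p1, p2):
    """Heterozygosity-based FST for one biallelic SNP.

    p1, p2: frequency of the same allele in two populations (equal weights
    assumed; this is an illustrative estimator, not necessarily the paper's).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # allele fixed in both populations: no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def top_aims(freqs, n):
    """Return the n SNP labels with the highest pairwise FST.

    freqs: dict mapping a SNP label to its (p1, p2) allele frequencies.
    """
    ranked = sorted(freqs, key=lambda snp: pairwise_fst(*freqs[snp]), reverse=True)
    return ranked[:n]


# Hypothetical allele frequencies for three made-up SNP labels.
example = {"snpA": (0.1, 0.9), "snpB": (0.5, 0.5), "snpC": (0.4, 0.6)}
selected = top_aims(example, 2)
```

SNPs whose allele frequencies differ most between the target and reference populations score highest, which is why a small, high-FST panel can still separate populations reasonably well.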


Subjects
Databases, Genetic , Ethnicity/genetics , Genetics, Population/methods , Genotype , Polymorphism, Single Nucleotide , Asian People/genetics , Genetic Markers , HapMap Project , Humans , Malaysia/ethnology , Models, Statistical , Native Hawaiian or Other Pacific Islander/genetics , Principal Component Analysis , Singapore
12.
Hum Immunol ; 80(11): 897-905, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31558329

ABSTRACT

Since their inception, the International HLA & Immunogenetics Workshops (IHIW) served as a collaborative platform for exchange of specimens, reference materials, experiences and best practices. In this report we present a subset of the results of human leukocyte antigen (HLA) haplotypes in families tested by next generation sequencing (NGS) under the 17th IHIW. We characterized 961 haplotypes in 921 subjects belonging to 250 families from 8 countries (Argentina, Austria, Egypt, Jamaica, Germany, Greece, Kuwait, and Switzerland). These samples were tested in a single core laboratory in a high throughput fashion using 6 different reagents/software platforms. Families tested included patients evaluated clinically as transplant recipients (kidney and hematopoietic cell transplant) and their respective family members. We identified 486 HLA alleles at the following loci HLA-A, -B, -C, -DRB1, -DRB3, -DRB4, -DRB5, -DQA1, -DQB1, -DPA1, -DPB1 (77, 115, 68, 69, 10, 6, 4, 44, 31, 20 and 42 alleles, respectively). We also identified nine novel alleles with polymorphisms in coding regions. This approach of testing samples from multiple laboratories across the world in different stages of technology implementation in a single core laboratory may be useful for future international workshops. Although data presented may not be reflective of allele and haplotype frequencies in the countries to which the families belong, they represent an extensive collection of 3rd and 4th field resolution level 11-locus haplotype associations of 486 alleles identified in families from 8 countries.


Subjects
Genotype , HLA Antigens/genetics , High-Throughput Nucleotide Sequencing/methods , Computational Biology , Education , Family , Gene Frequency , HapMap Project , Haplotypes , Histocompatibility Testing/methods , Humans , Immunogenetics , International Cooperation , Linkage Disequilibrium , Models, Biological , Pedigree , Polymorphism, Genetic
13.
Thorac Cancer ; 10(4): 601-606, 2019 04.
Article in English | MEDLINE | ID: mdl-30807688

ABSTRACT

BACKGROUND: The aim of this study was to evaluate the association between CYP2A13 polymorphisms and lung cancer susceptibility using the HapMap database. METHODS: A case-control analysis of 532 subjects with lung cancer and 614 controls with no personal history of the disease was performed. The tag SNPs rs1645690 and rs8192789 for CYP2A13 were selected, and the genetic polymorphisms were confirmed experimentally through real-time PCR, cloning, and sequencing assay. RESULTS: SNP frequency in this study was consistent with the HapMap Project database of Han-Chinese and lung cancer risk was associated with CYP2A13 polymorphisms in non-smokers. CYP2A13 shares a 93.5% identity with CYP2A6 in the amino acid sequence and the homologous sequences may interfere with the study of SNPs of CYP2A13. CONCLUSIONS: CYP2A13 may be a potential key metabolic enzyme gene in the carcinogenesis of lung cancer in non-smokers. The common polymorphisms of CYP2A13 may be candidate biomarkers for lung cancer susceptibility in Han-Chinese.


Subjects
Asian People/genetics , Cytochrome P-450 CYP2A6/genetics , Lung Neoplasms/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/genetics , Case-Control Studies , China/ethnology , Female , Genetic Association Studies , Genetic Predisposition to Disease , HapMap Project , Humans , Lung Neoplasms/ethnology , Male , Middle Aged , Non-Smokers
14.
Int J Immunogenet ; 46(2): 49-58, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30659741

ABSTRACT

Allele-specific analyses to understand frequency differences across populations, particularly populations not well studied, are important to help identify variants that may have a functional effect on disease mechanisms and phenotypic predisposition, facilitating new Genome-Wide Association Studies (GWAS). We aimed to compare the allele frequency of 11 asthma-associated and 16 liver disease-associated single nucleotide polymorphisms (SNPs) between the Estonian, HapMap and 1000 genome project populations. When comparing EGCUT with HapMap populations, the largest difference in allele frequencies was observed with the Maasai population in Kinyawa, Kenya, with 12 SNP variants reporting statistical significance. Similarly, when comparing EGCUT with 1000 genomes project populations, the largest difference in allele frequencies was observed with pooled African populations with 22 SNP variants reporting statistical significance. For 11 asthma-associated and 16 liver disease-associated SNPs, Estonians are genetically similar to other European populations but significantly different from African populations. Understanding differences in genetic architecture between ethnic populations is important to facilitate new GWAS targeted at underserved ethnic groups to enable novel genetic findings to aid the development of new therapies to reduce morbidity and mortality.


Subjects
Asthma/genetics , Gene Frequency/genetics , Genetics, Population , Genome, Human , HapMap Project , Liver Diseases/genetics , Polymorphism, Single Nucleotide/genetics , Estonia , Humans
16.
J Autoimmun ; 94: 83-89, 2018 11.
Article in English | MEDLINE | ID: mdl-30143393

ABSTRACT

Genome-wide association studies (GWAS) have identified a large number of genetic risk loci for autoimmune diseases. However, the functional variants underlying these disease associations remain largely unknown. There is evidence that microRNA-mediated regulation may play an important role in this context. Therefore, we assessed whether autoimmune disease loci unfold their effects via altering microRNA expression in relevant immune cells. To this end, we performed comprehensive data integration of many large and publicly available datasets to combine information on autoimmune disease risk loci with RNA-Seq-based microRNA expression data. Specifically, we carried out microRNA expression quantitative trait loci (eQTL) analyses across 115 GWAS regions associated with 12 autoimmune diseases using next-generation sequencing data of 345 lymphoblastoid cell lines. Statistical analyses included the application and extension of a recently proposed framework (joint likelihood mapping, JLIM) to microRNA expression data and microRNA target gene enrichment analyses of relevant GWAS data. Overall, only a minority of autoimmune disease risk loci may exert their pathophysiologic effects by altering microRNA expression based on JLIM. However, detailed functional fine-mapping revealed two independent GWAS regions harboring autoimmune disease risk SNPs with significant effects on microRNA expression. These relate to SNPs associated with Crohn's disease (CD; rs102275) and rheumatoid arthritis (RA; rs968567), which affect the expression of miR-1908-5p (p_rs102275 = 1.44e-20, p_rs968567 = 2.54e-14). In addition, an independent CD risk SNP, rs3853824, was found to alter the expression of miR-3614-5p (p = 5.70e-7). To support these findings, we demonstrate that GWAS signals for RA and CD were enriched in genes predicted to be targeted by both microRNAs (all with p < 0.05). In summary, our study points towards a potential pathophysiological role of miR-1908-5p and miR-3614-5p in autoimmunity.


Subjects
Arthritis, Rheumatoid/genetics , Crohn Disease/genetics , Lymphocytes/immunology , MicroRNAs/genetics , Arthritis, Rheumatoid/diagnosis , Arthritis, Rheumatoid/immunology , Arthritis, Rheumatoid/pathology , Cell Line , Computational Biology/methods , Crohn Disease/diagnosis , Crohn Disease/immunology , Crohn Disease/pathology , Datasets as Topic , Gene Expression Profiling , Gene Expression Regulation , Genome-Wide Association Study , HapMap Project , Humans , Lymphocytes/pathology , MicroRNAs/immunology , Quantitative Trait Loci , Risk
17.
PLoS One ; 13(4): e0196226, 2018.
Article in English | MEDLINE | ID: mdl-29702671

ABSTRACT

Copy number variations (CNVs) are gains and losses of DNA sequence in a genome. High-throughput platforms such as microarrays and next-generation sequencing (NGS) technologies have been applied to detect genome-wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis of copy number losses in 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from the HapMap Project and the 1000 Genomes Project, respectively. We show that the copy number losses reported by the HapMap Project and the 1000 Genomes Project have < 30% overlap, even though these reports are required by their corresponding projects to have cross-platform (e.g. PCR, microarray and high-throughput sequencing) experimental support and state-of-the-art calling methods were employed. On the other hand, copy number losses were found directly from HapMap microarray data by an accurate algorithm, CNVhac; almost all of them have lower read mapping depth in NGS data, and 88% of them are supported by sequences with breakpoints in NGS data. Our results suggest that microarrays are capable of calling CNVs and that the unessential requirement of additional cross-platform support may introduce false negatives. The inconsistency of CNV reports from the HapMap Project and the 1000 Genomes Project might result from the inadequate information contained in microarray data, inconsistent detection criteria, or the filtering effect of cross-platform support. Statistical tests on CNVs called by CNVhac show that microarray data can offer reliable CNV reports, and that the majority of CNV candidates can be confirmed by raw sequences. Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform support, so additional experimental information should be applied when needed rather than as a requirement.


Subjects
Computational Biology/methods , DNA Copy Number Variations , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Algorithms , Genome, Human , HapMap Project , High-Throughput Nucleotide Sequencing/methods , Human Genome Project , Humans , Oligonucleotide Array Sequence Analysis/methods
18.
Sci Rep ; 8(1): 5553, 2018 04 03.
Article in English | MEDLINE | ID: mdl-29615764

ABSTRACT

Differences in SNP selection and populations among SNP panels for individual identification have led to few common SNPs, compromising their universal applicability. To screen all universal SNPs, we performed genome-wide SNP mining in multiple populations based on the HapMap and 1000 Genomes databases. SNPs with high minor allele frequencies (MAF) in 37 populations were selected. As the MAF threshold rose from ≥0.35 to ≥0.43, the number of selected SNPs decreased from 2769 to 0. A total of 117 SNPs with MAF ≥0.39 have no linkage disequilibrium with each other in every population. For 116 of the 117 SNPs, cumulative match probability (CMP) ranged from 2.01 × 10^-48 to 1.93 × 10^-50 and cumulative exclusion probability (CEP) ranged from 0.9999999996653 to 0.9999999999945. In 134 tested Han samples, 110 of the 117 SNPs retained high MAF and conformed to Hardy-Weinberg equilibrium, with CMP = 4.70 × 10^-47 and CEP = 0.999999999862. When analyzing the same number of autosomal SNPs as in the HID-Ion AmpliSeq Identity Panel, i.e. 90 randomly chosen from the 110 SNPs, our panel yielded preferable CMP and CEP. Taken together, the 110-SNP panel is advantageous for forensic testing, and this study provides plenty of highly informative SNPs for compiling final universal panels.
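
The cumulative match probability (CMP) quoted in this abstract can be approximated from minor allele frequencies alone. The sketch below is not the authors' calculation; it assumes Hardy-Weinberg proportions at each SNP and independence between SNPs (no linkage disequilibrium), and the function names are hypothetical.

```python
def locus_match_probability(maf):
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions: genotype frequencies p^2, 2pq, q^2.
    """
    q = maf
    p = 1.0 - maf
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def cumulative_match_probability(mafs):
    """Product of per-locus match probabilities, assuming independent loci."""
    cmp_value = 1.0
    for maf in mafs:
        cmp_value *= locus_match_probability(maf)
    return cmp_value


# Illustrative panel: 110 independent SNPs, each with MAF 0.5.
panel_cmp = cumulative_match_probability([0.5] * 110)
```

A SNP with MAF 0.5 gives the lowest per-locus match probability (0.375), and 110 such independent SNPs give a CMP on the order of 10^-47, the same order of magnitude as the value reported above. This is why the screen favours SNPs with high MAF in every population.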


Subjects
Databases, Genetic ; Genome, Human/genetics ; HapMap Project ; Polymorphism, Single Nucleotide ; Humans
19.
Sci Rep ; 8(1): 4009, 2018 03 05.
Article in English | MEDLINE | ID: mdl-29507384

ABSTRACT

Currently, few tools are capable of detecting genome-wide copy number variations (CNVs) from the sequencing of multiple samples. Although aberrations in mate-pair insertion sizes provide additional hints for multi-sample CNV detection, the majority of current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) that detects common CNVs across multiple samples. MSeq-CNV applies a mixture density to model aberrations in depth of coverage and abnormalities in mate-pair insertion sizes. At each genomic position, each component of this mixture density uses a Binomial distribution to model the number of mate pairs with an aberrant insertion size and a Poisson distribution to emit the read counts. MSeq-CNV is applied to simulated data and to real data from six HapMap individuals with high-coverage sequencing in the 1000 Genomes Project. These individuals comprise a CEU trio of European ancestry and a YRI trio of Nigerian ancestry. The ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied to detect CNVs in two samples with low-coverage sequencing from the 1000 Genomes Project and six samples from the Simons Genome Diversity Project.
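
To make the mixture idea concrete, the sketch below scores one genomic position under two hypothetical components, each pairing a Poisson read-count term with a Binomial discordant-pair term, and returns the posterior weight of each component. It is not the MSeq-CNV implementation: the conditional independence of the two terms, the component parameters and the observation are all assumptions chosen for illustration, and no parameter fitting is shown.

```python
import math
from typing import List, Sequence, Tuple


def poisson_logpmf(k: int, lam: float) -> float:
    """Log probability of k events under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def binomial_logpmf(k: int, n: int, p: float) -> float:
    """Log probability of k successes in n trials with success rate p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def component_posteriors(read_count: int, n_pairs: int, n_discordant: int,
                         components: Sequence[Tuple[float, float, float]]
                         ) -> List[float]:
    """Posterior weight of each (weight, poisson_mean, discordant_rate) component.

    Assumes the read count and the discordant-pair count are independent
    given the component (an assumption of this sketch).
    """
    log_joint = [
        math.log(w)
        + poisson_logpmf(read_count, lam)
        + binomial_logpmf(n_discordant, n_pairs, p)
        for w, lam, p in components
    ]
    # Log-sum-exp normalisation to avoid underflow.
    peak = max(log_joint)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint))
    return [math.exp(v - log_norm) for v in log_joint]


# Hypothetical components: (mixing weight, expected reads, discordant-pair rate).
COMPONENTS = [
    (0.9, 30.0, 0.02),  # normal copy number
    (0.1, 15.0, 0.40),  # heterozygous deletion
]
# Made-up observation: 14 reads, 9 of 20 mate pairs with aberrant insert size.
posteriors = component_posteriors(14, 20, 9, COMPONENTS)
```

Combining the two signals means a position with roughly halved coverage and many aberrant mate pairs is assigned to the deletion component even when its prior weight is small.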


Subjects
DNA Copy Number Variations ; Sequence Analysis, DNA/standards ; Algorithms ; Gene Deletion ; Genome, Human ; HapMap Project ; Heterozygote ; Homozygote ; Humans ; Poisson Distribution ; Sequence Analysis, DNA/methods
20.
Clin Interv Aging ; 13: 377-388, 2018.
Article in English | MEDLINE | ID: mdl-29551892

ABSTRACT

BACKGROUND: Ethnic differences exist in the frequencies of genetic variations that contribute to the risk of common disease. This study aimed to analyse the distribution of variants in several genes, previously associated with susceptibility to type 2 diabetes and obesity-related phenotypes, in a Kazakh population. METHODS: A total of 966 individuals of Kazakh ethnicity were recruited from an outpatient clinic. We genotyped 41 common single nucleotide polymorphisms (SNPs) previously associated with type 2 diabetes in other ethnic groups; 31 of these were in Hardy-Weinberg equilibrium. The obtained allele frequencies were then compared with publicly available data from other ethnic populations. Allele frequencies for the comparison populations were taken from the haplotype map (HapMap) database. Principal component analysis (PCA), cluster analysis, and multidimensional scaling (MDS) were used to analyse the genetic relationships between the populations. RESULTS: Comparative analysis of allele frequencies of the studied SNPs showed significant differentiation among the studied populations. The Kazakh population was grouped with Asian populations according to the cluster analysis and with the Caucasian populations according to PCA. According to MDS, the Kazakh population holds an intermediate position between Caucasian and Asian populations. CONCLUSION: A high percentage of population differentiation was observed between Kazakh and world populations. The Kazakh population was clustered with Caucasian populations, and this result may indicate a significant Caucasian component in the Kazakh gene pool.
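
The METHODS filter SNPs on Hardy-Weinberg equilibrium but do not name the test used. A minimal sketch of one standard option, a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts, is given below; the choice of test, the function name and the example counts are assumptions, not details from the study.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up genotype counts: one SNP in exact HWE, one with no heterozygotes.
balanced = hwe_chi_square(25, 50, 25)
no_heterozygotes = hwe_chi_square(50, 0, 50)
```

For small samples or rare alleles an exact test is generally preferred over the chi-square approximation; the approximation is used here only to keep the sketch short.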


Subjects
Diabetes Mellitus, Type 2/genetics ; Ethnicity/genetics ; Genetic Markers/genetics ; Genetics, Population ; Adult ; Female ; Gene Frequency/genetics ; Genetic Predisposition to Disease/genetics ; Genotype ; HapMap Project ; Haplotypes/genetics ; Humans ; Kazakhstan ; Male ; Middle Aged ; Obesity/genetics ; Phenotype ; Polymorphism, Single Nucleotide/genetics ; Principal Component Analysis