Results 1 - 20 of 394
1.
Sci Rep ; 14(1): 4654, 2024 02 26.
Article in English | MEDLINE | ID: mdl-38409353

ABSTRACT

Admixture mapping has been useful in identifying genetic variations linked to phenotypes, adaptation and diseases. Copy number variations (CNVs) represent genomic structural variants spanning large regions of chromosomes, reaching several megabases. In this investigation, the "Canary" algorithm was applied to 102 Tunisian samples and 991 individuals from eleven HapMap III populations to genotype 1279 copy number polymorphisms (CNPs). In the present work, we investigate the Tunisian population structure using the CNP markers previously identified among Tunisians. The study revealed that Sub-Saharan African populations exhibited the highest diversity, with the highest proportions of allelic CNPs. Among all the African populations, Tunisia showed the least diversity. Individual ancestry proportions computed using STRUCTURE analysis revealed a major European component among Tunisians, with lesser contributions from Sub-Saharan Africa and Asia. Population structure analysis indicated genetic proximity to Europeans and a noticeable distance from the Sub-Saharan African and East Asian clusters. Seven genes harbouring CNPs at high frequency in Tunisians were identified; they are known to be associated with 9 Mendelian diseases and/or phenotypes. Functional annotation of genes under selection highlighted a noteworthy enrichment of biological processes related to receptor pathways and activity as well as glutathione metabolism. Additionally, pathways of potential concern for health such as drug metabolism, infectious diseases and cancers exhibited significant enrichment. The distinctive genetic makeup of the Tunisians might have been influenced by various factors including natural selection and genetic drift, resulting in the development of distinct genetic variations playing roles in specific biological processes. Our research provides a justification for focusing on the exclusive genome organization of this population and uncovers previously overlooked elements of the genome.


Subject(s)
DNA Copy Number Variations , Genome , North African People , Humans , HapMap Project , Genotype , Genetics, Population , Polymorphism, Single Nucleotide
2.
Sci Data ; 10(1): 198, 2023 04 10.
Article in English | MEDLINE | ID: mdl-37037860

ABSTRACT

Honey bee, Apis mellifera, drones are typically haploid, developing from an unfertilized egg, inheriting only their queen's alleles and none from the many drones she mated with. Thus the ordered combination or 'phase' of alleles is known, making drones a valuable haplotype resource. We collated whole-genome sequence data for 1,407 drones, including 45 newly sequenced Scottish drones, collectively representing 19 countries, 8 subspecies and various hybrids. Following alignment to Amel_HAv3.1, variant calling and quality filtering, we retained 17.4 M high quality variants across 1,328 samples with a genotyping rate of 98.7%. We demonstrate the utility of this haplotype resource, AmelHap, for genotype imputation, returning >95% concordance when up to 61% of data is missing in haploids and up to 12% of data is missing in diploids. AmelHap will serve as a useful resource for the community for imputation from low-depth sequencing or SNP chip data, accurate phasing of diploids for association studies, and as a comprehensive reference panel for population genetic and evolutionary analyses.


Subject(s)
Bees , Genome, Insect , Animals , Female , Base Sequence , Bees/genetics , Biological Evolution , Genotype , HapMap Project
3.
Genetics ; 222(4)2022 11 30.
Article in English | MEDLINE | ID: mdl-36171678

ABSTRACT

Whole-exome sequencing (WES) enables the detection of copy number variants (CNVs) with high resolution in protein-coding regions. However, variants in the intergenic or intragenic regions are excluded from studies. Fortunately, many of these samples have been previously genotyped on other platforms, such as SNP arrays, which are sparse but cover a wide range of genomic regions. Moreover, conventional single sample-based methods suffer from a high false discovery rate due to prominent data noise. Therefore, methods for integrating multiple genotyping platforms and multiple samples are in high demand for improved CNV detection. We developed BMI-CNV, a Bayesian Multisample and Integrative CNV profiling method for data generated by both WES and microarray. For the multisample integration, we identify the CNV regions shared across samples using a Bayesian probit stick-breaking process model coupled with Gaussian mixture model estimation. In extensive simulations, BMI-CNV outperformed existing methods with improved accuracy. In matched data from the 1000 Genomes Project and the HapMap project, BMI-CNV also accurately detected common variants and significantly enlarged the detection spectrum of WES. Further application to data from The Research of International Cancer of Lung consortium (TRICL) identified lung cancer risk variant candidates in the 17q11.2, 1p36.12, 8q23.1, and 5q22.2 regions.


Subject(s)
DNA Copy Number Variations , Genotype , Bayes Theorem , Body Mass Index , HapMap Project
4.
Int J Neural Syst ; 32(6): 2250028, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35579974

ABSTRACT

Over the last decades, the exuberant development of next-generation sequencing has revolutionized gene discovery. These technologies have boosted the mapping of single nucleotide polymorphisms (SNPs) across the human genome, providing a complex universe of heterogeneity characterizing individuals worldwide. Fractal dimension (FD) measures the degree of geometric irregularity, quantifying how "complex" a self-similar natural phenomenon is. We compared two FD algorithms, box-counting dimension (BCD) and Higuchi's fractal dimension (HFD), to characterize genome-wide patterns of SNPs extracted from the HapMap data set, which includes data from 1184 healthy subjects of eleven populations. In addition, we have used cluster and classification analysis to relate the genetic distances within chromosomes based on FD similarities to the geographical distances among the 11 global populations. We found that HFD outperformed BCD at both grand average clusterization analysis by the cophenetic correlation coefficient, in which the closest value to 1 represents the most accurate clustering solution (0.981 for the HFD and 0.956 for the BCD) and classification (79.0% accuracy, 61.7% sensitivity, and 96.4% specificity for the HFD with respect to 69.1% accuracy, 43.2% sensitivity, and 94.9% specificity for the BCD) of the 11 populations present in the HapMap data set. These results support the evidence that HFD is a reliable measure helpful in representing individual variations within all chromosomes and categorizing individuals and global populations.


Subject(s)
Fractals , Genome, Human , Algorithms , Genetic Variation , HapMap Project , Humans
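
The Higuchi fractal dimension (HFD) named in the abstract above can be illustrated with a short sketch. This is a generic textbook-style implementation, not the authors' code: the abstract does not say how SNP data were encoded as a sequence or which `k_max` was used, so the function name `higuchi_fd`, the default `k_max=8`, and the plain numeric input are all assumptions made for illustration.

```python
import math


def higuchi_fd(series, k_max=8):
    """Estimate Higuchi's fractal dimension of a 1-D numeric sequence.

    Hypothetical helper for illustration; k_max=8 is an arbitrary default.
    """
    n = len(series)
    if k_max < 2 or n < k_max + 2:
        raise ValueError("need k_max >= 2 and len(series) >= k_max + 2")

    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # starting offsets 0 .. k-1
            steps = (n - 1 - m) // k  # number of jumps of size k from offset m
            if steps < 1:
                continue
            dist = sum(
                abs(series[m + i * k] - series[m + (i - 1) * k])
                for i in range(1, steps + 1)
            )
            # Higuchi normalisation: rescale the sub-curve to the full length
            lengths.append(dist * (n - 1) / (steps * k) / k)
        mean_len = sum(lengths) / len(lengths)
        if mean_len == 0:
            raise ValueError("constant series: curve length is zero")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))

    # Least-squares slope of log L(k) against log(1/k) is the HFD estimate
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_len) / len(log_len)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inv_k, log_len))
    den = sum((x - mean_x) ** 2 for x in log_inv_k)
    return num / den
```

As a sanity check, a straight line such as `higuchi_fd(list(range(200)))` should give a value very close to 1, while noisier sequences move toward 2.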
5.
Bioinformatics ; 38(2): 318-324, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34601584

ABSTRACT

MOTIVATION: Tea is a cross-pollinated woody perennial plant, which limits the application of conventional breeding for its genetic improvement. Moreover, the lack of genome-wide high-density SNP markers and genome-wide haplotype information has greatly hampered the utilization of tea genetic resources in fast-track tea breeding programs. To address this challenge, we have generated a first-generation haplotype map of tea (Tea HapMap-1). The out-crossing and highly heterozygous nature of tea plants makes DNA-level variant discovery more complicated. RESULTS: In this study, whole genome re-sequencing data of 369 tea genotypes were used to generate 2,334,564 biallelic SNPs and 1,447,985 InDels. Around 2928.04 million paired-end reads were used with an average mapping depth of ∼0.31× per accession. The polymorphic sites identified in this study will be useful in mapping the genomic regions responsible for important traits of tea. These resources lay the foundation for future research to understand the genetic diversity within tea germplasm and utilize genes that determine tea quality. This will further facilitate the understanding of tea genome evolution and tea metabolite pathways, thus enabling effective germplasm utilization for breeding tea varieties. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Camellia sinensis , Camellia sinensis/genetics , Haplotypes , HapMap Project , Plant Breeding , Genome, Plant
6.
Genome Biol ; 22(1): 180, 2021 06 13.
Article in English | MEDLINE | ID: mdl-34120636

ABSTRACT

BACKGROUND: Canonical nonsense-mediated decay (NMD) is an important splicing-dependent process for mRNA surveillance in mammals. However, processed pseudogenes are not able to trigger NMD due to their lack of introns. It is largely unknown whether they have evolved other surveillance mechanisms. RESULTS: Here, we find that the RNAs of pseudogenes, especially processed pseudogenes, have dramatically higher m6A levels than their cognate protein-coding genes, associated with de novo m6A peaks and motifs in human cells. Furthermore, pseudogenes have rapidly accumulated m6A motifs during evolution. The m6A sites of pseudogenes are evolutionarily younger than neutral sites and their m6A levels are increasing, supporting the idea that m6A on the RNAs of pseudogenes is under positive selection. We then find that the m6A RNA modification of processed, rather than unprocessed, pseudogenes promotes cytosolic RNA degradation and attenuates interference with the RNAs of their cognate protein-coding genes. We experimentally validate the m6A RNA modification of two processed pseudogenes, DSTNP2 and NAP1L4P1, which promotes the RNA degradation of both pseudogenes and their cognate protein-coding genes DSTN and NAP1L4. In addition, the m6A of DSTNP2 regulation of DSTN is partially dependent on the miRNA miR-362-5p. CONCLUSIONS: Our discovery reveals a novel evolutionary role of m6A RNA modification in cleaning up the unnecessary processed pseudogene transcripts to attenuate their interference with the regulatory network of protein-coding genes.


Subject(s)
Adenosine/analogs & derivatives , Genome, Human , Pseudogenes , RNA Splicing , RNA, Messenger/genetics , Selection, Genetic , Adenosine/genetics , Adenosine/metabolism , Cell Line , Cell Line, Transformed , Destrin/genetics , Destrin/metabolism , HEK293 Cells , HapMap Project , Human Embryonic Stem Cells/cytology , Human Embryonic Stem Cells/metabolism , Humans , Lymphocytes/cytology , Lymphocytes/metabolism , MicroRNAs/genetics , MicroRNAs/metabolism , Nonsense Mediated mRNA Decay , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , RNA, Messenger/metabolism
7.
Mol Brain ; 14(1): 52, 2021 03 12.
Article in English | MEDLINE | ID: mdl-33712038

ABSTRACT

The HapMap Project is a major international research effort to construct a resource to facilitate the discovery of relationships between human genetic variations and health and disease. The Ser19Stop single nucleotide polymorphism (SNP) of the human phytanoyl-CoA hydroxylase-interacting protein-like (PHYHIPL) gene was detected in the HapMap Project and registered in dbSNP. PHYHIPL gene expression is altered in global ischemia and glioblastoma multiforme. However, the function of PHYHIPL is unknown. We generated PHYHIPL Ser19Stop knock-in mice and found that PHYHIPL impacts the morphology of cerebellar Purkinje cells (PCs), the innervation of PCs by climbing fibers, the inhibitory inputs to PCs from molecular layer interneurons, and motor learning ability. Thus, the Ser19Stop SNP of the PHYHIPL gene may be associated with cerebellum-related diseases.


Subject(s)
Cerebellum/cytology , Intracellular Signaling Peptides and Proteins/genetics , Polymorphism, Single Nucleotide , Purkinje Cells/ultrastructure , Amino Acid Sequence , Animals , CRISPR-Cas Systems , Cell Shape , Codon, Terminator , Female , Gene Knock-In Techniques , HapMap Project , Humans , Interneurons/physiology , Intracellular Signaling Peptides and Proteins/physiology , Learning , Mice , Mice, Inbred C57BL , Mice, Transgenic , Motor Activity , Nerve Fibers/physiology , Purkinje Cells/metabolism , Rotarod Performance Test , Sequence Alignment , Sequence Homology, Amino Acid
8.
PLoS Genet ; 17(2): e1009303, 2021 02.
Article in English | MEDLINE | ID: mdl-33539374

ABSTRACT

Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation in the field is the reduced access to many genetic databases due to concerns about violations of individual privacy, although they would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrated that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with little to no privacy loss. We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection. To illustrate the promising outcomes of our method, we showed that imputation quality for low frequency alleles can be improved by augmenting reference panels with AGs and that the RBM latent space provides a relevant encoding of the data, hence allowing further exploration of the reference dataset and features for solving supervised tasks. Generative models and AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.


Subject(s)
Computer Simulation , Genome, Human , Machine Learning , Population/genetics , Algorithms , Alleles , Chromosomes, Human, Pair 15/genetics , Databases, Factual , Databases, Genetic , Deep Learning , HapMap Project , Humans , Markov Chains , Neural Networks, Computer , Polymorphism, Single Nucleotide
9.
PLoS Genet ; 17(1): e1009210, 2021 01.
Article in English | MEDLINE | ID: mdl-33428619

ABSTRACT

Modern day Saudi Arabia occupies the majority of historical Arabia, which may have contributed to ancient waves of migration out of Africa. This ancient history has left a lasting imprint in the genetics of the region, including the diverse set of tribes that call Saudi Arabia their home. How these tribes relate to each other and to the world's major populations remains an unanswered question. In an attempt to improve our understanding of the population structure of Saudi Arabia, we conducted genomic profiling of 957 unrelated individuals who self-identify with 28 large tribes in Saudi Arabia. Consistent with the tradition of intra-tribal unions, the subjects showed strong clustering along tribal lines with the distance between clusters correlating with their geographical proximities in Arabia. However, these individuals form a unique cluster when compared to the world's major populations. The ancient origin of these tribal affiliations is supported by analyses that revealed little evidence of ancestral origin from within the 28 tribes. Our results disclose a granular map of population structure and have important implications for future genetic studies into Mendelian and common diseases in the region.


Subject(s)
Arabs/genetics , Genome, Human/genetics , Population Groups/genetics , Africa/epidemiology , Arabia/epidemiology , Arabs/history , Asia/epidemiology , Europe/epidemiology , Female , HapMap Project , Haplotypes/genetics , History, Ancient , Humans , Inbreeding , Male , Population Groups/history , Principal Component Analysis , Saudi Arabia/epidemiology
10.
Hum Genet ; 139(8): 1107-1117, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32270270

ABSTRACT

Extensive studies have been conducted on the analysis of genome function, especially on the expression quantitative trait loci (eQTL). These studies offered promising results for characterization of the functional sequencing variation and understanding of the basic processes of gene regulation. Parent of origin effect (POE) is an important epigenetic phenomenon describing that the expression of certain genes depends on their allelic parent-of-origin and it is known to play important roles in human complex diseases. However, traditional eQTL mapping approaches do not allow for the detection of imprinting, or they focus on modeling the additive genetic effect thereby ignoring the estimation of the dominance genetic effect. In this study, we proposed a statistical framework to test the additive and dominance genetic effects of the candidate eQTLs along with detection of the POE with a functional model and an orthogonal model for RNA-seq data. We demonstrated the desirable power and preserved Type I errors of the methods in most scenarios, especially the orthogonal model with un-biased estimation of the genetic effects and over-dispersion of the RNA-seq data. The application to a HapMap project trio dataset validated existing imprinting genes and discovered two novel imprinting genes with potential dominance genetic effect and RB1 and IGF1R genes. This study provides new insights into the next generation statistical modeling of eQTL mapping for better understanding of the genetic architecture underlying the mechanisms of gene expression regulation.


Subject(s)
Gene Expression Regulation/genetics , Genomic Imprinting/genetics , Models, Genetic , Models, Statistical , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Alleles , Child , Computer Simulation , Family , Female , Genes, Dominant/genetics , Genotype , HapMap Project , Humans , Male , Parents , RNA-Seq
11.
Int J Legal Med ; 134(1): 123-134, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31760471

ABSTRACT

Ancestry-informative markers (AIMs) can be used to infer the ancestry of an individual to minimize the inaccuracy of self-reported ethnicity in biomedical research. In this study, we describe three methods for selecting AIM SNPs for the Malay population (Malay AIM panel) using different approaches based on pairwise FST, informativeness for assignment (In), and PCA-correlated SNPs (PCAIMs). These Malay AIM panels were extracted from genotype data stored in SNP arrays hosted by the Malaysian node of the Human Variome Project (MyHVP) and the Singapore Genome Variation Project (SGVP). In particular, genotype data from a total of 165 Malay individuals were analyzed, comprising data on 117 individual genotypes from the Affymetrix SNP-6 SNP array platform and data on 48 individual genotypes from the OMNI 2.5 Illumina SNP array platform. The HapMap phase 3 database (1397 individuals from 11 populations) was used as a reference for comparison with the Malay genotype data. The accuracy of each resulting Malay AIM panel was evaluated using a machine learning "ancestry-predictive model" constructed by using WEKA, a comprehensive machine learning platform written in Java. A total of 1250 SNPs were finally selected, which successfully identified Malay individuals from other world populations with an accuracy of 90%, but the accuracy decreased to 80% using 157 SNPs according to the pairwise FST method, while a panel of 200 SNPs selected using In and PCAIMs could be used to identify Malay individuals with an accuracy of approximately 80%.


Subject(s)
Databases, Genetic , Ethnicity/genetics , Genetics, Population/methods , Genotype , Polymorphism, Single Nucleotide , Asian People/genetics , Genetic Markers , HapMap Project , Humans , Malaysia/ethnology , Models, Statistical , Native Hawaiian or Other Pacific Islander/genetics , Principal Component Analysis , Singapore
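
One of the three selection criteria in the abstract above is pairwise FST. The sketch below shows how SNPs might be ranked by a simple Wright-style FST between two populations. The abstract does not state which FST estimator the authors used, so the formula here (equal population weights, no sample-size correction), the function names, and the SNP identifiers in the usage note are assumptions for illustration only.

```python
def wright_fst(p1, p2):
    """Wright's F_ST for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Assumed estimator for illustration: (H_T - H_S) / H_T.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is monomorphic in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pop
    return (h_total - h_within) / h_total


def rank_by_fst(freqs_a, freqs_b, top_n):
    """Return the top_n SNP ids shared by both dicts, highest F_ST first."""
    shared = set(freqs_a) & set(freqs_b)
    ranked = sorted(
        shared, key=lambda snp: (-wright_fst(freqs_a[snp], freqs_b[snp]), snp)
    )
    return ranked[:top_n]
```

For example, with hypothetical frequencies `{"snp_a": 0.1, "snp_b": 0.5}` in one population and `{"snp_a": 0.9, "snp_b": 0.5}` in the other, `rank_by_fst(..., top_n=1)` selects `snp_a`, the marker whose frequency differs most between the two groups.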
12.
Hum Immunol ; 80(11): 897-905, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31558329

ABSTRACT

Since their inception, the International HLA & Immunogenetics Workshops (IHIW) served as a collaborative platform for exchange of specimens, reference materials, experiences and best practices. In this report we present a subset of the results of human leukocyte antigen (HLA) haplotypes in families tested by next generation sequencing (NGS) under the 17th IHIW. We characterized 961 haplotypes in 921 subjects belonging to 250 families from 8 countries (Argentina, Austria, Egypt, Jamaica, Germany, Greece, Kuwait, and Switzerland). These samples were tested in a single core laboratory in a high throughput fashion using 6 different reagents/software platforms. Families tested included patients evaluated clinically as transplant recipients (kidney and hematopoietic cell transplant) and their respective family members. We identified 486 HLA alleles at the following loci HLA-A, -B, -C, -DRB1, -DRB3, -DRB4, -DRB5, -DQA1, -DQB1, -DPA1, -DPB1 (77, 115, 68, 69, 10, 6, 4, 44, 31, 20 and 42 alleles, respectively). We also identified nine novel alleles with polymorphisms in coding regions. This approach of testing samples from multiple laboratories across the world in different stages of technology implementation in a single core laboratory may be useful for future international workshops. Although data presented may not be reflective of allele and haplotype frequencies in the countries to which the families belong, they represent an extensive collection of 3rd and 4th field resolution level 11-locus haplotype associations of 486 alleles identified in families from 8 countries.


Subject(s)
Genotype , HLA Antigens/genetics , High-Throughput Nucleotide Sequencing/methods , Computational Biology , Education , Family , Gene Frequency , HapMap Project , Haplotypes , Histocompatibility Testing/methods , Humans , Immunogenetics , International Cooperation , Linkage Disequilibrium , Models, Biological , Pedigree , Polymorphism, Genetic
13.
Thorac Cancer ; 10(4): 601-606, 2019 04.
Article in English | MEDLINE | ID: mdl-30807688

ABSTRACT

BACKGROUND: The aim of this study was to evaluate the association between CYP2A13 polymorphisms and lung cancer susceptibility using the HapMap database. METHODS: A case-control analysis of 532 subjects with lung cancer and 614 controls with no personal history of the disease was performed. The tag SNPs rs1645690 and rs8192789 for CYP2A13 were selected, and the genetic polymorphisms were confirmed experimentally through real-time PCR, cloning, and sequencing assay. RESULTS: SNP frequency in this study was consistent with the HapMap Project database of Han-Chinese and lung cancer risk was associated with CYP2A13 polymorphisms in non-smokers. CYP2A13 shares a 93.5% identity with CYP2A6 in the amino acid sequence and the homologous sequences may interfere with the study of SNPs of CYP2A13. CONCLUSIONS: CYP2A13 may be a potential key metabolic enzyme gene in the carcinogenesis of lung cancer in non-smokers. The common polymorphisms of CYP2A13 may be candidate biomarkers for lung cancer susceptibility in Han-Chinese.


Subject(s)
Asian People/genetics , Cytochrome P-450 CYP2A6/genetics , Lung Neoplasms/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/genetics , Case-Control Studies , China/ethnology , Female , Genetic Association Studies , Genetic Predisposition to Disease , HapMap Project , Humans , Lung Neoplasms/ethnology , Male , Middle Aged , Non-Smokers
14.
Int J Immunogenet ; 46(2): 49-58, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30659741

ABSTRACT

Allele-specific analyses to understand frequency differences across populations, particularly populations not well studied, are important to help identify variants that may have a functional effect on disease mechanisms and phenotypic predisposition, facilitating new Genome-Wide Association Studies (GWAS). We aimed to compare the allele frequency of 11 asthma-associated and 16 liver disease-associated single nucleotide polymorphisms (SNPs) between the Estonian, HapMap and 1000 genome project populations. When comparing EGCUT with HapMap populations, the largest difference in allele frequencies was observed with the Maasai population in Kinyawa, Kenya, with 12 SNP variants reporting statistical significance. Similarly, when comparing EGCUT with 1000 genomes project populations, the largest difference in allele frequencies was observed with pooled African populations with 22 SNP variants reporting statistical significance. For 11 asthma-associated and 16 liver disease-associated SNPs, Estonians are genetically similar to other European populations but significantly different from African populations. Understanding differences in genetic architecture between ethnic populations is important to facilitate new GWAS targeted at underserved ethnic groups to enable novel genetic findings to aid the development of new therapies to reduce morbidity and mortality.


Subject(s)
Asthma/genetics , Gene Frequency/genetics , Genetics, Population , Genome, Human , HapMap Project , Liver Diseases/genetics , Polymorphism, Single Nucleotide/genetics , Estonia , Humans
16.
J Autoimmun ; 94: 83-89, 2018 11.
Article in English | MEDLINE | ID: mdl-30143393

ABSTRACT

Genome-wide association studies (GWAS) have identified a large number of genetic risk loci for autoimmune diseases. However, the functional variants underlying these disease associations remain largely unknown. There is evidence that microRNA-mediated regulation may play an important role in this context. Therefore, we assessed whether autoimmune disease loci exert their effects by altering microRNA expression in relevant immune cells. To this end, we performed comprehensive data integration of many large and publicly available datasets to combine information on autoimmune disease risk loci with RNA-Seq-based microRNA expression data. Specifically, we carried out microRNA expression quantitative trait loci (eQTL) analyses across 115 GWAS regions associated with 12 autoimmune diseases using next-generation sequencing data of 345 lymphoblastoid cell lines. Statistical analyses included the application and extension of a recently proposed framework (joint likelihood mapping, JLIM) to microRNA expression data and microRNA target gene enrichment analyses of relevant GWAS data. Overall, only a minority of autoimmune disease risk loci may exert their pathophysiologic effects by altering microRNA expression based on JLIM. However, detailed functional fine-mapping revealed two independent GWAS regions harboring autoimmune disease risk SNPs with significant effects on microRNA expression. These relate to SNPs associated with Crohn's disease (CD; rs102275) and rheumatoid arthritis (RA; rs968567), which affect the expression of miR-1908-5p (p_rs102275 = 1.44e-20, p_rs968567 = 2.54e-14). In addition, an independent CD risk SNP, rs3853824, was found to alter the expression of miR-3614-5p (p = 5.70e-7). To support these findings, we demonstrate that GWAS signals for RA and CD were enriched in genes predicted to be targeted by both microRNAs (all with p < 0.05). In summary, our study points towards a potential pathophysiological role of miR-1908-5p and miR-3614-5p in autoimmunity.


Subject(s)
Arthritis, Rheumatoid/genetics , Crohn Disease/genetics , Lymphocytes/immunology , MicroRNAs/genetics , Arthritis, Rheumatoid/diagnosis , Arthritis, Rheumatoid/immunology , Arthritis, Rheumatoid/pathology , Cell Line , Computational Biology/methods , Crohn Disease/diagnosis , Crohn Disease/immunology , Crohn Disease/pathology , Datasets as Topic , Gene Expression Profiling , Gene Expression Regulation , Genome-Wide Association Study , HapMap Project , Humans , Lymphocytes/pathology , MicroRNAs/immunology , Quantitative Trait Loci , Risk
17.
Sci Rep ; 8(1): 5553, 2018 04 03.
Article in English | MEDLINE | ID: mdl-29615764

ABSTRACT

Differences in SNP selection and in the populations studied have left SNP panels for individual identification with few SNPs in common, compromising their universal applicability. To screen for universal SNPs, we performed genome-wide SNP mining in multiple populations based on the HapMap and 1000 Genomes databases. SNPs with high minor allele frequencies (MAF) in 37 populations were selected. As the MAF threshold rose from ≥0.35 to ≥0.43, the number of selected SNPs decreased from 2769 to 0. A total of 117 SNPs with MAF ≥0.39 have no linkage disequilibrium with each other in every population. For 116 of the 117 SNPs, cumulative match probability (CMP) ranged from 2.01 × 10^-48 to 1.93 × 10^-50 and cumulative exclusion probability (CEP) ranged from 0.9999999996653 to 0.9999999999945. In 134 tested Han samples, 110 of the 117 SNPs retained high MAF and conformed to Hardy-Weinberg equilibrium, with CMP = 4.70 × 10^-47 and CEP = 0.999999999862. By analyzing the same number of autosomal SNPs as in the HID-Ion AmpliSeq Identity Panel, i.e. 90 randomly chosen from the 110 SNPs, our panel yielded preferable CMP and CEP. Taken together, the 110-SNP panel is advantageous for forensic testing, and this study provides plenty of highly informative SNPs for compiling final universal panels.


Subject(s)
Databases, Genetic , Genome, Human/genetics , HapMap Project , Polymorphism, Single Nucleotide , Humans
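
The cumulative match probability (CMP) and cumulative exclusion probability (CEP) reported in the abstract above can be approximated from allele frequencies alone. The sketch below is not taken from the paper, which does not give its formulas in the abstract. It assumes biallelic SNPs in Hardy-Weinberg equilibrium, independence between loci, and the common trio-based exclusion formula pq(1 - pq); all function names are hypothetical.

```python
import math


def match_probability(p):
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions: genotype frequencies p^2, 2pq, q^2.
    """
    q = 1.0 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def exclusion_probability(p):
    """Per-locus power of exclusion for a biallelic marker (assumed formula)."""
    q = 1.0 - p
    return p * q * (1.0 - p * q)


def cumulative_match_probability(freqs):
    """Product of per-locus match probabilities (loci assumed independent)."""
    return math.prod(match_probability(p) for p in freqs)


def cumulative_exclusion_probability(freqs):
    """One minus the chance that no locus excludes (loci assumed independent)."""
    return 1.0 - math.prod(1.0 - exclusion_probability(p) for p in freqs)
```

Under these assumptions a single SNP with allele frequency 0.5 has a match probability of 0.375, so a panel of roughly a hundred such SNPs drives the CMP down to the order of magnitude quoted in the abstract, which is why high-MAF markers are preferred.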
18.
PLoS One ; 13(4): e0196226, 2018.
Artículo en Inglés | MEDLINE | ID: mdl-29702671

RESUMEN

Copy number variations (CNVs) are gains and losses of DNA sequence in a genome. High-throughput platforms such as microarrays and next-generation sequencing (NGS) technologies have been applied to detect genome-wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis of copy number losses in 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from the HapMap Project and the 1000 Genomes Project, respectively. We show that the copy number losses reported by the HapMap Project and the 1000 Genomes Project have < 30% overlap, even though both projects required cross-platform experimental support (e.g. PCR, microarray and high-throughput sequencing) for these reports and employed state-of-the-art calling methods. On the other hand, of the copy number losses found directly from HapMap microarray data by an accurate algorithm, CNVhac, almost all have lower read mapping depth in NGS data; furthermore, 88% can be supported by sequences with breakpoints in NGS data. Our results suggest that microarrays are able to call CNVs and that the unnecessary requirement for additional cross-platform support may introduce false negatives. The inconsistency of CNV reports from the HapMap Project and the 1000 Genomes Project might result from the inadequate information contained in microarray data, the inconsistent detection criteria, or the filtration effect of cross-platform support. Statistical tests on CNVs called by CNVhac show that microarray data can offer reliable CNV reports, and the majority of CNV candidates can be confirmed by raw sequences. Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform support, so additional experimental information should be applied when needed rather than as a requirement.


Subject(s)
Computational Biology/methods , DNA Copy Number Variations , Single Nucleotide Polymorphism , DNA Sequence Analysis/methods , Algorithms , Human Genome , HapMap Project , High-Throughput Nucleotide Sequencing/methods , Human Genome Project , Humans , Oligonucleotide Array Sequence Analysis/methods
19.
Sci Rep ; 8(1): 4009, 2018 03 05.
Article in English | MEDLINE | ID: mdl-29507384

ABSTRACT

Currently, only a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for CNV detection based on multiple samples, the majority of current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) which allows detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with aberration in the insertion size and also a Poisson distribution for emitting the read counts, in each genomic position. MSeq-CNV is applied to simulated data and also to real data of six HapMap individuals with high-coverage sequencing in the 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. Ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied for detecting CNVs in two samples with low-coverage sequencing in the 1000 Genomes Project and six samples from the Simons Genome Diversity Project.
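The kind of mixture density described in this abstract can be illustrated with a minimal Python sketch. It is not the MSeq-CNV implementation: it assumes the Binomial and Poisson terms are independent within a component, and the function names, component weights, Poisson rates and aberrant-pair probabilities below are all hypothetical values chosen for illustration. The abstract does not say how the real parameters are defined or fitted.

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of observing k reads under a Poisson with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def binomial_logpmf(k, n, p):
    """Log-probability of k aberrant mate pairs out of n, each with probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

def component_posteriors(read_count, n_pairs, n_aberrant, components):
    """Posterior probability of each mixture component at one genomic position.

    components is a list of (weight, poisson_rate, aberrant_prob) tuples.
    Assumption: read counts and aberrant-pair counts are independent
    given the component, so their log-probabilities simply add.
    """
    log_terms = [
        math.log(w)
        + poisson_logpmf(read_count, lam)
        + binomial_logpmf(n_aberrant, n_pairs, p)
        for w, lam, p in components
    ]
    # Normalise in log space (log-sum-exp) to avoid underflow.
    m = max(log_terms)
    log_total = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return [math.exp(t - log_total) for t in log_terms]

# Hypothetical components: (weight, expected read count, P(aberrant insert size))
COMPONENTS = [
    (0.1, 15.0, 0.40),  # heterozygous deletion (illustrative values)
    (0.8, 30.0, 0.02),  # normal copy number
    (0.1, 45.0, 0.30),  # duplication
]

if __name__ == "__main__":
    # Low coverage plus many aberrant mate pairs favours the deletion component.
    print(component_posteriors(14, 20, 9, COMPONENTS))
```

Combining the two signals in one likelihood means a position with ambiguous read depth can still be assigned to a CNV state when the insert-size evidence is strong, which is the motivation the abstract gives for not relying on depth of coverage alone.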


Subject(s)
DNA Copy Number Variations , DNA Sequence Analysis/standards , Algorithms , Gene Deletion , Human Genome , HapMap Project , Heterozygote , Homozygote , Humans , Poisson Distribution , DNA Sequence Analysis/methods
20.
Clin Interv Aging ; 13: 377-388, 2018.
Article in English | MEDLINE | ID: mdl-29551892

ABSTRACT

BACKGROUND: Ethnic differences exist in the frequencies of genetic variations that contribute to the risk of common disease. This study aimed to analyse the distribution of several genes, previously associated with susceptibility to type 2 diabetes and obesity-related phenotypes, in a Kazakh population. METHODS: A total of 966 individuals belonging to the Kazakh ethnicity were recruited from an outpatient clinic. We genotyped 41 common single nucleotide polymorphisms (SNPs) previously associated with type 2 diabetes in other ethnic groups and 31 of these were in Hardy-Weinberg equilibrium. The obtained allele frequencies were further compared to publicly available data from other ethnic populations. Allele frequencies for other (compared) populations were pooled from the haplotype map (HapMap) database. Principal component analysis (PCA), cluster analysis, and multidimensional scaling (MDS) were used for the analysis of genetic relationship between the populations. RESULTS: Comparative analysis of allele frequencies of the studied SNPs showed significant differentiation among the studied populations. The Kazakh population was grouped with Asian populations according to the cluster analysis and with the Caucasian populations according to PCA. According to MDS, results of the current study show that the Kazakh population holds an intermediate position between Caucasian and Asian populations. CONCLUSION: A high percentage of population differentiation was observed between Kazakh and world populations. The Kazakh population was clustered with Caucasian populations, and this result may indicate a significant Caucasian component in the Kazakh gene pool.
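The Hardy-Weinberg equilibrium check mentioned in the METHODS can be sketched as follows. The abstract does not state which test the authors used, so this assumes a standard chi-square goodness-of-fit test with one degree of freedom for a biallelic SNP. The function name and the genotype counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Takes observed genotype counts (AA, AB, BB) and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

if __name__ == "__main__":
    # Hypothetical genotype counts, not taken from the study.
    print(hwe_chi_square(25, 50, 25))  # counts exactly at equilibrium
    print(hwe_chi_square(50, 0, 50))   # no heterozygotes at all
```

A SNP whose p-value falls below a chosen threshold would be flagged as deviating from equilibrium. The threshold is a study-specific choice that the abstract does not report.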


Subject(s)
Type 2 Diabetes Mellitus/genetics , Ethnicity/genetics , Genetic Markers/genetics , Population Genetics , Adult , Female , Gene Frequency/genetics , Genetic Predisposition to Disease/genetics , Genotype , HapMap Project , Haplotypes/genetics , Humans , Kazakhstan , Male , Middle Aged , Obesity/genetics , Phenotype , Single Nucleotide Polymorphism/genetics , Principal Component Analysis
...