ABSTRACT
BACKGROUND: Human mitochondrial heteroplasmy is an extensively investigated phenomenon in the context of medical diagnostics, forensic identification and molecular evolution. However, technical limitations of high-throughput sequencing hinder reliable determination of point heteroplasmies (PHPs) with minor allele frequencies (MAFs) within the noise threshold. RESULTS: To investigate the PHP landscape at an MAF threshold down to 0.1%, we sequenced whole mitochondrial genomes at approximately 7,700x coverage, in multiple technical and biological replicates of longitudinal blood and buccal swab samples from 11 human donors (159 libraries in total). The results obtained by two independent sequencing platforms and bioinformatics pipelines indicate distinctive PHP patterns below and above the 1% MAF cut-off. We found a high inter-individual prevalence of low-level PHPs (MAF < 1%) at polymorphic positions of the mitochondrial DNA control region (CR), their tissue preference, and a tissue-specific minor allele linkage. We also established the position-dependent potential of minor allele expansion in PHPs, and short-term PHP instability in a mitotically active tissue. We demonstrate that increasing the sensitivity of PHP detection to minor allele frequencies below 1% within a robust experimental and analytical pipeline provides new information with potential applicative value. CONCLUSIONS: Our findings reliably show different mutational loads between tissues at sub-1% allele frequencies, which may serve as an informative medical biomarker of time-dependent, tissue-specific mutational burden, or help discriminate forensically relevant tissues in a single person, close maternal relatives or unrelated individuals of similar phylogenetic background.
Subject(s)
Heteroplasmy , Mitochondria , Humans , Phylogeny , Mitochondria/genetics , High-Throughput Nucleotide Sequencing , DNA, Mitochondrial/genetics
ABSTRACT
Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that significant biases may be introduced into downstream phylogenetic analyses from processing genomic data; however, it remains unclear whether there are interactions among bioinformatic parameters or biases introduced through the choice of reference genome for sequence alignment and variant-calling. We address these knowledge gaps by employing a combination of simulated and empirical data sets to investigate to what extent the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, as well as the way that reference genome choice interacts with bioinformatic filtering choices and phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and have a higher center of gravity than the true trees. We find greatest topological accuracy when filtering sites for minor allele count >3-4 in our 51-taxa data sets, while tree center of gravity was closest to the true value when filtering for sites with minor allele count >1-2. In contrast, filtering for missing data increased accuracy in the inferred topologies; however, this effect was small in comparison to the effect of minor allele filters and may be undesirable due to a subsequent mutation spectrum distortion. The bias introduced by these filters differs based on the reference genome used in short read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses. 
These results demonstrate that attributes of the study system and dataset (and their interaction) add important nuance for how best to assemble and filter short read genomic data for phylogenetic inference.
ABSTRACT
Several studies have shown an association between single nucleotide polymorphisms (SNPs) in genes of the hepcidin regulatory pathway and impaired iron status. The most common is in the TMPRSS6 gene. Very few such studies have been reported from Africa. We aimed to investigate the correlation between common SNPs in the transmembrane protease, serine 6 (TMPRSS6) gene and iron indicators in a sample of Egyptian children, in order to identify suitable candidates for iron supplementation. Patients and methods: One hundred and sixty children aged 5-13 years were included and classified as iron deficient, iron deficiency anemia, or normal healthy controls. All underwent assessment of serum iron, serum ferritin, total iron binding capacity, complete blood count, reticulocyte count, serum soluble transferrin receptor and serum hepcidin. The TMPRSS6 polymorphisms rs4820268, rs855791 and rs11704654 were also genotyped. Results: Iron deficiency was associated with the AG genotype of the rs855791 SNP (P = 0.01). The minor allele frequencies in the included children were 0.43, 0.45 and 0.17 for rs4820268, rs855791 and rs11704654, respectively. Genotype GG of rs4820268 showed the highest hepcidin gene expression fold, and the lowest serum ferroportin and iron store, compared to the AA and AG genotypes (p = 0.05, p = 0.05, p = 0.03, respectively). GG of rs855791 had lower serum ferritin than AA (p = 0.04), and the lowest iron store and highest serum hepcidin compared to the AA and AG genotypes (p = 0.04, p = 0.01, respectively). Children with the CC genotype of rs11704654 had lower levels of hemoglobin, serum ferritin and serum hepcidin than those with the CT genotype (p = 0.01, p = 0.01, p = 0.02, respectively). Conclusion: The SNPs rs855791, rs4820268 and rs11704654 possibly contribute to low iron status.
Subject(s)
Anemia, Iron-Deficiency , Iron , Child , Humans , Hepcidins/genetics , Hepcidins/metabolism , Pilot Projects , Serine/genetics , Peptide Hydrolases/genetics , Peptide Hydrolases/metabolism , Egypt , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism , Polymorphism, Single Nucleotide , Ferritins , Anemia, Iron-Deficiency/genetics , Membrane Proteins/genetics
ABSTRACT
Imputation may be used to rescue genomic data from animals that would otherwise be eliminated due to a lower than desired call rate. The aim of this study was to compare the accuracy of genotype imputation for Afrikaner, Brahman, and Brangus cattle of South Africa using within- and multiple-breed reference populations. A total of 373, 309, and 101 Afrikaner, Brahman, and Brangus cattle, respectively, were genotyped using the GeneSeek Genomic Profiler 150 K panel that contained 141,746 markers. Markers with MAF ≤ 0.02 and call rates ≤ 0.95, or that deviated from Hardy-Weinberg equilibrium with a probability of ≤ 0.0001, were excluded from the data, as were animals with a call rate ≤ 0.90. The remaining data included 99,086 SNPs and 360 Afrikaner animals, 75,291 SNPs and 288 Brahman animals, and 97,897 SNPs and 99 Brangus animals. A total of 7986, 7002, and 7000 SNPs from 50 Afrikaner and Brahman and 30 Brangus cattle, respectively, were masked and then imputed using BEAGLE v3 and FImpute v2. The within-breed imputation yielded accuracies ranging from 89.9 to 96.6% for the three breeds. The multiple-breed imputation yielded corresponding accuracies from 69.21 to 88.35%. The results showed that population homogeneity and numerical representation for within- and across-breed strategies, respectively, are crucial components for improving imputation accuracies.
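The marker-level quality control described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `marker_qc`, the 0/1/2 genotype coding and the use of `None` for a missing call are assumptions made for the example, and the Hardy-Weinberg test and the per-animal call-rate filter are left out.

```python
def marker_qc(genotypes, maf_min=0.02, call_rate_min=0.95):
    """Return one True/False flag per marker (True = marker is kept).

    genotypes: list of rows (one per animal); each entry is 0, 1 or 2
    copies of one allele, or None for a missing call (assumed coding).
    """
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    keep = []
    for j in range(n_markers):
        calls = [row[j] for row in genotypes if row[j] is not None]
        call_rate = len(calls) / n_animals
        if not calls:
            keep.append(False)
            continue
        # Each called animal contributes two alleles.
        freq = sum(calls) / (2 * len(calls))
        maf = min(freq, 1 - freq)
        # Markers at or below either threshold are excluded, so keep
        # only those strictly above both.
        keep.append(maf > maf_min and call_rate > call_rate_min)
    return keep


example = [
    [0, 0, 1],
    [1, 0, None],
    [2, 0, 1],
    [1, 0, 2],
]
print(marker_qc(example))
```

In the toy matrix, the second marker is monomorphic and the third has a call rate of 0.75, so only the first marker passes.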
Subject(s)
Cattle , Genome , Genotype , Animals , Breeding , Cattle/genetics , Genomics , Polymorphism, Single Nucleotide , South Africa
ABSTRACT
Single nucleotide polymorphisms (SNPs) have now replaced microsatellite markers in several species for various genetic investigations such as parentage assignment, genetic breed composition, assessment of individuality and, most popularly, as a useful tool in genomic selection. However, such a resource, which could assist breed identification in a cost-effective manner, has not yet been explored in cattle breeding programs. In our study, we describe methods for reducing the number of SNPs to develop a breed-specific panel. We used SNP data from the Dryad open public access repository. Starting from a global dataset of 178 animals belonging to 10 different breeds, we selected five panels, each comprising a similar number of SNPs, using different methods, i.e., Delta, pairwise Wright's FST, informativeness for assignment, frequent item feature selection (FIFS) and a minor allele frequency-linkage disequilibrium (MAF-LD) based method. The MAF-LD based method was recently developed by us for construction of breed-specific SNP panels. The STRUCTURE software analysis of the MAF-LD based panel showed appropriate clustering in comparison to the other panels. Later, the panel of 591 breed-specific SNPs was assigned to their respective breeds using the Venny 2.1.0 and UGent web tools. The breed-specific SNPs were then annotated using various bioinformatics software packages.
Subject(s)
Cattle/classification , Cattle/genetics , Genotyping Techniques/methods , Polymorphism, Single Nucleotide/genetics , Animals , Breeding , Linkage Disequilibrium/genetics
ABSTRACT
BACKGROUND: PLINK is probably the most used program for analyzing SNP genotypes and runs of homozygosity (ROH), both in human and in animal populations. Over the last decade, ROH analyses have become the state-of-the-art method for inbreeding assessment. In PLINK, the --homozyg function is used to perform ROH analyses and relies on several input settings. These settings can have a large impact on the outcome, and default values are not always appropriate for medium density SNP array data. Guidelines for a robust and uniform ROH analysis in PLINK using medium density data are lacking, although such guidelines are vital for comparing different ROH studies. In this study, 8 populations of different livestock and pet species are used to demonstrate the importance of PLINK input settings. Moreover, the effects of pruning SNPs for low minor allele frequencies and linkage disequilibrium on ROH detection are shown. RESULTS: We introduce the genome coverage parameter to appropriately estimate FROH and to check the validity of ROH analyses. The effect of pruning for linkage disequilibrium and low minor allele frequencies on ROH analyses is highly population dependent and such pruning may result in missed ROH. PLINK's minimal density requirement is crucial for medium density genotypes and if set too low, genome coverage of the ROH analysis is limited. Finally, we provide recommendations for the maximal gap, scanning window length and threshold settings. CONCLUSIONS: In this study, we present guidelines for an adequate and robust ROH analysis in PLINK on medium density SNP data. Furthermore, we advise authors to report parameter settings in publications, and to validate them prior to analysis. Moreover, we encourage authors to report genome coverage to reflect the ROH analysis' validity. Implementing these guidelines will substantially improve the overall quality and uniformity of ROH analyses.
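As a rough illustration of how FROH can be derived from ROH output, the sketch below sums ROH lengths per individual and divides by the genome length that the analysis is able to cover. Several things here are assumptions rather than details from the abstract: the helper names, the reliance on `IID` and `KB` header columns in a PLINK-style `.hom` table, and the choice of denominator (the abstract does not give the exact definition of genome coverage).

```python
from collections import defaultdict


def roh_kb_per_individual(hom_lines):
    """Sum ROH lengths (kb) per individual from whitespace-separated
    lines whose header contains 'IID' and 'KB' columns (assumed layout)."""
    header = hom_lines[0].split()
    iid_col = header.index("IID")
    kb_col = header.index("KB")
    totals = defaultdict(float)
    for line in hom_lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        totals[fields[iid_col]] += float(fields[kb_col])
    return dict(totals)


def f_roh(total_roh_kb, covered_genome_kb):
    """Fraction of the covered genome that lies in ROH."""
    if covered_genome_kb <= 0:
        raise ValueError("covered_genome_kb must be positive")
    return total_roh_kb / covered_genome_kb


lines = [
    "FID IID PHE CHR SNP1 SNP2 POS1 POS2 KB NSNP DENSITY PHOM PHET",
    "F1 A -9 1 s1 s2 1000 2001000 2000.0 50 40.0 1.0 0.0",
    "F1 A -9 2 s3 s4 5000 3005000 3000.0 60 50.0 1.0 0.0",
    "F2 B -9 1 s1 s2 1000 1001000 1000.0 30 33.3 1.0 0.0",
]
totals = roh_kb_per_individual(lines)
# 2,500,000 kb is a made-up covered genome length for the example.
print({iid: f_roh(kb, 2_500_000) for iid, kb in totals.items()})
```

Using the covered length rather than the full genome length as the denominator is one way to avoid underestimating FROH when parts of the genome cannot yield ROH at the chosen settings.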
Subject(s)
Homozygote , Livestock/genetics , Pets/genetics , Polymorphism, Single Nucleotide , Alleles , Animals , Gene Frequency , Genetic Testing , Genetics, Population , Genotype , Inbreeding , Linkage Disequilibrium
ABSTRACT
BACKGROUND: Synonymous mutations are able to change the tAI (tRNA adaptation index) of a codon and consequently affect the local translation rate. Intuitively, one may hypothesize that those synonymous mutations which increase the tAI values are favored by natural selection. RESULTS: We use the maize (Zea mays) genome to test our assumption. The first supporting evidence is that the tAI-increasing synonymous mutations have higher fixed-to-polymorphic ratios than the tAI-decreasing ones. Next, the DAF (derived allele frequency) or MAF (minor allele frequency) of the former is significantly higher than the latter. Moreover, similar results are obtained when we investigate CAI (codon adaptation index) instead of tAI. CONCLUSION: The synonymous mutations in the maize genome are not strictly neutral. The tAI-increasing mutations are positively selected while those tAI-decreasing ones undergo purifying selection. This selection force might be weak but should not be automatically ignored.
Subject(s)
Selection, Genetic , Silent Mutation , Zea mays/genetics , Base Composition , Codon/genetics , Gene Frequency , Genes, Plant , Genetic Variation , Genome, Plant , Models, Genetic , RNA Folding , RNA, Transfer/genetics
ABSTRACT
BACKGROUND: Due to the advent of SNP array technology, a genome-wide analysis of genetic differences between populations and breeds has become possible at a previously unattainable level. Wright's fixation index (Fst) and principal component analysis (PCA) are widely used methods in animal genetics studies. In this paper we compared the power of these methods, how they complement each other, and which of them is the most powerful. RESULTS: A comparative analysis of the power of PCA and Fst was carried out to reveal genetic differences between herds of Holsteinized cows. In total, 803 BovineSNP50 genotypes of cows from 13 herds were used in the current study. The Fst values obtained were in the range of 0.002-0.012 (mean 0.0049), while for rare SNPs with MAF 0.0001-0.005 they were even smaller, in the range of 0.001-0.01 (mean 0.0027). Genetic relatedness of the cows in the herds was the cause of such small Fst values. The contribution of rare alleles with MAF 0.0001-0.01 to the Fst values was much smaller than that of common alleles, and this effect depends on linkage disequilibrium (LD). Despite a substantial change in the MAF spectrum and the number of SNPs, we observed a small effect of LD-based pruning on the Fst data. PCA confirmed the mutual admixture and small genetic difference between herds. Moreover, PCA of the herds based on visualizing the results of a single eigenvector cannot be used to significantly differentiate herds. Only summed eigenvectors should be used to realize the full power of PCA to differentiate small genetic differences between herds. Finally, we present evidence that the significance of Fst data far exceeds the significance of PCA data when these methods are used to reveal genetic differences between herds. CONCLUSIONS: LD-based pruning had a small effect on the findings of the Fst and PCA analyses. Therefore, for weakly structured populations, LD-based pruning is not effective. In addition, our results show that the significance of genetic differences between herds obtained by Fst analysis exceeds that obtained by PCA. To differentiate herds or weakly structured populations, we recommend primarily using the Fst approach and only then PCA.
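For readers unfamiliar with the statistic, the sketch below computes Wright's Fst for a single biallelic SNP as (H_T - H_S) / H_T from the allele frequency in each herd. It is a textbook form with equally weighted herds; the estimator actually used in the study is not stated in the abstract, and the function name `wright_fst` is hypothetical.

```python
def wright_fst(freqs):
    """Wright's Fst for one biallelic SNP.

    freqs: allele frequency of the same allele in each subpopulation
    (herds are weighted equally here, which is an assumption).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # SNP is fixed everywhere; no differentiation to measure
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    return (h_t - h_s) / h_t


print(wright_fst([0.5, 0.5]))  # identical herds
print(wright_fst([0.4, 0.6]))  # mildly differentiated herds
print(wright_fst([0.0, 1.0]))  # herds fixed for different alleles
```

Values close to zero, like those reported above, correspond to herds whose allele frequencies are nearly identical.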
Subject(s)
Breeding , Cattle/genetics , Genetics, Population , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Animals , Genome , Genotype , Principal Component Analysis , Russia
ABSTRACT
BACKGROUND: Next generation sequencing (NGS) generates a large amount of genetic data that can be used to better characterise disease-causing variants. Our aim was to examine allele frequencies of sequence variants reported to cause autosomal dominant inherited retinal diseases (AD-IRDs). METHODS: Genetic information was collected from various databases, including PubMed, the Human Genome Mutation Database, RETNET and gnomAD. RESULTS: We generated a database of 1223 variants reported in 58 genes, including their allele frequency in gnomAD that contains NGS data of over 138 000 individuals. While the majority of variants are not represented in gnomAD, 138 had an allele count of >1 and were examined carefully for various aspects including cosegregation and functional analyses. The analysis revealed 122 variants that were reported pathogenic but unlikely to cause AD-IRDs. Interestingly, in some cases, these unlikely pathogenic variants were the only ones reported to cause disease in AD inheritance pattern for a particular gene, therefore raising doubt regarding the involvement of 11 (19%) of the genes in AD-IRDs. CONCLUSION: We predict that these data are not limited to a specific disease or inheritance pattern since non-pathogenic variants were mistakenly reported as pathogenic in various diseases. Our results should serve as a warning sign for geneticists, variant database curators and sequencing panels' developers not to automatically accept reported variants as pathogenic but cross-reference the information with large databases.
Subject(s)
Alleles , Gene Frequency , Genes, Dominant , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genetic Variation , Retinal Diseases/genetics , Genetic Association Studies , Genomics/methods , Genotype , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
Genomic selection (GS) is a strategy to predict the genetic merits of individuals using genome-wide markers. However, GS prediction accuracy is affected by many factors, including missing rate and minor allele frequency (MAF) of genotypic data, GS models, trait features, etc. In this study, we used one wheat population to investigate prediction accuracies of various GS models on yield and yield-related traits from various quality control (QC) scenarios, missing genotype imputation, and genome-wide association studies (GWAS)-derived markers. Missing rate and MAF of single nucleotide polymorphism (SNP) markers were two major factors in QC. Five missing rate levels (0%, 20%, 40%, 60%, and 80%) and three MAF levels (0%, 5%, and 10%) were considered and the five-fold cross validation was used to estimate the prediction accuracy. The results indicated that a moderate missing rate level (20% to 40%) and MAF (5%) threshold provided better prediction accuracy. Under this QC scenario, prediction accuracies were further calculated for imputed and GWAS-derived markers. It was observed that the accuracies of the six traits were related to their heritability and genetic architecture, as well as the GS prediction model. Moore-Penrose generalized inverse (GenInv), ridge regression (RidgeReg), and random forest (RForest) resulted in higher prediction accuracies than other GS models across traits. Imputation of missing genotypic data had marginal effect on prediction accuracy, while GWAS-derived markers improved the prediction accuracy in most cases. These results demonstrate that QC on missing rate and MAF had positive impact on the predictability of GS models. We failed to identify one single combination of QC scenarios that could outperform the others for all traits and GS models. However, the balance between marker number and marker quality is important for the deployment of GS in wheat breeding. 
GWAS is able to select markers which are mostly related to traits, and therefore can be used to improve the prediction accuracy of GS.
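The five-fold cross validation used to estimate prediction accuracy can be sketched as follows. This is only an outline under stated assumptions: accuracy is taken to be the Pearson correlation between predicted and observed values, a single-predictor least-squares line stands in for the GS models named above (none of which is implemented here), and all function names are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cross_validate(x, y, fit, predict, k=5, seed=0):
    """Predict every record from a model trained without its fold."""
    predicted = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            predicted[i] = predict(model, x[i])
    return pearson(predicted, y)


def fit_line(xs, ys):
    """Least-squares line; a stand-in for a real GS model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((v - mx) ** 2 for v in xs)
    slope = sum((v - mx) * (w - my) for v, w in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict_line(model, v):
    slope, intercept = model
    return slope * v + intercept


x = list(range(10))
y = [2 * v + 1 for v in x]
print(cross_validate(x, y, fit_line, predict_line))
```

Passing `fit` and `predict` as arguments keeps the fold logic separate from the model, so a different model can be swapped in without touching the cross-validation loop.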
Subject(s)
Edible Grain/genetics , Genomics , Quantitative Trait Loci , Triticum/genetics , DNA, Plant/genetics , DNA, Plant/isolation & purification , Data Analysis , Gene Frequency , Genetic Variation , Genome-Wide Association Study , Genotype , Linkage Disequilibrium , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , Selection, Genetic
ABSTRACT
Genotype imputation is a process of estimating missing genotypes from a haplotype or genotype reference panel. It can effectively boost the power of detecting single nucleotide polymorphisms (SNPs) in genome-wide association studies, integrate multiple studies for meta-analysis, and be applied in fine-mapping studies. The performance of genotype imputation is affected by many factors, including software, reference selection, sample size, and SNP density/sequencing coverage. A systematic evaluation of the imputation performance of currently popular software will benefit future studies. Here, we evaluate the imputation performance of Beagle4.1, IMPUTE2, MACH+Minimac3, and SHAPEIT2+IMPUTE2 using test samples of East Asian ancestry and references from the 1000 Genomes Project. The results indicated that the accuracy of IMPUTE2 (99.18%) is slightly higher than that of the others (Beagle4.1: 98.94%, MACH+Minimac3: 98.51%, and SHAPEIT2+IMPUTE2: 99.08%). To achieve good and stable imputation quality, the SNP density needs to be > 200/Mb. The imputation accuracies of IMPUTE2 and Beagle4.1 were only slightly influenced by the study sample size. The extent to which the reference contributed to genotype imputation performance depended on the software selected. We assessed the imputation performance on SNPs generated by next-generation whole genome sequencing and found that the SNP sets detected by sequencing at 15× depth could mostly be obtained by imputing from the haplotype reference panel of the 1000 Genomes Project based on SNP data detected by sequencing at 4× depth. All of the imputation software performed more weakly in low minor allele frequency SNP regions because of the bias of the reference or software. In the future, more comprehensive reference panels or new algorithm developments may rise to this challenge.
Subject(s)
Genotyping Techniques/methods , Alleles , Chromosomes, Human, Pair 1/genetics , Chromosomes, Human, Pair 22/genetics , False Positive Reactions , Gene Frequency , Genotype , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide/genetics , Sample Size , Software
ABSTRACT
OBJECTIVE: To examine genomic, social, and clinical risk factors of ≥85 weight for length percentile (WFLP) at 12 months. STUDY DESIGN: Children in this study had whole-genome sequencing, and clinical and social data were collected. WFLPs at 12 months of age were grouped as follows: (1) <85th, (2) ≥85th to <95th, (3) ≥95th to <99th, and (4) ≥99th. Whole-genome sequencing data were used to analyze rare and common variants, and association of clinical and social factors was examined. RESULTS: A total of 690 children were included; WFLPs were 422 (61.2%) <85th, 112 (16.2%) ≥85th-<95th, 89 (12.9%) ≥95th-<99th, and 67 (9.7%) ≥99th. Family-related risk factors associated with greater WFLP were greater paternal body mass index, WFLP ≥99th OR 1.10 (1.03-1.16), and greater than recommended weight gain in pregnancy, WFLP ≥85th-<95th OR 1.90 (1.09-3.26). More breast milk at 6 months was protective factor: WFLP ≥85th-<95th, OR 0.98 (0.97-0.99), WFLP ≥95th-<99th OR 0.98 (0.97-0.99), and WFLP ≥99th OR 0.98 (0.96-0.99). Although none of the variants reached genome-wide significance, there was a trend toward increased prevalence of genetic variants within or near genes previously associated with obesity in children with WFLP ≥99th. CONCLUSION: This cross-sectional study identified several modifiable factors, including increased weight gain in pregnancy and decreased breast milk at 6 months, associated with greater WFLP at 12 months. Strong genetic factors were not identified.
Subject(s)
Genetic Predisposition to Disease , Pediatric Obesity/genetics , Risk Factors , Alleles , Body Height , Body Mass Index , Cross-Sectional Studies , Female , Gene Frequency , Genetic Variation , Humans , Infant , Male , Milk, Human , Pregnancy , Quality Control , Sequence Analysis, DNA , Weight Gain
ABSTRACT
In total, 52 samples of Sahiwal (19), Tharparkar (17), and Gir (16) cattle were genotyped using the BovineHD SNP chip to analyze minor allele frequency (MAF), genetic diversity, and linkage disequilibrium among these cattle. The SNPs common to the BovineHD and 54K SNP chips were also extracted and evaluated for their performance. Only 40%-50% of the SNPs on these arrays were found to be informative for genetic analysis in these cattle breeds. The overall mean MAF for SNPs of the BovineHD SNP chip was 0.248 ± 0.006, 0.241 ± 0.007, and 0.242 ± 0.009 in Sahiwal, Tharparkar and Gir, respectively, while that for the 54K SNPs was on the lower side. The average Reynolds' genetic distance between breeds ranged from 0.042 to 0.055 based on the BovineHD BeadChip, and from 0.052 to 0.084 based on the 54K SNP chip. The estimates of genetic diversity based on the HD and 54K chips were almost the same and, hence, a low density chip seems to be good enough to decipher the genetic diversity of these cattle breeds. The linkage disequilibrium started decaying (r² < 0.2) at 140 kb inter-marker distance and, hence, a 20K low density customized SNP array derived from the HD chip could be designed for genomic selection in these cattle; alternatively, the 54K BeadChip as such would be useful.
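The r² threshold mentioned above refers to pairwise linkage disequilibrium between markers. A minimal sketch of one common approximation for unphased array data is the squared Pearson correlation between the 0/1/2 genotype codes of two SNPs. Both the coding and the choice of this approximation are assumptions for illustration; the software and estimator used in the study are not named in the abstract.

```python
def genotype_r2(g1, g2):
    """Squared correlation between two SNPs coded as 0/1/2 allele counts.

    g1, g2: genotype codes for the same animals, in the same order.
    """
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        # r² is undefined for a monomorphic marker; 0.0 is returned
        # here as a convention for the sketch.
        return 0.0
    return cov * cov / (v1 * v2)


print(genotype_r2([0, 1, 2, 1], [0, 1, 2, 1]))  # perfectly correlated
print(genotype_r2([0, 0, 2, 2], [0, 2, 0, 2]))  # uncorrelated
```

Averaging such pairwise values within bins of inter-marker distance gives the kind of decay curve from which a distance threshold like 140 kb can be read off.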
Subject(s)
Cattle/genetics , Gene Frequency , Polymorphism, Single Nucleotide/genetics , Alleles , Animals , Breeding , Female , Genetic Variation , Genomics , Genotyping Techniques , Linkage Disequilibrium , Oligonucleotide Array Sequence Analysis
ABSTRACT
In the classical twin study, phenotypic variation is often partitioned into additive genetic (A), common (C) and specific environment (E) components. From genetical theory, the outcome of genotype by environment interaction is expected to inflate A when the interacting factor is shared (i.e., C) between the members of a twin pair. We show that estimates of both A and C can be inflated. When the shared interacting factor changes the size of the difference between homozygotes' means, the expected sibling or DZ twin correlation is .5 if and only if the minor allele frequency (MAF) is .5; otherwise the expected DZ correlation is greater than this value, consistent (and confounded) with some additional effect of C. This result is considered in the light of the distribution of minor allele frequencies for polygenic traits. Also discussed is whether such interactions take place at the locus level or affect an aggregated biological structure or system. Interactions with structures or endophenotypes that result from the aggregated effects of many loci will generally emerge as part of the A estimate.
Subject(s)
Gene Frequency/genetics , Gene-Environment Interaction , Computer Simulation , Endophenotypes , Genotype , Humans , Siblings , Twins, Dizygotic/genetics
ABSTRACT
Analysis of large numbers of single-nucleotide polymorphisms (SNPs) can increase individual discrimination power, and, particularly, it can supply important evidence for kinship or ethnic identification. We identified 300 Korean-specific SNPs from 306 Korean whole-exome sequencing (WES) data. Functionally significant SNPs (variants in splicing site, missense, nonsense, and exonic indels) were filtered out from the variant pool, and SNPs with minor allele frequencies (MAFs) of <0.3 in the 1000 Genomes (1000G) database but >0.3 in the Korean population were selected. Genotypes obtained from WES were confirmed by the Sanger sequencing method. The identified markers were evenly distributed throughout the autosomal chromosomes. All the SNPs were in the Hardy-Weinberg equilibrium with a mean MAF of 0.415 (0.161 in 1000G). The mean heterozygosities were 0.476 (observed) and 0.470 (experimental). The combined power of discrimination was very high. Korean MAFs in most SNPs were similar to those for the Chinese and Japanese populations, but were significantly higher than those for several other ethnic populations. These selected SNPs will be used to develop forensic markers and are expected to be widely used for additional individual identification, ethnic discrimination, and linkage analysis for kinship tests.
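The frequency-based selection step described above can be sketched as a simple filter. The SNP identifiers and frequencies below are invented for illustration, the function names are hypothetical, and the expected heterozygosity uses the standard Hardy-Weinberg form 2pq for a biallelic marker.

```python
def select_population_enriched(snps, threshold=0.3):
    """Keep SNPs whose MAF is below the threshold in the reference
    database but above it in the target population.

    snps: dict mapping SNP id -> (maf_reference, maf_target).
    """
    return [
        snp_id
        for snp_id, (maf_ref, maf_target) in snps.items()
        if maf_ref < threshold and maf_target > threshold
    ]


def expected_heterozygosity(maf):
    """Expected heterozygosity 2pq of a biallelic SNP under
    Hardy-Weinberg equilibrium."""
    return 2 * maf * (1 - maf)


# Hypothetical markers: (MAF in reference database, MAF in target population).
candidates = {
    "snpA": (0.10, 0.45),
    "snpB": (0.35, 0.40),
    "snpC": (0.05, 0.20),
}
print(select_population_enriched(candidates))
print(expected_heterozygosity(0.5))
```

Only `snpA` satisfies both conditions in this toy set. Markers with a MAF near 0.5 in the target population have the highest expected heterozygosity and therefore contribute most to discrimination power.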
Subject(s)
Asian People/genetics , Exome , Genetic Markers , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Gene Frequency , Genotype , Heterozygote , Humans , Republic of Korea
ABSTRACT
Mechanistic hypotheses suggest that vitamin D and the closely related parathyroid hormone (PTH) may be involved in prostate carcinogenesis. However, epidemiological evidence is lacking for PTH and inconsistent for vitamin D. Our objectives were to prospectively investigate the association between vitamin D status, vitamin D-related gene polymorphisms, PTH and prostate cancer risk. A total of 129 cases diagnosed within the Supplémentation en Vitamines et Minéraux Antioxydants cohort were included in a nested case-control study and matched to 167 controls (13 years of follow-up). 25-Hydroxyvitamin D (25(OH)D) and PTH concentrations were assessed from baseline plasma samples. Conditional logistic regression models were computed. Higher 25(OH)D concentration was associated with decreased risk of prostate cancer (OR Q4 v. Q1 0·30; 95 % CI 0·12, 0·77; P trend=0·007). PTH concentration was not associated with prostate cancer risk (P trend=0·4), nor were the studied vitamin D-related gene polymorphisms. In this prospective study, prostate cancer risk was inversely associated with 25(OH)D concentration but not with PTH concentration. These results bring a new contribution to the understanding of the relationship between vitamin D and prostate cancer, which deserves further investigation.
Subject(s)
Prostatic Neoplasms/blood , Vitamin D/analogs & derivatives , Adult , Aged , Case-Control Studies , Double-Blind Method , Genotype , Humans , Male , Middle Aged , Parathyroid Hormone/blood , Placebos , Polymorphism, Single Nucleotide/genetics , Prospective Studies , Prostatic Neoplasms/genetics , Receptors, Calcitriol/genetics , Retinoid X Receptors/genetics , Risk Factors , Vitamin D/blood , Vitamin D/genetics
ABSTRACT
We propose a new approach to detect gene × gene joint action in genome-wide association studies (GWASs) for case-control designs. This approach offers an exhaustive search for all two-way joint action (including, as a special case, single gene action) that is computationally feasible at the genome-wide level and has reasonable statistical power under most genetic models. We found that the presence of any gene × gene joint action may imply differences in three types of genetic components: the minor allele frequencies and the amounts of Hardy-Weinberg disequilibrium may differ between cases and controls, and between the two genetic loci the degree of linkage disequilibrium may differ between cases and controls. Using Fisher's method, it is possible to combine the different sources of genetic information in an overall test for detecting gene × gene joint action. The proposed statistical analysis is efficient and its simplicity makes it applicable to GWASs. In the current study, we applied the proposed approach to a GWAS on schizophrenia and found several potential gene × gene interactions. Our application illustrates the practical advantage of the proposed method.
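Fisher's method, which the approach above uses to combine the separate sources of evidence, can be sketched with the standard library alone. The statistic is -2 times the sum of the log p-values and, for k independent tests, follows a chi-square distribution with 2k degrees of freedom, whose tail probability has a closed form when the degrees of freedom are even. How the three component p-values are obtained, and whether they are independent in practice, is not covered here; the function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p)
    is referred to a chi-square distribution with 2k degrees of freedom.
    """
    k = len(p_values)
    if k == 0 or any(not (0 < p <= 1) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # Chi-square survival function for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


stat, p = fisher_combined_p([0.5, 0.5])
print(stat, p)
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail probability.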
Subject(s)
Genes , Genome-Wide Association Study/methods , Schizophrenia/genetics , Case-Control Studies , Epistasis, Genetic/genetics , Gene Frequency/genetics , Genetic Loci/genetics , Genetic Predisposition to Disease/genetics , Genome, Human/genetics , Humans , Linkage Disequilibrium/genetics , Models, Genetic , Polymorphism, Single Nucleotide/genetics
ABSTRACT
This tutorial is a learning resource that outlines the basic process and provides specific software tools for implementing a complete genome-wide association analysis. Approaches to post-analytic visualization and interrogation of potentially novel findings are also presented. Applications are illustrated using the free and open-source R statistical computing and graphics software environment, Bioconductor software for bioinformatics and the UCSC Genome Browser. Complete genome-wide association data on 1401 individuals across 861,473 typed single nucleotide polymorphisms from the PennCATH study of coronary artery disease are used for illustration. All data and code, as well as additional instructional resources, are publicly available through the Open Resources in Statistical Genomics project: http://www.stat-gen.org.
Subject(s)
Computational Biology , Genome-Wide Association Study , Genetic Databases , Genome-Wide Association Study/statistics & numerical data , Humans , Software
ABSTRACT
A-to-I RNA editing operated by ADAR enzymes is extremely common in mammals. Several editing events in coding regions have pivotal physiological roles and affect protein sequence (recoding events) or function. We analyzed the evolutionary history of the 3 ADAR family genes and of their coding targets. Evolutionary analysis indicated that ADAR evolved adaptively in primates, with the strongest selection in the unique N-terminal domain of the interferon-inducible isoform. Positively selected residues in the human lineage were also detected in the ADAR deaminase domain and in the RNA binding domains of ADARB1 and ADARB2. During the recent history of human populations distinct variants in the 3 genes increased in frequency as a result of local selective pressures. Most selected variants are located within regulatory regions and some are in linkage disequilibrium with eQTLs in monocytes. Finally, analysis of conservation scores of coding editing sites indicated that editing events are counter-selected within regions that are poorly tolerant to change. Nevertheless, a minority of recoding events occurs at highly conserved positions and possibly represents the functional fraction. These events are enriched in pathways related to HIV-1 infection and to epidermis/hair development. Thus, both ADAR genes and their targets evolved under variable selective regimes, including purifying and positive selection. Pressures related to immune response likely represented major drivers of evolution for ADAR genes. As for their coding targets, we suggest that most editing events are slightly deleterious, although a minority may be beneficial and contribute to antiviral response and skin homeostasis.
Subject(s)
Adenosine Deaminase/genetics , Genetic Variation , Primates/genetics , RNA Editing , RNA-Binding Proteins/genetics , Genetic Selection , Adenosine Deaminase/metabolism , Amino Acid Sequence , Animals , Biological Evolution , Codon , Hair/cytology , Hair/enzymology , Humans , Isoenzymes/genetics , Isoenzymes/metabolism , Linkage Disequilibrium , Molecular Sequence Data , Monocytes/cytology , Monocytes/enzymology , Open Reading Frames , Primates/classification , Quantitative Trait Loci , RNA-Binding Proteins/metabolism , Sequence Alignment , Amino Acid Sequence Homology , Skin/cytology , Skin/enzymology
ABSTRACT
Genotype imputation predicts the genotypes of a high-density array from low-density array data and has great potential for reducing genotyping costs in cattle populations. However, imputation quality varies across breeds, which have different effective population sizes. Therefore, the accuracy of genotype imputation must be evaluated in each breed. The Japanese Black cattle population has a unique genetic background, and this study aimed to investigate the factors affecting imputation quality in this population. A total of 1368 animals were genotyped using the Illumina BovineHD BeadChip, and the accuracy of imputation was evaluated using information from four lower density arrays. The extent of linkage disequilibrium in this population was higher than that in other beef breeds but lower than that in dairy breeds. The accuracy of arrays with more than 20 000 single nucleotide polymorphisms (SNPs) was similar to or higher than that of lower density arrays. In addition, the minor allele frequency of SNPs in the reference population affected the accuracy. The accuracy increased as the size of the reference population increased, up to 400 animals, beyond which there was little increase. A higher genetic relationship between the reference and test populations increased imputation accuracy. These results indicate that high imputation accuracy can be achieved by using high-density arrays, having enough reference animals and including relatives in the reference population.
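The abstract does not state which accuracy metric was used. As a minimal sketch, assuming genotypes are coded as minor-allele counts (0, 1, 2), two measures commonly used for this purpose are the concordance rate and the squared correlation between true and imputed genotypes. The function names and the toy data in the usage note are illustrative assumptions, not taken from the study.

```python
def concordance_rate(true_genotypes, imputed_genotypes):
    """Fraction of genotypes (coded 0/1/2) that were imputed correctly."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def genotype_r2(true_genotypes, imputed_genotypes):
    """Squared Pearson correlation between true and imputed genotypes."""
    n = len(true_genotypes)
    if n != len(imputed_genotypes) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_genotypes) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_genotypes))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_i = sum((i - mean_i) ** 2 for i in imputed_genotypes)
    if var_t == 0 or var_i == 0:
        # A monomorphic SNP has no variance, so the correlation is undefined.
        raise ValueError("correlation is undefined for a monomorphic SNP")
    return cov * cov / (var_t * var_i)
```

For one SNP with true genotypes `[0, 1, 2, 1, 0, 2]` and imputed genotypes `[0, 1, 2, 2, 0, 2]`, `concordance_rate` counts five matches out of six. The correlation-based measure is less sensitive to minor allele frequency than raw concordance, which is one reason both are often reported together.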