1.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39101500

ABSTRACT

Genomic selection (GS) has emerged as an effective technology to accelerate crop hybrid breeding by enabling early selection prior to phenotype collection. Genomic best linear unbiased prediction (GBLUP) is a robust method that has been routinely used in GS breeding programs. However, GBLUP assumes that markers contribute equally to the total genetic variance, which may not be the case. In this study, we developed a novel GS method called GA-GBLUP that leverages the genetic algorithm (GA) to select markers related to the target trait. We defined four fitness functions for optimization, including AIC, BIC, R2, and HAT, to improve the predictability and bin adjacent markers based on the principle of linkage disequilibrium to reduce model dimension. The results demonstrate that the GA-GBLUP model, equipped with R2 and HAT fitness function, produces much higher predictability than GBLUP for most traits in rice and maize datasets, particularly for traits with low heritability. Moreover, we have developed a user-friendly R package, GAGBLUP, for GS, and the package is freely available on CRAN (https://CRAN.R-project.org/package=GAGBLUP).


Subject(s)
Algorithms , Genomics , Selection, Genetic , Zea mays , Genomics/methods , Zea mays/genetics , Oryza/genetics , Models, Genetic , Plant Breeding/methods , Linkage Disequilibrium , Phenotype , Quantitative Trait Loci , Genome, Plant , Polymorphism, Single Nucleotide , Software
2.
Genetics ; 227(4)2024 Aug 07.
Article in English | MEDLINE | ID: mdl-38861387

ABSTRACT

The main objective of mapping quantitative trait loci (QTL) and genome-wide association studies (GWAS) is to identify and locate QTLs on the genome. Estimating the sizes of QTL is equally important as identifying the QTLs. The size of a QTL is often measured by the QTL variance, or the proportion of phenotypic variance explained by the QTL, known as the QTL heritability. The reported QTL heritability is biased upward for small-sized QTLs estimated from small samples, especially in GWAS with a very small P-value threshold accommodating to Bonferroni correction for multiple tests. The phenomenon is called the Beavis effect. Methods of correcting the Beavis effect have been developed for additive effect models. Corresponding methods are not available for QTLs with more than one effect, such as QTLs including dominance and other genetic effects. In this study, we developed explicit formulas for estimating the variances and heritability for QTL with multiple effects. We also developed a method to remove nuisance parameters via an annihilator matrix. Finally, biases in estimated QTL variances caused by the Beavis effect are investigated and corrected. The new method is demonstrated by analyzing the 1000 grain weight (KGW) trait in a hybrid rice population.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Oryza , Quantitative Trait Loci , Oryza/genetics , Genome-Wide Association Study/methods , Genetic Variation , Phenotype , Chromosome Mapping/methods
3.
Int J Mol Sci ; 25(2)2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38256265

ABSTRACT

Maize is one of the major crops that has demonstrated success in the utilization of heterosis. Developing high-yield hybrids is a crucial part of plant breeding to secure global food demand. In this study, we conducted a genome-wide association study (GWAS) for 10 agronomic traits using a typical breeder population comprising 442 single-cross hybrids by evaluating additive, dominance, and epistatic effects. A total of 49 significant single nucleotide polymorphisms (SNPs) and 69 significant pairs of epistasis were identified, explaining 26.2% to 64.3% of the phenotypic variation across the 10 traits. The enrichment of favorable genotypes is significantly correlated with the corresponding phenotype. In the confidence region of the associated site, 532 protein-coding genes were discovered. Among these genes, the Zm00001d044211 candidate gene was found to negatively regulate starch synthesis and potentially impact yield. This typical breeding population provided a valuable resource for dissecting the genetic architecture of yield-related traits. We proposed a novel mating strategy to increase the GWAS efficiency without utilizing more resources. Finally, we analyzed the enrichment of favorable alleles in the Shaan A and Shaan B groups, as well as in each inbred line. Our breeding practice led to consistent results. Not only does this study demonstrate the feasibility of GWAS in F1 hybrid populations, it also provides a valuable basis for further molecular biology and breeding research.


Subject(s)
Genome-Wide Association Study , Zea mays , Zea mays/genetics , Plant Breeding , Agriculture , Crops, Agricultural
4.
Cell Death Dis ; 14(8): 516, 2023 08 12.
Article in English | MEDLINE | ID: mdl-37573356

ABSTRACT

Urothelial bladder cancer (UBC) is one of the most prevalent malignancies worldwide, with striking tumor heterogeneity. Elucidating the molecular mechanisms that can be exploited for the treatment of aggressive UBC is a particularly relevant goal. Protein ubiquitination is a critical post-translational modification (PTM) that mediates the degradation of target protein via the proteasome. However, the roles of aberrant protein ubiquitination in UBC development and the underlying mechanisms by which it drives tumor progression remain unclear. In this study, taking advantage of clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein (Cas) 9 technology, we identified the ubiquitin E3 ligase ANAPC11, a critical subunit of the anaphase-promoting complex/cyclosome (APC/C), as a potential oncogenic molecule in UBC cells. Our clinical analysis showed that elevated expression of ANAPC11 was significantly correlated with high T stage, positive lymph node (LN) metastasis, and poor outcomes in UBC patients. By employing a series of in vitro experiments, we demonstrated that ANAPC11 enhanced the proliferation and invasiveness of UBC cells, while knockout of ANAPC11 inhibited the growth and LN metastasis of UBC cells in vivo. By conducting immunoprecipitation coupled with mass spectrometry, we confirmed that ANAPC11 increased the ubiquitination level of the Forkhead transcription factor FOXO3. The resulting decrease in FOXO3 protein stability led to the downregulation of the cell cycle regulator p21 and decreased expression of GULP1, a downstream effector of androgen receptor signaling. Taken together, these findings indicated that ANAPC11 plays an oncogenic role in UBC by modulating FOXO3 protein degradation. The ANAPC11-FOXO3 regulatory axis might serve as a novel therapeutic target for UBC.


Subject(s)
Ubiquitin-Protein Ligases , Urinary Bladder Neoplasms , Humans , Adaptor Proteins, Signal Transducing/metabolism , Anaphase-Promoting Complex-Cyclosome/metabolism , Apc11 Subunit, Anaphase-Promoting Complex-Cyclosome/metabolism , Cell Proliferation , Forkhead Box Protein O3/genetics , Forkhead Box Protein O3/metabolism , Lymphatic Metastasis , Proteolysis , Ubiquitin-Protein Ligases/genetics , Ubiquitin-Protein Ligases/metabolism , Ubiquitination , Urinary Bladder Neoplasms/genetics
5.
Nature ; 606(7914): 527-534, 2022 06.
Article in English | MEDLINE | ID: mdl-35676474

ABSTRACT

Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits1,2. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions3,4. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.


Subject(s)
Genetic Variation , Genome, Plant , Genome-Wide Association Study , Plant Breeding , Solanum lycopersicum , Alleles , Crops, Agricultural/genetics , Genome, Plant/genetics , Linkage Disequilibrium , Solanum lycopersicum/genetics , Solanum lycopersicum/metabolism
6.
PLoS Comput Biol ; 18(3): e1009923, 2022 03.
Article in English | MEDLINE | ID: mdl-35275920

ABSTRACT

Detecting quantitative trait loci (QTL) and estimating QTL variances (represented by the squared QTL effects) are two main goals of QTL mapping and genome-wide association studies (GWAS). However, there are issues associated with estimated QTL variances and such issues have not attracted much attention from the QTL mapping community. Estimated QTL variances are usually biased upwards due to estimation being associated with significance tests. The phenomenon is called the Beavis effect. However, estimated variances of QTL without significance tests can also be biased upwards, which cannot be explained by the Beavis effect; rather, this bias is due to the fact that QTL variances are often estimated as the squares of the estimated QTL effects. The parameters are the QTL effects and the estimated QTL variances are obtained by squaring the estimated QTL effects. This square transformation failed to incorporate the errors of estimated QTL effects into the transformation. The consequence is biases in estimated QTL variances. To correct the biases, we can either reformulate the QTL model by treating the QTL effect as random and directly estimate the QTL variance (as a variance component) or adjust the bias by taking into account the error of the estimated QTL effect. A moment method of estimation has been proposed to correct the bias. The method has been validated via Monte Carlo simulation studies. The method has been applied to QTL mapping for the 10-week-body-weight trait from an F2 mouse population.


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Animals , Chromosome Mapping/methods , Mice , Models, Genetic , Monte Carlo Method , Quantitative Trait Loci/genetics
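The upward bias described in entry 6 follows from a standard identity: for an unbiased effect estimate, the expected squared estimate equals the squared true effect plus the sampling variance of the estimate. The sketch below illustrates that identity with a small simulation. It is an illustration under stated assumptions, not the estimator from the paper: the normal sampling model, the function names, and the parameter values are all hypothetical.

```python
import random
import statistics


def naive_qtl_variance(effect_hat):
    """Naive estimate: the square of the estimated QTL effect."""
    return effect_hat ** 2


def corrected_qtl_variance(effect_hat, std_error):
    """Moment-style correction (assumed form): subtract the squared
    standard error, since E[b_hat^2] = b^2 + Var(b_hat) when b_hat
    is unbiased. The result can be negative for small effects."""
    return effect_hat ** 2 - std_error ** 2


def simulate(true_effect=0.5, std_error=0.4, n_rep=20000, seed=1):
    """Average both estimators over repeated draws of b_hat.

    Assumes b_hat ~ Normal(true_effect, std_error^2); all values
    here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(n_rep):
        b_hat = rng.gauss(true_effect, std_error)
        naive.append(naive_qtl_variance(b_hat))
        corrected.append(corrected_qtl_variance(b_hat, std_error))
    return statistics.fmean(naive), statistics.fmean(corrected)


if __name__ == "__main__":
    naive_mean, corrected_mean = simulate()
    print("true variance:", 0.5 ** 2)
    print("naive mean:", naive_mean)
    print("corrected mean:", corrected_mean)
```

With these settings the naive average should sit near the true value plus the squared standard error, while the corrected average should sit near the true value.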
7.
J Natl Cancer Inst ; 114(2): 220-227, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34473310

ABSTRACT

BACKGROUND: Cystoscopy plays an important role in bladder cancer (BCa) diagnosis and treatment, but its sensitivity needs improvement. Artificial intelligence has shown promise in endoscopy, but few cystoscopic applications have been reported. We report a Cystoscopy Artificial Intelligence Diagnostic System (CAIDS) for BCa diagnosis. METHODS: In total, 69 204 images from 10 729 consecutive patients from 6 hospitals were collected and divided into training, internal validation, and external validation sets. The CAIDS was built using a pyramid scene parsing network and transfer learning. A subset (n = 260) of the validation sets was used for a performance comparison between the CAIDS and urologists for complex lesion detection. The diagnostic accuracy, sensitivity, specificity, and positive and negative predictive values and 95% confidence intervals (CIs) were calculated using the Clopper-Pearson method. RESULTS: The diagnostic accuracies of the CAIDS were 0.977 (95% CI = 0.974 to 0.979) in the internal validation set and 0.990 (95% CI = 0.979 to 0.996), 0.982 (95% CI = 0.974 to 0.988), 0.978 (95% CI = 0.959 to 0.989), and 0.991 (95% CI = 0.987 to 0.994) in different external validation sets. In the CAIDS vs urologists' comparisons, the CAIDS showed high accuracy and sensitivity (accuracy = 0.939, 95% CI = 0.902 to 0.964; sensitivity = 0.954, 95% CI = 0.902 to 0.983) with a short latency of 12 seconds, much more accurate and quicker than the expert urologists. CONCLUSIONS: The CAIDS achieved accurate BCa detection with a short latency. The CAIDS may provide many clinical benefits, from increasing the diagnostic accuracy for BCa, even for commonly misdiagnosed cases such as flat cancerous tissue (carcinoma in situ), to reducing the operation time for cystoscopy.


Subject(s)
Cystoscopy , Urinary Bladder Neoplasms , Artificial Intelligence , Cystoscopy/methods , Humans , Predictive Value of Tests , Urinary Bladder Neoplasms/diagnostic imaging , Urinary Bladder Neoplasms/pathology
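Entry 7 reports 95% confidence intervals computed with the Clopper-Pearson method. As a rough illustration of how such an exact binomial interval can be obtained, the sketch below inverts the binomial tail probabilities by bisection using only the Python standard library. The function names are hypothetical, and the direct summation is only suitable for moderate sample sizes (up to roughly a thousand); it is not the software used in the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05, tol=1e-10):
    """Exact (Clopper-Pearson) two-sided interval for k successes in n trials."""

    def solve(f, target):
        # f is decreasing in p on [0, 1]; bisect for f(p) = target.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2, i.e. cdf(k - 1; p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper
```

For example, `clopper_pearson(244, 260)` would give the interval for 244 correct calls out of 260, where the count 244 is a hypothetical value chosen for illustration rather than a figure taken from the study.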
8.
Front Plant Sci ; 12: 774478, 2021.
Article in English | MEDLINE | ID: mdl-34917109

ABSTRACT

Heterosis contributes a big proportion to hybrid performance in maize, especially for grain yield. It is attractive to explore the underlying genetic architecture of hybrid performance and heterosis. Considering its complexity, different from former mapping method, we developed a series of linear mixed models incorporating multiple polygenic covariance structures to quantify the contribution of each genetic component (additive, dominance, additive-by-additive, additive-by-dominance, and dominance-by-dominance) to hybrid performance and midparent heterosis variation and to identify significant additive and non-additive (dominance and epistatic) quantitative trait loci (QTL). Here, we developed a North Carolina II population by crossing 339 recombinant inbred lines with two elite lines (Chang7-2 and Mo17), resulting in two populations of hybrids signed as Chang7-2 × recombinant inbred lines and Mo17 × recombinant inbred lines, respectively. The results of a path analysis showed that kernel number per row and hundred grain weight contributed the most to the variation of grain yield. The heritability of midparent heterosis for 10 investigated traits ranged from 0.27 to 0.81. For the 10 traits, 21 main (additive and dominance) QTL for hybrid performance and 17 dominance QTL for midparent heterosis were identified in the pooled hybrid populations with two overlapping QTL. Several of the identified QTL showed pleiotropic effects. Significant epistatic QTL were also identified and were shown to play an important role in ear height variation. Genomic selection was used to assess the influence of QTL on prediction accuracy and to explore the strategy of heterosis utilization in maize breeding. Results showed that treating significant single nucleotide polymorphisms as fixed effects in the linear mixed model could improve the prediction accuracy under prediction schemes 2 and 3. 
In conclusion, the different analyses all substantiated the different genetic architecture of hybrid performance and midparent heterosis in maize. Dominance contributes the highest proportion to heterosis, especially for grain yield; however, epistasis contributes the highest proportion to hybrid performance of grain yield.

9.
Genetics ; 219(3)2021 11 05.
Article in English | MEDLINE | ID: mdl-34740243

ABSTRACT

The Beavis effect in quantitative trait locus (QTL) mapping describes a phenomenon that the estimated effect size of a statistically significant QTL (measured by the QTL variance) is greater than the true effect size of the QTL if the sample size is not sufficiently large. This is a typical example of the Winners' curse applied to molecular quantitative genetics. Theoretical evaluation and correction for the Winners' curse have been studied for interval mapping. However, similar technologies have not been available for current models of QTL mapping and genome-wide association studies where a polygene is often included in the linear mixed models to control the genetic background effect. In this study, we developed the theory of the Beavis effect in a linear mixed model using a truncated noncentral Chi-square distribution. We equated the observed Wald test statistic of a significant QTL to the expectation of a truncated noncentral Chi-square distribution to obtain a bias-corrected estimate of the QTL variance. The results are validated from replicated Monte Carlo simulation experiments. We applied the new method to the grain width (GW) trait of a rice population consisting of 524 homozygous varieties with over 300 k single nucleotide polymorphism markers. Two loci were identified and the estimated QTL heritabilities were corrected for the Beavis effect. Bias correction for the larger QTL on chromosome 5 (GW5) with an estimated heritability of 12% did not change the QTL heritability due to the extremely large test score and estimated QTL effect. The smaller QTL on chromosome 9 (GW9) had an estimated QTL heritability of 9%, which was reduced to 6% after bias correction.


Subject(s)
Chromosome Mapping/methods , Models, Genetic , Oryza/genetics , Quantitative Trait Loci , Chromosomes, Plant/genetics , Computer Simulation , Genome-Wide Association Study , Monte Carlo Method , Multifactorial Inheritance , Multivariate Analysis , Seeds/genetics
10.
NAR Genom Bioinform ; 3(3): lqab060, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34235432

ABSTRACT

Genome-wide association study data analyses often face two significant challenges: (i) high dimensionality of single-nucleotide polymorphism (SNP) genotypes and (ii) imputation of missing values. SNPs are not independent due to physical linkage and natural selection. The correlation of nearby SNPs is known as linkage disequilibrium (LD), which can be used for LD conceptual SNP bin mapping, missing genotype inferencing and SNP dimension reduction. We used a stochastic process to describe the SNP signals and proposed two types of autocorrelations to measure nearby SNPs' information redundancy. Based on the calculated autocorrelation coefficients, we constructed LD bins. We adopted a k-nearest neighbors algorithm (kNN) to impute the missing genotypes. We proposed several novel methods to find the optimal synthetic marker to represent the SNP bin. We also proposed methods to evaluate the information loss or information conservation between using the original genome-wide markers and using dimension-reduced synthetic markers. Our performance assessments on the real-life SNP data from a rice recombinant inbred line (RIL) population and a rice HapMap project show that the new methods produce satisfactory results. We implemented these functional modules in C/C++ and streamlined them into a web-based pipeline named PIP-SNP (https://bioinfo.noble.org/PIP_SNP/) for processing SNP data.

11.
Clin Epigenetics ; 13(1): 91, 2021 04 26.
Article in English | MEDLINE | ID: mdl-33902700

ABSTRACT

BACKGROUND: Current non-invasive tests have limited sensitivities and lack capabilities of pre-operative risk stratification for bladder cancer (BC) diagnosis. We aimed to develop and validate a urine-based DNA methylation assay as a clinically feasible test for improving BC detection and enabling pre-operative risk stratifications. METHODS: A urine-based DNA methylation assay was developed and validated by retrospective single-center studies in patients with suspected BC in Cohort 1 (n = 192) and Cohort 2 (n = 98), respectively. In addition, a prospective single-center study in a hematuria patient group (Cohort 3, n = 174) was used as a second validation of the model. RESULTS: The assay with a dual-marker detection model showed 88.1% and 91.2% sensitivities, 89.7% and 85.7% specificities in validation Cohort 2 (patients with suspected BC) and Cohort 3 (patients with hematuria), respectively. Furthermore, this assay showed improved sensitivities over cytology and FISH on detecting low-grade tumor (66.7-77.8% vs. 0.0-22.2%, 0.0-22.2%), Ta tumor (83.3% vs. 22.2-41.2%, 44.4-52.9%) and non-muscle invasive BC (NMIBC) (80.0-89.7% vs. 51.5-52.0%, 59.4-72.0%) in both cohorts. The assay also had higher accuracies (88.9-95.8%) in diagnosing cases with concurrent genitourinary disorders as compared to cytology (55.6-70.8%) and FISH (72.2-77.8%). Meanwhile, the assay with a five-marker stratification model identified high-risk NMIBC and muscle invasive BC with 90.5% sensitivity and 86.8% specificity in Cohort 2. CONCLUSIONS: The urine-based DNA methylation assay represents a highly sensitive and specific approach for BC early-stage detection and risk stratification. It has the potential to be used as a routine test to improve diagnosis and prognosis of BC in the clinic.


Subject(s)
DNA Methylation/genetics , DNA, Neoplasm/genetics , DNA, Neoplasm/urine , Early Detection of Cancer/methods , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/urine , Biomarkers, Tumor/genetics , Biomarkers, Tumor/urine , Cohort Studies , Prospective Studies , Reproducibility of Results , Risk Assessment , Sensitivity and Specificity , Urinary Bladder Neoplasms/diagnosis
12.
Plant Biotechnol J ; 19(2): 261-272, 2021 02.
Article in English | MEDLINE | ID: mdl-32738177

ABSTRACT

Hybrid breeding has been shown to effectively increase rice productivity. However, identifying desirable hybrids out of numerous potential combinations is a daunting challenge. Genomic selection holds great promise for accelerating hybrid breeding by enabling early selection before phenotypes are measured. With the recent advances in multi-omic technologies, hybrid prediction based on transcriptomic and metabolomic data has received increasing attention. However, the current omic-based hybrid prediction has ignored parental phenotypic information, which is of fundamental importance in plant breeding. In this study, we integrated parental phenotypic information into various multi-omic prediction models applied in hybrid breeding of rice and compared the predictabilities of 15 combinations from four sets of predictors from the parents, that is, genome, transcriptome, metabolome and phenome. The predictability for each combination was evaluated using the best linear unbiased prediction and a modified fast HAT method. We found significant interactions between predictors and traits in predictability, but joint prediction with various combinations of the predictors significantly improved predictability relative to prediction of any single source omic data for each trait investigated. Incorporation of parental phenotypic data into various omic predictors increased the predictability, on average by 13.6%, 54.5%, 19.9% and 8.3%, for grain yield, number of tillers per plant, number of grains per panicle and 1000 grain weight, respectively. Among nine models of incorporating parental traits, the AD-All model was the most effective one. This novel strategy of incorporating parental phenotypic data into multi-omic prediction is expected to improve hybrid breeding progress, especially with the development of high-throughput phenotyping technologies.


Subject(s)
Oryza , Hybridization, Genetic , Models, Genetic , Oryza/genetics , Phenotype , Plant Breeding
13.
Front Plant Sci ; 11: 583277, 2020.
Article in English | MEDLINE | ID: mdl-33281846

ABSTRACT

Accurate phenotype prediction of quantitative traits is paramount to enhanced plant research and breeding. Here, we report the accurate prediction of cotton fiber length, a typical quantitative trait, using 474 cotton (Gossypium ssp.) fiber length (GFL) genes and nine prediction models. When the SNPs/InDels contained in 226 of the GFL genes or the expressions of all 474 GFL genes were used for fiber length prediction, a prediction accuracy of r = 0.83 was obtained, approaching the maximally possible prediction accuracy of a quantitative trait. This is a 116% improvement over the prediction accuracies of fiber length thus far achieved by genomic selection using genome-wide random DNA markers. Moreover, analysis of the GFL genes identified 125 of the GFL genes that are key to accurate prediction of fiber length, with which a prediction accuracy similar to that of all 474 GFL genes was obtained. The fiber lengths of the plants predicted with expressions of the 125 key GFL genes were significantly correlated with those predicted with the SNPs/InDels of the above 226 SNP/InDel-containing GFL genes (r = 0.892, P = 0.000). The prediction accuracies of fiber length using both genic datasets were highly consistent across environments or generations. Finally, we found that a training population consisting of 100-120 plants was sufficient to train a model for accurate prediction of a quantitative trait using the genes controlling the trait. Therefore, the genes controlling a quantitative trait are capable of accurately predicting its phenotype, thereby dramatically improving the ability, accuracy, and efficiency of phenotype prediction and promoting gene-based breeding in cotton and other species.

14.
Genetics ; 216(3): 781-804, 2020 11.
Article in English | MEDLINE | ID: mdl-32978270

ABSTRACT

The biological basis of exercise behavior is increasingly relevant for maintaining healthy lifestyles. Various quantitative genetic studies and selection experiments have conclusively demonstrated substantial heritability for exercise behavior in both humans and laboratory rodents. In the "High Runner" selection experiment, four replicate lines of Mus domesticus were bred for high voluntary wheel running (HR), along with four nonselected control (C) lines. After 61 generations, the genomes of 79 mice (9-10 from each line) were fully sequenced and single nucleotide polymorphisms (SNPs) were identified. We used nested ANOVA with MIVQUE estimation and other approaches to compare allele frequencies between the HR and C lines for both SNPs and haplotypes. Approximately 61 genomic regions, across all somatic chromosomes, showed evidence of differentiation; 12 of these regions were differentiated by all methods of analysis. Gene function was inferred largely using Panther gene ontology terms and KO phenotypes associated with genes of interest. Some of the differentiated genes are known to be associated with behavior/motivational systems and/or athletic ability, including Sorl1, Dach1, and Cdh10. Sorl1 is a sorting protein associated with cholinergic neuron morphology, vascular wound healing, and metabolism. Dach1 is associated with limb bud development and neural differentiation. Cdh10 is a calcium ion binding protein associated with phrenic neurons. Overall, these results indicate that selective breeding for high voluntary exercise has resulted in changes in allele frequencies for multiple genes associated with both motivation and ability for endurance exercise, providing candidate genes that may explain phenotypic changes observed in previous studies.


Subject(s)
Directed Molecular Evolution , Polymorphism, Single Nucleotide , Running , Selection, Genetic , Animals , Cadherins/genetics , Chromosomes/genetics , Eye Proteins/genetics , Female , Hybridization, Genetic , Male , Membrane Transport Proteins/genetics , Mice , Mice, Inbred ICR , Multifactorial Inheritance , Receptors, LDL/genetics
15.
J Clin Invest ; 130(12): 6278-6289, 2020 12 01.
Article in English | MEDLINE | ID: mdl-32817589

ABSTRACT

BACKGROUND: Current methods for the detection and surveillance of bladder cancer (BCa) are often invasive and/or possess suboptimal sensitivity and specificity, especially in early-stage, minimal, and residual tumors. METHODS: We developed an efficient method, termed utMeMA, for the detection of urine tumor DNA methylation at multiple genomic regions by MassARRAY. We identified the BCa-specific methylation markers by combined analyses of cohorts from Sun Yat-sen Memorial Hospital (SYSMH), The Cancer Genome Atlas (TCGA), and the Gene Expression Omnibus (GEO) database. The BCa diagnostic model was built in a retrospective cohort (n = 313) and validated in a multicenter, prospective cohort (n = 175). The performance of this diagnostic assay was analyzed and compared with urine cytology and FISH. RESULTS: We first discovered 26 significant methylation markers of BCa in combined analyses. We built and validated a 2-marker-based diagnostic model that discriminated among patients with BCa with high accuracy (86.7%), sensitivity (90.0%), and specificity (83.1%). Furthermore, the utMeMA-based assay achieved a great improvement in sensitivity over urine cytology and FISH, especially in the detection of early-stage (stage Ta and low-grade tumor, 64.5% vs. 11.8%, 15.8%), minimal (81.0% vs. 14.8%, 37.9%), residual (93.3% vs. 27.3%, 64.3%), and recurrent (89.5% vs. 31.4%, 52.8%) tumors. The urine diagnostic score from this assay was better associated with tumor malignancy and burden. CONCLUSION: Urine tumor DNA methylation assessment for early diagnosis, minimal, residual tumor detection and surveillance in BCa is a rapid, high-throughput, noninvasive, and promising approach, which may reduce the burden of cystoscopy and blind second surgery. FUNDING: This study was supported by the National Key Research and Development Program of China and the National Natural Science Foundation of China.


Subject(s)
Biomarkers, Tumor/urine , DNA Methylation , DNA, Neoplasm/urine , Early Detection of Cancer , Urinary Bladder Neoplasms/diagnosis , Urinary Bladder Neoplasms/urine , Aged , Biomarkers, Tumor/genetics , DNA, Neoplasm/genetics , Female , Humans , Male , Middle Aged , Retrospective Studies , Urinary Bladder Neoplasms/genetics
16.
Bioinformatics ; 36(19): 4833-4837, 2020 12 08.
Article in English | MEDLINE | ID: mdl-32614415

ABSTRACT

SUMMARY: We have developed a rapid mixed model algorithm for exhaustive genome-wide epistatic association analysis by controlling multiple polygenic effects. Our model can simultaneously handle additive by additive epistasis, dominance by dominance epistasis and additive by dominance epistasis, and account for intrasubject fluctuations due to individuals with repeated records. Furthermore, we suggest a simple but efficient approximate algorithm, which allows the examination of all pairwise interactions with a running time that scales linearly with population size. Simulation studies are performed to investigate the properties of REMMAX. Application to publicly available yeast and human data has shown that our mixed model-based method has computational efficiency similar to that of a simple linear model. It took less than 40 h for the pairwise analysis of 5000 individuals genotyped with roughly 350 000 SNPs with five threads on an Intel Xeon E5 2.6 GHz CPU. AVAILABILITY AND IMPLEMENTATION: Source codes are freely available at https://github.com/chaoning/GMAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Epistasis, Genetic , Multifactorial Inheritance , Algorithms , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Software
17.
Bioinformatics ; 36(14): 4154-4162, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32379866

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) are still the primary steps toward gene discovery. The urgency is more obvious in the big data era when GWAS are conducted simultaneously for thousands of traits, e.g. transcriptomic and metabolomic traits. Efficient mixed model association (EMMA) and genome-wide efficient mixed model association (GEMMA) are the widely used methods for GWAS. An algorithm with high computational efficiency is badly needed. It is interesting to note that the test statistics of the ordinary ridge regression (ORR) have the same patterns across the genome as those obtained from the EMMA method. However, ORR has never been used for GWAS due to its severe shrinkage on the estimated effects and the test statistics. RESULTS: We introduce a degree of freedom for each marker effect obtained from ORR and use it to deshrink both the estimated effect and the standard error so that the Wald test of ORR is brought back to the same level as that of EMMA. The new method is called deshrinking ridge regression (DRR). By evaluating the methods under three different model sizes (small, medium and large), we demonstrate that DRR is more generalized for all model sizes than EMMA, which only works for medium and large models. Furthermore, DRR detects all markers in a simultaneous manner instead of scanning one marker at a time. As a result, the computational time complexity of DRR is much lower than that of EMMA and about m (number of genetic variants) times lower than that of GEMMA when the sample size is much smaller than the number of markers. CONTACT: shizhong.xu@ucr.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Algorithms , Phenotype , Sample Size
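
The shrinkage that the abstract above attributes to ordinary ridge regression can be illustrated with a minimal Python sketch. It fits all markers at once and shows that a larger penalty pulls the estimated effects toward zero. The function name `ridge_effects`, the penalty values and the simulated data are assumptions made for illustration; the deshrinking step of DRR itself is not reproduced here, because the abstract does not give its formulas.

```python
import numpy as np

def ridge_effects(Z, y, lam):
    """Ordinary ridge estimates for all markers at once.

    Uses the n x n form beta = Z' (Z Z' + lam I)^-1 y, which is
    algebraically equal to (Z'Z + lam I)^-1 Z'y but cheaper when the
    sample size n is much smaller than the number of markers m.
    """
    n = Z.shape[0]
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(n), y)
    return Z.T @ alpha

# Simulated data (hypothetical): 50 individuals, 500 markers, 2 causal markers.
rng = np.random.default_rng(1)
n, m = 50, 500
Z = rng.standard_normal((n, m))
true_effects = np.zeros(m)
true_effects[[10, 200]] = [2.0, -1.5]
y = Z @ true_effects + rng.standard_normal(n)
y = y - y.mean()  # remove the intercept

beta_light = ridge_effects(Z, y, lam=1.0)     # weak penalty
beta_heavy = ridge_effects(Z, y, lam=1000.0)  # strong penalty, stronger shrinkage
```

Comparing `np.linalg.norm(beta_heavy)` with `np.linalg.norm(beta_light)` shows the overall shrinkage of the estimated effects as the penalty grows.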
18.
NAR Genom Bioinform ; 2(1): lqz009, 2020 Mar.
Article in English | MEDLINE | ID: mdl-33575561

ABSTRACT

Genome-wide association study (GWAS) is a powerful approach that has revolutionized the field of quantitative genetics. Two-dimensional GWAS that accounts for epistatic genetic effects needs to consider the effects of marker pairs, and thus a quadratic number of genetic variants, compared to one-dimensional GWAS that accounts for individual genetic variants. Calculating genome-wide kinship matrices in GWAS that account for relationships among individuals represented by ultra-high dimensional genetic variants is computationally challenging. Fortunately, kinship matrix calculation involves pure matrix operations and the algorithms can be parallelized, particularly on graphics processing unit (GPU)-empowered high-performance computing (HPC) architectures. We have devised a new method and two pipelines, KMC1D and KMC2D, for kinship matrix calculation with high-dimensional genetic variants, facilitating 1D and 2D GWAS analyses, respectively. We first divide the ultra-high-dimensional markers and marker pairs into successive blocks. We then calculate the kinship matrix for each block and merge together the block-wise kinship matrices to form the genome-wide kinship matrix. All the matrix operations have been parallelized using GPU kernels on our NVIDIA GPU-accelerated server platform. The performance analyses show that the calculation speed of KMC1D and KMC2D can be accelerated by 100-400 times over the conventional CPU-based computing.
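
The divide-and-merge idea described in the abstract above can be sketched in a few lines of Python. This is a CPU-only NumPy illustration, not the KMC1D or KMC2D pipeline: the function name `blockwise_kinship`, the block size, and the normalization K = ZZ'/m on centered genotypes are assumptions chosen for the example.

```python
import numpy as np

def blockwise_kinship(Z, block_size=1000):
    """Accumulate K = Z Z' / m one marker block at a time.

    Z is an n x m matrix of centered marker codes (individuals in rows).
    Only one n x block_size slice is multiplied per step, so the whole
    marker matrix never has to be processed in a single product.
    """
    n, m = Z.shape
    K = np.zeros((n, n))
    for start in range(0, m, block_size):
        block = Z[:, start:start + block_size]
        K += block @ block.T  # block-wise kinship, merged by summation
    return K / m

# Simulated genotypes (hypothetical): 20 individuals, 5000 markers coded 0/1/2.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(20, 5000)).astype(float)
Z = genotypes - genotypes.mean(axis=0)  # center each marker

K_block = blockwise_kinship(Z, block_size=512)
K_full = Z @ Z.T / Z.shape[1]  # direct computation, for comparison
```

Because the genome-wide product is a sum over markers, the merged block-wise result matches the direct computation up to floating-point rounding; each block product is independent of the others, which is what makes the scheme straightforward to parallelize.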

19.
Heredity (Edinb) ; 124(2): 288-298, 2020 02.
Article in English | MEDLINE | ID: mdl-31641238

ABSTRACT

Linear mixed models (LMM) that test trait association one marker at a time have been the most popular methods for genome-wide association studies. However, this approach has potential pitfalls: overconservativeness after Bonferroni correction, neglect of linkage disequilibrium (LD) between neighboring markers, and power reduction due to overfitting SNP effects. Multiple locus models that can simultaneously estimate and test all markers in the genome are therefore more appropriate. Based on the multiple locus models, we proposed a bin model that combines markers into bins based on their LD relationships. A bin is treated as a new synthetic marker and we detect the associations between bins and traits. Since the number of bins can be substantially smaller than the number of markers, a penalized multiple regression method can be adopted by fitting all bins to a single model. We developed an innovative method to bin the neighboring markers and used the least absolute shrinkage and selection operator (LASSO) method. We compared BIN-Lasso with SNP-Lasso and Q + K-LMM in a simulation experiment, and showed that the new method is more powerful with less Type I error than the other two methods. We also applied the bin model to a Chinese Simmental beef cattle population for a bone weight association study. The new method identified more significant associations than the classical LMM. The bin model is a new dimension reduction technique that takes advantage of biological information (i.e., LD). The new method will be a significant breakthrough in associative genomics in the big data era.


Subject(s)
Cattle/genetics , Genetic Association Studies/veterinary , Genomics/methods , Models, Genetic , Animals , Computer Simulation , Genotype , Linear Models , Linkage Disequilibrium , Polymorphism, Single Nucleotide
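
The binning step in the abstract above, in which neighboring markers in LD are merged into one synthetic marker, can be sketched as follows. The abstract does not specify the binning rule, so everything here is an assumption made for illustration: the greedy rule (compare each marker with the first marker of the current bin), the r² threshold of 0.5, the use of the bin mean as the synthetic marker, and the names `bin_by_ld` and `bin_genotypes`.

```python
import numpy as np

def bin_by_ld(X, r2_threshold=0.5):
    """Greedily group adjacent markers into bins.

    A marker joins the current bin when its squared correlation with the
    first marker of that bin reaches the threshold; otherwise it starts
    a new bin. Returns a list of lists of column indices.
    """
    bins = [[0]]
    for j in range(1, X.shape[1]):
        first = X[:, bins[-1][0]]
        r = np.corrcoef(first, X[:, j])[0, 1]
        if r * r >= r2_threshold:
            bins[-1].append(j)
        else:
            bins.append([j])
    return bins

def bin_genotypes(X, bins):
    """Build one synthetic marker per bin as the mean of its member markers."""
    return np.column_stack([X[:, idx].mean(axis=1) for idx in bins])

# Toy genotypes (hypothetical): markers 0-2 are identical, as are markers 3-4,
# and the two groups are uncorrelated with each other.
a = np.array([0, 1, 2, 0, 1, 2, 0, 1], dtype=float)
b = np.array([0, 0, 0, 1, 1, 2, 2, 2], dtype=float)
X = np.column_stack([a, a, a, b, b])

bins = bin_by_ld(X)
X_binned = bin_genotypes(X, bins)
```

In this toy case five markers collapse into two bins. The reduced matrix `X_binned` is what would then be passed to a penalized regression such as LASSO, which is not shown here.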
20.
Genomics ; 112(1): 225-236, 2020 01.
Article in English | MEDLINE | ID: mdl-30826444

ABSTRACT

Accurately predicting the phenotypes of complex traits is crucial to enhanced breeding in plants and livestock, and to enhanced medicine in humans. Here we report the first study accurately predicting complex traits using their contributing genes, especially their number of favorable alleles (NFAs), genotypes and transcript expressions, with the grain yield of maize, Zea mays L. When the NFAs or genotypes of only 27 SNP/InDel-containing grain yield genes were used, a prediction accuracy of r = 0.52 or 0.49 was obtained. When the expressions of grain yield gene transcripts were used, a plateaued prediction accuracy of r = 0.84 was achieved. When the phenotypes predicted with two or three of the genic datasets were used for progeny selection, the selected lines were completely consistent with those selected by phenotypic selection. Therefore, the genes controlling complex traits enable accurate prediction of their phenotypes and are thus desirable for gene-based breeding in crop plants.


Subject(s)
Edible Grain/genetics , Genes, Plant , Plant Breeding/methods , Zea mays/genetics , Alleles , Gene Expression , Genotype , Multifactorial Inheritance , Phenotype