ABSTRACT
Multilocus genome-wide association study has become the state-of-the-art tool for dissecting the genetic architecture of complex and multiomic traits. However, most existing multilocus methods require relatively long computational time when analyzing large datasets. To address this issue, in this study, we proposed a fast mrMLM method, namely, best linear unbiased prediction multilocus random-SNP-effect mixed linear model (BLUPmrMLM). First, genome-wide single-marker scanning in mrMLM was replaced by vectorized Wald tests based on the best linear unbiased prediction (BLUP) values of marker effects and their variances in BLUPmrMLM. Then, adaptive best subset selection (ABESS) was used to identify potentially associated markers on each chromosome to reduce computational time when estimating marker effects via empirical Bayes. Finally, shared memory and parallel computing schemes were used to reduce the computational time. In simulation studies, BLUPmrMLM outperformed GEMMA, EMMAX, mrMLM, and FarmCPU as well as the control method (BLUPmrMLM with ABESS removed), in terms of computational time, power, accuracy for estimating quantitative trait nucleotide positions and effects, false positive rate, false discovery rate, false negative rate, and F1 score. In the reanalysis of two large rice datasets, BLUPmrMLM significantly reduced the computational time and identified more previously reported genes, compared with the aforementioned methods. This study provides an excellent multilocus model method for the analysis of large-scale and multiomic datasets. The software mrMLM v5.1 is available at BioCode (https://ngdc.cncb.ac.cn/biocode/tool/BT007388) or GitHub (https://github.com/YuanmingZhang65/mrMLM).
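The Wald-test scan mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the BLUP effect estimates and their variances are already available as plain lists and uses the textbook one-degree-of-freedom Wald statistic W = b^2 / Var(b). It is not taken from the mrMLM v5.1 code, and the function name `wald_pvalues` is hypothetical.

```python
import math

def wald_pvalues(effects, variances):
    """Return (W, p) for each marker, with W = b^2 / Var(b).

    Under the null hypothesis W is treated as chi-square with 1 degree of
    freedom, whose upper tail probability equals erfc(sqrt(W / 2)).
    """
    results = []
    for b, v in zip(effects, variances):
        if v <= 0:
            raise ValueError("variance must be positive")
        w = b * b / v
        results.append((w, math.erfc(math.sqrt(w / 2.0))))
    return results

# Hypothetical BLUP effects and variances for three markers.
stats = wald_pvalues([2.0, 0.1, -1.5], [1.0, 0.5, 0.25])
for w, p in stats:
    print(f"W = {w:.3f}, p = {p:.4g}")
```

A loop over lists is used here only to keep the sketch dependency-free; a genome-wide scan would more plausibly apply the same formula to whole arrays at once.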
Subject(s)
Algorithms , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Polymorphism, Single Nucleotide/genetics , Oryza/genetics , Quantitative Trait Loci/genetics , Models, Genetic
ABSTRACT
Interspecific genomic introgression is an important evolutionary process with respect to the generation of novel phenotypic diversity and adaptation. A key question is how gene flow perturbs gene expression networks and regulatory interactions. Here, an introgression population of two species of allopolyploid cotton (Gossypium) is used to delineate the regulatory perturbations of gene expression during fiber development that accompany changes in fiber quality. De novo assembly of the recipient parent (G. hirsutum Emian22) genome allowed the identification of genomic variation and introgression segments (ISs) in 323 introgression lines (ILs) from the donor parent (G. barbadense 3-79). Gene expression dynamics were documented by sequencing 1,284 transcriptomes of developing fibers, and genetic regulatory perturbations mediated by genomic introgression were characterized using a multi-locus model. Introgression of individual homoeologous genes exhibiting extremely low or high expression bias can lead to a parallel expression bias in their non-introgressed duplicates, implying a shared yet divergent regulatory fate of duplicated genes following allopolyploidy. Additionally, the IL N182 with improved fiber quality is characterized, and the candidate gene GhFLAP1 related to fiber length is validated. This study outlines a framework for understanding introgression-mediated regulatory perturbations in polyploids and provides insights for targeted breeding of superior upland cotton fiber.
Subject(s)
Cotton Fiber , Gene Expression Regulation, Plant , Gossypium , Gossypium/genetics , Gene Expression Regulation, Plant/genetics , Genetic Introgression/genetics , Genome, Plant/genetics , Tetraploidy , Polyploidy , Transcriptome/genetics
ABSTRACT
Malus sieversii, commonly known as wild apple, is a Tertiary relict plant species and the progenitor of globally cultivated apple varieties. Unfortunately, wild apple populations are facing significant degradation in localized areas due to a myriad of factors. To gain a comprehensive understanding of the nutrient status and spatiotemporal variations of M. sieversii, green leaves were collected in May and July, and fallen leaves were collected in October. The concentrations of leaf nitrogen (N), phosphorus (P), and potassium (K) were measured, and the stoichiometric ratios as well as nutrient resorption efficiencies were calculated. The study also explored the relative contributions of soil, topographic, and biotic factors to the variation in nutrient traits. The results indicate that as the growing period progressed, the concentrations of N and P in the leaves significantly decreased (P < 0.05), and the concentration of K in October was significantly lower than in May and July. Throughout plant growth, leaf N-P and N-K exhibited hyperallometric relationships, while P-K showed an isometric relationship. Resorption efficiency followed the order of N < P < K (P < 0.05), with all three ratios being less than 1; this indicates that the order of nutrient limitation is K > P > N. The resorption efficiencies were mainly regulated by nutrient concentrations in fallen leaves. A robust spatial dependence was observed in leaf nutrient concentrations during all periods (70.1-97.9% for structural variation), highlighting that structural variation, rather than random factors, dominated the spatial variation. Nutrient resorption efficiencies (NRE, PRE, and KRE) displayed moderate structural variation (30.2-66.8%). The spatial patterns of nutrient traits varied across growth periods, indicating that they are influenced by multiple factors, among which soil properties had the greatest influence.
In conclusion, wild apples manifested differentiated spatiotemporal variability and influencing factors across various leaf nutrient traits. These results provide crucial insights into the spatiotemporal patterns and influencing factors of leaf nutrient traits of M. sieversii at the permanent plot scale for the first time. This work is of great significance for the ecosystem restoration and sustainable management of degrading wild fruit forests.
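The resorption efficiencies (NRE, PRE, and KRE) reported in this abstract are not defined there. The sketch below assumes the commonly used definition, one minus the ratio of the nutrient concentration in fallen leaves to that in green leaves, without any correction for leaf mass loss. The function name and the concentrations are hypothetical and are not data from the study.

```python
def resorption_efficiency(green, senesced):
    """Fraction of a nutrient withdrawn before leaf fall: 1 - senesced / green."""
    if green <= 0:
        raise ValueError("green-leaf concentration must be positive")
    return 1.0 - senesced / green

# Hypothetical concentrations in mg/g: (green leaf, fallen leaf).
leaves = {"N": (20.0, 12.0), "P": (2.0, 1.0), "K": (10.0, 3.0)}
efficiency = {
    nutrient: resorption_efficiency(green, fallen)
    for nutrient, (green, fallen) in leaves.items()
}
for nutrient, value in efficiency.items():
    print(f"{nutrient}RE = {value:.2f}")
```

Expressed this way, each efficiency is a fraction below 1 whenever the fallen leaves still contain some of the nutrient, which is consistent with the abstract's description of all three ratios being less than 1.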
Subject(s)
Malus , Nitrogen , Phosphorus , Plant Leaves , Potassium , Plant Leaves/metabolism , Malus/metabolism , Malus/growth & development , Malus/physiology , China , Phosphorus/metabolism , Phosphorus/analysis , Nitrogen/metabolism , Potassium/metabolism , Potassium/analysis , Forests , Nutrients/metabolism , Nutrients/analysis , Soil/chemistry , Fruit/growth & development , Fruit/metabolism , Spatio-Temporal Analysis
ABSTRACT
Large sample datasets have been regarded as the primary basis for innovative discoveries and the solution to missing heritability in genome-wide association studies. However, owing to computational complexity, analyses of such datasets cannot consider all effects and all polygenic backgrounds, which reduces the effectiveness of large datasets. To address these challenges, we included all effects and polygenic backgrounds in a mixed logistic model for binary traits and compressed four variance components into two. The compressed model was combined with three computational algorithms to develop an innovative method, called FastBiCmrMLM, for large data analysis. These algorithms were tailored to sample size, computational speed, and reduced memory requirements. To mine additional genes, linkage disequilibrium markers were replaced by bin-based haplotypes, which were then analyzed by FastBiCmrMLM; this approach is named FastBiCmrMLM-Hap. Simulation studies highlighted the superiority of FastBiCmrMLM over GMMAT, SAIGE and fastGWA-GLMM in identifying dominant, small α (allele substitution effect), and rare variants. In the UK Biobank-scale dataset, we demonstrated that FastBiCmrMLM could detect variants as small as 0.03% and with α ≈ 0. In re-analyses of seven diseases in the WTCCC datasets, 29 candidate genes with both functional and TWAS evidence, located around 36 variants identified only by the new methods, strongly validated the new methods. These methods offer a new way to decipher the genetic architecture of binary traits and address the challenges outlined above.
Subject(s)
Algorithms , Genome-Wide Association Study , Genome-Wide Association Study/methods , Humans , Logistic Models , Case-Control Studies , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genomics/methods , Computer Simulation , Haplotypes , Models, Genetic
ABSTRACT
BACKGROUND: Salt stress significantly reduces soybean yield. To improve salt tolerance in soybean, it is important to mine the genes associated with salt tolerance traits. RESULTS: Salt tolerance traits of 286 soybean accessions were measured four times between 2009 and 2015. The results were associated with 740,754 single nucleotide polymorphisms (SNPs) to identify quantitative trait nucleotides (QTNs) and QTN-by-environment interactions (QEIs) using the three-variance-component multi-locus random-SNP-effect mixed linear model (3VmrMLM). As a result, eight salt tolerance genes (GmCHX1, GsPRX9, Gm5PTase8, GmWRKY, GmCHX20a, GmNHX1, GmSK1, and GmLEA2-1) near 179 significant and 79 suggested QTNs and two salt tolerance genes (GmWRKY49 and GmSK1) near 45 significant and 14 suggested QEIs had been associated with salt tolerance index traits in previous studies. Six candidate genes and three gene-by-environment interactions (GEIs) were predicted to be associated with these index traits. In the analysis of four salt tolerance-related traits under control and salt treatments, six genes associated with salt tolerance (GmHDA13, GmPHO1, GmERF5, GmNAC06, GmbZIP132, and GmHsp90s) around 166 QEIs had been verified in previous studies. Five candidate GEIs were confirmed to be associated with salt stress by at least one haplotype analysis. The elite molecular modules of seven candidate genes with selection signs were extracted from wild soybean, and these genes could be applied to soybean molecular breeding. Two of these genes, Glyma06g04840 and Glyma07g18150, were confirmed by qRT-PCR and are expected to be key players in responding to salt stress. CONCLUSIONS: Around the QTNs and QEIs identified in this study, 16 known genes, 6 candidate genes, and 8 candidate GEIs were found to be associated with soybean salt tolerance, of which Glyma07g18150 was further confirmed by qRT-PCR.
Subject(s)
Gene-Environment Interaction , Genes, Plant , Glycine max , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Salt Tolerance , Glycine max/genetics , Glycine max/physiology , Salt Tolerance/genetics , Quantitative Trait Loci/genetics , Phenotype
ABSTRACT
International interest is growing in biodiversity conservation and sustainable use in drylands. Desert ecosystems across arid Central Asia are severely affected by global change. Understanding the changes in a plant community is an essential prerequisite to revealing the community assembly mechanism and to vegetation conservation and management. Knowledge of large-scale spatial variation in plant community structure in different Central Asian deserts is still limited. In this study, we selected the Taukum (TD, Kazakhstan) and the Gurbantunggut (GD, China) deserts as the study areas; they lie at similar latitudes despite being nearly 1000 km apart. Thirteen and 15 sampling plots were set up and thoroughly investigated. The differences in community structure depending on multiple plant attributes (individual level: plant height, canopy diameter, and plant volume; community level: plant density, total cover, and total volume) were systematically studied. TD had a better overall environmental status than GD. A total of 113 species were found, with 68 and 74 in TD and GD, respectively. The number of species and plant attributes was unequally distributed across different families and functional groups between deserts. The values of several plant attributes, such as ephemerals, annuals, dicotyledons, and shrubs with assimilative branches in GD, were significantly lower than those in TD. The Motyka indices of six plant attributes (26.18-38.61%) were higher between the two deserts than the species similarity index (20.4%), indicating a more robust convergence for plant functional attributes. The community structures in the two deserts represented by different plant attribute matrices demonstrated irregular differentiation patterns in ordination diagrams. Most of the variance in community structure was attributed to soil and climatic factors, while geographic factors accounted for the smallest proportion.
Consequently, the community structures of the two distant deserts were both different and similar to an extent. This resulted from the long-term impacts of heterogeneous environments within the same region. Our knowledge is further deepened by understanding the variation in community structure in different deserts on a large spatial scale. This therefore provides valuable insights into conserving regional biodiversity in Central Asia.
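The Motyka index quoted in this abstract is not defined there. The sketch below assumes the form commonly cited for Motyka's quantitative similarity: twice the sum of the smaller of the two values for each species, divided by the sum of all values in both communities, times 100. The function name, species labels, and cover values are hypothetical.

```python
def motyka_index(a, b):
    """Motyka similarity (%) between two communities.

    a and b map species names to a quantitative attribute (e.g. cover).
    Index = 100 * 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b)).
    """
    species = set(a) | set(b)
    shared = sum(min(a.get(s, 0.0), b.get(s, 0.0)) for s in species)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both communities are empty")
    return 100.0 * 2.0 * shared / total

# Hypothetical total cover per species in two deserts.
desert_1 = {"species_A": 10.0, "species_B": 5.0}
desert_2 = {"species_B": 3.0, "species_C": 12.0}
similarity = motyka_index(desert_1, desert_2)
print(f"Motyka index = {similarity:.1f}%")
```

Because the index weights each species by the attribute value rather than by presence alone, it can differ from a presence/absence species similarity index computed on the same plots, which is the contrast the abstract draws.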
ABSTRACT
BACKGROUND: Wild apple (Malus sieversii) is under second-class national protection in China and one of the lineal ancestors of cultivated apples worldwide. In recent decades, the natural habitat area of wild apple trees has declined seriously, resulting in a lack of saplings and difficulty in population regeneration. Artificial near-natural breeding is crucial for protecting and restoring wild apple populations, and adding nitrogen (N) and phosphorus (P) is one of the important measures to improve the growth performance of saplings. In this study, field experiments using N (CK, N1, N2, and N3: 0, 10, 20, and 40 g m⁻² yr⁻¹, respectively), P (CK, P1, P2, and P3: 0, 2, 4, and 8 g m⁻² yr⁻¹, respectively), N20Px (CK, N2P1, N2P2, and N2P3: N20P2, N20P4 and N20P8 g m⁻² yr⁻¹, respectively), and NxP4 (CK, N1P2, N2P2, and N3P2: N10P4, N20P4, and N40P4 g m⁻² yr⁻¹, respectively) treatments (totaling 12 levels, including one CK) were conducted over four consecutive years. The twig traits (including four current-year stem, 10 leaf, and three ratio traits) and comprehensive growth performance of wild apple saplings were analyzed under different nutrient treatments. RESULTS: N addition had a significantly positive effect on stem length, basal diameter, leaf area, and leaf dry mass, whereas P addition had a significantly positive effect on stem length and basal diameter only. The combination of N and P (NxP4 and N20Px) treatments evidently promoted stem growth at moderate concentrations; however, the N20Px treatment showed a markedly negative effect at low concentrations and a positive effect at moderate and high concentrations. The ratio traits (leaf intensity, leaf area ratio, and leaf to stem mass ratio) decreased with the increase in nutrient concentration under each treatment.
In the plant trait network, basal diameter, stem mass, and twig mass were tightly connected to other traits after nutrient treatments, indicating that stem traits play an important role in twig growth. The membership function revealed that the greatest comprehensive growth performance of saplings was achieved after N addition alone, followed by that under the NxP4 treatment (except for N40P4). CONCLUSIONS: Consequently, artificial nutrient treatments for four years significantly but differentially altered the growth status of wild apple saplings, and the use of appropriate N fertilizer promoted sapling growth. These results can provide scientific basis for the conservation and management of wild apple populations.
Subject(s)
Malus , Malus/genetics , Plant Breeding , Nitrogen , Plant Leaves , Phenotype
ABSTRACT
Although multi-parent populations (MPPs) integrate the advantages of linkage and association mapping populations in the genetic dissection of complex traits and especially combine genetic analysis with crop breeding, it is difficult to detect small-effect quantitative trait loci (QTL) for complex traits in multiparent advanced generation intercross (MAGIC), nested association mapping (NAM), and random-open-parent association mapping (ROAM) populations. To address this issue, here we proposed a multi-locus linear mixed model method, namely mppQTL, to detect QTLs, especially small-effect QTLs, in these MPPs. The new method includes two steps. The first is genome-wide scanning based on a single-locus linear mixed model; the P-values are obtained from likelihood-ratio test, the peaks of negative logarithm P-value curve are selected by group-lasso, and all the selected peaks are regarded as potential QTLs. In the second step, all the potential QTLs are placed on a multi-locus linear mixed model, all the effects are estimated using expectation-maximization empirical Bayes algorithm, and all the non-zero effect vectors are further evaluated via likelihood-ratio test for significant QTLs. In Monte Carlo simulation studies, the new method has higher power in QTL detection, lower false positive rate, lower mean absolute deviation for QTL position estimate, and lower mean squared error for the estimate of QTL size (r2) than existing methods because the new method increases the power of detecting small-effect QTLs. In real dataset analysis, the new method (19) identified five more known genes than the existing three methods (14). This study provides an effective method for detecting small-effect QTLs in any MPPs.
ABSTRACT
Background: Considerable attention has been given to how different aspects of biodiversity sustain ecosystem functions. Herbs are a critical component of the plant community of dryland ecosystems, but the importance of different life form groups of herbs is often overlooked in experiments on biodiversity-ecosystem multifunctionality. Hence, little is known about how the multiple attributes of diversity of different life form groups of herbs affect changes to the multifunctionality of ecosystems. Methods: We investigated geographic patterns of herb diversity and ecosystem multifunctionality along a precipitation gradient of 2100 km in Northwest China, and assessed the taxonomic, phylogenetic and functional attributes of different life form groups of herbs on the multifunctionality. Results: We found that subordinate (richness effect) species of annual herbs and dominant (mass ratio effect) species of perennial herbs were crucial for driving multifunctionality. Most importantly, the multiple attributes (taxonomic, phylogenetic and functional) of herb diversity enhanced the multifunctionality. The functional diversity of herbs provided greater explanatory power than did taxonomic and phylogenetic diversity. In addition, the multiple attribute diversity of perennial herbs contributed more than annual herbs to multifunctionality. Conclusions: Our findings provide insights into previously neglected mechanisms by which the diversity of different life form groups of herbs affect ecosystem multifunctionality. These results provide a comprehensive understanding of the relationship between biodiversity and multifunctionality, and will ultimately contribute to multifunctional conservation and restoration programs in dryland ecosystems.
ABSTRACT
Although grain size is an important quantitative trait affecting rice yield and quality, there are few studies on gene-by-environment interactions (GEIs) in genome-wide association studies, especially, in main crop (MC) and ratoon rice (RR). To address these issues, the phenotypes for grain width (GW), grain length (GL), and thousand grain weight (TGW) of 159 accessions of MC and RR in two environments were used to associate with 2,017,495 SNPs for detecting quantitative trait nucleotides (QTNs) and QTN-by-environment interactions (QEIs) using 3VmrMLM. As a result, 64, 71, 67, 72, 63, and 56 QTNs, and 0, 1, 2, 2, 2, and 1 QEIs were found to be significantly associated with GW in MC (GW-MC), GL-MC, TGW-MC, GW-RR, GL-RR, and TGW-RR, respectively. 3, 4, 7, 2, 2, and 4 genes were found to be truly associated with the above traits, respectively, while 2 genes around the above QEIs were found to be truly associated with GL-RR, and one of the two known genes was differentially expressed under two soil moisture conditions. 10, 7, 1, 8, 4, and 3 candidate genes were found by differential expression and GO annotation analysis to be around the QTNs for the above traits, respectively, in which 6, 3, 1, 2, 0, and 2 candidate genes were found to be significant in haplotype analysis. The gene Os03g0737000 around one QEI for GL-MC was annotated as salt stress related gene and found to be differentially expressed in two cultivars with different grain sizes. Among all the candidate genes around the QTNs in this study, four were key, in which two were reported to be truly associated with seed development, and two (Os02g0626100 for GL-MC and Os02g0538000 for GW-MC) were new. Moreover, 1, 2, and 1 known genes, along with 8 additional candidate genes and 2 candidate GEIs, were found to be around QTNs and QEIs for GW, GL, and TGW, respectively in MC and RR joint analysis, in which 3 additional candidate genes were key and new. 
Our results provided a solid foundation for genetic improvement and molecular breeding in MC and RR.
ABSTRACT
Shrubs play a major role in maintaining ecosystem stability in the arid deserts of Central Asia. During long-term adaptation to extremely arid environments, shrubs have developed special assimilative branches that replace leaves for photosynthesis. In this study, four dominant shrubs with assimilative branches, namely Haloxylon ammodendron, Haloxylon persicum, Calligonum mongolicum, and Ephedra przewalskii, were selected as the research objects. The dry mass, total length, node number, and basal diameter of their assimilative branches and the average length of the first three nodes were carefully measured, and the allometric relationships among the five traits of the four species were systematically compared. The results indicated that: (1) The four desert shrubs have different assimilative branch traits. Compared with H. persicum and H. ammodendron, C. mongolicum and E. przewalskii have longer internodes and fewer nodes. The dry mass of H. ammodendron and the basal diameter of H. persicum were the smallest; (2) Significant allometric scaling relationships were found between dry mass, total length, basal diameter, and each trait of assimilative branches, with scaling exponents that were all significantly less than 1; (3) The scaling exponents of the allometric relationships between four traits and the dry mass of assimilative branches of H. persicum were greater or significantly greater than those of H. ammodendron. The scaling exponents of the relationships between the basal diameter, dry mass, and total length of E. przewalskii were higher than those of the other three shrubs. Therefore, although different species have adapted to drought and high temperatures by convergence, there was great variability in the morphological characteristics of assimilative branches, as well as in the scaling exponents of relationships among traits.
The results of this study will provide valuable insights into the ecological functions of assimilative branches and survival strategies of these shrubs to cope with aridity and drought in desert environments.
ABSTRACT
Introduction: Although seed oil content and its fatty acid compositions in soybean are affected by the environment, QTN-by-environment interactions (QEIs) and gene-by-environment interactions (GEIs) have rarely been reported in genome-wide association studies. Methods: The 3VmrMLM method was used to associate the trait phenotypes, measured in five to seven environments, of 286 soybean accessions with 106,013 SNPs for detecting QTNs and QEIs. Results: Seven oil metabolism genes (GmSACPD-A, GmSACPD-B, GmbZIP123, GmSWEET39, GmFATB1A, GmDGAT2D, and GmDGAT1B) around 598 QTNs and one oil metabolism gene GmFATB2B around 54 QEIs were verified in previous studies; 76 candidate genes and 66 candidate GEIs were predicted to be associated with these traits, in which 5 genes around QEIs were verified in other species to participate in oil metabolism, and had differential expression across environments. These genes were found to be related to soybean seed oil content in haplotype analysis. In addition, most candidate GEIs were co-expressed with drought response genes in the co-expression network, and three KEGG pathways that respond to drought were enriched under drought stress rather than under the control condition; six candidate genes were hub genes in the co-expression networks under drought stress. Discussion: The above results indicated that GEIs, together with drought response genes in the co-expression network, may respond to drought, and play important roles in regulating seed oil-related traits together with oil metabolism genes. These results provide important information for the genetic basis, molecular mechanisms, and soybean breeding for seed oil-related traits.
ABSTRACT
BACKGROUND: Ferula L. is one of the largest and most taxonomically complicated genera as well as being an important medicinal plant resource in the family Apiaceae. To investigate the plastome features and phylogenetic relationships of Ferula and its neighboring genera Soranthus Ledeb., Schumannia Kuntze., and Talassia Korovin, we sequenced 14 complete plastomes of 12 species. RESULTS: The size of the 14 complete chloroplast genomes ranged from 165,607 to 167,013 base pairs (bp) encoding 132 distinct genes (87 protein-coding, 37 tRNA, and 8 rRNA genes), and showed a typical quadripartite structure with a pair of inverted repeats (IR) regions. Based on comparative analysis, we found that the 14 plastomes were similar in codon usage, repeat sequence, simple sequence repeats (SSRs), and IR borders, and had significant collinearity. Based on our phylogenetic analyses, Soranthus, Schumannia, and Talassia should be considered synonymous with Ferula. Six highly divergent regions (rps16/trnQ-UUG, trnS-UGA/psbZ, psbH/petB, ycf1/ndhF, rpl32, and ycf1) were also detected, which may represent potential molecular markers, and combined with selective pressure analysis, the weak positive selection gene ccsA may be a discriminating DNA barcode for Ferula species. CONCLUSION: Plastids contain abundant informative sites for resolving phylogenetic relationships. Combined with previous studies, we suggest that there is still much room for improvement in the classification of Ferula. Overall, our study provides new insights into the plastome evolution, phylogeny, and taxonomy of this genus.
Subject(s)
Ferula , Genome, Chloroplast , Ferula/genetics , Microsatellite Repeats , Phylogeny
ABSTRACT
BACKGROUND: The yield and quality of soybean oil are determined by seed oil-related traits, and metabolites/lipids act as bridges between genes and traits. Although there are many studies on the mode of inheritance of metabolites or traits, studies on multi-dimensional genetic network (MDGN) are limited. RESULTS: In this study, six seed oil-related traits, 59 metabolites, and 107 lipids in 398 recombinant inbred lines, along with their candidate genes and miRNAs, were used to construct an MDGN in soybean. Around 175 quantitative trait loci (QTLs), 36 QTL-by-environment interactions, and 302 metabolic QTL clusters, 70 and 181 candidate genes, including 46 and 70 known homologs, were previously reported to be associated with the traits and metabolites, respectively. Gene regulatory networks were constructed using co-expression, protein-protein interaction, and transcription factor binding site and miRNA target predictions between candidate genes and 26 key miRNAs. Using modern statistical methods, 463 metabolite-lipid, 62 trait-metabolite, and 89 trait-lipid associations were found to be significant. Integrating these associations into the above networks, an MDGN was constructed, and 128 sub-networks were extracted. Among these sub-networks, the gene-trait or gene-metabolite relationships in 38 sub-networks were in agreement with previous studies, e.g., oleic acid (trait)-GmSEI-GmDGAT1a-triacylglycerol (16:0/18:2/18:3), gene and metabolite in each of 64 sub-networks were predicted to be in the same pathway, e.g., oleic acid (trait)-GmPHS-D-glucose, and others were new, e.g., triacylglycerol (16:0/18:1/18:2)-GmbZIP123-GmHD-ZIPIII-10-miR166s-oil content. CONCLUSIONS: This study showed the advantages of MDGN in dissecting the genetic relationships between complex traits and metabolites.
Using sub-networks in MDGN, 3D genetic sub-networks including pyruvate/threonine/citric acid revealed genetic relationships between carbohydrates, oil, and protein content, and 4D genetic sub-networks including PLDs revealed the relationships between oil-related traits and phospholipid metabolism likely influenced by the environment. This study will be helpful in soybean quality improvement and molecular biological research.
ABSTRACT
To address domestication and improvement studies of soybean seed size- and oil-related traits, a series of domesticated and improved regions, loci, and candidate genes were identified in 286 soybean accessions using domestication and improvement analyses, genome-wide association studies, quantitative trait locus (QTL) mapping and bulked segregant analyses in this study. As a result, 534 candidate domestication regions (CDRs) and 458 candidate improvement regions (CIRs) were identified in this study and integrated with those in five and three previous studies, respectively, to obtain 952 CDRs and 538 CIRs; 1469 loci for soybean seed size- and oil-related traits were identified in this study and integrated with those in Soybase to obtain 433 QTL clusters. The two results were intersected to obtain 245 domestication and 221 improvement loci for the above traits. Around these trait-related domestication and improvement loci, 7 domestication and 7 improvement genes were found to be truly associated with these traits, and 372 candidate domestication and 87 candidate improvement genes were identified using gene expression, SNP variants in genome, miRNA binding, KEGG pathway, DNA methylation, and haplotype analysis. These genes were used to explain the trait changes in domestication and improvement. As a result, the trait changes can be explained by their frequencies of elite haplotypes, base mutations in coding region, and three factors affecting their expression levels. In addition, 56 domestication and 15 improvement genes may be valuable for future soybean breeding. This study can provide useful gene resources for future soybean breeding and molecular biology research.
ABSTRACT
Although methodologies and software packages for bulked segregant analysis (BSA) are well established, it is difficult to detect extremely over-dominant and small-effect genes for quantitative traits in F2 population. To address this issue, we proposed a combinatorial strategy to identify all types of quantitative trait loci (QTLs) using extreme phenotype individuals in F2. To popularize this strategy, we developed an R software package dQTG.seq v1.0.1. It has some features not found in other BSA software packages: 1) new (dQTG-seq1 and dQTG-seq2) and existing (G', deltaSNP, Euclidean distance (ED), and SmoothLOD) methods are available to identify all types of QTLs in bi-parental segregation populations, one data file with two BSA and three QTL-mapping data formats was inputted, and two *.csv files and one figure were outputted; 2) main smoothing methods (AIC, Window size, and Block) have been incorporated into each of the above-mentioned methods; 3) the threshold value of LOD score for significant QTLs is determined by permutation experiments. To save running time, vroom function was used to read the dataset, and parallel operation was used to estimate parameters. In real data analyses, users should select a suitable initial value of window size, depending on the species, and appropriate smoothing methods to obtain the best result. dQTG-seq2 detects more known loci and genes for rice grain number per panicle than composite interval mapping (CIM) and inclusive CIM, especially extremely over-dominant and small-effect genes. A handbook for our software package (https://cran.r-project.org/web/packages/dQTG.seq/index.html) has been provided in the supplemental materials for the users' convenience.
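Two of the existing BSA statistics named in this abstract, deltaSNP and Euclidean distance (ED), can be illustrated for a single biallelic marker. This is a minimal sketch using the usual textbook definitions computed from read counts in the two extreme bulks. It does not call or reproduce the dQTG.seq package, and all function names and read counts are hypothetical.

```python
import math

def snp_index(alt_reads, total_reads):
    """Fraction of reads carrying the alternative allele in one bulk."""
    if total_reads <= 0:
        raise ValueError("read depth must be positive")
    return alt_reads / total_reads

def delta_snp(high, low):
    """Difference in SNP-index between the high and low bulks."""
    return snp_index(*high) - snp_index(*low)

def euclidean_distance(high, low):
    """ED over the two allele frequencies of a biallelic marker."""
    fh, fl = snp_index(*high), snp_index(*low)
    return math.sqrt((fh - fl) ** 2 + ((1 - fh) - (1 - fl)) ** 2)

# Hypothetical (alt_reads, total_reads) for one marker in each bulk.
high_bulk, low_bulk = (45, 50), (10, 50)
d = delta_snp(high_bulk, low_bulk)
ed = euclidean_distance(high_bulk, low_bulk)
print(f"deltaSNP = {d:.2f}, ED = {ed:.4f}")
```

In practice such per-marker values are smoothed along the chromosome before peaks are called, which is where the window size and smoothing choices mentioned in the abstract come in.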