ABSTRACT
The BAHD acyltransferase family is one of the largest enzyme families in flowering plants, with dozens to hundreds of genes in individual genomes. Members of this family contribute to several pathways in primary and specialized metabolism. In this study, we performed a phylogenomic analysis of the family using 52 genomes across the plant kingdom to gain deeper insights into its functional evolution and to enable function prediction. We found that BAHD expansion in land plants was associated with significant changes in various gene features. Using pre-defined BAHD clades, we identified clade expansions in different plant groups. In some groups, these expansions coincided with the prominence of metabolite classes such as anthocyanins (flowering plants) and hydroxycinnamic acid amides (monocots). Clade-wise motif-enrichment analysis revealed that some clades have novel motifs fixed on either the acceptor or the donor side, potentially reflecting historical routes of functional evolution. Co-expression analysis in rice and Arabidopsis further identified BAHDs with similar expression patterns; however, most co-expressed BAHDs belonged to different clades. Comparing BAHD paralogs, we found that gene expression diverges rapidly after duplication, suggesting that sub/neo-functionalization of duplicate genes occurs quickly via expression diversification. Analyzing co-expression patterns in Arabidopsis in conjunction with orthology-based substrate class predictions and metabolic pathway models recovered the metabolic processes of most of the already-characterized BAHDs and yielded novel functional predictions for some uncharacterized BAHDs. Overall, this study provides new insights into the evolution of BAHD acyltransferases and lays a foundation for their functional characterization.
ABSTRACT
Nitrous oxide (N2O) is a potent greenhouse gas that is primarily emitted from agriculture. Sampling limitations have generally resulted in discontinuous N2O observations over the course of any given year. The status quo for interpolating between sampling points has been simple linear interpolation. This can be problematic for N2O because emissions are highly variable, and sampling bias around peak emission periods can have dramatic impacts on estimated cumulative emissions. Here, we outline five practices that have been used for gap-filling soil N2O emissions: linear interpolation, generalized additive models (GAMs), autoregressive integrated moving average (ARIMA), random forest (RF), and neural networks (NNs). To facilitate the use of improved gap-filling methods, we describe the five methods and then present the strengths and weaknesses of each so that model selection can be improved. We then outline a protocol that details data organization and selection, splitting of data into training and testing datasets, building and testing models, and reporting results. Use of advanced gap-filling methods within a standardized protocol is likely to increase transparency, improve emission estimates, reduce uncertainty, and increase capacity to quantify the impact of mitigation practices.
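The sensitivity of linear interpolation to sampling around emission peaks can be illustrated with a minimal sketch. The flux values, the two sampling schedules, and the function names gap_fill_linear and cumulative_emission below are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
import numpy as np


def gap_fill_linear(days, flux):
    """Fill NaN gaps in a flux series by linear interpolation between observations."""
    days = np.asarray(days, dtype=float)
    flux = np.asarray(flux, dtype=float)
    observed = ~np.isnan(flux)
    return np.interp(days, days[observed], flux[observed])


def cumulative_emission(days, flux):
    """Integrate flux over time with the trapezoidal rule."""
    days = np.asarray(days, dtype=float)
    flux = np.asarray(flux, dtype=float)
    return float(np.sum(0.5 * (flux[1:] + flux[:-1]) * np.diff(days)))


# Hypothetical daily fluxes (arbitrary units per day) with a one-day peak on day 3.
days = np.arange(7)
true_flux = np.array([2.0, 2.0, 2.0, 20.0, 2.0, 2.0, 2.0])

# Schedule A samples days 0, 2, 4, 6 and misses the peak.
# Schedule B samples days 0, 3, 6 and lands exactly on it.
sampled_a = np.where(np.isin(days, [0, 2, 4, 6]), true_flux, np.nan)
sampled_b = np.where(np.isin(days, [0, 3, 6]), true_flux, np.nan)

total_true = cumulative_emission(days, true_flux)
total_a = cumulative_emission(days, gap_fill_linear(days, sampled_a))
total_b = cumulative_emission(days, gap_fill_linear(days, sampled_b))

print(f"true: {total_true:.1f}  schedule A: {total_a:.1f}  schedule B: {total_b:.1f}")
```

In this toy series, missing the peak underestimates the cumulative total, while sampling only on the peak overestimates it, because linear interpolation spreads the peak value across the neighboring unsampled days.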