ABSTRACT
The best ideotypes are under mounting pressure due to increased aridity. Understanding the conserved molecular mechanisms that evolve in wild plants adapted to harsh environments is crucial in developing new strategies for agriculture. Yet our knowledge of such mechanisms in wild species is scant. We performed metabolic pathway reconstruction using transcriptome information from 32 Atacama and phylogenetically related species that do not live in Atacama (sister species). We analyzed reaction enrichment to understand the commonalities and differences of Atacama plants. To gain insights into the mechanisms that ensure survival, we compared expressed gene isoform numbers and gene expression patterns between the annotated biochemical reactions from 32 Atacama and sister species. We found biochemical convergences characterized by reactions enriched in at least 50% of the Atacama species, pointing to potential advantages against drought and nitrogen starvation, for instance. These findings suggest that the adaptation in the Atacama Desert may result in part from shared genetic legacies governing the expression of key metabolic pathways to face harsh conditions. Enriched reactions corresponded to ubiquitous compounds common to extreme and agronomic species and were congruent with our previous metabolomic analyses. Convergent adaptive traits offer promising candidates for improving abiotic stress resilience in crop species.
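The convergence criterion above (reactions enriched in at least 50% of the Atacama species) can be illustrated with a minimal sketch. The species names, reaction labels, and the function `convergent_reactions` are hypothetical placeholders rather than data or code from the study, and the sketch assumes per-species enrichment has already been computed.

```python
from collections import Counter


def convergent_reactions(enriched_by_species, min_fraction=0.5):
    """Return reactions enriched in at least `min_fraction` of the species."""
    n_species = len(enriched_by_species)
    # Count, for each reaction, how many species show it as enriched.
    counts = Counter(
        reaction
        for reactions in enriched_by_species.values()
        for reaction in set(reactions)
    )
    return sorted(r for r, c in counts.items() if c / n_species >= min_fraction)


# Hypothetical toy input: labels are placeholders, not results from the study.
enriched = {
    "species_A": {"rxn_1", "rxn_2"},
    "species_B": {"rxn_1", "rxn_3"},
    "species_C": {"rxn_1", "rxn_2"},
    "species_D": {"rxn_4"},
}

shared = convergent_reactions(enriched)
print(shared)  # ['rxn_1', 'rxn_2']
```

Here `rxn_1` is enriched in three of four species and `rxn_2` in two of four, so both meet the 50% threshold.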
Subject(s)
Desert Climate , Phylogeny , Transcriptome , Chile , Adaptation, Physiological , Metabolic Networks and Pathways
ABSTRACT
This study evaluates both a variety of existing base causal inference methods and a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across different datasets, so a method that works poorly on one dataset may work well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to as good or better results than using the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on training data. The resulting ensemble model EnsInfer easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.
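One way to picture a Naive Bayes ensemble over base inference methods is sketched here. This is not the EnsInfer implementation: it assumes each base method yields one confidence score per candidate regulatory edge, that scores are modeled as Gaussian within each class, and that gold-standard labels are available for training. All function names and the toy scores are hypothetical.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate class log-priors and per-feature Gaussian parameters."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small variance floor keeps the log-density finite for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def log_joint(model, x, label):
    """Log prior plus summed per-feature Gaussian log-likelihoods."""
    log_prior, means, variances = model[label]
    return log_prior + sum(
        -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        for v, m, var in zip(x, means, variances)
    )


def edge_probability(model, x):
    """Posterior probability that a candidate edge is a true edge (label 1)."""
    l1, l0 = log_joint(model, x, 1), log_joint(model, x, 0)
    top = max(l1, l0)  # subtract the maximum for numerical stability
    e1, e0 = math.exp(l1 - top), math.exp(l0 - top)
    return e1 / (e1 + e0)


# Hypothetical training data: each row holds the scores that three base
# methods assign to one candidate edge; label 1 means the edge is in the
# gold standard, label 0 means it is not.
X_train = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.7, 0.7, 0.9],
    [0.2, 0.1, 0.3],
    [0.1, 0.3, 0.2],
    [0.3, 0.2, 0.1],
]
y_train = [1, 1, 1, 0, 0, 0]

model = fit_gaussian_nb(X_train, y_train)
p_high = edge_probability(model, [0.85, 0.80, 0.75])
p_low = edge_probability(model, [0.15, 0.20, 0.20])
print(p_high > p_low)  # True
```

Because the classifier treats each base method's score as an independent feature, adding a new inference method in this sketch amounts to adding one more column to the score matrix.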
Subject(s)
Algorithms , Software , Bayes Theorem , RNA-Seq
ABSTRACT
A network, whose nodes are genes and whose directed edges represent positive or negative influences of a regulatory gene on its targets, is often used as a representation of causality. To infer a network, researchers often develop a machine learning model and then evaluate the model based on its match with experimentally verified "gold standard" edges. The desired result of such a model is a network that may extend the gold standard edges. Since networks are a form of visual representation, one can compare their utility with architectural or machine blueprints. Blueprints are clearly useful because they provide precise guidance to builders in construction. If the primary role of gene regulatory networks is to characterize causality, then such networks should be good tools of prediction because prediction is the actionable benefit of knowing causality. But are they? In this paper, we compare prediction quality based on "gold standard" regulatory edges from previous experimental work with non-linear models inferred from time series data across four different species. We show that the same non-linear machine learning models have better predictive performance, with improvements from 5.3% to 25.3% in terms of the reduction in the root mean square error (RMSE), compared with the same models based on the gold standard edges. Having established that networks fail to characterize causality properly, we suggest that causality research should focus on four goals: (i) predictive accuracy; (ii) a parsimonious enumeration of predictive regulatory genes for each target gene g; (iii) the identification of disjoint sets of predictive regulatory genes for each target g of roughly equal accuracy; and (iv) the construction of a bipartite network (whose node types are genes and models) representation of causality. We provide algorithms for all goals.
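The reported improvement is stated as a reduction in RMSE. A minimal sketch of how such a percentage could be computed follows, assuming the figure is a relative reduction against the gold-standard-based model (the abstract does not spell out the formula). The expression values and predictions are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def rmse_reduction_percent(rmse_baseline, rmse_candidate):
    """Relative RMSE reduction of the candidate over the baseline, in percent."""
    return 100.0 * (rmse_baseline - rmse_candidate) / rmse_baseline


# Hypothetical target-gene expression values and two sets of predictions.
actual = [1.0, 2.0, 3.0, 4.0]
pred_gold_edges = [1.4, 1.6, 3.4, 3.6]  # model restricted to gold standard edges
pred_inferred = [1.3, 1.7, 3.3, 3.7]    # model with regulators chosen from data

reduction = rmse_reduction_percent(
    rmse(actual, pred_gold_edges), rmse(actual, pred_inferred)
)
print(round(reduction, 1))  # 25.0
```

In this toy case the baseline RMSE is 0.4 and the candidate RMSE is 0.3, giving a 25% relative reduction.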
ABSTRACT
[This corrects the article DOI: 10.3389/fgene.2024.1371607.].