ABSTRACT
Deciphering gene regulatory networks (GRNs) is both a promise and challenge of systems biology. The promise lies in identifying key transcription factors (TFs) that enable an organism to react to changes in its environment. The challenge lies in validating GRNs that involve hundreds of TFs with hundreds of thousands of interactions with their genome-wide targets experimentally determined by high-throughput sequencing. To address this challenge, we developed ConnecTF, a species-independent, web-based platform that integrates genome-wide studies of TF-target binding, TF-target regulation, and other TF-centric omic datasets and uses these to build and refine validated or inferred GRNs. We demonstrate the functionality of ConnecTF by showing how integration within and across TF-target datasets uncovers biological insights. Case study 1 uses integration of TF-target gene regulation and binding datasets to uncover TF mode-of-action and identify potential TF partners for 14 TFs in abscisic acid signaling. Case study 2 demonstrates how genome-wide TF-target data and automated functions in ConnecTF are used in precision/recall analysis and pruning of an inferred GRN for nitrogen signaling. Case study 3 uses ConnecTF to chart a network path from NLP7, a master TF in nitrogen signaling, to direct secondary TF2s and to its indirect targets in a Network Walking approach. The public version of ConnecTF (https://ConnecTF.org) contains 3,738,278 TF-target interactions for 423 TFs in Arabidopsis, 839,210 TF-target interactions for 139 TFs in maize (Zea mays), and 293,094 TF-target interactions for 26 TFs in rice (Oryza sativa). The database and tools in ConnecTF will advance the exploration of GRNs in plant systems biology applications for model and crop species.
Subjects
Arabidopsis/genetics; Databases as Topic; Gene Expression Regulation, Plant; Gene Regulatory Networks; Oryza/genetics; Transcription Factors/genetics; Zea mays/genetics; Crops, Agricultural/genetics; Genes, Plant
ABSTRACT
This study exploits time, the relatively unexplored fourth dimension of gene regulatory networks (GRNs), to learn the temporal transcriptional logic underlying dynamic nitrogen (N) signaling in plants. Our "just-in-time" analysis of time-series transcriptome data uncovered a temporal cascade of cis elements underlying dynamic N signaling. To infer transcription factor (TF)-target edges in a GRN, we applied a time-based machine learning method to 2,174 dynamic N-responsive genes. We experimentally determined a network precision cutoff, using TF-regulated genome-wide targets of three TF hubs (CRF4, SNZ, and CDF1), used to "prune" the network to 155 TFs and 608 targets. This network precision was reconfirmed using genome-wide TF-target regulation data for four additional TFs (TGA1, HHO5/6, and PHL1) not used in network pruning. These higher-confidence edges in the GRN were further filtered by independent TF-target binding data, used to calculate a TF "N-specificity" index. This refined GRN identifies the temporal relationship of known/validated regulators of N signaling (NLP7/8, TGA1/4, NAC4, HRS1, and LBD37/38/39) and 146 additional regulators. Six TFs (CRF4, SNZ, CDF1, HHO5/6, and PHL1) validated herein regulate a significant number of genes in the dynamic N response, targeting 54% of N-uptake/assimilation pathway genes. Phenotypically, inducible overexpression of CRF4 in planta regulates genes resulting in altered biomass, root development, and ¹⁵NO₃⁻ uptake, specifically under low-N conditions. This dynamic N-signaling GRN now provides the temporal "transcriptional logic" for 155 candidate TFs to improve nitrogen use efficiency with potential agricultural applications. Broadly, these time-based approaches can uncover the temporal transcriptional logic for any biological response system in biology, agriculture, or medicine.
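The pruning step described in this abstract (choosing an edge-score cutoff from validated TF-target data, then discarding lower-scoring edges) can be illustrated with a minimal Python sketch. The edge scores, gene names, the 0.7 precision target, and the function names below are all hypothetical and chosen only for illustration; the abstract does not specify how the cutoff was computed, so this is one plausible reading rather than the authors' procedure.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, float]  # (tf, target, inferred edge score)


def precision_at_cutoff(edges: List[Edge],
                        validated: Dict[str, Set[str]],
                        cutoff: float) -> Optional[float]:
    """Precision over edges scoring >= cutoff whose TF has validation data."""
    testable = [(tf, tgt) for tf, tgt, score in edges
                if score >= cutoff and tf in validated]
    if not testable:
        return None  # no validated TF left above this cutoff
    hits = sum(1 for tf, tgt in testable if tgt in validated[tf])
    return hits / len(testable)


def prune(edges: List[Edge],
          validated: Dict[str, Set[str]],
          min_precision: float) -> Tuple[Optional[float], List[Edge]]:
    """Keep edges at or above the lowest cutoff that reaches min_precision."""
    for cutoff in sorted({score for _, _, score in edges}):
        p = precision_at_cutoff(edges, validated, cutoff)
        if p is not None and p >= min_precision:
            return cutoff, [e for e in edges if e[2] >= cutoff]
    return None, []


# Hypothetical toy network: TF_A and TF_B have validated targets, TF_C does not.
edges = [("TF_A", "g1", 0.9), ("TF_A", "g2", 0.8), ("TF_A", "g3", 0.4),
         ("TF_B", "g1", 0.7), ("TF_B", "g4", 0.3), ("TF_C", "g5", 0.85)]
validated = {"TF_A": {"g1", "g2"}, "TF_B": {"g1"}}

cutoff, kept = prune(edges, validated, min_precision=0.7)
print(cutoff, len(kept))
```

Note that edges of TFs without validation data (here the hypothetical TF_C) are kept or dropped by score alone, which mirrors the idea of applying a cutoff learned from a few validated hubs to the whole network.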
Subjects
Arabidopsis/genetics; Arabidopsis/metabolism; Gene Expression Regulation, Plant/genetics; Gene Regulatory Networks/genetics; Nitrogen/metabolism; Transcription, Genetic/genetics; Arabidopsis Proteins/genetics; Gene Expression Profiling/methods; Logic; Protein Binding/genetics; Signal Transduction/genetics; Transcription Factors/genetics
ABSTRACT
BACKGROUND: Networks whose nodes have labels can seem complex. Fortunately, many have substructures that occur often ("motifs"). A societal example of a motif might be a household. Replacing such motifs by named supernodes reduces the complexity of the network and can bring out insightful features. Doing so repeatedly may give hints about higher level structures of the network. We call this recursive process Recursive Supernode Extraction. RESULTS: This paper describes algorithms and a tool to discover disjoint (i.e. non-overlapping) motifs in a network, replacing those motifs by new nodes, and then recursing. We show applications in food-web and protein-protein interaction (PPI) networks where our methods reduce the complexity of the network and yield insights. CONCLUSIONS: SuperNoder is a web-based and standalone tool which enables the simplification of big graphs based on the reduction of high frequency motifs. It applies various strategies for identifying disjoint motifs with the goal of enhancing the understandability of networks.
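The recursive process described in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SuperNoder's algorithm: motifs are restricted to labeled edges (pairs of adjacent nodes), disjoint instances are picked greedily, and the node names, labels, and `min_count` threshold are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]
Motif = Tuple[str, str]


def disjoint_instances(labels: Dict[str, str],
                       edges: Set[Edge]) -> Dict[Motif, List[Edge]]:
    """Greedily collect node-disjoint instances of each labeled-edge motif."""
    found: Dict[Motif, List[Edge]] = {}
    used: Dict[Motif, Set[str]] = {}
    for u, v in sorted(edges):
        motif = tuple(sorted((labels[u], labels[v])))
        taken = used.setdefault(motif, set())
        if u not in taken and v not in taken:
            found.setdefault(motif, []).append((u, v))
            taken.update((u, v))
    return found


def extract_once(labels: Dict[str, str], edges: Set[Edge],
                 min_count: int) -> Optional[Tuple[Dict[str, str], Set[Edge], Motif]]:
    """Replace the motif with the most disjoint instances by supernodes."""
    found = disjoint_instances(labels, edges)
    if not found:
        return None
    motif = min(found, key=lambda m: (-len(found[m]), m))  # deterministic ties
    instances = found[motif]
    if len(instances) < min_count:
        return None
    merged: Dict[str, str] = {}
    new_labels = dict(labels)
    for u, v in instances:
        name = f"({u}+{v})"
        merged[u] = merged[v] = name
        del new_labels[u], new_labels[v]
        new_labels[name] = "[" + "-".join(motif) + "]"
    new_edges: Set[Edge] = set()
    for u, v in edges:
        a, b = merged.get(u, u), merged.get(v, v)
        if a != b:  # drop the edge that is now internal to a supernode
            new_edges.add(tuple(sorted((a, b))))
    return new_labels, new_edges, motif


def recursive_supernode_extraction(labels, edges, min_count=2):
    """Repeat extraction until no motif has min_count disjoint instances."""
    edges = {tuple(sorted(e)) for e in edges}
    history: List[Motif] = []
    while True:
        step = extract_once(labels, edges, min_count)
        if step is None:
            return labels, edges, history
        labels, edges, motif = step
        history.append(motif)


# Hypothetical toy food web.
labels = {"p1": "plant", "p2": "plant", "h1": "herbivore", "h2": "herbivore",
          "c1": "carnivore", "c2": "carnivore", "a1": "apex"}
edges = {("p1", "h1"), ("p2", "h2"), ("h1", "c1"), ("h2", "c2"),
         ("c1", "a1"), ("c2", "a1")}

final_labels, final_edges, history = recursive_supernode_extraction(labels, edges)
print(history)
print(sorted(final_labels))
```

In this toy example the seven-node web collapses in two rounds into one apex node connected to two nested supernodes, which shows how repeated extraction hints at higher-level structure.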
Subjects
Algorithms; Computational Biology/methods; Metabolic Networks and Pathways; Protein Interaction Maps; Software; Humans
ABSTRACT
Advancements in clinical treatment are increasingly constrained by the limitations of supervised learning techniques, which depend heavily on large volumes of annotated data. The annotation process is not only costly but also demands substantial time from clinical specialists. Addressing this issue, we introduce the S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline, a novel approach that leverages advancements in self-supervised and semi-supervised learning. These techniques engage in auxiliary tasks that do not require labeling, thus simplifying the scaling of machine supervision compared to fully-supervised methods. Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks. Notably, we observed that self-supervised learning significantly surpassed the performance of supervised methods in the classification of all evaluated datasets. Remarkably, the semi-supervised approach demonstrated superior outcomes in segmentation, outperforming fully-supervised methods while using 50% fewer labels across all datasets. In line with our commitment to contributing to the scientific community, we have made the S4MI code openly accessible, allowing for broader application and further development of these methods. The code can be accessed at https://github.com/pranavsinghps1/S4MI .
Subjects
Image Processing, Computer-Assisted; Supervised Machine Learning; Humans; Image Processing, Computer-Assisted/methods; Diagnostic Imaging/methods; Algorithms
ABSTRACT
In the plant meristem, tissue-wide maturation gradients are coordinated with specialized cell networks to establish various developmental phases required for indeterminate growth. Here, we used single-cell transcriptomics to reconstruct the protophloem developmental trajectory from the birth of cell progenitors to terminal differentiation in the Arabidopsis thaliana root. PHLOEM EARLY DNA-BINDING-WITH-ONE-FINGER (PEAR) transcription factors mediate lineage bifurcation by activating guanosine triphosphatase signaling and prime a transcriptional differentiation program. This program is initially repressed by a meristem-wide gradient of PLETHORA transcription factors. Only the dissipation of the PLETHORA gradient permits activation of the differentiation program, which involves mutual inhibition of early versus late meristem regulators. Thus, for phloem development, broad maturation gradients interface with cell-type-specific transcriptional regulators to stage cellular differentiation.
Subjects
Arabidopsis Proteins/metabolism; Arabidopsis/cytology; Phloem/cytology; Phloem/growth & development; Plant Roots/cytology; Transcription Factors/metabolism; Arabidopsis/genetics; Arabidopsis/metabolism; Arabidopsis Proteins/genetics; Cell Differentiation; GTP-Binding Proteins/genetics; GTP-Binding Proteins/metabolism; Meristem/cytology; Phloem/genetics; Phloem/metabolism; Plant Roots/genetics; Plant Roots/growth & development; Plant Roots/metabolism; RNA-Seq; Signal Transduction; Single-Cell Analysis; Transcription Factors/genetics; Transcriptome
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
The ability to accurately predict the causal relationships from transcription factors to genes would greatly enhance our understanding of transcriptional dynamics. This could lead to applications in which one or more transcription factors could be manipulated to effect a change in genes leading to the enhancement of some desired trait. Here we present a method called OutPredict that constructs a model for each gene based on time series (and other) data and that predicts that gene's expression at a previously unseen subsequent time point. The model also infers causal relationships based on the most important transcription factors for each gene model, some of which have been validated by previous physical experiments. The method benefits from known network edges and steady-state data to enhance predictive accuracy. Our results across B. subtilis, Arabidopsis, E. coli, Drosophila and the DREAM4 simulated in silico dataset show improved predictive accuracy ranging from 40% to 60% over other state-of-the-art methods. We find that gene expression models can benefit from the addition of steady-state data to predict expression values of time series. Finally, we validate, based on limited available data, that the influential edges we infer correspond to known relationships significantly more than expected by chance or by state-of-the-art methods.
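The general idea of a per-gene model that is trained on earlier time points and evaluated on a held-out later one can be illustrated with a deliberately simple stand-in. The sketch below fits a one-predictor least-squares line from each candidate TF's expression at time t to the target's expression at time t+1, keeps the TF with the lowest training error, and predicts the final, unseen time point. This is not OutPredict's model (the abstract does not describe its learner); the function names and the toy series are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def best_tf_model(target: List[float],
                  tfs: Dict[str, List[float]]) -> Tuple[str, float]:
    """Pick the best lag-1 TF predictor and predict the held-out last point.

    Training pairs are (tf[t], target[t + 1]) for every t except the final
    transition, which is held out for prediction.
    """
    y_next = target[1:-1]
    best = None
    for name, series in tfs.items():
        x_prev = series[:-2]
        a, b = fit_line(x_prev, y_next)
        sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x_prev, y_next))
        if best is None or sse < best[0]:
            best = (sse, name, a, b)
    _, name, a, b = best
    return name, a + b * tfs[name][-2]


# Hypothetical series: the target follows tf_a with a one-step lag.
tfs = {"tf_a": [1.0, 2.0, 3.0, 4.0, 5.0],
       "tf_b": [3.0, 1.0, 4.0, 1.0, 5.0]}
target = [0.0, 2.0, 4.0, 6.0, 8.0]

chosen_tf, predicted = best_tf_model(target, tfs)
print(chosen_tf, predicted, "held-out value:", target[-1])
```

The selected TF plays the role of the "most important" regulator for that gene model; a real method would use many predictors and a stronger learner, but the train-then-predict-the-next-time-point structure is the same.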
Subjects
Algorithms; Computational Biology/methods; Gene Expression Profiling/methods; Gene Regulatory Networks; Models, Genetic; Transcription Factors/genetics; Computer Simulation; Gene Expression Profiling/statistics & numerical data; Machine Learning; Reproducibility of Results
ABSTRACT
Charting a temporal path in gene networks requires linking early transcription factor (TF)-triggered events to downstream effects. We scale up a cell-based TF-perturbation assay to identify direct regulated targets of 33 nitrogen (N)-early response TFs encompassing 88% of N-responsive Arabidopsis genes. We uncover a duality where each TF is an inducer and repressor, and in vitro cis-motifs are typically specific to regulation directionality. Validated TF-targets (71,836) are used to refine precision of a time-inferred root network, connecting 145 N-responsive TFs and 311 targets. These data are used to chart network paths from direct TF1-regulated targets identified in cells to indirect targets responding only in planta via Network Walking. We uncover network paths from TGA1 and CRF4 to direct TF2 targets, which in turn regulate 76% and 87% of TF1 indirect targets in planta, respectively. These results have implications for N-use and the approach can reveal temporal networks for any biological system.
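The Network Walking idea in this abstract (TF1 to its direct TF2 targets, then on to TF1's indirect targets) reduces to a few set operations. The sketch below is one reading of that description, assuming that "indirect targets" are genes responding to TF1 in planta that are not among its direct targets identified in cells. All identifiers and data are hypothetical placeholders.

```python
from typing import Dict, Set, Tuple


def network_walk(tf1: str,
                 direct: Dict[str, Set[str]],
                 in_planta: Set[str],
                 tfs: Set[str]) -> Tuple[Dict[str, Set[str]], float]:
    """Chart TF1 -> TF2 -> indirect-target paths.

    direct    : TF -> direct targets identified in the cell-based assay
    in_planta : genes responding to TF1 perturbation in planta
    tfs       : set of genes annotated as TFs
    Returns per-TF2 indirect targets and the fraction of TF1 indirect
    targets reached through at least one TF2.
    """
    direct_tf1 = direct[tf1]
    indirect = in_planta - direct_tf1
    # TF2s: direct TF1 targets that are TFs and have their own target data.
    tf2s = sorted(g for g in direct_tf1 if g in tfs and g in direct)
    paths = {tf2: indirect & direct[tf2] for tf2 in tf2s}
    explained = set().union(*paths.values())
    fraction = len(explained) / len(indirect) if indirect else 0.0
    return paths, fraction


# Hypothetical toy data.
direct = {"TF1": {"TF2a", "TF2b", "g1"},
          "TF2a": {"g2", "g3"},
          "TF2b": {"g3", "g4"}}
in_planta = {"g1", "g2", "g3", "g4", "g5", "TF2a"}
tfs = {"TF1", "TF2a", "TF2b"}

paths, fraction = network_walk("TF1", direct, in_planta, tfs)
print(paths)
print(fraction)
```

The returned fraction is the toy analogue of the percentages of TF1 indirect targets regulated by TF2s that the abstract reports.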