ABSTRACT
Since they emerged approximately 125 million years ago, flowering plants have evolved to dominate the terrestrial landscape and survive in the most inhospitable environments on Earth. At their core, these adaptations have been shaped by changes in numerous, interconnected pathways and genes that collectively give rise to emergent biological phenomena. Linking gene expression to morphological outcomes remains a grand challenge in biology, and new approaches are needed to begin to address this gap. Here, we implemented topological data analysis (TDA) to summarize the high dimensionality and noisiness of gene expression data using lens functions that delineate plant tissue and stress responses. Using this framework, we created a topological representation of the shape of gene expression across plant evolution, development, and environment for phylogenetically diverse flowering plants. The TDA-based Mapper graphs form a well-defined gradient of tissues from leaves to seeds, or from healthy to stressed samples, depending on the lens function. This suggests that there are distinct and conserved expression patterns across angiosperms that delineate different tissue types or responses to biotic and abiotic stresses. Genes that correlate with the tissue lens function are enriched in central processes such as photosynthesis, growth and development, housekeeping, or stress responses. Together, our results highlight the power of TDA for analyzing complex biological data and reveal a core expression backbone that defines plant form and function.
Subject(s)
Magnoliopsida , Magnoliopsida/genetics , Plants/genetics , Stress, Physiological/genetics , Plant Leaves/genetics , Gene Expression , Gene Expression Regulation, Plant/genetics
ABSTRACT
Network-based machine learning (ML) has the potential for predicting novel genes associated with nearly any health and disease context. However, this approach often uses network information from only the single species under consideration even though networks for most species are noisy and incomplete. While some recent methods have begun addressing this shortcoming by using networks from more than one species, they lack one or more key desirable properties: handling networks from more than two species simultaneously, incorporating many-to-many orthology information, or generating a network representation that is reusable across different types of and newly-defined prediction tasks. Here, we present GenePlexusZoo, a framework that casts molecular networks from multiple species into a single reusable feature space for network-based ML. We demonstrate that this multi-species network representation improves both gene classification within a single species and knowledge-transfer across species, even in cases where the inter-species correspondence is undetectable based on shared orthologous genes. Thus, GenePlexusZoo enables effectively leveraging the high evolutionary molecular, functional, and phenotypic conservation across species to discover novel genes associated with diverse biological contexts.
Subject(s)
Genomics , Machine Learning , Genomics/methods
ABSTRACT
MOTIVATION: Accurately representing biological networks in a low-dimensional space, also known as network embedding, is a critical step in network-based machine learning and is carried out widely using node2vec, an unsupervised method based on biased random walks. However, while many networks, including functional gene interaction networks, are dense, weighted graphs, node2vec is fundamentally limited in its ability to use edge weights during the biased random walk generation process, thus under-using all the information in the network. RESULTS: Here, we present node2vec+, a natural extension of node2vec that accounts for edge weights when calculating walk biases and reduces to node2vec in the cases of unweighted graphs or unbiased walks. Using two synthetic datasets, we empirically show that node2vec+ is more robust to additive noise than node2vec in weighted graphs. Then, using genome-scale functional gene networks to solve a wide range of gene function and disease prediction tasks, we demonstrate the superior performance of node2vec+ over node2vec in the case of weighted graphs. Notably, due to the limited amount of training data in the gene classification tasks, graph neural networks such as GCN and GraphSAGE are outperformed by both node2vec and node2vec+. AVAILABILITY AND IMPLEMENTATION: The data and code are available on GitHub at https://github.com/krishnanlab/node2vecplus_benchmarks. All additional data underlying this article are available on Zenodo at https://doi.org/10.5281/zenodo.7007164. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
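As a rough illustration of the biased random walk that the abstract above refers to, the following minimal sketch implements the second-order bias of the original node2vec (return parameter `p`, in-out parameter `q`) on a weighted graph. It is not the node2vec+ method and it is not taken from the linked repository; the function names and the toy graph are hypothetical. Note how the bias only asks whether an edge between the previous node and the candidate exists, regardless of how weak that edge is, which is the kind of limitation the abstract describes.

```python
import random


def node2vec_step_weights(graph, prev, cur, p=1.0, q=1.0):
    """Unnormalized transition weights out of `cur`, given the walk came from `prev`.

    `graph` is a dict of dicts: graph[u][v] is the weight of edge (u, v).
    This follows the original node2vec bias; it is NOT node2vec+.
    """
    weights = {}
    for nxt, w in graph[cur].items():
        if nxt == prev:
            bias = 1.0 / p          # step back to the previous node
        elif nxt in graph[prev]:
            bias = 1.0              # any edge (prev, nxt) counts, however weak
        else:
            bias = 1.0 / q          # step away from the previous node
        weights[nxt] = bias * w
    return weights


def biased_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """Generate one walk of up to `length` nodes starting at `start`."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not graph[cur]:
            break                   # dead end: no neighbors to move to
        if len(walk) == 1:
            options = dict(graph[cur])  # first step has no previous node
        else:
            options = node2vec_step_weights(graph, walk[-2], cur, p, q)
        nodes = list(options)
        walk.append(rng.choices(nodes, weights=[options[n] for n in nodes], k=1)[0])
    return walk


# Hypothetical toy graph: the a-c edge is very weak (0.01).
toy_graph = {
    "a": {"b": 1.0, "c": 0.01},
    "b": {"a": 1.0, "c": 1.0, "d": 1.0},
    "c": {"a": 0.01, "b": 1.0},
    "d": {"b": 1.0},
}
```

In this toy graph, a walk arriving at `b` from `a` treats `c` as fully connected to `a` even though the a-c edge has weight 0.01; accounting for such weights when computing the bias is what the abstract says node2vec+ adds.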
Subject(s)
Machine Learning , Neural Networks, Computer , Gene Regulatory Networks , Phenotype , Epistasis, Genetic
ABSTRACT
SUMMARY: PyGenePlexus is a Python package that enables a user to gain insight into any gene set of interest through a molecular interaction network informed supervised machine learning model. PyGenePlexus provides predictions of how associated every gene in the network is to the input gene set, offers interpretability by comparing the model trained on the input gene set to models trained on thousands of known gene sets, and returns the network connectivity of the top predicted genes. AVAILABILITY AND IMPLEMENTATION: https://pypi.org/project/geneplexus/ and https://github.com/krishnanlab/PyGenePlexus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Computational Biology , Software , Machine Learning , Supervised Machine Learning , Genetic Association Studies
ABSTRACT
Biomedical researchers take advantage of high-throughput, high-coverage technologies to routinely generate sets of genes of interest across a wide range of biological conditions. Although these technologies have directly shed light on the molecular underpinnings of various biological processes and diseases, the list of genes from any individual experiment is often noisy and incomplete. Additionally, interpreting these lists of genes can be challenging in terms of how they are related to each other and to other genes in the genome. In this work, we present GenePlexus (https://www.geneplexus.net/), a web-server that allows a researcher to utilize a powerful, network-based machine learning method to gain insights into their gene set of interest and additional functionally similar genes. Once a user uploads their own set of human genes and chooses between a number of different human network representations, GenePlexus provides predictions of how associated every gene in the network is to the input set. The web-server also provides interpretability through network visualization and comparison to other machine learning models trained on thousands of known process/pathway and disease gene sets. GenePlexus is free and open to all users without the need for registration.
Subject(s)
Computers , Software , Humans , Genome , Machine Learning , Genetic Association Studies , Internet
ABSTRACT
We previously identified a deletion on chromosome 16p12.1 that is mostly inherited and associated with multiple neurodevelopmental outcomes, where severely affected probands carried an excess of rare pathogenic variants compared to mildly affected carrier parents. We hypothesized that the 16p12.1 deletion sensitizes the genome for disease, while "second-hits" in the genetic background modulate the phenotypic trajectory. To test this model, we examined how neurodevelopmental defects conferred by knockdown of individual 16p12.1 homologs are modulated by simultaneous knockdown of homologs of "second-hit" genes in Drosophila melanogaster and Xenopus laevis. We observed that knockdown of 16p12.1 homologs affects multiple phenotypic domains, leading to delayed developmental timing, seizure susceptibility, brain alterations, abnormal dendrite and axonal morphology, and cellular proliferation defects. Compared to genes within the 16p11.2 deletion, which has higher de novo occurrence, 16p12.1 homologs were less likely to interact with each other in Drosophila models or a human brain-specific interaction network, suggesting that interactions with "second-hit" genes may confer higher impact towards neurodevelopmental phenotypes. Assessment of 212 pairwise interactions in Drosophila between 16p12.1 homologs and 76 homologs of patient-specific "second-hit" genes (such as ARID1B and CACNA1A), genes within neurodevelopmental pathways (such as PTEN and UBE3A), and transcriptomic targets (such as DSCAM and TRRAP) identified genetic interactions in 63% of the tested pairs. In 11 out of 15 families, patient-specific "second-hits" enhanced or suppressed the phenotypic effects of one or many 16p12.1 homologs in 32/96 pairwise combinations tested. In fact, homologs of SETD5 synergistically interacted with homologs of MOSMO in both Drosophila and X. laevis, leading to modified cellular and brain phenotypes, as well as axon outgrowth defects that were not observed with knockdown of either individual homolog. Our results suggest that several 16p12.1 genes sensitize the genome towards neurodevelopmental defects, and complex interactions with "second-hit" genes determine the ultimate phenotypic manifestation.
Subject(s)
Brain/metabolism , Chromosome Deletion , Chromosomes, Human, Pair 16/genetics , Neurodevelopmental Disorders/genetics , Adaptor Proteins, Signal Transducing/genetics , Animals , Brain/pathology , Calcium Channels/genetics , Cell Adhesion Molecules/genetics , DNA-Binding Proteins/genetics , Disease Models, Animal , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Epistasis, Genetic/genetics , Gene Expression Regulation, Developmental , Humans , Methyltransferases/genetics , Neurodevelopmental Disorders/pathology , Nuclear Proteins/genetics , PTEN Phosphohydrolase/genetics , Transcription Factors/genetics , Ubiquitin-Protein Ligases/genetics , Xenopus Proteins/genetics , Xenopus laevis/genetics
ABSTRACT
The basis of several recent methods for drug repurposing is the key principle that an efficacious drug will reverse the disease molecular 'signature' with minimal side effects. This principle was defined and popularized by the influential 'connectivity map' study in 2006 regarding reversal relationships between disease- and drug-induced gene expression profiles, quantified by a disease-drug 'connectivity score.' Over the past 15 years, several studies have proposed variations in calculating connectivity scores toward improving accuracy and robustness in light of massive growth in reference drug profiles. However, these variations have been formulated inconsistently using various notations and terminologies even though they are based on a common set of conceptual and statistical ideas. Therefore, we present a systematic reconciliation of multiple disease-drug similarity metrics ($ES$, $css$, $Sum$, $Cosine$, $XSum$, $XCor$, $XSpe$, $XCos$, $EWCos$) and connectivity scores ($CS$, $RGES$, $NCS$, $WCS$, $Tau$, $CSS$, $EMUDRA$) by defining them using consistent notation and terminology. In addition to providing clarity and deeper insights, this coherent definition of connectivity scores and their relationships provides a unified scheme that newer methods can adopt, enabling the computational drug-development community to compare and investigate different approaches easily. To facilitate the continuous and transparent integration of newer methods, this article will be available as a live document (https://jravilab.github.io/connectivity_scores) coupled with a GitHub repository (https://github.com/jravilab/connectivity_scores) that any researcher can build on and push changes to.
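To make the reversal idea in the abstract above concrete, the following minimal sketch scores a disease signature against a drug signature with plain cosine similarity over their shared genes. It is only an assumption-laden illustration in the spirit of the $Cosine$ metric named above; the exact definitions in the article may differ, and the function name and toy gene labels are hypothetical. A score near -1 suggests the drug profile reverses the disease profile, while a score near +1 suggests it mimics it.

```python
import math


def cosine_connectivity(disease, drug):
    """Cosine similarity between two signatures given as {gene: differential expression}.

    Only genes present in both signatures are used (an assumption of this sketch).
    """
    shared = sorted(set(disease) & set(drug))
    if not shared:
        raise ValueError("signatures share no genes")
    dot = sum(disease[g] * drug[g] for g in shared)
    norm_disease = math.sqrt(sum(disease[g] ** 2 for g in shared))
    norm_drug = math.sqrt(sum(drug[g] ** 2 for g in shared))
    if norm_disease == 0.0 or norm_drug == 0.0:
        return 0.0  # a flat signature carries no directional information
    return dot / (norm_disease * norm_drug)


# Hypothetical toy signatures (values stand in for log fold changes).
disease_sig = {"G1": 2.0, "G2": -1.0, "G3": 0.5}
reversing_drug = {"G1": -2.0, "G2": 1.0, "G3": -0.5, "G4": 3.0}
mimicking_drug = {"G1": 2.0, "G2": -1.0, "G3": 0.5}
```

Calling `cosine_connectivity(disease_sig, reversing_drug)` yields a value close to -1, and the mimicking profile yields a value close to +1; `G4` is ignored because it is absent from the disease signature.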
Subject(s)
Computational Biology/methods , Drug Discovery/methods , Drug Repositioning/methods , Gene Expression Profiling/methods , Pharmacogenetics/methods , Algorithms , Biomarkers , Gene Expression Regulation/drug effects , Humans , Transcriptome
ABSTRACT
The role of β-CoOOH crystallographic orientations in catalytic activity for the oxygen evolution reaction (OER) remains elusive. We combine correlative electron backscatter diffraction/scanning electrochemical cell microscopy with X-ray photoelectron spectroscopy, transmission electron microscopy, and atom probe tomography to establish the structure-activity relationships of various faceted β-CoOOH formed on a Co microelectrode under OER conditions. We reveal that ≈6 nm β-CoOOH($01\bar{1}0$), grown on [$\bar{1}2\bar{1}0$]-oriented Co, exhibits higher OER activity than ≈3 nm β-CoOOH($10\bar{1}3$) or ≈6 nm β-CoOOH(0006) formed on [$02\bar{2}1$]- and [0001]-oriented Co, respectively. This arises from higher amounts of incorporated hydroxyl ions and more easily reducible Co$^{\mathrm{III}}$-O sites present in β-CoOOH($01\bar{1}0$) than those in the latter two oxyhydroxide facets. Our correlative multimodal approach shows great promise in linking local activity with atomic-scale details of structure, thickness and composition of active species, which opens opportunities to design pre-catalysts with preferred defects that promote the formation of the most active OER species.
ABSTRACT
SUMMARY: Learning low-dimensional representations (embeddings) of nodes in large graphs is key to applying machine learning on massive biological networks. Node2vec is the most widely used method for node embedding. However, its original Python and C++ implementations scale poorly with network density, failing for dense biological networks with hundreds of millions of edges. We have developed PecanPy, a new Python implementation of node2vec that uses cache-optimized compact graph data structures and precomputing/parallelization to produce fast, high-quality node embeddings for biological networks of all sizes and densities. AVAILABILITY AND IMPLEMENTATION: PecanPy software is freely available at https://github.com/krishnanlab/PecanPy. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
While there are >2 million publicly-available human microarray gene-expression profiles, these profiles were measured using a variety of platforms that each cover a pre-defined, limited set of genes. Therefore, key to reanalyzing and integrating this massive data collection are methods that can computationally reconstitute the complete transcriptome in partially-measured microarray samples by imputing the expression of unmeasured genes. Current state-of-the-art imputation methods are tailored to samples from a specific platform and rely on gene-gene relationships regardless of the biological context of the target sample. We show that sparse regression models that capture sample-sample relationships (termed SampleLASSO), built on-the-fly for each new target sample to be imputed, outperform models based on fixed gene relationships. Extensive evaluation involving three machine learning algorithms (LASSO, k-nearest-neighbors, and deep-neural-networks), two gene subsets (GPL96-570 and LINCS), and multiple imputation tasks (within and across microarray/RNA-seq datasets) establishes that SampleLASSO is the most accurate model. Additionally, we demonstrate the biological interpretability of this method by showing that, for imputing a target sample from a certain tissue, SampleLASSO automatically leverages training samples from the same tissue. Thus, SampleLASSO is a simple, yet powerful and flexible approach for harmonizing large-scale gene-expression data.
Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation , Humans , Oligonucleotide Array Sequence Analysis , RNA-Seq
ABSTRACT
BACKGROUND: Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS: In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION: The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. CONTACT: arjun@msu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
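For readers unfamiliar with label propagation, the baseline discussed in the abstract above, the following minimal sketch spreads scores from known seed genes over a network using a random walk with restart. It is one common formulation chosen here for illustration, not necessarily the variant benchmarked in the study; the function name, the `alpha` parameter, and the toy graph are hypothetical, and the graph is assumed to be undirected with no isolated nodes.

```python
def propagate_labels(graph, seeds, alpha=0.5, n_iter=100):
    """Random walk with restart from `seeds` over an undirected weighted graph.

    `graph` is a dict of dicts with symmetric weights: graph[u][v] == graph[v][u].
    `alpha` is the fraction of score that keeps walking; (1 - alpha) restarts at the seeds.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    degree = {n: sum(graph[n].values()) for n in nodes}
    score = dict(restart)
    for _ in range(n_iter):
        updated = {}
        for n in nodes:
            # Each neighbor m passes on its score in proportion to edge weight / degree(m).
            incoming = sum(score[m] * w / degree[m] for m, w in graph[n].items())
            updated[n] = (1.0 - alpha) * restart[n] + alpha * incoming
        score = updated
    return score


# Hypothetical toy network: a simple path a - b - c - d with gene "a" as the only seed.
path_graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = propagate_labels(path_graph, seeds={"a"})
```

On this path graph the scores decay with distance from the seed, so genes closer to known members of the set rank higher. The supervised alternative described in the abstract would instead train a classifier that uses each gene's row of network connections as its feature vector.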
Subject(s)
Computational Biology , Gene Regulatory Networks , Humans , Supervised Machine Learning
ABSTRACT
GIANT2 (Genome-wide Integrated Analysis of gene Networks in Tissues) is an interactive web server that enables biomedical researchers to analyze their proteins and pathways of interest and generate hypotheses in the context of genome-scale functional maps of human tissues. The precise actions of genes are frequently dependent on their tissue context, yet direct assay of tissue-specific protein function and interactions remains infeasible in many normal human tissues and cell-types. With GIANT2, researchers can explore predicted tissue-specific functional roles of genes and reveal changes in those roles across tissues, all through interactive multi-network visualizations and analyses. Additionally, the NetWAS approach available through the server uses tissue-specific/cell-type networks predicted by GIANT2 to re-prioritize statistical associations from GWAS and identify disease-associated genes. GIANT2 predicts tissue-specific interactions by integrating diverse functional genomics data from over 61 400 experiments for 283 diverse tissues and cell-types. GIANT2 does not require any registration or installation and is freely available for use at http://giant-v2.princeton.edu.
Subject(s)
Gene Regulatory Networks/genetics , Genomics/trends , Internet , Software , Biomedical Research/trends , Computational Biology/trends , Humans
ABSTRACT
PURPOSE: To assess the contribution of rare variants in the genetic background toward variability of neurodevelopmental phenotypes in individuals with rare copy-number variants (CNVs) and gene-disruptive variants. METHODS: We analyzed quantitative clinical information, exome sequencing, and microarray data from 757 probands and 233 parents and siblings who carry disease-associated variants. RESULTS: The number of rare likely deleterious variants in functionally intolerant genes ("other hits") correlated with expression of neurodevelopmental phenotypes in probands with 16p12.1 deletion (n=23, p=0.004) and in autism probands carrying gene-disruptive variants (n=184, p=0.03) compared with their carrier family members. Probands with 16p12.1 deletion and a strong family history presented more severe clinical features (p=0.04) and higher burden of other hits compared with those with mild/no family history (p=0.001). The number of other hits also correlated with severity of cognitive impairment in probands carrying pathogenic CNVs (n=53) or de novo pathogenic variants in disease genes (n=290), and negatively correlated with head size among 80 probands with 16p11.2 deletion. These co-occurring hits involved known disease-associated genes such as SETD5, AUTS2, and NRXN1, and were enriched for cellular and developmental processes. CONCLUSION: Accurate genetic diagnosis of complex disorders will require complete evaluation of the genetic background even after a candidate disease-associated variant is identified.
Subject(s)
Autistic Disorder/genetics , Cell Adhesion Molecules, Neuronal/genetics , Genetic Carrier Screening , Methyltransferases/genetics , Nerve Tissue Proteins/genetics , Proteins/genetics , Autistic Disorder/physiopathology , Calcium-Binding Proteins , Chromosomes, Human, Pair 16/genetics , Cognition/physiology , Cytoskeletal Proteins , DNA Copy Number Variations/genetics , Female , Gene Expression Regulation/genetics , Genetic Background , Humans , Male , Neural Cell Adhesion Molecules , Parents , Pedigree , Phenotype , Sequence Deletion/genetics , Siblings , Transcription Factors
ABSTRACT
A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., 'loops') within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS).
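The loop-counting idea in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes the data are first binarized by sign (an assumption of this sketch), in which case a 2-by-2 submatrix of ±1 entries has rank 1 exactly when the product of its four entries is +1. Each row and column then accumulates +1 for every rank-1 loop it participates in and -1 for every full-rank loop. The function name and toy matrix are hypothetical.

```python
from itertools import combinations


def loop_scores(matrix):
    """Score rows and columns by the 2-by-2 'loops' they participate in.

    Entries are binarized by sign; a loop contributes +1 if it is rank 1
    (product of its four signs is +1) and -1 otherwise.
    """
    signs = [[1 if value >= 0 else -1 for value in row] for row in matrix]
    n_rows, n_cols = len(signs), len(signs[0])
    row_score = [0] * n_rows
    col_score = [0] * n_cols
    for r1, r2 in combinations(range(n_rows), 2):
        for c1, c2 in combinations(range(n_cols), 2):
            product = signs[r1][c1] * signs[r1][c2] * signs[r2][c1] * signs[r2][c2]
            row_score[r1] += product
            row_score[r2] += product
            col_score[c1] += product
            col_score[c2] += product
    return row_score, col_score


# Hypothetical toy data: the sign pattern is an outer product, so every loop is rank 1.
rank_one = [
    [0.9, -1.2, 0.4],
    [-0.3, 0.8, -2.0],
    [1.5, -0.1, 0.7],
]
rows, cols = loop_scores(rank_one)
```

For this 3-by-3 matrix each row and each column takes part in 6 loops, all rank 1, so every score equals 6. Rows and columns with low scores would be the first candidates to discard when searching for a low-rank bicluster. This brute-force version costs on the order of (rows² × columns²) operations and is meant only to show the counting, not to scale.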
Subject(s)
Algorithms , Databases, Genetic , Gene Expression Profiling/methods , Genome-Wide Association Study/methods , Bipolar Disorder/genetics , Breast Neoplasms/genetics , Cluster Analysis , Female , Humans , Male
ABSTRACT
We present SEEK (search-based exploration of expression compendia; http://seek.princeton.edu/), a query-based search engine for very large transcriptomic data collections, including thousands of human data sets from many different microarray and high-throughput sequencing platforms. SEEK uses a query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify genes, pathways and processes co-regulated with the query. SEEK provides multigene query searching with iterative metadata-based search refinement and extensive visualization-based analysis options.
Subject(s)
High-Throughput Nucleotide Sequencing , Search Engine , Transcriptome , Algorithms , Databases, Genetic , Gene Ontology , Hedgehog Proteins/genetics , Hedgehog Proteins/metabolism , Humans , RNA
ABSTRACT
IMP (Integrative Multi-species Prediction), originally released in 2012, is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides biologists with a framework to analyze their candidate gene sets in the context of functional networks, expanding or refining their sets using functional relationships predicted from integrated high-throughput data. IMP 2.0 integrates updated prior knowledge and data collections from the last three years in the seven supported organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, and Saccharomyces cerevisiae) and extends function prediction coverage to include human disease. IMP identifies homologs with conserved functional roles for disease knowledge transfer, allowing biologists to analyze disease contexts and predictions across all organisms. Additionally, IMP 2.0 implements a new flexible platform for experts to generate custom hypotheses about biological processes or diseases, making sophisticated data-driven methods easily accessible to researchers. IMP does not require any registration or installation and is freely available for use at http://imp.princeton.edu.
Subject(s)
Gene Regulatory Networks , Software , Animals , Computer Graphics , Disease/genetics , Genes , Genomics , Humans , Internet , Mice , Protein Interaction Mapping , Proteins/physiology , Rats , Systems Integration
ABSTRACT
Functional Networks of Tissues in Mouse (FNTM) provides biomedical researchers with tissue-specific predictions of functional relationships between proteins in the most widely used model organism for human disease, the laboratory mouse. Users can explore FNTM-predicted functional relationships for their tissues and genes of interest or examine gene function and interaction predictions across multiple tissues, all through an interactive, multi-tissue network browser. FNTM makes predictions based on integration of a variety of functional genomic data, including over 13 000 gene expression experiments, and prior knowledge of gene function. FNTM is an ideal starting point for clinical and translational researchers considering a mouse model for their disease of interest, researchers already working with mouse models who are interested in discovering new genes related to their pathways or phenotypes of interest, and biologists working with other organisms to explore the functional relationships of their genes of interest in specific mouse tissue contexts. FNTM predicts tissue-specific functional relationships in 200 tissues, does not require any registration or installation and is freely available for use at http://fntm.princeton.edu.
Subject(s)
Gene Regulatory Networks , Mice/genetics , Software , Animals , Internet , Organ Specificity
ABSTRACT
MOTIVATION: Leveraging the large compendium of genomic data to predict biomedical pathways and specific mechanisms of protein interactions genome-wide in metazoan organisms has been challenging. In contrast to unicellular organisms, biological and technical variation originating from diverse tissues and cell-lineages is often the largest source of variation in metazoan data compendia. Therefore, a new computational strategy accounting for the tissue heterogeneity in the functional genomic data is needed to accurately translate the vast amount of human genomic data into specific interaction-level hypotheses. RESULTS: We developed an integrated, scalable strategy for inferring multiple human gene interaction types that takes advantage of data from diverse tissue and cell-lineage origins. Our approach specifically predicts both the presence of a functional association and the most likely interaction type among human genes or their protein products on a whole-genome scale. We demonstrate that directly incorporating tissue contextual information improves the accuracy of our predictions, and further, that such genome-wide results can be used to significantly refine regulatory interactions from primary experimental datasets (e.g. ChIP-Seq, mass spectrometry). AVAILABILITY AND IMPLEMENTATION: An interactive website hosting all of our interaction predictions is publicly available at http://pathwaynet.princeton.edu. Software was implemented using the open-source Sleipnir library, which is available for download at https://bitbucket.org/libsleipnir/libsleipnir.bitbucket.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Algorithms , Computational Biology/methods , Gene Regulatory Networks , Genomics/methods , Protein Serine-Threonine Kinases/metabolism , Chromatin Immunoprecipitation , Humans , Organ Specificity , Phosphorylation , Protein Interaction Mapping , Protein Serine-Threonine Kinases/antagonists & inhibitors , Protein Serine-Threonine Kinases/genetics , RNA, Small Interfering/genetics , Signal Transduction , Software , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Rice (Oryza sativa) is the primary food source for more than one-half of the world's population. Because rice cultivation is dependent on water availability, drought during flowering severely affects grain yield. Here, we show that the function of a drought-inducible receptor-like cytoplasmic kinase, named GROWTH UNDER DROUGHT KINASE (GUDK), is required for grain yield under drought and well-watered conditions. Loss-of-function gudk mutant lines exhibit sensitivity to salinity, osmotic stress, and abscisic acid treatment at the seedling stage, and a reduction in photosynthesis and plant biomass under controlled drought stress at the vegetative stage. The gudk mutants interestingly showed a significant reduction in grain yield, both under normal well-watered conditions and under drought stress at the reproductive stage. Phosphoproteome profiling of the mutant followed by in vitro assays identified the transcription factor APETALA2/ETHYLENE RESPONSE FACTOR OsAP37 as a phosphorylation target of GUDK. The involvement of OsAP37 in regulating grain yield under drought through activation of several stress genes was previously shown. Our transactivation assays confirmed that GUDK is required for activation of stress genes by OsAP37. We propose that GUDK mediates drought stress signaling through phosphorylation and activation of OsAP37, resulting in transcriptional activation of stress-regulated genes, which impart tolerance and improve yield under drought. Our study reveals insights around drought stress signaling mediated by receptor-like cytoplasmic kinases, and also identifies a primary regulator of grain yield in rice that offers the opportunity to improve and stabilize rice grain yield under normal and drought stress conditions.
Subject(s)
Oryza/physiology , Plant Proteins/metabolism , Protein Kinases/metabolism , Droughts , Gene Expression Regulation, Plant , Germination , Oryza/genetics , Phosphorylation , Plant Proteins/genetics , Protein Kinases/genetics , Seedlings/physiology , Seeds/growth & development , Stress, Physiological/genetics , Transcription Factors/metabolism
ABSTRACT
The diagnosis of Parkinson's disease (PD) is usually not established until advanced neurodegeneration leads to clinically detectable symptoms. Previous blood PD transcriptome studies show low concordance, possibly resulting from the use of microarray technology, which has high measurement variation. The Leucine-rich repeat kinase 2 (LRRK2) G2019S mutation predisposes to PD. Using preclinical and clinical studies, we sought to develop a novel statistically motivated transcriptomic-based approach to identify a molecular signature in the blood of Ashkenazi Jewish PD patients, including LRRK2 mutation carriers. Using a digital gene expression platform to quantify 175 messenger RNA (mRNA) markers with low coefficients of variation (CV), we first compared whole-blood transcript levels in mouse models (1) overexpressing wild-type (WT) LRRK2, (2) overexpressing G2019S LRRK2, and (3) lacking LRRK2 (knockout), and in (4) WT controls. We then studied an Ashkenazi Jewish cohort of 34 symptomatic PD patients (both WT LRRK2 and G2019S LRRK2) and 32 asymptomatic controls. The expression profiles distinguished the four mouse groups with different genetic backgrounds. In patients, we detected significant differences in blood transcript levels both between individuals differing in LRRK2 genotype and between PD patients and controls. Discriminatory PD markers included genes associated with innate and adaptive immunity and inflammatory disease. Notably, gene expression patterns in levodopa-treated PD patients were significantly closer to those of healthy controls in a dose-dependent manner. We identify whole-blood mRNA signatures correlating with LRRK2 genotype and with PD disease state. This approach may provide insight into pathogenesis and a route to early disease detection.