Search | BVS Paraguay Regional Portal

1.

Heuristic energy-based cyclic peptide design.

Zhu, Qiyao; Mulligan, Vikram Khipple; Shasha, Dennis.

bioRxiv ; 2024 Jul 04.

Article in English | MEDLINE | ID: mdl-39005429

ABSTRACT

Rational computational design is crucial to the pursuit of novel drugs and therapeutic agents. Meso-scale cyclic peptides, which consist of 7-40 amino acid residues, are of particular interest due to their conformational rigidity, binding specificity, degradation resistance, and potential cell permeability. Because there are few natural cyclic peptides, de novo design involving non-canonical amino acids is a potentially useful goal. Here, we develop an efficient pipeline (CyclicChamp) for cyclic peptide design. After converting the cyclic constraint into an error function, we employ a variant of simulated annealing to search for low-energy peptide backbones while maintaining peptide closure. Compared to the previous random sampling approach, which was capable of sampling conformations of cyclic peptides of up to 14 residues, our method both greatly accelerates the computation speed for sampling conformations of small macrocycles (ca. 7 residues), and addresses the high-dimensionality challenge that large macrocycle designs often encounter. As a result, CyclicChamp makes conformational sampling tractable for 15- to 24-residue cyclic peptides, thus permitting the design of macrocycles in this size range. Microsecond-length molecular dynamics simulations on the resulting 15, 20, and 24 amino acid cyclic designs identify trajectories with kinetic stability. To test their thermodynamic stability, we perform additional replica exchange molecular dynamics simulations and generate free energy surfaces. Two 15-residue designs and one 20-residue design emerge as promising candidates, along with one viable 24-residue candidate.
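The abstract describes two steps: turning the cyclic-closure constraint into an error function, then searching for low-energy backbones with a variant of simulated annealing. Below is a minimal toy sketch of that idea. It is not CyclicChamp and makes several assumptions. The chain is 2D, built from unit-length bonds described by turning angles. A hypothetical `toy_energy` stands in for a real force field. The closure penalty is the squared gap between the chain's end and its start. All function names and parameters (`closure_weight`, the geometric cooling schedule, the step size) are illustrative.

```python
import math
import random

def bond_vectors(turns):
    """Unit bond vectors of a toy 2D chain; bond i points along the cumulative turn angle."""
    heading = 0.0
    vecs = []
    for t in turns:
        heading += t
        vecs.append((math.cos(heading), math.sin(heading)))
    return vecs

def closure_error(turns):
    """Squared gap between the chain's end and its start (0 for a closed ring)."""
    vecs = bond_vectors(turns)
    x = sum(v[0] for v in vecs)
    y = sum(v[1] for v in vecs)
    return x * x + y * y

def toy_energy(turns):
    """Hypothetical stand-in for a force-field energy: penalizes sharp turns."""
    return sum(1.0 - math.cos(t) for t in turns)

def objective(turns, closure_weight=10.0):
    """Energy plus the closure constraint expressed as a weighted error term."""
    return toy_energy(turns) + closure_weight * closure_error(turns)

def anneal(n, steps=20000, t_start=2.0, t_end=1e-3, seed=0):
    """Simulated annealing over turn angles; returns the best state seen and its objective."""
    rng = random.Random(seed)
    turns = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
    cur = objective(turns)
    best, best_obj = list(turns), cur
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (k / (steps - 1))
        i = rng.randrange(n)
        old = turns[i]
        turns[i] = old + rng.gauss(0.0, 0.3)
        new = objective(turns)
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones.
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new
            if cur < best_obj:
                best, best_obj = list(turns), cur
        else:
            turns[i] = old  # reject: restore the previous angle
    return best, best_obj
```

Raising `closure_weight` trades energy for tighter ring closure. A real design pipeline would work on 3D backbone dihedrals with a molecular force field.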

2.

Corrigendum: Bipartite networks represent causality better than simple networks: Evidence, algorithms, and applications.

Shen, Bingran; Coruzzi, Gloria M; Shasha, Dennis.

Front Genet ; 15: 1440665, 2024.

Article in English | MEDLINE | ID: mdl-38957809

ABSTRACT

[This corrects the article DOI: 10.3389/fgene.2024.1371607.].

3.

Bipartite networks represent causality better than simple networks: evidence, algorithms, and applications.

Shen, Bingran; Coruzzi, Gloria; Shasha, Dennis.

Front Genet ; 15: 1371607, 2024.

Article in English | MEDLINE | ID: mdl-38798697

ABSTRACT

A network, whose nodes are genes and whose directed edges represent positive or negative influences of a regulatory gene on its targets, is often used as a representation of causality. To infer a network, researchers often develop a machine learning model and then evaluate the model based on its match with experimentally verified "gold standard" edges. The desired result of such a model is a network that may extend the gold standard edges. Since networks are a form of visual representation, one can compare their utility with architectural or machine blueprints. Blueprints are clearly useful because they provide precise guidance to builders in construction. If the primary role of gene regulatory networks is to characterize causality, then such networks should be good tools of prediction because prediction is the actionable benefit of knowing causality. But are they? In this paper, we compare prediction quality based on "gold standard" regulatory edges from previous experimental work with non-linear models inferred from time series data across four different species. We show that the same non-linear machine learning models have better predictive performance, with improvements from 5.3% to 25.3% in terms of the reduction in the root mean square error (RMSE) compared with the same models based on the gold standard edges. Having established that networks fail to characterize causality properly, we suggest that causality research should focus on four goals: (i) predictive accuracy; (ii) a parsimonious enumeration of predictive regulatory genes for each target gene g; (iii) the identification of disjoint sets of predictive regulatory genes for each target g of roughly equal accuracy; and (iv) the construction of a bipartite network (whose node types are genes and models) representation of causality. We provide algorithms for all goals.
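Goals (ii) to (iv) can be made concrete with a small data-structure sketch. The class and method names are hypothetical, not the authors' code. Model nodes sit between regulator genes and a target gene. A check enforces disjoint regulator sets per target. A helper computes the relative RMSE reduction in the sense used above.

```python
from dataclasses import dataclass, field

@dataclass
class BipartiteCausalNetwork:
    """Hypothetical sketch: gene nodes and model nodes; each model predicts one target gene."""
    # model_id -> (target gene, frozenset of regulator genes, model RMSE)
    models: dict = field(default_factory=dict)

    def add_model(self, model_id, target, regulators, rmse):
        self.models[model_id] = (target, frozenset(regulators), rmse)

    def models_for(self, target):
        return [m for m, (t, _, _) in self.models.items() if t == target]

    def regulators_disjoint(self, target):
        """Goal (iii): regulator sets of the models for one target should not overlap."""
        seen = set()
        for m in self.models_for(target):
            regs = self.models[m][1]
            if seen & regs:
                return False
            seen |= regs
        return True

    def edges(self):
        """Bipartite edges: regulator gene -> model, and model -> target gene."""
        out = []
        for m, (t, regs, _) in sorted(self.models.items()):
            out.extend((g, m) for g in sorted(regs))
            out.append((m, t))
        return out

def rmse_reduction(rmse_baseline, rmse_new):
    """Relative RMSE reduction in percent; a drop from 1.0 to 0.8 is a 20% improvement."""
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline
```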

4.

Phylogenetically diverse wild plant species use common biochemical strategies to thrive in the Atacama Desert.

Dussarrat, Thomas; Nilo-Poyanco, Ricardo; Moyano, Tomás C; Prigent, Sylvain; Jeffers, Tim L; Díaz, Francisca P; Decros, Guillaume; Audi, Lauren; Sondervan, Veronica M; Shen, Bingran; Araus, Viviana; Rolin, Dominique; Shasha, Dennis; Coruzzi, Gloria M; Gibon, Yves; Latorre, Claudio; Pétriacq, Pierre; Gutiérrez, Rodrigo A.

J Exp Bot ; 75(11): 3596-3611, 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38477678

ABSTRACT

The best ideotypes are under mounting pressure due to increased aridity. Understanding the conserved molecular mechanisms that evolve in wild plants adapted to harsh environments is crucial in developing new strategies for agriculture. Yet our knowledge of such mechanisms in wild species is scant. We performed metabolic pathway reconstruction using transcriptome information from 32 Atacama and phylogenetically related species that do not live in Atacama (sister species). We analyzed reaction enrichment to understand the commonalities and differences of Atacama plants. To gain insights into the mechanisms that ensure survival, we compared expressed gene isoform numbers and gene expression patterns between the annotated biochemical reactions from 32 Atacama and sister species. We found biochemical convergences characterized by reactions enriched in at least 50% of the Atacama species, pointing to potential advantages against drought and nitrogen starvation, for instance. These findings suggest that the adaptation in the Atacama Desert may result in part from shared genetic legacies governing the expression of key metabolic pathways to face harsh conditions. Enriched reactions corresponded to ubiquitous compounds common to extreme and agronomic species and were congruent with our previous metabolomic analyses. Convergent adaptive traits offer promising candidates for improving abiotic stress resilience in crop species.

Subject(s)

Desert Climate , Phylogeny , Transcriptome , Chile , Adaptation, Physiological , Metabolic Networks and Pathways

5.

EnsInfer: a simple ensemble approach to network inference outperforms any single method.

Shen, Bingran; Coruzzi, Gloria; Shasha, Dennis.

BMC Bioinformatics ; 24(1): 114, 2023 Mar 24.

Article in English | MEDLINE | ID: mdl-36964499

ABSTRACT

This study evaluates both a variety of existing base causal inference methods and a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across different datasets, so a method that works poorly on one dataset may work well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to as good or better results than using the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on training data. The resulting ensemble model EnsInfer easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.

Subject(s)

Algorithms , Software , Bayes Theorem , RNA-Seq
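The ensemble step can be sketched under two assumptions. Each candidate regulatory edge is described by a vector of scores from the base inference methods. Each training edge is labeled 1 or 0 against known edges. The sketch implements a plain Gaussian Naive Bayes classifier from scratch. The Gaussian likelihood, the class name, and the variance floor `eps` are assumptions. The normality-based selection of base methods described above is not reproduced.

```python
import math

class GaussianNBEnsemble:
    """Minimal Gaussian Naive Bayes over base-method edge scores (a sketch, not EnsInfer's code)."""

    def fit(self, X, y, eps=1e-9):
        # Assumes both classes (0 = no edge, 1 = edge) occur in the training labels.
        self.stats = {}
        n = len(y)
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            prior = len(rows) / n
            means = [sum(col) / len(rows) for col in zip(*rows)]
            # Per-feature variance with a small floor to avoid division by zero.
            vars_ = [sum((v - m) ** 2 for v in col) / len(rows) + eps
                     for col, m in zip(zip(*rows), means)]
            self.stats[c] = (prior, means, vars_)
        return self

    def _log_joint(self, x, c):
        """log P(class) + sum of per-feature Gaussian log-likelihoods (naive independence)."""
        prior, means, vars_ = self.stats[c]
        s = math.log(prior)
        for v, m, var in zip(x, means, vars_):
            s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return s

    def predict_proba(self, x):
        """Posterior probability that the candidate edge is real (class 1)."""
        l0, l1 = self._log_joint(x, 0), self._log_joint(x, 1)
        mx = max(l0, l1)  # subtract the max before exponentiating for stability
        p0, p1 = math.exp(l0 - mx), math.exp(l1 - mx)
        return p1 / (p0 + p1)
```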

6.

Cell-by-cell dissection of phloem development links a maturation gradient to cell specialization.

Roszak, Pawel; Heo, Jung-Ok; Blob, Bernhard; Toyokura, Koichi; Sugiyama, Yuki; de Luis Balaguer, Maria Angels; Lau, Winnie W Y; Hamey, Fiona; Cirrone, Jacopo; Madej, Ewelina; Bouatta, Alida M; Wang, Xin; Guichard, Marjorie; Ursache, Robertas; Tavares, Hugo; Verstaen, Kevin; Wendrich, Jos; Melnyk, Charles W; Oda, Yoshihisa; Shasha, Dennis; Ahnert, Sebastian E; Saeys, Yvan; De Rybel, Bert; Heidstra, Renze; Scheres, Ben; Grossmann, Guido; Mähönen, Ari Pekka; Denninger, Philipp; Göttgens, Berthold; Sozzani, Rosangela; Birnbaum, Kenneth D; Helariutta, Yrjö.

Science ; 374(6575): eaba5531, 2021 Dec 24.

Article in English | MEDLINE | ID: mdl-34941412

ABSTRACT

In the plant meristem, tissue-wide maturation gradients are coordinated with specialized cell networks to establish various developmental phases required for indeterminate growth. Here, we used single-cell transcriptomics to reconstruct the protophloem developmental trajectory from the birth of cell progenitors to terminal differentiation in the Arabidopsis thaliana root. PHLOEM EARLY DNA-BINDING-WITH-ONE-FINGER (PEAR) transcription factors mediate lineage bifurcation by activating guanosine triphosphatase signaling and prime a transcriptional differentiation program. This program is initially repressed by a meristem-wide gradient of PLETHORA transcription factors. Only the dissipation of the PLETHORA gradient permits activation of the differentiation program that involves mutual inhibition of early versus late meristem regulators. Thus, for phloem development, broad maturation gradients interface with cell-type-specific transcriptional regulators to stage cellular differentiation.

Subject(s)

Arabidopsis Proteins/metabolism , Arabidopsis/cytology , Phloem/cytology , Phloem/growth & development , Plant Roots/cytology , Transcription Factors/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Cell Differentiation , GTP-Binding Proteins/genetics , GTP-Binding Proteins/metabolism , Meristem/cytology , Phloem/genetics , Phloem/metabolism , Plant Roots/genetics , Plant Roots/growth & development , Plant Roots/metabolism , RNA-Seq , Signal Transduction , Single-Cell Analysis , Transcription Factors/genetics , Transcriptome

7.

Pheniqs 2.0: accurate, high-performance Bayesian decoding and confidence estimation for combinatorial barcode indexing.

Galanti, Lior; Shasha, Dennis; Gunsalus, Kristin C.

BMC Bioinformatics ; 22(1): 359, 2021 Jul 02.

Article in English | MEDLINE | ID: mdl-34215187

ABSTRACT

BACKGROUND: Systems biology increasingly relies on deep sequencing with combinatorial index tags to associate biological sequences with their sample, cell, or molecule of origin. Accurate data interpretation depends on the ability to classify sequences based on correct decoding of these combinatorial barcodes. The probability of correct decoding is influenced by both sequence quality and the number and arrangement of barcodes. The rising complexity of experimental designs calls for a probability model that accounts for both sequencing errors and random noise, generalizes to multiple combinatorial tags, and can handle any barcoding scheme. The needs for reproducibility and community benchmark standards demand a peer-reviewed tool that preserves decoding quality scores and provides tunable control over classification confidence that balances precision and recall. Moreover, continuous improvements in sequencing throughput require a fast, parallelized and scalable implementation. RESULTS AND DISCUSSION: We developed flexible, robustly engineered software that performs probabilistic decoding and supports arbitrarily complex barcoding designs. Pheniqs computes the full posterior decoding error probability of observed barcodes by consulting basecalling quality scores and prior distributions, and reports sequences and confidence scores in Sequence Alignment/Map (SAM) fields. The product of posteriors for multiple independent barcodes provides an overall confidence score for each read. Pheniqs achieves greater accuracy than minimum edit distance or simple maximum likelihood estimation, and it scales linearly with core count to enable the classification of > 11 billion reads in 1 h 15 min using < 50 megabytes of memory. Pheniqs has been in production use for seven years in our genomics core facility.
CONCLUSION: We introduce computationally efficient software that implements both probabilistic and minimum distance decoders and show that decoding barcodes using posterior probabilities is more accurate than available methods. Pheniqs allows fine-tuning of decoding sensitivity using intuitive confidence thresholds and is extensible with alternative decoders and new error models. Any arbitrary arrangement of barcodes is easily configured, enabling computation of combinatorial confidence scores for any barcoding strategy. An optimized multithreaded implementation assures that Pheniqs is faster and scales better with complex barcode sets than existing tools. Support for POSIX streams and multiple sequencing formats enables easy integration with automated analysis pipelines.

Subject(s)

Automatic Data Processing , High-Throughput Nucleotide Sequencing , Bayes Theorem , DNA Barcoding, Taxonomic , Reproducibility of Results , Sequence Analysis, DNA , Software
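The posterior decoding idea can be sketched with a simplified model. This is not Pheniqs' implementation or configuration, and it rests on three assumptions. Phred scores are converted to per-base error probabilities. A mismatched base is equally likely to be any of the three other bases. Random noise is modeled as uniformly random bases with its own prior. Function names are hypothetical.

```python
import math

def phred_to_error(q):
    """Phred quality score -> probability that the base call is wrong."""
    return 10 ** (-q / 10)

def likelihood(read, quals, barcode):
    """P(observed bases | true barcode), assuming independent per-base errors
    spread uniformly over the three other bases."""
    p = 1.0
    for base, q, expected in zip(read, quals, barcode):
        e = phred_to_error(q)
        p *= (1 - e) if base == expected else e / 3
    return p

def decode(read, quals, priors, noise_prior):
    """Return (best_barcode, posterior probability) under the simple model above.
    priors: dict barcode -> prior probability; noise_prior: prior that the read
    matches no barcode, modeled as uniformly random bases."""
    noise_term = noise_prior * 0.25 ** len(read)
    weighted = {b: p * likelihood(read, quals, b) for b, p in priors.items()}
    evidence = sum(weighted.values()) + noise_term
    best = max(weighted, key=weighted.get)
    return best, weighted[best] / evidence

def combined_confidence(posteriors):
    """Overall confidence for a read carrying several independent barcodes."""
    return math.prod(posteriors)
```

The decoding error probability for a barcode is `1 - posterior`. A confidence threshold on the posterior then controls the precision/recall trade-off.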

8.

SafePredict: A Meta-Algorithm for Machine Learning That Uses Refusals to Guarantee Correctness.

Kocak, Mustafa A; Ramirez, David; Erkip, Elza; Shasha, Dennis E.

IEEE Trans Pattern Anal Mach Intell ; 43(2): 663-678, 2021 Feb.

Article in English | MEDLINE | ID: mdl-31380747

ABSTRACT

SafePredict is a novel meta-algorithm that works with any base prediction algorithm for online data to guarantee an arbitrarily chosen correctness rate, 1 - ε, by allowing refusals. Allowing refusals means that the meta-algorithm may refuse to emit a prediction produced by the base algorithm so that the error rate on non-refused predictions does not exceed ε. The SafePredict error bound does not rely on any assumptions on the data distribution or the base predictor. When the base predictor happens not to exceed the target error rate ε, SafePredict refuses only a finite number of times. When the error rate of the base predictor changes through time, SafePredict makes use of a weight-shifting heuristic that adapts to these changes without knowing when the changes occur yet still maintains the correctness guarantee. Empirical results show that (i) SafePredict compares favorably with state-of-the-art confidence-based refusal mechanisms, which fail to offer robust error guarantees; and (ii) combining SafePredict with such refusal mechanisms can in many cases further reduce the number of refusals. Our software is included in the supplementary material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2019.2932415.
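As a rough illustration of trading predictions for refusals, the sketch below mixes the base predictor with a "refuse" expert using exponential weights. The refuse expert's per-round loss is fixed at ε. This follows the spirit of the description above but is not the published SafePredict algorithm. The learning rate `eta`, the class name, and the omission of the weight-shifting heuristic are all assumptions.

```python
import math

class RefusalWrapper:
    """Exponential-weights mix of a base predictor and a 'refuse' expert whose
    per-round loss is the target error rate epsilon (simplified sketch)."""

    def __init__(self, epsilon, eta=0.1):
        self.epsilon = epsilon
        self.eta = eta
        # Log-weights avoid underflow on long data streams.
        self.log_w_base = 0.0
        self.log_w_refuse = 0.0

    def prob_predict(self):
        """Probability of emitting the base prediction this round: w_base / (w_base + w_refuse)."""
        d = self.log_w_base - self.log_w_refuse
        # Numerically stable logistic function of the log-weight difference.
        if d >= 0:
            return 1.0 / (1.0 + math.exp(-d))
        z = math.exp(d)
        return z / (1.0 + z)

    def update(self, base_was_correct):
        """After the true label is revealed, charge each expert its loss."""
        base_loss = 0.0 if base_was_correct else 1.0
        self.log_w_base -= self.eta * base_loss
        self.log_w_refuse -= self.eta * self.epsilon
```

On each round, one would emit the base prediction when `random.random() < w.prob_predict()` and call `w.update(...)` once the label arrives. When the base predictor's error rate stays below ε, its weight comes to dominate in expectation and refusals become rare.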

9.

Author Correction: OutPredict: multiple datasets can improve prediction of expression and inference of causality.

Cirrone, Jacopo; Brooks, Matthew D; Bonneau, Richard; Coruzzi, Gloria M; Shasha, Dennis E.

Sci Rep ; 10(1): 14141, 2020 Aug 19.

Article in English | MEDLINE | ID: mdl-32811842

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

10.

Inexpensive, non-invasive biomarkers predict Alzheimer transition using machine learning analysis of the Alzheimer's Disease Neuroimaging (ADNI) database.

Beltrán, Juan Felipe; Wahba, Brandon Malik; Hose, Nicole; Shasha, Dennis; Kline, Richard P.

PLoS One ; 15(7): e0235663, 2020.

Article in English | MEDLINE | ID: mdl-32716914

ABSTRACT

The Alzheimer's Disease Neuroimaging Initiative (ADNI) database is an expansive undertaking by government, academia, and industry to pool resources and data on subjects at various stages of symptomatic severity due to Alzheimer's disease. As expected, magnetic resonance imaging is a major component of the project. Full brain images are obtained at every 6-month visit. A range of cognitive tests studying executive function and memory are employed less frequently. Two blood draws (baseline, 6 months) provide samples to measure concentrations of approximately 145 plasma biomarkers. In addition, other diagnostic measurements are performed, including PET imaging, cerebrospinal fluid measurements of amyloid-beta and tau peptides, as well as genetic tests, demographics, and vital signs. ADNI data are available upon review of an application. There have been numerous reports of how various processes evolve during AD progression, including alterations in metabolic and neuroendocrine activity, cell survival, and cognitive behavior. Lacking an analytic model at the outset, we leveraged recent advances in machine learning, which allow us to deal with large, non-linear systems with many variables. Of particular note was examining how well binary predictions of future disease states could be learned from simple, non-invasive measurements like those dependent on blood samples. Such measurements place relatively few demands on the time and effort of medical staff or patient. We report findings with recall/precision/area under the receiver operating characteristic curve after application of CART, Random Forest, Gradient Boosting, and Support Vector Machines. Our results show that (i) Random Forests and Gradient Boosting work very well with such data, and (ii) prediction quality on relatively easily obtained measurements (cognitive scores, genetic risk, and plasma biomarkers) is competitive with that of magnetic resonance techniques.
This is by no means an exhaustive study, but instead an exploration of the plausibility of defining a series of relatively inexpensive, broad, population-based tests.

Subject(s)

Alzheimer Disease/diagnosis , Biomarkers/metabolism , Brain/diagnostic imaging , Machine Learning , Neuroimaging/methods , Alzheimer Disease/metabolism , Alzheimer Disease/pathology , Apolipoprotein A-V/blood , Area Under Curve , Biomarkers/blood , Databases, Factual , Disease Progression , Humans , Magnetic Resonance Imaging , Principal Component Analysis , ROC Curve
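The evaluation metrics named in the abstract can be computed as in this sketch. The `scores` would come from any of the listed classifiers (CART, Random Forest, Gradient Boosting, SVM). The classifiers and the ADNI data are not reproduced, and the function names are hypothetical. AUC uses its rank interpretation: the probability that a randomly chosen positive outranks a randomly chosen negative.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparisons (ties count half).
    Quadratic in the number of subjects, which is fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```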

11.

OutPredict: multiple datasets can improve prediction of expression and inference of causality.

Cirrone, Jacopo; Brooks, Matthew D; Bonneau, Richard; Coruzzi, Gloria M; Shasha, Dennis E.

Sci Rep ; 10(1): 6804, 2020 Apr 22.

Article in English | MEDLINE | ID: mdl-32321967

ABSTRACT

The ability to accurately predict the causal relationships from transcription factors to genes would greatly enhance our understanding of transcriptional dynamics. This could lead to applications in which one or more transcription factors could be manipulated to effect a change in genes leading to the enhancement of some desired trait. Here we present a method called OutPredict that constructs a model for each gene based on time series (and other) data and that predicts each gene's expression at a previously unseen subsequent time point. The model also infers causal relationships based on the most important transcription factors for each gene model, some of which have been validated from previous physical experiments. The method benefits from known network edges and steady-state data to enhance predictive accuracy. Our results across B. subtilis, Arabidopsis, E. coli, Drosophila and the DREAM4 simulated in silico dataset show improved predictive accuracy ranging from 40% to 60% over other state-of-the-art methods. We find that gene expression models can benefit from the addition of steady-state data to predict expression values of time series. Finally, we validate, based on limited available data, that the influential edges we infer correspond to known relationships significantly more than expected by chance or by state-of-the-art methods.

Subject(s)

Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks , Models, Genetic , Transcription Factors/genetics , Computer Simulation , Gene Expression Profiling/statistics & numerical data , Machine Learning , Reproducibility of Results
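The idea of one model per gene, trained on time-lagged pairs, can be sketched simply. Such a model predicts a gene's expression at the next time point from transcription factor levels at the current one. This sketch deliberately swaps in a much simpler technique than OutPredict. Candidate TFs are ranked by the absolute Pearson correlation between TF(t) and target(t+1), rather than by the importance scores of a trained predictive model. The data layout and function names are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_regulators(series, target, tfs):
    """Rank candidate TFs for one target gene by |corr(TF at t, target at t+1)|.
    series: dict gene -> expression values at consecutive time points."""
    y_next = series[target][1:]          # target shifted one step into the future
    scores = {}
    for tf in tfs:
        if tf == target:
            continue
        scores[tf] = abs(pearson(series[tf][:-1], y_next))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```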

12.

TACITuS: transcriptomic data collector, integrator, and selector on big data platform.

Alaimo, Salvatore; Di Maria, Antonio; Shasha, Dennis; Ferro, Alfredo; Pulvirenti, Alfredo.

BMC Bioinformatics ; 20(Suppl 9): 366, 2019 Nov 22.

Article in English | MEDLINE | ID: mdl-31757212

ABSTRACT

BACKGROUND: Several large public repositories of microarray datasets and RNA-seq data are available. Two prominent examples are ArrayExpress and NCBI GEO. Unfortunately, there is no easy way to import and manipulate data from such resources, because the data is stored in large files, requiring large bandwidth to download and special purpose data manipulation tools to extract subsets relevant for the specific analysis. RESULTS: TACITuS is a web-based system that supports rapid query access to high-throughput microarray and NGS repositories. The system is equipped with modules capable of managing large files, storing them in a cloud environment and extracting subsets of data in an easy and efficient way. The system also supports the ability to import data into Galaxy for further analysis. CONCLUSIONS: TACITuS automates most of the pre-processing needed to analyze high-throughput microarray and NGS data from large publicly-available repositories. The system implements several modules to manage large files in an easy and efficient way. Furthermore, it can interact with the Galaxy environment, allowing users to analyze data through a user-friendly interface.

Subject(s)

Big Data , Data Collection , Software , Transcriptome/genetics , Cell Line, Tumor , Databases, Genetic , Humans , User-Computer Interface

13.

Network Walking charts transcriptional dynamics of nitrogen signaling by integrating validated and predicted genome-wide interactions.

Brooks, Matthew D; Cirrone, Jacopo; Pasquino, Angelo V; Alvarez, Jose M; Swift, Joseph; Mittal, Shipra; Juang, Che-Lun; Varala, Kranthi; Gutiérrez, Rodrigo A; Krouk, Gabriel; Shasha, Dennis; Coruzzi, Gloria M.

Nat Commun ; 10(1): 1569, 2019 Apr 05.

Article in English | MEDLINE | ID: mdl-30952851

ABSTRACT

Charting a temporal path in gene networks requires linking early transcription factor (TF)-triggered events to downstream effects. We scale up a cell-based TF-perturbation assay to identify directly regulated targets of 33 nitrogen (N)-early response TFs encompassing 88% of N-responsive Arabidopsis genes. We uncover a duality where each TF acts as both an inducer and a repressor, and in vitro cis-motifs are typically specific to regulation directionality. Validated TF-targets (71,836) are used to refine precision of a time-inferred root network, connecting 145 N-responsive TFs and 311 targets. These data are used to chart network paths from direct TF1-regulated targets identified in cells to indirect targets responding only in planta via Network Walking. We uncover network paths from TGA1 and CRF4 to direct TF2 targets, which in turn regulate 76% and 87% of TF1 indirect targets in planta, respectively. These results have implications for N-use and the approach can reveal temporal networks for any biological system.

Subject(s)

Arabidopsis/genetics , Gene Regulatory Networks , Nitrogen/metabolism , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Arabidopsis Proteins/physiology , Basic-Leucine Zipper Transcription Factors/genetics , Basic-Leucine Zipper Transcription Factors/metabolism , Basic-Leucine Zipper Transcription Factors/physiology , Gene Expression Regulation, Plant , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription Factors/physiology
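The coverage figures quoted above (76% and 87%) can be read as fractions of a TF1's indirect targets reached through the TF2s it directly regulates. Below is a minimal sketch of that one-hop computation. It assumes target sets are available as Python sets. The function name and the gene names in the test are made up.

```python
def walk_coverage(tf1_direct, tf1_indirect, tf_targets):
    """Fraction of TF1's indirect targets reached in one extra hop through
    TFs (TF2s) that TF1 directly regulates.
    tf1_direct: set of genes directly regulated by TF1 (e.g. from a cell-based assay)
    tf1_indirect: set of genes responding to TF1 only in planta
    tf_targets: dict TF -> set of its validated direct targets"""
    tf2s = [g for g in tf1_direct if g in tf_targets]   # direct targets that are TFs
    reached = set()
    for tf2 in tf2s:
        reached |= tf_targets[tf2]
    covered = reached & tf1_indirect
    fraction = len(covered) / len(tf1_indirect) if tf1_indirect else 0.0
    return covered, fraction
```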

14.

Fast Subgraph Matching Strategies Based on Pattern-Only Heuristics.

Aparo, Antonino; Bonnici, Vincenzo; Micale, Giovanni; Ferro, Alfredo; Shasha, Dennis; Pulvirenti, Alfredo; Giugno, Rosalba.

Interdiscip Sci ; 11(1): 21-32, 2019 Mar.

Article in English | MEDLINE | ID: mdl-30790228

ABSTRACT

Many scientific applications entail solving the subgraph isomorphism problem, i.e., given an input pattern graph, find all the subgraphs of a (usually much larger) target graph that are structurally equivalent to that input. Because subgraph isomorphism is NP-complete, methods to solve it have to use heuristics. This work evaluates subgraph isomorphism methods to assess their computational behavior on a wide range of synthetic and real graphs. Surprisingly, our experiments show that, among the leading algorithms, certain heuristics based only on pattern graphs are the most efficient.

Subject(s)

Algorithms , Computational Biology/methods , Computer Heuristics , Humans , Software
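A compact backtracking matcher shows where a pattern-only heuristic enters. The order in which pattern nodes are matched is computed from the pattern graph alone, before the target is consulted. The ordering rule below is one illustrative choice and not necessarily the one the paper found fastest. It picks the node with the most already-ordered neighbors first, then the highest degree. The matcher finds non-induced matches. An induced variant would also check that pattern non-edges map to target non-edges. Function names are hypothetical.

```python
def pattern_order(pattern):
    """Order pattern nodes using only the pattern: prefer nodes with the most
    already-ordered neighbours, then higher degree (name breaks remaining ties)."""
    remaining = set(pattern)
    order = []
    while remaining:
        best = max(remaining, key=lambda u: (sum(1 for v in pattern[u] if v in order),
                                             len(pattern[u]), str(u)))
        order.append(best)
        remaining.remove(best)
    return order

def find_matches(pattern, target):
    """All injective maps pattern -> target that preserve pattern edges
    (non-induced subgraph isomorphism). Graphs are dicts: node -> set of neighbours."""
    order = pattern_order(pattern)
    matches = []

    def extend(mapping, used):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        u = order[len(mapping)]
        for cand in target:
            # Cheap pruning: cand must be unused and have at least u's degree.
            if cand in used or len(target[cand]) < len(pattern[u]):
                continue
            # Every already-mapped neighbour of u must map to a neighbour of cand.
            if all(mapping[v] in target[cand] for v in pattern[u] if v in mapping):
                mapping[u] = cand
                used.add(cand)
                extend(mapping, used)
                del mapping[u]
                used.remove(cand)

    extend({}, set())
    return matches
```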

15.

SuperNoder: a tool to discover over-represented modular structures in networks.

Dessì, Danilo; Cirrone, Jacopo; Recupero, Diego Reforgiato; Shasha, Dennis.

BMC Bioinformatics ; 19(1): 318, 2018 Sep 10.

Article in English | MEDLINE | ID: mdl-30200901

ABSTRACT

BACKGROUND: Networks whose nodes have labels can seem complex. Fortunately, many have substructures that occur often ("motifs"). A societal example of a motif might be a household. Replacing such motifs by named supernodes reduces the complexity of the network and can bring out insightful features. Doing so repeatedly may give hints about higher level structures of the network. We call this recursive process Recursive Supernode Extraction. RESULTS: This paper describes algorithms and a tool to discover disjoint (i.e. non-overlapping) motifs in a network, replacing those motifs by new nodes, and then recursing. We show applications in food-web and protein-protein interaction (PPI) networks where our methods reduce the complexity of the network and yield insights. CONCLUSIONS: SuperNoder is a web-based and standalone tool which enables the simplification of big graphs based on the reduction of high frequency motifs. It applies various strategies for identifying disjoint motifs with the goal of enhancing the understandability of networks.

Subject(s)

Algorithms , Computational Biology/methods , Metabolic Networks and Pathways , Protein Interaction Maps , Software , Humans
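One round of supernode extraction can be sketched for the simplest motif, a single labeled edge. This is a hypothetical illustration, not SuperNoder's algorithm. It greedily selects non-overlapping occurrences, replaces each with a supernode, and keeps edges to the rest of the graph. Node ids are assumed to be strings, and the `super<i>` naming is an assumption.

```python
from collections import Counter

def label_pair(labels, u, v):
    """Unordered label pair of an edge, used as a one-edge motif signature."""
    return tuple(sorted((labels[u], labels[v])))

def most_frequent_edge_motif(labels, edges):
    """Most common one-edge motif, or None for an empty graph."""
    counts = Counter(label_pair(labels, u, v) for u, v in edges)
    return counts.most_common(1)[0][0] if counts else None

def collapse_motif(labels, edges, motif):
    """Greedily pick disjoint edges matching `motif`, replace each by a supernode,
    and reconnect outside edges to it. Returns (new_labels, new_edges)."""
    used, groups = set(), []
    for u, v in sorted(edges):
        if label_pair(labels, u, v) == motif and u not in used and v not in used:
            used.update((u, v))
            groups.append((u, v))
    owner = {n: n for n in labels}
    new_labels = {n: lab for n, lab in labels.items() if n not in used}
    for i, (u, v) in enumerate(groups):
        sid = f"super{i}"  # hypothetical naming; assumes no existing node uses it
        new_labels[sid] = "(" + "-".join(motif) + ")"
        owner[u] = owner[v] = sid
    new_edges = set()
    for u, v in edges:
        a, b = owner[u], owner[v]
        if a != b:  # edges inside a supernode disappear
            new_edges.add(tuple(sorted((a, b))))
    return new_labels, sorted(new_edges)
```

Calling the two functions repeatedly on their own output gives a recursive process like the one described above.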

16.

Temporal transcriptional logic of dynamic regulatory networks underlying nitrogen signaling and use in plants.

Varala, Kranthi; Marshall-Colón, Amy; Cirrone, Jacopo; Brooks, Matthew D; Pasquino, Angelo V; Léran, Sophie; Mittal, Shipra; Rock, Tara M; Edwards, Molly B; Kim, Grace J; Ruffel, Sandrine; McCombie, W Richard; Shasha, Dennis; Coruzzi, Gloria M.

Proc Natl Acad Sci U S A ; 115(25): 6494-6499, 2018 Jun 19.

Article in English | MEDLINE | ID: mdl-29769331

ABSTRACT

This study exploits time, the relatively unexplored fourth dimension of gene regulatory networks (GRNs), to learn the temporal transcriptional logic underlying dynamic nitrogen (N) signaling in plants. Our "just-in-time" analysis of time-series transcriptome data uncovered a temporal cascade of cis elements underlying dynamic N signaling. To infer transcription factor (TF)-target edges in a GRN, we applied a time-based machine learning method to 2,174 dynamic N-responsive genes. We experimentally determined a network precision cutoff, using TF-regulated genome-wide targets of three TF hubs (CRF4, SNZ, and CDF1), used to "prune" the network to 155 TFs and 608 targets. This network precision was reconfirmed using genome-wide TF-target regulation data for four additional TFs (TGA1, HHO5/6, and PHL1) not used in network pruning. These higher-confidence edges in the GRN were further filtered by independent TF-target binding data, used to calculate a TF "N-specificity" index. This refined GRN identifies the temporal relationship of known/validated regulators of N signaling (NLP7/8, TGA1/4, NAC4, HRS1, and LBD37/38/39) and 146 additional regulators. Six TFs validated herein (CRF4, SNZ, CDF1, HHO5/6, and PHL1) regulate a significant number of genes in the dynamic N response, targeting 54% of N-uptake/assimilation pathway genes. Phenotypically, inducible overexpression of CRF4 in planta regulates genes resulting in altered biomass, root development, and ¹⁵NO₃⁻ uptake, specifically under low-N conditions. This dynamic N-signaling GRN now provides the temporal "transcriptional logic" for 155 candidate TFs to improve nitrogen use efficiency with potential agricultural applications. Broadly, these time-based approaches can uncover the temporal transcriptional logic for any biological response system in biology, agriculture, or medicine.

Subject(s)

Arabidopsis/genetics , Arabidopsis/metabolism , Gene Expression Regulation, Plant/genetics , Gene Regulatory Networks/genetics , Nitrogen/metabolism , Transcription, Genetic/genetics , Arabidopsis Proteins/genetics , Gene Expression Profiling/methods , Logic , Protein Binding/genetics , Signal Transduction/genetics , Transcription Factors/genetics
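The precision-based pruning step can be sketched in two parts. First, choose the lowest edge-score cutoff at which inferred edges for the experimentally profiled TFs meet a target precision. Then apply that cutoff to the whole network. The score scale, the `min_precision` parameter, and the function names are assumptions. The sketch does not reproduce how the study actually set its cutoff.

```python
def precision_at(edges, validated, cutoff):
    """Precision of edges scoring >= cutoff, counted only for TFs that have
    experimentally determined targets (the keys of `validated`).
    edges: dict (tf, target) -> score; validated: dict tf -> set of targets."""
    kept = [(tf, g) for (tf, g), s in edges.items() if s >= cutoff and tf in validated]
    if not kept:
        return 0.0
    hits = sum(1 for tf, g in kept if g in validated[tf])
    return hits / len(kept)

def choose_cutoff(edges, validated, min_precision):
    """Lowest observed score whose retained edges reach min_precision, or None."""
    for cutoff in sorted(set(edges.values())):
        if precision_at(edges, validated, cutoff) >= min_precision:
            return cutoff
    return None

def prune(edges, cutoff):
    """Apply the cutoff to every TF, including those without validation data."""
    return {e: s for e, s in edges.items() if s >= cutoff}
```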

17.

An expanded evaluation of protein function prediction methods shows an improvement in accuracy.

Jiang, Yuxiang; Oron, Tal Ronnen; Clark, Wyatt T; Bankapur, Asma R; D'Andrea, Daniel; Lepore, Rosalba; Funk, Christopher S; Kahanda, Indika; Verspoor, Karin M; Ben-Hur, Asa; Koo, Da Chen Emily; Penfold-Brown, Duncan; Shasha, Dennis; Youngs, Noah; Bonneau, Richard; Lin, Alexandra; Sahraeian, Sayed M E; Martelli, Pier Luigi; Profiti, Giuseppe; Casadio, Rita; Cao, Renzhi; Zhong, Zhaolong; Cheng, Jianlin; Altenhoff, Adrian; Skunca, Nives; Dessimoz, Christophe; Dogan, Tunca; Hakala, Kai; Kaewphan, Suwisa; Mehryary, Farrokh; Salakoski, Tapio; Ginter, Filip; Fang, Hai; Smithers, Ben; Oates, Matt; Gough, Julian; Törönen, Petri; Koskinen, Patrik; Holm, Liisa; Chen, Ching-Tai; Hsu, Wen-Lian; Bryson, Kevin; Cozzetto, Domenico; Minneci, Federico; Jones, David T; Chapman, Samuel; Bkc, Dukka; Khan, Ishita K; Kihara, Daisuke; Ofer, Dan.

Genome Biol ; 17(1): 184, 2016 Sep 07.

Article in English | MEDLINE | ID: mdl-27604469

ABSTRACT

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

Subject(s)

Computational Biology , Proteins/chemistry , Software , Structure-Activity Relationship , Algorithms , Databases, Protein , Gene Ontology , Humans , Molecular Sequence Annotation , Proteins/genetics

18.

NetMatchStar: an enhanced Cytoscape network querying app.

Rinnone, Fabio; Micale, Giovanni; Bonnici, Vincenzo; Bader, Gary D; Shasha, Dennis; Ferro, Alfredo; Pulvirenti, Alfredo; Giugno, Rosalba.

F1000Res ; 4: 479, 2015.

Article in English | MEDLINE | ID: mdl-26594341

ABSTRACT

We present NetMatchStar, a Cytoscape app that finds all occurrences of a query graph in a network and checks its significance as a motif with respect to seven different random models. The query can be uploaded or built from scratch using Cytoscape facilities. The app significantly enhances the previous NetMatch in style, performance, and functionality. Notably, NetMatchStar allows queries with wildcards.

19.

Negative example selection for protein function prediction: the NoGO database.

Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis.

PLoS Comput Biol ; 10(6): e1003644, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24922051

ABSTRACT

Negative examples (genes that are known not to carry out a given protein function) are rarely recorded in genome and proteome annotation databases such as the Gene Ontology database. Negative examples are required, however, by several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, because the Open World Assumption, as applied to gene annotations, rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text classification, which are the current state-of-the-art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms make significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end, we provide a database of negative examples in several well-studied organisms for general use (the NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).

Subject(s)

Algorithms, Databases, Genetic, Gene Ontology, Proteins/genetics, Proteins/physiology, Animals, Arabidopsis Proteins/genetics, Arabidopsis Proteins/physiology, Artificial Intelligence, Computational Biology, Genome, Humans, Mice, Molecular Sequence Annotation, Proteome, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/physiology

20.

miR-Synth: a computational resource for the design of multi-site multi-target synthetic miRNAs.

Laganà, Alessandro; Acunzo, Mario; Romano, Giulia; Pulvirenti, Alfredo; Veneziano, Dario; Cascione, Luciano; Giugno, Rosalba; Gasparini, Pierluigi; Shasha, Dennis; Ferro, Alfredo; Croce, Carlo Maria.

Nucleic Acids Res ; 42(9): 5416-25, 2014 May.

Article in English | MEDLINE | ID: mdl-24627222

ABSTRACT

RNAi is a powerful tool for the regulation of gene expression. It is widely and successfully employed in functional studies and is now emerging as a promising therapeutic approach. Several RNAi-based clinical trials suggest encouraging results in the treatment of a variety of diseases, including cancer. Here we present miR-Synth, a computational resource for the design of synthetic microRNAs able to target multiple genes at multiple sites. The proposed strategy constitutes a valid alternative to the use of siRNA, allowing fewer molecules to be employed for the inhibition of multiple targets. This may represent a great advantage in designing therapies for diseases caused by crucial cellular pathways altered by multiple dysregulated genes. The system has been successfully validated on two of the most prominent genes associated with lung cancer, c-MET and Epidermal Growth Factor Receptor (EGFR). (See http://microrna.osumc.edu/mir-synth).

Subject(s)

Gene Knockdown Techniques, MicroRNAs/genetics, Software, 3' Untranslated Regions, Base Sequence, ErbB Receptors/biosynthesis, ErbB Receptors/genetics, Gene Expression, Genes, Reporter, HEK293 Cells, HeLa Cells, Humans, Luciferases, Renilla/biosynthesis, Luciferases, Renilla/genetics, Proto-Oncogene Proteins c-met/biosynthesis, Proto-Oncogene Proteins c-met/genetics, RNA Interference
