Results 1 - 20 of 46
1.
Front Genet ; 15: 1371607, 2024.
Article in English | MEDLINE | ID: mdl-38798697

ABSTRACT

A network, whose nodes are genes and whose directed edges represent positive or negative influences of a regulatory gene on its targets, is often used as a representation of causality. To infer a network, researchers often develop a machine learning model and then evaluate the model based on its match with experimentally verified "gold standard" edges. The desired result of such a model is a network that may extend the gold standard edges. Since networks are a form of visual representation, one can compare their utility with architectural or machine blueprints. Blueprints are clearly useful because they provide precise guidance to builders in construction. If the primary role of gene regulatory networks is to characterize causality, then such networks should be good tools of prediction because prediction is the actionable benefit of knowing causality. But are they? In this paper, we compare prediction quality based on "gold standard" regulatory edges from previous experimental work with non-linear models inferred from time series data across four different species. We show that the same non-linear machine learning models have better predictive performance, with improvements from 5.3% to 25.3% in terms of the reduction in the root mean square error (RMSE) compared with the same models based on the gold standard edges. Having established that networks fail to characterize causality properly, we suggest that causality research should focus on four goals: (i) predictive accuracy; (ii) a parsimonious enumeration of predictive regulatory genes for each target gene g; (iii) the identification of disjoint sets of predictive regulatory genes for each target g of roughly equal accuracy; and (iv) the construction of a bipartite network (whose node types are genes and models) representation of causality. We provide algorithms for all goals.
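
The abstract above reports improvements as a percentage reduction in RMSE. As a minimal sketch of how such a figure can be computed (the expression values, predictions, and function names below are hypothetical and are not taken from the paper):

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction_pct(rmse_baseline, rmse_model):
    """Percentage by which rmse_model improves on rmse_baseline."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Hypothetical held-out expression values and two sets of predictions.
observed = [1.0, 2.0, 3.0, 4.0]
pred_gold_edges = [1.5, 2.5, 2.5, 4.5]    # model restricted to gold-standard regulators
pred_inferred = [1.25, 2.25, 2.75, 4.25]  # model using regulators inferred from data

baseline = rmse(observed, pred_gold_edges)
improved = rmse(observed, pred_inferred)
reduction = rmse_reduction_pct(baseline, improved)
print(f"RMSE {baseline:.3f} -> {improved:.3f} ({reduction:.1f}% reduction)")
```

A positive value means the second model has the lower error; the paper's reported range (5.3% to 25.3%) would be values of this kind computed per dataset.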

2.
J Exp Bot ; 75(11): 3596-3611, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38477678

ABSTRACT

The best ideotypes are under mounting pressure due to increased aridity. Understanding the conserved molecular mechanisms that evolve in wild plants adapted to harsh environments is crucial in developing new strategies for agriculture. Yet our knowledge of such mechanisms in wild species is scant. We performed metabolic pathway reconstruction using transcriptome information from 32 Atacama and phylogenetically related species that do not live in Atacama (sister species). We analyzed reaction enrichment to understand the commonalities and differences of Atacama plants. To gain insights into the mechanisms that ensure survival, we compared expressed gene isoform numbers and gene expression patterns between the annotated biochemical reactions from 32 Atacama and sister species. We found biochemical convergences characterized by reactions enriched in at least 50% of the Atacama species, pointing to potential advantages against drought and nitrogen starvation, for instance. These findings suggest that the adaptation in the Atacama Desert may result in part from shared genetic legacies governing the expression of key metabolic pathways to face harsh conditions. Enriched reactions corresponded to ubiquitous compounds common to extreme and agronomic species and were congruent with our previous metabolomic analyses. Convergent adaptive traits offer promising candidates for improving abiotic stress resilience in crop species.


Subjects
Desert Climate , Phylogeny , Transcriptome , Chile , Adaptation, Physiological , Metabolic Networks and Pathways
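
The convergence criterion in this abstract (reactions enriched in at least 50% of the Atacama species) can be illustrated with a short sketch. It assumes the per-species enrichment analysis has already been done; the species labels, reaction identifiers, and function name are hypothetical.

```python
from collections import Counter


def convergent_reactions(enriched_by_species, min_fraction=0.5):
    """Return reactions enriched in at least min_fraction of the species."""
    counts = Counter()
    for reactions in enriched_by_species.values():
        counts.update(set(reactions))  # count each reaction once per species
    n_species = len(enriched_by_species)
    return sorted(r for r, c in counts.items() if c / n_species >= min_fraction)


# Hypothetical enrichment results: species -> set of enriched reaction IDs.
example = {
    "species_A": {"R1", "R2"},
    "species_B": {"R1", "R3"},
    "species_C": {"R1", "R2"},
    "species_D": {"R4"},
}
shared = convergent_reactions(example)
print(shared)
```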
3.
BMC Bioinformatics ; 24(1): 114, 2023 Mar 24.
Article in English | MEDLINE | ID: mdl-36964499

ABSTRACT

This study evaluates both a variety of existing base causal inference methods and a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across different datasets, so a method that works poorly on one dataset may work well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to as good or better results than using the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on training data. The resulting ensemble model EnsInfer easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.


Subjects
Algorithms , Software , Bayes Theorem , RNA-Seq
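
To make the idea of a Naive Bayes ensemble over base inference methods concrete, here is a minimal pure-Python sketch. It is not the EnsInfer implementation: the Gaussian likelihood, the training scores, and all names are assumptions made for illustration. Each candidate edge is described by the confidence scores that the base methods assign to it, and the classifier is trained on edges whose status is known.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(features, labels):
    """Fit class priors and per-feature Gaussian parameters."""
    model = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        columns = list(zip(*rows))
        model[cls] = {
            "prior": len(rows) / len(labels),
            "mean": [mean(col) for col in columns],
            # A small floor keeps every variance strictly positive.
            "var": [max(pvariance(col), 1e-6) for col in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def edge_probability(model, scores):
    """Posterior probability that an edge is true (class 1)."""
    log_post = {}
    for cls, params in model.items():
        log_lik = sum(
            log_gaussian(x, mu, var)
            for x, mu, var in zip(scores, params["mean"], params["var"])
        )
        log_post[cls] = math.log(params["prior"]) + log_lik
    top = max(log_post.values())  # subtract the maximum for numerical stability
    weights = {cls: math.exp(lp - top) for cls, lp in log_post.items()}
    return weights[1] / sum(weights.values())


# Hypothetical scores from two base methods for six labelled candidate edges.
train_scores = [[0.9, 0.8], [0.8, 0.7], [0.85, 0.9], [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
train_labels = [1, 1, 1, 0, 0, 0]
nb_model = fit_gaussian_nb(train_scores, train_labels)

p_high = edge_probability(nb_model, [0.9, 0.85])
p_low = edge_probability(nb_model, [0.15, 0.2])
```

Because each base method contributes one feature, adding or removing a method only changes the length of the score vectors.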
4.
Science ; 374(6575): eaba5531, 2021 Dec 24.
Article in English | MEDLINE | ID: mdl-34941412

ABSTRACT

In the plant meristem, tissue-wide maturation gradients are coordinated with specialized cell networks to establish various developmental phases required for indeterminate growth. Here, we used single-cell transcriptomics to reconstruct the protophloem developmental trajectory from the birth of cell progenitors to terminal differentiation in the Arabidopsis thaliana root. PHLOEM EARLY DNA-BINDING-WITH-ONE-FINGER (PEAR) transcription factors mediate lineage bifurcation by activating guanosine triphosphatase signaling and prime a transcriptional differentiation program. This program is initially repressed by a meristem-wide gradient of PLETHORA transcription factors. Only the dissipation of the PLETHORA gradient permits activation of the differentiation program that involves mutual inhibition of early versus late meristem regulators. Thus, for phloem development, broad maturation gradients interface with cell-type-specific transcriptional regulators to stage cellular differentiation.


Subjects
Arabidopsis Proteins/metabolism , Arabidopsis/cytology , Phloem/cytology , Phloem/growth & development , Plant Roots/cytology , Transcription Factors/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Cell Differentiation , GTP-Binding Proteins/genetics , GTP-Binding Proteins/metabolism , Meristem/cytology , Phloem/genetics , Phloem/metabolism , Plant Roots/genetics , Plant Roots/growth & development , Plant Roots/metabolism , RNA-Seq , Signal Transduction , Single-Cell Analysis , Transcription Factors/genetics , Transcriptome
5.
BMC Bioinformatics ; 22(1): 359, 2021 Jul 02.
Article in English | MEDLINE | ID: mdl-34215187

ABSTRACT

BACKGROUND: Systems biology increasingly relies on deep sequencing with combinatorial index tags to associate biological sequences with their sample, cell, or molecule of origin. Accurate data interpretation depends on the ability to classify sequences based on correct decoding of these combinatorial barcodes. The probability of correct decoding is influenced by both sequence quality and the number and arrangement of barcodes. The rising complexity of experimental designs calls for a probability model that accounts for both sequencing errors and random noise, generalizes to multiple combinatorial tags, and can handle any barcoding scheme. The needs for reproducibility and community benchmark standards demand a peer-reviewed tool that preserves decoding quality scores and provides tunable control over classification confidence that balances precision and recall. Moreover, continuous improvements in sequencing throughput require a fast, parallelized and scalable implementation. RESULTS AND DISCUSSION: We developed flexible, robustly engineered software that performs probabilistic decoding and supports arbitrarily complex barcoding designs. Pheniqs computes the full posterior decoding error probability of observed barcodes by consulting basecalling quality scores and prior distributions, and reports sequences and confidence scores in Sequence Alignment/Map (SAM) fields. The product of posteriors for multiple independent barcodes provides an overall confidence score for each read. Pheniqs achieves greater accuracy than minimum edit distance or simple maximum likelihood estimation, and it scales linearly with core count to enable the classification of > 11 billion reads in 1 h 15 m using < 50 megabytes of memory. Pheniqs has been in production use for seven years in our genomics core facility. CONCLUSION: We introduce computationally efficient software that implements both probabilistic and minimum distance decoders and show that decoding barcodes using posterior probabilities is more accurate than available methods. Pheniqs allows fine-tuning of decoding sensitivity using intuitive confidence thresholds and is extensible with alternative decoders and new error models. Any arbitrary arrangement of barcodes is easily configured, enabling computation of combinatorial confidence scores for any barcoding strategy. An optimized multithreaded implementation assures that Pheniqs is faster and scales better with complex barcode sets than existing tools. Support for POSIX streams and multiple sequencing formats enables easy integration with automated analysis pipelines.


Subjects
Electronic Data Processing , High-Throughput Nucleotide Sequencing , Bayes Theorem , DNA Barcoding, Taxonomic , Reproducibility of Results , Sequence Analysis, DNA , Software
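
The posterior decoding described in this abstract can be sketched in a few lines. This is an illustrative model, not the Pheniqs code: it assumes that a base call with Phred quality Q is wrong with probability 10^(-Q/10), that a wrong call is equally likely to be any of the three other bases, and that random noise produces each base with probability 1/4. The barcodes, priors, and thresholds are hypothetical.

```python
def error_probability(phred):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-phred / 10)


def likelihood(observed, qualities, barcode):
    """P(observed | barcode) under the independent per-base error model."""
    p = 1.0
    for base, q, expected in zip(observed, qualities, barcode):
        e = error_probability(q)
        p *= (1 - e) if base == expected else e / 3
    return p


def decode(observed, qualities, priors, noise_prior=0.01, min_confidence=0.95):
    """Return (barcode or None, posterior of the best barcode)."""
    joint = {b: prior * likelihood(observed, qualities, b) for b, prior in priors.items()}
    noise = noise_prior * 0.25 ** len(observed)  # read explained by random noise
    total = sum(joint.values()) + noise
    best = max(joint, key=joint.get)
    posterior = joint[best] / total
    # Refuse to classify when the posterior is below the confidence threshold.
    return (best, posterior) if posterior >= min_confidence else (None, posterior)


# Hypothetical two-barcode design; the priors and the noise prior sum to 1.
priors = {"ACGT": 0.495, "TGCA": 0.495}

confident = decode("ACGA", [30, 30, 30, 10], priors)  # one low-quality mismatch
uncertain = decode("ACGA", [2, 2, 2, 2], priors)      # uniformly poor qualities
print(confident, uncertain)
```

For reads carrying several independent barcodes, the abstract states that the per-barcode posteriors are multiplied to give an overall confidence score.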
6.
IEEE Trans Pattern Anal Mach Intell ; 43(2): 663-678, 2021 Feb.
Article in English | MEDLINE | ID: mdl-31380747

ABSTRACT

SafePredict is a novel meta-algorithm that works with any base prediction algorithm for online data to guarantee an arbitrarily chosen correctness rate, 1-ϵ, by allowing refusals. Allowing refusals means that the meta-algorithm may refuse to emit a prediction produced by the base algorithm so that the error rate on non-refused predictions does not exceed ϵ. The SafePredict error bound does not rely on any assumptions on the data distribution or the base predictor. When the base predictor happens not to exceed the target error rate ϵ, SafePredict refuses only a finite number of times. When the error rate of the base predictor changes through time, SafePredict makes use of a weight-shifting heuristic that adapts to these changes without knowing when the changes occur yet still maintains the correctness guarantee. Empirical results show that (i) SafePredict compares favorably with state-of-the-art confidence-based refusal mechanisms which fail to offer robust error guarantees; and (ii) combining SafePredict with such refusal mechanisms can in many cases further reduce the number of refusals. Our software is included in the supplementary material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2019.2932415.
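
The refusal idea can be illustrated with a simplified exponential-weights scheme that mixes the base predictor with a dummy expert that always refuses. This sketch is an assumption-laden illustration, not the published SafePredict algorithm: the fixed learning rate `eta`, the choice to charge the dummy a loss of ϵ per round, and the class name are all hypothetical, and the weight-shifting heuristic is omitted.

```python
import math
import random


class RefusingPredictor:
    """Decide each round whether to emit or refuse the base prediction."""

    def __init__(self, epsilon, eta=0.5, initial_weight=0.5, seed=0):
        self.epsilon = epsilon            # target error rate on emitted predictions
        self.eta = eta                    # learning rate (assumed fixed here)
        self.w_predict = initial_weight   # probability of emitting a prediction
        self._rng = random.Random(seed)

    def decide(self):
        """Return True to emit the base prediction, False to refuse."""
        return self._rng.random() < self.w_predict

    def update(self, base_was_wrong):
        """Exponential-weights update after the true label is revealed."""
        loss_predict = 1.0 if base_was_wrong else 0.0
        a = self.w_predict * math.exp(-self.eta * loss_predict)
        b = (1.0 - self.w_predict) * math.exp(-self.eta * self.epsilon)
        self.w_predict = a / (a + b)


# A base predictor that is always right earns trust over time ...
reliable = RefusingPredictor(epsilon=0.1)
for _ in range(50):
    reliable.update(base_was_wrong=False)

# ... while one that is always wrong is refused almost surely.
unreliable = RefusingPredictor(epsilon=0.1)
for _ in range(50):
    unreliable.update(base_was_wrong=True)

print(round(reliable.w_predict, 3), unreliable.w_predict)
```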

7.
Sci Rep ; 10(1): 14141, 2020 Aug 19.
Article in English | MEDLINE | ID: mdl-32811842

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

8.
PLoS One ; 15(7): e0235663, 2020.
Article in English | MEDLINE | ID: mdl-32716914

ABSTRACT

The Alzheimer's Disease Neuroimaging Initiative (ADNI) database is an expansive undertaking by government, academia, and industry to pool resources and data on subjects at various stages of symptomatic severity due to Alzheimer's disease. As expected, magnetic resonance imaging is a major component of the project. Full brain images are obtained at every 6-month visit. A range of cognitive tests studying executive function and memory are employed less frequently. Two blood draws (baseline, 6 months) provide samples to measure concentrations of approximately 145 plasma biomarkers. In addition, other diagnostic measurements are performed, including PET imaging, cerebrospinal fluid measurements of amyloid-beta and tau peptides, as well as genetic tests, demographics, and vital signs. ADNI data is available upon review of an application. There have been numerous reports of how various processes evolve during AD progression, including alterations in metabolic and neuroendocrine activity, cell survival, and cognitive behavior. Lacking an analytic model at the outset, we leveraged recent advances in machine learning, which allow us to deal with large, non-linear systems with many variables. Of particular note was examining how well binary predictions of future disease states could be learned from simple, non-invasive measurements like those dependent on blood samples. Such measurements make relatively few demands on the time and effort of medical staff or patients. We report findings with recall, precision, and area under the receiver operating characteristic curve after application of CART, Random Forest, Gradient Boosting, and Support Vector Machines. Our results show that (i) Random Forests and Gradient Boosting work very well with such data, and (ii) prediction quality when applied to relatively easily obtained measurements (cognitive scores, genetic risk, and plasma biomarkers) is competitive with magnetic resonance techniques. This is by no means an exhaustive study, but instead an exploration of the plausibility of defining a series of relatively inexpensive, broad population-based tests.


Subjects
Alzheimer Disease/diagnosis , Biomarkers/metabolism , Brain/diagnostic imaging , Machine Learning , Neuroimaging/methods , Alzheimer Disease/metabolism , Alzheimer Disease/pathology , Apolipoprotein A-V/blood , Area Under Curve , Biomarkers/blood , Databases, Factual , Disease Progression , Humans , Magnetic Resonance Imaging , Principal Component Analysis , ROC Curve
9.
Sci Rep ; 10(1): 6804, 2020 Apr 22.
Article in English | MEDLINE | ID: mdl-32321967

ABSTRACT

The ability to accurately predict the causal relationships from transcription factors to genes would greatly enhance our understanding of transcriptional dynamics. This could lead to applications in which one or more transcription factors could be manipulated to effect a change in genes leading to the enhancement of some desired trait. Here we present a method called OutPredict that constructs a model for each gene based on time series (and other) data and that predicts each gene's expression at a previously unseen subsequent time point. The model also infers causal relationships based on the most important transcription factors for each gene model, some of which have been validated from previous physical experiments. The method benefits from known network edges and steady-state data to enhance predictive accuracy. Our results across B. subtilis, Arabidopsis, E. coli, Drosophila and the DREAM4 simulated in silico dataset show improved predictive accuracy ranging from 40% to 60% over other state-of-the-art methods. We find that gene expression models can benefit from the addition of steady-state data to predict expression values of time series. Finally, we validate, based on limited available data, that the influential edges we infer correspond to known relationships significantly more than expected by chance or by state-of-the-art methods.


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks , Models, Genetic , Transcription Factors/genetics , Computer Simulation , Gene Expression Profiling/statistics & numerical data , Machine Learning , Reproducibility of Results
10.
BMC Bioinformatics ; 20(Suppl 9): 366, 2019 Nov 22.
Article in English | MEDLINE | ID: mdl-31757212

ABSTRACT

BACKGROUND: Several large public repositories of microarray datasets and RNA-seq data are available. Two prominent examples include ArrayExpress and NCBI GEO. Unfortunately, there is no easy way to import and manipulate data from such resources, because the data is stored in large files, requiring large bandwidth to download and special purpose data manipulation tools to extract subsets relevant for the specific analysis. RESULTS: TACITuS is a web-based system that supports rapid query access to high-throughput microarray and NGS repositories. The system is equipped with modules capable of managing large files, storing them in a cloud environment and extracting subsets of data in an easy and efficient way. The system also supports the ability to import data into Galaxy for further analysis. CONCLUSIONS: TACITuS automates most of the pre-processing needed to analyze high-throughput microarray and NGS data from large publicly available repositories. The system implements several modules to manage large files in an easy and efficient way. Furthermore, it can interface with the Galaxy environment, allowing users to analyze data through a user-friendly interface.


Subjects
Big Data , Data Collection , Software , Transcriptome/genetics , Cell Line, Tumor , Databases, Genetic , Humans , User-Computer Interface
11.
Nat Commun ; 10(1): 1569, 2019 Apr 05.
Article in English | MEDLINE | ID: mdl-30952851

ABSTRACT

Charting a temporal path in gene networks requires linking early transcription factor (TF)-triggered events to downstream effects. We scale up a cell-based TF-perturbation assay to identify directly regulated targets of 33 nitrogen (N)-early response TFs encompassing 88% of N-responsive Arabidopsis genes. We uncover a duality where each TF is an inducer and repressor, and in vitro cis-motifs are typically specific to regulation directionality. Validated TF-targets (71,836) are used to refine precision of a time-inferred root network, connecting 145 N-responsive TFs and 311 targets. These data are used to chart network paths from direct TF1-regulated targets identified in cells to indirect targets responding only in planta via Network Walking. We uncover network paths from TGA1 and CRF4 to direct TF2 targets, which in turn regulate 76% and 87% of TF1 indirect targets in planta, respectively. These results have implications for N-use and the approach can reveal temporal networks for any biological system.


Subjects
Arabidopsis/genetics , Gene Regulatory Networks , Nitrogen/metabolism , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Arabidopsis Proteins/physiology , Basic-Leucine Zipper Transcription Factors/genetics , Basic-Leucine Zipper Transcription Factors/metabolism , Basic-Leucine Zipper Transcription Factors/physiology , Gene Expression Regulation, Plant , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription Factors/physiology
12.
Interdiscip Sci ; 11(1): 21-32, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30790228

ABSTRACT

Many scientific applications entail solving the subgraph isomorphism problem, i.e., given an input pattern graph, find all the subgraphs of a (usually much larger) target graph that are structurally equivalent to that input. Because subgraph isomorphism is NP-complete, methods to solve it have to use heuristics. This work evaluates subgraph isomorphism methods to assess their computational behavior on a wide range of synthetic and real graphs. Surprisingly, our experiments show that, among the leading algorithms, certain heuristics based only on pattern graphs are the most efficient.


Subjects
Algorithms , Computational Biology/methods , Computer Heuristics , Humans , Software
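
For readers unfamiliar with the problem, the following is a minimal backtracking sketch of subgraph matching on undirected graphs. It is not one of the algorithms evaluated in the paper; it finds non-induced matches (every pattern edge must exist in the target, extra target edges are allowed), and the only heuristic, visiting pattern nodes in order of decreasing degree, is chosen here as a simple example of an ordering that depends on the pattern graph alone.

```python
def find_subgraph_matches(pattern, target):
    """Enumerate injective node mappings that preserve every pattern edge.

    Both graphs are dicts mapping a node to the set of its neighbours.
    """
    # Pattern-only heuristic: match high-degree pattern nodes first.
    order = sorted(pattern, key=lambda n: len(pattern[n]), reverse=True)
    matches = []

    def extend(mapping):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        node = order[len(mapping)]
        used = set(mapping.values())
        for candidate in target:
            # Prune candidates already used or with too few neighbours.
            if candidate in used or len(target[candidate]) < len(pattern[node]):
                continue
            # Every already-mapped pattern neighbour must be adjacent in the target.
            if all(mapping[nb] in target[candidate] for nb in pattern[node] if nb in mapping):
                mapping[node] = candidate
                extend(mapping)
                del mapping[node]

    extend({})
    return matches


# Pattern: a triangle. Target: a triangle (1, 2, 3) with a pendant node 4.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
target_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

found = find_subgraph_matches(triangle, target_graph)
print(len(found), "mappings")
```

The six mappings returned here are the six ways of assigning the triangle's nodes to target nodes 1, 2, and 3; deduplicating by node set gives a single occurrence.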
13.
BMC Bioinformatics ; 19(1): 318, 2018 Sep 10.
Article in English | MEDLINE | ID: mdl-30200901

ABSTRACT

BACKGROUND: Networks whose nodes have labels can seem complex. Fortunately, many have substructures that occur often ("motifs"). A societal example of a motif might be a household. Replacing such motifs by named supernodes reduces the complexity of the network and can bring out insightful features. Doing so repeatedly may give hints about higher level structures of the network. We call this recursive process Recursive Supernode Extraction. RESULTS: This paper describes algorithms and a tool to discover disjoint (i.e. non-overlapping) motifs in a network, replacing those motifs by new nodes, and then recursing. We show applications in food-web and protein-protein interaction (PPI) networks where our methods reduce the complexity of the network and yield insights. CONCLUSIONS: SuperNoder is a web-based and standalone tool which enables the simplification of big graphs based on the reduction of high frequency motifs. It applies various strategies for identifying disjoint motifs with the goal of enhancing the understandability of networks.


Subjects
Algorithms , Computational Biology/methods , Metabolic Networks and Pathways , Protein Interaction Maps , Software , Humans
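
One step of the supernode extraction described above can be sketched as follows. This is a simplified illustration, not the SuperNoder implementation: it assumes the motif occurrences have already been enumerated, picks disjoint occurrences greedily in the order given, and uses hypothetical node names and supernode labels.

```python
def select_disjoint(occurrences):
    """Greedily keep occurrences that share no node with those already kept."""
    chosen, used = [], set()
    for occ in occurrences:
        if used.isdisjoint(occ):
            chosen.append(frozenset(occ))
            used.update(occ)
    return chosen


def contract(graph, occurrences, prefix="S"):
    """Replace each selected occurrence with a single named supernode."""
    label = {}
    for i, occ in enumerate(select_disjoint(occurrences)):
        for node in occ:
            label[node] = f"{prefix}{i}"
    reduced = {}
    for node, neighbours in graph.items():
        u = label.get(node, node)
        reduced.setdefault(u, set())
        for nb in neighbours:
            v = label.get(nb, nb)
            if u != v:  # drop edges internal to a supernode
                reduced[u].add(v)
    return reduced


# Hypothetical undirected graph containing three triangles, two of which overlap.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e", "g"},
    "g": {"f"},
}
triangles = [{"a", "b", "c"}, {"b", "c", "d"}, {"d", "e", "f"}]

reduced_graph = contract(graph, triangles)
print(reduced_graph)
```

Applying `contract` again to the reduced graph, with motifs found in that graph, gives the recursive behaviour the abstract describes.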
14.
Proc Natl Acad Sci U S A ; 115(25): 6494-6499, 2018 Jun 19.
Article in English | MEDLINE | ID: mdl-29769331

ABSTRACT

This study exploits time, the relatively unexplored fourth dimension of gene regulatory networks (GRNs), to learn the temporal transcriptional logic underlying dynamic nitrogen (N) signaling in plants. Our "just-in-time" analysis of time-series transcriptome data uncovered a temporal cascade of cis elements underlying dynamic N signaling. To infer transcription factor (TF)-target edges in a GRN, we applied a time-based machine learning method to 2,174 dynamic N-responsive genes. We experimentally determined a network precision cutoff, using TF-regulated genome-wide targets of three TF hubs (CRF4, SNZ, and CDF1), used to "prune" the network to 155 TFs and 608 targets. This network precision was reconfirmed using genome-wide TF-target regulation data for four additional TFs (TGA1, HHO5/6, and PHL1) not used in network pruning. These higher-confidence edges in the GRN were further filtered by independent TF-target binding data, used to calculate a TF "N-specificity" index. This refined GRN identifies the temporal relationship of known/validated regulators of N signaling (NLP7/8, TGA1/4, NAC4, HRS1, and LBD37/38/39) and 146 additional regulators. Six TFs (CRF4, SNZ, CDF1, HHO5/6, and PHL1) validated herein regulate a significant number of genes in the dynamic N response, targeting 54% of N-uptake/assimilation pathway genes. Phenotypically, inducible overexpression of CRF4 in planta regulates genes resulting in altered biomass, root development, and ¹⁵NO₃⁻ uptake, specifically under low-N conditions. This dynamic N-signaling GRN now provides the temporal "transcriptional logic" for 155 candidate TFs to improve nitrogen use efficiency with potential agricultural applications. Broadly, these time-based approaches can uncover the temporal transcriptional logic for any biological response system in biology, agriculture, or medicine.


Subjects
Arabidopsis/genetics , Arabidopsis/metabolism , Gene Expression Regulation, Plant/genetics , Gene Regulatory Networks/genetics , Nitrogen/metabolism , Transcription, Genetic/genetics , Arabidopsis Proteins/genetics , Gene Expression Profiling/methods , Logic , Protein Binding/genetics , Signal Transduction/genetics , Transcription Factors/genetics
15.
Genome Biol ; 17(1): 184, 2016 Sep 07.
Article in English | MEDLINE | ID: mdl-27604469

ABSTRACT

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.


Subjects
Computational Biology , Proteins/chemistry , Software , Structure-Activity Relationship , Algorithms , Databases, Protein , Gene Ontology , Humans , Molecular Sequence Annotation , Proteins/genetics
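
The abstract refers to assessment metrics without naming them. One protein-centric metric commonly used for this kind of evaluation is the maximum F-measure over score thresholds (Fmax); the sketch below shows that computation under simplifying assumptions: ontology terms are treated as flat labels with no propagation to ancestor terms, every benchmark protein has at least one true term, and the proteins, terms, and scores are hypothetical.

```python
def f_max(predictions, truth, thresholds=None):
    """Maximum F-measure over thresholds.

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: non-empty set of true terms}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {t for t, s in scored.items() if s >= tau}
            hits = len(predicted & true_terms)
            # Precision is averaged only over proteins with a prediction at tau.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over all benchmark proteins.
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Hypothetical benchmark with two proteins.
truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
predictions = {
    "P1": {"GO:A": 0.9, "GO:B": 0.4, "GO:D": 0.3},
    "P2": {"GO:C": 0.8, "GO:A": 0.6},
}
score = f_max(predictions, truth)
print(round(score, 4))
```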
16.
F1000Res ; 4: 479, 2015.
Article in English | MEDLINE | ID: mdl-26594341

ABSTRACT

We present NetMatchStar, a Cytoscape app to find all the occurrences of a query graph in a network and check for its significance as a motif with respect to seven different random models. The query can be uploaded or built from scratch using Cytoscape facilities. The app significantly enhances the previous NetMatch in style, performance and functionality. Notably, NetMatchStar allows queries with wildcards.

17.
PLoS Comput Biol ; 10(6): e1003644, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24922051

ABSTRACT

Negative examples (genes that are known not to carry out a given protein function) are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state-of-the-art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).


Subjects
Algorithms , Databases, Genetic , Gene Ontology , Proteins/genetics , Proteins/physiology , Animals , Arabidopsis Proteins/genetics , Arabidopsis Proteins/physiology , Artificial Intelligence , Computational Biology , Genome , Humans , Mice , Molecular Sequence Annotation , Proteome , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/physiology
18.
Nucleic Acids Res ; 42(9): 5416-25, 2014 May.
Article in English | MEDLINE | ID: mdl-24627222

ABSTRACT

RNAi is a powerful tool for the regulation of gene expression. It is widely and successfully employed in functional studies and is now emerging as a promising therapeutic approach. Several RNAi-based clinical trials suggest encouraging results in the treatment of a variety of diseases, including cancer. Here we present miR-Synth, a computational resource for the design of synthetic microRNAs able to target multiple genes in multiple sites. The proposed strategy constitutes a valid alternative to the use of siRNA, allowing the use of fewer molecules for the inhibition of multiple targets. This may represent a great advantage in designing therapies for diseases caused by crucial cellular pathways altered by multiple dysregulated genes. The system has been successfully validated on two of the most prominent genes associated with lung cancer, c-MET and Epidermal Growth Factor Receptor (EGFR). (See http://microrna.osumc.edu/mir-synth).


Subjects
Gene Knockdown Techniques , MicroRNAs/genetics , Software , 3' Untranslated Regions , Base Sequence , ErbB Receptors/biosynthesis , ErbB Receptors/genetics , Gene Expression , Genes, Reporter , HEK293 Cells , HeLa Cells , Humans , Luciferases, Renilla/biosynthesis , Luciferases, Renilla/genetics , Proto-Oncogene Proteins c-met/biosynthesis , Proto-Oncogene Proteins c-met/genetics , RNA Interference
19.
Article in English | MEDLINE | ID: mdl-25566532

ABSTRACT

The use of synthetic non-coding RNAs for post-transcriptional regulation of gene expression has not only become a standard laboratory tool for gene functional studies but it has also opened up new perspectives in the design of new and potentially promising therapeutic strategies. Bioinformatics has provided researchers with a variety of tools for the design, the analysis, and the evaluation of RNAi agents such as small-interfering RNA (siRNA), short-hairpin RNA (shRNA), artificial microRNA (a-miR), and microRNA sponges. More recently, a new system for genome engineering based on the bacterial CRISPR-Cas9 system (Clustered Regularly Interspaced Short Palindromic Repeats) was shown to have the potential to also regulate gene expression at both the transcriptional and post-transcriptional levels in a more specific way. In this mini review, we present RNAi and CRISPRi design principles and discuss the advantages and limitations of the current design approaches.

20.
PLoS One ; 8(10): e76911, 2013.
Article in English | MEDLINE | ID: mdl-24167551

ABSTRACT

Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such software implementations (i) do not fully exploit available parallel computing power and (ii) do not scale with respect to the size of graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms, and introduces new strategies which naturally lead to a faster parallel searching system especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interaction networks.


Subjects
Antiviral Agents , Databases, Factual , Models, Biological , Software