1.
Blood ; 141(6): 645-658, 2023 02 09.
Article in English | MEDLINE | ID: mdl-36223592

ABSTRACT

The mechanisms of coordinated changes in proteome composition and their relevance for the differentiation of neutrophil granulocytes are not well studied. Here, we discover 2 novel human genetic defects in signal recognition particle receptor alpha (SRPRA) and SRP19, constituents of the mammalian cotranslational targeting machinery, and characterize their roles in neutrophil granulocyte differentiation. We systematically study the proteome of neutrophil granulocytes from patients with variants in the SRP genes, HAX1, and ELANE, and identify global as well as specific proteome aberrations. Using in vitro differentiation of human induced pluripotent stem cells and in vivo zebrafish models, we study the effects of SRP deficiency on neutrophil granulocyte development. In a heterologous cell-based inducible protein expression system, we validate the effects conferred by SRP dysfunction for selected proteins that we identified in our proteome screen. Thus, SRP-dependent protein processing, intracellular trafficking, and homeostasis are critically important for the differentiation of neutrophil granulocytes.


Subject(s)
Induced Pluripotent Stem Cells , Proteome , Animals , Humans , Zebrafish , Human Genetics , Mammals , Adaptor Proteins, Signal Transducing
2.
Brief Bioinform ; 22(1): 545-556, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32026945

ABSTRACT

MOTIVATION: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets. RESULTS: We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a curated compendium of 75 expression datasets investigating 42 human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods, identifying significant differences in runtime and applicability to RNA-seq data, fraction of enriched gene sets depending on the null hypothesis tested and recovery of the predefined relevance rankings. We make practical recommendations on how methods originally developed for microarray data can efficiently be applied to RNA-seq data, how to interpret results depending on the type of gene set test conducted and which methods are best suited to effectively prioritize gene sets with high phenotype relevance. AVAILABILITY: http://bioconductor.org/packages/GSEABenchmarkeR. CONTACT: ludwig.geistlinger@sph.cuny.edu.


Subject(s)
Gene Expression Profiling/methods , Genomics/methods , RNA-Seq/methods , Animals , Benchmarking , Databases, Genetic/standards , Gene Expression Profiling/standards , Genomics/standards , Humans , RNA-Seq/standards , Software
3.
Mol Cell Proteomics ; 18(9): 1880-1892, 2019 09.
Article in English | MEDLINE | ID: mdl-31235637

ABSTRACT

Mass spectrometry based proteomics is the method of choice for quantifying genome-wide differential changes of protein expression in a wide range of biological and biomedical applications. Protein expression changes need to be reliably derived from many measured peptide intensities and their corresponding peptide fold changes. These peptide fold changes vary considerably for a given protein. Numerous instrumental setups aim to reduce this variability, whereas current computational methods only implicitly account for this problem. We introduce a new method, MS-EmpiRe, which explicitly accounts for the noise underlying peptide fold changes. We derive data set-specific, intensity-dependent empirical error fold change distributions, which are used for individual weighting of peptide fold changes to detect differentially expressed proteins (DEPs). In a recently published proteome-wide benchmarking data set, MS-EmpiRe doubles the number of correctly identified DEPs at an estimated FDR cutoff compared with state-of-the-art tools. We additionally confirm the superior performance of MS-EmpiRe on simulated data. MS-EmpiRe requires only peptide intensities mapped to proteins and, thus, can be applied to any common quantitative proteomics setup. We apply our method to diverse MS data sets and observe consistent increases in sensitivity with more than 1000 additional significant proteins in deep data sets, including a clinical study over multiple patients. At the same time, we observe that even the proteins classified as most insignificant by other methods but significant by MS-EmpiRe show very clear regulation on the peptide intensity level. MS-EmpiRe provides rapid processing (< 2 min for 6 LC-MS/MS runs (3 h gradients)) and is publicly available at github.com/zimmerlab/MS-EmpiRe with a manual including examples.


Subject(s)
Mass Spectrometry/methods , Peptides/analysis , Proteome/analysis , Proteomics/methods , Software , Alzheimer Disease/metabolism , Benchmarking , Databases, Factual , Francisella/metabolism , Fungal Proteins/analysis , HeLa Cells , Humans , Parkinson Disease/metabolism , Plant Proteins/analysis , Reproducibility of Results , Signal-To-Noise Ratio
4.
Bioinformatics ; 35(18): 3412-3420, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30759193

ABSTRACT

MOTIVATION: Several gene expression-based risk scores and subtype classifiers for breast cancer were developed to distinguish high- and low-risk patients. Evaluating the performance of these classifiers helps to decide which classifiers should be used in clinical practice for personal therapeutic recommendations. So far, studies that compared multiple classifiers in large independent patient cohorts mostly used microarray measurements. qPCR-based classifiers were not included in the comparison or had to be adapted to the different experimental platforms. RESULTS: We used a prospective study of 726 early breast cancer patients from seven certified German breast cancer centers. Patients were treated according to national guidelines and the expressions of 94 selected genes were measured by the mid-throughput qPCR platform Fluidigm. Clinical and pathological data, including outcome over five years, are available. Using these data, we could compare the performance of six classifiers (scmgene and research versions of PAM50, ROR-S, recurrence score, EndoPredict and GGI). As in other studies, we found similar or even higher concordance between most of the classifiers, and most were also able to differentiate high- and low-risk patients. The classifiers that were originally developed for microarray data still performed similarly using the Fluidigm data. Therefore, Fluidigm can be used to measure the gene expressions needed by several classifiers for a large cohort with little effort. In addition, we provide an interactive report of the results, which enables a transparent, in-depth comparison of classifiers and their prediction of individual patients. AVAILABILITY AND IMPLEMENTATION: https://services.bio.ifi.lmu.de/pia/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Breast Neoplasms , Humans , Neoplasm Recurrence, Local , Prospective Studies , Real-Time Polymerase Chain Reaction , Risk
5.
J Proteome Res ; 18(4): 1553-1566, 2019 04 05.
Article in English | MEDLINE | ID: mdl-30793903

ABSTRACT

Spectral libraries play a central role in the analysis of data-independent-acquisition (DIA) proteomics experiments. A main assumption in current spectral library tools is that a single characteristic intensity pattern (CIP) suffices to describe the fragmentation of a peptide in a particular charge state (peptide charge pair). However, we find that this is often not the case. We carry out a systematic evaluation of spectral variability over public repositories and in-house data sets. We show that spectral variability is widespread and partly occurs under fixed experimental conditions. Using clustering of preprocessed spectra, we derive a limited number of multiple characteristic intensity patterns (MCIPs) for each peptide charge pair, which allow almost complete coverage of our heterogeneous data set without affecting the false discovery rate. We show that an MCIP library derived from public repositories performs in most cases similarly to a "custom-made" spectral library, which has been acquired under the same experimental conditions as the query spectra. We apply the MCIP approach to a DIA data set and observe a significant increase in peptide recognition. We propose the MCIP approach as an easy-to-implement addition to current spectral library search engines and as a new way to utilize the data stored in spectral repositories.


Subject(s)
Chromatography, Liquid , Databases, Protein , Peptide Library , Proteomics/methods , Tandem Mass Spectrometry , Algorithms , Peptide Fragments/chemistry , Peptide Fragments/genetics
6.
Bioinformatics ; 33(12): 1837-1844, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28165113

ABSTRACT

MOTIVATION: The goal of many genome-wide experiments is to explain the changes between the analyzed conditions. Typically, the analysis is started with a set of differential genes DG and the first step is to identify the set of relevant biological processes BP. Current enrichment methods identify the involved biological process via statistically significant overrepresentation of differential genes in predefined sets, but do not further explain how the differential genes interact with each other or which other genes might be important for the enriched process. Other network-based methods determine subnetworks of interacting genes containing many differential genes, but do not employ process knowledge for a more focused analysis. RESULTS: RelExplain is a method to analyze a given biological process bp (e.g. identified by enrichment) in more detail by computing an explanation using the measured DG and a given network. An explanation is a subnetwork that contains the differential genes in the process bp and connects them in the best way given the experimental data, also using genes that are not differential or not in bp. RelExplain takes into account the functional annotations of nodes and the edge consistency of the measurements. Explanations are compact networks of the relevant part of the bp and additional nodes that might be important for the bp. Our evaluation showed that RelExplain is better suited to retrieve manually curated subnetworks from unspecific networks than other algorithms. The interactive RelExplain tool allows users to compute and inspect sub-optimal and alternative optimal explanations. AVAILABILITY AND IMPLEMENTATION: A webserver is available at https://services.bio.ifi.lmu.de/relexplain. CONTACT: berchtold@bio.ifi.lmu.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
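
The "statistically significant overrepresentation of differential genes in predefined sets" mentioned in this abstract is commonly tested with a one-sided hypergeometric test. The following is a minimal sketch of that standard test, not of RelExplain itself; the function name and parameter names are hypothetical and chosen only for illustration.

```python
from math import comb


def overrepresentation_pvalue(universe_size, set_size, n_differential, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size  -- number of measured genes (assumed background)
    set_size       -- number of genes annotated to the process bp
    n_differential -- number of differential genes (|DG|)
    overlap        -- number of differential genes that are in bp
    """
    total = comb(universe_size, n_differential)
    upper = min(set_size, n_differential)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_differential - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 measured genes, 4 in the process, 3 differential, all 3 in the process.
p = overrepresentation_pvalue(10, 4, 3, 3)
```

A small p-value only says that the process contains more differential genes than expected by chance; as the abstract notes, it does not explain how those genes interact.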


Subject(s)
Computational Biology/methods , Metabolic Networks and Pathways , Software , Algorithms , Biological Phenomena , Breast Neoplasms/metabolism , Humans , Molecular Sequence Annotation/methods
7.
Circ Res ; 119(9): 1030-1038, 2016 Oct 14.
Article in English | MEDLINE | ID: mdl-27531933

ABSTRACT

RATIONALE: Atheroprogression is a consequence of nonresolved inflammation, and currently a comprehensive overview of the mechanisms preventing resolution is missing. However, in acute inflammation, resolution is known to be orchestrated by a switch from inflammatory to resolving lipid mediators. Therefore, we hypothesized that lesional lipid mediator imbalance favors atheroprogression. OBJECTIVE: To understand the lipid mediator balance during atheroprogression and to establish an interventional strategy based on the delivery of resolving lipid mediators. METHODS AND RESULTS: Aortic lipid mediator profiling of aortas from Apoe-/- mice fed a high-fat diet for 4 weeks, 8 weeks, or 4 months revealed an expansion of inflammatory lipid mediators, Leukotriene B4 and Prostaglandin E2, and a concomitant decrease of resolving lipid mediators, Resolvin D2 (RvD2) and Maresin 1 (MaR1), during advanced atherosclerosis. Functionally, aortic Leukotriene B4 and Prostaglandin E2 levels correlated with traits of plaque instability, whereas RvD2 and MaR1 levels correlated with the signs of plaque stability. In a therapeutic context, repetitive RvD2 and MaR1 delivery prevented atheroprogression as characterized by halted expansion of the necrotic core and accumulation of macrophages along with increased fibrous cap thickness and smooth muscle cell numbers. Mechanistically, RvD2 and MaR1 induced a shift in macrophage profile toward a reparative phenotype, which secondarily stimulated collagen synthesis in smooth muscle cells. CONCLUSIONS: We present evidence for the imbalance between inflammatory and resolving lipid mediators during atheroprogression. Delivery of RvD2 and MaR1 successfully prevented atheroprogression, suggesting that resolving lipid mediators potentially represent an innovative strategy to resolve arterial inflammation.


Subject(s)
Atherosclerosis/metabolism , Atherosclerosis/prevention & control , Docosahexaenoic Acids/metabolism , Inflammation Mediators/metabolism , Lipid Metabolism/physiology , Animals , Atherosclerosis/etiology , Cells, Cultured , Diet, High-Fat/adverse effects , Disease Progression , Docosahexaenoic Acids/administration & dosage , Drug Delivery Systems/methods , Lipid Metabolism/drug effects , Mice , Mice, Inbred C57BL , Mice, Knockout
8.
BMC Bioinformatics ; 17: 45, 2016 Jan 20.
Article in English | MEDLINE | ID: mdl-26791995

ABSTRACT

BACKGROUND: Enrichment analysis of gene expression data is essential to find functional groups of genes whose interplay can explain experimental observations. Numerous methods have been published that either ignore (set-based) or incorporate (network-based) known interactions between genes. However, the often subtle benefits and disadvantages of the individual methods are confusing for most biological end users and there is currently no convenient way to combine methods for an enhanced result interpretation. RESULTS: We present the EnrichmentBrowser package as an easily applicable software that enables (1) the application of the most frequently used set-based and network-based enrichment methods, (2) their straightforward combination, and (3) a detailed and interactive visualization and exploration of the results. The package is available from the Bioconductor repository and implements additional support for standardized expression data preprocessing, differential expression analysis, and definition of suitable input gene sets and networks. CONCLUSION: The EnrichmentBrowser package implements essential functionality for the enrichment analysis of gene expression data. It combines the advantages of set-based and network-based enrichment analysis in order to derive high-confidence gene sets and biological pathways that are differentially regulated in the expression data under investigation. Besides, the package facilitates the visualization and exploration of such sets and pathways.


Subject(s)
Gene Regulatory Networks , Microarray Analysis/methods , Software , Databases, Factual , Gene Expression Profiling , Sequence Analysis, RNA
9.
BMC Bioinformatics ; 16: 122, 2015 Apr 17.
Article in English | MEDLINE | ID: mdl-25928589

ABSTRACT

BACKGROUND: Mapping of short sequencing reads is a crucial step in the analysis of RNA sequencing (RNA-seq) data. ContextMap is an RNA-seq mapping algorithm that uses a context-based approach to identify the best alignment for each read and allows parallel mapping against several reference genomes. RESULTS: In this article, we present ContextMap 2, a new and improved version of ContextMap. Its key novel features are: (i) a plug-in structure that allows easily integrating novel short read alignment programs with improved accuracy and runtime; (ii) context-based identification of insertions and deletions (indels); (iii) mapping of reads spanning an arbitrary number of exons and indels. ContextMap 2 using Bowtie, Bowtie 2 or BWA was evaluated on both simulated and real-life data from the recently published RGASP study. CONCLUSIONS: We show that ContextMap 2 generally combines similar or higher recall compared to other state-of-the-art approaches with significantly higher precision in read placement and junction and indel prediction. Furthermore, runtime was significantly lower than for the best competing approaches. ContextMap 2 is freely available at http://www.bio.ifi.lmu.de/ContextMap.


Subject(s)
Algorithms , Genome, Human , High-Throughput Nucleotide Sequencing/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Exons/genetics , Humans , INDEL Mutation/genetics , Transcriptome
10.
Nucleic Acids Res ; 41(18): 8452-63, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23873954

ABSTRACT

Existing machine-readable resources for large-scale gene regulatory networks usually do not provide context information characterizing the activating conditions for a regulation and how targeted genes are affected. Although this information is essentially required for data interpretation, available networks are often restricted to condition-independent, non-quantitative, plain binary interactions as derived from high-throughput screens. In this article, we present a comprehensive Petri net based regulatory network that controls the diauxic shift in Saccharomyces cerevisiae. For 100 specific enzymatic genes, we collected regulations from public databases as well as identified and manually curated >400 relevant scientific articles. The resulting network consists of >300 multi-input regulatory interactions providing (i) activating conditions for the regulators; (ii) semi-quantitative effects on their targets; and (iii) classification of the experimental evidence. The diauxic shift network compiles widely distributed regulatory information and is available in an easy-to-use machine-readable form. Additionally, we developed a browsable system organizing the network into pathway maps, which allows users to inspect and trace the evidence for each annotated regulation in the model.


Subject(s)
Gene Expression Regulation, Fungal , Gene Regulatory Networks , Saccharomyces cerevisiae/genetics , Citric Acid Cycle/genetics , Fatty Acids/metabolism , Gluconeogenesis/genetics , Models, Genetic , Phosphoenolpyruvate Carboxykinase (ATP)/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics
11.
FEBS Lett ; 598(6): 635-657, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38366111

ABSTRACT

The response to proteotoxic stresses such as heat shock allows organisms to maintain protein homeostasis under changing environmental conditions. We asked what happens if an organism can no longer react to cytosolic proteotoxic stress. To test this, we deleted or depleted, either individually or in combination, the stress-responsive transcription factors Msn2, Msn4, and Hsf1 in Saccharomyces cerevisiae. Our study reveals a combination of survival strategies, which together protect essential proteins. Msn2 and 4 broadly reprogram transcription, triggering the response to oxidative stress, as well as biosynthesis of the protective sugar trehalose and glycolytic enzymes, while Hsf1 mainly induces the synthesis of molecular chaperones and reverses the transcriptional response upon prolonged mild heat stress (adaptation).


Subject(s)
Saccharomyces cerevisiae Proteins , Transcription Factors , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Heat Shock Transcription Factors/genetics , Heat Shock Transcription Factors/metabolism , Heat-Shock Proteins/genetics , Heat-Shock Proteins/metabolism , Heat-Shock Response/genetics , Proteotoxic Stress , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism
12.
BMC Bioinformatics ; 13 Suppl 6: S9, 2012 Apr 19.
Article in English | MEDLINE | ID: mdl-22537048

ABSTRACT

BACKGROUND: Sequencing of mRNA (RNA-seq) by next generation sequencing technologies is widely used for analyzing the transcriptomic state of a cell. Here, one of the main challenges is the mapping of a sequenced read to its transcriptomic origin. As a simple alignment to the genome will fail to identify reads crossing splice junctions and a transcriptome alignment will miss novel splice sites, several approaches have been developed for this purpose. Most of these approaches have two drawbacks. First, each read is assigned to a location independent of whether the corresponding gene is expressed or not, i.e. information from other reads is not taken into account. Second, in case of multiple possible mappings, the mapping with the fewest mismatches is usually chosen, which may lead to wrong assignments due to sequencing errors. RESULTS: To address these problems, we developed ContextMap, which efficiently uses information on the context of a read, i.e. reads mapping to the same expressed region. The context information is used to resolve possible ambiguities and, thus, a much larger degree of ambiguities can be allowed in the initial stage in order to detect all possible candidate positions. Although ContextMap can be used as a stand-alone version using either a genome or transcriptome as input, the version presented in this article is focused on refining initial mappings provided by other mapping algorithms. Evaluation results on simulated sequencing reads showed that the application of ContextMap to either TopHat or MapSplice mappings improved the mapping accuracy of both initial mappings considerably. CONCLUSIONS: In this article, we show that the context of reads mapping to nearby locations provides valuable information for identifying the best unique mapping for a read. Using our method, mappings provided by other state-of-the-art methods can be refined and alignment accuracy can be further improved. AVAILABILITY: http://www.bio.ifi.lmu.de/ContextMap.
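
The idea of using "reads mapping to the same expressed region" to choose among several candidate positions can be illustrated with a toy example. The sketch below is not the ContextMap algorithm, whose details the abstract does not give; it assumes a single chromosome, a fixed window size, and a simple count of already-placed reads as the support score, and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def resolve_by_context(candidates, placed_positions, window=1000):
    """Return the candidate position supported by the most nearby reads.

    candidates       -- possible genomic start positions for one ambiguous read
    placed_positions -- start positions of reads that are already placed
    window           -- assumed half-width of the context region, in bases
    """
    placed = sorted(placed_positions)

    def support(pos):
        # Number of placed reads with a start in [pos - window, pos + window].
        return bisect_right(placed, pos + window) - bisect_left(placed, pos - window)

    # On ties, max() keeps the first candidate in the input order.
    return max(candidates, key=support)


# Three candidates; only the first lies in a region covered by other reads.
best = resolve_by_context([180, 5100, 90000], [100, 150, 200, 5000], window=100)
```

In this toy setting the candidate at 180 wins because three placed reads fall inside its window, whereas a fewest-mismatches rule would ignore that evidence entirely.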


Subject(s)
Algorithms , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Animals , Genome , Humans , Mice , RNA Splicing , Transcriptome
13.
Bioinformatics ; 27(13): i366-73, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21685094

ABSTRACT

MOTIVATION: Current gene set enrichment approaches do not take interactions and associations between set members into account. Mutual activation and inhibition causing positive and negative correlation among set members are thus neglected. As a consequence, inconsistent regulations and contextless expression changes are reported and, thus, the biological interpretation of the result is impeded. RESULTS: We analyzed established gene set enrichment methods and their result sets in a large-scale investigation of 1000 expression datasets. The reported statistically significant gene sets exhibit only average consistency between the observed patterns of differential expression and known regulatory interactions. We present Gene Graph Enrichment Analysis (GGEA) to detect consistently and coherently enriched gene sets, based on prior knowledge derived from directed gene regulatory networks. Firstly, GGEA improves the concordance of pairwise regulation with individual expression changes in respective pairs of regulating and regulated genes, compared with set enrichment methods. Secondly, GGEA yields result sets where a large fraction of relevant expression changes can be explained by nearby regulators, such as transcription factors, again improving on set-based methods. Thirdly, we demonstrate in additional case studies that GGEA can be applied to human regulatory pathways, where it sensitively detects very specific regulation processes, which are altered in tumors of the central nervous system. GGEA significantly increases the detection of gene sets where measured positively or negatively correlated expression patterns coincide with directed inducing or repressing relationships, thus facilitating further interpretation of gene expression data. AVAILABILITY: The method and accompanying visualization capabilities have been bundled into an R package and tied to a graphical user interface, the Galaxy workflow environment, that is running as a web server. CONTACT: Ludwig.Geistlinger@bio.ifi.lmu.de; Ralf.Zimmer@bio.ifi.lmu.de.
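
The notion of consistency between "directed inducing or repressing relationships" and measured expression changes can be made concrete with a simple sign check. The sketch below is an illustrative simplification and not the GGEA scoring scheme, which the abstract does not specify; the edge encoding (+1 for activation, -1 for inhibition) and all function names are assumptions.

```python
def sign(x):
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def edge_consistency(effect, fc_regulator, fc_target):
    """+1 if the expression changes agree with the edge, -1 if they contradict it.

    effect is +1 for an activating edge and -1 for an inhibiting edge;
    the fold changes are assumed to be log-scale (positive = up-regulated).
    """
    return effect * sign(fc_regulator) * sign(fc_target)


def gene_set_consistency(edges, fold_changes, gene_set):
    """Mean edge consistency over edges with both endpoints in the gene set."""
    scores = [
        edge_consistency(effect, fold_changes[reg], fold_changes[tgt])
        for reg, tgt, effect in edges
        if reg in gene_set and tgt in gene_set
        and reg in fold_changes and tgt in fold_changes
    ]
    return sum(scores) / len(scores) if scores else 0.0


# A activates B, A inhibits C, B activates C; A and B are up, C is down.
edges = [("A", "B", 1), ("A", "C", -1), ("B", "C", 1)]
fold_changes = {"A": 2.0, "B": 1.5, "C": -1.0}
score = gene_set_consistency(edges, fold_changes, {"A", "B", "C"})
```

Here two of the three edges are consistent with the measurements and one is not, so the set receives an intermediate score; a purely set-based method would not distinguish this case from a fully consistent one.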


Subject(s)
Gene Expression Profiling , Neoplasms, Nerve Tissue/genetics , Neoplasms, Nerve Tissue/metabolism , Software , Algorithms , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Proteins/genetics , Signal Transduction
14.
Bioinformatics ; 26(18): i474-81, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20823310

ABSTRACT

SUMMARY: The identification of good protein structure models and their appropriate ranking is a crucial problem in structure prediction and fold recognition. For many alignment methods, rescoring of alignment-induced models using structural information can improve the separation of useful and less useful models as compared with the alignment score. Vorescore, a template-based protein structure model rescoring system, is introduced. The method scores the model structure against the template used for the modeling using Vorolign. The method works on models from different alignment methods and incorporates both knowledge from the prediction method and the rescoring. RESULTS: The performance of Vorescore is evaluated in a large-scale and difficult protein structure prediction context. We use different threading methods to create models for 410 targets, in three scenarios: (i) family members are contained in the template set; (ii) superfamily members (but no family members); and (iii) only fold members (but no family or superfamily members). In all cases Vorescore improves significantly (e.g. 40% on both Gotoh and HHalign at the fold level) on the model quality, and clearly outperforms the state-of-the-art physics-based model scoring system Rosetta. Moreover, Vorescore improves on other successful rescoring approaches such as Pcons and ProQ. In an additional experiment we add high-quality models based on structural alignments to the set, which allows Vorescore to improve the fold recognition rate by another 50%. AVAILABILITY: All models of the test set (about 2 million, 44 GB gzipped) are available upon request.


Subject(s)
Protein Conformation , Protein Folding , Proteins/chemistry , Software , Models, Molecular , Models, Structural , Sequence Alignment , Sequence Homology, Amino Acid
15.
BMC Bioinformatics ; 11: 135, 2010 Mar 16.
Article in English | MEDLINE | ID: mdl-20233441

ABSTRACT

BACKGROUND: MicroRNAs have been discovered as important regulators of gene expression. To identify the target genes of microRNAs, several databases and prediction algorithms have been developed. Only a few experimentally confirmed microRNA targets are available in databases. Many of the microRNA targets stored in databases were derived from large-scale experiments that are considered not very reliable. We propose to use text mining of publication abstracts for extracting microRNA-gene associations including microRNA-target relations to complement current repositories. RESULTS: The microRNA-gene association database miRSel combines text-mining results with existing databases and computational predictions. Text mining enables the reliable extraction of microRNA, gene and protein occurrences as well as their relationships from texts. Thereby, we increased the number of human, mouse and rat miRNA-gene associations by at least three-fold as compared to e.g. TarBase, a resource for miRNA-gene associations. CONCLUSIONS: Our database miRSel offers the currently largest collection of literature-derived miRNA-gene associations. Comprehensive collections of miRNA-gene associations are important for the development of miRNA target prediction tools and the analysis of regulatory networks. miRSel is updated daily and can be queried using a web-based interface via microRNA identifiers, gene and protein names, PubMed queries as well as gene ontology (GO) terms. miRSel is freely available online at http://services.bio.ifi.lmu.de/mirsel.


Subject(s)
Data Mining/methods , MicroRNAs/genetics , Software , Computational Biology/methods , Databases, Genetic , Internet , PubMed , Sequence Analysis, RNA
16.
Nucleic Acids Res ; 36(2): 550-8, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18055499

ABSTRACT

Alternative splicing is thought to be one of the major sources for functional diversity in higher eukaryotes. Interestingly, when mapping splicing events onto protein structures, about half of the events affect structured and even highly conserved regions, i.e. are non-trivial on the structure level. This has led to the controversial hypothesis that such splice variants result in nonsense-mediated mRNA decay or non-functional, unstructured proteins, which do not contribute to the functional diversity of an organism. Here we show in a comprehensive study on alternative splicing that proteins appear to be much more tolerant to structural deletions, insertions and replacements than previously thought. We find literature evidence that such non-trivial splicing isoforms exhibit different functional properties compared to their native counterparts and allow for interesting regulatory patterns on the protein network level. We provide examples that splicing events may represent transitions between different folds in the protein sequence-structure space and explain these links by a common genetic mechanism. Taken together, those findings hint at a more prominent role of splicing in protein structure evolution and at a different view of phenotypic plasticity of protein structures.


Subject(s)
Alternative Splicing , Evolution, Molecular , Protein Isoforms/chemistry , Protein Isoforms/genetics , Databases, Protein , Exons , Models, Molecular , Oligonucleotide Array Sequence Analysis , Protein Folding , Protein Structure, Secondary
17.
BMC Struct Biol ; 9: 23, 2009 Apr 17.
Article in English | MEDLINE | ID: mdl-19374763

ABSTRACT

BACKGROUND: SCOP and CATH are widely used as gold standards to benchmark novel protein structure comparison methods as well as to train machine learning approaches for protein structure classification and prediction. The two hierarchies result from different protocols which may result in differing classifications of the same protein. Ignoring such differences leads to problems when being used to train or benchmark automatic structure classification methods. Here, we propose a method to compare SCOP and CATH in detail and discuss possible applications of this analysis. RESULTS: We create a new mapping between SCOP and CATH and define a consistent benchmark set which is shown to largely reduce errors made by structure comparison methods such as TM-Align and has useful further applications, e.g. for machine learning methods being trained for protein structure classification. Additionally, we extract additional connections in the topology of the protein fold space from the orthogonal features contained in SCOP and CATH. CONCLUSION: Via an all-to-all comparison, we find that there are large and unexpected differences between SCOP and CATH w.r.t. their domain definitions as well as their hierarchic partitioning of the fold space on every level of the two classifications. A consistent mapping of SCOP and CATH can be exploited for automated structure comparison and classification. AVAILABILITY: Benchmark sets and an interactive SCOP-CATH browser are available at http://www.bio.ifi.lmu.de/SCOPCath.


Subject(s)
Databases, Protein , Protein Conformation , Proteins/chemistry , Computational Biology/methods , Databases, Protein/standards , Pattern Recognition, Automated , Protein Folding , Protein Structure, Tertiary , Software
18.
Bioinformatics ; 24(16): i98-104, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18689847

ABSTRACT

MOTIVATION: Protein structure comparison exhibits differences and similarities of proteins and protein families and may help to elucidate protein sequence and structure evolution. Despite many methods to score protein structure similarity with and without flexibility and to align proteins accurately based on their structures, a meaningful evolutionary distance measure and alignment method which models the cost of mutations, insertions and deletions occurring in protein sequences on the structure level is still missing. RESULTS: Here, we introduce a new measure for protein structure similarity and propose a novel method called phenotypic plasticity method (PPM) which explicitly tries to model the evolutionary distance of two proteins on the structure level by measuring the cost of 'morphing' one structure into the other one. PPM aligns protein structures taking variations naturally observed in groups of structures ('phenotypic plasticity') into account while preserving the overall topological arrangement of the structures. The performance of PPM in detecting similarities between protein structures is evaluated against well-known structure classification methods on two benchmark sets. The larger set consists of more than 3.6 million structure pairs from the SCOP database which are also consistently classified in CATH. In the current parameterization, PPM already performs comparably to or better than other methods such as TM-Align and Vorolign on those two sets according to various evaluation criteria, showing that the method is able to reliably classify known protein structures, to detect their similarities and to compute accurate alignments despite phenotypic plasticity. AVAILABILITY: Executables are available upon request. Datasets and supplementary data (datasets and superpositions) can be accessed on http://www.bio.ifi.lmu.de/PPM.


Subject(s)
Evolution, Molecular , Models, Chemical , Models, Genetic , Proteins/genetics , Proteins/metabolism , Sequence Alignment/methods , Sequence Analysis/methods , Algorithms , Amino Acid Sequence , Base Sequence , Computer Simulation , Genetic Variation/genetics , Models, Molecular , Molecular Sequence Data , Phenotype
19.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30821814

ABSTRACT

The stress response in the model organism Saccharomyces cerevisiae is a well-studied system for which many data sets are available. Already in 2000, it was discovered that yeast cells trigger a similar transcriptional response when different types of stress are applied. However, the exact regulatory mechanisms and differences between the different types of stress are still not understood. Here, we present the Yeast Environmental Stress database (YESdb), a database containing all high-throughput experiments measuring various kinds of stress in yeast. The goal of the database is to allow the user to execute complex, integrative analyses of selected data sets, e.g. the comparison of measurements of the same stress using different platforms or differences between strains, stress strengths or types of stress. The analyses can be visualized in various ways and can be compiled into interactive reports to summarize and communicate the results. The data sets are available as differential conditions (typically stressed vs control), which are grouped to time or concentration series when multiple measurements over time or concentrations are done in one experiment. An annotation ontology has been constructed to annotate the data sets with the type, duration and strength of the applied stress, the used strain and experimental platform as well as the publication date. These annotations can easily be combined to select all relevant data sets for an analysis. YESdb allows users to construct and execute Petri net-based workflows to perform predefined and custom analyses. For example, to compare two types of stress (e.g. salt vs oxidative stress), the corresponding data sets are selected from the database, the consistently changed genes are defined and combined and the shared genes are characterized by enrichment analysis. A broad collection of visualizations is available, most of which are also interactive. The results of all analyses can be summarized in an interactive report. Visualizations of individual steps (transitions) of YESdb workflows can be automatically added to this report, and customized visualizations as well as interpretive text can be added manually. Overall, YESdb aims at making all published data sets on yeast stress immediately available and comparable for integrated analysis of data sets and sets of genes in order to identify and assess hypotheses and mechanisms.
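
The comparison of two stress types described in the abstract can be sketched in a few lines. This is not YESdb code. It assumes that "consistently changed" means changed in the same direction in both stress types, and it uses a one-sided hypergeometric test as one common choice for enrichment analysis. All gene names, the gene set and the background size are hypothetical.

```python
from math import comb


def shared_changed_genes(changes_a, changes_b):
    """Genes changed in the same direction ("up"/"down") in both conditions."""
    return {g for g, direction in changes_a.items()
            if changes_b.get(g) == direction}


def hypergeom_pvalue(population, annotated, sample, hits):
    """One-sided enrichment p-value P(X >= hits) for a hypergeometric X.

    population: number of background genes
    annotated:  background genes belonging to the gene set
    sample:     number of selected (shared) genes
    hits:       selected genes belonging to the gene set
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total


# Toy differential conditions (hypothetical genes and directions).
salt = {"G1": "up", "G2": "up", "G3": "down", "G4": "up"}
oxidative = {"G1": "up", "G2": "down", "G3": "down", "G5": "up"}

shared = shared_changed_genes(salt, oxidative)   # G1 and G3
gene_set = {"G1", "G3", "G6"}                    # hypothetical annotation term
p_value = hypergeom_pvalue(population=10, annotated=len(gene_set),
                           sample=len(shared), hits=len(shared & gene_set))
print(sorted(shared), p_value)
```

With a background of 10 genes, both shared genes fall into the three-gene set, so the p-value is 3/45. A real analysis would test many gene sets and correct for multiple testing.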


Subject(s)
Databases, Factual , Environment , Saccharomyces cerevisiae/physiology , Stress, Physiological , Data Curation , Internet , User-Computer Interface
20.
Biotechnol Biofuels ; 12: 243, 2019.
Article in English | MEDLINE | ID: mdl-31636702

ABSTRACT

BACKGROUND: One of the main obstacles preventing solventogenic clostridia from achieving higher yields in biofuel production is the toxicity of produced solvents. Unfortunately, regulatory mechanisms responsible for the shock response are poorly described on the transcriptomic level. Although the strain Clostridium beijerinckii NRRL B-598, a promising butanol producer, has been studied under different conditions in the past, its transcriptional response to a shock caused by butanol in the cultivation medium remains unknown. RESULTS: In this paper, we present the transcriptional response of the strain during a butanol challenge, caused by the addition of butanol to the cultivation medium at the very end of the acidogenic phase, using RNA-Seq. We resequenced and reassembled the genome sequence of the strain and prepared novel genome and gene ontology annotation to provide the most accurate results. When compared to samples under standard cultivation conditions, samples gathered during butanol shock represented a well-distinguished group. Using reference samples gathered directly before the addition of butanol, we identified genes that were differentially expressed in butanol challenge samples. We determined clusters of 293 down-regulated and 301 up-regulated genes whose expression was affected by the cultivation conditions. The enriched term "RNA binding" among down-regulated genes corresponded to the downturn of translation, and the cluster contained a group of small acid-soluble spore proteins. This explained the phenotype of the culture that had not sporulated. On the other hand, up-regulated genes were characterized by the term "protein binding", which corresponded to activation of heat-shock proteins that were identified within this cluster. CONCLUSIONS: We provided an overall transcriptional response of the strain C. beijerinckii NRRL B-598 to butanol shock, supplemented by auxiliary technologies, including high-pressure liquid chromatography and flow cytometry, to capture the corresponding phenotypic response. We identified genes whose regulation was affected by the addition of butanol to the cultivation medium and inferred related molecular functions that were significantly influenced. Additionally, using high-quality genome assembly and custom-made gene ontology annotation, we demonstrated that this settled terminology, widely used for the analysis of model organisms, could also be applied to non-model organisms and for research in the field of biofuels.
