Results 1 - 9 of 9
1.
Methods ; 59(1): S24-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23036331

ABSTRACT

In recent years, gene fusions have gained significant recognition as biomarkers. They can assist treatment decisions, are seldom found in normal tissue, and are detectable through next-generation sequencing (NGS) of the transcriptome (RNA-seq). To transform the data provided by the sequencer into robust gene fusion detection, several analysis steps are needed. Usually the first step is to map the sequenced transcript fragments (RNA-seq) to a reference genome. One standard application of this approach is to estimate expression and detect variants within known genes, e.g., SNPs and indels. In the case of gene fusions, however, completely novel gene structures have to be detected. Here, we describe the detection of such gene fusion events based on our comprehensive transcript annotation (ElDorado). To demonstrate the utility of our approach, we extract gene fusion candidates from eight breast cancer cell lines, which we compare to experimentally verified gene fusions. We discuss several gene fusion events, such as BCAS3-BCAS4, which was detected only in the breast cancer cell line MCF7. As supporting evidence, we show that gene fusions occur more frequently in copy number enriched regions (CNV analysis). In addition, we present the Transcriptome Viewer (TViewer), a tool for interactively visualizing gene fusions. Finally, we support detected gene fusions through literature-mining-based annotations and network analyses. In conclusion, we present a platform that allows detecting gene fusions and supporting them through literature knowledge as well as rich visualization capabilities. This enables scientists to better understand molecular processes, biological functions and disease associations, which will ultimately lead to better biomedical knowledge for the development of biomarkers for diagnostics and therapies.
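The mapping-based idea in the abstract above can be illustrated with a minimal sketch: read pairs whose two mates map to different annotated genes are counted as support for a fusion candidate. This is not the ElDorado-based pipeline of the paper; the gene names, coordinates, the `gene_at` and `fusion_candidates` helpers, and the `min_support` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical gene annotation: gene name -> (chromosome, start, end).
# Coordinates are invented for illustration only.
GENES = {
    "GENE_A": ("chr17", 1000, 5000),
    "GENE_B": ("chr20", 2000, 8000),
    "GENE_C": ("chr1", 100, 900),
}


def gene_at(chrom, pos, genes=GENES):
    """Return the name of the annotated gene covering a position, or None."""
    for name, (g_chrom, start, end) in genes.items():
        if g_chrom == chrom and start <= pos <= end:
            return name
    return None


def fusion_candidates(read_pairs, min_support=2):
    """Count read pairs whose mates map to two different annotated genes.

    Only gene pairs supported by at least `min_support` read pairs are kept.
    """
    support = Counter()
    for (chrom1, pos1), (chrom2, pos2) in read_pairs:
        g1, g2 = gene_at(chrom1, pos1), gene_at(chrom2, pos2)
        if g1 and g2 and g1 != g2:
            # Sort the names so that A-B and B-A count as the same candidate.
            support[tuple(sorted((g1, g2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


pairs = [
    (("chr17", 1500), ("chr20", 2500)),
    (("chr20", 3000), ("chr17", 4000)),
    (("chr17", 1200), ("chr17", 1300)),  # both mates in the same gene
    (("chr1", 500), ("chr20", 7000)),    # only one supporting pair
]
print(fusion_candidates(pairs))  # {('GENE_A', 'GENE_B'): 2}
```

A real pipeline would additionally need split-read evidence, breakpoint localization and filtering against mapping artifacts, none of which is modeled here.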


Subject(s)
Chromosome Mapping/methods , Oncogene Proteins, Fusion/genetics , Biomarkers, Tumor/genetics , Cell Line, Tumor , DNA Copy Number Variations , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation/methods , Sequence Analysis, DNA
2.
Nucleic Acids Res ; 40(6): 2668-82, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22121224

ABSTRACT

TDP-43 is linked to neurodegenerative diseases including frontotemporal dementia and amyotrophic lateral sclerosis. Mostly localized in the nucleus, TDP-43 acts in conjunction with other ribonucleoproteins as a splicing co-factor. Several RNA targets of TDP-43 have been identified so far, but its role(s) in pathogenesis remains unclear. Using Affymetrix exon arrays, we have screened for the first time for splicing events upon TDP-43 knockdown. We found alternative splicing of the ribosomal S6 kinase 1 (S6K1) Aly/REF-like target (SKAR) upon TDP-43 knockdown in non-neuronal and neuronal cell lines. Alternative SKAR splicing depended on the first RNA recognition motif (RRM1) of TDP-43 and on 5'-GA-3' and 5'-UG-3' repeats within the SKAR pre-mRNA. SKAR is a component of the exon junction complex, which recruits S6K1, thereby facilitating the pioneer round of translation and promoting cell growth. Indeed, we found that expression of the alternatively spliced SKAR enhanced S6K1-dependent signaling pathways and the translational yield of a splice-dependent reporter. Consistent with this, TDP-43 knockdown also increased translational yield and significantly increased cell size. This indicates a novel mechanism of deregulated translational control upon TDP-43 deficiency, which might contribute to pathogenesis of the protein aggregation diseases frontotemporal dementia and amyotrophic lateral sclerosis.


Subject(s)
Alternative Splicing , DNA-Binding Proteins/physiology , Nuclear Proteins/genetics , Protein Biosynthesis , RNA-Binding Proteins/physiology , Cell Line , DNA-Binding Proteins/antagonists & inhibitors , DNA-Binding Proteins/metabolism , Exons , Humans , Nuclear Proteins/metabolism , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA-Binding Proteins/antagonists & inhibitors , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Repetitive Sequences, Nucleic Acid , Transfection
3.
BMC Bioinformatics ; 8: 334, 2007 Sep 12.
Article in English | MEDLINE | ID: mdl-17850657

ABSTRACT

BACKGROUND: Cells dynamically adapt their gene expression patterns in response to various stimuli. This response is orchestrated into a number of gene expression modules consisting of co-regulated genes. A growing pool of publicly available microarray datasets allows the identification of modules by monitoring expression changes over time. These time-series datasets can be searched for gene expression modules by one of the many clustering methods published to date. For an integrative analysis, several time-series datasets can be joined into a three-dimensional gene-condition-time dataset, to which standard clustering or biclustering methods are, however, not applicable. We thus devise a probabilistic clustering algorithm for gene-condition-time datasets. RESULTS: In this work, we present the EDISA (Extended Dimension Iterative Signature Algorithm), a novel probabilistic clustering approach for 3D gene-condition-time datasets. Based on mathematical definitions of gene expression modules, the EDISA samples initial modules from the dataset, which are then refined by removing genes and conditions until they comply with the module definition. A subsequent extension step ensures gene and condition maximality. We applied the algorithm to a synthetic dataset and were able to successfully recover the implanted modules over a range of background noise intensities. Analysis of microarray datasets has led us to define three biologically relevant module types: 1) We found modules with independent response profiles to be the most prevalent ones. These modules comprise genes which are co-regulated under several conditions, yet with a different response pattern under each condition. 2) Coherent modules with similar responses under all conditions occurred frequently, too, and were often contained within modules of the first type. 3) A third module type, which covers a response specific to a single condition, was also detected, but rarely. All of these modules are essentially different types of biclusters. CONCLUSION: We successfully applied the EDISA to different 3D datasets. While previous studies were mostly aimed at detecting coherent modules only, our results show that coherent responses are often part of a more general module type with independent response profiles under different conditions. Our approach thus allows for a more comprehensive view of the gene expression response. After subsequent analysis of the resulting modules, the EDISA helped to shed light on the global organization of transcriptional control. An implementation of the algorithm is available at http://www-ra.informatik.uni-tuebingen.de/software/IAGEN/.
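The refinement step described above (removing genes until the remaining ones comply with a module definition) can be sketched for a single condition as follows. This is not the EDISA itself: the module definition used here (Pearson correlation of each gene's time profile with the module mean above a fixed `threshold`), the `refine_module` function and the nested-dictionary data layout are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def refine_module(data, genes, condition, threshold=0.9):
    """Drop the worst-fitting gene until all remaining genes fit the module.

    data[gene][condition] is a list of expression values over time.
    A gene "fits" if its profile correlates with the module's mean profile
    at or above `threshold` (an assumed, simplified module definition).
    """
    genes = list(genes)
    while len(genes) > 1:
        profiles = [data[g][condition] for g in genes]
        mean = [sum(col) / len(col) for col in zip(*profiles)]
        scores = {g: pearson(data[g][condition], mean) for g in genes}
        worst = min(genes, key=scores.get)
        if scores[worst] >= threshold:
            break  # every remaining gene complies with the definition
        genes.remove(worst)
    return genes


# Toy gene-condition-time dataset with one condition and four time points.
data = {
    "g1": {"heat": [1, 2, 3, 4]},
    "g2": {"heat": [2, 4, 6, 8]},
    "g3": {"heat": [4, 3, 2, 1]},  # anti-correlated with g1 and g2
}
print(refine_module(data, ["g1", "g2", "g3"], "heat"))  # ['g1', 'g2']
```

Extending this to several conditions, removing conditions as well as genes, and adding the subsequent extension step would be needed to approach the procedure the abstract describes.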


Subject(s)
Algorithms , Gene Expression Profiling/methods , Models, Biological , Multigene Family/physiology , Pattern Recognition, Automated/methods , Proteome/metabolism , Signal Transduction/physiology , Cluster Analysis , Computer Simulation , Software
4.
Genome Biol ; 15(3): R53, 2014 Mar 25.
Article in English | MEDLINE | ID: mdl-24667040

ABSTRACT

BACKGROUND: There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. RESULTS: A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. CONCLUSIONS: The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.


Subject(s)
Databases, Genetic/standards , Genetic Testing/methods , Genomics/methods , Peer Review, Research , Sequence Analysis, DNA/methods , Child , Female , Financing, Organized , Genetic Testing/economics , Genetic Testing/standards , Genomics/economics , Genomics/standards , Heart Defects, Congenital/diagnosis , Heart Defects, Congenital/genetics , Humans , Male , Myopathies, Structural, Congenital/diagnosis , Myopathies, Structural, Congenital/genetics , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/standards
5.
PLoS One ; 5(11): e13876, 2010 Nov 30.
Article in English | MEDLINE | ID: mdl-21152420

ABSTRACT

Today, annotated amino acid sequences of more and more transcription factors (TFs) are readily available. Quantitative information about their DNA-binding specificities, however, is hard to obtain. Position frequency matrices (PFMs), the most widely used models to represent binding specificities, are experimentally characterized only for a small fraction of all TFs. Even for some of the most intensively studied eukaryotic organisms (i.e., human, rat and mouse), only roughly one-sixth of all proteins with an annotated DNA-binding domain have been characterized experimentally. Here, we present a new method based on support vector regression for predicting quantitative DNA-binding specificities of TFs in different eukaryotic species. This approach estimates a quantitative measure for the PFM similarity of two proteins, based on various features derived from their protein sequences. The method is trained and tested on a dataset containing 1,239 TFs with known DNA-binding specificity, and used to predict specific DNA target motifs for 645 TFs with high accuracy.
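To make the notion of a PFM and of "PFM similarity" concrete, the following sketch builds a position frequency matrix from aligned binding sites and compares two matrices column by column. The abstract does not specify the similarity measure used as the regression target, so the mean per-column cosine similarity below is an assumption, as are the function names and the example sites; the support vector regression step itself is not shown.

```python
from math import sqrt

BASES = "ACGT"


def position_frequency_matrix(sites):
    """Count each base at each position of equally long, aligned sites."""
    length = len(sites[0])
    pfm = [{base: 0 for base in BASES} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site):
            pfm[i][base] += 1
    return pfm


def column_cosine(col_a, col_b):
    """Cosine similarity between two columns of base counts."""
    dot = sum(col_a[b] * col_b[b] for b in BASES)
    norm_a = sqrt(sum(col_a[b] ** 2 for b in BASES))
    norm_b = sqrt(sum(col_b[b] ** 2 for b in BASES))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pfm_similarity(pfm_a, pfm_b):
    """Mean per-column cosine similarity of two PFMs of equal length.

    This is one simple choice of similarity, assumed for illustration;
    it ignores shifted or reverse-complement alignments of the motifs.
    """
    if len(pfm_a) != len(pfm_b):
        raise ValueError("PFMs must have the same number of columns")
    scores = [column_cosine(a, b) for a, b in zip(pfm_a, pfm_b)]
    return sum(scores) / len(scores)


pfm_1 = position_frequency_matrix(["ACGT", "ACGA", "ACGT"])
pfm_2 = position_frequency_matrix(["ACGT", "ACCT", "TCGT"])
print(pfm_1[3])                      # {'A': 1, 'C': 0, 'G': 0, 'T': 2}
print(pfm_similarity(pfm_1, pfm_2))  # value between 0 and 1
```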


Subject(s)
Algorithms , DNA-Binding Proteins/metabolism , DNA/metabolism , Transcription Factors/metabolism , Amino Acid Motifs/genetics , Amino Acid Sequence , Animals , Binding Sites/genetics , Binding, Competitive , Computational Biology/methods , DNA-Binding Proteins/genetics , Humans , Mice , Molecular Sequence Data , Protein Binding , Rats , Reproducibility of Results , Transcription Factors/genetics
6.
Algorithms Mol Biol ; 5: 28, 2010 Jun 25.
Article in English | MEDLINE | ID: mdl-20579369

ABSTRACT

BACKGROUND: Mass spectrometry (MS) based protein profiling has become one of the key technologies in biomedical research and biomarker discovery. One bottleneck in MS-based protein analysis is sample preparation and an efficient fractionation step to reduce the complexity of the biological samples, which are too complex to be analyzed directly with MS. Sample preparation strategies that reduce the complexity of tryptic digests by using immunoaffinity-based methods have been shown to lead to a substantial increase in throughput and sensitivity in the proteomic mass spectrometry approach. The limitation of using such immunoaffinity-based approaches is the availability of the appropriate peptide-specific capture antibodies. Recent developments in these approaches, where subsets of peptides with short identical terminal sequences can be enriched using antibodies directed against short terminal epitopes, promise a significant gain in efficiency. RESULTS: We show that the minimal set of terminal epitopes for the coverage of a target protein list can be found by formulating the task as a set cover problem, preceded by a filtering pipeline for the exclusion of peptides and target epitopes with undesirable properties. CONCLUSIONS: For small datasets (a few hundred proteins) it is possible to solve the problem to optimality with moderate computational effort using commercial or free solvers. Larger datasets, such as full proteomes, require the use of heuristics.
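For the heuristic case mentioned in the conclusions, the classic greedy set cover heuristic is one plausible option; the abstract does not say which heuristics the authors used. In the sketch below, the epitope strings, protein identifiers and the `greedy_epitope_cover` function are hypothetical.

```python
def greedy_epitope_cover(epitope_to_proteins, targets):
    """Greedy set cover: repeatedly pick the epitope covering most uncovered targets.

    epitope_to_proteins maps a terminal epitope to the set of target
    proteins that yield at least one peptide carrying that epitope.
    Returns the chosen epitopes and any targets that cannot be covered.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        best = max(
            epitope_to_proteins,
            key=lambda e: len(epitope_to_proteins[e] & uncovered),
        )
        gained = epitope_to_proteins[best] & uncovered
        if not gained:
            break  # no epitope covers any of the remaining targets
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Invented example: four candidate epitopes, five target proteins.
coverage = {
    "AAGK": {"P1", "P2", "P3"},
    "LLER": {"P3", "P4"},
    "GGSK": {"P4", "P5"},
    "TTVR": {"P5"},
}
chosen, missing = greedy_epitope_cover(coverage, {"P1", "P2", "P3", "P4", "P5"})
print(chosen, missing)  # ['AAGK', 'GGSK'] set()
```

The greedy choice does not guarantee a minimal set, which is why exact solvers remain preferable where the abstract reports them to be feasible (a few hundred proteins).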

7.
BMC Syst Biol ; 3: 67, 2009 Jun 30.
Article in English | MEDLINE | ID: mdl-19566957

ABSTRACT

BACKGROUND: Sensory proteins react to changing environmental conditions by transducing signals into the cell. These signals are integrated into core proteins that activate downstream target proteins such as transcription factors (TFs). This structure is referred to as a bow tie, and allows cells to respond appropriately to complex environmental conditions. Understanding this cellular processing of information, from sensory proteins (e.g., cell-surface proteins) to target proteins (e.g., TFs), is important, yet for many processes the signaling pathways remain unknown. RESULTS: Here, we present BowTieBuilder for inferring signal transduction pathways from multiple source and target proteins. Given protein-protein interaction (PPI) data, signaling pathways are assembled without knowledge of the intermediate signaling proteins while maximizing the overall probability of the pathway. To assess the inference quality, BowTieBuilder and three alternative heuristics are applied to several pathways, and the resulting pathways are compared to reference pathways taken from KEGG. In addition, BowTieBuilder is used to infer a signaling pathway of the innate immune response in humans and a signaling pathway that potentially regulates an underlying gene regulatory network. CONCLUSION: We show that BowTieBuilder, given multiple source and/or target proteins, infers pathways with satisfactory recall and precision rates and detects the core proteins of each pathway.
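One building block for "maximizing the overall probability of the pathway" can be sketched for a single source and a single target: if each interaction carries a confidence value in (0, 1], the most probable path is the shortest path under edge weights of -log(p). This is not the BowTieBuilder algorithm, which handles multiple sources and targets; the protein names, probabilities, the undirected treatment of interactions and the `most_probable_path` function are assumptions.

```python
import heapq
from math import exp, log


def most_probable_path(edges, source, target):
    """Dijkstra on -log(probability) weights over an undirected PPI graph.

    edges maps (protein_a, protein_b) to an interaction probability in (0, 1].
    Returns (path, probability), or (None, 0.0) if no path exists.
    """
    graph = {}
    for (u, v), p in edges.items():
        weight = -log(p)
        graph.setdefault(u, []).append((v, weight))
        graph.setdefault(v, []).append((u, weight))

    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))

    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], exp(-best[target])


# Invented interaction confidences between a receptor "R" and a factor "TF".
ppi = {
    ("R", "A"): 0.9,
    ("A", "TF"): 0.8,
    ("R", "B"): 0.5,
    ("B", "TF"): 0.99,
    ("R", "TF"): 0.6,
}
path, probability = most_probable_path(ppi, "R", "TF")
print(path)  # ['R', 'A', 'TF']
```

Here the two-step route through "A" (0.9 × 0.8 = 0.72) beats the direct interaction (0.6), which shows why products of probabilities, rather than hop counts, drive the result.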


Subject(s)
Computational Biology/methods , Models, Biological , Signal Transduction , Cell Cycle , Databases, Genetic , Gene Regulatory Networks , Humans , Immunity, Innate , MAP Kinase Signaling System , Models, Molecular , Protein Conformation , Protein Interaction Mapping , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/metabolism
8.
BMC Syst Biol ; 3: 5, 2009 Jan 14.
Article in English | MEDLINE | ID: mdl-19144170

ABSTRACT

BACKGROUND: To understand the dynamic behavior of cellular systems, mathematical modeling is often necessary and comprises three steps: (1) experimental measurement of participating molecules, (2) assignment of rate laws to each reaction, and (3) parameter calibration with respect to the measurements. In each of these steps the modeler is confronted with a plethora of alternative approaches, e.g., the selection of approximate rate laws in step two, as specific equations are often unknown, or the choice of an estimation procedure with its specific settings in step three. This overall process with its numerous choices and the mutual influence between them makes it hard to single out the best modeling approach for a given problem. RESULTS: We investigate the modeling process using multiple kinetic equations together with various parameter optimization methods for a well-characterized example network, the biosynthesis of valine and leucine in C. glutamicum. For this purpose, we derive seven dynamic models based on generalized mass action, Michaelis-Menten and convenience kinetics as well as the stochastic Langevin equation. In addition, we introduce two modeling approaches for feedback inhibition to the mass action kinetics. The parameters of each model are estimated using eight optimization strategies. To determine the most promising modeling approaches together with the best optimization algorithms, we carry out a two-step benchmark: (1) coarse-grained comparison of the algorithms on all models and (2) fine-grained tuning of the best optimization algorithms and models. To analyze the space of the best parameters found for each model, we apply clustering, variance, and correlation analysis. CONCLUSION: A mixed model based on the convenience rate law and the Michaelis-Menten equation, in which all reactions are assumed to be reversible, is the most suitable deterministic modeling approach, followed by a reversible generalized mass action kinetics model. A Langevin model is advisable to take stochastic effects into account. To estimate the model parameters, three algorithms are particularly useful: for first attempts, the settings-free Tribes algorithm yields valuable results; particle swarm optimization and differential evolution provide significantly better results with appropriate settings.
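Step (2) of the modeling process, the assignment of rate laws, can be illustrated with two of the kinetics named above applied to a single irreversible reaction S → P. The parameter values, the explicit Euler integration and the function names are assumptions for illustration; the paper's models of valine and leucine biosynthesis, its reversible and convenience kinetics, and its optimization algorithms are not reproduced here.

```python
def mass_action_rate(k, substrate):
    """Mass action rate law for S -> P: v = k * [S]."""
    return k * substrate


def michaelis_menten_rate(vmax, km, substrate):
    """Irreversible Michaelis-Menten rate law: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def simulate(rate, s0, t_end=10.0, dt=0.01):
    """Integrate S -> P with the explicit Euler method.

    `rate` is a function of the current substrate concentration.
    Returns the final (substrate, product) concentrations.
    """
    substrate, product = s0, 0.0
    for _ in range(int(round(t_end / dt))):
        v = rate(substrate)
        substrate -= v * dt
        product += v * dt
    return substrate, product


# Hypothetical parameter values, chosen only to make the example run.
s_ma, p_ma = simulate(lambda s: mass_action_rate(0.5, s), s0=5.0)
s_mm, p_mm = simulate(lambda s: michaelis_menten_rate(2.0, 0.5, s), s0=5.0)
print(s_ma, p_ma)
print(s_mm, p_mm)
```

Parameter calibration (step 3) would wrap such a simulation in an objective function, for example the squared deviation from measured concentrations, and hand it to an optimizer such as particle swarm optimization or differential evolution.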


Subject(s)
Algorithms , Corynebacterium glutamicum/metabolism , Leucine/biosynthesis , Metabolic Networks and Pathways/physiology , Models, Biological , Valine/biosynthesis , Corynebacterium glutamicum/physiology , Kinetics
9.
BMC Syst Biol ; 2: 39, 2008 Apr 30.
Article in English | MEDLINE | ID: mdl-18447902

ABSTRACT

BACKGROUND: The development of complex biochemical models has been facilitated through the standardization of machine-readable representations like SBML (Systems Biology Markup Language). This effort is accompanied by the ongoing development of the human-readable diagrammatic representation SBGN (Systems Biology Graphical Notation). The graphical SBML editor CellDesigner allows direct translation of SBGN into SBML, and vice versa. For the assignment of kinetic rate laws, however, this process is not straightforward, as it often requires manual assembly and specific knowledge of kinetic equations. RESULTS: SBMLsqueezer facilitates exactly this modeling step via automated equation generation, overcoming the highly error-prone and cumbersome process of manually assigning kinetic equations. For each reaction the kinetic equation is derived from the stoichiometry, the participating species (e.g., proteins, mRNA or simple molecules) as well as the regulatory relations (activation, inhibition or other modulations) of the SBGN diagram. Such information allows distinctions between, for example, translation, phosphorylation or state transitions. The types of kinetics considered are numerous, for instance generalized mass-action, Hill, convenience and several Michaelis-Menten-based kinetics, each including activation and inhibition. These kinetics allow SBMLsqueezer to cover metabolic, gene regulatory, signal transduction and mixed networks. Whenever multiple kinetics are applicable to one reaction, parameter settings allow for user-defined specifications. After invoking SBMLsqueezer, the kinetic formulas are generated and assigned to the model, which can then be simulated in CellDesigner or with external ODE solvers. Furthermore, the equations can be exported to SBML, LaTeX or plain text format. CONCLUSION: SBMLsqueezer considers the annotation of all participating reactants, products and regulators when generating rate laws for reactions. Thus, for each reaction, only applicable kinetic formulas are considered. This modeling scheme creates kinetics in accordance with the diagrammatic representation. In contrast, most previously published tools have relied on the stoichiometry and generic modulators of a reaction, thus ignoring and potentially conflicting with the information expressed through the process diagram. Additional material and the source code can be found at the project homepage (URL found in the Availability and requirements section).
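The idea of deriving a kinetic equation from a reaction's stoichiometry can be shown in its simplest form, a plain-text mass-action formula. This sketch is not SBMLsqueezer's API or output format: the `mass_action_formula` function, the `kf_`/`kr_` parameter naming and the text syntax are hypothetical, and species annotations and regulatory relations, which the abstract stresses, are not considered.

```python
def mass_action_formula(reaction_id, reactants, products=None, reversible=False):
    """Build a plain-text mass-action rate law from stoichiometry.

    reactants and products map species names to stoichiometric coefficients.
    Parameter names (kf_<id>, kr_<id>) are invented for this example.
    """

    def term(prefix, species):
        factors = [f"{prefix}_{reaction_id}"]
        for name, stoichiometry in species.items():
            factors.append(name if stoichiometry == 1 else f"{name}^{stoichiometry}")
        return " * ".join(factors)

    formula = term("kf", reactants)
    if reversible:
        formula += " - " + term("kr", products or {})
    return formula


print(mass_action_formula("R1", {"A": 1, "B": 2}))
# kf_R1 * A * B^2
print(mass_action_formula("R1", {"A": 1, "B": 2}, {"C": 1}, reversible=True))
# kf_R1 * A * B^2 - kr_R1 * C
```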


Subject(s)
Chemistry, Organic/methods , Database Management Systems , User-Computer Interface , Algorithms , Gene Regulatory Networks , Hypermedia , Information Storage and Retrieval/methods , Kinetics , Metabolic Networks and Pathways , Models, Biological , Models, Chemical , Protein Interaction Mapping/methods , Signal Transduction , Systems Biology/methods