Results 1 - 19 of 19
1.
Appl Microbiol Biotechnol ; 100(2): 969-85, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26454869

ABSTRACT

Xylose is present with glucose in lignocellulosic streams available for valorisation to biochemicals. Saccharomyces cerevisiae has excellent characteristics as a host for the bioconversion, except that it strongly prefers glucose to xylose, and the co-consumption remains a challenge. Further, since xylose is not a natural substrate of S. cerevisiae, the regulatory response it induces in an engineered strain cannot be expected to have evolved for its utilisation. Xylose-induced effects on metabolism and gene expression during anaerobic growth of an engineered strain of S. cerevisiae on medium containing both glucose and xylose were quantified. The gene expression of S. cerevisiae with an XR-XDH pathway for xylose utilisation was analysed throughout the cultivation: at early cultivation times when mainly glucose was metabolised, at times when xylose was co-consumed in the presence of low glucose concentrations, and when glucose had been depleted and only xylose was being consumed. Cultivations on glucose as a sole carbon source were used as a control. Genome-scale dynamic flux balance analysis models were simulated to analyse the metabolic dynamics of S. cerevisiae. The simulations quantitatively estimated xylose-dependent flux dynamics and challenged the utilisation of the metabolic network. A relative increase in xylose utilisation was predicted to induce the bi-directionality of glycolytic flux and a redox challenge even at low glucose concentrations. Remarkably, xylose was observed to specifically delay the glucose-dependent repression of particular genes in mixed glucose-xylose cultures compared to glucose cultures. The delay occurred at a cultivation time when the metabolic flux activities were similar in both cultures.


Subject(s)
Disaccharides/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Xylose/metabolism , Anaerobiosis , Biomass , Culture Media/chemistry , Fermentation , Gene Expression , Genetic Engineering , Glucose/metabolism , Lignin/chemistry , Metabolic Networks and Pathways/genetics , Microarray Analysis , Saccharomyces cerevisiae/growth & development
2.
Appl Microbiol Biotechnol ; 100(17): 7549-63, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27102126

ABSTRACT

We describe here the identification and characterization of two novel enzymes belonging to the IlvD/EDD protein family, the D-xylonate dehydratase from Caulobacter crescentus, Cc XyDHT, (EC 4.2.1.82), and the L-arabonate dehydratase from Rhizobium leguminosarum bv. trifolii, Rl ArDHT (EC 4.2.1.25), that produce the corresponding 2-keto-3-deoxy-sugar acids. There is only a very limited amount of characterization data available on pentonate dehydratases, even though the enzymes from these oxidative pathways have potential applications with plant biomass pentose sugars. The two bacterial enzymes share 41 % amino acid sequence identity and were expressed and purified from Escherichia coli as homotetrameric proteins. Both dehydratases were shown to accept pentonate and hexonate sugar acids as their substrates and require Mg(2+) for their activity. Cc XyDHT displayed the highest activity on D-xylonate and D-gluconate, while Rl ArDHT functioned best on D-fuconate, L-arabonate and D-galactonate. The configurations of the OH groups at the C2 and C3 positions of the sugar acid were shown to be critical, and the C4 configuration also contributed substantially to the substrate recognition. The two enzymes were also shown to contain an iron-sulphur [Fe-S] cluster. Our phylogenetic analysis and mutagenesis studies demonstrated that the three conserved cysteine residues in the aldonic acid dehydratase group of IlvD/EDD family members, those of C60, C128 and C201 in Cc XyDHT, and of C59, C127 and C200 in Rl ArDHT, are needed for coordination of the [Fe-S] cluster. The iron-sulphur cluster was shown to be crucial for the catalytic activity (kcat) but not for the substrate binding (Km) of the two pentonate dehydratases.


Subject(s)
Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Caulobacter crescentus/enzymology , Hydro-Lyases/genetics , Hydro-Lyases/metabolism , Rhizobium leguminosarum/enzymology , Amino Acid Sequence , Arabinose/metabolism , Cloning, Molecular , Escherichia coli/genetics , Escherichia coli/metabolism , Gluconates/metabolism , Sequence Alignment , Xylose/metabolism
3.
Appl Microbiol Biotechnol ; 100(16): 7203-22, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27183995

ABSTRACT

The genomes of hybrid organisms, such as lager yeast (Saccharomyces cerevisiae × Saccharomyces eubayanus), contain orthologous genes, the functionality and effect of which may differ depending on their origin and copy number. How the parental subgenomes in lager yeast contribute to important phenotypic traits such as fermentation performance, aroma production, and stress tolerance remains poorly understood. Here, three de novo lager yeast hybrids with different ploidy levels (allodiploid, allotriploid, and allotetraploid) were generated through hybridization techniques without genetic modification. The hybrids were characterized in fermentations of both high gravity wort (15 °P) and very high gravity wort (25 °P), which were monitored for aroma compound and sugar concentrations. The hybrid strains with higher DNA content performed better during fermentation and produced higher concentrations of flavor-active esters in both worts. The hybrid strains also outperformed both the parent strains. Genome sequencing revealed that several genes related to the formation of flavor-active esters (ATF1, ATF2, EHT1, EEB1, and BAT1) were present in higher copy numbers in the higher ploidy hybrid strains. A direct relationship between gene copy number and transcript level was also observed. The measured ester concentrations and transcript levels also suggest that the functionality of the S. cerevisiae- and S. eubayanus-derived gene products differs. The results contribute to our understanding of the complex molecular mechanisms that determine phenotypes in lager yeast hybrids and are expected to facilitate targeted strain development through interspecific hybridization.


Subject(s)
Beer/microbiology , Chimera/genetics , Ethanol/metabolism , Fermentation/genetics , Saccharomyces cerevisiae/genetics , Chimera/growth & development , DNA, Fungal/genetics , Esters/analysis , Hybridization, Genetic , Organic Chemicals/analysis , Ploidies , Polymerase Chain Reaction , Polymorphism, Restriction Fragment Length , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/metabolism , Transcription, Genetic/genetics
4.
Metab Eng ; 31: 153-62, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26275749

ABSTRACT

Isoprene is a naturally produced hydrocarbon emitted into the atmosphere by green plants. It is also a constituent of synthetic rubber and a potential biofuel. Microbial production of isoprene can become a sustainable alternative to the prevailing chemical production of isoprene from petroleum. In this work, sequence homology searches were conducted to find novel isoprene synthases. Candidate sequences were functionally expressed in Escherichia coli and the desired enzymes were identified based on an isoprene production assay. The activity of three enzymes was shown for the first time: expression of the candidate genes from Ipomoea batatas, Mangifera indica, and Elaeocarpus photiniifolius resulted in isoprene formation. The Ipomoea batatas isoprene synthase produced the highest amounts of isoprene in all experiments, exceeding the isoprene levels obtained by the previously known Populus alba and Pueraria montana isoprene synthases that were studied in parallel as controls.


Subject(s)
Alkyl and Aryl Transferases/isolation & purification , Escherichia coli/genetics , Alkyl and Aryl Transferases/chemistry , Alkyl and Aryl Transferases/physiology , Amino Acid Sequence , Butadienes , Genome, Bacterial , Hemiterpenes/biosynthesis , Molecular Sequence Data , Pentanes , Sequence Homology
5.
PLoS Comput Biol ; 10(2): e1003465, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24516375

ABSTRACT

We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging on the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is lacking, and when reconstructing evolutionary distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with computational steady-state biomass production experiment, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.


Subject(s)
Fungi/genetics , Fungi/metabolism , Genome, Fungal , Metabolic Networks and Pathways , Algorithms , Biomass , Biotechnology , Computational Biology , Evolution, Molecular , Fungi/classification , Gene Knockout Techniques , Industrial Microbiology , Metabolic Networks and Pathways/genetics , Models, Biological , Models, Genetic , Models, Statistical , Phylogeny , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Species Specificity
6.
Appl Microbiol Biotechnol ; 99(22): 9439-47, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26264136

ABSTRACT

An open reading frame CC1225 from the Caulobacter crescentus CB15 genome sequence belongs to the Gfo/Idh/MocA protein family and has 47 % amino acid sequence identity with the glucose-fructose oxidoreductase from Zymomonas mobilis (Zm GFOR). We expressed the ORF CC1225 in the yeast Saccharomyces cerevisiae and used a yeast strain expressing the gene coding for Zm GFOR as a reference. Cell extracts of strains overexpressing CC1225 (renamed as Cc aaor) showed some Zm GFOR type of activity, producing D-gluconate and D-sorbitol when a mixture of D-glucose and D-fructose was used as substrate. However, the activity in the Cc aaor expressing strain was >100-fold lower compared to strains expressing Zm gfor. Interestingly, C. crescentus AAOR was clearly more efficient than the Zm GFOR in converting in vitro a single sugar substrate D-xylose (10 mM) to xylitol without an added cofactor, whereas this type of activity was very low with Zm GFOR. Furthermore, when cultured in the presence of D-xylose, the S. cerevisiae strain expressing Cc aaor produced nearly equal concentrations of D-xylonate and xylitol (12.5 g D-xylonate l(-1) and 11.5 g D-xylitol l(-1) from 26 g D-xylose l(-1)), whereas the control strain and strain expressing Zm gfor produced only D-xylitol (5 g l(-1)). Deletion of the gene encoding the major aldose reductase, Gre3p, did not affect xylitol production in the strain expressing Cc aaor, but decreased xylitol production in the strain expressing Zm gfor. In addition, expression of Cc aaor together with the D-xylonolactone lactonase-encoding gene xylC from C. crescentus slightly increased the final concentration and initial volumetric production rate of both D-xylonate and D-xylitol. These results suggest that C. crescentus AAOR is a novel type of oxidoreductase able to convert the single aldose substrate D-xylose to both its oxidized and reduced product.


Subject(s)
Aldehyde Reductase/isolation & purification , Aldehyde Reductase/metabolism , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Sugar Acids/metabolism , Xylitol/metabolism , Xylose/metabolism , Aldehyde Reductase/genetics , Caulobacter crescentus/enzymology , Caulobacter crescentus/genetics , Gluconates/metabolism , Glucose/metabolism , Oxidation-Reduction , Oxidoreductases/genetics , Oxidoreductases/metabolism , Phylogeny , Saccharomyces cerevisiae/metabolism , Sorbitol/metabolism , Zymomonas/enzymology , Zymomonas/genetics
7.
BMC Genomics ; 15: 763, 2014 Sep 05.
Article in English | MEDLINE | ID: mdl-25192596

ABSTRACT

BACKGROUND: Production of D-xylonate by the yeast S. cerevisiae provides an example of bioprocess development for sustainable production of value-added chemicals from cheap raw materials or side streams. Production of D-xylonate may lead to considerable intracellular accumulation of D-xylonate and to loss of viability during the production process. In order to understand the physiological responses associated with D-xylonate production, we performed transcriptome analyses during D-xylonate production by a robust recombinant strain of S. cerevisiae which produces up to 50 g/L D-xylonate. RESULTS: Comparison of the transcriptomes of the D-xylonate producing and the control strain showed considerably higher expression of the genes controlled by the cell wall integrity (CWI) pathway and of some genes previously identified as up-regulated in response to other organic acids in the D-xylonate producing strain. Increased phosphorylation of Slt2 kinase in the D-xylonate producing strain also indicated that D-xylonate production caused stress to the cell wall. Surprisingly, genes encoding proteins involved in translation, ribosome structure and RNA metabolism, processes which are commonly down-regulated under conditions causing cellular stress, were up-regulated during D-xylonate production, compared to the control. The overall transcriptional responses were, therefore, very dissimilar to those previously reported as being associated with stress, including stress induced by organic acid treatment or production. Quantitative PCR analyses of selected genes supported the observations made in the transcriptomic analysis. In addition, consumption of ethanol was slower and the level of trehalose was lower in the D-xylonate producing strain, compared to the control. 
CONCLUSIONS: The production of organic acids has a major impact on the physiology of yeast cells, but the transcriptional responses to the presence or production of different acids differ considerably, being much more diverse than responses to other stresses. D-Xylonate production apparently imposed considerable stress on the cell wall. Transcriptional data also indicated that activation of the PKA pathway occurred during D-xylonate production, leaving cells unable to adapt normally to stationary phase. This, together with intracellular acidification, probably contributes to cell death.


Subject(s)
Cell Wall/metabolism , Gene Expression Profiling/methods , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/physiology , Sugar Acids/metabolism , Gene Expression Regulation, Fungal , MAP Kinase Signaling System , Mitogen-Activated Protein Kinases/genetics , Mitogen-Activated Protein Kinases/metabolism , Molecular Sequence Data , Phosphorylation , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/metabolism , Sequence Analysis, RNA , Stress, Physiological , Xylose/metabolism
8.
BMC Biotechnol ; 14: 91, 2014 Oct 27.
Article in English | MEDLINE | ID: mdl-25344685

ABSTRACT

BACKGROUND: Trichoderma reesei is known as a good producer of industrial proteins but has hitherto been less successful in the production of therapeutic proteins. In order to elucidate the bottlenecks of heterologous protein production, human α-galactosidase A (GLA) was chosen as a model therapeutic protein. Fusion partners were designed to compare the effects of secretion using a cellobiohydrolase I (CBHI) carrier and intracellular production using a gamma zein peptide from maize (ZERA) which accumulates inside the endoplasmic reticulum (ER). The two strategies were compared on the basis of expression levels, purification performance, enzymatic activity, bioreactor cultivations, and transcriptional profiling. RESULTS: Constructs were cloned into the cbh1 locus of the T. reesei strain Rut-C30. The secretion and intracellular strains produced 20 mg/l and 636 mg/l of GLA respectively. Purifications of secreted product were accomplished using Strep-Tactin affinity columns and for intracellular product, a method was developed for gravity-based density separation and protein body solubilisation. The secreted protein had similar specific activity to that of the commercially available mammalian form. The intracellular version had 5-10-fold lower activity due to the enzyme's incompatibility with alkaline pH. The secretion strain achieved 10% lower total biomass than either the parental or the intracellular strain. The patterns of gene induction for intracellular and parental strains were similar, whereas the secretion strain had a broader spectrum of gene expression level changes. Identification of the genes involved indicated strong secretion stress in the secretion strain and to a lesser extent also in intracellular production. Genes involved in the unfolded protein response (UPR) and ER-associated degradation were induced by GLA production, including hac1, pdi1, prp1, cnx1, der1, and bap31.
CONCLUSIONS: Active human α-galactosidase could most effectively be produced intracellularly in Trichoderma reesei at >0.5 g/l by avoidance of the extracellular environment, although purification was challenging due to specific activity losses. Strain analysis revealed that in addition to the issues with secreted proteases, the processes of secretion stress including UPR and ER degradation remain as bottlenecks for heterologous protein production. Genetic engineering to eliminate these bottlenecks is the logical path towards establishing a strain capable of producing sensitive heterologous proteins.


Subject(s)
Protein Engineering/methods , alpha-Galactosidase/genetics , alpha-Galactosidase/metabolism , Humans , Protein Sorting Signals , Protein Transport , Secretory Pathway , Trichoderma/genetics
9.
Appl Microbiol Biotechnol ; 98(23): 9653-65, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25236800

ABSTRACT

Four potential dehydrogenases identified through literature and bioinformatic searches were tested for L-arabonate production from L-arabinose in the yeast Saccharomyces cerevisiae. The most efficient enzyme, annotated as a D-galactose 1-dehydrogenase from the pea root nodule bacterium Rhizobium leguminosarum bv. trifolii, was purified from S. cerevisiae as a homodimeric protein and characterised. We named the enzyme as a L-arabinose/D-galactose 1-dehydrogenase (EC 1.1.1.-), Rl AraDH. It belongs to the Gfo/Idh/MocA protein family, prefers NADP(+) but uses also NAD(+) as a cofactor, and showed highest catalytic efficiency (k cat/K m) towards L-arabinose, D-galactose and D-fucose. Based on nuclear magnetic resonance (NMR) and modelling studies, the enzyme prefers the α-pyranose form of L-arabinose, and the stable oxidation product detected is L-arabino-1,4-lactone which can, however, open slowly at neutral pH to a linear L-arabonate form. The pH optimum for the enzyme was pH 9, but use of a yeast-in-vivo-like buffer at pH 6.8 indicated that good catalytic efficiency could still be expected in vivo. Expression of the Rl AraDH dehydrogenase in S. cerevisiae, together with the galactose permease Gal2 for L-arabinose uptake, resulted in production of 18 g of L-arabonate per litre, at a rate of 248 mg of L-arabonate per litre per hour, with 86 % of the provided L-arabinose converted to L-arabonate. Expression of a lactonase-encoding gene from Caulobacter crescentus was not necessary for L-arabonate production in yeast.


Subject(s)
Arabinose/metabolism , Galactose Dehydrogenases/metabolism , Rhizobium leguminosarum/enzymology , Saccharomyces cerevisiae/metabolism , Sugar Acids/metabolism , Cloning, Molecular , Coenzymes/metabolism , Enzyme Stability , Galactose Dehydrogenases/chemistry , Galactose Dehydrogenases/genetics , Galactose Dehydrogenases/isolation & purification , Gene Expression , Hydrogen-Ion Concentration , Kinetics , Molecular Sequence Data , NAD/metabolism , NADP/metabolism , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/isolation & purification , Recombinant Proteins/metabolism , Rhizobium leguminosarum/metabolism , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA
10.
Microb Cell Fact ; 11: 134, 2012 Oct 04.
Article in English | MEDLINE | ID: mdl-23035824

ABSTRACT

BACKGROUND: Trichoderma reesei is a soft rot Ascomycota fungus utilised for industrial production of secreted enzymes, especially lignocellulose degrading enzymes. About 30 carbohydrate active enzymes (CAZymes) of T. reesei have been biochemically characterised. Genome sequencing has revealed a large number of novel candidates for CAZymes, thus increasing the potential for identification of enzymes with novel activities and properties. Plenty of data exists on the carbon source dependent regulation of the characterised hydrolytic genes. However, information on the expression of the novel CAZyme genes, especially on complex biomass material, is very limited. RESULTS: In this study, the CAZyme gene content of the T. reesei genome was updated and the annotations of the genes refined using both computational and manual approaches. Phylogenetic analysis was done to assist the annotation and to identify functionally diversified CAZymes. The analyses identified 201 glycoside hydrolase genes, 22 carbohydrate esterase genes and five polysaccharide lyase genes. Updated or novel functional predictions were assigned to 44 genes, and the phylogenetic analysis indicated further functional diversification within enzyme families or groups of enzymes. GH3 β-glucosidases, GH27 α-galactosidases and GH18 chitinases were especially functionally diverse. The expression of the lignocellulose degrading enzyme system of T. reesei was studied by cultivating the fungus in the presence of different inducing substrates and by subjecting the cultures to transcriptional profiling. The substrates included both defined and complex lignocellulose related materials, such as pretreated bagasse, wheat straw, spruce, xylan, Avicel cellulose and sophorose. The analysis revealed co-regulated groups of CAZyme genes, such as genes induced in all the conditions studied and also genes induced preferentially by a certain set of substrates. CONCLUSIONS: In this study, the CAZyme content of the T. reesei genome was updated, the discrepancies between the different genome versions and published literature were removed and the annotation of many of the genes was refined. Expression analysis of the genes gave information on the enzyme activities potentially induced by the presence of the different substrates. Comparison of the expression profiles of the CAZyme genes under the different conditions identified co-regulated groups of genes, suggesting common regulatory mechanisms for the gene groups.


Subject(s)
Lignin/metabolism , Trichoderma/genetics , Biomass , Cellulases/classification , Cellulases/genetics , Databases, Factual , Gene Expression Profiling , Genome, Fungal , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Phylogeny , Polysaccharide-Lyases/genetics , Polysaccharide-Lyases/metabolism , Substrate Specificity
11.
BMC Genomics ; 11: 441, 2010 Jul 19.
Article in English | MEDLINE | ID: mdl-20642838

ABSTRACT

BACKGROUND: Trichoderma reesei is the main industrial producer of cellulases and hemicellulases that are used to depolymerize biomass in a variety of biotechnical applications. Many of the production strains currently in use have been generated by classical mutagenesis. In this study we characterized genomic alterations in high-producing mutants of T. reesei by high-resolution array comparative genomic hybridization (aCGH). Our aim was to obtain genome-wide information which could be utilized for better understanding of the mechanisms underlying efficient cellulase production, and would enable targeted genetic engineering for improved production of proteins in general. RESULTS: We carried out an aCGH analysis of four high-producing strains (QM9123, QM9414, NG14 and Rut-C30) using the natural isolate QM6a as a reference. In QM9123 and QM9414 we detected a total of 44 previously undocumented mutation sites including deletions, chromosomal translocation breakpoints and single nucleotide mutations. In NG14 and Rut-C30 we detected 126 mutations of which 17 were new mutations not documented previously. Among these new mutations are the first chromosomal translocation breakpoints identified in NG14 and Rut-C30. We studied the effects of two deletions identified in Rut-C30 (a deletion of 85 kb in the scaffold 15 and a deletion in a gene encoding a transcription factor) on cellulase production by constructing knock-out strains in the QM6a background. Neither the 85 kb deletion nor the deletion of the transcription factor affected cellulase production. CONCLUSIONS: aCGH analysis identified dozens of mutations in each strain analyzed. The resolution was at the level of single nucleotide mutation. High-density aCGH is a powerful tool for genome-wide analysis of organisms with small genomes e.g. fungi, especially in studies where a large set of interesting strains is analyzed.


Subject(s)
Cellulase/biosynthesis , Comparative Genomic Hybridization/methods , Oligonucleotide Array Sequence Analysis/methods , Trichoderma/genetics , Trichoderma/metabolism , DNA, Fungal/genetics , Genomics , Oligonucleotide Probes/genetics , Polymorphism, Single Nucleotide , Sequence Deletion
12.
BMC Bioinformatics ; 8 Suppl 2: S11, 2007 May 03.
Article in English | MEDLINE | ID: mdl-17493249

ABSTRACT

BACKGROUND: Human endogenous retroviruses (HERVs) are surviving traces of ancient retrovirus infections and now reside within the human DNA. Recently HERV expression has been detected in both normal tissues and diseased patients. However, the activities (expression levels) of individual HERV sequences are mostly unknown. RESULTS: We introduce a generative mixture model, based on Hidden Markov Models, for estimating the activities of the individual HERV sequences from EST (expressed sequence tag) databases. We use the model to estimate the relative activities of 181 HERVs. We also empirically justify a faster heuristic method for HERV activity estimation and use it to estimate the activities of 2450 HERVs. The majority of the HERV activities were previously unknown. CONCLUSION: (i) Our methods estimate activity accurately based on experiments on simulated data. (ii) Our estimate on real data shows that 7% of the HERVs are active. The active ones are spread unevenly into HERV groups and relatively uniformly in terms of estimated age. HERVs with the retroviral env gene are more often active than HERVs without env. Few of the active HERVs have open reading frames for retroviral proteins.


Subject(s)
Algorithms , Chromosome Mapping/methods , Databases, Genetic , Evolution, Molecular , Expressed Sequence Tags , Genome, Viral/genetics , Retroviridae/genetics , Virus Activation/genetics , Humans , Markov Chains , Retroviridae/classification , Species Specificity
13.
PLoS One ; 11(7): e0159302, 2016.
Article in English | MEDLINE | ID: mdl-27441920

ABSTRACT

In this paper we apply machine learning methods for predicting protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state-of-the-art machine learning approaches, namely, multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, Input-Output Kernel Regression proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways in fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities.


Subject(s)
Fungal Proteins/metabolism , Machine Learning , Protein Interaction Mapping , Saccharomyces cerevisiae/metabolism , Secretory Pathway , Trichoderma/metabolism , Algorithms , Amino Acid Sequence , Databases, Protein , Evolution, Molecular , Fungal Proteins/chemistry , Genome, Fungal , Protein Interaction Maps , ROC Curve , Saccharomyces cerevisiae/genetics
14.
Biotechnol Biofuels ; 9: 252, 2016.
Article in English | MEDLINE | ID: mdl-27895706

ABSTRACT

BACKGROUND: Trichoderma reesei is one of the main sources of biomass-hydrolyzing enzymes for the biotechnology industry. There is a need for improving its enzyme production efficiency. The use of metabolic modeling for the simulation and prediction of this organism's metabolism is potentially a valuable tool for improving its capabilities. An accurate metabolic model is needed to perform metabolic modeling analysis. RESULTS: A whole-genome metabolic model of T. reesei has been reconstructed together with metabolic models of 55 related species using the metabolic model reconstruction algorithm CoReCo. The previously published CoReCo method has been improved to obtain better quality models. The main improvements are the creation of a unified database of reactions and compounds and the use of reaction directions as constraints in the gap-filling step of the algorithm. In addition, the biomass composition of T. reesei has been measured experimentally to build and include a specific biomass equation in the model. CONCLUSIONS: The improvements presented in this work on the CoReCo pipeline for metabolic model reconstruction resulted in higher-quality metabolic models compared with previous versions. A metabolic model of T. reesei has been created and is publicly available in the BIOMODELS database. The model contains a biomass equation, reaction boundaries and uptake/export reactions which make it ready for simulation. To validate the model, we demonstrate that the model is able to predict biomass production accurately and no stoichiometrically infeasible yields are detected. The new T. reesei model is ready to be used for simulations of protein production processes.

15.
Int J Neural Syst ; 15(3): 163-79, 2005 Jun.
Article in English | MEDLINE | ID: mdl-16013088

ABSTRACT

About 8 per cent of the human genome consists of human endogenous retroviral sequences (HERVs), which are remains from ancient infections. The HERVs may give rise to transcripts or affect the expression of human genes. The first step in understanding HERV function is to classify HERVs into families. In this work we study the relationships of existing HERV families and detect potentially new HERV families. A Median Self-Organizing Map (SOM), a SOM for non-vectorial data, is used to group and visualize a collection of 3661 HERVs. The SOM-based analysis is complemented with estimates of the reliability of the results. A novel trustworthiness visualization method is used to estimate which parts of the SOM visualization are reliable and which not. The reliability of extracted interesting HERV groups is verified by a bootstrap procedure suitable for SOM visualization-based analysis. The SOM detects a group of epsilonretroviral sequences and a group of ERV9, HERVW, and HUERSP3 sequences which suggests that ERV9 and HERVW sequences may have a common origin.


Subject(s)
Artificial Intelligence; Chromosome Mapping/methods; DNA/genetics; Endogenous Retroviruses/genetics; Genome, Human; Algorithms; Humans; Phylogeny; Reproducibility of Results
16.
BMC Bioinformatics ; 4: 48, 2003 Oct 13.
Article in English | MEDLINE | ID: mdl-14552657

ABSTRACT

BACKGROUND: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets. RESULTS: The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map were compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric. CONCLUSIONS: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric still improves it.


Subject(s)
Computer Graphics/standards; Gene Expression Profiling/standards; Oligonucleotide Array Sequence Analysis/standards; Animals; Cluster Analysis; Computer Graphics/trends; Gene Expression Profiling/methods; Gene Expression Profiling/statistics & numerical data; Gene Expression Regulation/genetics; Gene Expression Regulation, Fungal/genetics; Humans; Mice; Oligonucleotide Array Sequence Analysis/methods; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Sequence Homology, Nucleic Acid
17.
BMC Syst Biol ; 8: 16, 2014 Feb 14.
Article in English | MEDLINE | ID: mdl-24528924

ABSTRACT

BACKGROUND: Saccharomyces cerevisiae is able to adapt to a wide range of external oxygen conditions. Previously, oxygen-dependent phenotypes have been studied individually at the transcriptional, metabolite, and flux level. However, the regulation of cell phenotype occurs across the different levels of cell function. Integrative analysis of data from multiple levels of cell function in the context of a network of several known biochemical interaction types could enable identification of active regulatory paths not limited to a single level of cell function. RESULTS: The graph theoretical method called Enriched Molecular Path detection (EMPath) was extended to enable integrative utilization of transcription and flux data. The utility of the method was demonstrated by detecting paths associated with phenotype differences of S. cerevisiae under three different conditions of oxygen provision: 20.9%, 2.8% and 0.5%. The detection of molecular paths was performed in an integrated genome-scale metabolic and protein-protein interaction network. CONCLUSIONS: The molecular paths associated with the phenotype differences of S. cerevisiae under conditions of different oxygen provisions revealed paths of molecular interactions that could potentially mediate information transfer between processes that respond to the particular oxygen availabilities.


Subject(s)
Computational Biology/methods; Phenotype; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Transcription, Genetic; Cell Cycle; Down-Regulation; Fermentation; Gene Expression Regulation, Fungal; Oxygen; Saccharomyces cerevisiae/cytology
18.
PLoS One ; 7(3): e32235, 2012.
Article in English | MEDLINE | ID: mdl-22461885

ABSTRACT

A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.


Subject(s)
Computational Biology/methods; Neural Networks, Computer; Protein Structure, Secondary; Proteins/chemistry; Algorithms; Binding Sites; Membrane Proteins/chemistry; Reproducibility of Results
19.
PLoS One ; 4(4): e5179, 2009.
Article in English | MEDLINE | ID: mdl-19365549

ABSTRACT

BACKGROUND: Retroviral LTRs, paired or single, influence the transcription of both retroviral and non-retroviral genomic sequences. Vertebrate genomes contain many thousand endogenous retroviruses (ERVs) and their LTRs. Single LTRs are difficult to detect from genomic sequences without recourse to repetitiveness or presence in a proviral structure. Understanding of LTR structure increases understanding of LTR function, and of functional genomics. Here we develop models of orthoretroviral LTRs useful for detection in genomes and for structural analysis. PRINCIPAL FINDINGS: Although mutated, ERV LTRs are more numerous and diverse than exogenous retroviral (XRV) LTRs. Hidden Markov models (HMMs), and alignments based on them, were created for HML- (human MMTV-like), general-beta-, gamma- and lentiretroviruslike LTRs, plus a general-vertebrate LTR model. Training sets were XRV LTRs and RepBase LTR consensuses. The HML HMM was most sensitive and detected 87% of the HML LTRs in human chromosome 19 at 96% specificity. By combining all HMMs with a low cutoff, for screening, 71% of all LTRs found by RepeatMasker in chromosome 19 were found. HMM consensus sequences had a conserved modular LTR structure. Target site duplications (TG-CA), TATA (occasionally absent), an AATAAA box and a T-rich region were prominent features. Most of the conservation was located in, or adjacent to, R and U5, with evidence for stem loops. Several of the long HML LTRs contained long ORFs inserted after the second A rich module. HMM consensus alignment allowed comparison of functional features like transcriptional start sites (sense and antisense) between XRVs and ERVs. CONCLUSION: The modular conserved and redundant orthoretroviral LTR structure with three A-rich regions is reminiscent of structurally relaxed Giardia promoters. The five HMMs provided a novel broad range, repeat-independent, ab initio LTR detection, with prospects for greater generalisation, and insight into LTR structure, which may aid development of LTR-targeted pharmaceuticals.


Subject(s)
DNA, Viral/genetics; Retroviridae/genetics; Terminal Repeat Sequences; Algorithms; Animals; Base Sequence; DNA, Viral/chemistry; Gene Expression Regulation, Viral; Genome, Human; Genome, Viral; Humans; Mice; Molecular Sequence Data; Nucleic Acid Conformation; Open Reading Frames; Opossums/genetics; Sensitivity and Specificity