Results 1 - 18 of 18
1.
BMC Bioinformatics ; 23(1): 171, 2022 May 10.
Article in English | MEDLINE | ID: mdl-35538405

ABSTRACT

BACKGROUND: Archaea are a vast and unexplored domain. Bioinformatic techniques can help achieve higher-quality genome annotation in varied organisms. Archaeal promoter sequences are acted upon by a plethora of proteins. The structural conservation found at the binding sites of proteins such as TBP, TFB, and TFE aids RNAP-DNA stabilization and makes the archaeal promoter amenable to exploration by statistical and machine learning techniques. RESULTS AND DISCUSSION: In this study, experimentally verified promoter sequences of the organisms Haloferax volcanii, Sulfolobus solfataricus, and Thermococcus kodakarensis were converted into DNA duplex stability attributes (i.e. numerical variables) and classified with Artificial Neural Networks and an in-house statistical classification method, and tested with three forms of controls. Recognizing these promoters made it possible to validate unannotated promoter sequences in other organisms. As a result, the binding site of basal transcription factors was located through a DNA duplex stability codification. Additionally, the classification achieved satisfactory results (above 90%) across varied levels of control. CONCLUDING REMARKS: The classification models were employed to perform genomic annotation of the archaea Aciduliprofundum boonei and Thermofilum pendens, from which potential promoters have been identified and uploaded to public repositories.
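The conversion of promoter sequences into DNA duplex stability attributes mentioned in this abstract can be illustrated with a minimal sketch. It assumes a nearest-neighbor model in which each overlapping dinucleotide step is replaced by a free-energy value; the table below uses the unified nearest-neighbor values of SantaLucia (1998) as an assumption, since the abstract does not say which parameter set the authors used. The names `NN_DELTA_G` and `encode_stability` are hypothetical.

```python
# Nearest-neighbor free energies (kcal/mol at 37 °C) per dinucleotide step.
# ASSUMPTION: unified SantaLucia (1998) values; the study may use another table.
NN_DELTA_G = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def encode_stability(seq):
    """Map a DNA sequence to one stability value per overlapping dinucleotide.

    A sequence of length n yields n - 1 numerical attributes, which could
    then be fed to a classifier such as a neural network.
    """
    seq = seq.upper()
    values = []
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in NN_DELTA_G:
            raise ValueError(f"unsupported dinucleotide {step!r} at position {i}")
        values.append(NN_DELTA_G[step])
    return values


if __name__ == "__main__":
    # A TATA-like motif gives weak (less negative) stability values.
    print(encode_stability("TATAAA"))
```

AT-rich stretches such as a TATA box give less negative values than GC-rich stretches, which is the kind of signal a classifier trained on these attributes could pick up.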


Subject(s)
Archaea , Archaeal Proteins , Archaea/genetics , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Machine Learning , Promoter Regions, Genetic , Transcription, Genetic
2.
Antonie Van Leeuwenhoek ; 115(8): 1009-1029, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35678932

ABSTRACT

The genomes of two Penicillium strains were sequenced and analyzed in this study: strain 2HH, isolated from the digestive tract of an Anobium punctatum beetle larva in 1979, and the cellulase-hypersecretory strain S1M29, derived from strain 2HH by a long-term mutagenesis process. With these data, the strains were reclassified and insight was obtained into molecular features related to cellulase hyperproduction and the albino phenotype of the mutant. Both strains were previously identified as Penicillium echinulatum, and this investigation indicated that they should be reclassified. Phylogenetic and phenotype data showed that these strains represent a new Penicillium species in series Oxalica, for which the name Penicillium ucsense is proposed here. Six additional strains (SFC101850, SFCP10873, SFCP10886, SFCP10931, SFCP10932 and SFCP10933) collected from the marine environment in the Republic of Korea were also classified as this species, indicating a worldwide distribution of this new taxon. Compared to the closely related strain Penicillium oxalicum 114-2, the composition of cell wall-associated proteins of P. ucsense 2HH shows five fewer chitinases and considerable differences in the number of proteins related to β-D-glucan metabolism. The genomic comparison of 2HH and S1M29 highlighted single amino-acid substitutions in two major proteins (BGL2 and FlbA) that can be associated with the hyperproduction of cellulases. The study of melanin pathways shows that the S1M29 albino phenotype resulted from a single amino-acid substitution in the enzyme ALB1, a precursor of the 1,8-dihydroxynaphthalene (DHN)-melanin biosynthesis. Our study provides important knowledge towards understanding species distribution, molecular mechanisms, melanin production and cell wall biosynthesis of this new Penicillium species.


Subject(s)
Cellulase , Penicillium , Cellulase/genetics , Genomics , Melanins/metabolism , Penicillium/genetics , Phylogeny
3.
J Mol Recognit ; 32(5): e2770, 2019 05.
Article in English | MEDLINE | ID: mdl-30458580

ABSTRACT

Promoters are DNA sequences located upstream of the transcription start site of genes. In bacteria, the RNA polymerase enzyme requires additional subunits, called sigma factors (σ), to begin specific gene transcription in distinct environmental conditions. Currently, promoter prediction still poses many challenges due to the characteristics of these sequences. In this paper, the nucleotide content of Escherichia coli promoter sequences, related to five alternative σ factors, was analyzed by a machine learning technique in order to provide profiles according to the σ factor that recognizes them. For this, a clustering technique was applied, since it is a viable method for finding hidden patterns in a data set. As a result, 20 groups of sequences were formed, and, aided by the Weblogo tool, it was possible to determine sequence profiles. These patterns should be considered when implementing computational prediction tools. In addition, evidence was found of an overlap between the functions of the genes regulated by different σ factors, suggesting that DNA structural properties are also essential parameters for further studies.
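Sequence profiles of the kind drawn by logo tools are commonly based on per-position information content. The following is a minimal sketch of that calculation for one group of aligned promoter sequences; it assumes equal-length DNA sequences and omits the small-sample correction that logo tools usually apply. The function name `information_content` and the example sequences are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def information_content(aligned_seqs):
    """Return the information content (bits) at each alignment position.

    For DNA the maximum is 2 bits (one base always present); 0 bits means
    all four bases are equally frequent. No small-sample correction is used.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    n = len(aligned_seqs)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in aligned_seqs)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        profile.append(2.0 - entropy)
    return profile


if __name__ == "__main__":
    cluster = ["TATA", "TATA", "TACA", "TAGA"]  # illustrative sequences only
    print(information_content(cluster))
```

Positions with high information content are the conserved ones within a cluster, so comparing these profiles between clusters is one way to contrast groups of sequences.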


Subject(s)
Escherichia coli/enzymology , Escherichia coli/genetics , Promoter Regions, Genetic , Sigma Factor/genetics , Algorithms , Base Sequence , DNA-Directed RNA Polymerases/genetics , DNA-Directed RNA Polymerases/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Nucleotides/analysis , Sigma Factor/metabolism , Transcription, Genetic
4.
Biologicals ; 42(1): 22-8, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24172230

ABSTRACT

The advent of modern high-throughput sequencing has made it possible to generate vast quantities of genomic sequence data. However, the processing of this volume of information, including prediction of gene-coding and regulatory sequences, remains an important bottleneck in bioinformatics research. In this work, we integrated DNA duplex stability into the repertoire of a Neural Network (NN) capable of predicting promoter regions with augmented accuracy, specificity and sensitivity. We took our method beyond a simplistic analysis based on a single sigma subunit of RNA polymerase, incorporating the six main sigma subunits of Escherichia coli. The methodology successfully rediscovered known promoter sequences recognized by E. coli RNA polymerase subunits σ(24), σ(28), σ(32), σ(38), σ(54) and σ(70), with the highest accuracies for σ(28)- and σ(54)-dependent promoter sequences (values obtained were 80% and 78.8%, respectively). Furthermore, the discrimination of promoters according to the σ factor made it possible to extract functional commonalities for the genes expressed by each type of promoter. DNA duplex stability emerges as a distinctive feature that improves the recognition and classification of σ(28)- and σ(54)-dependent promoter sequences. The findings presented in this report underscore the usefulness of including DNA biophysical parameters in NN learning algorithms to increase accuracy, specificity and sensitivity in promoter prediction beyond what is accomplished based on sequence alone.


Subject(s)
DNA, Bacterial/genetics , Escherichia coli/genetics , Promoter Regions, Genetic , Sigma Factor/genetics
5.
NAR Genom Bioinform ; 6(1): lqae018, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38385146

ABSTRACT

The decreasing cost of whole genome sequencing has produced high volumes of genomic information that require annotation. The experimental identification of promoter sequences, pivotal for regulating gene expression, is a laborious and cost-prohibitive task. To expedite this, we introduce the Comprehensive Directory of Bacterial Promoters (CDBProm), a directory of in-silico predicted bacterial promoter sequences. We first identified that an Extreme Gradient Boosting (XGBoost) algorithm would distinguish promoters from random downstream regions with an accuracy of 87%. To capture distinctive promoter signals, we generated a second XGBoost classifier trained on the instances misclassified in our first classifier. The predictor of CDBProm is then fed with over 55 million upstream regions from more than 6000 bacterial genomes. Upon finding potential promoter sequences in upstream regions, each promoter is mapped to the genomic data of the organism, linking the predicted promoter with its coding DNA sequence, and identifying the function of the gene regulated by the promoter. The collection of bacterial promoters available in CDBProm enables the quantitative analysis of a plethora of bacterial promoters. Our collection with over 24 million promoters is publicly available at https://aw.iimas.unam.mx/cdbprom/.

6.
Sci Rep ; 13(1): 1763, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36720898

ABSTRACT

Archaea are a vast and unexplored cellular domain that thrive in a high diversity of environments, having central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, the appropriate regulation of their gene expression is essential. A key moment in regulating the genes responsible for the life maintenance of archaea is when transcription factor proteins bind to the promoter element. This DNA segment is conserved, which enables its exploration by machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were coded into DNA Duplex Stability. Afterwards, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known promoter sequences for validation. Our results showed that an AT-rich region around position -27 upstream (relative to the TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (-33), the PPE (at -10) and a position at +3, which provides a more understandable picture of how promoters are organized in all the archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences of 135 unannotated organisms, delivering regulatory region annotation of archaea on a scale never accomplished before (https://pcyt.unam.mx/gene-regulation/). We consider that this approach will be useful to understand how gene regulation is achieved in other organisms apart from the already established transcription factor binding sites.


Subject(s)
Artificial Intelligence , Machine Learning , Archaea/genetics , Promoter Regions, Genetic , Transcription Factors/genetics
7.
Pharmacol Biochem Behav ; 223: 173523, 2023 02.
Article in English | MEDLINE | ID: mdl-36731751

ABSTRACT

Approximately two-thirds of patients with major depressive disorder (MDD) fail to respond to conventional antidepressants, suggesting that additional mechanisms are involved in the MDD pathophysiology. In this scenario, the glutamatergic system represents a promising therapeutic target for treatment-resistant depression. To our knowledge, this is the first study using a semantic approach with systems biology to identify potential targets involved in the fast-acting antidepressant effects of ketamine and its enantiomers, as well as to identify specific targets of (R)-ketamine. We performed a systematic review, followed by a semantic analysis and functional gene enrichment to identify the main biological processes involved in the therapeutic effects of these agents. Protein-protein interaction networks were constructed, and the genes exclusively regulated by (R)-ketamine were explored. We found that the regulation of α-Amino-3-Hydroxy-5-Methyl-4-Isoxazolepropionic Acid (AMPA) receptor and N-methyl-d-aspartate (NMDA) receptor subunits, Postsynaptic Protein 95 (PSD-95), Brain Derived Neurotrophic Factor (BDNF), and Tyrosine Receptor Kinase B (TrkB) is shared by the three antidepressant agents, reinforcing the central role of the glutamatergic system and neurogenesis in their therapeutic effects. Differential regulation of Transforming Growth Factor Beta 1 (TGF-β1) receptors, Mitogen-Activated Protein Kinases (MAPKs), Receptor Activator of Nuclear Factor-Kappa Beta Ligand (RANKL), and Serotonin Transporter (SERT) seems to be particularly involved in (R)-ketamine antidepressant effects. Our data can help further studies investigating the relationship between these targets and the mechanisms of (R)-ketamine and searching for other therapeutic compounds that share the regulation of these specific biomolecules. Ultimately, this study could contribute to improving the rapid management of depressive-like symptoms with fewer detrimental side effects than ketamine and (S)-ketamine.


Subject(s)
Depressive Disorder, Major , Ketamine , Humans , Ketamine/pharmacology , Depression/drug therapy , Depressive Disorder, Major/drug therapy , Systems Biology , Antidepressive Agents/pharmacology , Receptors, AMPA/metabolism , Receptors, N-Methyl-D-Aspartate/metabolism
8.
Carbohydr Polym ; 320: 121176, 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-37659785

ABSTRACT

A co-metabolization of xylose and glucose by the wild mushroom Schizophyllum commune 227E.32 for exopolysaccharide (EPS) production is presented. Cultivations performed with S. commune 227E.32 at different xylose concentrations demonstrated that a concentration of 50 g·L⁻¹ of xylose achieved the highest EPS production, around 4.46 g·L⁻¹. Scale-up in a stirred tank reactor (STR) was performed. A 10% inoculum showed the highest cost/benefit ratio regarding sugar conversion and EPS production (Y P/S = 0.90 g·g⁻¹), achieving 1.82 g·L⁻¹ of EPS. Isolation, purification, and characterization were conducted with EPS produced in flasks and STR. GC-MS analysis showed glucose as the main monosaccharide constituent for both isolates. ¹³C NMR and edited HSQC showed that both isolated EPS consisted of a β-D-Glcp (1→3) main chain, partially substituted at O-6 with nonreducing β-D-Glcp ends on every third residue, similar to the β-D-glucan isolated from S. commune basidiomes known as schizophyllan (SPG). The Mw determined by GPC was 1.5 × 10⁶ Da (flasks) and 1.1 × 10⁶ Da (STR). AFM topographs revealed a semi-flexible appearance of the β-D-glucan, consistent with the triple helical structures adopted by SPG, and an overall contour length consistent with a high molar mass.


Subject(s)
Glucose , Schizophyllum , Xylose , Glucans , Monosaccharides
9.
Big Data ; 10(4): 279-297, 2022 08.
Article in English | MEDLINE | ID: mdl-35394342

ABSTRACT

The amount of available data is continuously growing. This phenomenon has given rise to a new concept, named big data. The key technologies related to big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). In addition, for data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques present promising results. In a biological context, big data has many applications due to the large number of biological databases available. Some limitations of biological big data are related to the inherent features of these data, such as high degrees of complexity and heterogeneity, since biological systems provide information from an atomic level to interactions between organisms or their environment. Such characteristics make most bioinformatic-based applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. As such, some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.


Subject(s)
Big Data , Data Mining , Cloud Computing , Data Mining/methods , Machine Learning , Neural Networks, Computer
10.
Gene ; 822: 146345, 2022 May 15.
Article in English | MEDLINE | ID: mdl-35189252

ABSTRACT

Penicillium echinulatum 2HH is an ascomycete well known for its production of cellulolytic enzymes. Understanding lignocellulolytic and sugar uptake systems is essential to obtain efficient fungal strains for the production of bioethanol. In this study we performed a genome-wide functional annotation of carbohydrate-active enzymes and sugar transporters involved in the lignocellulolytic system of P. echinulatum 2HH and S1M29 strains (wild type and mutant, respectively) and eleven related fungi. Additionally, signal peptide and orthology prediction were carried out. We encountered a diverse assortment of cellulolytic enzymes in P. echinulatum, especially in terms of β-glucosidases and endoglucanases. Other enzymes required for the breakdown of cellulosic biomass were also found, including cellobiohydrolases, lytic cellulose monooxygenases and cellobiose dehydrogenases. The S1M29 mutant, which is known to produce an increased cellulase activity, and the 2HH wild type strain of P. echinulatum did not show significant differences between their enzymatic repertoires. Nevertheless, we unveiled an amino acid substitution in a predicted intracellular β-glucosidase of the mutant, which might contribute to hyperexpression of cellulases through a cellodextrin induction pathway. Most of the P. echinulatum enzymes presented orthologs in P. oxalicum 114-2, supporting the presence of highly similar cellulolytic mechanisms and a close phylogenetic relationship between these fungi. A phylogenetic analysis of intracellular β-glucosidases and sugar transporters allowed us to identify several proteins potentially involved in the accumulation of intracellular cellodextrins. These may prove valuable targets in the genetic engineering of P. echinulatum focused on industrial cellulase production. Our study marks an important step in characterizing and understanding the molecular mechanisms employed by P. echinulatum in the enzymatic hydrolysis of lignocellulosic biomass.


Subject(s)
Fungal Proteins/genetics , Fungal Proteins/metabolism , Lignin/metabolism , Penicillium/metabolism , Amino Acid Substitution , Biological Transport , Carbohydrate Metabolism , Cellulose/analogs & derivatives , Dextrins , Gene Expression Regulation, Fungal , Molecular Sequence Annotation , Penicillium/genetics , Phylogeny , Sugars/metabolism
11.
PeerJ ; 10: e14487, 2022.
Article in English | MEDLINE | ID: mdl-36530391

ABSTRACT

Background: The severe form of COVID-19 can cause a dysregulated host immune syndrome that might lead patients to death. To understand the underlying immune mechanisms that contribute to COVID-19 disease, we examined 28 different biomarkers in two cohorts of COVID-19 patients, aiming to systematically capture, quantify, and algorithmize how immune signals might be associated with the clinical outcome of COVID-19 patients. Methods: The longitudinal concentration of 28 biomarkers in 95 COVID-19 patients was measured. We performed a dimensionality reduction analysis to determine meaningful biomarkers for explaining the data variability. The biomarkers were used as input to artificial neural networks, random forests, classification and regression trees, k-nearest neighbors and support vector machines. Two different clinical cohorts were used to grant validity to the findings. Results: We benchmarked the classification capacity of different models on two COVID-19 clinical studies and found that the artificial neural network was the best classifier. With it, we could employ different sets of biomarkers to predict the clinical outcome of COVID-19 patients. First, all the biomarkers available yielded a satisfactory classification. Next, we assessed the prediction capacity of each protein separately. With a reduced set of biomarkers, our model presented 94% accuracy, 96.6% precision, 91.6% recall, and 95% specificity on the testing data. We used the same model to predict 83% and 87% (recovered and deceased) of unseen data, granting validity to the results obtained. Conclusions: In this work, using state-of-the-art computational techniques, we systematically identified an optimal set of biomarkers that are related to a prediction capacity of COVID-19 patients. The screening of such biomarkers might assist in understanding the underlying immune response towards inflammatory diseases.
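The four figures reported in this abstract (accuracy, precision, recall and specificity) are standard quantities derived from a binary confusion matrix. The sketch below shows how they relate; the function name `classification_metrics` and the counts in the usage example are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Assumes every denominator below is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "precision": tp / (tp + fp),     # share of predicted positives that are real
        "recall": tp / (tp + fn),        # share of real positives that are found
        "specificity": tn / (tn + fp),   # share of real negatives that are found
    }


if __name__ == "__main__":
    # Illustrative counts only.
    for name, value in classification_metrics(tp=8, fp=2, tn=6, fn=4).items():
        print(f"{name}: {value:.3f}")
```

Reporting all four together matters when classes are imbalanced, because accuracy alone can look high while recall or specificity is poor.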


Subject(s)
COVID-19 , Critical Illness , Humans , Neural Networks, Computer , Biomarkers
12.
J Theor Biol ; 287: 92-9, 2011 Oct 21.
Article in English | MEDLINE | ID: mdl-21827769

ABSTRACT

Promoter sequences are well known to play a central role in gene expression. Their recognition and assignment in silico have not yet been consolidated into a general bioinformatics method. Most previously available algorithms employ, and are limited to, σ70-dependent promoter sequences. This paper presents a new tool named BacPP, designed to recognize and predict Escherichia coli promoter sequences from background with specific accuracy for each σ factor (respectively, σ24, 86.9%; σ28, 92.8%; σ32, 91.5%; σ38, 89.3%; σ54, 97.0%; and σ70, 83.6%). BacPP is hence outstanding in recognition and assignment of sequences according to σ factor and provides circumstantial information about upstream gene sequences. This bioinformatic tool was developed by weighing rules extracted from neural networks trained with promoter sequences known to respond to a specific σ factor. Furthermore, when challenged with promoter sequences belonging to other enterobacteria, BacPP maintained 76% accuracy overall.


Subject(s)
Computational Biology/methods , Enterobacteriaceae/genetics , Promoter Regions, Genetic/genetics , Sigma Factor/genetics , Escherichia coli/genetics , Gene Expression Regulation, Bacterial/genetics , Neural Networks, Computer
13.
Microbiologyopen ; 10(5): e1230, 2021 10.
Article in English | MEDLINE | ID: mdl-34713600

ABSTRACT

The transcription machinery of archaea can be roughly classified as a simplified version of eukaryotic organisms. The basal transcription factor machinery binds to the TATA box found around 28 nucleotides upstream of the transcription start site; however, some transcription units lack a clear TATA box and still have TBP/TFB binding over them. This apparent absence of conserved sequences could be a consequence of sequence divergence associated with the upstream region, operon, and gene organization. Furthermore, earlier studies have found that a structural analysis gains more information compared with a simple sequence inspection. In this work, we evaluated and coded 3630 archaeal promoter sequences of three organisms, Haloferax volcanii, Thermococcus kodakarensis, and Sulfolobus solfataricus into DNA duplex stability, enthalpy, curvature, and bendability parameters. We also split our dataset into conserved TATA and degenerated TATA promoters to identify differences among these two classes of promoters. The structural analysis reveals variations in archaeal promoter architecture, that is, a distinctive signal is observed in the TFB, TBP, and TFE binding sites independently of these being TATA-conserved or TATA-degenerated. In addition, the promoter encountering method was validated with upstream regions of 13 other archaea, suggesting that there might be promoter sequences among them. Therefore, we suggest a novel method for locating promoters within the genome of archaea based on DNA energetic/structural features.


Subject(s)
Archaea/genetics , DNA, Archaeal , Genome, Archaeal , Nucleic Acid Conformation , Promoter Regions, Genetic , TATA Box , Base Sequence , Computational Biology/methods , Protein Binding , Transcription Initiation Site , Transcription, Genetic
14.
OMICS ; 24(5): 300-309, 2020 05.
Article in English | MEDLINE | ID: mdl-31573385

ABSTRACT

In the present postgenomic era, the capacity to generate big data has far exceeded the capacity to analyze, contextualize, and make sense of the data in clinical, biological, and ecological applications. There is a great unmet need for automation and algorithms to aid in analyses of big data, in biology in particular. In this context, it is noteworthy that computational methods used to analyze the regulation of bacterial gene expression have in the past focused mainly on Escherichia coli promoters due to the large amount of data available. The challenges and prospects of automation in the prediction and recognition of bacterial sequences as promoters have not been properly addressed due to the promoter size and degenerate pattern. We report here an original neural network approach for recognition and prediction of Bacillus subtilis promoters. The artificial neural network used 767 B. subtilis promoter sequences as input, and we also aimed to identify the architecture that provides the best prediction. Two multilayer perceptron neural network architectures offered the highest accuracy: one with five, and another with seven neurons in the hidden layer. These architectures achieved accuracies of 98.57% and 97.69%, respectively. The results collectively indicate the promise of the application of neural network approaches to the B. subtilis promoter recognition problem, while also suggesting the broader potential of algorithms for automation of data analyses in the postgenomic era.


Subject(s)
Automation/methods , Bacillus subtilis/genetics , Computational Biology/methods , Pattern Recognition, Automated/methods , Promoter Regions, Genetic/genetics , Sequence Analysis, DNA/methods , Algorithms , Escherichia coli/genetics , Gene Expression/genetics , Genes, Bacterial/genetics , Genome, Bacterial/genetics , Neural Networks, Computer
15.
Front Microbiol ; 11: 588263, 2020.
Article in English | MEDLINE | ID: mdl-33193246

ABSTRACT

Penicillium echinulatum 2HH and Penicillium oxalicum 114-2 are well-known fungal cellulase producers. However, few studies addressing global mechanisms for gene regulation of these two important organisms are available so far. A recent finding that the 2HH wild type is closely related to P. oxalicum led to a combined study of these two species. Firstly, we provide a global gene regulatory network for P. echinulatum 2HH and P. oxalicum 114-2, based on TF-TG orthology relationships, considering three related species with well-known regulatory interactions combined with TFBS prediction. The network was then analyzed in terms of topology, identifying TFs as hubs, and modules. Based on this approach, we explore numerous identified modules, such as the expression of cellulolytic and xylanolytic systems, where XlnR plays a key role in positive regulation of the xylanolytic system. It also positively regulates the cellulolytic system by acting indirectly through the cellodextrin induction system. This remarkable finding suggests that the XlnR-dependent cellulolytic and xylanolytic regulatory systems are probably conserved in both P. echinulatum and P. oxalicum. Finally, we explore the functional congruency of the genes clustered in terms of communities, where the genes related to cellular nitrogen compound metabolic process and macromolecule metabolic process were the most abundant. Therefore, our approach allows us to confer a degree of accuracy regarding the existence of each inferred interaction.

16.
Data Brief ; 19: 264-270, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29892645

ABSTRACT

This paper presents a prediction of Bacillus subtilis promoters using a Support Vector Machine system. In the literature, there is a lack of information on Gram-positive bacterial promoter sequences compared to Gram-negative bacteria. Promoter sequence identification is essential for studying gene expression. Initially, we collected the B. subtilis genome sequence from the NCBI database, and promoters were identified by their sigma factors in the DBTBS database. We then grouped the promoters according to 15 factors in 2 domains, corresponding to sigma 54 and sigma 70 of Gram-negative bacteria. Based on these data we developed a script in Python to search for promoters in the B. subtilis genome. After processing the data, we obtained 767 promoter sequences for B. subtilis, most of which were recognized by sigma SigA. To validate the data we found, we developed a software package called BacSVM+, which receives promoters as input and returns the best combination of parameters in a LibSVM library to predict promoter regions in the bacteria used in the simulation. All data gathered as well as the BacSVM+ software is available for download at http://bacpp.bioinfoucs.com/rafael/Sigmas.zip.

17.
Gene ; 528(2): 277-81, 2013 Oct 10.
Article in English | MEDLINE | ID: mdl-23850726

ABSTRACT

The influenza virus has been a challenge to science due to its ability to withstand new environmental conditions. Given the development of virus sequence databases, computational approaches can be helpful to understand virus behavior over time. Furthermore, they can suggest new directions to deal with influenza. This work presents triplet entropy analysis as a potential phylodynamic tool to quantify the nucleotide organization of viral sequences. The application of this measure to segments of hemagglutinin (HA) and neuraminidase (NA) of the H1N1 and H3N2 virus subtypes showed variability over time, allowing inferences about virus evolution. Sequences were divided by year and compared by virus subtype (H1N1 and H3N2). The nonparametric Mann-Whitney test was used for comparison between groups. Results show that differentiation in entropy precedes differentiation in GC content for both groups. Considering the HA fragment, both triplet entropy and GC concentration show an intersection in 2009, the year of the recent pandemic. Some conclusions about possible flu evolutionary lines were drawn.
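The two quantities compared in this abstract can be sketched in a few lines. The abstract does not give the exact definition of triplet entropy, so the sketch assumes it is the Shannon entropy (in bits) of the frequency distribution of nucleotide triplets in a sequence, read with a configurable step; the names `triplet_entropy` and `gc_content` are hypothetical.

```python
import math
from collections import Counter


def triplet_entropy(seq, step=1):
    """Shannon entropy (bits) of the triplet distribution in a sequence.

    ASSUMPTION: triplets are read every `step` bases (1 = overlapping,
    3 = codon-like, non-overlapping); the paper's definition may differ.
    """
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]
    if not triplets:
        raise ValueError("sequence must contain at least one triplet")
    n = len(triplets)
    counts = Counter(triplets)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    fragment = "ATGAAGGCAATACTAGTAGTTCTG"  # illustrative fragment only
    print(triplet_entropy(fragment), gc_content(fragment))
```

Computing both values per sequence and grouping them by year and subtype would give the kind of series that can then be compared with a nonparametric test.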


Subject(s)
Hemagglutinin Glycoproteins, Influenza Virus/genetics , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H3N2 Subtype/genetics , Neuraminidase/genetics , Base Composition , Evolution, Molecular , Humans , Models, Genetic , Phylogeny , Sequence Analysis, DNA , Statistics, Nonparametric , Thermodynamics
18.
Genet Mol Biol ; 34(2): 353-60, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21734842

ABSTRACT

Promoters are DNA sequences located upstream of the gene region and play a central role in gene expression. Computational techniques show good accuracy in gene prediction but are less successful in predicting promoters, primarily because of the high number of false positives that reflect characteristics of the promoter sequences. Many machine learning methods have been used to address this issue. Neural Networks (NN) have been successfully used in this field because of their ability to recognize imprecise and incomplete patterns characteristic of promoter sequences. In this paper, NN was used to predict and recognize promoter sequences in two data sets: (i) one based on nucleotide sequence information and (ii) another based on stability sequence information. The accuracy was approximately 80% for simulation (i) and 68% for simulation (ii). In the rules extracted, biological consensus motifs were important parts of the NN learning process in both simulations.
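A data set "based on nucleotide sequence information" has to be turned into numbers before a neural network can use it. One common way to do this is one-hot encoding, sketched below; the abstract does not state which encoding the authors used, so this is an assumption, and the names `ONE_HOT` and `one_hot_encode` are hypothetical.

```python
# One-hot code for each base: exactly one of four input units is active.
# ASSUMPTION: illustrative encoding, not necessarily the one used in the paper.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(seq):
    """Flatten a DNA sequence into a list of 4 * len(seq) binary inputs."""
    flat = []
    for position, base in enumerate(seq.upper()):
        if base not in ONE_HOT:
            raise ValueError(f"unsupported base {base!r} at position {position}")
        flat.extend(ONE_HOT[base])
    return flat


if __name__ == "__main__":
    print(one_hot_encode("TATAAT"))
```

This gives four inputs per base, whereas a stability-based data set would use one numerical value per dinucleotide step, so the two simulations feed the network inputs of different size and type.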
