ABSTRACT
Promoters are DNA sequences located upstream of the transcription start site of genes. In bacteria, the RNA polymerase enzyme requires additional subunits, called sigma (σ) factors, to begin specific gene transcription under distinct environmental conditions. Promoter prediction still poses many challenges due to the characteristics of these sequences. In this paper, the nucleotide content of Escherichia coli promoter sequences, related to five alternative σ factors, was analyzed with a machine learning technique in order to provide profiles according to the σ factor that recognizes them. Clustering was applied because it is a viable method for finding hidden patterns in a data set. As a result, 20 groups of sequences were formed and, aided by the WebLogo tool, it was possible to determine sequence profiles. The patterns found should be considered when implementing computational prediction tools. In addition, evidence was found of an overlap between the functions of the genes regulated by different σ factors, suggesting that DNA structural properties are also essential parameters for further studies.
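The clustering step described above can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the exact features used, so everything here is an assumption: sequences are reduced to A/C/G/T fractions and grouped with a plain k-means whose centroids are seeded from the first k points. The toy sequences are invented and are not real E. coli promoters.

```python
from collections import Counter


def nucleotide_profile(seq):
    """Return the fractions of A, C, G and T in a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT") or 1
    return [counts[base] / total for base in "ACGT"]


def squared_distance(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means (an assumed choice; the abstract names no algorithm).

    The first k points seed the centroids, so the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy sequences, invented for illustration only.
sequences = ["ATATATTAAT", "GCGCGGCCGC", "TTAATATATA", "CGGCGCGCCG"]
profiles = [nucleotide_profile(s) for s in sequences]
labels = kmeans(profiles, k=2)
print(labels)
```

In this toy case the AT-rich and GC-rich sequences fall into separate groups. Deriving sequence logos from each group, as the abstract does with WebLogo, would be a separate step.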
Subjects
Escherichia coli/enzymology; Escherichia coli/genetics; Promoter Regions, Genetic; Sigma Factor/genetics; Algorithms; Base Sequence; DNA-Directed RNA Polymerases/genetics; DNA-Directed RNA Polymerases/metabolism; Escherichia coli Proteins/genetics; Escherichia coli Proteins/metabolism; Nucleotides/analysis; Sigma Factor/metabolism; Transcription, Genetic
ABSTRACT
The amount of available data is continuously growing. This phenomenon has given rise to a new concept, named big data. The key technologies related to big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). In addition, for data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques present promising results. In a biological context, big data has many applications due to the large number of biological databases available. Some limitations of biological big data are related to the inherent features of these data, such as high degrees of complexity and heterogeneity, since biological systems provide information from the atomic level up to interactions between organisms or their environment. Such characteristics make most bioinformatics-based applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. As such, some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.
Subjects
Big Data; Data Mining; Cloud Computing; Data Mining/methods; Machine Learning; Neural Networks, Computer
ABSTRACT
A whole genome contains not only coding regions but also non-coding regions. The latter are located between the end of a given coding region and the beginning of the following coding region. For this reason, information about the gene regulation process lies in intergenic regions. There is no easy way to obtain intergenic regions from currently available databases. IntergenicDB was developed to integrate data on intergenic regions and their related gene information from NCBI databases. The main goal of IntergenicDB is to offer a user-friendly database of intergenic sequences from bacterial genomes. AVAILABILITY: http://intergenicdb.bioinfoucs.com/
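The definition of an intergenic region given above (the stretch between the end of one coding region and the start of the next) can be sketched in a few lines. This is an illustration only, not IntergenicDB's implementation: the function name, the 1-based inclusive coordinate convention, and the example coordinates are all assumptions, and strand and genome circularity are ignored.

```python
def intergenic_regions(genes):
    """Return the gaps between consecutive coding regions.

    genes: list of (start, end) tuples, 1-based inclusive (assumed convention).
    Overlapping or adjacent genes produce no gap.
    """
    regions = []
    prev_end = None
    for start, end in sorted(genes):
        # A gap exists only if at least one base separates the two genes.
        if prev_end is not None and start > prev_end + 1:
            regions.append((prev_end + 1, start - 1))
        # Track the furthest end seen so far to handle overlapping genes.
        prev_end = end if prev_end is None else max(prev_end, end)
    return regions


# Hypothetical coordinates, chosen to show overlap and adjacency.
genes = [(100, 400), (350, 600), (750, 900), (901, 1200)]
print(intergenic_regions(genes))  # [(601, 749)]
```

Only the region between positions 601 and 749 is reported, because the first two genes overlap and the last two are directly adjacent.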
ABSTRACT
The physiological and molecular effects of tobacco smoke in adult humans and the development of cancer have been well described. In contrast, how tobacco smoke affects embryonic development remains poorly understood. Morphological studies of the fetuses of smoking pregnant women have shown various physical deformities induced by constant fetal exposure to tobacco components, especially nicotine. Nicotine exposure also decreases fetal body weight, bone/cartilage growth, cranial diameter, and tibia length. Unfortunately, the molecular pathways leading to these morphological anomalies are not completely understood. In this study, we applied interactome data mining tools and small compound interaction networks to elucidate possible molecular pathways associated with the effects of tobacco smoke components during embryonic development in pregnant female smokers. Our analysis showed a relationship between nicotine and 50 additional harmful substances involved in a variety of biological processes that can cause abnormal proliferation, impaired cell differentiation, and increased oxidative stress. We also describe how nicotine can negatively affect retinoic acid signaling and cell differentiation through inhibition of retinoic acid receptors. In addition, nicotine causes a stress reaction and/or a pro-inflammatory response that inhibits the agonistic action of retinoic acid. Moreover, we show that the effect of cigarette smoke on the developing fetus could represent systemic and aggressive impacts in the short term, causing malformations during certain stages of development. Our work provides the first approach describing how different tobacco constituents affect a broad range of biological processes in human embryonic development.