ABSTRACT
Let A denote an alphabet consisting of n types of letters. Given a sequence S of length L over A that contains $v_i$ letters of type i, we propose a new complexity function, called the reciprocal complexity of S, to describe the compositional properties and combinatorial structure of S: $C(S) = \prod_{i=1}^{n} \left( \frac{L}{n v_i} \right)^{v_i}$. Based on this complexity measure, an efficient algorithm is developed for classifying and analyzing simple segments of protein and nucleotide sequence databases associated with scoring schemes. The running time of the algorithm is nearly proportional to the sequence length. The program DSR, which implements the algorithm, was written in C++ and takes two parameters (window length and cutoff value) and a scoring matrix. Some examples regarding protein sequences illustrate how the method can be used to find simple regions. The first application of DSR is the masking of simple sequences for database searches. Queries masked by DSR returned a manageable set of hits below the E-value cutoff score, which contained all true positive homologues. The second application is the study of simple regions detected by DSR that correspond to known structural features of proteins. An extensive computational analysis has been made of protein sequences with known, physicochemically defined nonglobular segments. For the SWISS-PROT amino acid sequence database (Release 40.2 of 02-Nov-2001), we determine that the best parameters and scoring matrix for automatic segmentation of amino acid sequences into nonglobular and globular regions by the DSR program are window length k = 35, cutoff value b = 0.46, and the BLOSUM 62.5 matrix. The average "agreement accuracy (sensitivity)" of DSR segmentation for the SWISS-PROT database is 97.3%.
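The reciprocal complexity formula above can be evaluated directly. The following is a minimal sketch, not the DSR program: it ignores the scoring matrix and the cutoff value, and the function names are hypothetical. It works in log space, using the identity $\ln C(S) = \sum_i v_i \ln\frac{L}{n v_i} = L\,(H - \ln n)$, where H is the Shannon entropy of the letter frequencies, so that $C(S) \le 1$ with equality only for a uniform composition. Letters with $v_i = 0$ contribute a factor of 1 and are skipped.

```python
import math
from collections import Counter


def log_reciprocal_complexity(segment, alphabet_size):
    """Return ln C(S) for C(S) = prod_i (L / (n * v_i)) ** v_i.

    alphabet_size is n; letters absent from the segment have v_i = 0
    and contribute a factor of 1, so only observed letters are summed.
    """
    length = len(segment)
    counts = Counter(segment)
    return sum(v * math.log(length / (alphabet_size * v))
               for v in counts.values())


def reciprocal_complexity(segment, alphabet_size):
    """Return C(S) itself; it can underflow for long, simple segments."""
    return math.exp(log_reciprocal_complexity(segment, alphabet_size))


def window_log_complexity(sequence, alphabet_size, k):
    """Naive sliding window: ln C for every length-k window.

    This recomputes each window from scratch (O(L * k)); it is an
    illustration only, not the near-linear algorithm of the abstract.
    """
    return [log_reciprocal_complexity(sequence[i:i + k], alphabet_size)
            for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    print(reciprocal_complexity("ACGT", 4))   # uniform composition
    print(reciprocal_complexity("AAAA", 4))   # single repeated letter
    print(window_log_complexity("AAAACGT", 4, 4))
```

In this sketch, windows whose value of ln C lies far below zero are the compositionally simple ones; how DSR combines the measure with a scoring matrix and the cutoff b is not specified in the abstract.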
Subjects
Extracellular Matrix Proteins; Serine-tRNA Ligase/chemistry; Thermus thermophilus/enzymology; Aggrecans; Algorithms; Amino Acid Sequence; Collagen Type I/chemistry; Collagen Type I/genetics; Databases, Factual; Lectins, C-Type; Molecular Sequence Data; Molecular Structure; Myocardium/metabolism; Myosin Heavy Chains/chemistry; Myosin Heavy Chains/genetics; Protein Conformation; Proteoglycans/chemistry; Proteoglycans/genetics; Sequence Alignment; Serine-tRNA Ligase/genetics
ABSTRACT
OBJECTIVE: To study the DNA damage induced by mustard gas in rat bone marrow cells. METHOD: Male SD rats were randomly divided into six groups. Physiological saline, propylene glycol and mustard gas (0.2, 0.4, 0.8, 1.6 mg/kg) were given separately by intraperitoneal (i.p.) injection. Five rats in each group were killed at 0, 24, 48 and 72 hours after exposure. DNA damage in rat bone marrow cells was assayed by single cell gel electrophoresis (SCGE). RESULTS: There was no significant difference in DNA damage among the groups at 0 h (P > 0.05). In the propylene glycol group, the rates of DNA migration of the rat bone marrow cells at 24, 48 and 72 hours were 15.4% ± 0.21%, 16.0% ± 0.19% and 15.7% ± 0.23%, and the lengths of DNA migration were (11.4 ± 0.2), (13.5 ± 0.3) and (12.8 ± 0.2) µm, respectively; these values were significantly higher than those of the physiological saline group at the same time points (P < 0.05). In the mustard gas groups, the rates and lengths of DNA migration at 24, 48 and 72 hours were significantly higher than those in the physiological saline and propylene glycol groups at the same time points (P < 0.05). CONCLUSION: Mustard gas could induce DNA damage in rat bone marrow cells. The damage tended to increase with dose and was time-dependent.
Subjects
Bone Marrow Cells/drug effects; DNA Damage; Mustard Gas/toxicity; Animals; Bone Marrow Cells/ultrastructure; Comet Assay; Dose-Response Relationship, Drug; Male; Rats; Rats, Sprague-Dawley; Time Factors
ABSTRACT
MOTIVATION: Algorithm development for finding typical patterns in sequences, especially multiple pseudo-repeats (pseudo-periodic regions), is at the core of many problems arising in biological sequence and structure analysis. One of the most significant features of biological sequences is their high quasi-repetitiveness. Variation in the quasi-repetitiveness of genomic and proteomic texts reflects the presence and density of different kinds of biologically important information. It is therefore important to develop sensitive automatic computational methods for identifying pseudo-periodic regions of sequences, through which we can infer, describe and understand biological properties, and seek precise molecular details of biological structures, dynamics, interactions and evolution. RESULTS: We develop a novel computational tool for partitioning a sequence into pseudo-periodic regions. A pseudo-periodic partition is defined as the partition that, intuitively, deviates least from some perfectly periodic partition of the sequence under an evolutionary distance. We devise a quadratic time and space algorithm for detecting a pseudo-periodic partition of a given sequence, which corresponds to the shortest path in the main diagonal of the directed (acyclic) weighted graph constructed by the Smith-Waterman self-alignment of the sequence. We use several typical examples to demonstrate the use of our algorithm and software system in detecting functional or structural domains and regions of proteins. A major advantage of our software is that it has a single parameter, the granularity factor, and any biological sequence family can be chosen as a training set to determine the best value of this parameter. In general, we choose all repeats (including many pseudo-repeats) in the SWISS-PROT amino acid sequence database as a typical training set. With this training set, the granularity factor is determined to be 0.52, and the average agreement accuracy of the pseudo-periodic partitions detected by our software for all pseudo-repeats in the SWISS-PROT database is 97.6%.
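To make the self-alignment step concrete, the following sketch fills a Smith-Waterman matrix for a sequence aligned against itself, restricted to cells strictly above the main diagonal so that the trivial identity alignment is excluded. It is not the pseudo-periodic partition algorithm of the abstract: the shortest-path construction and the granularity factor are omitted, and the function name and the match, mismatch and gap scores are assumptions chosen for illustration. The offset j - i of the best-scoring cell gives a rough estimate of a repeat period.

```python
def self_alignment(seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment of seq against itself.

    Only cells with i < j are filled, which excludes the trivial
    alignment of the sequence with itself. The scoring values are
    illustrative assumptions, not those of the original software.
    Returns the score matrix and (best_score, i, j), 1-based.
    """
    n = len(seq)
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align seq[i-1] with seq[j-1]
                          H[i - 1][j] + gap,     # gap in the second copy
                          H[i][j - 1] + gap)     # gap in the first copy
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return H, best


if __name__ == "__main__":
    _, (score, i, j) = self_alignment("ABCABC")
    print(score, j - i)   # best local score and its diagonal offset
```

Like the algorithm described in the abstract, this fill uses quadratic time and space in the sequence length. For protein sequences, a substitution matrix would normally replace the fixed match and mismatch scores.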