Results 1 - 14 of 14
1.
Biochemistry (Mosc) ; 85(7): 725-734, 2020 Jul.
Article in English | MEDLINE | ID: mdl-33040717

ABSTRACT

Spliceosomal introns, which have been found in most eukaryotic genes, are non-coding sequences excised from pre-mRNAs by a special complex called the spliceosome during mRNA splicing. Introns occur in both protein- and RNA-coding genes and can be found in coding and untranslated gene regions. Because intron sequences vary greatly due to a high rate of polymorphism, intron functions were for a long time associated only with alternative splicing, while intron evolution was viewed not as the evolution of an individual genomic element but within the framework of the evolution of the gene's exon-intron structure. Here, we review the theories of intron origin; evolutionary events in the exon-intron structure, such as intron gain, loss, and sliding; intron functions known to date; and mechanisms by which changes in intron features (length and phase) can affect the regulation of gene-mediated processes.


Subject(s)
Introns , Spliceosomes , Alternative Splicing , Animals , Conserved Sequence , Eukaryota/genetics , Evolution, Molecular , Exons , Humans , RNA/metabolism , RNA Splicing
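
The abstract above mentions intron phase without defining it. Under the usual convention, an intron's phase is the number of coding bases preceding it, modulo 3: phase 0 falls between codons, phase 1 after the first base of a codon, and phase 2 after the second. A minimal sketch of that calculation follows; the function name `intron_phases` and its input format (lengths of the coding portion of each exon, in transcript order) are illustrative assumptions and are not taken from the reviewed paper.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron in a coding sequence.

    cds_exon_lengths: lengths of the *coding* part of each exon, in
    transcript order (hypothetical input format chosen for this sketch).
    """
    phases = []
    coding_bases = 0
    # No intron follows the last exon, so it is excluded from the loop.
    for length in cds_exon_lengths[:-1]:
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


if __name__ == "__main__":
    # Three coding exons separated by two introns.
    print(intron_phases([90, 61, 50]))
```

Because the phase depends only on the cumulative coding length, gaining or losing an intron leaves the phases of the others unchanged, whereas an indel inside a coding exon shifts the phase of every downstream intron.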
2.
Bioinformatics ; 17(11): 1065-6, 2001 Nov.
Article in English | MEDLINE | ID: mdl-11724737

ABSTRACT

We present BASIO, a software system that segments a sequence into regions with homogeneous nucleotide composition at a desired length scale. The system can work with an arbitrary alphabet and can therefore be applied to other (e.g. protein) sequences. Several complete eukaryotic genome sequences are used to demonstrate the efficiency of the software. AVAILABILITY: The BASIO suite is available for non-commercial users free of charge as a set of executables and accompanying segmentation scenarios from http://www.imb.ac.ru/compbio/basio. To obtain the source code, contact the authors.


Subject(s)
Genome , Software , Algorithms , Animals , Computational Biology , Genomics/statistics & numerical data , Plasmodium falciparum/genetics , Saccharomyces cerevisiae/genetics
3.
Protein Sci ; 10(9): 1801-10, 2001 Sep.
Article in English | MEDLINE | ID: mdl-11514671

ABSTRACT

Sequence and structural analysis of cadherins allows us to find sequence determinants: a few positions in sequences whose residues are characteristic and specific for the structures of a given family. Comparison of the five extracellular domains of classic cadherins showed that they share the same sequence determinants despite only a nonsignificant sequence similarity between the N-terminal domain and other extracellular domains. This allowed us to predict secondary structures and propose three-dimensional structures for these domains that have not been structurally analyzed previously. A new method of assigning a sequence to its proper protein family is suggested: analysis of sequence determinants. The main advantage of this method is that it is not necessary to know all or almost all residues in a sequence, as is required by traditional classification tools such as BLAST, FASTA, and HMM. Using the key positions only, that is, residues that serve as the sequence determinants, we found that all members of the classic cadherin family were unequivocally selected from among 80,000 examined proteins. In addition, we proposed a model for the secondary structure of the cytoplasmic domain of cadherins based on the principal relations between sequences and secondary structure multialignments. The patterns of the secondary structure of this domain can serve as the distinguishing characteristics of cadherins.


Subject(s)
Cadherins/chemistry , Computational Biology/methods , Algorithms , Amino Acid Sequence , Classification/methods , Databases as Topic , Molecular Sequence Data , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Alignment , Sequence Homology, Amino Acid , Structure-Activity Relationship
4.
J Comput Biol ; 7(1-2): 215-31, 2000.
Article in English | MEDLINE | ID: mdl-10890398

ABSTRACT

We present a new approach to DNA segmentation into compositionally homogeneous blocks. The Bayesian estimator, which is applicable for both short and long segments, is used to obtain the measure of homogeneity. An exact optimal segmentation is found via the dynamic programming technique. After completion of the segmentation procedure, the sequence composition on different scales can be analyzed with filtration of boundaries via the partition function approach.


Subject(s)
Bayes Theorem , DNA/genetics , Sequence Analysis, DNA/statistics & numerical data , Algorithms , Base Composition , Base Sequence , Biometry , DNA, Bacterial/genetics , DNA, Fungal/genetics , Escherichia coli/genetics , Genome, Bacterial , Genome, Fungal , Genome, Human , Humans , Likelihood Functions , Molecular Sequence Data , Pattern Recognition, Automated , Probability , Saccharomyces cerevisiae/genetics
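
The segmentation idea in the abstract above (score each candidate block for homogeneity, then find the exactly optimal partition by dynamic programming) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the block score is the log marginal likelihood of the letter counts under a uniform Dirichlet prior, which is one common Bayesian choice and may differ from the estimator used in the paper; the `penalty` parameter is a hypothetical knob for favoring coarser segmentations; and the boundary filtration via the partition function is not reproduced.

```python
from math import lgamma


def segment(seq, alphabet="ACGT", penalty=0.0):
    """Split seq into compositionally homogeneous blocks.

    Returns a list of half-open (start, end) intervals covering seq.
    Scoring (an assumption of this sketch): log marginal likelihood of
    each block under a uniform Dirichlet prior, minus `penalty` per block.
    """
    k = len(alphabet)
    index = {ch: i for i, ch in enumerate(alphabet)}
    n = len(seq)

    # prefix[j][a] = number of occurrences of letter a in seq[:j]
    prefix = [[0] * k]
    for ch in seq:
        row = prefix[-1][:]
        row[index[ch]] += 1
        prefix.append(row)

    def score(i, j):
        counts = [prefix[j][a] - prefix[i][a] for a in range(k)]
        return (lgamma(k) - lgamma((j - i) + k)
                + sum(lgamma(c + 1) for c in counts))

    # best[j] = best total score of any segmentation of seq[:j]
    best = [0.0] + [float("-inf")] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + score(i, j) - penalty
            if cand > best[j]:
                best[j], back[j] = cand, i

    # Trace the optimal block boundaries back from the end of the sequence.
    segments = []
    j = n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1]


if __name__ == "__main__":
    print(segment("A" * 10 + "C" * 10))
```

The double loop makes this sketch quadratic in sequence length, which is fine for illustration but would need a bound on block length or another speed-up for whole genomes.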
5.
Nucleic Acids Res ; 27(14): 2981-9, 1999 Jul 15.
Article in English | MEDLINE | ID: mdl-10390542

ABSTRACT

Recognition of transcription regulation sites (operators) is a hard problem in computational molecular biology. In most cases, small sample size and a low degree of sequence conservation preclude the construction of reliable recognition rules. We suggest an approach to this problem based on simultaneous analysis of several related genomes. It appears that as long as a gene coding for a transcription regulator is conserved in the compared bacterial genomes, the regulation of the respective group of genes (regulon) also tends to be maintained. Thus a gene can be confidently predicted to belong to a particular regulon if not only the gene itself but also its orthologs in other genomes have candidate operators in their regulatory regions. This provides greater sensitivity of operator identification, as even relatively weak signals are likely to be functionally relevant when conserved. We use this approach to analyze the purine (PurR), arginine (ArgR) and aromatic amino acid (TrpR and TyrR) regulons of Escherichia coli and Haemophilus influenzae. Candidate binding sites in regulatory regions of the respective H. influenzae genes are identified, a new family of purine transport proteins predicted to belong to the PurR regulon is described, and probable regulation of arginine transport by ArgR is demonstrated. Differences in the regulation of some orthologous genes in E. coli and H. influenzae, in particular the apparent lack of autoregulation of the purine repressor gene in H. influenzae, are demonstrated.


Subject(s)
Computational Biology , Gene Expression Regulation, Bacterial , Genome, Bacterial , Transcription, Genetic/genetics , Arginine/genetics , Arginine/metabolism , Bacterial Proteins/genetics , Base Sequence , Binding Sites , Carrier Proteins/genetics , Conserved Sequence/genetics , Escherichia coli/genetics , Genes, Bacterial/genetics , Haemophilus influenzae/genetics , Operon/genetics , Phylogeny , Purines/metabolism , Regulon/genetics , Response Elements/genetics , Tryptophan/genetics , Tyrosine/genetics
6.
Genomics ; 51(3): 332-9, 1998 Aug 01.
Article in English | MEDLINE | ID: mdl-9721203

ABSTRACT

An important and still unsolved problem in gene prediction is designing an algorithm that not only predicts genes but estimates the quality of individual predictions as well. Since experimental biologists are interested mainly in the reliability of individual predictions (rather than in the average reliability of an algorithm) we attempted to develop a gene recognition algorithm that guarantees a certain quality of predictions. We demonstrate here that the similarity level with a related protein is a reliable quality estimator for the spliced alignment approach to gene recognition. We also study the average performance of the spliced alignment algorithm for different targets on a complete set of human genomic sequences with known relatives and demonstrate that the average performance of the method remains high even for very distant targets. Using plant, fungal, and prokaryotic target proteins for recognition of human genes leads to accurate predictions with 95, 93, and 91% correlation coefficient, respectively. For target proteins with similarity score above 60%, not only the average correlation coefficient is very high (97% and up) but also the quality of individual predictions is guaranteed to be at least 82%. It indicates that for this level of similarity the worst case performance of the spliced alignment algorithm is better than the average case performance of many statistical gene recognition methods.


Subject(s)
Genome, Human , Sequence Alignment , Sequence Analysis, DNA/methods , Algorithms , DNA/chemistry , Databases as Topic , Exons/genetics , Humans , Proteins/chemistry , RNA Splicing/genetics , Software
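
The abstract above reports prediction quality as a correlation coefficient. As a rough illustration, the sketch below computes the nucleotide-level measures commonly used when evaluating gene predictions: sensitivity TP/(TP+FN), specificity in the gene-prediction sense TP/(TP+FP), and the correlation coefficient between the true and predicted coding masks. Whether the paper uses exactly these definitions is an assumption, and the function name and the 0/1 mask input format are hypothetical.

```python
from math import sqrt


def nucleotide_accuracy(true_mask, pred_mask):
    """Return (sensitivity, specificity, correlation) for two 0/1 masks.

    Each mask marks, per nucleotide, whether the position is coding.
    A measure whose denominator is zero is returned as NaN.
    """
    if len(true_mask) != len(pred_mask):
        raise ValueError("masks must have the same length")

    tp = fp = tn = fn = 0
    for t, p in zip(true_mask, pred_mask):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1

    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tp / (tp + fp) if tp + fp else nan
    denom = (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)
    correlation = (tp * tn - fp * fn) / sqrt(denom) if denom else nan
    return sensitivity, specificity, correlation


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    guess = [1, 1, 1, 0, 1, 0, 0, 0]
    print(nucleotide_accuracy(truth, guess))
```

Note that "specificity" here is what other fields call precision; that usage is conventional in gene-prediction benchmarks.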
7.
Bioinformatics ; 14(1): 14-9, 1998.
Article in English | MEDLINE | ID: mdl-9520497

ABSTRACT

MOTIVATION: Gene annotation is the final goal of gene prediction algorithms. However, these algorithms frequently make mistakes and therefore the use of gene predictions for sequence annotation is hardly possible. As a result, biologists are forced to conduct time-consuming gene identification experiments by designing appropriate PCR primers to test cDNA libraries or applying RT-PCR, exon trapping/amplification, or other techniques. This process frequently amounts to 'guessing' PCR primers on top of unreliable gene predictions and frequently leads to wasting of experimental efforts. RESULTS: The present paper proposes a simple and reliable algorithm for experimental gene identification which bypasses the unreliable gene prediction step. Studies of the performance of the algorithm on a sample of human genes indicate that an experimental protocol based on the algorithm's predictions achieves an accurate gene identification with relatively few PCR primers. Predictions of PCR primers may be used for exon amplification in preliminary mutation analysis during an attempt to identify a gene responsible for a disease. We propose a simple approach to find a short region from a genomic sequence that with high probability overlaps with some exon of the gene. The algorithm is enhanced to find one or more segments that are probably contained in the translated region of the gene and can be used as PCR primers to select appropriate clones in cDNA libraries by selective amplification. The algorithm is further extended to locate a set of PCR primers that uniformly cover all translated regions and can be used for RT-PCR and further sequencing of (unknown) mRNA.


Subject(s)
Algorithms , Genes , Software , Arabidopsis , DNA Primers , Humans , Open Reading Frames , Polymerase Chain Reaction
8.
Comput Chem ; 21(4): 229-35, 1997.
Article in English | MEDLINE | ID: mdl-9440930

ABSTRACT

Recognition of genes via exon assembly approaches leads naturally to the use of dynamic programming. We consider the general graph-theoretical formulation of the exon assembly problem and analyze in detail some specific variants: multicriterial optimization in the case of non-linear gene-scoring functions; context-dependent schemes for scoring exons and related procedures for exon filtering; and highly specific recognition of arbitrary gene segments, oligonucleotide probes and polymerase chain reaction (PCR) primers.


Subject(s)
Exons , Genetic Techniques , Models, Genetic , Base Sequence , DNA/chemistry , DNA/genetics , DNA Primers , Mathematics , Molecular Sequence Data , Oligonucleotide Probes , Polymerase Chain Reaction , Reading Frames , Software
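
The simplest variant of exon assembly by dynamic programming can be sketched as a chaining problem: given candidate exons with scores, pick the non-overlapping subset with the highest total score. The code below is a deliberately reduced illustration with an additive (linear) score. It ignores reading-frame compatibility, splice-site context, and the non-linear and multicriterial scoring that the abstract above discusses, and the function name and the `(start, end, score)` tuple format are assumptions.

```python
from bisect import bisect_right


def best_exon_chain(exons):
    """Return (total_score, chain) for the best chain of candidate exons.

    exons: iterable of (start, end, score) with half-open coordinates
    [start, end). Two exons are compatible if one ends at or before the
    start of the other.
    """
    ordered = sorted(exons, key=lambda e: e[1])  # sort by end coordinate
    ends = [e[1] for e in ordered]

    # best[i] = best total score using only the first i exons in `ordered`
    best = [0.0] * (len(ordered) + 1)
    pred = [0] * len(ordered)
    for i, (start, _end, score) in enumerate(ordered):
        # Number of earlier exons that end at or before this exon's start.
        pred[i] = bisect_right(ends, start, 0, i)
        best[i + 1] = max(best[i], best[pred[i]] + score)

    # Trace back which exons were taken.
    chain = []
    i = len(ordered)
    while i > 0:
        exon = ordered[i - 1]
        if best[pred[i - 1]] + exon[2] > best[i - 1]:
            chain.append(exon)
            i = pred[i - 1]
        else:
            i -= 1
    return best[-1], chain[::-1]


if __name__ == "__main__":
    candidates = [(0, 10, 3.0), (5, 20, 5.0), (20, 30, 4.0), (12, 18, 1.5)]
    print(best_exon_chain(candidates))
```

Sorting by end coordinate lets each exon find its compatible predecessors with a binary search, so the whole chain is found in O(n log n) time for n candidate exons.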
9.
J Comput Biol ; 3(2): 223-34, 1996.
Article in English | MEDLINE | ID: mdl-8811484

ABSTRACT

A new approach to computer-assisted gene recognition in higher eukaryote DNA is suggested. It allows one to use not only linear functions for scoring structures, but all functions satisfying natural monotonicity conditions. The algorithm constructs the set of structures guaranteed to contain an optimal structure for every function. So, it uncouples the time-consuming step of generation of this set from the fast step of structure scoring, thus making it simple to experiment with different functions. One particular scoring function, taking into account only codon usage and positional nucleotide frequencies of the splicing sites, has been implemented in the Genome Recognition and Exon Assembly Tool program, and has been tested on an independent sample of human genes, yielding 88% sensitivity and 79% specificity.


Subject(s)
DNA/genetics , Genes , Genetic Techniques , Software , Algorithms , Evaluation Studies as Topic , Humans , Sensitivity and Specificity
10.
Comput Appl Biosci ; 11(4): 423-6, 1995 Aug.
Article in English | MEDLINE | ID: mdl-8521051

ABSTRACT

The SAMSON package is a tool for advanced analysis of primary DNA, RNA and protein structures. The package consists of 16 programs performing statistical analysis and comparison of biopolymer sequences, search for homologies, translation of DNA and RNA sequences into amino acid sequences, splicing of RNA sequences and restriction map construction, recognition of functionally related sites in biopolymer molecules, textual analysis of DNA and RNA regulatory sites and prediction of intermolecular hybridization sites in DNA and RNA molecules.


Subject(s)
Biopolymers/chemistry , Software , Algorithms , Biopolymers/genetics , DNA/chemistry , DNA/genetics , Molecular Structure , Pattern Recognition, Automated , Proteins/chemistry , Proteins/genetics , RNA/chemistry , RNA/genetics , Sequence Alignment , Sequence Analysis
11.
Biosystems ; 30(1-3): 1-19, 1993.
Article in English | MEDLINE | ID: mdl-7690608

ABSTRACT

A comparative analysis of some effective algorithms widely used in analysis, computation and comparison of chain molecules is presented. A notion of a stream in an oriented hypergraph is introduced, which generalizes a notion of a path in a graph. All considered algorithms looking over exponential sets of structures in polynomial time can be described as variants of a general algorithm of analysis of paths in graphs and of streams in oriented hypergraphs.


Subject(s)
Algorithms , Biopolymers , Amino Acid Sequence , Biophysical Phenomena , Biophysics , DNA/chemistry , Mathematics , Models, Chemical , Molecular Sequence Data , Nucleic Acid Conformation , Peptides/chemistry , Protein Conformation , RNA/chemistry , Software
13.
Comput Appl Biosci ; 8(1): 57-64, 1992 Feb.
Article in English | MEDLINE | ID: mdl-1568127

ABSTRACT

A new approach to searching for common patterns in many sequences is presented. The idea is that one sequence from the set of sequences to be compared is considered as a 'basic' one and all its similarities with other sequences are found. Multiple similarities are then reconstructed using these data. This approach allows one to search for similar segments which can differ in both substitutions and deletions/insertions. These segments can be situated at different positions in various sequences. No regions of complete or strong similarity within the segments are required. The other parts of the sequences can have no similarity at all. The only requirement is that the similar segments can be found in all the sequences (or in the majority of them, provided the common segments are present in the basic sequence). The running time of the algorithm is proportional to n·L² when n sequences of length L are analyzed. The algorithm is implemented as programs for the IBM-PC and IBM/370. Its applications to the analysis of biopolymer primary structures, as well as the dependence of the results on the choice of the basic sequence, are discussed.


Subject(s)
Amino Acid Sequence , Base Sequence , Sequence Alignment/methods , Software , Algorithms , Humans , Molecular Sequence Data , Pattern Recognition, Automated , Sequence Alignment/statistics & numerical data
14.
Biochimie ; 74(2): 187-94, 1992 Feb.
Article in English | MEDLINE | ID: mdl-1581394

ABSTRACT

The primary structure of the Citrus ichangensis satellite DNA repeating unit has been estimated. The repeat is 181 bp long and contains four pentanucleotides of adenine residues. Oligomer forms of the stDNA repeating unit were detected by partial hydrolysis of the C. ichangensis stDNA with BspI restriction endonuclease. Experiments on the comparative mobility of oligomers in agarose and polyacrylamide gels showed a certain retardation in polyacrylamide gel, indicating a slight bend in the repeating unit. The BEN computer program [9] was employed to calculate the spatial positions of monomer and oligomer axes of the satellite DNA repeating unit of Citrus ichangensis, mouse and African green monkey, and to plot their two-dimensional projections. The bends in the monomer for higher oligomer forms proved to result in a hypothetical solenoid-like structure, termed coiled double helix (CDH).


Subject(s)
DNA, Satellite/chemistry , Plants/genetics , Base Sequence , Macromolecular Substances , Molecular Sequence Data , Repetitive Sequences, Nucleic Acid , Sequence Homology, Nucleic Acid