Results 1 - 9 of 9
1.
PLoS One ; 15(5): e0233978, 2020.
Article in English | MEDLINE | ID: mdl-32470086

ABSTRACT

Intronic gene regions are mostly considered in the scope of gene expression regulation, such as alternative splicing. However, relations between basic statistical properties of introns are rarely studied in detail, despite the vast amount of available data. In particular, little is known about the relationship between intron length and intron phase. The intron phase distribution differs significantly at different intron length thresholds. In this study, we performed GO enrichment analysis of gene sets with a particular intron phase at varying intron length thresholds using a list of 13823 orthologous human-mouse gene pairs. We found a group of 153 genes with phase 1 introns longer than 50 kilobases that were specifically expressed in brain, functionally related to synaptic signaling, and strongly associated with schizophrenia and other mental disorders. We propose that the prevalence of long phase 1 introns arises from the presence of the signal peptide sequence and is connected with 1-1 exon shuffling.
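As an illustration of the terms used in this abstract, the following is a minimal sketch of how intron phase is conventionally defined (the number of coding bases preceding the intron, modulo 3) and of how introns might be filtered by phase and length. The function names, the dictionary layout, and the example exon lengths are hypothetical and are not taken from the paper; only the 50-kilobase threshold comes from the abstract.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase of each intron, given the coding lengths of consecutive exons.

    Phase 0: the intron lies between two codons.
    Phase 1: the intron lies after the first base of a codon.
    Phase 2: the intron lies after the second base of a codon.
    """
    phases = []
    coding_bases = 0
    for length in cds_exon_lengths[:-1]:  # no intron follows the last exon
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


def select_introns(introns, phase, min_length):
    """Keep introns (hypothetical dicts with 'phase' and 'length') of a given
    phase that are strictly longer than min_length bases."""
    return [i for i in introns if i["phase"] == phase and i["length"] > min_length]


# Hypothetical gene with three coding exons of 100, 50 and 30 bases.
phases = intron_phases([100, 50, 30])

# Hypothetical intron records; 50_000 mirrors the 50 kb threshold in the abstract.
introns = [
    {"gene": "geneA", "phase": 1, "length": 62_000},
    {"gene": "geneB", "phase": 1, "length": 1_200},
    {"gene": "geneC", "phase": 0, "length": 75_000},
]
long_phase1 = select_introns(introns, phase=1, min_length=50_000)
```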


Subjects
Brain/metabolism , Introns/genetics , Animals , Gene Ontology , Humans , Mice , Messenger RNA/genetics , Messenger RNA/metabolism
2.
Algorithms Mol Biol ; 6(1): 25, 2011 Oct 27.
Article in English | MEDLINE | ID: mdl-22032267

ABSTRACT

BACKGROUND: Sequence alignment algorithms are key instruments for computer-assisted studies of biopolymers. It is therefore important to assess the "quality" of the resulting alignments, i.e. how closely an algorithm restores the "gold standard" alignment (GS-alignment), which superimposes positions originating from the same position in the common ancestor of the compared sequences. A 3D-alignment is commonly used as an approximation of the GS-alignment, although this choice is not fully justified. Among the pairwise alignment algorithms currently in use, the best quality is achieved by optimal alignment with affine gap penalties (the Smith-Waterman algorithm). Nevertheless, the relative merits of the local and global versions of the algorithm have not been studied. RESULTS: Using model series of amino acid sequence pairs, we studied the relative "quality" of results produced by local and global alignments as a function of (1) the relative length of the similar parts of the sequences (their "cores") and their nonhomologous parts, and (2) the relative positions of the core regions in the compared sequences. We obtained numerical values of the average quality (measured as accuracy and confidence) of the global and local alignment methods for evolutionary distances between homologous sequence parts from 30 to 240 PAM, for core lengths from 10% to 70% of the total sequence length, and for all possible positions of the homologous parts relative to the centers of the sequences. CONCLUSION: We identified criteria that specify when the local or the global alignment algorithm is preferable, depending on the positions and relative lengths of the cores and nonhomologous parts of the sequences to be aligned. When the core of one sequence was positioned opposite the core of the other sequence, the global algorithm was more stable than the local algorithm at longer evolutionary distances and with larger nonhomologous parts. In contrast, when the cores were positioned asymmetrically, the local algorithm was more stable than the global algorithm at longer evolutionary distances and with larger nonhomologous parts. This opens the possibility of creating a combined method that generates more accurate alignments.
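The contrast between global and local alignment with affine gap penalties can be sketched as follows. This is a generic three-matrix dynamic program written for illustration, not the authors' implementation; the scoring values (match, mismatch, gap_open, gap_extend) are arbitrary assumptions, and a real protein comparison would use a substitution matrix rather than a flat match/mismatch score. A gap of length k costs gap_open + (k - 1) * gap_extend.

```python
NEG_INF = float("-inf")


def affine_align_score(a, b, match=2, mismatch=-1,
                       gap_open=-3, gap_extend=-1, local=False):
    """Optimal alignment score with affine gap penalties.

    local=False gives the global score; local=True gives the best local score.
    """
    n, m = len(a), len(b)
    # M: alignments ending with a[i-1] paired to b[j-1]
    # X: alignments ending with a[i-1] opposite a gap
    # Y: alignments ending with b[j-1] opposite a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, n + 1):
            X[i][0] = gap_open + (i - 1) * gap_extend
        for j in range(1, m + 1):
            Y[0][j] = gap_open + (j - 1) * gap_extend

    best_local = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if local:
                prev = max(prev, 0)  # a local alignment may start anywhere
            M[i][j] = prev + s
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
            best_local = max(best_local, M[i][j])

    if local:
        return best_local
    return max(M[n][m], X[n][m], Y[n][m])


# A short "core" embedded in unrelated flanks (hypothetical toy sequences).
core = "ACGTACGT"
flanked = "GGGGG" + core + "CCCCC"
global_score = affine_align_score(core, flanked)
local_score = affine_align_score(core, flanked, local=True)
```

In this toy case the local score is unaffected by the flanks, while the global score is reduced by the penalties for the two terminal gaps.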

3.
J Comput Biol ; 15(4): 379-91, 2008 May.
Article in English | MEDLINE | ID: mdl-18435572

ABSTRACT

In many applications, the algorithmically obtained alignment should ideally restore the "golden standard" (GS) alignment, which superimposes positions originating from the same position of the common ancestor of the compared sequences. The average similarity between the algorithmically obtained and GS alignments ("the quality") is an important characteristic of an alignment algorithm. We proposed to determine the quality of an algorithm using sequences that were artificially generated in accordance with an appropriate evolution model; the approach was applied to the global version of the Smith-Waterman algorithm (SWA). The quality of SWA is between 97% (for a PAM distance of 60) and 70% (for a PAM distance of 300). The percentage of identical aligned residues is the same for algorithmic and GS alignments. The total length of indels in algorithmic alignments is less than in the GS alignments, mainly due to a substantial decrease in the number of indels in algorithmic alignments.
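One way to compare an algorithmic alignment with a GS alignment is to count the aligned position pairs they share. The sketch below assumes that alignments are given as two gapped rows and reports two ratios: shared pairs over GS pairs (called accuracy here) and shared pairs over algorithmic pairs (called confidence here). These names and definitions are assumptions made for illustration; the abstract does not spell out the exact similarity measure the authors used.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs placed in the same column."""
    pairs = set()
    i = j = 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def alignment_quality(test_pairs, gs_pairs):
    """Return (accuracy, confidence) of a test alignment against a GS alignment."""
    shared = len(test_pairs & gs_pairs)
    accuracy = shared / len(gs_pairs) if gs_pairs else 0.0
    confidence = shared / len(test_pairs) if test_pairs else 0.0
    return accuracy, confidence


# Hypothetical toy alignments of the sequences ACGT and ACTGT.
gs = aligned_pairs("AC-GT", "ACTGT")
algorithmic = aligned_pairs("ACG-T", "ACTGT")
accuracy, confidence = alignment_quality(algorithmic, gs)
```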


Subjects
Algorithms , Sequence Alignment , Molecular Evolution , DNA Sequence Analysis , Protein Sequence Analysis , RNA Sequence Analysis
4.
J Comput Biol ; 14(8): 1074-87, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17985988

ABSTRACT

Locality is an important and well-studied notion in comparative analysis of biological sequences. Similarly, taking into account affine gap penalties when calculating biological sequence alignments is a well-accepted technique for obtaining better alignments. When dealing with RNA, one has to take into consideration not only sequential features, but also structural features of the inspected molecule. This makes the computation more challenging, and usually restricts the comparison to small RNAs. In this paper we introduce two local metrics for comparing RNAs that extend the Smith-Waterman metric and its normalized version used for string comparison. We also present a global RNA alignment algorithm which handles affine gap penalties. Our global algorithm runs in O(m^2 n(1 + lg(n/m))) time, while our local algorithms run in O(m^2 n(1 + lg(n/m))) and O(n^2 m) time, respectively, where m

Subjects
Algorithms , RNA/chemistry , RNA/genetics , Sequence Alignment/statistics & numerical data , Computational Biology
5.
Algorithms Mol Biol ; 2: 13, 2007 Oct 10.
Article in English | MEDLINE | ID: mdl-17927813

ABSTRACT

BACKGROUND: cis-Regulatory modules (CRMs) of eukaryotic genes often contain multiple binding sites for transcription factors. The phenomenon that binding sites form clusters in CRMs is exploited in many algorithms to locate CRMs in a genome. This gives rise to the problem of calculating the statistical significance of the event that multiple sites, recognized by different factors, would be found simultaneously in a text of a fixed length. The main difficulty comes from overlapping occurrences of motifs. So far, no tools have been developed that compute p-values for simultaneous occurrences of different motifs which can overlap. RESULTS: We developed and implemented an algorithm computing the p-value that s different motifs occur respectively k_1, ..., k_s or more times, possibly overlapping, in a random text. Motifs can be represented with most of the popular motif models, but in all cases without indels. Zero- or first-order Markov chains can be adopted as a model for the random text. The computational tool was tested on the set of cis-regulatory modules involved in D. melanogaster early development, for which there exists an annotation of binding sites for transcription factors. Our test allowed us to correctly identify transcription factors cooperatively/competitively binding to DNA. METHOD: The algorithm that precisely computes the probability of simultaneous motif occurrences is inspired by the Aho-Corasick automaton and employs a prefix tree together with a transition function. The algorithm runs with O(n|Sigma|(m|H| + K|Sigma|^K) prod_i k_i) time complexity, where n is the length of the text, |Sigma| is the alphabet size, m is the maximal motif length, |H| is the total number of words in motifs, K is the order of the Markov model, and k_i is the number of occurrences of the ith motif. CONCLUSION: The primary objective of the program is to assess the likelihood that a given DNA segment is a CRM regulated by a known set of regulatory factors. In addition, the program can also be used to select the appropriate threshold for PWM scanning. Another application is assessing similarity of different motifs. AVAILABILITY: Project web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/AhoPro/

6.
Bioinformatics ; 22(11): 1317-24, 2006 Jun 01.
Article in English | MEDLINE | ID: mdl-16543280

ABSTRACT

MOTIVATION: Evaluating all possible internal loops is one of the key steps in predicting the optimal secondary structure of an RNA molecule. The best algorithm available runs in time O(L^3), where L is the length of the RNA. RESULTS: We propose a new algorithm for evaluating internal loops; its run-time is O(M log^2 L), where M < L^2 is the number of possible nucleotide pairings. We created a software tool, Afold, which predicts the optimal secondary structure of RNA molecules of lengths up to 28 000 nt, using a computer with 2 Gb RAM. We also propose algorithms constructing sets of conditionally optimal multi-branch loop free (MLF) structures, e.g. the set that for every possible pairing (x, y) contains an optimal MLF structure in which nucleotides x and y form a pair. All the algorithms have run-time O(M log^2 L).


Subjects
Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Base Composition , Programming Languages , Software , Thermodynamics
7.
Proteins ; 54(3): 569-82, 2004 Feb 15.
Article in English | MEDLINE | ID: mdl-14748004

ABSTRACT

Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D) structure. We investigated the correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pairwise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, "islands" between gaps. Approximately one third of the islands in the gold standard alignments have a negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the alignment accuracy perspective, the time spent by the algorithm in these unalignable regions is wasted. We considered the features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm, which explicitly addresses high-scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, whereas the resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that a decrease in alignment accuracy is not necessarily the price of computational efficiency.


Subjects
Algorithms , Computational Biology/methods , Protein Conformation , Proteins/chemistry , Sequence Alignment/methods , Amino Acid Sequence , Molecular Models , Molecular Sequence Data , Sensitivity and Specificity
8.
Bioinformatics ; 18(12): 1673-80, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12490453

ABSTRACT

MOTIVATION: As a first approximation, similarity between two long orthologous regions of genomes can be represented by a chain of local similarities. Within such a chain, pairs of successive similarities are collinear (non-conflicting), i.e. the segments involved in the nth similarity precede, in both sequences, the segments involved in the (n+1)th similarity. However, when all similarities between two long sequences are considered, there are usually many conflicts between them. Although some conflicts can be avoided by masking transposons or low-complexity sequences, selecting only those similarities that reflect orthology and, thus, belong to the evolutionarily true chain is not trivial. RESULTS: We propose a simple, hierarchical algorithm for finding the true chain of local similarities. Starting from similarities with low P-values, we resolve each pairwise conflict by deleting the similarity with the higher P-value. This greedy approach constructs a chain of similarities faster than a search for a chain that is optimal with respect to some global criterion, and makes more sense biologically.
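The greedy conflict resolution described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the published algorithm: the Similarity record, the half-open coordinate convention, and the quadratic conflict check are assumptions chosen for brevity, and the hierarchical aspect of the authors' method is not reproduced.

```python
from typing import NamedTuple


class Similarity(NamedTuple):
    """A local similarity between two sequences (hypothetical record).

    Coordinates are half-open intervals: [start1, end1) in the first
    sequence and [start2, end2) in the second.
    """
    p_value: float
    start1: int
    end1: int
    start2: int
    end2: int


def collinear(a, b):
    """True if one similarity precedes the other in both sequences."""
    return ((a.end1 <= b.start1 and a.end2 <= b.start2) or
            (b.end1 <= a.start1 and b.end2 <= a.start2))


def greedy_chain(similarities):
    """Build a chain by visiting similarities in order of increasing P-value
    and dropping any that conflicts with one already kept."""
    kept = []
    for s in sorted(similarities, key=lambda s: s.p_value):
        if all(collinear(s, k) for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start1)


# Hypothetical similarities: the second conflicts with the first and has
# a higher P-value, so it is removed.
sims = [
    Similarity(1e-30, 0, 100, 0, 100),
    Similarity(1e-5, 150, 200, 50, 90),
    Similarity(1e-20, 300, 400, 250, 350),
]
chain = greedy_chain(sims)
```

Because every candidate is checked only against similarities that have lower P-values and were already kept, each conflict is resolved in favor of the more significant similarity.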


Subjects
Algorithms , Genome , Sequence Alignment/methods , DNA Sequence Analysis/methods , Animals , Base Sequence , Chickens/genetics , Conserved Sequence/genetics , Molecular Evolution , Fractals , Humans , Mice , Molecular Sequence Data , Sensitivity and Specificity , Nucleic Acid Sequence Homology , Software , Takifugu/genetics
9.
Bioinformatics ; 18(12): 1703-4, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12490463

ABSTRACT

OWEN is an interactive tool for aligning two long DNA sequences that represents similarity between them by a chain of collinear local similarities. OWEN employs several methods for constructing and editing local similarities and for resolving conflicts between them. Alignments of sequences of lengths over 10^6 can often be produced in minutes. OWEN requires memory below 20L, where L is the sum of lengths of the compared sequences.


Subjects
Algorithms , Genome , Sequence Alignment/methods , DNA Sequence Analysis/methods , Base Sequence , Human Chromosomes 6-12 and X/genetics , Conserved Sequence/genetics , Nucleic Acid Databases , Molecular Evolution , Humans , Information Storage and Retrieval/methods , Molecular Sequence Data , Nucleic Acid Sequence Homology , Software