Search | VHL Infectious and Parasitic Diseases

1.

IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences.

Alamro, Hayam; Alzamel, Mai; Iliopoulos, Costas S; Pissis, Solon P; Watts, Steven.

BMC Bioinformatics ; 22(1): 51, 2021 Feb 06.

Article in English | MEDLINE | ID: mdl-33549041

ABSTRACT

BACKGROUND: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. RESULTS: We present IUPACPAL, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats. CONCLUSION: Within the parameters that were tested, our experimental results show that IUPACPAL compares favourably to a similar application packaged with EMBOSS. We show that IUPACPAL identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.

Subjects

Genome; Prokaryotic Cells; Repetitive Sequences, Nucleic Acid; Base Sequence; Inverted Repeat Sequences; Repetitive Sequences, Nucleic Acid/genetics
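
The definition of an inverted repeat in the abstract above (a sequence followed downstream by its reverse complement, possibly with a central gap and some mismatches) can be illustrated with a minimal Python sketch. This is not the IUPACpal algorithm; the names `can_pair` and `is_inverted_repeat` are hypothetical, and the pairing rule (two IUPAC symbols pair if any base encoded by one is the complement of any base encoded by the other) is an assumption made for illustration.

```python
# IUPAC nucleotide codes mapped to the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def can_pair(x, y):
    """Assumed rule: x pairs with y if some base of x complements some base of y."""
    return any(COMPLEMENT[a] in IUPAC[y] for a in IUPAC[x])


def is_inverted_repeat(seq, start, arm, gap, max_mismatches=0):
    """Check for a left arm of length `arm` at `start`, then `gap` symbols,
    then a right arm that is the reverse complement of the left arm,
    tolerating up to `max_mismatches` non-pairing positions."""
    end = start + 2 * arm + gap
    if start < 0 or end > len(seq):
        return False
    mismatches = 0
    for i in range(arm):
        # The i-th symbol of the left arm faces the i-th symbol from the right end.
        if not can_pair(seq[start + i], seq[end - 1 - i]):
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True
```

For example, `is_inverted_repeat("GAANNTTC", 0, 3, 2)` checks arms of length 3 separated by a gap of 2. A brute-force search over all starts, arm lengths, and gaps would be far slower than the tool described in the abstract.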

2.

GenMap: ultra-fast computation of genome mappability.

Pockrandt, Christopher; Alzamel, Mai; Iliopoulos, Costas S; Reinert, Knut.

Bioinformatics ; 36(12): 3687-3692, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32246826

ABSTRACT

MOTIVATION: Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging. However, it is crucial for many biological applications such as the design of guide RNA for CRISPR experiments. More formally, the uniqueness or (k, e)-mappability can be described for every position as the reciprocal value of how often this k-mer occurs approximately in the genome, i.e. with up to e mismatches. RESULTS: We present a fast method GenMap to compute the (k, e)-mappability. We extend the mappability algorithm, such that it can also be computed across multiple genomes where a k-mer occurrence is only counted once per genome. This allows for the computation of marker sequences or finding candidates for probe design by identifying approximate k-mers that are unique to a genome or that are present in all genomes. GenMap supports different formats such as binary output, wig and bed files as well as csv files to export the location of all approximate k-mers for each genomic position. AVAILABILITY AND IMPLEMENTATION: GenMap can be installed via bioconda. Binaries and C++ source code are available on https://github.com/cpockrandt/genmap.

Subjects

Genome; Software; Algorithms; Genomics; Sequence Analysis, DNA
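
The (k, e)-mappability defined in the abstract above (the reciprocal of how often the k-mer at each position occurs in the genome with up to e mismatches) can be written down directly as a naive quadratic-time Python sketch. This is not how GenMap computes it; the function names are hypothetical, and the sketch assumes a single sequence and the forward strand only.

```python
def hamming_within(a, b, e):
    """Return True if equal-length strings a and b differ in at most e positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > e:
                return False
    return True


def mappability(genome, k, e):
    """Naive (k, e)-mappability: for each k-mer start position, the reciprocal
    of the number of positions whose k-mer is within e mismatches of it."""
    n = len(genome) - k + 1
    kmers = [genome[i:i + k] for i in range(n)]
    # Every k-mer matches itself, so the count is at least 1.
    return [1.0 / sum(hamming_within(q, t, e) for t in kmers) for q in kmers]
```

A value of 1.0 marks a position whose k-mer is unique under the chosen mismatch tolerance; smaller values mark repeated regions.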

3.

Insights into the Influence of Specific Splicing Events on the Structural Organization of LRRK2.

Vlachakis, Dimitrios; Labrou, Nikolaos E; Iliopoulos, Costas; Hardy, John; Lewis, Patrick A; Rideout, Hardy; Trabzuni, Daniah.

Int J Mol Sci ; 19(9), 2018 Sep 16.

Article in English | MEDLINE | ID: mdl-30223621

ABSTRACT

Leucine-rich repeat kinase 2 (LRRK2) is a large protein of unclear function. Rare mutations in the LRRK2 gene cause familial Parkinson's disease (PD) and inflammatory bowel disease. Genome-wide association studies (GWAS) have revealed significant association of the abovementioned diseases at the LRRK2 locus. Cell and systems biology research has suggested potential roles for LRRK2 in PD pathogenesis, especially for the kinase domain (KIN). Previous human expression studies showed evidence of mRNA expression and splicing patterns that may contribute to our understanding of the function of LRRK2. In this work, we investigated and identified significant regional differences in LRRK2 expression at the mRNA level, including a number of splicing events in the Ras of complex protein (Roc) and C-terminal of Roc domain (COR) of LRRK2, in the substantia nigra (SN) and occipital cortex (OCTX). Our findings indicate that the predominant form of LRRK2 mRNA is full length, with shorter isoforms present at a lower copy number. Our molecular modelling study suggests that splicing events in the ROC/COR domains will have major consequences on the enzymatic function and dimer formation of LRRK2. The implications of these are highly relevant to the broader effort to understand the biology and physiological functions of LRRK2, and to better characterize the role(s) of LRRK2 in the underlying mechanism leading to PD.

Subjects

Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/chemistry; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics; RNA Splicing; Gene Expression; Humans; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism; Models, Molecular; Parkinson Disease/genetics; Parkinson Disease/metabolism; Protein Conformation; Protein Domains; Protein Interaction Domains and Motifs; RNA, Messenger/genetics; Structure-Activity Relationship; Substantia Nigra/metabolism

4.

Predicting the functional consequences of non-synonymous DNA sequence variants--evaluation of bioinformatics tools and development of a consensus strategy.

Frousios, Kimon; Iliopoulos, Costas S; Schlitt, Thomas; Simpson, Michael A.

Genomics ; 102(4): 223-8, 2013 Oct.

Article in English | MEDLINE | ID: mdl-23831115

ABSTRACT

The study of DNA sequence variation has been transformed by recent advances in DNA sequencing technologies. Determination of the functional consequences of sequence variant alleles offers potential insight as to how genotype may influence phenotype. Even within protein coding regions of the genome, establishing the consequences of variation on gene and protein function is challenging and requires substantial laboratory investigation. However, a series of bioinformatics tools have been developed to predict whether non-synonymous variants are neutral or disease-causing. In this study we evaluate the performance of nine such methods (SIFT, PolyPhen2, SNPs&GO, PhD-SNP, PANTHER, Mutation Assessor, MutPred, Condel and CAROL) and develop CoVEC (Consensus Variant Effect Classification), a tool that integrates the prediction results from four of these methods. We demonstrate that the CoVEC approach outperforms most individual methods and highlights the benefit of combining results from multiple tools.

Subjects

Base Sequence; Computational Biology/methods; Genetic Variation; Algorithms; Animals; Genome; Genotype; Humans; Open Reading Frames; Phenotype; Polymorphism, Single Nucleotide

5.

Transcriptome map of mouse isochores in embryonic and neonatal cortex.

Frousios, Kimon; Iliopoulos, Costas S; Tischler, German; Kossida, Sophia; Pissis, Solon P; Arhondakis, Stilianos.

Genomics ; 101(2): 120-4, 2013 Feb.

Article in English | MEDLINE | ID: mdl-23195409

ABSTRACT

Several studies on adult tissues agree on the presence of a positive effect of the genomic and genic base composition on mammalian gene expression. Recent literature supports the idea that during developmental processes GC-poor genomic regions are preferentially implicated. We investigate the relationship between the compositional properties of the isochores and of the genes with their respective expression activity during developmental processes. Using RNA-seq data from two distinct developmental stages of the mouse cortex, embryonic day 18 (E18) and postnatal day 7 (P7), we established for the first time a developmental-related transcriptome map of the mouse isochores. Additionally, for each stage we estimated the correlation between isochores' GC level and their expression activity, and the genes' expression patterns for each isochore family. Our analyses add evidence supporting the idea that during development GC-poor isochores are preferentially implicated, and confirm the positive effect of genes' GC level on their expression activity.

Subjects

Brain/metabolism; Gene Expression Regulation, Developmental; Isochores/genetics; Transcriptome; Animals; Base Composition; Brain/embryology; Chromosome Mapping; Embryo, Mammalian; Gene Library; Mice; Sequence Analysis, RNA

6.

SNP and Structural Study of the Notch Superfamily Provides Insights and Novel Pharmacological Targets against the CADASIL Syndrome and Neurodegenerative Diseases.

Papageorgiou, Louis; Papa, Lefteria; Papakonstantinou, Eleni; Mataragka, Antonia; Dragoumani, Konstantina; Chaniotis, Dimitrios; Beloukas, Apostolos; Iliopoulos, Costas; Bongcam-Rudloff, Erik; Chrousos, George P; Kossida, Sofia; Eliopoulos, Elias; Vlachakis, Dimitrios.

Genes (Basel) ; 15(5), 2024 04 23.

Article in English | MEDLINE | ID: mdl-38790158

ABSTRACT

The evolutionarily conserved Notch signaling pathway functions as a mediator of direct cell-cell communication between neighboring cells during development. Notch plays a crucial role in various fundamental biological processes in a wide range of tissues. Accordingly, the aberrant signaling of this pathway underlies multiple genetic pathologies such as developmental syndromes, congenital disorders, neurodegenerative diseases, and cancer. Over the last two decades, significant data have shown that the Notch signaling pathway displays a significant function in the mature brains of vertebrates and invertebrates beyond neuronal development and specification during embryonic development. Neuronal connection, synaptic plasticity, learning, and memory appear to be regulated by this pathway. Specific mutations in human Notch family proteins have been linked to several neurodegenerative diseases including Alzheimer's disease, CADASIL, and ischemic injury. Neurodegenerative diseases are incurable disorders of the central nervous system that cause the progressive degeneration and/or death of brain nerve cells, affecting both mental function and movement (ataxia). Considerable research is currently being conducted to better understand the molecular mechanisms by which Notch plays an essential role in the mature brain. In this study, an in silico analysis of polymorphisms and mutations in human Notch family members that lead to neurodegenerative diseases was performed in order to investigate the correlations among Notch family proteins and neurodegenerative diseases. Particular emphasis was placed on the study of mutations in the Notch3 protein and the structure analysis of the mutant Notch3 protein that leads to the manifestation of the CADASIL syndrome in order to spot possible conserved mutations and interpret the effect of these mutations in the Notch3 protein structure. Conserved mutations of cysteine residues may be candidate pharmacological targets for the potential therapy of CADASIL syndrome.

Subjects

CADASIL; Neurodegenerative Diseases; Polymorphism, Single Nucleotide; Receptors, Notch; Humans; CADASIL/genetics; CADASIL/metabolism; CADASIL/pathology; Receptors, Notch/metabolism; Receptors, Notch/genetics; Neurodegenerative Diseases/genetics; Neurodegenerative Diseases/metabolism; Neurodegenerative Diseases/pathology; Mutation; Signal Transduction; Receptor, Notch3/genetics; Receptor, Notch3/metabolism

7.

Fingerprinting Breast Milk; insights into Milk Exosomics.

Papakonstantinou, Eleni; Dragoumani, Konstantina; Mataragka, Antonia; Bacopoulou, Flora; Yapijakis, Christos; Balatsos, Nikolaos Aa; Pissaridi, Katerina; Ladikos, Dimitris; Eftymiadou, Aspasia; Katsaros, George; Gikas, Evangelos; Hatzis, Pantelis; Samiotaki, Martina; Aivaliotis, Michalis; Megalooikonomou, Vasileios; Giannakakis, Antonis; Iliopoulos, Costas; Bongcam-Rudloff, Erik; Kossida, Sofia; Eliopoulos, Elias; Chrousos, George P; Vlachakis, Dimitrios.

EMBnet J ; 29, 2024.

Article in English | MEDLINE | ID: mdl-38845752

ABSTRACT

Breast milk, often referred to as "liquid gold," is a complex biofluid that provides essential nutrients, immune factors, and developmental cues for newborns. Recent advancements in the field of exosome research have shed light on the critical role of exosomes in breast milk. Exosomes are nanosized vesicles that carry bioactive molecules, including proteins, lipids, nucleic acids, and miRNAs. These tiny messengers play a vital role in intercellular communication and are now being recognized as key players in infant health and development. This paper explores the emerging field of milk exosomics, emphasizing the potential of exosome fingerprinting to uncover valuable insights into the composition and function of breast milk. By deciphering the exosomal cargo, we can gain a deeper understanding of how breast milk influences neonatal health and may even pave the way for personalized nutrition strategies.

8.

Locating tandem repeats in weighted sequences in proteins.

Zhang, Hui; Guo, Qing; Iliopoulos, Costas S.

BMC Bioinformatics ; 14 Suppl 8: S2, 2013.

Article in English | MEDLINE | ID: mdl-23815711

ABSTRACT

A weighted biological sequence is a string in which a set of characters may appear at each position with respective probabilities of occurrence. We attempt to locate all the tandem repeats in a weighted sequence. A repeated substring is called a tandem repeat if its occurrences are directly adjacent to one another. By introducing the idea of equivalence classes in weighted sequences, we identify the tandem repeats of every possible length using an iterative partitioning technique. We also present the algorithm for recording the tandem repeats, and prove that the problem can be solved in O(n²) time.

Subjects

Algorithms; Sequence Alignment/methods; Tandem Repeat Sequences; Proteins/chemistry; Proteins/genetics

9.

Transcriptome map of mouse isochores.

Arhondakis, Stilianos; Frousios, Kimon; Iliopoulos, Costas S; Pissis, Solon P; Tischler, German; Kossida, Sophia.

BMC Genomics ; 12: 511, 2011 Oct 17.

Article in English | MEDLINE | ID: mdl-22004510

ABSTRACT

BACKGROUND: The availability of fully sequenced genomes and the implementation of transcriptome technologies have increased the studies investigating the expression profiles for a variety of tissues, conditions, and species. In this study, using RNA-seq data for three distinct tissues (brain, liver, and muscle), we investigate how base composition affects mammalian gene expression, an issue of prime practical and evolutionary interest. RESULTS: We present the transcriptome map of the mouse isochores (DNA segments with a fairly homogeneous base composition) for the three different tissues and the effects of isochores' base composition on their expression activity. Our analyses also cover the relations between the genes' expression activity and their localization in the isochore families. CONCLUSIONS: This study is the first where next-generation sequencing data are used to associate the effects of both genomic and genic compositional properties to their corresponding expression activity. Our findings confirm previous results, and further support the existence of a relationship between isochores and gene expression. This relationship corroborates that isochores are primarily a product of evolutionary adaptation rather than a simple by-product of neutral evolutionary processes.

Subjects

Isochores/genetics; Transcriptome; Animals; Base Composition; Brain/metabolism; Genome; Liver/metabolism; Mice; Mice, Inbred C57BL; Muscles/metabolism; Sequence Analysis, RNA

10.

A fast and efficient algorithm for mapping short sequences to a reference genome.

Antoniou, Pavlos; Iliopoulos, Costas S; Mouchard, Laurent; Pissis, Solon P.

Adv Exp Med Biol ; 680: 399-403, 2010.

Article in English | MEDLINE | ID: mdl-20865524

ABSTRACT

Novel high-throughput (Deep) sequencing technology methods have redefined the way genome sequencing is performed. They are able to produce tens of millions of short sequences (reads) in a single experiment and with a much lower cost than previous sequencing methods. In this paper, we present a new algorithm for addressing the problem of efficiently mapping millions of short reads to a reference genome. In particular, we define and solve the Massive Approximate Pattern Matching problem for mapping short sequences to a reference genome.

Subjects

Algorithms; Chromosome Mapping/statistics & numerical data; Genomics/statistics & numerical data; Sequence Alignment/statistics & numerical data; Animals; Computational Biology; Mice; Pattern Recognition, Automated/statistics & numerical data; RNA/genetics; X Chromosome/genetics

11.

Preface: MatBio 2021 Special Section.

Alzamel, Mai; Hampson, Christopher; Iliopoulos, Costas; Vayani, Fatima.

J Comput Biol ; 30(2): 117, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36656165

12.

Efficient algorithms for counting and reporting segregating sites in genomic sequences.

Christodoulakis, Manolis; Golding, G Brian; Iliopoulos, Costas S; Ardila, Yoan José Pinzón; Smyth, William F.

J Comput Biol ; 14(7): 1001-10, 2007 Sep.

Article in English | MEDLINE | ID: mdl-17803376

ABSTRACT

The number of segregating sites provides an indicator of the degree of DNA sequence variation that is present in a sample, and has been of great interest to the biological, pharmaceutical and medical professions. In this paper, we first provide linear- and expected-sublinear-time algorithms for finding all the segregating sites of a given set of DNA sequences. We also describe a data structure for tracking segregating sites in a set of sequences, such that every time the set is updated with the insertion of a new sequence or removal of an existing one, the segregating sites are updated accordingly without the need to re-scan the entire set of sequences.

Subjects

Algorithms; Base Sequence; Genome; Genetic Variation; Sequence Analysis, DNA
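
As a point of reference for the abstract above, a segregating site is a column of an alignment at which the sequences do not all agree. The following Python sketch finds them with a single linear scan over equal-length sequences. It illustrates only the basic definition, not the sublinear-time algorithm or the updatable data structure from the paper; the function name `segregating_sites` and the assumption that the input is already aligned are made here for illustration.

```python
def segregating_sites(seqs):
    """Return the 0-based positions at which the aligned sequences differ.

    Assumes all sequences have the same length (already aligned)."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    # A site segregates when more than one distinct symbol appears in its column.
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]
```

The count of segregating sites is then simply `len(segregating_sites(seqs))`.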

13.

On avoided words, absent words, and their application to biological sequence analysis.

Almirantis, Yannis; Charalampopoulos, Panagiotis; Gao, Jia; Iliopoulos, Costas S; Mohamed, Manal; Pissis, Solon P; Polychronopoulos, Dimitris.

Algorithms Mol Biol ; 12: 5, 2017.

Article in English | MEDLINE | ID: mdl-28293277

ABSTRACT

BACKGROUND: The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of w, denoted by [Formula: see text], effectively characterises the extent of a word by its edge contrast in the context in which it occurs. A word w of length [Formula: see text] is a [Formula: see text]-avoided word in x if [Formula: see text], for a given threshold [Formula: see text]. Notice that such a word may be completely absent from x. Hence, computing all such words naïvely can be a very time-consuming procedure, in particular for large k. RESULTS: In this article, we propose an [Formula: see text]-time and [Formula: see text]-space algorithm to compute all [Formula: see text]-avoided words of length k in a given sequence of length n over a fixed-sized alphabet. We also present a time-optimal [Formula: see text]-time algorithm to compute all [Formula: see text]-avoided words (of any length) in a sequence of length n over an integer alphabet of size [Formula: see text]. In addition, we provide a tight asymptotic upper bound for the number of [Formula: see text]-avoided words over an integer alphabet and the expected length of the longest one. We make available an implementation of our algorithm. Experimental results, using both real and synthetic data, show the efficiency and applicability of our implementation in biological sequence analysis. CONCLUSIONS: The systematic search for avoided words is particularly useful for biological sequence analysis. We present a linear-time and linear-space algorithm for the computation of avoided words of length k in a given sequence x. We suggest a modification to this algorithm so that it computes all avoided words of x, irrespective of their length, within the same time complexity. We also present combinatorial results with regard to avoided words and absent words.

14.

A Simple, Fast, Filter-Based Algorithm for Approximate Circular Pattern Matching.

Azim, Md Aashikur Rahman; Iliopoulos, Costas S; Rahman, M Sohel; Samiruzzaman, M.

IEEE Trans Nanobioscience ; 15(2): 93-100, 2016 03.

Article in English | MEDLINE | ID: mdl-26992174

ABSTRACT

This paper deals with the approximate version of the circular pattern matching (ACPM) problem, which appears as an interesting problem in many biological contexts. The circular pattern matching problem consists in finding all occurrences of the rotations of a pattern P of length m in a text T of length n. In ACPM, we consider occurrences with k-mismatches under the Hamming distance model. In this paper, we present a simple and fast filter-based algorithm to solve the ACPM problem. We compare our algorithm with the state-of-the-art algorithms and the results are found to be excellent. In particular, our algorithm runs almost twice as fast as the state of the art. Much of the efficiency of our algorithm can be attributed to its filters that are effective but extremely simple and lightweight.

Subjects

Algorithms; Computational Biology/methods; DNA, Circular; Pattern Recognition, Automated/methods; Sequence Analysis, DNA/methods; DNA, Circular/analysis; DNA, Circular/chemistry; DNA, Circular/genetics
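
The ACPM problem stated in the abstract above can be pinned down with a brute-force Python baseline: report every text position where some rotation of the pattern matches with at most k mismatches under the Hamming distance. This is only a reference definition, not the filter-based algorithm of the paper; the function name `approx_circular_matches` is hypothetical.

```python
def approx_circular_matches(pattern, text, k):
    """Return the 0-based start positions in `text` where some rotation of
    `pattern` occurs with at most k mismatches (Hamming distance)."""
    m = len(pattern)
    # All distinct rotations of the pattern.
    rotations = {pattern[r:] + pattern[:r] for r in range(m)}
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if any(sum(a != b for a, b in zip(rot, window)) <= k for rot in rotations):
            hits.append(i)
    return hits
```

The running time is O(n·m²) in the worst case, which is the kind of cost that filtering approaches aim to avoid.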

15.

Transcriptome activity of isochores during preimplantation process in human and mouse.

Barton, Carl; Iliopoulos, Costas S; Pissis, Solon P; Arhondakis, Stilianos.

FEBS Lett ; 590(14): 2297-306, 2016 07.

Article in English | MEDLINE | ID: mdl-27279593

ABSTRACT

This work investigates the role of isochores during the preimplantation process. Using RNA-seq data from human and mouse preimplantation stages, we created the spatio-temporal transcriptional profiles of the isochores during preimplantation. We found that from early to late stages, GC-rich isochores increase their expression while GC-poor ones decrease it. Network analysis revealed that modules with few coexpressed isochores are GC-poorer than medium-large ones, and that the two groups show opposite expression trends as preimplantation advances, decreasing and increasing respectively. Our results reveal a functional contribution of the isochores, supporting the presence of structural-functional interactions during maturation and early-embryonic development.

Subjects

Blastocyst/metabolism; Gene Expression Regulation, Developmental/physiology; Isochores/metabolism; Transcriptome/physiology; Animals; Humans; Mice; Species Specificity

16.

Circular sequence comparison: algorithms and applications.

Grossi, Roberto; Iliopoulos, Costas S; Mercas, Robert; Pisanti, Nadia; Pissis, Solon P; Retha, Ahmad; Vayani, Fatima.

Algorithms Mol Biol ; 11: 12, 2016.

Article in English | MEDLINE | ID: mdl-27168761

ABSTRACT

BACKGROUND: Sequence comparison is a fundamental step in many important tasks in bioinformatics; from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular molecular structure is a common phenomenon in nature, a caveat of the adaptation of alignment techniques for circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. RESULTS: In this paper, we introduce a new distance measure based on q-grams, and show how it can be applied effectively and computed efficiently for circular sequence comparison. Experimental results, using real DNA, RNA, and protein sequences as well as synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining an accuracy very competitive to the state of the art.
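
To give a feel for why q-grams suit circular sequences, the Python sketch below computes a plain q-gram distance (the sum of absolute differences between q-gram counts) over circular q-gram profiles, which do not change when a sequence is rotated. This is an assumption-laden illustration and not the distance measure introduced in the paper; the function names are hypothetical.

```python
from collections import Counter


def circular_qgram_profile(s, q):
    """Count the q-grams of s read circularly (wrapping past the end)."""
    if not 1 <= q <= len(s):
        raise ValueError("q must be between 1 and len(s)")
    extended = s + s[:q - 1]  # append the wrap-around prefix
    return Counter(extended[i:i + q] for i in range(len(s)))


def circular_qgram_distance(x, y, q):
    """Sum of absolute differences between the circular q-gram counts of x and y."""
    px, py = circular_qgram_profile(x, q), circular_qgram_profile(y, q)
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())
```

Because the profile is rotation-invariant, two rotations of the same circular sequence are at distance 0, with no need to try every rotation explicitly.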

17.

Erratum to: Circular sequence comparison: algorithms and applications.

Grossi, Roberto; Iliopoulos, Costas S; Mercas, Robert; Pisanti, Nadia; Pissis, Solon P; Retha, Ahmad; Vayani, Fatima.

Algorithms Mol Biol ; 11: 21, 2016.

Article in English | MEDLINE | ID: mdl-27471546

ABSTRACT

[This corrects the article DOI: 10.1186/s13015-016-0076-6.].

18.

SimpLiFiCPM: A Simple and Lightweight Filter-Based Algorithm for Circular Pattern Matching.

Azim, Md Aashikur Rahman; Iliopoulos, Costas S; Rahman, M Sohel; Samiruzzaman, M.

Int J Genomics ; 2015: 259320, 2015.

Article in English | MEDLINE | ID: mdl-26557649

ABSTRACT

This paper deals with the circular pattern matching (CPM) problem, which appears as an interesting problem in many biological contexts. CPM consists in finding all occurrences of the rotations of a pattern 𝒫 of length m in a text 𝒯 of length n. In this paper, we present SimpLiFiCPM (pronounced "Simplify CPM"), a simple and lightweight filter-based algorithm to solve the problem. We compare our algorithm with the state-of-the-art algorithms and the results are found to be excellent. Much of the speed of our algorithm comes from the fact that our filters are effective but extremely simple and lightweight.
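
For orientation, the exact CPM problem has a compact textbook baseline: a length-m window of the text is a rotation of the pattern exactly when it occurs as a substring of the pattern concatenated with itself. The Python sketch below uses that fact; it is not the SimpLiFiCPM filtering algorithm, and the function name `circular_matches` is hypothetical.

```python
def circular_matches(pattern, text):
    """Return the 0-based start positions in `text` where some rotation of
    `pattern` occurs exactly."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    doubled = pattern + pattern  # contains every rotation as a length-m substring
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in doubled]
```

Each window test is a substring search in a string of length 2m, so this baseline is simple rather than fast.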

19.

Optimal computation of all tandem repeats in a weighted sequence.

Barton, Carl; Iliopoulos, Costas S; Pissis, Solon P.

Algorithms Mol Biol ; 9: 21, 2014.

Article in English | MEDLINE | ID: mdl-25221616

ABSTRACT

BACKGROUND: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment. RESULTS: Crochemore's repetitions algorithm, also referred to as Crochemore's partitioning algorithm, was introduced in 1981, and was the first optimal [Formula: see text]-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore's partitioning algorithm for weighted sequences, which requires optimal [Formula: see text] time, thus improving on the best known [Formula: see text]-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.

20.

Fast algorithms for approximate circular string matching.

Barton, Carl; Iliopoulos, Costas S; Pissis, Solon P.

Algorithms Mol Biol ; 9(1): 9, 2014 Mar 22.

Article in English | MEDLINE | ID: mdl-24656145

ABSTRACT

BACKGROUND: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area. RESULTS: In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time O(n). Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time O(n) for moderate values of k, that is k = O(m/log m). We show how the same results can be easily obtained under the edit distance model. The presented algorithms are also implemented as library functions. Experimental results demonstrate that the functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naïve approach. CONCLUSIONS: We present two fast average-case algorithms for approximate circular string matching with k-mismatches; and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at http://www.inf.kcl.ac.uk/research/projects/asmf/.
