Search | BVS Violência e Saúde

libFLASM: a software library for fixed-length approximate string matching.

Ayad, Lorraine A K; Pissis, Solon P; Retha, Ahmad.

BMC Bioinformatics ; 17(1): 454, 2016 Nov 10.

Article in English | MEDLINE | ID: mdl-27832739

ABSTRACT

BACKGROUND: Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time [Formula: see text] and space [Formula: see text] under the edit and Hamming distance models, where w is the size of the computer word; as such, these techniques are independent of the distance threshold k and the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere. RESULTS: We present and make available libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching under both the edit and the Hamming distance models. Moreover, we describe how fixed-length approximate string matching is applied to solve real problems by incorporating libFLASM into established applications for multiple circular sequence alignment as well as single and structured motif extraction. Specifically, we describe how it can be used to improve the accuracy of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies, and how it is used to efficiently find motifs in molecular sequences representing regulatory or functional regions. A comparison of the library's performance with that of other algorithms shows that it is competitive, especially as the distance threshold increases. CONCLUSIONS: Fixed-length approximate string matching is a generalisation of the classic approximate string matching problem. We present libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching. The extensive experimental results presented here suggest that other applications could benefit from using libFLASM, and thus further maintenance and development of libFLASM is desirable.
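The problem definition above can be made concrete with a brute-force sketch. This is not libFLASM's implementation or API: `hamming` and `flasm_hamming` are hypothetical names, only the Hamming model is covered, positions are assumed to be 0-based, and each qualifying text factor is reported together with its closest pattern factor. It runs in O(nmℓ) time, rather than with the bit-vector bounds mentioned in the abstract.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def flasm_hamming(text: str, pattern: str, ell: int, k: int) -> list:
    """Naive fixed-length approximate string matching (Hamming model only).

    Hypothetical illustration, not libFLASM's API. For every length-ell
    factor of `text` ending at 0-based position j, find the closest
    length-ell factor of `pattern` (ending at position i). If that Hamming
    distance d is at most k, report the triple (j, i, d).
    """
    if not 0 < ell <= len(pattern):
        raise ValueError("ell must satisfy 0 < ell <= len(pattern)")
    occurrences = []
    for j in range(ell - 1, len(text)):
        t_factor = text[j - ell + 1 : j + 1]
        # Compare against every length-ell factor of the pattern; ties on
        # distance are broken by the smaller pattern end position i.
        best_d, best_i = min(
            (hamming(t_factor, pattern[i - ell + 1 : i + 1]), i)
            for i in range(ell - 1, len(pattern))
        )
        if best_d <= k:
            occurrences.append((j, best_i, best_d))
    return occurrences
```

For example, `flasm_hamming("GATTACA", "TACT", 3, 1)` returns `[(3, 3, 1), (5, 2, 0), (6, 3, 1)]`: the text factors ATT, TAC and ACA each lie within one mismatch of a length-3 factor of the pattern (ACT, TAC and ACT respectively).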

Subjects

Computational Biology/methods, Gene Library, Software, Algorithms, Databases as Topic, Likelihood Functions, Nucleotide Motifs/genetics, Sequence Alignment, Time Factors

The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data.

Koscielny, Gautier; Yaikhom, Gagarine; Iyer, Vivek; Meehan, Terrence F; Morgan, Hugh; Atienza-Herrero, Julian; Blake, Andrew; Chen, Chao-Kung; Easty, Richard; Di Fenza, Armida; Fiegel, Tanja; Grifiths, Mark; Horne, Alan; Karp, Natasha A; Kurbatova, Natalja; Mason, Jeremy C; Matthews, Peter; Oakley, Darren J; Qazi, Asfand; Regnart, Jack; Retha, Ahmad; Santos, Luis A; Sneddon, Duncan J; Warren, Jonathan; Westerberg, Henrik; Wilson, Robert J; Melvin, David G; Smedley, Damian; Brown, Steve D M; Flicek, Paul; Skarnes, William C; Mallon, Ann-Marie; Parkinson, Helen.

Nucleic Acids Res ; 42(Database issue): D802-9, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24194600

ABSTRACT

The International Mouse Phenotyping Consortium (IMPC) web portal (http://www.mousephenotype.org) provides the biomedical community with a unified point of access to mutant mice and a rich collection of related emerging and existing mouse phenotype data. IMPC mouse clinics worldwide follow rigorous, highly structured and standardized protocols for the experimentation, collection and dissemination of data. Dedicated 'data wranglers' work with each phenotyping center to collate data and perform quality control. An automated statistical analysis pipeline has been developed to identify knockout strains with a significant change in phenotype parameters. Annotation with biomedical ontologies allows biologists and clinicians to easily find mouse strains with phenotypic traits relevant to their research. Data integration with other resources will provide insights into mammalian gene function and human disease. As phenotype data become available for every gene in the mouse, the IMPC web portal will become an invaluable tool for researchers studying the contributions of genes to human diseases.

Subjects

Genetic Databases, Knockout Mice, Phenotype, Animals, Biological Ontologies, Internet, Mice

Comparative visualization of genotype-phenotype relationships.

Yaikhom, Gagarine; Morgan, Hugh; Sneddon, Duncan; Retha, Ahmad; Atienza-Herrero, Julian; Blake, Andrew; Brown, James; Di Fenza, Armida; Fiegel, Tanja; Horner, Neil; Ring, Natalie; Santos, Luis; Westerberg, Henrik; Brown, Steve D M; Mallon, Ann-Marie.

Nat Methods ; 12(8): 698-9, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26226357

Subjects

Computational Biology/methods, Genetic Association Studies, Software, Access to Information, Animals, Computer Graphics, Genetic Databases, Genotype, Internet, Mice, Mutation, Phenotype

Circular sequence comparison: algorithms and applications.

Grossi, Roberto; Iliopoulos, Costas S; Mercas, Robert; Pisanti, Nadia; Pissis, Solon P; Retha, Ahmad; Vayani, Fatima.

Algorithms Mol Biol ; 11: 12, 2016.

Article in English | MEDLINE | ID: mdl-27168761

ABSTRACT

BACKGROUND: Sequence comparison is a fundamental step in many important tasks in bioinformatics, from phylogenetic reconstruction to genome reconstruction. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. Circular molecular structure is a common phenomenon in nature, but alignment techniques adapted for circular sequence comparison are computationally expensive, requiring super-quadratic to cubic time in the length of the sequences. RESULTS: In this paper, we introduce a new distance measure based on q-grams, and show how it can be applied effectively and computed efficiently for circular sequence comparison. Experimental results using real DNA, RNA, and protein sequences as well as synthetic data show that our approach is orders of magnitude more efficient, while maintaining accuracy very competitive with the state of the art.
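To illustrate why q-gram profiles suit circular sequences, the sketch below computes a simple rotation-invariant q-gram distance: each sequence's q-grams are read cyclically (wrapping around the end), so the result does not depend on where the circular sequence was linearised, and no alignment or search over rotations is needed. This is an assumed, simplified measure for illustration only; it is not the distance measure introduced in the paper, and the function names are hypothetical.

```python
from collections import Counter


def cyclic_qgrams(s: str, q: int) -> Counter:
    """Multiset of the q-grams of s read as a circular string.

    Hypothetical helper for illustration, not the paper's measure. Every
    rotation of s yields the same multiset, so the profile is independent
    of the linearisation point.
    """
    if not 0 < q <= len(s):
        raise ValueError("q must satisfy 0 < q <= len(s)")
    wrapped = s + s[: q - 1]  # append the first q-1 symbols to wrap around
    return Counter(wrapped[i : i + q] for i in range(len(s)))


def cyclic_qgram_distance(x: str, y: str, q: int) -> int:
    """L1 distance between the circular q-gram profiles of x and y."""
    px, py = cyclic_qgrams(x, q), cyclic_qgrams(y, q)
    # A Counter returns 0 for absent keys, so q-grams missing from one
    # profile are counted correctly.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())
```

Here `cyclic_qgram_distance("ACGTT", "TTACG", 2)` is 0 because the second string is a rotation of the first, while `cyclic_qgram_distance("ACGTT", "ACGTA", 2)` is 2 (the profiles differ by one TT and one AA). A distance of 0 does not by itself prove that two sequences are rotations of each other.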

Erratum to: Circular sequence comparison: algorithms and applications.

Grossi, Roberto; Iliopoulos, Costas S; Mercas, Robert; Pisanti, Nadia; Pissis, Solon P; Retha, Ahmad; Vayani, Fatima.

Algorithms Mol Biol ; 11: 21, 2016.

Article in English | MEDLINE | ID: mdl-27471546

ABSTRACT

[This corrects the article DOI: 10.1186/s13015-016-0076-6.].
