Search | Regional Portal of the BVS

phyBWT2: phylogeny reconstruction via eBWT positional clustering.

Guerrini, Veronica; Conte, Alessio; Grossi, Roberto; Liti, Gianni; Rosone, Giovanna; Tattini, Lorenzo.

Algorithms Mol Biol ; 18(1): 11, 2023 Aug 03.

Article in English | MEDLINE | ID: mdl-37537624

ABSTRACT

BACKGROUND: Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights into the origin and evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet several tools require pre-processed input data, e.g., from complex computational pipelines based on de novo assembly or on mappings against a reference genome. As sequencing technologies keep becoming cheaper, this puts increasing pressure on designing methods that analyze their outputs directly. From this viewpoint, there is growing interest in alignment-, assembly-, and reference-free methods that can work on several types of data, including raw reads. RESULTS: We present phyBWT2, a newly improved version of phyBWT (Guerrini et al. in 22nd International Workshop on Algorithms in Bioinformatics (WABI) 242:23:1-23:19, 2022). Both directly reconstruct phylogenetic trees, bypassing both alignment against a reference genome and de novo assembly. They exploit the combinatorial properties of the extended Burrows-Wheeler Transform (eBWT) and the corresponding eBWT positional clustering framework to detect relevant blocks of the longest shared substrings of varying length (unlike k-mer-based approaches, which need to fix the length k a priori). As a result, they provide novel alignment-, assembly-, and reference-free methods that build partition trees without relying on pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. In addition, phyBWT2 outperforms phyBWT in terms of running time, because the former reconstructs phylogenetic trees step by step by considering multiple partitions at a time, whereas the latter considered just one partition at a time.
CONCLUSIONS: Based on the results of the experiments on sequencing data, we conclude that our method can produce trees of quality comparable to the benchmark phylogeny while handling datasets of different types (short reads, contigs, or entire genomes). Overall, the experiments confirm the effectiveness of phyBWT2, which improves on the performance of its previous version phyBWT while preserving the accuracy of the results.
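To make the idea of eBWT positional clustering more concrete, here is a minimal, naive sketch (not the phyBWT2 implementation, which uses efficient eBWT and LCP construction). It assumes the common variant in which every sequence gets its own end-marker, sorts all suffixes of the collection, and reports maximal runs of adjacent suffixes whose LCP is at least a hypothetical lower bound `min_len`. Within each run the shared length is the smallest LCP in it, so it varies from cluster to cluster instead of being fixed a priori as in k-mer methods. The function names `ebwt_with_lcp` and `positional_clusters` are illustrative.

```python
from typing import List, Set, Tuple


def ebwt_with_lcp(seqs: List[str]) -> Tuple[str, List[int], List[int]]:
    """Naive eBWT of a string collection, plus the sequence id and LCP at each position.

    Assumes distinct end-markers $_0 < $_1 < ... that sort before every letter,
    so ties between identical suffixes are broken by sequence index.
    Suffixes made of an end-marker alone are omitted for brevity.
    Quadratic-time construction: for illustration only.
    """
    suffixes = sorted(
        (s[i:], idx, i) for idx, s in enumerate(seqs) for i in range(len(s))
    )
    ebwt: List[str] = []
    doc: List[int] = []
    lcp: List[int] = []
    prev = ""
    for suf, idx, i in suffixes:
        # eBWT letter: the character preceding this suffix ('$' at a sequence start).
        ebwt.append(seqs[idx][i - 1] if i > 0 else "$")
        doc.append(idx)
        # Longest common prefix with the previous suffix in sorted order.
        k = 0
        while k < min(len(prev), len(suf)) and prev[k] == suf[k]:
            k += 1
        lcp.append(k)
        prev = suf
    return "".join(ebwt), doc, lcp


def positional_clusters(
    seqs: List[str], min_len: int
) -> List[Tuple[int, int, int, Set[int]]]:
    """Maximal eBWT intervals [start, end) whose adjacent suffixes share >= min_len chars.

    Returns (start, end, shared_length, set of sequence ids) for each interval.
    """
    _, doc, lcp = ebwt_with_lcp(seqs)
    clusters = []
    start = None
    for j in range(1, len(lcp) + 1):
        # lcp[j] links suffix j-1 and suffix j; past the end we close any open run.
        inside = j < len(lcp) and lcp[j] >= min_len
        if inside and start is None:
            start = j - 1
        elif not inside and start is not None:
            shared = min(lcp[start + 1 : j])
            clusters.append((start, j, shared, set(doc[start:j])))
            start = None
    return clusters
```

For example, `positional_clusters(["ACGTACGT", "ACGTACGA", "TTGCATTT"], 5)` finds three clusters, with shared lengths 7, 6, and 5, all involving only sequences 0 and 1. Each cluster's set of sequences is a candidate group of taxa sharing a long substring; how phyBWT2 turns such groups into partitions and then into a tree is not reproduced here.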

Defining TCRγδ lymphoproliferative disorders by combined immunophenotypic and molecular evaluation.

Teramo, Antonella; Binatti, Andrea; Ciabatti, Elena; Schiavoni, Gianluca; Tarrini, Giulia; Barilà, Gregorio; Calabretto, Giulia; Vicenzetto, Cristina; Gasparini, Vanessa Rebecca; Facco, Monica; Petrini, Iacopo; Grossi, Roberto; Pisanti, Nadia; Bortoluzzi, Stefania; Falini, Brunangelo; Tiacci, Enrico; Galimberti, Sara; Semenzato, Gianpietro; Zambello, Renato.

Nat Commun ; 13(1): 3298, 2022 06 08.

Article in English | MEDLINE | ID: mdl-35676278

ABSTRACT

Tγδ large granular lymphocyte leukemia (Tγδ LGLL) is a rare lymphoproliferative disease, scantily described in the literature. An in-depth analysis of an initial cohort of 9 Tγδ LGLL patients compared with 23 healthy controls shows that Tγδ LGLL dominant clonotypes are mainly public and exhibit different V-(D)-J γ/δ usage between patients with symptomatic and indolent Tγδ neoplasms. Moreover, some clonotypes share the same rearranged sequence. Data obtained in an enlarged cohort (n = 36) indicate the importance of a combined evaluation of immunophenotype and STAT mutational profile for the correct management of patients with Tγδ cell expansions. In fact, we observe an association between Vδ2/Vγ9 clonality and an indolent course, while Vδ2/Vγ9 negativity correlates with symptomatic disease. Moreover, the 7 patients with STAT3 mutations have neutropenia and a CD56-/Vδ2- phenotype, and the 3 cases with STAT5B mutations display an asymptomatic clinical course and CD56/Vδ2 expression. All these data indicate that biological characterization is needed to define Tγδ-cell neoplasms.

Subjects

Large Granular Lymphocytic Leukemia, Gamma-Delta T-Cell Antigen Receptors, Humans, Immunophenotyping, Large Granular Lymphocytic Leukemia/diagnosis, Large Granular Lymphocytic Leukemia/genetics, Large Granular Lymphocytic Leukemia/metabolism, Mutation, Phenotype, Gamma-Delta T-Cell Antigen Receptors/genetics

Erratum to: Circular sequence comparison: algorithms and applications.

Grossi, Roberto; Iliopoulos, Costas S; Mercas, Robert; Pisanti, Nadia; Pissis, Solon P; Retha, Ahmad; Vayani, Fatima.

Algorithms Mol Biol ; 11: 21, 2016.

Article in English | MEDLINE | ID: mdl-27471546

ABSTRACT

[This corrects the article DOI: 10.1186/s13015-016-0076-6.].

Circular sequence comparison: algorithms and applications.

Grossi, Roberto; Iliopoulos, Costas S; Mercas, Robert; Pisanti, Nadia; Pissis, Solon P; Retha, Ahmad; Vayani, Fatima.

Algorithms Mol Biol ; 11: 12, 2016.

Article in English | MEDLINE | ID: mdl-27168761

ABSTRACT

BACKGROUND: Sequence comparison is a fundamental step in many important tasks in bioinformatics, from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. Since circular molecular structure is a common phenomenon in nature, alignment techniques have been adapted to circular sequence comparison; a caveat of these adaptations is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. RESULTS: In this paper, we introduce a new distance measure based on q-grams, and show how it can be applied effectively and computed efficiently for circular sequence comparison. Experimental results using real DNA, RNA, and protein sequences as well as synthetic data demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining accuracy very competitive with the state of the art.
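The paper's own q-gram measure is defined differently; the following is only a minimal sketch of the underlying idea of comparing q-gram profiles in a way that does not depend on where a circular sequence was cut open. Assuming q-grams are read around the circle (the last windows wrap to the start), every rotation of a sequence has the same profile, so no rotation search or alignment is needed. The function names and the choice of an L1 distance between profiles are illustrative assumptions.

```python
from collections import Counter


def cyclic_qgram_profile(s: str, q: int) -> Counter:
    """Count the len(s) q-grams of a circular sequence s (windows wrap around the end).

    Every rotation of s yields the same profile, so the linearization
    point of the circular sequence does not matter.
    """
    n = len(s)
    if n == 0:
        return Counter()
    return Counter("".join(s[(i + j) % n] for j in range(q)) for i in range(n))


def cyclic_qgram_distance(x: str, y: str, q: int) -> int:
    """L1 distance between the cyclic q-gram profiles of x and y."""
    px, py = cyclic_qgram_profile(x, q), cyclic_qgram_profile(y, q)
    # Counter returns 0 for missing q-grams, so the union of keys suffices.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())
```

Here `cyclic_qgram_distance("ACGTT", "TTACG", 3)` is 0 because the second string is a rotation of the first, while `cyclic_qgram_distance("AAAA", "AAAT", 2)` is 4. The cost is O(nq) per sequence; the trade-off is that distinct sequences can share a profile, so such a measure approximates rather than replaces an edit-based comparison.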

Mobilomics in Saccharomyces cerevisiae strains.

Menconi, Giulia; Battaglia, Giovanni; Grossi, Roberto; Pisanti, Nadia; Marangoni, Roberto.

BMC Bioinformatics ; 14: 102, 2013 Mar 20.

Article in English | MEDLINE | ID: mdl-23514613

ABSTRACT

BACKGROUND: Mobile Genetic Elements (MGEs) are selfish DNA integrated into genomes. Their detection is mainly based on consensus-like searches, scanning the investigated genome against the sequence of an already identified MGE. Mobilomics aims at discovering all the MGEs in a genome and understanding their dynamic behavior; the data for this kind of investigation can be provided by comparative genomics of closely related organisms. The amount of data involved requires a substantial computational effort, which should be alleviated. RESULTS: Our approach exploits the high similarity among homologous chromosomes of different strains of the same species, following a progressive comparative genomics philosophy. We introduce a software tool based on our new fast algorithm, called regender, which is able to identify the conserved regions between chromosomes. Our case study is a unique, recently available dataset of 39 different strains of S. cerevisiae, which regender is able to compare in a few minutes. By exploring the non-conserved regions, where MGEs are mainly retrotransposons called Tys, and marking the candidate Tys based on their length, we are able to automatically locate, a priori, all the already known Tys and to map all the putative Tys in all the strains. The remaining putative mobile elements (PMEs) emerging from this intra-specific comparison are sharp markers of inter-specific evolution: indeed, many events of non-conservation among different yeast strains correspond to PMEs. A clustering based on the presence/absence of the candidate Tys in the strains suggests an evolutionary interconnection very similar to that of classic phylogenetic trees based on SNP analysis, even though it is computed without using phylogenetic information.
CONCLUSIONS: The case study indicates that the proposed methodology brings two major advantages: (a) it does not require any template sequence for the target MGEs, and (b) it can be applied to infer MGEs even in low-coverage genomes with unresolved bases, where traditional approaches are largely ineffective.
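The step of "exploring the non-conserved regions and marking candidates by length" can be sketched as follows. This does not reproduce regender's algorithm for finding conserved regions: the conserved intervals are assumed to be already given, and the length window `[min_len, max_len]` as well as the function names `non_conserved` and `candidate_elements` are hypothetical choices for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one chromosome


def non_conserved(conserved: List[Interval], chrom_len: int) -> List[Interval]:
    """Return the gaps left uncovered by (possibly overlapping) conserved intervals."""
    gaps: List[Interval] = []
    pos = 0  # end of the conserved coverage seen so far
    for start, end in sorted(conserved):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < chrom_len:
        gaps.append((pos, chrom_len))
    return gaps


def candidate_elements(
    conserved: List[Interval], chrom_len: int, min_len: int, max_len: int
) -> List[Interval]:
    """Keep the non-conserved regions whose length falls in [min_len, max_len]."""
    return [
        (s, e)
        for s, e in non_conserved(conserved, chrom_len)
        if min_len <= e - s <= max_len
    ]
```

With conserved intervals `[(0, 1000), (6000, 9000), (9500, 20000)]` on a 21,000 bp chromosome, the non-conserved regions are `(1000, 6000)`, `(9000, 9500)`, and `(20000, 21000)`; a window of 4,000 to 7,000 bp keeps only `(1000, 6000)`. Recording which strains carry a candidate would give the presence/absence table on which the clustering described above is based.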

Subjects

Retroelements, Saccharomyces cerevisiae/genetics, Fungal Genome, Genomics/methods, Software, Terminal Repeat Sequences

MADMX: a strategy for maximal dense motif extraction.

Grossi, Roberto; Pietracaprina, Andrea; Pisanti, Nadia; Pucci, Geppino; Upfal, Eli; Vandin, Fabio.

J Comput Biol ; 18(4): 535-45, 2011 Apr.

Article in English | MEDLINE | ID: mdl-21417937

ABSTRACT

We develop, analyze, and experiment with a new tool, called MADMX, which extracts frequent motifs from biological sequences. We introduce the notion of density to single out the "significant" motifs. The density is a simple and flexible measure for bounding the number of don't cares in a motif, defined as the fraction of solid (i.e., different from don't care) characters in the motif. A maximal dense motif has density above a certain threshold, and any further specialization of a don't care symbol in it or any extension of its boundaries decreases its number of occurrences in the input sequence. By extracting only maximal dense motifs, MADMX reduces the output size and improves performance, while enhancing the quality of the discoveries. The efficiency of our approach relies on a newly defined combining operation, dubbed fusion, which allows for the construction of maximal dense motifs in a bottom-up fashion, while avoiding the generation of nonmaximal ones. We provide experimental evidence of the efficiency and the quality of the motifs returned by MADMX.
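The density notion defined above is simple enough to state directly in code. The sketch below assumes `.` as the don't-care symbol and shows density, occurrence counting with don't cares, and a combined density/frequency test; the inclusive comparison with the threshold `rho` and the `quorum` parameter are illustrative assumptions. The maximality check and the fusion operation that MADMX uses to build maximal dense motifs bottom-up are not shown.

```python
from typing import List

DONT_CARE = "."  # symbol assumed here to denote a don't-care position


def density(motif: str) -> float:
    """Fraction of solid (non-don't-care) characters in the motif."""
    solid = sum(1 for c in motif if c != DONT_CARE)
    return solid / len(motif)


def occurrences(motif: str, text: str) -> List[int]:
    """Start positions where motif matches text; a don't care matches any character."""
    m = len(motif)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(c == DONT_CARE or c == text[i + j] for j, c in enumerate(motif))
    ]


def is_frequent_dense(motif: str, text: str, rho: float, quorum: int) -> bool:
    """Density at least rho and at least `quorum` occurrences (thresholds are illustrative)."""
    return density(motif) >= rho and len(occurrences(motif, text)) >= quorum
```

For instance, `density("A.C.")` is 0.5 and `occurrences("A.C", "AACABC")` returns `[0, 3]`, so `"A.C"` (density about 0.67) passes `is_frequent_dense` with `rho=0.6` and `quorum=2`, while `"A..C"` (density 0.5) does not.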

Subjects

Algorithms, Computational Biology/methods, Sequence Analysis/methods

Bases of motifs for generating repeated patterns with wild cards.

Pisanti, Nadia; Crochemore, Maxime; Grossi, Roberto; Sagot, Marie-France.

IEEE/ACM Trans Comput Biol Bioinform ; 2(1): 40-50, 2005.

Article in English | MEDLINE | ID: mdl-17044163

ABSTRACT

Motif inference represents one of the most important areas of research in computational biology, and one of its oldest. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms or in relation to the biological questions that involve finding such motifs. Two main types of motifs have been considered in the literature: matrices (of letter frequency per position in the motif) and patterns. There is no conclusive evidence in favor of either, and recent work has attempted to integrate the two types into a single model. In this paper, we address the formal issue in relation to motifs as patterns. This is essential to gain a better understanding of motifs in general. In particular, we consider a promising idea that was recently proposed, which attempted to avoid the combinatorial explosion in the number of motifs by means of a generator set for the motifs. Instead of exhibiting a complete list of motifs satisfying some input constraints, what is produced is a basis of such motifs from which all the other ones can be generated. We study the computational cost of determining such a basis of repeated motifs with wild cards in a sequence. We give new upper and lower bounds on such a cost, introducing a notion of basis that is provably contained in (and thus smaller than) previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all bases defined so far grows exponentially with the quorum, that is, with the minimal number of times a motif must appear in a sequence, a fact unnoticed in previous work. We show that there is no hope of efficiently computing such bases unless the quorum is fixed.
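A recurring ingredient in the literature on bases of repeated motifs with wild cards is the "merge" of a sequence with a shifted copy of itself: positions where the two copies agree keep their letter, all others become a wild card. After trimming wild cards at both ends, the result is a motif that occurs at least twice (quorum 2), at the offset of its first solid letter and d positions later. The sketch below only illustrates this operation; it is not the basis defined in the paper, and the names `merge_with_shift` and `shift_merges` and the `.` wild-card symbol are assumptions (the input alphabet is assumed not to contain `.`).

```python
from typing import Set

WILD = "."  # wild-card symbol assumed for illustration


def merge_with_shift(s: str, d: int) -> str:
    """Align s with itself shifted by d; keep agreeing letters, put a wild card elsewhere.

    The result, with wild cards trimmed from both ends, occurs in s at least
    twice: at the offset of its first solid letter and d positions later.
    """
    merged = "".join(s[i] if s[i] == s[i + d] else WILD for i in range(len(s) - d))
    return merged.strip(WILD)


def shift_merges(s: str) -> Set[str]:
    """All non-empty merges of s with its shifted copies, for shifts 1..len(s)-1."""
    merges = (merge_with_shift(s, d) for d in range(1, len(s)))
    return {m for m in merges if m}
```

For example, `merge_with_shift("ABCABD", 3)` gives `"AB"`, and `shift_merges("AXBAYB")` gives `{"A.B"}`, a motif occurring at positions 0 and 3. There are at most n - 1 such merges for a sequence of length n, which hints at why small bases are achievable for quorum 2, whereas the abstract notes exponential growth for larger quorums.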

Subjects

Algorithms, Amino Acid Motifs/genetics, Proteins/genetics, Nucleic Acid Repetitive Sequences/genetics, Sequence Alignment/methods, DNA Sequence Analysis/methods, Base Sequence, Molecular Sequence Data, Automated Pattern Recognition
