Search | BVS (Virtual Health Library) Infectious and Parasitic Diseases

ASGAL: aligning RNA-Seq data to a splicing graph to detect novel alternative splicing events.

Denti, Luca; Rizzi, Raffaella; Beretta, Stefano; Vedova, Gianluca Della; Previtali, Marco; Bonizzoni, Paola.

BMC Bioinformatics ; 19(1): 444, 2018 Nov 20.

Article in English | MEDLINE | ID: mdl-30458725

ABSTRACT

BACKGROUND: While the reconstruction of transcripts from a sample of RNA-Seq data is a computationally expensive and complicated task, the detection of splicing events from RNA-Seq data and a gene annotation is computationally feasible. This latter task, which is adequate for many transcriptome analyses, is usually achieved by aligning the reads to a reference genome and then comparing the alignments with a gene annotation, often implicitly represented by a graph: the splicing graph. RESULTS: We present ASGAL (Alternative Splicing Graph ALigner): a tool for mapping RNA-Seq data to the splicing graph, with the specific goal of detecting novel splicing events, involving either annotated or unannotated splice sites. ASGAL takes as input the annotated transcripts of a gene and an RNA-Seq sample, and computes (1) the spliced alignments of each input read, and (2) a list of novel events with respect to the gene annotation. CONCLUSIONS: An experimental analysis shows that ASGAL can enrich the annotation with novel alternative splicing events even when genes in an experiment express at most one isoform. Compared with other tools that use the spliced alignment of reads against a reference genome for differential analysis, ASGAL predicts events that use splice sites which are novel with respect to a splicing graph with higher accuracy. To the best of our knowledge, ASGAL is the first tool that detects novel alternative splicing events by directly aligning reads to a splicing graph. AVAILABILITY: Source code, documentation, and data are available for download at http://asgal.algolab.eu.
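The splicing graph named in this abstract can be illustrated with a minimal sketch. This is not ASGAL's implementation: it only assumes that nodes are the distinct annotated exons and that an edge links two exons that are consecutive in some annotated transcript, so that a junction observed in a read counts as novel when the corresponding edge is missing. The transcript name, the coordinates, and both function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end) coordinates; hypothetical values below


def build_splicing_graph(transcripts: Dict[str, List[Exon]]) -> Dict[Exon, Set[Exon]]:
    """Build an adjacency map: exon -> exons that directly follow it in a transcript."""
    graph: Dict[Exon, Set[Exon]] = {}
    for exons in transcripts.values():
        ordered = sorted(exons)  # order exons by genomic position
        for exon in ordered:
            graph.setdefault(exon, set())
        for left, right in zip(ordered, ordered[1:]):
            graph[left].add(right)  # annotated junction left -> right
    return graph


def is_novel_junction(graph: Dict[Exon, Set[Exon]], left: Exon, right: Exon) -> bool:
    """A junction is novel here if no annotated transcript joins the two exons."""
    return right not in graph.get(left, set())


# Hypothetical annotation with a single three-exon transcript.
annotation = {"tx1": [(100, 200), (300, 400), (500, 600)]}
graph = build_splicing_graph(annotation)

# A read joining the first and third exon suggests an unannotated exon skipping.
print(is_novel_junction(graph, (100, 200), (500, 600)))
print(is_novel_junction(graph, (100, 200), (300, 400)))
```

The sketch only covers events that reuse annotated exon boundaries; events involving unannotated splice sites, which the abstract also mentions, would need the read alignments themselves.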

Subjects

Alternative Splicing/genetics ; RNA Splicing/genetics ; RNA/genetics ; Sequence Analysis, RNA/methods ; Humans

Tools and data services registry: a community effort to document bioinformatics resources.

Ison, Jon; Rapacki, Kristoffer; Ménager, Hervé; Kalas, Matús; Rydza, Emil; Chmura, Piotr; Anthon, Christian; Beard, Niall; Berka, Karel; Bolser, Dan; Booth, Tim; Bretaudeau, Anthony; Brezovsky, Jan; Casadio, Rita; Cesareni, Gianni; Coppens, Frederik; Cornell, Michael; Cuccuru, Gianmauro; Davidsen, Kristian; Vedova, Gianluca Della; Dogan, Tunca; Doppelt-Azeroual, Olivia; Emery, Laura; Gasteiger, Elisabeth; Gatter, Thomas; Goldberg, Tatyana; Grosjean, Marie; Grüning, Björn; Helmer-Citterich, Manuela; Ienasescu, Hans; Ioannidis, Vassilios; Jespersen, Martin Closter; Jimenez, Rafael; Juty, Nick; Juvan, Peter; Koch, Maximilian; Laibe, Camille; Li, Jing-Woei; Licata, Luana; Mareuil, Fabien; Micetic, Ivan; Friborg, Rune Møllegaard; Moretti, Sebastien; Morris, Chris; Möller, Steffen; Nenadic, Aleksandra; Peterson, Hedi; Profiti, Giuseppe; Rice, Peter; Romano, Paolo.

Nucleic Acids Res ; 44(D1): D38-47, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26538599

ABSTRACT

Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.

Subjects

Computational Biology ; Registries ; Data Curation ; Software

Oral bisphosphonates do not increase the risk of severe upper gastrointestinal complications: a nested case-control study.

Ghirardi, Arianna; Scotti, Lorenza; Vedova, Gianluca Della; D'Oro, Luca Cavalieri; Lapi, Francesco; Cipriani, Francesco; Caputi, Achille P; Vaccheri, Alberto; Gregori, Dario; Gesuita, Rosaria; Vestri, Annarita; Staniscia, Tommaso; Mazzaglia, Giampiero; Corrao, Giovanni.

BMC Gastroenterol ; 14: 5, 2014 Jan 07.

Article in English | MEDLINE | ID: mdl-24397769

ABSTRACT

BACKGROUND: Data on the effect of oral bisphosphonates (BPs) on the risk of upper gastrointestinal complications (UGIC) are conflicting. We conducted a large population-based study, drawing on a network of Italian healthcare utilization databases, to assess the UGIC risk associated with use of BPs in the setting of secondary prevention of osteoporotic fractures. METHODS: A nested case-control study was carried out within a cohort of 68,970 patients aged 45 years or older who had been hospitalized for osteoporotic fracture from 2003 until 2005. Cases were the 804 patients who were hospitalized for UGIC until 2007. Up to 20 controls were randomly selected for each case. A conditional logistic regression model was used to estimate the odds ratio (OR) associated with current and past use of BPs (i.e. drug dispensation within 30 days and over 31 days prior to the outcome onset, respectively) after adjusting for several covariates. RESULTS: Compared with patients who did not use BPs, current and past users had ORs (with 95% confidence intervals) of 0.86 (0.60 to 1.22) and 1.07 (0.80 to 1.44), respectively. The ORs did not differ by BP type (alendronate or risedronate) or regimen (daily or weekly), nor by co-therapies or comorbidities. CONCLUSIONS: This study supplies further evidence that BPs dispensed for secondary prevention of osteoporotic fractures are not associated with an increased risk of severe gastrointestinal complications. Further research is required to clarify the role of BPs and of other co-medicated drugs in inducing UGIC.
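The reported odds ratios can be read with a short calculation. The sketch below is not part of the study: it assumes the intervals are Wald-type (symmetric on the log scale, with z = 1.96), which the abstract does not state, and uses that assumption to back out the standard error of the log odds ratio and to check whether each interval contains the null value 1. The function names are hypothetical; the numbers are those given in the abstract.

```python
import math


def log_or_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR), assuming a Wald-type 95% interval (an assumption)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_includes_null(lower: float, upper: float) -> bool:
    """True when the interval contains OR = 1, i.e. no detectable association."""
    return lower <= 1.0 <= upper


# Values reported in the abstract: (OR, lower bound, upper bound).
reported = {"current use": (0.86, 0.60, 1.22), "past use": (1.07, 0.80, 1.44)}

for label, (odds_ratio, lower, upper) in reported.items():
    se = log_or_standard_error(lower, upper)
    print(f"{label}: OR={odds_ratio}, SE(log OR)={se:.3f}, "
          f"interval includes 1: {ci_includes_null(lower, upper)}")
```

Both intervals contain 1, which is consistent with the conclusion that neither current nor past use was associated with an increased risk.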

Subjects

Bone Density Conservation Agents/administration & dosage ; Diphosphonates/administration & dosage ; Gastrointestinal Diseases/epidemiology ; Administration, Oral ; Aged ; Aged, 80 and over ; Bone Density Conservation Agents/adverse effects ; Calcium Channel Blockers/therapeutic use ; Case-Control Studies ; Comorbidity ; Diphosphonates/adverse effects ; Female ; Gastrointestinal Diseases/chemically induced ; Humans ; Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use ; Incidence ; Italy/epidemiology ; Male ; Middle Aged ; Osteoporotic Fractures/prevention & control ; Risk Factors ; Secondary Prevention

Reads2Vec: Efficient Embedding of Raw High-Throughput Sequencing Reads Data.

Chourasia, Prakash; Ali, Sarwan; Ciccolella, Simone; Vedova, Gianluca Della; Patterson, Murray.

J Comput Biol ; 30(4): 469-491, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36730750

ABSTRACT

The massive amount of genomic data appearing for SARS-CoV-2 since the beginning of the COVID-19 pandemic has challenged traditional methods for studying its dynamics. As a result, new methods such as Pangolin, which can scale to the millions of samples of SARS-CoV-2 currently available, have appeared. Such a tool is tailored to take as input assembled, aligned, and curated full-length sequences, such as those found in the GISAID database. As high-throughput sequencing technologies continue to advance, such assembly, alignment, and curation may become a bottleneck, creating a need for methods that can process raw sequencing reads directly. In this article, we propose Reads2Vec, an alignment-free embedding approach that can generate a fixed-length feature vector representation directly from the raw sequencing reads without requiring assembly. Furthermore, since such an embedding is a numerical representation, it can be used as input to highly optimized classification and clustering algorithms. Experiments on simulated data show that our proposed embedding obtains better classification results and better clustering properties than existing alignment-free baselines. In a study on real data, we show that alignment-free embeddings have better clustering properties than the Pangolin tool and that the spike region of the SARS-CoV-2 genome heavily informs the alignment-free clusterings, which is consistent with current biological knowledge of SARS-CoV-2.
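The idea of a fixed-length, alignment-free feature vector computed directly from raw reads can be illustrated with a normalized k-mer spectrum. This is a generic technique chosen for illustration, not the Reads2Vec method itself, whose details the abstract does not give; the function name, the value of k, and the example reads are all hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List

ALPHABET = "ACGT"


def kmer_spectrum(reads: Iterable[str], k: int = 3) -> List[float]:
    """Map a set of reads to a vector of 4**k normalized k-mer frequencies."""
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set(ALPHABET):
                counts[kmer] += 1
    total = sum(counts.values())
    # The vector length depends only on k, never on the number of reads.
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


# Two hypothetical short reads; no assembly or alignment is needed.
vector = kmer_spectrum(["ACGTAC", "GTACNN"], k=3)
print(len(vector))
```

Because every sample maps to a vector of the same length, the result can be passed to standard classifiers or clustering algorithms, which is the property the abstract highlights.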

Subjects

COVID-19 ; Pangolins ; Humans ; Animals ; Pandemics ; SARS-CoV-2/genetics ; COVID-19/genetics ; High-Throughput Nucleotide Sequencing/methods

Simpler and Faster Development of Tumor Phylogeny Pipelines.

Ali, Sarwan; Ciccolella, Simone; Lucarella, Lorenzo; Vedova, Gianluca Della; Patterson, Murray.

J Comput Biol ; 28(11): 1142-1155, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34698531

ABSTRACT

In recent years, there has been an increasing number of single-cell sequencing studies, producing a considerable number of new data sets. This has particularly affected the field of cancer analysis, where more and more articles are published using this sequencing technique, which captures more detailed information about the specific genetic mutations in each individually sampled cell. As the amount of information increases, it is necessary to have more sophisticated and rapid tools for analyzing the samples. To this end, we developed plastic (PipeLine Amalgamating Single-cell Tree Inference Components), an easy-to-use and quick-to-adapt pipeline that integrates three different steps: (1) to simplify the input data, (2) to infer tumor phylogenies, and (3) to compare the phylogenies. We have created a pipeline submodule for each of those steps and developed new in-memory data structures that allow for easy and transparent sharing of the information across the tools implementing the above steps. While we use existing open source tools for those steps, we have extended the tool used for simplifying the input data, incorporating two machine learning procedures, which greatly reduce the running time without affecting the quality of the downstream analysis. Moreover, we have introduced the capability of producing some plots to quickly visualize results.

Subjects

Computational Biology/methods ; Mutation ; Neoplasms/classification ; Humans ; Internet ; Neoplasms/genetics ; Phylogeny ; Sequence Analysis, DNA ; Single-Cell Analysis ; Software

Does Relaxing the Infinite Sites Assumption Give Better Tumor Phylogenies? An ILP-Based Comparative Approach.

Bonizzoni, Paola; Ciccolella, Simone; Vedova, Gianluca Della; Soto, Mauricio.

IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1410-1423, 2019.

Article in English | MEDLINE | ID: mdl-31603766

ABSTRACT

Most approaches to reconstructing evolutionary history are based on the infinite sites assumption, which states that each mutation appears only once in the evolutionary history. The Perfect Phylogeny model is the result of the infinite sites assumption and has been widely used to infer cancer evolution. Nonetheless, recent results show that recurrent and back mutations are present in the evolutionary history of tumors, hence the Perfect Phylogeny model might be too restrictive. We propose an approach that allows losing previously acquired mutations and multiple acquisitions of a character. Moreover, we provide an ILP formulation for the evolutionary tree reconstruction problem. Our formulation allows us to tackle both the Incomplete Directed Phylogeny problem and the Clonal Reconstruction problem when general evolutionary models are considered. The latter problem is fundamental in cancer genomics: the goal is to study the evolutionary history of a tumor, taking as input the fraction of cells having a certain mutation in a set of cancer samples. For the Clonal Reconstruction problem, an experimental analysis shows the advantage of allowing mutation losses. Namely, by analyzing real and simulated datasets, our ILP approach provides a better interpretation of the evolutionary history than a Perfect Phylogeny. The software is at https://github.com/AlgoLab/gppf.
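The restriction imposed by the Perfect Phylogeny model can be made concrete with the classical pairwise test on a complete binary matrix (rows are samples or cells, columns are mutations, 0 is the ancestral state). The sketch below is that standard test, not the ILP formulation of the paper: a rooted perfect phylogeny exists exactly when, for every pair of mutations, the sets of rows carrying them are disjoint or nested. The function name and the two example matrices are hypothetical.

```python
from itertools import combinations
from typing import List


def admits_perfect_phylogeny(matrix: List[List[int]]) -> bool:
    """Check the pairwise condition for a rooted perfect phylogeny (all-zero root)."""
    if not matrix:
        return True
    n_cols = len(matrix[0])
    # For each mutation, collect the rows in which it is present.
    carriers = [
        {r for r, row in enumerate(matrix) if row[c] == 1} for c in range(n_cols)
    ]
    for a, b in combinations(carriers, 2):
        # Overlapping but non-nested carrier sets form a conflict.
        if a & b and not (a <= b or b <= a):
            return False
    return True


# Mutation 1 occurs only in rows that already carry mutation 0: no conflict.
compatible = [[1, 0], [1, 1], [0, 0]]
# Rows (1,0), (0,1) and (1,1) all occur: explaining this needs a recurrent
# mutation or a mutation loss, which the infinite sites assumption forbids.
conflicting = [[1, 0], [0, 1], [1, 1]]

print(admits_perfect_phylogeny(compatible))
print(admits_perfect_phylogeny(conflicting))
```

Matrices that fail this test are the cases where a model allowing mutation losses or multiple acquisitions, as proposed in the abstract, becomes relevant.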

Subjects

Genomics/methods ; Neoplasms ; Software ; Algorithms ; Humans ; Mutation/genetics ; Neoplasms/classification ; Neoplasms/genetics ; Neoplasms/metabolism ; Phylogeny

FSG: Fast String Graph Construction for De Novo Assembly.

Bonizzoni, Paola; Vedova, Gianluca Della; Pirola, Yuri; Previtali, Marco; Rizzi, Raffaella.

J Comput Biol ; 24(10): 953-968, 2017 Oct.

Article in English | MEDLINE | ID: mdl-28715269

ABSTRACT

The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this article, we explore a novel approach to compute the string graph, based on the FM-index and the Burrows-Wheeler transform. We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the string graph assembler (SGA) as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times.
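The FM-index and Burrows-Wheeler transform on which this approach relies can be sketched on a single short string. This is a didactic illustration, not the FSG algorithm: it builds the transform by sorting all rotations (quadratic, unsuitable for real read collections) and counts pattern occurrences with backward search, recomputing ranks by scanning instead of using sampled tables. It assumes the text does not contain the `$` terminator; the function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations of text + '$'."""
    text += "$"  # terminator, assumed smaller than every other character
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string: str, pattern: str) -> int:
    """Count occurrences of pattern using only the transformed string."""
    # c_table[ch] = number of characters in the text strictly smaller than ch.
    c_table = {}
    total = 0
    for ch in sorted(set(bwt_string)):
        c_table[ch] = total
        total += bwt_string.count(ch)

    def rank(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_string[:i]; a real FM-index precomputes this.
        return bwt_string[:i].count(ch)

    lo, hi = 0, len(bwt_string)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return 0
        lo = c_table[ch] + rank(ch, lo)
        hi = c_table[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)
print(count_occurrences(transformed, "ana"))
```

The search never looks at the original text, only at the transform and its rank information, which mirrors the property stated in the abstract of working without accessing the input reads.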

Subjects

Algorithms ; Computational Biology/methods ; Genomics/methods ; High-Throughput Nucleotide Sequencing/methods ; Sequence Analysis, DNA/methods ; Genome, Human ; Humans

LSG: An External-Memory Tool to Compute String Graphs for Next-Generation Sequencing Data Assembly.

Bonizzoni, Paola; Vedova, Gianluca Della; Pirola, Yuri; Previtali, Marco; Rizzi, Raffaella.

J Comput Biol ; 23(3): 137-49, 2016 Mar.

Article in English | MEDLINE | ID: mdl-26953874

ABSTRACT

The large amount of short read data that has to be assembled in future applications, such as in metagenomics or cancer genomics, strongly motivates the investigation of disk-based approaches to index next-generation sequencing (NGS) data. Positive results in this direction stimulate the investigation of efficient external memory algorithms for de novo assembly from NGS data. Our article is also motivated by the open problem of designing a space-efficient algorithm to compute a string graph using an indexing procedure based on the Burrows-Wheeler transform (BWT). We have developed a disk-based algorithm for computing string graphs in external memory: the light string graph (LSG). LSG relies on a new representation of the FM-index that is exploited to use an amount of main memory that is independent of the size of the data set. Moreover, we have developed a pipeline for genome assembly from NGS data that integrates LSG with the assembly step of SGA (Simpson and Durbin, 2012), a state-of-the-art string graph-based assembler, and uses BEETL for indexing the input data. LSG is open source software and is available online. We have analyzed our implementation on an 875-million read whole-genome dataset, on which LSG has built the string graph using only 1 GB of main memory (reducing the memory occupation by a factor of 50 with respect to SGA), while requiring slightly more than twice the time required by SGA. The analysis of the entire pipeline shows an important decrease in memory usage, with only a moderate increase in the running time.

Subjects

Contig Mapping/methods ; High-Throughput Nucleotide Sequencing/methods ; Sequence Analysis, DNA/methods ; Software ; Genome, Human ; Humans

Modeling alternative splicing variants from RNA-Seq data with isoform graphs.

Beretta, Stefano; Bonizzoni, Paola; Vedova, Gianluca Della; Pirola, Yuri; Rizzi, Raffaella.

J Comput Biol ; 21(1): 16-40, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24200390

ABSTRACT

Next-generation sequencing (NGS) technologies need new methodologies for alternative splicing (AS) analysis. Current computational methods for AS analysis from NGS data are mainly based on aligning short reads against a reference genome, while methods that do not need a reference genome are mostly underdeveloped. In this context, the main developed tools for NGS data focus on de novo transcriptome assembly (Grabherr et al., 2011; Schulz et al., 2012). While these tools are extensively applied for biological investigations and often show intrinsic shortcomings from the obtained results, a theoretical investigation of the inherent computational limits of transcriptome analysis from NGS data, when a reference genome is unknown or highly unreliable, is still missing. On the other hand, we still lack methods for computing the gene structures due to AS events under the above assumptions, a problem that we start to tackle with this article. More precisely, based on the notion of isoform graph (Lacroix et al., 2008), we define a compact representation of gene structures, called the splicing graph, and investigate the computational problem of building a splicing graph that is (i) compatible with NGS data and (ii) isomorphic to the isoform graph. We characterize when there is only one representative splicing graph compatible with input data, and we propose an efficient algorithmic approach to compute this graph.

Subjects

Alternative Splicing ; Models, Genetic ; Algorithms ; Computational Biology ; Computer Graphics ; Databases, Nucleic Acid/statistics & numerical data ; Gene Expression Profiling/statistics & numerical data ; High-Throughput Nucleotide Sequencing/statistics & numerical data ; Polymorphism, Single Nucleotide ; Repetitive Sequences, Nucleic Acid ; Sequence Alignment/statistics & numerical data ; Sequence Analysis, RNA/statistics & numerical data
