Search | BVS Violência e Saúde

Federated sharing and processing of genomic datasets for tertiary data analysis.

Canakoglu, Arif; Pinoli, Pietro; Gulino, Andrea; Nanni, Luca; Masseroli, Marco; Ceri, Stefano.

Brief Bioinform ; 22(3), 2021 05 20.

Article in English | MEDLINE | ID: mdl-34020536

ABSTRACT

MOTIVATION: With the spread of biological and clinical uses of next-generation sequencing (NGS) data, many laboratories and health organizations face the need to share NGS data resources and to easily access and process comprehensively shared genomic data; in most cases, primary and secondary data management of NGS data is done at sequencing stations, and sharing applies to processed data. Based on the previous single-instance GMQL system architecture, here we review the model, language and architectural extensions that open the centralized GMQL system to federated computing. RESULTS: We describe an extension of a centralized system architecture that supports federated data sharing and query processing. Data are federated through simple data sharing instructions. Queries are assigned to execution nodes; they are translated into an intermediate representation, whose computation drives the distribution of data and processing. The approach allows writing federated applications according to classical styles: centralized, distributed or externalized. AVAILABILITY: The federated genomic data management system is freely available for non-commercial use as an open source project at http://www.bioinformatics.deib.polimi.it/FederatedGMQLsystem/. CONTACT: {arif.canakoglu, pietro.pinoli}@polimi.it.

Subjects

Datasets as Topic , Genomics , Information Dissemination , High-Throughput Nucleotide Sequencing/methods , Humans , Programming Languages

VirusViz: comparative analysis and effective visualization of viral nucleotide and amino acid variants.

Bernasconi, Anna; Gulino, Andrea; Alfonsi, Tommaso; Canakoglu, Arif; Pinoli, Pietro; Sandionigi, Anna; Ceri, Stefano.

Nucleic Acids Res ; 49(15): e90, 2021 09 07.

Article in English | MEDLINE | ID: mdl-34107016

ABSTRACT

Variant visualization plays an important role in supporting viral evolution analysis, which has been extremely valuable during the COVID-19 pandemic. VirusViz is a web-based application for comparing variants of selected viral populations and their sub-populations; it is primarily focused on SARS-CoV-2 variants, although the tool also supports other viral species (SARS-CoV, MERS-CoV, Dengue, Ebola). As input, VirusViz imports results of queries extracting variants and metadata from the large database ViruSurf, which integrates information about most SARS-CoV-2 sequences publicly deposited worldwide. Moreover, VirusViz accepts sequences of new viral populations as multi-FASTA files plus corresponding metadata in CSV format; a bioinformatic pipeline builds a suitable input for VirusViz by extracting the nucleotide and amino acid variants. Pages of VirusViz provide metadata summarization, variant descriptions, and variant visualization with rich options for zooming, highlighting variants or regions of interest, and switching from nucleotides to amino acids; sequences can be grouped, and groups can be comparatively analyzed. For SARS-CoV-2, we manually collect mutations with known or predicted levels of severity/virulence, as indicated in linked research articles; such critical mutations are reported when observed in sequences. The system includes lightweight project management for downloading, resuming, and merging data analysis sessions. VirusViz is freely available at http://gmql.eu/virusviz/.

Subjects

COVID-19/virology , Data Visualization , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Amino Acid Sequence , Base Sequence , Databases, Factual , Humans , Knowledge Bases , SARS-CoV-2/classification , South Africa/epidemiology , United States/epidemiology
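
The VirusViz abstract above mentions a pipeline that takes multi-FASTA input and extracts nucleotide variants. The following is only a minimal sketch of that general idea, not the VirusViz pipeline itself: it assumes the sequences are already aligned to the reference and have the same length, it reports substitutions only (no insertions or deletions), and the function names `parse_multi_fasta` and `substitutions` are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_multi_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into a {record id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id;
            # fall back to a positional id if the header is empty.
            parts = line[1:].split()
            current_id = parts[0] if parts else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines of one record may be wrapped over several lines.
            records[current_id] += line.upper()
    return records


def substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) per mismatch.

    Assumption: both sequences are pre-aligned and of equal length.
    Positions holding 'N' or a gap ('-') on either side are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s and r not in "N-" and s not in "N-"
    ]


if __name__ == "__main__":
    fasta = ">ref\nACGT\nACGT\n>s1 example sample\nACGA\nNCGT\n"
    seqs = parse_multi_fasta(fasta)
    print(substitutions(seqs["ref"], seqs["s1"]))
```

Amino acid variants would additionally require translating each coding region, which this sketch leaves out.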

Processing of big heterogeneous genomic datasets for tertiary analysis of Next Generation Sequencing data.

Masseroli, Marco; Canakoglu, Arif; Pinoli, Pietro; Kaitoua, Abdulrahman; Gulino, Andrea; Horlova, Olha; Nanni, Luca; Bernasconi, Anna; Perna, Stefano; Stamoulakatou, Eirini; Ceri, Stefano.

Bioinformatics ; 35(5): 729-736, 2019 03 01.

Article in English | MEDLINE | ID: mdl-30101316

ABSTRACT

MOTIVATION: We previously proposed a paradigm shift in genomic data management, based on the Genomic Data Model (GDM) for mediating existing data formats and on the GenoMetric Query Language (GMQL) for supporting, at a high level of abstraction, data extraction and the most common data-driven computations required by tertiary data analysis of Next Generation Sequencing datasets. Here, we present a new GMQL-based system with enhanced accessibility, portability, scalability and performance. RESULTS: The new system has a modular architecture featuring: (i) an intermediate representation supporting many different implementations (including Spark, Flink and SciDB); (ii) a high-level technology-independent repository abstraction, supporting different repository technologies (e.g., local file system, Hadoop File System, database or others); (iii) several system interfaces, including a user-friendly Web-based interface, a Web Service interface, and a programmatic interface for the Python language. Biological use case examples, using public ENCODE, Roadmap Epigenomics and TCGA datasets, demonstrate the relevance of our work. AVAILABILITY AND IMPLEMENTATION: The GMQL system is freely available for non-commercial use as an open source project at: http://www.bioinformatics.deib.polimi.it/GMQLsystem/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

High-Throughput Nucleotide Sequencing , Software , Epigenomics , Genome , Genomics

MutViz 2.0: visual analysis of somatic mutations and the impact of mutational signatures on selected genomic regions.

Gulino, Andrea; Stamoulakatou, Eirini; Piro, Rosario M.

NAR Cancer ; 3(2): zcab012, 2021 Jun.

Article in English | MEDLINE | ID: mdl-34316703

ABSTRACT

Patterns of somatic single nucleotide variants observed in human cancers vary widely between different tumor types. They depend not only on the activity of diverse mutational processes, such as exposure to ultraviolet light and the deamination of methylated cytosines, but also, to a large extent, on the sequence content of the genomic regions on which these processes act. With MutViz (http://gmql.eu/mutviz/), we have presented a user-friendly web tool for the identification of mutation enrichments that offers preloaded mutations from public datasets for a variety of cancer types, organized within a dedicated database architecture. Somatic mutation patterns can be visually and statistically analyzed within arbitrary sets of small, user-provided genomic regions, such as promoters or collections of transcription factor binding sites. Here, we present MutViz 2.0, a largely extended and consolidated version of the tool: we took into account the immediate (trinucleotide) sequence context of mutations, improved the representation of clinical annotation of tumor samples and devised a method for signature refitting on limited genomic regions to infer the contribution of individual mutational processes to the mutation patterns observed in these regions. We describe both the features of MutViz 2.0, concentrating on the novelties, and the substantial re-engineering of the cloud-based architecture.
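
The "immediate (trinucleotide) sequence context" mentioned in the MutViz 2.0 abstract can be illustrated with a short sketch. This is not MutViz code: it assumes the widely used convention of reporting each substitution relative to the pyrimidine (C or T) strand, takes 0-based positions on a plain reference string containing only A, C, G and T, and uses hypothetical function names. Signature refitting itself is not shown.

```python
from collections import Counter
from typing import Iterable, Tuple

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trinucleotide_context(reference: str, pos: int, alt: str) -> str:
    """Describe a substitution at 0-based `pos` as e.g. 'A[C>T]G'.

    If the reference base is a purine (A or G), the context and the
    alternative base are flipped to the opposite strand, so the mutated
    base is always reported as C or T.
    """
    if pos < 1 or pos > len(reference) - 2:
        raise ValueError("mutation needs one flanking base on each side")
    tri = reference[pos - 1:pos + 2].upper()
    alt = alt.upper()
    if tri[1] in "AG":
        tri = reverse_complement(tri)
        alt = alt.translate(COMPLEMENT)
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"


def context_counts(reference: str, mutations: Iterable[Tuple[int, str]]) -> Counter:
    """Count mutations per trinucleotide context."""
    return Counter(trinucleotide_context(reference, p, a) for p, a in mutations)


if __name__ == "__main__":
    ref = "TACGGA"
    # One C>T at position 2 and one G>A at position 3 (both 0-based).
    print(context_counts(ref, [(2, "T"), (3, "A")]))
```

Counts of this kind, taken over all 96 possible contexts, form the mutation profile that signature refitting methods typically take as input.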
