Search | Research Portal of BVS Enfermagem

ViruSurf: an integrated database to investigate viral sequences.

Canakoglu, Arif; Pinoli, Pietro; Bernasconi, Anna; Alfonsi, Tommaso; Melidis, Damianos P; Ceri, Stefano.

Nucleic Acids Res ; 49(D1): D817-D824, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33045721

ABSTRACT

ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

Subjects

COVID-19/prevention & control , Computational Biology/methods , Data Curation/methods , Databases, Genetic , Genome, Viral/genetics , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/virology , Genetic Variation , Humans , Information Storage and Retrieval/methods , Internet , Pandemics , SARS-CoV-2/physiology , User-Computer Interface

VirusViz: comparative analysis and effective visualization of viral nucleotide and amino acid variants.

Bernasconi, Anna; Gulino, Andrea; Alfonsi, Tommaso; Canakoglu, Arif; Pinoli, Pietro; Sandionigi, Anna; Ceri, Stefano.

Nucleic Acids Res ; 49(15): e90, 2021 09 07.

Article in English | MEDLINE | ID: mdl-34107016

ABSTRACT

Variant visualization plays an important role in supporting the viral evolution analysis, extremely valuable during the COVID-19 pandemic. VirusViz is a web-based application for comparing variants of selected viral populations and their sub-populations; it is primarily focused on SARS-CoV-2 variants, although the tool also supports other viral species (SARS-CoV, MERS-CoV, Dengue, Ebola). As input, VirusViz imports results of queries extracting variants and metadata from the large database ViruSurf, which integrates information about most SARS-CoV-2 sequences publicly deposited worldwide. Moreover, VirusViz accepts sequences of new viral populations as multi-FASTA files plus corresponding metadata in CSV format; a bioinformatic pipeline builds a suitable input for VirusViz by extracting the nucleotide and amino acid variants. Pages of VirusViz provide metadata summarization, variant descriptions, and variant visualization with rich options for zooming, highlighting variants or regions of interest, and switching from nucleotides to amino acids; sequences can be grouped, groups can be comparatively analyzed. For SARS-CoV-2, we manually collect mutations with known or predicted levels of severity/virulence, as indicated in linked research articles; such critical mutations are reported when observed in sequences. The system includes light-weight project management for downloading, resuming, and merging data analysis sessions. VirusViz is freely available at http://gmql.eu/virusviz/.
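The input format mentioned in the abstract above (a multi-FASTA file plus metadata in CSV) can be illustrated with a minimal sketch. This is not the VirusViz pipeline: the function names, the `id` column, the convention that the sequence ID is the first token of each header, and the toy data are all assumptions made for illustration.

```python
import csv
import io


def parse_multi_fasta(text):
    """Return {sequence_id: sequence} from multi-FASTA text.

    Assumption: the sequence ID is the first whitespace-separated token
    of each '>' header line.
    """
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequence lines may wrap; collect them and join at the end.
            records[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def attach_metadata(sequences, csv_text, id_column="id"):
    """Pair each sequence with its CSV metadata row (hypothetical 'id' column)."""
    rows = {row[id_column]: row for row in csv.DictReader(io.StringIO(csv_text))}
    return {
        seq_id: {"sequence": seq, "metadata": rows.get(seq_id)}
        for seq_id, seq in sequences.items()
    }


# Toy data, invented for this example only.
FASTA_EXAMPLE = """>seq1 sample A
ACGT
ACGT
>seq2 sample B
TTGA
"""

CSV_EXAMPLE = """id,country,collection_date
seq1,Italy,2021-01-08
seq2,Brazil,2021-02-14
"""

population = attach_metadata(parse_multi_fasta(FASTA_EXAMPLE), CSV_EXAMPLE)
```

Sequences without a matching metadata row get `None` as metadata, so missing rows are visible rather than silently dropped.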

Subjects

COVID-19/virology , Data Visualization , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Amino Acid Sequence , Base Sequence , Databases, Factual , Humans , Knowledge Bases , SARS-CoV-2/classification , South Africa/epidemiology , United States/epidemiology

Genomic data integration and user-defined sample-set extraction for population variant analysis.

Alfonsi, Tommaso; Bernasconi, Anna; Canakoglu, Arif; Masseroli, Marco.

BMC Bioinformatics ; 23(1): 401, 2022 Sep 29.

Article in English | MEDLINE | ID: mdl-36175857

ABSTRACT

BACKGROUND: Population variant analysis is of great importance for gathering insights into the links between human genotype and phenotype. The 1000 Genomes Project established a valuable reference for human genetic variation; however, the integrative use of the corresponding data with other datasets within existing repositories and pipelines is not fully supported. Particularly, there is a pressing need for flexible and fast selection of population partitions based on their variant and metadata-related characteristics. RESULTS: Here, we target general germline or somatic mutation data sources for their seamless inclusion within an interoperable-format repository, supporting integration among them and with other genomic data, as well as their integrated use within bioinformatic workflows. In addition, we provide VarSum, a data summarization service working on sub-populations of interest selected using filters on population metadata and/or variant characteristics. The service is developed as an optimized computational framework with an Application Programming Interface (API) that can be called from within any existing computing pipeline or programming script. Provided example use cases of biological interest show the relevance, power and ease of use of the API functionalities. CONCLUSIONS: The proposed data integration pipeline and data set extraction and summarization API pave the way for solid computational infrastructures that quickly process cumbersome variation data, and allow biologists and bioinformaticians to easily perform scalable analysis on user-defined partitions of large cohorts from increasingly available genetic variation studies. With the current tendency to large (cross)nation-wide sequencing and variation initiatives, we expect an ever growing need for the kind of computational support hereby proposed.

Subjects

Genomics , Metadata , Computational Biology , Genotype , Humans , Software

Data-driven recombination detection in viral genomes.

Alfonsi, Tommaso; Bernasconi, Anna; Chiara, Matteo; Ceri, Stefano.

Nat Commun ; 15(1): 3313, 2024 Apr 17.

Article in English | MEDLINE | ID: mdl-38632281

ABSTRACT

Recombination is a key molecular mechanism for the evolution and adaptation of viruses. The first recombinant SARS-CoV-2 genomes were recognized in 2021; as of today, more than ninety SARS-CoV-2 lineages are designated as recombinant. In the wake of the COVID-19 pandemic, several methods for detecting recombination in SARS-CoV-2 have been proposed; however, none could faithfully confirm manual analyses by experts in the field. We hereby present RecombinHunt, an original data-driven method for the identification of recombinant genomes, capable of recognizing recombinant SARS-CoV-2 genomes (or lineages) with one or two breakpoints with high accuracy and within reduced turn-around times. RecombinHunt shows high specificity and sensitivity, compares favorably with other state-of-the-art methods, and faithfully confirms manual analyses by experts. RecombinHunt identifies recombinant viral genomes from the recent monkeypox epidemic in high concordance with manually curated analyses by experts, suggesting that our approach is robust and can be applied to any epidemic/pandemic virus.

Subjects

COVID-19 , Pandemics , Humans , SARS-CoV-2 , Genome, Viral , Recombination, Genetic , Phylogeny

CoV2K model, a comprehensive representation of SARS-CoV-2 knowledge and data interplay.

Alfonsi, Tommaso; Al Khalaf, Ruba; Ceri, Stefano; Bernasconi, Anna.

Sci Data ; 9(1): 260, 2022 06 01.

Article in English | MEDLINE | ID: mdl-35650205

ABSTRACT

Since the outbreak of the COVID-19 pandemic, many research organizations have studied the genome of the SARS-CoV-2 virus; a body of public resources has been published for monitoring its evolution. While we experience an unprecedented richness of information in this domain, we also ascertained the presence of several information quality issues. We hereby propose CoV2K, an abstract model for explaining SARS-CoV-2-related concepts and interactions, focusing on viral mutations, their co-occurrence within variants, and their effects. CoV2K provides a clear and concise route map for understanding different connected types of information related to the virus; it thus drives a process of data and knowledge integration that aggregates information from several current resources, harmonizing their content and overcoming incompleteness and inconsistency issues. CoV2K is available for exploration as a graph that can be queried through a RESTful API addressing single entities or paths through their relationships. Practical use cases demonstrate its application to current knowledge inquiries.

Subjects

COVID-19 , Models, Biological , SARS-CoV-2 , Datasets as Topic , Humans , Mutation , Pandemics

High Performance Integration Pipeline for Viral and Epitope Sequences.

Alfonsi, Tommaso; Pinoli, Pietro; Canakoglu, Arif.

BioTech (Basel) ; 11(1), 2022 Mar 21.

Article in English | MEDLINE | ID: mdl-35822815

ABSTRACT

With the spread of COVID-19, sequencing laboratories started to share hundreds of sequences daily. However, the lack of a commonly agreed standard across deposition databases hindered the exploration and study of all the viral sequences collected worldwide in a practical and homogeneous way. During the first months of the pandemic, we developed an automatic procedure to collect, transform, and integrate viral sequences of SARS-CoV-2, MERS, SARS-CoV, Ebola, and Dengue from four major database institutions (NCBI, COG-UK, GISAID, and NMDC). This data pipeline allowed the creation of the data exploration interfaces VirusViz and EpiSurf, as well as ViruSurf, one of the largest databases of integrated viral sequences. Almost two years after the first release of the repository, the original pipeline underwent a thorough refinement process and became more efficient, scalable, and general (currently, it also includes epitopes from the IEDB). Thanks to these improvements, we constantly update and expand our integrated repository, encompassing about 9.1 million SARS-CoV-2 sequences at present (March 2022). This pipeline made it possible to design and develop fundamental resources for any researcher interested in understanding the biological mechanisms behind the viral infection. In addition, it plays a crucial role in many analytic and visualization tools, such as ViruSurf, EpiSurf, VirusViz, and VirusLab.

EpiSurf: metadata-driven search server for analyzing amino acid changes within epitopes of SARS-CoV-2 and other viral species.

Bernasconi, Anna; Cilibrasi, Luca; Al Khalaf, Ruba; Alfonsi, Tommaso; Ceri, Stefano; Pinoli, Pietro; Canakoglu, Arif.

Database (Oxford) ; 2021, 2021 09 29.

Article in English | MEDLINE | ID: mdl-34585726

ABSTRACT

EpiSurf is a Web application for selecting viral populations of interest and then analyzing how their amino acid changes are distributed along epitopes. Viral sequences are searched within ViruSurf, which stores curated metadata and amino acid changes imported from the most widely used deposition sources for viral databases (GenBank, COVID-19 Genomics UK (COG-UK) and Global initiative on sharing all influenza data (GISAID)). Epitopes are searched within the open source Immune Epitope Database or directly proposed by users by indicating their start and stop positions in the context of a given viral protein. Amino acid changes of selected populations are joined with epitopes of interest; a result table summarizes, for each epitope, statistics about the overlapping amino acid changes and about the sequences carrying such alterations. The results may also be inspected by the VirusViz Web application; epitope regions are highlighted within the given viral protein, and changes can be comparatively inspected. For sequences mutated within the epitope, we also offer a complete view of the distribution of amino acid changes, optionally grouped by the location, collection date or lineage. Thanks to these functionalities, EpiSurf supports the user-friendly testing of epitope conservancy within selected populations of interest, which can be of utmost relevance for designing vaccines, drugs or serological assays. EpiSurf is available at two endpoints. Database URL: http://gmql.eu/episurf/ (for searching GenBank and COG-UK sequences) and http://gmql.eu/episurf_gisaid/ (for GISAID sequences).
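The join described in the abstract above, where amino acid changes are matched against epitopes by start and stop position and then summarized per epitope, can be sketched as follows. This is only an illustration under stated assumptions, not the EpiSurf implementation: the tuple layouts, the function name `summarize_epitopes`, the epitope name and range, and the toy records are all hypothetical, and positions are assumed to be 1-based and inclusive.

```python
def summarize_epitopes(epitopes, changes):
    """Count, per epitope, the overlapping changes and the sequences carrying them.

    epitopes: list of (name, protein, start, stop), 1-based inclusive (assumed).
    changes:  list of (sequence_id, protein, position, ref_aa, alt_aa).
    """
    summary = {}
    for name, protein, start, stop in epitopes:
        # A change overlaps the epitope if it lies on the same protein
        # and its position falls within [start, stop].
        hits = [
            c for c in changes
            if c[1] == protein and start <= c[2] <= stop
        ]
        summary[name] = {
            "distinct_changes": len({(c[2], c[3], c[4]) for c in hits}),
            "sequences": len({c[0] for c in hits}),
        }
    return summary


# Toy data, invented for this example only.
EPITOPES = [("epitope_A", "Spike", 480, 505)]
CHANGES = [
    ("s1", "Spike", 484, "E", "K"),
    ("s2", "Spike", 484, "E", "K"),
    ("s2", "Spike", 501, "N", "Y"),
    ("s3", "Spike", 614, "D", "G"),  # outside the epitope range
    ("s3", "N", 203, "R", "K"),      # different protein
]

result = summarize_epitopes(EPITOPES, CHANGES)
```

Counting distinct changes separately from distinct sequences mirrors the two kinds of statistics the abstract mentions: one about the changes themselves and one about the sequences carrying them.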

Subjects

Amino Acid Substitution , Antigens, Viral/chemistry , Epitopes/chemistry , Internet , Metadata , SARS-CoV-2/chemistry , Search Engine , Software , Amino Acids/chemistry , Amino Acids/immunology , Antigens, Viral/immunology , COVID-19/virology , Epitopes/immunology , Humans , SARS-CoV-2/immunology
