Search | BVS Regional Portal

1.

PanPA: generation and alignment of panproteome graphs.

Dabbaghie, Fawaz; Srikakulam, Sanjay K; Marschall, Tobias; Kalinina, Olga V.

Bioinform Adv ; 3(1): vbad167, 2023.

Article in English | MEDLINE | ID: mdl-38145107

ABSTRACT

Motivation: Compared to eukaryotes, prokaryotic genomes are more diverse owing to several mechanisms, including a higher mutation rate and horizontal gene transfer. Therefore, using a linear representative reference can cause a reference bias. Graph-based pangenome methods have been developed to tackle this problem. However, comparisons in DNA space are still challenging due to this high diversity. In contrast, amino acid sequences have higher similarity due to evolutionary constraints, whereby a single amino acid may be encoded by several synonymous codons. Coding regions cover the majority of the genome in prokaryotes. Thus, panproteomes present an attractive alternative, leveraging the higher sequence similarity while not losing much of the genome in non-coding regions. Results: We present PanPA, a method that takes a set of multiple sequence alignments of protein sequences, indexes them, and builds a graph for each multiple sequence alignment. In the querying step, it can align DNA or amino acid sequences back to these graphs. We first show that PanPA generates correct alignments on a panproteome from 1350 Escherichia coli. To demonstrate that panproteomes allow comparisons at longer phylogenetic distances, we compare DNA and protein alignments from 1073 Salmonella enterica assemblies against the E. coli reference genome, pangenome, and panproteome using BWA, GraphAligner, and PanPA, respectively, with PanPA aligning around 22% more sequences. We also aligned a DNA short-read whole-genome sequencing (WGS) sample from S. enterica against the E. coli reference with BWA and against the panproteome with PanPA; PanPA found alignments for 68% of the reads, compared to 5% with BWA. Availability and implementation: PanPA is available at https://github.com/fawaz-dabbaghieh/PanPA.
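The argument that protein sequences stay more similar than DNA because of synonymous codons can be checked with a small calculation. The Python sketch below is an illustration only, not part of PanPA: it translates two made-up coding sequences with the standard genetic code and compares identity at the DNA and protein levels. The sequences and the helpers `translate` and `identity` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with
# bases ordered T, C, A, G and the first codon position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical coding sequences that differ only at synonymous positions.
dna_1 = "ATGGCTCTGAAA"
dna_2 = "ATGGCCTTAAAG"

print(translate(dna_1), translate(dna_2))            # MALK MALK
print(round(identity(dna_1, dna_2), 2))              # 0.67
print(identity(translate(dna_1), translate(dna_2)))  # 1.0
```

Four of the twelve nucleotides differ, yet both sequences encode the same peptide, which is the kind of higher protein-level similarity a panproteome is meant to leverage.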

2.

MetaProFi: an ultrafast chunked Bloom filter for storing and querying protein and nucleotide sequence data for accurate identification of functionally relevant genetic variants.

Srikakulam, Sanjay K; Keller, Sebastian; Dabbaghie, Fawaz; Bals, Robert; Kalinina, Olga V.

Bioinformatics ; 39(3), 2023 03 01.

Article in English | MEDLINE | ID: mdl-36825843

ABSTRACT

MOTIVATION: Bloom filters are a popular data structure that allows rapid searches in large sequence datasets. So far, all tools work with nucleotide sequences; however, protein sequences are conserved over longer evolutionary distances, and only mutations on the protein level may have any functional significance. RESULTS: We present MetaProFi, a Bloom filter-based tool that, for the first time, offers the functionality to build indexes of amino acid sequences and query them with both amino acid and nucleotide sequences, thus bringing sequence comparison to the biologically relevant protein level. MetaProFi implements additional efficient engineering solutions, such as a shared memory system, chunked data storage and efficient compression. In addition to its conceptual novelty, MetaProFi demonstrates state-of-the-art performance and excellent memory consumption-to-speed ratio when applied to various large datasets. AVAILABILITY AND IMPLEMENTATION: Source code in Python is available at https://github.com/kalininalab/metaprofi.

Subject(s)

Algorithms , Data Compression , Base Sequence , Software , Proteins
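To make the idea of a protein-level Bloom filter concrete, here is a minimal Python sketch, assuming a single plain (unchunked) filter over amino acid k-mers, hash positions derived from salted SHA-256, and nucleotide queries translated in the three forward reading frames only. The class `ProteinBloomFilter`, the k-mer length, the filter size, and the example peptide are hypothetical; MetaProFi's chunked storage, shared memory system and compression are not modeled.

```python
import hashlib
from itertools import product

# Standard genetic code (NCBI table 1), used to translate nucleotide queries.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate the full codons of a DNA string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def kmers(seq: str, k: int) -> list:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class ProteinBloomFilter:
    """Bit array plus several hash functions: false positives possible, false negatives not."""

    def __init__(self, num_bits: int, num_hashes: int, k: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.k = k
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def _has_kmer(self, kmer: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(kmer))

    def add_protein(self, protein: str) -> None:
        for kmer in kmers(protein, self.k):
            for p in self._positions(kmer):
                self.bits[p // 8] |= 1 << (p % 8)

    def query_protein(self, protein: str) -> float:
        """Fraction of the query's amino acid k-mers reported as present."""
        query = kmers(protein, self.k)
        if not query:
            return 0.0
        return sum(self._has_kmer(km) for km in query) / len(query)

    def query_dna(self, dna: str) -> float:
        """Translate the three forward frames and keep the best-scoring one."""
        dna = dna.upper()
        return max(self.query_protein(translate(dna[frame:])) for frame in range(3))


bf = ProteinBloomFilter(num_bits=10_000, num_hashes=3, k=5)
bf.add_protein("MALKWVTFISLLFLFSSAYS")  # hypothetical indexed protein

# Back-translate 10 residues using the first listed codon for each amino acid;
# the leading "G" shifts the coding sequence into the second reading frame.
first_codon = {}
for codon, aa in CODON_TABLE.items():
    first_codon.setdefault(aa, codon)
dna_query = "G" + "".join(first_codon[aa] for aa in "MALKWVTFIS")

print(bf.query_protein("WVTFISLL"))  # 1.0
print(bf.query_dna(dna_query))       # 1.0
```

Because a Bloom filter never produces false negatives, every indexed k-mer is reported as present, whether the query arrives as protein or as DNA in any forward frame; the filter size and number of hash functions trade memory for the false-positive rate.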

3.

The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms.

Walker, Kimberly; Kalra, Divya; Lowdon, Rebecca; Chen, Guangyi; Molik, David; Soto, Daniela C; Dabbaghie, Fawaz; Khleifat, Ahmad Al; Mahmoud, Medhat; Paulin, Luis F; Raza, Muhammad Sohail; Pfeifer, Susanne P; Agustinho, Daniel Paiva; Aliyev, Elbay; Avdeyev, Pavel; Barrozo, Enrico R; Behera, Sairam; Billingsley, Kimberley; Chong, Li Chuin; Choubey, Deepak; De Coster, Wouter; Fu, Yilei; Gener, Alejandro R; Hefferon, Timothy; Henke, David Morgan; Höps, Wolfram; Illarionova, Anastasia; Jochum, Michael D; Jose, Maria; Kesharwani, Rupesh K; Kolora, Sree Rohit Raj; Kubica, Jedrzej; Lakra, Priya; Lattimer, Damaris; Liew, Chia-Sin; Lo, Bai-Wei; Lo, Chunhsuan; Lötter, Anneri; Majidian, Sina; Mendem, Suresh Kumar; Mondal, Rajarshi; Ohmiya, Hiroko; Parvin, Nasrin; Peralta, Carolina; Poon, Chi-Lam; Prabhakaran, Ramanandan; Saitou, Marie; Sammi, Aditi; Sanio, Philippe; Sapoval, Nicolae.

F1000Res ; 11: 530, 2022.

Article in English | MEDLINE | ID: mdl-36262335

ABSTRACT

In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNAnexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Genomics , Software

4.

BubbleGun: enumerating bubbles and superbubbles in genome graphs.

Dabbaghie, Fawaz; Ebler, Jana; Marschall, Tobias.

Bioinformatics ; 38(17): 4217-4219, 2022 09 02.

Article in English | MEDLINE | ID: mdl-35799353

ABSTRACT

MOTIVATION: With the fast development of sequencing technology, accurate de novo genome assembly is now possible even for larger genomes. Graph-based representations of genomes arise both as part of the assembly process and in the context of pangenomes representing a population. In both cases, polymorphic loci lead to bubble structures in such graphs. Detecting bubbles is hence an important task when working with genomic variants in the context of genome graphs. RESULTS: Here, we present a fast general-purpose tool, called BubbleGun, for detecting bubbles and superbubbles in genome graphs. Furthermore, BubbleGun detects and outputs runs of linearly connected bubbles and superbubbles, which we call bubble chains. We showcase its utility on de Bruijn graphs and compare our results to vg's snarl detection. We show that BubbleGun is considerably faster than vg, especially on larger graphs, where it reports all bubbles in less than 30 min on a human sample de Bruijn graph of around 2 million nodes. AVAILABILITY AND IMPLEMENTATION: BubbleGun is available and documented as a Python3 package at https://github.com/fawaz-dabbaghieh/bubble_gun under MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Software , Humans , Sequence Analysis, DNA/methods , Genome , Genomics/methods
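For readers unfamiliar with superbubbles, the Python sketch below applies the superbubble detection algorithm of Onodera et al. (2013) to a small directed graph, then groups consecutive superbubbles into chains. It rests on simplifying assumptions: the graph is a plain directed adjacency dict, trivial single-edge "bubbles" are skipped by only trying entrances with at least two children, and all function names are hypothetical. BubbleGun itself works on bidirected genome graphs, and its actual implementation may differ.

```python
from collections import defaultdict


def parents_of(graph):
    """Invert a child-adjacency dict into a parent-adjacency dict."""
    parents = defaultdict(list)
    for v, children in graph.items():
        for u in children:
            parents[u].append(v)
    return parents


def find_superbubble(graph, parents, s):
    """Return the exit t of the superbubble entered at s, or None if none exists."""
    stack, visited, seen = [s], set(), set()
    while stack:
        v = stack.pop()
        visited.add(v)
        seen.discard(v)
        children = graph.get(v, [])
        if not children:
            return None  # reached a tip
        for u in children:
            if u == s:
                return None  # cycle back to the entrance
            seen.add(u)
            # Only expand u once every one of its parents has been visited.
            if all(p in visited for p in parents[u]):
                stack.append(u)
        if len(stack) == 1 and len(seen) == 1:
            t = stack.pop()
            # The exit must not have an edge back to the entrance.
            return t if s not in graph.get(t, []) else None
    return None


def find_superbubbles(graph):
    parents = parents_of(graph)
    bubbles = []
    for s, children in graph.items():
        if len(children) < 2:
            continue  # simplification: skip trivial single-edge "bubbles"
        t = find_superbubble(graph, parents, s)
        if t is not None:
            bubbles.append((s, t))
    return bubbles


def bubble_chains(bubbles):
    """Link bubbles whose exit is the entrance of the next one."""
    by_entrance = {s: (s, t) for s, t in bubbles}
    exits = {t for _, t in bubbles}
    chains = []
    for s, t in bubbles:
        if s in exits:
            continue  # this bubble continues a chain that starts elsewhere
        chain = [(s, t)]
        while chain[-1][1] in by_entrance and by_entrance[chain[-1][1]] not in chain:
            chain.append(by_entrance[chain[-1][1]])
        chains.append(chain)
    return chains


# Hypothetical graph with two SNP-like bubbles in a row: 1 -> {2,3} -> 4 -> {5,6} -> 7 -> 8.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: [8]}
bubbles = find_superbubbles(graph)
print(bubbles)                 # [(1, 4), (4, 7)]
print(bubble_chains(bubbles))  # [[(1, 4), (4, 7)]]
```

The two bubbles share node 4, so they form a single bubble chain, which corresponds to the runs of linearly connected bubbles described in the abstract.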

5.

An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates.

Mc Cartney, Ann M; Mahmoud, Medhat; Jochum, Michael; Agustinho, Daniel Paiva; Zorman, Barry; Al Khleifat, Ahmad; Dabbaghie, Fawaz; K Kesharwani, Rupesh; Smolka, Moritz; Dawood, Moez; Albin, Dreycey; Aliyev, Elbay; Almabrazi, Hakeem; Arslan, Ahmed; Balaji, Advait; Behera, Sairam; Billingsley, Kimberley; L Cameron, Daniel; Daw, Joyjit; T Dawson, Eric; De Coster, Wouter; Du, Haowei; Dunn, Christopher; Esteban, Rocio; Jolly, Angad; Kalra, Divya; Liao, Chunxiao; Liu, Yunxi; Lu, Tsung-Yu; M Havrilla, James; M Khayat, Michael; Marin, Maximillian; Monlong, Jean; Price, Stephen; Rafael Gener, Alejandro; Ren, Jingwen; Sagayaradj, Sagayamary; Sapoval, Nicolae; Sinner, Claude; C Soto, Daniela; Soylev, Arda; Subramaniyan, Arun; Syed, Najeeb; Tadimeti, Neha; Tater, Pamella; Vats, Pankaj; Vaughn, Justin; Walker, Kimberly; Wang, Gaojianyong; Zeng, Qiandong.

F1000Res ; 10: 246, 2021.

Article in English | MEDLINE | ID: mdl-34621504

ABSTRACT

In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pangenomes, and SARS-CoV-2 research. The overarching goal was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variation among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation and filtering, together with advancements in benchmarking existing methods. In addition, some groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.

Subject(s)

COVID-19 , SARS-CoV-2 , Animals , Genome, Viral , Humans , Vertebrates
