Results 1 - 20 of 2,110
1.
Mol Cell ; 73(1): 130-142.e5, 2019 01 03.
Article in English | MEDLINE | ID: mdl-30472192

ABSTRACT

Since its establishment in 2009, single-cell RNA sequencing (RNA-seq) has been a major driver behind progress in biomedical research. In developmental biology and stem cell studies, the ability to profile single cells confers particular benefits. Although most studies still focus on individual tissues or organs, the recent development of ultra-high-throughput single-cell RNA-seq has demonstrated potential power in characterizing more complex systems or even the entire body. However, although multiple ultra-high-throughput single-cell RNA-seq systems have attracted attention, no systematic comparison of these systems has been performed. Here, with the same cell line and bioinformatics pipeline, we developed directly comparable datasets for each of three widely used droplet-based ultra-high-throughput single-cell RNA-seq systems, inDrop, Drop-seq, and 10X Genomics Chromium. Although each system is capable of profiling single-cell transcriptomes, their detailed comparison revealed the distinguishing features and suitable applications for each system.


Subject(s)
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing , Microfluidic Analytical Techniques , RNA/genetics , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Transcriptome , Laboratory Automation , Base Sequence , Cell Line , Computational Biology , Cost-Benefit Analysis , Taxonomic DNA Barcoding , Gene Expression Profiling/economics , High-Throughput Nucleotide Sequencing/economics , Humans , Microfluidic Analytical Techniques/economics , Reproducibility of Results , RNA Sequence Analysis/economics , Single-Cell Analysis/economics , Workflow
2.
J Cell Sci ; 137(4), 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38264939

ABSTRACT

Filopodia are slender, actin-filled membrane projections used by various cell types for environment exploration. Analyzing filopodia often involves visualizing them using actin, filopodia tip or membrane markers. Due to the diversity of cell types that extend filopodia, from amoeboid to mammalian, it can be challenging for some to find a reliable filopodia analysis workflow suited for their cell type and preferred visualization method. The lack of an automated workflow capable of analyzing amoeboid filopodia with only a filopodia tip label prompted the development of filoVision. filoVision is an adaptable deep learning platform featuring the tools filoTips and filoSkeleton. filoTips labels filopodia tips and the cytosol using a single tip marker, allowing information extraction without actin or membrane markers. In contrast, filoSkeleton combines tip marker signals with actin labeling for a more comprehensive analysis of filopodia shafts in addition to tip protein analysis. The ZeroCostDL4Mic deep learning framework facilitates accessibility and customization for different datasets and cell types, making filoVision a flexible tool for automated analysis of tip-marked filopodia across various cell types and user data.


Subject(s)
Actins , Deep Learning , Animals , Actins/metabolism , Pseudopodia/metabolism , Mammals/metabolism
3.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493344

ABSTRACT

Venomous organisms have independently evolved the ability to produce toxins 101 times during their evolutionary history, resulting in over 200 000 venomous species. Collectively, these species produce millions of toxins, making them a valuable resource for bioprospecting and understanding the evolutionary mechanisms underlying genetic diversification. RNA-seq is the preferred method for characterizing toxin repertoires, but the analysis of the resulting data remains challenging. While early approaches relied on similarity-based mapping to known toxin databases, recent studies have highlighted the importance of structural features for toxin detection. The few existing pipelines do not integrate these complementary approaches, and tend to be difficult to run for inexperienced users. To address these issues, we developed DeTox, a comprehensive and user-friendly tool for toxin research. It combines fast execution, parallelization and customization of parameters. DeTox was tested on published transcriptomes from gastropod mollusks, cnidarians and snakes, retrieving most putative toxins from the original articles and identifying additional peptides as potential toxins to be confirmed through manual annotation and eventually proteomic analysis. By integrating a structure-based search with similarity-based approaches, DeTox allows the comprehensive characterization of toxin repertoires in poorly known taxa. The effect of the taxonomic bias in existing databases is minimized in DeTox, as mirrored in the detection of unique and divergent toxins that would have been overlooked by similarity-based methods. DeTox streamlines toxin annotation, providing a valuable tool for efficient identification of venom components that will enhance venom research in neglected taxa.


Subject(s)
Biological Toxins , Venoms , Animals , Venoms/genetics , Venoms/chemistry , Proteomics , Biological Toxins/genetics , Snakes , Peptides , Transcriptome
4.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38349061

ABSTRACT

Extrachromosomal circular DNA (eccDNA) is currently attracting considerable attention from researchers due to its significant impact on tumor biogenesis. High-throughput sequencing (HTS) methods for eccDNA identification are continually evolving. However, an efficient pipeline for the integrative and comprehensive analysis of eccDNA obtained from HTS data is still lacking. Here, we introduce eccDNA-pipe, an accessible software package that offers a user-friendly pipeline for conducting eccDNA analysis starting from raw sequencing data. It accepts data from various sequencing techniques, such as whole-genome sequencing (WGS), Circle-seq and Circulome-seq, obtained through short-read or long-read sequencing. eccDNA-pipe presents a comprehensive solution for both upstream and downstream analysis, encompassing quality control and eccDNA identification in upstream analysis and downstream tasks such as eccDNA length distribution analysis, differential analysis of genes enriched with eccDNA and visualization of eccDNA structures. Notably, eccDNA-pipe automatically generates high-quality publication-ready plots. In summary, eccDNA-pipe provides a comprehensive and user-friendly pipeline for customized eccDNA analysis.


Subject(s)
Circular DNA , Neoplasms , Humans , Circular DNA/genetics , DNA/genetics , High-Throughput Nucleotide Sequencing , Whole Genome Sequencing
5.
Mol Biol Evol ; 41(3), 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38427787

ABSTRACT

Advancements in next-generation sequencing (NGS) technologies have led to a substantial increase in the availability of population genetic variant data, thus prompting the development of various population analysis tools to enhance our understanding of population structure and evolution. The tools that are currently used to analyze population genetic variant data generally require different environments, parameters, and formats of the input data, which can act as a barrier preventing the widespread usage of such tools by general researchers who may not be familiar with bioinformatics. To address this problem, we have developed an automated and comprehensive pipeline called PAPipe to perform nine widely used population genetic analyses using population NGS data. PAPipe seamlessly interconnects and serializes multiple steps, such as read trimming and mapping, genetic variant calling, data filtering, and format converting, along with nine population genetic analyses such as principal component analysis, phylogenetic analysis, population tree analysis, population structure analysis, linkage disequilibrium decay analysis, selective sweep analysis, population admixture analysis, sequentially Markovian coalescent analysis, and fixation index analysis. PAPipe also provides an easy-to-use web interface that allows for the parameters to be set and the analysis results to be browsed in an intuitive manner. PAPipe can be used to generate extensive results that provide insights that can help enhance user convenience and data usability. PAPipe is freely available at https://github.com/jkimlab/PAPipe.


Subject(s)
Computational Biology , Software , Phylogeny , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Population Genetics
6.
Brief Bioinform ; 24(6), 2023 09 22.
Article in English | MEDLINE | ID: mdl-37930022

ABSTRACT

Identifying potential drug targets using metabolic modeling requires integrating multiple modeling methods and heterogeneous biological datasets, which can be challenging without efficient tools. We developed Constraint-based Optimization of Metabolic Objectives (COMO), a user-friendly pipeline that integrates multi-omics data processing, context-specific metabolic model development, simulations, drug databases and disease data to aid drug discovery. COMO can be installed as a Docker image or with Conda and includes intuitive instructions within a Jupyter Lab environment. It provides a comprehensive solution for the integration of bulk and single-cell RNA-seq, microarrays and proteomics outputs to develop context-specific metabolic models. Using public databases, open-source solutions for model construction and a streamlined approach for predicting repurposable drugs, COMO enables researchers to investigate low-cost alternatives and novel disease treatments. As a case study, we used the pipeline to construct metabolic models of B cells, then simulated and analyzed them to predict metabolic drug targets for rheumatoid arthritis and systemic lupus erythematosus. COMO can be used to construct models for any cell or tissue type and identify drugs for any human disease where metabolic inhibition is relevant. The pipeline has the potential to improve the health of the global community cost-effectively by providing high-confidence targets to pursue in preclinical and clinical studies. The source code of the COMO pipeline is available at https://github.com/HelikarLab/COMO. The Docker image can be pulled at https://github.com/HelikarLab/COMO/pkgs/container/como.


Subject(s)
Multiomics , Proteomics , Humans , Software , Factual Databases , Drug Discovery
7.
Brief Bioinform ; 24(2), 2023 03 19.
Article in English | MEDLINE | ID: mdl-36847692

ABSTRACT

Single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) is a powerful tool to study cellular heterogeneity. The high dimensional data generated from this technology are complex and require specialized expertise for analysis and interpretation. The core of scRNA-seq data analysis contains several key analytical steps, which include pre-processing, quality control, normalization, dimensionality reduction, integration and clustering. Each step often has many algorithms developed with varied underlying assumptions and implications. With such a diverse choice of tools available, benchmarking analyses have compared their performances and demonstrated that tools operate differentially according to the data types and complexity. Here, we present Integrated Benchmarking scRNA-seq Analytical Pipeline (IBRAP), which contains a suite of analytical components that can be interchanged throughout the pipeline alongside multiple benchmarking metrics that enable users to compare results and determine the optimal pipeline combinations for their data. We apply IBRAP to single- and multi-sample integration analysis using primary pancreatic tissue, cancer cell line and simulated data accompanied with ground truth cell labels, demonstrating the interchangeable and benchmarking functionality of IBRAP. Our results confirm that the optimal pipelines are dependent on individual samples and studies, further supporting the rationale and necessity of our tool. We then compare reference-based cell annotation with unsupervised analysis, both included in IBRAP, and demonstrate the superiority of the reference-based method in identifying robust major and minor cell types. Thus, IBRAP presents a valuable tool to integrate multiple samples and studies to create reference maps of normal and diseased tissues, facilitating novel biological discovery using the vast volume of scRNA-seq data available.


Subject(s)
Benchmarking , Software , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Algorithms , Gene Expression Profiling/methods
8.
Brief Bioinform ; 24(4), 2023 07 20.
Article in English | MEDLINE | ID: mdl-37406192

ABSTRACT

Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA.


Subject(s)
Genome , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis , Pseudogenes , Chromosomes
9.
Brief Bioinform ; 24(3), 2023 05 19.
Article in English | MEDLINE | ID: mdl-37000175

ABSTRACT

Single-cell CRISPR screens have been widely used to investigate gene regulatory circuits in diverse biological systems. The recent development of single-cell CRISPR screens has enabled multimodal profiling of perturbed cells, covering gene expression, chromatin accessibility and protein levels. However, current methods cannot meet the analysis requirements of different types of data and have limited functions. Here, we introduce Single-cell CRISPR screens data analysEs and perturbation modEling (SCREE) as a comprehensive and flexible pipeline to facilitate the analyses of various types of single-cell CRISPR screens data. SCREE performs read alignment, sgRNA assignment, quality control, clustering and visualization, perturbation enrichment evaluation, perturbation efficiency modeling, gene regulatory score calculation and functional analyses of perturbations for single-cell CRISPR screens with RNA, ATAC and multimodal readouts. SCREE is available at https://github.com/wanglabtongji/SCREE.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats , Gene Expression Regulation , Gene Regulatory Networks
10.
Brief Bioinform ; 24(6), 2023 09 22.
Article in English | MEDLINE | ID: mdl-37779245

ABSTRACT

Single-cell multiomics techniques have been widely applied to detect the key signature of cells. These methods have achieved a single-molecule resolution and can even reveal spatial localization. These emerging methods provide insights elucidating the features of genomic, epigenomic and transcriptomic heterogeneity in individual cells. However, they have given rise to new computational challenges in data processing. Here, we describe Single-cell Single-molecule multiple Omics Pipeline (ScSmOP), a universal pipeline for barcode-indexed single-cell single-molecule multiomics data analysis. Essentially, the C language is utilized in ScSmOP to set up spaced-seed hash table-based algorithms for barcode identification according to ligation-based barcoding data and synthesis-based barcoding data, followed by data mapping and deconvolution. We demonstrate high reproducibility of data processing between ScSmOP and published pipelines in comprehensive analyses of single-cell omics data (scRNA-seq, scATAC-seq, scARC-seq), single-molecule chromatin interaction data (ChIA-Drop, SPRITE, RD-SPRITE), single-cell single-molecule chromatin interaction data (scSPRITE) and spatial transcriptomic data from various cell types and species. Additionally, ScSmOP shows more rapid performance and is a versatile, efficient, easy-to-use and robust pipeline for single-cell single-molecule multiomics data analysis.


Subject(s)
Genomics , Multiomics , Reproducibility of Results , Chromatin/genetics , Data Analysis
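The ScSmOP abstract above mentions spaced-seed hash tables for barcode identification. The following is a minimal Python sketch of that general idea, not ScSmOP's own C implementation: the whitelist, the two seed masks, and all function names are hypothetical and chosen only for illustration. With two complementary masks, any single mismatch leaves at least one masked key intact, so the true barcode is still found as a candidate and then verified by Hamming distance.

```python
from collections import defaultdict

# Hypothetical 8-nt barcode whitelist and two complementary spaced seeds.
# A '1' means the position is part of the hash key; a '0' is ignored.
WHITELIST = ["ACGTACGT", "TTGCAAGC", "GGATCCAA"]
SEEDS = ["10101010", "01010101"]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def masked_key(seq: str, seed: str) -> str:
    """Keep only the bases at positions where the seed has a '1'."""
    return "".join(base for base, keep in zip(seq, seed) if keep == "1")


def build_index(whitelist, seeds):
    """Map (seed id, masked key) to the set of whitelist barcodes sharing it."""
    index = defaultdict(set)
    for barcode in whitelist:
        for seed_id, seed in enumerate(seeds):
            index[(seed_id, masked_key(barcode, seed))].add(barcode)
    return index


def identify(observed, index, seeds, max_mismatches=1):
    """Return the unique whitelist barcode within max_mismatches, else None."""
    if len(observed) != len(seeds[0]):
        return None
    candidates = set()
    for seed_id, seed in enumerate(seeds):
        candidates |= index.get((seed_id, masked_key(observed, seed)), set())
    # Verify candidates explicitly; seeds only narrow the search.
    hits = [b for b in candidates if hamming(b, observed) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


INDEX = build_index(WHITELIST, SEEDS)
exact = identify("ACGTACGT", INDEX, SEEDS)
one_error = identify("ACGAACGT", INDEX, SEEDS)   # one substituted base
unknown = identify("AAAAAAAA", INDEX, SEEDS)     # not close to any barcode
```

Ambiguous reads (more than one barcode within the mismatch limit) are rejected here; a real pipeline may resolve them differently.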
11.
Brief Bioinform ; 24(6), 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37756591

ABSTRACT

In the process of drug discovery, one of the key problems is how to improve the biological activity and ADMET properties starting from a specific structure, which is also called structural optimization. Based on a starting scaffold, the use of a deep generative model to generate molecules with desired drug-like properties will provide a powerful tool to accelerate the structural optimization process. However, existing generative models still struggle to extract molecular features efficiently in 3D space to generate drug-like 3D molecules. Moreover, most of the existing ADMET prediction models make predictions of different properties through a single model, which can result in reduced prediction accuracy on some datasets. To effectively generate molecules from a specific scaffold and provide a basis for the structural optimization, the 3D-SMGE (3-Dimensional Scaffold-based Molecular Generation and Evaluation) work consisting of molecular generation and prediction of ADMET properties is presented. For the molecular generation, we proposed 3D-SMG, a novel deep generative model for the end-to-end design of 3D molecules. In the 3D-SMG model, we designed the cross-aggregated continuous-filter convolution (ca-cfconv), which is used to achieve efficient and low-cost 3D spatial feature extraction while ensuring the invariance of atomic space rotation. 3D-SMG was shown to generate valid, unique and novel molecules with high drug-likeness. In addition, the proposed data-adaptive multi-model ADMET prediction method outperformed or maintained the best evaluation metrics on 24 out of 27 ADMET benchmark datasets. 3D-SMGE is anticipated to emerge as a powerful tool for hit-to-lead structural optimizations and accelerate the drug discovery process.

12.
EMBO Rep ; 24(1): e56033, 2023 01 09.
Article in English | MEDLINE | ID: mdl-36533629

ABSTRACT

Antibacterial resistance is one of the greatest threats to human health. The development of new therapeutics against bacterial pathogens has slowed drastically since the approvals of the first antibiotics in the early and mid-20th century. Most of the currently investigated drug leads are modifications of approved antibacterials, many of which are derived from natural products. In this review, we highlight the challenges, advancements and current standing of the clinical and preclinical antibacterial research pipeline. Additionally, we present novel strategies for rejuvenating the discovery process and advocate for renewed and enthusiastic investment in the antibacterial discovery pipeline.


Subject(s)
Biological Products , Drug Discovery , Humans , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/therapeutic use , Bacteria/genetics , Microbial Drug Resistance
13.
Methods ; 226: 9-18, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38604412

ABSTRACT

Biomedical event extraction is an information extraction task to obtain events from biomedical text, whose targets include the type, the trigger, and the respective arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach, which contains trigger identification, argument role recognition, and finally event construction either using specific rules or by machine learning. In this paper, we propose an n-ary relation extraction method based on the BERT pre-training model to construct Binding events, in order to capture the semantic information about an event's context and its participants. The experimental results show that our method achieves promising results on the GE11 and GE13 corpora of the BioNLP shared task with F1 scores of 63.14% and 59.40%, respectively. It demonstrates that, by significantly improving the performance on Binding events, the overall performance of pipelined event extraction approaches or even exceeds that of current joint learning methods.


Subject(s)
Data Mining , Machine Learning , Data Mining/methods , Humans , Semantics , Natural Language Processing , Algorithms
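The abstract above reports its results as F1 scores. As a reminder of how that metric is derived, here is a small Python sketch; the function name and the event counts are invented for illustration and are not taken from the paper or from the GE11/GE13 corpora.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives   # events the system output
    actual = true_positives + false_negatives      # events in the gold standard
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 60 correctly extracted events, 30 spurious, 40 missed.
example_f1 = f1_score(60, 30, 40)
```

With these made-up counts, precision is 60/90 and recall is 60/100, so F1 equals 120/190, roughly 63.2%.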
14.
Mol Ther ; 32(6): 1817-1834, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38627969

ABSTRACT

Cellular therapies for the treatment of human diseases, such as chimeric antigen receptor (CAR) T and natural killer (NK) cells, have shown remarkable clinical efficacy in treating hematological malignancies; however, current methods mainly utilize viral vectors that are limited by their cargo size capacities, high cost, and long timelines for production of clinical reagents. Delivery of genetic cargo via DNA transposon engineering is a more timely and cost-effective approach, yet has been held back by less efficient integration rates. Here, we report the development of a novel hyperactive TcBuster (TcB-M) transposase engineered through structure-guided and in vitro evolution approaches that achieves high-efficiency integration of large, multicistronic CAR-expression cassettes in primary human cells. Our proof-of-principle TcB-M engineering of CAR-NK and CAR-T cells shows low integrated vector copy number, a safe insertion site profile, robust in vitro function, and improved survival in a Burkitt lymphoma xenograft model in vivo. Overall, TcB-M is a versatile, safe, efficient and open-source option for the rapid manufacture and preclinical testing of primary human immune cell therapies through delivery of multicistronic large cargo via transposition.


Subject(s)
Burkitt Lymphoma , Genetic Vectors , Adoptive Immunotherapy , Chimeric Antigen Receptors , Transposases , Humans , Transposases/genetics , Transposases/metabolism , Animals , Chimeric Antigen Receptors/genetics , Chimeric Antigen Receptors/metabolism , Adoptive Immunotherapy/methods , Mice , Genetic Vectors/genetics , Genetic Vectors/administration & dosage , Burkitt Lymphoma/therapy , Burkitt Lymphoma/genetics , Xenograft Model Antitumor Assays , Natural Killer Cells/immunology , Natural Killer Cells/metabolism , Tumor Cell Line , DNA Transposable Elements , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Transgenes
15.
Proc Natl Acad Sci U S A ; 119(16): e2118853119, 2022 04 19.
Article in English | MEDLINE | ID: mdl-35377735

ABSTRACT

Based on a dataset that we collected from the top research institutions in economics around the globe (including universities, business schools, and other organizations, such as central banks), we document the underrepresentation of women in economics. For the 238 universities and business schools in the sample, women hold 25% of senior-level positions (full professor or associate professor) and 37% of junior-level positions. In the 82 US universities and business schools, the figures are 20% on the senior level and 32% on the entry level, while in the 122 European institutions, the numbers are 27% and 38%, respectively, with some heterogeneity across countries. The numbers also show that the highest-ranking institutions (in terms of research output) have fewer women in senior positions. Moreover, in the United States, this effect is even present on the junior level. The "leaky pipeline" may hence begin earlier than oftentimes assumed and is even more of an issue in the highly integrated market of the United States. In Europe, an institution ranked 100 places higher has 3 percentage points fewer women in senior positions, but in the United States, it is almost 5 percentage points.

16.
Proc Natl Acad Sci U S A ; 119(36): e2123201119, 2022 09 06.
Article in English | MEDLINE | ID: mdl-36037360

ABSTRACT

Using public housing developments as a strategic site, our research documents a distinct pathway linking disadvantaged context to incarceration-the public-housing-to-prison pipeline. Focusing on New York City Housing Authority (NYCHA) housing developments as a case study, we find that incarceration rates in NYCHA tracts are 4.6 times higher than those in non-NYCHA tracts. More strikingly, 94% of NYCHA tracts report rates above the median value for non-NYCHA tracts. Moreover, 17% of New York State's incarcerated population originated from just 372 NYCHA tracts. Compared with non-NYCHA tracts, NYCHA tracts had higher shares of Black residents and were significantly more disadvantaged. This NYCHA disadvantage in concentrated incarceration is also robust at different spatial scales. Our findings have implications for policies and programs to disrupt community-based pipelines to prison.


Subject(s)
Prisons , Public Housing , Black People , Humans , New York City/epidemiology , Residence Characteristics , Vulnerable Populations
17.
Genomics ; 116(4): 110858, 2024 07.
Article in English | MEDLINE | ID: mdl-38735595

ABSTRACT

The ever decreasing cost of Next-Generation Sequencing coupled with the emergence of efficient and reproducible analysis pipelines has rendered genomic methods more accessible. However, downstream analyses are basic or missing in most workflows, creating a significant barrier for non-bioinformaticians. To help close this gap, we developed Cactus, an end-to-end pipeline for analyzing ATAC-Seq and mRNA-Seq data, either separately or jointly. Its Nextflow-, container-, and virtual environment-based architecture ensures efficient and reproducible analyses. Cactus preprocesses raw reads, conducts differential analyses between conditions, and performs enrichment analyses in various databases, including DNA-binding motifs, ChIP-Seq binding sites, chromatin states, and ontologies. We demonstrate the utility of Cactus in a multi-modal and multi-species case study as well as by showcasing its unique capabilities as compared to other ATAC-Seq pipelines. In conclusion, Cactus can assist researchers in gaining comprehensive insights from chromatin accessibility and gene expression data in a quick, user-friendly, and reproducible manner.


Subject(s)
Software , Humans , Animals , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Chromatin Immunoprecipitation Sequencing/methods , Chromatin/genetics , Chromatin/metabolism , RNA-Seq/methods
18.
Proteomics ; 24(3-4): e2200403, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37787899

ABSTRACT

Although Top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science practices, including a greater application of the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. These include, for example, increased data sharing practices and readily available open data standards. Additionally, the field would benefit from the development of open data analysis workflows that can enable data reuse of public datasets, something that is increasingly common in other proteomics fields.


Subject(s)
Proteins , Proteomics , Proteomics/methods , Proteins/analysis , Workflow
19.
BMC Bioinformatics ; 25(1): 233, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982375

ABSTRACT

BACKGROUND: Structural variations play an important role in bacterial genomes. They can mediate genome adaptation quickly in response to the external environment and thus can also play a role in antibiotic resistance. The detection of structural variations in bacteria is challenging, and the recognition of even small rearrangements can be important. Even though most detection tools are aimed at and benchmarked on eukaryotic genomes, they can also be used on prokaryotic genomes. The key features of detection are the ability to detect small rearrangements and support haploid genomes. Because of the limited performance of a single detection tool, combining the detection abilities of multiple tools can lead to more robust results. There are already available workflows for structural variation detection for long-read technologies and for the detection of single-nucleotide variation and indels, both aimed at bacteria. Yet we are unaware of structural variation detection workflows for short-read sequencing platforms. Motivated by this gap, we created our workflow. Further, we were interested in increasing the detection performance and providing more robust results. RESULTS: We developed an open-source bioinformatics pipeline, ProcaryaSV, for the detection of structural variations in bacterial isolates from paired-end short sequencing reads. Multiple tools, starting with quality control and trimming of sequencing data, alignment to the reference genome, and multiple structural variation detection tools, are integrated. All the partial results are then processed and merged with an in-house merging algorithm. Compared with a single detection approach, ProcaryaSV has improved detection performance and is a reproducible easy-to-use tool.
CONCLUSIONS: The ProcaryaSV pipeline provides an integrative approach to structural variation detection from paired-end next-generation sequencing of bacterial samples. It can be easily installed and used on Linux machines. It is publicly available on GitHub at https://github.com/robinjugas/ProcaryaSV.


Subject(s)
Bacterial Genome , High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Bacteria/genetics
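The ProcaryaSV abstract above describes merging the calls of several structural variation detectors but does not specify its in-house algorithm. As a rough illustration of one common way such merging can work, the Python sketch below clusters calls of the same type by reciprocal overlap and keeps clusters supported by at least two tools. The thresholds, the `SVCall` type, and the tool names are assumptions for this example and are not taken from ProcaryaSV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    tool: str      # name of the caller that produced the call (hypothetical)
    sv_type: str   # e.g. "DEL", "DUP", "INV"
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Smaller of the two fractions of each call covered by the other."""
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start + 1), overlap / (b.end - b.start + 1))


def merge_calls(calls, min_overlap=0.8, min_tools=2):
    """Greedy clustering against each cluster's first call, then tool voting."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.sv_type, c.start, c.end)):
        for cluster in clusters:
            rep = cluster[0]
            if rep.sv_type == call.sv_type and reciprocal_overlap(rep, call) >= min_overlap:
                cluster.append(call)
                break
        else:
            clusters.append([call])

    merged = []
    for cluster in clusters:
        tools = sorted({c.tool for c in cluster})
        if len(tools) >= min_tools:
            # Report the union of the supporting intervals.
            merged.append((cluster[0].sv_type,
                           min(c.start for c in cluster),
                           max(c.end for c in cluster),
                           tools))
    return merged


calls = [
    SVCall("toolA", "DEL", 1000, 2000),
    SVCall("toolB", "DEL", 1010, 1990),
    SVCall("toolC", "DEL", 5000, 5200),   # supported by one tool only
    SVCall("toolA", "INV", 1000, 2000),   # same span, different type
]
consensus = merge_calls(calls)
```

Requiring support from more than one tool trades some sensitivity for fewer false positives; the thresholds would need tuning on real benchmark data.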
20.
BMC Bioinformatics ; 25(1): 222, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38914932

ABSTRACT

BACKGROUND: Pan-virus detection, and virome investigation in general, can be challenging, mainly due to the lack of universally conserved genetic elements in viruses. Metagenomic next-generation sequencing can offer a promising solution to this problem by providing an unbiased overview of the microbial community, enabling detection of any viruses without prior target selection. However, a major challenge in utilising metagenomic next-generation sequencing for virome investigation is that data analysis can be highly complex, involving numerous data processing steps. RESULTS: Here, we present Entourage to address this challenge. Entourage enables short-read sequence assembly, viral sequence search with or without reference virus targets using contig-based approaches, and intrasample sequence variation quantification. Several workflows are implemented in Entourage to facilitate end-to-end virus sequence detection analysis through a single command line, from read cleaning, sequence assembly, to virus sequence searching. The results generated are comprehensive, allowing for thorough quality control, reliability assessment, and interpretation. We illustrate Entourage's utility as a streamlined workflow for virus detection by employing it to comprehensively search for target virus sequences and beyond in raw sequence read data generated from HeLa cell culture samples spiked with viruses. Furthermore, we showcase its flexibility and performance on a real-world dataset by analysing a preassembled Tara Oceans dataset. Overall, our results show that Entourage performs well even with low virus sequencing depth in single digits, and it can be used to discover novel viruses effectively. Additionally, by using sequence data generated from a patient with chronic SARS-CoV-2 infection, we demonstrate Entourage's capability to quantify virus intrasample genetic variations, and generate publication-quality figures illustrating the results. 
CONCLUSIONS: Entourage is an all-in-one, versatile, and streamlined bioinformatics software for virome investigation, developed with a focus on ease of use. Entourage is available at https://codeberg.org/CENMIG/Entourage under the MIT license.


Subject(s)
Viral Genome , High-Throughput Nucleotide Sequencing , SARS-CoV-2 , Software , Viral Genome/genetics , Humans , High-Throughput Nucleotide Sequencing/methods , SARS-CoV-2/genetics , Metagenomics/methods , Viruses/genetics , COVID-19/virology , Virome/genetics , HeLa Cells