Identification of representative species-specific genes for abundance measurements.
Zachariasen, Trine; Petersen, Anders Østergaard; Brejnrod, Asker; Vestergaard, Gisle Alberg; Eklund, Aron; Nielsen, Henrik Bjørn.
Affiliation
  • Zachariasen T; Department of Health and Technology, Technical University of Denmark, Lyngby 2800, Denmark.
  • Petersen AØ; Department of Health and Technology, Technical University of Denmark, Lyngby 2800, Denmark.
  • Brejnrod A; Department of Health and Technology, Technical University of Denmark, Lyngby 2800, Denmark.
  • Vestergaard GA; Department of Health and Technology, Technical University of Denmark, Lyngby 2800, Denmark.
  • Eklund A; Clinical Microbiomics A/S, Copenhagen 2100, Denmark.
  • Nielsen HB; Clinical Microbiomics A/S, Copenhagen 2100, Denmark.
Bioinform Adv ; 3(1): vbad060, 2023.
Article in En | MEDLINE | ID: mdl-37213867
ABSTRACT
Motivation:

Metagenomic binning facilitates the reconstruction of genomes and the identification of Metagenomic Species Pan-genomes or Metagenomic Assembled Genomes. We propose a method for identifying a set of de novo representative genes, termed signature genes, which can serve as markers of each metagenomic species and measure its relative abundance with high accuracy.

Results:

An initial set of the 100 genes that correlate with the median gene abundance profile of the entity is selected. A variant of the coupon collector's problem is used to evaluate the probability of identifying a certain number of unique genes in a sample. This allows us to reject the abundance measurements of strains exhibiting a significantly skewed gene representation. A rank-based negative binomial model is employed to assess the performance of different gene sets across a large set of samples, facilitating identification of an optimal signature gene set for the entity. When benchmarked on a synthetic gene catalog, our optimized signature gene sets estimate relative abundance significantly closer to the true relative abundance than the starting gene sets extracted from the metagenomic species. The method was able to replicate results from a study with real data and identify around three times as many metagenomic entities.

Availability and implementation:

The code used for the analysis is available on GitHub: https://github.com/trinezac/SG_optimization.

Supplementary information:

Supplementary data are available at Bioinformatics Advances online.
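The coupon collector step mentioned in the Results can be illustrated with a small sketch. The abstract does not specify which variant the authors use, so the code below assumes the simplest textbook case: every read assigned to the entity lands on one of its signature genes uniformly at random and independently. The function names `distinct_gene_distribution` and `prob_at_most` are hypothetical and are not taken from the authors' repository.

```python
from typing import List


def distinct_gene_distribution(n_genes: int, n_reads: int) -> List[float]:
    """Return P(K = k) for k = 0..n_genes, where K is the number of distinct
    genes hit by n_reads reads.

    Assumption (not from the paper): each read hits one of the n_genes genes
    uniformly at random, independently of the other reads.
    """
    if n_genes <= 0 or n_reads < 0:
        raise ValueError("n_genes must be positive and n_reads non-negative")

    # probs[k] = probability that exactly k distinct genes have been seen so far
    probs = [0.0] * (n_genes + 1)
    probs[0] = 1.0
    for _ in range(n_reads):
        nxt = [0.0] * (n_genes + 1)
        for k, p in enumerate(probs):
            if p == 0.0:
                continue
            # The read hits a gene that was already seen.
            nxt[k] += p * k / n_genes
            # The read hits a gene that was not seen before.
            if k < n_genes:
                nxt[k + 1] += p * (n_genes - k) / n_genes
        probs = nxt
    return probs


def prob_at_most(n_genes: int, n_reads: int, observed: int) -> float:
    """Lower-tail probability P(K <= observed) under the uniform model."""
    if observed < 0:
        return 0.0
    dist = distinct_gene_distribution(n_genes, n_reads)
    return sum(dist[: min(observed, n_genes) + 1])


if __name__ == "__main__":
    # Hypothetical sample: 200 reads map to a 100-gene signature set,
    # but only 40 distinct genes are detected.
    print(prob_at_most(100, 200, 40))
```

A very small lower-tail probability means that far fewer distinct genes were detected than the read count would suggest, which is the kind of skewed gene representation the abstract describes as grounds for rejecting an abundance measurement. Any cutoff applied to this probability would be a further assumption; the abstract does not state one.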

Full text: 1 Collection: 01-internacional Database: MEDLINE Type of study: Diagnostic_studies Language: En Journal: Bioinform Adv Year: 2023 Type: Article Affiliation country: Denmark
