Discovering viral genomes in human metagenomic data by predicting unknown protein families.
Barrientos-Somarribas, Mauricio; Messina, David N; Pou, Christian; Lysholm, Fredrik; Bjerkner, Annelie; Allander, Tobias; Andersson, Björn; Sonnhammer, Erik L L.
Affiliation
  • Barrientos-Somarribas M; Department of Cell and Molecular Biology, Science for Life Laboratory, Karolinska Institutet, PO Box 285, SE-171 77, Stockholm, Sweden.
  • Messina DN; Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, SE-171 21, Solna, Sweden.
  • Pou C; Department of Cell and Molecular Biology, Science for Life Laboratory, Karolinska Institutet, PO Box 285, SE-171 77, Stockholm, Sweden.
  • Lysholm F; Department of Cell and Molecular Biology, Science for Life Laboratory, Karolinska Institutet, PO Box 285, SE-171 77, Stockholm, Sweden.
  • Bjerkner A; IFM Bioinformatics and Swedish e-Science Research Centre (SeRC), Linköping University, SE-581 83, Linköping, Sweden.
  • Allander T; Karolinska Institutet, Department of Microbiology, Tumor- and Cell Biology, Laboratory for Clinical Microbiology, Karolinska University Hospital, SE-171 76, Stockholm, Sweden.
  • Andersson B; Karolinska Institutet, Department of Microbiology, Tumor- and Cell Biology, Laboratory for Clinical Microbiology, Karolinska University Hospital, SE-171 76, Stockholm, Sweden.
  • Sonnhammer ELL; Department of Cell and Molecular Biology, Science for Life Laboratory, Karolinska Institutet, PO Box 285, SE-171 77, Stockholm, Sweden. bjorn.andersson@ki.se.
Sci Rep; 8(1): 28, 2018 Jan 08.
Article in English | MEDLINE | ID: mdl-29311716
ABSTRACT
Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows little or no homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up of a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the Microviridae family; hence it was named bacteriophage HFM.
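
The abstract only names the general approach (similarity-based clustering followed by filters), so the following is a minimal illustrative sketch of that idea and not the authors' pipeline. It assumes pairwise similarity hits between predicted ORFs are already available as (query, subject, identity) tuples, groups them by single-linkage clustering, and keeps clusters that pass two hypothetical filters. The function names, the identity threshold, and the filter criteria (minimum members, minimum number of source libraries) are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_by_similarity(hits, min_identity=0.3):
    """Single-linkage clustering of sequence IDs using union-find.

    hits: iterable of (seq_a, seq_b, identity) tuples, identity in [0, 1].
    min_identity is a hypothetical threshold, not a value from the paper.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, identity in hits:
        root_a, root_b = find(a), find(b)  # registers both sequences
        if identity >= min_identity and root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = defaultdict(set)
    for seq in list(parent):
        clusters[find(seq)].add(seq)
    return list(clusters.values())


def filter_families(clusters, library_of, min_members=3, min_libraries=2):
    """Keep clusters that look like candidate families (assumed criteria)."""
    kept = []
    for members in clusters:
        libraries = {library_of[m] for m in members}
        if len(members) >= min_members and len(libraries) >= min_libraries:
            kept.append(members)
    return kept


# Toy input: identifiers and library labels are invented for illustration.
hits = [
    ("orf1", "orf2", 0.8),
    ("orf2", "orf3", 0.5),
    ("orf4", "orf5", 0.9),
    ("orf3", "orf4", 0.1),  # below threshold, so the two groups stay apart
]
library_of = {
    "orf1": "libA", "orf2": "libB", "orf3": "libA",
    "orf4": "libC", "orf5": "libC",
}

families = filter_families(cluster_by_similarity(hits), library_of)
print(families)
```

In this toy input, orf1, orf2 and orf3 form one cluster spanning two libraries and are kept, while the orf4/orf5 pair is discarded for having too few members. A real analysis would replace the hand-written hits with output from a sequence search tool and would need filters tuned to the data.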

Full text: 1 | Collections: 01-internacional | Health context: 1_ASSA2030 | Database: MEDLINE | Main subject: Viral Proteins / Viral Genome / Metagenome / Metagenomics | Type of study: Health_economic_evaluation / Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: Sci Rep | Year of publication: 2018 | Document type: Article
