Search | VHL Regional Portal

Biogeographic distribution of five Antarctic cyanobacteria using large-scale k-mer searching with sourmash branchwater.

Lumian, Jessica; Sumner, Dawn Y; Grettenberger, Christen L; Jungblut, Anne D; Irber, Luiz; Pierce-Ward, N Tessa; Brown, C Titus.

Front Microbiol ; 15: 1328083, 2024.

Article in English | MEDLINE | ID: mdl-38440141

ABSTRACT

Cyanobacteria form diverse communities and are important primary producers in Antarctic freshwater environments, but their geographic distribution patterns in Antarctica and globally are still unresolved. There are however few genomes of cultured cyanobacteria from Antarctica available and therefore metagenome-assembled genomes (MAGs) from Antarctic cyanobacteria microbial mats provide an opportunity to explore distribution of uncultured taxa. These MAGs also allow comparison with metagenomes of cyanobacteria enriched communities from a range of habitats, geographic locations, and climates. However, most MAGs do not contain 16S rRNA gene sequences, making a 16S rRNA gene-based biogeography comparison difficult. An alternative technique is to use large-scale k-mer searching to find genomes of interest in public metagenomes. This paper presents the results of k-mer based searches for 5 Antarctic cyanobacteria MAGs from Lake Fryxell and Lake Vanda, assigned the names Phormidium pseudopriestleyi FRX01, Microcoleus sp. MP8IB2.171, Leptolyngbya sp. BulkMat.35, Pseudanabaenaceae cyanobacterium MP8IB2.15, and Leptolyngbyaceae cyanobacterium MP9P1.79 in 498,942 unassembled metagenomes from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). The Microcoleus sp. MP8IB2.171 MAG was found in a wide variety of environments, the P. pseudopriestleyi MAG was found in environments with challenging conditions, the Leptolyngbyaceae cyanobacterium MP9P1.79 MAG was only found in Antarctica, and the Leptolyngbya sp. BulkMat.35 and Pseudanabaenaceae cyanobacterium MP8IB2.15 MAGs were found in Antarctic and other cold environments. The findings based on metagenome matches and global comparisons suggest that these Antarctic cyanobacteria have distinct distribution patterns ranging from locally restricted to global distribution across the cold biosphere and other climatic zones.
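
The k-mer search described in the abstract above can be illustrated with a minimal sketch. It assumes the simplest form of the idea: split a query genome and a metagenome into overlapping k-mers and report the fraction of the query's k-mers that also occur in the metagenome (containment). The function names, the default k of 21, and the use of full in-memory k-mer sets are assumptions made for illustration. This is not the sourmash branchwater implementation, and it ignores reverse complements.

```python
def kmers(seq, k):
    """Yield every overlapping k-mer in seq (forward strand only, for simplicity)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def containment(query, target, k=21):
    """Return the fraction of the query's distinct k-mers that also occur in target."""
    query_kmers = set(kmers(query, k))
    if not query_kmers:
        return 0.0  # query shorter than k: nothing to look for
    target_kmers = set(kmers(target, k))
    return len(query_kmers & target_kmers) / len(query_kmers)


if __name__ == "__main__":
    # Toy sequences, not real data; k is lowered so the short strings yield k-mers.
    print(containment("ACGTT", "TTACGTA", k=3))
```

A containment score is asymmetric by design: it asks how much of the genome is present in the metagenome, not how similar the two are overall, which suits the question of whether a MAG occurs in a much larger sample.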

Critical Assessment of Metagenome Interpretation: the second round of challenges.

Meyer, Fernando; Fritz, Adrian; Deng, Zhi-Luo; Koslicki, David; Lesker, Till Robin; Gurevich, Alexey; Robertson, Gary; Alser, Mohammed; Antipov, Dmitry; Beghini, Francesco; Bertrand, Denis; Brito, Jaqueline J; Brown, C Titus; Buchmann, Jan; Buluç, Aydin; Chen, Bo; Chikhi, Rayan; Clausen, Philip T L C; Cristian, Alexandru; Dabrowski, Piotr Wojciech; Darling, Aaron E; Egan, Rob; Eskin, Eleazar; Georganas, Evangelos; Goltsman, Eugene; Gray, Melissa A; Hansen, Lars Hestbjerg; Hofmeyr, Steven; Huang, Pingqin; Irber, Luiz; Jia, Huijue; Jørgensen, Tue Sparholt; Kieser, Silas D; Klemetsen, Terje; Kola, Axel; Kolmogorov, Mikhail; Korobeynikov, Anton; Kwan, Jason; LaPierre, Nathan; Lemaitre, Claire; Li, Chenhao; Limasset, Antoine; Malcher-Miranda, Fabio; Mangul, Serghei; Marcelino, Vanessa R; Marchet, Camille; Marijon, Pierre; Meleshko, Dmitry; Mende, Daniel R; Milanese, Alessio.

Nat Methods ; 19(4): 429-440, 2022 04.

Article in English | MEDLINE | ID: mdl-35396482

ABSTRACT

Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.

Subjects

Metagenome, Metagenomics, Archaea/genetics, Metagenomics/methods, Reproducibility of Results, DNA Sequence Analysis, Software

Context-aware genomic surveillance reveals hidden transmission of a carbapenemase-producing Klebsiella pneumoniae.

Viehweger, Adrian; Blumenscheit, Christian; Lippmann, Norman; Wyres, Kelly L; Brandt, Christian; Hans, Jörg B; Hölzer, Martin; Irber, Luiz; Gatermann, Sören; Lübbert, Christoph; Pletz, Mathias W; Holt, Kathryn E; König, Brigitte.

Microb Genom ; 7(12)2021 12.

Article in English | MEDLINE | ID: mdl-34913861

ABSTRACT

Genomic surveillance can inform effective public health responses to pathogen outbreaks. However, integration of non-local data is rarely done. We investigate two large hospital outbreaks of a carbapenemase-carrying Klebsiella pneumoniae strain in Germany and show the value of contextual data. By screening about 10,000 genomes, over 400,000 metagenomes and two culture collections using in silico and in vitro methods, we identify a total of 415 closely related genomes reported in 28 studies. We identify the relationship between the two outbreaks through time-dated phylogeny, including their respective origin. One of the outbreaks presents extensive hidden transmission, with descendant isolates only identified in other studies. We then leverage the genome collection from this meta-analysis to identify genes under positive selection. We thereby identify an inner membrane transporter (ynjC) with a putative role in colistin resistance. Contextual data from other sources can thus enhance local genomic surveillance at multiple levels and should be integrated by default when available.

Subjects

Cross Infection/microbiology, Bacterial Drug Resistance, Klebsiella Infections/epidemiology, Klebsiella pneumoniae/classification, Membrane Transport Proteins/genetics, Bacterial Proteins/chemistry, Bacterial Proteins/genetics, Colistin/pharmacology, Cross Infection/epidemiology, Disease Outbreaks, Epidemiological Monitoring, Germany/epidemiology, Humans, Klebsiella Infections/microbiology, Klebsiella pneumoniae/drug effects, Klebsiella pneumoniae/genetics, Klebsiella pneumoniae/isolation & purification, Membrane Transport Proteins/chemistry, Molecular Models, Phylogeny, Protein Conformation

Streamlining data-intensive biology with workflow systems.

Reiter, Taylor; Brooks, Phillip T; Irber, Luiz; Joslin, Shannon E K; Reid, Charles M; Scott, Camille; Brown, C Titus; Pierce-Ward, N Tessa.

Gigascience ; 10(1)2021 01 13.

Article in English | MEDLINE | ID: mdl-33438730

ABSTRACT

As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.

Subjects

Computational Biology, Software, Data Analysis, High-Throughput Nucleotide Sequencing, Workflow

Large-scale sequence comparisons with sourmash.

Pierce, N Tessa; Irber, Luiz; Reiter, Taylor; Brooks, Phillip; Brown, C Titus.

F1000Res ; 8: 1006, 2019.

Article in English | MEDLINE | ID: mdl-31508216

ABSTRACT

The sourmash software package uses MinHash-based sketching to create "signatures", compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.

Subjects

Genome, Software, Factual Databases
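
The MinHash-based sketching mentioned in the sourmash abstract above can be illustrated with a small bottom-k example. It is a sketch of the general technique under stated assumptions, not sourmash's code or API: the function names, the choice of BLAKE2b as the hash, and the default k and sketch size are all hypothetical, and reverse complements are ignored.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer in seq (forward strand only)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a stable 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, num=500):
    """Bottom-k MinHash sketch: the `num` smallest distinct k-mer hashes."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:num]


def jaccard_estimate(a, b, num=500):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    # Keep the `num` smallest hashes of the combined sketches, then count
    # how many of them appear in both inputs.
    merged = sorted(set(a) | set(b))[:num]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    # Toy sequences; k is lowered so the short strings yield several k-mers.
    s1 = sketch("ACGTT", k=3)
    s2 = sketch("TTACGTA", k=3)
    print(jaccard_estimate(s1, s2))
```

Because each sketch keeps at most `num` hashes regardless of input size, two very large data sets can be compared using a small, fixed amount of memory, at the cost of an approximate answer.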

Haplotype-Phased Synthetic Long Reads from Short-Read Sequencing.

Stapleton, James A; Kim, Jeongwoon; Hamilton, John P; Wu, Ming; Irber, Luiz C; Maddamsetti, Rohan; Briney, Bryan; Newton, Linsey; Burton, Dennis R; Brown, C Titus; Chan, Christina; Buell, C Robin; Whitehead, Timothy A.

PLoS One ; 11(1): e0147229, 2016.

Article in English | MEDLINE | ID: mdl-26789840

ABSTRACT

Next-generation DNA sequencing has revolutionized the study of biology. However, the short read lengths of the dominant instruments complicate assembly of complex genomes and haplotype phasing of mixtures of similar sequences. Here we demonstrate a method to reconstruct the sequences of individual nucleic acid molecules up to 11.6 kilobases in length from short (150-bp) reads. We show that our method can construct 99.97%-accurate synthetic reads from bacterial, plant, and animal genomic samples, full-length mRNA sequences from human cancer cell lines, and individual HIV env gene variants from a mixture. The preparation of multiple samples can be multiplexed into a single tube, further reducing effort and cost relative to competing approaches. Our approach generates sequencing libraries in three days from less than one microgram of DNA in a single-tube format without custom equipment or specialized expertise.

Subjects

Algorithms, Genome, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Animals, Bacterial DNA/genetics, Neoplasm DNA/genetics, Plant DNA/genetics, Gene Library, Humans

The khmer software package: enabling efficient nucleotide sequence analysis.

Crusoe, Michael R; Alameldin, Hussien F; Awad, Sherine; Boucher, Elmar; Caldwell, Adam; Cartwright, Reed; Charbonneau, Amanda; Constantinides, Bede; Edvenson, Greg; Fay, Scott; Fenton, Jacob; Fenzl, Thomas; Fish, Jordan; Garcia-Gutierrez, Leonor; Garland, Phillip; Gluck, Jonathan; González, Iván; Guermond, Sarah; Guo, Jiarong; Gupta, Aditi; Herr, Joshua R; Howe, Adina; Hyer, Alex; Härpfer, Andreas; Irber, Luiz; Kidd, Rhys; Lin, David; Lippi, Justin; Mansour, Tamer; McA'Nulty, Pamela; McDonald, Eric; Mizzi, Jessica; Murray, Kevin D; Nahum, Joshua R; Nanlohy, Kaben; Nederbragt, Alexander Johan; Ortiz-Zuazaga, Humberto; Ory, Jeramia; Pell, Jason; Pepe-Ranney, Charles; Russ, Zachary N; Schwarz, Erich; Scott, Camille; Seaman, Josiah; Sievert, Scott; Simpson, Jared; Skennerton, Connor T; Spencer, James; Srinivasan, Ramakrishnan; Standage, Daniel.

F1000Res ; 4: 900, 2015.

Article in English | MEDLINE | ID: mdl-26535114

ABSTRACT

The khmer package is a freely available software library for working efficiently with fixed length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/.
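
The probabilistic k-mer counting mentioned in the khmer abstract above can be illustrated with a Count-Min sketch, a widely used structure of this kind. Treating it as representative is an assumption: the class, its parameters, and the salted BLAKE2b hashing below are hypothetical and do not reproduce khmer's implementation or API.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: may overestimate a count, never underestimates it."""

    def __init__(self, width=10007, depth=4):
        self.width = width    # counters per row
        self.depth = depth    # number of independent hash rows
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, kmer):
        """Yield (row, column) for the k-mer, using one salted hash per row."""
        for row in range(self.depth):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, kmer):
        for row, col in self._slots(kmer):
            self.tables[row][col] += 1

    def count(self, kmer):
        # The smallest counter is the one least inflated by collisions.
        return min(self.tables[row][col] for row, col in self._slots(kmer))


def count_kmers(seq, k, cms):
    """Add every overlapping k-mer of seq to the sketch."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        cms.add(seq[i:i + k])


if __name__ == "__main__":
    cms = CountMinSketch()
    count_kmers("ACGACGACG", 3, cms)  # toy sequence, not real data
    print(cms.count("ACG"))
```

Memory use is fixed by `width` and `depth` rather than by the number of distinct k-mers, which is the trade-off that makes approximate counting practical on large sequencing data sets.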
