Results 1 - 20 of 74
1.
Nat Food ; 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724686

ABSTRACT

Salmonella enterica causes severe food-borne infections through contamination of the food supply chain. Its evolution has been associated with human activities, especially animal husbandry. Advances in intensive farming and global transportation have substantially reshaped the pig industry, but their impact on the evolution of associated zoonotic pathogens such as S. enterica remains unresolved. Here we investigated the population fluctuation, accumulation of antimicrobial resistance genes and international serovar Choleraesuis transmission of nine pig-enriched S. enterica populations comprising more than 9,000 genomes. Most changes were found to be attributable to the developments of the modern pig industry. All pig-enriched salmonellae experienced host transfers in pigs and/or population expansions over the past century, with pigs and pork having become the main sources of S. enterica transmissions to other hosts. Overall, our analysis revealed strong associations between the transmission of pig-enriched salmonellae and the global pork trade.

2.
mBio ; : e0058124, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38683013

ABSTRACT

Recombination of short DNA fragments via horizontal gene transfer (HGT) can introduce beneficial alleles, create genomic disharmony through negative epistasis, and create adaptive gene combinations through positive epistasis. For non-core (accessory) genes, the negative epistatic cost is likely to be minimal because the incoming genes have not co-evolved with the recipient genome and are frequently observed as tightly linked cassettes with major effects. By contrast, interspecific recombination in the core genome is expected to be rare because disruptive allelic replacement is likely to introduce negative epistasis. Why then is homologous recombination common in the core of bacterial genomes? To understand this enigma, we take advantage of an exceptional model system, the common enteric pathogens Campylobacter jejuni and C. coli, which are known for very high magnitude interspecies gene flow in the core genome. As expected, HGT does indeed disrupt co-adapted allele pairings, indirect evidence of negative epistasis. However, multiple HGT events enable recovery of the genome's co-adaptation between introgressing alleles, even in core metabolism genes (e.g., formate dehydrogenase). These findings demonstrate that, even for complex traits, genetic coalitions can be decoupled, transferred, and independently reinstated in a new genetic background, facilitating transition between fitness peaks. In this example, the two-step recombinational process is associated with C. coli that are adapted to the agricultural niche. IMPORTANCE Genetic exchange among bacteria shapes the microbial world. From the acquisition of antimicrobial resistance genes to fundamental questions about the nature of bacterial species, this powerful evolutionary force has preoccupied scientists for decades. However, the mixing of genes between species rests on a paradox: on one hand, promoting adaptation by conferring novel functionality; on the other, potentially introducing disharmonious gene combinations (negative epistasis) that will be selected against. Taking an interdisciplinary approach to analyze natural populations of the enteric bacteria Campylobacter, an ideal example of long-range admixture, we demonstrate that genes can independently transfer across species boundaries and rejoin in functional networks in a recipient genome. The positive impact of two-gene interactions appears to be adaptive by expanding metabolic capacity and facilitating niche shifts through interspecific hybridization. This challenges conventional ideas and highlights the possibility of multiple-step evolution of multi-gene traits by interspecific introgression.

3.
Nat Commun ; 14(1): 8184, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-38081806

ABSTRACT

Helicobacter pylori, a dominant member of the gastric microbiota, shares co-evolutionary history with humans. This has led to the development of genetically distinct H. pylori subpopulations associated with the geographic origin of the host and with differential gastric disease risk. Here, we provide insights into H. pylori population structure as a part of the Helicobacter pylori Genome Project (HpGP), a multi-disciplinary initiative aimed at elucidating H. pylori pathogenesis and identifying new therapeutic targets. We collected 1011 well-characterized clinical strains from 50 countries and generated high-quality genome sequences. We analysed core genome diversity and population structure of the HpGP dataset and 255 worldwide reference genomes to outline the ancestral contribution to Eurasian, African, and American populations. We found evidence of substantial contribution of population hpNorthAsia and subpopulation hspUral in Northern European H. pylori. The genomes of H. pylori isolated from northern and southern Indigenous Americans differed in that bacteria isolated in northern Indigenous communities were more similar to North Asian H. pylori while the southern had higher relatedness to hpEastAsia. Notably, we also found a highly clonal yet geographically dispersed North American subpopulation, which is negative for the cag pathogenicity island, and present in 7% of sequenced US genomes. We expect the HpGP dataset and the corresponding strains to become a major asset for H. pylori genomics.


Subjects
Helicobacter Infections , Helicobacter pylori , Humans , Genome, Bacterial/genetics , Base Sequence , Genomics , Population Groups , Helicobacter Infections/microbiology
4.
Genome Res ; 33(6): 988-998, 2023 06.
Article in English | MEDLINE | ID: mdl-37253539

ABSTRACT

Bacterial genome data are accumulating at an unprecedented speed due to the routine use of sequencing in clinical diagnoses, public health surveillance, and population genetics studies. Genealogical reconstruction is fundamental to many of these uses; however, inferring genealogy from large-scale genome data sets quickly, accurately, and flexibly is still a challenge. Here, we extend an alignment- and annotation-free method, PopPUNK, to increase its flexibility and interpretability across data sets. Our method, iterative-PopPUNK, rapidly produces multiple consistent cluster assignments across a range of sequence identities. By constructing a partially resolved genealogical tree with respect to these clusters, users can select a resolution most appropriate for their needs. We showed the accuracy of clusters at all levels of similarity and genealogical inference of iterative-PopPUNK based on simulated data and obtained phylogenetically concordant results in real data sets from seven bacterial species. Using two example sets of Escherichia/Shigella and Vibrio parahaemolyticus genomes, we show that iterative-PopPUNK can achieve cluster resolutions ranging from phylogroup down to sequence typing (ST). The iterative-PopPUNK algorithm is implemented in the "PopPUNK_iterate" program, available as part of the PopPUNK package.


Subjects
Algorithms , Genome, Bacterial , Bacteria/genetics , Cluster Analysis
6.
Nat Commun ; 13(1): 6842, 2022 11 11.
Article in English | MEDLINE | ID: mdl-36369175

ABSTRACT

Helicobacter pylori lives in the human stomach and has a population structure resembling that of its host. However, H. pylori from Europe and the Middle East trace substantially more ancestry from modern African populations than the humans that carry them. Here, we use a collection of Afro-Eurasian H. pylori genomes to show that this African ancestry is due to at least three distinct admixture events. H. pylori from East Asia, which have undergone little admixture, have accumulated many more non-synonymous mutations than African strains. European and Middle Eastern bacteria have elevated African ancestry at the sites of these mutations, implying selection to remove them during admixture. Simulations show that population fitness can be restored after bottlenecks by migration and subsequent admixture of small numbers of bacteria from non-bottlenecked populations. We conclude that recent spread of African DNA has been driven by deleterious mutations accumulated during the original out-of-Africa bottleneck.


Subjects
Helicobacter Infections , Helicobacter pylori , Humans , Helicobacter pylori/genetics , Helicobacter Infections/microbiology , Black People/genetics , Africa , Mutation
7.
mBio ; 13(6): e0215822, 2022 12 20.
Article in English | MEDLINE | ID: mdl-36286549

ABSTRACT

The Helicobacter pylori genome is more thoroughly mixed by homologous recombination than that of any other organism that has been investigated, leading to apparent "free recombination" within populations. A recent mBio article by F. Ailloud, I. Estibariz, G. Pfaffinger, and S. Suerbaum (mBio 13:e01811-22, 2022, https://doi.org/10.1128/mbio.01811-22) helps to elucidate the cellular machinery that is used to achieve these unusual rates of genetic exchange. Specifically, they show that the UvrC gene, which is part of the repair machinery for DNA damage caused by ultraviolet light, has evolved an additional function in H. pylori, allowing very short tracts of DNA, with a mean length of only 28 bp, to be imported into the genome during natural transformation.


Subjects
Helicobacter Infections , Helicobacter pylori , Humans , Helicobacter pylori/genetics , Homologous Recombination
8.
Microb Genom ; 8(2), 2022 02.
Article in English | MEDLINE | ID: mdl-35188454

ABSTRACT

The East Asian region, including China, Japan and Korea, accounts for half of gastric cancer deaths. However, different areas have contrasting gastric cancer incidences and the population structure of Helicobacter pylori in this ethnically diverse region is yet unknown. We aimed to investigate genomic differences in H. pylori between these areas to identify sequence polymorphisms associated with increased cancer risk. We analysed 381 H. pylori genomes collected from different areas of the three countries using phylogenetic and population genetic tools to characterize population differentiation. The functional consequences of SNPs with the highest fixation index (Fst) between subpopulations were examined by mapping amino acid changes on 3D protein structure, solved or modelled. Overall, 329/381 genomes belonged to the previously identified hspEAsia population, indicating that import of bacteria from other regions of the world has been uncommon. Seven subregional clusters were found within hspEAsia, related to subpopulations with various ethnicities, geographies and gastric cancer risks. Subpopulation-specific amino acid changes were found in multidrug exporters (hefC), transporters (frpB-4), outer membrane proteins (hopI) and several genes involved in host interaction, such as a catalase site, involved in H2O2 entrance, and a flagellin site mimicking host glycosylation. Several of the top hits, including frpB-4, hefC, alpB/hopB and hofC, have been found to be differentiated within the Americas in previous studies, indicating that a handful of genes may be key to local geographic adaptation. H. pylori within East Asia are not homogeneous but have become differentiated geographically at multiple loci that might have facilitated adaptation to local conditions and hosts. This has important implications for further evaluation of these changes in relation to the varying gastric cancer incidence between geographical areas in this region.
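
As a rough illustration of the fixation index mentioned in this abstract, the sketch below computes the textbook Wright's Fst for a single biallelic SNP from allele frequencies in two subpopulations. The abstract does not say which Fst estimator the authors used, so the formula, the function name `fst_biallelic` and its parameters are assumptions made for illustration only.

```python
def fst_biallelic(p1: float, p2: float, n1: int = 1, n2: int = 1) -> float:
    """Textbook Wright's Fst at one biallelic site (illustrative, not the paper's estimator).

    p1, p2: frequency of the same allele in subpopulations 1 and 2.
    n1, n2: sample sizes, used only as weights for averaging.
    """
    w1 = n1 / (n1 + n2)
    w2 = n2 / (n1 + n2)
    p_bar = w1 * p1 + w2 * p2                      # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                  # expected heterozygosity, pooled
    h_s = w1 * 2 * p1 * (1 - p1) + w2 * 2 * p2 * (1 - p2)  # mean within-subpopulation value
    if h_t == 0:
        return 0.0                                 # site is monomorphic overall
    return (h_t - h_s) / h_t


# A SNP fixed for different alleles in the two groups is maximally differentiated,
# while identical frequencies give no differentiation.
print(fst_biallelic(1.0, 0.0))
print(fst_biallelic(0.3, 0.3))
```

Ranking SNPs by such a value and inspecting the top of the list is one simple way to shortlist sites that differ most between subpopulations.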


Subjects
Helicobacter Infections , Helicobacter pylori , Stomach Neoplasms , Amino Acids , Genomics , Helicobacter Infections/epidemiology , Helicobacter Infections/microbiology , Helicobacter pylori/genetics , Humans , Hydrogen Peroxide , Phylogeny , Stomach Neoplasms/epidemiology , Stomach Neoplasms/genetics , United States
9.
PLoS Genet ; 17(9): e1009829, 2021 09.
Article in English | MEDLINE | ID: mdl-34582435

ABSTRACT

Measuring molecular evolution in bacteria typically requires estimation of the rate at which nucleotide changes accumulate in strains sampled at different times that share a common ancestor. This approach has been useful for dating ecological and evolutionary events that coincide with the emergence of important lineages, such as outbreak strains and obligate human pathogens. However, in multi-host (niche) transmission scenarios, where the pathogen is essentially an opportunistic environmental organism, sampling is often sporadic and rarely reflects the overall population, particularly when concentrated on clinical isolates. This means that approaches that assume recent common ancestry are not applicable. Here we present a new approach to estimate the molecular clock rate in Campylobacter that draws on the popular probability conundrum known as the 'birthday problem'. Using large genomic datasets and comparative genomic approaches, we use isolate pairs that share recent common ancestry to estimate the rate of nucleotide change for the population. Identifying synonymous and non-synonymous nucleotide changes, both within and outside of recombined regions of the genome, we quantify clock-like diversification to estimate synonymous rates of nucleotide change for the common pathogenic bacteria Campylobacter coli (2.4 × 10⁻⁶ s/s/y) and Campylobacter jejuni (3.4 × 10⁻⁶ s/s/y). Finally, using estimated total rates of nucleotide change, we infer the number of effective lineages within the sample time frame, analogous to a shared birthday, and assess the rate of turnover of lineages in our sample set over short evolutionary timescales. This provides a generalizable approach to calibrating rates in populations of environmental bacteria and shows that multiple lineages are maintained, implying that large-scale clonal sweeps may take hundreds of years or more in these species.
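
For readers unfamiliar with the 'birthday problem' that this abstract draws on, the following is a minimal sketch of the classic calculation and of its inversion. It is not the authors' estimator: the function names, the equal-frequency assumption, and the mapping of "days" to lineages are hypothetical simplifications for illustration.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday (classic problem)."""
    if n > days:
        return 1.0                      # pigeonhole: a match is certain
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def categories_from_matching_pairs(matching_pairs: int, total_pairs: int) -> float:
    """Invert the pairwise match rate, assuming equally frequent categories.

    If there are D equally likely categories, a random pair matches with
    probability 1/D, so D can be estimated as total_pairs / matching_pairs.
    """
    if matching_pairs <= 0:
        raise ValueError("need at least one matching pair to estimate")
    return total_pairs / matching_pairs


print(prob_shared_birthday(23))                 # just over one half
print(categories_from_matching_pairs(10, 3650))
```

The second function captures the intuition behind the analogy: the more often sampled pairs turn out to share recent ancestry, the fewer effective lineages the sample is likely to contain.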


Subjects
Campylobacter/genetics , Evolution, Molecular , Campylobacter/classification , Genes, Bacterial , Genetic Variation , Phylogeny , Species Specificity
10.
Genome Biol ; 21(1): 138, 2020 06 08.
Article in English | MEDLINE | ID: mdl-32513234

ABSTRACT

BACKGROUND: Eubacterium rectale is one of the most prevalent human gut bacteria, but its diversity and population genetics are not well understood because large-scale whole-genome investigations of this microbe have not been carried out. RESULTS: Here, we leverage metagenomic assembly followed by a reference-based binning strategy to screen over 6500 gut metagenomes spanning geography and lifestyle and reconstruct over 1300 E. rectale high-quality genomes from metagenomes. We extend previous results of biogeographic stratification, identifying a new subspecies predominantly found in African individuals and showing that closely related non-human primates do not harbor E. rectale. Comparison of pairwise genetic and geographic distances between subspecies suggests that isolation by distance and co-dispersal with human populations might have contributed to shaping the contemporary population structure of E. rectale. We confirm that a relatively recently diverged E. rectale subspecies specific to Europe consistently lacks motility operons and that it is immotile in vitro, probably due to ancestral genetic loss. The same subspecies exhibits expansion of its carbohydrate metabolism gene repertoire including the acquisition of a genomic island strongly enriched in glycosyltransferase genes involved in exopolysaccharide synthesis. CONCLUSIONS: Our study provides new insights into the population structure and ecology of E. rectale and shows that shotgun metagenomes can enable population genomics studies of microbiota members at a resolution and scale previously attainable only by extensive isolate sequencing.


Subjects
Eubacterium/genetics , Gastrointestinal Microbiome , Genome, Bacterial , Adolescent , Adult , Aged , Carbohydrate Metabolism/genetics , Child , Child, Preschool , Glycosyltransferases/genetics , Humans , Infant , Metagenome , Middle Aged , Phylogeography , Young Adult
11.
Elife ; 9, 2020 03 20.
Article in English | MEDLINE | ID: mdl-32195663

ABSTRACT

Investigating fitness interactions in natural populations remains a considerable challenge. We take advantage of the unique population structure of Vibrio parahaemolyticus, a bacterial pathogen of humans and shrimp, to perform a genome-wide screen for coadapted genetic elements. We identified 90 interaction groups (IGs) involving 1,560 coding genes. 82 IGs are between accessory genes, many of which have functions related to carbohydrate transport and metabolism. Only 8 involve both core and accessory genomes. The largest includes 1,540 SNPs in 82 genes and 338 accessory genome elements, many involved in lateral flagella and cell wall biogenesis. The interactions have a complex hierarchical structure encoding at least four distinct ecological strategies. One strategy involves a divergent profile in multiple genome regions, while the others involve fewer genes and are more plastic. Our results imply that most genetic alliances are ephemeral but that increasingly complex strategies can evolve and eventually cause speciation.


Subjects
Adaptation, Physiological/genetics , Gene Expression Regulation, Bacterial/physiology , Vibrio parahaemolyticus/genetics , Genetic Speciation , Genome, Bacterial , Genome-Wide Association Study
12.
Elife ; 8, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31305242

ABSTRACT

Red algae have adapted to extreme environments by acquiring genes from bacteria and archaea.


Subjects
Gene Transfer, Horizontal , Rhodophyta , Archaea/genetics , Genome , Phylogeny
13.
Microb Genom ; 5(8), 2019 08.
Article in English | MEDLINE | ID: mdl-31347998

ABSTRACT

Bacteria and archaea make up most of natural diversity, but the mechanisms that underlie the origin and maintenance of prokaryotic species are poorly understood. We investigated the speciation history of the genus Salmonella, an ecologically diverse bacterial lineage, within which S. enterica subsp. enterica is responsible for important human food-borne infections. We performed a survey of diversity across a large reference collection using multilocus sequence typing, followed by genome sequencing of distinct lineages. We identified 11 distinct phylogroups, 3 of which were previously undescribed. Strains assigned to S. enterica subsp. salamae are polyphyletic, with two distinct lineages that we designate Salamae A and B. Strains of the subspecies houtenae are subdivided into two groups, Houtenae A and B, and are both related to Selander's group VII. A phylogroup we designate VIII was previously unknown. A simple binary fission model of speciation cannot explain observed patterns of sequence diversity. In the recent past, there have been large-scale hybridization events involving an unsampled ancestral lineage and three distantly related lineages of the genus that have given rise to Houtenae A, Houtenae B and VII. We found no evidence for ongoing hybridization in the other eight lineages, but detected subtler signals of ancient recombination events. We are unable to fully resolve the speciation history of the genus, which might have involved additional speciation-by-hybridization or multi-way speciation events. Our results imply that traditional models of speciation by binary fission and divergence are not sufficient to account for Salmonella evolution.


Subjects
Salmonella enterica/genetics , Salmonella/classification , Salmonella/genetics , Bacterial Typing Techniques/methods , Biological Evolution , Classification/methods , Evolution, Molecular , Genetic Speciation , Multilocus Sequence Typing/methods , Nucleic Acid Hybridization/methods , Phylogeny , Salmonella enterica/metabolism
14.
ISME J ; 13(10): 2578-2588, 2019 10.
Article in English | MEDLINE | ID: mdl-31235840

ABSTRACT

Humans have profoundly affected the ocean environment but little is known about anthropogenic effects on the distribution of microbes. Vibrio parahaemolyticus is found in warm coastal waters and causes gastroenteritis in humans and economically significant disease in shrimps. Based on data from 1103 genomes of environmental and clinical isolates, we show that V. parahaemolyticus is divided into four diverse populations, VppUS1, VppUS2, VppX and VppAsia. The first two are largely restricted to the US and Northern Europe, while the others are found worldwide, with VppAsia making up the great majority of isolates in the seas around Asia. Patterns of diversity within and between the populations are consistent with them having arisen by progressive divergence via genetic drift during geographical isolation. However, we find that there is substantial overlap in their current distribution. These observations can be reconciled without requiring genetic barriers to exchange between populations if long-range dispersal has increased dramatically in the recent past. We found that VppAsia isolates from the US have an average of 1.01% more shared ancestry with VppUS1 and VppUS2 isolates than VppAsia isolates from Asia itself. Based on time calibrated trees of divergence within epidemic lineages, we estimate that recombination affects about 0.017% of the genome per year, implying that the genetic mixture has taken place within the last few decades. These results suggest that human activity, such as shipping, aquatic products trade and increased human migration between continents, are responsible for the change of distribution pattern of this species.


Subjects
Vibrio Infections/microbiology , Vibrio parahaemolyticus/classification , Vibrio parahaemolyticus/isolation & purification , Genetic Variation , Genome, Bacterial , Humans , Phylogeny , Shellfish/microbiology , Vibrio parahaemolyticus/genetics
15.
Nat Commun ; 9(1): 3258, 2018 08 14.
Article in English | MEDLINE | ID: mdl-30108219

ABSTRACT

Genetic clustering algorithms, implemented in programs such as STRUCTURE and ADMIXTURE, have been used extensively in the characterisation of individuals and populations based on genetic data. A successful example is the reconstruction of the genetic history of African Americans as a product of recent admixture between highly differentiated populations. Histories can also be reconstructed using the same procedure for groups that do not have admixture in their recent history, where recent genetic drift is strong or that deviate in other ways from the underlying inference model. Unfortunately, such histories can be misleading. We have implemented an approach, badMIXTURE, to assess the goodness of fit of the model using the ancestry "palettes" estimated by CHROMOPAINTER and apply it to both simulated data and real case studies. Combining these complementary analyses with additional methods that are designed to test specific hypotheses allows a richer and more robust analysis of recent demographic history.


Subjects
Algorithms , Genetics, Population , Asian People/genetics , Black People/genetics , Computer Simulation , Humans , Internationality
16.
BMC Biol ; 16(1): 84, 2018 08 02.
Article in English | MEDLINE | ID: mdl-30071832

ABSTRACT

BACKGROUND: Helicobacter pylori are stomach-dwelling bacteria that are present in about 50% of the global population. Infection is asymptomatic in most cases, but it has been associated with gastritis, gastric ulcers and gastric cancer. Epidemiological evidence shows that progression to cancer depends upon the host and pathogen factors, but questions remain about why cancer phenotypes develop in a minority of infected people. Here, we use comparative genomics approaches to understand how genetic variation amongst bacterial strains influences disease progression. RESULTS: We performed a genome-wide association study (GWAS) on 173 H. pylori isolates from the European population (hpEurope) with known disease aetiology, including 49 from individuals with gastric cancer. We identified SNPs and genes that differed in frequency between isolates from patients with gastric cancer and those with gastritis. The gastric cancer phenotype was associated with the presence of babA and genes in the cag pathogenicity island, one of the major virulence determinants of H. pylori, as well as non-synonymous variations in several less well-studied genes. We devised a simple risk score based on the risk level of associated elements present, which has the potential to identify strains that are likely to cause cancer but will require refinement and validation. CONCLUSION: There are a number of challenges to applying GWAS to bacterial infections, including the difficulty of obtaining matched controls, multiple strain colonization and the possibility that causative strains may not be present when disease is detected. Our results demonstrate that bacterial factors have a sufficiently strong influence on disease progression that even a small-scale GWAS can identify them. Therefore, H. pylori GWAS can elucidate mechanistic pathways to disease and guide clinical treatment options, including for asymptomatic carriers.


Subjects
Genetic Variation , Genome, Bacterial , Genome-Wide Association Study , Helicobacter pylori/genetics , Stomach Neoplasms/microbiology , Gastritis/etiology , Humans , Metaplasia/etiology , Polymorphism, Single Nucleotide , Risk , Stomach Neoplasms/epidemiology , Virulence Factors/genetics
17.
Mol Biol Evol ; 35(5): 1284-1290, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29474601

ABSTRACT

Powerful approaches to inferring recent or current population structure based on nearest neighbor haplotype "coancestry" have so far been inaccessible to users without high quality genome-wide haplotype data. With a boom in nonmodel organism genomics, there is a pressing need to bring these methods to communities without access to such data. Here, we present RADpainter, a new program designed to infer the coancestry matrix from restriction-site-associated DNA sequencing (RADseq) data. We combine this program together with a previously published MCMC clustering algorithm into fineRADstructure, a complete, easy to use, and fast population inference package for RADseq data (https://github.com/millanek/fineRADstructure; last accessed February 24, 2018). Finally, with two example data sets, we illustrate its use, benefits, and robustness to missing RAD alleles in double digest RAD sequencing.


Subjects
Genomics/methods , Software , Alleles , Caryophyllaceae/genetics , Population , Sequence Analysis, DNA
19.
PLoS Genet ; 13(2): e1006546, 2017 02.
Article in English | MEDLINE | ID: mdl-28231283

ABSTRACT

For the last 500 years, the Americas have been a melting pot both for genetically diverse humans and for the pathogenic and commensal organisms associated with them. One such organism is the stomach-dwelling bacterium Helicobacter pylori, which is highly prevalent in Latin America where it is a major current public health challenge because of its strong association with gastric cancer. By analyzing the genome sequence of H. pylori isolated in North, Central and South America, we found evidence for admixture between H. pylori of European and African origin throughout the Americas, without substantial input from pre-Columbian (hspAmerind) bacteria. In the US, strains of African and European origin have remained genetically distinct, while in Colombia and Nicaragua, bottlenecks and rampant genetic exchange amongst isolates have led to the formation of national gene pools. We found three outer membrane proteins with atypical levels of Asian ancestry in American strains, as well as alleles that were nearly fixed specifically in South American isolates, suggesting a role for the ethnic makeup of hosts in the colonization of incoming strains. Our results show that new H. pylori subpopulations can rapidly arise, spread and adapt during times of demographic flux, and suggest that differences in transmission ecology between high and low prevalence areas may substantially affect the composition of bacterial populations.


Subjects
Helicobacter Infections/genetics , Helicobacter pylori/genetics , Phylogeny , Stomach Neoplasms/genetics , Alleles , DNA, Mitochondrial/genetics , Evolution, Molecular , Genome, Bacterial , Helicobacter Infections/epidemiology , Helicobacter pylori/pathogenicity , Humans , Indians, North American , Latin America , Stomach Neoplasms/epidemiology , Stomach Neoplasms/microbiology , White People
20.
mSystems ; 1(3), 2016.
Article in English | MEDLINE | ID: mdl-27822531

ABSTRACT

Metagenomic profiling is challenging in part because of the highly uneven sampling of the tree of life by genome sequencing projects and the limitations imposed by performing phylogenetic inference at fixed taxonomic ranks. We present the algorithm MetaPalette, which uses long k-mer sizes (k = 30, 50) to fit a k-mer "palette" of a given sample to the k-mer palette of reference organisms. By modeling the k-mer palettes of unknown organisms, the method also gives an indication of the presence, abundance, and evolutionary relatedness of novel organisms present in the sample. The method returns a traditional, fixed-rank taxonomic profile which is shown on independently simulated data to be one of the most accurate to date. Tree figures are also returned that quantify the relatedness of novel organisms to reference sequences, and the accuracy of such figures is demonstrated on simulated spike-ins and a metagenomic soil sample. The software implementing MetaPalette is available at: https://github.com/dkoslicki/MetaPalette. Pretrained databases are included for Archaea, Bacteria, Eukaryota, and viruses. IMPORTANCE Taxonomic profiling is a challenging first step when analyzing a metagenomic sample. This work presents a method that facilitates fine-scale characterization of the presence, abundance, and evolutionary relatedness of organisms present in a given sample but absent from the training database. We calculate a "k-mer palette" which summarizes the information from all reads, not just those in conserved genes or containing taxon-specific markers. The compositions of palettes are easy to model, allowing rapid inference of community composition. In addition to providing strain-level information where applicable, our approach provides taxonomic profiles that are more accurate than those of competing methods. Author Video: An author video summary of this article is available.
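
To make the idea of a k-mer profile concrete, here is a minimal sketch that counts k-mers in a sequence and compares two sequences by the Jaccard similarity of their k-mer sets. This is a generic illustration and not the MetaPalette algorithm or its API: the function names are hypothetical, and the toy example uses a very small k rather than the k = 30 or 50 mentioned in the abstract.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity of the sets of distinct k-mers in two profiles."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0                      # both profiles empty
    return len(set_a & set_b) / len(union)


# Toy sequences (made up for illustration); real genomes would use a much larger k.
sample = kmer_counts("ACGTACGTGA", 3)
reference = kmer_counts("ACGTACGTCC", 3)
print(sample.most_common(2))
print(jaccard(sample, reference))
```

Longer k-mers are more specific to a given genome, which is presumably why the method in the abstract favours large k; the trade-off is that a single mismatch removes k overlapping k-mers from the shared set.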
