1.
BMC Bioinformatics ; 25(1): 237, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38997633

ABSTRACT

BACKGROUND: With the emergence of Oxford Nanopore technology, on-site sequencing of 16S rRNA from environments is now available. Because of the error level and structure of such data, their analysis requires a database of reference sequences. However, many taxa from complex and diverse environments are poorly represented in publicly available databases. In this paper, we propose the METASEED pipeline for the reconstruction of full-length 16S sequences from such environments, in order to improve the reference for the subsequent use of on-site sequencing. RESULTS: We show that combining high-precision short-read sequencing of both 16S and full metagenome from the same samples allows us to reconstruct high-quality 16S sequences from the more abundant taxa. A significant novelty is the carefully designed collection of metagenome reads that matches the 16S amplicons, based on a combination of uniqueness and abundance. Compared to alternative approaches, this produces superior results. CONCLUSION: Our pipeline will facilitate numerous studies associated with various unknown microorganisms, thus allowing the comprehension of diverse environments. The pipeline is a potential tool for generating a full-length 16S rRNA gene database for any environment.


Subjects
Metagenome , 16S Ribosomal RNA , 16S Ribosomal RNA/genetics , Metagenome/genetics , DNA Sequence Analysis/methods , Genetic Databases
2.
Scand J Immunol ; 99(4): e13346, 2024 Apr.
Article in English | MEDLINE | ID: mdl-39007947

ABSTRACT

Age-related gut bacterial changes during infancy have been widely studied, but it remains unknown how these changes are associated with immune cell composition. This study's aim was to explore whether the temporal development of gut bacteria during infancy prospectively affects immune cell composition. Faecal bacteria and short-chain fatty acids were analysed from 67 PreventADALL study participants at four timepoints (birth to 12 months) using reduced metagenome sequencing and gas chromatography. Immune cell frequencies were assessed using mass cytometry in whole blood samples at 12 months. The infants clustered into four groups based on immune cell composition: clusters 1 and 2 showed a high relative abundance of naïve cells, cluster 3 exhibited increased abundance of classical and non-classical monocytes, and clusters 3 and 4 had elevated neutrophil levels. At all ages, we observed significant associations between the gut microbiota and immune cell clusters; however, these generally involved low-abundance species. Only at 6 months of age did we observe significant associations between abundant (>8%) species and immune cell clusters. Bifidobacterium adolescentis and Porphyromonadaceae were associated with cluster 1, while Bacteroides fragilis and Bifidobacterium longum were associated with clusters 3 and 4, respectively. These species have been linked to T-cell polarization and maturation. No significant correlations were found between short-chain fatty acids and immune cell composition. Our findings suggest that abundant gut bacteria at 6 months may influence immune cell frequencies at 12 months, highlighting the potential role of gut microbiota in shaping later immune cell composition.


Subjects
Feces , Gastrointestinal Microbiome , Humans , Infant , Gastrointestinal Microbiome/immunology , Male , Female , Feces/microbiology , Newborn Infant , Bacteria/immunology , Bacteria/classification , Volatile Fatty Acids/metabolism , Metagenome , Prospective Studies
3.
BMC Genomics ; 24(1): 295, 2023 May 31.
Article in English | MEDLINE | ID: mdl-37259063

ABSTRACT

BACKGROUND: Our knowledge about the ecological role of bacterial antimicrobial peptides (bacteriocins) in the human gut is limited, particularly in relation to their role in the diversification of the gut microbiota during early life. The aim of this paper was therefore to address associations between bacteriocins and bacterial diversity in the human gut microbiota. To investigate this, we did an extensive screening of 2564 healthy human gut metagenomes for the presence of predicted bacteriocin-encoding genes, comparing bacteriocin gene presence to strain diversity and age. RESULTS: We found that the abundance of bacteriocin genes was significantly higher in infant-like metagenomes (< 2 years) compared to adult-like metagenomes (2-107 years). By comparing infant-like metagenomes with and without a given bacteriocin, we found that bacteriocin presence was associated with increased strain diversities. CONCLUSIONS: Our findings indicate that bacteriocins may play a role in the strain diversification during the infant gut microbiota establishment.


Subjects
Gastrointestinal Microbiome , Metagenome , Humans , Preschool Child , Child , Adolescent , Young Adult , Adult , Middle Aged , Aged , Aged 80 and over , Data Mining , Gastrointestinal Microbiome/drug effects , Bacteriocins/pharmacology , Genome
4.
Appl Environ Microbiol ; 89(7): e0078923, 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37338379

ABSTRACT

Bacteroides and Phocaeicola, members of the family Bacteroidaceae, are among the first microbes to colonize the human infant gut. While it is known that these microbes can be transmitted from mother to child, our understanding of the specific strains that are shared and potentially transmitted is limited. In this study, we aimed to investigate the shared strains of Bacteroides and Phocaeicola in mothers and their infants. We analyzed fecal samples from pregnant women recruited at 18 weeks of gestation from the PreventADALL study, as well as offspring samples from early infancy, including skin swab samples taken within 10 min after birth, the first available fecal sample (meconium), and fecal samples at 3 months of age. We screened 464 meconium samples for Bacteroidaceae, with subsequent selection of 144 mother-child pairs for longitudinal analysis, based on the presence of Bacteroidaceae, longitudinal sample availability, and delivery mode. Our results showed that Bacteroidaceae members were mainly detected in samples from vaginally delivered infants. We identified high prevalences of Phocaeicola vulgatus, Phocaeicola dorei, Bacteroides caccae, and Bacteroides thetaiotaomicron in mothers and vaginally born infants. However, at the strain level, we observed high prevalences of only two strains: a B. caccae strain and a P. vulgatus strain. Notably, the B. caccae strain was identified as a novel component of mother-child shared strains, and its high prevalence was also observed in publicly available metagenomes worldwide. Our findings suggest that mode of delivery may play a role in shaping the early colonization of the infant gut microbiota, in particular the colonization of Bacteroidaceae members. IMPORTANCE: Our study provides evidence that Bacteroidaceae strains present on infants' skin within 10 min after birth, in meconium samples, and in fecal samples at 3 months of age in vaginally delivered infants are shared with their mothers. Using strain resolution analyses, we identified two strains, belonging to Bacteroides caccae and Phocaeicola vulgatus, as shared between mothers and their infants. Interestingly, the B. caccae strain showed a high prevalence worldwide, while the P. vulgatus strain was less common. Our findings also showed that vaginal delivery was associated with early colonization of Bacteroidaceae members, whereas cesarean section delivery was associated with delayed colonization. Given the potential for these microbes to influence the colonic environment, our results suggest that understanding the bacterial-host relationship at the strain level may have implications for infant health and development later in life.


Subjects
Bacteroidaceae , Cesarean Section , Infant , Humans , Female , Pregnancy , Vertical Infectious Disease Transmission , Bacteroides/genetics , Feces , Mother-Child Relations
5.
Appl Environ Microbiol ; 87(6), 2021 Feb 26.
Article in English | MEDLINE | ID: mdl-33452029

ABSTRACT

The nutritional drivers for mother-child sharing of bacteria and the corresponding longitudinal trajectory of the infant gut microbiota development are not yet completely settled. We therefore aimed to characterize the mother-child sharing and the inferred nutritional utilization potential for the gut microbiota from a large unselected cohort. We analyzed the gut microbiota in depth in 100 mother-child pairs enrolled antenatally from the general population-based Preventing Atopic Dermatitis and Allergies in Children (PreventADALL) cohort. Fecal samples collected at gestational week 18 for mothers and at birth (meconium), 3, 6, and 12 months for infants were analyzed by reduced metagenome sequencing to determine metagenome size and taxonomic composition. The nutrient utilization potential was determined based on the Virtual Metabolic Human (VMH, www.vmh.life) database. The estimated median metagenome size was ∼150 million base pairs (bp) for mothers and ∼20 million bp at birth for the children. Longitudinal analyses revealed mother-child sharing (P < 0.05, chi-square test) from birth up to 6 months for 3 prevalent Bacteroides species (prevalence, >25% for all age groups). In a multivariate analysis of variance (ANOVA), the mother-child-shared Bacteroides were associated with vaginal delivery (1.7% explained variance, P = 0.0001). Both vaginal delivery and mother-child sharing were associated with host-derived mucins as nutrient sources. The age-related increase in metagenome size corresponded to an increased diversity in nutrient utilization, with dietary polysaccharides as the main age-related factor. Our results support host-derived mucins as potential selection means for mother-child sharing of initial colonizers, while the age-related increase in diversity was associated with dietary polysaccharides. IMPORTANCE: The initial bacterial colonization of human infants is crucial for lifelong health. Understanding the factors driving this colonization will therefore be of great importance. Here, we used a novel high-taxonomic-resolution approach to deduce the nutrient utilization potential of the infant gut microbiota in a large longitudinal mother-child cohort. We found mucins as potential selection means for the initial colonization of mother-child-shared bacteria, while the transition to a more adult-like microbiota was associated with dietary polysaccharide utilization potential. This knowledge will be important for a future understanding of the importance of diet in shaping the gut microbiota composition and development during infancy.


Subjects
Feces/microbiology , Gastrointestinal Microbiome , Mother-Child Relations , Mucins , Bacteria , Obstetric Delivery , Female , Humans , Infant , Newborn Infant , Metagenome , Mothers , Nutrients
6.
Appl Environ Microbiol ; 84(2), 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-29101198

ABSTRACT

Gut microbiota associations through habitat transitions are fundamentally important yet poorly understood. One such habitat transition is the migration from freshwater to saltwater for anadromous fish, such as salmon. The aim of the current work was therefore to determine the freshwater-to-saltwater transition impact on the gut microbiota in farmed Atlantic salmon, with dietary interventions resembling freshwater and saltwater diets with respect to fatty acid composition. Using deep 16S rRNA gene sequencing and quantitative PCR, we found that the freshwater-to-saltwater transition had a major association with the microbiota composition and quantity, while diet did not show significant associations with the microbiota. In saltwater there was a 100-fold increase in bacterial quantity, with a relative increase of Firmicutes and a relative decrease of both Actinobacteria and Proteobacteria. Irrespective of an overall shift in microbiota composition from freshwater to saltwater, we identified three core clostridia and one Lactobacillus-affiliated phylotype with wide geographic distribution that were highly prevalent and co-occurring. Taken together, our results support the importance of the dominating bacteria in the salmon gut, with the freshwater microbiota being immature. Due to the low number of potentially host-associated bacterial species in the salmon gut, we believe that farmed salmon can represent an important model for future understanding of host-bacterium interactions in aquatic environments. IMPORTANCE: Little is known about factors affecting the interindividual distribution of gut bacteria in aquatic environments. We have shown that there is a core of four highly prevalent and co-occurring bacteria irrespective of feed and freshwater-to-saltwater transition. The potential host interactions of the core bacteria, however, need to be elucidated further.


Subjects
Bacteria/isolation & purification , Bacterial Physiological Phenomena , Gastrointestinal Microbiome/physiology , Host Microbial Interactions , Salmo salar/microbiology , Actinobacteria/genetics , Actinobacteria/isolation & purification , Animal Feed , Animals , Aquaculture , Bacteria/classification , Bacteria/genetics , Firmicutes/genetics , Firmicutes/isolation & purification , Fresh Water , Gastrointestinal Microbiome/genetics , Lactobacillus/genetics , Lactobacillus/isolation & purification , Phylogeny , Polymerase Chain Reaction , Proteobacteria/genetics , Proteobacteria/isolation & purification , 16S Ribosomal RNA/genetics , Salmo salar/anatomy & histology , Salmo salar/physiology , Seawater
7.
BMC Bioinformatics ; 18(1): 172, 2017 Mar 16.
Article in English | MEDLINE | ID: mdl-28302051

ABSTRACT

BACKGROUND: Taxonomic classification based on the 16S rRNA gene sequence is important for the profiling of microbial communities. In addition to giving the best possible accuracy, it is also important to quantify uncertainties in the classifications. RESULTS: We present an R package with tools for making such classifications, where the heavy computations are implemented in C++ but operated through the standard R interface. The user may train classifiers based on specialized data sets, but we also supply a ready-to-use function trained on a comprehensive training data set designed specifically for this purpose. This tool also includes some novel ways to quantify uncertainties in the classifications. CONCLUSIONS: Based on input sequences of varying length and quality, we demonstrate how the output from the classifications can be used to obtain high quality taxonomic assignments from 16S sequences within the R computing environment. The package is publicly available at the Comprehensive R Archive Network.


Subjects
16S Ribosomal RNA/classification , Software , Area Under Curve , Bacteria/genetics , High-Throughput Nucleotide Sequencing , 16S Ribosomal RNA/genetics , 16S Ribosomal RNA/metabolism , ROC Curve , RNA Sequence Analysis
8.
BMC Genomics ; 18(1): 151, 2017 Feb 10.
Article in English | MEDLINE | ID: mdl-28187704

ABSTRACT

BACKGROUND: The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions. RESULTS: We found that GC content in coding regions is significantly higher in core genomes than in accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes. CONCLUSION: The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes are most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes a signature of selection or of selectively neutral processes.
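
The relative entropy measure described in this abstract can be illustrated with a minimal sketch. It assumes that "difference between observed and expected trinucleotide frequencies" means a Kullback-Leibler divergence in bits, computed over overlapping trinucleotides, with expected frequencies taken as products of mononucleotide frequencies. The abstract does not state the exact formula, so the function name `relative_entropy` and these choices are assumptions rather than the authors' implementation.

```python
from collections import Counter
from math import log2

def relative_entropy(seq: str) -> float:
    """KL divergence (bits) between observed trinucleotide frequencies and
    those expected from mononucleotide frequencies (assumed definition)."""
    seq = seq.upper()
    # Mononucleotide frequencies over unambiguous bases only.
    mono = Counter(b for b in seq if b in "ACGT")
    total_mono = sum(mono.values())
    p = {b: mono[b] / total_mono for b in "ACGT"}
    # Overlapping trinucleotides; words with ambiguous bases are skipped.
    tri = Counter(
        seq[i:i + 3] for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= set("ACGT")
    )
    total_tri = sum(tri.values())
    divergence = 0.0
    for word, count in tri.items():
        observed = count / total_tri
        expected = p[word[0]] * p[word[1]] * p[word[2]]
        divergence += observed * log2(observed / expected)
    return divergence
```

Under this definition a sequence whose trinucleotides are fully explained by its base composition scores 0, while a strongly patterned sequence such as a repeated `ACGT` motif scores high.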


Subjects
Molecular Evolution , Microbial Genome/genetics , Nucleotides/chemistry , Genetic Selection , Base Composition , GC Rich Sequence , Nucleotides/genetics
9.
Microb Ecol Health Dis ; 28(1): 1352433, 2017.
Article in English | MEDLINE | ID: mdl-28959179

ABSTRACT

Background: Colorectal cancer (CRC) is one of the most common cancer types worldwide. The role of the intestinal microbiota in CRC, however, is not well established. In particular, the co-variation between age, tumor progression and microbiota remains largely unknown. Objective and design: We therefore used a recently developed A/J Min/+ mouse model resembling human CRC to investigate how microbial composition in cecum correlates with tumor progression, butyrate and age. Results: We found that the association between the gut microbiota and tumor load was stronger, by far, than the association with both butyrate and age. The strongest direct tumor association was found for mucosal bacteria, with nearly 60% of the significantly correlating operational taxonomic units being correlated with CRC tumor load alone. Conclusion: We favor a systemic association between tumor load and microbiota, since the correlations are associated with tumor load in gut segments other than the cecum (both small and large intestine).

10.
Bioinformatics ; 31(11): 1708-15, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25644268

ABSTRACT

MOTIVATION: The explosion of whole-genome sequencing (WGS) as a tool in the mapping and understanding of genomes has been accompanied by an equally massive report of tools and pipelines for the analysis of DNA copy number variation (CNV). Most currently available tools are designed specifically for human genomes, with comparatively little literature devoted to CNVs in prokaryotic organisms. However, there are several idiosyncrasies in prokaryotic WGS data. This work proposes a step-by-step approach for detection and quantification of copy number variants specifically aimed at prokaryotes. RESULTS: After aligning WGS reads to a reference genome, we count the individual reads in a sliding window and normalize these counts for bias introduced by differences in GC content. We then investigate the coverage in two fundamentally different ways: (i) by employing a hidden Markov model and (ii) by repeated sampling with replacement (bootstrapping) on each individual gene. The latter bypasses the complex problem of breakpoint determination. To demonstrate our method, we apply it to real and simulated WGS data and benchmark it against two popular methods for CNV detection. The proposed methodology will in some cases represent a significant jump in accuracy from other current methods. AVAILABILITY AND IMPLEMENTATION: CNOGpro is written entirely in the R programming language and is available from the CRAN repository (http://cran.r-project.org) under the GNU General Public License.
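
The first two steps described above, counting reads per window and correcting for GC bias, might look roughly like the following sketch. It is not CNOGpro's code (the package is written in R). The function names, the fixed-size GC bins, and the median-based scaling are all assumptions chosen for illustration.

```python
from statistics import median

def window_counts(read_starts, genome_len, window=100, step=100):
    """Count reads whose start position falls in each window [s, s + window)."""
    starts = range(0, genome_len - window + 1, step)
    return [sum(s <= r < s + window for r in read_starts) for s in starts]

def gc_fraction(seq):
    """Fraction of G and C bases in a window's reference sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_normalize(counts, gc_values, bin_pct=5):
    """Scale each count by (global median) / (median of its GC bin).

    Windows are grouped into GC bins of width bin_pct percent (an assumed,
    deliberately simple correction scheme).
    """
    bins = [round(g * 100) // bin_pct for g in gc_values]
    global_med = median(counts)
    by_bin = {}
    for b, c in zip(bins, counts):
        by_bin.setdefault(b, []).append(c)
    bin_med = {b: median(cs) for b, cs in by_bin.items()}
    return [c * global_med / bin_med[b] if bin_med[b] > 0 else 0.0
            for b, c in zip(bins, counts)]
```

With `step` smaller than `window` the windows overlap, which is closer to a sliding window in the strict sense. After normalization, windows with similar GC content share a common baseline, so remaining deviations are more plausibly copy number signal.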


Subjects
DNA Copy Number Variations , Bacterial Genome , Software , Base Composition , Archaeal Genome , Genomics/methods , High-Throughput Nucleotide Sequencing , Internet , DNA Sequence Analysis
11.
BMC Bioinformatics ; 16: 79, 2015 Mar 12.
Article in English | MEDLINE | ID: mdl-25888166

ABSTRACT

BACKGROUND: A pan-genome is defined as the set of all unique gene families found in one or more strains of a prokaryotic species. Due to the extensive within-species diversity in the microbial world, the pan-genome is often many times larger than a single genome. Studies of pan-genomes have become popular due to the easy access to whole-genome sequence data for prokaryotes. A pan-genome study reveals species diversity and gene families that may be of special interest, e.g., because of their role in bacterial survival or their ability to discriminate strains. RESULTS: We present an R package for the study of prokaryotic pan-genomes. The R computing environment harbors endless possibilities with respect to statistical analyses and graphics. External free software is used for the heavy computations involved, and the R package provides functions for building a computational pipeline. CONCLUSIONS: We demonstrate parts of the package on a data set for the Gram-positive bacterium Enterococcus faecalis. The package is free to download and install from The Comprehensive R Archive Network.
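
The definition of a pan-genome given above maps directly onto set operations. The sketch below assumes gene families have already been clustered and that each strain is represented as a set of family IDs. The strain names and family IDs are made up for illustration, and the functions are not part of the R package described.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets.

    genomes: dict mapping a strain name to the set of gene-family IDs it carries.
    """
    family_sets = list(genomes.values())
    pan = set().union(*family_sets)        # families found in at least one strain
    core = set.intersection(*family_sets)  # families found in every strain
    return pan, core

def presence_absence(genomes):
    """Build a strain-by-family 0/1 matrix as nested dicts."""
    pan, _ = pan_and_core(genomes)
    return {strain: {fam: int(fam in fams) for fam in sorted(pan)}
            for strain, fams in genomes.items()}

# Hypothetical toy data: three strains with invented gene-family IDs.
strains = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam5"},
}
pan, core = pan_and_core(strains)
print(len(pan), len(core))  # prints: 5 1
```

Even in this toy case the pan-genome (5 families) is larger than any single genome (at most 3), which mirrors the point made in the abstract.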


Subjects
Algorithms , Computational Biology/methods , Enterococcus/genetics , Bacterial Genome , Genomics/methods , Software , Enterococcus/classification
12.
BMC Bioinformatics ; 16: 205, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26130333

ABSTRACT

BACKGROUND: The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. RESULTS: The differences in classification error obtained by the methods seemed small, but they were stable for both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau. CONCLUSIONS: We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data is needed to improve classification from here. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, the need for a better and more universal and robust training data set is crucial.
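
Counting K-mers with a sliding window, and using the counts in a multinomial naïve Bayes rule, can be sketched as follows. This is a generic illustration with add-one smoothing, not a reimplementation of the RDP classifier or of any of the five methods compared. The default `k=3`, the pseudocount, and all function names are assumptions.

```python
from collections import Counter
from math import log

def kmer_counts(seq, k=3):
    """Count overlapping K-mers by sliding a window of width k along seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def train(labelled_seqs, k=3):
    """Sum K-mer counts per class label from (label, sequence) pairs."""
    model = {}
    for label, seq in labelled_seqs:
        model.setdefault(label, Counter()).update(kmer_counts(seq, k))
    return model

def classify(model, seq, k=3, pseudo=1.0):
    """Return the label with the highest multinomial log-likelihood."""
    query = kmer_counts(seq, k)
    vocab = 4 ** k  # number of possible K-mers over A, C, G, T
    best_label, best_score = None, float("-inf")
    for label, counts in model.items():
        total = sum(counts.values()) + pseudo * vocab
        score = sum(n * log((counts[w] + pseudo) / total)
                    for w, n in query.items())
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A usage example with two invented training classes: `classify(train([("A_rich", "AAAAATAAAAAAGAAAA"), ("GC_rich", "GCGCGGCCGCGCGGCG")]), "AAAATAAA")` returns `"A_rich"`. Real 16S classifiers typically use longer K-mers and far larger training sets.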


Subjects
Algorithms , Bacteria/classification , Bacteria/genetics , Biodiversity , Classification/methods , 16S Ribosomal RNA/genetics , Bayes Theorem
13.
Infect Immun ; 83(5): 2156-67, 2015 May.
Article in English | MEDLINE | ID: mdl-25776747

ABSTRACT

In the present study, the commensal and pathogenic host-microbe interaction of Enterococcus faecalis was explored using a Caenorhabditis elegans model system. The virulence of 28 E. faecalis isolates representing 24 multilocus sequence types (MLSTs), including human commensal and clinical isolates as well as isolates from animals and of insect origin, was investigated using C. elegans strain glp-4 (bn2ts); sek-1 (km4). This revealed that 6 E. faecalis isolates behaved in a commensal manner with no nematocidal effect, while the remaining strains showed a time to 50% lethality ranging from 47 to 120 h. Principal component analysis showed that the difference in nematocidal activity explained 94% of the variance in the data. Assessment of known virulence traits revealed that gelatinase and cytolysin production accounted for 40.8% and 36.5% of the observed pathogenicity, respectively. However, coproduction of gelatinase and cytolysin did not increase virulence additively, accounting for 50.6% of the pathogenicity and therefore indicating a significant (26.7%) saturation effect. We employed a comparative genomic analysis approach using the 28 isolates comprising a collection of 82,356 annotated coding sequences (CDS) to identify 2,325 patterns of presence or absence among the investigated strains. Univariate statistical analysis of variance (ANOVA) established that individual patterns positively correlated (n = 61) with virulence. The patterns were investigated to identify potential new virulence traits, among which we found five patterns consisting of the phage03-like gene clusters. Strains harboring phage03 showed, on average, 17% higher killing of C. elegans (P = 4.4 × 10⁻⁶). The phage03 gene cluster was also present in gelatinase-and-cytolysin-negative strain E. faecalis JH2-2. Deletion of this phage element from the JH2-2 clinical strain rendered the mutant apathogenic in C. elegans, and a similar mutant of the nosocomial V583 isolate showed significantly attenuated virulence. Bioinformatics investigation indicated that, unlike other E. faecalis virulence traits, phage03-like elements were found at a higher frequency among nosocomial isolates. In conclusion, our report provides a valuable virulence map that explains enhancement in E. faecalis virulence and contributes to a deeper comprehension of the genetic mechanism leading to the transition from commensalism to a pathogenic lifestyle.
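
The 26.7% saturation effect quoted in this abstract matches the shortfall of the combined effect relative to simple additivity. The short sketch below works through that arithmetic, assuming this is how the figure was derived, since the abstract does not spell out the calculation. The function name is hypothetical.

```python
def saturation_effect(effect_a, effect_b, effect_combined):
    """Shortfall of the combined effect relative to simple additivity."""
    return effect_a + effect_b - effect_combined

# Percentages of observed pathogenicity reported in the abstract.
gelatinase, cytolysin, both = 40.8, 36.5, 50.6
print(round(saturation_effect(gelatinase, cytolysin, both), 1))  # prints 26.7
```

In other words, if the two traits acted additively they would account for 77.3% of the pathogenicity. Together they account for 50.6%, leaving a gap of 26.7 percentage points.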


Subjects
Bacteriophages/genetics , Caenorhabditis elegans/microbiology , Caenorhabditis elegans/physiology , Enterococcus faecalis/growth & development , Enterococcus faecalis/genetics , Prophages/genetics , Virulence Factors/genetics , Adult , Animals , Animal Disease Models , Enterococcus faecalis/isolation & purification , Enterococcus faecalis/virology , Bacterial Genome , Gram-Positive Bacterial Infections/microbiology , Humans , Infant , Insects/microbiology , Multilocus Sequence Typing , Survival Analysis , Symbiosis , Virulence
14.
BMC Genomics ; 15: 882, 2014 Oct 09.
Article in English | MEDLINE | ID: mdl-25297974

ABSTRACT

BACKGROUND: There are several studies describing loss of genes through reductive evolution in microbes, but how selective forces are associated with genome expansion due to horizontal gene transfer (HGT) has not received similar attention. The aim of this study was therefore to examine how selective pressures influence genome expansion in 53 fully sequenced and assembled Escherichia coli strains. We also explored potential connections between genome expansion and the attainment of virulence factors. This was performed using estimations of several genomic parameters such as AT content, genomic drift (measured using relative entropy), genome size and estimated HGT size, which were subsequently compared to analogous parameters computed from the core genome consisting of 1729 genes common to the 53 E. coli strains. Moreover, we analyzed how selective pressures (quantified using relative entropy and dN/dS), acting on the E. coli core genome, influenced lineage and phylogroup formation. RESULTS: Hierarchical clustering of dS and dN estimations from the E. coli core genome resulted in phylogenetic trees with topologies in agreement with known E. coli taxonomy and phylogroups. High values of dS, compared to dN, indicate that the E. coli core genome has been subjected to substantial purifying selection over time; significantly more than the non-core part of the genome (p < 0.001). This is further supported by a linear association between strain-wise dS and dN values (β = 26.94 ± 0.44, R² ≈ 0.98, p < 0.001). The non-core part of the genome was also significantly more AT-rich (p < 0.001) than the core genome, and E. coli genome size correlated with estimated HGT size (p < 0.001). In addition, genome size (p < 0.001), AT content (p < 0.001) as well as estimated HGT size (p < 0.005) were all associated with the presence of virulence factors, suggesting that pathogenicity traits in E. coli are largely attained through HGT. No associations were found between selective pressures operating on the E. coli core genome, as estimated using relative entropy, and genome size (p ≈ 0.98). CONCLUSIONS: On a larger time frame, genome expansion in E. coli, which is significantly associated with the acquisition of virulence factors, appears to be independent of selective forces operating on the core genome.


Subjects
Escherichia coli , Bacterial Genome , Phylogeny , Virulence Factors/genetics , Base Composition , Cluster Analysis , Entropy , Escherichia coli/classification , Escherichia coli/genetics , Escherichia coli/pathogenicity , Escherichia coli Proteins/genetics , Horizontal Gene Transfer , Pyrophosphatases/genetics
15.
Biom J ; 56(6): 1055-75, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25243581

ABSTRACT

Gene set analysis methods are popular tools for identifying differentially expressed gene sets in microarray data. Most existing methods use a permutation test to assess significance for each gene set. The permutation test's assumption of exchangeable samples is often not satisfied for time-series data and complex experimental designs, and in addition it requires a certain number of samples to compute p-values accurately. The method presented here uses a rotation test rather than a permutation test to assess significance. The rotation test can compute accurate p-values also for very small sample sizes. The method can handle complex designs and is particularly suited for longitudinal microarray data where the samples may have complex correlation structures. Dependencies between genes, modeled with the use of gene networks, are incorporated in the estimation of correlations between samples. In addition, the method can test for both gene sets that are differentially expressed and gene sets that show strong time trends. We show on simulated longitudinal data that the ability to identify important gene sets may be improved by taking the correlation structure between samples into account. Applied to real data, the method identifies both gene sets with constant expression and gene sets with strong time trends.


Subjects
Biometry/methods , Gene Expression Profiling , Analysis of Variance , Enterococcus faecalis/genetics , Enterococcus faecalis/physiology , Gene Regulatory Networks , Linear Models , Longitudinal Studies , Physiological Stress/genetics
16.
ISME Commun ; 4(1): ycae071, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38873028

ABSTRACT

The performance of sequence variant resolution analytic tools for metabarcoding has not yet been adequately benchmarked for high-diversity environmental samples. We therefore evaluated the sequence variant tools DADA2, Deblur, Swarm, and UNOISE, using high-diversity seafloor samples, resulting in comparisons of 1800 sequence variant tables. The evaluation was based on 30 sediment grab samples, for which 3 replica samples were collected. Each replica sample was extracted using 5 common DNA extraction kits, resulting in 450 DNA extracts which were 16S rRNA gene sequenced (V3-V4), using Illumina. Assessments included variation across replica samples, extraction kits, and denoising methods, in addition to applying prior knowledge about alpha diversity correlations toward the cosmopolitan marine archaeon Nitrosopumilus with high diversity and the sulfide oxidizing Sulfurovum with low diversity. DADA2 displayed the highest variance between replicates (Manhattan distance 1.14), while Swarm showed the lowest variance (Manhattan distance 0.93). For the analysis based on prior biological knowledge, UNOISE displayed the highest alpha diversity (Simpson's D) correlation toward Nitrosopumilus (Spearman rho = 0.85), while DADA2 showed the lowest (Spearman rho = 0.10). Deblur completely eliminated Nitrosopumilus from the dataset. For Sulfurovum, on the other hand, all the methods showed comparable results. In conclusion, our evaluations show that Swarm and UNOISE performed better than DADA2 and Deblur for high-diversity seafloor samples.
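
The two summary statistics used in this benchmark, Manhattan distance between replicates and Simpson's D, can be computed as in the sketch below. It assumes that distances are taken between relative-abundance vectors (so values range from 0 to 2, consistent with the reported 0.93 to 1.14) and that Simpson's D is the sum of squared proportions. The abstract does not state which variant of the index was used, so both assumptions are for illustration only.

```python
def relative_abundance(counts):
    """Convert raw sequence-variant counts to proportions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def manhattan(a, b):
    """Manhattan (L1) distance between two equally long abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def simpson_d(counts):
    """Simpson's D as the sum of squared proportions (assumed variant).

    D is 1.0 when a single variant dominates and approaches 0 as the
    sample becomes more even and diverse.
    """
    return sum(p * p for p in relative_abundance(counts))
```

Two replicates with no sequence variants in common give the maximum distance of 2.0, and identical replicates give 0.0, which puts the reported between-replicate distances roughly midway between the two extremes.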

17.
Biotechniques ; 74(1): 9-21, 2023 01.
Article in English | MEDLINE | ID: mdl-36601888

ABSTRACT

Butyrate and propionate represent two of the three main short-chain fatty acids produced by the intestinal microbiota. In healthy populations, their levels are reportedly equimolar, whereas a deviation in their ratio has been observed in various diseased cohorts. Monitoring this ratio provides a valuable metric; however, adopting short-chain fatty acid detection techniques in clinical settings remains a challenge because of the volatile nature of these acids. Here we aimed to estimate short-chain fatty acid information indirectly through a novel, simple quantitative PCR-compatible assay (liquid array diagnostics) targeting a limited number of microbiome 16S markers. Using 15 liquid array diagnostics probes to target microbiome markers selected by a model that combines partial least squares and linear discriminant analysis, the classes (normal vs high propionate-to-butyrate ratio) separated at a threshold of 2.6 with a prediction accuracy of 96%.


Subjects
Butyrates , Microbiota , Propionates , RNA, Ribosomal, 16S/genetics , Fatty Acids, Volatile/analysis , Bacteria/genetics
18.
BMC Bioinformatics ; 13: 327, 2012 Dec 08.
Article in English | MEDLINE | ID: mdl-23216988

ABSTRACT

BACKGROUND: Multivariate approaches have been successfully applied to genome-wide association studies. Recently, a Partial Least Squares (PLS) based approach was introduced for mapping yeast genotype-phenotype relations, where background information such as gene function classification, gene dispensability, recent or ancient gene copy number variations and the presence of premature stop codons or frameshift mutations in reading frames was used post hoc to explain selected genes. One of the latest advancements in PLS, L-Partial Least Squares (L-PLS), where 'L' represents the data structure used, enables the use of background information at the modeling level. Here, a modification of L-PLS with variable importance on projection (VIP) was implemented using a stepwise regularized procedure for gene and background information selection. Results were compared to PLS-based procedures in which no background information was used. RESULTS: Applying the proposed methodology to yeast Saccharomyces cerevisiae data, we found that the genotype-phenotype relationship had improved understandability. Phenotypic variations were explained by the variations of relatively stable genes and stable background variations. The suggested procedure provides an automatic way for genotype-phenotype mapping. The selected phenotype-influencing genes were evolving 29% faster than non-influential genes, and the current results are supported by a recently conducted study. Further power analysis on simulated data verified that the proposed methodology selects relevant variables. CONCLUSIONS: A modification of L-PLS with VIP in a stepwise regularized elimination procedure can improve the understandability and stability of selected genes and background information. The approach is recommended for genome-wide association studies where background information is available.


Subjects
Genetic Association Studies/methods , Genotype , Phenotype , Saccharomyces cerevisiae/genetics , Least-Squares Analysis
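
For readers unfamiliar with the VIP measure named in the abstract above, the commonly used definition for ordinary PLS is given below. It is stated here as background only. The abstract does not specify how VIP is adapted to L-PLS or which cutoff the stepwise elimination uses, so the symbols (p variables, A components, loading weights w_a, scores t_a, y-loadings q_a) follow the standard PLS notation and are not taken from the paper.

```latex
\mathrm{VIP}_j
  = \sqrt{\, p \,
      \frac{\sum_{a=1}^{A} SS_a \left( w_{aj} / \lVert w_a \rVert \right)^{2}}
           {\sum_{a=1}^{A} SS_a} },
\qquad
SS_a = q_a^{2}\, t_a^{\top} t_a
```

Because the squared VIP values average to 1 over the p variables, a cutoff near 1 is a common rule of thumb for flagging variables as candidates for elimination.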
19.
BMC Bioinformatics ; 13: 97, 2012 May 14.
Article in English | MEDLINE | ID: mdl-22583558

ABSTRACT

BACKGROUND: Gene finding is a complicated procedure that encapsulates algorithms for coding sequence modeling, identification of promoter regions, issues concerning overlapping genes and more. In the present study we focus on coding sequence modeling algorithms; that is, algorithms for identification and prediction of the actual coding sequences from genomic DNA. In this respect, we promote a novel multivariate method known as Canonical Powered Partial Least Squares (CPPLS) as an alternative to the commonly used Interpolated Markov Model (IMM). Comparisons between the methods were performed on DNA, codon and protein sequences with highly conserved genes taken from several species with different genomic properties. RESULTS: The multivariate CPPLS approach classified coding sequences substantially better than the commonly used IMM on the same set of sequences. We also found that the use of CPPLS with codon representation gave significantly better classification results than IMM with either protein (p < 0.001) or DNA (p < 0.001) representation. Further, although the mean performance was similar, the variation of CPPLS performance on codon representation was significantly smaller than for IMM (p < 0.001). CONCLUSIONS: The performance of coding sequence modeling can be substantially improved by using an algorithm based on the multivariate CPPLS method applied to codon or DNA frequencies.


Subjects
Algorithms , Bacteria/genetics , Open Reading Frames , Archaea/genetics , Codon , Gene Homology , Genomics , Markov Chains , Models, Genetic , Multivariate Analysis , Sequence Analysis, DNA , Sequence Analysis, Protein
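
The codon representation mentioned in the abstract above can be illustrated with a short sketch that turns a DNA sequence into codon frequencies for a chosen reading frame. This only shows the kind of feature vector a multivariate classifier could be trained on. The abstract does not describe the exact feature construction, and the function name `codon_frequencies` and the example sequence are hypothetical.

```python
from collections import Counter


def codon_frequencies(seq, frame=0):
    """Relative frequency of each codon in the given reading frame (0, 1 or 2)."""
    seq = seq.upper()
    # Step through the sequence three bases at a time, dropping a trailing
    # partial codon.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Ignore codons that contain ambiguous bases such as N.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    total = len(codons)
    if total == 0:
        return {}
    counts = Counter(codons)
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical short open reading frame.
freqs = codon_frequencies("ATGAAAATGTAA")
print(freqs)
```

Computing the same frequencies in all three frames, or over single bases instead of triplets, would give the frame-shifted and DNA-level representations that such comparisons typically rely on.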
20.
BMC Genomics ; 13: 66, 2012 Feb 10.
Article in English | MEDLINE | ID: mdl-22325062

ABSTRACT

BACKGROUND: We sought to assess whether the concept of relative entropy (information capacity) could aid our understanding of the process of horizontal gene transfer in microbes. We analyzed the differences in information capacity between prokaryotic chromosomes, genomic islands (GIs), phages, and plasmids. Relative entropy was estimated using the Kullback-Leibler measure. RESULTS: Relative entropy was highest in bacterial chromosomes and followed the order chromosomes > GIs > phages > plasmids. There was an association between relative entropy and AT content in chromosomes, phages, plasmids and GIs, with the strongest association in phages. Relative entropy was also found to be lower in the obligate intracellular Mycobacterium leprae than in the related M. tuberculosis when measured on a shared set of highly conserved genes. CONCLUSIONS: We argue that relative entropy differences reflect how plasmids, phages and GIs interact with microbial host chromosomes and that all these biological entities are, or have been, subjected to different selective pressures. The rate at which amelioration of horizontally acquired DNA occurs within the chromosome is likely to account for the small differences between chromosomes and stably incorporated GIs compared to the transient or independent replicons such as phages and plasmids.


Subjects
Bacteriophages/genetics , Chromosomes, Bacterial/genetics , Genomic Islands , Plasmids/genetics , DNA, Bacterial/chemistry , Entropy , Gene Transfer, Horizontal , Mycobacterium leprae/genetics , Mycobacterium tuberculosis/genetics
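
The Kullback-Leibler measure named in the abstract above can be sketched in a few lines. The abstract does not say which distributions were compared, so this example makes an assumption purely for illustration: it compares the single-base composition of a sequence against a uniform background. The sequences are made up.

```python
import math
from collections import Counter


def base_frequencies(seq):
    """Relative frequencies of A, C, G and T, ignoring other symbols."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; q must be positive wherever p is."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


# Assumed background for this sketch: all four bases equally likely.
uniform = {b: 0.25 for b in "ACGT"}

at_rich = base_frequencies("AAAT" * 10)   # hypothetical AT-rich sequence
balanced = base_frequencies("ACGT" * 10)  # hypothetical balanced sequence

print(kl_divergence(at_rich, uniform))
print(kl_divergence(balanced, uniform))
```

A composition that departs further from the background yields a larger value, which is the sense in which relative entropy can be compared across replicon types.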