Results 1 - 14 of 14
1.
PLoS One; 18(4): e0284443, 2023.
Article in English | MEDLINE | ID: mdl-37058511

ABSTRACT

Data simulation is fundamental for machine learning and causal inference, as it allows exploration of scenarios and assessment of methods in settings with full control of ground truth. Directed acyclic graphs (DAGs) are well established for encoding the dependence structure over a collection of variables in both inference and simulation settings. However, while modern machine learning is applied to data of an increasingly complex nature, DAG-based simulation frameworks are still confined to settings with relatively simple variable types and functional forms. We here present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations. A succinct YAML format for defining the simulation model structure promotes transparency, while separate user-provided functions for generating each variable based on its parents ensure simulation code modularization. We illustrate the capabilities of DagSim through use cases where metadata variables control shapes in an image and patterns in bio-sequences. DagSim is available as a Python package at PyPI. Source code and documentation are available at: https://github.com/uio-bmi/dagsim.


Subjects
Software, Computer Simulation
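
The idea described in the abstract above, a declared graph structure plus one user-provided function per variable that receives its parents' values, can be illustrated with a minimal sketch. This is not DagSim's API: the `MODEL` dictionary, the `simulate` function, and the two example generator functions are hypothetical names chosen for illustration, and only the Python standard library is used.

```python
import random
from graphlib import TopologicalSorter


# Hypothetical generator functions: each receives a random generator
# followed by the values of the node's parents, in declared order.
def make_shape(rng):
    return rng.choice(["circle", "square"])


def make_size(rng, shape):
    base = 10 if shape == "circle" else 20
    return base + rng.randint(0, 5)


# Hypothetical model definition: node name -> parents and generator.
MODEL = {
    "shape": {"parents": [], "function": make_shape},
    "size": {"parents": ["shape"], "function": make_size},
}


def simulate(model, n_samples, seed=0):
    """Draw n_samples joint samples by visiting nodes parents-first."""
    rng = random.Random(seed)
    # TopologicalSorter expects {node: predecessors}; static_order()
    # yields every parent before its children and raises on cycles.
    graph = {name: spec["parents"] for name, spec in model.items()}
    order = list(TopologicalSorter(graph).static_order())
    samples = []
    for _ in range(n_samples):
        values = {}
        for name in order:
            spec = model[name]
            parent_values = [values[p] for p in spec["parents"]]
            values[name] = spec["function"](rng, *parent_values)
        samples.append(values)
    return samples


if __name__ == "__main__":
    for sample in simulate(MODEL, 3):
        print(sample)
```

Because each variable's logic lives in its own function, any node could return an arbitrary Python object (an image array, a sequence string) without changing the traversal code.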
2.
Lancet Microbe; 3(11): e881-e887, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36152674

ABSTRACT

Measurement and manipulation of the microbiome is generally considered to have great potential for understanding the causes of complex diseases in humans, developing new therapies, and finding preventive measures. Many studies have found significant associations between the microbiome and various diseases; however, Koch's classical postulates remind us about the importance of causative reasoning when considering the relationship between microbes and a disease manifestation. Although causal discovery in observational microbiome data faces many challenges, methodological advances in causal structure learning have improved the potential of data-driven prediction of causal effects in large-scale biological systems. In this Personal View, we show the capability of existing methods for inferring causal effects from metagenomic data, and we highlight ways in which the introduction of causal structures that are more flexible than existing structures offers new opportunities for causal reasoning. Our observations suggest that microbiome research can further benefit from tools developed in the past 5 years in causal discovery and learn from their applications elsewhere.


Subjects
Microbiota, Humans, Metagenomics/methods, Causality, Metagenome
3.
Mol Biol Evol; 39(1), 2022 Jan 7.
Article in English | MEDLINE | ID: mdl-34662416

ABSTRACT

The soil bacterium Burkholderia pseudomallei is the causative agent of melioidosis and a significant cause of human morbidity and mortality in many tropical and subtropical countries. The species notoriously survives harsh environmental conditions, but the genetic architecture for these adaptations remains unclear. Here we employed a powerful combination of genome-wide epistasis and co-selection studies (2,011 genomes), condition-wide transcriptome analyses (82 diverse conditions), and a gene knockout assay to uncover signals of "co-selection", that is, combinations of genetic markers that have been repeatedly selected together through B. pseudomallei evolution. These enabled us to identify 13,061 mutation pairs under co-selection in distinct genes and noncoding RNA. Genes under co-selection displayed marked expression correlation when B. pseudomallei was subjected to physical stress conditions, highlighting these conditions as one of the major evolutionary driving forces for this bacterium. We identified a putative adhesin (BPSL1661) as a hub of co-selection signals, experimentally confirmed a role for BPSL1661 under nutrient deprivation, and explored the functional basis of the co-selection gene network surrounding BPSL1661 in facilitating bacterial survival under nutrient depletion. Our findings suggest that nutrient-limited conditions have been the common selection pressure acting on this species, and allelic variation of BPSL1661 may have promoted B. pseudomallei survival during harsh environmental conditions by facilitating bacterial adherence to different surfaces, cells, or living hosts.


Subjects
Biological Evolution, Burkholderia pseudomallei, Bacterial Adhesins, Alleles, Burkholderia pseudomallei/genetics, Burkholderia pseudomallei/physiology, Genetic Selection, Physiological Stress
5.
Nat Commun; 12(1): 765, 2021 Feb 3.
Article in English | MEDLINE | ID: mdl-33536414

ABSTRACT

Chickens are the most common birds on Earth and colibacillosis is among the most common diseases affecting them. This major threat to animal welfare and safe sustainable food production is difficult to combat because the etiological agent, avian pathogenic Escherichia coli (APEC), emerges from ubiquitous commensal gut bacteria, with no single virulence gene present in all disease-causing isolates. Here, we address the underlying evolutionary mechanisms of extraintestinal spread and systemic infection in poultry. Combining population scale comparative genomics and pangenome-wide association studies, we compare E. coli from commensal carriage and systemic infections. We identify phylogroup-specific and species-wide genetic elements that are enriched in APEC, including pathogenicity-associated variation in 143 genes that have diverse functions, including genes involved in metabolism, lipopolysaccharide synthesis, heat shock response, antimicrobial resistance and toxicity. We find that horizontal gene transfer spreads pathogenicity elements, allowing divergent clones to cause infection. Finally, a Random Forest model prediction of disease status (carriage vs. disease) identifies pathogenic strains in the emergent ST-117 poultry-associated lineage with 73% accuracy, demonstrating the potential for early identification of emergent APEC in healthy flocks.


Subjects
Escherichia coli Infections/prevention & control, Escherichia coli/genetics, Molecular Evolution, Bacterial Genome/genetics, Poultry Diseases/prevention & control, Animals, Chickens, Escherichia coli/classification, Escherichia coli/pathogenicity, Escherichia coli Infections/diagnosis, Escherichia coli Infections/microbiology, Bacterial Genes, Genetic Variation, Genome-Wide Association Study/methods, Genotype, Humans, Phylogeny, Poultry Diseases/diagnosis, Poultry Diseases/microbiology, Virulence/genetics
6.
Nat Mach Intell; 3(11): 936-944, 2021 Nov.
Article in English | MEDLINE | ID: mdl-37396030

ABSTRACT

Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a novel deep learning method for antigen specificity prediction, and (iii) showcasing streamlined interpretability-focused benchmarking of AIRR ML.

7.
Microb Genom; 6(12), 2020 Dec.
Article in English | MEDLINE | ID: mdl-33253085

ABSTRACT

Enterococcus faecium is a commensal of the gastrointestinal tract but is also known as a nosocomial pathogen among hospitalized patients. Population genetics based on whole-genome sequencing has revealed that E. faecium strains from hospitalized patients form a distinct clade, designated clade A1, and that plasmids are major contributors to the emergence of nosocomial E. faecium. Here we further explored the adaptive evolution of E. faecium using a genome-wide co-evolution study (GWES) to identify co-evolving single-nucleotide polymorphisms (SNPs). We identified three genomic regions harbouring large numbers of SNPs in tight linkage that are not proximal to each other based on the completely assembled chromosome of the clade A1 reference hospital isolate AUS0004. Close examination of these regions revealed that they are located at the borders of four different types of large-scale genomic rearrangements, insertion sites of two different genomic islands and an IS30-like transposon. In non-clade A1 isolates, these regions are adjacent to each other and they lack the insertions of the genomic islands and IS30-like transposon. Additionally, among the clade A1 isolates there is one group of pet isolates lacking the genomic rearrangement and insertion of the genomic islands, suggesting a distinct evolutionary trajectory. In silico analysis of the biological functions of the genes encoded in the three regions revealed a common link to a stress response. This suggests that these rearrangements may reflect adaptation to the stringent conditions in the hospital environment, such as antibiotics and detergents, to which bacteria are exposed. In conclusion, to our knowledge, this is the first study using GWES to identify genomic rearrangements, suggesting that there is considerable untapped potential to unravel hidden evolutionary signals from population genomic data.


Subjects
Enterococcus faecium/classification, Gram-Positive Bacterial Infections/microbiology, Single Nucleotide Polymorphism, Whole Genome Sequencing/methods, Cross Infection/microbiology, DNA Transposable Elements, Enterococcus faecium/genetics, Molecular Evolution, Genomic Islands, High-Throughput Nucleotide Sequencing, Humans, Phylogeny, Plasmids/genetics
8.
mBio; 11(1), 2020 Feb 18.
Article in English | MEDLINE | ID: mdl-32071274

ABSTRACT

A fundamental goal of contemporary biomedical research is to understand the molecular basis of disease pathogenesis and exploit this information to develop targeted and more-effective therapies. Necrotizing myositis caused by the bacterial pathogen Streptococcus pyogenes is a devastating human infection with a high mortality rate and few successful therapeutic options. We used dual transcriptome sequencing (RNA-seq) to analyze the transcriptomes of S. pyogenes and host skeletal muscle recovered contemporaneously from infected nonhuman primates. The in vivo bacterial transcriptome was strikingly remodeled compared to organisms grown in vitro, with significant upregulation of genes contributing to virulence and altered regulation of metabolic genes. The transcriptome of muscle tissue from infected nonhuman primates (NHPs) differed significantly from that of mock-infected animals, due in part to substantial changes in genes contributing to inflammation and host defense processes. We discovered significant positive correlations between group A streptococcus (GAS) virulence factor transcripts and genes involved in the host immune response and inflammation. We also discovered significant correlations between the magnitude of bacterial virulence gene expression in vivo and pathogen fitness, as assessed by previously conducted genome-wide transposon-directed insertion site sequencing (TraDIS). By integrating the bacterial RNA-seq data with the fitness data generated by TraDIS, we discovered five new pathogen genes, namely, S. pyogenes 0281 (Spy0281 [dahA]), ihk-irr, slr, isp, and ciaH, that contribute to necrotizing myositis and confirmed these findings using isogenic deletion-mutant strains. Taken together, our study results provide rich new information about the molecular events occurring in severe invasive infection of primate skeletal muscle that has extensive translational research implications.

IMPORTANCE: Necrotizing myositis caused by Streptococcus pyogenes has high morbidity and mortality rates and relatively few successful therapeutic options. In addition, there is no licensed human S. pyogenes vaccine. To gain enhanced understanding of the molecular basis of this infection, we employed a multidimensional analysis strategy that included dual RNA-seq and other data derived from experimental infection of nonhuman primates. The data were used to target five streptococcal genes for pathogenesis research, resulting in the unambiguous demonstration that these genes contribute to pathogen-host molecular interactions in necrotizing infections. We exploited fitness data derived from a recently conducted genome-wide transposon mutagenesis study to discover significant correlation between the magnitude of bacterial virulence gene expression in vivo and pathogen fitness. Collectively, our findings have significant implications for translational research, potentially including vaccine efforts.


Subjects
Necrotizing Fasciitis/microbiology, Myositis/microbiology, Streptococcal Infections/microbiology, Streptococcus pyogenes/genetics, Streptococcus pyogenes/metabolism, Transcriptome, Virulence Factors/genetics, Animals, Bacterial Proteins/metabolism, Bacterial Gene Expression Regulation, Host-Pathogen Interactions/genetics, Host-Pathogen Interactions/physiology, Skeletal Muscle/microbiology, Skeletal Muscle/pathology, Myositis/genetics, Myositis/metabolism, Primates, Bacterial RNA/genetics, Bacterial RNA/metabolism, Streptococcus pyogenes/pathogenicity, Virulence/genetics, Virulence Factors/metabolism
9.
Nucleic Acids Res; 47(18): e112, 2019 Oct 10.
Article in English | MEDLINE | ID: mdl-31361894

ABSTRACT

Covariance-based discovery of polymorphisms under co-selective pressure or epistasis has received considerable recent attention in population genomics. Both statistical modeling of the population-level covariation of alleles across the chromosome and model-free testing of dependencies between pairs of polymorphisms have been shown to successfully uncover patterns of selection in bacterial populations. Here we introduce a model-free method, SpydrPick, whose computational efficiency enables analysis at the scale of pan-genomes of many bacteria. SpydrPick incorporates an efficient correction for population structure, which adjusts for the phylogenetic signal in the data without requiring an explicit phylogenetic tree. We also introduce a new type of visualization of the results similar to the Manhattan plots used in genome-wide association studies, which enables rapid exploration of the identified signals of co-evolution. Simulations demonstrate the usefulness of our method and give some insight into when this type of analysis is most likely to be successful. Application of the method to large population genomic datasets of two major human pathogens, Streptococcus pneumoniae and Neisseria meningitidis, revealed both previously identified and novel putative targets of co-selection related to virulence and antibiotic resistance, highlighting the potential of this approach to drive molecular discoveries, even in the absence of phenotypic data.


Subjects
Computational Biology/methods, Genetic Epistasis, Bacterial Genome/genetics, Genomics, Microbial Drug Resistance/genetics, Humans, Metagenomics/methods, Neisseria meningitidis/genetics, Neisseria meningitidis/pathogenicity, Streptococcus pneumoniae/genetics, Virulence/genetics
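
To make "model-free testing of dependencies between pairs of polymorphisms" in the abstract above concrete, the sketch below scores one pair of alignment columns with empirical mutual information. The abstract does not name the statistic, so mutual information is an assumption here, chosen as one common model-free dependency measure. The function name `mutual_information` is hypothetical, and the sketch applies none of the population-structure correction that the abstract describes.

```python
import math
from collections import Counter


def mutual_information(site_a, site_b):
    """Empirical mutual information (in nats) between two alignment columns.

    site_a and site_b hold one allele per genome, in the same genome order.
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(site_a)
    count_a = Counter(site_a)
    count_b = Counter(site_b)
    count_ab = Counter(zip(site_a, site_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi


if __name__ == "__main__":
    # Perfectly linked columns versus independent columns (toy data).
    print(mutual_information("AACC", "AACC"))
    print(mutual_information("AACC", "ACAC"))
```

A genome-wide scan would evaluate this score over many column pairs and keep the strongest ones; doing that efficiently and adjusting for shared ancestry are the parts this toy example leaves out.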
10.
Nat Genet; 51(3): 548-559, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30778225

ABSTRACT

Streptococcus pyogenes causes 700 million human infections annually worldwide, yet, despite a century of intensive effort, there is no licensed vaccine against this bacterium. Although a number of large-scale genomic studies of bacterial pathogens have been published, the relationships among the genome, transcriptome, and virulence in large bacterial populations remain poorly understood. We sequenced the genomes of 2,101 emm28 S. pyogenes invasive strains, from which we selected 492 phylogenetically diverse strains for transcriptome analysis and 50 strains for virulence assessment. Data integration provided a novel understanding of the virulence mechanisms of this model organism. Genome-wide association study, expression quantitative trait loci analysis, machine learning, and isogenic mutant strains identified and confirmed a one-nucleotide indel in an intergenic region that significantly alters global transcript profiles and ultimately virulence. The integrative strategy that we used is generally applicable to any microbe and may lead to new therapeutics for many human pathogens.


Subjects
Bacterial Genome/genetics, Streptococcus pyogenes/genetics, Transcriptome/genetics, Virulence/genetics, Bacterial Gene Expression Regulation/genetics, Genome-Wide Association Study/methods, Genomics/methods, Phylogeny, Quantitative Trait Loci/genetics
11.
Sex Abuse; 31(4): 374-396, 2019 Jun.
Article in English | MEDLINE | ID: mdl-28933247

ABSTRACT

In assessments of child sexual abuse (CSA) allegations, informative background information is often overlooked or not used properly. We therefore created and tested an instrument that uses accessible background information to calculate the probability that a child is a CSA victim, which can serve as a starting point for the subsequent investigation. Studying 903 demographic and socioeconomic variables from over 11,000 Finnish children, we identified 42 features related to CSA. Using Bayesian logic to calculate the probability of abuse, our instrument, the Finnish Investigative Instrument of Child Sexual Abuse (FICSA), has two separate profiles for boys and girls. A cross-validation procedure suggested excellent diagnostic utility (area under the curve [AUC] = 0.97 for boys and AUC = 0.88 for girls). We conclude that the presented method can be useful in forensic assessments of CSA allegations by adding a reliable statistical approach to considering background information, and can support clinical decision making and guide investigative efforts.


Subjects
Child Sexual Abuse/diagnosis, Adolescent, Bayes Theorem, Child, Decision Support Techniques, Finland, Humans
12.
Nat Commun; 9(1): 5034, 2018 Nov 28.
Article in English | MEDLINE | ID: mdl-30487573

ABSTRACT

Some of the most common infectious diseases are caused by bacteria that naturally colonise humans asymptomatically. Combating these opportunistic pathogens requires an understanding of the traits that differentiate infecting strains from harmless relatives. Staphylococcus epidermidis is carried asymptomatically on the skin and mucous membranes of virtually all humans but is a major cause of nosocomial infection associated with invasive procedures. Here we address the underlying evolutionary mechanisms of opportunistic pathogenicity by combining pangenome-wide association studies and laboratory microbiology to compare S. epidermidis from bloodstream and wound infections and asymptomatic carriage. We identify 61 genes containing infection-associated genetic elements (k-mers) that correlate with in vitro variation in known pathogenicity traits (biofilm formation, cell toxicity, interleukin-8 production, methicillin resistance). Horizontal gene transfer spreads these elements, allowing divergent clones to cause infection. Finally, Random Forest model prediction of disease status (carriage vs. infection) identifies pathogenicity elements in 415 S. epidermidis isolates with 80% accuracy, demonstrating the potential for identifying risk genotypes pre-operatively.


Subjects
Skin Diseases/microbiology, Staphylococcal Infections/microbiology, Staphylococcus epidermidis/genetics, Staphylococcus epidermidis/pathogenicity, Bacterial Genome/genetics, Genome-Wide Association Study, Genotype, Humans, Interleukin-8/metabolism
13.
Microb Genom; 4(6), 2018 Jun.
Article in English | MEDLINE | ID: mdl-29813016

ABSTRACT

The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4 to 10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.


Subjects
Genetic Epistasis, Bacterial Genome, Genetic Association Studies, Genetic Loci, Genomics, Humans, Genetic Models, Phylogeny, Single Nucleotide Polymorphism, Protein Conformation, Streptococcus pneumoniae/genetics
14.
Twin Res Hum Genet; 16(1): 150-6, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23153722

ABSTRACT

The Genetics of Sexuality and Aggression (GSA) project was launched at the Åbo Akademi University in Turku, Finland in 2005 and has so far undertaken two major population-based data collections involving twins and siblings of twins. To date, it consists of about 14,000 individuals (including 1,147 informative monozygotic twin pairs, 1,042 informative same-sex dizygotic twin pairs, 741 informative opposite-sex dizygotic twin pairs). Participants have been recruited through the Central Population Registry of Finland and were 18-49 years of age at the time of the data collections. Saliva samples for DNA genotyping (n = 4,278) and testosterone analyses (n = 1,168) were collected in 2006. The primary focus of the data collections has been on sexuality (both sexual functioning and sexual behavior) and aggressive behavior. This paper provides an overview of the data collections as well as an outline of the phenotypes and biological data assembled within the project. A detailed overview of publications can be found at the project's Web site: http://www.cebg.fi/.


Subjects
Aggression/psychology, Registries, Sexuality/psychology, Dizygotic Twins/genetics, Monozygotic Twins/genetics, Adolescent, Adult, Cohort Studies, Female, Finland/epidemiology, Humans, Male, Middle Aged, Phenotype, Psychosexual Development, Surveys and Questionnaires, Dizygotic Twins/psychology, Monozygotic Twins/psychology, Young Adult