ABSTRACT
The 3D structure of the genome is an important mediator of gene expression. As phenotypic divergence is largely driven by gene regulatory variation, comparing genome 3D contacts across species can further understanding of the molecular basis of species differences. However, while experimental data on genome 3D contacts in humans is increasingly abundant, only a handful of 3D genome contact maps exist for other species. Here, we demonstrate that human experimental data can be used to close this data gap. We apply a machine learning model that predicts 3D genome contacts from DNA sequence to the genomes from 56 bonobos and chimpanzees and identify species-specific patterns of genome folding. We estimated 3D divergence between individuals from the resulting contact maps in 4,420 1 Mb genomic windows, of which ~17% were substantially divergent in predicted genome contacts. Bonobos and chimpanzees diverged at 89 windows, overlapping genes associated with multiple traits implicated in Pan phenotypic divergence. We discovered 51 bonobo-specific variants that individually produce the observed bonobo contact pattern in bonobo-chimpanzee divergent windows. Our results demonstrate that machine learning methods can leverage human data to fill in data gaps across species, offering the first look at population-level 3D genome variation in non-human primates. We also identify loci where changes in 3D folding may contribute to phenotypic differences in our closest living relatives.
ABSTRACT
Exclusive enteral nutrition (EEN) is a first-line therapy for pediatric Crohn's disease (CD), but protective mechanisms remain unknown. We established a prospective pediatric cohort to characterize the function of fecal microbiota and metabolite changes of treatment-naive CD patients in response to EEN (German Clinical Trials DRKS00013306). Integrated multi-omics analysis identified network clusters from individually variable microbiome profiles, with Lachnospiraceae and medium-chain fatty acids as protective features. Bioorthogonal non-canonical amino acid tagging selectively identified bacterial species in response to medium-chain fatty acids. Metagenomic analysis identified high strain-level dynamics in response to EEN. Functional changes in diet-exposed fecal microbiota were further validated using gut chemostat cultures and microbiota transfer into germ-free Il10-deficient mice. Dietary model conditions induced individual patient-specific strain signatures to prevent or cause inflammatory bowel disease (IBD)-like inflammation in gnotobiotic mice. Hence, we provide evidence that EEN therapy operates through explicit functional changes of temporally and individually variable microbiome profiles.
ABSTRACT
Chronic inflammation and tissue fibrosis are common responses that worsen organ function, yet the molecular mechanisms governing their cross-talk are poorly understood. In diseased organs, stress-induced gene expression changes fuel maladaptive cell state transitions [1] and pathological interaction between cellular compartments. Although chronic fibroblast activation worsens dysfunction in the lungs, liver, kidneys and heart, and exacerbates many cancers [2], the stress-sensing mechanisms initiating transcriptional activation of fibroblasts are poorly understood. Here we show that conditional deletion of the transcriptional co-activator Brd4 in infiltrating Cx3cr1+ macrophages ameliorates heart failure in mice and significantly reduces fibroblast activation. Analysis of single-cell chromatin accessibility and BRD4 occupancy in vivo in Cx3cr1+ cells identified a large enhancer proximal to interleukin-1β (IL-1β, encoded by Il1b), and a series of CRISPR-based deletions revealed the precise stress-dependent regulatory element that controls Il1b expression. Secreted IL-1β activated a fibroblast RELA-dependent (also known as p65) enhancer near the transcription factor MEOX1, resulting in a profibrotic response in human cardiac fibroblasts. In vivo, antibody-mediated IL-1β neutralization improved cardiac function and tissue fibrosis in heart failure. Systemic IL-1β inhibition or targeted Il1b deletion in Cx3cr1+ cells prevented stress-induced Meox1 expression and fibroblast activation. The elucidation of BRD4-dependent cross-talk between a specific immune cell subset and fibroblasts through IL-1β reveals how inflammation drives profibrotic cell states and supports strategies that modulate this process in heart disease and other chronic inflammatory disorders featuring tissue remodelling.
ABSTRACT
The genetic diversity of the gut microbiota has a central role in host health. Here, we created pangenomes for 728 human gut prokaryotic species, quadrupling the genes of strain-specific genomes. Each of these species has a core set of a thousand genes, differing even between closely related species, and an accessory set of genes unique to the different strains. Functional analysis shows high strain variability associates with sporulation, whereas low variability is linked with antibiotic resistance. We further map the antibiotic resistome across the human gut population and find 237 cases of extreme resistance even to last-resort antibiotics, with a predominance among Enterobacteriaceae. Lastly, the presence of specific genes in the microbiota relates to host age and sex. Our study underscores the genetic complexity of the human gut microbiota, emphasizing its significant implications for host health. The pangenomes and antibiotic resistance map constitute a valuable resource for further research.
Subject(s)
Anti-Bacterial Agents, Bacteria, Gastrointestinal Microbiome, Genetic Variation, Humans, Gastrointestinal Microbiome/genetics, Anti-Bacterial Agents/pharmacology, Bacteria/genetics, Bacteria/drug effects, Bacteria/classification, Female, Male, Bacterial Genome, Physiological Stress, Bacterial Drug Resistance/genetics, Adult, Phylogeny, Middle Aged
ABSTRACT
Understanding variation in chromatin contact patterns across diverse humans is critical for interpreting non-coding variants and their effects on gene expression and phenotypes. However, experimental determination of chromatin contact patterns across large samples is prohibitively expensive. To overcome this challenge, we develop and validate a machine learning method to quantify the variation in 3D chromatin contacts at 2 kilobase resolution from genome sequence alone. We apply this approach to thousands of human genomes from the 1000 Genomes Project and the inferred hominin ancestral genome. While patterns of 3D contact divergence genome-wide are qualitatively similar to patterns of sequence divergence, we find substantial differences in 3D divergence and sequence divergence in local 1 megabase genomic windows. In particular, we identify 392 windows with significantly greater 3D divergence than expected from sequence. Moreover, for 31% of genomic windows, a single individual has a rare divergent 3D contact map pattern. Using in silico mutagenesis we find that most single nucleotide sequence changes do not result in changes to 3D chromatin contacts. However, in windows with substantial 3D divergence just one or a few variants can lead to divergent 3D chromatin contacts without the individuals carrying those variants having high sequence divergence. In summary, inferring 3D chromatin contact maps across human populations reveals variable contact patterns. We anticipate that these genetically diverse maps of 3D chromatin contact will provide a reference for future work on the function and evolution of 3D chromatin contact variation across human populations.
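As an illustration of what a pairwise "3D divergence" score within one genomic window could look like, the sketch below compares small toy contact maps. The abstract does not state the metric used, so defining divergence as one minus the Pearson correlation of the flattened maps is an assumption made here for illustration, and the individual names and map values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of floats."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def divergence(map_a, map_b):
    """Assumed metric: 1 - Pearson correlation of the flattened contact maps."""
    flat_a = [v for row in map_a for v in row]
    flat_b = [v for row in map_b for v in row]
    return 1.0 - pearson(flat_a, flat_b)


# Toy predicted contact maps for one window (hypothetical values).
maps = {
    "ind1": [[1.0, 0.5, 0.1], [0.5, 1.0, 0.4], [0.1, 0.4, 1.0]],
    "ind2": [[1.0, 0.5, 0.1], [0.5, 1.0, 0.4], [0.1, 0.4, 1.0]],
    "ind3": [[1.0, 0.1, 0.6], [0.1, 1.0, 0.2], [0.6, 0.2, 1.0]],
}

# One divergence value per pair of individuals.
pairwise = {
    (a, b): divergence(maps[a], maps[b])
    for a, b in combinations(sorted(maps), 2)
}
```

In this toy setting, an individual with a rare divergent pattern (here `ind3`) would show elevated divergence against every other individual in the window.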
ABSTRACT
Oxytocin receptor (Oxtr) signaling influences complex social behaviors in diverse species, including social monogamy in prairie voles. How Oxtr regulates specific components of social attachment behaviors and the neural mechanisms mediating them remains unknown. Here, we examine prairie voles lacking Oxtr and demonstrate that pair bonding comprises distinct behavioral modules: the preference for a bonded partner, and the rejection of novel potential mates. Our longitudinal study of social attachment shows that Oxtr sex-specifically influences early interactions between novel partners facilitating the formation of partner preference. Additionally, Oxtr suppresses promiscuity towards novel potential mates following pair bonding, contributing to rejection. Oxtr function regulates coordinated patterns of gene expression in regions implicated in attachment behaviors and regulates the expression of oxytocin in the paraventricular nucleus of the hypothalamus, a principal source of oxytocin. Thus, Oxtr controls genetically separable components of pair bonding behaviors and coordinates development of the neural substrates of attachment.
ABSTRACT
Fecal microbial transplantation (FMT) offers promise for treating ulcerative colitis (UC), though the mechanisms underlying treatment failure are unknown. This study harnessed longitudinally collected colonic biopsies (n = 38) and fecal samples (n = 179) from 19 adults with mild-to-moderate UC undergoing serial FMT in which antimicrobial pre-treatment and delivery mode (capsules versus enema) were assessed for clinical response (≥ 3 points decrease from the pre-treatment Mayo score). Colonic biopsies underwent dual RNA-Seq; fecal samples underwent parallel 16S rRNA and shotgun metagenomic sequencing as well as untargeted metabolomic analyses. Pre-FMT, the colonic mucosa of non-responsive (NR) patients harbored an increased burden of bacteria, including Bacteroides, that expressed more antimicrobial resistance genes compared to responsive (R) patients. NR patients also exhibited muted mucosal expression of innate immune antimicrobial response genes. Post-FMT, NR and R fecal microbiomes and metabolomes exhibited significant divergence. NR metabolomes had elevated concentrations of immunostimulatory compounds including sphingomyelins, lysophospholipids and taurine. NR fecal microbiomes were enriched for Bacteroides fragilis and Bacteroides salyersiae strains that encoded genes capable of taurine production. These findings suggest that both effective mucosal microbial clearance and reintroduction of bacteria that reshape luminal metabolism associate with FMT success and that persistent mucosal and fecal colonization by antimicrobial-resistant Bacteroides species may contribute to FMT failure.
Subject(s)
Bacteroides, Ulcerative Colitis, Fecal Microbiota Transplantation, Feces, Intestinal Mucosa, Humans, Ulcerative Colitis/microbiology, Ulcerative Colitis/therapy, Ulcerative Colitis/metabolism, Male, Female, Feces/microbiology, Bacteroides/genetics, Adult, Intestinal Mucosa/microbiology, Intestinal Mucosa/metabolism, Middle Aged, Gastrointestinal Microbiome, Treatment Failure, 16S Ribosomal RNA/genetics, Metabolome
ABSTRACT
Recent studies have highlighted the impact of both transcription and transcripts on 3D genome organization, particularly its dynamics. Here, we propose a deep learning framework, called AkitaR, that leverages both genome sequences and genome-wide RNA-DNA interactions to investigate the roles of chromatin-associated RNAs (caRNAs) on genome folding in HFFc6 cells. In order to disentangle the cis- and trans-regulatory roles of caRNAs, we have compared models with nascent transcripts, trans-located caRNAs, open chromatin data, or DNA sequence alone. Both nascent transcripts and trans-located caRNAs improve the models' predictions, especially at cell-type-specific genomic regions. Analyses of feature importance scores reveal the contribution of caRNAs at TAD boundaries, chromatin loops and nuclear sub-structures such as nuclear speckles and nucleoli to the models' predictions. Furthermore, we identify non-coding RNAs (ncRNAs) known to regulate chromatin structures, such as MALAT1 and NEAT1, as well as several new RNAs, RNY5, RPPH1, POLG-DT and THBS1-IT1, that might modulate chromatin architecture through trans-interactions in HFFc6. Our modeling also suggests that transcripts from Alus and other repetitive elements may facilitate chromatin interactions through trans R-loop formation. Our findings provide insights and generate testable hypotheses about the roles of caRNAs in shaping chromatin organization.
Subject(s)
Chromatin, Deep Learning, Chromatin/metabolism, Chromatin/genetics, Humans, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism, Cell Line, RNA/metabolism, RNA/genetics, DNA/metabolism, DNA/genetics
ABSTRACT
The evolution of the modern human brain was accompanied by distinct molecular and cellular specializations, which underpin our diverse cognitive abilities but also increase our susceptibility to neurological diseases. These features, some specific to humans and others shared with related species, manifest during different stages of brain development. In this multi-stage process, neural stem cells proliferate to produce a large and diverse progenitor pool, giving rise to excitatory or inhibitory neurons that integrate into circuits during further maturation. This process unfolds over varying time scales across species and has progressively become slower in the human lineage, with differences in tempo correlating with differences in brain size, cell number and diversity, and connectivity. Here we introduce the terms 'bradychrony' and 'tachycrony' to describe slowed and accelerated developmental tempos, respectively. We review how recent technical advances across disciplines, including advanced engineering of in vitro models, functional comparative genetics and high-throughput single-cell profiling, are leading to a deeper understanding of how specializations of the human brain arise during bradychronic neurodevelopment. Emerging insights point to a central role for genetics, gene-regulatory networks, cellular innovations and developmental tempo, which together contribute to the establishment of human specializations during various stages of neurodevelopment and at different points in evolution.
Subject(s)
Biological Evolution, Brain, Animals, Humans, Brain/anatomy & histology, Brain/cytology, Brain/growth & development, Brain/metabolism, Gene Regulatory Networks, In Vitro Techniques, Neural Stem Cells/cytology, Neural Stem Cells/physiology, Neurogenesis, Neurons/cytology, Neurons/physiology, Organ Size, Single-Cell Analysis, Time Factors, Neural Inhibition
ABSTRACT
In frontotemporal lobar degeneration (FTLD), pathological protein aggregation in specific brain regions is associated with declines in human-specialized social-emotional and language functions. In most patients, disease protein aggregates contain either TDP-43 (FTLD-TDP) or tau (FTLD-tau). Here, we explored whether FTLD-associated regional degeneration patterns relate to regional gene expression of human accelerated regions (HARs), conserved sequences that have undergone positive selection during recent human evolution. To this end, we used structural neuroimaging from patients with FTLD and human brain regional transcriptomic data from controls to identify genes expressed in FTLD-targeted brain regions. We then integrated primate comparative genomic data to test our hypothesis that FTLD targets brain regions linked to expression levels of recently evolved genes. In addition, we asked whether genes whose expression correlates with FTLD atrophy are enriched for genes that undergo cryptic splicing when TDP-43 function is impaired. We found that FTLD-TDP and FTLD-tau subtypes target brain regions with overlapping and distinct gene expression correlates, highlighting many genes linked to neuromodulatory functions. FTLD atrophy-correlated genes were strongly enriched for HARs. Atrophy-correlated genes in FTLD-TDP showed greater overlap with TDP-43 cryptic splicing genes and genes with more numerous TDP-43 binding sites compared with atrophy-correlated genes in FTLD-tau. Cryptic splicing genes were enriched for HAR genes, and vice versa, but this effect was due to the confounding influence of gene length. Analyses performed at the individual-patient level revealed that the expression of HAR genes and cryptically spliced genes within putative regions of disease onset differed across FTLD-TDP subtypes. 
Overall, our findings suggest that FTLD targets brain regions that have undergone recent evolutionary specialization and provide intriguing potential leads regarding the transcriptomic basis for selective vulnerability in distinct FTLD molecular-anatomical subtypes.
Subject(s)
Brain, Frontotemporal Lobar Degeneration, Humans, Frontotemporal Lobar Degeneration/genetics, Frontotemporal Lobar Degeneration/metabolism, Brain/metabolism, Brain/pathology, Male, Female, Aged, DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Middle Aged, tau Proteins/genetics, tau Proteins/metabolism, Atrophy/genetics, Animals, Molecular Evolution, Gene Expression/genetics
ABSTRACT
SUMMARY: The increasing development of sequence-based machine learning models has raised the demand for manipulating sequences for this application. However, existing approaches to edit and evaluate genome sequences using models have limitations, such as incompatibility with structural variants, challenges in identifying responsible sequence perturbations, and the need for vcf file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing and supporting in silico mutagenesis experiments. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences. AVAILABILITY AND IMPLEMENTATION: SuPreMo was written in Python, and can be run using only one line of code to generate both sequences and 3D genome disruption scores. The codebase, instructions for installation and use, and tutorials are on the GitHub page: https://github.com/ketringjoni/SuPreMo.
Subject(s)
Machine Learning, Software, Computer Simulation, Computational Biology/methods, Humans, Mutagenesis
ABSTRACT
Nucleotide changes in gene regulatory elements are important determinants of neuronal development and diseases. Using massively parallel reporter assays in primary human cells from mid-gestation cortex and cerebral organoids, we interrogated the cis-regulatory activity of 102,767 open chromatin regions, including thousands of sequences with cell type-specific accessibility and variants associated with brain gene regulation. In primary cells, we identified 46,802 active enhancer sequences and 164 variants that alter enhancer activity. Activity was comparable in organoids and primary cells, suggesting that organoids provide an adequate model for the developing cortex. Using deep learning we decoded the sequence basis and upstream regulators of enhancer activity. This work establishes a comprehensive catalog of functional gene regulatory elements and variants in human neuronal development.
Subject(s)
Cerebral Cortex, Neurogenesis, Organoids, Humans, Cerebral Cortex/embryology, Cerebral Cortex/metabolism, Chromatin/metabolism, Chromatin/genetics, Deep Learning, Genetic Enhancer Elements, Developmental Gene Expression Regulation, Neurogenesis/genetics, Neurons/metabolism, Organoids/metabolism, Nucleic Acid Regulatory Sequences, Genetic Promoter Regions, Transcriptional Regulatory Elements
ABSTRACT
Cis-regulatory elements (CREs) interact with trans regulators to orchestrate gene expression, but how transcriptional regulation is coordinated in multi-gene loci has not been experimentally defined. We sought to characterize the CREs controlling dynamic expression of the adjacent costimulatory genes CD28, CTLA4 and ICOS, encoding regulators of T cell-mediated immunity. Tiling CRISPR interference (CRISPRi) screens in primary human T cells, both conventional and regulatory subsets, uncovered gene-, cell subset- and stimulation-specific CREs. Integration with CRISPR knockout screens and assay for transposase-accessible chromatin with sequencing (ATAC-seq) profiling identified trans regulators influencing chromatin states at specific CRISPRi-responsive elements to control costimulatory gene expression. We then discovered a critical CCCTC-binding factor (CTCF) boundary that reinforces CRE interaction with CTLA4 while also preventing promiscuous activation of CD28. By systematically mapping CREs and associated trans regulators directly in primary human T cell subsets, this work overcomes longstanding experimental limitations to decode context-dependent gene regulatory programs in a complex, multi-gene locus critical to immune homeostasis.
Subject(s)
CD28 Antigens, CTLA-4 Antigen, Chromatin, Gene Expression Regulation, Humans, CTLA-4 Antigen/genetics, CD28 Antigens/genetics, Chromatin/genetics, Chromatin/metabolism, T-Lymphocytes/immunology, T-Lymphocytes/metabolism, Inducible T-Cell Co-Stimulator Protein/genetics, Inducible T-Cell Co-Stimulator Protein/metabolism, CCCTC-Binding Factor/metabolism, CCCTC-Binding Factor/genetics, CRISPR-Cas Systems
ABSTRACT
Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.
Subject(s)
Alternative Splicing, Brain, Developmental Gene Expression Regulation, Mental Disorders, Humans, Atlases as Topic, Autism Spectrum Disorder/genetics, Brain/metabolism, Brain/growth & development, Brain/embryology, Gene Regulatory Networks, Genome-Wide Association Study, Protein Isoforms/genetics, Protein Isoforms/metabolism, Quantitative Trait Loci, Schizophrenia/genetics, Transcriptome, Mental Disorders/genetics
ABSTRACT
Ticks are increasingly important vectors of human and agricultural diseases. While many studies have focused on tick-borne bacteria, far less is known about tick-associated viruses and their roles in public health or tick physiology. To address this, we investigated patterns of bacterial and viral communities across two field populations of western black-legged ticks (Ixodes pacificus). Through metatranscriptomic analysis of 100 individual ticks, we quantified taxon prevalence, abundance, and co-occurrence with other members of the tick microbiome. In addition to commonly found tick-associated microbes, we assembled 11 novel RNA virus genomes from Rhabdoviridae, Chuviridae, Picornaviridae, Phenuiviridae, Reoviridae, Solemoviridae, Narnaviridae and two highly divergent RNA virus genomes lacking sequence similarity to any known viral families. We experimentally verified the presence of these in I. pacificus ticks across several life stages. We also unexpectedly identified numerous virus-like transcripts that are likely encoded by tick genomic DNA, and which are distinct from known endogenous viral element-mediated immunity pathways in invertebrates. Taken together, our work reveals that I. pacificus ticks carry a greater diversity of viruses than previously appreciated, in some cases resulting in evolutionarily acquired virus-like transcripts. Our findings highlight how pervasive and intimate tick-virus interactions are, with major implications for both the fundamental biology and vectorial capacity of I. pacificus ticks.
IMPORTANCE: Ticks are increasingly important vectors of disease, particularly in the United States where expanding tick ranges and intrusion into previously wild areas have resulted in increasing human exposure to ticks. Emerging human pathogens have been identified in ticks at an increasing rate, and yet little is known about the full community of microbes circulating in various tick species, a crucial first step to understanding how they interact with each other and their tick host, as well as their ability to cause disease in humans. We investigated the bacterial and viral communities of the Western blacklegged tick in California and found 11 previously uncharacterized viruses circulating in this population.
Subject(s)
Ixodes, Animals, Ixodes/virology, Ixodes/microbiology, Transcriptome, Messenger RNA/genetics, Microbiota/genetics, Viral Genome/genetics, RNA Viruses/genetics, RNA Viruses/isolation & purification, Bacteria/genetics, Bacteria/virology, Bacteria/isolation & purification
ABSTRACT
CellWalker2 is a graph diffusion-based method for single-cell genomics data integration. It extends the CellWalker model by incorporating hierarchical relationships between cell types, providing estimates of statistical significance, and adding data structures for analyzing multi-omics data so that gene expression and open chromatin can be jointly modeled. Our open-source software enables users to annotate cells using existing ontologies and to probabilistically match cell types between two or more contexts, including across species. CellWalker2 can also map genomic regions to cell ontologies, enabling precise annotation of elements derived from bulk data, such as enhancers, genetic variants, and sequence motifs. Through simulation studies, we show that CellWalker2 performs better than existing methods in cell type annotation and mapping. We then use data from the brain and immune system to demonstrate CellWalker2's ability to discover cell type-specific regulatory programs and both conserved and divergent cell type relationships in complex tissues.
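To make the idea of graph diffusion concrete, the following is a minimal sketch of a random walk with restart on a small graph that mixes cell-type label nodes and cell nodes. It is not CellWalker2's implementation or API. The graph, the restart probability, and the function name `random_walk_with_restart` are assumptions chosen only to show how diffusion can spread a label's influence to nearby cells.

```python
def random_walk_with_restart(adj, seed, restart=0.5, n_iter=200):
    """Diffuse probability mass from one seed node over a weighted graph.

    adj is a square adjacency matrix (list of lists); every node is assumed
    to have at least one edge. Returns the approximate stationary visiting
    probabilities of a walker that jumps back to `seed` with probability
    `restart` at every step.
    """
    n = len(adj)
    # Row-normalise edge weights into transition probabilities.
    trans = [[w / sum(row) for w in row] for row in adj]
    p = [0.0] * n
    p[seed] = 1.0
    for _ in range(n_iter):
        nxt = [0.0] * n
        for i in range(n):
            for j in range(n):
                nxt[j] += (1.0 - restart) * p[i] * trans[i][j]
        nxt[seed] += restart
        p = nxt
    return p


# Hypothetical toy graph: nodes 0 and 1 are labels A and B, nodes 2-4 are cells.
# Edges: A-cell2, A-cell3, B-cell4, cell2-cell3, cell3-cell4.
adjacency = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
]

influence_from_a = random_walk_with_restart(adjacency, seed=0)
influence_from_b = random_walk_with_restart(adjacency, seed=1)

# Assign each cell the label whose diffusion reaches it most strongly.
cell_labels = {
    cell: "A" if influence_from_a[cell] > influence_from_b[cell] else "B"
    for cell in (2, 3, 4)
}
```

In this toy setting, a cell is assigned the label whose diffusion reaches it most strongly, which lets indirect edges (here, cell-cell similarity) contribute alongside direct label-cell edges.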
ABSTRACT
The investigation of chromatin organization in single cells holds great promise for identifying causal relationships between genome structure and function. However, analysis of single-molecule data is hampered by extreme yet inherent heterogeneity, making it challenging to determine the contributions of individual chromatin fibers to bulk trends. To address this challenge, we propose ChromaFactor, a novel computational approach based on non-negative matrix factorization that deconvolves single-molecule chromatin organization datasets into their most salient primary components. ChromaFactor provides the ability to identify trends accounting for the maximum variance in the dataset while simultaneously describing the contribution of individual molecules to each component. Applying our approach to two single-molecule imaging datasets across different genomic scales, we find that these primary components demonstrate significant correlation with key functional phenotypes, including active transcription, enhancer-promoter distance, and genomic compartment. ChromaFactor offers a robust tool for understanding the complex interplay between chromatin structure and function on individual DNA molecules, pinpointing which subpopulations drive functional changes and fostering new insights into cellular heterogeneity and its implications for bulk genomic phenomena.
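The decomposition described above can be sketched with a small non-negative matrix factorization written from scratch. This is not the ChromaFactor code. It uses the standard Lee-Seung multiplicative updates for the Frobenius objective, and the toy data (rows standing in for single molecules, columns for flattened structural features) plus all function names are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    approx = matmul(w, h)
    return sum(
        (v[i][j] - approx[i][j]) ** 2
        for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


# Toy data: each row mixes two hypothetical structural patterns.
pattern_a = [1.0, 0.2, 0.1, 0.9]
pattern_b = [0.1, 1.0, 0.8, 0.2]
weights = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (2.0, 0.0), (1.0, 1.0), (0.2, 1.5)]
data = [[a * x + b * y for x, y in zip(pattern_a, pattern_b)] for a, b in weights]

w_short, h_short = nmf(data, rank=2, n_iter=1)
w, h = nmf(data, rank=2, n_iter=200)
err_short = frobenius_error(data, w_short, h_short)
err_long = frobenius_error(data, w, h)
```

Here the rows of `h` play the role of components and each row of `w` gives one molecule's contribution to every component, which is the property the abstract highlights.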
ABSTRACT
The most prevalent microbial eukaryote in the human gut is Blastocystis, an obligate commensal protist also common in many other vertebrates. Blastocystis is descended from free-living stramenopile ancestors; how it has adapted to thrive within humans and a wide range of hosts is unclear. Here, we cultivated six Blastocystis strains spanning the diversity of the genus and generated highly contiguous, annotated genomes with long-read DNA-seq, Hi-C, and RNA-seq. Comparative genomics between these strains and two closely related stramenopiles with different lifestyles, the lizard gut symbiont Proteromonas lacertae and the free-living marine flagellate Cafeteria burkhardae, reveal the evolutionary history of the Blastocystis genus. We find substantial gene content variability between Blastocystis strains. Blastocystis isolated from an herbivorous tortoise has many plant carbohydrate metabolizing enzymes, some horizontally acquired from bacteria, likely reflecting fermentation within the host gut. In contrast, human-isolated Blastocystis have gained many heat shock proteins, and we find numerous subtype-specific expansions of host-interfacing genes, including cell adhesion and cell surface glycan genes. In addition, we observe that human-isolated Blastocystis have substantial changes in gene structure, including shortened introns and intergenic regions, as well as genes lacking canonical termination codons. Finally, our data indicate that the common ancestor of Blastocystis lost nearly all ancestral genes for heterokont flagella morphology, including cilia proteins, microtubule motor proteins, and ion channel proteins. Together, these findings underscore the huge functional variability within the Blastocystis genus and provide candidate genes for the adaptations these lineages have undergone to thrive in the gut microbiomes of diverse vertebrates.
ABSTRACT
Phenotypic divergence between closely related species, including bonobos and chimpanzees (genus Pan), is largely driven by variation in gene regulation. The 3D structure of the genome mediates gene expression; however, genome folding differences in Pan are not well understood. Here, we apply machine learning to predict genome-wide 3D genome contact maps from DNA sequence for 56 bonobos and chimpanzees, encompassing all five extant lineages. We use a pairwise approach to estimate 3D divergence between individuals from the resulting contact maps in 4,420 1 Mb genomic windows. While most pairs were similar, ~17% were predicted to be substantially divergent in genome folding. The most dissimilar maps were largely driven by single individuals with rare variants that produce unique 3D genome folding in a region. We also identified 89 genomic windows where bonobo and chimpanzee contact maps substantially diverged, including several windows harboring genes associated with traits implicated in Pan phenotypic divergence. We used in silico mutagenesis to identify 51 3D-modifying variants in these bonobo-chimpanzee divergent windows, finding that 34 (66.67%) induce genome folding changes via CTCF binding motif disruption. Our results reveal 3D genome variation at the population level and identify genomic regions where changes in 3D folding may contribute to phenotypic differences in our closest living relatives.
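The in silico mutagenesis step can be illustrated with a deliberately simplified sketch: each candidate variant is introduced into a reference sequence on its own, and the change in a model's output is recorded. The real analysis scores variants with a sequence-to-contact-map model. The `toy_model` below, which only counts occurrences of a short CTCF-like motif, is a hypothetical stand-in, as are the sequence, the variants, and the function names.

```python
def toy_model(seq, motif="CCCTC"):
    """Hypothetical stand-in for a predictive model: counts motif occurrences."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def apply_snv(seq, pos, alt):
    """Return a copy of seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def in_silico_mutagenesis(reference, variants, model):
    """Score each variant alone by the absolute change in model output."""
    ref_score = model(reference)
    return {
        (pos, alt): abs(model(apply_snv(reference, pos, alt)) - ref_score)
        for pos, alt in variants
    }


reference = "AAGTCCCTCAAGT"                    # contains one motif at positions 4-8
variants = [(1, "C"), (6, "A"), (11, "T")]     # hypothetical single-nucleotide variants
effects = in_silico_mutagenesis(reference, variants, toy_model)

# Variants that change the prediction; here, the one that disrupts the motif.
modifying = [variant for variant, effect in effects.items() if effect > 0]
```

Testing variants one at a time is what allows an individual variant to be credited with a predicted folding change.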