ABSTRACT
In stationary-phase Escherichia coli, Dps (DNA-binding protein from starved cells) is the most abundant protein component of the nucleoid. Dps compacts DNA into a dense complex and protects it from damage. Dps has also been proposed to act as a global regulator of transcription. Here, we directly examine the impact of Dps-induced compaction of DNA on the activity of RNA polymerase (RNAP). Strikingly, deleting the dps gene decompacted the nucleoid but did not significantly alter the transcriptome and only mildly altered the proteome during stationary phase. Complementary in vitro assays demonstrated that Dps blocks restriction endonucleases but not RNAP from binding DNA. Single-molecule assays demonstrated that Dps dynamically condenses DNA around elongating RNAP without impeding its progress. We conclude that Dps forms a dynamic structure that excludes some DNA-binding proteins yet allows RNAP free access to the buried genes, a behavior characteristic of phase-separated organelles.
Subjects
Bacterial DNA, Escherichia coli Proteins/metabolism, Escherichia coli/metabolism, Bacterial Gene Expression Regulation, Genetic Transcription, Bacterial Outer Membrane Proteins/metabolism, DNA Restriction Enzymes/metabolism, DNA-Binding Proteins/metabolism, DNA-Directed RNA Polymerases/metabolism, Holoenzymes/metabolism, Fluorescence Microscopy, Polystyrenes/chemistry, Proteome, RNA Sequence Analysis, Mechanical Stress, Transcriptome
ABSTRACT
Increasing natural resistance and resilience in plants is key to ensuring food security in a changing climate. Breeders improve these traits by crossing cultivars with their wild relatives and introgressing specific alleles through meiotic recombination. However, some genomic regions are devoid of recombination, especially in crosses between divergent genomes, and this limits the combinations of desirable alleles. Here, we used pooled-pollen sequencing to build a map of recombinant and non-recombinant regions between tomato and five wild relatives commonly used in introgressive tomato breeding. We detected hybrid-specific recombination coldspots that underscore the role of structural variation in modifying recombination patterns and maintaining genetic linkage in interspecific crosses. Crossover regions and coldspots are strongly associated with specific transposable element (TE) superfamilies whose chromatin is differentially accessible in somatic and meiotic cells. About two-thirds of the genome consists of conserved coldspots, located mostly in the pericentromeres and enriched in retrotransposons. The coldspots also harbor genes associated with agronomic traits and stress resistance, revealing undesired consequences of linkage drag and possible barriers to breeding. We present examples of linkage drag that could potentially be resolved by pairing tomato with other wild species. Overall, this catalogue will help breeders better understand crossover localization and make informed decisions when generating new tomato varieties.
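As a rough illustration of how a map of recombinant and non-recombinant regions can be summarized, the sketch below works as follows. It bins crossover positions along one chromosome and treats every bin without a crossover as cold. It then intersects the cold bins of several crosses to obtain "conserved" coldspots. The function names, the fixed bin size, and the zero-crossover rule are assumptions made for illustration. They are not the pipeline used in the study.

```python
def cold_bins(co_positions, chrom_length, bin_size):
    """Return indices of fixed-size bins that contain no crossover (CO) position."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    hit = {pos // bin_size for pos in co_positions}  # bins with at least one CO
    return {i for i in range(n_bins) if i not in hit}


def conserved_cold_bins(per_cross_positions, chrom_length, bin_size):
    """Bins that are cold in every cross (hypothetical 'conserved coldspot' rule)."""
    per_cross = [cold_bins(p, chrom_length, bin_size) for p in per_cross_positions]
    return set.intersection(*per_cross)  # needs at least one cross


# Toy example: CO positions (bp) on a 40 bp chromosome in two hypothetical hybrids.
print(cold_bins([5, 12, 14], 40, 10))                    # {2, 3}
print(conserved_cold_bins([[5, 12, 14], [25]], 40, 10))  # {3}
```

In practice, bin size and sequencing depth determine how reliably "zero crossovers" can be distinguished from "too few reads to tell".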
Subjects
Plant Genome, Genetic Recombination, Solanum lycopersicum, Solanum lycopersicum/genetics, Genetic Hybridization, Genetic Linkage, Plant Breeding, Retroelements/genetics, Genetic Crossing Over, Meiosis/genetics, Chromosome Mapping, Plant Chromosomes/genetics, Alleles
ABSTRACT
Many plant transcription factors (TFs) are multifunctional and regulate growth and development in more than one tissue. These TFs can generally associate with different protein partners depending on the tissue type, thereby regulating tissue-specific sets of target genes. However, how interaction specificity is ensured remains largely unclear. Here, we examine protein-protein interaction specificity using subfunctionalized co-orthologs of the FRUITFULL (FUL) subfamily of MADS-domain TFs. In Arabidopsis, FUL is multifunctional, playing important roles in flowering and fruiting, whereas in tomato these functions are partially divided between the co-orthologs FUL1 and FUL2. By linking protein sequence and function, we discovered a key amino acid motif that determines the interaction specificity of MADS-domain TFs. In Arabidopsis FUL, this motif determines the interaction with AGAMOUS and SEPALLATA proteins and is linked to the regulation of a subset of targets. This insight offers great opportunities to dissect the biological functions of multifunctional MADS TFs.
ABSTRACT
Phytoplasmas are pathogenic bacteria that reprogram plant host development for their own benefit. Previous studies have characterized a few different phytoplasma effector proteins that destabilize specific plant transcription factors. However, these are only a small fraction of the potential effectors used by phytoplasmas; therefore, the molecular mechanisms through which phytoplasmas modulate their hosts require further investigation. To obtain further insights into the phytoplasma infection mechanisms, we generated a protein-protein interaction network between a broad set of phytoplasma effectors and a large, unbiased collection of Arabidopsis thaliana transcription factors and transcriptional regulators. We found widespread, but specific, interactions between phytoplasma effectors and host transcription factors, especially those related to host developmental processes. In particular, many unrelated effectors target specific sets of TCP transcription factors, which regulate plant development and immunity. Comparison with other host-pathogen protein interaction networks shows that phytoplasma effectors have unusual targets, indicating that phytoplasmas have evolved a unique and unusual infection strategy. This study contributes a rich and solid data source that guides further investigations of the functions of individual effectors, as demonstrated for some herein. Moreover, the dataset provides insights into the underlying molecular mechanisms of phytoplasma infection.
Subjects
Arabidopsis, Phytoplasma, Transcription Factors/genetics, Transcription Factors/metabolism, Plants/metabolism, Arabidopsis/metabolism, Protein Interaction Mapping, Plant Diseases/microbiology
ABSTRACT
Natural populations of Arabidopsis thaliana provide powerful systems for studying the adaptation of wild plant species. Previous research has predominantly focused on global populations or accessions collected from regions with diverse climates. However, little is known about the genetics underlying adaptation in regions with mild environmental clines. We examined a diversity panel of 192 A. thaliana accessions collected in the Netherlands, a region with limited climatic variation. Despite the relatively uniform climate, we identified evidence of local adaptation within this population. Notably, semidwarf accessions carrying mutations in the GIBBERELLIC ACID REQUIRING 5 (GA5) gene occur at relatively high frequency near the coast, and these accessions displayed enhanced tolerance to high wind velocities. Additionally, we evaluated the performance of the population under iron deficiency and found that allelic variation in the FE SUPEROXIDE DISMUTASE 3 (FSD3) gene affects tolerance to low iron levels. Moreover, we explored patterns of local adaptation to environmental clines in temperature and precipitation and observed that allelic variation at LA RELATED PROTEIN 1C (LARP1c) likely affects drought tolerance. The genetic variation in this diversity panel, collected in a region with mild environmental clines, is not only comparable to that in collections sampled over larger geographic ranges but also rich enough to elucidate the genetic and environmental factors underlying natural plant adaptation.
ABSTRACT
It has been known for decades that codon usage contributes to translation efficiency and hence to protein production levels. However, its role in protein synthesis is still only partly understood, and this hampers the design of synthetic genes for efficient protein production. In this study, we generated a synonymous codon-randomized library of the complete coding sequence of red fluorescent protein. Protein production levels and full coding sequences were determined for 1459 gene variants in Escherichia coli. We applied different machine learning approaches to these data to reveal correlations between codon usage and protein production. Interestingly, protein production levels can be predicted relatively accurately (Pearson correlation of 0.762) by a Random Forest model that relies only on the sequence information of the first eight codons. In this region, close to the translation initiation site, mRNA secondary structure, rather than the Codon Adaptation Index (CAI), is the key determinant of protein production. This study clearly demonstrates the key role of the codons at the start of the coding sequence. Furthermore, these results imply that the commonly used CAI-based codon optimization of the full coding sequence is not a very effective strategy. Instead, one should focus on optimizing protein production by reducing mRNA secondary structure formation within the first few codons.
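For reference, the Codon Adaptation Index that the study contrasts with mRNA secondary structure is commonly defined as follows. Each codon gets a relative adaptiveness w, which is its usage divided by that of the most-used synonymous codon in a reference set of highly expressed genes. The CAI of a gene is then the geometric mean of w over the gene's codons. The minimal sketch below uses a tiny, hypothetical reference table. Real implementations cover all multi-codon amino acids and must handle codons with zero reference counts.

```python
import math

# Hypothetical reference codon counts (e.g. from highly expressed genes) for two
# amino acids only; a real table covers all multi-codon amino acids.
REFERENCE_COUNTS = {
    "K": {"AAA": 80, "AAG": 20},
    "G": {"GGT": 50, "GGC": 40, "GGA": 5, "GGG": 5},
}


def relative_adaptiveness(reference):
    """w(codon) = count(codon) / count(most used synonymous codon)."""
    weights = {}
    for synonymous in reference.values():
        best = max(synonymous.values())
        for codon, count in synonymous.items():
            weights[codon] = count / best
    return weights


def cai(sequence, weights):
    """Geometric mean of w over the codons of `sequence` found in `weights`.

    Codons missing from the table (e.g. Met, Trp, stop codons) are skipped;
    a codon with w == 0 would make math.log fail and needs a pseudocount.
    """
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


weights = relative_adaptiveness(REFERENCE_COUNTS)
print(cai("AAAGGT", weights))  # 1.0: only the preferred codons are used
print(cai("AAGGGA", weights))  # about 0.158, the geometric mean of 0.25 and 0.1
```

Passing only `sequence[:24]` would restrict the score to the first eight codons. The study's finding, however, is that within this window mRNA secondary structure, not CAI, is the better predictor.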
Subjects
Escherichia coli, Machine Learning, Random Allocation, Codon/genetics, Codon/metabolism, Messenger RNA/metabolism, Escherichia coli/genetics, Escherichia coli/metabolism, Protein Biosynthesis
ABSTRACT
The selective pressure of pathogen-host symbiosis drives adaptation. How these interactions shape the metabolism of pathogens is largely unknown. Here, we use comparative genomics to systematically analyze the metabolic networks of oomycetes, a diverse group of eukaryotes that includes saprotrophs as well as animal and plant pathogens, the latter causing devastating diseases with significant economic and/or ecological impacts. In our analyses of 44 oomycete species, we uncover considerable variation in metabolism that can be linked to lifestyle differences. Comparisons of metabolic gene content reveal that plant pathogenic oomycetes have a bipartite metabolism consisting of a conserved core and an accessory set. The accessory set can be associated with the degradation of defense compounds that plants produce when challenged by pathogens. Obligate biotrophic oomycetes have smaller metabolic networks, and taxonomically distant biotrophic lineages display convergent evolution through repeated gene losses in both the conserved and the accessory sets of metabolic genes. When investigating to what extent the metabolic networks of obligate biotrophs differ from those of hemibiotrophic plant pathogens, we observe that the losses of metabolic enzymes in obligate biotrophs are not random and that gene losses predominantly affect the terminal branches of the metabolic networks. Our analyses represent the first metabolism-focused comparison of oomycetes at this scale and will contribute to a better understanding of the evolution of oomycete metabolism in relation to lifestyle adaptation. Numerous oomycete species are devastating plant pathogens that cause major damage in crops and natural ecosystems. Their interactions with hosts are shaped by strong selection, but how selection drives the adaptation of primary metabolism to a pathogenic lifestyle is not yet well established.
Through pan-genome and metabolic network analyses of distantly related oomycete pathogens and their nonpathogenic relatives, we reveal considerable lifestyle- and lineage-specific adaptations. This study contributes to a better understanding of metabolic adaptations in pathogenic oomycetes in relation to lifestyle, host, and environment, and the findings will help pinpoint potential targets for disease control. Copyright © 2024 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Subjects
Oomycetes, Metabolic Networks and Pathways/genetics, Physiological Adaptation, Plant Diseases/microbiology, Host-Pathogen Interactions, Phylogeny, Symbiosis, Plants/microbiology, Plants/metabolism, Genomics
ABSTRACT
BACKGROUND: Breeding lettuce (Lactuca sativa L.), the most important leafy vegetable worldwide, for enhanced disease resistance and resilience relies on multiple wild relatives to provide the necessary genetic diversity. In this study, we constructed a super-pangenome comprising 474 accessions from four Lactuca species, which represent the primary, secondary, and tertiary gene pools. We included 68 newly sequenced accessions to improve cultivar coverage and to add important foundational breeding lines. RESULTS: In the super-pangenome, we find substantial presence/absence variation (PAV) and copy-number variation (CNV). Functional enrichment analyses of core and variable genes show that transcriptional regulators are conserved, whereas disease resistance genes are variable. PAV genome-wide association studies (GWAS) and CNV-GWAS are largely congruent with single-nucleotide polymorphism (SNP) GWAS. Importantly, they also identify several major novel quantitative trait loci (QTL) for resistance against Bremia lactucae in variable regions that are absent from the reference lettuce genome. We demonstrate the usability of the super-pangenome by identifying the likely origin of non-reference resistance loci in the wild relatives Lactuca serriola, Lactuca saligna, and Lactuca virosa. CONCLUSIONS: The super-pangenome offers a broader view of the lettuce gene repertoire, revealing relevant loci that are not in the reference genome(s). The methodology and data presented here provide a strong basis for research into PAVs, CNVs, and other variation underlying important biological traits of lettuce and other crops.
Subjects
Plant Genome, Genome-Wide Association Study, Lactuca, Quantitative Trait Loci, Lactuca/genetics, Single Nucleotide Polymorphism, Disease Resistance/genetics, DNA Copy Number Variations, Plant Genes, Plant Breeding/methods, Genetic Variation
ABSTRACT
BACKGROUND AND AIMS: The Brassiceae tribe encompasses many economically important crops and exhibits high intraspecific and interspecific phenotypic variation. After a shared whole-genome triplication (WGT) event (Br-α, ~15.9 million years ago), differential lineage diversification and genomic changes contributed to wide divergence in the morphology, biochemistry, and physiology underlying photosynthesis-related traits. Here, we study the C3 species Hirschfeldia incana because it displays high photosynthetic rates under high-light conditions. Our aim was to elucidate the evolutionary history that gave rise to the genome of H. incana and its high-photosynthesis traits. METHODS: We reconstructed a chromosome-level genome assembly for H. incana (Nijmegen, v2.0) using nanopore and chromosome conformation capture (Hi-C) technologies. The assembly is 409 Mb in size with an N50 of 52 Mb, a 10× improvement over the previously published scaffold-level v1.0 assembly. We then used the updated assembly and annotation to investigate the WGT history of H. incana in a comparative phylogenomic framework based on the Brassiceae ancestral genomic blocks and related diploidized crops. KEY RESULTS: Hirschfeldia incana (x=7) shares extensive genome collinearity with Raphanus sativus (x=9). These two species share some commonalities with Brassica rapa and B. oleracea (A genome, x=10, and C genome, x=9, respectively) and other similarities with B. nigra (B genome, x=8). Phylogenetic analysis revealed that H. incana and R. sativus form a monophyletic clade between the Brassica A/C and B genomes. We postulate that the H. incana and R. sativus genomes result from hybridization or introgression between the Brassica A/C and B genome types. Our results might explain the discrepancy among published studies regarding the phylogenetic placement of H. incana and R. sativus relative to the "Triangle of U" species.
Expression analysis of WGT-retained gene copies revealed sub-genome expression divergence, likely due to neo- or sub-functionalization. Finally, we highlight genes associated with the physiological, biochemical, and anatomical adaptive changes observed in H. incana, which likely facilitate its high-photosynthesis traits under high light. CONCLUSIONS: The improved H. incana genome assembly and annotation, together with the results presented in this work, will be a valuable resource for future research. That research can unravel the genetic basis of the species' ability to maintain high photosynthetic efficiency in high-light conditions and thereby improve photosynthesis for enhanced agricultural production.
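The N50 reported for the assembly is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of all assembled bases. A minimal sketch of that calculation follows. The input is a toy list of lengths, not the H. incana assembly.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # accumulate from the longest contig downwards
        if running >= half:
            return length
    raise ValueError("no contigs given")


print(n50([2, 3, 4, 5, 6]))  # 5: 6 + 5 = 11 bases, at least half of 20
```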
ABSTRACT
Photosynthesis is a key process in sustaining plant and human life. Improving the photosynthetic capacity of agricultural crops is an attractive means to increase their yields. While the core mechanisms of photosynthesis are highly conserved in C3 plants, these mechanisms are very flexible, allowing considerable diversity in photosynthetic properties. Among this diversity is the maintenance of high photosynthetic light-use efficiency at high irradiance as identified in a small number of exceptional C3 species. Hirschfeldia incana, a member of the Brassicaceae family, is such an exceptional species, and because it is easy to grow, it is an excellent model for studying the genetic and physiological basis of this trait. Here, we present a reference genome of H. incana and confirm its high photosynthetic light-use efficiency. While H. incana has the highest photosynthetic rates found so far in the Brassicaceae, the light-saturated assimilation rates of closely related Brassica rapa and Brassica nigra are also high. The H. incana genome has extensively diversified from that of B. rapa and B. nigra through large chromosomal rearrangements, species-specific transposon activity, and differential retention of duplicated genes. Duplicated genes in H. incana, B. rapa, and B. nigra that are involved in photosynthesis and/or photoprotection show a positive correlation between copy number and gene expression, providing leads into the mechanisms underlying the high photosynthetic efficiency of these species. Our work demonstrates that the H. incana genome serves as a valuable resource for studying the evolution of high photosynthetic light-use efficiency and enhancing photosynthetic rates in crop species.
Subjects
Brassica rapa, Brassicaceae, Humans, Brassicaceae/metabolism, Photosynthesis/genetics, Agricultural Crops, Phenotype
ABSTRACT
Meiotic recombination is a biological process of key importance in breeding, because it generates genetic diversity and enables the development of novel or agronomically relevant haplotypes. In crop tomato, recombination is curtailed, as shown by linkage disequilibrium that decays over longer distances and by reduced diversity compared with wild relatives. Here, we compared domesticated and wild populations of tomato and found an overall conserved recombination landscape, with local changes in effective recombination rate in specific genomic regions. We also studied how recombination hotspots changed during domestication and found that loss of such hotspots is associated with selective sweeps, most notably in the pericentromeric heterochromatin. We detected footprints of genetic changes and structural variants, including some associated with transposable elements, that are linked to hotspot divergence during domestication. These changes likely cause fine-scale alterations in recombination patterns and result in linkage drag.
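Linkage disequilibrium between two biallelic loci is often summarized as r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), with D = p_AB - p_A p_B. The rate at which r² decays with physical distance reflects how much effective recombination separates the loci. A minimal sketch follows. It assumes phased haplotypes with alleles coded 0/1, and it is not necessarily the estimator used in the study.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    `haplotypes` is a list of (allele_at_locus_1, allele_at_locus_2) pairs coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n  # joint frequency
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denominator


print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 1.0: alleles always co-occur
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0: all four haplotypes equally common
```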
Subjects
Domestication, Solanum lycopersicum, DNA Transposable Elements/genetics, Solanum lycopersicum/genetics, Plant Breeding, Genetic Recombination
ABSTRACT
SUMMARY: The ever-increasing number of sequenced genomes necessitates the development of pangenomic approaches for comparative genomics. Introduced in 2016, PanTools is a platform that allows pangenome construction, homology grouping and pangenomic read mapping. The use of graph database technology makes PanTools versatile, applicable from small viral genomes like SARS-CoV-2 up to large plant or animal genomes like tomato or human. Here, we present our third major update to PanTools that enables the integration of functional annotations and provides both gene-level analyses and phylogenetics. AVAILABILITY AND IMPLEMENTATION: PanTools is implemented in Java 8 and released under the GNU GPLv3 license. Software and documentation are available at https://git.wur.nl/bioinformatics/pantools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
COVID-19, SARS-CoV-2, Humans, Phylogeny, SARS-CoV-2/genetics, Software, Viral Genome
ABSTRACT
The availability of genomes for many species has advanced our understanding of the non-protein-coding fraction of the genome. Comparative genomics has proven to be an invaluable approach for the systematic, genome-wide identification of conserved non-protein-coding elements (CNEs). However, for many non-mammalian model species, including chicken, our ability to interpret the functional importance of variants overlapping CNEs has been limited by current genomic annotations, which rely on a single type of information (e.g. conservation). Here, we studied CNEs in chicken using a combination of population genomics and comparative genomics. To investigate the functional importance of variants found in CNEs, we developed a ch(icken) Combined Annotation-Dependent Depletion (chCADD) model, a variant effect prediction tool first introduced for humans and later for mouse and pig. We show that 73 Mb of the chicken genome has been conserved across more than 280 million years of vertebrate evolution. The vast majority of the conserved elements are in non-protein-coding regions. These regions display SNP densities and allele frequency distributions characteristic of genomic regions constrained by purifying selection. By annotating SNPs with the chCADD score, we are able to pinpoint specific subregions of the CNEs that are of higher functional importance. This is supported by the finding that SNPs in these subregions are associated with known disease genes in humans, mice, and rats. Taken together, our findings indicate that CNEs harbor variants of functional significance that should be investigated further alongside protein-coding mutations. We therefore anticipate that chCADD will be of great use to the scientific community and breeding companies in future functional studies in chicken.
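CADD-type models are commonly described as reporting two scores. Besides a raw score, they give a PHRED-like scaled score derived from the rank of a variant's raw score among all scored variants, -10·log10(rank/N). A scaled score of 10 therefore marks the top 10% and 20 the top 1%. The sketch below illustrates that rank-based scaling on toy scores. It assumes that higher raw scores mean more deleterious variants and ignores ties. It is not chCADD's actual code.

```python
import math


def phred_scale(raw_scores):
    """Map raw scores (higher = more deleterious) to -10 * log10(rank / N).

    Rank 1 is the highest raw score; ties are broken arbitrarily in this sketch.
    """
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, index in enumerate(order, start=1):
        scaled[index] = -10 * math.log10(rank / n)
    return scaled


scores = phred_scale([i / 100 for i in range(100)])  # 100 toy raw scores
print(round(scores[99], 6))  # 20.0: the top-ranked 1% of variants
print(round(scores[90], 6))  # 10.0: rank 10 of 100, i.e. the top 10%
```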
Subjects
Chickens/genetics, Intergenic DNA/genetics, Genomics/methods, Alleles, Animals, Conserved Sequence/genetics, Intergenic DNA/metabolism, Molecular Evolution, Gene Frequency/genetics, Genetic Variation/genetics, Genome/genetics, Introns/genetics, Metagenomics/methods, Single Nucleotide Polymorphism/genetics, Sequence Analysis/methods
ABSTRACT
T-cell prolymphocytic leukemia (T-PLL) is mostly characterized by aberrant expansion of small- to medium-sized prolymphocytes with a mature post-thymic phenotype, high disease aggressiveness, and poor prognosis. However, T-PLL is more heterogeneous, with a wide range of clinical, morphological, and molecular features, and this occasionally impedes diagnosis. We hypothesized that T-PLL consists of phenotypic and/or genotypic subgroups that may explain the heterogeneity of the disease. Multi-dimensional immunophenotyping and gene expression profiling did not reveal clear T-PLL subgroups, and no clear T-cell receptor α or β CDR3 skewing was observed between different T-PLL cases. We revealed that microRNA (miRNA) expression is aberrant and often heterogeneous in T-PLL. We identified 35 miRNAs that were aberrantly expressed in T-PLL, with the miR-200c/141 cluster being the most differentially expressed. High miR-200c/141 and miR-181a/181b expression was significantly correlated with increased white blood cell counts and poor survival. Furthermore, overexpression of miR-200c/141 correlated with downregulation of their targets ZEB2 and TGFβR3. It also correlated with aberrant TGFβ1-induced phosphorylation of SMAD2 (p-SMAD2) and SMAD3 (p-SMAD3), indicating that the TGFβ pathway is affected in T-PLL. Our results thus highlight a potential role for aberrantly expressed oncogenic miRNAs in T-PLL and pave the way for new therapeutic targets in this disease.
Subjects
T-Cell Prolymphocytic Leukemia, MicroRNAs, Gene Expression Profiling, Humans, T-Cell Prolymphocytic Leukemia/diagnosis, T-Cell Prolymphocytic Leukemia/genetics, T-Cell Prolymphocytic Leukemia/therapy, Lymphocytes, MicroRNAs/genetics, Transforming Growth Factor beta, Zinc Finger E-box Binding Homeobox 2/genetics
ABSTRACT
Sesquiterpene synthases (STSs) catalyze the formation of a large class of plant volatiles called sesquiterpenes. While thousands of putative STS sequences from diverse plant species are available, only a small number of them have been functionally characterized. Sequence identity-based screening for desired enzymes, often used in biotechnological applications, is difficult to apply here as STS sequence similarity is strongly affected by species. This calls for more sophisticated computational methods for functionality prediction. We investigate the specificity of precursor cation formation in these elusive enzymes. By inspecting multi-product STSs, we demonstrate that STSs have a strong selectivity towards one precursor cation. We use a machine learning approach combining sequence and structure information to accurately predict precursor cation specificity for STSs across all plant species. We combine this with a co-evolutionary analysis on the wealth of uncharacterized putative STS sequences, to pinpoint residues and distant functional contacts influencing cation formation and reaction pathway selection. These structural factors can be used to predict and engineer enzymes with specific functions, as we demonstrate by predicting and characterizing two novel STSs from Citrus bergamia.
Subjects
Alkyl and Aryl Transferases/metabolism, Molecular Evolution, Machine Learning, Plants/enzymology, Sesquiterpenes/metabolism, Alkyl and Aryl Transferases/chemistry, Amino Acid Sequence, Cations, Protein Conformation, Amino Acid Sequence Homology, Substrate Specificity
ABSTRACT
The genotype-phenotype link is a major research topic in the life sciences but remains difficult to disentangle. Part of the complexity arises from the number of genes contributing to the observed phenotype. Despite the vast increase in molecular data, pinpointing the causal variant underlying a phenotype of interest is still challenging. In this study, we present an approach to map causal variation and molecular pathways underlying important phenotypes in pigs. We prioritize variation by integrating predicted variant impact scores (pCADD), functional genomic information, and associated phenotypes in other mammalian species. We demonstrate the efficacy of our approach by reporting known and novel causal variants, many of which affect non-coding sequences. By accelerating the discovery of novel causal variants and of the molecular mechanisms affecting these phenotypes in pigs, our approach helps disentangle the biology behind them. This information on molecular mechanisms could be applicable to other mammalian species, including humans.
Subjects
Genetic Variation, Genomics, Animals, Genotype, Mammals, Phenotype, Swine/genetics
ABSTRACT
Förster resonance energy transfer (FRET) is a useful phenomenon in biomolecular investigations, as it can be leveraged for nanoscale measurements. The optical signals produced by such experiments can be analyzed by fitting a statistical model. Several software tools exist to fit such models in an unsupervised manner but lack the flexibility to adapt to different experimental setups and require local installations. Here, we propose to fit models to optical signals more intuitively by adopting a semisupervised approach, in which the user interactively guides the model to fit a given data set, and introduce FRETboard, a web tool that allows users to provide such guidance. We show that our approach is able to closely reproduce ground truth FRET statistics in a wide range of simulated single-molecule scenarios and correctly estimate parameters for up to 11 states. On in vitro data, we retrieve parameters identical to those obtained by laborious manual classification in a fraction of the required time. Moreover, we designed FRETboard to be easily extendable to other models, allowing it to adapt to future developments in FRET measurement and analysis.
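The signals that such tools model are typically per-frame FRET efficiency traces. As a simplified illustration, assume background-corrected donor and acceptor intensities and no gamma correction. The apparent efficiency is then E = I_A / (I_A + I_D), and an ideal efficiency maps to a donor-acceptor distance through E = 1 / (1 + (r/R0)^6). The nearest-mean state assignment at the end is only a naive stand-in for the statistical models FRETboard fits. All function names are hypothetical.

```python
def apparent_efficiency(donor, acceptor):
    """Per-frame apparent FRET efficiency E = I_A / (I_A + I_D) (no gamma correction)."""
    return [a / (a + d) for d, a in zip(donor, acceptor)]  # assumes a + d > 0


def distance_from_efficiency(e, r0):
    """Invert E = 1 / (1 + (r / R0)**6) for an ideal dye pair with Förster radius r0."""
    return r0 * (1 / e - 1) ** (1 / 6)


def assign_states(efficiencies, state_means):
    """Naive stand-in for model fitting: label each frame with its nearest state mean."""
    return [min(range(len(state_means)), key=lambda k: abs(e - state_means[k]))
            for e in efficiencies]


trace = apparent_efficiency([80, 20, 70], [20, 80, 30])
print(trace)                            # [0.2, 0.8, 0.3]
print(assign_states(trace, [0.2, 0.8]))  # [0, 1, 0]
print(distance_from_efficiency(0.5, 5.0))  # 5.0: E = 0.5 exactly at r = R0
```

Unlike this per-frame rule, a hidden Markov model or similar statistical model also accounts for transitions between states over time. This matters for noisy single-molecule traces.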
Subjects
Fluorescence Resonance Energy Transfer, Software, Nanotechnology
ABSTRACT
We profiled the crossover (CO) and gene conversion (GC) landscape by genome-wide screening of pooled pollen samples from a single interspecific F1 hybrid between tomato (Solanum lycopersicum) and its wild relative Solanum pimpinellifolium, using linked-read sequencing of the haploid nuclei. We observed a striking overlap between CO coldspots in the male gametes and those in our previously established F6 recombinant inbred line (RIL) population. COs were overrepresented in non-coding regions, namely gene promoters and 5' UTRs. Poly-A/T and AT-rich motifs were enriched in the 1 kb promoter regions flanking the CO sites. Non-crossover-associated allelic and ectopic GCs were detected on most chromosomes, confirming that, besides CO, GC is also a source of genetic diversity and genome plasticity in tomato. Furthermore, we identified processed break junctions that point to the involvement of both homology-directed and non-homology-directed repair pathways. This suggests that the recombination machinery in tomato is more complex than currently anticipated.
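To illustrate how COs and GCs can be distinguished along a single haploid genome, the sketch below labels each ordered marker by parental origin and collapses identical neighbours into runs. A short run flanked on both sides by the other parent is treated as a putative GC tract. Every other change of parent is treated as a CO, located between the last marker of one run and the first marker of the next. The function name, the 'L'/'P' labels, and the `min_run` threshold are hypothetical. This is not the linked-read pipeline used in the study.

```python
def call_events(positions, origins, min_run=3):
    """Call crossovers (CO) and putative gene-conversion (GC) tracts from ordered markers.

    positions: sorted marker coordinates on one chromosome of one haploid nucleus.
    origins:   parental call per marker, e.g. 'L' (S. lycopersicum) or 'P' (S. pimpinellifolium).
    A run shorter than `min_run` markers with the same parent on both sides is treated
    as a GC tract; every other change of parent is treated as a CO.
    """
    runs = []  # [origin, first_index, last_index]
    for i, origin in enumerate(origins):
        if runs and runs[-1][0] == origin:
            runs[-1][2] = i
        else:
            runs.append([origin, i, i])

    conversions, kept = [], []
    for j, run in enumerate(runs):
        short = run[2] - run[1] + 1 < min_run
        flanked = 0 < j < len(runs) - 1 and runs[j - 1][0] == runs[j + 1][0]
        if short and flanked:
            conversions.append((positions[run[1]], positions[run[2]]))
        else:
            kept.append(run)

    crossovers = []
    for previous, current in zip(kept, kept[1:]):
        if previous[0] != current[0]:
            # The breakpoint lies between the last marker of one run and the first of the next.
            crossovers.append((positions[previous[2]], positions[current[1]]))
    return crossovers, conversions


positions = [100, 200, 300, 400, 500, 600, 700, 800, 900]
print(call_events(positions, "LLLPLLLPP"))  # ([(700, 800)], [(400, 400)])
```

With real data, a single discordant marker may equally reflect a genotyping error. Marker density and read support therefore determine how far such calls can be trusted.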
Subjects
Meiosis/physiology, Solanum lycopersicum/cytology, Solanum lycopersicum/genetics, 5' Untranslated Regions/genetics, Plant Chromosomes/genetics, Genetic Crossing Over, Plant Genome/genetics, Genotype, Meiosis/genetics, Genetic Promoter Regions/genetics, DNA Sequence Analysis
ABSTRACT
BACKGROUND: Bacterial plant pathogens of the genus Pectobacterium are responsible for a wide spectrum of diseases in plants, including important crops such as potato, tomato, lettuce, and banana. The genetic diversity underlying virulence and host specificity can be investigated at the genome level with a comprehensive comparative approach called pangenomics. We applied a pangenomic approach, using newly developed functionalities in PanTools, to analyze the complex phylogeny of the Pectobacterium genus. Specifically, we used the pangenome to investigate genetic differences between virulent and avirulent strains of P. brasiliense, a species that causes potato blackleg and is predominantly present in Western Europe. RESULTS: We generated a multilevel pangenome for Pectobacterium comprising 197 strains across 19 species, including type strains, with a focus on P. brasiliense. The extensive phylogenetic analysis of the Pectobacterium genus showed robust, distinct clades, with the greatest detail provided by 452,388 parsimony-informative single-nucleotide polymorphisms identified in single-copy orthologs. The average Pectobacterium genome consists of 47% core genes, 1% unique genes, and 52% accessory genes. Using the pangenome, we zoomed in on differences between virulent and avirulent P. brasiliense strains and identified 86 genes associated with virulent strains. We found that gene organization is highly structured and linked to gene conservation, function, and transcriptional orientation. CONCLUSION: The pangenome analysis demonstrates that evolution in Pectobacteria is a highly dynamic process, including gene acquisitions (partly in clusters), genome rearrangements, and gene losses. Pectobacterium species are typically not characterized by a set of species-specific genes. Instead, they present new combinations of genes drawn from the shared gene pool.
A multilevel pangenomic approach, fusing DNA, protein, biological function, taxonomic group, and phenotypes, facilitates studies in a flexible taxonomic context.
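The core/accessory/unique split reported for Pectobacterium follows the usual pangenome convention. A homology group is core if present in every genome, unique if present in only one, and accessory otherwise. A minimal sketch of that classification follows, assuming presence/absence of homology groups has already been computed. It does not use the PanTools API, and the group and genome identifiers are made up. Note also that the percentages in the abstract are averages per genome, whereas this sketch reports fractions of homology groups.

```python
from collections import Counter


def classify_groups(presence):
    """presence: homology group id -> set of genome ids that contain the group."""
    n_genomes = len(set().union(*presence.values()))
    classes = {}
    for group, genomes in presence.items():
        if len(genomes) == n_genomes:
            classes[group] = "core"       # found in every genome
        elif len(genomes) == 1:
            classes[group] = "unique"     # found in a single genome
        else:
            classes[group] = "accessory"  # found in some, but not all, genomes
    return classes


def class_fractions(classes):
    """Fraction of homology groups in each class."""
    counts = Counter(classes.values())
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


presence = {"hg1": {"A", "B", "C"}, "hg2": {"A"}, "hg3": {"A", "B"}, "hg4": {"A", "B", "C"}}
print(classify_groups(presence))  # hg1/hg4 core, hg2 unique, hg3 accessory
print(class_fractions(classify_groups(presence)))  # core 0.5, unique 0.25, accessory 0.25
```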
Subjects
Pectobacterium, Solanum tuberosum, Europe, Gene Pool, Pectobacterium/genetics, Phylogeny, Plant Diseases, Solanum tuberosum/genetics
ABSTRACT
MOTIVATION: As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Because protein sizes, folds, and topologies vary widely, an attractive approach is to embed protein structures into fixed-length vectors. These vectors can then be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment-based, which is both time-consuming and ineffective for distantly related proteins. Library- or model-based approaches, on the other hand, depend on a small library of fragments or require a trained model, and neither may generalize well. RESULTS: We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, including fast structure similarity search, unsupervised clustering, and structure classification, both across proteins from different superfamilies and within the same family. AVAILABILITY AND IMPLEMENTATION: Python code is available at https://git.wur.nl/durai001/geometricus.
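The shape-mer idea can be sketched in a few lines. The sketch summarizes each residue-centred fragment of Cα coordinates by quantities that do not change under rotation or translation. It then rounds them to integers and counts the resulting tuples. This is a simplified, pure-Python illustration under stated assumptions. It uses the three invariants of the second-order moment (covariance) tensor, and the fixed sequence window is hypothetical. Geometricus's own invariants and fragmentation may differ, and this is not its API.

```python
import math
from collections import Counter


def moment_invariants(points):
    """Rotation- and translation-invariant summary of a 3D point set.

    Returns the trace, the sum of principal 2x2 minors, and the determinant of the
    second-order central moment (covariance) tensor, i.e. the coefficients of its
    characteristic polynomial.
    """
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    c = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[k] - mean[k] for k in range(3)]
        for a in range(3):
            for b in range(3):
                c[a][b] += d[a] * d[b] / n
    j1 = c[0][0] + c[1][1] + c[2][2]
    j2 = (c[0][0] * c[1][1] + c[1][1] * c[2][2] + c[0][0] * c[2][2]
          - c[0][1] ** 2 - c[1][2] ** 2 - c[0][2] ** 2)
    j3 = (c[0][0] * (c[1][1] * c[2][2] - c[1][2] ** 2)
          - c[0][1] * (c[0][1] * c[2][2] - c[1][2] * c[0][2])
          + c[0][2] * (c[0][1] * c[1][2] - c[1][1] * c[0][2]))
    return j1, j2, j3


def shape_mer_counts(ca_coords, half_window=2, resolution=1.0):
    """Discretize each full-length residue-centred fragment into a 'shape-mer' and count them."""
    counts = Counter()
    for i in range(half_window, len(ca_coords) - half_window):
        fragment = ca_coords[i - half_window:i + half_window + 1]
        shape_mer = tuple(round(j * resolution) for j in moment_invariants(fragment))
        counts[shape_mer] += 1
    return counts


# Toy helix-like Cα trace (coordinates in Å, made up for illustration).
coords = [(2.3 * math.cos(1.745 * t), 2.3 * math.sin(1.745 * t), 1.5 * t) for t in range(10)]
print(shape_mer_counts(coords))
```

Because the invariants are rounded, two nearly identical fragments can occasionally fall on either side of a rounding boundary. The `resolution` parameter trades this sensitivity against how coarse the shape-mer vocabulary is.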