ABSTRACT
The COVID-19 pandemic saw the emergence of various Variants of Concern (VOCs) that took the world by storm, often replacing the ones that preceded them. The characteristic mutant constellations of these VOCs increased viral transmissibility and infectivity. Their origin and evolution remain puzzling. With the help of data mining efforts and the GISAID database, a chronology of 22 haplotypes described viral evolution up until 23 July 2023. Since the three-dimensional atomic structures of proteins corresponding to the identified haplotypes are not available, ab initio methods were used here. Regions of intrinsic disorder proved to be important for viral evolution, as evidenced by the targeted change to the nucleocapsid (N) protein at the sequence, structure, and biochemical levels. The linker region of the N-protein, which binds to the RNA genome and self-oligomerizes for efficient genome packaging, was greatly impacted by mutations throughout the pandemic, followed by changes in structure and intrinsic disorder. Remarkably, VOC constellations acted cooperatively to balance the more extreme effects of individual haplotypes. Our strategy of mapping the dynamic evolutionary landscape of genetically linked mutations to the N-protein structure demonstrates the utility of ab initio modeling and deep learning tools for therapeutic intervention.
Subjects
COVID-19, Coronavirus Nucleocapsid Proteins, Haplotypes, SARS-CoV-2, SARS-CoV-2/genetics, SARS-CoV-2/chemistry, Humans, COVID-19/virology, COVID-19/epidemiology, Coronavirus Nucleocapsid Proteins/genetics, Coronavirus Nucleocapsid Proteins/chemistry, Mutation, Molecular Evolution, Molecular Models, Seasons, Phosphoproteins/genetics, Phosphoproteins/chemistry, Protein Conformation, Nucleocapsid Proteins/genetics, Nucleocapsid Proteins/chemistry
ABSTRACT
Virus taxonomy uses a Linnaean-like subsumption hierarchy to classify viruses into taxonomic units at species and higher rank levels. Virus species are considered monophyletic groups of mobile genetic elements (MGEs) often delimited by the phylogenetic analysis of aligned genomic or metagenomic sequences. Taxonomic units are assumed to be independent organizational, functional and evolutionary units that follow a 'natural history' rationale. Here, I use phylogenomic and other arguments to show that viruses are not self-standing genetically-driven systems acting as evolutionary units. Instead, they are crucial components of holobionts, which are units of biological organization that dynamically integrate the genetic, epigenetic, physiological and functional properties of their co-evolving members. Remarkably, phylogenomic analyses show that viruses share protein domains and loops with cells throughout history via massive processes of reticulate evolution, helping spread evolutionary innovations across a wider taxonomic spectrum. Thus, viruses are not merely MGEs or microbes. Instead, their genomes and proteomes conduct cellularly integrated processes akin to those cataloged by the GO Consortium. This prompts the generation of compositional hierarchies that replace the 'is-a-kind-of' logic with an 'is-a-part-of' logic to better describe the mereology of integrated cellular and viral makeup. My analysis demands a new paradigm that integrates virus taxonomy into a modern evolutionarily centered taxonomy of organisms.
Subjects
Molecular Evolution, Viral Genome, Phylogeny, Protein Domains, Viruses, Viruses/genetics, Viruses/classification, Genomics/methods
ABSTRACT
The principle of continuity demands the existence of prior molecular states and common ancestors responsible for extant macromolecular structure. Here, we focus on the emergence and evolution of loop prototypes, the elemental architects of protein domain structure. Phylogenomic reconstruction spanning superkingdoms and viruses generated an evolutionary chronology of prototypes with six distinct evolutionary phases defining a most parsimonious evolutionary progression of cellular life. Each phase was marked by strategic prototype accumulation shaping the structures and functions of common ancestors. The last universal common ancestor (LUCA) of cells and viruses and the last universal cellular ancestor (LUCellA) defined stem lines that were structurally and functionally complex. The evolutionary saga highlighted transformative forces. LUCA lacked biosynthetic ribosomal machinery, while the pivotal LUCellA lacked essential DNA biosynthesis and modern transcription. Early proteins therefore relied on RNA for genetic information storage but appeared initially decoupled from it, hinting at transformative shifts of genetic processing. Urancestral loop types suggest advanced folding designs were present at an early evolutionary stage. An exploration of loop geometric properties revealed gradual replacement of prototypes with α-helix and β-strand bracing structures over time, paving the way for the dominance of other loop types. AlphaFold2-generated atomic models of prototype accretion described patterns of fold emergence. Our findings favor a 'processual' model of evolving stem lines aligned with Woese's vision of a communal world. This model prompts discussing the 'problem of ancestors' and the challenges that lie ahead for research in taxonomy, evolution and complexity.
Subjects
Molecular Evolution, Phylogeny, Proteins/genetics, Proteins/chemistry, Origin of Life, Molecular Models
ABSTRACT
The slow experimental acquisition of high-quality atomic structures of the rapidly changing proteins of the COVID-19 virus challenges vaccine and therapeutic drug development efforts. Fortunately, deep learning tools such as AlphaFold2 can quickly generate reliable models of atomic structure at experimental resolution. Current modeling studies have focused solely on definitions of mutant constellations of Variants of Concern (VOCs), leaving out the impact of haplotypes on protein structure. Here, we conduct a thorough comparative structural analysis of S-proteins belonging to major VOCs and corresponding latitude-delimited haplotypes that affect viral seasonal behavior. Our approach identified molecular regions of importance as well as patterns of structural recruitment. The S1 subunit hosted the majority of structural changes, especially those involving the N-terminal domain (NTD) and the receptor-binding domain (RBD). In particular, structural changes in the NTD were much greater than just translations in three-dimensional space, altering the sub-structures to greater extents. We also revealed a notable pattern of structural recruitment with the early VOCs Alpha and Delta behaving antagonistically by suppressing regions of structural change introduced by their corresponding haplotypes, and the current VOC Omicron behaving synergistically by amplifying or collecting structural change. Remarkably, haplotypes altering the galectin-like structure of the NTD were major contributors to seasonal behavior, supporting its putative environmental-sensing role. Our results provide an extensive view of the evolutionary landscape of the S-protein across the COVID-19 pandemic. This view will help predict important regions of structural change in future variants and haplotypes for more efficient vaccine and drug development.
ABSTRACT
Intrinsic disorder accounts for the flexibility of protein loops, molecular building blocks that are largely responsible for the processes and molecular functions of the living world. While loops likely represent early structural forms that served as intermediates in the emergence of protein structural domains, their origin and evolution remain poorly understood. Here, we conduct a phylogenomic survey of disorder in loop prototypes sourced from the ArchDB classification. Tracing prototypes associated with protein fold families along an evolutionary chronology revealed that ancient prototypes tended to be more disordered than their derived counterparts, with ordered prototypes developing later in evolution. This highlights the central evolutionary role of disorder and flexibility. While mean disorder increased with time, a minority of ordered prototypes exist that emerged early in evolutionary history, possibly driven by the need to preserve specific molecular functions. We also revealed the percolation of evolutionary constraints from higher to lower levels of organization. Percolation resulted in trade-offs between flexibility and rigidity that impacted prototype structure and geometry. Our findings provide a deep evolutionary view of the link between structure, disorder, flexibility, and function, as well as insights into the evolutionary role of intrinsic disorder in loops and their contribution to protein structure and function.
ABSTRACT
The structures and functions of proteins are embedded into the loop scaffolds of structural domains. Their origin and evolution remain mysterious. Here, we use a novel graph-theoretical approach to describe how modular and non-modular loop prototypes combine to form folded structures in protein domain evolution. Phylogenomic data-driven chronologies reoriented a bipartite network of loops and domains (and its projections) into 'waterfalls' depicting an evolving 'elementary functionome' (EF). Two primordial waves of functional innovation involving founder 'P-loop' and 'winged-helix' domains were accompanied by an ongoing emergence and reuse of structural and functional novelty. Metabolic pathways expanded before translation functionalities. A dual hourglass recruitment pattern transferred scale-free properties from loop to domain components of the EF network in generative cycles of hierarchical modularity. Modeling the evolutionary emergence of the oldest P-loop and winged-helix domains with AlphaFold2 uncovered rapid convergence towards folded structure, suggesting that a folding vocabulary exists in loops for protein fold repurposing and design.
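As an illustration of the bipartite loop-domain network and its projections mentioned above, the following minimal Python sketch builds both one-mode projections from a toy membership table. The domain and loop names, the data, and the choice of edge weights (counts of shared or co-occurring members) are assumptions made for illustration only; they are not the authors' dataset or method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: which loop prototypes occur in which domains.
domain_loops = {
    "domain_A": {"loop_1", "loop_2"},
    "domain_B": {"loop_2", "loop_3"},
    "domain_C": {"loop_4"},
}

def project_domains(domain_loops):
    """Domain-domain projection: edge weight = number of shared loop prototypes."""
    weights = {}
    for a, b in combinations(sorted(domain_loops), 2):
        shared = len(domain_loops[a] & domain_loops[b])
        if shared:
            weights[(a, b)] = shared
    return weights

def project_loops(domain_loops):
    """Loop-loop projection: edge weight = number of domains where both loops occur."""
    weights = defaultdict(int)
    for loops in domain_loops.values():
        for a, b in combinations(sorted(loops), 2):
            weights[(a, b)] += 1
    return dict(weights)

domain_projection = project_domains(domain_loops)
loop_projection = project_loops(domain_loops)
```

In this toy case only domain_A and domain_B are linked in the domain projection, because they are the only pair sharing a loop prototype.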
Subjects
Dermatitis, Humans, Embryonic Development, Phylogeny, Protein Domains, Translations
ABSTRACT
Taxonomical classification has preceded evolutionary understanding. For that reason, taxonomy has become a battleground fueled by knowledge gaps, technical limitations, and a priorism. Here we assess the current state of this challenging field, focusing on fallacies that are common in viral classification. We emphasize that viruses are crucial contributors to the genomic and functional makeup of holobionts, organismal communities that behave as units of biological organization. Consequently, viruses cannot be considered taxonomic units because they challenge crucial concepts of organismality and individuality. Instead, they should be considered processes that integrate virions and their hosts into life cycles. Viruses harbor phylogenetic signatures of genetic transfer that compromise monophyly and the validity of deep taxonomic ranks. A focus on building phylogenetic networks using alignment-free methodologies and molecular structure can help mitigate the impasse, at least in part. Finally, structural phylogenomic analysis challenges the polyphyletic scenario of multiple viral origins adopted by virus taxonomy and instead supports an ancient cellular origin of viruses. We therefore urge abandoning deep ranks and urgently reevaluating the validity of taxonomic units and principles of virus classification.
ABSTRACT
Biomolecular communication demands that interactions between parts of a molecular system act as scaffolds for message transmission. It also requires an organized system of signs (a communicative agency) for creating and transmitting meaning. The emergence of agency, the capacity to act in a given context and generate end-directed behaviors, has baffled evolutionary biologists for centuries. Here, I explore its emergence with knowledge grounded in over two decades of evolutionary genomic and bioinformatic exploration. Biphasic processes of growth and diversification exist that generate hierarchy and modularity in biological systems at widely ranging time scales. Similarly, a biphasic process exists in communication that constructs a message before it can be transmitted for interpretation. Transmission dissipates matter-energy and information and involves computation. Agency emerges when molecular machinery generates hierarchical layers of vocabularies in an entangled communication network clustered around the universal Turing machine of the ribosome. Computations canalize biological systems to perform biological functions in a dissipative quest to structure long-lived occurrents. This occurs within the confines of a "triangle of persistence" that maximizes invariance with trade-offs between economy, flexibility, and robustness. Thus, learning from previous historical and circumstantial experiences unifies modules in a hierarchy that expands the agency of systems.
Subjects
Cognition, Computational Biology, Humans, Biological Evolution
ABSTRACT
Many viral diseases exhibit seasonal behavior and can be affected by environmental stressors. Using time-series correlation charts extrapolated from worldwide data, we provide strong support for the seasonal development of COVID-19 regardless of the immunity of the population, behavioral changes, and the periodic appearance of new variants with higher rates of infectivity and transmissibility. Statistically significant latitudinal gradients were also observed with indicators of global change. Using the Environmental Performance Index (EPI) and State of Global Air (SoGA) metrics, a bilateral analysis of environmental health and ecosystem vitality effects showed associations with COVID-19 transmission. Air quality, pollution emissions, and other indicators showed strong correlations with COVID-19 incidence and mortality. Remarkably, EPI category and performance indicators also correlated with latitude, suggesting that cultural and psychological diversity in human populations impacts not only wealth and happiness but also planetary health at the latitudinal level. Looking forward, we conclude that there will be a need to disentangle the seasonal and global change effects of COVID-19, noting that countries that go against the health of the planet affect health in general.
ABSTRACT
Background: Variants of concern (VOCs) have been replacing each other during the still rampant COVID-19 pandemic. As a result, SARS-CoV-2 populations have evolved increasingly intricate constellations of mutations that often enhance transmissibility, disease severity, and other epidemiological characteristics. The origin and evolution of these constellations remain puzzling. Methods: Here we study the evolution of VOCs at the proteome level by analyzing about 12 million genomic sequences retrieved from GISAID on July 23, 2022. A total of 183,276 mutations were identified and filtered with a relevancy heuristic. The prevalence of haplotypes and free-standing mutations was then tracked monthly in various latitude corridors of the world. Results: A chronology of 22 haplotypes defined three phases driven by protein flexibility-rigidity, environmental sensing, and immune escape. A network of haplotypes illustrated the recruitment and coalescence of mutations into major VOC constellations and seasonal effects of decoupling and loss. Protein interaction networks mediated by haplotypes predicted communications impacting the structure and function of proteins, showing the increasingly central role of molecular interactions involving the spike (S), nucleocapsid (N), and membrane (M) proteins. Haplotype markers either affected fusogenic regions while spreading along the sequence of the S-protein or clustered around binding domains. Modeling of protein structure with AlphaFold2 showed that VOC Omicron and one of its haplotypes were major contributors to the distortion of the M-protein endodomain, which behaves as a receptor of other structural proteins during virion assembly. Remarkably, VOC constellations acted cooperatively to balance the more extreme effects of individual haplotypes. Conclusions: Our study uncovers seasonal patterns of emergence and diversification occurring amid a highly dynamic evolutionary landscape of bursts and waves. The mapping of genetically linked mutations to structures that sense environmental change with powerful ab initio modeling tools demonstrates the potential of deep learning for COVID-19 predictive intelligence and therapeutic intervention.
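The monthly tracking of haplotype prevalence by latitude corridor described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the record layout, the placeholder mutation labels (`mut_a`, `mut_b`, `mut_c`), the 30-degree band width, and the rule that a genome carries a haplotype only if it has every one of its mutations are all hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical records: (collection month, latitude in degrees, set of mutations).
records = [
    ("2021-01", 51.5, {"mut_a", "mut_b"}),
    ("2021-01", 48.9, {"mut_c"}),
    ("2021-01", -33.9, {"mut_a", "mut_b"}),
    ("2021-02", 40.7, {"mut_a", "mut_b", "mut_c"}),
]

def corridor(latitude, width=30):
    """Assign a latitude to a band such as '30 to 60' (band width is an assumption)."""
    lower = int(latitude // width) * width
    return f"{lower} to {lower + width}"

def monthly_prevalence(records, haplotype):
    """Fraction of genomes per (month, corridor) carrying every mutation of the haplotype."""
    totals = Counter()
    hits = Counter()
    for month, latitude, mutations in records:
        key = (month, corridor(latitude))
        totals[key] += 1
        if haplotype <= mutations:  # subset test: all haplotype mutations present
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}

prevalence = monthly_prevalence(records, {"mut_a", "mut_b"})
```

Plotting such fractions month by month for each corridor would give the kind of prevalence curves from which seasonal patterns could be read, although the study's actual filtering heuristic and corridor definitions are not specified in the abstract.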
Subjects
COVID-19, SARS-CoV-2, Humans, SARS-CoV-2/genetics, COVID-19/epidemiology, COVID-19/genetics, Haplotypes/genetics, Pandemics, Seasons
ABSTRACT
Recruitment is a pervasive activity of life that is at the center of novelty generation and persistence. Without recruitment, novelties cannot spread and biological systems cannot maintain identity through time. Here we explore the problem of identity and change unfolding in space and time. We illustrate recruitment operating at different timescales with metabolic networks, protein domain makeup, the functionome, and the rise of viral 'variants of concern' during the coronavirus disease 2019 (COVID-19) pandemic. We define persistence within a framework of fluxes of matter-energy and information and signal processing in response to internal and external challenges. A 'triangle of persistence' describing reuse, innovation and stasis defines a useful polytope in a phase space of trade-offs between economy, flexibility and robustness. We illustrate how the concept of temporal parts embraced by the perdurantist school provides a processual 4-dimensional 'worm' view of biology that is historical and atemporal. This view is made explicit with chronologies and evolving networks inferred with phylogenomic methodologies. Exploring the origin and evolution of the ribosome reveals recruitment of helical segments and/or large fragments of interacting rRNA molecules in a unification process of accretion that is counteracted by diversification. A biphasic (bow-tie) theory of module generation models this frustrated dynamics. Finally, we further elaborate on a theory of entanglement that takes advantage of the dimensionality reduction offered by holographic principles to propose that short and long-distance interactions are responsible for the increasingly granular and tangled structure of biological systems.
Subjects
COVID-19, Humans, Phylogeny
ABSTRACT
SARS-CoV-2 continues to evolve, even after implementation of population-wide vaccination, as can be observed by an increasing number of mutations over time. Compared to responses by the United States and European countries, the disease mitigation strategies employed by the Australian government have been swift and effective. This provides a unique opportunity to study the emergence of variants of concern (VOCs) at many latitude levels in a country that has been able to control infection for the majority of the pandemic. In the present study, we explored the occurrence and accumulation of major mutations typical of VOCs in different regions of Australia and the effects that latitude has on the establishment of VOC-induced disease. We also studied the constellation of mutations characteristic of VOCs to determine if the mutation sets acted as haplotypes. Our goal was to explore processes behind the emergence of VOCs as the viral disease progresses towards becoming endemic. Most reported COVID-19 cases were in the largest cities located within a 30°S to 50°S latitude corridor previously identified to be associated with seasonal behavior. Accumulation plots of individual amino acid variants of major VOCs showed that the first major haplotypes reported worldwide were also present in Australia. A classification of accumulation plots revealed the existence of 18 additional haplotypes associated with VOCs Alpha, Delta and Omicron. Core mutant constellations for these VOCs and curve overlaps for variants in each set of haplotypes demonstrated significant decoupling patterns, suggesting processes of emergence. Finally, construction of a "haplotype network" that describes the viral population landscape of Australia throughout the COVID-19 pandemic revealed significant and unanticipated seasonal patterns of emergence and diversification. These results provide a unique window into our evolutionary understanding of a human pathogen of great significance. They may guide future research into mitigation and prediction strategies for future VOCs.
ABSTRACT
Many biological systems across scales of size and complexity exhibit a time-varying complex network structure that emerges and self-organizes as a result of interactions with the environment. Network interactions optimize some intrinsic cost functions that are unknown and involve, for example, energy efficiency, robustness, resilience, and frailty. A wide range of networks exist in biology, from gene regulatory networks important for organismal development, protein interaction networks that govern physiology and metabolism, and neural networks that store and convey information to networks of microbes that form microbiomes within hosts, animal contact networks that underlie social systems, and networks of populations on the landscape connected by migration. Increasing availability of extensive (big) data is amplifying our ability to quantify biological networks. Similarly, theoretical methods that describe network structure and dynamics are being developed. Beyond static networks representing snapshots of biological systems, collections of longitudinal data series can help either to define and characterize network dynamics over time or to analyze the dynamics constrained to networked architectures. Moreover, due to interactions with the environment and other biological systems, a biological network may not be fully observable. Also, subnetworks may emerge and disappear as a result of the need for the biological system to cope with, for example, invaders or new information flows. The confluence of these developments renders tractable the question of how the structure of biological networks predicts and controls network dynamics. In particular, there may be structural features that result in homeostatic networks with specific higher-order statistics (e.g., multifractal spectrum), which maintain stability over time through robustness and/or resilience to perturbation. Alternatively, plastic networks may respond to perturbation by (adaptive to catastrophic) shifts in structure. Here, we explore the opportunity for discovering universal laws connecting the structure of biological networks with their function, positioning them on the spectrum of time-evolving network structure, that is, dynamics of networks, from highly stable to exquisitely sensitive to perturbation. If such general laws exist, they could transform our ability to predict the response of biological systems to perturbations, an increasingly urgent priority in the face of anthropogenic changes to the environment that affect life across the gamut of organizational scales.
Subjects
Algorithms, Animals, Homeostasis
ABSTRACT
Seasonal behaviour is an attribute of many viral diseases. Like other 'winter' RNA viruses, infections caused by the causative agent of COVID-19, SARS-CoV-2, appear to exhibit significant seasonal changes. Here we discuss the seasonal behaviour of COVID-19, emerging viral phenotypes, viral evolution, and how the mutational landscape of the virus affects the seasonal attributes of the disease. We propose that the multiple seasonal drivers behind infectious disease spread (and the spread of COVID-19 specifically) are in 'trade-off' relationships and can be better described within a framework of a 'triangle of viral persistence' modulated by the environment, physiology, and behaviour. This 'trade-off' exists as one trait cannot increase without a decrease in another. We also propose that molecular components of the virus can act as sensors of environment and physiology, and could represent molecular culprits of seasonality. We searched for flexible protein structures capable of being modulated by the environment and identified a galectin-like fold within the N-terminal domain of the spike protein of SARS-CoV-2 as a potential candidate. Tracking the prevalence of mutations in this structure resulted in the identification of a hemisphere-dependent seasonal pattern driven by mutational bursts. We propose that the galectin-like structure is a frequent target of mutations because it helps the virus evade or modulate the physiological responses of the host to further its spread and survival. The flexible regions of the N-terminal domain should now become a focus for mitigation through vaccines and therapeutics and for prediction and informed public health decision making.
ABSTRACT
INTRODUCTION: While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. AREAS COVERED: Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late 'big bang' of domain combinations. EXPERT OPINION: Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence. The latter fosters diversity. Chronologically, protein evolution mirrors folding by combining supersecondary structures into domains, developing translation machinery to facilitate folding speed and stability, and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs.
Subjects
Molecular Evolution, Proteome, Genomics, Humans, Phylogeny, Protein Folding, Proteome/genetics
ABSTRACT
Communication is an undisputed central activity of life that requires an evolving molecular language. It conveys meaning through messages and vocabularies. Here, I explore the existence of a growing vocabulary in the molecules and molecular functions of the microbial world. There are clear correspondences between the lexicon, syntax, semantics, and pragmatics of language organization and the module, structure, function, and fitness paradigms of molecular biology. These correspondences are constrained by universal laws and engineering principles. Macromolecular structure, for example, follows quantitative linguistic patterns arising from statistical laws that are likely universal, including Zipf's law, a special case of the scale-free distribution; Heaps' law, which describes sublinear growth typical of economies of scale; and the Menzerath-Altmann law, which imposes size-dependent patterns of decreasing returns. Trade-off solutions between principles of economy, flexibility, and robustness define a "triangle of persistence" describing the impact of the environment on a biological system. The pragmatic landscape of the triangle interfaces with the syntax and semantics of molecular languages, which together with comparative and evolutionary genomic data can explain global patterns of diversification of cellular life. The vocabularies of proteins (proteomes) and functions (functionomes) revealed a significant universal lexical core supporting a universal common ancestor, an ancestral evolutionary link between Bacteria and Eukarya, and distinct reductive evolutionary strategies of language compression in Archaea and Bacteria. A "causal" word cloud strategy inspired by the dependency grammar paradigm used in catenae unfolded the evolution of lexical units associated with Gene Ontology terms at different levels of ontological abstraction. While Archaea holds the smallest, oldest, and most homogeneous vocabulary of all superkingdoms, Bacteria heterogeneously apportions a more complex vocabulary, and Eukarya pushes functional innovation through mechanisms of flexibility and robustness.
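For reference, the three statistical laws named in this abstract are commonly written in the forms below. These are the usual textbook statements from quantitative linguistics, not formulas taken from the article. The symbols are assumptions introduced here: $f(r)$ is the frequency of the item of rank $r$, $V(n)$ is the vocabulary size after $n$ tokens, $y$ is mean constituent size, $x$ is construct size, and $C$, $K$, $a$, $b$, $c$, $\alpha$, $\beta$ are fitted parameters.

```latex
% Zipf's law: frequency decays as a power of rank (exponent often close to 1)
f(r) = C\, r^{-\alpha}, \qquad \alpha > 0

% Heaps' law: vocabulary grows sublinearly with the number of tokens
V(n) = K\, n^{\beta}, \qquad 0 < \beta < 1

% Menzerath-Altmann law: larger constructs tend to have smaller constituents
y(x) = a\, x^{b}\, e^{-c x}, \qquad \text{typically with } b < 0
```

Under this reading, applying the laws to molecular data would treat structural modules or functions as the "words", which is an interpretation of the abstract rather than a detail it states.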