ABSTRACT
Since its introduction in the human population, SARS-CoV-2 has evolved into multiple clades, but the events in its intrahost diversification are not well understood. Here, we compare three-dimensional (3D) self-organized neural haplotype maps (SOMs) of SARS-CoV-2 from thirty individual nasopharyngeal diagnostic samples obtained within a 19-day interval in Madrid (Spain), at the time of transition between clades 19 and 20. SOMs have been trained with the haplotype repertoire present in the mutant spectra of the nsp12- and spike (S)-coding regions. Each SOM consisted of a dominant neuron (displaying the maximum frequency), surrounded by a low-frequency neuron cloud. The sequence of the master (dominant) neuron was either identical to that of the reference Wuhan-Hu-1 genome or differed from it at one nucleotide position. Six different deviant haplotype sequences were identified among the master neurons. Some of the substitutions in the neural clouds affected critical sites of the nsp12-nsp8-nsp7 polymerase complex and resulted in altered kinetics of RNA synthesis in an in vitro primer extension assay. Thus, the analysis has identified mutations that are relevant to modification of viral RNA synthesis, present in the mutant clouds of SARS-CoV-2 quasispecies. These mutations most likely occurred during intrahost diversification in several COVID-19 patients, during an initial stage of the pandemic, and within a brief time period.
Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Haplotypes , Viral Nonstructural Proteins , Viral RNA
ABSTRACT
The creation of fitness maps from viral populations, especially in the case of RNA viruses whose high mutation rates produce quasispecies, is complex because the mutant spectrum lies in a very high-dimensional space. In this work, a new approach is presented using a class of neural networks, Self-Organized Maps (SOM), to represent realistic fitness landscapes in two RNA viruses: Human Immunodeficiency Virus type 1 (HIV-1) and Hepatitis C Virus (HCV). This methodology has proven to be very effective in the classification of viral quasispecies, using as criterion the mutant sequences in the population. With HIV-1, the fitness landscapes are constructed by representing the experimentally determined fitness on the sequence map. This approach permitted the depiction of the evolutionary paths of the variants subjected to processes of fitness loss and gain in cell culture. In the case of HCV, the efficiency was measured as a function of the frequency of each haplotype in the population by ultra-deep sequencing. The fitness landscapes obtained provided information on the efficiency of each variant in the quasispecies environment, that is, in relation to the entire spectrum of mutants. With the SOM maps, it is possible to determine the evolutionary dynamics of the different haplotypes.
Subjects
HIV-1 , Hepatitis C , Humans , HIV-1/genetics , Mutation
ABSTRACT
Catalytic reaction networks consist of molecular arrays interconnected by autocatalysis and cross-catalytic pathways among the reactants, and serve as bottom-up models for the design and understanding of molecular evolution and emergent phenomena. An important example of the latter is the emergence of homochirality in biomolecules during chemical evolution. This chiral symmetry breaking is triggered by bistability and bifurcation in networks of chiral replicators. Spontaneous mirror symmetry breaking (SMSB) results from hypercyclic connectivity when the chirality and enantioselectivity of the replicators are taken into account. Heretofore, SMSB has been generally understood as involving chemical transformations yielding scalemic outcomes as non-equilibrium steady states (NESS). Here, in marked contrast, we consider the chaotic regime, in which steady states do not exist. The dissipation, or entropy production, is chaotic as is the exchange entropy. The rate of change of the total system entropy, governed by the entropy balance equation, is also chaotic. Subsequent to the mirror symmetry breaking transition, the time averaged entropy production is minimized in the final chaotic chiral state with respect to the former chaotic racemic state. The chemical forces (i.e., the affinities) evolve in time so as to lower the sum of the entropy production and the exchange entropy, in compliance with the general evolution criterion extended to reaction networks subject to volumetric open flow.
ABSTRACT
MOTIVATION: Self-organizing maps (SOMs) are readily available bioinformatics methods for clustering and visualizing high-dimensional data, provided that such biological information is previously transformed to fixed-size, metric-based vectors. To increase the usefulness of SOM-based approaches for the analysis of genomic sequence data, novel representation methods are required that automatically and objectively transform aligned nucleotide sequences into numeric vectors, dealing with both nucleotide ambiguity and gaps derived from sequence alignment. RESULTS: Six different codification variants based on Euclidean space, just like SOM processing, have been tested using two SOM models: the classical Kohonen's SOM and growing cell structures. They have been applied to two different sets of sequences: 32 sequences of small sub-unit ribosomal RNA from organisms belonging to the three domains of life, and 44 sequences of the reverse transcriptase region of the pol gene of human immunodeficiency virus type 1 belonging to different groups and sub-types. Our results show that the most important factor affecting the accuracy of sequence clustering is the assignment of an extra weight to the presence of alignment-derived gaps. Although each of the codification variants shows a different level of taxonomic consistency, the results are in agreement with sequence-based phylogenetic reconstructions and anticipate a broad applicability of this codification method.
Subjects
Algorithms , Computational Biology , Human Genome , Computer Neural Networks , Phylogeny , Ribosomal RNA/genetics , Human Immunodeficiency Virus pol Gene Products/genetics , Cluster Analysis , Genomics , Humans , Sequence Alignment
ABSTRACT
BACKGROUND: We describe the pioneering experience of a Spanish family pursuing the goal of understanding their own personal genetic data to the fullest possible extent using Direct to Consumer (DTC) tests. With full informed consent from the Corpas family, all genotype, exome and metagenome data from members of this family are publicly available under a public domain Creative Commons 0 (CC0) license waiver. All scientists or companies analysing these data ("the Corpasome") were invited to return results to the family. METHODS: We released 5 genotypes, 4 exomes and 1 metagenome from the Corpas family via a blog and figshare under a public domain license, inviting scientists to join the crowdsourcing efforts to analyse the genomes in return for coauthorship or acknowledgement in derived papers. Resulting analysis data were compiled via social media and direct email. RESULTS: Here we present the results of our investigations, combining the crowdsourced contributions and our own efforts. Annotation services for genomic variants from four companies were applied to four family exomes: BIOBASE, Ingenuity, Diploid, and GeneTalk. Starting from a common VCF file and after selecting for significant results from company reports, we find no overlap among described annotations. We additionally report on a gut microbiome analysis of a member of the Corpas family. CONCLUSIONS: This study presents an analysis of a diverse set of tools and methods offered by four DTC companies. The striking discordance of the results mirrors previous findings with respect to DTC analysis of SNP chip data, and highlights the difficulties of using DTC data for preventive medical care. To our knowledge, the data and analysis results from our crowdsourced study represent the most comprehensive exome and analysis for a family quartet using solely DTC data generation to date.
Subjects
Crowdsourcing , Family , Genetic Testing , Genomics , Crowdsourcing/methods , Exome , Female , Gene Frequency , Genetic Testing/methods , Genomics/methods , Genotype , Humans , Male , Metagenome , Pedigree , Phenotype , Single Nucleotide Polymorphism , Precision Medicine/methods , Heritable Quantitative Trait , Spain
ABSTRACT
The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.
Subjects
Biological Science Disciplines/education , Computational Biology/education , Curriculum , Data Mining , Database Management Systems , Programming Languages , Software Design , Teaching
ABSTRACT
SUMMARY: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available. AVAILABILITY: http://iann.pro/iannviewer CONTACT: manuel.corpas@tgac.ac.uk.
Subjects
Biological Science Disciplines , Software , Anniversaries and Special Events , Congresses as Topic , Internet
ABSTRACT
On May 23-24, 2024, the 1st Spanish Conference on Genomic Medicine convened in Madrid, Spain. An international and multidisciplinary group of experts gathered to discuss the current state and prospects of genomic medicine in the Spanish-speaking world. There were 278 attendees from Latin America, US, UK, Germany, and Spain, and the topics covered included rare diseases, genome medicine in national health systems (NHSs), artificial intelligence, and commercial development ventures. One particular area of attention was our still sketchy understanding of genome variants. This is evidenced by the fact that many diagnoses in rare diseases continue to yield odysseys that take years, with up to 50% of cases that may go undiagnosed. Since much of the genome remains poorly understood, it is expected that current gaps in genome references will narrow as new technologies such as long-read sequencing become cheaper and more ubiquitous. However, disparities within the NHSs suggest that advancements do not necessarily depend on resources but on the proper implementation of appropriate regulation and pathways for the education of professionals. This is where Genomics England can serve as an example of clinical genomic implementation in routine health care. Ethical challenges, including privacy, informed consent, equity, representation, and genetic discrimination, also call for robust legal frameworks and culturally sensitive practices. The future of genomics in Spanish-speaking countries depends on addressing all of these issues. By navigating these challenges responsibly, Spanish-speaking countries can harness the power of genomics to improve health outcomes and advance scientific knowledge, ensuring that the benefits of personalized medicine are realized in an inclusive and equitable manner.
Subjects
Genomics , Humans , Spain , Artificial Intelligence , Congresses as Topic , Rare Diseases/therapy , Rare Diseases/genetics , Precision Medicine
ABSTRACT
A generalized Fisher equation (GFE) relates the time derivative of the average of the intrinsic rate of growth to its variance. The GFE is an exact mathematical result that has been widely used in population dynamics and genetics, where it originated. Here we demonstrate that the GFE can also be useful in other fields, specifically in chemistry, with models of two chemical reaction systems for which the mechanisms and rate coefficients correspond reasonably well to experiments. A bad fit of the GFE can be a sign of high levels of measurement noise; for low or moderate levels of noise, fulfillment of the GFE is not degraded. Hence, the GFE presents a noise threshold that may be used to test the validity of experimental measurements without requiring any additional information. In a different approach information about the system (model) is included in the calculations. In that case, the discrepancy with the GFE can be used as an optimization criterion for the determination of rate coefficients in a given reaction mechanism.
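As an illustration of the kind of relation involved, the sketch below checks only the simplest special case, in which each subpopulation grows exponentially with a constant intrinsic rate; the time derivative of the population-averaged rate then equals the variance of the rates. This is the classical Fisher-type identity rather than the full GFE of the paper, and the function names, initial abundances and rate values are hypothetical.

```python
import math


def fractions(n0, rates, t):
    """Population fractions at time t for independent exponential growth."""
    weights = [n * math.exp(r * t) for n, r in zip(n0, rates)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_rate(n0, rates, t):
    """Population-averaged intrinsic growth rate at time t."""
    return sum(p * r for p, r in zip(fractions(n0, rates, t), rates))


def variance_rate(n0, rates, t):
    """Variance of the intrinsic growth rate over the population at time t."""
    p = fractions(n0, rates, t)
    mean = sum(pi * r for pi, r in zip(p, rates))
    return sum(pi * (r - mean) ** 2 for pi, r in zip(p, rates))


def fisher_residual(n0, rates, t, h=1e-5):
    """d<r>/dt (central finite difference) minus Var(r); near zero if the identity holds."""
    derivative = (mean_rate(n0, rates, t + h) - mean_rate(n0, rates, t - h)) / (2 * h)
    return derivative - variance_rate(n0, rates, t)


if __name__ == "__main__":
    # Hypothetical initial abundances and constant growth rates.
    print(fisher_residual([1.0, 2.0, 0.5], [0.1, 0.3, 0.7], t=1.0))
```

A residual that stays far from zero on measured data would play the role of the "bad fit" mentioned above; in this noise-free toy case it vanishes up to finite-difference error.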
ABSTRACT
A fundamental landmark in the emergence and maintenance of the first proto-biological systems must have been the formation of a "closed" metabolic organization, and this paper describes a stochastic analysis of a simple model of a system that is closed to efficient causation. Although it shows an absorbing barrier corresponding to the trivial solution that implies collapse and extinction, for certain values of the kinetic parameters it can also show a "coexistence state" in which there are non-null populations of its intermediates, which corresponds approximately to a non-trivial deterministic stable steady state. Depending on the initial conditions, fluctuations can drive the system either to the self-maintaining regime or to extinction, with different probabilities. Different lines of equal probability have been obtained and compared with the deterministic results, and the average time for reaching these states (characteristic time) has been estimated. The system shows strong dependence on volume size, and there is a critical volume below which it collapses very rapidly. The characteristic time is also affected by the volume, with faster responses for lower system volumes. All these results are discussed in the context of the origin of living organization.
Subjects
Metabolic Networks and Pathways/physiology , Biological Models , Animals , Catalysis , Biological Extinction , Stochastic Processes , Systems Theory
ABSTRACT
We introduce systematic approaches to chemical kinetics based on the use of phase-phase (log-log) representations of the rate equations. For slow processes, we obtain a corrected form of the mass-action law, where the concentrations are replaced by kinetic activities. For fast reactions, delay expressions are derived. The phase-phase expansion is, in general, applicable to kinetic and transport processes. A mechanism is introduced for the occurrence of a generalized mass-action law as a result of self-similar recycling. We show that our self-similar recycling model applied to prothrombin assays reproduces the empirical equations for the International Normalized Ratio calibration (INR), as well as the Watala, Golanski, and Kardas relation (WGK) for the dependence of the INR on the concentrations of coagulation factors. Conversely, the experimental calibration equation for the INR, combined with the experimental WGK relation, without the use of theoretical models, leads to a generalized mass-action type kinetic law.
Subjects
Biological Models , Biological Assay , Biological Transport , Calibration , Computer Simulation , Kinetics , Prothrombin/metabolism
ABSTRACT
Populations of RNA viruses are composed of complex and dynamic mixtures of variant genomes that are termed mutant spectra or mutant clouds. This applies also to SARS-CoV-2, and mutations that are detected at low frequency in an infected individual can be dominant (represented in the consensus sequence) in subsequent variants of interest or variants of concern. Here we briefly review the main conclusions of our work on mutant spectrum characterization of hepatitis C virus (HCV) and SARS-CoV-2 at the nucleotide and amino acid levels and address the following two new questions derived from previous results: (i) what is the composition of the SARS-CoV-2 mutant and deletion spectrum in diagnostic samples when examined at progressively lower cut-off mutant frequency values in ultra-deep sequencing; (ii) how does the frequency distribution of minority amino acid substitutions in SARS-CoV-2 compare with that of HCV, also sampled from infected patients. The main conclusions are the following: (i) the number of different mutations found at low frequency in SARS-CoV-2 mutant spectra increases dramatically (50- to 100-fold) as the cut-off frequency for mutation detection is lowered from 0.5% to 0.1%, and (ii) contrary to HCV, SARS-CoV-2 mutant spectra exhibit a deficit of intermediate frequency amino acid substitutions. The possible origin and implications of mutant spectrum differences among RNA viruses are discussed.
ABSTRACT
A living organism must not only organize itself from within; it must also maintain its organization in the face of changes in its environment and degradation of its components. We show here that a simple (M,R)-system consisting of three interlocking catalytic cycles, with every catalyst produced by the system itself, can both establish a non-trivial steady state and maintain this despite continuous loss of the catalysts by irreversible degradation. As long as at least one catalyst is present at a sufficient concentration in the initial state, the others can be produced and maintained. The system shows bistability, because if the amount of catalyst in the initial state is insufficient to reach the non-trivial steady state the system collapses to a trivial steady state in which all fluxes are zero. It is also robust, because if one catalyst is catastrophically lost when the system is in steady state it can recreate the same state. There are three elementary flux modes, but none of them is an enzyme-maintaining mode, the entire network being necessary to maintain the two catalysts.
Subjects
Enzymes/chemistry , Metabolism , Biological Models , Catalysis
ABSTRACT
A new approach for parameter estimation in chemical kinetics has been recently proposed (Ross et al. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 12777). It makes use of an optimization criterion based on a Generalized Fisher Equation (GFE). Its utility has been demonstrated with two reaction mechanisms, the chlorite-iodide and Oregonator, which are computationally stiff systems. In this Article, the performance of the GFE-based algorithm is compared to that obtained from minimization of the squared distances between the observed and predicted concentrations obtained by solving the corresponding initial value problem (we call this latter approach "traditional" for simplicity). Comparison of the proposed GFE-based optimization method with the "traditional" one has revealed their differences in performance. This difference can be seen as a trade-off between speed (which favors GFE) and accuracy (which favors the traditional method). The chlorite-iodide and Oregonator systems are again chosen as case studies. An identifiability analysis is performed for both of them, followed by an optimal experimental design based on the Fisher Information Matrix (FIM). This makes it possible to identify and overcome most of the previously encountered identifiability issues, improving the estimation accuracy. With the new data, obtained from optimally designed experiments, it is now possible to estimate effectively more parameters than with the previous data. This result, which holds for both GFE-based and traditional methods, stresses the importance of an appropriate experimental design. Finally, a new hybrid method that combines advantages from the GFE and traditional approaches is presented.
Subjects
Algorithms , Molecular Dynamics Simulation , Chlorides/chemistry , Iodides/chemistry , Kinetics
ABSTRACT
We develop a method for parameter evaluation from incomplete data. Improved estimates of the desired parameters are evaluated step by step, from experiment to experiment by using both Bayesian and informational methods. We make dynamical, improved predictions while the experiments are still going on and keep and interpret information about local fluctuations, which is lost on applying global techniques. The input of information in small packets leads to semi-analytic methods for data processing. An evolution criterion for parameter evaluation, similar to Fisher's theorem of population selection, is derived. We develop direct processing methods, which can be applied to low dimensional systems, semi-analytic methods based on direct or double logarithmic phase expansions, steepest descent approaches, variation and perturbation methods. The techniques are illustrated by developing a method of long-term planning of treatments with oral anticoagulants based on limited clinical data. The efficiency of treatment by oral anticoagulants depends strongly on various anthropometric and genotypic factors, which lead to large variations of the clinical response. We use the clinical data, which accumulates from medical consultations, for extracting improved, incremental information about the statistical properties of the kinetic and anthropometric parameters for a given patient, which in turn is used for making repeated, improved clinical predictions as the treatment proceeds.
Subjects
Anticoagulants/pharmacology , Statistical Models , Oral Administration , Anticoagulants/administration & dosage , Bayes Theorem , Computer Simulation , Humans , Time Factors
ABSTRACT
RNA viruses replicate as complex mutant spectra termed viral quasispecies. The frequency of each individual genome in a mutant spectrum depends on its rate of generation and its relative fitness in the replicating population ensemble. The advent of deep sequencing methodologies allows for the first-time quantification of haplotype abundances within mutant spectra. There is no information on the haplotype profile of the resident genomes and how the landscape evolves when a virus replicates in a controlled cell culture environment. Here, we report the construction of intramutant spectrum haplotype landscapes of three amplicons of the NS5A-NS5B coding region of hepatitis C virus (HCV). Two-dimensional (2D) neural networks were constructed for 44 related HCV populations derived from a common clonal ancestor that was passaged up to 210 times in human hepatoma Huh-7.5 cells in the absence of external selective pressures. The haplotype profiles consisted of an extended dense basal platform, from which a lower number of protruding higher peaks emerged. As HCV increased its adaptation to the cells, the number of haplotype peaks within each mutant spectrum expanded, and their distribution shifted in the 2D network. The results show that extensive HCV replication in a monotonous cell culture environment does not limit HCV exploration of sequence space through haplotype peak movements. The landscapes reflect dynamic variation in the intramutant spectrum haplotype profile and may serve as a reference to interpret the modifications produced by external selective pressures or to compare with the landscapes of mutant spectra in complex in vivo environments. IMPORTANCE The study provides for the first time the haplotype profile and its variation in the course of virus adaptation to a cell culture environment in the absence of external selective constraints. 
The deep sequencing-based self-organized maps document a two-layer haplotype distribution with an ample basal platform and a lower number of protruding peaks. The results suggest an inferred intramutant spectrum fitness landscape structure that offers potential benefits for virus resilience to mutational inputs.
Subjects
Physiological Adaptation/genetics , Viral Genome/genetics , Haplotypes/genetics , Hepacivirus/genetics , RNA-Dependent RNA Polymerase/genetics , Viral Nonstructural Proteins/genetics , Amino Acid Substitution/genetics , Tumor Cell Line , Chromosome Mapping , Molecular Evolution , Hepacivirus/growth & development , Hepatitis C/virology , High-Throughput Nucleotide Sequencing , Humans , Mutation/genetics , Quasispecies/genetics , Viral RNA/genetics , Virus Replication
ABSTRACT
This paper presents an extension of stoichiometric analysis in systems where the catalytic compounds (enzymes) are also intermediates of the metabolic network (dual property), so they are produced and degraded by the reaction network itself. To take this property into account, we introduce the definition of enzyme-maintaining mode, a set of reactions that produces its own catalyst and can operate at stationary state. Moreover, an enzyme-maintaining mode is defined as elementary with respect to a given reaction if the removal of any of the remaining reactions causes the cessation of any steady state flux through this reference reaction. These concepts are applied to determine the network structure of a simple self-maintaining system.
Subjects
Metabolic Networks and Pathways/physiology , Biological Models , Systems Biology , Animals , Catalysis , Enzymes/physiology
ABSTRACT
We derive exact Langevin-type equations governing quasispecies dynamics. The inherent multiplicative noise has both real and imaginary parts. The numerical simulation of the underlying complex stochastic partial differential equations is carried out employing the Cholesky decomposition for the noise covariance matrix. This noise produces unavoidable spatiotemporal density fluctuations about the mean-field value. In two dimensions, the fluctuations are suppressed only when the diffusion time scale is much smaller than the amplification time scale for the master species.
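The role of the Cholesky decomposition in such simulations can be sketched for the simpler real-valued case: if the noise covariance matrix is factored as C = L L^T, multiplying L by a vector of independent standard normal numbers gives noise with covariance C. The code below is a minimal illustration under that assumption and does not reproduce the complex-valued noise or the spatial discretization of the paper; the 2x2 covariance matrix and all function names are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L * L^T == cov (cov symmetric positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - partial)
            else:
                low[i][j] = (cov[i][j] - partial) / low[j][j]
    return low


def correlated_noise(low, rng):
    """One noise vector L * z, with z independent standard normal variates."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]


def reconstruct(low):
    """Return L * L^T, useful for checking the factorization."""
    n = len(low)
    return [[sum(low[i][k] * low[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    cov = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical noise covariance
    factor = cholesky(cov)
    rng = random.Random(0)
    print(factor)
    print(correlated_noise(factor, rng))
```

In a time-stepping scheme, one such vector would be drawn per step and scaled by the square root of the time increment before being added to the deterministic update.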
ABSTRACT
The nonequilibrium dynamic fluctuations of a stochastic version of the Gray-Scott (GS) model are studied analytically in leading order in perturbation theory by means of the dynamic renormalization group. There is an attracting stable fixed point at one-loop order, and the asymptotic scaling of the correlation functions is predicted for both spatial and temporally correlated noise sources. New effective three-body reaction terms, not present in the original GS model, are induced by the combined interplay of the fluctuations and nonlinearities.
ABSTRACT
We study a general class of nonlinear macroscopic evolution equations with "transport" and "reaction" terms which describe the dynamics of a species of moving individuals (atoms, molecules, quasiparticles, organisms, etc.). We consider that two types of individuals exist, "not marked" and "marked," respectively. We assume that the concentrations of both types of individuals are measurable and that they obey a neutrality condition, that is, the kinetic and transport properties of the "not marked" and "marked" individuals are identical. We suggest a response experiment, which consists in varying the fraction of "marked" individuals with the preservation of total fluxes, and show that the response of the system can be represented by a linear superposition law even though the underlying dynamics of the system is in general highly nonlinear. The linear response law is valid even for large perturbations and is not the result of a linearization procedure but rather a necessary consequence of the neutrality condition. First, we apply the response theorem to chemical kinetics, where the "marked species" is a molecule labeled with a radioactive isotope and there is no kinetic isotope effect. The susceptibility function of the response law can be related to the reaction mechanism of the process. Secondly we study the geographical distribution of the nonrecurrent, nonreversible neutral mutations of the nonrecombining portion of the Y chromosome from human populations and show that the fraction of mutants at a given point in space and time obeys a linear response law of the type introduced in this paper. The theory may be used for evaluating the geographic position and the moment in time where and when a mutation originated.