ABSTRACT
The interactions of microorganisms among themselves and with their multicellular host take place at the microscale, forming complex networks and spatial patterns. Existing technology does not allow the simultaneous investigation of spatial interactions between a host and the multitude of its colonizing microorganisms, which limits our understanding of host-microorganism interactions within a plant or animal tissue. Here we present spatial metatranscriptomics (SmT), a sequencing-based approach that leverages 16S/18S/ITS/poly-d(T) multimodal arrays for simultaneous host transcriptome- and microbiome-wide characterization of tissues at 55-µm resolution. We showcase SmT in outdoor-grown Arabidopsis thaliana leaves as a model system, and find tissue-scale bacterial and fungal hotspots. By network analysis, we study inter- and intrakingdom spatial interactions among microorganisms, as well as the host response to microbial hotspots. SmT provides an approach for answering fundamental questions on host-microbiome interplay.
ABSTRACT
Neutrophils play critical roles in a broad spectrum of clinical conditions. Accordingly, manipulation of neutrophil function may provide a powerful immunotherapeutic approach. However, because of the characteristically short half-life of neutrophils and their large population size, this possibility was considered impractical. Here we describe the identification of peptides that specifically bind either murine or human neutrophils. Although the murine and human neutrophil-specific peptides are not cross-reactive, we identified CD177 as the neutrophil-expressed binding partner in both species. Decorating nanoparticles with a neutrophil-specific peptide confers neutrophil specificity, and these neutrophil-specific nanoparticles accumulate at sites of inflammation. Significantly, we demonstrate that encapsulating neutrophil-modifying small molecules within these nanoparticles yields specific modulation of neutrophil function (ROS production, degranulation, polarization), intracellular signaling and longevity both in vitro and in vivo. Collectively, our findings demonstrate that neutrophil-specific targeting may serve as a novel mode of immunotherapy in disease.
Subject(s)
Nanoparticles , Neutrophils , Mice , Humans , Animals , Neutrophils/metabolism , Reactive Oxygen Species/metabolism , Inflammation/metabolism
ABSTRACT
The community structure in the plant-associated microbiome depends collectively on host-microbe, microbe-microbe and host-microbe-microbe interactions. The ensemble of interactions between the host and microbial consortia may lead to outcomes that are not easily predicted from pairwise interactions. Plant-microbe-microbe interactions are important to plant health but could depend on both host and microbe strain variation. Here we study interactions between groups of naturally co-existing commensal and pathogenic Pseudomonas strains in the Arabidopsis thaliana phyllosphere. We find that commensal Pseudomonas prompt a host response that leads to selective inhibition of a specific pathogenic lineage, resulting in plant protection. The extent of protection depends on plant genotype, supporting that these effects are host-mediated. Strain-specific effects are also demonstrated by one individual Pseudomonas isolate eluding the plant protection provided by commensals. Our work highlights how within-species genetic differences in both hosts and microbes can affect host-microbe-microbe dynamics.
Subject(s)
Arabidopsis , Microbiota , Arabidopsis/genetics , Plants , Pseudomonas , Symbiosis
ABSTRACT
Plants are protected from pathogens not only by their own immunity but often also by colonizing commensal microbes. In Arabidopsis thaliana, a group of cryptically pathogenic Pseudomonas strains often dominates local populations. This group coexists in nature with commensal Pseudomonas strains that can blunt the deleterious effects of the pathogens in the laboratory. We have investigated the interaction between one of the Pseudomonas pathogens and 99 naturally co-occurring commensals, finding plant protection to be common among non-pathogenic Pseudomonas. While protective ability is enriched in one specific lineage, there is also a substantial variation for this trait among isolates of this lineage. These functional differences do not align with core-genome phylogenies, suggesting repeated gene inactivation or loss as causal. Using genome-wide association, we discovered that different bacterial genes are linked to plant protection in each lineage. We validated a protective role of several lineage-specific genes by gene inactivation, highlighting iron acquisition and biofilm formation as prominent mechanisms of plant protection in this Pseudomonas lineage. Collectively, our work illustrates the importance of functional redundancy in plant protective traits across an important group of commensal bacteria.
Subject(s)
Arabidopsis , Arabidopsis/genetics , Arabidopsis/microbiology , Bacterial Genes , Genome-Wide Association Study , Pseudomonas/genetics , Symbiosis
ABSTRACT
Antibodies provide a comprehensive record of the encounters with threats and insults to the immune system. The ability to examine the repertoire of antibodies in serum and discover those that best represent "discriminating features" characteristic of various clinical situations, is potentially very useful. Recently, phage display technologies combined with Next-Generation Sequencing (NGS) produced a powerful experimental methodology, coined "Deep-Panning", in which the spectrum of serum antibodies is probed. In order to extract meaningful biological insights from the tens of millions of affinity-selected peptides generated by Deep-Panning, advanced bioinformatics algorithms are a must. In this study, we describe Motifier, a computational pipeline comprised of a set of algorithms that systematically generates discriminatory peptide motifs based on the affinity-selected peptides identified by Deep-Panning. These motifs are shown to effectively characterize antibody binding activities and through the implementation of machine-learning protocols are shown to accurately classify complex antibody mixtures representing various biological conditions.
Subject(s)
Antibodies/chemistry , Computational Biology/methods , Algorithms , Amino Acid Motifs , High-Throughput Nucleotide Sequencing , Humans , Machine Learning , Peptide Library
ABSTRACT
Plants produce diverse metabolites to cope with the challenges presented by complex and ever-changing environments. These challenges drive the diversification of specialized metabolites within and between plant species. However, we are just beginning to understand how frequently new alleles arise controlling specialized metabolite diversity and how the geographic distribution of these alleles may be structured by ecological and demographic pressures. Here, we measure the variation in specialized metabolites across a population of 797 natural Arabidopsis thaliana accessions. We show that a combination of geography, environmental parameters, demography and different genetic processes all combine to influence the specific chemotypes and their distribution. This showed that causal loci in specialized metabolism contain frequent independently generated alleles with patterns suggesting potential within-species convergence. This provides a new perspective about the complexity of the selective forces and mechanisms that shape the generation and distribution of allelic variation that may influence local adaptation.
Since plants cannot move, they have evolved chemical defenses to help them respond to changes in their surroundings. For example, where animals run from predators, plants may produce toxins to put predators off. This approach is why plants are such a rich source of drugs, poisons, dyes and other useful substances. The chemicals plants produce are known as specialized metabolites, and they can change a lot between, and even within, plant species. The variety of specialized metabolites is a result of genetic changes and evolution over millions of years. Evolution is a slow process, yet plants are able to rapidly develop new specialized metabolites to protect them from new threats. Even different populations of the same species produce many distinct metabolites that help them survive in their surroundings. However, the factors that lead plants to produce new metabolites are not well understood, and it is not known how this affects genetic variation. To gain a better understanding of this process, Katz et al. studied 797 European variants of a common weed species called Arabidopsis thaliana, which is widely studied. The investigation found that many factors affect the range of specialized metabolites in each variant. These included local geography and environment, as well as genetics and population history (demography). Katz et al. revealed a pattern of relationships between the variants that could mirror their evolutionary history as the species spread and adapted to new locations. These results highlight the complex network of factors that affect plant evolution. Rapid diversification is key to plant survival in new and changing environments and has resulted in a wide range of specialized metabolites. As such they are of interest both for studying plant evolution and for understanding their ecology. Expanding similar work to more populations and other species will broaden the scope of our ability to understand how plants adapt to their surroundings.
Subject(s)
Physiological Adaptation/genetics , Arabidopsis/genetics , Arabidopsis/metabolism , Environment , Genetic Variation , Plant Genome , Physiological Adaptation/physiology , Europe , Geography , Metabolic Networks and Pathways , Phenotype
ABSTRACT
Since 1992, PredictProtein (https://predictprotein.org) has been a one-stop online resource for protein sequence analysis; its main site is hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and was queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA binding). PredictProtein's infrastructure has moved to the LCSB, increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold (apparently without lowering performance of prediction methods); user interface elements improved usability; and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for the prediction of proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.
Subject(s)
Protein Conformation , Software , Binding Sites , Coronavirus Nucleocapsid Proteins/chemistry , DNA-Binding Proteins/chemistry , Phosphoproteins/chemistry , Secondary Protein Structure , Proteins/chemistry , Proteins/physiology , RNA-Binding Proteins/chemistry , Sequence Alignment , Protein Sequence Analysis
ABSTRACT
Hybrid necrosis in plants arises from conflict between divergent alleles of immunity genes contributed by different parents, resulting in autoimmunity. We investigate a severe hybrid necrosis case in Arabidopsis thaliana, where the hybrid does not develop past the cotyledon stage and dies 3 weeks after sowing. Massive transcriptional changes take place in the hybrid, including the upregulation of most NLR (nucleotide-binding site leucine-rich repeat) disease-resistance genes. This is due to an incompatible interaction between the singleton TIR-NLR gene DANGEROUS MIX 10 (DM10), which was recently relocated from a larger NLR cluster, and an unlinked locus, DANGEROUS MIX 11 (DM11). There are multiple DM10 allelic variants in the global A. thaliana population, several of which have premature stop codons. One of these, which has a truncated LRR-PL (leucine-rich repeat [LRR]-post-LRR) region, corresponds to the DM10 risk allele. The DM10 locus and the adjacent genomic region in the risk allele carriers are highly differentiated from those in the nonrisk carriers in the global A. thaliana population, suggesting that this allele became geographically widespread only relatively recently. The DM11 risk allele is much rarer and found only in two accessions from southwestern Spain, a region from which the DM10 risk haplotype is absent, indicating that the ranges of DM10 and DM11 risk alleles may be nonoverlapping.
Subject(s)
Arabidopsis/genetics , Genetic Hybridization , NLR Proteins/genetics , Alleles , Genome-Wide Association Study , Necrosis , Quantitative Trait Loci
ABSTRACT
Patterns observed by examining the evolutionary relationships among proteins of common origin can reveal the structural and functional importance of specific residue positions. In particular, amino acids that are highly conserved (i.e., their positions evolve at a slower rate than other positions) are especially likely to be of biological importance, for example, for ligand binding. ConSurf is a bioinformatics tool for accurately estimating the evolutionary rate of each position in a protein family. Here we introduce a new release of ConSurf-DB, a database of precalculated ConSurf evolutionary conservation profiles for proteins of known structure. ConSurf-DB provides high-accuracy estimates of the evolutionary rates of the amino acids in each protein. A reliable estimate of a query protein's evolutionary rates depends on having a sufficiently large number of effective homologues (i.e., nonredundant yet sufficiently similar). With current sequence data, ConSurf-DB covers 82% of the PDB proteins. It will be updated on a regular basis so that coverage remains high and might even increase. Much effort was dedicated to improving the user experience. The repository is available at https://consurfdb.tau.ac.il/. BROADER AUDIENCE: By comparing a protein to other proteins of similar origin, it is possible to determine the extent to which each amino acid position in the protein evolved slowly or rapidly. A protein's evolutionary profile can provide valuable insights: for example, amino acid positions that are highly conserved (i.e., evolved slowly) are particularly likely to be of structural and/or functional importance, for example, for ligand binding and catalysis. We introduce here a new and improved version of ConSurf-DB, a continually updated database that provides precalculated evolutionary profiles of proteins with known structure.
Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Conserved Sequence , Protein Databases , Molecular Evolution , Protein Conformation
ABSTRACT
The presence of pathogen-specific antibodies in an individual's blood sample is used as an indication of previous exposure to, and infection by, that specific pathogen (e.g., virus or bacterium). Measurement of the diagnostic antibodies is routinely achieved using solid-phase immunoassays such as ELISA tests and western blots. Here, we describe a sero-diagnostic approach based on phage display of epitope arrays, which we term "Domain-Scan". We harness next-generation sequencing (NGS) to measure the serum binding to dozens of epitopes derived from HIV-1 and HCV simultaneously. The distinction of healthy individuals from those infected with either HIV-1 or HCV is modeled as a machine-learning classification problem, in which each determinant ("domain") is considered as a feature, and its NGS read-out provides values that correspond to the level of determinant-specific antibodies in the sample. We show that following training of a machine-learning model on labeled examples, we can very accurately classify unlabeled samples and pinpoint the domains that contribute most to the classification. Our experimental/computational Domain-Scan approach is general and can be adapted to other pathogens as long as sufficient training samples are provided.
Subject(s)
Communicable Diseases/diagnosis , HIV Antibodies/blood , HIV Core Protein p24/immunology , HIV Envelope Protein gp160/immunology , HIV Infections/diagnosis , Hepatitis C Antibodies/blood , Hepatitis C Antigens/immunology , Hepatitis C/diagnosis , Machine Learning , Peptide Library , Serologic Tests/methods , AIDS Serodiagnosis/methods , Amino Acid Sequence , Antigen-Antibody Reactions , Base Sequence , Taxonomic DNA Barcoding , Recombinant DNA/immunology , Epitopes/genetics , Epitopes/immunology , Genetic Vectors , HIV Core Protein p24/genetics , Hepatitis C Antigens/genetics , High-Throughput Nucleotide Sequencing , Humans , Oligonucleotides/genetics , Oligonucleotides/immunology , Peptide Fragments/genetics , Peptide Fragments/immunology , Polymerase Chain Reaction/methods
ABSTRACT
The classic methodology of inferring a phylogenetic tree from sequence data is composed of two steps. First, a multiple sequence alignment (MSA) is computed. Then, a tree is reconstructed assuming the MSA is correct. Yet, inferred MSAs were shown to be inaccurate and alignment errors reduce tree inference accuracy. It was previously proposed that filtering unreliable alignment regions can increase the accuracy of tree inference. However, it was also demonstrated that the benefit of this filtering is often obscured by the resulting loss of phylogenetic signal. In this work we explore an approach, in which instead of relying on a single MSA, we generate a large set of alternative MSAs and concatenate them into a single SuperMSA. By doing so, we account for phylogenetic signals contained in columns that are not present in the single MSA computed by alignment algorithms. Using simulations, we demonstrate that this approach results, on average, in more accurate trees compared to 1) using an unfiltered MSA and 2) using a single MSA with weights assigned to columns according to their reliability. Next, we explore in which regions of the MSA space our approach is expected to be beneficial. Finally, we provide a simple criterion for deciding whether or not the extra effort of computing a SuperMSA and inferring a tree from it is beneficial. Based on these assessments, we expect our methodology to be useful for many cases in which diverged sequences are analyzed. The option to generate such a SuperMSA is available at http://guidance.tau.ac.il.
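The concatenation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the GUIDANCE server: the function name `concatenate_msas`, the dictionary representation of an alignment, and the toy sequences are all hypothetical. Each alternative MSA is assumed to align the same named sequences, and the SuperMSA is formed by joining the rows of each sequence end to end.

```python
def concatenate_msas(msas):
    """Join alternative alignments of the same sequences into one SuperMSA.

    Each alignment is a dict mapping sequence name -> aligned row (str).
    This layout is a hypothetical choice made for the sketch.
    """
    names = list(msas[0].keys())
    for msa in msas:
        # Every alternative alignment must cover the same set of sequences.
        if set(msa) != set(names):
            raise ValueError("all alignments must contain the same sequence names")
        # Rows inside a single alignment must all have the same length.
        if len({len(row) for row in msa.values()}) != 1:
            raise ValueError("rows within one alignment must have equal length")
    # Concatenate the columns of every alternative alignment, in order.
    return {name: "".join(msa[name] for msa in msas) for name in names}


# Two toy alternative alignments of the same three sequences.
msa_a = {"seq1": "AC-GT", "seq2": "ACCGT", "seq3": "A--GT"}
msa_b = {"seq1": "ACG-T", "seq2": "ACCGT", "seq3": "AG--T"}

super_msa = concatenate_msas([msa_a, msa_b])
for name, row in super_msa.items():
    print(name, row)
```

The resulting rows could then be passed to a tree-inference program in place of a single alignment; which alternative MSAs to generate is outside the scope of this sketch.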
Subject(s)
Classification/methods , Phylogeny , Sequence Alignment , Software , Computer Simulation , Reproducibility of Results
ABSTRACT
Classic alignment algorithms utilize scoring functions which maximize similarity or minimize edit distances. These scoring functions account for both insertion-deletion (indel) and substitution events. In contrast, alignments based on stochastic models aim to explicitly describe the evolutionary dynamics of sequences by inferring relevant probabilistic parameters from input sequences. Despite advances in stochastic modeling during the last two decades, scoring-based methods are still dominant, partially due to slow running times of probabilistic approaches. Alignment inference using stochastic models involves estimating the probability of events, such as the insertion or deletion of a specific number of characters. In this work, we present SimBa-SAl, a simulation-based approach to statistical alignment inference, which relies on an explicit continuous time Markov model for both indels and substitutions. SimBa-SAl has several advantages. First, using simulations, it decouples the estimation of event probabilities from the inference stage, which allows the introduction of accelerations to the alignment inference procedure. Second, it is general and can accommodate various stochastic models of indel formation. Finally, it allows computing the maximum-likelihood alignment, the probability of a given pair of sequences integrated over all possible alignments, and sampling alternative alignments according to their probability. We first show that SimBa-SAl allows accurate estimation of parameters of the long-indel model previously developed by Miklós et al. (2004). We next show that SimBa-SAl is more accurate than previously developed pairwise alignment algorithms, when analyzing simulated as well as empirical data sets. Finally, we study the goodness-of-fit of the long-indel and TKF91 models. 
We show that although the long-indel model fits the data sets better than TKF91, there is still room for improvement concerning the realistic modeling of evolutionary sequence dynamics.
Subject(s)
Classification/methods , Statistical Models , Phylogeny , Computer Simulation , Molecular Evolution , INDEL Mutation/genetics
ABSTRACT
Cation/proton antiporters (CPAs) play a major role in maintaining living cells' homeostasis. CPAs are commonly divided into two main groups, CPA1 and CPA2, and are further characterized by two main phenotypes: ion selectivity and electrogenicity. However, tracing the evolutionary relationships of these transporters is challenging because of the high diversity within CPAs. Here, we conduct comprehensive evolutionary analysis of 6537 representative CPAs, describing the full complexity of their phylogeny, and revealing a sequence motif that appears to determine central phenotypic characteristics. In contrast to previous suggestions, we show that the CPA1/CPA2 division only partially correlates with electrogenicity. Our analysis further indicates two acidic residues in the binding site that carry the protons in electrogenic CPAs, and a polar residue in the unwound transmembrane helix 4 that determines ion selectivity. A rationally designed triple mutant successfully converted the electrogenic CPA, EcNhaA, to be electroneutral.
Subject(s)
Antiporters/classification , Phylogeny , Protons , Amino Acids/metabolism , Binding Sites , Cations , Humans , Molecular Models , Mutation/genetics , Protein Transport/drug effects , Sodium/pharmacology , Valinomycin/pharmacology
ABSTRACT
Reproducible and robust data on antibody repertoires are invaluable for basic and applied immunology. Next-generation sequencing (NGS) of antibody variable regions has emerged as a powerful tool in systems immunology, providing quantitative molecular information on antibody polyclonal composition. However, major computational challenges exist when analyzing antibody sequences, from error handling to hypermutation profiles and clonal expansion analyses. In this work, we developed the ASAP (A webserver for Immunoglobulin-Seq Analysis Pipeline) webserver (https://asap.tau.ac.il). The input to ASAP is a paired-end sequence dataset from one or more replicates, with or without unique molecular identifiers. These datasets can be derived from NGS of human or murine antibody variable regions. ASAP first filters and annotates the sequence reads using public or user-provided germline sequence information. The ASAP webserver next performs various calculations, including somatic hypermutation level, CDR3 lengths, V(D)J family assignments, and V(D)J combination distribution. These analyses are repeated for each replicate. ASAP provides additional information by analyzing the commonalities and differences between the replicates ("joint" analysis). For example, ASAP examines the shared variable regions and their frequency in each replicate to determine which sequences are less likely to result from sample-preparation and/or sequencing errors. Moreover, ASAP clusters the data into clones and reports the identity and prevalence of top-ranking clones (clonal expansion analysis). ASAP further provides the distribution of synonymous and non-synonymous mutations among the somatic hypermutations of the V genes. Finally, ASAP provides means to process the data for proteomic analysis of serum/secreted antibodies by generating a variable region database for liquid chromatography high resolution tandem mass spectrometry (LC-MS/MS) interpretation. 
ASAP is user-friendly, free, and open to all users, with no login requirement. ASAP is applicable for researchers interested in basic questions related to B cell development and differentiation, as well as applied researchers who are interested in vaccine development and monoclonal antibody engineering. By virtue of its user-friendliness, ASAP opens the antibody analysis field to non-expert users who seek to boost their research with immune repertoire analysis.
Subject(s)
Computational Biology/methods , Immunoglobulins/genetics , DNA Sequence Analysis , Software , Web Browser , Amino Acid Sequence , Animals , High-Throughput Nucleotide Sequencing , Humans , Immunoglobulins/chemistry , V(D)J Recombination
ABSTRACT
Peptide-expressing phage display libraries are widely used for the interrogation of antibodies. Affinity selected peptides are then analyzed to discover epitope mimetics, or are subjected to computational algorithms for epitope prediction. A critical assumption for these applications is the random representation of amino acids in the initial naïve peptide library. In a previous study, we implemented next generation sequencing to evaluate a naïve library and discovered severe deviations from randomness in UAG codon over-representation as well as in high G phosphoramidite abundance causing amino acid distribution biases. In this study, we demonstrate that the UAG over-representation can be attributed to the burden imposed on the phage upon the assembly of the recombinant Protein 8 subunits. This was corrected by constructing the libraries using supE44-containing bacteria which suppress the UAG driven abortive termination. We also demonstrate that the overabundance of G stems from variant synthesis-efficiency and can be corrected using compensating oligonucleotide-mixtures calibrated by mass spectroscopy. Construction of libraries implementing these correctives results in markedly improved libraries that display random distribution of amino acids, thus ensuring that enriched peptides obtained in biopanning represent a genuine selection event, a fundamental assumption for phage display applications.
Subject(s)
Peptide Library , Amino Acids , Cell Surface Display Techniques
ABSTRACT
Many analyses for the detection of biological phenomena rely on a multiple sequence alignment as input. The results of such analyses are often further studied through parametric bootstrap procedures, using sequence simulators. One of the problems with conducting such simulation studies is that users currently have no means to decide which insertion and deletion (indel) parameters to choose, so that the resulting sequences mimic biological data. Here, we present SpartaABC, a web server that aims to solve this issue. SpartaABC implements an approximate-Bayesian-computation rejection algorithm to infer indel parameters from sequence data. It does so by extracting summary statistics from the input. It then performs numerous sequence simulations under randomly sampled indel parameters. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC retains only parameters behind simulations close to the real data. As output, SpartaABC provides point estimates and approximate posterior distributions of the indel parameters. In addition, SpartaABC allows simulating sequences with the inferred indel parameters. To this end, the sequence simulators, Dawg 2.0 and INDELible were integrated. Using SpartaABC we demonstrate the differences in indel dynamics among three protein-coding genes across mammalian orthologs. SpartaABC is freely available for use at http://spartaabc.tau.ac.il/webserver.
Subject(s)
Algorithms , INDEL Mutation , Sequence Analysis/methods , Software , Bayes Theorem , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Internet , Parathyroid Hormone Receptor Type 1/genetics , Thyroxine-Binding Globulin/genetics
ABSTRACT
The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and to inferring their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably due to the difficulty in writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need to use an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available at http://spartaabc.tau.ac.il.
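The steps listed in this abstract follow the general pattern of an ABC rejection sampler, which can be illustrated with a toy example. The sketch below is not the SpartaABC implementation: the simulator `simulate_summary`, the uniform prior on [0, 0.2], the single summary statistic (fraction of sites that start an indel), and the tolerance are all hypothetical simplifications chosen so that the example runs with the Python standard library alone.

```python
import random


def simulate_summary(indel_rate, n_sites, rng):
    """Toy simulator (hypothetical): each site starts an indel with
    probability indel_rate. Returns one summary statistic, the fraction
    of sites at which an indel starts."""
    events = sum(1 for _ in range(n_sites) if rng.random() < indel_rate)
    return events / n_sites


def abc_rejection(observed_stat, n_sites, n_draws, tolerance, rng):
    """Keep the prior draws whose simulated summary statistic lies
    within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        candidate = rng.uniform(0.0, 0.2)  # sample from an assumed uniform prior
        simulated = simulate_summary(candidate, n_sites, rng)
        if abs(simulated - observed_stat) <= tolerance:
            accepted.append(candidate)
    return accepted


rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for real data: simulate "observed" data with a known rate of 0.05.
observed = simulate_summary(0.05, 1000, rng)

# Accepted draws approximate the posterior; their mean is a point estimate.
posterior = abc_rejection(observed, n_sites=1000, n_draws=2000,
                          tolerance=0.01, rng=rng)
estimate = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, estimated indel rate ~ {estimate:.3f}")
```

A real application would use several summary statistics, a proper sequence simulator, and a distance that weights the statistics; the accept/reject logic would stay the same.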
Subject(s)
Bayes Theorem , INDEL Mutation , Statistical Models , Algorithms , Computational Biology/methods , Computer Simulation , Molecular Evolution , Humans , Genetic Models , Mutation Rate , Software
ABSTRACT
Understanding species adaptation at the molecular level has been a central goal of evolutionary biology and genomics research. This important task becomes increasingly relevant with the constant rise in the availability of both genotypic and phenotypic data. The TraitRateProp web server offers a unique perspective into this task by allowing the detection of associations between sequence evolution rate and whole-organism phenotypes. By analyzing sequences and phenotypes of extant species in the context of their phylogeny, it identifies sequence sites in a gene/protein whose evolutionary rate is associated with shifts in the phenotype. To this end, it considers alternative histories of whole-organism phenotypic changes, which result in the extant phenotypic states. Its joint likelihood framework, which combines models of sequence and phenotype evolution, allows testing whether an association between these processes exists. In addition to predicting sequence sites most likely to be associated with the phenotypic trait, the server can optionally integrate structural 3D information. This integration allows a visual detection of trait-associated sequence sites that are juxtaposed in 3D space, thereby suggesting a common functional role. We used TraitRateProp to study the shifts in sequence evolution rate of the RPS8 protein upon transitions into heterotrophy in Orchidaceae. TraitRateProp is available at http://traitrate.tau.ac.il/prop.
Subject(s)
Molecular Evolution , Sequence Analysis , Software , Algorithms , Internet , Orchidaceae/genetics , Phenotype , Phylogeny , Ribosomal Proteins/chemistry , Ribosomal Proteins/genetics
ABSTRACT
The degree of evolutionary conservation of an amino acid in a protein or a nucleic acid in DNA/RNA reflects a balance between its natural tendency to mutate and the overall need to retain the structural integrity and function of the macromolecule. The ConSurf web server (http://consurf.tau.ac.il), established over 15 years ago, analyses the evolutionary pattern of the amino/nucleic acids of the macromolecule to reveal regions that are important for structure and/or function. Starting from a query sequence or structure, the server automatically collects homologues, infers their multiple sequence alignment and reconstructs a phylogenetic tree that reflects their evolutionary relations. These data are then used, within a probabilistic framework, to estimate the evolutionary rates of each sequence position. Here we introduce several new features into ConSurf, including automatic selection of the best evolutionary model used to infer the rates, the ability to homology-model query proteins, prediction of the secondary structure of query RNA molecules from sequence, the ability to view the biological assembly of a query (in addition to the single chain), mapping of the conservation grades onto 2D RNA models and an advanced view of the phylogenetic tree that enables interactively rerunning ConSurf with the taxa of a sub-tree.
Subject(s)
Biological Evolution , DNA/chemistry , Statistical Models , Proteins/chemistry , RNA/chemistry , User-Computer Interface , Algorithms , Amino Acid Sequence , Animals , Base Sequence , Computer Graphics , Conserved Sequence , DNA/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Humans , Internet , Nucleic Acid Conformation , Phylogeny , Plants/genetics , Plants/metabolism , Protein Domains , Secondary Protein Structure , Proteins/genetics , RNA/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Sequence Alignment , Amino Acid Sequence Homology , Nucleic Acid Sequence Homology
ABSTRACT
In this study, we present a novel methodology to infer indel parameters from multiple sequence alignments (MSAs) based on simulations. Our algorithm searches for the set of evolutionary parameters describing indel dynamics which best fits a given input MSA. In each step of the search, we use parametric bootstraps and the Mahalanobis distance to estimate how well a proposed set of parameters fits input data. Using simulations, we demonstrate that our methodology can accurately infer the indel parameters for a large variety of plausible settings. Moreover, using our methodology, we show that indel parameters substantially vary between three genomic data sets: Mammals, bacteria, and retroviruses. Finally, we demonstrate how our methodology can be used to simulate MSAs based on indel parameters inferred from real data sets.
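The role of the Mahalanobis distance in the fitting step can be illustrated with a small, self-contained example. This is a sketch under assumptions rather than the authors' code: it handles only two summary statistics (so the 2x2 covariance matrix can be inverted by hand), and the function names and the toy bootstrap values are hypothetical. The idea is that summary statistics computed from simulated (bootstrap) MSAs define a mean and covariance, and the distance of the input MSA's statistics from that cloud indicates how well the proposed parameters fit.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a list of 2-tuples."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def covariance_2d(rows):
    """Sample covariance (n - 1 denominator) of two statistics."""
    m = mean_vector(rows)
    n = len(rows)
    sxx = sum((r[0] - m[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - m[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(point, rows):
    """Mahalanobis distance of `point` from the distribution of `rows`."""
    m = mean_vector(rows)
    sxx, sxy, syy = covariance_2d(rows)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - m[0], point[1] - m[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical summary statistics from four bootstrap simulations,
# e.g. (number of gap blocks, mean gap length); values are made up.
bootstrap_stats = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

print(mahalanobis_2d((1.0, 1.0), bootstrap_stats))  # at the mean, distance 0.0
print(mahalanobis_2d((3.0, 1.0), bootstrap_stats))  # sqrt(3), about 1.732
```

A search over indel parameters would repeat this for each proposed parameter set and prefer the set whose simulations give the smallest distance to the input data.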