ABSTRACT
The investigation of allosteric effects in biomolecular structures is of great current interest in diverse areas, from fundamental biological enquiry to drug discovery. Here we present ProteinLens, a user-friendly and interactive web application for the investigation of allosteric signalling based on atomistic graph-theoretical methods. Starting from the PDB file of a biomolecule (or a biomolecular complex) ProteinLens obtains an atomistic, energy-weighted graph description of the structure of the biomolecule, and subsequently provides a systematic analysis of allosteric signalling and communication across the structure using two computationally efficient methods: Markov Transients and bond-to-bond propensities. ProteinLens scores and ranks every bond and residue according to the speed and magnitude of the propagation of fluctuations emanating from any site of choice (e.g. the active site). The results are presented through statistical quantile scores visualised with interactive plots and adjustable 3D structure viewers, which can also be downloaded. ProteinLens thus allows the investigation of signalling in biomolecular structures of interest to aid the detection of allosteric sites and pathways. ProteinLens is implemented in Python/SQL and freely available to use at: www.proteinlens.io.
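The abstract above ranks sites by how quickly a perturbation at a chosen source spreads across the graph. A minimal sketch of one way to quantify that speed is given here, assuming a discrete-time random walk on a connected, weighted, undirected graph, and recording for each node the first step at which its probability reaches half of its stationary value. The function name `half_times`, the matrix input format, and the half-of-stationary criterion are illustrative assumptions; this is not the ProteinLens implementation, which works on energy-weighted atomistic graphs.

```python
def half_times(weights, source, max_steps=10_000):
    """First step at which each node reaches half its stationary probability.

    weights: symmetric matrix (list of lists) of non-negative edge weights.
    Assumes the graph is connected, so every node has positive strength.
    Hypothetical helper for illustration only.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]
    total = sum(strength)
    # Stationary distribution of a random walk on an undirected graph
    stationary = [s / total for s in strength]

    p = [0.0] * n
    p[source] = 1.0  # all probability starts at the source node
    t_half = [None] * n

    for step in range(max_steps + 1):
        for i in range(n):
            if t_half[i] is None and p[i] >= 0.5 * stationary[i]:
                t_half[i] = step
        if all(t is not None for t in t_half):
            break
        # One random-walk step: p_new[j] = sum_i p[i] * w[i][j] / strength[i]
        p = [
            sum(p[i] * weights[i][j] / strength[i] for i in range(n))
            for j in range(n)
        ]
    return t_half


# Toy example: an unweighted path graph 0 - 1 - 2 - 3
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(half_times(path, source=0))
```

On this toy path graph the half-times simply grow with distance from the source; on a weighted graph, strongly coupled nodes can be reached faster than their geometric distance alone would suggest.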
Subject(s)
Proteins/chemistry , Software , Allosteric Regulation , Allosteric Site , DNA/chemistry , Glucokinase/chemistry , Humans , Internet , Protein Conformation
ABSTRACT
Allostery is one of the cornerstones of biological function, as it plays a fundamental role in regulating protein activity. The modelling of allostery has gradually moved from a conformation-based framework, linked to structural changes, to dynamics-based allostery, whereby the effects of ligand binding propagate via signal transduction from the allosteric site to other regions of the protein via inter-residue interactions. Characterising such allosteric signalling pathways, which do not necessarily lead to conformational changes, has been pursued experimentally and complemented by computational analysis of protein networks to detect subtle dynamic propagation paths. Considering allostery from the perspective of signal transduction broadens the understanding of allosteric mechanisms, underscores the importance of protein topology, and can provide insights into allosteric drug design.
Subject(s)
Drug Design , Proteins , Allosteric Regulation , Proteins/chemistry , Allosteric Site , Signal Transduction , Molecular Dynamics Simulation , Protein Conformation
ABSTRACT
Allostery is a pervasive mechanism that regulates protein activity through ligand binding at a site different from the orthosteric site. The universality of allosteric regulation complemented by the benefits of highly specific and potentially non-toxic allosteric drugs makes uncovering allosteric sites invaluable. However, there are few computational methods to effectively predict them. Bond-to-bond propensity analysis has successfully predicted allosteric sites in 19 of 20 cases using an energy-weighted atomistic graph. We here extended the analysis onto 432 structures of 146 proteins from two benchmarking datasets for allosteric proteins: ASBench and CASBench. We further introduced two statistical measures to account for the cumulative effect of high-propensity residues and the crucial residues in a given site. The allosteric site is recovered for 127 of 146 proteins (407 of 432 structures) knowing only the orthosteric sites or ligands. The quantitative analysis using a range of statistical measures enables better characterization of potential allosteric sites and mechanisms involved.
ABSTRACT
Inhibiting the main protease of SARS-CoV-2 is of great interest in tackling the COVID-19 pandemic caused by the virus. Most efforts have been centred on inhibiting the binding site of the enzyme. However, considering allosteric sites, distant from the active or orthosteric site, broadens the search space for drug candidates and confers the advantages of allosteric drug targeting. Here, we report the allosteric communication pathways in the main protease dimer by using two novel fully atomistic graph-theoretical methods: Bond-to-bond propensity, which has been previously successful in identifying allosteric sites in extensive benchmark data sets without a priori knowledge, and Markov transient analysis, which has previously aided in finding novel drug targets in catalytic protein families. Using statistical bootstrapping, we score the highest ranking sites against random sites at similar distances, and we identify four statistically significant putative allosteric sites as good candidates for alternative drug targeting.
Subject(s)
Coronavirus 3C Proteases , Allosteric Site , Coronavirus 3C Proteases/chemistry , Molecular Docking Simulation , Protein Conformation
ABSTRACT
Allostery commonly refers to the mechanism that regulates protein activity through the binding of a molecule at a different, usually distal, site from the orthosteric site. The omnipresence of allosteric regulation in nature and its potential for drug design and screening render the study of allostery invaluable. Nevertheless, challenges remain as few computational methods are available to effectively predict allosteric sites, identify signalling pathways involved in allostery, or aid with the design of suitable molecules targeting such sites. Recently, bond-to-bond propensity analysis has been shown to be successful at identifying allosteric sites for a large and diverse group of proteins from knowledge of the orthosteric sites and their ligands alone by using network analysis applied to energy-weighted atomistic protein graphs. To address the identification of signalling pathways, we propose here a method to compute and score paths of optimised propensity that link the orthosteric site with the identified allosteric sites, and to identify crucial residues that contribute to those paths. We showcase the approach with three well-studied allosteric proteins: h-Ras, caspase-1, and 3-phosphoinositide-dependent kinase-1 (PDK1). Key residues in both orthosteric and allosteric sites were identified and showed agreement with experimental results, and pivotal signalling residues along the pathway were also revealed, thus providing alternative targets for drug design. By using the computed path scores, we were also able to differentiate the activity of different allosteric modulators.
Subject(s)
3-Phosphoinositide-Dependent Protein Kinases , Caspase 1 , Proto-Oncogene Proteins p21(ras) , Signal Transduction , 3-Phosphoinositide-Dependent Protein Kinases/chemistry , Allosteric Regulation , Allosteric Site , Caspase 1/chemistry , Ligands , Proto-Oncogene Proteins p21(ras)/chemistry
ABSTRACT
Networks are widely used as mathematical models of complex systems across many scientific disciplines. Decades of work have produced a vast corpus of research characterizing the topological, combinatorial, statistical, and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. In this paper, we introduce HCGA, a framework for highly comparative analysis of graph datasets that computes several thousands of graph features from any given network. HCGA also offers a suite of statistical learning and data analysis tools for automated identification and selection of important and interpretable features underpinning the characterization of graph datasets. We show that HCGA outperforms other methodologies on supervised classification tasks on benchmark datasets while retaining the interpretability of network features. We exemplify HCGA by predicting the charge transfer in organic semiconductors and clustering a dataset of neuronal morphology images.
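To make the idea of "a graph property as a feature" concrete, the sketch below computes a handful of simple features from an undirected edge list using only the Python standard library. The function name `basic_graph_features` and the choice of features are illustrative assumptions; this is not the HCGA API, which computes several thousand features.

```python
def basic_graph_features(n_nodes, edges):
    """Return a small dictionary of features for an undirected simple graph.

    n_nodes: number of nodes, labelled 0 .. n_nodes - 1.
    edges: iterable of (u, v) pairs; self-loops and duplicates are ignored.
    Hypothetical helper for illustration only.
    """
    adj = {i: set() for i in range(n_nodes)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degrees = [len(adj[i]) for i in range(n_nodes)]
    n_edges = sum(degrees) // 2
    max_edges = n_nodes * (n_nodes - 1) / 2

    # Count each triangle once by enforcing the ordering u < v < w
    triangles = sum(
        1
        for u in adj
        for v in adj[u] if u < v
        for w in adj[u] & adj[v] if v < w
    )

    return {
        "n_nodes": n_nodes,
        "n_edges": n_edges,
        "density": n_edges / max_edges if max_edges else 0.0,
        "mean_degree": sum(degrees) / n_nodes if n_nodes else 0.0,
        "max_degree": max(degrees) if degrees else 0,
        "n_triangles": triangles,
    }


# Toy example: a triangle (0, 1, 2) with a pendant node 3 attached to node 2
features = basic_graph_features(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
print(features)
```

Each graph in a dataset then becomes one row of such a feature table, which can be passed to any standard classifier or feature-selection routine.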
ABSTRACT
The intrinsic temporality of learning demands the adoption of methodologies capable of exploiting time-series information. In this study we leverage the sequence data framework and show how data-driven analysis of temporal sequences of task completion in online courses can be used to characterise personal and group learners' behaviors, and to identify critical tasks and course sessions in a given course design. We also introduce a recently developed probabilistic Bayesian model to learn sequential behaviours of students and predict student performance. The application of our data-driven sequence-based analyses to data from learners undertaking an online Business Management course reveals distinct behaviors within the cohort of learners, identifying learners or groups of learners that deviate from the nominal order expected in the course. Using course grades a posteriori, we explore differences in behavior between high and low performing learners. We find that high performing learners follow the progression between weekly sessions more regularly than low performing learners, yet within each weekly session high performing learners are less tied to the nominal task order. We then model the sequences of high and low performance students using the probabilistic Bayesian model and show that we can learn engagement behaviors associated with performance. We also show that the data sequence framework can be used for task-centric analysis; we identify critical junctures and differences among types of tasks within the course design. We find that non-rote learning tasks, such as interactive tasks or discussion posts, are correlated with higher performance. We discuss the application of such analytical techniques as an aid to course design, intervention, and student supervision.
ABSTRACT
Lung and bladder cancers are mostly incurable because of the early development of drug resistance and metastatic dissemination. Hence, improved therapies that tackle these two processes are urgently needed to improve clinical outcome. We have identified RSK4 as a promoter of drug resistance and metastasis in lung and bladder cancer cells. Silencing this kinase, through either RNA interference or CRISPR, sensitized tumor cells to chemotherapy and hindered metastasis in vitro and in vivo in a tail vein injection model. Drug screening revealed several floxacin antibiotics as potent RSK4 activation inhibitors, and trovafloxacin reproduced all effects of RSK4 silencing in vitro and in/ex vivo using lung cancer xenograft and genetically engineered mouse models and bladder tumor explants. Through x-ray structure determination and Markov transient and Deuterium exchange analyses, we identified the allosteric binding site and revealed how this compound blocks RSK4 kinase activation through binding to an allosteric site and mimicking a kinase autoinhibitory mechanism involving the RSK4's hydrophobic motif. Last, we show that patients undergoing chemotherapy and adhering to prophylactic levofloxacin in the large placebo-controlled randomized phase 3 SIGNIFICANT trial had significantly increased (P = 0.048) long-term overall survival times. Hence, we suggest that RSK4 inhibition may represent an effective therapeutic strategy for treating lung and bladder cancer.
Subject(s)
Lung Neoplasms , Urinary Bladder Neoplasms , Animals , Cell Line, Tumor , Drug Resistance, Neoplasm , Gene Expression Regulation, Neoplastic , Humans , Lung/metabolism , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Mice , Ribosomal Protein S6 Kinases, 90-kDa/genetics , Ribosomal Protein S6 Kinases, 90-kDa/metabolism , Urinary Bladder Neoplasms/drug therapy , Urinary Bladder Neoplasms/genetics
ABSTRACT
Electronic healthcare records contain large volumes of unstructured data in different forms. Free text constitutes a large portion of such data, yet this source of richly detailed information often remains under-used in practice because of a lack of suitable methodologies to extract interpretable content in a timely manner. Here we apply network-theoretical tools to the analysis of free text in Hospital Patient Incident reports in the English National Health Service, to find clusters of reports in an unsupervised manner and at different levels of resolution based directly on the free text descriptions contained within them. To do so, we combine recently developed deep neural network text-embedding methodologies based on paragraph vectors with multi-scale Markov Stability community detection applied to a similarity graph of documents obtained from sparsified text vector similarities. We showcase the approach with the analysis of incident reports submitted in Imperial College Healthcare NHS Trust, London. The multiscale community structure reveals levels of meaning with different resolution in the topics of the dataset, as shown by relevant descriptive terms extracted from the groups of records, as well as by comparing a posteriori against hand-coded categories assigned by healthcare personnel. Our content communities exhibit good correspondence with well-defined hand-coded categories, yet our results also provide further medical detail in certain areas as well as revealing complementary descriptors of incidents beyond the external classification. We also discuss how the method can be used to monitor reports over time and across different healthcare providers, and to detect emerging trends that fall outside of pre-existing categories.
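The step from document vectors to a sparse similarity graph can be sketched as follows. The abstract does not state which similarity or sparsification rule is used, so cosine similarity and a k-nearest-neighbour rule are assumptions made here for illustration, as are the function names `cosine` and `knn_similarity_graph`. The document vectors would come from a text-embedding model; the ones below are toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_similarity_graph(vectors, k=2):
    """Keep, for every document, only the edges to its k most similar documents.

    Returns a dict mapping undirected edges (i, j) with i < j to a similarity
    weight. Hypothetical sparsification rule, chosen for illustration only.
    """
    n = len(vectors)
    edges = {}
    for i in range(n):
        sims = [(cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)  # most similar first
        for s, j in sims[:k]:
            edges[(min(i, j), max(i, j))] = s
    return edges


# Toy example: two pairs of near-parallel vectors
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_similarity_graph(docs, k=1)
print(sorted(graph))
```

The resulting weighted graph is the kind of object a community detection method would then partition at different resolutions.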
ABSTRACT
The widespread adoption of online courses opens opportunities for analysing learner behaviour and optimising web-based learning adapted to observed usage. Here, we introduce a mathematical framework for the analysis of time-series of online learner engagement, which allows the identification of clusters of learners with similar online temporal behaviour directly from the raw data without prescribing a priori subjective reference behaviours. The method uses a dynamic time warping kernel to create a pair-wise similarity between time-series of learner actions, and combines it with an unsupervised multiscale graph clustering algorithm to identify groups of learners with similar temporal behaviour. To showcase our approach, we analyse task completion data from a cohort of learners taking an online post-graduate degree at Imperial Business School. Our analysis reveals clusters of learners with statistically distinct patterns of engagement, from distributed to massed learning, with different levels of regularity, adherence to pre-planned course structure and task completion. The approach also reveals outlier learners with highly sporadic behaviour. A posteriori comparison against student performance shows that, whereas high-performing learners are spread across clusters with diverse temporal engagement, low performers are located significantly in the massed learning cluster, and our unsupervised clustering identifies low performers more accurately than common machine learning classification methods trained on temporal statistics of the data. Finally, we test the applicability of the method by analysing two additional data sets: a different cohort of the same course, and time-series of different format from another university.
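The pairwise-similarity step described above can be sketched with the classic dynamic time warping (DTW) recursion. The absolute-difference local cost, the exponential mapping from distance to similarity, and the names `dtw_distance`, `dtw_similarity` and `sigma` are assumptions for illustration; the abstract does not specify the exact form of the kernel.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed)
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def dtw_similarity(a, b, sigma=1.0):
    """Hypothetical kernel: maps a DTW distance to a similarity in (0, 1]."""
    return math.exp(-dtw_distance(a, b) / sigma)


def similarity_matrix(series, sigma=1.0):
    """Pairwise similarity between all time series in a list."""
    return [[dtw_similarity(s, t, sigma) for t in series] for s in series]


# Toy example: the second series is a time-stretched copy of the first
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([0, 0], [1, 1]))
```

Because DTW allows one series to be locally stretched relative to another, two learners who complete the same tasks at different paces can still be scored as similar. The pairwise matrix can then be treated as a weighted graph and passed to a graph clustering algorithm.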
ABSTRACT
Aspartate carbamoyltransferase (ATCase) is a large dodecameric enzyme with six active sites that exhibits allostery: its catalytic rate is modulated by the binding of various substrates at distal points from the active sites. A recently developed method, bond-to-bond propensity analysis, has proven capable of predicting allosteric sites in a wide range of proteins using an energy-weighted atomistic graph obtained from the protein structure and given knowledge only of the location of the active site. Bond-to-bond propensity establishes if energy fluctuations at given bonds have significant effects on any other bond in the protein, by considering their propagation through the protein graph. In this work, we use bond-to-bond propensity analysis to study different aspects of ATCase activity using three different protein structures and sources of fluctuations. First, we predict key residues and bonds involved in the transition between inactive (T) and active (R) states of ATCase by analysing allosteric substrate binding as a source of energy perturbations in the protein graph. Our computational results also indicate that the effect of multiple allosteric binding is nonlinear: a switching effect is observed after a particular number and arrangement of substrates is bound, suggesting a form of long-range communication between the distantly arranged allosteric sites. Second, cooperativity is explored by considering a bisubstrate analogue as the source of energy fluctuations at the active site, also leading to the identification of highly significant residues to the T → R transition that enhance cooperativity across active sites. Finally, the inactive (T) structure is shown to exhibit a strong, nonlinear communication between the allosteric sites and the interface between catalytic subunits, rather than the active site.
Bond-to-bond propensity thus offers an alternative route to explain allosteric and cooperative effects in terms of detailed atomistic changes to individual bonds within the protein, rather than through phenomenological, global thermodynamic arguments.
Subject(s)
Aspartate Carbamoyltransferase/metabolism , Protein Multimerization , Adenosine Triphosphate/metabolism , Allosteric Regulation , Allosteric Site , Aspartate Carbamoyltransferase/chemistry , Aspartic Acid/analogs & derivatives , Aspartic Acid/metabolism , Catalytic Domain , Cytidine Triphosphate/metabolism , Enzyme Stability , Models, Molecular , Phosphonoacetic Acid/analogs & derivatives , Phosphonoacetic Acid/metabolism , Protein Subunits/chemistry , Protein Subunits/metabolism , Substrate Specificity
ABSTRACT
The Bowman-Birk inhibitors (BBIs) are a family of proteins that share a canonical loop structure whose presence in a conserved conformation is linked to their inhibitory activity. We study the conformational properties of the canonical loop using a graph theoretical approach as implemented in the floppy inclusions and rigid substructure topography (FIRST). We find that the canonical loop is an independent rigid cluster in the natural inhibitors. We have further used this technique to identify residues that play an important role in the structural rigidity of the protein by quantifying their contribution to the overall rigidity of the inhibitor. We find that the conserved elements among the natural and synthetic peptides are the ones that contribute the most to rigidity, even if they are located far from the active site, as rigidity effects are nonlinear and hence nonlocal. The results help to elucidate why certain mutations in the loop of the BBI produce peptides that fail to have the designed inhibitory activity.
Subject(s)
Molecular Mimicry , Peptides/chemistry , Protease Inhibitors/chemistry , Amino Acid Sequence , Mutation , Peptides/genetics , Protein Conformation
ABSTRACT
We show that the Markovian approximation assumed in current particle-based coarse-grained techniques, like dissipative particle dynamics, is unreliable in situations in which sound plays an important role. As an example we solve analytically and numerically the dynamics of coarse-grained harmonic systems by using first-principles methods, showing the presence of long-lived memory kernels. This effect raises questions about the connection of these approaches in their current form to molecular dynamics.
ABSTRACT
Great cities connect people; failed cities isolate people. Despite the fundamental importance of physical, face-to-face social ties in the functioning of cities, these connectivity networks are not explicitly observed in their entirety. Attempts at estimating them often rely on unrealistic over-simplifications such as the assumption of spatial homogeneity. Here we propose a mathematical model of human interactions in terms of a local strategy of maximizing the number of beneficial connections attainable under the constraint of limited individual travelling-time budgets. By incorporating census and openly available online multi-modal transport data, we are able to characterize the connectivity of geometrically and topologically complex cities. Beyond providing a candidate measure of greatness, this model allows one to quantify and assess the impact of transport developments, population growth, and other infrastructure and demographic changes on a city. Supported by validations of gross domestic product and human immunodeficiency virus infection rates across US metropolitan areas, we illustrate the effect of changes in local and city-wide connectivities by considering the economic impact of two contemporary inter- and intra-city transport developments in the UK: High Speed 2 and London Crossrail. This derivation of the model suggests that the scaling of different urban indicators with population size has an explicitly mechanistic origin.
Subject(s)
Models, Theoretical , Urban Renewal , Humans
ABSTRACT
Directionality is a crucial ingredient in many complex networks in which information, energy or influence are transmitted. In such directed networks, analysing flows (and not only the strength of connections) is crucial to reveal important features of the network that might go undetected if the orientation of connections is ignored. We showcase here a flow-based approach for community detection through the study of the network of the most influential Twitter users during the 2011 riots in England. Firstly, we use directed Markov Stability to extract descriptions of the network at different levels of coarseness in terms of interest communities, i.e. groups of nodes within which flows of information are contained and reinforced. Such interest communities reveal user groupings according to location, profession, employer and topic. The study of flows also allows us to generate an interest distance, which affords a personalized view of the attention in the network as viewed from the vantage point of any given user. Secondly, we analyse the profiles of incoming and outgoing long-range flows with a combined approach of role-based similarity and the novel relaxed minimum spanning tree algorithm to reveal that the users in the network can be classified into five roles. These flow roles go beyond the standard leader/follower dichotomy and differ from classifications based on regular/structural equivalence. We then show that the interest communities fall into distinct informational organigrams characterized by a different mix of user roles reflecting the quality of dialogue within them. Our generic framework can be used to provide insight into how flows are generated, distributed, preserved and consumed in directed networks.
Subject(s)
Internet , Models, Theoretical , Riots , Social Support , Female , Humans , Male , United Kingdom
ABSTRACT
In recent years, there has been a surge of interest in community detection algorithms for complex networks. A variety of computational heuristics, some with a long history, have been proposed for the identification of communities or, alternatively, of good graph partitions. In most cases, the algorithms maximize a particular objective function, thereby finding the 'right' split into communities. Although a thorough comparison of algorithms is still lacking, there has been an effort to design benchmarks, i.e., random graph models with known community structure against which algorithms can be evaluated. However, popular community detection methods and benchmarks normally assume an implicit notion of community based on clique-like subgraphs, a form of community structure that is not always characteristic of real networks. Specifically, networks that emerge from geometric constraints can have natural non clique-like substructures with large effective diameters, which can be interpreted as long-range communities. In this work, we show that long-range communities escape detection by popular methods, which are blinded by a restricted 'field-of-view' limit, an intrinsic upper scale on the communities they can detect. The field-of-view limit means that long-range communities tend to be overpartitioned. We show how by adopting a dynamical perspective towards community detection [1], [2], in which the evolution of a Markov process on the graph is used as a zooming lens over the structure of the network at all scales, one can detect both clique- or non clique-like communities without imposing an upper scale to the detection. Consequently, the performance of algorithms on inherently low-diameter, clique-like benchmarks may not always be indicative of equally good results in real networks with local, sparser connectivity. 
We illustrate our ideas with constructive examples and through the analysis of real-world networks from imaging, protein structures and the power grid, where a multiscale structure of non clique-like communities is revealed.
Subject(s)
Computational Biology/methods , Adenylate Kinase/chemistry , Algorithms , Electric Power Supplies , Image Processing, Computer-Assisted , Markov Chains , Models, Statistical , Molecular Conformation , Protein Conformation , Protein Structure, Secondary , Residence Characteristics , Software
ABSTRACT
Collagen fibers are essential components of tissues, which are highly conserved across the animal kingdom and could be extremely useful in tissue engineering. The formation of these macromolecular fibers depends on the self-assembly, driven by molecular interactions, of the basic building blocks of collagen, called tropocollagens. Several attempts to produce biomimetic collagen have been described; however, the best method to achieve the optimal material for tissue engineering has not been established. Here, we describe a bottom-up approach to design two computationally mutated molecular models that use non-covalent interactions to cross-link triple helices of tropocollagen molecules and thus promote self-association. Implementing a graph theory approach in the software FIRST reveals the hotspots that are crucial for the overall rigidity of the supramolecular helical structures and the remaining non-hotspots available for mutations. The mutated models were further decorated with GFOGER, a known collagen cell binding motif, to depict a biofunctional model. In addition to their recognized role of cell binding, the charged residues of the binding motif appeared to further enhance the supramolecular helical association. These findings could help to produce biomimetic collagen for biomedical applications.
Subject(s)
Biomimetic Materials/chemistry , Collagen/chemistry , Molecular Dynamics Simulation , Peptides/chemistry , Amino Acid Sequence , Biomimetic Materials/metabolism , Collagen/metabolism , Molecular Sequence Data , Peptides/metabolism , Protein Binding , Tissue Engineering
ABSTRACT
Many important biological functions are strongly dependent on specific chemical interactions. Modelling how the physicochemical molecular details emerge at much larger scales is an active area of research, currently pursued with a variety of methods. We describe a series of theoretical and computational approaches that aim to derive bottom-up descriptions that capture the specificity that ensues from atomistic detail by extracting relevant features at the different scales. The multiscale models integrate the descriptions at different length and time scales by exploiting the idea of mechanical responses. The methodologies bring together concepts and tools developed in seemingly unrelated areas of mathematics such as algebraic geometry, model reduction, structural graph theory and non-convex optimization. We showcase the applicability of the framework with examples from protein engineering and enzyme catalysis, protein assembly, and with the description of lipid bilayers at different scales. Many challenges remain as it is clear that no single methodology will answer all questions in such multidimensional complex problems.