ABSTRACT
Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
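The consensus idea above can be illustrated with a minimal sketch. It assumes that each scoring function returns higher values for interfaces it considers physiological, and it averages rank-normalized scores; the actual consensus rule, the data, and the scorer names below are hypothetical and are not taken from the study. The area under the ROC curve is computed with the standard rank-sum identity.

```python
from statistics import mean

def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: 1 for physiological, 0 for non-physiological.
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def rank_normalize(scores):
    """Map scores to [0, 1] by rank so scorers on different scales are comparable.

    Ties are broken by input order (a simplification for this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r / (len(scores) - 1) if len(scores) > 1 else 0.0
    return ranks

def consensus(score_table):
    """Average the rank-normalized scores of several scoring functions."""
    normalized = [rank_normalize(col) for col in score_table]
    return [mean(vals) for vals in zip(*normalized)]

# Hypothetical toy data: 6 interfaces scored by 3 hypothetical scoring functions.
labels = [1, 1, 1, 0, 0, 0]
score_table = [
    [0.9, 0.4, 0.8, 0.5, 0.2, 0.1],    # scorer A
    [12.0, 15.0, 7.0, 8.0, 3.0, 9.0],  # scorer B
    [0.7, 0.6, 0.2, 0.3, 0.1, 0.4],    # scorer C
]
for name, col in zip("ABC", score_table):
    print(name, roc_auc(col, labels))
print("consensus", roc_auc(consensus(score_table), labels))
```

Rank normalization is one simple way to put heterogeneous scores on a common scale before averaging; other choices (z-scores, learned weights as in a Random Forest) are equally plausible.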
Subjects
Proteins, Reproducibility of Results, Proteins/metabolism, Protein Binding
ABSTRACT
Flexible loops are paramount to protein functions, with action modes ranging from localized dynamics contributing to the free energy of the system, to large amplitude conformational changes accounting for the repositioning of whole secondary structure elements or protein domains. However, generating diverse and low energy loops remains a difficult problem. This work introduces a novel paradigm to sample loop conformations, in the spirit of the hit-and-run (HAR) Markov chain Monte Carlo technique. The algorithm uses a decomposition of the loop into tripeptides, and a novel characterization of necessary conditions for Tripeptide Loop Closure to admit solutions. Denoting by m the number of tripeptides, the algorithm works in an angular space of dimension 12m. In this space, the hyper-surfaces associated with the aforementioned necessary conditions are used to run a HAR-like sampling technique. On classical loop cases up to 15 amino acids, our parameter-free method compares favorably to previous work, generating more diverse conformational ensembles. We also report experiments on a loop 30 amino acids long, a size not processed in any previous work.
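For readers unfamiliar with hit-and-run, the following minimal sketch shows the generic technique only. It samples a unit ball in dimension 12 (the dimension for m = 1), which is a stand-in chosen for illustration: the method described above works with the hyper-surfaces derived from the loop closure conditions, which are not reproduced here. All function names are hypothetical.

```python
import math
import random

def random_direction(dim, rng):
    """Uniform random unit vector, obtained by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def chord_in_ball(x, d, radius):
    """Return (t_min, t_max) such that x + t*d lies in the ball of given radius.

    Solves |x + t d|^2 = radius^2 for a unit direction d.
    """
    b = sum(xi * di for xi, di in zip(x, d))
    c = sum(xi * xi for xi in x) - radius * radius
    disc = math.sqrt(max(b * b - c, 0.0))
    return -b - disc, -b + disc

def hit_and_run(start, radius, n_steps, seed=0):
    """Generic hit-and-run: pick a random line through the current point,
    intersect it with the domain, and jump to a uniform point on that chord."""
    rng = random.Random(seed)
    x = list(start)
    samples = []
    for _ in range(n_steps):
        d = random_direction(len(x), rng)
        t_min, t_max = chord_in_ball(x, d, radius)
        t = rng.uniform(t_min, t_max)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples

# Assumption for illustration: the domain is the unit ball in dimension 12.
samples = hit_and_run([0.0] * 12, radius=1.0, n_steps=1000)
print(len(samples), max(math.sqrt(sum(c * c for c in s)) for s in samples))
```

The only domain-specific step is the chord computation; replacing `chord_in_ball` with the intersection of a line and another region yields hit-and-run on that region.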
Subjects
Amino Acids, Proteins, Molecular Models, Proteins/chemistry, Protein Secondary Structure, Amino Acids/chemistry, Algorithms, Protein Conformation
ABSTRACT
Designing move sets providing high quality protein conformations remains a hard problem, especially when it comes to deforming a long protein backbone segment, and a key building block to do so is the so-called tripeptide loop closure (TLC). Consider a tripeptide whose first and last bonds (N1Cα;1 and Cα;3C3) are fixed, and so are all internal coordinates except the six dihedral angles (φi, ψi), i = 1, 2, 3, associated to the three Cα carbons. Under these conditions, the TLC algorithm provides all possible values for these six dihedral angles; there exist at most 16 solutions. TLC moves atoms up to ∼5 Å in one step and retains low energy conformations, whence its pivotal role to design move sets sampling protein loop conformations. In this work, we relax the previous constraints, allowing the last bond (Cα;3C3) to freely move in 3D space, or equivalently in a 5D configuration space. We exhibit necessary geometric constraints in this 5D space for TLC to admit solutions. Our analysis provides key insights on the geometry of solutions for TLC. Most importantly, when using TLC to sample loop conformations based on m consecutive tripeptides along a protein backbone, we obtain an exponential gain in the volume of the 5m-dimensional configuration space to be explored.
Subjects
Algorithms, Molecular Models, Protein Conformation
ABSTRACT
We introduce multiple interface string alignment (MISA), a visualization tool to display coherently various sequence and structure based statistics at protein-protein interfaces (SSE elements, buried surface area, ΔASA, B-factor values, etc.). The amino acids supporting these annotations are obtained from Voronoi interface models. The benefit of MISA is to collate annotated sequences of (homologous) chains found in different biological contexts, that is, bound with different partners or unbound. The aggregated views MISA/SSE, MISA/BSA, MISA/ΔASA, and so forth, make it trivial to identify commonalities and differences between chains, to infer key interface residues, and to understand where conformational changes occur upon binding. As such, they should prove of key relevance for knowledge-based annotations of protein databases such as the Protein Data Bank. Illustrations are provided on the receptor binding domain of coronaviruses, in complex with their cognate partner or (neutralizing) antibodies. MISA computed with a minimal number of structures complements and enriches findings previously reported. The corresponding package is available from the Structural Bioinformatics Library (http://sbl.inria.fr and https://sbl.inria.fr/doc/Multiple_interface_string_alignment-user-manual.html).
Subjects
Coronavirus/chemistry, Coronavirus Spike Glycoprotein/chemistry, Amino Acid Sequence, Computational Biology, Protein Databases, Molecular Models, Protein Binding, Protein Conformation, Protein Sequence Analysis, User-Computer Interface
ABSTRACT
Tripeptide loop closure (TLC) is a standard procedure to reconstruct protein backbone conformations, by solving a zero-dimensional polynomial system yielding up to 16 solutions. In this work, we first show that multiprecision is required in a TLC solver to guarantee the existence and the accuracy of solutions. We then compare solutions yielded by the TLC solver against tripeptides from the Protein Data Bank. We show that these solutions are geometrically diverse (up to 3 Å root mean square deviation with respect to the data) and sound in terms of potential energy. Finally, we compare Ramachandran distributions of data and reconstructions for the three amino acids. The distribution of reconstructions in the second angular space φ2ψ2 stands out, with a rather uniform distribution leaving a central void. We anticipate that these insights, coupled to our robust implementation in the Structural Bioinformatics Library (https://sbl.inria.fr/doc/Tripeptide_loop_closure-user-manual.html), will help in understanding the properties of TLC reconstructions, with potential applications to the generation of conformations of flexible loops in particular.
Subjects
Oligopeptides/chemistry, Algorithms, Amino Acid Sequence, Computational Biology, Protein Databases, Molecular Models, Protein Conformation, Structure-Activity Relationship
ABSTRACT
Resistance-nodulation-cell division family proteins are transmembrane proteins identified as broad-spectrum drug transporters involved in multidrug resistance. A prototypical case in this superfamily, responsible for antibiotic resistance in selected gram-negative bacteria, is AcrB. AcrB forms a trimer using the proton motive force to efflux drugs, implementing a functional rotation mechanism. Unfortunately, the size of the system (1049 amino acids per monomer and membrane) has prevented a systematic dynamical exploration, so that the limited understanding of this coupled transport jeopardizes our ability to counter it. The large number of crystal structures of AcrB prompts studies to further our understanding of the mechanism. To this end, we present a novel strategy based on two key ingredients, which are to study dynamics by exploiting information embodied in the numerous crystal structures obtained to date, and to systematically consider subdomains, their dynamics, and their interactions. Along the way, we identify the subdomains responsible for dynamic events, refine the states (A, B, E) of the functional rotation mechanism, and analyze the evolution of intramonomer and intermonomer interfaces along the functional cycle. Our analysis shows the relevance of AcrB's efflux mechanism as a template within the HAE1 family but not beyond. It also paves the way to targeted simulations exploiting the most relevant degrees of freedom at certain steps, and to a targeting of specific interfaces to block the drug efflux. Our work shows that complex dynamics can be unveiled from static snapshots, a strategy that may be used on a variety of molecular machines of large size.
Subjects
Escherichia coli Proteins, Multidrug Resistance-Associated Proteins, Allosteric Site, Anti-Bacterial Agents/chemistry, Anti-Bacterial Agents/metabolism, Escherichia coli Proteins/chemistry, Escherichia coli Proteins/metabolism, Molecular Dynamics Simulation, Multidrug Resistance-Associated Proteins/chemistry, Multidrug Resistance-Associated Proteins/metabolism, Protein Binding, Protein Conformation
ABSTRACT
The root mean square deviation (RMSD) and the least RMSD (lRMSD) are two widely used similarity measures in structural bioinformatics. Yet, they stem from global comparisons, possibly obliterating locally conserved motifs. We correct these limitations with the so-called combined RMSD, which mixes independent lRMSD measures, each computed with its own rigid motion. The combined RMSD is relevant in two main scenarios, namely to compare (quaternary) structures based on motifs defined from the sequence (domains and SSE) and to compare structures based on structural motifs yielded by local structural alignment methods. We illustrate the benefits of combined RMSD over the usual RMSD on three problems, namely (a) the assignment of quaternary structures for hemoglobin (scenario #1), (b) the calculation of structural phylogenies (case study: class II fusion proteins; scenario #1), and (c) the analysis of conformational changes based on combined RMSD of rigid structural motifs (case study: one class II fusion protein; scenario #2). Based on these illustrations, we argue that the combined RMSD is a tool of choice to perform positive and negative discrimination of degrees of freedom, with applications to the design of move sets and collective coordinates. Executables to compute combined RMSD are available within the Structural Bioinformatics Library (http://sbl.inria.fr).
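One way to mix per-motif lRMSD values is a weighted quadratic mean, with each motif weighted by its number of atoms. The sketch below assumes this form; the abstract does not state the formula, so both the weighting and the function name are assumptions made for illustration, and the per-motif lRMSD values are taken as given (each would be computed after its own optimal superposition).

```python
import math

def combined_rmsd(lrmsds, weights):
    """Weighted quadratic mean of per-motif lRMSD values.

    lrmsds:  lRMSD of each motif, each computed with its own rigid motion.
    weights: one positive weight per motif (assumed here: its atom count).
    """
    if not lrmsds or len(lrmsds) != len(weights):
        raise ValueError("need one weight per lRMSD value")
    total = sum(weights)
    return math.sqrt(sum(w * r * r for r, w in zip(lrmsds, weights)) / total)

# Hypothetical example: four subunits of a tetramer, lRMSD in angstroms.
per_subunit_lrmsd = [0.4, 0.5, 0.4, 0.6]
atoms_per_subunit = [141, 146, 141, 146]
print(combined_rmsd(per_subunit_lrmsd, atoms_per_subunit))
```

Because each motif is superposed independently, a large rigid-body rearrangement between motifs does not inflate the combined value, whereas it would dominate a single global lRMSD.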
Subjects
Computational Biology/statistics & numerical data, Protein Quaternary Structure, Proteins/chemistry, Algorithms, Amino Acid Motifs/genetics, Amino Acid Sequence, Least-Squares Analysis, Protein Conformation, Sequence Alignment/statistics & numerical data
ABSTRACT
Rheumatoid arthritis (RA) is associated with abnormal B-cell functions implicating antibody-dependent and -independent mechanisms. B cells have emerged as important cytokine-producing cells, and cytokines are well-known drivers of RA pathogenesis. To identify novel cytokine-mediated B-cell functions in RA, we comprehensively analysed the capacity of B cells from RA patients with an inadequate response to disease modifying anti-rheumatic drugs to produce cytokines in comparison with healthy donors (HD). RA B cells displayed a constitutively higher production of the pathogenic factors interleukin (IL)-8 and Gro-α, while their production of several cytokines upon activation via the B cell receptor for antigen (BCR) was broadly suppressed, including a loss of the expression of the protective factor TRAIL, compared to HD B cells. These defects were partly erased after treatment with the IL-6-signalling inhibitor tocilizumab, indicating that abnormal IL-6 signalling contributed to these abnormalities. Notably, the clinical response of individual patients to tocilizumab therapy could be predicted using the amounts of MIP-1β and β-NGF produced by these patients' B cells before treatment. Taken together, our study highlights hitherto unknown abnormal B-cell functions in RA patients, which are related to the unbalanced cytokine network, and are potentially relevant for RA pathogenesis and treatment.
Subjects
Humanized Monoclonal Antibodies/pharmacology, Rheumatoid Arthritis/drug therapy, Rheumatoid Arthritis/pathology, B-Lymphocytes/immunology, Interleukin-6/antagonists & inhibitors, Interleukin-6/metabolism, Rheumatoid Arthritis/immunology, Chemokine CCL4/biosynthesis, Chemokine CXCL1/biosynthesis, Humans, Interleukin-8/biosynthesis, Nerve Growth Factor/biosynthesis, TNF-Related Apoptosis-Inducing Ligand/biosynthesis
ABSTRACT
Motivation: Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library (SBL, http://sbl.inria.fr), a generic C++/Python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performance. Results: The SBL involves four software components (1-4 hereafter). For end-users, the SBL provides ready-to-use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, and to model macro-molecular assemblies. These applications can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C++ toolbox with modular design, involving core (2) algorithms, (3) biophysical models and (4) modules, the latter being especially suited to develop novel applications. The SBL comes with thorough documentation consisting of user and reference manuals, and a Bugzilla platform to handle community feedback. Availability and Implementation: The SBL is available from http://sbl.inria.fr. Contact: Frederic.Cazals@inria.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Computational Biology/methods, Gene Library, Molecular Models, Algorithms, Licensure, Software
ABSTRACT
Consider a set of oligomers listing the subunits involved in subcomplexes of a macromolecular assembly, obtained e.g. using native mass spectrometry or affinity purification. Given these oligomers, connectivity inference (CI) consists of finding the most plausible contacts between these subunits, and minimum connectivity inference (MCI) is the variant consisting of finding a set of contacts of smallest cardinality. MCI problems avoid speculating on the total number of contacts but yield a subset of all contacts and do not allow exploiting a priori information on the likelihood of individual contacts. In this context, we present two novel algorithms, MILP-W and MILP-WB. The former solves the minimum weight connectivity inference (MWCI), an optimization problem whose criterion mixes the number of contacts and their likelihood. The latter uses the former in a bootstrap fashion to improve the sensitivity and the specificity of solution sets. Experiments on three systems (yeast exosome, yeast proteasome lid, human eIF3), for which reference contacts are known (crystal structure, cryo electron microscopy, cross-linking), show that our algorithms predict contacts with high specificity and sensitivity, yielding a very significant improvement over previous work, typically a twofold increase in sensitivity. The software accompanying this paper is made available and should prove of ubiquitous interest whenever connectivity inference from oligomers is faced.
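The MCI problem stated above can be made concrete on a toy instance. The sketch below is a brute-force enumeration, not the mixed-integer linear programs (MILP-W, MILP-WB) of the paper: it looks for the smallest sets of contacts such that every oligomer induces a connected subgraph. Subunit names and function names are hypothetical, and the approach only scales to a handful of subunits.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """Check that the edges restricted to `vertices` connect all of them."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for a, b in edges:
        if a in vertices and b in vertices:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices

def minimum_connectivity(oligomers):
    """All smallest contact sets making every (non-empty) oligomer connected."""
    # Only pairs of subunits that co-occur in some oligomer are candidates.
    candidates = sorted({tuple(sorted(p)) for o in oligomers
                         for p in combinations(o, 2)})
    for k in range(len(candidates) + 1):
        solutions = [set(es) for es in combinations(candidates, k)
                     if all(is_connected(o, es) for o in oligomers)]
        if solutions:
            return solutions
    return []

# Hypothetical assembly with four subunits and three observed oligomers.
oligomers = [{"A", "B"}, {"B", "C"}, {"A", "B", "C", "D"}]
solutions = minimum_connectivity(oligomers)
for s in solutions:
    print(sorted(s))
```

On this instance several contact sets of the same minimal size exist, differing only in which subunit D touches. This is the kind of ambiguity that weighting contacts by their likelihood is meant to reduce.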
Subjects
Algorithms, Macromolecular Substances/metabolism, Eukaryotic Initiation Factor-3/metabolism, Exosomes/metabolism, Humans, Theoretical Models, Proteasome Endopeptidase Complex/metabolism, Protein Subunits/metabolism, Saccharomyces cerevisiae/metabolism
ABSTRACT
Predicting protein binding affinities from structural data has remained elusive, a difficulty owing to the variety of protein binding modes. Using the structure-affinity-benchmark (SAB, 144 cases with bound/unbound crystal structures and experimental affinity measurements), prediction has been undertaken either by fitting a model using a handful of predefined variables, or by training a complex model from a large pool of parameters (typically hundreds). The former route unnecessarily restricts the model space, while the latter is prone to overfitting. We design models in a third tier, using 12 variables describing enthalpic and entropic variations upon binding, and a model selection procedure identifying the best sparse model built from a subset of these variables. Using these models, we report three main results. First, we present models yielding a marked improvement of affinity predictions. For the whole dataset, we present a model predicting Kd within 1 and 2 orders of magnitude for 48% and 79% of cases, respectively. These statistics jump to 62% and 89%, respectively, for the subset of the SAB consisting of high resolution structures. Second, we show that this performance is due to a new parameter encoding interface morphology and packing properties of interface atoms. Third, we argue that interface flexibility and prediction hardness do not correlate, and that for flexible cases, a performance matching that of the whole SAB can be achieved. Overall, our work suggests that the affinity prediction problem could be partly solved using databases of high resolution complexes whose affinity is known.
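The "within 1 and 2 orders of magnitude" statistic can be spelled out as follows. This sketch assumes the criterion is |log10(Kd_pred / Kd_exp)| ≤ k; the function name and the four data points are hypothetical and unrelated to the SAB.

```python
import math

def fraction_within(kd_pred, kd_exp, orders):
    """Fraction of cases whose predicted Kd lies within `orders` orders of
    magnitude of the experimental Kd (both in the same unit, e.g. mol/L)."""
    if len(kd_pred) != len(kd_exp) or not kd_exp:
        raise ValueError("need paired, non-empty lists")
    hits = sum(1 for p, e in zip(kd_pred, kd_exp)
               if abs(math.log10(p / e)) <= orders)
    return hits / len(kd_exp)

# Hypothetical values for four complexes.
kd_exp = [1e-9, 1e-6, 1e-12, 1e-8]
kd_pred = [5e-9, 3e-8, 1e-9, 2e-8]
print(fraction_within(kd_pred, kd_exp, 1))
print(fraction_within(kd_pred, kd_exp, 2))
```

Working on the log scale matters because Kd values span many orders of magnitude, so absolute errors on Kd itself would be dominated by the weakest binders.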
Subjects
Proteins/chemistry, Proteins/metabolism, Animals, X-Ray Crystallography, Protein Databases, Humans, Biological Models, Molecular Models, Protein Binding, Protein Conformation, Thermodynamics
ABSTRACT
The number of local minima of the potential energy landscape (PEL) of molecular systems generally grows exponentially with the number of degrees of freedom, so that a crucial property of PEL exploration algorithms is their ability to identify local minima that are low lying and diverse. In this work, we present a new exploration algorithm, retaining the ability of basin hopping (BH) to identify local minima, and that of transition based rapidly exploring random trees (T-RRT) to foster the exploration of yet unexplored regions. This ability is obtained by interleaving calls to the extension procedures of BH and T-RRT, and we show that tuning the balance between these two types of calls allows the algorithm to focus on low lying regions. Computational efficiency is obtained using state-of-the-art data structures, in particular for searching approximate nearest neighbors in metric spaces. We present results for the BLN69, a protein model whose conformational space has dimension 207 and whose PEL has been studied exhaustively. On this system, we show that the propensity of our algorithm to explore low lying regions of the landscape significantly outperforms those of BH and T-RRT.
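As background, the basin hopping ingredient alone can be sketched on a one-dimensional toy landscape. The code below does not implement T-RRT or the interleaving strategy described above; the energy function, the step size, the temperature, and all names are assumptions chosen for illustration.

```python
import math
import random

def energy(x):
    """Toy rugged 1D landscape (hypothetical): a parabola plus an oscillation."""
    return 0.1 * x * x + math.sin(3.0 * x)

def gradient(x):
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def local_minimize(x, step=0.01, n_iter=2000):
    """Plain gradient descent; the step is small enough for this landscape."""
    for _ in range(n_iter):
        x -= step * gradient(x)
    return x

def basin_hopping(x0, n_steps, step_size=1.0, temperature=0.5, seed=0):
    """Perturb the current minimum, quench, and accept with a Metropolis test."""
    rng = random.Random(seed)
    x = local_minimize(x0)
    best = x
    visited = [x]
    for _ in range(n_steps):
        trial = local_minimize(x + rng.uniform(-step_size, step_size))
        delta = energy(trial) - energy(x)
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            x = trial
            visited.append(x)
            if energy(x) < energy(best):
                best = x
    return best, visited

best, visited = basin_hopping(x0=4.0, n_steps=200)
print(best, energy(best), len(visited))
```

The Metropolis test is applied to quenched energies, so the walk moves between local minima rather than between arbitrary conformations.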
Subjects
Algorithms, Proteins/chemistry, Artificial Intelligence, Computational Biology, Protein Conformation, Thermodynamics
ABSTRACT
The 2015 D3R Grand Challenge provided an opportunity to test our new model for the binding free energy of small molecules, as well as to assess our protocol to predict binding poses for protein-ligand complexes. Our pose predictions were ranked 3-9 for the HSP90 dataset, depending on the assessment metric. For the MAP4K dataset the ranks are very dispersed and equal to 2-35, depending on the assessment metric, which does not provide any insight into the accuracy of the method. The main success of our pose prediction protocol was the re-scoring stage using the recently developed Convex-PL potential. We make a thorough analysis of our docking predictions made with AutoDock Vina and discuss the effect of the choice of rigid receptor templates, the number of flexible residues in the binding pocket, the binding pocket size, and the benefits of re-scoring. However, the main challenge was to predict experimentally determined binding affinities for two blind test sets. Our affinity prediction model consisted of two terms, a pairwise-additive enthalpy, and a non pairwise-additive entropy. We trained the free parameters of the model with a regularized regression using affinity and structural data from the PDBBind database. Our model performed very well on the training set but failed on the two test sets. We explain the drawbacks and pitfalls of our model, in particular in terms of relative coverage of the test set by the training set and missed dynamical properties from crystal structures, and discuss different routes to improve it.
Subjects
HSP90 Heat-Shock Proteins/chemistry, Molecular Docking Simulation/methods, Binding Sites, Protein Databases, Drug Design, Entropy, Humans, Ligands, Prospective Studies, Protein Binding, Protein Conformation, Regression Analysis, Structure-Activity Relationship, Thermodynamics
ABSTRACT
We consider a coarse-graining of high-dimensional potential energy landscapes based upon persistences, which correspond to lowest barrier heights to lower-energy minima. Persistences can be calculated efficiently for local minima in kinetic transition networks that are based on stationary points of the prevailing energy landscape. The networks studied here represent peptides, proteins, nucleic acids, an atomic cluster, and a glassy system. Minima with high persistence values are likely to represent some form of alternative structural morphology, which, if appreciably populated at the prevailing temperature, could compete with the global minimum (defined as infinitely persistent). Threshold values on persistences (and in some cases equilibrium occupation probabilities) have therefore been used in this work to select subsets of minima, which were then analysed to see how well they can represent features of the full network. Simplified disconnectivity graphs showing only the selected minima can convey the funnelling (including any multiple-funnel) characteristics of the corresponding full graphs. The effect of the choice of persistence threshold on the reduced disconnectivity graphs was considered for a system with a hierarchical, glassy landscape. Sets of persistent minima were also found to be useful in comparing networks for the same system sampled under different conditions, using minimum oriented spanning forests.
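The notion of persistence used above can be illustrated on a small kinetic transition network. The sketch assumes that the persistence of a minimum is the lowest barrier separating it from a lower-energy minimum, measured from its own energy, and computes it with a union-find structure by processing transition states in order of increasing energy. The network, energies, and names are hypothetical.

```python
import math

def persistences(min_energy, transition_states):
    """Persistence of each local minimum in a kinetic transition network.

    min_energy:        dict mapping a minimum to its energy.
    transition_states: list of (energy, minimum_a, minimum_b).
    The global minimum of each connected component keeps persistence inf.
    """
    # Union-find forest; the root of a component is its lowest minimum.
    parent = {m: m for m in min_energy}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    pers = {m: math.inf for m in min_energy}
    for e_ts, a, b in sorted(transition_states):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # this saddle connects two already merged basins
        # The component with the higher-lying minimum is absorbed.
        low, high = (ra, rb) if min_energy[ra] <= min_energy[rb] else (rb, ra)
        pers[high] = e_ts - min_energy[high]
        parent[high] = low
    return pers

# Hypothetical network with four minima and four transition states.
min_energy = {"A": -10.0, "B": -8.0, "C": -9.5, "D": -7.0}
transition_states = [(-6.0, "A", "B"), (-6.5, "B", "D"),
                     (-3.0, "A", "C"), (-2.0, "C", "D")]
pers = persistences(min_energy, transition_states)
selected = sorted(m for m, p in pers.items() if p >= 1.0)
print(pers)
print(selected)
```

Thresholding on persistence, as in the last lines, keeps the minima separated from lower-lying ones by a significant barrier and discards shallow ones.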
ABSTRACT
We present novel algorithms and software addressing four core problems in computational structural biology, namely analyzing a conformational ensemble, comparing two conformational ensembles, analyzing a sampled energy landscape, and comparing two sampled energy landscapes. Using recent developments in computational topology, graph theory, and combinatorial optimization, we make two notable contributions. First, we present a generic algorithm analyzing height fields. We then use this algorithm to perform density-based clustering of conformations, and to analyze a sampled energy landscape in terms of basins and transitions between them. In both cases, topological persistence is used to manage (geometric) frustration. Second, we introduce two algorithms to compare transition graphs. The first is the classical earth mover distance metric which depends only on local minimum energy configurations along with their statistical weights, while the second incorporates topological constraints inherent to conformational transitions. Illustrations are provided on a simplified protein model (BLN69), whose frustrated potential energy landscape has been thoroughly studied. The software implementing our tools is also made available, and should prove valuable wherever conformational ensembles and energy landscapes are used.
Subjects
Algorithms, Proteins/chemistry, Thermodynamics, Molecular Models, Molecular Conformation, Protein Conformation, Software
ABSTRACT
Upon infection, B-lymphocytes expressing antibodies specific for the intruding pathogen develop clonal responses triggered by pathogen recognition via the B-cell receptor. The constant region of antibodies produced by such responding clones dictates their functional properties. In teleost fish, the clonal structure of B-cell responses and the respective contribution of the three isotypes IgM, IgD and IgT remain unknown. The expression of IgM and IgT is mutually exclusive, leading to the existence of two B-cell subsets expressing either both IgM and IgD or only IgT. Here, we undertook a comprehensive analysis of the variable heavy chain (VH) domain repertoires of IgM, IgD and IgT in the spleen of homozygous isogenic rainbow trout (Oncorhynchus mykiss) before and after challenge with a rhabdovirus, the Viral Hemorrhagic Septicemia Virus (VHSV), using CDR3-length spectratyping and pyrosequencing of immunoglobulin (Ig) transcripts. In healthy fish, we observed distinct repertoires for IgM, IgD and IgT, respectively, with a few amplified µ and τ junctions, suggesting the presence of IgM- and IgT-secreting cells in the spleen. In infected animals, we detected complex and highly diverse IgM responses involving all VH subgroups, and dominated by a few large public and private clones. A lower number of robust clonal responses involving only a few VH were detected for the mucosal IgT, indicating that both IgM(+) and IgT(+) spleen B cells responded to systemic infection but to different degrees. In contrast, the IgD response to the infection was faint. Although fish IgD and IgT present different structural features and evolutionary origin compared to mammalian IgD and IgA, respectively, their implication in the B-cell response evokes these mouse and human counterparts. Thus, it appears that the general properties of antibody responses were already in place in common ancestors of fish and mammals, and were globally conserved during evolution with possible functional convergences.
Subjects
Clone Cells/metabolism, Immunoglobulin M/metabolism, Immunoglobulins/metabolism, Novirhabdovirus/immunology, Oncorhynchus mykiss/immunology, Spleen/immunology, Animals, B-Lymphocyte Subsets, Clone Cells/cytology, Clone Cells/immunology, Molecular Evolution, Fish Diseases/immunology, Fish Proteins, Humans, Immunoglobulin D/genetics, Immunoglobulin D/metabolism, Immunoglobulin M/genetics, Immunoglobulins/genetics, Immunohistochemistry, Mice, Mucous Membrane/immunology, Mucous Membrane/metabolism, DNA Sequence Analysis, Species Specificity, Spleen/cytology, Spleen/metabolism, Staining and Labeling
ABSTRACT
Reconstruction by data integration is an emerging trend to reconstruct large protein assemblies, but uncertainties on the input data yield average models whose quantitative interpretation is challenging. This article presents methods to probe fuzzy models of large assemblies against atomic resolution models of subsystems. Consider a toleranced model (TOM) of a macromolecular assembly, namely a continuum of nested shapes representing the assembly at multiple scales. Also consider a template, namely an atomic resolution 3D model of a subsystem (a complex) of this assembly. We present graph-based algorithms performing a multi-scale assessment of the complexes of the TOM, by comparing the pairwise contacts which appear in the TOM against those of the template. We apply this machinery to a TOM derived from an average model of the nuclear pore complex, to explore the connections among members of its well-characterized Y-complex.
Subjects
Proteins/chemistry, Proteins/metabolism, Algorithms, Macromolecular Substances, Theoretical Models
ABSTRACT
We introduce toleranced models (TOMs), a generic and versatile framework meant to handle models of macromolecular assemblies featuring uncertainties on the shapes and the positions of proteins. A TOM being a continuum of nested shapes, the inner (resp. outer) ones representing high (low) confidence regions, we present topological and geometric statistics assessing features of this continuum at multiple scales. While the topological statistics qualify contacts between instances of protein types and complexes involving prescribed protein types, the geometric statistics scale the geometric accuracy of these complexes. We validate the TOM framework on recent average models of the entire nuclear pore complex (NPC) obtained from reconstruction by data integration, and confront our quantitative analysis against experimental findings related to complexes of the NPC, namely the Y-complex, the T-complex, and the Nsp1-Nup82-Nup159 complex. In the three cases, our analysis bridges the gap between global qualitative models of the entire NPC, and atomic resolution models or putative models of the aforementioned complexes. In a broader perspective, the quantitative assessments provided by the TOM framework should prove instrumental to implement a virtuous loop "model reconstruction-model selection", in the context of reconstruction by data integration.
Subjects
Chemical Models, Multiprotein Complexes/chemistry, Nuclear Pore Complex Proteins/chemistry, Algorithms, Molecular Models, Nuclear Pore/chemistry, Reproducibility of Results
ABSTRACT
Let the patch of a partner in a protein complex be the collection of atoms accounting for the interaction. To improve our understanding of the structure-function relationship, we present a patch model decoupling the topological and geometric properties. While the geometry is classically encoded by the atomic positions, the topology is recorded in a graph encoding the relative position of concentric shells partitioning the interface atoms. The topological-geometric duality provides the basis of a generic dynamic programming-based algorithm comparing patches at the shell level, which may favor topological or geometric features. On the biological side, we address four questions, using 249 cocrystallized heterodimers organized in biological families. First, we dissect the morphology of binding patches and show that Nature enjoyed the topological and geometric degrees of freedom independently while retaining a finite set of qualitatively distinct topological signatures. Second, we argue that our shell-based comparison is effective to perform atomic-level comparisons and show that topological similarity is less stringent than geometric similarity. We also use the topological versus geometric duality to exhibit topo-rigid patches, whose topology (but not geometry) remains stable upon docking. Third, we use our comparison algorithms to infer specificity-related information amidst a database of complexes. Finally, we exhibit a descriptor outperforming its contenders to predict the binding affinities of the affinity benchmark. The software developed with this article is available from http://team.inria.fr/abs/vorpatch_compatch/.
Subjects
Proteins/chemistry, Proteins/metabolism, Algorithms, Computational Biology, Protein Databases, Internet, Molecular Models, Protein Binding, Software
ABSTRACT
Prioritizing genes for their role in drug sensitivity is an important step in understanding drug mechanisms of action and discovering new molecular targets for co-treatment. To formalize this problem, we consider two sets of genes X and P respectively composing the gene signature of cell sensitivity at the drug IC50 and the genes involved in its mechanism of action, as well as a protein interaction network (PPIN) containing the products of X and P as nodes. We introduce Genetrank, a method to prioritize the genes in X for their likelihood to regulate the genes in P. Genetrank uses asymmetric random walks with restarts, absorbing states, and a suitable renormalization scheme. Using novel so-called saturation indices, we show that the conjunction of absorbing states and renormalization yields an exploration of the PPIN which is much more progressive than that afforded by random walks with restarts only. Using MINT as the underlying network, we apply Genetrank to a predictive gene signature of cancer cell sensitivity to tumor-necrosis-factor-related apoptosis-inducing ligand (TRAIL), performed in single cells. Our ranking provides biological insights on drug sensitivity and a gene set considerably enriched in genes regulating TRAIL pharmacodynamics when compared to the most significant differentially expressed genes obtained from a statistical analysis framework alone. We also introduce gene expression radars, a visualization tool embedded in MA plots to assess all pairwise interactions at a glance on graphical representations of transcriptomics data. Genetrank is made available in the Structural Bioinformatics Library (https://sbl.inria.fr/doc/Genetrank-user-manual.html). It should prove useful for mining gene sets in conjunction with a signaling pathway, whenever other approaches yield relatively large sets of genes.
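For reference, a plain random walk with restart on a small directed network can be sketched as below. It covers only the baseline that the abstract compares against: absorbing states and the renormalization scheme, which are specific to Genetrank, are not implemented. The network, the node names, the choice of seed, and the restart probability are all hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, n_iter=200):
    """Power iteration for a random walk with restart on a directed graph.

    adjacency: dict node -> list of successor nodes (every node must be a key).
    seeds:     set of nodes the walker restarts from, uniformly.
    Returns a dict node -> probability (the values sum to 1).
    """
    nodes = sorted(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            out = adjacency[u]
            if not out:
                # Dangling node: send its mass back to the seeds.
                for v in nodes:
                    nxt[v] += (1.0 - restart) * p[u] * p0[v]
                continue
            share = (1.0 - restart) * p[u] / len(out)
            for v in out:
                nxt[v] += share
        p = nxt
    return p

# Hypothetical directed network: P1 is a seed, X1 and X2 are candidate genes.
adjacency = {
    "P1": ["G1", "G2"],
    "G1": ["X1"],
    "G2": ["X1", "X2"],
    "X1": ["G1"],
    "X2": [],
}
p = random_walk_with_restart(adjacency, seeds={"P1"})
ranking = sorted(["X1", "X2"], key=lambda g: p[g], reverse=True)
print(ranking, p)
```

Ranking candidate genes by their visiting probability is the usual way such walks are turned into a prioritization; the direction of the edges and the choice of seeds are modeling decisions left open here.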