ABSTRACT
MOTIVATION: The accurate prediction of how mutations change biophysical properties of proteins or RNA is a major goal in computational biology with tremendous impacts on protein design and genetic variant interpretation. Evolutionary approaches such as coevolution can help solve this problem. RESULTS: We present pycofitness, a standalone Python-based software package for the in silico mutagenesis of protein and RNA sequences. It is based on coevolution and, more specifically, on a popular inverse statistical approach, namely direct coupling analysis by pseudo-likelihood maximization. Its efficient implementation and user-friendly command line interface make it an easy-to-use tool even for researchers with no bioinformatics background. To illustrate its strengths, we present three applications in which pycofitness efficiently predicts the deleteriousness of genetic variants and the effect of mutations on protein fitness and thermodynamic stability. AVAILABILITY AND IMPLEMENTATION: https://github.com/KIT-MBS/pycofitness.
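As a rough illustration of how a coevolution-based model can score a mutation, the sketch below evaluates a Potts-type energy with single-site fields and pairwise couplings and reports the energy difference between mutant and wild type. This is not the pycofitness API: the function names, the array layout, the sign convention, and the toy parameters are all assumptions made for the example.

```python
import numpy as np

def potts_energy(seq, fields, couplings):
    """Energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq       : sequence encoded as integer states, length L
    fields    : array of shape (L, q), hypothetical single-site fields h
    couplings : array of shape (L, L, q, q), hypothetical pairwise couplings J
    """
    length = len(seq)
    energy = 0.0
    for i in range(length):
        energy -= fields[i, seq[i]]
        for j in range(i + 1, length):
            energy -= couplings[i, j, seq[i], seq[j]]
    return energy

def mutation_score(seq, position, new_state, fields, couplings):
    """Energy of the single-site mutant minus energy of the input sequence."""
    mutant = list(seq)
    mutant[position] = new_state
    return (potts_energy(mutant, fields, couplings)
            - potts_energy(seq, fields, couplings))

# Toy model (assumed values): 3 sites, 2 states, one favorable coupling
# between sites 0 and 1 when both are in state 0.
L_SITES, Q_STATES = 3, 2
toy_fields = np.zeros((L_SITES, Q_STATES))
toy_couplings = np.zeros((L_SITES, L_SITES, Q_STATES, Q_STATES))
toy_couplings[0, 1, 0, 0] = 1.0

wild_type = [0, 0, 0]
delta = mutation_score(wild_type, 1, 1, toy_fields, toy_couplings)
print(delta)
```

Under this sign convention a positive difference means the mutant is less compatible with the model than the wild type; in practice the fields and couplings would be inferred from a multiple sequence alignment rather than set by hand.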
Subjects
RNA, Software, RNA/genetics, Amino Acid Sequence, Computational Biology, Proteins
ABSTRACT
Progress continues in the field of cancer biology, yet much remains to be unveiled regarding the mechanisms of cancer invasion. In particular, complex biophysical mechanisms enable a tumor to remodel the surrounding extracellular matrix (ECM), allowing cells to invade alone or collectively. Tumor spheroids cultured in collagen represent a simplified, reproducible 3D model system, which is sufficiently complex to recapitulate the evolving organization of cells and interaction with the ECM that occur during invasion. Recent experimental approaches enable high resolution imaging and quantification of the internal structure of invading tumor spheroids. Concurrently, computational modeling enables simulations of complex multicellular aggregates based on first principles. The comparison between real and simulated spheroids represents a way to fully exploit both data sources, but remains a challenge. We hypothesize that comparing any two spheroids requires first the extraction of basic features from the raw data, and second the definition of key metrics to match such features. Here, we present a novel method to compare spatial features of spheroids in 3D. To do so, we define and extract features from spheroid point cloud data, which we simulated using Cells in Silico (CiS), a high-performance framework for large-scale tissue modeling previously developed by us. We then define metrics to compare features between individual spheroids, and combine all metrics into an overall deviation score. Finally, we use our features to compare experimental data on invading spheroids in increasing collagen densities. We propose that our approach represents the basis for defining improved metrics to compare large 3D data sets. Moving forward, this approach will enable the detailed analysis of spheroids of any origin, one application of which is informing in silico spheroids based on their in vitro counterparts. This will enable both basic and applied researchers to close the loop between modeling and experiments in cancer research.
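The workflow described in this abstract (extract features from point clouds, compare them per feature, combine into one score) can be sketched as follows. The three features and the way they are averaged are hypothetical choices for illustration only; the abstract does not specify the actual features or metrics, and the point clouds here are synthetic Gaussian samples rather than CiS output.

```python
import numpy as np

def spheroid_features(points):
    """Extract a few simple (assumed) features from an (N, 3) point cloud."""
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    radial = np.linalg.norm(points - center, axis=1)
    return {
        "n_cells": float(len(points)),
        "radius_of_gyration": float(np.sqrt(np.mean(radial ** 2))),
        "max_radius": float(radial.max()),
    }

def deviation_score(features_a, features_b):
    """Mean relative deviation over all features; 0 means identical."""
    deviations = []
    for name in features_a:
        a, b = features_a[name], features_b[name]
        scale = max(abs(a), abs(b))
        deviations.append(0.0 if scale == 0 else abs(a - b) / scale)
    return float(np.mean(deviations))

# Synthetic stand-ins for a compact and a more spread-out spheroid.
rng = np.random.default_rng(0)
compact = rng.normal(scale=1.0, size=(500, 3))
invaded = rng.normal(scale=2.0, size=(500, 3))

score_same = deviation_score(spheroid_features(compact), spheroid_features(compact))
score_diff = deviation_score(spheroid_features(compact), spheroid_features(invaded))
print(score_same, score_diff)
```

Normalizing each deviation to the range 0 to 1 before averaging keeps features with large absolute values, such as cell counts, from dominating the combined score.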
Subjects
Experimental Neoplasms, Neoplasms, Animals, Cellular Spheroids, Collagen/chemistry, Extracellular Matrix
ABSTRACT
A hallmark of Huntington's disease (HD) is a prolonged polyglutamine sequence in the huntingtin protein and, correspondingly, an expanded cytosine, adenine, and guanine (CAG) triplet repeat region in the mRNA. A majority of studies investigating disease pathology were concerned with toxic huntingtin protein, but the mRNA moved into focus due to its recruitment to RNA foci and emerging novel therapeutic approaches targeting the mRNA. A hallmark of CAG-RNA is that it forms a stable hairpin in vitro which seems to be crucial for specific protein interactions. Using in-cell folding experiments, we show that the CAG-RNA is largely destabilized in cells compared to dilute buffer solutions but remains folded in the cytoplasm and nucleus. Surprisingly, we found the same folding stability in the nucleoplasm and in nuclear speckles under physiological conditions, suggesting that CAG-RNA does not undergo a conformational transition upon recruitment to the nuclear speckles. We found that the metabolite adenosine triphosphate (ATP) plays a crucial role in promoting unfolding, enabling its recruitment to nuclear speckles and preserving its mobility. Using in vitro experiments and molecular dynamics simulations, we found that the ATP effects can be attributed to a direct interaction of ATP with the nucleobases of the CAG-RNA rather than ATP acting as "a fuel" for helicase activity. ATP-driven changes in CAG-RNA homeostasis could be disease-relevant since mitochondrial function is affected in HD progression, leading to a decline in cellular ATP levels.
Subjects
Adenosine Triphosphate, Huntington Disease, Humans, Nuclear Speckles, Huntingtin Protein/metabolism, Adenine, RNA/metabolism, Messenger RNA, Huntington Disease/genetics, Trinucleotide Repeat Expansion
ABSTRACT
The ephrin type-A receptor 2 (EPHA2) kinase belongs to the largest family of receptor tyrosine kinases. There are several indications of an involvement of EPHA2 in the development of infectious diseases and cancer. Despite pharmacological potential, EPHA2 is an under-examined target protein. In this study, we synthesized a series of derivatives of the inhibitor NVP-BHG712 and triazine-based compounds. These compounds were evaluated to determine their potential as kinase inhibitors of EPHA2, including elucidation of their binding mode (X-ray crystallography), affinity (microscale thermophoresis), and selectivity (Kinobeads assay). Eight inhibitors showed affinities in the low-nanomolar regime (KD < 10 nM). Testing in up to seven colon cancer cell lines that express EPHA2 reveals that several derivatives feature promising effects for the control of human colon carcinoma. Thus, we have developed a set of powerful tool compounds for fundamental new research on the interplay of EPH receptors in a cellular context.
Subjects
Colorectal Neoplasms, Pyrazoles, Humans, Pyrazoles/chemistry, Pyrimidines/pharmacology, Pyrimidines/chemistry, Cell Line, Colorectal Neoplasms/drug therapy, Tumor Cell Line
ABSTRACT
Co-evolutionary models such as direct coupling analysis (DCA) in combination with machine learning (ML) techniques based on deep neural networks are able to predict accurate protein contact or distance maps. Such information can be used as constraints in structure prediction and massively increase prediction accuracy. Unfortunately, the same ML methods cannot readily be applied to RNA as they rely on large structural datasets only available for proteins. Here, we demonstrate how the available smaller data for RNA can be used to improve prediction of RNA contact maps. We introduce an algorithm called CoCoNet that is based on a combination of a Coevolutionary model and a shallow Convolutional Neural Network. Despite its simplicity and the small number of trained parameters, the method boosts the positive predictive value (PPV) of predicted contacts by about 70% with respect to DCA as tested by cross-validation of about eighty RNA structures. However, the direct inclusion of the CoCoNet contacts in 3D modeling tools does not result in a proportional increase of the 3D RNA structure prediction accuracy. Therefore, we suggest that the field develops, in addition to contact PPV, metrics which estimate the expected impact for 3D structure modeling tools better. CoCoNet is freely available and can be found at https://github.com/KIT-MBS/coconet.
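The positive predictive value referred to in this abstract is the fraction of predicted contacts that are true contacts, PPV = TP / (TP + FP), usually evaluated on the top-ranked predictions. A minimal sketch follows; the function name, the pair representation, and the toy data are assumptions for illustration and are not taken from CoCoNet.

```python
def contact_ppv(ranked_pairs, true_contacts, top_n):
    """PPV of the top_n predicted contacts.

    ranked_pairs  : predicted (i, j) pairs, best first
    true_contacts : iterable of (i, j) pairs observed in the reference structure
    """
    true_set = {tuple(sorted(pair)) for pair in true_contacts}
    selected = [tuple(sorted(pair)) for pair in ranked_pairs[:top_n]]
    if not selected:
        return 0.0
    true_positives = sum(pair in true_set for pair in selected)
    return true_positives / len(selected)

# Toy data (assumed): four ranked predictions, three native contacts.
predicted = [(1, 10), (2, 9), (3, 20), (4, 7)]
native = [(10, 1), (2, 9), (4, 7)]

ppv_top3 = contact_ppv(predicted, native, top_n=3)
ppv_top4 = contact_ppv(predicted, native, top_n=4)
print(ppv_top3, ppv_top4)
```

Pairs are sorted before comparison so that (1, 10) and (10, 1) count as the same contact.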
Subjects
Computer Neural Networks, RNA/chemistry, Algorithms, Molecular Models, Nucleic Acid Conformation, Riboswitch
ABSTRACT
RNA molecules play many pivotal roles in a cell that are still not fully understood. Any detailed understanding of RNA function requires knowledge of its three-dimensional structure, yet experimental RNA structure resolution remains demanding. Recent advances in sequencing provide unprecedented amounts of sequence data that can be statistically analyzed by methods such as direct coupling analysis (DCA) to determine spatial proximity or contacts of specific nucleic acid pairs, which improve the quality of structure prediction. To quantify this structure prediction improvement, we here present a well curated data set of about 70 RNA structures of high resolution and compare different nucleotide-nucleotide contact prediction methods available in the literature. We observe only minor differences between the performances of the different methods. Moreover, we discuss how robust these predictions are for different contact definitions and how strongly they depend on procedures used to curate and align the families of homologous RNA sequences.
Subjects
RNA/genetics, Data Analysis, Datasets as Topic, Nucleic Acid Conformation, Sequence Alignment/methods
ABSTRACT
Despite the incredible progress of experimental techniques, protein structure determination still remains a challenging task. Due to the rapid improvements of computer technology, simulations are often used to complement or interpret experimental data, particularly for sparse or low-resolution data. Many such in silico methods allow us to obtain highly accurate models of a protein structure either de novo or via refinement of a physical model with experimental restraints. One crucial question is how to select a representative member or ensemble out of the vast number of computationally generated structures. Here, we introduce such a method. As a representative task, we add co-evolutionary contact pairs as distance restraints to a physical force field and want to select a good characterization of the resulting native-like ensemble. To generate large ensembles, we run replica-exchange molecular dynamics (REMD) on five mid-sized test proteins and over a wide temperature range. High temperatures allow overcoming energetic barriers while low temperatures perform local searches of native-like conformations. The integrated bias is based on co-evolutionary contact pairs derived from a deep residual neural network to guide the simulation toward native-like conformations. We briefly compare and discuss the achieved model precision of contact-guided REMD for mid-sized proteins. Finally, we discuss four robust ensemble-selection algorithms in great detail, which are capable of extracting representative structure models with high certainty. To assess the performance of the selection algorithms, we mimic a "blind scenario," i.e., one in which the target structure is unknown, and select a representative structural ensemble of native-like folds.
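In temperature replica exchange, neighboring replicas periodically attempt to swap configurations, and the swap is accepted with the standard Metropolis probability min(1, exp[(β_i − β_j)(E_i − E_j)]). The sketch below shows only this textbook criterion; the function name and the choice of kJ/mol energy units are assumptions, and the contact-derived bias and selection algorithms of the study are not reproduced.

```python
import math

BOLTZMANN_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def exchange_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis probability of swapping replicas i and j.

    energy_* : potential energies in kJ/mol (assumed unit)
    temp_*   : replica temperatures in kelvin
    """
    beta_i = 1.0 / (BOLTZMANN_KJ_PER_MOL_K * temp_i)
    beta_j = 1.0 / (BOLTZMANN_KJ_PER_MOL_K * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0.0:
        return 1.0  # swap is always accepted
    return math.exp(delta)

# The cold replica holds the higher energy: the swap is always accepted.
p_favorable = exchange_probability(-100.0, -120.0, 300.0, 350.0)
# The cold replica already holds the lower energy: accepted only sometimes.
p_unfavorable = exchange_probability(-120.0, -100.0, 300.0, 350.0)
print(p_favorable, p_unfavorable)
```

Returning 1.0 directly for non-negative exponents avoids evaluating large exponentials that could overflow.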
Subjects
Molecular Dynamics Simulation, Proteins, Algorithms, Molecular Conformation, Protein Conformation, Proteins/chemistry
ABSTRACT
Nicotinamide adenine dinucleotide (NAD) provides an important link between metabolism and signal transduction and has emerged as central hub between bioenergetics and all major cellular events. NAD-dependent signaling (e.g., by sirtuins and poly-adenosine diphosphate [ADP] ribose polymerases [PARPs]) consumes considerable amounts of NAD. To maintain physiological functions, NAD consumption and biosynthesis need to be carefully balanced. Using extensive phylogenetic analyses, mathematical modeling of NAD metabolism, and experimental verification, we show that the diversification of NAD-dependent signaling in vertebrates depended on 3 critical evolutionary events: 1) the transition of NAD biosynthesis to exclusive usage of nicotinamide phosphoribosyltransferase (NamPT); 2) the occurrence of nicotinamide N-methyltransferase (NNMT), which diverts nicotinamide (Nam) from recycling into NAD, preventing Nam accumulation and inhibition of NAD-dependent signaling reactions; and 3) structural adaptation of NamPT, providing an unusually high affinity toward Nam, necessary to maintain NAD levels. Our results reveal an unexpected coevolution and kinetic interplay between NNMT and NamPT that enables extensive NAD signaling. This has implications for therapeutic strategies of NAD supplementation and the use of NNMT or NamPT inhibitors in disease treatment.
Subjects
Biological Evolution, NAD/metabolism, Signal Transduction, Amino Acid Sequence, Animals, Biosynthetic Pathways, HeLa Cells, Humans, Kinetics, Nicotinamide N-Methyltransferase, Nicotinamide Phosphoribosyltransferase/chemistry, Nicotinamide Phosphoribosyltransferase/metabolism, Phylogeny, Substrate Specificity, Vertebrates/metabolism
ABSTRACT
In this article, we investigate the binding processes of a fragment of the coronavirus spike protein receptor binding domain (RBD), the hexapeptide YKYRYL on the angiotensin-converting enzyme 2 (ACE2) receptor, and its inhibitory effect on the binding and activation of the coronavirus-2 spike protein CoV-2 RBD at ACE2. In agreement with an experimental study, we find a high affinity of the hexapeptide to the binding interface between CoV-2 RBD and ACE2, which we investigate using 20 independent equilibrium molecular dynamics (MD) simulations over a total of 1 µs and a 200-ns enhanced correlation guided MD simulation. We then evaluate the effect of the hexapeptide on the assembly process of the CoV-2 RBD to ACE2 in long-time enhanced correlation guided MD simulations. In that set of simulations, we find that CoV-2 RBD does not bind to ACE2 with the binding motif shown in experiments, but it rotates because of an electrostatic repulsion and forms a hydrophobic interface with ACE2. Surprisingly, we observe that the hexapeptide binds to CoV-2 RBD, which has the effect that this protein only weakly attaches to ACE2 so that the activation of CoV-2 RBD might be inhibited in this case. Our results indicate that the hexapeptide might be a possible treatment option that prevents the viral activation through the inhibition of the interaction between ACE2 and CoV-2 RBD.
Subjects
Angiotensin-Converting Enzyme 2/metabolism, Peptides/pharmacology, Coronavirus Spike Glycoprotein/antagonists & inhibitors, Amino Acid Sequence, Angiotensin-Converting Enzyme 2/chemistry, Humans, Molecular Dynamics Simulation, Peptides/chemistry, Protein Binding/drug effects, Coronavirus Spike Glycoprotein/chemistry, Coronavirus Spike Glycoprotein/metabolism
ABSTRACT
MOTIVATION: The ongoing advances in sequencing technologies have provided a massive increase in the availability of sequence data. This made it possible to study the patterns of correlated substitution between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct coupling analysis (DCA) infers coevolutionary couplings between pairs of residues indicating their spatial proximity, making such information a valuable input for subsequent structure prediction. RESULTS: Here, we present pydca, a standalone Python-based software package for the DCA of protein- and RNA-homologous families. It is based on two popular inverse statistical approaches, namely, the mean-field and the pseudo-likelihood maximization and is equipped with a series of functionalities that range from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool that can be used by researchers with a wide range of backgrounds. AVAILABILITY AND IMPLEMENTATION: pydca can be obtained from https://github.com/KIT-MBS/pydca or from the Python Package Index under the MIT License. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
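Once DCA couplings have been inferred, a common post-processing route to a ranked contact list is to reduce each coupling block to a Frobenius norm, apply the average product correction (APC), and sort pairs that are sufficiently far apart in sequence. The sketch below illustrates that generic route with NumPy; it is not the pydca API, and the function names, the array layout, and the default minimum separation of four positions are assumptions.

```python
import numpy as np

def frobenius_scores(couplings):
    """Reduce couplings of shape (L, L, q, q) to an (L, L) score matrix."""
    scores = np.sqrt((couplings ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores

def average_product_correction(scores):
    """APC for a symmetric score matrix with zero diagonal.

    corrected_ij = score_ij - mean_i * mean_j / mean_all, where the means
    run over off-diagonal entries. Assumes at least one nonzero score.
    """
    length = scores.shape[0]
    row_mean = scores.sum(axis=1) / (length - 1)
    total_mean = scores.sum() / (length * (length - 1))
    corrected = scores - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected

def ranked_pairs(scores, min_separation=4):
    """Pairs (i, j) with j - i >= min_separation, highest score first."""
    length = scores.shape[0]
    pairs = [(i, j) for i in range(length)
             for j in range(i + min_separation, length)]
    return sorted(pairs, key=lambda pair: scores[pair], reverse=True)

# Toy couplings (assumed values): 6 sites, 2 states, one strong pair (0, 5).
toy = np.zeros((6, 6, 2, 2))
toy[0, 5] = toy[5, 0] = [[3.0, 0.0], [0.0, 4.0]]
toy[1, 5] = toy[5, 1] = [[1.0, 0.0], [0.0, 0.0]]
top_pair = ranked_pairs(frobenius_scores(toy))[0]
print(top_pair)
```

The sequence-separation filter removes trivially close neighbors along the chain, which would otherwise crowd the top of the ranking.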
Subjects
RNA, Software, Amino Acid Sequence, Proteins, Sequence Alignment
ABSTRACT
During embryogenesis, morphogens form a concentration gradient in responsive tissue, which is then translated into a spatial cellular pattern. The mechanisms by which morphogens spread through a tissue to establish such a morphogenetic field remain elusive. Here, we investigate by mutually complementary simulations and in vivo experiments how Wnt morphogen transport by cytonemes differs from typically assumed diffusion-based transport for patterning of highly dynamic tissue such as the neural plate in zebrafish. Stochasticity strongly influences fate acquisition at the single cell level and results in fluctuating boundaries between pattern regions. Stable patterning can be achieved by sorting through concentration dependent cell migration and apoptosis, independent of the morphogen transport mechanism. We show that Wnt transport by cytonemes achieves distinct Wnt thresholds for the brain primordia earlier compared with diffusion-based transport. We conclude that a cytoneme-mediated morphogen transport together with directed cell sorting is a potentially favored mechanism to establish morphogen gradients in rapidly expanding developmental systems.
Subjects
Body Patterning/physiology, Developmental Gene Expression Regulation, Vertebrates/embryology, Wnt Proteins/physiology, Animals, Apoptosis, Brain/embryology, Cell Lineage, Cell Movement, Computational Biology, Computer Simulation, Embryonic Development, Neural Crest/embryology, Neural Plate/embryology, Protein Transport, Signal Transduction, Software, Stochastic Processes, Zebrafish/embryology, beta Catenin/physiology
ABSTRACT
In this paper, we present a fast and adaptive correlation guided enhanced sampling method (CORE-MD II). The CORE-MD II technique relies, in part, on partitioning of the entire pathway into short trajectories that we refer to as instances. The sampling within each instance is accelerated by adaptive path-dependent metadynamics simulations. The second part of this approach involves kinetic Monte Carlo (kMC) sampling between the different states that have been accessed during each instance. Through the combination of the partition of the total simulation into short non-equilibrium simulations and the kMC sampling, the CORE-MD II method is capable of sampling protein folding without any a priori definitions of reaction pathways and additional parameters. In the validation simulations, we applied CORE-MD II to the dialanine peptide and to the folding of two peptides: TrpCage and TrpZip2. In a comparison with long time equilibrium Molecular Dynamics (MD), 1 µs replica exchange MD (REMD), and CORE-MD I simulations, we find that the level of convergence of the CORE-MD II method is improved by a factor of 8.8, while the CORE-MD II method reaches acceleration factors of ~120. In the CORE-MD II simulation of TrpZip2, we observe the formation of the native state in contrast to the REMD and the CORE-MD I simulations. The method is broadly applicable for MD simulations and is not restricted to simulations of protein folding or even biomolecules but also applicable to simulations of protein aggregation, protein signaling, or even materials science simulations.
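For readers unfamiliar with kinetic Monte Carlo, a single step of the generic rejection-free scheme picks one transition with probability proportional to its rate and advances the clock by an exponentially distributed waiting time. The sketch below shows that generic step only; it is not the CORE-MD II implementation, and the function name and the example rates are assumptions.

```python
import math
import random

def kmc_step(rates, rng):
    """Select one event and a waiting time from a list of transition rates.

    rates : non-negative rates, one per possible transition
    rng   : random.Random instance supplying uniform random numbers
    Returns (index of the chosen transition, elapsed time).
    """
    total_rate = sum(rates)
    if total_rate <= 0.0:
        raise ValueError("at least one rate must be positive")
    # Choose an event with probability rate / total_rate.
    threshold = rng.random() * total_rate
    cumulative = 0.0
    chosen = len(rates) - 1
    for index, rate in enumerate(rates):
        cumulative += rate
        if threshold < cumulative:
            chosen = index
            break
    # Waiting time is exponentially distributed with mean 1 / total_rate.
    waiting_time = -math.log(1.0 - rng.random()) / total_rate
    return chosen, waiting_time

rng = random.Random(42)
event, dt = kmc_step([0.5, 1.5, 3.0], rng)
print(event, dt)
```

Using 1 − u inside the logarithm keeps the argument strictly positive, because `random()` returns values in the half-open interval [0, 1).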
Subjects
Molecular Dynamics Simulation, Proteins/chemistry, Kinetics, Monte Carlo Method, Protein Conformation
ABSTRACT
Inspired by the modular architecture of natural signaling proteins, ligand binding proteins are equipped with two fluorescent proteins (FPs) in order to obtain Förster resonance energy transfer (FRET)-based biosensors. Here, we investigated a glucose sensor where the donor and acceptor FPs were attached to a glucose binding protein using a variety of different linker sequences. For three resulting sensor constructs the corresponding glucose induced conformational changes were measured by small angle X-ray scattering (SAXS) and compared to recently published single molecule FRET results (Höfig et al., ACS Sensors, 2018). For one construct which exhibits a high change in energy transfer and a large change of the radius of gyration upon ligand binding, we performed coarse-grained molecular dynamics simulations for the ligand-free and the ligand-bound state. Our analysis indicates that a carefully designed attachment of the donor FP is crucial for the proper transfer of the glucose induced conformational change of the glucose binding protein into a well pronounced FRET signal change as measured in this sensor construct. Since the other FP (acceptor) does not experience such a glucose induced alteration, it becomes apparent that only one of the FPs needs to have a well-adjusted attachment to the glucose binding protein.
Subjects
Biosensing Techniques, Fluorescence Resonance Energy Transfer, Proteins, Small-Angle Scattering, X-Ray Diffraction
ABSTRACT
BACKGROUND: Discoveries in cellular dynamics and tissue development constantly reshape our understanding of fundamental biological processes such as embryogenesis, wound-healing, and tumorigenesis. High-quality microscopy data and ever-improving understanding of single-cell effects rapidly accelerate new discoveries. Still, many computational models describe either a few cells in great detail or larger cell ensembles and tissues more coarsely. Here, we connect these two scales in a joint theoretical model. RESULTS: We developed a highly parallel version of the cellular Potts model that can be flexibly applied and provides an agent-based model driving cellular events. The model can be modularly extended to a multi-model simulation on both scales. Based on the NAStJA framework, a scaling implementation running efficiently on high-performance computing systems was realized. We demonstrate independence of bias in our approach as well as excellent scaling behavior. CONCLUSIONS: Our model scales approximately linearly beyond 10,000 cores and thus enables the simulation of large-scale three-dimensional tissues only confined by available computational resources. The strict modular design allows arbitrary models to be configured flexibly and enables applications in a wide range of research questions. Cells in Silico (CiS) can be easily molded to different model assumptions and help push computational scientists to expand their simulations to a new area in tissue simulations. As an example we highlight a 1000³ voxel-sized cancerous tissue simulation at sub-cellular resolution.
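The cellular Potts model mentioned in this abstract evolves a lattice of cell labels by proposing local label copies and accepting them with a Metropolis rule based on the change in a Hamiltonian. The sketch below shows only a bare adhesion term and the acceptance rule on a tiny 2D lattice. It is a simplification under stated assumptions: a single contact energy for all unlike neighbor pairs, non-periodic boundaries, and no volume or surface constraints; it does not reflect the CiS or NAStJA code.

```python
import math
import numpy as np

def adhesion_energy(lattice, contact_energy=1.0):
    """Sum contact_energy over nearest-neighbor pairs with different labels."""
    lattice = np.asarray(lattice)
    unlike_bonds = 0
    for axis in range(lattice.ndim):
        # A nonzero difference between adjacent sites marks an unlike pair.
        unlike_bonds += np.count_nonzero(np.diff(lattice, axis=axis))
    return contact_energy * unlike_bonds

def acceptance_probability(delta_energy, temperature):
    """Metropolis rule: always accept downhill moves, else exp(-dH / T).

    temperature must be positive.
    """
    if delta_energy <= 0.0:
        return 1.0
    return math.exp(-delta_energy / temperature)

# Toy lattice (assumed): a one-voxel cell (label 2) inside medium (label 1).
with_cell = np.array([[1, 1, 1],
                      [1, 2, 1],
                      [1, 1, 1]])
without_cell = np.ones((3, 3), dtype=int)

delta_remove = adhesion_energy(without_cell) - adhesion_energy(with_cell)
p_remove = acceptance_probability(delta_remove, temperature=2.0)
p_insert = acceptance_probability(-delta_remove, temperature=2.0)
print(delta_remove, p_remove, p_insert)
```

In a full model the volume constraint would penalize removing the cell; with adhesion alone the energetically cheapest state is a uniform lattice.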
Subjects
Cells/metabolism, Computer Simulation, Organ Specificity, Biological Transport, Cell Death, Diffusion, Theoretical Models, Mutation/genetics, User-Computer Interface
ABSTRACT
SUMMARY: The distance geometry problem is often encountered in molecular biology and the life sciences at large, as a host of experimental methods produce ambiguous and noisy distance data. In this note, we present diSTruct, an adaptation of the generic MaxEnt-Stress graph drawing algorithm to the domain of biological macromolecules. diSTruct is fast, provides reliable structural models even from incomplete or noisy distance data and integrates access to graph analysis tools. AVAILABILITY AND IMPLEMENTATION: diSTruct is written in C++, Cython and Python 3. It is available from https://github.com/KIT-MBS/distruct.git or in the Python package index under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
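To make the distance geometry problem concrete, the sketch below recovers 3D coordinates from a complete, noise-free distance matrix using classical multidimensional scaling. This is a deliberately simpler technique than the MaxEnt-Stress algorithm that diSTruct adapts, and it does not handle missing or noisy distances; the function names and the random test points are assumptions for illustration.

```python
import numpy as np

def classical_mds(distances, dim=3):
    """Embed points in `dim` dimensions from a full pairwise distance matrix."""
    distances = np.asarray(distances, dtype=float)
    n = distances.shape[0]
    # Double centering turns squared distances into a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    order = np.argsort(eigenvalues)[::-1][:dim]
    top_values = np.clip(eigenvalues[order], 0.0, None)
    return eigenvectors[:, order] * np.sqrt(top_values)

def pairwise_distances(coords):
    """Euclidean distance matrix of an (N, dim) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Synthetic example: 10 random points, rebuilt from their distances alone.
rng = np.random.default_rng(1)
original = rng.normal(size=(10, 3))
target = pairwise_distances(original)
model = classical_mds(target)
max_error = np.abs(pairwise_distances(model) - target).max()
print(max_error)
```

The recovered coordinates match the originals only up to rotation, reflection, and translation, which is why the check compares distance matrices rather than coordinates.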
Subjects
Biological Science Disciplines, Software, Algorithms, Molecular Biology
ABSTRACT
The notochord defines the axial structure of all vertebrates during development. Notogenesis is a result of major cell reorganization in the mesoderm, the convergence and the extension of the axial cells. However, it is currently not fully understood how these processes act together in a coordinated way during notochord formation. The prechordal plate is an actively migrating cell population in the central mesoderm anterior to the trailing notochordal plate cells. We show that prechordal plate cells express Protocadherin 18a (Pcdh18a), a member of the cadherin superfamily. We find that Pcdh18a-mediated recycling of E-cadherin adhesion complexes transforms prechordal plate cells into a cohesive and fast migrating cell group. In turn, the prechordal plate cells subsequently instruct the trailing mesoderm. We simulated cell migration during early mesoderm formation using a lattice-based mathematical framework and predicted that the requirement for an anterior, local motile cell cluster could guide the intercalation and extension of the posterior, axial cells. Indeed, a grafting experiment validated the prediction and local Pcdh18a expression induced an ectopic prechordal plate-like cell group migrating towards the animal pole. Our findings indicate that the Pcdh18a is important for prechordal plate formation, which influences the trailing mesodermal cell sheet by orchestrating the morphogenesis of the notochord.
Subjects
Cadherins/metabolism, Mesoderm/metabolism, Zebrafish/embryology, Animals, Cadherins/genetics, Endocytosis, HeLa Cells, Humans, Mesoderm/cytology, Mutation, Cultured Tumor Cells
ABSTRACT
The fundamental aim of structural analyses in biophysics is to reveal a mutual relation between a molecule's dynamic structure and its physiological function. Small-angle X-ray scattering (SAXS) is an experimental technique for structural characterization of macromolecules in solution and enables time-resolved analysis of conformational changes under physiological conditions. As such experiments measure spatially averaged low-resolution scattering intensities only, the sparse information obtained is not sufficient to uniquely reconstruct a three-dimensional atomistic model. Here, we integrate the information from SAXS into molecular dynamics simulations using computationally efficient native structure-based models. Dynamically fitting an initial structure towards a scattering intensity, such simulations produce atomistic models in agreement with the target data. In this way, SAXS data can be rapidly interpreted while retaining physico-chemical knowledge and sampling power of the underlying force field. We demonstrate our method's performance using the example of three protein systems. Simulations are faster than full molecular dynamics approaches by more than two orders of magnitude and consistently achieve comparable accuracy. Computational demands are reduced sufficiently to run the simulations on commodity desktop computers instead of high-performance computing systems. These results underline that scattering-guided structure-based simulations provide a suitable framework for rapid early-stage refinement of structures towards SAXS data with particular focus on minimal computational resources and time.
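The spatially averaged scattering intensity of a set of point scatterers is given by the Debye formula, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij) / (q r_ij). The sketch below evaluates it with NumPy under simplifying assumptions: constant, q-independent form factors and no solvent or hydration-shell contribution. The function name and toy geometry are hypothetical, and the coupling of SAXS data to structure-based simulations described in the abstract is not reproduced.

```python
import numpy as np

def debye_intensity(coords, form_factors, q_values):
    """Orientationally averaged intensity from the Debye formula.

    coords       : (N, 3) scatterer positions
    form_factors : (N,) constant form factors (assumed q-independent)
    q_values     : scalar or 1D array of momentum transfers
    """
    coords = np.asarray(coords, dtype=float)
    form_factors = np.asarray(form_factors, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    distances = np.linalg.norm(diff, axis=-1)
    weights = np.outer(form_factors, form_factors)
    intensities = []
    for q in np.atleast_1d(q_values):
        # np.sinc(x) = sin(pi x) / (pi x), so sinc(q r / pi) = sin(q r) / (q r).
        intensities.append(np.sum(weights * np.sinc(q * distances / np.pi)))
    return np.array(intensities)

# Toy system (assumed): two unit scatterers separated by a distance of 2.
pair = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
profile = debye_intensity(pair, [1.0, 1.0], [0.0, np.pi / 2.0])
print(profile)
```

Using `np.sinc` handles the q = 0 and r = 0 limits without special cases, since the function is defined as 1 at zero argument.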
Subjects
Proteins/chemistry, Proteins/physiology, Small-Angle Scattering, X-Ray Diffraction/methods, Computational Biology, Molecular Dynamics Simulation, Protein Conformation
ABSTRACT
Structured RNA plays many functionally relevant roles in molecular life. Structural information, while required to understand the functional cycles in detail, is challenging to gather. Computational methods promise to complement experimental efforts by predicting three-dimensional RNA models. Here, we provide a concise view of the state of the art methodologies with a focus on the strengths and the weaknesses of the different approaches. Furthermore, we analyzed the recent developments regarding the use of coevolutionary information and how it can boost the prediction performances. We finally discuss some open perspectives and challenges for the near future in the RNA structural stability field.
Subjects
Computational Biology/methods, Molecular Models, Nucleic Acid Conformation, RNA/chemistry, RNA Sequence Analysis/methods, RNA/genetics, RNA Stability/genetics, Software
ABSTRACT
We present an enhanced Molecular Dynamics (MD) simulation method, which is free from the requirement of a priori structural information of the system. The technique is capable of folding proteins with very low computational effort and requires only an energy parameter. The path correlated MD (CORE-MD) method uses the autocorrelation of the path integral over the reduced action and propagates the system along the history dependent path correlation. We validate the new technique in simulations of the conformational landscapes of dialanine and the TrpCage mini-peptide. We find that the novel method accelerates the sampling by three orders of magnitude and observe convergence of the conformational sampling in both cases. We conclude that the new method is broadly applicable for the enhanced sampling in MD simulations. The CORE-MD algorithm reaches a high accuracy compared with long time equilibrium MD simulations.
Subjects
Dipeptides/chemistry, Chemical Models, Molecular Dynamics Simulation, Peptides/chemistry, Algorithms, Molecular Models, Protein Conformation, Protein Folding
ABSTRACT
Proteins have evolved to perform diverse cellular functions, from serving as reaction catalysts to coordinating cellular propagation and development. Frequently, proteins do not exert their full potential as monomers but rather undergo concerted interactions as either homo-oligomers or with other proteins as hetero-oligomers. The experimental study of such protein complexes and interactions has been arduous. Theoretical structure prediction methods are an attractive alternative. Here, we investigate homo-oligomeric interfaces by tracing residue coevolution via the global statistical direct coupling analysis (DCA). DCA can accurately infer spatial adjacencies between residues. These adjacencies can be included as constraints in structure prediction techniques to predict high-resolution models. By taking advantage of the ongoing exponential growth of sequence databases, we go significantly beyond anecdotal cases of a few protein families and apply DCA to a systematic large-scale study of nearly 2,000 Pfam protein families with sufficient sequence information and structurally resolved homo-oligomeric interfaces. We find that large interfaces are commonly identified by DCA. We further demonstrate that DCA can differentiate between subfamilies with different binding modes within one large Pfam family. Sequence-derived contact information for the subfamilies proves sufficient to assemble accurate structural models of the diverse protein-oligomers. Thus, we provide an approach to investigate oligomerization for arbitrary protein families leading to structural models complementary to often-difficult experimental methods. Combined with ever more abundant sequential data, we anticipate that this study will be instrumental to allow the structural description of many heteroprotein complexes in the future.