ABSTRACT
Proteins mediate their functions through chemical interactions; modeling these interactions, which are typically mediated by sidechains, is an important need in protein design. However, constructing an all-atom generative model requires an appropriate scheme for managing the jointly continuous and discrete nature of proteins encoded in the structure and sequence. We describe an all-atom diffusion model of protein structure, Protpardelle, which represents all sidechain states at once as a "superposition" state; superpositions defining a protein are collapsed into individual residue types and conformations during sample generation. When combined with sequence design methods, our model is able to codesign all-atom protein structure and sequence. Generated proteins score well on standard quality, diversity, and novelty metrics, and their sidechains reproduce the chemical features and behavior of natural proteins. Finally, we explore the potential of our model to conduct all-atom protein design and to scaffold functional motifs in a backbone- and rotamer-free way.
Subjects
Molecular Models, Protein Conformation, Proteins, Proteins/chemistry, Amino Acid Sequence
ABSTRACT
Neural phenotypes are the result of probabilistic developmental processes. This means that stochasticity is an intrinsic aspect of the brain as it self-organizes over a protracted period. In other words, while both genomic and environmental factors shape the developing nervous system, another significant, though often neglected, contributor is the randomness introduced by probability distributions. Using generative modeling of brain networks, we provide a framework for probing the contribution of stochasticity to neurodevelopmental diversity. To mimic the prenatal scaffold of brain structure set by activity-independent mechanisms, we start our simulations from the medio-posterior neonatal rich club (Developing Human Connectome Project, n = 630). From this initial starting point, models implementing Hebbian-like wiring processes generate variable yet consistently plausible brain network topologies. By analyzing repeated runs of the generative process (>10^7 simulations), we identify critical determinants and effects of stochasticity. Namely, we find that stochastic variation has a greater impact on brain organization when networks develop under weaker constraints. This heightened stochasticity makes brain networks more robust to random and targeted attacks, but more often results in non-normative phenotypic outcomes. To test our framework empirically, we evaluated whether stochasticity varies according to the experience of early-life deprivation using a cohort of neurodiverse children (Centre for Attention, Learning and Memory; n = 357). We show that low socioeconomic status predicts more stochastic brain wiring. We conclude that stochasticity may be an unappreciated contributor to relevant developmental outcomes and make specific predictions for future research.
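A minimal sketch of one growth phase of a stochastic generative network model in this spirit, assuming a Betzel-style wiring rule in which connection probability trades off a distance penalty against a topological "value" term; the degree-based value term, the parameter values, and all names below are illustrative stand-ins for the paper's Hebbian-like rule rather than its actual implementation:

```python
import numpy as np

def grow_network(dist, seed_adj, n_new_edges, eta=-2.0, gamma=0.3, rng=None):
    """Stochastically add n_new_edges to a seed adjacency matrix."""
    rng = np.random.default_rng(rng)
    A = seed_adj.astype(float).copy()
    n = A.shape[0]
    d = dist + np.eye(n)                 # avoid 0**eta on the (excluded) diagonal
    for _ in range(n_new_edges):
        k = A.sum(axis=0)                # node degrees stand in for the "value" term
        P = (d ** eta) * ((k[:, None] + k[None, :] + 1e-6) ** gamma)
        P[np.triu_indices(n)] = 0.0      # consider each undirected pair once
        P[A > 0] = 0.0                   # never duplicate an existing edge
        P = P.ravel() / P.sum()
        i, j = divmod(rng.choice(n * n, p=P), n)
        A[i, j] = A[j, i] = 1.0          # stochastic edge addition
    return A

# usage: A = grow_network(dist, seed_adj, n_new_edges=200, rng=0)
```

Because the edge choice is sampled rather than deterministic, repeated runs from the same seed adjacency produce the kind of variable yet plausible topologies the abstract describes.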
Subjects
Brain, Learning, Child, Newborn, Humans, Stochastic Processes
ABSTRACT
Detecting differentially expressed genes is important for characterizing subpopulations of cells. In scRNA-seq data, however, nuisance variation due to technical factors like sequencing depth and RNA capture efficiency obscures the underlying biological signal. Deep generative models have been extensively applied to scRNA-seq data, with a special focus on embedding cells into a low-dimensional latent space and correcting for batch effects. However, little attention has been paid to the problem of utilizing the uncertainty from the deep generative model for differential expression (DE). Furthermore, the existing approaches do not allow for controlling for effect size or the false discovery rate (FDR). Here, we present lvm-DE, a generic Bayesian approach for performing DE predictions from a fitted deep generative model, while controlling the FDR. We apply the lvm-DE framework to scVI and scSphere, two deep generative models. The resulting approaches outperform state-of-the-art methods at estimating the log fold change in gene expression levels as well as detecting differentially expressed genes between subpopulations of cells.
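A hedged sketch of the kind of Bayesian decision rule described here, assuming posterior samples of group-wise mean expression are already available from a fitted deep generative model (e.g., scVI); the effect-size threshold, function names, and the simple posterior-expected-FDR rule are illustrative rather than the exact lvm-DE procedure:

```python
import numpy as np

def bayesian_de(samples_a, samples_b, delta=0.5, fdr_target=0.05, eps=1e-8):
    """samples_*: posterior samples of mean expression, shape (n_samples, n_genes)."""
    lfc = np.log2(samples_a + eps) - np.log2(samples_b + eps)
    # posterior probability that the effect size exceeds the threshold delta
    p_de = (np.abs(lfc) > delta).mean(axis=0)
    order = np.argsort(-p_de)
    # posterior expected FDR of the top-k set = mean of (1 - p_de) over that set
    cum_fdr = np.cumsum(1.0 - p_de[order]) / np.arange(1, p_de.size + 1)
    ok = np.nonzero(cum_fdr <= fdr_target)[0]
    k = ok.max() + 1 if ok.size else 0
    selected = np.zeros(p_de.size, dtype=bool)
    selected[order[:k]] = True
    return lfc.mean(axis=0), p_de, selected
```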
Assuntos
RNA , Análise de Célula Única , Teorema de Bayes , Análise de Sequência de RNA/métodos , Análise de Célula Única/métodos , Perfilação da Expressão Gênica/métodosRESUMO
Recent advances in multiplexed single-cell transcriptomics experiments facilitate the high-throughput study of drug and genetic perturbations. However, an exhaustive exploration of the combinatorial perturbation space is experimentally unfeasible. Therefore, computational methods are needed to predict, interpret, and prioritize perturbations. Here, we present the compositional perturbation autoencoder (CPA), which combines the interpretability of linear models with the flexibility of deep-learning approaches for single-cell response modeling. CPA learns to predict transcriptional perturbation responses in silico at the single-cell level for unseen dosages, cell types, time points, and species. Using newly generated single-cell drug combination data, we validate that CPA can predict unseen drug combinations while outperforming baseline models. Additionally, the architecture's modularity enables incorporating the chemical representation of the drugs, allowing the prediction of cellular responses to completely unseen drugs. Furthermore, CPA is also applicable to genetic combinatorial screens. We demonstrate this by imputing in silico 5,329 missing combinations (97.6% of all possibilities) in a single-cell Perturb-seq experiment with diverse genetic interactions. We envision CPA will facilitate efficient experimental design and hypothesis generation by enabling in silico response prediction at the single-cell level and thus accelerate therapeutic applications using single-cell technologies.
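The compositional idea can be illustrated with a minimal latent-space sketch; the embedding dimensions, the dose-response curve, and all names below are placeholders, and the encoder/decoder and adversarial disentanglement components of the actual CPA model are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_drugs, n_cell_types = 16, 4, 3
drug_emb = rng.normal(size=(n_drugs, latent_dim))        # learned in practice
cell_type_emb = rng.normal(size=(n_cell_types, latent_dim))

def dose_response(dose, scale=1.0):
    # monotone saturating curve standing in for a learned dose scaler
    return np.log1p(scale * dose)

def compose(z_basal, drug_id, dose, cell_type_id):
    """Compose a perturbed latent state from additive parts."""
    return (z_basal
            + dose_response(dose) * drug_emb[drug_id]
            + cell_type_emb[cell_type_id])

z_basal = rng.normal(size=latent_dim)       # encoder output for one cell
z_perturbed = compose(z_basal, drug_id=2, dose=0.1, cell_type_id=1)
# a decoder would map z_perturbed back to predicted gene expression
```

The additive composition is what makes counterfactual predictions possible: swapping the drug or covariate embedding while keeping the basal state fixed yields a prediction for an unseen combination.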
Subjects
Computational Biology, Gene Expression Profiling, High-Throughput Screening Assays, Single-Cell Gene Expression Analysis
ABSTRACT
Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) increasingly leverages in silico methods, including machine learning, to obtain candidate structures for the structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on tandem mass spectral information from either spectral or structural databases; however, the vast majority of detected LC/HRMS features remain unannotated, constituting part of what we refer to as the unknown chemical space. Recently, the exploration of this chemical space has become accessible through generative models. Furthermore, the evaluation of candidate structures benefits from complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own sets of advantages and limitations, as we showcase with the example of the structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.
ABSTRACT
Dynamic models of ongoing BOLD fMRI brain dynamics and models of communication strategies have been two important approaches to understanding how brain network structure constrains function. However, dynamic models have yet to widely incorporate one of the most important insights from communication models: the brain may not use all of its connections in the same way or at the same time. Here we present a variation of a phase delayed Kuramoto coupled oscillator model that dynamically limits communication between nodes on each time step. An active subgraph of the empirically derived anatomical brain network is chosen in accordance with the local dynamic state on every time step, thus coupling dynamics and network structure in a novel way. We analyze this model with respect to its fit to empirical time-averaged functional connectivity, finding that, with the addition of only one parameter, it significantly outperforms standard Kuramoto models with phase delays. We also perform analyses on the novel time series of active edges it produces, demonstrating a slowly evolving topology moving through intermittent episodes of integration and segregation. We hope to demonstrate that the exploration of novel modeling mechanisms and the investigation of dynamics of networks in addition to dynamics on networks may advance our understanding of the relationship between brain structure and function.
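A minimal sketch of a phase-lagged (Sakaguchi-Kuramoto) simulation in which only a subset of anatomical edges is active on each time step; the edge-selection rule used here (keeping the most phase-aligned existing edges) is an illustrative stand-in for the paper's state-dependent rule, and all parameters are placeholders:

```python
import numpy as np

def simulate(A, omega, K=1.0, alpha=0.1, frac_active=0.5, dt=0.01, steps=2000, rng=None):
    """A: anatomical adjacency matrix; omega: natural frequencies per node."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    theta = rng.uniform(0, 2 * np.pi, n)
    active_history = []
    for _ in range(steps):
        dphi = theta[None, :] - theta[:, None]            # pairwise phase differences
        # choose an active subgraph: the most phase-aligned existing edges
        score = np.where(A > 0, np.cos(dphi), -np.inf)
        thresh = np.quantile(score[A > 0], 1 - frac_active)
        active = (score >= thresh) & (A > 0)
        coupling = (active * np.sin(dphi - alpha)).sum(axis=1)
        theta = theta + dt * (omega + (K / n) * coupling)  # Euler step with phase lag alpha
        active_history.append(active)
    return theta, active_history
```

The returned sequence of active-edge masks is the novel object of analysis described above: its time-varying topology can be probed for episodes of integration and segregation.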
Subjects
Brain, Neurological Models, Humans, Neural Pathways, Brain/diagnostic imaging, Brain Mapping/methods, Magnetic Resonance Imaging/methods, Nerve Net/diagnostic imaging
ABSTRACT
PURPOSE: We introduce a framework that enables efficient sampling from learned probability distributions for MRI reconstruction. METHOD: Samples are drawn from the posterior distribution given the measured k-space using the Markov chain Monte Carlo (MCMC) method, different from conventional deep learning-based MRI reconstruction techniques. In addition to the maximum a posteriori estimate for the image, which can be obtained by maximizing the log-likelihood indirectly or directly, the minimum mean square error estimate and uncertainty maps can also be computed from those drawn samples. The data-driven Markov chains are constructed with the score-based generative model learned from a given image database and are independent of the forward operator that is used to model the k-space measurement. RESULTS: We numerically investigate the framework from these perspectives: (1) the interpretation of the uncertainty of the image reconstructed from undersampled k-space; (2) the effect of the number of noise scales used to train the generative models; (3) using a burn-in phase in MCMC sampling to reduce computation; (4) the comparison to conventional ℓ1-wavelet regularized reconstruction; (5) the transferability of learned information; and (6) the comparison to the fastMRI challenge. CONCLUSION: A framework is described that connects the diffusion process and advanced generative models with Markov chains. We demonstrate its flexibility in terms of contrasts and sampling patterns using advanced generative priors and the benefits of also quantifying the uncertainty for every pixel.
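A hedged sketch of the sampling idea: annealed Langevin updates that combine a learned score network with a data-consistency gradient for undersampled Cartesian k-space. Here `score_net`, the noise schedule `sigmas`, and the step-size rule are assumptions or illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def data_consistency_grad(x, y, mask, sigma_y=0.01):
    # gradient of 0.5 * ||M F x - y||^2 / sigma_y^2 w.r.t. x, with F the 2D FFT
    resid = mask * (np.fft.fft2(x, norm="ortho") - y)
    return np.real(np.fft.ifft2(resid, norm="ortho")) / sigma_y**2

def posterior_sample(score_net, y, mask, shape, sigmas, n_steps=30, eps=1e-5, rng=None):
    """score_net(x, sigma) is assumed to return the learned prior score at noise level sigma."""
    rng = np.random.default_rng(rng)
    x = rng.normal(size=shape)
    for sigma in sigmas:                         # anneal from large to small noise
        step = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(n_steps):
            grad = score_net(x, sigma) - data_consistency_grad(x, y, mask)
            x = x + step * grad + np.sqrt(2 * step) * rng.normal(size=shape)
    return x
```

Running this sampler repeatedly yields multiple posterior samples, from which mean estimates and per-pixel uncertainty maps can be computed as described in the abstract.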
Subjects
Algorithms, Magnetic Resonance Imaging, Uncertainty, Bayes Theorem, Monte Carlo Method
ABSTRACT
Technologies for single-cell profiling of the immune system have enabled researchers to extract rich, interconnected networks of cellular abundance and phenotypic and functional cellular parameters. These studies can power machine learning approaches to understand the role of the immune system in various diseases. However, the performance of these approaches and the generalizability of the findings have been hindered by limited cohort sizes in translational studies, partially due to the logistical demands and costs associated with longitudinal data collection in sufficiently large patient cohorts. An evolving challenge is the requirement for ever-increasing cohort sizes as the dimensionality of datasets grows. We propose a deep learning model, derived from a novel pipeline of optimal temporal cell matching and overcomplete autoencoders, that uses data from a small subset of patients to learn to forecast an entire patient's immune response in a high-dimensional space from one timepoint to another. In our analysis of 1.08 million cells from patients pre- and post-surgical intervention, we demonstrate that the generated patient-specific data are qualitatively and quantitatively similar to real patient data in terms of fidelity, diversity, and usefulness.
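A hedged sketch of the two named ingredients: matching cells across timepoints with an assignment solver, and defining an overcomplete autoencoder-style forecaster (hidden layer wider than the input). The marker dimensionality, distance metric, and architecture are placeholders rather than the paper's pipeline:

```python
import torch.nn as nn
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_cells(cells_t1, cells_t2):
    """Return index pairs minimizing total marker-space distance between timepoints."""
    cost = cdist(cells_t1, cells_t2)              # (n1, n2) pairwise distances
    rows, cols = linear_sum_assignment(cost)       # optimal one-to-one matching
    return rows, cols

n_markers, hidden = 40, 256                        # hidden > input -> "overcomplete"
forecaster = nn.Sequential(
    nn.Linear(n_markers, hidden), nn.ReLU(),
    nn.Linear(hidden, n_markers),
)
# train the forecaster with, e.g., an MSE loss between
# forecaster(cells_t1[rows]) and cells_t2[cols] for the matched cell pairs
```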
Subjects
Machine Learning, Neural Networks (Computer), Humans, Proteomics
ABSTRACT
Although recent deep energy-based generative models (EBMs) have shown encouraging results in many image-generation tasks, how to exploit the self-adversarial cogitation of deep EBMs to boost the performance of magnetic resonance imaging (MRI) reconstruction remains an open question. With the successful application of deep learning to a wide range of MRI reconstruction problems, an emerging line of research formulates optimization-based reconstruction in the space of a generative model. Leveraging this, we introduce a novel regularization strategy in this article that takes advantage of the self-adversarial cogitation of a deep energy-based model. More precisely, we alternately train a more powerful energy-based model by maximum likelihood estimation to obtain deep energy-based information, represented as a prior image, while implicit inference with Langevin dynamics carries out the reconstruction. In contrast to other generative models used for reconstruction, the proposed method uses the deep energy-based information as an image prior during reconstruction to improve image quality. Experimental results show that the proposed technique achieves high reconstruction accuracy that is competitive with state-of-the-art methods and does not suffer from mode collapse. Algorithmically, an iterative approach is presented that strengthens EBM training with the gradient of the energy network. The robustness and reproducibility of the algorithm were also validated experimentally. More importantly, the proposed reconstruction framework can be generalized to most MRI reconstruction scenarios.
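A hedged sketch of the underlying mechanism, maximum-likelihood training of an energy network with Langevin-sampled negatives; this illustrates the generic EBM training loop rather than the paper's reconstruction pipeline, and the tiny MLP, image size, and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn

# toy energy network for 32x32 single-channel images (placeholder architecture)
energy_net = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.SiLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(energy_net.parameters(), lr=1e-4)

def langevin_sample(x, n_steps=20, step=1e-2):
    """Draw approximate samples from p(x) ∝ exp(-E(x)) with Langevin dynamics."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        grad = torch.autograd.grad(energy_net(x).sum(), x)[0]
        x = (x - step * grad + (2 * step) ** 0.5 * torch.randn_like(x)).detach().requires_grad_(True)
    return x.detach()

def train_step(real_images):                        # real_images: (B, 1, 32, 32)
    fake = langevin_sample(torch.randn_like(real_images))
    # maximum-likelihood surrogate: lower energy of data, raise it for negatives
    loss = energy_net(real_images).mean() - energy_net(fake).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```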
Subjects
Algorithms, Magnetic Resonance Imaging, Reproducibility of Results, Magnetic Resonance Imaging/methods, Computer-Assisted Image Processing/methods
ABSTRACT
Mimicking bioactive conformations of peptide segments involved in the formation of protein-protein interfaces with small molecules is thought to represent a promising strategy for the design of protein-protein interaction (PPI) inhibitors. For compound design, the use of three-dimensional (3D) scaffolds rich in sp3-centers makes it possible to precisely mimic bioactive peptide conformations. Herein, we introduce DeepCubist, a molecular generator for designing peptidomimetics based on 3D scaffolds. Firstly, enumerated 3D scaffolds are superposed on a target peptide conformation to identify a preferred template structure for designing peptidomimetics. Secondly, heteroatoms and unsaturated bonds are introduced into the template via a deep generative model to produce candidate compounds. DeepCubist was applied to design peptidomimetics of exemplary peptide turn, helix, and loop structures in pharmaceutical targets engaging in PPIs.
Subjects
Peptidomimetics, Peptidomimetics/pharmacology, Peptides/chemistry, Proteins/chemistry
ABSTRACT
Distinct scientific theories can make similar predictions. To adjudicate between theories, we must design experiments for which the theories make distinct predictions. Here we consider the problem of comparing deep neural networks as models of human visual recognition. To efficiently compare models' ability to predict human responses, we synthesize controversial stimuli: images for which different models produce distinct responses. We applied this approach to two visual recognition tasks, handwritten digits (MNIST) and objects in small natural images (CIFAR-10). For each task, we synthesized controversial stimuli to maximize the disagreement among models which employed different architectures and recognition algorithms. Human subjects viewed hundreds of these stimuli, as well as natural examples, and judged the probability of presence of each digit/object category in each image. We quantified how accurately each model predicted the human judgments. The best-performing models were a generative analysis-by-synthesis model (based on variational autoencoders) for MNIST and a hybrid discriminative-generative joint energy model for CIFAR-10. These deep neural networks (DNNs), which model the distribution of images, performed better than purely discriminative DNNs, which learn only to map images to labels. None of the candidate models fully explained the human responses. Controversial stimuli generalize the concept of adversarial examples, obviating the need to assume a ground-truth model. Unlike natural images, controversial stimuli are not constrained to the stimulus distribution models are trained on, thus providing severe out-of-distribution tests that reveal the models' inductive biases. Controversial stimuli therefore provide powerful probes of discrepancies between models and human perception.
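A minimal sketch of controversial-stimulus synthesis, assuming two pretrained classifiers: the image is optimized so that one model favors one class while the other favors a different class. The specific disagreement objective, optimizer settings, and image shape below are illustrative:

```python
import torch
import torch.nn.functional as F

def synthesize_controversial(model_a, model_b, class_a, class_b,
                             shape=(1, 1, 28, 28), steps=300, lr=0.05):
    """Optimize an image so model_a prefers class_a while model_b prefers class_b."""
    x = torch.rand(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        log_pa = F.log_softmax(model_a(x), dim=1)[0, class_a]
        log_pb = F.log_softmax(model_b(x), dim=1)[0, class_b]
        loss = -(log_pa + log_pb)          # push the two models toward different labels
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)             # keep pixels in a valid range
    return x.detach()
```

Because the optimization is over the image itself rather than the models, the resulting stimuli need not lie on the training distribution, which is what gives them their out-of-distribution diagnostic value.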
Subjects
Cognition/physiology, Deep Learning, Neurological Models, Automated Pattern Recognition/methods, Physiological Pattern Recognition/physiology, Adult, Female, Humans, Male, Normal Distribution
ABSTRACT
Modeling and representing 3D shapes of the human body and face is a prominent field due to its applications in the healthcare, clothing, and movie industries. In our work, we tackled the problem of 3D face and body synthesis by reducing 3D meshes to 2D image representations. We showed that the face can naturally be modeled on a 2D grid. At the same time, for more challenging 3D body geometries, we proposed a novel non-bijective 3D-2D conversion method representing the 3D body mesh as a plurality of rendered projections on the 2D grid. Then, we trained a state-of-the-art vector-quantized variational autoencoder (VQ-VAE-2) to learn a latent representation of 2D images and fit a PixelSNAIL autoregressive model to sample novel synthetic meshes. We evaluated our method against a classical one based on principal component analysis (PCA) by sampling from the empirical cumulative distribution of the PCA scores. We used the empirical distributions of two commonly used metrics, specificity and diversity, to quantitatively demonstrate that the synthetic faces generated with our method are statistically closer to real faces than the PCA ones. Our experiments on 3D body geometry show promising results, although further research is needed to match the test-set statistics.
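A sketch of the PCA baseline described above: fit PCA to flattened, registered training meshes, then synthesize new shapes by inverse-transform sampling each PCA score from its empirical CDF (component-wise, so correlations between scores are ignored). Mesh dimensions, counts, and names are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_and_sample(train_vertices, n_components=50, n_samples=10, rng=None):
    """train_vertices: (n_meshes, n_vertices, 3) array of registered meshes."""
    rng = np.random.default_rng(rng)
    X = train_vertices.reshape(len(train_vertices), -1)
    pca = PCA(n_components=n_components).fit(X)
    scores = pca.transform(X)                             # (n_meshes, n_components)
    sampled = np.empty((n_samples, n_components))
    for c in range(n_components):
        sorted_scores = np.sort(scores[:, c])
        u = rng.uniform(0, 1, n_samples)
        # inverse empirical CDF via interpolation of the order statistics
        sampled[:, c] = np.interp(u, np.linspace(0, 1, len(sorted_scores)), sorted_scores)
    return pca.inverse_transform(sampled).reshape(n_samples, -1, 3)
```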
ABSTRACT
PURPOSE: Inter-scan motion is a substantial source of error in R1 estimation methods based on multiple volumes, for example, variable flip angle (VFA), and can be expected to increase at 7T where B1 fields are more inhomogeneous. The established correction scheme does not translate to 7T since it requires a body coil reference. Here we introduce two alternatives that outperform the established method. Since they compute relative sensitivities they do not require body coil images. THEORY: The proposed methods use coil-combined magnitude images to obtain the relative coil sensitivities. The first method efficiently computes the relative sensitivities via a simple ratio; the second by fitting a more sophisticated generative model. METHODS: R1 maps were computed using the VFA approach. Multiple datasets were acquired at 3T and 7T, with and without motion between the acquisition of the VFA volumes. R1 maps were constructed without correction, with the proposed corrections, and (at 3T) with the previously established correction scheme. The effect of the greater inhomogeneity in the transmit field at 7T was also explored by acquiring B1+ maps at each position. RESULTS: At 3T, the proposed methods outperform the baseline method. Inter-scan motion artifacts were also reduced at 7T. However, at 7T reproducibility only converged on that of the no motion condition if position-specific transmit field effects were also incorporated. CONCLUSION: The proposed methods simplify inter-scan motion correction of R1 maps and are applicable at both 3T and 7T, where a body coil is typically not available. The open-source code for all methods is made publicly available.
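A minimal sketch of the ratio-based first method, assuming coil-combined magnitude calibration images are available at both head positions; the Gaussian smoothing and the simple division are illustrative choices, and the second, generative-model-based method is not shown:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def relative_sensitivity(calib_pos1, calib_pos2, sigma_vox=5.0, eps=1e-6):
    """Smooth ratio of coil-combined magnitude images acquired at two positions."""
    return gaussian_filter(calib_pos1, sigma_vox) / (gaussian_filter(calib_pos2, sigma_vox) + eps)

def correct_second_volume(vol_fa2, rel_sens):
    # rescale the second flip-angle volume onto the receive sensitivity of the first,
    # so that the subsequent VFA R1 fit is not biased by inter-scan motion
    return vol_fa2 * rel_sens
```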
Subjects
Artifacts, Magnetic Resonance Imaging, Magnetic Resonance Imaging/methods, Motion (Physics), Radionuclide Imaging, Reproducibility of Results
ABSTRACT
Exploring the origin of multi-target activity of small molecules and designing new multi-target compounds are highly topical issues in pharmaceutical research. We have investigated the ability of a generative neural network to create multi-target compounds. Data sets of experimentally confirmed multi-target, single-target, and consistently inactive compounds were extracted from public screening data considering positive and negative assay results. These data sets were used to fine-tune the REINVENT generative model via transfer learning to systematically recognize multi-target compounds, distinguish them from single-target or inactive compounds, and construct new multi-target compounds. During fine-tuning, the model showed a clear tendency to increasingly generate multi-target compounds and structural analogs. Our findings indicate that generative models can be adopted for de novo multi-target compound design.
Subjects
Drug Design, Neural Networks (Computer)
ABSTRACT
Deep machine learning is expanding the conceptual framework and capacity of computational compound design, enabling new applications through generative modeling. We have explored the systematic design of covalent protein kinase inhibitors by learning from kinome-relevant chemical space, followed by focusing on an exemplary kinase of interest. Covalent inhibitors are experiencing a renaissance in drug discovery, especially for targeting protein kinases; however, the computational design of this class of inhibitors has thus far received little attention. To this end, we have devised a computational approach combining fragment-based design and deep generative modeling augmented by three-dimensional pharmacophore screening. This approach is thought to be particularly relevant for medicinal chemistry applications because it combines knowledge-based elements with deep learning and is chemically intuitive. As an exemplary application, we report, for Bruton's tyrosine kinase (BTK), a major drug target for the treatment of inflammatory diseases and leukemia, the generation of novel candidate inhibitors with a specific chemically reactive group for covalent modification, requiring only little target-specific compound information to guide the design efforts. Newly generated compounds include known inhibitors, characteristic substructures, and many novel candidates, lending credence to the computational approach, which is readily applicable to other targets.
Subjects
Protein Kinase Inhibitors
ABSTRACT
To bridge the gap between preclinical cellular models of disease and in vivo imaging of human cognitive network dynamics, there is a pressing need for informative biophysical models. Here we assess dynamic causal models (DCM) of cortical network responses, as generative models of magnetoencephalographic observations during a roving auditory oddball paradigm in healthy adults. This paradigm induces robust perturbations that permeate frontotemporal networks, including an evoked 'mismatch negativity' response and transiently induced oscillations. Here, we probe GABAergic influences in the networks using double-blind placebo-controlled randomized-crossover administration of the GABA reuptake inhibitor, tiagabine (oral, 10 mg), in healthy older adults. We demonstrate the facility of conductance-based neural mass mean-field models, incorporating local synaptic connectivity, to investigate laminar-specific and GABAergic mechanisms of the auditory response. The neuronal model accurately recapitulated the observed magnetoencephalographic data. Using parametric empirical Bayes for optimal model inversion across both drug sessions, we identify the effect of tiagabine on GABAergic modulation of deep pyramidal and interneuronal cell populations. We found a transition of the main GABAergic drug effects from auditory cortex in standard trials to prefrontal cortex in deviant trials. The successful integration of pharmaco-magnetoencephalography with dynamic causal models of frontotemporal networks provides a potential platform on which to evaluate the effects of disease and pharmacological interventions. SIGNIFICANCE STATEMENT: Understanding human brain function and developing new treatments require good models of brain function. We tested a detailed generative model of cortical microcircuits that accurately reproduced human magnetoencephalography, to quantify network dynamics and connectivity in frontotemporal cortex. This approach identified the effect of a test drug (the GABA reuptake inhibitor tiagabine) on neuronal function (GABAergic dynamics), opening the way for psychopharmacological studies in health and disease with the mechanistic precision afforded by generative models of the brain.
Subjects
Auditory Cortex/diagnostic imaging, Frontal Lobe/diagnostic imaging, Neurological Models, Nerve Net/diagnostic imaging, Neurons/physiology, Aged, Auditory Cortex/drug effects, Cross-Over Studies, Double-Blind Method, Female, Frontal Lobe/drug effects, GABA Uptake Inhibitors/pharmacology, Humans, Magnetoencephalography/methods, Male, Middle Aged, Nerve Net/drug effects, Neurons/drug effects, Tiagabine/pharmacology
ABSTRACT
Aspirin is considered a potential confound for functional magnetic resonance imaging (fMRI) studies. This is because aspirin affects the synthesis of prostaglandin, a vasoactive mediator centrally involved in neurovascular coupling, a process underlying blood oxygenation level-dependent (BOLD) responses. Aspirin-induced changes in BOLD signal are a potential confound for fMRI studies of at-risk individuals or patients (e.g., with cardiovascular conditions or stroke) who receive low-dose aspirin prophylactically and are compared to healthy controls without aspirin. To examine the severity of this potential confound, we combined high-field (7 Tesla) MRI during a simple hand movement task with a biophysically informed hemodynamic model. We compared elderly individuals receiving aspirin for primary or secondary prophylactic purposes with age-matched volunteers without aspirin medication, testing for putative differences in BOLD responses. Specifically, we fitted hemodynamic models to BOLD responses from 14 regions activated by the task and examined whether model parameter estimates were significantly altered by aspirin. While our analyses indicate that hemodynamics differed across regions, consistent with the known regional variability of BOLD responses, we found neither a significant main effect of aspirin (i.e., an average effect across brain regions) nor the expected drug × region interaction. While our sample size is not sufficiently large to rule out small-to-medium global effects of aspirin, we had adequate statistical power for detecting the expected interaction. Altogether, our analysis suggests that patients with cardiovascular risk receiving low-dose aspirin for primary or secondary prophylactic purposes do not show strongly altered BOLD signals when compared to healthy controls without aspirin.
Subjects
Aspirin, Cardiovascular Diseases, Aged, Brain/diagnostic imaging, Brain Mapping, Heart Disease Risk Factors, Hemodynamics, Humans, Magnetic Resonance Imaging, Oxygen, Risk Factors
ABSTRACT
The structure-activity relationship (SAR) matrix (SARM) methodology and data structure were originally developed to extract structurally related compound series from data sets of any composition, organize these series in matrices reminiscent of R-group tables, and visualize SAR patterns. The SARM approach combines the identification of structural relationships between series of active compounds with analog design, which is facilitated by systematically exploring combinations of core structures and substituents that have not been synthesized. The SARM methodology was extended through the introduction of DeepSARM, which added deep learning and generative modeling to target-based analog design by taking compound information from related targets into account to further increase structural novelty. Herein, we present the foundations of the SARM methodology and discuss how DeepSARM modeling can be adapted for the design of compounds with dual-target activity. Generating dual-target compounds represents an equally attractive and challenging task for polypharmacology-oriented drug discovery. The DeepSARM-based approach is illustrated using a computational proof-of-concept application focusing on the design of candidate inhibitors for two prominent anti-cancer targets.
Subjects
Drug Design, Drug Discovery, Small Molecule Libraries/chemistry, Humans, Ligands, Molecular Models, Polypharmacology, Small Molecule Libraries/pharmacology, Structure-Activity Relationship
ABSTRACT
Despite advances in our understanding of the geographic and temporal scope of the Paleolithic record, we know remarkably little about the evolutionary and ecological consequences of changes in human behavior. Recent inquiries suggest that human evolution reflects a long history of interconnections between the behavior of humans and their surrounding ecosystems (e.g., niche construction). Developing expectations to identify such phenomena is remarkably difficult because it requires understanding the multi-generational impacts of changes in behavior. These long-term dynamics require insight into emergent phenomena that alter selective pressures over time periods that cannot be observed directly and that are not intuitive from observations made at ethnographic time scales. Generative models show promise for probing these potentially unexpected consequences of human-environment interaction. Changes in the uses of landscapes may have long-term implications for the environments that hominins occupied. We explore other potential proxies of behavior and examine how modeling may provide expectations for a variety of phenomena.
Subjects
Biological Evolution, Ecosystem, Animals, Archaeology, Diet, Hominidae/physiology, Humans, South Africa
ABSTRACT
Archetypes represent extreme manifestations of a population with respect to specific characteristic traits or features. In linear feature space, archetypes approximate the data convex hull, allowing all data points to be expressed as convex mixtures of archetypes. As mixing of archetypes is performed directly on the input data, linear archetypal analysis requires additivity of the input, which is a strong assumption unlikely to hold, e.g., in the case of image data. To address this problem, we propose learning an appropriate latent feature space while simultaneously identifying suitable archetypes. We thus introduce a generative formulation of the linear archetype model, parameterized by neural networks. By introducing the distance-dependent archetype loss, the linear archetype model can be integrated into the latent space of a deep variational information bottleneck, and an optimal representation, together with the archetypes, can be learned end-to-end. Moreover, the information bottleneck framework allows for a natural incorporation of arbitrarily complex side information during training. As a consequence, learned archetypes become easily interpretable, as they derive their meaning directly from the included side information. Applicability of the proposed method is demonstrated by exploring archetypes of female facial expressions while using multi-rater-based emotion scores of these expressions as side information. A second application illustrates the exploration of the chemical space of small organic molecules. By using different kinds of side information, we demonstrate how identified archetypes, along with their interpretation, largely depend on the side information provided. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11263-020-01390-3.
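A sketch of the linear archetype model that is lifted into the learned latent space: archetypes are fitted as convex combinations of latent codes, and each code is reconstructed as a convex combination of archetypes. The softmax relaxation, step counts, and the assumption that the latent codes Z come from a trained encoder are illustrative choices rather than the paper's exact formulation:

```python
import torch

def fit_archetypes(Z, n_archetypes=5, steps=2000, lr=0.05):
    """Z: (n, d) tensor of latent codes, assumed to come from a trained encoder."""
    n, _ = Z.shape
    A_logits = torch.zeros(n, n_archetypes, requires_grad=True)   # data -> archetype weights
    B_logits = torch.zeros(n_archetypes, n, requires_grad=True)   # archetype -> data weights
    opt = torch.optim.Adam([A_logits, B_logits], lr=lr)
    for _ in range(steps):
        A = torch.softmax(A_logits, dim=1)        # rows constrained to the probability simplex
        B = torch.softmax(B_logits, dim=1)
        archetypes = B @ Z                        # archetypes as convex mixtures of data points
        loss = ((Z - A @ archetypes) ** 2).mean() # reconstruct data as mixtures of archetypes
        opt.zero_grad(); loss.backward(); opt.step()
    return (torch.softmax(B_logits, dim=1) @ Z).detach()
```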