ABSTRACT
BACKGROUND: Analysis of time-resolved postprandial metabolomics data can improve the understanding of metabolic mechanisms, potentially revealing biomarkers for early diagnosis of metabolic diseases and advancing precision nutrition and medicine. Postprandial metabolomics measurements at several time points from multiple subjects can be arranged as a subjects by metabolites by time points array. Traditional analysis methods are limited in terms of revealing subject groups, related metabolites, and temporal patterns simultaneously from such three-way data. RESULTS: We introduce an unsupervised multiway analysis approach based on the CANDECOMP/PARAFAC (CP) model for improved analysis of postprandial metabolomics data guided by a simulation study. Because of the lack of ground truth in real data, we generate simulated data using a comprehensive human metabolic model. This allows us to assess the performance of CP models in terms of revealing subject groups and underlying metabolic processes. We study three analysis approaches: analysis of fasting-state data using principal component analysis, T0-corrected data (i.e., data corrected by subtracting fasting-state data) using a CP model and full-dynamic (i.e., full postprandial) data using CP. Through extensive simulations, we demonstrate that CP models capture meaningful and stable patterns from simulated meal challenge data, revealing underlying mechanisms and differences between diseased versus healthy groups. CONCLUSIONS: Our experiments show that it is crucial to analyze both fasting-state and T0-corrected data for understanding metabolic differences among subject groups. Depending on the nature of the subject group structure, the best group separation may be achieved by CP models of T0-corrected or full-dynamic data. This study introduces an improved analysis approach for postprandial metabolomics data while also shedding light on the debate about correcting baseline values in longitudinal data analysis.
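The data arrangement and the T0 correction described in this abstract can be illustrated with a minimal NumPy sketch. The array sizes and random values are hypothetical placeholders, not the simulated data from the study. Dropping the all-zero fasting slice after the subtraction is an assumption made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 subjects, 4 metabolites, 3 time points (index 0 = fasting).
n_subjects, n_metabolites, n_times = 5, 4, 3

# Full-dynamic data: subjects by metabolites by time points.
full_dynamic = rng.random((n_subjects, n_metabolites, n_times))

# Fasting-state data: the subjects-by-metabolites slice at the first time point.
fasting = full_dynamic[:, :, 0]

# T0-corrected data: subtract the fasting slice from every postprandial slice.
# Assumption: the fasting slice itself (all zeros after subtraction) is dropped.
t0_corrected = full_dynamic[:, :, 1:] - fasting[:, :, np.newaxis]
```

In this sketch the fasting matrix would go to PCA, while `t0_corrected` and `full_dynamic` would each be passed to a CP model.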
Subjects
Medicine, Metabolomics, Humans, Computer Simulation, Data Analysis, Health Status
ABSTRACT
INTRODUCTION: Longitudinal metabolomics data from a meal challenge test contains both fasting and dynamic signals that may be related to metabolic health and diseases. Recent work has explored the multiway structure of time-resolved metabolomics data by arranging it as a three-way array with modes: subjects, metabolites, and time. The analysis of such dynamic data (where the fasting data is subtracted from postprandial states) reveals dynamic markers of various phenotypes, and differences between fasting and dynamic states. However, there is still limited success in terms of extracting static and dynamic biomarkers for the same subject stratifications. OBJECTIVES: Through joint analysis of fasting and dynamic metabolomics data, our goal is to capture static and dynamic biomarkers of a phenotype for the same subject stratifications, providing a complete picture that will be more effective for precision health. METHODS: We jointly analyze fasting and dynamic metabolomics data collected during a meal challenge test from the COPSAC 2000 cohort using coupled matrix and tensor factorizations (CMTF), where the dynamic data (subjects by metabolites by time) is coupled with the fasting data (subjects by metabolites) in the subjects mode. RESULTS: The proposed data fusion approach extracts shared subject stratifications in terms of BMI (body mass index) from fasting and dynamic signals as well as the static and dynamic metabolic biomarker patterns corresponding to those stratifications. Specifically, we observe a subject stratification showing a positive association with all fasting VLDLs and higher BMI. For the same subject stratification, a subset of dynamic VLDLs (mainly the smaller sizes) correlates negatively with higher BMI. Higher correlations of the subject quantifications with the phenotype of interest are observed using such a data fusion approach compared to individual analyses of the fasting and postprandial states.
CONCLUSION: The CMTF-based approach provides a complete picture of static and dynamic biomarkers for the same subject stratifications when markers are present in both fasting and dynamic states.
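The coupling in the subjects mode can be sketched as a single objective: a CP model of the dynamic tensor and a matrix factorization of the fasting data share the subject factor matrix. The function name `cmtf_loss`, the factor names, and the toy sizes are assumptions for illustration. The plain squared-error form shown here is the textbook CMTF objective and is not necessarily the exact formulation or weighting used in the study.

```python
import numpy as np

def cmtf_loss(X, Y, A, B, C, V):
    """Squared-error CMTF objective with factor A shared in the subjects mode."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)  # CP model of the dynamic tensor
    Y_hat = A @ V.T                              # factorization of the fasting matrix
    return np.sum((X - X_hat) ** 2) + np.sum((Y - Y_hat) ** 2)

rng = np.random.default_rng(0)
n_subjects, n_metabolites, n_times, rank = 6, 5, 4, 2  # hypothetical sizes

A = rng.standard_normal((n_subjects, rank))     # shared subject patterns
B = rng.standard_normal((n_metabolites, rank))  # dynamic metabolite patterns
C = rng.standard_normal((n_times, rank))        # temporal profiles
V = rng.standard_normal((n_metabolites, rank))  # fasting metabolite patterns

X = np.einsum("ir,jr,kr->ijk", A, B, C)  # dynamic data: subjects x metabolites x time
Y = A @ V.T                              # fasting data: subjects x metabolites

loss_exact = cmtf_loss(X, Y, A, B, C, V)
loss_perturbed = cmtf_loss(X, Y, A + 0.1, B, C, V)
```

Because `A` appears in both terms, any estimated subject stratification has to explain the fasting and the dynamic data at the same time, while `B` and `V` remain free to differ.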
Subjects
Biomarkers, Fasting, Metabolomics, Postprandial Period, Humans, Biomarkers/blood, Biomarkers/metabolism, Metabolomics/methods, Fasting/metabolism, Male, Female, Adult, Middle Aged
ABSTRACT
INTRODUCTION: Analysis of time-resolved postprandial metabolomics data can improve our understanding of the human metabolism by revealing similarities and differences in postprandial responses of individuals. Traditional data analysis methods often rely on data summaries or univariate approaches focusing on one metabolite at a time. OBJECTIVES: Our goal is to provide a comprehensive picture in terms of the changes in the human metabolism in response to a meal challenge test, by revealing static and dynamic markers of phenotypes, i.e., subject stratifications, related clusters of metabolites, and their temporal profiles. METHODS: We analyze Nuclear Magnetic Resonance (NMR) spectroscopy measurements of plasma samples collected during a meal challenge test from 299 individuals from the COPSAC2000 cohort using a Nightingale NMR panel at the fasting and postprandial states (15, 30, 60, 90, 120, 150, 240 min). We investigate the postprandial dynamics of the metabolism as reflected in the dynamic behaviour of the measured metabolites. The data is arranged as a three-way array: subjects by metabolites by time. We analyze the fasting state data to reveal static patterns of subject group differences using principal component analysis (PCA), and fasting state-corrected postprandial data using the CANDECOMP/PARAFAC (CP) tensor factorization to reveal dynamic markers of group differences. RESULTS: Our analysis reveals dynamic markers consisting of certain metabolite groups and their temporal profiles showing differences among males according to their body mass index (BMI) in response to the meal challenge. We also show that certain lipoproteins relate to the group difference differently in the fasting vs. dynamic state. Furthermore, while similar dynamic patterns are observed in males and females, the BMI-related group difference is observed only in males in the dynamic state. 
CONCLUSION: The CP model is an effective approach to analyze time-resolved postprandial metabolomics data, and provides a compact but comprehensive summary of the postprandial data, revealing replicable and interpretable dynamic markers that are crucial to advancing our understanding of changes in the metabolism in response to a meal challenge.
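For readers unfamiliar with the CP model, the following is a minimal alternating least squares (ALS) sketch for a three-way array. It is a generic textbook algorithm, not the fitting procedure used in the study. The function names `khatri_rao` and `cp_als`, the toy sizes, and the noise level are all assumptions for illustration.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: row (i * V.shape[0] + j) holds U[i] * V[j]."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

def cp_als(X, rank, n_iter=50, seed=0):
    """Fit a CP model to a three-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Unfoldings whose column order matches the Khatri-Rao products below.
    X1 = X.reshape(I, J * K)                     # columns indexed by (j, k)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # columns indexed by (i, k)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # columns indexed by (i, j)
    errors = []
    for _ in range(n_iter):
        # Each update is the least-squares solution with the other two factors fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
    return A, B, C, errors

# Toy subjects x metabolites x time array: two components plus a little noise.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((8, 2)), rng.random((6, 2)), rng.random((5, 2))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0) + 0.01 * rng.standard_normal((8, 6, 5))

A, B, C, errors = cp_als(X, rank=2)
```

Each column of `A`, `B`, and `C` plays the role of a subject stratification, a metabolite pattern, and a temporal profile, respectively. The recovered columns are only defined up to scaling and permutation.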
Subjects
Metabolomics, Postprandial Period, Humans, Postprandial Period/physiology, Male, Female, Metabolomics/methods, Adult, Fasting/metabolism, Principal Component Analysis, Magnetic Resonance Spectroscopy/methods, Middle Aged, Data Analysis, Metabolome/physiology
ABSTRACT
BACKGROUND: Analysis of dynamic metabolomics data holds the promise to improve our understanding of underlying mechanisms in metabolism. For example, it may detect changes in metabolism due to the onset of a disease. Dynamic or time-resolved metabolomics data can be arranged as a three-way array with entries organized according to a subjects mode, a metabolites mode and a time mode. While such time-evolving multiway data sets are increasingly collected, revealing the underlying mechanisms and their dynamics from such data remains challenging. For such data, one of the complexities is the presence of a superposition of several sources of variation: induced variation (due to experimental conditions or inborn errors), individual variation, and measurement error. Multiway data analysis (also known as tensor factorizations) has been successfully used in data mining to find the underlying patterns in multiway data. To explore the performance of multiway data analysis methods in terms of revealing the underlying mechanisms in dynamic metabolomics data, simulated data with known ground truth can be studied. RESULTS: We focus on simulated data arising from different dynamic models of increasing complexity, i.e., a simple linear system, a yeast glycolysis model, and a human cholesterol model. We generate data with induced variation as well as individual variation. Systematic experiments are performed to demonstrate the advantages and limitations of multiway data analysis in analyzing such dynamic metabolomics data and their capacity to disentangle the different sources of variation. We use simulations because knowing the ground truth makes it possible to assess the capability of multiway data analysis methods.
CONCLUSION: Our numerical experiments demonstrate that despite the increasing complexity of the studied dynamic metabolic models, the tensor factorization methods CANDECOMP/PARAFAC (CP) and Parallel Profiles with Linear Dependences (Paralind) can disentangle the sources of variation and thereby reveal the underlying mechanisms and their dynamics.
Subjects
Metabolomics, Computer Simulation, Humans
ABSTRACT
Investigating the biogeochemistry of dissolved organic matter (DOM) requires the synthesis of data from several complementary analytical techniques. The traditional approach to data synthesis is to search for correlations between measurements made on the same sample using different instruments. In contrast, data fusion simultaneously decomposes data from multiple instruments into the underlying shared and unshared components. Here, Advanced Coupled Matrix and Tensor Factorization (ACMTF) was used to identify the molecular fingerprint of DOM fluorescence fractions in Arctic fjords. ACMTF explained 99.84% of the variability with six fully shared components. Individual molecular formulas were linked to multiple fluorescence components and vice versa. Molecular fingerprints differed in diversity and oceanographic patterns, suggesting a link to the biogeochemical sources and diagenetic state of DOM. The fingerprints obtained through ACMTF were more specific compared to traditional correlation analysis and yielded greater compositional insight. Multivariate data fusion aligns extremely complex, heterogeneous DOM data sets and thus facilitates a more holistic understanding of DOM biogeochemistry.
ABSTRACT
Data fusion, that is, extracting information through the fusion of complementary data sets, is a topic of great interest in metabolomics because analytical platforms such as liquid chromatography-mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) spectroscopy commonly used for chemical profiling of biofluids provide complementary information. In this study, with a goal of forecasting acute coronary syndrome (ACS), breast cancer, and colon cancer, we jointly analyzed LC-MS, NMR measurements of plasma samples, and the metadata corresponding to the lifestyle of participants. We used supervised data fusion based on multiple kernel learning and exploited the linearity of the models to identify significant metabolites/features for the separation of healthy referents and the cases developing a disease. We demonstrated that (i) fusing LC-MS, NMR, and metadata provided better separation of ACS cases and referents compared with individual data sets, (ii) NMR data performed the best in terms of forecasting breast cancer, while fusion degraded the performance, and (iii) neither the individual data sets nor their fusion performed well for colon cancer. Furthermore, we showed the strengths and limitations of the fusion models by discussing their performance in terms of capturing known biomarkers for smoking and coffee. While fusion may improve performance in terms of separating certain conditions by jointly analyzing metabolomics and metadata sets, it is not necessarily always the best approach as in the case of breast cancer.
Subjects
Acute Coronary Syndrome/diagnosis, Breast Neoplasms/diagnosis, Colonic Neoplasms/diagnosis, Metabolome, Statistical Models, Acute Coronary Syndrome/blood, Biomarkers/blood, Breast Neoplasms/blood, Caffeine/adverse effects, Liquid Chromatography, Chronic Disease, Coffee/chemistry, Colonic Neoplasms/blood, Female, Humans, Magnetic Resonance Spectroscopy, Male, Mass Spectrometry, Prognosis, Risk Factors, Smoking/physiopathology
ABSTRACT
A previous randomized, controlled dietary intervention study showed that the New Nordic Diet (NND) stimulates weight loss and lowers systolic and diastolic blood pressure in obese Danish women and men. This work demonstrates long-term metabolic effects of the NND as compared with an Average Danish Diet (ADD) in blood plasma and reveals associations between metabolic changes and health beneficial effects of the NND including weight loss. A total of 145 individuals completed the intervention and blood samples were taken along with clinical examinations before the intervention started (week 0) and after 12 and 26 weeks. The plasma metabolome was measured using GC-MS, and the final metabolite table contained 144 variables. Significant and novel metabolic effects of the diet, resulting weight loss, gender, and intervention study season were revealed using PLS-DA and ASCA. Several metabolites reflecting specific differences in the diets, especially intake of plant foods and seafood, and in energy metabolism related to ketone bodies and gluconeogenesis formed the predominant metabolite pattern discriminating the intervention groups. Among NND subjects, higher levels of vaccenic acid and 3-hydroxybutanoic acid were related to a higher weight loss, while higher concentrations of salicylic, lactic, and N-aspartic acids and 1,5-anhydro-d-sorbitol were related to a lower weight loss. Specific gender and seasonal differences were also observed. The study strongly indicates that healthy diets high in fish, vegetables, fruit, and whole grain facilitate weight loss and improve insulin sensitivity by increasing ketosis and gluconeogenesis in the fasting state.
Subjects
Diet/methods, Feeding Behavior/physiology, Metabolomics/methods, Obesity/diet therapy, Adult, Animals, Denmark, Diet/standards, Edible Grain, Female, Fruit, Gas Chromatography-Mass Spectrometry, Humans, Insulin Resistance, Longitudinal Studies, Male, Metabolome, Middle Aged, Plasma/chemistry, Plasma/metabolism, Seafood, Seasons, Sex Factors, Vegetables, Weight Loss, Young Adult
ABSTRACT
BACKGROUND: Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures, which are, otherwise, difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging since data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) have both shared and unshared components. In order to address these challenges, in this paper, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors. RESULTS: While the traditional formulation of coupled matrix and tensor factorizations modeling only shared factors fails to capture the underlying structures in the presence of both shared and unshared factors, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data. 
CONCLUSIONS: We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data.
Subjects
Chemistry/methods, Statistics as Topic/methods, Liquid Chromatography, Magnetic Resonance Spectroscopy, Mass Spectrometry
ABSTRACT
Analysis of time-evolving data is crucial to understand the functioning of dynamic systems such as the brain. For instance, analysis of functional magnetic resonance imaging (fMRI) data collected during a task may reveal spatial regions of interest, and how they evolve during the task. However, capturing underlying spatial patterns as well as their change in time is challenging. The traditional approach in fMRI data analysis is to assume that underlying spatial regions of interest are static. In this article, using fractional amplitude of low-frequency fluctuations (fALFF) as an effective way to summarize the variability in fMRI data collected during a task, we arrange time-evolving fMRI data as a subjects by voxels by time windows tensor, and analyze the tensor using a tensor factorization-based approach called a PARAFAC2 model to reveal spatial dynamics. The PARAFAC2 model jointly analyzes data from multiple time windows revealing subject-mode patterns, evolving spatial regions (also referred to as networks) and temporal patterns. We compare the PARAFAC2 model with matrix factorization-based approaches relying on independent components, namely, joint independent component analysis (ICA) and independent vector analysis (IVA), commonly used in neuroimaging data analysis. We assess the performance of the methods in terms of capturing evolving networks through extensive numerical experiments demonstrating their modeling assumptions. In particular, we show that (i) PARAFAC2 provides a compact representation in all modes, i.e., subjects, time, and voxels, revealing temporal patterns as well as evolving spatial networks, (ii) joint ICA is as effective as PARAFAC2 in terms of revealing evolving networks but does not reveal temporal patterns, (iii) IVA's performance depends on sample size, data distribution and covariance structure of underlying networks. 
When these assumptions are satisfied, IVA is as accurate as the other methods, and (iv) when subject-mode patterns differ from one time window to another, IVA is the most accurate. Furthermore, we analyze real fMRI data collected during a sensory motor task, and demonstrate that a component indicating statistically significant group difference between patients with schizophrenia and healthy controls is captured, which includes primary and secondary motor regions, cerebellum, and temporal lobe, revealing a meaningful spatial map and its temporal change.
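The structure that lets PARAFAC2 capture evolving spatial patterns can be sketched as follows. Each time window has its own voxel-mode factor matrix, and the standard PARAFAC2 constraint requires the cross-product of that matrix to be the same in every window. The sizes, variable names, and random values are hypothetical and only show how data following the model would be generated. This is not the authors' fitting code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels, n_windows, rank = 6, 10, 4, 2  # hypothetical sizes

A = rng.standard_normal((n_subjects, rank))  # subject-mode patterns
C = rng.random((n_windows, rank)) + 0.5      # temporal patterns (one row per window)
B_core = rng.standard_normal((rank, rank))   # shared core of the evolving mode

slices, B_list = [], []
for k in range(n_windows):
    # P_k has orthonormal columns, so B_k.T @ B_k is identical for every window.
    P_k, _ = np.linalg.qr(rng.standard_normal((n_voxels, rank)))
    B_k = P_k @ B_core                        # spatial patterns in window k
    B_list.append(B_k)
    slices.append(A @ np.diag(C[k]) @ B_k.T)  # subjects-by-voxels slice

# Subjects by voxels by time windows tensor following the PARAFAC2 structure.
X = np.stack(slices, axis=2)
```

A CP model would instead force one `B` to be shared by all windows, which is why it cannot represent spatial maps that change over time.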
ABSTRACT
Fusing complementary information from different modalities can lead to the discovery of more accurate diagnostic biomarkers for psychiatric disorders. However, biomarker discovery through data fusion is challenging since it requires extracting interpretable and reproducible patterns from data sets, consisting of shared/unshared patterns and of different orders. For example, multi-channel electroencephalography (EEG) signals from multiple subjects can be represented as a third-order tensor with modes: subject, time, and channel, while functional magnetic resonance imaging (fMRI) data may be in the form of subject by voxel matrices. Traditional data fusion methods rearrange higher-order tensors, such as EEG, as matrices to use matrix factorization-based approaches. In contrast, fusion methods based on coupled matrix and tensor factorizations (CMTF) exploit the potential multi-way structure of higher-order tensors. The CMTF approach has been shown to capture underlying patterns more accurately without imposing strong constraints on the latent neural patterns, i.e., biomarkers. In this paper, EEG, fMRI, and structural MRI (sMRI) data collected during an auditory oddball task (AOD) from a group of subjects consisting of patients with schizophrenia and healthy controls, are arranged as matrices and higher-order tensors coupled along the subject mode, and jointly analyzed using structure-revealing CMTF methods [also known as advanced CMTF (ACMTF)] focusing on unique identification of underlying patterns in the presence of shared/unshared patterns. We demonstrate that joint analysis of the EEG tensor and fMRI matrix using ACMTF reveals significant and biologically meaningful components in terms of differentiating between patients with schizophrenia and healthy controls while also providing spatial patterns with high resolution and improving the clustering performance compared to the analysis of only the EEG tensor. 
We also show that these patterns are reproducible, and study reproducibility for different model parameters. In comparison to the joint independent component analysis (jICA) data fusion approach, ACMTF provides easier interpretation of EEG data by revealing a single summary map of the topography for each component. Furthermore, fusion of sMRI data with EEG and fMRI through an ACMTF model provides structural patterns; however, we also show that when fusing data sets from multiple modalities, hence of very different nature, preprocessing plays a crucial role.
ABSTRACT
SCOPE: Self-reported dietary intake does not represent an objective, unbiased assessment. The effect of the new Nordic diet (NND) versus average Danish diet (ADD) on plasma metabolic profiles is investigated to identify biomarkers of compliance and metabolic effects. METHODS AND RESULTS: In a 26-week controlled dietary intervention study, 146 subjects followed either NND, a predominantly organic diet high in fruit, vegetables, whole grains, and fish, or ADD, a diet higher in imported and processed foods. Fasting plasma samples are analyzed with untargeted ultra-performance liquid chromatography-quadrupole time-of-flight. It is demonstrated that supervised machine learning with feature selection can separate NND and ADD samples with an average test set performance of up to 0.88 area under the curve. The NND plasma metabolome is characterized by diet-related metabolites, such as pipecolic acid betaine (whole grain), trimethylamine oxide, and prolyl hydroxyproline (both fish intake), while theobromine (chocolate) and proline betaine (citrus) are associated with ADD. Amino acid (i.e., indolelactic acid and hydroxy-3-methylbutyrate) and fat metabolism (butyryl carnitine) characterize ADD whereas NND is associated with higher concentrations of polyunsaturated phosphatidylcholines. CONCLUSIONS: The plasma metabolite profiles are predictive of dietary patterns and reflect good compliance while indicating effects of potential health benefit, including changes in fat metabolism and glucose utilization.
Subjects
Biomarkers/blood, Diet, Metabolomics/methods, Adolescent, Adult, Aged, Carbohydrate Metabolism, High Pressure Liquid Chromatography, Eating, Fasting, Female, Humans, Male, Mass Spectrometry, Middle Aged, Norway, Phospholipids/blood, Phospholipids/chemistry
ABSTRACT
MOTIVATION: The success or failure of an epilepsy surgery depends greatly on the localization of epileptic focus (origin of a seizure). We address the problem of identification of a seizure origin through an analysis of ictal electroencephalogram (EEG), which is proven to be an effective standard in epileptic focus localization. SUMMARY: With a goal of developing an automated and robust way of visual analysis of large amounts of EEG data, we propose a novel approach based on multiway models to study epilepsy seizure structure. Our contributions are 3-fold. First, we construct an Epilepsy Tensor with three modes, i.e. time samples, scales and electrodes, through wavelet analysis of multi-channel ictal EEG. Second, we demonstrate that multiway analysis techniques, in particular parallel factor analysis (PARAFAC), provide promising results in modeling the complex structure of an epilepsy seizure, localizing a seizure origin and extracting artifacts. Third, we introduce an approach for removing artifacts using multilinear subspace analysis and discuss its merits and drawbacks. RESULTS: Ictal EEG analysis of 10 seizures from 7 patients is included in this study. Our results for 8 seizures match with clinical observations in terms of seizure origin and extracted artifacts. On the other hand, for 2 of the seizures, seizure localization is not achieved using an initial trial of PARAFAC modeling. In these cases, first, we apply an artifact removal method and subsequently apply the PARAFAC model on the epilepsy tensor from which potential artifacts have been removed. This method successfully identifies the seizure origin in both cases.
Subjects
Artificial Intelligence, Brain Mapping/methods, Computer-Assisted Diagnosis/methods, Electroencephalography/methods, Epilepsy/diagnosis, Humans, Multivariate Analysis, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
BACKGROUND: Recently, we demonstrated that human mesenchymal stem cells (hMSC) stimulated with dexamethasone undergo gene focusing during osteogenic differentiation (Stem Cells Dev 14(6): 1608-20, 2005). Here, we examine the protein expression profiles of three additional populations of hMSC stimulated to undergo osteogenic differentiation via either contact with pro-osteogenic extracellular matrix (ECM) proteins (collagen I, vitronectin, or laminin-5) or osteogenic media supplements (OS media). Specifically, we annotate these four protein expression profiles, as well as profiles from naïve hMSC and differentiated human osteoblasts (hOST), with known gene ontologies and analyze them as a tensor with modes for the expressed proteins, gene ontologies, and stimulants. RESULTS: Direct component analysis in the gene ontology space identifies three components that account for 90% of the variance between hMSC, osteoblasts, and the four stimulated hMSC populations. The directed component maps the differentiation stages of the stimulated stem cell populations along the differentiation axis created by the difference in the expression profiles of hMSC and hOST. Surprisingly, hMSC treated with ECM proteins lie closer to osteoblasts than do hMSC treated with OS media. Additionally, the second component demonstrates that proteomic profiles of collagen I- and vitronectin-stimulated hMSC are distinct from those of OS-stimulated cells. A three-mode tensor analysis reveals additional focus proteins critical for characterizing the phenotypic variations between naïve hMSC, partially differentiated hMSC, and hOST. CONCLUSION: The differences between the proteomic profiles of OS-stimulated hMSC and ECM-hMSC characterize different transitional phenotypes en route to becoming osteoblasts. This conclusion is arrived at via a three-mode tensor analysis validated using hMSC plated on laminin-5.
Subjects
Bone Development, Mesenchymal Stem Cells/metabolism, Osteoblasts/metabolism, Proteomics, Cell Differentiation, Humans, Mesenchymal Stem Cells/cytology, Osteoblasts/cytology
ABSTRACT
Tensor factorisations have proven useful to model amplitude and spectral information of brain recordings. Here, we assess the usefulness of tensor factorisations in the multiway analysis of other brain signal features in the context of complexity measures recently proposed to inspect multiscale dynamics. We consider the "refined composite multiscale entropy" (rcMSE), which computes entropy "profiles" showing levels of physiological complexity over temporal scales for individual signals. We compute the rcMSE of resting-state magnetoencephalogram (MEG) recordings from 36 patients with Alzheimer's disease and 26 control subjects. Instead of traditional simple visual examinations, we organise the entropy profiles as a three-way tensor to inspect relationships across temporal and spatial scales and subjects with multiway data analysis techniques based on PARAFAC and PARAFAC2 factorisations. A PARAFAC2 model with two factors was appropriate to account for the interactions in the entropy tensor between temporal scales and MEG channels for all subjects. Moreover, the PARAFAC2 factors had information related to the subjects' diagnosis, achieving a cross-validated area under the ROC curve of 0.77. This confirms the suitability of tensor factorisations to represent electrophysiological brain data efficiently despite the unsupervised nature of these techniques. This article is part of a Special Issue entitled 'Neural data analysis'.
Subjects
Alzheimer Disease/physiopathology, Electroencephalography/methods, Magnetoencephalography/methods, Action Potentials, Aged, Aged 80 and over, Algorithms, Brain/physiopathology, Entropy, Female, Humans, Male, Rest, Statistics as Topic
ABSTRACT
In many disciplines, data from multiple sources are acquired and jointly analyzed for enhanced knowledge discovery. For instance, in metabolomics, different analytical techniques are used to measure biological fluids in order to identify the chemicals related to certain diseases. It is widely known that some of these analytical methods, e.g., LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) spectroscopy, provide complementary data sets and their joint analysis may enable us to capture a larger proportion of the complete metabolome belonging to a specific biological system. Fusing data from multiple sources has proved useful in many fields including bioinformatics, signal processing and social network analysis. However, identification of common (shared) and individual (unshared) structures across multiple data sets remains a major challenge in data fusion studies. With a goal of addressing this challenge, we propose a novel unsupervised data fusion model. Our contributions are two-fold: (i) We formulate a data fusion model based on joint factorization of matrices and higher-order tensors, which can automatically reveal common and individual components. (ii) We demonstrate that the proposed approach provides promising results in joint analysis of metabolomics data sets consisting of fluorescence and NMR measurements of plasma samples in terms of separation of colorectal cancer patients from controls.
Subjects
Colorectal Neoplasms/blood, Colorectal Neoplasms/metabolism, Computational Biology, Metabolomics, Algorithms, Liquid Chromatography, Computer Simulation, Humans, Magnetic Resonance Spectroscopy, Mass Spectrometry, Metabolome, Theoretical Models, Computer-Assisted Signal Processing
ABSTRACT
The structure/function relationship is fundamental to our understanding of biological systems at all levels, and drives most, if not all, techniques for detecting, diagnosing, and treating disease. However, at the tissue level of biological complexity we encounter a gap in the structure/function relationship: having accumulated an extraordinary amount of detailed information about biological tissues at the cellular and subcellular level, we cannot assemble it in a way that explains the correspondingly complex biological functions these structures perform. To help close this information gap we define here several quantitative temperospatial features that link tissue structure to its corresponding biological function. Both histological images of human tissue samples and fluorescence images of three-dimensional cultures of human cells are used to compare the accuracy of in vitro culture models with their corresponding human tissues. To the best of our knowledge, there is no prior work on a quantitative comparison of histology and in vitro samples. Features are calculated from graph theoretical representations of tissue structures and the data are analyzed in the form of matrices and higher-order tensors using matrix and tensor factorization methods, with a goal of differentiating between cancerous and healthy states of brain, breast, and bone tissues. We also show that our techniques can differentiate between the structural organization of native tissues and their corresponding in vitro engineered cell culture models.
Subjects
Bone Neoplasms/pathology , Bone and Bones/anatomy & histology , Brain/anatomy & histology , Breast Neoplasms/pathology , Breast/anatomy & histology , Glioma/pathology , Algorithms , Bone and Bones/cytology , Brain/cytology , Breast/cytology , Cell Culture Techniques , Female , Humans , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Models, Anatomic
ABSTRACT
BACKGROUND: Systems biology refers to multidisciplinary approaches designed to uncover emergent properties of biological systems. Stem cells are an attractive target for this analysis, due to their broad therapeutic potential. A central theme of systems biology is the use of computational modeling to reconstruct complex systems from a wealth of reductionist, molecular data (e.g., gene/protein expression, signal transduction activity, metabolic activity, etc.). A number of deterministic, probabilistic, and statistical learning models are used to understand sophisticated cellular behaviors such as protein expression during cellular differentiation and the activity of signaling networks. However, many of these models are bimodal, i.e., they only consider row-column relationships. In contrast, multiway modeling techniques (also known as tensor models) can analyze multimodal data, which capture much more information about complex behaviors such as cell differentiation. In particular, tensors can be very powerful tools for modeling the dynamic activity of biological networks over time. Here, we review the application of systems biology to stem cells and illustrate the application of tensor analysis to model collagen-induced osteogenic differentiation of human mesenchymal stem cells. RESULTS: We applied Tucker1, Tucker3, and Parallel Factor Analysis (PARAFAC) models to identify protein/gene expression patterns during extracellular matrix-induced osteogenic differentiation of human mesenchymal stem cells. In one case, we organized our data into a tensor of type protein/gene locus link × gene ontology category × osteogenic stimulant, and found that our cells expressed two distinct, stimulus-dependent sets of functionally related genes as they underwent osteogenic differentiation. In a second case, we organized DNA microarray data in a three-way tensor of gene IDs × osteogenic stimulus × replicates, and found that application of tensile strain to a collagen I substrate accelerated the osteogenic differentiation induced by a static collagen I substrate. CONCLUSION: Our results suggest gene- and protein-level models whereby stem cells undergo transdifferentiation to osteoblasts, and lay the foundation for mechanistic, hypothesis-driven studies. Our analysis methods are applicable to a wide range of stem cell differentiation models.
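To make the PARAFAC model mentioned above more concrete, the following is a minimal sketch of fitting a three-way PARAFAC (CP) model by alternating least squares. It runs on a synthetic tensor, not on the study's data, and the names (`khatri_rao`, `cp_als`), the sizes, and the choice of plain ALS with random initialization are assumptions made for illustration; the abstract does not state which algorithm or software was used.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row index is j * C.shape[0] + k."""
    return np.einsum("jr,kr->jkr", B, C).reshape(-1, B.shape[1])

def cp_als(X, rank, n_iter=500, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-wise unfoldings, ordered to match khatri_rao above.
    X1 = X.reshape(I, J * K)
    X2 = np.transpose(X, (1, 0, 2)).reshape(J, I * K)
    X3 = np.transpose(X, (2, 0, 1)).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update is a linear least-squares solve with the other two fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Synthetic stand-in for a "gene IDs x stimulus x replicates" tensor
# built from two known components (noise-free).
rng = np.random.default_rng(1)
A0 = rng.standard_normal((8, 2))
B0 = rng.standard_normal((6, 2))
C0 = rng.standard_normal((5, 2))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)

A, B, C = cp_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Each column of `A`, `B`, and `C` describes one component along one mode, so a component can be read as a set of genes, its response across stimuli, and its consistency across replicates.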
Subjects
Models, Biological , Stem Cells/cytology , Systems Biology , Cell Differentiation , Gene Expression Regulation , Humans , Mesenchymal Stem Cells/cytology , Mesenchymal Stem Cells/metabolism , Oligonucleotide Array Sequence Analysis , Osteogenesis/genetics , Stem Cells/metabolism , Time Factors
ABSTRACT
With the goal of automating visual analysis of electroencephalogram (EEG) data and assessing the performance of various features in seizure recognition, we introduce a mathematical model capable of recognizing patient-specific epileptic seizures with high accuracy. We represent multi-channel scalp EEG using a set of features. These features, which are expected to have distinct trends during seizure and non-seizure periods, include features from both the time and frequency domains. The contributions of this paper are threefold. First, we rearrange multi-channel EEG signals as a third-order tensor called an Epilepsy Feature Tensor with modes: time epochs, features, and electrodes. Second, we model the Epilepsy Feature Tensor using a multilinear regression model, i.e., Multilinear Partial Least Squares regression, which is the generalization of Partial Least Squares (PLS) regression to higher-order datasets. This two-step approach facilitates EEG data analysis from multiple electrodes represented by several features from different domains. Third, we identify which features are more significant for seizure recognition. Our results, based on the analysis of 19 seizures from 5 epileptic patients, demonstrate that multiway analysis of an Epilepsy Feature Tensor can detect (patient-specific) seizures with classification accuracy ranging between 77% and 96%.
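The first step described above, arranging multi-channel EEG as a tensor with modes time epochs, features, and electrodes, can be sketched as follows. This is an illustration on synthetic signals only: the three features (variance, line length, dominant frequency), the epoch length, and the function names `epoch_features` and `build_feature_tensor` are assumptions chosen for the example, not the feature set or code from the paper. The regression step (Multilinear PLS) is not shown.

```python
import numpy as np

def epoch_features(segment, fs):
    """Return a small feature vector for one 1-D EEG segment.

    The features are illustrative: two from the time domain and
    one from the frequency domain.
    """
    variance = np.var(segment)                        # time domain
    line_length = np.sum(np.abs(np.diff(segment)))    # time domain
    spectrum = np.abs(np.fft.rfft(segment - segment.mean())) ** 2
    freqs = np.fft.rfftfreq(segment.size, d=1.0 / fs)
    dominant_freq = freqs[np.argmax(spectrum)]        # frequency domain
    return np.array([variance, line_length, dominant_freq])

def build_feature_tensor(eeg, fs, epoch_len_s):
    """Arrange EEG of shape (n_samples, n_electrodes) as a tensor with
    modes (time epochs, features, electrodes)."""
    n_samples, n_electrodes = eeg.shape
    step = int(epoch_len_s * fs)
    n_epochs = n_samples // step          # any trailing partial epoch is dropped
    tensor = np.empty((n_epochs, 3, n_electrodes))
    for e in range(n_epochs):
        for c in range(n_electrodes):
            segment = eeg[e * step:(e + 1) * step, c]
            tensor[e, :, c] = epoch_features(segment, fs)
    return tensor

# Synthetic stand-in for a recording: 10 s, 4 electrodes, 100 Hz sampling.
rng = np.random.default_rng(0)
fs = 100
t = np.arange(10 * fs) / fs
eeg = 0.1 * rng.standard_normal((t.size, 4))
eeg[:, 0] += np.sin(2 * np.pi * 5 * t)    # a 5 Hz rhythm on the first electrode

X = build_feature_tensor(eeg, fs, epoch_len_s=1.0)
```

With one-second epochs, `X` has one slice per second of recording, and `X[:, 2, 0]` holds the dominant frequency of the first electrode in each epoch. In practice the features would usually be scaled before modeling, since they are on very different numeric ranges.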