Results 1 - 7 of 7
1.
Proc Natl Acad Sci U S A ; 120(32): e2303647120, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37523521

ABSTRACT

Multimodal single-cell technologies profile multiple modalities for each cell simultaneously, enabling a more thorough characterization of cell populations. Existing dimension-reduction methods for multimodal data capture the "union of information," producing a lower-dimensional embedding that combines the information across modalities. While these tools are useful, we focus on a fundamentally different task: separating and quantifying the information among cells that is shared between the two modalities and the information that is unique to only one modality. Hence, we develop Tilted Canonical Correlation Analysis (Tilted-CCA), a method that decomposes a paired multimodal dataset into three lower-dimensional embeddings: one embedding captures the "intersection of information," representing the geometric relations among the cells that are common to both modalities, while the remaining two embeddings capture the "distinct information for a modality," representing the modality-specific geometric relations. We analyze single-cell multimodal datasets sequencing RNA alongside surface antibodies (i.e., CITE-seq) as well as RNA alongside chromatin accessibility (i.e., 10x) for blood cells and developing neurons via Tilted-CCA. These analyses show that Tilted-CCA enables meaningful visualization and quantification of the cross-modal information. Finally, Tilted-CCA's framework allows us to perform two specific downstream analyses. First, for single-cell datasets that simultaneously profile transcriptome and surface antibody markers, we show that Tilted-CCA helps design the target antibody panel to complement the transcriptome best. Second, for developmental single-cell datasets that simultaneously profile transcriptome and chromatin accessibility, we show that Tilted-CCA helps identify development-informative genes and distinguish between transient versus terminal cell types.


Subjects
Algorithms, Canonical Correlation Analysis, Transcriptome, Single-Cell Analysis/methods
2.
Stat Med ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39250913

ABSTRACT

A mediation analysis approach is proposed for multiple exposures, multiple mediators, and a continuous scalar outcome under the linear structural equation modeling framework. It assumes that there exist orthogonal components that demonstrate parallel mediation mechanisms on the outcome, and is therefore named principal component mediation analysis (PCMA). Likelihood-based estimators are introduced for simultaneous estimation of the component projections and effect parameters. The asymptotic distribution of the estimators is derived for low-dimensional data. A bootstrap procedure is introduced for inference. Simulation studies illustrate the superior performance of the proposed approach. Applied to a proteomics-imaging dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the proposed framework identifies protein deposition - brain atrophy - memory deficit mechanisms consistent with existing knowledge and suggests potential AD pathology by integrating data collected from different modalities.
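
As a rough illustration of the linear structural equation setting this abstract refers to, the sketch below works through the textbook single-exposure, single-mediator case: M = a·X + error and Y = c'·X + b·M + error, where the indirect (mediated) effect is a·b and the total effect is c' + a·b. This is not PCMA itself; the function names (`slope`, `two_slopes`) and the toy data are assumptions made only for the example.

```python
# Minimal sketch of product-of-coefficients mediation in a linear SEM.
# Assumption: one exposure x, one mediator m, one outcome y (NOT the
# multi-exposure, multi-mediator PCMA estimator described in the abstract).

def centered(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    xc, yc = centered(x), centered(y)
    return sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)

def two_slopes(x, m, y):
    """OLS slopes of y on (x, m) with intercept, via the 2x2 normal equations."""
    xc, mc, yc = centered(x), centered(m), centered(y)
    sxx = sum(a * a for a in xc)
    smm = sum(a * a for a in mc)
    sxm = sum(a * b for a, b in zip(xc, mc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    smy = sum(a * b for a, b in zip(mc, yc))
    det = sxx * smm - sxm ** 2
    direct = (smm * sxy - sxm * smy) / det   # c': effect of x holding m fixed
    b = (sxx * smy - sxm * sxy) / det        # b: effect of m holding x fixed
    return direct, b

# Hypothetical toy data: m = 2*x + e, y = 0.5*x + 3*m (noise-free outcome).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
e = [1.0, -1.0, 0.0, -1.0, 1.0]          # chosen to be uncorrelated with x
m = [2 * xi + ei for xi, ei in zip(x, e)]
y = [0.5 * xi + 3 * mi for xi, mi in zip(x, m)]

a = slope(x, m)                 # exposure -> mediator path
direct, b = two_slopes(x, m, y) # direct path and mediator -> outcome path
indirect = a * b                # mediated effect
total = slope(x, y)             # total effect of x on y
print(a, b, direct, indirect, total)
```

In this linear setting the total effect decomposes into the direct and indirect parts, which is the identity that mediation methods in this family build on.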

3.
Biostatistics ; 23(4): 1200-1217, 2022 10 14.
Article in English | MEDLINE | ID: mdl-35358296

ABSTRACT

Integrative analysis of multiple data sets has the potential to fully leverage the vast amount of high-throughput biological data being generated. In particular, such analysis will be powerful in making inference from publicly available collections of genetic, transcriptomic, and epigenetic data sets that are designed to study shared biological processes but that vary in their target measurements, biological variation, unwanted noise, and batch variation. Thus, methods that enable the joint analysis of multiple data sets are needed to gain insights into shared biological processes that would otherwise be hidden by unwanted intra-data set variation. Here, we propose a method called two-stage linked component analysis (2s-LCA) to jointly decompose multiple biologically related experimental data sets with biological and technological relationships that can be structured into the decomposition. The consistency of the proposed method is established and its empirical performance is evaluated via simulation studies. We apply 2s-LCA to jointly analyze four data sets focused on human brain development and identify meaningful patterns of gene expression in human neurogenesis that have shared structure across these data sets.


Subjects
Transcriptome, Computer Simulation, Humans
4.
Entropy (Basel) ; 25(12)2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38136486

ABSTRACT

In multiview data clustering, exploiting consistent or complementary information across views can yield better clustering results. However, the high dimensionality, lack of labels, and redundancy of multiview data degrade clustering quality, posing a challenge to multiview clustering. This study designs a multiview feature selection clustering (MFSC) algorithm that combines similarity graph learning and unsupervised feature selection. In MFSC, local manifold regularization is integrated into similarity graph learning, and the cluster labels obtained from similarity graph learning serve as the criterion for unsupervised feature selection. MFSC can thus retain the characteristics of the cluster labels while maintaining the manifold structure of multiview data. The algorithm is systematically evaluated using benchmark multiview and simulated data. The clustering experiments indicate that the MFSC algorithm is more effective than traditional algorithms.

5.
Biometrics ; 78(3): 1018-1030, 2022 09.
Article in English | MEDLINE | ID: mdl-33792914

ABSTRACT

In this paper, we consider data consisting of multiple networks, each composed of a different edge set on a common set of nodes. Many models have been proposed for the analysis of such multiview network data under the assumption that the data views are closely related. In this paper, we provide tools for evaluating this assumption. In particular, we ask: given two networks that each follow a stochastic block model, is there an association between the latent community memberships of the nodes in the two networks? To answer this question, we extend the stochastic block model for a single network view to the two-view setting, and develop a new hypothesis test for the null hypothesis that the latent community memberships in the two data views are independent. We apply our test to protein-protein interaction data from the HINT database. We find evidence of a weak association between the latent community memberships of proteins defined with respect to binary interaction data and the latent community memberships of proteins defined with respect to cocomplex association data. We also extend this proposal to the setting of a network with node covariates. The proposed methods extend readily to three or more network/multivariate data views.


Subjects
Algorithms, Proteins
6.
Article in English | MEDLINE | ID: mdl-39055313

ABSTRACT

Alzheimer's disease (AD) is affecting a growing number of individuals. As a result, there is a pressing need for accurate and early diagnosis methods. This study aims to achieve this goal by developing an optimal data analysis strategy to enhance computational diagnosis. Although various modalities of AD diagnostic data are collected, past research on computational methods of AD diagnosis has mainly focused on using single-modal inputs. We hypothesize that integrating, or "fusing," various data modalities as inputs to prediction models could enhance diagnostic accuracy by offering a more comprehensive view of an individual's health profile. However, a potential challenge arises as this fusion of multiple modalities may result in significantly higher dimensional data. We hypothesize that employing suitable dimensionality reduction methods across heterogeneous modalities would not only help diagnosis models extract latent information but also enhance accuracy. Therefore, it is imperative to identify optimal strategies for both data fusion and dimensionality reduction. In this paper, we have conducted a comprehensive comparison of over 80 statistical machine learning methods, considering various classifiers, dimensionality reduction techniques, and data fusion strategies to assess our hypotheses. Specifically, we have explored three primary strategies: (1) Simple data fusion, which involves straightforward concatenation (fusion) of datasets before inputting them into a classifier; (2) Early data fusion, in which datasets are concatenated first, and then a dimensionality reduction technique is applied before feeding the resulting data into a classifier; and (3) Intermediate data fusion, in which dimensionality reduction methods are applied individually to each dataset before concatenating them to construct a classifier. 
For dimensionality reduction, we have explored several commonly used techniques such as principal component analysis (PCA), autoencoder (AE), and LASSO. Additionally, we have implemented a new dimensionality-reduction method called the supervised encoder (SE), which involves slight modifications to standard deep neural networks. Our results show that SE substantially improves prediction accuracy compared to PCA, AE, and LASSO, especially in combination with intermediate fusion for multiclass diagnosis prediction.
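
The three fusion strategies described in this abstract differ only in where dimensionality reduction sits relative to concatenation, which a short sketch can make concrete. The code below is an illustration under stated assumptions, not the authors' pipeline: `top_variance` is a hypothetical stand-in reducer (it keeps the highest-variance columns) used in place of PCA, AE, LASSO, or SE, the classifier step is omitted, and the two toy "modalities" are invented.

```python
from statistics import pvariance

def concat(*views):
    """Join the feature vectors of several modalities, sample by sample."""
    return [sum((list(row) for row in rows), []) for rows in zip(*views)]

def top_variance(data, k):
    """Stand-in reducer (assumption): keep the k columns with largest variance."""
    cols = list(zip(*data))
    ranked = sorted(range(len(cols)), key=lambda j: pvariance(cols[j]), reverse=True)
    keep = sorted(ranked[:k])
    return [[row[j] for j in keep] for row in data]

def simple_fusion(views):
    """(1) Concatenate modalities and pass them on unchanged."""
    return concat(*views)

def early_fusion(views, k):
    """(2) Concatenate first, then reduce the joint feature space."""
    return top_variance(concat(*views), k)

def intermediate_fusion(views, k):
    """(3) Reduce each modality separately, then concatenate."""
    return concat(*(top_variance(v, k) for v in views))

# Hypothetical toy modalities: 4 samples, 3 and 2 features respectively.
imaging = [[1.0, 5.0, 0.1], [2.0, 5.0, 0.2], [3.0, 5.1, 0.1], [4.0, 5.0, 0.2]]
genetics = [[10.0, 0.0], [20.0, 1.0], [30.0, 0.0], [40.0, 1.0]]
views = [imaging, genetics]

simple = simple_fusion(views)
early = early_fusion(views, k=2)
intermediate = intermediate_fusion(views, k=1)
print(len(simple[0]), len(early[0]), len(intermediate[0]))
```

One design consequence worth noting: early fusion lets a single high-variance modality dominate the reduced space, whereas intermediate fusion guarantees each modality contributes `k` features to the classifier input.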

7.
Artif Intell Med ; 143: 102605, 2023 09.
Article in English | MEDLINE | ID: mdl-37673574

ABSTRACT

Machine learning (ML) has demonstrated its ability to exploit important relationships within collected data, which can be used in the diagnosis, treatment, and prediction of outcomes in a variety of clinical contexts. Anxiety disorder analysis is one of the open problems that ML can help with, and a thorough study is needed to better understand this illness. Anxiety data are generally multidimensional, which complicates processing, and, as a result of technology improvements, medical data are now collected from several perspectives, known as multiview data (MVD). Each view has its own data type and feature values, so the data are highly diverse. This work introduces a novel preprocessing feature selection (FS) approach, multiview Harris hawk optimization (MHHO), which has the potential to reduce the dimensionality of anxiety data, hence reducing analytical effort. The uniqueness of MHHO originates from combining a multiview linking methodology with the power of the Harris hawk optimization (HHO) method. The HHO is used to identify the smallest optimal MVD feature subset, while multiview linking is utilized to find a promising fitness function to direct the HHO FS while accounting for all data views' heterogeneity. The complexity of MHHO is O(THL^2), where T is the number of iterations, H is the number of involved Harris hawks, and L is the number of objects. Using two publicly available anxiety MVDs, MHHO is validated against ten recent rivals in its category. The experimental findings show that MHHO has a considerable advantage in terms of convergence speed (converging in less than ten iterations), subset size (removing 75% of the views; reducing feature size by 66%), and classification accuracy (approaching 100%). Furthermore, statistical analyses reveal that MHHO is statistically different from its competitors, bolstering its applicability.
Finally, feature importance is evaluated, shedding light on the most anxiety-inducing characteristics. The likelihood of developing additional disorders (such as depression or stress) is also investigated.
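
To make the stated complexity concrete, the sketch below treats it as O(T·H·L²), on the assumption that the trailing "2" in the extracted text is a flattened exponent. The function `mhho_cost` and the parameter values are hypothetical; the function only counts a proportional number of operations and ignores constant factors.

```python
def mhho_cost(iterations, hawks, objects):
    """Operation-count proxy T * H * L**2 (assumed reading of the stated bound)."""
    return iterations * hawks * objects ** 2

# Hypothetical settings: 10 iterations, 30 hawks, 1000 objects.
base = mhho_cost(10, 30, 1000)
double_objects = mhho_cost(10, 30, 2000)   # L doubles: cost grows 4x
double_iters = mhho_cost(20, 30, 1000)     # T doubles: cost grows 2x
print(double_objects // base, double_iters // base)
```

Under this reading, the number of objects L dominates the running time, while the iteration count and the number of hawks contribute only linearly.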


Subjects
Anxiety, Falconiformes, Humans, Animals, Anxiety Disorders/diagnosis, Algorithms, Exercise