Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 1.309
Filtrar
Mais filtros

Tipo de documento
Intervalo de ano de publicação
1.
Cell ; 185(6): 1065-1081.e23, 2022 03 17.
Artigo em Inglês | MEDLINE | ID: mdl-35245431

RESUMO

Motor behaviors are often planned long before execution but only released after specific sensory events. Planning and execution are each associated with distinct patterns of motor cortex activity. Key questions are how these dynamic activity patterns are generated and how they relate to behavior. Here, we investigate the multi-regional neural circuits that link an auditory "Go cue" and the transition from planning to execution of directional licking. Ascending glutamatergic neurons in the midbrain reticular and pedunculopontine nuclei show short latency and phasic changes in spike rate that are selective for the Go cue. This signal is transmitted via the thalamus to the motor cortex, where it triggers a rapid reorganization of motor cortex state from planning-related activity to a motor command, which in turn drives appropriate movement. Our studies show how midbrain can control cortical dynamics via the thalamus for rapid and precise motor behavior.


Assuntos
Córtex Motor , Movimento , Tálamo , Animais , Mesencéfalo , Camundongos , Córtex Motor/fisiologia , Neurônios/fisiologia , Tálamo/fisiologia
2.
Cell ; 183(4): 954-967.e21, 2020 11 12.
Artigo em Inglês | MEDLINE | ID: mdl-33058757

RESUMO

The curse of dimensionality plagues models of reinforcement learning and decision making. The process of abstraction solves this by constructing variables describing features shared by different instances, reducing dimensionality and enabling generalization in novel situations. Here, we characterized neural representations in monkeys performing a task described by different hidden and explicit variables. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training, which requires a particular geometry of neural representations. Neural ensembles in prefrontal cortex, hippocampus, and simulated neural networks simultaneously represented multiple variables in a geometry reflecting abstraction but that still allowed a linear classifier to decode a large number of other variables (high shattering dimensionality). Furthermore, this geometry changed in relation to task events and performance. These findings elucidate how the brain and artificial systems represent variables in an abstract format while preserving the advantages conferred by high shattering dimensionality.


Assuntos
Hipocampo/anatomia & histologia , Córtex Pré-Frontal/anatomia & histologia , Animais , Comportamento Animal , Mapeamento Encefálico , Simulação por Computador , Hipocampo/fisiologia , Aprendizagem , Macaca mulatta , Masculino , Modelos Neurológicos , Redes Neurais de Computação , Neurônios/fisiologia , Córtex Pré-Frontal/fisiologia , Reforço Psicológico , Análise e Desempenho de Tarefas
3.
Annu Rev Neurosci ; 45: 249-271, 2022 07 08.
Artigo em Inglês | MEDLINE | ID: mdl-35316610

RESUMO

The brain plans and executes volitional movements. The underlying patterns of neural population activity have been explored in the context of movements of the eyes, limbs, tongue, and head in nonhuman primates and rodents. How do networks of neurons produce the slow neural dynamics that prepare specific movements and the fast dynamics that ultimately initiate these movements? Recent work exploits rapid and calibrated perturbations of neural activity to test specific dynamical systems models that are capable of producing the observed neural activity. These joint experimental and computational studies show that cortical dynamics during motor planning reflect fixed points of neural activity (attractors). Subcortical control signals reshape and move attractors over multiple timescales, causing commitment to specific actions and rapid transitions to movement execution. Experiments in rodents are beginning to reveal how these algorithms are implemented at the level of brain-wide neural circuits.


Assuntos
Córtex Motor , Algoritmos , Animais , Encéfalo/fisiologia , Córtex Motor/fisiologia , Movimento/fisiologia , Neurônios/fisiologia
4.
Proc Natl Acad Sci U S A ; 121(12): e2317284121, 2024 Mar 19.
Artigo em Inglês | MEDLINE | ID: mdl-38478692

RESUMO

Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the "gold standard" for representing the global diversity of SARS-CoV-2 and to identify newly emerging lineages. However, these methods are computationally expensive, struggle when datasets get too large, and require manual curation to designate new lineages. These challenges provide a motivation to develop complementary methods that can incorporate all of the genetic data available without down-sampling to extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of using algorithmic approaches based on word-statistics to represent whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not serving as a substitute for current phylogenetic analyses, the proposed methods can be used as a complementary, and fully automatable, approach to identify and confirm new emerging variants.


Assuntos
COVID-19 , SARS-CoV-2 , Humanos , SARS-CoV-2/genética , COVID-19/epidemiologia , Filogenia , Aprendizado de Máquina
5.
Proc Natl Acad Sci U S A ; 121(10): e2319491121, 2024 Mar 05.
Artigo em Inglês | MEDLINE | ID: mdl-38427601

RESUMO

Translocation of cytoplasmic molecules to the plasma membrane is commonplace in cell signaling. Membrane localization has been hypothesized to increase intermolecular association rates; however, it has also been argued that association should be faster in the cytosol because membrane diffusion is slow. Here, we directly compare an identical association reaction, the binding of complementary DNA strands, in solution and on supported membranes. The measured rate constants show that for a 10-µm-radius spherical cell, association is 22- to 33-fold faster at the membrane than in the cytoplasm. The kinetic advantage depends on cell size and is essentially negligible for typical ~1 µm prokaryotic cells. The rate enhancement is attributable to a combination of higher encounter rates in two dimensions and a higher reaction probability per encounter.


Assuntos
Transdução de Sinais , Citoplasma/metabolismo , Membrana Celular/metabolismo , Citosol/metabolismo , Membranas , Cinética
6.
Trends Immunol ; 44(5): 329-332, 2023 05.
Artigo em Inglês | MEDLINE | ID: mdl-36997459

RESUMO

Profiling immune responses across several dimensions, including time, patients, molecular features, and tissue sites, can deepen our understanding of immunity as an integrated system. These studies require new analytical approaches to realize their full potential. We highlight recent applications of tensor methods and discuss several future opportunities.


Assuntos
Doenças Transmissíveis , Imunidade , Humanos
7.
Proc Natl Acad Sci U S A ; 120(48): e2311420120, 2023 Nov 28.
Artigo em Inglês | MEDLINE | ID: mdl-37988465

RESUMO

Principal component analysis (PCA) is a dimensionality reduction method that is known for being simple and easy to interpret. Principal components are often interpreted as low-dimensional patterns in high-dimensional space. However, this simple interpretation fails for timeseries, spatial maps, and other continuous data. In these cases, nonoscillatory data may have oscillatory principal components. Here, we show that two common properties of data cause oscillatory principal components: smoothness and shifts in time or space. These two properties implicate almost all neuroscience data. We show how the oscillations produced by PCA, which we call "phantom oscillations," impact data analysis. We also show that traditional cross-validation does not detect phantom oscillations, so we suggest procedures that do. Our findings are supported by a collection of mathematical proofs. Collectively, our work demonstrates that patterns which emerge from high-dimensional data analysis may not faithfully represent the underlying data.

8.
Proc Natl Acad Sci U S A ; 120(40): e2303523120, 2023 10 03.
Artigo em Inglês | MEDLINE | ID: mdl-37748075

RESUMO

Sensorimotor transformation is the process of first sensing an object in the environment and then producing a movement in response to that stimulus. For visually guided saccades, neurons in the superior colliculus (SC) emit a burst of spikes to register the appearance of stimulus, and many of the same neurons discharge another burst to initiate the eye movement. We investigated whether the neural signatures of sensation and action in SC depend on context. Spiking activity along the dorsoventral axis was recorded with a laminar probe as Rhesus monkeys generated saccades to the same stimulus location in tasks that require either executive control to delay saccade onset until permission is granted or the production of an immediate response to a target whose onset is predictable. Using dimensionality reduction and discriminability methods, we show that the subspaces occupied during the visual and motor epochs were both distinct within each task and differentiable across tasks. Single-unit analyses, in contrast, show that the movement-related activity of SC neurons was not different between tasks. These results demonstrate that statistical features in neural activity of simultaneously recorded ensembles provide more insight than single neurons. They also indicate that cognitive processes associated with task requirements are multiplexed in SC population activity during both sensation and action and that downstream structures could use this activity to extract context. Additionally, the entire manifolds associated with sensory and motor responses, respectively, may be larger than the subspaces explored within a certain set of experiments.


Assuntos
Líquidos Corporais , Colículos Superiores , Animais , Movimentos Oculares , Neurônios , Macaca mulatta , Sensação
9.
Brief Bioinform ; 25(1)2023 11 22.
Artigo em Inglês | MEDLINE | ID: mdl-38018908

RESUMO

Multi-omic analyses are necessary to understand the complex biological processes taking place at the tissue and cell level, but also to make reliable predictions about, for example, disease outcome. Several linear methods exist that create a joint embedding using paired information per sample, but recently there has been a rise in the popularity of neural architectures that embed paired -omics into the same non-linear manifold. This work describes a head-to-head comparison of linear and non-linear joint embedding methods using both bulk and single-cell multi-modal datasets. We found that non-linear methods have a clear advantage with respect to linear ones for missing modality imputation. Performance comparisons in the downstream tasks of survival analysis for bulk tumor data and cell type classification for single-cell data lead to the following insights: First, concatenating the principal components of each modality is a competitive baseline and hard to beat if all modalities are available at test time. However, if we only have one modality available at test time, training a predictive model on the joint space of that modality can lead to performance improvements with respect to just using the unimodal principal components. Second, -omic profiles imputed by neural joint embedding methods are realistic enough to be used by a classifier trained on real data with limited performance drops. Taken together, our comparisons give hints to which joint embedding to use for which downstream task. Overall, product-of-experts performed well in most tasks and was reasonably fast, while early integration (concatenation) of modalities did quite poorly.


Assuntos
Multiômica , Neoplasias , Humanos
10.
Brief Bioinform ; 24(4)2023 07 20.
Artigo em Inglês | MEDLINE | ID: mdl-37418278

RESUMO

Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.


Assuntos
Simulação de Dinâmica Molecular , Proteínas , Conformação Proteica , Proteínas/química
11.
Brief Bioinform ; 24(1)2023 01 19.
Artigo em Inglês | MEDLINE | ID: mdl-36458451

RESUMO

In epistasis analysis, single-nucleotide polymorphism-single-nucleotide polymorphism interactions (SSIs) among genes may, alongside other environmental factors, influence the risk of multifactorial diseases. To identify SSI between cases and controls (i.e. binary traits), the score for model quality is affected by different objective functions (i.e. measurements) because of potential disease model preferences and disease complexities. Our previous study proposed a multiobjective approach-based multifactor dimensionality reduction (MOMDR), with the results indicating that two objective functions could enhance SSI identification with weak marginal effects. However, SSI identification using MOMDR remains a challenge because the optimal measure combination of objective functions has yet to be investigated. This study extended MOMDR to the many-objective version (i.e. many-objective MDR, MaODR) by integrating various disease probability measures based on a two-way contingency table to improve the identification of SSI between cases and controls. We introduced an objective function selection approach to determine the optimal measure combination in MaODR among 10 well-known measures. In total, 6 disease models with and 40 disease models without marginal effects were used to evaluate the general algorithms, namely those based on multifactor dimensionality reduction, MOMDR and MaODR. Our results revealed that the MaODR-based three objective function model, correct classification rate, likelihood ratio and normalized mutual information (MaODR-CLN) exhibited the higher 6.47% detection success rates (Accuracy) than MOMDR and higher 17.23% detection success rates than MDR through the application of an objective function selection approach. In a Wellcome Trust Case Control Consortium, MaODR-CLN successfully identified the significant SSIs (P < 0.001) associated with coronary artery disease. We performed a systematic analysis to identify the optimal measure combination in MaODR among 10 objective functions. Our combination detected SSIs-based binary traits with weak marginal effects and thus reduced spurious variables in the score model. MOAI is freely available at https://sites.google.com/view/maodr/home.


Assuntos
Epistasia Genética , Modelos Genéticos , Algoritmos , Fenótipo , Redução Dimensional com Múltiplos Fatores/métodos , Polimorfismo de Nucleotídeo Único
12.
Brief Bioinform ; 24(3)2023 05 19.
Artigo em Inglês | MEDLINE | ID: mdl-37088976

RESUMO

Single-cell RNA sequencing (scRNA-seq) is a revolutionary breakthrough that determines the precise gene expressions on individual cells and deciphers cell heterogeneity and subpopulations. However, scRNA-seq data are much noisier than traditional high-throughput RNA-seq data because of technical limitations, leading to many scRNA-seq data studies about dimensionality reduction and visualization remaining at the basic data-stacking stage. In this study, we propose an improved variational autoencoder model (termed DREAM) for dimensionality reduction and a visual analysis of scRNA-seq data. Here, DREAM combines the variational autoencoder and Gaussian mixture model for cell type identification, meanwhile explicitly solving 'dropout' events by introducing the zero-inflated layer to obtain the low-dimensional representation that describes the changes in the original scRNA-seq dataset. Benchmarking comparisons across nine scRNA-seq datasets show that DREAM outperforms four state-of-the-art methods on average. Moreover, we prove that DREAM can accurately capture the expression dynamics of human preimplantation embryonic development. DREAM is implemented in Python, freely available via the GitHub website, https://github.com/Crystal-JJ/DREAM.


Assuntos
Análise de Célula Única , Análise da Expressão Gênica de Célula Única , Humanos , Análise de Sequência de RNA/métodos , Análise de Célula Única/métodos , RNA-Seq , Perfilação da Expressão Gênica/métodos , Análise por Conglomerados
13.
Brief Bioinform ; 24(3)2023 05 19.
Artigo em Inglês | MEDLINE | ID: mdl-36946414

RESUMO

In the era of constantly increasing amounts of the available protein data, a relevant and interpretable visualization becomes crucial, especially for tasks requiring human expertise. Poincaré disk projection has previously demonstrated its important efficiency for visualization of biological data such as single-cell RNAseq data. Here, we develop a new method PoincaréMSA for visual representation of complex relationships between protein sequences based on Poincaré maps embedding. We demonstrate its efficiency and potential for visualization of protein family topology as well as evolutionary and functional annotation of uncharacterized sequences. PoincaréMSA is implemented in open source Python code with available interactive Google Colab notebooks as described at https://www.dsimb.inserm.fr/POINCARE_MSA.


Assuntos
Proteínas , Software , Humanos , Sequência de Aminoácidos , Evolução Biológica
14.
Brief Bioinform ; 25(1)2023 11 22.
Artigo em Inglês | MEDLINE | ID: mdl-38113074

RESUMO

Optimizing and benchmarking data reduction methods for dynamic or spatial visualization and interpretation (DSVI) face challenges due to many factors, including data complexity, lack of ground truth, time-dependent metrics, dimensionality bias and different visual mappings of the same data. Current studies often focus on independent static visualization or interpretability metrics that require ground truth. To overcome this limitation, we propose the MIBCOVIS framework, a comprehensive and interpretable benchmarking and computational approach. MIBCOVIS enhances the visualization and interpretability of high-dimensional data without relying on ground truth by integrating five robust metrics, including a novel time-ordered Markov-based structural metric, into a semi-supervised hierarchical Bayesian model. The framework assesses method accuracy and considers interaction effects among metric features. We apply MIBCOVIS using linear and nonlinear dimensionality reduction methods to evaluate optimal DSVI for four distinct dynamic and spatial biological processes captured by three single-cell data modalities: CyTOF, scRNA-seq and CODEX. These data vary in complexity based on feature dimensionality, unknown cell types and dynamic or spatial differences. Unlike traditional single-summary score approaches, MIBCOVIS compares accuracy distributions across methods. Our findings underscore the joint evaluation of visualization and interpretability, rather than relying on separate metrics. We reveal that prioritizing average performance can obscure method feature performance. Additionally, we explore the impact of data complexity on visualization and interpretability. Specifically, we provide optimal parameters and features and recommend methods, like the optimized variational contractive autoencoder, for targeted DSVI for various data complexities. MIBCOVIS shows promise for evaluating dynamic single-cell atlases and spatiotemporal data reduction models.


Assuntos
Benchmarking , Análise de Célula Única , Teorema de Bayes , Análise de Célula Única/métodos
15.
Proc Natl Acad Sci U S A ; 119(26): e2113651119, 2022 06 28.
Artigo em Inglês | MEDLINE | ID: mdl-35737842

RESUMO

The high-dimensional character of most biological systems presents genuine challenges for modeling and prediction. Here we propose a neural network-based approach for dimensionality reduction and analysis of biological gene expression data, using, as a case study, a well-known genetic network in the early Drosophila embryo, the gap gene patterning system. We build an autoencoder compressing the dynamics of spatial gap gene expression into a two-dimensional (2D) latent map. The resulting 2D dynamics suggests an almost linear model, with a small bare set of essential interactions. Maternally defined spatial modes control gap genes positioning, without the classically assumed intricate set of repressive gap gene interactions. This, surprisingly, predicts minimal changes of neighboring gap domains when knocking out gap genes, consistent with previous observations. Latent space geometries in maternal mutants are also consistent with the existence of such spatial modes. Finally, we show how positional information is well defined and interpretable as a polar angle in latent space. Our work illustrates how optimization of small neural networks on medium-sized biological datasets is sufficiently informative to capture essential underlying mechanisms of network function.


Assuntos
Proteínas de Drosophila , Redes Reguladoras de Genes , Redes Neurais de Computação , Animais , Drosophila/embriologia , Drosophila/genética , Proteínas de Drosophila/genética , Proteínas de Drosophila/metabolismo , Modelos Genéticos
16.
Biophys J ; 2024 Jun 25.
Artigo em Inglês | MEDLINE | ID: mdl-38932456

RESUMO

Biomolecules often exhibit complex free energy landscapes in which long-lived metastable states are separated by large energy barriers. Overcoming these barriers to robustly sample transitions between the metastable states with classical molecular dynamics (MD) simulations presents a challenge. To circumvent this issue, collective variable (CV)-based enhanced sampling MD approaches are often employed. Traditional CV selection relies on intuition and prior knowledge of the system. This approach introduces bias, which can lead to incomplete mechanistic insights. Thus, automated CV detection is desired to gain a deeper understanding of the system/process. Analysis of MD data with various machine learning algorithms, such as Principal Component Analysis (PCA), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA)-based approaches have been implemented for automated CV detection. However, their performance has not been systematically evaluated on structurally and mechanistically complex biological systems. Here, we applied these methods to MD simulations of the MFSD2A (Major Facilitator Superfamily Domain 2A) lysolipid transporter in multiple functionally relevant metastable states with the goal of identifying optimal CVs that would structurally discriminate these states. Specific emphasis was on the automated detection and interpretive power of LDA-based CVs. We found that LDA methods, which included a novel gradient descent-based multiclass harmonic variant, termed GDHLDA, we developed here, outperform PCA in class separation, exhibiting remarkable consistency in extracting CVs critical for distinguishing metastable states. Furthermore, the identified CVs included features previously associated with conformational transitions in MFSD2A. Specifically, conformational shifts in transmembrane helix 7 and in residue Y294 on this helix emerged as critical features discriminating the metastable states in MFSD2A. This highlights the effectiveness of LDA-based approaches in automatically extracting from MD trajectories CVs of functional relevance that can be used to drive biased MD simulations to efficiently sample conformational transitions in the molecular system.

17.
J Neurosci ; 43(29): 5350-5364, 2023 07 19.
Artigo em Inglês | MEDLINE | ID: mdl-37217308

RESUMO

A sentence is more than the sum of its words: its meaning depends on how they combine with one another. The brain mechanisms underlying such semantic composition remain poorly understood. To shed light on the neural vector code underlying semantic composition, we introduce two hypotheses: (1) the intrinsic dimensionality of the space of neural representations should increase as a sentence unfolds, paralleling the growing complexity of its semantic representation; and (2) this progressive integration should be reflected in ramping and sentence-final signals. To test these predictions, we designed a dataset of closely matched normal and jabberwocky sentences (composed of meaningless pseudo words) and displayed them to deep language models and to 11 human participants (5 men and 6 women) monitored with simultaneous MEG and intracranial EEG. In both deep language models and electrophysiological data, we found that representational dimensionality was higher for meaningful sentences than jabberwocky. Furthermore, multivariate decoding of normal versus jabberwocky confirmed three dynamic patterns: (1) a phasic pattern following each word, peaking in temporal and parietal areas; (2) a ramping pattern, characteristic of bilateral inferior and middle frontal gyri; and (3) a sentence-final pattern in left superior frontal gyrus and right orbitofrontal cortex. These results provide a first glimpse into the neural geometry of semantic integration and constrain the search for a neural code of linguistic composition.SIGNIFICANCE STATEMENT Starting from general linguistic concepts, we make two sets of predictions in neural signals evoked by reading multiword sentences. First, the intrinsic dimensionality of the representation should grow with additional meaningful words. Second, the neural dynamics should exhibit signatures of encoding, maintaining, and resolving semantic composition. We successfully validated these hypotheses in deep neural language models, artificial neural networks trained on text and performing very well on many natural language processing tasks. Then, using a unique combination of MEG and intracranial electrodes, we recorded high-resolution brain data from human participants while they read a controlled set of sentences. Time-resolved dimensionality analysis showed increasing dimensionality with meaning, and multivariate decoding allowed us to isolate the three dynamical patterns we had hypothesized.


Assuntos
Encéfalo , Idioma , Masculino , Humanos , Feminino , Encéfalo/fisiologia , Semântica , Linguística , Mapeamento Encefálico/métodos , Leitura , Imageamento por Ressonância Magnética/métodos
18.
BMC Bioinformatics ; 25(1): 171, 2024 Apr 30.
Artigo em Inglês | MEDLINE | ID: mdl-38689234

RESUMO

BACKGROUND: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods. RESULTS: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing data. Our results show that data transformations strongly influence the results of single-cell clustering on low-dimensional data space, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space significantly affect trajectory analysis using multiple datasets, as well. However, the performance of the data transformations greatly varies across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and proto-typical networks. Data transformation also strongly affects the outcome of deep neural network models. CONCLUSIONS: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.


Assuntos
RNA-Seq , Análise de Célula Única , Análise de Célula Única/métodos , RNA-Seq/métodos , Análise por Conglomerados , Humanos , Análise de Sequência de RNA/métodos , Algoritmos , Redes Neurais de Computação , Análise da Expressão Gênica de Célula Única
19.
BMC Bioinformatics ; 25(1): 167, 2024 Apr 26.
Artigo em Inglês | MEDLINE | ID: mdl-38671342

RESUMO

BACKGROUND: Numerous transcriptomic-based models have been developed to predict or understand the fundamental mechanisms driving biological phenotypes. However, few models have successfully transitioned into clinical practice due to challenges associated with generalizability and interpretability. To address these issues, researchers have turned to dimensionality reduction methods and have begun implementing transfer learning approaches. METHODS: In this study, we aimed to determine the optimal combination of dimensionality reduction and regularization methods for predictive modeling. We applied seven dimensionality reduction methods to various datasets, including two supervised methods (linear optimal low-rank projection and low-rank canonical correlation analysis), two unsupervised methods [principal component analysis and consensus independent component analysis (c-ICA)], and three methods [autoencoder (AE), adversarial variational autoencoder, and c-ICA] within a transfer learning framework, trained on > 140,000 transcriptomic profiles. To assess the performance of the different combinations, we used a cross-validation setup encapsulated within a permutation testing framework, analyzing 30 different transcriptomic datasets with binary phenotypes. Furthermore, we included datasets with small sample sizes and phenotypes of varying degrees of predictability, and we employed independent datasets for validation. RESULTS: Our findings revealed that regularized models without dimensionality reduction achieved the highest predictive performance, challenging the necessity of dimensionality reduction when the primary goal is to achieve optimal predictive performance. However, models using AE and c-ICA with transfer learning for dimensionality reduction showed comparable performance, with enhanced interpretability and robustness of predictors, compared to models using non-dimensionality-reduced data. CONCLUSION: These findings offer valuable insights into the optimal combination of strategies for enhancing the predictive performance, interpretability, and generalizability of transcriptomic-based models.


Assuntos
Fenótipo , Transcriptoma , Transcriptoma/genética , Humanos , Perfilação da Expressão Gênica/métodos , Aprendizado de Máquina , Biologia Computacional/métodos , Algoritmos , Análise de Componente Principal
20.
J Proteome Res ; 2024 May 01.
Artigo em Inglês | MEDLINE | ID: mdl-38690713

RESUMO

Spatial segmentation is an essential processing method for image analysis aiming to identify the characteristic suborgans or microregions from mass spectrometry imaging (MSI) data, which is critical for understanding the spatial heterogeneity of biological information and function and the underlying molecular signatures. Due to the intrinsic characteristics of MSI data including spectral nonlinearity, high-dimensionality, and large data size, the common segmentation methods lack the capability for capturing the accurate microregions associated with biological functions. Here we proposed an ensemble learning-based spatial segmentation strategy, named eLIMS, that combines a randomized unified manifold approximation and projection (r-UMAP) dimensionality reduction module for extracting significant features and an ensemble pixel clustering module for aggregating the clustering maps from r-UMAP. Three MSI datasets are used to evaluate the performance of eLIMS, including mouse fetus, human adenocarcinoma, and mouse brain. Experimental results demonstrate that the proposed method has potential in partitioning the heterogeneous tissues into several subregions associated with anatomical structure, i.e., the suborgans of the brain region in mouse fetus data are identified as dorsal pallium, midbrain, and brainstem. Furthermore, it effectively discovers critical microregions related to physiological and pathological variations offering new insight into metabolic heterogeneity.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA