1.
Cell ; 185(6): 1065-1081.e23, 2022 03 17.
Article in English | MEDLINE | ID: mdl-35245431

ABSTRACT

Motor behaviors are often planned long before execution but only released after specific sensory events. Planning and execution are each associated with distinct patterns of motor cortex activity. Key questions are how these dynamic activity patterns are generated and how they relate to behavior. Here, we investigate the multi-regional neural circuits that link an auditory "Go cue" and the transition from planning to execution of directional licking. Ascending glutamatergic neurons in the midbrain reticular and pedunculopontine nuclei show short latency and phasic changes in spike rate that are selective for the Go cue. This signal is transmitted via the thalamus to the motor cortex, where it triggers a rapid reorganization of motor cortex state from planning-related activity to a motor command, which in turn drives appropriate movement. Our studies show how midbrain can control cortical dynamics via the thalamus for rapid and precise motor behavior.


Subjects
Motor Cortex; Movement; Thalamus; Animals; Mesencephalon; Mice; Motor Cortex/physiology; Neurons/physiology; Thalamus/physiology
2.
Cell ; 183(4): 954-967.e21, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33058757

ABSTRACT

The curse of dimensionality plagues models of reinforcement learning and decision making. The process of abstraction solves this by constructing variables describing features shared by different instances, reducing dimensionality and enabling generalization in novel situations. Here, we characterized neural representations in monkeys performing a task described by different hidden and explicit variables. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training, which requires a particular geometry of neural representations. Neural ensembles in prefrontal cortex, hippocampus, and simulated neural networks simultaneously represented multiple variables in a geometry reflecting abstraction but that still allowed a linear classifier to decode a large number of other variables (high shattering dimensionality). Furthermore, this geometry changed in relation to task events and performance. These findings elucidate how the brain and artificial systems represent variables in an abstract format while preserving the advantages conferred by high shattering dimensionality.


Subjects
Hippocampus/anatomy & histology; Prefrontal Cortex/anatomy & histology; Animals; Behavior, Animal; Brain Mapping; Computer Simulation; Hippocampus/physiology; Learning; Macaca mulatta; Male; Models, Neurological; Neural Networks, Computer; Neurons/physiology; Prefrontal Cortex/physiology; Reinforcement, Psychology; Task Performance and Analysis
3.
Annu Rev Neurosci ; 45: 249-271, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35316610

ABSTRACT

The brain plans and executes volitional movements. The underlying patterns of neural population activity have been explored in the context of movements of the eyes, limbs, tongue, and head in nonhuman primates and rodents. How do networks of neurons produce the slow neural dynamics that prepare specific movements and the fast dynamics that ultimately initiate these movements? Recent work exploits rapid and calibrated perturbations of neural activity to test specific dynamical systems models that are capable of producing the observed neural activity. These joint experimental and computational studies show that cortical dynamics during motor planning reflect fixed points of neural activity (attractors). Subcortical control signals reshape and move attractors over multiple timescales, causing commitment to specific actions and rapid transitions to movement execution. Experiments in rodents are beginning to reveal how these algorithms are implemented at the level of brain-wide neural circuits.


Subjects
Motor Cortex; Algorithms; Animals; Brain/physiology; Motor Cortex/physiology; Movement/physiology; Neurons/physiology
4.
Proc Natl Acad Sci U S A ; 121(12): e2317284121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38478692

ABSTRACT

Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the "gold standard" for representing the global diversity of SARS-CoV-2 and identifying newly emerging lineages. However, these methods are computationally expensive, struggle when datasets get too large, and require manual curation to designate new lineages. These challenges motivate the development of complementary methods that can incorporate all of the genetic data available without down-sampling to extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of using algorithmic approaches based on word-statistics to represent whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not serving as a substitute for current phylogenetic analyses, the proposed methods can be used as a complementary, and fully automatable, approach to identify and confirm new emerging variants.


Subjects
COVID-19; SARS-CoV-2; Humans; SARS-CoV-2/genetics; COVID-19/epidemiology; Phylogeny; Machine Learning
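
The abstract of entry 4 describes representing whole sequences through word statistics. As a rough illustration of that general idea only, the sketch below counts overlapping k-mers and compares sequences by cosine distance. The choice of k = 3, the cosine distance, the function names, and the toy sequences are all assumptions made for this example; the abstract does not specify the authors' actual representation or metric.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers ("words") in a nucleotide sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two k-mer count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Toy sequences invented for illustration; these are not real SARS-CoV-2 data.
reference = "ATGGCTAGCTAGGATCCGATCGATTACG"
variant = "ATGGCTAGCTAGGATCCGATCGTTTACG"    # one substitution
unrelated = "CCCCGGGGTTTTAAAACCCCGGGGTTTT"

d_variant = cosine_distance(kmer_profile(reference), kmer_profile(variant))
d_unrelated = cosine_distance(kmer_profile(reference), kmer_profile(unrelated))
print(f"reference vs variant:   {d_variant:.3f}")
print(f"reference vs unrelated: {d_unrelated:.3f}")
```

Because each profile is built in a single pass and needs no alignment, a representation of this kind scales linearly with sequence length, which is the sort of speed advantage the abstract points to.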
5.
Proc Natl Acad Sci U S A ; 121(45): e2417688121, 2024 Nov 05.
Article in English | MEDLINE | ID: mdl-39475648

ABSTRACT

Mining of electronic health records (EHR) promises to automate the identification of comprehensive disease phenotypes. However, the realization of this promise is hindered by the unavailability of generalizable ground-truth information, data incompleteness and heterogeneity, and the lack of generalization to multiple cohorts. We present here a data-driven approach to identify clinical states that we implement for 585 critical care patients with suspected pneumonia recruited by the SCRIPT study, which we compare to and integrate with 9,918 pneumonia patients from the MIMIC-IV dataset. We extract and curate from their structured EHRs a primary set of clinical features (53 and 59 features for SCRIPT and MIMIC-IV, respectively), including disease severity scores, vital signs, and so on, at various degrees of completeness. We aggregate irregular time series into daily frequency, resulting in 12,495 and 94,684 patient-day pairs for SCRIPT and MIMIC, respectively. We define a "common-sense" ground truth that we then use in a semisupervised pipeline to optimize choices for data preprocessing, and reduce the feature space to four principal components. We describe and validate an ensemble-based clustering method that enables us to robustly identify five clinical states, and use a Gaussian mixture model to quantify uncertainty in cluster assignment. Demonstrating the clinical relevance of the identified states, we find that three states are strongly associated with disease outcomes (dying vs. recovering), while the other two reflect disease etiology. The outcome associated clinical states provide significantly increased discrimination of mortality rates over standard approaches.


Subjects
Data Mining; Electronic Health Records; Pneumonia; Humans; Pneumonia/mortality; Pneumonia/epidemiology; Data Mining/methods; Male; Female; Cluster Analysis
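
Entry 5 mentions aggregating irregular time series into daily frequency to obtain patient-day pairs. A minimal sketch of that single step follows, assuming a daily mean as the aggregation function (the abstract does not say which statistic was used). The record layout, feature names, and values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_daily(records):
    """Average irregular measurements into one value per patient, day and feature.

    `records` is an iterable of (patient_id, timestamp, feature, value) tuples.
    Returns {(patient_id, date): {feature: daily_mean}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for patient_id, timestamp, feature, value in records:
        buckets[(patient_id, timestamp.date())][feature].append(value)
    return {
        key: {feature: mean(values) for feature, values in features.items()}
        for key, features in buckets.items()
    }


# Hypothetical measurements; identifiers and values are invented.
records = [
    ("p1", datetime(2024, 1, 1, 8, 15), "heart_rate", 88.0),
    ("p1", datetime(2024, 1, 1, 19, 40), "heart_rate", 96.0),
    ("p1", datetime(2024, 1, 1, 9, 0), "temperature", 38.2),
    ("p1", datetime(2024, 1, 2, 7, 30), "heart_rate", 80.0),
    ("p2", datetime(2024, 1, 1, 12, 0), "heart_rate", 70.0),
]

patient_days = aggregate_daily(records)
for (patient_id, day), features in sorted(patient_days.items()):
    print(patient_id, day, features)
```

Features that were not measured on a given day are simply absent from that patient-day, which mirrors the varying degrees of completeness the abstract describes.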
6.
Proc Natl Acad Sci U S A ; 121(35): e2400082121, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39178232

ABSTRACT

To efficiently yet reliably represent and process information, our brains need to produce information-rich signals that differentiate between moments or cognitive states, while also being robust to noise or corruption. For many, though not all, natural systems, these two properties are often inversely related: More information-rich signals are less robust, and vice versa. Here, we examined how these properties change with ongoing cognitive demands. To this end, we applied dimensionality reduction algorithms and pattern classifiers to functional neuroimaging data collected as participants listened to a story, temporally scrambled versions of the story, or underwent a resting state scanning session. We considered two primary aspects of the neural data recorded in these different experimental conditions. First, we treated the maximum achievable decoding accuracy across participants as an indicator of the "informativeness" of the recorded patterns. Second, we treated the number of features (components) required to achieve a threshold decoding accuracy as a proxy for the "compressibility" of the neural patterns (where fewer components indicate greater compression). Overall, we found that the peak decoding accuracy (achievable without restricting the numbers of features) was highest in the intact (unscrambled) story listening condition. However, the number of features required to achieve comparable classification accuracy was also lowest in the intact story listening condition. Taken together, our work suggests that our brain networks flexibly reconfigure according to ongoing task demands and that the activity patterns associated with higher-order cognition and high engagement are both more informative and more compressible than the activity patterns associated with lower-order tasks and lower engagement.


Subjects
Brain; Cognition; Magnetic Resonance Imaging; Humans; Cognition/physiology; Brain/physiology; Brain/diagnostic imaging; Male; Female; Adult; Magnetic Resonance Imaging/methods; Brain Mapping/methods; Young Adult; Algorithms
7.
Proc Natl Acad Sci U S A ; 121(10): e2319491121, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38427601

ABSTRACT

Translocation of cytoplasmic molecules to the plasma membrane is commonplace in cell signaling. Membrane localization has been hypothesized to increase intermolecular association rates; however, it has also been argued that association should be faster in the cytosol because membrane diffusion is slow. Here, we directly compare an identical association reaction, the binding of complementary DNA strands, in solution and on supported membranes. The measured rate constants show that for a 10-µm-radius spherical cell, association is 22- to 33-fold faster at the membrane than in the cytoplasm. The kinetic advantage depends on cell size and is essentially negligible for typical ~1 µm prokaryotic cells. The rate enhancement is attributable to a combination of higher encounter rates in two dimensions and a higher reaction probability per encounter.


Subjects
Signal Transduction; Cytoplasm/metabolism; Cell Membrane/metabolism; Cytosol/metabolism; Membranes; Kinetics
8.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39327063

ABSTRACT

Dimensionality reduction and clustering are crucial tasks in single-cell RNA sequencing (scRNA-seq) data analysis, but they are usually treated independently, which prevents each from benefiting the other. The latest methods jointly optimize these tasks through deep clustering. However, common deep clustering methods require pre-defined cluster centers, a gap that contrastive learning, with its powerful representation capability, can bridge. Therefore, a dual-level contrastive clustering method with nonuniform sampling (nsDCC) is proposed for scRNA-seq data analysis. Dual-level contrastive clustering, which combines instance-level contrast and cluster-level contrast, jointly optimizes dimensionality reduction and clustering. Multi-positive contrastive learning and unit matrix constraint are introduced in instance- and cluster-level contrast, respectively. Furthermore, the attention mechanism is introduced to capture inter-cellular information, which is beneficial for clustering. The nsDCC focuses on important samples at category boundaries and in minority categories by the proposed nearest boundary sparsest density weight assignment algorithm, making it capable of capturing comprehensive characteristics against imbalanced datasets. Experimental results show that nsDCC outperforms six other state-of-the-art methods on both real and simulated scRNA-seq data, validating its performance on dimensionality reduction and clustering of scRNA-seq data, especially for imbalanced data. Simulation experiments demonstrate that nsDCC is insensitive to "dropout events" in scRNA-seq. Finally, cluster-level differentially expressed gene analysis confirms that the results from nsDCC are meaningful. In summary, nsDCC is a new way of analyzing and understanding scRNA-seq data.


Subjects
Algorithms; RNA-Seq; Single-Cell Analysis; Single-Cell Analysis/methods; Cluster Analysis; RNA-Seq/methods; Humans; Sequence Analysis, RNA/methods; Computational Biology/methods; Single-Cell Gene Expression Analysis
9.
Trends Immunol ; 44(5): 329-332, 2023 05.
Article in English | MEDLINE | ID: mdl-36997459

ABSTRACT

Profiling immune responses across several dimensions, including time, patients, molecular features, and tissue sites, can deepen our understanding of immunity as an integrated system. These studies require new analytical approaches to realize their full potential. We highlight recent applications of tensor methods and discuss several future opportunities.


Subjects
Communicable Diseases; Immunity; Humans
10.
Proc Natl Acad Sci U S A ; 120(48): e2311420120, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-37988465

ABSTRACT

Principal component analysis (PCA) is a dimensionality reduction method that is known for being simple and easy to interpret. Principal components are often interpreted as low-dimensional patterns in high-dimensional space. However, this simple interpretation fails for timeseries, spatial maps, and other continuous data. In these cases, nonoscillatory data may have oscillatory principal components. Here, we show that two common properties of data cause oscillatory principal components: smoothness and shifts in time or space. These two properties implicate almost all neuroscience data. We show how the oscillations produced by PCA, which we call "phantom oscillations," impact data analysis. We also show that traditional cross-validation does not detect phantom oscillations, so we suggest procedures that do. Our findings are supported by a collection of mathematical proofs. Collectively, our work demonstrates that patterns which emerge from high-dimensional data analysis may not faithfully represent the underlying data.
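
The claim in entry 10 that smooth, shifted, non-oscillatory data can yield sign-changing principal components can be checked on a toy case. The sketch below is an illustration under stated assumptions, not the authors' procedure: the data are Gaussian bumps shifted in time, only the leading component is computed, and it is obtained by plain power iteration. All names and parameter values are chosen for this example.

```python
import math

# Non-oscillatory toy data: one smooth Gaussian bump per trial, shifted in time.
n_time = 81
times = [-4.0 + 8.0 * i / (n_time - 1) for i in range(n_time)]
shifts = [-1.0 + 2.0 * j / 20 for j in range(21)]           # 21 trials
data = [[math.exp(-0.5 * (t - s) ** 2) for t in times] for s in shifts]

# Centre each time point across trials, as PCA does.
col_mean = [sum(row[i] for row in data) / len(data) for i in range(n_time)]
centred = [[row[i] - col_mean[i] for i in range(n_time)] for row in data]


def apply_cov(v):
    """Multiply v by the unnormalised covariance X^T X without forming it."""
    proj = [sum(x * y for x, y in zip(row, v)) for row in centred]
    return [sum(p * row[i] for p, row in zip(proj, centred)) for i in range(n_time)]


def first_component(n_iter=200):
    """Approximate the leading principal component by power iteration."""
    v = [math.sin(1.0 + i) for i in range(n_time)]           # arbitrary start
    for _ in range(n_iter):
        w = apply_cov(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def sign_changes(v, rel_tol=1e-6):
    """Count sign changes, ignoring entries that are negligibly small."""
    peak = max(abs(x) for x in v)
    signs = [x > 0 for x in v if abs(x) > rel_tol * peak]
    return sum(a != b for a, b in zip(signs, signs[1:]))


pc1 = first_component()
print("every trial is non-negative:", all(x >= 0 for row in data for x in row))
print("sign changes in PC1:", sign_changes(pc1))
```

Every input trial is a single positive bump, yet the leading component takes both signs, so a component that swings between positive and negative values need not reflect any oscillation in the underlying data.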

11.
Proc Natl Acad Sci U S A ; 120(40): e2303523120, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37748075

ABSTRACT

Sensorimotor transformation is the process of first sensing an object in the environment and then producing a movement in response to that stimulus. For visually guided saccades, neurons in the superior colliculus (SC) emit a burst of spikes to register the appearance of a stimulus, and many of the same neurons discharge another burst to initiate the eye movement. We investigated whether the neural signatures of sensation and action in SC depend on context. Spiking activity along the dorsoventral axis was recorded with a laminar probe as rhesus monkeys generated saccades to the same stimulus location in tasks that require either executive control to delay saccade onset until permission is granted or the production of an immediate response to a target whose onset is predictable. Using dimensionality reduction and discriminability methods, we show that the subspaces occupied during the visual and motor epochs were both distinct within each task and differentiable across tasks. Single-unit analyses, in contrast, show that the movement-related activity of SC neurons was not different between tasks. These results demonstrate that statistical features in neural activity of simultaneously recorded ensembles provide more insight than single neurons. They also indicate that cognitive processes associated with task requirements are multiplexed in SC population activity during both sensation and action and that downstream structures could use this activity to extract context. Additionally, the entire manifolds associated with sensory and motor responses, respectively, may be larger than the subspaces explored within a certain set of experiments.


Subjects
Body Fluids; Superior Colliculi; Animals; Eye Movements; Neurons; Macaca mulatta; Sensation
12.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38018908

ABSTRACT

Multi-omic analyses are necessary to understand the complex biological processes taking place at the tissue and cell level, but also to make reliable predictions about, for example, disease outcome. Several linear methods exist that create a joint embedding using paired information per sample, but recently there has been a rise in the popularity of neural architectures that embed paired -omics into the same non-linear manifold. This work describes a head-to-head comparison of linear and non-linear joint embedding methods using both bulk and single-cell multi-modal datasets. We found that non-linear methods have a clear advantage with respect to linear ones for missing modality imputation. Performance comparisons in the downstream tasks of survival analysis for bulk tumor data and cell type classification for single-cell data lead to the following insights: First, concatenating the principal components of each modality is a competitive baseline and hard to beat if all modalities are available at test time. However, if we only have one modality available at test time, training a predictive model on the joint space of that modality can lead to performance improvements with respect to just using the unimodal principal components. Second, -omic profiles imputed by neural joint embedding methods are realistic enough to be used by a classifier trained on real data with limited performance drops. Taken together, our comparisons give hints to which joint embedding to use for which downstream task. Overall, product-of-experts performed well in most tasks and was reasonably fast, while early integration (concatenation) of modalities did quite poorly.


Subjects
Multiomics; Neoplasms; Humans
13.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37418278

ABSTRACT

Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.


Subjects
Molecular Dynamics Simulation; Proteins; Protein Conformation; Proteins/chemistry
14.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36458451

ABSTRACT

In epistasis analysis, single-nucleotide polymorphism-single-nucleotide polymorphism interactions (SSIs) among genes may, alongside other environmental factors, influence the risk of multifactorial diseases. To identify SSI between cases and controls (i.e. binary traits), the score for model quality is affected by different objective functions (i.e. measurements) because of potential disease model preferences and disease complexities. Our previous study proposed a multiobjective approach-based multifactor dimensionality reduction (MOMDR), with the results indicating that two objective functions could enhance SSI identification with weak marginal effects. However, SSI identification using MOMDR remains a challenge because the optimal measure combination of objective functions has yet to be investigated. This study extended MOMDR to the many-objective version (i.e. many-objective MDR, MaODR) by integrating various disease probability measures based on a two-way contingency table to improve the identification of SSI between cases and controls. We introduced an objective function selection approach to determine the optimal measure combination in MaODR among 10 well-known measures. In total, 6 disease models with and 40 disease models without marginal effects were used to evaluate the general algorithms, namely those based on multifactor dimensionality reduction, MOMDR and MaODR. Our results revealed that the MaODR-based three-objective-function model combining correct classification rate, likelihood ratio and normalized mutual information (MaODR-CLN) achieved detection success rates (accuracy) 6.47% higher than MOMDR and 17.23% higher than MDR through the application of an objective function selection approach. In a Wellcome Trust Case Control Consortium dataset, MaODR-CLN successfully identified the significant SSIs (P < 0.001) associated with coronary artery disease. We performed a systematic analysis to identify the optimal measure combination in MaODR among 10 objective functions. Our combination detected SSIs in binary traits with weak marginal effects and thus reduced spurious variables in the score model. MOAI is freely available at https://sites.google.com/view/maodr/home.


Subjects
Epistasis, Genetic; Models, Genetic; Algorithms; Phenotype; Multifactor Dimensionality Reduction/methods; Polymorphism, Single Nucleotide
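
Entry 14 names the correct classification rate (CCR), computed from a two-way contingency table, as one of its objective functions. The sketch below shows one way such a score can be obtained for a single SNP pair in the style of classic multifactor dimensionality reduction: each genotype combination is labelled high or low risk by comparing its case:control ratio with the overall ratio. That thresholding rule, the function name, and the toy data are assumptions for illustration; the abstract does not spell out the exact procedure used in MaODR.

```python
from collections import Counter


def mdr_correct_classification_rate(genotypes, labels):
    """One MDR-style scoring step for a pair of SNPs.

    genotypes: list of (snp_a, snp_b) tuples, each coded 0, 1 or 2.
    labels:    list of 1 (case) or 0 (control).
    Returns (high_risk_cells, correct_classification_rate).
    """
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    n_cases, n_controls = sum(cases.values()), sum(controls.values())

    # A genotype cell is "high risk" when its case:control ratio exceeds the
    # overall ratio; cross-multiplied so that no division by zero can occur.
    high_risk = {
        cell for cell in set(cases) | set(controls)
        if cases[cell] * n_controls > controls[cell] * n_cases
    }

    # 2x2 contingency table of predicted risk group versus true status.
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = sum(n for c, n in controls.items() if c not in high_risk)
    ccr = (true_pos + true_neg) / (n_cases + n_controls)
    return high_risk, ccr


# Invented toy data: ten individuals genotyped at two SNPs.
genotypes = [(0, 0), (0, 0), (1, 1), (1, 1), (1, 1),
             (2, 0), (0, 0), (1, 1), (2, 0), (2, 0)]
labels = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]

high_risk, ccr = mdr_correct_classification_rate(genotypes, labels)
print("high-risk genotype cells:", sorted(high_risk))
print("correct classification rate:", ccr)
```

The same 2x2 table could feed other measures, such as a likelihood ratio or normalized mutual information, which is how a many-objective variant would score the same SNP pair from several angles.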
15.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37088976

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a revolutionary breakthrough that measures gene expression in individual cells and deciphers cell heterogeneity and subpopulations. However, scRNA-seq data are much noisier than traditional high-throughput RNA-seq data because of technical limitations, so many scRNA-seq studies of dimensionality reduction and visualization remain at the basic data-stacking stage. In this study, we propose an improved variational autoencoder model (termed DREAM) for dimensionality reduction and a visual analysis of scRNA-seq data. Here, DREAM combines the variational autoencoder and Gaussian mixture model for cell type identification, while explicitly handling 'dropout' events by introducing the zero-inflated layer to obtain the low-dimensional representation that describes the changes in the original scRNA-seq dataset. Benchmarking comparisons across nine scRNA-seq datasets show that DREAM outperforms four state-of-the-art methods on average. Moreover, we prove that DREAM can accurately capture the expression dynamics of human preimplantation embryonic development. DREAM is implemented in Python and is freely available on GitHub at https://github.com/Crystal-JJ/DREAM.


Subjects
Single-Cell Analysis; Single-Cell Gene Expression Analysis; Humans; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; RNA-Seq; Gene Expression Profiling/methods; Cluster Analysis
16.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36946414

ABSTRACT

In an era of constantly increasing amounts of available protein data, relevant and interpretable visualization becomes crucial, especially for tasks requiring human expertise. Poincaré disk projection has previously proved effective for visualizing biological data such as single-cell RNA-seq data. Here, we develop a new method, PoincaréMSA, for visual representation of complex relationships between protein sequences based on Poincaré maps embedding. We demonstrate its efficiency and potential for visualization of protein family topology as well as evolutionary and functional annotation of uncharacterized sequences. PoincaréMSA is implemented in open-source Python code with interactive Google Colab notebooks available, as described at https://www.dsimb.inserm.fr/POINCARE_MSA.


Subjects
Proteins; Software; Humans; Amino Acid Sequence; Biological Evolution
17.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38113074

ABSTRACT

Optimizing and benchmarking data reduction methods for dynamic or spatial visualization and interpretation (DSVI) face challenges due to many factors, including data complexity, lack of ground truth, time-dependent metrics, dimensionality bias and different visual mappings of the same data. Current studies often focus on independent static visualization or interpretability metrics that require ground truth. To overcome this limitation, we propose the MIBCOVIS framework, a comprehensive and interpretable benchmarking and computational approach. MIBCOVIS enhances the visualization and interpretability of high-dimensional data without relying on ground truth by integrating five robust metrics, including a novel time-ordered Markov-based structural metric, into a semi-supervised hierarchical Bayesian model. The framework assesses method accuracy and considers interaction effects among metric features. We apply MIBCOVIS using linear and nonlinear dimensionality reduction methods to evaluate optimal DSVI for four distinct dynamic and spatial biological processes captured by three single-cell data modalities: CyTOF, scRNA-seq and CODEX. These data vary in complexity based on feature dimensionality, unknown cell types and dynamic or spatial differences. Unlike traditional single-summary score approaches, MIBCOVIS compares accuracy distributions across methods. Our findings underscore the joint evaluation of visualization and interpretability, rather than relying on separate metrics. We reveal that prioritizing average performance can obscure method feature performance. Additionally, we explore the impact of data complexity on visualization and interpretability. Specifically, we provide optimal parameters and features and recommend methods, like the optimized variational contractive autoencoder, for targeted DSVI for various data complexities. MIBCOVIS shows promise for evaluating dynamic single-cell atlases and spatiotemporal data reduction models.


Subjects
Benchmarking; Single-Cell Analysis; Bayes Theorem; Single-Cell Analysis/methods
18.
Proc Natl Acad Sci U S A ; 119(26): e2113651119, 2022 06 28.
Article in English | MEDLINE | ID: mdl-35737842

ABSTRACT

The high-dimensional character of most biological systems presents genuine challenges for modeling and prediction. Here we propose a neural network-based approach for dimensionality reduction and analysis of biological gene expression data, using, as a case study, a well-known genetic network in the early Drosophila embryo, the gap gene patterning system. We build an autoencoder compressing the dynamics of spatial gap gene expression into a two-dimensional (2D) latent map. The resulting 2D dynamics suggests an almost linear model, with a small bare set of essential interactions. Maternally defined spatial modes control gap genes positioning, without the classically assumed intricate set of repressive gap gene interactions. This, surprisingly, predicts minimal changes of neighboring gap domains when knocking out gap genes, consistent with previous observations. Latent space geometries in maternal mutants are also consistent with the existence of such spatial modes. Finally, we show how positional information is well defined and interpretable as a polar angle in latent space. Our work illustrates how optimization of small neural networks on medium-sized biological datasets is sufficiently informative to capture essential underlying mechanisms of network function.


Subjects
Drosophila Proteins; Gene Regulatory Networks; Neural Networks, Computer; Animals; Drosophila/embryology; Drosophila/genetics; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Models, Genetic
19.
J Neurosci ; 43(29): 5350-5364, 2023 07 19.
Article in English | MEDLINE | ID: mdl-37217308

ABSTRACT

A sentence is more than the sum of its words: its meaning depends on how they combine with one another. The brain mechanisms underlying such semantic composition remain poorly understood. To shed light on the neural vector code underlying semantic composition, we introduce two hypotheses: (1) the intrinsic dimensionality of the space of neural representations should increase as a sentence unfolds, paralleling the growing complexity of its semantic representation; and (2) this progressive integration should be reflected in ramping and sentence-final signals. To test these predictions, we designed a dataset of closely matched normal and jabberwocky sentences (composed of meaningless pseudo words) and displayed them to deep language models and to 11 human participants (5 men and 6 women) monitored with simultaneous MEG and intracranial EEG. In both deep language models and electrophysiological data, we found that representational dimensionality was higher for meaningful sentences than jabberwocky. Furthermore, multivariate decoding of normal versus jabberwocky confirmed three dynamic patterns: (1) a phasic pattern following each word, peaking in temporal and parietal areas; (2) a ramping pattern, characteristic of bilateral inferior and middle frontal gyri; and (3) a sentence-final pattern in left superior frontal gyrus and right orbitofrontal cortex. These results provide a first glimpse into the neural geometry of semantic integration and constrain the search for a neural code of linguistic composition.

SIGNIFICANCE STATEMENT: Starting from general linguistic concepts, we make two sets of predictions in neural signals evoked by reading multiword sentences. First, the intrinsic dimensionality of the representation should grow with additional meaningful words. Second, the neural dynamics should exhibit signatures of encoding, maintaining, and resolving semantic composition. We successfully validated these hypotheses in deep neural language models, artificial neural networks trained on text and performing very well on many natural language processing tasks. Then, using a unique combination of MEG and intracranial electrodes, we recorded high-resolution brain data from human participants while they read a controlled set of sentences. Time-resolved dimensionality analysis showed increasing dimensionality with meaning, and multivariate decoding allowed us to isolate the three dynamical patterns we had hypothesized.


Subjects
Brain; Language; Male; Humans; Female; Brain/physiology; Semantics; Linguistics; Brain Mapping/methods; Reading; Magnetic Resonance Imaging/methods
20.
BMC Bioinformatics ; 25(1): 171, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38689234

ABSTRACT

BACKGROUND: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods. RESULTS: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing data. Our results show that data transformations strongly influence the results of single-cell clustering on low-dimensional data space, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space significantly affect trajectory analysis using multiple datasets, as well. However, the performance of the data transformations greatly varies across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of deep neural network models. CONCLUSIONS: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.


Subjects
RNA-Seq; Single-Cell Gene Expression Analysis; Humans; Algorithms; Cluster Analysis; Neural Networks, Computer; RNA-Seq/methods; Single-Cell Gene Expression Analysis/methods