Results 1 - 20 of 24
1.
bioRxiv ; 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38746298

ABSTRACT

The two-dimensional embedding methods t-SNE and UMAP are ubiquitously used for visualizing single-cell data. Recent theoretical research in machine learning has shown that, despite their very different formulation and implementation, t-SNE and UMAP are closely connected, and a single parameter suffices to interpolate between them. This leads to a whole spectrum of visualization methods that focus on different aspects of the data. Along the spectrum, this focus changes from representing local structures to representing continuous ones. In the single-cell context, this leads to a trade-off between highlighting rare cell types or continuous variation, such as developmental trajectories. Visualizing the entire spectrum as an animation can provide a more nuanced understanding of the high-dimensional dataset than individual visualizations with either t-SNE or UMAP.
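A minimal sketch of how one parameter can slide a neighbor embedding along this spectrum, assuming (our assumption, not a statement of the paper's method) that the parameter is an attraction multiplier ρ, often called exaggeration, applied to the affinities p_ij in the t-SNE gradient. With ρ = 1 the function is the standard t-SNE gradient; larger ρ strengthens attraction relative to repulsion. The function name is hypothetical.

```python
import numpy as np


def exaggerated_tsne_gradient(P, Y, rho=1.0):
    """Gradient of the t-SNE loss with attractive forces scaled by rho.

    P   : (n, n) symmetric affinities with zero diagonal, summing to 1
    Y   : (n, 2) current low-dimensional embedding
    rho : attraction multiplier ("exaggeration"); rho = 1 is plain t-SNE
    """
    diff = Y[:, None, :] - Y[None, :, :]            # pairwise y_i - y_j
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))    # Cauchy kernel w_ij
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()                                 # normalized similarities q_ij
    # Attraction (rho * p_ij) minus repulsion (q_ij), each weighted by w_ij
    coeff = (rho * P - Q) * W
    # Sum over j of coeff_ij * (y_i - y_j), times the usual factor 4
    return 4.0 * np.einsum("ij,ijk->ik", coeff, diff)
```

Sweeping ρ upward from 1 inside a gradient-descent loop, and saving each embedding as a frame, is one way to build the kind of animation described above. Production tools such as openTSNE expose an `exaggeration` setting, so a hand-written gradient like this is mainly useful for understanding the forces involved.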

2.
bioRxiv ; 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38585748

ABSTRACT

A recent paper in PLOS Computational Biology (Chari and Pachter, 2023) claimed that t-SNE and UMAP embeddings of single-cell datasets fail to capture true biological structure. The authors argued that such embeddings are as arbitrary and as misleading as forcing the data into an elephant shape. Here we show that this conclusion was based on inadequate and limited metrics of embedding quality. More appropriate metrics quantifying neighborhood and class preservation reveal the elephant in the room: while t-SNE and UMAP embeddings of single-cell data do not represent high-dimensional distances, they can nevertheless provide biologically relevant information.
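The distinction between distance preservation and neighborhood or class preservation can be made concrete with two simple metrics. This is a sketch assuming brute-force Euclidean distances and majority-vote kNN classification; the function names and the default k = 10 are illustrative choices, not necessarily the exact metrics used in the paper.

```python
import numpy as np


def knn_indices(Z, k):
    """Indices of each point's k nearest neighbors (Euclidean, excluding itself)."""
    D = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]


def knn_recall(X, Y, k=10):
    """Neighborhood preservation: mean fraction of each point's k nearest
    neighbors in X that are also among its k nearest neighbors in Y."""
    nx, ny = knn_indices(X, k), knn_indices(Y, k)
    overlaps = [len(set(a) & set(b)) for a, b in zip(nx, ny)]
    return float(np.mean(overlaps)) / k


def knn_accuracy(Y, labels, k=10):
    """Class preservation: leave-one-out accuracy of a majority-vote kNN
    classifier in the embedding Y."""
    labels = np.asarray(labels)
    correct = 0
    for i, idx in enumerate(knn_indices(Y, k)):
        values, counts = np.unique(labels[idx], return_counts=True)
        correct += int(values[np.argmax(counts)] == labels[i])
    return correct / len(labels)
```

The pairwise-distance matrix needs O(n²) memory, so for large single-cell datasets one would replace `knn_indices` with an approximate nearest-neighbor search.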

3.
bioRxiv ; 2023 Aug 05.
Article in English | MEDLINE | ID: mdl-37577688

ABSTRACT

Before downstream analysis can reveal biological signals in single-cell RNA sequencing data, normalization and variance stabilization are required to remove technical noise. Recently, Pearson residuals based on negative binomial models have been suggested as an efficient normalization approach. These methods were developed for UMI-based sequencing protocols, where unique molecular identifiers (UMIs) help to remove PCR amplification noise by keeping track of the original molecules. In contrast, full-length protocols such as Smart-seq2 lack UMIs and retain amplification noise, making negative binomial models inapplicable. Here, we extend Pearson residuals to such read count data by modeling them as a compound process: we assume that the captured RNA molecules follow the negative binomial distribution, but are replicated according to an amplification distribution. Based on this model, we introduce compound Pearson residuals and show that they can be analytically obtained without explicit knowledge of the amplification distribution. Further, we demonstrate that compound Pearson residuals lead to a biologically meaningful gene selection and low-dimensional embeddings of complex Smart-seq2 datasets. Finally, we empirically study amplification distributions across several sequencing protocols, and suggest that they can be described by a broken power law. We show that the resulting compound distribution captures overdispersion and zero-inflation patterns characteristic of read count data. In summary, compound Pearson residuals provide an efficient and effective way to normalize read count data based on simple mechanistic assumptions.
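Why compound Pearson residuals can be computed without knowing the amplification distribution can be illustrated with the law of total variance. This derivation is a sketch under our own assumptions: the number of captured molecules N is negative binomial with mean μ and overdispersion parameter θ, and each molecule independently yields A_i reads, i.i.d. with mean m_1 = E[A] and second moment m_2 = E[A²]. The notation is ours and may differ from the paper's.

```latex
$$
\begin{aligned}
X &= \sum_{i=1}^{N} A_i, \qquad \mathbb{E}[N] = \mu, \qquad \operatorname{Var}[N] = \mu + \frac{\mu^2}{\theta},\\
\mu_X &\equiv \mathbb{E}[X] = \mathbb{E}[N]\,\mathbb{E}[A] = m_1 \mu,\\
\operatorname{Var}[X] &= \mathbb{E}[N]\operatorname{Var}[A] + \operatorname{Var}[N]\,\mathbb{E}[A]^2
  = \mu\,(m_2 - m_1^2) + \Big(\mu + \frac{\mu^2}{\theta}\Big) m_1^2
  = m_2\,\mu + \frac{m_1^2\,\mu^2}{\theta}
  = \frac{m_2}{m_1}\,\mu_X + \frac{\mu_X^2}{\theta},\\
z &= \frac{x - \mu_X}{\sqrt{\dfrac{m_2}{m_1}\,\mu_X + \dfrac{\mu_X^2}{\theta}}}.
\end{aligned}
$$
```

Only the ratio m_2/m_1 of the amplification distribution enters the variance, which fits the abstract's statement that the residuals do not require explicit knowledge of the amplification distribution. With no amplification (A_i = 1, so m_1 = m_2 = 1) the expression reduces to the usual negative binomial Pearson residual.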

4.
Lancet ; 401(10375): 431-432, 2023 02 11.
Article in English | MEDLINE | ID: mdl-36774147
5.
Nat Commun ; 13(1): 6389, 2022 10 27.
Article in English | MEDLINE | ID: mdl-36302912

ABSTRACT

Neocortical feedback is critical for attention, prediction, and learning. Understanding its function mechanistically requires deciphering its cell-type wiring. Recent studies revealed that feedback from primary motor to primary somatosensory areas in mice is disinhibitory, targeting vasoactive intestinal peptide-expressing interneurons in addition to pyramidal cells. It is unknown whether this circuit motif represents a general organizing principle of cortico-cortical feedback. Here we show that, in contrast to this wiring rule, feedback from the higher-order lateromedial visual area to primary visual cortex preferentially activates somatostatin-expressing interneurons. Functionally, both feedback circuits temporally sharpen feed-forward excitation, eliciting a transient increase, followed by a prolonged decrease, in pyramidal cell activity under sustained feed-forward input. However, under transient feed-forward input, the primary motor to primary somatosensory cortex feedback facilitates bursting, while lateromedial area to primary visual cortex feedback increases temporal precision. Our findings argue for multiple cortico-cortical feedback motifs implementing different dynamic non-linear operations.


Subjects
Interneurons , Pyramidal Cells , Mice , Animals , Feedback , Interneurons/physiology , Vasoactive Intestinal Peptide
6.
Signif (Oxf) ; 19(2): 10-13, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35601695

ABSTRACT

Throughout the Covid-19 pandemic, we have become used to seeing daily numbers of cases and deaths go up and down. But in some countries, the reported numbers show very little movement over days and weeks - they are "underdispersed", says Dmitry Kobak, and this may be a sign that all is not right with the data.
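Underdispersion can be checked by comparing the day-to-day scatter of reported counts with the scatter a Poisson process would produce. The sketch below is a hypothetical approach, not the analysis from the article: it detrends the series with a centred moving average (the 7-day default window is our choice and should be odd) and reports the ratio of residual variance to the Poisson variance.

```python
import numpy as np


def dispersion_index(counts, window=7):
    """Variance of daily counts around a centred moving average, divided by
    the variance a Poisson process with that mean would have.

    Values near 1 are Poisson-like; values far below 1 suggest underdispersion.
    The window should be odd so that each average is centred on one day.
    """
    counts = np.asarray(counts, dtype=float)
    trend = np.convolve(counts, np.ones(window) / window, mode="valid")
    half = window // 2
    centre = counts[half:half + len(trend)]   # days aligned with each window centre
    residuals = centre - trend
    # Each residual shares its own count with the average, which shrinks the
    # residual variance by (1 - 1/window) for independent counts; undo that.
    correction = window / (window - 1)
    return correction * np.mean(residuals ** 2) / np.mean(trend)
```

Weekday reporting effects add variance and push the index up, so a value far below 1 despite such effects is an even stronger warning sign.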

7.
J Neurophysiol ; 127(4): 995-1006, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35196180

ABSTRACT

We investigated motor skill learning using a path tracking task, where human subjects had to track various curved paths at a constant speed while maintaining the cursor within the path width. Subjects' accuracy increased with practice, even when tracking novel untrained paths. Using a "searchlight" paradigm, where only a short segment of the path ahead of the cursor was shown, we found that subjects with a higher tracking skill differed from the novice subjects in two respects. First, they had lower movement variability, in agreement with previous findings. Second, they took a longer section of the future path into account when performing the task, i.e., had a longer planning horizon. We estimate that between one-third and one-half of the performance increase in the expert group was due to the longer planning horizon. An optimal control model with a fixed horizon (receding horizon control) that increases with tracking skill quantitatively captured the subjects' movement behavior. These findings demonstrate that human subjects increase not only their motor acuity but also their planning horizon when acquiring a motor skill.

NEW & NOTEWORTHY We show that when learning a motor skill, humans use information about the environment from an increasingly long section of the movement path ahead to improve performance. Crucial features of the behavioral performance can be captured by modeling the behavioral data with a receding horizon optimal control model.


Subjects
Learning , Motor Skills , Humans , Movement
8.
Genome Biol ; 22(1): 258, 2021 09 06.
Article in English | MEDLINE | ID: mdl-34488842

ABSTRACT

BACKGROUND: Standard preprocessing of single-cell RNA-seq UMI data includes normalization by sequencing depth to remove this source of technical variability, and nonlinear transformation to stabilize the variance across genes with different expression levels. Instead, two recent papers propose to use statistical count models for these tasks: Hafemeister and Satija (Genome Biol 20:296, 2019) recommend using Pearson residuals from negative binomial regression, while Townes et al. (Genome Biol 20:295, 2019) recommend fitting a generalized PCA model. Here, we investigate the connection between these approaches theoretically and empirically, and compare their effects on downstream processing.

RESULTS: We show that the model of Hafemeister and Satija produces noisy parameter estimates because it is overspecified, which is why the original paper employs post hoc smoothing. When specified more parsimoniously, it has a simple analytic solution equivalent to the rank-one Poisson GLM-PCA of Townes et al. Further, our analysis indicates that per-gene overdispersion estimates in Hafemeister and Satija are biased, and that the data are in fact consistent with the overdispersion parameter being independent of gene expression. We then use negative control data without biological variability to estimate the technical overdispersion of UMI counts, and find that across several different experimental protocols, the data are close to Poisson and suggest very moderate overdispersion. Finally, we perform a benchmark to compare the performance of Pearson residuals, variance-stabilizing transformations, and GLM-PCA on scRNA-seq datasets with known ground truth.

CONCLUSIONS: We demonstrate that analytic Pearson residuals strongly outperform other methods for identifying biologically variable genes, and capture more of the biologically meaningful variation when used for dimensionality reduction.
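The analytic Pearson residuals described in this abstract can be written in a few lines. This is a minimal sketch assuming the rank-one expected counts μ_cg = n_c n_g / n (cell total times gene total over the grand total) and the negative binomial variance μ + μ²/θ; the defaults θ = 100 and clipping at √(number of cells) follow our reading of the paper and should be treated as assumptions. Function names are hypothetical.

```python
import numpy as np


def analytic_pearson_residuals(counts, theta=100.0, clip=None):
    """Analytic Pearson residuals for a cells x genes UMI count matrix.

    Assumes cells and genes with zero total counts have already been removed
    (their expected counts would be 0 and the residuals undefined).
    """
    X = np.asarray(counts, dtype=float)
    cell_totals = X.sum(axis=1, keepdims=True)    # sequencing depth per cell
    gene_totals = X.sum(axis=0, keepdims=True)    # total counts per gene
    mu = cell_totals @ gene_totals / X.sum()      # rank-one expected counts
    Z = (X - mu) / np.sqrt(mu + mu ** 2 / theta)  # negative binomial variance
    if clip is None:
        clip = np.sqrt(X.shape[0])                # clip at sqrt(n_cells)
    return np.clip(Z, -clip, clip)


def select_genes(Z, n_top):
    """Indices of the n_top genes with the largest residual variance."""
    return np.argsort(Z.var(axis=0))[::-1][:n_top]
```

Ranking genes by residual variance, as in `select_genes`, is one way to pick biologically variable genes; the residual matrix restricted to those genes can then be passed to PCA.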


Subjects
Algorithms , Databases, Genetic , RNA-Seq , Sequence Analysis, RNA , Organogenesis/genetics , Principal Component Analysis , Regression Analysis , Retina/metabolism
9.
Elife ; 10, 2021 06 30.
Article in English | MEDLINE | ID: mdl-34190045

ABSTRACT

Comparing the impact of the COVID-19 pandemic between countries or across time is difficult because the reported numbers of cases and deaths can be strongly affected by testing capacity and reporting policy. Excess mortality, defined as the increase in all-cause mortality relative to the expected mortality, is widely considered as a more objective indicator of the COVID-19 death toll. However, there has been no global, frequently updated repository of the all-cause mortality data across countries. To fill this gap, we have collected weekly, monthly, or quarterly all-cause mortality data from 103 countries and territories, openly available as the regularly updated World Mortality Dataset. We used this dataset to compute the excess mortality in each country during the COVID-19 pandemic. We found that in several of the worst-affected countries the excess mortality was above 50% of the expected annual mortality (Peru, Ecuador, Bolivia, Mexico) or above 400 excess deaths per 100,000 population (Peru, Bulgaria, North Macedonia, Serbia). At the same time, in several other countries (e.g. Australia and New Zealand) mortality during the pandemic was below the usual level, presumably due to social distancing measures decreasing the non-COVID infectious mortality. Furthermore, we found that while many countries have been reporting the COVID-19 deaths very accurately, some countries have been substantially underreporting their COVID-19 deaths (e.g. Nicaragua, Russia, Uzbekistan), by up to two orders of magnitude (Tajikistan). Our results highlight the importance of open and rapid all-cause mortality reporting for pandemic monitoring.


Countries around the world reported 4.2 million deaths from SARS-CoV-2 (the virus that causes COVID-19) from the beginning of the pandemic until the end of July 2021, but the actual number of deaths is likely higher. While some countries may have imperfect systems for counting deaths, others may have intentionally underreported them. To get a better estimate of deaths from an event such as a pandemic, scientists often compare the total number of deaths in a country during the event to the expected number of deaths based on data from previous years. This tells them how many excess deaths occurred during the event. To provide a more accurate count of deaths caused by COVID-19, Karlinsky and Kobak built a database called the World Mortality Dataset. It includes information on deaths from all causes from 103 countries. Karlinsky and Kobak used the database to compare the number of reported COVID-19 deaths to the excess deaths from all causes during the pandemic. Some of the hardest-hit countries, including Peru, Ecuador, Bolivia, and Mexico, experienced over 50% more deaths than expected during the pandemic. Meanwhile, other countries, like Australia and New Zealand, reported fewer deaths than normal. This is likely because social distancing measures reduced deaths from infections like influenza. Many countries reported their COVID-19 deaths accurately, but Karlinsky and Kobak argue that other countries, including Nicaragua, Russia, and Uzbekistan, underreported COVID-19 deaths. Using their database, Karlinsky and Kobak estimate that, in those countries, there have been at least 1.4 times more deaths due to COVID-19 than reported, adding over 1 million extra deaths in total. But they note that the actual number is likely much higher because data from more than 100 countries were not available to include in the database.
The World Mortality Dataset provides a more accurate picture of the number of people who died because of the COVID-19 pandemic, and it is available online and updated daily. The database may help scientists develop better mitigation strategies for this pandemic or future ones.
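The excess-mortality bookkeeping in this entry reduces to a few lines of arithmetic. The sketch uses a deliberately simple baseline, the per-week mean of pre-pandemic years, which is our simplification and not necessarily the model used in the paper; the function name, the 52-week layout, and the numbers in the usage lines are hypothetical.

```python
import numpy as np


def excess_mortality(observed, history, population, reported_covid_deaths):
    """Summarize excess deaths for one country.

    observed              : weekly all-cause deaths during the pandemic period
    history               : (n_years, 52) weekly deaths from pre-pandemic years
    population            : total population
    reported_covid_deaths : officially reported COVID-19 deaths, same period
    """
    observed = np.asarray(observed, dtype=float)
    baseline = np.asarray(history, dtype=float).mean(axis=0)  # expected deaths per week
    expected_annual = baseline.sum()
    # Compare each observed week with the same calendar week of the baseline
    weeks = np.arange(len(observed)) % baseline.size
    excess = float(np.sum(observed - baseline[weeks]))
    return {
        "excess_deaths": excess,
        "percent_of_expected_annual": 100.0 * excess / expected_annual,
        "per_100k": 1e5 * excess / population,
        "undercount_ratio": excess / reported_covid_deaths,
    }


history = np.full((5, 52), 100.0)   # five hypothetical pre-pandemic years
observed = np.full(52, 150.0)       # one hypothetical pandemic year
summary = excess_mortality(observed, history, population=1_000_000, reported_covid_deaths=1300)
# excess_deaths 2600.0, percent_of_expected_annual 50.0, per_100k 260.0, undercount_ratio 2.0
```

An `undercount_ratio` above 1 means that excess deaths exceed the officially reported COVID-19 deaths, which is the kind of comparison behind the underreporting estimates in this entry.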


Subjects
COVID-19/mortality , COVID-19/epidemiology , COVID-19/prevention & control , Epidemiological Monitoring , Global Health , Humans , Pandemics , Physical Distancing , Public Health , SARS-CoV-2/isolation & purification
10.
Signif (Oxf) ; 18(1): 16-19, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33821160

ABSTRACT

Data on excess deaths in Russia in 2020 paint a much bleaker picture of the Covid-19 death toll than the official daily updated number, argues Dmitry Kobak.

11.
medRxiv ; 2021 Jun 04.
Article in English | MEDLINE | ID: mdl-33532789

ABSTRACT

Comparing the impact of the COVID-19 pandemic between countries or across time is difficult because the reported numbers of cases and deaths can be strongly affected by testing capacity and reporting policy. Excess mortality, defined as the increase in all-cause mortality relative to the expected mortality, is widely considered as a more objective indicator of the COVID-19 death toll. However, there has been no global, frequently updated repository of the all-cause mortality data across countries. To fill this gap, we have collected weekly, monthly, or quarterly all-cause mortality data from 94 countries and territories, openly available as the regularly updated World Mortality Dataset. We used this dataset to compute the excess mortality in each country during the COVID-19 pandemic. We found that in several of the worst-affected countries (Peru, Ecuador, Bolivia, Mexico) the excess mortality was above 50% of the expected annual mortality. At the same time, in several other countries (Australia, New Zealand) mortality during the pandemic was below the usual level, presumably due to social distancing measures decreasing the non-COVID infectious mortality. Furthermore, we found that while many countries have been reporting the COVID-19 deaths very accurately, some countries have been substantially underreporting their COVID-19 deaths (e.g. Nicaragua, Russia, Uzbekistan), sometimes by two orders of magnitude (Tajikistan). Our results highlight the importance of open and rapid all-cause mortality reporting for pandemic monitoring.

13.
J Neurosci ; 41(5): 937-946, 2021 02 03.
Article in English | MEDLINE | ID: mdl-33431632

ABSTRACT

Single-cell transcriptomic approaches are revolutionizing neuroscience. Integrating this wealth of data with morphology and physiology, for the comprehensive study of neuronal biology, requires multiplexing gene expression data with complementary techniques. To meet this need, multiple groups in parallel have developed "Patch-seq," a modification of whole-cell patch-clamp protocols that enables mRNA sequencing of cell contents after electrophysiological recordings from individual neurons and morphologic reconstruction of the same cells. In this review, we first outline the critical technical developments that enabled robust Patch-seq experimental efforts and analytical solutions to interpret the rich multimodal data generated. We then review recent applications of Patch-seq that address novel and long-standing questions in neuroscience. These include the following: (1) targeted study of specific neuronal populations based on their anatomic location, functional properties, lineage, or a combination of these factors; (2) the compilation and integration of multimodal cell type atlases; and (3) the investigation of the molecular basis of morphologic and functional diversity. Finally, we highlight potential opportunities for further technical development and lines of research that may benefit from implementing the Patch-seq technique. As a multimodal approach at the intersection of molecular neurobiology and physiology, Patch-seq is uniquely positioned to directly link gene expression to brain function.


Subjects
Neurons/physiology , Patch-Clamp Techniques/methods , Single-Cell Analysis/methods , Transcriptome/physiology , Animals , Cells, Cultured , Electrophysiological Phenomena/physiology , Forecasting , Humans , Patch-Clamp Techniques/trends , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/trends , Single-Cell Analysis/trends
14.
Nature ; 598(7879): 144-150, 2021 10.
Article in English | MEDLINE | ID: mdl-33184512

ABSTRACT

Cortical neurons exhibit extreme diversity in gene expression as well as in morphological and electrophysiological properties [1,2]. Most existing neural taxonomies are based on either transcriptomic [3,4] or morpho-electric [5,6] criteria, as it has been technically challenging to study both aspects of neuronal diversity in the same set of cells [7]. Here we used Patch-seq [8] to combine patch-clamp recording, biocytin staining, and single-cell RNA sequencing of more than 1,300 neurons in adult mouse primary motor cortex, providing a morpho-electric annotation of almost all transcriptomically defined neural cell types. We found that, although broad families of transcriptomic types (those expressing Vip, Pvalb, Sst and so on) had distinct and essentially non-overlapping morpho-electric phenotypes, individual transcriptomic types within the same family were not well separated in the morpho-electric space. Instead, there was a continuum of variability in morphology and electrophysiology, with neighbouring transcriptomic cell types showing similar morpho-electric features, often without clear boundaries between them. Our results suggest that neuronal types in the neocortex do not always form discrete entities. Instead, neurons form a hierarchy that consists of distinct non-overlapping branches at the level of families, but can form continuous and correlated transcriptomic and morpho-electrical landscapes within families.


Subjects
Gene Expression Profiling , Motor Cortex/cytology , Neurons/classification , Neurons/metabolism , Transcriptome , Animals , Atlases as Topic , Female , GABAergic Neurons/cytology , GABAergic Neurons/metabolism , Glutamates/metabolism , Lysine/analogs & derivatives , Lysine/analysis , Male , Mice , Motor Cortex/anatomy & histology , Neurons/cytology , Organ Specificity , Patch-Clamp Techniques , Phenotype , Sequence Analysis, RNA , Single-Cell Analysis , Staining and Labeling
15.
Mach Learn Knowl Discov Databases ; 11906: 124-139, 2020.
Article in English | MEDLINE | ID: mdl-33103160

ABSTRACT

T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the 'crowding problem' of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom ν, with ν → ∞ corresponding to SNE and ν = 1 corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that ν < 1 can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
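The role of the degree of freedom ν can be seen by evaluating the kernel directly. This is a sketch assuming the parameterization w(d) = (1 + d²/ν)^(−ν), which matches the abstract's description: it equals the Cauchy kernel of standard t-SNE at ν = 1 and tends to the Gaussian exp(−d²) of SNE as ν → ∞, because (1 + x/ν)^(−ν) → e^(−x). The function name is hypothetical.

```python
import numpy as np


def t_kernel(d, nu=1.0):
    """Low-dimensional similarity w(d) = (1 + d^2 / nu) ** (-nu).

    nu = 1 gives the Cauchy kernel of standard t-SNE, nu -> infinity
    approaches the Gaussian exp(-d^2) of SNE, and nu < 1 gives heavier tails.
    """
    d = np.asarray(d, dtype=float)
    return (1.0 + d ** 2 / nu) ** (-nu)


distances = np.array([0.0, 1.0, 3.0, 10.0])
for nu in (100.0, 1.0, 0.5):
    # Larger values at large distances mean a heavier tail
    print(nu, np.round(t_kernel(distances, nu), 4))
```

Comparing the printed rows at d = 10 shows how much more weight ν < 1 assigns to far-apart points, which is the heavier tail the abstract associates with revealing finer cluster structure.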

16.
Neuroinformatics ; 18(4): 591-609, 2020 10.
Article in English | MEDLINE | ID: mdl-32367332

ABSTRACT

Quantitative analysis of neuronal morphologies usually begins with choosing a particular feature representation in order to make individual morphologies amenable to standard statistics tools and machine learning algorithms. Many different feature representations have been suggested in the literature, ranging from density maps to intersection profiles, but they have never been compared side by side. Here we performed a systematic comparison of various representations, measuring how well they were able to capture the difference between known morphological cell types. For our benchmarking effort, we used several curated data sets consisting of mouse retinal bipolar cells and cortical inhibitory neurons. We found that the best performing feature representations were two-dimensional density maps, two-dimensional persistence images and morphometric statistics, which continued to perform well even when neurons were only partially traced. Combining these feature representations together led to further performance increases suggesting that they captured non-redundant information. The same representations performed well in an unsupervised setting, implying that they can be suitable for dimensionality reduction or clustering.
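A two-dimensional density map, one of the best-performing representations in this benchmark, can be sketched as a normalized 2D histogram of traced points. The sketch assumes the reconstruction is already available as an array of 2D coordinates (for example, a projection with the soma at the origin) and that all cells share a common extent; the bin count and function name are hypothetical, and the paper's exact preprocessing may differ.

```python
import numpy as np


def density_map(points, extent, bins=20):
    """Flattened, normalized 2D density map of a traced neuron.

    points : (m, 2) coordinates of traced neurite points
    extent : [[xmin, xmax], [ymin, ymax]] shared by all cells so that maps
             computed for different neurons are directly comparable
    """
    pts = np.asarray(points, dtype=float)
    H, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=bins, range=extent)
    # Normalize so that larger and smaller neurons are compared by shape
    return (H / H.sum()).ravel()
```

Because every cell is binned on the same grid, the resulting vectors can be fed directly to standard classifiers or to dimensionality reduction.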


Subjects
Algorithms , Benchmarking , Interneurons/cytology , Machine Learning , Neuroimaging/methods , Animals , Cluster Analysis , Mice , Neuroimaging/standards
17.
Elife ; 9, 2020 03 05.
Article in English | MEDLINE | ID: mdl-32134385

ABSTRACT

Clones of excitatory neurons derived from a common progenitor have been proposed to serve as elementary information processing modules in the neocortex. To characterize the cell types and circuit diagram of clonally related excitatory neurons, we performed multi-cell patch clamp recordings and Patch-seq on neurons derived from Nestin-positive progenitors labeled by tamoxifen induction at embryonic day 10.5. The resulting clones are derived from two radial glia on average, span cortical layers 2-6, and are composed of a random sampling of transcriptomic cell types. We find an interaction between shared lineage and connection type: related neurons are more likely to be connected vertically across cortical layers, but not laterally within the same layer. These findings challenge the view that related neurons show uniformly increased connectivity and suggest that integration of vertical intra-clonal input with lateral inter-clonal input may represent a developmentally programmed connectivity motif supporting the emergence of functional circuits.


Subjects
Neocortex/cytology , Neurons/classification , Neurons/physiology , Synapses/physiology , Animals , Cells, Cultured , Mice
18.
Nat Commun ; 10(1): 5416, 2019 11 28.
Article in English | MEDLINE | ID: mdl-31780648

ABSTRACT

Single-cell transcriptomics yields ever-growing data sets containing RNA expression levels for thousands of genes from up to millions of cells. Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE). It excels at revealing local structure in high-dimensional data, but naive applications often suffer from severe shortcomings, e.g. the global structure of the data is not represented accurately. Here we describe how to circumvent such pitfalls, and develop a protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels; for very large data sets, we additionally use exaggeration and downsampling-based initialisation. We use published single-cell RNA-seq data sets to demonstrate that this protocol yields superior results compared to the naive application of t-SNE.
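Two ingredients of this protocol, PCA initialisation and a learning rate that grows with the data set size, are easy to compute by hand. The sketch assumes specific values as we recall them from the paper (first PCA coordinate rescaled to standard deviation 0.0001, learning rate n/12, perplexities 30 and n/100 for the multi-scale kernel); verify them against the original before relying on them. The helper names are hypothetical.

```python
import numpy as np


def pca_initialisation(X, n_components=2, target_std=1e-4):
    """PCA initialisation for t-SNE, rescaled so PC1 has a small standard deviation."""
    Xc = X - X.mean(axis=0)                      # centre the data
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    Y = U[:, :n_components] * S[:n_components]   # principal component scores
    return Y / np.std(Y[:, 0]) * target_std


def protocol_settings(n):
    """Hypothetical helper collecting the other protocol choices for n points."""
    return {
        "learning_rate": n / 12,          # grows with the number of points
        "perplexities": (30, n / 100),    # multi-scale kernel: 30 and n/100
    }
```

The resulting initialisation and settings could then be passed to a t-SNE implementation that accepts a precomputed initialisation and multiple perplexities; openTSNE, for instance, supports both.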


Subjects
Data Visualization , Gene Expression Profiling , Single-Cell Analysis , Algorithms , Animals , Computational Biology , Machine Learning , Mice , Principal Component Analysis , Transcriptome
20.
Nat Commun ; 10(1): 4174, 2019 09 13.
Article in English | MEDLINE | ID: mdl-31519874

ABSTRACT

Layer 4 (L4) of mammalian neocortex plays a crucial role in cortical information processing, yet a complete census of its cell types and connectivity remains elusive. Using whole-cell recordings with morphological recovery, we identified one major excitatory and seven inhibitory types of neurons in L4 of adult mouse visual cortex (V1). Nearly all excitatory neurons were pyramidal and all somatostatin-positive (SOM+) non-fast-spiking interneurons were Martinotti cells. In contrast, in somatosensory cortex (S1), excitatory neurons were mostly stellate and SOM+ interneurons were non-Martinotti. These morphologically distinct SOM+ interneurons corresponded to different transcriptomic cell types and were differentially integrated into the local circuit with only S1 neurons receiving local excitatory input. We propose that cell type specific circuit motifs, such as the Martinotti/pyramidal and non-Martinotti/stellate pairs, are used across the cortex as building blocks to assemble cortical circuits.


Subjects
Neocortex/cytology , Animals , Electrophysiology , Female , Interneurons/cytology , Interneurons/metabolism , Male , Mice , Neocortex/metabolism , Neurons/cytology , Neurons/metabolism , Somatosensory Cortex/cytology , Somatosensory Cortex/metabolism , Somatostatin/metabolism