1.
Proc Natl Acad Sci U S A ; 118(22)2021 06 01.
Article in English | MEDLINE | ID: mdl-34001664

ABSTRACT

Comprehensive and accurate comparisons of transcriptomic distributions of cells from samples taken from two different biological states, such as healthy versus diseased individuals, are an emerging challenge in single-cell RNA sequencing (scRNA-seq) analysis. Current methods for detecting differentially abundant (DA) subpopulations between samples rely heavily on initial clustering of all cells in both samples. Often, this clustering step is inadequate since the DA subpopulations may not align with a clear cluster structure, and important differences between the two biological states can be missed. Here, we introduce DA-seq, a targeted approach for identifying DA subpopulations not restricted to clusters. DA-seq is a multiscale method that quantifies a local DA measure for each cell, which is computed from its k nearest neighboring cells across a range of k values. Based on this measure, DA-seq delineates contiguous significant DA subpopulations in the transcriptomic space. We apply DA-seq to several scRNA-seq datasets and highlight its improved ability to detect differences between distinct phenotypes in severe versus mildly ill COVID-19 patients, melanomas subjected to immune checkpoint therapy comparing responders to nonresponders, embryonic development at two time points, and young versus aging brain tissue. DA-seq enabled us to detect differences between these phenotypes. Importantly, we find that DA-seq not only recovers the DA cell types as discovered in the original studies but also reveals additional DA subpopulations that were not described before. Analysis of these subpopulations yields biological insights that would otherwise be undetected using conventional computational approaches.


Subjects
Aging/genetics, COVID-19/genetics, Cell Lineage/genetics, Melanoma/genetics, Small Cytoplasmic RNA/genetics, Skin Neoplasms/genetics, Aging/metabolism, B-Lymphocytes/immunology, B-Lymphocytes/virology, Brain/cytology, Brain/metabolism, COVID-19/immunology, COVID-19/pathology, COVID-19/virology, Cell Lineage/immunology, Cytokines/genetics, Cytokines/immunology, Datasets as Topic, Dendritic Cells/immunology, Dendritic Cells/virology, Gene Expression Profiling, Gene Expression Regulation, High-Throughput Nucleotide Sequencing, Humans, Melanoma/immunology, Melanoma/pathology, Monocytes/immunology, Monocytes/virology, Phenotype, Small Cytoplasmic RNA/immunology, SARS-CoV-2/pathogenicity, Severity of Illness Index, Single-Cell Analysis/methods, Skin Neoplasms/immunology, Skin Neoplasms/pathology, T-Lymphocytes/immunology, T-Lymphocytes/virology, Transcriptome
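The multiscale neighborhood idea in the DA-seq abstract can be illustrated with a small sketch. It assumes cells are given as coordinate tuples in some low-dimensional embedding and that each cell carries a 0/1 label for its biological state. For every cell it records the fraction of state-1 cells among its k nearest neighbors for several values of k, centered on the global fraction. Per the abstract, DA-seq derives its local measure from a range of k values; collapsing that vector to a plain average, as done here, is a simplification, and the function names are hypothetical rather than the published implementation.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i] (excluding i itself), by Euclidean distance."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def multiscale_da_scores(points, labels, ks):
    """Hypothetical simplified local DA score for each cell.

    For each k in ks, take the fraction of the cell's k nearest neighbors with label 1,
    subtract the global fraction of label-1 cells, then average over all k.
    Positive values suggest enrichment of state 1 around the cell; negative values, state 0.
    """
    n = len(points)
    global_frac = sum(labels) / n
    kmax = max(ks)
    scores = []
    for i in range(n):
        # neighbors are sorted by distance, so prefixes give the k-NN sets for smaller k
        nbrs = knn_indices(points, i, kmax)
        per_scale = [sum(labels[j] for j in nbrs[:k]) / k - global_frac for k in ks]
        scores.append(sum(per_scale) / len(per_scale))
    return scores
```

Because the score is computed per cell rather than per cluster, a region enriched for one state can show up even when it does not coincide with a cluster boundary, which is the limitation of clustering-first approaches that the abstract points out.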
2.
Nucleic Acids Res ; 49(4): e21, 2021 02 26.
Article in English | MEDLINE | ID: mdl-33330933

ABSTRACT

Following antigenic challenge, activated B cells rapidly expand and undergo somatic hypermutation, yielding groups of clonally related B cells with diversified immunoglobulin receptors. Inference of clonal relationships based on the receptor sequence is an essential step in many adaptive immune receptor repertoire sequencing studies. These relationships are typically identified by a multi-step process that involves (i) grouping sequences based on shared V and J gene assignments and junction lengths, and (ii) clustering these sequences using a junction-based distance. However, this approach is sensitive to the initial gene assignments, which are error-prone, and it fails to identify clonal relatives whose junction length has changed through the accumulation of indels. By defining a translation-invariant feature space in which to cluster the sequences, we develop an alignment-free clonal identification method that does not require gene assignments and is not restricted to a fixed junction length. This alignment-free approach has higher sensitivity than a typical junction-based distance method, without loss of specificity or positive predictive value (PPV). While the alignment-free procedure identifies clones that are broadly consistent with those found by the junction-based distance method, it also identifies clones with characteristics (multiple V or J gene assignments or junction lengths) that the junction-based distance method cannot detect.


Subjects
Immunoglobulin Genes, DNA Sequence Analysis/methods, Clone Cells, VDJ Exons
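The abstract does not specify its translation-invariant feature space, so the sketch below substitutes one familiar choice as an explicit assumption: a bag of overlapping k-mers, whose counts ignore where each k-mer occurs and therefore tolerate indels and differing junction lengths. Sequences are then grouped by single-linkage clustering under a cosine-distance threshold. The k-mer length, the threshold of 0.3, and all function names are hypothetical choices for illustration, not the method evaluated in the paper.

```python
import math
from collections import Counter


def kmer_profile(seq, k=3):
    """Bag of overlapping k-mers; counts ignore where each k-mer occurs in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors (1.0 if either is empty)."""
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    return 1.0 - dot / (na * nb)


def cluster_clones(seqs, k=3, threshold=0.3):
    """Single-linkage clustering: sequences linked by any chain of pairs closer than
    `threshold` receive the same clone id (a union-find root index)."""
    profiles = [kmer_profile(s, k) for s in seqs]
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if cosine_distance(profiles[i], profiles[j]) < threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]
```

Note that no V/J assignment and no alignment is needed: two junctions of different lengths (for example, one carrying a single-base insertion) can still share most of their k-mers and end up in the same clone.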
3.
Proc Natl Acad Sci U S A ; 117(49): 30918-30927, 2020 12 08.
Article in English | MEDLINE | ID: mdl-33229581

ABSTRACT

We propose a local conformal autoencoder (LOCA) for standardized data coordinates. LOCA is a deep learning-based method for obtaining standardized data coordinates from scientific measurements. Data observations are modeled as samples from an unknown, nonlinear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized latent variables. We assume a repeated-measurement sampling strategy, common in scientific measurements, and present a method for learning an embedding in [Formula: see text] that is isometric to the latent variables of the manifold. The coordinates recovered by our method are invariant to diffeomorphisms of the manifold, making it possible to match different instrumental observations of the same phenomenon. LOCA learns to rectify deformations by using a local z-scoring procedure while preserving relevant geometric information. We demonstrate the isometric embedding properties of LOCA in various model settings and observe that it exhibits promising interpolation and extrapolation capabilities, superior to the current state of the art. Finally, we demonstrate LOCA's efficacy on single-site Wi-Fi localization data and for the reconstruction of three-dimensional curved surfaces from two-dimensional projections.


Subjects
Algorithms, Data Analysis, Reference Standards
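The local z-scoring idea in the LOCA abstract can be illustrated with a whitening penalty. This sketch assumes each "burst" is a small set of repeated measurements around one latent point, already mapped through some embedding, and that the burst noise in the latent variables has a known standard deviation `sigma`. It measures how far each burst's empirical covariance, divided by `sigma**2`, is from the identity matrix (squared Frobenius norm, averaged over bursts). Combining such a term with a reconstruction loss in an autoencoder is an assumption here, and the function names are hypothetical; this is not the published LOCA code.

```python
def covariance(points):
    """Unbiased empirical covariance matrix of a list of equal-length coordinate tuples."""
    n, d = len(points), len(points[0])
    mean = [sum(p[a] for p in points) / n for a in range(d)]
    return [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
             for b in range(d)] for a in range(d)]


def whitening_penalty(embedded_bursts, sigma):
    """Mean over bursts of ||Cov(burst) / sigma**2 - I||_F**2.

    The penalty is zero exactly when every embedded burst has covariance sigma**2 * I,
    i.e. when the embedding locally z-scores the repeated measurements.
    """
    total = 0.0
    for burst in embedded_bursts:
        c = covariance(burst)
        d = len(c)
        total += sum((c[a][b] / sigma ** 2 - (1.0 if a == b else 0.0)) ** 2
                     for a in range(d) for b in range(d))
    return total / len(embedded_bursts)
```

A burst that the embedding stretches by a factor of 2 in every direction has covariance 4 * sigma**2 * I, so each diagonal entry contributes (4 - 1)**2 to the penalty; minimizing the term pushes the embedding to undo such local deformations.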
4.
Chaos ; 31(4): 043118, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34251227

ABSTRACT

A low-dimensional dynamical system is observed in an experiment as a high-dimensional signal, for example, a video of a system of chaotic pendulums. Assuming that we know the dynamical model up to some unknown parameters, can we estimate the underlying system's parameters by measuring its time evolution only once? The key information for performing this estimation lies in the temporal interdependencies between the signal and the model. We propose a kernel-based score to compare these dependencies. Our score generalizes a maximum likelihood estimator for a linear model to a general nonlinear setting in an unknown feature space. We estimate the system's underlying parameters by maximizing the proposed score. We demonstrate the accuracy and efficiency of the method using two chaotic dynamical systems: the double pendulum and the Lorenz '63 model.

5.
Neuroinformatics ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38976152

ABSTRACT

The brain is an intricate system that controls a variety of functions. It consists of a vast number of cells that exhibit diverse characteristics. To understand brain function in health and disease, it is crucial to classify neurons accurately. Recent advancements in machine learning have provided a way to classify neurons based on their electrophysiological activity. This paper presents a deep-learning framework that classifies neurons solely on this basis. The framework uses data from the Allen Cell Types database, which contains a survey of biological features derived from single-cell recordings from mice and humans. The shared information from both sources is used to classify neurons into their broad types with the help of a joint model. An accurate domain-adaptive model, integrating electrophysiological data from both mice and humans, is implemented. Furthermore, data from mouse neurons, which also includes labels of transgenic mouse lines, is further classified into subtypes using an interpretable neural network model. The framework provides state-of-the-art results in terms of accuracy and precision while also providing explanations for the predictions.

6.
Neural Netw ; 152: 34-43, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35500458

ABSTRACT

Modern datasets often contain large subsets of correlated features and nuisance features, which are unrelated or only loosely related to the main underlying structures of the data. Nuisance features can be identified using the Laplacian score criterion, which evaluates the importance of a given feature via its consistency with the leading eigenvectors of the graph Laplacian. We demonstrate that, in the presence of large numbers of nuisance features, the Laplacian must be computed on the subset of selected features rather than on the complete feature set. To do this, we propose a fully differentiable approach for unsupervised feature selection, utilizing the Laplacian score criterion to avoid the selection of nuisance features. We employ an autoencoder architecture to cope with correlated features, trained to reconstruct the data from the subset of selected features. We build on the recently proposed concrete layer, which allows the number of selected features to be controlled via architectural design, simplifying the optimization process. Experimenting on several real-world datasets, we demonstrate that our proposed approach outperforms similar approaches designed to avoid only correlated or only nuisance features, but not both. Several state-of-the-art clustering results are reported. Our code is publicly available at https://github.com/jsvir/lscae.


Subjects
Cluster Analysis
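The Laplacian score criterion mentioned in the abstract has a standard closed form (He, Cai and Niyogi, 2005): with an affinity matrix W, degree matrix D and graph Laplacian L = D - W, a feature vector f is first centered with respect to D, and its score is (f^T L f) / (f^T D f); smaller scores mean the feature varies smoothly over the graph. The sketch below computes that quantity with a Gaussian affinity. The bandwidth `eps` and the helper names are assumptions, and the sketch does not reproduce the paper's differentiable concrete-layer autoencoder.

```python
import math


def gaussian_affinity(points, eps):
    """Dense Gaussian affinity W_ij = exp(-||x_i - x_j||^2 / eps), self-loops included."""
    return [[math.exp(-math.dist(p, q) ** 2 / eps) for q in points] for p in points]


def laplacian_score(f, W):
    """Laplacian score of one feature f (lower = smoother over the graph defined by W)."""
    n = len(f)
    deg = [sum(row) for row in W]
    # center f with respect to D so the constant vector cannot trivially score zero
    mean = sum(fi * di for fi, di in zip(f, deg)) / sum(deg)
    g = [fi - mean for fi in f]
    # g^T L g with L = D - W equals 1/2 * sum_ij W_ij (g_i - g_j)^2 for symmetric W
    num = 0.5 * sum(W[i][j] * (g[i] - g[j]) ** 2 for i in range(n) for j in range(n))
    den = sum(di * gi ** 2 for di, gi in zip(deg, g))
    return num / den
```

In a toy dataset where the first feature separates two groups and the second is small noise, the first feature scores far lower. Here W is built from all features, which is precisely the setting the abstract argues breaks down once nuisance features dominate; its remedy is to recompute the graph on the selected subset.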
7.
Arch Pathol Lab Med ; 146(2): 182-193, 2022 01 02.
Article in English | MEDLINE | ID: mdl-34086849

ABSTRACT

CONTEXT: Large cell transformation (LCT) of indolent B-cell lymphomas, such as follicular lymphoma (FL) and chronic lymphocytic leukemia (CLL), signals a worse prognosis, at which point aggressive chemotherapy is initiated. Although LCT is relatively straightforward to diagnose in lymph nodes, a marrow biopsy is often obtained first given its ease of procedure, low cost, and low morbidity. However, consensus criteria for LCT in bone marrow have not been established.

OBJECTIVE: To study the accuracy and reproducibility of a trained convolutional neural network in identifying LCT, in light of promising machine learning tools that may introduce greater objectivity to morphologic analysis.

DESIGN: We retrospectively identified patients with a diagnosis of FL or CLL who had undergone bone marrow biopsy for the clinical question of LCT. We scored morphologic criteria and correlated the results with clinical disease progression. In addition, whole-slide scans were annotated into patches to train convolutional neural networks to discriminate between small and large tumor cells and to predict each patient's probability of transformation.

RESULTS: On morphologic examination, the proportion of large lymphoma cells (≥10% in FL and ≥30% in CLL), chromatin pattern, distinct nucleoli, and proliferation index were significantly correlated with LCT in FL and CLL. Compared with pathologist-derived estimates, machine-generated quantification demonstrated better reproducibility and stronger correlation with final outcome data.

CONCLUSIONS: These histologic findings may serve as indications of LCT in bone marrow biopsies. The pathologist augmented by the machine system appeared to be the most predictive, arguing for greater efforts to validate and implement these tools to further enhance physician practice.


Subjects
Deep Learning, B-Cell Chronic Lymphocytic Leukemia, Follicular Lymphoma, Biopsy, Bone Marrow/pathology, Humans, B-Cell Chronic Lymphocytic Leukemia/diagnosis, B-Cell Chronic Lymphocytic Leukemia/pathology, Follicular Lymphoma/diagnosis, Follicular Lymphoma/pathology, Machine Learning, Reproducibility of Results, Retrospective Studies
8.
JCI Insight ; 7(13)2022 07 08.
Article in English | MEDLINE | ID: mdl-35801589

ABSTRACT

People with HIV (PWH) on antiretroviral therapy (ART) experience elevated rates of neurological impairment, despite controlling for demographic factors and comorbidities, suggesting viral or neuroimmune etiologies for these deficits. Here, we apply multimodal and cross-compartmental single-cell analyses of paired cerebrospinal fluid (CSF) and peripheral blood in PWH and uninfected controls. We demonstrate that a subset of central memory CD4+ T cells in the CSF produced HIV-1 RNA, despite apparent systemic viral suppression, and that HIV-1-infected cells were more frequently found in the CSF than in the blood. Using cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), we show that the cell surface marker CD204 is a reliable marker for rare microglia-like cells in the CSF, which have been implicated in HIV neuropathogenesis, but which we did not find to contain HIV transcripts. Through a feature selection method for supervised deep learning of single-cell transcriptomes, we find that abnormal CD8+ T cell activation, rather than CD4+ T cell abnormalities, predominated in the CSF of PWH compared with controls. Overall, these findings suggest ongoing CNS viral persistence and compartmentalized CNS neuroimmune effects of HIV infection during ART and demonstrate the power of single-cell studies of CSF to better understand the CNS reservoir during HIV infection.


Subjects
HIV Infections, HIV-1, HIV Infections/drug therapy, HIV Infections/pathology, HIV-1/genetics, Humans, Longitudinal Studies, Microglia/pathology, Viral Transcription
9.
Data Min Knowl Discov ; 34(6): 1676-1712, 2020.
Article in English | MEDLINE | ID: mdl-32837252

ABSTRACT

Kernel methods play a critical role in many machine learning algorithms. They are useful in manifold learning, classification, clustering, and other data analysis tasks. The kernel's scale parameter, also referred to as the kernel's bandwidth, strongly affects performance on the task at hand. We propose setting the scale parameter in a way that is tailored to one of two types of tasks: classification and manifold learning. For manifold learning, we seek the scale that best captures the manifold's intrinsic dimension. For classification, we propose three methods for estimating the scale, which optimize the classification results in different senses. The proposed frameworks are evaluated on both artificial and real datasets. The results show a high correlation between optimal classification rates and the estimated scales. Finally, we demonstrate the approach on a seismic event classification task.
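The abstract does not give its estimators, so this sketch shows a classical heuristic from the kernel and diffusion-maps literature rather than the paper's method. For a Gaussian kernel, the kernel sum S(eps) = sum_ij exp(-||x_i - x_j||^2 / eps) grows roughly like eps^(d/2) over an intermediate range of scales, where d is the intrinsic dimension, so log S plotted against log eps has slope close to d/2 there. Picking the scale where this curve is steepest yields both a candidate bandwidth and a rough dimension estimate. The grid of candidate scales and the function names are assumptions.

```python
import math


def log_kernel_sums(points, log10_eps_grid):
    """log S(eps) for S(eps) = sum_ij exp(-||x_i - x_j||^2 / eps), one value per grid point."""
    sq_dists = [math.dist(p, q) ** 2 for p in points for q in points]
    return [math.log(sum(math.exp(-d2 / 10 ** e) for d2 in sq_dists)) for e in log10_eps_grid]


def steepest_scale(points, log10_eps_grid):
    """Return (eps, d_hat): the scale where d log S / d log eps is largest, and d_hat = 2 * that slope."""
    logs = log_kernel_sums(points, log10_eps_grid)
    best_eps, best_slope = None, -math.inf
    for i in range(len(log10_eps_grid) - 1):
        # finite-difference slope between neighboring grid points, in natural-log units
        d_log_eps = (log10_eps_grid[i + 1] - log10_eps_grid[i]) * math.log(10)
        slope = (logs[i + 1] - logs[i]) / d_log_eps
        if slope > best_slope:
            best_slope = slope
            best_eps = 10 ** ((log10_eps_grid[i] + log10_eps_grid[i + 1]) / 2)
    return best_eps, 2 * best_slope
```

At very small scales only the self-similarities survive and at very large scales every pair saturates, so the slope vanishes at both ends; for points sampled densely along a line segment, the steepest slope comes out close to 1/2, giving a dimension estimate near 1.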

10.
Article in English | MEDLINE | ID: mdl-34504892

ABSTRACT

Word2vec, introduced by Mikolov et al., is a word embedding method that is widely used in natural language processing. Despite its success and frequent use, a strong theoretical justification is still lacking. The main contribution of our paper is a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings with numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
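The abstract does not say which spectral method it has in mind. As a hedged illustration of what a spectral word embedding looks like, the sketch below uses one well-known construction that is an assumption here, not necessarily the one analyzed in the paper: build a positive pointwise mutual information (PPMI) matrix from co-occurrence counts within a small window, then embed each word using the leading singular vectors of that matrix. The window size, the power-iteration solver, and the function names are illustrative choices.

```python
import math
import random
from collections import Counter


def ppmi_matrix(sentences, window=1):
    """Sorted vocabulary and positive PMI matrix for words co-occurring within `window` positions."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    pairs = Counter()
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    pairs[(index[w], index[s[j]])] += 1
    total = sum(pairs.values())
    marginal = Counter()
    for (a, _), c in pairs.items():
        marginal[a] += c  # counts are symmetric, so row and column marginals coincide
    n = len(vocab)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), c in pairs.items():
        m[a][b] = max(math.log(c * total / (marginal[a] * marginal[b])), 0.0)
    return vocab, m


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def spectral_embedding(m, dim=2, iters=200, seed=0):
    """Rows of U * sqrt(S) for the top `dim` singular pairs of the symmetric matrix m.

    Uses power iteration on m @ m (positive semidefinite) with deflation of found components.
    """
    n = len(m)
    rng = random.Random(seed)
    vecs, eigs = [], []
    for _ in range(dim):
        v = [rng.random() for _ in range(n)]  # fresh random start for each component
        lam = 0.0
        for _ in range(iters):
            w = matvec(m, matvec(m, v))
            for u, mu in zip(vecs, eigs):  # remove components already found
                proj = sum(a * b for a, b in zip(u, v))
                w = [wi - mu * proj * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                lam = 0.0
                break
            lam, v = norm, [x / norm for x in w]
        vecs.append(v)
        eigs.append(lam)
    # eigenvalues of m @ m are squared singular values, so sqrt(S) = eig ** 0.25
    return [[lam ** 0.25 * v[i] for v, lam in zip(vecs, eigs)] for i in range(n)]
```

In a toy corpus made of the sentences "the cat sat" and "the dog sat", the words "cat" and "dog" occur in identical contexts, receive identical PPMI rows, and therefore end up with identical embeddings.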
