Results 1 - 20 of 35
1.
Bioinformatics ; 40(Supplement_1): i490-i500, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940151

ABSTRACT

SUMMARY: The single-cell Hi-C (scHi-C) protocol helps identify cell-type-specific chromatin interactions and sheds light on cell differentiation and disease progression. Despite providing crucial insights, scHi-C data is often underutilized due to the high cost and the complexity of the experimental protocol. We present a deep learning framework, scGrapHiC, that predicts pseudo-bulk scHi-C contact maps using pseudo-bulk scRNA-seq data. Specifically, scGrapHiC performs graph deconvolution to extract genome-wide single-cell interactions from a bulk Hi-C contact map using scRNA-seq as a guiding signal. Our evaluations show that scGrapHiC, trained on seven cell-type co-assay datasets, outperforms typical sequence encoder approaches. For example, scGrapHiC achieves a substantial improvement of 23.2% in recovering cell-type-specific Topologically Associating Domains over the baselines. It also generalizes to unseen embryo and brain tissue samples. scGrapHiC is a novel method to generate cell-type-specific scHi-C contact maps using widely available genomic signals that enables the study of cell-type-specific chromatin interactions. AVAILABILITY AND IMPLEMENTATION: The GitHub repository at https://github.com/rsinghlab/scGrapHiC contains the source code of scGrapHiC and associated scripts to preprocess publicly available datasets to produce the results and visualizations we discuss in this manuscript.


Subject(s)
Chromatin , Deep Learning , Single-Cell Analysis , Single-Cell Analysis/methods , Chromatin/metabolism , Chromatin/chemistry , Humans
2.
Nucleic Acids Res ; 49(W1): W641-W653, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34125906

ABSTRACT

Uncovering how transcription factors regulate their targets at DNA, RNA and protein levels over time is critical to define gene regulatory networks (GRNs) and assign mechanisms in normal and diseased states. RNA-seq is a standard method measuring gene regulation using an established set of analysis stages. However, none of the currently available pipeline methods for interpreting ordered genomic data (in time or space) use time-series models to assign cause and effect relationships within GRNs, are adaptive to diverse experimental designs, or enable user interpretation through a web-based platform. Furthermore, methods integrating ordered RNA-seq data with protein-DNA binding data to distinguish direct from indirect interactions are urgently needed. We present TIMEOR (Trajectory Inference and Mechanism Exploration with Omics data in R), the first web-based and adaptive time-series multi-omics pipeline method which infers the relationship between gene regulatory events across time. TIMEOR addresses the critical need for methods to determine causal regulatory mechanism networks by leveraging time-series RNA-seq, motif analysis, protein-DNA binding data, and protein-protein interaction networks. TIMEOR's user-catered approach helps non-coders generate new hypotheses and validate known mechanisms. We used TIMEOR to identify a novel link between insulin stimulation and the circadian rhythm cycle. TIMEOR is available at https://github.com/ashleymaeconard/TIMEOR.git and http://timeor.brown.edu.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , RNA-Seq , Software , Circadian Rhythm/genetics , Genomics , Humans , Insulin/physiology , Internet , Protein Interaction Mapping , Transcription Factors/metabolism
3.
Bioinformatics ; 36(Suppl_2): i857-i865, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381828

ABSTRACT

MOTIVATION: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation time, as they depend exponentially on the sub-sequence feature length, number of mismatch positions, and the task's alphabet size. RESULTS: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This simplified decomposition allows us to devise a fast Monte Carlo approximation that rapidly converges. FastSK can scale to much greater feature lengths, allows us to consider more mismatches, and is performant on a variety of sequence analysis tasks. On multiple DNA transcription factor binding site prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in area under the ROC curve, while achieving average speedups in kernel computation of ∼100× and speedups of ∼800× for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks while achieving low variance. We then extend FastSK to 7 English-language medical named entity recognition datasets and 10 protein remote homology detection datasets. FastSK consistently matches or outperforms these baselines. AVAILABILITY AND IMPLEMENTATION: Our algorithm is available as a Python package and as C++ source code at https://github.com/QData/FastSK. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Sequence Analysis, Protein , Support Vector Machine , Algorithms , Proteins , Software
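
The FastSK abstract above describes decomposing the gapped k-mer kernel into independent counting operations over the possible mismatch positions. A minimal Python sketch of that idea in its exact, unapproximated form is given here. The function names and toy sequences are illustrative assumptions; this is not the FastSK implementation, and it does not include the Monte Carlo approximation the authors report.

```python
from collections import Counter
from itertools import combinations


def masked_window_counts(seq, g, keep):
    """Count every length-g window of seq, keeping only the positions in `keep`."""
    return Counter(
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - g + 1)
    )


def gapped_kmer_kernel(x, y, g, m):
    """Naive gapped k-mer kernel between sequences x and y.

    For each choice of m masked positions inside a length-g window,
    count the window pairs (one from x, one from y) that agree on the
    remaining g - m positions, then sum over all choices.
    """
    total = 0
    for masked in combinations(range(g), m):
        keep = [i for i in range(g) if i not in masked]
        counts_x = masked_window_counts(x, g, keep)
        counts_y = masked_window_counts(y, g, keep)
        # Each mask contributes an independent dot product of counts.
        total += sum(n * counts_y[key] for key, n in counts_x.items())
    return total
```

Because each mask is handled independently, the per-mask terms could be computed in parallel or sampled; the cost of this naive version still grows with the number of masks, which is what the approximation in the abstract is meant to avoid.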
4.
Bioinformatics ; 34(17): i891-i900, 2018 09 01.
Article in English | MEDLINE | ID: mdl-30423076

ABSTRACT

Motivation: Computational methods that predict differential gene expression from histone modification signals are highly desirable for understanding how histone modifications control the functional heterogeneity of cells through influencing differential gene regulation. Recent studies either failed to capture combinatorial effects on differential prediction or primarily only focused on cell type-specific analysis. In this paper we develop a novel attention-based deep learning architecture, DeepDiff, that provides a unified and end-to-end solution to model and to interpret how dependencies among histone modifications control the differential patterns of gene regulation. DeepDiff uses a hierarchy of multiple Long Short-Term Memory (LSTM) modules to encode the spatial structure of input signals and to model how various histone modifications cooperate automatically. We introduce and train two levels of attention jointly with the target prediction, enabling DeepDiff to attend differentially to relevant modifications and to locate important genome positions for each modification. Additionally, DeepDiff introduces a novel deep-learning based multi-task formulation to use the cell-type-specific gene expression predictions as auxiliary tasks, encouraging richer feature embeddings in our primary task of differential expression prediction. Results: Using data from Roadmap Epigenomics Project (REMC) for ten different pairs of cell types, we show that DeepDiff significantly outperforms the state-of-the-art baselines for differential gene expression prediction. The learned attention weights are validated by observations from previous studies about how epigenetic mechanisms connect to differential gene expression. Availability and implementation: Codes and results are available at deepchrome.org. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression , Histones/metabolism , Machine Learning , Histone Code , Humans , Protein Processing, Post-Translational , Software
5.
PLoS Genet ; 11(2): e1005001, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25658338

ABSTRACT

Genes or their encoded products are not expected to mingle with each other except in some disease situations. In cancer, a frequent mechanism that can produce gene fusions is chromosomal rearrangement. However, recent discoveries of RNA trans-splicing and cis-splicing between adjacent genes (cis-SAGe) support other mechanisms for generating fusion RNAs. In our transcriptome analyses of 28 prostate normal and cancer samples, on average 30% of fusion RNAs are transcripts that contain exons belonging to same-strand neighboring genes. These fusion RNAs may be the products of cis-SAGe, which was previously thought to be rare. To validate this finding and to better understand the phenomenon, we used LNCaP, a prostate cell line, as a model, and identified 16 additional cis-SAGe events by silencing transcription factor CTCF and paired-end RNA sequencing. About half of the fusions are expressed at a significant level compared to their parental genes. Silencing one of the in-frame fusions resulted in reduced cell motility. Most out-of-frame fusions are likely to function as non-coding RNAs. The majority of the 16 fusions are also detected in other prostate cell lines, as well as in the 14 clinical prostate normal and cancer pairs. By studying the features associated with these fusions, we developed a set of rules: 1) the parental genes are same-strand-neighboring genes; 2) the distance between the genes is within 30kb; 3) the 5' genes are actively transcribing; and 4) the chimeras tend to have the second-to-last exon in the 5' genes joined to the second exon in the 3' genes. We then randomly selected 20 neighboring genes in the genome, and detected four fusion events using these rules in prostate cancer and non-cancerous cells. These results suggest that splicing between neighboring gene transcripts is a rather frequent phenomenon, and it is not a feature unique to cancer cells.


Subject(s)
Gene Expression Profiling , Gene Fusion , Prostatic Neoplasms/genetics , Repressor Proteins/genetics , Base Sequence , CCCTC-Binding Factor , Cell Fusion , Cell Line, Tumor , Exons , Gene Expression Regulation, Neoplastic , Genome, Human , Humans , Male , Prostatic Neoplasms/pathology , RNA Splicing/genetics , Sequence Analysis, RNA
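
The first two cis-SAGe rules in the abstract above (same-strand genes, within 30 kb of each other) are positional and can be checked from gene coordinates alone. The sketch below illustrates only those two rules, under stated assumptions: the `Gene` record, the function names, and the example coordinates are hypothetical, adjacency (no intervening gene) is not checked, and rules 3 and 4 are omitted because they need expression and exon-junction data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # gene start coordinate
    end: int     # gene end coordinate (end >= start)


def intergenic_gap(a, b):
    """Bases separating two genes on one chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def passes_position_rules(a, b, max_gap=30_000):
    """Rule 1 (same chromosome and strand) and rule 2 (gap within max_gap)."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and intergenic_gap(a, b) <= max_gap
    )


# Hypothetical genes, for illustration only.
upstream = Gene("GENE_A", "chr1", "+", 1_000, 5_000)
downstream = Gene("GENE_B", "chr1", "+", 20_000, 30_000)
print(passes_position_rules(upstream, downstream))
```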
6.
Bioinformatics ; 32(17): i639-i648, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587684

ABSTRACT

MOTIVATION: Histone modifications are among the most important factors that control gene regulation. Computational methods that predict gene expression from histone modification signals are highly desirable for understanding their combinatorial effects in gene regulation. This knowledge can help in developing 'epigenetic drugs' for diseases like cancer. Previous studies for quantifying the relationship between histone modifications and gene expression levels either failed to capture combinatorial effects or relied on multiple methods that separate predictions and combinatorial analysis. This paper develops a unified discriminative framework using a deep convolutional neural network to classify gene expression using histone modification data as input. Our system, called DeepChrome, allows automatic extraction of complex interactions among important features. To simultaneously visualize the combinatorial interactions among histone modifications, we propose a novel optimization-based technique that generates feature pattern maps from the learnt deep model. This provides an intuitive description of underlying epigenetic mechanisms that regulate genes. RESULTS: We show that DeepChrome outperforms state-of-the-art models like Support Vector Machines and Random Forests for gene expression classification task on 56 different cell-types from REMC database. The output of our visualization technique not only validates the previous observations but also allows novel insights about combinatorial interactions among histone modification marks, some of which have recently been observed by experimental studies. AVAILABILITY AND IMPLEMENTATION: Codes and results are available at www.deepchrome.org CONTACT: yanjun@virginia.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Regulation , Histone Code , Support Vector Machine , Cluster Analysis , Computational Biology , Epigenesis, Genetic , Gene Regulatory Networks , Humans , Neural Networks, Computer
7.
Nucleic Acids Res ; 43(18): e118, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26032770

ABSTRACT

The CRISPR system has become a powerful biological tool with a wide range of applications. However, improving targeting specificity and accurately predicting potential off-targets remains a significant goal. Here, we introduce a web-based CRISPR/Cas9 Off-target Prediction and Identification Tool (CROP-IT) that performs improved off-target binding and cleavage site predictions. Unlike existing prediction programs that solely use DNA sequence information, CROP-IT integrates whole genome level biological information from existing Cas9 binding and cleavage data sets. Utilizing whole-genome chromatin state information from 125 human cell types further enhances its computational prediction power. Comparative analyses on experimentally validated datasets show that CROP-IT outperforms existing computational algorithms in predicting both Cas9 binding and cleavage sites. With a user-friendly web interface, CROP-IT outputs a scored and ranked list of potential off-targets that enables improved guide RNA design and more accurate prediction of Cas9 binding or cleavage sites.


Subject(s)
CRISPR-Associated Proteins/metabolism , CRISPR-Cas Systems , Chromatin/metabolism , Deoxyribonucleases/metabolism , Software , Algorithms , Binding Sites , DNA Cleavage , Humans , Sequence Analysis, DNA , Sequence Analysis, RNA/methods
8.
JMIR Med Educ ; 10: e51391, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38349725

ABSTRACT

BACKGROUND: Patients with rare and complex diseases often experience delayed diagnoses and misdiagnoses because comprehensive knowledge about these diseases is limited to only a few medical experts. In this context, large language models (LLMs) have emerged as powerful knowledge aggregation tools with applications in clinical decision support and education domains. OBJECTIVE: This study aims to explore the potential of 3 popular LLMs, namely Bard (Google LLC), ChatGPT-3.5 (OpenAI), and GPT-4 (OpenAI), in medical education to enhance the diagnosis of rare and complex diseases while investigating the impact of prompt engineering on their performance. METHODS: We conducted experiments on publicly available complex and rare cases to achieve these objectives. We implemented various prompt strategies to evaluate the performance of these models using both open-ended and multiple-choice prompts. In addition, we used a majority voting strategy to leverage diverse reasoning paths within language models, aiming to enhance their reliability. Furthermore, we compared their performance with the performance of human respondents and MedAlpaca, a generative LLM specifically designed for medical tasks. RESULTS: Notably, all LLMs outperformed the average human consensus and MedAlpaca, with a minimum margin of 5% and 13%, respectively, across all 30 cases from the diagnostic case challenge collection. On the frequently misdiagnosed cases category, Bard tied with MedAlpaca but surpassed the human average consensus by 14%, whereas GPT-4 and ChatGPT-3.5 outperformed MedAlpaca and the human respondents on the moderately often misdiagnosed cases category with minimum accuracy scores of 28% and 11%, respectively. The majority voting strategy, particularly with GPT-4, demonstrated the highest overall score across all cases from the diagnostic complex case collection, surpassing that of other LLMs. 
On the Medical Information Mart for Intensive Care-III data sets, Bard and GPT-4 achieved the highest diagnostic accuracy scores, with multiple-choice prompts scoring 93%, whereas ChatGPT-3.5 and MedAlpaca scored 73% and 47%, respectively. Furthermore, our results demonstrate that there is no one-size-fits-all prompting approach for improving the performance of LLMs and that a single strategy does not universally apply to all LLMs. CONCLUSIONS: Our findings shed light on the diagnostic capabilities of LLMs and the challenges associated with identifying an optimal prompting strategy that aligns with each language model's characteristics and specific task requirements. The significance of prompt engineering is highlighted, providing valuable insights for researchers and practitioners who use these language models for medical training. Furthermore, this study represents a crucial step toward understanding how LLMs can enhance diagnostic reasoning in rare and complex medical cases, paving the way for developing effective educational tools and accurate diagnostic aids to improve patient care and outcomes.


Subject(s)
Learning , Problem Solving , Humans , Reproducibility of Results , Educational Status , Language
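
The majority voting strategy mentioned in the abstract above aggregates several sampled model answers into one. A minimal sketch of such a vote is shown here; the normalization step (trimming and lowercasing), the function name, and the example diagnoses are assumptions for illustration, not details taken from the study.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer and the share of votes it received.

    Answers are compared after trimming whitespace and lowercasing;
    blank answers are ignored.
    """
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        raise ValueError("no answers to vote on")
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)


# Hypothetical answers sampled from one model for one case.
sampled = ["Sarcoidosis", "sarcoidosis ", "Lymphoma"]
print(majority_vote(sampled))
```

Free-text diagnoses rarely match character for character, so a real pipeline would likely need a stronger equivalence check than string normalization.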
9.
JMIR Med Inform ; 12: e50209, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38896468

ABSTRACT

BACKGROUND: Diagnostic errors pose significant health risks and contribute to patient mortality. With the growing accessibility of electronic health records, machine learning models offer a promising avenue for enhancing diagnosis quality. Current research has primarily focused on a limited set of diseases with ample training data, neglecting diagnostic scenarios with limited data availability. OBJECTIVE: This study aims to develop an information retrieval (IR)-based framework that accommodates data sparsity to facilitate broader diagnostic decision support. METHODS: We introduced an IR-based diagnostic decision support framework called CliniqIR. It uses clinical text records, the Unified Medical Language System Metathesaurus, and 33 million PubMed abstracts to classify a broad spectrum of diagnoses independent of training data availability. CliniqIR is designed to be compatible with any IR framework. Therefore, we implemented it using both dense and sparse retrieval approaches. We compared CliniqIR's performance to that of pretrained clinical transformer models such as Clinical Bidirectional Encoder Representations from Transformers (ClinicalBERT) in supervised and zero-shot settings. Subsequently, we combined the strength of supervised fine-tuned ClinicalBERT and CliniqIR to build an ensemble framework that delivers state-of-the-art diagnostic predictions. RESULTS: On a complex diagnosis data set (DC3) without any training data, CliniqIR models returned the correct diagnosis within their top 3 predictions. On the Medical Information Mart for Intensive Care III data set, CliniqIR models surpassed ClinicalBERT in predicting diagnoses with <5 training samples by an average difference in mean reciprocal rank of 0.10. In a zero-shot setting where models received no disease-specific training, CliniqIR still outperformed the pretrained transformer models with a greater mean reciprocal rank of at least 0.10. 
Furthermore, in most conditions, our ensemble framework surpassed the performance of its individual components, demonstrating its enhanced ability to make precise diagnostic predictions. CONCLUSIONS: Our experiments highlight the importance of IR in leveraging unstructured knowledge resources to identify infrequently encountered diagnoses. In addition, our ensemble framework benefits from combining the complementary strengths of the supervised and retrieval-based models to diagnose a broad spectrum of diseases.
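
The comparisons in the abstract above are reported as mean reciprocal rank (MRR). For readers unfamiliar with the metric, a minimal sketch follows: each case scores 1/rank of the correct diagnosis in the model's ranked list, or 0 if it is absent, and the scores are averaged. The function name and the toy labels in the usage are illustrative assumptions.

```python
def mean_reciprocal_rank(ranked_predictions, gold_labels):
    """Average of 1/rank of the gold label in each ranked prediction list.

    A case whose gold label does not appear in its list contributes 0.
    """
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("need one gold label per prediction list")
    if not gold_labels:
        raise ValueError("no cases to score")
    total = 0.0
    for predictions, gold in zip(ranked_predictions, gold_labels):
        for rank, label in enumerate(predictions, start=1):
            if label == gold:
                total += 1.0 / rank
                break
    return total / len(gold_labels)


# Toy example: gold label ranked 1st, 2nd, and missing.
ranked = [["a", "b"], ["b", "a"], ["c"]]
gold = ["a", "a", "x"]
print(mean_reciprocal_rank(ranked, gold))
```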

10.
iScience ; 27(5): 109570, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38646172

ABSTRACT

The three-dimensional organization of genomes plays a crucial role in essential biological processes. The segregation of chromatin into A and B compartments highlights regions of activity and inactivity, providing a window into the genomic activities specific to each cell type. Yet, the steep costs associated with acquiring Hi-C data, necessary for studying this compartmentalization across various cell types, pose a significant barrier to studying cell-type-specific genome organization. To address this, we present a prediction tool called compartment prediction using recurrent neural networks (CoRNN), which predicts compartmentalization of the 3D genome using histone modification enrichment. CoRNN demonstrates robust cross-cell-type prediction of A/B compartments with an average AuROC of 90.9%. Cell-type-specific predictions align well with known functional elements, with H3K27ac and H3K36me3 identified as highly predictive histone marks. We further investigated our mispredictions and found that they are located in regions with ambiguous compartmental status. Furthermore, our model's generalizability is validated by predicting compartments in independent tissue samples, which underscores its broad applicability.

12.
bioRxiv ; 2023 Jan 25.
Article in English | MEDLINE | ID: mdl-36747724

ABSTRACT

With the rapid advance of single-cell RNA sequencing (scRNA-seq) technology, understanding biological processes at a more refined single-cell level is becoming possible. Gene co-expression estimation is an essential step in this direction. It can annotate functionalities of unknown genes or construct the basis of gene regulatory network inference. This study thoroughly tests the existing gene co-expression estimation methods on simulation datasets with known ground truth co-expression networks. We generate these novel datasets using two simulation processes that use the parameters learned from the experimental data. We demonstrate that these simulations better capture the underlying properties of the real-world single-cell datasets than previously tested simulations for the task. Our performance results on tens of simulated and eight experimental datasets show that all methods produce estimations with a high false discovery rate potentially caused by high-sparsity levels in the data. Finally, we find that commonly used pre-processing approaches, such as normalization and imputation, do not improve the co-expression estimation. Overall, our benchmark setup contributes to the co-expression estimator development, and our study provides valuable insights for the community of single-cell data analyses.

13.
Microscopy (Oxf) ; 2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37864808

ABSTRACT

We present a graph neural network (GNN)-based framework applied to large-scale microscopy image segmentation tasks. While deep learning models, like convolutional neural networks (CNNs), have become common for automating image segmentation tasks, they are limited by the image size that can fit in the memory of computational hardware. In a GNN framework, large-scale images are converted into graphs using superpixels (regions of pixels with similar color/intensity values), allowing us to input information from the entire image into the model. By converting images with hundreds of millions of pixels to graphs with thousands of nodes, we can segment large images using memory-limited computational resources. We compare the performance of GNN- and CNN-based segmentation in terms of accuracy, training time and required graphics processing unit memory. Based on our experiments with microscopy images of biological cells and cell colonies, GNN-based segmentation used one to three orders of magnitude fewer computational resources with only a change in accuracy of -2% to +0.3%. Furthermore, errors due to superpixel generation can be reduced by either using better superpixel generation algorithms or increasing the number of superpixels, thereby allowing for improvement in the GNN framework's accuracy. This trade-off between accuracy and computational cost over CNN models makes the GNN framework attractive for many large-scale microscopy image segmentation tasks in biology.

14.
Genes (Basel) ; 15(1)2023 12 29.
Article in English | MEDLINE | ID: mdl-38254945

ABSTRACT

Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of a coarse resolution, which makes it impractical to study finer chromatin features such as Topologically Associating Domains (TADs) and chromatin loops. Multiple deep learning-based methods have recently been proposed to increase the resolution of these datasets by imputing Hi-C reads (typically called upscaling). However, the existing works evaluate these methods on either synthetically downsampled datasets, or a small subset of experimentally generated sparse Hi-C datasets, making it hard to establish their generalizability in the real-world use case. We present our framework, Hi-CY, that compares existing Hi-C resolution upscaling methods on seven experimentally generated low-resolution Hi-C datasets belonging to various levels of read sparsities originating from three cell lines on a comprehensive set of evaluation metrics. Hi-CY also includes four downstream analysis tasks, such as TAD and chromatin loops recall, to provide a thorough report on the generalizability of these methods. We observe that existing deep learning methods fail to generalize to experimentally generated sparse Hi-C datasets, showing a performance reduction of up to 57%. As a potential solution, we find that retraining deep learning-based methods with experimentally generated Hi-C datasets improves performance by up to 31%. More importantly, Hi-CY shows that even with retraining, the existing deep learning-based methods struggle to recover biological features such as chromatin loops and TADs when provided with sparse Hi-C datasets. Our study, through the Hi-CY framework, highlights the need for rigorous evaluation in the future. We identify specific avenues for improvements in the current deep learning-based Hi-C upscaling methods, including but not limited to using experimentally generated datasets for training.


Subject(s)
Deep Learning , Benchmarking , Cell Line , Chromatin/genetics
15.
Med Phys ; 50(8): 4943-4959, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36847185

ABSTRACT

PURPOSE: State-of-the-art automated segmentation methods achieve exceptionally high performance on the Brain Tumor Segmentation (BraTS) challenge, a dataset of uniformly processed and standardized magnetic resonance generated images (MRIs) of gliomas. However, a reasonable concern is that these models may not fare well on clinical MRIs that do not belong to the specially curated BraTS dataset. Research using the previous generation of deep learning models indicates significant performance loss on cross-institutional predictions. Here, we evaluate the cross-institutional applicability and generalizability of state-of-the-art deep learning models on new clinical data. METHODS: We train a state-of-the-art 3D U-Net model on the conventional BraTS dataset comprising low- and high-grade gliomas. We then evaluate the performance of this model for automatic segmentation of brain tumors on in-house clinical data. This dataset contains MRIs of different tumor types, resolutions, and standardization than those found in the BraTS dataset. Ground truth segmentations to validate the automated segmentation for in-house clinical data were obtained from expert radiation oncologists. RESULTS: We report average Dice scores of 0.764, 0.648, and 0.61 for the whole tumor, tumor core, and enhancing tumor, respectively, in the clinical MRIs. These means are higher than numbers reported previously on same-institution and cross-institution datasets of different origin using different methods. There is no statistically significant difference when comparing the Dice scores to the inter-annotation variability between two expert clinical radiation oncologists. Although performance on the clinical data is lower than on the BraTS data, these numbers indicate that models trained on the BraTS dataset have impressive segmentation performance on previously unseen images obtained at a separate clinical institution.
These images differ in the imaging resolutions, standardization pipelines, and tumor types from the BraTS data. CONCLUSIONS: State-of-the-art deep learning models demonstrate promising performance on cross-institutional predictions. They considerably improve on previous models and can transfer knowledge to new types of brain tumors without additional modeling.


Subject(s)
Brain Neoplasms , Glioma , Humans , Brain Neoplasms/diagnostic imaging , Glioma/diagnostic imaging , Health Facilities
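
The segmentation results in the abstract above are reported as Dice scores. As a reminder of how that overlap measure is computed, here is a minimal sketch over flattened binary masks: Dice = 2|A ∩ B| / (|A| + |B|). The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
def dice_score(predicted, truth):
    """Dice overlap between two equal-length binary masks (sequences of 0/1)."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(predicted, truth) if p and t)
    size_sum = sum(1 for p in predicted if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / size_sum


# Toy flattened masks: one voxel in common, two foreground voxels each.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```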
16.
Cell Rep ; 42(12): 113500, 2023 12 26.
Article in English | MEDLINE | ID: mdl-38032797

ABSTRACT

Aging is a major risk factor for many diseases. Accurate methods for predicting age in specific cell types are essential to understand the heterogeneity of aging and to assess rejuvenation strategies. However, classifying organismal age at single-cell resolution using transcriptomics is challenging due to sparsity and noise. Here, we developed CellBiAge, a robust and easy-to-implement machine learning pipeline, to classify the age of single cells in the mouse brain using single-cell transcriptomics. We show that binarization of gene expression values for the top highly variable genes significantly improved test performance across different models, techniques, sexes, and brain regions, with potential age-related genes identified for model prediction. Additionally, we demonstrate CellBiAge's ability to capture exercise-induced rejuvenation in neural stem cells. This study provides a broadly applicable approach for robust classification of organismal age of single cells in the mouse brain, which may aid in understanding the aging process and evaluating rejuvenation methods.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Animals , Mice , Single-Cell Analysis/methods , Machine Learning , Cellular Senescence , Aging
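
The CellBiAge abstract above reports that binarizing expression values for the top highly variable genes improved classification. The sketch below shows one simple way such a preprocessing step could look. It is not the CellBiAge pipeline: ranking genes by plain variance, binarizing at "expression greater than zero", and all names and toy counts are assumptions made for illustration.

```python
from statistics import pvariance


def binarize_top_variable_genes(counts, gene_names, n_top):
    """Select the n_top most variable genes and binarize their expression.

    counts: list of rows, one per cell, each with one value per gene.
    Returns (selected gene names, binary matrix with one row per cell).
    """
    columns = list(zip(*counts))  # one tuple of values per gene
    variances = [pvariance(column) for column in columns]
    ranked = sorted(range(len(gene_names)), key=lambda j: variances[j], reverse=True)
    top = sorted(ranked[:n_top])  # restore the original gene order
    selected = [gene_names[j] for j in top]
    binary = [[1 if row[j] > 0 else 0 for j in top] for row in counts]
    return selected, binary


# Toy matrix: 3 cells x 3 genes (hypothetical counts).
toy_counts = [[0, 5, 1], [0, 0, 1], [3, 9, 1]]
genes, matrix = binarize_top_variable_genes(toy_counts, ["g1", "g2", "g3"], 2)
print(genes, matrix)
```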
17.
Cancer Res ; 83(12): 1984-1999, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37101376

ABSTRACT

Chitinase 3-like 1 (Chi3l1) is a secreted protein that is highly expressed in glioblastoma. Here, we show that Chi3l1 alters the state of glioma stem cells (GSC) to support tumor growth. Exposure of patient-derived GSCs to Chi3l1 reduced the frequency of CD133+SOX2+ cells and increased the CD44+Chi3l1+ cells. Chi3l1 bound to CD44 and induced phosphorylation and nuclear translocation of β-catenin, Akt, and STAT3. Single-cell RNA sequencing and RNA velocity following incubation of GSCs with Chi3l1 showed significant changes in GSC state dynamics driving GSCs towards a mesenchymal expression profile and reducing transition probabilities towards terminal cellular states. ATAC-seq revealed that Chi3l1 increases accessibility of promoters containing a Myc-associated zinc finger protein (MAZ) transcription factor footprint. Inhibition of MAZ downregulated a set of genes with high expression in cellular clusters that exhibit significant cell state transitions after treatment with Chi3l1, and MAZ deficiency rescued the Chi3L-induced increase of GSC self-renewal. Finally, targeting Chi3l1 in vivo with a blocking antibody inhibited tumor growth and increased the probability of survival. Overall, this work suggests that Chi3l1 interacts with CD44 on the surface of GSCs to induce Akt/β-catenin signaling and MAZ transcriptional activity, which in turn upregulates CD44 expression in a pro-mesenchymal feed-forward loop. The role of Chi3l1 in regulating cellular plasticity confers a targetable vulnerability to glioblastoma. SIGNIFICANCE: Chi3l1 is a modulator of glioma stem cell states that can be targeted to promote differentiation and suppress growth of glioblastoma.


Subject(s)
Brain Neoplasms , Glioblastoma , Glioma , Humans , Glioblastoma/pathology , beta Catenin/metabolism , Proto-Oncogene Proteins c-akt/metabolism , Neoplastic Stem Cells/pathology , Glioma/metabolism , Brain Neoplasms/pathology , Cell Line, Tumor , Cell Proliferation
18.
J Am Med Inform Assoc ; 29(12): 2014-2022, 2022 11 14.
Article in English | MEDLINE | ID: mdl-36149257

ABSTRACT

OBJECTIVE: Alzheimer's disease (AD) is the most common neurodegenerative disorder with one of the most complex pathogeneses, making effective and clinically actionable decision support difficult. The objective of this study was to develop a novel multimodal deep learning framework to aid medical professionals in AD diagnosis. MATERIALS AND METHODS: We present a Multimodal Alzheimer's Disease Diagnosis framework (MADDi) to accurately detect the presence of AD and mild cognitive impairment (MCI) from imaging, genetic, and clinical data. MADDi is novel in that we use cross-modal attention, which captures interactions between modalities, a method not previously explored in this domain. We perform multi-class classification, a challenging task considering the strong similarities between MCI and AD. We compare with previous state-of-the-art models, evaluate the importance of attention, and examine the contribution of each modality to the model's performance. RESULTS: MADDi classifies MCI, AD, and controls with 96.88% accuracy on a held-out test set. When examining the contribution of different attention schemes, we found that the combination of cross-modal attention with self-attention performed the best, and a model with no attention layers performed the worst, with a 7.9% difference in F1-scores. DISCUSSION: Our experiments underlined the importance of structured clinical data to help machine learning models contextualize and interpret the remaining modalities. Extensive ablation studies showed that any multimodal mixture of input features without access to structured clinical information suffered marked performance losses. CONCLUSION: This study demonstrates the merit of combining multiple input modalities via cross-modal attention to deliver highly accurate AD diagnostic decision support.
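
As a rough illustration of what "cross-modal attention" means in general, the sketch below computes scaled dot-product attention in which queries come from one modality and keys and values from another. It is a generic single-head example, not MADDi's architecture: the function names, the feature dimensions, the random projection matrices, and the "imaging" and "clinical" toy inputs are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head attention: one modality queries another.

    query_feats:   (n_q, d_in) features of the modality asking the question
    context_feats: (n_c, d_in) features of the modality being attended to
    w_q, w_k, w_v: (d_in, d) projection matrices (random here; learned in practice)
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_q, n_c) similarity scores
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ v, weights

# Hypothetical toy inputs: 4 imaging tokens and 3 clinical tokens, 8 features each.
rng = np.random.default_rng(0)
imaging = rng.normal(size=(4, 8))
clinical = rng.normal(size=(3, 8))
d = 5
w_q, w_k, w_v = (rng.normal(size=(8, d)) for _ in range(3))
attended, weights = cross_modal_attention(imaging, clinical, w_q, w_k, w_v)
```

Self-attention is the special case in which `query_feats` and `context_feats` are the same array, which is one way to read the abstract's comparison between attention schemes.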


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Deep Learning , Humans , Alzheimer Disease/diagnosis , Magnetic Resonance Imaging/methods , Cognitive Dysfunction/diagnosis , Machine Learning
19.
J Comput Biol ; 29(1): 3-18, 2022 01.
Article in English | MEDLINE | ID: mdl-35050714

ABSTRACT

Recent advances in sequencing technologies have allowed us to capture various aspects of the genome at single-cell resolution. However, with the exception of a few co-assaying technologies, it is not possible to simultaneously apply different sequencing assays on the same single cell. In this scenario, computational integration of multi-omic measurements is crucial to enable joint analyses. This integration task is particularly challenging due to the lack of sample-wise or feature-wise correspondences. We present single-cell alignment with optimal transport (SCOT), an unsupervised algorithm that uses the Gromov-Wasserstein optimal transport to align single-cell multi-omics data sets. SCOT performs on par with the current state-of-the-art unsupervised alignment methods, is faster, and requires tuning of fewer hyperparameters. More importantly, SCOT uses a self-tuning heuristic to guide hyperparameter selection based on the Gromov-Wasserstein distance. Thus, in the fully unsupervised setting, SCOT aligns single-cell data sets better than the existing methods without requiring any orthogonal correspondence information.


Subject(s)
Algorithms , Genomics/statistics & numerical data , Sequence Alignment/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Computational Biology , Computer Simulation , Databases, Genetic/statistics & numerical data , Humans , Models, Statistical , Unsupervised Machine Learning
20.
J Comput Biol ; 29(1): 19-22, 2022 01.
Article in English | MEDLINE | ID: mdl-34985990

ABSTRACT

Although the availability of various sequencing technologies allows us to capture different genome properties at single-cell resolution, with the exception of a few co-assaying technologies, applying different sequencing assays on the same single cell is impossible. Single-cell alignment using optimal transport (SCOT) is an unsupervised algorithm that addresses this limitation by using optimal transport to align single-cell multiomics data. First, it preserves the local geometry by constructing a k-nearest neighbor (k-NN) graph for each data set (or domain) to capture the intra-domain distances. SCOT then finds a probabilistic coupling matrix that minimizes the discrepancy between the intra-domain distance matrices. Finally, it uses the coupling matrix to project one single-cell data set onto another through barycentric projection, thus aligning them. SCOT requires tuning only two hyperparameters and is robust to the choice of one. Furthermore, the Gromov-Wasserstein distance in the algorithm can guide SCOT's hyperparameter tuning in a fully unsupervised setting when no orthogonal alignment information is available. Thus, SCOT is a fast and accurate alignment method that provides a heuristic for hyperparameter selection in a real-world unsupervised single-cell data alignment scenario. We provide a tutorial for SCOT and make its source code publicly available on GitHub.
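
The three steps named in the abstract (k-NN graph distances, a coupling matrix, barycentric projection) can be sketched in plain NumPy as follows. This is an illustrative re-implementation under stated assumptions, not the authors' released code: the function names, the uniform cell weights, the squared-loss entropic Gromov-Wasserstein iteration with Sinkhorn updates, the fixed iteration counts, and the synthetic data are all choices made here. The two tunable values `k` and `eps` are assumed to correspond to the two hyperparameters the abstract mentions.

```python
import numpy as np

def knn_geodesic_distances(X, k):
    """Normalized shortest-path distances over a symmetric k-NN graph."""
    n = X.shape[0]
    euclid = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    neighbors = np.argsort(euclid, axis=1)[:, 1:k + 1]  # column 0 is the cell itself
    rows = np.repeat(np.arange(n), k)
    cols = neighbors.ravel()
    graph[rows, cols] = euclid[rows, cols]
    graph = np.minimum(graph, graph.T)  # make the graph undirected
    for mid in range(n):  # Floyd-Warshall all-pairs shortest paths
        graph = np.minimum(graph, graph[:, [mid]] + graph[[mid], :])
    finite = np.isfinite(graph)
    graph[~finite] = graph[finite].max()  # cap pairs in disconnected components
    return graph / graph.max()

def entropic_gromov_wasserstein(C1, C2, eps=0.05, n_outer=50, n_inner=200):
    """Coupling that approximately minimizes the squared-loss GW discrepancy."""
    n, m = C1.shape[0], C2.shape[0]
    p, q = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform cell weights (assumption)
    const = ((C1 ** 2) @ p)[:, None] + ((C2 ** 2) @ q)[None, :]
    T = np.outer(p, q)
    for _ in range(n_outer):
        cost = const - 2.0 * C1 @ T @ C2.T            # linearized GW cost at current T
        K = np.exp(-(cost - cost.min()) / eps)        # shift by min for stability
        u = np.ones(n)
        for _ in range(n_inner):                      # Sinkhorn scaling updates
            v = q / (K.T @ u)
            u = p / (K @ v)
        T = u[:, None] * K * v[None, :]
    return T

def barycentric_projection(T, Y):
    """Map each source cell to the coupling-weighted average of target cells."""
    return (T @ Y) / T.sum(axis=1, keepdims=True)

# Synthetic example: two "modalities" generated from the same 1-D latent variable.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, size=30))
X = np.stack([t, t ** 2, np.sin(3 * t)], axis=1) + 0.01 * rng.normal(size=(30, 3))
Y = np.stack([np.cos(2 * t), 2 * t], axis=1) + 0.01 * rng.normal(size=(30, 2))

C1 = knn_geodesic_distances(X, k=5)
C2 = knn_geodesic_distances(Y, k=5)
coupling = entropic_gromov_wasserstein(C1, C2, eps=0.05)
X_aligned = barycentric_projection(coupling, Y)  # X expressed in Y's feature space
```

Because the projection is a convex combination of rows of `Y`, every aligned point lies within the range of the target data. Gromov-Wasserstein compares only intra-domain distances, so the recovered correspondence can be ambiguous for symmetric data; the sketch makes no claim about alignment accuracy.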


Subject(s)
Algorithms , Sequence Alignment/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Computational Biology , Databases, Genetic/statistics & numerical data , Genomics/statistics & numerical data , Heuristics , Humans , Neural Networks, Computer , Sequence Analysis/statistics & numerical data , Software , Unsupervised Machine Learning