Results 1 - 20 of 25,016
1.
Cell ; 185(10): 1793-1805.e17, 2022 05 12.
Article in English | MEDLINE | ID: mdl-35483372

ABSTRACT

The lack of tools to observe drug-target interactions at cellular resolution in intact tissue has been a major barrier to understanding in vivo drug actions. Here, we develop clearing-assisted tissue click chemistry (CATCH) to optically image covalent drug targets in intact mammalian tissues. CATCH permits specific and robust in situ fluorescence imaging of target-bound drug molecules at subcellular resolution and enables the identification of target cell types. Using well-established inhibitors of endocannabinoid hydrolases and monoamine oxidases, direct or competitive CATCH not only reveals distinct anatomical distributions and predominant cell targets of different drug compounds in the mouse brain but also uncovers unexpected differences in drug engagement across and within brain regions, reflecting rare cell types, as well as dose-dependent target shifts across tissue, cellular, and subcellular compartments that are not accessible by conventional methods. CATCH represents a valuable platform for visualizing in vivo interactions of small molecules in tissue.


Subject(s)
Click Chemistry , Optical Imaging , Animals , Brain , Drug Delivery Systems , Mammals , Mice , Optical Imaging/methods
2.
Cell ; 173(4): 864-878.e29, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29681454

ABSTRACT

Diversity in the genetic lesions that cause cancer is extreme. In consequence, a pressing challenge is the development of drugs that target patient-specific disease mechanisms. To address this challenge, we employed a chemistry-first discovery paradigm for de novo identification of druggable targets linked to robust patient selection hypotheses. In particular, a 200,000 compound diversity-oriented chemical library was profiled across a heavily annotated test-bed of >100 cellular models representative of the diverse and characteristic somatic lesions for lung cancer. This approach led to the delineation of 171 chemical-genetic associations, shedding light on the targetability of mechanistic vulnerabilities corresponding to a range of oncogenotypes present in patient populations lacking effective therapy. Chemically addressable addictions to ciliogenesis in TTC21B mutants and GLUT8-dependent serine biosynthesis in KRAS/KEAP1 double mutants are prominent examples. These observations indicate a wealth of actionable opportunities within the complex molecular etiology of cancer.


Subject(s)
Non-Small-Cell Lung Carcinoma/pathology , Cell Proliferation/drug effects , Lung Neoplasms/pathology , Small Molecule Libraries/pharmacology , Non-Small-Cell Lung Carcinoma/metabolism , Tumor Cell Line , Cytochrome P450 Family 4/deficiency , Cytochrome P450 Family 4/genetics , Drug Discovery , G1 Phase Cell Cycle Checkpoints/drug effects , Glucocorticoids/pharmacology , Facilitative Glucose Transport Proteins/antagonists & inhibitors , Facilitative Glucose Transport Proteins/genetics , Facilitative Glucose Transport Proteins/metabolism , Humans , Kelch-Like ECH-Associated Protein 1/genetics , Kelch-Like ECH-Associated Protein 1/metabolism , Lung Neoplasms/metabolism , Microtubule-Associated Proteins/genetics , Microtubule-Associated Proteins/metabolism , Mutation , NF-E2-Related Factor 2/antagonists & inhibitors , NF-E2-Related Factor 2/genetics , NF-E2-Related Factor 2/metabolism , Proto-Oncogene Proteins p21(ras)/genetics , Proto-Oncogene Proteins p21(ras)/metabolism , RNA Interference , Small Interfering RNA/metabolism , Notch2 Receptor/genetics , Notch2 Receptor/metabolism , Glucocorticoid Receptors/antagonists & inhibitors , Glucocorticoid Receptors/genetics , Glucocorticoid Receptors/metabolism , Small Molecule Libraries/chemistry , Small Molecule Libraries/metabolism
3.
Cell ; 172(3): 549-563.e16, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29275860

ABSTRACT

The immune system can mount T cell responses against tumors; however, the antigen specificities of tumor-infiltrating lymphocytes (TILs) are not well understood. We used yeast-display libraries of peptide-human leukocyte antigen (pHLA) to screen for antigens of "orphan" T cell receptors (TCRs) expressed on TILs from human colorectal adenocarcinoma. Four TIL-derived TCRs exhibited strong selection for peptides presented in a highly diverse pHLA-A∗02:01 library. Three of the TIL TCRs were specific for non-mutated self-antigens, two of which were present in separate patient tumors, and shared specificity for a non-mutated self-antigen derived from U2AF2. These results show that the exposed recognition surface of MHC-bound peptides accessible to the TCR contains sufficient structural information to enable the reconstruction of sequences of peptide targets for pathogenic TCRs of unknown specificity. This finding underscores the surprising specificity of TCRs for their cognate antigens and enables the facile identification of tumor antigens through unbiased screening.


Subject(s)
Adenocarcinoma/immunology , Neoplasm Antigens/immunology , Colorectal Neoplasms/immunology , Tumor-Infiltrating Lymphocytes/immunology , T-Cell Antigen Receptors/immunology , Aged , Animals , Neoplasm Antigens/chemistry , Tumor Cell Line , Cultured Cells , HEK293 Cells , HLA-A Antigens/chemistry , HLA-A Antigens/immunology , Humans , Male , Middle Aged , Peptide Library , Sf9 Cells , Spodoptera
4.
Mol Cell ; 83(22): 4106-4122.e10, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37977120

ABSTRACT

γ-Secretases mediate the regulated intramembrane proteolysis (RIP) of more than 150 integral membrane proteins. We developed an unbiased γ-secretase substrate identification (G-SECSI) method to study to what extent these proteins are processed in parallel. We demonstrate here parallel processing of at least 85 membrane proteins in human microglia in steady-state cell culture conditions. Pharmacological inhibition of γ-secretase caused substantial changes of human microglial transcriptomes, including the expression of genes related to the disease-associated microglia (DAM) response described in Alzheimer disease (AD). While the overall effects of γ-secretase deficiency on transcriptomic cell states remained limited in control conditions, exposure of mouse microglia to AD-inducing amyloid plaques strongly blocked their capacity to mount this putatively protective DAM cell state. We conclude that γ-secretase serves as a critical signaling hub integrating the effects of multiple extracellular stimuli into the overall transcriptome of the cell.


Subject(s)
Alzheimer Disease , Amyloid Precursor Protein Secretases , Mice , Animals , Humans , Amyloid Precursor Protein Secretases/genetics , Amyloid Precursor Protein Secretases/metabolism , Proteome/genetics , Signal Transduction , Membrane Proteins/metabolism , Alzheimer Disease/genetics
5.
Trends Biochem Sci ; 49(3): 224-235, 2024 03.
Article in English | MEDLINE | ID: mdl-38160064

ABSTRACT

At its most fundamental level, life is a collection of synchronized cellular processes driven by interactions among biomolecules. Proximity labeling has emerged as a powerful technique to capture these interactions in native settings, revealing previously unexplored elements of biology. This review highlights recent developments in proximity labeling, focusing on methods that push the fundamental technologies beyond the classic bait-prey paradigm, such as RNA-protein interactions, ligand/small-molecule-protein interactions, cell surface protein interactions, and subcellular protein trafficking. The advancement of proximity labeling methods to address different biological problems will accelerate our understanding of the complex biological systems that make up life.


Subject(s)
Membrane Proteins , Proteomics , Proteomics/methods , Membrane Proteins/metabolism
6.
Mol Cell ; 79(1): 191-198.e3, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32619469

ABSTRACT

We recently used CRISPRi/a-based chemical-genetic screens and cell biological, biochemical, and structural assays to determine that rigosertib, an anti-cancer agent in phase III clinical trials, kills cancer cells by destabilizing microtubules. Reddy and co-workers (Baker et al., 2020, this issue of Molecular Cell) suggest that a contaminating degradation product in commercial formulations of rigosertib is responsible for the microtubule-destabilizing activity. Here, we demonstrate that cells treated with pharmaceutical-grade rigosertib (>99.9% purity) or commercially obtained rigosertib have qualitatively indistinguishable phenotypes across multiple assays. The two formulations have indistinguishable chemical-genetic interactions with genes that modulate microtubule stability, both destabilize microtubules in cells and in vitro, and expression of a rationally designed tubulin mutant with a mutation in the rigosertib binding site (L240F TUBB) allows cells to proliferate in the presence of either formulation. Importantly, the specificity of the L240F TUBB mutant for microtubule-destabilizing agents has been confirmed independently. Thus, rigosertib kills cancer cells by destabilizing microtubules, in agreement with our original findings.


Subject(s)
Antineoplastic Agents/pharmacology , Cell Proliferation , Glycine/analogs & derivatives , Microtubules/drug effects , Neoplasms/pathology , Pharmaceutical Preparations/metabolism , Sulfones/pharmacology , Tubulin/metabolism , Cultured Cells , X-Ray Crystallography , Drug Contamination , Glycine/pharmacology , Humans , Mutation , Neoplasms/drug therapy , Neoplasms/metabolism , Pharmaceutical Preparations/chemistry , Protein Conformation , Tubulin/chemistry , Tubulin/genetics
7.
Annu Rev Pharmacol Toxicol ; 64: 527-550, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-37738505

ABSTRACT

Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small-molecule drugs. AI technologies, such as generative chemistry, machine learning, and multiproperty optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.


Subject(s)
Artificial Intelligence , Physicians , Animals , Humans , Reproducibility of Results , Drug Discovery , Technology
8.
Am J Hum Genet ; 111(9): 1899-1913, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39173627

ABSTRACT

Understanding the molecular mechanisms of complex traits is essential for developing targeted interventions. We analyzed liver expression quantitative-trait locus (eQTL) meta-analysis data on 1,183 participants to identify conditionally distinct signals. We found 9,013 eQTL signals for 6,564 genes; 23% of eGenes had two signals, and 6% had three or more signals. We then integrated the eQTL results with data from 29 cardiometabolic genome-wide association study (GWAS) traits and identified 1,582 GWAS-eQTL colocalizations for 747 eGenes. Non-primary eQTL signals accounted for 17% of all colocalizations. Isolating signals by conditional analysis prior to coloc resulted in 37% more colocalizations than using marginal eQTL and GWAS data, highlighting the importance of signal isolation. Isolating signals also led to stronger evidence of colocalization: among 343 eQTL-GWAS signal pairs in multi-signal regions, analyses that isolated the signals of interest resulted in higher posterior probability of colocalization for 41% of tests. Leveraging allelic heterogeneity, we predicted causal effects of gene expression on liver traits for four genes. To predict functional variants and regulatory elements, we colocalized eQTL with liver chromatin accessibility QTL (caQTL) and found 391 colocalizations, including 73 with non-primary eQTL signals and 60 eQTL signals that colocalized with both a caQTL and a GWAS signal. Finally, we used publicly available massively parallel reporter assays in HepG2 to highlight 14 eQTL signals that include at least one expression-modulating variant. This multi-faceted approach to unraveling the genetic underpinnings of liver-related traits could lead to therapeutic development.


Subject(s)
Genome-Wide Association Study , Liver , Quantitative Trait Loci , Humans , Alleles , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease , Liver/metabolism , Phenotype , Single Nucleotide Polymorphism
9.
Proc Natl Acad Sci U S A ; 121(24): e2321809121, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38781227

ABSTRACT

The modern canon of open science consists of five "schools of thought" that justify unfettered access to the fruits of scientific research: i) public engagement, ii) democratic right of access, iii) efficiency of knowledge gain, iv) shared technology, and v) better assessment of impact. Here, we introduce a sixth school: due process. Due process under the law includes a right to "discovery" by a defendant of potentially exculpatory evidence held by the prosecution. When such evidence is scientific, due process becomes a Constitutional mandate for open science. To illustrate the significance of this new school, we present a case study from forensics, which centers on a federally funded investigation that reports summary statistics indicating that identification decisions made by forensic firearms examiners are highly accurate. Because of growing concern about validity of forensic methods, the larger scientific community called for public release of the complete analyzable dataset for independent audit and verification. Those in possession of the data opposed release for three years while summary statistics were used by prosecutors to gain admissibility of evidence in criminal trials. Those statistics paint an incomplete picture and hint at flaws in experimental design and analysis. Under the circumstances, withholding the underlying data in a criminal proceeding violates due process. Following the successful open-science model of drug validity testing through "clinical trials," which place strict requirements on experimental design and timing of data release, we argue for registered and open "forensic trials" to ensure transparency and accountability.


Subject(s)
Forensic Sciences , Humans , Forensic Sciences/methods , Firearms/legislation & jurisprudence
10.
Hum Mol Genet ; 33(6): 478-490, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-37971354

ABSTRACT

BACKGROUND: Colorectal cancer (CRC) is impacted by various environmental and genetic variables. Dysregulation of vesicle-mediated transport-related genes (VMTRGs) has been observed in many malignancies, but their effect on prognosis in CRC remains unclear. METHODS: CRC samples were clustered into subtypes according to differential expression of VMTRGs. An R package was used to explore differences in survival, immune characteristics, and drug sensitivity among the disease subtypes. Based on differentially expressed genes (DEGs) between subtypes, regression analysis was employed to build a riskscore model and identify independent prognostic factors. The model was validated with a Gene Expression Omnibus (GEO) dataset. Immune landscape, immunophenoscore (IPS), and Tumor Immune Dysfunction and Exclusion (TIDE) scores were calculated for the different risk groups. RESULTS: Two subtypes of CRC were identified based on VMTRGs, which showed significant differences in survival rates, immune cell infiltration abundance, immune functional activation levels, and immune checkpoint expression levels. Cluster2 exhibited higher sensitivity than Cluster1 to anti-tumor drugs such as Nilotinib, Cisplatin, and Oxaliplatin. DEGs were mainly enriched in biological processes such as epidermis development, epidermal cell differentiation, and receptor-ligand activity, and in signaling pathways such as pancreatic secretion. The constructed 13-gene riskscore model demonstrated good predictive ability for the prognosis of CRC patients. Furthermore, differences in immune landscape, IPS, and TIDE scores were observed among the risk groups. CONCLUSION: This study identified two CRC subtypes with distinct survival statuses and immune levels based on differential expression of VMTRGs. A 13-gene risk model was constructed. The findings have important implications for the prognosis and treatment of CRC.


Subject(s)
Colorectal Neoplasms , Humans , Prognosis , Biological Transport , Oxaliplatin , Colorectal Neoplasms/genetics
11.
Am J Hum Genet ; 110(8): 1330-1342, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37494930

ABSTRACT

Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.


Subject(s)
Genetic Variation , Lipids , Computer Simulation , Genetic Association Studies , Phenotype , Genome-Wide Association Study
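
The abstract for record 11 says COAST builds on the burden test, which collapses a gene's rare variants into one score per subject and tests that score against the phenotype. The following is a minimal sketch of that general idea only, not of COAST itself. The function names, the three severity categories, and the weights 1, 2, 3 are illustrative assumptions, not taken from the paper.

```python
def burden_scores(genotypes, categories, weights=(1.0, 2.0, 3.0)):
    """Collapse rare variants into one score per subject.

    genotypes:  one row per subject, one minor-allele count per variant.
    categories: severity class (0, 1 or 2) of each variant (hypothetical labels).
    weights:    assumed weights that increase with severity.
    """
    return [
        sum(count * weights[cat] for count, cat in zip(row, categories))
        for row in genotypes
    ]


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data: four subjects, three variants of increasing assumed severity.
genotypes = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
categories = [0, 1, 2]
phenotype = [3.1, 4.9, 7.2, 0.8]

scores = burden_scores(genotypes, categories)
slope = ols_slope(scores, phenotype)
```

A slope that grows with the severity-weighted score is the kind of dose-response pattern the abstract describes; a real analysis would also need covariates and a significance test.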
12.
Am J Hum Genet ; 110(1): 92-104, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36563679

ABSTRACT

Variant interpretation remains a major challenge in medical genetics. We developed Meta-Domain HotSpot (MDHS) to identify mutational hotspots across homologous protein domains. We applied MDHS to a dataset of 45,221 de novo mutations (DNMs) from 31,058 individuals with neurodevelopmental disorders (NDDs) and identified three significantly enriched missense DNM hotspots in the ion transport protein domain family (PF00520). The 37 unique missense DNMs that drive enrichment affect 25 genes, 19 of which were previously associated with NDDs. 3D protein structure modeling supports the hypothesis of function-altering effects of these mutations. Hotspot genes have a unique expression pattern in tissue, and we used this pattern alongside in silico predictors and population constraint information to identify candidate NDD-associated genes. We also propose a lenient version of our method, which identifies 32 hotspot positions across 16 different protein domains. These positions are enriched for likely pathogenic variation in clinical databases and DNMs in other genetic disorders.


Subject(s)
Neurodevelopmental Disorders , Humans , Protein Domains/genetics , Mutation/genetics , Neurodevelopmental Disorders/genetics
13.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279648

ABSTRACT

Virus-encoded circular RNA (circRNA) participates in the immune response to viral infection, affects the human immune system, and can be used as a target for precision therapy and as a tumor biomarker. The coronaviruses SARS-CoV-1 and SARS-CoV-2 (SARS-CoV-1/2) that have emerged in recent years are highly contagious and have high mortality rates. Little is known about the circRNAs encoded by SARS-CoV-1/2. Therefore, this study explores whether SARS-CoV-1/2 encodes circRNAs and, if so, their characteristics and functions. Based on RNA-seq data from SARS-CoV-1 and SARS-CoV-2 infections, we used circRNA identification tools (circRNA_finder, find_circ and CIRI2) to identify circRNAs. We identified 151 circRNAs encoded by SARS-CoV-1 and 470 encoded by SARS-CoV-2, indicating that SARS-CoV-2 has a more prominent circRNA-encoding ability than SARS-CoV-1. Expression analysis showed that only a few circRNAs encoded by SARS-CoV-1/2 were highly expressed, and that the positive strand produced more abundant circRNAs. Then, based on the identified SARS-CoV-1/2-encoded circRNAs, we performed circRNA identification and characterization using the previously developed CirRNAPL. Finally, target gene prediction and functional enrichment analysis were performed. We found that viral circRNA is closely related to cancer and has a potential role in regulating host cell functions. This study characterized the features and functions of viral circRNAs encoded by the coronaviruses SARS-CoV-1/2, providing a valuable resource for further research on the function and molecular mechanism of coronavirus circRNA.


Subject(s)
COVID-19 , MicroRNAs , Neoplasms , Humans , Circular RNA/genetics , SARS-CoV-2/genetics , COVID-19/genetics , Viral RNA/genetics , Neoplasms/genetics , MicroRNAs/genetics
14.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38343326

ABSTRACT

Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome contains all microorganisms from an environmental sample. Correctly identifying viruses from these mixed sequences is critical in viral analyses. It is common to identify long viral sequences that have already been passed through assembly and binning pipelines. Existing deep learning-based methods divide these long sequences into short subsequences and identify them separately. This discards the relationships between the subsequences, leading to poor performance in identifying long viral sequences. In this paper, VirGrapher is proposed to improve the identification of long viral sequences by constructing relationships among the short subsequences derived from long ones. VirGrapher treats a long sequence as a graph and uses a Graph Convolutional Network (GCN) model to learn multilayer connections between nodes from sequences after a GCN-based node embedding model. On the validation set, VirGrapher achieves a better AUC value and accuracy than three benchmark methods.


Subject(s)
Metagenome , Microbiota , Microbiota/genetics , Benchmarking
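
The abstract for record 14 describes dividing a long sequence into short subsequences and treating the long sequence as a graph whose nodes are those subsequences. The sketch below illustrates one simple way to build such a graph. The fixed chunk length and the choice to connect only consecutive chunks are assumptions made for illustration; the abstract does not state how VirGrapher defines its edges, and all function names here are hypothetical.

```python
def split_into_subsequences(sequence, length):
    """Cut a sequence into consecutive, non-overlapping chunks.

    The final chunk may be shorter than `length`.
    """
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]


def chain_edges(num_nodes):
    """Assumed edge rule: link each chunk to the next one along the sequence."""
    return [(i, i + 1) for i in range(num_nodes - 1)]


def adjacency_lists(num_nodes, edges):
    """Undirected adjacency lists, the usual input to a graph neural network."""
    neighbours = {i: [] for i in range(num_nodes)}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return neighbours


nodes = split_into_subsequences("ACGTACGTAC", 4)
edges = chain_edges(len(nodes))
graph = adjacency_lists(len(nodes), edges)
```

Keeping the chunks linked in one graph, rather than classifying each chunk on its own, is what lets a graph model use the relationships that the abstract says separate classification omits.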
15.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39222062

ABSTRACT

Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advancements in sequencing technologies have contributed tremendously toward understanding these microbes at species resolution through a whole shotgun metagenomic approach. In this study, we developed a new bioinformatics tool, coverage-based analysis for identification of microbiome (CAIM), for accurate taxonomic classification and quantification within both long- and short-read metagenomic samples using an alignment-based method. CAIM depends on two different containment techniques to identify species in metagenomic samples, using their genome coverage information to filter out false positives rather than the traditional approach of relative abundance. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. CAIM performed consistently well across datasets, and better than other tools, in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina sequencing platforms, which showed high similarity of taxonomic profiles between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries and 44 primary liver cancer patients and 76 controls. The predictive performance of models using the genome-coverage cutoff was better than that of models using the relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls with highly confident species markers.


Subject(s)
Metagenomics , Microbiota , Humans , Microbiota/genetics , Metagenomics/methods , Computational Biology/methods , Metagenome , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , DNA Sequence Analysis/methods
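
The abstract for record 15 contrasts two ideas: filtering candidate species by genome coverage rather than by relative abundance, and estimating abundance from aligned nucleotides rather than from read counts. The sketch below shows how those two quantities differ on toy alignments. It is not CAIM's implementation; the input layout, the `min_breadth` cutoff, and the function names are assumptions for illustration.

```python
from collections import defaultdict


def covered_bases(intervals):
    """Count distinct positions covered by half-open [start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def profile(alignments, genome_lengths, min_breadth):
    """Keep genomes whose covered fraction reaches min_breadth, then report
    abundance by aligned nucleotides and by read count."""
    intervals = defaultdict(list)
    for genome, start, end in alignments:
        intervals[genome].append((start, end))
    kept = {
        g: ivs for g, ivs in intervals.items()
        if covered_bases(ivs) / genome_lengths[g] >= min_breadth
    }
    total_bases = sum(e - s for ivs in kept.values() for s, e in ivs)
    total_reads = sum(len(ivs) for ivs in kept.values())
    by_bases = {g: sum(e - s for s, e in ivs) / total_bases for g, ivs in kept.items()}
    by_reads = {g: len(ivs) / total_reads for g, ivs in kept.items()}
    return by_bases, by_reads


# Genome B attracts the most reads, but they pile up on one short region.
alignments = [
    ("A", 0, 50), ("A", 40, 90),
    ("B", 0, 10), ("B", 0, 10), ("B", 0, 10),
    ("C", 0, 150),
]
genome_lengths = {"A": 100, "B": 100, "C": 200}
by_bases, by_reads = profile(alignments, genome_lengths, min_breadth=0.5)
```

In this toy case genome B is dropped because only 10% of it is covered, even though it has the most reads, and the one long read on genome C counts for more under the nucleotide-based estimate than under the read-count estimate.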
16.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38324623

ABSTRACT

Recent advances in spatially resolved transcriptomics (SRT) have brought ever-increasing opportunities to characterize the expression landscape in the context of tissue spatiality. Nevertheless, multiple challenges remain in accurately detecting spatial functional regions in tissue. Here, we present a novel contrastive learning framework, SPAtially Contrastive variational AutoEncoder (SpaCAE), which contrasts the transcriptomic signals of each spot and its spatial neighbors to achieve fine-grained detection of tissue structures. By employing a graph embedding variational autoencoder and incorporating a deep contrastive strategy, SpaCAE achieves a balance between spatial local information and global information of expression, enabling effective learning of representations with spatial constraints. In particular, SpaCAE provides a graph deconvolutional decoder to address the smoothing effect of local spatial structure on the self-supervised learning of expression, an aspect often overlooked by current graph neural networks. We demonstrated that SpaCAE could achieve effective performance on SRT data generated from multiple technologies for spatial domain identification and data denoising, making it a remarkable tool to obtain novel insights from SRT studies.


Subject(s)
Gene Expression Profiling , Transcriptome , Computer Neural Networks
17.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385877

ABSTRACT

Recent advancements in spatial transcriptomics technology have revolutionized our ability to comprehensively characterize gene expression patterns within the tissue microenvironment, enabling us to grasp their functional significance in a spatial context. One key field of research in spatial transcriptomics is the identification of spatial domains, which refers to distinct regions within the tissue where specific gene expression patterns are observed. Diverse methodologies have been proposed, each with its unique characteristics. As the availability of spatial transcriptomics data continues to expand, there is a growing need for methods that can integrate information from multiple slices to discover spatial domains. To extend the applicability of existing single-slice analysis methods to multi-slice clustering, we introduce BiGATAE (Bipartite Graph Attention Auto Encoder) that leverages gene expression information from adjacent tissue slices to enhance spatial transcriptomics data. BiGATAE comprises two steps: aligning slices to generate an adjacency matrix for different spots in consecutive slices and constructing a bipartite graph. Subsequently, it utilizes a graph attention network to integrate information across different slices. Then it can seamlessly integrate with pre-existing techniques. To evaluate the performance of BiGATAE, we conducted benchmarking analyses on three different datasets. The experimental results demonstrate that for existing single-slice clustering methods, the integration of BiGATAE significantly enhances their performance. Moreover, single-slice clustering methods integrated with BiGATAE outperform methods specifically designed for multi-slice integration. These results underscore the proficiency of BiGATAE in facilitating information transfer across multiple slices and its capacity to broaden the applicability and sustainability of pre-existing methods.


Subject(s)
Benchmarking , Gene Expression Profiling , Cluster Analysis
18.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385876

ABSTRACT

Enhancers play an important role in the regulation of gene expression. In a DNA sequence, the abundance or absence of enhancers and irregularities in enhancer strength affect gene expression, which leads to the initiation and propagation of diverse types of genetic diseases such as hemophilia, bladder cancer, diabetes and congenital disorders. Enhancer identification and strength prediction through experimental approaches are expensive, time-consuming and error-prone. To accelerate research on enhancer identification and strength prediction, around 19 computational frameworks have been proposed. These frameworks use machine learning and deep learning methods that take raw DNA sequences and predict the presence and strength of enhancers. However, these frameworks still fall short in performance and are not useful for real-time analysis. This paper presents a novel deep learning framework that uses language modeling strategies to transform DNA sequences into a statistical feature space. It applies transfer learning by training a language model in an unsupervised fashion to predict a group of nucleotides, also known as a k-mer, from the context of the existing k-mers in a sequence. At the classification stage, it presents a novel classifier that reaps the benefits of two different architectures: a convolutional neural network and an attention mechanism. The proposed framework is evaluated on the enhancer identification benchmark dataset, where it outperforms the existing best-performing framework by 5% and 9% in terms of accuracy and MCC, respectively. Similarly, when evaluated on the enhancer strength prediction benchmark dataset, it outperforms the existing best-performing framework by 4% and 7% in terms of accuracy and MCC, respectively.


Subject(s)
Benchmarking , Medicine , Neural Networks, Computer , Nucleotides , Regulatory Sequences, Nucleic Acid
19.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38678389

ABSTRACT

MOTIVATION: Over the past decade, single-cell transcriptomic technologies have advanced remarkably, enabling the simultaneous profiling of gene expression across thousands of individual cells. Cell type identification plays an essential role in exploring tissue heterogeneity and characterizing cell state differences. As more well-annotated reference data become available, many automatic identification methods have been developed to simplify the annotation of unlabeled target data by transferring cell type knowledge. However, in practice, the target data often include some novel cell types that are not in the reference data. Most existing methods classify these private cells as one generic 'unassigned' group and learn the features of known and novel cell types in a coupled way. They are susceptible to potential batch effects and fail to explore the fine-grained semantic knowledge of novel cell types, which hurts the model's discrimination ability. Additionally, emerging spatial transcriptomic technologies, such as in situ hybridization, sequencing and multiplexed imaging, present a novel challenge to current cell type identification strategies, which predominantly neglect spatial organization. Consequently, it is imperative to develop a versatile method that can proficiently annotate single-cell transcriptomics data, encompassing both spatial and non-spatial dimensions. RESULTS: To address these issues, we propose a new, challenging yet realistic task called universal cell type identification for single-cell and spatial transcriptomics data. In this task, we aim to give semantic labels to target cells from known cell types and cluster labels to those from novel ones. To tackle this problem, instead of designing a suboptimal two-stage approach, we propose an end-to-end algorithm called scBOL from the perspective of Bipartite prototype alignment.
Firstly, we identify the mutual nearest clusters in reference and target data as their potential common cell types. On this basis, we mine the cycle-consistent semantic anchor cells to build the intrinsic structure association between the two datasets. Secondly, we design a neighbor-aware prototypical learning paradigm to strengthen the inter-cluster separability and intra-cluster compactness within each dataset, thereby encouraging discriminative feature representations. Thirdly, driven by the semantic-aware prototypical learning framework, we can align the known cell types and separate the private cell types from them across reference and target data. Such an algorithm can be seamlessly applied to various data types modeled by different foundation models that can generate the embedding features for cells. Specifically, for non-spatial single-cell transcriptomics data, we use an autoencoder neural network to learn latent low-dimensional cell representations, and for spatial single-cell transcriptomics data, we apply a graph convolution network to capture molecular and spatial similarities of cells jointly. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scBOL over various state-of-the-art cell type identification methods. To our knowledge, we are the first to present this pragmatic annotation task and to devise a comprehensive algorithmic framework for resolving it across varied types of single-cell data. Finally, scBOL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scBOL.
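
The first step of this abstract, matching mutual nearest clusters between reference and target data, can be sketched as below. This is a minimal illustration, not the scBOL implementation: it assumes each cluster is summarized by a centroid in a shared embedding space and uses Euclidean distance, neither of which is stated in the abstract. The function name `mutual_nearest_clusters` is hypothetical.

```python
import math

def mutual_nearest_clusters(ref_centroids, tgt_centroids):
    """Return (ref_index, tgt_index) pairs that are each other's nearest cluster.

    A pair is kept only if the target cluster is the closest one to the
    reference cluster AND the reference cluster is the closest one to the
    target cluster. Unmatched target clusters are candidates for novel
    (private) cell types.
    """
    def nearest(point, pool):
        return min(range(len(pool)), key=lambda j: math.dist(point, pool[j]))

    ref_to_tgt = [nearest(c, tgt_centroids) for c in ref_centroids]
    tgt_to_ref = [nearest(c, ref_centroids) for c in tgt_centroids]
    return [(i, j) for i, j in enumerate(ref_to_tgt) if tgt_to_ref[j] == i]

reference = [(0.0, 0.0), (10.0, 0.0)]
target = [(0.5, 0.0), (10.5, 0.0), (30.0, 0.0)]
matches = mutual_nearest_clusters(reference, target)
unmatched_target = sorted(set(range(len(target))) - {j for _, j in matches})
```

In this toy input the third target cluster has a nearest reference cluster, but that reference cluster prefers a different target cluster, so the pair is not mutual and the third cluster is left unmatched.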


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Humans , Gene Expression Profiling/methods , Algorithms , Computational Biology/methods , Software
20.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39207729

ABSTRACT

Several methods have been developed to computationally predict cell-types for single cell RNA sequencing (scRNAseq) data. As methods are developed, a common problem for investigators has been identifying the best method they should apply to their specific use-case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell-type identification), a wisdom of crowds approach for scRNAseq clustering. CHAI presents two competing methods which aggregate the clustering results from seven state-of-the-art clustering methods: CHAI-AvgSim and CHAI-SNF. CHAI-AvgSim and CHAI-SNF demonstrate superior performance across several benchmarking datasets. Furthermore, both CHAI methods outperform the most recent consensus clustering method, SAME-clustering. We demonstrate CHAI's practical use case by identifying a leader tumor cell cluster enriched with CDH3. CHAI provides a platform for multiomic integration, and we demonstrate CHAI-SNF to have improved performance when including spatial transcriptomics data. CHAI overcomes previous limitations by incorporating the most recent and top performing scRNAseq clustering algorithms into the aggregation framework. It is also an intuitive and easily customizable R package where users may add their own clustering methods to the pipeline, or down-select just the ones they want to use for the clustering aggregation. This ensures that as more advanced clustering algorithms are developed, CHAI will remain useful to the community as a generalized framework. CHAI is available as an open source R package on GitHub: https://github.com/lodimk2/chai.
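
The idea of aggregating several clustering results through an averaged similarity matrix can be sketched as follows. This is an assumption-laden illustration rather than the CHAI code: the abstract does not define the similarity matrices, so the sketch uses a binary co-membership matrix per method and averages them. It is written in Python for illustration, whereas CHAI itself is an R package, and the function names are hypothetical.

```python
def coassociation(labels):
    """Binary similarity matrix: 1.0 where two cells share a cluster label."""
    n = len(labels)
    return [
        [1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

def average_similarity(labelings):
    """Average the co-membership matrices of several clustering results.

    Entry (i, j) is the fraction of methods that placed cells i and j in
    the same cluster, independent of how each method numbers its clusters.
    """
    n = len(labelings[0])
    matrices = [coassociation(labels) for labels in labelings]
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]

# Three hypothetical methods clustering the same three cells.
labelings = [[0, 0, 1], [1, 1, 0], [0, 1, 1]]
consensus = average_similarity(labelings)
```

A final clustering could then be obtained by running any similarity-based clustering method on the averaged matrix; that step is omitted here because the abstract does not say which one is used.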


Subject(s)
Algorithms , Single-Cell Analysis , Cluster Analysis , Humans , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Computational Biology/methods , Software , Gene Expression Profiling/methods