Results 1 - 20 of 22,840
1.
Cell ; 185(10): 1793-1805.e17, 2022 05 12.
Article in English | MEDLINE | ID: mdl-35483372

ABSTRACT

The lack of tools to observe drug-target interactions at cellular resolution in intact tissue has been a major barrier to understanding in vivo drug actions. Here, we develop clearing-assisted tissue click chemistry (CATCH) to optically image covalent drug targets in intact mammalian tissues. CATCH permits specific and robust in situ fluorescence imaging of target-bound drug molecules at subcellular resolution and enables the identification of target cell types. Using well-established inhibitors of endocannabinoid hydrolases and monoamine oxidases, direct or competitive CATCH not only reveals distinct anatomical distributions and predominant cell targets of different drug compounds in the mouse brain but also uncovers unexpected differences in drug engagement across and within brain regions, reflecting rare cell types, as well as dose-dependent target shifts across tissue, cellular, and subcellular compartments that are not accessible by conventional methods. CATCH represents a valuable platform for visualizing in vivo interactions of small molecules in tissue.


Subjects
Click Chemistry , Optical Imaging , Animals , Brain , Drug Delivery Systems , Mammals , Mice , Optical Imaging/methods
2.
Cell ; 173(4): 864-878.e29, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29681454

ABSTRACT

Diversity in the genetic lesions that cause cancer is extreme. In consequence, a pressing challenge is the development of drugs that target patient-specific disease mechanisms. To address this challenge, we employed a chemistry-first discovery paradigm for de novo identification of druggable targets linked to robust patient selection hypotheses. In particular, a 200,000 compound diversity-oriented chemical library was profiled across a heavily annotated test-bed of >100 cellular models representative of the diverse and characteristic somatic lesions for lung cancer. This approach led to the delineation of 171 chemical-genetic associations, shedding light on the targetability of mechanistic vulnerabilities corresponding to a range of oncogenotypes present in patient populations lacking effective therapy. Chemically addressable addictions to ciliogenesis in TTC21B mutants and GLUT8-dependent serine biosynthesis in KRAS/KEAP1 double mutants are prominent examples. These observations indicate a wealth of actionable opportunities within the complex molecular etiology of cancer.


Subjects
Carcinoma, Non-Small-Cell Lung/pathology , Cell Proliferation/drug effects , Lung Neoplasms/pathology , Small Molecule Libraries/pharmacology , Carcinoma, Non-Small-Cell Lung/metabolism , Cell Line, Tumor , Cytochrome P450 Family 4/deficiency , Cytochrome P450 Family 4/genetics , Drug Discovery , G1 Phase Cell Cycle Checkpoints/drug effects , Glucocorticoids/pharmacology , Glucose Transport Proteins, Facilitative/antagonists & inhibitors , Glucose Transport Proteins, Facilitative/genetics , Glucose Transport Proteins, Facilitative/metabolism , Humans , Kelch-Like ECH-Associated Protein 1/genetics , Kelch-Like ECH-Associated Protein 1/metabolism , Lung Neoplasms/metabolism , Microtubule-Associated Proteins/genetics , Microtubule-Associated Proteins/metabolism , Mutation , NF-E2-Related Factor 2/antagonists & inhibitors , NF-E2-Related Factor 2/genetics , NF-E2-Related Factor 2/metabolism , Proto-Oncogene Proteins p21(ras)/genetics , Proto-Oncogene Proteins p21(ras)/metabolism , RNA Interference , RNA, Small Interfering/metabolism , Receptor, Notch2/genetics , Receptor, Notch2/metabolism , Receptors, Glucocorticoid/antagonists & inhibitors , Receptors, Glucocorticoid/genetics , Receptors, Glucocorticoid/metabolism , Small Molecule Libraries/chemistry , Small Molecule Libraries/metabolism
3.
Cell ; 172(3): 549-563.e16, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29275860

ABSTRACT

The immune system can mount T cell responses against tumors; however, the antigen specificities of tumor-infiltrating lymphocytes (TILs) are not well understood. We used yeast-display libraries of peptide-human leukocyte antigen (pHLA) to screen for antigens of "orphan" T cell receptors (TCRs) expressed on TILs from human colorectal adenocarcinoma. Four TIL-derived TCRs exhibited strong selection for peptides presented in a highly diverse pHLA-A*02:01 library. Three of the TIL TCRs were specific for non-mutated self-antigens, two of which were present in separate patient tumors, and shared specificity for a non-mutated self-antigen derived from U2AF2. These results show that the exposed recognition surface of MHC-bound peptides accessible to the TCR contains sufficient structural information to enable the reconstruction of sequences of peptide targets for pathogenic TCRs of unknown specificity. This finding underscores the surprising specificity of TCRs for their cognate antigens and enables the facile identification of tumor antigens through unbiased screening.


Subjects
Adenocarcinoma/immunology , Antigens, Neoplasm/immunology , Colorectal Neoplasms/immunology , Lymphocytes, Tumor-Infiltrating/immunology , Receptors, Antigen, T-Cell/immunology , Aged , Animals , Antigens, Neoplasm/chemistry , Cell Line, Tumor , Cells, Cultured , HEK293 Cells , HLA-A Antigens/chemistry , HLA-A Antigens/immunology , Humans , Male , Middle Aged , Peptide Library , Sf9 Cells , Spodoptera
4.
Mol Cell ; 83(22): 4106-4122.e10, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37977120

ABSTRACT

γ-Secretases mediate the regulated intramembrane proteolysis (RIP) of more than 150 integral membrane proteins. We developed an unbiased γ-secretase substrate identification (G-SECSI) method to study to what extent these proteins are processed in parallel. We demonstrate here parallel processing of at least 85 membrane proteins in human microglia in steady-state cell culture conditions. Pharmacological inhibition of γ-secretase caused substantial changes of human microglial transcriptomes, including the expression of genes related to the disease-associated microglia (DAM) response described in Alzheimer disease (AD). While the overall effects of γ-secretase deficiency on transcriptomic cell states remained limited in control conditions, exposure of mouse microglia to AD-inducing amyloid plaques strongly blocked their capacity to mount this putatively protective DAM cell state. We conclude that γ-secretase serves as a critical signaling hub integrating the effects of multiple extracellular stimuli into the overall transcriptome of the cell.


Subjects
Alzheimer Disease , Amyloid Precursor Protein Secretases , Mice , Animals , Humans , Amyloid Precursor Protein Secretases/genetics , Amyloid Precursor Protein Secretases/metabolism , Proteome/genetics , Signal Transduction , Membrane Proteins/metabolism , Alzheimer Disease/genetics
5.
Trends Biochem Sci ; 49(3): 224-235, 2024 03.
Article in English | MEDLINE | ID: mdl-38160064

ABSTRACT

At its most fundamental level, life is a collection of synchronized cellular processes driven by interactions among biomolecules. Proximity labeling has emerged as a powerful technique to capture these interactions in native settings, revealing previously unexplored elements of biology. This review highlights recent developments in proximity labeling, focusing on methods that push the fundamental technologies beyond the classic bait-prey paradigm, such as RNA-protein interactions, ligand/small-molecule-protein interactions, cell surface protein interactions, and subcellular protein trafficking. The advancement of proximity labeling methods to address different biological problems will accelerate our understanding of the complex biological systems that make up life.


Subjects
Membrane Proteins , Proteomics , Proteomics/methods , Membrane Proteins/metabolism
6.
Mol Cell ; 79(1): 191-198.e3, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32619469

ABSTRACT

We recently used CRISPRi/a-based chemical-genetic screens and cell biological, biochemical, and structural assays to determine that rigosertib, an anti-cancer agent in phase III clinical trials, kills cancer cells by destabilizing microtubules. Reddy and co-workers (Baker et al., 2020, this issue of Molecular Cell) suggest that a contaminating degradation product in commercial formulations of rigosertib is responsible for the microtubule-destabilizing activity. Here, we demonstrate that cells treated with pharmaceutical-grade rigosertib (>99.9% purity) or commercially obtained rigosertib have qualitatively indistinguishable phenotypes across multiple assays. The two formulations have indistinguishable chemical-genetic interactions with genes that modulate microtubule stability, both destabilize microtubules in cells and in vitro, and expression of a rationally designed tubulin mutant with a mutation in the rigosertib binding site (L240F TUBB) allows cells to proliferate in the presence of either formulation. Importantly, the specificity of the L240F TUBB mutant for microtubule-destabilizing agents has been confirmed independently. Thus, rigosertib kills cancer cells by destabilizing microtubules, in agreement with our original findings.


Subjects
Antineoplastic Agents/pharmacology , Cell Proliferation , Glycine/analogs & derivatives , Microtubules/drug effects , Neoplasms/pathology , Pharmaceutical Preparations/metabolism , Sulfones/pharmacology , Tubulin/metabolism , Cells, Cultured , Crystallography, X-Ray , Drug Contamination , Glycine/pharmacology , Humans , Mutation , Neoplasms/drug therapy , Neoplasms/metabolism , Pharmaceutical Preparations/chemistry , Protein Conformation , Tubulin/chemistry , Tubulin/genetics
7.
Annu Rev Pharmacol Toxicol ; 64: 527-550, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-37738505

ABSTRACT

Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small-molecule drugs. AI technologies, such as generative chemistry, machine learning, and multiproperty optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.


Subjects
Artificial Intelligence , Physicians , Animals , Humans , Reproducibility of Results , Drug Discovery , Technology
8.
Am J Hum Genet ; 111(9): 1899-1913, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39173627

ABSTRACT

Understanding the molecular mechanisms of complex traits is essential for developing targeted interventions. We analyzed liver expression quantitative-trait locus (eQTL) meta-analysis data on 1,183 participants to identify conditionally distinct signals. We found 9,013 eQTL signals for 6,564 genes; 23% of eGenes had two signals, and 6% had three or more signals. We then integrated the eQTL results with data from 29 cardiometabolic genome-wide association study (GWAS) traits and identified 1,582 GWAS-eQTL colocalizations for 747 eGenes. Non-primary eQTL signals accounted for 17% of all colocalizations. Isolating signals by conditional analysis prior to coloc resulted in 37% more colocalizations than using marginal eQTL and GWAS data, highlighting the importance of signal isolation. Isolating signals also led to stronger evidence of colocalization: among 343 eQTL-GWAS signal pairs in multi-signal regions, analyses that isolated the signals of interest resulted in higher posterior probability of colocalization for 41% of tests. Leveraging allelic heterogeneity, we predicted causal effects of gene expression on liver traits for four genes. To predict functional variants and regulatory elements, we colocalized eQTL with liver chromatin accessibility QTL (caQTL) and found 391 colocalizations, including 73 with non-primary eQTL signals and 60 eQTL signals that colocalized with both a caQTL and a GWAS signal. Finally, we used publicly available massively parallel reporter assays in HepG2 to highlight 14 eQTL signals that include at least one expression-modulating variant. This multi-faceted approach to unraveling the genetic underpinnings of liver-related traits could lead to therapeutic development.


Subjects
Genome-Wide Association Study , Liver , Quantitative Trait Loci , Humans , Alleles , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease , Liver/metabolism , Phenotype , Polymorphism, Single Nucleotide
9.
Proc Natl Acad Sci U S A ; 121(24): e2321809121, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38781227

ABSTRACT

The modern canon of open science consists of five "schools of thought" that justify unfettered access to the fruits of scientific research: i) public engagement, ii) democratic right of access, iii) efficiency of knowledge gain, iv) shared technology, and v) better assessment of impact. Here, we introduce a sixth school: due process. Due process under the law includes a right to "discovery" by a defendant of potentially exculpatory evidence held by the prosecution. When such evidence is scientific, due process becomes a Constitutional mandate for open science. To illustrate the significance of this new school, we present a case study from forensics, which centers on a federally funded investigation that reports summary statistics indicating that identification decisions made by forensic firearms examiners are highly accurate. Because of growing concern about validity of forensic methods, the larger scientific community called for public release of the complete analyzable dataset for independent audit and verification. Those in possession of the data opposed release for three years while summary statistics were used by prosecutors to gain admissibility of evidence in criminal trials. Those statistics paint an incomplete picture and hint at flaws in experimental design and analysis. Under the circumstances, withholding the underlying data in a criminal proceeding violates due process. Following the successful open-science model of drug validity testing through "clinical trials," which place strict requirements on experimental design and timing of data release, we argue for registered and open "forensic trials" to ensure transparency and accountability.


Subjects
Forensic Sciences , Humans , Forensic Sciences/methods , Firearms/legislation & jurisprudence
10.
Hum Mol Genet ; 33(6): 478-490, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-37971354

ABSTRACT

BACKGROUND: Colorectal cancer (CRC) is influenced by various environmental and genetic variables. Dysregulation of vesicle-mediated transport-related genes (VMTRGs) has been observed in many malignancies, but their effect on prognosis in CRC remains unclear. METHODS: CRC samples were clustered into subtypes according to differential expression of VMTRGs. An R package was used to explore differences in survival, immune characteristics, and drug sensitivity among the disease subtypes. Based on the differentially expressed genes (DEGs) between subtypes, regression analysis was employed to build a riskscore model and identify independent prognostic factors. The model was validated with a Gene Expression Omnibus (GEO) dataset. Immune landscape, immunophenoscore (IPS), and Tumor Immune Dysfunction and Exclusion (TIDE) scores were calculated for the different risk groups. RESULTS: Two subtypes of CRC were identified based on VMTRGs, which showed significant differences in survival rates, immune cell infiltration abundance, immune functional activation levels, and immune checkpoint expression levels. Cluster2 exhibited higher sensitivity than Cluster1 to anti-tumor drugs such as Nilotinib, Cisplatin, and Oxaliplatin. DEGs were mainly enriched in biological processes such as epidermis development, epidermal cell differentiation, and receptor-ligand activity, and in signaling pathways such as pancreatic secretion. The constructed 13-gene riskscore model demonstrated good predictive ability for the prognosis of CRC patients. Furthermore, differences in immune landscape, IPS, and TIDE scores were observed among the risk groups. CONCLUSION: This study identified two CRC subtypes with distinct survival statuses and immune levels based on differential expression of VMTRGs, and a 13-gene risk model was constructed. The findings have important implications for the prognosis and treatment of CRC.
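
As an illustration of what a gene-based riskscore model of this kind typically looks like, the sketch below computes a linear score per sample and splits samples at the median. The abstract does not give the form of the model, the gene list, or the coefficients, so the linear form, the median cutoff, the names `GENE_A` and `GENE_B`, and all numbers here are assumptions made purely for illustration.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear risk score per sample: sum of coefficient * expression.

    Assumption: a linear combination of gene expression values; the
    abstract only says a regression-based riskscore model was built.
    """
    return {
        sample: sum(coefficients[gene] * values[gene] for gene in coefficients)
        for sample, values in expression.items()
    }


def split_by_median(scores):
    """Label samples 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


# Hypothetical coefficients and expression values (not from the study).
coefficients = {"GENE_A": 0.5, "GENE_B": -1.0}
expression = {
    "s1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "s2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "s3": {"GENE_A": 0.0, "GENE_B": 2.0},
}
scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
```

Samples that fall exactly on the median are labeled low risk in this sketch; a real analysis would state its own cutoff rule.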


Subjects
Colorectal Neoplasms , Humans , Prognosis , Biological Transport , Oxaliplatin , Colorectal Neoplasms/genetics
11.
Am J Hum Genet ; 110(8): 1330-1342, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37494930

ABSTRACT

Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.
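
The idea of a burden score that grows with mutational severity can be sketched as follows. This is not the COAST implementation: the three variant categories, the weights 1, 2, 3, and the function name are hypothetical choices used only to show how a severity-weighted burden differs from an unweighted count of rare alleles.

```python
def allelic_series_burden(genotypes, categories, weights):
    """Severity-weighted burden per subject.

    genotypes:  one row per subject, one allele count (0, 1, 2) per variant
    categories: severity category of each variant, in column order
    weights:    hypothetical weight per category (larger = more deleterious)
    """
    return [
        sum(weights[cat] * count for count, cat in zip(row, categories))
        for row in genotypes
    ]


# Hypothetical categories and weights, assumed for illustration only.
categories = ["benign_missense", "damaging_missense", "protein_truncating"]
weights = {"benign_missense": 1, "damaging_missense": 2, "protein_truncating": 3}
genotypes = [
    [0, 0, 0],  # carries no rare variants
    [1, 0, 0],  # one benign missense allele
    [0, 1, 1],  # one damaging missense and one truncating allele
    [2, 0, 1],  # two benign missense alleles and one truncating allele
]
burden = allelic_series_burden(genotypes, categories, weights)
```

Regressing a phenotype on such a score would then test for a dose-response pattern; the association models that COAST actually combines are described only at a high level in the abstract.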


Subjects
Genetic Variation , Lipids , Computer Simulation , Genetic Association Studies , Phenotype , Genome-Wide Association Study
12.
Am J Hum Genet ; 110(1): 92-104, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36563679

ABSTRACT

Variant interpretation remains a major challenge in medical genetics. We developed Meta-Domain HotSpot (MDHS) to identify mutational hotspots across homologous protein domains. We applied MDHS to a dataset of 45,221 de novo mutations (DNMs) from 31,058 individuals with neurodevelopmental disorders (NDDs) and identified three significantly enriched missense DNM hotspots in the ion transport protein domain family (PF00520). The 37 unique missense DNMs that drive enrichment affect 25 genes, 19 of which were previously associated with NDDs. 3D protein structure modeling supports the hypothesis of function-altering effects of these mutations. Hotspot genes have a unique expression pattern in tissue, and we used this pattern alongside in silico predictors and population constraint information to identify candidate NDD-associated genes. We also propose a lenient version of our method, which identifies 32 hotspot positions across 16 different protein domains. These positions are enriched for likely pathogenic variation in clinical databases and DNMs in other genetic disorders.
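
Pooling mutations across homologous protein domains amounts to mapping each mutated residue to a shared alignment position and counting per position. The sketch below shows only that counting step, under the assumption that a lookup table from (gene, residue) to alignment position is already available; the gene names, positions, and function name are hypothetical, and the enrichment statistics used by MDHS are not reproduced here.

```python
from collections import Counter


def metadomain_counts(mutations, position_map):
    """Count mutations per shared domain-alignment position.

    mutations:    iterable of (gene, residue_number) pairs
    position_map: hypothetical lookup from (gene, residue_number) to the
                  aligned position within the domain family
    Mutations outside the domain (absent from the map) are ignored.
    """
    counts = Counter()
    for gene, residue in mutations:
        meta_pos = position_map.get((gene, residue))
        if meta_pos is not None:
            counts[meta_pos] += 1
    return counts


# Hypothetical alignment: residue 100 of GENE_A and residue 250 of GENE_B
# both map to aligned position 12.
position_map = {("GENE_A", 100): 12, ("GENE_B", 250): 12, ("GENE_A", 101): 13}
mutations = [("GENE_A", 100), ("GENE_B", 250), ("GENE_A", 101), ("GENE_C", 5)]
counts = metadomain_counts(mutations, position_map)
```

Position 12 collects mutations from two different genes, which is the kind of signal that would be invisible when each gene is analyzed on its own.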


Subjects
Neurodevelopmental Disorders , Humans , Protein Domains/genetics , Mutation/genetics , Neurodevelopmental Disorders/genetics
13.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39222062

ABSTRACT

Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advancements in sequencing technologies have contributed tremendously toward understanding these microbes at species resolution through a whole shotgun metagenomic approach. In this study, we developed a new bioinformatics tool, coverage-based analysis for identification of microbiome (CAIM), for accurate taxonomic classification and quantification within both long- and short-read metagenomic samples using an alignment-based method. CAIM depends on two different containment techniques to identify species in metagenomic samples, using their genome coverage information to filter out false positives rather than the traditional approach of relative abundance. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. CAIM performed consistently better than other tools across datasets in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina sequencing platforms and showed high similarity of taxonomic profiles between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries and 44 primary liver cancer patients and 76 controls. The predictive performance of models using the genome-coverage cutoff was better than that of models using the relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls with high-confidence species markers.
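
The two ideas in this abstract, filtering species by genome coverage and estimating abundance from aligned nucleotides rather than read counts, can be sketched in a few lines. This is not CAIM's code: the interval input format, the `min_breadth` threshold of 0.5, and the normalization by genome length are assumptions chosen for illustration.

```python
from collections import defaultdict


def coverage_and_abundance(alignments, genome_lengths, min_breadth=0.5):
    """Filter genomes by coverage breadth, then estimate relative abundance.

    alignments:     iterable of (genome_id, start, end), half-open intervals
    genome_lengths: genome_id -> genome length in bases
    min_breadth:    hypothetical cutoff on the fraction of the genome covered
    """
    covered = defaultdict(set)      # genome -> set of covered positions
    nucleotides = defaultdict(int)  # genome -> total aligned bases
    for genome, start, end in alignments:
        covered[genome].update(range(start, end))
        nucleotides[genome] += end - start

    # Breadth of coverage: fraction of positions hit by at least one read.
    breadth = {g: len(pos) / genome_lengths[g] for g, pos in covered.items()}
    kept = [g for g in breadth if breadth[g] >= min_breadth]

    # Nucleotide-count abundance: aligned bases per genome base (assumed
    # normalization), rescaled to sum to 1 over the retained genomes.
    depth = {g: nucleotides[g] / genome_lengths[g] for g in kept}
    total = sum(depth.values())
    abundance = {g: d / total for g, d in depth.items()} if total else {}
    return breadth, abundance


# Toy data: genome A is fully covered, genome B only at one end.
alignments = [("A", 0, 60), ("A", 40, 100), ("B", 0, 10)]
genome_lengths = {"A": 100, "B": 100}
breadth, abundance = coverage_and_abundance(alignments, genome_lengths)
```

Genome B attracts reads but covers only a tenth of its length, so it is dropped as a likely false positive even though a read-count threshold might have kept it. Storing covered positions in a set is only practical for toy inputs; real genomes would need interval merging.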


Subjects
Metagenomics , Microbiota , Humans , Microbiota/genetics , Metagenomics/methods , Computational Biology/methods , Metagenome , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Sequence Analysis, DNA/methods
14.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279648

ABSTRACT

Virus-encoded circular RNA (circRNA) participates in the immune response to viral infection, affects the human immune system, and can be used as a target for precision therapy and as a tumor biomarker. The coronaviruses SARS-CoV-1 and SARS-CoV-2 (SARS-CoV-1/2) that have emerged in recent years are highly contagious and have high mortality rates. Little is known about the circRNA encoded by SARS-CoV-1/2. Therefore, this study explores whether SARS-CoV-1/2 encodes circRNA and, if so, the characteristics and functions of that circRNA. Based on RNA-seq data of SARS-CoV-1 and SARS-CoV-2 infections, we used circRNA identification tools (circRNA_finder, find_circ and CIRI2) to identify circRNAs. The numbers of circRNAs encoded by SARS-CoV-1 and SARS-CoV-2 were 151 and 470, respectively, indicating that SARS-CoV-2 has a more prominent circRNA-encoding ability than SARS-CoV-1. Expression analysis showed that only a few circRNAs encoded by SARS-CoV-1/2 had high expression levels, and the positive strand produced more abundant circRNAs. Then, based on the identified SARS-CoV-1/2-encoded circRNAs, we performed circRNA identification and characterization using the previously developed CirRNAPL. Finally, target gene prediction and functional enrichment analysis were performed. Viral circRNA was found to be closely related to cancer and to have a potential role in regulating host cell functions. This study characterized the viral circRNA encoded by the coronaviruses SARS-CoV-1/2 and its functions, providing a valuable resource for further research on the function and molecular mechanism of coronavirus circRNA.


Subjects
COVID-19 , MicroRNAs , Neoplasms , Humans , RNA, Circular/genetics , SARS-CoV-2/genetics , COVID-19/genetics , RNA, Viral/genetics , Neoplasms/genetics , MicroRNAs/genetics
15.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38343326

ABSTRACT

Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome contains all microorganisms from an environmental sample. Correctly identifying viruses from these mixed sequences is critical in viral analyses. It is common to identify long viral sequences that have already been passed through assembly and binning pipelines. Existing deep learning-based methods divide these long sequences into short subsequences and identify them separately. This discards the relationships between the subsequences, leading to poor performance in identifying long viral sequences. In this paper, VirGrapher is proposed to improve the identification of long viral sequences by constructing relationships among the short subsequences derived from them. VirGrapher treats a long sequence as a graph and, after a GCN-based node embedding model, uses a Graph Convolutional Network (GCN) model to learn multilayer connections between nodes. VirGrapher achieves a better AUC value and accuracy on the validation set than three benchmark methods.
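
To make the "long sequence as a graph" idea concrete, the sketch below splits a sequence into fixed-length chunks, links neighboring chunks, and applies one step of the standard symmetric GCN normalization. The abstract does not say how VirGrapher defines nodes, edges, or features, so the chain-shaped graph, the GC-content feature, and all function names are assumptions for illustration rather than the published method.

```python
import numpy as np


def split_sequence(seq, chunk_len):
    """Cut a long sequence into consecutive chunks of at most chunk_len bases."""
    return [seq[i:i + chunk_len] for i in range(0, len(seq), chunk_len)]


def chain_adjacency(n_nodes):
    """Adjacency matrix linking each chunk to its neighbors (assumed topology)."""
    adj = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj


def gc_content(chunk):
    """Toy node feature: fraction of G or C bases in the chunk."""
    return sum(base in "GC" for base in chunk) / len(chunk)


def gcn_propagate(adj, features):
    """One propagation step with D^-1/2 (A + I) D^-1/2, without learned weights."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return norm @ features


chunks = split_sequence("ACGTACGTAC", 4)
adj = chain_adjacency(len(chunks))
features = np.array([[gc_content(c)] for c in chunks])
smoothed = gcn_propagate(adj, features)
```

After propagation, each chunk's feature mixes in information from its neighbors, which is the property that per-subsequence classifiers lack.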


Subjects
Metagenome , Microbiota , Microbiota/genetics , Benchmarking
16.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39207729

ABSTRACT

Several methods have been developed to computationally predict cell-types for single cell RNA sequencing (scRNAseq) data. As methods are developed, a common problem for investigators has been identifying the best method they should apply to their specific use-case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell-type identification), a wisdom of crowds approach for scRNAseq clustering. CHAI presents two competing methods which aggregate the clustering results from seven state-of-the-art clustering methods: CHAI-AvgSim and CHAI-SNF. CHAI-AvgSim and CHAI-SNF demonstrate superior performance across several benchmarking datasets. Furthermore, both CHAI methods outperform the most recent consensus clustering method, SAME-clustering. We demonstrate CHAI's practical use case by identifying a leader tumor cell cluster enriched with CDH3. CHAI provides a platform for multiomic integration, and we demonstrate CHAI-SNF to have improved performance when including spatial transcriptomics data. CHAI overcomes previous limitations by incorporating the most recent and top performing scRNAseq clustering algorithms into the aggregation framework. It is also an intuitive and easily customizable R package where users may add their own clustering methods to the pipeline, or down-select just the ones they want to use for the clustering aggregation. This ensures that as more advanced clustering algorithms are developed, CHAI will remain useful to the community as a generalized framework. CHAI is available as an open source R package on GitHub: https://github.com/lodimk2/chai.
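
A common way to aggregate several clusterings, and plausibly what the name CHAI-AvgSim suggests, is to average per-method similarity matrices. The sketch below assumes each method's similarity matrix is a binary co-membership matrix (1 if two cells share a cluster, 0 otherwise); that choice and the function names are assumptions, and the actual package at the GitHub URL above may define similarity differently.

```python
import numpy as np


def comembership(labels):
    """Binary matrix: entry (i, j) is 1.0 when cells i and j share a label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def average_similarity(clusterings):
    """Average the co-membership matrices of several clusterings."""
    return np.mean([comembership(labels) for labels in clusterings], axis=0)


# Three hypothetical clusterings of the same four cells. Label values are
# arbitrary per method; only the grouping matters.
clusterings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
consensus = average_similarity(clusterings)
```

Each entry of `consensus` is the fraction of methods that placed the two cells together, so a final clustering can be run on this matrix instead of on any single method's output.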


Subjects
Algorithms , Single-Cell Analysis , Cluster Analysis , Humans , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Computational Biology/methods , Software , Gene Expression Profiling/methods
17.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39210506

ABSTRACT

Tumorigenesis arises from the dysfunction of cancer genes, leading to uncontrolled cell proliferation through various mechanisms. Establishing a complete cancer gene catalogue will make precision oncology possible. Although existing methods based on graph neural networks (GNN) are effective in identifying cancer genes, they fall short in effectively integrating data from multiple views and interpreting predictive outcomes. To address these shortcomings, an interpretable representation learning framework IMVRL-GCN is proposed to capture both shared and specific representations from multiview data, offering significant insights into the identification of cancer genes. Experimental results demonstrate that IMVRL-GCN outperforms state-of-the-art cancer gene identification methods and several baselines. Furthermore, IMVRL-GCN is employed to identify a total of 74 high-confidence novel cancer genes, and multiview data analysis highlights the pivotal roles of shared, mutation-specific, and structure-specific representations in discriminating distinctive cancer genes. Exploration of the mechanisms behind their discriminative capabilities suggests that shared representations are strongly associated with gene functions, while mutation-specific and structure-specific representations are linked to mutagenic propensity and functional synergy, respectively. Finally, our in-depth analyses of these candidates suggest potential insights for individualized treatments: afatinib could counteract many mutation-driven risks, and targeting interactions with cancer gene SRC is a reasonable strategy to mitigate interaction-induced risks for NR3C1, RXRA, HNF4A, and SP1.


Subjects
Neoplasms , Humans , Neoplasms/genetics , Computational Biology/methods , Neural Networks, Computer , Mutation , Genes, Neoplasm , Hepatocyte Nuclear Factor 4/genetics , Machine Learning
18.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38819253

ABSTRACT

Spatially resolved transcriptomics (SRT) has emerged as a powerful tool for investigating gene expression in spatial contexts, providing insights into the molecular mechanisms underlying organ development and disease pathology. However, the expression sparsity poses a computational challenge to integrate other modalities (e.g. histological images and spatial locations) that are simultaneously captured in SRT datasets for spatial clustering and variation analyses. In this study, to meet such a challenge, we propose multi-modal domain adaption for spatial transcriptomics (stMDA), a novel multi-modal unsupervised domain adaptation method, which integrates gene expression and other modalities to reveal the spatial functional landscape. Specifically, stMDA first learns the modality-specific representations from spatial multi-modal data using multiple neural network architectures and then aligns the spatial distributions across modal representations to integrate these multi-modal representations, thus facilitating the integration of global and spatially local information and improving the consistency of clustering assignments. Our results demonstrate that stMDA outperforms existing methods in identifying spatial domains across diverse platforms and species. Furthermore, stMDA excels in identifying spatially variable genes with high prognostic potential in cancer tissues. In conclusion, stMDA as a new tool of multi-modal data integration provides a powerful and flexible framework for analyzing SRT datasets, thereby advancing our understanding of intricate biological systems.


Subjects
Gene Expression Profiling , Transcriptome , Humans , Gene Expression Profiling/methods , Cluster Analysis , Computational Biology/methods , Neural Networks, Computer , Neoplasms/genetics , Algorithms
19.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38975895

ABSTRACT

Spatial transcriptomics provides valuable insights into gene expression within the native tissue context, effectively merging molecular data with spatial information to uncover intricate cellular relationships and tissue organizations. In this context, deciphering cellular spatial domains becomes essential for revealing complex cellular dynamics and tissue structures. However, current methods encounter challenges in seamlessly integrating gene expression data with spatial information, resulting in less informative representations of spots and suboptimal accuracy in spatial domain identification. We introduce stCluster, a novel method that integrates graph contrastive learning with multi-task learning to refine informative representations for spatial transcriptomic data, consequently improving spatial domain identification. stCluster first leverages graph contrastive learning technology to obtain discriminative representations capable of recognizing spatially coherent patterns. Through jointly optimizing multiple tasks, stCluster further fine-tunes the representations to be able to capture complex relationships between gene expression and spatial organization. Benchmarked against six state-of-the-art methods, the experimental results reveal its proficiency in accurately identifying complex spatial domains across various datasets and platforms, spanning tissue, organ, and embryo levels. Moreover, stCluster can effectively denoise the spatial gene expression patterns and enhance the spatial trajectory inference. The source code of stCluster is freely available at https://github.com/hannshu/stCluster.


Subjects
Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Computational Biology/methods , Algorithms , Humans , Animals , Software , Machine Learning
20.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38678389

ABSTRACT

MOTIVATION: Over the past decade, single-cell transcriptomic technologies have advanced remarkably, enabling the simultaneous profiling of gene expression across thousands of individual cells. Cell type identification plays an essential role in exploring tissue heterogeneity and characterizing cell state differences. As more well-annotated reference data become available, many automatic identification methods have emerged to simplify the annotation of unlabeled target data by transferring cell type knowledge. In practice, however, the target data often include novel cell types that are absent from the reference data. Most existing methods classify these private cells as one generic 'unassigned' group and learn the features of known and novel cell types in a coupled way. They are susceptible to batch effects and fail to explore the fine-grained semantic knowledge of novel cell types, which hurts the model's discrimination ability. Additionally, emerging spatial transcriptomic technologies, such as in situ hybridization, sequencing and multiplexed imaging, present a new challenge to current cell type identification strategies, which predominantly neglect spatial organization. Consequently, it is imperative to develop a versatile method that can proficiently annotate single-cell transcriptomics data, both spatial and non-spatial.
RESULTS: To address these issues, we propose a new, challenging yet realistic task called universal cell type identification for single-cell and spatial transcriptomics data. In this task, we aim to give semantic labels to target cells from known cell types and cluster labels to those from novel ones. To tackle this problem, instead of designing a suboptimal two-stage approach, we propose an end-to-end algorithm called scBOL from the perspective of Bipartite prototype alignment. Firstly, we identify the mutual nearest clusters in the reference and target data as their potential common cell types. On this basis, we mine cycle-consistent semantic anchor cells to build the intrinsic structure association between the two datasets. Secondly, we design a neighbor-aware prototypical learning paradigm to strengthen inter-cluster separability and intra-cluster compactness within each dataset, thereby encouraging discriminative feature representations. Thirdly, driven by the semantic-aware prototypical learning framework, we align the known cell types and separate the private cell types from them across the reference and target data. The algorithm can be applied seamlessly to various data types modeled by different foundation models that generate embedding features for cells. Specifically, for non-spatial single-cell transcriptomics data, we use an autoencoder neural network to learn latent low-dimensional cell representations, and for spatial single-cell transcriptomics data, we apply a graph convolution network to jointly capture molecular and spatial similarities of cells. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scBOL over various state-of-the-art cell type identification methods. To our knowledge, we are the first to present this pragmatic annotation task and to devise a comprehensive algorithmic framework for resolving it across varied types of single-cell data. Finally, scBOL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scBOL.
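
The first step described in this abstract, pairing mutual nearest clusters between reference and target data, can be illustrated with a minimal sketch. This is not the scBOL implementation: the function name `mutual_nearest_clusters`, the use of cluster centroids, and the Euclidean distance are all assumptions made for illustration.

```python
import numpy as np


def mutual_nearest_clusters(ref_centroids, tgt_centroids):
    """Return (ref_idx, tgt_idx) pairs that are each other's nearest centroid.

    Hypothetical helper: centroids and Euclidean distance are illustrative
    assumptions, not details taken from the abstract.
    """
    # Pairwise Euclidean distances, shape (n_ref, n_tgt).
    diff = ref_centroids[:, None, :] - tgt_centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    nearest_tgt = dist.argmin(axis=1)  # closest target cluster per reference cluster
    nearest_ref = dist.argmin(axis=0)  # closest reference cluster per target cluster

    # Keep only pairs where the nearest-neighbour relation holds both ways.
    return [(i, int(j)) for i, j in enumerate(nearest_tgt) if nearest_ref[j] == i]


# Toy data: two reference clusters, three target clusters in a 2-D embedding.
ref = np.array([[0.0, 0.0], [5.0, 5.0]])
tgt = np.array([[0.2, -0.1], [5.1, 4.8], [20.0, 20.0]])
pairs = mutual_nearest_clusters(ref, tgt)
print(pairs)
```

In this toy example the third target cluster has no mutual partner, so it would be a candidate novel (private) cell type, while the matched pairs are candidates for shared cell types.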


Subjects
Gene Expression Profiling , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Humans , Gene Expression Profiling/methods , Algorithms , Computational Biology/methods , Software