ABSTRACT
Crucial transitions in cancer (including tumor initiation, local expansion, metastasis, and therapeutic resistance) involve complex interactions between cells within the dynamic tumor ecosystem. Transformative single-cell genomics technologies and spatial multiplex in situ methods now provide an opportunity to interrogate this complexity at unprecedented resolution. The Human Tumor Atlas Network (HTAN), part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types. This effort complements both ongoing efforts to map healthy organs and previous large-scale cancer genomics approaches focused on bulk sequencing at a single point in time. Generating single-cell, multiparametric, longitudinal atlases and integrating them with clinical outcomes should help identify novel predictive biomarkers and features as well as therapeutically relevant cell types, cell states, and cellular interactions across transitions. The resulting tumor atlases should have a profound impact on our understanding of cancer biology and have the potential to improve cancer detection, prevention, and therapeutic discovery for better precision-medicine treatments of cancer patients and those at risk for cancer.
Subjects
Neoplastic Cell Transformation/metabolism , Neoplasms/metabolism , Tumor Microenvironment/physiology , Atlases as Topic , Neoplastic Cell Transformation/pathology , Genomics/methods , Humans , Precision Medicine/methods , Single-Cell Analysis/methods
ABSTRACT
In January 2024, a targeted conference, 'CellVis2', was held at Scripps Research in La Jolla, USA, the second in a series designed to explore the promise, practices, roadblocks, and prospects of creating, visualizing, sharing, and communicating physical representations of entire biological cells at scales down to the atom.
ABSTRACT
Craniofacial phenotyping is critical for both syndrome delineation and diagnosis because craniofacial abnormalities occur in 30% of characterized genetic syndromes. Clinical reports, textbooks, and available software tools typically provide two-dimensional, static images and illustrations of the characteristic phenotypes of genetic syndromes. In this work, we present an interactive web application that provides three-dimensional, dynamic visualizations of the characteristic craniofacial effects of 95 syndromes. Users can visualize syndrome facial appearance estimates quantified from data and easily compare craniofacial phenotypes of different syndromes. Our application also provides a map of morphological similarity between a target syndrome and other syndromes. Finally, users can upload 3D facial scans of individuals and compare them to our syndrome atlas estimates. In summary, we provide an interactive reference for the craniofacial phenotypes of syndromes that allows for precise, individual-specific comparisons of dysmorphology.
Subjects
Face , Software , Humans , Facies , Phenotype , Syndrome
ABSTRACT
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures and sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations, lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence; larger networks train along a manifold similar to that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) data are susceptible to noise arising from biological variability and technical errors, which can distort gene expression analysis and impact cell similarity assessments, particularly in heterogeneous populations. Current methods, including deep learning approaches, often struggle to accurately characterize cell relationships due to this inherent noise. To address these challenges, we introduce scAMF (Single-cell Analysis via Manifold Fitting), a framework designed to enhance clustering accuracy and data visualization in scRNA-seq studies. At the heart of scAMF lies the manifold fitting module, which effectively denoises scRNA-seq data by unfolding their distribution in the ambient space. This unfolding aligns the gene expression vector of each cell more closely with its underlying structure, bringing it spatially closer to other cells of the same cell type. To comprehensively assess the impact of scAMF, we compile a collection of 25 publicly available scRNA-seq datasets spanning various sequencing platforms, species, and organ types, forming an extensive RNA data bank. In our comparative studies, benchmarking scAMF against existing scRNA-seq analysis algorithms in this data bank, we consistently observe that scAMF outperforms them in terms of clustering efficiency and data visualization clarity. Further experimental analysis reveals that this enhanced performance stems from scAMF's ability to improve the spatial distribution of the data and capture class-consistent neighborhoods. These findings underscore the promising application potential of manifold fitting as a tool in scRNA-seq analysis, signaling a significant enhancement in the precision and reliability of data interpretation in this critical field of study.
Subjects
Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , RNA Sequence Analysis/methods , Animals , Algorithms , RNA/genetics , Gene Expression Profiling/methods , RNA-Seq/methods
ABSTRACT
The unparalleled resolving power of electron microscopy is both a blessing and a curse. At 30,000× magnification, 1 µm corresponds to 3 cm in the image and the field of view is only a few micrometres or less, resulting in an inevitable reduction in the spatial data available in an image. Consequently, the gain in resolution is at the cost of loss of the contextual 'reference space', which is crucial for understanding the embedded structures of interest. This problem is particularly pronounced in immunoelectron microscopy, where the detection of a gold particle is crucial for the localisation of specific molecules. The common solution of presenting high-magnification and overview images side by side often insufficiently represents the cellular environment. To address these limitations, we propose here an interactive visualization strategy inspired by digital maps and GPS modules which enables seamless transitions between different magnifications by dynamically linking virtual low magnification overview images with primary high-resolution data. By enabling dynamic browsing, it offers the potential for a deeper understanding of cellular landscapes leading to more comprehensive analysis of the primary ultrastructural data.
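The magnification arithmetic quoted in this abstract can be checked with a short sketch. The function names and the 10 cm image width used in the demo are illustrative assumptions and do not come from the paper.

```python
UM_PER_CM = 1e4  # 1 cm = 10,000 micrometres


def image_size_cm(object_size_um: float, magnification: float) -> float:
    """Size (in cm) that an object of the given size (in µm) occupies in the magnified image."""
    return object_size_um * magnification / UM_PER_CM


def field_of_view_um(image_width_cm: float, magnification: float) -> float:
    """Width of specimen (in µm) that fits into an image of the given width (in cm)."""
    return image_width_cm * UM_PER_CM / magnification


if __name__ == "__main__":
    # 1 µm at 30,000x, as stated in the abstract.
    print(round(image_size_cm(1.0, 30_000), 3), "cm")
    # Hypothetical 10 cm wide image: only a few micrometres of specimen are visible.
    print(round(field_of_view_um(10.0, 30_000), 2), "µm")
```

Under these assumptions, a 10 cm wide image at 30,000× shows only about 3.3 µm of specimen, which is consistent with the "few micrometres or less" field of view described above.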
Subjects
Electron Microscopy , Electron Microscopy/methods , Computer-Assisted Image Processing/methods , Humans
ABSTRACT
The human gut microbiota produces a diverse and extensive range of metabolites that have the potential to affect host physiology. Despite significant efforts to identify metabolic pathways for producing these microbial metabolites, a comprehensive metabolic pathway database for the human gut microbiota is still lacking. Here, we present Enteropathway, a metabolic pathway database that integrates 3269 compounds, 3677 reactions, and 876 modules obtained from 1012 manually curated scientific publications. Notably, 698 of these modules are new entries and cannot be found in any other database. The database is accessible from a web application (https://enteropathway.org) that offers a metabolic diagram for graphical visualization of metabolic pathways, a customization interface, and an enrichment analysis feature for highlighting enriched modules on the metabolic diagram. Overall, Enteropathway is a comprehensive reference database that can complement widely used databases and a tool for visual and statistical analysis in human gut microbiota studies; it was designed to help researchers gain new insights into the complex interplay between microbiota and host metabolism.
Subjects
Factual Databases , Gastrointestinal Microbiome , Metabolic Networks and Pathways , Humans , Software , Computational Biology/methods
ABSTRACT
The analysis and comparison of gene neighborhoods is a powerful approach for exploring microbial genome structure, function, and evolution. Although numerous tools exist for genome visualization and comparison, genome exploration across large genomic databases or user-generated datasets remains a challenge. Here, we introduce AnnoView, a web server designed for interactive exploration of gene neighborhoods across the bacterial and archaeal tree of life. Our server offers users the ability to identify, compare, and visualize gene neighborhoods of interest from 30 238 bacterial genomes and 1672 archaeal genomes, through integration with the comprehensive Genome Taxonomy Database and AnnoTree databases. Identified gene neighborhoods can be visualized using pre-computed functional annotations from different sources such as KEGG, Pfam and TIGRFAM, or clustered based on similarity. Alternatively, users can upload and explore their own custom genomic datasets in GBK, GFF or CSV format, or use AnnoView as a genome browser for relatively small genomes (e.g. viruses and plasmids). Ultimately, we anticipate that AnnoView will catalyze biological discovery by enabling user-friendly search, comparison, and visualization of genomic data. AnnoView is available at http://annoview.uwaterloo.ca.
Subjects
Software , Genetic Databases , Bacterial Genome , Archaeal Genome , Genomics/methods , Archaea/genetics , Microbial Genes/genetics , Computational Biology/methods , Bacteria/genetics , Bacteria/classification
ABSTRACT
Recent technological advances in sequencing DNA and RNA modifications using high-throughput platforms have generated vast epigenomic and epitranscriptomic datasets whose power to transform life science is yet to be fully unleashed. Currently available in silico methods have facilitated the identification, positioning and quantitative comparison of individual modification sites. However, the essential challenge of linking specific 'epi-marks' to gene expression in the particular context of cellular and biological processes remains unmet. To fast-track exploration, we generated epidecodeR, implemented in R, which allows biologists to quickly survey whether an epigenomic or epitranscriptomic status of interest potentially influences gene expression responses. The evaluation is based on the cumulative distribution function and the statistical significance of differential expression of genes grouped by the number of 'epi-marks'. This tool proves useful in predicting the role of H3K9ac and H3K27ac in associated gene expression after knocking down deacetylases FAM60A and SDS3, and of N6-methyl-adenosine-associated gene expression after knocking out the reader proteins. We further used epidecodeR to explore the effectiveness of demethylase FTO inhibitors and histone-associated modifications in drug abuse in animals. epidecodeR is available for downloading as an R package at https://bioconductor.riken.jp/packages/3.13/bioc/html/epidecodeR.html.
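The idea of comparing cumulative distributions of expression changes across genes grouped by their number of 'epi-marks' can be illustrated with a minimal Python sketch. This is not the epidecodeR API (the package itself is written in R). The function names, the cap of two groups, and the toy data are hypothetical, and the two-sample Kolmogorov-Smirnov statistic is used here only as a stand-in for whatever significance test the package applies.

```python
from bisect import bisect_right
from collections import defaultdict


def ecdf(values):
    """Return F, where F(x) is the fraction of values less than or equal to x."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x):
        return bisect_right(ordered, x) / n

    return F


def group_by_mark_count(log2fc, mark_counts, max_group=2):
    """Group log2 fold changes by each gene's epi-mark count (counts above max_group are pooled)."""
    groups = defaultdict(list)
    for gene, fold_change in log2fc.items():
        k = min(mark_counts.get(gene, 0), max_group)  # genes without marks fall into group 0
        groups[k].append(fold_change)
    return dict(groups)


def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of two samples."""
    Fa, Fb = ecdf(a), ecdf(b)
    return max(abs(Fa(x) - Fb(x)) for x in set(a) | set(b))


if __name__ == "__main__":
    # Hypothetical toy data: log2 fold changes and epi-mark counts per gene.
    log2fc = {"g1": -1.0, "g2": -0.5, "g3": 0.1, "g4": 0.2, "g5": 0.8, "g6": 1.2}
    marks = {"g4": 1, "g5": 1, "g6": 3}
    groups = group_by_mark_count(log2fc, marks)
    print(groups)
    print(ks_statistic(groups[0], groups[1]))
```

A clear shift of the curve for marked genes relative to unmarked genes would suggest that the modification is associated with the expression response. Any real analysis would also need a proper significance test and far more genes per group than this toy example.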
Subjects
Epigenomics , Software , Animals , Epigenomics/methods , DNA Methylation , DNA/metabolism , Genetic Epigenesis
ABSTRACT
The major histocompatibility complex (MHC) encodes a range of immune response genes, including the human leukocyte antigens (HLAs) in humans. These molecules bind peptide antigens and present them on the cell surface for T cell recognition. The repertoires of peptides presented by HLA molecules are termed immunopeptidomes. The highly polymorphic nature of the genes that encode the HLA molecules confers allotype-specific differences in the sequences of bound ligands. Allotype-specific ligand preferences are often defined by peptide-binding motifs. Individuals express up to six classical class I HLA allotypes, which likely present peptides displaying different binding motifs. Such complex datasets make the deconvolution of immunopeptidomic data into allotype-specific contributions and further dissection of binding specificities challenging. Herein, we developed MHCpLogics as an interactive machine learning-based tool for mining peptide-binding sequence motifs and visualization of immunopeptidome data across complex datasets. We showcase the functionalities of MHCpLogics by analyzing both in-house and published mono- and multi-allelic immunopeptidomics data. The visualization modalities of MHCpLogics allow users to inspect clustered sequences down to individual peptide components and to examine broader sequence patterns within multiple immunopeptidome datasets. MHCpLogics can deconvolute large immunopeptidome datasets, enabling the interrogation of clusters for the segregation of allotype-specific peptide sequence motifs, identification of sub-peptidome motifs, and the export of clustered peptide sequence lists. The tool facilitates rapid inspection of immunopeptidomes as a resource for the immunology and vaccine communities. MHCpLogics is a standalone application available via an executable installation at: https://github.com/PurcellLab/MHCpLogics.
Subjects
Data Visualization , Peptides , Humans , Peptides/chemistry , HLA Antigens/genetics , Histocompatibility Antigens , Machine Learning , Cluster Analysis
ABSTRACT
Tumor immunotherapy is refashioning traditional treatments in the clinic for certain tumors, especially by relying on the activation of T cells. However, the safety and effectiveness of many antitumor immunotherapeutic agents are suboptimal due to difficulties encountered in assessing T cell responses and adjusting treatment regimens accordingly. Here, we review advances in the clinical visualization of T cell activity in vivo, and focus particularly on molecular imaging probes and biomarkers of T cell activation. Current challenges and prospects are also discussed that aim to achieve a better strategy for real-time monitoring of T cell activity, predicting prognoses and responses to tumor immunotherapy, and assessing disease management.
Subjects
Antineoplastic Agents , Neoplasms , Humans , T-Lymphocytes , Neoplasms/therapy , Immunotherapy/methods , Molecular Imaging
ABSTRACT
Traditionally, scientists have placed more emphasis on communicating inferential uncertainty (i.e., the precision of statistical estimates) compared to outcome variability (i.e., the predictability of individual outcomes). Here, we show that this can lead to sizable misperceptions about the implications of scientific results. Specifically, we present three preregistered, randomized experiments where participants saw the same scientific findings visualized as showing only inferential uncertainty, only outcome variability, or both and answered questions about the size and importance of findings they were shown. Our results, composed of responses from medical professionals, professional data scientists, and tenure-track faculty, show that the prevalent form of visualizing only inferential uncertainty can lead to significant overestimates of treatment effects, even among highly trained experts. In contrast, we find that depicting both inferential uncertainty and outcome variability leads to more accurate perceptions of results while appearing to leave other subjective impressions of the results unchanged, on average.
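The distinction between inferential uncertainty and outcome variability can be made concrete with a small sketch. It assumes approximately normal data and uses a z-value of 1.96 for roughly 95% coverage. The function names and the demo data are illustrative and are not taken from the study.

```python
import math
import statistics


def inferential_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (inferential uncertainty)."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * standard_error, mean + z * standard_error


def outcome_interval(sample, z=1.96):
    """Approximate range covering about 95% of individual outcomes (outcome variability)."""
    mean = statistics.mean(sample)
    standard_deviation = statistics.stdev(sample)
    return mean - z * standard_deviation, mean + z * standard_deviation


if __name__ == "__main__":
    # Hypothetical outcomes for 100 participants.
    outcomes = [1.0, 2.0, 3.0, 4.0, 5.0] * 20
    print("mean estimate:", inferential_interval(outcomes))
    print("individual outcomes:", outcome_interval(outcomes))
```

The first interval shrinks with the square root of the sample size while the second does not, so a chart that shows only the first can make individual outcomes look far more predictable than they are. This is the kind of misperception the abstract describes.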
ABSTRACT
The main advantage proton beams offer over photon beams in radiation therapy of cancer patients is the dose maximum at their finite range, yielding a reduction in the dose deposited in healthy tissues surrounding the tumor. Since no direct method exists to measure the beam's range during dose delivery, safety margins around the tumor are applied, compromising the dose conformality and reducing the targeting accuracy. Here, we demonstrate that online MRI can visualize the proton beam and reveal its range during irradiation of liquid-filled phantoms. A clear dependence on beam energy and current was found. These results stimulate research into novel MRI-detectable beam signatures and already find application in the geometric quality assurance for magnetic resonance-integrated proton therapy systems currently under development.
Subjects
Neoplasms , Proton Therapy , Humans , Protons , Proton Therapy/methods , Neoplasms/diagnostic imaging , Neoplasms/radiotherapy , Magnetic Resonance Imaging
ABSTRACT
Mutations in the breast cancer susceptibility gene, BRCA2, greatly increase an individual's lifetime risk of developing breast and ovarian cancers. BRCA2 suppresses tumor formation by potentiating DNA repair via homologous recombination. Central to recombination is the assembly of a RAD51 nucleoprotein filament, which forms on single-stranded DNA (ssDNA) generated at or near the site of chromosomal damage. However, replication protein-A (RPA) rapidly binds to and continuously sequesters this ssDNA, imposing a kinetic barrier to RAD51 filament assembly that suppresses unregulated recombination. Recombination mediator proteins, of which BRCA2 is the defining member in humans, alleviate this kinetic barrier to catalyze RAD51 filament formation. We combined microfluidics, microscopy, and micromanipulation to directly measure both the binding of full-length BRCA2 to, and the assembly of RAD51 filaments on, a region of RPA-coated ssDNA within individual DNA molecules designed to mimic a resected DNA lesion common in replication-coupled recombinational repair. We demonstrate that a dimer of RAD51 is minimally required for spontaneous nucleation; however, growth self-terminates below the diffraction limit. BRCA2 accelerates nucleation of RAD51 to a rate that approaches the rapid association of RAD51 to naked ssDNA, thereby overcoming the kinetic block imposed by RPA. Furthermore, BRCA2 eliminates the need for the rate-limiting nucleation of RAD51 by chaperoning a short preassembled RAD51 filament onto the ssDNA complexed with RPA. Therefore, BRCA2 regulates recombination by initiating RAD51 filament formation.
Subjects
Single-Stranded DNA , Replication Protein A , Humans , BRCA2 Protein/genetics , BRCA2 Protein/metabolism , DNA/metabolism , Single-Stranded DNA/genetics , BRCA2 Genes , Homologous Recombination , Protein Binding , Rad51 Recombinase/metabolism , Replication Protein A/genetics , Replication Protein A/metabolism
ABSTRACT
Scientific publications in the life sciences regularly include image data to display and communicate revelations about cellular structure and function. In 2016, a set of guiding principles known as the 'FAIR Data Principles' was put forward to ensure that research data are findable, accessible, interoperable and reusable. However, challenges still persist regarding the quality, accessibility and interpretability of image data, and how to effectively communicate microscopy data in figures. This Perspective article details a community-driven initiative that aims to promote the accurate and understandable depiction of light microscopy data in publications. The initiative underscores the crucial role of global and diverse scientific communities in advancing the standards in the field of biological images. Additionally, the Perspective delves into the historical context of scientific images, in the hope that this look into our past can help ongoing community efforts move forward.
Subjects
Computer-Assisted Image Processing , Microscopy
ABSTRACT
RT-PCR and northern blots have long been used to study RNA isoform usage for single genes. Recent advancements in long-read sequencing have yielded unprecedented information about the usage and abundance of these RNA isoforms. However, visualization of long-read sequencing data remains challenging due to the high information density. To alleviate these issues, we have developed NanoBlot, an open-source R package that generates northern blot and RT-PCR-like images from long-read sequencing data. NanoBlot requires aligned, positionally sorted and indexed BAM files. Plotting is based on ggplot2 and is easily customizable. Advantages of NanoBlot include a robust system for designing probes to visualize isoforms, including the exclusion of reads based on the presence or absence of a specified region; an elegant solution to representing isoforms with continuous variations in length; and the ability to overlay multiple genes in the same plot using different colors. We present examples of nanoblots compared to actual northern blot data. In addition to traditional gel-like images, the NanoBlot package can also output other visualizations such as violin plots and 3'-RACE-like plots focused on the visualization of 3'-end isoforms. The use of the NanoBlot package should provide a simple answer to some of the challenges of visualizing long-read RNA-sequencing data.
Subjects
RNA Isoforms , RNA , RNA/genetics , RNA Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods , RNA Sequence Analysis/methods , Protein Isoforms/genetics , Alternative Splicing , Gene Expression Profiling/methods , Transcriptome
ABSTRACT
CLIP technologies are now widely used to study RNA-protein interactions and many data sets are now publicly available. An important first step in CLIP data exploration is the visual inspection and assessment of processed genomic data on selected genes or regions and performing comparisons, either across conditions within a particular project or incorporating publicly available data. However, the output files produced by data processing pipelines or preprocessed files available to download from data repositories are often not suitable for direct comparison and usually need further processing. Furthermore, to derive biological insight it is usually necessary to visualize a CLIP signal alongside other data such as annotations, or orthogonal functional genomic data (e.g., RNA-seq). We have developed a simple but powerful command-line tool, clipplotr, which facilitates these visual comparative and integrative analyses with normalization and smoothing options for CLIP data and the ability to show these alongside reference annotation tracks and functional genomic data. These data can be supplied as input to clipplotr in a range of file formats, and clipplotr will output a publication-quality figure. It is written in R and can either run independently on a laptop computer or be integrated into computational workflows on a high-performance cluster. Releases, source code, and documentation are freely available at https://github.com/ulelab/clipplotr.
Subjects
Genomics , Software , Genome , RNA-Seq
ABSTRACT
The emergence of massive datasets exploring the multiple levels of molecular biology has made their analysis and knowledge transfer more complex. Flexible tools to manage big biological datasets could be of great help for standardizing the usage of developed data visualizations and integration methods. Business intelligence (BI) tools have been used in many fields as exploratory tools. They have numerous connectors to link numerous data repositories with a unified graphic interface, offering an overview of data and facilitating interpretation for decision makers. BI tools could be a flexible and user-friendly way of handling molecular biological data with interactive visualizations. However, it is rather uncommon to see such tools used for the exploration of massive and complex datasets in biological fields. We believe that two main obstacles could be the reason. Firstly, we posit that the ways of importing data into BI tools are not compatible with biological databases. Secondly, BI tools may not be adapted to certain particularities of complex biological data, namely, the size, the variability of datasets and the availability of specialized visualizations. This paper highlights the use of five BI tools (Elastic Kibana, Siren Investigate, Microsoft Power BI, Salesforce Tableau and Apache Superset) with which the massive data management repository engine Elasticsearch is compatible. Four case studies will be discussed in which these BI tools were applied to biological datasets with different characteristics. We conclude that the performance of the tools depends on the complexity of the biological questions and the size of the datasets.
Subjects
Datasets as Topic , Software , Data Visualization
ABSTRACT
Qualitative or quantitative prediction models of structure-activity relationships based on graph neural networks (GNNs) are prevalent in drug discovery applications and commonly have excellent predictive power. However, the network information flows of GNNs are highly complex and accompanied by poor interpretability. Unfortunately, there are relatively few studies on GNN attributions, and their development in drug research is still at an early stage. In this work, we adopted several advanced attribution techniques for different GNN frameworks and applied them to explain multiple drug molecule property prediction tasks, enabling the identification and visualization of vital chemical information in the networks. Additionally, we evaluated them quantitatively with attribution metrics such as accuracy, sparsity, fidelity and infidelity, stability and sensitivity; discussed their applicability and limitations; and provided an open-source benchmark platform for researchers. The results showed that all attribution techniques were effective, while those directly related to the predicted labels, such as integrated gradients, tended to have better attribution performance. The attribution techniques we have implemented could be directly used for the vast majority of chemical GNN interpretation tasks.
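Integrated gradients, one of the attribution techniques named in this abstract, can be sketched on a toy model. The code below is a generic illustration and not the benchmark platform's implementation: the `model` function is a hypothetical stand-in for a trained GNN, gradients are approximated by central differences rather than backpropagation, and the path integral is approximated with a midpoint Riemann sum.

```python
import math


def model(x):
    """Hypothetical differentiable 'property predictor' over three input features."""
    return math.tanh(0.8 * x[0] - 0.5 * x[1] + 0.3 * x[0] * x[2])


def numeric_grad(f, x, eps=1e-5):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=200):
    """Attribute f(x) - f(baseline) to each feature along the straight path from baseline to x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numeric_grad(f, point)):
            total[i] += g
    # Average gradient along the path, scaled by each feature's distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


if __name__ == "__main__":
    x, baseline = [1.0, 2.0, -1.0], [0.0, 0.0, 0.0]
    attributions = integrated_gradients(model, x, baseline)
    print(attributions)
    # Completeness check: attributions should sum to the change in model output.
    print(sum(attributions), model(x) - model(baseline))
```

The final check reflects the completeness property of integrated gradients: the attributions sum, up to numerical error, to the difference between the prediction at the input and at the baseline. In a chemical setting the features would typically be atom or bond representations, and the choice of baseline is itself a modelling assumption.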
Subjects
Benchmarking , Drug Discovery , Humans , Computer Neural Networks , Research Personnel , Structure-Activity Relationship
ABSTRACT
Recent studies have demonstrated the significant role that circRNA plays in the progression of human diseases. Identifying circRNA-disease associations (CDA) in an efficient manner can offer crucial insights into disease diagnosis. While traditional biological experiments can be time-consuming and labor-intensive, computational methods have emerged as a viable alternative in recent years. However, these methods are often limited by data sparsity and their inability to explore high-order information. In this paper, we introduce a novel method named Knowledge Graph Encoder from Transformer for predicting CDA (KGETCDA). Specifically, KGETCDA first integrates more than 10 databases to construct a large heterogeneous non-coding RNA dataset, which contains multiple relationships between circRNA, miRNA, lncRNA and disease. Then, a biological knowledge graph is created based on this dataset, and Transformer-based knowledge representation learning and attentive propagation layers are applied to obtain high-quality embeddings that accurately capture high-order interaction information. Finally, a multilayer perceptron is used to predict the matching scores of CDA based on their embeddings. Our empirical results demonstrate that KGETCDA significantly outperforms other state-of-the-art models. To enhance user experience, we have developed an interactive web-based platform named HNRBase that allows users to visualize and download data and to make predictions using KGETCDA with ease. The code and datasets are publicly available at https://github.com/jinyangwu/KGETCDA.