ABSTRACT
Dynamic cellular processes such as differentiation are driven by changes in the abundances of transcription factors (TFs). However, despite years of studies, our knowledge about the protein copy number of TFs in the nucleus is limited. Here, by determining the absolute abundances of 103 TFs and co-factors during the course of human erythropoiesis, we provide a dynamic and quantitative scale for TFs in the nucleus. Furthermore, we establish the first gene regulatory network of cell fate commitment that integrates temporal protein stoichiometry data with mRNA measurements. The model revealed quantitative imbalances in TFs' cross-antagonistic relationships that underlie lineage determination. Finally, we made the surprising discovery that, in the nucleus, co-repressors are dramatically more abundant than co-activators at the protein level, but not at the RNA level, with profound implications for understanding transcriptional regulation. These analyses provide a unique quantitative framework to understand transcriptional regulation of cell differentiation in a dynamic context.
Subjects
Erythropoiesis/genetics; Gene Regulatory Networks/genetics; Transcription Factors/genetics; Databases, Factual; Gene Expression Regulation/genetics; Hematopoiesis/genetics; Humans; Proteomics/methods; Transcription Factors/analysis; Transcription Factors/metabolism
ABSTRACT
The remarkable advances of artificial intelligence (AI) technology are revolutionizing established approaches to the acquisition, interpretation, and analysis of biomedical imaging data. Development, validation, and continuous refinement of AI tools requires easy access to large high-quality annotated datasets, which are both representative and diverse. The National Cancer Institute (NCI) Imaging Data Commons (IDC) hosts large and diverse publicly available cancer image data collections. By harmonizing all data based on industry standards and colocalizing it with analysis and exploration resources, the IDC aims to facilitate the development, validation, and clinical translation of AI tools and address the well-documented challenges of establishing reproducible and transparent AI processing pipelines. Balanced use of established commercial products with open-source solutions, interconnected by standard interfaces, provides value and performance, while preserving sufficient agility to address the evolving needs of the research community. Emphasis on the development of tools, use cases to demonstrate the utility of uniform data representation, and cloud-based analysis aim to ease adoption and help define best practices. Integration with other data in the broader NCI Cancer Research Data Commons infrastructure opens opportunities for multiomics studies incorporating imaging data to further empower the research community to accelerate breakthroughs in cancer detection, diagnosis, and treatment. Published under a CC BY 4.0 license.
Subjects
Artificial Intelligence; Neoplasms; United States; Humans; National Cancer Institute (U.S.); Reproducibility of Results; Diagnostic Imaging; Multiomics; Neoplasms/diagnostic imaging
ABSTRACT
T-cell development from hematopoietic progenitors depends on multiple transcription factors, mobilized and modulated by intrathymic Notch signaling. Key aspects of T-cell specification network architecture have been illuminated through recent reports defining roles of transcription factors PU.1, GATA-3, and E2A, their interactions with Notch signaling, and roles of Runx1, TCF-1, and Hes1, providing bases for a comprehensively updated model of the T-cell specification gene regulatory network presented herein. However, the role of lineage commitment factor Bcl11b has been unclear. We use self-organizing maps on 63 RNA-seq datasets from normal and perturbed T-cell development to identify functional targets of Bcl11b during commitment and relate them to other regulomes. We show that both activation and repression target genes can be bound by Bcl11b in vivo, and that Bcl11b effects overlap with E2A-dependent effects. The newly clarified role of Bcl11b distinguishes discrete components of commitment, resolving how innate lymphoid, myeloid, and dendritic, and B-cell fate alternatives are excluded by different mechanisms.
Subjects
Cell Differentiation/genetics; Gene Regulatory Networks; Repressor Proteins/physiology; T-Lymphocytes/cytology; Tumor Suppressor Proteins/physiology; Animals; Mice; Mice, Inbred C57BL; Mice, Knockout; Receptors, Notch; Repressor Proteins/genetics; Repressor Proteins/metabolism; Signal Transduction; Tumor Suppressor Proteins/genetics; Tumor Suppressor Proteins/metabolism
ABSTRACT
The resilience of Mycobacterium tuberculosis (MTB) is largely due to its ability to effectively counteract and even take advantage of the hostile environments of a host. In order to accelerate the discovery and characterization of these adaptive mechanisms, we have mined a compendium of 2325 publicly available transcriptome profiles of MTB to decipher a predictive, systems-scale gene regulatory network model. The resulting modular organization of 98% of all MTB genes within this regulatory network was rigorously tested using two independently generated datasets: a genome-wide map of 7248 DNA-binding locations for 143 transcription factors (TFs) and global transcriptional consequences of overexpressing 206 TFs. This analysis has discovered specific TFs that mediate conditional co-regulation of genes within 240 modules across 14 distinct environmental contexts. In addition to recapitulating previously characterized regulons, we discovered 454 novel mechanisms for gene regulation during stress, cholesterol utilization and dormancy. Significantly, 183 of these mechanisms act uniquely under conditions experienced during the infection cycle to regulate diverse functions including 23 genes that are essential to host-pathogen interactions. These and other insights underscore the power of a rational, model-driven approach to unearth novel MTB biology that operates under some but not all phases of infection.
Subjects
Gene Expression Regulation, Bacterial; Gene Regulatory Networks; Mycobacterium tuberculosis/genetics; Cholesterol/metabolism; Gene Expression Profiling; Genome, Bacterial; Models, Genetic; Transcription Factors/metabolism; Transcription, Genetic
ABSTRACT
The exchange of large and complex slide microscopy imaging data in biomedical research and pathology practice is impeded by a lack of data standardization and interoperability, which is detrimental to the reproducibility of scientific findings and clinical integration of technological innovations. We introduce Slim, an open-source, web-based slide microscopy viewer that implements the internationally accepted Digital Imaging and Communications in Medicine (DICOM) standard to achieve interoperability with a multitude of existing medical imaging systems. We showcase the capabilities of Slim as the slide microscopy viewer of the NCI Imaging Data Commons and demonstrate how the viewer enables interactive visualization of traditional brightfield microscopy and highly-multiplexed immunofluorescence microscopy images from The Cancer Genome Atlas and Human Tissue Atlas Network, respectively, using standard DICOMweb services. We further show how Slim enables the collection of standardized image annotations for the development or validation of machine learning models and the visual interpretation of model inference results in the form of segmentation masks, spatial heat maps, or image-derived measurements.
Subjects
Data Science; Microscopy; Humans; Microscopy/methods; Reproducibility of Results
ABSTRACT
BACKGROUND AND OBJECTIVES: Reproducibility is a major challenge in developing machine learning (ML)-based solutions in computational pathology (CompPath). The NCI Imaging Data Commons (IDC) provides >120 cancer image collections according to the FAIR principles and is designed to be used with cloud ML services. Here, we explore its potential to facilitate reproducibility in CompPath research. METHODS: Using the IDC, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets. To assess reproducibility, the experiments were run multiple times with separate but identically configured instances of common ML services. RESULTS: The results of different runs of the same experiment were reproducible to a large extent. However, we observed occasional, small variations in AUC values, indicating a practical limit to reproducibility. CONCLUSIONS: We conclude that the IDC facilitates approaching the reproducibility limit of CompPath research (i) by enabling researchers to reuse exactly the same datasets and (ii) by integrating with cloud ML services so that experiments can be run in identically configured computing environments.
Subjects
Lung Neoplasms; Software; Humans; Reproducibility of Results; Cloud Computing; Diagnostic Imaging; Lung Neoplasms/diagnostic imaging
ABSTRACT
BACKGROUND: The analysis of large, complex networks is an important aspect of ongoing biological research. Yet there is a need for entirely new, scalable approaches for network visualization that can provide more insight into the structure and function of these complex networks. RESULTS: To address this need, we have developed a software tool named BioFabric, which uses a novel network visualization technique that depicts nodes as one-dimensional horizontal lines arranged in unique rows. This is in distinct contrast to the traditional approach that represents nodes as discrete symbols that behave essentially as zero-dimensional points. BioFabric then depicts each edge in the network using a vertical line assigned to its own unique column, which spans between the source and target rows, i.e. nodes. This method of displaying the network allows a full-scale view to be organized in a rational fashion; interesting network structures, such as sets of nodes with similar connectivity, can be quickly scanned and visually identified in the full network view, even in networks with well over 100,000 edges. This approach means that the network is being represented as a fundamentally linear, sequential entity, where the horizontal scroll bar provides the basic navigation tool for browsing the entire network. CONCLUSIONS: BioFabric provides a novel and powerful way of looking at any size of network, including very large networks, using horizontal lines to represent nodes and vertical lines to represent edges. It is freely available as an open-source Java application.
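The layout described in this abstract (one row per node, one column per edge) can be illustrated with a minimal sketch. This is not BioFabric's code or its actual ordering algorithm: the function name `fabric_layout`, the degree-based row order, and the edge ordering are all assumptions chosen for illustration.

```python
from collections import defaultdict


def fabric_layout(edges):
    """Compute a BioFabric-style layout for a list of (source, target) edges.

    Returns:
      node_rows    - node -> row index of its horizontal line
      node_spans   - node -> (first_column, last_column) covered by that line
      edge_columns - list of (source, target, column, top_row, bottom_row)
    """
    degree = defaultdict(int)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1

    # Assumption: rows are ordered by descending degree, ties broken by name.
    ordered_nodes = sorted(degree, key=lambda n: (-degree[n], n))
    node_rows = {node: row for row, node in enumerate(ordered_nodes)}

    # Assumption: edges are ordered by their upper row, then their lower row.
    def edge_key(edge):
        rows = sorted((node_rows[edge[0]], node_rows[edge[1]]))
        return (rows[0], rows[1])

    ordered_edges = sorted(edges, key=edge_key)

    edge_columns = []
    node_spans = {}
    for col, (src, dst) in enumerate(ordered_edges):
        # Each edge is a vertical line in its own column between two rows.
        top, bottom = sorted((node_rows[src], node_rows[dst]))
        edge_columns.append((src, dst, col, top, bottom))
        # Each node line stretches across the columns of its incident edges.
        for node in (src, dst):
            first, last = node_spans.get(node, (col, col))
            node_spans[node] = (min(first, col), max(last, col))
    return node_rows, node_spans, edge_columns


# Hypothetical four-node example network.
demo_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
rows, spans, columns = fabric_layout(demo_edges)
for src, dst, col, top, bottom in columns:
    print(f"edge {src}-{dst}: column {col}, rows {top}..{bottom}")
```

Because every edge owns a column, the drawing grows horizontally with the edge count, which is consistent with the abstract's remark that the horizontal scroll bar is the basic navigation tool.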
Subjects
Computer Graphics; Gene Regulatory Networks; Metabolic Networks and Pathways; Software
ABSTRACT
Choice of a T lymphoid fate by hematopoietic progenitor cells depends on sustained Notch-Delta signaling combined with tightly regulated activities of multiple transcription factors. To dissect the regulatory network connections that mediate this process, we have used high-resolution analysis of regulatory gene expression trajectories from the beginning to the end of specification, tests of the short-term Notch dependence of these gene expression changes, and analyses of the effects of overexpression of two essential transcription factors, namely PU.1 and GATA-3. Quantitative expression measurements of >50 transcription factor and marker genes have been used to derive the principal components of regulatory change through which T cell precursors progress from primitive multipotency to T lineage commitment. Our analyses reveal separate contributions of Notch signaling, GATA-3 activity, and down-regulation of PU.1. Using BioTapestry (www.BioTapestry.org), the results have been assembled into a draft gene regulatory network for the specification of T cell precursors and the choice of T as opposed to myeloid/dendritic or mast-cell fates. This network also accommodates effects of E proteins and mutual repression circuits of Gfi1 against Egr-2 and of TCF-1 against PU.1 as proposed elsewhere, but requires additional functions that remain unidentified. Distinctive features of this network structure include the intense dose dependence of GATA-3 effects, the gene-specific modulation of PU.1 activity based on Notch activity, the lack of direct opposition between PU.1 and GATA-3, and the need for a distinct, late-acting repressive function or functions to extinguish stem and progenitor-derived regulatory gene expression.
Subjects
GATA3 Transcription Factor/genetics; Gene Regulatory Networks; Lymphopoiesis/genetics; Proto-Oncogene Proteins/genetics; T-Lymphocytes/cytology; Trans-Activators/genetics; Animals; Gene Expression Regulation; Hematopoietic Stem Cells/cytology; Mice; Receptors, Notch; Transcription Factors
ABSTRACT
The National Cancer Institute (NCI) Cancer Research Data Commons (CRDC) aims to establish a national cloud-based data science infrastructure. Imaging Data Commons (IDC) is a new component of CRDC supported by the Cancer Moonshot. The goal of IDC is to enable a broad spectrum of cancer researchers, with and without imaging expertise, to easily access and explore the value of deidentified imaging data and to support integrated analyses with nonimaging data. We achieve this goal by colocating versatile imaging collections with cloud-based computing resources and data exploration, visualization, and analysis tools. The IDC pilot was released in October 2020 and is being continuously populated with radiology and histopathology collections. IDC provides access to curated imaging collections, accompanied by documentation, a user forum, and a growing number of analysis use cases that aim to demonstrate the value of a data commons framework applied to cancer imaging research. SIGNIFICANCE: This study introduces NCI Imaging Data Commons, a new repository of the NCI Cancer Research Data Commons, which will support cancer imaging research on the cloud.
Subjects
Diagnostic Imaging/methods; National Cancer Institute (U.S.); Neoplasms/diagnostic imaging; Neoplasms/genetics; Biomedical Research/trends; Cloud Computing; Computational Biology/methods; Computer Graphics; Computer Security; Data Interpretation, Statistical; Databases, Factual; Diagnostic Imaging/standards; Humans; Image Processing, Computer-Assisted; Pilot Projects; Programming Languages; Radiology/methods; Radiology/standards; Reproducibility of Results; Software; United States; User-Computer Interface
ABSTRACT
The current gene regulatory network (GRN) for the sea urchin embryo pertains to pregastrular specification functions in the endomesodermal territories. Here we extend gene regulatory network analysis to the adjacent oral and aboral ectoderm territories over the same period. A large fraction of the regulatory genes predicted by the sea urchin genome project and shown in ancillary studies to be expressed in either oral or aboral ectoderm by 24 h are included, though universally expressed and pan-ectodermal regulatory genes are in general not. The loci of expression of these genes have been determined by whole mount in situ hybridization. We have carried out a global perturbation analysis in which expression of each gene was interrupted by introduction of morpholino antisense oligonucleotide, and the effects on all other genes were measured quantitatively, both by QPCR and by a new instrumental technology (NanoString Technologies nCounter Analysis System). At its current stage the network model, built in BioTapestry, includes 22 genes encoding transcription factors, 4 genes encoding known signaling ligands, and 3 genes that are yet unknown but are predicted to perform specific roles. Evidence emerged from the analysis pointing to distinctive subcircuit features observed earlier in other parts of the GRN, including a double negative transcriptional regulatory gate, and dynamic state lockdowns by feedback interactions. While much of the regulatory apparatus is downstream of Nodal signaling, as expected from previous observations, there are also cohorts of independently activated oral and aboral ectoderm regulatory genes, and we predict yet unidentified signaling interactions between oral and aboral territories.
Subjects
Ectoderm/metabolism; Models, Biological; Sea Urchins/embryology; Animals; Cloning, Molecular; In Situ Hybridization; Oligonucleotides, Antisense/genetics; Polymerase Chain Reaction; Sea Urchins/genetics
ABSTRACT
Genetic regulatory networks (GRNs) are complex, large-scale, and spatially and temporally distributed. These characteristics impose challenging demands on software tools for building GRN models, and so there is a need for custom tools. In this paper, we report on our ongoing development of BioTapestry, an open source, freely available computational tool designed specifically for building GRN models. We also outline our future development plans, and give some examples of current applications of BioTapestry.
Subjects
Computational Biology; Documentation; Gene Regulatory Networks; Computer Simulation; Models, Biological; Software
ABSTRACT
Gene regulatory networks (GRNs) control embryonic development, and to understand this process in depth, researchers need to have a detailed understanding of both the network architecture and its dynamic evolution over time and space. Interactive visualization tools better enable researchers to conceptualize, understand, and share GRN models. BioTapestry is an established application designed to fill this role, and recent enhancements released in Versions 6 and 7 have targeted two major facets of the program. First, we introduced significant improvements for network drawing and automatic layout that have now made it much easier for the user to create larger, more organized network drawings. Second, we revised the program architecture so it could continue to support the current Java desktop Editor program, while introducing a new BioTapestry GRN Viewer that runs as a JavaScript web application in a browser. We have deployed a number of GRN models using this new web application. These improvements will ensure that BioTapestry remains viable as a research tool in the face of the continuing evolution of web technologies, and as our understanding of GRN models grows.
ABSTRACT
BioTapestry is an open source, freely available software tool that has been developed to handle the challenges of modeling genetic regulatory networks (GRNs). Using BioTapestry, a researcher can construct a network model and use it to visualize and understand the dynamic behavior of a complex, spatially and temporally distributed GRN. Here we provide a step-by-step example of a way to use BioTapestry to build a GRN model and discuss some common issues that can arise during this process.
Subjects
Computational Biology/methods; Gene Regulatory Networks/genetics; Software; Models, Genetic; Systems Biology
ABSTRACT
Sonic hedgehog (Shh) acts as a morphogen to mediate the specification of distinct cell identities in the ventral neural tube through a Gli-mediated (Gli1-3) transcriptional network. Identifying Gli targets in a systematic fashion is central to the understanding of the action of Shh. We examined this issue in differentiating neural progenitors in mouse. An epitope-tagged Gli-activator protein was used to directly isolate cis-regulatory sequences by chromatin immunoprecipitation (ChIP). ChIP products were then used to screen custom genomic tiling arrays of putative Hedgehog (Hh) targets predicted from transcriptional profiling studies, surveying 50-150 kb of non-transcribed sequence for each candidate. In addition to identifying expected Gli-target sites, the data predicted a number of unreported direct targets of Shh action. Transgenic analysis of binding regions in Nkx2.2, Nkx2.1 (Titf1) and Rab34 established these as direct Hh targets. These data also facilitated the generation of an algorithm that improved in silico predictions of Hh target genes. Together, these approaches provide significant new insights into both tissue-specific and general transcriptional targets in a crucial Shh-mediated patterning process.
Subjects
Body Patterning; Genome; Hedgehog Proteins/physiology; Kruppel-Like Transcription Factors/genetics; Kruppel-Like Transcription Factors/physiology; Neurons/metabolism; Amino Acid Motifs; Animals; Chromatin Immunoprecipitation; Epitopes/metabolism; Gene Expression Profiling; Hedgehog Proteins/metabolism; Homeobox Protein Nkx-2.2; Mice; Mice, Transgenic; NIH 3T3 Cells; Neurons/cytology; Stem Cells/cytology; Zinc Finger Protein GLI1
ABSTRACT
Developmental genetic regulatory networks (GRNs) have unique architectural characteristics. They are typically large-scale, multi-layered, and organized in a nested, modular hierarchy of regulatory network kernels, function-specific building blocks, and structural gene batteries. They are also inherently multicellular and involve changing topological relationships among a growing number of cells. Reconstruction of developmental GRNs requires unique computational tools that support the above representational requirements. In addition, we argue that DNA-centered network modeling, separate descriptions of network organization and network behavior, and support for network documentation and annotation are essential requirements for computational modeling of developmental GRNs. Based on these observations, we have developed a freely available, platform-independent, open source software package (BioTapestry) which supports both the process of model construction and also model visualization, analysis, documentation, and dissemination. We provide an overview of the main features of BioTapestry. The BioTapestry software and additional documents are available from http://www.biotapestry.org. We recommend BioTapestry as the substrate for further co-development for and by the developmental biology community.