ABSTRACT
During limb bud formation, axis polarities are established as evidenced by the spatially restricted expression of key regulator genes. In particular, the mutually antagonistic interaction between the GLI3 repressor and HAND2 results in distinct and non-overlapping anterior-distal Gli3 and posterior Hand2 expression domains. This is a hallmark of the establishment of antero-posterior limb axis polarity, together with spatially restricted expression of homeodomain and other transcriptional regulators. Here, we show that TBX3 is required for establishment of the posterior expression boundary of anterior genes in mouse limb buds. ChIP-seq and differential gene expression analysis of wild-type and mutant limb buds identifies TBX3-specific and shared TBX3-HAND2 target genes. High sensitivity fluorescent whole-mount in situ hybridisation shows that the posterior expression boundaries of anterior genes are positioned by TBX3-mediated repression, which excludes anterior genes such as Gli3, Alx4, Hand1 and Irx3/5 from the posterior limb bud mesenchyme. This exclusion delineates the posterior mesenchymal territory competent to establish the Shh-expressing limb bud organiser. In turn, HAND2 is required for Shh activation and cooperates with TBX3 to upregulate shared posterior identity target genes in early limb buds.
Subjects
Basic Helix-Loop-Helix Transcription Factors , Gene Expression Regulation, Developmental , Limb Buds , T-Box Domain Proteins , Animals , T-Box Domain Proteins/metabolism , T-Box Domain Proteins/genetics , Limb Buds/metabolism , Limb Buds/embryology , Mice , Basic Helix-Loop-Helix Transcription Factors/metabolism , Basic Helix-Loop-Helix Transcription Factors/genetics , Zinc Finger Protein Gli3/metabolism , Zinc Finger Protein Gli3/genetics , Up-Regulation/genetics , Body Patterning/genetics , Nerve Tissue Proteins/metabolism , Nerve Tissue Proteins/genetics , Homeodomain Proteins/metabolism , Homeodomain Proteins/genetics , Mesoderm/metabolism , Mesoderm/embryology
ABSTRACT
Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
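The idea of building one PSN per data view and then fusing them can be illustrated with a minimal sketch. It assumes cosine similarity as the patient similarity measure and an unweighted average of edge weights as the fusion step; both are illustrative choices rather than methods singled out by the review, and the patient identifiers and feature vectors are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_network(view):
    """Build a PSN from one data view.

    view maps a patient id to its feature vector; the result maps each
    unordered patient pair (p, q), with p < q, to an edge weight.
    """
    patients = sorted(view)
    edges = {}
    for i, p in enumerate(patients):
        for q in patients[i + 1:]:
            edges[(p, q)] = cosine_similarity(view[p], view[q])
    return edges


def fuse_networks(networks):
    """Fuse PSNs over the same patients by averaging edge weights (assumed rule)."""
    return {
        edge: sum(net[edge] for net in networks) / len(networks)
        for edge in networks[0]
    }


# Hypothetical toy data: two views of the same three patients.
omics = {"p1": [1.0, 0.0], "p2": [1.0, 0.0], "p3": [0.0, 1.0]}
clinical = {"p1": [1.0, 1.0], "p2": [0.0, 1.0], "p3": [0.0, 1.0]}

fused = fuse_networks([similarity_network(omics), similarity_network(clinical)])
```

In this toy example p1 and p2 are identical in the omics view but only partly similar in the clinical view, so the fused edge weight lies between the two single-view weights. More elaborate fusion schemes would replace `fuse_networks` only.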
Subjects
Algorithms , Machine Learning
ABSTRACT
Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful for assessing associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases, whose removal may introduce severe bias. Several multiple imputation algorithms have been proposed to attempt to recover the missing information under an assumed missingness mechanism. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm's parameters and data-related modeling choices are also both crucial and challenging. In this paper we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. Extensive experiments show that our approach can effectively highlight the most promising and performant missing-data handling strategy for our case study. Moreover, our methodology allowed a better understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types.
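For readers unfamiliar with multiple imputation, the sketch below shows the standard pooling step (Rubin's rules) that combines the estimates obtained on each imputed dataset. It is background for the technique being evaluated, not the evaluation framework proposed in the paper; the function name and the numbers are hypothetical.

```python
def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: their squared standard errors, in the same order
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = sum(estimates) / m                      # mean of the estimates
    within = sum(variances) / m                      # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between           # total variance
    return pooled, total


# Hypothetical coefficients estimated on three imputed datasets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what distinguishes multiple from single imputation: it inflates the variance to reflect uncertainty about the missing values.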
Subjects
COVID-19 , Humans , Algorithms , Research Design , Bias , Probability
ABSTRACT
MOTIVATION: Automated protein function prediction is a complex multi-class, multi-label, structured classification problem in which protein functions are organized in a controlled vocabulary, according to the Gene Ontology (GO). 'Hierarchy-unaware' classifiers, also known as 'flat' methods, predict GO terms without exploiting the inherent structure of the ontology, potentially violating the True-Path-Rule (TPR) that governs the GO, while 'hierarchy-aware' approaches, even if they obey the TPR, do not always show clear improvements with respect to flat methods, or do not scale well when applied to the full GO. RESULTS: To overcome these limitations, we propose Hierarchical Ensemble Methods for Directed Acyclic Graphs (HEMDAG), a family of highly modular hierarchical ensembles of classifiers, able to build upon any flat method and to provide 'TPR-safe' predictions, by leveraging a combination of isotonic regression and TPR learning strategies. Extensive experiments on synthetic and real data across several organisms firstly show that HEMDAG can be used as a general tool to improve the predictions of flat classifiers, and secondly that HEMDAG is competitive versus state-of-the-art hierarchy-aware learning methods proposed in the last CAFA international challenges. AVAILABILITY AND IMPLEMENTATION: Fully tested R code freely available at https://anaconda.org/bioconda/r-hemdag. Tutorial and documentation at https://hemdag.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
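The true path rule implies that a term's score should never exceed the score of any of its ancestors. The sketch below shows one simple way to enforce this on flat scores, a top-down pass that caps each term at the minimum of its parents' corrected scores. It is an illustration of a TPR-safe correction under that assumption, not the HEMDAG API and not its isotonic regression variant; the toy DAG and scores are made up.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def top_down_correction(parents, flat_scores):
    """Make flat scores consistent with the true path rule.

    parents maps each term to the set of its parent terms (a DAG).
    Terms are visited parents-first, and each score is capped at the
    smallest corrected score among the term's parents.
    """
    corrected = {}
    for term in TopologicalSorter(parents).static_order():
        score = flat_scores[term]
        for parent in parents.get(term, ()):
            score = min(score, corrected[parent])
        corrected[term] = score
    return corrected


# Hypothetical four-term ontology: C has two parents, A and B.
parents = {"root": set(), "A": {"root"}, "B": {"root"}, "C": {"A", "B"}}
flat = {"root": 0.9, "A": 0.4, "B": 0.7, "C": 0.8}

corrected = top_down_correction(parents, flat)
```

Here the flat score of C (0.8) exceeds that of its parent A (0.4), so the correction lowers C to 0.4 and leaves the other terms unchanged.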
Subjects
Algorithms , Computational Biology , Gene Ontology , Computational Biology/methods , Proteins/metabolism
ABSTRACT
The cellular response to DNA double-strand breaks (DSBs) is initiated by the Mre11-Rad50-Xrs2 (MRX) complex that has structural and catalytic functions. MRX association at DSBs is counteracted by Rif2, which is known to interact with Rap1 that binds telomeric DNA through two tandem Myb-like domains. Whether and how Rap1 acts at DSBs is unknown. Here we show that Rif2 inhibits MRX association to DSBs in a manner dependent on Rap1, which binds to DSBs and promotes Rif2 association to them. Rap1 in turn can negatively regulate MRX function at DNA ends also independently of Rif2. In fact, a characterization of Rap1 mutant variants shows that Rap1 binding to DNA through both Myb-like domains results in formation of Rap1-DNA complexes that control MRX functions at both DSBs and telomeres primarily through Rif2. By contrast, Rap1 binding to DNA through a single Myb-like domain results in formation of high stoichiometry complexes that act at DNA ends mostly in a Rif2-independent manner. Altogether these findings indicate that the DNA binding modes of Rap1 influence its functional properties, thus highlighting the structural plasticity of this protein.
Subjects
DNA, Fungal/metabolism , Multiprotein Complexes/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Telomere Homeostasis , Telomere-Binding Proteins/metabolism , Telomere/metabolism , Transcription Factors/metabolism , Alleles , DNA Breaks, Double-Stranded , DNA Damage , Models, Biological , Mutation/genetics , Protein Binding , Saccharomyces cerevisiae/cytology , Shelterin Complex , Transcription, Genetic
ABSTRACT
Wnt/Fzd signaling has been implicated in hematopoietic stem cell maintenance and in acute leukemia establishment. In our previous work, we described a recurrent rearrangement involving the WNT10B locus (WNT10BR), characterized by the expression of the WNT10BIVS1 transcript variant, in acute myeloid leukemia. To determine the occurrence of WNT10BR in T-cell acute lymphoblastic leukemia (T-ALL), we retrospectively analyzed an Italian cohort of patients (n = 20) and detected a high incidence (13/20) of WNT10BIVS1 expression. To identify genes involved in the WNT10B molecular response, we designed a Wnt-targeted RNA sequencing panel. Among Wnt agonists and antagonists, the expression of the FZD6, LRP5, and PROM1 genes stands out in WNT10BIVS1-positive patients compared with negative ones. Using MOLT4 and MUTZ-2 as leukemic cell models, which are characterized by the expression of WNT10BIVS1, we observed that WNT10B drives major Wnt activation to the FZD6 receptor complex through receipt of ligand. Additionally, short hairpin RNA (shRNA)-mediated gene silencing and small molecule-mediated inhibition of WNT secretion were observed to interfere with the WNT10B/FZD6 interaction. We have therefore identified that WNT10BIVS1 knockdown, or pharmacological interference by the porcupine (PORCN) inhibitor LGK974, reduces WNT10B/FZD6 protein complex formation and significantly impairs intracellular effectors and leukemic expansion. These results describe the molecular circuit induced by WNT10B and suggest WNT10B/FZD6 as a new target in the T-ALL treatment strategy.
Subjects
Frizzled Receptors/metabolism , Gene Expression Regulation, Leukemic , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/metabolism , Proto-Oncogene Proteins/biosynthesis , Wnt Proteins/biosynthesis , Wnt Signaling Pathway , Acyltransferases/antagonists & inhibitors , Acyltransferases/genetics , Acyltransferases/metabolism , Female , Frizzled Receptors/genetics , HeLa Cells , Humans , Male , Membrane Proteins/antagonists & inhibitors , Membrane Proteins/genetics , Membrane Proteins/metabolism , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/drug therapy , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/pathology , Proto-Oncogene Proteins/genetics , Pyrazines/pharmacology , Pyridines/pharmacology , Wnt Proteins/genetics
ABSTRACT
BACKGROUND: One of the main issues in the automated protein function prediction (AFP) problem is the integration of multiple networked data sources. The UNIPred algorithm was proposed to efficiently integrate protein networks in a function-specific fashion, taking into account the imbalance that characterizes protein annotations, and to subsequently predict novel hypotheses about unannotated proteins. UNIPred is publicly available as R code, which may limit its use by non-expert users. Moreover, its application requires effort in acquiring and preparing the networks to be integrated. Finally, the UNIPred source code does not handle the visualization of the resulting consensus network, whereas suitable views of the network topology are necessary to explore and interpret existing protein relationships. RESULTS: We address these issues by proposing UNIPred-Web, a user-friendly Web tool for applying the UNIPred algorithm to a variety of biomolecular networks already supplied by the system, and for visualizing and exploring protein networks. We support different organisms and different types of networks (e.g., co-expression, shared domains and physical interaction networks). Users are supported in the different phases of the process, ranging from the selection of the networks and the protein function to be predicted, to the navigation of the integrated network. The system also supports the upload of user-defined protein networks. The vertex-centric and highly interactive approach of UNIPred-Web allows a focused exploration of specific proteins, and an interactive analysis of large sub-networks with only a few mouse clicks. CONCLUSIONS: UNIPred-Web offers practical and intuitive (visual) guidance to biologists interested in gaining insights into protein biomolecular functions. UNIPred-Web provides facilities for the integration of networks, and supplies a framework for the imbalance-aware protein network integration of nine organisms, the prediction of thousands of GO protein functions, and an easy-to-use graphical interface for the visual analysis, navigation and interpretation of the integrated networks and of the functional predictions.
Subjects
Computational Biology/methods , Internet , Protein Interaction Maps , Proteins/metabolism , Software , Algorithms , User-Computer Interface
ABSTRACT
BACKGROUND: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is to infer the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labelings are often highly unbalanced, that is, one class is largely under-represented: for instance, in automated protein function prediction (AFP) only a few proteins are annotated for most Gene Ontology terms, and in the disease-gene prioritization problem only a few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches to accurately predict node labels in biological networks are therefore required. Furthermore, such methods must be scalable, since input data can be large, as, for instance, in the context of multi-species protein networks. RESULTS: We propose a novel semi-supervised parallel enhancement of COSNET, an imbalance-aware algorithm built on the Hopfield neural model and recently suggested to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be efficiently applied to networks with millions of nodes. The key strategy to speed up the computations is to partition nodes into independent sets so as to process each set in parallel by exploiting the power of GPU accelerators. This parallel technique ensures convergence to asymptotically stable attractors, while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and on large artificial instances of the problem highlight the scalability and efficiency of the proposed method. CONCLUSIONS: By parallelizing COSNET we achieved on average a speed-up of 180x in solving the AFP problem for S. cerevisiae, Mus musculus and Homo sapiens, while lowering memory requirements. In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks involving hundreds of thousands to millions of nodes.
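The partitioning of nodes into independent sets can be illustrated with a greedy graph coloring, in which every color class is a set of mutually non-adjacent nodes. This is a minimal CPU sketch under the assumption that a greedy, degree-ordered coloring is used; the abstract does not say which partitioning scheme the GPU implementation adopts, and the toy graph is hypothetical.

```python
def greedy_independent_sets(adjacency):
    """Partition nodes into independent sets via greedy coloring.

    adjacency maps each node to a list of its neighbors (undirected graph).
    Nodes are colored in order of decreasing degree, each taking the
    smallest color not used by an already-colored neighbor.
    """
    color = {}
    for node in sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True):
        used = {color[nb] for nb in adjacency[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    groups = {}
    for node, c in color.items():
        groups.setdefault(c, []).append(node)
    return [sorted(groups[c]) for c in sorted(groups)]


# Hypothetical toy network: a cycle on four nodes.
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
independent_sets = greedy_independent_sets(adjacency)
```

Because no two nodes in a set are neighbors, and a Hopfield-style update of a node depends only on its neighbors' states, updating a whole set at once gives the same result as updating its nodes one after another in any order. That is why such a partition is compatible with asynchronous dynamics.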
Subjects
Algorithms , Computer Graphics , Gene Regulatory Networks , Animals , Gene Ontology , Humans , Mice , Protein Interaction Maps/genetics , Proteins/genetics , Saccharomyces cerevisiae/genetics , Time Factors
ABSTRACT
BACKGROUND: The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of predicting gene-disease associations has been widely investigated, the related problem of gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even though no HPO term associations are known for most human genes and despite the increasing application of the HPO to relevant medical problems. Moreover, most of the methods proposed in the literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. RESULTS: We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, which consists of a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. CONCLUSIONS: Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve the HPO term predictions of virtually any flat learning method and make them consistent. The implementation of the proposed methods is available as an R package from the CRAN repository.
Subjects
Algorithms , Biological Ontologies , Area Under Curve , Genetic Association Studies , Humans , Molecular Sequence Annotation , Phenotype , ROC Curve
ABSTRACT
Liver metastases are associated with poor response to current pharmacological treatments, including immunotherapy. We describe a lentiviral vector (LV) platform to selectively engineer liver macrophages, including Kupffer cells and tumor-associated macrophages (TAMs), to deliver type I interferon (IFNα) to liver metastases. Gene-based IFNα delivery delays the growth of colorectal and pancreatic ductal adenocarcinoma liver metastases in mice. Response to IFNα is associated with TAM immune activation, enhanced MHC-II-restricted antigen presentation and reduced exhaustion of CD8+ T cells. Conversely, increased IL-10 signaling, expansion of Eomes CD4+ T cells, a cell type displaying features of type I regulatory T (Tr1) cells, and CTLA-4 expression are associated with resistance to therapy. Targeting regulatory T cell functions by combinatorial CTLA-4 immune checkpoint blockade and IFNα LV delivery expands tumor-reactive T cells, attaining complete response in most mice. These findings support a promising therapeutic strategy with feasible translation to patients with unmet medical need.
Subjects
CD8-Positive T-Lymphocytes , Liver Neoplasms , Humans , Mice , Animals , CTLA-4 Antigen/metabolism , Tumor Microenvironment/genetics , Macrophages , Liver Neoplasms/genetics , Liver Neoplasms/therapy , Liver Neoplasms/pathology
ABSTRACT
High metabolic flexibility is pivotal for the persistence and therapy resistance of acute myeloid leukemia (AML). In 20-30% of AML patients, activating mutations of FLT3, specifically FLT3-ITD, are key therapeutic targets. Here, we investigated the influence of FLT3-ITD on AML metabolism. Nuclear Magnetic Resonance (NMR) profiling showed enhanced reshuffling of pyruvate towards the tricarboxylic acid (TCA) cycle, suggesting an increased activity of the pyruvate dehydrogenase complex (PDC). Consistently, FLT3-ITD-positive cells expressed high levels of PDP1, an activator of the PDC. Combining endogenous tagging of PDP1 with genome-wide CRISPR screens revealed that FLT3-ITD induces PDP1 expression through the RAS signaling axis. PDP1 knockdown resulted in reduced cellular respiration thereby impairing the proliferation of only FLT3-ITD cells. These cells continued to depend on PDP1, even in hypoxic conditions, and unlike FLT3-ITD-negative cells, they exhibited a rapid, PDP1-dependent revival of their respiratory capacity during reoxygenation. Moreover, we show that PDP1 modifies the response to FLT3 inhibition. Upon incubation with the FLT3 tyrosine kinase inhibitor quizartinib (AC220), PDP1 persisted or was upregulated, resulting in a further shift of glucose/pyruvate metabolism towards the TCA cycle. Overexpression of PDP1 enhanced, while PDP1 depletion diminished AC220 resistance in cell lines and peripheral blasts from an AC220-resistant AML patient in vivo. In conclusion, FLT3-ITD assures the expression of PDP1, a pivotal metabolic regulator that enhances oxidative glucose metabolism and drug resistance. Hence, PDP1 emerges as a potentially targetable vulnerability in the management of AML.
Subjects
Leukemia, Myeloid, Acute , Protein Kinase Inhibitors , Humans , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/therapeutic use , Mutation , Drug Resistance, Neoplasm , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/pathology , Pyruvates/therapeutic use , fms-Like Tyrosine Kinase 3/genetics , fms-Like Tyrosine Kinase 3/therapeutic use
ABSTRACT
The visual exploration and analysis of biomolecular networks is of paramount importance for identifying hidden and complex interaction patterns among proteins. Although many tools have been proposed for this task, they mainly focus on the query and visualization of a single protein with its neighborhood. The global exploration of the entire network and the interpretation of its underlying structure remain difficult, mainly due to the excessively large size of biomolecular networks. In this paper we propose a novel multi-resolution representation and exploration approach that exploits hierarchical community detection algorithms to identify communities occurring in biomolecular networks. The proposed graphical rendering combines two types of nodes (proteins and communities) and three types of edges (protein-protein, community-community, protein-community), and displays communities at different resolutions, allowing the user to interactively zoom in and out across different levels of the hierarchy. Links among communities are shown in terms of relationships and functional correlations among the biomolecules they contain. The user can also combine this form of navigation with a vertex-centric visualization to identify the communities holding a target biomolecule. Since communities gather limited-size groups of correlated proteins, the visualization and exploration of complex and large networks becomes feasible on off-the-shelf computers. The proposed graphical exploration strategies have been implemented and integrated in UNIPred-Web, a web application that we recently introduced to combine the UNIPred algorithm, which addresses both integration and protein function prediction in an imbalance-aware fashion, with an easy-to-use vertex-centric exploration of the integrated network. The tool has been substantially revised in several respects, including the core prediction algorithm. Several tests on networks of different size and connectivity have been conducted to demonstrate the potential of our methodology; moreover, enrichment analyses have been performed to assess the biological meaningfulness of the detected communities. Finally, a CoV-human network has been embedded in the system, and a corresponding case study is presented, including the visualization and the prediction of human host proteins that potentially interact with SARS-CoV-2 proteins.