ABSTRACT
Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.
Subject(s)
Genomic Structural Variation/genetics , Genomics/methods , Neoplasms/genetics , Chromosome Inversion/genetics , Chromothripsis , DNA Copy Number Variations/genetics , Gene Rearrangement/genetics , Genome, Human/genetics , Humans , Mutation/genetics , Whole Genome Sequencing/methods
ABSTRACT
Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small-molecule drugs. AI technologies, such as generative chemistry, machine learning, and multiproperty optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.
Subject(s)
Artificial Intelligence , Physicians , Animals , Humans , Reproducibility of Results , Drug Discovery , Technology
ABSTRACT
Using a graph representation of RNA structures, we have studied the ensembles of secondary and tertiary graphs of two sets of RNAs with Monte Carlo simulations. The first consisted of 91 target ribozyme and riboswitch sequences of moderate lengths (< 150 nt) having a variety of secondary structures, H-type pseudoknots, and kissing-loop interactions. The second set consisted of 71 more diverse sequences across many RNA families. Using a simple empirical energy model for tertiary interactions and only sequence information for each target as input, the simulations examined how tertiary interactions impact the statistical mechanics of the fold ensembles. The results show that the graphs proliferate enormously when tertiary interactions are possible, producing an entropic driving force for the ensemble to access folds having tertiary structures even though they are overall energetically unfavorable in the energy model. For each of the targets in the two test sets, we assessed the quality of the model and the simulations by examining how well the simulated structures were able to predict the native fold and compared the results to fold predictions from ViennaRNA. Our model generated good or excellent predictions for a large majority of the targets. Overall, this method was able to produce predictions of comparable quality to ViennaRNA, but it outperformed ViennaRNA for structures with H-type pseudoknots. The results suggest that while tertiary interactions are predicated on real-space contacts, their impacts on the folded structure of RNA can be captured by graph-space information for sequences of moderate lengths, using a simple tertiary energy model for the loops, the base pairs, and the base stacks.
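The entropic argument in the abstract above can be illustrated with a two-class Boltzmann estimate. This is a minimal sketch, not the authors' energy model: it assumes every graph within a class has the same energy, and the function name, graph counts, and energy penalty are hypothetical values chosen only for illustration.

```python
import math

def tertiary_fraction(n_secondary, n_tertiary, delta_e_kt):
    """Boltzmann population of the tertiary-containing class of graphs.

    n_secondary, n_tertiary: number of graphs in each class (hypothetical counts).
    delta_e_kt: energy penalty of a tertiary graph relative to a
                secondary-only graph, in units of kT (positive = unfavorable).
    """
    w_secondary = n_secondary * 1.0                  # reference energy set to 0
    w_tertiary = n_tertiary * math.exp(-delta_e_kt)  # degeneracy x Boltzmann factor
    return w_tertiary / (w_secondary + w_tertiary)

# Illustrative numbers only: 1000x more tertiary graphs, each 2 kT less stable.
print(round(tertiary_fraction(1, 1000, 2.0), 3))
```

Under these assumed numbers the tertiary class dominates the ensemble despite its energy penalty, because the number of available graphs outweighs the Boltzmann factor.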
ABSTRACT
Drug repurposing has emerged as an effective and efficient strategy to identify new treatments for a variety of diseases. One of the most effective approaches for discovering potential new drug candidates involves the use of Knowledge Graphs (KGs). This review comprehensively explores some of the most prominent KGs, detailing their structure, data sources, and how they facilitate the repurposing of drugs. In addition to KGs, this paper delves into various artificial intelligence techniques that enhance the process of drug repurposing. These methods not only accelerate the identification of viable drug candidates but also improve the precision of predictions by leveraging complex datasets and advanced algorithms. Furthermore, the importance of explainability in drug repurposing is emphasized. Explainability methods are crucial as they provide insights into the reasoning behind AI-generated predictions, thereby increasing the trustworthiness and transparency of the repurposing process. We discuss several techniques that can be employed to validate these predictions, ensuring that they are both reliable and understandable.
Subject(s)
Drug Repositioning , Drug Repositioning/methods , Humans , Algorithms , Artificial Intelligence , Databases, Factual , Computational Biology/methods
ABSTRACT
Graph learning models have received increasing attention in the computational analysis of single-cell RNA sequencing (scRNA-seq) data. Compared with conventional deep neural networks, graph neural networks and language models have exhibited superior performance by extracting graph-structured data from raw gene count matrices. Established deep neural network-based clustering approaches generally focus on temporal expression patterns while ignoring inherent interactions at the gene level as well as the cell level, which could be regarded as spatial dynamics in single-cell data. Both gene-gene and cell-cell interactions can boost the performance of cell type detection under the framework of multi-view modeling. In this study, spatiotemporal embeddings and cell graphs are extracted to capture spatial dynamics at the molecular level. To enhance the accuracy of cell type detection, this study proposes the scHybridBERT architecture to conduct multi-view modeling of scRNA-seq data using the extracted spatiotemporal patterns. In scHybridBERT, graph learning models are employed to deal with cell graphs and the Performer model employs spatiotemporal embeddings. Experimental results on benchmark scRNA-seq datasets indicate that the proposed scHybridBERT method enhances the accuracy of single-cell clustering tasks by integrating spatiotemporal embeddings and cell graphs.
Subject(s)
Benchmarking , Gene Expression Regulation , Cell Communication , Cluster Analysis , Learning
ABSTRACT
Recently, graph neural network (GNN)-based algorithms were proposed to solve a variety of combinatorial optimization problems [M. J. Schuetz, J. K. Brubaker, H. G. Katzgraber, Nat. Mach. Intell. 4, 367-377 (2022)]. GNN was tested in particular on randomly generated instances of these problems. The publication [M. J. Schuetz, J. K. Brubaker, H. G. Katzgraber, Nat. Mach. Intell. 4, 367-377 (2022)] stirred a debate about whether the GNN-based method was adequately benchmarked against the best prior methods. In particular, critical commentaries [M. C. Angelini, F. Ricci-Tersenghi, Nat. Mach. Intell. 5, 29-31 (2023)] and [S. Boettcher, Nat. Mach. Intell. 5, 24-25 (2023)] point out that a simple greedy algorithm performs better than the GNN. We do not intend to discuss the merits of arguments and counterarguments in these papers. Rather, in this note, we establish a fundamental limitation for running GNN on random instances considered in these references, for a broad range of choices of GNN architecture. Specifically, these barriers hold when the depth of GNN does not scale with graph size (we note that depth 2 was used in experiments in [M. J. Schuetz, J. K. Brubaker, H. G. Katzgraber, Nat. Mach. Intell. 4, 367-377 (2022)]), and importantly, these barriers hold regardless of any other parameters of GNN architecture. These limitations arise from the presence of the overlap gap property (OGP) phase transition, which is a barrier for many algorithms, including importantly local algorithms, of which GNN is an example. At the same time, some algorithms known prior to the introduction of GNN provide the best results for these problems up to the OGP phase transition. This leaves very little space for GNN to outperform the known algorithms, and based on this, we side with the conclusions made in [M. C. Angelini, F. Ricci-Tersenghi, Nat. Mach. Intell. 5, 29-31 (2023)] and [S. Boettcher, Nat. Mach. Intell. 5, 24-25 (2023)].
ABSTRACT
The frameshifting RNA element (FSE) in coronaviruses (CoVs) regulates the programmed -1 ribosomal frameshift (-1 PRF) mechanism common to many viruses. The FSE is of particular interest as a promising drug target. Its associated pseudoknot or stem-loop structure is thought to play a large role in frameshifting and thus viral protein production. To investigate the FSE structural evolution, we use our graph theory-based methods for representing RNA secondary structures in the RNA-As-Graphs (RAG) framework to calculate conformational landscapes of viral FSEs with increasing sequence lengths for 10 representative Alpha-CoVs and 13 representative Beta-CoVs. By following length-dependent conformational changes, we show that FSE sequences encode many possible competing stems, which in turn favor certain FSE topologies, including a variety of pseudoknots, stem loops, and junctions. We explain alternative competing stems and topological FSE changes by recurring patterns of mutations. At the same time, FSE topology robustness can be understood by shifted stems within different sequence contexts and base pair coevolution. We further propose that the topology changes reflected by length-dependent conformations contribute to tuning the frameshifting efficiency. Our work provides tools to analyze virus sequence/structure correlations, explains how sequence and FSE structure have evolved for CoVs, and provides insights into potential mutations for therapeutic applications against a broad spectrum of CoV FSEs by targeting key sequence/structural transitions.
Subject(s)
Coronavirus Infections , Coronavirus , Humans , RNA, Viral/metabolism , Coronavirus/genetics , Coronavirus/metabolism , Base Sequence , Nucleic Acid Conformation , Frameshifting, Ribosomal/genetics , Coronavirus Infections/genetics
ABSTRACT
As population genetic data increase in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with the existing data structures used by many population genetic inference methodologies, such as convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetics inference tasks, with accuracies roughly matching or even exceeding those of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.
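To make the idea of learning directly from tree topology and node data concrete, the following is a minimal sketch of one graph-convolution step on a single toy tree. It assumes the widely used symmetric-normalization propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W); the abstract does not state which GCN variant the authors implemented, and the tree, node features, and weights here are hypothetical.

```python
import math

def gcn_layer(edges, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(features)
    # Adjacency with self-loops, treating parent-child edges as undirected.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for parent, child in edges:
        adj[parent][child] = adj[child][parent] = 1.0
    deg = [sum(row) for row in adj]
    out = []
    for i in range(n):
        # Symmetrically normalized aggregation of neighbor features.
        agg = [
            sum(adj[i][j] / math.sqrt(deg[i] * deg[j]) * features[j][k]
                for j in range(n))
            for k in range(len(features[0]))
        ]
        # Linear map followed by ReLU.
        out.append([
            max(0.0, sum(agg[k] * weights[k][m] for k in range(len(agg))))
            for m in range(len(weights[0]))
        ])
    return out

# Toy local tree: node 4 is the root, node 3 is internal, nodes 0-2 are samples.
edges = [(4, 3), (4, 2), (3, 0), (3, 1)]
features = [[0.0], [0.0], [0.0], [0.5], [1.0]]  # e.g. node times (hypothetical)
weights = [[1.0, -1.0]]                         # 1 input feature -> 2 hidden units
hidden = gcn_layer(edges, features, weights)
```

Each node's hidden vector mixes its own feature with those of its parent and children, so stacking such layers lets information flow along the genealogy without first converting it to an alignment matrix.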
ABSTRACT
In recent years, knowledge graphs (KGs) have gained a great deal of popularity as a tool for storing relationships between entities and for performing higher-level reasoning. KGs in biomedicine and clinical practice aim to provide an elegant solution for diagnosing and treating complex diseases more efficiently and flexibly. Here, we provide a systematic review to characterize the state of the art of KGs in the area of complex disease research. We cover the following topics: (1) knowledge sources, (2) entity extraction methods, (3) relation extraction methods, and (4) the application of KGs in complex diseases. As a result, we offer a complete picture of the domain. Finally, we discuss the challenges in the field by identifying gaps and opportunities for further research and propose potential research directions of KGs for complex disease diagnosis and treatment.
Subject(s)
Pattern Recognition, Automated
ABSTRACT
How cooperation emerges in human societies is both an evolutionary enigma and a practical problem with tangible implications for societal health. Population structure has long been recognized as a catalyst for cooperation because local interactions facilitate reciprocity. Analysis of population structure typically assumes bidirectional social interactions. But human social interactions are often unidirectional (where one individual has the opportunity to contribute altruistically to another, but not conversely) as the result of organizational hierarchies, social stratification, popularity effects, and endogenous mechanisms of network growth. Here we expand the theory of cooperation in structured populations to account for both uni- and bidirectional social interactions. Even though unidirectional interactions remove the opportunity for reciprocity, we find that cooperation can nonetheless be favored in directed social networks and that cooperation is provably maximized for networks with an intermediate proportion of unidirectional interactions, as observed in many empirical settings. We also identify two simple structural motifs that allow efficient modification of interaction directions to promote cooperation by orders of magnitude. We discuss how our results relate to the concepts of generalized and indirect reciprocity.
Subject(s)
Cooperative Behavior , Models, Theoretical , Social Interaction , Social Networking , Humans
ABSTRACT
BACKGROUND: Accurate prediction of compound-protein interaction (CPI) plays a crucial role in drug discovery. Existing data-driven methods aim to learn from the chemical structures of compounds and proteins yet ignore conceptual knowledge, that is, the interrelationships among the fundamental elements in the biomedical knowledge graph (KG). Knowledge graphs provide a comprehensive view of entities and relationships beyond individual compounds and proteins. They encompass a wealth of information such as pathways, diseases, and biological processes, offering a richer context for CPI prediction. This contextual information can be used to identify indirect interactions, infer potential relationships, and improve prediction accuracy. In real-world applications, the prevalence of knowledge-missing compounds and proteins is a critical barrier to injecting knowledge into data-driven models. RESULTS: Here, we propose BEACON, a data and knowledge dual-driven framework that bridges chemical structure and conceptual knowledge for CPI prediction. BEACON learns consistent representations by maximizing the mutual information between chemical structure and conceptual knowledge and predicts the missing representations by minimizing their conditional entropy. BEACON achieves state-of-the-art performance on multiple datasets compared to competing methods, notably with 5.1% and 6.6% performance gains on the BIOSNAP and DrugBank datasets, respectively. Moreover, BEACON is the only approach capable of effectively predicting knowledge representations for knowledge-lacking compounds and proteins. CONCLUSIONS: Overall, our work provides a general approach for directly injecting conceptual knowledge to enhance the performance of CPI prediction.
Subject(s)
Proteins , Proteins/chemistry , Proteins/metabolism , Drug Discovery/methods , Computational Biology/methods
ABSTRACT
BACKGROUND: Accurately identifying drug-target affinity (DTA) plays a pivotal role in drug screening, design, and repurposing in the pharmaceutical industry. It not only reduces the time, labor, and economic costs associated with biological experiments but also expedites the drug development process. However, achieving the desired level of computational accuracy for DTA identification methods remains a significant challenge. RESULTS: We proposed a novel multi-view-based graph deep model known as MvGraphDTA for DTA prediction. MvGraphDTA employed a graph convolutional network (GCN) to extract the structural features from original graphs of drugs and targets, respectively. It went a step further by constructing line graphs with edges as vertices based on original graphs of drugs and targets. GCN was also used to extract the relationship features within their line graphs. To enhance the complementarity between the extracted features from original graphs and line graphs, MvGraphDTA fused the extracted multi-view features of drugs and targets, respectively. Finally, these fused features were concatenated and passed through a fully connected (FC) network to predict DTA. CONCLUSIONS: During the experiments, we performed data augmentation on all the training sets used. Experimental results showed that MvGraphDTA outperformed the competitive state-of-the-art methods on benchmark datasets for DTA prediction. Additionally, we evaluated the universality and generalization performance of MvGraphDTA on additional datasets. Experimental outcomes revealed that MvGraphDTA exhibited good universality and generalization capability, making it a reliable tool for drug-target interaction prediction.
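The "line graphs with edges as vertices" step described above can be sketched as follows. This is a generic line-graph construction under the standard graph-theoretic definition, not MvGraphDTA's own code; the function name and the toy bond list are hypothetical.

```python
from itertools import combinations

def line_graph(edges):
    """Return (vertices, links) of the line graph of an undirected graph.

    Each original edge becomes a vertex; two such vertices are linked
    when the original edges share an endpoint.
    """
    vertices = [tuple(sorted(e)) for e in edges]
    links = [
        (a, b)
        for a, b in combinations(range(len(vertices)), 2)
        if set(vertices[a]) & set(vertices[b])
    ]
    return vertices, links

# Toy molecule graph: a central atom 1 bonded to atoms 0, 2, and 3
# (atom indices are hypothetical).
bonds = [(0, 1), (1, 2), (1, 3)]
lg_vertices, lg_edges = line_graph(bonds)
```

In the toy example all three bonds meet at atom 1, so the line graph is a triangle: a convolution on it propagates information between adjacent bonds rather than between adjacent atoms.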
Subject(s)
Deep Learning , Drug Discovery/methods , Computational Biology/methods , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism
ABSTRACT
BACKGROUND: The construction of a pangenome graph is a fundamental task in pangenomics. A natural theoretical question is how to formalize the computational problem of building an optimal pangenome graph, making explicit the underlying optimization criterion and the set of feasible solutions. Current approaches build a pangenome graph with heuristics, without stating any explicit optimization criterion. Thus it is unclear how a specific optimization criterion affects the graph topology and downstream analyses, such as read mapping and variant calling. RESULTS: In this paper, by leveraging the notion of maximal block in a Multiple Sequence Alignment (MSA), we reframe the pangenome graph construction problem as an exact cover problem on blocks called Minimum Weighted Block Cover (MWBC). Then we propose an Integer Linear Programming (ILP) formulation for the MWBC problem that allows us to study the most natural objective functions for building a graph. We provide an implementation of the ILP approach for solving the MWBC and we evaluate it on SARS-CoV-2 complete genomes, showing how different objective functions lead to pangenome graphs that have different properties, hinting that the specific downstream task can drive the graph construction phase. CONCLUSION: We show that a customized construction of a pangenome graph based on selecting objective functions has a direct impact on the resulting graphs. In particular, our formalization of the MWBC problem, based on finding an optimal subset of blocks covering an MSA, paves the way to novel practical approaches to graph representations of an MSA where the user can guide the construction.
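The effect of the objective function on an exact cover by blocks can be seen in a toy brute-force search. This sketch is not the authors' ILP formulation: it treats a block simply as a set of alignment cells (ignoring the requirement that rows in a block share the same substring), and the function name, candidate blocks, and weight functions are hypothetical.

```python
from itertools import combinations

def min_weight_exact_cover(cells, blocks, weight):
    """Brute-force the cheapest set of blocks covering every cell exactly once.

    cells:  set of (row, column) positions of a tiny alignment.
    blocks: list of frozensets of cells (candidate blocks).
    weight: function mapping a block to its cost (the objective under study).
    """
    best, best_cost = None, float("inf")
    for r in range(1, len(blocks) + 1):
        for chosen in combinations(blocks, r):
            covered = [c for b in chosen for c in b]
            # Exact cover: no cell missing and none covered twice.
            if len(covered) == len(cells) and set(covered) == cells:
                cost = sum(weight(b) for b in chosen)
                if cost < best_cost:
                    best, best_cost = chosen, cost
    return best, best_cost

# Toy 2-row x 2-column alignment (hypothetical data).
cells = {(0, 0), (0, 1), (1, 0), (1, 1)}
blocks = [
    frozenset(cells),                # one block spanning everything
    frozenset({(0, 0), (1, 0)}),     # column 0, both rows
    frozenset({(0, 1), (1, 1)}),     # column 1, both rows
    frozenset({(0, 0), (0, 1)}),     # row 0
    frozenset({(1, 0), (1, 1)}),     # row 1
]
fewest_blocks = min_weight_exact_cover(cells, blocks, lambda b: 1)
small_blocks = min_weight_exact_cover(cells, blocks, lambda b: len(b) ** 2)
```

Counting blocks selects the single large block, whereas a weight that penalizes block size selects two smaller blocks: the same alignment yields different covers, and hence different graphs, under different objectives. Exhaustive search is only feasible for tiny inputs, which is why an ILP solver is the practical route.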
Subject(s)
SARS-CoV-2 , SARS-CoV-2/genetics , Algorithms , Genome, Viral , Sequence Alignment/methods , Genomics/methods , COVID-19/virology , Programming, Linear , Humans
ABSTRACT
The three-dimensional structure of the human genome has been proven to have a significant functional impact on gene expression. The high-order spatial chromatin is organised first by looping mediated by multiple protein factors, and then it is further formed into larger structures of topologically associated domains (TADs) or chromatin contact domains (CCDs), followed by A/B compartments and finally the chromosomal territories (CTs). The genetic variation observed in the human population influences the multi-scale structures, posing a question regarding the functional impact of structural variants reflected by the variability of gene expression patterns. The current methods of evaluating the functional effect include eQTL analysis, which uses statistical testing of the influence of variants on spatially close genes. Rarely, non-coding DNA sequence changes are evaluated by their impact on the biomolecular interaction network (BIN) reflecting the cellular interactome that can be analysed by the classical graph-theoretic algorithms. Therefore, in the second part of the review, we introduce the concept of BIN, i.e. a meta-network model of the complete molecular interactome developed by integrating various biological networks. The BIN meta-network model includes DNA-protein binding by the plethora of protein factors as well as chromatin interactions, therefore allowing connection of genomics with the downstream biomolecular processes present in a cell. As an illustration, we scrutinise the chromatin interactions mediated by the CTCF protein detected in a ChIA-PET experiment in the human lymphoblastoid cell line GM12878. In the corresponding BIN meta-network the DNA spatial proximity is represented as a graph model, combined with the Proteins-Interaction Network (PIN) of the human proteome using the Gene Association Network (GAN). Furthermore, we enriched the BIN with the signalling and metabolic pathways and Gene Ontology (GO) terms to assert its functional context.
Finally, we mapped the Single Nucleotide Polymorphisms (SNPs) from GWAS studies and identified the chromatin mutational hot-spots associated with a significant enrichment of SNPs related to autoimmune diseases. Afterwards, we mapped Structural Variants (SVs) from healthy individuals of the 1000 Genomes Project and identified an interesting example of a missing protein complex associated with protein Q6GYQ0 due to a deletion on chromosome 14. Such an analysis using the meta-network BIN model is therefore helpful in evaluating the influence of genetic variation on the spatial organisation of the genome and its functional effect in a cell.
Subject(s)
Chromatin/metabolism , Genome, Human/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Protein Interaction Maps/genetics , Humans
ABSTRACT
Most proteins exert their functions by interacting with other proteins, making the identification of protein-protein interactions (PPI) crucial for understanding biological activities, pathological mechanisms, and clinical therapies. Developing effective and reliable computational methods for predicting PPI can significantly reduce the time and labor associated with traditional biological experiments. However, accurately identifying the specific categories of protein-protein interactions and improving the prediction accuracy of the computational methods remain dual challenges. To tackle these challenges, we proposed a novel graph neural network method called GNNGL-PPI for multi-category prediction of PPI based on global graphs and local subgraphs. GNNGL-PPI consisted of two main components: using Graph Isomorphism Network (GIN) to extract global graph features from the PPI network graph, and employing GIN As Kernel (GIN-AK) to extract local subgraph features from the subgraphs of protein vertices. Additionally, considering the imbalanced distribution of samples in each category within the benchmark datasets, we introduced an Asymmetric Loss (ASL) function to further enhance the predictive performance of the method. Through evaluations on six benchmark test sets formed by three different dataset partitioning algorithms (Random, BFS, DFS), GNNGL-PPI outperformed the state-of-the-art multi-category prediction methods of PPI, as measured by the comprehensive performance evaluation metric F1-measure. Furthermore, interpretability analysis confirmed the effectiveness of GNNGL-PPI as a reliable multi-category prediction method for predicting protein-protein interactions.
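How an asymmetric loss handles imbalanced categories can be sketched for a single prediction. This assumes the commonly published form of ASL for multi-label classification (separate focusing exponents for positives and negatives, plus a probability margin on negatives); the abstract does not give the exact variant or hyperparameters used in GNNGL-PPI, so the function name and default values below are assumptions.

```python
import math

def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Asymmetric loss for one predicted probability p and binary label y.

    gamma_pos, gamma_neg, margin: assumed hyperparameters, not taken from
    the abstract.
    """
    if y == 1:
        # Positive term: focal-style down-weighting of easy positives.
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negative term: shift the probability down by the margin so that very
    # easy negatives (p <= margin) contribute nothing to the loss.
    p_m = max(p - margin, 0.0)
    return -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))

# A confidently wrong negative is penalized far more than an easy one.
hard_negative = asymmetric_loss(0.9, 0)
easy_negative = asymmetric_loss(0.2, 0)
```

Because the abundant easy negatives are strongly down-weighted while positives keep their full gradient, rare interaction categories are not swamped during training.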
Subject(s)
Algorithms , Computational Biology , Neural Networks, Computer , Protein Interaction Mapping , Protein Interaction Mapping/methods , Computational Biology/methods , Protein Interaction Maps , Humans , Proteins/metabolism
ABSTRACT
How do we construct our causal directed acyclic graphs (DAGs), for example for life-course modeling and analysis? In this commentary, I review how the data-driven construction of causal DAGs (causal discovery) has evolved, what promises it holds, and what limitations or caveats must be considered. I find that expert- or theory-driven model-building might benefit from some more checking against the data and that causal discovery could bring new ideas to old theories.
Subject(s)
Causality , Humans , Models, Statistical , Data Interpretation, Statistical , Epidemiologic Methods
ABSTRACT
Deterministic variables are variables that are functionally determined by one or more parent variables. They commonly arise when a variable has been functionally created from one or more parent variables, as with derived variables, and in compositional data, where the 'whole' variable is determined from its 'parts'. This article introduces how deterministic variables may be depicted within directed acyclic graphs (DAGs) to help with identifying and interpreting causal effects involving derived variables and/or compositional data. We propose a two-step approach in which all variables are initially considered, and a choice is made whether to focus on the deterministic variable or its determining parents. Depicting deterministic variables within DAGs brings several benefits. It is easier to identify and avoid misinterpreting tautological associations, i.e., self-fulfilling associations between deterministic variables and their parents, or between sibling variables with shared parents. In compositional data, it is easier to understand the consequences of conditioning on the 'whole' variable, and correctly identify total and relative causal effects. For derived variables, it encourages greater consideration of the target estimand and greater scrutiny of the consistency and exchangeability assumptions. DAGs with deterministic variables are a useful aid for planning and interpreting analyses involving derived variables and/or compositional data.
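The tautological association between a deterministic 'whole' and one of its 'parts' can be demonstrated with a small simulation. This is an illustrative sketch with simulated data, not an analysis from the article: the two parts are drawn independently with equal variance, so any correlation between the whole and a part arises purely from how the whole is constructed.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
# Two causally unrelated 'parts' with equal variance (simulated, hypothetical).
part_a = [rng.gauss(0, 1) for _ in range(20000)]
part_b = [rng.gauss(0, 1) for _ in range(20000)]
whole = [a + b for a, b in zip(part_a, part_b)]  # deterministic 'whole'

r_parts = pearson(part_a, part_b)  # close to 0: the parts are independent
r_whole = pearson(whole, part_a)   # close to 1/sqrt(2): induced by construction
```

The correlation of roughly 0.71 between the whole and its part carries no causal information; it follows algebraically from the definition of the whole, which is the kind of self-fulfilling association a DAG with deterministic nodes makes visible.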
ABSTRACT
When analyzing a selected sample from a general population, selection bias can arise relative to the causal average treatment effect (ATE) for the general population, and also relative to the ATE for the selected sample itself. We provide simple graphical rules that indicate: (1) whether a selected-sample analysis will be unbiased for each ATE; (2) whether adjusting for certain covariates could eliminate selection bias. The rules can easily be checked in a standard single-world intervention graph. When the treatment could affect selection, a third estimand of potential scientific interest is the "net treatment difference", namely the net change in outcomes that would occur for the selected sample if all members of the general population were treated versus not treated, including any effects of the treatment on which individuals are in the selected sample. We provide graphical rules for this estimand as well. We decompose bias in a selected-sample analysis relative to the general-population ATE into: (1) "internal bias" relative to the net treatment difference; (2) "net-external bias", a discrepancy between the net treatment difference and the general-population ATE. Each bias can be assessed unambiguously via a distinct graphical rule, providing new conceptual insight into the mechanisms by which certain causal structures produce selection bias.
ABSTRACT
BACKGROUND: With growing interest in causal inference and machine learning among epidemiologists, there is increasing discussion of causal discovery algorithms for guiding covariate selection. We present a case study of novice application of causal discovery tools and attempt to validate the results against a well-established causal relationship. METHODS: As a case study, we attempted causal discovery of relationships relevant to the effect of adherence on mortality in the placebo arm of the Coronary Drug Project (CDP) dataset. We used four algorithms available as existing software implementations and varied several model inputs. RESULTS: We identified 15 adjustment sets from 17 model parameterizations. When applied to a baseline covariate adjustment analysis, these 15 adjustment sets returned effect estimates with similar magnitude and direction of bias as prior published results. When using methods to control for time-varying confounding, there was generally more residual bias than with expert-selected adjustment sets. CONCLUSION: Although causal discovery algorithms can perform on par with expert knowledge, we do not recommend novice use of causal discovery without the input of experts in causal discovery. Expert support is recommended to aid in choosing the algorithm, selecting input parameters, assessing underlying assumptions, and finalizing selection of the adjustment variables.
ABSTRACT
Killer immunoglobulin-like receptor (KIR) and KIR-ligand (KIRL) interactions play an important role in natural killer cell-mediated effects after haematopoietic stem cell transplantation (HCT). Previous work has shown that accounting for known KIR-KIRL interactions may identify donors with optimal NK cell-mediated alloreactivity in the adult transplant setting. Paediatric acute leukaemia patients were retrospectively analysed, and KIR-KIRL combinations and maximal inhibitory KIR ligand (IM-KIR) scores were determined. Clinical outcomes were examined using a series of graphs depicting clinical events and endpoints. The graph methodology demonstrated that prognostic variables significant in the occurrence of specific clinical endpoints remained significant for relevant downstream events. KIR-KIRL combinations were significantly predictive of reduced grade 3-4 aGVHD likelihood in patients transplanted with increased inhibitory KIR gene content and IM-KIR scores of 5. Improvements were also observed in associated outcomes for both ALL and AML patients, including relapse-free survival, GRFS, and overall survival. This study demonstrates that NK cell KIR-HLA interactions may be relevant to the paediatric acute leukaemia transplant setting. Reduction in aGVHD suggests KIR effects may extend beyond NK cells. Moving forward, clinical trials utilizing donors with a higher iKIR should be considered for URD HCT in paediatric recipients with acute leukaemia to optimize clinical outcomes.