ABSTRACT
B cell epitope prediction methods are separated into linear sequence-based predictors and conformational epitope predictors that typically use the measured or predicted protein structure. Most linear predictors rely on translating the sequence into biologically based representations and applying machine learning to these representations. We present CALIBER (Conformational And LInear B cell Epitopes pRediction) and show that a bidirectional long short-term memory with random projection produces a more accurate prediction (test set AUC = 0.789) than all current linear methods. The same predictor, when combined with an Evolutionary Scale Modeling-2 projection, also improves on the state of the art in conformational epitopes (AUC = 0.776). Including the graph of the 3D distances between residues did not increase the prediction accuracy. However, the long-range sequence information was essential for high accuracy. While the same model structure was applicable to linear and conformational epitopes, separate training was required for each. Combining the two slightly increased the linear accuracy (AUC 0.775 versus 0.768) and reduced the conformational accuracy (AUC = 0.769).
Subject(s)
Epitopes, B-Lymphocyte; Epitopes, B-Lymphocyte/chemistry; Molecular Conformation
ABSTRACT
The Hardy-Weinberg equilibrium (HWE) assumption is essential to many population genetics models. Multiple tests have been developed to test its applicability to observed genotypes. Current methods are divided into exact tests, applicable to small populations and a small number of alleles, and approximate goodness-of-fit tests. Existing tests cannot handle ambiguous typing in multi-allelic loci. We present a novel exact test, the Unambiguous Multi Allelic Test (UMAT), which is not limited by the number of alleles or the population size and is based on a perturbative approach around the current observations. We show its accuracy in detecting deviation from HWE. We then propose an additional model to handle ambiguous typing, using either sampling into UMAT or a goodness-of-fit test with a variance estimate that takes ambiguity into account, named the Asymptotic Statistical Test with Ambiguity (ASTA). We show the accuracy of ASTA and the possibility of detecting the source of deviation from HWE. We apply these tests to the HLA loci to reproduce multiple previously reported deviations from HWE and to detect a large number of new ones.
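For orientation, the classical goodness-of-fit approach that the abstract contrasts with exact tests can be sketched as below. This is not UMAT or ASTA; it is a minimal textbook chi-square statistic for unambiguous, unphased genotypes at one multi-allelic locus. The function name `hwe_chi_square` and the input layout are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Chi-square goodness-of-fit statistic for HWE at a single locus.

    genotypes: list of (allele, allele) tuples, unphased and unambiguous
    (hypothetical input format). Returns (statistic, degrees_of_freedom).
    """
    n = len(genotypes)
    # Allele frequencies estimated from the 2n observed allele copies.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Observed genotype counts, with each genotype stored in sorted order.
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    stat = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # HWE expectation: p_a^2 for homozygotes, 2 * p_a * p_b otherwise.
        expected = n * freqs[a] * freqs[b] * (1 if a == b else 2)
        stat += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(freqs)
    # k(k+1)/2 genotype classes, minus 1, minus (k-1) estimated frequencies.
    dof = k * (k - 1) // 2
    return stat, dof
```

A sample of only heterozygotes, such as `hwe_chi_square([("A", "B")] * 4)`, gives a clearly nonzero statistic, while a sample in exact HWE proportions gives zero. With many rare alleles the expected counts become small and the chi-square approximation degrades, which is the regime where exact tests are usually preferred.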
Subject(s)
Genetics, Population; Humans; Polymorphism, Genetic; Models, Genetic; Alleles; Gene Frequency; Genotype; Genetic Loci
ABSTRACT
HLA haplotypes were found to be associated with increased risk for viral infections or disease severity in various diseases, including SARS. Several genetic variants are associated with COVID-19 severity. Studies have proposed associations, based on a very small sample and a large number of tested HLA alleles, but no clear association between HLA and COVID-19 incidence or severity has been reported. We conducted a large-scale HLA analysis of Israeli individuals who tested positive for SARS-CoV-2 infection by PCR. Overall, 72,912 individuals with known HLA haplotypes were included in the study, of whom 6413 (8.8%) were found to have SARS-CoV-2 by PCR. A total of 20,937 subjects were of Ashkenazi origin (at least 2/4 grandparents). One hundred eighty-one patients (2.8% of the infected) were hospitalized due to the disease. None of the 66 most common HLA loci (within the five HLA subgroups: A, B, C, DQB1, DRB1) was found to be associated with SARS-CoV-2 infection or hospitalization in the general Israeli population. Similarly, no association was detected in the Ashkenazi Jewish subset. Moreover, no association was found between heterozygosity in any of the HLA loci and either infection or hospitalization. We conclude that HLA haplotypes are not a major risk/protecting factor among the Israeli population for SARS-CoV-2 infection or severity. Our results suggest that if any HLA association exists with the disease it is very weak, and of limited effect on the pandemic.
Subject(s)
COVID-19/genetics; Genotype; HLA Antigens/genetics; SARS-CoV-2/physiology; Adult; Alleles; COVID-19/epidemiology; COVID-19/immunology; Case-Control Studies; Cohort Studies; Ethnicity; Female; Genetic Association Studies; Haplotypes; Histocompatibility Testing; Hospitalization/statistics & numerical data; Humans; Israel/epidemiology; Male; Retrospective Studies; Severity of Illness Index; Social Class
ABSTRACT
The outcome of hematopoietic stem cell transplantation (HSCT) and organ transplantation is strongly affected by the matching of the HLA alleles of the donor and the recipient. However, donors and sometimes recipients are often typed at low resolution, with some alleles either missing or ambiguous. Thus, imputation methods are required to detect the most probable high-resolution HLA haplotypes consistent with a typing. Such imputation algorithms require predefined haplotype frequencies. As such, the phasing of the typing is required for both imputation and frequency generation. We have developed a new approach to HLA haplotype and genotype imputation, in which all candidate phases of a typing are first enumerated explicitly, and then the ambiguity within each phase is resolved. This ambiguity is resolved through a graph structure of all partial haplotypes and the haplotypes consistent with them. This phasing approach was used to produce an imputation algorithm, GRIMM (Graph Imputation and Matching). GRIMM was then extended to combine information from multiple races, producing MR-GRIMM (Multi-Race GRIMM). When family information is available, the phasing of each family member can be restricted by the others. We propose GRAMM (GRaph-bAsed faMily iMputation) to phase alleles in family pedigree HLA typing data and in mother-cord blood unit pairs. Finally, we combined MR-GRIMM with an expectation-maximization (EM) algorithm that estimates haplotype frequencies while sharing information between races, producing MR-GRIMME (MR-GRIMM EM). We have shown that these algorithms naturally combine information between races and family members. The accuracy of each of these algorithms is significantly better than that of the current comparable methods. MR-GRIMM leads to high accuracy in matching predictions. GRAMM imputes family members better than either MR-GRIMM or any existing algorithm and has practically no phasing errors. MR-GRIMME obtains a higher likelihood than existing algorithms. MR-GRIMM, MR-GRIMME, and GRAMM are available as servers or as stand-alone versions on GitHub and PyPI, as detailed in the appropriate sections.
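The first step described above, listing every candidate phase of an unphased typing, can be illustrated with a brute-force sketch. This is not the GRIMM graph implementation; it only shows what "candidate phases" means for a fully resolved multi-locus genotype. The function name `candidate_phases` and the input layout (one allele pair per locus) are assumptions for illustration.

```python
from itertools import product


def candidate_phases(genotype):
    """Enumerate all haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele1, allele2) tuples, one per locus
    (hypothetical input format). Returns a sorted list of unordered
    haplotype pairs, each haplotype being a tuple of alleles.
    """
    first, rest = genotype[0], genotype[1:]
    phases = set()
    # Fix the orientation of the first locus and try both orientations
    # of every other locus.
    for flips in product((False, True), repeat=len(rest)):
        hap1, hap2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        # Sorting the pair removes duplicates caused by homozygous loci.
        phases.add(tuple(sorted((tuple(hap1), tuple(hap2)))))
    return sorted(phases)
```

With h heterozygous loci the number of distinct phases is 2^(h-1), so a five-locus typing that is heterozygous everywhere already has 16 candidate phases before any allele ambiguity is considered.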
Subject(s)
Algorithms; HLA Antigens; Haplotypes; Histocompatibility Testing; Tissue Donors; Humans; HLA Antigens/genetics; Histocompatibility Testing/methods; Alleles; Software; Gene Frequency; Family; Genotype; Hematopoietic Stem Cell Transplantation
ABSTRACT
Background: Pre-clinical development and in-human trials of 'off-the-shelf' immune effector cell therapy (IECT) are burgeoning. IECT offers many potential advantages over autologous products. The relevant HLA matching criteria vary from product to product and depend on the strategies employed to reduce the risk of GvHD or to improve allo-IEC persistence, as warranted by different clinical indications, disease kinetics, on-target/off-tumor effects, and therapeutic cell type (T cell subtype, NK, etc.). Objective: The optimal choice of candidate donors to maximize target patient population coverage and minimize cost and redundant effort in creating off-the-shelf IECT product banks is still an open problem. We propose here a solution to this problem, and test whether it would be more expensive to recruit additional donors or to prevent class I or class II HLA expression through gene editing. Study design: We formulated an optimal coverage problem and combined it with a graph-based algorithm to solve the donor selection problem under different clinically plausible scenarios (having different HLA matching priorities). We then compared the efficiency of different optimization algorithms (a greedy solution, a linear programming (LP) solution, and integer linear programming (ILP)), as well as random donor selection (average of 5 random trials), to show that the optimization can be performed at the entire population level. Results: The average additional population coverage per donor decreases with the number of donors and varies with the scenario. The greedy, LP, and ILP algorithms consistently achieve the optimal coverage with far fewer donors than the random choice. In all cases, the number of randomly selected donors required to achieve a desired coverage increases with increasing population size. However, when optimal donors are selected, the number of donors required may counter-intuitively decrease with increasing population size. When comparing recruiting more donors with gene editing, the latter was generally more expensive. When choosing donors and patients from different populations, the number of random donors required drastically increases, while the number of optimal donors does not change. Random donors fail to cover populations different from their original populations, while a small number of optimal donors from one population can cover a different population. Discussion: Graph-based coverage optimization algorithms can flexibly handle various HLA matching criteria and accommodate additional information, such as KIR genotype, when such information becomes routinely available. These algorithms offer a more efficient way to develop off-the-shelf IECT product banks than random donor selection and offer some possibility of improved transparency and standardization in product design.
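The greedy strategy mentioned in the study design can be sketched as a standard maximum-coverage heuristic. This is a generic illustration, not the authors' graph-based code: the function name `greedy_donor_selection` and the `coverage` mapping (donor identifier to the set of patients that donor is considered to match under some HLA criterion) are assumptions.

```python
def greedy_donor_selection(coverage, n_donors):
    """Greedy maximum-coverage selection of donors.

    coverage: dict mapping a donor id to the set of patient ids the donor
    matches (hypothetical input; the matching rule is decided upstream).
    n_donors: maximum number of donors to select.
    Returns (chosen_donor_ids, covered_patient_ids).
    """
    covered = set()
    chosen = []
    remaining = dict(coverage)
    for _ in range(n_donors):
        if not remaining:
            break
        # Pick the donor who adds the most patients not yet covered.
        best = max(remaining, key=lambda d: len(remaining[d] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # No remaining donor adds coverage.
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered
```

Because each step takes the largest remaining gain, the gain per added donor can only shrink, which is consistent with the decreasing additional coverage per donor reported in the results. Replacing patient sets with population-frequency weights would turn this into the weighted variant; that extension is not shown here.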
Subject(s)
Hematopoietic Stem Cell Transplantation; Neoplasms; Humans; Tissue Donors
ABSTRACT
Allogeneic hematopoietic cell transplantation (HCT) is a curative therapy for hematologic disorders and often requires human leukocyte antigen (HLA)-matched donors. Donor registries have recruited donors using evolving HLA genotyping technologies. This necessitates in silico ambiguity resolution and statistical imputation based on haplotype frequencies estimated from donor data stratified by self-identified race and ethnicity (SIRE). However, SIRE has limited genetic validity and presents a challenge for individuals with unknown or mixed SIRE. We present MR-GRIMM (Multi-Race Graph IMputation and Matching), which simultaneously imputes the race/ethnic category and the HLA genotype using a SIRE-based prior. Additionally, we propose a novel method to impute HLA typing inconsistent with current haplotype frequencies. The performance of MR-GRIMM was validated using a dataset of 170,000 donor-recipient pairs. MR-GRIMM has, on average, a 20% lower matching error (1-AUC) than single-race imputation. The recall (sensitivity) of the race/ethnic category imputation from HLA was measured by comparing the imputed donor race with the donor-provided SIRE. Accuracies of 0.74 and 0.55 were obtained for the prediction of 5 broad and 21 detailed US population groups, respectively. The operational implementation of this algorithm in a registry search could help improve match predictions and access to HLA-matched donors.
Subject(s)
HLA Antigens; Hematopoietic Stem Cell Transplantation; Humans; Genotype; HLA Antigens/genetics; Haplotypes; Tissue Donors; Hematopoietic Stem Cell Transplantation/methods; Histocompatibility Antigens Class II/genetics; Histocompatibility Testing/methods; Registries
ABSTRACT
Recently, haplo-identical transplantation with multiple HLA mismatches has become a viable option for stem cell transplants. Haplotype sharing detection requires the imputation of donor and recipient. We show that even in high-resolution typing when all alleles are known, there is a 15% error rate in haplotype phasing, and even more in low-resolution typings. Similarly, in related donors, the parents' haplotypes should be imputed to determine what haplotype each child inherited. We propose graph-based family imputation (GRAMM) to phase alleles in family pedigree HLA typing data, and in mother-cord blood unit pairs. We show that GRAMM has practically no phasing errors when pedigree data are available. We apply GRAMM to simulations with different typing resolutions as well as paired cord-mother typings, and show very high phasing accuracy, and improved allele imputation accuracy. We use GRAMM to detect recombination events and show that the rate of falsely detected recombination events (false-positive rate) in simulations is very low. We then apply recombination detection to typed families to estimate the recombination rate in Israeli and Australian population datasets. The estimated recombination rate has an upper bound of 10%-20% per family (1%-4% per individual).
Subject(s)
Tissue Donors; Child; Humans; Alleles; Australia; Haplotypes
ABSTRACT
BACKGROUND AND OBJECTIVES: To explore the clinical characteristics and HLA associations of patients with anti-leucine-rich glioma-inactivated 1 encephalitis (LGI1E) from a large single center in Israel. Anti-LGI1E is the most commonly diagnosed antibody-associated encephalitic syndrome in adults. Recent studies of various populations reveal significant associations with specific HLA genes. We examined the clinical characteristics and HLA associations of a cohort of Israeli patients. METHODS: Seventeen consecutive patients with anti-LGI1E diagnosed at Tel Aviv Medical Center between 2011 and 2018 were included. HLA typing was performed using next-generation sequencing at the tissue typing laboratory of Sheba Medical Center and compared with data from the Ezer Mizion Bone Marrow Donor Registry, containing over 1,000,000 samples. RESULTS: Our cohort displayed a male predominance and a median age at onset in the 7th decade, as previously reported. The most common presenting symptom was seizures. Notably, paroxysmal dizziness spells were significantly more common than previously reported (35%), whereas faciobrachial dystonic seizures were found in only 23%. HLA analysis revealed overrepresentation of DRB1*07:01 (OR: 3.18, CI: 20.9, p < 1e-5) and DRB1*04:02 (OR: 3.8, CI: 20.1, p < 1e-5), as well as of the DQ allele DQB1*02:02 (OR: 2.8, CI: 14.2, p < 0.0001), as previously reported. A novel overrepresentation observed among our patients was of the DQB1*03:02 allele (OR: 2.3, CI: 6.9, p < 0.008). In addition, we found DR-DQ associations among patients with anti-LGI1E that showed complete or near-complete linkage disequilibrium (LD). By applying LD analysis to an unprecedentedly large control cohort, we were able to show that although in the general population DQB1*03:02 is not fully associated with DRB1*04:02, in the patient population both alleles are always coupled, suggesting that the DRB1*04:02 association is primary to disease predisposition. 
In silico predictions performed for the overrepresented DQ alleles reveal them to be strong binders of LGI1-derived peptides, similarly to the overrepresented DR alleles. These predictions suggest a possible correlation between the peptide binding sites of paired DR-DQ alleles. DISCUSSION: Our cohort presents distinct immune characteristics, with substantially higher overrepresentation of DRB1*04:02 and slightly lower overrepresentation of DRB1*07:01 compared with previous reports, implying differences between populations. DQ-DR interactions found in our cohort may shed additional light on the complex role of immunogenetics in the pathogenesis of anti-LGI1E, implying a possible relevance of certain DQ alleles and DR-DQ interactions.
Subject(s)
Encephalitis; HLA-DQ Antigens; Adult; Humans; Male; Female; HLA-DQ Antigens/genetics; HLA-DQ beta-Chains/genetics; Gene Frequency; HLA-DRB1 Chains/genetics; Seizures
ABSTRACT
A large number of association studies have related donor characteristics to survival after bone marrow transplantation, for leukemia in general and specifically for acute myeloid leukemia (AML) patients. However, population-based differences often do not hold at the single transplant level. We test whether transplantation outcomes can be predicted at the single-patient level and whether such predictions can be used to better choose donors. The analysis was performed on a mixture of different diseases or with AML only, and with either patient and donor information or donor information only. We analyzed 3671 8-of-8 HLA-matched AML donor-recipient pairs and tested whether the outcome, including 1-year total and event-free survival, can be predicted from patient and donor-related factors. We used multiple machine learning and survival analysis methods. The best method is a fully connected neural network. Multiple outcomes can be predicted, with area under the specificity-sensitivity curve (AUC) values between 0.54 and 0.67 for the different outcomes. The patient age has a strong impact on prediction. However, for a given patient, when only donor or transplant information is used, limited prediction accuracy of 0.54 to 0.56 AUC for event-free survival and survival is obtained. Graft-versus-host disease and rejection after 1 year have slightly higher AUC values of around 0.59, whereas the relapse prediction accuracy was random. All donors' characteristics have a limited influence on the quality of hematopoietic stem cell transplantation for fully matched donors. Many factors with a population effect on survival have a very limited effect when combined with all other factors in a single-donor predictive model.
Subject(s)
Hematopoietic Stem Cell Transplantation; Leukemia, Myeloid, Acute; Humans; Unrelated Donors; Siblings; Retrospective Studies; Hematopoietic Stem Cell Transplantation/methods; Leukemia, Myeloid, Acute/therapy
ABSTRACT
The "heterozygote advantage" hypothesis has been postulated regarding the role of human leukocyte antigen (HLA) in non-Hodgkin lymphoma (NHL), where homozygous loci are associated with an increased risk of disease. In this retrospective study, we analyzed the HLA homozygosity of 3789 patients with aplastic anemia (AA), acute lymphocytic leukemia (ALL), acute myeloblastic leukemia (AML), chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), myelodysplastic syndrome (MDS), multiple myeloma (MM), and non-Hodgkin lymphoma (NHL) at the HLA-A, B, C, DRB1, and DQB1 loci compared to 169,964 normal controls. HLA homozygosity at one or more loci was only associated with an increased risk in NHL patients (OR = 1.28, 95% CI [1.09, 1.50], p = 0.002). This association was not seen in any of the other hematologic diseases. Homozygosity at HLA-A alone, HLA-B + C only, and HLA-DRB1 + DQB1 only was also significantly associated with NHL. Finally, we observed a 17% increased risk of NHL with each additional homozygous locus (OR per locus = 1.17, 95% CI [1.08, 1.25], p trend = 2.4 × 10⁻⁵). These results suggest that reduction of HLA diversity could predispose individuals to an increased risk of developing NHL.
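The abstract does not state how its odds ratios and confidence intervals were computed. As a reading aid, the sketch below shows the common unadjusted estimate from a 2×2 table with a Wald interval on the log scale; the function name `odds_ratio_ci`, the cell layout, and the counts in the usage note are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Hypothetical 2x2 layout (all counts must be positive):
      a = cases with the exposure (e.g. homozygous at one or more loci)
      b = cases without the exposure
      c = controls with the exposure
      d = controls without the exposure
    z = 1.96 corresponds to a 95% interval.
    Returns (odds_ratio, lower_bound, upper_bound).
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

For example, `odds_ratio_ci(20, 80, 10, 90)` uses made-up counts and returns an odds ratio of 2.25 with an interval that brackets it. An interval that excludes 1, as in the NHL result quoted above, corresponds to a significant association at the chosen level.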
Subject(s)
Lymphoma, Non-Hodgkin; HLA-A Antigens; Histocompatibility Antigens; Histocompatibility Antigens Class I; Histocompatibility Antigens Class II; Humans; Lymphoma, Non-Hodgkin/genetics; Retrospective Studies
ABSTRACT
HLA haplotype frequencies are estimated from ambiguous unphased HLA genotyping data using Expectation-Maximization (EM) algorithms. Current population genetics methods require independent EM frequency estimates for each population, and assume that each population is in Hardy-Weinberg Equilibrium (HWE). The HWE assumption of EM has thus far resulted in the exclusion of individuals from mixed or unknown ethnic backgrounds from reference datasets. Multi-region populations are currently poorly served by stem cell donor registry HLA imputation and matching implementations due to the inability of such algorithms to incorporate admixture into their population genetics models. To address this unmet need, we have expanded the imputation component of our GRaph IMputation and Matching (GRIMM) framework, where imputation becomes the expectation step in an iterative EM algorithm. Our novel multi-region EM implementation considers region as a Bayesian prior, enabling integration of HLA information from multiple single-region population groups, and for the first time including individuals with ambiguous or mixed ethnic backgrounds. We show that our multi-region EM produces much higher likelihood values and better haplotype recovery as measured by Kullback-Leibler divergence than all evaluated EM implementations when tested on real datasets of US donor registry HLA typings as well as simulated multi-region datasets of ambiguous HLA typings.
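The single-population EM baseline that this abstract extends can be sketched as follows. It is a minimal textbook version under HWE, without the multi-region prior or the GRIMM graph, and it assumes fully resolved alleles at every locus; the names `phases` and `em_haplotype_frequencies` and the input layout are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product


def phases(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype."""
    first, rest = genotype[0], genotype[1:]
    out = set()
    for flips in product((False, True), repeat=len(rest)):
        h1, h2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:
                a, b = b, a
            h1.append(a)
            h2.append(b)
        out.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(out)


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies from unphased genotypes by EM.

    genotypes: list of genotypes, each a list of (allele, allele) tuples
    per locus (hypothetical input format). Assumes HWE in one population.
    """
    candidates = [phases(g) for g in genotypes]
    haps = {h for ps in candidates for pair in ps for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n_copies = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for ps in candidates:
            # E-step: posterior weight of each phase under current freqs.
            w = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in ps]
            total = sum(w)
            for (h1, h2), wi in zip(ps, w):
                counts[h1] += wi / total
                counts[h2] += wi / total
        # M-step: frequencies are the expected haplotype counts, normalized.
        freq = {h: counts[h] / n_copies for h in haps}
    return freq
```

In this sketch, imputation (weighting the candidate phases of each individual) is exactly the expectation step, which mirrors the role the abstract assigns to imputation. A multi-region variant would additionally weight each phase by a per-individual prior over regions; that part is not shown.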