Results 1 - 20 of 145

1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38487845

ABSTRACT

B cell epitope prediction methods are separated into linear sequence-based predictors and conformational epitope predictors that typically use the measured or predicted protein structure. Most linear predictors rely on translating the sequence into biologically based representations and applying machine learning to these representations. We here present CALIBER ('Conformational And LInear B cell Epitopes pRediction') and show that a bidirectional long short-term memory network with random projection produces a more accurate prediction (test set AUC = 0.789) than all current linear methods. The same predictor, when combined with an Evolutionary Scale Modeling-2 projection, also improves on the state of the art for conformational epitopes (AUC = 0.776). Including the graph of 3D distances between residues did not increase prediction accuracy; however, long-range sequence information was essential for high accuracy. While the same model structure was applicable to both linear and conformational epitopes, each required separate training. Combining the two slightly increased linear accuracy (AUC 0.775 versus 0.768) and reduced conformational accuracy (AUC = 0.769).


Subject(s)
Epitopes, B-Lymphocyte , Epitopes, B-Lymphocyte/chemistry , Molecular Conformation
2.
Gut ; 72(5): 918-928, 2023 05.
Article in English | MEDLINE | ID: mdl-36627187

ABSTRACT

OBJECTIVE: Gestational diabetes mellitus (GDM) is a condition in which women without diabetes are diagnosed with glucose intolerance during pregnancy, typically in the second or third trimester. Early diagnosis, along with a better understanding of its pathophysiology during the first trimester of pregnancy, may be effective in reducing its incidence and the associated short-term and long-term morbidities. DESIGN: We comprehensively profiled the gut microbiome, metabolome, inflammatory cytokines, nutrition and clinical records of 394 women during the first trimester of pregnancy, before GDM diagnosis. We then built a model that can predict GDM onset weeks before it is typically diagnosed. Further, we demonstrated the role of the microbiome in disease using faecal microbiota transplant (FMT) of first trimester samples from pregnant women across three unique cohorts. RESULTS: In women who later developed GDM, we found elevated levels of proinflammatory cytokines, decreased faecal short-chain fatty acids and an altered microbiome. Using FMT experiments, we next confirmed that differences in GDM-associated microbial composition during the first trimester drove inflammation and insulin resistance more than 10 weeks prior to GDM diagnosis. Following these observations, we used a machine learning approach to predict GDM with high accuracy based on first trimester clinical, microbial and inflammatory markers. CONCLUSION: GDM onset can be identified in the first trimester of pregnancy, earlier than currently accepted. Furthermore, the gut microbiome appears to play a role in inflammation-induced GDM pathogenesis, with interleukin-6 as a potential contributor. Potential GDM markers, including microbiota, can serve as targets for early diagnostics and therapeutic intervention leading to prevention.


Subject(s)
Diabetes, Gestational , Microbiota , Pregnancy , Female , Humans , Diabetes, Gestational/diagnosis , Pregnancy Trimester, Third , Inflammation , Cytokines
3.
J Theor Biol ; 534: 110972, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34856201

ABSTRACT

An accurate estimate of the number of infected individuals in any disease is crucial. Current estimates are mainly based on the fraction of positive samples or the total number of positive samples. However, both methods are biased and sensitive to the sampling depth. We here propose an alternative method that uses the attributes of each sample to estimate the change in the total number of positive patients in the population. We present a Bayesian estimator that assumes a condition- and time-dependent probability of being positive, together with a mixed implicit-explicit solution for the probability that a person with conditions i at time t is positive. We use this estimate to predict the total probability of being positive on a given day t. We show that the resulting estimates are smooth and insensitive to the properties of the samples. Moreover, they are a better predictor of future mortality.
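The last step of the abstract, turning per-condition probabilities into a population-wide probability for day t, can be illustrated with a minimal sketch. It assumes a Beta-Binomial posterior per condition group and known population shares; the `Stratum` structure, the Beta(1, 1) prior and all names are hypothetical, and this is not the paper's mixed implicit-explicit solution.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Test results for one condition group i on a single day t (hypothetical structure)."""
    tested: int              # number of samples taken from group i on day t
    positive: int            # number of those samples that were positive
    population_share: float  # fraction of the total population in group i


def posterior_mean(positive: int, tested: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of P(positive | i, t) under an assumed Beta(alpha, beta) prior."""
    return (positive + alpha) / (tested + alpha + beta)


def population_positive_probability(strata: list[Stratum]) -> float:
    """Weight each group's posterior estimate by its share of the population.

    Unlike the raw fraction of positive samples, this does not depend on how many
    samples happened to be drawn from each group on that day.
    """
    total_share = sum(s.population_share for s in strata)
    weighted = sum(s.population_share * posterior_mean(s.positive, s.tested) for s in strata)
    return weighted / total_share


# A heavily sampled high-risk group and a lightly sampled low-risk group.
strata = [
    Stratum(tested=900, positive=180, population_share=0.1),
    Stratum(tested=100, positive=4, population_share=0.9),
]
estimate = population_positive_probability(strata)
```

Here the raw fraction of positive samples (184 of 1000) overstates the population rate because the high-risk group is oversampled; weighting by population share removes that dependence on sampling depth.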


Subject(s)
Bayes Theorem , Bias , Forecasting , Humans , Probability , Selection Bias
4.
PLoS Comput Biol ; 17(7): e1009225, 2021 07.
Article in English | MEDLINE | ID: mdl-34310600

ABSTRACT

Recent advances in T cell receptor (TCR) repertoire sequencing allow the characterization of repertoire properties, as well as the frequency and sharing of specific TCRs. However, there is no efficient measure of the local density of a given TCR. TCRs are often described by their complementarity-determining region 3 (CDR3) sequences, their V/J usage, or their clone size. We here show that the local repertoire density can be estimated using a combined representation of these components through distance-conserving autoencoders and kernel density estimates (KDE). We present ELATE (Encoder-based LocAl Tcr dEnsity) and show that the resulting density of a sample can be used as a novel measure to study repertoire properties. The cross-density between two samples can be used as a similarity matrix to fully characterize samples from the same host. Finally, the same projection, combined with machine learning algorithms, can be used to predict TCR-peptide binding through the local density of known TCRs binding a specific target.
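The density and cross-density ideas can be sketched with a Gaussian KDE over already-embedded TCRs. This is a minimal illustration that assumes the autoencoder has already mapped each TCR to a fixed-length vector; the arrays stand in for those embeddings, and the function names and the bandwidth value are hypothetical, not ELATE's code.

```python
import numpy as np


def local_density(query: np.ndarray, reference: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Gaussian kernel density of each query vector among the reference vectors.

    query: (m, d) array of embedded TCRs; reference: (n, d) array of embedded TCRs.
    Returns an (m,) array of density estimates.
    """
    d = reference.shape[1]
    # Squared Euclidean distance between every query and every reference vector: shape (m, n).
    sq_dist = ((query[:, None, :] - reference[None, :, :]) ** 2).sum(axis=-1)
    norm = (2 * np.pi * bandwidth**2) ** (d / 2)
    return np.exp(-sq_dist / (2 * bandwidth**2)).sum(axis=1) / (reference.shape[0] * norm)


def cross_density(sample_a: np.ndarray, sample_b: np.ndarray, bandwidth: float = 1.0) -> float:
    """Mean density of sample A's TCRs under sample B's KDE (one entry of a similarity matrix)."""
    return float(local_density(sample_a, sample_b, bandwidth).mean())
```

Filling a matrix with `cross_density(samples[i], samples[j])` for every pair of repertoires gives a similarity matrix of the kind the abstract describes; samples whose embeddings overlap score higher than samples in distant regions of the embedding space.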


Subject(s)
Receptors, Antigen, T-Cell/classification , Receptors, Antigen, T-Cell/genetics , Software , Algorithms , Amino Acid Sequence , Complementarity Determining Regions/classification , Complementarity Determining Regions/genetics , Computational Biology , Databases, Genetic , Gene Rearrangement, alpha-Chain T-Cell Antigen Receptor , Gene Rearrangement, beta-Chain T-Cell Antigen Receptor , Humans , Immunoglobulin Variable Region/genetics , Machine Learning , Receptors, Antigen, T-Cell, alpha-beta/classification , Receptors, Antigen, T-Cell, alpha-beta/genetics
5.
Proc Natl Acad Sci U S A ; 116(28): 14098-14104, 2019 07 09.
Article in English | MEDLINE | ID: mdl-31227609

ABSTRACT

The major histocompatibility complex (MHC) is a central component of the vertebrate immune system and hence evolves in the regime of a host-pathogen evolutionary race. The MHC is associated with quantitative traits that directly affect fitness and are subject to selection pressure. The evolution of haplotypes at the human leukocyte antigen (HLA) locus of the MHC is generally thought to be governed by selection for increased diversity, manifested in overdominance and/or negative frequency-dependent selection (FDS). However, a model combining purifying selection on haplotypes and balancing selection on alleles has recently been proposed. We compare the predictions of several population dynamics models of haplotype frequency evolution to the distributions derived from HLA typings of 6.59 million donors in the National Marrow Donor Program registry. We show that models combining a multiplicative fitness function, extremely high haplotype discovery rates, and exponential fitness decay over time produce the best fit to the data for most of the analyzed populations. In contrast, overdominance is not supported, and population substructure does not explain the observed haplotype frequencies. Furthermore, there is no evidence of negative FDS. Thus, multiplicative fitness, rapid haplotype discovery, and rapid fitness decay appear to be the major factors shaping the HLA haplotype frequency distribution in the human population.


Subject(s)
Evolution, Molecular , Major Histocompatibility Complex/genetics , Physical Fitness , Selection, Genetic , Alleles , Female , Genetic Variation/genetics , Genetics, Population , HLA Antigens/genetics , HLA Antigens/immunology , Haplotypes/genetics , Haplotypes/immunology , Humans , Major Histocompatibility Complex/immunology , Male , Phenotype , Polymorphism, Genetic , Tissue Donors
6.
J Clin Immunol ; 41(6): 1154-1161, 2021 08.
Article in English | MEDLINE | ID: mdl-34050837

ABSTRACT

HLA haplotypes have been found to be associated with increased risk for viral infections or disease severity in various diseases, including SARS. Several genetic variants are associated with COVID-19 severity. Some studies have proposed associations based on very small samples and large numbers of tested HLA alleles, but no clear association between HLA and COVID-19 incidence or severity has been reported. We conducted a large-scale HLA analysis of Israeli individuals who tested positive for SARS-CoV-2 infection by PCR. Overall, 72,912 individuals with known HLA haplotypes were included in the study, of whom 6413 (8.8%) were found to have SARS-CoV-2 by PCR. A total of 20,937 subjects were of Ashkenazi origin (at least two of four grandparents). One hundred eighty-one patients (2.8% of the infected) were hospitalized due to the disease. None of the 66 most common HLA loci (within the five HLA subgroups: A, B, C, DQB1, DRB1) was found to be associated with SARS-CoV-2 infection or hospitalization in the general Israeli population. Similarly, no association was detected in the Ashkenazi Jewish subset. Moreover, no association was found between heterozygosity in any of the HLA loci and either infection or hospitalization. We conclude that HLA haplotypes are not a major risk or protective factor for SARS-CoV-2 infection or severity in the Israeli population. Our results suggest that if any association between HLA and the disease exists, it is very weak and of limited effect on the pandemic.


Subject(s)
COVID-19/genetics , Genotype , HLA Antigens/genetics , SARS-CoV-2/physiology , Adult , Alleles , COVID-19/epidemiology , COVID-19/immunology , Case-Control Studies , Cohort Studies , Ethnicity , Female , Genetic Association Studies , Haplotypes , Histocompatibility Testing , Hospitalization/statistics & numerical data , Humans , Israel/epidemiology , Male , Retrospective Studies , Severity of Illness Index , Social Class
7.
Immunogenetics ; 73(2): 163-173, 2021 04.
Article in English | MEDLINE | ID: mdl-33475766

ABSTRACT

Restoration of T cell repertoire diversity after allogeneic bone marrow transplantation (allo-BMT) is crucial for immune recovery. T cell diversity is produced by rearrangements of germline V(D)J gene segments of the T cell receptor (TCR) α and β chains, and by selection induced by binding of TCRs to MHC-peptide complexes. Multiple measures have been proposed for this diversity. We here focus on the V-gene usage and the CDR3 sequences of the beta chain. We compared multiple T cell repertoires to follow T cell repertoire changes post-allo-BMT in HLA-matched related donor and recipient pairs. Our analyses of the differences between donor and recipient complementarity-determining region 3 (CDR3) beta composition and V-gene profile show that the CDR3 sequence composition does not change during restoration, implying its dependence on the HLA typing. In contrast, V-gene usage followed a time-dependent pattern, initially following the donor profile and then shifting back to the recipient's profile. The final long-term repertoire was more similar to the recipient's original repertoire than to the donor's; some recipients converged within months, while others took multiple years. Based on these analyses, we propose that donor-recipient V-gene distribution differences may serve as clinical biomarkers for monitoring immune recovery.


Subject(s)
Bone Marrow Transplantation , Complementarity Determining Regions/genetics , Genes, T-Cell Receptor beta/genetics , T-Lymphocytes/immunology , Adult , Female , Gene Rearrangement, beta-Chain T-Cell Antigen Receptor , Histocompatibility Testing , Humans , Male , Middle Aged , Receptors, Antigen, T-Cell, alpha-beta/genetics , Tissue Donors , Transplantation, Homologous
8.
Gut ; 69(3): 473-486, 2020 03.
Article in English | MEDLINE | ID: mdl-31167813

ABSTRACT

OBJECTIVE: Pregnancy may affect the disease course of IBD. Both pregnancy and IBD are associated with altered immunology and intestinal microbiology. However, the extent to which immunological and microbial profiles are affected by pregnancy in patients with IBD remains unclear. DESIGN: Faecal and serum samples were collected from 46 patients with IBD (31 with Crohn's disease (CD) and 15 with UC) and 179 healthy controls during the first, second and third trimesters of pregnancy, and additionally before pregnancy and postpartum for patients with IBD. Peripheral blood cytokine profiles were determined by ELISA, and microbiome analysis was performed by sequencing the V4 region of the bacterial 16S rRNA gene. RESULTS: Proinflammatory serum cytokine levels in patients with IBD decrease significantly on conception. Reduced interleukin (IL)-10 and IL-5 levels but increased IL-8 and interferon (IFN)γ levels compared with healthy controls were seen throughout pregnancy, but cytokine patterns remained stable during gestation. Microbial diversity in pregnant patients with IBD was reduced compared with that in healthy women, and significant differences existed between patients with UC and CD in early pregnancy. However, these microbial differences were no longer present during middle and late pregnancy. Dynamic modelling showed considerable interaction between cytokine and microbial composition. CONCLUSION: Serum proinflammatory cytokine levels markedly improve on conception in pregnant patients with IBD, and the intestinal microbiome diversity of patients with IBD normalises during middle and late pregnancy. We thus conclude that pregnancy is safe and even potentially beneficial for patients with IBD.


Subject(s)
Colitis, Ulcerative/blood , Colitis, Ulcerative/microbiology , Crohn Disease/blood , Crohn Disease/microbiology , Cytokines/blood , Gastrointestinal Microbiome , Pregnancy Complications/blood , Pregnancy Complications/microbiology , Adult , Case-Control Studies , Colitis, Ulcerative/immunology , Crohn Disease/immunology , Feces/microbiology , Female , Humans , Interferon-gamma/blood , Interleukin-10/blood , Interleukin-5/blood , Interleukin-8/blood , Pregnancy , Pregnancy Complications/immunology , Pregnancy Trimesters/blood , Pregnancy Trimesters/immunology
9.
BMC Med ; 18(1): 281, 2020 10 21.
Article in English | MEDLINE | ID: mdl-33081767

ABSTRACT

BACKGROUND: Adjuvant chemotherapy induces weight gain, glucose intolerance, and hypertension in about a third of women. The mechanisms underlying these events have not been defined. This study assessed the association between the microbiome and weight gain in patients treated with adjuvant chemotherapy for breast and gynecological cancers. METHODS: Patients were recruited before starting adjuvant therapy. Weight and height were measured before treatment and 4-6 weeks after treatment completion. Weight gain was defined as an increase of 3% or more in body weight. A stool sample was collected before treatment, and 16S rRNA gene sequencing was performed. Data regarding oncological therapy, menopausal status, and antibiotic use were prospectively collected. Patients were excluded if they were treated with antibiotics during the study. Fecal transplant experiments from patients were conducted using Swiss Webster germ-free mice. RESULTS: Thirty-three patients were recruited; of them, 9 gained 3.5-10.6% of baseline weight. The pretreatment microbiome of women who gained weight following treatment was significantly different in diversity and taxonomy from that of control women. Fecal microbiota transplantation from pretreatment samples of patients who gained weight induced metabolic changes in germ-free mice compared to mice transplanted with pretreatment fecal samples from the control women. CONCLUSION: The microbiome composition is predictive of weight gain following adjuvant chemotherapy and induces adverse metabolic changes in germ-free mice, suggesting it contributes to the adverse metabolic changes seen in patients. Confirmation of these results in a larger patient cohort is warranted.


Subject(s)
Breast Neoplasms/complications , Chemotherapy, Adjuvant/adverse effects , Gastrointestinal Microbiome/genetics , Genital Neoplasms, Female/complications , Weight Gain/drug effects , Adolescent , Adult , Aged , Animals , Breast Neoplasms/drug therapy , Cohort Studies , Female , Genital Neoplasms, Female/drug therapy , Humans , Mice , Middle Aged , Young Adult
10.
Bioinformatics ; 35(11): 1907-1915, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30346482

ABSTRACT

MOTIVATION: RNA viruses generate a cloud of genetic variants within each host. This cloud contains high-frequency genotypes and many rare variants. The dynamics of these variants are crucial to understanding viral evolution and the variants' effect on their host. RESULTS: We use an experimental evolution system to show that the genetic cloud surrounding the Coxsackie virus master sequence slowly, but steadily, evolves over hundreds of generations. This movement is determined by strong context-dependent mutations, where the frequency and type of mutations are affected by neighboring positions, even for silent mutations. This context-dependent mutation pattern serves as a spearhead for the viral population's movement within the adaptive landscape and affects which new dominant variants will emerge. The non-local mutation patterns affect the mutated dinucleotide distribution and eventually lead to a non-uniform dinucleotide distribution in the main viral sequence. We tested these results on other RNA viruses and reached similar conclusions. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genes, Viral , Adaptation, Physiological , Genotype , Mutation , RNA Viruses , Time Factors
11.
Bioinformatics ; 35(18): 3520-3523, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30689784

ABSTRACT

MOTIVATION: For over 10 years, allele-level HLA matching for bone marrow registries has been performed in a probabilistic context. HLA typing technologies provide ambiguous results in that they cannot distinguish among all known HLA allele sequences; therefore, registries have implemented matching algorithms that provide lists of donors and cord blood units ranked by the likelihood of allele-level matching at specific HLA loci. With the growth of registry sizes, current match algorithm implementations are unable to provide match results in real time. RESULTS: We present here a novel, computationally efficient, open source implementation of an HLA imputation and match algorithm using a graph database platform. Using graph traversal, the matching algorithm runtime is practically unaffected by registry size. This implementation generates results that agree with consensus output on a publicly available match algorithm cross-validation dataset. AVAILABILITY AND IMPLEMENTATION: The Python, Perl and Neo4j code is available at https://github.com/nmdp-bioinformatics/grimm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
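The probabilistic matching idea can be illustrated at a single locus: given imputed genotype probabilities for a patient and a donor, the allele-level match probability sums the joint probability of every pair of candidate genotypes that are identical. This is a minimal sketch assuming independent patient and donor imputations and a plain dictionary representation; it is not GRIMM's graph-database implementation, and the function names are hypothetical.

```python
from itertools import product

Genotype = tuple[str, str]


def normalize(genotype: Genotype) -> Genotype:
    """Order the two alleles so (A*02, A*01) and (A*01, A*02) compare equal."""
    return tuple(sorted(genotype))


def match_probability(patient: dict[Genotype, float], donor: dict[Genotype, float]) -> float:
    """Probability that patient and donor carry the same allele-level genotype at one locus.

    Each dict maps a candidate genotype to its imputed probability (assumed to sum to 1).
    """
    return sum(
        p_pat * p_don
        for (g_pat, p_pat), (g_don, p_don) in product(patient.items(), donor.items())
        if normalize(g_pat) == normalize(g_don)
    )


patient = {("A*01:01", "A*02:01"): 0.6, ("A*01:01", "A*03:01"): 0.4}
donor = {("A*02:01", "A*01:01"): 0.5, ("A*02:01", "A*03:01"): 0.5}
p = match_probability(patient, donor)
```

Only the first patient genotype and the first donor genotype coincide, so the match probability is 0.6 × 0.5 = 0.3. Scanning every donor this way grows with registry size, which is the cost the graph-traversal design described in the abstract is meant to avoid.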


Subject(s)
HLA Antigens/genetics , Genotype , Histocompatibility Testing , Humans , Tissue Donors
12.
Entropy (Basel) ; 22(2)2020 Feb 19.
Article in English | MEDLINE | ID: mdl-33286010

ABSTRACT

A method for estimating the Shannon differential entropy of multidimensional random variables from independent samples is described. The method is based on decomposing the distribution into a product of marginal distributions and the joint dependency, also known as the copula. The entropy of the marginals is estimated using one-dimensional methods. The entropy of the copula, which always has compact support, is estimated recursively by splitting the data along statistically dependent dimensions. The method can be applied to distributions with both compact and non-compact supports, which is essential when the support is unknown or of mixed type (differing across dimensions). In high dimensions (more than 20), numerical examples demonstrate that our method is not only more accurate but also significantly more efficient than existing approaches.
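The decomposition H(X) = sum of marginal entropies + copula entropy can be sketched in a few lines. This minimal version assumes Vasicek's m-spacing estimator as the one-dimensional method and a fixed-grid histogram on rank-transformed data for the copula; it does not implement the paper's recursive splitting, and all function names and parameters are illustrative.

```python
import numpy as np


def marginal_entropy(x: np.ndarray) -> float:
    """Vasicek m-spacing estimate of 1-D differential entropy (in nats); assumes no ties."""
    n = x.size
    m = max(1, int(round(np.sqrt(n))))
    xs = np.sort(x)
    idx = np.arange(n)
    upper = xs[np.minimum(idx + m, n - 1)]
    lower = xs[np.maximum(idx - m, 0)]
    return float(np.mean(np.log(n / (2 * m) * (upper - lower))))


def copula_entropy(data: np.ndarray, bins: int = 10) -> float:
    """Histogram estimate of the copula entropy (always <= 0) on rank-transformed data."""
    n, d = data.shape
    # Rank-transform each column into (0, 1): this maps the sample onto the empirical copula.
    u = (np.argsort(np.argsort(data, axis=0), axis=0) + 0.5) / n
    counts, _ = np.histogramdd(u, bins=bins, range=[(0.0, 1.0)] * d)
    p = counts[counts > 0] / n
    cell_volume = float(bins) ** (-d)  # the copula lives on the unit cube
    return float(-np.sum(p * np.log(p / cell_volume)))


def joint_entropy(data: np.ndarray, bins: int = 10) -> float:
    """H(X) = sum_i H(X_i) + H(copula)."""
    marginals = sum(marginal_entropy(data[:, j]) for j in range(data.shape[1]))
    return marginals + copula_entropy(data, bins)


rng = np.random.default_rng(1)
independent = rng.normal(size=(20000, 2))
correlated = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=20000)
```

For two independent standard normals the estimate should be close to log(2πe) ≈ 2.84 nats, and correlation lowers it only through the copula term. The fixed grid needs bins**d cells, which is why a scheme like the recursive splitting described in the abstract is needed beyond a few dimensions.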

14.
Immunogenetics ; 71(10): 589-604, 2019 11.
Article in English | MEDLINE | ID: mdl-31741008

ABSTRACT

The human leukocyte antigen (HLA) region is the most polymorphic region in the human genome. Anthropologists use HLA to trace population migration and evolution. However, recent admixture between populations can mask the ancestral haplotype frequency distribution. We present a statistical method, based on high-resolution HLA haplotype frequencies and a non-negative matrix factorization formalism, to resolve population admixture, and we validate it using haplotype frequencies from 56 world populations. The result is a minimal set of source components (SCs) explaining roughly 90% of the total variance in the studied admixtures. These SCs agree with the geographical distribution, phylogenies, and recent admixture events of the studied groups. With the growing population of multi-ethnic individuals, and of individuals who do not report race/ethnic information, the HLA matching process for stem cell and solid organ transplants is becoming more challenging. The presented algorithm provides a framework for decomposing highly admixed populations into SCs, which can be used to better match the rapidly growing population of multi-ethnic individuals worldwide.
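The factorization step can be sketched as follows: a (populations × haplotypes) frequency matrix is approximated by non-negative mixing weights times non-negative source components. This minimal sketch assumes Lee-Seung multiplicative updates for the Frobenius-norm objective and a hand-picked number of components; the paper's exact objective and its criterion for choosing the number of SCs are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def nmf(freqs: np.ndarray, n_components: int, n_iter: int = 500, seed: int = 0):
    """Factor a (populations x haplotypes) frequency matrix as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    Rows of H are rescaled to sum to 1 so each reads as a source-component
    haplotype distribution; W holds the corresponding mixing weights.
    """
    rng = np.random.default_rng(seed)
    n_pop, n_hap = freqs.shape
    W = rng.random((n_pop, n_components)) + 1e-3
    H = rng.random((n_components, n_hap)) + 1e-3
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        H *= (W.T @ freqs) / (W.T @ W @ H + eps)
        W *= (freqs @ H.T) / (W @ H @ H.T + eps)
    scale = H.sum(axis=1, keepdims=True)
    # Rescaling leaves the product W @ H unchanged.
    return W * scale.T, H / scale


# Toy data: five "populations" built as mixtures of two source haplotype distributions.
sources = np.array([[0.5, 0.3, 0.2, 0.0, 0.0],
                    [0.0, 0.0, 0.2, 0.3, 0.5]])
mixing = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.8, 0.2], [0.3, 0.7]])
freqs = mixing @ sources
W, H = nmf(freqs, n_components=2, n_iter=3000)
```

Because each row of H is normalized to a haplotype distribution and each input row sums to 1, the rows of W can be read approximately as admixture proportions of each population.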


Subject(s)
Ethnicity/genetics , HLA Antigens/classification , HLA Antigens/genetics , Haplotypes , Histocompatibility Testing/methods , Models, Genetic , Gene Frequency , Genotype , Histocompatibility Testing/statistics & numerical data , Humans , Linkage Disequilibrium
15.
Immunogenetics ; 70(7): 419-428, 2018 07.
Article in English | MEDLINE | ID: mdl-29492592

ABSTRACT

Epitopes presented on MHC class I molecules pass through multiple processing stages before their presentation, the main ones being proteasomal cleavage and binding to the transporter associated with antigen processing (TAP). TAP binding is a necessary stage for most, but not all, MHC-I-binding peptides. The molecular determinants of TAP-binding peptides can be experimentally estimated from binding experiments and from the properties of peptides inducing a CD8 T cell response. We here propose novel optimization formalisms that combine binding and activation experimental results to produce a classifier for TAP binding using dual-output kernel and deep learning approaches. Applying these algorithms to human and murine TAP binding yields predictors that are much more precise than current state-of-the-art methods. Moreover, the computed score is highly correlated with the observed binding energy. The new predictors show that TAP binding may be much more selective than previously assumed in humans and mice, and sensitive to the properties of most positions of the peptides. Beyond the improved precision for TAP binding, we propose that the same approach applies to most molecular binding problems where functional and binding measures are simultaneously available, and can be used to significantly improve the precision of binding prediction algorithms in general and for immune system molecules specifically.


Subject(s)
ATP-Binding Cassette Transporters/physiology , Histocompatibility Antigens Class I/immunology , ATP-Binding Cassette Transporters/classification , Algorithms , Animals , Antigen Presentation/immunology , Computer Simulation , Deep Learning , Epitopes/classification , Forecasting , Histocompatibility Antigens Class I/physiology , Humans , Membrane Transport Proteins , Peptides/immunology , Proteasome Endopeptidase Complex/metabolism
16.
Immunogenetics ; 70(5): 279-292, 2018 05.
Article in English | MEDLINE | ID: mdl-29124304

ABSTRACT

Regardless of sampling depth, accurate genotype imputation is limited in regions of high polymorphism which often have a heavy-tailed haplotype frequency distribution. Many rare haplotypes are thus unobserved. Statistical methods to improve imputation by extending reference haplotype distributions using linkage disequilibrium patterns that relate allele and haplotype frequencies have not yet been explored. In the field of unrelated stem cell transplantation, imputation of highly polymorphic human leukocyte antigen (HLA) genes has an important application in identifying the best-matched stem cell donor when searching large registries totaling over 28,000,000 donors worldwide. Despite these large registry sizes, a significant proportion of searched patients present novel HLA haplotypes. Supporting this observation, HLA population genetic models have indicated that many extant HLA haplotypes remain unobserved. The absent haplotypes are a significant cause of error in haplotype matching. We have applied a Bayesian inference methodology for extending haplotype frequency distributions, using a model where new haplotypes are created by recombination of observed alleles. Applications of this joint probability model offer significant improvement in frequency distribution estimates over the best existing alternative methods, as we illustrate using five-locus HLA frequency data from the National Marrow Donor Program registry. Transplant matching algorithms and disease association studies involving phasing and imputation of rare variants may benefit from this statistical inference framework.


Subject(s)
Algorithms , Bayes Theorem , Donor Selection , HLA Antigens/genetics , Haplotypes , Models, Statistical , Stem Cells/cytology , Genotype , Histocompatibility Testing , Humans , Polymorphism, Genetic , Registries , Tissue Donors
17.
J Autoimmun ; 90: 94-104, 2018 06.
Article in English | MEDLINE | ID: mdl-29503043

ABSTRACT

Systemic lupus erythematosus (SLE) is a complex autoimmune disease accompanied by the production of autoantibodies directed to a variety of self-proteins and nucleic acids. The genetic basis of SLE is also complex, with at least 40 susceptibility loci identified. This complexity suggests that there are a variety of SLE manifestations; nevertheless, SLE is treated clinically as a single disease. One unique SLE target is the Smith antigen (Sm), a nuclear ribonucleoprotein complex. An Sm response occurs in 25% of patients with SLE. To simplify analysis of the disease and its associated autoantibody repertoire, we focused on this subset (referred to here as "Sm positive", Sm+). We analyzed the memory B cell repertoire and identified a V region, Vκ4-1, that was significantly overrepresented in the Sm+ SLE subset. Antibodies that express Vκ4-1 are enriched in antinuclear (ANA) positive specificities and are often associated with the speckled ANA pattern that is characteristic of Sm binding. In healthy individuals, Vκ4-1 B cells are enriched in the unswitched memory population. Unswitched memory B cells resemble mouse marginal zone B cells, and this population is decreased in all SLE patients. Moreover, we found a similar decrease in healthy African American donors. African Americans have a significantly higher prevalence of SLE compared to Caucasians. Thus, a reduced unswitched memory B cell compartment may represent a new susceptibility marker for SLE.


Subject(s)
B-Lymphocytes/immunology , Black or African American , Immunoglobulin Class Switching/genetics , Immunoglobulin Variable Region/genetics , Lupus Erythematosus, Systemic/immunology , White People , Autoantibodies/blood , Autoantigens/metabolism , Disease Susceptibility , Epitopes/metabolism , Female , Genetic Markers , Humans , Immunologic Memory , Lupus Erythematosus, Systemic/epidemiology , United States/epidemiology , snRNP Core Proteins/metabolism
18.
PLoS Comput Biol ; 13(8): e1005693, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28846675

ABSTRACT

The major histocompatibility complex (MHC) contains the most polymorphic genetic system in humans, the human leukocyte antigen (HLA) genes of the adaptive immune system. High allelic diversity in HLA is argued to be maintained by balancing selection, such as negative frequency-dependent selection or heterozygote advantage. Selective pressure against immune escape by pathogens can maintain appreciable frequencies of many different HLA alleles. The selection pressures operating on combinations of HLA alleles across loci (haplotypes) have not been extensively evaluated, since the high HLA polymorphism requires very large sample sizes, which have only recently become available. We aimed to evaluate the effect of selection operating at the HLA haplotype level by analyzing HLA A~C~B~DRB1~DQB1 haplotype frequencies derived from over six million individuals genotyped by the National Marrow Donor Program registry. In contrast with alleles, HLA haplotype diversity patterns suggest purifying selection, as certain HLA allele combinations co-occur in high linkage disequilibrium. Linkage disequilibrium is positive (D'ij > 0) among frequent haplotypes and negative (D'ij < 0) among rare haplotypes. Fitting the haplotype frequency distribution to several population dynamics models, we found that the best fit was obtained when significant positive frequency-dependent selection (FDS) was incorporated. Finally, the Ewens-Watterson test of homozygosity showed excess homozygosity for five-locus haplotypes within the 23 US populations studied, with an average Fnd of 28.43. Haplotype diversity is most consistent with purifying selection for HLA Class I haplotypes (HLA-A, -B, -C); such selection was not inferred for HLA Class II haplotypes (-DRB1 and -DQB1). We discuss our empirical results in the context of evolutionary theory, exploring potential mechanisms of selection that maintain high linkage disequilibrium in MHC haplotype blocks.
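The sign of D'ij reported above refers to normalized linkage disequilibrium. A minimal sketch of Lewontin's standard D' for one pair of alleles at two loci follows, assuming the abstract uses this standard normalization; the function and argument names are illustrative.

```python
def normalized_ld(p_ij: float, p_i: float, p_j: float) -> float:
    """Lewontin's D' for allele i at one locus and allele j at another.

    p_ij is the frequency of the haplotype carrying both alleles;
    p_i and p_j are the two allele frequencies.
    """
    d = p_ij - p_i * p_j  # raw disequilibrium D
    if d > 0:
        d_max = min(p_i * (1 - p_j), (1 - p_i) * p_j)
    else:
        d_max = min(p_i * p_j, (1 - p_i) * (1 - p_j))
    return d / d_max if d_max > 0 else 0.0
```

D' ranges from -1 to 1. A haplotype observed more often than the product of its allele frequencies predicts gets D' > 0, the pattern the abstract reports among frequent haplotypes, while an under-represented one gets D' < 0.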


Subject(s)
Haplotypes/genetics , Histocompatibility Antigens Class I/genetics , Models, Genetic , Selection, Genetic/genetics , Alleles , Computational Biology , Genetic Variation/genetics , Humans , Linkage Disequilibrium
19.
Nucleic Acids Res ; 44(5): e46, 2016 Mar 18.
Article in English | MEDLINE | ID: mdl-26586802

ABSTRACT

Incremental selection within a population, defined as limited fitness changes following mutation, is an important aspect of many evolutionary processes. Strongly advantageous or deleterious mutations are detected using the ratio of synonymous to non-synonymous mutations. However, there are currently no precise methods to estimate incremental selection. We here provide, for the first time, such a detailed method and show its precision in multiple cases of micro-evolution. The proposed method is a novel mixed lineage-tree/sequence-based method that detects within-population selection, defined by the effect of mutations on the average number of offspring. Specifically, we propose measuring the log of the ratio between the numbers of leaves in lineage tree branches following synonymous and non-synonymous mutations. The method requires a sufficiently large number of sequences and of independent mutations, and it assumes that all mutations are independent events. It does not require a baseline model and is practically unaffected by sampling biases. We show the method's wide applicability by testing it on multiple cases of micro-evolution. We show that it can detect genes and inter-genic regions using the selection rate, and detect selection pressures in viral proteins and in the immune response to pathogens.
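The proposed score can be illustrated with a minimal sketch that takes, for each mutation on a lineage tree, its type and the number of leaves descending from it. Averaging leaf counts per mutation type and placing synonymous counts in the numerator are assumptions made here for illustration; the function name is hypothetical.

```python
import math
from statistics import mean


def selection_score(mutations: list[tuple[str, int]]) -> float:
    """Log ratio of leaves below synonymous vs. non-synonymous mutation branches.

    mutations: (kind, n_leaves) pairs, kind "S" (synonymous) or "N" (non-synonymous),
    with n_leaves the number of lineage-tree leaves descending from that mutation.
    Positive values mean non-synonymous mutations leave fewer descendants.
    """
    syn = [n for kind, n in mutations if kind == "S"]
    non = [n for kind, n in mutations if kind == "N"]
    if not syn or not non:
        raise ValueError("need at least one mutation of each kind")
    return math.log(mean(syn) / mean(non))


score = selection_score([("S", 4), ("S", 4), ("N", 2), ("N", 2)])
```

In this toy input, branches after non-synonymous mutations carry half as many leaves, giving a score of log 2 ≈ 0.69, consistent with negative selection on amino-acid changes. The score uses only leaf counts within the observed trees, which matches the abstract's point that no baseline model is required.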


Subject(s)
Algorithms , Biological Evolution , Models, Genetic , Pedigree , Selection, Genetic , Alphapapillomavirus/classification , Alphapapillomavirus/genetics , Animals , Base Sequence , Computer Simulation , Epitopes/chemistry , Epitopes/genetics , HIV/classification , HIV/genetics , Hepatitis B virus/classification , Hepatitis B virus/genetics , Humans , Immunoglobulins/classification , Immunoglobulins/genetics , Influenza A virus/classification , Influenza A virus/genetics , Mice , Mice, Transgenic , Molecular Sequence Data , Mutation , Phylogeny , RNA, Viral/chemistry , RNA, Viral/genetics , Receptors, Antigen, B-Cell/classification , Receptors, Antigen, B-Cell/genetics , Sequence Alignment
20.
Bioinformatics ; 32(21): 3314-3320, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27378295

ABSTRACT

MOTIVATION: Spatial learning is one of the most widely studied cognitive domains in neuroscience. The Morris water maze and the Barnes maze are the most commonly used techniques to assess spatial learning and memory in rodents. Although these tasks are well-validated paradigms for testing spatial learning abilities, manual categorization of performance into behavioral strategies is subject to individual interpretation, and thus to bias. We have previously described an unbiased machine-learning algorithm to classify spatial strategies in the Morris water maze. RESULTS: Here, we offer a support vector machine-based, automated, Barnes-maze unbiased strategy (BUNS) classification algorithm, as well as a cognitive score scale that can be used for memory acquisition, reversal training and probe trials. The BUNS algorithm can greatly benefit Barnes maze users, as it provides a standardized method of strategy classification and a cognitive scoring scale, which cannot be derived from typical Barnes maze data analysis. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://okunlab.wix.com/okunlab as a MATLAB application. CONTACT: eitan.okun@biu.ac.il SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Maze Learning , Animals , Memory , Support Vector Machine