ABSTRACT
In many categorical response regression applications, the response categories admit a multiresolution structure. That is, subsets of the response categories may naturally be combined into coarser response categories. In such applications, practitioners are often interested in estimating the resolution at which a predictor affects the response category probabilities. In this paper, we propose a method for fitting the multinomial logistic regression model in high dimensions that addresses this problem in a unified and data-driven way. Our method allows practitioners to identify which predictors distinguish between coarse categories but not fine categories, which predictors distinguish between fine categories, and which predictors are irrelevant. For model fitting, we propose a scalable algorithm that can be applied when the coarse categories are defined by either overlapping or nonoverlapping sets of fine categories. Statistical properties of our method reveal that it can take advantage of this multiresolution structure in a way existing estimators cannot. We use our method to model cell-type probabilities as a function of a cell's gene expression profile (i.e., cell-type annotation). Our fitted model provides novel biological insights which may be useful for future automated and manual cell-type annotation methodology.
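The idea of a predictor acting at a coarse but not a fine resolution can be made concrete with a short derivation. The notation below ($\beta_{jk}$ for the coefficient of predictor $j$ in fine category $k$, $C_m$ for the set of fine categories in coarse category $m$, $\gamma_{jm}$ for a shared value) is assumed for illustration and is not necessarily the parametrization or penalty used in the paper.

```latex
% Multinomial logistic model over K fine categories (assumed notation)
\[
P(Y = k \mid x) = \frac{\exp(x^\top \beta_k)}{\sum_{l=1}^{K} \exp(x^\top \beta_l)},
\qquad
P(Y \in C_m \mid x) = \sum_{k \in C_m} P(Y = k \mid x).
\]
% Suppose predictor j has equal coefficients within each coarse category:
% \beta_{jk} = \gamma_{jm} for all k in C_m. Then for any k, k' in C_m
\[
\log \frac{P(Y = k \mid x)}{P(Y = k' \mid x)}
  = \sum_{i \neq j} x_i \,(\beta_{ik} - \beta_{ik'}),
\]
% which does not involve x_j: predictor j can shift probability between
% coarse categories but not between fine categories inside one of them.
% If \beta_{jk} is the same for every k, x_j cancels from all probabilities
% and predictor j is irrelevant.
```

Under these assumptions, estimating the resolution at which a predictor acts amounts to learning which of its coefficients are equal to one another.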
Subjects
Algorithms; Transcriptome; Logistic Models
ABSTRACT
Type 1 diabetes (T1D) is a T cell-mediated autoimmune disease in which the insulin-producing β cells within the pancreas are destroyed. Identification of target Ags and epitopes of the β cell-reactive T cells is important both for understanding T1D pathogenesis and for the rational development of Ag-specific immunotherapies for the disease. Several studies suggest that proinsulin is an early and integral target autoantigen in T1D. However, proinsulin epitopes recognized by human CD4+ T cells have not been comprehensively characterized. Using a dye dilution-based T cell cloning method, we generated and characterized 24 unique proinsulin-specific CD4+ T cell clones from the peripheral blood of 17 individuals who carry the high-risk DR3-DQ2 and/or DR4-DQ8 HLA class II haplotypes. Some of the clones recognized previously reported DR4-restricted epitopes within the C-peptide (C25-35) or A-chain (A1-15) of proinsulin. However, we also characterized DR3-restricted epitopes within both the B-chain (B16-27 and B22-C3) and C-peptide (C25-35). Moreover, we identified DQ2-restricted epitopes within the B-chain and several DQ2- or DQ8-restricted epitopes within the C-terminal region of C-peptide that partially overlap with previously reported DQ-restricted epitopes. Two of the DQ2-restricted epitopes, B18-26 and C22-33, were shown to be naturally processed from whole human proinsulin. Finally, we observed a higher frequency of CDR3 sequences matching the TCR sequences of the proinsulin-specific T cell clones in pancreatic lymph node samples compared with spleen samples. In conclusion, we confirmed several previously reported epitopes but also identified novel (to our knowledge) epitopes within proinsulin, which are presented by HLA class II molecules associated with T1D risk.
Subjects
Diabetes Mellitus, Type 1/immunology; Epitopes, T-Lymphocyte/immunology; HLA-DQ Antigens/immunology; Proinsulin/immunology; Adolescent; Amino Acid Sequence; Autoantigens/immunology; CD4-Positive T-Lymphocytes/immunology; Child; Child, Preschool; Humans; Infant; Insulin/immunology; Insulin-Secreting Cells/immunology; Spleen/immunology
ABSTRACT
Introduction: The Human Connectome Project (HCP) has become a keystone dataset in human neuroscience, with a plethora of important applications in advancing brain imaging methods and an understanding of the human brain. We focused on tractometry of HCP diffusion-weighted MRI (dMRI) data. Methods: We used an open-source software library (pyAFQ; https://yeatmanlab.github.io/pyAFQ) to perform probabilistic tractography and delineate the major white matter pathways in the HCP subjects that have a complete dMRI acquisition (n = 1,041). We used diffusion kurtosis imaging (DKI) to model white matter microstructure in each voxel of the white matter, and extracted tract profiles of DKI-derived tissue properties along the length of the tracts. We explored the empirical properties of the data: first, we assessed the heritability of DKI tissue properties using the known genetic linkage of the large number of twin pairs sampled in HCP. Second, we tested the ability of tractometry to serve as the basis for predictive models of individual characteristics (e.g., age, crystallized/fluid intelligence, reading ability, etc.), compared to local connectome features. To facilitate the exploration of the dataset, we created a new web-based visualization tool and used it to visualize the data in the HCP tractometry dataset. Finally, we used the HCP dataset as a test-bed for a new technological innovation: the TRX file format for representation of dMRI-based streamlines. Results: We released the processing outputs and tract profiles as a publicly available data resource through the AWS Open Data program's Open Neurodata repository. We found heritability as high as 0.9 for DKI-based metrics in some brain pathways. We also found that tractometry extracts as much useful information about individual differences as the local connectome method. We released a new web-based visualization tool for tractometry, "Tractoscope" (https://nrdg.github.io/tractoscope). We found that the TRX files require considerably less disk space, a crucial attribute for large datasets like HCP. In addition, TRX incorporates a specification for grouping streamlines, further simplifying tractometry analysis.
ABSTRACT
The proportions and phenotypes of immune cell subsets in peripheral blood undergo continual and dramatic remodeling throughout the human life span, which complicates efforts to identify disease-associated immune signatures in type 1 diabetes (T1D). We conducted cross-sectional flow cytometric immune profiling on peripheral blood from 826 individuals (stage 3 T1D, their first-degree relatives, those with ≥2 islet autoantibodies, and autoantibody-negative unaffected controls). We constructed an immune age predictive model in unaffected participants and observed accelerated immune aging in T1D. We used generalized additive models for location, shape, and scale to obtain age-corrected data for flow cytometry and complete blood count readouts, which can be visualized in our interactive portal (ImmScape); 46 parameters were significantly associated with age only, 25 with T1D only, and 23 with both age and T1D. Phenotypes associated with accelerated immunological aging in T1D included increased CXCR3+ and programmed cell death 1-positive (PD-1+) frequencies in naive and memory T cell subsets, despite reduced PD-1 expression levels on memory T cells. Phenotypes associated with T1D after age correction were predictive of T1D status. Our findings demonstrate advanced immune aging in T1D and highlight disease-associated phenotypes for biomarker monitoring and therapeutic interventions.
Subjects
Diabetes Mellitus, Type 1; Humans; Infant; Cross-Sectional Studies; Programmed Cell Death 1 Receptor; Autoantibodies; Aging
ABSTRACT
BACKGROUND: Machine learning (ML) methodology development for the classification of immune states in adaptive immune receptor repertoires (AIRRs) has seen a recent surge of interest. However, so far, there does not exist a systematic evaluation of scenarios where classical ML methods (such as penalized logistic regression) already perform adequately for AIRR classification. This hinders investigative reorientation to those scenarios where method development of more sophisticated ML approaches may be required. RESULTS: To identify those scenarios where a baseline ML method is able to perform well for AIRR classification, we generated a collection of synthetic AIRR benchmark data sets encompassing a wide range of data set architecture-associated and immune state-associated sequence patterns (signal) complexity. We trained ≈1,700 ML models with varying assumptions regarding immune signal on ≈1,000 data sets with a total of ≈250,000 AIRRs containing ≈46 billion TCRβ CDR3 amino acid sequences, thereby surpassing the sample sizes of current state-of-the-art AIRR-ML setups by two orders of magnitude. We found that L1-penalized logistic regression achieved high prediction accuracy even when the immune signal occurs only in 1 out of 50,000 AIR sequences. CONCLUSIONS: We provide a reference benchmark to guide new AIRR-ML classification methodology by (i) identifying those scenarios characterized by immune signal and data set complexity, where baseline methods already achieve high prediction accuracy, and (ii) facilitating realistic expectations of the performance of AIRR-ML models given training data set properties and assumptions. Our study serves as a template for defining specialized AIRR benchmark data sets for comprehensive benchmarking of AIRR-ML methods.
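For readers unfamiliar with the baseline named above, the following is a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent. It is not the authors' pipeline: the function names, the toy data, and the settings (`lam`, `lr`, `n_iter`) are hypothetical, and the two features merely stand in for repertoire-level features such as k-mer frequencies.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, lr=0.5, n_iter=200):
    """Minimize mean logistic loss + lam * ||w||_1 (no intercept, for brevity)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        # Predicted probability of class 1 for every sample.
        preds = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        # Gradient of the mean logistic loss with respect to each weight.
        grad = [
            sum((p - t) * row[j] for p, t, row in zip(preds, y, X)) / n
            for j in range(d)
        ]
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


# Toy data (hypothetical): feature 0 determines the label, feature 1 is noise.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 0, 0]
weights = fit_l1_logistic(X, y)
```

On this toy data the penalty keeps the weight of the uninformative feature at exactly zero, while the informative weight settles where the loss gradient balances the penalty, that is, where 1 − sigmoid(w) = `lam`. This sparsity is the property that makes the L1 penalty a natural baseline when only a small fraction of features carries signal.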
Subjects
Machine Learning; Receptors, Immunologic
ABSTRACT
Human islet antigen reactive CD4+ memory T cells (IAR T cells) play a key role in the pathogenesis of autoimmune type 1 diabetes (T1D). Using single-cell RNA sequencing (scRNA-Seq) to identify T cell receptors (TCRs) in IAR T cells, we have identified a class of TCRs that share TCRα chains between individuals ("public" chains). We isolated IAR T cells from blood of healthy, new-onset T1D and established T1D donors using multiplexed CD154 enrichment and identified paired TCRαβ sequences from 2767 individual cells. More than a quarter of cells shared TCR junctions between 2 or more cells ("expanded"), and 29/47 (~62%) of expanded TCRs tested showed specificity for islet antigen epitopes. Public TCRs sharing TCRα junctions were most prominent in new-onset T1D. Public TCR sequences were more germline-like than expanded unique, or "private," TCRs, and had shorter junction sequences, suggestive of fewer random nucleotide insertions. Public TCRα junctions were often paired with mismatched TCRβ junctions in TCRs; remarkably, a subset of these TCRs exhibited cross-reactivity toward distinct islet antigen peptides. Our findings demonstrate a prevalent population of IAR T cells with diverse specificities determined by TCRs with restricted TCRα junctions and germline-constrained antigen recognition properties. Since these "innate-like" TCRs differ from previously described immunodominant TCRβ chains in autoimmunity, they have implications for fundamental studies of disease mechanisms. Self-reactive restricted TCRα chains and their associated epitopes should be considered in fundamental and translational investigations of TCRs in T1D.
Subjects
Diabetes Mellitus, Type 1/genetics; Germ Cells/metabolism; Immunoglobulin alpha-Chains/metabolism; Receptors, Antigen, T-Cell/metabolism; Adolescent; Adult; Female; Humans; Male; Young Adult
ABSTRACT
Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a novel deep learning method for antigen specificity prediction, and (iii) showcasing streamlined interpretability-focused benchmarking of AIRR ML.
ABSTRACT
The human T lymphocyte compartment is highly dynamic over the course of a lifetime. Of the many changes, perhaps most notable is the transition from a predominantly naïve T cell state at birth to the acquisition of antigen-experienced memory and effector subsets following environmental exposures. These phenotypic changes, including the induction of T cell exhaustion and senescence, have the potential to negatively impact efficacy of adoptive T cell therapies (ACT). When considering ACT with CD4+CD25+CD127-/lo regulatory T cells (Tregs) for the induction of immune tolerance, we previously reported ex vivo expanded umbilical cord blood (CB) Tregs remained more naïve, suppressed responder T cells equivalently, and exhibited a more diverse T cell receptor (TCR) repertoire compared to expanded adult peripheral blood (APB) Tregs. Herein, we hypothesized that upon further characterization, we would observe increased lineage heterogeneity and phenotypic diversity in APB Tregs that might negatively impact lineage stability, engraftment capacity, and the potential for Tregs to home to sites of tissue inflammation following ACT. We compared the phenotypic profiles of human Tregs isolated from CB versus the more traditional source, APB. We conducted analysis of fresh and ex vivo expanded Treg subsets at both the single cell (scRNA-seq and flow cytometry) and bulk (microarray and cytokine profiling) levels. Single cell transcriptional profiles of pre-expansion APB Tregs highlighted a cluster of cells that showed increased expression of genes associated with effector and pro-inflammatory phenotypes (CCL5, GZMK, CXCR3, LYAR, and NKG7) with low expression of Treg markers (FOXP3 and IKZF2). CB Tregs were more diverse in TCR repertoire and homogeneous in phenotype, and contained fewer effector-like cells in contrast with APB Tregs. Interestingly, expression of canonical Treg markers, such as FOXP3, TIGIT, and IKZF2, was increased in CB CD4+CD127+ conventional T cells (Tconv) compared to APB Tconv, post-expansion, implying perinatal T cells may adopt a default regulatory program. Collectively, these data identify surface markers (namely CXCR3) that could be depleted to improve purity and stability of APB Tregs, and support the use of expanded CB Tregs as a potentially optimal ACT modality for the treatment of autoimmune and inflammatory diseases.
Subjects
Fetal Blood/immunology; Immunotherapy, Adoptive; T-Lymphocytes, Regulatory/immunology; Adult; Cell Lineage; Fetal Blood/cytology; Humans; Lymphocyte Activation; Phenotype; RNA-Seq; Receptors, Antigen, T-Cell/immunology
ABSTRACT
We investigated human T-cell repertoire formation using high-throughput TCRβ CDR3 sequencing in immunodeficient mice receiving human hematopoietic stem cells (HSCs) and human thymus grafts. Replicate humanized mice generated diverse and highly divergent repertoires. Repertoire narrowing and increased CDR3β sharing were observed during thymocyte selection. While hydrophobicity analysis implicated self-peptides in positive selection of the overall repertoire, positive selection favored shorter shared sequences that had reduced hydrophobicity at positions 6 and 7 of CDR3βs, suggesting weaker interactions with self-peptides than unshared sequences, possibly allowing escape from negative selection. Sharing was similar between autologous and allogeneic thymi and occurred between different cell subsets. Shared sequences were enriched for allo-crossreactive CDR3βs and for Type 1 diabetes-associated autoreactive CDR3βs. Single-cell TCR-sequencing showed increased sharing of CDR3αs compared to CDR3βs between mice. Our data collectively implicate preferential positive selection for shared human CDR3βs that are highly cross-reactive. While previous studies suggested a role for recombination bias in producing "public" sequences in mice, our study is the first to demonstrate a role for thymic selection. Our results implicate positive selection for promiscuous TCRβ sequences that likely evade negative selection, due to their low affinity for self-ligands, in the abundance of "public" human TCRβ sequences.