ABSTRACT
Controlled human infections provide opportunities to study the interaction between the immune system and malaria parasites, which is essential for vaccine development. Here, we compared immune signatures of malaria-naive Europeans and of Africans with lifelong malaria exposure using mass cytometry, RNA sequencing and data integration, before and 5 and 11 days after venous inoculation with Plasmodium falciparum sporozoites. We observed differences in immune cell populations, antigen-specific responses and gene expression profiles between Europeans and Africans and among Africans with differing degrees of immunity. Before inoculation, an activated/differentiated state of both innate and adaptive cells, including elevated CD161+CD4+ T cells and interferon-γ production, predicted Africans capable of controlling parasitemia. After inoculation, the rapidity of the transcriptional response and clusters of CD4+ T cells, plasmacytoid dendritic cells and innate T cells were among the features distinguishing Africans capable of controlling parasitemia from susceptible individuals. These findings can guide the development of a vaccine effective in malaria-endemic regions.
Subject(s)
Adaptive Immunity/immunology , Disease Susceptibility/immunology , Malaria, Falciparum/immunology , Plasmodium falciparum/immunology , Adaptive Immunity/genetics , Adolescent , Adult , Antibodies, Protozoan/blood , Antibodies, Protozoan/immunology , Antigens, Protozoan/immunology , Black People/genetics , Dendritic Cells/immunology , Disease Susceptibility/blood , Disease Susceptibility/parasitology , Female , Healthy Volunteers , Host-Parasite Interactions/genetics , Host-Parasite Interactions/immunology , Humans , Immunity, Innate/genetics , Immunity, Innate/immunology , Interferon-gamma/metabolism , Malaria, Falciparum/blood , Malaria, Falciparum/parasitology , Male , RNA-Seq , Systems Analysis , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , White People/genetics , Young Adult
ABSTRACT
Tandem repeats (TRs) play important roles in genomic variation and disease risk in humans. Long-read sequencing allows for the accurate characterization of TRs; however, the underlying bioinformatic analysis remains challenging. We present otter and TREAT: otter is a fast targeted local assembler, cross-compatible across different sequencing platforms. It is integrated in TREAT, an end-to-end workflow for TR characterization, visualization and analysis across multiple genomes. In a comparison with existing tools based on long-read sequencing data from both Oxford Nanopore Technology (ONT, Simplex and Duplex) and PacBio (Sequel 2 and Revio), otter and TREAT achieved state-of-the-art genotyping and motif characterisation accuracy. Applied to clinically relevant TRs, TREAT/otter significantly identified individuals with pathogenic TR expansions. When applied to a case-control setting, we significantly replicated previously reported associations of TRs with Alzheimer's Disease, including those near or within APOC1 (p = 2.63 × 10⁻⁹), SPI1 (p = 6.5 × 10⁻³) and ABCA7 (p = 0.04) genes. We used TREAT/otter to systematically evaluate potential biases when genotyping TRs using diverse ONT and PacBio long-read sequencing datasets. We showed that, in rare cases (0.06%), long-read sequencing suffers from coverage drops in TRs, including the disease-associated TRs in ABCA7 and RFC1 genes. Such coverage drops can lead to TR misgenotyping, hampering the accurate characterization of TR alleles. Taken together, our tools can accurately genotype TRs across different sequencing technologies and with minimal requirements, allowing end-to-end analysis and comparisons of TRs in human genomes, with broad applications in research and clinical fields.
ABSTRACT
Genome-wide association studies (GWAS) have been highly informative in discovering disease-associated loci but are not designed to capture all structural variations in the human genome. Using long-read sequencing data, we discovered widespread structural variation within SINE-VNTR-Alu (SVA) elements, a class of great ape-specific transposable elements with gene-regulatory roles, which represents a major source of structural variability in the human population. We highlight the presence of structurally variable SVAs (SV-SVAs) in neurological disease-associated loci, and we further associate SV-SVAs to disease-associated SNPs and differential gene expression using luciferase assays and expression quantitative trait loci data. Finally, we genetically deleted SV-SVAs in the BIN1 and CD2AP Alzheimer's disease-associated risk loci and in the BCKDK Parkinson's disease-associated risk locus and assessed multiple aspects of their gene-regulatory influence in a human neuronal context. Together, this study reveals a novel layer of genetic variation in transposable elements that may contribute to identification of the structural variants that are the actual drivers of disease associations of GWAS loci.
Subject(s)
DNA Transposable Elements , Genome-Wide Association Study , Alu Elements , DNA Transposable Elements/genetics , Genetic Predisposition to Disease , Genetic Variation , Genome, Human , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci
ABSTRACT
Multi-omic analyses are necessary to understand the complex biological processes taking place at the tissue and cell level, but also to make reliable predictions about, for example, disease outcome. Several linear methods exist that create a joint embedding using paired information per sample, but recently there has been a rise in the popularity of neural architectures that embed paired -omics into the same non-linear manifold. This work describes a head-to-head comparison of linear and non-linear joint embedding methods using both bulk and single-cell multi-modal datasets. We found that non-linear methods have a clear advantage with respect to linear ones for missing modality imputation. Performance comparisons in the downstream tasks of survival analysis for bulk tumor data and cell type classification for single-cell data lead to the following insights: First, concatenating the principal components of each modality is a competitive baseline and hard to beat if all modalities are available at test time. However, if we only have one modality available at test time, training a predictive model on the joint space of that modality can lead to performance improvements with respect to just using the unimodal principal components. Second, -omic profiles imputed by neural joint embedding methods are realistic enough to be used by a classifier trained on real data with limited performance drops. Taken together, our comparisons give hints to which joint embedding to use for which downstream task. Overall, product-of-experts performed well in most tasks and was reasonably fast, while early integration (concatenation) of modalities did quite poorly.
Subject(s)
Multiomics , Neoplasms , Humans
ABSTRACT
SUMMARY: T-cell receptors (TCRs) on T cells recognize and bind to epitopes presented by the major histocompatibility complex in case of an infection or cancer. However, the high diversity of TCRs, as well as their unique and complex binding mechanisms underlying epitope recognition, make it difficult to predict the binding between TCRs and epitopes. Here, we present the utility of transformers, a deep learning strategy that incorporates an attention mechanism that learns the informative features, and show that these models pre-trained on a large set of protein sequences outperform current strategies. We compared three pre-trained auto-encoder transformer models (ProtBERT, ProtAlbert, and ProtElectra) and one pre-trained auto-regressive transformer model (ProtXLNet) to predict the binding specificity of TCRs to 25 epitopes from the VDJdb database (human and murine). Two additional modifications were performed to incorporate gene usage of the TCRs in the four transformer models. Of all 12 transformer implementations (four models with three different modifications), a modified version of the ProtXLNet model could predict TCR-epitope pairs with the highest accuracy (weighted F1 score 0.55 simultaneously considering all 25 epitopes). The modification included additional features representing the gene names for the TCRs. We also showed that the basic implementation of transformers outperformed the previously available methods, i.e. TCRGP, TCRdist, and DeepTCR, developed for the same biological problem, especially for the hard-to-classify labels. We show that the proficiency of transformers in attention learning can be made operational in a complex biological setting like TCR binding prediction. Further ingenuity in utilizing the full potential of transformers, either through attention head visualization or introducing additional features, can extend T-cell research avenues. 
AVAILABILITY AND IMPLEMENTATION: Data and code are available on https://github.com/InduKhatri/tcrformer.
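The weighted F1 score reported above averages the per-epitope F1 scores, weighting each by the number of true examples of that epitope. A minimal sketch of that metric follows, assuming the standard support-weighted definition; the function name and the toy epitope labels are hypothetical and are not taken from the tcrformer repository.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n_true / total  # weight by class support
    return score


# Toy example with made-up epitope labels (illustration only).
true_epitopes = ["EPI_A", "EPI_A", "EPI_A", "EPI_B"]
pred_epitopes = ["EPI_A", "EPI_A", "EPI_B", "EPI_B"]
print(round(weighted_f1(true_epitopes, pred_epitopes), 3))
```

Because each class is weighted by its support, frequent epitopes dominate the score; a macro-averaged F1 would instead treat all 25 epitopes equally.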
Subject(s)
Epitopes, T-Lymphocyte , Receptors, Antigen, T-Cell , Humans , Animals , Mice , Epitopes, T-Lymphocyte/metabolism , Receptors, Antigen, T-Cell/genetics , T-Lymphocytes/metabolism , Amino Acid Sequence , Major Histocompatibility Complex
ABSTRACT
MOTIVATION: Knowing the relation between cell types is crucial for translating experimental results from mice to humans. Establishing cell type matches, however, is hindered by the biological differences between the species. A substantial amount of evolutionary information between genes that could be used to align the species is discarded by most of the current methods since they only use one-to-one orthologous genes. Some methods try to retain the information by explicitly including the relation between genes, however, not without caveats. RESULTS: In this work, we present a model to transfer and align cell types in cross-species analysis (TACTiCS). First, TACTiCS uses a natural language processing model to match genes using their protein sequences. Next, TACTiCS employs a neural network to classify cell types within a species. Afterward, TACTiCS uses transfer learning to propagate cell type labels between species. We applied TACTiCS on scRNA-seq data of the primary motor cortex of human, mouse, and marmoset. Our model can accurately match and align cell types on these datasets. Moreover, our model outperforms Seurat and the state-of-the-art method SAMap. Finally, we show that our gene matching method results in better cell type matches than BLAST in our model. AVAILABILITY AND IMPLEMENTATION: The implementation is available on GitHub (https://github.com/kbiharie/TACTiCS). The preprocessed datasets and trained models can be downloaded from Zenodo (https://doi.org/10.5281/zenodo.7582460).
Subject(s)
Biological Evolution , Genetic Techniques , Humans , Animals , Mice , Amino Acid Sequence , Natural Language Processing , Machine Learning
ABSTRACT
OBJECTIVE: To assess the feasibility of scalable, objective, and minimally invasive liquid biopsy-derived biomarkers such as cell-free DNA copy number profiles, human epididymis protein 4 (HE4), and cancer antigen 125 (CA125) for pre-operative risk assessment of early-stage ovarian cancer in a clinically representative and diagnostically challenging population and to compare the performance of these biomarkers with the Risk of Malignancy Index (RMI). METHODS: In this case-control study, we included 100 patients with an ovarian mass clinically suspected to be early-stage ovarian cancer. Of these 100 patients, 50 were confirmed to have a malignant mass (cases) and 50 had a benign mass (controls). Using WisecondorX, an algorithm used extensively in non-invasive prenatal testing, we calculated the benign-calibrated copy number profile abnormality score. This score represents how different a sample is from benign controls based on copy number profiles. We combined this score with HE4 serum concentration to separate cases and controls. RESULTS: Combining the benign-calibrated copy number profile abnormality score with HE4, we obtained a model with a significantly higher sensitivity (42% vs 0%; p<0.002) at 99% specificity as compared with the RMI that is currently employed in clinical practice. Investigating performance in subgroups, we observed especially large differences in the advanced stage and non-high-grade serous ovarian cancer groups. CONCLUSION: This study demonstrates that cell-free DNA can be successfully employed to perform pre-operative risk of malignancy assessment for ovarian masses; however, results warrant validation in a more extensive clinical study.
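Sensitivity at a fixed specificity, the metric used in the comparison above, can be computed by choosing a score threshold from the controls and then counting the cases above it. The sketch below illustrates only that evaluation step under simple assumptions (higher score means more suspicious, strict "greater than" decision rule); it does not reproduce the study's model combining the copy number score with HE4, and the function name and toy scores are hypothetical.

```python
from typing import Sequence


def sensitivity_at_specificity(
    case_scores: Sequence[float],
    control_scores: Sequence[float],
    specificity: float = 0.99,
) -> float:
    """Fraction of cases scoring above a threshold that keeps at least
    `specificity` of the controls at or below it (hypothetical helper)."""
    if not case_scores or not control_scores:
        raise ValueError("both score lists must be non-empty")
    ordered = sorted(control_scores)
    n = len(ordered)
    # Number of controls allowed to be false positives (rounded down;
    # the small epsilon guards against floating-point error).
    allowed_fp = int(n * (1.0 - specificity) + 1e-9)
    threshold = ordered[n - 1 - allowed_fp]
    true_positives = sum(1 for s in case_scores if s > threshold)
    return true_positives / len(case_scores)


# Toy scores (illustration only, not study data).
controls = [0.1, 0.2, 0.3, 0.4]
cases = [0.35, 0.5, 0.6, 0.05]
print(sensitivity_at_specificity(cases, controls, specificity=0.99))
```

With 50 controls, as in the study design, a 99% specificity target allows no false positives under this rounding rule, so the threshold is simply the highest control score.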
Subject(s)
Biomarkers, Tumor , Ovarian Neoplasms , WAP Four-Disulfide Core Domain Protein 2 , Humans , Female , Ovarian Neoplasms/blood , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/surgery , Ovarian Neoplasms/pathology , Case-Control Studies , Middle Aged , WAP Four-Disulfide Core Domain Protein 2/analysis , WAP Four-Disulfide Core Domain Protein 2/metabolism , Liquid Biopsy/methods , Biomarkers, Tumor/blood , Cell-Free Nucleic Acids/blood , Adult , Aged , CA-125 Antigen/blood
ABSTRACT
Preclinical models have been the workhorse of cancer research, producing massive amounts of drug response data. Unfortunately, translating response biomarkers derived from these datasets to human tumors has proven to be particularly challenging. To address this challenge, we developed TRANSACT, a computational framework that builds a consensus space to capture biological processes common to preclinical models and human tumors and exploits this space to construct drug response predictors that robustly transfer from preclinical models to human tumors. TRANSACT performs favorably compared to four competing approaches, including two deep learning approaches, on a set of 23 drug prediction challenges on The Cancer Genome Atlas and 226 metastatic tumors from the Hartwig Medical Foundation. We demonstrate that response predictions deliver a robust performance for a number of therapies of high clinical importance: platinum-based chemotherapies, gemcitabine, and paclitaxel. In contrast to other approaches, we demonstrate the interpretability of the TRANSACT predictors by correctly identifying known biomarkers of targeted therapies, and we propose potential mechanisms that mediate the resistance to two chemotherapeutic agents.
Subject(s)
Drug Screening Assays, Antitumor/methods , Gene Expression Profiling/methods , Animals , Antineoplastic Agents/therapeutic use , Biomarkers, Pharmacological/metabolism , Cell Line, Tumor/drug effects , Deep Learning , Disease Models, Animal , Forecasting/methods , Heterografts , Humans , Models, Theoretical
ABSTRACT
BACKGROUND: Alzheimer's disease (AD) prevalence increases with age, yet a small fraction of the population reaches ages > 100 years without cognitive decline. We studied the genetic factors associated with such resilience against AD. METHODS: Genome-wide association studies identified 86 single nucleotide polymorphisms (SNPs) associated with AD risk. We estimated SNP frequency in 2281 AD cases, 3165 age-matched controls, and 346 cognitively healthy centenarians. We calculated a polygenic risk score (PRS) for each individual and investigated the functional properties of SNPs enriched/depleted in centenarians. RESULTS: Cognitively healthy centenarians were enriched with the protective alleles of the SNPs associated with AD risk. The protective effect concentrated on the alleles in/near ANKH, GRN, TMEM106B, SORT1, PLCG2, RIN3, and APOE genes. This translated to >5-fold lower PRS in centenarians compared to AD cases (P = 7.69 × 10⁻⁷¹), and 2-fold lower compared to age-matched controls (P = 5.83 × 10⁻¹⁷). DISCUSSION: Maintaining cognitive health until extreme ages requires complex genetic protection against AD, which concentrates on the genes associated with the endolysosomal and immune systems. HIGHLIGHTS: Cognitively healthy centenarians are enriched with the protective alleles of genetic variants associated with Alzheimer's disease (AD). The protective effect is concentrated on variants involved in the immune and endolysosomal systems. Combining variants into a polygenic risk score (PRS) translated to > 5-fold lower PRS in centenarians compared to AD cases, and ≈ 2-fold lower compared to middle-aged healthy controls.
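A polygenic risk score is commonly computed as the sum, over SNPs, of the number of risk alleles an individual carries multiplied by the SNP's effect size. The sketch below shows that common formulation only; the abstract does not state the exact weighting used in the study, and the function name, effect sizes and genotypes are hypothetical.

```python
from typing import Sequence


def polygenic_risk_score(
    dosages: Sequence[float], effect_sizes: Sequence[float]
) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP).

    `effect_sizes` would typically be per-SNP log odds ratios from a GWAS;
    here they are made-up numbers for illustration.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is required per SNP")
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


# Hypothetical three-SNP example: positive effects increase risk,
# negative effects are protective.
effects = [0.2, -0.1, 0.5]
higher_risk_genotype = [2, 0, 1]
lower_risk_genotype = [0, 2, 0]
print(polygenic_risk_score(higher_risk_genotype, effects))
print(polygenic_risk_score(lower_risk_genotype, effects))
```

Under this formulation, carrying more protective alleles lowers the score, which is the direction of the difference described for the centenarians.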
Subject(s)
Alzheimer Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Alzheimer Disease/genetics , Alzheimer Disease/prevention & control , Female , Male , Aged, 80 and over , Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Alleles , Case-Control Studies
ABSTRACT
INTRODUCTION: Unraveling how Alzheimer's disease (AD) genetic risk is related to neuropathological heterogeneity, and whether this occurs through specific biological pathways, is a key step toward precision medicine. METHODS: We computed pathway-specific genetic risk scores (GRSs) in non-demented individuals and investigated how AD risk variants predict cerebrospinal fluid (CSF) and imaging biomarkers reflecting AD pathology, cardiovascular, white matter integrity, and brain connectivity. RESULTS: CSF amyloid beta and phosphorylated tau were related to most GRSs. Inflammatory pathways were associated with cerebrovascular disease, whereas quantitative measures of white matter lesion and microstructure integrity were predicted by clearance and migration pathways. Functional connectivity alterations were related to genetic variants involved in signal transduction and synaptic communication. DISCUSSION: This study reveals distinct genetic risk profiles in association with specific pathophysiological aspects in predementia stages of AD, unraveling the biological substrates of the heterogeneity of AD-associated endophenotypes and promoting a step forward in disease understanding and development of personalized therapies. HIGHLIGHTS: Polygenic risk for Alzheimer's disease encompasses six biological pathways that can be quantified with pathway-specific genetic risk scores, and differentially relate to cerebrospinal fluid and imaging biomarkers. Inflammatory pathways are mostly related to cerebrovascular burden. White matter health is associated with pathways of clearance and membrane integrity, whereas functional connectivity measures are related to signal transduction and synaptic communication pathways.
Subject(s)
Alzheimer Disease , Amyloid beta-Peptides , Biomarkers , Endophenotypes , Humans , Alzheimer Disease/genetics , Alzheimer Disease/cerebrospinal fluid , Alzheimer Disease/pathology , Biomarkers/cerebrospinal fluid , Amyloid beta-Peptides/cerebrospinal fluid , Female , Male , tau Proteins/cerebrospinal fluid , Aged , Brain/pathology , Brain/diagnostic imaging , Genetic Predisposition to Disease , Middle Aged , Magnetic Resonance Imaging , White Matter/pathology , White Matter/diagnostic imaging
ABSTRACT
The response to lifestyle intervention studies is often heterogeneous, especially in older adults. Subtle responses that may represent a health gain for individuals are not always detected by classical health variables, stressing the need for novel biomarkers that detect intermediate changes in metabolic, inflammatory, and immunity-related health. Here, our aim was to develop and validate a molecular multivariate biomarker maximally sensitive to the individual effect of a lifestyle intervention: the Personalized Lifestyle Intervention Status (PLIS). We used ¹H-NMR fasting blood metabolite measurements from before and after the 13-week combined physical and nutritional Growing Old TOgether (GOTO) lifestyle intervention study in combination with a fivefold cross-validation and a bootstrapping method to train a separate PLIS score for men and women. The PLIS scores consisted of 14 and four metabolites for females and males, respectively. Performance of the PLIS score in tracking health gain was illustrated by association of the sex-specific PLIS scores with several classical metabolic health markers, such as BMI, trunk fat%, fasting HDL cholesterol, and fasting insulin, the primary outcome of the GOTO study. We also showed that the baseline PLIS score indicated which participants respond positively to the intervention. Finally, we explored PLIS in an independent physical activity lifestyle intervention study, showing similar, albeit remarkably weaker, associations of PLIS with classical metabolic health markers. To conclude, we found that the sex-specific PLIS score was able to track the individual short-term metabolic health gain of the GOTO lifestyle intervention study. The methodology used to train the PLIS score potentially provides a useful instrument to track personal responses and predict the participant's health benefit in lifestyle interventions similar to the GOTO study.
Subject(s)
Life Style , Obesity , Aged , Biomarkers , Cholesterol, HDL , Female , Humans , Insulin , Male
ABSTRACT
p53 and p19(ARF) are tumor suppressors frequently mutated in human tumors. In a high-throughput screen in mice for mutations collaborating with either p53 or p19(ARF) deficiency, we identified 10,806 retroviral insertion sites, implicating over 300 loci in tumorigenesis. This dataset reveals 20 genes that are specifically mutated in either p19(ARF)-deficient, p53-deficient or wild-type mice (including Flt3, mmu-mir-106a-363, Smg6, and Ccnd3), as well as networks of significant collaborative and mutually exclusive interactions between cancer genes. Furthermore, we found candidate tumor suppressor genes, as well as distinct clusters of insertions within genes like Flt3 and Notch1 that induce mutants with different spectra of genetic interactions. Cross species comparative analysis with aCGH data of human cancer cell lines revealed known and candidate oncogenes (Mmp13, Slamf6, and Rreb1) and tumor suppressors (Wwox and Arfrp2). This dataset should prove to be a rich resource for the study of genetic interactions that underlie tumorigenesis.
Subject(s)
Cyclin-Dependent Kinase Inhibitor p16/metabolism , Gene Regulatory Networks , Genes, Tumor Suppressor , Neoplasms/genetics , Tumor Suppressor Protein p53/metabolism , Animals , Cell Line, Tumor , Cloning, Molecular , Cyclin-Dependent Kinase Inhibitor p16/genetics , Genes, p53 , Genomics/methods , Humans , Mice , Mice, Knockout , Mutagenesis, Insertional , Neoplasms/metabolism , Sequence Analysis, DNA
ABSTRACT
Genetic association studies are frequently used to study the genetic basis of numerous human phenotypes. However, the rapid interrogation of how well a certain genomic region associates across traits as well as the interpretation of genetic associations is often complex and requires the integration of multiple sources of annotation, which involves advanced bioinformatic skills. We developed snpXplorer, an easy-to-use web-server application for exploring Single Nucleotide Polymorphisms (SNP) association statistics and to functionally annotate sets of SNPs. snpXplorer can superimpose association statistics from multiple studies, and displays regional information including SNP associations, structural variations, recombination rates, eQTL, linkage disequilibrium patterns, genes and gene-expressions per tissue. By overlaying multiple GWAS studies, snpXplorer can be used to compare levels of association across different traits, which may help the interpretation of variant consequences. Given a list of SNPs, snpXplorer can also be used to perform variant-to-gene mapping and gene-set enrichment analysis to identify molecular pathways that are overrepresented in the list of input SNPs. snpXplorer is freely available at https://snpxplorer.net. Source code, documentation, example files and tutorial videos are available within the Help section of snpXplorer and at https://github.com/TesiNicco/snpXplorer.
Subject(s)
Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Software , Alzheimer Disease/genetics , Gene Expression , Genetic Association Studies , Genomics , Humans , Linkage Disequilibrium , Quantitative Trait Loci
ABSTRACT
The availability of genomes for many species has advanced our understanding of the non-protein-coding fraction of the genome. Comparative genomics has proven itself to be an invaluable approach for the systematic, genome-wide identification of conserved non-protein-coding elements (CNEs). However, for many non-mammalian model species, including chicken, our capability to interpret the functional importance of variants overlapping CNEs has been limited by current genomic annotations, which rely on a single information type (e.g. conservation). We here studied CNEs in chicken using a combination of population genomics and comparative genomics. To investigate the functional importance of variants found in CNEs, we developed a ch(icken) Combined Annotation-Dependent Depletion (chCADD) model, a variant effect prediction tool first introduced for humans and later on for mouse and pig. We show that 73 Mb of the chicken genome has been conserved across more than 280 million years of vertebrate evolution. The vast majority of the conserved elements are in non-protein-coding regions, which display SNP densities and allele frequency distributions characteristic of genomic regions constrained by purifying selection. By annotating SNPs with the chCADD score we are able to pinpoint specific subregions of the CNEs to be of higher functional importance, as supported by the finding that SNPs in these subregions are associated with known disease genes in humans, mice, and rats. Taken together, our findings indicate that CNEs harbor variants of functional significance that should be the object of further investigation along with protein-coding mutations. We therefore anticipate chCADD to be of great use to the scientific community and breeding companies in future functional studies in chicken.
Subject(s)
Chickens/genetics , DNA, Intergenic/genetics , Genomics/methods , Alleles , Animals , Conserved Sequence/genetics , DNA, Intergenic/metabolism , Evolution, Molecular , Gene Frequency/genetics , Genetic Variation/genetics , Genome/genetics , Introns/genetics , Metagenomics/methods , Polymorphism, Single Nucleotide/genetics , Sequence Analysis/methods
ABSTRACT
INTRODUCTION: With increasing age, neuropathological substrates associated with Alzheimer's disease (AD) accumulate in brains of cognitively healthy individuals. Are they resilient, or resistant, to AD-associated neuropathologies? METHODS: In 85 centenarian brains, we correlated NIA (amyloid) stages, Braak (neurofibrillary tangle) stages, and CERAD (neuritic plaque) scores with cognitive performance close to death as determined by Mini-Mental State Examination (MMSE) scores. We assessed centenarian brains against 2131 brains from AD patients, non-AD demented, and non-demented individuals in an age continuum ranging from 16 to 100+ years. RESULTS: With age, brains from non-demented individuals reached the NIA and Braak stages observed in AD patients, while CERAD scores remained lower. In centenarians, NIA stages varied (22.4% were the highest stage 3), Braak stages rarely exceeded stage IV (5.9% were V), and CERAD scores rarely exceeded 2 (4.7% were 3); within these distributions, we observed no correlation with the MMSE (NIA: P = 0.60; Braak: P = 0.08; CERAD: P = 0.16). DISCUSSION: Cognitive health can be maintained despite the accumulation of high levels of AD-related neuropathological substrates. HIGHLIGHTS: Cognitively healthy elderly have AD neuropathology levels similar to AD patients. AD neuropathology loads do not correlate with cognitive performance in centenarians. Some centenarians are resilient to the highest levels of AD neuropathology.
Subject(s)
Alzheimer Disease , Neurofibrillary Tangles , Aged, 80 and over , Humans , Aged , Adolescent , Young Adult , Adult , Middle Aged , Neurofibrillary Tangles/pathology , Plaque, Amyloid/pathology , Centenarians , Alzheimer Disease/pathology , Brain/pathology
ABSTRACT
INTRODUCTION: Neuropathological substrates associated with neurodegeneration occur in brains of the oldest old. How does this affect cognitive performance? METHODS: The 100-plus Study is an ongoing longitudinal cohort study of centenarians who self-report to be cognitively healthy; post mortem brain donation is optional. In 85 centenarian brains, we explored the correlations between the levels of 11 neuropathological substrates with ante mortem performance on 12 neuropsychological tests. RESULTS: Levels of neuropathological substrates varied: we observed levels up to Thal-amyloid beta phase 5, Braak-neurofibrillary tangle (NFT) stage V, Consortium to Establish a Registry for Alzheimer's Disease (CERAD)-neuritic plaque score 3, Thal-cerebral amyloid angiopathy stage 3, Tar-DNA binding protein 43 (TDP-43) stage 3, hippocampal sclerosis stage 1, Braak-Lewy bodies stage 6, atherosclerosis stage 3, cerebral infarcts stage 1, and cerebral atrophy stage 2. Granulovacuolar degeneration occurred in all centenarians. Some high performers had the highest neuropathology scores. DISCUSSION: Only Braak-NFT stage and limbic-predominant age-related TDP-43 encephalopathy (LATE) pathology associated significantly with performance across multiple cognitive domains. Of all cognitive tests, the clock-drawing test was particularly sensitive to levels of multiple neuropathologies.
Subject(s)
Alzheimer Disease , Amyloid beta-Peptides , Aged, 80 and over , Humans , Amyloid beta-Peptides/metabolism , Centenarians , Longitudinal Studies , Alzheimer Disease/pathology , Brain/pathology , Neurofibrillary Tangles/pathology , Neuropathology , Cognition
ABSTRACT
The IMGT database profiles the TR germline alleles for all four TR loci (TRA, TRB, TRG and TRD); however, it does not include information on the population specificity and allelic frequencies of these germline alleles. The specificity of allelic variants to different human populations can, however, be a rich source of information when studying the genetic basis of population-specific immune responses in disease and in vaccination. Therefore, we meticulously identified true germline alleles enriched with complete TR allele sequences and their frequencies across 26 different human populations, profiled by "1000 Genomes data". We identified 205 TRAV, 249 TRBV, 16 TRGV and 5 TRDV germline alleles supported by at least four haplotypes. The diversity of germline allelic variants in the TR loci is the highest in Africans, while the majority of the non-African alleles are specific to the Asian populations, suggesting a diverse profile of TR germline alleles in different human populations. Interestingly, the alleles in the IMGT database are frequent and common across all five super-populations. We believe that this new set of germline TR sequences represents a valuable new resource, which we have made available through the new population-matched TR (pmTR) database, accessible via https://pmtrig.lumc.nl/.
Subject(s)
Germ Cells , Receptors, Antigen, T-Cell , Alleles , Humans , Receptors, Antigen, T-Cell/genetics
ABSTRACT
Population-scale expression profiling studies can provide valuable insights into biological and disease-underlying mechanisms. The availability of phenotypic traits is essential for studying clinical effects. Therefore, missing, incomplete, or inaccurate phenotypic information can make analyses challenging and prevent RNA-seq or other omics data from being reused. A possible solution is to use predictors that infer clinical or behavioral phenotypic traits from molecular data. While such predictors have been developed based on different omics data types and are being applied in various studies, metabolomics-based surrogates are less commonly used than predictors based on DNA methylation profiles. In this study, we inferred 17 traits, including diabetes status and exposure to lipid medication, using previously trained metabolomic predictors. We evaluated whether these metabolomic surrogates can be used as an alternative to reported information for studying the respective phenotypes using expression profiling data of four population cohorts. For the majority of the 17 traits, the metabolomic surrogates performed similarly to the reported phenotypes in terms of effect sizes, number of significant associations, replication rates, and significantly enriched pathways. The application of metabolomics-derived surrogate outcomes opens new possibilities for reuse of multi-omics data sets. In studies where availability of clinical metadata is limited, missing or incomplete information can be complemented by these surrogates, thereby increasing the size of available data sets. Additionally, such surrogates could be used to correct for potential biological confounding. In the future, it would be interesting to further investigate the use of molecular predictors across different omics types and cohorts.
Subject(s)
Metabolomics , Phenotype
ABSTRACT
MOTIVATION: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data which are not available for this task. However, a very large amount of protein sequences without functional labels is available. RESULTS: We applied an existing deep sequence model that had been pretrained in an unsupervised setting on the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. Also, it partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that 3D structure is also potentially learned during the unsupervised pretraining. AVAILABILITY AND IMPLEMENTATION: Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Proteins , Software , Amino Acid Sequence , Neural Networks, Computer , Proteins/genetics
ABSTRACT
The integration of metabolomics data with sequencing data is a key step towards improving the diagnostic process for finding the disease-causing genetic variant(s) in patients suspected of having an inborn error of metabolism (IEM). The measured metabolite levels could provide additional phenotypic evidence to elucidate the degree of pathogenicity of variants found in genes associated with metabolic processes. We present a computational approach, called Reafect, that calculates, for each reaction in a metabolic pathway, a score indicating whether that reaction is deficient. When calculating this score, Reafect takes multiple factors into account: the magnitude and sign of alterations in the metabolite levels, the reaction distances between metabolites and reactions in the pathway, and the biochemical directionality of the reactions. We applied Reafect to untargeted metabolomics data of 72 patient samples with a known IEM and found that in 81% of the cases the correct deficient enzyme was ranked within the top 5% of all considered enzyme deficiencies. Next, we integrated Reafect with Combined Annotation Dependent Depletion (CADD) scores (a measure of gene variant deleteriousness) and ranked the metabolic genes of 27 IEM patients. We observed that this integrated approach significantly improved the prioritization of the genes containing the disease-causing variant when compared with either approach individually. For 15/27 IEM patients the correct affected gene was ranked within the top 0.25% of the set of potentially affected genes. Together, our findings suggest that metabolomics data improve the identification of affected genes in patients with an IEM.