ABSTRACT
Co-occurrence of diseases decreases patient quality of life, complicates treatment choices, and increases mortality. Analyses of electronic health records present a complex scenario of comorbidity relationships that vary by age, sex, and cohort under study. The study of similarities between diseases using 'omics data, such as genes altered in diseases, gene expression, proteome, and microbiome, is fundamental to uncovering the origin of, and potential treatment for, comorbidities. Recent studies have produced a first generation of genetic interpretations for as much as 46% of the comorbidities described in large cohorts. Integrating different sources of molecular information and using artificial intelligence (AI) methods are promising approaches for the study of comorbidities. They may help to improve the treatment of comorbidities, including the potential repositioning of drugs.
Subject(s)
Artificial Intelligence , Quality of Life , Humans , Comorbidity
ABSTRACT
Here we describe the LifeTime Initiative, which aims to track, understand and target human cells during the onset and progression of complex diseases, and to analyse their response to therapy at single-cell resolution. This mission will be implemented through the development, integration and application of single-cell multi-omics and imaging, artificial intelligence and patient-derived experimental disease models during the progression from health to disease. The analysis of large molecular and clinical datasets will identify molecular mechanisms, create predictive computational models of disease progression, and reveal new drug targets and therapies. The timely detection and interception of disease embedded in an ethical and patient-centred vision will be achieved through interactions across academia, hospitals, patient associations, health data management systems and industry. The application of this strategy to key medical challenges in cancer, neurological and neuropsychiatric disorders, and infectious, chronic inflammatory and cardiovascular diseases at the single-cell level will usher in cell-based interceptive medicine in Europe over the next decade.
Subject(s)
Cell- and Tissue-Based Therapy , Delivery of Health Care/methods , Delivery of Health Care/trends , Medicine/methods , Medicine/trends , Pathology , Single-Cell Analysis , Artificial Intelligence , Delivery of Health Care/ethics , Delivery of Health Care/standards , Early Diagnosis , Education, Medical , Europe , Female , Health , Humans , Legislation, Medical , Male , Medicine/standards
ABSTRACT
According to the Principle of Minimal Frustration, folded proteins can only have a minimal number of strong energetic conflicts in their native states. However, not all interactions are energetically optimized for folding but some remain in energetic conflict, i.e. they are highly frustrated. This remaining local energetic frustration has been shown to be statistically correlated with distinct functional aspects such as protein-protein interaction sites, allosterism and catalysis. Fuelled by the recent breakthroughs in efficient protein structure prediction that have made available good quality models for most proteins, we have developed a strategy to calculate local energetic frustration within large protein families and quantify its conservation over evolutionary time. Based on this evolutionary information we can identify how stability and functional constraints have appeared at the common ancestor of the family and have been maintained over the course of evolution. Here, we present FrustraEvo, a web server tool to calculate and quantify the conservation of local energetic frustration in protein families.
Subject(s)
Internet , Protein Folding , Proteins , Software , Proteins/chemistry , Thermodynamics , Protein Conformation , Evolution, Molecular , Models, Molecular
ABSTRACT
MOTIVATION: The interpretation of genomic data is crucial to understand the molecular mechanisms of biological processes. Protein structures play a vital role in facilitating this interpretation by providing functional context to genetic coding variants. However, mapping genes to proteins is a tedious and error-prone task due to inconsistencies in data formats. Over the past two decades, numerous tools and databases have been developed to automatically map annotated positions and variants to protein structures. However, most of these tools are web-based and not well-suited for large-scale genomic data analysis. RESULTS: To address this issue, we introduce 3Dmapper, a stand-alone command-line tool developed in Python and R. It systematically maps annotated protein positions and variants to protein structures, providing a solution that is both efficient and reliable. AVAILABILITY AND IMPLEMENTATION: https://github.com/vicruiser/3Dmapper.
Subject(s)
Biological Specimen Banks , Software , Proteins/chemistry , Genomics
ABSTRACT
Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.
Subject(s)
Biomedical Research , Genome, Human , Human Genome Project , Europe , Humans
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
BACKGROUND: Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous condition. We hypothesized that the unbiased integration of different COPD lung omics using a novel multi-layer approach may unravel mechanisms associated with clinical characteristics. METHODS: We profiled mRNA, miRNA and methylome in lung tissue samples from 135 former smokers with COPD. For each omic (layer) we built a patient network based on molecular similarity. The three networks were used to build a multi-layer network, and optimization of multiplex-modularity was employed to identify patient communities across the three distinct layers. Uncovered communities were related to clinical features. RESULTS: We identified five patient communities in the multi-layer network which were molecularly distinct and related to clinical characteristics, such as FEV1 and blood eosinophils. Two communities (C#3 and C#4) had both similarly low FEV1 values and emphysema, but were molecularly different: C#3, but not C#4, presented B and T cell signatures and a downregulation of secretory (SCGB1A1/SCGB3A1) and ciliated cells. A machine learning model was set up to discriminate C#3 and C#4 in our cohort, and to validate them in an independent cohort. Finally, using spatial transcriptomics we characterized the small airway differences between C#3 and C#4, identifying an upregulation of T/B cell homing chemokines, and bacterial response genes in C#3. CONCLUSIONS: A novel multi-layer network analysis is able to identify clinically relevant COPD patient communities. Patients with similarly low FEV1 and emphysema can have molecularly distinct small airways and immune response patterns, indicating that different endotypes can lead to similar clinical presentation.
ABSTRACT
BACKGROUND: The co-administration of drugs known to interact greatly impacts morbidity, mortality, and health economics. This study aims to examine the drug-drug interaction (DDI) phenomenon with a large-scale longitudinal analysis of age and gender differences found in drug administration data from three distinct healthcare systems. METHODS: This study analyzes drug administrations from population-wide electronic health records in Blumenau (Brazil; 133 K individuals), Catalonia (Spain; 5.5 M individuals), and Indianapolis (USA; 264 K individuals). The stratified prevalences of DDI for multiple severity levels per patient gender and age at the time of administration are computed, and null models are used to estimate the expected impact of polypharmacy on DDI prevalence. Finally, to study actionable strategies to reduce DDI prevalence, alternative polypharmacy regimens using drugs with fewer known interactions are simulated. RESULTS: A large prevalence of co-administration of drugs known to interact is found in all populations, affecting 12.51%, 12.12%, and 10.06% of individuals in Blumenau, Indianapolis, and Catalonia, respectively. Despite very different healthcare systems and drug availability, the increasing prevalence of DDI as patients age is very similar across all three populations and is not explained solely by higher co-administration rates in the elderly. In general, the prevalence of DDI is significantly higher in women - with the exception of men over 50 years old in Indianapolis. Finally, we show that using proton pump inhibitor alternatives to omeprazole (the drug involved in more co-administrations in Catalonia and Blumenau), the proportion of patients that are administered known DDI can be reduced by up to 21% in both Blumenau and Catalonia and 2% in Indianapolis. CONCLUSIONS: DDI administration has a high incidence in society, regardless of geographic, population, and healthcare management differences. 
Although DDI prevalence increases with age, our analysis points to a complex phenomenon that is much more prevalent than expected, suggesting comorbidities as key drivers of the increase. Furthermore, the gender differences observed in most age groups across populations are concerning in regard to gender equity in healthcare. Finally, our study exemplifies how electronic health records' analysis can lead to actionable interventions that significantly reduce the administration of known DDI and its associated human and economic costs.
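The stratified prevalence computation described in the methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the record fields (`gender`, `age_group`, `has_ddi`), the function name `stratified_prevalence`, and the toy patients are all hypothetical, and the null-model and regimen-simulation steps are not covered.

```python
from collections import defaultdict


def stratified_prevalence(patients):
    """Fraction of patients with at least one known DDI, per (gender, age group).

    `patients` is an iterable of dicts with hypothetical keys
    'gender', 'age_group' and 'has_ddi' (bool).
    """
    totals = defaultdict(int)
    affected = defaultdict(int)
    for p in patients:
        stratum = (p["gender"], p["age_group"])
        totals[stratum] += 1
        if p["has_ddi"]:
            affected[stratum] += 1
    # Prevalence = affected patients / all patients in the stratum.
    return {s: affected[s] / totals[s] for s in totals}


# Hypothetical toy records, for illustration only.
toy_patients = [
    {"gender": "F", "age_group": "60+", "has_ddi": True},
    {"gender": "F", "age_group": "60+", "has_ddi": False},
    {"gender": "M", "age_group": "60+", "has_ddi": False},
    {"gender": "F", "age_group": "20-39", "has_ddi": False},
]
prevalence = stratified_prevalence(toy_patients)
```

Comparing such per-stratum values against those obtained from a null model is what would separate the effect of polypharmacy from the effect of age or gender.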
Subject(s)
Polypharmacy , Male , Humans , Female , Aged , Middle Aged , Pharmaceutical Preparations , Prevalence , Drug Interactions , Comorbidity
ABSTRACT
BACKGROUND: Neoantigens are patient- and tumor-specific peptides that arise from somatic mutations. They stand as promising targets for personalized therapeutic cancer vaccines. The identification process for neoantigens has evolved with the use of next-generation sequencing technologies and bioinformatic tools in tumor genomics. However, in-silico strategies for selecting immunogenic neoantigens still have very low accuracy rates, since they mainly focus on predicting peptide binding to Major Histocompatibility Complex (MHC) molecules, which is key but not the sole determinant for immunogenicity. Moreover, the therapeutic potential of neoantigen-based vaccines may be enhanced using an optimal delivery platform that elicits robust de novo immune responses. METHODS: We developed a novel neoantigen selection pipeline based on existing software combined with a novel prediction method, the Neoantigen Optimization Algorithm (NOAH), which takes into account structural features of the peptide/MHC-I interaction, as well as the interaction between the peptide/MHC-I complex and the TCR, in its prediction strategy. Moreover, to maximize neoantigens' therapeutic potential, neoantigen-based vaccines should be manufactured in an optimal delivery platform that elicits robust de novo immune responses and bypasses central and peripheral tolerance. RESULTS: We generated a highly immunogenic vaccine platform based on engineered HIV-1 Gag-based Virus-Like Particles (VLPs) expressing a high copy number of each in silico selected neoantigen. We tested different neoantigen-loaded VLPs (neoVLPs) in a B16-F10 melanoma mouse model to evaluate their capability to generate new immunogenic specificities. NeoVLPs were used in in vivo immunogenicity and tumor challenge experiments. CONCLUSIONS: Our results indicate the relevance of incorporating other immunogenic determinants beyond the binding of neoantigens to MHC-I. 
Thus, neoVLPs loaded with neoantigens enhancing the interaction with the TCR can promote the generation of de novo antitumor-specific immune responses, resulting in a delay in tumor growth. Vaccination with the neoVLP platform is a robust alternative to current therapeutic vaccine approaches and a promising candidate for future personalized immunotherapy.
Subject(s)
Cancer Vaccines , Neoplasms , Vaccines , Humans , Animals , Mice , Neoplasms/genetics , Antigens, Neoplasm/metabolism , Peptides , Receptors, Antigen, T-Cell/metabolism , Immunotherapy/methods
ABSTRACT
In mammalian cells, chromosomal replication starts at thousands of origins at which replisomes are assembled. Replicative stress triggers additional initiation events from 'dormant' origins whose genomic distribution and regulation are not well understood. In this study, we have analyzed origin activity in mouse embryonic stem cells in the absence or presence of mild replicative stress induced by aphidicolin, a DNA polymerase inhibitor, or by deregulation of origin licensing factor CDC6. In both cases, we observe that the majority of stress-responsive origins are also active in a small fraction of the cell population in a normal S phase, and stress increases their frequency of activation. In a search for the molecular determinants of origin efficiency, we compared the genetic and epigenetic features of origins displaying different levels of activation, and integrated their genomic positions in three-dimensional chromatin interaction networks derived from high-depth Hi-C and promoter-capture Hi-C data. We report that origin efficiency is directly proportional to the proximity to transcriptional start sites and to the number of contacts established between origin-containing chromatin fragments, supporting the organization of origins in higher-level DNA replication factories.
Subject(s)
Chromatin , Replication Origin , Animals , Mice , Replication Origin/genetics , Chromatin/genetics , Mouse Embryonic Stem Cells/metabolism , DNA Replication/genetics , Cell Cycle Proteins/metabolism , Mammals/genetics
ABSTRACT
The MYC axis is disrupted in cancer, predominantly through activation of the MYC family oncogenes but also through inactivation of the MYC partner MAX or of the MAX partner MGA. MGA and MAX are also members of the polycomb repressive complex, ncPRC1.6. Here, we use genetically modified MAX-deficient small-cell lung cancer (SCLC) cells and carry out genome-wide and proteomics analyses to study the tumor suppressor function of MAX. We find that MAX mutant SCLCs have ASCL1 or NEUROD1 or combined ASCL1/NEUROD1 characteristics and lack MYC transcriptional activity. MAX restitution triggers prodifferentiation expression profiles that shift when MAX and oncogenic MYC are coexpressed. Although ncPRC1.6 can be formed, the lack of MAX restricts global MGA occupancy, selectively driving its recruitment toward E2F6-binding motifs. Conversely, MAX restitution enhances MGA occupancy to repress genes involved in different functions, including stem cell and DNA repair/replication. Collectively, these findings reveal that MAX mutant SCLCs have either ASCL1 or NEUROD1 or combined characteristics and are MYC independent and exhibit deficient ncPRC1.6-mediated gene repression.
Subject(s)
Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/metabolism , Basic Helix-Loop-Helix Transcription Factors/metabolism , Gene Expression Regulation, Neoplastic , Lung Neoplasms/pathology , Polycomb-Group Proteins/metabolism , Proto-Oncogene Proteins c-myc/metabolism , Small Cell Lung Carcinoma/pathology , Apoptosis , Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/genetics , Basic Helix-Loop-Helix Transcription Factors/genetics , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Cell Proliferation , Humans , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Polycomb-Group Proteins/genetics , Promoter Regions, Genetic , Proto-Oncogene Proteins c-myc/genetics , Small Cell Lung Carcinoma/genetics , Small Cell Lung Carcinoma/metabolism , Tumor Cells, Cultured
ABSTRACT
MOTIVATION: The analysis of cancer genomes provides fundamental information about its etiology, the processes driving cell transformation or potential treatments. While researchers and clinicians are often only interested in the identification of oncogenic mutations, actionable variants or mutational signatures, the first crucial step in the analysis of any tumor genome is the identification of somatic variants in cancer cells (i.e. those that have been acquired during their evolution). For that purpose, a wide range of computational tools have been developed in recent years to detect somatic mutations in sequencing data from tumor samples. While there have been some efforts to benchmark somatic variant calling tools and strategies, the extent to which variant calling decisions impact the results of downstream analyses of tumor genomes remains unknown. RESULTS: Here, we quantify the impact of variant calling decisions by comparing the results obtained in three important analyses of cancer genomics data (identification of cancer driver genes, quantification of mutational signatures and detection of clinically actionable variants) when changing the somatic variant caller (MuSE, MuTect2, SomaticSniper and VarScan2) or the strategy to combine them (Consensus of two, Consensus of three and Union) across all 33 cancer types from The Cancer Genome Atlas. Our results show that variant calling decisions have a significant impact on these analyses, creating important differences that could even impact treatment decisions for some patients. Moreover, the Consensus of three calling strategy to combine the output of multiple variant calling tools, a very widely used strategy by the research community, can lead to the loss of some cancer driver genes and actionable mutations. 
Overall, our results highlight the limitations of widespread practices within the cancer genomics community and point to important differences in critical analyses of tumor sequencing data depending on variant calling, affecting even the identification of clinically actionable variants. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/carlosgarciaprieto/VariantCallingClinicalBenchmark. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
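The three combination strategies named above can be illustrated with a minimal sketch. It assumes that "Consensus of two" and "Consensus of three" keep variants reported by at least two or at least three of the four callers, and that "Union" keeps variants reported by any caller; the abstract does not define the strategies. The function name `combine_calls` and the toy variants are hypothetical and are not taken from the linked repository.

```python
from collections import Counter


def combine_calls(callsets, min_callers):
    """Return the variants reported by at least `min_callers` callers.

    `callsets` maps a caller name to a set of variants, each variant being
    a (chrom, pos, ref, alt) tuple. min_callers=1 gives the union.
    """
    support = Counter(v for calls in callsets.values() for v in set(calls))
    return {v for v, n in support.items() if n >= min_callers}


# Hypothetical toy call sets, for illustration only.
toy_calls = {
    "MuSE": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "MuTect2": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"),
                ("chr3", 300, "C", "A")},
    "SomaticSniper": {("chr1", 100, "A", "T")},
    "VarScan2": {("chr1", 100, "A", "T"), ("chr4", 400, "T", "G")},
}

union = combine_calls(toy_calls, 1)
consensus_of_two = combine_calls(toy_calls, 2)
consensus_of_three = combine_calls(toy_calls, 3)
```

In this toy case the variant on chr2 is supported by only two callers, so it survives Consensus of two but is dropped by Consensus of three, which is the kind of loss the abstract warns about for driver genes and actionable mutations.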
Subject(s)
High-Throughput Nucleotide Sequencing , Neoplasms , Humans , High-Throughput Nucleotide Sequencing/methods , Mutation , Genomics , Neoplasms/genetics , Oncogenes , Carcinogenesis/genetics , Software
ABSTRACT
The protein structure field is experiencing a revolution, from the increased throughput of techniques to determine experimental structures, to developments such as cryo-EM that allow us to find the structures of large protein complexes and, more recently, the development of artificial intelligence tools, such as AlphaFold, that can predict with high accuracy the folding of proteins for which the availability of homology templates is limited. Here we quantify the effect of the recently released AlphaFold database of protein structural models on our knowledge of human proteins. Our results indicate that our current baseline for structural coverage of 48%, considering experimentally-derived or template-based homology models, rises to 76% when including AlphaFold predictions. At the same time, the fraction of dark proteome is reduced from 26% to just 10% when AlphaFold models are considered. Furthermore, although the coverage of disease-associated genes and mutations was near complete before the AlphaFold release (69% of ClinVar pathogenic mutations and 88% of oncogenic mutations), AlphaFold models still provide an additional coverage of 3% to 13% of these critically important sets of biomedical genes and mutations. Finally, we show how the contribution of AlphaFold models to the structural coverage of non-human organisms, including important pathogenic bacteria, is significantly larger than for the human proteome. Overall, our results show that the sequence-structure gap of human proteins has almost disappeared, an outstanding success with direct consequences for our knowledge of the human genome and the derived medical applications.
Subject(s)
Protein Folding , Proteome , Proteomics/methods , Humans , Models, Molecular , Proteome/analysis , Proteome/chemistry , Proteome/metabolism
ABSTRACT
Cohesin exists in two variants containing STAG1 or STAG2. STAG2 is one of the most mutated genes in cancer and a major bladder tumor suppressor. Little is known about how its inactivation contributes to tumorigenesis. Here, we analyze the genomic distribution of STAG1 and STAG2 and perform STAG2 loss-of-function experiments using RT112 bladder cancer cells; we then analyze the genomic effects by integrating gene expression and chromatin interaction data. Functional compartmentalization exists between the cohesin complexes: cohesin-STAG2 displays a distinctive genomic distribution and mediates short and mid-ranged interactions that engage genes at higher frequency than those established by cohesin-STAG1. STAG2 knockdown results in down-regulation of the luminal urothelial signature and up-regulation of the basal transcriptional program, mirroring differences between STAG2-high and STAG2-low human bladder tumors. This is accompanied by rewiring of DNA contacts within topological domains, while compartments and domain boundaries remain refractive. Contacts lost upon depletion of STAG2 are assortative, preferentially occur within silent chromatin domains, and are associated with de-repression of lineage-specifying genes. Our findings indicate that STAG2 participates in the DNA looping that keeps the basal transcriptional program silent and thus sustains the luminal program. This mechanism may contribute to the tumor suppressor function of STAG2 in the urothelium.
Subject(s)
Cell Cycle Proteins/genetics , Chromatin/chemistry , Loss of Function Mutation , Nuclear Proteins/genetics , Transcription, Genetic , Urinary Bladder Neoplasms/genetics , Base Sequence , Cell Cycle Proteins/antagonists & inhibitors , Cell Cycle Proteins/metabolism , Cell Line, Tumor , Chromatin/metabolism , Chromosomal Proteins, Non-Histone/genetics , Chromosomal Proteins, Non-Histone/metabolism , DNA, Neoplasm/genetics , DNA, Neoplasm/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Gene Ontology , HEK293 Cells , Histones/genetics , Histones/metabolism , Humans , Molecular Sequence Annotation , Nuclear Proteins/metabolism , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Signal Transduction , Urinary Bladder Neoplasms/metabolism , Urinary Bladder Neoplasms/pathology
ABSTRACT
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiological agent of COVID-19, is considered a zoonotic pathogen mainly transmitted human to human. Few reports indicate that pets may be exposed to the virus. The present report describes a cat suffering from severe respiratory distress and thrombocytopenia living with a family with several members affected by COVID-19. Clinical signs of the cat prompted humane euthanasia and a detailed postmortem investigation to assess whether a COVID-19-like disease was causing the condition. Necropsy results showed the animal suffered from feline hypertrophic cardiomyopathy and severe pulmonary edema and thrombosis. SARS-CoV-2 RNA was only detected in the nasal swab, nasal turbinates, and mesenteric lymph node, but no evidence of histopathological lesions compatible with a viral infection was detected. The cat seroconverted against SARS-CoV-2, further evidencing a productive infection in this animal. We conclude that the animal had a subclinical SARS-CoV-2 infection concomitant to an unrelated cardiomyopathy that led to euthanasia.
Subject(s)
Betacoronavirus/isolation & purification , Cardiomyopathy, Hypertrophic/veterinary , Coronavirus Infections/veterinary , Pandemics/veterinary , Pneumonia, Viral/veterinary , Animals , COVID-19 , Cardiomyopathy, Hypertrophic/pathology , Cardiomyopathy, Hypertrophic/virology , Cats , Coronavirus Infections/complications , Coronavirus Infections/pathology , Fatal Outcome , Humans , Incidental Findings , Pneumonia, Viral/complications , Pneumonia, Viral/pathology , SARS-CoV-2
ABSTRACT
Introns can be extraordinarily large and they account for the majority of the DNA sequence in human genes. However, little is known about their population patterns of structural variation and their functional implication. By combining the most extensive maps of CNVs in human populations, we have found that intronic losses are the most frequent copy number variants (CNVs) in protein-coding genes in humans, with 12,986 intronic deletions, affecting 4,147 genes (including 1,154 essential genes and 1,638 disease-related genes). This intronic length variation results in dozens of genes showing extreme population variability in size, with 40 genes with 10 or more different sizes and up to 150 allelic sizes. Intronic losses are frequent in evolutionarily ancient genes that are highly conserved at the protein sequence level. This result contrasts with losses overlapping exons, which are observed less often than expected by chance and almost exclusively affect primate-specific genes. An integrated analysis of CNVs and RNA-seq data showed that intronic loss can be associated with significant differences in gene expression levels in the population (CNV-eQTLs). These intronic CNV-eQTL regions are enriched for intronic enhancers and can be associated with expression differences of other genes showing long-distance intron-promoter 3D interactions. Our data suggest that intronic structural variation of protein-coding genes makes an important contribution to the variability of gene expression and splicing in human populations.
Subject(s)
DNA Copy Number Variations/genetics , Evolution, Molecular , Genetics, Population , Quantitative Trait Loci/genetics , Alleles , Exons/genetics , Gene Dosage/genetics , Gene Expression Regulation , Genome, Human/genetics , Humans , Introns/genetics , RNA Splicing/genetics
ABSTRACT
Challenges of health systems in Latin America and the Caribbean include accessibility, inequity, segmentation, and poverty. These challenges are similar in different countries of the region and transcend national borders. The increasing digital transformation of health care holds promise of more precise interventions, improved health outcomes, increased efficiency, and ultimately reduced health-care costs. In Latin America and the Caribbean, the adoption of digital health tools is in early stages and the quality of cancer registries, electronic health records, and structured databases is problematic. Cancer research and innovation in the region are limited due to inadequate academic resources and translational research is almost fully dependent on public funding. Regulatory complexity and extended timelines jeopardise the potential improvement in participation in international studies. Emerging technologies, artificial intelligence, big data, and cancer research represent an opportunity to address the health-care challenges in Latin America and the Caribbean collectively, by optimising national capacities, sharing and comparing best practices, and transferring scientific and technical capabilities.
Subject(s)
Biomedical Research/trends , Neoplasms/prevention & control , Precision Medicine/trends , Artificial Intelligence , Big Data , Biomedical Research/statistics & numerical data , Caribbean Region/epidemiology , Digital Technology , Electronic Health Records , Humans , Latin America/epidemiology , Neoplasms/epidemiology , Precision Medicine/statistics & numerical data
ABSTRACT
Alternative splicing is commonly believed to be a major source of cellular protein diversity. However, although many thousands of alternatively spliced transcripts are routinely detected in RNA-seq studies, reliable large-scale mass spectrometry-based proteomics analyses identify only a small fraction of annotated alternative isoforms. The clearest finding from proteomics experiments is that most human genes have a single main protein isoform, while those alternative isoforms that are identified tend to be the most biologically plausible: those with the most cross-species conservation and those that do not compromise functional domains. Indeed, most alternative exons do not seem to be under selective pressure, suggesting that a large majority of predicted alternative transcripts may not even be translated into proteins.
Subject(s)
Alternative Splicing/genetics , Proteome/genetics , Exons , Protein Isoforms/genetics , Proteomics
ABSTRACT
We present the results of the assessment of the intramolecular residue-residue contact and distance predictions from groups participating in the 14th round of the CASP experiment. The performance of contact prediction methods was evaluated with the measures used in previous CASPs, while distance predictions were assessed based on a new protocol, which considers individual distance pairs as well as the whole predicted distance matrix, using a graph-based framework. The results of the evaluation indicate that predictions by the tFold framework, TripletRes and DeepPotential were the most accurate in both categories. With regard to progress in method performance, the results of the assessment in contact prediction did not reveal any discernible difference when compared to CASP13. Arguably, this could be due to CASP14 FM targets being more challenging than ever before.
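The abstract does not list the contact measures it refers to. One widely used measure in contact prediction is the precision of the top L/k highest-scoring predicted contacts, where L is the sequence length. The sketch below illustrates that measure only; the function name `top_lk_precision`, the default values of `k` and `min_sep`, and the toy data are assumptions, not the assessors' protocol.

```python
def top_lk_precision(predicted, true_contacts, seq_len, k=5, min_sep=24):
    """Precision of the top L/k predicted contacts.

    predicted:     list of (i, j, score) tuples with residue indices i, j.
    true_contacts: set of (i, j) pairs with i < j observed in the structure.
    seq_len:       sequence length L.
    min_sep:       minimum sequence separation |i - j| to consider a pair.
    """
    # Keep only pairs far enough apart in sequence, best scores first.
    eligible = [(i, j, s) for i, j, s in predicted if abs(i - j) >= min_sep]
    eligible.sort(key=lambda t: t[2], reverse=True)
    top = eligible[:max(1, seq_len // k)]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in true_contacts for i, j, _ in top)
    return hits / len(top)


# Hypothetical toy example: L = 10, so the top L/5 = 2 pairs are scored.
toy_predicted = [(0, 9, 0.9), (1, 6, 0.8), (2, 7, 0.7), (3, 4, 0.99)]
toy_true = {(0, 9), (2, 7)}
precision = top_lk_precision(toy_predicted, toy_true, seq_len=10, k=5, min_sep=3)
```

Here the pair (3, 4) is excluded by the separation filter despite its high score, and only one of the two remaining top-ranked pairs is a true contact.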