ABSTRACT
SUMMARY: Apart from the default NanoString nCounter software, most tools for normalizing NanoString gene expression data are R packages that focus on technical normalization and lack configurable parameters. However, content normalization is the most sensitive, experiment-specific, and relevant step in preprocessing NanoString data. Currently, this step requires multiple tools and a deep understanding of data management on the part of the researcher. We present GUANIN, a comprehensive normalization tool that integrates both new and well-established methods, offering a wide variety of options to introduce, filter, choose, and evaluate reference genes for content normalization. GUANIN allows genes from an endogenous subset to be introduced as reference genes, addressing problems related to housekeeping-gene selection. It applies a specific, straightforward normalization approach to each experiment, using a wide variety of parameters with suggested default values. GUANIN provides a large number of informative output files that enable iterative refinement of the normalization process. In terms of normalization performance, GUANIN matches or outperforms other available methods. Importantly, its easy-to-use graphical user interface (GUI) allows researchers to interact comprehensively with the data preprocessing step without programming knowledge. AVAILABILITY AND IMPLEMENTATION: GUANIN can be installed with pip install GUANIN and is available at https://pypi.org/project/guanin/. Source code, documentation, and case studies are available at https://github.com/julimontoto/guanin under the GPLv3 license.
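The minimal sketch below illustrates content normalization against a set of reference genes. It assumes one common scheme: each sample's normalization factor is the across-sample mean of reference-gene geometric means divided by that sample's own geometric mean. It is not GUANIN's API or its exact algorithm. The function names and the dict-of-dicts count layout are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def content_normalize(counts, reference_genes):
    """Scale each sample so its reference-gene geometric mean matches the cohort average.

    counts: hypothetical layout {sample: {gene: raw_count}}; reference-gene counts must be > 0.
    Returns (factors, normalized), where factors maps sample -> scaling factor.
    """
    # Per-sample geometric mean over the chosen reference genes.
    geo = {
        sample: geometric_mean([genes[g] for g in reference_genes])
        for sample, genes in counts.items()
    }
    # Target level: arithmetic mean of the per-sample geometric means.
    target = sum(geo.values()) / len(geo)
    factors = {sample: target / gm for sample, gm in geo.items()}
    normalized = {
        sample: {gene: count * factors[sample] for gene, count in genes.items()}
        for sample, genes in counts.items()
    }
    return factors, normalized


if __name__ == "__main__":
    raw = {
        "s1": {"REF_A": 10, "REF_B": 40, "TARGET": 100},
        "s2": {"REF_A": 20, "REF_B": 80, "TARGET": 100},
    }
    factors, norm = content_normalize(raw, ["REF_A", "REF_B"])
    print(factors)  # s1 is scaled up (about 1.5), s2 is scaled down (about 0.75)
```

Reference genes are assumed to be stable across samples, so any difference in their level is attributed to technical or input variation. Deciding which genes belong in `reference_genes` is the experiment-specific choice that GUANIN is designed to support.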
Subject(s)
Software , Gene Expression Profiling/methods , Humans , User-Computer Interface
ABSTRACT
Host genetics is a key determinant of COVID-19 outcomes. Previously, the COVID-19 Host Genetics Initiative genome-wide association study used common variants to identify multiple loci associated with COVID-19 outcomes. However, variants with the largest impact on COVID-19 outcomes are expected to be rare in the population. Studying rare variants may therefore provide additional insights into disease susceptibility and pathogenesis, thereby informing the development of therapeutics. Here, we combined whole-exome and whole-genome sequencing data from 21 cohorts across 12 countries and performed rare-variant, exome-wide burden analyses for COVID-19 outcomes. In an analysis of 5,085 severe disease cases and 571,737 controls, we observed that carrying a rare deleterious variant in the SARS-CoV-2 sensor toll-like receptor TLR7 (on chromosome X) was associated with a 5.3-fold increase in severe disease (95% CI: 2.75-10.05, p = 5.41 × 10⁻⁷). This association was consistent across sexes. These results further support TLR7 as a genetic determinant of severe disease and suggest that larger studies of rare variants influencing COVID-19 outcomes could provide additional insights.
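The gene-level burden idea can be sketched as follows. Each individual is collapsed to a single carrier indicator (any qualifying rare deleterious variant in the gene), and carrier counts are then compared between cases and controls. This minimal sketch computes an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval from the resulting 2×2 table. The abstract does not describe the study's statistical model (for example, covariate adjustment or how cohorts were combined), so this is not that model. The function names and genotype layout are hypothetical.

```python
import math


def is_carrier(genotypes, qualifying_variants):
    """True if the individual carries >=1 alternate allele at any qualifying variant.

    genotypes: hypothetical layout {variant_id: alt_allele_count} for one individual.
    """
    return any(genotypes.get(v, 0) > 0 for v in qualifying_variants)


def burden_table(cases, controls, qualifying_variants):
    """Collapse individuals into a 2x2 table (a, b, c, d) of carrier counts."""
    a = sum(is_carrier(g, qualifying_variants) for g in cases)     # carrier cases
    c = sum(is_carrier(g, qualifying_variants) for g in controls)  # carrier controls
    return a, len(cases) - a, c, len(controls) - c


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    All four cells must be > 0; no continuity correction is applied.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)
```

Collapsing to one indicator per gene pools many individually rare variants. This is what gives a burden test power where single-variant association tests would have too few carriers.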
Subject(s)
COVID-19 , Exome , Humans , Exome/genetics , Genome-Wide Association Study , COVID-19/genetics , Genetic Predisposition to Disease , Toll-Like Receptor 7/genetics , SARS-CoV-2/genetics
ABSTRACT
Superspreading and variants of concern (VOC) of the human pathogen severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are the main drivers of the coronavirus disease 2019 (COVID-19) pandemic. However, measuring their individual impact is challenging. By examining the largest database of SARS-CoV-2 genomes, the Global Initiative on Sharing Avian Influenza Data (GISAID; n > 1.2 million high-quality [HQ] sequences), we present evidence suggesting that superspreading has played a key role in the epidemiological predominance of VOC. The database contains clear signatures compatible with large superspreading events (SSEs) that coincide chronologically with the worst epidemiological scenarios triggered by VOC. The data suggest that, without the randomness of genetic drift facilitated by superspreading, new SARS-CoV-2 VOC would have had a more limited chance of success.
Subject(s)
COVID-19 , Pandemics , SARS-CoV-2/classification , Animals , Humans
ABSTRACT
BACKGROUND: Food protein-induced enterocolitis syndrome (FPIES) is a food allergy that primarily affects infants, often leading to vomiting and shock. Because its pathophysiology is poorly understood and specific biomarkers are lacking, diagnosis is frequently delayed. Understanding FPIES genetics can shed light on disease susceptibility and pathophysiology, which is key to developing diagnostic, prognostic, preventive and therapeutic strategies. Using a well-characterised cohort of patients, we explored potential genome-wide susceptibility factors underlying FPIES. METHODS: Blood samples from 41 patients with oral food challenge-proven FPIES were collected for a comprehensive whole-exome sequencing association study. RESULTS: Notable genetic variants, including rs872786 (RBM8A), rs2241880 (ATG16L1) and rs2289477 (ATG16L1), were identified as significant findings in FPIES. A weighted SKAT model identified six other associated genes, including DGKZ and SIRPA. DGKZ induces TGF-β signalling, which is crucial for epithelial barrier integrity and IgA production; RBM8A is associated with thrombocytopenia-absent radius syndrome, which is frequently accompanied by cow's milk allergy; SIRPA is associated with increased neutrophils/monocytes in inflamed tissues, as often observed in FPIES; and ATG16L1 is associated with inflammatory bowel disease. Co-expression correlation analysis revealed a functional correlation between RBM8A and the filaggrin gene (FLG) in stomach and intestinal tissue; filaggrin is a known key pathogenic and risk factor for IgE-mediated food allergy. A transcriptome-wide association study suggested that genetic variability in patients affected the expression of RBM8A (stomach and pancreas) and ATG16L1 (transverse colon). CONCLUSIONS: This study is the first case-control exome association study of FPIES patients and marks a crucial step towards unravelling the genetic susceptibility factors underpinning the syndrome.
Our findings highlight potential factors and pathways contributing to FPIES, including epithelial barrier dysfunction and immune dysregulation. While these results are novel, they are preliminary and need further validation in a second cohort of patients.
ABSTRACT
The human pathogen severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the major pandemic of the twenty-first century. We analyzed more than 4700 SARS-CoV-2 genomes and associated metadata retrieved from public repositories. SARS-CoV-2 sequences have high sequence identity (>99.9%), which drops to >96% when compared with the bat coronavirus genome. We built a mutation-annotated reference SARS-CoV-2 phylogeny with two main macro-haplogroups, A and B, both of Asian origin, and more than 160 sub-branches representing virus strains of variable geographical origin worldwide. The phylogeny reveals a rather uniform occurrence of mutations along branches, which could have implications for diagnostics and the design of future vaccines. Identifying the root of SARS-CoV-2 genomes is not without problems. Conflicting interpretations arise either from using bat coronavirus genomes as an outgroup or from relying on the sampling chronology of SARS-CoV-2 genomes and estimates of the time to the most recent common ancestor (TMRCA); however, the overall scenario favors haplogroup A as the ancestral node. Phylogenetic analysis indicates a TMRCA for SARS-CoV-2 genomes dating to November 12, 2019, which matches epidemiological records. Sub-haplogroup A2 most likely originated in Europe from an Asian ancestor and gave rise to subclade A2a, which represents the major non-Asian outbreak, especially in Africa and Europe. Multiple founder-effect episodes, most likely associated with super-spreader hosts, might explain the COVID-19 pandemic to a large extent.
Subject(s)
Betacoronavirus/genetics , Coronavirus Infections/epidemiology , Genome, Viral/genetics , Pneumonia, Viral/epidemiology , Animals , Asia/epidemiology , Base Sequence/genetics , COVID-19 , Chiroptera/virology , Chromosome Mapping , Europe/epidemiology , Evolution, Molecular , Genetic Variation/genetics , Humans , Pandemics , Phylogeny , Phylogeography , SARS-CoV-2 , Sequence Homology, Nucleic Acid
ABSTRACT
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the pathogen responsible for the coronavirus disease 2019 (COVID-19) pandemic. SARS-CoV-2 genomes have been sequenced massively worldwide and are now available in different public genome repositories. There is much interest in developing bioinformatic tools capable of analyzing and interpreting SARS-CoV-2 variation. We have designed CovidPhy (http://covidphy.eu), a web interface that can process SARS-CoV-2 genome sequences supplied in plain-text FASTA format or through identity codes from the Global Initiative on Sharing Avian Influenza Data (GISAID) or GenBank. CovidPhy aggregates information available in the large GISAID database (>1.49 M genomes). Sequences are first aligned against the reference sequence. The interface then provides different sources of information, including automatic classification of genomes into a pre-computed phylogeny, phylogeographic information, haplogroup/lineage frequencies, and sequence variation. It also indicates whether the genome contains known variants of concern (VOC). Additionally, CovidPhy allows users to search for variants and haplotypes of their choice. It also includes a list of genomes that are good candidates for being responsible for large outbreaks worldwide, most likely mediated by major superspreading events, indicating their possible geographic epicenters and their relative impact as recorded in the GISAID database.
Subject(s)
COVID-19 , Genome, Viral , Phylogeny , SARS-CoV-2 , COVID-19/virology , Databases, Genetic , Humans , Internet , Pandemics , Phylogeography , SARS-CoV-2/genetics , Software
ABSTRACT
Coronavirus disease 2019 (COVID-19) symptoms range from mild to severe illness; the cause of this differential response to infection remains unknown. Unravelling the immune mechanisms acting at different levels of the colonization process might be key to understanding these differences. Using nCounter technology, we carried out a multi-tissue (nasal, buccal and blood; n = 156) gene expression analysis of immune-related genes from patients with different COVID-19 severities and from healthy controls. Mild and asymptomatic cases showed a powerful innate antiviral response in the nasal epithelium, characterized by activation of the interferon (IFN) pathway and downstream cascades, successfully controlling the infection at the local level. In contrast, severe cases showed a weak macrophage/monocyte-driven innate antiviral response and a lack of IFN signalling activity. Consequently, the oral mucosa of severe patients showed signals of viral activity, cell arrest and viral dissemination to the lower respiratory tract. These signals could ultimately explain the exacerbated innate immune response and impaired adaptive immune responses observed at the systemic level. Results from the saliva transcriptome suggest that the buccal cavity might play a key role in SARS-CoV-2 infection and dissemination in patients with a worse prognosis. Co-expression network analysis further supports these findings by detecting modules that are specifically correlated with severity and involved in the above-mentioned biological routes. This analysis also provides new candidate genes that might be tested as biomarkers in future studies. We also found tissue-specific, severity-related signatures, mainly represented by genes involved in the innate immune system and cytokine/chemokine signalling. The local immune response could be key in determining the course of the systemic response and thus COVID-19 severity.
Our findings provide a framework to investigate severity host gene biomarkers and pathways that might be relevant to diagnosis, prognosis, and therapy.
Subject(s)
COVID-19 , Antiviral Agents , Biomarkers , COVID-19/genetics , Gene Expression Profiling/methods , Humans , Immunity, Innate/genetics , Nasal Mucosa , SARS-CoV-2
ABSTRACT
Establishing the timeframe in which a particular virus was circulating in a population could be useful in several areas of biomedical research, including microbiology and legal medicine. Using simulations, we demonstrate that the circulation timeframe of an unknown SARS-CoV-2 genome in a population (hereafter, the estimated time of a queried genome [QG]; tE-QG) can be easily predicted. The prediction uses a phylogenetic model based on a robust reference genome database of the virus, together with information on the sampling dates of those genomes. We evaluate several phylogeny-based approaches, including modeling the evolutionary (substitution) rate of the SARS-CoV-2 genome (~10⁻³ substitutions/nucleotide/year) and the mutational (substitution) differences separating the QGs from the reference genomes (RGs) in the database. Owing to the mutational characteristics of the virus, the present Viral Molecular Clock Dating (VMCD) method covers timeframes going back from about a month in the past. The method has very low errors associated with the tE-QG estimates and narrow tE-QG intervals, both ranging from a few days to a few weeks regardless of the mathematical model used. The SARS-CoV-2 model represents a proof of concept that can be extrapolated to any other microorganism, provided that a robust genome sequence database is available. Besides obvious applications in epidemiological and microbiological investigations, there are several contexts in forensic casework where estimating tE-QG could be useful, including estimation of the postmortem interval (PMI) and the dating of samples stored in hospital settings.
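The arithmetic of a strict molecular clock helps explain the method's time resolution. At ~10⁻³ substitutions/nucleotide/year over a genome of roughly 29,900 nt (the Wuhan-Hu-1 reference, NC_045512.2, is 29,903 nt), about 30 substitutions accumulate per year, or roughly one every 12 days. The minimal sketch below assumes a single constant rate. It dates a queried genome by adding the time implied by its extra substitutions to the sampling date of its closest reference genome. This is a hypothetical simplification for illustration, not the VMCD models evaluated in the study.

```python
from datetime import date, timedelta

GENOME_LENGTH_NT = 29_903  # Wuhan-Hu-1 reference length (NC_045512.2)
RATE_PER_NT_YEAR = 1e-3    # order of magnitude quoted in the text


def substitutions_per_day(rate=RATE_PER_NT_YEAR, length=GENOME_LENGTH_NT):
    """Expected genome-wide substitutions per day under a strict clock."""
    return rate * length / 365.25


def estimate_date(reference_date, extra_substitutions,
                  rate=RATE_PER_NT_YEAR, length=GENOME_LENGTH_NT):
    """Hypothetical point estimate: reference sampling date + time implied by extra substitutions.

    Adding a timedelta to a date keeps only its whole days.
    """
    days = extra_substitutions / substitutions_per_day(rate, length)
    return reference_date + timedelta(days=days)


if __name__ == "__main__":
    print(round(1 / substitutions_per_day(), 1))  # 12.2 days per substitution
    print(estimate_date(date(2020, 3, 1), 2))     # 2020-03-25
```

At this rate, a single mutational difference shifts the estimate by more than a week. This helps explain why the reported errors and intervals are on the order of days to weeks.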
Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Phylogeny , Genome, Viral , Mutation
ABSTRACT
The fight against the spread of antibiotic resistance is one of the most important challenges facing health systems worldwide. Given the limitations of current diagnostic methods, the development of fast and accurate tests for the diagnosis of viral and bacterial infections would improve patient management and treatment, as well as help reduce antibiotic misuse in clinical settings. In this scenario, host transcriptomics is a promising target for developing new diagnostic tests based on the host-specific response to infection. We carried out a multi-cohort meta-analysis of blood transcriptomic data available in public databases, including 11 different studies and 1209 samples from virus-infected (n = 695) and bacteria-infected (n = 514) patients. We applied a Parallel Regularized Regression Model Search (PReMS) to a set of previously reported genes that distinguished viral from bacterial infection in order to find a minimal gene expression bio-signature. This strategy allowed us to detect three genes, namely BAFT, ISG15 and DNMT1, that clearly differentiate the infection groups with high accuracy (training set: area under the curve [AUC] 0.86, sensitivity 0.81, specificity 0.87; testing set: AUC 0.87, sensitivity 0.82, specificity 0.86). BAFT and ISG15 are involved in processes related to the immune response, while DNMT1 is related to the preservation of methylation patterns, and its expression is modulated by pathogen infection. We successfully tested this three-transcript signature in the 11 independent studies, demonstrating its high performance under different scenarios. The main advantage of this three-gene signature is the low number of genes needed to differentiate the two patient groups.
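The reported performance measures (AUC, sensitivity, specificity) can be computed from signature scores as in this minimal sketch. It assumes each patient already has a single numeric score from the three-gene model, with one infection group treated as the positive class. The AUC is computed as the Mann–Whitney probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half. The function names, scores and threshold are hypothetical, and PReMS itself is not reproduced here.

```python
def auc(pos_scores, neg_scores):
    """AUC as P(score_pos > score_neg), counting ties as 0.5 (Mann-Whitney form)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_specificity(pos_scores, neg_scores, threshold):
    """Call a sample positive when its score is >= threshold."""
    tp = sum(s >= threshold for s in pos_scores)
    tn = sum(s < threshold for s in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)


if __name__ == "__main__":
    pos = [0.9, 0.8, 0.4]  # hypothetical scores, positive class
    neg = [0.3, 0.5, 0.1]  # hypothetical scores, negative class
    print(auc(pos, neg))                           # 8/9, about 0.889
    print(sensitivity_specificity(pos, neg, 0.5))  # (2/3, 2/3)
```

AUC summarizes ranking quality across all thresholds. Sensitivity and specificity, by contrast, depend on the single cut-off chosen, so they are reported at a specific threshold.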
Subject(s)
Bacterial Infections/genetics , Host-Pathogen Interactions/genetics , Transcriptome , Virus Diseases/genetics , Area Under Curve , Bacterial Infections/microbiology , Biomarkers , Cohort Studies , Computational Biology/methods , Gene Expression Profiling , Humans , Meta-Analysis as Topic , ROC Curve , Virus Diseases/virology
ABSTRACT
Respiratory syncytial virus (RSV) is one of the major causes of acute lower respiratory tract infection worldwide. The absence of a commercial vaccine and the limited success of current therapeutic strategies against RSV make further research necessary. We used a multi-cohort analysis approach to investigate host transcriptomic biomarkers and shed further light on the molecular mechanisms underlying RSV-host interactions. We meta-analyzed seven transcriptome microarray studies from the public Gene Expression Omnibus (GEO) repository. Together they contained 922 samples from both adult and pediatric patients, including RSV, healthy controls, coronaviruses, enteroviruses, influenza viruses, rhinoviruses, and coinfections. We identified >1500 genes differentially expressed when comparing the transcriptomes of RSV-infected patients against healthy controls. Functional enrichment analysis showed several significantly altered pathways, including the immunologic response mediated by RSV infection, pattern recognition receptors, cell cycle, and olfactory signaling. In addition, by comparing transcriptomic profiles against those of other respiratory viruses, we identified a minimal 17-transcript host signature specific to RSV infection. These multi-genic signatures might help in the investigation of future drug targets against RSV infection.
Subject(s)
Biomarkers/blood , Host-Pathogen Interactions/genetics , Respiratory Syncytial Virus Infections/virology , Respiratory Syncytial Virus, Human/isolation & purification , Respiratory Tract Infections/blood , Transcriptome , Case-Control Studies , Cohort Studies , Gene Expression Profiling , Humans , Respiratory Tract Infections/genetics , Respiratory Tract Infections/virology , Signal Transduction
ABSTRACT
There is growing interest in unraveling the gene expression mechanisms leading to viral host invasion and infection progression. Current findings reveal that long non-coding RNAs (lncRNAs) are implicated in the regulation of the immune system by influencing gene expression through a wide range of mechanisms. By mining whole-transcriptome shotgun sequencing (RNA-seq) data using machine learning approaches, we detected two lncRNAs (ENSG00000254680 and ENSG00000273149) that are downregulated in a wide range of viral infections and in different cell types, including blood mononuclear cells, umbilical vein endothelial cells, and dermal fibroblasts. The performance of these two lncRNAs was successfully validated in different viral phenotypic scenarios. These two lncRNAs showed strong downregulation in virus-infected patients compared with healthy control transcriptomes, indicating that these biomarkers are promising targets for infection diagnosis. To the best of our knowledge, this is the first study to use host lncRNA biomarkers for the diagnosis of human viral infections.
Subject(s)
Endothelial Cells/metabolism , Fibroblasts/metabolism , Monocytes/metabolism , RNA, Long Noncoding/blood , Virus Diseases/metabolism , Adult , Asian People , Biomarkers/blood , Biomarkers/metabolism , Child, Preschool , Data Mining , Down-Regulation , Endothelial Cells/microbiology , Escherichia coli Infections/genetics , Escherichia coli Infections/metabolism , Fibroblasts/microbiology , Human Umbilical Vein Endothelial Cells , Humans , Influenza, Human/genetics , Influenza, Human/metabolism , Machine Learning , Mexico , Monocytes/microbiology , Monocytes/virology , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA-Seq , Rotavirus Infections/genetics , Rotavirus Infections/metabolism , Varicella Zoster Virus Infection/genetics , Varicella Zoster Virus Infection/metabolism , Virus Diseases/genetics , White People
ABSTRACT
GPCR-ModSim (http://open.gpcr-modsim.org) is a centralized, easy-to-use service dedicated to the structural modeling of G-protein-coupled receptors (GPCRs). 3D molecular models can be generated from the amino acid sequence by homology-modeling techniques, considering different receptor conformations. GPCR-ModSim includes a membrane insertion and molecular dynamics (MD) equilibration protocol. This protocol can be used to refine the generated model or any GPCR structure uploaded to the server, including, if desired, non-protein elements such as orthosteric or allosteric ligands, structural waters or ions. Here we review the main characteristics of GPCR-ModSim and present new functionalities. The templates used for homology modeling have been updated with the latest structural data, with separate profile structural alignments built for inactive, partially active and active groups of templates. We have also added the possibility of performing multiple-template homology modeling in a unique and flexible way. Finally, our new MD protocol applies a series of distance restraints derived from a recently identified conserved network of helical contacts. These restraints allow smoother refinement of the generated models, which is particularly advisable when there is low homology to the available templates. GPCR-ModSim has been tested in the GPCR Dock 2013 competition with satisfactory results.
Subject(s)
Internet , Models, Molecular , Receptors, G-Protein-Coupled/chemistry , Software , Algorithms , Allosteric Regulation , Amino Acid Sequence , Humans , Ligands , Molecular Dynamics Simulation , Receptor, Angiotensin, Type 2/chemistry
ABSTRACT
Transposable elements (TEs) account for nearly half (44%) of the human genome. However, their overall activity has been steadily declining over the past 35-50 million years, so that <0.05% of TEs are presumably still "alive" (potentially transposable) in human populations. All the active elements are retrotransposons. They are either autonomous (LINE-1 and possibly the endogenous retrovirus ERVK) or non-autonomous (Alu and SVA, whose transposition depends on the LINE-1 enzymatic machinery). Here we show that a lineage of the endogenous retrovirus ERVE was recently engaged in ectopic recombination events and may have at least one potentially fully functional representative. This representative was initially reported as a novel retrovirus isolated from blood cells of a Chinese patient with chronic myeloid leukemia, and it bears signals of positive selection in its envelope region. Altogether, there is strong evidence that ERVE should be included in the short list of potentially active TEs. We also give clues on how to identify human-specific insertions of this element that are likely to be segregating in some of our populations.
Subject(s)
Endogenous Retroviruses/genetics , Genome, Human/genetics , Phylogeny , Retroelements/genetics , Animals , Base Sequence , Endogenous Retroviruses/classification , Evolution, Molecular , Gene Products, env/chemistry , Gene Products, env/genetics , Humans , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics , Models, Molecular , Molecular Sequence Data , Protein Structure, Tertiary , Selection, Genetic , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid
ABSTRACT
Recent advances in membrane protein crystallography have provided extremely valuable structural information on the superfamily of GPCRs (G-protein-coupled receptors). This has been particularly true for a few receptors whose structures were solved several times under different biochemical conditions. As a result, the mechanisms of receptor conformational equilibrium and related dynamic events can be explored by computational simulations. In the present article, we summarize our recent understanding of several dynamic features of GPCRs, obtained through MD (molecular dynamics) simulations. Our pipeline for MD simulations of GPCRs, implemented in the web service http://gpcr.usc.es, is updated in the present paper and illustrated by recent applications. Special emphasis is placed on two cases. The first is the A2A adenosine receptor, one of the selected receptors for which crystal structures exist in several conformations and conditions. The second is the dimerization process of CXCR4 (CXC chemokine receptor 4).
Subject(s)
Automation , Computer Simulation , Receptors, G-Protein-Coupled/metabolism , Dimerization , Molecular Dynamics Simulation , Receptors, G-Protein-Coupled/chemistry
ABSTRACT
Extensive literature has explored the beneficial effects of music in age-related cognitive disorders (ACD), but limited knowledge exists regarding its impact on gene expression. We analyzed transcriptomes of ACD patients and healthy controls before and after a music session (n = 60). We then compared the main genes/pathways with those dysregulated in mild cognitive impairment (MCI) and Alzheimer's disease (AD), as revealed by a multi-cohort study (n = 1269 MCI/AD and controls). Music was associated with 2.3 times more whole-genome gene expression changes, particularly in neurodegeneration-related genes, in ACD patients than in controls. Analysis of co-expressed gene modules and pathways demonstrated that music impacted autophagy and vesicle and endosome organization, biological processes commonly dysregulated in MCI/AD. Notably, the data indicated a strong negative correlation between musically modified genes/pathways in ACD and those dysregulated in MCI/AD. These findings highlight the compensatory effect of music on genes/biological processes affected in MCI/AD, providing insights into the molecular mechanisms underlying the benefits of music in these disorders.
Subject(s)
Alzheimer Disease , Cognition Disorders , Cognitive Dysfunction , Music , Humans , Music/psychology , Cohort Studies , Cognitive Dysfunction/genetics , Cognitive Dysfunction/psychology , Alzheimer Disease/genetics , Alzheimer Disease/psychology , Gene Expression
ABSTRACT
Background: Respiratory syncytial virus (RSV) infection has been associated with the subsequent development of recurrent wheezing and asthma, although the mechanisms involved are still unknown. We investigate the role of epigenetics in respiratory morbidity after infection by comparing methylation patterns from three groups of children: those who develop recurrent wheezing (RW-RSV), those who develop subsequent asthma (AS-RSV), and those who experience complete recovery (CR-RSV). Methods: Prospective, observational study of infants aged <2 years admitted to hospital with RSV respiratory infection and followed up after discharge for at least three years. According to their clinical course, patients were categorized into subgroups: RW-RSV (n = 36), AS-RSV (n = 9), and CR-RSV (n = 32). The genome-wide DNA methylation pattern was analyzed in whole blood samples, collected during the acute phase of the infection, using the Illumina Infinium Methylation EPIC BeadChip (850K CpG sites). Differences in methylation were determined through a linear regression model adjusted for age, gender and cell composition. Results: Patients who developed respiratory sequelae showed a statistically significantly higher proportion of NK and CD8+ T cells (inferred through a deconvolution approach) than those with complete recovery. We identified 5,097 significant differentially methylated positions (DMPs) when comparing RW-RSV and AS-RSV together against CR-RSV. Methylation profiles affect several genes involved in airway inflammation processes. The most significant DMPs were hypomethylated in cases, therefore generally leading to overexpression of the affected genes. The lead CpG position (cg24509398) falls in the gene body of EYA3 (P-value = 2.77 × 10⁻¹⁰), a tyrosine phosphatase connected with pulmonary vascular remodeling, a key process in asthma pathology.
Logistic regression analysis yielded a diagnostic epigenetic signature of three DMPs (involving the genes ZNF2698, LOC102723354 and RPL15/NKIRAS1) that efficiently differentiates sequelae cases from CR-RSV patients (AUC = 1.00). Enrichment pathway analysis reveals the role of the cell cycle checkpoint (FDR P-value = 4.71 × 10⁻²), DNA damage (FDR P-value = 2.53 × 10⁻²), and DNA integrity checkpoint (FDR P-value = 2.56 × 10⁻²) in differentiating sequelae from CR-RSV patients. Conclusions: Epigenetic mechanisms might play a fundamental role in the long-term sequelae after RSV infection, helping to explain the different phenotypes observed.
Subject(s)
Asthma , Respiratory Syncytial Virus Infections , Asthma/complications , DNA , Disease Progression , Epigenome , Humans , Morbidity , Prospective Studies , Respiratory Sounds
ABSTRACT
Analysis of SARS-CoV-2 genome variation using a minimal number of selected informative sites forming a genetic barcode presents several drawbacks. We show that purely mathematical procedures for site selection should be supervised by the known phylogeny for two reasons: (i) to ensure that solid tree branches are represented instead of mutational hotspots with poor phylogeographic properties, and (ii) to avoid phylogenetic redundancy. We propose a procedure that prevents information redundancy in site selection by considering the cumulative informativeness of previously selected sites (as a proxy for phylogeny-based criteria). This procedure demonstrates that, for short barcodes (e.g., 11 sites), there are thousands of informative site combinations that improve on previous proposals. We also show that barcodes based on worldwide databases inevitably prioritize variants located at the basal nodes of the phylogeny, and most genomes representative of these ancestral nodes are no longer in circulation. Consequently, universal genomic barcodes cannot properly capture coronavirus phylodynamics, because most SARS-CoV-2 variation is generated in geographically restricted areas by the continuous introduction of domestic variants.
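The redundancy problem in site selection can be illustrated with a greedy sketch. Each new site is chosen to maximize the informativeness of the barcode as a whole, given the sites already selected, rather than its own informativeness in isolation. As an assumption, informativeness is measured here by the Shannon entropy of the haplotypes defined by the selected alignment columns. The authors' actual scoring and phylogenetic supervision are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter


def haplotype_entropy(sequences, sites):
    """Shannon entropy (bits) of haplotypes defined by the given alignment columns."""
    haps = Counter(tuple(seq[i] for i in sites) for seq in sequences)
    n = len(sequences)
    return -sum((c / n) * math.log2(c / n) for c in haps.values())


def greedy_barcode(sequences, k):
    """Pick k columns, each maximizing the entropy of the barcode built so far."""
    selected = []
    candidates = set(range(len(sequences[0])))
    for _ in range(k):
        # Sorting makes ties resolve deterministically to the lowest column index.
        best = max(sorted(candidates),
                   key=lambda i: haplotype_entropy(sequences, selected + [i]))
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy aligned genomes: column 2 duplicates column 1; column 3 is independent.
    seqs = ["ACCA", "ACCT", "AGGA", "AGGT"]
    print(greedy_barcode(seqs, 2))  # [1, 3]
```

In this toy alignment, columns 1 and 2 carry identical information, so ranking sites individually could pick both. The cumulative criterion picks column 3 instead, because adding a redundant column does not increase the number of haplotypes the barcode distinguishes.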
Subject(s)
COVID-19/virology , SARS-CoV-2/classification , SARS-CoV-2/genetics , Algorithms , DNA Barcoding, Taxonomic , Genetic Variation , Genome, Viral , Humans , Mutation , Phylogeny , Phylogeography , SARS-CoV-2/isolation & purification
ABSTRACT
TIPICO is an expert meeting and workshop that aims to provide the most recent evidence in the field of infectious diseases and vaccination. The 10th Interactive Infectious Disease TIPICO workshop took place in Santiago de Compostela, Spain, on November 21-22, 2019. Cutting-edge advances in vaccination against respiratory syncytial virus, Streptococcus pneumoniae, rotavirus, human papillomavirus, Neisseria meningitidis, influenza virus, and Salmonella Typhi were discussed. Furthermore, updates on heterologous vaccine effects were presented, including the use of the Bacillus Calmette-Guérin (BCG) vaccine as a potential treatment for type 1 diabetes. Finally, the workshop also included presentations and discussions on several further topics: emerging viruses and zoonoses, vaccine resilience, building and sustaining confidence in vaccination, approaches to vaccine decision-making, the pros and cons of compulsory vaccination, the latest advances in decoding infectious diseases through RNA gene signatures, and the application of big data approaches.
Subject(s)
Communicable Diseases , Respiratory Syncytial Virus, Human , Animals , BCG Vaccine , Humans , Spain , Vaccination
ABSTRACT
Spain has been one of the main global pandemic epicenters for coronavirus disease 2019 (COVID-19). Here, we analyzed >41 000 genomes (including >26 000 high-quality [HQ] genomes) downloaded from the GISAID repository, including 1 245 (922 HQ) sampled in Spain. The aim of this study was to investigate genome variation of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and to reconstruct phylogeographic and transmission patterns in Spain. Phylogeographic analysis suggested at least 34 independent introductions of SARS-CoV-2 into Spain at the beginning of the outbreak. Five lineages spread very successfully in the country, probably favored by super-spreaders: A2a4 (7.8%), A2a5 (38.4%), A2a10 (2.8%), B3a (30.1%), and B9 (8.7%). Together these lineages accounted for 87.9% of all genomes in the Spanish database. One distinct feature of the Spanish SARS-CoV-2 genomes was a higher frequency of B lineages (39.3%, mainly B3a+B9) than found in any other European country. B3a and B9 (and an important sub-lineage of A2a5, namely A2a5c) most likely originated in Spain, whereas the other three haplogroups were imported from other European locations. The B3a strain may have originated in the Basque Country from a B3 ancestor of uncertain geographic origin, whereas B9 likely emerged in Madrid. The time to the most recent common ancestor (TMRCA) suggested that the first coronavirus entered the country around 11 February 2020, as estimated from the TMRCA of B3a, the first lineage detected in the country. Moreover, earlier claims that the D614G mutation is associated with higher transmissibility are not consistent with the very high prevalence of COVID-19 in Spain. Other countries had lower disease incidence but a much higher frequency of this mutation (56.4% in Spain vs. 82.4% in the rest of Europe).
Instead, the data support a major role of genetic drift in shaping the micro-geographic stratification of virus strains across the country, as well as a role for SARS-CoV-2 super-spreaders.
Subject(s)
Betacoronavirus/genetics , Coronavirus Infections/transmission , Genetic Variation , Genome, Viral/genetics , Pneumonia, Viral/transmission , Animals , Betacoronavirus/classification , Betacoronavirus/physiology , COVID-19 , Coronavirus Infections/epidemiology , Coronavirus Infections/virology , Evolution, Molecular , Founder Effect , Geography , Haplotypes , Humans , Mutation , Pandemics , Phylogeny , Phylogeography , Pneumonia, Viral/epidemiology , Pneumonia, Viral/virology , SARS-CoV-2 , Spain/epidemiology
ABSTRACT
Lynch syndrome (LS) is the most common hereditary colorectal cancer (CRC) syndrome and is caused by heterozygous mutations in the mismatch repair (MMR) genes. Biallelic mutations in these genes, however, lead to constitutional mismatch repair deficiency (CMMRD). In this study, we follow the diagnostic journey of a 12-year-old patient with CRC whose clinical phenotype overlaps with CMMRD. We perform molecular and functional assays to rule out a CMMRD diagnosis. We then use exome sequencing, with validation in a cohort of 134 LS patients, to identify a homozygous candidate variant in the MLH1 UTR region. We propose that this variant, together with other candidates, could be responsible for modulating age of onset. Our data support the idea that low-risk modifier alleles may influence the early development of cancer in LS, leading to an LS-to-CMMRD phenotypic continuum. Therefore, it is essential that larger efforts be directed to the identification and study of these genetic modifiers in order to provide optimal cancer prevention strategies for these patients.