ABSTRACT
The extremely high levels of genetic polymorphism within the human major histocompatibility complex (MHC) limit the usefulness of reference-based alignment methods for sequence assembly. We incorporate a short-read de novo assembly algorithm into a workflow for novel application to the MHC. MHConstructor is a containerized pipeline designed for high-throughput, haplotype-informed, reproducible assembly of both whole-genome sequencing and target-capture short-read data in large population cohorts. To date, no other self-contained tool exists for generating de novo MHC assemblies from short-read data. MHConstructor facilitates widespread access to high-quality, alignment-free MHC sequence analysis.
Subject(s)
Haplotypes , Major Histocompatibility Complex , Humans , Major Histocompatibility Complex/genetics , Software , High-Throughput Nucleotide Sequencing/methods , Algorithms
ABSTRACT
Although B cells are implicated in multiple sclerosis (MS) pathophysiology, a predictive or diagnostic autoantibody remains elusive. In this study, the Department of Defense Serum Repository (DoDSR), a cohort of over 10 million individuals, was used to generate whole-proteome autoantibody profiles of hundreds of patients with MS (PwMS) years before and subsequently after MS onset. This analysis defines a unique cluster in approximately 10% of PwMS who share an autoantibody signature against a common motif that has similarity with many human pathogens. These patients exhibit antibody reactivity years before developing MS symptoms and have higher levels of serum neurofilament light (sNfL) compared to other PwMS. Furthermore, this profile is preserved over time, providing molecular evidence for an immunologically active preclinical period years before clinical onset. This autoantibody reactivity was validated in samples from a separate incident MS cohort in both cerebrospinal fluid and serum, where it is highly specific for patients eventually diagnosed with MS. This signature is a starting point for further immunological characterization of this MS patient subset and may be clinically useful as an antigen-specific biomarker for high-risk patients with clinically or radiologically isolated neuroinflammatory syndromes.
Subject(s)
Autoantibodies , Multiple Sclerosis , Neurofilament Proteins , Humans , Multiple Sclerosis/immunology , Multiple Sclerosis/blood , Autoantibodies/blood , Autoantibodies/immunology , Neurofilament Proteins/blood , Neurofilament Proteins/immunology , Biomarkers/blood , Cohort Studies , Female , Male , Adult , Middle Aged
ABSTRACT
The complement component 4 gene locus, composed of the C4A and C4B genes and located on chromosome 6, encodes complement component 4 (C4) proteins, key intermediates in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. The C4A and C4B genes exhibit copy number variation, with each gene varying between 0 and 5 copies per haplotype. C4A and C4B genes also vary in size depending on the presence of a human endogenous retrovirus (HERV) in intron 9, denoted by C4(L) for long-form and C4(S) for short-form, which affects expression and is found in both C4A and C4B. Additionally, the human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodgers epitope generally found on C4A protein and the Chido epitope generally found on C4B protein. C4A and C4B copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high-throughput genomic sequence analysis of C4A and C4B variants has been impeded by the high degree of sequence similarity and complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed a novel bioinformatic pipeline, named C4Investigator, for comprehensive, high-throughput characterization of human C4A and C4B sequences from short-read sequencing data. Using paired-end targeted or whole-genome sequence data as input, C4Investigator determines the overall C4 gene copy number, as well as copy numbers for C4A, C4B, C4(Rodger), C4(Ch), C4(L), and C4(S). Additionally, C4Investigator reports the full overall C4A and C4B aligned sequence, enabling nucleotide-level analysis. To demonstrate the utility of this workflow, we have analyzed C4A and C4B variation in the 1000 Genomes Project dataset, showing that these genes are highly poly-allelic with many variants that have the potential to impact C4 protein function.
Subject(s)
Complement C4b , DNA Copy Number Variations , Humans , Complement C4b/genetics , Alleles , Complement C4/genetics , Genomics , Sequence Analysis , Epitopes
ABSTRACT
The complement component 4 gene locus, composed of the C4A and C4B genes and located on chromosome 6, encodes C4 protein, a key intermediate in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. The C4 gene locus exhibits copy number variation, with each composite gene varying between 0 and 5 copies per haplotype. C4 genes also vary in size depending on the presence of a human endogenous retrovirus (HERV) in intron 9, denoted by C4(L) for long-form and C4(S) for short-form, which modulates expression and is found in both C4A and C4B. Additionally, the human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodgers epitope generally found on C4A protein and the Chido epitope generally found on C4B protein. C4 copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high-throughput genomic sequence analysis of C4 variants has been impeded by the high degree of sequence similarity and complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed a novel bioinformatic pipeline, named C4Investigator, for comprehensive, high-throughput characterization of human C4 sequence from short-read sequencing data. Using paired-end targeted or whole-genome sequence data as input, C4Investigator determines gene copy number for overall C4, C4A, C4B, C4(Rodger), C4(Ch), C4(L), and C4(S). Additionally, C4Investigator reports the full overall C4 aligned sequence, enabling nucleotide-level analysis of C4. To demonstrate the utility of this workflow, we have analyzed C4 variation in the 1000 Genomes Project dataset, showing that the C4 genes are highly poly-allelic with many variants that have the potential to impact C4 protein function.
ABSTRACT
Although B cells are implicated in multiple sclerosis (MS) pathophysiology, a predictive or diagnostic autoantibody remains elusive. Here, the Department of Defense Serum Repository (DoDSR), a cohort of over 10 million individuals, was used to generate whole-proteome autoantibody profiles of hundreds of patients with MS (PwMS) years before and subsequently after MS onset. This analysis defines a unique cluster of PwMS that share an autoantibody signature against a common motif that has similarity with many human pathogens. These patients exhibit antibody reactivity years before developing MS symptoms and have higher levels of serum neurofilament light (sNfL) compared to other PwMS. Furthermore, this profile is preserved over time, providing molecular evidence for an immunologically active prodromal period years before clinical onset. This autoantibody reactivity was validated in samples from a separate incident MS cohort in both cerebrospinal fluid (CSF) and serum, where it is highly specific for patients eventually diagnosed with MS. This signature is a starting point for further immunological characterization of this MS patient subset and may be clinically useful as an antigen-specific biomarker for high-risk patients with clinically or radiologically isolated neuroinflammatory syndromes.
ABSTRACT
Since the initial reported discovery of SARS-CoV-2 in late 2019, genomic surveillance has been an important tool to understand its transmission and evolution. Here, we sought to describe the underlying regional phylodynamics before and during a rapid spreading event that was documented by surveillance protocols of the United States Air Force Academy (USAFA) from late October through November 2020. We used replicate long-read sequencing on Colorado SARS-CoV-2 genomes collected July through November 2020 at the University of Colorado Anschutz Medical Campus in Aurora and the United States Air Force Academy in Colorado Springs. Replicate sequencing allowed rigorous validation of variation and placement in a phylogenetic relatedness network. We focus on describing the phylodynamics of a lineage that likely originated in the local Colorado Springs community and expanded rapidly over the course of two months in an outbreak within the well-controlled environment of the United States Air Force Academy. Divergence estimates from sampling dates indicate that the SARS-CoV-2 lineage associated with this rapid expansion event originated in late October 2020. These results agree with transmission pathways inferred by the United States Air Force Academy and provide a window into the evolutionary process and transmission dynamics of a potentially dangerous but ultimately contained variant.
Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Colorado/epidemiology , Viral Genome , Humans , Phylogeny , SARS-CoV-2/genetics
ABSTRACT
The global community has adopted ambitious goals to eliminate schistosomiasis as a public health problem, and new tools are needed to achieve them. Mass drug administration programs, for example, have reduced the burden of schistosomiasis, but hotspots of persistent and reemergent transmission threaten progress toward elimination and underscore the need to couple treatment with interventions that reduce transmission. Recent advances in DNA sequencing technologies make whole-genome sequencing a valuable and increasingly feasible option for population-based studies of complex parasites such as schistosomes. Here, we focus on leveraging genomic data to tailor interventions to distinct social and ecological circumstances. We consider two priority questions that can be addressed by integrating epidemiological, ecological, and genomic information: (1) how often do non-human host species contribute to human schistosome infection? and (2) what is the importance of locally acquired versus imported infections in driving transmission at different stages of elimination? These questions address processes that can undermine control programs, especially those that rely heavily on treatment with praziquantel. Until recently, these questions were difficult to answer with sufficient precision to inform public health decision-making. We review the literature related to these questions and discuss how whole-genome approaches can identify the geographic and taxonomic sources of infection, and how such information can inform context-specific efforts that advance schistosomiasis control and minimize the risk of reemergence.
Subject(s)
Parasites , Schistosomiasis , Animals , Genomics , Mass Drug Administration , Schistosoma , Schistosomiasis/epidemiology , Schistosomiasis/prevention & control
ABSTRACT
Schistosomiasis is a neglected tropical disease caused by multiple parasitic Schistosoma species that affects over 200 million people globally, mainly in low- and middle-income countries. Genomic surveillance to detect evidence for natural selection in schistosome populations represents an emerging and promising approach to identify and interpret schistosome responses to ongoing control efforts or other environmental factors. Here we review how genomic variation is used to detect selection, how these approaches have been applied to schistosomes, and how future studies to detect selection may be improved. We discuss the theory of genomic analyses to detect selection, identify experimental designs for such analyses, and review studies that have applied these approaches to schistosomes. We then consider the biological characteristics of schistosomes that are expected to respond to selection, particularly those that may be impacted by control programs. Examples include drug resistance, host specificity, and life history traits, and we review our current understanding of specific genes that underlie them in schistosomes. We also discuss how inherent features of schistosome reproduction and demography pose substantial challenges for effective identification of these traits and their genomic bases. We conclude by discussing how genomic surveillance for selection should be designed to improve understanding of schistosome biology and of how the parasite changes in response to selection.
ABSTRACT
Spindly is a dynein adaptor involved in chromosomal segregation during cell division. While Spindly's N-terminal domain binds to the microtubule motor dynein and its activator dynactin, the C-terminal domain (Spindly-C) binds its cargo, the ROD/ZW10/ZWILCH (RZZ) complex in the outermost layer of the kinetochore. In humans, Spindly-C binds to ROD, while in C. elegans Spindly-C binds to both Zwilch (ZWL-1) and ROD-1. Here, we employed various biophysical techniques to characterize the structure, dynamics and interaction sites of C. elegans Spindly-C. We found that despite the overall disorder, there are two regions with variable α-helical propensity. One of these regions is located in the C-terminal half and is compact; the second is sparsely populated in the N-terminal half. The interactions with both ROD-1 and ZWL-1 are mostly mediated by the same two sequentially remote disordered segments of Spindly-C, which are C-terminally adjacent to the helical regions. The findings suggest that the Spindly-C binding sites on ROD-1 in the ROD-1/ZWL-1 complex context are either shielded or conformationally weakened by the presence of ZWL-1 such that only ZWL-1 directly interacts with Spindly-C in C. elegans.
Subject(s)
Caenorhabditis elegans Proteins/chemistry , Dyneins/chemistry , Kinetochores/chemistry , Protein Interaction Domains and Motifs , Repressor Proteins/chemistry , Animals , Caenorhabditis elegans , Caenorhabditis elegans Proteins/metabolism , Humans , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Magnetic Resonance Spectroscopy , Protein Binding , Protein Conformation , Spindle Apparatus/metabolism , Structure-Activity Relationship
ABSTRACT
Due to the scope and impact of the COVID-19 pandemic there exists a strong desire to understand where the SARS-CoV-2 virus came from and how it jumped species boundaries to humans. Molecular evolutionary analyses can trace viral origins by establishing relatedness and divergence times of viruses and identifying past selective pressures. However, we must uphold rigorous standards of inference and interpretation on this topic because of the ramifications of being wrong. Here, we dispute the conclusions of Xia (2020. Extreme genomic CpG deficiency in SARS-CoV-2 and evasion of host antiviral defense. Mol Biol Evol. doi:10.1093/molbev/masa095) that dogs are a likely intermediate host of a SARS-CoV-2 ancestor. We highlight major flaws in Xia's inference process and his analysis of CpG deficiencies, and conclude that there is no direct evidence for the role of dogs as intermediate hosts. Bats and pangolins currently have the greatest support as ancestral hosts of SARS-CoV-2, with the strong caveat that sampling of wildlife species for coronaviruses has been limited.
Subject(s)
Alphacoronavirus/genetics , Betacoronavirus/genetics , Coronavirus Infections/epidemiology , Viral Genome , Pandemics , Viral Pneumonia/epidemiology , Reassortant Viruses/genetics , Alphacoronavirus/classification , Alphacoronavirus/pathogenicity , Animals , Betacoronavirus/classification , Betacoronavirus/pathogenicity , Biological Evolution , COVID-19 , Chiroptera/virology , Coronavirus Infections/immunology , Coronavirus Infections/transmission , Coronavirus Infections/virology , CpG Islands , Dogs , Eutheria/virology , Humans , Immune Evasion/genetics , Viral Pneumonia/immunology , Viral Pneumonia/transmission , Viral Pneumonia/virology , Protein Binding , Viral RNA/genetics , Viral RNA/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/immunology , RNA-Binding Proteins/metabolism , Reassortant Viruses/classification , Reassortant Viruses/pathogenicity , SARS-CoV-2 , Virus Replication
ABSTRACT
The Krüppel-like transcription factors KLF1 and KLF2 are essential for embryonic erythropoiesis. They can partially compensate for each other during mouse development, and coordinately regulate numerous erythroid genes, including the β-like globins. Simultaneous ablation of KLF1 and KLF2 results in earlier embryonic lethality and severe anemia. In this study, we determine that this anemia is caused by a paucity of blood cells, and exacerbated by diminished β-like globin gene expression. The anemia phenotype is dose-dependent, and, interestingly, can be ameliorated by a single copy of the KLF2, but not the KLF1 gene. The roles of KLF1 and KLF2 in maintaining normal peripheral blood cell numbers and globin mRNA amounts are erythroid cell-specific. Mechanistic studies led to the discovery that KLF2 has an essential function in erythroid precursor maintenance. KLF1 can partially compensate for KLF2 in this role, but is uniquely crucial for erythroid precursor proliferation through its regulation of G1- to S-phase cell cycle transition. A more drastic impairment of primitive erythroid colony formation from embryonic progenitor cells occurs with simultaneous loss of KLF1 and KLF2 than with loss of a single factor. KLF1 and KLF2 coordinately regulate several proliferation-associated genes, including FoxM1. Differential expression of FoxM1, in particular, correlates with the observed KLF1 and KLF2 gene dosage effects on anemia. Furthermore, KLF1 binds to the FoxM1 gene promoter in blood cells. Thus KLF1 and KLF2 coordinately regulate embryonic erythroid precursor maturation through the regulation of multiple homeostasis-associated genes, and KLF2 has a novel and essential role in this process.