ABSTRACT
16S ribosomal RNA-based analysis is the established standard for elucidating the composition of microbial communities. While short-read 16S rRNA analyses are largely confined to genus-level resolution at best, given that only a portion of the gene is sequenced, full-length 16S rRNA gene amplicon sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate often observed in long-read data. Here we present Emu, an approach that uses an expectation-maximization algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from simulated datasets and mock communities show that Emu is capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of Emu by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow with those returned by full-length 16S rRNA gene sequences processed with Emu.
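The expectation-maximization idea mentioned in this abstract can be illustrated with a minimal sketch. It is not Emu's actual implementation: the function name `em_abundance`, the input layout (one row of per-taxon read likelihoods per read), and the stopping rule are assumptions made for illustration only.

```python
def em_abundance(likelihoods, n_iter=100, tol=1e-9):
    """Estimate taxon abundances from per-read likelihoods with a basic EM loop.

    likelihoods[r][t] is an assumed, precomputed P(read r | taxon t).
    Returns a list of abundances that sums to 1.
    """
    n_taxa = len(likelihoods[0])
    abundance = [1.0 / n_taxa] * n_taxa  # start from a uniform profile
    for _ in range(n_iter):
        counts = [0.0] * n_taxa
        # E-step: split each read across taxa in proportion to abundance * likelihood
        for row in likelihoods:
            weighted = [a * p for a, p in zip(abundance, row)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read is not explained by any taxon; skip it
            for t, w in enumerate(weighted):
                counts[t] += w / total
        norm = sum(counts)
        if norm == 0.0:
            break  # no read could be assigned; keep the current estimate
        # M-step: new abundances are the normalized expected read counts
        updated = [c / norm for c in counts]
        delta = max(abs(u - a) for u, a in zip(updated, abundance))
        abundance = updated
        if delta < tol:
            break
    return abundance


# Two reads match only taxon 0, one matches only taxon 1, one is ambiguous.
reads = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
profile = em_abundance(reads)
```

In this toy input the ambiguous read is shared between the two taxa in proportion to the current abundance estimate, which is how community-wide information can help resolve reads that match several references equally well.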
Subjects
Dromaiidae , Microbiota , Nanopore Sequencing , Animals , Bacteria/genetics , Dromaiidae/genetics , High-Throughput Nucleotide Sequencing/methods , Microbiota/genetics , Phylogeny , 16S Ribosomal RNA/genetics , DNA Sequence Analysis/methods
ABSTRACT
MOTIVATION: Microbial sequencing data from clinical samples is often contaminated with human sequences, which have to be removed prior to sharing. Existing methods for human read removal, however, are applicable only after the target dataset has been retrieved in its entirety, putting the recipient at least temporarily in control of a potentially identifiable genetic dataset with potential implications under regulatory frameworks such as the GDPR. In some instances, the ability to carry out stream-based host depletion as part of the data transfer process may be preferable. RESULTS: We present SWGTS, a client-server application for the transfer and stream-based host depletion of sequencing reads. SWGTS enforces a robust upper bound on the maximum amount of human genetic data from any one client held in memory at any point in time by storing all incoming sequencing data in a limited-size, client-specific intermediate processing buffer, and by throttling the rate of incoming data if it exceeds the speed of host depletion carried out on the SWGTS server in the background. SWGTS exposes an HTTP REST interface, is implemented using docker-compose, Redis and traefik, and requires less than 8 GB of RAM for deployment. We demonstrate high filtering accuracy of SWGTS; incoming data transfer rates of up to 1.65 megabases per second in a conservative configuration; and mitigation of re-identification risks by the ability to limit the number of SNPs present on a popular population-scale genotyping array covered by reads in the SWGTS buffer to a low user-defined number, such as 10 or 100. AVAILABILITY AND IMPLEMENTATION: SWGTS is available on GitHub: https://github.com/AlBi-HHU/swgts (https://doi.org/10.5281/zenodo.10891052). The repository also contains a Jupyter notebook that can be used to reproduce all the benchmarks used in this article. All datasets used for benchmarking are publicly available.
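The bounded, client-specific buffer with throttling described above can be sketched as follows. This is only an in-memory illustration of the idea, not the SWGTS code (which is built on Redis and an HTTP interface); the class `ClientBuffer` and its methods `offer` and `deplete_one` are hypothetical names.

```python
from collections import deque


class ClientBuffer:
    """Holds at most max_bases of not-yet-filtered sequence for one client."""

    def __init__(self, max_bases):
        self.max_bases = max_bases
        self.held_bases = 0
        self.pending = deque()

    def offer(self, read):
        """Accept a read only if the size bound still holds afterwards.

        Returning False tells the client to retry later, which throttles
        uploads to the speed of host depletion.
        """
        if self.held_bases + len(read) > self.max_bases:
            return False
        self.pending.append(read)
        self.held_bases += len(read)
        return True

    def deplete_one(self, is_human):
        """Filter the oldest read; free its space whether it is kept or dropped."""
        read = self.pending.popleft()
        self.held_bases -= len(read)
        return None if is_human(read) else read


buf = ClientBuffer(max_bases=10)
accepted_first = buf.offer("ACGTACGT")   # fits within the 10-base bound
accepted_second = buf.offer("ACGT")      # rejected: would exceed the bound
kept = buf.deplete_one(lambda r: False)  # placeholder classifier, assumed non-human
```

Because a read is admitted only when it fits under `max_bases`, the amount of unfiltered data held for a client never exceeds the configured bound, regardless of how fast the client sends.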
Subjects
DNA Sequence Analysis , Software , Humans , DNA Sequence Analysis/methods , DNA/genetics , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
The COVID-19 pandemic, caused by SARS-CoV-2 infection, is a public health emergency. COVID-19 typically presents as a respiratory illness. Unexpectedly, emerging clinical reports indicate that neurological symptoms continue to rise, suggesting detrimental effects of SARS-CoV-2 on the central nervous system (CNS). Here, we show that a Düsseldorf isolate of SARS-CoV-2 enters 3D human brain organoids within 2 days of exposure. We identified that SARS-CoV-2 preferentially targets neurons of brain organoids. Imaging neurons of organoids reveals that SARS-CoV-2 exposure is associated with altered distribution of Tau from axons to soma, hyperphosphorylation, and apparent neuronal death. Our studies, therefore, provide initial insights into the potential neurotoxic effect of SARS-CoV-2 and emphasize that brain organoids could model CNS pathologies of COVID-19.
Subjects
Betacoronavirus/physiology , Brain/virology , Neurons/virology , Animals , Cell Death , Chlorocebus aethiops , Humans , Nervous System Diseases/virology , Organoids , SARS-CoV-2 , Vero Cells , tau Proteins/metabolism
ABSTRACT
BACKGROUND: Monoclonal antibodies (mAbs) that target severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are predominantly less effective against Omicron variants. Immunocompromised patients often experience prolonged viral shedding, resulting in an increased risk of viral escape. METHODS: In an observational, prospective cohort, 57 patients infected with Omicron variants who received sotrovimab alone or in combination with remdesivir were followed. The study end points were a decrease in SARS-CoV-2 RNA <10^6 copies/mL in nasopharyngeal swabs at day 21 and the emergence of escape mutations at days 7, 14, and 21 after sotrovimab administration. All SARS-CoV-2 samples were analyzed using whole-genome sequencing. Individual variants within the quasispecies were subsequently quantified and further characterized using a pseudovirus neutralization assay. RESULTS: The majority of patients (43 of 57, 75.4%) were immunodeficient, predominantly due to immunosuppression after organ transplantation or hematologic malignancies. Infections by Omicron/BA.1 comprised 82.5%, while 17.5% were infected by Omicron/BA.2. Twenty-one days after sotrovimab administration, 12 of 43 (27.9%) immunodeficient patients had prolonged viral shedding compared with 1 of 14 (7.1%) immunocompetent patients (P = .011). Viral spike protein mutations, some specific for Omicron (e.g., P337S and/or E340D/V), emerged in 14 of 43 (32.6%) immunodeficient patients, substantially reducing sensitivity to sotrovimab in a pseudovirus neutralization assay. Combination therapy with remdesivir significantly reduced emergence of escape variants. CONCLUSIONS: Immunocompromised patients face a considerable risk of prolonged viral shedding and emergence of escape mutations after early therapy with sotrovimab. These findings underscore the importance of careful monitoring and the need for dedicated clinical trials in this patient population.
Subjects
COVID-19 , SARS-CoV-2 , Humans , Neutralizing Antibodies , Antiviral Antibodies , Immunocompromised Host , Prospective Studies , Viral RNA , SARS-CoV-2/genetics
ABSTRACT
The SARS-CoV-2 pandemic has highlighted the importance of viable infection surveillance and the relevant infrastructure. From a German perspective, an integral part of this infrastructure, genomic pathogen sequencing, was at best fragmentary and stretched to its limits due to the lack or inefficient use of equipment, human resources, data management and coordination. The experience in other countries has shown that the rate of sequenced positive samples and linkage of genomic and epidemiological data (person, place, time) represent important factors for a successful application of genomic pathogen surveillance. Planning, establishing and consistently supporting adequate structures for genomic pathogen surveillance will be crucial to identify and combat future pandemics as well as other challenges in infectious diseases such as multi-drug resistant bacteria and healthcare-associated infections. Therefore, the authors propose a multifaceted and coordinated process for the definition of procedural, legal and technical standards for comprehensive genomic pathogen surveillance in Germany, covering the areas of genomic sequencing, data collection and data linkage, as well as target pathogens. A comparative analysis of the structures established in Germany and in other countries is applied. This proposal aims to better tackle epi- and pandemics to come and take action from the "lessons learned" from the SARS-CoV-2 pandemic.
Subjects
COVID-19 , Cross Infection , Humans , Pandemics/prevention & control , COVID-19/epidemiology , COVID-19/prevention & control , SARS-CoV-2/genetics , Genomics
ABSTRACT
The SARS-CoV-2 pandemic has shown a deficit of essential epidemiological infrastructure, especially with regard to genomic pathogen surveillance in Germany. In order to prepare for future pandemics, the authors consider it urgently necessary to remedy this existing deficit by establishing an efficient infrastructure for genomic pathogen surveillance. Such a network can build on structures, processes, and interactions that have already been initiated regionally and further optimize them. It will be able to respond to current and future challenges with a high degree of adaptability. The aim of this paper is to address the urgency and to outline proposed measures for establishing an efficient, adaptable, and responsive genomic pathogen surveillance network, taking into account external framework conditions and internal standards. The proposed measures are based on global and country-specific best practices and strategy papers. Specific next steps to achieve an integrated genomic pathogen surveillance include linking epidemiological data with pathogen genomic data; sharing and coordinating existing resources; making surveillance data available to relevant decision-makers, the public health service, and the scientific community; and engaging all stakeholders. The establishment of a genomic pathogen surveillance network is essential for the continuous, stable, active surveillance of the infection situation in Germany, both during pandemic phases and beyond.
Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/prevention & control , Pandemics/prevention & control , Germany/epidemiology , Genomics
ABSTRACT
BACKGROUND: Tracing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission chains is still a major challenge for public health authorities when incidental contacts are not recalled or are not perceived as potential risk contacts. Viral sequencing can address key questions about SARS-CoV-2 evolution and may support reconstruction of viral transmission networks by integration of molecular epidemiology into classical contact tracing. METHODS: In collaboration with local public health authorities, we set up an integrated system of genomic surveillance in an urban setting, combining a) viral surveillance sequencing, b) genetically based identification of infection clusters in the population, c) integration of public health authority contact tracing data, and d) a user-friendly dashboard application as a central data analysis platform. RESULTS: Application of the integrated system from August to December 2020 enabled a characterization of viral population structure, analysis of 4 outbreaks at a maximum-care hospital, and genetically based identification of 5 putative population infection clusters, all of which were confirmed by contact tracing. The system contributed to the development of improved hospital infection control and prevention measures and enabled the identification of previously unrecognized transmission chains, involving a martial arts gym and establishing a link between the hospital and the local population. CONCLUSIONS: Integrated systems of genomic surveillance could contribute to the monitoring and, potentially, improved management of SARS-CoV-2 transmission in the population.
Subjects
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Contact Tracing , Disease Outbreaks/prevention & control , Genomics , Humans , SARS-CoV-2/genetics
ABSTRACT
Multiple sclerosis (MS) is a chronic inflammatory, likely autoimmune disease of the central nervous system with a combination of genetic and environmental risk factors, among which Epstein-Barr virus (EBV) infection is a strong suspect. We have previously identified increased autoantibody levels toward the chloride-channel protein Anoctamin 2 (ANO2) in MS. Here, IgG antibody reactivity toward ANO2 and EBV nuclear antigen 1 (EBNA1) was measured using bead-based multiplex serology in plasma samples from 8,746 MS cases and 7,228 controls. We detected increased anti-ANO2 antibody levels in MS (P = 3.5 × 10^-36) with 14.6% of cases and 7.8% of controls being ANO2 seropositive (odds ratio [OR] = 1.6; 95% confidence interval [95% CI]: 1.5 to 1.8). The MS risk increase in ANO2-seropositive individuals was dramatic when also exposed to 3 known risk factors for MS: HLA-DRB1*15:01 carriage, absence of HLA-A*02:01, and high anti-EBNA1 antibody levels (OR = 24.9; 95% CI: 17.9 to 34.8). Reciprocal blocking experiments with ANO2 and EBNA1 peptides demonstrated antibody cross-reactivity, mapping to ANO2 [aa 140 to 149] and EBNA1 [aa 431 to 440]. The HLA gene region was associated with anti-ANO2 antibody levels, and the HLA-DRB1*04:01 haplotype was negatively associated with ANO2 seropositivity (OR = 0.6; 95% CI: 0.5 to 0.7). Anti-ANO2 antibody levels were not increased in patients from 3 other inflammatory disease cohorts. The HLA influence and the fact that specific IgG production usually needs T cell help provide indirect evidence for a T cell ANO2 autoreactivity in MS. We propose a hypothesis where immune reactivity toward EBNA1 through molecular mimicry with ANO2 contributes to the etiopathogenesis of MS.
Subjects
Anoctamins , Epstein-Barr Virus Nuclear Antigens , Human Herpesvirus 4 , Immunological Models , Molecular Mimicry , Multiple Sclerosis , Anoctamins/genetics , Anoctamins/immunology , Autoantibodies/immunology , Cross Reactions/genetics , Epstein-Barr Virus Nuclear Antigens/genetics , Epstein-Barr Virus Nuclear Antigens/immunology , Female , HLA-A2 Antigen/immunology , HLA-DRB1 Chains/genetics , HLA-DRB1 Chains/immunology , Haplotypes , Human Herpesvirus 4/genetics , Human Herpesvirus 4/immunology , Humans , Immunoglobulin G/immunology , Male , Multiple Sclerosis/genetics , Multiple Sclerosis/immunology , Multiple Sclerosis/pathology , Risk Factors
ABSTRACT
Background: Tracking person-to-person SARS-CoV-2 transmission in the population is important to understand the epidemiology of community transmission and may contribute to the containment of SARS-CoV-2. Neither contact tracing nor genomic surveillance alone, however, is typically sufficient to achieve this objective. Aim: We demonstrate the successful application of the integrated genomic surveillance (IGS) system of the German city of Düsseldorf for tracing SARS-CoV-2 transmission chains in the population as well as detecting and investigating travel-associated SARS-CoV-2 infection clusters. Methods: Genomic surveillance, phylogenetic analysis, and structured case interviews were integrated to elucidate two genetically defined clusters of SARS-CoV-2 isolates detected by IGS in Düsseldorf in July 2021. Results: Cluster 1 (n = 67 Düsseldorf cases) and Cluster 2 (n = 36) were detected in a surveillance dataset of 518 high-quality SARS-CoV-2 genomes from Düsseldorf (53% of total cases, sampled mid-June to July 2021). Cluster 1 could be traced back to a complex pattern of transmission in nightlife venues following a putative importation by a SARS-CoV-2-infected return traveller (IP) in late June; 28 SARS-CoV-2 cases could be epidemiologically directly linked to IP. Supported by viral genome data from Spain, Cluster 2 was shown to represent multiple independent introduction events of a viral strain circulating in Catalonia and other European countries, followed by diffuse community transmission in Düsseldorf. Conclusion: IGS enabled high-resolution tracing of SARS-CoV-2 transmission in an internationally connected city during community transmission and provided infection chain-level evidence of the downstream propagation of travel-imported SARS-CoV-2 cases.
Subjects
COVID-19 , Imported Communicable Diseases , Humans , SARS-CoV-2/genetics , Travel , Imported Communicable Diseases/epidemiology , COVID-19/epidemiology , Phylogeny , Contact Tracing , Germany/epidemiology , Genomics
ABSTRACT
Genetic variation within the major histocompatibility complex (MHC) contributes substantial risk for systemic lupus erythematosus, but high gene density, extreme polymorphism and extensive linkage disequilibrium (LD) have made fine mapping challenging. To address the problem, we compared two association techniques in two ancestrally diverse populations, African Americans (AAs) and Europeans (EURs). We observed a greater number of Human Leucocyte Antigen (HLA) alleles in AA consistent with the elevated level of recombination in this population. In EUR we observed 50 different A-C-B-DRB1-DQA-DQB multilocus haplotype sequences per hundred individuals; in the AA sample, these multilocus haplotypes were twice as common compared to Europeans. We also observed a strong narrow class II signal in AA as opposed to the long-range LD observed in EUR that includes class I alleles. We performed a Bayesian model choice of the classical HLA alleles and a frequentist analysis that combined both single nucleotide polymorphisms (SNPs) and classical HLA alleles. Both analyses converged on a similar subset of risk HLA alleles: in EUR HLA-B*08:01 + B*18:01 + (DRB1*15:01 frequentist only) + DQA*01:02 + DQB*02:01 + DRB3*02 and in AA HLA-C*17:01 + B*08:01 + DRB1*15:03 + (DQA*01:02 frequentist only) + DQA*02:01 + DQA*05:01 + DQA*05:05 + DQB*03:19 + DQB*02:02. We observed two additional independent SNP associations in both populations: EUR rs146903072 and rs501480; AA rs389883 and rs114118665. The DR2 serotype was best explained by DRB1*15:03 + DQA*01:02 in AA and by DRB1*15:01 + DQA*01:02 in EUR. The DR3 serotype was best explained by DQA*05:01 in AA and by DQB*02:01 in EUR. Despite some differences in underlying HLA allele risk models in EUR and AA, SNP signals across the extended MHC showed remarkable similarity and significant concordance in direction of effect for risk-associated variants.
Subjects
Genetic Predisposition to Disease , Systemic Lupus Erythematosus/genetics , Major Histocompatibility Complex/genetics , Single Nucleotide Polymorphism , Black or African American/genetics , Female , Genetic Association Studies , Haplotypes , Humans , Male , Genetic Models , White People/genetics
ABSTRACT
SUMMARY: HLA*LA implements a new graph alignment model for human leukocyte antigen (HLA) type inference, based on the projection of linear alignments onto a variation graph. It enables accurate HLA type inference from whole-genome (99% accuracy) and whole-exome (93% accuracy) Illumina data; from long-read Oxford Nanopore and Pacific Biosciences data (98% accuracy for whole-genome and targeted data) and from genome assemblies. Computational requirements for a typical sample vary between 0.7 and 14 CPU hours per sample. AVAILABILITY AND IMPLEMENTATION: HLA*LA is implemented in C++ and Perl and freely available as a bioconda package or from https://github.com/DiltheyLab/HLA-LA (GPL v3). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
High-Throughput Nucleotide Sequencing , Software , Genome , Histocompatibility Testing , Humans , DNA Sequence Analysis
ABSTRACT
Despite the key role of the human ribosome in protein biosynthesis, little is known about the extent of sequence variation in ribosomal DNA (rDNA) or its pre-rRNA and rRNA products. We recovered ribosomal DNA segments from a single human chromosome 21 using transformation-associated recombination (TAR) cloning in yeast. Accurate long-read sequencing of 13 isolates covering ∼0.82 Mb of the chromosome 21 rDNA complement revealed substantial variation among tandem repeat rDNA copies, several palindromic structures and potential errors in the previous reference sequence. These clones revealed 101 variant positions in the 45S transcription unit and 235 in the intergenic spacer sequence. Approximately 60% of the 45S variants were confirmed in independent whole-genome or RNA-seq data, with 47 of these further observed in mature 18S/28S rRNA sequences. TAR cloning and long-read sequencing enabled the accurate reconstruction of multiple rDNA units and a new, high-quality 44 838 bp rDNA reference sequence, which we have annotated with variants detected from chromosome 21 of a single individual. The large number of variants observed reveals heterogeneity in human rDNA, opening up the possibility of corresponding variations in ribosome dynamics.
Subjects
Human Chromosome Pair 21 , Ribosomal DNA/chemistry , rRNA Genes , Genetic Variation , Animals , Cell Line , Molecular Cloning , Ribosomal DNA/isolation & purification , Ribosomal Spacer DNA/chemistry , Humans , Mice , Nucleic Acid Conformation , Nucleolus Organizer Region/chemistry , Ribosomal RNA/chemistry , Ribosomal RNA/metabolism , DNA Sequence Analysis
ABSTRACT
We whole-genome sequenced 55 SARS-CoV-2 isolates from Germany to investigate SARS-CoV-2 outbreaks in 2020 in the Heinsberg district and Düsseldorf. While the genetic structure of the Heinsberg outbreak indicates a clonal origin, reflecting superspreading dynamics from mid-February during the carnival season, distinct viral strains were circulating in Düsseldorf in March, reflecting the city's international links. Limited detection of Heinsberg strains in the Düsseldorf area despite geographical proximity may reflect efficient containment and contact-tracing efforts.
Subjects
Betacoronavirus/genetics , Clinical Laboratory Techniques/methods , Coronavirus Infections/diagnosis , Viral Genome/genetics , Pandemics , Viral Pneumonia/diagnosis , Whole Genome Sequencing/methods , Betacoronavirus/isolation & purification , Betacoronavirus/pathogenicity , COVID-19 , COVID-19 Testing , Coronavirus Infections/epidemiology , Disease Outbreaks , Germany/epidemiology , Humans , Viral Pneumonia/epidemiology , RNA-Directed DNA Polymerase , Reverse Transcriptase Polymerase Chain Reaction , SARS-CoV-2
ABSTRACT
Motivation: Whole-genome alignment is an important problem in genomics for comparing different species, mapping draft assemblies to reference genomes and identifying repeats. However, for large plant and animal genomes, this task remains compute and memory intensive. In addition, current practical methods lack any guarantee on the characteristics of output alignments, thus making them hard to tune for different application requirements. Results: We introduce an approximate algorithm for computing local alignment boundaries between long DNA sequences. Given a minimum alignment length and an identity threshold, our algorithm computes the desired alignment boundaries and identity estimates using k-mer-based statistics, and maintains sufficient probabilistic guarantees on the output sensitivity. Further, to prioritize higher scoring alignment intervals, we develop a plane-sweep based filtering technique which is theoretically optimal and practically efficient. Implementation of these ideas resulted in a fast and accurate assembly-to-genome and genome-to-genome mapper. As a result, we were able to map an error-corrected whole-genome NA12878 human assembly to the hg38 human reference genome in about 1 min total execution time and <4 GB memory using eight CPU threads, achieving significant improvement in memory usage over competing methods. Recall accuracy of computed alignment boundaries was consistently found to be >97% on multiple datasets. Finally, we performed a sensitive self-alignment of the human genome to compute all duplications of length ≥1 Kbp and ≥90% identity. The reported output achieves good recall and covers twice as many bases as the current UCSC browser's segmental duplication annotation. Availability and implementation: https://github.com/marbl/MashMap.
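To make "identity estimates using k-mer-based statistics" concrete, the following minimal sketch converts a k-mer Jaccard similarity into an identity estimate using the Mash distance formula, D = -(1/k) ln(2j / (1 + j)). The abstract does not state the exact estimator, so the use of this formula, the exact (unsketched) k-mer sets, and the function names are all assumptions; the real tool works on sampled sketches rather than full k-mer sets.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identity_estimate(seq_a, seq_b, k=16):
    """Estimate sequence identity from k-mer Jaccard similarity.

    Uses the Mash distance D = -(1/k) * ln(2j / (1 + j)) and reports 1 - D,
    clamped to [0, 1]. Assumed here for illustration only.
    """
    j = jaccard(kmers(seq_a, k), kmers(seq_b, k))
    if j == 0.0:
        return 0.0  # no shared k-mers: treat as unrelated
    distance = -math.log(2.0 * j / (1.0 + j)) / k
    return min(1.0, max(0.0, 1.0 - distance))


# Two 12-base sequences that differ at a single position, compared with k = 4.
estimate = identity_estimate("ACGTTGCAAGCT", "ACGTTGCAAGGT", k=4)
```

A mapper built on this idea can discard candidate regions whose estimated identity falls below the user-supplied threshold without computing a base-level alignment.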
Subjects
Algorithms , Genomics , High-Throughput Nucleotide Sequencing/methods , Base Sequence , Chromosome Mapping , Human Genome , Genomics/methods , Humans , Genomic Segmental Duplications , Sequence Alignment , Software , Time Factors
ABSTRACT
Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR∗IMP, a method for imputation of KIR copy number. We show that KIR∗IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease.
Subjects
Asthma/genetics , DNA Copy Number Variations/genetics , Atopic Dermatitis/genetics , Genetic Predisposition to Disease , Single Nucleotide Polymorphism/genetics , KIR Receptors/classification , KIR Receptors/genetics , Case-Control Studies , Cohort Studies , Europe , Family , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , Male , DNA Sequence Analysis
ABSTRACT
Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 × 100 bp, 11 samples with 2 × 250 bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant challenge to practical application.
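The step of inferring "the most likely pair of underlying alleles" with "a simple likelihood framework" can be illustrated with a minimal sketch. The abstract does not specify the model, so the diploid mixture used here (each read drawn from either allele with probability 0.5), the input layout, and the name `best_allele_pair` are assumptions for illustration, not the HLA*PRG implementation.

```python
import itertools
import math


def best_allele_pair(read_likelihoods):
    """Pick the allele pair that maximizes the total read log-likelihood.

    read_likelihoods is a list with one dict per read, mapping each candidate
    allele name to an assumed, precomputed P(read | allele).
    Returns ((allele_1, allele_2), log_likelihood).
    """
    alleles = sorted(read_likelihoods[0])
    best_pair, best_ll = None, -math.inf
    # Homozygous pairs (a, a) are included via combinations_with_replacement.
    for a, b in itertools.combinations_with_replacement(alleles, 2):
        ll = 0.0
        for probs in read_likelihoods:
            p = 0.5 * probs[a] + 0.5 * probs[b]  # read comes from either copy
            if p <= 0.0:
                ll = -math.inf
                break
            ll += math.log(p)
        if ll > best_ll:
            best_pair, best_ll = (a, b), ll
    return best_pair, best_ll


# Toy data: two reads fit allele A1 best, two fit A2 best; A3 fits none well.
reads = [
    {"A1": 0.9, "A2": 0.1, "A3": 0.1},
    {"A1": 0.9, "A2": 0.1, "A3": 0.1},
    {"A1": 0.1, "A2": 0.9, "A3": 0.1},
    {"A1": 0.1, "A2": 0.9, "A3": 0.1},
]
pair, log_likelihood = best_allele_pair(reads)
```

With thousands of known alleles per locus, exhaustive pair enumeration of this kind grows quadratically, which is one plausible contributor to the computational cost noted in the abstract.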
Subjects
Algorithms , Chromosome Mapping/methods , Population Genetics , Human Genome/genetics , Hemochromatosis Protein/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Reference Values
ABSTRACT
Multiple sclerosis is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability. Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals, and systematic attempts to identify linkage in multiplex families have confirmed that variation within the major histocompatibility complex (MHC) exerts the greatest individual effect on risk. Modestly powered genome-wide association studies (GWAS) have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects have a key role in disease susceptibility. Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9,772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the HLA-DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the class I region. Immunologically relevant genes are significantly overrepresented among those mapping close to the identified loci and particularly implicate T-helper-cell differentiation in the pathogenesis of multiple sclerosis.
Subjects
Genetic Predisposition to Disease/genetics , Cellular Immunity/immunology , Multiple Sclerosis/genetics , Multiple Sclerosis/immunology , Alleles , Cell Differentiation/immunology , Europe/ethnology , Human Genome/genetics , Genome-Wide Association Study , HLA-A Antigens/genetics , HLA-DR Antigens/genetics , HLA-DRB1 Chains , Humans , Cellular Immunity/genetics , Major Histocompatibility Complex/genetics , Single Nucleotide Polymorphism/genetics , Sample Size , Helper-Inducer T-Lymphocytes/cytology , Helper-Inducer T-Lymphocytes/immunology
ABSTRACT
An important paradigm in evolutionary genetics is that of a delicate balance between genetic variants that favorably boost host control of infection but which may unfavorably increase susceptibility to autoimmune disease. Here, we investigated whether patients with psoriasis, a common immune-mediated disease of the skin, are enriched for genetic variants that limit the ability of HIV-1 virus to replicate after infection. We analyzed the HLA class I and class II alleles of 1,727 Caucasian psoriasis cases and 3,581 controls and found that psoriasis patients are significantly more likely than controls to have gene variants that are protective against HIV-1 disease. This includes several HLA class I alleles associated with HIV-1 control; amino acid residues at HLA-B positions 67, 70, and 97 that mediate HIV-1 peptide binding; and the deletion polymorphism rs67384697 associated with high surface expression of HLA-C. We also found that the compound genotype KIR3DS1 plus HLA-B Bw4-80I, which respectively encode a natural killer cell activating receptor and its putative ligand, significantly increased psoriasis susceptibility. This compound genotype has also been associated with delay of progression to AIDS. Together, our results suggest that genetic variants that contribute to anti-viral immunity may predispose to the development of psoriasis.
Subjects
MHC Class II Genes , MHC Class I Genes , Psoriasis/genetics , Psoriasis/immunology , MHC Class I Genes/immunology , MHC Class II Genes/immunology , Genetic Association Studies , Genetic Predisposition to Disease , HIV Infections/genetics , HIV Infections/immunology , HIV-1/genetics , HIV-1/pathogenicity , HLA-B Antigens/genetics , HLA-C Antigens/genetics , Humans , Natural Killer Cells/immunology , Natural Killer Cells/metabolism , Natural Killer Cells/virology , Genetic Polymorphism , Protein Binding , KIR3DS1 Receptors/genetics
ABSTRACT
Statistical imputation of classical HLA alleles in case-control studies has become established as a valuable tool for identifying and fine-mapping signals of disease association in the MHC. Imputation into diverse populations has, however, remained challenging, mainly because of the additional haplotypic heterogeneity introduced by combining reference panels of different sources. We present an HLA type imputation model, HLA*IMP:02, designed to operate on a multi-population reference panel. HLA*IMP:02 is based on a graphical representation of haplotype structure. We present a probabilistic algorithm to build such models for the HLA region, accommodating genotyping error, haplotypic heterogeneity and the need for maximum accuracy at the HLA loci, generalizing the work of Browning and Browning (2007) and Ron et al. (1998). HLA*IMP:02 achieves an average 4-digit imputation accuracy on diverse European panels of 97% (call rate 97%). On non-European samples, 2-digit performance is over 90% for most loci and ethnicities where data are available. HLA*IMP:02 supports imputation of HLA-DPB1 and HLA-DRB3-5, is highly tolerant of missing data in the imputation panel and works on standard genotype data from popular genotyping chips. It is publicly available in source code and as a user-friendly web service framework.
Subjects
Computational Biology/methods , Population Genetics/methods , HLA Antigens/genetics , Genetic Models , Immunological Models , Haplotypes , Humans , Single Nucleotide Polymorphism , Principal Component Analysis , Racial Groups , Reproducibility of Results , Software
ABSTRACT
16S rRNA targeted amplicon sequencing is an established standard for elucidating microbial community composition. While high-throughput short-read sequencing can capture only a portion of the 16S rRNA gene because of its limited read length, third-generation sequencing can read the 16S rRNA gene in its entirety and thus provide more precise taxonomic classification. Here, we present a protocol for generating full-length 16S rRNA sequences with Oxford Nanopore Technologies (ONT) and a microbial community profile with Emu. We select Emu for analyzing ONT sequences as it leverages information from the entire community to overcome errors due to incomplete reference databases and hardware limitations to ultimately obtain species-level resolution. This pipeline provides a low-cost solution for characterizing microbiome composition by exploiting real-time, long-read ONT sequencing and tailored software for accurate characterization of microbial communities. © 2024 Wiley Periodicals LLC. Basic Protocol: Microbial community profiling with Emu. Support Protocol 1: Full-length 16S rRNA microbial sequences with Oxford Nanopore Technologies sequencing platform. Support Protocol 2: Building a custom reference database for Emu.