ABSTRACT
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Subjects
Human Genome , Genomics , Humans , Diploidy , Human Genome/genetics , Haplotypes/genetics , DNA Sequence Analysis , Genomics/standards , Reference Standards , Cohort Studies , Alleles , Genetic Variation
ABSTRACT
Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder, a pipeline for constructing pangenome graphs without bias or exclusion. The PanGenome Graph Builder uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events and infer phylogenetic relationships.
ABSTRACT
MOTIVATION: Pangenome graphs offer a comprehensive way of capturing genomic variability across multiple genomes. However, current construction methods often introduce biases, excluding complex sequences or relying on references. The PanGenome Graph Builder (PGGB) addresses these issues. To date, though, there is no state-of-the-art pipeline allowing for easy deployment, efficient and dynamic use of available resources, and scalable usage at the same time. RESULTS: To overcome these limitations, we present nf-core/pangenome, a reference-unbiased approach implemented in Nextflow following nf-core's best practices. Leveraging biocontainers ensures portability and seamless deployment in HPC environments. Unlike PGGB, nf-core/pangenome distributes alignments across cluster nodes, enabling scalability. Demonstrating its efficiency, we constructed pangenome graphs for 1000 human chromosome 19 haplotypes and 2146 E. coli sequences, achieving a two to threefold speedup compared to PGGB without increasing greenhouse gas emissions. AVAILABILITY: Nf-core/pangenome is released under the MIT open-source license, available on GitHub and Zenodo, with documentation accessible at https://nf-co.re/pangenome/1.1.2/docs/usage. SUPPLEMENTARY: Supplementary data are available at Bioinformatics online.
ABSTRACT
MOTIVATION: The increasing availability of complete genomes calls for models to study genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. In order to understand them, we need to see them. For visualization, we need a human-readable graph layout: an embedding of the graph in a low-dimensional (e.g. two-dimensional) space. Because a pangenome graph can be extremely large, this is a significant challenge. RESULTS: In response, we introduce a novel graph layout algorithm: the Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by SGD. We show that our implementation efficiently computes the low-dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features. AVAILABILITY AND IMPLEMENTATION: We integrated PG-SGD in ODGI, which is released as free software under the MIT open source license. Source code is available at https://github.com/pangenome/odgi.
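The path-guided sampling idea described above can be illustrated with a small, single-threaded sketch. It is not the ODGI implementation: the function names, the uniform pair sampling, the annealing schedule and all parameter defaults are assumptions chosen for brevity. The only part taken from the abstract is the core idea that each path supplies base-pair positions, so the target distance for a sampled node pair is read off the path instead of being computed for all pairs.

```python
import math
import random


def path_positions(path, node_len):
    """Return the start offset (in bp) of every step along one path."""
    offsets, pos = [], 0
    for node in path:
        offsets.append(pos)
        pos += node_len[node]
    return offsets


def pg_sgd_layout(paths, node_len, iters=30, steps_scale=10, eps=0.01, seed=0):
    """Toy 2D layout: sample node pairs from paths, pull them to path distance.

    paths     -- list of paths, each a list of node ids (hypothetical input form)
    node_len  -- dict mapping node id to sequence length in bp
    """
    rng = random.Random(seed)
    xy = {n: [rng.random(), rng.random()] for n in node_len}
    offsets = [path_positions(p, node_len) for p in paths]
    usable = [i for i, p in enumerate(paths) if len(p) >= 2]
    if not usable:
        return xy
    # Largest target distance seen on any path; sets the initial step size.
    d_max = max(1, max(offsets[i][-1] for i in usable))
    eta_max, eta_min = float(d_max ** 2), eps
    decay = math.log(eta_max / eta_min) / max(iters - 1, 1)
    steps_per_iter = steps_scale * sum(len(paths[i]) for i in usable)

    for t in range(iters):
        eta = eta_max * math.exp(-decay * t)  # annealed learning rate
        for _ in range(steps_per_iter):
            pi = rng.choice(usable)
            i, j = rng.sample(range(len(paths[pi])), 2)
            a, b = paths[pi][i], paths[pi][j]
            d = abs(offsets[pi][i] - offsets[pi][j])  # distance along the path
            if a == b or d == 0:
                continue
            mu = min(eta / (d * d), 1.0)  # weight 1/d^2, capped at a full step
            dx, dy = xy[a][0] - xy[b][0], xy[a][1] - xy[b][1]
            mag = max(math.hypot(dx, dy), 1e-9)
            r = mu * (mag - d) / (2.0 * mag)
            xy[a][0] -= r * dx
            xy[a][1] -= r * dy
            xy[b][0] += r * dx
            xy[b][1] += r * dy
    return xy


if __name__ == "__main__":
    lengths = {n: 10 for n in range(1, 6)}
    layout = pg_sgd_layout([[1, 2, 3, 4, 5]], lengths)
    for node, (x, y) in sorted(layout.items()):
        print(node, round(x, 1), round(y, 1))
```

Each update touches one pair only, so the cost per iteration grows with the number of sampled path steps and not with the square of the node count, which is the property the abstract highlights.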
Subjects
Algorithms , Software , Humans , Genomics/methods , Computer Graphics , Genome
ABSTRACT
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19 infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets.
Subjects
COVID-19 , Computational Biology , T-Cell Antigen Receptors , SARS-CoV-2 , Workflow , Humans , COVID-19/immunology , COVID-19/virology , COVID-19/genetics , SARS-CoV-2/immunology , SARS-CoV-2/genetics , T-Cell Antigen Receptors/genetics , T-Cell Antigen Receptors/immunology , Computational Biology/methods , B-Cell Antigen Receptors/genetics , B-Cell Antigen Receptors/immunology , Software , Single-Cell Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Adaptive Immunity/genetics , B-Lymphocytes/immunology , T-Lymphocytes/immunology
ABSTRACT
Low-cost whole-genome assembly has enabled the collection of haplotype-resolved pangenomes for numerous organisms. In turn, this technological change is encouraging the development of methods that can precisely address the sequence and variation described in large collections of related genomes. These approaches often use graphical models of the pangenome to support algorithms for sequence alignment, visualization, functional genomics, and association studies. The additional information provided to these methods by the pangenome allows them to achieve superior performance on a variety of bioinformatic tasks, including read alignment, variant calling, and genotyping. Pangenome graphs stand to become a ubiquitous tool in genomics. Although it is unclear whether they will replace linear reference genomes, their ability to harmoniously relate multiple sequence and coordinate systems will make them useful irrespective of which pangenomic models become most common in the future.
Subjects
Algorithms , Computational Biology/methods , Computer Graphics , Human Genome , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis
ABSTRACT
MOTIVATION: Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way. RESULTS: We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. AVAILABILITY AND IMPLEMENTATION: ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
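To make the Graphical Fragment Assembly (GFA) input mentioned above concrete, the following sketch reads the three GFA version 1 record types that define a variation graph (S for segments, L for links, P for paths) and spells out the sequence of a path. It is plain Python for illustration only and does not use or mirror any ODGI API; the function names and the tiny example graph are assumptions.

```python
def parse_gfa(text):
    """Collect segments, links and paths from GFA v1 text (tab-separated)."""
    segments, links, paths = {}, [], {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":            # S <name> <sequence>
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":          # L <from> <orient> <to> <orient> <overlap>
            links.append((fields[1], fields[2], fields[3], fields[4]))
        elif fields[0] == "P":          # P <name> <seg+,seg-,...> <overlaps>
            paths[fields[1]] = [(s[:-1], s[-1]) for s in fields[2].split(",")]
    return segments, links, paths


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def path_sequence(segments, steps):
    """Concatenate segment sequences along a path, honouring orientation."""
    parts = []
    for name, orient in steps:
        seq = segments[name]
        if orient == "-":               # reverse strand: reverse complement
            seq = seq.translate(_COMPLEMENT)[::-1]
        parts.append(seq)
    return "".join(parts)


# A hypothetical four-node graph with one SNP bubble (T versus G).
EXAMPLE_GFA = "\n".join("\t".join(row) for row in [
    ("H", "VN:Z:1.0"),
    ("S", "1", "ACG"),
    ("S", "2", "T"),
    ("S", "3", "G"),
    ("S", "4", "TTA"),
    ("L", "1", "+", "2", "+", "0M"),
    ("L", "1", "+", "3", "+", "0M"),
    ("L", "2", "+", "4", "+", "0M"),
    ("L", "3", "+", "4", "+", "0M"),
    ("P", "ref", "1+,2+,4+", "*"),
    ("P", "alt", "1+,3+,4+", "*"),
])

if __name__ == "__main__":
    segs, lnks, pths = parse_gfa(EXAMPLE_GFA)
    for name, steps in pths.items():
        print(name, path_sequence(segs, steps))
```

The sketch assumes blunt-ended links (overlap `0M` or `*`), which is the usual case for variation graphs; overlapping segments would need trimming before concatenation.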
Subjects
Genome , Software , Genomics , Algorithms , Documentation
ABSTRACT
Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disease mainly affecting upper and lower motoneurons. Several functionally heterogeneous genes have been associated with the familial form of this disorder (fALS), depicting an extremely complex pathogenic landscape. This heterogeneity has limited the identification of an effective therapy, and this bleak prognosis will only improve with a greater understanding of convergent disease mechanisms. Recent evidence from human post-mortem material and diverse model systems has highlighted the synapse as a crucial structure actively involved in disease progression, suggesting that synaptic aberrations might represent a shared pathological feature across the ALS spectrum. To test this hypothesis, we performed the first comprehensive analysis of the synaptic proteome from post-mortem spinal cord and human iPSC-derived motoneurons carrying mutations in the major ALS genes. This integrated approach highlighted perturbations in the molecular machinery controlling vesicle release as a shared pathomechanism in ALS. Mechanistically, phosphoproteomic analysis linked the presynaptic vesicular phenotype to an accumulation of cytotoxic protein aggregates and to the pro-apoptotic activation of the transcription factor c-Jun, providing detailed insights into the shared pathobiochemistry in ALS. Notably, sub-chronic treatment of our iPSC-derived motoneurons with the fatty acid docosahexaenoic acid exerted a neuroprotective effect by efficiently rescuing the alterations revealed by our multidisciplinary approach. Together, this study provides strong evidence for the central and convergent role played by the synaptic microenvironment within the ALS spinal cord and highlights a potential therapeutic target that counteracts degeneration in a heterogeneous cohort of human motoneuron cultures.
Subjects
Amyotrophic Lateral Sclerosis , Neurodegenerative Diseases , Humans , Amyotrophic Lateral Sclerosis/pathology , Neurodegenerative Diseases/pathology , Proteomics , Superoxide Dismutase-1/genetics , Motor Neurons/metabolism
ABSTRACT
MOTIVATION: Pangenomics is a growing field within computational genomics. Many pangenomic analyses use bidirected sequence graphs as their core data model. However, implementing and correctly using this data model can be difficult, and the scale of pangenomic datasets can be challenging to work at. These challenges have impeded progress in this field. RESULTS: Here, we present a stack of two C++ libraries, libbdsg and libhandlegraph, which use a simple, field-proven interface, designed to expose elementary features of these graphs while preventing common graph manipulation mistakes. The libraries also provide a Python binding. Using a diverse collection of pangenome graphs, we demonstrate that these tools allow for efficient construction and manipulation of large genome graphs with dense variation. For instance, the speed and memory usage are up to an order of magnitude better than the prior graph implementation in the VG toolkit, which has now transitioned to using libbdsg's implementations. AVAILABILITY AND IMPLEMENTATION: libhandlegraph and libbdsg are available under an MIT License from https://github.com/vgteam/libhandlegraph and https://github.com/vgteam/libbdsg.
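The bidirected data model that the abstract calls difficult to use correctly can be sketched in a few lines. The class below is a teaching example in plain Python, not the libhandlegraph or libbdsg interface; every name in it is hypothetical. It shows the two rules that most often cause mistakes: a node can be visited in either orientation, and an edge from a to b is the same edge as one from the flip of b to the flip of a.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def flip(handle):
    """Return the same node in the opposite orientation."""
    node_id, is_reverse = handle
    return (node_id, not is_reverse)


class BidirectedGraph:
    """Minimal bidirected sequence graph; a handle is (node_id, is_reverse)."""

    def __init__(self):
        self._seq = {}        # node_id -> forward-strand sequence
        self._right = {}      # handle -> set of handles reachable to its right

    def add_node(self, node_id, sequence):
        self._seq[node_id] = sequence
        self._right.setdefault((node_id, False), set())
        self._right.setdefault((node_id, True), set())

    def add_edge(self, left, right):
        # Store both readings of the edge so traversal works on either strand.
        self._right[left].add(right)
        self._right[flip(right)].add(flip(left))

    def follow(self, handle, go_left=False):
        """Return neighbouring handles on the right (or left) side of a handle."""
        if go_left:
            return {flip(h) for h in self._right[flip(handle)]}
        return set(self._right[handle])

    def sequence(self, handle):
        """Return the sequence as read in the handle's orientation."""
        node_id, is_reverse = handle
        seq = self._seq[node_id]
        return seq.translate(_COMPLEMENT)[::-1] if is_reverse else seq


if __name__ == "__main__":
    g = BidirectedGraph()
    g.add_node(1, "ACG")
    g.add_node(2, "TT")
    g.add_edge((1, False), (2, True))   # forward node 1 joins reversed node 2
    print(g.follow((1, False)), g.sequence((2, True)))
```

Keeping orientation inside the handle means calling code never has to reason about which end of a node an edge is attached to, which is one way to read the abstract's claim about preventing common graph manipulation mistakes.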
Subjects
Libraries , Software , Genome , Genomics
ABSTRACT
BACKGROUND: Immunotherapy with immune checkpoint inhibitors (ICI) has revolutionized cancer therapy. However, therapeutic targeting of inhibitory T cell receptors such as PD-1 not only initiates a broad immune response against tumors, but also causes severe adverse effects. An ideal future stratified immunotherapy would interfere with cancer-specific cell surface receptors only. METHODS: To identify such candidates, we profiled the surface receptors of the NCI-60 tumor cell panel via flow cytometry. The resulting surface receptor expression data were integrated into proteomic and transcriptomic NCI-60 datasets applying a sophisticated multiomics multiple co-inertia analysis (MCIA). This allowed us to identify surface profiles for skin, brain, colon, kidney, and bone marrow derived cell lines and cancer entity-specific cell surface receptor biomarkers for colon and renal cancer. RESULTS: For colon cancer, identified biomarkers are CD15, CD104, CD324, CD326, CD49f, and for renal cancer, CD24, CD26, CD106 (VCAM1), EGFR, SSEA-3 (B3GALT5), SSEA-4 (TMCC1), TIM1 (HAVCR1), and TRA-1-60R (PODXL). Further data mining revealed that CD106 (VCAM1) in particular is a promising novel immunotherapeutic target for the treatment of renal cancer. CONCLUSION: Altogether, our innovative multiomics analysis of the NCI-60 panel represents a highly valuable resource for uncovering surface receptors that could be further exploited for diagnostic and therapeutic purposes in the context of cancer immunotherapy.
ABSTRACT
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19 infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets. nf-core/airrflow is available free of charge, under the MIT license on GitHub (https://github.com/nf-core/airrflow). 
Detailed documentation and example results are available on the nf-core website (https://nf-co.re/airrflow).
ABSTRACT
Motivation: The increasing availability of complete genomes calls for models to study genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. In order to understand them, we need to see them. For visualization, we need a human-readable graph layout: an embedding of the graph in a low-dimensional (e.g. two-dimensional) space. Because a pangenome graph can be extremely large, this is a significant challenge. Results: In response, we introduce a novel graph layout algorithm: the Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by Stochastic Gradient Descent (SGD). We show that our implementation efficiently computes the low-dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features. Availability: We integrated PG-SGD in ODGI, which is released as free software under the MIT open source license. Source code is available at https://github.com/pangenome/odgi.
ABSTRACT
Pangenome graphs can represent all variation between multiple genomes, but existing methods for constructing them are biased due to reference-guided approaches. In response, we have developed PanGenome Graph Builder (PGGB), a reference-free pipeline for constructing unbiased pangenome graphs. PGGB uses all-to-all whole-genome alignments and learned graph embeddings to build and iteratively refine a model in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.
ABSTRACT
The trait-based approach in plant ecology aims at understanding and classifying the diversity of ecological strategies by comparing plant morphology and physiology across organisms. The major drawback of the approach is that the time and financial cost of measuring the traits across many individuals and environments can be prohibitive. We show that combining near-infrared spectroscopy (NIRS) with deep learning resolves this limitation by quickly, non-destructively, and accurately measuring a suite of traits, including plant morphology, chemistry, and metabolism. Such an approach also makes it possible to position plants within the well-known CSR triangle that depicts the diversity of plant ecological strategies. The processing of NIRS through deep learning identifies the effect of growth conditions on trait values, an issue that plagues traditional statistical approaches. Together, the coupling of NIRS and deep learning is a promising high-throughput approach to capture a range of ecological information on plant diversity and functioning and can accelerate the creation of extensive trait databases.
ABSTRACT
Double negative (DN) (CD19+CD20lowCD27-IgD-) B cells are expanded in patients with autoimmune and infectious diseases; however, their role in the humoral immune response remains unclear. Using systematic flow cytometric analyses of peripheral blood B cell subsets, we observed an inflated DN B cell population in patients with a variety of active inflammatory conditions: myasthenia gravis, Guillain-Barré syndrome, neuromyelitis optica spectrum disorder, meningitis/encephalitis, and rheumatic disorders. Furthermore, we were able to induce DN B cells in healthy subjects following vaccination against influenza and tick-borne encephalitis virus. Transcriptome analysis revealed a gene expression profile in DN B cells that clustered with naïve B cells, memory B cells, and plasmablasts. Immunoglobulin VH transcriptome sequencing and analysis of recombinant antibodies revealed clonal expansion of DN B cells that were targeted against the vaccine antigen. Our study suggests that DN B cells are expanded in multiple inflammatory neurologic diseases and represent an inducible B cell population that responds to antigenic stimulation, possibly through an extra-follicular maturation pathway.
Subjects
B-Lymphocytes/immunology , Cell Proliferation , Communicable Diseases/immunology , Vaccine Immunogenicity , Inflammation/immunology , Lymphocyte Activation , Viral Vaccines/immunology , Adolescent , Adult , Aged , Aged 80 and Over , Antiviral Antibodies/blood , CD19 Antigens/metabolism , CD20 Antigens/metabolism , B-Lymphocytes/metabolism , Case-Control Studies , Communicable Diseases/blood , Communicable Diseases/genetics , Communicable Diseases/virology , Tick-Borne Encephalitis Viruses/immunology , Female , Humans , Humoral Immunity , Inflammation/blood , Inflammation/genetics , Influenza Vaccines/administration & dosage , Influenza Vaccines/immunology , Male , Middle Aged , Phenotype , Transcriptome , Tumor Necrosis Factor Receptor Superfamily Member 7/metabolism , Vaccination , Viral Vaccines/administration & dosage , Young Adult
ABSTRACT
Dendritic cells (DCs) are key players of the immune system and thus a target for immune evasion by pathogens. We recently showed that the virulence factors phenol-soluble-modulins (PSMs) produced by community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) strains induce tolerogenic DCs upon Toll-like receptor activation via the p38-CREB-IL-10 pathway in vitro. Here, we addressed the hypothesis that S. aureus PSMs disturb the adaptive immune response via modulation of DC subsets in vivo. Using a systemic mouse infection model we found that S. aureus reduced the numbers of splenic DC subsets, mainly CD4+ and CD8+ DCs independently of PSM secretion. S. aureus infection induced upregulation of the C-C motif chemokine receptor 7 (CCR7) on the surface of all DC subsets, on CD4+ DCs in a PSM-dependent manner, together with increased expression of MHCII, CD86, CD80, CD40, and the co-inhibitory molecule PD-L2, with only minor effects of PSMs. Moreover, PSMs increased IL-10 production in the spleen and impaired TNF production by CD4+ DCs. Besides, S. aureus PSMs reduced the number of CD4+ T cells in the spleen, whereas CD4+CD25+Foxp3+ regulatory T cells (Tregs) were increased. In contrast, Th1 and Th17 priming and IFN-γ production by CD8+ T cells were impaired by S. aureus PSMs. Thus, PSMs from highly virulent S. aureus strains modulate the adaptive immune response in the direction of tolerance by affecting DC functions.
Subjects
Adaptive Immunity/immunology , Bacterial Toxins/immunology , Dendritic Cells/immunology , Methicillin-Resistant Staphylococcus aureus/immunology , Staphylococcal Infections/immunology , Animals , CD8-Positive T-Lymphocytes/immunology , Cytokines/metabolism , Female , Immune Tolerance/immunology , Methicillin-Resistant Staphylococcus aureus/genetics , Mice , Inbred C57BL Mice , Knockout Mice , Staphylococcal Infections/microbiology , Regulatory T-Lymphocytes/immunology , Th1 Cells/immunology , Th17 Cells/immunology
ABSTRACT
Psoriasis is a frequent systemic inflammatory autoimmune disease characterized primarily by skin lesions with massive infiltration of leukocytes, but it frequently also presents with cardiovascular comorbidities. Polymorphonuclear neutrophils (PMNs) in particular abundantly infiltrate psoriatic skin, but the cues that prompt PMNs to home to the skin are not well defined. To identify PMN surface receptors that may explain PMN skin homing in psoriasis patients, we screened 332 surface antigens on primary human blood PMNs from healthy donors and psoriasis patients. We identified platelet surface antigens as a defining feature of psoriasis PMNs, due to a significantly increased aggregation of neutrophils and platelets in the blood of psoriasis patients. Similarly, in the imiquimod-induced experimental in vivo mouse model of psoriasis, disease induction promoted PMN-platelet aggregate formation. In psoriasis patients, disease incidence directly correlated with blood platelet counts, and platelets were detected in direct contact with PMNs in psoriatic but not healthy skin. Importantly, depletion of circulating platelets in mice in vivo ameliorated disease severity significantly, indicating that both PMNs and platelets may be relevant for psoriasis pathology and disease severity.
Subjects
Blood Platelets/immunology , Neutrophils/immunology , Platelet Aggregation/immunology , Psoriasis/immunology , Skin/pathology , Adult , Animals , Humans , Imiquimod/toxicity , Mice , Inbred C57BL Mice , Platelet Activation/immunology , Platelet Count , Psoriasis/pathology
ABSTRACT
The population dynamics of the Pleistocene woolly mammoth (Mammuthus primigenius) has been the subject of intensive palaeogenetic research. Although a large number of mitochondrial genomes across Eurasia have been reconstructed, the available data remains geographically sparse and mostly focused on eastern Eurasia. Thus, population dynamics in other regions have not been extensively investigated. Here, we use a multi-method approach utilising proteomic, stable isotope and genetic techniques to identify and generate twenty woolly mammoth mitochondrial genomes, and associated dietary stable isotopic data, from highly fragmentary Late Pleistocene material from central Europe. We begin to address region-specific questions regarding central European woolly mammoth populations, highlighting parallels with a previous replacement event in eastern Eurasia ten thousand years earlier. A high number of shared derived mutations between woolly mammoth mitochondrial clades are identified, questioning previous phylogenetic analysis and thus emphasizing the need for nuclear DNA studies to explicate the increasingly complex genetic history of the woolly mammoth.