ABSTRACT
Gene therapies using recombinant adeno-associated virus (AAV) vectors have demonstrated considerable clinical success in the treatment of genetic disorders. Improved vectors with favorable tropism profiles, decreased immunogenicity, and enhanced manufacturability are poised to advance gene therapy further. Such vectors can be identified through directed evolution, a process in which a diverse capsid library is subjected to a selection pressure to identify individual variants with a desired trait. Currently, libraries that involve changes distributed throughout the AAV capsid coding region, such as DNA family-shuffled libraries, are largely characterized by low-throughput Sanger sequencing of individual clones. However, improvements in long-read sequencing technologies have increased their applicability to characterizing capsid libraries and evaluating the selection process. Here, we explore the application of Oxford Nanopore Technologies sequencing, refined by a concatemeric consensus method, to initial library characterization and to monitoring selection of a shuffled AAV capsid library. Furthermore, we present AAVolve, a bioinformatic pipeline for processing long-read data from AAV directed evolution experiments. Our approach allows streamlined, high-throughput characterization of AAV capsids, facilitating deeper insight into library composition across multiple rounds of selection and enabling generalization through the training of machine learning models.
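The concatemeric consensus idea can be illustrated with a deliberately simplified sketch. It assumes the repeated copies (subreads) of one capsid sequence have already been split out of a concatemeric read and aligned into equal-length rows, with `-` marking gaps; real consensus tools typically rely on more sophisticated alignment (for example, partial-order alignment). The function name `column_consensus` is hypothetical and is not part of AAVolve.

```python
from collections import Counter


def column_consensus(aligned_subreads: list[str]) -> str:
    """Majority-vote consensus over pre-aligned subreads of one concatemer.

    Assumes every row has the same length; '-' denotes an alignment gap.
    Columns whose most frequent symbol is a gap are dropped from the output.
    Ties are broken alphabetically so the result is deterministic.
    """
    if not aligned_subreads:
        raise ValueError("need at least one subread")
    width = len(aligned_subreads[0])
    if any(len(row) != width for row in aligned_subreads):
        raise ValueError("subreads must be aligned to equal length")

    consensus = []
    for column in zip(*aligned_subreads):
        counts = Counter(column)
        # max() returns the first maximal element, so iterating in sorted
        # order breaks ties alphabetically ('-' sorts before any base).
        winner = max(sorted(counts), key=counts.__getitem__)
        if winner != "-":
            consensus.append(winner)
    return "".join(consensus)
```

For example, three aligned copies `ACGT-ACGT`, `ACGTTACGA` and `ACGT-ACGT` collapse to `ACGTACGT`: the isolated insertion and the single mismatch in the second copy are both outvoted, which is why more repeats per concatemer yield a more accurate consensus.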
ABSTRACT
The flagellar motors of Campylobacter jejuni (C. jejuni) and related Campylobacterota (previously epsilonproteobacteria) feature 100-nm-wide periplasmic "basal disks" that have been implicated in scaffolding a wider ring of additional motor proteins to increase torque, but these disks are far larger than a role solely in scaffolding motor proteins would require. Here, we show that the basal disk is a flange that braces the flagellar motor during disentanglement of its flagellar filament from interactions with the cell body and other filaments. We show that motor output is unaffected when we shrink or displace the basal disk, and that suppressor mutations of debilitated motors occur in flagellar-filament or cell-surface glycosylation pathways, thus sidestepping the need for a flange to overcome the interactions between two flagellar filaments and between flagellar filaments and the cell body. Our results identify unanticipated co-dependencies in the evolution of flagellar motor structure and cell-surface properties in the Campylobacterota.
ABSTRACT
Seasonal influenza viruses continuously evolve via antigenic drift. This leads to recurring epidemics, globally significant mortality, and the need for annually updated vaccines. Co-occurring mutations in hemagglutinin (HA) and neuraminidase (NA) are thought to interact synergistically, potentially increasing the chances of immune escape and enhancing viral fitness. We used association rule mining to identify temporal relationships among co-occurring HA-NA mutations of influenza A/H3N2 virus and their role in antigenic evolution. A total of 64 clusters were found. These included well-known mutations responsible for antigenic drift, as well as previously undiscovered groups. A majority (41/64) were associated with known antigenic sites, and 38/64 involved mutations across both HA and NA. We also identified the emergence and disappearance of N-glycosylation sites matching the N-X-[S/T] pattern (e.g., emergence of NA:339ASP and disappearance of HA:187ASP); N-glycosylation is a crucial post-translational modification for maintaining protein stability and functional balance. Our study offers an alternative to the existing mutual-information and phylogenetic methods for identifying co-occurring mutations, enabling faster processing of large amounts of data. Our approach can facilitate the prediction of critical mutations from their occurrence in a previous season, supporting vaccine development for the next flu season and better preparation for future pandemics.
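To make the association-rule idea concrete, the sketch below computes support, confidence and lift for single-antecedent rules over toy "transactions", where each transaction is assumed to be the set of HA/NA mutations observed together (for example, in one sequence). It ignores the temporal ordering used in the study, and the mutation labels and thresholds are hypothetical. A second helper scans a protein for N-X-[S/T] glycosylation motifs, following the common convention that X may be any residue except proline, using a regex lookahead so that overlapping motifs are found.

```python
import re
from itertools import permutations


def pairwise_rules(transactions, min_support=0.1, min_confidence=0.5):
    """Mine single-antecedent association rules A -> B.

    support(A -> B)    = P(A and B)
    confidence(A -> B) = P(A and B) / P(A)
    lift(A -> B)       = confidence(A -> B) / P(B)
    Returns (antecedent, consequent, support, confidence, lift) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    freq = {item: sum(item in t for t in transactions) / n for item in items}
    rules = []
    for a, b in permutations(items, 2):
        support = sum(a in t and b in t for t in transactions) / n
        if support < min_support:
            continue
        confidence = support / freq[a]
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence, confidence / freq[b]))
    return rules


# Lookahead keeps the match zero-width, so overlapping motifs are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 0-based positions of N in N-X-[S/T] motifs (X != P)."""
    return [m.start() for m in SEQUON.finditer(protein)]
```

A lift above 1 means the two mutations co-occur more often than their individual frequencies would predict, which is the signal of interest for co-evolving HA-NA pairs.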
Subjects
Evolution, Molecular , Hemagglutinin Glycoproteins, Influenza Virus , Influenza A Virus, H3N2 Subtype , Influenza, Human , Mutation , Neuraminidase , Neuraminidase/genetics , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Humans , Influenza, Human/virology , Influenza, Human/epidemiology , Influenza A Virus, H3N2 Subtype/genetics , Influenza A Virus, H3N2 Subtype/immunology , Glycosylation , Viral Proteins/genetics , Antigens, Viral/genetics , Antigens, Viral/immunology , Antigenic Drift and Shift/genetics , Phylogeny
ABSTRACT
Colicins are antimicrobial proteins produced by certain strains of Escherichia coli that function as offensive weapons against closely related competitor strains. Their bactericidal properties and narrow bacterial targeting range have made them of therapeutic interest. Furthermore, engineered non-bactericidal colicins are of interest as cell surface-directed protein anchors for decorating E. coli with biomolecules. We previously demonstrated that an engineered non-bactericidal colicin E9 could be used to label bacterial cells with multiple biomolecules, including glycans. Herein, we extend our approach to colicin Ia, constructing mannose-presenting colicin Ia neoglycoproteins through N-terminal organocatalyst-mediated protein aldol ligation (OPAL) or maleimide ligation targeting an internal cysteine. This work further highlights the potential utility of engineered colicins for non-genetic glyco-engineering of the E. coli cell surface.
ABSTRACT
DNA sequences of nearly any desired composition, length, and function can be synthesized to alter the biology of an organism for purposes ranging from the bioproduction of therapeutic compounds to invasive pest control. Yet, despite its many benefits, engineered DNA poses a risk because of possible misuse or abuse by malicious actors, or unintentional introduction into the environment. Monitoring the presence of engineered DNA in biological or environmental systems is therefore crucial for routine and timely detection of emerging biological threats, and for improving public acceptance of genetic technologies. To address this, we developed Synsor, a tool for identifying engineered DNA sequences in high-throughput sequencing data. Synsor leverages the k-mer signature differences between naturally occurring and engineered DNA sequences and uses an artificial neural network to classify whether a DNA sequence is natural or engineered. By querying suspected sequences against the model, Synsor can identify sequences that are likely to have been engineered. Using natural plasmid and engineered vector sequences, we showed that Synsor identifies engineered DNA with >99% accuracy. We demonstrate how Synsor can be used to detect potential genetically engineered organisms and to locate where engineered DNA is being introduced into the environment by analysing genomic data from yeast and metagenomic data from wastewater samples, respectively. Synsor is therefore a powerful tool that will streamline the process of identifying engineered DNA in poorly characterized biological or environmental systems, thereby allowing enhanced monitoring of emerging biological threats.
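The k-mer signature on which this kind of classification rests can be illustrated with a short feature-extraction sketch. It turns a DNA sequence into a normalised vector of k-mer frequencies in a fixed order, the kind of input a classifier such as a neural network could consume. The choice of k = 4 and the skipping of ambiguous bases are assumptions, not Synsor's documented settings, and `kmer_vector` is a hypothetical name.

```python
from itertools import product


def kmer_vector(seq: str, k: int = 4) -> list[float]:
    """Normalised k-mer frequency vector in a fixed lexicographic order.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    The vector has 4**k entries and sums to 1 when any valid k-mer exists.
    """
    alphabet = "ACGT"
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # None for k-mers with ambiguous bases
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

Normalising by the total count makes vectors from sequences of different lengths directly comparable, so a model can focus on compositional differences (for example, codon-optimised or vector-derived motifs) rather than on sequence length.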
ABSTRACT
Biofilm formation is integral to the pathogenesis of numerous adherent bacteria and contributes to antimicrobial resistance (AMR). The rising threat of AMR makes the development of novel nonbactericidal antiadhesion approaches against such bacteria more urgent than ever. Both adherent-invasive Escherichia coli (AIEC, implicated in inflammatory bowel disease) and uropathogenic E. coli (UPEC, responsible for ~80% of urinary tract infections) adhere to terminal mannose sugars on epithelial glycoproteins through the FimH adhesin on their type 1 pilus. Although mannose-based inhibitors have previously been explored to inhibit binding of adherent bacteria to epithelial cells, this approach has been limited by the weakness of monovalent carbohydrate-protein interactions. Herein, we pioneer a novel approach to this problem through the preparation of colicin E9 bioconjugates that bind to the abundant BtuB receptor in the bacterial outer membrane, enabling multivalent presentation of functional motifs on the cell surface. We show that these bioconjugates label the surface of live E. coli, and furthermore demonstrate that mannose-presenting "glyco-colicins" induce E. coli aggregation, thereby using the bacterium itself as a multivalent platform for mannose display, which triggers binding to adjacent FimH-presenting bacteria.
ABSTRACT
The three-dimensional swimming tracks of motile microorganisms can be used to identify their species, which holds promise for the rapid identification of bacterial pathogens. The tracks also provide detailed information on the cells' responses to external stimuli such as chemical gradients and physical objects. Digital holographic microscopy (DHM) is a well-established but computationally intensive method for obtaining three-dimensional cell tracks from video microscopy data. We demonstrate that a common neural network (NN) accelerates the analysis of holographic data by an order of magnitude, enabling its use on single-board computers and in real time. We establish a heuristic relationship between the distance of a cell from the focal plane and the size of the bounding box assigned to it by the NN, allowing us to rapidly localise cells in three dimensions as they swim. This technique opens the possibility of providing real-time feedback in experiments, for example by monitoring and adapting the supply of nutrients to a microbial bioreactor in response to changes in the swimming phenotype of microbes, or of rapidly identifying bacterial pathogens in drinking water or clinical samples.
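The bounding-box-to-depth heuristic can be sketched as a calibration step followed by localisation. Purely for illustration, the sketch assumes a linear relationship, fitted by ordinary least squares to cells at known distances from the focal plane; the functional form actually used in the study is not stated here, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DepthCalibration:
    """Hypothetical linear map from bounding-box size (px) to depth z."""
    slope: float
    intercept: float

    def depth(self, box_size: float) -> float:
        return self.slope * box_size + self.intercept


def fit_calibration(box_sizes, depths) -> DepthCalibration:
    """Ordinary least-squares fit of depth = slope * box_size + intercept."""
    n = len(box_sizes)
    if n < 2 or n != len(depths):
        raise ValueError("need at least two paired measurements")
    mean_s = sum(box_sizes) / n
    mean_z = sum(depths) / n
    var_s = sum((s - mean_s) ** 2 for s in box_sizes)
    if var_s == 0:
        raise ValueError("box sizes must not all be equal")
    cov = sum((s - mean_s) * (z - mean_z) for s, z in zip(box_sizes, depths))
    slope = cov / var_s
    return DepthCalibration(slope, mean_z - slope * mean_s)


def localize(box, calib: DepthCalibration):
    """Return (x, y, z) for a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    size = ((x_max - x_min) + (y_max - y_min)) / 2  # mean side length
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, calib.depth(size))
```

Because localisation then costs only a few arithmetic operations per detection, the expensive part of the pipeline is the NN inference itself, which is what makes real-time use on modest hardware plausible.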
Subjects
Deep Learning , Holography , Microscopy , Holography/methods , Microscopy/methods , Imaging, Three-Dimensional/methods , Bacteria , Quantitative Phase Imaging
ABSTRACT
With the advancement of genomic engineering and genetic modification techniques, the uptake of computational tools for designing guide RNAs has increased drastically. Searching for genomic targets to design guides with maximum on-target activity (efficiency) and minimum off-target activity (specificity) is now an essential part of genome editing experiments. Today, a variety of tools allow users to search for genomic targets and to customize their search parameters to better suit their experiments. Here, we present an overview of different ways to visualize these CRISPR target sites along with specific downstream information, such as primer design, restriction enzyme activity, and prediction of mutational outcomes after a double-stranded break. We discuss the importance of a good visual summary for interpreting this information, along with different ways to represent similar information effectively.
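The target search that these visualizations build on can be sketched as a scan for 20-nt protospacers adjacent to an NGG PAM on both strands. The sketch assumes SpCas9-style targeting and leaves out on-target scoring and off-target search, which dedicated tools handle; the function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# 20-nt protospacer followed by an NGG PAM; the lookahead finds overlapping sites.
_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str):
    """List candidate SpCas9 sites on both strands.

    Each hit is (start, strand, protospacer, pam), where start is the
    0-based forward-strand coordinate of the 23-nt protospacer+PAM span.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in _SITE.finditer(s):
            # Map reverse-strand matches back to forward-strand coordinates.
            start = m.start() if strand == "+" else len(s) - (m.start() + 23)
            hits.append((start, strand, m.group(1), m.group(2)))
    return sorted(hits)
```

Reporting every hit in forward-strand coordinates is what lets a visualization draw guides from both strands on a single genome track, alongside primers or restriction sites.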
Subjects
CRISPR-Cas Systems , Data Visualization , RNA, Guide, CRISPR-Cas Systems , Engineering , Genomics
ABSTRACT
Realization of the immense therapeutic potential of epigenetic editing requires development of clinically predictive model systems that faithfully recapitulate relevant aspects of the target disease pathophysiology. In female patients with ornithine transcarbamylase (OTC) deficiency, an X-linked condition, skewed inactivation of the X chromosome carrying the wild-type OTC allele is associated with increased disease severity. The majority of affected female patients can be managed medically, but a proportion require liver transplantation. With rapid development of epigenetic editing technology, reactivation of silenced wild-type OTC alleles is becoming an increasingly plausible therapeutic approach. Toward this end, privileged access to explanted diseased livers from two affected female infants provided the opportunity to explore whether engraftment and expansion of dissociated patient-derived hepatocytes in the FRG mouse might produce a relevant model for evaluation of epigenetic interventions. Hepatocytes from both infants were successfully used to generate chimeric mouse-human livers, in which clusters of primary human hepatocytes were either OTC positive or negative by immunohistochemistry (IHC), consistent with clonal expansion from individual hepatocytes in which the mutant or wild-type OTC allele was inactivated, respectively. Enumeration of the proportion of OTC-positive or -negative human hepatocyte clusters was consistent with dramatic skewing in one infant and minimal to modest skewing in the other. Importantly, IHC and fluorescence-activated cell sorting analysis of intact and dissociated liver samples from both infants showed qualitatively similar patterns, confirming that the chimeric mouse-human liver model recapitulated the native state in each infant. Also of importance was the induction of a treatable metabolic phenotype, orotic aciduria, in mice, which correlated with the presence of clonally expanded OTC-negative primary human hepatocytes. 
We are currently using this unique model to explore CRISPR-dCas9-based epigenetic targeting strategies in combination with efficient adeno-associated virus (AAV) gene delivery to reactivate the silenced functional OTC gene on the inactive X chromosome.
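The cluster enumeration used to assess X-inactivation skewing is a binomial proportion, so attaching a confidence interval helps judge whether a given split reflects genuine skewing or counting noise. The sketch below computes the fraction of OTC-positive clusters with a Wilson score interval. It assumes clusters are clonal and independent and ignores cluster size; the function names are hypothetical, and any counts passed to it here are illustrative, not the study's data.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - margin, centre + margin


def otc_positive_fraction(n_positive: int, n_negative: int):
    """Fraction of OTC-positive clusters with a 95% Wilson interval.

    Assumes each cluster is clonal, so the fraction estimates the share of
    engrafted hepatocytes in which the X carrying the mutant allele was inactivated.
    """
    n = n_positive + n_negative
    return n_positive / n, wilson_interval(n_positive, n)
```

The Wilson interval is used instead of the simpler normal approximation because it stays within [0, 1] and behaves sensibly when nearly all clusters fall into one class, which is exactly the situation of dramatic skewing.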
Subjects
Ornithine Carbamoyltransferase Deficiency Disease , Ornithine Carbamoyltransferase , Infant , Humans , Mice , Female , Animals , Ornithine Carbamoyltransferase/genetics , X Chromosome Inactivation/genetics , Hepatocytes , Liver , Ornithine Carbamoyltransferase Deficiency Disease/genetics , Ornithine Carbamoyltransferase Deficiency Disease/therapy
ABSTRACT
Owing to the high mutation rate of SARS-CoV-2, the COVID-19 pandemic evolved rapidly. Certain variants, such as Delta and Omicron, emerged with altered viral properties that led to increased transmission and death rates. These variants burdened medical systems worldwide, with a major impact on travel, productivity, and the world economy. Unsupervised machine learning methods can compress, characterize, and visualize unlabelled data. This paper presents a framework that uses unsupervised machine learning methods to discriminate and visualize the associations between major COVID-19 variants based on their genome sequences. These methods comprise a combination of selected dimensionality reduction and clustering techniques. The framework processes the RNA sequences by performing a k-mer analysis on the data, and then visualises and compares the results using selected dimensionality reduction methods, including principal component analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Our framework also employs agglomerative hierarchical clustering to visualize, using dendrograms, the mutational differences among major variants of concern and the country-wise mutational differences for selected variants (Delta and Omicron). We find that the proposed framework can effectively distinguish between the major variants and has the potential to identify emerging variants in the future.
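The k-mer analysis followed by agglomerative clustering can be sketched in standard-library Python. This is a naive illustration that assumes normalised 3-mer frequency profiles, Euclidean distance and average linkage; the study's exact k, distance and linkage choices are not specified here, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_profile(seq: str, k: int = 3) -> dict[str, float]:
    """Normalised k-mer frequencies of one genome sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def euclidean(p: dict[str, float], q: dict[str, float]) -> float:
    keys = p.keys() | q.keys()
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def average_linkage(seqs: dict[str, str], k: int = 3):
    """Naive agglomerative clustering; returns (merged_cluster, distance) per merge."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    clusters = [frozenset([name]) for name in profiles]

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between cluster members.
        return sum(euclidean(profiles[a], profiles[b])
                   for a in c1 for b in c2) / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merged = c1 | c2
        merges.append((merged, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges
```

The returned merge sequence and merge heights are exactly what a dendrogram draws. Real genome collections would call for an optimised implementation such as SciPy's hierarchical clustering rather than this cubic-time loop.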
Subjects
COVID-19 , Unsupervised Machine Learning , Humans , Algorithms , Pandemics , COVID-19/epidemiology , COVID-19/genetics , SARS-CoV-2/genetics
ABSTRACT
The liver is a prime target for in vivo gene therapies using recombinant adeno-associated viral vectors. Multiple clinical trials have targeted the liver in the past 15 years; however, we have yet to see market approval of the first liver-targeted adeno-associated virus (AAV)-based gene therapy. Inefficient expression of the therapeutic transgene, vector-induced liver toxicity, and capsid- and/or transgene-mediated immune responses reported at high vector doses are the main challenges to date. One of the factors contributing to insufficient clinical outcomes, despite highly encouraging preclinical data, is the lack of robust, biologically and clinically predictive preclinical models. To this end, this study reports findings of a functional evaluation of 6 AAV vectors in 12 preclinical models of the human liver, with the aim of determining which combination of models is most relevant for identifying AAV capsid variants for safe and efficient transgene delivery to primary human hepatocytes. The results, generated by studies in models ranging from immortalized cells, iPSC-derived and primary hepatocytes, and primary human hepatic organoids to in vivo models, increased our understanding of the strengths and weaknesses of each system. This knowledge should facilitate the development of novel gene therapies targeting the human liver.
Subjects
Dependovirus , Liver , Humans , Dependovirus/genetics , Liver/metabolism , Genetic Therapy/methods , Hepatocytes/metabolism , Capsid Proteins/metabolism , Tropism , Genetic Vectors/genetics
ABSTRACT
New SARS-CoV-2 variants emerge as part of the virus's adaptation to the human host. Health organizations monitor newly emerging variants with suspected impact on disease or vaccination efficacy as Variants Being Monitored (VBM), such as Delta and Omicron. Single-nucleotide variants (SNVs) relative to the Wuhan variant characterize VBMs, with current emphasis on the spike protein and lineage markers. However, monitoring VBMs in this way might miss SNVs with a functional effect on disease. Here, we introduce a lineage-agnostic, genome-wide approach to identify SNVs associated with disease. We curated a case-control dataset of 10,520 samples and identified 117 SNVs significantly associated with adverse patient outcome. While 40% (47) of these SNVs are already monitored and 36% (43) are in the spike protein, we also identified 70 new SNVs that are associated with disease outcome. Of these, 31 are disease-worsening and predominantly located in the 3'-5' exonuclease (NSP14), with structural modelling revealing a tight cluster in the Zn-binding domain, which has a known host-immune-modulating function. Furthermore, we generate clade-independent VBM groupings by identifying interacting SNVs (epistasis). We find 37 sets of higher-order epistatic interactions joining 5 genomic regions (nsp3, nsp14, Spike S1, ORF3a, N). Structural modelling of these regions provides insights into potential mechanistic pathways of increased virulence as well as orthogonal methods of validation. Clade-independent monitoring of functionally interacting (epistatic, co-evolving) SNVs detected emerging VBMs a week before they were flagged by health organizations and, in conjunction with structural modelling, provides faster mechanistic insight into emerging strains to guide public health interventions.
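A per-SNV case-control association test can be sketched with Fisher's exact test on a 2x2 table (SNV present/absent by adverse/non-adverse outcome). This is an assumed, illustrative choice of test, not necessarily the study's statistical model, and it omits the multiple-testing correction that a genome-wide scan would require. The function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: SNV present / absent; columns: adverse / non-adverse outcome.
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); > 1 suggests a disease-worsening SNV."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)
```

Running this test for every SNV and then applying a false-discovery-rate correction would yield a simple genome-wide scan; the odds ratio gives the direction of effect that separates disease-worsening from protective SNVs.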
ABSTRACT
Alternative splicing can lead to distinct protein isoforms, which can have different functions in specific cells and tissues or at different developmental stages. In this study, we explored whether transcripts assembled from long-read, nanopore-based direct RNA sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparison with Illumina-based short-read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212), identified with isoform-specific peptides via custom long-read-based databases, were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long-read direct RNA-seq allows the discovery of novel protein isoforms not already present in reference databases or in custom databases built from short-read RNA-seq data. Our analysis highlights the benefits of long-read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
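Identifying isoform-specific peptides can be illustrated with an in-silico tryptic digest: cleave after K or R unless the next residue is P, then keep the peptides that occur in exactly one isoform. The sketch is a simplification under stated assumptions (no missed cleavages, a minimum peptide length of 6, and no handling of the isobaric I/L ambiguity in MS/MS); the isoform sequences and function names are hypothetical.

```python
from collections import defaultdict


def tryptic_peptides(protein: str) -> list[str]:
    """Fully tryptic digest: cut after K/R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        at_end = i + 1 == len(protein)
        if residue in "KR" and (at_end or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def isoform_specific_peptides(isoforms: dict[str, str], min_length: int = 6):
    """Map each isoform to the peptides that occur in no other isoform."""
    owners = defaultdict(set)
    for name, seq in isoforms.items():
        for pep in tryptic_peptides(seq):
            if len(pep) >= min_length:
                owners[pep].add(name)
    specific = {name: set() for name in isoforms}
    for pep, names in owners.items():
        if len(names) == 1:
            specific[next(iter(names))].add(pep)
    return specific
```

Only peptides in the isoform-specific sets can discriminate between splice variants in MS/MS data, which is why a database containing more correct transcript models (as long reads provide) directly increases the number of isoforms that can be confirmed.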
Subjects
Proteomics , Tandem Mass Spectrometry , Alternative Splicing , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Peptides/genetics , Peptides/metabolism , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA/metabolism , Sequence Analysis, RNA , Tandem Mass Spectrometry/methods , Transcriptome
ABSTRACT
Viral integration is a complex biological process, and it is useful to have a reference integration dataset with known properties, both for comparison against experimental data and for comparison with the output of computational tools that detect integration. To generate these data, we developed a pipeline for simulating integrations of a viral or vector genome into a host genome. Our method reproduces more complex characteristics of vector and viral integration, including integration of sub-genomic fragments, structural variation of the integrated genomes, and deletions from the host genome at the integration site. Our method [1] takes the form of a Snakemake [2] pipeline, consisting of a Python [3] script using the Biopython [4] module that simulates integrations of a viral reference into a host reference. This produces a reference containing integrations, from which sequencing reads are simulated using ART [5]. The IDs of the reads crossing integration junctions are then annotated using another Python script to produce the final output, consisting of the simulated reads and a table of the locations of those integrations and the reads crossing each integration junction. To illustrate our method, we provide simulated reads and integration locations, as well as the code required to simulate integrations using any virus and host reference. This simulation method was used to investigate the performance of viral integration tools in our research [6].
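The core of such a simulation can be sketched in plain Python: take a random sub-genomic fragment of the viral reference, optionally reverse-complement it, delete a few host bases at a random site, and record where everything went. This is a hypothetical, stripped-down illustration of the idea, not the authors' Biopython-based script; it produces a single integration and leaves read simulation (for example, with ART) to a separate step. The parameter defaults are assumptions.

```python
import random
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class Integration:
    host_pos: int       # position in the original host where the insert begins
    host_deletion: int  # host bases removed at the integration site
    viral_start: int    # 0-based, half-open fragment coordinates in the virus
    viral_end: int
    reverse: bool       # True if the fragment was inserted reverse-complemented


def simulate_integration(host: str, virus: str, rng: random.Random,
                         max_deletion: int = 20, min_fragment: int = 50):
    """Insert one random viral fragment into the host; return (new_seq, record)."""
    if not 0 < min_fragment <= len(virus):
        raise ValueError("min_fragment must be between 1 and len(virus)")
    frag_len = rng.randint(min_fragment, len(virus))  # randint is inclusive
    v_start = rng.randint(0, len(virus) - frag_len)
    fragment = virus[v_start:v_start + frag_len]
    reverse = rng.random() < 0.5
    if reverse:
        fragment = fragment.translate(_COMPLEMENT)[::-1]
    deletion = rng.randint(0, min(max_deletion, len(host)))
    pos = rng.randint(0, len(host) - deletion)
    new_seq = host[:pos] + fragment + host[pos + deletion:]
    return new_seq, Integration(pos, deletion, v_start, v_start + frag_len, reverse)
```

Keeping the ground-truth record alongside the modified sequence is the essential design choice: it is what later allows each simulated read to be labelled as crossing, or not crossing, an integration junction.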
ABSTRACT
Recent clinical successes have intensified interest in using adeno-associated virus (AAV) vectors for therapeutic gene delivery. The liver is a key clinical target, given its critical physiological functions and involvement in a wide range of genetic diseases. Here, we report the bioengineering of a set of next-generation AAV vectors, named AAV-SYDs (where "SYD" stands for Sydney, Australia), with increased human hepatotropism in a liver xenograft mouse model repopulated with primary human hepatocytes. We followed a two-step process that applied directed evolution and domain-swapping approaches in sequence. Using DNA family shuffling, we first mapped key AAV capsid regions responsible for efficient human hepatocyte transduction in vivo. Focusing on these regions, we next applied domain-swapping strategies to identify and study key capsid residues that enhance primary human hepatocyte uptake and transgene expression. Our findings underscore the potential of AAV-SYDs as liver gene therapy vectors and provide insights into the mechanism responsible for their enhanced transduction profile.
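Mapping which parental serotype contributed each region of a shuffled capsid is the basic operation behind this kind of analysis. The sketch assumes the chimera and the parental capsid sequences are already aligned to equal length. It reports contiguous runs of informative positions (those matching exactly one parent), with crossovers lying between runs, plus positions matching no parent, which may be point mutations. Parent names and sequences are hypothetical.

```python
def parent_matches(chimera: str, parents: dict[str, str]) -> list[set[str]]:
    """For each aligned position, the set of parents sharing the chimera's base."""
    if any(len(seq) != len(chimera) for seq in parents.values()):
        raise ValueError("all sequences must be aligned to the same length")
    return [{name for name, seq in parents.items() if seq[i] == base}
            for i, base in enumerate(chimera)]


def crossover_segments(chimera: str, parents: dict[str, str]):
    """Return (segments, novel_positions).

    segments: (first, last, parent) runs of informative positions, i.e.
    positions matched by exactly one parent; crossovers lie between runs.
    novel_positions: positions matched by no parent (e.g. point mutations).
    """
    segments, novel = [], []
    for i, match in enumerate(parent_matches(chimera, parents)):
        if not match:
            novel.append(i)
        elif len(match) == 1:
            (name,) = match
            if segments and segments[-1][2] == name:
                segments[-1] = (segments[-1][0], i, name)  # extend current run
            else:
                segments.append((i, i, name))
    return segments, novel
```

Aggregating these segments over many selected clones shows which parental regions are enriched after selection, which is the kind of signal that can point to capsid regions worth targeting with domain swapping.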
ABSTRACT
Detecting viral and vector integration events is a key step when investigating interactions between viral and host genomes. This is relevant in several fields, including virology, cancer research, and gene therapy. For example, investigating integrations of wild-type viruses such as human papillomavirus and hepatitis B virus has proven crucial for understanding the role of these integrations in cancer. Furthermore, identifying the extent of vector integration is vital for determining the potential for genotoxicity in gene therapies. To address these questions, we developed isling, the first tool specifically designed for identifying integrations of both wild-type viruses and vectors from next-generation sequencing data. Isling addresses complexities in integration behaviour, including integration of fragmented genomes and integration junctions with ambiguous locations in a host or vector genome, and can also flag possible vector recombinations. We show that isling is up to 1.6-fold faster and up to 170% more accurate than other viral integration tools, and performs well on both simulated and real datasets. Isling is therefore an efficient and application-agnostic tool that will enable a broad range of investigations into viral and vector integration, including comparisons between integrations of wild-type viruses and gene therapy vectors, assessment of vector genotoxicity, and understanding the role of viruses in cancer.
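Junction ambiguity of the kind mentioned above can be illustrated by comparing where the host and viral alignments of one chimeric read fall along the read. The hypothetical sketch below parses two CIGAR strings and reports either an overlap (bases explained by both genomes, so the exact junction position is ambiguous) or a gap (bases explained by neither). It assumes both alignments are on the same strand and use soft clipping rather than hard clipping, and it is not isling's implementation.

```python
import re

_CIGAR = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_read_interval(cigar: str) -> tuple[int, int]:
    """Half-open (start, end) of the aligned part of a read, in read coordinates.

    Soft clips (S) and read-consuming ops (M, I, =, X) advance the read
    position; D, N, H and P consume no read bases and are skipped.
    """
    pos, start, end = 0, None, None
    for length, op in _CIGAR.findall(cigar):
        n = int(length)
        if op == "S":
            pos += n
        elif op in "MI=X":
            if start is None:
                start = pos
            pos += n
            end = pos
    if start is None:
        raise ValueError(f"no aligned bases in CIGAR {cigar!r}")
    return start, end


def junction_geometry(host_cigar: str, viral_cigar: str) -> dict[str, int]:
    """Overlap (ambiguous bases) or gap (unexplained bases) at the junction."""
    first, second = sorted([aligned_read_interval(host_cigar),
                            aligned_read_interval(viral_cigar)])
    delta = second[0] - first[1]
    return {"overlap": max(0, -delta), "gap": max(0, delta)}
```

For a 100-nt read aligned as `60M40S` to the host and `55S45M` to the virus, the five shared bases could belong to either genome, so the junction can only be placed within a 5-nt window; a tool that reports that window, rather than an arbitrary single position, avoids overstating positional precision.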
Subjects
Genetic Therapy , Genetic Vectors , Software , Virus Integration , Alphapapillomavirus/physiology , Genetic Vectors/physiology , Hepatitis B virus/physiology , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/virology
ABSTRACT
Precise genomic modification using prime editing (PE) holds enormous potential for research and clinical applications. In this study, we generated all-in-one prime editing (PEA1) constructs that carry all the components required for PE, along with a selection marker. We tested these constructs (with selection) in HEK293T, K562, HeLa, and mouse embryonic stem (ES) cells. We discovered that PE efficiency in HEK293T cells was much higher than previously observed, reaching up to 95% (mean 67%). The efficiency in K562 and HeLa cells, however, remained low. To improve PE efficiency in K562 and HeLa cells, we generated a nuclease prime editor and tested this system in these cell lines as well as in mouse ES cells. PE-nuclease greatly increased prime editing initiation; however, installation of the intended edits was often accompanied by extra insertions derived from the repair template. Finally, we show that zygotic injection of the nuclease prime editor can generate correct modifications in mouse fetuses with up to 100% efficiency.
Subjects
CRISPR-Associated Protein 9 , Gene Editing , Animals , CRISPR-Associated Protein 9/genetics , Cells, Cultured , Embryonic Stem Cells/metabolism , HEK293 Cells , HeLa Cells , Humans , K562 Cells , Mice , Plasmids/genetics , Zygote
ABSTRACT
External DNA sequences can be inserted into an organism's genome either through natural processes such as gene transfer or through targeted genome engineering strategies. Being able to robustly identify such foreign DNA is a crucial capability for health and biosecurity applications, such as anti-microbial resistance (AMR) detection or monitoring of gene drives. This capability is currently lacking for poorly characterised host genomes, or when little information about the integrated sequence is available. To address this, we developed the INserted Sequence Information DEtectoR (INSIDER). INSIDER analyses whole genome sequencing data and identifies segments of potentially foreign origin by their significant shift in k-mer signatures. We demonstrate the power of INSIDER to separate integrated DNA sequences from normal genomic sequences on a synthetic dataset simulating the insertion of a CRISPR-Cas gene drive into wild-type yeast. As a proof of concept, we use INSIDER to detect the exact AMR plasmid in whole genome sequencing data from a Citrobacter freundii patient isolate. INSIDER streamlines the process of identifying integrated DNA in poorly characterised wild species or when the insert is of unknown origin, thus enhancing the monitoring of emerging biosecurity threats.
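A shift in k-mer signature can be illustrated with a simple sliding-window scan: compare each window's k-mer profile with the genome-wide profile and flag windows whose distance is an outlier. The window size, step, k, L1 distance and z-score cutoff below are illustrative assumptions rather than INSIDER's actual parameters or statistics, and the function names are hypothetical.

```python
import statistics
from collections import Counter


def kmer_freqs(seq: str, k: int) -> Counter:
    """Normalised k-mer frequencies; missing k-mers read as 0."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return Counter({km: c / total for km, c in counts.items()})


def l1_distance(p: Counter, q: Counter) -> float:
    return sum(abs(p[km] - q[km]) for km in p.keys() | q.keys())


def flag_windows(genome: str, window: int = 200, step: int = 100,
                 k: int = 2, z_cutoff: float = 2.0) -> list[int]:
    """Return start positions of windows whose k-mer profile is an outlier.

    Each window's L1 distance to the genome-wide profile is converted to a
    z-score across all windows; windows above z_cutoff are flagged.
    """
    background = kmer_freqs(genome, k)
    starts = list(range(0, len(genome) - window + 1, step))
    dists = [l1_distance(kmer_freqs(genome[s:s + window], k), background)
             for s in starts]
    mean, sd = statistics.fmean(dists), statistics.pstdev(dists)
    if sd == 0:
        return []  # every window looks alike; nothing stands out
    return [s for s, d in zip(starts, dists) if (d - mean) / sd > z_cutoff]
```

Because the comparison is against the genome's own background, no reference for the host or the insert is needed, which matches the stated goal of working with poorly characterised genomes and inserts of unknown origin. Windows that only partly overlap an insert show a diluted shift, so the choice of window size trades sensitivity against positional resolution.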