Results 1 - 5 of 5
1.
Bioinformatics ; 37(22): 4230-4232, 2021 11 18.
Article in English | MEDLINE | ID: mdl-33978747

ABSTRACT

MOTIVATION: Recombinant DNA technology is widely used for different applications in biology, medicine and biotechnology. Viral transduction and plasmid transfection are among the most frequently used techniques to generate recombinant cell lines. Many of these methods result in the random integration of the plasmid into the host genome. Rapid identification of the integration sites is highly desirable in order to characterize these engineered cell lines. RESULTS: We developed detectIS: a pipeline specifically designed to identify genomic integration sites of exogenous DNA, either a plasmid containing one or more transgenes or a virus. The pipeline is based on a Nextflow workflow combined with a Singularity image containing all the necessary software, ensuring high reproducibility and scalability of the analysis. We tested it on simulated datasets and RNA-seq data from a human sample infected with hepatitis B virus. Comparisons with other state-of-the-art tools show that our method can identify the integration site in different recombinant cell lines, with accurate results, lower computational demand and shorter execution times. AVAILABILITY AND IMPLEMENTATION: The Nextflow workflow, the Singularity image and a test dataset are available at https://github.com/AstraZeneca/detectIS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA , Software , Humans , Reproducibility of Results , Genomics , DNA
2.
MAbs ; 11(8): 1452-1463, 2019.
Article in English | MEDLINE | ID: mdl-31570042

ABSTRACT

Protein primary structure is a potential critical quality attribute for biotherapeutics. Identifying and characterizing any sequence variants present is essential for product development. A sequence variant ~11 kDa larger than the expected IgG mass was observed by size-exclusion chromatography and two-dimensional liquid chromatography coupled with online mass spectrometry. Further characterization indicated that the 11 kDa mass was added to the heavy chain (HC) Fc domain. Despite the relatively large mass addition, only one unknown peptide was detected by peptide mapping. To decipher the sequence, the transcriptome of the manufacturing cell line was characterized by Illumina RNA-seq. Transcriptome reconstruction detected an aberrant fusion transcript, where the light chain (LC) constant domain sequence was fused to the 3' end of the HC transcript. Translation of this fusion transcript generated an extended peptide sequence at the HC C-terminus corresponding to the observed 11 kDa mass addition. Nanopore-based genome sequencing showed that multiple copies of the plasmid had integrated in tandem, with one copy missing the 5' end of the plasmid, deleting the LC variable domain. The fusion transcript was due to read-through of the HC terminator sequence into the adjacent partial LC gene and an unexpected splicing event between a cryptic splice-donor site at the 3' end of the HC and the splice acceptor site at the 5' end of the LC constant domain. Our study demonstrates that combining protein physicochemical characterization with genomic and transcriptomic analysis of the manufacturing cell line greatly improves the identification of sequence variants and understanding of the underlying molecular mechanisms.


Subject(s)
Antibodies, Monoclonal , Immunoglobulin G , Immunoglobulin Heavy Chains , Animals , Antibodies, Monoclonal/chemistry , Antibodies, Monoclonal/genetics , Antibodies, Monoclonal/immunology , CHO Cells , Chromatography, Liquid , Cricetulus , High-Throughput Nucleotide Sequencing , Immunoglobulin G/chemistry , Immunoglobulin G/genetics , Immunoglobulin G/immunology , Immunoglobulin Heavy Chains/chemistry , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Heavy Chains/immunology , Mice , Protein Domains , Tandem Mass Spectrometry
3.
Nat Commun ; 9(1): 4128, 2018 10 08.
Article in English | MEDLINE | ID: mdl-30297836

ABSTRACT

Selecting the most appropriate protein sequences is critical for precision drug design. Here we describe Haplosaurus, a bioinformatic tool for computation of protein haplotypes. Haplosaurus computes protein haplotypes from pre-existing chromosomally-phased genomic variation data. Integration into the Ensembl resource provides rapid and detailed protein haplotype retrieval. Using Haplosaurus, we build a database of unique protein haplotypes from the 1000 Genomes dataset reflecting real-world protein sequence variability and their prevalence. For one in seven genes, the most common protein haplotype differs from the reference sequence, and a similar number differ in their most common haplotype between human populations. Three case studies show how knowledge of the range of commonly encountered protein forms predicted in populations leads to insights into therapeutic efficacy. Haplosaurus and its associated database are expected to find broad application in many disciplines using protein sequences and to be particularly impactful for therapeutics design.


Subject(s)
Computational Biology/methods , Drug Design , Haplotypes , Precision Medicine/methods , Proteins/genetics , Computer-Aided Design , Genome, Human/genetics , Genomics/methods , Humans , Proteome/genetics , Reproducibility of Results , Software
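The idea in the abstract above, deriving protein haplotypes from phased variants, can be illustrated with a minimal sketch. This is not the Haplosaurus implementation: the function names, the toy coding sequence, and the variant tuples are all hypothetical, and the sketch handles only single-nucleotide substitutions (no indels, splicing, or strand handling).

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def protein_haplotypes(ref_cds, phased_snvs):
    """Return the two protein sequences implied by phased SNVs.

    phased_snvs: list of (0-based CDS position, alt base, (allele_hap1, allele_hap2)),
    where an allele of 1 means that haplotype carries the alt base.
    This input layout is an assumption made for the example.
    """
    proteins = []
    for hap in (0, 1):
        seq = list(ref_cds)
        for pos, alt, genotype in phased_snvs:
            if genotype[hap] == 1:
                seq[pos] = alt
        proteins.append(translate("".join(seq)))
    return tuple(proteins)


def differences(ref_protein, hap_protein):
    """List substitutions relative to the reference, e.g. 'D2N' (1-based)."""
    return [
        f"{r}{i + 1}{a}"
        for i, (r, a) in enumerate(zip(ref_protein, hap_protein))
        if r != a
    ]


# Toy example (hypothetical data): two missense SNVs on haplotype 1,
# one synonymous SNV on haplotype 2.
ref_cds = "ATGGATGCTAAATAA"  # translates to MDAK
snvs = [(3, "A", (1, 0)), (7, "T", (1, 0)), (11, "G", (0, 1))]
hap1, hap2 = protein_haplotypes(ref_cds, snvs)
print(hap1, differences(translate(ref_cds), hap1))
print(hap2, differences(translate(ref_cds), hap2))
```

Because both missense variants sit on the same chromosome copy, the first haplotype carries two amino-acid changes while the second matches the reference protein; unphased genotypes alone could not distinguish this from one change on each copy.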
4.
Protein Eng Des Sel ; 30(4): 303-311, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28130326

ABSTRACT

High levels of protein expression are key to the successful development and manufacture of a therapeutic antibody. Here, we describe two related antibodies, Ab001 and Ab008, where Ab001 shows a markedly lower level of expression relative to Ab008 when stably expressed in Chinese hamster ovary cells. We use single-gene expression vectors and structural analysis to show that the reduced titer is associated with the VL CDR2 of Ab001. We adopted two approaches to improve the expression of Ab001. First, we used mutagenesis to change single amino-acid residues in the Ab001 VL back to the equivalent Ab008 residues, but this resulted in limited improvements in expression. In contrast, when we used an in silico structure-based design approach to generate a set of five individual single-point variants in a discrete region of the VL, all exhibited significantly improved expression relative to Ab001. The most successful of these, D53N, exhibited a 25-fold increase in stable transfectants relative to Ab001. The functional potency of these VL-modified antibodies was unaffected. We expect that this in silico engineering strategy can be used to improve the expression of other antibodies and proteins.


Subject(s)
Amino Acid Substitution , Interleukin-13/antagonists & inhibitors , Single-Chain Antibodies , Humans , Mutagenesis , Mutation, Missense , Single-Chain Antibodies/biosynthesis , Single-Chain Antibodies/chemistry , Single-Chain Antibodies/genetics
5.
Biotechnol Prog ; 30(1): 188-97, 2014.
Article in English | MEDLINE | ID: mdl-24311306

ABSTRACT

Despite the development of high-titer bioprocesses capable of producing >10 g L⁻¹ of recombinant monoclonal antibody (MAb), some so-called "difficult-to-express" (DTE) MAbs only reach much lower process titers. For widely utilized "platform" processes, the only discrete variable is the protein coding sequence of the recombinant product. However, there has been little systematic study to identify the sequence parameters that affect expression. This information is vital, as it would allow us to rationally design genetic sequence and engineering strategies for optimal bioprocessing. We have therefore developed a new computational tool that enables prediction of MAb titer in Chinese hamster ovary (CHO) cells based on the recombinant coding sequence of the expressed MAb. Model construction utilized a panel of MAbs, which, following a 10-day fed-batch transient production process, varied in titer 5.6-fold, allowing analysis of the sequence features that impact expression over a range of high and low MAb productivity. The model identified 18 light chain (LC)-specific sequence features within complementarity determining region 3 (CDR3) capable of predicting MAb titer with a root mean square error of 0.585 relative expression units. Furthermore, we identify that CDR3 variation influences the rate of LC-HC dimerization during MAb synthesis, which could be exploited to improve the production of DTE MAb variants via increasing the transfected LC:HC gene ratio. Taken together, these data suggest that engineering intervention strategies to improve the expression of DTE recombinant products can be rationally implemented based on an identification of the sequence motifs that render a recombinant product DTE.


Subject(s)
Antibodies, Monoclonal/chemistry , Biotechnology/methods , Complementarity Determining Regions/genetics , Computational Biology/methods , Recombinant Proteins/chemistry , Amino Acid Sequence , Animals , Antibodies, Monoclonal/genetics , Antibodies, Monoclonal/metabolism , CHO Cells , Cricetinae , Cricetulus , Hydrophobic and Hydrophilic Interactions , Immunoglobulin Light Chains/chemistry , Immunoglobulin Light Chains/genetics , Immunoglobulin Light Chains/metabolism , RNA, Messenger/genetics , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Sequence Analysis, Protein
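The abstract above reports model accuracy as a root mean square error (RMSE) in relative expression units. As a reminder of what that metric measures, here is a minimal sketch; the titer values are invented for illustration and are not data from the study, and the function name is hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical relative titers for three antibodies (illustrative only).
observed_titers = [1.0, 2.0, 4.0]
predicted_titers = [1.0, 3.0, 2.0]
print(round(rmse(observed_titers, predicted_titers), 3))
```

Because the errors are squared before averaging, a single badly mispredicted antibody raises the RMSE more than several small misses would, which is worth keeping in mind when reading a single summary figure for a panel that spans a 5.6-fold titer range.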