ABSTRACT
BACKGROUND: In protein-coding sequences, most amino acids are encoded by more than one codon, because there are 61 sense codons but only 20 standard amino acids. Although such synonymous codons do not alter the encoded amino acid sequence, the choice among them can dramatically affect the expression of the resulting protein. Codon optimization of synthetic DNA sequences is therefore important for heterologous expression. However, existing solutions are primarily based on choosing only high-frequency codons, neglecting the important effects of rare codons. In this paper, we propose a novel recurrent-neural-network-based codon optimization tool, ICOR, that aims to learn codon usage bias from a genomic dataset of Escherichia coli. We compile a dataset of over 7,000 non-redundant, high-expression, robust genes, which is used for deep learning. The model uses a bidirectional long short-term memory-based architecture, allowing the sequential context of codon usage in genes to be learned. Our tool can predict synonymous codons for synthetic genes toward optimal expression in Escherichia coli. RESULTS: We demonstrate that the sequential context captured by an RNN may yield codon selection that is more similar to that of the host genome. Based on computational metrics that predict protein expression, ICOR theoretically optimizes protein expression more than frequency-based approaches. ICOR is evaluated on 1,481 Escherichia coli genes as well as a benchmark set of 40 select DNA sequences whose heterologous expression has been previously characterized. ICOR's performance is measured across five metrics: the Codon Adaptation Index, GC-content, negative repeat elements, negative cis-regulatory elements, and codon frequency distribution. CONCLUSIONS: The results, based on in silico metrics, indicate that ICOR codon optimization is theoretically more effective at enhancing recombinant protein expression than other established codon optimization techniques.
Our tool is provided as an open-source software package that includes the benchmark set of sequences used in this study.
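Two of the five metrics named above, the Codon Adaptation Index (CAI) and GC-content, can be illustrated with a minimal Python sketch. This is not ICOR's code: the function names, the three-amino-acid synonym table, and the toy reference sequence are assumptions made for illustration only. CAI is computed here in its usual form, as the geometric mean of each codon's relative adaptiveness (its count in a reference gene set divided by the count of the most frequent synonymous codon).

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny synonym table; a real one covers all 61 sense codons.
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
}


def split_codons(seq):
    """Split a DNA string into complete codons, dropping any trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def relative_adaptiveness(reference_genes):
    """Weight of each codon: its count divided by the count of the most used synonym.

    Codons never seen in the reference set get no weight (a simplification;
    pseudocounts are a common alternative).
    """
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    weights = {}
    for codons in SYNONYMS.values():
        best = max(counts[c] for c in codons)
        for c in codons:
            if best > 0 and counts[c] > 0:
                weights[c] = counts[c] / best
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the codons in seq that have a weight."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy reference "genome": AAA is used twice as often as AAG; GAA and GAG equally.
weights = relative_adaptiveness(["AAAAAAAAGGAAGAG"])
score = cai("AAAAAG", weights)
```

In this toy setting a sequence built only from the most frequent codons scores 1.0, and each rarer synonym lowers the score. A purely frequency-based optimizer maximizes this value, which is the approach the abstract contrasts with learning codon context from host genes.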
Subjects
Amino Acids, Genomics, Codon/genetics, Amino Acids/genetics, Escherichia coli/genetics
ABSTRACT
In 2019, the first cases of SARS-CoV-2 were detected in Wuhan, China, and by early 2020 the first cases were identified in the United States. SARS-CoV-2 infections increased in the US, causing many states to implement stay-at-home orders and additional safety precautions to mitigate potential outbreaks. As policies changed throughout the pandemic and restrictions were lifted, demand increased for COVID-19 testing, which was costly, difficult to obtain, or had long turnaround times. Some academic institutions, including Boston University (BU), created an on-campus COVID-19 screening protocol as part of a plan for the safe return of students, faculty, and staff to campus with the option of in-person classes. At BU, we put together an automated high-throughput clinical testing laboratory with the capacity to run 45,000 individual tests weekly by Fall of 2020, combining a purpose-built laboratory, a multiplexed reverse transcription quantitative PCR (RT-qPCR) test, robotic instrumentation, and trained staff. There were many challenges, including supply chain issues for personal protective equipment, testing materials, and equipment that was in high demand. The BU Clinical Testing Laboratory (CTL) was operational at the start of Fall 2020 and performed over 1 million SARS-CoV-2 PCR tests during the 2020-2021 academic year.