1.

Biophysics-based protein language models for protein engineering.

Gelman, Sam; Johnson, Bryce; Freschlin, Chase; D'Costa, Sameer; Gitter, Anthony; Romero, Philip A.

bioRxiv ; 2024 Mar 17.

Article En | MEDLINE | ID: mdl-38559182

Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.

2.

Evaluating Scalable Supervised Learning for Synthesize-on-Demand Chemical Libraries.

Alnammi, Moayad; Liu, Shengchao; Ericksen, Spencer S; Ananiev, Gene E; Voter, Andrew F; Guo, Song; Keck, James L; Hoffmann, F Michael; Wildman, Scott A; Gitter, Anthony.

J Chem Inf Model ; 63(17): 5513-5528, 2023 09 11.

Article En | MEDLINE | ID: mdl-37625010

Traditional small-molecule drug discovery is a time-consuming and costly endeavor. High-throughput chemical screening can only assess a tiny fraction of drug-like chemical space. The strong predictive power of modern machine-learning methods for virtual chemical screening enables training models on known active and inactive compounds and extrapolating to much larger chemical libraries. However, there has been limited experimental validation of these methods in practical applications on large commercially available or synthesize-on-demand chemical libraries. Through a prospective evaluation with the bacterial protein-protein interaction PriA-SSB, we demonstrate that ligand-based virtual screening can identify many active compounds in large commercial libraries. We use cross-validation to compare different types of supervised learning models and select a random forest (RF) classifier as the best model for this target. When predicting the activity of more than 8 million compounds from Aldrich Market Select, the RF substantially outperforms a naïve baseline based on chemical structure similarity. 48% of the RF's 701 selected compounds are active. The RF model easily scales to score one billion compounds from the synthesize-on-demand Enamine REAL database. We tested 68 chemically diverse top predictions from Enamine REAL and observed 31 hits (46%), including one with an IC50 value of 1.3 µM.

High-Throughput Screening Assays , Small Molecule Libraries , Databases, Factual , Drug Discovery , Supervised Machine Learning
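
As a quick arithmetic check on the Enamine REAL result quoted in the abstract above (31 hits among 68 tested compounds, reported as 46%), the hit rate can be recomputed in a few lines. This is only an illustrative sketch; the function name `hit_rate` is a hypothetical helper and not part of the authors' software.

```python
def hit_rate(hits: int, tested: int) -> float:
    """Fraction of tested compounds that were confirmed active."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return hits / tested


# Counts taken from the abstract: 31 hits out of 68 tested top predictions.
enamine_rate = hit_rate(31, 68)
print(f"Enamine REAL hit rate: {enamine_rate:.1%}")
```

Rounded to a whole percent, the computed value matches the 46% stated in the abstract.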

3.

HIV-1 virological synapse formation enhances infection spread by dysregulating Aurora Kinase B.

Bruce, James W; Park, Eunju; Magnano, Chris; Horswill, Mark; Richards, Alicia; Potts, Gregory; Hebert, Alexander; Islam, Nafisah; Coon, Joshua J; Gitter, Anthony; Sherer, Nathan; Ahlquist, Paul.

PLoS Pathog ; 19(7): e1011492, 2023 07.

Article En | MEDLINE | ID: mdl-37459363

HIV-1 spreads efficiently through direct cell-to-cell transmission at virological synapses (VSs) formed by interactions between HIV-1 envelope proteins (Env) on the surface of infected cells and CD4 receptors on uninfected target cells. Env-CD4 interactions bring the infected and uninfected cellular membranes into close proximity and induce transport of viral and cellular factors to the VS for efficient virion assembly and HIV-1 transmission. Using novel, cell-specific stable isotope labeling and quantitative mass spectrometric proteomics, we identified extensive changes in the levels and phosphorylation states of proteins in HIV-1 infected producer cells upon mixing with CD4+ target cells under conditions inducing VS formation. These coculture-induced alterations involved multiple cellular pathways including transcription, TCR signaling and, unexpectedly, cell cycle regulation, and were dominated by Env-dependent responses. We confirmed the proteomic results using inhibitors targeting regulatory kinases and phosphatases in selected pathways identified by our proteomic analysis. Strikingly, inhibiting the key mitotic regulator Aurora kinase B (AURKB) in HIV-1 infected cells significantly increased HIV activity in cell-to-cell fusion and transmission but had little effect on cell-free infection. Consistent with this, we found that AURKB regulates the fusogenic activity of HIV-1 Env. In the Jurkat T cell line and primary T cells, HIV-1 Env:CD4 interaction also dramatically induced cell cycle-independent AURKB relocalization to the centromere, and this signaling required the long (150 aa) cytoplasmic C-terminal domain (CTD) of Env. These results imply that cytoplasmic/plasma membrane AURKB restricts HIV-1 envelope fusion, and that this restriction is overcome by Env CTD-induced AURKB relocalization. Taken together, our data reveal a new signaling pathway regulating HIV-1 cell-to-cell transmission and potential new avenues for therapeutic intervention through targeting the Env CTD and AURKB activity.

HIV Infections , HIV-1 , Humans , HIV-1/physiology , Aurora Kinase B/metabolism , Proteomics , CD4-Positive T-Lymphocytes/metabolism , CD4 Antigens/metabolism , HIV Infections/metabolism

4.

The Coming of Age of Nucleic Acid Vaccines during COVID-19.

Rando, Halie M; Lordan, Ronan; Kolla, Likhitha; Sell, Elizabeth; Lee, Alexandra J; Wellhausen, Nils; Naik, Amruta; Kamil, Jeremy P; Gitter, Anthony; Greene, Casey S.

mSystems ; 8(2): e0092822, 2023 04 27.

Article En | MEDLINE | ID: mdl-36861992

In the 21st century, several emergent viruses have posed a global threat. Each pathogen has emphasized the value of rapid and scalable vaccine development programs. The ongoing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has made the importance of such efforts especially clear. New biotechnological advances in vaccinology allow for vaccines that provide only the nucleic acid building blocks of an antigen, eliminating many safety concerns. During the COVID-19 pandemic, these DNA and RNA vaccines have facilitated the development and deployment of vaccines at an unprecedented pace. This success was attributable at least in part to broader shifts in scientific research relative to prior epidemics: the genome of SARS-CoV-2 was available as early as January 2020, facilitating global efforts in the development of DNA and RNA vaccines within 2 weeks of the international community becoming aware of the new viral threat. Additionally, these technologies that were previously only theoretical are not only safe but also highly efficacious. Although vaccine development has historically been a slow process, the rapid development of vaccines during the COVID-19 crisis reveals a major shift in vaccine technologies. Here, we provide historical context for the emergence of these paradigm-shifting vaccines. We describe several DNA and RNA vaccines in terms of their efficacy, safety, and approval status. We also discuss patterns in worldwide distribution. The advances made since early 2020 provide an exceptional illustration of how rapidly vaccine development technology has advanced in the last 2 decades in particular and suggest a new era in vaccines against emerging pathogens.
IMPORTANCE The SARS-CoV-2 pandemic has caused untold damage globally, presenting unusual demands on but also unique opportunities for vaccine development. The development, production, and distribution of vaccines are imperative to saving lives, preventing severe illness, and reducing the economic and social burdens caused by the COVID-19 pandemic. Although vaccine technologies that provide the DNA or RNA sequence of an antigen had never previously been approved for use in humans, they have played a major role in the management of SARS-CoV-2. In this review, we discuss the history of these vaccines and how they have been applied to SARS-CoV-2. Additionally, given that the evolution of new SARS-CoV-2 variants continues to present a significant challenge in 2022, these vaccines remain an important and evolving tool in the biomedical response to the pandemic.

COVID-19 , Viral Vaccines , Humans , COVID-19/epidemiology , SARS-CoV-2/genetics , COVID-19 Vaccines , Nucleic Acid-Based Vaccines , Pandemics/prevention & control , mRNA Vaccines

5.

Application of Traditional Vaccine Development Strategies to SARS-CoV-2.

Rando, Halie M; Lordan, Ronan; Lee, Alexandra J; Naik, Amruta; Wellhausen, Nils; Sell, Elizabeth; Kolla, Likhitha; Gitter, Anthony; Greene, Casey S.

mSystems ; 8(2): e0092722, 2023 04 27.

Article En | MEDLINE | ID: mdl-36861991

Over the past 150 years, vaccines have revolutionized the relationship between people and disease. During the COVID-19 pandemic, technologies such as mRNA vaccines have received attention due to their novelty and successes. However, more traditional vaccine development platforms have also yielded important tools in the worldwide fight against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A variety of approaches have been used to develop COVID-19 vaccines that are now authorized for use in countries around the world. In this review, we highlight strategies that focus on the viral capsid and outwards, rather than on the nucleic acids inside. These approaches fall into two broad categories: whole-virus vaccines and subunit vaccines. Whole-virus vaccines use the virus itself, in either an inactivated or an attenuated state. Subunit vaccines contain instead an isolated, immunogenic component of the virus. Here, we highlight vaccine candidates that apply these approaches against SARS-CoV-2 in different ways. In a companion article (H. M. Rando, R. Lordan, L. Kolla, E. Sell, et al., mSystems 8:e00928-22, 2023, https://doi.org/10.1128/mSystems.00928-22), we review the more recent and novel development of nucleic acid-based vaccine technologies. We further consider the role that these COVID-19 vaccine development programs have played in prophylaxis at the global scale. Well-established vaccine technologies have proved especially important to making vaccines accessible in low- and middle-income countries. Vaccine development programs that use established platforms have been undertaken in a much wider range of countries than those using nucleic acid-based technologies, which have been led by wealthy Western countries. Therefore, these vaccine platforms, though less novel from a biotechnological standpoint, have proven to be extremely important to the management of SARS-CoV-2. 
IMPORTANCE The development, production, and distribution of vaccines are imperative to saving lives, preventing illness, and reducing the economic and social burdens caused by the COVID-19 pandemic. Vaccines that use cutting-edge biotechnology have played an important role in mitigating the effects of SARS-CoV-2. However, more traditional methods of vaccine development that were refined throughout the 20th century have been especially critical to increasing vaccine access worldwide. Effective deployment is necessary for reducing the susceptibility of the world's population, which is especially important in light of emerging variants. In this review, we discuss the safety, immunogenicity, and distribution of vaccines developed using established technologies. In a separate review, we describe the vaccines developed using nucleic acid-based vaccine platforms. From the current literature, it is clear that the well-established vaccine technologies are also highly effective against SARS-CoV-2 and are being used to address the challenges of COVID-19 globally, including in low- and middle-income countries. This worldwide approach is critical for reducing the devastating impact of SARS-CoV-2.

COVID-19 , Viral Vaccines , Humans , SARS-CoV-2 , COVID-19/prevention & control , COVID-19 Vaccines , Pandemics/prevention & control , Vaccine Development , Vaccines, Subunit , Nucleic Acid-Based Vaccines

6.

Bayes optimal informer sets for early-stage drug discovery.

Yu, Peng; Ericksen, Spencer; Gitter, Anthony; Newton, Michael A.

Biometrics ; 79(2): 642-654, 2023 06.

Article En | MEDLINE | ID: mdl-35165892

An important experimental design problem in early-stage drug discovery is how to prioritize available compounds for testing when very little is known about the target protein. Informer-based ranking (IBR) methods address the prioritization problem when the compounds have provided bioactivity data on other potentially relevant targets. An IBR method selects an informer set of compounds, and then prioritizes the remaining compounds on the basis of new bioactivity experiments performed with the informer set on the target. We formalize the problem as a two-stage decision problem and introduce the Bayes Optimal Informer SEt (BOISE) method for its solution. BOISE leverages a flexible model of the initial bioactivity data, a relevant loss function, and effective computational schemes to resolve the two-step design problem. We evaluate BOISE and compare it to other IBR strategies in two retrospective studies, one on protein-kinase inhibition and the other on anticancer drug sensitivity. In both empirical settings BOISE exhibits better predictive performance than available methods. It also behaves well with missing data, where methods that use matrix completion show worse predictive performance.

Drug Discovery , Proteins , Bayes Theorem , Retrospective Studies , Drug Discovery/methods
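
The two-stage idea behind informer-based ranking in the abstract above (test a small informer set on the new target, then use those results to prioritize the remaining compounds) can be illustrated with a deliberately simple heuristic. The sketch below is not BOISE and does not use its Bayesian model or loss function; it assumes binary activity data, and the function name `rank_by_informers`, the target names, and the toy matrix are all hypothetical.

```python
from typing import Dict, List


def rank_by_informers(
    prior: Dict[str, List[int]],       # known target -> 0/1 activity per compound
    informer_results: Dict[int, int],  # compound index -> 0/1 activity on the new target
) -> List[int]:
    """Rank non-informer compounds for a new target (simple agreement heuristic)."""
    informers = list(informer_results)
    n_compounds = len(next(iter(prior.values())))

    # Stage 1 has already happened: the informer compounds were assayed on the
    # new target. Weight each known target by how often its activity agrees
    # with the new target on those informer compounds.
    weights = {
        target: sum(acts[i] == informer_results[i] for i in informers) / len(informers)
        for target, acts in prior.items()
    }
    total = sum(weights.values()) or 1.0

    # Stage 2: score every remaining compound by the weighted fraction of
    # known targets on which it is active, then sort from best to worst.
    scores = {
        c: sum(weights[t] * prior[t][c] for t in prior) / total
        for c in range(n_compounds)
        if c not in informer_results
    }
    return sorted(scores, key=lambda c: (-scores[c], c))


# Toy data: three previously profiled targets, five compounds.
prior_activity = {
    "target_A": [1, 0, 1, 1, 0],
    "target_B": [0, 1, 0, 0, 1],
    "target_C": [1, 0, 1, 0, 0],
}
# Compounds 0 and 1 form the informer set; their new-target results are known.
ranking = rank_by_informers(prior_activity, {0: 1, 1: 0})
print(ranking)
```

In this toy case the new target agrees with target_A and target_C on both informers and with target_B on neither, so compound 2 (active on both agreeing targets) is ranked ahead of compound 3, with compound 4 last.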

7.

Graph algorithms for predicting subcellular localization at the pathway level.

Magnano, Chris S; Gitter, Anthony.

Pac Symp Biocomput ; 28: 145-156, 2023.

Article En | MEDLINE | ID: mdl-36540972

Protein subcellular localization is an important factor in normal cellular processes and disease. While many protein localization resources treat it as static, protein localization is dynamic and heavily influenced by biological context. Biological pathways are graphs that represent a specific biological context and can be inferred from large-scale data. We develop graph algorithms to predict the localization of all interactions in a biological pathway as an edge-labeling task. We compare a variety of models including graph neural networks, probabilistic graphical models, and discriminative classifiers for predicting localization annotations from curated pathway databases. We also perform a case study where we construct biological pathways and predict localizations of human fibroblasts undergoing viral infection. Pathway localization prediction is a promising approach for integrating publicly available localization data into the analysis of large-scale biological data.

Algorithms , Computational Biology , Humans , Databases, Protein

8.

Application of Traditional Vaccine Development Strategies to SARS-CoV-2.

Rando, Halie M; Lordan, Ronan; Lee, Alexandra J; Naik, Amruta; Wellhausen, Nils; Sell, Elizabeth; Kolla, Likhitha; Gitter, Anthony; Greene, Casey S.

ArXiv ; 2023 Jan 23.

Article En | MEDLINE | ID: mdl-36034485

Over the past 150 years, vaccines have revolutionized the relationship between people and disease. During the COVID-19 pandemic, technologies such as mRNA vaccines have received attention due to their novelty and successes. However, more traditional vaccine development platforms have also yielded important tools in the worldwide fight against the SARS-CoV-2 virus. A variety of approaches have been used to develop COVID-19 vaccines that are now authorized for use in countries around the world. In this review, we highlight strategies that focus on the viral capsid and outwards, rather than on the nucleic acids inside. These approaches fall into two broad categories: whole-virus vaccines and subunit vaccines. Whole-virus vaccines use the virus itself, in either an inactivated or an attenuated state. Subunit vaccines contain instead an isolated, immunogenic component of the virus. Here, we highlight vaccine candidates that apply these approaches against SARS-CoV-2 in different ways. In a companion manuscript, we review the more recent and novel development of nucleic acid-based vaccine technologies. We further consider the role that these COVID-19 vaccine development programs have played in prophylaxis at the global scale. Well-established vaccine technologies have proved especially important to making vaccines accessible in low- and middle-income countries. Vaccine development programs that use established platforms have been undertaken in a much wider range of countries than those using nucleic acid-based technologies, which have been led by wealthy Western countries. Therefore, these vaccine platforms, though less novel from a biotechnological standpoint, have proven to be extremely important to the management of SARS-CoV-2.

9.

Alternative splicing liberates a cryptic cytoplasmic isoform of mitochondrial MECR that antagonizes influenza virus.

Baker, Steven F; Meistermann, Helene; Tzouros, Manuel; Baker, Aaron; Golling, Sabrina; Polster, Juliane Siebourg; Ledwith, Mitchell P; Gitter, Anthony; Augustin, Angelique; Javanbakht, Hassan; Mehle, Andrew.

PLoS Biol ; 20(12): e3001934, 2022 12.

Article En | MEDLINE | ID: mdl-36542656

Viruses must balance their reliance on host cell machinery for replication while avoiding host defense. Influenza A viruses are zoonotic agents that frequently switch hosts, causing localized outbreaks with the potential for larger pandemics. The host range of influenza virus is limited by the need for successful interactions between the virus and cellular partners. Here we used immunocompetitive capture-mass spectrometry to identify cellular proteins that interact with human- and avian-style viral polymerases. We focused on the proviral activity of heterogenous nuclear ribonuclear protein U-like 1 (hnRNP UL1) and the antiviral activity of mitochondrial enoyl CoA-reductase (MECR). MECR is localized to mitochondria where it functions in mitochondrial fatty acid synthesis (mtFAS). While a small fraction of the polymerase subunit PB2 localizes to the mitochondria, PB2 did not interact with full-length MECR. By contrast, a minor splice variant produces cytoplasmic MECR (cMECR). Ectopic expression of cMECR shows that it binds the viral polymerase and suppresses viral replication by blocking assembly of viral ribonucleoprotein complexes (RNPs). MECR ablation through genome editing or drug treatment is detrimental for cell health, creating a generic block to virus replication. Using the yeast homolog Etr1 to supply the metabolic functions of MECR in MECR-null cells, we showed that specific antiviral activity is independent of mtFAS and is reconstituted by expressing cMECR. Thus, we propose a strategy where alternative splicing produces a cryptic antiviral protein that is embedded within a key metabolic enzyme.

Fatty Acid Desaturases , Influenza A virus , Humans , Fatty Acid Desaturases/metabolism , Alternative Splicing/genetics , Mitochondria/metabolism , Influenza A virus/genetics , Protein Isoforms/metabolism , Virus Replication

10.

An approachable, flexible and practical machine learning workshop for biologists.

Magnano, Chris S; Mu, Fangzhou; Russ, Rosemary S; Cvetkovic, Milica; Treu, Debora; Gitter, Anthony.

Bioinformatics ; 38(Suppl 1): i10-i18, 2022 06 24.

Article En | MEDLINE | ID: mdl-35758797

SUMMARY: The increasing prevalence and importance of machine learning in biological research have created a need for machine learning training resources tailored towards biological researchers. However, existing resources are often inaccessible, infeasible or inappropriate for biologists because they require significant computational and mathematical knowledge, demand an unrealistic time-investment or teach skills primarily for computational researchers. We created the Machine Learning for Biologists (ML4Bio) workshop, a short, intensive workshop that empowers biological researchers to comprehend machine learning applications and pursue machine learning collaborations in their own research. The ML4Bio workshop focuses on classification and was designed around three principles: (i) emphasizing preparedness over fluency or expertise, (ii) necessitating minimal coding and mathematical background and (iii) requiring low time investment. It incorporates active learning methods and custom open-source software that allows participants to explore machine learning workflows. After multiple sessions to improve workshop design, we performed a study on three workshop sessions. Despite some confusion around identifying subtle methodological flaws in machine learning workflows, participants generally reported that the workshop met their goals, provided them with valuable skills and knowledge and greatly increased their beliefs that they could engage in research that uses machine learning. ML4Bio is an educational tool for biological researchers, and its creation and evaluation provide valuable insight into tailoring educational resources for active researchers in different domains. AVAILABILITY AND IMPLEMENTATION: Workshop materials are available at https://github.com/carpentries-incubator/ml4bio-workshop and the ml4bio software is available at https://github.com/gitter-lab/ml4bio. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Machine Learning , Software , Humans , Workflow

11.

Molecular and Serologic Diagnostic Technologies for SARS-CoV-2.

Rando, Halie M; Brueffer, Christian; Lordan, Ronan; Dattoli, Anna Ada; Manheim, David; Meyer, Jesse G; Mundo, Ariel I; Perrin, Dimitri; Mai, David; Wellhausen, Nils; Gitter, Anthony; Greene, Casey S.

ArXiv ; 2022 Apr 26.

Article En | MEDLINE | ID: mdl-35547240

The COVID-19 pandemic has presented many challenges that have spurred biotechnological research to address specific problems. Diagnostics is one area where biotechnology has been critical. Diagnostic tests play a vital role in managing a viral threat by facilitating the detection of infected and/or recovered individuals. From the perspective of what information is provided, these tests fall into two major categories, molecular and serological. Molecular diagnostic techniques assay whether a virus is present in a biological sample, thus making it possible to identify individuals who are currently infected. Additionally, when the immune system is exposed to a virus, it responds by producing antibodies specific to the virus. Serological tests make it possible to identify individuals who have mounted an immune response to a virus of interest and therefore facilitate the identification of individuals who have previously encountered the virus. These two categories of tests provide different perspectives valuable to understanding the spread of SARS-CoV-2. Within these categories, different biotechnological approaches offer specific advantages and disadvantages. Here we review the categories of tests developed for the detection of the SARS-CoV-2 virus or antibodies against SARS-CoV-2 and discuss the role of diagnostics in the COVID-19 pandemic.

12.

Ten quick tips for deep learning in biology.

Lee, Benjamin D; Gitter, Anthony; Greene, Casey S; Raschka, Sebastian; Maguire, Finlay; Titus, Alexander J; Kessler, Michael D; Lee, Alexandra J; Chevrette, Marc G; Stewart, Paul Allen; Britto-Borges, Thiago; Cofer, Evan M; Yu, Kun-Hsing; Carmona, Juan Jose; Fertig, Elana J; Kalinin, Alexandr A; Signal, Brandon; Lengerich, Benjamin J; Triche, Timothy J; Boca, Simina M.

PLoS Comput Biol ; 18(3): e1009803, 2022 03.

Article En | MEDLINE | ID: mdl-35324884

Deep Learning , Computational Biology

13.

Network inference with Granger causality ensembles on single-cell transcriptomics.

Deshpande, Atul; Chu, Li-Fang; Stewart, Ron; Gitter, Anthony.

Cell Rep ; 38(6): 110333, 2022 02 08.

Article En | MEDLINE | ID: mdl-35139376

Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells' progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.

Cell Differentiation/physiology , Gene Expression Profiling , Gene Regulatory Networks/physiology , Transcriptome/physiology , Algorithms , Animals , Computational Biology/methods , Gene Expression Profiling/methods , Mice , Software
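
The Granger causality idea underlying the abstract above can be sketched as follows: a regulator x is a candidate cause of a target y if past values of x improve the prediction of y beyond what past values of y alone achieve. The example below is a minimal one-lag, least-squares version on evenly spaced synthetic data with no intercept term. It is not the SINGE implementation, which per the abstract uses kernel-based regression to handle irregular pseudotimes and aggregates an ensemble of analyses; the function name `granger_score` and the simulated series are assumptions for illustration only.

```python
import math
import random
from typing import Sequence


def granger_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Log ratio of residual sums of squares: how much x[t-1] helps predict y[t]."""
    y_now, y_lag, x_lag = y[1:], y[:-1], x[:-1]

    # Sufficient statistics for the two least-squares fits (no intercept).
    syy = sum(v * v for v in y_lag)
    sxx = sum(v * v for v in x_lag)
    sxy = sum(a * b for a, b in zip(y_lag, x_lag))
    r1 = sum(a * b for a, b in zip(y_now, y_lag))
    r2 = sum(a * b for a, b in zip(y_now, x_lag))

    # Restricted model: y[t] = a * y[t-1]
    a_r = r1 / syy
    rss_restricted = sum((yn - a_r * yl) ** 2 for yn, yl in zip(y_now, y_lag))

    # Full model: y[t] = a * y[t-1] + b * x[t-1], solved via 2x2 normal equations.
    det = syy * sxx - sxy * sxy
    a_f = (r1 * sxx - r2 * sxy) / det
    b_f = (r2 * syy - r1 * sxy) / det
    rss_full = sum(
        (yn - a_f * yl - b_f * xl) ** 2 for yn, yl, xl in zip(y_now, y_lag, x_lag)
    )
    return math.log(rss_restricted / rss_full)


# Synthetic example: the "regulator" drives the "target" with a one-step delay.
rng = random.Random(0)
regulator = [rng.gauss(0.0, 1.0) for _ in range(200)]
target = [0.0] + [0.9 * regulator[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]

forward = granger_score(regulator, target)   # regulator -> target
reverse = granger_score(target, regulator)   # target -> regulator
print(f"regulator->target: {forward:.2f}, target->regulator: {reverse:.2f}")
```

A larger score means the lagged regulator explains more of the target's variation; in this simulation the forward direction scores far higher than the reverse, which is the asymmetry a ranked list of candidate regulator-gene interactions would be built from.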

14.

Correction for Rando et al., "Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure".

Rando, Halie M; MacLean, Adam L; Lee, Alexandra J; Lordan, Ronan; Ray, Sandipan; Bansal, Vikas; Skelly, Ashwin N; Sell, Elizabeth; Dziak, John J; Shinholster, Lamonica; D'Agostino McGowan, Lucy; Ben Guebila, Marouen; Wellhausen, Nils; Knyazev, Sergey; Boca, Simina M; Capone, Stephen; Qi, Yanjun; Park, YoSon; Mai, David; Sun, Yuchen; Boerckel, Joel D; Brueffer, Christian; Byrd, James Brian; Kamil, Jeremy P; Wang, Jinhui; Velazquez, Ryan; Szeto, Gregory L; Barton, John P; Goel, Rishi Raj; Mangul, Serghei; Lubiana, Tiago; Gitter, Anthony; Greene, Casey S.

mSystems ; 7(1): e0144721, 2022 Feb 22.

Article En | MEDLINE | ID: mdl-35076276

15.

Neural networks to learn protein sequence-function relationships from deep mutational scanning data.

Gelman, Sam; Fahlberg, Sarah A; Heinzelman, Pete; Romero, Philip A; Gitter, Anthony.

Proc Natl Acad Sci U S A ; 118(48), 2021 11 30.

Article En | MEDLINE | ID: mdl-34815338

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.

Amino Acid Sequence/genetics , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence/physiology , Biochemical Phenomena , Deep Learning , Machine Learning , Mutation , Neural Networks, Computer , Proteins/metabolism , Structure-Activity Relationship

16.

Identification and Development of Therapeutics for COVID-19.

Rando, Halie M; Wellhausen, Nils; Ghosh, Soumita; Lee, Alexandra J; Dattoli, Anna Ada; Hu, Fengling; Byrd, James Brian; Rafizadeh, Diane N; Lordan, Ronan; Qi, Yanjun; Sun, Yuchen; Brueffer, Christian; Field, Jeffrey M; Ben Guebila, Marouen; Jadavji, Nafisa M; Skelly, Ashwin N; Ramsundar, Bharath; Wang, Jinhui; Goel, Rishi Raj; Park, YoSon; Boca, Simina M; Gitter, Anthony; Greene, Casey S.

mSystems ; 6(6): e0023321, 2021 Dec 21.

Article En | MEDLINE | ID: mdl-34726496

After emerging in China in late 2019, the novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread worldwide, and as of mid-2021, it remains a significant threat globally. Only a few coronaviruses are known to infect humans, and only two cause infections similar in severity to SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus, a species closely related to SARS-CoV-2 that emerged in 2002, and Middle East respiratory syndrome-related coronavirus, which emerged in 2012. Unlike the current pandemic, previous epidemics were controlled rapidly through public health measures, but the body of research investigating severe acute respiratory syndrome and Middle East respiratory syndrome has proven valuable for identifying approaches to treating and preventing novel coronavirus disease 2019 (COVID-19). Building on this research, the medical and scientific communities have responded rapidly to the COVID-19 crisis and identified many candidate therapeutics. The approaches used to identify candidates fall into four main categories: adaptation of clinical approaches to diseases with related pathologies, adaptation based on virological properties, adaptation based on host response, and data-driven identification (ID) of candidates based on physical properties or on pharmacological compendia. To date, a small number of therapeutics have already been authorized by regulatory agencies such as the Food and Drug Administration (FDA), while most remain under investigation. The scale of the COVID-19 crisis offers a rare opportunity to collect data on the effects of candidate therapeutics. This information provides insight not only into the management of coronavirus diseases but also into the relative success of different approaches to identifying candidate therapeutics against an emerging disease.
IMPORTANCE The COVID-19 pandemic is a rapidly evolving crisis. With the worldwide scientific community shifting focus onto the SARS-CoV-2 virus and COVID-19, a large number of possible pharmaceutical approaches for treatment and prevention have been proposed. What was known about each of these potential interventions evolved rapidly throughout 2020 and 2021. This fast-paced area of research provides important insight into how the ongoing pandemic can be managed and also demonstrates the power of interdisciplinary collaboration to rapidly understand a virus and match its characteristics with existing or novel pharmaceuticals. As illustrated by the continued threat of viral epidemics during the current millennium, a rapid and strategic response to emerging viral threats can save lives. In this review, we explore how different modes of identifying candidate therapeutics have borne out during COVID-19.

17.

Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure.

Rando, Halie M; MacLean, Adam L; Lee, Alexandra J; Lordan, Ronan; Ray, Sandipan; Bansal, Vikas; Skelly, Ashwin N; Sell, Elizabeth; Dziak, John J; Shinholster, Lamonica; D'Agostino McGowan, Lucy; Ben Guebila, Marouen; Wellhausen, Nils; Knyazev, Sergey; Boca, Simina M; Capone, Stephen; Qi, Yanjun; Park, YoSon; Mai, David; Sun, Yuchen; Boerckel, Joel D; Brueffer, Christian; Byrd, James Brian; Kamil, Jeremy P; Wang, Jinhui; Velazquez, Ryan; Szeto, Gregory L; Barton, John P; Goel, Rishi Raj; Mangul, Serghei; Lubiana, Tiago; Gitter, Anthony; Greene, Casey S.

mSystems ; 6(5): e0009521, 2021 10 26.

Article En | MEDLINE | ID: mdl-34698547

The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease. IMPORTANCE COVID-19 involves a number of organ systems and can present with a wide range of symptoms. From how the virus infects cells to how it spreads between people, the available research suggests that these patterns are very similar to those seen in the closely related viruses SARS-CoV-1 and possibly Middle East respiratory syndrome-related CoV (MERS-CoV). Understanding the pathogenesis of the SARS-CoV-2 virus also contextualizes how the different biological systems affected by COVID-19 connect. Exploring the structure, phylogeny, and pathogenesis of the virus therefore helps to guide interpretation of the broader impacts of the virus on the human body and on human populations. For this reason, an in-depth exploration of viral mechanisms is critical to a robust understanding of SARS-CoV-2 and, potentially, future emergent human CoVs (HCoVs).

18.

An Open-Publishing Response to the COVID-19 Infodemic.

Rando, Halie M; Boca, Simina M; McGowan, Lucy D'Agostino; Himmelstein, Daniel S; Robson, Michael P; Rubinetti, Vincent; Velazquez, Ryan; Greene, Casey S; Gitter, Anthony.

ArXiv ; 2021 Sep 17.

Article En | MEDLINE | ID: mdl-34545336

The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis, combined with the need for social distancing measures, presents unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.

19.

Identification and Development of Therapeutics for COVID-19.

Rando, Halie M; Wellhausen, Nils; Ghosh, Soumita; Lee, Alexandra J; Dattoli, Anna Ada; Hu, Fengling; Byrd, James Brian; Rafizadeh, Diane N; Lordan, Ronan; Qi, Yanjun; Sun, Yuchen; Brueffer, Christian; Field, Jeffrey M; Guebila, Marouen Ben; Jadavji, Nafisa M; Skelly, Ashwin N; Ramsundar, Bharath; Wang, Jinhui; Goel, Rishi Raj; Park, YoSon; Boca, Simina M; Gitter, Anthony; Greene, Casey S.

ArXiv ; 2021 Mar 03.

Article En | MEDLINE | ID: mdl-33688554

After emerging in China in late 2019, the novel coronavirus SARS-CoV-2 spread worldwide and as of mid-2021 remains a significant threat globally. Only a few coronaviruses are known to infect humans, and only two cause infections similar in severity to SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus, a species closely related to SARS-CoV-2 that emerged in 2002, and Middle East respiratory syndrome-related coronavirus, which emerged in 2012. Unlike the current pandemic, previous epidemics were controlled rapidly through public health measures, but the body of research investigating severe acute respiratory syndrome and Middle East respiratory syndrome has proven valuable for identifying approaches to treating and preventing novel coronavirus disease 2019 (COVID-19). Building on this research, the medical and scientific communities have responded rapidly to the COVID-19 crisis to identify many candidate therapeutics. The approaches used to identify candidates fall into four main categories: adaptation of clinical approaches to diseases with related pathologies, adaptation based on virological properties, adaptation based on host response, and data-driven identification of candidates based on physical properties or on pharmacological compendia. To date, a small number of therapeutics have already been authorized by regulatory agencies such as the Food and Drug Administration (FDA), while most remain under investigation. The scale of the COVID-19 crisis offers a rare opportunity to collect data on the effects of candidate therapeutics. This information provides insight not only into the management of coronavirus diseases, but also into the relative success of different approaches to identifying candidate therapeutics against an emerging disease.

20.

Automating parameter selection to avoid implausible biological pathway models.

Magnano, Chris S; Gitter, Anthony.

NPJ Syst Biol Appl ; 7(1): 12, 2021 02 23.

Article En | MEDLINE | ID: mdl-33623016

A common way to integrate and analyze large amounts of biological "omic" data is through pathway reconstruction: using condition-specific omic data to create a subnetwork of a generic background network that represents some process or cellular state. A challenge in pathway reconstruction is that adjusting pathway reconstruction algorithms' parameters produces pathways with drastically different topological properties and biological interpretations. Due to the exploratory nature of pathway reconstruction, there is no ground truth for direct evaluation, so parameter tuning methods typically used in statistics and machine learning are inapplicable. We developed the pathway parameter advising algorithm to tune pathway reconstruction algorithms to minimize biologically implausible predictions. We leverage background knowledge in pathway databases to select pathways whose high-level structure resembles that of manually curated biological pathways. At the core of this method is a graphlet decomposition metric, which measures topological similarity to curated biological pathways. In order to evaluate pathway parameter advising, we compare its performance in avoiding implausible networks and reconstructing pathways from the NetPath database with other parameter selection methods across four pathway reconstruction algorithms. We also demonstrate how pathway parameter advising can guide reconstruction of an influenza host factor network. Pathway parameter advising is method agnostic; it is applicable to any pathway reconstruction algorithm with tunable parameters.

Biosynthetic Pathways/physiology , Computational Biology/methods , Systems Biology/methods , Algorithms , Animals , Biosynthetic Pathways/genetics , Data Analysis , Databases, Factual , Gene Regulatory Networks/genetics , Humans , Models, Biological , Models, Statistical
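
The idea in the abstract above, scoring candidate pathways by how closely their topology resembles curated pathways, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a heavily simplified graphlet profile (only the two connected 3-node graphlets, open paths and triangles), an L1 distance between profiles, and a non-empty list of reference pathways. The function names `graphlet_profile`, `profile_distance`, and `rank_candidates` are hypothetical.

```python
from itertools import combinations


def graphlet_profile(edges):
    """Return normalized counts (open 3-node paths, triangles) for an undirected graph.

    Simplifying assumption: only 3-node graphlets are counted; the published
    method uses a fuller graphlet decomposition.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    paths = 0
    triangle_hits = 0
    for neighbors in adj.values():
        for a, b in combinations(neighbors, 2):
            if b in adj[a]:
                triangle_hits += 1  # each triangle is seen once from each of its 3 nodes
            else:
                paths += 1  # induced open path, seen only from its center node
    triangles = triangle_hits // 3

    total = paths + triangles
    if total == 0:
        return (0.0, 0.0)
    return (paths / total, triangles / total)


def profile_distance(p, q):
    """L1 distance between two graphlet profiles (an assumed choice of metric)."""
    return sum(abs(x - y) for x, y in zip(p, q))


def rank_candidates(candidates, references):
    """Rank candidate pathways (name -> edge list) by mean distance to reference pathways.

    Lower mean distance means the candidate's topology looks more like the
    curated references; the most pathway-like candidate comes first.
    """
    ref_profiles = [graphlet_profile(r) for r in references]
    scores = {}
    for name, edges in candidates.items():
        prof = graphlet_profile(edges)
        scores[name] = sum(profile_distance(prof, r) for r in ref_profiles) / len(ref_profiles)
    return sorted(scores, key=scores.get)


# Toy usage: a chain-like reference pathway versus two candidate reconstructions,
# each imagined as the output of one parameter setting.
reference = [("A", "B"), ("B", "C"), ("C", "D")]
candidates = {
    "hairball": [("w", "x"), ("w", "y"), ("w", "z"), ("x", "y"), ("x", "z"), ("y", "z")],
    "chain": [("x", "y"), ("y", "z")],
}
ranking = rank_candidates(candidates, [reference])
```

In this toy setup the densely connected candidate ranks below the chain-like one because its profile is dominated by triangles, which the reference pathway lacks. A real application would use larger graphlets and curated pathways from a database such as NetPath as references.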