ABSTRACT
The synthesis of thousands of candidate compounds in drug discovery and development offers opportunities for computer-aided synthesis planning to simplify the synthesis of molecule libraries by leveraging common starting materials and reaction conditions. We develop an optimization-based method to analyze large organic chemical reaction networks and design overlapping synthesis plans for entire molecule libraries so as to minimize the overall number of unique chemical compounds needed as either starting materials or reaction conditions. We consider multiple objectives, including the number of starting materials, the number of catalysts/solvents/reagents, and the likelihood of success of the overall synthesis plan, to select an optimal reaction network to access the target molecules. The library synthesis planning task is formulated as a network flow optimization problem, and we design an efficient decomposition scheme that reduces solution time by a factor of 5 and scales to instances with 48 target molecules and nearly 8000 intermediate reactions within hours. In four case studies of pharmaceutical compounds, the approach reduces the number of starting materials and catalysts/solvents/reagents needed by 32.2% and 66.0% on average, respectively, and by up to 63.2% and 80.0% in the best cases. The code implementation can be found at https://github.com/Coughy1991/Molecule_library_synthesis.
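The idea of sharing starting materials across a library can be illustrated with a toy example. The sketch below is not the network flow formulation or the decomposition scheme described above; it is a brute-force enumeration over a hypothetical set of candidate routes (the `ROUTES` data and the `plan_library` name are invented for illustration), and it only minimizes the number of unique starting materials.

```python
from itertools import product

# Hypothetical data: each target has several candidate routes, and each
# route is represented only by the set of starting materials it consumes.
ROUTES = {
    "T1": [{"A", "B"}, {"C", "D"}],
    "T2": [{"B", "E"}, {"A", "B"}],
    "T3": [{"F"}, {"A"}],
}


def plan_library(routes):
    """Pick one route per target so the shared pool of materials is smallest.

    Exhaustive search: fine for a toy instance, exponential in general.
    """
    targets = sorted(routes)
    best_choice, best_pool = None, None
    for combo in product(*(routes[t] for t in targets)):
        pool = set().union(*combo)  # unique materials across all targets
        if best_pool is None or len(pool) < len(best_pool):
            best_choice, best_pool = dict(zip(targets, combo)), pool
    return best_choice, best_pool


choice, pool = plan_library(ROUTES)

# Baseline: plan each target on its own by taking its first listed route.
independent = set().union(*(r[0] for r in ROUTES.values()))

print(sorted(pool), len(independent))
```

In this toy instance, planning the three targets jointly needs two starting materials, while taking the first listed route for each target needs four. Enumeration grows exponentially with the number of targets, which is why a real instance calls for an optimization formulation rather than exhaustive search.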
Subjects
Computers, Drug Discovery, Feasibility Studies
ABSTRACT
The COVID-19 pandemic has created unprecedented challenges worldwide. Strained healthcare providers make difficult decisions on patient triage, treatment and care management on a daily basis. Policy makers have imposed social distancing measures to slow the disease, at a steep economic price. We design analytical tools to support these decisions and combat the pandemic. Specifically, we propose a comprehensive data-driven approach to understand the clinical characteristics of COVID-19, predict its mortality, forecast its evolution, and ultimately alleviate its impact. By leveraging cohort-level clinical data, patient-level hospital data, and census-level epidemiological data, we develop an integrated four-step approach, combining descriptive, predictive and prescriptive analytics. First, we aggregate hundreds of clinical studies into the most comprehensive database on COVID-19 to paint a new macroscopic picture of the disease. Second, we build personalized calculators to predict the risk of infection and mortality as a function of demographics, symptoms, comorbidities, and lab values. Third, we develop a novel epidemiological model to project the pandemic's spread and inform social distancing policies. Fourth, we propose an optimization model to re-allocate ventilators and alleviate shortages. Our results have been used at the clinical level by several hospitals to triage patients, guide care management, plan ICU capacity, and re-distribute ventilators. At the policy level, they are currently supporting safe back-to-work policies at a major institution and vaccine trial location planning at Janssen Pharmaceuticals, and have been integrated into the US Centers for Disease Control and Prevention's pandemic forecast.
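The fourth step, re-allocating ventilators, can be pictured with a small sketch. The abstract does not specify the optimization model, so what follows is a simple greedy heuristic offered only as an illustration; the `reallocate` function and the supply and demand figures are hypothetical and do not come from the study.

```python
def reallocate(supply, demand):
    """Greedily move surplus ventilators to the regions with the largest gaps.

    Returns the list of (donor, recipient, units) transfers and the
    shortage remaining in each under-supplied region.
    """
    surplus = {k: supply[k] - demand[k] for k in supply if supply[k] > demand[k]}
    shortage = {k: demand[k] - supply[k] for k in supply if demand[k] > supply[k]}
    transfers = []
    # Serve the largest shortage first, drawing from the largest surplus first.
    for needy in sorted(shortage, key=shortage.get, reverse=True):
        for donor in sorted(surplus, key=surplus.get, reverse=True):
            if shortage[needy] == 0:
                break
            moved = min(surplus[donor], shortage[needy])
            if moved > 0:
                transfers.append((donor, needy, moved))
                surplus[donor] -= moved
                shortage[needy] -= moved
    return transfers, shortage


# Hypothetical figures for three regions.
supply = {"X": 100, "Y": 50, "Z": 80}
demand = {"X": 60, "Y": 90, "Z": 100}

transfers, remaining = reallocate(supply, demand)
print(transfers, remaining)
```

With these figures, region X can cover the 40-unit gap in region Y, while region Z is left 20 units short because total demand exceeds total supply. A full optimization model would typically also weigh transfer costs, forecast uncertainty, and changes in demand over time, none of which this sketch attempts.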