ABSTRACT
We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) that can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. To our knowledge, GenSLMs represent one of the first whole-genome-scale foundation models that can generalize to other prediction tasks. We demonstrate scaling of GenSLMs on GPU-based supercomputers and AI-hardware accelerators, utilizing 1.63 Zettaflops in training runs with a sustained performance of 121 PFLOPS in mixed precision and a peak of 850 PFLOPS. We present initial scientific insights from examining GenSLMs in tracking the evolutionary dynamics of SARS-CoV-2, paving the path to realizing this on large biological data.
ABSTRACT
Diabetes mellitus is a disease characterized by a range of metabolic complications involving an individual's blood glucose levels and their main regulator, insulin. These complications can vary widely from person to person depending on their current biophysical state. Biomedical research steadily makes strides that improve the lives of patients with a variety of diseases, including diabetes. One such stride is the development of techniques that help physicians ``personalize medicine''. From available physiological data, biological understanding of the system, and dimensional analysis, we built a differential equation-based mathematical model in a sequential manner, so that it is clear how each parameter relates to the patient's current physiological state. The resulting simple model accurately simulates the dynamics between glucose, insulin, and pancreatic $\beta$-cells throughout disease progression, with constraints that maintain biological relevance. The current framework is capable of tracking a patient's progress through the disease as a function of factors such as latent insulin resistance or a depleted $\beta$-cell population. Future work would develop tools that allow direct and feasible testing of how effective a given treatment plan would be at returning the patient to a desirable biophysical state.
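The abstract does not state the model equations, so the following is only a minimal sketch of what a three-variable glucose, insulin, and $\beta$-cell system of this kind might look like. The functional forms, the names `derivatives`, `rk4_step`, and `simulate`, and every parameter value are assumptions made for illustration, chosen so that a healthy fixed point sits at G = 100 mg/dl, I = 10 uU/ml, and a $\beta$-cell mass of 300 mg. It is not the authors' model.

```python
# Illustrative glucose-insulin-beta-cell ODE sketch (NOT the authors' equations).
# Every functional form and parameter value here is an assumption, chosen so
# that a healthy fixed point exists at G = 100 mg/dl, I = 10 uU/ml, beta = 300 mg.


def derivatives(state, si=0.72):
    """Return (dG/dt, dI/dt, dbeta/dt) per day; si is insulin sensitivity."""
    g, i, beta = state
    r0, eg0 = 864.0, 1.44                    # glucose production, insulin-independent uptake
    sigma, alpha, k = 43.2, 20000.0, 432.0   # secretion capacity, half-saturation, insulin clearance
    d0, r1, r2 = 0.06, 0.84e-3, 0.24e-5      # beta-cell death, glucose-dependent growth terms
    dg = r0 - (eg0 + si * i) * g
    di = beta * sigma * g**2 / (alpha + g**2) - k * i
    dbeta = (-d0 + r1 * g - r2 * g**2) * beta
    return (dg, di, dbeta)


def rk4_step(state, dt, si):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = derivatives(state, si)
    k2 = derivatives(shifted(state, k1, dt / 2), si)
    k3 = derivatives(shifted(state, k2, dt / 2), si)
    k4 = derivatives(shifted(state, k3, dt), si)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, days, dt=0.001, si=0.72):
    """Integrate for the given number of days; dt is kept small because
    insulin clearance (k) is fast relative to the other processes."""
    for _ in range(round(days / dt)):
        state = rk4_step(state, dt, si)
    return state


if __name__ == "__main__":
    healthy = (100.0, 10.0, 300.0)
    # Halving insulin sensitivity stands in for latent insulin resistance.
    print(simulate(healthy, days=5, si=0.36))
```

In this sketch, lowering `si` mimics insulin resistance and lowering the initial $\beta$-cell mass mimics a depleted $\beta$-cell population, which are the two factors the abstract names as drivers of disease progression.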