ABSTRACT
Since late 2019, COVID-19 has significantly impacted the world. Understanding the evolution of SARS-CoV-2 is crucial for protecting against future infectious pathogens. In this study, we conducted a comprehensive chronological analysis of SARS-CoV-2 evolution by examining mutation prevalence in the four countries where variants of concern (VOCs) originated (the United Kingdom, India, Brazil, and South Africa) and in two additional countries (the United States and Russia), using genomic sequences from GISAID. Our methodological approach involved large-scale genomic sequence alignment using MAFFT, Python-based data processing on a high-performance computing platform, and correlation analysis using the Maximal Information Coefficient (MIC) and Long Short-Term Memory (LSTM) models. Our findings elucidate the dynamics of SARS-CoV-2 evolution, highlighting the virus's changing behaviour across the stages of the pandemic. Key results include the discovery of three temporal mutation patterns (lineage-distinct, long-span, and competitive mutations), which have varying levels of impact on the virus. Notably, we observed a convergence of advantageous mutations in the spike protein, especially in the later stages of the pandemic, indicating substantial evolutionary pressure on the virus. One of the most significant findings is that natural immunity plays a predominant role, relative to vaccination-induced immunity, in driving these evolutionary changes. This emphasizes the critical need for regular vaccine updates to maintain efficacy against evolving strains. In conclusion, our study not only sheds light on the evolutionary trajectory of SARS-CoV-2 but also underscores the urgent need for robust, continuous global data collection and sharing. It highlights the necessity of rapidly adapting medical countermeasures, including vaccines, to stay ahead of pathogen evolution. This research provides valuable insights for future pandemic preparedness and response strategies.