ABSTRACT
High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. When replicates are not available, using a combination of multiple callers with more stringent cutoffs is recommended. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intra-host viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intra-host variation, viral diversity, and viral evolution.
IMPORTANCE: When viruses replicate inside a host cell, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
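The filtering ideas summarized above (allele frequency and coverage thresholds, agreement across replicates or callers, and scoring against a known truth set) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Call` record layout, the helper names, the toy data, and the 1% frequency and 200x coverage cutoffs are hypothetical and are not taken from the study or from any particular variant caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One SNV call from one call set (hypothetical record layout)."""
    pos: int      # 1-based genome position
    alt: str      # alternate base
    freq: float   # allele frequency in the range 0..1
    depth: int    # read coverage at the site


def passing(calls, min_freq, min_depth):
    """Return the (pos, alt) keys of calls that meet both thresholds."""
    return {(c.pos, c.alt) for c in calls
            if c.freq >= min_freq and c.depth >= min_depth}


def consensus(call_sets, min_freq, min_depth):
    """Keep only SNVs that pass the thresholds in every call set.

    The call sets may be technical replicates, or the outputs of
    different callers when replicates are not available.
    """
    filtered = [passing(calls, min_freq, min_depth) for calls in call_sets]
    return set.intersection(*filtered) if filtered else set()


def error_rates(called, truth):
    """Return (FDR, FNR): FDR = FP / (TP + FP), FNR = FN / (TP + FN)."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    fdr = fp / (tp + fp) if called else 0.0
    fnr = fn / (tp + fn) if truth else 0.0
    return fdr, fnr


# Toy data invented for illustration only (not from the study).
truth = {(100, "A"), (250, "T"), (400, "G")}
rep1 = [Call(100, "A", 0.05, 800), Call(250, "T", 0.02, 900),
        Call(310, "C", 0.015, 700), Call(400, "G", 0.008, 150)]
rep2 = [Call(100, "A", 0.06, 750), Call(250, "T", 0.025, 850),
        Call(520, "T", 0.012, 600), Call(400, "G", 0.009, 180)]

# Arbitrary illustrative cutoffs: 1% allele frequency, 200x coverage.
single = passing(rep1, min_freq=0.01, min_depth=200)
both = consensus([rep1, rep2], min_freq=0.01, min_depth=200)
fdr_single, fnr_single = error_rates(single, truth)
fdr_both, fnr_both = error_rates(both, truth)
```

In this toy case, requiring agreement between the two replicates drops the replicate-specific calls that are absent from the truth set, while the low-frequency, low-coverage true variant is missed either way. That mirrors the trade-off between false discovery and false-negative rates described in the abstract, but the numbers themselves carry no empirical meaning.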
Subjects
COVID-19 , Orthomyxoviridae , Viruses , Humans , SARS-CoV-2/genetics , Mutation , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intrahost viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intrahost variation, viral diversity, and viral evolution.
IMPORTANCE: When viruses replicate inside a host, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants, and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
ABSTRACT
Amplification of a fragment of the hsp65 gene by polymerase chain reaction (PCR), followed by restriction fragment length polymorphism (RFLP) analysis with the restriction enzymes BstEII and HaeIII, has proven very useful for the identification of nontuberculous mycobacteria (NTM). Biochemical tests and PCR-RFLP were performed on 13 reference strains and 46 strains received in the laboratory. Biochemical test results were available in 4-6 weeks, whereas PCR-RFLP required only 48 hours. With both methods, Mycobacterium intracellulare, M. kansasii and M. fortuitum were the most frequently detected species. The PCR-RFLP method is fast, cheap and simple. Its application in reference laboratories could be very useful for the diagnosis of NTM.
Subjects
Nontuberculous Mycobacterium Infections/diagnosis , Nontuberculous Mycobacteria/classification , Humans , Nontuberculous Mycobacterium Infections/microbiology , Nontuberculous Mycobacteria/genetics , Nontuberculous Mycobacteria/isolation & purification , Polymerase Chain Reaction/methods , Restriction Fragment Length Polymorphism
ABSTRACT
Amplification by polymerase chain reaction (PCR) of a fragment of the hsp65 gene, followed by restriction fragment length polymorphism (RFLP) analysis with the enzymes BstEII and HaeIII, has proven very useful for the identification of nontuberculous mycobacteria (NTM). In the present work, a battery of biochemical tests and PCR-RFLP were performed on a total of 13 reference strains and 46 strains received in the laboratory. Results of the biochemical tests were available within 4 to 6 weeks, whereas PCR-RFLP required only 48 hours. With both methods, the most frequently detected species were Mycobacterium intracellulare, M. kansasii and M. fortuitum. PCR-RFLP is a fast, simple and effective method. Its application in reference laboratories could be very useful for the diagnosis of NTM.