Results 1 - 17 of 17
2.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38426352

ABSTRACT

MOTIVATION: Intra-host variants refer to genetic variations or mutations that occur within an individual host organism. These variants are typically studied in the context of viruses, bacteria, or other pathogens to understand the evolution of pathogens. Moreover, intra-host variants are also explored in the field of tumor biology and mitochondrial biology to characterize somatic mutations and inherited heteroplasmic mutations. Intra-host variants can involve long insertions, deletions, and combinations of different mutation types, which poses challenges in their identification. The performance of current methods in detecting complex intra-host variants is unknown. RESULTS: First, we simulated a dataset comprising 10 samples with 1869 intra-host variants involving various mutation patterns and benchmarked current variant detection software. The results indicated that although current software can detect most variants with F1-scores between 0.76 and 0.97, their performance in detecting long indels and low frequency variants was limited. Thus, we developed a new software tool, PySNV, for the detection of complex intra-host variations. On the simulated dataset, PySNV successfully detected 1863 variant cases (F1-score: 0.99) and exhibited the highest Pearson correlation coefficient (PCC: 0.99) to the ground truth in predicting variant frequencies. The results demonstrated that PySNV delivered promising performance even for long indels and low frequency variants, while maintaining computational speed comparable to other methods. Finally, we tested its performance on SARS-CoV-2 replicate sequencing data and found that it reported 21% more variants compared to LoFreq, the best-performing benchmarked software, while showing higher consistency (62% over 54%) within replicates. The discrepancies mostly exist in low-depth regions and low frequency variants. AVAILABILITY AND IMPLEMENTATION: https://github.com/bnuLyndon/PySNV/.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Mutation , INDEL Mutation , Genetic Variation
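
The two benchmark metrics named in the abstract above, the F1-score for variant detection and the Pearson correlation coefficient (PCC) for predicted frequencies, can be illustrated with a minimal sketch. The variant keys, frequencies, and function names below are toy assumptions made for illustration; they are not taken from the PySNV benchmark or its code.

```python
from math import sqrt

def f1_score(truth, calls):
    """Return (precision, recall, F1) for two collections of variant keys."""
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)  # variants present in both sets
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical toy data: (position, ref, alt) -> variant frequency.
truth = {
    (241, "C", "T"): 0.05,
    (3037, "C", "T"): 0.40,
    (11287, "GTCTGGTTTT", "G"): 0.10,  # a longer deletion
    (14408, "C", "T"): 0.20,           # missed by the caller below
}
calls = {
    (241, "C", "T"): 0.06,
    (3037, "C", "T"): 0.38,
    (11287, "GTCTGGTTTT", "G"): 0.11,
    (23403, "A", "G"): 0.02,           # false positive
}

precision, recall, f1 = f1_score(truth, calls)

# Frequencies are compared only at variants found in both sets.
shared = sorted(set(truth) & set(calls))
pcc = pearson([truth[k] for k in shared], [calls[k] for k in shared])
```

In this sketch a variant counts as detected only when position, reference allele, and alternative allele all match exactly; a real benchmark would likely need to normalize indel representations before comparing.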
3.
Cell Host Microbe ; 32(1): 25-34.e5, 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38029742

ABSTRACT

Emerging SARS-CoV-2 sub-lineages like XBB.1.5, XBB.1.16, EG.5, HK.3 (FLip), and XBB.2.3 and the variant BA.2.86 have recently been identified. Understanding the efficacy of current vaccines on these emerging variants is critical. We evaluate the serum neutralization activities of participants who received COVID-19 inactivated vaccine (CoronaVac), those who received the recently approved tetravalent protein vaccine (SCTV01E), or those who had contracted a breakthrough infection with BA.5/BF.7/XBB virus. Neutralization profiles against a broad panel of 30 sub-lineages reveal that BQ.1.1, CH.1.1, and all the XBB sub-lineages exhibit heightened resistance to neutralization compared to previous variants. However, despite their extra mutations, BA.2.86 and the emerging XBB sub-lineages do not demonstrate significantly increased resistance to neutralization over XBB.1.5. Encouragingly, the SCTV01E booster consistently induces higher neutralizing titers against all these variants than breakthrough infection does. Cellular immunity assays also show that the SCTV01E booster elicits a higher frequency of virus-specific memory B cells. Our findings support the development of multivalent vaccines to combat future variants.


Subject(s)
Breakthrough Infections , COVID-19 Vaccines , COVID-19 , Immunization, Secondary , Humans , COVID-19/prevention & control , SARS-CoV-2/genetics , Antibodies, Neutralizing , Antibodies, Viral
4.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37779249

ABSTRACT

To contain infectious diseases, it is crucial to determine the origin and transmission routes of the pathogen, as well as how the virus evolves. With the development of genome sequencing technology, genome epidemiology has emerged as a powerful approach for investigating the source and transmission of pathogens. In this study, we first presented the rationale for genomic tracing of SARS-CoV-2 and the challenges we currently face. Identifying the most genetically similar reference sequence to the query sequence is a critical step in genome tracing, typically achieved using either a phylogenetic tree or a sequence similarity search. However, these methods become inefficient or computationally prohibitive when dealing with tens of millions of sequences in the reference database, as we encountered during the COVID-19 pandemic. To address this challenge, we developed a novel genomic tracing algorithm capable of processing 6 million SARS-CoV-2 sequences in less than a minute. Instead of constructing a giant phylogenetic tree, we devised a weighted scoring system based on mutation characteristics to quantify sequence similarity. The developed method demonstrated superior performance compared to previous methods. Additionally, an online platform was developed to facilitate genomic tracing and visualization of the spatiotemporal distribution of sequences. The method will be a valuable addition to standard epidemiological investigations, enabling more efficient genomic tracing. Furthermore, the computational framework can be easily adapted to other pathogens, paving the way for routine genomic tracing of infectious diseases.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/genetics , Phylogeny , Pandemics , Genome, Viral , Genomics/methods
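
The abstract above describes a weighted scoring system over mutations but does not give its weights or formula. The following is therefore only a hedged sketch of the general idea: each sequence is reduced to its set of mutations relative to a reference genome, shared mutations add to the score in proportion to their rarity, and mismatched mutations subtract from it. The rarity weight, the `penalty` factor, and all names and data are assumptions for illustration, not the published algorithm.

```python
from math import log

def mutation_weights(database):
    """Assumed weighting: rarer mutations are more informative.

    weight = -log(fraction of database sequences carrying the mutation),
    with n + 1 in the denominator so a ubiquitous mutation keeps a small
    positive weight.
    """
    n = len(database)
    counts = {}
    for muts in database.values():
        for m in muts:
            counts[m] = counts.get(m, 0) + 1
    return {m: -log(c / (n + 1)) for m, c in counts.items()}

def score(query, ref, weights, default=1.0, penalty=0.5):
    """Weighted similarity: reward shared mutations, penalize differing ones."""
    gain = sum(weights.get(m, default) for m in query & ref)
    loss = penalty * sum(weights.get(m, default) for m in query ^ ref)
    return gain - loss

def closest(query, database, weights):
    """Name of the database sequence with the highest score against the query."""
    return max(database, key=lambda name: score(query, database[name], weights))

# Hypothetical toy database: sequence name -> set of mutations.
database = {
    "ref_A": frozenset({"C241T", "C3037T", "A23403G"}),
    "ref_B": frozenset({"C241T", "C3037T", "A23403G", "G28881A"}),
    "ref_C": frozenset({"C241T", "T19839C"}),
}
query = frozenset({"C241T", "C3037T", "A23403G", "G28881A"})

weights = mutation_weights(database)
best = closest(query, database, weights)
```

Scoring on mutation sets avoids both tree construction and full alignment of each database entry against the query, which is one plausible reason such an approach can scale; scanning millions of sequences in under a minute would additionally require indexing that this sketch does not attempt.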
5.
Nat Ecol Evol ; 7(9): 1457-1466, 2023 09.
Article in English | MEDLINE | ID: mdl-37443189

ABSTRACT

Mutations in the SARS-CoV-2 genome could confer resistance to pre-existing antibodies and/or increased transmissibility. The recently emerged Omicron subvariants exhibit a strong tendency for immune evasion, suggesting adaptive evolution. However, because previous studies have been limited to specific lineages or subsets of mutations, the overall evolutionary trajectory of SARS-CoV-2 and the underlying driving forces are still not fully understood. Here we analysed all open-access SARS-CoV-2 genomes (up to November 2022) and correlated the mutation incidence and fitness changes with the impacts of mutations on immune evasion and ACE2 binding affinity. Our results show that the Omicron lineage had an accelerated mutation rate in the RBD region, while the mutation incidence in other genomic regions did not change dramatically over time. Mutations in the RBD region exhibited a lineage-specific pattern and tended to become more aggregated over time, and the mutation incidence was positively correlated with the strength of antibody pressure. Additionally, mutation incidence was positively correlated with changes in ACE2 binding affinity, but with a lower correlation coefficient than with immune evasion. In contrast, the effect of mutations on fitness was more closely correlated with changes in ACE2 binding affinity than with immune evasion. Our findings suggest that immune evasion and ACE2 binding affinity play significant and diverse roles in the evolution of SARS-CoV-2.


Subject(s)
COVID-19 , Immune Evasion , Humans , Angiotensin-Converting Enzyme 2 , Mutation , SARS-CoV-2/genetics
6.
Microbiol Spectr ; 11(1): e0342622, 2023 02 14.
Article in English | MEDLINE | ID: mdl-36622170

ABSTRACT

SARS-CoV-2 has infected more than 600 million people. However, the origin of the virus is still unclear; knowing where the virus came from could help us prevent future zoonotic epidemics. Sequencing data, particularly metagenomic data, can profile the genomes of all species in the sample, including those not recognized at the time, thus allowing for the identification of the progenitor of SARS-CoV-2 in samples collected before the pandemic. We analyzed the data from 5,196 SARS-CoV-2-positive sequencing runs in the NCBI's SRA database with collection dates prior to 2020 or unknown. We found that the mutation patterns obtained from these suspicious SARS-CoV-2 reads did not match the genome characteristics of an unknown progenitor of the virus, suggesting that they may derive from circulating SARS-CoV-2 variants or other coronaviruses. Despite a negative result for tracking the progenitor of SARS-CoV-2, the methods developed in the study could assist in pinpointing the origin of various pathogens in the future. IMPORTANCE Sequences that are homologous to the SARS-CoV-2 genome were found in numerous sequencing runs in the public database that were not associated with SARS-CoV-2 studies. Because information on the collection, library preparation, and sequencing processes is lacking, it is unclear whether they derive from a possible progenitor of SARS-CoV-2 or from contamination by more recent SARS-CoV-2 variants circulating in the population. We have developed a computational framework to infer the evolutionary relationship between sequences based on the comparison of mutations, which enabled us to rule out the possibility that these suspicious sequences originate from unknown progenitors of SARS-CoV-2.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Metagenomics , Mutation , Genome, Viral
7.
Biosaf Health ; 5(1): 62-67, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36320662

ABSTRACT

We analyzed variations in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome during a flight-related cluster outbreak of coronavirus disease 2019 (COVID-19) in Shenzhen, China, to explore the characteristics of SARS-CoV-2 transmission and intra-host single nucleotide variations (iSNVs) in a confined space. Thirty-three patients with COVID-19 were sampled, and 14 were resampled 3-31 days later. All 47 nasopharyngeal swabs were deep-sequenced. iSNVs and similarities in the consensus genome sequence were analyzed. Three SARS-CoV-2 variants of concern, Delta (n = 31), Beta (n = 1), and C.1.2 (n = 1), were detected among the 33 patients. The viral genome sequences from 30 Delta-positive patients had similar SNVs; 14 of these patients provided two successive samples. Overall, the 47 sequenced genomes contained 164 iSNVs. Of the 14 paired (successive) samples, the second samples (T2) contained more iSNVs (median: 3; 95% confidence interval [95% CI]: 2.77-10.22) than did the first samples (T1; median: 2; 95% CI: 1.63-3.74; Wilcoxon test, P = 0.021). Thirty-eight iSNVs were detected in T1 samples, and only seven were also detectable in T2 samples. Notably, the T2 samples from two of the 14 pairs had more mutations than the corresponding T1 samples. The iSNVs of the SARS-CoV-2 genome exhibited rapid dynamic changes during a flight-related cluster outbreak event. Intra-host diversity increased gradually with time, and new site mutations occurred in vivo without a population transmission bottleneck. Therefore, we could not determine the generational relationship from the mutation site changes alone.

8.
BMC Biol ; 20(1): 225, 2022 10 08.
Article in English | MEDLINE | ID: mdl-36209213

ABSTRACT

BACKGROUND: Shotgun metagenomic sequencing has greatly expanded the understanding of microbial communities in various biological niches. However, it is still challenging to efficiently convert sub-nanogram DNA to high-quality metagenomic libraries and obtain high-fidelity data, hindering the exploration of niches with low microbial biomass. RESULTS: To cope with this challenge comprehensively, we evaluated the performance of various library preparation methods on 0.5 pg-5 ng synthetic microbial community DNA, characterized contaminants, and further applied different in silico decontamination methods. First, we discovered that whole genome amplification (WGA) prior to library construction led to worse outcomes than preparing libraries directly. Among different non-WGA-based library preparation methods, we found that the endonuclease-based method performed well across different amounts of template and that the tagmentation-based method showed specific advantages with 0.5 pg template, based on evaluation metrics including fidelity, proportion of designated reads, and reproducibility. The load of contaminating DNA introduced by library preparation varied from 0.01 to 15.59 pg for different kits and accounted for 0.05 to 45.97% of total reads. A considerable fraction of the contaminating reads were mapped to human commensal and pathogenic microbes, thus potentially leading to erroneous conclusions in human microbiome studies. Furthermore, the best performing in silico decontamination method in our evaluation, Decontam-either, was capable of recovering the real microbial community from libraries where contaminants accounted for less than 10% of total reads, but not from libraries with heavy and highly varied contaminants.
CONCLUSIONS: This study demonstrates that high-quality metagenomic data can be obtained from samples with sub-nanogram microbial DNA by combining appropriate library preparation and in silico decontamination methods and provides a general reference for method selection for samples with varying microbial biomass.


Subject(s)
Decontamination , Metagenomics , DNA/genetics , Endonucleases/genetics , Gene Library , High-Throughput Nucleotide Sequencing/methods , Humans , Metagenomics/methods , Reproducibility of Results , Sequence Analysis, DNA/methods
9.
Virol Sin ; 37(6): 804-812, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36167254

ABSTRACT

The continuous emergence of SARS-CoV-2 variants, from B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and B.1.617.2 (Delta) to B.1.1.529 (Omicron), has posed a great threat to public health globally. The emergence and re-emergence of SARS-CoV-2 variants of concern call for constant monitoring of their epidemics, pathogenicity and immune escape. In this study, we aimed to characterize replication and pathogenicity of the Alpha and Delta variant strains isolated from patients infected in Laos. The amino acid mutations within the spike fragment of the isolates were determined via sequencing. The Alpha and Delta isolates replicated more efficiently than the prototype SARS-CoV-2 in Calu-3 and Caco-2 cells, whereas this was not observed in Huh-7, Vero E6 and HPA-3 cells. We used two animal models, human ACE2 (hACE2) transgenic mice and hamsters, to evaluate the pathogenesis of the isolates. The Alpha and Delta isolates replicated well in multiple organs and caused moderate to severe lung pathology in these animals. In conclusion, the spike protein of the isolated Alpha and Delta variant strains was characterized, and the replication and pathogenicity of the strains in the cells and animal models were also evaluated.


Subject(s)
COVID-19 , SARS-CoV-2 , Animals , Cricetinae , Humans , Mice , Angiotensin-Converting Enzyme 2 , Caco-2 Cells , COVID-19/virology , Mice, Transgenic , SARS-CoV-2/pathogenicity , Spike Glycoprotein, Coronavirus , Virulence
10.
Emerg Microbes Infect ; 11(1): 552-555, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35081877

ABSTRACT

We identified an individual who was coinfected with two SARS-CoV-2 variants of concern, the Beta and Delta variants. The ratio of the relative abundance between the two variants was maintained at 1:9 (Beta:Delta) over 14 days. Furthermore, possible evidence of recombination in the Orf1ab and Spike genes was found.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Recombination, Genetic , Spike Glycoprotein, Coronavirus/genetics
11.
Genomics Proteomics Bioinformatics ; 20(1): 60-69, 2022 02.
Article in English | MEDLINE | ID: mdl-35033679

ABSTRACT

A new variant of concern for SARS-CoV-2, Omicron (B.1.1.529), was designated by the World Health Organization on November 26, 2021. This study analyzed the viral genome sequencing data of 108 samples collected from patients infected with Omicron. First, we found that the enrichment efficiency of viral nucleic acids was reduced due to mutations in the regions where the primers anneal. Second, the Omicron variant possesses an excessive number of mutations compared to other variants circulating at the same time (median: 62 vs. 45), especially in the Spike gene. Mutations in the Spike gene confer alterations in 32 amino acid residues, more than those observed in other SARS-CoV-2 variants. Moreover, a large number of nonsynonymous mutations occur in the codons for the amino acid residues located on the surface of the Spike protein, which could potentially affect the replication, infectivity, and antigenicity of SARS-CoV-2. Third, there are 53 mutations between the Omicron variant and its closest sequences available in public databases. Many of these mutations were rarely observed in public databases and had a low mutation rate. In addition, the linkage disequilibrium between these mutations was low, with a limited number of mutations concurrently observed in the same genome, suggesting that the Omicron variant would be in a different evolutionary branch from the currently prevalent variants. To improve our ability to detect and track the source of new variants rapidly, it is imperative to further strengthen genomic surveillance and data sharing globally in a timely manner.


Subject(s)
COVID-19 , Nucleic Acids , Amino Acids , Genomics , Humans , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics
13.
Genomics Proteomics Bioinformatics ; 19(5): 727-740, 2021 10.
Article in English | MEDLINE | ID: mdl-34695600

ABSTRACT

COVID-19 has swept the globe, and Pakistan is no exception. To investigate the initial introductions and transmissions of SARS-CoV-2 in Pakistan, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected from March 16 to June 1, 2020. We identified a total of 347 mutated positions, 31 of which were over-represented in Pakistan. Meanwhile, we found over 1000 intra-host single-nucleotide variants (iSNVs). Several of them occurred concurrently, indicating possible interactions among them or coevolution. Some of the high-frequency iSNVs in Pakistan were not observed in the global population, suggesting strong purifying selection. The genomic epidemiology revealed five distinctive spreading clusters. The largest cluster consisted of 74 viruses which were derived from different geographic locations of Pakistan and formed a deep hierarchical structure, indicating an extensive and persistent nation-wide transmission of the virus that was probably attributed to a signature mutation (G8371T in ORF1ab) of this cluster. Furthermore, 28 putative international introductions were identified, several of which are consistent with the epidemiological investigations. In all, this study has inferred the possible pathways of introductions and transmissions of SARS-CoV-2 in Pakistan, which could aid ongoing and future viral surveillance and COVID-19 control.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Genome, Viral , Genomics , Humans , Pakistan/epidemiology , Phylogeny , SARS-CoV-2/genetics
14.
Clin Infect Dis ; 71(15): 713-720, 2020 07 28.
Article in English | MEDLINE | ID: mdl-32129843

ABSTRACT

BACKGROUND: A novel coronavirus (CoV), severe acute respiratory syndrome (SARS)-CoV-2, has infected >75 000 individuals and spread to >20 countries. It is still unclear how fast the virus evolved and how it interacts with other microorganisms in the lung. METHODS: We have conducted metatranscriptome sequencing for bronchoalveolar lavage fluid samples from 8 patients with SARS-CoV-2, and also analyzed data from 25 patients with community-acquired pneumonia (CAP), and 20 healthy controls for comparison. RESULTS: The median number of intrahost variants was 1-4 in SARS-CoV-2-infected patients, ranging from 0 to 51 in different samples. The distribution of variants on genes was similar to those observed in the population data. However, very few intrahost variants were observed in the population as polymorphisms, implying either a bottleneck or purifying selection involved in the transmission of the virus, or a consequence of the limited diversity represented in the current polymorphism data. Although current evidence did not support the transmission of intrahost variants in a possible person-to-person spread, the risk should not be overlooked. Microbiotas in SARS-CoV-2-infected patients were similar to those in CAP, either dominated by the pathogens or with elevated levels of oral and upper respiratory commensal bacteria. CONCLUSION: SARS-CoV-2 evolves in vivo after infection, which may affect its virulence, infectivity, and transmissibility. Although how the intrahost variant spreads in the population is still elusive, it is necessary to strengthen the surveillance of the viral evolution in the population and associated clinical changes.


Subject(s)
Coronavirus Infections/epidemiology , Coronavirus , Pandemics , Pneumonia, Viral/epidemiology , Severe Acute Respiratory Syndrome , Betacoronavirus , COVID-19 , Genetic Variation , Genomics , Humans , SARS-CoV-2
15.
J Genet Genomics ; 47(10): 610-617, 2020 10 20.
Article in English | MEDLINE | ID: mdl-33388272

ABSTRACT

In response to the current coronavirus disease 2019 (COVID-19) pandemic, it is crucial to understand the origin, transmission, and evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which relies on close surveillance of genomic diversity in clinical samples. Although mutations at the population level have been extensively investigated, how the mutations evolve at the individual level is largely unknown. Eighteen time-series fecal samples were collected from nine patients with COVID-19 during the convalescent phase. The nucleic acids of SARS-CoV-2 were enriched by the hybrid capture method. First, we demonstrated the outstanding performance of the hybrid capture method in detecting intra-host variants. We identified 229 intra-host variants at 182 sites in 18 fecal samples. Among them, nineteen variants presented frequency changes > 0.3 within 1-5 days, reflecting highly dynamic intra-host viral populations. Moreover, the evolution of the viral genome demonstrated that the virus was probably viable in the gastrointestinal tract during the convalescent period. Meanwhile, we also found that the same mutation showed a distinct pattern of frequency changes in different individuals, indicating a strong random drift. In summary, dramatic changes of the SARS-CoV-2 genome were detected in fecal samples during the convalescent period; whether the viral load in feces is sufficient to establish an infection warrants further investigation.


Subject(s)
COVID-19/prevention & control , Feces/virology , Genome, Viral/genetics , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/virology , Convalescence , Gene Expression Profiling/methods , Genomics/methods , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Pandemics , Polymorphism, Single Nucleotide , SARS-CoV-2/physiology , Time Factors
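
The criterion quoted in the abstract above, a frequency change greater than 0.3 within 1-5 days, can be sketched as a simple filter over a per-variant time series. The data layout, the function name, and the choice to compare consecutive samples only are assumptions for illustration; the abstract does not state how sample pairs were chosen.

```python
def dynamic_variants(series, min_change=0.3, min_days=1, max_days=5):
    """Flag variants whose frequency shifts by more than min_change
    between two consecutive samples taken min_days..max_days apart.

    series maps a variant label to a list of (day, frequency) pairs.
    Returns {variant: (start_day, end_day, frequency_change)} for the
    first qualifying pair of each flagged variant.
    """
    flagged = {}
    for variant, points in series.items():
        points = sorted(points)  # order samples by day
        for (d0, f0), (d1, f1) in zip(points, points[1:]):
            interval = d1 - d0
            if min_days <= interval <= max_days and abs(f1 - f0) > min_change:
                flagged[variant] = (d0, d1, round(f1 - f0, 3))
                break
    return flagged

# Hypothetical time series for one patient: variant -> [(day, frequency)].
series = {
    "C8782T": [(0, 0.10), (3, 0.55)],   # +0.45 in 3 days: flagged
    "G11083T": [(0, 0.20), (2, 0.25)],  # +0.05 in 2 days: stable
    "A23403G": [(0, 0.90), (10, 0.20)], # large change, but 10 days apart
}

flagged = dynamic_variants(series)
```

Applying the same filter per patient and then comparing the sign of the change for a shared mutation across patients would be one way to look at the individual-specific patterns the abstract attributes to random drift.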
17.
J Proteomics ; 197: 53-59, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30790687

ABSTRACT

Peptide-spectrum match (PSM) scoring between the experimental and theoretical spectrum is a key step in the identification of proteins using mass spectrometry (MS)-based proteomics analyses. Efficient protein identification using MS/MS data remains a challenge. The strategy of using RNA-seq data increases the number of proteins identified by re-constructing the custom search database and integrating mRNA abundance into the false discovery rate of post-PSM. However, this process lacks an algorithm that can allow the incorporation of mRNA abundance into the key scoring model of PSM. Therefore, we developed a novel PSM scoring model, which incorporates mRNA abundance for improved peptide and protein identification. In the new algorithm, abundance information of mRNA was transformed to the prior probability of protein identification and integrated into PSM re-scoring using the binomial probability distribution model. Compared with other algorithms using five MS/MS datasets, the results showed that the least improvement ratios of peptide and protein groups were 3.39%-9.79% and 0.48%-8.16% in different datasets (human, rat, zebrafish, yeast, and Arabidopsis thaliana). The new strategy offers an effective solution for MS-based identification of peptides and proteins. SIGNIFICANCE: The new algorithm identifies proteins by quantifying mRNA abundance (FPKM) and incorporating it into a scoring model for peptide-spectrum matches. It is important to improve peptide and protein identification from MS/MS datasets in proteomics research.


Subject(s)
Algorithms , Arabidopsis/metabolism , Databases, Nucleic Acid , RNA, Fungal/metabolism , RNA, Messenger/metabolism , RNA, Plant/metabolism , Saccharomyces cerevisiae/metabolism , Zebrafish/metabolism , Animals , Humans , Rats , Tandem Mass Spectrometry
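
The abstract above combines a binomial probability model for peptide-spectrum matching with a prior derived from mRNA abundance, but it does not give the formulas. The sketch below shows one generic way such a combination could look: a binomial tail probability for the number of matched fragment peaks, converted to a score, plus the log of an FPKM-derived prior. The prior transform, the per-peak random-match probability, and all names are assumptions for illustration and should not be read as the published model.

```python
from math import comb, log10, log2

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of matching at least
    k of n theoretical fragment peaks purely at random."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def psm_score(matched, total, p_random):
    """Higher when the observed number of matched peaks is less likely by chance."""
    return -log10(binomial_tail(matched, total, p_random))

def fpkm_prior(fpkm_by_protein, pseudocount=1.0):
    """Assumed transform from FPKM to a prior probability per protein:
    log2(FPKM + 1) plus a pseudocount, normalized to sum to 1. The
    pseudocount keeps unexpressed proteins at a non-zero prior."""
    raw = {prot: log2(f + 1.0) + pseudocount for prot, f in fpkm_by_protein.items()}
    total = sum(raw.values())
    return {prot: value / total for prot, value in raw.items()}

def rescored(matched, total, p_random, prior):
    """Spectral evidence plus log-prior, in the style of a log-posterior."""
    return psm_score(matched, total, p_random) + log10(prior)

# Hypothetical case: two candidate peptides explain the spectrum equally
# well (8 of 20 peaks) but come from proteins with different mRNA levels.
priors = fpkm_prior({"P1": 120.0, "P2": 0.0})
scores = {prot: rescored(8, 20, 0.05, priors[prot]) for prot in priors}
```

With identical spectral evidence, the candidate from the more highly expressed transcript ranks first, which is the qualitative behaviour the abstract describes; how strongly the prior should count relative to the spectral score is a tuning decision this sketch leaves open.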