Results 1 - 20 of 69
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38343322

ABSTRACT

Vaccination stands as the most effective and economical strategy for prevention and control of influenza. The primary target of neutralizing antibodies is the surface antigen hemagglutinin (HA). However, ongoing mutations in the HA sequence result in antigenic drift. The success of a vaccine is contingent on its antigenic congruence with circulating strains. Thus, predicting antigenic variants and deducing antigenic clusters of influenza viruses are pivotal for vaccine strain recommendation. The antigenicity of influenza A viruses is determined by the interplay of amino acids in the HA1 sequence. In this study, we exploit the ability of convolutional neural networks (CNNs) to extract spatial feature representations in the convolutional layers, which can discern interactions between amino acid sites. We introduce PREDAC-CNN, a model designed to track the antigenic evolution of seasonal influenza A viruses. Accessible at http://predac-cnn.cloudna.cn, PREDAC-CNN formulates a spatially oriented representation of the HA1 sequence, optimized for the convolutional framework. It effectively probes interactions among amino acid sites in the HA1 sequence. In addition, PREDAC-CNN focuses exclusively on physicochemical attributes crucial for the antigenicity of influenza viruses, thereby eliminating unnecessary amino acid embeddings. Together, these features make PREDAC-CNN adept at capturing interactions of amino acid sites within the HA1 sequence and at examining the collective impact of point mutations on antigenic variation. Through 5-fold cross-validation and retrospective testing, PREDAC-CNN has shown superior performance in predicting antigenic variants compared to its counterparts. Additionally, PREDAC-CNN has been instrumental in identifying predominant antigenic clusters for A/H3N2 (1968-2023) and A/H1N1 (1977-2023) viruses, significantly aiding vaccine strain recommendation.
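The abstract says PREDAC-CNN builds a spatially oriented representation of the HA1 sequence from physicochemical attributes and lets convolutional layers probe interactions among sites, but it does not specify the encoding. The Python sketch below is a hypothetical illustration of that general idea, not the PREDAC-CNN implementation: it assumes the Kyte-Doolittle hydropathy scale as the only attribute, folds per-site differences between two aligned HA1 sequences into a grid of arbitrary width, and applies a plain 2D convolution. The names `encode_pair` and `conv2d_valid` are illustrative.

```python
# Kyte-Doolittle hydropathy values, used here as one example physicochemical index.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def encode_pair(ha1_a: str, ha1_b: str, width: int) -> list[list[float]]:
    """Fold per-site hydropathy differences of two aligned HA1 sequences into a
    row-major 2D grid (hypothetical layout). Unknown residues or gaps are given
    a hydropathy of 0.0."""
    if len(ha1_a) != len(ha1_b):
        raise ValueError("sequences must be aligned to equal length")
    diffs = [abs(KD.get(a, 0.0) - KD.get(b, 0.0)) for a, b in zip(ha1_a, ha1_b)]
    diffs += [0.0] * (-len(diffs) % width)  # zero-pad the last row
    return [diffs[i:i + width] for i in range(0, len(diffs), width)]


def conv2d_valid(grid: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """'Valid' 2D cross-correlation, the operation a CNN convolutional layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(grid) - kh + 1):
        row = []
        for c in range(len(grid[0]) - kw + 1):
            row.append(sum(kernel[i][j] * grid[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out
```

For example, `encode_pair("ACDEFG", "ACDKFG", width=3)` places the single E/K difference (about 0.4) in the second row, and a 2x2 kernel of ones then sums it with its grid neighbours. A trained model would use several attributes as input channels and learn its kernels rather than fixing them.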


Subject(s)
Influenza A Virus, H1N1 Subtype , Influenza A virus , Vaccines , Influenza A virus/genetics , Influenza A Virus, H3N2 Subtype/genetics , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Seasons , Retrospective Studies , Antigens, Viral/genetics , Neural Networks, Computer , Amino Acids
2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38048079

ABSTRACT

Identification of viruses and further assembly of viral genomes from next-generation sequencing (NGS) data are essential steps in virome studies. This study presented VIGA (available at https://github.com/viralInformatics/VIGA), a one-stop tool for eukaryotic virus identification and genome assembly from NGS data. It is composed of four modules, namely identification, taxonomic annotation, assembly and novel virus discovery, and integrates several third-party tools such as BLAST, Trinity, MetaCompass and RagTag. Evaluation on multiple simulated and real virome datasets showed that VIGA assembled more complete virus genomes than its competitors on both metatranscriptomic and metagenomic data and performed well in assembling virus genomes at the strain level. Finally, VIGA was used to investigate the virome in metatranscriptomic data from the Human Microbiome Project and revealed differences in virome composition and positive rate among prediabetes, Crohn's disease and ulcerative colitis. Overall, VIGA should greatly help the identification and characterization of viromes, especially of known viruses, in future studies.


Subject(s)
Colitis, Ulcerative , Crohn Disease , Humans , High-Throughput Nucleotide Sequencing , Genome, Viral , Metagenome
3.
PLoS Pathog ; 19(6): e1011443, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37327222

ABSTRACT

The host always employs various ways to defend against viral infection and spread. However, viruses have evolved their own effective strategies, such as inhibition of RNA translation of the antiviral effectors, to destroy the host's defense barriers. Protein synthesis, commonly controlled by the α-subunit of eukaryotic translation initiation factor 2 (eIF2α), is a basic cellular biological process among all species. In response to viral infection, in addition to inducing the transcription of antiviral cytokines by innate immunity, infected cells also inhibit the RNA translation of antiviral factors by activating the protein kinase R (PKR)-eIF2α signaling pathway. Regulation of innate immunity has been well studied; however, regulation of the PKR-eIF2α signaling pathway remains unclear. In this study, we found that the E3 ligase TRIM21 negatively regulates the PKR-eIF2α signaling pathway. Mechanistically, TRIM21 interacts with the PKR phosphatase PP1α and promotes K6-linked polyubiquitination of PP1α. Ubiquitinated PP1α shows enhanced interaction with PKR, causing PKR dephosphorylation and subsequent release of translational inhibition. Furthermore, TRIM21 can constitutively restrict viral infection by reversing PKR-dependent translational inhibition of various previously known and unknown antiviral factors. Our study highlights a previously undiscovered role of TRIM21 in regulating translation, which provides new insights into the host antiviral response and novel targets for the treatment of translation-associated diseases in the clinic.


Subject(s)
RNA , Virus Diseases , Humans , RNA/metabolism , eIF-2 Kinase/metabolism , Protein Processing, Post-Translational , Phosphorylation , Antiviral Agents , Eukaryotic Initiation Factor-2/genetics , Eukaryotic Initiation Factor-2/metabolism , Virus Replication
4.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36377755

ABSTRACT

Virus-encoded small RNAs (vsRNAs) have been reported to play an important role in viral infection. Unfortunately, an effective method for vsRNA identification is still lacking. Herein, we present vsRNAfinder, a de novo method for identifying high-confidence vsRNAs from small RNA-Seq (sRNA-Seq) data based on peak calling and the Poisson distribution; it is publicly available at https://github.com/ZenaCai/vsRNAfinder. vsRNAfinder outperformed two widely used methods, namely miRDeep2 and ShortStack, in identifying viral miRNAs, with significantly improved sensitivity. It can also be used to identify sRNAs in animals and plants, with performance similar to that of miRDeep2 and ShortStack. vsRNAfinder should greatly facilitate the effective identification of vsRNAs from sRNA-Seq data.
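The abstract names the two ingredients of vsRNAfinder, peak calling and the Poisson distribution, without giving the exact statistics. A minimal Python sketch of how such a combination can work, assuming per-position read depth as input and the genome-wide mean depth as the Poisson background rate (both assumptions, not taken from the paper): contiguous runs above a depth cutoff are called as candidate peaks and kept only if their maximum depth is improbable under the background. The names `poisson_sf` and `call_peaks` and the thresholds are illustrative.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(coverage: list[int], min_depth: int = 5, alpha: float = 1e-3):
    """Return (start, end, max_depth, p_value) for runs with depth >= min_depth
    whose maximum depth is unlikely under a Poisson background (hypothetical)."""
    lam = sum(coverage) / len(coverage)  # assumed background: mean depth
    peaks, start = [], None
    for i, depth in enumerate(coverage + [0]):  # sentinel closes a trailing run
        if depth >= min_depth and start is None:
            start = i
        elif depth < min_depth and start is not None:
            peak = max(coverage[start:i])
            p = poisson_sf(peak, lam)
            if p < alpha:
                peaks.append((start, i, peak, p))
            start = None
    return peaks
```

A real tool would also handle strand, multi-mapping reads and multiple-testing correction; the sketch only shows how a Poisson tail probability can filter peaks.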


Subject(s)
MicroRNAs , Animals , RNA-Seq , MicroRNAs/genetics , Sequence Analysis, RNA/methods
5.
Int Immunol ; 35(4): 181-196, 2023 04 04.
Article in English | MEDLINE | ID: mdl-36409527

ABSTRACT

Innate immunity is the first line of host defense against pathogenic invasion in metazoans. The transcription factor basic leucine zipper transcriptional factor ATF-like 3 (BATF3) plays a crucial role in the development of conventional dendritic cells and in the program of CD8+ T cell survival and memory, but the role of BATF3 in innate immune responses remains unclear. Here, we show that the evolutionarily conserved basic-region leucine zipper (bZIP) transcription factor BATF3/ZIP-10 suppresses the innate immune response by repressing the p38/PMK-1 mitogen-activated protein kinase (MAPK) pathway in vitro and in vivo. Worm mutants lacking ZIP-10, the Caenorhabditis elegans homolog of BATF3, exhibited enhanced resistance to Pseudomonas aeruginosa PA14 infection, which was completely rescued by transgenic expression of either endogenous zip-10 or mouse or human Batf3 cDNA driven by the worm zip-10 promoter. ZIP-10 expression was inhibited by the microRNA miR-60, which was downregulated upon PA14 infection. Moreover, the level of phosphorylated, but not total, PMK-1/p38 was attenuated by ZIP-10 and stimulated by miR-60. Human HEK293 cells with Batf3 overexpression or RNA-interference knockdown exhibited reduced or increased cell viability, respectively, upon PA14 infection. Overexpression of either worm ZIP-10 or human BATF3 abolished the activation of p38 and inhibited the expression of antimicrobial peptide and cytokine genes in HEK293 cells. Our findings indicate that the transcriptional program of the evolutionarily conserved bZIP transcription factor BATF3/ZIP-10 suppresses innate immunity by attenuating p38 MAPK signaling activity, which expands our understanding of the pathological mechanisms underlying relevant infectious diseases.


Subject(s)
Caenorhabditis elegans Proteins , MicroRNAs , Pseudomonas Infections , Animals , Humans , Mice , Basic-Leucine Zipper Transcription Factors/genetics , Basic-Leucine Zipper Transcription Factors/metabolism , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , HEK293 Cells , Caenorhabditis elegans/genetics , Caenorhabditis elegans/metabolism , Immunity, Innate , Transcription Factors/metabolism , p38 Mitogen-Activated Protein Kinases/metabolism , MicroRNAs/genetics , Mitogen-Activated Protein Kinases/genetics , Mitogen-Activated Protein Kinases/metabolism
6.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33333556

ABSTRACT

African swine fever virus (ASFV) poses serious threats to the pig industry. The multigene family (MGF) proteins are extensively distributed in ASFVs and are generally classified into five families: MGF-100, MGF-110, MGF-300, MGF-360 and MGF-505. Most MGF proteins, however, have not been well characterized or classified within each family. To bridge this gap, this study first classified MGF proteins into 31 groups based on protein sequence homology and network clustering. A web server for classifying MGF proteins was established and is freely available at http://www.computationalbiology.cn/MGF/home.html. Results showed that MGF groups of the same family were most similar to each other and shared conserved sequence motifs; the genetic diversity of MGF groups varied widely, mainly due to indels. In addition, the MGF proteins were predicted to have large structural and functional diversity, and MGF proteins of the same family tended to have similar structures, locations and functions. Reconstruction of the ancestral states of MGF groups along the ASFV phylogeny showed that most MGF groups experienced either copy number variations or gain-or-loss changes, and most of these changes occurred within strains of the same genotype. Copy number decreases and losses of MGF groups far outnumbered copy number increases and gains, respectively, suggesting that ASFV tends to lose MGF proteins during evolution. Overall, this work provides a detailed classification of MGF proteins and should facilitate further research on them.
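Grouping proteins "based on protein sequence homology and network clustering" can be pictured as building a similarity network and reading off its clusters. The Python sketch below is a simplified, hypothetical version: it uses `difflib.SequenceMatcher.ratio()` from the standard library as a stand-in for alignment-based identity, links pairs above an assumed threshold, and takes connected components (via union-find) as groups. The study's actual similarity measure and clustering algorithm are not stated in the abstract and may differ; `cluster_by_homology` and the 0.5 threshold are illustrative.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in for alignment identity: difflib's matching ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_by_homology(seqs: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Link proteins whose similarity is >= threshold and return the connected
    components of the resulting network as groups (hypothetical method)."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two components

    groups: dict[str, set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)
```

Connected components are the simplest network clustering; graph-based methods such as Markov clustering split weakly connected components further, which matters when a few cross-family links would otherwise merge distinct groups.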


Subject(s)
African Swine Fever Virus/genetics , DNA Copy Number Variations , Evolution, Molecular , Multigene Family , Viral Proteins/classification , Viral Proteins/genetics , Animals , Swine
7.
Brief Bioinform ; 22(2): 2182-2190, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32349124

ABSTRACT

Circular RNAs (circRNAs) are covalently closed long noncoding RNAs that are critical in diverse cellular activities and multiple human diseases. Several cancer-related viral circRNAs have been identified in double-stranded DNA (dsDNA) viruses, yet no systematic study of viral circRNAs has been reported. Herein, we performed a systematic survey of 11 924 circRNAs from 23 viral species by computationally predicting viral circRNAs from viral-infection-related RNA sequencing data. Besides the dsDNA viruses, our study also revealed many circRNAs in single-stranded RNA viruses and retro-transcribing viruses, such as Zika virus, Influenza A virus, Zaire ebolavirus and Human immunodeficiency virus 1. Most viral circRNAs had reverse complementary sequences or repeated sequences in the sequences flanking the back-splice sites. Most viral circRNAs were expressed only in a specific cell line or tissue of a specific species. Functional enrichment analysis indicated that viral circRNAs from dsDNA viruses were involved in cancer-associated KEGG pathways. All viral circRNAs presented in the current study were stored and organized in VirusCircBase, the first virus circRNA database, which is freely available at http://www.computationalbiology.cn/ViruscircBase/home.html. VirusCircBase provides a fundamental atlas for further exploration and investigation of viral circRNAs in the context of public health.
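Computational circRNA prediction from RNA-seq data generally relies on back-splice junction (BSJ) reads: chimeric reads whose 3' part maps upstream of their 5' part, which collinear splicing cannot produce. The abstract does not say which prediction tools were used, so the Python sketch below only illustrates this general signature under stated assumptions: split-read coordinates are already available as 0-based, half-open intervals in transcript orientation on a linear reference, and a junction needs at least two supporting reads. `back_splice_junctions` and its tuple layout are hypothetical.

```python
from collections import Counter


def back_splice_junctions(split_reads, min_reads: int = 2) -> dict:
    """Each read is (ref, seg5_start, seg5_end, seg3_start, seg3_end): the genomic
    intervals of the read's 5' and 3' segments, in transcript orientation.
    Collinear splicing puts the 3' segment downstream; a back-splice puts it
    upstream (seg3_start < seg5_start). Returns {(ref, start, end): n_reads}."""
    counts = Counter()
    for ref, s5, e5, s3, e3 in split_reads:
        if s3 < s5:
            # The circle spans from the acceptor (s3) to the donor end (e5).
            counts[(ref, s3, e5)] += 1
    return {junction: n for junction, n in counts.items() if n >= min_reads}
```

For instance, two reads with segments (100-150, then 20-70) and (110-150, then 20-60) both support a circle spanning positions 20 to 150, while a read with segments (20-70, then 100-150) is ordinary collinear splicing and is ignored.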


Subject(s)
Database Management Systems , RNA, Circular/genetics , RNA, Viral/genetics , Viruses/genetics , Humans
8.
Brief Bioinform ; 22(2): 1297-1308, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33757279

ABSTRACT

The life-threatening coronaviruses MERS-CoV, SARS-CoV-1 and SARS-CoV-2 (SARS-CoV-1/2) have caused, and will continue to cause, enormous morbidity and mortality in humans. Virus-encoded noncoding RNAs are poorly understood in coronaviruses. Data mining of viral-infection-related RNA-sequencing data identified 28 754, 720 and 3437 circRNAs encoded by MERS-CoV, SARS-CoV-1 and SARS-CoV-2, respectively. MERS-CoV exhibits a much more prominent ability to encode circRNAs in all genomic regions than SARS-CoV-1/2. Viral circRNAs typically exhibit low expression levels. Moreover, the majority of viral circRNAs are expressed only in the late stage of viral infection. Analysis of the competitive interactions of viral circRNAs, human miRNAs and mRNAs in MERS-CoV infections reveals that viral circRNAs up-regulate genes related to mRNA splicing and processing in the early stage of viral infection, and regulate genes involved in diverse functions, including cancer, metabolism, autophagy and viral infection, in the late stage. Similar analysis of SARS-CoV-2 infections reveals that its viral circRNAs down-regulate genes associated with cholesterol, alcohol and fatty acid metabolic processes and up-regulate genes associated with cellular responses to oxidative stress in the late stage of viral infection. A few genes regulated by viral circRNAs from both MERS-CoV and SARS-CoV-2 were enriched in several biological processes, such as response to reactive oxygen and centrosome localization. This study provides the first glimpse into viral circRNAs in three deadly coronaviruses and should serve as a valuable resource for further studies of circRNAs in coronaviruses.


Subject(s)
Middle East Respiratory Syndrome Coronavirus/genetics , RNA, Circular/genetics , SARS-CoV-2/genetics , Severe acute respiratory syndrome-related coronavirus/genetics , Humans
9.
Brief Bioinform ; 22(2): 1267-1278, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33126244

ABSTRACT

Accessory proteins play important roles in the interaction between coronaviruses and their hosts. Accordingly, a comprehensive study of the compositional diversity and evolutionary patterns of accessory proteins is critical to understanding the host adaptation and epidemic variation of coronaviruses. Here, we developed a standardized genome annotation tool for coronaviruses (CoroAnnoter) by combining open reading frame prediction, transcription regulatory sequence recognition and homologous alignment. Using CoroAnnoter, we annotated 39 representative coronavirus strains to form a compositional profile of all accessory proteins. The number of accessory proteins varied widely among coronaviruses, from 1 to 10, with SARS-CoV-2 and SARS-CoV having the most (9 and 10, respectively). The variation between SARS-CoV and SARS-CoV-2 accessory proteins could be traced back to related coronaviruses in other hosts. The genomic distribution of accessory proteins showed significant intra-genus conservation and inter-genus diversity and could be grouped into 1, 4, 2 and 1 types for alpha-, beta-, gamma- and delta-coronaviruses, respectively. Evolutionary analysis suggested that accessory proteins located upstream of the E and M proteins (E-M) are more conserved, whereas those located downstream of these proteins are more diverse. Furthermore, comparison of the virus-host interaction networks of SARS-CoV-2 and SARS-CoV accessory proteins showed that they share multiple antiviral signaling pathways, as well as those involved in the apoptotic process, viral life cycle and response to oxidative stress. In summary, our study provides a tool for coronavirus genome annotation and builds a comprehensive profile of coronavirus accessory proteins covering their composition, classification, evolutionary pattern and host interaction.
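CoroAnnoter is described as combining open reading frame (ORF) prediction, transcription regulatory sequence (TRS) recognition and homologous alignment. The Python sketch below illustrates only the first two steps in a deliberately simplified, hypothetical form: forward-strand ATG-to-stop ORFs above a minimum length, and a check for a TRS core motif within a fixed window upstream of each ORF. The core `ACGAAC` (commonly reported for SARS-CoV-related viruses), the 100-nt window and the function names are assumptions; real coronavirus annotation also handles ribosomal frameshifting, alternative starts and genus-specific TRS cores.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(genome: str, min_aa: int = 30) -> list[tuple[int, int]]:
    """Forward-strand ORFs from ATG to the first in-frame stop.
    Returns 0-based (start, end) pairs, end exclusive and including the stop."""
    genome = genome.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(genome):
            if genome[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(genome) and genome[j:j + 3] not in STOPS:
                j += 3
            # (j - i) // 3 = number of codons before the stop, i.e. protein length.
            if j + 3 <= len(genome) and (j - i) // 3 >= min_aa:
                orfs.append((i, j + 3))
            i = j + 3  # nested ATGs in this frame share the stop and are shorter
    return sorted(orfs)


def has_trs(genome: str, orf_start: int, core: str = "ACGAAC", window: int = 100) -> bool:
    """True if the (assumed) TRS core motif occurs within `window` nt upstream."""
    upstream = genome[max(0, orf_start - window):orf_start].upper()
    return core in upstream
```

Requiring a TRS upstream is a simple way to favour ORFs that can be expressed from their own subgenomic mRNA, which is how accessory genes are typically transcribed in coronaviruses.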


Subject(s)
Biological Evolution , COVID-19/virology , SARS-CoV-2/metabolism , Viral Proteins/genetics , Viral Proteins/metabolism , Genes, Viral , Humans , Molecular Sequence Annotation , Open Reading Frames , Protein Interaction Maps , SARS-CoV-2/genetics
10.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33313676

ABSTRACT

The genus Culicoides includes biting midges, some of which are vectors for viruses that cause diseases in humans and animals. Knowledge of the roles of Culicoides in viral ecology is inadequate. We collected ~300 000 samples of Culicoides and mosquitoes in 15 representative regions within Yunnan, China. Using mosquitoes as reference vectors, we designed a comparative virome strategy to study the viral composition, diversity, hosts and spatiotemporal distribution of Culicoides. A map of viromes in Culicoides and mosquitoes in Yunnan Province, China, was constructed. At the same locations, Culicoides and mosquitoes usually share similar viral diversity. At least 10 important pathogenic viruses were detected in Culicoides. Many novel viruses were discovered, including 21 segmented viruses of Flaviviridae, 180 viruses of Monjiviricetes and 130 viruses of Bunyavirales. The findings demonstrate that Culicoides are an important part of viral ecology and should be studied and monitored for potentially emerging viruses.


Subject(s)
Ceratopogonidae/virology , Culicidae/virology , Positive-Strand RNA Viruses/classification , Virome , Animals
11.
Bioinformatics ; 38(11): 3087-3093, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35435220

ABSTRACT

MOTIVATION: Viruses continue to threaten human health, yet the complete range of viral species carried by humans and their infection characteristics have not been fully revealed. RESULTS: This study curated an atlas of human viruses from public databases and the literature and built the Human Virus Database (HVD). The HVD contains 1131 virus species from 54 viral families, more than twice the number of human-infecting virus species reported in previous studies. These viruses were identified in human samples including 68 human tissues, excreta and body fluids. Viral diversity in humans was age-dependent, peaking in infants and reaching its lowest level in teenagers. The tissue tropism of viruses was found to be associated with several factors, including the viral group (DNA, RNA or reverse-transcribing viruses), the presence or absence of an envelope, viral genome length and GC content, viral receptors and virus-interacting proteins. Finally, the tissue tropism of DNA viruses was predicted using a random-forest algorithm with moderate performance. Overall, the study not only provides a valuable resource for further studies of human viruses but also deepens our understanding of the diversity and tissue tropism of human viruses. AVAILABILITY AND IMPLEMENTATION: The HVD is available at http://computationalbiology.cn/humanVirusBase/#/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Viral Tropism , Viruses , Adolescent , Humans , Genome, Viral , Viral Proteins , Viruses/genetics
12.
J Med Virol ; 95(3): e28617, 2023 03.
Article in English | MEDLINE | ID: mdl-36840404

ABSTRACT

Virus-encoded small RNAs (vsRNAs) have been reported to play an important role in viral infections. Unfortunately, a systematic characterization of and resource for vsRNAs are still lacking. Herein, we identified a total of 19 734 high-confidence vsRNAs, including 2746 microRNAs (miRNAs), in 64 viral species from more than 800 samples of public small RNA-Seq data. The number of vsRNAs identified per virus varied from 1 to 2489, with a median of 170. The length distribution of vsRNAs peaked at 21 and 22 nt. Plant viruses were found to express a larger number and higher levels of vsRNAs than animal viruses. In addition, the number of vsRNAs identified increased as the viral infection persisted. Interestingly, vsRNAs showed strong expression specificity: little overlap was observed among vsRNAs identified in different strains of a virus, or in different hosts, cells or tissues infected by the same virus. Little conservation was observed among vsRNAs of different viruses. The viral miRNAs were found to interact with host genes involved in multiple biological processes related to organization, development, action potential, polarity establishment, methylation, immune response, gene regulation, localization and so on. To facilitate the use of vsRNAs, a database named vsRNAdb was built to organize and store vsRNAs; it is available at http://www.computationalbiology.cn/vsRNAdb/#/vsRNAdb/#/. Overall, the study deepens our understanding of the diversity and complexity of vsRNAs and provides a rich resource for further studies of vsRNAs.


Subject(s)
MicroRNAs , RNA, Viral , Animals , RNA, Viral/metabolism , RNA-Seq , MicroRNAs/genetics , MicroRNAs/metabolism , Methylation
13.
J Med Virol ; 95(1): e28111, 2023 01.
Article in English | MEDLINE | ID: mdl-36042689

ABSTRACT

Parkinson's disease (PD) is a neurodegenerative disease that places a heavy burden on society. Previous studies have suggested associations between PD and multiple viruses. However, a virome study of PD is still lacking. This study systematically identified viruses from public RNA-sequencing data of more than 700 samples from both PD patients and a control group (mostly healthy people). Only nine viruses, such as human betaherpesvirus 5 and Merkel cell polyomavirus, were detected in several human brain tissues of the central nervous system, the appendix and blood of PD patients, and all of these viruses were also detected in the control group. Most viruses were observed at low abundance in no more than three tissues. No statistically significant differences in virus abundance were observed between PD patients and the control group for any virus. The positive rates of most viruses in PD patients were higher than or similar to those in the control group, although they were below 5% for most viruses. Overall, this is the first study to systematically investigate the virome in PD patients, and it provides new insights into the association between viruses and PD.


Subject(s)
Merkel cell polyomavirus , Neurodegenerative Diseases , Parkinson Disease , Viruses , Humans , Virome , Viruses/genetics
14.
J Med Virol ; 95(7): e28931, 2023 07.
Article in English | MEDLINE | ID: mdl-37448226

ABSTRACT

Monitoring variation in the viral genome is extremely important for understanding the evolution and spread of SARS-CoV-2. Seven early SARS-CoV-2 isolates from China were cultured in vitro and analyzed for viral infectivity through a viral growth assay, a 50% tissue culture infectious dose (TCID50) assay, spike protein quantification and next-generation sequencing analysis; the resulting spike protein mutations were used to generate corresponding pseudoviruses for analysis of immune escape from vaccination and postinfection immunity. The results revealed that SARS-CoV-2 cultured in vitro had a much higher mutation frequency (up to ~20 times) than that in infected patients, suggesting that SARS-CoV-2 diversifies under favorable conditions. Monitoring viral mutations is not only helpful for better understanding virus evolution and virulence changes but also key to preventing virus transmission and disease progression. Using the D614G strain as a reference, a SARS-CoV-2 pseudovirus strain carrying a high-mutation-rate site on the spike protein was constructed. We found that some novel spike mutations arising during in vitro culture, such as E868Q, conferred further immune escape ability.


Subject(s)
COVID-19 , Humans , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , Biological Assay , Mutation , Immunity
15.
Vet Res ; 53(1): 101, 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36461107

ABSTRACT

African swine fever virus (ASFV) is a large DNA virus that infects domestic pigs with high morbidity and mortality rates. Repeat sequences, which are DNA sequence elements repeated more than twice in the genome, play an important role in the ASFV genome. The majority of repeat sequences, however, have not been systematically identified and characterized. In this study, three types of repeat sequences, namely microsatellites, minisatellites and short interspersed nuclear elements (SINEs), were identified in the ASFV genome, and their distribution, structure, function and evolutionary history were investigated. Most repeat sequences were observed in noncoding regions and at the 5' end of the genome. Noncoding repeat sequences tended to form enhancers, whereas coding repeat sequences had a lower proportion of alpha-helix and beta-sheet and a higher proportion of loop structure and surface amino acids than nonrepeat sequences. In addition, the repeat sequences tended to encode penetrating and antimicrobial peptides. Further analysis of the evolution of repeat sequences revealed that the pan-repeat sequences presented an open state, showing the diversity of repeat sequences. Finally, CpG islands were negatively correlated with repeat sequence occurrence, suggesting that they may affect the generation of repeat sequences. Overall, this study emphasizes the importance of repeat sequences in ASFVs, and these results can aid in understanding the function and evolution of the virus.
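The abstract relates repeat occurrence to CpG islands but does not say how islands were called. A common starting point is the Gardiner-Garden and Frommer criteria: a window of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where obs/exp = CpG count x length / (C count x G count). The Python sketch below applies those criteria in sliding windows; it is an assumption about the general approach, not the method used in the study, and `cpg_island_windows` is a hypothetical name.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG_count * length / (C_count * G_count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives every CpG dinucleotide.
    return s.count("CG") * len(s) / (c * g)


def cpg_island_windows(seq: str, window: int = 200, step: int = 1,
                       min_gc: float = 0.5, min_ratio: float = 0.6) -> list[int]:
    """Start positions of windows meeting Gardiner-Garden & Frommer-style criteria."""
    s = seq.upper()
    hits = []
    for start in range(0, len(s) - window + 1, step):
        w = s[start:start + window]
        gc = (w.count("G") + w.count("C")) / window
        if gc > min_gc and cpg_obs_exp(w) > min_ratio:
            hits.append(start)
    return hits
```

Overlapping hit windows would normally be merged into island intervals before being compared with repeat positions, for example by counting repeats inside versus outside islands.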


Subject(s)
African Swine Fever Virus , Animals , Swine , African Swine Fever Virus/genetics , Sus scrofa , Amino Acids , Antimicrobial Peptides , Minisatellite Repeats
16.
BMC Biol ; 19(1): 5, 2021 01 14.
Article in English | MEDLINE | ID: mdl-33441133

ABSTRACT

BACKGROUND: Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools for taxonomic assignment and for determining the host range and biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the virus relies on the host for survival. Currently, the viral host is determined either by culturing the virus, which is low-throughput, time-consuming and expensive, or by computational prediction, which needs improvement in both accuracy and usability. Here we develop a Gaussian model that predicts hosts for prokaryotic viruses with better performance than previous computational methods. RESULTS: We present here Prokaryotic virus Host Predictor (PHP), a software tool that uses a Gaussian model to predict hosts for prokaryotic viruses, with the differences in k-mer frequencies between viral and host genomic sequences as features. PHP gave a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and a host prediction accuracy of 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. The prediction accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28-34%, genus level). PHP also substantially outperformed these two alignment-free methods (24-38% vs 18-20%, genus level) when predicting hosts for prokaryotic viruses that cannot be predicted by the BLAST-based or CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
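PHP scores virus-host pairs with a Gaussian model over differences in k-mer frequencies. The abstract does not give the model's details, so the Python sketch below is one simple reading of that idea, not PHP's implementation: it fits a single diagonal Gaussian to the k-mer difference vectors of known virus-host pairs, then ranks candidate hosts for a new virus by the log-likelihood of their difference vectors. The use of 2-mers, one pooled model, the variance floor and all function names are assumptions.

```python
import math
from itertools import product


def kmer_freqs(seq: str, k: int = 2) -> list[float]:
    """Relative frequencies of all 4**k DNA k-mers (ACGT only), lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    s = seq.upper()
    for i in range(len(s) - k + 1):
        j = index.get(s[i:i + k])  # skips k-mers containing N or other symbols
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def fit_gaussian(diffs: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-dimension mean and variance (diagonal covariance, floored at 1e-6)."""
    n, d = len(diffs), len(diffs[0])
    mean = [sum(v[i] for v in diffs) / n for i in range(d)]
    var = [max(sum((v[i] - mean[i]) ** 2 for v in diffs) / n, 1e-6) for i in range(d)]
    return mean, var


def log_likelihood(x: list[float], mean: list[float], var: list[float]) -> float:
    """Log-density of x under the diagonal Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def rank_hosts(virus: str, hosts: dict[str, str], model, k: int = 2) -> list[str]:
    """Candidate host names, best first, scored by the likelihood of the
    virus-minus-host k-mer frequency difference vector."""
    fv = kmer_freqs(virus, k)
    mean, var = model
    scores = {name: log_likelihood([a - b for a, b in zip(fv, kmer_freqs(g, k))], mean, var)
              for name, g in hosts.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Usage: build `diffs` from known virus-host pairs as `[a - b for a, b in zip(kmer_freqs(v), kmer_freqs(h))]`, call `fit_gaussian(diffs)`, and pass the result to `rank_hosts`. The thresholding and top-30 consensus described in the abstract would sit on top of such scores.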
CONCLUSIONS: The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.


Subject(s)
Archaeal Viruses/physiology , Bacteriophages/physiology , Host-Pathogen Interactions , Metagenomics/methods , Models, Biological , Normal Distribution , Software
17.
BMC Genomics ; 22(1): 76, 2021 Jan 22.
Article in English | MEDLINE | ID: mdl-33482734

ABSTRACT

BACKGROUND: Although interest in human simple sequence repeats (SSRs) is increasing, little is known about the exact distributional features of the numerous SSRs in human Y-DNA at the chromosomal level. Herein, a total of 540 maps were constructed, which clearly display the SSR landscape in every 1-kilobase-pair (Kbp) bin along the sequenced part of the human reference Y-DNA (NC_000024.10), using a differential method we developed to improve on an existing method for revealing SSR distributional characteristics in large genomic sequences. RESULTS: The maps show that SSRs accumulate significantly, forming density peaks in at least 2040 bins of 1 Kbp that involve different coding, noncoding and intergenic regions of the Y-DNA; 10 especially high density peaks were reported to be associated with biological significance, suggesting that the other hundreds of especially high density peaks might also be biologically significant and worth further analysis. In contrast, the maps also show that SSRs are extremely sparse in at least 207 bins of 1 Kbp, including many noncoding and intergenic regions of the Y-DNA, which is inconsistent with the widely accepted view that SSRs are mostly enriched in these regions; these sparse distributions are possibly due to strong regional selection. Additionally, many regions of the Y-DNA harbor SSR clusters with the same or similar motifs. CONCLUSIONS: These 540 maps may provide important position-specific information on SSR distribution along the human reference Y-DNA for better understanding the genome structure of the Y-DNA. This study may contribute to further exploring the biological significance and distribution patterns of the huge number of SSRs in human Y-DNA.
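The maps summarise SSR density in consecutive 1 Kbp bins. The authors' differential method is not described in the abstract, so the Python sketch below only illustrates the basic per-bin counting under simple, assumed definitions: a perfect SSR is a 1-6 nt motif repeated at least three times in tandem, covering at least 12 nt, found with a regular expression whose lazy motif group prefers the shortest repeating unit. Each SSR is assigned to the bin containing its start. `find_ssrs`, `ssr_density` and the thresholds are hypothetical; this greedy scan can miss SSRs that overlap a shorter, non-qualifying repeat.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 3, min_len: int = 12) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for perfect tandem repeats of 1-6 nt motifs."""
    # ([ACGT]{1,6}?) captures the shortest motif; \1{n,} requires n more copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        if m.end() - m.start() >= min_len:
            hits.append((m.start(), m.end(), m.group(1)))
    return hits


def ssr_density(seq: str, bin_size: int = 1000, **kwargs) -> list[int]:
    """Number of SSRs whose start falls in each consecutive bin of bin_size nt."""
    counts = [0] * ((len(seq) + bin_size - 1) // bin_size)
    for start, _, _ in find_ssrs(seq, **kwargs):
        counts[start // bin_size] += 1
    return counts
```

Plotting `ssr_density(y_sequence)` along the chromosome gives one density value per 1 Kbp bin, from which peaks and sparse bins can be read off.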


Subject(s)
Microsatellite Repeats , Polymorphism, Genetic , DNA/genetics , Genome , Genome, Plant , Humans , Microsatellite Repeats/genetics , Sequence Analysis, DNA
18.
Bioinformatics ; 36(10): 2975-2979, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32096819

ABSTRACT

MOTIVATION: Receptors on host cells play a critical role in viral infection. How phages select receptors is still unknown. RESULTS: Here, we manually curated a high-quality database named phageReceptor, including 427 pairs of phage-host receptor interactions, 341 unique viral species or sub-species and 69 bacterial species. Sugars and proteins were the receptors most widely used by phages. The receptor usage of phages in Gram-positive bacteria differed from that in Gram-negative bacteria. Most protein receptors were located on the outer membrane. The phage protein receptors (PPRs) were highly diverse in structure and had little sequence identity and no common protein domain with mammalian virus receptors. Further functional characterization of PPRs in Escherichia coli showed that they had larger node degrees and betweenness values in the protein-protein interaction network, and higher expression levels, than other outer membrane proteins, plasma membrane proteins or intracellular proteins. These findings were consistent with what was observed for mammalian virus receptors in previous studies, suggesting that viral protein receptors tend to have multiple interaction partners and high expression levels. The study deepens our understanding of virus-host interactions. AVAILABILITY AND IMPLEMENTATION: phageReceptor is publicly available from: http://www.computationalbiology.cn/phageReceptor/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Bacteriophages , Receptors, Virus , Animals , Bacteriophages/genetics , Escherichia coli , Membrane Proteins , Viral Proteins
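The node-degree comparison reported for E. coli PPRs can be illustrated with a small pure-Python sketch. The edge list, protein names and grouping below are hypothetical placeholders rather than data from phageReceptor, and only degree is computed; betweenness would require a shortest-path algorithm such as the one behind NetworkX's `betweenness_centrality`.

```python
from collections import defaultdict
from statistics import mean


def node_degrees(edges):
    """Degree of each protein in an undirected PPI network given as (a, b) pairs.

    Self-loops are skipped and duplicate edges are counted once.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def compare_mean_degree(edges, receptors, others):
    """Mean degree of receptor proteins vs. a comparison group (absent proteins count as 0)."""
    degree = node_degrees(edges)
    rec = [degree.get(p, 0) for p in receptors]
    oth = [degree.get(p, 0) for p in others]
    return mean(rec), mean(oth)


# Toy network with hypothetical protein names (R* = receptors, P* = other proteins).
edges = [("R1", "P1"), ("R1", "P2"), ("R1", "P3"), ("R2", "P1"), ("R2", "P2"), ("P3", "P4")]
print(compare_mean_degree(edges, ["R1", "R2"], ["P3", "P4"]))  # prints (2.5, 1.5)
```

A real comparison would also need a statistical test across many proteins; comparing group means on a toy graph only shows the shape of the calculation.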
19.
Bioinformatics ; 36(10): 3251-3253, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32049310

ABSTRACT

MOTIVATION: Newly emerging influenza viruses continue to challenge global public health. To evaluate the potential risk posed by these viruses, it is critical to rapidly determine their phenotypes, including antigenicity, host, virulence and drug resistance. RESULTS: Here, we built FluPhenotype, a one-stop platform for rapidly determining the phenotypes of influenza A viruses. The input to FluPhenotype is the complete or partial genomic/protein sequences of influenza A viruses. The output presents five types of information about the viruses: (i) sequence annotation, including gene and protein names as well as open reading frames, (ii) potential hosts and human-adaptation-associated amino acid markers, (iii) antigenic and genetic relationships with the vaccine strains of different HA subtypes, (iv) mammalian virulence-related amino acid markers and (v) drug resistance-related amino acid markers. FluPhenotype will be a useful bioinformatic tool for surveillance of, and early warning about, newly emerging influenza A viruses. AVAILABILITY AND IMPLEMENTATION: It is publicly available from: http://www.computationalbiology.cn:18888/IVEW. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Influenza A virus , Influenza, Human , Orthomyxoviridae , Amino Acid Sequence , Animals , Hemagglutinin Glycoproteins, Influenza Virus , Humans , Influenza A virus/genetics
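An amino acid marker scan of the kind FluPhenotype reports can be sketched as a lookup of residues at fixed positions. The three markers below (PB2 627K, M2 31N, NA 275Y in N1 numbering) are well-known examples chosen for illustration and are not FluPhenotype's actual rule set; the sketch also assumes each input protein is already numbered consistently with the marker positions, which in practice requires alignment to a reference sequence.

```python
# Illustrative markers only (not FluPhenotype's rule set); positions are 1-based
# and assume each protein is numbered consistently with these positions.
MARKERS = {
    "PB2": {627: ("K", "mammalian adaptation")},
    "M2": {31: ("N", "adamantane resistance")},
    "NA": {275: ("Y", "oseltamivir resistance (N1 numbering)")},
}


def scan_markers(protein, seq):
    """Return (position, residue, label) for each marker residue present in seq."""
    hits = []
    for pos, (residue, label) in sorted(MARKERS.get(protein, {}).items()):
        # Partial sequences may not cover the position; those markers are skipped.
        if pos <= len(seq) and seq[pos - 1].upper() == residue:
            hits.append((pos, residue, label))
    return hits
```

Skipping uncovered positions mirrors the fact that the platform accepts partial sequences: a missing residue is reported as "no marker found", not as evidence against the phenotype.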
20.
Mol Biol Evol ; 36(6): 1172-1186, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30851115

ABSTRACT

Seasonal influenza viruses undergo frequent mutations in their surface hemagglutinin (HA) proteins to escape the host immune response. Among these mutations, those at a few key amino acid sites are associated with significant antigenic cluster transitions. To recognize the cluster-transition determining sites of seasonal influenza A/H3N2 and A/H1N1 viruses systematically and quickly, we developed a computational model named RECDS (recognition of cluster-transition determining sites) to evaluate the contribution of each amino acid site on the HA protein across the whole history of antigenic evolution. In RECDS, we ranked all HA sites by contribution scores derived from the forest of gradient boosting classifiers trained on various sequence- and structure-based features. With the RECDS model, we found that the sites determining influenza antigenicity were mostly located around the receptor-binding domain for both the influenza A/H3N2 and A/H1N1 viruses. Specifically, half of the cluster-transition determining sites of the influenza A/H1N1 virus were located in the vestigial esterase domain and basic path area of the HA, indicating that the driving force of antigenic evolution in the A/H1N1 virus differs from that in the A/H3N2 virus. Beyond that, the footprints of substitutions responsible for antigenic evolution were inferred from the phylogenetic trees for the cluster-transition determining sites. Large-scale monitoring of genetic variation at these cluster-transition determining sites in circulating influenza viruses could reduce current assay workloads in influenza surveillance and in the selection of new influenza vaccine strains.


Subject(s)
Antigens, Viral/genetics , Evolution, Molecular , Hemagglutinins/genetics , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H3N2 Subtype/genetics , Algorithms , Genetic Techniques , Influenza A Virus, H1N1 Subtype/immunology , Influenza A Virus, H3N2 Subtype/immunology , Software
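As a much simpler stand-in for the RECDS idea of ranking HA sites, the sketch below scores each aligned site by how much more often it differs between strains from different antigenic clusters than between strains from the same cluster. It is a univariate association score, not the gradient boosting forest or the structure-based features that RECDS uses; the sequences and cluster labels are toy placeholders.

```python
from itertools import combinations


def site_scores(seqs, clusters):
    """Rank aligned sites (1-based) by cross-cluster minus within-cluster difference rate.

    seqs: equal-length aligned sequences; clusters: antigenic cluster label per sequence.
    Returns (score, site) pairs, highest score first, ties broken by site number.
    """
    length = len(seqs[0])
    cross = [0] * length   # pairs from different clusters that differ at each site
    within = [0] * length  # pairs from the same cluster that differ at each site
    n_cross = n_within = 0
    for i, j in combinations(range(len(seqs)), 2):
        same = clusters[i] == clusters[j]
        if same:
            n_within += 1
        else:
            n_cross += 1
        for s in range(length):
            if seqs[i][s] != seqs[j][s]:
                if same:
                    within[s] += 1
                else:
                    cross[s] += 1
    scores = []
    for s in range(length):
        cross_rate = cross[s] / n_cross if n_cross else 0.0
        within_rate = within[s] / n_within if n_within else 0.0
        scores.append((cross_rate - within_rate, s + 1))
    return sorted(scores, key=lambda t: (-t[0], t[1]))


# Toy alignment: site 1 separates the two hypothetical clusters perfectly.
print(site_scores(["AKG", "AKD", "TKG", "TKD"], ["c1", "c1", "c2", "c2"]))
# prints [(1.0, 1), (0.0, 2), (-0.5, 3)]
```

High-scoring sites play a role loosely analogous to cluster-transition determining sites; a learned model such as RECDS can additionally capture combinations of sites and structural context that this per-site score ignores.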