ABSTRACT
Vaccination stands as the most effective and economical strategy for prevention and control of influenza. The primary target of neutralizing antibodies is the surface antigen hemagglutinin (HA). However, ongoing mutations in the HA sequence result in antigenic drift. The success of a vaccine is contingent on its antigenic congruence with circulating strains. Thus, predicting antigenic variants and deducing antigenic clusters of influenza viruses are pivotal for recommendation of vaccine strains. The antigenicity of influenza A viruses is determined by the interplay of amino acids in the HA1 sequence. In this study, we exploit the ability of convolutional neural networks (CNNs) to extract spatial feature representations in the convolutional layers, which can discern interactions between amino acid sites. We introduce PREDAC-CNN, a model designed to track antigenic evolution of seasonal influenza A viruses. Accessible at http://predac-cnn.cloudna.cn, PREDAC-CNN formulates a spatially oriented representation of the HA1 sequence, optimized for the convolutional framework. It effectively probes interactions among amino acid sites in the HA1 sequence. Also, PREDAC-CNN focuses exclusively on physicochemical attributes crucial for the antigenicity of influenza viruses, thereby eliminating unnecessary amino acid embeddings. Together, PREDAC-CNN is adept at capturing interactions of amino acid sites within the HA1 sequence and examining the collective impact of point mutations on antigenic variation. Through 5-fold cross-validation and retrospective testing, PREDAC-CNN has shown superior performance in predicting antigenic variants compared to its counterparts. Additionally, PREDAC-CNN has been instrumental in identifying predominant antigenic clusters for A/H3N2 (1968-2023) and A/H1N1 (1977-2023) viruses, significantly aiding in vaccine strain recommendation.
Subject(s)
Influenza A Virus H1N1 Subtype , Influenza A Virus , Vaccines , Influenza A Virus/genetics , Influenza A Virus H3N2 Subtype/genetics , Influenza Virus Hemagglutinin Glycoproteins/genetics , Seasons , Retrospective Studies , Viral Antigens/genetics , Computer Neural Networks , Amino Acids
ABSTRACT
Ribozymes are excellent systems in which to study 'sequence - structure - function' relationships in RNA molecules. Understanding these relationships may greatly help structural modeling and design of functional RNA structures and some functional structural modules could be repurposed in molecular design. At present, there is no comprehensive database summarising all the natural ribozyme families. We have therefore created Ribocentre, a database that collects together sequence, structure and mechanistic data on 21 ribozyme families. This includes available information on timelines, sequence families, secondary and tertiary structures, catalytic mechanisms, applications of the ribozymes together with key publications. The database is publicly available at https://www.ribocentre.org.
Subject(s)
Nucleic Acid Databases , Catalytic RNA , Humans , Base Sequence , Nucleic Acid Conformation , Catalytic RNA/chemistry
ABSTRACT
Virus-encoded small RNAs (vsRNAs) have been reported to play an important role in viral infection. However, an effective method for vsRNA identification is still lacking. Here, we present vsRNAfinder, a de novo method for identifying high-confidence vsRNAs from small RNA-Seq (sRNA-Seq) data based on peak calling and the Poisson distribution; it is publicly available at https://github.com/ZenaCai/vsRNAfinder. vsRNAfinder outperformed two widely used methods, miRDeep2 and ShortStack, in identifying viral miRNAs, with significantly improved sensitivity. It can also be used to identify sRNAs in animals and plants, with performance similar to that of miRDeep2 and ShortStack. vsRNAfinder should greatly facilitate the identification of vsRNAs from sRNA-Seq data.
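The abstract names peak calling and the Poisson distribution but gives no further detail. The following is a minimal, generic sketch of how a Poisson test can flag positions with unusually high read counts; it is not vsRNAfinder's actual implementation. The function names, the use of the mean count as the background rate, and the `alpha` threshold are all assumptions made for illustration.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing the exact pmf."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(counts, alpha=0.01):
    """Flag positions whose read count is unlikely under a uniform background.

    Assumption: the background rate is the mean count over the region.
    Returns a list of (position, count, p_value) tuples.
    """
    lam = sum(counts) / len(counts)
    peaks = []
    for pos, count in enumerate(counts):
        p_value = poisson_sf(count, lam)
        if p_value < alpha:
            peaks.append((pos, count, p_value))
    return peaks
```

Calling `call_peaks` on a list of per-position read counts returns only those positions whose counts are improbably high given the region-wide average. A real tool would also need to merge adjacent positions and correct for multiple testing.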
Subject(s)
MicroRNAs , Animals , RNA-Seq , MicroRNAs/genetics , RNA Sequence Analysis/methods
ABSTRACT
The H1N1pdm09 virus has been a persistent threat to public health since the 2009 pandemic. Particularly, since the relaxation of COVID-19 pandemic mitigation measures, the influenza virus and SARS-CoV-2 have been concurrently prevalent worldwide. To determine the antigenic evolution pattern of H1N1pdm09 and develop preventive countermeasures, we collected influenza sequence data and immunological data to establish a new antigenic evolution analysis framework. A machine learning model (XGBoost, accuracy = 0.86, area under the receiver operating characteristic curve = 0.89) was constructed using epitopes, physicochemical properties, receptor binding sites, and glycosylation sites as features to predict the antigenic similarity relationships between influenza strains. An antigenic correlation network was constructed, and the Markov clustering algorithm was used to identify antigenic clusters. Subsequently, the antigenic evolution pattern of H1N1pdm09 was analyzed at the global and regional scales across three continents. We found that H1N1pdm09 evolved into around five antigenic clusters between 2009 and 2023 and that their antigenic evolution trajectories were characterized by cocirculation of multiple clusters, low-level persistence of former dominant clusters, and local heterogeneity of cluster circulations. Furthermore, compared with the seasonal H1N1 virus, the potential cluster-transition determining sites of H1N1pdm09 were restricted to epitopes Sa and Sb. This study demonstrated the effectiveness of machine learning methods for characterizing antigenic evolution of viruses, developed a specific model to rapidly identify H1N1pdm09 antigenic variants, and elucidated their evolutionary patterns. Our findings may provide valuable support for the implementation of effective surveillance strategies and targeted prevention efforts to mitigate the impact of H1N1pdm09.
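The clustering step mentioned above uses the Markov clustering algorithm on an antigenic correlation network. As a rough illustration, here is a small pure-Python sketch of the standard expansion and inflation loop on an adjacency matrix. The inflation value, iteration cap, tolerance, and function names are assumptions, and the abstract does not say how the authors configured or implemented this step.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize."""
    return normalize_columns([[x ** r for x in row] for row in m])


def markov_cluster(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster an undirected graph given as a square 0/1 or weighted matrix."""
    n = len(adjacency)
    # Self-loops are added, as is common practice for this algorithm.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each row with non-negligible entries lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

In the study's setting the nodes would be virus strains and the edges predicted antigenic similarity. A higher inflation value generally yields more, smaller clusters.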
Subject(s)
Viral Antigens , Influenza A Virus H1N1 Subtype , Human Influenza , Influenza A Virus H1N1 Subtype/genetics , Influenza A Virus H1N1 Subtype/immunology , Humans , Human Influenza/epidemiology , Human Influenza/prevention & control , Human Influenza/virology , Human Influenza/immunology , Viral Antigens/genetics , Viral Antigens/immunology , Machine Learning , Molecular Evolution , Epitopes/genetics , Epitopes/immunology , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , COVID-19/immunology , Pandemics/prevention & control , Influenza Virus Hemagglutinin Glycoproteins/genetics , Influenza Virus Hemagglutinin Glycoproteins/immunology , SARS-CoV-2/genetics , SARS-CoV-2/immunology
ABSTRACT
With the outbreak of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), coronaviruses have begun to attract great attention across the world. Of the known human coronaviruses, however, Middle East respiratory syndrome coronavirus (MERS-CoV) is the most lethal. Coronavirus proteins can be divided into three groups: nonstructural proteins, structural proteins, and accessory proteins. While the number of each of these proteins varies greatly among different coronaviruses, accessory proteins are most closely related to the pathogenicity of the virus. We found for the first time that the ORF3 accessory protein of MERS-CoV, which closely resembles the ORF3a proteins of severe acute respiratory syndrome coronavirus and SARS-CoV-2, has the ability to induce apoptosis in cells in a dose-dependent manner. Through bioinformatics analysis and validation, we revealed that ORF3 is an unstable protein and has a shorter half-life in cells compared to that of severe acute respiratory syndrome coronavirus and SARS-CoV-2 ORF3a proteins. After screening, we identified a host E3 ligase, HUWE1, that specifically induces MERS-CoV ORF3 protein ubiquitination and degradation through the ubiquitin-proteasome system. This results in the diminished ability of ORF3 to induce apoptosis, which might partially explain the lower spread of MERS-CoV compared to other coronaviruses. In summary, this study reveals a pathological function of MERS-CoV ORF3 protein and identifies a potential host antiviral protein, HUWE1, with an ability to antagonize MERS-CoV pathogenesis by inducing ORF3 degradation, thus enriching our knowledge of the pathogenesis of MERS-CoV and suggesting new targets and strategies for clinical development of drugs for MERS-CoV treatment.
Subject(s)
Apoptosis , Coronavirus Infections/metabolism , Middle East Respiratory Syndrome Coronavirus/metabolism , Tumor Suppressor Proteins/metabolism , Ubiquitin-Protein Ligases/metabolism , Ubiquitination , Viral Nonstructural Proteins/metabolism , A549 Cells , Cell Line , Computational Biology , Coronavirus Infections/physiopathology , Coronavirus Infections/virology , Epithelial Cells/physiology , Epithelial Cells/virology , HEK293 Cells , Host-Pathogen Interactions , Humans
ABSTRACT
Within the past 2 decades, three highly pathogenic human coronaviruses have emerged, namely, severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The health threats and economic burden posed by these tremendously severe coronaviruses have paved the way for research on their etiology, pathogenesis, and treatment. Compared to SARS-CoV and SARS-CoV-2, MERS-CoV genome encoded fewer accessory proteins, among which the ORF4b protein had anti-immunity ability in both the cytoplasm and nucleus. Our work for the first time revealed that ORF4b protein was unstable in the host cells and could be degraded by the ubiquitin proteasome system. After extensive screenings, it was found that UBR5 (ubiquitin protein ligase E3 component N-recognin 5), a member of the HECT E3 ubiquitin ligases, specifically regulated the ubiquitination and degradation of ORF4b. Similar to ORF4b, UBR5 can also translocate into the nucleus through its nuclear localization signal, enabling it to regulate ORF4b stability in both the cytoplasm and nucleus. Through further experiments, lysine 36 was identified as the ubiquitination site on the ORF4b protein, and this residue was highly conserved in various MERS-CoV strains isolated from different regions. When UBR5 was knocked down, the ability of ORF4b to suppress innate immunity was enhanced and MERS-CoV replication was stronger. As an anti-MERS-CoV host protein, UBR5 targets and degrades ORF4b protein through the ubiquitin proteasome system, thereby attenuating the anti-immunity ability of ORF4b and ultimately inhibiting MERS-CoV immune escape, which is a novel antagonistic mechanism of the host against MERS-CoV infection. IMPORTANCE ORF4b was an accessory protein unique to MERS-CoV and was not present in SARS-CoV and SARS-CoV-2 which can also cause severe respiratory disease. 
Moreover, ORF4b inhibited the production of antiviral cytokines in both the cytoplasm and the nucleus, which was likely to be associated with the high lethality of MERS-CoV. However, whether the host proteins regulate the function of ORF4b is unknown. Our study first determined that UBR5, a host E3 ligase, was a potential host anti-MERS-CoV protein that could reduce the protein level of ORF4b and diminish its anti-immunity ability by inducing ubiquitination and degradation. Based on the discovery of ORF4b-UBR5, a critical molecular target, further increasing the degradation of ORF4b caused by UBR5 could provide a new strategy for the clinical development of drugs for MERS-CoV.
Subject(s)
Coronavirus Infections , Host Microbial Interactions , Middle East Respiratory Syndrome Coronavirus , Proteolysis , Ubiquitin-Protein Ligases , Ubiquitination , Viral Proteins , Coronavirus Infections/immunology , Coronavirus Infections/prevention & control , Coronavirus Infections/virology , Cytokines/immunology , Humans , Innate Immunity , Middle East Respiratory Syndrome Coronavirus/immunology , Middle East Respiratory Syndrome Coronavirus/metabolism , Molecular Targeted Therapy , Proteasome Endopeptidase Complex/metabolism , Severe Acute Respiratory Syndrome-Related Coronavirus , SARS-CoV-2 , Ubiquitin-Protein Ligases/metabolism , Ubiquitins/metabolism , Viral Proteins/chemistry , Viral Proteins/metabolism , Virus Replication
ABSTRACT
The rapid spread and huge impact of the COVID-19 pandemic caused by the emerging SARS-CoV-2 have driven large efforts for sequencing and analyzing the viral genomes. Mutation analyses have revealed that the virus keeps mutating and shows a certain degree of genetic diversity, which could result in the alteration of its infectivity and pathogenicity. Therefore, appropriate delineation of SARS-CoV-2 genetic variants enables us to understand its evolution and transmission patterns. By focusing on the nucleotides that co-substituted, we first identified 42 co-mutation modules that consist of at least two co-substituted nucleotides during the SARS-CoV-2 evolution. Then based on these co-mutation modules, we classified the SARS-CoV-2 population into 43 groups and further identified the phylogenetic relationships among groups based on the number of inconsistent co-mutation modules, which were validated with phylogenetic trees. Intuitively, we tracked tempo-spatial patterns of the 43 groups, of which 11 groups were geographic-specific. Different epidemic periods showed specific co-circulating groups, where the dominant groups existed and had multiple sub-groups of parallel evolution. Our work enables us to capture the evolution and transmission patterns of SARS-CoV-2, which can contribute to guiding the prevention and control of the COVID-19 pandemic. An interactive website for grouping SARS-CoV-2 genomes and visualizing the spatio-temporal distribution of groups is available at https://www.jianglab.tech/cmm-grouping/.
Subject(s)
COVID-19/genetics , Molecular Evolution , Viral Genome/genetics , SARS-CoV-2/genetics , COVID-19/virology , Genetic Variation/genetics , Humans , Mutation/genetics , Pandemics , Phylogeny , SARS-CoV-2/pathogenicity , Whole Genome Sequencing
ABSTRACT
It is of considerable interest to detect somatic mutations in paired tumor and normal sequencing data. A number of callers that are based on statistical or machine learning approaches have been developed to detect somatic small variants. However, they take into consideration only limited information about the reference and potential variant allele in both tumor and normal samples at a candidate somatic site. Also, they differ in how biological and technological noises are addressed. Hence, they are expected to produce divergent outputs. To overcome the drawbacks of existing somatic callers, we develop a deep learning-based tool called DeepSSV, which employs a convolutional neural network (CNN) model to learn increasingly abstract feature representations from the raw data in higher feature layers. DeepSSV creates a spatially oriented representation of read alignments around the candidate somatic sites adapted for the convolutional architecture, which enables it to expand to effectively gather scattered evidence. Moreover, DeepSSV incorporates the mapping information of both reference allele-supporting and variant allele-supporting reads in the tumor and normal samples at a genomic site that are readily available in the pileup format file. Together, the CNN model can process the whole alignment information. Such representational richness allows the model to capture the dependencies in the sequence and identify context-based sequencing artifacts. We fitted the model on ground truth somatic mutations and did benchmarking experiments on simulated and real tumors. The benchmarking results demonstrate that DeepSSV outperforms its state-of-the-art competitors in overall F1 score.
Subject(s)
High-Throughput Nucleotide Sequencing , Mutation , Neoplasms/genetics , Computer Neural Networks , DNA Sequence Analysis , Software , Genomics , Humans , Neoplasms/metabolism
ABSTRACT
Circular RNAs (circRNAs) are covalently closed long noncoding RNAs critical in diverse cellular activities and multiple human diseases. Several cancer-related viral circRNAs have been identified in double-stranded DNA (dsDNA) viruses, yet no systematic study of viral circRNAs has been reported. Herein, we have performed a systematic survey of 11 924 circRNAs from 23 viral species by computational prediction of viral circRNAs from viral-infection-related RNA sequencing data. Besides the dsDNA viruses, our study has also revealed many circRNAs in single-stranded RNA viruses and retro-transcribing viruses, such as the Zika virus, the Influenza A virus, the Zaire ebolavirus, and the Human immunodeficiency virus 1. Most viral circRNAs had reverse complementary sequences or repeated sequences in the flanking sequences of the back-splice sites. Most viral circRNAs were expressed only in a specific cell line or tissue in a specific species. Functional enrichment analysis indicated that the viral circRNAs from dsDNA viruses were involved in KEGG pathways associated with cancer. All viral circRNAs presented in the current study were stored and organized in VirusCircBase, which is freely available at http://www.computationalbiology.cn/ViruscircBase/home.html and is the first virus circRNA database. VirusCircBase forms the fundamental atlas for the further exploration and investigation of viral circRNAs in the context of public health.
Subject(s)
Database Management Systems , Circular RNA/genetics , Viral RNA/genetics , Viruses/genetics , Humans
ABSTRACT
African swine fever virus (ASFV) poses serious threats to the pig industry. The multigene family (MGF) proteins are extensively distributed in ASFVs and are generally classified into five families: MGF-100, MGF-110, MGF-300, MGF-360 and MGF-505. Most MGF proteins, however, have not been well characterized and classified within each family. To bridge this gap, this study first classified MGF proteins into 31 groups based on protein sequence homology and network clustering. A web server for classifying MGF proteins was established and is freely available at http://www.computationalbiology.cn/MGF/home.html. Results showed that MGF groups of the same family were most similar to each other and had conserved sequence motifs; the genetic diversity of MGF groups varied widely, mainly due to the occurrence of indels. In addition, the MGF proteins were predicted to have large structural and functional diversity, and MGF proteins of the same MGF family tended to have similar structure, location and function. Reconstruction of the ancestral states of MGF groups along the ASFV phylogeny showed that most MGF groups experienced either copy number variations or gain-or-loss changes, and most of these changes happened within strains of the same genotype. We found that the copy number decrease and the loss of MGF groups were much larger than the copy number increase and the gain of MGF groups, respectively, suggesting that ASFV tended to lose MGF proteins during its evolution. Overall, the work provides a detailed classification of MGF proteins and should facilitate further research on MGF proteins.
Subject(s)
African Swine Fever Virus/genetics , DNA Copy Number Variations , Molecular Evolution , Multigene Family , Viral Proteins/classification , Viral Proteins/genetics , Animals , Swine
ABSTRACT
The life-threatening coronaviruses MERS-CoV, SARS-CoV-1 and SARS-CoV-2 (SARS-CoV-1/2) have caused and will continue to cause enormous morbidity and mortality to humans. Virus-encoded noncoding RNAs are poorly understood in coronaviruses. Data mining of viral-infection-related RNA-sequencing data has resulted in the identification of 28 754, 720 and 3437 circRNAs encoded by MERS-CoV, SARS-CoV-1 and SARS-CoV-2, respectively. MERS-CoV exhibits a much more prominent ability to encode circRNAs in all genomic regions than SARS-CoV-1/2. Viral circRNAs typically exhibit low expression levels. Moreover, the majority of the viral circRNAs are expressed only in the late stage of viral infection. Analysis of the competitive interactions of viral circRNAs, human miRNAs and mRNAs in MERS-CoV infections reveals that viral circRNAs up-regulated genes related to mRNA splicing and processing in the early stage of viral infection, and regulated genes involved in diverse functions including cancer, metabolism, autophagy and viral infection in the late stage of viral infection. Similar analysis in SARS-CoV-2 infections reveals that its viral circRNAs down-regulated genes associated with metabolic processes of cholesterol, alcohol and fatty acid and up-regulated genes associated with cellular responses to oxidative stress in the late stage of viral infection. A few genes regulated by viral circRNAs from both MERS-CoV and SARS-CoV-2 were enriched in several biological processes such as response to reactive oxygen and centrosome localization. This study provides the first glimpse into viral circRNAs in three deadly coronaviruses and should serve as a valuable resource for further studies of circRNAs in coronaviruses.
Subject(s)
Middle East Respiratory Syndrome Coronavirus/genetics , Circular RNA/genetics , SARS-CoV-2/genetics , Severe Acute Respiratory Syndrome-Related Coronavirus/genetics , Humans
ABSTRACT
Accessory proteins play important roles in the interaction between coronaviruses and their hosts. Accordingly, a comprehensive study of the compositional diversity and evolutionary patterns of accessory proteins is critical to understanding the host adaptation and epidemic variation of coronaviruses. Here, we developed a standardized genome annotation tool for coronavirus (CoroAnnoter) by combining open reading frame prediction, transcription regulatory sequence recognition and homologous alignment. Using CoroAnnoter, we annotated 39 representative coronavirus strains to form a compositional profile for all of the accessory proteins. The number of accessory proteins varied widely among coronaviruses, from 1 to 10, with SARS-CoV-2 and SARS-CoV having the most (9 and 10, respectively). The variation between SARS-CoV and SARS-CoV-2 accessory proteins could be traced back to related coronaviruses in other hosts. The genomic distribution of accessory proteins had significant intra-genus conservation and inter-genus diversity and could be grouped into 1, 4, 2 and 1 types for alpha-, beta-, gamma-, and delta-coronaviruses, respectively. Evolutionary analysis suggested that accessory proteins located before the N-terminal of proteins E and M (E-M) are more conserved, while those located after these proteins are more diverse. Furthermore, comparison of virus-host interaction networks of SARS-CoV-2 and SARS-CoV accessory proteins showed that they share multiple antiviral signaling pathways, those involved in the apoptotic process, viral life cycle and response to oxidative stress. In summary, our study provides a tool for coronavirus genome annotation and builds a comprehensive profile for coronavirus accessory proteins covering their composition, classification, evolutionary pattern and host interaction.
Subject(s)
Biological Evolution , COVID-19/virology , SARS-CoV-2/metabolism , Viral Proteins/genetics , Viral Proteins/metabolism , Viral Genes , Humans , Molecular Sequence Annotation , Open Reading Frames , Protein Interaction Maps , SARS-CoV-2/genetics
ABSTRACT
The genus Culicoides includes biting midges, some of which are vectors for viruses that cause diseases in humans and animals. Knowledge of the roles of Culicoides in viral ecology is inadequate. We collected ~300 000 samples of Culicoides and mosquitoes in 15 representative regions within Yunnan, China. Using mosquitoes as reference vectors, we designed a comparative virome strategy to study the viral composition, diversity, hosts and spatiotemporal distribution of Culicoides. A map of viromes in Culicoides and mosquitoes in Yunnan province, China, was constructed. At the same locations, Culicoides and mosquitoes usually share a similar viral diversity. At least 10 important pathogenic viruses were detected from Culicoides. Many novel viruses were discovered, including 21 segmented viruses of Flaviviridae, 180 viruses of Monjiviricetes and 130 viruses of Bunyavirales. The findings demonstrate that Culicoides is an important part of viral ecology and should be studied and monitored for potentially emerging viruses.
Subject(s)
Ceratopogonidae/virology , Culicidae/virology , Positive-Strand RNA Viruses/classification , Virome , Animals
ABSTRACT
MOTIVATION: Viruses continue to threaten human health. Yet, the complete set of viral species carried by humans and their infection characteristics have not been fully revealed. RESULTS: This study curated an atlas of human viruses from public databases and literature, and built the Human Virus Database (HVD). The HVD contains 1131 virus species from 54 viral families, more than twice the number of human-infecting virus species reported in previous studies. These viruses were identified in human samples including 68 human tissues, excreta and body fluids. The viral diversity in humans was age-dependent, with a peak in infants and a trough in teenagers. The tissue tropism of viruses was found to be associated with several factors including the viral group (DNA, RNA or reverse-transcribing viruses), the presence or absence of an envelope, viral genome length and GC content, viral receptors and the virus-interacting proteins. Finally, the tissue tropism of DNA viruses was predicted using a random-forest algorithm with moderate performance. Overall, the study not only provides a valuable resource for further studies of human viruses but also deepens our understanding of the diversity and tissue tropism of human viruses. AVAILABILITY AND IMPLEMENTATION: The HVD is available at http://computationalbiology.cn/humanVirusBase/#/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Viral Tropism , Viruses , Adolescent , Humans , Viral Genome , Viral Proteins , Viruses/genetics
ABSTRACT
Genomic reassortment is an important genetic event in the generation of emerging influenza viruses, which can cause numerous serious flu endemics and epidemics within hosts or even across different hosts. However, there is no dedicated and comprehensive repository for reassortment events among influenza viruses. Here, we present FluReassort, a database for understanding the genomic reassortment events in influenza viruses. Through manual curation of thousands of literature references, the database compiles 204 reassortment events among 56 subtypes of influenza A viruses isolated in 37 different countries. FluReassort provides an interface for the visualization and evolutionary analysis of reassortment events, allowing users to view the events through the phylogenetic analysis with varying parameters. The reassortment networks in FluReassort graphically summarize the correlation and causality between different subtypes of the influenza virus and facilitate the description and interpretation of the reassortment preference among subtypes. We believe FluReassort is a convenient and powerful platform for understanding the evolution of emerging influenza viruses. FluReassort is freely available at https://www.jianglab.tech/FluReassort.
Subject(s)
Genetic Databases , Influenza A Virus , Orthomyxoviridae , Phylogeny , Animals , Molecular Evolution , Viral Genome , Genomics , Humans , Influenza A Virus/genetics , Orthomyxoviridae/genetics
ABSTRACT
The novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread globally to over 200 countries with more than 23 million confirmed cases and at least 800,000 fatalities as of 23 August 2020. Declared a pandemic on March 11 by World Health Organization, the disease caused by SARS-CoV-2 infection, called coronavirus disease 2019 (COVID-19), has become a global public health crisis that challenged all national healthcare systems. This review summarized the current knowledge about virologic and pathogenic characteristics of SARS-CoV-2 with emphasis on potential immunomodulatory mechanism and drug development. With multiple emerging technologies and cross-disciplinary approaches proving to be crucial in our global response against COVID-19, the application of PROteolysis TArgeting Chimeras strategy, CRISPR-Cas9 gene editing technology, and Single-Nucleotide-Specific Programmable Riboregulators technology in developing antiviral drugs and detecting infectious diseases are proposed here. We also discussed the available but still limited epidemiology of COVID-19 as well as the ongoing efforts on vaccine development. In brief, we conducted an in-depth analysis of the pathogenesis of SARS-CoV-2 and reviewed the therapeutic options for COVID-19. We also proposed key research directions in the future that may help uncover more underlying molecular mechanisms governing the pathology of COVID-19.
Subject(s)
COVID-19 , SARS-CoV-2 , Antiviral Agents/therapeutic use , Humans , Pandemics , Public Health , SARS-CoV-2/genetics
ABSTRACT
BACKGROUND: Phenotype information in electronic health records (EHRs) is mainly recorded in unstructured free text, which cannot be directly used for clinical research. EHR-based deep-phenotyping methods can structure phenotype information in EHRs with high fidelity, making it the focus of medical informatics. However, developing a deep-phenotyping method for non-English EHRs (ie, Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data that are suitable for developing deep-phenotyping methods are limited. It is challenging to develop a deep-phenotyping method for Chinese EHRs in such a low-resource scenario. OBJECTIVE: In this study, we aimed to develop a deep-phenotyping method with good generalization ability for Chinese EHRs based on limited fine-grained annotation data. METHODS: The core of the methodology was to identify linguistic patterns of phenotype descriptions in Chinese EHRs with a sequence motif discovery tool and perform deep phenotyping of Chinese EHRs by recognizing linguistic patterns in free text. Specifically, 1000 Chinese EHRs were manually annotated based on a fine-grained information model, PhenoSSU (Semantic Structured Unit of Phenotypes). The annotation data set was randomly divided into a training set (n=700, 70%) and a testing set (n=300, 30%). The process for mining linguistic patterns was divided into three steps. First, free text in the training set was encoded as single-letter sequences (P: phenotype, A: attribute). Second, a biological sequence analysis tool-MEME (Multiple Expectation Maximums for Motif Elicitation)-was used to identify motifs in the single-letter sequences. Finally, the identified motifs were reduced to a series of regular expressions representing linguistic patterns of PhenoSSU instances in Chinese EHRs. 
Based on the discovered linguistic patterns, we developed a deep-phenotyping method for Chinese EHRs, including a deep learning-based method for named entity recognition and a pattern recognition-based method for attribute prediction. RESULTS: In total, 51 sequence motifs with statistical significance were mined from 700 Chinese EHRs in the training set and were combined into six regular expressions. It was found that these six regular expressions could be learned from a mean of 134 (SD 9.7) annotated EHRs in the training set. The deep-phenotyping algorithm for Chinese EHRs could recognize PhenoSSU instances with an overall accuracy of 0.844 on the test set. For the subtask of entity recognition, the algorithm achieved an F1 score of 0.898 with the Bidirectional Encoder Representations from Transformers-bidirectional long short-term memory and conditional random field model; for the subtask of attribute prediction, the algorithm achieved a weighted accuracy of 0.940 with the linguistic pattern-based method. CONCLUSIONS: We developed a simple but effective strategy to perform deep phenotyping of Chinese EHRs with limited fine-grained annotation data. Our work will promote the secondary use of Chinese EHRs and may inspire similar efforts in other non-English-speaking countries.
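The pattern-mining idea described in the METHODS (encode each token as a single letter, then match regular expressions over the letter string) can be illustrated with a short sketch. The abstract does not list the six regular expressions, so the pattern `A*PA*` below is purely hypothetical, as are the function names, the letter `O` for tokens that are neither phenotype nor attribute, and the English example tokens.

```python
import re

# Hypothetical pattern: one phenotype (P) with any adjacent attributes (A).
# The study's six regular expressions are not given in the abstract.
HYPOTHETICAL_PATTERN = re.compile(r"A*PA*")


def encode(labels):
    """Map token labels to a single-letter string (P, A, or O for other)."""
    mapping = {"phenotype": "P", "attribute": "A"}
    return "".join(mapping.get(label, "O") for label in labels)


def group_units(tokens, labels):
    """Return the token spans that match the pattern in the encoded string.

    Each character of the encoded string corresponds to one token, so the
    match offsets can be used directly as token indices.
    """
    encoded = encode(labels)
    return [tokens[m.start():m.end()]
            for m in HYPOTHETICAL_PATTERN.finditer(encoded)]
```

For example, tokens labelled attribute, phenotype, other, attribute, phenotype, attribute encode to `APOAPA`, and the pattern splits them into two phenotype-centred units. The letter encoding is what allows a sequence-motif tool designed for biological sequences to be applied to clinical text.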
Subject(s)
Electronic Health Records , Medical Informatics , Algorithms , Humans , Phenotype , Semantics
ABSTRACT
BACKGROUND: Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools for determining their taxonomic assignment, host range, and biological properties. Here we focus on prokaryotic viruses, which include phages and archaeal viruses. Identifying the host is an essential step in characterizing these viruses, as the virus relies on the host for survival. Currently, the viral host is determined either by culturing the virus, which is low-throughput, time-consuming, and expensive, or by computational prediction, which needs improvement in both accuracy and usability. Here we develop a Gaussian model that predicts hosts for prokaryotic viruses with better performance than previous computational methods. RESULTS: We present Prokaryotic virus Host Predictor (PHP), a software tool that uses a Gaussian model to predict hosts for prokaryotic viruses, with the differences in k-mer frequencies between viral and host genomic sequences as features. PHP gave a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and a host prediction accuracy of 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. The prediction accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28-34%, genus level). PHP also substantially outperformed these two alignment-free methods (24-38% vs 18-20%, genus level) when predicting hosts for prokaryotic viruses whose hosts cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
CONCLUSIONS: The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
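A rough Python sketch of the feature construction and the consensus step mentioned in RESULTS is given below. It is not the PHP implementation: the Gaussian scoring model itself is omitted, and the function names, the default `k=4`, and the example scores are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(sequence, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in a fixed order."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)  # skips k-mers with ambiguous bases
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]

def frequency_difference(virus_seq, host_seq, k=4):
    """Feature vector: per-k-mer frequency difference between virus and host."""
    virus = kmer_frequencies(virus_seq, k)
    host = kmer_frequencies(host_seq, k)
    return [v - h for v, h in zip(virus, host)]

def consensus_host(scored_hosts, top_n=30, min_score=None):
    """Pick the most frequent genus among the top_n highest-scoring hosts.

    scored_hosts is a list of (genus, score) pairs produced by some scoring
    model (not shown). Returns None when there are no candidates or when the
    best score falls below min_score (thresholding).
    """
    ranked = sorted(scored_hosts, key=lambda pair: pair[1], reverse=True)[:top_n]
    if not ranked or (min_score is not None and ranked[0][1] < min_score):
        return None
    return Counter(genus for genus, _ in ranked).most_common(1)[0][0]

# Hypothetical scores for three candidate host genomes.
scores = [("Escherichia", 0.9), ("Salmonella", 0.8), ("Escherichia", 0.7)]
print(consensus_host(scores, top_n=3))
```

The threshold trades coverage for accuracy: with `min_score` set, the sketch declines to predict a host rather than return a low-confidence genus.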
Subject(s)
Archaeal Viruses/physiology , Bacteriophages/physiology , Host-Pathogen Interactions , Metagenomics/methods , Models, Biological , Normal Distribution , Software
ABSTRACT
MOTIVATION: Receptors on host cells play a critical role in viral infection. How phages select receptors is still unknown. RESULTS: Here, we manually curated a high-quality database named phageReceptor, including 427 pairs of phage-host receptor interactions, 341 unique viral species or sub-species and 69 bacterial species. Sugars and proteins were the receptors most widely used by phages. Receptor usage by phages in Gram-positive bacteria differed from that in Gram-negative bacteria. Most protein receptors were located on the outer membrane. The phage protein receptors (PPRs) were highly diverse in structure, and had little sequence identity and no common protein domain with mammalian virus receptors. Further functional characterization of PPRs in Escherichia coli showed that they had higher node degree and betweenness in the protein-protein interaction network, and higher expression levels, than other outer membrane proteins, plasma membrane proteins or other intracellular proteins. These findings were consistent with what was observed for mammalian virus receptors in previous studies, suggesting that viral protein receptors tend to have multiple interaction partners and high expression levels. The study deepens our understanding of virus-host interactions. AVAILABILITY AND IMPLEMENTATION: phageReceptor is publicly available from: http://www.computationalbiology.cn/phageReceptor/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
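For readers unfamiliar with the two network measures compared above, the following Python sketch computes node degree and unnormalized betweenness centrality on a small undirected graph. Betweenness is computed with Brandes' algorithm for unweighted graphs; the edge list and node names are hypothetical and do not come from the E. coli interaction network used in the study.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def node_degrees(graph):
    """Number of interaction partners per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes, unweighted, undirected)."""
    scores = {node: 0.0 for node in graph}
    for source in graph:
        order = []                                   # nodes in BFS visiting order
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}          # shortest-path counts
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    paths[neighbor] += paths[node]
                    predecessors[neighbor].append(node)
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):                 # accumulate from the farthest nodes
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {node: score / 2 for node, score in scores.items()}

# Hypothetical star-shaped network: "hub" interacts with three other proteins.
graph = build_graph([("hub", "p1"), ("hub", "p2"), ("hub", "p3")])
degrees = node_degrees(graph)
centrality = betweenness(graph)
print(degrees["hub"], centrality["hub"])
```

In this toy network the hub has the highest degree and lies on every shortest path between the other proteins, which is the kind of property the abstract reports for PPRs.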
Subject(s)
Bacteriophages , Receptors, Virus , Animals , Bacteriophages/genetics , Escherichia coli , Membrane Proteins , Viral Proteins
ABSTRACT
MOTIVATION: Newly emerging influenza viruses continue to challenge global public health. To evaluate the potential risk posed by these viruses, it is critical to rapidly determine their phenotypes, including antigenicity, host, virulence and drug resistance. RESULTS: Here, we built FluPhenotype, a one-stop platform to rapidly determine the phenotypes of influenza A viruses. The input of FluPhenotype is the complete or partial genomic or protein sequences of influenza A viruses. The output presents five types of information about the viruses: (i) sequence annotation including the gene and protein names as well as the open reading frames, (ii) potential hosts and human-adaptation-associated amino acid markers, (iii) antigenic and genetic relationships with the vaccine strains of different HA subtypes, (iv) mammalian virulence-related amino acid markers and (v) drug resistance-related amino acid markers. FluPhenotype will be a useful bioinformatic tool for surveillance and early warning of newly emerging influenza A viruses. AVAILABILITY AND IMPLEMENTATION: It is publicly available from: http://www.computationalbiology.cn:18888/IVEW. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.