ABSTRACT
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to gain insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets, and the development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.
Subject(s)
COVID-19/prevention & control , Computational Biology , SARS-CoV-2/isolation & purification , Biomedical Research , COVID-19/epidemiology , COVID-19/virology , Genome, Viral , Humans , Pandemics , SARS-CoV-2/genetics
ABSTRACT
Mass spectrometry (MS) is a key technology for the analysis of small molecules. For the identification and structural elucidation of novel molecules, new approaches beyond straightforward spectral comparison are required. In this review, we cover computational methods that help with the identification of small molecules by analyzing fragmentation MS data. We focus on the four main approaches to mine a database of metabolite structures, namely rule-based fragmentation spectrum prediction, combinatorial fragmentation, competitive fragmentation modeling, and molecular fingerprint prediction. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:624-633, 2017.
ABSTRACT
The determination of the molecular formula is one of the earliest and most important steps when investigating the chemical nature of an unknown compound. Common approaches use the isotopic pattern of a compound measured using mass spectrometry. Computational methods to determine the molecular formula from this isotopic pattern require a fixed set of elements. Considering all possible elements severely increases running times and more importantly the chance for false positive identifications as the number of candidate formulas for a given target mass rises significantly if the constituting elements are not prefiltered. This negative effect grows stronger for compounds of higher molecular mass as the effect of a single atom on the overall isotopic pattern grows smaller. On the other hand, hand-selected restrictions on this set of elements may prevent the identification of the correct molecular formula. Thus, it is a crucial step to determine the set of elements most likely comprising the compound prior to the assignment of an elemental formula to an exact mass. In this paper, we present a method to determine the presence of certain elements (sulfur, chlorine, bromine, boron, and selenium) in the compound from its (high mass accuracy) isotopic pattern. We limit ourselves to biomolecules, in the sense of products from nature or synthetic products with potential bioactivity. The classifiers developed here predict the presence of an element with a very high sensitivity and high specificity. We evaluate classifiers on three real-world data sets with 663 isotope patterns in total: 184 isotope patterns containing sulfur, 187 containing chlorine, 14 containing bromine, one containing boron, one containing selenium. In no case do we make a false negative prediction; for chlorine, bromine, boron, and selenium, we make ten false positive predictions in total. 
We also demonstrate the impact of our method on the identification of molecular formulas, in particular on the number of considered candidates and running time. The element prediction will be part of the next SIRIUS release, available from https://bio.informatik.uni-jena.de/software/sirius/ .
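The effect of the element set on the number of candidate formulas can be illustrated with a small enumeration. The following is a minimal sketch, not the method of the paper and not SIRIUS code: the function name `candidate_formulas`, the 5 ppm tolerance, and the brute-force search are assumptions made for illustration, and no chemical plausibility filters or isotope pattern scoring are applied.

```python
import math

# Monoisotopic masses (Da) of the lightest stable isotope of each element.
MONO_MASS = {
    "C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
    "P": 30.97376200, "S": 31.97207117, "Cl": 34.96885268, "Br": 78.9183376,
}


def candidate_formulas(target_mass, elements, ppm=5.0):
    """Enumerate element counts whose monoisotopic mass is within ppm of target_mass.

    Hypothetical helper for illustration only: exhaustive search, no filters.
    """
    tol = target_mass * ppm * 1e-6
    # Heaviest element first, so the lightest one can be solved for directly.
    order = sorted(elements, key=lambda e: MONO_MASS[e], reverse=True)
    results = []

    def extend(idx, remaining, counts):
        mass = MONO_MASS[order[idx]]
        if idx == len(order) - 1:
            # Last (lightest) element: keep only counts inside the mass window.
            lo = max(0, math.ceil((remaining - tol) / mass))
            hi = math.floor((remaining + tol) / mass)
            for n in range(lo, hi + 1):
                results.append(dict(zip(order, counts + [n])))
            return
        n = 0
        while n * mass <= remaining + tol:
            extend(idx + 1, remaining - n * mass, counts + [n])
            n += 1

    extend(0, target_mass, [])
    return results


if __name__ == "__main__":
    mass = 180.06339  # approximate monoisotopic mass of C6H12O6
    small = candidate_formulas(mass, ["C", "H", "N", "O", "P", "S"])
    large = candidate_formulas(mass, ["C", "H", "N", "O", "P", "S", "Cl", "Br"])
    print(len(small), len(large))
```

Every formula found with the smaller element set is also found with the larger one, so the candidate list can only grow when more elements are allowed. This is the reason for deciding which elements are present before formulas are enumerated.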
Subject(s)
Chemical Phenomena , Elements , Isotopes/chemistry , Machine Learning , Algorithms , Datasets as Topic , Mass Spectrometry , Molecular Weight
ABSTRACT
Covering: 2008 to 2014. Mass spectrometry is a key technology for the identification and structural elucidation of natural products. Manual interpretation of the resulting data is tedious and time-consuming, so methods for automated analysis are highly sought after. In this review, we focus on four recently developed methods for the detection and investigation of small molecules, namely MetFrag/MetFusion, ISIS, FingerID, and FT-BLAST. These methods have the potential to significantly advance the field of computational mass spectrometry for the research of natural products. For example, they may help with the dereplication of compounds at an early stage of the drug discovery process; that is, the detection of molecules that are identical or highly similar to known drugs or drug leads. Furthermore, when a potential drug lead has been determined, these tools may help to identify it and elucidate its structure.
Subject(s)
Biological Products/chemistry , Drug Discovery , Mass Spectrometry/methods , Biological Products/analysis , Humans , Molecular Structure , Small Molecule Libraries
ABSTRACT
MOTIVATION: Mass spectrometry allows sensitive, automated and high-throughput analysis of small molecules such as metabolites. One major bottleneck in metabolomics is the identification of 'unknown' small molecules not in any database. Recently, fragmentation tree alignments have been introduced for the automated comparison of the fragmentation patterns of small molecules. Fragmentation pattern similarities are strongly correlated with the chemical similarity of the molecules, and allow us to cluster compounds based solely on their fragmentation patterns. RESULTS: Aligning fragmentation trees is computationally hard. Nevertheless, we present three exact algorithms for the problem: a dynamic programming (DP) algorithm, a sparse variant of the DP, and an Integer Linear Program (ILP). Evaluation of our methods on three different datasets showed that thousands of alignments can be computed in a matter of minutes using DP, even for 'challenging' instances. Running times of the sparse DP were an order of magnitude better than for the classical DP. The ILP was clearly outperformed by both DP approaches. We also found that for both DP algorithms, computing the 1% slowest alignments required as much time as computing the 99% fastest.
Subject(s)
Algorithms , Computational Biology/methods , Mass Spectrometry , Metabolomics/methods , Databases, Factual
ABSTRACT
The 2023 International Virus Bioinformatics Meeting was held in Valencia, Spain, from 24-26 May 2023, attracting approximately 180 participants worldwide. The primary objective of the conference was to establish a dynamic scientific environment conducive to discussion, collaboration, and the generation of novel research ideas. As the first in-person event following the SARS-CoV-2 pandemic, the meeting facilitated highly interactive exchanges among attendees. It served as a pivotal gathering for gaining insights into the current status of virus bioinformatics research and engaging with leading researchers and emerging scientists. The event comprised eight invited talks, 19 contributed talks, and 74 poster presentations across eleven sessions spanning three days. Topics covered included machine learning, bacteriophages, virus discovery, virus classification, virus visualization, viral infection, viromics, molecular epidemiology, phylodynamic analysis, RNA viruses, viral sequence analysis, viral surveillance, and metagenomics. This report provides rewritten abstracts of the presentations, a summary of the key research findings, and highlights shared during the meeting.
Subject(s)
Bacteriophages , RNA Viruses , Virus Diseases , Viruses , Humans , Computational Biology , Viruses/genetics
ABSTRACT
Herbivory leads to changes in the allocation of nitrogen among different pools and tissues; however, a detailed quantitative analysis of these changes has been lacking. Here, we demonstrate that a mass spectrometric data-independent acquisition approach known as LC-MS(E), combined with a novel algorithm to quantify heavy atom enrichment in peptides, is able to quantify elicited changes in protein amounts and (15)N flux in a high throughput manner. The reliable identification/quantitation of rabbit phosphorylase b protein spiked into leaf protein extract was achieved. The linear dynamic range, reproducibility of technical and biological replicates, and differences between measured and expected (15)N-incorporation into the small (SSU) and large (LSU) subunits of ribulose-1,5-bisphosphate-carboxylase/oxygenase (RuBisCO) and RuBisCO activase 2 (RCA2) of Nicotiana attenuata plants grown in hydroponic culture at different known concentrations of (15)N-labeled nitrate were used to further evaluate the procedure. The utility of the method for whole-plant studies in ecologically realistic contexts was demonstrated by using (15)N-pulse protocols on plants growing in soil under unknown (15)N-incorporation levels. Additionally, we quantified the amounts of lipoxygenase 2 (LOX2) protein, an enzyme important in antiherbivore defense responses, demonstrating that the approach allows for in-depth quantitative proteomics and (15)N flux analyses of the metabolic dynamics elicited during plant-herbivore interactions.
Subject(s)
Nicotiana/metabolism , Nitrogen/metabolism , Plant Leaves/metabolism , Ribulose-Bisphosphate Carboxylase/metabolism , Algorithms , Amino Acid Sequence , Animals , Bayes Theorem , Chromatography, Liquid/standards , Herbivory , Likelihood Functions , Lipoxygenase/chemistry , Lipoxygenase/isolation & purification , Lipoxygenase/metabolism , Molecular Sequence Data , Nitrogen Isotopes/metabolism , Peptide Fragments/chemistry , Peptide Mapping/standards , Phosphorylase b/chemistry , Plant Extracts/chemistry , Plant Extracts/isolation & purification , Plant Leaves/chemistry , Plant Proteins/chemistry , Plant Proteins/isolation & purification , Plant Proteins/metabolism , Rabbits , Reference Standards , Ribulose-Bisphosphate Carboxylase/chemistry , Ribulose-Bisphosphate Carboxylase/isolation & purification , Spectrometry, Mass, Electrospray Ionization/standards , Tandem Mass Spectrometry/standards , Nicotiana/chemistry
ABSTRACT
Mass spectrometry allows sensitive, automated, and high-throughput analysis of small molecules. In principle, tandem mass spectrometry allows us to identify "unknown" small molecules not in any database, but the automated interpretation of such data is in its infancy. Fragmentation trees have recently been introduced for the automated analysis of the fragmentation patterns of small molecules. We present a method for the automated comparison of such fragmentation patterns, based on aligning the compounds' fragmentation trees. We cluster compounds based solely on their fragmentation patterns and show a good agreement with known compound classes. Fragmentation pattern similarities are strongly correlated with the chemical similarity of molecules. We present a tool for searching a database for compounds with fragmentation pattern similar to an unknown sample compound. We apply this tool to metabolites from Icelandic poppy. Our method allows fully automated computational identification of small molecules that cannot be found in any database.
Subject(s)
Mass Spectrometry/methods , Statistics as Topic/methods , Cluster Analysis , Databases, Factual , Papaver/chemistry
ABSTRACT
Non-coding RNAs (ncRNAs) play a central and regulatory role in almost all cells, organs, and species, which has been broadly recognized since the human ENCODE project and several other genome projects. Nevertheless, only a small fraction of ncRNAs has been identified, and in the placenta they have been investigated only marginally. To date, most examples of ncRNAs that have been identified as specific for fetal tissues, including placenta, are members of the group of microRNAs (miRNAs). Due to their quantity, it can be expected that the considerably larger group of other ncRNAs exerts far stronger effects than miRNAs. The syncytiotrophoblast of fetal origin forms the interface between fetus and mother, and continuously releases extracellular vesicles (EVs) into the maternal circulation. These EVs contain fetal proteins and RNA, including ncRNA, for communication with neighboring and distant maternal cells. Disorders of ncRNA in placental tissue, especially in trophoblast cells, and in EVs seem to be involved in pregnancy disorders, potentially as a cause or consequence. This review summarizes the current knowledge on placental ncRNA, their transport in EVs, and their involvement in pregnancy pathologies, as well as their potential for novel diagnostic tools.
Subject(s)
Extracellular Vesicles , MicroRNAs , Extracellular Vesicles/metabolism , Female , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Placenta/metabolism , Pregnancy , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Trophoblasts/metabolism
ABSTRACT
The International Virus Bioinformatics Meeting 2022 took place online on 23-25 March 2022 and attracted about 380 participants from all over the world. The goal of the meeting was to provide a meaningful and interactive scientific environment to promote discussion and collaboration and to inspire and suggest new research directions and questions. The participants created a highly interactive scientific environment even without physical face-to-face interactions. This meeting is a focal point to gain an insight into the state of the art of the virus bioinformatics research landscape and to interact with researchers at the forefront as well as aspiring young scientists. The meeting featured eight invited and 18 contributed talks in eight sessions on three days, as well as 52 posters, which were presented during three virtual poster sessions. The main topics were: SARS-CoV-2, viral emergence and surveillance, virus-host interactions, viral sequence analysis, virus identification and annotation, phages, and viral diversity. This report summarizes the main research findings and highlights presented at the meeting.
Subject(s)
COVID-19 , Viruses, Unclassified , Viruses , Computational Biology , DNA Viruses , Humans , SARS-CoV-2
ABSTRACT
Viruses are the cause of a considerable burden to human, animal and plant health, while on the other hand playing an important role in regulating entire ecosystems. The power of new sequencing technologies combined with new tools for processing "Big Data" offers unprecedented opportunities to answer fundamental questions in virology. Virologists have an urgent need for virus-specific bioinformatics tools. These developments have led to the formation of the European Virus Bioinformatics Center, a network of experts in virology and bioinformatics who are joining forces to enable extensive exchange and collaboration between these research areas. The EVBC strives to provide talented researchers with a supportive environment free of gender bias, but the gender gap in science, especially in math-intensive fields such as computer science, persists. To bring more talented women into research and keep them there, we need to highlight role models to spark their interest, and we need to ensure that female scientists are not kept at lower levels but are given the opportunity to lead the field. Here we showcase the work of the EVBC and highlight the achievements of some outstanding women experts in virology and viral bioinformatics.
Subject(s)
Computational Biology , Research Personnel , Viruses , Europe , Female , Humans , Research Personnel/statistics & numerical data , Viruses/genetics
ABSTRACT
BACKGROUND: The center string (or closest string) problem is a classic computer science problem with important applications in computational biology. Given k input strings and a distance threshold d, we search for a string within Hamming distance at most d to each input string. This problem is NP-complete. RESULTS: In this paper, we focus on exact methods for the problem that are also swift in application. We first introduce data reduction techniques that allow us to infer that certain instances have no solution, or that a center string must satisfy certain conditions. We describe how to use this information to speed up two previously published search tree algorithms. Then, we describe a novel iterative search strategy that is efficient in practice, where some of our reduction techniques can also be applied. Finally, we present results of an evaluation study for two different data sets from a biological application. CONCLUSIONS: We find that the running time for computing the optimal center string is dominated by the subroutine calls for d = d_opt - 1 and d = d_opt. Our data reduction is very effective for both, either rejecting unsolvable instances or solving trivial positions. We find that this speeds up computations considerably.
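The problem statement above can be made concrete with a short example. This is a minimal sketch, assuming equal-length input strings: it uses the textbook bounded search tree idea (start from one input string and repair at most d positions) and a plain loop over increasing d. It does not implement the data reduction techniques or the iterative strategy of the paper, and the function names are chosen for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, d):
    """Return a string within Hamming distance d of every input, or None."""

    def search(candidate, budget):
        for s in strings:
            dist = hamming(candidate, s)
            if dist > d:
                # Even spending the whole remaining budget cannot repair this.
                if dist > d + budget:
                    return None
                # Any solution agrees with s on at least one of d + 1
                # mismatch positions, so branching on those is sufficient.
                mismatches = [i for i, (x, y) in enumerate(zip(candidate, s)) if x != y]
                for i in mismatches[: d + 1]:
                    repaired = candidate[:i] + s[i] + candidate[i + 1:]
                    found = search(repaired, budget - 1)
                    if found is not None:
                        return found
                return None
        # No input string is farther than d from the candidate.
        return candidate

    return search(strings[0], d)


def optimal_center(strings):
    """Try d = 0, 1, 2, ... and return (d_opt, center) for the first success."""
    for d in range(len(strings[0]) + 1):
        center = closest_string(strings, d)
        if center is not None:
            return d, center
    return None


if __name__ == "__main__":
    print(optimal_center(["ACGT", "ACGA", "TCGT"]))
```

In this loop the last two calls are the unsuccessful one for d = d_opt - 1 and the successful one for d = d_opt, which are the two calls the abstract identifies as dominating the running time.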
Subject(s)
Algorithms , Computational Biology/methods , Genomics/methods , Bacteria/genetics , Cluster Analysis , Genome, Bacterial , Models, Genetic
ABSTRACT
The International Virus Bioinformatics Meeting 2020 was originally planned to take place in Bern, Switzerland, in March 2020. However, the COVID-19 pandemic put a spoke in the wheel of almost all conferences to be held in 2020. After moving the conference to 8-9 October 2020, we were hit by the second wave and finally decided at short notice to go fully online. On the other hand, the pandemic has made us even more aware of the importance of accelerating research in viral bioinformatics. Advances in bioinformatics have led to improved approaches to investigate viral infections and outbreaks. The International Virus Bioinformatics Meeting 2020 attracted approximately 120 experts in virology and bioinformatics from all over the world to join the two-day virtual meeting. Despite concerns being raised that virtual meetings lack possibilities for face-to-face discussion, the participants from this small community created a highly interactive scientific environment, engaging in lively and inspiring discussions and suggesting new research directions and questions. The meeting featured five invited and twelve contributed talks on four main topics: (1) proteome and RNAome of RNA viruses, (2) viral metagenomics and ecology, (3) virus evolution and classification and (4) viral infections and immunology. Further, the meeting featured 20 oral poster presentations, all of which focused on specific areas of virus bioinformatics. This report summarizes the main research findings and highlights presented at the meeting.
Subject(s)
Computational Biology , RNA Viruses/genetics , Virology , COVID-19 , Congresses as Topic , Evolution, Molecular , Genome, Viral , Humans , Metagenomics , RNA Viruses/pathogenicity
ABSTRACT
The Third Annual Meeting of the European Virus Bioinformatics Center (EVBC) took place in Glasgow, United Kingdom, 28-29 March 2019. Virus bioinformatics has become central to virology research, and advances in bioinformatics have led to improved approaches to investigate viral infections and outbreaks, being successfully used to detect, control, and treat infections of humans and animals. This active field of research has attracted approximately 110 experts in virology and bioinformatics/computational biology from Europe and other parts of the world to attend the two-day meeting in Glasgow to increase scientific exchange between laboratory- and computer-based researchers. The meeting was held at the McIntyre Building of the University of Glasgow; a perfect location, as it was originally built to be a place for "rubbing your brains with those of other people", as Rector Stanley Baldwin described it. The goal of the meeting was to provide a meaningful and interactive scientific environment to promote discussion and collaboration and to inspire and suggest new research directions and questions. The meeting featured eight invited and twelve contributed talks, on the four main topics: (1) systems virology, (2) virus-host interactions and the virome, (3) virus classification and evolution and (4) epidemiology, surveillance and evolution. Further, the meeting featured 34 oral poster presentations, all of which focused on specific areas of virus bioinformatics. This report summarizes the main research findings and highlights presented at the meeting.
Subject(s)
Computational Biology , Virus Diseases/virology , Viruses/chemistry , Viruses/genetics , Animals , Bacteriophages/classification , Bacteriophages/genetics , Bacteriophages/isolation & purification , Humans , Phylogeny , Virus Diseases/veterinary , Viruses/isolation & purification , Viruses/metabolism
ABSTRACT
Despite the recognized excellence of virology and bioinformatics, these two communities have interacted surprisingly sporadically, aside from some pioneering work on HIV-1 and influenza. Bringing together the expertise of bioinformaticians and virologists is crucial, since very specific but fundamental computational approaches are required for virus research, particularly in an era of big data. Collaboration between virologists and bioinformaticians is necessary to improve existing analytical tools, cloud-based systems, computational resources, data sharing approaches, new diagnostic tools, and bioinformatic training. Here, we highlight current progress and discuss potential avenues for future developments in this promising era of virus bioinformatics. We end by presenting an overview of current technologies, and by outlining some of the major challenges and advantages that bioinformatics will bring to the field of virology.
Subject(s)
Computational Biology/methods , Virology/methods , Viruses/growth & development , Viruses/genetics , Computational Biology/trends , Virology/trends
ABSTRACT
Mycoses induced by C. albicans or A. fumigatus can cause important host damage through either a deficient or an exaggerated immune response. Regulation of chemokine and cytokine signaling plays a crucial role in an adequate inflammatory response, which can be modulated by vitamins A and D. Non-coding RNAs (ncRNAs) as transcription factors or cis-acting antisense RNAs are known to be involved in gene regulation. However, the processes during fungal infections and treatment with vitamins in terms of therapeutic impact are unknown. Using comprehensive RNA-Seq analyses, we show that in monocytes both vitamins regulate ncRNAs involved in amino acid metabolism and immune system processes. Compared to protein-coding genes, fungi and bacteria induced an expression change in relatively few ncRNAs, but with massive fold changes of up to 4000. We defined the landscape of long ncRNAs (lncRNAs) in response to pathogens and observed variation in the isoform composition for several lncRNAs following infection and vitamin treatment. Most of the involved antisense RNAs are regulated and positively correlated with their sense protein-coding genes. We investigated lncRNAs with stimulus-specific immunomodulatory activity as potential marker genes: LINC00595, SBF2-AS1 (A. fumigatus) and RP11-588G21.2, RP11-394l13.1 (C. albicans) might be detectable in the early phase of infection and serve as therapeutic targets in the future.
Subject(s)
Bacterial Infections/genetics , Gene Expression Regulation/drug effects , Monocytes/metabolism , Mycoses/genetics , RNA, Long Noncoding/genetics , Vitamin A/pharmacology , Vitamin D/pharmacology , Bacterial Infections/microbiology , Humans , Mycoses/microbiology , RNA, Antisense/genetics , RNA, Long Noncoding/chemistry , RNA, Messenger/genetics , RNA, Untranslated/genetics , Vitamin A/metabolism , Vitamin D/metabolism
ABSTRACT
The annotation of small molecules in untargeted mass spectrometry relies on the matching of fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rates (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the FDR for 70 public metabolomics data sets. We show that the spectral matching settings need to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose, on average, by +139% (ranging from -92% to +5705%) when compared with a default parameter set available at GNPS. The FDR estimation methods presented will enable a user to assess the scoring criteria for large-scale analysis of mass spectrometry based metabolomics data, a capability that has been essential in the advancement of proteomics, transcriptomics, and genomics science.
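The target-decoy idea mentioned above can be sketched in a few lines. This is a generic illustration of the estimate as commonly used in proteomics, where the FDR at a score threshold is approximated by the ratio of decoy hits to target hits above that threshold. It is an assumption-laden sketch, not the decoy construction or the empirical Bayes method of the paper, and the function names and example scores are hypothetical.

```python
def target_decoy_qvalues(target_scores, decoy_scores):
    """Return (score, q_value) pairs for target hits, best score first."""
    labelled = [(s, False) for s in target_scores] + [(s, True) for s in decoy_scores]
    # Sort by descending score; on ties, decoys come first (conservative).
    labelled.sort(key=lambda hit: (-hit[0], not hit[1]))

    fdrs = []
    n_target = n_decoy = 0
    for score, is_decoy in labelled:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
            # Estimated FDR if the threshold is set at this target's score.
            fdrs.append((score, min(1.0, n_decoy / n_target)))

    # q-value: smallest estimated FDR at this or any more permissive threshold.
    qvalues = []
    running_min = 1.0
    for score, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append((score, running_min))
    qvalues.reverse()
    return qvalues


def annotations_at(qvalues, max_fdr):
    """Count target hits accepted at the given FDR level."""
    return sum(1 for _, q in qvalues if q <= max_fdr)


if __name__ == "__main__":
    q = target_decoy_qvalues([0.9, 0.8, 0.7, 0.6], [0.75, 0.5])
    print(q, annotations_at(q, 0.05))
```

Re-running such an estimate for different scoring parameters makes it possible to compare how many annotations each setting yields at the same FDR level, which is the kind of per-project adjustment the abstract describes.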
Subject(s)
Metabolomics , Tandem Mass Spectrometry/methods , Algorithms , Chromatography, Liquid , Computational Biology/methods , Databases, Protein
ABSTRACT
The unprecedented outbreak of Ebola in West Africa resulted in over 28,000 cases and 11,000 deaths, underlining the need for a better understanding of the biology of this highly pathogenic virus to develop specific counter strategies. Two filoviruses, the Ebola and Marburg viruses, result in a severe and often fatal infection in humans. However, bats are natural hosts and survive filovirus infections without obvious symptoms. The molecular basis of this striking difference in the response to filovirus infections is not well understood. We report a systematic overview of differentially expressed genes, activity motifs and pathways in human and bat cells infected with the Ebola and Marburg viruses, and we demonstrate that the replication of filoviruses is more rapid in human cells than in bat cells. We also found that the most strongly regulated genes upon filovirus infection are chemokine ligands and transcription factors. We observed a strong induction of the JAK/STAT pathway, of several genes encoding inhibitors of MAP kinases (DUSP genes) and of PPP1R15A, which is involved in ER stress-induced cell death. We used comparative transcriptomics to provide a data resource that can be used to identify cellular responses that might allow bats to survive filovirus infections.
Subject(s)
Ebolavirus/metabolism , Gene Expression Regulation , Hemorrhagic Fever, Ebola/metabolism , Marburg Virus Disease/metabolism , Marburgvirus/metabolism , Signal Transduction , Transcription, Genetic , Animals , Cell Line, Tumor , Chiroptera , Humans
ABSTRACT
We present the results of a fully automated de novo approach for identification of molecular formulas in the CASMI 2013 contest. Only results for Category 1 (molecular formula identification) were submitted. Our approach combines isotope pattern analysis and fragmentation pattern analysis and is completely independent from any (spectral and structural) database. We correctly identified the molecular formula for ten out of twelve challenges, being the best automated method competing in this category.