ABSTRACT
SUMMARY: The assessment of novel phylogenetic models and inference methods is routinely being conducted via experiments on simulated as well as empirical data. When generating synthetic data it is often unclear how to set simulation parameters for the models and generate trees that appropriately reflect empirical model parameter distributions and tree shapes. As a solution, we present and make available a new database called 'RAxML Grove' currently comprising more than 60 000 inferred trees and respective model parameter estimates from fully anonymized empirical datasets that were analyzed using RAxML and RAxML-NG on two web servers. We also describe and make available two simple applications of RAxML Grove to exemplify its usage and highlight its utility for designing realistic simulation studies and analyzing empirical model parameter and tree shape distributions. AVAILABILITY AND IMPLEMENTATION: RAxML Grove is freely available at https://github.com/angtft/RAxMLGrove. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Computers, Software, Phylogeny, Computer Simulation, Factual Databases
ABSTRACT
The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss) creates, maintains and disseminates a portfolio of reliable and state-of-the-art bioinformatics services and resources for the storage, analysis and interpretation of biological data. Through Expasy (https://www.expasy.org), the Swiss Bioinformatics Resource Portal, the scientific community worldwide freely accesses more than 160 SIB resources supporting a wide range of life science and biomedical research areas. In 2020, Expasy was redesigned through a user-centric approach, known as User-Centred Design (UCD), whose aim is to create user interfaces that are easy to use, efficient and targeted at the intended community. This approach, widely used in other fields such as marketing, e-commerce, and design of mobile applications, is still scarcely explored in bioinformatics. In total, around 50 people were actively involved, including internal stakeholders and end-users. In addition to an optimised interface that meets users' needs and expectations, the new version of Expasy provides an up-to-date and accurate description of high-quality resources based on a standardised ontology, making it possible to connect functionally related resources.
Subjects
Computational Biology, Factual Databases, Software, User-Computer Interface
ABSTRACT
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
Subjects
Computational Biology, Registries, Data Curation, Software
ABSTRACT
BACKGROUND: The purpose of gene set enrichment analysis (GSEA) is to find general trends in the huge lists of genes or proteins generated by many functional genomics techniques and bioinformatics analyses. RESULTS: Here we present SetRank, an advanced GSEA algorithm which is able to eliminate many false positive hits. The key principle of the algorithm is that it discards gene sets that have initially been flagged as significant, if their significance is only due to the overlap with another gene set. The algorithm is explained in detail and its performance is compared to that of other methods using objective benchmarking criteria. Furthermore, we explore how sample source bias can affect the results of a GSEA analysis. CONCLUSIONS: The benchmarking results show that SetRank is a highly specific tool for GSEA. Furthermore, we show that the reliability of results can be improved by taking sample source bias into account. SetRank is available as an R package and through an online web interface.
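The key principle stated above, discarding a gene set whose significance is explained only by its overlap with another set, can be illustrated with a small sketch. This is not the SetRank implementation: the hypergeometric test, the 0.05 threshold, the pairwise rule and all function names are assumptions chosen for illustration.

```python
from math import comb


def enrichment_p(gene_set, hits, universe_size):
    """Upper-tail hypergeometric p-value for the overlap of gene_set with hits."""
    set_size = len(gene_set)
    n_hits = len(hits)
    overlap = len(gene_set & hits)
    if set_size == 0:
        return 1.0
    total = comb(universe_size, n_hits)
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, n_hits - i)
        for i in range(overlap, min(set_size, n_hits) + 1)
    )
    return tail / total


def filter_overlap_driven(gene_sets, hits, universe_size, alpha=0.05):
    """Keep significant sets, dropping any set A that loses significance once
    its overlap with another significant set B is removed while B minus A
    stays significant (a simplified, hypothetical pairwise rule)."""
    significant = {
        name: genes
        for name, genes in gene_sets.items()
        if enrichment_p(genes, hits, universe_size) < alpha
    }
    discarded = set()
    for a, set_a in significant.items():
        for b, set_b in significant.items():
            if a == b or not (set_a & set_b):
                continue
            p_a_alone = enrichment_p(set_a - set_b, hits, universe_size)
            p_b_alone = enrichment_p(set_b - set_a, hits, universe_size)
            if p_a_alone >= alpha and p_b_alone < alpha:
                discarded.add(a)
    return sorted(name for name in significant if name not in discarded)


# Toy data: 100 genes in the universe, genes 0..9 are the hits.
hits = set(range(10))
gene_sets = {
    "A": set(range(0, 5)) | set(range(20, 25)),  # all of its hits lie inside B
    "B": set(range(0, 15)),
}
kept = filter_overlap_driven(gene_sets, hits, universe_size=100)
```

In this toy case both sets pass the initial test, but set A has no hits outside its overlap with B, so only B is retained.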
Subjects
Algorithms, Computational Biology/methods, Genetic Databases, Gene Expression Profiling, Brain/metabolism, Human Genome, Genomics, Humans, Theoretical Models, Neoplasms/genetics, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
The activation, or maturation, of dendritic cells (DCs) is crucial for the initiation of adaptive T-cell mediated immune responses. Research on the molecular mechanisms implicated in DC maturation has focused primarily on inducible gene-expression events promoting the acquisition of new functions, such as cytokine production and enhanced T-cell-stimulatory capacity. In contrast, mechanisms that modulate DC function by inducing widespread gene-silencing remain poorly understood. Yet the termination of key functions is known to be critical for the function of activated DCs. Genome-wide analysis of activation-induced histone deacetylation, combined with genome-wide quantification of activation-induced silencing of nascent transcription, led us to identify a novel inducible transcriptional-repression pathway that makes major contributions to the DC-maturation process. This silencing response is a rapid primary event distinct from repression mechanisms known to operate at later stages of DC maturation. The repressed genes function in pivotal processes--including antigen-presentation, extracellular signal detection, intracellular signal transduction and lipid-mediator biosynthesis--underscoring the central contribution of the silencing mechanism to rapid reshaping of DC function. Interestingly, promoters of the repressed genes exhibit a surprisingly high frequency of PU.1-occupied sites, suggesting a novel role for this lineage-specific transcription factor in marking genes poised for inducible repression.
Subjects
Dendritic Cells/metabolism, Gene Silencing, Nuclear Proteins/genetics, Trans-Activators/genetics, Genetic Transcription, Animals, Humans, Mice, Proto-Oncogene Proteins/metabolism, Trans-Activators/metabolism
ABSTRACT
BACKGROUND: Available methods to simulate nucleotide or amino acid data typically use Markov models to simulate each position independently. These approaches are not appropriate to assess the performance of combinatorial and probabilistic methods that look for coevolving positions in nucleotide or amino acid sequences. RESULTS: We have developed a web-based platform that gives user-friendly access to two phylogenetic-based methods implementing the Coev model: the evaluation of coevolving scores and the simulation of coevolving positions. We have also extended the capabilities of the Coev model to allow for the generalization of the alphabet used in the Markov model, which can now analyse both nucleotide and amino acid data sets. The simulation of coevolving positions is novel and builds upon the developments of the Coev model. It allows users to simulate pairs of dependent nucleotide or amino acid positions. CONCLUSIONS: The main focus of our paper is the new simulation method we present for coevolving positions. The implementation of this method is embedded within the web platform Coev-web that is freely accessible at http://coev.vital-it.ch/, and was tested in most modern web browsers.
Subjects
Amino Acids/metabolism, Computational Biology/methods, Molecular Evolution, Internet, Phylogeny, DNA Sequence Analysis/methods, Software, Algorithms, Bayes Theorem, Humans
ABSTRACT
SUMMARY: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available. AVAILABILITY: http://iann.pro/iannviewer CONTACT: manuel.corpas@tgac.ac.uk.
Subjects
Biological Science Disciplines, Software, Anniversaries and Special Events, Congresses as Topic, Internet
ABSTRACT
ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can henceforth seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.
Subjects
Computational Biology, Proteomics, Software, Computer Graphics, Genomics, Internet, Systems Integration, User-Computer Interface
ABSTRACT
Obesity is considered by many as a lifestyle choice rather than a chronic progressive disease. The Innovative Medicines Initiative (IMI) SOPHIA (Stratification of Obesity Phenotypes to Optimize Future Obesity Therapy) project is part of a momentum shift aiming to provide better tools for the stratification of people with obesity according to disease risk and treatment response. One of the challenges to achieving these goals is that many clinical cohorts are siloed, limiting the potential of combined data for biomarker discovery. In SOPHIA, we have addressed this challenge by setting up a federated database building on open-source DataSHIELD technology. The database currently federates 16 cohorts that are accessible via a central gateway. The database is multi-modal, including research studies, clinical trials, and routine health data, and is accessed using the R statistical programming environment where statistical and machine learning analyses can be performed at a distance without any disclosure of patient-level data. We demonstrate the use of the database by providing a proof-of-concept analysis, performing a federated linear model of BMI and systolic blood pressure, pooling all data from 16 studies virtually without any analyst seeing individual patient-level data. This analysis provided similar point estimates compared to a meta-analysis of the 16 individual studies. Our approach provides a benchmark for reproducible, safe federated analyses across multiple study types provided by multiple stakeholders.
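The idea of fitting a linear model across cohorts without moving patient-level data can be sketched with aggregate statistics. The sketch below is an illustration only, not the DataSHIELD API or the SOPHIA analysis: it assumes a single predictor, exchanges only per-cohort sums, and uses hypothetical function names and invented toy values.

```python
def local_summary(xs, ys):
    """Computed inside each cohort; only these aggregates leave the site."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def pooled_fit(summaries):
    """Ordinary least squares for y = intercept + slope * x from pooled sums."""
    n = sum(s["n"] for s in summaries)
    sx = sum(s["sx"] for s in summaries)
    sy = sum(s["sy"] for s in summaries)
    sxx = sum(s["sxx"] for s in summaries)
    sxy = sum(s["sxy"] for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Invented toy cohorts: x is BMI, y is systolic blood pressure.
cohort_1 = local_summary([20, 25, 30], [110, 115, 120])
cohort_2 = local_summary([22, 28], [112, 118])
intercept, slope = pooled_fit([cohort_1, cohort_2])
```

Because least squares depends on the data only through these sums, the pooled estimates equal those from a fit on the combined individual records. A production system would additionally need disclosure safeguards, since aggregates from very small cohorts can reveal individual values.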
ABSTRACT
The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness for both existing and future clinical and molecular datasets. We validated the framework in collaboration with several major public-private partnership projects, demonstrating and delivering improvements across all aspects of FAIR and across a variety of datasets and their contexts. We therefore managed to establish the reproducibility and far-reaching applicability of our approach to FAIRification tasks.
Subjects
COVID-19, Datasets as Topic, Humans, Pandemics, Public-Private Sector Partnerships, Reproducibility of Results
ABSTRACT
Achieving a good outcome for a person with Psoriatic Arthritis (PsA) is made difficult by late diagnosis, heterogeneous clinical disease expression and, in many cases, failure to adequately suppress inflammatory disease features. Single-centre studies have certainly contributed to our understanding of disease pathogenesis, but to adequately address the major areas of unmet need, multi-partner, collaborative research programmes are now required. HIPPOCRATES is a 5-year, Innovative Medicines Initiative (IMI) programme which includes 17 European academic centres experienced in PsA research, 5 pharmaceutical industry partners, 3 small-/medium-sized industry partners and 2 patient-representative organizations. In this review, the ambitious programme of work to be undertaken by HIPPOCRATES is outlined and common approaches and challenges are identified. It is expected that, when completed, the results will ultimately allow for changes in the approaches to diagnosing, managing and treating PsA, allowing for better short-term and long-term outcomes.
Improving outcomes in Psoriatic Arthritis
Psoriatic Arthritis (PsA) is a form of arthritis which is found in approximately 30% of people who have the skin condition psoriasis. PsA is frequently debilitating and progressive, and achieving a good outcome for a person with PsA is made difficult by late diagnosis, the clinical features of the disease and, in many cases, failure to adequately control features of inflammation. Research studies from individual centres have certainly contributed to our understanding of why people develop PsA, but to adequately address the major areas of unmet need, multi-centre, collaborative research programmes are now required. HIPPOCRATES is a 5-year, Innovative Medicines Initiative (IMI) programme which includes 17 European academic centres experienced in PsA research, 5 pharmaceutical industry partners, 3 small-/medium-sized industry partners and 2 patient representative organisations (see appendix). In this review, the ambitious programme of work to be undertaken by HIPPOCRATES is outlined and common approaches and challenges are identified. The participation of patient research partners in all stages of the work of HIPPOCRATES is highlighted. It is expected that, when completed, the results will ultimately allow for changes in the approaches to diagnosing, managing and treating PsA, allowing for improvements in short-term and long-term outcomes.
ABSTRACT
The notion that data should be Findable, Accessible, Interoperable and Reusable, according to the FAIR Principles, has become a global norm for good data stewardship and a prerequisite for reproducibility. Nowadays, FAIR guides data policy actions and professional practices in the public and private sectors. Despite such global endorsements, however, the FAIR Principles are aspirational, remaining elusive at best, and intimidating at worst. To address the lack of practical guidance, and help with capability gaps, we developed the FAIR Cookbook, an open, online resource of hands-on recipes for "FAIR doers" in the Life Sciences. Created by researchers and data management professionals in academia, (bio)pharmaceutical companies and information service industries, the FAIR Cookbook covers the key steps in a FAIRification journey, the levels and indicators of FAIRness, the maturity model, the technologies, the tools and the standards available, as well as the skills required, and the challenges to achieve and improve data FAIRness. Part of the ELIXIR ecosystem, and recommended by funders, the FAIR Cookbook is open to contributions of new recipes.
ABSTRACT
Despite the intuitive value of adopting the Findable, Accessible, Interoperable, and Reusable (FAIR) principles in both academic and industrial sectors, challenges exist in resourcing, balancing long- versus short-term priorities, and achieving technical implementation. This situation is exacerbated by the unclear mechanisms by which costs and benefits can be assessed when decisions on FAIR are made. Scientific and research and development (R&D) leadership need reliable evidence of the potential benefits and information on effective implementation mechanisms and remediating strategies. In this article, we describe procedures for cost-benefit evaluation, and identify best-practice approaches to support the decision-making process involved in FAIR implementation.
Subjects
Drug Discovery, Cost-Benefit Analysis
ABSTRACT
The definitive diagnosis and early treatment of many immune-mediated inflammatory diseases (IMIDs) is hindered by variable and overlapping clinical manifestations. Psoriatic arthritis (PsA), which develops in ~30% of people with psoriasis, is a key example. This mixed-pattern IMID is apparent in entheseal and synovial musculoskeletal structures, but a definitive diagnosis often can only be made by clinical experts or when an extensive progressive disease state is apparent. As with other IMIDs, the detection of multimodal molecular biomarkers offers some hope for the early diagnosis of PsA and the initiation of effective management and treatment strategies. However, specific biomarkers are not yet available for PsA. The assessment of new markers by genomic and epigenomic profiling, or the analysis of blood and synovial fluid/tissue samples using proteomics, metabolomics and lipidomics, provides hope that complex molecular biomarker profiles could be developed to diagnose PsA. Importantly, the integration of these markers with high-throughput histology, imaging and standardized clinical assessment data provides an important opportunity to develop molecular profiles that could improve the diagnosis of PsA, predict its occurrence in cohorts of individuals with psoriasis, differentiate PsA from other IMIDs, and improve therapeutic responses. In this review, we consider the technologies that are currently deployed in the EU IMI2 project HIPPOCRATES to define biomarker profiles specific for PsA and discuss the advantages of combining multi-omics data to improve the outcome of PsA patients.
ABSTRACT
Fatty acid β-oxidation (FAO), the breakdown of lipids, is a metabolic pathway used by various stem cells. FAO levels are generally high during quiescence and downregulated with proliferation. The endogenous metabolite malonyl-CoA modulates lipid metabolism as a reversible FAO inhibitor and as a substrate for de novo lipogenesis. Here we assessed whether malonyl-CoA can be exploited to steer the behavior of hematopoietic stem/progenitor cells (HSPCs), quiescent stem cells of clinical relevance. Treatment of mouse HSPCs in vitro with malonyl-CoA increases HSPC numbers compared with nontreated controls and improves blood reconstitution capacity when the cells are transplanted in vivo, mainly through enhanced lymphoid reconstitution. Similarly, human HSPC numbers also increase upon malonyl-CoA treatment in vitro. These data corroborate that lipid metabolism can be targeted to direct cell fate and stem cell proliferation. Physiological modulation of metabolic pathways, rather than genetic or pharmacological inhibition, provides unique perspectives for stem cell manipulations in health and disease.
Subjects
Hematopoietic Stem Cells/cytology, Hematopoietic Stem Cells/metabolism, Lipid Metabolism, Lymphocytes/cytology, Metabolome, Animals, Cell Differentiation/genetics, Cell Lineage/genetics, Cell Proliferation/genetics, Cultured Cells, Fatty Acids/metabolism, Gene Expression Regulation, Lipid Metabolism/genetics, Lymphocytes/metabolism, Malonyl Coenzyme A/metabolism, Metabolome/genetics, Inbred C57BL Mice, Oxidation-Reduction
ABSTRACT
The MyHits web site (http://myhits.isb-sib.ch) is an integrated service dedicated to the analysis of protein sequences. Since its first description in 2004, both the user interface and the back end of the server were improved. A number of tools (e.g. MAFFT, Jacop, Dotlet, Jalview, ESTScan) were added or updated to improve the usability of the service. The MySQL schema and its associated API were revamped and the database engine (HitKeeper) was separated from the web interface. This paper summarizes the current status of the server, with an emphasis on the new services.
Subjects
Computational Biology/methods, Tertiary Protein Structure, Protein Sequence Analysis, Software, Computer Graphics, Protein Databases, Internet, Programming Languages, Sequence Alignment, Systems Integration, User-Computer Interface
ABSTRACT
Peroxidases (EC 1.11.1.x), which are encoded by small or large multigenic families, are involved in several important physiological and developmental processes. Analyzing their evolution and their distribution among various phyla could help to explain their extremely widespread and diversified presence in almost all living organisms. PeroxiBase was originally created for the exhaustive collection of class III peroxidase sequences from plants (Bakalovic, N., Passardi, F., et al., 2006. PeroxiBase: a class III plant peroxidase database. Phytochemistry 67, 534-539). Extending the class III peroxidase database to all proteins capable of reducing peroxide molecules has become necessary. Our database contains haem and non-haem peroxidase sequences originating from annotated or incorrectly annotated sequences deposited in the main repositories such as GenBank or UniProt KnowledgeBase. This new database will provide a global overview of the evolution of the protein families and superfamilies capable of a peroxidase reaction. In this rapidly growing field, there is a need for continual updates and corrections of the peroxidase protein sequences. Given the lack of a unified nomenclature, we also introduced a unique abbreviation for each different family of peroxidases. This paper thus aims to report the evolution of the PeroxiBase database, which is freely accessible through a web server (http://peroxibase.isb-sib.ch). In addition to new categories of peroxidases, new specific tools have been created to facilitate query, classification and submission of peroxidase sequences.
Subjects
Genetic Databases, Peroxidases/chemistry, Plants/enzymology, Peroxidases/classification, Peroxidases/genetics, Phylogeny, Plants/genetics
ABSTRACT
Class III plant peroxidases (EC 1.11.1.7), which are encoded by multigenic families in land plants, are involved in several important physiological and developmental processes. Their varied functions are not yet clearly determined, but their characterization will certainly lead to a better understanding of plant growth, differentiation and interaction with the environment, and hence to many exciting applications. Researchers cannot easily access all peroxidase sequences, because there is currently no central database for plant peroxidase sequences and many plant sequences are not deposited in the EMBL/GenBank/DDBJ repository or the UniProt KnowledgeBase. Furthermore, gene expression data are poorly covered and annotations are inconsistent. In this rapidly moving field, there is a need for continual updating and correction of the peroxidase superfamily in plants. Moreover, consolidating information about peroxidases will allow for comparison of peroxidases between species and thus significantly help in correlating function, structure and phylogeny. We report a new database (PeroxiBase) accessible through a web server with specific tools dedicated to facilitating query, classification and submission of peroxidase sequences. Recent developments in the field of plant peroxidases are also mentioned.
Subjects
Genetic Databases, Peroxidases/classification, Peroxidases/genetics, Molecular Evolution, Internet, Isoenzymes/classification, Isoenzymes/genetics, Isoenzymes/metabolism, Peroxidases/metabolism
ABSTRACT
EMBnet is a consortium of collaborating bioinformatics groups located mainly within Europe (http://www.embnet.org). Each member country is represented by a 'node', a group responsible for the maintenance of local services for their users (e.g. education, training, software, database distribution, technical support, helpdesk). Among these services a web portal with links and access to locally developed and maintained software is essential and different for each node. Our web portal targets biomedical scientists in Switzerland and elsewhere, offering them access to a collection of important sequence analysis tools mirrored from other sites or developed locally. We describe here the Swiss EMBnet node web site (http://www.ch.embnet.org), which presents a number of original services not available anywhere else.
Subjects
Protein Sequence Analysis/methods, Software, Protein Databases, Europe, Internet, Sequence Alignment
ABSTRACT
The MyHits web server (http://myhits.isb-sib.ch) is a new integrated service dedicated to the annotation of protein sequences and to the analysis of their domains and signatures. Guest users can use the system anonymously, with full access to (i) standard bioinformatics programs (e.g. PSI-BLAST, ClustalW, T-Coffee, Jalview); (ii) a large number of protein sequence databases, including standard (Swiss-Prot, TrEMBL) and locally developed databases (splice variants); (iii) databases of protein motifs (Prosite, Interpro); (iv) a precomputed list of matches ('hits') between the sequence and motif databases. All databases are updated on a weekly basis and the hit list is kept up to date incrementally. The MyHits server also includes a new collection of tools to generate graphical representations of pairwise and multiple sequence alignments including their annotated features. Free registration enables users to upload their own sequences and motifs to private databases. These are then made available through the same web interface and the same set of analytical tools. Registered users can manage their own sequences and annotations using only web tools and freeze their data in their private database for publication purposes.