Results 1 - 13 of 13
1.
Bioinformatics ; 37(1): 89-96, 2021 Apr 09.
Article in English | MEDLINE | ID: mdl-33416858

ABSTRACT

MOTIVATION: One avenue to address the paucity of clinically testable targets is to reinvestigate the druggable genome by tackling complicated types of targets such as Protein-Protein Interactions (PPIs). Given the challenge to target those interfaces with small chemical compounds, it has become clear that learning from successful examples of PPI modulation is a powerful strategy. Freely accessible databases of PPI modulators that provide the community with tractable chemical and pharmacological data, as well as powerful tools to query them, are therefore essential to stimulate new drug discovery projects on PPI targets. RESULTS: Here, we present the new version of iPPI-DB, our manually curated database of PPI modulators. In this completely redesigned version of the database, we introduce a new web interface relying on crowdsourcing for the maintenance of the database. This interface was created to enable community contributions, whereby external experts can suggest new database entries. Moreover, the data model, the graphical interface, and the tools to query the database have been completely modernized and improved. We added new PPI modulators, new PPI targets and extended our focus to stabilizers of PPIs as well. AVAILABILITY AND IMPLEMENTATION: The iPPI-DB server is available at https://ippidb.pasteur.fr. The source code for this server is available at https://gitlab.pasteur.fr/ippidb/ippidb-web/ and is distributed under the GPL licence (http://www.gnu.org/licences/gpl). Queries can be shared through persistent links according to the FAIR data standards. Data can be downloaded from the website as CSV files. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

2.
Methods Mol Biol ; 2075: 265-283, 2020.
Article in English | MEDLINE | ID: mdl-31584169

ABSTRACT

We present a computational method to identify conjugative systems in plasmids and chromosomes using the CONJscan module of MacSyFinder. The method relies on the identification of the protein components of the system using hidden Markov model profiles and then checking that the composition and genetic organization of the system are consistent with those expected from a conjugative system. The method can be accessed online using the Galaxy workflow or locally using standalone software. The latter allows users to modify the models of the module (i.e., to change the expected components, their number, and their organization). CONJscan identifies conjugative systems, but when the mobile genetic element is integrative (ICE), one often also wants to delimit it from the chromosome. We present a method, with a script, to use the results of CONJscan and comparative genomics to delimit ICEs in chromosomes. The method provides a visual representation of the ICE location. Together, these methods facilitate the identification of conjugative elements in bacterial genomes.


Subject(s)
Computational Biology/methods , Conjugation, Genetic , Gene Transfer, Horizontal , Plasmids/genetics , Software , DNA Transposable Elements , Genome, Bacterial , Genomic Islands , Genomics
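
The two-step logic described in this abstract (locate protein components, then check composition and genetic organization) can be sketched roughly as below. This is not the CONJscan or MacSyFinder code: the component labels, the `max_gap` threshold and the `find_systems` helper are hypothetical, and the HMM profile search is assumed to have already produced the list of hits.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene_index: int  # rank of the gene along the replicon
    component: str   # label of the matched profile (hypothetical names)


def find_systems(hits, mandatory, max_gap=5):
    """Group hits lying within max_gap genes of each other and keep
    the groups that contain every mandatory component."""
    systems, cluster = [], []
    for hit in sorted(hits):
        # A large gap closes the current cluster of co-localized hits.
        if cluster and hit.gene_index - cluster[-1].gene_index > max_gap:
            if mandatory <= {h.component for h in cluster}:
                systems.append(cluster)
            cluster = []
        cluster.append(hit)
    if cluster and mandatory <= {h.component for h in cluster}:
        systems.append(cluster)
    return systems


# Illustrative input: three co-localized hits and one isolated hit.
hits = [Hit(10, "relaxase"), Hit(12, "coupling_protein"),
        Hit(15, "ATPase"), Hit(80, "relaxase")]
mandatory = {"relaxase", "coupling_protein", "ATPase"}
systems = find_systems(hits, mandatory)
```

With this toy input, only the three neighbouring hits form a candidate system; the isolated hit at gene 80 is discarded because its cluster lacks the other mandatory components.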
3.
Nucleic Acids Res ; 47(W1): W260-W265, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31028399

ABSTRACT

Phylogeny.fr, created in 2008, has been designed to facilitate the execution of phylogenetic workflows, and is nowadays widely used. However, since its development, user needs have evolved, new tools and workflows have been published, and the number of jobs has increased dramatically, thus promoting new practices, which motivated its refactoring. We developed NGPhylogeny.fr to be more flexible in terms of tools and workflows, easily installable, and more scalable. It integrates numerous tools in their latest version (e.g. TNT, FastME, MrBayes, etc.) as well as new ones designed in the last ten years (e.g. PhyML, SMS, FastTree, trimAl, BOOSTER, etc.). These tools cover a large range of usage (sequence searching, multiple sequence alignment, model selection, tree inference and tree drawing) and a large panel of standard methods (distance, parsimony, maximum likelihood and Bayesian). They are integrated in workflows, which have been already configured ('One click'), can be customized ('Advanced'), or are built from scratch ('A la carte'). Workflows are managed and run by an underlying Galaxy workflow system, which makes workflows more scalable in terms of number of jobs and size of data. NGPhylogeny.fr is deployable on any server or personal computer, and is freely accessible at https://ngphylogeny.fr.


Subject(s)
Databases, Factual , Internet , Phylogeny , Software
4.
Cell Syst ; 6(6): 752-758.e1, 2018 06 27.
Article in English | MEDLINE | ID: mdl-29953864

ABSTRACT

The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a community-driven framework that enables modern, interactive teaching of data analytics in life sciences and facilitates the development of training materials. The key feature of our system is that it is not a static but a continuously improved collection of tutorials. By coupling tutorials with a web-based analysis framework, biomedical researchers can learn by performing computation themselves through a web browser without the need to install software or search for example datasets. Our ultimate goal is to expand the breadth of training materials to include fundamental statistical and data science topics and to precipitate a complete re-engineering of undergraduate and graduate curricula in life sciences. This project is accessible at https://training.galaxyproject.org.


Subject(s)
Computational Biology/education , Computational Biology/methods , Research Personnel/education , Curriculum , Data Analysis , Education, Distance/methods , Education, Distance/trends , Humans , Software
5.
BMC Genomics ; 18(1): 553, 2017 07 21.
Article in English | MEDLINE | ID: mdl-28732463

ABSTRACT

BACKGROUND: While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, due to their lack of long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding for small proteins, whether or not they also have a regulatory role at the RNA level. METHODS: Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in 14 phylogenetically diverse bacteria. Importantly, we quantify uncertainty in our predictions, and follow up on them using mass spectrometry proteomics and comparison to datasets including ribosome profiling. RESULTS: A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 409±191.7 unannotated sRNA ORFs are under selection to maintain coding (mean estimate and 95% confidence interval), an average of 29 per species considered here. This implies that overall at least 10.3±0.5% of sRNAs have a coding ORF, and in some species around 20% do. 165±69 of these novel coding ORFs have some antisense overlap to annotated ORFs. As experimental validation, many of our predictions are translated in published ribosome profiling data and are identified via mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains and many are predicted novel components of type I toxin/antitoxin systems. CONCLUSIONS: We predict over two dozen new protein-coding genes per bacterial species, but crucially also quantified the uncertainty in this estimate. 
Our predictions for sRNA coding ORFs, along with predicted novel type I toxins and tools for sorting and visualizing genomic context, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr. We expect these easily-accessible predictions to be a valuable tool for the study not only of bacterial sRNAs and type I toxin-antitoxin systems, but also of bacterial genetics and genomics.


Subject(s)
Bacteria/genetics , Peptides/genetics , Phylogeny , RNA, Bacterial/genetics , RNA, Small Untranslated/genetics , Antitoxins/genetics , Bacterial Toxins/genetics , Internet , Machine Learning , Molecular Sequence Annotation , Open Reading Frames/genetics , Ribosomes/genetics
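
The per-species figure quoted in this abstract follows from simple arithmetic on the reported totals, which can be checked as below. Dividing the interval bounds by the number of species is only a rescaling for illustration, an assumption made here; it is not a per-species confidence interval reported by the authors.

```python
# Values taken from the abstract: 409 +/- 191.7 ORFs over 14 species.
n_species = 14
mean_total = 409.0
half_width = 191.7

per_species = mean_total / n_species           # about 29 per species
low = (mean_total - half_width) / n_species    # rescaled lower bound
high = (mean_total + half_width) / n_species   # rescaled upper bound

print(f"{per_species:.1f} per species (rescaled range {low:.1f} to {high:.1f})")
```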
6.
Gigascience ; 6(6): 1-4, 2017 06 01.
Article in English | MEDLINE | ID: mdl-28402416

ABSTRACT

Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on GitHub at https://github.com/C3BI-pasteur-fr/ReGaTE.


Subject(s)
Computational Biology/methods , Automation , Computer Systems , Internet , Reproducibility of Results , Software , User-Computer Interface , Workflow
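
The extract-then-map step described in this abstract might look roughly like the sketch below. It is not the ReGaTE code: `fetch_tools` and `to_registry_entry` are hypothetical helpers, the target record layout and the tool link pattern are assumptions, and only the `GalaxyInstance` constructor and `tools.get_tools()` call are taken from the BioBlend library. Enrichment with scientific annotations and the push to bio.tools are not shown.

```python
def fetch_tools(galaxy_url, api_key):
    """List the tools of a Galaxy server through BioBlend.

    Needs network access and the third-party bioblend package."""
    from bioblend.galaxy import GalaxyInstance
    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    return gi.tools.get_tools()


def to_registry_entry(tool, galaxy_url):
    """Map one Galaxy tool dict to a registry-style record.

    The output field names are an assumed, simplified schema."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "version": tool.get("version", ""),
        # Assumed link pattern pointing at the tool form on the server.
        "homepage": f"{galaxy_url.rstrip('/')}/root?tool_id={tool['id']}",
    }


# Offline illustration with a hand-written tool dict and a placeholder URL.
example_tool = {"id": "fastqc", "name": "FastQC",
                "description": "Read quality reports", "version": "0.72"}
entry = to_registry_entry(example_tool, "https://galaxy.example.org/")
```

Keeping the mapping in a pure function separates it from the network call, so the conversion can be checked without a running Galaxy server.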
7.
Nucleic Acids Res ; 44(D1): D38-47, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26538599

ABSTRACT

Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR, the European infrastructure for biological information, that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.


Subject(s)
Computational Biology , Registries , Data Curation , Software
8.
F1000Res ; 4: 86, 2015.
Article in English | MEDLINE | ID: mdl-28451381

ABSTRACT

The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most often, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web interface that facilitates the management and bioinformatics analysis of metagenomic data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and, eventually, classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the user's input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data, including loading, indexing, mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples and runs, as well as the workflow results, are stored in the LIMS.
For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the spirit of Galaxy, the interface enables the sharing of scientific results with fellow team members.

9.
Protein Sci ; 19(4): 847-67, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20162627

ABSTRACT

Ligand-protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein DataBank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the small GTPases. Nevertheless, protein kinases from different kinome families are also found together, for example, Aurora-A and CDK2 proteins, which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures lacking cocrystallized ligands with the purine clusters, enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects.


Subject(s)
Carrier Proteins/classification , Purines/metabolism , Software , Algorithms , Binding Sites , Carrier Proteins/chemistry , Databases, Protein , Ligands , Models, Molecular , Protein Conformation , Purines/chemistry , Structure-Activity Relationship
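
The Markov Clustering step named in this abstract can be illustrated with a minimal pure-Python sketch on a toy similarity graph. This is not the MED-SMA implementation: the similarity weights are invented, the inflation value and iteration count are arbitrary choices, and the construction of the similarity graph by MED-SuMo is not shown.

```python
def normalize(m):
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then renormalize columns."""
    return normalize([[x ** r for x in row] for row in m])


def markov_cluster(adjacency, inflation=2.0, iterations=50):
    n = len(adjacency)
    # Self-loops are added before normalizing, as is common for MCL.
    m = normalize([[adjacency[i][j] + (i == j) for j in range(n)]
                   for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that still carry weight are attractors; the non-zero columns
    # of an attractor row are the members of one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Toy graph: sites 0-2 and 3-5 are mutually similar, and a single weak
# similarity (0.1) links site 2 to site 3.
graph = [[0.0] * 6 for _ in range(6)]
for a, b, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    graph[a][b] = graph[b][a] = w
print(markov_cluster(graph))
```

Higher inflation values tend to give smaller, tighter clusters, which is the main parameter one would tune when grouping binding sites this way.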
10.
Drug Des Devel Ther ; 3: 59-72, 2009 Sep 21.
Article in English | MEDLINE | ID: mdl-19920922

ABSTRACT

Three-dimensional structural information is critical for understanding functional protein properties and the precise mechanisms of protein functions implicated in physiological and pathological processes. Comparison and detection of protein binding sites are key steps for annotating structures with functional predictions and are extremely valuable steps in a drug design process. In this research area, MED-SuMo is a powerful technology to detect and characterize similar local regions on protein surfaces. Each amino acid residue's potential chemical interactions are represented by specific surface chemical features (SCFs). The MED-SuMo heuristic is based on the representation of binding sites by a graph structure suitable for exploration by an efficient comparison algorithm. We use this approach to analyze one particular SCOP superfamily which includes HSP90 chaperone, MutL/DNA topoisomerase, histidine kinases, and alpha-ketoacid dehydrogenase kinase C (BCK). They share a common fold and a common region for ATP-binding. To analyze both similar and differing features of this fold, we use a novel classification method, the MED-SuMo multi approach (MED-SMA). We highlight common and distinct features of these proteins. The different clusters created by MED-SMA yield interesting observations. For instance, one cluster gathers three types of proteins (HSP90, topoisomerase VI, and BCK) which all bind the drug radicicol.

11.
Infect Disord Drug Targets ; 9(3): 344-57, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19519487

ABSTRACT

Resolved three-dimensional protein structures are a major source of information for understanding protein functional properties. The current explosive growth of publicly available protein structures is producing large volumes of data for computational modelling and drug design methods. Target-based in silico drug design tools help design and optimize compounds that bind to specific targets. MED-SuMo is a powerful technology for comparing local regions on protein surfaces, allowing similarities to be discovered and explored. This is a target-based tool that can exploit all available macromolecule structures. Its computational efficiency differentiates its approach from widely used methods such as docking and scoring, or map-based methods. As a result, MED-SuMo contributes to a large variety of real-world drug discovery applications. We review specific applications where MED-SuMo played a significant role. These examples include functional annotation, pocket profiling, structural superposition, and functional binding site classification. We also review cases where MED-SuMo provided an innovative solution to frequent undertakings of the medicinal chemist and molecular modeller during lead discovery and lead optimization. These further cases include drug repurposing and fragment-based drug design.


Subject(s)
Drug Design , Drug Discovery , Protein Conformation , Software , Binding Sites , Computational Biology/methods , Computer Simulation , Databases, Protein , Models, Molecular , Molecular Structure , Protein Binding , Structure-Activity Relationship
12.
J Comput Aided Mol Des ; 23(8): 571-82, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19533373

ABSTRACT

Eg5, a mitotic kinesin exclusively involved in the formation and function of the mitotic spindle, has attracted interest as an anticancer drug target. Eg5 is co-crystallized with several inhibitors bound to its allosteric binding pocket. Each of these occupies a pocket formed by loop 5/helix alpha2 (L5/alpha2). Recently designed inhibitors additionally occupy a hydrophobic pocket of this site. The goal of the present study was to explore this hydrophobic pocket with our MED-SuMo fragment-based protocol, and thus discover novel chemical structures that might bind as inhibitors. The MED-SuMo software is able to compare and superimpose similar interaction surfaces across the whole Protein Data Bank (PDB). In a fragment-based protocol, MED-SuMo retrieves MED-Portions that encode protein-fragment binding sites and are derived from cross-mining protein-ligand structures with libraries of small molecules. Furthermore, we excluded intra-family MED-Portions derived from Eg5 ligands that occupy the hydrophobic pocket and predicted new potential ligands by hybridization that would fill both pockets simultaneously. Some of the latter, having original scaffolds and substituents in the hydrophobic pocket, are identified in libraries of synthetically accessible molecules by the MED-Search software.


Subject(s)
Drug Discovery , Kinesins/chemistry , Ligands , Small Molecule Libraries/chemistry , Allosteric Site , Computer-Aided Design , Humans , Hydrophobic and Hydrophilic Interactions , Kinesins/antagonists & inhibitors , Magnetic Resonance Spectroscopy , Protein Binding , Protein Structure, Tertiary , Small Molecule Libraries/therapeutic use , Software , Spindle Apparatus/chemistry , Structure-Activity Relationship
13.
J Chem Inf Model ; 49(2): 280-94, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19434830

ABSTRACT

The large volume of protein-ligand structures now available enables innovative and efficient protocols in computational FBDD (Fragment-Based Drug Design) to be proposed based on experimental data. In this work, we build a database of MED-Portions, where a MED-Portion is a new structural object encoding protein-fragment binding sites. MED-Portions are derived from mining all available protein-ligand structures with any library of small molecules. Combined with the MED-SuMo software to superpose similar protein interaction surfaces, pools of matching MED-Portions can be retrieved from any binding surface query. The rapidity of this technology allows its application to a diverse set of 107 protein binding sites. The selectivity of the protocol is shown by a qualitative correlation between the average hydrophobicity of the pools of MED-Portions and those of the binding sites. To generate hit-like molecules, MED-Portions are combined in 3D with the MED-Hybridise toolkit. Our MED-Portion/MED-SuMo/MED-Hybridise protocol is applied to two targets that represent important protein superfamilies in drug design: a protein kinase and a G-Protein Coupled Receptor (GPCR). We retrieved active molecules of PubChem bioassays for the two targets. The results show the potential for finding relevant leads from any protein 3D structure since the occurrence of interfamily MED-Portions is 25% for protein kinase and almost 100% for the GPCR.


Subject(s)
Databases, Protein , Peptide Fragments/chemistry , Proteins/chemistry , Ligands , Models, Molecular , Protein Binding